Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remote / Late evaluation of tools #12459

Merged
merged 75 commits into from Nov 30, 2021

Conversation

mvdbeek
Copy link
Member

@mvdbeek mvdbeek commented Sep 14, 2021

The idea is that if we can evaluate tools close to the data we can fetch deferred data there and generate the necessary metadata that the evaluation depends on. This work should be flexible enough to occur within the job script (included in the PR), as a (celery) task that runs on a remote node (maybe with a celery-queue destination option ?) or as a side-pod (will be separate PR(s)).

Tests:

  • Unit test for serialising / deserializing JobIO class
    • Make sure ConfiguredFileSources work
  • Make sure xml/yml/cwl loading of serialized tool works
  • Integration tests that use tool data tables
  • Integration tests that use file sources
  • Integration test for implicit conversion
  • Integration test that exports history (Or maybe not if we merge celery task for history exports ??)
  • Unit test for testing FileTracebackException

Development:

  • Do we need to export using export_key ?
  • Make job_wrapper.remote_command_line configurable per destination
  • Integrate ToolApp with other app variants
  • Enable serializing non-xml tools
  • Dataset conversions are probably a hard requirement in practice … can’t have tools randomly fail on a remote_command_line destination based on whether conversion is necessary or not. But should be possible to do, we may just need to serialise ImplicitlyConvertedDatasetAssociation together with the input dataset. Jobs shouldn’t be dispatched before ImplicitlyConvertedDatasetAssociation is ready, so that should be doable. Turns out these don't actually run in the ToolEvaluator -> Remove unused dataset conversion code #12991

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

tdtm = ToolDataTableManager.from_dict(json.load(data_tables_json))
app = ToolApp(sa_session=import_store.sa_session, tool_app_config=tool_app_config, datatypes_registry=datatypes_registry, object_store=object_store, tool_data_table_manager=tdtm)
# TODO: could try to serialize just a minimal tool variant instead of the whole thing ?
tool_source = get_tool_source(os.path.join(WORKING_DIRECTORY, 'tool.xml'))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be a symlink ? Or load from original location ?

Problems:
Do we want to build a full app ? Probably not.
Do we want to supply galaxy config file ? Probably not.
Dataset conversions ... would need to move outside of ToolEvaluator
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to take a look at this TODO today - I'll report back if I make any progress on determining if I think it is something I could plausibly help with.

Copy link
Member Author

@mvdbeek mvdbeek Sep 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some more work locally, just didn't get around to pushing and updating this here.
I haven't solved / thought about the dataset conversions yet, but I think most of the rest I have figured out.
I'm positive I can wrap this up once we're done with the release process / the obvious bugs in the new release are worked out.

I think in the overall picture it would be good to work out how a deferred dataset would look like in the database and how the UI could look like, any chance you could take a look at this ?

@mvdbeek mvdbeek force-pushed the wip_remote_command_line branch 2 times, most recently from f9d8b27 to 1284635 Compare November 15, 2021 20:19
self.collections_attrs.append(collection)
self.included_collections.append(collection)
for dataset in collection.collection.dataset_instances:
self.add_dataset(dataset, include_files=include_files)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a variant of this called export_collection in the deferred data branch.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, I copied that into 2c64e13

@mvdbeek mvdbeek force-pushed the wip_remote_command_line branch 11 times, most recently from 2a12fbf to 1fa9da1 Compare November 23, 2021 11:06
@mvdbeek mvdbeek changed the title Wip remote command line Remote / Late evaluation of tools Nov 23, 2021
@mvdbeek mvdbeek mentioned this pull request Nov 24, 2021
5 tasks
@mvdbeek mvdbeek force-pushed the wip_remote_command_line branch 4 times, most recently from 2408a6b to 1b7fcd9 Compare November 25, 2021 15:21
@mvdbeek
Copy link
Member Author

mvdbeek commented Nov 25, 2021

  • Dataset conversions are probably a hard requirement in practice … can’t have tools randomly fail on a remote_command_line destination based on whether conversion is necessary or not. But should be possible to do, we may just need to serialise ImplicitlyConvertedDatasetAssociation together with the input dataset. Jobs shouldn’t be dispatched before ImplicitlyConvertedDatasetAssociation is ready, so that should be doable.

lol, that just worked and I have no idea how 😆

@mvdbeek
Copy link
Member Author

mvdbeek commented Nov 25, 2021

lol, that just worked and I have no idea how 😆

We're still doing a first pass of tool_evaluator.build in the handler process, I guess we should try to eliminate that to avoid doubling the work ...

@mvdbeek mvdbeek force-pushed the wip_remote_command_line branch 3 times, most recently from ea3a8ec to e709a73 Compare November 26, 2021 14:51
that should build command line remotely.

We can't quite get away with not running the ToolEvaluator class
completely if a tool has environment variables or interactive tool
entrypoints that have to be templated.
We'd have the add `JobExportHistoryArchive` to the supported job outputs
in the model export store or move the `JobExportHistoryArchive` creation
from `execute()` ToolAction to the handler and skip creating
`JobExportHistoryArchive` altogether, but we want to perform the
job export in the Celery workers anyway, so I think it's not worth the
effort.
Note that exporting to file sources doesn't use
`JobExportHistoryArchive`, so for those we can still create the command
line remotely (and e.g materialize deferred datasets in a history).
The right fix might be to provide a session on app to avoid both of
these hacks ?
@mvdbeek
Copy link
Member Author

mvdbeek commented Nov 30, 2021

Hmm, those selenium tests are not failing on my fork (https://github.com/mvdbeek/galaxy/actions/workflows/selenium.yaml), seems like something that broke recently on dev ?

@mvdbeek mvdbeek marked this pull request as ready for review November 30, 2021 17:19
@github-actions github-actions bot added this to the 22.01 milestone Nov 30, 2021
@mvdbeek
Copy link
Member Author

mvdbeek commented Nov 30, 2021

@jmchilton I think this is ... well, not exactly useful yet, but maybe we can build on this and materialize deferred datasets next ?

Copy link
Member

@jmchilton jmchilton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would have took me twice as long and would have been half as a good, really amazing stuff in here! We will be building cool shit based on this for years - really impressive work.

@jmchilton
Copy link
Member

Admittedly took a very long road to get there but... feade2b

@mvdbeek mvdbeek added highlight/dev Included in admin/dev release notes and removed highlight/dev Included in admin/dev release notes labels Aug 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants