Remote / Late evaluation of tools #12459

mvdbeek · 2021-09-14T17:01:15Z

The idea is that if we can evaluate tools close to the data, we can fetch deferred data there and generate the metadata that the evaluation depends on. This work should be flexible enough to run within the job script (included in this PR), as a (Celery) task on a remote node (maybe with a celery-queue destination option ?), or as a side-pod (to follow in separate PRs).

Tests:

Unit test for serializing / deserializing the JobIO class
- Make sure ConfiguredFileSources work
Make sure xml/yml/cwl loading of serialized tool works
Integration tests that use tool data tables
Integration tests that use file sources
Integration test for implicit conversion
Integration test that exports history (Or maybe not if we merge celery task for history exports ??)
Unit test for testing FileTracebackException
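
As a rough illustration of the first test item, a serialize / deserialize round-trip check could look like the sketch below. This is not Galaxy's actual JobIO class: `JobIOSketch`, its fields, and `round_trips` are hypothetical names chosen for the example, and JSON is assumed as the wire format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class JobIOSketch:
    """Hypothetical stand-in for the state a remote evaluator would need."""

    working_directory: str
    tool_source_class: str  # illustrative value such as "XmlToolSource"
    file_sources: Dict[str, str] = field(default_factory=dict)
    output_paths: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        # sort_keys keeps the serialized form stable across runs.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, payload: str) -> "JobIOSketch":
        return cls(**json.loads(payload))


def round_trips(job_io: JobIOSketch) -> bool:
    """Return True if serializing and then deserializing yields an equal object."""
    return JobIOSketch.from_json(job_io.to_json()) == job_io
```

A unit test would then build an instance with non-default file sources and assert that `round_trips(...)` holds, which also covers the nested file-source configuration.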

Development:

Do we need to export using export_key ?
Make job_wrapper.remote_command_line configurable per destination
Integrate ToolApp with other app variants
Enable serializing non-xml tools
Dataset conversions are probably a hard requirement in practice … can’t have tools randomly fail on a remote_command_line destination based on whether conversion is necessary or not. But should be possible to do, we may just need to serialise ImplicitlyConvertedDatasetAssociation together with the input dataset. Jobs shouldn’t be dispatched before ImplicitlyConvertedDatasetAssociation is ready, so that should be doable. Update: it turns out these don't actually run in the ToolEvaluator, see "Remove unused dataset conversion code" #12991.
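
For the "configurable per destination" item, one possible shape is a destination-level parameter that overrides an app-level default. The sketch below assumes a setting named `tool_evaluation_strategy` with the values `local` and `remote`; the helper name and the validation are hypothetical and not taken from this PR's code.

```python
from typing import Mapping, Optional

# Assumed value names for the strategy setting.
VALID_STRATEGIES = ("local", "remote")


def resolve_tool_evaluation_strategy(
    app_default: str,
    destination_params: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick the destination's strategy if it sets one, else the app-wide default."""
    strategy = (destination_params or {}).get("tool_evaluation_strategy", app_default)
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"Unknown tool evaluation strategy: {strategy!r}")
    return strategy
```

With this shape, a deployment could keep `local` as the default and opt individual destinations into remote evaluation one at a time.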

How to test the changes?

(Select all options that apply)

I've included appropriate automated tests.
This is a refactoring of components with existing test coverage.
Instructions for manual testing are as follows:
1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

I agree to license these contributions under Galaxy's current license.
I agree to allow the Galaxy committers to license these and all my past contributions to the core galaxy codebase under the MIT license. If this condition is an issue, uncheck and just let us know why with an e-mail to galaxy-committers@lists.galaxyproject.org.

mvdbeek · 2021-09-14T17:21:22Z

lib/galaxy/metadata/remote_tool_eval.py

+        tdtm = ToolDataTableManager.from_dict(json.load(data_tables_json))
+    app = ToolApp(sa_session=import_store.sa_session, tool_app_config=tool_app_config, datatypes_registry=datatypes_registry, object_store=object_store, tool_data_table_manager=tdtm)
+    # TODO: could try to serialize just a minimal tool variant instead of the whole thing ?
+    tool_source = get_tool_source(os.path.join(WORKING_DIRECTORY, 'tool.xml'))


Could be a symlink ? Or load from original location ?
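
A minimal sketch of the symlink option, assuming the node that evaluates the tool can see the original path. `stage_tool_source` is a hypothetical helper, not part of this PR; it falls back to a copy when a symlink cannot be created.

```python
import os
import shutil


def stage_tool_source(original_path: str, working_directory: str, name: str = "tool.xml") -> str:
    """Make the tool source available in the working directory and return its path."""
    target = os.path.join(working_directory, name)
    if os.path.lexists(target):
        # Replace a stale file or dangling link from an earlier attempt.
        os.remove(target)
    try:
        os.symlink(os.path.abspath(original_path), target)
    except (OSError, NotImplementedError):
        # Symlinks may be unavailable on some platforms or filesystems; copy instead.
        shutil.copyfile(original_path, target)
    return target
```

A symlink only helps when the filesystem is shared with the remote node; otherwise the file has to be copied or serialized along with the job.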

jmchilton · 2021-09-22T13:39:45Z

lib/galaxy/tools/evaluation.py

+Problems:
+Do we want to build a full app ? Probably not.
+Do we want to supply galaxy config file ? Probably not.
+Dataset conversions ... would need to move outside of ToolEvaluator


I'm going to take a look at this TODO today - I'll report back if I make any progress on determining if I think it is something I could plausibly help with.

mvdbeek · 2021-09-22

I have some more work locally, I just didn't get around to pushing and updating this here.
I haven't solved / thought about the dataset conversions yet, but I think most of the rest I have figured out.
I'm positive I can wrap this up once we're done with the release process / the obvious bugs in the new release are worked out.

I think in the overall picture it would be good to work out what a deferred dataset would look like in the database and what the UI could look like. Any chance you could take a look at this ?

jmchilton · 2021-11-16T15:45:30Z

lib/galaxy/model/store/__init__.py

        self.collections_attrs.append(collection)
        self.included_collections.append(collection)
+        for dataset in collection.collection.dataset_instances:
+            self.add_dataset(dataset, include_files=include_files)


I have a variant of this called export_collection in the deferred data branch.

mvdbeek · 2021-11-17

Awesome, I copied that into 2c64e13
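
To make the behavior in the hunk above concrete: exporting a collection also registers each of its member datasets with the store. The sketch below is a simplified illustration with hypothetical classes (`CollectionSketch`, `ExportStoreSketch`); the real store works on model objects and walks `collection.collection.dataset_instances`, which is flattened to a single attribute here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CollectionSketch:
    """Hypothetical stand-in for a dataset collection."""

    name: str
    dataset_instances: List[str] = field(default_factory=list)


@dataclass
class ExportStoreSketch:
    included_collections: List[CollectionSketch] = field(default_factory=list)
    # Maps a dataset identifier to whether its files are exported too.
    included_datasets: Dict[str, bool] = field(default_factory=dict)

    def add_dataset(self, dataset: str, include_files: bool = False) -> None:
        # Never downgrade a dataset that was already added with its files.
        already = self.included_datasets.get(dataset, False)
        self.included_datasets[dataset] = already or include_files

    def export_collection(self, collection: CollectionSketch, include_files: bool = False) -> None:
        self.included_collections.append(collection)
        for dataset in collection.dataset_instances:
            self.add_dataset(dataset, include_files=include_files)
```

Keying datasets by identifier means a dataset shared by two collections is only exported once.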

mvdbeek · 2021-11-25T15:33:12Z

> Dataset conversions are probably a hard requirement in practice … can’t have tools randomly fail on a remote_command_line destination based on whether conversion is necessary or not. But should be possible to do, we may just need to serialise ImplicitlyConvertedDatasetAssociation together with the input dataset. Jobs shouldn’t be dispatched before ImplicitlyConvertedDatasetAssociation is ready, so that should be doable.

lol, that just worked and I have no idea how 😆

mvdbeek · 2021-11-25T15:39:08Z

> lol, that just worked and I have no idea how 😆

We're still doing a first pass of tool_evaluator.build in the handler process, I guess we should try to eliminate that to avoid doubling the work ...

We'd have to add `JobExportHistoryArchive` to the supported job outputs in the model export store, or move the `JobExportHistoryArchive` creation from the ToolAction's `execute()` to the handler and skip creating `JobExportHistoryArchive` altogether. But we want to perform the job export in the Celery workers anyway, so I think it's not worth the effort. Note that exporting to file sources doesn't use `JobExportHistoryArchive`, so for those we can still create the command line remotely (and e.g. materialize deferred datasets in a history).

mvdbeek · 2021-11-30T17:19:38Z

Hmm, those selenium tests are not failing on my fork (https://github.com/mvdbeek/galaxy/actions/workflows/selenium.yaml), seems like something that broke recently on dev ?

mvdbeek · 2021-11-30T17:20:21Z

@jmchilton I think this is ... well, not exactly useful yet, but maybe we can build on this and materialize deferred datasets next ?

jmchilton

This would have taken me twice as long and would have been half as good, really amazing stuff in here! We will be building cool shit based on this for years - really impressive work.

jmchilton · 2022-04-01T17:41:49Z

Admittedly took a very long road to get there but... feade2b

github-actions bot added area/jobs area/testing area/tool-framework labels Sep 14, 2021

jmchilton reviewed Sep 22, 2021

mvdbeek force-pushed the wip_remote_command_line branch 2 times, most recently from f9d8b27 to 1284635 Compare November 15, 2021 20:19

jmchilton reviewed Nov 16, 2021

mvdbeek force-pushed the wip_remote_command_line branch 11 times, most recently from 2a12fbf to 1fa9da1 Compare November 23, 2021 11:06

mvdbeek changed the title ~~Wip remote command line~~ Remote / Late evaluation of tools Nov 23, 2021

mvdbeek mentioned this pull request Nov 24, 2021

Add type hints to JobHandlerQueue #12976

Merged

mvdbeek force-pushed the wip_remote_command_line branch 4 times, most recently from 2408a6b to 1b7fcd9 Compare November 25, 2021 15:21

mvdbeek force-pushed the wip_remote_command_line branch 3 times, most recently from ea3a8ec to e709a73 Compare November 26, 2021 14:51

mvdbeek added 2 commits November 29, 2021 11:22

Move interactivetool manager setup out of ToolEvaluator

c8fee0e

Run PartialToolEvaluator when setting up job

a831488

that should build command line remotely. We can't quite get away with not running the ToolEvaluator class completely if a tool has environment variables or interactive tool entrypoints that have to be templated.

mvdbeek force-pushed the wip_remote_command_line branch from cab45eb to 753f454 Compare November 29, 2021 11:32

Load tool directly from JobIO class

80659ec

mvdbeek force-pushed the wip_remote_command_line branch from 753f454 to 80659ec Compare November 29, 2021 11:44

mvdbeek added 3 commits November 29, 2021 17:09

Enable serializing xml/cwl/yaml tool sources via JobIO

266df6b

Write externalized commands, append during remote tool evaluation

184c66c

The externalized commands may for instance set up conda dependency resolution.

Test tool deserialization

c2e7acd

mvdbeek force-pushed the wip_remote_command_line branch from a1da2cf to c2e7acd Compare November 29, 2021 19:16

mvdbeek added 5 commits November 30, 2021 10:58

Make tool_evaluation_strategy configurable on app and destination level

1a08eb5

Replace model hack with Bunch hack

543f6e9

The right fix might be to provide a session on app to avoid both of these hacks ?

Add MinimalToolApp container

617e664

Cleanup path handling and add unit test for handling traceback.txt

2676421

mvdbeek added kind/enhancement kind/feature labels Nov 30, 2021

mvdbeek marked this pull request as ready for review November 30, 2021 17:19

github-actions bot added this to the 22.01 milestone Nov 30, 2021

mvdbeek requested a review from jmchilton November 30, 2021 17:21

jmchilton approved these changes Nov 30, 2021

jmchilton merged commit 4ce8c35 into galaxyproject:dev Nov 30, 2021

mvdbeek mentioned this pull request Jan 10, 2022

Test Galaxy https://github.com/galaxyproject/galaxy/pull/13039 galaxyproject/tools-iuc#4318

Closed

nsoranzo deleted the wip_remote_command_line branch January 10, 2022 14:24

mvdbeek mentioned this pull request Jan 10, 2022

Don't template version command using cheetah #13142

Merged

mvdbeek mentioned this pull request Mar 16, 2022

[22.01] Write to exit code file before running metadata script #13557

Merged

mvdbeek added and removed the highlight/dev label (Included in admin/dev release notes) Aug 19, 2022
