Remote / Late evaluation of tools #12459
tdtm = ToolDataTableManager.from_dict(json.load(data_tables_json))
app = ToolApp(sa_session=import_store.sa_session, tool_app_config=tool_app_config, datatypes_registry=datatypes_registry, object_store=object_store, tool_data_table_manager=tdtm)
# TODO: could try to serialize just a minimal tool variant instead of the whole thing?
tool_source = get_tool_source(os.path.join(WORKING_DIRECTORY, 'tool.xml'))
Could this be a symlink? Or should we load it from the original location?
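A minimal sketch of the two options raised here, assuming a hypothetical helper `stage_tool_source` (not part of Galaxy): try to symlink the original tool file into the working directory as `tool.xml`, and fall back to loading from the original location when the filesystem refuses symlinks.

```python
import os


def stage_tool_source(working_directory: str, original_path: str) -> str:
    """Hypothetical helper: make tool.xml available in the working directory.

    Prefers a symlink to the original tool file; falls back to the original
    location when symlinks are not supported.
    """
    staged = os.path.join(working_directory, "tool.xml")
    if os.path.lexists(staged):
        # Already staged (e.g. on a job retry); reuse it.
        return staged
    try:
        os.symlink(os.path.abspath(original_path), staged)
    except (OSError, NotImplementedError):
        # Some filesystems (or Windows without privileges) refuse symlinks,
        # so load directly from the original location instead.
        return original_path
    return staged
```

The returned path is what would be handed to `get_tool_source`; the symlink avoids copying the tool while keeping the `WORKING_DIRECTORY/tool.xml` convention from the snippet.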
lib/galaxy/tools/evaluation.py
Problems:
Do we want to build a full app? Probably not.
Do we want to supply a Galaxy config file? Probably not.
Dataset conversions ... would need to move outside of ToolEvaluator
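One way to avoid building a full app is a small container holding only the collaborators passed to `ToolApp` in the snippet above. The class `MinimalEvaluationApp` below is a hypothetical illustration, not Galaxy's `ToolApp`; its fields simply mirror those keyword arguments.

```python
from dataclasses import dataclass, fields
from typing import Any, List


@dataclass
class MinimalEvaluationApp:
    """Hypothetical stand-in for a full Galaxy app during remote evaluation.

    Only the objects passed to ToolApp above are kept; the web stack, job
    handlers and full Galaxy config are deliberately left out.
    """
    sa_session: Any
    tool_app_config: Any
    datatypes_registry: Any
    object_store: Any
    tool_data_table_manager: Any

    def missing(self) -> List[str]:
        """Return the names of collaborators that were not supplied."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]
```

Checking `missing()` up front makes a misconfigured remote environment fail fast instead of deep inside tool evaluation.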
I'm going to take a look at this TODO today; I'll report back if I make any progress on working out whether it's something I could plausibly help with.
I have some more work locally; I just didn't get around to pushing it and updating this PR.
I haven't solved or thought about the dataset conversions yet, but I think I have figured out most of the rest.
I'm positive I can wrap this up once we're done with the release process and the obvious bugs in the new release are worked out.
Looking at the bigger picture, it would be good to work out what a deferred dataset would look like in the database and what the UI could look like. Any chance you could take a look at this?
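Purely as a starting point for that discussion, here is a sketch of one possible database shape, using an in-memory SQLite table. The schema, the `deferred` and `ok` state names and the helper functions are assumptions for illustration, not Galaxy's model: a deferred dataset records where its data can be fetched from and only gains a local file path once materialized.

```python
import sqlite3

# Hypothetical schema, not Galaxy's actual dataset table.
SCHEMA = """
CREATE TABLE dataset (
    id INTEGER PRIMARY KEY,
    state TEXT NOT NULL,   -- 'deferred' until the data is fetched
    source_uri TEXT,       -- where a deferred dataset can be fetched from
    file_path TEXT         -- set once the data is materialized locally
)
"""


def add_deferred(conn: sqlite3.Connection, source_uri: str) -> int:
    cur = conn.execute(
        "INSERT INTO dataset (state, source_uri) VALUES ('deferred', ?)", (source_uri,)
    )
    return cur.lastrowid


def mark_materialized(conn: sqlite3.Connection, dataset_id: int, file_path: str) -> bool:
    # Only a deferred row may transition; returns True if a row was updated.
    cur = conn.execute(
        "UPDATE dataset SET state = 'ok', file_path = ? WHERE id = ? AND state = 'deferred'",
        (file_path, dataset_id),
    )
    return cur.rowcount == 1
```

Guarding the update on the current state keeps two workers from materializing the same dataset twice.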
lib/galaxy/model/store/__init__.py
self.collections_attrs.append(collection)
self.included_collections.append(collection)
for dataset in collection.collection.dataset_instances:
    self.add_dataset(dataset, include_files=include_files)
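A self-contained sketch of what the loop above does: record the collection, then add each of its datasets. All classes here are hypothetical stand-ins (not Galaxy's model or export store, and not the `export_collection` variant from the deferred data branch). The sketch also assumes datasets shared between collections should be added only once.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Dataset:
    id: int


@dataclass
class DatasetCollection:
    dataset_instances: List[Dataset]


@dataclass
class CollectionInstance:
    # Mirrors the collection.collection indirection in the snippet above.
    id: int
    collection: DatasetCollection


@dataclass
class ExportStore:
    collections_attrs: List[CollectionInstance] = field(default_factory=list)
    included_collections: List[CollectionInstance] = field(default_factory=list)
    datasets: List[Dataset] = field(default_factory=list)
    included_files: List[int] = field(default_factory=list)

    def add_dataset(self, dataset: Dataset, include_files: bool = True) -> None:
        # Datasets can be shared between collections; add each one only once.
        if any(d.id == dataset.id for d in self.datasets):
            return
        self.datasets.append(dataset)
        if include_files:
            # A real store would copy the file here; this sketch records the id.
            self.included_files.append(dataset.id)

    def add_collection(self, collection: CollectionInstance, include_files: bool = True) -> None:
        self.collections_attrs.append(collection)
        self.included_collections.append(collection)
        for dataset in collection.collection.dataset_instances:
            self.add_dataset(dataset, include_files=include_files)
```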
I have a variant of this called export_collection in the deferred data branch.
Awesome, I copied that into 2c64e13
lol, that just worked and I have no idea how 😆
We're still doing a first pass of
that should build the command line remotely. We can't entirely avoid running the ToolEvaluator class if a tool has environment variables or interactive tool entry points that need to be templated.
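The condition described here can be expressed as a small predicate. `ToolSummary` and `needs_full_evaluation` are hypothetical names used only for illustration: building just the command line suffices unless environment variables or interactive tool entry points also need templating.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToolSummary:
    """Hypothetical summary of the parts of a tool that need templating."""
    command: str
    environment_variables: Dict[str, str] = field(default_factory=dict)
    interactive_entry_points: List[str] = field(default_factory=list)


def needs_full_evaluation(tool: ToolSummary) -> bool:
    # The command line alone is not enough when environment variables or
    # interactive tool entry points also have to be templated.
    return bool(tool.environment_variables or tool.interactive_entry_points)
```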
The externalized commands may, for instance, set up conda dependency resolution.
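As a rough illustration of ordering, the externalized setup commands have to run before the tool's command line in the job script. The helper `build_job_script` is hypothetical and not Galaxy's job script template; the activation line in the usage check is only an example of a dependency-setup command.

```python
import shlex
from typing import List


def build_job_script(dependency_commands: List[str], command_line: str, working_directory: str) -> str:
    """Hypothetical job-script assembly: externalized setup commands (e.g.
    conda environment activation) run before the tool command."""
    lines = ["#!/bin/sh", "set -e", f"cd {shlex.quote(working_directory)}"]
    lines.extend(dependency_commands)
    lines.append(command_line)
    return "\n".join(lines) + "\n"
```

`set -e` makes the script stop if dependency setup fails, rather than running the tool in an incomplete environment.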
We'd have to add `JobExportHistoryArchive` to the supported job outputs in the model export store, or move the `JobExportHistoryArchive` creation from the `execute()` ToolAction to the handler and skip creating `JobExportHistoryArchive` altogether. Since we want to perform the job export in the Celery workers anyway, I don't think it's worth the effort. Note that exporting to file sources doesn't use `JobExportHistoryArchive`, so for those we can still create the command line remotely (and, e.g., materialize deferred datasets in a history).
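The resulting rule can be sketched as follows, with hypothetical names (`HistoryExportRequest`, `can_build_command_remotely`) that are not Galaxy APIs: exports that go through `JobExportHistoryArchive` keep building their command line locally, while exports to file sources can build it remotely.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HistoryExportRequest:
    """Hypothetical description of a history export job."""
    # Destination in a configured file source; None means a downloadable archive.
    file_source_uri: Optional[str] = None

    @property
    def uses_job_export_history_archive(self) -> bool:
        # Exports to file sources don't create a JobExportHistoryArchive.
        return self.file_source_uri is None


def can_build_command_remotely(request: HistoryExportRequest) -> bool:
    # JobExportHistoryArchive is not among the job outputs the model export
    # store supports, so those jobs keep building their command line locally.
    return not request.uses_job_export_history_archive
```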
The right fix might be to provide a session on the app to avoid both of these hacks?
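A minimal sketch of what "a session on the app" could look like, assuming the app is given a session factory (for example a SQLAlchemy `sessionmaker()` instance, which is callable and returns a session). `AppWithSession` is hypothetical; the code evaluating the tool would then read `app.sa_session` instead of receiving a session through a separate path.

```python
from typing import Any, Callable, Optional


class AppWithSession:
    """Hypothetical app that owns its database session."""

    def __init__(self, session_factory: Callable[[], Any]):
        self._session_factory = session_factory
        self._session: Optional[Any] = None

    @property
    def sa_session(self) -> Any:
        # Create the session on first use and reuse it afterwards.
        if self._session is None:
            self._session = self._session_factory()
        return self._session
```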
Hmm, those Selenium tests are not failing on my fork (https://github.com/mvdbeek/galaxy/actions/workflows/selenium.yaml); it seems like something that broke recently on dev?
@jmchilton I think this is... well, not exactly useful yet, but maybe we can build on this and materialize deferred datasets next?
This would have taken me twice as long and been half as good. Really amazing stuff in here! We will be building cool shit based on this for years; really impressive work.
Admittedly, it took a very long road to get there, but... feade2b
The idea is that if we can evaluate tools close to the data, we can fetch deferred data there and generate the metadata that the evaluation depends on. This work should be flexible enough to run within the job script (included in this PR), as a Celery task on a remote node (maybe with a Celery-queue destination option?), or as a side-pod (separate PR(s)).
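The flow described here can be sketched as "materialize deferred inputs where evaluation runs, then evaluate against the local copies". Everything below is hypothetical: the enum only names the three options from the description, no actual Celery dispatch is shown, metadata generation is omitted, and the side-pod path raises because it is planned for separate PR(s).

```python
from enum import Enum
from typing import Callable, Dict, List


class EvaluationLocation(Enum):
    JOB_SCRIPT = "job_script"    # inside the job script (this PR)
    CELERY_TASK = "celery_task"  # as a task on a remote node
    SIDE_POD = "side_pod"        # planned for separate PR(s)


def evaluate_close_to_data(
    location: EvaluationLocation,
    deferred_uris: List[str],
    fetch: Callable[[str], str],
    evaluate: Callable[[Dict[str, str]], str],
) -> str:
    """Hypothetical flow: fetch deferred inputs locally, then evaluate."""
    if location is EvaluationLocation.SIDE_POD:
        raise NotImplementedError("side-pod evaluation is planned for separate PR(s)")
    # Map each deferred URI to the local path it was fetched to.
    local_paths = {uri: fetch(uri) for uri in deferred_uris}
    return evaluate(local_paths)
```

Passing `fetch` and `evaluate` in as callables keeps the same flow usable whether it runs in the job script or in a worker task.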