Collection Operations (Limited) #2434
Conversation
Tests were passing, but I rebased to fix a conflict in bundled artifacts.
This tool takes a list dataset collection and filters out the failed
I am unclear on what input this tool wants.
This special class of tools leverages the infrastructure for tool inputs, tool state tracking, the tool module for workflows, the tool API, etc. without actually producing command-line jobs. Instead, these tools are handed the input model objects and are expected to produce output model objects directly. This provides an opportunity to copy HDAs without copying the underlying datasets.

The first driving use case for these tools is also included - namely tools that allow zipping and unzipping paired collections. These tools can be mapped over lists (e.g. list:paired to (list, list) or the inverse) using much of the existing infrastructure for tools. Test cases are included that validate these work with mapping operations and in workflows.

The most obvious advantage of these versus traditional tools that do the same thing is that the data isn't copied on disk - new HDAs are created directly from the source datasets.

Testing: This PR includes various API test cases for this functionality; they can be run with the following commands:

```
./run_tests.sh -api test/api/test_tools.py:ToolsTestCase.test_unzip_collection
./run_tests.sh -api test/api/test_tools.py:ToolsTestCase.test_zip_inputs
./run_tests.sh -api test/api/test_tools.py:ToolsTestCase.test_zip_list_inputs
./run_tests.sh -api test/api/test_workflows.py:WorkflowsApiTestCase.test_workflow_run_zip_collections
```
This differs from a traditional tool in that its inputs don't need to be in an 'ok' state, and instead of creating new datasets and duplicating data on disk, new HDAs are created from the existing datasets.
Testing:

```
./run_tests.sh -framework -id __FLATTEN__
```
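To illustrate the shape transformation the zip/unzip operations perform, here is a sketch in plain Python. These functions and the tuple/list stand-ins are hypothetical - Galaxy's real operations act on HDA and collection model objects, not raw lists:

```python
# Hypothetical sketch: how "unzip" and "zip" reshape collections.
# Tuples stand in for "paired" collections, lists for "list" collections.

def unzip_paired(paired):
    """Split a paired collection into its forward and reverse elements."""
    forward, reverse = paired
    return forward, reverse

def zip_pair(forward, reverse):
    """Combine two datasets into a paired collection."""
    return (forward, reverse)

def unzip_list_of_pairs(list_paired):
    """Map unzip over a list:paired collection -> (list, list)."""
    forwards = [f for f, _ in list_paired]
    reverses = [r for _, r in list_paired]
    return forwards, reverses
```

Mapping unzip over a `list:paired` yields two parallel lists, which is the `list:paired -> (list, list)` transformation described above; `zip_pair` is its inverse for a single pair.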
@martenson Thanks for the detailed review.
This still needs more polish - I'll admit that - but I'd like to have something in by the GCC, and something at least a little usable for large collections that encounter errors.
Should address some confusion caused by the tool and reference in this review (galaxyproject#2434 (comment)).
@martenson - my response to your three points.
@jmchilton I did try to turn it off and back on again, in fact.
And yes, the filtering on paired is definitely not a blocker. I will dig around the invisible collection a bit more. Can you still not reproduce it?
Okay - I can get an invisible collection if I map something that doesn't result in a collection type that the GUI knows how to handle. I can get other tools that produce collections to exhibit the same behavior - if you map a tool that produces a list over a list - a totally reasonable and useful action - you'd get a `list:list` the GUI doesn't know how to display.
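The nesting behavior discussed here can be sketched in plain Python (hypothetical names; real Galaxy mapping operates on collection model objects, not lists of strings):

```python
# Hypothetical sketch: mapping a list-producing tool over a list
# collection produces a nested list:list collection.

def map_over_list(tool, input_list):
    """Apply a tool to each element of a list collection."""
    return [tool(element) for element in input_list]

def splitter(dataset):
    """A stand-in tool that produces a list from a single dataset."""
    return [dataset + "_a", dataset + "_b"]
```

Mapping `splitter` over a two-element list yields a two-element list whose elements are themselves lists - the `list:list` shape that, per the comment above, the GUI could not yet display.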
nice work @jmchilton - thank you for pushing this forward! |
@jmchilton Is there any chance that this can be backported to 16.01 release? |
@Takadonet we only backport bugfixes; a backport across two releases would have to be a security fix or something similar...
Sorry, let me rephrase: do you THINK it will work on 16.01? I have users who have been begging for this feature for many months.
Overview
This PR introduces Tool-derived, framework-level plumbing for dealing with collections at the model level instead of the file level, allowing operations that generate new HDAs and collections without duplicating Dataset objects. Together these operations vastly expand the expressiveness of Galaxy workflows.
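The key claim - new HDAs without duplicating Dataset objects - can be sketched as follows. These classes are hypothetical stand-ins, not Galaxy's actual model code; they only illustrate the HDA-vs-Dataset distinction:

```python
# Hypothetical sketch of "copy the HDA, not the dataset".
# A Dataset owns the file on disk; an HDA is a history-level
# reference to it, so many HDAs may share one Dataset.

class Dataset:
    def __init__(self, path):
        self.path = path  # one file on disk

class HDA:
    def __init__(self, name, dataset):
        self.name = name
        self.dataset = dataset  # shared, not copied

def copy_hda(hda, new_name):
    """Create a new history item backed by the same Dataset - no disk I/O."""
    return HDA(new_name, hda.dataset)
```

Because the copy shares the original's Dataset, a collection operation built this way touches only database-level model objects and never rewrites files on disk.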
What's Different
This is progress toward #1644 and contains only the parts of the ill-fated #1313 that do not touch JavaScript or the idea of operations. Therefore this PR doesn't provide the ability to filter out datasets in a collection based on metadata, or to group datasets into lists of lists based on expressions. Losing the former means it is impossible to use different tools in a workflow based on, for instance, sample metadata (switching algorithms or various flags based on read statistics); losing the latter vastly decreases the power of lists of lists and makes it much harder to imagine workflows that use, for instance, statistical clustering to affect workflow structure.
This variant of these tools does, however, include better dataset labeling, actual help text, and a mechanism to ensure they are in the tool panel by default.
The Collection Operations:
Issues and Notes
Unlike more traditional tools, these do interactive checking of inputs, so they cannot be "queued up" ahead of time during interactive use. They are workflow-aware, though, so there is no problem using them in workflows.
Testing
Each operation has tests (either API tests or simple tool tests that will be included with the `-framework` tests). These can all be executed using: