Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[18.01] Allow map-over when discovering dataset collections #5413

Conversation

Projects
None yet
4 participants
@mvdbeek
Copy link
Member

mvdbeek commented Jan 30, 2018

This would affect for example the mapping over of fastq-dump (and all other
tools in the sra-toolkit).

If one had attempted this previously the discovery phase would fail with:

galaxy.tools.parameters.output_collect ERROR 2018-01-30 08:56:46,369 Problem gathering output collection.
Traceback (most recent call last):
  File "/bioinfo/guests/mvandenb/galaxy/lib/galaxy/tools/parameters/output_collect.py", line 169, in collect_dynamic_collections
    collection
  File "/bioinfo/guests/mvandenb/galaxy/lib/galaxy/managers/collections.py", line 216, in collection_builder_for
    return builder.BoundCollectionBuilder(dataset_collection, collection_type_description)
  File "/bioinfo/guests/mvandenb/galaxy/lib/galaxy/dataset_collections/builder.py", line 84, in __init__
    raise Exception("Cannot reset elements of an already populated dataset collection.")
Exception: Cannot reset elements of an already populated dataset collection.

Instead we force the collection state to be new when we are discovering output collection datasets,
which seems reasonable to me. This includes an API testcase that would have failed previously.

@@ -180,16 +180,12 @@ def tool_output_to_structure(get_sliced_input_collection_type, tool_output, coll
collection_type_descriptions = collections_manager.collection_type_descriptions

This comment has been minimized.

Copy link
@nsoranzo

nsoranzo Jan 30, 2018

Member

collection_type_descriptions is now unused and makes flake8 complain.

This comment has been minimized.

Copy link
@mvdbeek

mvdbeek Jan 30, 2018

Author Member

I've rebased the PR with the fix.

Allow map-over when discovering dataset collections
This would affect for example the mapping over of fastq-dump (and all other
tools in the sra-toolkit).

If one had attempted this previously the discovery phase would fail with:
```
galaxy.tools.parameters.output_collect ERROR 2018-01-30 08:56:46,369 Problem gathering output collection.
Traceback (most recent call last):
  File "/bioinfo/guests/mvandenb/galaxy/lib/galaxy/tools/parameters/output_collect.py", line 169, in collect_dynamic_collections
    collection
  File "/bioinfo/guests/mvandenb/galaxy/lib/galaxy/managers/collections.py", line 216, in collection_builder_for
    return builder.BoundCollectionBuilder(dataset_collection, collection_type_description)
  File "/bioinfo/guests/mvandenb/galaxy/lib/galaxy/dataset_collections/builder.py", line 84, in __init__
    raise Exception("Cannot reset elements of an already populated dataset collection.")
Exception: Cannot reset elements of an already populated dataset collection.
```

Instead we force the collection state to be new when we are discovering output collection datasets,
which seems reasonable to me. This includes an API testcase that would have failed previously.

@mvdbeek mvdbeek force-pushed the mvdbeek:fix_dynamic_collection_output_18_01 branch from 923ca99 to eb506b2 Jan 30, 2018

@jmchilton

This comment has been minimized.

Copy link
Member

jmchilton commented Jan 30, 2018

I was totally going to get to this before the 18.01 release - thanks so much for the fix. This should fix #5146 right?

Want to revert this change jmchilton@4925660 and let me know if the extra test assertion is fixed by this PR? I think it should be.

@jmchilton

This comment has been minimized.

Copy link
Member

jmchilton commented Jan 30, 2018

Awesome, awesome, awesome. Sorry for breaking this and thanks so much for the very clean fix - I don't know where I was going with that non-sense you deleted - this is beautiful.

@jmchilton jmchilton merged commit 54e0756 into galaxyproject:release_18.01 Jan 30, 2018

6 checks passed

api test Build finished. 344 tests run, 4 skipped, 0 failed.
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
framework test Build finished. 344 tests run, 4 skipped, 0 failed.
Details
integration test Build finished. 79 tests run, 4 skipped, 0 failed.
Details
selenium test Build finished. 118 tests run, 2 skipped, 0 failed.
Details
toolshed test Build finished. 577 tests run, 0 skipped, 0 failed.
Details
@galaxybot

This comment has been minimized.

Copy link

galaxybot commented Jan 30, 2018

This PR was merged without a milestone attached.

@mvdbeek mvdbeek added this to the 18.01 milestone Jan 30, 2018

@mvdbeek

This comment has been minimized.

Copy link
Member Author

mvdbeek commented Jan 30, 2018

No worries, I don't think you actually broke anything -- the use-case (#4768) I have for this hasn't worked properly for me in 17.09 and it does now.

@jmchilton

This comment has been minimized.

Copy link
Member

jmchilton commented Jan 30, 2018

I suppose since the test that I added to convince myself was broken I guess it isn't a surprise that I was mistaken in thinking it was working. At any rate thanks so much - yet another way collections are going to be super duper awesome in 18.01!

@nsoranzo nsoranzo changed the title Allow map-over when discovering dataset collections [18.01] Allow map-over when discovering dataset collections Jan 31, 2018

if tool_output.dynamic_structure:
# Two cases collection_type_source and collection_type right?
tree = UnitializedTree(collection_type_descriptions.for_type_description("list")) # list is obviously wrong...
structured_like = tool_output.structure.structured_like

This comment has been minimized.

Copy link
@jmchilton

jmchilton Feb 2, 2018

Member

#5431 made me relook at at this. So the problem here is that we should be filling out the backbone of the collection if it has a structured_like tag. This allows us to create more the jobs upfront and eases the workflow scheduling burden. If a workflow has no dynamic structures - Galaxy can prebuild all the collections and jobs - and just rely on job polling instead of workflow and job polling. Users can see more of their collections ahead of time as well.

I can take a stab at this and see if I can rearrange things. Certainly what I had before was broken - I knew it in that comment up there.

This comment has been minimized.

Copy link
@jmchilton

jmchilton Feb 2, 2018

Member

I've got it all in my head for just one second... I'm going to loose it so jotting it down here:

  • In execute sliced_input_collection_type should be sliced_input_collection_structure. That structure should never be uninitialized.
  • Pass that function in to here - use it for structured_like if we have it. If not we still need to check for collection_type_source and grab the structure but return an un-initialized tree with the same structure.

That would make structured_like work again and make type_source I think.

I'll work on this - but let me know if it sounds crazy wrong.

This comment has been minimized.

Copy link
@jmchilton
@@ -164,6 +164,9 @@ def collect_dynamic_collections(
else:
collection = has_collection

# We are adding dynamic collections, which may be precreated, but their actually state is still new!
collection.populated_state = collection.populated_states.NEW

This comment has been minimized.

Copy link
@jmchilton

jmchilton Feb 2, 2018

Member

This seems wrong, this piece of the collection is about to be materialized. Maybe the rest of the collection has new states but this one should have a populated state - at least by the end of this method.

@mvdbeek mvdbeek deleted the mvdbeek:fix_dynamic_collection_output_18_01 branch Jun 12, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.