Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for mapping tools with paired collection input and a structured_like tag over list:paired collections #3209

Merged
merged 5 commits into from Dec 16, 2016

Conversation

Projects
None yet
4 participants
@mvdbeek
Copy link
Member

commented Nov 22, 2016

collection input and a structured_like tag over list:paired collections.

I'm trying to fix #3202 here. This currently works, but is most likely a terrible way to do this,
and I can't get the test tool to pass.

@jmchilton do you have any advice here?

mvdbeek added some commits Nov 22, 2016

Preliminary fix for mapping tools with paired
collection input over list:paired collections.

@galaxybot galaxybot added the triage label Nov 22, 2016

@galaxybot galaxybot added this to the 17.01 milestone Nov 22, 2016

@mvdbeek mvdbeek changed the title Preliminary fix for mapping tools with paired [WIP] Preliminary fix for mapping tools with paired Nov 22, 2016

@mvdbeek

This comment has been minimized.

Copy link
Member Author

commented Nov 22, 2016

So it looks like the tool test framework doesn't allow mapping over, i.e I can't pass a list:paired collection as input to a tool with paired input.
And the API testing framework doesn't seem to support the structured_like tag in tools, as that fails with

galaxy.tools: ERROR: Exception caught while attempting tool execution:
Traceback (most recent call last):
  File "/home/mvandenb/src/galaxy/lib/galaxy/tools/__init__.py", line 1192, in handle_single_execution
    job, out_data = self.execute( trans, incoming=params, history=history, rerun_remap_job_id=rerun_remap_job_id, mapping_over_collection=mapping_over_collection, execution_cache=execution_cache )
  File "/home/mvandenb/src/galaxy/lib/galaxy/tools/__init__.py", line 1272, in execute
    return self.tool_action.execute( self, trans, incoming=incoming, set_output_hid=set_output_hid, history=history, **kwargs )
  File "/home/mvandenb/src/galaxy/lib/galaxy/tools/actions/__init__.py", line 349, in execute
    known_outputs = output.known_outputs( input_collections, collections_manager.type_registry )
  File "/home/mvandenb/src/galaxy/lib/galaxy/tools/parser/output_objects.py", line 123, in known_outputs
    collection_prototype = inputs[ self.structure.structured_like ].collection
AttributeError: 'NoneType' object has no attribute 'collection'

mvdbeek and others added some commits Nov 22, 2016

@jmchilton

This comment has been minimized.

Copy link
Member

commented Dec 16, 2016

I pushed a fix to your test case - it wasn't executing the API call correctly to exhibit a subcollection mapping. I think it does now.

@jmchilton

This comment has been minimized.

Copy link
Member

commented Dec 16, 2016

If you take it out of WIP I will merge this. Thanks for fixing this up - I'd be happy to +1 a backport of this as well.

I would say though that structured_like should never really be used with inputs that are definitely paired. The more concrete you can be about outputs - the better Galaxy can reason about them upfront (I think the tools would be more intuitive also).

@mvdbeek mvdbeek changed the title [WIP] Preliminary fix for mapping tools with paired Fix for mapping tools with paired collection input and a structured_like tag over list:paired collections Dec 16, 2016

@mvdbeek mvdbeek added status/review and removed status/WIP labels Dec 16, 2016

@mvdbeek

This comment has been minimized.

Copy link
Member Author

commented Dec 16, 2016

I would say though that structured_like should never really be used with inputs that are definitely paired. The more concrete you can be about outputs - the better Galaxy can reason about them upfront (I think the tools would be more intuitive also).

I agree, but how would you deal with read-trimming tools like trimmomatic and sickle, that take paired collections as input and output paired collections (I'm surprised no-one has hit this issue yet ... this seems like something everyone doing NGS needs to do). My understanding is that you could make them "list:pair" aware, and iterate over each pair individually (generating a single job per collection), but then you would lose the intrinsic parallelism of mapping a tool over a list. Or am I missing something obvious to do this?

@jmchilton

This comment has been minimized.

Copy link
Member

commented Dec 16, 2016

@mvdbeek Do sickle or trimmomatic need to know about the whole collection to process the collection? I'd think simple mapping over a tool that consumes a paired and explicitly produces a paired would be fine. So use

  <outputs>
      <collection name="output" type="paired" label="Trimmed Pair">
  </outputs>

instead of

  <outputs>
      <collection name="output" structured_like="input" label="Trimmed Pair">
  </outputs>

I don't think structured_like in this context buys you anything that the explicitly paired output doesn't. Is there something I am missing - was the structured_like doing something I don't understand?

@jmchilton jmchilton merged commit f4282c8 into galaxyproject:dev Dec 16, 2016

3 of 4 checks passed

continuous-integration/travis-ci/pr The Travis CI build failed
Details
api test Build finished. 244 tests run, 0 skipped, 0 failed.
Details
framework test Build finished. 132 tests run, 0 skipped, 0 failed.
Details
toolshed test Build finished. 580 tests run, 0 skipped, 0 failed.
Details
@mvdbeek

This comment has been minimized.

Copy link
Member Author

commented Dec 18, 2016

Yep, but by default the element identifier will not be carried over (or is that a bug as well?!).
I suppose this could be set explicitly in the output collection (or the galaxy.json file).

@jmchilton

This comment has been minimized.

Copy link
Member

commented Dec 18, 2016

@mvdbeek

This comment has been minimized.

Copy link
Member Author

commented Dec 18, 2016

That is a bug unknown to me. Are you sure about that?

I just tried it again (sickle without the structured_like tag), and it does keep the element identifier. Sorry for the false alarm.

@nsoranzo

This comment has been minimized.

Copy link
Member

commented Dec 19, 2016

I don't think structured_like in this context buys you anything that the explicitly paired output doesn't. Is there something I am missing - was the structured_like doing something I don't understand?

@jmchilton @mvdbeek Sickle needs to use structured_like in conjunction with inherit_format="true" because it can handle all various FASTQ subtypes, see galaxyproject/tools-iuc#412 .

@jmchilton

This comment has been minimized.

Copy link
Member

commented Dec 19, 2016

Does inherit_format not work without structured_like? That sounds like a bug.

@mvdbeek

This comment has been minimized.

Copy link
Member Author

commented Dec 19, 2016

Taking out structured_like but leaving in inherit_format="true" produces elements of type data. I'm not sure that without specifying what input to to inherit from you could really make sensible decisions though.

@jmchilton

This comment has been minimized.

Copy link
Member

commented Dec 19, 2016

Okay - I was confusing inherit_format with format_source. I'm not sure I'm very happy that I ever added inherit_format to the tool XML language. I understand what I was thinking - but it seems to encourage heterogeneous collections. I guess given the state of things - the current wrapper that uses structured_like is fine. Sorry for the confusion on my part.

@mvdbeek mvdbeek deleted the mvdbeek:fix_paired_list branch Jun 12, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.