Problems with Collection output created via structured_like from a data input with multiple=True #7392

blankenberg · 2019-02-20T15:00:08Z

I have a tool with an input of type data and multiple=True:

<param name="input_input" type="data" label="Input" format="anvio_db" optional="True" multiple="True" argument="" help="Anvi'o database for migration"/>

I want to create an output collection in a 1:1 fashion against this input dataset list:

<collection name="output_input" type="list" label="${tool.name} on ${on_string}: Input" structured_like="input_input" format_source="input_input" metadata_source="input_input"/>

If I use the UI tab to switch the input to 'Dataset collections', it almost works as expected. However, the metadata is not properly propagated before command-line generation: the values for every item in the new collection are inherited from the first element of the input dataset list, when they should be copied from each parallel element. In the example tool below, metadata.anvio_basename is incorrect in the generated command line. I could work around this metadata issue by using the metadata of the actual input directly, but that is a bad hack.
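To make the metadata symptom concrete, here is a small Python sketch contrasting the reported behaviour (every output element copies the first input's metadata) with the expected 1:1 behaviour (output element i copies input element i). This is not Galaxy's code: the `Dataset` class, both `propagate_*` functions and the basename values are hypothetical and exist only to illustrate the difference.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical stand-in for a Galaxy dataset (not Galaxy's model class)."""
    name: str
    metadata: dict = field(default_factory=dict)


def propagate_from_first(inputs):
    """Reported behaviour: every output element copies the FIRST input's metadata."""
    first = inputs[0].metadata
    return [Dataset(f"out_{d.name}", dict(first)) for d in inputs]


def propagate_parallel(inputs):
    """Expected behaviour: output element i copies metadata from input element i."""
    return [Dataset(f"out_{d.name}", dict(d.metadata)) for d in inputs]


# Illustrative values only; real anvio_basename values depend on the inputs.
inputs = [
    Dataset("a", {"anvio_basename": "CONTIGS.db"}),
    Dataset("b", {"anvio_basename": "PROFILE.db"}),
]
print([o.metadata["anvio_basename"] for o in propagate_from_first(inputs)])
# ['CONTIGS.db', 'CONTIGS.db']
print([o.metadata["anvio_basename"] for o in propagate_parallel(inputs)])
# ['CONTIGS.db', 'PROFILE.db']
```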

When selecting one or more datasets in the standard 'Multiple datasets' mode, it doesn't work at all, and instead complains that there is no input collection (there isn't; it's just a 'list' of datasets). But this should work.

galaxy.tools DEBUG 2019-02-20 08:49:38,905 [p:64318,w:1,m:0] [uWSGIWorker1Core2] Validated and populated state for tool request (20.455 ms)
galaxy.tools ERROR 2019-02-20 08:49:38,946 [p:64318,w:1,m:0] [uWSGIWorker1Core2] Exception caught while attempting tool execution:
Traceback (most recent call last):
  File "lib/galaxy/tools/__init__.py", line 1435, in handle_single_execution
    collection_info=collection_info,
  File "lib/galaxy/tools/__init__.py", line 1517, in execute
    return self.tool_action.execute(self, trans, incoming=incoming, set_output_hid=set_output_hid, history=history, **kwargs)
  File "lib/galaxy/tools/actions/__init__.py", line 438, in execute
    known_outputs = output.known_outputs(input_collections, collections_manager.type_registry)
  File "lib/galaxy/tools/parser/output_objects.py", line 126, in known_outputs
    collection_prototype = self.structure.collection_prototype(inputs, type_registry)
  File "lib/galaxy/tools/parser/output_objects.py", line 203, in collection_prototype
    collection_prototype = inputs[self.structured_like].collection
KeyError: 'input_input'
galaxy.tools.execute WARNING 2019-02-20 08:49:38,946 [p:64318,w:1,m:0] [uWSGIWorker1Core2] There was a failure executing a job for tool [anvi_migrate_db] - Error executing tool: 'input_input'
galaxy.tools.execute DEBUG 2019-02-20 08:49:38,946 [p:64318,w:1,m:0] [uWSGIWorker1Core2] Executed 1 job(s) for tool anvi_migrate_db request: (40.876 ms)
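The traceback shows the failing lookup, `inputs[self.structured_like].collection`, being fed `input_collections`, which only contains collection inputs, so a plain multiple="true" selection raises `KeyError`. The sketch below shows one hypothetical way such a lookup could fall back to treating the selected datasets as an implicit flat list. `ListPrototype`, `collection_prototype` and the `multiple_data_inputs` argument are assumptions for illustration, not Galaxy's actual classes or signatures.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ListPrototype:
    """Hypothetical minimal 'list' structure: ordered element identifiers only."""
    collection_type: str
    element_identifiers: List[str]


def collection_prototype(structured_like: str,
                         input_collections: Dict[str, ListPrototype],
                         multiple_data_inputs: Dict[str, List[str]]) -> ListPrototype:
    """Resolve the structure an output collection should copy (hypothetical sketch)."""
    # Case 1: the input was a real collection. This mirrors the lookup that
    # raised KeyError in the traceback, which only consults collection inputs.
    if structured_like in input_collections:
        return input_collections[structured_like]
    # Case 2 (hypothetical fallback): a multiple="true" data input given plain
    # datasets is treated as an implicit flat list, one element per dataset.
    if structured_like in multiple_data_inputs:
        return ListPrototype("list", list(multiple_data_inputs[structured_like]))
    raise KeyError(structured_like)


proto = collection_prototype("input_input", {}, {"input_input": ["a.db", "b.db"]})
print(proto)
# ListPrototype(collection_type='list', element_identifiers=['a.db', 'b.db'])
```

Using dataset names as element identifiers is itself an assumption; names in a history are not guaranteed to be unique, so a real fallback would need a policy for collisions.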

Here is an example tool XML:

<tool id="anvi_migrate_db" name="anvi-migrate-db" version="5.3.0">
    <requirements>
        <requirement type="package" version="5.3.0">anvio</requirement>
    </requirements>
    <stdio>
        <exit_code range="1:" />
    </stdio>
    <version_command>anvi-migrate-db --version</version_command>
    <command><![CDATA[
#if $input_input:
    #for $GXY_I, ($gxy_input_input, $gxy_output_input) in $enumerate( $zip( $input_input, $output_input ) ):
        #if $GXY_I != 0:
            &&
        #end if
        cp -R '${gxy_input_input.extra_files_path}' '${gxy_output_input.extra_files_path}'
    #end for
#else
    echo ''
#end if
&&
anvi-migrate-db
#for $gxy_output_input in $output_input:
    "${gxy_output_input.extra_files_path}/${gxy_output_input.metadata.anvio_basename}"
#end for
--just-do-it
#if $str( $target_version ):
    --target-version '${target_version}'
#end if
&> '${GALAXY_ANVIO_LOG}'
    ]]></command>
    <inputs>
        <param name="input_input" type="data" label="Input" format="anvio_db" optional="True" multiple="True" argument="" help="Anvi'o database for migration"/>
        <param name="target_version" type="text" label="Target Version" value="" optional="True" argument="--target-version" help="Anvi'o will stop upgrading your database when it reaches to this version."/>
    </inputs>
    <outputs>
        <collection name="output_input" type="list" label="${tool.name} on ${on_string}: Input" structured_like="input_input" format_source="input_input" metadata_source="input_input"/>
        <data name="GALAXY_ANVIO_LOG" format="txt" label="${tool.name} on ${on_string}: Log"/>
    </outputs>
</tool>
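For reference, the command the template above is meant to produce can be rendered in plain Python: one `cp -R` per (input, output) pair chained with `&&`, then one database path per output element built from that same element's basename. This is an illustrative re-expression of the Cheetah template, not Galaxy's template evaluator; `build_command` and its arguments are hypothetical, and like the template it uses literal quoting rather than shell escaping.

```python
from typing import List, Tuple


def build_command(pairs: List[Tuple[str, str, str]],
                  log_path: str,
                  target_version: str = "") -> str:
    """Python rendering of the example tool's Cheetah template (illustration only).

    Each pair is (input_extra_files_path, output_extra_files_path, basename),
    where basename should come from the metadata of that SAME element.
    """
    if pairs:
        # One `cp -R` per (input, output) pair, chained with &&, like the #for/zip loop.
        copies = " && ".join(f"cp -R '{src}' '{dst}'" for src, dst, _ in pairs)
    else:
        copies = "echo ''"
    parts = [copies, "&&", "anvi-migrate-db"]
    # One database path per output element, built from its own basename.
    parts += [f'"{dst}/{base}"' for _, dst, base in pairs]
    parts.append("--just-do-it")
    if target_version:
        parts += ["--target-version", f"'{target_version}'"]
    parts += ["&>", f"'{log_path}'"]
    return " ".join(parts)


print(build_command([("in1", "out1", "A.db"), ("in2", "out2", "B.db")], "log.txt"))
# cp -R 'in1' 'out1' && cp -R 'in2' 'out2' && anvi-migrate-db "out1/A.db" "out2/B.db" --just-do-it &> 'log.txt'
```

With the behaviour reported above, every `base` would equal the first element's basename, which is why the generated `anvi-migrate-db` arguments come out wrong.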


mvdbeek · 2019-02-20T15:26:25Z

I'm probably missing something here, but structured_like only works for collection inputs, so you'd need to make this a list input, I think. The documentation says `This is the name of input collection or dataset to derive "structure"`, but that's wrong: it only works for, and is designed for, collections.

If you want collection in / collection out while needing access to all input elements, you need to make the input a collection, I think.

blankenberg · 2019-02-20T15:37:38Z

I agree that what you are saying is what is happening, that in my second case there is no 'real' collection. But it is not correct behavior. Abstractly, a list is a list.

Additionally, IIRC, by best practice standards, tool inputs taking a list should use a standard data input with multiple=True, and not a collection=list input; this standard is predicated on treating each kind of 'list' equally.

mvdbeek · 2019-02-20T15:48:32Z

> Additionally, IIRC, by best practice standards, tool inputs taking a list should use a standard data input with multiple=True, and not a collection=list input; this standard is predicated on treating each kind of 'list' equally.

I think for reductions you should use a data input with multiple="True", but I disagree for the case where you need to keep the structure. There it should be a normal data input if all elements are independent, otherwise a list input / list output. I mean, there are substantial differences in the flow of a multiple="true" input compared to a list input; for instance, the multiple="true" input would let you create a collection from a single input. Worth implementing for sure, but I don't think this is a bug (except for the documentation ...).

blankenberg · 2019-02-20T22:42:54Z

> Worth implementing for sure, but I don't think this is a bug (except for the documentation ...).

I am not convinced that the documentation is wrong about the ideal behavior. I think if you select a list of datasets under multiple=True, then it should behave the same as if you gave it a collection=list containing those datasets. In fact, this is exactly what happens, with the (imho incorrect) exception of the output creation.

> I mean there are substantial differences in the flow of a multiple="true" input compared to a list input, for instance the multiple="true" input would let you create a collection from a single input.

I am not sure I see these differences here. When I put the interface into 'Data collection' mode (with multiple=True), it displays as HID: collection_name (as list), note the '(as list)', and the structured output is created properly. It should also work when given just a standard list of datasets.

Otherwise, the UX greatly suffers. I can already create a collection=list with a single dataset using the 'build list' tool, but that doesn't mean it is a good idea. If the tool explicitly required a collection=list, we would force users to manually create one just to run a tool that consumes a list of things and creates an equivalent output.

I am not sure if I am missing something important here, as the behavior I expect seems really non-ambiguous. A list is a list when multiple=True.
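The "a list is a list" position can be sketched as a normalization step that turns any list-like selection (a single dataset, several datasets, or a real list collection) into the same ordered sequence of (identifier, dataset) pairs from which an output structure could be derived. Everything here is hypothetical: the classes, `as_list_elements`, and especially the naming policy. The sketch also makes one of the differences raised above visible: plain datasets carry no element identifiers, so some identifier has to be synthesized, and a single dataset becomes a one-element list.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Dataset:
    """Hypothetical stand-in for a history dataset."""
    name: str


@dataclass
class ListCollection:
    """Hypothetical list collection: ordered (element_identifier, dataset) pairs."""
    elements: List[Tuple[str, Dataset]]


def as_list_elements(value: Union[Dataset, List[Dataset], ListCollection]) -> List[Tuple[str, Dataset]]:
    """Normalize any 'list-like' selection into ordered (identifier, dataset) pairs."""
    if isinstance(value, ListCollection):
        # A real collection already carries element identifiers.
        return list(value.elements)
    # A single dataset becomes a one-element list; a selection is kept in order.
    datasets = [value] if isinstance(value, Dataset) else list(value)
    used = set()
    elements = []
    for ds in datasets:
        # Plain datasets have no element identifier, so one is synthesized from
        # the dataset name (an assumed policy), suffixed until it is unique.
        ident, n = ds.name, 1
        while ident in used:
            ident = f"{ds.name}_{n}"
            n += 1
        used.add(ident)
        elements.append((ident, ds))
    return elements


print([i for i, _ in as_list_elements([Dataset("x"), Dataset("x"), Dataset("y")])])
# ['x', 'x_1', 'y']
```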

blankenberg added kind/bug area/dataset-collections area/tool-framework labels Feb 20, 2019

bernt-matthias mentioned this issue Mar 5, 2019: on_string for collections selected in multiple=true inputs #7467 (Open)

bernt-matthias mentioned this issue Aug 6, 2019: What limits usability of collections? #8403 (Open)
