Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xsd, docs, and test for sort_by (discover_datasets) #9684

Merged
merged 5 commits into from Sep 18, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
3 changes: 2 additions & 1 deletion lib/galaxy/tool_util/parser/xml.py
Expand Up @@ -605,12 +605,13 @@ def __parse_output_collection_elem(output_collection_elem):

def __parse_element_tests(parent_element):
element_tests = {}
for element in parent_element.findall("element"):
for idx, element in enumerate(parent_element.findall("element")):
element_attrib = dict(element.attrib)
identifier = element_attrib.pop('name', None)
if identifier is None:
raise Exception("Test primary dataset does not have a 'identifier'")
element_tests[identifier] = __parse_test_attributes(element, element_attrib, parse_elements=True)
element_tests[identifier][1]["element_index"] = idx
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This breaks YAML parsing because the general interface assumes this is here but only the XML parser adds it. This would break CWL tools, YAML tools, and Galaxy workflows (this last one breaks Planemo CI after this was published). So I'm going to refactor this addition into TestCollectionOutputDef - it will just make sure this is populated there. I hope that makes sense.

return element_tests


Expand Down
33 changes: 23 additions & 10 deletions lib/galaxy/tool_util/verify/interactor.py
Expand Up @@ -708,12 +708,6 @@ def verify_extra_files(extra_files):
def verify_collection(output_collection_def, data_collection, verify_dataset):
name = output_collection_def.name

def get_element(elements, id):
for element in elements:
if element["element_identifier"] == id:
return element
return False

expected_collection_type = output_collection_def.collection_type
if expected_collection_type:
collection_type = data_collection["collection_type"]
Expand All @@ -731,16 +725,35 @@ def get_element(elements, id):
raise AssertionError(message)

def verify_elements(element_objects, element_tests):
sorted_test_ids = [None] * len(element_tests)
for element_identifier, element_test in element_tests.items():
if isinstance(element_test, dict):
element_outfile, element_attrib = None, element_test
else:
element_outfile, element_attrib = element_test
sorted_test_ids[element_attrib["element_index"]] = element_identifier

i = 0
for element_identifier in sorted_test_ids:
element_test = element_tests[element_identifier]
if isinstance(element_test, dict):
element_outfile, element_attrib = None, element_test
else:
element_outfile, element_attrib = element_test

element = None
while i < len(element_objects):
if element_objects[i]["element_identifier"] == element_identifier:
element = element_objects[i]
i += 1
break
i += 1

element = get_element(element_objects, element_identifier)
if not element:
template = "Failed to find identifier [%s] for testing, tool generated collection elements [%s]"
message = template % (element_identifier, element_objects)
if element is None:
template = "Failed to find identifier '%s' of test collection %s in the tool generated collection elements %s (at the correct position)"
Copy link
Member

@mvdbeek mvdbeek Sep 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the corrent position i ?
Is the correct position the index of the iteration over sorted_test_ids ?

Copy link
Member

@mvdbeek mvdbeek Sep 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and wouldn't this mean you can simplify the logic to

for element_index, expected_element_identifier in enumerate(sorted_test_ids):
    assert element_objects[element_index]["element_identifier"] == expected_element_identifier

?

(You could add some length handling for missing elements)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the test is just checking if the elements listed in the test occur in the same order in the generated elements.

Thereby the tool author does not need to list all elements (which might be very many).

Copy link
Member

@mvdbeek mvdbeek Sep 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, then how about this:

for element_index, expected_element_identifier in enumerate(sorted_test_ids):
    if expected_element_identifier is None:
        break
    assert element_objects[element_index]["element_identifier"] == expected_element_identifier

That way it's clear at which position the test data differs, and I think the logic is a bit easier to follow ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why this should work :( ..

The None entries from here:

sorted_test_ids = [None] * len(element_tests)

should all be overwritten by the actual identifiers. I only use [None] * len(element_tests) to initialize a list of the needed length.

I extended the test in order to clarify what I meant. The tool new generates 10 elements, but the test only uses three of them to verify the sorting order. The number of generated elements is tested with the count attribute.

Formally spoken we test if the list of elements in the test is a subsequence of the generated elements.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is element_tests ? Is that not the list of specified tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its a dict, looking like this

{'1': [None, {'assert_list': [{'attributes': {'expression': '^.*$'}, 'children': [], 'tag': 'has_text_matching'}], 'checksum': None, 'compare': 'diff', 'decompress': False, 'delta': 10000, 'delta_frac': None, 'element_index': 0, 'elements': {}, 'extra_files': [], 'lines_diff': 0, 'md5': None, 'metadata': {}, 'primary_datasets': {}, 'sort': False}], '10': [None, {'assert_list': [{'attributes': {'expression': '^.*$'}, 'children': [], 'tag': 'has_text_matching'}], 'checksum': None, 'compare': 'diff', 'decompress': False, 'delta': 10000, 'delta_frac': None, 'element_index': 2, 'elements': {}, 'extra_files': [], 'lines_diff': 0, 'md5': None, 'metadata': {}, 'primary_datasets': {}, 'sort': False}], '2': [None, {'assert_list': [{'attributes': {'expression': '^.*$'}, 'children': [], 'tag': 'has_text_matching'}], 'checksum': None, 'compare': 'diff', 'decompress': False, 'delta': 10000, 'delta_frac': None, 'element_index': 1, 'elements': {}, 'extra_files': [], 'lines_diff': 0, 'md5': None, 'metadata': {}, 'primary_datasets': {}, 'sort': False}]}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe calling it element_index is a bad idea if it isn't expected to match the actual element index. Something like expected_order_index maybe would be better?

I don't really have the throughput to deal with this, but maybe this change requiring the order to match should be hidden behind a profile version tag for the tool? If there wasn't a ton of tool breakage maybe it is not worth the effort, I will just update Planemo regardless.

eo_ids = [_["element_identifier"] for _ in element_objects]
message = template % (element_identifier, sorted_test_ids,
eo_ids)
raise AssertionError(message)

element_type = element["element_type"]
Expand Down
7 changes: 6 additions & 1 deletion lib/galaxy/tool_util/xsd/galaxy.xsd
Expand Up @@ -4219,6 +4219,11 @@ More information can be found on Planemo's documentation for
<xs:documentation xml:lang="en">Format (or datatype) of discovered datasets (an alias with ``format``).</xs:documentation>
</xs:annotation>
</xs:attribute>
<xs:attribute name="sort_by" type="xs:string" use="optional">
<xs:annotation>
<xs:documentation xml:lang="en">A string `[reverse_][SORT_COMP_]SORTBY` describing the desired sort order of the collection elements. `SORTBY` can be `filename`, `name`, `designation`, `dbkey` and the optional `SORT_COMP` can be either `lexical` or `numeric`. Default is lexical sorting by filename.</xs:documentation>
</xs:annotation>
</xs:attribute>
<xs:attribute name="visible" type="xs:boolean" use="optional">
<xs:annotation>
<xs:documentation xml:lang="en">Indication if this dataset is visible in output history. This defaults to ``false``, but probably shouldn't - be sure to set to ``true`` if that is your intention.</xs:documentation>
Expand Down Expand Up @@ -4274,7 +4279,7 @@ Galaxy, including:
</xs:attribute>
<xs:attribute name="sort_by" type="xs:string" use="optional">
<xs:annotation>
<xs:documentation xml:lang="en">A string `[reverse_]SORTBY[_SORT_COMP]` describing the desired sort order of the collection elements. `SORTBY` can be `filename`, `name`, `designation`, `dbkey` and the optional `SORT_COMP` can be either `lexical` or `numeric`. Default is lexical sorting by filename.</xs:documentation>
<xs:documentation xml:lang="en">A string `[reverse_][SORT_COMP_]SORTBY` describing the desired sort order of the collection elements. `SORTBY` can be `filename`, `name`, `designation`, `dbkey` and the optional `SORT_COMP` can be either `lexical` or `numeric`. Default is lexical sorting by filename.</xs:documentation>
</xs:annotation>
</xs:attribute>
<xs:attribute name="visible" type="xs:boolean" use="optional">
Expand Down
70 changes: 70 additions & 0 deletions test/functional/tools/discover_sort_by.xml
@@ -0,0 +1,70 @@
<tool id="discover_sort_by" name="discover_sort_by" version="0.1.0">
<command><![CDATA[
for i in \$(seq 1 10);
do
echo "\$i" > \$i.txt;
done
]]></command>
<inputs/>
<outputs>
<collection name="collection_numeric_name" type="list" label="num">
<discover_datasets pattern="__name_and_ext__" sort_by="numeric_name"/>
</collection>
<collection name="collection_rev_numeric_name" type="list" label="num rev">
<discover_datasets pattern="__name_and_ext__" sort_by="reverse_numeric_name"/>
</collection>
<collection name="collection_lexical_name" type="list" label="num">
<discover_datasets pattern="__name_and_ext__" sort_by="lexical_name" />
</collection>
<data name="data_reverse_lexical_name">
<discover_datasets pattern="__name_and_ext__" format="txt" assign_primary_output="true" sort_by="reverse_lexical_name" visible="true"/>
</data>
</outputs>
<tests>
<test expect_num_outputs="4">
<param name="input1" value="tinywga.fam" />
<output_collection name="collection_numeric_name" type="list" count="10">
<element name="1">
<assert_contents><has_text_matching expression="^.*$"/></assert_contents>
</element>
<element name="2">
<assert_contents><has_text_matching expression="^.*$"/></assert_contents>
</element>
<element name="10">
<assert_contents><has_text_matching expression="^.*$"/></assert_contents>
</element>
</output_collection>
<output_collection name="collection_rev_numeric_name" type="list" count="10">
<element name="10">
<assert_contents><has_text_matching expression="^.*$"/></assert_contents>
</element>
<element name="2">
<assert_contents><has_text_matching expression="^.*$"/></assert_contents>
</element>
<element name="1">
<assert_contents><has_text_matching expression="^.*$"/></assert_contents>
</element>
</output_collection>
<output_collection name="collection_lexical_name" type="list" count="10">
<element name="1">
<assert_contents><has_text_matching expression="^.*$"/></assert_contents>
</element>
<element name="10">
<assert_contents><has_text_matching expression="^.*$"/></assert_contents>
</element>
<element name="2">
<assert_contents><has_text_matching expression="^.*$"/></assert_contents>
</element>
</output_collection>
<output name="data_reverse_lexical_name">
<assert_contents><has_text_matching expression="^.*$"/></assert_contents>
<discovered_dataset designation="10">
<assert_contents><has_text_matching expression="^.*$"/></assert_contents>
</discovered_dataset>
<discovered_dataset designation="1">
<assert_contents><has_text_matching expression="^.*$"/></assert_contents>
</discovered_dataset>
</output>
</test>
</tests>
</tool>
1 change: 1 addition & 0 deletions test/functional/tools/samples_tool_conf.xml
Expand Up @@ -164,6 +164,7 @@
<tool file="collection_creates_dynamic_nested_fail.xml" />
<tool file="collection_cat_group_tag.xml" />
<tool file="collection_cat_group_tag_multiple.xml" />
<tool file="discover_sort_by.xml" />
<tool file="expression_forty_two.xml" />
<tool file="expression_parse_int.xml" />
<tool file="expression_log_line_count.xml" />
Expand Down