New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
xsd, docs, and test for sort_by (discover_datasets) #9684
Changes from all commits
77fc9a0
d747bf6
44f5bfd
8b94088
a5b898c
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||
---|---|---|---|---|
|
@@ -708,12 +708,6 @@ def verify_extra_files(extra_files): | |||
def verify_collection(output_collection_def, data_collection, verify_dataset): | ||||
name = output_collection_def.name | ||||
|
||||
def get_element(elements, id): | ||||
for element in elements: | ||||
if element["element_identifier"] == id: | ||||
return element | ||||
return False | ||||
|
||||
expected_collection_type = output_collection_def.collection_type | ||||
if expected_collection_type: | ||||
collection_type = data_collection["collection_type"] | ||||
|
@@ -731,16 +725,35 @@ def get_element(elements, id): | |||
raise AssertionError(message) | ||||
|
||||
def verify_elements(element_objects, element_tests): | ||||
sorted_test_ids = [None] * len(element_tests) | ||||
for element_identifier, element_test in element_tests.items(): | ||||
if isinstance(element_test, dict): | ||||
element_outfile, element_attrib = None, element_test | ||||
else: | ||||
element_outfile, element_attrib = element_test | ||||
sorted_test_ids[element_attrib["element_index"]] = element_identifier | ||||
|
||||
i = 0 | ||||
for element_identifier in sorted_test_ids: | ||||
element_test = element_tests[element_identifier] | ||||
if isinstance(element_test, dict): | ||||
element_outfile, element_attrib = None, element_test | ||||
else: | ||||
element_outfile, element_attrib = element_test | ||||
|
||||
element = None | ||||
while i < len(element_objects): | ||||
if element_objects[i]["element_identifier"] == element_identifier: | ||||
element = element_objects[i] | ||||
i += 1 | ||||
break | ||||
i += 1 | ||||
|
||||
element = get_element(element_objects, element_identifier) | ||||
if not element: | ||||
template = "Failed to find identifier [%s] for testing, tool generated collection elements [%s]" | ||||
message = template % (element_identifier, element_objects) | ||||
if element is None: | ||||
template = "Failed to find identifier '%s' of test collection %s in the tool generated collection elements %s (at the correct position)" | ||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. and wouldn't this mean you can simplify the logic to for element_index, expected_element_identifier in enumerate(sorted_test_ids):
assert element_objects[element_index]["element_identifier"] == expected_element_identifier ? (You could add some length handling for missing elements) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No, the test is just checking if the elements listed in the test occur in the same order in the generated elements. Thereby the tool author does not need to list all elements (which might be very many). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Alright, then how about this: for element_index, expected_element_identifier in enumerate(sorted_test_ids):
if expected_element_identifier is None:
break
assert element_objects[element_index]["element_identifier"] == expected_element_identifier That way it's clear at which position the test data differs, and I think the logic is a bit easier to follow ? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't see why this should work :( .. The
should all be overwritten by the actual identifiers. I only use I extended the test in order to clarify what I meant. The tool new generates 10 elements, but the test only uses three of them to verify the sorting order. The number of generated elements is tested with the Formally spoken we test if the list of elements in the test is a subsequence of the generated elements. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. what is element_tests ? Is that not the list of specified tests? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Its a
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe calling it element_index is a bad idea if it isn't expected to match the actual element index. Something like expected_order_index maybe would be better? I don't really have the throughput to deal with this, but maybe this change requiring the order to match should be hidden behind a profile version tag for the tool? If there wasn't a ton of tool breakage maybe it is not worth the effort, I will just update Planemo regardless. |
||||
eo_ids = [_["element_identifier"] for _ in element_objects] | ||||
message = template % (element_identifier, sorted_test_ids, | ||||
eo_ids) | ||||
raise AssertionError(message) | ||||
|
||||
element_type = element["element_type"] | ||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,70 @@ | ||
<tool id="discover_sort_by" name="discover_sort_by" version="0.1.0"> | ||
<command><![CDATA[ | ||
for i in \$(seq 1 10); | ||
do | ||
echo "\$i" > \$i.txt; | ||
done | ||
]]></command> | ||
<inputs/> | ||
<outputs> | ||
<collection name="collection_numeric_name" type="list" label="num"> | ||
<discover_datasets pattern="__name_and_ext__" sort_by="numeric_name"/> | ||
</collection> | ||
<collection name="collection_rev_numeric_name" type="list" label="num rev"> | ||
<discover_datasets pattern="__name_and_ext__" sort_by="reverse_numeric_name"/> | ||
</collection> | ||
<collection name="collection_lexical_name" type="list" label="num"> | ||
<discover_datasets pattern="__name_and_ext__" sort_by="lexical_name" /> | ||
</collection> | ||
<data name="data_reverse_lexical_name"> | ||
<discover_datasets pattern="__name_and_ext__" format="txt" assign_primary_output="true" sort_by="reverse_lexical_name" visible="true"/> | ||
</data> | ||
</outputs> | ||
<tests> | ||
<test expect_num_outputs="4"> | ||
<param name="input1" value="tinywga.fam" /> | ||
<output_collection name="collection_numeric_name" type="list" count="10"> | ||
<element name="1"> | ||
<assert_contents><has_text_matching expression="^.*$"/></assert_contents> | ||
</element> | ||
<element name="2"> | ||
<assert_contents><has_text_matching expression="^.*$"/></assert_contents> | ||
</element> | ||
<element name="10"> | ||
<assert_contents><has_text_matching expression="^.*$"/></assert_contents> | ||
</element> | ||
</output_collection> | ||
<output_collection name="collection_rev_numeric_name" type="list" count="10"> | ||
<element name="10"> | ||
<assert_contents><has_text_matching expression="^.*$"/></assert_contents> | ||
</element> | ||
<element name="2"> | ||
<assert_contents><has_text_matching expression="^.*$"/></assert_contents> | ||
</element> | ||
<element name="1"> | ||
<assert_contents><has_text_matching expression="^.*$"/></assert_contents> | ||
</element> | ||
</output_collection> | ||
<output_collection name="collection_lexical_name" type="list" count="10"> | ||
<element name="1"> | ||
<assert_contents><has_text_matching expression="^.*$"/></assert_contents> | ||
</element> | ||
<element name="10"> | ||
<assert_contents><has_text_matching expression="^.*$"/></assert_contents> | ||
</element> | ||
<element name="2"> | ||
<assert_contents><has_text_matching expression="^.*$"/></assert_contents> | ||
</element> | ||
</output_collection> | ||
<output name="data_reverse_lexical_name"> | ||
<assert_contents><has_text_matching expression="^.*$"/></assert_contents> | ||
<discovered_dataset designation="10"> | ||
<assert_contents><has_text_matching expression="^.*$"/></assert_contents> | ||
</discovered_dataset> | ||
<discovered_dataset designation="1"> | ||
<assert_contents><has_text_matching expression="^.*$"/></assert_contents> | ||
</discovered_dataset> | ||
</output> | ||
</test> | ||
</tests> | ||
</tool> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This breaks YAML parsing because the general interface assumes this is here but only the XML parser adds it. This would break CWL tools, YAML tools, and Galaxy workflows (this last one breaks Planemo CI after this was published). So I'm going to refactor this addition into TestCollectionOutputDef - it will just make sure this is populated there. I hope that makes sense.