xsd, docs, and test for sort_by (discover_datasets) #9684

bernt-matthias · 2020-04-29T10:34:01Z

Since planemo lint failed so far for this attribute we can be quite sure that this was never used (in a tool)

order of sort_comp and sort_by was wrong for sort_by in collections
sort_by was missing for data output
adds a test (though we can't check the generated order .. I manually tested with planemo serve)

follow up on #9674

test/functional/tools/discover_sort_by.xml

bernt-matthias · 2020-04-30T12:36:08Z

seems ready

mvdbeek · 2020-04-30T12:38:46Z

test/functional/tools/discover_sort_by.xml

+  <tests>
+    <test>
+      <param name="input1" value="tinywga.fam" />
+      <output_collection name="collection_numeric_name" type="list" count="11"/>


Don't you want to ascertain the new sort order ? You can list <element/> items here

That's true, but I did not expect the order of the elements is relevant. Lets try.

Test currently do not check for the order, but only if the elements contained in the test definition are produced by the test:

galaxy/lib/galaxy/tool_util/verify/interactor.py

Line 717 in 94390c2

def verify_elements(element_objects, element_tests):

This would be (quite) easy to change, but

the test definition is a dict and dictionary order can only be used for python >= 3.7

dictionary order would always be lexicographic as defined for strings

it seems that the dictionary order is lost in the get_tool_tests function called here

galaxy/lib/galaxy/tool_util/verify/interactor.py

Line 803 in 94390c2

tool_test_dicts = tool_test_dicts or galaxy_interactor.get_tool_tests(tool_id, tool_version=tool_version)

Seems difficult.

Spontaneous ideas

sort the test elements in verify_elements as given by sort_by

leave it as is and consider the test just checking for bugs

the test definition is a dict and dictionary order can only be used for python >= 3.7

3.6, it was made official in 3.7

dictionary order would always be lexicographic as defined for strings

this should be by insertion order on newer python's, right ?

Anyway, I agree this is not as straightforward. Maybe add a in the test, and then we can reconsider once sort order is guaranteed or we decide to write an API test, which could also assert this.

3.6, it was made official in 3.7

true

this should be by insertion order on newer python's, right ?

also true, my mistake

Anyway, I agree this is not as straightforward. Maybe add a in the test, and then we can reconsider once sort order is guaranteed or we decide to write an API test, which could also assert this.

Still I have the urge to understand this :) Somewehere here:

galaxy/lib/galaxy/tool_util/verify/interactor.py

Line 141 in 94390c2

def get_tool_tests(self, tool_id, tool_version=None):

the order seems to get lost. Any suggestions where I could dig?

bernt-matthias · 2020-05-03T20:51:08Z

Got it.

Problem seems to be that format_return_as_json is called with pretty=True which sorts keys in the generated json here

galaxy/lib/galaxy/web/framework/decorators.py

Line 323 in 7543b91

def format_return_as_json(rval, jsonp_callback=None, pretty=False):

pretty=True because debug is True here

galaxy/lib/galaxy/web/framework/decorators.py

Line 297 in 7543b91

rval = format_return_as_json(rval, jsonp_callback, pretty=trans.debug)

.. is this only in planemo calls?

Curious if this causes tests to fail...

One could add a tests with wrong element order ... ?

We should add docs here if this gets accepted https://docs.galaxyproject.org/en/master/dev/schema.html#tool-tests-test-output-collection

mvdbeek · 2020-05-03T21:04:54Z

lib/galaxy/web/framework/decorators.py

@@ -326,7 +326,7 @@ def format_return_as_json(rval, jsonp_callback=None, pretty=False):

    Use `pretty=True` to return pretty printed json.
    """
-    dumps_kwargs = dict(indent=4, sort_keys=True) if pretty else {}
+    dumps_kwargs = dict(indent=4) if pretty else {}


I don't think we should do this until we sunset python 3.5. You could add the index of the element here and then sort the test elements where needed. Do you maybe want to just add the xsd changes in for 20.05 ?

Your are probably right -- even if it seems safe in most places where format_return_as_json is used (pretty is either set to False or set to True only in debug mode).

The only place where its True is here:

galaxy/lib/galaxy/webapps/galaxy/api/workflows.py

Line 485 in 7eb2fcb

return format_return_as_json(ret_dict, pretty=True)

An alternative might be to add another parameter (e.g. sort_keys=False) to format_return_as_json. Tell me your preference and I will go ahead :)

I'd go with the xsd changes only, but if you want to dig deeper I think that adding the index of the element to the test definition seems more solid to me.

bernt-matthias · 2020-05-06T09:42:44Z

Done :)

Also changed the text of the Exception to something like:

Failed to find identifier '2' of test collection ['1', '10', '2'] in the tool generated collection elements ['1', '2', '10'] (at the correct position)

before the complete dict for the tool generated collection elements was dumped .. For me there was never any information of interest except for the IDs. Do you think this is OK.

bernt-matthias · 2020-05-06T09:49:14Z

Still, for <output><discovered_dataset> order is not checked. Only the first element needs to be at the correct position as far as I see.

test/functional/tools/discover_sort_by.xml

mvdbeek · 2020-09-12T07:21:41Z

lib/galaxy/tool_util/verify/interactor.py

-                template = "Failed to find identifier [%s] for testing, tool generated collection elements [%s]"
-                message = template % (element_identifier, element_objects)
+            if element is None:
+                template = "Failed to find identifier '%s' of test collection %s in the tool generated collection elements %s (at the correct position)"


~~Isn't the corrent position i ?~~
Is the correct position the index of the iteration over sorted_test_ids ?

and wouldn't this mean you can simplify the logic to

for element_index, expected_element_identifier in enumerate(sorted_test_ids): assert element_objects[element_index]["element_identifier"] == expected_element_identifier

?

(You could add some length handling for missing elements)

No, the test is just checking if the elements listed in the test occur in the same order in the generated elements.

Thereby the tool author does not need to list all elements (which might be very many).

Alright, then how about this:

for element_index, expected_element_identifier in enumerate(sorted_test_ids): if expected_element_identifier is None: break assert element_objects[element_index]["element_identifier"] == expected_element_identifier

That way it's clear at which position the test data differs, and I think the logic is a bit easier to follow ?

I don't see why this should work :( ..

The None entries from here:

galaxy/lib/galaxy/tool_util/verify/interactor.py

Line 728 in a5b898c

sorted_test_ids = [None] * len(element_tests)

should all be overwritten by the actual identifiers. I only use [None] * len(element_tests) to initialize a list of the needed length.

I extended the test in order to clarify what I meant. The tool new generates 10 elements, but the test only uses three of them to verify the sorting order. The number of generated elements is tested with the count attribute.

Formally spoken we test if the list of elements in the test is a subsequence of the generated elements.

what is element_tests ? Is that not the list of specified tests?

Its a dict, looking like this

{'1': [None, {'assert_list': [{'attributes': {'expression': '^.*$'}, 'children': [], 'tag': 'has_text_matching'}], 'checksum': None, 'compare': 'diff', 'decompress': False, 'delta': 10000, 'delta_frac': None, 'element_index': 0, 'elements': {}, 'extra_files': [], 'lines_diff': 0, 'md5': None, 'metadata': {}, 'primary_datasets': {}, 'sort': False}], '10': [None, {'assert_list': [{'attributes': {'expression': '^.*$'}, 'children': [], 'tag': 'has_text_matching'}], 'checksum': None, 'compare': 'diff', 'decompress': False, 'delta': 10000, 'delta_frac': None, 'element_index': 2, 'elements': {}, 'extra_files': [], 'lines_diff': 0, 'md5': None, 'metadata': {}, 'primary_datasets': {}, 'sort': False}], '2': [None, {'assert_list': [{'attributes': {'expression': '^.*$'}, 'children': [], 'tag': 'has_text_matching'}], 'checksum': None, 'compare': 'diff', 'decompress': False, 'delta': 10000, 'delta_frac': None, 'element_index': 1, 'elements': {}, 'extra_files': [], 'lines_diff': 0, 'md5': None, 'metadata': {}, 'primary_datasets': {}, 'sort': False}]}

Maybe calling it element_index is a bad idea if it isn't expected to match the actual element index. Something like expected_order_index maybe would be better?

I don't really have the throughput to deal with this, but maybe this change requiring the order to match should be hidden behind a profile version tag for the tool? If there wasn't a ton of tool breakage maybe it is not worth the effort, I will just update Planemo regardless.

- order of sort_comp and sort_by was wrong for sort_by in collections - sort_by was missing for data output - adds a test (though we can't check the generated order .. I manually tested with `planemo serve`)

by adding an element_index attribute to the element attributes. background: test collection elements were sorted alphabetically because: 1 they are stored in a dict which is unsorted in py<3.6 2 galaxy.web.framework.format_return_as_json dumps this dict to json with sort_keys=True (at least in debug mode which affects planemo t) dropping 2 would be sufficient for py>=3.6, since the test collection elements would be stored in insertion order in the dict.

create more elements, but still list only 3 in the test

mvdbeek · 2020-09-18T07:58:33Z

That's definitely a nice improvement, thanks a lot @bernt-matthias!

jmchilton · 2020-10-15T18:42:31Z

lib/galaxy/tool_util/parser/xml.py

        element_attrib = dict(element.attrib)
        identifier = element_attrib.pop('name', None)
        if identifier is None:
            raise Exception("Test primary dataset does not have a 'identifier'")
        element_tests[identifier] = __parse_test_attributes(element, element_attrib, parse_elements=True)
+        element_tests[identifier][1]["element_index"] = idx


This breaks YAML parsing because the general interface assumes this is here but only the XML parser adds it. This would break CWL tools, YAML tools, and Galaxy workflows (this last one breaks Planemo CI after this was published). So I'm going to refactor this addition into TestCollectionOutputDef - it will just make sure this is populated there. I hope that makes sense.

jmchilton · 2020-10-15T19:38:38Z

I think my issues are addressed in https://github.com/galaxyproject/galaxy/pull/10434/files. What do y'all think about that approach? Sorry for not catching this sooner.

mvdbeek reviewed Apr 29, 2020

View reviewed changes

test/functional/tools/discover_sort_by.xml Outdated Show resolved Hide resolved

bernt-matthias force-pushed the topic/sort_by2 branch from a98c123 to 38e05f2 Compare April 29, 2020 10:50

galaxybot added the triage label Apr 29, 2020

galaxybot added this to the 20.09 milestone Apr 29, 2020

bernt-matthias force-pushed the topic/sort_by2 branch from 38e05f2 to 020326a Compare April 30, 2020 10:33

bernt-matthias added area/documentation area/testing and removed triage labels Apr 30, 2020

bernt-matthias changed the title ~~fix docs & add test for discover_datasets sort_by~~ xsd, docs, and test for sort_by (discover_datasets) Apr 30, 2020

mvdbeek reviewed Apr 30, 2020

View reviewed changes

bernt-matthias force-pushed the topic/sort_by2 branch 2 times, most recently from 3a5d621 to 7543b91 Compare May 3, 2020 20:45

mvdbeek reviewed May 3, 2020

View reviewed changes

mvdbeek self-assigned this Sep 8, 2020

mvdbeek reviewed Sep 12, 2020

View reviewed changes

test/functional/tools/discover_sort_by.xml Outdated Show resolved Hide resolved

mvdbeek approved these changes Sep 12, 2020

View reviewed changes

mvdbeek reviewed Sep 12, 2020

View reviewed changes

bernt-matthias added 4 commits September 14, 2020 13:56

fix docs & add test for discover_datasets sort_by

77fc9a0

- order of sort_comp and sort_by was wrong for sort_by in collections - sort_by was missing for data output - adds a test (though we can't check the generated order .. I manually tested with `planemo serve`)

fix sorting of coutput collection elements in test definition

d747bf6

remove outdated comment

8b94088

bernt-matthias force-pushed the topic/sort_by2 branch from 2f72455 to 8b94088 Compare September 14, 2020 12:07

clarify test

a5b898c

create more elements, but still list only 3 in the test

mvdbeek merged commit aace792 into galaxyproject:dev Sep 18, 2020

mvdbeek added the kind/enhancement label Sep 18, 2020

jmchilton reviewed Oct 15, 2020

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

xsd, docs, and test for sort_by (discover_datasets) #9684

xsd, docs, and test for sort_by (discover_datasets) #9684

bernt-matthias commented Apr 29, 2020 •

edited

bernt-matthias commented Apr 30, 2020

mvdbeek Apr 30, 2020

bernt-matthias Apr 30, 2020

bernt-matthias Apr 30, 2020

mvdbeek Apr 30, 2020

bernt-matthias Apr 30, 2020

bernt-matthias commented May 3, 2020 •

edited

mvdbeek May 3, 2020 •

edited

bernt-matthias May 4, 2020

mvdbeek May 4, 2020

bernt-matthias commented May 6, 2020

bernt-matthias commented May 6, 2020

mvdbeek Sep 12, 2020 •

edited

mvdbeek Sep 12, 2020 •

edited

bernt-matthias Sep 14, 2020

mvdbeek Sep 14, 2020 •

edited

bernt-matthias Sep 15, 2020

mvdbeek Sep 15, 2020

bernt-matthias Sep 15, 2020

jmchilton Oct 15, 2020

mvdbeek commented Sep 18, 2020

jmchilton Oct 15, 2020

jmchilton commented Oct 15, 2020

xsd, docs, and test for sort_by (discover_datasets) #9684

xsd, docs, and test for sort_by (discover_datasets) #9684

Conversation

bernt-matthias commented Apr 29, 2020 • edited

bernt-matthias commented Apr 30, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bernt-matthias commented May 3, 2020 • edited

mvdbeek May 3, 2020 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bernt-matthias commented May 6, 2020

bernt-matthias commented May 6, 2020

mvdbeek Sep 12, 2020 • edited

Choose a reason for hiding this comment

mvdbeek Sep 12, 2020 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mvdbeek Sep 14, 2020 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mvdbeek commented Sep 18, 2020

Choose a reason for hiding this comment

jmchilton commented Oct 15, 2020

bernt-matthias commented Apr 29, 2020 •

edited

bernt-matthias commented May 3, 2020 •

edited

mvdbeek May 3, 2020 •

edited

mvdbeek Sep 12, 2020 •

edited

mvdbeek Sep 12, 2020 •

edited

mvdbeek Sep 14, 2020 •

edited