-
Notifications
You must be signed in to change notification settings - Fork 991
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[22.01] Limit required element validation to safe elements #13299
[22.01] Limit required element validation to safe elements #13299
Conversation
There used to be a long-standing bug in setting up the MetadataValidator if default values were used. We fixed this in galaxyproject@e194ef9, but that means we're now checking for all non-optional values before running a tool. We have a ton of non-optional MetadatElement items in datatypes that should maybe be optional (an indication might be if `default` and `no_value` are specified and set to the same value ... but I'm not sure that's a 100% thing). So I think that reviewing this requires domain knowledge of the datatypes and what elements are really required, and I'm not sure we can do this in a timely fashion, and not break something that used to work. So my suggestion is that we add `check_required_metadata=True` on datatypes for which we have checked that non-optional metadata elements are really non-optional. For those metadata elements for which this is not the case we skip the validation as we would do prior to galaxyproject@e194ef9. As an example I have marked RDS and RData with check_required_metadata and added a test for check_required_metadata.
9418379
to
370d343
Compare
Given the problems with getting an understanding what But I'm afraid that this will cement the current state of the art, as long as we do not have a plan how to achieve a good state. This should be that all required metadata is checked again. Maybe we can add a deprecation date to it (e.g. 1 or 2 releases)? Might it be an alternative to check the dozen cases identified here: galaxyproject/tools-iuc#4367 (comment) (but we might need a better grep because the definition of some metadataelements is broken over multiple lines) I guess it might be save to set the metadataelements to optional in case that they are not touched by |
Do we really need a plan ? This feature has been broken ... maybe from the start ? We didn't have any bug reports specifically about this. It feels like we're potentially shooting ourselves in the foot for little gain. If there's a good usecase that would require all datatypes to handle optionality correctly I would maybe revise my position, but I don't see it now. Mind you, for some specific datatypes this would makes sense, like in the BAM datatype we could say that the index is required and if it's not there you can't run tools on the dataset. But we handle this now by saying that setting metadata failed ... so, even for this case (which this PR gives us the option to do) the gain is small.
I think you're overestimating the coverage of datatypes that the IUC gives us, and as you correctly identified, we can never test datatypes that are |
You are probably right, but Anyway, if we do it I would argue pro documentation.
Note, that the list is not based on the results on the IUC tests, but a simple grep in the sources. And I'm optimistic that those coincide with the failing tool tests. But I just don't have the time to look in detail.
This is not entirely true unless we merge #12073 .. at the moment we can test anything in planemo tool tests .. as far as I got it Galaxy does not check the datatype of data parameters :) But I guess metadata will be completely screwed up in these cases. So probably irrelevant. |
Of course it is redundant, but optional can't be trusted, that's the point of the PR. |
So in Maybe this is an argument for your change .. because the old tool will stop working once we got strict metadata checking. Or even better: we could hide strict metadata checking behind a profile version? |
defintion. Fixes ``` 2022-02-03 12:40:03,491 ERROR [galaxy.web.framework.decorators] Uncaught exception in exposed API method: Traceback (most recent call last): File "/tmp/tmp778b108w/galaxy-dev/lib/galaxy/web/framework/decorators.py", line 320, in decorator rval = func(self, trans, *args, **kwargs) File "/tmp/tmp778b108w/galaxy-dev/lib/galaxy/webapps/galaxy/api/tools.py", line 563, in create return self._create(trans, payload, **kwd) File "/tmp/tmp778b108w/galaxy-dev/lib/galaxy/webapps/galaxy/api/tools.py", line 631, in _create vars = tool.handle_input(trans, incoming, history=target_history, use_cached_job=use_cached_job, input_format=input_format) File "/tmp/tmp778b108w/galaxy-dev/lib/galaxy/tools/__init__.py", line 1767, in handle_input all_params, all_errors, rerun_remap_job_id, collection_info = self.expand_incoming(trans=trans, incoming=incoming, request_context=request_context, input_format=input_format) File "/tmp/tmp778b108w/galaxy-dev/lib/galaxy/tools/__init__.py", line 1747, in expand_incoming populate_state(request_context, self.inputs, expanded_incoming, params, errors, simple_errors=False, input_format=input_format) File "/tmp/tmp778b108w/galaxy-dev/lib/galaxy/tools/parameters/__init__.py", line 348, in populate_state _populate_state_legacy(request_context, inputs, incoming, state, errors=errors, context=context, check=check, simple_errors=simple_errors) File "/tmp/tmp778b108w/galaxy-dev/lib/galaxy/tools/parameters/__init__.py", line 408, in _populate_state_legacy state[input.name] = input.get_initial_value(request_context, context) File "/tmp/tmp778b108w/galaxy-dev/lib/galaxy/tools/parameters/basic.py", line 1001, in get_initial_value options = list(self.get_options(trans, other_values)) File "/tmp/tmp778b108w/galaxy-dev/lib/galaxy/tools/parameters/basic.py", line 890, in get_options return self.options.get_options(trans, other_values) File "/tmp/tmp778b108w/galaxy-dev/lib/galaxy/tools/parameters/dynamic_options.py", line 763, in get_options rval = filter.filter_options(rval, trans, other_values) File "/tmp/tmp778b108w/galaxy-dev/lib/galaxy/tools/parameters/dynamic_options.py", line 207, in filter_options if not r.metadata.element_is_set(self.key): File "/tmp/tmp778b108w/galaxy-dev/lib/galaxy/model/metadata.py", line 175, in element_is_set meta_spec = self.parent.metadata.spec[name] KeyError: 'groups' ``` which happens when forcing a tabular dataset into a more specialized input https://github.com/galaxyproject/tools-iuc/blob/e22821f51ae326b3696a9456cbc44f9982512a52/tools/mothur/homova.xml#L48 Arguably this is a tool issue, but we already log a KeyError above and it seems more of a linting issue than something that should cause a test failure.
e5c9ec1
to
f7eaa0a
Compare
@@ -243,7 +243,10 @@ def missing_meta(self, dataset, check=None, skip=None): | |||
continue | |||
if not check and len(skip) == 0 and dataset.metadata.spec[key].get("optional"): | |||
continue # we skip check for optional and nonrequested values here | |||
if not dataset.metadata.element_is_set(key): | |||
if not dataset.metadata.element_is_set(key) and (check or dataset.metadata.spec[key].check_required_metadata): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I meant is to keep the strict check, but add a profile check. So metadata will be checked only for tools with profile >= 22.01. Then wenn do not need the additional variable check_required_metadata and old tools will still run.
In #13296 we can then take our time and document it the way we want to have this from 22.01.
Or we just go for the check and doc for 22.05...
Clearly adding a profile seems quite an effort to add to the approx 5 places where the function is called. But in my opinion this is the better way forward.
Rdata always has a version, but the tool uses the wrong datatype. So making it optional seems wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But then that breaks runs on existing data.
Let's move ahead here so we can deploy on .org and start the release testing. |
There used to be a long-standing bug in setting up the
MetadataValidator
if default values were used. We fixed this ine194ef9
(#13139), but that means we're now checking for all non-optional values before running a tool. We have a ton of non-optionalMetadatElement
items in datatypes that should maybe be optional (an indication might be ifdefault
andno_value
are specified and set to the same value ... but I'm not sure that's a 100% thing). So I think that reviewing this requires domain knowledge of the datatypes and what elements are really required, and I'm not sure we can do this in a timely fashion, and not break something that used to work.So my suggestion is that we add
check_required_metadata=True
on datatypes for which we have checked that non-optional metadata elements are really non-optional. For those metadata elements for which this is not the case we skip the validation as we would do prior toe194ef9
(#13139).As an example I have marked RDS and RData with
check_required_metadata
and added a test forcheck_required_metadata
.How to test the changes?
(Select all options that apply)
License