Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix MetadataValidator #13139

Merged
merged 14 commits into from
Jan 26, 2022
Merged

Conversation

bernt-matthias
Copy link
Contributor

@bernt-matthias bernt-matthias commented Jan 10, 2022

If no check or skip attribute is given, self.check and self.skip need to be initialized to None. Otherwise validation with value.missing_meta(check=self.check, skip=self.skip) is called with value.missing_meta(check=[""], skip=[""]).

This led to the stalling observed in #13136

Explanation: each data input parameter automatically gets a MetadataValidator. Since this validator was dysfunctional the job was scheduled anyway, but did not execute.

Appears that the validator might have been dysfunctional for a long time (since check and skip have been introduced here: bernt-matthias@b9e242e)

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

If no check or skip is given, self.check and self.skip
need to be initialised to None. Otherwise validation with
`value.missing_meta(check=self.check, skip=self.skip)`
is called with `value.missing_meta(check=[""], skip=[""])`.

This led to the stalling observed in
galaxyproject#13136

Explanation: each data input parameter automatically gets
a MetadataValidator. Since this validator was dysfunctional
the job was scheduled anyway, but did not execute.
@github-actions github-actions bot added this to the 22.01 milestone Jan 10, 2022
@mvdbeek
Copy link
Member

mvdbeek commented Jan 10, 2022

Can you add a test here ?

@bernt-matthias
Copy link
Contributor Author

Can you add a test here ?

Also thought that this would be a good idea. Will try.

disable the MetadataValidator that is added to data params
by default by adding `no_validation="true"` to the test parameters.
(see https://github.com/galaxyproject/galaxy/blob/bc939ca4e0f0404a20d2cd3fbcc922768c195c08/lib/galaxy/tools/parameters/basic.py#L1898)

Since the default is initialised with MetadataValidator() the message
was None. Now message is also initialised in __init__.

Analoguous change in DatasetOkValidator, UnspecifiedBuildValidator.
@nsoranzo
Copy link
Member

You can replace all the no_validation="true" with a better fix:

diff --git a/lib/galaxy/datatypes/data.py b/lib/galaxy/datatypes/data.py
index f7bc205d94..f3862a60e4 100644
--- a/lib/galaxy/datatypes/data.py
+++ b/lib/galaxy/datatypes/data.py
@@ -240,7 +240,7 @@ class Data(metaclass=DataMeta):
         for key, value in to_check:
             if key in skip or (not check and dataset.metadata.spec[key].get("optional")):
                 continue  # we skip check for optional and nonrequested values here
-            if not value:
+            if value is None:
                 return True
         return False
 

Values like 0, "" or [] for a metadata should be valid.

@bernt-matthias
Copy link
Contributor Author

You can replace all the no_validation="true" with a better fix:

Yes :) I just realized this as well. Not sure yet if I will keep it anyway, because removing the default metadata validator makes debugging the tests a bit easier ..

Values like 0, "" or [] for a metadata should be valid.

Hah, just stumbled over the same point. Intuitively I thought that we should check if the value is the no_value provided when initializing the MetadataElement. But it seems that this is not useful in some cases. For instance TabularData has comment_lines with no_value="0", i.e. we can not distinguish the cases where the comment_lines metadata has not been set from the case where it has been set to 0.

the `missing_meta` function skipped optional metadata if `skip`
was provided. Note skip only makes sense for optional metadata
since non-optional metadata is validated by the default metadata
validator (which uses check=None and skip=None).

In `missing_meta`: allow for 0, "", and None as valid metadata.

Fix tests for metadata validator:

- use test-data from GALAXY_ROOT/test-data/
- use real (optional) metadata for check and skip

Co-authored-by: Nicola Soranzo <nicola.soranzo@earlham.ac.uk>
@nsoranzo
Copy link
Member

You can replace all the no_validation="true" with a better fix:

Yes :) I just realized this as well. Not sure yet if I will keep it anyway, because removing the default metadata validator makes debugging the tests a bit easier ..

I'd not use no_validation (apart from the fact that it has a misleading name, it should be no_metadata_validation) so that issues like these are not missed.

@bernt-matthias
Copy link
Contributor Author

Values like 0, "" or [] for a metadata should be valid.

Spent quite a bit of time thinking about this. While formally correct the statement is not specific enough to ensure if a metadata element is set or not. In my test the tool did still stall, i.e. the failing set_meta in the upload job was not detected in the subsequent step.

I think the it might be the best to check if the value is equal to no_value that has been introduced here: 584999c

Unfortunately, apart from the commit message of the original commit no_value has not been documented and seems to be misused in some/many datatypes, e.g. there are metadata that have identical no_value and default which makes it impossible to check if metadata has been set. I estimate that there are approx 180 metadata elements in the sources that we would need to check...

Essentially we should "ensure" (at least in the datatype definitions) that no_value is a value that can not be set by the set_meta function of the corresponding data type (we could be really strict and check this in the setter or relaxed and just fix the metdata definitions of the data types). But I'm not sure about valid combinations of default, no_value, and optional.

Besides this I was wondering if the test meta_val != meta_spec.no_value in element_is_set is correct given that both values can be None.

I was also wondering if it wouldn't be better to fail the upload job if metadata setting fails.

bernt-matthias and others added 4 commits January 10, 2022 22:49
Co-authored-by: Nicola Soranzo <nicola.soranzo@gmail.com>
using the element_is_set function
bernt-matthias and others added 6 commits January 11, 2022 11:10
Co-authored-by: Nicola Soranzo <nicola.soranzo@gmail.com>
- make missing_meta return missing metadata keys / False
- include missing metadata name in MetadataValidator error
- proper messages for failing negated MetadataValidator
- reuse item getter in element_is_set
because it clearly is optional (in particular because default == no_value)
biom2 sets either format of format_version. since the metadata elements
are not optional we should just set both to the same value
Copy link
Member

@mvdbeek mvdbeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this ready ?

@bernt-matthias
Copy link
Contributor Author

With an approval of @nsoranzo yes :)

@mvdbeek mvdbeek added kind/bug kind/enhancement kind/refactoring cleanup or refactoring of existing code, no functional changes labels Jan 25, 2022
@mvdbeek mvdbeek merged commit 4df0914 into galaxyproject:dev Jan 26, 2022
@mvdbeek mvdbeek changed the title fix MetadataValidator Fix MetadataValidator Feb 15, 2022
@bernt-matthias bernt-matthias deleted the fix/metadata-validation branch March 14, 2022 15:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/tool-framework kind/bug kind/enhancement kind/refactoring cleanup or refactoring of existing code, no functional changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants