
Semantic validation #51

Closed
14 tasks done
bittremieux opened this issue Oct 25, 2018 · 6 comments

Comments

@bittremieux
Collaborator

bittremieux commented Oct 25, 2018

The following checks have to be performed after the syntactic validation, because they fall outside what the JSON Schema can express:

  • Check that the name and accession (and unit) references to CV elements in qualityParameters correspond to the information in the CV.
  • Check that all cvRefs link to valid cvs in the file.
  • Check that qualityParameters are unique within a run/setQuality.
  • Check that filenames are unique within a run/setQuality (see Intra-document references #50).
  • Check that multi-file metrics refer to existing filenames.
  • Check that the dimensions of multi-dimensional metrics are consistent.
  • Check that, if an ID metric is used, an ID input file is listed in the files section.
  • Check that all unit accessions are defined for data frames / tables.
  • No metric duplicates within a runQuality/setQuality.
  • Label (metadata) must be unique in the file.
  • All columns in tables must have the same length.
  • Unit MUST be present if specified in the CV for the metric.
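
A few of these checks can be sketched in plain Python on a deserialised file. This is only an illustration, not the pymzqc implementation: the key names (`qualityMetrics`, `accession`, `value`, `metadata`, `label`) and the idea that a table is a dict of column name to list are assumptions about the file layout, and the function names are made up.

```python
from collections import Counter


def duplicate_metrics(quality):
    """Return accessions used more than once within one run/setQuality."""
    counts = Counter(m["accession"] for m in quality.get("qualityMetrics", []))
    return sorted(acc for acc, n in counts.items() if n > 1)


def ragged_tables(quality):
    """Return accessions of table metrics whose columns differ in length."""
    bad = []
    for metric in quality.get("qualityMetrics", []):
        value = metric.get("value")
        # Assumption: a table value is a non-empty dict of column name -> list.
        is_table = (
            isinstance(value, dict)
            and bool(value)
            and all(isinstance(col, list) for col in value.values())
        )
        if is_table and len({len(col) for col in value.values()}) > 1:
            bad.append(metric["accession"])
    return bad


def duplicate_labels(qualities):
    """Return metadata labels shared by more than one run/setQuality."""
    counts = Counter(q["metadata"]["label"] for q in qualities)
    return sorted(label for label, n in counts.items() if n > 1)
```

Each function returns the offending accessions or labels, so an empty list means the check passed.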

Warnings:

  • If a non-public controlled vocabulary is specified.
  • Metric names should match the definition in the CV.

Please add any other checks that I might have missed at the moment.

@bittremieux
Collaborator Author

See pymzqc for a semantic validator for mzQC files (WIP).

@bittremieux
Collaborator Author

bittremieux commented May 4, 2022

Checks from our Slack channel:

So far collected rules for semantic validation:

  • must: no metric duplicates within a runQuality/setQuality
  • must: label (metadata) must be unique in the file
  • must: all columns in tables have same length
  • must: cv in file and obo must match in id, name, type, ...

any 'may' rules??

Non-semantic checks:

  • automatic CV checks (be)for(e) cv integration ???
  • test that multiple 'root' elements are rejected as invalid during syntax validation

@cbielow
Collaborator

cbielow commented May 4, 2022

more from slack

  • Reviewer comment: What happens when a cvParameter from an unknown ontology is used?
  • Somewhat similarly: check that the term actually exists in the CV.
  • Unit MUST be present if specified in the CV for the metric.
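
The three points above could be checked roughly as follows. This is a minimal sketch under stated assumptions: the ontology is represented as a plain dict from accession to a record with `name` and `requires_unit` fields (a hypothetical layout, not how an OBO file is actually parsed), and the function name and issue strings are made up for illustration.

```python
def check_terms(metrics, ontology):
    """Compare the CV terms used in a file against a loaded ontology.

    metrics:  list of dicts with "accession", "name" and optionally "unit".
    ontology: dict accession -> {"name": str, "requires_unit": bool}
              (hypothetical in-memory layout of the CV).
    Returns a list of (accession, problem) tuples.
    """
    issues = []
    for metric in metrics:
        accession = metric["accession"]
        term = ontology.get(accession)
        if term is None:
            # Term (or its whole ontology) is unknown to the validator.
            issues.append((accession, "unknown term"))
            continue
        if metric.get("name") != term["name"]:
            issues.append((accession, "name differs from CV"))
        if term.get("requires_unit") and not metric.get("unit"):
            issues.append((accession, "unit missing"))
    return issues
```

Whether an unknown term should be an error or only a warning is left open here, in line with the reviewer's question.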

@mwalzer
Collaborator

mwalzer commented Jan 25, 2024

Here's what the validator auto-doc/API documentation provides (which is all the above afaik):
The following aspects of mzQC are validated as described in the respective section.


schema validation

The value of the 'schema validation' key is the parsed result of the JSON Schema validation of the given file, using the current schema (unless stated otherwise).

semantic validation

The value of the 'semantic validation' key is an array of checks performed on the deserialised mzQC object according to the latest specification. The checks are the following:

'input files':

  • Inconsistent input file of severity 4 and message: Inconsistent file name and location: auto_doc
  • Reused file location of severity 6 and message: Duplicate inputFile locations within a metadata object: accession = auto_doc
  • Duplicate input files of severity 5 and message: Duplicate input files in a run/set: accession = auto_doc

'metric use':

  • ID based metric but no ID input file of severity 6 and message: ID based metrics present but no ID input file could be found registered in the mzQC file: accession = auto_doc
  • Metric uniqueness of severity 6 and message: Duplicate quality metric in a run/set: accession = auto_doc
  • Metric use of severity 5 and message: Non-metric CV term used in metric context: accession = auto_doc
  • Metric value non-table of severity 6 and message: Table metric CV term used without being a table: accession = auto_doc
  • Metric value non-column of severity 6 and message: Table metric CV term used with non-column elements: accession = auto_doc
  • Metric value disproportional table of severity 9 and message: Table metric CV term used with differing column lengths: accession = auto_doc
  • Metric value missing table column of severity 8 and message: Table metric CV term used missing required column(s): accession(s) = auto_doc
  • Metric value undefined table column of severity 5 and message: Table metric CV term used with extra (undefined) columns: accession(s) = auto_doc
  • Metric value no-unit of severity 3 and message: Metric CV term used without value unit specification. accession(s) = auto_doc

'ontology load errors':

  • Loading local vocabulary of severity 5 and message: Loading the following local ontology referenced in mzQC file: auto_doc
  • Loading online vocabulary of severity 5 and message: Error loading the following online ontology referenced in mzQC file: auto_doc

'ontology term errors':

  • Ambiguous CVTerms of severity 6 and message: term found in multiple vocabularies = auto_doc
  • Unknown CVTerm of severity 7 and message: CV term used without matching ontology entry: accession = auto_doc
  • Used CVTerm without definition of severity 4 and message: Term instance used in file missing definition: accession = auto_doc
  • Used CVTerms definition conflict of severity 5 and message: Term instance used in file with definition different from ontology: accession = auto_doc
  • Used CVTerms name conflict of severity 6 and message: Term instance used in file with name different from ontology: accession = auto_doc

'label uniqueness':

  • Metadata labels of severity 6 and message: Run/SetQuality label auto_doc is not unique in file!

API doc

This is the response to the API call for documentation. The API call for status returns a JSON object summarising the API status and listing the endpoints. The API call for the validator, with a POST of an mzQC JSON object, responds with a JSON object nested by validation mode: semantic validation and schema validation. For each mode, the value is a list of validation items found not to (completely) conform to the standard format.

@mwalzer mwalzer closed this as completed Jan 25, 2024
@bittremieux
Collaborator Author

I guess higher is worse for severity? 9 levels is pretty detailed, maybe we could even do with just 3? Warning, error, critical in analogy to the Python logging levels.
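
Something like the following would do the collapsing. It is just a sketch: it assumes that higher severity is indeed worse, and the cut-offs (1-3, 4-6, 7-9) and the function name are arbitrary choices for illustration, not anything the validator defines.

```python
import logging


def to_logging_level(severity):
    """Collapse a 1-9 severity into WARNING / ERROR / CRITICAL.

    Assumes higher means worse; the cut-offs are illustrative only.
    """
    if not 1 <= severity <= 9:
        raise ValueError(f"severity out of range: {severity}")
    if severity <= 3:
        return logging.WARNING
    if severity <= 6:
        return logging.ERROR
    return logging.CRITICAL
```

With these cut-offs, for example, the "differing column lengths" check (severity 9) would become critical and the "without value unit specification" check (severity 3) a warning.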

@mwalzer
Collaborator

mwalzer commented Jan 25, 2024

Would work for me
