Switch to rules tables in Validate #780

Merged
merged 2 commits into master from validate-config on Jan 21, 2021

Conversation

beckyjackson
Contributor

@beckyjackson beckyjackson commented Nov 30, 2020

Resolves #771

  • docs/ have been added/updated
  • tests have been added/updated
  • mvn verify says all tests pass
  • mvn site says all JavaDocs correct
  • CHANGELOG.md has been updated

Overview

Validates tables (CSV or TSV files, passed with --table) against a reasoned input ontology (--input) using the sets of rules defined in a rules table (--rules). The reasoner is specified by the --reasoner option, which defaults to ELK. We recommend the HermiT reasoner, as it supports generalized class expression queries. The command writes its output as TXT, HTML, or XLSX files in the output directory (--output-dir), each with the same base filename as the table it reports on. If no output format is specified, the output is directed to STDOUT. For example:

robot validate --input immune_exposures.owl \
  --rules immune_exposures_rules.csv \
  --table immune_exposures.csv \
  --reasoner hermit \
  --no-fail true \
  --format TXT \
  --output-dir results/

In this case the command will generate a single file called immune_exposures.txt in the results/ directory.

One can also specify multiple table files to validate against a single input ontology. In that case there will be multiple output files corresponding to each table in the output directory. For example:

robot validate --input immune_exposures.owl \
  --rules immune_exposures_rules.csv \
  --table immune_exposures.csv \
  --table immune_exposures_2.csv \
  --reasoner hermit \
  --no-fail true \
  --format HTML \
  --output-dir results/

In this case two files, immune_exposures.html and immune_exposures_2.html, will appear in the results/ directory.
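
The naming convention in the examples above (immune_exposures.csv with --format TXT becomes results/immune_exposures.txt) can be sketched in Python. This is only an illustration: output_path is a hypothetical helper, not part of ROBOT, and lowercasing the format name to get the file extension is an assumption based on the two examples.

```python
from pathlib import PurePosixPath


def output_path(table, fmt, output_dir):
    """Return <output_dir>/<base name of table>.<fmt in lower case>.

    Hypothetical helper: the extension rule is inferred from the examples,
    not taken from ROBOT's source.
    """
    base = PurePosixPath(table).stem  # "immune_exposures.csv" -> "immune_exposures"
    return str(PurePosixPath(output_dir) / f"{base}.{fmt.lower()}")


for table in ["immune_exposures.csv", "immune_exposures_2.csv"]:
    print(output_path(table, "HTML", "results/"))
```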

Finally, you can specify a whole directory of tables (--tables) to validate against a single input ontology. The command will generate text outputs, named after the tables, in the results/ directory. Note that a table will only be validated if it appears in the rules table.

robot validate --input immune_exposures.owl \
  --rules immune_exposures_rules.csv \
  --tables tables/ \
  --reasoner hermit \
  --format TXT \
  --output-dir results/
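
The note that a table is only validated if it appears in the rules table amounts to a filter over the files in the directory. A rough Python sketch follows. It assumes, without confirmation from the source, that the table column of the rules table holds the base filename without its extension; tables_to_validate is a hypothetical name, not a ROBOT function.

```python
from pathlib import PurePosixPath


def tables_to_validate(table_files, rule_table_names):
    """Keep only the table files that are named in the rules table.

    Assumption: rule_table_names holds base filenames without extensions.
    """
    names = set(rule_table_names)
    return [f for f in table_files if PurePosixPath(f).stem in names]


# Hypothetical directory listing; "other.tsv" has no rules, so it is skipped.
files = ["tables/immune_exposures.csv", "tables/other.tsv"]
print(tables_to_validate(files, ["immune_exposures"]))
```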

Rules Table

The rules table contains the validation rules. This table should be either TSV or CSV and must have three columns:

  • table: name of table to validate
  • column: name of column to validate
  • validation: the validation rule (see Validation rule syntax)

You can include as many extra columns as you would like (e.g., comments), but these columns will be ignored.
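
To make the layout concrete, here is a minimal Python sketch of reading such a table: it checks for the three required columns, ignores any extra ones, and groups the rules by table and column. The sample row, the table name immune_exposures, and the function name load_rules are illustrative assumptions; ROBOT's own parser may behave differently.

```python
import csv
import io

REQUIRED_COLUMNS = ("table", "column", "validation")


def load_rules(text, delimiter=","):
    """Parse rules-table text into {table: {column: [validation, ...]}}."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        raise ValueError("rules table is missing column(s): " + ", ".join(missing))
    rules = {}
    for row in reader:
        # Extra columns (e.g. "comment") are present in row but never read.
        by_column = rules.setdefault(row["table"], {})
        by_column.setdefault(row["column"], []).append(row["validation"])
    return rules


# Hypothetical rules table with one extra, ignored column.
SAMPLE_RULES = (
    "table,column,validation,comment\n"
    "immune_exposures,exposure material id,"
    "is-required (when %1 equivalent-to ('Dengue virus' or 'Dengue virus 2')),"
    "hypothetical comment\n"
)

print(load_rules(SAMPLE_RULES))
```

For a TSV rules table, the same sketch would be called with delimiter="\t".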

...

Substitution

Special variables can be specified within <value>, <when-value>, and <when-subject-expr> clauses to indicate another entity described in the data. There are two types of variables, column names and wildcards, and they can be used interchangeably.

Column Names

Column names of the form {column name} are used to indicate the entity described by the data in the cell of that column in a given row. E.g.:

is-required (when {exposure material reported} equivalent-to ('Dengue virus' or 'Dengue virus 2'))

requires data in the current cell whenever the class indicated in the "exposure material reported" column of the current row is either 'Dengue virus' or 'Dengue virus 2'.

subclass-of hasBasisIn in some {exposure material id} (when {exposure material reported} subclass-of ('Dengue virus' or 'Dengue virus 2'))

requires that, whenever the class indicated in "exposure material reported" of the current row is a subclass of the class consisting of the union of 'Dengue virus' and 'Dengue virus 2', the data in the current cell must be a subclass of the set of classes that bear the relation hasBasisIn to the class indicated in column "exposure material id" of the same row.
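
As a rough illustration of column-name substitution, the Python sketch below replaces each {column name} in a rule with the value found in that column of the current row. It is not ROBOT's implementation: substitute_columns is a hypothetical helper, and wrapping the substituted value in single quotes (so that labels containing spaces stay intact) is an assumption.

```python
import re


def substitute_columns(rule, row):
    """Replace every {column name} in rule with the quoted cell value from row."""

    def replace(match):
        name = match.group(1)
        if name not in row:
            raise KeyError(f"rule refers to unknown column: {name}")
        # Assumption: values are single-quoted, like the labels in the rule.
        return "'" + row[name] + "'"

    return re.sub(r"\{([^{}]+)\}", replace, rule)


rule = (
    "is-required (when {exposure material reported} "
    "equivalent-to ('Dengue virus' or 'Dengue virus 2'))"
)
row = {"exposure material reported": "Dengue virus 2"}  # hypothetical row
print(substitute_columns(rule, row))
# prints: is-required (when 'Dengue virus 2' equivalent-to ('Dengue virus' or 'Dengue virus 2'))
```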

Wildcards

Wildcards of the form %n are used to indicate the entity described by the data in the nth cell of a given row. E.g.:

is-required (when %1 equivalent-to ('Dengue virus' or 'Dengue virus 2'))

requires data in the current cell whenever the class indicated in column 1 of the current row is either 'Dengue virus' or 'Dengue virus 2'.

subclass-of hasBasisIn in some %2 (when %1 subclass-of ('Dengue virus' or 'Dengue virus 2'))

requires that, whenever the class indicated in column 1 of the current row is a subclass of the class consisting of the union of 'Dengue virus' and 'Dengue virus 2', the data in the current cell must be a subclass of the set of classes that bear the relation hasBasisIn to the class indicated in column 2 of the same row.
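
Wildcard substitution can be sketched the same way in Python. The snippet assumes that %n counts cells from 1, as the wording "column 1" above suggests, and that substituted values are wrapped in single quotes; both the helper name substitute_wildcards and the row contents are hypothetical and not taken from ROBOT.

```python
import re


def substitute_wildcards(rule, cells):
    """Replace every %n in rule with the quoted value of the nth cell (1-based)."""

    def replace(match):
        n = int(match.group(1))
        if not 1 <= n <= len(cells):
            raise ValueError(f"%{n} is out of range for a row with {len(cells)} cells")
        # Assumption: values are single-quoted, like the labels in the rule.
        return "'" + cells[n - 1] + "'"

    return re.sub(r"%(\d+)", replace, rule)


rule = "is-required (when %1 equivalent-to ('Dengue virus' or 'Dengue virus 2'))"
cells = ["Dengue virus", "Dengue virus 2"]  # placeholder row contents
print(substitute_wildcards(rule, cells))
# prints: is-required (when 'Dengue virus' equivalent-to ('Dengue virus' or 'Dengue virus 2'))
```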

@beckyjackson beckyjackson changed the title from "Switch to rules tables" to "Switch to rules tables in Validate" Nov 30, 2020
Member

@jamesaoverton jamesaoverton left a comment

After giving this a lot of thought, I've decided that we won't move forward with robot validate at this time. We put a lot of work into it, and it basically works. But the performance is not good enough for our use cases, so we've been using other tools instead. Since we aren't actively using this, I'm not confident in the design. I don't want to be answering user questions and trying to debug a ROBOT command that I'm not actually using. Worse, having this code on master but incomplete is making it harder to merge and release other code.

I do want to be able to find this code later, however.

So the plan is:

  1. merge this PR
  2. add a tag to the repo so we can find this code later
  3. delete the validate code
  4. get on with our lives

@jamesaoverton jamesaoverton added this to the v1.8.0 milestone Jan 20, 2021
@beckyjackson beckyjackson merged commit a4d7900 into master Jan 21, 2021
@beckyjackson beckyjackson deleted the validate-config branch June 30, 2021 03:19

Successfully merging this pull request may close these issues.

Stop using a header row for validate, use a configuration table instead
2 participants