Switch to rules tables in Validate #780
Merged
Resolves #771
- `docs/` have been added/updated
- `mvn verify` says all tests pass
- `mvn site` says all JavaDocs correct
- `CHANGELOG.md` has been updated

Overview
Validates tables (CSV or TSV files) (`--table`) against a reasoned input ontology (`--input`) using the sets of rules defined in a rules table (`--rules`). The reasoner used is specified by the `--reasoner` option, which defaults to ELK. We recommend using the HermiT reasoner, as it supports generalized class expression queries. This command writes the output to TXT, HTML, or XLSX files in the output directory (`--output-dir`) with the same base filename. If no output format is specified then the output is directed to STDOUT. For example:

In this case the command will generate a single file called `immune_exposures.txt` in the `results/` directory.

One can also specify multiple table files to validate against a single input ontology. In that case there will be multiple output files corresponding to each table in the output directory. For example:

In this case two files, `immune_exposures.html` and `immune_exposures_2.html`, will appear in the `results/` directory.

Finally, you can specify a whole directory to validate against a single input ontology. The command will generate the text outputs matching all table names in the `results/` directory. Note that a table will only be validated if it appears in the rules table.

Rules Table
The rules table contains the validation rules. This table should be either TSV or CSV and must have three required columns:

- `table`: name of table to validate
- `column`: name of column to validate
- `validation`: the validation rule (see Validation rule syntax)

You can include as many extra columns as you would like (e.g., comments), but these columns will be ignored.
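As a rough illustration of the layout described above, the following Python sketch reads a rules table, checks for the three required columns, ignores any extra columns, and groups the rules by table name. It is not the code used by the command itself: the function name `load_rules`, the sample rows, and the placeholder strings `<rule 1>` and `<rule 2>` are assumptions made for this example only.

```python
import csv
import io
from collections import defaultdict

# The three required column names come from the description above.
REQUIRED_COLUMNS = ("table", "column", "validation")


def load_rules(text, delimiter="\t"):
    """Group (column, validation) pairs by table name; extra columns are ignored.

    Hypothetical helper for illustration only.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    fieldnames = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in fieldnames]
    if missing:
        raise ValueError(f"rules table is missing required columns: {missing}")
    rules = defaultdict(list)
    for row in reader:
        rules[row["table"]].append((row["column"], row["validation"]))
    return dict(rules)


# Sample TSV rules table; "<rule 1>" and "<rule 2>" stand in for real rules.
sample = (
    "table\tcolumn\tvalidation\tcomment\n"
    "immune_exposures\texposure material reported\t<rule 1>\tignored\n"
    "immune_exposures\texposure material id\t<rule 2>\t\n"
)
rules = load_rules(sample)
```

Under this sketch, a table whose name is not a key of `rules` has no rules attached to it, which mirrors the note that a table is only validated if it appears in the rules table.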
...
Substitution
Special variables can be specified within `<value>`, `<when-value>`, and `<when-subject-expr>` clauses to indicate another entity described in the data. The two types of variables, column names and wildcards, can be used interchangeably.

Column Names
Column names of the form `{column name}` are used to indicate the entity described by the data in the cell of that column in a given row. E.g.:

requires data in the current cell whenever the class indicated in the "exposure material reported" column of the current row is either 'Dengue virus' or 'Dengue virus 2'.

requires that, whenever the class indicated in "exposure material reported" of the current row is a subclass of the class consisting of the union of `'Dengue virus'` and `'Dengue virus 2'`, the data in the current cell must be a subclass of the set of classes that bear the relation `hasBasisIn` to the class indicated in column "exposure material id" of the same row.

Wildcards
Wildcards of the form `%n` are used to indicate the entity described by the data in the nth cell of a given row. E.g.:

requires data in the current cell whenever the class indicated in column 1 of the current row is either 'Dengue virus' or 'Dengue virus 2'.

requires that, whenever the class indicated in column 1 of the current row is a subclass of the class consisting of the union of `'Dengue virus'` and `'Dengue virus 2'`, the data in the current cell must be a subclass of the set of classes that bear the relation `hasBasisIn` to the class indicated in column 2 of the same row.
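
To make the two kinds of variables concrete, here is a minimal Python sketch of the substitution step, treating it as plain string replacement. It is an illustration under stated assumptions, not the command's implementation: the helper name `substitute`, the sample cell values, and the clause text are hypothetical, and `%n` is assumed to be 1-based, as the references to "column 1" and "column 2" above suggest.

```python
import re


def substitute(clause, header, row):
    """Replace {column name} and %n variables with cells from `row`.

    Hypothetical helper: the real command resolves these variables to
    ontology entities; this sketch only performs text replacement.
    """
    by_name = dict(zip(header, row))

    def column_name(match):
        # {column name} -> the cell under that column in the current row
        return by_name[match.group(1)]

    def wildcard(match):
        # %n -> the nth cell of the current row (assumed 1-based)
        return row[int(match.group(1)) - 1]

    clause = re.sub(r"\{([^{}]+)\}", column_name, clause)
    return re.sub(r"%(\d+)", wildcard, clause)


# Column names are taken from the examples above; the cell values are made up.
header = ["exposure material reported", "exposure material id"]
row = ["Dengue virus", "EX:0001"]

by_column_name = substitute("value of {exposure material id}", header, row)
by_wildcard = substitute("value of %2", header, row)
```

Both calls yield the same string here, which is the sense in which the two variable types are interchangeable: `{exposure material id}` and `%2` point at the same cell of the row.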