Skip to content

Add sparql check for taxon range#2241

Merged
matentzn merged 3 commits intomasterfrom
anitacaron/sparql-check
Jan 31, 2022
Merged

Add sparql check for taxon range#2241
matentzn merged 3 commits intomasterfrom
anitacaron/sparql-check

Conversation

@anitacaron
Copy link
Collaborator

@anitacaron anitacaron commented Jan 21, 2022

This is a test for this INCATools/ontology-development-kit#520

At the moment, there're 18 violations:

FAIL Rule ../sparql/taxon-range-violation.sparql: 18 violation(s)
term,taxon
http://purl.obolibrary.org/obo/UBERON_0006611,Euarchontoglires
http://purl.obolibrary.org/obo/UBERON_8410062,Homo sapiens
http://purl.obolibrary.org/obo/UBERON_8410010,Mammalia
http://purl.obolibrary.org/obo/UBERON_8410003,Mammalia
http://purl.obolibrary.org/obo/UBERON_8410025,Homo sapiens
http://purl.obolibrary.org/obo/UBERON_8410026,Homo sapiens
http://purl.obolibrary.org/obo/UBERON_8410027,Homo sapiens
http://purl.obolibrary.org/obo/UBERON_8410050,Homo sapiens
http://purl.obolibrary.org/obo/UBERON_0001154,Mammalia
http://purl.obolibrary.org/obo/UBERON_0000022,Euarchontoglires
http://purl.obolibrary.org/obo/UBERON_8400003,Mammalia
http://purl.obolibrary.org/obo/UBERON_8400002,Mammalia
http://purl.obolibrary.org/obo/UBERON_8400001,Mammalia
http://purl.obolibrary.org/obo/UBERON_8400007,Mammalia
http://purl.obolibrary.org/obo/UBERON_8400008,Mammalia
http://purl.obolibrary.org/obo/UBERON_8400005,Mammalia
http://purl.obolibrary.org/obo/UBERON_8400006,Mammalia
http://purl.obolibrary.org/obo/UBERON_0010053,Euarchontoglires

@anitacaron anitacaron self-assigned this Jan 21, 2022
@dosumis
Copy link
Contributor

dosumis commented Jan 21, 2022

Looks good. I think we fixed the errors in this PR: #2231

Do we have a way to tailor the error text in these cases, or some doc to help editors interpret the errors? If do I'd like to add some text, if not I'm happy to approve and make this a more general issue for discussion.

@matentzn
Copy link
Contributor

This is awesome thank you!

@dosumis this is odd - I would have expected a proper summary of the SPARQL verify command.

@anitacaron can you check that if you run sh run.sh make sparql_test locally you don't see the error summary (listing all the sparql checks performed? If so, can your remove the conditionals https://github.com/obophenotype/uberon/blob/master/src/ontology/Makefile#L562 and see whether it has something to do with that ifeq thing?

@anitacaron
Copy link
Collaborator Author

@matentzn it's not generating the reports either way. I think the problem is with the command verify.

@anitacaron
Copy link
Collaborator Author

Ah no, sorry. It's normally generating the report, but the src/ontology/reports/* folder is in the .gitignore.

@dosumis
Copy link
Contributor

dosumis commented Jan 21, 2022

My question was more about whether/how it is possible to specify some more specific, editor friendly error message. Something more than just the name of the test / sparql file or details of the sparql query?

Should I just create an ODK issue for this?

@matentzn
Copy link
Contributor

@dosumis @anitacaron my bad. Because of the enormous size of the Uberon Log, I have piped it to a log file:

https://github.com/obophenotype/uberon/blob/master/.github/workflows/qc.yml#L33

Whose last rows are shown in the QC report:

image

We need to prime curators to look in there, sorry about that. That is standard output and should be sufficient to understand the problem. Sorry @anitacaron to send you on a goose chase, all good!

Copy link
Contributor

@matentzn matentzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent!

@matentzn
Copy link
Contributor

The way we do it at Mondo is this:

When the curator sees this error, they can go here:
https://mondo.readthedocs.io/en/latest/editors-guide/quality-control-tests/

They can then find the check that has failed. If you include actually human readable descriptions of the SPARQL query inline as a comment on the query, that should do that trick.

A more hacky way is to grep the last lines of log for the failing SPARQLS and cat grep the Human readable description from out there. All that said - I think its better to train people to look at the SPARQL queries when in doubt and include good human readable explanations in the metadata.

@dosumis
Copy link
Contributor

dosumis commented Jan 21, 2022

One other thing: I think this SPARQL check should be built into ODK and configurable per repo. It's needed by CL and GO as well as Uberon.

@balhoff
Copy link
Member

balhoff commented Jan 21, 2022

We have a check in GO for IRI values: https://github.com/geneontology/go-ontology/blob/master/src/sparql/non-IRI-value-violation.sparql

We do use non-NCBITaxon_ IRIs: the taxon union classes.

@dosumis
Copy link
Contributor

dosumis commented Jan 21, 2022

We do use non-NCBITaxon_ IRIs: the taxon union classes.

Good point. We could extend to check both patterns, but maybe it's better to use a generic IRI check here (in the absence of a way of specifying range semantically). I guess the same check could be used with in_subset too and will work for any other AP shortcut.

Maybe we should just add the generic check to ODK?

@shawntanzk
Copy link
Collaborator

Do we fix the rest in this PR and update that PR when merging this PR?

ok think its probably less messy if we merged #2260 first then update this to check

@matentzn matentzn merged commit 34fcb5d into master Jan 31, 2022
@matentzn matentzn deleted the anitacaron/sparql-check branch January 31, 2022 14:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants