Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvements to report command #391

Closed
dougli1sqrd opened this issue Oct 30, 2018 · 29 comments
Closed

Improvements to report command #391

dougli1sqrd opened this issue Oct 30, 2018 · 29 comments
Assignees

Comments

@dougli1sqrd
Copy link
Contributor

Hello,
I'm attempting to replace the set of robot commands and queries in the geneontology/go-ontology repo that we use to validate changes to the ontology in travis checks and the like with robot report. These are run on the go-edit.obo file in that repository. go-edit.obo imports chebi.

In go-ontology/src/ontology/ I run robot report --input go-edit.obo --output report.tsv --profile profile.txt, and the profile.txt looks like:

ERROR	annotation_whitespace
ERROR	deprecated_class_reference
ERROR	duplicate_definition
WARN	duplicate_exact_synonym
ERROR	duplicate_label_synonym
ERROR	duplicate_label
WARN	duplicate_scoped_synonym
ERROR	equivalent_pair
WARN	invalid_xref
ERROR	label_formatting
ERROR	label_whitespace
INFO	lowercase_definition
ERROR	missing_definition
ERROR	missing_label
WARN	missing_obsolete_label
ERROR	missing_ontology_description
ERROR	missing_ontology_license
ERROR	missing_ontology_title
INFO	missing_superclass
ERROR	misused_obsolete_label
INFO	multiple_asserted_superclasses
ERROR	multiple_definitions
ERROR	multiple_labels
ERROR	file:./../sparql/never-in-taxon-non-IRI-value-violation.sparql

The report.tsv indicates about 2000 missing_label errors on CHEBI terms implying they are in this ontology, but these CHEBI terms are not in the go-edit.obo file, they're imported.

So am I doing something wrong in the way I invoke the report command, or on the file that I'm reporting on? Or is this a bug? I see in the code it's supposed to not pull in all the imports I think? Thanks for your help!

@cmungall
Copy link
Contributor

Good point, this is underspecified at the moment.

There was an analogous issue with the query command: #158
see the fix: #328 and the instructions http://robot.obolibrary.org/query

Should we have equivalent options for report?

However, the defaults may have to be different, because as you point out any ontology that has imports and references classes in the import will produce false positive errors. @jamesaoverton @rctauber how do you feel about having the defaults be different here?

@cmungall cmungall changed the title Robot report command seems to merge imports during queries Robot report command does not merge imports during queries, resulting in false positives Oct 30, 2018
@beckyjackson
Copy link
Contributor

beckyjackson commented Oct 30, 2018 via email

@cmungall
Copy link
Contributor

I would think that you’re only reporting on your base ontology.

I think what is happening here is that there is leakage of declarations from the import into the base.

E.g. if base has an axiom X SubClassOf R some Y, and Y is in the import, then I believe the declaration that Y is a class leaks into the base, so we end up with a triple in base with Y as subject.

Looking at https://github.com/ontodev/robot/blob/master/robot-core/src/main/resources/report_queries/missing_label.rq#L11

What will then happen is that it will report that Y has no label.

I'm not sure of a good way around this.

That way,
issues with other ontologies won’t break the report.

Yes, this is a problem with my proposal to bring in the import

Not sure of the best way around this

@dougli1sqrd
Copy link
Contributor Author

@cmungall where do you think the Y class get turned into a subject in this? Is this a Jena thing, OWL API thing, or something we do in ROBOT?

I did a small work-around where I also filtered like: FILTER(strstarts(str(?entity), "http://purl.obolibrary.org/obo/GO_") ) and that successfully detects no missing labels. Not sure there is a way to generify that though. It's maybe a data point that the filter works though?

@cmungall
Copy link
Contributor

I believe it's OWLAPI, but can you try some SPARQL to check?

@beckyjackson
Copy link
Contributor

beckyjackson commented Oct 31, 2018 via email

@cmungall
Copy link
Contributor

I believe the OWLAPI injects the triple, I can't recall if this is a consequence of the source file being obo

@cmungall
Copy link
Contributor

cmungall commented Oct 31, 2018

Yes, annoying

import: http://purl.obolibrary.org/obo/go/imports/cl_import.owl
ontology: test

[Term]
id: TEST:0000001
name: foo
relationship: BFO:0000050 CL:0009004

convert this to any other serialization injects the declaration triple.

But note this isn't completely unique to obo format. If we remove the import (i.e make it a base file) then the declaration is auto-injected.

e.g. if you roundtrip this with robot:

@prefix : <http://purl.obolibrary.org/obo/test.owl#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@base <http://purl.obolibrary.org/obo/test.owl> .

<http://purl.obolibrary.org/obo/test.owl> rdf:type owl:Ontology .


###  http://purl.obolibrary.org/obo/TEST_0000001
<http://purl.obolibrary.org/obo/TEST_0000001> rdf:type owl:Class ;
                                              rdfs:subClassOf [ rdf:type owl:Restriction ;
                                                                owl:onProperty <http://purl.obolibrary.org/obo/BFO_0000050> ;
                                                                owl:someValuesFrom <http://purl.obolibrary.org/obo/CL_0009004>
                                                              ] ;
                                              <http://www.geneontology.org/formats/oboInOwl#id> "TEST:0000001"^^xsd:string ;
                                              rdfs:label "foo"^^xsd:string .

you will see the class and OP declaration injected

@jamesaoverton
Copy link
Member

Yes, OWLAPI inserts those declarations into serializations.

The SPARQL queries for report are already a bit tricky for reasons like this.

We could add code to strip those bare declarations. Or we could annotate stuff with isDefinedBy #390 and only query for those. There are other options but I can't think of anything elegant.

@dosumis
Copy link

dosumis commented Nov 2, 2018

From discussion in ODK meeting - code could be edited to only check axioms - ignoring declarations.

@beckyjackson
Copy link
Contributor

Re @dosumis - the query could also ignore any subjects that only have an rdf:type axiom and no other axioms. This seems like an easy fix.

@jamesaoverton
Copy link
Member

@rctauber Yes, let's try that approach.

@dougli1sqrd
Copy link
Contributor Author

Last Friday I was trying to write a query like you described, @rctauber, but I was having trouble getting it to work. If you have a way to help, I'd love a pointer!

@beckyjackson
Copy link
Contributor

beckyjackson commented Nov 5, 2018

Hi @dougli1sqrd - please look at my PR at #394 (and if you have time, test it 😄) The relevant code is:

  FILTER EXISTS {
    ?entity ?property ?object .
    FILTER (?property != rdf:type)
  }

@ontodev/robot-team I tested this on a few different inputs, but if anybody else has time to test it, I would really appreciate it.

@cmungall
Copy link
Contributor

cmungall commented Nov 5, 2018

Thanks, we'll test it!

@dougli1sqrd
Copy link
Contributor Author

Ah, cool, so this is working in go-ontology go-edit.obo except for http://purl.obolibrary.org/obo/go# relations. For example:

Level	Rule Name	Subject	Property	Value
ERROR	missing_label	http://purl.obolibrary.org/obo/go#gocheck_do_not_annotate	rdfs:label		
ERROR	missing_label	http://purl.obolibrary.org/obo/go#gocheck_do_not_manually_annotate	rdfs:label		
ERROR	missing_label	http://purl.obolibrary.org/obo/go#goslim_agr	rdfs:label		
ERROR	missing_label	http://purl.obolibrary.org/obo/go#goslim_aspergillus	rdfs:label		
ERROR	missing_label	http://purl.obolibrary.org/obo/go#goslim_candida	rdfs:label		
ERROR	missing_label	http://purl.obolibrary.org/obo/go#goslim_chembl	rdfs:label		
ERROR	missing_label	http://purl.obolibrary.org/obo/go#goslim_flybase_ribbon	rdfs:label		
ERROR	missing_label	http://purl.obolibrary.org/obo/go#goslim_generic	rdfs:label		
ERROR	missing_label	http://purl.obolibrary.org/obo/go#goslim_metagenomics	rdfs:label		
ERROR	missing_label	http://purl.obolibrary.org/obo/go#goslim_mouse	rdfs:label		
ERROR	missing_label	http://purl.obolibrary.org/obo/go#goslim_pir	rdfs:label		
ERROR	missing_label	http://purl.obolibrary.org/obo/go#goslim_plant	rdfs:label		
ERROR	missing_label	http://purl.obolibrary.org/obo/go#goslim_pombe	rdfs:label		
ERROR	missing_label	http://purl.obolibrary.org/obo/go#goslim_synapse	rdfs:label		
ERROR	missing_label	http://purl.obolibrary.org/obo/go#goslim_yeast	rdfs:label	

@beckyjackson
Copy link
Contributor

This check is only ignoring entities that only have a declaration triple (x rdf:type y). Looking at these subset defs, these all have a declaration, plus an rdfs:comment annotation and a super-property.

subsetdef: gocheck_do_not_annotate "Term not to be used for direct annotation"

Is translated to:

:gocheck_do_not_annotate rdf:type owl:AnnotationProperty ;
                         rdfs:subPropertyOf oboInOwl:SubsetProperty ;
                         rdfs:comment "Term not to be used for direct annotation"^^xsd:string .

This is an interesting problem because it's unique to ontologies developed in OBO format... @cmungall & @jamesaoverton - do you have any thoughts on this? Potential options I can think of:

  1. ROBOT doesn't worry about handling subset defintions. OBO format ontologies would either need to give their subset definitions a label (easy in OWL, not sure about OBO), or exclude/rewrite this query.
  2. The report queries ignore subproperties of oboInOwl:SubsetProperty. Some ontologies developed in OWL are now annotating the subset definitions with other things though, and may want them to be included in these checks...

@dougli1sqrd
Copy link
Contributor Author

Yeah something like your option 2 could work! Just so you know though there are also a couple of properties in go-edit:

ERROR	missing_label	http://purl.obolibrary.org/obo/go#gocheck_do_not_annotate	rdfs:label		
ERROR	missing_label	http://purl.obolibrary.org/obo/go#gocheck_do_not_manually_annotate	rdfs:label

And would also have to be dealt with somehow. Maybe we could ignore all properties that are subsets of oboInOwl properties? Or something?

@cmungall
Copy link
Contributor

cmungall commented Nov 5, 2018

1 would be difficult for people

I vote for 2. Either exclude subProps of oboInOwl:SubsetProperty or targets of oboInOwl:inSubset

@dougli1sqrd
Copy link
Contributor Author

Oops I guess I didn't know what I was talking about. My previous comment, gocheck... are actually subsets, so ignore that comment

@beckyjackson
Copy link
Contributor

@cmungall - I can exclude the subprops. Should it just be for this label check? I'm guessing they could also be excluded from missing & multiple definitions?

@dougli1sqrd
Copy link
Contributor Author

dougli1sqrd commented Nov 6, 2018

It looks like this same kind of error is happening for missing_superclass.
Then, I'm not totally sure if this is the same kind of thing as this, but deprecated_class_reference also has a ton of results for go-edit.obo.

ERROR	deprecated_class_reference	GO:0000004	IAO:0000231	IAO:0000227

This is saying GO:0000004 was deprecated because it was merged. And it's merged into GO:0008151 biological process.

This is the owl for GO:0000004:

<owl:Class rdf:about="http://purl.obolibrary.org/obo/GO_0000004">
        <obo:IAO_0000231 rdf:resource="http://purl.obolibrary.org/obo/IAO_0000227"/>
        <obo:IAO_0100001 rdf:resource="http://purl.obolibrary.org/obo/GO_0008150"/>
        <owl:deprecated rdf:datatype="http://www.w3.org/2001/XMLSchema#boolean">true</owl:deprecated>
</owl:Class>

I guess it's catching this because GO:0000004 is deprecated?
Thanks!

@cmungall
Copy link
Contributor

cmungall commented Nov 6, 2018 via email

@dosumis
Copy link

dosumis commented Nov 6, 2018

re subsets: Maybe I'm stating the obvious here - but I don't think we should be checking for labels / definitions of annotation properties.

@beckyjackson
Copy link
Contributor

@cmungall & @dougli1sqrd - I can include this line in the deprecated_class_reference check:

FILTER (?property != obo:IAO_0000231) .

That would allow the obsolete classes to use IAO:0000231 as an object property. Obsolete entities are already excluded from the label check.

@dosumis - I think that's reasonable. That would solve a lot of the issues with the default annotation properties, as well. What are others' thoughts on this?

@jamesaoverton
Copy link
Member

@dosumis Annotation properties should have labels and definitions, like any other term. I don't see why we would exclude them wholesale from these quality checks.

@cmungall Is there some reason you can't add labels and definitions to http://purl.obolibrary.org/obo/go#goslim_flybase_ribbon et al.?

Every one of these "little" exceptions will have to be documented and maintained, and every one of them will make the SPARQL run slower.

@jamesaoverton jamesaoverton changed the title Robot report command does not merge imports during queries, resulting in false positives Improvements to report command Nov 6, 2018
@cmungall
Copy link
Contributor

cmungall commented Nov 6, 2018

unfortunately labels for subsetdefs would get lost in the owl2obo translation. Many ontologies have the edit version maintained in obo for diff/vcs purposes. Subsets have other annoying properties such as hash URIs that don't resolve.

@dougli1sqrd
Copy link
Contributor Author

@rctauber I think your idea would work. I think most/all of those particular false positives are due to merged terms, which will all have that as a ?property (I think).

@jamesaoverton jamesaoverton removed this from the v1.2.0 milestone Nov 19, 2018
@beckyjackson
Copy link
Contributor

I think this issue has been resolved. We are working on further improvements to the report queries in #429. Please feel free to re-open this issue if any points here need more discussion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants