Remove duplicate annotations from one of two .owl files? #848

Closed
ddooley opened this issue May 11, 2021 · 10 comments

@ddooley

ddooley commented May 11, 2021

We have been migrating many ontology entities from a foodon-edit.owl file to Google Sheets as robot templates, and from there to robot-managed .owl files. What I was then trying to do is find a robot command that would spot any entity annotation in the robot .owl file that is duplicated in the original foodon-edit.owl and remove it from foodon-edit.owl, so that the robot .owl file becomes a clean, non-redundant import. (I didn't want to delete them manually from foodon-edit.owl, especially as some annotations there may need to be preserved because they don't fit the robot template pattern.)

  1. I tried robot diff to see if it would list off the duplicate annotations but it doesn't have a "duplicates" switch.
  2. I tried robot remove but it doesn't have the facility to spot duplicate axioms.

So am I missing an obvious way?

Cheers for all the good work on robot btw.

Damion

@jamesaoverton
Member

Possibly unmerge?

You can do almost anything with SPARQL and robot query.

If neither of those seems suitable, we can look into the problem more closely.

@ddooley
Author

ddooley commented May 11, 2021

Unmerge hints at working right in one test case but fails in another. Possibly a bug?

Here's a sandbox main ontology file:
[image]

And sandbox import file:
[image]

Running:

robot unmerge --input source-edit.owl --input source-import.owl --output results/source-edit2.owl
yields:
[image]

The Test 1 entity is preserved entirely, which is great, and the annotation on Test 2 rightly remains in source-edit2.owl. However, the main ontology file also has an annotation on Test 4, and unmerge blew that away.

Main file test 4:
[image]

I have the test files if needed.

I will tackle this via SPARQL if I have to, though I'm not certain whether the OWL API can load the import file and the main file into one graph and spot duplicates. I'm guessing I would have to retrieve all annotations in the import file and then search for and delete them one by one in the main ontology file.

@matentzn
Contributor

Very important to get this right. I am relying on unmerge for a lot of things. Can you share your entire test setup with me please?

@ddooley
Author

ddooley commented May 12, 2021

I did one more test - I flattened the import file (i.e. removed subClassOf) by using robot:

robot remove --input source-import.owl --select classes --axioms subclass --output source-import-2.owl

[image]

Now the robot unmerge appears to work with this file! All the right entities and their annotations seem to be preserved in results/source-edit2.owl file. So I think the bug might be something to do with traversing the depth of source-import.owl file?

Note that "test 4" in the main file has a "yada comment more test 4" annotation. It differs from the one in the import file, where a language tag was applied. I was testing to see whether unmerge spotted the difference, and it does, which is appropriate.
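The duplicate check discussed in this comment can be pictured as a set difference over annotations, where a language tag is part of what makes two annotations equal or not. The following is only a minimal Python sketch of that idea: the `Annotation` tuple and the `remove_duplicates` helper are hypothetical names invented for illustration, the values are modelled on the Test 4 entity in this thread, and nothing here calls robot or the OWL API.

```python
from typing import NamedTuple, Optional


class Annotation(NamedTuple):
    """Hypothetical stand-in for one annotation assertion."""
    subject: str
    prop: str
    value: str
    lang: Optional[str] = None  # the language tag takes part in equality


def remove_duplicates(main, imported):
    """Return the annotations in `main` that do not also occur in `imported`."""
    imported_set = set(imported)
    return [a for a in main if a not in imported_set]


TEST4 = "http://purl.obolibrary.org/obo/FOODON_00003003"

main = [
    Annotation(TEST4, "rdfs:comment", "Test 4 annotation not in import file.", "en"),
    Annotation(TEST4, "rdfs:comment", "yada comment more test 4"),
    Annotation(TEST4, "rdfs:comment", "yada comment test 4"),
    Annotation(TEST4, "rdfs:label", "test 4", "en"),
]
imported = [
    Annotation(TEST4, "rdfs:comment", "yada comment more test 4", "en"),
    Annotation(TEST4, "rdfs:comment", "yada comment test 4"),
    Annotation(TEST4, "rdfs:label", "test 4", "en"),
]

kept = remove_duplicates(main, imported)
for a in kept:
    print(a.prop, repr(a.value), a.lang)
```

In this sketch the untagged "yada comment more test 4" survives because the import copy carries an `en` tag, so the two tuples are not equal.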

@matentzn
Contributor

matentzn commented May 13, 2021

Thank you @ddooley for sending me the setup.

I think unmerge works correctly, with one small wrinkle: it does not preserve the ontological types. Look at this:

Test 4 on main:

<!-- http://purl.obolibrary.org/obo/FOODON_00003003 -->

<owl:Class rdf:about="http://purl.obolibrary.org/obo/FOODON_00003003">
    <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/FOODON_00003002"/>
    <obo:IAO_0000117 rdf:resource="http://orcid.org/0000-0002-8844-9165"/>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2021-05-11T20:48:03Z</dc:date>
    <rdfs:comment xml:lang="en">Test 4 annotation not in import file.</rdfs:comment>
    <rdfs:comment>yada comment more test 4</rdfs:comment>
    <rdfs:comment>yada comment test 4</rdfs:comment>
    <rdfs:isDefinedBy>an isDefinedBy for test 4</rdfs:isDefinedBy>
    <rdfs:label xml:lang="en">test 4</rdfs:label>
</owl:Class>

Test 4 on import:

<!-- http://purl.obolibrary.org/obo/FOODON_00003003 -->

<owl:Class rdf:about="http://purl.obolibrary.org/obo/FOODON_00003003">
    <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/FOODON_00003002"/>
    <obo:IAO_0000117 rdf:resource="http://orcid.org/0000-0002-8844-9165"/>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2021-05-11T20:48:03Z</dc:date>
    <rdfs:comment>other comment not in source-edit.owl but in test 4</rdfs:comment>
    <rdfs:comment xml:lang="en">yada comment more test 4</rdfs:comment>
    <rdfs:comment>yada comment test 4</rdfs:comment>
    <rdfs:isDefinedBy>an isDefinedBy for test 4</rdfs:isDefinedBy>
    <rdfs:label xml:lang="en">test 4</rdfs:label>
</owl:Class>

Test 4 unmerged file:

<rdf:Description rdf:about="http://purl.obolibrary.org/obo/FOODON_00003003">
    <rdfs:comment xml:lang="en">Test 4 annotation not in import file.</rdfs:comment>
    <rdfs:comment>yada comment more test 4</rdfs:comment>
</rdf:Description>

It seems like Test 4 on unmerged does have the annotation still; Protege just does not know about it, because it does not know that FOODON:00003003 is still a class.
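One way to picture why the class typing disappears is to treat each file as a set of axioms and subtract the import from the main file: a declaration that appears in both files is removed along with the duplicated annotations. The Python sketch below is an illustration under that assumption only. The OFN-style strings are written by hand from the Test 4 snippets above, `subtract` is a hypothetical helper, and none of this is robot's actual implementation.

```python
def subtract(main_axioms, import_axioms):
    """Keep the axioms of main that are not also in import (hypothetical helper)."""
    removed = set(import_axioms)
    return [ax for ax in main_axioms if ax not in removed]


CLS = "obo:FOODON_00003003"

main_axioms = [
    f"Declaration(Class({CLS}))",
    f"SubClassOf({CLS} obo:FOODON_00003002)",
    f'AnnotationAssertion(rdfs:comment {CLS} "Test 4 annotation not in import file."@en)',
    f'AnnotationAssertion(rdfs:comment {CLS} "yada comment more test 4")',
    f'AnnotationAssertion(rdfs:label {CLS} "test 4"@en)',
]
import_axioms = [
    f"Declaration(Class({CLS}))",
    f"SubClassOf({CLS} obo:FOODON_00003002)",
    f'AnnotationAssertion(rdfs:comment {CLS} "yada comment more test 4"@en)',
    f'AnnotationAssertion(rdfs:label {CLS} "test 4"@en)',
]

result = subtract(main_axioms, import_axioms)
has_declaration = any(ax.startswith("Declaration(") for ax in result)

for ax in result:
    print(ax)
print("still declared as a class:", has_declaration)
```

The two comments remain, but the shared declaration is gone, which matches the untyped `rdf:Description` block shown above.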

Unfortunately, OWL API tries to be smart and inject declarations when saving. Fortunately OWL API does not inject declarations on load. So here is a hack I have been using to circumvent your issue:

Super hack creating an OFN, removing declarations, unmerging

Makefile

# Convert the import to OFN, then drop every line that starts with "Declaration".
results/source-import-no-declaration.ofn:
	robot convert --input source-import.owl -f ofn --output $@
	grep -v '^Declaration' $@ > $@.txt && mv $@.txt $@

# Unmerge against the declaration-free import so the class typing is kept.
results/source-edit3.owl: results/source-import-no-declaration.ofn
	robot unmerge --input source-edit.owl --input results/source-import-no-declaration.ofn --output $@
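
For readers without `grep` at hand, the declaration-stripping step of the hack can be approximated in Python. This is a rough sketch rather than part of the original setup: `strip_declarations` is a hypothetical helper, the sample text (including its ontology IRI) is made up for illustration, and, like the `grep -v '^Declaration'` line, it only works when each declaration sits on its own line starting with `Declaration`.

```python
def strip_declarations(ofn_text: str) -> str:
    """Drop every line that starts with 'Declaration', like grep -v '^Declaration'."""
    kept = [
        line for line in ofn_text.splitlines()
        if not line.startswith("Declaration")
    ]
    return "\n".join(kept) + "\n"


# Made-up OFN fragment; the ontology IRI is a placeholder, not from the thread.
sample = """Prefix(obo:=<http://purl.obolibrary.org/obo/>)
Ontology(<http://example.org/source-import.owl>
Declaration(Class(obo:FOODON_00003003))
SubClassOf(obo:FOODON_00003003 obo:FOODON_00003002)
)
"""

cleaned = strip_declarations(sample)
print(cleaned, end="")
```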

The unmerged Test 4 now looks like this:

  <!-- http://purl.obolibrary.org/obo/FOODON_00003003 -->

    <owl:Class rdf:about="http://purl.obolibrary.org/obo/FOODON_00003003">
        <rdfs:comment xml:lang="en">Test 4 annotation not in import file.</rdfs:comment>
        <rdfs:comment>yada comment more test 4</rdfs:comment>
    </owl:Class>

And you can see it in Protege. Let me know if you have any more questions!

@matentzn
Contributor

BTW @ddooley I really applaud the effort you are making here for FOODON. I think this is the right path and role model stuff! I have tried doing it for other ontologies as well, and it's a lot of work, but I think this is what we need to do for OBO.

@ddooley
Author

ddooley commented May 13, 2021

Ok, I never would have thought of the Description/Declaration statements and unknown rdf:about issue. I will check out your solution of removing declaration statements in the ofn format of source-import.owl . Thanks!

This reminds me, about OFN: I've been keeping all of my ontology files stored as .owl in RDF/XML syntax. Should I be using OFN (OWL Functional Syntax) for all development / published files? Are there gotchas in switching from RDF/XML to OFN? I recall OFN is great for consistent line ordering in GitHub files, which gives better diffs. And finally, would I save files as OFN but keep the .owl suffix? I guess OFN does not preserve XML entity namespace abbreviations?

@matentzn
Contributor

I personally recommend keeping edit files in OFN and release (published) files in RDF/XML! The latter is actually the OBO standard; the former is a convention mostly in my world.

OFN is just better for diffing.

@matentzn
Contributor

I will close this ticket, but feel free to reopen it if other problems come up!

@cmungall
Contributor

cmungall commented May 13, 2021 via email
