Skip to content
This repository has been archived by the owner on Jan 24, 2018. It is now read-only.

Repo manager for ontology maps #980

Merged
merged 3 commits into from
Apr 6, 2016
Merged

Conversation

david4096
Copy link
Member

VA and SA both use sequence ontology. This adds the ability to add and remove ontologies using the repo manager.

@diekhans
Copy link
Contributor

I don't see any documentation, but it is incorrect to say this is adding an ontology. it's adding a mapping between to an term coming from a VCF and a reference to an ontology term (OntologyTerm) record.
An ontology format would be some think line OBO or OWL-RDF; however that is not something we need.

This is only a problem with the terminology, not the code. But that is very important, because it claims
to be doing something that it isn't doing.

@@ -104,7 +104,12 @@ def getOntology(self, name):
"""
Returns ontologies
"""
return self._ontologyNameMap[name]
ret = {}
try:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd remove the try/except and instead do:

ret = self._ontologyNameMap.get(name, {})

You almost never want to write except: pass ... kinda defeats the point of exceptions

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note that both of these approaches create an dict object that is then discarded even when the name map contains the object. This is inefficient.

@dcolligan
Copy link
Member

A good start. Also should have tests for the individual commands, including failure cases, in tests/unit/test_repo_manager.py and tests for the cli in tests/unit/test_cli.py

@david4096 david4096 force-pushed the repo_ontologies branch 3 times, most recently from 60980b9 to a132cec Compare March 25, 2016 01:06
@david4096 david4096 changed the title Repo manager for ontologies Repo manager for ontology maps Mar 25, 2016
@david4096 david4096 force-pushed the repo_ontologies branch 2 times, most recently from 425fe70 to 7a374cc Compare March 25, 2016 16:45
@david4096
Copy link
Member Author

Updated @dcolligan. They are now more appropriately called ontology maps.

self.ontologyMapName = args.ontologyMapName

def run(self):
self.repoManager.removeOntologyMap(self.ontologyMapName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs to use confirmRun or else the force flag does nothing

@dcolligan
Copy link
Member

It's coming along. After the changes I mentioned, you should be good to go.

@david4096 david4096 force-pushed the repo_ontologies branch 3 times, most recently from d208847 to 0015896 Compare March 28, 2016 21:28
@@ -27,10 +27,20 @@ def add(self, id_, name):
self._idNameMap[name] = id_

def getId(self, name):
try:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems problematic. Usually failing noisily is a good thing. We should provide methods like hasId and hasName for the cases when we want to know if a given id/name exists in the map.

@jeromekelleher
Copy link
Contributor

Looks good in general to me.

@cmungall
Copy link
Member

You assume id<->name will be 1:1 within any given ontology.

For the id->name direction this is not particularly problematic. Every id uniquely identifies an ontology class, and within the subset of the ontologies you are likely to use, there should be a single rdfs:label. However, this assumption cannot be made for every ontology ever constructed. You might also want to think about internationalization early on.

For name->id, for any well-behaved ontology this should be OK, provided you are only mapping non-obsolete classes. It's not clear what procedure you are using to generate the mapping tables, and if overwrites are happening here.

I recommend documenting these assumptions, and the set of ontologies for which the assumptions are assumed to hold (I can help set up upstream checks in the release pipelines of these ontologies).

You're also using strings like sequence_ontology as unique IDs for the ontologies. The only consistent and stable way to identify an ontology is by it's ontology id (for OBO library ontologies) or IRI (for any ontology). You should see this in the ontology field for any obo formatted ontology, or the Ontology object in RDF/OWL. E.g. so, go, hp. These can easily be found on obofoundry.org (the metadata is available in a variety of formats, you can also see the list here: https://github.com/OBOFoundry/OBOFoundry.github.io/tree/master/ontology).

I expect you will need a richer ontology model at some point, at least with support for synonyms, traversal, definitions, obsoletion metadata, versioning. Ping me if you want feedback on either requirements or code.

(none of this constitutes a downvote, code looks a great start, feel free to ask me to move my comments to a separate ticket etc)

@diekhans
Copy link
Contributor

Thanks @cmungall

note that the mapping goal is to link existing data that only
has ontology names to full specified ontology ids. So the
outdated name ->id relationship should not be an issue.

The goal is not to develop an ontology service, only to
be able to link to them.

Chris Mungall notifications@github.com writes:

You assume id<->name will be 1:1 within any given ontology.

For the id->name direction this is not particularly problematic. Every id
uniquely identifies an ontology class, and within the subset of the ontologies
you are likely to use, there should be a single rdfs:label. However, this
assumption cannot be made for every ontology ever constructed. You might also
want to think about internationalization early on.

For name->id, for any well-behaved ontology this should be OK, provided you are
only mapping non-obsolete classes. It's not clear what procedure you are using
to generate the mapping tables, and if overwrites are happening here.

I recommend documenting these assumptions, and the set of ontologies for which
the assumptions are assumed to hold (I can help set up upstream checks in the
release pipelines of these ontologies).

You're also using strings like sequence_ontology as unique IDs for the
ontologies. The only consistent and stable way to identify an ontology is by
it's ontology id (for OBO library ontologies) or IRI (for any ontology). You
should see this in the ontology field for any obo formatted ontology, or the
Ontology object in RDF/OWL. E.g. so, go, hp. These can easily be found on
obofoundry.org (the metadata is available in a variety of formats, you can also
see the list here: https://github.com/OBOFoundry/OBOFoundry.github.io/tree/
master/ontology).

I expect you will need a richer ontology model at some point, at least with
support for synonyms, traversal, definitions, obsoletion metadata, versioning.
Ping me if you want feedback on either requirements or code.


You are receiving this because you modified the open/close state.
Reply to this email directly or view it on GitHub*

@david4096 david4096 force-pushed the repo_ontologies branch 3 times, most recently from 1fc3992 to 4c7130e Compare March 29, 2016 23:34
@david4096
Copy link
Member Author

Added the hasId, hasName functions for future use and added ontology maps to repo list.

Now throw an exception when a key isn't found that has to be caught elsewhere.

@david4096
Copy link
Member Author

Could I get a review/merge?

@@ -994,7 +994,10 @@ def convertSeqOntology(self, seqOntStr):
so = self._createGaOntologyTermSo()
so.term = soName
if self._sequenceOntology is not None:
so.id = self._sequenceOntology.getId(soName)
try:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really the semantics we want (using the empty string when the key isn't defined)? If so, how about we change the getId function to follow the semantics of dict.get, and add a second parameter to give the default value when the key is not found? So, here we'd have so.id = self._sequenceOntology.getId(soName, "")

Change ontology to ontology maps
Update prepare compliance data
Add some documentation
Confirm run and remove tests
Gracefully fail if the id or name is not found
Raise a named exception when lookup fails
Add functions for testing for presence of ID or name
@david4096 david4096 force-pushed the repo_ontologies branch 2 times, most recently from 1ae677a to 34a0572 Compare March 31, 2016 17:36

def getName(self, id_):
return self._nameIdMap[id_]
def getId(self, name, default=""):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function can be rewritten

return self._idNameMap.get(name, default)

One is able to provide the default value of getting by name or ID similar to python dictionary
Improve add-remove test
@david4096
Copy link
Member Author

Thanks @dcolligan got it.

@macieksmuga
Copy link
Contributor

Most of this the changes here are in the repo manager, so from the perspective of the other parts of the server's datamodel, the only difference is that I would now call (for example) backend.getOntologyMap('sequence_ontology') instead of backend.getOntologyMap('sequence_ontology'), right?

@david4096
Copy link
Member Author

@macieksmuga
Copy link
Contributor

+1

1 similar comment
@dcolligan
Copy link
Member

+1

@dcolligan dcolligan merged commit 3f91752 into ga4gh:master Apr 6, 2016
@david4096 david4096 deleted the repo_ontologies branch April 15, 2016 23:09
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants