Create an OntologyConfiguration class #260
This seems useful. We should beware of feature scope and define what ontologies phenol is "for" -- certainly all the phenotype ontologies, GO, MONDO, ECTO, probably also NCIT.
@julesjacobsen Should I have a first go at this? For now, do we basically want the configurations to specify the ontologies whose terms should be included? The default would remain "everything"?
@julesjacobsen Should we address this? I am not 100% sure what the status is -- should I draft an
@julesjacobsen Trying to understand the issue here -- any chance we can work on the GO configuration class as an example? Would the configuration classes simply contain recommended lists of term prefixes, i.e., from GO we would import GO terms but not BFO/RO etc? And the curie_map takes care of relations? For GO, we additionally need owl:Thing as the artificial root, I think.
@pnrobinson yes, that's the idea, if I can remember that far back! I guess the general idea is to filter for terms in the RelationshipType class and a defined set based on wanted prefixes, e.g.
@julesjacobsen it seems we need to make the new classes interact with the
So it seems that the easiest way to move forward would be to replace
which would allow an arbitrary list of prefixes? Right now, loadOntology just works for a number of our ontologies, so I am wondering if it would be better to make the class less forgiving and add some static methods such as
@julesjacobsen looking at the code, it seems we do not check for individual prefixes right now (i.e., if we passed GO:1234567 as part of hp.obo, I do not think the current code would throw an error). This is possibly but not certainly forgivable :-0, but it might be OK to leave this alone altogether for now. In the medium term, if we want to bring phenol up to date (Java 11 or 14), we will probably need to remove the OWL-API dependency, and so this will require more surgery. I am not sure it is worth just fixing this right now.
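For illustration only, a stricter prefix check of the kind described above might look roughly like the following. This is a sketch under assumptions, not phenol code: the class StrictPrefixCheck and its methods are hypothetical, and term ids are treated as plain CURIE strings rather than phenol TermId objects.

```java
import java.util.List;
import java.util.Set;

// Hypothetical helper, NOT part of phenol: rejects term ids whose prefix is unexpected.
class StrictPrefixCheck {

    // Returns the part of a CURIE before the first ':' (e.g. "GO" for "GO:1234567").
    static String prefixOf(String termId) {
        int colon = termId.indexOf(':');
        if (colon <= 0) {
            throw new IllegalArgumentException("Not a CURIE: " + termId);
        }
        return termId.substring(0, colon);
    }

    // Throws on the first term id whose prefix is outside the allowed set.
    static void requirePrefixes(List<String> termIds, Set<String> allowedPrefixes) {
        for (String termId : termIds) {
            String prefix = prefixOf(termId);
            if (!allowedPrefixes.contains(prefix)) {
                throw new IllegalArgumentException(
                    "Unexpected prefix '" + prefix + "' in term id " + termId);
            }
        }
    }

    public static void main(String[] args) {
        // Only HP terms: passes silently.
        requirePrefixes(List.of("HP:0000118", "HP:0001250"), Set.of("HP"));
        try {
            // A GO id mixed into an HP-only load is rejected.
            requirePrefixes(List.of("HP:0000118", "GO:1234567"), Set.of("HP"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Whether failing fast like this is preferable to silently filtering is exactly the open question in this thread; the sketch only shows the "less forgiving" option.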
@pnrobinson Sorry, I've been swamped with childcare / homeschool / lockdown distractions this week and haven't had time to look at this in any meaningful way. It's been open for a while so clearly isn't that critical, but would certainly be a nice-to-have. If you think it worth me looking into I can devote some time to it next week. Like you, I'm rusty with this code, so I'll need to spend a bit of time to re-load it all back into RAM :)
@julesjacobsen I think this issue has become stale.
I've added an 'enhancement' label in case we ever want to find this again. This was obviously never high enough priority to need to implement.
Currently the OboGraphDocumentAdaptor takes a limited configuration in the form of a CurieUtil and a set of term id prefixes in order to filter out unwanted/unknown nodes. For the HPO and MPO at least, no additional configuration is required in order to get the graph needed for Phenol-type operations. For the GO, however, it becomes more complicated with the addition of RO and BFO terms with relationships outside of the current enum.
In response to issue #163, the RelationshipType would no longer explode if given an unknown relationship and would instead return a RelationshipType.UNKNOWN. The issue now is that people could mistakenly try to open an ontology and then have a bunch of meaningless relationships. So one question is: should these be removed by default? (#163 (comment) and #163 (comment) suggest yes.)
If default removal of RelationshipType.UNKNOWN is the generally accepted way to go, then ought there to be cases where we want to be able to define all the required nodes and relationship types? How far do we want to go with this? Could it be the case that we might want to work with a full and complicated graph exactly as it comes out of ontobio?
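To make the "remove UNKNOWN by default, but allow opting out" idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: RelType is a stand-in for phenol's RelationshipType, Edge is a hypothetical simplified edge, and the ids in main are made up.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch, NOT phenol's implementation.
class UnknownEdgeFilter {

    // Stand-in for phenol's RelationshipType; only the constants needed here.
    enum RelType { IS_A, PART_OF, UNKNOWN }

    // Simplified edge: subject --type--> object, with CURIE strings as ids.
    static final class Edge {
        final String subject;
        final RelType type;
        final String object;

        Edge(String subject, RelType type, String object) {
            this.subject = subject;
            this.type = type;
            this.object = object;
        }
    }

    // Drops UNKNOWN edges unless the caller explicitly asks to keep them.
    static List<Edge> filter(List<Edge> edges, boolean keepUnknown) {
        return edges.stream()
            .filter(e -> keepUnknown || e.type != RelType.UNKNOWN)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up ids, for demonstration only.
        List<Edge> edges = List.of(
            new Edge("ECTO:0000001", RelType.IS_A, "ECTO:0000002"),
            new Edge("ECTO:0000001", RelType.UNKNOWN, "CHEBI:0000001"));
        System.out.println(filter(edges, false).size()); // default: UNKNOWN removed
        System.out.println(filter(edges, true).size());  // opt-in: full graph kept
    }
}
```

A boolean flag is the simplest possible switch; a real configuration would more likely carry the set of wanted relationship types.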
ECTO is one example of an ontology containing too much information. Loading it like this gives us a single graph of is_a relations and only ECTO nodes:
However, loading this produces a lot of CHEBI, UBERON, BFO, RO nodes and a lot of UNKNOWN relationship types:
Bearing in mind Chris' comments, I think it should now be relatively easy to provide the OboGraphDocumentAdaptor with an OntologyConfiguration (working name), which is an interface for enabling users to specify exactly what they want to see in the output. This will require users to be aware that the ontology they're loading might just enable them to create a horrible mess if they wish, but the configuration ought to stick to a safe default of only Node.RDFTYPES.CLASS nodes and is_a relationships, for example. During this work the RelationshipType ought to be migrated to something like a Term/TermId to enable any random ontology to be loaded without any preconceptions, or easily configured to filter out unwanted parts.
So using the above examples, HPO would require no change, and GO might have a preset if we want to consider that a 'supported' ontology.
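As a rough sketch of what such an interface could look like: the method names, the RelType stand-in enum, and the contents of the GO preset below (GO prefix, is_a plus part_of) are all assumptions made for illustration, not an agreed design or existing phenol API. Node-type filtering (Node.RDFTYPES.CLASS) is left out to keep the sketch short.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch of the proposed interface; NOT phenol's actual API.
interface OntologyConfiguration {

    // Stand-in for phenol's RelationshipType.
    enum RelType { IS_A, PART_OF, UNKNOWN }

    // Term id prefixes to keep; an empty set means "keep every prefix".
    Set<String> termIdPrefixes();

    // Relationship types to keep; anything else is dropped.
    Set<RelType> relationshipTypes();

    default boolean includesTerm(String curie) {
        Set<String> prefixes = termIdPrefixes();
        if (prefixes.isEmpty()) {
            return true;
        }
        int colon = curie.indexOf(':');
        return colon > 0 && prefixes.contains(curie.substring(0, colon));
    }

    default boolean includesRelationship(RelType type) {
        return relationshipTypes().contains(type);
    }

    static OntologyConfiguration of(Set<String> prefixes, Set<RelType> types) {
        Set<String> p = Set.copyOf(prefixes);
        Set<RelType> t = types.isEmpty()
            ? EnumSet.noneOf(RelType.class)
            : EnumSet.copyOf(types);
        return new OntologyConfiguration() {
            @Override public Set<String> termIdPrefixes() { return p; }
            @Override public Set<RelType> relationshipTypes() { return t; }
        };
    }

    // Safe default: any prefix, is_a relationships only.
    static OntologyConfiguration defaultConfig() {
        return of(Set.of(), EnumSet.of(RelType.IS_A));
    }

    // Assumed GO preset: GO terms only, is_a and part_of (the choice of part_of is a guess).
    static OntologyConfiguration goPreset() {
        return of(Set.of("GO"), EnumSet.of(RelType.IS_A, RelType.PART_OF));
    }

    static void main(String[] args) {
        OntologyConfiguration go = goPreset();
        System.out.println(go.includesTerm("GO:0008150"));            // GO term
        System.out.println(go.includesTerm("BFO:0000001"));           // non-GO term
        System.out.println(go.includesRelationship(RelType.UNKNOWN)); // unknown edge type
    }
}
```

With this shape, HPO would simply use the default, and a caller who really wants the full graph could pass every RelType, including UNKNOWN, explicitly.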
Originally posted by @julesjacobsen in #184 (comment)