Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create an OntologyConfiguration class #260

Closed
julesjacobsen opened this issue Feb 27, 2020 · 11 comments
Closed

Create an OntologyConfiguration class #260

julesjacobsen opened this issue Feb 27, 2020 · 11 comments
Labels
enhancement New feature or request

Comments

@julesjacobsen
Copy link
Contributor

Currently the OboGraphDocumentAdaptor takes a limited configuration in the form of a CurieUtil and a set of term id prefixes in order to filter out unwanted/unknown nodes. For the HPO and MPO at least there is no additional configuration required in order to get the graph required for Phenol type operations.

// The HPO is in the default  curie map and only contains known relationships / HP terms
Ontology hpoOntology = OntologyLoader.loadOntology(hpoFile)

For the GO however it becomes more complicated with the addition or RO and BFO terms with relationships outside of the current enum.

// The GO is in the default curie map but also contains BFO and RO terms with unknown relationships
// we want to ignore these so here we specify the term prefixes we want to use. It has three possible root
// nodes (biological_process, cellular_component, biological_function) so an artificial GO:0000000 root is
// added
Ontology goOntology = OntologyLoader.loadOntology(goFile, "GO")

in response to issue #163 the RelationshipType would no longer explode if given an unknown relationship and would instead return a RelationshipType .UNKNOWN. The issue now is that people could mistakenly try to open an ontology and then have a bunch of meaningless relationships. So one question is should these be removed by default? (#163 (comment) and #163 (comment) suggest yes)

If removal default removal of RelationshipType .UNKNOWN is the generally accepted way to go then ought there be cases where we want to be able to define all the required nodes and relationship types.

How far do we want to go with this? Could it be the case where we might want to work with a full and complicated graph exactly as it comes out of ontobio?

ECTO being one example of containing too much information. Loading it like this gives us a sinle graph of is_a relations an only ECTO nodes:

// ECTO isn't mapped in the default Curie mappings, so we need to add it here (the PURL isn't correct)
CurieUtil curieUtil = CurieUtilBuilder.withDefaultsAnd(ImmutableMap.of("ECTO", "http://purl.obolibrary.org/obo/ECTO_"));

// ECTO also contains a bunch of unknown relationships so we're going to simplify this graph by only
// loading ECTO nodes (this ignores the true root term XCO:0000000) and other nodes from CHEBI,
// BFO and UBERON among others.
Ontology ecto = OntologyLoader.loadOntology(ectoFile, curieUtil, "ECTO");

However loading this, produces a lot of CHEBI, UBERON, BFO, RO nodes and a lot of UNKNOWN relationship types:

// ECTO isn't mapped in the default Curie mappings, so we need to add it here (the PURL isn't correct)
CurieUtil curieUtil = CurieUtilBuilder.withDefaultsAnd(ImmutableMap.of("ECTO", "http://purl.obolibrary.org/obo/ECTO_"));

// This now contains most of what was in the original graph loaded from ontolib
Ontology ecto = OntologyLoader.loadOntology(ectoFile, curieUtil);

Bearing in mind Chris' comments, I think it should now be relatively easy to provide the OboGraphDocumentAdaptor with an OntologyConfiguration (working name) which is an interface for enabling users to specify exactly what they want to see in the output. This will require users to be aware that the ontology they're loading might just enable them to create a horrible mess if they wish, but ought to stick to a safe default of only Node.RDFTYPES.CLASS, and is_a relationships for example. During this work the RelationshipType ought to be migrated to something like a Term/TermId to enable any random ontology to be loaded without any preconceptions, or easily configured to filter out unwanted parts.

So using the above examples, HPO would require no change, GO might have a preset if we want to consider that a 'supported' ontology.

Ontology hpoOntology = OntologyLoader.loadOntology(hpoFile)
// would be this under the hood:
Ontology hpoOntology = OntologyLoader.loadOntology(hpoFile, OntologyConfiguration.default());

// special GO config to replicate current setup returns a GoOntologyConfiguration 
Ontology goSubsetOntology = OntologyLoader.loadOntology(goFile, OntologyConfiguration.forGo())

// Empty configuration to reproduce the original original graph loaded from ontobio
Ontology goFullOntology = OntologyLoader.loadOntology(goFile, OntologyConfiguration.empty())

// ECTO isn't mapped in the default Curie mappings, so we need to add it here (the PURL isn't correct)
CurieUtil curieUtil = CurieUtilBuilder.withDefaultsAnd(ImmutableMap.of("ECTO", "http://purl.obolibrary.org/obo/ECTO_"));
OntologyConfiguration userDefinedConfiguration = OntologyConfiguration .builder()...build();

// This now contains most of what was in the original graph loaded from ontobio
Ontology ecto = OntologyLoader.loadOntology(ectoFile, userDefinedConfiguration);

Originally posted by @julesjacobsen in #184 (comment)

@julesjacobsen
Copy link
Contributor Author

#184 (comment)

+1
It would be great for us to define "sensible" defaults for ontologies we plan to use ourselves, and to document the library sufficiently that others will understand how to deviate from the default.

OntologyConfiguration.forGo()
OntologyConfiguration.forEcto()
.... for parts of the NCIT too?

would be great!

@julesjacobsen julesjacobsen changed the title Create an OntologyConfiguration class Create an OntologyConfiguration class Feb 27, 2020
@julesjacobsen julesjacobsen changed the title Create an OntologyConfiguration class Create an OntologyConfiguration class Feb 27, 2020
@pnrobinson
Copy link
Member

This seems useful. We should beware of feature scope and define what ontologies phenol is "for" -- certainly all the phenotype ontologies, GO, MONDO, ECTO, probably also NCIT.

@pnrobinson
Copy link
Member

@julesjacobsen Should I have a first go at this? For now, do we basically want the configurations to specify the ontologies whose terms should be included? The default would remain "everything"?

@pnrobinson
Copy link
Member

@julesjacobsen Should we address this? I am not 100% sure what the status is -- should I draft an OntologyConfiguration class as above? The default (empty) would be to take everything, and we would make some sensible choices for HP, MP, GO, MONDO, and ECTO?

@pnrobinson
Copy link
Member

@julesjacobsen Trying to understand the issue here -- any chance we can work on the GO configuration class as an example? Would the configuration classes simply contain recommended lists of term prefixes, i.e., from GO we would import GO terms but not BFO/RO etc? And the curie_map takes care of relations? For GO, we additionally need owl:Thing as the artificial root, I think.

@julesjacobsen
Copy link
Contributor Author

@pnrobinson yes, that's the idea, if I can remember that far back! I guess the general idea is to filter for terms in the RelationshipType class and a defined set based on wanted prefixes, e.g. GO:. Definitely want to filter for known things as who knows what unknown things might emerge in the future and break things.

@pnrobinson
Copy link
Member

@julesjacobsen it seems we need to make the new classes interact with the OntologyLoader, which has mutliple CTORs including this

public static Ontology loadOntology(GraphDocument graphDocument, CurieUtil curieUtil, String... termIdPrefixes) {

So it seems that the easiest way to move forward would be to replace String... termIdPrefixes with and OntologyConfiguration object that would return the needed prefixes.
Also, OntologyConfiguration would have some static members for GO and HP etc and also would have a constructor like

OntologyConfiguration constructConfiguration(String... termIdPrefixes) {
...
}

which would allow an arbitrary list of prefixes?

Right now, loadOntology just works for a number of our ontologies, so I am wondering if it would be better to make the class less forgiving and add some static methods such as

loadHpOntology(File f)

@pnrobinson
Copy link
Member

@julesjacobsen looking at the code, it seems we do not check for individual prefixes right now (i.e., if we passed GO:1234567 as part of hp.obo, I do not think the current code would throw an error. This is possibly but not certainly forgivable :-0, but it might be OK to leave this altogether for now. In the medium term, if we want to bring phenol up to data (Java 11 or 14) we will probably need to remove the OWL-API dependency, and so this will require more surgery. I am not sure it is worth just fixing this right now.

@julesjacobsen
Copy link
Contributor Author

@pnrobinson Sorry, I've been swamped with childcare / homeschool / lockdown distractions this week and haven't had time to look at this in any meaningful way. It's been open for a while so clearly isn't that critical, but would certainly be a nice to have. If you think it worth me looking into I can devote some time to it next week. Like you I'm similarly rusty with this code, so I'll need to spend a bit of time to re-load it all back into RAM :)

@pnrobinson
Copy link
Member

@julesjacobsen I think this issue has become stale.
We should at some point write an application note for phenol and clean up and document a few things, but this is not high priority right now.

@julesjacobsen julesjacobsen added the enhancement New feature or request label Jul 1, 2024
@julesjacobsen
Copy link
Contributor Author

I've added an 'enhancement' label in case we ever want to find this again. This was obviously never high enough priority to need to implement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants