Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

Refactor or better document loaders #57

Open
cmungall opened this Issue Feb 28, 2017 · 10 comments

Comments

Projects
None yet
3 participants
Owner

cmungall commented Feb 28, 2017

The ways in which ontologies and annotations are loaded is confusing and not well documented.

  • everything can be combined into one OWL file, which works for us in monarch but it not intuitive, and requires something like owltools to make the files
  • the yaml allows stratification of ontologies and instances TSVs. However, there are 2 kinds of TSVs: associations and labels. It seems only the former is supported
  • we need to make this extensible, e.g. to load frequency info for #56
  • we have phenopacket loading in a branch by @balhoff but not fully integrated yet?

ultimately we want to be able to fetch latest version from API, to avoid stale files in github

cc @jnguyenx @damiansm @kshefchek

Contributor

julesjacobsen commented Mar 7, 2017

migrating relevant comments over to here...

Whilst setting up the new code, I was able to create a new BMKnowledgeBase using the OWLLoader like so:

public BMKnowledgeBase bmKnowledgeBase() throws Exception {
        OWLLoader owlLoader = new OWLLoader();
        owlLoader.load("src/main/resources/ontologies/hp.obo");
        owlLoader.loadDataFromTsvGzip("src/main/resources/data/human-pheno.assocs.gz");
        return owlLoader.createKnowledgeBaseInterface();
    }

This loaded and started up OK, but when a set of HP terms (HP:0001156,HP:0001363,HP:0011304,HP:0010055) were passed in to the controller

@GetMapping(path = "phenodigm")
public MatchSet matchPhenodigm(@RequestParam(value = "id") Set<String> ids) throws IncoherentStateException {
        ProfileMatcher phenodigmProfileMatcher = PhenodigmICProfileMatcher.create(bmKnowledgeBase);
        logger.info("Created {} profile matcher", phenodigmProfileMatcher.getShortName());
        ProfileQuery query = ProfileQueryFactory.createQuery(ids);
        logger.info("Querying {} matcher with Class ids: {}", phenodigmProfileMatcher.getShortName(), query.getQueryClassIds());
        return phenodigmProfileMatcher.findMatchProfile(query);
    }

the following NPE was thrown

java.lang.NullPointerException: null
	at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:210) ~[guava-18.0.jar:na]
	at org.monarchinitiative.owlsim.kb.impl.BMKnowledgeBaseOWLAPIImpl.getIndexForClassNode(BMKnowledgeBaseOWLAPIImpl.java:740) ~[owlsim-core-3.0-SNAPSHOT.jar:na]
	at org.monarchinitiative.owlsim.kb.impl.BMKnowledgeBaseOWLAPIImpl.getIndex(BMKnowledgeBaseOWLAPIImpl.java:625) ~[owlsim-core-3.0-SNAPSHOT.jar:na]
	at org.monarchinitiative.owlsim.kb.impl.BMKnowledgeBaseOWLAPIImpl.getClassIndex(BMKnowledgeBaseOWLAPIImpl.java:634) ~[owlsim-core-3.0-SNAPSHOT.jar:na]
	at org.monarchinitiative.owlsim.compute.matcher.impl.AbstractProfileMatcher.getProfileSetBM(AbstractProfileMatcher.java:94) ~[owlsim-core-3.0-SNAPSHOT.jar:na]
	at org.monarchinitiative.owlsim.compute.matcher.impl.PhenodigmICProfileMatcher.findMatchProfileImpl(PhenodigmICProfileMatcher.java:66) ~[owlsim-core-3.0-SNAPSHOT.jar:na]
	at org.monarchinitiative.owlsim.compute.matcher.impl.AbstractProfileMatcher.findMatchProfileAll(AbstractProfileMatcher.java:223) ~[owlsim-core-3.0-SNAPSHOT.jar:na]
	at org.monarchinitiative.owlsim.compute.matcher.impl.AbstractProfileMatcher.findMatchProfile(AbstractProfileMatcher.java:194) ~[owlsim-core-3.0-SNAPSHOT.jar:na]
	at org.monarchinitiative.controllers.MatchController.matchPhenodigm(MatchController.java:43) ~[classes/:na]

debugging the application it turned out this was because there was no curie mappings provided to the BMKnowledgebase, so this is a bit of an issue with the OWLLoader which provides no means of adding a set of mappings to the CurieUtil.

Contributor

julesjacobsen commented Mar 7, 2017 edited

To fix the above, and to try and lead people through the creating of the BMKnowledgeBase, I've implemented a new OwlKnowlegeBase class. This is a bit of a mash-up of the OWLLoader and KnowledgeBaseModule which allows for fluent loading of a BMKnowledgeBase. For example

    OwlKnowledgeBase.loader()
            .loadCuries(curies())
            .loadOntology("src/main/resources/ontologies/hp.obo")
            .loadData("src/main/resources/data/human-pheno.assocs.gz")
            .createKnowledgeBase();

Where curies() defines a map {'HP' : 'http://purl.obolibrary.org/obo/HP_'}
This partly addresses this issue. It's not fully ready yet, but it works - the main issue being that things need to be called in the correct order, which is clearly a bad thing.

Owner

drseb commented Mar 8, 2017

Is the code of OwlKnowledgeBase available somewhere?

Also, what is the rational behind the format of the G2P annotations (human-pheno.assocs.gz) ? This is yet another format... can we create a public addX2Pannotation(...) method so that I can put content into the KB from my code?

Contributor

julesjacobsen commented Mar 8, 2017

Not yet available - I'm working on it, but will let you know when I'm done.

We need to think more on the actual import format used - I'm just taking what's currently in place and trying to consolidate code to reduce duplication. The loadData() method currently parses the data and loads it into an OWLOntology. It needn't have to take a file - it could accept a list of objects with an id (Individual) and a phenotype term (Class).

Alternatively, it might be worthwhile allowing people to define their own parsers and domain objects so long as these implement standard interfaces. That way you can load up a knowledgebase with any given set of data, providing they have a unique identifier and a set of associated phenotypes. Ultimately these all end up as asserted axioms of class to individual.

Contributor

julesjacobsen commented Mar 8, 2017

@drseb do you want to load these annotations as a Map or from file or both? I think it only really makes sense to load them in in bulk as they are all added to the ontology which is then passed to the BMKnowledgeBase. The data could either be a Map<String, List> where the key is the individual's identifier and the List are the curies to be associated with the individual. Alternatively a map or String key value pairs such that the individual's identifier is repeated with every class assertion. What would you prefer? @cmungall @jnguyenx - ideas?

Owner

drseb commented Mar 8, 2017

Loading in bulk is totally ok. I would suggest to make the key of the Map generic - sometimes one may use int (entrez-id) and sometimes Strings (disease-id).

Owner

cmungall commented Mar 9, 2017

We use URIs internally as identifiers. Anything coming in or our has to be a CURIE or a URI. We have all the prefixes you should need in the Monarch prefixmap, e.g. NCBIGene

Owner

drseb commented Mar 9, 2017

And if my annotation set is a bunch of patients? Or papers?

Contributor

julesjacobsen commented Mar 9, 2017 edited

I think you can define these in the CurieUtil initialisation map, making-up a namespace. So you could map PTNT:1, PTNT:2 etc. using the mapping "PTNT": "http://seb.org/PTNT_".

In the tests there are examples of gene identifiers just being used as is without any real URI, these are mapped to HP terms or whatever, so these will probably be OK, at least for matching purposes.

Papers presumably have a pubmed id or some such. These are mapped in Monarch using these identifiers:

# publication/reference sources
  'DOI' : 'http://dx.doi.org/'
  'GeneReviews' : 'http://www.ncbi.nlm.nih.gov/books/'  # diseases too
  'ISBN': 'https://monarchinitiative.org/ISBN_'
  'ISBN-10': 'https://monarchinitiative.org/ISBN10_'
  'ISBN-13': 'https://monarchinitiative.org/ISBN13_'
  'ISBN-15': 'https://monarchinitiative.org/ISBN15_'
  'J' : 'http://www.informatics.jax.org/reference/J:'  # MGI-internal identifiers for pubs
  'MPD':  'http://phenome.jax.org/'
  'MPD-assay': 'http://phenome.jax.org/db/qp?rtn=views/catlines&keymeas='
  'PMID': 'http://www.ncbi.nlm.nih.gov/pubmed/'
  'PMCID' : 'http://www.ncbi.nlm.nih.gov/pmc/'
  'AQTLPub' : 'http://www.animalgenome.org/cgi-bin/QTLdb/BT/qabstract?PUBMED_ID='
  'GO_REF' : 'http://www.geneontology.org/cgi-bin/references.cgi#GO_REF:'

@julesjacobsen julesjacobsen added a commit to julesjacobsen/owlsim-v3 that referenced this issue Mar 13, 2017

@julesjacobsen julesjacobsen Issue #57 - Added new OwlKnowledgeBase class as a main entry point fo…
…r creating a BMKnowledgeBaseOWLAPIImpl - @deprecated OWLLoader.

Added new supporting classes Ontology and OntologyDataSource, OntologyLoadException to help the OwlKnowledgeBase do its thing.
OWLLoader is only partially removed from the tests at present.
Removed overly verbose logging from FilterEngine.
Added commented-out clean-up of KnowledgeBaseModule - server still fails to start with ot without this.
b3c3422
Contributor

julesjacobsen commented Mar 15, 2017 edited

@drseb I've added a loadDataFromMap method for inserting individuals. Here is a complete code snippet for loading the HPO with a single ORPHA:710 individual:

Map<String, String> curies = new LinkedHashMap<>();
curies.put("ORPHA", "'http://www.orpha.net/ORDO/Orphanet_");
curies.put("HP", "http://purl.obolibrary.org/obo/HP_");

Map<String, Collection<String>> individuals = new LinkedHashMap<>();
individuals.put("ORPHA:710", Arrays.asList("HP:0000194",            
        "HP:0000218",                                               
        "HP:0000262",                                               
        "HP:0000303",                                               
        "HP:0000316",                                               
        "HP:0000322",                                               
        "HP:0000324",                                               
        "HP:0000348",                                               
        "HP:0000431",                                               
        "HP:0000470",                                               
        "HP:0000508",                                               
        "HP:0001156",                                               
        "HP:0001385",                                               
        "HP:0003307",                                               
        "HP:0004209",                                               
        "HP:0004322",                                               
        "HP:0005048",                                               
        "HP:0006101",                                               
        "HP:0009773",                                               
        "HP:0010669",                                               
        "HP:0011304",                                               
        "HP:0012368"));                                         

BMKnowledgeBase knowledgeBase = BMKnowledgeBase.owlLoader()
                .loadOntology("http://purl.obolibrary.org/obo/hp.owl")
                .loadCuries(curies)
                .loadDataFromMap(data)
                .createKnowledgeBase();

Will this work for you, or do you need some other way of loading in the individuals data?

@julesjacobsen julesjacobsen added a commit to julesjacobsen/owlsim-v3 that referenced this issue Mar 15, 2017

@julesjacobsen julesjacobsen Issue #57 Added static owlLoader() method to BMKnowledgebase to make …
…API more self-explanatory.
bdee79f

@julesjacobsen julesjacobsen added a commit to julesjacobsen/owlsim-v3 that referenced this issue Mar 15, 2017

@julesjacobsen julesjacobsen Issue #57 Added methods to enable adding individuals to the Ontology …
…via the OwlKnowledgeBase and the OntologyDataSource.
ed63056

@julesjacobsen julesjacobsen added a commit to julesjacobsen/owlsim-v3 that referenced this issue Mar 17, 2017

@julesjacobsen julesjacobsen Issue #57 changed uses of 'data' to 'individualAssociations' in conte…
…xt of the Ontology/OntologySourceData/OwlKnowledgebase in order to be more specific.
75bf89f
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment