Help needed moving away from the MySQL database #126

juanjoDiaz · 2018-05-10T08:59:10Z

Hi,

I've been using the MySQL dump of GO in the past for my Cytoscape app (http://apps.cytoscape.org/apps/gfdnet)

I would like to move to the new model, which I understand is based on Blazegraph or OBO files.

Essentially, given a gene network, I was using GO to:

Get the list of possible genus and species so the user can select the right one for the analysis.
Match all the synonyms of a gene (in case a non-standard name is used)
Match each gene with its associated gene products (including synonyms)
Match each gene product with its associated GO terms
Construct the DAG (Tree) from the annotations of those gene products all the way to the ontology root.

You can see my DB queries in https://github.com/juanjoDiaz/gfdnet/tree/master/src/main/java/org/cytoscape/gfdnet/model/dataaccess/go

Any help to port the tool to the latest available data (either Blazegraph, OBO files, or whatnot) would be really appreciated.

kltm · 2018-05-10T16:37:51Z

@juanjoDiaz

Welcome to the transition--there's a lot going on and we're still trying to get all the documentation and links in order as well.

While OBO is still used as a format for the ontologies, you may want to look at OWL as a possibility moving forward as well: in the future the most data rich "annotation" models that we'll be creating will be based on that (often in TTL format).
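For anyone who stays with OBO for the ontology structure, here is a minimal Python sketch of reading the `is_a` links out of an OBO file. It assumes only `[Term]` stanzas with `id:` and `is_a:` tags are of interest; other relations (for example `relationship:` lines), obsolete terms, and trailing modifiers are ignored, and the function name is made up for illustration.

```python
def parse_obo_is_a(text):
    """Return {term_id: set(parent_ids)} for the [Term] stanzas in OBO text.

    Simplification (an assumption of this sketch): only `id:` and `is_a:`
    tags are read; every other tag and stanza type is skipped.
    """
    parents = {}
    current = None
    in_term = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are kept.
            in_term = line == "[Term]"
            current = None
            continue
        if not in_term or ":" not in line:
            continue
        tag, _, value = line.partition(":")
        # Drop the trailing "! human readable name" comment, if any.
        value = value.split("!", 1)[0].strip()
        if tag == "id":
            current = value
            parents.setdefault(current, set())
        elif tag == "is_a" and current is not None:
            parents[current].add(value)
    return parents


if __name__ == "__main__":
    sample = """\
[Term]
id: GO:0008150
name: biological_process

[Term]
id: GO:0009987
name: cellular process
is_a: GO:0008150 ! biological_process
"""
    print(parse_obo_is_a(sample))
```

The resulting child-to-parents map is enough to walk from any annotated term up to the root.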

The latest blazegraph dumps for a release can be found at http://current.geneontology.org, monthly-ish releases at http://release.geneontology.org (we still need to clean that out--please ignore anything before March 2018); frequent (almost daily) snapshots can be found at http://snapshot.geneontology.org. For all of these, look under products/blazegraph (for the "releases", the path starts with a date). You would be interested in the blazegraph-production.jnl.gz file.
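The layout described above can be sketched as a small URL builder. This is only an illustration of the stated pattern: the function name, the channel keys, and the date string format are assumptions, and the date used in the example is a placeholder rather than a known release.

```python
# File location under each host, as described in the comment above.
BLAZEGRAPH_PATH = "products/blazegraph/blazegraph-production.jnl.gz"

BASES = {
    "current": "http://current.geneontology.org",
    "release": "http://release.geneontology.org",
    "snapshot": "http://snapshot.geneontology.org",
}


def journal_url(channel, release_date=None):
    """Build the URL of the Blazegraph journal for a given channel.

    For "release", the path is assumed to start with a date directory;
    the exact date format is an assumption of this sketch.
    """
    base = BASES[channel]
    if channel == "release":
        if release_date is None:
            raise ValueError("a release date is required for the release channel")
        return f"{base}/{release_date}/{BLAZEGRAPH_PATH}"
    return f"{base}/{BLAZEGRAPH_PATH}"


if __name__ == "__main__":
    print(journal_url("current"))
    print(journal_url("release", "2018-04-01"))  # placeholder date
```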

I am actually unfamiliar with the relations used to store information like synonyms, or if we're even loading that into the blazegraph at this point; I'll defer comment about that and public data model documentation to @cmungall .

juanjoDiaz · 2018-05-10T17:40:29Z

Thanks for the response @kltm

I'll look into OWL and wait for @cmungall for the details :)

Couple more questions:

Is there any information about how those dump files are organized and how they match the old MySQL structure?
Is there any publicly accessible database, and are there any guidelines on how to query it?

kltm · 2018-05-10T20:16:03Z

@juanjoDiaz

For annotation/modeling data, the "dump" files that we provide are either going to be OWL TTL, so an OWL model, or a GAF 2.1. For the blazegraph, the layout in the triplestore is an extremely close analog to how the data is modeled in the OWL, but with small adjustments where they do not fit.
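Since GAF 2.1 is a tab-separated format, several of the lookups asked about earlier (gene symbol, synonyms, GO term, taxon) can be read straight from its columns. A minimal sketch, assuming the standard GAF 2.1 column order; the `Annotation` type and function name are made up here, and the sample row is fabricated purely for illustration.

```python
from collections import namedtuple

# Subset of the GAF 2.1 columns that this sketch keeps.
Annotation = namedtuple(
    "Annotation", "db object_id symbol go_id aspect synonyms taxon"
)


def parse_gaf_line(line):
    """Parse one GAF 2.1 line; return None for comment or blank lines."""
    if line.startswith("!") or not line.strip():
        return None
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 15:
        raise ValueError(f"expected at least 15 columns, got {len(cols)}")
    # Column 11 (index 10) holds pipe-separated synonyms.
    synonyms = [s for s in cols[10].split("|") if s]
    return Annotation(
        db=cols[0],
        object_id=cols[1],
        symbol=cols[2],
        go_id=cols[4],
        aspect=cols[8],
        synonyms=synonyms,
        taxon=cols[12],  # kept raw, e.g. "taxon:9606"
    )


if __name__ == "__main__":
    # Fabricated example row, not real data.
    row = "\t".join([
        "ExampleDB", "X00000", "GENE1", "", "GO:0008150", "PMID:0000000",
        "IEA", "", "P", "Example protein", "G1|G1alias", "protein",
        "taxon:9606", "20180510", "ExampleDB", "", "",
    ])
    print(parse_gaf_line(row))
```

Grouping the parsed rows by `taxon` would give the list of organisms present in a file, and grouping by `symbol` or `synonyms` would give the gene-to-term mapping.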

For the time being, we have made available http://rdf.geneontology.org . While not widely advertised, it should be usable as a beta tool as we shore it up.

suzialeksander · 2018-06-27T16:21:29Z

@cmungall, was there anything you can add to this?

juanjoDiaz · 2018-07-05T22:22:34Z

Thanks for following up!
To be honest I'm still 100% stuck on this one.

No clue on how to get the information about gene synonyms or gene products.
And no clue on how to query the new Blazegraph. I've tried the examples at https://github.com/geneontology/go-graphstore/wiki/Example-queries but they don't seem to work. http://geneontology.org/rdf/ just returns not found, for example.

I hope we can get more info soon. I'd be surprised if I were the only one having this problem. I was expecting more people to be relying on the MySQL database...

kltm · 2018-07-06T01:23:43Z

@juanjoDiaz Ugh, sorry, actually those wiki examples are a little crusty. Our public RDF endpoint is more properly at http://rdf.geneontology.org . I tagged on @dougli1sqrd , who is currently working on the graph store and new documentation--he should be able to help get you started a bit.

dougli1sqrd · 2018-07-10T00:38:20Z

Hi @juanjoDiaz!
Check out https://github.com/geneontology/go-site/blob/master/graphstore/triplestore_info.md. This is an in-progress document about how we construct the triplestore. SPARQL is the way to query the data there, and I have links to SPARQL tutorials in the document. There is some explanation of the data model, as well as links to resources with more detail on GO-CAM models.

I can also help with specific SPARQL queries you might want. Check out the document, and let me know what other questions you have.

FYI, we currently do not have taxon information on gene products in the triple store directly. But we are actively working on that right now, and should have that completed in a day or so, trickling into the triple store after that.

juanjoDiaz · 2018-07-15T16:12:51Z

Thanks @dougli1sqrd !

The queries that I'm trying to do are listed on my first message.

Get the list of available genus and species in GO.
Given a gene name get all the synonyms (so, in case a non-standard name is used, I can use the standard one for the following queries)
Given a gene get its gene products (including synonyms)
Given a gene product get its associated GO terms
Construct the DAG (Tree) from the annotations of those gene products all the way to the ontology root. I guess I might not need this anymore if OBO/OWL/Blazegraph/etc. have a way to get a GO term's depth, the shortest distance between two GO terms, and the LCA (lowest common ancestor) of two GO terms.
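For reference, the three graph measures in the last item can be computed from a plain child-to-parents map, wherever that map comes from. A minimal sketch on a toy DAG: the node names are invented, only one edge type is considered, "depth" is taken as the fewest edges to the root, and "distance" is measured through a shared ancestor. Because a term can have several parents, both values depend on which path is chosen.

```python
from collections import deque


def ancestor_distances(term, parents):
    """Breadth-first walk upward: {ancestor: fewest edges from term}."""
    dist = {term: 0}
    queue = deque([term])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, ()):
            if parent not in dist:
                dist[parent] = dist[node] + 1
                queue.append(parent)
    return dist


def depth(term, root, parents):
    """Fewest edges from term up to root (one of several possible depths)."""
    return ancestor_distances(term, parents)[root]


def lowest_common_ancestors(a, b, parents):
    """Common ancestors that are not an ancestor of another common ancestor."""
    common = set(ancestor_distances(a, parents)) & set(ancestor_distances(b, parents))
    lowest = set(common)
    for node in common:
        lowest -= set(ancestor_distances(node, parents)) - {node}
    return lowest


def shortest_distance(a, b, parents):
    """Fewest edges between a and b when passing through a shared ancestor."""
    da, db = ancestor_distances(a, parents), ancestor_distances(b, parents)
    common = set(da) & set(db)
    if not common:
        return None
    return min(da[node] + db[node] for node in common)


if __name__ == "__main__":
    # Toy DAG with invented names; C has two parents.
    toy = {"A": ["root"], "B": ["root"], "C": ["A", "B"], "D": ["A"]}
    print(depth("C", "root", toy))
    print(lowest_common_ancestors("C", "D", toy))
    print(shortest_distance("C", "D", toy))
```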

You can see my MySQL DB queries in https://github.com/juanjoDiaz/gfdnet/tree/master/src/main/java/org/cytoscape/gfdnet/model/dataaccess/go

Any help on getting those queries in the new system is more than welcome.

suzialeksander · 2018-07-30T18:08:25Z

@dougli1sqrd can you help out @juanjoDiaz with these queries?
Thanks!

kltm · 2018-07-30T21:32:09Z

@juanjoDiaz
Unfortunately, while we have some examples (https://github.com/geneontology/sparqlr/tree/master/templates), and are working on getting more examples into the system, we don't really have the capacity to work through these right now. As well, on a little inspection with our public endpoint, some of these queries would time out before completion.
For the final query you have, it may be worth noting that "distance" and term "depth" are often not particularly information carrying concepts in the GO in many use cases, as there are multiple paths over the closure of many types of relationships. For some of your use cases, you may actually be interested in the BioLink API https://biolink.geneontology.io/ (apologies, HTTPS exception needed for the moment).

suzialeksander · 2018-08-23T22:12:17Z

@juanjoDiaz, is there anything else we can clarify before I close this ticket?

juanjoDiaz · 2018-09-02T15:42:43Z

Hi @suzialeksander,
Sorry for the delay answering.

Unfortunately, I still haven't been able to find a clear path to migrate away from MySQL.

I understand from @kltm's response that the BioLink API might be a better option than using the new GO database directly. Is my understanding correct that BioLink just wraps the new GO database together with other databases to offer a consistent API?

I've taken a look at the API definition and it seems that I'd be able to build the subsection of the GO tree for a set of genes using:

/bioentity/goterm/{id}/genes/ to extract the related GO terms
/ontol/subgraph/{ontology}/{node} to get the tree up to the root for each GO term
custom code to assemble the tree at my end.

Is that the suggested approach?
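To make the plan concrete, here is a rough Python sketch of the two route templates plus the "custom code" step. The base URL is a placeholder, the function names are invented, and the shape of the API responses is not assumed at all: the merge step takes (child, parent) pairs that would have to be extracted from each subgraph response first.

```python
from urllib.parse import quote


def genes_route(base_url, go_id):
    """Fill in the /bioentity/goterm/{id}/genes/ route template."""
    return f"{base_url.rstrip('/')}/bioentity/goterm/{quote(go_id, safe=':')}/genes/"


def subgraph_route(base_url, ontology, node):
    """Fill in the /ontol/subgraph/{ontology}/{node} route template."""
    base = base_url.rstrip("/")
    return f"{base}/ontol/subgraph/{quote(ontology)}/{quote(node, safe=':')}"


def merge_edges(edge_lists):
    """Union (child, parent) pairs from several subgraphs into one parent map."""
    parents = {}
    for edges in edge_lists:
        for child, parent in edges:
            parents.setdefault(child, set()).add(parent)
            parents.setdefault(parent, set())  # make sure roots appear as keys
    return parents


if __name__ == "__main__":
    base = "https://example.org/api"  # placeholder, not the real service URL
    print(genes_route(base, "GO:0008150"))
    print(subgraph_route(base, "go", "GO:0008150"))
    # Invented edges standing in for two per-term subgraphs.
    print(merge_edges([[("C", "A"), ("A", "root")], [("D", "A"), ("A", "root")]]))
```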

However, it also seems that I won't be able to get the list of available genus and species in GO.
I can't see any reference to genus or species. How do I specify the organism when running queries against GO like the ones mentioned above?

Also, I'd like to know how production-ready this BioLink API is. I wouldn't like to put in the effort to migrate from MySQL to BioLink only to have it changed or deprecated and have to start all over again.

Thanks for all the support!

[P.S.: as a side note, I disagree on "distance" and "depth" not being relevant concepts. They definitely are relevant for my research (https://www.sciencedirect.com/science/article/pii/S1532046417300382) 🙂]

kltm · 2018-09-04T21:10:42Z

@juanjoDiaz No worries--it's not like we're speed demons ourselves.

The BioLink API (https://github.com/biolink/) is indeed in production and can be used for some things (it is currently used by Monarch and the Alliance Genome Ribbon), but it is still in progress, with routes still being filled out and fixed as use cases arise. It may be useful to look at their tracker (biolink-api) and see if the features that you want are in progress, or to suggest them for addition.

That said, for what you're doing, you may have better luck with directly sending queries to the SPARQL endpoint (http://rdf.geneontology.org) to get the data that you're interested in.
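As a starting point, a query can be sent with nothing but the Python standard library, following the standard SPARQL protocol (a GET request with a `query` parameter and JSON results). This is a sketch only: the exact path of the SPARQL service under http://rdf.geneontology.org is an assumption to be checked, the function names are invented, and the query is a generic triple listing that assumes nothing about the GO data model.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_sparql_request(endpoint, query):
    """Build a SPARQL-protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def bindings_to_rows(data):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in data["results"]["bindings"]
    ]


def run_select(endpoint, query, timeout=30):
    """Send the query and return the flattened rows (needs network access)."""
    with urlopen(build_sparql_request(endpoint, query), timeout=timeout) as resp:
        return bindings_to_rows(json.load(resp))


if __name__ == "__main__":
    # Assumption: the SPARQL service path may differ from the bare host.
    endpoint = "http://rdf.geneontology.org/blazegraph/sparql"
    query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"
    for row in run_select(endpoint, query):
        print(row)
```

Keeping a `LIMIT` on exploratory queries and a client-side timeout seems prudent on a shared public endpoint.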

[I'd agree that distance may be fine for some kinds of topological use cases; we just try and get users to think about what they're doing in case they are using it as a proxy for some concept of "information", in which case it may be misleading, depending on the use. It's boilerplate that I add to any query we get that makes reference to anything "depth"-like.]

suzialeksander · 2018-10-30T22:36:09Z

Hi @juanjoDiaz, did you have any more questions on this right now? If not, I'd like to close this ticket, and you are welcome to open a new ticket anytime. If there's something else on this ticket you still would like more information on, please let us know!

Thanks!

juanjoDiaz · 2018-10-31T07:49:10Z

Hi @suzialeksander ,

Feel free to close this ticket. When I can finally start working on this, I'll create more specific tickets as questions arise.

Thanks everyone for the help!

kltm added accepted Software labels May 10, 2018

kltm assigned cmungall May 10, 2018

kltm mentioned this issue May 14, 2018

archive/lite/latest/go_weekly-assocdb.rdf-xml.gz empty or missing. #127

Closed

suzialeksander added the stale label Jun 27, 2018

suzialeksander removed the stale label Jul 5, 2018

kltm assigned dougli1sqrd and unassigned cmungall Jul 6, 2018

suzialeksander added the stale label Aug 23, 2018

kltm removed the stale label Sep 4, 2018

suzialeksander closed this as completed Nov 2, 2018

hoogla mentioned this issue Jun 26, 2020

How to query GO graph and all annotations locally? #264

Closed
