Help needed moving away from the MySQL database #126
Comments
Welcome to the transition. There's a lot going on, and we're still trying to get all the documentation and links in order as well. While OBO is still used as a format for the ontologies, you may also want to look at OWL moving forward: in the future, the most data-rich "annotation" models that we'll be creating will be based on it (often in TTL format). The latest blazegraph dumps for a release can be found at http://current.geneontology.org, monthly-ish releases at http://release.geneontology.org (we still need to clean that out; please ignore anything before March 2018), and frequent (almost daily) snapshots at http://snapshot.geneontology.org. For all of these, look under products/blazegraph, with a date first in the path for the "releases". You would be interested in the blazegraph-production.jnl.gz file. I am actually unfamiliar with the relations used to store information like synonyms, or whether we're even loading that into the blazegraph at this point; I'll defer comment on that and on public data model documentation to @cmungall . |
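To make the layout described above concrete, here is a minimal sketch that builds the download URL for the journal file on each of the three hosts. The function name `blazegraph_dump_url` is hypothetical, and the exact form of the date path segment for releases (assumed here to be something like `YYYY-MM-DD`) is an assumption; check the directory listing on the release host before relying on it.

```python
from typing import Optional

# File name and hosts are taken from the comment above.
BLAZEGRAPH_FILE = "blazegraph-production.jnl.gz"
HOSTS = {
    "current": "http://current.geneontology.org",
    "release": "http://release.geneontology.org",
    "snapshot": "http://snapshot.geneontology.org",
}


def blazegraph_dump_url(channel: str = "current",
                        release_date: Optional[str] = None) -> str:
    """Build the URL of the blazegraph journal for a given channel.

    Assumption: for "release", the date is the first path segment, and
    its format (e.g. YYYY-MM-DD) must match what the server exposes.
    """
    if channel not in HOSTS:
        raise ValueError(f"unknown channel: {channel!r}")
    parts = [HOSTS[channel]]
    if channel == "release":
        if not release_date:
            raise ValueError("a release date is required for 'release'")
        parts.append(release_date)
    parts += ["products", "blazegraph", BLAZEGRAPH_FILE]
    return "/".join(parts)
```

The resulting URL can be passed to any HTTP client (for example `urllib.request.urlretrieve`) to fetch the gzipped journal.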
Thanks for the response, @kltm. I'll look into OWL and wait for @cmungall for the details :) A couple more questions:
|
For annotation/modeling data, the "dump" files that we provide are going to be either OWL TTL (that is, an OWL model) or GAF 2.1. For the blazegraph, the layout in the triplestore is an extremely close analog of how the data is modeled in the OWL, with small adjustments where the two do not fit. For the time being, we have made available http://rdf.geneontology.org . While not widely advertised, it should be usable as a beta tool as we shore it up. |
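Since GAF 2.1 is one of the dump formats mentioned above, and gene synonyms and taxa were among the things asked about, here is a rough sketch of reading those fields from a GAF file. It relies on the GAF 2.1 column layout (tab-separated, comment lines starting with `!`, synonyms in column 11 separated by `|`, taxon in column 13). The function name `parse_gaf_lines` and the returned dictionary keys are made up for illustration.

```python
from typing import Dict, Iterable, Iterator


def parse_gaf_lines(lines: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Yield a small record for each annotation line of a GAF 2.1 file."""
    for line in lines:
        line = line.rstrip("\r\n")
        # Header and comment lines start with "!".
        if not line or line.startswith("!"):
            continue
        cols = line.split("\t")
        # GAF 2.1 has 17 columns, the last two optional; skip short lines.
        if len(cols) < 15:
            continue
        yield {
            "db": cols[0],            # column 1: DB
            "object_id": cols[1],     # column 2: DB Object ID
            "symbol": cols[2],        # column 3: DB Object Symbol
            "go_id": cols[4],         # column 5: GO ID
            "synonyms": [s for s in cols[10].split("|") if s],  # column 11
            "taxon": cols[12],        # column 13: taxon (e.g. taxon:9606)
        }
```

Usage would look like `for rec in parse_gaf_lines(open(path, encoding="utf-8")): ...`, after decompressing the file if it is gzipped.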
@cmungall, is there anything you can add to this? |
Thanks for following up! I still have no clue how to get the information about gene synonyms or gene products. I hope we can get more info soon. I'd be surprised if I were the only one having this problem. I was expecting more people to be relying on the MySQL database... |
@juanjoDiaz Ugh, sorry, actually those wiki examples are a little crusty. Our public RDF endpoint is more properly at http://rdf.geneontology.org . I tagged on @dougli1sqrd , who is currently working on the graph store and new documentation--he should be able to help get you started a bit. |
Hi @juanjoDiaz! I can also help with specific SPARQL queries you might want. Check out the document, and let me know what other questions you have. FYI, we currently do not have taxon information on gene products in the triple store directly. But we are actively working on that right now, and should have that completed in a day or so, trickling into the triple store after that. |
Thanks @dougli1sqrd ! The queries that I'm trying to run are listed in my first message.
You can see my MySQL DB queries at https://github.com/juanjoDiaz/gfdnet/tree/master/src/main/java/org/cytoscape/gfdnet/model/dataaccess/go Any help with porting those queries to the new system is more than welcome. |
@dougli1sqrd can you help out @juanjoDiaz with these queries? |
@juanjoDiaz |
@juanjoDiaz, is there anything else we can clarify before I close this ticket? |
Hi @suzialeksander, Unfortunately, I still haven't been able to find a clear path to migrate away from MySQL. I understand from @kltm's response that the BioLink API might be a better option than using the new GO database directly. Is my understanding correct that BioLink just wraps the new GO database together with other databases to offer a consistent API? I've taken a look at the API definition, and it seems that I'd be able to build the subsection of the GO tree for a set of genes using:
Is that the suggested approach? However, it also seems that I won't be able to get the list of available genera and species in GO. Also, I'd like to know how production-ready this BioLink API is. I wouldn't like to put in the effort to migrate from MySQL to BioLink only to have it changed or deprecated and have to start all over again. Thanks for all the support! [P.S.: as a side note, I disagree that "distance" and "depth" are not relevant concepts. They definitely are relevant for my research (https://www.sciencedirect.com/science/article/pii/S1532046417300382) 🙂] |
@juanjoDiaz No worries; it's not like we're speed demons ourselves. The BioLink API (https://github.com/biolink/) is indeed in production and can be used for some things (it is currently used by Monarch and the Alliance Genome Ribbon), but it is still in progress, with routes still being filled out and fixed as use cases arise. It may be useful to look at their tracker (biolink-api) and see if the features that you want are in progress, or to suggest them for addition. That said, for what you're doing, you may have better luck sending queries directly to the SPARQL endpoint (http://rdf.geneontology.org) to get the data that you're interested in. [I'd agree that distance may be fine for some kinds of topological use cases; we just try to get users to think about what they're doing in case they are using it as a proxy for some concept of "information", in which case it may be misleading, depending on the use. It's boilerplate that I add to any query we get that makes reference to anything "depth"-like.] |
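As a starting point for querying the SPARQL endpoint directly, here is a minimal sketch using only the Python standard library. Several things in it are assumptions rather than confirmed details of the GO setup: the service path `/blazegraph/sparql` under the host named above, and that ontology `rdfs:subClassOf` triples are loaded in the store. The helper names are hypothetical. The query only follows is_a edges between named classes; other relations such as part_of are modeled as OWL restrictions and would need a different pattern.

```python
import json
import urllib.parse
import urllib.request
from typing import List

# Assumption: the SPARQL service path under the host named in the thread.
SPARQL_ENDPOINT = "http://rdf.geneontology.org/blazegraph/sparql"


def ancestors_query(go_id: str) -> str:
    """SPARQL for all named is_a ancestors of a term such as 'GO:0008150'."""
    iri = "http://purl.obolibrary.org/obo/" + go_id.replace(":", "_")
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?ancestor WHERE {\n"
        f"  <{iri}> rdfs:subClassOf+ ?ancestor .\n"
        "  FILTER(isIRI(?ancestor))\n"  # drop blank-node restrictions
        "}"
    )


def build_request(query: str,
                  endpoint: str = SPARQL_ENDPOINT) -> urllib.request.Request:
    """Build a SPARQL-protocol GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})


def binding_values(results: dict, var: str) -> List[str]:
    """Extract one variable's values from SPARQL JSON results."""
    return [b[var]["value"]
            for b in results["results"]["bindings"] if var in b]


def run_query(query: str) -> dict:
    """Send the query over the network and decode the JSON response."""
    with urllib.request.urlopen(build_request(query), timeout=30) as resp:
        return json.load(resp)
```

A call would look like `binding_values(run_query(ancestors_query("GO:0008150")), "ancestor")`. Depth or distance between terms would then have to be computed client-side from the returned edges.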
Hi @juanjoDiaz, did you have any more questions on this right now? If not, I'd like to close this ticket, and you are welcome to open a new ticket anytime. If there's something else on this ticket you still would like more information on, please let us know! Thanks! |
Hi @suzialeksander , Feel free to close this ticket. When I can finally start working on this, I'll create more specific tickets as questions arise. Thanks everyone for the help! |
Hi,
I've been using the MySQL dump of GO for my Cytoscape app (http://apps.cytoscape.org/apps/gfdnet).
I would like to move to the new model, which I understand is based on blazegraph or OBO files.
Essentially, given a gene network, I was using GO to:
You can see my DB queries in https://github.com/juanjoDiaz/gfdnet/tree/master/src/main/java/org/cytoscape/gfdnet/model/dataaccess/go
Any help to port the tool to the latest available data (either Blazegraph, OBO files, or whatnot) would be really appreciated.