ZFIN genes created in September 2019 and later are not available in Noctua (but available in our GPI) #53

Closed
sabrinatoro opened this issue Mar 10, 2020 · 14 comments

Comments

@sabrinatoro

I want to create an annotation to linc.terminator (ZFIN:ZDB-LINCRNAG-190911-1), but this gene is not available in Noctua. I am using the new Noctua form, but the issue is the same in the graph editor.

  1. I am unable to create an annotation using the ID alone. I don't know if this is expected behavior.
  2. It looks like the new genes created in September 2019 and later are not available in Noctua
    (examples: ZFIN:ZDB-LINCRNAG-190911-1, ZFIN:ZDB-GENE-190924-2, ZFIN:ZDB-GENE-200114-3). These genes are available in our GPI files.
    This makes me think that there has been a problem with our GPI files since September 2019:
  • either GOC has not been retrieving our latest GPI since September, and/or there has been a problem adding the information to Noctua,
  • or there is a problem with our GPI files and they cause an error in GO/Noctua (we have not been notified of such an issue, but it is possible).
    Could you please look into this? Thank you
@suzialeksander

@sabrinatoro I'm fairly certain there's a ticket about most of this, let me see if I can find it

@suzialeksander

I may be thinking of an email with @vanaukenk about SGD names not popping up in Noctua: "WB has similar delay issues with Protein2GO in that nomenclature updates that WB makes may take a while to trickle down to UniProtKB and thus Protein2GO. One immediate solution may be for SGD to produce a gpi file using the 1.2 specs. GO could then pick that file up and use it to populate entries in NEO with the most up-to-date id and nomenclature. We do that for WB right now, i.e. we produce a GAF for annotation and a gpi for NEO."

@sabrinatoro It looks like you do produce a gpi according to your yaml. Are you absolutely certain the "fail" cases you listed are in the gpi provided to GO? I don't see 190911-1, 190924-2 or 200114-3 in the file I just obtained at http://current.geneontology.org/annotations/zfin.gpi.gz
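
A check like this can also be scripted so that several identifiers are tested in one pass. The following is a minimal sketch, not part of any GO tooling: `find_missing_ids` is a hypothetical helper, and it assumes a gzipped, tab-separated GPI in which the identifier is either split across the first two columns (DB, then object ID, as in GPI 1.2) or written as a single CURIE in the first column.

```python
import gzip


def find_missing_ids(gpi_path, wanted_ids):
    """Return the subset of wanted_ids that never appears in the GPI file.

    Assumptions: the file is gzipped and tab-separated, and lines starting
    with '!' are header/comment lines. IDs are compared as 'DB:ObjectID'.
    """
    remaining = set(wanted_ids)
    with gzip.open(gpi_path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.startswith("!") or not line.strip():
                continue
            cols = line.rstrip("\n").split("\t")
            # GPI 1.2 style: DB in column 1, object ID in column 2.
            if len(cols) > 1:
                remaining.discard(f"{cols[0]}:{cols[1]}")
            # Single-CURIE style: the whole ID sits in column 1.
            remaining.discard(cols[0])
            if not remaining:
                break
    return remaining
```

Called on a downloaded copy, e.g. `find_missing_ids("zfin.gpi.gz", ["ZFIN:ZDB-LINCRNAG-190911-1", "ZFIN:ZDB-GENE-190924-2", "ZFIN:ZDB-GENE-200114-3"])`, it returns whichever of the three IDs are absent from that file.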

Also slightly similar: geneontology/noctua#583 and #36

@kltm is there a better home for this issue than helpdesk? Not sure it's exactly a Noctua issue

@sabrinatoro
Author

@suzialeksander
Yes, these genes are in the latest gpi file we produced at ZFIN.
What I do not know is whether this new gpi file is available at GO (i.e. maybe there is a problem with GO getting the file, or maybe there is some error in it).

@kltm kltm transferred this issue from geneontology/helpdesk Mar 11, 2020
@kltm kltm added the bug label Mar 11, 2020
@kltm
Member

kltm commented Mar 12, 2020

Can confirm that it's in NEO:

wget http://skyhook.berkeleybop.org/issue-35-neo-test/ontology/neo.obo
grep "ZFIN:ZDB-LINCRNAG-190911-1" neo.obo 
id: ZFIN:ZDB-LINCRNAG-190911-1
xref: ZFIN:ZDB-LINCRNAG-190911-1
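
The same lookup can be scripted when more than one identifier needs confirming. This is a minimal sketch, assuming a local copy of `neo.obo`; `ids_in_obo` is a hypothetical helper name, and it only inspects `id:` lines rather than fully parsing OBO.

```python
def ids_in_obo(obo_path, wanted_ids):
    """Return the wanted IDs that occur as a stanza 'id:' line in an OBO file."""
    wanted = set(wanted_ids)
    found = set()
    with open(obo_path, encoding="utf-8") as handle:
        for line in handle:
            # Stanza identifiers are written as 'id: <CURIE>' at line start.
            if line.startswith("id: "):
                candidate = line[4:].strip()
                if candidate in wanted:
                    found.add(candidate)
    return found
```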

@kltm
Member

kltm commented Mar 12, 2020

Noting some similarities to #51 and #52

@kltm
Member

kltm commented Mar 12, 2020

Odd:

sjcarbon@moiraine:/tmp$:( zgrep -b "ZFIN:ZDB-LINCRNAG-190911-1" golr-index-contents.tgz 
sjcarbon@moiraine:/tmp$:( zgrep -b "ZFIN:ZDB-LINCRNAG-190" golr-index-contents.tgz 
Binary file (standard input) matches

It's in NEO, but not making it into Solr. Guesses:

  • somehow the Solr load is using a non-local neo
    • that seems right to me--we're possibly loading the "released" neo, not the one we're making here? some import trickery?
  • owltools is somehow really screwing things up
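
Because `zgrep` on a `.tgz` searches the decompressed tar stream as one binary blob, it can only report that something matched, not where. A minimal sketch of a per-file search follows; `members_containing` is a hypothetical helper, and like `zgrep` it only finds the string where it is stored verbatim inside the archive.

```python
import tarfile


def members_containing(tgz_path, needle):
    """Return names of regular files inside a .tgz whose bytes contain needle."""
    target = needle.encode("utf-8")
    hits = []
    with tarfile.open(tgz_path, "r:gz") as archive:
        for member in archive:
            if not member.isfile():
                continue
            extracted = archive.extractfile(member)
            if extracted is None:
                continue
            # Reads each member fully into memory; fine for a sketch,
            # but very large index files may need chunked reading.
            if target in extracted.read():
                hits.append(member.name)
    return hits
```

Running it once with the full ID and once with the shorter prefix would show which files inside the archive account for the difference between the two `zgrep` calls.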

@kltm
Member

kltm commented Mar 12, 2020

Déjà vu.

@kltm
Member

kltm commented Mar 12, 2020

I think this is the issue:
https://github.com/geneontology/pipeline/blob/issue-35-neo-test/Jenkinsfile#L53-L65
The "official" neo that we're building the index from is not the one we're making in this pipeline, which is why we're so far out of phase.

@kltm
Member

kltm commented Mar 12, 2020

Okay, our go-lego.owl is the materialized one, so no help there.
Thus:
http://snapshot.geneontology.org/ontology/extensions/go-lego-edit.ofn
taking us to:
Import(http://purl.obolibrary.org/obo/go/noctua/neo.owl)
resolving to
http://build-artifacts.berkeleybop.org/build-noctua-entity-ontology/latest/neo.owl
which ends in cloudfront...okay, so what is going on then?
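
The trace above starts from the import declarations in `go-lego-edit.ofn`. A minimal sketch for listing them from a local copy of such a file is shown here; `list_imports` is a hypothetical helper, and it relies on a simple regular expression rather than a real OWL parser. Following each PURL's redirects is left out, since that depends on the network.

```python
import re

# Matches Import(<IRI>) as written in OWL functional syntax. The angle
# brackets are treated as optional so bracket-stripped copies also match.
IMPORT_RE = re.compile(r"Import\(\s*<?([^<>\s)]+)>?\s*\)")


def list_imports(ofn_text):
    """Return import IRIs in the order they appear in functional-syntax text."""
    return IMPORT_RE.findall(ofn_text)
```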

@kltm
Member

kltm commented Mar 12, 2020

Well, that's right:

sjcarbon@moiraine:/tmp$:) wget http://go-build.s3.amazonaws.com/build-noctua-entity-ontology/latest/neo.owl 
sjcarbon@moiraine:/tmp$:) grep "ZFIN:ZDB-LINCRNAG-190911-1" neo.owl
        <oboInOwl:hasDbXref rdf:datatype="http://www.w3.org/2001/XMLSchema#string">ZFIN:ZDB-LINCRNAG-190911-1</oboInOwl:hasDbXref>
        <oboInOwl:id rdf:datatype="http://www.w3.org/2001/XMLSchema#string">ZFIN:ZDB-LINCRNAG-190911-1</oboInOwl:id>

Same as http://build-artifacts.berkeleybop.org/build-noctua-entity-ontology/latest/neo.owl, so the disconnect is elsewhere...
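
When two URLs are expected to serve the same artifact, comparing checksums of the downloaded copies covers the whole file rather than a single identifier. This is a minimal sketch, assuming both copies of `neo.owl` have already been fetched to local paths; `sha256_of` and `same_content` are illustrative names, not part of the pipeline.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True when both files have byte-identical content."""
    return sha256_of(path_a) == sha256_of(path_b)
```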

kltm added a commit to geneontology/pipeline that referenced this issue Mar 12, 2020
Explore short-circuiting for not getting latest NEO
#35
Also: geneontology/neo#53 geneontology/neo#52 geneontology/neo#51
@kltm kltm moved this from TODO to In progress in DONE 2020-05 (Berkeley) Data Release Pipeline Mar 12, 2020
@kltm
Member

kltm commented Mar 12, 2020

Internal-only build matches:

sjcarbon@moiraine:/tmp$:( zgrep -b "ZFIN:ZDB-LINCRNAG-190911-1" golr-index-contents.tgz.1 

So... that's great, but NEO itself is not the only thing we need (ontology terms, etc.). I essentially need to expose the contents of:

Import(<http://purl.obolibrary.org/obo/ro.owl>)
Import(<http://purl.obolibrary.org/obo/uberon/bridge/uberon-bridge-to-caro.owl>)
Import(<http://purl.obolibrary.org/obo/go/extensions/legorel.owl>)
Import(<http://purl.obolibrary.org/obo/go/extensions/go-plus.owl>)
Import(<http://purl.obolibrary.org/obo/go/extensions/go-bfo-bridge.owl>)
Import(<http://purl.obolibrary.org/obo/ncbitaxon/subsets/taxslim.owl>)
Import(<http://purl.obolibrary.org/obo/ncbitaxon/subsets/taxslim-disjoint-over-in-taxon.owl>)
Import(<http://purl.obolibrary.org/obo/eco/eco-basic.owl>)
Import(<http://purl.obolibrary.org/obo/wbbt.owl>)
Import(<http://purl.obolibrary.org/obo/wbphenotype/wbphenotype-base.owl>)
Import(<http://purl.obolibrary.org/obo/wbphenotype/imports/wbls_import.owl>)
Import(<http://purl.obolibrary.org/obo/uberon/bridge/uberon-bridge-to-wbbt.owl>)
Import(<http://purl.obolibrary.org/obo/uberon/bridge/cl-bridge-to-wbbt.owl>)
Import(<http://purl.obolibrary.org/obo/ddanat.owl>)
Import(<http://purl.obolibrary.org/obo/zfa.owl>)
Import(<http://purl.obolibrary.org/obo/emapa.owl>)
Import(<http://purl.obolibrary.org/obo/go/noctua/neo.owl>)

except with the last URL pointing at the local build... or just go ahead and make this job the "official" producer of NEO.
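
One common way to redirect a single import to a local file without editing the ontology is an OASIS XML catalog (conventionally `catalog-v001.xml`), which tools such as owltools and ROBOT can read. The thread does not say the pipeline took this route; the sketch below only illustrates the idea, and `catalog_xml` and the local file name are hypothetical.

```python
from xml.sax.saxutils import quoteattr

NEO_PURL = "http://purl.obolibrary.org/obo/go/noctua/neo.owl"


def catalog_xml(mappings):
    """Build an OASIS XML catalog that maps import IRIs to local files."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8" standalone="no"?>',
        '<catalog prefer="public" '
        'xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">',
    ]
    for iri, local_path in mappings.items():
        # quoteattr escapes and quotes each value for use as an attribute.
        lines.append(
            f"  <uri name={quoteattr(iri)} uri={quoteattr(local_path)}/>"
        )
    lines.append("</catalog>")
    return "\n".join(lines) + "\n"
```

For example, `catalog_xml({NEO_PURL: "neo.owl"})` yields a catalog with one `uri` entry that resolves the NEO import to a `neo.owl` file sitting next to the catalog, while all other imports keep resolving to their PURLs.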

@goodb

goodb commented Mar 12, 2020

@kltm Why don't you build the needed solr index (golr) from the merged go_lego file produced by the ontology build? e.g. the big monster ontology that you get when you resolve http://purl.obolibrary.org/obo/go/extensions/go-lego.owl

Then you would just have one ontology file to keep track of and we would be closer to having the type-aheads matching the ontologies loaded in minerva. This would also synchronize the tbox part of the RDF store with everything else.

Note that there seems to be a problem with that build right now (minimally missing an important subclass relation between protein and information biomacromolecule). But at least we would have one place to look and work on fixes and tests. Tagging @balhoff

Is there some reason, right now, that GOLR needs a different set of terms from Minerva or the RDF store? (If we do ever successfully escape from having NEO in Minerva we may deal with things differently on the Minerva side but it seems like we need to patch up the existing system first and that everything else here is going to go forward based on the monster merged ontology known as go_lego).

@kltm
Member

kltm commented Mar 19, 2020

@sabrinatoro Could you check to see if these are available now?

@sabrinatoro
Author

@kltm Yes, I can confirm that the genes reported in this ticket are now available in Noctua. Thank you!
I am closing this ticket (please reopen it if there is more to be done). Thanks!
