Should the GPAD association writer in ontobio use GPI files and isoform protein identifiers in associations to modify the subject of annotation in GPAD output? #36
@kltm is wondering whether this use case can be covered by an extension or property.
From the managers call discussion:
Sierra is already checking the GPI in the original conversion from UniProt -> MGI + PR.
Examples from Lori (PAINT annotations still have UniProt as the subject): Sierra does not handle PAINT validation against the GPI.
This should be another issue somewhere else, not in this "upstream remainders" project. @LiNiMGI will handle this :)
@sierra-moxon What is the action here?
Hi @pgaudet - this was the action for this ticket, and the fix is in the works in my branch of ontobio used for this project:
The UniProt comment "(PAINT still have UniProt as the subject)" was another topic that came up tangentially while we were talking about this ticket, so I captured it as an aside. I do not know where this is going to be handled, but Li will hopefully have filed it as an issue elsewhere.
At the moment MGI filters out those PAINT annotations (the ones that still have UniProt as the subject). @pgaudet we can talk more tomorrow and see what we can do about it. According to Dustin, the conversion from UniProt to MGI for PAINT annotations is done on the PANTHER side and is likely tied to older data (Reference proteome/QfO releases) than is current in MGI.
New file generated with fixes: http://skyhook.berkeleybop.org/full-issue-325-gopreprocess/annotations/mgi.gpad.gz
SMoxon@SMoxon-M82 pipeline % grep "A2ASQ1-2" ~/Downloads/mgi_0318_24.gpad
From Li's comments here: geneontology/go-site#2043
It looks like we should add code to ontobio so that we can produce GPADs with protein subject identifiers when GAF annotations have isoform identifiers that match ids in the associated GPI file. This is a medium-ish change to the GPAD association writer and would result in GPAD and GAF annotation files with different subjects.
tagging @kltm
snipped from the other ticket for ease of understanding:
==========================================
in the GAF file I produce:
Per David above:
this is what I think it looks like in the final GPAD:
Thanks Sierra @sierra-moxon
Gaf file looks good!
My understanding is that when there is isoform information in the GAF (last column of the GAF), we will use the isoform PR:ID as the DB Object ID in the first column of the GPAD. Am I right, @ukemi?
So the final GPAD will look like:
PR:A2ASQ1-2 RO:0002327 GO:0005201 PMID:22159717 ECO:0000245
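For anyone following along, the subject-selection rule discussed in this ticket could be sketched roughly as below. This is only an illustration in plain Python, not the actual ontobio code: the function name `choose_gpad_subject`, the in-memory set of GPI identifiers, and the placeholder gene ID `MGI:0000000` are all assumptions made for the example. It also assumes the isoform column in the GAF (column 17, Gene Product Form ID) already holds a `PR:` CURIE.

```python
from typing import Optional, Set


def choose_gpad_subject(
    gaf_subject: str,
    isoform_id: Optional[str],
    gpi_ids: Set[str],
) -> str:
    """Pick the identifier to write as the GPAD subject.

    gaf_subject: the gene-level subject from the GAF line (DB:DB_Object_ID).
    isoform_id:  the value of the GAF isoform column, or None/"" if empty.
    gpi_ids:     identifiers listed in the associated GPI file.
    """
    # Use the isoform only when it is present AND known to the GPI file;
    # otherwise keep the GAF subject unchanged.
    if isoform_id and isoform_id in gpi_ids:
        return isoform_id
    return gaf_subject


# Illustrative data only: MGI:0000000 is a placeholder, not a real gene ID.
gpi_ids = {"PR:A2ASQ1-2"}

with_isoform = choose_gpad_subject("MGI:0000000", "PR:A2ASQ1-2", gpi_ids)
without_isoform = choose_gpad_subject("MGI:0000000", "", gpi_ids)
unknown_isoform = choose_gpad_subject("MGI:0000000", "PR:XXXXXX-1", gpi_ids)

print(with_isoform, without_isoform, unknown_isoform)
```

Falling back to the GAF subject when the isoform is missing from the GPI keeps the output consistent with the GPI file, at the cost of the GPAD and GAF subjects differing for matched isoforms, as noted above.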