GO-CAM to gaf/gpad should collate by MOD and only include production models; also need SynGO solution #431

Closed
cmungall opened this issue Sep 21, 2017 · 29 comments

Comments

@cmungall
Member

https://build.berkeleybop.org/job/export-lego-to-gpad-sparql/

has individual files per model; we should collate these by MOD.

@balhoff has a smarter way to do this but as a stopgap we can have a quick script that collates by col1.
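A minimal sketch of what such a stopgap could look like, in Python. It assumes the per-model GPAD files are tab-separated with the database prefix (e.g. MGI, FB, UniProtKB) in column 1 and that header lines start with `!`. The function names `collate_by_col1` and `write_collated` are hypothetical, and this is not the actual script.

```python
from collections import defaultdict
from pathlib import Path


def collate_by_col1(lines):
    """Group GPAD annotation lines by the value in column 1 (the DB prefix)."""
    groups = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and '!' header/comment lines.
        if not line or line.startswith("!"):
            continue
        prefix = line.split("\t", 1)[0]
        groups[prefix].append(line)
    return dict(groups)


def write_collated(groups, out_dir):
    """Write one file per prefix, e.g. out_dir/mgi.gpad (naming is an assumption)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for prefix, rows in groups.items():
        (out / f"{prefix.lower()}.gpad").write_text("\n".join(rows) + "\n")
```

Feeding it the concatenated lines of all per-model files would yield one group per MOD prefix, which can then be written out one file at a time.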

@cmungall cmungall self-assigned this Sep 21, 2017
@cmungall
Member Author

This is high priority for @tonysawfordebi

@vanaukenk
Contributor

@cmungall
Do we want to collate by col1 or by taxon id?

@cmungall
Member Author

I assume col1.

@vanaukenk
Contributor

As a first pass, we will use col 1, since taxon is not in gpad.

@cmungall cmungall changed the title from "export-lego-to-gpad-sparql should collate by MOD" to "export-lego-to-gpad-sparql should collate by MOD and only include production models" Nov 22, 2017
cmungall added a commit to geneontology/noctua-models that referenced this issue Nov 22, 2017
@cmungall
Member Author

TODO: production models only

@cmungall cmungall removed their assignment Jan 27, 2018
@ukemi
Contributor

ukemi commented Jan 27, 2018

Once we have this filtered for production models only, I will request a test-load into MGI.

@ukemi
Contributor

ukemi commented Jan 27, 2018

There is something strange in the output files. In https://build.berkeleybop.org/job/export-lego-to-gpad-sparql/lastSuccessfulBuild/artifact/legacy/mgi.gpad/*view*/
These annotations exist:

MGI MGI:1925503 enables GO:0046577 r1 ECO:0006017 w1   20170926 GO_Noctua   noctua-model-id=gomodel:59c8885900000227|contributor=http://orcid.org/0000-0001-5501-853X
MGI MGI:1925503 involved_in GO:0044108 r2 ECO:0000501 w2   20180126 GO_Noctua   contributor=http://orcid.org/0000-0001-5501-853X|noctua-model-id=gomodel:5a5fc23a00000069

They are not in the model.

@balhoff
Member

balhoff commented Jan 31, 2018

@ukemi, looking at the first line, I don't see that annotation in the MGI file or in the model GPAD. Am I missing the problem?

@ukemi
Contributor

ukemi commented Jan 31, 2018

Hi @balhoff, can I plead jet-lag? I don't see it now either.

@dougli1sqrd
Contributor

What graphstore is this query running against?

@dougli1sqrd
Contributor

So we could (I imagine) pretty easily filter the results of the query to only production models. Production models can be found with the triple pattern:
?model <http://geneontology.org/lego/modelstate> "production"^^xsd:string .
But I'm unsure how this triplestore is organized. I'll probably need @balhoff's help.
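For illustration, here is a sketch that wraps the triple pattern above in a complete SELECT query, built as a Python string. It assumes the model-state triples are visible in the default graph; if each model lives in its own named graph, the pattern would need a `GRAPH` clause. The helper name `production_models_query` is hypothetical.

```python
MODEL_STATE = "http://geneontology.org/lego/modelstate"


def production_models_query(state="production"):
    """Return a SPARQL SELECT listing models whose modelstate equals `state`."""
    return (
        "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>\n"
        "SELECT DISTINCT ?model WHERE {\n"
        # Same triple pattern as quoted in the comment above.
        f'  ?model <{MODEL_STATE}> "{state}"^^xsd:string .\n'
        "}\n"
    )


print(production_models_query())
```

The resulting string could be sent to whatever SPARQL endpoint the export job uses, and the returned model IRIs used to restrict the GPAD export.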

The other way we could do this, and probably the better way forward, is to put this in the pipeline. We already have some infrastructure to run reasoned SPARQL queries against a Blazegraph instance, as part of producing our production triple store. At that point in the pipeline it would be pretty easy to use Jim's blazegraph-runner tool to also run these queries to create GPAD.

@cmungall
Member Author

cmungall commented Feb 1, 2018

@balhoff
Member

balhoff commented Feb 1, 2018

The problem with editing the Minerva SPARQL is that it will affect the GPAD you can make from the Noctua UI. I can't work on this right now but I'll try to get to it soon.

@cmungall
Member Author

cmungall commented Feb 1, 2018

ahh I see.

OK, how about instead we just insert the model status as a property of the annotation in the GPAD? Then post-hoc filtering is easy.
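A rough sketch of what that post-hoc filter might look like in Python. It assumes the annotation properties sit in the 12th tab-separated column as `key=value` pairs joined by `|` (as in the `noctua-model-id=...|contributor=...` lines quoted earlier), and that the state is stored under a key named `model-state`. That key name is an assumption here, not something confirmed in this thread.

```python
def parse_properties(field):
    """Parse a GPAD annotation-properties field ('k=v|k=v') into a dict.

    Repeated keys keep only the last value, which is enough for a state check.
    """
    props = {}
    for pair in field.split("|"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            props[key] = value
    return props


def is_production(gpad_line, key="model-state", props_col=11):
    """True if the line's properties carry key=production.

    `key` and `props_col` (0-based index of the properties column)
    are assumptions for this sketch.
    """
    cols = gpad_line.rstrip("\n").split("\t")
    if len(cols) <= props_col:
        return False
    return parse_properties(cols[props_col]).get(key) == "production"
```

A collation step could then keep only the lines for which `is_production(line)` is true, leaving the GPAD produced from the Noctua UI unchanged.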

@balhoff
Member

balhoff commented Feb 1, 2018

Yes, that sounds good.

@balhoff
Member

balhoff commented Feb 8, 2018

Production status within GPAD is merged into Minerva now.

@ukemi
Contributor

ukemi commented Feb 8, 2018

So will we filter this before releasing the files that the MODs pick up, or do we expect the MODs to filter the annotations at their sites?

@ukemi
Contributor

ukemi commented Feb 8, 2018

Once we think the GPAD is in a consumable state, I will open a ticket here so that we can test load them into MGI with all the new qualifiers etc.

@kltm
Member

kltm commented Mar 1, 2018

@cmungall Where should the expected output be?

@cmungall
Member Author

cmungall commented Mar 1, 2018

@kltm
Member

kltm commented Mar 1, 2018

Should moving these over to their "forever home" and getting them into the main pipeline be part of closure here?

@cmungall cmungall reopened this Mar 12, 2018
@cmungall
Member Author

cmungall commented Mar 12, 2018

Reopened because we need to figure out how this works with SynGO.

Our current strategy is:

  1. Collate production models by col1 (eg FB, MGI, UniProt) and deposit here
  2. Have each responsible group import the annotations into their database

Note that this means that if SynGO annotations are annotated to UniProtKB IDs, it is UniProtKB's job to bring in the prod GPADs and spit them out as GAFs as part of their normal database export.

We can see that the UniProt export is heavily dominated by SynGO:

https://build.berkeleybop.org/job/export-lego-to-gpad-sparql/lastSuccessfulBuild/artifact/legacy/uniprotkb.gpad/*view*/

(4987 prod assocs)

First: @tonysawfordebi - what are UniProtKB's plans for importing these?

We could sub-collate in some way so that for example we split groups based on contributor, but this gets very complex.

Alternatively, we can just inject them directly into the release GAFs as we do for PAINT.

This diagram shows part of the overall dataflow:

https://docs.google.com/document/d/1uR_32I2PYwGl6wZcmENETBV1GsgVtDx_xiHv5fr4xWE/edit

@cmungall cmungall changed the title from "export-lego-to-gpad-sparql should collate by MOD and only include production models" to "export-lego-to-gpad-sparql should collate by MOD and only include production models; also need SynGO solution" Mar 13, 2018
@cmungall
Member Author

Note: All collated GPADs are now production only:

https://build.berkeleybop.org/job/export-lego-to-gpad-sparql/lastSuccessfulBuild/artifact/legacy/

image

As can be seen this is dominated by uniprotkb, which is dominated by SynGO-VU...

@cmungall
Member Author

cmungall commented Mar 19, 2018

From @tonysawfordebi: we need a GPAD header line.

UPDATE: this will be added in the next Jenkins run.
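As a sketch of what adding the header could involve: GPAD files conventionally begin with a `!gpa-version:` line. The Python helper below prepends one to a list of collated rows. The version `1.1` and the name `with_gpad_header` are assumptions, since the comment does not say exactly which header was requested.

```python
def with_gpad_header(rows, version="1.1"):
    """Return rows prefixed by a '!gpa-version' header.

    Existing '!' lines and blank lines are dropped so the header is not duplicated.
    """
    body = [r for r in rows if r and not r.startswith("!")]
    return [f"!gpa-version: {version}"] + body
```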

@kltm
Member

kltm commented Sep 12, 2018

Talking to @cmungall , we wanted to check the status on this item.
Thinking about porting this over to the new pipeline (with the related decreased frequency), would we even need this if we are injecting the Noctua models into our GAF/GPI/GPAD products?

@kltm
Member

kltm commented Nov 10, 2018

Note that geneontology/pipeline#65 is now active, so any changes or progress here should either be done in both places or within the new pipeline.

@kltm kltm changed the title from "export-lego-to-gpad-sparql should collate by MOD and only include production models; also need SynGO solution" to "GO-CAM to gaf/gpad should collate by MOD and only include production models; also need SynGO solution" Nov 10, 2018
@kltm
Member

kltm commented Nov 19, 2018

Current output from #65 seems to be production-only and collated by MOD (using @cmungall's script):
http://skyhook.berkeleybop.org/master/products/annotations/
I'll close this ticket--we can start a new ticket for new work or other changes that need to be made.

@kltm kltm closed this as completed Nov 19, 2018