GO-CAM to gaf/gpad should collate by MOD and only include production models; also need SynGO solution #431
Comments
This is high priority for @tonysawfordebi |
@cmungall |
I assume col1. |
As a first pass, we will use col 1, since taxon is not in gpad. |
TODO: production models only |
This will require a change to https://github.com/geneontology/minerva/blob/master/minerva-converter/src/main/resources/org/geneontology/minerva/legacy/sparql/gpad-with-evidence.rq. I can do this, but if you know how to do it quickly, @dougli1sqrd or @balhoff |
Once we have this filtered for production models only, I will request a test-load into MGI. |
There is something strange in the output files. In https://build.berkeleybop.org/job/export-lego-to-gpad-sparql/lastSuccessfulBuild/artifact/legacy/mgi.gpad/*view*/
They are not in the model. |
@ukemi, looking at the first line, I don't see that annotation in the MGI file or in the model GPAD. Am I missing the problem? |
Hi @balhoff, can I plead jet-lag? I don't see it now either. |
What graphstore is this query running against? |
So we could (I imagine) pretty easily filter the results of the query down to only production models. Production models can be found with the triple pattern: The other way we could do this, and probably the better way forward in the longer term, is to put this in the pipeline. We already have some infrastructure to run reasoned SPARQL queries against a Blazegraph store. This is part of the process of producing our production triple store. At that point in the pipeline it would be pretty easy to use Jim's blazegraph-runner tool to also run these queries to create GPAD. |
@dougli1sqrd I believe this is the query you would edit: |
The problem with editing the Minerva SPARQL is that it will affect the GPAD you can make from the Noctua UI. I can't work on this right now but I'll try to get to it soon. |
ahh I see. OK how about instead we just insert the model status in as a property of the annotation in the GPAD, then post-hoc filtering is easy |
Yes, that sounds good. |
Production status within GPAD is merged into Minerva now. |
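For anyone wanting to try the post-hoc filtering proposed above, here is a minimal sketch in Python. It is not the Minerva implementation. It assumes the model status lands in the GPAD annotation properties column (column 12) as a `key=value` pair under the key `model-state`, with pairs separated by `|`; the key name and the helper names are assumptions and should be checked against the actual export.

```python
from typing import Dict, Iterable, Iterator

# Assumptions (not confirmed in this thread): annotation properties are in
# column 12 as "key=value" pairs joined by "|", and the model status is
# stored under the key "model-state".
PROPERTIES_COLUMN = 11  # zero-based index of column 12
STATE_KEY = "model-state"


def parse_properties(field: str) -> Dict[str, str]:
    """Split a 'k1=v1|k2=v2' properties field into a dict."""
    props: Dict[str, str] = {}
    for pair in field.split("|"):
        key, sep, value = pair.partition("=")
        if sep:  # ignore fragments without an "="
            props[key.strip()] = value.strip()
    return props


def production_only(lines: Iterable[str], state: str = "production") -> Iterator[str]:
    """Yield header lines plus annotation lines whose model status matches."""
    for line in lines:
        if line.startswith("!"):
            # Keep "!" header/comment lines untouched.
            yield line
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) <= PROPERTIES_COLUMN:
            # No properties column, so the status is unknown: drop the line.
            continue
        if parse_properties(cols[PROPERTIES_COLUMN]).get(STATE_KEY) == state:
            yield line
```

Because the filter only reads the file, the same GPAD can still be served unfiltered from the Noctua UI while the collated files are restricted to production models.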
So will we filter this before releasing the files that the MODs pick up, or do we expect the MODs to filter the annotations at their sites? |
Once we think the GPAD is in a consumable state, I will open a ticket here so that we can test load them into MGI with all the new qualifiers etc. |
@cmungall Where should the expected output be? |
https://build.berkeleybop.org/job/export-lego-to-gpad-sparql/lastSuccessfulBuild/artifact/legacy/ - as soon as the job completes |
Should moving these over to their "forever home" and getting them into the main pipeline be part of closure here? |
Reopened because we need to figure out how this works with SynGO. Our current strategy is
Note that this means that if SynGO annotations are annotated to UniProtKB IDs, it is UniProtKB's job to bring in the prod GPADs and spit them out as GAFs as part of their normal database export. We can see that the uniprot export is heavily dominated by SynGO: (4987 prod assocs) First: @tonysawfordebi - what are UniProtKB's plans for importing these? We could sub-collate in some way, for example by splitting groups based on contributor, but this gets very complex. Alternatively, we can just inject them directly into the release GAFs as we do for PAINT. This diagram shows part of the overall dataflow: https://docs.google.com/document/d/1uR_32I2PYwGl6wZcmENETBV1GsgVtDx_xiHv5fr4xWE/edit |
Note: All collated GPADs are now production only: https://build.berkeleybop.org/job/export-lego-to-gpad-sparql/lastSuccessfulBuild/artifact/legacy/ As can be seen this is dominated by uniprotkb, which is dominated by SynGO-VU... |
From @tonysawfordebi: we need a GPAD header line. UPDATE: this will be added in the next Jenkins run. |
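As an illustration of the header-line request, the sketch below prepends a version line to a list of GPAD lines unless one is already there. It assumes the GPAD 1.1 version line `!gpa-version: 1.1` is the header meant here; the function name is hypothetical and this is not the code used in the Jenkins job.

```python
from typing import List

# Assumed header: the version line used by GPAD 1.1 files.
GPAD_HEADER = "!gpa-version: 1.1"


def with_header(lines: List[str], header: str = GPAD_HEADER) -> List[str]:
    """Return a copy of lines (no trailing newlines) that starts with the header."""
    if lines and lines[0].strip() == header:
        # Header already present: return an unchanged copy.
        return list(lines)
    return [header] + list(lines)
```

Calling it twice gives the same result as calling it once, so it can be run safely on files that may already have been processed.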
Talking to @cmungall , we wanted to check the status on this item. |
Note that geneontology/pipeline#65 is now active, so any changes or progress here should either be done in both places or within the new pipeline. |
Current output from #65 seems to be production-only and by mod (using @cmungall 's script): |
https://build.berkeleybop.org/job/export-lego-to-gpad-sparql/ has individual files per model; we should collate these by MOD.
@balhoff has a smarter way to do this but as a stopgap we can have a quick script that collates by col1.
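A rough sketch of what such a stopgap could look like, in Python. This is not @cmungall's script. It assumes tab-separated GPAD lines with the database prefix in column 1 and `!` marking header or comment lines; the function names are hypothetical, and the lowercase `<db>.gpad` naming simply mirrors the `legacy/mgi.gpad` artifact in the Jenkins job.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def collate_by_col1(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group GPAD annotation lines by the value in column 1 (the DB prefix)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and "!" header/comment lines from per-model files.
        if not line or line.startswith("!"):
            continue
        db = line.split("\t", 1)[0]
        groups[db].append(line)
    return dict(groups)


def output_filename(db: str) -> str:
    """Hypothetical naming scheme, e.g. 'MGI' -> 'mgi.gpad'."""
    return f"{db.lower()}.gpad"
```

Feeding it the concatenated lines of all per-model files yields one list per prefix, which can then be written to `output_filename(db)`. Since taxon is not in GPAD, anything annotated to UniProtKB IDs ends up in a single `uniprotkb.gpad` regardless of species.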