-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize / rework problematic SPARQL query in API in pathways widget #3
Comments
Here is the fast version of the query. Note that the gene product ID needs to be injected in two different lines within the query: PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX pr: <http://purl.org/ontology/prv/core#>
PREFIX metago: <http://model.geneontology.org/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX obo: <http://www.geneontology.org/formats/oboInOwl#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX providedBy: <http://purl.org/pav/providedBy>
PREFIX MF: <http://purl.obolibrary.org/obo/GO_0003674>
PREFIX causally_upstream_of_or_within: <http://purl.obolibrary.org/obo/RO_0002418>
PREFIX causally_upstream_of_or_within_negative_effect: <http://purl.obolibrary.org/obo/RO_0004046>
PREFIX causally_upstream_of_or_within_positive_effect: <http://purl.obolibrary.org/obo/RO_0004047>
PREFIX causally_upstream_of: <http://purl.obolibrary.org/obo/RO_0002411>
PREFIX causally_upstream_of_negative_effect: <http://purl.obolibrary.org/obo/RO_0002305>
PREFIX causally_upstream_of_positive_effect: <http://purl.obolibrary.org/obo/RO_0002304>
PREFIX regulates: <http://purl.obolibrary.org/obo/RO_0002211>
PREFIX negatively_regulates: <http://purl.obolibrary.org/obo/RO_0002212>
PREFIX positively_regulates: <http://purl.obolibrary.org/obo/RO_0002213>
PREFIX directly_regulates: <http://purl.obolibrary.org/obo/RO_0002578>
PREFIX directly_positively_regulates: <http://purl.obolibrary.org/obo/RO_0002629>
PREFIX directly_negatively_regulates: <http://purl.obolibrary.org/obo/RO_0002630>
PREFIX directly_activates: <http://purl.obolibrary.org/obo/RO_0002406>
PREFIX indirectly_activates: <http://purl.obolibrary.org/obo/RO_0002407>
PREFIX directly_inhibits: <http://purl.obolibrary.org/obo/RO_0002408>
PREFIX indirectly_inhibits: <http://purl.obolibrary.org/obo/RO_0002409>
PREFIX transitively_provides_input_for: <http://purl.obolibrary.org/obo/RO_0002414>
PREFIX immediately_causally_upstream_of: <http://purl.obolibrary.org/obo/RO_0002412>
PREFIX directly_provides_input_for: <http://purl.obolibrary.org/obo/RO_0002413>
PREFIX enabled_by: <http://purl.obolibrary.org/obo/RO_0002333>
PREFIX hint: <http://www.bigdata.com/queryHints#>
SELECT DISTINCT ?gocam ?title
WHERE {
GRAPH ?gocam {
# Inject gene product ID here
?gene rdf:type <http://identifiers.org/rgd/1586460> .
}
?gocam metago:graphType metago:noctuaCam .
?gocam dc:title ?title .
GRAPH ?gocam {
VALUES ?causal { causally_upstream_of_or_within: causally_upstream_of_or_within_negative_effect: causally_upstream_of_or_within_positive_effect: causally_upstream_of: causally_upstream_of_negative_effect: causally_upstream_of_positive_effect: regulates: negatively_regulates: positively_regulates: directly_regulates: directly_positively_regulates: directly_negatively_regulates: directly_activates: indirectly_activates: directly_inhibits: indirectly_inhibits: transitively_provides_input_for: immediately_causally_upstream_of: directly_provides_input_for: }
VALUES ?causal2 { causally_upstream_of_or_within: causally_upstream_of_or_within_negative_effect: causally_upstream_of_or_within_positive_effect: causally_upstream_of: causally_upstream_of_negative_effect: causally_upstream_of_positive_effect: regulates: negatively_regulates: positively_regulates: directly_regulates: directly_positively_regulates: directly_negatively_regulates: directly_activates: indirectly_activates: directly_inhibits: indirectly_inhibits: transitively_provides_input_for: immediately_causally_upstream_of: directly_provides_input_for: }
#?s enabled_by: ?gpnode . # I don't think this is needed for this query
#?gpnode rdf:type ?identifier . # I don't think this is needed for this query
?ind1 enabled_by: ?gpnode1 .
?ind2 enabled_by: ?gpnode2 .
?ind3 enabled_by: ?gpnode3 .
?ind1 ?causal ?ind2 .
?ind2 ?causal2 ?ind3 .
?gpnode1 rdf:type ?gp1 .
?gpnode2 rdf:type ?gp2 .
?gpnode3 rdf:type ?gp3 .
# Inject gene product ID here
FILTER(?gp1 = <http://identifiers.org/rgd/1586460> || ?gp2 = <http://identifiers.org/rgd/1586460> || ?gp3 = <http://identifiers.org/rgd/1586460>)
}
FILTER EXISTS {
?ind1 rdf:type MF: .
}
FILTER EXISTS {
?ind2 rdf:type MF: .
}
FILTER EXISTS {
?ind3 rdf:type MF: .
}
}
ORDER BY ?gocam |
Thanks @balhoff ! I guess the next steps would be:
|
For people watching here, noting possible crossover with geneontology/helpdesk#366 |
@dustine32 Now has a test instance at https://api2.geneontology.xyz. His testing confirms that the queries time out when there is no other traffic. Next steps are likely:
|
@dustine32 To kick this off, if I wanted to update the code currently at api2, would I just make a PR (that then gets merged) on https://github.com/geneontology/api-gorest-2021? If so, how long might the propagation take (and what would be the best way to observe the progress (if possible with current permissions))? |
For geneontology/api-gorest#3 Trying to bring on @balhoff 's changes to the experimental codebase (geneontology/api-gorest#3 (comment))
@dustine32 Possibly a first attempt here? geneontology/api-gorest-2021#1 |
@kltm Yes! Thank you for PR-ing this. That's the route I would go. Then we'll merge to It would be cool if we could setup a branch-specific auto-deploy to a test or dev URL. Hm... |
Well, we have two at this point, right? api and api2. That should at least, for now, allow testing. Unless autodeploy for branches is easy, it might not be worth pursuing for the moment. |
That said, we can go either route on your determination. I'd ideally like to see if we can get this cleared before we all start wandering away, but more than that I'd like to make sure we don't bork the production API. |
@kltm Yep, I'm down with just getting this out using api2. I'll approve the PR in a sec. What I've read so far makes setting up a dev deploy pipeline (as I imagined it) look a bit tricky but, I think, doable. |
Before and after, from @dustine32 https://api.geneontology.xyz/gp/RGD:1586460/models?causalmf=true (~30 sec timeout) I guess the thing to do from here would be to do a little more testing to make sure we have similar/correct results between the two endpoints, then go to production. That may require a little help from @balhoff (or others setting up a local instance) so that we can actually run the old query to completion. |
That said, since we are basically looking at working vs non-working, I think that just a little spot checking would probably be enough. |
On 2021-12-17, we switched the api.geneontology.xyz DNS record in AWS to point to the new repo. Watching the traffic through the new API's dashboard, it's clear that the switchover took effect and the pathways GO-CAM count is being returned (seemingly) instantaneously on the Alliance gene pages (e.g. FB:FBgn0000499). |
Recently, with our migration to EC2 and move to a smaller machine, we've come to understand that some queries coming through the GO-CAM API to the SPARQL endpoint and both voluminous (possibly crawling bots) and are no longer able to meet the query timeout.
An example of the main problematic query is https://gist.github.com/kltm/19e2d12cde2aa2e1f3f92cebdf16d937
Note that the code for this is https://github.com/lpalbou/api-gorest-2020/blob/aee0b9bd1e8b6c7ea1c815cfd70e2f1972deb0d7/queries/sparql-gp.js#L67; we'll probably want to
After some discussion on the software call, steps here may be:
Secondarily, we'll want to make sure the devops processes are clear to roll this out into production without hiccups. (May want to try just redeployment first?) TBD. (Caveat here that we may run into other such queries. Good to get our practices down and document how we do this.)
@balhoff Already has some very (very) promising ideas here.
Tagging @tmushayahama @dustine32 @sierra-moxon @cmungall @vanaukenk @balhoff .
The text was updated successfully, but these errors were encountered: