Fix SPARQL required for GO-CAM website resource files #2
Comments
@dustine32 for Query 1, if you can do the grouping on the client side, this will complete in 18 seconds:
|
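A minimal sketch of what grouping on the client side could look like for Query 1, assuming the ungrouped query returns one row per (model, gene product) pair in the standard SPARQL JSON results format. The variable names `gocam` and `gp`, the function name, and the example IRIs are hypothetical and not taken from the actual query:

```javascript
// Sketch only: `gocam` and `gp` are assumed SPARQL variable names.
// Collapses one-row-per-pair results into one entry per model, which is
// the work the GROUP BY would otherwise do on the RDF endpoint.
function groupGpsByModel(sparqlJson) {
  const byModel = new Map();
  for (const row of sparqlJson.results.bindings) {
    const model = row.gocam.value;
    if (!byModel.has(model)) {
      byModel.set(model, new Set()); // a Set drops duplicate gene products
    }
    byModel.get(model).add(row.gp.value);
  }
  // Map preserves insertion order, so models come out in first-seen order.
  return Array.from(byModel, ([gocam, gps]) => ({ gocam, gps: Array.from(gps) }));
}

// Made-up rows in the SPARQL 1.1 JSON results layout.
const exampleResults = {
  head: { vars: ["gocam", "gp"] },
  results: {
    bindings: [
      { gocam: { type: "uri", value: "http://example.org/model/1" }, gp: { type: "uri", value: "http://example.org/gp/A" } },
      { gocam: { type: "uri", value: "http://example.org/model/1" }, gp: { type: "uri", value: "http://example.org/gp/B" } },
      { gocam: { type: "uri", value: "http://example.org/model/2" }, gp: { type: "uri", value: "http://example.org/gp/A" } },
      { gocam: { type: "uri", value: "http://example.org/model/1" }, gp: { type: "uri", value: "http://example.org/gp/A" } },
    ],
  },
};

const groupedModels = groupGpsByModel(exampleResults);
console.log(JSON.stringify(groupedModels, null, 2));
```

If the API needs a delimited string rather than an array, the caller can join `gps` with whatever separator config.json defines.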
@dustine32 here is an 11.5 sec version of query 2:
|
Linked to geneontology#2. I see the timeout of the RDF endpoint has been increased, but the timeout of the API / requests here also needed to be increased. After doing so, I tested locally and the query now works.
Query 2
I tried both versions of query 2 and didn't find a speed improvement (old query: 27s, then 34s on the second run; new query: 29s, then 39s). As shown, the time varies greatly depending on when the server receives the query. It seems that the timeout of the RDF endpoint was increased (up to 60s would probably be a good idea for now?), so I also increased the timeout of the GO-CAM API itself: #3. If you merge this PR, query 2 seems to run, and this should fix the cache created on AWS/Lambda, as it uses https://api.geneontology.xyz/models/go.
Query 1
For query 1, I still get a server timeout at 30s. Unsure why this is not the case for query 2? Maybe there is some config to check on the RDF server? Indeed, removing the grouping on the RDF side gives a much faster query (10s). Note this is the query used to create the GPs cache on AWS/Lambda: https://api.geneontology.xyz/models/gp. @dustine32 remember cloud9?
Notes
Happy holiday season to all ! 🎄🎉 |
Whoa, thanks again @lpalbou for all the advice! I'm now leaning towards your first note suggestion (using blazegraph runner during the release) but of course I also have to try the easy way out short-term. Commit 5fe0a4b applies @balhoff's fix for Query1. I tried applying @balhoff's new Query2 but still ran into a timeout issue while testing the lambda locally:
Then I bumped this timeout from 30 to 60 sec (Line 21 in 3f2e995).
This change at least got me to the next error:
Looks like this 6MB limit is tied to an unchangeable AWS Lambda limit. There are some workarounds such as having the API immediately store the response payload in S3 then returning an S3 URL. This miiight work for us since our goal is to get it into S3 anyway, but it probably won't work for external users (then again, this route has been broken for a while so...). Also, the effort to implement this workaround might as well be spent coding blazegraph-runner calls into the release pipeline. Tagging @kltm. |
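A rough sketch of the S3-style workaround described above, assuming the handler can decide between returning the payload inline and returning a pointer to a stored copy. The function name `buildResponse`, the `uploadToStore` callback, the redirect-style response shape, and the example URL are all hypothetical. A real S3 upload would be asynchronous; it is kept synchronous here only so the sketch runs without any AWS dependency:

```javascript
// The 6 MB figure is the Lambda response limit mentioned in the thread.
const LAMBDA_LIMIT_BYTES = 6 * 1024 * 1024;

// `uploadToStore` is a hypothetical callback (for example a wrapper around
// an S3 put) that stores the body and returns a URL where it can be fetched.
function buildResponse(payload, uploadToStore, limitBytes = LAMBDA_LIMIT_BYTES) {
  const body = JSON.stringify(payload);
  // Measure bytes, not characters: non-ASCII text takes more than 1 byte each.
  if (Buffer.byteLength(body, "utf8") <= limitBytes) {
    return { statusCode: 200, body };
  }
  // Too large to return inline: store it and hand back a pointer instead.
  const url = uploadToStore(body);
  return {
    statusCode: 303,
    headers: { Location: url },
    body: JSON.stringify({ location: url }),
  };
}

// Stand-in for a real object store, used only for this example.
const storedBodies = [];
const fakeUpload = (body) => {
  storedBodies.push(body);
  return "https://example.org/cache/" + storedBodies.length;
};

const smallResponse = buildResponse({ ids: ["a"] }, fakeUpload);
// A tiny limit is passed here to force the oversized path.
const largeResponse = buildResponse({ ids: ["a", "b", "c"] }, fakeUpload, 10);
console.log(smallResponse.statusCode, largeResponse.statusCode, largeResponse.headers.Location);
```

As noted in the comment above, external callers would have to follow the pointer themselves, which is the main drawback of this route compared with generating the files during the release.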
Glad if it helps Dustin 🙂. I do think the longer-term solution would be blazegraph runner, but in the meantime this may/should work. What I am really puzzled about is: how come we reach a 6 MB payload limit? That's a lot, what are we sending? From memory, /models/go or /models/gp would, worst case, send a list of GO-CAM ids, and by default already does so for all, so am I missing something here? ps: the “Winston” article was a lot of fun. I love AWS but sometimes there are hard constraints that can really cause issues (e.g. CodePipeline can't target an existing GH repo 😅) |
[Note: documentation for manual hack of file update/upload while we work things out: https://docs.google.com/document/d/18vYy9sZq-dyjYWW0mnw3XpXRJjlI7pbQWvMlSSdXdjA/edit#heading=h.tzx1g6nhmgtd .] |
Closing in favor of geneontology/pipeline#265 |
Carrying on with the work to overcome timeout issues with SPARQL queries called by the GO-CAM API. Similar to how we improved the models-by-GP query in geneontology/api-gorest#3, we've still got two queries that are essential for the GO-CAM website to function but are currently timing out after 30 seconds:
QUERY 1: This one is meant to get a GO-CAM-to-GP lookup file:
api-gorest-2021/queries/sparql-models.js
Lines 262 to 293 in 480092b
The actual query after resolving the separator from the config.json:
QUERY 2: This one is meant to get a GO-CAM-to-GO-term lookup file:
api-gorest-2021/queries/sparql-models.js
Lines 219 to 259 in 480092b
Raw query:
@balhoff @kltm Any ideas how we can speed these up to return results in under 30 seconds? They don't need to run crazy fast as they typically only execute when triggered by a GO release (so ~once a month).