Optimize / rework problematic SPARQL query in API in /go-cam site #4

kltm · 2021-11-23T00:11:47Z

Noting that there is a lot of similarity with #3, but that this is a separate issue.

Recently, with our migration to EC2 and the move to a smaller machine, we've found that some queries coming through the GO-CAM API to the SPARQL endpoint no longer complete within the query timeout.

An example of the main problematic query is https://github.com/lpalbou/api-gorest-2020/blob/aee0b9bd1e8b6c7ea1c815cfd70e2f1972deb0d7/queries/sparql-models.js#L219
Note that the code for this is https://github.com/lpalbou/api-gorest-2020/blob/aee0b9bd1e8b6c7ea1c815cfd70e2f1972deb0d7/queries/sparql-gp.js#L67; we'll probably want to

We're still looking at how to proceed, but the steps here may be:

- determine if optimizations for the query are possible
  - direct optimization
  - splitting, with coordination from the UI
- otherwise, explore caching in the API
- test that fixes produce the same results as the current query, or confirm that we want to change the current results
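
The last step above (checking that a reworked query returns the same results as the current one) could be sketched roughly as follows. This assumes the endpoint returns the standard SPARQL 1.1 JSON results format; `canonicalRow` and `sameResults` are hypothetical names for illustration, not functions in the API code.

```javascript
// Canonical string for one SPARQL JSON binding row, with variables sorted by name
// so that key order in the response does not matter.
function canonicalRow(row) {
  return JSON.stringify(
    Object.keys(row)
      .sort()
      .map((v) => [
        v,
        row[v].type,
        row[v].value,
        row[v].datatype || "",
        row[v]["xml:lang"] || "",
      ])
  );
}

// True when two SPARQL JSON result documents contain the same rows,
// ignoring row order (duplicates are still counted).
function sameResults(current, candidate) {
  const a = current.results.bindings.map(canonicalRow).sort();
  const b = candidate.results.bindings.map(canonicalRow).sort();
  return a.length === b.length && a.every((row, i) => row === b[i]);
}
```

Note that blank node labels can differ between runs, so a comparison like this is only reliable for URIs and literals.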

Secondarily, we'll want to make sure the devops processes are clear to roll this out into production without hiccups. (May want to try just redeployment first?) TBD. (Caveat here that we may run into other such queries. Good to get our practices down and document how we do this.)

@balhoff Would it be possible for you to look at the SPARQL for this one as well?

Tagging @tmushayahama @dustine32 @sierra-moxon @cmungall @vanaukenk @balhoff .

balhoff · 2021-11-29T20:16:40Z

I'm finding this one much harder to improve.

lpalbou · 2021-12-02T23:50:02Z

Hi, just passing by quickly.

I don’t know about the latest modifications, but the first query was cached in a compressed JSON file and loaded by the website. The idea is that for a given month, the result of that (very frequent) query is always the same, so it’s ideal to cache.
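
A minimal sketch of that kind of monthly compressed cache, using Node's built-in `zlib`. The function names and the key format are assumptions for illustration, and it assumes that the calendar month (UTC) is the right granularity for invalidation; this is not the code the site actually used.

```javascript
const zlib = require("zlib");

// Cache key that changes once per calendar month (UTC), e.g. "models-2021-12".
// The prefix and key layout are hypothetical.
function monthlyKey(prefix, date) {
  const month = String(date.getUTCMonth() + 1).padStart(2, "0");
  return `${prefix}-${date.getUTCFullYear()}-${month}`;
}

// Serialize a query result to gzip-compressed JSON (returns a Buffer).
function pack(result) {
  return zlib.gzipSync(JSON.stringify(result));
}

// Inverse of pack(): decompress and parse back into an object.
function unpack(buffer) {
  return JSON.parse(zlib.gunzipSync(buffer).toString("utf8"));
}
```

The packed buffer could be written to a file named after `monthlyKey(...)` and served statically, so the query only runs once per month.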

The second one is more problematic and more recent. It was created to find the models with at least one MF connected by one incoming and one outgoing causal relationship to other MFs. This one could possibly be optimized, but it is harder to cache because the input is the ID of a gene. Harder but not impossible: we probably only have around 2k genes, so we could run and cache those 2k queries, especially with something like memcache or Redis, but a JSON file on a CDN would do the trick too.
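
Precomputing the per-gene results could look roughly like this. `buildGeneCache` and `runQueryForGene` are hypothetical names: `runQueryForGene` stands in for whatever async function sends the gene-specific SPARQL query and is not part of the existing API code.

```javascript
// Runs one query per distinct gene ID and collects the results in a Map.
// `runQueryForGene` is an assumed async function: (geneId) => result.
async function buildGeneCache(geneIds, runQueryForGene) {
  const cache = new Map();
  for (const geneId of new Set(geneIds)) {
    // Sequential on purpose, so the batch does not add concurrent load
    // to an endpoint that is already struggling with timeouts.
    cache.set(geneId, await runQueryForGene(geneId));
  }
  return cache;
}
```

The resulting Map could then be loaded into a key-value store, or serialized with `JSON.stringify(Object.fromEntries(cache))` for a static file.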

Ideally, we were also discussing indexing GO-CAMs in GOlr. That was my initial thought for the future evolution of this code, and it will probably become more and more important as the resource grows. FYI, I have discussed this with a few other people in the RDF world and, because of those speed issues, they cache everything every night.

Hope this helps a little and hope everyone is doing great - Laurent-Phillipe

kltm · 2022-01-12T02:34:40Z

There are no longer any optimizations that need to be added that aren't going to be dealt with by another mechanism.
Thank you everybody!

kltm added this to In progress in Software essential and proactive maintenance Nov 23, 2021

kltm mentioned this issue Nov 23, 2021

GO-CAM model browser on 'geneontology.cloud' not loading. geneontology/helpdesk#366

Closed

kltm mentioned this issue Dec 21, 2021

The GO-CAM browser just spins--no display created geneontology/web-gocam#17

Closed

kltm closed this as completed Jan 12, 2022

kltm moved this from In progress to Done in Software essential and proactive maintenance Jan 12, 2022
