High response times and system load after adding data sets #2789

hackdna · 2018-05-24T15:31:03Z

Specific code commit: v1.6.3
Version of the web browser and OS: unknown
Environment where the error occurred: AWS

Steps to reproduce

It appears that this happens after adding several data sets.

Observed behavior

Site responds very slowly. High CPU utilization and system load.

ubuntu@ip-172-30-0-50:~$ uptime
 14:55:31 up 42 days, 17:29,  1 user,  load average: 6.66, 3.74, 1.64

Expected behavior

Site responds without delays. Normal CPU utilization and system load.

Notes

May be related to Optimize ISA-Tab import performance #463, "Exponential explosion" when trying to upload 2 line CSV #2277, Solr connection timeout error during data set import #2394, and CPU utilization continuously high #2965.
High CPU utilization appears to be caused by Neo4j.


hackdna · 2018-07-18T19:50:35Z

@jkmarx reported that the v1.6.5 test site was very slow.
@scottx611x reported the neo4j process "hogging resources" for an unknown reason.
CPU utilization on the web instance was maxed out for several minutes at a time.

This was with only 2-4 users interacting with the site concurrently.

jkmarx · 2018-07-27T11:44:45Z

@scottx611x Can we use this issue as the Neo4J issue, or should we write a separate issue?

Possible routes

Configuration possibility (SO look into it)
Drop per user search?
Proposal to update annotations at a fixed time (3am)

scottx611x · 2018-09-17T16:34:33Z

Potential avenues for a fix from Fritz:

Hi Nils, thanks to Scott we now know what the bottleneck with uploading new docs is: it's the creation of the annotation set hierarchy in Neo4J (https://github.com/refinery-platform/neo4j-ontology). What the tool basically does is create sub-ontologies for every user based on the annotations of the available datasets. Creating this ontology takes about 1 sec for ~500 datasets for one user. Given that you guys have ~360 users, it takes ~6 mins. Back in the day there were no users, so I didn't spend time optimizing and went the safe route (update the hierarchy for all users whenever a dataset was uploaded, shared, or deleted). Apparently this is no longer feasible given the increase in users.
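Fritz's ~6 minute figure follows from rebuilding every user's hierarchy one after another on each upload, share, or delete. A minimal sketch of that arithmetic, assuming the quoted ~1 s per user (for ~500 data sets) and strictly sequential rebuilds; `full_rebuild_seconds` is a hypothetical helper, not part of Refinery or the Neo4j plugin:

```python
def full_rebuild_seconds(num_users: int, seconds_per_user: float = 1.0) -> float:
    """Estimated wall time to rebuild every user's annotation set hierarchy.

    Assumes the ~1 s per user figure quoted above and that users are
    rebuilt sequentially (no parallelism, no skipping).
    """
    return num_users * seconds_per_user


if __name__ == "__main__":
    total = full_rebuild_seconds(360)
    print(f"{total:.0f} s = {total / 60:.0f} min per upload, share, or delete")
    # prints: 360 s = 6 min per upload, share, or delete
```

Because the cost scales linearly with the number of users and is paid on every upload, share, or delete, a few data set changes in a row can keep the server busy for a long time, which is consistent with the load reported above.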

I see 3 general ways to improve this:

1. Try to optimize my Neo4J plugin. I am sure there are ways to cleverly cache certain parts to cut down the computation time of a user's annotation set hierarchy. The traversal along the Neo4J network might also be improvable. Someone basically needs to improve this tool here: https://github.com/refinery-platform/neo4j-ontology/blob/master/src/main/java/org/neo4j/ontology/server/unmanaged/AnnotationResource.java#L190
2. Depending on whether a new dataset is uploaded or shared, not all users' annotation set hierarchies need to be updated; only the users who would have access to it do. This might still take a long time, because if you make a dataset public everyone needs to be updated. Also, even if everyone would need to be updated, the update is generally not needed if the dataset doesn't have any new annotations. Hence, there's a lot of room for improvement, but one needs to carefully catch every condition (i.e., which users would be affected by the action, and whether the action actually affects the annotations).
3. A user's annotation set hierarchy doesn't actually need to be updated until the user logs in. Since the individual computation is pretty fast (I think 1 sec of waiting at the beginning is not that bad), it might be better to trigger the update of a user's annotation set hierarchy once the user actually logs in. Special care needs to be taken when the user is already logged in, and when a lot of people are logged in at the same time. Hence, one could do something like this: the annotation set hierarchy of the user who uploaded or changed the dataset gets updated immediately, everybody who is logged in sees an update button if something changed so they can update their annotation set hierarchy manually, and everybody else's annotation set hierarchy gets updated once they log in.
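A minimal sketch of how options 2 and 3 could fit together, assuming a hook that rebuilds one user's hierarchy and reports which annotation terms it covers. All names here (`HierarchyScheduler`, `rebuild`, `on_dataset_change`, the term IDs) are hypothetical illustrations, not Refinery's or neo4j-ontology's actual API:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, Iterable, Set

Terms = FrozenSet[str]


@dataclass
class HierarchyScheduler:
    """Hypothetical bookkeeping for per-user annotation set hierarchies.

    `rebuild` stands in for whatever recomputes one user's hierarchy (in
    Refinery, a call into the Neo4j plugin); it must return the annotation
    terms the rebuilt hierarchy covers. Names here are illustrative only.
    """

    rebuild: Callable[[str], Terms]
    logged_in: Set[str] = field(default_factory=set)
    stale: Set[str] = field(default_factory=set)  # rebuild at next login
    update_available: Set[str] = field(default_factory=set)  # show "update" button
    known_terms: Dict[str, Terms] = field(default_factory=dict)

    def _rebuild_now(self, user: str) -> None:
        self.known_terms[user] = self.rebuild(user)
        self.stale.discard(user)
        self.update_available.discard(user)

    def on_dataset_change(self, actor: str, users_with_access: Iterable[str],
                          dataset_terms: Terms) -> None:
        """Handle an upload or share by `actor` (options 2 and 3)."""
        self._rebuild_now(actor)  # the acting user is refreshed immediately
        for user in users_with_access:
            if user == actor:
                continue
            # Option 2: skip users whose hierarchy already has every term.
            if dataset_terms <= self.known_terms.get(user, frozenset()):
                continue
            # Option 3: logged-in users get a button, everyone else waits.
            if user in self.logged_in:
                self.update_available.add(user)
            else:
                self.stale.add(user)

    def on_login(self, user: str) -> None:
        self.logged_in.add(user)
        if user in self.stale or user not in self.known_terms:
            self._rebuild_now(user)  # roughly the ~1 s per-user cost

    def on_update_clicked(self, user: str) -> None:
        if user in self.update_available:
            self._rebuild_now(user)

    def on_logout(self, user: str) -> None:
        self.logged_in.discard(user)
        # An ignored "update" button turns into a rebuild at next login.
        if user in self.update_available:
            self.update_available.discard(user)
            self.stale.add(user)


if __name__ == "__main__":
    calls = []

    def fake_rebuild(user: str) -> Terms:
        calls.append(user)
        return frozenset({"T1", "T2"})  # hypothetical term IDs

    sched = HierarchyScheduler(rebuild=fake_rebuild)
    sched.on_login("alice")  # first login builds alice's hierarchy
    sched.on_dataset_change("bob", ["alice", "bob", "carol"], frozenset({"T2"}))
    # bob is rebuilt, alice already has T2, carol is deferred to her next login
    print(calls, sorted(sched.stale), sorted(sched.update_available))
    # prints: ['alice', 'bob'] ['carol'] []
```

Making a data set public would simply pass every user as `users_with_access`; only the uploader pays the rebuild cost up front, and the rest is spread across later logins. The subset check only covers the "no new annotations" case; deletions, which can remove terms, would need their own rule.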

scottx611x · 2018-11-07T19:03:10Z

Still an issue during release-1.6.7 testing

hackdna added bug important server performance labels May 24, 2018

hackdna added this to the Next milestone May 24, 2018

hackdna changed the title from "High response times and system load" to "High response times and system load after adding data sets" Jul 13, 2018

jkmarx modified the milestones: Next, Release 1.6.6 Jul 27, 2018

jkmarx assigned scottx611x Jul 27, 2018

jkmarx modified the milestones: Release 1.6.6, Release 1.6.7 Aug 28, 2018

hackdna mentioned this issue Nov 7, 2018

CPU utilization continuously high #2965 (closed)

scottx611x mentioned this issue Nov 7, 2018

ISATab Metadata Import "Submit" button should be disabled after initial click #3097 (closed)

jkmarx removed this from the Release 1.6.7 milestone Nov 20, 2018

hackdna added this to the Next milestone Mar 13, 2019

hackdna added the HSCI requested label Mar 13, 2019

hackdna modified the milestones: Next, Release 1.6.9 Mar 13, 2019

jkmarx mentioned this issue Apr 17, 2019

Jkmarx/remove neo4j #3330 (merged)

jkmarx closed this as completed in #3330 Apr 18, 2019
