High response times and system load after adding data sets #2789

Closed
hackdna opened this issue May 24, 2018 · 4 comments · Fixed by #3330

Comments

@hackdna
Member

hackdna commented May 24, 2018

  • Specific code commit: v1.6.3
  • Version of the web browser and OS: unknown
  • Environment where the error occurred: AWS

Steps to reproduce

It appears that this happens after adding several data sets.

Observed behavior

Site responds very slowly. High CPU utilization and system load.

ubuntu@ip-172-30-0-50:~$ uptime
 14:55:31 up 42 days, 17:29,  1 user,  load average: 6.66, 3.74, 1.64

screen shot 2018-05-16 at 15 13 07

screen shot 2018-05-17 at 11 20 26

screen shot 2018-05-16 at 14 52 00

Expected behavior

Site responds without delays. Normal CPU utilization and system load.

Notes

@hackdna hackdna added this to the Next milestone May 24, 2018
@hackdna hackdna changed the title High response times and system load High response times and system load after adding data sets Jul 13, 2018
@hackdna
Member Author

hackdna commented Jul 18, 2018

@jkmarx reported the v1.6.5 test site being very slow
@scottx611x reported neo4j process "hogging resources" for an unknown reason
CPU utilization on the web instance was maxed out for several minutes at a time:
screen shot 2018-07-18 at 15 43 46
This is with only 2-4 users interacting with the site concurrently

@jkmarx jkmarx modified the milestones: Next, Release 1.6.6 Jul 27, 2018
@jkmarx
Member

jkmarx commented Jul 27, 2018

@scottx611x Can we use this issue as the Neo4J issue, or should we write a separate issue?

Possible routes

  • Configuration possibility (SO look into it)
  • Drop per user search?
  • Proposal to update annotate at certain times (3am)

@scottx611x
Member

Potential avenues for a fix from Fritz:

Hi Nils, thanks to Scott we now know what the bottleneck with uploading new docs is: it's the creation of the annotation set hierarchy in Neo4J (https://github.com/refinery-platform/neo4j-ontology). What the tool basically does is create sub-ontologies for every user based on the annotations of the available datasets. Creating this ontology takes about 1 sec for ~500 datasets for 1 user. Given that you guys have ~360 users, it takes ~6 min. Back in the day there were no users, so I didn't spend time optimizing and went the safe route (update the hierarchy for all users whenever a dataset was uploaded, shared, or deleted). Apparently this is no longer feasible given the increase in users.
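
As a rough back-of-the-envelope check of the figures above, here is a minimal sketch. The function name is made up for illustration, and it assumes the cost scales linearly with the number of users, which the ~1 sec per user estimate suggests but which has not been measured:

```python
def full_rebuild_seconds(num_users, seconds_per_user=1.0):
    """Estimate the time to rebuild every user's annotation set hierarchy.

    Assumes linear scaling in the number of users, based on the
    ~1 sec per user (for ~500 datasets) figure quoted above.
    """
    return num_users * seconds_per_user


total = full_rebuild_seconds(360)
print(f"{total:.0f} s = {total / 60:.0f} min per upload, share, or delete")
```

With ~360 users this comes out to roughly 6 minutes of Neo4J work for every dataset change, which would be consistent with the sustained CPU load reported earlier in this thread.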

I see 3 general ways to improve this:

  1. Try to optimize my Neo4J plugin. I am sure there are ways to cleverly cache certain parts to cut down the computation time of a user's annotation set hierarchy. The traversal along the Neo4J network might also be improvable. Someone basically needs to improve this tool here: https://github.com/refinery-platform/neo4j-ontology/blob/master/src/main/java/org/neo4j/ontology/server/unmanaged/AnnotationResource.java#L190
  2. Depending on whether a new dataset is uploaded or shared, not all users' annotation set hierarchies need to be updated. Only the group of users who would have access to the dataset needs to be updated. This might still take a long time, because if you make a dataset public everyone needs to be updated. Also, even if everyone would need to be updated, the update is not needed at all if the dataset doesn't have any new annotations. Hence, there's a lot of room for improvement, but one needs to carefully catch every condition (i.e., which users would be affected by the action, and does the action actually affect the annotations).
  3. A user's annotation set hierarchy doesn't actually need to be updated until the user logs in. Since each individual computation is pretty fast (I think 1 sec waiting at the beginning is not that bad), it might be better to trigger the update of a user's annotation set hierarchy once the user actually logs in. Special care needs to be taken when the user is already logged in, and when a lot of people are logged in at the same time. Hence, one could do something like this: the annotation set hierarchy of the user who uploaded / changed the dataset gets updated immediately, everybody who is logged in sees an update button if something changed so they can update their annotation set hierarchy manually, and everybody else's annotation set hierarchy gets updated once they log in.
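
To make options 2 and 3 a bit more concrete, here is a minimal sketch of how they might be combined. Everything in it is hypothetical: `HierarchyTracker`, its method names, and the `rebuild` callback (a stand-in for whatever call recomputes one user's hierarchy in the Neo4J plugin) are not part of Refinery or neo4j-ontology, and the in-memory set would need to be replaced by something persistent in practice.

```python
class HierarchyTracker:
    """Tracks which users' annotation set hierarchies are stale.

    All names are hypothetical. `rebuild` stands in for the call that
    recomputes a single user's hierarchy in the Neo4J plugin.
    """

    def __init__(self, all_users, rebuild):
        self.all_users = set(all_users)
        self.rebuild = rebuild
        self.stale = set()

    def on_dataset_change(self, actor, users_with_access, is_public,
                          annotations_changed):
        # Option 2: nothing to do if the annotations are unaffected.
        if not annotations_changed:
            return
        # Option 2: only users who can see the dataset are affected;
        # a public dataset affects everyone.
        affected = self.all_users if is_public else set(users_with_access)
        # Option 3: defer everyone except the user who made the change.
        self.stale |= affected - {actor}
        self.rebuild(actor)

    def on_login(self, user):
        # Option 3: rebuild lazily the first time a stale user logs in.
        if user in self.stale:
            self.rebuild(user)
            self.stale.discard(user)

    def needs_refresh(self, user):
        # Logged-in users could be shown an update button when this is True.
        return user in self.stale


# Usage: record rebuild calls in a list instead of talking to Neo4J.
rebuilt = []
tracker = HierarchyTracker(["ann", "bob", "cat"], rebuilt.append)
tracker.on_dataset_change("ann", ["bob"], is_public=False,
                          annotations_changed=True)
tracker.on_login("bob")  # bob was stale, so his hierarchy is rebuilt now
tracker.on_login("cat")  # cat has no access to the dataset, nothing happens
```

In this sketch a single dataset change triggers one immediate rebuild (for the user who made the change) instead of one per user, and the remaining cost is spread over later logins. It does not address the case where many stale users log in at the same time.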

@scottx611x
Member

Still an issue during release-1.6.7 testing
screen shot 2018-11-07 at 2 02 05 pm

@jkmarx jkmarx removed this from the Release 1.6.7 milestone Nov 20, 2018
@hackdna hackdna added this to the Next milestone Mar 13, 2019
@hackdna hackdna modified the milestones: Next, Release 1.6.9 Mar 13, 2019

3 participants