100% CPU for week #484
Comments
Prod_machine.locks: 95/min
So there is a spare lock that has been there for over a day, but not on the engine of interest. There was a 499 operation canceled (a timeout?) for GetWordGraph right before it all started. After restarting the engines, everything went back to normal. The locks are the ones doing something weird: around 600 commands per second on the production locks collection. What is happening?
From my investigation, it looks like there are a lot of commands being run, but I can't determine what the commands are. The translation engine whose call was canceled doesn't seem to exist in the database. I can't find any way that the current lock implementation would fire off so many commands. The recent bug that I fixed in the lock (PR #486) could have something to do with this. Without any more information, I am out of ideas.
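Since it's hard to tell from the metrics alone what the commands are, one general way to see them is MongoDB's own profiler or currentOp. This is standard MongoDB tooling rather than anything from the issue, and the connection string, database name (`machine`), and namespace (`machine.locks`) below are placeholders to substitute with the real production values.

```csharp
using System;
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Driver;

// Placeholders: real connection string and database name go here.
var client = new MongoClient("mongodb://localhost:27017");
var db = client.GetDatabase("machine");

// Option 1: turn the profiler on briefly (level 2 records every operation).
// This adds overhead, so only leave it on for a few seconds during the spike.
db.RunCommand<BsonDocument>(new BsonDocument("profile", 2));
await Task.Delay(TimeSpan.FromSeconds(5));
db.RunCommand<BsonDocument>(new BsonDocument("profile", 0));

// Read back what was captured against the locks collection.
var profile = db.GetCollection<BsonDocument>("system.profile");
var filter = Builders<BsonDocument>.Filter.Eq("ns", "machine.locks");
foreach (var op in profile.Find(filter).Limit(20).ToList())
    Console.WriteLine(op.GetValue("command", new BsonDocument()));

// Option 2: dump the operations that are in flight right now.
var admin = client.GetDatabase("admin");
var current = admin.RunCommand<BsonDocument>(new BsonDocument("currentOp", 1));
Console.WriteLine(current);
```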
I think I found one way that a lock could get in a state where it keeps hammering the database with attempts to acquire the lock: (1) a lock somehow ends up stuck in a held state, and (2) another call then tries to acquire a reader or writer lock and hammers the database in a loop (see the sketch below). I'm not sure how 1 could happen after our recent changes. PR #486 should make it so that 2 can't happen.
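As an illustration of that failure mode, here is a minimal sketch, assuming the lock is a single MongoDB document that a waiter polls for. The class and member names (`DistributedLockSketch`, `TryAcquireAsync`, `writerId`) are hypothetical and not taken from the actual lock implementation.

```csharp
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Driver;

// Hypothetical sketch, not the project's actual lock class. It assumes the
// lock is one document in the "locks" collection whose "writerId" field is
// null when the lock is free.
public class DistributedLockSketch
{
    private readonly IMongoCollection<BsonDocument> _locks;
    private readonly string _lockId;

    public DistributedLockSketch(IMongoCollection<BsonDocument> locks, string lockId)
    {
        _locks = locks;
        _lockId = lockId;
    }

    // Atomically claims the lock document if no writer currently holds it.
    public async Task<bool> TryAcquireAsync(string holderId)
    {
        var filter = Builders<BsonDocument>.Filter.Eq("_id", _lockId)
            & Builders<BsonDocument>.Filter.Eq("writerId", BsonNull.Value);
        var update = Builders<BsonDocument>.Update.Set("writerId", holderId);
        UpdateResult result = await _locks.UpdateOneAsync(filter, update);
        return result.ModifiedCount == 1;
    }

    // If the lock document is stuck in a held state (condition 1) and a caller
    // keeps retrying (condition 2), every failed attempt immediately issues
    // another update command, so this loop generates queries as fast as the
    // database can answer them, matching the hundreds of commands per second
    // seen on the locks collection.
    public async Task AcquireWriterLockAsync(string holderId)
    {
        while (!await TryAcquireAsync(holderId))
        {
            // no delay and no change notification: a busy loop against the DB
        }
    }
}
```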
I submitted a PR (#491) that might reduce the chances of this happening.
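The actual change in PR #491 isn't reproduced here; as a sketch of the kind of mitigation that reduces the load, the retry loop could back off between attempts and give up after a deadline. The helper name and the delay and cap values below are arbitrary, not taken from the PR.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper, for illustration only.
public static class LockRetry
{
    // Retries an acquisition delegate with exponential backoff and a deadline,
    // so a stuck lock produces at most a few queries per second instead of an
    // unbounded busy loop.
    public static async Task<bool> AcquireWithBackoffAsync(
        Func<Task<bool>> tryAcquire, TimeSpan timeout)
    {
        DateTime deadline = DateTime.UtcNow + timeout;
        TimeSpan delay = TimeSpan.FromMilliseconds(100);
        while (!await tryAcquire())
        {
            if (DateTime.UtcNow >= deadline)
                return false; // give up; the caller decides whether to throw
            await Task.Delay(delay);
            delay = TimeSpan.FromMilliseconds(
                Math.Min(delay.TotalMilliseconds * 2, 5000)); // cap backoff at 5s
        }
        return true;
    }
}
```

With the sketch above, a call site could then wrap acquisition as `await LockRetry.AcquireWithBackoffAsync(() => myLock.TryAcquireAsync(holderId), TimeSpan.FromMinutes(1))`.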
Let's say this is resolved unless it comes back.
What caused it?
Lots of DB queries. It could be the locks, Hangfire monitoring, or something else.