Remove recursive read locks #177
Conversation
Thanks for your kind words and the contribution!
@@ -576,8 +574,6 @@ func (s *Scheduler) TaskPresent(j interface{}) bool {
}

func (s *Scheduler) jobPresent(j *Job) bool {
	s.jobsMutex.RLock()
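For context, the recursion discussed in this thread looks roughly like the sketch below: jobPresent() takes the read lock directly and then calls Jobs(), which takes the same read lock again. Only jobsMutex, Jobs(), and jobPresent() are named in this PR; the stand-in types and the body of Jobs() are assumptions based on the discussion, not the library's exact code.

```go
package sketch

import "sync"

// Minimal stand-ins so the sketch compiles; the real types live in gocron.
type Job struct{}

type Scheduler struct {
	jobsMutex sync.RWMutex
	jobs      []*Job
}

// Jobs returns the current jobs while holding the read lock (assumed shape).
func (s *Scheduler) Jobs() []*Job {
	s.jobsMutex.RLock()
	defer s.jobsMutex.RUnlock()
	return s.jobs
}

// jobPresent as it looked before this PR: it takes the read lock directly
// and then calls Jobs(), which acquires the same read lock a second time.
func (s *Scheduler) jobPresent(j *Job) bool {
	s.jobsMutex.RLock()
	defer s.jobsMutex.RUnlock()
	for _, job := range s.Jobs() {
		if job == j {
			return true
		}
	}
	return false
}
```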
Actually, it looks like removing this lock breaks the Update() func. Since it's a private function, we should be able to control any reads to it, and if the race you're seeing is related, we can accomplish this a different way.
@JohnRoesler, yes, jobPresent() is a private function, and read access within it is already handled by the s.Jobs() function.
Another way to solve this problem would be to introduce a separate mutex, but I don't see that as necessary.
I am unable to locate any usage in the Update() func that relies on this lock; it already uses the s.Jobs() function to get a snapshot of the currently running jobs. Please correct me if I am wrong; I'm happy to submit a new patch for this problem.
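If keeping a lock inside jobPresent() is preferred, one hypothetical alternative (not part of this PR, and reusing the stand-in types from the sketch above) is to hold the read lock once and iterate the unexported slice directly instead of going through Jobs(), so the nested RLock never happens:

```go
// Hypothetical alternative (assumed field names): take the read lock once
// and read the unexported slice directly, avoiding the second RLock that
// Jobs() would acquire.
func (s *Scheduler) jobPresent(j *Job) bool {
	s.jobsMutex.RLock()
	defer s.jobsMutex.RUnlock()
	for _, job := range s.jobs { // direct field access, no nested RLock
		if job == j {
			return true
		}
	}
	return false
}
```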
There is a trace in the test results when running with go test -race. You can see it in the Actions output. I'm traveling and on mobile so I can't dig in at the moment, but if you look there you'll see why your change is failing the test.
With this approach, we avoid the recursive read locks.
What does this do?
Removing recursive read locks, which are causing the scheduler to get stuck.
List any changes that modify/break current functionality
No behavioral changes
Have you included tests for your changes?
No tests included; the existing tests should be sufficient.
Did you document any new/modified functionality?
Notes
We would like to say thank you very much for the wonderful library; we are planning to use it in one of our tools.
While stress testing this library, we found that the scheduler engine gets stuck due to recursive read locks in the jobPresent() function in scheduler.go. In jobPresent(), the read lock is acquired twice: once via a direct call, and once via the s.Jobs() function.
After going through the code, I do not believe we need the explicit read lock call in jobPresent(), and once we remove the explicit read locks (which is what this PR does), the scheduler no longer gets stuck.
To demonstrate this behavior, we tried the same pattern with the simple Go program below (Go 1.16).
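The original snippet isn't reproduced on this page, so the program below is a minimal sketch of the same pattern. Only Reader2() and its lock.RLock() / defer lock.RUnlock() calls come from the description; Reader1(), Writer(), and the loop counts are assumptions.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

var lock sync.RWMutex

// Reader1 holds the read lock and, while still holding it, calls Reader2.
func Reader1() {
	lock.RLock()
	defer lock.RUnlock()
	Reader2() // second (recursive) read lock acquired on the same goroutine
}

// Reader2 takes the read lock again. If a writer has queued up behind
// Reader1's lock in the meantime, this RLock blocks forever.
func Reader2() {
	lock.RLock()
	defer lock.RUnlock()
	time.Sleep(time.Millisecond)
}

// Writer periodically takes the write lock.
func Writer() {
	lock.Lock()
	defer lock.Unlock()
	time.Sleep(time.Millisecond)
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 1000; i++ {
		wg.Add(2)
		go func() { defer wg.Done(); Reader1() }()
		go func() { defer wg.Done(); Writer() }()
	}
	wg.Wait()
	fmt.Println("done") // usually never printed: the run hangs once a
	// writer queues between Reader1's and Reader2's RLock calls
}
```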
Here, it gets stuck. Now, remove lock.RLock() and defer lock.RUnlock() from Reader2(), then build and run again, and the process won't get stuck.
This is what is happening with the scheduler too, and it gets stuck when we do the stress testing. Go's sync.RWMutex documents that if one goroutine holds the read lock and another goroutine calls Lock, no new read lock can be acquired until the writer has obtained and released the lock, so a recursive RLock deadlocks as soon as a writer is waiting in between.
Hence, we propose a fix for this problem by removing the unnecessary/recursive read locks.
Once again, thank you so much for the wonderful library.