
Remove recursive read locks #177

Merged: 6 commits into go-co-op:master on May 25, 2021

Conversation

migopsdinesh

What does this do?

Removes recursive read locks, which were causing the scheduler to get stuck.

List any changes that modify/break current functionality

No behavioral changes

Have you included tests for your changes?

No tests included; the existing tests are sufficient.

Did you document any new/modified functionality?

Notes

Thank you very much for this wonderful library; we are planning to use it in one of our tools.
While stress testing the library, we found that the scheduler engine gets stuck due to recursive read locks in the jobPresent() function in scheduler.go. jobPresent() acquires the read lock twice: once directly, and once again through the s.Jobs() function.

After going through the code, I do not believe jobPresent() needs the explicit read lock, and once the explicit read locks are removed (which is what this PR does), the scheduler no longer gets stuck.
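For context, the problematic pattern looks roughly like the sketch below (a simplified illustration rather than the exact scheduler.go code; the s.jobs field name is an assumption based on the diff):

func (s *Scheduler) Jobs() []*Job {
	s.jobsMutex.RLock()
	defer s.jobsMutex.RUnlock()
	return s.jobs
}

func (s *Scheduler) jobPresent(j *Job) bool {
	s.jobsMutex.RLock() // first read lock
	defer s.jobsMutex.RUnlock()
	for _, job := range s.Jobs() { // Jobs() acquires the read lock again on the same goroutine
		if job == j {
			return true
		}
	}
	return false
}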

To demonstrate this behavior, we reproduced it with the simple Go program below (Go 1.16).


package main

import (
	"fmt"
	"sync"
	"time"
)

// The zero value of sync.RWMutex is ready to use, so no init is needed.
var lock sync.RWMutex

func Reader1() {
	lock.RLock()
	defer lock.RUnlock()

	fmt.Println("reader1")
}

func Reader2() {
	lock.RLock() // outer read lock
	defer lock.RUnlock()
	Reader1() // acquires the read lock again while the outer one is still held
	fmt.Println("reader2")
}

func Writer1() {
	lock.Lock()
	defer lock.Unlock()
	fmt.Println("writer1")
}

func main() {

	// Continuously acquire recursive read locks in a loop.
	go func() {
		for {
			Reader2()
		}
	}()

	fmt.Println("acquire write lock after 2 seconds")
	time.Sleep(2 * time.Second)

	// Start a writer; once its Lock() call is pending, new RLock calls block,
	// so the nested RLock inside Reader2 deadlocks.
	go func() {
		for {
			Writer1()
		}
	}()

	time.Sleep(100 * time.Second)
}
$ go build . ; ./main
...
...
...
reader1
reader2
writer1 (Stuck here)

Here the program gets stuck. Now remove
lock.RLock()
defer lock.RUnlock()
from Reader2(), rebuild, and run again; the process no longer gets stuck.
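After that change, Reader2 simply becomes:

func Reader2() {
	Reader1()
	fmt.Println("reader2")
}

and the writer can make progress.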

This is exactly what happens in the scheduler under stress testing. The sync.RWMutex documentation explicitly prohibits recursive read locking: once a writer is blocked in Lock(), new readers also block, so a goroutine that already holds the read lock and tries to acquire it again deadlocks against the waiting writer.
Hence this PR removes the unnecessary, recursive read locks.

Once again, thank you so much for the wonderful library.

@migopsdinesh migopsdinesh changed the title Remove duplicate read Remove reursive read locks May 16, 2021
@migopsdinesh migopsdinesh changed the title Remove reursive read locks Remove recursive read locks May 16, 2021
@JohnRoesler (Contributor) left a comment:

Thanks for your kind words and the contribution!

@@ -576,8 +574,6 @@ func (s *Scheduler) TaskPresent(j interface{}) bool {
}

func (s *Scheduler) jobPresent(j *Job) bool {
-	s.jobsMutex.RLock()
@JohnRoesler (Contributor) commented on this line:
Actually, it looks like removing this lock breaks the Update() func. Since it's a private function, we should be able to control any reads to it, and if the race you're seeing is related, we can address it a different way.

@migopsdinesh (Author) commented:

@JohnRoesler, yes, jobPresent() is a private function, and read access to it is already controlled by the s.Jobs() function.
Another way to solve this would be to introduce a separate mutex, but I don't think that is required.

I am unable to locate where the Update() func relies on this lock; it already uses the s.Jobs() function to get a snapshot of the currently running jobs. Please correct me if I am wrong, and I'm happy to submit a new patch for this problem.

@JohnRoesler (Contributor) commented:
There is a trace in the test results when running with -race; you can see it in the Actions output. I'm traveling and on mobile so I can't dig in at the moment, but if you look there you'll see why the test is failing with your change.
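For anyone reproducing this locally, the race report referred to here can be generated by running the package tests with Go's race detector enabled, for example:

$ go test -race ./...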

@JohnRoesler JohnRoesler merged commit 092c5ec into go-co-op:master May 25, 2021