
Fix deadlock in lock table locks #5566

Merged

Conversation

Woellchen
Contributor

What problem are we solving?

This fixes a race condition in which multiple goroutines release and acquire locks for the same file. Because the lock entries in the lock table and the reference counters inside those entries are guarded independently and are not synchronized with each other, two goroutines can acquire a lock while a third goroutine releases the same lock in between, leading to two lock entries being created for the same file. Goroutines waiting on those two entries then block forever in entry.cond.Wait() and can never proceed.
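For illustration, here is a minimal, hypothetical Go sketch of the racy pattern described above. The type and field names are invented for this example and are not the actual SeaweedFS identifiers; the point is only that the table map and the per-entry counter are protected by different mutexes, leaving a window between the lookup and the counter update.

```go
package locktable

import "sync"

// Hypothetical, simplified types; not the actual SeaweedFS identifiers.
type lockEntry struct {
	mu       sync.Mutex
	cond     *sync.Cond
	refCount int // guarded by e.mu, independent of the table's mutex
}

type lockTable struct {
	mu      sync.Mutex
	entries map[string]*lockEntry
}

func (t *lockTable) acquire(key string) *lockEntry {
	t.mu.Lock()
	e, ok := t.entries[key]
	if !ok {
		e = &lockEntry{}
		e.cond = sync.NewCond(&e.mu)
		t.entries[key] = e
	}
	t.mu.Unlock()

	// Race window: between t.mu.Unlock() above and e.mu.Lock() below, a
	// concurrent release() can see refCount == 0 and delete the entry from
	// the map. A later acquire() for the same key then creates a second
	// entry, and goroutines queued on the orphaned entry stay blocked in
	// e.cond.Wait() forever because nobody signals it again.
	e.mu.Lock()
	e.refCount++
	e.mu.Unlock()
	return e
}

func (t *lockTable) release(key string, e *lockEntry) {
	e.mu.Lock()
	e.refCount--
	empty := e.refCount == 0
	e.mu.Unlock()

	if empty {
		t.mu.Lock()
		delete(t.entries, key)
		t.mu.Unlock()
	}
}
```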

How are we solving the problem?

The fix is to track in-flight locks in the lock table itself, so that the table knows when it is actually safe to remove a lock entry and never ends up creating a second instance for the same file.
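Continuing the hypothetical sketch above, the idea of the fix can be illustrated by keeping the in-flight count inside the table's own critical section; again, the names are illustrative and may differ from the actual change in this PR.

```go
package locktable

import "sync"

// Revised version of the sketch above: the in-flight count now lives under
// the table's mutex, so lookup and counting are atomic with respect to
// removal. Field and type names remain hypothetical.
type lockEntry struct {
	mu           sync.Mutex
	cond         *sync.Cond
	activeOwners int // guarded by lockTable.mu, not by e.mu
}

type lockTable struct {
	mu      sync.Mutex
	entries map[string]*lockEntry
}

func (t *lockTable) acquire(key string) *lockEntry {
	t.mu.Lock()
	defer t.mu.Unlock()
	e, ok := t.entries[key]
	if !ok {
		e = &lockEntry{}
		e.cond = sync.NewCond(&e.mu)
		t.entries[key] = e
	}
	// Counted while still holding t.mu: a concurrent release() can never
	// observe zero while another goroutine sits between lookup and use.
	e.activeOwners++
	return e
}

func (t *lockTable) release(key string, e *lockEntry) {
	t.mu.Lock()
	defer t.mu.Unlock()
	e.activeOwners--
	if e.activeOwners == 0 {
		// No goroutine holds an uncounted reference, so the entry can be
		// dropped without stranding anyone on e.cond.Wait().
		delete(t.entries, key)
	}
}
```

With the count maintained under the table's mutex, a release can only remove an entry when no acquire for the same key is in flight, which is what tracking in-flight locks in the table itself buys here.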

How is the PR tested?

The FUSE Mount step in the e2e workflow is failing. This PR should fix it and make it succeed again.

Checks

  • I have added unit tests if possible.
  • I will add related wiki document changes and link to this PR after merging.

@chrislusf chrislusf merged commit 96c48bd into seaweedfs:master May 7, 2024
6 checks passed
@Woellchen Woellchen deleted the bugfix/deadlock-in-lock-table-locks branch May 7, 2024 13:00
@eliphatfs

I have a stupid question: does this fix #5380?

@Woellchen
Contributor Author

It sounds like it could be a fix for it, yes, because this bug left IO requests unfinished and blocked the kernel forever. However, I didn't encounter uninterruptible sleep during my tests, so I'm not 100% sure it's the same issue, but it could be.
