Prevent endless loop in lock cycle detection #1510 #1635

trancexpress · 2022-06-10T15:37:20Z

This change prevents an endless loop in: ReentrantCycleDetectingLock.detectPotentialLocksCycle()

Due to how code in ReentrantCycleDetectingLock.lockOrDetectPotentialLocksCycle() is synchronized,
its possible for a thread to both own/hold a lock (according to ReentrantCycleDetectingLock.lockOwnerThread)
and wait on the same lock (according to CycleDetectingLock.lockThreadIsWaitingOn).
In this state, if another thread tries to hold the same lock an endless loop will occur when calling detectPotentialLocksCycle().

With this change detectPotentialLocksCycle() removes the lock owning thread from
ReentrantCycleDetectingLock.lockOwnerThread,
if it detects that "this" lock is both waited on and owned by the same thread.
This prevents the endless loop during cycle detection.

Fix for: #1510

google-cla · 2022-06-10T15:37:25Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

trancexpress · 2022-06-10T15:41:51Z

I signed the contributor agreement, no idea when the bot will detect this (or whether it must be re-triggered).

trancexpress · 2022-06-10T17:58:47Z

OK I understand the problem now. ReentrantCycleDetectingLock.lockOrDetectPotentialLocksCycle() has the following code:

        final Thread currentThread = Thread.currentThread();
        synchronized (CycleDetectingLockFactory.class) {
          checkState();
          // Add this lock to the waiting map to ensure it is included in any reported lock cycle.
          lockThreadIsWaitingOn.put(currentThread, this);
          ListMultimap<Thread, ID> locksInCycle = detectPotentialLocksCycle();
          if (!locksInCycle.isEmpty()) {
            // We aren't actually going to wait for this lock, so remove it from the map.
            lockThreadIsWaitingOn.remove(currentThread);
            // potential deadlock is found, we don't try to take this lock
            return locksInCycle;
          }
        }

        // this may be blocking, but we don't expect it to cause a deadlock
        lockImplementation.lock();

        synchronized (CycleDetectingLockFactory.class) {
          // current thread is no longer waiting on this lock
          lockThreadIsWaitingOn.remove(currentThread);
          checkState();

          // mark it as owned by us
          lockOwnerThread = currentThread;
          lockReentranceCount++;
          // add this lock to the list of locks owned by a current thread
          locksOwnedByThread.put(currentThread, this);
        }

What this code (I assume) wants to do is:

add the current thread and "this" lock to lockThreadIsWaitingOn
check for cycles
remove the current thread and "this" from lockThreadIsWaitingOn

And specifically, during 2.:

2.1. calls detectPotentialLocksCycle() and then calls addAllLockIdsAfter()
2.2. the two methods will loop forever, if they iterate over a thread that both owns a lock and waits on that lock, see the respective code:

        /* in detectPotentialLocksCycle() */
        ReentrantCycleDetectingLock<?> lockOwnerWaitingOn = this;
        // try to find a dependency path between lock's owner thread and a current thread
        while (lockOwnerWaitingOn != null && lockOwnerWaitingOn.lockOwnerThread != null) {
          Thread threadOwnerThreadWaits = lockOwnerWaitingOn.lockOwnerThread;
          // in case locks cycle exists lock we're waiting for is part of it
          lockOwnerWaitingOn =
              addAllLockIdsAfter(threadOwnerThreadWaits, lockOwnerWaitingOn, potentialLocksCycle);

        /* in addAllLockIdsAfter() */
        ReentrantCycleDetectingLock<?> unownedLock = lockThreadIsWaitingOn.get(thread);
        // If this thread is waiting for a lock add it to the cycle and return it
        if (unownedLock != null && unownedLock.lockFactory == this.lockFactory) {
          @SuppressWarnings("unchecked")
          ID typed = (ID) unownedLock.userLockId;
          potentialLocksCycle.put(thread, typed);
        }
        return unownedLock;

If the entire sequence was done in 1 synchronized block, there wouldn't be a problem. However, CycleDetectingLockFactory.class is locked to first add to lockThreadIsWaitingOn and check for cycles. Then the synchronized block is left, and another synchronized block is entered, to remove from lockThreadIsWaitingOn.

So what can occur (and occurs in the reproducer), is that 1 thread adds to lockThreadIsWaitingOn with a lock and then acquires the lock (and is so owning the lock). Then another thread does a cycle check, wanting to acquire the same lock. It sees the owner thread of the lock, tries to check for cycles with it and loops forever. While the first thread waits on removing from lockThreadIsWaitingOn - which would prevent the endless loop.

Unfortunately I don't know whether the adding the removal from lockThreadIsWaitingOn in the first synchronized block breaks other code. It might be safer to detect the bad case scenario and just bail out (as in the suggested workaround). I don't know what this means for the cycle detection though. The test added for #915 passes, but seeing that it misses the endless loop reported in #1510 maybe its not extensive enough to catch problems that this change introduces. Though IMO better to have cycle detection that didn't detect a cycle, then an endless loop that eventually results in an OOM.

iloveeclipse · 2022-06-10T21:35:34Z

core/src/com/google/inject/internal/CycleDetectingLock.java

@@ -249,6 +249,9 @@ private ListMultimap<Thread, ID> detectPotentialLocksCycle() {
            MultimapBuilder.linkedHashKeys().arrayListValues().build();
        // lock that is a part of a potential locks cycle, starts with current lock
        ReentrantCycleDetectingLock<?> lockOwnerWaitingOn = this;
+        // Its possible the lock owner thread has not removed itself yet from the waiting-on-lock list.
+        // So we remove it here, to prevent an endless loop. See #1510.
+        lockThreadIsWaitingOn.remove(lockOwnerWaitingOn.lockOwnerThread);


this.remove(this.lockOwnerThread); ?

You mean lockThreadIsWaitingOn.remove(this.lockOwnerThread)?

Sure, why not. I prefer to use lockOwnerWaitingOn after this is stored into it, but its whichever.

trancexpress · 2022-06-10T21:39:50Z

From my POV this is ready for a merge, IMO it fixes the concurrency problem in a reliable and simple way.

trancexpress · 2022-06-11T06:48:10Z

Looks like there are fails in the cycle detection test, will have to check why:

Tests in error: 
  testCycleDetectingLockFactoriesDoNotDeadlock(com.google.inject.internal.CycleDetectingLockTest)
  testCycleReporting(com.google.inject.internal.CycleDetectingLockTest)

…detectPotentialLocksCycle() Due to how code in ReentrantCycleDetectingLock.lockOrDetectPotentialLocksCycle() is synchronized, its possible for a thread to both own/hold a lock (according to ReentrantCycleDetectingLock.lockOwnerThread) and wait on the same lock (according to CycleDetectingLock.lockThreadIsWaitingOn). In this state, if another thread tries to hold the same lock an endless loop will occur when calling detectPotentialLocksCycle(). With this change detectPotentialLocksCycle() removes the lock owning thread from ReentrantCycleDetectingLock.lockOwnerThread, if it detects that "this" lock is both waited on and owned by the same thread. This prevents the endless loop during cycle detection. Fix for: google#1510

iloveeclipse · 2022-07-13T07:01:26Z

@andrewthehan, @sameb, @cpovirk, @markmarch, anyone : could we please move forward with review?

…would previously fail: * T1: Enter `lockOrDetectPotentialLocksCycle`: - Lock CAF.class, add itself to `lockThreadIsWaitingOn`, unlock CAF.class - Lock the cycle detecting lock (CDL) - Lock CAF.class, mark itself as `lockOwnerThread`, remove itself from `lockThreadIsWaitingOn` - Exit `lockOrDetectPotentialLocksCycle` * T1: Re-enter `lockOrDetectPotentialLocksCycle`: - Lock CAF.class, add itself to `lockThreadIsWaitingOn`, unlock CAF.class T2: Enter `lockOrDetectPotentialLocksCycle` - Lock CAF.class, invoke `detectPotentialLocksCycle`. At this point, `detectPotentialLocksCycle` will now loop forever, because the `lockOwnerThread` is also in `lockThreadIsWaitingOn`. During the course of looping forever, it will OOM, because it's building up an in-memory structure of what it thinks are cycles. The solution is to avoid the re-entrant T1 from adding itself to `lockThreadIsWaitingOn` if it's already the `lockOwnerThread`. It's guaranteed that it won't relinquish the lock concurrently, because it's the exact same thread that owns it. Fixes #1635 and Fixes #1510 PiperOrigin-RevId: 524376697

trancexpress marked this pull request as ready for review June 10, 2022 21:15

trancexpress force-pushed the 1510 branch from a25812b to 2f46a53 Compare June 10, 2022 21:31

iloveeclipse reviewed Jun 10, 2022

View reviewed changes

trancexpress force-pushed the 1510 branch from 2f46a53 to 4e9022a Compare June 10, 2022 21:35

trancexpress force-pushed the 1510 branch from 4e9022a to f48f54c Compare June 11, 2022 07:59

iloveeclipse mentioned this pull request Aug 10, 2022

Is guice project dead? #1643

Closed

copybara-service bot mentioned this pull request Apr 18, 2023

Prevent endless loop in lock cycle detection. The following scenario would previously fail: #1681

Closed

copybara-service bot closed this in 8cf8c62 Apr 18, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Prevent endless loop in lock cycle detection #1510 #1635

Prevent endless loop in lock cycle detection #1510 #1635

trancexpress commented Jun 10, 2022 •

edited

Loading

google-cla bot commented Jun 10, 2022

trancexpress commented Jun 10, 2022

trancexpress commented Jun 10, 2022 •

edited

Loading

iloveeclipse Jun 10, 2022

trancexpress Jun 10, 2022 •

edited

Loading

trancexpress commented Jun 10, 2022

trancexpress commented Jun 11, 2022

iloveeclipse commented Jul 13, 2022

Prevent endless loop in lock cycle detection #1510 #1635

Prevent endless loop in lock cycle detection #1510 #1635

Conversation

trancexpress commented Jun 10, 2022 • edited Loading

google-cla bot commented Jun 10, 2022

trancexpress commented Jun 10, 2022

trancexpress commented Jun 10, 2022 • edited Loading

iloveeclipse Jun 10, 2022

Choose a reason for hiding this comment

trancexpress Jun 10, 2022 • edited Loading

Choose a reason for hiding this comment

trancexpress commented Jun 10, 2022

trancexpress commented Jun 11, 2022

iloveeclipse commented Jul 13, 2022

trancexpress commented Jun 10, 2022 •

edited

Loading

trancexpress commented Jun 10, 2022 •

edited

Loading

trancexpress Jun 10, 2022 •

edited

Loading