
Threading manager stresstest and fixes #15470

Merged · 5 commits into master · Apr 9, 2022

Conversation

hrydgard (Owner) commented Apr 8, 2022

Trying to investigate the problem from #15431 by stressing the thread manager and thread primitives.

It turns out there was both a misuse of cond.wait (the callback should return whether to stop waiting, not whether to continue waiting) and a subtle race condition between Notify and Wait, which I fixed by locking the mutex.

The old "Event" structure was also affected by these bugs.

hrydgard added this to the v1.13.0 milestone on Apr 8, 2022
hrydgard marked this pull request as ready for review on Apr 8, 2022 at 10:35
hrydgard changed the title from "Threading manager stresstest" to "Threading manager stresstest and fixes" on Apr 8, 2022
Review comment on:

    if (us == 0)
        return false;
    std::unique_lock<std::mutex> lock(mutex_);
    // Buggy predicate, fixed by this PR: the callback should return triggered_
    // (whether to stop waiting), not !triggered_ (whether to keep waiting).
    cond_.wait_for(lock, std::chrono::microseconds(us), [&] { return !triggered_; });
[Unknown] (Collaborator) commented:

Oops, I can't believe I didn't notice that. I'm sure I copied it from Event and changed the line, but even so.

-[Unknown]

Comment on lines +15 to +16:

    std::unique_lock<std::mutex> lock(mutex_);
    if (!triggered_) {
[Unknown] (Collaborator) commented:

Hm, just wondering, what's the race here?

Considering the old code:

  • If triggered is true, no lock and it bails. It can't be un-triggered, so this is safe.
  • If triggered is false and stays false, then we lock and check triggered again (within the cond predicate), and then wait. When trigger happens, we wake.
  • If triggered is false and changes immediately before the lock, we still lock and check triggered again in the pred. This happens before waiting, within the lock, so triggered is now true and we don't wait.
  • If triggered is false but doesn't change until after the lock, then Notify() will be blocked until we enter the wait, and then the notify_all() will wake and recheck pred.

As long as Notify locks, in theory we could wrap all these functions in if (!triggered_) { ... } as they are now, without a lock, because triggered_ gets rechecked under the lock and can only become true once (see the sketch below). I'm still trying to figure out what race condition I'm missing.

-[Unknown]
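
For illustration, a sketch of the unlocked fast path described above (hypothetical, not the PR's code): a variant of WaitUs from the earlier LimitedWaitableSketch, with triggered_ changed to std::atomic<bool> so the unlocked read is well-defined.

    // Variant of LimitedWaitableSketch::WaitUs with an unlocked fast path.
    // Assumes triggered_ is std::atomic<bool>. This is only safe against
    // missed wakeups if Notify() locks mutex_: triggered_ is rechecked under
    // the lock by the wait predicate, and it can only go false -> true.
    bool WaitUs(int us) {
        if (!triggered_) {  // unlocked fast path
            if (us == 0)
                return false;
            std::unique_lock<std::mutex> lock(mutex_);
            cond_.wait_for(lock, std::chrono::microseconds(us),
                           [&] { return triggered_.load(); });
        }
        return triggered_;
    }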

hrydgard (Owner, Author) replied:

There seems to be a race with destruction. If you go back to the old code and run it (on a machine with enough cores), it reproduces fairly easily. I think it's something like:

  • Notify gets called, locks, and reaches triggered_ = true;
  • The original thread enters Wait, exits immediately, and, before the first thread has unlocked the mutex, deletes the LimitedWaitable.
  • Thus we get a crash on "mutex deleted when locked".
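
As a rough illustration of that sequence (hypothetical driver code, reusing LimitedWaitableSketch from the first sketch):

    #include <thread>

    // Hypothetical repro of the destruction race: the waiter can return from
    // WaitUs() and delete the object while the notifier is still inside
    // Notify(), unlocking the soon-to-be-destroyed mutex.
    int main() {
        auto *w = new LimitedWaitableSketch();
        std::thread notifier([w] { w->Notify(); });
        w->WaitUs(1000000);  // wakes as soon as triggered_ flips
        delete w;            // may run before Notify() has fully released mutex_
        notifier.join();
        return 0;
    }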

[Unknown] (Collaborator) replied:

Hm, ok. I guess we could just put a lock in the destructor instead.

-[Unknown]
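
A minimal sketch of that idea, added to the hypothetical LimitedWaitableSketch from above:

    // Hypothetical destructor for LimitedWaitableSketch: taking and releasing
    // mutex_ makes destruction wait for any concurrent Notify() to leave its
    // locked section before the mutex is destroyed.
    ~LimitedWaitableSketch() {
        std::unique_lock<std::mutex> lock(mutex_);
    }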

hrydgard (Owner, Author) replied:

Yeah, could be a possibility. Feel free to make a PR :)

[Unknown] (Collaborator) replied:

Well, 12 threads must not be enough to reproduce, I guess (at least with unittest threadmanager), but I think there's additional safety to be had in the destruct case, so I'll open one.

-[Unknown]

hrydgard merged commit 5b58b69 into master on Apr 9, 2022
hrydgard deleted the threading-manager-stresstest branch on Apr 9, 2022 at 07:21