[threads] Use refcounts for coordinating finalization and detaching #12391
Merged
Reverts a29ad08 (mono#9914)

The basic problem we want to solve is the following:

1. All access to InternalThread:state must be protected by the InternalThread:synch_cs mutex.
2. We must destroy the mutex when we are done with the thread.
3. We don't know which happens later: detaching the machine thread or finalizing its InternalThread managed object.

The solution is to replace InternalThread:synch_cs by InternalThread:longlived, which is a refcounted struct that holds the synch_cs. The refcount starts out at 2 when the thread is attached to the runtime and we create the managed InternalThread object that represents it: one reference for the attached machine thread and one for the managed object. Both detaching and finalizing the managed object will decrement the refcount, and whichever one happens last will be responsible for destroying the mutex.

This addresses mono#11956, which was a race condition due to the previous attempt to fix this lifetime problem. The previous attempt incorrectly used CAS in mono_thread_detach_internal while continuing to use locking of synch_cs elsewhere. In particular, mono_thread_suspend_all_other_threads could race with mono_thread_detach_internal: it expects to take the thread lock, test thread->state and use the thread->suspended event, while detaching deletes thread->suspended without taking a lock. As a result we had a concurrency bug: in suspend_all_other_threads it's possible to see both the old (non-Stopped) value of thread->state and the new (NULL) value of thread->suspended, which leads to crashes.

---

Background: why we don't know whether detaching or finalization happens first.

1. InternalThread normally outlives the machine thread. This can happen because when one thread starts another, it can hold a reference to the fresh thread's Thread object, which holds a reference to the InternalThread. So after the machine thread is done, the older thread can query the state of the younger Thread object. This is the normal situation.
2. During shutdown we can have the opposite situation: the InternalThread objects are finalized first (this happens during root domain finalization), but the machine threads are still running, and they may still return to start_wrapper_internal and call detach_internal. So in this case we have an InternalThread whose finalizer ran first, and detach will run second.
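A minimal C sketch of the refcounted-lifetime scheme described above, assuming C11 atomics and POSIX threads. All names here (LongLived, ToyThread, longlived_new, longlived_release, toy_detach, toy_finalize, toy_can_suspend) are hypothetical illustrations, not Mono's actual code, and a plain pointer stands in for the suspended event. It shows the refcount starting at 2, each of the two paths dropping one reference, and both fields updated and read under the same lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for InternalThread:longlived (not Mono's real code). */
typedef struct {
    atomic_int refcount;       /* starts at 2: one for the attached machine thread,
                                  one for the managed object */
    pthread_mutex_t synch_cs;  /* guards the owning thread's state */
} LongLived;

static LongLived *longlived_new (void)
{
    LongLived *ll = malloc (sizeof *ll);
    if (ll == NULL)
        return NULL;
    atomic_init (&ll->refcount, 2);
    pthread_mutex_init (&ll->synch_cs, NULL);
    return ll;
}

/* Drops one reference. Returns true if this caller was the last one and
 * therefore destroyed the mutex and freed the struct. The decrement is atomic
 * because the mutex cannot protect its own destruction. */
static bool longlived_release (LongLived *ll)
{
    int old = atomic_fetch_sub (&ll->refcount, 1);
    assert (old > 0);
    if (old != 1)
        return false;
    pthread_mutex_destroy (&ll->synch_cs);
    free (ll);
    return true;
}

enum { STATE_RUNNING, STATE_STOPPED };

/* Hypothetical stand-in for InternalThread. Its own memory is assumed to be
 * owned elsewhere (in Mono, by the GC). */
typedef struct {
    LongLived *longlived;
    int state;        /* guarded by longlived->synch_cs */
    void *suspended;  /* placeholder for the suspend event; same lock */
} ToyThread;

/* Detach path: update state and suspended together under the lock, then drop
 * the machine thread's reference. */
static bool toy_detach (ToyThread *t)
{
    pthread_mutex_lock (&t->longlived->synch_cs);
    t->state = STATE_STOPPED;
    t->suspended = NULL;
    pthread_mutex_unlock (&t->longlived->synch_cs);
    return longlived_release (t->longlived);
}

/* Finalizer path: drop the managed object's reference. */
static bool toy_finalize (ToyThread *t)
{
    return longlived_release (t->longlived);
}

/* Reader in the style of suspend-all: both fields are read under the same
 * lock, so it can never observe a non-Stopped state with a NULL event.
 * The caller must still hold the managed object, so the finalizer's
 * reference keeps the mutex alive. */
static bool toy_can_suspend (ToyThread *t)
{
    pthread_mutex_lock (&t->longlived->synch_cs);
    bool ok = t->state != STATE_STOPPED && t->suspended != NULL;
    pthread_mutex_unlock (&t->longlived->synch_cs);
    return ok;
}
```

In the normal order, toy_detach returns false and the later toy_finalize performs the cleanup; in the shutdown order the roles swap. Either way the mutex is destroyed exactly once, by whichever side drops the last reference.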
lambdageek requested review from akoeplinger, lateralusX, luhenry, marek-safar and vargaz as code owners, January 11, 2019 22:12
BrzVlad approved these changes, Jan 14, 2019
nice
@monojenkins backport to 2018-12
@monojenkins backport to 2018-10
@lambdageek backporting to 2018-10 failed, the patch results in conflicts:
Please backport manually!