[threads] Use refcounts for coordinating finalization and detaching #12391

Reverts a29ad08 (mono#9914) The basic problem we want to solve is the following: 1. All access to InternalThread:state must be protected by the InternalThread:synch_cs mutex 2. We must destroy the mutex when we are done with the thread. 3. We don't know which happens later - detaching the machine thread or finalizing its InternalThread managed object. The solution is to replace InternalThread:synch_cs by InternalThread:longlived which is a refcounted struct that holds the synch_cs. The refcount starts out at 2 when the thread is attached to the runtime and when we create the managed InternalThread object that represents it. Both detaching and finalizing the managed object will decrement the refounct, and whichever one happens last will be responsible for destroying the mutex. This addresses mono#11956 which was a race condition due to the previous attempt to fix this lifetime problem. The previous attempt incorrectly used CAS in mono_thread_detach_internal while continuing to use locking of synch_cs elsewhere. In particular mono_thread_suspend_all_other_threads could race with mono_thread_detach_internal: it expects to take the thread lock and test thread->state and use the thread->suspended event, while detaching deletes thread->suspended without taking a lock. As a result we had a concurrency bug: in suspend_all_other_threads it's possible to see both the old (non-Stopped) value of thread->state and the new (NULL) value of thread->suspended. Which leads to crashes. --- Background - why we don't know if detaching or finalization happens first. 1. InternalThread normally outlives the machine thread. This can happen because when one thread starts another it can hold a reference to the fresh thread's Thread object which holds a reference to the InternalThread. So after the machine thread is done, the older thread can query the state of the younger Thread object. This is the normal situation. 2. During shutdown we can have the opposite situation: the InternalThread objects are finalized first (this happens during root domain finalization), but the machine threads are still running, and they may still return to start_wrapper_internal and call detach_internal. So in this case we have an InternalThread whose finalizer ran first and detach will run second.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[threads] Use refcounts for coordinating finalization and detaching #12391

[threads] Use refcounts for coordinating finalization and detaching #12391

Commits on Jan 11, 2019