Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[threads] Use refcounts for coordinating finalization and detaching #12391

Merged
merged 1 commit into from Jan 15, 2019

Conversation

lambdageek
Copy link
Member

@lambdageek lambdageek commented Jan 11, 2019

Reverts a29ad08 (#9914)

The basic problem we want to solve is the following:

  1. All access to InternalThread:state must be protected by the
    InternalThread:synch_cs mutex
  2. We must destroy the mutex when we are done with the thread.
  3. We don't know which happens later - detaching the machine thread or
    finalizing its InternalThread managed object.

The solution is to replace InternalThread:synch_cs by InternalThread:longlived
which is a refcounted struct that holds the synch_cs. The refcount starts out
at 2 when the thread is attached to the runtime and when we create the managed
InternalThread object that represents it.
Both detaching and finalizing the managed object will decrement the refounct,
and whichever one happens last will be responsible for destroying the mutex.

This addresses #11956 which was a race
condition due to the previous attempt to fix this lifetime problem. The
previous attempt incorrectly used CAS in mono_thread_detach_internal while
continuing to use locking of synch_cs elsewhere. In particular
mono_thread_suspend_all_other_threads could race with
mono_thread_detach_internal: it expects to take the thread lock and test
thread->state and use the thread->suspended event, while detaching deletes
thread->suspended without taking a lock.

As a result we had a concurrency bug: in suspend_all_other_threads it's
possible to see both the old (non-Stopped) value of thread->state and the
new (NULL) value of thread->suspended. Which leads to crashes.


Background - why we don't know if detaching or finalization happens first.

  1. InternalThread normally outlives the machine thread. This can happen because
    when one thread starts another it can hold a reference to the fresh thread's
    Thread object which holds a reference to the InternalThread. So after the
    machine thread is done, the older thread can query the state of the younger
    Thread object. This is the normal situation.

  2. During shutdown we can have the opposite situation: the InternalThread
    objects are finalized first (this happens during root domain finalization), but
    the machine threads are still running, and they may still return to
    start_wrapper_internal and call detach_internal. So in this case we have an
    InternalThread whose finalizer ran first and detach will run second.

Reverts a29ad08 (mono#9914)

The basic problem we want to solve is the following:
  1. All access to InternalThread:state must be protected by the
     InternalThread:synch_cs mutex
  2. We must destroy the mutex when we are done with the thread.
  3. We don't know which happens later - detaching the machine thread or
     finalizing its InternalThread managed object.

The solution is to replace InternalThread:synch_cs by InternalThread:longlived
which is a refcounted struct that holds the synch_cs.  The refcount starts out
at 2 when the thread is attached to the runtime and when we create the managed
InternalThread object that represents it.
Both detaching and finalizing the managed object will decrement the refounct,
and whichever one happens last will be responsible for destroying the mutex.

This addresses mono#11956 which was a race
condition due to the previous attempt to fix this lifetime problem.  The
previous attempt incorrectly used CAS in mono_thread_detach_internal while
continuing to use locking of synch_cs elsewhere. In particular
mono_thread_suspend_all_other_threads could race with
mono_thread_detach_internal: it expects to take the thread lock and test
thread->state and use the thread->suspended event, while detaching deletes
thread->suspended without taking a lock.

As a result we had a concurrency bug: in suspend_all_other_threads it's
possible to see both the old (non-Stopped) value of thread->state and the
new (NULL) value of thread->suspended.  Which leads to crashes.

---

Background - why we don't know if detaching or finalization happens first.

1. InternalThread normally outlives the machine thread. This can happen because
when one thread starts another it can hold a reference to the fresh thread's
Thread object which holds a reference to the InternalThread. So after the
machine thread is done, the older thread can query the state of the younger
Thread object. This is the normal situation.

2. During shutdown we can have the opposite situation: the InternalThread
objects are finalized first (this happens during root domain finalization), but
the machine threads are still running, and they may still return to
start_wrapper_internal and call detach_internal. So in this case we have an
InternalThread whose finalizer ran first and detach will run second.
Copy link
Member

@BrzVlad BrzVlad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

@lambdageek
Copy link
Member Author

@monojenkins backport to 2018-12

@lambdageek
Copy link
Member Author

@monojenkins backport to 2018-10

@monojenkins
Copy link
Contributor

@lambdageek backporting to 2018-10 failed, the patch results in conflicts:

Applying: [threads] Use refcounts for coordinating finalization and detaching
Using index info to reconstruct a base tree...
M	mcs/class/corlib/System.Threading/Thread.cs
M	mono/metadata/object-internals.h
M	mono/metadata/threads-types.h
M	mono/metadata/threads.c
Falling back to patching base and 3-way merge...
Auto-merging mono/metadata/threads.c
CONFLICT (content): Merge conflict in mono/metadata/threads.c
Auto-merging mono/metadata/threads-types.h
Auto-merging mono/metadata/object-internals.h
Auto-merging mcs/class/corlib/System.Threading/Thread.cs
error: Failed to merge in the changes.
Patch failed at 0001 [threads] Use refcounts for coordinating finalization and detaching

Please backport manually!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants