Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix shutdown race between mono_thread_manage and mono_thread_detach_internal. #5599

Merged

Conversation

lateralusX
Copy link
Member

There is a race during shutdown when main tread is running mono_thread_manage and attached threads are running mono_thread_detach_internal. The problem is related to the removal from the registered threads list since mono_thread_manage will pick threads to wait upon from that list. If a thread removes itself from the list using mono_thread_detach_internal it will then race against runtime shutdown
and the removal of used resources still in used by mono_thread_detach_intenral, like MonoInternalThread.

This race could trigger the assert seen in Bugzilla (also includes repro steps):

https://bugzilla.xamarin.com/show_bug.cgi?id=58770

or other undefined behavior depending on where the detached thread resume execution after main thread has shutdown the runtime, but not yet terminated the process.

The fix makes sure threads that are detaching during shutdown ends up on the joinable threads list. Threads on that list will be waited upon by the main thread during shutdown, mono_thread_cleanup. This will make sure the detached thread finishes the remaining of mono_thread_detach_internal and potential other code still using GC memory during shutdown. Threads already on the threads list will already be waited upon by mono_thread_manage during shutdown (if not removed by mono_thread_detach_internal triggering this race), so this change won't add additional threads to be waited up by the runtime. It just makes the shutdown more reliable using already available concepts (like the joinable threads list).

Since above fix uses the joinable thread implementation in threads.c, not implemented on Windows platforms, PR also adds implementation for that missing feature.

g_hash_table_insert (joinable_threads, tid, tid);
UnlockedIncrement (&joinable_thread_count);
if (!g_hash_table_lookup (joinable_threads, tid)) {
threads_add_joinable_thread_nolock (tid);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the lookup needed ?

Copy link
Member Author

@lateralusX lateralusX Sep 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Windows we need to store a handle to the thread in the joinable_threads table (as value) to make sure the tid still represents the same thread when waited upon. As long as there is an outstanding handle to a thread the tid won't be reused by the OS. This check prevents that we replace a tid handle, since the handle will then be leaked on Windows platforms. An alternative could be to implement a custom value_destroy_func on Windows to capture this case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looked a little more into this, current implementations call of g_hash_table_insert is problematic in cases where the tid is already in the table, since the joinable_thread_count will then not be correct, and there is no way to check if g_hash_table_insert added a new item or just changing previous item in the table. Doing the value_destroy_func on Windows won't solve this, and the problem with the increment of joinable_thread_count exists on other platforms. In previous usage of joinable threads this use case probably didn't happen (same tid added multiple times to table), but it can happen with this fix, so doing the check prevents duplicates, makes sure the joinable threads in the list corresponds to the joinable_thread_count. It also makes the implementation of mono_threads_add_joinable_thread more robust for future use. OK to keep this change as is?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use g_hash_table_lookup_extended so a NULL value isn't considered false.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do

@lateralusX
Copy link
Member Author

build

@lateralusX
Copy link
Member Author

Looks like just looking at the shutdown flag wont get all use case. Will update with additional condition on when to add threads to the joinable threads list.

#else
static void
threads_native_thread_join_lock (gpointer key, gpointer value)
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we use mono_native_thread_join on both platforms?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wanted to avoid additional additional openthread since we already have a valid thread handle in the table. mono_native_thread_join only accepts tid.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to a handle base join in the windows threading backend in complement to the already existing tid version.

{
HANDLE thread_handle = (HANDLE)value;
MONO_ENTER_GC_SAFE;
WaitForSingleObject (thread_handle, INFINITE);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we use mono_native_thread_join on both platforms?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above


joinable_threads_lock ();
if (!joinable_threads)
joinable_threads = g_hash_table_new (NULL, NULL);
if (g_hash_table_lookup (joinable_threads, tid)) {

value = g_hash_table_lookup (joinable_threads, tid);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use g_hash_table_lookup_extended so a NULL value wouldn't be considered false.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do

g_hash_table_insert (joinable_threads, tid, tid);
UnlockedIncrement (&joinable_thread_count);
if (!g_hash_table_lookup (joinable_threads, tid)) {
threads_add_joinable_thread_nolock (tid);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me

g_hash_table_insert (joinable_threads, tid, tid);
UnlockedIncrement (&joinable_thread_count);
if (!g_hash_table_lookup (joinable_threads, tid)) {
threads_add_joinable_thread_nolock (tid);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use g_hash_table_lookup_extended so a NULL value isn't considered false.

@luhenry
Copy link
Contributor

luhenry commented Sep 18, 2017

Also, like you pointed out earlier, this will also wait for threads which have been created outside of the runtime, and for which there is no guarantee that they will ever exit. For the example of the platform threadpool, which can be used by user code to call into managed code, there is 0 guarantee that these threads will ever exit.

@lateralusX
Copy link
Member Author

I will remodell the condition on when we add to the joinable list since shutdown flag wont capture all cases anyways. I will most likely move back to a previous solution just addin threads that we control.

@lateralusX
Copy link
Member Author

lateralusX commented Sep 19, 2017

We should now only add threads to the joinable thread list if its a thread that we know will terminate shortly after call to detach. External threads should never get onto the joinable threads list. This should not change much to current dependency on the threads list. If a attached thread is still on the threads list when thread_manage runs remove_and_abort_threads it will wait infinite for threads to finish.

@@ -739,7 +742,7 @@ mono_thread_attach_internal (MonoThread *thread, gboolean force_attach, gboolean
}

static void
mono_thread_detach_internal (MonoInternalThread *thread)
mono_thread_detach_is_joinable_internal (MonoInternalThread *thread, gboolean is_joinable)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would definitely prefer to keep the same function mono_thread_detach_internal, even if you add a parameter, to avoid having a proliferation of functions that do very little.

inline static void
mono_thread_detach_internal (MonoInternalThread *thread)
{
mono_thread_detach_is_joinable_internal (thread, FALSE);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A non-joinable thread is a thread that have been externally attached, not a thread that is externally detached. And you already have this information via MonoThreadInfo.runtime_thread which is set to TRUE iff it's been created via start_wrapper which is only accessible when creating a thread from the runtime.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I can switch to just looking at runtime_thread in condition and remove the additional parameter.

@lateralusX lateralusX force-pushed the lateralusX/fix-thread-detach-shutdown-race branch from 4b818c0 to ffa9cf0 Compare September 25, 2017 07:40
@lateralusX
Copy link
Member Author

Investigating a new scenario triggering similar assert when moving over to just looking at runtime_thread. Looks like there are runtime threads not included in the threads list during shutdown. Current solution won't add them to the joinable list, maybe they should be, but will investigate when/why this happens.

There is also a potential race between timeout of finalizer thread and joinable threads. The current implementation of mono_threads_join_threads can cause issues if finalizer thread is waiting on a joinable thread when aborted by main thread. This is probably a less likely scenario since the default finalizer timeout is 40 seconds, so won't focus on this scenario for now.

@lateralusX
Copy link
Member Author

lateralusX commented Sep 25, 2017

Found the additional race condition in this area, when a thread is detached it reset the threads background state, threads.c:773 and this will make thread_manage to not wait upon the tid, but remove it from the threads list. That will in turn cause mono_thread_detach_internal not hitting the code path adding to the joinable thread list. In order to resolve all races during shutdown between finalizing the call to mono_thread_detach_internal and shutdown of GC (and runtime) adding to joinable list needs to take place before changing the background state of the thread. NOTE, only runtime threads guaranteed to exit will be added to list.

@vargaz
Copy link
Contributor

vargaz commented Sep 25, 2017

Joinable threads are added in this code path:

@lateralusX lateralusX force-pushed the lateralusX/fix-thread-detach-shutdown-race branch from ffa9cf0 to 075b439 Compare September 25, 2017 13:08
@lateralusX
Copy link
Member Author

There are races when threads won't reach that code path before the runtime already have terminated the GC, and that will cause the assert/uncontrolled termination seen on Jenkins.

Above will make sure runtime threads gets onto the joinable thread list earlier, just before the runtime thread could vanish from thread_manage's radar and not waited upon by anyone.

@lateralusX lateralusX force-pushed the lateralusX/fix-thread-detach-shutdown-race branch from 075b439 to 517bca1 Compare September 25, 2017 15:40
@lateralusX
Copy link
Member Author

lateralusX commented Sep 25, 2017

Will investigate what's triggering the Linux failures. Will add some temporary asserts and rerun Jenkins builds.

@lateralusX
Copy link
Member Author

Believe that I found the cause of Linux failures, it boils down to the same issue as described above with race between finalizer thread and joinable threads. There is a small chance that a thread will be joined multiple times. This won't cause any problems on Windows, but on Posix's platforms this is undefined behavior.

#else
static void
threads_native_thread_join_lock (gpointer key, gpointer value)
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do the key/value parameters mean here ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Key/Value will be different on Windows where key is the thread ID and value is the OS handle to the thread (preventing thread id to be recycled). On other platforms key will be the same as value (same as previous implementation where tid was both key and value), but same function signature is used on all platforms. So for Windows key == tid and value == thread handle, on other platforms key == tid and value == tid (the same as previous implementation). I can give the parameters different names if that makes it clearer.

}
}
inline static void
threads_native_thread_join_nolock (gpointer key, gpointer value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are not perf critical, they shouldn't be inline.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

@lateralusX lateralusX force-pushed the lateralusX/fix-thread-detach-shutdown-race branch from 6466c46 to 5d6a7bf Compare September 26, 2017 09:40
Current Windows implementation doesn't have support for the joinable thread
implementation in threads.c. Joinable threads are used by the runtime during
shutdown to coordinate with exit of platform threads.

Since threads detaching and exiting during shutdown could still use or provide
important resources, like GC memory, lacking support for this feature could
potentially expose the implementation to various shutdown race conditions.

Commit also change requested permission on a thread handle when opening it for
synchronization purposes. No need to request all thread access when handle is only
used for synchronization, like in WaitForXXX API calls.
There is a race during shutdown when main tread is running mono_thread_manage and
attached threads are running mono_thread_detach_internal. The problem is related to
the removal from the registered threads list since mono_thread_manage will pick
threads to wait upon from that list. If a thread removes itself from the list
using mono_thread_detach_internal it will then race against runtime shutdown
and the removal of used resources still in used by mono_thread_detach_intenral,
like MonoInternalThread.

This race could trigger the assert seen in bugzilla:

https://bugzilla.xamarin.com/show_bug.cgi?id=58770

or other undefined behavior depending on where the detached thread resume execution
after main thread has shutdown the runtime, but not yet terminated the process.

The fix makes sure threads that are detaching during shutdown ends up on the joinable
threads list. Threads on that list will be waited upon by the main thread during shutdown,
mono_thread_cleanup. This will make sure the detached thread finishes the remaining of
mono_thread_detach_internal and potential other code still using GC memory during shutdown.
Threads already on the threads list will already be waited upon by mono_thread_manage during
shutdown (if not removed by mono_thread_detach_internal triggering this race), so this change
won't add additional threads to be waited up by the runtime. It just makes the shutdown more
reliable using already available concepts (like the joinable threads list).
@lateralusX lateralusX force-pushed the lateralusX/fix-thread-detach-shutdown-race branch from 5d6a7bf to fffe6e2 Compare September 26, 2017 09:49
…ltiple times.

Checking against joinable_threads list in mono_threads_add_joinable_thread is not enough.
There is a small chance that the same runtime thread could get on the list twice since
mono_threads_add_joinable_thread could be called multiple times for the the same tid.
This will work under Windows but is undefined behavior on POSIX's platforms.

This could happen if the finalizer thread calls mono_threads_join_threads when a thread
is detaching and have added itself to the joinable_threads, but before unregister_thread
has been called. Finalizer thread will then take tid from list and wait, this will cause
the second call to mono_threads_add_joinable_thread to not find itself in the joinable_threads
and add tid a second time. NOTE, this race only applies to runtime threads added to the
joinable_threads list.

Fix adds an additional method to add runtime threads to the list. This method will only add to
list if its indeed a runtime thread and doesn’t already have an outstanding pending native join
request. This method is called by code previously checking runtime_thread flag before adding to
joinable_threads list. This will prevent the scenario when the same MonoThreadInfo add its tid to
the joinable_threads list multiple times due to above race condition.
@lateralusX lateralusX force-pushed the lateralusX/fix-thread-detach-shutdown-race branch from fffe6e2 to 7d92655 Compare September 26, 2017 12:09
@luhenry luhenry merged commit ed1884b into mono:master Sep 26, 2017
lateralusX added a commit to lateralusX/mono that referenced this pull request Nov 28, 2017
mono#5599 fixed a race condition during shutdown
when runtime threads have come parts of their way through detach, but still
depend on runtime resources, like GC memory. The fix added runtime threads
to the joinable thread list just before they vanished from mono_thread_manage
radar making sure shutdown waited upon the thread before cleaning up.

The above fix slightly changed the behavior of the finalizer thread since it
waits on joinable threads and will now potential block on threads still
executing code (that involves runtime resources). There’s was an assumption
around the threads on the joinable thread list that they should be very close
to complete when added, so join calls coming from the finalizer thread should
almost never block and if it does, the code that remains to execute should not
involve runtime operations risking deadlock situations. Adding the thread to
the list earlier than previously done expose the shutdown to some potential
theoretical problems.

To mitigate the risk and still solve the race condition this commit adds a
mechanism to keep track of active runtime threads until they park on joinable
thread list. The pending counter will be waited upon by the shutdown
thread, just before it does its regular wait on all joinable threads
(after finalizer thread has stopped) to make sure all runtime threads have
been added to the joinable thread list before waiting upon them. Threads are
added to the joinable thread as late as possible, exactly how it’s been done
in the past by sgen_client_thread_detach_with_lock. Shutdown thread will wait
on runtime threads to appear on the list for a short time and if timeout (pending
runtime thread count not reaching 0 before timeout), it will just print a warning
and continue shutdown.

Getting into a wait state during shutdown due to runtime threads not yet added to
joinable threads list should be very rare (hitting previous race condition that was rare),
triggering the timeout should be even more rare, and if that ever happens, we are exposed
to shutdown race condition as we have had in the past, but now we at least get a warning in
the log making it simpler to analyze further.

This commit also fixes a problem with the debugger thread hitting the same race condition as above.
The shutdown thread stopping the debugger thread didn't completely wait for it to stop using runtime
resources before continue shutdown sequence. This triggers the same race condition as when shutting
down regular runtime threads. This commit makes sure stop_debugger_thread waits on the debugger thread
handle to become signaled (happens at the very end of thread lifetime) before continuing the shutdown
logic.
lateralusX added a commit to lateralusX/mono that referenced this pull request Dec 14, 2017
mono#5599 fixed a race condition during shutdown
when runtime threads have come parts of their way through detach, but still
depend on runtime resources, like GC memory. The fix added runtime threads
to the joinable thread list just before they vanished from mono_thread_manage
radar making sure shutdown waited upon the thread before cleaning up.

The above fix slightly changed the behavior of the finalizer thread since it
waits on joinable threads and will now potential block on threads still
executing code (that involves runtime resources). There’s was an assumption
around the threads on the joinable thread list that they should be very close
to complete when added, so join calls coming from the finalizer thread should
almost never block and if it does, the code that remains to execute should not
involve runtime operations risking deadlock situations. Adding the thread to
the list earlier than previously done expose the shutdown to some potential
theoretical problems.

To mitigate the risk and still solve the race condition this commit adds a
mechanism to keep track of active runtime threads until they park on joinable
thread list. The pending counter will be waited upon by the shutdown
thread, just before it does its regular wait on all joinable threads
(after finalizer thread has stopped) to make sure all runtime threads have
been added to the joinable thread list before waiting upon them. Threads are
added to the joinable thread as late as possible, exactly how it’s been done
in the past by sgen_client_thread_detach_with_lock. Shutdown thread will wait
on runtime threads to appear on the list for a short time and if timeout (pending
runtime thread count not reaching 0 before timeout), it will just print a warning
and continue shutdown.

Getting into a wait state during shutdown due to runtime threads not yet added to
joinable threads list should be very rare (hitting previous race condition that was rare),
triggering the timeout should be even more rare, and if that ever happens, we are exposed
to shutdown race condition as we have had in the past, but now we at least get a warning in
the log making it simpler to analyze further.

This commit also fixes a problem with the debugger thread hitting the same race condition as above.
The shutdown thread stopping the debugger thread didn't completely wait for it to stop using runtime
resources before continue shutdown sequence. This triggers the same race condition as when shutting
down regular runtime threads. This commit makes sure stop_debugger_thread waits on the debugger thread
handle to become signaled (happens at the very end of thread lifetime) before continuing the shutdown
logic.
UnityAlex pushed a commit to Unity-Technologies/mono that referenced this pull request Sep 24, 2020
mono#5599 fixed a race condition during shutdown
when runtime threads have come parts of their way through detach, but still
depend on runtime resources, like GC memory. The fix added runtime threads
to the joinable thread list just before they vanished from mono_thread_manage
radar making sure shutdown waited upon the thread before cleaning up.

The above fix slightly changed the behavior of the finalizer thread since it
waits on joinable threads and will now potential block on threads still
executing code (that involves runtime resources). There’s was an assumption
around the threads on the joinable thread list that they should be very close
to complete when added, so join calls coming from the finalizer thread should
almost never block and if it does, the code that remains to execute should not
involve runtime operations risking deadlock situations. Adding the thread to
the list earlier than previously done expose the shutdown to some potential
theoretical problems.

To mitigate the risk and still solve the race condition this commit adds a
mechanism to keep track of active runtime threads until they park on joinable
thread list. The pending counter will be waited upon by the shutdown
thread, just before it does its regular wait on all joinable threads
(after finalizer thread has stopped) to make sure all runtime threads have
been added to the joinable thread list before waiting upon them. Threads are
added to the joinable thread as late as possible, exactly how it’s been done
in the past by sgen_client_thread_detach_with_lock. Shutdown thread will wait
on runtime threads to appear on the list for a short time and if timeout (pending
runtime thread count not reaching 0 before timeout), it will just print a warning
and continue shutdown.

Getting into a wait state during shutdown due to runtime threads not yet added to
joinable threads list should be very rare (hitting previous race condition that was rare),
triggering the timeout should be even more rare, and if that ever happens, we are exposed
to shutdown race condition as we have had in the past, but now we at least get a warning in
the log making it simpler to analyze further.

This commit also fixes a problem with the debugger thread hitting the same race condition as above.
The shutdown thread stopping the debugger thread didn't completely wait for it to stop using runtime
resources before continue shutdown sequence. This triggers the same race condition as when shutting
down regular runtime threads. This commit makes sure stop_debugger_thread waits on the debugger thread
handle to become signaled (happens at the very end of thread lifetime) before continuing the shutdown
logic.
UnityAlex pushed a commit to Unity-Technologies/mono that referenced this pull request Sep 24, 2020
mono#5599 fixed a race condition during shutdown
when runtime threads have come parts of their way through detach, but still
depend on runtime resources, like GC memory. The fix added runtime threads
to the joinable thread list just before they vanished from mono_thread_manage
radar making sure shutdown waited upon the thread before cleaning up.

The above fix slightly changed the behavior of the finalizer thread since it
waits on joinable threads and will now potential block on threads still
executing code (that involves runtime resources). There’s was an assumption
around the threads on the joinable thread list that they should be very close
to complete when added, so join calls coming from the finalizer thread should
almost never block and if it does, the code that remains to execute should not
involve runtime operations risking deadlock situations. Adding the thread to
the list earlier than previously done expose the shutdown to some potential
theoretical problems.

To mitigate the risk and still solve the race condition this commit adds a
mechanism to keep track of active runtime threads until they park on joinable
thread list. The pending counter will be waited upon by the shutdown
thread, just before it does its regular wait on all joinable threads
(after finalizer thread has stopped) to make sure all runtime threads have
been added to the joinable thread list before waiting upon them. Threads are
added to the joinable thread as late as possible, exactly how it’s been done
in the past by sgen_client_thread_detach_with_lock. Shutdown thread will wait
on runtime threads to appear on the list for a short time and if timeout (pending
runtime thread count not reaching 0 before timeout), it will just print a warning
and continue shutdown.

Getting into a wait state during shutdown due to runtime threads not yet added to
joinable threads list should be very rare (hitting previous race condition that was rare),
triggering the timeout should be even more rare, and if that ever happens, we are exposed
to shutdown race condition as we have had in the past, but now we at least get a warning in
the log making it simpler to analyze further.

This commit also fixes a problem with the debugger thread hitting the same race condition as above.
The shutdown thread stopping the debugger thread didn't completely wait for it to stop using runtime
resources before continue shutdown sequence. This triggers the same race condition as when shutting
down regular runtime threads. This commit makes sure stop_debugger_thread waits on the debugger thread
handle to become signaled (happens at the very end of thread lifetime) before continuing the shutdown
logic.
UnityAlex pushed a commit to Unity-Technologies/mono that referenced this pull request Oct 7, 2020
mono#5599 fixed a race condition during shutdown
when runtime threads have come parts of their way through detach, but still
depend on runtime resources, like GC memory. The fix added runtime threads
to the joinable thread list just before they vanished from mono_thread_manage
radar making sure shutdown waited upon the thread before cleaning up.

The above fix slightly changed the behavior of the finalizer thread since it
waits on joinable threads and will now potential block on threads still
executing code (that involves runtime resources). There’s was an assumption
around the threads on the joinable thread list that they should be very close
to complete when added, so join calls coming from the finalizer thread should
almost never block and if it does, the code that remains to execute should not
involve runtime operations risking deadlock situations. Adding the thread to
the list earlier than previously done expose the shutdown to some potential
theoretical problems.

To mitigate the risk and still solve the race condition this commit adds a
mechanism to keep track of active runtime threads until they park on joinable
thread list. The pending counter will be waited upon by the shutdown
thread, just before it does its regular wait on all joinable threads
(after finalizer thread has stopped) to make sure all runtime threads have
been added to the joinable thread list before waiting upon them. Threads are
added to the joinable thread as late as possible, exactly how it’s been done
in the past by sgen_client_thread_detach_with_lock. Shutdown thread will wait
on runtime threads to appear on the list for a short time and if timeout (pending
runtime thread count not reaching 0 before timeout), it will just print a warning
and continue shutdown.

Getting into a wait state during shutdown due to runtime threads not yet added to
joinable threads list should be very rare (hitting previous race condition that was rare),
triggering the timeout should be even more rare, and if that ever happens, we are exposed
to shutdown race condition as we have had in the past, but now we at least get a warning in
the log making it simpler to analyze further.

This commit also fixes a problem with the debugger thread hitting the same race condition as above.
The shutdown thread stopping the debugger thread didn't completely wait for it to stop using runtime
resources before continue shutdown sequence. This triggers the same race condition as when shutting
down regular runtime threads. This commit makes sure stop_debugger_thread waits on the debugger thread
handle to become signaled (happens at the very end of thread lifetime) before continuing the shutdown
logic.
UnityAlex pushed a commit to Unity-Technologies/mono that referenced this pull request Oct 22, 2020
mono#5599 fixed a race condition during shutdown
when runtime threads have come parts of their way through detach, but still
depend on runtime resources, like GC memory. The fix added runtime threads
to the joinable thread list just before they vanished from mono_thread_manage
radar making sure shutdown waited upon the thread before cleaning up.

The above fix slightly changed the behavior of the finalizer thread since it
waits on joinable threads and will now potential block on threads still
executing code (that involves runtime resources). There’s was an assumption
around the threads on the joinable thread list that they should be very close
to complete when added, so join calls coming from the finalizer thread should
almost never block and if it does, the code that remains to execute should not
involve runtime operations risking deadlock situations. Adding the thread to
the list earlier than previously done expose the shutdown to some potential
theoretical problems.

To mitigate the risk and still solve the race condition this commit adds a
mechanism to keep track of active runtime threads until they park on joinable
thread list. The pending counter will be waited upon by the shutdown
thread, just before it does its regular wait on all joinable threads
(after finalizer thread has stopped) to make sure all runtime threads have
been added to the joinable thread list before waiting upon them. Threads are
added to the joinable thread as late as possible, exactly how it’s been done
in the past by sgen_client_thread_detach_with_lock. Shutdown thread will wait
on runtime threads to appear on the list for a short time and if timeout (pending
runtime thread count not reaching 0 before timeout), it will just print a warning
and continue shutdown.

Getting into a wait state during shutdown due to runtime threads not yet added to
joinable threads list should be very rare (hitting previous race condition that was rare),
triggering the timeout should be even more rare, and if that ever happens, we are exposed
to shutdown race condition as we have had in the past, but now we at least get a warning in
the log making it simpler to analyze further.

This commit also fixes a problem with the debugger thread hitting the same race condition as above.
The shutdown thread stopping the debugger thread didn't completely wait for it to stop using runtime
resources before continue shutdown sequence. This triggers the same race condition as when shutting
down regular runtime threads. This commit makes sure stop_debugger_thread waits on the debugger thread
handle to become signaled (happens at the very end of thread lifetime) before continuing the shutdown
logic.
rbalaji-rythmos pushed a commit to Unity-Technologies/mono that referenced this pull request Jan 27, 2021
mono#5599 fixed a race condition during shutdown
when runtime threads have come parts of their way through detach, but still
depend on runtime resources, like GC memory. The fix added runtime threads
to the joinable thread list just before they vanished from mono_thread_manage
radar making sure shutdown waited upon the thread before cleaning up.

The above fix slightly changed the behavior of the finalizer thread since it
waits on joinable threads and will now potential block on threads still
executing code (that involves runtime resources). There’s was an assumption
around the threads on the joinable thread list that they should be very close
to complete when added, so join calls coming from the finalizer thread should
almost never block and if it does, the code that remains to execute should not
involve runtime operations risking deadlock situations. Adding the thread to
the list earlier than previously done expose the shutdown to some potential
theoretical problems.

To mitigate the risk and still solve the race condition this commit adds a
mechanism to keep track of active runtime threads until they park on joinable
thread list. The pending counter will be waited upon by the shutdown
thread, just before it does its regular wait on all joinable threads
(after finalizer thread has stopped) to make sure all runtime threads have
been added to the joinable thread list before waiting upon them. Threads are
added to the joinable thread as late as possible, exactly how it’s been done
in the past by sgen_client_thread_detach_with_lock. Shutdown thread will wait
on runtime threads to appear on the list for a short time and if timeout (pending
runtime thread count not reaching 0 before timeout), it will just print a warning
and continue shutdown.

Getting into a wait state during shutdown due to runtime threads not yet added to
joinable threads list should be very rare (hitting previous race condition that was rare),
triggering the timeout should be even more rare, and if that ever happens, we are exposed
to shutdown race condition as we have had in the past, but now we at least get a warning in
the log making it simpler to analyze further.

This commit also fixes a problem with the debugger thread hitting the same race condition as above.
The shutdown thread stopping the debugger thread didn't completely wait for it to stop using runtime
resources before continue shutdown sequence. This triggers the same race condition as when shutting
down regular runtime threads. This commit makes sure stop_debugger_thread waits on the debugger thread
handle to become signaled (happens at the very end of thread lifetime) before continuing the shutdown
logic.
ashwinimurt pushed a commit to Unity-Technologies/mono that referenced this pull request Feb 4, 2021
mono#5599 fixed a race condition during shutdown
when runtime threads have come parts of their way through detach, but still
depend on runtime resources, like GC memory. The fix added runtime threads
to the joinable thread list just before they vanished from mono_thread_manage
radar making sure shutdown waited upon the thread before cleaning up.

The above fix slightly changed the behavior of the finalizer thread since it
waits on joinable threads and will now potential block on threads still
executing code (that involves runtime resources). There’s was an assumption
around the threads on the joinable thread list that they should be very close
to complete when added, so join calls coming from the finalizer thread should
almost never block and if it does, the code that remains to execute should not
involve runtime operations risking deadlock situations. Adding the thread to
the list earlier than previously done expose the shutdown to some potential
theoretical problems.

To mitigate the risk and still solve the race condition this commit adds a
mechanism to keep track of active runtime threads until they park on joinable
thread list. The pending counter will be waited upon by the shutdown
thread, just before it does its regular wait on all joinable threads
(after finalizer thread has stopped) to make sure all runtime threads have
been added to the joinable thread list before waiting upon them. Threads are
added to the joinable thread as late as possible, exactly how it’s been done
in the past by sgen_client_thread_detach_with_lock. Shutdown thread will wait
on runtime threads to appear on the list for a short time and if timeout (pending
runtime thread count not reaching 0 before timeout), it will just print a warning
and continue shutdown.

Getting into a wait state during shutdown due to runtime threads not yet added to
joinable threads list should be very rare (hitting previous race condition that was rare),
triggering the timeout should be even more rare, and if that ever happens, we are exposed
to shutdown race condition as we have had in the past, but now we at least get a warning in
the log making it simpler to analyze further.

This commit also fixes a problem with the debugger thread hitting the same race condition as above.
The shutdown thread stopping the debugger thread didn't completely wait for it to stop using runtime
resources before continue shutdown sequence. This triggers the same race condition as when shutting
down regular runtime threads. This commit makes sure stop_debugger_thread waits on the debugger thread
handle to become signaled (happens at the very end of thread lifetime) before continuing the shutdown
logic.
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
…nternal. (mono/mono#5599)

* Add support for mono_threads_add_joinable_thread and friends on Windows.

Current Windows implementation doesn't have support for the joinable thread
implementation in threads.c. Joinable threads are used by the runtime during
shutdown to coordinate with exit of platform threads.

Since threads detaching and exiting during shutdown could still use or provide
important resources, like GC memory, lacking support for this feature could
potentially expose the implementation to various shutdown race conditions.

Commit also change requested permission on a thread handle when opening it for
synchronization purposes. No need to request all thread access when handle is only
used for synchronization, like in WaitForXXX API calls.

* Fix race between mono_thread_detach_internal and mono_thread_manage.

There is a race during shutdown when main tread is running mono_thread_manage and
attached threads are running mono_thread_detach_internal. The problem is related to
the removal from the registered threads list since mono_thread_manage will pick
threads to wait upon from that list. If a thread removes itself from the list
using mono_thread_detach_internal it will then race against runtime shutdown
and the removal of used resources still in used by mono_thread_detach_intenral,
like MonoInternalThread.

This race could trigger the assert seen in bugzilla:

https://bugzilla.xamarin.com/show_bug.cgi?id=58770

or other undefined behavior depending on where the detached thread resume execution
after main thread has shutdown the runtime, but not yet terminated the process.

The fix makes sure threads that are detaching during shutdown ends up on the joinable
threads list. Threads on that list will be waited upon by the main thread during shutdown,
mono_thread_cleanup. This will make sure the detached thread finishes the remaining of
mono_thread_detach_internal and potential other code still using GC memory during shutdown.
Threads already on the threads list will already be waited upon by mono_thread_manage during
shutdown (if not removed by mono_thread_detach_internal triggering this race), so this change
won't add additional threads to be waited up by the runtime. It just makes the shutdown more
reliable using already available concepts (like the joinable threads list).

* Fix warning C4141 on Windows MSVC compiler.

* Updated with review feedback.

* Prevent race between finalizer and main thread to join same thread multiple times.

Checking against joinable_threads list in mono_threads_add_joinable_thread is not enough.
There is a small chance that the same runtime thread could get on the list twice since
mono_threads_add_joinable_thread could be called multiple times for the the same tid.
This will work under Windows but is undefined behavior on POSIX's platforms.

This could happen if the finalizer thread calls mono_threads_join_threads when a thread
is detaching and have added itself to the joinable_threads, but before unregister_thread
has been called. Finalizer thread will then take tid from list and wait, this will cause
the second call to mono_threads_add_joinable_thread to not find itself in the joinable_threads
and add tid a second time. NOTE, this race only applies to runtime threads added to the
joinable_threads list.

Fix adds an additional method to add runtime threads to the list. This method will only add to
list if its indeed a runtime thread and doesn’t already have an outstanding pending native join
request. This method is called by code previously checking runtime_thread flag before adding to
joinable_threads list. This will prevent the scenario when the same MonoThreadInfo add its tid to
the joinable_threads list multiple times due to above race condition.


Commit migrated from mono/mono@ed1884b
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
mono/mono#5599 fixed a race condition during shutdown
when runtime threads have come parts of their way through detach, but still
depend on runtime resources, like GC memory. The fix added runtime threads
to the joinable thread list just before they vanished from mono_thread_manage
radar making sure shutdown waited upon the thread before cleaning up.

The above fix slightly changed the behavior of the finalizer thread since it
waits on joinable threads and will now potential block on threads still
executing code (that involves runtime resources). There’s was an assumption
around the threads on the joinable thread list that they should be very close
to complete when added, so join calls coming from the finalizer thread should
almost never block and if it does, the code that remains to execute should not
involve runtime operations risking deadlock situations. Adding the thread to
the list earlier than previously done expose the shutdown to some potential
theoretical problems.

To mitigate the risk and still solve the race condition this commit adds a
mechanism to keep track of active runtime threads until they park on joinable
thread list. The pending counter will be waited upon by the shutdown
thread, just before it does its regular wait on all joinable threads
(after finalizer thread has stopped) to make sure all runtime threads have
been added to the joinable thread list before waiting upon them. Threads are
added to the joinable thread as late as possible, exactly how it’s been done
in the past by sgen_client_thread_detach_with_lock. Shutdown thread will wait
on runtime threads to appear on the list for a short time and if timeout (pending
runtime thread count not reaching 0 before timeout), it will just print a warning
and continue shutdown.

Getting into a wait state during shutdown due to runtime threads not yet added to
joinable threads list should be very rare (hitting previous race condition that was rare),
triggering the timeout should be even more rare, and if that ever happens, we are exposed
to shutdown race condition as we have had in the past, but now we at least get a warning in
the log making it simpler to analyze further.

This commit also fixes a problem with the debugger thread hitting the same race condition as above.
The shutdown thread stopping the debugger thread didn't completely wait for it to stop using runtime
resources before continue shutdown sequence. This triggers the same race condition as when shutting
down regular runtime threads. This commit makes sure stop_debugger_thread waits on the debugger thread
handle to become signaled (happens at the very end of thread lifetime) before continuing the shutdown
logic.


Commit migrated from mono/mono@be683c1
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants