Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

Loading…

Work Stealing Queue Deadlock fix. #275

Merged
merged 1 commit into from

3 participants

@bholmes
Collaborator

Check that all the work stealing queues are empty before entering the wait loop.

If another thread adds a wsq job after the previous call to dequeue_or_steal
but before the InterlockedIncrement call deadlock can occur.
@bholmes bholmes threadpool.c (async_invoke_thread) :
	Check that all the work stealing queues are empty before entering the wait loop.

	If another thread adds a wsq job after the previous call to dequeue_or_steal
	but before the InterlockedIncrement call deadlock can occur.
f2f98b1
@bholmes
Collaborator

// Program to show problem

using System;
using System.Collections.Generic;
using System.Threading;

namespace ThreadPoolTest
{
    class Program
    {
        delegate void AsyncEventHandler();

        static void Main(string[] args)
        {
            AsyncEventHandler evnt = new AsyncEventHandler(thread1Function);
            WaitHandle.WaitAll(new WaitHandle[] { evnt.BeginInvoke(null, null).AsyncWaitHandle });

            Thread.Sleep(2000);

            WaitHandle.WaitAll(new WaitHandle[] { evnt.BeginInvoke(null, null).AsyncWaitHandle });
        }

        static void thread1Function()
        {
            Console.WriteLine("Starting work on Thread 1");
            AsyncEventHandler evnt = new AsyncEventHandler(thread2Function);
            WaitHandle.WaitAll(new WaitHandle[] { evnt.BeginInvoke(null, null).AsyncWaitHandle });
            Console.WriteLine("Work Complete on Thread 1");
        }

        static void thread2Function()
        {
            Console.WriteLine("Starting work on Thread 2");
            Thread.Sleep(100);
            Console.WriteLine("Work Complete on Thread 2");
        }
    }
}
@bholmes
Collaborator

Also to make the above program show the problem some modifications to metadata/threadpool.c are needed.

1) in threadpool_init hardcode tp->min_threads = 1; and tp->max_threads = 2;
2) in async_invoke_thread before InterlockedIncrement (&tp->waiting); insert a sleep of 5 seconds.

This will widen the window that allows this bug to occur.

@bholmes
Collaborator

Another way to fix this would be to change the mono_cq_count (tp->queue) == 0 check to something that includes the wsq as well.

@gonzalop gonzalop merged commit dae0e64 into mono:master
@kumpera

This is patch is wrong, you cannot mess with managed objects while mono_gc_set_skip_thread (TRUE) is in place.

Can you try something like this:

InterlockedIncrement (&tp->waiting);
dequeue_or_steal (tp, &data, wsq);
if (data) {
InterlockedDecrement (&tp->waiting); //You missed this as well
break;
}

mono_gc_set_skip_thread (TRUE); //now we're done and won't touch managed memory anymore.

@ermshiperete ermshiperete referenced this pull request from a commit in sillsdev/mono
@baulig baulig Fix #275
"net.tcp://localhost:<port>/<path>" should listen on IPAddress.Any.

Cherry-pick from c99bb5c.

Change-Id: I9ab95b30997aaf180bad0feb876aa34fdddb0265
eccef63
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Commits on Apr 18, 2012
  1. @bholmes

    threadpool.c (async_invoke_thread) :

    bholmes authored
    	Check that all the work stealing queues are empty before entering the wait loop.
    
    	If another thread adds a wsq job after the previous call to dequeue_or_steal
    	but before the InterlockedIncrement call deadlock can occur.
This page is out of date. Refresh to see the latest.
Showing with 6 additions and 0 deletions.
  1. +6 −0 mono/metadata/threadpool.c
View
6 mono/metadata/threadpool.c
@@ -1488,6 +1488,12 @@ async_invoke_thread (gpointer data)
mono_gc_set_skip_thread (TRUE);
InterlockedIncrement (&tp->waiting);
+
+ // Another thread may have added a job into its wsq since the last call to dequeue_or_steal
+ // Check all the queues again before entering the wait loop
+ dequeue_or_steal (tp, &data, wsq);
+ if (data)
+ break;
#if defined(__OpenBSD__)
while (mono_cq_count (tp->queue) == 0 && (res = mono_sem_wait (&tp->new_job, TRUE)) == -1) {// && errno == EINTR) {
#else
Something went wrong with that request. Please try again.