proposal: deterministic multithreading #21916

glycerine · 2017-09-18T03:07:34Z

Various dissertations have proposed scheduling algorithms that provide deterministic multithreading. For example, this one claims it gives up only 16% on a four core benchmark. In exchange it provides deterministic execution. This could provide drastically faster time to program delivery in many cases. Further, it would dramatically help the reproduction of concurrency bugs if a "trace mode" could then be built that would trace a non-deterministic execution for later reproduction under deterministic scheduling.

http://groups.csail.mit.edu/commit/papers/09/asplos073-olszewski.pdf

The proposal would be to add a deterministic interleaving mode to the goroutine scheduler.

rsc · 2017-09-25T21:10:53Z

This is a fascinating idea but ultimately not something that's ready yet for the proposal process. There is likely many person-years of work remaining to integrate something like this into Go.

One problem that applies to most of these systems is that deterministic execution, or following-a-trace deterministic execution, is only useful if you have exactly the same program. As soon as you attempt a bug fix, the program has changed and may (deterministically) go down a completely different path. Worse, it will do so repeatedly, so you can't even run it multiple times to check whether the bug is really fixed. Most programs run different code paths on different inputs, so even needing to try a different input might trigger this loss of repeatability, despite the "determinism".

(This observation is not mine; it was first pointed out to me by @aclements, although maybe I mangled it in the retelling above.)

In addition to that problem, which I think is the most significant one, these systems also usually fail to cope with non-determinism from timing of network inputs or I/O, or from explicit randomization of address space operations in the kernel (which in turn affects code paths in address-value-sensitive code like the garbage collector).

None of this is to say that it's not an interesting idea that we'd love to see someone run with, but it would be a major, speculative undertaking. It's not something we on the Go team are likely to attempt ourselves without a clearer demonstration of applicability to Go, but if someone else wants to explore further how to productionize something like this into Go, the problems you encounter and their solutions would probably make a great paper or maybe the start of a great PhD thesis.

LarryRuane · 2018-02-07T01:08:51Z

This is more difficult than just implementing deterministic scheduling -- all sources of nondeterminism must be eliminated. I worked on a complex distributed block-level storage product (written in C) that supported this kind of pseudo-random testing. We had to simulate (mock) all interactions with the outside world, including networking, file access and, most importantly, time itself. The network and files were simulated using memory buffers. The entire simulated cluster of nodes ran within a single executable (one address space, one scheduler), which was single-threaded (we didn't use threads since we had no deterministic thread scheduler, among other reasons). Each fake network message and file IO access would appear to take a (pseudo) random variable amount of time. The scheduler (which was specific to this test mode) would use a priority queue to determine at what time the next event (network message arrival, whatever) occurs, and then simply advance the simulated clock to that time. So simulation time usually moved faster than real time!

Every failure was guaranteed to be reproducible. It was amazing. We ran this test (with different random number generator starting seeds) on racks of systems 24 hours a day. It found many unbelievably subtle timing-window bugs.

One enormous benefit of doing all this is that, when a failure is found, you can enable (or add) arbitrary amounts of tracing then re-run that seed. You can also add massive amounts of checking / audit code and possibly catch the bug earlier. These changes have no effect on simulated time. (I'm sure you've all seen the Heisenberg Uncertainty Principle at work when you try to reproduce bugs on a real system with tracing enabled.)

Your comment, rsc, about not knowing if you've fixed the bug is very astute. If necessary to get around this problem, I would actually "time-bomb" the fix, such that it would only kick in just at the (simulated) moment the wrong decision was made (I knew when that was from tracing output), so the run would reproduce to that exact state, and then I could see it continue from that point okay (or not okay if the fix didn't work or had some bad side-effect).

Since becoming interested in Go, I've thought about this very idea, but I agree that it's a huge project, plus it takes significant forethought when designing the code. It would be great, though.

glycerine · 2018-02-08T14:13:08Z

@LarryRuane I've implemented an interactive Go REPL (https://github.com/gijit/gi) based on a single-threaded just-in-time trace compiler, LuaJIT. gijit currently supports everything in Go except goroutines/select/channels. I'll be adding those next, and may well opt for a deterministic scheduler. You can contribute (your input is most welcome) or simply follow progress on issue gijit/gi#2

gopherbot added this to the Proposal milestone Sep 18, 2017

gopherbot added the Proposal label Sep 18, 2017

rsc closed this as completed Sep 25, 2017

golang locked and limited conversation to collaborators Feb 8, 2019

gopherbot added the FrozenDueToAge label Feb 8, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

proposal: deterministic multithreading #21916

proposal: deterministic multithreading #21916

glycerine commented Sep 18, 2017

rsc commented Sep 25, 2017

LarryRuane commented Feb 7, 2018

glycerine commented Feb 8, 2018

proposal: deterministic multithreading #21916

proposal: deterministic multithreading #21916

Comments

glycerine commented Sep 18, 2017

rsc commented Sep 25, 2017

LarryRuane commented Feb 7, 2018

glycerine commented Feb 8, 2018