Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transition Race Condition #6

Closed
theckman opened this issue Jun 8, 2016 · 3 comments
Closed

Transition Race Condition #6

theckman opened this issue Jun 8, 2016 · 3 comments

Comments

@theckman
Copy link

theckman commented Jun 8, 2016

Hey there,

I was taking a look at how your library was written and noticed that your transitions were not serialized using a mutex. I figured this would mean that your library would fail in cases where it shouldn't.

When I managed to reproduce it in testing, the failure was a bit more explosive than I thought it would be. When I wrote the test, I expected one of the goroutines would either: A) Fail with an InvalidTransition, B) Fail with the state being unexpected. However, both goroutines reported an InvalidTransition error, while only one reported that the state was incorrect. At first-pass, I can't explain why both goroutines ended up getting an InvalidTransition error. It would seem to confirm the race condition.

Here is the diff and then the test output. The output shows two errors getting returned from Transition, when really one should do it.

diff --git a/fsm_test.go b/fsm_test.go
index 8a9fec0..f08dfb8 100644
--- a/fsm_test.go
+++ b/fsm_test.go
@@ -1,6 +1,9 @@
 package fsm_test

 import (
+   "fmt"
+   "runtime"
+   "sync"
    "testing"
    "time"

@@ -91,6 +94,59 @@ func TestMachineTransition(t *testing.T) {
    st.Expect(t, some_thing.State, fsm.State("started"))
 }

+func TestMachineTransitionConcurrent(t *testing.T) {
+   if runtime.GOMAXPROCS(0) < 2 {
+       t.Skip("unlikely that concurrency bug would manifest on system with one CPU core; skipping")
+   }
+
+   rules := fsm.Ruleset{}
+   rules.AddTransition(fsm.T{"pending", "started"})
+   rules.AddTransition(fsm.T{"pending", "aborted"})
+   rules.AddTransition(fsm.T{"started", "finished"})
+   rules.AddTransition(fsm.T{"aborted", "pending"})
+
+   thing := Thing{State: "pending"}
+   machine := fsm.New(fsm.WithRules(rules), fsm.WithSubject(&thing))
+
+   var err error
+   wg := &sync.WaitGroup{}
+
+   wg.Add(2)
+
+   go func() {
+       err = machine.Transition("started")
+       fmt.Println(thing.State)
+       st.Expect(t, err, nil)
+       st.Expect(t, thing.State, fsm.State("started"))
+       wg.Done()
+   }()
+
+   go func() {
+       time.Sleep(time.Nanosecond * 10000)
+       err = machine.Transition("aborted")
+       st.Expect(t, err, nil)
+       st.Expect(t, thing.State, fsm.State("aborted"))
+       wg.Done()
+   }()
+
+   wg.Wait()
+}
+
 func BenchmarkRulesetParallelGuarding(b *testing.B) {
    rules := fsm.Ruleset{}
    rules.AddTransition(fsm.T{"pending", "started"})

Output:

=== RUN   TestRulesetTransitions
--- PASS: TestRulesetTransitions (0.00s)
=== RUN   TestRulesetParallelGuarding
--- PASS: TestRulesetParallelGuarding (0.00s)
=== RUN   TestMachineTransition
--- PASS: TestMachineTransition (0.00s)
=== RUN   TestMachineTransitionConcurrent
started
--- FAIL: TestMachineTransitionConcurrent (0.00s)
    st.go:41:
        fsm_test.go:127: should be ==
            have: (*errors.errorString) invalid transition
            want: (<nil>) <nil>
    st.go:41:
        fsm_test.go:128: should be ==
            have: (fsm.State) started
            want: (fsm.State) aborted
    st.go:41:
        fsm_test.go:119: should be ==
            have: (*errors.errorString) invalid transition
            want: (<nil>) <nil>
FAIL
exit status 1
FAIL    github.com/ryanfaerman/fsm  0.015s
@ryanfaerman
Copy link
Owner

ryanfaerman commented Jun 8, 2016

If we take a step back and thing what is happening here, the story as I understand it is as follows:

  1. machine is initialized as pending with valid transitions being to started or to aborted.
  2. machine is transitioned to started, valid transitions now are: finished.
  3. machine is transitioned to aborted, this is an invalid transition since the state of the machine is started.

The fact that these are running in goroutines does not make this a concurrent transition. There is only a single machine and it cannot be transitioned into an invalid state at any time. The second goroutine should have its expectations as:

st.Expect(t, err, fsm.InvalidTransition)
st.Expect(t, thing.State, fsm.State("started"))

Additionally, I was able to eliminate the strange behavior of having a valid transition seemingly return an error.

The issue is with declaring the var err error outside of the goroutines. You've introduced this variable into the scope above both goroutines, meaning that they will be sharing them. This is where the race condition actually lives. If the second goroutine manages to set the error before the first can run the expectations, you'll see this behavior.

But, by removing the err variable, I found a different issue. Now, randomly, either both sets of expectations would fail or both would pass. I don't think this is a bad thing. It means that the state machine is ensuring that the states are occurring according to their transition rule set.

Finally, I'm not really sure what you're actually trying to test here. From what I can tell, it seems that the race condition is actually in the test itself and not the state machine. Maybe if I understood more of what scenario you're trying to address, we can come up with a test case that accurately represents it.


EDIT: I've thought about it some more. I think a better test would involve two wait groups. We have an issue here in that we can't really predict when both goroutines will start. If we have each one wait until both goroutines signal they are running, and only start after that signal, we'll be able to see how the FSM really behaves.

@theckman
Copy link
Author

theckman commented Jun 8, 2016

Ugh. I'm sorry about the err variable. That's an embarrassing thing to miss in the test.

So I was looking at this block of code, and I was wondering if it's possible for two goroutines have the if statement evaluate to true, thus updating state twice:

If the consumer doesn't confirm state at the end, they could be running with the wrong assumption.

@ryanfaerman
Copy link
Owner

I was just reviewing that exact same function. I think there is a possible issue there. There is a chance that two or more goroutines could run the Permitted function before the another has a chance to update the state.

It is undefined which one would be permitted in that scenario.

I'm on the fence if it's better to simply document the possible issue here and leave it as an exercise to the consumer or to add a mutex on the Machine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants