Lockup of two SM instances #43

domokos · 2015-12-29T09:43:41Z

I spent quite some time before I figured out that FM instances are not independent: they share a common queue/event source. This causes a lockup when two independent threads (foo, bar) use two - seemingly independent - SM instances (A & B), where:

- thread foo, from the callback of SM A, wants to signal thread bar to exit and wait for it to exit
- thread bar, on receiving the signal, wants to trigger a state change in FM B and exit

In this case the SM will lock up in a deadlock.

Here's the code snippet attached to reproduce the issue - you can set WORKAROUND_ACTIVE to activate a workaround with a trade-off. You can signal TTIN to the app while being locked up to see the issue.

This may be a design limitation but then it should be noted in the - otherwise excellent - documentation. I found this out the 'hard way' by getting an unexpected and seemingly inexplicable thread lockup.

code.zip

piotrmurach · 2015-12-29T18:23:37Z

Thank you for reporting this, I will definitely take a look. I didn't intend this to be a feature; if anything, I wanted to ensure that instances of FM are threadsafe but independent of each other. You should be able to create a system of state machines that cooperate to solve a task. This deadlock is a bug, not a feature. Let's fix it!

piotrmurach · 2015-12-30T13:10:08Z

@domokos Thank you for the code, it was super helpful in zeroing in on the problem! I have added a simpler version of your code as an integration test case to keep this bug at bay in the future.

Turns out the issue had to do with how state machines trigger and emit callback events, more precisely, how threads acquire an exclusive lock to be able to issue and observe events. My theory is that the mutex had become a global mutex for the whole state machine: once a thread triggered an event, it also held the lock the observer needs to emit callbacks, which locked out any other thread from responding to events from any callbacks. That would explain why triggering an event from a different thread helped: a new mutex was created, so a new lock could be acquired. Long story short, this should be fixed now, as the observer and event triggering should have separate locks. I will let you know once it is released; everything is in master if you want to check it out.
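For readers who want to see the failure mode in isolation, here is a minimal plain-Ruby sketch, assuming a lock that is held while callbacks run and is shared between instances. `ToyMachine`, `SHARED_LOCK` and `locks_up?` are hypothetical names made up for illustration; this is not FiniteMachine's code, not the attached reproduction, and not the actual fix. It only contrasts a shared lock with a per-instance lock.

```ruby
# Hypothetical stand-in for a state machine; this is NOT FiniteMachine's code.
class ToyMachine
  # One lock shared by every instance created with shared: true.
  SHARED_LOCK = Mutex.new

  def initialize(shared:)
    @lock = shared ? SHARED_LOCK : Mutex.new
    @callback = nil
  end

  def on_trigger(&block)
    @callback = block
  end

  # Holds the lock for the whole trigger, including the callback.
  def trigger
    @lock.synchronize { @callback&.call }
  end
end

# Returns true if thread bar could not finish while A's callback was waiting.
def locks_up?(shared:, timeout: 0.5)
  a = ToyMachine.new(shared: shared)
  b = ToyMachine.new(shared: shared)
  signal = Queue.new

  bar = Thread.new do
    signal.pop   # wait for foo's exit signal
    b.trigger    # state change on B before exiting
  end

  stuck = false
  a.on_trigger do
    signal << :exit
    # A real wait would be unbounded; the timeout only keeps the demo finite.
    stuck = bar.join(timeout).nil?
  end

  a.trigger  # the calling thread plays the role of foo
  bar.join   # the lock is free again by now, so bar can finish
  stuck
end

puts locks_up?(shared: true)
puts locks_up?(shared: false)
```

With the shared lock, bar blocks inside `b.trigger` until A's callback gives up waiting, so the first call should report a lockup. With one lock per instance, bar finishes immediately and the second call should not.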

piotrmurach · 2015-12-30T15:58:52Z

Released v0.11.2 that includes the fix, would you mind trying it out?

domokos · 2015-12-30T19:26:22Z

@peter-murach Thank you for resolving this so fast.

I installed version 0.11.2 and can confirm that this version does not have this bug any more. It works as expected.

This issue can be closed.

piotrmurach added a commit that referenced this issue Dec 30, 2015

Fix bug with two FSM machines locking up on callbacks triggers due to…

9822de1

… race condition and close issue #43

domokos closed this as completed Dec 30, 2015

craiglittle mentioned this issue Jan 4, 2016

Halt all callbacks when transition cancelled #41

Merged
