New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Raimo: New gen State Machine: OTP-13065 #960

Merged
merged 52 commits into from Apr 25, 2016

Conversation

Projects
None yet
@RaimoNiskanen
Contributor

RaimoNiskanen commented Feb 9, 2016

gen_statem is a proposed new state machine engine in the gen_* framework that is intended to supersede gen_fsm since many find it hardly usable.

Major benefits over gen_fsm:

  • All events are handled in the same place in the implementation for a given state
  • Selective receive is mimicked by an internal queue that allows retrying an event in a later state
  • An infinite state space is possible

I have created a pull request of it to get external feedback. We have started to convert some state machines here at OTP to use gen_statem and are so far fairly happy with it.

Build and read the documentation. Please write some state machines.

/ Raimo Niskanen

@RaimoNiskanen

This comment has been minimized.

Contributor

RaimoNiskanen commented Feb 9, 2016

Since building the documentation is easier said than done, here is at least something you can read right now: a PDF snapshot of stdlib; look in the reference manual at gen_statem

EDIT: I have removed the PDF documentation in favour of the HTML documentation right below.

</seealso> will be called.
If you use <c>handle_event</c> as a
<seealso marker="#type-state">state</seealso> and later
decides to use non-atom states you will then have to

This comment has been minimized.

@legoscia

legoscia Feb 9, 2016

Contributor

s/decides/decide/

decides to use non-atom states you will then have to
rewrite your code to stop using that state.
</p>
<p>When the using an atom-only

This comment has been minimized.

@legoscia

legoscia Feb 9, 2016

Contributor

s/When the using/When using/

@RaimoNiskanen

This comment has been minimized.

Contributor

RaimoNiskanen commented Feb 9, 2016

@RaimoNiskanen

This comment has been minimized.

Contributor

RaimoNiskanen commented Feb 9, 2016

I made that change in 2015, while lib/stdlib/doc/src/Makefile was changed in 2016

@RaimoNiskanen

This comment has been minimized.

Contributor

RaimoNiskanen commented Feb 9, 2016

@legoscia: I have made the grammar corrections and await maybe more changes before pushing

@ferd

This comment has been minimized.

Contributor

ferd commented Feb 9, 2016

Okay, I haven't taken the most time thinking this over, this is after a first impression.

This is not simpler than gen_fsm. It's just tricky and complicated in a different way.

the assumption that things are simpler than gen_fsm

what is the complexity in gen_fsm, and does this address it?

So let me first state my own assumptions here.

I think the problem with gen_fsm may not be that you have to call StateName/3 and StateName/4 for sync and async events, but also that you have to add to that handle_info/3, handle_event/3, handle_sync_event/3, and that it is not obvious when I would want to use the handle_* forms rather than the StateName forms (i.e. the [sync_]send_all_state vs. [sync_]send_state stuff).

In comparison, there's this gen_server thing sitting right next to it, and I can define a record like state#{name=StateName} and poof I have just created a FSM using all the mechanisms I already know and love, because everyone learns gen_server first.

I think that's the true reason gen_fsm is not currently seeing broad usage. It has little to do with the fact you lose selective receives (in which case, something like plain_fsm could be appropriate) or other problems, just usability.

My assertion in this post is that gen_statem won't fix any of that and will not see any broader usage because it is still weirdly complex, although in entirely different ways.

Benefits over gen_fsm

First of all, the first point:

  • All events are handled in the same place in the implementation for a given state

This is true for a given state, because there is only one callback per function, but I do not think the new behaviour is any simpler.

You have the possibility of using both CallbackMod:State/5 and CallbackMod:handle_event/5 since you allow non-atom names, and doing so changes the behaviour. Even worse, if I apparently name my state 'handle_event' I create conflicts in how things may be handled; this is documented, but the consequences of doing so are not.

The naming of the old gen_fsm is bad and confusing, and that is fixed in gen_statem by going with call and cast.

But I think we're going to add confusion by adding message types in there, more on this later.

  • Selective receive is mimicked by an internal queue that allows retrying an event in a later state

That one is hard to reason about, possibly because of documentation. I can't parse sentences like:

When the state machine changes states all enqueued events becomes not yet processed to be processed before the old not yet processed events.

But let's look at the described behaviour:

  • If the option retry is true the current event is enqueued as postponed to be retried.

Is it retried last after all the currently waiting events? If you're simulating a mailbox I expect the latter, but if you 'enqueue' it I expect the former. This is unclear.

  • If the state changes all postponed events are transferred to not yet processed to be processed before other not yet processed events.

I don't understand this at all.

  • All operations are processed in order of appearance.

Are they? You just told me you can enqueue stuff and move it around, and that state changes do something to it.

  • The timeout option is processed if present. So a state timer may be started or a timeout zero event may be inserted as if just received.

Ok so we can definitely break the order of events then?

  • The (possibly new) state function is called with the next not yet processed event if there is any, otherwise the gen_statem goes into receive or hibernation (if the option hibernate is true) to wait for the next message. In hibernation the next non-system event awakens the gen_statem.

Ok so I guess 'not yet processed' is a list of events to be seen next? So the idea is that there's a list of messages to process/reprocess. As long as the list is enqueued, I do not trigger receives, meaning higher priority interrupts cannot be received, right? I can only reorder messages I currently have in my state.

If so this means I don't yet have proper selective receives, since I cannot match for high priority events unless I have handled all of those I currently have to deal with.

Now let's look at that type signature and the doc that says:

state_op() = state_option() | state_operation()

Either a state_option() of which the first occurence in the containing list takes precedence, or a state_operation() that are performed in order of the containing list.

state_option() can be retry, {retry, boolean()}, hibernate, {hibernate, boolean()}, {timeout, timeout(), term()}.
state_operation() can be {reply, Client, Term}, {stop, Reason}, {insert_event, Type, Term}, {remove_event, Type, Term}, {remove_event, Predicate}, {cancel_timer, TRef}, {demonitor, Ref}, {unlink, PidOrPort}.

Things I note from the doc:

  • the order of a retry is undefined
  • insert_event shoves the event to be the next one processed (screw that rule about 'All operations are processed in order of appearance'). But if I don't trigger a receive without having emptied the list of events to handle, how is this different from just calling StateName(NextType, NextEvent, ...) and so on?
  • remove_event operates on the oldest queued event. Is this relative to the order of the list or the order of arrival? If I insert and event and then remove it, am I really removing the oldest one, or is this built on the (now-faulty) assumption that event order is maintained?
  • Why do we special case cancelling timers, demonitoring (which by the way no longer has the 'flush' option in this implementation), or unlinking? I don't understand why this needs to be there, especially since it is far less flexible than mechanisms everyone already knows (function calls with more flexible parameters). I guess flushing would be broken anyway since the messages can now be enqueued in the process, right? So I'd need to both call erlang:demonitor(Ref, [flush]) and return {remove_event, fun(_, {'DOWN', R, _, _, _}) when R =:= Ref -> true; (_,_) -> false end} as an operation for things to work? I guess if flush is now the default, I lose the ability not to flush?
  • An infinite state space is possible

Okay. I don't have much to say here.

splitting up events into types and messages (why dismissing tagged tuples?)

-type event_type() :: {call, Client :: client()} | cast | info | timeout | internal.

Boy am I not a fan of this. Rather than reusing mechanisms everyone knows already (handle_call, handle_cast, handle_info) and possibly just passing a state name around, we now have turned the structure upside-down and carry some message type around so everything gets shoved into a single function.

Functionally they're equivalent, but now OTP has this big piece of design missing, where knowledge about one component cannot be carried to another one. Every new piece is a learning experience with vastly different implementation mechanisms. If gen_fsm was too complex for someone coming off gen_servers, I can't imaginge gen_statem is going to be any simpler.

It also creates entirely distinct interfaces to interact within the FSM. Now instead of going gen_statem:cast(...) internally, I can send myself an internal cast, but only by doing {insert_event, {call, SomeClient}, ...} except that nope, in this case, this event will be the next first thing to be processed so they're not exactly equivalent! But it does look simple enough, right?

gen_statem:cast -> {insert_event, cast, Term}               % ok this makes sense
Pid ! Message   -> {insert_event, info, Term}               % yeah that matches gen_server
gen_statem:call -> {insert_event, {call, Client}, Term}     % uh okay, I guess I am meant to reuse a client?
gen_statem:?    -> {insert_event, internal, Term}           % alright? I guess this keeps me from having to tag my tuples?
gen_statem:?    -> {insert_event, timeout, Term}            % what?

There is seemingly little cohesion between these. I can appreciate the 'internal' stuff being used for generated events, but why wouldn't we just keep using tagged tuples the way we were? Plus given the names, I'd expect to have handle_cast, handle_call, and handle_info at the very least, because that is what gen_server, gen_fsm and gen_event all do!

weird manual management of the mailbox

what's the rationale for retrying on all state changes?

If the gen_statem State is an atom(), the state function is Module:State/5. If it is any other term() the state function is Module:handle_event/5 . After a state change (NewState =/= State) all postponed events are retried.

The implicit assumption I get here is that if I retried an event, it's not in the 'not seen yet' list or whatever its name is, it is in a secondary one going to wait until a different state name is hit. When the events are retried, how does this work?

  • Are they retried before the other enqueued events?
  • Are they retried after the other enqueued events?
    • in this case, if I have retried events in state S-2, and then retried it again in S-1, but after I have also retried regular events in S-1, is the old order of retries maintained?
  • Are they all retried before the state of the FSM is allowed to change, or are some retried if the state does not change?
  • If only some of them are retried, then I have these fun questions for you:
    • Assume I have states a and b. While in a I received two events, E1 and E2, which I have deferred to retry for later. As a transitions to b, I handle an event that changes my state back to a. Are E1 and E2 going to be retried since I did have a transition a -> b despite currently being in a, the same state I was in when I first scheduled a retry?
    • Same question, but now assume that E1 is run in b and it does trigger a as a new state. Is E2 going to run when transitioning from b -> a despite having been retried in a?
    • Let's now imagine I have states a, b, and c. If I retry E1 in a, then retry it again in b, can it only ever run while in c? If I then retry it in c again, am I stuck with unhandlable events?

With the last 3 questions, I am not sure what the point is with the retry queue: either it does not track transitions properly and defeats its own purposes compared to the regular eveent queue, or it has to track ever-increasing states with a risk of leaking data by having unhandled events forever.

In a nutshell

I'm not sure this will do anything to help adoption of state machines in Erlang. I think the problem with gen_fsm is one of usability first and capability second (and this is why people use a less-adequate gen_server in lieu of gen_fsm), and in this case, the solution proposed fixes some of the usability (the external interface is good!), but not enough (the internal interface is not, and breaks all conceptual affinity with the rest of OTP). To me this means that regardless of technical capabilities, this solution won't be able to supercede gen_fsm, or if it does, it won't solve the problem stated:

gen_statem is a proposed new state machine engine in the gen_* framework that is intended to supersede gen_fsm since many find it hardly usable.

Unless the intended meaning was that gen_statem supercedes gen_fsm as a hardly usable gen_* framework state machine engine ;)

I want to be constructive

This is getting long, but I don't like being so damn negative in a review. You're always free to disregard me, but if what I wanted to do was help usage of gen_fsm's mechanisms, I would:

  • clarify state vs. statename vs. data vs. statedata. This is confusing, not made better currently.
  • keep using concepts that are in line with what gen_server and gen_event do
  • keep the new simpler interface for callers, this one is good

What would work well? Give users a gen_server except it has an additional argument: StateName. Rename current State to Data, possibly.

init(Args) -> {ok, StateName, Data} | ...

handle_call(Msg, From, StateName, Data) -> {reply, Reply, StateName, Data} | ...

handle_cast(Msg, StateName, Data) -> {noreply, StateName, Data} | ...

handle_info(Msg, StateName, Data) -> {noreply, StateName, Data} | ...

...

If I want that big gen_fsm:send_all_sync_event, I can encode that handling by going handle_call(Event, _From, AnyState, Data) -> ..., for example. Let me use what the language gave me. There is no question to be asked on whether the state name needs to be an atom or not, it just is a term. Tracing still works without more complexity nor loss of generality over the current or proposed implementations.

Just doing that, I think, would make you have many users of your gen_statem instantly. I can take the intuitive knowledge I have about gen_server, and use it directly in this new gen_statem. There is no conceptual overhead, the transition is done easily, and I have the convenience of matching on state names explicitly. It is mundane, predictable, relatively flexible, and fixes usability issues. Problem solved?

What about selective receives?

I'd punt on figuring out selective receives for this one. I think in the end OTP is pretty damn incompatible with them. That's a shame, really, and it makes a lot of self-driving FSMs hard when you want to manually trigger events while handling regular interrupts. Sadly the current proposed implementation here still does not allow for that to happen, since it appears the mailbox is accessed only when events are no longer waiting in the queue. The interface as described here is also confusing. If the point is to offer the flexibility of selective receives in a state machine, I believe a far more different approach is required, and one that does maintain mailbox semantics should be used instead of what is being proposed here.

@OTP-Maintainer

This comment has been minimized.

OTP-Maintainer commented Feb 9, 2016

Patch has passed first testings and has been assigned to be reviewed


I am a script, I am not human


1 similar comment
@OTP-Maintainer

This comment has been minimized.

OTP-Maintainer commented Feb 10, 2016

Patch has passed first testings and has been assigned to be reviewed


I am a script, I am not human


@lehoff

This comment has been minimized.

lehoff commented Feb 10, 2016

I agree with Fred - in particular his suggestion on how to have three handle_* functions for the three types of events. That would greatly simplify understanding what is going on and what the sender of a message is expecting. Separating the call type into another argument to the mother of all handle functions is not easy to work with.

I would welcome a removal of functions with the state name as the function name. In the gen_fsm that is part of what scatters the logic around way too much.

Having the state be any term would allow for handling sub-states in a much more elegant way - I have had great pain with a gen_fsm where I had to add a field in the state record to track sub-states of just one state in the FSM. That kind of clutter can be removed by following Fred's suggestion.

My personal take on the history of gen_fsm is that the evolution from having states implemented as functions (soooo nice the first time you run across it) has leaked too much into the practicalities of how to implement things inside gen_fsm. No need to repeat that history with gen_statem.

I also agree with Fred that selective receive has a very poor fit with OTP.
We live without selective receive in gen_fsm today so I think the best thing to do is to implement gen_statem without the support for selective receives. That will make gen_statem a nice replacement for gen_fsm and address the lack of usage of gen_fsm. This will better the code quality instead of having a lot of my_fsm things based on gen_server out there.

A FSM that has selective receive should respect the mailbox semantics - again I agree with Fred - and I think the best way to address this is by solving that problem on its own.

@RaimoNiskanen

This comment has been minimized.

Contributor

RaimoNiskanen commented Feb 10, 2016

@ferd: Thank you for reading the documentation and for your feedback. It is unfortunate that you and gen_statem got a bad start because I think most things are really as you want them to be.

We have a saying here in Sweden; You have not understood the beauty of the crows' song.

Now to understand the beauty of the crows' song it helps to understand what the crows are singing, and it seems I have failed in communicating how the event queue in gen_statem works; it is not that complicated but of course there are subtleties.

It also helps to understand why the crows are singing as they do that is motivating why I did choose this implementation. But that I do not think there should be much motivation in the documentation since motivation only is interesting for those with prior knowledge of gen_fsm and the discussions that resulted in e.g plain_fsm.

Design goals

I saw Ulf Wiger's talk on plain_fsm and thought he was so right in that the simple state machine code examples that use plain Erlang selective receive are beautiful. But I thought that using a parse transform or having to explicitly deal with sys messages was not that pretty.

So these were the primary goals:

  • Feel like a plain Erlang selective receive state machine
  • Not use parse transforms nor explicit receive of sys messages
  • Fit into the gen_* framework

Design implications

What is the State?

A plain Erlang selective receive state machine basically looks like this:

state1() ->
  receive
    event1 ->
      state2()
  end.

state2() ->
  receive
    event2 ->
      state1()
  end.

The State in this case is a function name i.e an atom(). I think this was one of few good things with gen_fsm that gives a benefit over gen_server in that the function name is the same as the state name. If you do not like this idea you can use gen_server instead.

There are cases when a more complex state than an atom() would be useful, e.g state machines within state machines and by deciding that the function handle_event will be used when the State is not an atom() it becomes possible to have an infinite or hierarchical state space.

Note that it is entirely up to the state machine implementation to use handle_event as a state name or as a state handling function or both. You get enough rope.

Signature of the state handling functions

It was intentional to give the state handling functions an unusual arity i.e 5:

  State(EventType, EventContent, PrevState, State, StateData)
  handle_event(EventType, EventContent, PrevState, State, StateData)

Partly because handle_event/5 for a non-atom State would need an argument State. Partly because the argument PrevState might have use cases but mostly to make the state handling functions the only that have to be arity 5. This avoids reserving any state name. Except possibly for handle_event which anyway is a bad state name, and which you may actually use as a state name if you like as long as you stick to State atoms. You get enough rope.

Another implication of the design goal "Feel like a plain Erlang selective receive state machine" is that all events should be handled in the same state handling function so there can be no such thing as handle_sync_event/4 or differing arities used depending on if the event was a call or a cast. A gen_server does not handle all events in the same function and I think that is bad for a state machine implementation and breaks this design goal.

State handling function's return value

As you see i have ditched the {reply,...} and {noreply,...} convention from gen_server. This is because you might want to combine {reply,Client,Reply} with {timeout,T}, retry, {insert_event,internal,Event} and other operations and options, so you return a list of options and operations. Furthermore the gen_server {reply,...} is flawed because it can only reply to a call in the current state since the Client is implicit. In gen_statem you can return several {reply,...} operations if necessary e.g if several clients should be answered at once, and this can be done in a later state than the one(s) that got the call(s).

EventType and EventContent in a tuple or separated

I chose to have them separated which increases state functions arity to 5. The callback function code_change/4 is arity 4 in all other gen_* modules and I thought it to be more ugly to change that than to consistently separate EventType and EventContent throughout gen_statem.

In gen_server the EventType is embedded in the callback function name e.g handle_call vs. handle_cast so that is an even bigger separation.

Event queue semantics

The event queue is intended to work like the ordinary process message queue or as an extension of it, and that apparently did not get through the documentation.

The order of received events is preserved. I could nitpick and claim that an inserted event is not a received event, or just say that inserted events break that ordering. It is more useful to get an inserted event immediately. Explanation follows:

Do not call your own state functions

In gen_fsm you often had to call your own state handling functions to ensure that an "inserted" event gets processed immediately. This leads to confusion of which state you actually are in, and you are supposed to use gen_statem's {insert_event,EventType,EventContent} instead of calling your own state functions.

There are subtle differences between calling your own state function and inserting an event that can be boiled down to do not call your own state functions. Calling a common event handling function from several state functions is just fine, though.

state1(cast, event1, _, _, _) ->
  [retry];
state1(cast, event2, PrevState, State, StateData) ->
  state2(internal, event3, PrevState, State, StateData).

state2(cast, event1, _, _, _) ->
  do_something(event1),
  {};
state2(internal, event3, _, _, _) ->
  do_something(event3),
  {}.

Here event1 is supposed to be handled in state2 but since you call state2 from state1 the state machine engine will not see the state change and not retry event1 and you will stay in state1 instead of transiting to state2.

Event ordering

Events are presented to the state machine implementation in the order received except for inserted events that are presented immediately.

If you postpone an event it is not presented again in the current state. If you consume an event (by not postponing it) it is removed from the event queue. If gen_statem runs out of events in the event queue a new one is received. If the state changes all events are retried in the order received starting with the oldest.

This is just like the normal process message queue: If a message does not match any receive pattern (the state function postpones it) it is not tried again in this receive statement. If a message matches a receive pattern (the state function consumes it) it is removed from the process message queue. If the code enters a new receive statement (the state changes) all messages are retried in the order received starting with the oldest.

The special treatment of {timeout,0} is just an optimization. There is no point in starting a timer that will fire immediately so why not pretend that the timeout message got received just now?

How this violates "handling interrupts" as you put it I can not see. The state handling function has to first process any queued events before a new one can be received, but this is just a loop over the old events, and then the "interrupt" event will be received. This is essential to preserve message ordering.

{cancel_timer,Ref} and the others

Flushing the stray message is the reason for these operations so therefore there is no flush option. As you guessed e.g a timeout message might already be in the event queue. Therefore you will have to be able to remove events from the event queue, so {remove_event,EventPredicate} is the essential operation, but getting a combination of erlang:cancel_timer/1 and {remove_event,EventPredicate} right is not that simple:

state1(cast, event1, _, State, TimerRef) ->
  case erlang:cancel_timer(TimerRef) of
    TimeLeft when is_integer(TimeLeft) ->
      {State,undefined};
    false ->
      receive
        {timeout,TimerRef,_} ->
          {State,undefined};
      after 0 ->
          {State,undefined,
           [{remove_event,
             fun (info, {timeout,TRef,_})
                    when TRef =:= TimerRef ->
                       true;
                   (_, _) ->
                       false
             end}]}
      end
  end.

Compared to:

state1(cast, event1, _, State, TimerRef) ->
  {State,undefined,[{cancel_timer,TimerRef}]}.

The same goes for {demonitor,Ref} and {unlink,Pid}.

Documentation improvements

I sense that the documentation needs improvement, so any suggestions on how to do that is greatly appreciated.

@RaimoNiskanen

This comment has been minimized.

Contributor

RaimoNiskanen commented Feb 10, 2016

@lehoff

I disagree with you and @ferd regarding having three handle_* functions for the three types of events. Collecting all events in the state handling functions greatly simplify understanding what is going on since you can compare your code to your state diagram and see all events that cause transitions. Having to separate this logic into three different functions is not easy to work with.

I think the parts of gen_fsm that scatter the logic is the functions that are not the state name functions. If you do not like state name functions you can use gen_server.

Allowing the state to be any term to allow for handling sub-states I think is solved nicely by letting the handle_event/5 function in a gen_statem callback module handle that.

My personal opinion is that having states implemented as functions still is soooo nice, and if you do not think that you can use gen_server.

When you think that selective receive fits poorly with the gen_* framework you are right, but the possibility to postpone an event and handle it in a later state is addictive. It is something very much missing in gen_fsm and practically everyone writing state machines with gen_server or gen_fsm has to re-implement event queueing in some form. Having this implemented in gen_statem will improve the code quality since the queueing and its semantics is already well tested.

gen_statem preserves the mailbox semantics as far as possible given that the gen_* model dictates that you have to receive all messages immediately, and solving this problem in an nice way in gen_statem will benefit all writing state machines.

@essen

This comment has been minimized.

Contributor

essen commented Feb 10, 2016

I think this is too complex. People don't use gen_fsm because of complexity, and people will not use gen_statem for the same reasons. It does have some good ideas, they're just not taken far enough and/or are mixed with other concepts.

The complexity in gen_statem mainly comes from:

  • Wanting a non-finite state machine
  • Wanting selective receives
  • Wanting to handle timers/demonitors/unlinks (this does not include the inactivity timeout which is understandable)

The most interesting change is having a single function per state. I think this is a great usability improvement. I think everyone can agree with this. Another interesting change is being able to return a list of commands.

I think you should drop handle_event: it conflicts with StateName. Apparently you're a fan of ropes, but most people are a fan of hanging themselves so that wouldn't work well. If you drop handle_event, you can also drop an argument from the StateName function:

StateName(EventType, EventContent, PrevState, StateData)

This function doesn't need any catch-all (the send_all_state equivalent) because we can easily write a catch-all as needed.

This function doesn't need to be able to return timers because handling timers is really trivial. I'm not sure who writes huge blocks of code to handle timers like you did earlier, most people just send_after/start_timer, cancel_timer and ignore any unexpected timeout messages. Not much point wasting resources going through a possibly huge mailbox just to remove a timeout message that you can just ignore. Same goes for demonitors and unlinks.

This function doesn't need to deal with inserting events or anything. It can just call the related function. You can do that because this is a finite state machine and there's no handle_event, so it's always the same call. Note that people will do this regardless of you having return values for inserting events, because they won't even suspect calling a function is bad (nor should it be).

I don't see a use for retry. I know selective receives are great, but part of what makes them great is that they're efficient, and I can't possibly see this as being efficient, since you'll have to walk through the list of messages every time, unlike with selective receives. Oddly enough I also don't see a use case for selective receives with state machines. I'd like to hear a concrete real-world example.

This function is now clear (all code for a state is in the right place) and in line with the rest of the language. We restored the ability to call functions and handle messages in a single way, as opposed to your proposed solution where messages can be handled in StateName function, in handle_event or through a returned command. We can also easily use language constructs for a catch-all "send_all_state" should we require it.

If you want people to use something other than gen_server, it needs to be an interface that fits with the rest of the language and libraries, not something complex like gen_fsm or gen_statem. I believe this is a step in the right direction, but that you tried to run too fast and lost your ways.

@zxq9

This comment has been minimized.

zxq9 commented Feb 10, 2016

Complexity killed the gen_fsm -- I think nobody is disputing that. Following with @essen's message, I think reducing complexity is the way to go, and here I see a ton of things I've never really cared much for in state machines. Setting the bar lower in terms of featureness may be beneficial -- even if that means breaking this into more than a single behavior.

Reading through this I keep thinking:

  • I can't remember a time I actually wrote a state machine that was intended to handle synchronous messages in some states and asynchronous messages in others (system messages aside).
  • I can't remember a time when infinite states was something I wanted to create in a state machine (but perhaps I lack sufficient intuition to distinguish between a "state machine" abstraction and a more general "object/process/thingy" abstraction).
  • I can't remember a time when I actually wanted selective receive in a state machine (and didn't find having a buddy process that ran interference for the fsm much easier to deal with and understand instead).
  • I can't remember a time when I wanted something other than an atom to represent a state (but perhaps I lack creativity).

I would experiment with dropping the things @essen mentions dropping (selective receive, infinite states, bookkeeping, time), and then split this into a gen_async_fsm and gen_sync_fsm. Is this a new programming constraint? Yes, of course it is. Could one be cast to the other if you really needed? Also yes. Is it likely to actually be something someone wants to do? Probably not. This is why I tend to write my own fsms by hand via proc_lib when I need them. The whole point of the state-machine abstraction is that it is limited, interacts with the world in a single way, and lacks external complexity due to its singularity of purpose per state. The last thing an fsm should be is complex.

@ferd

This comment has been minimized.

Contributor

ferd commented Feb 10, 2016

If you postpone an event it is not presented again in the current state. If you consume an event (by not postponing it) it is removed from the event queue. If gen_statem runs out of events in the event queue a new one is received. If the state changes all events are retried in the order received starting with the oldest.

Doesn't that risk being a problem? If I want to retry multiple messages later within the state until my Data has been modified, there are no semantics to accomplish that, something that I could do with true selective receives.

It also has the caveats of the questions I asked in my previous post:

  • Assume I have states a and b. While in a I received two events, E1 and E2, which I have deferred to retry for later. As a transitions to b, I handle an event that changes my state back to a. Are E1 and E2 going to be retried since I did have a transition a -> b despite currently being in a, the same state I was in when I first scheduled a retry?
  • Same question, but now assume that E1 is run in b and it does trigger a as a new state. Is E2 going to run when transitioning from b -> a despite having been retried in a?
  • Let's now imagine I have states a, b, and c. If I retry E1 in a, then retry it again in b, can it only ever run while in c? If I then retry it in c again, am I stuck with unhandlable events?

Those are pretty damn important semantics. When I use a selective receive, I do not have to bother with it because I pick and choose as I go, but here I totally depend on the underlying implementation.

How this violates "handling interrupts" as you put it I can not see.

Assume I send 50 low-priority events of the form: {schedule_task, normal, Ref, Description}. As a 51st message, I send {schedule_task, high, Ref, Description}. Under a selective receive, each iteration could be done as:

loop(State) ->
    receive
        {schedule_task, high, Ref, Description} ->
            NewState = schedule(Ref, Description, State),
            loop(NewState)
    after 0 ->
        receive
            {schedule_task, Other, Ref, Description} ->
                NewState = schedule(Ref, Description, State),
                loop(NewState)
        end
    end.

This pattern lets me peek inside my inbox to always get higher priority messages. I could use these, but also sys messages for example, if I wanted my OTP tracing and suspending to take top priority. If I go for selective receives, this is a thing I can do.

Under the implementation posted here, assume that the first 50 low priority messages have been consumed and put onto the queue. The high priority send just happens later. Whenever it is sent, though, the first high-priority receive would fail; in fact I cannot ever represent this pattern unless I use a retry (which ensures I won't see the message again in the current state), but with the caveat that I cannot, then, process the retried messages without artificially changing states!

The pattern in the current implementation is strictly less powerful than a selective receive because it can only operate in batches of processes at once, and a higher priority message cannot be read unless it is already buffered in memory or messages are artificially discarded/rescheduled to deal with this fact.

It was intentional to give the state handling functions an unusual arity i.e 5:
Partly because the argument PrevState might have use cases but mostly to make the state handling functions the only that have to be arity 5.
I chose to have them separated which increases state functions arity to 5.

I'm not sure I understand the value of doing this? The objective was to avoid conflicts of states like terminate and whatnot?

There are subtle differences between calling your own state function and inserting an event that can be boiled down to do not call your own state functions. Calling a common event handling function from several state functions is just fine, though.

Right. I agree we shouldn't have to, but the reality is we do. The idea of 'inserting' events is a good one, but is ultimately an entirely distinct feature from selective receives. I think I would like to see such a thing in a FSM like the one I proposed briefly, as long as the semantics are clearly defined ("this goes next"). Doing things that way does ensure state transitions, logs, and everything trace-related that the FSM can do are properly handled for reports and whatnot.

It still has that caveat of 'not properly handling interrupts' I mentioned earlier because every time you insert events, you keep the mailbox inactive, prevent the consumption of sys messages or other calls, and basically still have that not-powerful-enough receive simulation. But it's an improvement nonetheless.


  • Feel like a plain Erlang selective receive state machine
  • Not use parse transforms nor explicit receive of sys messages
  • Fit into the gen_* framework

I think point 1 is a miss, or at the very best a partial success. Mostly because the implementation is strictly less powerful than a selective receive, and it is still complex to reason about. I might be tempted to instead use a supervisor_bridge and use actual selective receives if I need interrupts, or to implement or use something like Ulf's FSM instead.

Point 2 is obviously hit without a problem.

Point 3 I think is a miss. The design just doesn't fit super well from the consumer-side (the caller side is fine). Here's some distinctions between gen_statem and other gen_* behaviours:

  • Explicity message types rather than handling specified by function name, which clashes with all other behaviours
  • Different message handling model from anything else in this language or library (this is major)
  • Different return values that provide tooling for that different message consumption (including ref cancelling and monitors handling and unlinks and all of that), and there can be multiple of them.

The latter one I guess is the one that annoys me the least, but it does create potential confusion. If someone learns gen_server first, they have to re-learn gen_statem from scratch. If they go to gen_event, they have very little to adapt to. If they go to gen_fsm, they have to adapt to all the naming and different ways to send events, but the overall mechanisms are the same.

I don't know how much value you put on having a fairly uniform design around things, but OTP is already seen as one of the challenging parts of the ecosystem to learn, and one of the most worthwhile ones too. I fear that adding complexity to things here will only make learning and teaching it more difficult.

The expected result I expect from these changes if they go through unmodified is a bit like what follows:

  1. user learns gen_server, supervisors, gen_event, etc.
  2. user checks out gen_fsm, finds it hard, does not use it
  3. user finds out about gen_statem, which replaces gen_fsm, finds it as hard if not harder, does not use it
  4. user sticks the finite state machine into a gen_server as a workaround

The only actual impact for the vast majority of users is the addition of another step they won't make use of, adds to their frustration, and we end up with the same result of 'user sticks a FSM into a gen_server'.

Note that for this point I am not debating the technical merits of the solution you propose. It is strictly more powerful than what we currently have, and more flexible. What I am saying is that it falls short of its objectives, and in practice it's gonna have no noticeable impact on code quality of most Erlang systems because people are not gonna use these any more than they use current fsms just because of the complexity of it.

@ferd

This comment has been minimized.

Contributor

ferd commented Feb 10, 2016

Reading through this I keep thinking:

  • I can't remember a time I actually wrote a state machine that was intended to handle synchronous messages in some states and asynchronous messages in others (system messages aside).
  • I can't remember a time when infinite states was something I wanted to create in a state machine (but perhaps I lack sufficient intuition to distinguish between a "state machine" abstraction and a more general "object/process/thingy" abstraction).
  • I can't remember a time when I actually wanted selective receive in a state machine (and didn't find having a buddy process that ran interference for the fsm much easier to deal with and understand instead).
  • I can't remember a time when I wanted something other than an atom to represent a state (but perhaps I lack creativity).
    ·

On point 1: I do all the time, whenever I send async events, but I want to query the state of the FSM. That's a synchronous event.

On point 2 and 4: It would be nice to be able to create nested states, which means having to represent your state possibly as {SuperState, SubState} -- this means multiple ways to arrange things in an
d dispatch them, and a non-atom state ('<SuperState>_<SubState>' is really annoying to match on). Otherwise, it lets you represent specific states such as {waiting_on, X} rather than waiting_on + S#state.waited_on -- which quickly has you blow up your state record to handle all the needs of all the states and risks leaving undesired data behind as you transition through.

On point 3: this is valuable as a property. Check out Death by Accidental Complexity by Ulf Wiger and you'll want them too

@RaimoNiskanen

This comment has been minimized.

Contributor

RaimoNiskanen commented Feb 10, 2016

@essen

I think that if you remove features from gen_statem you will get gen_server or some small addition to it.

The features of gen_statem over gen_server as I see them are:

  • Handling all events in one specific function per state
  • Doing something close to selective receives i.e postponing events i.e an event queue
  • Enabling a non-finite state machine (vs gen_fsm)

Handling timers/demonitors/unlinks as well as insert_event, remove_event, etc are needed due to the event queue.

Regarding handling all events in one specific function per state

We agree on that the concept of having a single function per state is a great usability improvement, and that returning a list of commands is a good feature. Thank you for that!

Regarding a non-finite state machine

You argue that by dropping handle_event/5 we could drop the State argument making all state handling functions arity 4. That would reserve the state name code_change or forcing some ugly quirk of the code_change/4 function to change its arity. Furthermore I have already while using gen_statem encountered this pattern:

state1(info, event1, _PrevState, _State, _StateData) ->
    do_something(event1),
    {};
state1(Type, Content, PrevState, State, StateData) ->
    handle_common_events(Type, Content, PrevState, State, SateData).

Here most of the events are handled in the state function, but some events benefit on using a common handling function, and that function might need State. Sure you can write the literate state state1 in the call, but then you are likely to get cut-and-paste errors.

Furthermore we have already found use cases for a non-atom state, so I think it is a not unimportant feature.

Enabling a non-finite state machine just has the consequence that the state function handle_event/5 becomes special but only if you do not use a non-atom state which is up to you.

Therefore I think the feature of a non-finite state machine is worth the small additional complexity.

Regarding the event queue

I think this is a killer feature of gen_statem. You do not know what you are missing unless you have tried it. I deem that there are many state machines out there written with gen_server that has some half-hearted event queue, some shady treatment of certain calls, or some creative interpretation of the specification just to get around the fact that you do not want to handle some events in all states but can not ignore them.

The basics are simple enough. You retry an event and deal with it in a later state. This is not as efficient as a true selective receive, but it should not be hard for the implementation to keep the number of postponed events small so I think correctness beats efficiency in this case.

If you remove the event queue from gen_statem you get gen_server with a little code for aggregating handle_call, handle_cast and handle_info into one function determined by one state field. That is around 10 lines of code.

Therefore I think the event queue is a really important feature that lifts the abstraction level far over what you easily can do with gen_server, and that without it there is no need for gen_statem.

Removing stray messages from e.g cancel_timer

You could ditch this feature by stating that it is up to the state machine to ignore all such stray events. But that would make it very hard to know if a stray {timeout,Ref,Msg} is actually one that you initiated yourself and then it became obsolete from one that you initiated erroneously.

I think that it should be possible in a state machine implementation to know exactly which events that are truly unexpected, and therefore a reliable way to e.g cancel a timer and removing the stray message is needed.

The argument that most state machines for simplicity ignore all unknown messages e.g matching late timeout messages is not strong enough to not implement {cancel_timer,Ref} and its fellows. If you want to write tight code you should be able to.

Complexity

I think most of the complexity is percieved from that I somewhat failed in the documentation, and that all the features are essential to make gen_statem worth using.

@zxq9

This comment has been minimized.

zxq9 commented Feb 10, 2016

On point 1: I do all the time, whenever I send async events, but I want to query the state of the FSM. That's a synchronous event.

It is synchronous, but its also not something that is occurring in the context of using the fsm as an fsm -- which puts it in the category of system messages (which is totally normal). An asynch or synch fsm spawned by proc_lib will still deal with system messages querying its state just fine (other semantics could be written in, but why when those already exist?) -- my point is that the peer messages I want to send it are not likely to be a casual mix of synch/asynch forms.

On point 2 and 4: It would be nice to be able to create nested states, which means having to represent your state possibly as {SuperState, SubState} -- this means multiple ways to arrange things in and dispatch them, and a non-atom state ('_' is really annoying to match on). Otherwise, it lets you represent specific states such as {waiting_on, X} rather than waiting_on + S#state.waited_on -- which quickly has you blow up your state record to handle all the needs of all the states and risks leaving undesired data behind as you transition through.

When this desire grips my heart, I reach for a different tool.

On point 3: this is valuable as a property. Check out Death by Accidental Complexity by Ulf Wiger and you'll want them too

I tend to think of this as, once again, something more system-oriented than "using it as an actual fsm". The real problem I see here is that its very hard for me to envision a gen_* type behavior that is more simply implemented and easy to use than simply implementing the exact selective receive you wrote as an example above within a proc_lib module.

The "would be nice to have"s are going to wind up pushing this behavior back into the corner with its disused cousin, gen_fsm. I really think all these great ideas are great -- and that they probably belong in separate modules. As someone trying to solve an actual problem without letting the complexity of the tool itself set my hair on fire I am deliberately seeking a behavior with limited functionality when I start thinking in terms of fsms. I don't know the right answer to solve all the issues in this behavior recommendation, but I do know that all the "nice to have"s are pushing it in the wrong direction. Fragmentation of features into separate behaviors may be a win.

@essen

This comment has been minimized.

Contributor

essen commented Feb 10, 2016

Have you wondered if you could improve the design of state machines for protocols? For example these usually come with two state machines, one for reading/parsing and one for processing. As far as I understand neither gen_fsm nor gen_statem would make it easy to cleanly separate both state machines without requiring separate processes.

@HansN

This comment has been minimized.

Contributor

HansN commented Feb 10, 2016

I have done some coding with the new gen_statem for a couple of weeks
using early versions from Raimo.

The target was the ssh_connection_handler, which is implemented with
gen_fsm. I want to point of a couple of gen_statem properties that are
in my opinion invaluable.

This central FSM of the ssh protocol has a number of complications:

  • It is a union of the server-side and client-side fsm:s due to an
    extreme similarity
  • In some cases there is a however a difference in what a server or
    client might send or accept
  • Message decryption AND decoding is stateful, depending on the state
    of this fsm
  • Received byte stream suffer of the usual framing problem, but since
    the stream is encrypted it must first be partly decrypted with state
    information
  • Decrypted and decoded messages sometime must be handled immediatly
    (for example an explicit disconnect message)
  • Some events/responses need queuing

Non-atomic state names

The distinction of the role (server or client) was previously as a flag
in the state data. This caused the fsm callback functions to use
matching in statedata to distinguish between similar states. This mean
that reading is hard since I must also read data and understand it
although the state name seem explicit. The non-atomic state name comes
handy here:

Handling 'state1' client part with gen_fsm:

  state1(...., #state{role==client} = S) ->

and with gen_statem:

  handle_event(..., {state1,client}, ...) ->

Event insertion

The peer byte stream arrived in handle_info as messages from gen_tcp.
They where decrypted, framed and decoded (needing state information(!))
and then injected as events in the fsm by calling the state function in
handle_info. This is more elegantly done with the insert_event operation:

Gen_fsm:

  handle_info({tcp,..,Bytes}, StateName, S0) ->
      try decrypt_and_decode(Bytes, S0) of
         {ok, Msg, S} ->
             ?MODULE:StateName(Msg, S)
      ...

Gen_statem:

  handle_event(info, {tcp,..,Bytes}, _, ... S0) ->
      try decrypt_and_decode(Bytes, S0) of
         {ok, Msg, S} ->
             {StateName, S,  [{insert_event, internal, Msg}]}
      ...

what does we gain of this? Firstly, all state/event handling is done by
the gen_* with documented primitives, and not by me happening to know
how to cheat the driver. Secondly the tracing/debugging of gen_* works!
That is, we get both the byte stream and the decoded message explicit in
the gen_* tracing! Valuable in my opinion.

"Conclusion"

So how was the ssh_connection_handler after the change to gen_statem? I
must say that the complex parts just vanished and the logic suddenly
became clear. I am not finnished with the re-write yet, but so far it
looks that gen_statem is the way to go. Not gen_fsm and certainly not
gen_server for a protocol with complications like ssh.

-Hans

On 02/09/2016 11:03 AM, RaimoNiskanen wrote:

gen_statem is a proposed new state machine engine in the gen_*
framework that is intended to supersede /gen_fsm/ since many find it
hardly usable.

Major benefits over /gen_fsm/:

  • All events are handled in the same place in the implementation for a
    given state
  • Selective receive is mimicked by an internal queue that allows
    retrying an event in a later state
  • An infinite state space is possible

I have created a pull request of it to get external feedback. We have
started to convert some state machines here at OTP to use gen_statem
and are so far fairly happy with it.

Build and read the documentation. Please write some state machines.

/ Raimo Niskanen


    You can view, comment on, or merge this pull request online at:

#960

    Commit Summary


Reply to this email directly or view it on GitHub
#960.

@essen

This comment has been minimized.

Contributor

essen commented Feb 10, 2016

Can you post the code somewhere? An example would probably go a long way.

@RaimoNiskanen

This comment has been minimized.

Contributor

RaimoNiskanen commented Feb 10, 2016

@ferd

Doesn't that risk being a problem? If I want to retry multiple messages later within the state until my Data has been modified, there are no semantics to accomplish that, something that I could do with true selective receives.

It also has the caveats of the questions I asked in my previous post:

Assume I have states a and b. While in a I received two events, E1 and E2, which I have deferred to retry for later. As a transitions to b, I handle an event that changes my state back to a. Are E1 and E2 going to be retried since I did have a transition a -> b despite currently being in a, the same state I was in when I first scheduled a retry?

Yes. Because you have a transition b -> a all postponed are retried. There is no memory of in which states an event has been postponed.

Same question, but now assume that E1 is run in b and it does trigger a as a new state. Is E2 going to run when transitioning from b -> a despite having been retried in a?

Yes. Because you have a transition b -> a all postponed events are retried.

Let's now imagine I have states a, b, and c. If I retry E1 in a, then retry it again in b, can it only ever run while in c? If I then retry it in c again, am I stuck with unhandlable events?

No. After a state transition all postponed events are retried.

Those are pretty damn important semantics. When I use a selective receive, I do not have to bother with it because I pick and choose as I go, but here I totally depend on the underlying implementation.

Well, that applies to selective receive also. The semantics has to be well defined. In this case the rule is that after a state transition all postponed events are retried.

This pattern lets me peek inside my inbox to always get higher priority messages. I could use these, but also sys messages for example, if I wanted my OTP tracing and suspending to take top priority. If I go for selective receives, this is a thing I can do.

Well, you are right. Postponing events is not as powerful as true selective receive, and prioritizing high important messages by that kind of specialized receive statements is can not be done with the proposed mechanism.

I still think that postponing events is good enough to be worth having...

@lehoff

This comment has been minimized.

lehoff commented Feb 10, 2016

Re timer handling
I think timers should be avoided in the implementation of gen_statem.

If people need a timeout they should turn that into an event using
something like timer:apply_after(Time, my_fsm, timout_something, Args).

(timer:apply_after/4 should probably be move somewhere else.)

This has the nice side effect that mocking timeouts while testing becomes
super easy.

On 10 February 2016 at 15:06, RaimoNiskanen notifications@github.com
wrote:

@essen https://github.com/essen

I think that if you remove features from gen_statem you will get
gen_server or some small addition to it.

The features of gen_statem over gen_server as I see them are:

  • Handling all events in one specific function per state
  • Doing something close to selective receives i.e postponing events
    i.e an event queue
  • Enabling a non-finite state machine (vs gen_fsm)

Handling timers/demonitors/unlinks as well as insert_event, remove_event,
etc are needed due to the event queue.
Regarding handling all events in one specific function per state

We agree on that the concept of having a single function per state is a
great usability improvement, and that returning a list of commands is a
good feature. Thank you for that!
Regarding a non-finite state machine

You argue that by dropping handle_event/5 we could drop the State
argument making all state handling functions arity 4. That would reserve
the state name code_change or forcing some ugly quirk of the code_change/4
function to change its arity. Furthermore I have already while using
gen_statem encountered this pattern:

state1(info, event1, _PrevState, _State, _StateData) ->
do_something(event1),
{};
state1(Type, Content, PrevState, State, StateData) ->
handle_common_events(Type, Content, PrevState, State, SateData).

Here most of the events are handled in the state function, but some events
benefit on using a common handling function, and that function might need
State. Sure you can write the literate state state1 in the call, but then
you are likely to get cut-and-paste errors.

Furthermore we have already found use cases for a non-atom state, so I
think it is a not unimportant feature.

Enabling a non-finite state machine just has the consequence that the
state function handle_event/5 becomes special but only if you do not use
a non-atom state which is up to you.

Therefore I think the feature of a non-finite state machine is worth the
small additional complexity.
Regarding the event queue

I think this is a killer feature of gen_statem. You do not know what you
are missing unless you have tried it. I deem that there are many state
machines out there written with gen_server that has some half-hearted
event queue, some shady treatment of certain calls, or some creative
interpretation of the specification just to get around the fact that you do
not want to handle some events in all states but can not ignore them.

The basics are simple enough. You retry an event and deal with it in a
later state. This is not as efficient as a true selective receive, but it
should not be hard for the implementation to keep the number of postponed
events small so I think correctness beats efficiency in this case.

If you remove the event queue from gen_statem you get gen_server with a
little code for aggregating handle_call, handle_cast and handle_info into
one function determined by one state field. That is around 10 lines of code.

Therefore I think the event queue is a really important feature that lifts
the abstraction level far over what you easily can do with gen_server,
and that without it there is no need for gen_statem.
Removing stray messages from e.g cancel_timer

You could ditch this feature by stating that it is up to the state machine
to ignore all such stray events. But that would make it very hard to know
if a stray {timeout,Ref,Msg} is actually one that you initiated yourself
and then it became obsolete from one that you initiated erroneously.

I think that it should be possible in a state machine implementation to
know exactly which events that are truly unexpected, and therefore a
reliable way to e.g cancel a timer and removing the stray message is needed.

The argument that most state machines for simplicity ignore all unknown
messages e.g matching late timeout messages is not strong enough to not
implement {cancel_timer,Ref} and its fellows. If you want to write tight
code you should be able to.
Complexity

I think most of the complexity is percieved from that I somewhat failed in
the documentation, and that all the features are essential to make
gen_statem worth using.


Reply to this email directly or view it on GitHub
#960 (comment).

http://www.linkedin.com/in/torbenhoffmann
@lehoff

@RaimoNiskanen

This comment has been minimized.

Contributor

RaimoNiskanen commented Feb 10, 2016

@lehoff

Re timer handling
I think timers should be avoided in the implementation of gen_statem.

This was my first stance, but soon enough it turned out that a state machine of ours used the timeout of gen_fsm to drop itself into hibernation after inactivity, so having a state timeout turned out to be useful enough to implement.

@lehoff

This comment has been minimized.

lehoff commented Feb 10, 2016

That could easily be done with an event that triggers jumping into
hibernation.
That complication does not seem worth it for this. State timeouts are more
hassle than benefit.

On 10 February 2016 at 15:40, RaimoNiskanen notifications@github.com
wrote:

@lehoff https://github.com/lehoff

Re timer handling
I think timers should be avoided in the implementation of gen_statem.

This was my first stance, but soon enough it turned out that a state
machine of ours used the timeout of gen_fsm to drop itself into
hibernation after inactivity, so having a state timeout turned out to be
useful enough to implement.


Reply to this email directly or view it on GitHub
#960 (comment).

http://www.linkedin.com/in/torbenhoffmann
@lehoff

@IngelaAndin

This comment has been minimized.

Contributor

IngelaAndin commented Feb 10, 2016

I have used the new gen_statem to simplify the ssl application state machine, that is currently implemented using gen_fsm.

I am still working on the final version since this it is a moving target and the final
design is not yet set i in stone.

But here are some points form a real use case.

I really want the semantics of selective receive, but still want my process to be OTP compatible. Yes I can do it my self but reinventing the wheel over and over again is putting the effort in the wrong place. It is not about how hard it is, it is about concentrating the efforts on my problem.

For instance in the current implementation you may get a crash in the ssl application if the user tries to upgrade a gen_tcp socket to SSL/TLS that is in active mode. The socket must be in passive mode before you can start an upgrade, and if the socket is not crated in passive mode (not the default) the upgrade has to be coordinated with the peer. Reported here: http://bugs.erlang.org/browse/ERL-69

With the selective receive mechanism it is trivial to make sure that the user can not crash the ssl process and thereby not create a error report on ssl application when a user makes an error. What we have now is several function clauses that does not really make sense but tries to make sure ssl application does not crash and if timing is lucky it may work and if timing is less lucky the connection will fail. I really think this is awful.

Also insert_event is really useful. In the ssl case all events are received form a socket, but when they are received they need to be decrypted and decoded before they can be interpreted as an event. And when that is done it is really important that they be processed first, I can not use
cast or a send event function to send a message to myself because then I can destroy the logical
order of the events as recived from the tcp socket.

In the current implementation this is handled by writing a lot of dispatcher code calling state functions explicitly. This does make the code harder to follow, and the abstraction start to feel pointless when you write code to not use it.

@zxq9

This comment has been minimized.

zxq9 commented Feb 10, 2016

I can't shake the feeling that this concept is perhaps conflating creation of a gen_crypto_protocol_thingy with the broader desire for an actual gen_fsm that is just an fsm. Perhaps a different name is in order?

@RaimoNiskanen

This comment has been minimized.

Contributor

RaimoNiskanen commented Apr 8, 2016

I am now working on the Users' Guide, or rather on the Design Principles chapter for gen_statem and hope to have something to show any day/week now.

Both the applications SSH and SSL have their state machines being rewritten to use gen_statem so you can look at real world examples when they get published in our 19.0.rc1. SSH is now working at least as good as before the rewrite but have a few old bugs to hunt down. SSL is just getting through "almost all" test cases but is expected to be in good shape to RC1.

So the semantics and API are slowly freezing...

@RaimoNiskanen

This comment has been minimized.

Contributor

RaimoNiskanen commented Apr 15, 2016

New Update

Added missing feature: short form of {timeout,T,Msg} that was by accident not implemented .

I have written a chapter for gen_statem in OTP Design Principles User's Guide:
http://erlang.org/~raimo/doc-7.2/doc/design_principles/statem.html

It's a first version so comments are welcome!

Next week I will merge this to the 'master' branch so our technical writers can have a look at the documentation.

Documentation
.BEAM files

@ferd

This comment has been minimized.

Contributor

ferd commented Apr 15, 2016

In:

The gen_statem behaviour supports two different callback modes. In the mode state_functions, the state transition rules are written as a number of Erlang functions, which conform to the following convention:

StateName(EventType, EventContent, Data) ->
.. code for actions here ...
{next_state, StateName', Data'}.

...

handle_event(EventType, EventContent, State, Data) ->
.. code for actions here ...
{next_state, State', Data'}

I would avoid using ' since that's not valid syntax and our newcomer friends may get a bit confused. Just NewState and NewStateName may work there.

I'm okay to the content until 4.9 (yay! the single-event FSM is shown!)

In 4.9, the text says:

See the documentation for details. You can for example actually reply to several callers and generate multiple next events to handle.

with a link on documentation. I am already in the documentation. You may want to stay consistent and use reference manual again here.

In 4.11:

If you need to cancel a timer due to some other event you can use erlang:cancel_timer(Tref). Note that a timeout message can not arrive after this, unless you have postponed it before (why on earth one would do that).

I would replace that last parentheses by (if this is undesirable, make sure you do not accidentally postpone such messages) -- someone having a problem with it won't go "okay well the people writing the doc think I'm an idiot".

4.12:
This section is about postponing. Since you refer to postponing in 4.11 (Timeouts), you may want to swap these two sections together.

The state transition action postpone is designed to be able to model selective receive

I'd add a 's' to the last receive here.


I think the doc is good. There are only two things missing, and they're intimately related.

  1. When to pick either format for the FSM, if any (preferences, or, as follows...)
  2. the ability to handle complex states that are not atoms

I think if we want to add a small section for that we could translate/adapt one of the examples we had in this thread to show the functionality.

@OTP-Maintainer

This comment has been minimized.

OTP-Maintainer commented Apr 15, 2016

Patch has passed first testings and has been assigned to be reviewed


I am a script, I am not human


@RaimoNiskanen

This comment has been minimized.

Contributor

RaimoNiskanen commented Apr 18, 2016

@ferd: Thank you for your corrections! I have used them all except for merging 4.11 with 4.12 since I think they are talking about different things, but I did put a forward reference where postponing is mentioned in 4.11.

Then I will start writing on the two missing items, that I agree are missing.

@ferd

This comment has been minimized.

Contributor

ferd commented Apr 18, 2016

Cool, I'll be looking forward for the new sections. For what it's worth, I advocated swapping 4.11 and 4.12, not merging them :)

@RaimoNiskanen

This comment has been minimized.

Contributor

RaimoNiskanen commented Apr 18, 2016

Aah, yes you did. Sorry I misread.

I think I will keep the order to treat State Timeouts earlier since it is probably a more common concept and I do not want it to come too far after Event Timeouts.

@jlouis

This comment has been minimized.

Contributor

jlouis commented Apr 20, 2016

Catching up on this: I really like the direction in which this is going. It looks highly usable!

@fishcakez

This comment has been minimized.

Contributor

fishcakez commented Apr 20, 2016

I tried using gen_statem and hit a couple of bumps.

Should code_change/4 return {state_functions | handle_event_function, State, Data} instead of {:ok, State, Data}? The current return value limits the changes possible with an update. Also in the case of state_functions there isn't a check to prevent changing into a non-atom state. Should that be prevented? It is prevented in other callbacks.

Could something "interesting" happen if {ok, Data} was returned by mistake from code_change/4, as Data would be used for Misc?

When using sys:replace_state/2 its possible to change to a non-atom state when using state_functions. Should this be prevented?

@RaimoNiskanen

This comment has been minimized.

Contributor

RaimoNiskanen commented Apr 21, 2016

@fishcakez wrote:

Should code_change/4 return {state_functions | handle_event_function, State, Data} instead of {:ok, State, Data}? The current return value limits the changes possible with an update.

You are absolutely right!

It is even worse than that. The new code actually does not need to have a clue about what callback mode the old code used; the only way it could possibly know would be indirectly through the Vsn argument to sys:change_code/4,5 that is through version handling of application upgrade. That is only an administrative way of knowing the old code's callback mode, which is very unattractive.

The new code's code_change/4 should only need to understand the old State and Data and it knows which callback mode it wants. Hence it should specify it just like for init/1 and there can be no default.

I'll get some second opinions in my group but then probably change the code_change/4 return value signature to {callback_mode(),State,Data}.

The purpose was to be compatible with gen_fsm but that seems to be impossible in this case...

Also in the case of state_functions there isn't a check to prevent changing into a non-atom state. Should that be prevented? It is prevented in other callbacks.

Here I have considered to let gen_statem fail when calling the new state, or to check the value when changing states. I missed checking code_change/4 and the sys:replace_state/2 fun return values. Either should all be checked or none, and instead fail when calling the new state function.

It is always possible to transit to a state that is not implemented so maybe that should be the way to handle all these errors. I am starting think there is no reason to have special handling of setting a non-atom state in callback mode state_functions, but the badarg error from apply that would be the result can perhaps be improved.

Could something "interesting" happen if {ok, Data} was returned by mistake from code_change/4, as Data would be used for Misc?

Yes... That is a wart I inherited from gen_fsm and should fix it instead. Making that mistake could get really ugly.

Thank you for pointing out these irregularities!

Modify code_change/4 to return CallbackMode
Also move check of non-atom states in callback mode
state_functions to where the state function is called.
This gives homogenous diagnostics for state functions,
code_change/4 and system_replace_state StateFun.

Irregularities pointed out by James Fish.
@RaimoNiskanen

This comment has been minimized.

Contributor

RaimoNiskanen commented Apr 21, 2016

Updated version

  • Fix behaviour documentation according to @ferd and Ingela
  • Add a section 4.2 about callback modes and choosing them.
  • Add two paragraphs last under 4.13 about postponing events
  • Add an example 4.16 with complex state
  • Unify error detection of invalid callback function to after the call has failed, not when setting the new state. This fixes an irregularity pointed out by @fishcakez
  • Change code_change/4 to have to return {NewCallbackMode,NewState,NewData} as proposed by @fishcakez. That hit the documentation here and there. This is a backwards incompatible change

To Do:

  • Boast some about aiming to replace gen_fsm
  • Point to gen_statem from gen_fsm documentation
  • Explain the not-quite-experimental state of gen_statem in OTP-19.0
  • Merge to 'master'
  • Have our hired Technical Writers go through the documentation

Reference Manual
Behaviour Documentation
.BEAM files

@RaimoNiskanen

This comment has been minimized.

Contributor

RaimoNiskanen commented Apr 22, 2016

I might have forgotten to push the previous version, but now it and these ToDo points are pushed:

  • Boast some about aiming to replace gen_fsm
  • Point to gen_statem from gen_fsm documentation
  • Explain the not-quite-experimental state of gen_statem in OTP-19.0

Remaining To Do:

  • Merge to 'master'
  • Have our hired Technical Writers go through the documentation

Reference Manual
Behaviour Documentation
.BEAM files

@OTP-Maintainer

This comment has been minimized.

OTP-Maintainer commented Apr 22, 2016

Patch has passed first testings and has been assigned to be reviewed


I am a script, I am not human


Add section on state filtering
Misc documentation fixes.

@RaimoNiskanen RaimoNiskanen merged commit f06f690 into erlang:master Apr 25, 2016

@RaimoNiskanen

This comment has been minimized.

Contributor

RaimoNiskanen commented Apr 25, 2016

Merged to master

Now our hired Technical Writers can access and go through the documentation.

Reference Manual
Behaviour Documentation

@RaimoNiskanen RaimoNiskanen deleted the RaimoNiskanen:raimo/new-gen-state-machine/OTP-13065 branch Oct 19, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment