You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Race condition during bilateral initial resend request
I believe I have found a race condition when the initiator successfully connects to an acceptor.
The specific scenarios which cause a bug is when both have a few messages of advance at the time of the first successful logon and the initiator unsent messages are all administrative messages (Logon).
Expected sequence of event
The expected flow is as follow:
Direction
Message type
init -> acc
Logon
acc -> init
Logon
acc -> init
ResendRequest
init -> acc
ResendRequest
acc -> init
News (the 1st message to resend)
acc -> init
News (the 2nd message to resend)
acc -> init
SequenceReset
init -> acc
SequenceReset
Issue
The issue is that sometimes, the initiator sends the sequenceReset immediatly after the resendRequest of the acceptor
and then sends it's own resendRequest.
The sequenceReset is visibly out of order as it has a wrong timestamp (a later one) and the resendRequest continues with the sequence number of before the sequence reset.
The issue is that the acceptor that I'm connecting to (which is using Quickfix Java I think) doesn't tolerate the wrong sequence number and rejects it with a Logout.
And thus the resend mechanism fails and the acceptor messages are lost.
Actual sequence of event
Here are the outputed FIX messages during the scenario:
8=FIX.4.4|A9=76|A35=A|A34=4|A49=INIT|A52=20210525-22:05:30.686|A56=ACC|A369=0|A98=0|A108=90|A10=220|A
8=FIX.4.4|A9=71|A35=A|A34=14|A49=ACC|A52=20210525-22:05:30.803|A56=INIT|A98=0|A108=90|A10=239|A
8=FIX.4.4|A9=68|A35=2|A34=15|A49=ACC|A52=20210525-22:05:30.804|A56=INIT|A7=1|A16=0|A10=068|A
8=FIX.4.4|A9=107|A35=4|A34=1|A43=Y|A49=INIT|A52=20210525-22:05:30.833|A56=ACC|A122=20210525-22:05:30.833|A369=15|A36=6|A123=Y|A10=013|A
8=FIX.4.4|A9=73|A35=2|A34=5|A49=INIT|A52=20210525-22:05:30.821|A56=ACC|A369=0|A7=1|A16=0|A10=030|A
8=FIX.4.4|A9=108|A35=5|A34=16|A49=ACC|A52=20210525-22:05:30.848|A56=INIT|A58=MsgSeqNum too low, expecting 6 but received 5|A10=159|A
8=FIX.4.4|A9=76|A35=A|A34=7|A49=INIT|A52=20210525-22:05:40.882|A56=ACC|A369=1|A98=0|A108=90|A10=223|A
8=FIX.4.4|A9=71|A35=A|A34=17|A49=ACC|A52=20210525-22:05:40.996|A56=INIT|A98=0|A108=90|A10=000|A
8=FIX.4.4|A9=68|A35=2|A34=18|A49=ACC|A52=20210525-22:05:40.997|A56=INIT|A7=6|A16=0|A10=090|A
8=FIX.4.4|A9=107|A35=4|A34=6|A43=Y|A49=INIT|A52=20210525-22:05:41.031|A56=ACC|A122=20210525-22:05:41.031|A369=18|A36=9|A123=Y|A10=008|A
8=FIX.4.4|A9=73|A35=2|A34=8|A49=INIT|A52=20210525-22:05:41.015|A56=ACC|A369=1|A7=2|A16=0|A10=032|A
8=FIX.4.4|A9=108|A35=5|A34=19|A49=ACC|A52=20210525-22:05:41.046|A56=INIT|A58=MsgSeqNum too low, expecting 9 but received 8|A10=160|A
Root cause
I believe I identified the root cause of the issue.
There are essentially three goroutines involved in processing messages in quickfixgo:
the read loop (go readLoop()), which parse and sends messages to the messageIn channel.
the processing loop (session.run()), which selects from either messageIn (message to be processed) or messageEvent (signaling message in the sending queue)
the write loop (go writeLoop())
Step 1
When the acceptor Logon message is received, we are in the LogonState.
The Logon message is written in the messageIn channel, and the session gets it and forwards it to the FixMsgIn() of the LogonState.
So the FixMsgIn() of the logon state essentially create a resendRequest which is simply put in the queue of message to send as we are not logged and trigger the messageEvent signal.
This puts us in the state resendState which in our case will simply proxy the next FixMsgIn to the loggedOn state as there is nothing else to do.
Step 2.a
The session get the messageEvent signal from step 1, and as the resendState signal is considered a loggedOn state, it dequeue the single resend message and immediately sends it.
Step 2.b
The readLoop gets the resendRequest of the acceptor and writes it in the messageIn channel.
The session get the message and submits it to the FixMsgIn of the loggedOn state. This creates a resendRequest and IMMEDIATELY sends it as bytes.
Issue
The issue is that the step 2.a and 2.b are concurrent, as during the time requested to perform step 1, the readloop could have first written the resendRequest in the messageIn which would cause step 2.b to be executed before step 2.a.
The problem is that in that case, step 2.b ignores the fact that there is already a message to send in the queue (the resend) and bypass it.
Possible remediation
If step 2.b happens first, if instead of immediately sending the sequenceReset, it first send the queued resendRequest, no out of order messages are sent.
The text was updated successfully, but these errors were encountered:
Race condition during bilateral initial resend request
I believe I have found a race condition when the initiator successfully connects to an acceptor.
The specific scenarios which cause a bug is when both have a few messages of advance at the time of the first successful logon and the initiator unsent messages are all administrative messages (Logon).
Expected sequence of event
The expected flow is as follow:
Issue
The issue is that sometimes, the initiator sends the
sequenceReset
immediatly after theresendRequest
of the acceptorand then sends it's own
resendRequest
.The
sequenceReset
is visibly out of order as it has a wrong timestamp (a later one) and theresendRequest
continues with the sequence number of before the sequence reset.The issue is that the acceptor that I'm connecting to (which is using Quickfix Java I think) doesn't tolerate the wrong sequence number and rejects it with a Logout.
And thus the resend mechanism fails and the acceptor messages are lost.
Actual sequence of event
Here are the outputed FIX messages during the scenario:
Root cause
I believe I identified the root cause of the issue.
There are essentially three goroutines involved in processing messages in quickfixgo:
go readLoop()
), which parse and sends messages to the messageIn channel.session.run()
), which selects from either messageIn (message to be processed) or messageEvent (signaling message in the sending queue)go writeLoop()
)Step 1
When the acceptor Logon message is received, we are in the
LogonState
.The Logon message is written in the
messageIn
channel, and the session gets it and forwards it to theFixMsgIn()
of theLogonState
.So the
FixMsgIn()
of the logon state essentially create aresendRequest
which is simply put in the queue of message to send as we are not logged and trigger themessageEvent
signal.This puts us in the state
resendState
which in our case will simply proxy the nextFixMsgIn
to theloggedOn
state as there is nothing else to do.Step 2.a
The session get the
messageEvent
signal from step 1, and as theresendState
signal is considered aloggedOn
state, it dequeue the single resend message and immediately sends it.Step 2.b
The
readLoop
gets theresendRequest
of the acceptor and writes it in themessageIn
channel.The session get the message and submits it to the
FixMsgIn
of theloggedOn
state. This creates aresendRequest
and IMMEDIATELY sends it as bytes.Issue
The issue is that the step 2.a and 2.b are concurrent, as during the time requested to perform step 1, the
readloop
could have first written theresendRequest
in themessageIn
which would cause step 2.b to be executed before step 2.a.The problem is that in that case, step 2.b ignores the fact that there is already a message to send in the queue (the resend) and bypass it.
Possible remediation
If step 2.b happens first, if instead of immediately sending the
sequenceReset
, it first send the queuedresendRequest
, no out of order messages are sent.The text was updated successfully, but these errors were encountered: