Disconnection for unexpected timeout on heartbeat response to a TestRequest #569
Comments
Hi @annacochetti, thanks for the report. Thanks,
Done, thanks for picking it up immediately.
Some more questions: How many FIX sessions do you run on your acceptor/initiator? Do they both have only one session? What does your logic do when you receive an ExecutionReport? Do you have some processing in your fromApp() callback that might block? This is the main reason for such problems, because QFJ cannot process incoming messages (such as heartbeats) while something is blocking the message processing thread. Does this problem always appear after receiving an application message, e.g. an ExecReport?
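One way to check whether a callback blocks is to time it. The following is only a minimal sketch in plain Java, not QFJ API: `TimedHandler` and its threshold are hypothetical names, and the wrapped `Consumer` stands in for whatever the application does inside fromApp().

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

/** Hypothetical helper: wraps a message handler and counts calls slower than a threshold. */
final class TimedHandler<T> implements Consumer<T> {
    private final Consumer<T> delegate;
    private final long thresholdNanos;
    private final AtomicLong slowCalls = new AtomicLong();

    TimedHandler(Consumer<T> delegate, long thresholdMillis) {
        this.delegate = delegate;
        this.thresholdNanos = TimeUnit.MILLISECONDS.toNanos(thresholdMillis);
    }

    @Override
    public void accept(T message) {
        long start = System.nanoTime();
        try {
            delegate.accept(message);
        } finally {
            long elapsed = System.nanoTime() - start;
            if (elapsed > thresholdNanos) {
                // Record and report the slow call; a real application would use its logger.
                slowCalls.incrementAndGet();
                System.err.printf("Slow handler: %d ms for %s%n",
                        TimeUnit.NANOSECONDS.toMillis(elapsed), message);
            }
        }
    }

    /** Number of calls that exceeded the threshold so far. */
    long slowCalls() {
        return slowCalls.get();
    }
}
```

If the counter grows around the time of a disconnection, that would point at the application logic rather than at the engine.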
The connections have been up and running for years now.
Are you using one initiator instance that handles the various initiator sessions in one Java process? What were the other initiator sessions doing around the time of the first disconnection? Were they connected? Are you using a [...] I think this is hard to analyse post mortem. IMHO it is obvious that something blocked the message processing thread, otherwise the heartbeats and the logon (after the disconnection) would have been processed.
All the other sessions are working well. The only session affected is the FIXT.1.1:TRADESTAC_TRANSACTION_TW->TRADESTAC_TW |
I checked the code, and the message "Setting DefaultApplVerID (1137=7) from Logon" is logged before the message is queued for processing (see quickfixj/quickfixj-core/src/main/java/quickfix/mina/initiator/InitiatorIoHandler.java, lines 65 to 80 in e916e4d).
That supports my assumption of a blocked thread. Since you use the [...] Edit: by the way, this could also be a bug in QFJ itself (the version you are using is some years old), but I cannot recall a specific bug at the moment. Thanks,
Each Session uses its own ThreadedSocketInitiator, and that's why only one session has issues.
OK, so it is no longer an assumption that a thread is blocked. :)
Agreed.
We eventually found that we had incorrect, undetected repeated accesses to the database.
So can this thread be marked as "answered" then? I actually thought the thread in your app couldn't be blocked because you said
So I understood that you put the messages on a queue and return from fromApp() immediately. But maybe I misunderstood.
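For illustration, the queue hand-off described here could look roughly like the sketch below. It is plain Java and not taken from the reporter's code or from QFJ: `HandoffHandler`, `onMessage` and `slowProcessing` are hypothetical names, and `onMessage` stands in for the body of fromApp(). A single worker thread is assumed so that messages are processed in arrival order.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical sketch: hand each message to a worker thread and return at once. */
final class HandoffHandler implements AutoCloseable {
    // One worker thread keeps processing in arrival order; its queue is unbounded.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Consumer<String> slowProcessing;

    HandoffHandler(Consumer<String> slowProcessing) {
        this.slowProcessing = slowProcessing;
    }

    /** Stand-in for the callback body: enqueue the message and return immediately. */
    void onMessage(String rawMessage) {
        worker.execute(() -> slowProcessing.accept(rawMessage));
    }

    /** Stops accepting work and waits up to five seconds for queued messages. */
    @Override
    public void close() {
        worker.shutdown();
        try {
            worker.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The trade-off is that slow processing (for example database access) now delays only the worker, but the backlog can grow without bound, so the queue depth is worth watching.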
I am afraid that needs to be implemented by your custom application logic. You could maybe monitor the return value of [...]
I am closing this issue, since there is no bug here. Thanks for your help, Chris
You're welcome, Anna. 👍
In our production system we have an acceptor and an initiator, both QFJ-based.
At one point the initiator logged
errorEvent -> logError :149 - FIXT.1.1:TRADESTAC_TRANSACTION_TW->TRADESTAC_TW: Disconnecting: Timed out waiting for heartbeat
and abruptly disconnected the session,
even though the heartbeat interval was set to 30 seconds and the last heartbeat had been received 27 seconds earlier, in response to a test request (and it was not the only message sent on the session).
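The 27-second figure can be checked against the log timestamps in this report: the heartbeat answering the test request is logged at 2022-11-11 09:10:55,053 and the disconnection at 2022-11-11 09:11:22,053. A small sketch of that arithmetic follows; `LogGap` is a hypothetical helper name, and the pattern assumes the timestamp layout shown in the log excerpt.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper: time elapsed between two log timestamps. */
final class LogGap {
    // Matches timestamps such as "2022-11-11 09:10:55,053".
    private static final DateTimeFormatter LOG_TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    static Duration between(String earlier, String later) {
        return Duration.between(
                LocalDateTime.parse(earlier, LOG_TS),
                LocalDateTime.parse(later, LOG_TS));
    }
}
```

Calling `LogGap.between("2022-11-11 09:10:55,053", "2022-11-11 09:11:22,053").toMillis()` yields 27000, i.e. 27 seconds, which is below the configured 30-second interval.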
After this, the initiator did not recognize that the acceptor was responding to its Logon messages and kept logging
errorEvent -> logError :149 - FIXT.1.1:TRADESTAC_TRANSACTION_TW->TRADESTAC_TW: Disconnecting: Timed out waiting for logon response
The complete log portion is:
2022-11-11 09:08:52,055 INFO [QFJ Timer] outgoing -> send :2572 - FIXT.1.1:TRADESTAC_TRANSACTION_TW->TRADESTAC_TW: 8=FIXT.1.1|9=81|35=0|34=348|49=TRADESTAC_TRANSACTION_TW|52=20221111-08:08:52.055|56=TRADESTAC_TW|10=123|
2022-11-11 09:10:52,952 INFO [NioProcessor-44] incoming -> messageReceived :129 - FIXT.1.1:TRADESTAC_TRANSACTION_TW->TRADESTAC_TW: 8=FIXT.1.1|9=81|35=0|34=409|49=TRADESTAC_TW|52=20221111-08:10:52.952|56=TRADESTAC_TRANSACTION_TW|10=120|
[...] // Test request
2022-11-11 09:10:55,052 INFO [QFJ Timer] outgoing -> send :2572 - FIXT.1.1:TRADESTAC_TRANSACTION_TW->TRADESTAC_TW: 8=FIXT.1.1|9=90|35=1|34=368|49=TRADESTAC_TRANSACTION_TW|52=20221111-08:10:55.052|56=TRADESTAC_TW|112=TEST|10=137|
[...] // Many execution reports
2022-11-11 09:10:55,052 INFO [QFJ Timer] event -> ? :? - FIXT.1.1:TRADESTAC_TRANSACTION_TW->TRADESTAC_TW: Sent test request TEST
2022-11-11 09:10:55,053 INFO [NioProcessor-44] incoming -> messageReceived :129 - FIXT.1.1:TRADESTAC_TRANSACTION_TW->TRADESTAC_TW: 8=FIXT.1.1|9=90|35=0|34=410|49=TRADESTAC_TW|52=20221111-08:10:55.053|56=TRADESTAC_TRANSACTION_TW|112=TEST|10=125|
2022-11-11 09:10:56,084 INFO [NioProcessor-44] incoming -> messageReceived :129 - FIXT.1.1:TRADESTAC_TRANSACTION_TW->TRADESTAC_TW: 8=FIXT.1.1|9=574|35=8|34=411|49=TRADESTAC_TW|52=20221111-08:10:56.084|56=TRADESTAC_TRANSACTION_TW|6=0|[...]
2022-11-11 09:11:15,446 INFO [NioProcessor-44] incoming -> messageReceived :129 - FIXT.1.1:TRADESTAC_TRANSACTION_TW->TRADESTAC_TW: 8=FIXT.1.1|9=852|35=8|34=412|49=TRADESTAC_TW|52=20221111-08:11:15.446|56=TRADESTAC_TRANSACTION_TW|6=0[...]
2022-11-11 09:11:17,466 INFO [NioProcessor-44] incoming -> messageReceived :129 - FIXT.1.1:TRADESTAC_TRANSACTION_TW->TRADESTAC_TW: 8=FIXT.1.1|9=703|35=8|34=413|49=TRADESTAC_TW|52=20221111-08:11:17.466|56=TRADESTAC_TRANSACTION_TW|6=0[...]
2022-11-11 09:11:19,452 INFO [NioProcessor-44] incoming -> messageReceived :129 - FIXT.1.1:TRADESTAC_TRANSACTION_TW->TRADESTAC_TW: 8=FIXT.1.1|9=812|35=8|34=414|49=TRADESTAC_TW|52=20221111-08:11:19.452|56=TRADESTAC_TRANSACTION_TW|6=0|[...]
2022-11-11 09:11:20,453 INFO [NioProcessor-44] incoming -> messageReceived :129 - FIXT.1.1:TRADESTAC_TRANSACTION_TW->TRADESTAC_TW: 8=FIXT.1.1|9=1007|35=8|34=415|49=TRADESTAC_TW|52=20221111-08:11:20.452|56=TRADESTAC_TRANSACTION_TW|6=0|[...]
2022-11-11 09:11:22,053 ERROR [QFJ Timer] errorEvent -> logError :149 - FIXT.1.1:TRADESTAC_TRANSACTION_TW->TRADESTAC_TW: Disconnecting: Timed out waiting for heartbeat
2022-11-11 09:11:22,751 INFO [NioProcessor-86] event -> ? :? - FIXT.1.1:TRADESTAC_TRANSACTION_TW->TRADESTAC_TW: MINA session created: local=/127.0.0.1:65078, class org.apache.mina.transport.socket.nio.NioSocketSession, remote=localhost/127.0.0.1:8089
2022-11-11 09:11:23,052 INFO [QFJ Timer] outgoing -> send :2572 - FIXT.1.1:TRADESTAC_TRANSACTION_TW->TRADESTAC_TW: 8=FIXT.1.1|9=104|35=A|34=1|49=TRADESTAC_TRANSACTION_TW|52=20221111-08:11:23.052|56=TRADESTAC_TW|98=0|108=30|141=Y|1137=7|10=198|
2022-11-11 09:11:23,054 INFO [QFJ Timer] event -> ? :? - FIXT.1.1:TRADESTAC_TRANSACTION_TW->TRADESTAC_TW: Initiated logon request
2022-11-11 09:11:23,057 INFO [NioProcessor-86] incoming -> messageReceived :129 - FIXT.1.1:TRADESTAC_TRANSACTION_TW->TRADESTAC_TW: 8=FIXT.1.1|9=104|35=A|34=1|49=TRADESTAC_TW|52=20221111-08:11:23.056|56=TRADESTAC_TRANSACTION_TW|98=0|108=30|141=Y|1137=7|10=202|
2022-11-11 09:11:23,058 INFO [NioProcessor-86] event -> ? :? - FIXT.1.1:TRADESTAC_TRANSACTION_TW->TRADESTAC_TW: Setting DefaultApplVerID (1137=7) from Logon
2022-11-11 09:11:34,051 ERROR [QFJ Timer] errorEvent -> logError :149 - FIXT.1.1:TRADESTAC_TRANSACTION_TW->TRADESTAC_TW: Disconnecting: Timed out waiting for logon response
2022-11-11 09:11:34,798 INFO [NioProcessor-100] event -> ? :? - FIXT.1.1:TRADESTAC_TRANSACTION_TW->TRADESTAC_TW: MINA session created: local=/127.0.0.1:65112, class org.apache.mina.transport.socket.nio.NioSocketSession, remote=localhost/127.0.0.1:8089
2022-11-11 09:11:35,051 INFO [QFJ Timer] outgoing -> send :2572 - FIXT.1.1:TRADESTAC_TRANSACTION_TW->TRADESTAC_TW: 8=FIXT.1.1|9=104|35=A|34=1|49=TRADESTAC_TRANSACTION_TW|52=20221111-08:11:35.051|56=TRADESTAC_TW|98=0|108=30|141=Y|1137=7|10=200|
2022-11-11 09:11:35,053 INFO [QFJ Timer] event -> ? :? - FIXT.1.1:TRADESTAC_TRANSACTION_TW->TRADESTAC_TW: Initiated logon request
2022-11-11 09:11:35,056 INFO [NioProcessor-100] incoming -> messageReceived :129 - FIXT.1.1:TRADESTAC_TRANSACTION_TW->TRADESTAC_TW: 8=FIXT.1.1|9=104|35=A|34=1|49=TRADESTAC_TW|52=20221111-08:11:35.054|56=TRADESTAC_TRANSACTION_TW|98=0|108=30|141=Y|1137=7|10=203|
2022-11-11 09:11:35,056 INFO [NioProcessor-100] event -> ? :? - FIXT.1.1:TRADESTAC_TRANSACTION_TW->TRADESTAC_TW: Setting DefaultApplVerID (1137=7) from Logon
2022-11-11 09:11:45,052 ERROR [QFJ Timer] errorEvent -> logError :149 - FIXT.1.1:TRADESTAC_TRANSACTION_TW->TRADESTAC_TW: Disconnecting: Timed out waiting for logon response
[...] // From this point on, last lines are repeated
We have no steps to reproduce this behaviour, but we strongly suggest looking at the timers: sometimes they do not seem to be reset correctly.
We expect the reply Logon message to be acknowledged and the session to restart correctly.
system information:
Additional context
Since this was a production issue, we cannot provide a unit test result.
We think the issue occurs fairly frequently on the heartbeat, but this was the first time we also saw it on the logon.
Since we did not find any acknowledged bug describing this behaviour, we suspect that the QFJ version is not relevant.