Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

the ESTAB tcp connection remains even after closed initiator #269

Closed
lyhistory opened this issue Mar 17, 2020 · 9 comments
Closed

the ESTAB tcp connection remains even after closed initiator #269

lyhistory opened this issue Mar 17, 2020 · 9 comments
Labels

Comments

@lyhistory
Copy link

lyhistory commented Mar 17, 2020

Describe the bug
the acceptor written in java is running on redhat vm, and the initiator written in c# is running in local win10 pc;

after suceesfully exchange messages, I left the initiator there without any message exchange for about half an hour, then I closed the initiator, but the tcp connection remains ESTAB, as showing in below screenshot:
image

and when I try to run the initiator again, it just cannot logon, toadmin cannot get a response,
and the acceptor keep printing:
"2020-03-17 16:25:15.617 INFO 13057 --- [ NioProcessor-9] q.mina.acceptor.AcceptorIoHandler : MINA session created: local=/xxxxxxx:9870, class org.apache.mina.transport.socket.nio.NioSocketSession, remote=/xxxxxx:30986"
generated a lot TIME_WAIT tcp connections
image

it looks like something has blocked the new connection, obviously the ESTAB Tcp connection must have onhold some resources I guess

To Reproduce
as described above

Expected behavior
the tcp connection should disconnected after closing the initiator

system information:

  • OS: [e.g. Linux]: Red Hat Enterprise Linux Server 7.2 (Maipo)
  • Java version: openjdk version "1.8.0_232"
  • QFJ Version: quickfix-core 2.1.0 (quickfixj-spring-boot-starter 2.4.1)
@lyhistory lyhistory added the bug label Mar 17, 2020
@chrjohn
Copy link
Member

chrjohn commented Mar 17, 2020

"2020-03-17 16:25:15.617 INFO 13057 --- [ NioProcessor-9] q.mina.acceptor.AcceptorIoHandler : MINA session created: local=/xxxxxxx:9870, class org.apache.mina.transport.socket.nio.NioSocketSession, remote=/xxxxxx:30986"

This message is printed each time a connection is opened against the acceptor. When there is no Logon message coming in afterwards then the Initiator most likely does not send one. What does the log on the initiator look like?

@lyhistory
Copy link
Author

@chrjohn the client initiator tried to logon but no reponse
image

@lyhistory
Copy link
Author

lyhistory commented Mar 17, 2020

is it possible that its something related to the network settings, because I'm testing in my company domain, the remote acceptor running on a vm, I have to logon vpn on my local pc,

I mean the initiator doesn't directly connect to the acceptor but go through some vpn gateway( my network knowledge is limited, but I can see that the Peer Address Port showing in the first screenshot 54833 is not the port that my local initiator listened),

I'm thinking may be because the gateway port in between remains open even when I closed the initiator, so the acceptor still keep the tcp connection alive(ESTAB),
but here comes a question that I've already enabled "HeartBtInt=200000 " for initiator, after closing, by right, the acceptor should know it's disconnected and close the tcp connection

@lyhistory
Copy link
Author

so, to summarize, there is two questions:

  1. why the tcp conneciton remains open after I closed initiator ( HeartBtInt seems not working)

  2. in this remaining open tcp connection, which resource is occupied that blocks the upcoming new connections(all the connections except the first one become TIME_WAIT), I mean blocks the logon attempt

@chrjohn
Copy link
Member

chrjohn commented Mar 17, 2020

Your heartbeat interval is 200.000 seconds. That's 55 days!
It might very well be network related. I have never seen such a behaviour.

Could you please post the acceptor and the initiator config?

@lyhistory
Copy link
Author

oh, s, I mistook it for microseconds, thanks, I'm doing another round of testing, and here is the config:

acceptor:

[DEFAULT]
ConnectionType=acceptor
ReconnectInterval=60
SenderCompID=TEST2
ResetOnDisconnect=Y
ResetOnLogout=Y
ResetOnLogon=Y
UseDataDictionary=Y

[SESSION]
BeginString=FIX.4.4
TargetCompID=TEST1
StartTime=00:00:00
EndTime=23:59:59
HeartBtInt=30
SocketAcceptHost=127.0.0.1
SocketAcceptPort=9870
DataDictionary=FIX44_CFT.xml
Password=fffffff0
EncryptKey=9732EFDD8496F4D091CDA7D12055E0CB3E0A65E3AEA8FEA553349979A4527A33
SocketLinger=100

initiator

[DEFAULT]
ConnectionType=initiator
ReconnectInterval=10
FileStorePath=store
FileLogPath=log
StartTime=00:00:00
EndTime=00:00:00
UseDataDictionary=Y
DataDictionary=config/FIX44_CFT.xml
SocketConnectHost=12.13.14.48
SocketConnectPort=9870
LogoutTimeout=5
ResetOnLogon=Y
ResetOnDisconnect=Y
CheckLatency=N

[SESSION]
# inherit ConnectionType, ReconnectInterval and SenderCompID from default
BeginString=FIX.4.4
SenderCompID=TEST1
TargetCompID=TEST2
HeartBtInt=20
UserName=TEST1
Password=fffffff0
EncryptKey=9732EFDD8496F4D091CDA7D12055E0CB3E0A65E3AEA8FEA553349979A4527A33

@lyhistory
Copy link
Author

lyhistory commented Mar 18, 2020

@chrjohn
thanks, the first question tested and resolved,
1.why the tcp conneciton remains open after I closed initiator ( HeartBtInt seems not working)
now left the second one, pls check the screenshot below for the tcp packets captured by wireshark, these two initiators using excatly the same code, the same config, when the first initiator running successfully and exchange heartbeat info with the acceptor, then I started the second initiator, you can see that the logon attempt ignored by acceptor, and the acceptor response with a [FIN, ACK] to finish the talk with the second initiator

2.in this remaining open tcp connection, which resource is occupied that blocks the upcoming new connections(all the connections except the first one become TIME_WAIT), I mean blocks the logon attempt

image

@chrjohn
Copy link
Member

chrjohn commented Mar 18, 2020

When two initiators have exactly the same config, then this also means they have the same SenderCompID/TargetCompID pair? That won't work.
A FIX session can only be established between two counterparties. If you want multiple initiators to connect to the same acceptor, then each needs to have a separate Session, i.e. distinct SenderCompID/TargetCompID pairs.

The log of the acceptor should show a message "Multiple logons/connections for this session are not allowed" when an initiator tries to connect to an already established session.

@lyhistory
Copy link
Author

understood, I'll close the ticket, although I didn't see "Multiple logons/connections for this session are not allowed", but I guess that's because I'm using quickfixj-spring-boot-starter instead of'pure' quickfixj, will check on this, thanks again

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants