TcpIpConnectionManager puts connections in map under remote endpoint bind address, not under the address to which we are connecting #11256

Closed
mmedenjak opened this Issue Aug 29, 2017 · 3 comments

mmedenjak commented Aug 29, 2017

The behaviour was observed when testing WAN replication with the discovery SPI on EC2. The WAN connection manager tries to connect to the target endpoint by calling com.hazelcast.nio.ConnectionManager#getOrConnect(com.hazelcast.nio.Address), and it chooses the public IP of the EC2 instance returned by the AWS API. Unfortunately, Hazelcast instances started on EC2 automatically bind to the private IP rather than the public one. This can be fixed by setting the public IP in the Hazelcast config, but that complicates deployment and isn’t enough in some cases, for example when a cluster runs on EC2 instances and serves as a WAN endpoint at the same time. In that case, the cluster members will (usually) connect to the private IP while WAN will connect to the public IP. These requirements conflict, as an instance cannot bind itself to both the private and the public IP at the same time.
This is why we tend to get messages like these, both on the ACTIVE and PASSIVE cluster:

Wrong bind request from /34.202.160.90, identified as /107.22.159.0

and

Wrong bind request from [107.22.159.0]:5701! This node is not the requested endpoint: [184.72.209.93]:5701

One possible solution is to disable the spoofing checks with -Dhazelcast.nio.tcp.spoofing.checks=false and -Dhazelcast.socket.client.bind.any=false, and in some sense this is fine. The only objection so far is that the value "false" for hazelcast.socket.client.bind.any is counterintuitive - I would expect the value to be "true".
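For tests or embedded setups where passing -D flags is awkward, the same two properties can be set as JVM system properties in code. This is only a minimal sketch: the class name is hypothetical, the property names and values are the ones quoted above, and it assumes the properties are set before the Hazelcast instance is created.

```java
public class SpoofingCheckProperties {
    // Property names as quoted in this issue.
    static final String SPOOFING_CHECKS = "hazelcast.nio.tcp.spoofing.checks";
    static final String CLIENT_BIND_ANY = "hazelcast.socket.client.bind.any";

    // Equivalent of passing both -D flags on the command line.
    // Assumption: this runs before any Hazelcast instance is started.
    static void apply() {
        System.setProperty(SPOOFING_CHECKS, "false");
        System.setProperty(CLIENT_BIND_ANY, "false");
    }

    public static void main(String[] args) {
        apply();
        System.out.println(SPOOFING_CHECKS + "=" + System.getProperty(SPOOFING_CHECKS));
        System.out.println(CLIENT_BIND_ANY + "=" + System.getProperty(CLIENT_BIND_ANY));
    }
}
```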

This mostly works. The only remaining problem is that the TcpIpConnectionManager registers the connection in the connectionsMap under the address provided by the remote endpoint, specifically bind.getLocalAddress() - see

bind((TcpIpConnection) packet.getConn(), bind.getLocalAddress(), bind.getTargetAddress(), bind.shouldReply());

This will actually be the private IP of the remote endpoint, so even though the connection was requested on the public IP:
connectionManager.getOrConnect(remotePublicAddress)
the map contains the entry:
remotePrivateAddress ==> Connection[id=1, localPrivateAddress->, endpoint=remotePublicAddress, alive=true, type=MEMBER]

Because of this, the WAN connection manager keeps trying to re-establish a connection that already exists, just under a different address.
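The effect can be reproduced with a small stand-alone model. This is a sketch, not Hazelcast code: ConnectionMapSketch, Conn, and the example addresses are hypothetical, and the "also register under the requested address" branch is just one conceivable remedy, not necessarily what the eventual fix does.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConnectionMapSketch {
    // Hypothetical stand-in for a connection object.
    static final class Conn {
        final int id;
        Conn(int id) { this.id = id; }
    }

    private final Map<String, Conn> connectionsMap = new ConcurrentHashMap<>();
    private int nextId = 1;
    private int connectAttempts = 0;

    // Looks up by the requested address; on a miss, "connects" and registers
    // the connection under the address the remote endpoint reports in its bind.
    Conn getOrConnect(String requested, String remoteBind, boolean alsoKeyByRequested) {
        Conn existing = connectionsMap.get(requested);
        if (existing != null) {
            return existing;
        }
        connectAttempts++;
        Conn conn = new Conn(nextId++);
        connectionsMap.put(remoteBind, conn);      // behaviour described in this issue
        if (alsoKeyByRequested) {
            connectionsMap.put(requested, conn);   // assumed remedy, for illustration only
        }
        return conn;
    }

    // Number of connection attempts made by two consecutive lookups of the public address.
    static int attemptsAfterTwoLookups(boolean alsoKeyByRequested) {
        ConnectionMapSketch manager = new ConnectionMapSketch();
        String remotePublic = "203.0.113.10:5701";  // example address, not from the issue
        String remotePrivate = "10.0.0.5:5701";     // example address, not from the issue
        manager.getOrConnect(remotePublic, remotePrivate, alsoKeyByRequested);
        manager.getOrConnect(remotePublic, remotePrivate, alsoKeyByRequested);
        return manager.connectAttempts;
    }

    public static void main(String[] args) {
        System.out.println("keyed by bind address only: " + attemptsAfterTwoLookups(false));
        System.out.println("also keyed by requested address: " + attemptsAfterTwoLookups(true));
    }
}
```

With the connection keyed only by the remote bind address, every lookup of the public address misses and triggers another connection attempt; once the requested address is also a key, the second lookup reuses the existing connection.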

@mmedenjak mmedenjak changed the title Inconsistent behaviour of TcpIpConnectionManager when disabling spoofing checks TcpIpConnectionManager puts connections in map under remote endpoint bind address, not under the address to which we are connecting Sep 12, 2017

mmedenjak pushed a commit to mmedenjak/hazelcast that referenced this issue Sep 13, 2017

Matko Medenjak
Keep connection in connections map under remote address
This occurs if the endpoint is under NAT and it binds to one address
but we connect to another. We first try connecting to the public
address on the NAT.
If we disable binding checks with
-Dhazelcast.nio.tcp.spoofing.checks=false
-Dhazelcast.socket.client.bind.any=false
then the connection gets established but the connection will be put
in the connections map under the endpoint bind address (the address
that the endpoint sees behind the NAT), not under the
address to which we requested the connection to be established.

Alternatively, this could be fixed by setting the public address in
the configuration for both sides of the connection but this could
complicate deployment as we don't know the public address in advance
(e.g. AWS).

Fixes : hazelcast#11256

@mmedenjak mmedenjak added this to the 3.11 milestone May 10, 2018

@mmedenjak mmedenjak modified the milestones: 3.11, 3.12 Aug 27, 2018

mmedenjak commented Feb 25, 2019

@vbekiaris @tkountis can this issue be closed with the new TcpIpConnectionManager design?

tkountis commented Feb 25, 2019

This was one of the requirements.
Yes, it was fixed.

mmedenjak commented Feb 25, 2019

Thanks!

@mmedenjak mmedenjak closed this Feb 25, 2019
