Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[docker] Hazelcast nodes keep leaving and rejoining #11118

Closed
tdevopsottawa opened this issue Aug 16, 2017 · 9 comments
Closed

[docker] Hazelcast nodes keep leaving and rejoining #11118

tdevopsottawa opened this issue Aug 16, 2017 · 9 comments

Comments

@tdevopsottawa
Copy link

@tdevopsottawa tdevopsottawa commented Aug 16, 2017

I'm attempting to run a 3-node hazelcast cluster in docker (via the hazelcast-aws plugin) and I'm having this bizzare issue where nodes keep leaving and then rejoining. What happens is after the cluster initializes and all 3 nodes are listed as members, they suddenly start leaving one by one until there's only 1 node left; then they start coming back until all 3 are there, then they start leaving again, and so on and so on forever and ever. Here's the logs from the time nodes starting leaving to the time all 3 nodes were back:

16-Aug-2017 18:42:10.052 INFO [http-nio-8080-exec-8] com.hazelcast.session.HazelcastSessionManager.commit Thread name:http-nio-8080-exec-8 commited key:null
16-Aug-2017 18:42:10.210 INFO [http-nio-8080-exec-2] com.hazelcast.session.HazelcastSessionManager.commit Thread name:http-nio-8080-exec-2 commited key:null
2017-08-16 18:43:03.243 INFO 1 --- [thread-Acceptor] c.h.nio.tcp.SocketAcceptorThread : [172.31.29.145]:5702 [dev] [3.8.1] Accepting socket connection from /10.255.0.4:58597
2017-08-16 18:43:03.243 INFO 1 --- [cached.thread-3] c.h.nio.tcp.TcpIpConnectionManager : [172.31.29.145]:5702 [dev] [3.8.1] Established socket connection between /10.255.0.8:5702 and /10.255.0.4:58597
16-Aug-2017 18:43:03.247 INFO [hz.SESSION-REPLICATION-INSTANCE.cached.thread-1] com.hazelcast.nio.tcp.TcpIpConnectionManager.null [172.31.29.145]:5702 [dev] [3.8.3] Established socket connection between /172.18.0.3:58597 and /172.31.6.187:5702
2017-08-16 18:43:03.252 WARN 1 --- [.IO.thread-in-1] c.h.nio.tcp.TcpIpConnectionManager : [172.31.29.145]:5702 [dev] [3.8.1] Wrong bind request from [172.31.29.145]:5702! This node is not requested endpoint: [172.31.6.187]:5702
2017-08-16 18:43:03.252 INFO 1 --- [.IO.thread-in-1] com.hazelcast.nio.tcp.TcpIpConnection : [172.31.29.145]:5702 [dev] [3.8.1] Connection[id=5, /10.255.0.8:5702->/10.255.0.4:58597, endpoint=null, alive=false, type=MEMBER] closed. Reason: Wrong bind request from [172.31.29.145]:5702! This node is not requested endpoint: [172.31.6.187]:5702
16-Aug-2017 18:43:03.254 INFO [hz.SESSION-REPLICATION-INSTANCE.IO.thread-in-0] com.hazelcast.nio.tcp.TcpIpConnection.null [172.31.29.145]:5702 [dev] [3.8.3] Connection[id=1, /172.18.0.3:58597->/172.31.6.187:5702, endpoint=[172.31.6.187]:5702, alive=false, type=MEMBER] closed. Reason: Connection closed by the other side
16-Aug-2017 18:43:08.314 INFO [hz.SESSION-REPLICATION-INSTANCE.cached.thread-3] com.hazelcast.nio.tcp.TcpIpConnectionManager.null [172.31.29.145]:5702 [dev] [3.8.3] Established socket connection between /172.18.0.3:36533 and /172.31.43.12:5702
2017-08-16 18:43:08.315 INFO 1 --- [thread-Acceptor] c.h.nio.tcp.SocketAcceptorThread : [172.31.29.145]:5702 [dev] [3.8.1] Accepting socket connection from /10.255.0.3:36533
2017-08-16 18:43:08.315 INFO 1 --- [cached.thread-3] c.h.nio.tcp.TcpIpConnectionManager : [172.31.29.145]:5702 [dev] [3.8.1] Established socket connection between /10.255.0.8:5702 and /10.255.0.3:36533
2017-08-16 18:43:08.316 WARN 1 --- [.IO.thread-in-2] c.h.nio.tcp.TcpIpConnectionManager : [172.31.29.145]:5702 [dev] [3.8.1] Wrong bind request from [172.31.29.145]:5702! This node is not requested endpoint: [172.31.43.12]:5702
2017-08-16 18:43:08.316 INFO 1 --- [.IO.thread-in-2] com.hazelcast.nio.tcp.TcpIpConnection : [172.31.29.145]:5702 [dev] [3.8.1] Connection[id=6, /10.255.0.8:5702->/10.255.0.3:36533, endpoint=null, alive=false, type=MEMBER] closed. Reason: Wrong bind request from [172.31.29.145]:5702! This node is not requested endpoint: [172.31.43.12]:5702
16-Aug-2017 18:43:08.318 INFO [hz.SESSION-REPLICATION-INSTANCE.IO.thread-in-1] com.hazelcast.nio.tcp.TcpIpConnection.null [172.31.29.145]:5702 [dev] [3.8.3] Connection[id=2, /172.18.0.3:36533->/172.31.43.12:5702, endpoint=[172.31.43.12]:5702, alive=false, type=MEMBER] closed. Reason: Connection closed by the other side
16-Aug-2017 18:45:03.479 INFO [hz.SESSION-REPLICATION-INSTANCE.cached.thread-4] com.hazelcast.nio.tcp.TcpIpConnectionManager.null [172.31.29.145]:5702 [dev] [3.8.3] Established socket connection between /172.18.0.3:38045 and /172.31.43.12:5702
2017-08-16 18:45:03.491 WARN 1 --- [_1_dev.response] c.h.s.i.o.impl.InboundResponseHandler : [172.31.29.145]:5702 [dev] [3.8.1] No Invocation found for normal response with callId 122 sent from [172.31.43.12]:5702
2017-08-16 18:45:03.492 INFO 1 --- [.IO.thread-in-2] com.hazelcast.nio.tcp.TcpIpConnection : [172.31.29.145]:5702 [dev] [3.8.1] Connection[id=3, /172.18.0.3:53133->/172.31.43.12:5702, endpoint=[172.31.43.12]:5702, alive=false, type=MEMBER] closed. Reason: Connection closed by the other side
2017-08-16 18:45:03.495 INFO 1 --- [cached.thread-3] c.hazelcast.nio.tcp.InitConnectionTask : [172.31.29.145]:5702 [dev] [3.8.1] Connecting to /172.31.43.12:5702, timeout: 0, bind-any: true
2017-08-16 18:45:03.495 INFO 1 --- [.IO.thread-in-1] com.hazelcast.nio.tcp.TcpIpConnection : [172.31.29.145]:5702 [dev] [3.8.1] Connection[id=2, /10.255.0.8:5702->/10.255.0.2:35157, endpoint=[172.31.6.187]:5702, alive=false, type=MEMBER] closed. Reason: Connection closed by the other side
2017-08-16 18:45:03.496 INFO 1 --- [cached.thread-2] c.hazelcast.nio.tcp.InitConnectionTask : [172.31.29.145]:5702 [dev] [3.8.1] Connecting to /172.31.6.187:5702, timeout: 0, bind-any: true
2017-08-16 18:45:03.497 INFO 1 --- [cached.thread-3] c.h.nio.tcp.TcpIpConnectionManager : [172.31.29.145]:5702 [dev] [3.8.1] Established socket connection between /172.18.0.3:40843 and /172.31.43.12:5702
2017-08-16 18:45:03.498 INFO 1 --- [cached.thread-2] c.h.nio.tcp.TcpIpConnectionManager : [172.31.29.145]:5702 [dev] [3.8.1] Established socket connection between /172.18.0.3:41217 and /172.31.6.187:5702
2017-08-16 18:45:03.501 INFO 1 --- [.IO.thread-in-1] com.hazelcast.nio.tcp.TcpIpConnection : [172.31.29.145]:5702 [dev] [3.8.1] Connection[id=8, /172.18.0.3:41217->/172.31.6.187:5702, endpoint=[172.31.6.187]:5702, alive=false, type=MEMBER] closed. Reason: Connection closed by the other side
2017-08-16 18:45:03.501 INFO 1 --- [.IO.thread-in-0] com.hazelcast.nio.tcp.TcpIpConnection : [172.31.29.145]:5702 [dev] [3.8.1] Connection[id=7, /172.18.0.3:40843->/172.31.43.12:5702, endpoint=[172.31.43.12]:5702, alive=false, type=MEMBER] closed. Reason: Connection closed by the other side
2017-08-16 18:45:03.502 INFO 1 --- [cached.thread-2] c.hazelcast.nio.tcp.InitConnectionTask : [172.31.29.145]:5702 [dev] [3.8.1] Connecting to /172.31.43.12:5702, timeout: 0, bind-any: true
2017-08-16 18:45:03.502 INFO 1 --- [cached.thread-3] c.hazelcast.nio.tcp.InitConnectionTask : [172.31.29.145]:5702 [dev] [3.8.1] Connecting to /172.31.6.187:5702, timeout: 0, bind-any: true
2017-08-16 18:45:03.504 INFO 1 --- [cached.thread-3] c.h.nio.tcp.TcpIpConnectionManager : [172.31.29.145]:5702 [dev] [3.8.1] Established socket connection between /172.18.0.3:58761 and /172.31.6.187:5702
2017-08-16 18:45:03.504 INFO 1 --- [cached.thread-2] c.h.nio.tcp.TcpIpConnectionManager : [172.31.29.145]:5702 [dev] [3.8.1] Established socket connection between /172.18.0.3:55539 and /172.31.43.12:5702
2017-08-16 18:45:03.504 INFO 1 --- [thread-Acceptor] c.h.nio.tcp.SocketAcceptorThread : [172.31.29.145]:5702 [dev] [3.8.1] Accepting socket connection from /10.255.0.4:58761
2017-08-16 18:45:03.505 INFO 1 --- [cached.thread-2] c.h.nio.tcp.TcpIpConnectionManager : [172.31.29.145]:5702 [dev] [3.8.1] Established socket connection between /10.255.0.8:5702 and /10.255.0.4:58761
2017-08-16 18:45:03.505 INFO 1 --- [thread-Acceptor] c.h.nio.tcp.SocketAcceptorThread : [172.31.29.145]:5702 [dev] [3.8.1] Accepting socket connection from /10.255.0.3:55539
2017-08-16 18:45:03.505 INFO 1 --- [cached.thread-2] c.h.nio.tcp.TcpIpConnectionManager : [172.31.29.145]:5702 [dev] [3.8.1] Established socket connection between /10.255.0.8:5702 and /10.255.0.3:55539
2017-08-16 18:45:03.506 WARN 1 --- [.IO.thread-in-1] c.h.nio.tcp.TcpIpConnectionManager : [172.31.29.145]:5702 [dev] [3.8.1] Wrong bind request from [172.31.29.145]:5702! This node is not requested endpoint: [172.31.6.187]:5702
2017-08-16 18:45:03.508 INFO 1 --- [.IO.thread-in-1] com.hazelcast.nio.tcp.TcpIpConnection : [172.31.29.145]:5702 [dev] [3.8.1] Connection[id=11, /10.255.0.8:5702->/10.255.0.4:58761, endpoint=null, alive=false, type=MEMBER] closed. Reason: Wrong bind request from [172.31.29.145]:5702! This node is not requested endpoint: [172.31.6.187]:5702
2017-08-16 18:45:03.509 WARN 1 --- [.IO.thread-in-2] c.h.nio.tcp.TcpIpConnectionManager : [172.31.29.145]:5702 [dev] [3.8.1] Wrong bind request from [172.31.29.145]:5702! This node is not requested endpoint: [172.31.43.12]:5702
2017-08-16 18:45:03.509 INFO 1 --- [.IO.thread-in-2] com.hazelcast.nio.tcp.TcpIpConnection : [172.31.29.145]:5702 [dev] [3.8.1] Connection[id=12, /10.255.0.8:5702->/10.255.0.3:55539, endpoint=null, alive=false, type=MEMBER] closed. Reason: Wrong bind request from [172.31.29.145]:5702! This node is not requested endpoint: [172.31.43.12]:5702
2017-08-16 18:45:03.510 INFO 1 --- [.IO.thread-in-2] com.hazelcast.nio.tcp.TcpIpConnection : [172.31.29.145]:5702 [dev] [3.8.1] Connection[id=9, /172.18.0.3:58761->/172.31.6.187:5702, endpoint=[172.31.6.187]:5702, alive=false, type=MEMBER] closed. Reason: Connection closed by the other side
2017-08-16 18:45:03.510 INFO 1 --- [.IO.thread-in-0] com.hazelcast.nio.tcp.TcpIpConnection : [172.31.29.145]:5702 [dev] [3.8.1] Connection[id=10, /172.18.0.3:55539->/172.31.43.12:5702, endpoint=[172.31.43.12]:5702, alive=false, type=NONE] closed. Reason: Connection closed by the other side
2017-08-16 18:45:03.511 INFO 1 --- [cached.thread-2] c.hazelcast.nio.tcp.InitConnectionTask : [172.31.29.145]:5702 [dev] [3.8.1] Connecting to /172.31.43.12:5702, timeout: 0, bind-any: true
2017-08-16 18:45:03.514 INFO 1 --- [cached.thread-4] c.hazelcast.nio.tcp.InitConnectionTask : [172.31.29.145]:5702 [dev] [3.8.1] Connecting to /172.31.6.187:5702, timeout: 0, bind-any: true
2017-08-16 18:45:03.515 INFO 1 --- [cached.thread-2] c.h.nio.tcp.TcpIpConnectionManager : [172.31.29.145]:5702 [dev] [3.8.1] Established socket connection between /172.18.0.3:45973 and /172.31.43.12:5702
2017-08-16 18:45:03.516 INFO 1 --- [cached.thread-4] c.h.nio.tcp.TcpIpConnectionManager : [172.31.29.145]:5702 [dev] [3.8.1] Established socket connection between /172.18.0.3:36237 and /172.31.6.187:5702
2017-08-16 18:45:10.930 INFO 1 --- [thread-Acceptor] c.h.nio.tcp.SocketAcceptorThread : [172.31.29.145]:5702 [dev] [3.8.1] Accepting socket connection from /10.255.0.2:38283
2017-08-16 18:45:10.930 INFO 1 --- [cached.thread-4] c.h.nio.tcp.TcpIpConnectionManager : [172.31.29.145]:5702 [dev] [3.8.1] Established socket connection between /10.255.0.8:5702 and /10.255.0.2:38283
2017-08-16 18:45:10.942 INFO 1 --- [ration.thread-0] c.i.o.SplitBrainMergeValidationOperation : [172.31.29.145]:5702 [dev] [3.8.1] Ignoring join check from [172.31.6.187]:5702, because this node is not master...
2017-08-16 18:45:10.944 INFO 1 --- [ration.thread-1] c.h.internal.cluster.ClusterService : [172.31.29.145]:5702 [dev] [3.8.1] Old master [172.31.43.12]:5702 left the cluster, assigning new master Member [172.31.29.145]:5702 - 8f118025-3b21-49dc-8e6c-152e26083337 this
2017-08-16 18:45:10.945 INFO 1 --- [ration.thread-1] com.hazelcast.nio.tcp.TcpIpConnection : [172.31.29.145]:5702 [dev] [3.8.1] Connection[id=13, /172.18.0.3:45973->/172.31.43.12:5702, endpoint=[172.31.43.12]:5702, alive=false, type=MEMBER] closed. Reason: Removing member [172.31.43.12]:5702, requested by: [172.31.43.12]:5702
2017-08-16 18:45:10.945 INFO 1 --- [ration.thread-1] c.h.internal.cluster.ClusterService : [172.31.29.145]:5702 [dev] [3.8.1] Removing Member [172.31.43.12]:5702 - ad2e4981-dc68-4252-8a8b-db1b6f9818dc
2017-08-16 18:45:10.953 INFO 1 --- [ration.thread-1] c.h.internal.cluster.ClusterService : [172.31.29.145]:5702 [dev] [3.8.1]
Members [2] {
Member [172.31.29.145]:5702 - 8f118025-3b21-49dc-8e6c-152e26083337 this
Member [172.31.6.187]:5702 - fe84b906-9463-4a8f-80df-cbb0afa1b435
}
2017-08-16 18:45:10.955 INFO 1 --- [cached.thread-4] c.h.t.TransactionManagerService : [172.31.29.145]:5702 [dev] [3.8.1] Committing/rolling-back alive transactions of Member [172.31.43.12]:5702 - ad2e4981-dc68-4252-8a8b-db1b6f9818dc, UUID: ad2e4981-dc68-4252-8a8b-db1b6f9818dc
2017-08-16 18:45:10.957 INFO 1 --- [1_dev.migration] c.h.i.p.InternalPartitionService : [172.31.29.145]:5702 [dev] [3.8.1] Fetching most recent partition table! my version: 0
16-Aug-2017 18:45:13.491 INFO [hz.SESSION-REPLICATION-INSTANCE.cached.thread-4] com.hazelcast.nio.tcp.TcpIpConnectionManager.null [172.31.29.145]:5702 [dev] [3.8.3] Established socket connection between /172.18.0.3:50077 and /172.31.6.187:5702
16-Aug-2017 18:45:13.494 INFO [hz.SESSION-REPLICATION-INSTANCE.IO.thread-in-0] com.hazelcast.nio.tcp.TcpIpConnection.null [172.31.29.145]:5702 [dev] [3.8.3] Connection[id=4, /172.18.0.3:50077->/172.31.6.187:5702, endpoint=[172.31.6.187]:5702, alive=false, type=MEMBER] closed. Reason: Connection closed by the other side
2017-08-16 18:45:23.303 INFO 1 --- [cached.thread-4] c.h.nio.tcp.TcpIpConnectionManager : [172.31.29.145]:5702 [dev] [3.8.1] Established socket connection between /172.18.0.3:55309 and /172.31.43.12:5702
2017-08-16 18:45:23.304 INFO 1 --- [thread-Acceptor] c.h.nio.tcp.SocketAcceptorThread : [172.31.29.145]:5702 [dev] [3.8.1] Accepting socket connection from /10.255.0.3:55309
2017-08-16 18:45:23.304 INFO 1 --- [cached.thread-1] c.h.nio.tcp.TcpIpConnectionManager : [172.31.29.145]:5702 [dev] [3.8.1] Established socket connection between /10.255.0.8:5702 and /10.255.0.3:55309
2017-08-16 18:45:23.305 WARN 1 --- [.IO.thread-in-1] c.h.nio.tcp.TcpIpConnectionManager : [172.31.29.145]:5702 [dev] [3.8.1] Wrong bind request from [172.31.29.145]:5702! This node is not requested endpoint: [172.31.43.12]:5702
2017-08-16 18:45:23.305 INFO 1 --- [.IO.thread-in-1] com.hazelcast.nio.tcp.TcpIpConnection : [172.31.29.145]:5702 [dev] [3.8.1] Connection[id=17, /10.255.0.8:5702->/10.255.0.3:55309, endpoint=null, alive=false, type=MEMBER] closed. Reason: Wrong bind request from [172.31.29.145]:5702! This node is not requested endpoint: [172.31.43.12]:5702
2017-08-16 18:45:23.306 INFO 1 --- [.IO.thread-in-0] com.hazelcast.nio.tcp.TcpIpConnection : [172.31.29.145]:5702 [dev] [3.8.1] Connection[id=16, /172.18.0.3:55309->/172.31.43.12:5702, endpoint=[172.31.43.12]:5702, alive=false, type=MEMBER] closed. Reason: Connection closed by the other side
2017-08-16 18:45:58.722 WARN 1 --- [_1_dev.response] c.h.s.i.o.impl.Invocation : [172.31.29.145]:5702 [dev] [3.8.1] Retrying invocation: Invocation{op=com.hazelcast.internal.partition.operation.FetchPartitionStateOperation{serviceName='hz:core:partitionService', identityHash=2093413797, partitionId=-1, replicaIndex=0, callId=107, invocationTime=1502909158716 (2017-08-16 18:45:58.716), waitTimeout=-1, callTimeout=9223372036854775807}, tryCount=2147483647, tryPauseMillis=500, invokeCount=100, callTimeoutMillis=9223372036854775807, firstInvocationTimeMs=1502909110956, firstInvocationTime='2017-08-16 18:45:10.956', lastHeartbeatMillis=0, lastHeartbeatTime='1970-01-01 00:00:00.000', target=[172.31.6.187]:5702, pendingResponse={VOID}, backupsAcksExpected=0, backupsAcksReceived=0, connection=Connection[id=14, /172.18.0.3:36237->/172.31.6.187:5702, endpoint=[172.31.6.187]:5702, alive=true, type=MEMBER]}, Reason: com.hazelcast.spi.exception.CallerNotMemberException: Not Member! this: [172.31.6.187]:5702, caller: [172.31.29.145]:5702, partitionId: -1, operation: com.hazelcast.internal.partition.operation.FetchPartitionStateOperation, service: hz:core:partitionService
2017-08-16 18:46:03.744 WARN 1 --- [_1_dev.response] c.h.s.i.o.impl.Invocation : [172.31.29.145]:5702 [dev] [3.8.1] Retrying invocation: Invocation{op=com.hazelcast.internal.partition.operation.FetchPartitionStateOperation{serviceName='hz:core:partitionService', identityHash=2093413797, partitionId=-1, replicaIndex=0, callId=117, invocationTime=1502909163738 (2017-08-16 18:46:03.738), waitTimeout=-1, callTimeout=9223372036854775807}, tryCount=2147483647, tryPauseMillis=500, invokeCount=110, callTimeoutMillis=9223372036854775807, firstInvocationTimeMs=1502909110956, firstInvocationTime='2017-08-16 18:45:10.956', lastHeartbeatMillis=0, lastHeartbeatTime='1970-01-01 00:00:00.000', target=[172.31.6.187]:5702, pendingResponse={VOID}, backupsAcksExpected=0, backupsAcksReceived=0, connection=Connection[id=14, /172.18.0.3:36237->/172.31.6.187:5702, endpoint=[172.31.6.187]:5702, alive=true, type=MEMBER]}, Reason: com.hazelcast.spi.exception.CallerNotMemberException: Not Member! this: [172.31.6.187]:5702, caller: [172.31.29.145]:5702, partitionId: -1, operation: com.hazelcast.internal.partition.operation.FetchPartitionStateOperation, service: hz:core:partitionService
2017-08-16 18:46:08.765 WARN 1 --- [_1_dev.response] c.h.s.i.o.impl.Invocation : [172.31.29.145]:5702 [dev] [3.8.1] Retrying invocation: Invocation{op=com.hazelcast.internal.partition.operation.FetchPartitionStateOperation{serviceName='hz:core:partitionService', identityHash=2093413797, partitionId=-1, replicaIndex=0, callId=127, invocationTime=1502909168759 (2017-08-16 18:46:08.759), waitTimeout=-1, callTimeout=9223372036854775807}, tryCount=2147483647, tryPauseMillis=500, invokeCount=120, callTimeoutMillis=9223372036854775807, firstInvocationTimeMs=1502909110956, firstInvocationTime='2017-08-16 18:45:10.956', lastHeartbeatMillis=0, lastHeartbeatTime='1970-01-01 00:00:00.000', target=[172.31.6.187]:5702, pendingResponse={VOID}, backupsAcksExpected=0, backupsAcksReceived=0, connection=Connection[id=14, /172.18.0.3:36237->/172.31.6.187:5702, endpoint=[172.31.6.187]:5702, alive=true, type=MEMBER]}, Reason: com.hazelcast.spi.exception.CallerNotMemberException: Not Member! this: [172.31.6.187]:5702, caller: [172.31.29.145]:5702, partitionId: -1, operation: com.hazelcast.internal.partition.operation.FetchPartitionStateOperation, service: hz:core:partitionService
2017-08-16 18:46:13.789 WARN 1 --- [_1_dev.response] c.h.s.i.o.impl.Invocation : [172.31.29.145]:5702 [dev] [3.8.1] Retrying invocation: Invocation{op=com.hazelcast.internal.partition.operation.FetchPartitionStateOperation{serviceName='hz:core:partitionService', identityHash=2093413797, partitionId=-1, replicaIndex=0, callId=137, invocationTime=1502909173783 (2017-08-16 18:46:13.783), waitTimeout=-1, callTimeout=9223372036854775807}, tryCount=2147483647, tryPauseMillis=500, invokeCount=130, callTimeoutMillis=9223372036854775807, firstInvocationTimeMs=1502909110956, firstInvocationTime='2017-08-16 18:45:10.956', lastHeartbeatMillis=0, lastHeartbeatTime='1970-01-01 00:00:00.000', target=[172.31.6.187]:5702, pendingResponse={VOID}, backupsAcksExpected=0, backupsAcksReceived=0, connection=Connection[id=14, /172.18.0.3:36237->/172.31.6.187:5702, endpoint=[172.31.6.187]:5702, alive=true, type=MEMBER]}, Reason: com.hazelcast.spi.exception.CallerNotMemberException: Not Member! this: [172.31.6.187]:5702, caller: [172.31.29.145]:5702, partitionId: -1, operation: com.hazelcast.internal.partition.operation.FetchPartitionStateOperation, service: hz:core:partitionService
2017-08-16 18:46:18.812 WARN 1 --- [_1_dev.response] c.h.s.i.o.impl.Invocation : [172.31.29.145]:5702 [dev] [3.8.1] Retrying invocation: Invocation{op=com.hazelcast.internal.partition.operation.FetchPartitionStateOperation{serviceName='hz:core:partitionService', identityHash=2093413797, partitionId=-1, replicaIndex=0, callId=147, invocationTime=1502909178805 (2017-08-16 18:46:18.805), waitTimeout=-1, callTimeout=9223372036854775807}, tryCount=2147483647, tryPauseMillis=500, invokeCount=140, callTimeoutMillis=9223372036854775807, firstInvocationTimeMs=1502909110956, firstInvocationTime='2017-08-16 18:45:10.956', lastHeartbeatMillis=0, lastHeartbeatTime='1970-01-01 00:00:00.000', target=[172.31.6.187]:5702, pendingResponse={VOID}, backupsAcksExpected=0, backupsAcksReceived=0, connection=Connection[id=14, /172.18.0.3:36237->/172.31.6.187:5702, endpoint=[172.31.6.187]:5702, alive=true, type=MEMBER]}, Reason: com.hazelcast.spi.exception.CallerNotMemberException: Not Member! this: [172.31.6.187]:5702, caller: [172.31.29.145]:5702, partitionId: -1, operation: com.hazelcast.internal.partition.operation.FetchPartitionStateOperation, service: hz:core:partitionService
2017-08-16 18:46:23.840 WARN 1 --- [_1_dev.response] c.h.s.i.o.impl.Invocation : [172.31.29.145]:5702 [dev] [3.8.1] Retrying invocation: Invocation{op=com.hazelcast.internal.partition.operation.FetchPartitionStateOperation{serviceName='hz:core:partitionService', identityHash=2093413797, partitionId=-1, replicaIndex=0, callId=157, invocationTime=1502909183834 (2017-08-16 18:46:23.834), waitTimeout=-1, callTimeout=9223372036854775807}, tryCount=2147483647, tryPauseMillis=500, invokeCount=150, callTimeoutMillis=9223372036854775807, firstInvocationTimeMs=1502909110956, firstInvocationTime='2017-08-16 18:45:10.956', lastHeartbeatMillis=0, lastHeartbeatTime='1970-01-01 00:00:00.000', target=[172.31.6.187]:5702, pendingResponse={VOID}, backupsAcksExpected=0, backupsAcksReceived=0, connection=Connection[id=14, /172.18.0.3:36237->/172.31.6.187:5702, endpoint=[172.31.6.187]:5702, alive=true, type=MEMBER]}, Reason: com.hazelcast.spi.exception.CallerNotMemberException: Not Member! this: [172.31.6.187]:5702, caller: [172.31.29.145]:5702, partitionId: -1, operation: com.hazelcast.internal.partition.operation.FetchPartitionStateOperation, service: hz:core:partitionService
2017-08-16 18:46:28.862 WARN 1 --- [_1_dev.response] c.h.s.i.o.impl.Invocation : [172.31.29.145]:5702 [dev] [3.8.1] Retrying invocation: Invocation{op=com.hazelcast.internal.partition.operation.FetchPartitionStateOperation{serviceName='hz:core:partitionService', identityHash=2093413797, partitionId=-1, replicaIndex=0, callId=167, invocationTime=1502909188856 (2017-08-16 18:46:28.856), waitTimeout=-1, callTimeout=9223372036854775807}, tryCount=2147483647, tryPauseMillis=500, invokeCount=160, callTimeoutMillis=9223372036854775807, firstInvocationTimeMs=1502909110956, firstInvocationTime='2017-08-16 18:45:10.956', lastHeartbeatMillis=0, lastHeartbeatTime='1970-01-01 00:00:00.000', target=[172.31.6.187]:5702, pendingResponse={VOID}, backupsAcksExpected=0, backupsAcksReceived=0, connection=Connection[id=14, /172.18.0.3:36237->/172.31.6.187:5702, endpoint=[172.31.6.187]:5702, alive=true, type=MEMBER]}, Reason: com.hazelcast.spi.exception.CallerNotMemberException: Not Member! this: [172.31.6.187]:5702, caller: [172.31.29.145]:5702, partitionId: -1, operation: com.hazelcast.internal.partition.operation.FetchPartitionStateOperation, service: hz:core:partitionService
2017-08-16 18:46:33.883 WARN 1 --- [_1_dev.response] c.h.s.i.o.impl.Invocation : [172.31.29.145]:5702 [dev] [3.8.1] Retrying invocation: Invocation{op=com.hazelcast.internal.partition.operation.FetchPartitionStateOperation{serviceName='hz:core:partitionService', identityHash=2093413797, partitionId=-1, replicaIndex=0, callId=177, invocationTime=1502909193877 (2017-08-16 18:46:33.877), waitTimeout=-1, callTimeout=9223372036854775807}, tryCount=2147483647, tryPauseMillis=500, invokeCount=170, callTimeoutMillis=9223372036854775807, firstInvocationTimeMs=1502909110956, firstInvocationTime='2017-08-16 18:45:10.956', lastHeartbeatMillis=0, lastHeartbeatTime='1970-01-01 00:00:00.000', target=[172.31.6.187]:5702, pendingResponse={VOID}, backupsAcksExpected=0, backupsAcksReceived=0, connection=Connection[id=14, /172.18.0.3:36237->/172.31.6.187:5702, endpoint=[172.31.6.187]:5702, alive=true, type=MEMBER]}, Reason: com.hazelcast.spi.exception.CallerNotMemberException: Not Member! this: [172.31.6.187]:5702, caller: [172.31.29.145]:5702, partitionId: -1, operation: com.hazelcast.internal.partition.operation.FetchPartitionStateOperation, service: hz:core:partitionService
2017-08-16 18:46:38.904 WARN 1 --- [_1_dev.response] c.h.s.i.o.impl.Invocation : [172.31.29.145]:5702 [dev] [3.8.1] Retrying invocation: Invocation{op=com.hazelcast.internal.partition.operation.FetchPartitionStateOperation{serviceName='hz:core:partitionService', identityHash=2093413797, partitionId=-1, replicaIndex=0, callId=187, invocationTime=1502909198898 (2017-08-16 18:46:38.898), waitTimeout=-1, callTimeout=9223372036854775807}, tryCount=2147483647, tryPauseMillis=500, invokeCount=180, callTimeoutMillis=9223372036854775807, firstInvocationTimeMs=1502909110956, firstInvocationTime='2017-08-16 18:45:10.956', lastHeartbeatMillis=0, lastHeartbeatTime='1970-01-01 00:00:00.000', target=[172.31.6.187]:5702, pendingResponse={VOID}, backupsAcksExpected=0, backupsAcksReceived=0, connection=Connection[id=14, /172.18.0.3:36237->/172.31.6.187:5702, endpoint=[172.31.6.187]:5702, alive=true, type=MEMBER]}, Reason: com.hazelcast.spi.exception.CallerNotMemberException: Not Member! this: [172.31.6.187]:5702, caller: [172.31.29.145]:5702, partitionId: -1, operation: com.hazelcast.internal.partition.operation.FetchPartitionStateOperation, service: hz:core:partitionService
2017-08-16 18:46:43.928 WARN 1 --- [_1_dev.response] c.h.s.i.o.impl.Invocation : [172.31.29.145]:5702 [dev] [3.8.1] Retrying invocation: Invocation{op=com.hazelcast.internal.partition.operation.FetchPartitionStateOperation{serviceName='hz:core:partitionService', identityHash=2093413797, partitionId=-1, replicaIndex=0, callId=197, invocationTime=1502909203922 (2017-08-16 18:46:43.922), waitTimeout=-1, callTimeout=9223372036854775807}, tryCount=2147483647, tryPauseMillis=500, invokeCount=190, callTimeoutMillis=9223372036854775807, firstInvocationTimeMs=1502909110956, firstInvocationTime='2017-08-16 18:45:10.956', lastHeartbeatMillis=0, lastHeartbeatTime='1970-01-01 00:00:00.000', target=[172.31.6.187]:5702, pendingResponse={VOID}, backupsAcksExpected=0, backupsAcksReceived=0, connection=Connection[id=14, /172.18.0.3:36237->/172.31.6.187:5702, endpoint=[172.31.6.187]:5702, alive=true, type=MEMBER]}, Reason: com.hazelcast.spi.exception.CallerNotMemberException: Not Member! this: [172.31.6.187]:5702, caller: [172.31.29.145]:5702, partitionId: -1, operation: com.hazelcast.internal.partition.operation.FetchPartitionStateOperation, service: hz:core:partitionService
2017-08-16 18:46:48.948 WARN 1 --- [_1_dev.response] c.h.s.i.o.impl.Invocation : [172.31.29.145]:5702 [dev] [3.8.1] Retrying invocation: Invocation{op=com.hazelcast.internal.partition.operation.FetchPartitionStateOperation{serviceName='hz:core:partitionService', identityHash=2093413797, partitionId=-1, replicaIndex=0, callId=207, invocationTime=1502909208942 (2017-08-16 18:46:48.942), waitTimeout=-1, callTimeout=9223372036854775807}, tryCount=2147483647, tryPauseMillis=500, invokeCount=200, callTimeoutMillis=9223372036854775807, firstInvocationTimeMs=1502909110956, firstInvocationTime='2017-08-16 18:45:10.956', lastHeartbeatMillis=0, lastHeartbeatTime='1970-01-01 00:00:00.000', target=[172.31.6.187]:5702, pendingResponse={VOID}, backupsAcksExpected=0, backupsAcksReceived=0, connection=Connection[id=14, /172.18.0.3:36237->/172.31.6.187:5702, endpoint=[172.31.6.187]:5702, alive=true, type=MEMBER]}, Reason: com.hazelcast.spi.exception.CallerNotMemberException: Not Member! this: [172.31.6.187]:5702, caller: [172.31.29.145]:5702, partitionId: -1, operation: com.hazelcast.internal.partition.operation.FetchPartitionStateOperation, service: hz:core:partitionService
2017-08-16 18:46:53.968 WARN 1 --- [_1_dev.response] c.h.s.i.o.impl.Invocation : [172.31.29.145]:5702 [dev] [3.8.1] Retrying invocation: Invocation{op=com.hazelcast.internal.partition.operation.FetchPartitionStateOperation{serviceName='hz:core:partitionService', identityHash=2093413797, partitionId=-1, replicaIndex=0, callId=217, invocationTime=1502909213962 (2017-08-16 18:46:53.962), waitTimeout=-1, callTimeout=9223372036854775807}, tryCount=2147483647, tryPauseMillis=500, invokeCount=210, callTimeoutMillis=9223372036854775807, firstInvocationTimeMs=1502909110956, firstInvocationTime='2017-08-16 18:45:10.956', lastHeartbeatMillis=0, lastHeartbeatTime='1970-01-01 00:00:00.000', target=[172.31.6.187]:5702, pendingResponse={VOID}, backupsAcksExpected=0, backupsAcksReceived=0, connection=Connection[id=14, /172.18.0.3:36237->/172.31.6.187:5702, endpoint=[172.31.6.187]:5702, alive=true, type=MEMBER]}, Reason: com.hazelcast.spi.exception.CallerNotMemberException: Not Member! this: [172.31.6.187]:5702, caller: [172.31.29.145]:5702, partitionId: -1, operation: com.hazelcast.internal.partition.operation.FetchPartitionStateOperation, service: hz:core:partitionService
2017-08-16 18:46:58.989 WARN 1 --- [_1_dev.response] c.h.s.i.o.impl.Invocation : [172.31.29.145]:5702 [dev] [3.8.1] Retrying invocation: Invocation{op=com.hazelcast.internal.partition.operation.FetchPartitionStateOperation{serviceName='hz:core:partitionService', identityHash=2093413797, partitionId=-1, replicaIndex=0, callId=227, invocationTime=1502909218983 (2017-08-16 18:46:58.983), waitTimeout=-1, callTimeout=9223372036854775807}, tryCount=2147483647, tryPauseMillis=500, invokeCount=220, callTimeoutMillis=9223372036854775807, firstInvocationTimeMs=1502909110956, firstInvocationTime='2017-08-16 18:45:10.956', lastHeartbeatMillis=0, lastHeartbeatTime='1970-01-01 00:00:00.000', target=[172.31.6.187]:5702, pendingResponse={VOID}, backupsAcksExpected=0, backupsAcksReceived=0, connection=Connection[id=14, /172.18.0.3:36237->/172.31.6.187:5702, endpoint=[172.31.6.187]:5702, alive=true, type=MEMBER]}, Reason: com.hazelcast.spi.exception.CallerNotMemberException: Not Member! this: [172.31.6.187]:5702, caller: [172.31.29.145]:5702, partitionId: -1, operation: com.hazelcast.internal.partition.operation.FetchPartitionStateOperation, service: hz:core:partitionService
16-Aug-2017 18:47:03.257 INFO [hz.SESSION-REPLICATION-INSTANCE.cached.thread-4] com.hazelcast.nio.tcp.TcpIpConnectionManager.null [172.31.29.145]:5702 [dev] [3.8.3] Established socket connection between /172.18.0.3:60905 and /172.31.6.187:5702
2017-08-16 18:47:03.257 INFO 1 --- [thread-Acceptor] c.h.nio.tcp.SocketAcceptorThread : [172.31.29.145]:5702 [dev] [3.8.1] Accepting socket connection from /10.255.0.4:60905
2017-08-16 18:47:03.258 INFO 1 --- [cached.thread-1] c.h.nio.tcp.TcpIpConnectionManager : [172.31.29.145]:5702 [dev] [3.8.1] Established socket connection between /10.255.0.8:5702 and /10.255.0.4:60905
2017-08-16 18:47:03.258 WARN 1 --- [.IO.thread-in-2] c.h.nio.tcp.TcpIpConnectionManager : [172.31.29.145]:5702 [dev] [3.8.1] Wrong bind request from [172.31.29.145]:5702! This node is not requested endpoint: [172.31.6.187]:5702
2017-08-16 18:47:03.258 INFO 1 --- [.IO.thread-in-2] com.hazelcast.nio.tcp.TcpIpConnection : [172.31.29.145]:5702 [dev] [3.8.1] Connection[id=18, /10.255.0.8:5702->/10.255.0.4:60905, endpoint=null, alive=false, type=MEMBER] closed. Reason: Wrong bind request from [172.31.29.145]:5702! This node is not requested endpoint: [172.31.6.187]:5702
16-Aug-2017 18:47:03.259 INFO [hz.SESSION-REPLICATION-INSTANCE.IO.thread-in-1] com.hazelcast.nio.tcp.TcpIpConnection.null [172.31.29.145]:5702 [dev] [3.8.3] Connection[id=5, /172.18.0.3:60905->/172.31.6.187:5702, endpoint=[172.31.6.187]:5702, alive=false, type=MEMBER] closed. Reason: Connection closed by the other side
2017-08-16 18:47:04.010 WARN 1 --- [_1_dev.response] c.h.s.i.o.impl.Invocation : [172.31.29.145]:5702 [dev] [3.8.1] Retrying invocation: Invocation{op=com.hazelcast.internal.partition.operation.FetchPartitionStateOperation{serviceName='hz:core:partitionService', identityHash=2093413797, partitionId=-1, replicaIndex=0, callId=237, invocationTime=1502909224004 (2017-08-16 18:47:04.004), waitTimeout=-1, callTimeout=9223372036854775807}, tryCount=2147483647, tryPauseMillis=500, invokeCount=230, callTimeoutMillis=9223372036854775807, firstInvocationTimeMs=1502909110956, firstInvocationTime='2017-08-16 18:45:10.956', lastHeartbeatMillis=0, lastHeartbeatTime='1970-01-01 00:00:00.000', target=[172.31.6.187]:5702, pendingResponse={VOID}, backupsAcksExpected=0, backupsAcksReceived=0, connection=Connection[id=14, /172.18.0.3:36237->/172.31.6.187:5702, endpoint=[172.31.6.187]:5702, alive=true, type=MEMBER]}, Reason: com.hazelcast.spi.exception.CallerNotMemberException: Not Member! this: [172.31.6.187]:5702, caller: [172.31.29.145]:5702, partitionId: -1, operation: com.hazelcast.internal.partition.operation.FetchPartitionStateOperation, service: hz:core:partitionService
16-Aug-2017 18:47:04.106 INFO [hz.SESSION-REPLICATION-INSTANCE.InvocationMonitorThread] com.hazelcast.spi.impl.operationservice.impl.InvocationMonitor.null [172.31.29.145]:5702 [dev] [3.8.3] Invocations:1 timeouts:1 backup-timeouts:0
2017-08-16 18:47:05.831 INFO 1 --- [ration.thread-0] c.i.o.SplitBrainMergeValidationOperation : [172.31.29.145]:5702 [dev] [3.8.1] Removing [172.31.6.187]:5702, since it thinks it's already split from this cluster and looking to merge.
2017-08-16 18:47:05.831 INFO 1 --- [ration.thread-0] com.hazelcast.nio.tcp.TcpIpConnection : [172.31.29.145]:5702 [dev] [3.8.1] Connection[id=14, /172.18.0.3:36237->/172.31.6.187:5702, endpoint=[172.31.6.187]:5702, alive=false, type=MEMBER] closed. Reason: Removing [172.31.6.187]:5702, since it thinks it's already split from this cluster and looking to merge.
2017-08-16 18:47:05.832 INFO 1 --- [ration.thread-0] c.h.internal.cluster.ClusterService : [172.31.29.145]:5702 [dev] [3.8.1] Removing Member [172.31.6.187]:5702 - fe84b906-9463-4a8f-80df-cbb0afa1b435
2017-08-16 18:47:05.832 INFO 1 --- [ration.thread-0] c.h.internal.cluster.ClusterService : [172.31.29.145]:5702 [dev] [3.8.1]
Members [1] {
Member [172.31.29.145]:5702 - 8f118025-3b21-49dc-8e6c-152e26083337 this
}
2017-08-16 18:47:05.833 INFO 1 --- [cached.thread-1] c.h.t.TransactionManagerService : [172.31.29.145]:5702 [dev] [3.8.1] Committing/rolling-back alive transactions of Member [172.31.6.187]:5702 - fe84b906-9463-4a8f-80df-cbb0afa1b435, UUID: fe84b906-9463-4a8f-80df-cbb0afa1b435
2017-08-16 18:47:06.021 INFO 1 --- [1_dev.migration] c.h.i.p.InternalPartitionService : [172.31.29.145]:5702 [dev] [3.8.1] Most recent partition table version: 0
2017-08-16 18:47:08.340 INFO 1 --- [thread-Acceptor] c.h.nio.tcp.SocketAcceptorThread : [172.31.29.145]:5702 [dev] [3.8.1] Accepting socket connection from /10.255.0.2:44779
2017-08-16 18:47:08.340 INFO 1 --- [cached.thread-1] c.h.nio.tcp.TcpIpConnectionManager : [172.31.29.145]:5702 [dev] [3.8.1] Established socket connection between /10.255.0.8:5702 and /10.255.0.2:44779
2017-08-16 18:47:15.854 INFO 1 --- [thread-Acceptor] c.h.nio.tcp.SocketAcceptorThread : [172.31.29.145]:5702 [dev] [3.8.1] Accepting socket connection from /10.255.0.3:42863
2017-08-16 18:47:15.854 INFO 1 --- [cached.thread-1] c.h.nio.tcp.TcpIpConnectionManager : [172.31.29.145]:5702 [dev] [3.8.1] Established socket connection between /10.255.0.8:5702 and /10.255.0.3:42863
2017-08-16 18:47:15.855 WARN 1 --- [.IO.thread-in-1] c.h.nio.tcp.TcpIpConnectionManager : [172.31.29.145]:5702 [dev] [3.8.1] Wrong bind request from [172.31.6.187]:5702! This node is not requested endpoint: [172.31.43.12]:5702
2017-08-16 18:47:15.855 INFO 1 --- [.IO.thread-in-1] com.hazelcast.nio.tcp.TcpIpConnection : [172.31.29.145]:5702 [dev] [3.8.1] Connection[id=20, /10.255.0.8:5702->/10.255.0.3:42863, endpoint=null, alive=false, type=MEMBER] closed. Reason: Wrong bind request from [172.31.6.187]:5702! This node is not requested endpoint: [172.31.43.12]:5702
2017-08-16 18:47:19.122 INFO 1 --- [.IO.thread-in-0] com.hazelcast.nio.tcp.TcpIpConnection : [172.31.29.145]:5702 [dev] [3.8.1] Connection[id=19, /10.255.0.8:5702->/10.255.0.2:44779, endpoint=[172.31.43.12]:5702, alive=false, type=MEMBER] closed. Reason: Connection closed by the other side
16-Aug-2017 18:47:19.123 INFO [hz.SESSION-REPLICATION-INSTANCE.IO.thread-in-2] com.hazelcast.nio.tcp.TcpIpConnection.null [172.31.29.145]:5702 [dev] [3.8.3] Connection[id=3, /172.18.0.3:38045->/172.31.43.12:5702, endpoint=[172.31.43.12]:5702, alive=false, type=MEMBER] closed. Reason: Connection closed by the other side
2017-08-16 18:47:19.132 INFO 1 --- [thread-Acceptor] c.h.nio.tcp.SocketAcceptorThread : [172.31.29.145]:5702 [dev] [3.8.1] Accepting socket connection from /10.255.0.3:41901
2017-08-16 18:47:19.133 INFO 1 --- [cached.thread-1] c.h.nio.tcp.TcpIpConnectionManager : [172.31.29.145]:5702 [dev] [3.8.1] Established socket connection between /10.255.0.8:5702 and /10.255.0.3:41901
2017-08-16 18:47:19.133 WARN 1 --- [.IO.thread-in-2] c.h.nio.tcp.TcpIpConnectionManager : [172.31.29.145]:5702 [dev] [3.8.1] Wrong bind request from [172.31.6.187]:5702! This node is not requested endpoint: [172.31.43.12]:5702
2017-08-16 18:47:19.133 INFO 1 --- [.IO.thread-in-2] com.hazelcast.nio.tcp.TcpIpConnection : [172.31.29.145]:5702 [dev] [3.8.1] Connection[id=21, /10.255.0.8:5702->/10.255.0.3:41901, endpoint=null, alive=false, type=MEMBER] closed. Reason: Wrong bind request from [172.31.6.187]:5702! This node is not requested endpoint: [172.31.43.12]:5702
2017-08-16 18:47:20.139 INFO 1 --- [thread-Acceptor] c.h.nio.tcp.SocketAcceptorThread : [172.31.29.145]:5702 [dev] [3.8.1] Accepting socket connection from /10.255.0.2:46921
2017-08-16 18:47:20.140 INFO 1 --- [cached.thread-1] c.h.nio.tcp.TcpIpConnectionManager : [172.31.29.145]:5702 [dev] [3.8.1] Established socket connection between /10.255.0.8:5702 and /10.255.0.2:46921
2017-08-16 18:47:26.141 INFO 1 --- [ration.thread-0] c.h.internal.cluster.ClusterService : [172.31.29.145]:5702 [dev] [3.8.1]
Members [2] {
Member [172.31.29.145]:5702 - 8f118025-3b21-49dc-8e6c-152e26083337 this
Member [172.31.43.12]:5702 - f504c33a-62cc-410a-be1a-c655bf4e5bc2
}
16-Aug-2017 18:49:03.187 INFO [hz.SESSION-REPLICATION-INSTANCE.cached.thread-3] com.hazelcast.nio.tcp.TcpIpConnectionManager.null [172.31.29.145]:5702 [dev] [3.8.3] Established socket connection between /172.18.0.3:50597 and /172.31.6.187:5702
16-Aug-2017 18:49:03.198 INFO [hz.SESSION-REPLICATION-INSTANCE.cached.thread-4] com.hazelcast.internal.cluster.impl.DiscoveryJoiner.null [172.31.29.145]:5702 [dev] [3.8.3] [172.31.6.187]:5702 should merge to us , both have the same data member count: 1
16-Aug-2017 18:49:03.201 INFO [hz.SESSION-REPLICATION-INSTANCE.cached.thread-3] com.hazelcast.nio.tcp.TcpIpConnectionManager.null [172.31.29.145]:5702 [dev] [3.8.3] Established socket connection between /172.18.0.3:51911 and /172.31.43.12:5702
16-Aug-2017 18:49:03.204 INFO [hz.SESSION-REPLICATION-INSTANCE.IO.thread-in-0] com.hazelcast.nio.tcp.TcpIpConnection.null [172.31.29.145]:5702 [dev] [3.8.3] Connection[id=7, /172.18.0.3:51911->/172.31.43.12:5702, endpoint=[172.31.43.12]:5702, alive=false, type=MEMBER] closed. Reason: Connection closed by the other side
2017-08-16 18:49:03.893 INFO 1 --- [thread-Acceptor] c.h.nio.tcp.SocketAcceptorThread : [172.31.29.145]:5702 [dev] [3.8.1] Accepting socket connection from /10.255.0.3:47641
2017-08-16 18:49:03.894 INFO 1 --- [cached.thread-1] c.h.nio.tcp.TcpIpConnectionManager : [172.31.29.145]:5702 [dev] [3.8.1] Established socket connection between /10.255.0.8:5702 and /10.255.0.3:47641
2017-08-16 18:49:03.894 WARN 1 --- [.IO.thread-in-1] c.h.nio.tcp.TcpIpConnectionManager : [172.31.29.145]:5702 [dev] [3.8.1] Wrong bind request from [172.31.6.187]:5702! This node is not requested endpoint: [172.31.43.12]:5702
2017-08-16 18:49:03.894 INFO 1 --- [.IO.thread-in-1] com.hazelcast.nio.tcp.TcpIpConnection : [172.31.29.145]:5702 [dev] [3.8.1] Connection[id=23, /10.255.0.8:5702->/10.255.0.3:47641, endpoint=null, alive=false, type=MEMBER] closed. Reason: Wrong bind request from [172.31.6.187]:5702! This node is not requested endpoint: [172.31.43.12]:5702
2017-08-16 18:49:05.834 WARN 1 --- [ration.thread-0] c.h.s.i.o.impl.OutboundResponseHandler : [172.31.29.145]:5702 [dev] [3.8.1] Cannot send response: SplitBrainJoinMessage{packetVersion=4, buildNumber=20170411, memberVersion=3.8.1, clusterVersion=3.8, address=[172.31.29.145]:5702, uuid='8f118025-3b21-49dc-8e6c-152e26083337', liteMember=false, memberCount=2, dataMemberCount=2} to [172.31.6.187]:5702. com.hazelcast.internal.cluster.impl.operations.SplitBrainMergeValidationOperation{serviceName='hz:core:clusterService', identityHash=711408796, partitionId=-1, replicaIndex=0, callId=180, invocationTime=1502909345832 (2017-08-16 18:49:05.832), waitTimeout=-1, callTimeout=60000}
2017-08-16 18:49:05.834 INFO 1 --- [cached.thread-1] c.hazelcast.nio.tcp.InitConnectionTask : [172.31.29.145]:5702 [dev] [3.8.1] Connecting to /172.31.6.187:5702, timeout: 0, bind-any: true
2017-08-16 18:49:05.836 INFO 1 --- [cached.thread-1] c.h.nio.tcp.TcpIpConnectionManager : [172.31.29.145]:5702 [dev] [3.8.1] Established socket connection between /172.18.0.3:37073 and /172.31.6.187:5702
2017-08-16 18:49:05.839 INFO 1 --- [.IO.thread-in-2] com.hazelcast.nio.tcp.TcpIpConnection : [172.31.29.145]:5702 [dev] [3.8.1] Connection[id=24, /172.18.0.3:37073->/172.31.6.187:5702, endpoint=[172.31.6.187]:5702, alive=false, type=MEMBER] closed. Reason: Connection closed by the other side
16-Aug-2017 18:49:09.105 INFO [hz.SESSION-REPLICATION-INSTANCE.InvocationMonitorThread] com.hazelcast.spi.impl.operationservice.impl.InvocationMonitor.null [172.31.29.145]:5702 [dev] [3.8.3] Invocations:1 timeouts:1 backup-timeouts:0
2017-08-16 18:49:23.446 INFO 1 --- [cached.thread-4] c.h.nio.tcp.TcpIpConnectionManager : [172.31.29.145]:5702 [dev] [3.8.1] Established socket connection between /172.18.0.3:45377 and /172.31.6.187:5702
2017-08-16 18:49:23.447 INFO 1 --- [thread-Acceptor] c.h.nio.tcp.SocketAcceptorThread : [172.31.29.145]:5702 [dev] [3.8.1] Accepting socket connection from /10.255.0.4:45377
2017-08-16 18:49:23.447 INFO 1 --- [cached.thread-2] c.h.nio.tcp.TcpIpConnectionManager : [172.31.29.145]:5702 [dev] [3.8.1] Established socket connection between /10.255.0.8:5702 and /10.255.0.4:45377
2017-08-16 18:49:23.447 WARN 1 --- [.IO.thread-in-1] c.h.nio.tcp.TcpIpConnectionManager : [172.31.29.145]:5702 [dev] [3.8.1] Wrong bind request from [172.31.29.145]:5702! This node is not requested endpoint: [172.31.6.187]:5702
2017-08-16 18:49:23.448 INFO 1 --- [.IO.thread-in-1] com.hazelcast.nio.tcp.TcpIpConnection : [172.31.29.145]:5702 [dev] [3.8.1] Connection[id=26, /10.255.0.8:5702->/10.255.0.4:45377, endpoint=null, alive=false, type=MEMBER] closed. Reason: Wrong bind request from [172.31.29.145]:5702! This node is not requested endpoint: [172.31.6.187]:5702
2017-08-16 18:49:23.449 INFO 1 --- [.IO.thread-in-0] com.hazelcast.nio.tcp.TcpIpConnection : [172.31.29.145]:5702 [dev] [3.8.1] Connection[id=25, /172.18.0.3:45377->/172.31.6.187:5702, endpoint=[172.31.6.187]:5702, alive=false, type=MEMBER] closed. Reason: Connection closed by the other side
16-Aug-2017 18:49:25.901 INFO [hz.SESSION-REPLICATION-INSTANCE.IO.thread-in-2] com.hazelcast.nio.tcp.TcpIpConnection.null [172.31.29.145]:5702 [dev] [3.8.3] Connection[id=6, /172.18.0.3:50597->/172.31.6.187:5702, endpoint=[172.31.6.187]:5702, alive=false, type=MEMBER] closed. Reason: Connection closed by the other side
2017-08-16 18:49:26.914 INFO 1 --- [thread-Acceptor] c.h.nio.tcp.SocketAcceptorThread : [172.31.29.145]:5702 [dev] [3.8.1] Accepting socket connection from /10.255.0.2:49073
2017-08-16 18:49:26.915 INFO 1 --- [cached.thread-4] c.h.nio.tcp.TcpIpConnectionManager : [172.31.29.145]:5702 [dev] [3.8.1] Established socket connection between /10.255.0.8:5702 and /10.255.0.2:49073
2017-08-16 18:49:32.915 INFO 1 --- [ration.thread-0] c.h.internal.cluster.ClusterService : [172.31.29.145]:5702 [dev] [3.8.1]
Members [3] {
Member [172.31.29.145]:5702 - 8f118025-3b21-49dc-8e6c-152e26083337 this
Member [172.31.43.12]:5702 - f504c33a-62cc-410a-be1a-c655bf4e5bc2
Member [172.31.6.187]:5702 - 4a403f11-e8e2-402f-aef8-1815261bc465
}
@tkountis
Copy link
Contributor

@tkountis tkountis commented Sep 3, 2017

@sshorkey the log entry

2017-08-16 18:43:08.316 WARN 1 --- [.IO.thread-in-2] c.h.nio.tcp.TcpIpConnectionManager : [172.31.29.145]:5702 [dev] [3.8.1] Wrong bind request from [172.31.29.145]:5702! This node is not requested endpoint: [172.31.43.12]:5702

Says that you trying to connect to an IP that doesn't match the localhost IP of the target node. In other words you are attempting '172.31.43.12' but the node's IP is '172.31.29.145'. This is a security check to avoid spoofing attacks.

Can you tell us some more about the topology ? Is this a private LAN or over WAN? Do you use any Joiner (ie. Multicast) ? Do you use NAT between nodes ? Can you paste the output of ifconfig -a here ?

@tkountis tkountis self-assigned this Sep 3, 2017
@mmedenjak
Copy link
Contributor

@mmedenjak mmedenjak commented Sep 22, 2017

Hi @sshorkey ! I believe this issue is connected to other docker issues we are currently working on. You can check out a community project: https://github.com/bitsofinfo/hazelcast-docker-swarm-discovery-spi and the relevant issues:
#10801
#9219

Basically hazelcast binds itself to one address while connections to it are created through a different address. This causes the mismatch. We are also working on creating a new SPI which will help with the docker deployments. It will hopefully be available in 3.9-GA which is due shortly.

@mmedenjak mmedenjak changed the title Hazelcast nodes keep leaving and rejoining [docker] Hazelcast nodes keep leaving and rejoining Sep 22, 2017
@tombujok tombujok assigned jerrinot and unassigned tkountis Oct 10, 2017
@tombujok
Copy link
Contributor

@tombujok tombujok commented Oct 10, 2017

@jerrinot Matko pushed his final changes to this PR since he didn't have access to your branch. He wanted you to push the final state of the PR since he didn't know the details of the whole scope of the work you did here. Here's his PR:
#11548

@mmedenjak mmedenjak assigned mmedenjak and unassigned jerrinot Oct 10, 2017
@mmedenjak
Copy link
Contributor

@mmedenjak mmedenjak commented Oct 10, 2017

@tombujok I've discussed this with Jaromir and since he is not available in the upcoming days, I am taking over. I will probably finalise the PR and ask the reviewers to take a final look tomorrow.

@tombujok
Copy link
Contributor

@tombujok tombujok commented Oct 11, 2017

@mmedenjak ok, bear in mind there are 3 issues. Will the PR solve all of them?

mmedenjak added a commit to mmedenjak/hazelcast that referenced this issue Oct 11, 2017
Adds a new SPI for defining the addresses that the hazelcast instance
will bind to and the address which will be advertised to other members
on which they can bind to.
This SPI will allow for implementations for different environments in
which the default address picker does not pick suitable addresses and
manually configuring the public address provides additional complexity
for deploying hazelcast.

The SPI does not necessarily fix all current issues when deploying
hazelcast but provides the user with a way to avoid these issues by
either providing their own implementation or using some implementation
that hazelcast will provide in the future.

Related to:
hazelcast#10477
hazelcast#9963
hazelcast#11118
mmedenjak added a commit to mmedenjak/hazelcast that referenced this issue Oct 11, 2017
Adds a new SPI for defining the addresses that the hazelcast instance
will bind to and the address which will be advertised to other members
on which they can bind to.
This SPI will allow for implementations for different environments in
which the default address picker does not pick suitable addresses and
manually configuring the public address provides additional complexity
for deploying hazelcast.

The SPI does not necessarily fix all current issues when deploying
hazelcast but provides the user with a way to avoid these issues by
either providing their own implementation or using some implementation
that hazelcast will provide in the future.

Related to:
hazelcast#10477
hazelcast#9963
hazelcast#11118
mmedenjak added a commit to mmedenjak/hazelcast that referenced this issue Oct 11, 2017
Adds a new SPI for defining the addresses that the hazelcast instance
will bind to and the address which will be advertised to other members
on which they can bind to.
This SPI will allow for implementations for different environments in
which the default address picker does not pick suitable addresses and
manually configuring the public address provides additional complexity
for deploying hazelcast.

The SPI does not necessarily fix all current issues when deploying
hazelcast but provides the user with a way to avoid these issues by
either providing their own implementation or using some implementation
that hazelcast will provide in the future.

Related to:
hazelcast#10477
hazelcast#9963
hazelcast#11118
mmedenjak added a commit to mmedenjak/hazelcast that referenced this issue Oct 11, 2017
Adds a new SPI for defining the addresses that the hazelcast instance
will bind to and the address which will be advertised to other members
on which they can bind to.
This SPI will allow for implementations for different environments in
which the default address picker does not pick suitable addresses and
manually configuring the public address provides additional complexity
for deploying hazelcast.

The SPI does not necessarily fix all current issues when deploying
hazelcast but provides the user with a way to avoid these issues by
either providing their own implementation or using some implementation
that hazelcast will provide in the future.

Related to:
hazelcast#10477
hazelcast#9963
hazelcast#11118
mmedenjak added a commit to mmedenjak/hazelcast that referenced this issue Oct 11, 2017
Adds a new SPI for defining the addresses that the hazelcast instance
will bind to and the address which will be advertised to other members
on which they can bind to.
This SPI will allow for implementations for different environments in
which the default address picker does not pick suitable addresses and
manually configuring the public address provides additional complexity
for deploying hazelcast.

The SPI does not necessarily fix all current issues when deploying
hazelcast but provides the user with a way to avoid these issues by
either providing their own implementation or using some implementation
that hazelcast will provide in the future.

Related to:
hazelcast#10477
hazelcast#9963
hazelcast#11118
mmedenjak added a commit to mmedenjak/hazelcast that referenced this issue Oct 12, 2017
Adds a new SPI for defining the addresses that the hazelcast instance
will bind to and the address which will be advertised to other members
on which they can bind to.
This SPI will allow for implementations for different environments in
which the default address picker does not pick suitable addresses and
manually configuring the public address provides additional complexity
for deploying hazelcast.

The SPI does not necessarily fix all current issues when deploying
hazelcast but provides the user with a way to avoid these issues by
either providing their own implementation or using some implementation
that hazelcast will provide in the future.

Related to:
hazelcast#10477
hazelcast#9963
hazelcast#11118
mmedenjak added a commit to mmedenjak/hazelcast that referenced this issue Oct 12, 2017
Adds a new SPI for defining the addresses that the hazelcast instance
will bind to and the address which will be advertised to other members
on which they can bind to.
This SPI will allow for implementations for different environments in
which the default address picker does not pick suitable addresses and
manually configuring the public address provides additional complexity
for deploying hazelcast.

The SPI does not necessarily fix all current issues when deploying
hazelcast but provides the user with a way to avoid these issues by
either providing their own implementation or using some implementation
that hazelcast will provide in the future.

Related to:
hazelcast#10477
hazelcast#9963
hazelcast#11118
mmedenjak added a commit to mmedenjak/hazelcast that referenced this issue Oct 12, 2017
Adds a new SPI for defining the addresses that the hazelcast instance
will bind to and the address which will be advertised to other members
on which they can bind to.
This SPI will allow for implementations for different environments in
which the default address picker does not pick suitable addresses and
manually configuring the public address provides additional complexity
for deploying hazelcast.

The SPI does not necessarily fix all current issues when deploying
hazelcast but provides the user with a way to avoid these issues by
either providing their own implementation or using some implementation
that hazelcast will provide in the future.

Related to:
hazelcast#10477
hazelcast#9963
hazelcast#11118
@mmedenjak
Copy link
Contributor

@mmedenjak mmedenjak commented Oct 12, 2017

@sshorkey there is a new SPI that has just been merged and which will be released in the 3.9 version. It will allow you to define the bind and public address when starting the hazelcast instance and if used correctly it can be used to avoid issues when forming a cluster.
For now you will have to use the SPI yourself and write an implementation which will fix your issue but we are planning on releasing implementations of our own which will be bundled into plugins such as the docker or AWS plugin for easier deployment.
Please check out the new SPI:
https://github.com/hazelcast/hazelcast/blob/3cede71cad1fe87312f0901ff77f903ed2d4383d/hazelcast/src/main/java/com/hazelcast/spi/MemberAddressProvider.java
Please create a new issue or reopen this one if this does not suit your use case.

@bitsofinfo
Copy link

@bitsofinfo bitsofinfo commented Dec 14, 2017

Starting work on https://github.com/bitsofinfo/hazelcast-docker-swarm-discovery-spi 1.0-RC4 which will integrate this new MemberAddressProvider in a version of the SPI for hazelcast 3.9.x+

@mmedenjak
Copy link
Contributor

@mmedenjak mmedenjak commented Dec 14, 2017

@bitsofinfo We do plan on shipping an implementation of the new SPI for Docker in a future plugin version but I'm not sure when yet.

@ummershervani
Copy link

@ummershervani ummershervani commented Mar 1, 2018

Hello,

I am seeing the same issue on Docker-k8's
Connection[id=5, /172.16.13.113:39913->/192.168.133.31:5701, endpoint=[192.168.133.31]:5701, alive=false, type=MEMBER] closed. Reason: Connection closed by the other side","level":"INFO"}

Established socket connection between /172.16.13.113:39913 and /192.168.133.31:5701
Wrong bind request from [172.16.18.45]:5701! This node is not the requested endpoint: [192.168.133.31]:5701"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
7 participants
You can’t perform that action at this time.