
Split-Brain not working as expected with 2 nodes #11403

Closed
arnaud-soulard opened this issue Sep 20, 2017 · 5 comments



arnaud-soulard commented Sep 20, 2017

The context is the following:

  1. Hazelcast version 3.8.5, also tested with 3.9-EA

  2. I want to use 2 members

  3. I am using 1 client per member, i.e. 2 clients

  4. Java version:
    openjdk version "1.8.0_141"
    OpenJDK Runtime Environment (build 1.8.0_141-b16)
    OpenJDK 64-Bit Server VM (build 25.141-b16, mixed mode)

  5. OS: CentOS Linux release 7.3.1611 (Core)
    uname -a: Linux mfpz-1 3.10.0-514.26.2.el7.x86_64 #1 SMP Tue Jul 4 15:04:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Here is the scenario:

  • Start 2 nodes on 2 separate computers (VirtualBox VMs) with "console.sh", from the Hazelcast demo directory.

  • Insert a key on the first node.
    "m.put a 1" returns "null"

  • Check that the key can be accessed from the second node.
    "m.size" returns "Size: 1" on both nodes

  • Shut down the network interface on the second VM.

Traces on the first node:

hazelcast[default] > sept. 20, 2017 12:03:52 PM com.hazelcast.internal.cluster.impl.ClusterHeartbeatManager
AVERTISSEMENT: [10.86.86.50]:5701 [] [3.8.5] Removing Member [10.86.84.178]:5701 - 879cee5b-f37b-4bf3-9490-0e0685c63fcd because it has not sent any heartbeats for 4000 ms. Now: 2017-09-20 12:03:52.834, last heartbeat time was 2017-09-20 12:03:44.547
sept. 20, 2017 12:03:52 PM com.hazelcast.nio.tcp.TcpIpConnection
INFOS: [10.86.86.50]:5701 [] [3.8.5] Connection[id=1, /10.86.86.50:5701->/10.86.84.178:47003, endpoint=[10.86.84.178]:5701, alive=false, type=MEMBER] closed. Reason: Removing Member [10.86.84.178]:5701 - 879cee5b-f37b-4bf3-9490-0e0685c63fcd because it has not sent any heartbeats for 4000 ms. Now: 2017-09-20 12:03:52.834, last heartbeat time was 2017-09-20 12:03:44.547
sept. 20, 2017 12:03:52 PM com.hazelcast.internal.cluster.ClusterService
INFOS: [10.86.86.50]:5701 [] [3.8.5] Removing Member [10.86.84.178]:5701 - 879cee5b-f37b-4bf3-9490-0e0685c63fcd
sept. 20, 2017 12:03:52 PM com.hazelcast.internal.cluster.ClusterService
INFOS: [10.86.86.50]:5701 [] [3.8.5] 

Members [1] {
	Member [10.86.86.50]:5701 - 6ea064bb-0d2d-4603-a355-4459ebfe22a4 this
}

sept. 20, 2017 12:03:52 PM com.hazelcast.transaction.TransactionManagerService
INFOS: [10.86.86.50]:5701 [] [3.8.5] Committing/rolling-back alive transactions of Member [10.86.84.178]:5701 - 879cee5b-f37b-4bf3-9490-0e0685c63fcd, UUID: 879cee5b-f37b-4bf3-9490-0e0685c63fcd
sept. 20, 2017 12:03:52 PM com.hazelcast.internal.partition.impl.MigrationManager
INFOS: [10.86.86.50]:5701 [] [3.8.5] Partition balance is ok, no need to re-partition cluster data... 
  • Insert another key on the first node.
    "m.put b 2" returns "null"
    "m.size" returns "Size: 2" on node 1
    "m.size" returns "Size: 1" on node 2

  • Restore the network connection on the second VM.
    A split-brain resolution should occur. It does happen, but I do not get the expected behavior.

Traces on node 1:

hazelcast[default] > sept. 20, 2017 12:05:44 PM com.hazelcast.nio.tcp.SocketAcceptorThread
INFOS: [10.86.86.50]:5701 [] [3.8.5] Accepting socket connection from /10.86.84.178:59453
sept. 20, 2017 12:05:44 PM com.hazelcast.nio.tcp.TcpIpConnectionManager
INFOS: [10.86.86.50]:5701 [] [3.8.5] Established socket connection between /10.86.86.50:5701 and /10.86.84.178:59453
sept. 20, 2017 12:05:44 PM com.hazelcast.nio.tcp.TcpIpConnectionManager
INFOS: [10.86.86.50]:5701 [] [3.8.5] Established socket connection between /10.86.86.50:46191 and /10.86.84.178:5701
sept. 20, 2017 12:05:44 PM com.hazelcast.internal.cluster.impl.MulticastJoiner
INFOS: [10.86.86.50]:5701 [] [3.8.5] We are merging to [10.86.84.178]:5701, both have the same data member count: 1
sept. 20, 2017 12:05:44 PM com.hazelcast.internal.cluster.impl.MulticastJoiner
AVERTISSEMENT: [10.86.86.50]:5701 [] [3.8.5] [10.86.86.50]:5701 is merging [multicast] to [10.86.84.178]:5701
sept. 20, 2017 12:05:44 PM com.hazelcast.internal.cluster.impl.operations.LockClusterStateOperation
INFOS: [10.86.86.50]:5701 [] [3.8.5] Locking cluster state. Initiator: [10.86.86.50]:5701, lease-time: 60000
sept. 20, 2017 12:05:44 PM com.hazelcast.internal.cluster.impl.operations.LockClusterStateOperation
INFOS: [10.86.86.50]:5701 [] [3.8.5] Extending cluster state lock. Initiator: [10.86.86.50]:5701, lease-time: 20000
sept. 20, 2017 12:05:44 PM com.hazelcast.internal.cluster.impl.operations.ChangeClusterStateOperation
INFOS: [10.86.86.50]:5701 [] [3.8.5] Changing cluster state state to ClusterStateChange{type=class com.hazelcast.cluster.ClusterState, newState=FROZEN}, Initiator: [10.86.86.50]:5701 transient: false
sept. 20, 2017 12:05:44 PM com.hazelcast.internal.cluster.impl.operations.MergeClustersOperation
AVERTISSEMENT: [10.86.86.50]:5701 [] [3.8.5] [10.86.86.50]:5701 is merging to [10.86.84.178]:5701, because: instructed by master [10.86.86.50]:5701
sept. 20, 2017 12:05:44 PM com.hazelcast.core.LifecycleService
INFOS: [10.86.86.50]:5701 [] [3.8.5] [10.86.86.50]:5701 is MERGING
sept. 20, 2017 12:05:44 PM com.hazelcast.instance.Node
AVERTISSEMENT: [10.86.86.50]:5701 [] [3.8.5] Setting new local member. old uuid: 6ea064bb-0d2d-4603-a355-4459ebfe22a4 new uuid: fa18305f-07e9-4f88-8215-f7dec2af5ccf
sept. 20, 2017 12:05:44 PM com.hazelcast.nio.tcp.TcpIpConnection
INFOS: [10.86.86.50]:5701 [] [3.8.5] Connection[id=2, /10.86.86.50:5701->/10.86.84.178:59453, endpoint=[10.86.84.178]:5701, alive=false, type=MEMBER] closed. Reason: TcpIpConnectionManager is stopping
sept. 20, 2017 12:05:44 PM com.hazelcast.nio.tcp.TcpIpConnection
INFOS: [10.86.86.50]:5701 [] [3.8.5] Connection[id=3, /10.86.86.50:46191->/10.86.84.178:5701, endpoint=[10.86.84.178]:5701, alive=false, type=MEMBER] closed. Reason: TcpIpConnectionManager is stopping
sept. 20, 2017 12:05:44 PM com.hazelcast.internal.cluster.impl.MulticastJoiner
INFOS: [10.86.86.50]:5701 [] [3.8.5] Trying to join to discovered node: [10.86.84.178]:5701
sept. 20, 2017 12:05:44 PM com.hazelcast.nio.tcp.InitConnectionTask
INFOS: [10.86.86.50]:5701 [] [3.8.5] Connecting to /10.86.84.178:5701, timeout: 0, bind-any: true
sept. 20, 2017 12:05:44 PM com.hazelcast.nio.tcp.TcpIpConnectionManager
INFOS: [10.86.86.50]:5701 [] [3.8.5] Established socket connection between /10.86.86.50:48887 and /10.86.84.178:5701
sept. 20, 2017 12:05:50 PM com.hazelcast.system
INFOS: [10.86.86.50]:5701 [] [3.8.5] Cluster version set to 3.8
sept. 20, 2017 12:05:50 PM com.hazelcast.internal.cluster.ClusterService
INFOS: [10.86.86.50]:5701 [] [3.8.5] 

Members [2] {
	Member [10.86.84.178]:5701 - 879cee5b-f37b-4bf3-9490-0e0685c63fcd
	Member [10.86.86.50]:5701 - fa18305f-07e9-4f88-8215-f7dec2af5ccf this
}

sept. 20, 2017 12:05:52 PM com.hazelcast.internal.cluster.impl.ClusterHeartbeatManager
AVERTISSEMENT: [10.86.86.50]:5701 [] [3.8.5] Resetting heartbeat timestamps because of huge system clock jump! Clock-Jump: 5001 ms, Heartbeat-Timeout: 4000 ms

hazelcast[default] > m.size
Size: 1
-------> The second key entered has been lost

Traces on node 2:

hazelcast[default] > sept. 20, 2017 12:05:44 PM com.hazelcast.nio.tcp.TcpIpConnectionManager
INFOS: [10.86.84.178]:5701 [] [3.8.5] Established socket connection between /10.86.84.178:59453 and /10.86.86.50:5701
sept. 20, 2017 12:05:44 PM com.hazelcast.nio.tcp.SocketAcceptorThread
INFOS: [10.86.84.178]:5701 [] [3.8.5] Accepting socket connection from /10.86.86.50:46191
sept. 20, 2017 12:05:44 PM com.hazelcast.nio.tcp.TcpIpConnectionManager
INFOS: [10.86.84.178]:5701 [] [3.8.5] Established socket connection between /10.86.84.178:5701 and /10.86.86.50:46191
sept. 20, 2017 12:05:44 PM com.hazelcast.internal.cluster.impl.MulticastJoiner
INFOS: [10.86.84.178]:5701 [] [3.8.5] [10.86.86.50]:5701 should merge to us , both have the same data member count: 1
sept. 20, 2017 12:05:44 PM com.hazelcast.nio.tcp.TcpIpConnection
INFOS: [10.86.84.178]:5701 [] [3.8.5] Connection[id=2, /10.86.84.178:59453->/10.86.86.50:5701, endpoint=[10.86.86.50]:5701, alive=false, type=MEMBER] closed. Reason: Connection closed by the other side
sept. 20, 2017 12:05:44 PM com.hazelcast.nio.tcp.TcpIpConnection
INFOS: [10.86.84.178]:5701 [] [3.8.5] Connection[id=3, /10.86.84.178:5701->/10.86.86.50:46191, endpoint=[10.86.86.50]:5701, alive=false, type=MEMBER] closed. Reason: Connection closed by the other side
sept. 20, 2017 12:05:44 PM com.hazelcast.nio.tcp.SocketAcceptorThread
INFOS: [10.86.84.178]:5701 [] [3.8.5] Accepting socket connection from /10.86.86.50:48887
sept. 20, 2017 12:05:44 PM com.hazelcast.nio.tcp.TcpIpConnectionManager
INFOS: [10.86.84.178]:5701 [] [3.8.5] Established socket connection between /10.86.84.178:5701 and /10.86.86.50:48887
sept. 20, 2017 12:05:50 PM com.hazelcast.internal.cluster.impl.ClusterHeartbeatManager
AVERTISSEMENT: [10.86.84.178]:5701 [] [3.8.5] Resetting heartbeat timestamps because of huge system clock jump! Clock-Jump: -3827 ms, Heartbeat-Timeout: 4000 ms
sept. 20, 2017 12:05:50 PM com.hazelcast.internal.cluster.ClusterService
INFOS: [10.86.84.178]:5701 [] [3.8.5] 

Members [2] {
	Member [10.86.84.178]:5701 - 879cee5b-f37b-4bf3-9490-0e0685c63fcd this
	Member [10.86.86.50]:5701 - fa18305f-07e9-4f88-8215-f7dec2af5ccf
}

sept. 20, 2017 12:05:51 PM com.hazelcast.internal.partition.impl.MigrationManager
INFOS: [10.86.84.178]:5701 [] [3.8.5] Re-partitioning cluster data... Migration queue size: 271

hazelcast[default] > 
hazelcast[default] > 
hazelcast[default] > sept. 20, 2017 12:05:54 PM com.hazelcast.internal.partition.impl.MigrationThread
INFOS: [10.86.84.178]:5701 [] [3.8.5] All migration tasks have been completed, queues are empty.

hazelcast[default] > m.size
Size: 1

Traces after a while on Node 1:

hazelcast[default] > sept. 20, 2017 12:06:52 PM com.hazelcast.internal.cluster.impl.ClusterMergeTask
GRAVE: [10.86.86.50]:5701 [] [3.8.5] While merging...
java.util.concurrent.TimeoutException
	at com.hazelcast.spi.impl.AbstractCompletableFuture.get(AbstractCompletableFuture.java:225)
	at com.hazelcast.internal.cluster.impl.ClusterMergeTask.waitOnFutureInterruptible(ClusterMergeTask.java:154)
	at com.hazelcast.internal.cluster.impl.ClusterMergeTask.executeMergeTasks(ClusterMergeTask.java:137)
	at com.hazelcast.internal.cluster.impl.ClusterMergeTask.run(ClusterMergeTask.java:74)
	at com.hazelcast.instance.LifecycleServiceImpl.runUnderLifecycleLock(LifecycleServiceImpl.java:107)
	at com.hazelcast.internal.cluster.impl.ClusterServiceImpl.merge(ClusterServiceImpl.java:319)
	at com.hazelcast.internal.cluster.impl.operations.MergeClustersOperation$1.run(MergeClustersOperation.java:60)
	at com.hazelcast.util.executor.CachedExecutorServiceDelegate$Worker.run(CachedExecutorServiceDelegate.java:228)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
	at com.hazelcast.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:64)
	at com.hazelcast.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:80)

sept. 20, 2017 12:07:52 PM com.hazelcast.internal.cluster.impl.ClusterMergeTask
GRAVE: [10.86.86.50]:5701 [] [3.8.5] While merging...
java.util.concurrent.TimeoutException
	at com.hazelcast.spi.impl.AbstractCompletableFuture.get(AbstractCompletableFuture.java:225)
	at com.hazelcast.internal.cluster.impl.ClusterMergeTask.waitOnFutureInterruptible(ClusterMergeTask.java:154)
	at com.hazelcast.internal.cluster.impl.ClusterMergeTask.executeMergeTasks(ClusterMergeTask.java:137)
	at com.hazelcast.internal.cluster.impl.ClusterMergeTask.run(ClusterMergeTask.java:74)
	at com.hazelcast.instance.LifecycleServiceImpl.runUnderLifecycleLock(LifecycleServiceImpl.java:107)
	at com.hazelcast.internal.cluster.impl.ClusterServiceImpl.merge(ClusterServiceImpl.java:319)
	at com.hazelcast.internal.cluster.impl.operations.MergeClustersOperation$1.run(MergeClustersOperation.java:60)
	at com.hazelcast.util.executor.CachedExecutorServiceDelegate$Worker.run(CachedExecutorServiceDelegate.java:228)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
	at com.hazelcast.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:64)
	at com.hazelcast.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:80)

sept. 20, 2017 12:07:52 PM com.hazelcast.core.LifecycleService
INFOS: [10.86.86.50]:5701 [] [3.8.5] [10.86.86.50]:5701 is MERGED

hazelcast[default] > m.keys
b
a
Total 2

Traces on node 2:
hazelcast[default] > m.size
Size: 2

I played a lot with the parameters "hazelcast.merge.first.run.delay.seconds" and "hazelcast.merge.next.run.delay.seconds", but in all cases I see the same two steps:

  1. After the first merge, I temporarily lose the key that was inserted while the second node was unavailable.
  2. After the second merge, the "lost" key comes back.

I expect to get to step 2 directly, as soon as the second node is available again.

I also experimented with 3 nodes; in that case it works as expected, and the merge is simpler.
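
For reference, a minimal sketch of how these two merge delay properties can be set programmatically (the class name and delay values are illustrative, not the ones from my actual configuration):

```java
import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class MergeDelayExample {
    public static void main(String[] args) {
        Config config = new Config();
        // Delay before the first split-brain merge attempt, and between
        // subsequent attempts (values here are illustrative).
        config.setProperty("hazelcast.merge.first.run.delay.seconds", "30");
        config.setProperty("hazelcast.merge.next.run.delay.seconds", "30");
        HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);
    }
}
```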

Here is the configuration file:

hazelcast.xml.txt

mdogan (Contributor) commented Sep 20, 2017

@arnaud-soulard:

Cluster merge is considered completed when the MERGED event is fired by LifecycleService. You can also see that in the logs, some time after LifecycleService prints MERGING.

Between MERGING and MERGED states, you can observe stale results.
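
For illustration, a minimal sketch of reacting to those lifecycle events from application code (the class name is only for the example):

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.LifecycleEvent;
import com.hazelcast.core.LifecycleListener;

public class MergeStateLogger {
    public static void main(String[] args) {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        hz.getLifecycleService().addLifecycleListener(new LifecycleListener() {
            @Override
            public void stateChanged(LifecycleEvent event) {
                if (event.getState() == LifecycleEvent.LifecycleState.MERGING) {
                    // Between MERGING and MERGED, map reads may return stale results.
                    System.out.println("Merge started; reads may be stale until MERGED");
                } else if (event.getState() == LifecycleEvent.LifecycleState.MERGED) {
                    System.out.println("Merge completed; cluster data is consistent again");
                }
            }
        });
    }
}
```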

Additionally, clock jumps and timeouts in your logs look suspicious;

sept. 20, 2017 12:05:52 PM com.hazelcast.internal.cluster.impl.ClusterHeartbeatManager
AVERTISSEMENT: [10.86.86.50]:5701 [] [3.8.5] Resetting heartbeat timestamps because of huge system clock jump! Clock-Jump: 5001 ms, Heartbeat-Timeout: 4000 ms
sept. 20, 2017 12:05:50 PM com.hazelcast.internal.cluster.impl.ClusterHeartbeatManager
AVERTISSEMENT: [10.86.84.178]:5701 [] [3.8.5] Resetting heartbeat timestamps because of huge system clock jump! Clock-Jump: -3827 ms, Heartbeat-Timeout: 4000 ms
mdogan added the Team: Core label on Sep 20, 2017
arnaud-soulard (Author) commented Sep 20, 2017

Thank you for your quick response.

I agree; ever since I have been using Hazelcast, I have seen those "Clock-Jump" warnings.
I configured all my VirtualBox VMs to use NTP and to use the "hpet" clocksource (via the file /sys/devices/system/clocksource/clocksource0/current_clocksource).

By default, they were using "kvm-clock".

Using "hpet" did not make those warnings disappear.
When I manually check the time on my VMs, the clocks seem to be synchronized; at least there is no 3-4 second shift, as the warning seems to indicate.

mdogan added the To Triage label on Sep 21, 2017
arnaud-soulard (Author) commented Sep 21, 2017

Today I tested the same Hazelcast configuration on real computers instead of virtual machines: the merge executes without a timeout, and "MERGING" is followed by "MERGED". It works as I expect it to.

Then I tested with VMware virtual machines: same issue as the one I have with VirtualBox machines.

arnaud-soulard (Author) commented Sep 25, 2017

If I configure my virtual machines with at least 2 CPU cores, I get the same behavior as when using real computers, i.e. it works as I expect it to.

If you consider this is not an issue, it should perhaps be documented that virtual machines should be configured with a minimum of 2 CPU cores.

mdogan (Contributor) commented Sep 25, 2017

Thanks for the feedback.

It seems there's an executor deadlock when the core count is 1. By default, we create some executors with thread counts derived from the core count, so when the core count is 1 they are created with just a single thread, and a deadlock occurs when that thread is blocked by the split-brain merge task.
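
For illustration only (this is not Hazelcast's internal code, just the general pattern): an executor sized from the available core count ends up with a single thread on a 1-core machine, so a task that blocks on further work submitted to the same executor can never finish. On a 1-core VM this sketch hangs forever, which is exactly the symptom.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SingleThreadDeadlockDemo {
    public static void main(String[] args) throws Exception {
        // On a 1-core VM, availableProcessors() is 1, so the pool has one thread.
        int poolSize = Runtime.getRuntime().availableProcessors();
        ExecutorService executor = Executors.newFixedThreadPool(poolSize);

        Future<?> outer = executor.submit(() -> {
            // The outer task (think: the split-brain merge task) submits more
            // work to the same executor and waits for its result.
            Future<String> inner = executor.submit(() -> "merge sub-task done");
            try {
                // With only one thread, the inner task can never start while
                // the outer task is blocked here: a deadlock on a 1-core box.
                return inner.get();
            } catch (InterruptedException | ExecutionException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        });

        outer.get(); // Completes on multi-core machines; hangs when poolSize == 1.
    }
}
```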

Marking this as a defect.
