Split-brain recovery does not follow documented process #23

Closed
GregDThomas opened this issue Apr 23, 2019 · 1 comment

GregDThomas commented Apr 23, 2019

Steps to reproduce:

  1. Set up a two-node Openfire cluster. Log in to the admin console of each node and check that both nodes show both cluster members at http://localhost:9090/system-clustering.jsp
  2. On the junior node, disable networking (or remove the network cable)
  3. Confirm that, after a brief period, each node shows that it is the senior member of a single-node cluster
  4. Re-enable/re-connect the network on the junior node.
  5. Wait for Hazelcast to re-establish the connection between the nodes.

Expected results:

  • The cluster re-forms, and the junior node receives an indication that it has been demoted.

Actual results:

  • The cluster re-forms, but the junior node does not receive an indication that it has been demoted.

Cluster initially forms:

2019.04.23 15:02:38 INFO  [ClusterManager events dispatcher]: org.jivesoftware.openfire.cluster.ClusterMonitor - This node (9c528cce-d3f0-4d6a-9e0d-3fd775b542f2/openfire2.example.com) has joined the cluster

Network is disabled:

2019.04.23 15:05:08 INFO  [ClusterManager events dispatcher]: org.jivesoftware.openfire.cluster.ClusterMonitor - Another node (62a9c948-9991-4704-a323-4ec937a741cd/<unknown>) has left the cluster
2019.04.23 15:05:14 INFO  [ClusterManager events dispatcher]: org.jivesoftware.openfire.cluster.ClusterMonitor - This node (9c528cce-d3f0-4d6a-9e0d-3fd775b542f2/openfire2.example.com) is now the senior member

Network is re-enabled:

2019.04.23 15:07:35 INFO  [ClusterManager events dispatcher]: org.jivesoftware.openfire.cluster.ClusterMonitor - Another node (62a9c948-9991-4704-a323-4ec937a741cd/openfire1.example.com (10.215.75.172)) has joined the cluster

GregDThomas added a commit to GregDThomas/openfire-hazelcast-plugin that referenced this issue Apr 23, 2019

GregDThomas added a commit that referenced this issue Apr 23, 2019

Merge pull request #24 from GregDThomas/split-brain-recovery
 Fix issue #23 - fire correct events when recovering from split brain
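
The event ordering this fix aims for can be modelled without Hazelcast on the classpath. The sketch below is a hypothetical, self-contained model and not the code from pull request #24: the class `SplitBrainRecoverySketch`, its `stateChanged` method, and the event names `joinedCluster` and `leftCluster` are illustrative assumptions. The enum values are named after constants in Hazelcast's `LifecycleEvent.LifecycleState`; treating `MERGING` as a leave and `MERGED` as a join is likewise an assumption about one reasonable design.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a node's cluster events; not the plugin's real classes. */
class SplitBrainRecoverySketch {

    /** Subset of states, named after Hazelcast's LifecycleEvent.LifecycleState constants. */
    enum LifecycleState { STARTED, MERGING, MERGED, SHUTDOWN }

    /** Events this node would dispatch to its listeners, in order. */
    private final List<String> firedEvents = new ArrayList<>();
    private boolean inCluster = false;

    /** Translates a lifecycle state change into a leave or join event (assumed mapping). */
    void stateChanged(LifecycleState state) {
        switch (state) {
            case STARTED:
            case MERGED:
                join();
                break;
            case MERGING:   // about to abandon the single-node cluster formed during the split
            case SHUTDOWN:
                leave();
                break;
        }
    }

    private void join() {
        if (!inCluster) {
            inCluster = true;
            firedEvents.add("joinedCluster");
        }
    }

    private void leave() {
        if (inCluster) {
            inCluster = false;
            firedEvents.add("leftCluster");
        }
    }

    List<String> firedEvents() {
        return firedEvents;
    }

    public static void main(String[] args) {
        SplitBrainRecoverySketch node = new SplitBrainRecoverySketch();
        node.stateChanged(LifecycleState.STARTED); // cluster initially forms
        node.stateChanged(LifecycleState.MERGING); // network is back, merge begins
        node.stateChanged(LifecycleState.MERGED);  // node is part of the re-formed cluster
        System.out.println(node.firedEvents());    // prints [joinedCluster, leftCluster, joinedCluster]
    }
}
```

Driving the model through `STARTED`, `MERGING`, `MERGED` yields a join, a leave, and a second join, so listeners on the rejoining node see an explicit leave before the new join rather than silently remaining senior.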
GregDThomas commented Apr 24, 2019

Sequence of events now logged as follows:

Cluster initially forms:

2019.04.24 10:17:18 INFO  [ClusterManager events dispatcher]: org.jivesoftware.openfire.cluster.ClusterMonitor - This node (8e97db5d-8fb7-422b-bee6-f3a61a9d38b0/openfire2.example.com) has joined the cluster [seniorMember=openfire1.example.com (10.215.75.172)]

Network is disabled:

2019.04.24 10:18:11 INFO  [ClusterManager events dispatcher]: org.jivesoftware.openfire.cluster.ClusterMonitor - Another node (d63cc58b-44a5-4b29-83f8-cf1e55540965/openfire1.example.com (10.215.75.172)) has left the cluster [seniorMember=openfire2.example.com (10.215.75.174)]
2019.04.24 10:18:11 INFO  [ClusterManager events dispatcher]: org.jivesoftware.openfire.cluster.ClusterMonitor - Sending message to admins: openfire1.example.com (10.215.75.172) has left the cluster - there is now only 1 node in the cluster (enabled=true)
2019.04.24 10:18:11 INFO  [ClusterManager events dispatcher]: org.jivesoftware.openfire.cluster.ClusterMonitor - This node (8e97db5d-8fb7-422b-bee6-f3a61a9d38b0/openfire2.example.com) is now the senior member

Network is re-enabled:

2019.04.24 10:22:14 INFO  [ClusterManager events dispatcher]: org.jivesoftware.openfire.cluster.ClusterMonitor - This node (8e97db5d-8fb7-422b-bee6-f3a61a9d38b0/openfire2.example.com) has left the cluster [seniorMember=<unknown>]
2019.04.24 10:22:14 INFO  [ClusterManager events dispatcher]: org.jivesoftware.openfire.cluster.ClusterMonitor - Sending message to admins: The local node ('openfire2.example.com') has left the cluster - this node no longer has any resilience (enabled=true)
2019.04.24 10:22:14 INFO  [ClusterManager events dispatcher]: org.jivesoftware.openfire.cluster.ClusterMonitor - This node (8e97db5d-8fb7-422b-bee6-f3a61a9d38b0/openfire2.example.com) has joined the cluster [seniorMember=openfire1.example.com (10.215.75.172)]
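
The difference between the two logs can be checked mechanically. The following is a minimal sketch, assuming the ClusterMonitor message wording shown in this issue stays stable; the class `SplitBrainLogCheck` and the method `demotionReported` are hypothetical names introduced for illustration. It reports whether a node that logged its promotion to senior member later logged that it left the cluster. The sample lines are the messages from the logs in this issue with the timestamp and logger prefix removed.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper that scans ClusterMonitor messages for a reported demotion. */
class SplitBrainLogCheck {

    private static final Pattern PROMOTED =
            Pattern.compile("This node \\(.*\\) is now the senior member");
    private static final Pattern LOCAL_LEFT =
            Pattern.compile("This node \\(.*\\) has left the cluster");

    /** True if a "This node ... has left the cluster" line follows a promotion line. */
    static boolean demotionReported(List<String> logLines) {
        boolean promoted = false;
        for (String line : logLines) {
            if (PROMOTED.matcher(line).find()) {
                promoted = true;
            } else if (promoted && LOCAL_LEFT.matcher(line).find()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Messages from the Apr 23 log (before the fix)
        List<String> before = List.of(
            "This node (9c528cce-d3f0-4d6a-9e0d-3fd775b542f2/openfire2.example.com) has joined the cluster",
            "Another node (62a9c948-9991-4704-a323-4ec937a741cd/<unknown>) has left the cluster",
            "This node (9c528cce-d3f0-4d6a-9e0d-3fd775b542f2/openfire2.example.com) is now the senior member",
            "Another node (62a9c948-9991-4704-a323-4ec937a741cd/openfire1.example.com (10.215.75.172)) has joined the cluster");

        // Messages from the Apr 24 log (after the fix), admin notifications omitted
        List<String> after = List.of(
            "This node (8e97db5d-8fb7-422b-bee6-f3a61a9d38b0/openfire2.example.com) has joined the cluster [seniorMember=openfire1.example.com (10.215.75.172)]",
            "Another node (d63cc58b-44a5-4b29-83f8-cf1e55540965/openfire1.example.com (10.215.75.172)) has left the cluster [seniorMember=openfire2.example.com (10.215.75.174)]",
            "This node (8e97db5d-8fb7-422b-bee6-f3a61a9d38b0/openfire2.example.com) is now the senior member",
            "This node (8e97db5d-8fb7-422b-bee6-f3a61a9d38b0/openfire2.example.com) has left the cluster [seniorMember=<unknown>]",
            "This node (8e97db5d-8fb7-422b-bee6-f3a61a9d38b0/openfire2.example.com) has joined the cluster [seniorMember=openfire1.example.com (10.215.75.172)]");

        System.out.println(demotionReported(before)); // prints false
        System.out.println(demotionReported(after));  // prints true
    }
}
```

The check matches only on "This node", so "Another node ... has left the cluster" lines from the start of the split are not mistaken for a demotion.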