Zookeeper now does an automatic rolling restart after quorum change. #46
Conversation
+# 1. Juju leader changes in the middle of a restart: this needs to be
+# tested, but should work. The leader data should get picked up by
+# the next leader, and we'll need to generate a fresh restart_queue
+# anyway, as the Juju leader node has gone away.
petevg
Sep 23, 2016
Darn it. I tested this, and it doesn't work. If you remove the leader using remove-unit, it still attempts to wrangle everything as leader while it is tearing itself down. It fails to get RelationBase, throws an exception, and gets stuck in a state where it can't finish tearing down.
The good news is that I can fix that bug by just telling the leader to only run check_cluster when zkpeer.joined is set. But that causes other bugs (the new leader never sets up the restart queue).
Continuing to troubleshoot. Will push a fix once I have it ...
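For illustration, a minimal sketch of the guard described above. The check_cluster handler and the zkpeer.joined / leadership.is_leader states come from this discussion; the 'zookeeper.started' state and the handler body are assumptions, not the charm's actual code.

# Hypothetical sketch: gate the leader-only cluster check on zkpeer.joined,
# so a leader that is tearing itself down (and has lost its peer relation)
# stops trying to manage the restart. 'zookeeper.started' is assumed.
from charms.reactive import when

@when('zookeeper.started', 'leadership.is_leader', 'zkpeer.joined')
def check_cluster(zkpeer):
    ...  # existing cluster-checking logic, unchanged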
+
+    '''
+    queue = json.loads(leader_get('restart_queue'))[1:]
+    remove_state('zkpeer.restarted')
johnsca
Sep 24, 2016
Owner
Charm layers should never directly add or remove states set by an interface layer, as that breaks the encapsulation. Additionally, it's conceivable that a restarted unit might trigger another -changed hook for some unrelated reason, which would cause the zkpeer.restarted state to be re-added and this code to erroneously assume that some new unit has restarted.
Perhaps trying to track the units which have restarted with a state is the wrong approach. What about something like this instead:
@when('zookeeper.rolling_restart.in_progress', 'leadership.is_leader', 'zkpeer.joined')
def update_restart_queue(zkpeer):
    queue = json.loads(leader_get('restart_queue'))
    changed = False
    for peer in zkpeer.restarted_nodes():
        if peer in queue:
            queue.remove(peer)
            changed = True
    if changed:
        hookenv.log('Leader updating restart queue: {}'.format(queue))
        leader_set(restart_queue=json.dumps(queue))
petevg
Sep 25, 2016
@johnsca Our original implementation of .restarted_nodes had a much bigger bug, as it included all nodes that had participated in setting the zkpeer.restarted state, regardless of nonce. That is why I went with the code that you see here. You're correct that relation data changing will break things, as noted in the mega comment.
I'm trying out a different implementation of .restarted_nodes, which loops through conversations and checks for the restarted.nonce on the remote side. It doesn't work yet (the third node in the queue doesn't restart), which is hopefully just a minor bug in my logic, and not a flaw in the approach :-/
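Roughly, a sketch of that interface-layer approach in the charms.reactive RelationBase style. The class name, and current_nonce() as the way to look up the nonce of the restart in progress, are assumptions for illustration; conversations() and get_remote() are the standard RelationBase APIs.

from charms.reactive import RelationBase, scopes

class ZookeeperPeers(RelationBase):  # hypothetical class name for the zkpeer interface
    scope = scopes.UNIT

    def restarted_nodes(self):
        '''Return the peer units whose published restarted.nonce matches the
        nonce of the rolling restart currently in progress.'''
        current = self.current_nonce()  # hypothetical accessor for the active nonce
        nodes = []
        for conv in self.conversations():
            # get_remote() reads data the remote unit set on its side of the relation
            if current and conv.get_remote('restarted.nonce') == current:
                nodes.append(conv.scope)  # with UNIT scope, conv.scope is the remote unit name
        return nodes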
petevg
Sep 25, 2016
@johnsca After banging out some of the kinks (we could no longer assume that a restart_queue existed in places where we previously had one, which led to json.loads attempting to parse None), this does seem to work. Will push shortly, after I do a bit more paranoid double-checking.
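For reference, a minimal sketch of that guard, assuming leader_get is the charmhelpers helper that returns None for a key the leader has never set:

import json
from charmhelpers.core.hookenv import leader_get

# Fall back to an empty queue instead of handing None to json.loads.
queue = json.loads(leader_get('restart_queue') or '[]')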
The one remaining bug is that, if you remove the Juju leader while a lot of Zookeeper nodes are deployed, the departing leader restarts several times as it sees its peers leave, before it knows to stop running events. This doesn't break anything -- you still wind up with a healthy cluster once the old leader finishes departing and a new leader is elected -- but it leads to some weird status messages on the leader while it's tearing down: between restarts, its status counts down the number of nodes still attached to it.
Is there an "I'm going away" state that we can check for?
We may still have weird status messages when tearing down a leader, but it seems all comments have been addressed and those messages don't affect the ZK cluster's functionality. LGTM, and thanks for this! Waaay better UX to do this rolling restart automatically.
petevg commented Sep 23, 2016
Replaces the user-unfriendly manual restart process.
@juju-solutions/bigdata