Zookeeper now does an automatic rolling restart after quorum change. #46

Merged
Merged 3 commits into BIGTOP-2476-zookeeper from feature/auto-rolling-restart on Oct 10, 2016

petevg commented Sep 23, 2016

Replaces the user-unfriendly manual restart process.

@juju-solutions/bigdata

Zookeeper now does an automatic rolling restart after quorum change.
Replaces the user-unfriendly manual restart process.
+# 1. Juju leader changes in the middle of a restart: this needs to be
+# tested, but should work. The leader data should get picked up by
+# the next leader, and we'll need to generate a fresh restart_queue
+# anyway, as the Juju leader node has gone away.
@petevg

petevg Sep 23, 2016

Darn it. I tested this, and it doesn't work. If you remove the leader using remove-unit, it still attempts to wrangle everything as leader while it is tearing itself down. It fails to get RelationBase, throws an exception, and gets stuck in a state where it can't finish tearing down.

The good news is that I can fix that bug by just telling the leader to only run check_cluster when zkpeer.joined is set. But that causes other bugs (the new leader never sets up the restart queue).

Continuing to troubleshoot. Will push a fix once I have it ...

Fixed an issue where things broke when you removed the Juju leader.
The leader still attempted to orchestrate things as it was shutting
down. We fix this by requiring zkpeer.joined on check_cluster, and add
an alternate version of the function that triggers on zkpeer.departed,
so the new leader picks things up.
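
A rough sketch of the shape that fix could take in the reactive layer (hedged: check_cluster is the handler named above, but the second handler's name and the _manage_restart_queue helper are hypothetical placeholders, not the charm's actual code):

from charms.reactive import when

@when('leadership.is_leader', 'zkpeer.joined')
def check_cluster(zkpeer):
    # Only orchestrate while this unit still has joined peers, so a
    # leader that is being removed stops driving the restart once
    # zkpeer.joined clears during teardown.
    _manage_restart_queue(zkpeer)

@when('leadership.is_leader', 'zkpeer.departed')
def check_cluster_departed(zkpeer_departed):
    # Alternate entry point: lets the (possibly new) leader react when
    # a peer departs and regenerate the restart queue.
    _manage_restart_queue(zkpeer_departed)

def _manage_restart_queue(peers):
    # Stand-in for the real orchestration (build the queue, set leader data).
    pass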
+
+ '''
+ queue = json.loads(leader_get('restart_queue'))[1:]
+ remove_state('zkpeer.restarted')
@johnsca

johnsca Sep 24, 2016

Owner

Charm layers should never directly add or remove states set by an interface layer, as that breaks encapsulation. Additionally, it's conceivable that a restarted unit might trigger another -changed hook for some unrelated reason, which would cause the zkpeer.restarted state to be re-added and this code to erroneously assume that some new unit has restarted.

Perhaps trying to use a state to track which units have restarted is the wrong approach. What about something like this instead:

import json  # imports assumed from the surrounding charm module
from charms.reactive import when
from charmhelpers.core import hookenv
from charmhelpers.core.hookenv import leader_get, leader_set

@when('zookeeper.rolling_restart.in_progress', 'leadership.is_leader', 'zkpeer.joined')
def update_restart_queue(zkpeer):
    queue = json.loads(leader_get('restart_queue'))
    changed = False
    for peer in zkpeer.restarted_nodes():
        if peer in queue:
            queue.remove(peer)
            changed = True
    if changed:
        hookenv.log('Leader updating restart queue: {}'.format(queue))
        leader_set(restart_queue=json.dumps(queue))
@petevg

petevg Sep 25, 2016

@johnsca Our original implementation of .restarted_nodes had a much bigger bug, as it included all nodes that had participated in setting the zkpeer.restarted state, regardless of nonce. That is why I went with the code that you see here. You're correct that relation data changing will break things, as noted in the mega comment.

I'm trying out a different implementation of .restarted_nodes, which loops through conversations and checks for the restarted.nonce on the remote side. It doesn't work yet (the third node in the queue doesn't restart), which is hopefully just a minor bug in my logic, and not a flaw in the approach :-/
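
For reference, this is roughly the shape of the conversation-scanning version (hedged: the class name, the 'restarted.nonce' key, and the method signature are assumptions drawn from this thread, not the merged interface code):

from charms.reactive import RelationBase, scopes

class ZookeeperQuorum(RelationBase):
    scope = scopes.UNIT

    def restarted_nodes(self, nonce):
        # Walk each peer conversation and keep the units whose remote
        # side has acknowledged a restart for the current nonce.
        nodes = []
        for conv in self.conversations():
            if conv.get_remote('restarted.nonce') == nonce:
                nodes.append(conv.scope)
        return nodes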

@petevg

petevg Sep 25, 2016

@johnsca After banging out some of the kinks (we suddenly couldn't assume that a restart_queue existed at all in places that expected one, which led to json.loads attempting to parse None), this does seem to work. Will push shortly, after I do a bit more paranoid double-checking.
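
The guard there is presumably just a None check before parsing (a tiny assumed illustration, not the exact charm code):

import json
from charmhelpers.core.hookenv import leader_get

raw = leader_get('restart_queue')       # None until the leader has set it
queue = json.loads(raw) if raw else []  # avoid json.loads(None)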

The one remaining bug is that, if you remove the Juju leader while a lot of Zookeeper nodes are deployed, the leader will restart several times as it sees its peers depart, before it knows to stop running events. This doesn't break things -- you wind up with a cluster in a good state once the old leader finishes departing and a new leader is elected -- but it leads to some weird status messages on the leader while it's tearing down: it counts down the number of nodes still attached to it, between restarts.

Is there a "I'm going away" state that we can check for?

Fixed issue w/ unrelated relation data changing causing bugs during restart

We now update the queue based on which nodes have restarted, for the
current nonce, instead of trying to do the dance with states. This seems
to work well.
Member

kwmonroe commented Oct 10, 2016

We may still have weird status messages when tearing down a leader, but it seems all comments have been addressed and those messages don't affect the ZK cluster's functionality.

LGTM, and thanks for this! Waaay better UX to do this rolling restart automatically.

@kwmonroe kwmonroe merged commit fc9a53c into BIGTOP-2476-zookeeper Oct 10, 2016

@kwmonroe kwmonroe deleted the feature/auto-rolling-restart branch Oct 10, 2016
