Added hooks in interface layer to facilitate a rolling restart. #5

Merged
merged 2 commits into from Oct 10, 2016

4 participants
Contributor

petevg commented Sep 23, 2016

Allows Zookeeper to perform an automagic rolling restart, rather than
requiring an ops person to do the restart manually.

@juju-solutions/bigdata

petevg added some commits Sep 22, 2016

Added hooks in interface layer to facilitate a rolling restart.
Allows Zookeeper to perform an automagic rolling restart, rather than
requiring an ops person to do the restart manually.
Fixed bug w/ relation data changing in the middle of a restart
We communicate the fact that we're restarting differently, to avoid the
case where unrelated relation data changing leads us to miss a restart.
+ the Zookeeper leader is not necessarily the Juju leader.
+
+ '''
+ for conv in self.conversations():
@ktsakalozos

ktsakalozos Oct 3, 2016

Member

What would happen if the leader changes?

@petevg

petevg Oct 3, 2016

Contributor

Docs on Zookeeper leader election are here: http://zookeeper.apache.org/doc/current/recipes.html#sc_leaderElection

Per convo in Hangout w/ @ktsakalozos, we do not currently handle the case where a Zookeeper leader election happens without triggering an event that Juju is wired to notice (e.g. the process crashes, but the machine does not go away). If that happens, Juju may have the wrong node flagged as the Zookeeper leader.

Unfortunately, there really isn't an obvious way to handle that. Juju reactive handlers trigger when a Juju event happens; by definition, they can't trigger when something happens that Juju doesn't register as an event.

We've decided that the best thing to do is to document this possibility, and open a ticket to deal with it if it turns out to happen often in the field. (An operator could fix things in the meantime by doing a manual rolling restart on the cluster.)
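
For anyone checking by hand which node Zookeeper itself considers the leader, one option is the `srvr` four-letter command, whose output includes a `Mode:` line. The following is only a minimal sketch, assuming the four-letter commands are enabled and the client port is the default 2181; the helper names `parse_zk_mode` and `query_zk_mode` are hypothetical and not part of this PR.

```python
import socket


def parse_zk_mode(srvr_output):
    """Return the value of the 'Mode:' line (e.g. 'leader', 'follower'), or None."""
    for line in srvr_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def query_zk_mode(host="localhost", port=2181, timeout=5.0):
    """Send 'srvr' to a Zookeeper node and return its reported mode.

    Assumes the four-letter commands are enabled on the server.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"srvr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection after replying
                break
            chunks.append(data)
    return parse_zk_mode(b"".join(chunks).decode("utf-8", errors="replace"))
```

Comparing the result against the node Juju has flagged would show whether the two have drifted apart.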

@johnsca

johnsca Oct 4, 2016

Owner

Well, we could have the unit respond to ZK events (leader change) and use juju-run on the unit to invoke one of the hook handlers. It's certainly something to keep in mind, but as you said, we can defer it to see how big of an issue it is in practice.
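
A rough sketch of what such a watcher might look like, purely as an illustration: it polls for the node's Zookeeper mode and fires a callback when the node gains or loses leadership. Everything here is an assumption rather than code from this PR: `watch_leadership`, `get_mode`, and `on_change` are hypothetical names, and the callback is where a `juju-run` invocation of a hook handler would go.

```python
import time


def watch_leadership(get_mode, on_change, interval=30.0, max_polls=None,
                     sleep=time.sleep):
    """Poll get_mode() and call on_change(old, new) when leadership flips.

    get_mode  -- hypothetical callable returning 'leader', 'follower', or None
    on_change -- hypothetical callable; could shell out to juju-run
    max_polls -- stop after this many polls (None means run forever)
    """
    previous = None
    polls = 0
    while max_polls is None or polls < max_polls:
        current = get_mode()
        # Ignore failed reads (None) so a crash followed by a mode change
        # is still reported once the node answers again.
        if current is not None:
            if previous is not None and \
                    (previous == "leader") != (current == "leader"):
                on_change(previous, current)
            previous = current
        polls += 1
        if max_polls is None or polls < max_polls:
            sleep(interval)
```

Polling is used here only because it is simple to show; reacting to Zookeeper events directly, as suggested above, would avoid the delay of up to one interval.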

Member

kwmonroe commented Oct 10, 2016

Looks like comments have been addressed, and group think says to watch this space for edge-case leadership changes.

Sounds good to me.

@kwmonroe kwmonroe merged commit 6f82811 into master Oct 10, 2016

@kwmonroe kwmonroe deleted the feature/auto-rolling-restart branch Oct 10, 2016
