raft: use RawNode for node's event loop; clean up bootstrap #10892

tbg · 2019-07-15T22:32:20Z

raft: use RawNode for node's event loop

It has always bugged me that any new feature essentially needed to be
tested twice due to the two ways in which apps can use raft (`*node` and
`*RawNode`). Due to upcoming testing work for joint consensus, now is a
good time to rectify this somewhat.

This commit removes most logic from `(*node).run` and uses `*RawNode`
internally. This simplifies the logic and also lead (via debugging) to some
insight on how the semantics of the approaches differ, which is now
documented in the comments.

raft: clean up bootstrap

This is the first (maybe not last) step in cleaning up the bootstrap code
around StartNode.

Initializing a Raft group for the first time is awkward, since a
configuration has to be pulled from thin air. The way this is solved today
is unclean: The app is supposed to pass peers to StartNode(), we add
configuration changes for them to the log, immediately pretend that they
are applied, but actually leave them unapplied (to give the app a chance to
observe them, though if the app did decide to not apply them things would
really go off the rails), and then return control to the app. The app will
then process the initial Readys and as a result the configuration will be
persisted to disk; restarts of the node then use RestartNode which doesn't
take any peers.

The code that did this lived awkwardly in two places fairly deep down the
callstack, though it was really only necessary in StartNode(). This commit
refactors things to make this more obvious: only StartNode does this dance
now. In particular, RawNode does not support this at all any more; it
expects the app to set up its Storage correctly.

Future work may provide helpers to make this "preseeding" of the Storage
more user-friendly. It isn't entirely straightforward to do so since the
Storage interface doesn't provide the right accessors for this purpose.
Briefly speaking, we want to make sure that a non-bootstrapped node can
never catch up via the log so that we can implicitly use one of the
"skipped" log entries to represent the configuration change into the
bootstrap configuration. This is an invasive change that affects all
consumers of raft, and it is of lower urgency since the code (post this
commit) already encapsulates the complexity sufficiently.

bdarnell · 2019-07-18T21:02:14Z

The first commit LGTM. I agree in spirit with the second commit, but worry about backwards compatibility. For better or worse, RawNode has supported the same auto-initialization as Node, so someone other than CockroachDB could be depending on it (at least the signature change on NewRawNode will break their build so the failure can't pass silently).

I think that since the first commit reduces the duplication so the bootstrapping only lives in one place, it's probably best if that one place continues to be in RawNode instead of StartNode, at least until the anticipated helpers are available.

It has always bugged me that any new feature essentially needed to be tested twice due to the two ways in which apps can use raft (`*node` and `*RawNode`). Due to upcoming testing work for joint consensus, now is a good time to rectify this somewhat. This commit removes most logic from `(*node).run` and uses `*RawNode` internally. This simplifies the logic and also lead (via debugging) to some insight on how the semantics of the approaches differ, which is now documented in the comments.

This is the first (maybe not last) step in cleaning up the bootstrap code around StartNode. Initializing a Raft group for the first time is awkward, since a configuration has to be pulled from thin air. The way this is solved today is unclean: The app is supposed to pass peers to StartNode(), we add configuration changes for them to the log, immediately pretend that they are applied, but actually leave them unapplied (to give the app a chance to observe them, though if the app did decide to not apply them things would really go off the rails), and then return control to the app. The app will then process the initial Readys and as a result the configuration will be persisted to disk; restarts of the node then use RestartNode which doesn't take any peers. The code that did this lived awkwardly in two places fairly deep down the callstack, though it was really only necessary in StartNode(). This commit refactors things to make this more obvious: only StartNode does this dance now. In particular, RawNode does not support this at all any more; it expects the app to set up its Storage correctly. Future work may provide helpers to make this "preseeding" of the Storage more user-friendly. It isn't entirely straightforward to do so since the Storage interface doesn't provide the right accessors for this purpose. Briefly speaking, we want to make sure that a non-bootstrapped node can never catch up via the log so that we can implicitly use one of the "skipped" log entries to represent the configuration change into the bootstrap configuration. This is an invasive change that affects all consumers of raft, and it is of lower urgency since the code (post this commit) already encapsulates the complexity sufficiently.

We are worried about breaking backwards compatibility for any application out there that may have relied on the old behavior. Their RawNode invocation would have been broken by the removal of the peers argument so it would not have changed silently; an associated comment tells callers how to fix it.

tbg · 2019-07-19T08:03:11Z

I think that since the first commit reduces the duplication so the bootstrapping only lives in one place, it's probably best if that one place continues to be in RawNode instead of StartNode, at least until the anticipated helpers are available.

Makes sense. I didn't want to bring this mostly-useless extra arg back though, so I added RawNode.Bootstrap which is now called from StartNode. A comment on NewRawNode indicates to callers that calling Bootstrap(peers) replaces the previous peers parameter.

It has a data race between the test's call to `reduceUncommittedSize` and a corresponding call during Ready handling in `(*node).run()`. The corresponding RawNode test still verifies the functionality, so instead of fixing the test we can remove it.

I changed `(*RawNode).Ready`'s behavior in etcd-io#10892 in a problematic way. Previously, `Ready()` would create and immediately "accept" a Ready (i.e. commit the app to actually handling it). In etcd-io#10892, Ready() became a pure read-only operation and the "accepting" was moved to `Advance(rd)`. As a result it was illegal to use the RawNode in certain ways while the Ready was being handled. Failure to do so would result in dropped messages (and perhaps worse). For example, with the following operations 1. `rd := rawNode.Ready()` 2. `rawNode.Step(someMsg)` 3. `rawNode.Advance(rd)` `someMsg` would be dropped, because `Advance()` would clear out the outgoing messages thinking that they had all been handled by the client. I mistakenly assumed that this restriction had existed prior, but this is incorrect. I noticed this while trying to pick up the above PR in CockroachDB, where it caused unit test failures, precisely due to the above example. This PR reestablishes the previous behavior (result of `Ready()` must be handled by the app) and adds a regression test. While I was there, I carried out a few small clarifying refactors.

tbg mentioned this pull request Jul 16, 2019

enabling reviewable etcd-io/maintainers#13

Open

tbg force-pushed the rawnode-everywhere-attempt3 branch from 4d598d8 to ecf60c2 Compare July 16, 2019 09:08

tbg requested a review from bdarnell July 18, 2019 08:30

tbg force-pushed the rawnode-everywhere-attempt3 branch from ecf60c2 to 069125b Compare July 18, 2019 11:42

tbg added 3 commits July 19, 2019 09:59

tbg force-pushed the rawnode-everywhere-attempt3 branch from 0aadfa8 to 500af91 Compare July 19, 2019 08:02

tbg merged commit 3c5e2f5 into etcd-io:master Jul 19, 2019

tbg deleted the rawnode-everywhere-attempt3 branch July 19, 2019 12:30

tbg mentioned this pull request Jul 23, 2019

raft: require app to consume result from Ready() #10920

Merged

absolute8511 added a commit to youzan/ZanRedisDB that referenced this pull request Jul 8, 2021

feat: merge optimize from etcd etcd-io/etcd#10892 & etcd-io/etcd#10920

79f1c52

Param-S mentioned this pull request Feb 7, 2022

Protobuf and etcd upgrade hyperledger/fabric#2997

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

raft: use RawNode for node's event loop; clean up bootstrap #10892

raft: use RawNode for node's event loop; clean up bootstrap #10892

tbg commented Jul 15, 2019 •

edited

Loading

bdarnell commented Jul 18, 2019

tbg commented Jul 19, 2019 •

edited

Loading

raft: use RawNode for node's event loop; clean up bootstrap #10892

raft: use RawNode for node's event loop; clean up bootstrap #10892

Conversation

tbg commented Jul 15, 2019 • edited Loading

bdarnell commented Jul 18, 2019

tbg commented Jul 19, 2019 • edited Loading

tbg commented Jul 15, 2019 •

edited

Loading

tbg commented Jul 19, 2019 •

edited

Loading