Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

raft: Introduce CommittedEntries pagination #9982

Merged
merged 2 commits into from
Aug 11, 2018

Conversation

bdarnell
Copy link
Contributor

@bdarnell bdarnell commented Aug 6, 2018

The MaxSizePerMsg setting is now used to limit the size of
Ready.CommittedEntries. This prevents out-of-memory errors if the raft
log has become very large and commits all at once.

I just used the same size limit as we have for MsgApp, although I'd be willing to make it a separate setting if you think it would be useful to configure them separately.

Ensure that this limit is respected when generating MsgApp messages.
@xiang90 xiang90 self-assigned this Aug 6, 2018
@codecov-io
Copy link

Codecov Report

Merging #9982 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #9982      +/-   ##
==========================================
+ Coverage   69.34%   69.35%   +<.01%     
==========================================
  Files         386      386              
  Lines       35914    35919       +5     
==========================================
+ Hits        24905    24912       +7     
+ Misses       9212     9209       -3     
- Partials     1797     1798       +1
Impacted Files Coverage Δ
raft/raft.go 91.96% <ø> (+0.69%) ⬆️
raft/log.go 79.12% <100%> (+0.11%) ⬆️
raft/node.go 90.98% <100%> (-1.45%) ⬇️
pkg/netutil/netutil.go 63.11% <0%> (-6.56%) ⬇️
pkg/logutil/zap_grpc.go 47.61% <0%> (-4.77%) ⬇️
pkg/transport/listener.go 58.67% <0%> (-4.09%) ⬇️
mvcc/watcher.go 96.29% <0%> (-3.71%) ⬇️
clientv3/concurrency/election.go 79.52% <0%> (-2.37%) ⬇️
pkg/adt/interval_tree.go 90.39% <0%> (-0.91%) ⬇️
... and 17 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 93be31d...84c537d. Read the comment docs.

raft/log.go Outdated
}

// newLog returns log using the given storage. It recovers the log to the state
// that it just commits and applies the latest snapshot.
func newLog(storage Storage, logger Logger) *raftLog {
func newLog(storage Storage, logger Logger, maxMsgSize uint64) *raftLog {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe provide another function newLogWithSize, so we don't need to change origin newLog in many places?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

The MaxSizePerMsg setting is now used to limit the size of
Ready.CommittedEntries. This prevents out-of-memory errors if the raft
log has become very large and commits all at once.
@xiang90
Copy link
Contributor

xiang90 commented Aug 8, 2018

lgtm. defer to @siddontang

@bdarnell
Copy link
Contributor Author

Any more comments @siddontang ?

@siddontang
Copy link
Contributor

LGTM

@xiang90 xiang90 merged commit 11dd0b5 into etcd-io:master Aug 11, 2018
bdarnell added a commit to cockroachdb/vendored that referenced this pull request Aug 13, 2018
Picks up etcd-io/etcd#9982 and etcd-io/etcd#9985 (and no other changes
to packages we use).
bdarnell added a commit to bdarnell/cockroach that referenced this pull request Aug 13, 2018
Picks up etcd-io/etcd#9982 and etcd-io/etcd#9985 (and no other changes
to packages we use).

Fixes cockroachdb#27983
Fixes cockroachdb#27804

Release note (bug fix): Additional fixes for out-of-memory errors
caused by very large raft logs.

Release note (performance improvement): Greatly improved performance
when catching up followers that are behind when raft logs are large.
craig bot pushed a commit to cockroachdb/cockroach that referenced this pull request Aug 13, 2018
28511: vendor: Update etcd r=tschottdorf a=bdarnell

Picks up etcd-io/etcd#9982 and etcd-io/etcd#9985 (and no other changes
to packages we use).

Fixes #27983
Fixes #27804

Release note (bug fix): Additional fixes for out-of-memory errors
caused by very large raft logs.

Release note (performance improvement): Greatly improved performance
when catching up followers that are behind when raft logs are large.

Co-authored-by: Ben Darnell <ben@bendarnell.com>
tbg added a commit to tbg/etcd that referenced this pull request Sep 4, 2018
In etcd-io#9982, a mechanism to limit the size of `CommittedEntries` was
introduced. The way this mechanism works was that it would load
applicable entries (passing the max size hint) and would emit a
`HardState` whose commit index was truncated to match the limitation
applied to the entries. Unfortunately, this was subtly incorrect
when the user-provided `Entries` implementation didn't exactly
match what Raft uses internally. Depending on whether a `Node` or
a `RawNode` was used, this would either lead to regressing the
HardState's commit index or outright forgetting to apply entries,
respectively.

Asking implementers to precisely match the Raft size limitation
semantics was considered but looks like a bad idea as it puts
correctness squarely in the hands of downstream users. Instead, this
PR removes the truncation of `HardState` when limiting is active
and tracks the applied index separately. This removes the old
paradigm (that the previous code tried to work around) that the
client will always apply all the way to the commit index, which
isn't true when commit entries are paginated.

See [1] for more on the discovery of this bug (CockroachDB's
implementation of `Entries` returns one more entry than Raft's when the
size limit hits).

[1]: cockroachdb/cockroach#28918 (comment)
tbg added a commit to tbg/etcd that referenced this pull request Sep 4, 2018
In etcd-io#9982, a mechanism to limit the size of `CommittedEntries` was
introduced. The way this mechanism worked was that it would load
applicable entries (passing the max size hint) and would emit a
`HardState` whose commit index was truncated to match the limitation
applied to the entries. Unfortunately, this was subtly incorrect
when the user-provided `Entries` implementation didn't exactly
match what Raft uses internally. Depending on whether a `Node` or
a `RawNode` was used, this would either lead to regressing the
HardState's commit index or outright forgetting to apply entries,
respectively.

Asking implementers to precisely match the Raft size limitation
semantics was considered but looks like a bad idea as it puts
correctness squarely in the hands of downstream users. Instead, this
PR removes the truncation of `HardState` when limiting is active
and tracks the applied index separately. This removes the old
paradigm (that the previous code tried to work around) that the
client will always apply all the way to the commit index, which
isn't true when commit entries are paginated.

See [1] for more on the discovery of this bug (CockroachDB's
implementation of `Entries` returns one more entry than Raft's when the
size limit hits).

[1]: cockroachdb/cockroach#28918 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

4 participants