Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

raft: add MaxInflightBytes limit to the outgoing messages flow #14624

Merged
merged 5 commits into from
Nov 14, 2022

Conversation

pav-kv
Copy link
Contributor

@pav-kv pav-kv commented Oct 25, 2022

Raft flow control limits the message size and the number of inflight messages. However, a single large entry that exceeds the size limit can still be sent. In combination with the max messages count limit, many large messages can be sent in a row and overflow the receiver. In effect, the "max" size value acts as "target" which can be missed.

This PR adds an additional soft limit on the total size of inflight messages, which catches such situations and prevents the receiver overflow.

@codecov-commenter
Copy link

codecov-commenter commented Oct 25, 2022

Codecov Report

Merging #14624 (0ef5df1) into main (0e4bf4a) will decrease coverage by 0.01%.
The diff coverage is 91.42%.

@@            Coverage Diff             @@
##             main   #14624      +/-   ##
==========================================
- Coverage   75.47%   75.45%   -0.02%     
==========================================
  Files         457      457              
  Lines       37320    37330      +10     
==========================================
+ Hits        28166    28168       +2     
- Misses       7376     7385       +9     
+ Partials     1778     1777       -1     
Flag Coverage Δ
all 75.45% <91.42%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
raft/raft.go 89.39% <80.00%> (-0.20%) ⬇️
raft/confchange/confchange.go 82.06% <100.00%> (ø)
raft/tracker/inflights.go 100.00% <100.00%> (+4.16%) ⬆️
raft/tracker/progress.go 95.91% <100.00%> (ø)
raft/tracker/tracker.go 95.57% <100.00%> (+0.03%) ⬆️
server/storage/mvcc/watchable_store.go 84.42% <0.00%> (-7.61%) ⬇️
client/v3/concurrency/session.go 88.63% <0.00%> (-4.55%) ⬇️
server/etcdserver/api/v3rpc/interceptor.go 76.56% <0.00%> (-3.13%) ⬇️
server/etcdserver/api/v3rpc/watch.go 82.53% <0.00%> (-2.23%) ⬇️
client/v3/retry_interceptor.go 66.36% <0.00%> (-0.91%) ⬇️
... and 13 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@pav-kv
Copy link
Contributor Author

pav-kv commented Oct 26, 2022

@tbg

raft/tracker/inflights.go Outdated Show resolved Hide resolved
@pav-kv pav-kv force-pushed the limit_inflight_bytes branch 2 times, most recently from d1f3221 to 626be8a Compare October 26, 2022 12:29
@pav-kv pav-kv marked this pull request as ready for review October 26, 2022 12:38
@tbg tbg self-requested a review October 26, 2022 13:08
raft/raft.go Outdated Show resolved Hide resolved
Copy link
Contributor

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have only skimmed this now, I think this will be close to ready when the precursor PR is in. 👍🏽

raft/raft.go Show resolved Hide resolved
@pav-kv pav-kv marked this pull request as draft November 1, 2022 23:06
@pav-kv pav-kv force-pushed the limit_inflight_bytes branch 2 times, most recently from 637cd6a to cd3de31 Compare November 4, 2022 14:00
@pav-kv pav-kv marked this pull request as ready for review November 4, 2022 14:02
@pav-kv
Copy link
Contributor Author

pav-kv commented Nov 4, 2022

@tbg I rebased this on top of #14633 which is approved and nearly done, so this PR is ready for review too, just ignore the first 6 commits.

@ahrtr ahrtr added the area/raft label Nov 7, 2022
raft/confchange/confchange.go Outdated Show resolved Hide resolved
raft/tracker/inflights.go Outdated Show resolved Hide resolved
raft/tracker/inflights.go Show resolved Hide resolved
raft/tracker/inflights.go Outdated Show resolved Hide resolved
raft/tracker/progress.go Show resolved Hide resolved
@tbg
Copy link
Contributor

tbg commented Nov 7, 2022

cc @ahrtr you probably want to take a look as well.

@ahrtr
Copy link
Member

ahrtr commented Nov 7, 2022

cc @ahrtr you probably want to take a look as well.

I see there are some duplicated code with #14633, so let's get that one resolved firstly.

@pav-kv pav-kv force-pushed the limit_inflight_bytes branch 2 times, most recently from 615df18 to 30c9fdb Compare November 8, 2022 23:04
raft/tracker/inflights.go Outdated Show resolved Hide resolved
@ahrtr
Copy link
Member

ahrtr commented Nov 9, 2022

Overall looks good to me.

Probably we should add a check: the MaxInflightBytes shouldn't be smaller than MaxSizePerMsg.

@ahrtr
Copy link
Member

ahrtr commented Nov 9, 2022

Please also add two changelog items, one for this feature, and the other for #14633

@pav-kv
Copy link
Contributor Author

pav-kv commented Nov 9, 2022

Probably we should add a check: the MaxInflightBytes shouldn't be smaller than MaxSizePerMsg.

@ahrtr I'm on the fence here, since both are soft limits that can be violated slightly anyway, and both are used independently. If they were hard limits, it would make more sense, because it would be possible to not send anything if the sizes are misconfigured.

Other than this, I addressed all comments. Want to take another look?

@pav-kv
Copy link
Contributor Author

pav-kv commented Nov 9, 2022

Maybe we should not overload maxSize = 0 in places other than the config. Instead, we could reset MaxInflightBytes to noLimit in the validate func, similarly to how it's done for MaxUncommittedEntriesSize:

etcd/raft/raft.go

Lines 228 to 230 in 215b799

if c.MaxUncommittedEntriesSize == 0 {
c.MaxUncommittedEntriesSize = noLimit
}

@tbg @ahrtr Thoughts?

@tbg
Copy link
Contributor

tbg commented Nov 9, 2022

Maybe we should not overload maxSize = 0 in places other than the config.

Sounds good to me as a pattern to follow, and since there's precedent we should be consistent.

Probably we should add a check: the MaxInflightBytes shouldn't be smaller than MaxSizePerMsg.

@pavelkalinnikov while you're here mind giving MaxSizePerMsg a similar comment to the one we wrote for MaxInflightBytes (bandwidth-delay product etc), and make sure they cross-reference each other?

I think @ahrtr has a point.

etcd/raft/raft.go

Lines 460 to 462 in 215b799

if pr.State != tracker.StateReplicate || !pr.Inflights.Full() {
ents, erre = r.raftLog.entries(pr.Next, r.maxMsgSize)
}

It is very likely a misconfiguration to have MaxSizePerMsg >> MaxInflightBytes; I don't see why anyone would set that in practice. It is true that MaxInflightBytes can overshoot, but in an ideal world it would overshoot "minimally", i.e. by exactly one entry.

The code above could be improved: rather than pulling entries until the sum is >= MaxSizePerMsg and then shoving that into the tracker, more "optimal" code would be this (mod bugs maybe :-) )

	var maxBytes uint64
	if pr.State == tracker.StateReplicate {
		maxBytes = min(r.maxMsgSize, pr.Inflights.BytesBudget())
	} else {
		// When follower is being probed, send only one entry.
		// TODO: should we send *no* entries? Or maybe the optimistic
		// strategy of sending maxMsgSize worth of entries is actually
		// better in practice? We've seen some *very divergent* raft logs
		// in CRDB incidents so I'm inclined to be conservative here and
		// send as little data as possible.
		maxBytes = 1
	}
	var ents []pb.Entry
	var erre error
	// In a throttled StateReplicate only send empty MsgApp, to ensure progress.
	// Otherwise, if we had a full Inflights and all inflight messages were in
	// fact dropped, replication to that follower would stall. Instead, an empty
	// MsgApp will eventually reach the follower (heartbeats responses prompt the
	// leader to send an append), allowing it to be acked or rejected, both of
	// which will clear out Inflights.
	if maxBytes > 0 {
		ents, erre = r.raftLog.entries(pr.Next, r.maxMsgSize)
	}

	if len(ents) == 0 && !sendIfEmpty {
		return false
	}

and at that point it should be clear that it makes no sense to set maxMsgSize to anything larger than MaxInflightBytes.

CHANGELOG/CHANGELOG-3.6.md Outdated Show resolved Hide resolved
@pav-kv pav-kv force-pushed the limit_inflight_bytes branch 2 times, most recently from e1cb79e to 282b773 Compare November 10, 2022 21:17
@pav-kv
Copy link
Contributor Author

pav-kv commented Nov 10, 2022

@tbg Added a couple of checks in the config, as suggested by you and @ahrtr.

@tbg I like this bit:

if pr.State == tracker.StateReplicate {
	maxBytes = min(r.maxMsgSize, pr.Inflights.BytesBudget())
} else {

The maxBytes = 1 bit in the probing state would change the current behaviour, so I would defer it to another PR / thread of thinking, and for now only do:

maxBytes := r.maxMsgSize
if pr.State == tracker.StateReplicate {
	maxBytes = min(maxBytes, pr.Inflights.BytesBudget())
}
// ...
if maxBytes > 0 {
	ents, erre = r.raftLog.entries(pr.Next, maxBytes)
}

Although I think it would break the edge case of MaxSizePerMsg == 0. Currently it results in always sending one entry, but with the maxBytes > 0 guard we would rule it out, and never send any entries. Slightly inclined to leave as is, but will poke at the BytesBudget() idea in general tomorrow.

@tbg
Copy link
Contributor

tbg commented Nov 13, 2022

Although I think it would break the edge case of MaxSizePerMsg == 0

Could we map 0 -> 1 when installing the config?

Slightly inclined to leave as is, but will poke at the BytesBudget() idea in general tomorrow.

Sounds good, let's do a follow-up soon while we remember this piece of code.

The Inflights type has limits on the message size and the number of inflight
messages. However, a single large entry that exceeds the size limit can still
be sent. In combination with the max messages count limit, many large messages
can be sent in a row and overflow the receiver. In effect, the "max" values act
as "target" rather than hard limits.

This commit adds an additional soft limit on the total size of inflight
messages, which catches such situations and prevents the receiver overflow.

Signed-off-by: Pavel Kalinnikov <pavel@cockroachlabs.com>
This commit plumbs the max total byte size of the Inflights type higher up the
stack to the ProgressTracker.

Signed-off-by: Pavel Kalinnikov <pavel@cockroachlabs.com>
Signed-off-by: Pavel Kalinnikov <pavel@cockroachlabs.com>
This commit introduces the max inflight bytes setting at the Config level, and
tests that raft flow control honours it.

Signed-off-by: Pavel Kalinnikov <pavel@cockroachlabs.com>
Signed-off-by: Pavel Kalinnikov <pavel@cockroachlabs.com>
Copy link
Member

@ahrtr ahrtr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Thank yo @pavelkalinnikov

@tbg tbg merged commit e64d644 into etcd-io:main Nov 14, 2022
@pav-kv pav-kv deleted the limit_inflight_bytes branch November 14, 2022 11:17
@pav-kv pav-kv restored the limit_inflight_bytes branch November 14, 2022 11:23
@pav-kv pav-kv deleted the limit_inflight_bytes branch November 14, 2022 11:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

4 participants