make ping cache rate limit delay configurable #27955
core/src/serve_repair.rs
Outdated
// Rate limit to 4x the repair request iteration delay
const REPAIR_PING_CACHE_RATE_LIMIT_DELAY: Duration = Duration::from_millis(4 * REPAIR_MS);
400ms here sounds very low to me. The packet has to do a round trip and be processed at both nodes.
Also, what is the reasoning for defining this in terms of `REPAIR_MS`? I.e., why should changing `REPAIR_MS` change this value too?
I decoupled the delay from `REPAIR_MS`. I'm using a small multiple of `REPAIR_MS` because the repair request side only waits `REPAIR_MS` before sending the next batch of repair requests. If a ping or pong is lost, the request side will not be able to get a repair processed until the `rate_limit_delay` expires.
Yeah, but if that is a few seconds it shouldn't do much harm, and the delay should at least be long enough for all of the following:
- the ping is serialized and sent to the socket
- the packet travels to the other node
- the other node processes the packet and sends back the pong response
- the packet travels back to this node
- this node processes the packet and registers the response
Otherwise we end up sending 2 ping packets each time.
Picking a delay shorter than the above wouldn't make things faster; it would only add overhead.
The US-Europe round trip for a packet seems to be ~400ms at least, based on some numbers I have seen previously. And that is just the packet traversal, not accounting for any processing latencies.
I think we can set this delay to >= 2 seconds or something like that.
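The point about duplicate pings can be illustrated with a small sketch. `PingLimiter` and `pings_before_pong` are hypothetical names invented for this example and are not the actual `PingCache` code; the 500ms round trip is an assumed figure (roughly 400ms of traversal plus some processing), and the 100ms retry interval matches the `REPAIR_MS` value implied by the 400ms figure above.

```rust
use std::time::{Duration, Instant};

/// Hypothetical, simplified stand-in for the rate-limit check
/// (not the real `PingCache`).
struct PingLimiter {
    rate_limit_delay: Duration,
    last_ping: Option<Instant>,
}

impl PingLimiter {
    fn new(rate_limit_delay: Duration) -> Self {
        Self { rate_limit_delay, last_ping: None }
    }

    /// Returns true if a new ping may be sent at `now`, and records it.
    fn maybe_ping(&mut self, now: Instant) -> bool {
        let allowed = match self.last_ping {
            None => true,
            Some(t) => now.saturating_duration_since(t) >= self.rate_limit_delay,
        };
        if allowed {
            self.last_ping = Some(now);
        }
        allowed
    }
}

/// Counts the pings sent before the pong can arrive, assuming the
/// requester retries every `retry` and the pong takes `round_trip`.
fn pings_before_pong(
    rate_limit_delay: Duration,
    retry: Duration,
    round_trip: Duration,
) -> usize {
    let start = Instant::now();
    let mut limiter = PingLimiter::new(rate_limit_delay);
    let mut sent = 0;
    let mut elapsed = Duration::ZERO;
    while elapsed < round_trip {
        if limiter.maybe_ping(start + elapsed) {
            sent += 1;
        }
        elapsed += retry;
    }
    sent
}
```

Under these assumptions, a 400ms rate limit delay lets a second ping go out before the first pong can return, while a 2s delay sends only one.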
Increased to 2s.
gossip/src/crds_gossip_pull.rs
Outdated
Duration::from_secs(20 * 60), // ttl
Duration::from_secs(20 * 60) / 64, // ttl
Both lines are commented as `ttl`.
Also, it would be much better to use `rate_limit_delay` instead of `delay`, so that it is clearer and easier to find all references across files.
Missed that one; updated to `rate_limit_delay`.
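For reference, the two values in the gossip diff above work out as follows. This is only an arithmetic sketch; the constant and function names are made up for illustration.

```rust
use std::time::Duration;

/// TTL shown in the gossip diff above (hypothetical constant name).
const GOSSIP_PING_CACHE_TTL: Duration = Duration::from_secs(20 * 60);

/// The previously implicit rate limit delay, TTL / 64
/// (hypothetical helper, for illustration only).
fn default_rate_limit_delay(ttl: Duration) -> Duration {
    ttl / 64
}
```

With a 20 minute TTL this gives 18.75 seconds, far longer than the 100ms repair request iteration, which is why a single fixed ratio does not suit both gossip and repair.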
gossip/src/ping_pong.rs
Outdated
pub fn new(ttl: Duration, rate_limit_delay: Duration, cap: usize) -> Self {
    Self {
        ttl,
It would probably be good to put an `assert` here that the `rate_limit_delay` is sufficiently smaller than `ttl`.
Added an `assert` for sanity checking.
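A minimal sketch of what such a sanity check could look like. `PingCacheConfig` is a hypothetical, trimmed-down type, and the `ttl / 2` threshold is an assumption chosen for illustration; the thread does not state the exact condition used in the merged code.

```rust
use std::time::Duration;

/// Hypothetical, trimmed-down config; the real `PingCache` holds more state.
#[derive(Debug)]
struct PingCacheConfig {
    ttl: Duration,
    rate_limit_delay: Duration,
    cap: usize,
}

impl PingCacheConfig {
    fn new(ttl: Duration, rate_limit_delay: Duration, cap: usize) -> Self {
        // Assumed threshold: the review only asks for "sufficiently smaller",
        // so requiring delay <= ttl / 2 is one illustrative choice.
        assert!(
            rate_limit_delay <= ttl / 2,
            "rate_limit_delay ({rate_limit_delay:?}) must be well below ttl ({ttl:?})"
        );
        Self { ttl, rate_limit_delay, cap }
    }
}
```

Panicking in the constructor surfaces a misconfigured delay at startup instead of as a cache that silently expires entries before it would ever ping again.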
lgtm
(cherry picked from commit 8b0f9b4)
@mergify backport v1.14
✅ Backports have been created
(cherry picked from commit 8b0f9b4) # Conflicts: # core/src/serve_repair.rs
Problem
If a ping or the subsequent pong packet is dropped or otherwise not received, `PingCache` will not create a new ping packet until the rate limit delay has elapsed. `PingCache` uses a rate limit delay of `TTL / 64`. This should be configurable.
Summary of Changes
PingCache
RepairService::run
is 100ms.
Fixes #