Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make ping cache rate limit delay configurable #27955

Merged
merged 8 commits into from
Sep 26, 2022

Conversation

jbiseda
Copy link
Contributor

@jbiseda jbiseda commented Sep 21, 2022

Problem

If a ping or subsequent pong packet is not received or is dropped PingCache rate limit delay will not create a new ping packet until the rate limit delay has been reached.
Ping cache uses a rate limit delay of TTL / 64. This should be configurable.

Summary of Changes

  • Add configurable rate limit delay parameter to PingCache
  • Set repair ping cache rate limit to 500ms The delay between repair cycles in RepairService::run is 100ms.

Fixes #

@jbiseda jbiseda marked this pull request as ready for review September 24, 2022 01:54
Comment on lines 79 to 80
// Rate limit to 4x the repair request iteration delay
const REPAIR_PING_CACHE_RATE_LIMIT_DELAY: Duration = Duration::from_millis(4 * REPAIR_MS);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

400ms here sounds very low to me. the packet has to do a round trip and be processed at both nodes.

Also what is the reasoning to define this in terms of REPAIR_MS? i.e why should changing REPAIR_MS change this value too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decoupled the delay from REPAIR_MS. I'm using a small multiple of REPAIR_MS because the repair request side is only waiting REPAIR_MS before sending the next batch of repair requests. If a ping or pong is lost the request side will not be able to successfully get a repair processed until the rate_limit_delay expires.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, but if that is like a few seconds it shouldn't do much harm and also the delay should at least be long enough for:

  • the ping is serialized and sent to the socket.
  • the packet travels to the other node
  • the other node processes the packet, and sends back the pong response
  • the packet travels back to this node
  • this node processes the packet and register the response.

Otherwise each time we are sending 2 ping packets.
Picking the delay shorter than above, wouldn't make things faster; it would only add overhead.

US-Europe round-trip traversal for a packet seems to be ~400ms at least based on some numbers I had seen previously. And that is just the packet traversal and not accounting for any processing latencies.

I think we can set this delay to >= 2 seconds or something like that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Increased to 2s.

Comment on lines 1112 to 1113
Duration::from_secs(20 * 60), // ttl
Duration::from_secs(20 * 60) / 64, // ttl
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both lines are commented as ttl.
also would be much better to use rate_limit_delay instead of delay, so that it would be more clear and easier to find all references across files.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missed that one, updated to rate_limit_delay

Comment on lines 158 to 160
pub fn new(ttl: Duration, rate_limit_delay: Duration, cap: usize) -> Self {
Self {
ttl,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably would be good to put an assert here that the rate_limit_delay is sufficiently smaller than ttl.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added an assert for sanity checking

Copy link
Contributor

@behzadnouri behzadnouri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@jbiseda jbiseda merged commit 8b0f9b4 into solana-labs:master Sep 26, 2022
mergify bot pushed a commit that referenced this pull request Sep 26, 2022
@jbiseda
Copy link
Contributor Author

jbiseda commented Sep 26, 2022

@mergify backport v1.14

@mergify
Copy link
Contributor

mergify bot commented Sep 26, 2022

backport v1.14

✅ Backports have been created

brooksprumo pushed a commit that referenced this pull request Sep 27, 2022
mergify bot added a commit that referenced this pull request Sep 27, 2022
make ping cache rate limit delay configurable (#27955)

(cherry picked from commit 8b0f9b4)

Co-authored-by: Jeff Biseda <jbiseda@gmail.com>
@jbiseda jbiseda added the v1.13 label Oct 26, 2022
mergify bot pushed a commit that referenced this pull request Oct 26, 2022
(cherry picked from commit 8b0f9b4)

# Conflicts:
#	core/src/serve_repair.rs
mergify bot added a commit that referenced this pull request Oct 26, 2022
* make ping cache rate limit delay configurable (#27955)

(cherry picked from commit 8b0f9b4)

# Conflicts:
#	core/src/serve_repair.rs

* resolve merge conflicts

Co-authored-by: Jeff Biseda <jbiseda@gmail.com>
@jbiseda jbiseda deleted the ping-cache-rate-limit-config branch February 1, 2023 00:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants