reduce cache retry load #23025
Merged
Conversation
fspmarshall force-pushed the fspmarshall/reduce-cache-retry-load branch from 70b9d2c to c1266b8 on March 14, 2023 04:55
espadolini reviewed on Mar 14, 2023:
> Wouldn't hurt to rerun the GHA tests a few extra times on this, as these types of changes have caused flakiness in the past.
fspmarshall force-pushed the fspmarshall/reduce-cache-retry-load branch from c1266b8 to 09c8df4 on March 20, 2023 17:56
fspmarshall force-pushed the fspmarshall/reduce-cache-retry-load branch from 09c8df4 to 6162d76 on March 20, 2023 18:25
espadolini approved these changes on Mar 20, 2023
gabrielcorado approved these changes on Mar 20, 2023
fspmarshall force-pushed the fspmarshall/reduce-cache-retry-load branch from 6162d76 to 8fe0141 on March 20, 2023 19:55
justinas pushed a commit that referenced this pull request on Apr 18, 2023:
> * add exponential backoff * improve cache backoff
justinas added a commit that referenced this pull request on Apr 18, 2023
Introduces a new exponential backoff type and applies it to non-control-plane caches. The intent of this change is to help us better reduce thundering herd effects in large clusters (10k+ agents).

The new exponential backoff starts with comparably sized delays but escalates much more quickly on recurring errors. We also increase the maximum retry delay for non-control-plane caches from 90s to 256s, though most users will actually experience an increase from 60s to 256s, since the switch from 60s to 90s was very recent.
The old standard retry behavior resulted in an approximate sequence of

0s, 12s, 24s, 36s, 48s, 60s

for all instances. The new behavior as of this change results in an approximate sequence of

0s, 5s, 10s, 20s, 40s, 80s, 90s

for control-plane elements and an approximate sequence of

0s, 16s, 32s, 64s, 128s, 256s
for peripheral agents.

In addition to the primary behavioral changes described above, this PR also makes two more minor changes. Caches now allow their full max backoff for the init event to show up instead of a fixed one minute (slow init event propagation has been observed as a source of excessive cache re-init errors in high-load clusters). The maximum backoff used by the cache is now also configurable.
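The configuration snippet referenced here did not survive extraction. As a sketch of how such a knob might be exposed in a `teleport.yaml` cache section (the `max_backoff` key name and placement are assumptions for illustration, not confirmed by this page):

```yaml
teleport:
  cache:
    enabled: true
    # Maximum cache retry backoff; key name is an assumption for illustration.
    max_backoff: 256s
```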
While the default value of 256s is probably sufficiently large for most clusters, very large clusters might benefit from bumping this even higher.