-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
shared_token_bucket: Make duration->tokens conversion more solid #1723
Merged
avikivity
merged 1 commit into
scylladb:master
from
xemul:br-shared-token-bucket-fix-replenish-rounding
Aug 7, 2023
Merged
shared_token_bucket: Make duration->tokens conversion more solid #1723
avikivity
merged 1 commit into
scylladb:master
from
xemul:br-shared-token-bucket-fix-replenish-rounding
Aug 7, 2023
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The t.b. should be replenished from time to time. When replenishing happens it calcualtes the duration between now and last replenishment and converts it to the amount of gained tokens. Since duration is a floating point number and tokens are integers, the convertion may lose some precision, e.g. over a time the bucket mush accumulate 2.718 tokens, but it can only accumulate 2 or 3. Current code rounds the floating point result to integer in the hope that rounding errors compenate each other on average. However, if the assumption doesn't hold, those deviation may accumulate resulting in wrong replenishing rate. This effects is additionally compensated by the minimal amount of tokens to replenish (called threshold), but it still exists. There's a simple yet robust fix to that. With the integer number of tokens at hand convert it back to the duration it _had_ to be to get this exact amount of tokens and update the "last time replenished" with the corrected duration. refs: scylladb#1641 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
xemul
added a commit
to xemul/seastar
that referenced
this pull request
Jul 6, 2023
The test checks if the token-bucket "rate" is held under various circumstances: - when shards sleep between grabbing tokens - when shards poll the t.b. frequently - when shards are disturbed with CPU hogs So far the test shows three problems: - With few shards tokens deficiency produces zero sleep time, so the "good" user that sleeps between grabs effectively converts into a polling ("bad") user (fixed by scylladb#1722) - Sometimes replenishing rounding errors accumulate and render lower resulting rate than configured (fixed by scylladb#1723) - When run with CPU hogs the individual shard's rates may differ too much (see scylladb#1083). E.g. the bucket configured with the rate of 100k tokens/sec, 48 shards, run 4 seconds. "Slowest" shard vs "fastest" shards get this amount of tokens: no hog: 6931 ... 9631 with hog: 2135 ... 29412 (sum rate is 100k with the aforementioned fixes) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
@avikivity , please review |
1 similar comment
@avikivity , please review |
xemul
added a commit
to xemul/seastar
that referenced
this pull request
Jul 29, 2023
The test checks if the token-bucket "rate" is held under various circumstances: - when shards sleep between grabbing tokens - when shards poll the t.b. frequently - when shards are disturbed with CPU hogs So far the test shows four problems: - With few shards tokens deficiency produces zero sleep time, so the "good" user that sleeps between grabs effectively converts into a polling ("bad") user (fixed by scylladb#1722) - Sometimes replenishing rounding errors accumulate and render lower resulting rate than configured (fixed by scylladb#1723) - When run with CPU hogs the individual shard's rates may differ too much (see scylladb#1083). E.g. the bucket configured with the rate of 100k tokens/sec, 48 shards, run 4 seconds. "Slowest" shard vs "fastest" shards get this amount of tokens: no hog: 6931 ... 9631 with hog: 2135 ... 29412 (sum rate is 100k with the aforementioned fixes) - With "capped-release" token bucket and token releasing by-timer with the configured rate and hogs the resulting throughput can be as low as 30% of the configured (see scylladb#1641) Created token-bucket 1000000.0 t/s perf_pure_context.sleeping_throughput_with_hog: 966646.1 t/s perf_capped_context.sleeping_throughput: 838035.2 t/s perf_pure_context.sleeping_throughput_with_hog: 317685.3 t/s Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
xemul
added a commit
to xemul/seastar
that referenced
this pull request
Jul 29, 2023
The test checks if the token-bucket "rate" is held under various circumstances: - when shards sleep between grabbing tokens - when shards poll the t.b. frequently - when shards are disturbed with CPU hogs So far the test shows four problems: - With few shards tokens deficiency produces zero sleep time, so the "good" user that sleeps between grabs effectively converts into a polling ("bad") user (fixed by scylladb#1722) - Sometimes replenishing rounding errors accumulate and render lower resulting rate than configured (fixed by scylladb#1723) - When run with CPU hogs the individual shard's rates may differ too much (see scylladb#1083). E.g. the bucket configured with the rate of 100k tokens/sec, 48 shards, run 4 seconds. "Slowest" shard vs "fastest" shards get this amount of tokens: no hog: 6931 ... 9631 with hog: 2135 ... 29412 (sum rate is 100k with the aforementioned fixes) - With "capped-release" token bucket and token releasing by-timer with the configured rate and hogs the resulting throughput can be as low as 30% of the configured (see scylladb#1641) Created token-bucket 1000000.0 t/s perf_pure_context.sleeping_throughput_with_hog: 966646.1 t/s perf_capped_context.sleeping_throughput: 838035.2 t/s perf_capped_context.sleeping_throughput_with_hog: 317685.3 t/s Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
xemul
added a commit
to xemul/seastar
that referenced
this pull request
Jul 31, 2023
The test checks if the token-bucket "rate" is held under various circumstances: - when shards sleep between grabbing tokens - when shards poll the t.b. frequently - when shards are disturbed with CPU hogs So far the test shows four problems: - With few shards tokens deficiency produces zero sleep time, so the "good" user that sleeps between grabs effectively converts into a polling ("bad") user (fixed by scylladb#1722) - Sometimes replenishing rounding errors accumulate and render lower resulting rate than configured (fixed by scylladb#1723) - When run with CPU hogs the individual shard's rates may differ too much (see scylladb#1083). E.g. the bucket configured with the rate of 100k tokens/sec, 48 shards, run 4 seconds. "Slowest" shard vs "fastest" shards get this amount of tokens: no hog: 6931 ... 9631 with hog: 2135 ... 29412 (sum rate is 100k with the aforementioned fixes) - With "capped-release" token bucket and token releasing by-timer with the configured rate and hogs the resulting throughput can be as low as 50% of the configured (see scylladb#1641) Created token-bucket 1000000.0 t/s perf_pure_context.sleeping_throughput_with_hog: 999149.3 t/s perf_capped_context.sleeping_throughput: 859995.9 t/s perf_capped_context.sleeping_throughput_with_hog: 512912.0 t/s Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
xemul
added a commit
to xemul/seastar
that referenced
this pull request
Jul 31, 2023
The test checks if the token-bucket "rate" is held under various circumstances: - when shards sleep between grabbing tokens - when shards poll the t.b. frequently - when shards are disturbed with CPU hogs So far the test shows four problems: - With few shards tokens deficiency produces zero sleep time, so the "good" user that sleeps between grabs effectively converts into a polling ("bad") user (fixed by scylladb#1722) - Sometimes replenishing rounding errors accumulate and render lower resulting rate than configured (fixed by scylladb#1723) - When run with CPU hogs the individual shard's rates may differ too much (see scylladb#1083). E.g. the bucket configured with the rate of 100k tokens/sec, 48 shards, run 4 seconds. "Slowest" shard vs "fastest" shards get this amount of tokens: no hog: 6931 ... 9631 with hog: 2135 ... 29412 (sum rate is 100k with the aforementioned fixes) - With "capped-release" token bucket and token releasing by-timer with the configured rate and hogs the resulting throughput can be as low as 50% of the configured (see scylladb#1641) Created token-bucket 1000000.0 t/s perf_pure_context.sleeping_throughput_with_hog: 999149.3 t/s perf_capped_context.sleeping_throughput: 859995.9 t/s perf_capped_context.sleeping_throughput_with_hog: 512912.0 t/s Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
avikivity
pushed a commit
that referenced
this pull request
Aug 7, 2023
The test checks if the token-bucket "rate" is held under various circumstances: - when shards sleep between grabbing tokens - when shards poll the t.b. frequently - when shards are disturbed with CPU hogs So far the test shows four problems: - With few shards tokens deficiency produces zero sleep time, so the "good" user that sleeps between grabs effectively converts into a polling ("bad") user (fixed by #1722) - Sometimes replenishing rounding errors accumulate and render lower resulting rate than configured (fixed by #1723) - When run with CPU hogs the individual shard's rates may differ too much (see #1083). E.g. the bucket configured with the rate of 100k tokens/sec, 48 shards, run 4 seconds. "Slowest" shard vs "fastest" shards get this amount of tokens: no hog: 6931 ... 9631 with hog: 2135 ... 29412 (sum rate is 100k with the aforementioned fixes) - With "capped-release" token bucket and token releasing by-timer with the configured rate and hogs the resulting throughput can be as low as 50% of the configured (see #1641) Created token-bucket 1000000.0 t/s perf_pure_context.sleeping_throughput_with_hog: 999149.3 t/s perf_capped_context.sleeping_throughput: 859995.9 t/s perf_capped_context.sleeping_throughput_with_hog: 512912.0 t/s Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The t.b. should be replenished from time to time. When replenishing happens it calcualtes the duration between now and last replenishment and converts it to the amount of gained tokens. Since duration is a floating point number and tokens are integers, the convertion may lose some precision, e.g. over a time the bucket mush accumulate 2.718 tokens, but it can only accumulate 2 or 3. Current code rounds the floating point result to integer in the hope that rounding errors compenate each other on average.
However, if the assumption doesn't hold, those deviation may accumulate resulting in wrong replenishing rate. This effects is additionally compensated by the minimal amount of tokens to replenish (called threshold), but it still exists.
There's a simple yet robust fix to that. With the integer number of tokens at hand convert it back to the duration it had to be to get this exact amount of tokens and update the "last time replenished" with the corrected duration.
refs: #1641