
shared_token_bucket: Make duration->tokens conversion more solid #1723

Conversation


@xemul xemul commented Jul 6, 2023

The token bucket (t.b.) should be replenished from time to time. When replenishing happens, the code calculates the duration between now and the last replenishment and converts it into the amount of gained tokens. Since the duration is a floating-point number and tokens are integers, the conversion may lose some precision: e.g., over a given time the bucket must accumulate 2.718 tokens, but it can only accumulate 2 or 3. The current code rounds the floating-point result to an integer in the hope that the rounding errors compensate each other on average.

However, if that assumption doesn't hold, the deviations may accumulate, resulting in a wrong replenishing rate. This effect is additionally compensated for by the minimal amount of tokens to replenish (called the threshold), but it still exists.

There's a simple yet robust fix for that: with the integer number of tokens at hand, convert it back to the duration it had to be to yield exactly that many tokens, and update the "last time replenished" point by that corrected duration (a rough sketch of the idea follows below).

refs: #1641


Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
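
To make the idea concrete, here is a minimal, hypothetical sketch (the names, the single-shard layout and the `rate`/`maybe_replenish` interface are illustrative assumptions, not the actual seastar `shared_token_bucket` code). Only the duration corresponding to the integer number of granted tokens is charged against the "last replenished" timestamp, so the truncated fraction is deferred to a later replenishment instead of being dropped:

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical, simplified single-shard bucket; not seastar's API.
struct token_bucket_sketch {
    using clock = std::chrono::steady_clock;

    double rate;                                  // tokens per second
    uint64_t tokens = 0;                          // currently available tokens
    clock::time_point replenished = clock::now(); // "last time replenished"

    void maybe_replenish(clock::time_point now) {
        std::chrono::duration<double> delta = now - replenished;
        // Integer number of tokens gained over `delta`; the fraction is truncated here.
        auto gained = static_cast<uint64_t>(delta.count() * rate);
        if (gained == 0) {
            return; // not enough time has passed yet, try again later
        }
        tokens += gained;
        // The fix: advance the timestamp only by the duration that corresponds
        // to exactly `gained` tokens, so the truncated fraction is carried over
        // to the next call instead of silently disappearing.
        replenished += std::chrono::duration_cast<clock::duration>(
                std::chrono::duration<double>(gained / rate));
    }
};
```

With this, a period that should have granted 2.718 tokens grants 2 now, and the remaining 0.718-token worth of time stays encoded in the timestamp, to be granted on a subsequent call.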
@xemul xemul requested a review from avikivity July 6, 2023 17:47
xemul added a commit to xemul/seastar that referenced this pull request Jul 6, 2023
The test checks whether the token-bucket "rate" is maintained under
various circumstances:

- when shards sleep between grabbing tokens
- when shards poll the token bucket frequently
- when shards are disturbed by CPU hogs

So far the test shows three problems:

- With few shards, token deficiency produces zero sleep time, so a
  "good" user that sleeps between grabs effectively turns into a
  polling ("bad") user (fixed by scylladb#1722)

- Sometimes replenishment rounding errors accumulate and result in a
  lower rate than configured (fixed by scylladb#1723)

- When run with CPU hogs, individual shards' rates may differ too much
  (see scylladb#1083). E.g., with the bucket configured for a rate of
  100k tokens/sec, 48 shards, and a 4-second run, the "slowest" vs the
  "fastest" shard get these amounts of tokens:

    no hog:   6931 ... 9631
    with hog: 2135 ... 29412

  (the summed rate is 100k with the aforementioned fixes)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
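
For context, below is a rough sketch of the kind of per-shard rate check the commit message above describes; the `grab()`/`rate()` interface and all names are assumptions for illustration, not the real test or the `shared_token_bucket` API.

```cpp
#include <chrono>
#include <cstdint>
#include <thread>

// Measures how many tokens one "user" manages to grab per second.
// A "good" user sleeps roughly one token's worth of time between grabs;
// a "bad" user polls the bucket as fast as it can.
template <typename Bucket>
double measure_rate(Bucket& bucket, std::chrono::seconds run_for, bool sleep_between_grabs) {
    uint64_t grabbed = 0;
    const auto deadline = std::chrono::steady_clock::now() + run_for;
    while (std::chrono::steady_clock::now() < deadline) {
        if (bucket.grab(1)) {
            grabbed += 1;
        }
        if (sleep_between_grabs) {
            std::this_thread::sleep_for(std::chrono::duration<double>(1.0 / bucket.rate()));
        }
    }
    return static_cast<double>(grabbed) / run_for.count();
}
```

Comparing the per-shard results and their sum against the configured rate is what exposes the problems listed in the commit message.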

xemul commented Jul 19, 2023

@avikivity, please review


xemul commented Jul 28, 2023

@avikivity, please review

xemul added a commit to xemul/seastar that referenced this pull request Jul 29, 2023
The test checks whether the token-bucket "rate" is maintained under
various circumstances:

- when shards sleep between grabbing tokens
- when shards poll the token bucket frequently
- when shards are disturbed by CPU hogs

So far the test shows four problems:

- With few shards, token deficiency produces zero sleep time, so a
  "good" user that sleeps between grabs effectively turns into a
  polling ("bad") user (fixed by scylladb#1722)

- Sometimes replenishment rounding errors accumulate and result in a
  lower rate than configured (fixed by scylladb#1723)

- When run with CPU hogs, individual shards' rates may differ too much
  (see scylladb#1083). E.g., with the bucket configured for a rate of
  100k tokens/sec, 48 shards, and a 4-second run, the "slowest" vs the
  "fastest" shard get these amounts of tokens:

    no hog:   6931 ... 9631
    with hog: 2135 ... 29412

  (the summed rate is 100k with the aforementioned fixes)

- With a "capped-release" token bucket, tokens released by a timer at
  the configured rate, and CPU hogs, the resulting throughput can be as
  low as 30% of the configured rate (see scylladb#1641)

  Created token-bucket 1000000.0 t/s
  perf_pure_context.sleeping_throughput_with_hog:   966646.1 t/s
  perf_capped_context.sleeping_throughput:          838035.2 t/s
  perf_capped_context.sleeping_throughput_with_hog: 317685.3 t/s

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
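
The "capped-release" case mentioned above refers to a bucket whose time-based replenishment is additionally limited by tokens explicitly released back (in the test, by a timer running at the configured rate). The sketch below is only one interpretation of that description, with invented names, and is not seastar's `shared_token_bucket`:

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical capped-release bucket: time-based replenishment can never
// outrun the amount of tokens released back by completed work.
struct capped_release_bucket_sketch {
    uint64_t tokens = 0;            // currently grabbable tokens
    uint64_t replenished_total = 0; // tokens ever put into the bucket
    uint64_t released_total;        // tokens ever released back

    // Start with `limit` releasable tokens so some work can be admitted
    // before any completion happens.
    explicit capped_release_bucket_sketch(uint64_t limit) : released_total(limit) {}

    // Called when work completes (or, in the test, by a timer at the configured rate).
    void release(uint64_t n) { released_total += n; }

    // `wanted` is what elapsed time alone would grant (rate * dt).
    void replenish(uint64_t wanted) {
        uint64_t allowed = released_total - replenished_total; // cap imposed by releases
        uint64_t gained = std::min(wanted, allowed);
        tokens += gained;
        replenished_total += gained;
    }
};
```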
avikivity pushed a commit that referenced this pull request Aug 7, 2023
The test checks whether the token-bucket "rate" is maintained under
various circumstances:

- when shards sleep between grabbing tokens
- when shards poll the token bucket frequently
- when shards are disturbed by CPU hogs

So far the test shows four problems:

- With few shards, token deficiency produces zero sleep time, so a
  "good" user that sleeps between grabs effectively turns into a
  polling ("bad") user (fixed by #1722)

- Sometimes replenishment rounding errors accumulate and result in a
  lower rate than configured (fixed by #1723)

- When run with CPU hogs, individual shards' rates may differ too much
  (see #1083). E.g., with the bucket configured for a rate of 100k
  tokens/sec, 48 shards, and a 4-second run, the "slowest" vs the
  "fastest" shard get these amounts of tokens:

    no hog:   6931 ... 9631
    with hog: 2135 ... 29412

  (the summed rate is 100k with the aforementioned fixes)

- With a "capped-release" token bucket, tokens released by a timer at
  the configured rate, and CPU hogs, the resulting throughput can be as
  low as 50% of the configured rate (see #1641)

  Created token-bucket 1000000.0 t/s
  perf_pure_context.sleeping_throughput_with_hog:   999149.3 t/s
  perf_capped_context.sleeping_throughput:          859995.9 t/s
  perf_capped_context.sleeping_throughput_with_hog: 512912.0 t/s

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
@avikivity avikivity merged commit fa80682 into scylladb:master Aug 7, 2023