
shared_token_bucket: Make duration->tokens conversion more solid #1723

Conversation


@xemul xemul commented Jul 6, 2023

The token bucket (t.b.) should be replenished from time to time. When replenishing happens, the code calculates the duration between now and the last replenishment and converts it into the amount of gained tokens. Since the duration is a floating-point number and tokens are integers, the conversion may lose some precision: e.g., over a given time the bucket must accumulate 2.718 tokens, but it can only accumulate 2 or 3. The current code rounds the floating-point result to an integer in the hope that the rounding errors compensate each other on average.

However, if that assumption doesn't hold, the deviations may accumulate, resulting in a wrong replenishing rate. This effect is additionally compensated for by the minimal amount of tokens to replenish (called the threshold), but it still exists.

There's a simple yet robust fix for that: with the integer number of tokens at hand, convert it back to the duration it had to be to yield exactly that many tokens, and update the "last time replenished" point by that corrected duration (a rough sketch of the idea follows below).

refs: #1641


Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
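
To make the idea concrete, here is a minimal, hypothetical sketch (the names, the single-shard layout and the `rate`/`maybe_replenish` interface are illustrative assumptions, not the actual seastar `shared_token_bucket` code). Only the duration corresponding to the integer number of granted tokens is charged against the "last replenished" timestamp, so the truncated fraction is deferred to a later replenishment instead of being dropped:

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical, simplified single-shard bucket; not seastar's API.
struct token_bucket_sketch {
    using clock = std::chrono::steady_clock;

    double rate;                                  // tokens per second
    uint64_t tokens = 0;                          // currently available tokens
    clock::time_point replenished = clock::now(); // "last time replenished"

    void maybe_replenish(clock::time_point now) {
        std::chrono::duration<double> delta = now - replenished;
        // Integer number of tokens gained over `delta`; the fraction is truncated here.
        auto gained = static_cast<uint64_t>(delta.count() * rate);
        if (gained == 0) {
            return; // not enough time has passed yet, try again later
        }
        tokens += gained;
        // The fix: advance the timestamp only by the duration that corresponds
        // to exactly `gained` tokens, so the truncated fraction is carried over
        // to the next call instead of silently disappearing.
        replenished += std::chrono::duration_cast<clock::duration>(
                std::chrono::duration<double>(gained / rate));
    }
};
```

With this, a period that should have granted 2.718 tokens grants 2 now, and the remaining 0.718-token worth of time stays encoded in the timestamp, to be granted on a subsequent call.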
@xemul xemul requested a review from avikivity July 6, 2023 17:47
xemul added a commit to xemul/seastar that referenced this pull request Jul 6, 2023
The test checks whether the token-bucket "rate" is maintained under
various circumstances:

- when shards sleep between grabbing tokens
- when shards poll the token bucket frequently
- when shards are disturbed by CPU hogs

So far the test shows three problems:

- With few shards, token deficiency produces zero sleep time, so a
  "good" user that sleeps between grabs effectively turns into a
  polling ("bad") user (fixed by scylladb#1722)

- Sometimes replenishment rounding errors accumulate and result in a
  lower rate than configured (fixed by scylladb#1723)

- When run with CPU hogs, individual shards' rates may differ too much
  (see scylladb#1083). E.g., with the bucket configured for a rate of
  100k tokens/sec, 48 shards, and a 4-second run, the "slowest" vs the
  "fastest" shard get these amounts of tokens:

    no hog:   6931 ... 9631
    with hog: 2135 ... 29412

  (the summed rate is 100k with the aforementioned fixes)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
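
For context, below is a rough sketch of the kind of per-shard rate check the commit message above describes; the `grab()`/`rate()` interface and all names are assumptions for illustration, not the real test or the `shared_token_bucket` API.

```cpp
#include <chrono>
#include <cstdint>
#include <thread>

// Measures how many tokens one "user" manages to grab per second.
// A "good" user sleeps roughly one token's worth of time between grabs;
// a "bad" user polls the bucket as fast as it can.
template <typename Bucket>
double measure_rate(Bucket& bucket, std::chrono::seconds run_for, bool sleep_between_grabs) {
    uint64_t grabbed = 0;
    const auto deadline = std::chrono::steady_clock::now() + run_for;
    while (std::chrono::steady_clock::now() < deadline) {
        if (bucket.grab(1)) {
            grabbed += 1;
        }
        if (sleep_between_grabs) {
            std::this_thread::sleep_for(std::chrono::duration<double>(1.0 / bucket.rate()));
        }
    }
    return static_cast<double>(grabbed) / run_for.count();
}
```

Comparing the per-shard results and their sum against the configured rate is what exposes the problems listed in the commit message.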

xemul commented Jul 19, 2023

@avikivity, please review


xemul commented Jul 28, 2023

@avikivity, please review

xemul added a commit to xemul/seastar that referenced this pull request Jul 29, 2023
The test checks whether the token-bucket "rate" is maintained under
various circumstances:

- when shards sleep between grabbing tokens
- when shards poll the token bucket frequently
- when shards are disturbed by CPU hogs

So far the test shows four problems:

- With few shards, token deficiency produces zero sleep time, so a
  "good" user that sleeps between grabs effectively turns into a
  polling ("bad") user (fixed by scylladb#1722)

- Sometimes replenishment rounding errors accumulate and result in a
  lower rate than configured (fixed by scylladb#1723)

- When run with CPU hogs, individual shards' rates may differ too much
  (see scylladb#1083). E.g., with the bucket configured for a rate of
  100k tokens/sec, 48 shards, and a 4-second run, the "slowest" vs the
  "fastest" shard get these amounts of tokens:

    no hog:   6931 ... 9631
    with hog: 2135 ... 29412

  (the summed rate is 100k with the aforementioned fixes)

- With a "capped-release" token bucket, tokens released by a timer at
  the configured rate, and CPU hogs, the resulting throughput can be as
  low as 30% of the configured rate (see scylladb#1641)

  Created token-bucket 1000000.0 t/s
  perf_pure_context.sleeping_throughput_with_hog:   966646.1 t/s
  perf_capped_context.sleeping_throughput:          838035.2 t/s
  perf_capped_context.sleeping_throughput_with_hog: 317685.3 t/s

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
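
The "capped-release" case mentioned above refers to a bucket whose time-based replenishment is additionally limited by tokens explicitly released back (in the test, by a timer running at the configured rate). The sketch below is only one interpretation of that description, with invented names, and is not seastar's `shared_token_bucket`:

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical capped-release bucket: time-based replenishment can never
// outrun the amount of tokens released back by completed work.
struct capped_release_bucket_sketch {
    uint64_t tokens = 0;            // currently grabbable tokens
    uint64_t replenished_total = 0; // tokens ever put into the bucket
    uint64_t released_total;        // tokens ever released back

    // Start with `limit` releasable tokens so some work can be admitted
    // before any completion happens.
    explicit capped_release_bucket_sketch(uint64_t limit) : released_total(limit) {}

    // Called when work completes (or, in the test, by a timer at the configured rate).
    void release(uint64_t n) { released_total += n; }

    // `wanted` is what elapsed time alone would grant (rate * dt).
    void replenish(uint64_t wanted) {
        uint64_t allowed = released_total - replenished_total; // cap imposed by releases
        uint64_t gained = std::min(wanted, allowed);
        tokens += gained;
        replenished_total += gained;
    }
};
```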
avikivity pushed a commit that referenced this pull request Aug 7, 2023
The test checks whether the token-bucket "rate" is maintained under
various circumstances:

- when shards sleep between grabbing tokens
- when shards poll the token bucket frequently
- when shards are disturbed by CPU hogs

So far the test shows four problems:

- With few shards, token deficiency produces zero sleep time, so a
  "good" user that sleeps between grabs effectively turns into a
  polling ("bad") user (fixed by #1722)

- Sometimes replenishment rounding errors accumulate and result in a
  lower rate than configured (fixed by #1723)

- When run with CPU hogs, individual shards' rates may differ too much
  (see #1083). E.g., with the bucket configured for a rate of 100k
  tokens/sec, 48 shards, and a 4-second run, the "slowest" vs the
  "fastest" shard get these amounts of tokens:

    no hog:   6931 ... 9631
    with hog: 2135 ... 29412

  (the summed rate is 100k with the aforementioned fixes)

- With a "capped-release" token bucket, tokens released by a timer at
  the configured rate, and CPU hogs, the resulting throughput can be as
  low as 50% of the configured rate (see #1641)

  Created token-bucket 1000000.0 t/s
  perf_pure_context.sleeping_throughput_with_hog:   999149.3 t/s
  perf_capped_context.sleeping_throughput:          859995.9 t/s
  perf_capped_context.sleeping_throughput_with_hog: 512912.0 t/s

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
@avikivity avikivity merged commit fa80682 into scylladb:master Aug 7, 2023