
fix: Rate limiter TokenBucket::auto_replenish() #3370

Merged

Conversation

@JonathanWoollett-Light (Contributor) commented Jan 16, 2023

Changes

Fixes bug in TokenBucket::auto_replenish.

Reason

When a TokenBucket becomes empty, the implementation sets a 100ms timer to replenish it.

The bucket replenishing function calculates the new tokens to add to the budget using the following
formula:

let tokens = (time_delta * self.processed_capacity) / self.processed_refill_time;

However, this formula can return 0 depending on the values of self.processed_capacity and self.processed_refill_time.
For a TokenBucket with total capacity size and a refill period of complete_refill_time_ns nanoseconds, these values are calculated as follows:

// Get the greatest common factor between `size` and `complete_refill_time_ns`.
let common_factor = gcd(size, complete_refill_time_ns);
let processed_capacity: u64 = size / common_factor;
let processed_refill_time: u64 = complete_refill_time_ns / common_factor;

So, for example, if size == 1 and complete_refill_time_ns == 1_000_000_000 (equivalent to 1 token per second), the number of replenished tokens will be

let tokens = (100_000_000 * 1) / 1_000_000_000; // which gives 0 in integer division

As a result, the bucket will never get replenished.
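For illustration, a minimal, self-contained sketch of the failure mode (hypothetical stand-alone code using the values from the example above, not the actual TokenBucket implementation):

fn main() {
    // Bucket configured for 1 token per second, as in the example above.
    let size: u64 = 1;
    let complete_refill_time_ns: u64 = 1_000_000_000;

    // gcd(1, 1_000_000_000) == 1, so the "processed" values are unchanged.
    let processed_capacity: u64 = size;
    let processed_refill_time: u64 = complete_refill_time_ns;

    // The replenish timer fires after 100ms.
    let time_delta: u64 = 100_000_000;

    let tokens = (time_delta * processed_capacity) / processed_refill_time;
    assert_eq!(tokens, 0); // 0.1 truncates to 0 in integer division

    // If last_update is also reset to Instant::now() here, the elapsed 100ms
    // is discarded and every subsequent call starts from zero again, so the
    // bucket is never replenished.
}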

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license.

PR Checklist

  • All commits in this PR are signed (git commit -s).
  • If a specific issue led to this PR, this PR closes the issue.
  • The description of changes is clear and encompassing.
  • Any required documentation changes (code and docs) are included in this PR.
  • New unsafe code is documented.
  • API changes follow the Runbook for Firecracker API changes.
  • User-facing changes are mentioned in CHANGELOG.md.
  • All added/changed functionality is tested.
  • New TODOs link to an issue.

  • This functionality can be added in rust-vmm.

@JonathanWoollett-Light added the Type: Bug label on Jan 16, 2023
acatangiu previously approved these changes Jan 16, 2023

@acatangiu (Contributor) left a comment:

since all internals are u64 (and 2^64 nanoseconds is > 580 years) the gcd is definitely not needed and this simplification makes total sense 👍

pb8o previously approved these changes Jan 17, 2023
@JonathanWoollett-Light (Contributor, Author) commented:
@roypat @bchalios Could you please re-review?

@roypat (Contributor) left a comment:

As @JonathanWoollett-Light noted, the implementation still "leaks" tokens (but doesn't hang anymore). This is because if auto_replenish is called after enough time has passed for, say, 1.9 tokens to regenerate, only one token will be added, and last_update will be fast-forwarded to Instant::now(). The remaining 0.9 tokens' worth of time is effectively discarded.

We can fix this by incrementing last_update by the actual time it took to generate the tokens that have been added to the budget, instead of setting it to Instant::now(). I think my proposed suggestion implements this correctly, but I'd love to have someone else trace out the logic to verify it.

Technically, this does not even need the "if tokens > 0" conditional anymore.
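A minimal sketch of that idea (field names follow the snippets above; this is an illustration of the approach rather than the exact merged code, and capping the budget at the bucket size is omitted for brevity):

use std::time::{Duration, Instant};

struct TokenBucket {
    budget: u64,
    processed_capacity: u64,
    processed_refill_time: u64,
    last_update: Instant,
}

impl TokenBucket {
    fn auto_replenish(&mut self) {
        // Nanoseconds elapsed since tokens were last accounted for.
        let time_delta = self.last_update.elapsed().as_nanos() as u64;
        let tokens = (time_delta * self.processed_capacity) / self.processed_refill_time;

        // Advance last_update only by the time actually "consumed" to generate
        // `tokens`, rather than jumping to Instant::now(); the fractional
        // remainder of time_delta is carried over to the next call.
        self.last_update +=
            Duration::from_nanos((tokens * self.processed_refill_time) / self.processed_capacity);
        self.budget += tokens;
    }
}

fn main() {
    let mut bucket = TokenBucket {
        budget: 0,
        processed_capacity: 1,                 // 1 token
        processed_refill_time: 1_000_000_000,  // per second
        last_update: Instant::now(),
    };
    std::thread::sleep(Duration::from_millis(100));
    bucket.auto_replenish(); // adds 0 tokens, but keeps ~100ms of credit
    assert!(bucket.last_update.elapsed() >= Duration::from_millis(100));
}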

acatangiu previously approved these changes Jan 17, 2023
bchalios previously approved these changes Jan 17, 2023
Frequently calling `auto_replenish` will reset `self.last_update` each
time, while `tokens` may be a fractional value (truncated to 0 since it
is a `u64`); in this case no tokens will be replenished.

To avoid this, we increment `self.last_update` by the minimum time
required to generate `tokens`. In the case where enough time has passed
to generate 1.8 tokens but only 1 token is generated due to integer
arithmetic, this carries the time required to generate the remaining
0.8 of a token over to the next call, so that a next call which would
otherwise generate 2.3 tokens instead generates 3.1 tokens. This
minimizes dropping tokens at high call frequencies.

Signed-off-by: Jonathan Woollett-Light <jcawl@amazon.co.uk>
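To make the carry described above concrete, a small stand-alone example with hypothetical numbers (1 token per 1,000,000 ns):

fn main() {
    let (capacity, refill_time): (u64, u64) = (1, 1_000_000); // 1 token per 1ms

    // First call: 1.8 tokens' worth of time has elapsed.
    let time_delta_1: u64 = 1_800_000;
    let tokens_1 = (time_delta_1 * capacity) / refill_time; // 1 token
    let consumed_1 = (tokens_1 * refill_time) / capacity;   // 1_000_000 ns accounted for
    let carry = time_delta_1 - consumed_1;                  // 800_000 ns carried over

    // Second call: another 2.3 tokens' worth of time elapses; together with
    // the carry the effective delta is 3.1 tokens, so 3 tokens are generated.
    let time_delta_2: u64 = 2_300_000 + carry;
    let tokens_2 = (time_delta_2 * capacity) / refill_time;
    assert_eq!((tokens_1, tokens_2), (1, 3));
}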
@roypat (Contributor) left a comment:

I share @JonathanWoollett-Light's concern regarding last_update falling behind Instant::now(), so here’s a proof (I hope I got everything right) that it doesn’t (for simplicity, I use the non-reduced versions of size and refill time):

  1. Note that integer division is just normal division, followed by rounding down (floor function)
  2. For positive y, we have floor(x / y) * y >= x - y, as the expression on the LHS “rounds down” to the closest multiple of y
  3. For integer z, we have floor(x + z) = floor(x) + z
  4. Multiplication by (-1) flips inequalities, e.g. if x <= y, then z - x >= z - y.
  5. We refer to the system time at the start of auto_replenish as now_start, and to the system time at the end of auto_replenish as now_end. With this, we have time_delta = now_start - last_update_old

With this in place, note how at the end of auto_replenish we set last_update to

last_update_new = last_update_old + floor(floor((now_start - last_update_old) * size / refill_time) * refill_time / size)

To prove that last_update does not deviate from now_end = Instant::now() unboundedly, we want to find an upper bound for now_end - last_update_new. We have

now_end - last_update_new = now_end - last_update_old - floor(floor((now_start - last_update_old) * size / refill_time) * refill_time / size)
<= now_end - last_update_old - floor(((now_start - last_update_old) * size - refill_time) / size)
= now_end - last_update_old - floor(now_start - last_update_old - refill_time / size)
= now_end - last_update_old - (now_start - last_update_old) - floor(-refill_time / size)
= (now_end - now_start) + ceil(refill_time / size)

where the last step uses -floor(-x) = ceil(x).

This means that last_update indeed does not fall behind by more than the execution time of auto_replenish plus a constant, ceil(refill_time / size), which depends only on the bucket configuration.
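As a quick numerical sanity check of this bound (a stand-alone sketch with hypothetical values, not part of the PR), one can evaluate the update formula directly and compare the resulting lag against the constant from the proof:

fn main() {
    // A non-reduced bucket configuration, as in the proof above.
    let size: u64 = 7;
    let refill_time: u64 = 1_000_000_000;

    // Sample a range of elapsed times and check the claimed bound.
    for time_delta in (0..10_000_000_000u64).step_by(999_983) {
        let tokens = (time_delta * size) / refill_time;
        let consumed = (tokens * refill_time) / size;
        // Ignoring the execution time of auto_replenish itself, last_update
        // falls behind "now" by exactly time_delta - consumed.
        let lag = time_delta - consumed;
        // floor(refill_time / size) + 1 >= ceil(refill_time / size), the constant from the proof.
        assert!(lag <= refill_time / size + 1);
    }
}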

@JonathanWoollett-Light (Contributor, Author) commented:
@bchalios @pb8o Could you please re-review? I need another approval.

roypat merged commit 79ed3c4 into firecracker-microvm:main on Jan 18, 2023
roypat added a commit that referenced this pull request Jul 4, 2023
Adds proofs that ensure various properties we expect of the ratelimiter
are upheld, and that no panics can occur.

Partially based on #3370

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
Co-authored-by: Felipe R. Monteiro <felisous@amazon.com>
Co-authored-by: Daniel Schwartz-Narbonne <dsn@amazon.co.uk>