Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correctly handle some extreme edge cases in the ratelimiter implementation #3706

Merged
merged 5 commits into from May 25, 2023

Conversation

roypat
Copy link
Contributor

@roypat roypat commented May 19, 2023

Changes

This series of commits fixes some edge cases in the ratelimiter implementation related to insanely large timespans (on the magnitude of hundreds of thousands of years)

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following
Developer Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • If a specific issue led to this PR, this PR closes the issue.
  • The description of changes is clear and encompassing.
  • Any required documentation changes (code and docs) are included in this PR.
  • API changes follow the Runbook for Firecracker API changes.
  • User-facing changes are mentioned in CHANGELOG.md.
  • All added/changed functionality is tested.
  • New TODOs link to an issue.
  • Commits meet contribution quality standards.

  • This functionality cannot be added in rust-vmm.

This is due to the constructor attempting to convert from
milliseconds to nanoseconds without checking whether the
nanosecond value fits into a u64.

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
Due to rounding down instead of up when computing the time that it would
take to generate "x" tokens, the fractional part of a nanosecond is used
"twice" once per call to auto_replenish. This can only happen if the
time since the last auto_replenish call is enough to generate at least
one token, e.g. (refill_time / bucket_size) milliseconds. To accumulate
enough "double used" time to generate an extra token, at least
(refill_time * 1_000_000 / bucket_size) calls to auto_replenish are
needed, with the calls timed in a way that causes the "double used" time
to be as close as possible to 1ns. Note that since time resolution is at
the nanosecond level, at least one nanosecond has to pass between these
calls.

For example, if a TokenBucket is setup for 1GB/s, we could allow one
extra byte of traffic every (1000 / 1_000_000_000) * 1_000_000 = 1ms,
e.g. excess throughput of 1KB/s (or 0.0001% above the target)

Additionally, in extremely unrealistic scenarios (either refill_time of
thousands of years or the guest waiting thousands of years), an integer
overflow in a multiplication in auto_replenish can leads to less tokens
being replenished than should be.

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
Due to a missing clamping overation, it used to be able to increment
one_time_burst beyond initial_one_time_burst.

Additionally, integer overflow during addition could cause the
budget/one_time_burst to actually decrease if they were close to
u64::MAX priot to calling force_replenish (a very unrealistic scenario)

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
@roypat roypat added the Status: Awaiting review Indicates that a pull request is ready to be reviewed label May 19, 2023
Copy link
Contributor

@bchalios bchalios left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just one comment. Is it possible to add unit-tests for these extreme cases?

@bchalios
Copy link
Contributor

LGTM, just one comment. Is it possible to add unit-tests for these extreme cases?

Spoke offline with @roypat. We will be adding tests for these cases with subsequent contributions.

Copy link
Contributor

@mattschlebusch mattschlebusch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No blockers, but to echo Babis query on unit-testing, they are a must here as the edge-cases could easily be overlooked.

src/rate_limiter/src/lib.rs Show resolved Hide resolved
src/rate_limiter/src/lib.rs Show resolved Hide resolved
src/rate_limiter/src/lib.rs Show resolved Hide resolved
@roypat roypat merged commit 2bd40cd into firecracker-microvm:main May 25, 2023
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Awaiting review Indicates that a pull request is ready to be reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants