Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v23.3.x] CORE-2752 - Fix Kafka quota throttling delay enforcement #18575

Conversation

pgellert
Copy link
Contributor

@pgellert pgellert commented May 20, 2024

While exempt clients are already excluded from their traffic being
recorded, they are still being throttled if the token bucket is in a
negative state because of other clients. To avoid this, exempt clients
should have a 0 throttling delay to be excluded from throttling
altogether.

(cherry picked from commit cea3a29)
Quota enforcement is currently done in a complicated and
not-Kafka-compatible way in Redpanda, and this commit intends to fix
that.

In Kafka >=2.0, client throttling is implemented in a simple way.
Brokers are meant to return how long the client is supposed to be
throttled when there is a quota violation. Then, the Kafka client is
supposed to wait until this throttling time passed, or else the broker
applies the throttle on its side.

However, currently in Redpanda the delay is enforced differently. For
produce, we enforce the throttle we would send in the response if there
was a throttle in the last response (regardless of whether the client
obeyed that throttle or not). For fetch, we always enforce the current
throttle. For ingress/egress quotas we correctly track how long the
client was supposed to be throttled, but we only do that for
ingress/egress throttling.

This commit fixes the throttling behaviour by tracking how long the
client is supposed to have waited and applying that throttle on the next
request if the client did not.

(cherry picked from commit aec585d)
@pgellert pgellert added this to the v23.3.x-next milestone May 20, 2024
@pgellert pgellert added the kind/backport PRs targeting a stable branch label May 20, 2024
@pgellert
Copy link
Contributor Author

The only reason I had to cherry-pick this manually was that git thought there was a conflict on the _is_virtualized_connection field inside connection_context.h introduced in #16658 which my PR had through dev. So the diff of this PR is identical to that of the original.

@pgellert pgellert self-assigned this May 20, 2024
@pgellert pgellert requested review from a team and oleiman and removed request for a team May 20, 2024 12:29
@pgellert pgellert marked this pull request as ready for review May 20, 2024 12:29
@pgellert pgellert requested a review from BenPope May 20, 2024 12:29
@pgellert pgellert merged commit bb3dcd0 into redpanda-data:v23.3.x May 21, 2024
20 checks passed
@pgellert pgellert deleted the vbotbuildovich/backport-18218-v23.3.x-467 branch May 21, 2024 11:36
@BenPope BenPope modified the milestones: v23.3.x-next, v23.3.16 Jun 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/redpanda kind/backport PRs targeting a stable branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants