Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

quota: throughput racy enforcement bug fixes #467

Merged
merged 5 commits into from May 18, 2022

Conversation

bom-d-van
Copy link
Member

@bom-d-van bom-d-van commented May 18, 2022

Three changes are made, but the most important one is quota: fix a race condition bug in trieIndex.applyQuotas. The other two commits are by-products of debugging the racy applyQuotas.

quota: fix a race condition bug in trieIndex.applyQuotas

the current quota config file is last-match-wins, and with heavy concurrent
quota enforcement, go-carbon should only update quota info per node once,
otherwise it risks of having confusing and incorrect quota enforcement.

for example, suppose we have the following quota configs:

    [sys.*]
        throughput = 1024
    [sys.app]
        throughput = 4096

if sys.app is updated twice using top down order in the quota config file, there
is a window that sys.app would have a quota with throughput of 1024, and if the
namespace happen to be receiving more than 1024 data points during that window,
it would trigger incorrect throttling.

TODO: with updateChecker, it's also straightforward now to support
first-match-wins in quota config file, but we would have to introduce a new
flat to ask for it for backward compatibility.

quota: delay throughput counter update in trieindex.throttle

When throughput quota usage might be exhausted in a child rule, in the previous implementation,
the throughput counter would be updated immediately after the check, this might lead to over-accounting
for parent quota configs.

The comment changed the implementation to only update the counter if the traffic is confirmed
to be within quota consumption.

At the same time, go-carbon also makes sure to report throttled data points for soft throughput quota
enforcement (i.e. with "none" dropping policy).

quota: refactor throughputUsagePerNamespace usage/quota tracking and enforcement

The previous implementation has a racy update of throughputUsagePerNamespace.resetAt and dataPoints
that might potentially lead quota over-enforcement. This commit addess it by wrapping them in one
struct with atomic updates. The new tradeoff is under-enforcement during counter reset, a better
tradeoff imo. more details could be found in the commit.

…enforcement

The previous implementation has a racy update of throughputUsagePerNamespace.resetAt and dataPoints
that might potentially lead quota over-enforcement. This commit addess it by wrapping them in one
struct with atomic updates. The new tradeoff is under-enforcement during counter reset, a better
tradeoff imo. more details could be found in the commit.
When throughput quota usage might be exhausted in a child rule, in the previous implementation,
the throughput counter would be updated immediately after the check, this might lead to over-accounting
for parent quota configs.

The comment changed the implementation to only update the counter if the traffic is confirmed
to be within quota consumption.

At the same time, go-carbon also makes sure to report throttled data points for soft throughput quota
enforcement (i.e. with "none" dropping policy).
the current quota config file is last-match-wins, and with heavy concurrent
quota enforcement, go-carbon should only update quota info per node once,
otherwise it risks of having confusing and incorrect quota enforcement.

for example, suppose we have the following quota configs:

    [sys.*]
        throughput = 1024
    [sys.app]
        throughput = 4096

if sys.app is updated twice using top down order in the quota config file, there
is a window that sys.app would have a quota with throughput of 1024, and if the
namespace happen to be receiving more than 1024 data points during that window,
it would trigger incorrect throttling.

TODO: with updateChecker, it's also straightforward now to support
first-match-wins in quota config file, but we would have to introduce a new
flat to ask for it for backward compatibility.
@bom-d-van
Copy link
Member Author

note: need to add a few tests for this PR in a separate one, it's only tested on our production and it seems to be working.

@bom-d-van
Copy link
Member Author

note: also included some minor changes for deep source warning and debugging code cleanup.

@bom-d-van bom-d-van merged commit 8c1ea97 into master May 18, 2022
@bom-d-van bom-d-van deleted the quota/throughput-edge-bug branch May 18, 2022 13:29
bom-d-van added a commit that referenced this pull request May 25, 2022
The two tests are for covering fixes made in #467.
More specifically, for commit 47f59ba and 653a7f0.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant