RLQS: Refactor traffic processing in the RLQS filter & fix a broken expiration check #35723

bsurber · 2024-08-16T01:38:51Z

Commit Message: Refactor the RLQS filter's traffic processing & fix the broken action-assignment expiration check.
Additional Description:
Current behavior:

Every request that hits the RLQS filter causes the cache to treat its entry as expired, because the initial (default) action of a newly created BucketCache index has no TTL.
Even if the TTL were fixed, the filter would still start failing all expiration checks shortly after assignments were received, even when those assignments were sent routinely (not going stale). This is because the expiration time was calculated from a private field that was set when the BucketCache index was created and never updated when assignments were received.
Note: Integration testing previously passed because it incorrectly expected multiple RLQS usage reports.
Risk Level: minor - filter is in-development and currently in a broken state
Testing:
Docs Changes:
Release Notes:
Platform Specific Features:
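
For illustration, here is a minimal sketch of the two expiration problems described under "Current behavior". It is not the filter's actual code: `CachedAction`, `isExpiredBuggy`, `isExpired`, and `onAssignment` are hypothetical names, and treating a TTL-less default action as never expiring is an assumption about the intended fix.

```cpp
#include <cassert>
#include <chrono>
#include <optional>

using Clock = std::chrono::steady_clock;
using Seconds = std::chrono::seconds;

// Hypothetical stand-in for a cached bucket action; not the real Envoy type.
struct CachedAction {
  std::optional<Seconds> ttl;     // Unset for the initial/default action.
  Clock::time_point assigned_at;  // Refreshed whenever an assignment arrives.
};

// Buggy variant: a missing TTL counts as zero, and age is measured from the
// bucket's creation time, which is never refreshed by later assignments.
bool isExpiredBuggy(const CachedAction& action, Clock::time_point created_at,
                    Clock::time_point now) {
  const Seconds ttl = action.ttl.value_or(Seconds(0));
  return now >= created_at + ttl;
}

// Fixed variant (assumed behavior): an action without a TTL never expires,
// and age is measured from the most recent assignment.
bool isExpired(const CachedAction& action, Clock::time_point now) {
  if (!action.ttl.has_value()) {
    return false;
  }
  return now >= action.assigned_at + *action.ttl;
}

// Records a new assignment and resets the reference point for expiration.
void onAssignment(CachedAction& action, Seconds ttl, Clock::time_point now) {
  assert(ttl.count() >= 0);
  action.ttl = ttl;
  action.assigned_at = now;
}
```

In this sketch, a bucket created at time 0 that receives a 10-second assignment at time 100 is still reported as expired by `isExpiredBuggy` at time 105, while `isExpired` reports it as fresh until time 110.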

…ed false-positive) expiration check Signed-off-by: Brian Surber <bsurber@google.com>

tyxia

Thank you! Left some comments from a quick first pass.

If possible, I would suggest limiting this PR to fixing the expiration check and leaving the refactor to a separate PR.

source/extensions/filters/http/rate_limit_quota/filter.cc

source/extensions/filters/http/rate_limit_quota/quota_bucket_cache.h

source/extensions/filters/http/rate_limit_quota/filter.cc

tyxia · 2024-08-16T02:26:42Z

Also, please fix the formatting so that CI can run. You can use the command `bazel run //tools/code_format:check_format -- fix`.
Thanks!

/wait

Signed-off-by: Brian Surber <bsurber@google.com>

tyxia

Mostly LGTM modulo the open threads and code coverage

Thanks!

source/extensions/filters/http/rate_limit_quota/filter.cc

tyxia · 2024-08-19T18:12:53Z

source/extensions/filters/http/rate_limit_quota/filter.cc

-    }
+  if (ret_status == Envoy::Http::FilterHeadersStatus::StopIteration) {
+    sendDenyResponse();
+    quota_buckets_[bucket_id]->quota_usage.num_requests_denied += 1;


Overall, I feel that updating the allow/deny stats based on ret_status is not robust. It only works here because our RLQS filter is non-blocking: it does not wait for a response from the RLQS server.

In Envoy, StopIteration can more generally mean that a filter is waiting for a remote server's response before continuing. It is not tightly tied to a local reply or rejection.

Thus, can we update quota_usage inline, inside the functions above, at the point where the request is determined to be allowed or throttled?
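
A rough sketch of the inline approach suggested above, under stated assumptions: the types below are simplified stand-ins rather than Envoy's real ones, and `num_requests_allowed`, `deny_all`, `waiting_on_remote`, and `processRequest` are hypothetical names. Only `num_requests_denied` and `StopIteration` appear in the diff.

```cpp
#include <cstdint>

// Simplified stand-in for Envoy::Http::FilterHeadersStatus.
enum class FilterHeadersStatus { Continue, StopIteration };

// Simplified usage counters; num_requests_allowed is an assumed counterpart
// to the num_requests_denied field seen in the diff.
struct QuotaUsage {
  uint64_t num_requests_allowed = 0;
  uint64_t num_requests_denied = 0;
};

// Hypothetical bucket with a trivially simple deny-all/allow-all policy.
struct Bucket {
  QuotaUsage quota_usage;
  bool deny_all = false;
};

// Counts each request at the point where the allow/deny decision is made,
// so the returned status carries no accounting meaning.
FilterHeadersStatus processRequest(Bucket& bucket, bool waiting_on_remote) {
  if (waiting_on_remote) {
    // Iteration stops without a decision, so no counter is touched.
    return FilterHeadersStatus::StopIteration;
  }
  if (bucket.deny_all) {
    bucket.quota_usage.num_requests_denied += 1;
    return FilterHeadersStatus::StopIteration;
  }
  bucket.quota_usage.num_requests_allowed += 1;
  return FilterHeadersStatus::Continue;
}
```

With this shape, a future code path that returns StopIteration while waiting on a remote response would not be counted as a denial, which is the case the status-based accounting cannot distinguish.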

We've addressed this internally, without much more to add. The current approach will do for now, since the inline update would be future-proofing against an unknown problem at the cost of additional refactoring.

Signed-off-by: Brian Surber <bsurber@google.com>

tyxia · 2024-08-21T01:43:38Z

/wait

…to the default-action Signed-off-by: Brian Surber <bsurber@google.com>

tyxia

Thanks!

phlax · 2024-08-26T10:13:24Z

@tyxia @bsurber I think this is causing a very frequent flake - we may need to revert while the issue is resolved

tyxia · 2024-08-26T10:22:51Z

@phlax Thanks for the notification!

Could you please highlight which test is flaky, so that we can resolve or revert it accordingly?

phlax · 2024-08-26T10:30:33Z

See e.g. here: https://dev.azure.com/cncf/envoy/_build/results?buildId=178487&view=logs&j=1439b9f7-a348-5b50-b5fe-ea612ea91241&t=1002ac43-da84-5fae-70b2-98833b702d09&l=386

The same test timed out twice on that run, but it's happening a lot (if sporadically) elsewhere.

bsurber · 2024-08-26T16:32:18Z

Hm, I had seen `ASSERT_TRUE(response_->waitForEndStream());` deadlock during testing but thought I'd found the cause. I guess the real cause was just masked by flakiness, so that search will have to resume.

…broken expiration check" (#35847) Reverts #35723 Signed-off-by: Brian Surber <bsurber@google.com>

…xpiration check v2 (#35973)
Commit Message: Refactor the RLQS filter's traffic processing & fix the broken action-assignment expiration check.
Additional Description: Addresses flakiness from [past PR](#35723) by adding a missing `sendDenyResponse` & improving test consistency in asynchronous steps.
Risk Level: Minor
Testing: Integration and unit testing run 1k times to check flakiness
---------
Signed-off-by: Brian Surber <bsurber@google.com>

Refactor traffic processing in the RLQS filter & fix a broken (repeat…

f14cf6d

…ed false-positive) expiration check Signed-off-by: Brian Surber <bsurber@google.com>

bsurber requested review from tyxia and yanavlasov as code owners August 16, 2024 01:38

bsurber changed the title ~~Refactor traffic processing in the RLQS filter & fix a broken expiration check~~ RLQS: Refactor traffic processing in the RLQS filter & fix a broken expiration check Aug 16, 2024

tyxia reviewed Aug 16, 2024

source/extensions/filters/http/rate_limit_quota/filter.cc (resolved)

source/extensions/filters/http/rate_limit_quota/quota_bucket_cache.h (resolved)

source/extensions/filters/http/rate_limit_quota/filter.cc (resolved)

tyxia self-assigned this Aug 16, 2024

repokitteh-read-only bot added the waiting label Aug 16, 2024

Update comment to pass typo CI testing.

03af091

Signed-off-by: Brian Surber <bsurber@google.com>

repokitteh-read-only bot removed the waiting label Aug 16, 2024

tyxia reviewed Aug 19, 2024

Update testing & difficult-to-reach logic for coverage

666641b

Signed-off-by: Brian Surber <bsurber@google.com>

repokitteh-read-only bot added the waiting label Aug 21, 2024

Remove unused first-assignment-time var and add explanation comments …

72bb546

…to the default-action Signed-off-by: Brian Surber <bsurber@google.com>

repokitteh-read-only bot removed the waiting label Aug 21, 2024

tyxia approved these changes Aug 22, 2024

tyxia merged commit 3896048 into envoyproxy:main Aug 22, 2024
47 checks passed

bsurber deleted the fix-rlqs-expiration branch August 22, 2024 16:49

tyxia mentioned this pull request Aug 26, 2024

Revert "RLQS: Refactor traffic processing in the RLQS filter & fix a broken expiration check" #35846

Closed

bsurber mentioned this pull request Aug 26, 2024

Revert "RLQS: Refactor traffic processing in the RLQS filter & fix a broken expiration check" #35847

Merged

tyxia pushed a commit that referenced this pull request Aug 26, 2024

Revert "RLQS: Refactor traffic processing in the RLQS filter & fix a …

87d555a

…broken expiration check" (#35847) Reverts #35723 Signed-off-by: Brian Surber <bsurber@google.com>

bsurber mentioned this pull request Sep 4, 2024

RLQS: Refactor traffic processing in the RLQS filter & fix a broken expiration check v2 #35973

Merged
