router: support for rate limit retry policy #4946

ramaraochavali · 2018-11-02T08:51:48Z

Description: Adds a new retry policy for rate limited requests.
Risk Level: Low
Testing: Added automated tests
Docs Changes: Changed
Release Notes: Added
Fixes #4855

Signed-off-by: Rama <rama.rao@salesforce.com>

ramaraochavali · 2018-11-02T10:29:05Z

@mattklein123 can you PTAL when you get time?

mattklein123

LGTM. Small question/comment.

mattklein123 · 2018-11-02T20:37:41Z

docs/root/configuration/http_filters/router_filter.rst

@@ -99,6 +99,9 @@ retriable-status-codes
  in either :ref:`the retry policy <envoy_api_field_route.RouteAction.RetryPolicy.retriable_status_codes>`
  or in the :ref:`config_http_filters_router_x-envoy-retriable-status-codes` header.

+rate-limited
+  Envoy will attempt a retry if a request is rate limited (with response code 429) by the :ref:`rate limit service <arch_overview_rate_limit>`.


I think I would remove the "by the rate limit service" clause? Since really you are just adding a 429 retry policy?

Do we want to add a parallel grpc response code policy for resource exhausted or something like that?

I think I would remove the "by the rate limit service" clause? Since really you are just adding a 429 retry policy?

Yes. I will remove that.

Do we want to add a parallel grpc response code policy for resource exhausted or something like that?

I don't think a parallel gRPC response code policy is required, because the rate limiting filter returns a 429 status (along with gRPC status UNAVAILABLE). So the newly added policy should be sufficient. Am I missing something here?

I also realized that the condition was not doing what I intended (and neither was the corresponding test). I have corrected both now, so it should be clear. PTAL.

Signed-off-by: Rama <rama.rao@salesforce.com>

mattklein123 · 2018-11-03T19:21:26Z

@ramaraochavali sorry I'm confused. Per your comment here: #4855 (comment) do you plan on actually using this feature? I would rather not add it unless there is a specific use case.

ramaraochavali · 2018-11-03T23:35:30Z

@mattklein123 There are two things that this PR does:

1. It adds the behaviour that a rate limited request is not retried irrespective of the gRPC status. Without this, the request is retried if retry-grpc-on includes UNAVAILABLE. Our case is the one covered by the test case PolicyUnavailableWhenRateLimited.
2. It gives an opt-in config for retrying rate limited requests.

We do not use the opt-in behaviour, but we definitely need item 1. So if you think the opt-in config is not needed (and that is what I was asking in the issue), I will remove it. But we definitely need the code to not retry rate limited requests irrespective of what is specified in retry-grpc-on, because those conditions (UNAVAILABLE or RESOURCE_EXHAUSTED) can occur for other reasons and hence would be retried now.

mattklein123 · 2018-11-03T23:42:35Z

Adds the behaviour that a rate limited request is not retried irrespective of the gRPC status. Without this, the request is retried if retry-grpc-on includes UNAVAILABLE. Our case is the one covered by the test case PolicyUnavailableWhenRateLimited.

Where does this PR do this? I'm totally lost.

ramaraochavali · 2018-11-04T01:12:20Z

source/common/router/retry_state_impl.cc

@@ -186,6 +189,11 @@ RetryStatus RetryStateImpl::shouldRetry(const Http::HeaderMap* response_headers,
 }

 bool RetryStateImpl::wouldRetryFromHeaders(const Http::HeaderMap& response_headers) {
+  // We should retry rate limited request only if the policy is set.
+  if (Http::CodeUtility::isRateLimited(Http::Utility::getResponseStatus(response_headers))) {


@mattklein123 This is the check I am talking about. It checks whether the request is rate limited, and retries only if the rate_limited retry policy is set. Otherwise it just returns without retrying and without checking the other gRPC conditions. Am I understanding this incorrectly, or missing something that is causing the confusion?

If you think the opt-in behaviour is not needed, we could do something similar to https://github.com/envoyproxy/envoy/blob/master/source/common/router/retry_state_impl.cc#L266?
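To make the intended behaviour of that check concrete, here is a minimal, self-contained sketch. It is not Envoy's code: the `Response` struct, the flag constants, and the `wouldRetry` function are hypothetical stand-ins, and only the gRPC status number (UNAVAILABLE = 14) and the 429 HTTP code are real values.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical retry policy bits; Envoy's real flag values and names differ.
constexpr uint32_t kRetryOnGrpcUnavailable = 0x1;
constexpr uint32_t kRetryOnRateLimited = 0x2;

// gRPC status code for UNAVAILABLE.
constexpr uint32_t kGrpcUnavailable = 14;

// Hypothetical simplified view of a response as seen by the retry logic.
struct Response {
  uint32_t http_status;
  std::optional<uint32_t> grpc_status;
};

// Sketch of the decision described in the comment above: a 429 response is
// retried only when the rate-limited policy is enabled, and the gRPC
// conditions are not consulted at all in that case.
bool wouldRetry(uint32_t retry_on, const Response& response) {
  if (response.http_status == 429) {
    return (retry_on & kRetryOnRateLimited) != 0;
  }
  if ((retry_on & kRetryOnGrpcUnavailable) != 0 && response.grpc_status.has_value() &&
      *response.grpc_status == kGrpcUnavailable) {
    return true;
  }
  return false;
}
```

Under these assumptions, a 429 response carrying gRPC status UNAVAILABLE is not retried when only the UNAVAILABLE condition is configured, which is the case the PolicyUnavailableWhenRateLimited test is described as covering.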

How does this help with gRPC though? In a normal gRPC response the status code is going to be 200, so this wouldn't apply?

I was not intending this for normal gRPC. I was intending it for the use case where the rate limiting filter identifies the request as rate limited and returns a status code of 429. Do you think this configuration is too generic and that people will get confused? That is why, in the initial PR, I added the "by the rate limit service" clause to the doc explanation. Do you see any other way of achieving the same thing?

Sorry, stepping back, can you explain what you are trying to achieve? You originally said you don't want to retry rate limited requests, but this PR will allow you to do that? I don't see any functional change in this PR other than an addition of a new rate limit policy which I think you said you don't want to use?

@mattklein123 Our caller is gRPC. I am following the PR with the mapping change. However, we see some cases where RESOURCE_EXHAUSTED is returned by the service for reasons other than rate limiting, and retrying those requests on other nodes might help in some cases. So ideally we would want RESOURCE_EXHAUSTED to be in the retry policy but exclude rate limited calls.

I think the confusion here is that for a gRPC caller, HTTP status is converted to 200 (OK), so what you are doing here is not going to help for gRPC. If you have an HTTP caller, 429 is not retried by default, so you wouldn't get a retry either.

Thanks for this. That is what was causing my confusion. I did not know that for a gRPC caller the HTTP status is converted to 200 (OK); I was missing that point. I agree that this PR is not going to help in that case. Sorry for all the confusion.
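The point about gRPC callers can be illustrated with a small sketch. Everything here is hypothetical (the `LocalReply` struct and both functions are illustrative, not Envoy APIs); it only models the behaviour described above, where a rate limited reply to a gRPC caller carries HTTP 200 plus a gRPC status, so a check on the HTTP status alone never sees 429.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical simplified reply as seen by the retry logic.
struct LocalReply {
  uint32_t http_status;
  std::optional<uint32_t> grpc_status;
};

// Sketch of how a rate limited reply looks for each kind of caller.
// mapped_grpc_status is passed in because the 429-to-gRPC mapping is
// discussed separately in the thread and is not assumed here.
LocalReply makeRateLimitedReply(bool is_grpc_request, uint32_t mapped_grpc_status) {
  if (is_grpc_request) {
    // gRPC callers receive HTTP 200 with the error carried in grpc-status.
    return {200, mapped_grpc_status};
  }
  // Plain HTTP callers see the 429 directly.
  return {429, std::nullopt};
}

// A check keyed on the HTTP status, like the one proposed in this PR.
bool isRateLimitedByHttpStatus(const LocalReply& reply) {
  return reply.http_status == 429;
}
```

With this model, the HTTP-status check matches only for plain HTTP callers, which is why it does not help when the caller is gRPC.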

So basically, if we want to have RESOURCE_EXHAUSTED in the retry codes list (after #4879 is merged) and do not want to retry rate limited requests, how should we proceed?
I have a couple of possible solutions.

Add a new header called x-envoy-ratelimited to mark a request as rate limited, and do not retry it if we see that header, similar to x-envoy-overloaded? Or use the x-envoy-ratelimited header, instead of the HTTP status, to drive the rate-limited policy (similar to what I have in this PR)?

Should we add a new policy as proposed here (retry of rate limited requests #4855 (comment) - option 2), which might also need the above header?

Or are there any other, better solutions?

I guess if no code changes are possible, that would be best. For example, do you really need to retry in the case of RESOURCE_EXHAUSTED? That seems generally bad.

If you do, I guess setting a header much like we do for overloaded makes sense to me.

@mattklein123 just to confirm: if we implement this new header approach, do you think it is better to make retrying configurable through some policy, or to simply not retry (and document this behaviour) whenever we see that header, exactly as x-envoy-overloaded works?

I would rather not add a retry policy until someone explicitly asks for it?

Makes sense, I agree. I have pushed PR #4972 with the above changes. PTAL.
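As a rough illustration of the header approach agreed on above, the following sketch skips retries whenever a marker header is present. It is an assumption-laden model, not the change in #4972: the `Headers` alias, the flag constant, and the function body are hypothetical, and Envoy's real `Http::HeaderMap` API differs. The header names x-envoy-overloaded and x-envoy-ratelimited come from the discussion, and 8 is the gRPC status number for RESOURCE_EXHAUSTED.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical simplified header map keyed by lower-case header name.
using Headers = std::map<std::string, std::string>;

// Hypothetical policy bit for retrying on gRPC RESOURCE_EXHAUSTED.
constexpr uint32_t kRetryOnGrpcResourceExhausted = 0x1;

// Sketch: marker headers short-circuit the decision, so a rate limited
// response is never retried even if its gRPC status matches the policy.
bool wouldRetryFromHeaders(uint32_t retry_on, const Headers& response_headers) {
  if (response_headers.count("x-envoy-overloaded") != 0 ||
      response_headers.count("x-envoy-ratelimited") != 0) {
    return false;
  }
  const auto grpc_status = response_headers.find("grpc-status");
  if ((retry_on & kRetryOnGrpcResourceExhausted) != 0 &&
      grpc_status != response_headers.end() && grpc_status->second == "8") {
    return true;
  }
  return false;
}
```

Because the decision keys on a header rather than on the HTTP status, it also works for gRPC callers, whose responses carry status 200.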

ramaraochavali · 2018-11-05T03:55:51Z

@mattklein123 I am going to close this PR. Depending on what you think about the other point regarding RESOURCE_EXHAUSTED, I will open another PR.

ramaraochavali added 2 commits November 2, 2018 14:05

rate limited retry policy implementation

b941163

Signed-off-by: Rama <rama.rao@salesforce.com>

Merge branch 'master' into fix/retry_ratelimit

4336cf9

Signed-off-by: Rama <rama.rao@salesforce.com>

mattklein123 reviewed Nov 2, 2018

ramaraochavali added 2 commits November 3, 2018 09:06

fix condition and clarify docs

340bc09

Signed-off-by: Rama <rama.rao@salesforce.com>

kick ci

cc947f2

Signed-off-by: Rama <rama.rao@salesforce.com>

mattklein123 self-assigned this Nov 4, 2018

ramaraochavali commented Nov 4, 2018

ramaraochavali closed this Nov 5, 2018

ramaraochavali mentioned this pull request Nov 6, 2018

router: do not retry rate limited requests #4972

Merged
