
Support retries of failed proxy requests #2414

Merged
merged 19 commits into from May 12, 2023

Conversation

mikemherron
Contributor

@mikemherron mikemherron commented Mar 7, 2023

Implements #2372

Support for retrying proxy requests that fail due to an unavailable backend instance.
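
For reference, usage of the new options ends up looking roughly like this. This is a sketch: RetryCount and RetryFilter are the option names discussed in this PR, while the backend URLs and the filter logic are made up for the example.

```go
package main

import (
	"net/http"
	"net/url"

	"github.com/labstack/echo/v4"
	"github.com/labstack/echo/v4/middleware"
)

func main() {
	e := echo.New()

	// Two hypothetical backends; if a proxied request fails, the middleware can
	// retry it against the next target chosen by the balancer.
	url1, _ := url.Parse("http://backend-1:8080")
	url2, _ := url.Parse("http://backend-2:8080")
	targets := []*middleware.ProxyTarget{{URL: url1}, {URL: url2}}

	e.Use(middleware.ProxyWithConfig(middleware.ProxyConfig{
		Balancer:   middleware.NewRoundRobinBalancer(targets),
		RetryCount: 1, // retry a failed request once
		// RetryFilter decides whether a particular failure should be retried;
		// this example only retries when the target was unreachable (502).
		RetryFilter: func(c echo.Context, err error) bool {
			if httpErr, ok := err.(*echo.HTTPError); ok {
				return httpErr.Code == http.StatusBadGateway
			}
			return false
		},
	}))

	e.Logger.Fatal(e.Start(":8080"))
}
```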

@mikemherron mikemherron changed the title from "Draft proposal for supporting retries of failed proxy requests" to "Support retries of failed proxy requests" on Mar 10, 2023
@mikemherron mikemherron marked this pull request as ready for review March 10, 2023 15:35
Contributor

@aldas aldas left a comment


  • I think we need to edit proxyRaw so all errors are actually set as errors. c.Set("_error", fmt.Sprintf("proxy raw, hijack error=%v, url=%s", t.URL, err)) is not an error
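
A minimal sketch of the change being suggested, where t and err come from the surrounding proxyRaw code; the use of *echo.HTTPError and the 502 status here are assumptions, the point is simply to store an error value rather than a formatted string:

```go
// Sketch of proxyRaw's hijack-error path: record the failure as a real error
// under the "_error" context key so the retry logic can inspect it later.
if err != nil {
	c.Set("_error", echo.NewHTTPError(http.StatusBadGateway,
		fmt.Sprintf("proxy raw, hijack error=%v, url=%s", err, t.URL)))
	return
}
```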

@mikemherron
Contributor Author

I think we need to edit proxyRaw so all errors are actually set as errors. c.Set("_error", fmt.Sprintf("proxy raw, hijack error=%v, url=%s", t.URL, err)) is not an error

Good catch, I'll do this as part of this PR.

@mikemherron mikemherron requested a review from aldas March 17, 2023 17:45
@codecov

codecov bot commented Mar 22, 2023

Codecov Report

❗ No coverage uploaded for pull request base (master@ec642f7).
Patch coverage: 80.82% of modified lines in pull request are covered.

❗ Current head f3472cd differs from pull request most recent head 5560254. Consider uploading reports for the commit 5560254 to get more accurate results

Additional details and impacted files
@@            Coverage Diff            @@
##             master    #2414   +/-   ##
=========================================
  Coverage          ?   92.84%           
=========================================
  Files             ?       39           
  Lines             ?     4555           
  Branches          ?        0           
=========================================
  Hits              ?     4229           
  Misses            ?      237           
  Partials          ?       89           
Impacted Files Coverage Δ
middleware/proxy.go 74.24% <80.82%> (ø)


Contributor

@aldas aldas left a comment


LGTM, better than my ideas.

Just fix these linting errors and we are good.

P.S. The makefile default target is helpful here.

@aldas
Contributor

aldas commented Mar 22, 2023

@lammel, do you want to take a look?

@aldas
Contributor

aldas commented Mar 22, 2023

@mikemherron could you please add a PR for the docs as well: https://github.com/labstack/echox/blob/master/website/content/middleware/proxy.md

@mikemherron
Contributor Author

@mikemherron could you please add a PR for the docs as well: https://github.com/labstack/echox/blob/master/website/content/middleware/proxy.md

Yes - I will try to do this later today or, failing that, tomorrow at the latest.

@mikemherron
Contributor Author

Docs PR: labstack/echox#281

@lammel
Contributor

lammel commented Mar 24, 2023

Nice work! Although I don't like the name RetryFilter, I failed to come up with something better, so let's stick with it.

What I'd like to see is an additional test for a timeout of a proxy target (quite common due to firewall or load issues).
Thanks for the docs PR, that is great. Will respond there too.

@mikemherron
Contributor Author

Nice work! Although I don't like the name RetryFilter, I failed to come up with something better, so let's stick with it.

Yes, I know what you mean, there is nothing else I could see in the code base with *Filter... I did consider RetryHandler, which is more in line with Echo conventions, but that sounds like the function is responsible for doing the retry, which didn't feel right either. Let me know if you come up with an alternative, happy to change it.

What I'd like to see is an additional test for a timeout of a proxy target (quite common due to firewall or load issues). Thanks for the docs PR, that is great. Will respond there too.

That's a good point. I won't have time to look at it this week, but I will have time next week if you'd prefer to hold off merging until then.

@lammel
Contributor

lammel commented Mar 27, 2023

Yes, I know what you mean, there is nothing else I could see in the code base with *Filter... I did consider RetryHandler, which is more in line with Echo conventions, but that sounds like the function is responsible for doing the retry, which didn't feel right either. Let me know if you come up with an alternative, happy to change it.

I failed too. So let's stick with it for now.

What I'd like to see is an additional test for a timeout of a proxy target (quite common due to firewall or load issues). Thanks for the docs PR, that is great. Will respond there too.

That's a good point. I won't have time to look at it this week, but I will have time next week if you'd prefer to hold off merging until then.

I'd prefer to wait as we are not in a hurry. Take your time, looking forward to it.
Multiple timeouts might cause some unexpected delay and side effects, so let's see how it behaves.

@mikemherron
Contributor Author

I'd prefer to wait as we are not in a hurry. Take your time, looking forward to it. Multiple timeouts might cause some unexpected delay and side effects, so let's see how it behaves.

Added a new test with a backend that times out, sending 20 concurrent requests. The behaviour of the timeouts seems fine, but it does raise another interesting issue: when using the round-robin load balancer, the index of the current target is shared amongst all requests. This means it is possible for a failing request to end up retrying against the same backend, since other concurrent requests will have incremented the current target index.

This makes sense, but probably won't be what users expect. In the test, we have 2 backends configured, one that will always time out and another that will always succeed, with RetryCount set to 1. I would expect every request to succeed, since if we get a failure we should retry with "the next backend" - but the "next" backend can be the same one, depending on how many other concurrent requests we have.

I'm not sure there is a simple fix for this. The expected behaviour, IMO, should be for an individual request to somehow keep track of which backend it tried, and then ask the balancer for the next one relative to that, rather than the "next" one as determined by some global state. This could be done by adding another argument to ProxyBalancer with the last backend, or by having the NextTarget method also return some state that acts like a cursor and can be passed back in on retry calls. However, both of these would change the ProxyBalancer interface and cause a breaking change.

This limitation means the retry feature is useful for skipping over intermittent failures, but less useful in cases where an entire instance becomes unavailable.

For now, I just made the test pass on 502 errors (backend unavailable) - interested to hear any thoughts on potential solutions.
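
To make the shared-index behaviour concrete, here is a stripped-down sketch; this is not the Echo balancer code, and the target names are invented:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// roundRobin mimics a balancer that keeps one shared counter for all requests.
type roundRobin struct {
	i       uint32
	targets []string
}

func (r *roundRobin) next() string {
	i := atomic.AddUint32(&r.i, 1)
	return r.targets[int(i)%len(r.targets)]
}

func main() {
	rr := &roundRobin{targets: []string{"healthy-backend", "timing-out-backend"}}

	attempt := rr.next() // request A gets "timing-out-backend" and fails
	_ = rr.next()        // a concurrent request B advances the shared counter
	retry := rr.next()   // request A's retry gets "timing-out-backend" again

	fmt.Println(attempt, retry) // both attempts hit the same failing target
}
```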

@aldas
Contributor

aldas commented Mar 28, 2023

@mikemherron as we/you moved c.Set(config.ContextKey, tgt) around and removed the clearing of it - in the case of a retry, the provider can actually check the context for the previous target with c.Get(config.ContextKey) and try to avoid returning the same target for the next value. All this without changing the existing API.

@mikemherron
Contributor Author

@mikemherron as we/you moved c.Set(config.ContextKey, tgt) around and removed the clearing of it - in the case of a retry, the provider can actually check the context for the previous target with c.Get(config.ContextKey) and try to avoid returning the same target for the next value.

Yes, that's a good point. Should we change the provided round-robin and random balancer implementations to do this? Or leave it to users who want that behaviour?

@aldas
Contributor

aldas commented Mar 28, 2023

It makes sense for these default implementations to avoid serving the same target on the next try. Do you feel adventurous and have time for this enhancement? If this feature solves more problems than it creates - it is probably worth implementing.

@mikemherron
Contributor Author

It makes sense for these default implementations to avoid serving the same target on the next try. Do you feel adventurous and have time for this enhancement? If this feature solves more problems than it creates - it is probably worth implementing.

I went to do that but realised the balancers don't have access to the ContextKey config attribute - so instead I updated the round-robin balancer to store the last index used inside the context, and then check on each call if there was a previous index and, if so, start from there. I think this solves the issue well, and I was able to update the previous test so all requests now pass.

It didn't seem to make as much sense to do this for the random balancer. We could keep picking random targets until we get one that is not the same as the last, but that seems wasteful. We could start iterating from the previous index on retries (similar to what's done in the round-robin balancer), but I think if you have chosen to use the random balancer that behaviour would be unexpected.

Let me know what you think...
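
Roughly what this looks like inside middleware/proxy.go, as a sketch rather than the merged implementation; the roundRobinBalancer fields and the context key name are assumptions for illustration:

```go
// lastIndexKey is a hypothetical private context key used to remember which
// target index this particular request tried last.
const lastIndexKey = "_round_robin_last_index"

// Next resumes from the index stored in the request's context on a retry, so the
// retry moves to a different target even if concurrent requests have advanced the
// shared counter in the meantime.
func (b *roundRobinBalancer) Next(c echo.Context) *ProxyTarget {
	b.mutex.Lock()
	defer b.mutex.Unlock()

	var next int
	if prev, ok := c.Get(lastIndexKey).(int); ok {
		// Retry: this request already tried targets[prev]; move on from there.
		next = (prev + 1) % len(b.targets)
	} else {
		// First attempt: use and advance the shared counter as before.
		next = b.i
		b.i = (b.i + 1) % len(b.targets)
	}
	c.Set(lastIndexKey, next)
	return b.targets[next]
}
```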

@aldas aldas requested a review from lammel April 15, 2023 22:04
@aldas
Contributor

aldas commented May 12, 2023

Alright, I am merging this PR. The part I do not agree with is private/internal, so we can revise it if need be.

@mikemherron Thank you for the work and being patient with me.

@aldas aldas merged commit 0ae7464 into labstack:master May 12, 2023
@mikemherron
Contributor Author

Alright, I am merging this PR. The part I do not agree with is private/internal, so we can revise it if need be.

@mikemherron Thank you for the work and being patient with me.

No problem at all @aldas, I totally understand that you need to do what you think is best for the project. Thanks for all your input!
