
503 and 403 errors when using more than 1 ambassador pod #1461

Closed
robertrbruno opened this issue Apr 24, 2019 · 29 comments
Labels
stale Issue is stale and will be closed

Comments

@robertrbruno

Describe the bug

When talking to a service through Ambassador that has an auth service, I was getting what appeared to be random responses: 200, 503, and 403. The ambassador deployment was set to 3 replicas. Looking at the logs, one pod was always returning 200 responses, another 503, and another 403. I tried restarting the troublesome pods with no luck.

As a workaround I scaled my deployment down to just 1 replica and now only seem to be getting 200 responses (see the sketch after the version list below). I only saw this bug after upgrading to 0.53.1; I was previously on version 0.50.3.

Versions (please complete the following information):

  • Ambassador: 0.53.1
  • Kubernetes environment: bare metal
  • Kubernetes version: 1.13.1
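
For reference, the workaround amounts to scaling the ambassador Deployment down to a single replica. A minimal sketch, assuming the Deployment is named `ambassador` (only the fields relevant to the workaround are shown):

```yaml
# Workaround, not a fix: run a single ambassador pod.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ambassador   # assumed deployment name
spec:
  replicas: 1        # one pod avoids the per-pod 200/503/403 divergence described above
```

Equivalently, `kubectl scale deployment ambassador --replicas=1` applies the same change to an existing Deployment.
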
@vaibhavrtk

vaibhavrtk commented Apr 30, 2019

I am still getting intermittent 503s even with only one replica.
ACCESS [2019-04-30T06:23:58.601Z] "GET /api/v1/query?query=kube_pod_labels%7Blabel_app%3D%22connect-sidekiq%22%7D&time=1556605419.186&_=1556604765861 HTTP/2" 503 UC 0 57 0 - "100.96.66.0" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" "de590163-2ac6-4508-89c7-03937c6f73f8" "prometheus.granite.rock.swiggy" "100.67.36.178:80"

@vaibhavrtk

vaibhavrtk commented Apr 30, 2019

This is happening for 10-20% of the requests

@ankurpshah

Facing a similar issue on the latest version of Ambassador (0.60.2).

@erulabs

erulabs commented Jun 8, 2019

Also seeing this issue with 0.71.0: services intermittently return 503. I believe this occurs when the metrics-server pod is having issues, though I am still investigating that.

@Whamied

Whamied commented Jun 11, 2019

We were encountering this issue on version 0.51.1. Upgrading to version 0.61.1 has the same issue.

We are seeing intermittent 503 and 403 response codes.

Adding a retry_policy has taken the number of 503 issues down to almost 0, but we are still seeing intermittent 403 responses. The requests that give a 403 do not reach our Auth service. We are getting UAEX response flags on all of those.

Is the retry configuration applied to the external auth call? If not, is there a way to configure that?
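
For anyone reproducing this, a per-Mapping retry_policy looks roughly like the following (1.x CRD syntax; the service name and values are placeholders, not taken from this thread):

```yaml
apiVersion: getambassador.io/v2
kind: Mapping
metadata:
  name: example-backend
spec:
  prefix: /backend/
  service: example-backend   # placeholder upstream service
  retry_policy:
    retry_on: "5xx"          # also accepts e.g. connect-failure, gateway-error
    num_retries: 3
```

Note that this Mapping-level policy only covers the request to the upstream service; whether an equivalent retry applies to the external auth call is exactly the open question in this comment.
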

@dioniseo

dioniseo commented Aug 7, 2019

Can anybody confirm that retry_policy works for AuthService?

The changes were merged, but I looked into the related issue in the Envoy repo, envoyproxy/envoy#5974, which was closed without adding retry support to the envoy.ext_authz filter.

@richarddli
Contributor

richarddli commented Aug 7, 2019

I believe we have already taken this patch in our version of Envoy.

@dioniseo

dioniseo commented Aug 7, 2019

@richarddli If I'm reading the configuration definition for the ext_authz filter correctly (https://github.com/datawire/ambassador/blob/master/go/apis/envoy/config/filter/http/ext_authz/v2/ext_authz.pb.go#L72), there is no field for a retry policy, but I'm not experienced with Go so I may be wrong...

@dioniseo

Hi @richarddli, did you have a chance to recheck whether the retry policy works for the AuthService configuration in the latest releases of Ambassador? If we define the retry per Mapping or globally it works fine, but when it is defined only on the AuthService it doesn't seem to work.
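
A sketch of the global form mentioned above, set via the Ambassador Module (values illustrative; the per-Mapping form uses the same block inside a Mapping's spec):

```yaml
apiVersion: getambassador.io/v2
kind: Module
metadata:
  name: ambassador
spec:
  config:
    retry_policy:                 # applies to all Mappings unless overridden per Mapping
      retry_on: "connect-failure"
      num_retries: 3
```
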

@sekaninat

Hi, any closure on this? We're experiencing the same behavior with just a single replica of plain Envoy, an authorization service, and an endpoint backend. Around 20% of requests return 403 or 503 when we generate higher load. Note that the failed requests never reach the target component at all: the authorization service for 403s, the backend for 503s.

@yamaszone

We are seeing this issue with Ambassador v0.70.0 configured with 3 replicas. ~15% of requests encounter 403 under high load (~1000 req/sec), but at lower rates everything works as expected. We are planning to upgrade Ambassador to the latest version, but wanted to know whether the issue is expected to be fixed in versions later than v0.70.0. Here's an example error log:

ACCESS [2019-09-12T20:35:25.044Z] "GET /route/endpoint1 HTTP/1.1" 403 UAEX 0 0 5001 - "10.1.0.128,10.1.0.44" "hey/0.0.1" "678fbfc4-2718-41a2-a16d-a217ddd39ca6" "xyz.westus2.cloudapp.azure.com" "-"

@richarddli can you please comment on this?

@sekaninat

For us it was a wrong Kubernetes configuration: our pods didn't have enough connections allowed in sysctl.
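
The comment doesn't say which sysctls were involved. As a hypothetical illustration only, connection-related sysctls can be set per pod through the pod securityContext (values are placeholders, and net.core.somaxconn must be allow-listed as an unsafe sysctl on the kubelet):

```yaml
# Hypothetical pod-level sysctl tuning; only the relevant fields are shown.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ambassador
spec:
  template:
    spec:
      securityContext:
        sysctls:
          - name: net.ipv4.ip_local_port_range   # "safe" sysctl, allowed by default
            value: "1024 65535"
          - name: net.core.somaxconn             # requires kubelet --allowed-unsafe-sysctls
            value: "4096"
```
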

@mahbh2001

We are using ambassador:0.75.0 with 3 replicas and are getting a similar issue: intermittent failures while hitting the authorisation service (HTTP/1.1" 403 UAEX 0 0 5002). Connections are getting closed after ~5 seconds.
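
The ~5 second cutoff (5002 here, and 5001 in the log a few comments up) lines up with the AuthService request timeout, which defaults to 5000 ms. A sketch of raising it, with a placeholder auth service address:

```yaml
apiVersion: getambassador.io/v2
kind: AuthService
metadata:
  name: authentication
spec:
  auth_service: "example-auth:3000"   # placeholder address of the external auth service
  proto: http
  timeout_ms: 10000                   # default is 5000; raise if the auth backend is slow
```

Raising the timeout only masks a slow auth backend, but it helps separate timeouts from the other failure modes in this thread.
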

@stale

stale bot commented Nov 26, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale Issue is stale and will be closed label Nov 26, 2019
@stale stale bot closed this as completed Dec 3, 2019
@f-a-a

f-a-a commented Jan 6, 2020

We're experiencing this on version 0.86.1. Can anyone comment on what's needed to get this rectified?

@oyvinvo

oyvinvo commented Jan 29, 2020

I'm experiencing this as well, running in AWS, when the concurrency is high enough. Firing 200 requests (5 concurrent) usually makes at least one fail with response flag UAEX or UC. When UAEX is raised we can't find any trace in our AuthService. When UC is raised we can find a trace, but the trace says everything is okay and 200 is returned.

To me it seems like UAEX is raised when the connection is closed before the AuthService is reached, while UC is raised when the connection is closed before the AuthService has responded.

We're running version 1.0

@MateuszCzubak
Contributor

Same issue here: 403s or 503s with UC and UAEX flags on version 0.86.1. IMO this issue should be reopened.

@Mokto

Mokto commented Feb 18, 2020

The latest version is 1.1.1. Maybe you should try upgrading first?

@oyvinvo

oyvinvo commented Feb 18, 2020

I haven't had time to try it yet, but I'm hopeful that the new cluster_idle_timeout_ms setting (https://www.getambassador.io/reference/core/ambassador/#upstream-idle-timeout-cluster_idle_timeout_ms) in version 1.1.1 might solve the issue for me.
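
For reference, that setting goes on the Ambassador Module; a sketch with an illustrative value:

```yaml
apiVersion: getambassador.io/v2
kind: Module
metadata:
  name: ambassador
spec:
  config:
    cluster_idle_timeout_ms: 30000   # drop idle upstream connections after 30s (illustrative)
```
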

@f-a-a

f-a-a commented Feb 18, 2020

Update: I have managed to rectify this issue in my cluster, still running version 0.86.1.

In my case, the bottleneck was my external AuthService not processing requests in time. All I did was increase resources and add replicas on oathkeeper's deployment to make sure it has enough capacity, and it hasn't been throwing errors as of late.
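
A hypothetical sketch of that kind of change, bumping replicas and resources on the auth backend's Deployment (names and values are placeholders; only the relevant fields are shown):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: oathkeeper          # the auth backend mentioned above
spec:
  replicas: 3               # illustrative; scale to match auth traffic
  template:
    spec:
      containers:
        - name: oathkeeper
          resources:
            requests:
              cpu: 500m     # illustrative values, tune to your load
              memory: 256Mi
            limits:
              cpu: "1"
              memory: 512Mi
```
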

@przemek-sl

Did anyone manage to solve this issue? Or did you try to apply any workarounds to mitigate the number of errors between Ambassador and AuthService?

@sekaninat Do you remember what you changed and what values you had before?

@jasperkuperus

@richarddli Shouldn't this issue be reopened? I'm also seeing this issue unfortunately :(

@hextrim

hextrim commented Jan 28, 2021

Believe it or not: I had a lot of 403/503 UAEX responses, fixed kube-dns, and now everything returns 200.

@MateuszCzubak
Contributor

MateuszCzubak commented Mar 15, 2021

In our case some of the errors went away after increasing the CPU limits on the ambassador deployment, but the problem still remains.

@amitzbb

amitzbb commented Jun 28, 2021

Did anyone find an official solution for this issue?

@Algirdyz

I am also getting this issue, running Ambassador 1.13 with 3 pods.
Load on the server is very low and it still happens.

Any solutions or workarounds yet?

@prathap0611

Did we get any resolution on this? We are also getting this error with emissary-ingress 2.1, tried with 2 pods.

@TalhaNaeem101

TalhaNaeem101 commented Dec 7, 2022

Did anyone find a solution for this? I am also getting this error, even on version 3.0.0. My app randomly gets 403s, even with appropriate resources and 3 pods for Ambassador. The problem is that the request never reaches my micro-service and a 403 is returned, but only randomly. #4286 #3893

@prathap0611

> Did we get any resolution on this? We are also getting this error with emissary-ingress 2.1, tried with 2 pods.

In our case, it was an issue with our design. We had two different instances of the application running, each with its own authentication service. When a request hit the ingress, the authentication request was sent in a round-robin fashion: when a request for a given application hit the corresponding auth service it worked, but when it hit the other auth service it failed.

We changed our design to a single authentication service and it is working fine now. Hope this helps if someone encounters a similar problem.
