Describe the bug
Hey,
We are noticing blips in services under load during Kubernetes rollouts. We observe a handful of 503 errors from istio-proxy on the pod being removed (either because of a rollout or a scale-down). This screenshot is from three separate scale-downs:
When scaling down, this is the sequence of events we observe:
- The pod goes into the TERMINATING state and is removed from the Kubernetes endpoints
- A handful of the last requests to the pod are reported by istio-proxy as 503
- Those requests are also logged as 503 by the upstream calling service
- The application exits
- istio-proxy exits
As you can see here:
At the moment, our only saving grace is that we have configured a retry policy, so our end users experience a slightly slower request rather than a failure. However, relying on a retry mechanism in this scenario doesn't feel right.
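For context, the retry policy is along these lines (a minimal sketch; the service name and the specific numbers are placeholders, not our exact configuration):

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: my-service          # placeholder name
spec:
  hosts:
  - my-service
  http:
  - route:
    - destination:
        host: my-service
    retries:
      attempts: 3            # retry a failed request up to 3 times
      perTryTimeout: 2s      # give each attempt 2 seconds before retrying
```

This masks the 503s from end users, but it only papers over the window where the terminating pod is still receiving traffic.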
Expected behavior
The istio-proxy on the pod being scaled down should not receive any requests after the pod has entered the TERMINATING state.
Steps to reproduce the bug
As above, but I can get on Hangouts and show you this in detail.
The application itself gracefully handles SIGTERM and drains connections; I have confirmed this with load tests without istio-proxy in play. I have also added a preStop hook to the application container when running with Istio, to ensure the app doesn't receive a SIGTERM until well after istio-proxy shuts down.
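Roughly, the preStop hook looks like this (a minimal sketch; the pod/container names, image, and the 30-second sleep are illustrative, not our exact values):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app               # placeholder name
spec:
  containers:
  - name: my-app
    image: my-app:latest     # placeholder image
    lifecycle:
      preStop:
        exec:
          # Delay delivery of SIGTERM to the application container,
          # giving in-flight requests time to drain before the app exits.
          command: ["/bin/sh", "-c", "sleep 30"]
```

Even with this delay in place, the 503s still appear during the termination window.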
Version
GKE 1.10.5, Istio 1.0
Is Istio Auth enabled or not?
Yes
Environment
GKE

