envoy drain duration should adapt to pod spec #34855
Overall I agree we should do this. A few concerns I think we will need to work out:
The benefits largely outweigh the issues, but we should see what we can do to address those. |
@howardjohn It's passive because Envoy does not signal clients proactively. On a long-lived connection with few requests, clients simply don't receive any notification. For very long grace periods, we have to implement polling for open connections so that if the app exits early, the proxy does, too. If we do pursue this, I'd imagine flags (1)-(3) are no longer useful. 5s is a dangerously low value, more so for apps with very long shutdowns. |
One complication is that health checks (and some other endpoints) go through Envoy and those should not count as application connections during drain. |
Tagging @hzxuzhonghu. |
This is exactly the same reason why we can not completely depend on Envoy to provide draining connection count. |
I just ran into this issue; one of my services has a high maximum but low average request time, so it gracefully shuts itself down, but once it's actually exited the sidecar persists and waits around even though there are no active connections. Hopefully I see this in a release soon, because I'd really like to test it and get even faster resource freeing across our clusters. |
@kevin-lindsay-1 #35059 has this implementation. You can set EXIT_ON_ZERO_ACTIVE_CONNECTIONS on the agent side and try it. Please let us know how it works. |
@ramaraochavali I use helm to install istio, and can change the image tags. Are there image builds that I could try out? |
https://storage.googleapis.com/istio-build/dev/latest - Please try with this |
@ramaraochavali I use dockerhub for my images; can you tell me what I'd need to change on the helm chart for |
Sorry. I am not aware of that. Maybe @howardjohn knows about it |
--set global.hub, --set global.tag
|
@howardjohn sorry, I understand the values to set, but I don't know what the actual hub and tag values would be, as this appears to be a Google hub and tag, and the attempts I made all gave me errors. I'm not experienced with using Google storage as a hub, to the point where I'm not even 100% sure that it's compatible as an implementation of a Docker hub. |
hub=gcr.io/istio-testing, tag=latest
https://github.com/istio/istio/wiki/Dev%20Builds
|
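For anyone else trying the dev builds, a minimal sketch of the equivalent Helm values override, based on the hub and tag mentioned above (everything else about the chart and install is an assumption):

```yaml
# Hypothetical values override for the Istio Helm charts, pointing the images
# at the dev-build hub/tag mentioned above. Chart names, release names, and
# the rest of your install are not shown and will differ.
global:
  hub: gcr.io/istio-testing
  tag: latest
```

This mirrors the `--set global.hub` / `--set global.tag` flags suggested earlier in the thread.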
Thanks, I'll give it a shot! |
@ramaraochavali I updated the […]. Should I not set […]? Pod logs say version […]. |
@kevin-lindsay-1 It is not part of proxy metadata. You need to set that as an env var while starting the proxy. |
@ramaraochavali how do I set that when using the sidecar injector? I tried going in after the pod initialized, but it looks like it doesn't check for that variable after initialization, and I don't see a way to configure the injector to set environment variables that aren't part of a specific pre-defined API. |
@kevin-lindsay-1 Sorry, I missed that - since you have already set it in proxyMetadata, it should have been carried to the proxy's env. Can you please share your pod.yaml and logs? |
@ramaraochavali scratch that. I checked my config, and I must have tried something previously along the lines of:

    proxy.istio.io/proxyMetadata: |
      EXIT_ON_ZERO_ACTIVE_CONNECTIONS: 'true'

I just tried it again with:

    proxy.istio.io/config: |
      proxyMetadata:
        EXIT_ON_ZERO_ACTIVE_CONNECTIONS: 'true'

and it worked exactly as expected this time. Sorry for the late response, took a while to get around to testing this again. |
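For readers landing here, a minimal sketch of where that annotation sits on a workload (the Deployment name, labels, and app container below are illustrative assumptions):

```yaml
# Illustrative Deployment fragment: the proxy.istio.io/config annotation goes
# on the pod template metadata, where the sidecar injector reads it.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
      annotations:
        proxy.istio.io/config: |
          proxyMetadata:
            EXIT_ON_ZERO_ACTIVE_CONNECTIONS: 'true'
    spec:
      containers:
        - name: my-app
          image: my-app:latest  # assumption: your application image
```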
Awesome. Thanks for the confirmation
|
@ramaraochavali I'd be very interested to know when this feature hits an official release, as it will be very useful for my team: we use EKS with Spot instances, so faster drain times can save us maintenance time and a non-trivial amount of cost over time. |
@kyessenov are you saying […]? |
It is in 1.12, and here are the dates: https://github.com/istio/istio/wiki/Istio-Release-1.12 |
@ramaraochavali Thanks for the quick reply. What's the best practice before […]? |
2 is not used now. |
Hey, if I'm reading the changes right, we could template our workloads with an annotation like this: […]
And we should be able to remove our custom wrapper script, as that would give me a drain duration equal to the termination grace period, and it would exit in a timely manner.
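The annotation snippet itself was not captured above; what follows is only a guess at the kind of thing meant, pairing the pod's grace period with the proxy's drain settings (all values illustrative):

```yaml
# Hypothetical pod-template fragment: give the pod a generous grace period,
# point terminationDrainDuration at roughly the same value, and let the agent
# exit early once active connections reach zero.
template:
  metadata:
    annotations:
      proxy.istio.io/config: |
        terminationDrainDuration: 60s
        proxyMetadata:
          EXIT_ON_ZERO_ACTIVE_CONNECTIONS: 'true'
  spec:
    terminationGracePeriodSeconds: 60
```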
Can you please share Bootstrap's proxyMetadata section if possible? |
with inclusionRegexps:

```json
{ "ANNOTATIONS": { "kubernetes.io/config.seen": "2022-02-26T22:20:30.119706875+09:00", "kubernetes.io/psp": "eks.privileged", "prometheus.io/scrape": "true", "kubectl.kubernetes.io/default-container": "nginx", "sidecar.istio.io/status": "{\"initContainers\":[\"istio-init\"],\"containers\":[\"istio-proxy\"],\"volumes\":[\"istio-envoy\",\"istio-data\",\"istio-podinfo\",\"istio-token\",\"istiod-ca-cert\"],\"imagePullSecrets\":null,\"revision\":\"1-12-1\"}", "prometheus.io/path": "/stats/prometheus", "kubernetes.io/config.source": "api", "proxy.istio.io/config": "terminationDrainDuration: 10s\nproxyMetadata:\n EXIT_ON_ZERO_ACTIVE_CONNECTIONS: 'true'\n MINIMUM_DRAIN_DURATION: 11s\nproxyStatsMatcher:\n inclusionRegexps:\n - \".*downstream_cx_active\"\n", "prometheus.io/port": "15020", "kubectl.kubernetes.io/default-logs-container": "nginx" }, "WORKLOAD_NAME": "nginx", "INTERCEPTION_MODE": "REDIRECT", "ENVOY_STATUS_PORT": 15021, "NAME": "nginx-55475b47b6-57m8z", "INSTANCE_IPS": "10.110.101.240", "ISTIO_PROXY_SHA": "istio-proxy:e6f45abcf874983fbff384459d70b28c072f68b5", "CLUSTER_ID": "Kubernetes", "OWNER": "kubernetes://apis/apps/v1/namespaces/teraoka/deployments/nginx", "LABELS": { "pod-template-hash": "55475b47b6", "service.istio.io/canonical-revision": "latest", "app.kubernetes.io/name": "nginx", "security.istio.io/tlsMode": "istio", "app.kubernetes.io/instance": "nginx", "service.istio.io/canonical-name": "nginx" }, "ISTIO_VERSION": "1.12.1", "SERVICE_ACCOUNT": "nginx", "PILOT_SAN": [ "istiod-1-12-1.istio-system.svc" ], "PROXY_CONFIG": { "proxyMetadata": { "MINIMUM_DRAIN_DURATION": "11s", "EXIT_ON_ZERO_ACTIVE_CONNECTIONS": "true" }, "controlPlaneAuthPolicy": "MUTUAL_TLS", "binaryPath": "/usr/local/bin/envoy", "proxyStatsMatcher": { "inclusionRegexps": [ ".*downstream_cx_active" ] }, "configPath": "./etc/istio/proxy", "proxyAdminPort": 15000, "statNameLength": 189, "discoveryAddress": "istiod-1-12-1.istio-system.svc:15012", "drainDuration": "45s", "statusPort": 15020, "terminationDrainDuration": "10s", "concurrency": 2, "tracing": { "zipkin": { "address": "zipkin.istio-system:9411" } }, "holdApplicationUntilProxyStarts": true, "parentShutdownDuration": "60s", "serviceCluster": "istio-proxy" }, "MESH_ID": "cluster.local", "APP_CONTAINERS": "nginx", "NAMESPACE": "teraoka", "PROV_CERT": "var/run/secrets/istio/root-cert.pem", "ENVOY_PROMETHEUS_PORT": 15090, "POD_PORTS": "[{\"name\":\"http\",\"containerPort\":80,\"protocol\":\"TCP\"}]" }
```

without inclusionRegexps:
diff
|
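For readability, the `proxy.istio.io/config` annotation embedded in the dump above decodes to:

```yaml
terminationDrainDuration: 10s
proxyMetadata:
  EXIT_ON_ZERO_ACTIVE_CONNECTIONS: 'true'
  MINIMUM_DRAIN_DURATION: 11s
proxyStatsMatcher:
  inclusionRegexps:
    - ".*downstream_cx_active"
```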
@ramaraochavali Thank you, too. |
Does this actually work now? If so, when can I test it in a non-dev build? |
@ramaraochavali I have seen your PR back in Feb. I'm on 1.13.4 and still don't see that this has been merged; I only see it in the 1.14 beta. Will your #37573 be merged to any 1.12 or 1.13 release?
my IOP spec
|
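The IOP spec above wasn't captured; for reference, a sketch of how the flag can be applied mesh-wide through an IstioOperator (the `meshConfig.defaultConfig.proxyMetadata` path is standard ProxyConfig wiring; the rest is illustrative):

```yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    defaultConfig:
      proxyMetadata:
        # Experimental agent flag discussed in this thread.
        EXIT_ON_ZERO_ACTIVE_CONNECTIONS: "true"
```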
I have added it for 1.13. Please follow this PR: #39082 |
@ramaraochavali thanks. can we also cherry-pick this for 1.12? |
1.12 needs a manual PR. It will take some time. |
@ramaraochavali thanks! Can you let us know when this will be in a release build? I've sometimes seen merges take months to show up in a release build. |
Sorry, I have been busy with other stuff. PR for the backport to 1.12: #39286 |
🚧 This issue or pull request has been closed due to not having had activity from an Istio team member since 2022-06-05. If you feel this issue or pull request deserves attention, please reopen the issue. Please see this wiki page for more information. Thank you for your contributions. Created by the issue and PR lifecycle manager. |
Hi @ramaraochavali, […] |
Sure. We can update the docs |
EXIT_ON_ZERO_ACTIVE_CONNECTIONS isn't mentioned in the latest docs - has the default changed to true, or does it still exist as an option in current Istio? And, maybe related: is there an up-to-date doc on how to ensure the proxy doesn't die or break the network before the service ends, while at the same time stopping new traffic from coming in as soon as possible? |
It's there in https://preliminary.istio.io/latest/docs/reference/commands/pilot-agent/. This is an experimental feature, so there are no docs beyond the generated reference. |
We have a service container that runs a background thread that needs to complete in-flight work and then exit. Waiting for Istio connections to drain does not help in this situation, since the background thread doesn't have an open connection. One suggestion I have seen floating around is to have the service container hold an otherwise unneeded connection open through the Istio proxy so that Istio will wait until the service pod exits, but it feels kind of hacky. Suggestions? |
https://istio.io/latest/blog/2023/native-sidecars/ is the perfect solution. Everything else is pretty rough. Kubernetes 1.29 is available on most platforms these days. |
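For context, the mechanism that blog post relies on is the Kubernetes native-sidecar pattern: an init container with `restartPolicy: Always`, which the kubelet keeps running alongside the app and stops only after the regular containers exit. With Istio the injector produces this shape for you; the fragment below just illustrates the underlying pattern (names and images are assumptions):

```yaml
# Not Istio configuration: a generic pod spec fragment showing the Kubernetes
# native-sidecar shape (Kubernetes 1.28+ with the SidecarContainers feature).
spec:
  initContainers:
    - name: some-sidecar
      image: example.com/sidecar:latest
      restartPolicy: Always   # marks this init container as a sidecar
  containers:
    - name: app
      image: example.com/app:latest
```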
Thanks! More motivation to get current on k8s. Is there a "least bad" approach in the meantime? |
You can use terminationDrainDuration and set it high (works OK but is a static time), or use EXIT_ON_ZERO_ACTIVE_CONNECTIONS (experimental, doesn't work in some cases). |
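A sketch of those two interim options expressed as a pod annotation (values illustrative; the two can be combined, as in the bootstrap dump earlier in the thread):

```yaml
metadata:
  annotations:
    proxy.istio.io/config: |
      # Option 1: a static but generous drain window.
      terminationDrainDuration: 120s
      # Option 2 (experimental): exit as soon as active connections reach zero.
      proxyMetadata:
        EXIT_ON_ZERO_ACTIVE_CONNECTIONS: 'true'
```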
Thank you. |
There are four (!) settings controlling the sidecar termination duration:
1. `drainDuration` is Envoy's graceful drain duration (default 45s).
2. `parentShutdownDuration` is Envoy's parent shutdown delay (default 60s).
3. `terminationDrainDuration` is pilot-agent's termination delay after SIGTERM (default 5s).
4. `terminationGracePeriodSeconds` is the pod's graceful termination delay after SIGTERM, before SIGKILL (default 30s).

This is very confusing and actually inconsistent, since (1)-(3) are normally mesh-level and (4) is pod-level. (4) overrides (3), and (3) overrides (1), while (2) is not used. (3) is also too short to be meaningful, since Envoy's drain is passive.
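For orientation, a sketch of where each of the four settings lives, using the defaults listed above ((1)-(3) are ProxyConfig fields, typically set mesh-wide under `meshConfig.defaultConfig` or per-pod via the `proxy.istio.io/config` annotation; (4) is plain pod spec):

```yaml
# Mesh-level ProxyConfig defaults (values are the stated defaults).
meshConfig:
  defaultConfig:
    drainDuration: 45s            # (1) Envoy's graceful drain duration
    parentShutdownDuration: 60s   # (2) Envoy's parent shutdown delay
    terminationDrainDuration: 5s  # (3) pilot-agent's delay after SIGTERM
---
# Pod-level setting.
spec:
  terminationGracePeriodSeconds: 30  # (4) kubelet's SIGTERM-to-SIGKILL window
```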
The proposal is to use (4) as the main deadline, since it's orchestrated by the kubelet irrespective of mesh-level settings. In this world, there is just one graceful deadline value, (4), which dictates how long Envoy gracefully drains open connections before shutting down. That means Envoy is not terminated abruptly without sending final telemetry, for example. The process would look like this:
cc @ramaraochavali @mandarjog