
Add ability to set gateway container's lifecycle hooks #48956

Closed
wants to merge 1 commit

Conversation

@den-is den-is commented Jan 24, 2024

Please provide a description of this PR:

Fulfills #47265 #47779 kubernetes-sigs/aws-load-balancer-controller#2131
In my company, we do not have any kustomize workflows or the ability to add it to our CI/CD.
The addition of the ability to set container lifecycle hooks does not interfere with any existing setups.

Tests
Test values yaml

lifecycle:
  preStop:
    exec:
      command: ["/bin/sh","-c","sleep 300"]

Render template

helm template \
./istio/manifests/charts/gateway \
--set name=istio-ingressgateway \
-f values-gw-test.yaml \
-s templates/deployment.yaml \
--dry-run --debug

Output with the preStop lifecycle hook enabled:

# Source: gateway/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: istio-ingressgateway
  namespace: kube-system
  labels:
    helm.sh/chart: gateway-1.0.0
    app: istio-ingressgateway
    istio: ingressgateway
    app.kubernetes.io/version: "1.0.0"
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: istio-ingressgateway
  annotations:
    {}
spec:
  selector:
    matchLabels:
      app: istio-ingressgateway
      istio: ingressgateway
  template:
    metadata:
      annotations:
        inject.istio.io/templates: gateway
        prometheus.io/path: /stats/prometheus
        prometheus.io/port: "15020"
        prometheus.io/scrape: "true"
        sidecar.istio.io/inject: "true"
      labels:
        sidecar.istio.io/inject: "true"
        app: istio-ingressgateway
        istio: ingressgateway
    spec:
      serviceAccountName: istio-ingressgateway
      securityContext:
        # Safe since 1.22: https://github.com/kubernetes/kubernetes/pull/103326
        sysctls:
        - name: net.ipv4.ip_unprivileged_port_start
          value: "0"
      containers:
        - name: istio-proxy
          # "auto" will be populated at runtime by the mutating webhook. See https://istio.io/latest/docs/setup/additional-setup/sidecar-injection/#customizing-injection
          image: auto
          securityContext:
            # Safe since 1.22: https://github.com/kubernetes/kubernetes/pull/103326
            capabilities:
              drop:
              - ALL
            allowPrivilegeEscalation: false
            privileged: false
            readOnlyRootFilesystem: true
            runAsUser: 1337
            runAsGroup: 1337
            runAsNonRoot: true
          env:
          lifecycle: 
            preStop:
              exec:
                command:
                - /bin/sh
                - -c
                - sleep 300
          ports:
          - containerPort: 15090
            protocol: TCP
            name: http-envoy-prom
          resources:
            limits:
              cpu: 2000m
              memory: 1024Mi
            requests:
              cpu: 100m
              memory: 128Mi
      terminationGracePeriodSeconds: 30

Output with an empty lifecycle: {} value:

# Source: gateway/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: istio-ingressgateway
  namespace: kube-system
  labels:
    helm.sh/chart: gateway-1.0.0
    app: istio-ingressgateway
    istio: ingressgateway
    app.kubernetes.io/version: "1.0.0"
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: istio-ingressgateway
  annotations:
    {}
spec:
  selector:
    matchLabels:
      app: istio-ingressgateway
      istio: ingressgateway
  template:
    metadata:
      annotations:
        inject.istio.io/templates: gateway
        prometheus.io/path: /stats/prometheus
        prometheus.io/port: "15020"
        prometheus.io/scrape: "true"
        sidecar.istio.io/inject: "true"
      labels:
        sidecar.istio.io/inject: "true"
        app: istio-ingressgateway
        istio: ingressgateway
    spec:
      serviceAccountName: istio-ingressgateway
      securityContext:
        # Safe since 1.22: https://github.com/kubernetes/kubernetes/pull/103326
        sysctls:
        - name: net.ipv4.ip_unprivileged_port_start
          value: "0"
      containers:
        - name: istio-proxy
          # "auto" will be populated at runtime by the mutating webhook. See https://istio.io/latest/docs/setup/additional-setup/sidecar-injection/#customizing-injection
          image: auto
          securityContext:
            # Safe since 1.22: https://github.com/kubernetes/kubernetes/pull/103326
            capabilities:
              drop:
              - ALL
            allowPrivilegeEscalation: false
            privileged: false
            readOnlyRootFilesystem: true
            runAsUser: 1337
            runAsGroup: 1337
            runAsNonRoot: true
          env:
          ports:
          - containerPort: 15090
            protocol: TCP
            name: http-envoy-prom
          resources:
            limits:
              cpu: 2000m
              memory: 1024Mi
            requests:
              cpu: 100m
              memory: 128Mi
      terminationGracePeriodSeconds: 30

Signed-off-by: Denis Iskandarov <d.iskandarov@gmail.com>
@istio-policy-bot istio-policy-bot added the area/environments and release-notes-none (Indicates a PR that does not require release notes.) labels Jan 24, 2024
@istio-policy-bot

😊 Welcome @den-is! This is either your first contribution to the istio/istio repo, or it's been
a while since you've been here.

You can learn more about the Istio working groups, Code of Conduct, and contribution guidelines
by referring to Contributing to Istio.

Thanks for contributing!

Courtesy of your friendly welcome wagon.

@istio-testing istio-testing added the size/S (Denotes a PR that changes 10-29 lines, ignoring generated files.) and needs-ok-to-test labels Jan 24, 2024
@istio-testing
Collaborator

Hi @den-is. Thanks for your PR.

I'm waiting for an Istio member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ymesika
Member

ymesika commented Jan 24, 2024

/ok-to-test

You will probably want to add a release note for this.

@istio-testing istio-testing added the ok-to-test label (Set this label to allow normal testing to take place for a PR not submitted by an Istio org member.) and removed the needs-ok-to-test label Jan 24, 2024
@den-pluto

@ymesika
I can't find anything about it in the contribution notes.
Can you show me an example of how to add release notes?

Member

@howardjohn howardjohn left a comment


See my comments in #47779; this is not the right approach. If the current way doesn't meet use cases, we should enhance it.

@den-pluto

den-pluto commented Jan 24, 2024

@howardjohn

See my comments in #47779; this is not the right approach. If the current way doesn't meet use cases, we should enhance it.

I'm not going to argue.
"I want this feature."
This is a very native Kubernetes container feature, which could be present in the default template in the official chart.

It is up to me whether to enable it or not, and it is my responsibility, the same as if I inserted it using kustomize, or cloned your chart and hosted an altered version myself.

And yes, I'm using terminationGracePeriodSeconds + terminationDrainDuration to solve the issue.

Also, my PR does not set any lifecycle by default; it stays blank/unset.
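
For illustration, that workaround could look roughly like this in a values file. This is only a sketch: the terminationGracePeriodSeconds key for the gateway chart is an assumption, while the proxy.istio.io/config annotation is the standard per-pod proxy config override.

# Sketch: longer pod grace period, with the proxy draining for most of it.
terminationGracePeriodSeconds: 60
podAnnotations:
  proxy.istio.io/config: |
    terminationDrainDuration: 50s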

@howardjohn
Member

Thanks, I understand your point of view, but the project's position (after much discussion) is to keep the Helm charts opinionated.

@linsun
Member

linsun commented Jan 25, 2024

Adding a few @istio/wg-environments-maintainers for input.

@den-is is the requirement/pain point to not have the gateway ready too fast, so you add a pre-stop hook? In the past, we have tended to approve helm values PRs when multiple people/vendors ask for them.

@akamac

akamac commented Feb 13, 2024

I used to have a preStop hook configured for Nginx Ingress + AWS ALB for zero-downtime ingress updates. On receiving SIGTERM, I make the health check fail for the ALB while Nginx keeps accepting connections. Only once the target is deregistered in AWS can the pod be safely terminated.
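
Roughly, that pattern looks like the sketch below. The /healthz-disabled marker file and the timings are illustrative assumptions, not the actual config; the idea is just to fail the LB health check first, then wait out deregistration before the container exits.

# Sketch only: fail the health check, then wait for the LB to deregister.
lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "touch /tmp/healthz-disabled && sleep 120"]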

@istio-policy-bot istio-policy-bot added the lifecycle/stale (Indicates a PR or issue hasn't been manipulated by an Istio team member for a while) label Feb 24, 2024
@istio-policy-bot istio-policy-bot added the lifecycle/automatically-closed (Indicates a PR or issue that has been closed automatically.) label Mar 10, 2024
@costinm costinm reopened this Mar 10, 2024
@istio-policy-bot istio-policy-bot removed the lifecycle/stale (Indicates a PR or issue hasn't been manipulated by an Istio team member for a while) label Mar 10, 2024
@costinm
Contributor

costinm commented Mar 10, 2024

I think the actual feature, the ability to configure drain duration, is a pretty important one.
I agree with John that adding the lifecycle hook is not the right approach, but I think adding an option (or an annotation on the gateway) to indicate the desired drain duration/grace period as a first-class setting is not bad.

The implementation should also take into account the new way of creating Gateways, managed by Istiod, which is what other non-Istio gateways are using; we should not add features that only work on the helm self-managed install.

@istio-testing
Collaborator

@den-is: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: integ-telemetry_istio
Commit: 8c6248c
Required: true
Rerun command: /test integ-telemetry

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@istio-policy-bot istio-policy-bot added the lifecycle/stale (Indicates a PR or issue hasn't been manipulated by an Istio team member for a while) label Apr 10, 2024
@istio-policy-bot

🚧 This issue or pull request has been closed due to not having had activity from an Istio team member since 2024-03-10. If you feel this issue or pull request deserves attention, please reopen the issue. Please see this wiki page for more information. Thank you for your contributions.

Created by the issue and PR lifecycle manager.

@clayvan

clayvan commented May 1, 2024

Please re-open and let this PR through.

Thanks, I understand your point of view, but the project's position (after much discussion) is to keep the Helm charts opinionated.

Yet, the helm chart allows configuration of several other fields such as tolerations, topologySpreadConstraints, affinity, etc.

we tend to approve helm values PRs when there are multiple people/vendors asking for it.

There are tons of people in #47779 and #47265 asking for this. Having lifecycle hook support is critical for any ingress controller Helm chart; they all have it except the Istio gateway.

See my comments in #47779; this is not the right approach. If the current way doesn't meet use cases, we should enhance it.

This issue is still open because people can't agree that the current way does not work. I would love it if there were native Istio configuration to prevent this, but from my testing I don't believe there is. Whereas this lifecycle hook 100% works and is the simplest way for any ingress controller to avoid downtime.

is the requirement/pain point to not have gateway ready too fast so you add pre-stop hook? In the past

@linsun , no this preStop helps avoid downtime during gateway terminations. Please see #47265 (comment)

This PR will help your end users avoid downtime during rolling restarts. @howardjohn please reconsider the project's position on this PR, as it has no downsides and only major upsides.

@howardjohn
Member

@clayvan I can assure you there is no PR with no downsides 🙂.

This PR or similar would make a lot more sense if someone could explain why terminationDrainDuration is not an acceptable solution; the only explanation I have seen is #47779 (comment), which in my understanding is not technically accurate.

FWIW comments may be more visible on #47779 (open issue vs closed PR)

@ryanmac8

ryanmac8 commented May 1, 2024

@howardjohn terminationDrainDuration isn't a viable solution because we need a preStop hook so that we have a chance to inform the load balancer to remove the node from service and mark it as unhealthy before Envoy receives a SIGTERM. With just terminationDrainDuration and no preStop hook, the load balancer isn't made aware that Envoy is no longer receiving active connections, which causes client-side errors.

@howardjohn
Member

@ryanmac8 can you help walk me through why a preStop informs the LB but terminationDrainDuration does not?

When Envoy gets a SIGTERM it does not stop serving traffic. It immediately marks itself (or really, k8s does it for us) as NotReady, which should stop the LB from sending it traffic. It also starts telling the LB to go away directly, with connection: close headers and GOAWAY messages.

I would expect preStop to be worse because it doesn't send connection: close or GOAWAY.

@ryanmac8

ryanmac8 commented May 1, 2024

@howardjohn terminationDrainDuration and SIGTERM cause new connections to close. This means that connections are now being denied. We are trying to avoid denying connections because that causes downtime. A preStop hook can resolve this because when the preStop is executed, we are telling the container to just wait before the SIGTERM is delivered. We need the additional time so that a LB can notice the health checks failing and mark the node as unhealthy. New connections are then routed away from the node that's terminating, and we don't experience any connections getting a connection-close message. It's all about ensuring uptime and making sure new connections are routed properly.
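
As a sketch of that sequence (the durations are illustrative assumptions; the one firm rule is that in Kubernetes the preStop hook runs before SIGTERM is sent, and terminationGracePeriodSeconds covers both the hook and the post-SIGTERM shutdown, so it must exceed the sleep):

# Sketch: LB health checks fail and the target deregisters during the sleep,
# then Envoy still gets time to drain after SIGTERM.
lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "sleep 30"]
terminationGracePeriodSeconds: 45   # 30s preStop + 15s post-SIGTERM drain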

@howardjohn
Member

terminationDrainDuration and SIGTERM causes new connections to close. This means that connections now are being denied.

Can you provide more details? That is neither how it was designed nor how it works in our testing. If it is, it is a bug.
