Lost requests when doing a rolling update #43576

Closed
foxylion opened this issue Mar 23, 2017 · 13 comments

@foxylion

commented Mar 23, 2017

Kubernetes version (use kubectl version): 1.5.2

Environment:

  • Cloud provider or hardware configuration: AWS
  • OS: Debian GNU/Linux 8 (jessie)
  • Kernel: Linux ip-10-234-53-65 4.4.26-k8s #1 SMP Fri Oct 21 05:21:13 UTC 2016 x86_64 GNU/Linux
  • Install tools: kops
  • Others: -

What happened:
When I update a deployment behind a service (type NodePort), requests are lost for a short time.

What you expected to happen:
No requests should be lost.

How to reproduce it (as minimally and precisely as possible):

  1. Apply the following Service and Deployment with kubectl apply:
kind: Service
apiVersion: v1
metadata:
  name: zero-downtime-test
spec:
  type: NodePort
  ports:
  - port: 80
    targetPort: 80
  selector:
    app: zero-downtime-test
---
kind: Deployment
apiVersion: extensions/v1beta1
metadata:
  name: zero-downtime-test
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: zero-downtime-test
    spec:
      containers:
      - name: backend
        image: nginx:1.11
        livenessProbe:
          httpGet:
            path: /
            port: 80
            scheme: HTTP
        readinessProbe:
          httpGet:
            path: /
            port: 80
            scheme: HTTP
        ports:
        - containerPort: 80
          protocol: TCP
  2. Constantly request the node port and check that the response has a valid 200 status code (e.g. with JMeter at ~60 req/s).
  3. Modify the deployment so that a rolling update is triggered.
  4. For a few milliseconds, requests won't be answered.

Anything else we need to know:
When I add the following to my deployment, it works fine.

    lifecycle:
      preStop:
        exec:
          command: ["sleep", "1"]

Therefore I think this might be a timing issue where the pod gets the termination signal before it is removed from the service load balancing.
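
For reference, a minimal sketch of where this hook sits in the container spec of the Deployment above (the sleep duration is only an example; the default terminationGracePeriodSeconds of 30 seconds easily covers it):

containers:
- name: backend
  image: nginx:1.11
  lifecycle:
    preStop:
      exec:
        # Delay SIGTERM so the endpoint removal has time to propagate to kube-proxy.
        command: ["sleep", "1"]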

@edouardKaiser


commented Apr 26, 2017

There is no orchestration between sending SIGTERM and removing a pod from a service.
You need to implement a proper health check/readiness probe and handle SIGTERM by starting to report unhealthy. Meanwhile, with a sufficiently large terminationGracePeriodSeconds, the app can finish processing its in-flight requests.
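
As a rough sketch (the numbers are only illustrative), the pod spec could combine a readiness probe that the app starts failing once it has received SIGTERM with a generous grace period:

spec:
  # Give in-flight requests time to finish after SIGTERM.
  terminationGracePeriodSeconds: 60
  containers:
  - name: backend
    image: nginx:1.11
    readinessProbe:
      httpGet:
        path: /        # the app should start failing this after receiving SIGTERM
        port: 80
      periodSeconds: 5
      failureThreshold: 1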

@foxylion

Author

commented Apr 27, 2017

@edouardKaiser Is the pod removed from the service directly after SIGTERM is sent to it (when it switches to the terminating state), or only after the pod is marked as unhealthy during shutdown?

In the former case, wouldn't it be better to first remove the endpoint (pod) from the service and then terminate it? With a normal load balancer this is the default workflow I would use: (1) remove the instance from the load balancer; (2) terminate it gracefully.

I can see that the readiness probe is some kind of marker that the pod should be removed from the service. But this adds an implementation overhead to each container. Many web servers support a graceful shutdown, but they won't accept any new connection after a shutdown has been initiated. If Kubernetes deferred the SIGTERM signal until after the pod is removed from the service, this would simplify the termination sequence tremendously.

At the moment I'm using a preStop hook with a few seconds of sleep. This gives Kubernetes enough time to remove the pod from the service (so no requests are lost), but this seems more like a hack than a solution.

@edouardKaiser


commented Apr 27, 2017

I agree @foxylion, that's how I thought it would work too. But it doesn't, so we adapted our applications to handle SIGTERM and do a graceful shutdown.

@sean11


commented Apr 27, 2017

I've implemented a similar solution to @foxylion with a preStop hook that will cause the readinessProbe to fail and take the pod out of service before the SIGTERM is sent so that I don't have to build SIGTERM handling into all of my applications. I do feel like this should be a bit more straightforward/built-in though. Any time I have to add a 'sleep 30' into a script I question whether or not there's a better solution.

@foxylion

Author

commented Apr 27, 2017

@sean11 For me it is still unclear whether it is enough to just add a ["sleep", "5"] preStop hook, or whether we have to explicitly fail the readiness probe to get the pod removed from the service.

@sean11


commented Apr 27, 2017

Just adding a preStop hook that sleeps will only delay the SIGTERM from being sent to the pod. According to the latest kube Pod docs, after the preStop hook completes, a pod is marked as "Terminating" at the same time the SIGTERM is sent, meaning the pod will still be in service during the preStop hook unless the readinessProbe returns a failure.

My preStop hook removes a readyCheck file and then sleeps (2 × the readinessProbe's "periodSeconds") so that the probe has time to fail and take the pod out of service before the SIGTERM is sent. You also need to make sure that your terminationGracePeriodSeconds is greater than the time you're sleeping, otherwise your pod will be forcibly killed with SIGKILL.
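
For illustration, a sketch of that setup with a made-up /tmp/ready marker file; the timings are examples, with the preStop sleep at 2 × periodSeconds and a grace period that exceeds it:

spec:
  terminationGracePeriodSeconds: 30       # must be greater than the preStop sleep
  containers:
  - name: backend
    image: nginx:1.11
    readinessProbe:
      exec:
        command: ["cat", "/tmp/ready"]    # fails as soon as the marker file is gone
      periodSeconds: 5
    lifecycle:
      postStart:
        exec:
          command: ["touch", "/tmp/ready"]
      preStop:
        exec:
          command: ["sh", "-c", "rm /tmp/ready && sleep 10"]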

@erikgrinaker


commented Apr 27, 2017

Actually, the last time I tested this I found the docs to be incorrect. The preStop hook is called roughly at the same time that the endpoint is removed in the API, but there will be some lag before kube-proxy updates the iptables rules. sleep 3 is generally sufficient for the endpoint to be removed from the service, regardless of the readiness probe, although this will not be the case if a kube-proxy instance is unable to communicate with the master for whatever reason.

The app should still handle SIGTERM properly though, as it may be processing long-lived requests that take more than 3 seconds (or however long you choose to sleep), and these should be allowed to complete before shutting down. Many app servers will handle this for you automatically. Also note that persistent connections (which are the default in HTTP/1.1) mean that requests can still arrive on an active connection after the endpoint has been removed. The app must gracefully close these connections on SIGTERM as well, or alternatively disable persistent connections completely by using the Connection: close header (which is what we ended up doing).
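
For the nginx image from the reproduction above, one way to get that behaviour is to disable keep-alive in the server config (nginx then answers every request with Connection: close). A sketch, with a made-up ConfigMap name, that would be mounted over /etc/nginx/conf.d in the Deployment:

kind: ConfigMap
apiVersion: v1
metadata:
  name: zero-downtime-test-nginx
data:
  default.conf: |
    server {
      listen 80;
      keepalive_timeout 0;
      location / {
        root /usr/share/nginx/html;
        index index.html;
      }
    }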

Using a service mesh, such as Linkerd, will handle most of the connection-related issues for you automatically.

@foxylion

Author

commented Apr 27, 2017

@erikgrinaker I had the same results when I tested this, so the docs should be fixed. And as you said, graceful shutdown is already handled pretty well by most web servers, so the sleep really seems to be enough.

But it still bugs me that a preStop hook is required to prevent an immediate graceful shutdown of the pod (i.e. stop accepting new connections, but process existing ones). I think most users would expect the pod to be removed from the service before its termination is initiated.

@timoreimann

Contributor

commented Apr 27, 2017

This subject has also been discussed in kubernetes-retired/contrib#1140.

@foxylion

Author

commented Apr 27, 2017

So in summary one could say "use sleep, it works" (based on this thread).
Is there a response from a maintainer anywhere on why pods are terminated before services have a chance to stop routing requests to that specific pod?


Okay, some documentation says that "(7) Pod is removed from endpoints list for service" is executed at the same time as "(3) Pod shows up as Terminating" and "(4) kubelet begins the pod shutdown process".

If (7) were executed before (4), and (4) waited until (7) completed, there wouldn't be any issues for most containers that already handle graceful shutdowns (nginx, Apache, Tomcat, Node, ...).

@timoreimann

Contributor

commented Apr 27, 2017

@foxylion Tim Hockin provides some insights in this comment. In a nutshell, it's because Kubernetes is a distributed system, and any extended coordination in such a system increases the complexity by a significant degree. The alternative that Kubernetes chose is to implement graceful termination in terms of loosely coupled reconciliation loops that just react to observable state transitions. As you cannot tell for sure when the overall state has converged, the affected application just has to wait long enough until it's reasonable to assume so.

IMHO, it should be possible to implement request draining while maintaining the asynchronous approach. My idea was to file a dedicated feature request in order to drive a discussion; unfortunately, I haven't gotten around to doing that yet.

@foxylion

Author

commented Apr 28, 2017

@timoreimann Okay, this makes sense, thanks for your response!
I've closed this, as it seems to work as designed.

It's still hard to get this right in the sense of simplicity; many people seem to build complicated solutions instead of simply waiting a specific amount of time.

@AnkitVaidya01


commented Dec 4, 2017

@foxylion can you please provide a deployment YAML with your input? I am trying to deploy but I'm facing issues.
