[Proposal] Improved graceful shutdown (zero downtime) #18914
Comments
As discussed in advance, FYI: @sreber84, @eberlec, @saturnism, @smarterclayton, @knobunc. |
Just so I understand, using
won't solve your issue? I agree that the `wait 10s` in preStop is ugly, but your wait script can control how long before termination you have to exit, even if your process doesn't handle it itself. E.g., if you have a 60s timeout on requests, you should be able to set the termination grace period to 120s, set preStop to wait 60s + 10s of buffer, then exit. Then you'll get 50s for graceful shutdown before SIGKILL is sent. |
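For reference, the timing described above could be sketched as a pod spec along these lines (the image name and exact durations are placeholders, not values from this thread):

```yaml
# Sketch: grace period 120s, preStop sleep 70s (60s request timeout + 10s buffer),
# leaving ~50s after SIGTERM for the app to drain before SIGKILL.
apiVersion: v1
kind: Pod
metadata:
  name: graceful-example
spec:
  terminationGracePeriodSeconds: 120
  containers:
  - name: app
    image: example/app:latest   # placeholder image
    lifecycle:
      preStop:
        exec:
          command: ["sh", "-c", "sleep 70"]
```

Note that the grace period clock covers the preStop hook as well, which is why 120s minus the 70s sleep leaves roughly 50s after SIGTERM.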
This should work, yes, but this way all the apps/devs would have to add it themselves. At least the first part (stop sending new traffic to an app before sending the SIGTERM) seems like a good idea at the platform level. In a "classic" environment one would remove an app from a load balancer before even thinking about stopping it. I think it would help a lot of people/apps if this were changed globally; I heard that quite a lot of people struggle with the same problem. I agree on the second part (finishing active requests before quitting). This is the app's responsibility and can also be done in the way you described. |
@openshift/sig-pod @openshift/sig-networking |
This would involve upstream as we are talking about core kube components here. I agree this is an issue and I do hear about it in the community. In fact, we talked about this proposal in the sig-node meeting, which handles this problem on the pod bring-up side vs the tear-down side: https://docs.google.com/document/d/1VFZbc_IqPf_Msd-jul7LKTmGjvQ5qRldYOFV0lGqxf8/edit I do think that an additional pod state would be difficult to get accepted upstream. It seems that this could be accomplished if the Pod was removed from the Endpoints when the deletionTimestamp was set, indicating the Pod is terminating. Then the pod could set whatever terminationGracePeriodSeconds is required as a timeout for draining connections. If the drain completes early, the process can simply exit. I'm unsure whether Pods are removed from Endpoints once the deletionTimestamp is set or only when the Pod is actually deleted. Maybe I'm not grasping the nuance. |
A deleted pod is considered not ready. There are *very* old issues for this that are similar: kubernetes/kubernetes#13364 and kubernetes/kubernetes#20473.
|
Agree this is something that needs to get some real attention.
|
Thanks for the feedback. As far as I am concerned, a new pod state is not mandatory. Anything that helps to improve the situation is welcome :) I agree that this has to be fixed in Kubernetes first, and then OpenShift just needs to add the HAProxy part. |
The current behaviour seems to contradict the OpenShift documentation. On https://docs.openshift.com/container-platform/3.7/dev_guide/deployments/advanced_deployment_strategies.html we can read:
However, OpenShift will continue to send requests to the pod for some seconds after sending it SIGTERM, and these requests will fail if we stop accepting connections. |
The current behaviour is quite weird and unexpected: |
The problem with the recommended process ("application code [...] should stop accepting new connections") is that in Java environments this is part of the application server. I fully agree this should be handled by OpenShift (Kubernetes): before pods are evicted, they should be removed from services and routes (load balancers). A couple of other project teams at my customer's company (which operates a large OpenShift cluster) are struggling with the same problem. |
@jmencak this seems familiar :) |
The problem is that there is no tight coupling between the pieces of the system. The router only learns that the backing pods are gone when the endpoint updates. BUT the router can't immediately reload (reloads are rate limited, and even when it can immediately reload, a reload can take a few seconds to a minute depending on the number of routes and the speed of the box). With haproxy 1.8 we can make some dynamic changes to the running router, so we don't need to do a reload for a lot of changes so the responsiveness will be greatly improved. But for now, you need to make sure that you have some delay between when termination is started and when the pod exits. You can either add a SIGTERM handler to the process, or have a PreStop hook registered that sleeps for a little while. |
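As a rough illustration of the SIGTERM-handler option mentioned above, here is a minimal sketch in Java (this is not the actual library code; the `inFlight` counter and the 60s drain window are assumptions for illustration):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class GracefulShutdown {
    // Hypothetical counter of in-flight requests; a real server would
    // maintain this itself (e.g. in a servlet filter).
    static final AtomicInteger inFlight = new AtomicInteger(0);

    // Wait until all in-flight requests finish or the deadline passes.
    // Returns true if draining completed before the timeout.
    static boolean drain(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (inFlight.get() > 0) {
            if (System.currentTimeMillis() >= deadline) return false;
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The JVM runs shutdown hooks when it receives SIGTERM,
        // so connection draining belongs here.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            drain(TimeUnit.SECONDS.toMillis(60));
        }));
        System.out.println("server started; drain runs in the shutdown hook");
    }
}
```

The drain window should be shorter than `terminationGracePeriodSeconds`, so the process exits on its own before SIGKILL arrives.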
Problem / Motivation
We operate more than 3500 containers on a large OpenShift cluster. A lot of applications have the same problem with the current termination process. To achieve zero downtime during rolling updates, pod restarts, and evacuation of nodes due to maintenance, an application has to do the following things:
At @sbb we implemented this behaviour for Spring Boot 1 & 2 with this extension library:
https://github.com/SchweizerischeBundesbahnen/springboot-graceful-shutdown
But this solution only works for Java apps. All other languages/web servers have to implement the same thing again and again. We talked to a lot of people/companies that use OpenShift/Kubernetes, and all of them struggle with this issue. Thus, I would like to propose a solution where the container platform handles termination a bit differently.
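For context, the pattern such libraries implement is usually paired with a readiness probe: the app starts failing the probe when it receives SIGTERM, so the endpoints controller removes the pod from the Service before it stops serving. A hedged sketch (path, port, and timings are illustrative, not the library's actual defaults):

```yaml
# Illustrative readiness probe; once the app fails this probe during shutdown,
# the pod is removed from the Service endpoints and the router eventually
# stops sending it new traffic.
readinessProbe:
  httpGet:
    path: /health/readiness   # hypothetical path
    port: 8080
  periodSeconds: 5
  failureThreshold: 1
```

The remaining gap is the propagation delay between the probe failing and the router reload, which is exactly what this proposal aims to close at the platform level.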
Proposal
Introduce a new pod life cycle state, something like "TerminationPreparation"; in this state
This would massively improve the availability of our applications during any form of container termination. Developers would no longer need to take care of that manually.