Haproxy: drain connections on target service upgrades #2777
+1
+1
Yes, have been debugging an issue related to this all day! :)
@alena1108 is there any way to receive an event when HAProxy has reloaded and is ready for new connections (that won't be dropped)? Until this issue is resolved it would at least be nice to know when I can make new requests safely.
Alternative approach from Unbounce - http://inside.unbounce.com/product-dev/haproxy-reloads/
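For context, the approach in that post briefly blocks new TCP connection attempts at the firewall so clients retransmit their SYNs while HAProxy restarts, instead of getting reset. A rough sketch of the idea (the port, sleep duration, and reload command are assumptions, not copied from the post):

```bash
# Temporarily drop incoming SYNs on the LB port; clients will retransmit
# (typically within about a second) rather than get a connection reset.
iptables -I INPUT -p tcp --dport 80 --syn -j DROP
sleep 1

# Reload HAProxy while no new connections can race the socket rebinding.
service haproxy reload

# Accept new connections again; the retransmitted SYNs now go through.
iptables -D INPUT -p tcp --dport 80 --syn -j DROP
```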
+1
+1
With GA coming up very soon (aiming for end of March), we are trying to fit in as much as possible, but with other customer priorities, we can only try our best for this feature.
This is pretty essential IMO. Is there any ETA on when it might be implemented?
+1
+1 this is essential
+1 This is a blocker for a project we're working on, so we'll probably need to use Nginx instead of Rancher's native LB with HAProxy. But I'm all about using built-in solutions. @alena1108, one question: while "true zero downtime" is not yet implemented, is it possible to use Nginx as a load balancer with Rancher's native Service Discovery? If the answer is no, I understand that we'll need something like Nginx + Consul + Consul Templates. Thanks.
@fewbits we are planning to re-work our load balancer to support pluggable providers. It will have a controller->provider model where the controller reads info from Rancher metadata, generates the LB config, and passes it to the provider to apply. It would require moving all haproxy-specific code from Rancher into its own microservice. So if you need to use another provider instead of haproxy, all you will have to do is write a provider implementation. I will update this ticket once the project skeleton is uploaded to GitHub.
After reading both the Yelp and Unbounce posts, nginx as a load balancer (with
It would be great to get this working with HAProxy, since it has been designed for this kind of scenario. My vote would be to go the route without having a 1-second latency on those unfortunate requests, if there is an alternative.
FYI the refactor mentioned above is #2179
+1 would love to see this; not sure we'll be able to use it in production without it, we can't afford to have lots of HTTP requests fail during an upgrade 😢
A correction to my initial comment:
this is not quite correct. Per http://www.haproxy.org/download/1.2/doc/haproxy-en.txt:
The -sf option drains the existing connections: "Otherwise, it will either ask them to finish (-sf) their work then softly exit, or immediately terminate (-st), breaking existing sessions." But there is a chance that all the new connections might get blocked while the haproxy config is getting reloaded.

Existing connection drops can be caused by the backend server that the LB picked to forward the request to going down in the middle of the request. So when a backend service is stopped via Rancher, and the service acts as a target service in the LB(s), there is a chance that the existing connections to it can be dropped. Ideally we should drain the connections to it from all the LBs, and only then execute the stop. We'll have to think about what would be the best way of implementing it.
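For reference, the soft reload described above looks roughly like this (config and pid file paths are illustrative): the new process takes over the listening sockets while the old one is asked to finish its in-flight sessions and then exit.

```bash
# Start a new HAProxy and ask the old process(es) listed in the pid file
# to finish their existing sessions before exiting (-sf = soft finish).
haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid \
        -sf $(cat /var/run/haproxy.pid)
```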
@miguelpeixe does this describe the issue you are seeing when upgrading load balanced services with the "start before stopping" option? #9287
One would argue that this issue is not a feature/enhancement, as the service upgrade process works correctly in old Rancher + Cattle setups. After creating stacks with the most recent versions I've started seeing this behavior: bad responses during a "zero downtime" service upgrade with "start before stopping" checked.
@janeczku yes, but I've seen your issue being reported before; the Rancher team keeps closing it and pointing to it as related to the haproxy connection drain issue, which I think doesn't make much sense, even though I don't really understand this connection drain problem. So I still might be wrong about this...
+1 (I'm wondering because in earlier versions there wasn't such a long downtime with 503s from the LB.)
Using
+1 I was relying on this being a working feature for our no-downtime deployments
I see this issue is old and often rescheduled/paused/resumed, but it is quite an important one.
Hi @deniseschannon. |
@stavarengo resolved doesn't mean it's released yet. It is in testing now. When it is released, it will be in the release notes for Rancher with sufficient instructions on how to use it.
Thanks for the clarification @cjellick |
@miguelpeixe @janeczku I think the issue reported here (#9287) should be resolved now that the fix #8684 has been released. Can you please check whether it is?
This feature is available in v1.6.11-rc6.

The Drain Timeout parameter is set on services that are backends for LB services. When such a backend target server, picked by a LoadBalancer to forward a request to, goes down in the middle of the request being served (due to the service being upgraded), the service will be put in a Drain state so that it is able to serve the request that is currently in progress before it is stopped.

Basic use case that would previously have resulted in dropped requests: create a service S1 with scale 1 and drainTimeoutMs set to 10000ms.

Full documentation for this feature - rancher/rancher.github.io#920

Some of the bugs that were found and validated during the development of this feature:
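As a purely illustrative sketch of setting that per-service drain timeout via the Rancher API: the drainTimeoutMs field name is taken from the comment above, but the endpoint path and IDs below are hypothetical, and the linked documentation is the authoritative reference for how to configure it.

```bash
# Hypothetical sketch: set a 10s drain timeout on a backend service.
# <PROJECT_ID> and <SERVICE_ID> are placeholders for the real resource IDs.
curl -u "$RANCHER_ACCESS_KEY:$RANCHER_SECRET_KEY" \
     -X PUT \
     -H 'Content-Type: application/json' \
     -d '{"drainTimeoutMs": 10000}' \
     "$RANCHER_URL/v2-beta/projects/<PROJECT_ID>/services/<SERVICE_ID>"
```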
@sangeethah is this feature also working for Kubernetes Ingress? I keep getting 504 Gateway Time-out when updating a deployment image. Any thoughts?
@robikovacs no, this feature is not supported for Kubernetes Ingress. |
Today when we reload the haproxy config, we only ensure that new connections will never get dropped (https://github.com/rancher/cattle/blob/0c4066f9fd2652f99d29989c2a29065f0378c20e/resources/content/config-content/configscripts/common/scripts.sh#L176), but we do terminate all existing connections. We have to "drain" all existing connections first before reloading the haproxy config. Here are several ways of implementing it:
@ibuildthecloud ^^
TODO for @leodotcloud: #9561
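As one illustration of what draining a target before stopping it could look like at the HAProxy level (this assumes HAProxy 1.6+ with an admin stats socket enabled; the backend and server names are placeholders, and this is not necessarily how Rancher implements it):

```bash
# Stop dispatching new connections to the target server while letting its
# in-flight sessions finish; stop/upgrade the target only after they drain.
echo "set server my_backend/target1 state drain" | \
    socat stdio /var/run/haproxy.sock
```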