Rebalancing service containers across hosts #2558
Comments
I suppose disabling the host, then some combination of scale -1 then +1, would work? And maybe when you disable a host, Rancher will automatically start rebalancing? For stateful containers I am using a combination of Convoy + EBS, but not in volume-driver mode :D So in this case I have to add unmount/mount and detach/attach steps.
So far I've done it by deleting and recreating the services from compose files. This is only possible because they're stateless. When disabling a host, Rancher already seems to recreate its containers on other hosts.
From what I'm reading, you want to be able to click on something to rebalance a service, allowing it to distribute containers across the additional hosts that you've added to your environment. This would be for a service with a specific scale, as opposed to a global service, since a global service would obviously start more containers as more hosts were added that matched its scheduling rules.
Correct.
+1, this is a requirement for us as well. @will-chan, as discussed via email, I found this ticket which I think is strictly related. We must be able to guarantee a Stack is running with at least N containers across at least two hosts. This is a basic requirement for HA. You don't want your entire Stack (or a large part of it) running on a single host: that host is a single point of failure, and even though Rancher will keep the scale, this can be undesirable for containers that take a long time to start.

Rebalancing is not possible with the current Rancher scheduling rules, unless you manually intervene by deleting hosts, as @gordontyler already mentioned. The current simple scheduling algorithm, placing containers on the host with the fewest containers, has an undesired effect: it only works the first time you deploy, onto hosts with equal numbers of containers. In the longer term, as hosts fail or get replaced, you end up with very undesirable stack distributions, e.g. an entire stack (or a large part of it) on a single host.

Imagine the following (I simplify here). You start from scratch with Rancher, create a new environment "production", and put 3 new hosts there (A, B, C). Then you deploy your first Stack A, which has one service (add a health check) with a scale of 6 (no affinity rule, just to keep the example simple and let it distribute across all hosts). You will get 2 containers on each host. All fine here, nice redundancy across hosts.

Now simulate a disruption of service, e.g. take down host A. Rancher will detect the 2 containers as "unhealthy" and place them on the other hosts, one on B and one on C, so they will have 3 containers each. Now bring host A back online. What do you see? Host A is free; the initial containers are not rescheduled there. So the other hosts get packed with more containers, but the cluster/environment as a whole does not rebalance!
So as time goes on, more hosts will be empty/free and fewer hosts will be packed more and more, unless you manually intervene now and then, which is what we do not want to do; we want Rancher to take care of that for us in a graceful way. Now, if you launch a new Stack B with, let us say, one service with a scale of 2, both containers will be launched on host A. So now Stack B is only on one node, even though there are 3 nodes available. We (the operations team) could establish a policy that every time we deploy a new Stack we add at least two empty hosts, but that would be weird and would over-allocate hosts over time.

I have tried many different things, using hard and soft anti-affinity rules, but it still does not guarantee that a stack is running on at least two hosts, nor does it prevent the undesirable long-term effect stated above. So basically, every time we have maintenance on a host, we are forced to put a new host in place and destroy the old one, so that the containers are placed on the new host directly. But this does not protect us from unexpected host/network failures: Rancher will not rebalance, and you end up with empty hosts where new stacks will possibly be deployed entirely.

We expect Rancher to rebalance and to make a multi-host setup, at least for ephemeral services, very easy to achieve for the GA release; Rancher is meant for production deployments. The quick and dirty solution we have now is to either duplicate identical services within a Stack or duplicate the same Stacks.
For example, imagine I have a stack "mystack":

docker-compose.yml:

```yaml
myapp1:
  image: myorg/myapp:mytag
  labels:
    io.rancher.scheduler.affinity:container_label_soft_ne: io.rancher.stack_service.name=mystack/myapp1
    io.rancher.scheduler.affinity:container_label_ne: io.rancher.stack_service.name=mystack/myapp2
myapp2:
  image: myorg/myapp:mytag
  labels:
    io.rancher.scheduler.affinity:container_label_soft_ne: io.rancher.stack_service.name=mystack/myapp2
    io.rancher.scheduler.affinity:container_label_ne: io.rancher.stack_service.name=mystack/myapp1
```

rancher-compose.yml:

```yaml
myapp1:
  scale: 1
  health_check:
    port: 3000
    interval: 2000
    unhealthy_threshold: 3
    response_timeout: 2000
    healthy_threshold: 2
myapp2:
  scale: 1
  health_check:
    port: 3000
    interval: 2000
    unhealthy_threshold: 3
    response_timeout: 2000
    healthy_threshold: 2
```

The above, I hope, will ensure that at least 2 containers of the same application (I use the same image for myapp1 and myapp2) are not running on the same host. Being separate services, I can scale them independently, and having soft anti-affinity will still allow them to land on the same host if the others are too packed. It is a kind of dirty trick, as I don't like duplicating identical services, but we will experiment with the above. It can also be done by deploying two identical Stacks. Maybe Rancher should add scale at the Stack level as well?
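The long-term imbalance described above can be reproduced with a toy model of a "place on the host with the fewest containers" scheduler. This is a deliberate simplification, not Rancher's actual scheduler code; host names and the scenario match the A/B/C example earlier in the comment:

```python
# Toy model of a least-loaded scheduler: each new container goes to the
# host currently running the fewest containers. This is an illustrative
# simplification, not Rancher's real scheduling logic.

def schedule(hosts, count):
    """Place `count` containers, one at a time, on the least-loaded host."""
    for _ in range(count):
        target = min(hosts, key=lambda h: len(hosts[h]))
        hosts[target].append("container")

hosts = {"A": [], "B": [], "C": []}
schedule(hosts, 6)                      # initial deploy: scale 6 -> 2 per host
assert [len(v) for v in hosts.values()] == [2, 2, 2]

# Host A fails: its 2 containers are rescheduled onto B and C.
lost = hosts.pop("A")
schedule(hosts, len(lost))              # B and C now hold 3 containers each

# Host A comes back online -- but nothing is rebalanced onto it.
hosts["A"] = []
print({h: len(v) for h, v in sorted(hosts.items())})
# -> {'A': 0, 'B': 3, 'C': 3}
# A stays empty until something new is deployed, so the next stack of
# scale 2 lands entirely on A -- exactly the single point of failure
# described above.
```

Running the model shows the complaint precisely: the scale is preserved, but the distribution never recovers without manual intervention.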
+1, I generally agree with most of this ;) One thing which isn't really acceptable is the delete-host approach, at least not for me. Let's say a degraded number of hosts has led our stack to concentrate X services on one or more hosts. If I then get the number of hosts back up, the last thing I'd want to do is delete a host in order to have containers balance out. If this was a simple/small 2-host deployment it would be even more critical, as deleting the only host left = no service at all. It's a case of "2 wrongs to make a right" :p

I'm sure in the long run, one of the more involved strategies is the way to go. Implementing moving containers would, IMO, be the way out. I'm not sure if it's 100% production-ready, but there are some demos of Docker containers being moved almost seamlessly. For stateful containers I would assume you need to be using a "cluster-wide" FS such as Convoy with Gluster or NFS? Otherwise you will always have an issue: the container may be stateful, but can you always count 100% on the host? If the host is a SPOF, then you won't be delivering HA.
@RVN-BR exactly, we want Rancher to automatically rebalance services and make sure a Stack is on at least two hosts. This could be done by allowing one to say in docker-compose, via specific Rancher labels, "run this service with a total scale of X with at least Y instances on each host that meets the scheduling rules". That would solve my HA requirements.
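The guarantee being asked for here can be made precise as a simple placement predicate. This is a hypothetical sketch of the requested constraint, not any existing Rancher label or API:

```python
# Hypothetical spread constraint: a service's containers must cover at
# least `min_hosts` distinct hosts. Nothing here is a Rancher API; it
# only makes the HA guarantee requested in the thread precise.

def meets_spread(placements, min_hosts):
    """placements: list of host names, one entry per running container."""
    return len(set(placements)) >= min_hosts

# Scale 6 packed onto a single host violates the HA requirement...
assert not meets_spread(["A"] * 6, min_hosts=2)
# ...while the same scale split across two hosts satisfies it.
assert meets_spread(["A", "A", "A", "B", "B", "B"], min_hosts=2)
```

A scheduler honoring this would have to reject (or actively repair) any placement where the predicate is false, which is the rebalancing behaviour the thread is asking Rancher to perform automatically.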
+1
+1

+1

+1
+1

+1
Please stop using
We are in the testing phase of Rancher, coming from DockerCloud. All our applications' containers are currently deployed in HA.
We recently upgraded to 1.3.3 and I just realized that this can be achieved with the option "Always run one instance of this container on every host" when adding a service. |
Looks like #7253 could solve that problem by
You can get some way towards this by:
Unfortunately, if there is a new (empty) host in the cluster and your service is already running across all hosts, you might end up with all your containers created on the empty host. A simple solution would be that any container which is stopped (or to-be-stopped) during an upgrade cycle is not counted towards the anti-affinity rule. This might be the case already; I have not checked. Another option is for the anti-affinity rule to be weighted by the number of matching instances; that is, it would prefer the host with the fewest running instances, rather than only matching a host with zero instances.

The problem with using an anti-affinity rule like this is that it will try to force the service to run on as many hosts as possible, taking priority over resource concerns, until one instance is running on every node. In practice, however, the user probably only wants to be sure that it's running on 2 or 3 nodes for redundancy.

A better approach could be to weight the constraint inversely proportionally to the number of nodes where the service is already running. For example: if the service is only running on one node, then aggressively choose a different node. If it is running on two nodes, then prefer to run it on a third. If it is running on three nodes, then weakly prefer it to run on a fourth; at this point, balancing of other resources is probably more important, since you already have good redundancy.

I would argue that this sort of anti-affinity should be part of the default scheduler behaviour, since it is probably what people expect: if they request more than one instance of the same service, it's likely for redundancy purposes, not just for spreading load over multiple cores. Also, I would like to see the default scheduler behaviour explicitly documented. In particular, does it take into account any of the following, and if so, how?
[^1] Aside: clicking Edit on a running service doesn't give you this option, but once you select 'Upgrade' you can modify labels and scheduling rules. Rather stupidly, I used the UI to paste in a label
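The inverse-weighting idea described above can be sketched as a host-scoring rule. This is an illustrative sketch only: the specific weight formula and the way spread is traded off against load are assumptions, not Rancher's actual scheduler:

```python
# Sketch of the weighted anti-affinity proposal from the comment above:
# prefer hosts not yet running the service, with a preference that
# weakens as the service already spans more hosts. The 1/hosts_covered
# weight and the x100 trade-off factor are illustrative assumptions.

def score(host, placements, load):
    """Lower score = better candidate.
    placements: host -> instance count for this service.
    load: host -> total containers on the host."""
    hosts_covered = sum(1 for n in placements.values() if n > 0)
    spread_weight = 1.0 / max(hosts_covered, 1)   # strong when coverage is low
    anti_affinity = placements.get(host, 0) * spread_weight
    return anti_affinity * 100 + load.get(host, 0)  # spread first, then load

def pick_host(hosts, placements, load):
    return min(hosts, key=lambda h: score(h, placements, load))

# Service on one host only: aggressively choose a different node,
# even though B already carries more total containers.
assert pick_host(["A", "B"], {"A": 2}, {"A": 2, "B": 5}) == "B"

# Service already on three hosts: redundancy is adequate, so plain load
# balancing wins and the heavily loaded empty host D is NOT chosen.
placements = {"A": 1, "B": 1, "C": 1}
load = {"A": 9, "B": 9, "C": 9, "D": 50}
assert pick_host(["A", "B", "C", "D"], placements, load) == "A"
```

The two assertions capture both halves of the proposal: aggressive spreading while coverage is poor, and ordinary resource balancing once the service spans enough hosts.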
I agree with candlerb's suggestions. "...the user probably only wants to be sure that it's running on 2 or 3 nodes for redundancy." Correct, that is what we want. His approach of weighting the constraint inversely proportionally to the number of nodes where the service is already running would solve the one thing in Rancher that does not work satisfactorily for us. I also agree that this should be part of the default scheduler behaviour, since this is what we originally expected from Rancher.
Still hoping for an HA cluster strategy for Rancher!
+1
+1 if that helps, but I fear that we'll need to wait and migrate to Rancher 2.0/RKE.
This is one of those things where everybody thinks they want "feature X", but when you start talking, they all have different and incompatible ideas of what X means: I want them spread across just a few hosts. No, all the hosts. Or spread them according to the value of this label so they're in different zones. But don't reschedule them if one dies, because they have storage over here and I want to reuse it. And if a host is too full, then it's OK to colocate temporarily, but rebalance if a new host comes in. But not too many at a time or I'll lose quorum... etc.

It's (clearly) not going to change for 1.x any more after 2 years, and in 2.0 you can do whatever k8s supports.
I have a solution for this that is very much in the vein of "if it's stupid, but it works, it's not stupid".
Eventually you have to spin down the "noop" stack and your production containers still get stacked on the same box |
@arwineap it's an ongoing process until you get to a "good" state. Earlier in the week I had to evacuate a host, and the first thing that happened after that was a restart of the 3 most RAM-intensive services in our stacks, all of whose containers landed on that newly-empty host and nearly blew out the RAM.

This is by no means a perfect solution, but at least it will keep me somewhat sane while we work towards our Rancher 2/K8s migration.
Say for example that I have N hosts and a number of containers running on those hosts. Initially, the load was fine, but it has subsequently increased and I'm facing resource constraints on these hosts. I need to add more hosts to my system and reassign existing containers to these new hosts.
In the case of stateless containers, this should be fairly easy -- destroy existing containers and recreate them on the new hosts.
It's harder in the case of stateful containers, but a stop, export, remove, load, start sequence would probably work, although I'm not sure how volumes would factor into that.
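The stop/export/remove/load/start sequence for the stateful case could be scripted. The helper below is a dry-run sketch that only assembles the docker CLI steps: the container name, image tag, and host endpoints are placeholders, and as noted above, volumes are not handled, since `docker export` captures only the container's filesystem:

```python
# Dry-run sketch of moving a stateful container between hosts using the
# stop/export/remove/import/start sequence described above. It only
# builds the command strings; names and host endpoints are placeholders.
# Caveats: `docker export` does NOT include volumes, and `docker import`
# discards image metadata (CMD/ENTRYPOINT), so the final `run` would
# normally need the command specified explicitly.

def migration_plan(container, image_tag, old_host, new_host):
    tarball = f"{container}.tar"
    return [
        f"docker -H {old_host} stop {container}",
        f"docker -H {old_host} export -o {tarball} {container}",
        f"docker -H {old_host} rm {container}",
        f"docker -H {new_host} import {tarball} {image_tag}",
        f"docker -H {new_host} run -d --name {container} {image_tag}",
    ]

for step in migration_plan("db1", "migrated/db1:latest",
                           "hostA:2376", "hostB:2376"):
    print(step)
```

Printing the plan rather than executing it makes the sequence easy to review before trying it against real hosts; actually running it would also require copying the tarball between the two machines.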
It would be super awesome if Rancher could handle at least the stateless container case for me. Something like a "rebalance" action for a service or a stack, perhaps, that redistributes the containers across the available hosts.