Rolling-apply to instances, or individual instance application #2896
Actually, it is entirely possible to target individual instances; a targeted plan outputs the expected per-instance changes. However, I'm still not sure whether Terraform could apply changes to at most X nodes of a given type in a graph in parallel.
It seems that Terraform sets this in terraform/terraform/context.go (line 98 at commit 9de673b). Is this settable somehow on the command line? @mitchellh
@phinze thanks for coming back to me. The use case we have is a rolling update of instances that host an Etcd cluster. The problem with parallelism is that it first destroys all the instances (granted, one at a time), and only then creates them again (again, one at a time). This means that there is no sufficient Etcd quorum left :( What we thought parallelism would do is apply complete (down+up) changes to resources of a given type, one at a time. In the meantime we wrote a wrapper around Terraform that reads the plan paths and issues terraform per resource with an explicit -target flag. The discussion in #1552 seems to be purely around AWS autoscaling, and it isn't clear whether it'd be applicable to groups of normal instances (in our case GCE). Could you clarify?
@mwitkow I am in the same boat. I resorted to building plumbing around our etcd cluster to ensure that quorum is always met.
@mwitkow if you mark your instances as create_before_destroy, Terraform will create each replacement instance before destroying the old one. For a similar problem (rolling upgrades to a Consul cluster) I've sadly been just creating an entirely new Terraform resource alongside the old, applying to create both of them, manually fixing up the cluster, and then removing the old ones from the Terraform config. It's clunky to have to work in multiple steps like that, so I'd love a better solution.
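For reference, the create_before_destroy lifecycle setting mentioned here looks like this in Terraform config (the resource type, name, and count are illustrative, not from the thread):

```hcl
resource "google_compute_instance" "etcd" {
  count = 3
  # ... machine_type, disk, network interface, etc. ...

  lifecycle {
    # Bring the replacement instance up before the old one is destroyed,
    # so the group never drops below its previous size during an update.
    create_before_destroy = true
  }
}
```

Note that this only reorders create/destroy per resource; it does not by itself guarantee the replacement has joined the cluster before the old member goes away.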
@pmoust thanks for the tip. It would have been viable if etcd was the only thing that needed to survive a rolling restart. We also have API servers and monitoring instances that require at least one of them to be up during an update. @apparentlymart, we'll take a look at whether that approach works for us. @phinze, is a rolling update mechanism something that is considered a feature for Terraform?
@mwitkow we'd definitely love to figure out some sort of story for rolling updates in Terraform; it's just a matter of figuring out how to model the feature in a clean way that preserves our declarative model. That part is anything but straightforward. Tagging this thread accordingly.
@mwitkow Even if it were possible to create the new instance and then delete the old one, you still need to make sure the cluster is in sync before terminating the old instance and proceeding with the next one.
Nice blog post @discordianfish! Any rolling update facility we bake into Terraform wouldn't be ASG-specific, so I'm going to merge this thread back down with #1552 and shift that issue to cover the general concept of rolling applies. 👍
Ah, just reviewed that issue once more and realized that because we'd need to interact with the instances inside the ASG, it might actually be reasonable to expect an ASG-specific feature there rather than one generic feature. So we'll use this thread to track "generic rolling updates to resources with count > 1" and leave the ASG story to that issue.
Just looked into this again, and what makes the most sense to me (although I'm not very experienced with Terraform, so let me know if this doesn't make sense) would be to create a Terraform-side success condition. Every resource could have an attribute which defines a condition, built on top of remote-exec, which needs to return true for a resource update to succeed:
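The example that followed was lost in the copy; a sketch of what such a (hypothetical, not real Terraform) attribute might look like, using etcdctl as an assumed health check:

```hcl
resource "google_compute_instance" "etcd" {
  count = 3
  # ... instance arguments ...

  # Hypothetical attribute proposed in this comment; not actual Terraform
  # syntax. The update of each instance would only count as successful once
  # this remote command exits 0, and a failure would abort the rest of the run.
  success_condition {
    remote-exec {
      inline = ["etcdctl cluster-health"]
    }
  }
}
```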
Actually, now I'm wondering if we could use …
If someone is interested, I blogged on how we do immutable infrastructure and rolling upgrades (also of stateful services) here. There's also an example of immutable infrastructure with Consul and rolling upgrades in this GitHub repository. It can obviously be improved, but we are quite happy with it. Between instance upgrades we have to do various pre and post actions and tests (for example, to correctly handle #2957), so I'm not sure how all the different needs will fit inside Terraform without breaking its declarative model.
FYI: I've tried to use the remote-exec provisioner and it doesn't work. Even when executed with --parallelism=1 it's not viable, because Terraform doesn't abort if the provisioner fails; it continues and brings down the whole cluster if something isn't working. @sgotti Have you seen my blog article? https://5pi.de/2015/04/27/cloudformation-driven-consul-in-autoscalinggroup/ CloudFormation is declarative as well, but it supports a "WaitCondition"[1]. It's pretty straightforward: you tell an ASG it should only take down one instance at a time, and it waits for a success signal from the replacement instance before it continues with replacing the next instance. Similarly, an on_success attribute could mean: wait for that shell fragment to return. If it's non-zero, abort Terraform. If it's zero, continue with whatever is next in the Terraform plan. Seems straightforward to me. @phinze: Is this something you would consider having in Terraform? Whoever ends up implementing it.
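For comparison, the CloudFormation behavior described above is configured with an UpdatePolicy on the Auto Scaling group; a minimal illustrative fragment (resource name and sizes are assumptions, not taken from the linked post):

```yaml
# CloudFormation: replace one instance at a time, keeping two in service,
# and wait up to 5 minutes for each replacement to signal success
# (e.g. via cfn-signal) before continuing with the next one.
ConsulGroup:
  Type: AWS::AutoScaling::AutoScalingGroup
  Properties:
    MinSize: 3
    MaxSize: 3
    # ... launch configuration, subnets, etc. ...
  UpdatePolicy:
    AutoScalingRollingUpdate:
      MaxBatchSize: 1
      MinInstancesInService: 2
      PauseTime: PT5M
      WaitOnResourceSignals: true
```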
@discordianfish Yeah, I read it when you wrote your comment (since I'm subscribed to this issue). Very interesting. We took another road due to different ideas and requirements. @discordianfish @phinze At the end, doesn't it come down to the same thing?
@sgotti It definitely sounds similar, if it can be made sure that Terraform actually aborts when such a hook fails.
This is a fairly open-ended issue, so I'm going to make some closing remarks and then close it. I'm supportive of Terraform having some sort of functionality to assist with rolling deploys. In many cases, rolling deploys make sense at a higher layer (such as a scheduler, autoscaling group, etc.). However, some of the examples shown here, such as targeting a specific instance, should be completely possible.
@mitchellh Not sure I understand your answer: are you saying that implementing the desired behavior is already possible, or that you prefer a more specific discussion? Specifically, I'm thinking about cloud providers like DigitalOcean which don't support something like ASGs to coordinate rolling upgrades. Without additional wrapper scripts, I don't see a way to do rolling upgrades there(*). But even if you have a scheduler or autoscaling group to do rolling deploys, you still might want TF to apply changes in a rolling way. That is, of course, only if you actually use TF to update your infrastructure, and not only have it for disaster recovery and initial setup, which it seems a lot of people are doing. *) I ended up with a wrapper script to apply changes to each instance individually, but that defeats the purpose of TF's dependency management and rollout planning, as described here already.
I'm saying that I prefer a more specific feature request/discussion if there is one. We've had the "rolling deploy" conversation a lot over the past couple of years, and the vast majority of the time the answer is: use a scheduler. However, I'm also open to partial implementations in Terraform. But it may be the case that scripting Terraform is the best approach, realistically. Terraform's goal is: current state to desired state. As long as your desired state is the next step in a rolling deploy, Terraform can already do it! (with scripts orchestrating updating the desired state to the next step)
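A minimal sketch of such a driver script, rolling through each indexed instance of a counted resource one at a time (the resource address and count are assumptions for illustration; shown in dry-run form so it only prints the commands it would run):

```shell
#!/bin/sh
# Rolling-apply wrapper sketch: apply one indexed instance at a time.
# set -e aborts the loop if any apply fails, so a broken change
# cannot take down the remaining cluster members.
set -e
RESOURCE="aws_instance.web"   # assumed resource address
COUNT=3                       # assumed instance count
DRY_RUN=1                     # set to 0 to actually invoke terraform

i=0
while [ "$i" -lt "$COUNT" ]; do
  cmd="terraform apply -parallelism=1 -target=${RESOURCE}[${i}]"
  if [ "$DRY_RUN" -eq 1 ]; then
    echo "$cmd"
  else
    $cmd
  fi
  i=$((i + 1))
done
```

A real version would also insert a health check (e.g. the etcd quorum checks discussed above) between iterations before moving on to the next instance.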
@mitchellh What about rolling updates to the schedulers themselves, like Nomad servers? I think the examples of Consul or etcd (mentioned above) are also pretty compelling. If I run Nomad backed by Consul, what scheduler should I run it on? It feels like my stack is high enough already 🤕 Quick idea: maybe …
In my case I am trying to achieve a rolling upgrade of the hosts when a new template has to be used to create the VMs. The problem is exactly what @mwitkow commented on Oct 25, 2015. Is there any update on this?
Just to provide some context: this is one of the reasons why I still much prefer the cloud provider's native declarative infrastructure tooling (e.g. CloudFormation). I know it's not trivial to get right, but for me, lacking control over the instance lifecycle during updates is the reason why I've only ever used TF partially.
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
I've been experimenting with applying changes to individual instances in Terraform, à la:
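The snippet was lost in the copy, but per-instance targeting presumably looked roughly like this (the resource address and index are assumptions for illustration):

```shell
# Plan and apply only one indexed instance of a counted resource.
terraform plan  -target='aws_instance.web[0]'
terraform apply -target='aws_instance.web[0]'
```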
The goal would be to implement a rolling update/reprovision for certain types of changes (e.g. changing machine size).
Is that something you guys are planning on, or possibly would accept patches for?