
Hash for the new user_data in plan is incorrect and not the same one as apply #2627

Closed
killercentury opened this issue Jul 6, 2015 · 27 comments

Comments

@killercentury

When using template_file for user_data in aws_instance, the hash shown for the new user_data in plan is always the same one, and it is not the hash that apply actually sets.

(I have updated this issue to better describe the situation.)

Step1 - Run plan (there is existing resource):

user_data: "5f0f80cd55ecdabd2e01f8b75efbab033bbfdca0" => "52af9502d532f588235a22618e42a3ce3c395fd4" (forces new resource)

Step2 - Then apply:

user_data: "" => "5f0f80cd55ecdabd2e01f8b75efbab033bbfdca0"

You can see that the hash for the new user_data in plan is not actually the one that apply sets.

Step3 - When I add a new line to the user_data template file, and run the plan again:

user_data: "5f0f80cd55ecdabd2e01f8b75efbab033bbfdca0" => "52af9502d532f588235a22618e42a3ce3c395fd4" (forces new resource)

It generates the same hash for the new user_data.

Step4 - I revert my change on user_data (same as original one), and run the plan again:

user_data: "5f0f80cd55ecdabd2e01f8b75efbab033bbfdca0" => "52af9502d532f588235a22618e42a3ce3c395fd4" (forces new resource)

It has the same output as previous step. The hash for user_data doesn't change back.

So the main problem caused by this issue is that if I accidentally change my user_data and run plan, there is no way back: I have to recreate all instances even though nothing has actually changed. This is a critical bug in terms of workflow, and I cannot safely manage the infrastructure until it is fixed.
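For reference, the shape of configuration that hits this is roughly the following (a minimal sketch; resource names, template path, and AMI are placeholders rather than the actual config):

# 0.6-era template_file resource rendering the user_data
resource "template_file" "user_data" {
    filename = "user_data.tpl"
    vars = {
        role = "web"
    }
}

# instance whose user_data hash shows the plan/apply mismatch described above
resource "aws_instance" "web" {
    ami           = "ami-e0efab88"
    instance_type = "t2.micro"
    user_data     = "${template_file.user_data.rendered}"
}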

@killercentury killercentury changed the title Hash for user_data always revert back to previous one Hash for the new user_data in plan is incorrect and not the same one as apply Jul 7, 2015
@pikeas
Contributor

pikeas commented Jul 11, 2015

Just bumped into the opposite problem - I'm modifying template_file and terraform plan ignores my change.

I have an aws_launch_configuration which depends on a template_file for its user_data field. When I modify the template file (that is, the file contents, not the template_file resource definition), terraform plan does not register this change. Expected behavior is modification or recreation of the LC.

I'm running Terraform v0.6.1-dev (ab0a7d8).
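The wiring described above is roughly this shape (a sketch only; names and values are placeholders, not the exact config):

# template whose file contents are edited on disk
resource "template_file" "lc_user_data" {
    filename = "user_data.tpl"
    vars = {
        cluster = "web"
    }
}

# launch configuration that consumes the rendered template;
# editing user_data.tpl would be expected to force a change here
resource "aws_launch_configuration" "web" {
    image_id      = "ami-e0efab88"
    instance_type = "t2.micro"
    user_data     = "${template_file.lc_user_data.rendered}"
}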

@phinze
Contributor

phinze commented Jul 14, 2015

Hi @killercentury and @pikeas - thanks for the report.

Can you provide some config that reproduces this behavior? That would be a big help to us in debugging.

@pikeas
Contributor

pikeas commented Jul 14, 2015

Unfortunately, this seems to be intermittent. If and when I find a simple repro, I'll share a test case!

@mlrobinson

I have some configs that will do it:

provider "aws" {
    region = "us-east-1"
}

variable "count" {
    default = 2
}

resource "template_file" "example" {
    filename = "test.txt"
    count = "${var.count}"
    vars = {
        name = "file-${count.index+1}"
    }
}

resource "aws_instance" "servers" {
    count = "${var.count}"
    ami = "ami-e0efab88"
    instance_type = "t2.micro"
    user_data = "${element(template_file.example.*.rendered, count.index)}"
}

output "templates" {
    value = "${join(\",\",template_file.example.*.rendered)}"
}

With a template file of this:

${name}

The first time you apply with count set to 2, it builds 2 instances. If you change count to 3, it will correctly show that only 1 template file will be added, but all the user_data hashes will be wrong and cause all instances to be destroyed/recreated, when only 1 needs to be created.

I use this as a way to build slaves for different things quickly and to let me scale them slowly (while also inserting data into each to make them unique). The file here is just an example; I'm usually passing in a hostname to set, which allows each host in this set to have a unique name coming from Terraform (thanks to cloud-init).

EDIT: The AMI above is the Debian-provided Wheezy image, but any other AMI will show the same issue. Just wanted to tell people what it was without them having to hunt it down themselves.
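A quick way to see it with the config above (the -var override is just one convenient way to change the count):

terraform apply -var count=2
# grow the pool by one; the plan then shows user_data changes forcing
# replacement of all instances, not just the one being added
terraform plan -var count=3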

@mlrobinson

@phinze Any chance to look at this soon? I'm working around it for now by reworking the user_data file to not be a rendered template but a static file that does its own data lookups, and versioning that file manually, but I'd love for this to be as simple as it should be here.

@catsby catsby added the core label Oct 8, 2015
@mikelaws

+1

mlrobinson's scenario is almost identical to mine. I'm using a Terraform-templated cloud-init as user-data passed to our AWS instances. I'm also using count to allow us to manage the number of AWS instances in a resource pool. Our cloud-init config is fairly simple (hostname, provisioning user, allowed SSH key); a rough sketch of that shape follows the list below.

  1. I start with, say, 5 AWS instances and things work as expected
    • Instances 0-4 are created and the appropriate cloud-init user-data is applied
  2. Decreasing count to 4 works as expected
    • Instance 4 (and only instance 4) is deleted on subsequent terraform apply
  3. Increasing count back to 5 results in one rendered template for the new resource (template_file.cloud-init.4), seen when running terraform plan -module-depth=-1
    • No other templates appear to be rendered (seems like correct behavior)
    • A new hash is created for the one new rendered template (seems like correct behavior)
    • The new hash appears in the user_data for the new AWS instance (seems like correct behavior)
    • The new hash appears to also be compared with the user_data hash for all of the existing instances of the same resource, forcing new resources for instances 0-3 because the existing hashes don't match the value of the new hash for instance 4 (seems like incorrect behavior).
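The shape of that setup, roughly (a sketch only; variable names and values here are illustrative, not the actual config):

# one rendered cloud-init document per instance
resource "template_file" "cloud-init" {
    count    = "${var.count}"
    filename = "cloud-init.yml.tpl"
    vars = {
        hostname = "node-${count.index}"
    }
}

# pool of instances, each picking up its own rendered template
resource "aws_instance" "pool" {
    count         = "${var.count}"
    ami           = "${var.ami}"
    instance_type = "t2.micro"
    user_data     = "${element(template_file.cloud-init.*.rendered, count.index)}"
}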

@bennycornelissen

@mikelaws I have the exact same issue. As a workaround, I tried to run terraform plan using the -target option, only targeting the specific new instance, but that doesn't work. Terraform just claims there's nothing to be done.
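(For clarity, the kind of invocation meant here, with hypothetical resource names and assuming -target accepts the indexed address form:)

terraform plan -target='aws_instance.foo[4]'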

Also tried working around this using the lifecycle parameter but that caused my Terraform run to fail.

For instance:

resource "aws_instance" "foo" {
  count = "${var.count}"

  lifecycle {
    ignore_changes = ["user_data"]
  }

  # more config
  # ...
}

caused:

* aws_instance.foo.1: diffs didn't match during apply. This is a bug with Terraform and should be reported.
* aws_instance.foo.0: diffs didn't match during apply. This is a bug with Terraform and should be reported.
* aws_instance.foo.2: diffs didn't match during apply. This is a bug with Terraform and should be reported.

@Fodoj

Fodoj commented Nov 13, 2015

The reason this is happening is issue #3864.

@bennycornelissen

Meanwhile, I've worked around it by using autoscaling groups instead of instances with counts. This does introduce some new issues regarding create_before_destroy and dependency cycles when destroying, but those I can work around quite easily.

@Fodoj

Fodoj commented Nov 13, 2015

@bennycornelissen sounds like a viable option till this bug is fixed. But

  1. It's a workaround only for AWS
  2. Then your infrastructure templates don't represent your infrastructure any longer

@Fodoj

Fodoj commented Nov 16, 2015

@bennycornelissen another workaround would be to roll back PR #2788. If you revert it and recompile Terraform, then everything should be fine.

@bennycornelissen

@Fodoj what exactly did you mean by the second point? I'm not sure I quite understand. I've been testing the autoscaling groups for about a week now, and it works exactly the way I wanted. One thing that is worth noting, and that might be what you were getting at, is that whenever I update the launch configuration, it doesn't rebuild the instances already running in the ASG. In my specific use case, however, that is actually a good thing (I can manage 'rolling' upgrades myself).

But I can see how it could cause problems for other people.

@Fodoj

Fodoj commented Nov 16, 2015

@bennycornelissen well, Terraform is a tool to describe your infrastructure via templates. Your template would say "1 instance", but then autoscaling will create 9 more. And now your infrastructure template says "1" while in fact there are "10" of them. But that applies only if you are still using the aws_instance resource. If not, and you are using only the ASG resource in the template, then I am wrong :)

@bennycornelissen

I replaced the aws_instance resource with the aws_autoscaling_group and aws_launch_configuration resources.
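Roughly, the replacement looks like this (a sketch only; names, AMI, sizes, and availability zone are placeholders, not the actual config):

# the launch configuration takes over the role of aws_instance,
# including the templated user_data
resource "aws_launch_configuration" "workers" {
    image_id      = "ami-e0efab88"
    instance_type = "t2.micro"
    user_data     = "${template_file.user_data.rendered}"

    lifecycle {
        create_before_destroy = true
    }
}

# the ASG manages the instance count instead of count on aws_instance
resource "aws_autoscaling_group" "workers" {
    name                 = "workers"
    launch_configuration = "${aws_launch_configuration.workers.name}"
    min_size             = "${var.count}"
    max_size             = "${var.count}"
    desired_capacity     = "${var.count}"
    availability_zones   = ["us-east-1a"]
}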

@calvn
Member

calvn commented Dec 9, 2015

I am also encountering this issue, but I am seeing this right from the start and not after changing the count.

I do terraform apply on a clean project and it gives me this error: diffs didn't match during apply. This is a bug with Terraform and should be reported. on the aws_instance resource. I am passing in the user_data argument on it. When I run it a second time it creates the aws_instance resources, but now user_data is not using the correct generated ID from template_file.

Edit: The user_data was actually loaded onto the machine; I had cloud-init misconfigured so it wasn't running properly. However, using template_file inside a module does give me that error the first time, exactly as described in #3732.

@ryangraham

I've been fighting this for the last few hours just to add a new server. I eventually built a new terraform binary with #2788 reverted. But I still get the same new user_data hash no matter what I put in my template.

@jordiclariana

It seems that this bug has been here for a while with no fix, and my impression is that it is really important. Is there any forecast for its resolution?

@calvn
Member

calvn commented Dec 23, 2015

I am not sure whether this is a related issue or not, but I am encountering a problem where the plan says my user_data (which comes from a template_file) has changed and thus forces resource recreation.
However, the plan does not indicate that my template_file.cloud-init has been modified.

@Phomias
Contributor

Phomias commented Feb 19, 2016

FYI: Still an issue in Terraform v0.6.11

@osterman

Yea, this is annoying. The hash of the user data is not idempotent. We are not able to resize a cluster of machines by changing count.

@phinze
Contributor

phinze commented Feb 29, 2016

Hey folks, sorry for all the trouble here. The core problem with changing counts is described over in #3449 - the fix requires that we introduce first class list indexing into Terraform's config language. This is work that @jen20 has been pushing forward, and we expect it to land with 0.7, our next major release.

So for everybody on this thread reporting problems when count is changing, that will address your issue.

It sounds like a few of you are seeing behavior unrelated to count changing - in those cases it'd be great to see some steps to reproduce so we can investigate.

@jordiclariana

Thanks @phinze! Maybe not the topic here (my apologies in advance), but when do you think 0.7 will be released?

@phinze
Contributor

phinze commented Feb 29, 2016

@jordiclariana We're still in the thick of it, so no precise timeline quite yet - should get a better handle on the expected timeline in the coming week or two. We'll continue with the relatively high patch release cadence in the meantime. 👍

@JesperTerkelsen

We also ran into this 👍

@slobodyanyuk

Looks like it's not fixed in v0.7.0
Such a pity

@apparentlymart
Member

apparentlymart commented Apr 4, 2017

Hi everyone! Thanks for all the great discussion here and sorry for the lack of movement on this issue for a while.

The good news is that some core changes have been made in the intervening time that address this problem. These don't make the existing configurations work, but they provide new features of the configuration language that mitigate the root causes here.

First, some background on what's going on here: sometimes the result of an interpolation expression can't be resolved during the plan phase because its value depends on the result of an action that won't be executed until apply. In this case, Terraform puts a placeholder value in the plan, which renders in the plan output as <computed>. Unfortunately, when such a value appears in a "forces new resource" attribute, Terraform is forced to assume that the value is going to change, because it can't prove during plan that the resulting value will match the current state of the resource, and so we get the problem described here where instances get recreated unnecessarily. (The "magic value" of user_data shown in these diffs is a leaky abstraction: Terraform is hashing the placeholder value used to represent a computed attribute.)

The resource "template_file" block in the earliest example in this discussion is problematic because although we (via human intuition) know that rendering a template is safe to do during plan, Terraform just assumes that all resources can't be created until apply. In 0.7 we introduced a new feature called data sources which allows us to express to Terraform that certain operations are safe to execute during the plan phase, and template_file was recast as one to enable the use-case shown in that first example:

provider "aws" {
    region = "us-east-1"
}

variable "count" {
    default = 2
}

data "template_file" "example" {
    filename = "test.txt"
    count = "${var.count}"
    vars = {
        name = "file-${count.index+1}"
    }
}

resource "aws_instance" "servers" {
    count = "${var.count}"
    ami = "ami-e0efab88"
    instance_type = "t2.micro"
    user_data = "${data.template_file.example[count.index]}"
}

output "templates" {
    value = "${join(",", data.template_file.example.*.rendered)}"
}

With template_file now a data source, Terraform can render the template during plan and notice that its result is the same as what's already in the state for user_data and thus avoid replacing the aws_instance resources.

The other thing that has changed in the meantime is the introduction of the list indexing syntax via the [ .. ] operator, which has replaced the element function in the user_data interpolation. (There are some remaining limitations of this which are captured in #3449.)
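In other words, the relevant change from the original example is just this line, shown in both forms:

# 0.6-era form, using the element() function against the resource:
user_data = "${element(template_file.example.*.rendered, count.index)}"

# 0.7+ form, indexing the data source's rendered list directly:
user_data = "${data.template_file.example.*.rendered[count.index]}"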

With these two changes to configurations it is possible to avoid the issue described here.

There are still some situations this doesn't cover, where replacing with a data source is not appropriate; these should eventually get addressed once we've done the foundational work described in #4149. Thus I'm going to close this issue with the recommendation to switch to the template_file data source as a solution for the common case described here, and anticipate a more complete solution to follow after #4149.

@ghost

ghost commented Apr 14, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@hashicorp hashicorp locked and limited conversation to collaborators Apr 14, 2020