Changing count of instances destroys all of them #3885

Closed
Fodoj opened this Issue Nov 12, 2015 · 27 comments


Fodoj commented Nov 12, 2015

Title says it all: I increased the count and all of the previously created instances were destroyed. That doesn't seem like expected behavior to me.


Fodoj commented Nov 12, 2015

Using the prevent_destroy flag doesn't help either:

* openstack_compute_instance_v2.my_little_server: the plan would destroy this resource, but it currently has lifecycle.prevent_destroy set to true. To avoid this error and continue with the plan, either disable lifecycle.prevent_destroy or adjust the scope of the plan using the -target flag.
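
The configuration shape being attempted is roughly this (a sketch; the instance attributes are abbreviated placeholders, only the resource name comes from the error above):

resource "openstack_compute_instance_v2" "my_little_server" {
  # image, flavor, network, etc. omitted
  lifecycle {
    prevent_destroy = true
  }
}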

jtopjian commented Nov 12, 2015

Ouch. That's not good and definitely not what should happen.

What does your TF configuration look like? I created a basic instance with a count attribute set to 1, then increased it to 3. The original stayed online and accessible the entire time.
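
For reference, a minimal sketch of that non-reproducing test (image and flavor values are placeholders, not the ones actually used):

resource "openstack_compute_instance_v2" "basic" {
  count       = 3              # bumped from 1; the original instance stayed online
  name        = "basic-${count.index}"
  image_name  = "Ubuntu 14.04" # placeholder
  flavor_name = "m1.small"     # placeholder
}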


Fodoj commented Nov 12, 2015

I am using user_data with the template_file renderer. Is that related to this issue? https://groups.google.com/forum/#!topic/terraform-tool/kvaaUlT6cg4


Fodoj commented Nov 12, 2015


jtopjian commented Nov 12, 2015

Ah, that's quite possible. I've only had a chance to skim those two -- I'll look deeper later today.


jtopjian commented Nov 13, 2015

Do you mind providing the config you're using to cause this?

Any change to user_data will cause the instance to be recreated. My guess is that your user_data isn't actually changing for existing resources, but for some reason Terraform thinks it is. I think this may be a bug outside of the OpenStack provider, but if you can provide a config, I can take a look and see if it's fixable or if there's a workaround.
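
For reference, the suspect pattern would look roughly like this (a sketch; resource names, the template filename, and the image/flavor values are illustrative):

variable "node_count" {}

resource "template_file" "init" {
  count    = "${var.node_count}"
  filename = "init.yml.tpl"
}

resource "openstack_compute_instance_v2" "node" {
  count       = "${var.node_count}"
  name        = "node-${count.index}"
  image_name  = "Ubuntu 14.04" # placeholder
  flavor_name = "m1.small"     # placeholder
  user_data   = "${element(template_file.init.*.rendered, count.index)}"
}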


Fodoj commented Nov 13, 2015

It behaves exactly the same way as described in this comment #2627 (comment)


Fodoj commented Nov 13, 2015

It is indeed a bug in Terraform. I am still trying to figure out what it is exactly, but basically the problem is that after increasing count, the hash of the string is calculated not from the result of the rendered template but from a string like "${element(..., count.index)}" (I repeat: not the result of this interpolated function call, but the string itself, before interpolation). The hash of this raw string is the same for all elements, and the change in hash forces recreation.
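
In plan output this shows up as the raw, uninterpolated string on the right-hand side of the diff, roughly like this (an illustrative excerpt with placeholder values, using the sketch above, not output from an actual run):

-/+ openstack_compute_instance_v2.node.0
    user_data: "04aa63ab..." => "${element(template_file.init.*.rendered, count.index)}" (forces new resource)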


Fodoj commented Nov 13, 2015

Most likely related to #3864


Fodoj commented Nov 13, 2015

After five hours of debugging and trying to find the source of the issue, I can confirm that the destruction of all instances happens for exactly the same reasons as in #3662 and in #2627 (comment).

It seems there is a fatal bug in the implementation of the element() function that leads to it not being interpolated at all.

It happens not only for user_data together with template_file, but also for block volumes, IP addresses, etc. (just like in #3864), so the only common pieces are the element() function and the count variable.

I tried to fix it myself, but so far I have just spent a lot of time learning the internals of Terraform. Even though I feel like I know them much (much, much) better now, it's still not enough to track down the source of the bug.

I would highly appreciate any tips on how to fix it, or at least on which part of the source code I should stare at and debug longer, because this bug basically destroys not only instances but a whole range of use cases for Terraform (the most important one being the generation of user data with template_file).

My current suspects are config/interpolate_walk.go and config/raw_config.go, but I might be completely wrong about that.


jtopjian commented Nov 14, 2015

@Fodoj Thanks so much for digging into this. It sounds like this is a bug with core, so I'm going to label it as such.

jtopjian added the bug and core labels Nov 14, 2015


Fodoj commented Nov 14, 2015

@jtopjian Any tips on where to search for the bug? I have some hours this weekend to work on this.


Fodoj commented Nov 14, 2015

Fastest way to reproduce the bug:

variable node_count {} 

resource "template_file" "test_file" {
  filename = "test_1.yml"
  count = "${var.node_count}"
}

resource "template_file" "another_test_file" {
  filename = "test_2.yml"
  count = "${var.node_count}"
  vars {
    var1 = "${element(template_file.test_file.*.rendered, count.index)}"
  }
}

Increasing count by 1 leads to the destruction of template_file.another_test_file.0.
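
To reproduce (assuming test_1.yml and test_2.yml exist alongside the config):

terraform apply -var node_count=1
terraform apply -var node_count=2   # plan now shows -/+ for template_file.another_test_file.0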


jtopjian commented Nov 14, 2015

@Fodoj Off the top of my head, I have no idea how to approach this one. If I have some time, I'll venture outside of the OpenStack provider in Terraform. At the moment, it's uncharted territory for me 😄


Fodoj commented Nov 14, 2015

Seems to be the same issue as #3449.


Fodoj commented Nov 14, 2015

Rolling back #2788 solves the issue: no more total destruction of previous resources. @mitchellh, any idea for a better fix that won't lead to this bug?


jgross206 commented Nov 25, 2015

+1, I am also running into this issue. As @Fodoj suggested, rolling back #2788 and recompiling solves it (it's a 3-line change plus tests), but I don't know enough about Terraform internals to know whether that's the solution or just a workaround.


Fodoj commented Nov 25, 2015

@jgross206 It is not a sustainable solution, because rolling back #2788 re-introduces the other bug that was fixed by that PR. For now I don't have time to go even deeper into TF internals to fix both of these bugs, so I am still using my own compiled version.

I wish @mitchellh or someone else from the core contributors would pay more attention to this bug because, as I mentioned already, it leads to a lot of different bugs across many important use cases of Terraform.


Fodoj commented Dec 3, 2015

Any update on this bug?


justinclayton commented Dec 29, 2015

+1. This severely limits our ability to use Terraform to manage and mutate an existing stack in production.


justinclayton commented Dec 29, 2015

I think I've narrowed this down some more. It's not element() specifically; it's that anything referencing attributes marked as <computed> during the plan phase will also be marked as recomputed, even if the resulting value ends up not changing at all. This is because Terraform only evaluates whether something needs to be destroyed/rebuilt during plan; it never re-evaluates during the apply phase.
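
Concretely (attribute names are illustrative, mirroring the shape of the AWS example later in this thread): when one new instance is being added, the whole aws_instance.mongodb.*.id splat is <computed> at plan time, so element() over it is <computed> for every index, including the ones whose values cannot change:

resource "aws_volume_attachment" "mongodb-data" {
  count       = "${var.node_count}"
  device_name = "/dev/xvdf"
  volume_id   = "${element(aws_ebs_volume.mongodb.*.id, count.index)}"
  # <computed> for all indexes as soon as any one instance is new,
  # which the plan records as "forces new resource"
  instance_id = "${element(aws_instance.mongodb.*.id, count.index)}"
}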

Also, we should remove the provider/openstack label from this issue, given that @Fodoj's template example is definitely reproducible on its own.


jtopjian commented Dec 30, 2015

@phinze Thoughts?

Fodoj changed the title from "provider/openstack: changing count of instances destroys all of them" to "Changing count of instances destroys all of them" Jan 4, 2016


hingstarne commented Jan 5, 2016

If it matters, I can reproduce it as well.


rgabo commented Jan 6, 2016

The same issue triggers recreation of aws_volume_attachment and aws_route53_record resources when using count and element() in a cluster of instances where only a single instance is supposed to be recreated.

Aggressive -target-ing of resources with explicit indexes often helps in similar cases, but not this time. An example in our case would be:

script/plan/mongodb -target='aws_volume_attachment.mongodb-data[1]' -target='aws_instance.mongodb[1]'

Results in:

-/+ aws_volume_attachment.mongodb-data.0
    device_name:  "/dev/xvdf" => "/dev/xvdf"
    force_detach: "" => "<computed>"
    instance_id:  "i-deadbeef" => "${element(aws_instance.mongodb.*.id, count.index)}" (forces new resource)
    volume_id:    "vol-shelloil" => "vol-shelloil"

calvn commented Jan 8, 2016

I am also encountering this issue, more specifically with template_file and count.


phinze commented Jan 13, 2016

Hey folks - sorry for the trouble here, and many apologies for the delayed response. This is definitely an important core bug that I'd like to get fixed soon. There is a great summarizing write-up in #4528 that walks through the shape of the problem and shows why a simple revert of #2788 won't work as a solution. This is indeed an expression of #3449, so I'm going to close this thread to consolidate the discussion over there. I'll follow up with a plan of action on that thread.


jzampieron commented Dec 30, 2017

I'm still seeing this problem with v0.11.1.
