providers/aws: rework instance block devices #1045

phinze · 2015-02-25T01:51:27Z

Instance block devices are now managed by three distinct sub-resources:

root_block_device - introduced previously
ebs_block_device - all additional ebs-backed volumes
ephemeral_block_device - instance store / ephemeral devices

The AWS API support around BlockDeviceMapping is pretty confusing. It's
a single collection type that supports these three members each of which
has different fields and different behavior.

My biggest hiccup came from the fact that Instance Store volumes do not
show up in any response BlockDeviceMapping for any EC2 Describe* AWS API
calls. They're only available from the instance meta-data service as
queried from inside the node.

Also note this removes block_device altogether for a clean break from
old configs. New configs will need to sort their block_device
declarations into the three new types.

Fixes #858

phinze · 2015-02-25T01:51:44Z

Need to follow on with docs, but this is ready for code review I think.

mitchellh · 2015-02-25T16:45:52Z

builtin/providers/aws/resource_aws_instance.go

+	if d.HasChange("source_dest_check") {
+		opts.SetSourceDestCheck = true
+		opts.SourceDestCheck = d.Get("source_dest_check").(bool)
+		modify = true
 	}


Does this accidentally revert the set/unset case?

Yeah bad rebase there. Fixed locally at EOD but didn't push. On it!

mitchellh · 2015-02-25T16:46:56Z

A couple questions but LGTM

pmoust · 2015-02-25T17:10:35Z

Currently we attach ephemeral block devices that come in as a pack offered by AWS in certain instance types i.e. c3.2xlarge (they offer 2 gp2 type ephemeral_block_devices). The device mapping/mountpoint-set is done via cloud-init as follows:

resize_rootfs: true
mounts:
  - [ ephemeral0, /var/www, ext4, "defaults" ]
  - [ ephemeral1, /var/log, ext4, "defaults" ]

I would prefer if this was done via terraform of course.

Those ephemeral volumes are not are not in state, whereas root_block_device is i.e.

                "aws_instance.pph_webserver.1": {
                    "type": "aws_instance",
                    "depends_on": [
                        "aws_security_group.pph_admins",
                        "aws_security_group.pph_allow_elb",
                        "aws_security_group.pph_allow_jarvis",
                        "aws_security_group.pph_allow_vpn",
                        "aws_security_group.pph_webserver",
                        "aws_subnet.pph"
                    ],
                    "primary": {
                        "id": "i-587df0a2",
                        "attributes": {
                            "ami": "ami-08f8b160",
                            "associate_public_ip_address": "true",
                            "availability_zone": "us-east-1b",
                            "block_device.#": "0",
                            "id": "i-587df0a2",
                            "instance_type": "c3.2xlarge",
                            "key_name": "",
                            "private_dns": "<snipped>",
                            "private_ip": "10.30.1.227",
                            "public_dns": "<snipped>",
                            "public_ip": "<snipped>",
                            "root_block_device.#": "1",
                            "root_block_device.0.delete_on_termination": "true",
                            "root_block_device.0.device_name": "/dev/sda1",
                            "root_block_device.0.volume_size": "8",
                            "root_block_device.0.volume_type": "standard",
                            "security_groups.#": "5",
                            "security_groups.1610004074": "sg-d1b443b5",
                            "security_groups.1886755183": "sg-d9b443bd",
                            "security_groups.2391182664": "sg-d0b443b4",
                            "security_groups.390598185": "sg-dab443be",
                            "security_groups.3985752899": "sg-b4b443d0",
                            "subnet_id": "subnet-13b90264",
                            "tags.#": "1",
                            "tags.Name": "web01",
                            "tenancy": "default",
                            "user_data": "56901ddbd9cd1ba3940b0c3210f62abd54e340a1"
                        }
                    }
                },

Perhaps those type of cases should be handled in a manner similar to root_block_device and not perform any calls on AWS API to list their state after creation but keep them in state. After they are added the should be only linked to the aws_instance entity that summoned them and that's that.
I might be off, but I think plan/refresh rechecks the status of aforementioned ephemeral_block_devices?

phinze · 2015-02-25T18:52:17Z

Perhaps those type of cases should be handled in a manner similar to root_block_device and not perform any calls on AWS API to list their state after creation but keep them in state. After they are added the should be only linked to the aws_instance entity that summoned them and that's that.

That is definitely the intent in this PR.

I might be off, but I think plan/refresh rechecks the status of aforementioned ephemeral_block_devices?

In the guard here:

https://github.com/hashicorp/terraform/blob/f-block-devices/builtin/providers/aws/resource_aws_instance.go#L582-L585

I'm attempting to only set the ephemeral_block_device details in the state once, and only if they weren't already set in the config.

cc @mitchellh for a double check of the semantics here and because we're talking about changing GetOk soon, though i think this particular call would still work even with the changes we're considering.

My desired behavior for ephemeral_block_device is:

On create, if config provided, that gets written to the state
On create, if no config provided, we read from AMI details once for the state info
Plan, refresh, etc, should never touch/update the value in the state
Any update to the field in the config results in a ForceNew destroy/create, which just reapplies these same rules to a fresh instance

I'm not sure if that's what I've achieved, but that's what I was going for.

mitchellh · 2015-02-25T21:24:00Z

@phinze I may be missing something but it looks like nothing in seenBlockDeviceNames is ever set.

Can you explain what that is guarding against? It isn't clear to me.

phinze · 2015-02-25T23:43:19Z

I may be missing something but it looks like nothing in seenBlockDeviceNames is ever set.

Can you explain what that is guarding against? It isn't clear to me.

Sure - so the rules for ephemeral device assignment are as follows:

If no runtime block device mapping (BDM), use BDM from AMI
If runtime BDM includes ephemeral devices, use that and ignore AMI's BDM
If runtime BDM does not include ephemeral devices, use AMI's BDM except when device names collide, in which case the runtime BDM trumps the AMI.

So seenDeviceNames is needed to track that last caveat.

We initialize a map[string]bool, record device names we see, grab the list out from the helper function's return value, then ignore devices we've already seen.

It's entirely possible I'm missing something though! Let me know if this makes sense or if there's a way for me to make the code clearer.

mitchellh · 2015-02-27T07:53:17Z

@phinze Got it got it. This LGTM then.

I recommend just making sure we have good acceptance test coverage here and I think we do.

phinze · 2015-02-27T12:48:12Z

After talking with @sparkprime in IRC about this (and related changes he'd like to make in the Google Cloud provider), I don't think it's a good idea to merge this without state migration support in core.

Without state migration, UX around terraform 0.4 upgrade looks like:

Upgrade to terraform 0.4
Get config errors about block_device not being valid config
Read docs about breaking changes with instructions on how to update config
Make appropriate config updates
Watch as Terraform requires that you destroy and recreate every instance in your infrastructure 😲

I suppose we could add docs around manual state munging to prevent a massive rebuild from happening, but for my money this is a core feature we're going to need over and over again, so it's better to get it done sooner than later.

pmoust · 2015-02-27T13:49:21Z

@phinze or, add a terraform version node in the JSON of tfstate that indicates which TF version built said state. If major and/or minor versions have mismatch with current terraform version that is executing a plan, then stop and prompt the user to check the changelog. That is if we keep a type of SemVer.
-f would allow performing a plan despite version mismatch.

When you changed added root_block_device our plan marked all our ec2 instaces to be recreated, thus removing them from plan. We have the state under revision control, so we stashed that state change, checked the docs and applied the new DSL change. New plan did not require recreation.

phinze · 2015-02-27T13:59:43Z

When you changed added root_block_device our plan marked all our ec2 instaces to be recreated, thus removing them from plan. We have the state under revision control, so we stashed that state change, checked the docs and applied the new DSL change. New plan did not require recreation.

Thanks for providing this real-world context! Can you elaborate a little more: in order to get back to a "clean plan" did you need to partially modify the state? I can't tell if "stashed that state change" refers to the whole file or just parts of it.

The thing I'd like to avoid is an upgrade story that requires manually munging the statefile.

pmoust · 2015-02-27T14:07:56Z

@phinze by "We have the state under revision control, so we stashed that state change" I only mean I 've just issued git checkout -- terraform.tfstate after the breaking change removed our instances from plan. Changed to root_block_device where applicable, planned again and there was no need to change as the refresh matched the instructions of the .tf.

Of course this applies to our case specifically, one might not keep a state in VCS but use other provided ways of storing it I suppose.

If you can keep everything BC that would be great, but some stuff just add more complexity like this. Your teams engineering skills beat mine by miles though, and if a state migrator is something that is easily added then disregard my input. But I do believe that in some cases a proper state migration is easier said than done.

phinze · 2015-03-05T16:24:58Z

This is chilling until core gets support for deprecation, a feature that I'm starting today!

phinze · 2015-03-13T21:57:56Z

I'm close I promise. 😀

Acceptance tests are passing, just need to get state migration finished up and this can land... stay tuned! 📺

phinze · 2015-03-17T17:28:16Z

Still sussing out a few remaining issues with ephemeral devices in the upgrade case. I haven't forgotten about you, oh block devices! I'll follow up at EOD today with an update.

We were previously only recording the schema version on refresh. This caused the state to be incorrectly written after a `terraform apply` causing subsequent commands to run the state through an unnecessary migration.

phinze · 2015-03-19T01:07:04Z

😅 😅 😅 😅 😅 FINALLY I THINK IT MIGHT BE DONE 😅 😅 😅 😅 😅

@mitchellh @catsby give me some 👀

I'll make a few inline comments pointing out some details.

phinze · 2015-03-19T01:08:46Z

builtin/providers/aws/resource_aws_instance.go

+				Type:     schema.TypeMap,
+				Optional: true,
+				Removed:  "Split out into three sub-types; see Changelog and Docs",
+			},


First use of Removed - intended UX here for user that's upgraded:

Receive this message

Update config

Catch state migration on next plan or apply which, if all goes well, should result in an empty plan

@phinze Interesting idea: make a web page (unlinked from docs) dedicated to explaining the change and how to migrate, and just link to that in this message. i.e. /docs/providers/aws/changes/instance_1.html (resource type + version)

Ooo I like this idea. I'll follow up with it in another PR.

phinze · 2015-03-19T01:11:54Z

helper/schema/resource.go

@@ -151,7 +151,7 @@ func (r *Resource) Apply(
 		err = r.Update(data, meta)
 	}

-	return data.State(), err
+	return r.recordCurrentSchemaVersion(data.State()), err


This was the bug that wasted me a nice little bit of time. 😭 See the first commit's msg for details.

mitchellh · 2015-03-19T08:48:17Z

LGTM! I made a few comments but none that really require any action.

Instance block devices are now managed by three distinct sub-resources: * `root_block_device` - introduced previously * `ebs_block_device` - all additional ebs-backed volumes * `ephemeral_block_device` - instance store / ephemeral devices The AWS API support around BlockDeviceMapping is pretty confusing. It's a single collection type that supports these three members each of which has different fields and different behavior. My biggest hiccup came from the fact that Instance Store volumes do not show up in any response BlockDeviceMapping for any EC2 `Describe*` API calls. They're only available from the instance meta-data service as queried from inside the node. This removes `block_device` altogether for a clean break from old configs. New configs will need to sort their `block_device` declarations into the three new types. The field has been marked `Removed` to indicate this to users. With the new block device format being introduced, we need to ensure Terraform is able to properly read statefiles written in the old format. So we use the new `helper/schema` facility of "state migrations" to transform statefiles in the old format to something that the current version of the schema can use. Fixes #858

phinze · 2015-03-19T14:09:34Z

providers/aws: rework instance block devices

catsby · 2015-03-19T14:15:41Z

builtin/providers/aws/resource_aws_instance.go

-		blockDevices = append(blockDevices, rootBlockDevices...)
+		// if err := d.Set("ephemeral_block_device", vL); err != nil {
+		// 	return err
+		// }


commented out code

Hah whoops I was too excited to land. Fixing right on master.

ghost · 2020-05-04T01:55:16Z

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

mitchellh reviewed Feb 25, 2015
View reviewed changes

phinze force-pushed the f-block-devices branch from b1cbad0 to 967b689 Compare February 25, 2015 18:39

mitchellh mentioned this pull request Mar 2, 2015

AWS block device seems to be missing a parameter #1085

Closed

phinze mentioned this pull request Mar 6, 2015

provider/aws: Convert AWS Instance to aws-sdk-go #1147

Merged

phinze force-pushed the f-block-devices branch 4 times, most recently from 00acd9b to 9c5bbe6 Compare March 13, 2015 21:01

phinze mentioned this pull request Mar 18, 2015

Adding a root_block_device forces resource refresh in AWS #1204

Closed

helper/schema: record schema version on apply

3ba8ed5

We were previously only recording the schema version on refresh. This caused the state to be incorrectly written after a `terraform apply` causing subsequent commands to run the state through an unnecessary migration.

phinze force-pushed the f-block-devices branch from 9c5bbe6 to 34b6eef Compare March 19, 2015 01:00

phinze reviewed Mar 19, 2015
View reviewed changes

phinze force-pushed the f-block-devices branch from 34b6eef to 2b23c40 Compare March 19, 2015 14:03

phinze added a commit that referenced this pull request Mar 19, 2015

Merge pull request #1045 from hashicorp/f-block-devices

46b6307

providers/aws: rework instance block devices

phinze merged commit 46b6307 into master Mar 19, 2015

phinze deleted the f-block-devices branch March 19, 2015 14:09

catsby reviewed Mar 19, 2015
View reviewed changes

phinze mentioned this pull request Mar 30, 2015

Block Device Support for Autoscaling Launch Configurations #1271

Closed

radeksimko mentioned this pull request Apr 5, 2018

resource/aws_launch_configuration: Add support for ebs_block_device.*.no_device hashicorp/terraform-provider-aws#4070

Merged

ghost locked and limited conversation to collaborators May 4, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

providers/aws: rework instance block devices #1045

providers/aws: rework instance block devices #1045

phinze commented Feb 25, 2015

phinze commented Feb 25, 2015

mitchellh Feb 25, 2015

phinze Feb 25, 2015

mitchellh commented Feb 25, 2015

pmoust commented Feb 25, 2015

phinze commented Feb 25, 2015

mitchellh commented Feb 25, 2015

phinze commented Feb 25, 2015

mitchellh commented Feb 27, 2015

phinze commented Feb 27, 2015

pmoust commented Feb 27, 2015

phinze commented Feb 27, 2015

pmoust commented Feb 27, 2015

phinze commented Mar 5, 2015

phinze commented Mar 13, 2015

phinze commented Mar 17, 2015

phinze commented Mar 19, 2015

phinze Mar 19, 2015

mitchellh Mar 19, 2015

phinze Mar 19, 2015

phinze Mar 19, 2015

mitchellh commented Mar 19, 2015

phinze commented Mar 19, 2015

catsby Mar 19, 2015

phinze Mar 19, 2015

ghost commented May 4, 2020

providers/aws: rework instance block devices #1045

providers/aws: rework instance block devices #1045

Conversation

phinze commented Feb 25, 2015

phinze commented Feb 25, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mitchellh commented Feb 25, 2015

pmoust commented Feb 25, 2015

phinze commented Feb 25, 2015

mitchellh commented Feb 25, 2015

phinze commented Feb 25, 2015

mitchellh commented Feb 27, 2015

phinze commented Feb 27, 2015

pmoust commented Feb 27, 2015

phinze commented Feb 27, 2015

pmoust commented Feb 27, 2015

phinze commented Mar 5, 2015

phinze commented Mar 13, 2015

phinze commented Mar 17, 2015

phinze commented Mar 19, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mitchellh commented Mar 19, 2015

phinze commented Mar 19, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ghost commented May 4, 2020