Terraform Creation complete before chef recipe action (docker load) has actually completed #15963

Closed
bmdu1569 opened this issue Aug 30, 2017 · 8 comments
Labels
bug needs-maintainer This code is currently unmaintained. Please submit a PR against our CODEOWNERS to volunteer. provisioner/chef v0.9 Issues (primarily bugs) reported against v0.9 releases

Comments

@bmdu1569

Terraform Version

terraform -v
Terraform v0.9.3

Summary

I'm running a chef recipe on an aws instance (provisioned using terraform init) via the chef provisioner resource.
Part of the recipe loads a docker image from a tar file (using docker_image). This action takes ~5 mins to complete when run manually.
The tar file download is fine.
When terraform gets to the part where it should docker load from the tar, I see the usual Still creating... output while the docker load is running.
BUT .. EVERY time ... after 4 minutes (240s), terraform gives a message Creation complete and exits.
And though terraform has exited, when I ssh to the aws instance I can see the image is still being loaded (docker images).

If I run the same recipe using kitchen on the same aws instance, it runs fine.
It completes the docker load and moves onto the docker run action which is next step in the recipe (which terraform never gets to, as it exited with Creation complete during docker load).

Could this be related to a timeout that triggers when nothing is happening on the ssh connection? (It always exits after the same time: 240s.)

module.app.null_resource.app-provision (chef): * docker_image[s3_image_local] action load
module.app.null_resource.app-provision: Still creating... (5m20s elapsed)
module.app.null_resource.app-provision: Still creating... (5m30s elapsed)
module.app.null_resource.app-provision: Still creating... (5m40s elapsed)
module.app.null_resource.app-provision: Still creating... (5m50s elapsed)
module.app.null_resource.app-provision: Still creating... (6m0s elapsed)
module.app.null_resource.app-provision: Still creating... (6m10s elapsed)
module.app.null_resource.app-provision: Still creating... (6m20s elapsed)
module.app.null_resource.app-provision: Still creating... (6m30s elapsed)
module.app.null_resource.app-provision: Still creating... (6m40s elapsed)
module.app.null_resource.app-provision: Still creating... (6m50s elapsed)
module.app.null_resource.app-provision: Still creating... (7m0s elapsed)
module.app.null_resource.app-provision: Still creating... (7m10s elapsed)
module.app.null_resource.app-provision: Still creating... (7m20s elapsed)
module.app.null_resource.app-provision: Still creating... (7m30s elapsed)
module.app.null_resource.app-provision: Still creating... (7m40s elapsed)
module.app.null_resource.app-provision: Still creating... (7m50s elapsed)
module.app.null_resource.app-provision: Still creating... (8m0s elapsed)
module.app.null_resource.app-provision: Still creating... (8m10s elapsed)
module.app.null_resource.app-provision: Still creating... (8m20s elapsed)
module.app.null_resource.app-provision: Still creating... (8m30s elapsed)
module.app.null_resource.app-provision: Still creating... (8m40s elapsed)
module.app.null_resource.app-provision: Still creating... (8m50s elapsed)
module.app.null_resource.app-provision: Still creating... (9m0s elapsed)
module.app.null_resource.app-provision: Still creating... (9m10s elapsed)
module.app.null_resource.app-provision: Creation complete (ID: 1933641522798379892)
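
One way to probe the idle-connection hypothesis raised in the summary is to make the long, silent step emit periodic output and see whether the early exit still happens. The following is only a minimal sketch of that idea: `run_with_heartbeat` is a hypothetical helper, not part of Terraform or Chef, and nothing in this thread confirms that an idle timeout is the actual cause.

```ruby
require 'open3'

# Hypothetical helper (an assumption for illustration, not a Terraform or
# Chef API). Runs `command` (an array of program and arguments), streams its
# combined stdout/stderr to `out`, and prints a heartbeat line every
# `interval` seconds so the session is never silent for long.
# Returns the Process::Status of the finished command.
def run_with_heartbeat(command, interval: 30, out: $stdout)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  Open3.popen2e(*command) do |stdin, output, wait_thr|
    stdin.close
    ticker = Thread.new do
      loop do
        sleep interval
        elapsed = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started).round
        out.puts "heartbeat: still running (#{elapsed}s elapsed)"
        out.flush
      end
    end
    begin
      # Relay the child's own output until it closes its streams.
      output.each_line { |line| out.print line }
      wait_thr.value
    ensure
      # Stop the heartbeat whether the command finished or an error occurred.
      ticker.kill
    end
  end
end

# Example call (assumed usage, mirroring the command from the debug log):
# run_with_heartbeat(['sudo', 'chef-client', '-j', '/etc/chef/first-boot.json', '-E', 'dev-us-east-1'])
```

If the run completes normally with heartbeats in place but still stops at 240s without them, that would point toward an idle timeout somewhere between Terraform and the instance; if it stops either way, the cause lies elsewhere.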

Debug Output

module.app.null_resource.app-provision: Still creating... (9m20s elapsed)
2017/08/30 16:22:23 [DEBUG] dag/walk: vertex "meta.count-boundary (count boundary fixup)", waiting for: "module.app.null_resource.app-provision"
2017/08/30 16:22:28 [DEBUG] dag/walk: vertex "meta.count-boundary (count boundary fixup)", waiting for: "module.app.null_resource.app-provision"
module.app.null_resource.app-provision: Still creating... (9m30s elapsed)
2017/08/30 16:22:33 [DEBUG] dag/walk: vertex "meta.count-boundary (count boundary fixup)", waiting for: "module.app.null_resource.app-provision"
2017/08/30 16:22:38 [DEBUG] dag/walk: vertex "meta.count-boundary (count boundary fixup)", waiting for: "module.app.null_resource.app-provision"
module.app.null_resource.app-provision: Still creating... (9m40s elapsed)
2017/08/30 16:22:43 [DEBUG] dag/walk: vertex "meta.count-boundary (count boundary fixup)", waiting for: "module.app.null_resource.app-provision"
2017/08/30 16:22:46 [DEBUG] plugin: terraform: chef-provisioner (internal) 2017/08/30 16:22:46 remote command exited with '0': sudo chef-client -j "/etc/chef/first-boot.json" -E "dev-us-east-1"
2017/08/30 16:22:46 [DEBUG] root.app: eval: *terraform.EvalIf
2017/08/30 16:22:46 [DEBUG] root.app: eval: *terraform.EvalWriteState
2017/08/30 16:22:46 [DEBUG] root.app: eval: *terraform.EvalWriteDiff
2017/08/30 16:22:46 [DEBUG] root.app: eval: *terraform.EvalApplyPost
2017/08/30 16:22:46 [DEBUG] root.app: eval: *terraform.EvalUpdateStateHook
2017/08/30 16:22:46 [TRACE] Preserving existing state lineage "47f755bb-3cd9-49da-9ce8-d0affbc462cc"
module.app.null_resource.app-provision: Creation complete (ID: 2161705815539838428)

Expected Behavior

Terraform to wait for chef recipe to complete

Actual Behavior

Get Creation complete in terraform before chef recipe has completed.

Steps to Reproduce

Ran terraform init
Created the aws instance successfully
Used chef provisioner to connect and install chef-client and execute runlist
Get Creation complete in terraform before chef recipe has completed.

Important Factoids

@bmdu1569
Author

The docker load snippet

docker_image 'image_local' do
  tag container['tag']
  source "#{node['local_file_tar']}"
  read_timeout 900
  write_timeout 900
  action :load
end

@bmdu1569
Author

bmdu1569 commented Aug 30, 2017

Running exactly the same command locally on the aws instance ... sudo chef-client -j "/etc/chef/first-boot.json" -E "dev-us-east-1" ... it runs through the full recipe, i.e. docker load followed by docker run, as expected. It does not exit with '0' during the docker load, as it does when run via the chef provisioner in terraform.

@bmdu1569
Author

bmdu1569 commented Sep 7, 2017

If any more debug output is needed, please suggest what to collect. Thanks.

@bmdu1569
Author

bmdu1569 commented Sep 11, 2017

Looks similar to:
https://github.com/hashicorp/terraform/pull/10081/commits/49c7d272a35b8cfbbb2c6231e4d1bd496e59d54b#diff-afd7fc9eabec13c47e82ea17a21d6a97
https://github.com/hashicorp/terraform/issues/15532

@bmdu1569 changed the title from "Terraform Creation complete after 4 mins of 'Still creating..' before chef recipe action (docker load) has completed" to "Terraform Creation complete before chef recipe action (docker load) has actually completed" Sep 12, 2017
@bmdu1569
Author

Hi there - has this been reproduced? Any idea what causes it? thanks

@YakDriver
Member

YakDriver commented Jan 10, 2018

I'm seeing a similar problem. When provisioning an AWS instance with a userdata script as part of the setup, I signal the completion of the userdata script by placing a file in a temp directory, like what @calvn suggests here: #4668. (The last line of my userdata powershell script is: New-Item 'C:\Temp\SETUP_COMPLETE_SIGNAL' -type file -force)

I have verified that the userdata script completes successfully and does create the C:\Temp file after about 12-15 minutes. However, after 11 minutes, Terraform says, "Creation complete after 11m26s," and never runs the next provisioner script.

Here is my remote-exec provisioner. The first script should block until the file appears in C:\Temp (signaling that the AWS userdata script completed). But, since Terraform has "finished" with creation by then, it never runs the second script (test.ps1).

provisioner "remote-exec" {
  inline = [
    "powershell.exe -File C:\\scripts\\block_until_setup.ps1",
    "powershell.exe -File C:\\scripts\\test.ps1",
  ]
}

My Terraform AWS resource connection timeout is timeout = "30m" and the resource create and destroy timeouts are both "120m".
Any thoughts or suggestions?
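
The blocking script mentioned above (block_until_setup.ps1) is not shown in the thread. As a rough sketch of the polling logic it describes, written in Ruby rather than PowerShell, a wait loop with an explicit deadline might look like the following. The function name `wait_for_file` and the default values are assumptions for illustration only.

```ruby
# Hypothetical sketch of a "block until the signal file appears" loop.
# Returns true once `path` exists, or false if `timeout` seconds pass first.
def wait_for_file(path, timeout: 1800, interval: 5)
  # A monotonic clock is not affected by wall-clock adjustments on the host.
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  until File.exist?(path)
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval
  end
  true
end

# Assumed usage: exit non-zero on timeout so the calling step fails visibly
# instead of being treated as a success.
# exit(wait_for_file('C:/Temp/SETUP_COMPLETE_SIGNAL') ? 0 : 1)
```

Returning a distinct failure on timeout makes it easier to tell "the signal file never appeared" apart from "the provisioner stopped waiting early", which is the distinction at issue in this report.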

@hashibot hashibot added the v0.9 Issues (primarily bugs) reported against v0.9 releases label Aug 29, 2019
@pkolyvas pkolyvas added the needs-maintainer This code is currently unmaintained. Please submit a PR against our CODEOWNERS to volunteer. label Apr 14, 2020
@danieldreier
Contributor

I'm closing this issue because we announced tool-specific (vendor or 3rd-party) provisioner deprecation in mid-September 2020. Additionally, we added a deprecation notice for tool-specific provisioners in 0.13.4. On a practical level this means we will no longer be reviewing or merging PRs for built-in plugins like the chef provisioner.

The discuss post linked above explains this in more depth, but the basic reason we're making this change is that these vendor provisioners have been extremely challenging for us to maintain, and are a weak spot in the terraform user experience. People reach for them not realizing the bugs and UX limitations, and they're areas that are difficult for us to maintain because of the huge surface area of integrating with a bunch of different tools (Puppet, Chef, Salt, etc) that each require deep domain knowledge to do right. For example, testing each of these against all the versions of those tools, on multiple platforms, is prohibitive, and so we don't - but users have a reasonable expectation that everything in the Terraform Core codebase is well tested. Similarly, it's tough to accept PRs, even for useful improvements, because we don't have anyone on the core team with deep Chef knowledge, and we have not been able to get community volunteers to own PR review for this codebase, so it's a shot in the dark whether a given PR makes things better or worse from the perspective of an experienced Chef + Terraform user.

For the time being, the best option if you want to fix this bug is to work with the community to build a standalone chef provisioner, fix this in it, and distribute it as a plugin binary, similar to how the ansible provisioner is distributed.

I'm aware of the limitations of this approach, but it's the best option compared to coupling provisioner development to the Terraform Core release lifecycle. We believe the benefit to users of having provisioner development decoupled from core exceeds the convenience of having these provisioners built in to core. We want to provide a better user experience in the future, and our hope here is that the ability to improve, fix and repair provisioners without us blocking their development, much like providers, will help make a strong case for what's next.

I think it’s also important to highlight that we have no plans to remove the generic provisioners or the pluggable functionality during Terraform's 1.0 lifecycle.

I appreciate your input here to improve Terraform, and am always happy to talk. Please feel free to reach out to me or Petros Kolyvas if you would like to talk more about this change.

@ghost

ghost commented Nov 15, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked as resolved and limited conversation to collaborators Nov 15, 2020
6 participants