Option to allow skipping of EC2 instance checks #5705

Closed
jurajseffer opened this issue Dec 14, 2017 · 29 comments · Fixed by #5773
Labels: builder/amazon, stage/thinking (flagged for internal discussions about possible enhancements)
Milestone: v1.2.0

Comments

@jurajseffer

#5678 introduced EC2 status checks before attempting to connect via SSH (when using ansible). I've just upgraded from 1.1.1 to 1.1.3 and noticed a significant wait before it attempts an SSH connection. It would be nice to be able to skip this and rely only on SSH being ready, as in our case the EC2 checks, which are now the default, add unnecessary wait time.

@SwampDragons
Contributor

SwampDragons commented Dec 14, 2017

Have you done a time comparison to determine whether the build length is actually different? This check prevents Packer from trying to connect until there's a chance of connecting; I have a suspicion that the build time just feels different because the bulk of the waiting has been moved to a different step in the build.

@mwhooker added the builder/amazon and stage/thinking labels on Dec 14, 2017
@mwhooker
Contributor

We should investigate this further. We've discovered that we can't reliably try to connect to an instance until its status checks pass, and I'd rather err on the side of repeatability at the cost of build time.

I could conceivably see adding an option to adjust the "instance ready marker", but it seems like a big change for what seems like a trivial improvement. Would love to see more data about how much of an increase this actually is.

@jurajseffer
Author

Hi, I just ran a simple test for the same scenario. It's ~40 seconds with the SSH-only wait vs ~2m50s with EC2 checks for a t2.medium with a CentOS 7 AMI.

EC2 checks (Packer 1.1.3):

2017/12/15 09:59:33 ui:     bastion: Instance ID: ***
2017/12/15 09:59:33 ui: ==> bastion: Waiting for instance (***) to become ready...
2017/12/15 10:03:19 packer: 2017/12/15 10:03:19 [INFO] Not using winrm communicator, skipping get password...
2017/12/15 10:03:19 packer: 2017/12/15 10:03:19 [INFO] Waiting for SSH, up to timeout: 5m0s
2017/12/15 10:03:19 ui: ==> bastion: Waiting for SSH to become available...
2017/12/15 10:03:19 packer: 2017/12/15 10:03:19 [INFO] Attempting SSH connection...
2017/12/15 10:03:19 packer: 2017/12/15 10:03:19 reconnecting to TCP connection for SSH
2017/12/15 10:03:19 packer: 2017/12/15 10:03:19 handshaking with SSH
2017/12/15 10:03:20 ui: ==> bastion: Connected to SSH!

SSH wait only (Packer 1.1.1):

2017/12/15 10:14:26 ui:     bastion: Instance ID: ***
2017/12/15 10:14:26 ui: ==> bastion: Waiting for instance (***) to become ready...
2017/12/15 10:14:26 packer: 2017/12/15 10:14:26 Waiting for state to become: running
2017/12/15 10:14:26 packer: 2017/12/15 10:14:26 Using 2s as polling delay (change with AWS_POLL_DELAY_SECONDS)
2017/12/15 10:14:26 packer: 2017/12/15 10:14:26 Allowing 300s to complete (change with AWS_TIMEOUT_SECONDS)
2017/12/15 10:14:34 packer: 2017/12/15 10:14:34 [INFO] Not using winrm communicator, skipping get password...
2017/12/15 10:14:37 packer: 2017/12/15 10:14:37 [INFO] Waiting for SSH, up to timeout: 5m0s
2017/12/15 10:14:37 ui: ==> bastion: Waiting for SSH to become available...
2017/12/15 10:14:49 packer: 2017/12/15 10:14:49 [DEBUG] TCP connection to SSH ip/port failed: dial tcp ***: i/o timeout
2017/12/15 10:15:05 packer: 2017/12/15 10:15:05 [INFO] Attempting SSH connection...
2017/12/15 10:15:05 packer: 2017/12/15 10:15:05 reconnecting to TCP connection for SSH
2017/12/15 10:15:05 packer: 2017/12/15 10:15:05 handshaking with SSH
2017/12/15 10:15:06 packer: 2017/12/15 10:15:06 handshake complete!
2017/12/15 10:15:06 packer: 2017/12/15 10:15:06 [INFO] no local agent socket, will not connect agent
2017/12/15 10:15:08 ui: ==> bastion: Connected to SSH!

This is not as much of a penalty for pipelines, but local development, where multiple serial runs are needed, may be significantly slower. Ideally the ready-check type would be configurable, with possible values being EC2 and SSH, and EC2 as the default.

@SwampDragons
Contributor

That does seem like a fairly significant wait time increase. This is definitely worth chewing on -- is simplicity of the checking code worth this inefficiency?

@roman-cnd

roman-cnd commented Dec 15, 2017

+1 for making check types configurable. We saw a significant increase in build times (total: 9 min, of which ~4 min was waiting for the instance to become ready).

@mwhooker
Contributor

I'll take a look at making this check configurable.

@mwhooker
Contributor

Trying to decide the best way to do this...

At first blush we could add a config key that swaps between WaitUntilInstanceRunning and WaitUntilInstanceStatusOk. We'd probably want the default to be WaitUntilInstanceRunning, since that works for the general case. This has the unfortunate side effect of breaking backwards compatibility for anyone who was depending on the new check to get their build running.

Perhaps the best approach would be to force WaitUntilInstanceStatusOk when the user has a c5.x instance type, unless they explicitly opt into WaitUntilInstanceRunning.
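
A minimal sketch of what that swap could look like against aws-sdk-go (not Packer's actual code; the skipHealthCheck flag, the region, and the instance ID are made up for illustration):

```go
package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

// waitForInstance blocks until the instance is considered ready, using either
// the faster "running" waiter or the slower "status ok" waiter.
func waitForInstance(svc *ec2.EC2, instanceID string, skipHealthCheck bool) error {
	if skipHealthCheck {
		// Returns as soon as the instance reaches the "running" state
		// (the Packer <= 1.1.2 behaviour).
		return svc.WaitUntilInstanceRunning(&ec2.DescribeInstancesInput{
			InstanceIds: []*string{aws.String(instanceID)},
		})
	}
	// Waits until the system and instance status checks pass
	// (the more conservative behaviour introduced in 1.1.3).
	return svc.WaitUntilInstanceStatusOk(&ec2.DescribeInstanceStatusInput{
		InstanceIds: []*string{aws.String(instanceID)},
	})
}

func main() {
	// Region and instance ID are placeholders for illustration only.
	sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("eu-west-1")}))
	svc := ec2.New(sess)
	if err := waitForInstance(svc, "i-0123456789abcdef0", true); err != nil {
		log.Fatal(err)
	}
}
```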

@rickard-von-essen
Collaborator

rickard-von-essen commented Dec 15, 2017 via email

@onlydole

onlydole commented Dec 15, 2017

I am unable to create any Packer AMIs in 1.1.3 using Ubuntu 16.04 (latest from Amazon) and am getting timeouts while it's waiting for the instance to become available. I am having success with version 1.1.2, however.

@minac
Contributor

minac commented Dec 18, 2017

Same as @onlydole. With Packer 1.1.3, using the current (ami-8fd760f6) or previous (ami-785db401) AMIs for Ubuntu 16.04 LTS, the build hangs on "Waiting for instance....".
During my tests the instance is up and running in AWS and I am able to access it from the same container (different TTY) using packer's SSH key written to disk in debug mode.
The only way to make this work is using Packer 1.1.2.

Debug mode enabled. Builds will not be parallelized.
amazon-ebs output will be in this color.

==> amazon-ebs: Prevalidating AMI Name: dev-2017-12-18-12-22-02Z
==> amazon-ebs: Pausing after run of step 'StepPreValidate'. Press enter to continue.
    amazon-ebs: Found Image ID: ami-785db401
==> amazon-ebs: Pausing after run of step 'StepSourceAMIInfo'. Press enter to continue.
==> amazon-ebs: Creating temporary keypair: packerKeyPair1513599722
    amazon-ebs: Saving key for debug purposes: ec2_amazon-ebs.pem
==> amazon-ebs: Pausing after run of step 'StepKeyPair'. Press enter to continue.
==> amazon-ebs: Creating temporary security group for this instance: packer_5a37b307-5b52-8a53-bf98-429bae645303
==> amazon-ebs: Authorizing access to port 22 from 0.0.0.0/0 in the temporary security group...
==> amazon-ebs: Pausing after run of step 'StepSecurityGroup'. Press enter to continue.
==> amazon-ebs: Pausing after run of step 'stepCleanupVolumes'. Press enter to continue.
==> amazon-ebs: Launching a source AWS instance...
==> amazon-ebs: Adding tags to source instance
    amazon-ebs: Adding tag: "type": "packer"
    amazon-ebs: Adding tag: "Name": "Temporary Packer Dev server"
    amazon-ebs: Instance ID: i-01190fda7886d6efd
==> amazon-ebs: Waiting for instance (i-01190fda7886d6efd) to become ready...
==> amazon-ebs: Error waiting for instance (i-01190fda7886d6efd) to become ready: ResourceNotReady: exceeded wait attempts
==> amazon-ebs: Terminating the source AWS instance...
==> amazon-ebs: Pausing before cleanup of step 'stepCleanupVolumes'. Press enter to continue.
==> amazon-ebs: No volumes to clean up, skipping
==> amazon-ebs: Pausing before cleanup of step 'StepSecurityGroup'. Press enter to continue.
==> amazon-ebs: Deleting temporary security group...
==> amazon-ebs: Pausing before cleanup of step 'StepKeyPair'. Press enter to continue.
==> amazon-ebs: Deleting temporary keypair...
==> amazon-ebs: Pausing before cleanup of step 'StepSourceAMIInfo'. Press enter to continue.
==> amazon-ebs: Pausing before cleanup of step 'StepPreValidate'. Press enter to continue.
Build 'amazon-ebs' errored: Error waiting for instance (i-01190fda7886d6efd) to become ready: ResourceNotReady: exceeded wait attempts

==> Some builds didn't complete successfully and had errors:
--> amazon-ebs: Error waiting for instance (i-01190fda7886d6efd) to become ready: ResourceNotReady: exceeded wait attempts

@lenfree

lenfree commented Dec 18, 2017

I bumped into this same issue a couple of days ago and realised it was due to a missing ec2:DescribeInstanceStatus permission. Not sure if this is what you guys are experiencing though.

@minac
Contributor

minac commented Dec 18, 2017

@lenfree That is not the problem for me; I also tested with a different set of permissions to see if that was it.
Unfortunately, even with PACKER_LOG set to 1, this waiting step says nothing about what is (not) going on.

@elisiano

Adding myself to the list of people who cannot generate an AMI anymore since 1.1.3 (1.1.2 works fine).

@elisiano

Thank you @lenfree, adding ec2:DescribeInstanceStatus solved my issue!

@mwhooker
Contributor

There's an issue where we realized we didn't update the docs about the permissions (#5694), and I want to apologize here for our poor communication around that. We didn't notice at the time that we were introducing another permission requirement, and I'm going to see about writing a test to make sure it doesn't happen again.

I think most of the issues with the new version are related to permissions, while at the same time I recognize that the added time is undesirable. Still working towards a good solution for the next release.

@minac
Contributor

minac commented Dec 19, 2017 via email

@mwhooker
Contributor

Okay, thanks. I thought I saw an issue where the permissions were in place, but couldn't find it offhand.

@minac
Contributor

minac commented Dec 19, 2017 via email

@mwhooker added this to the v1.1.4 milestone on Dec 20, 2017
@orirawlings

orirawlings commented Dec 28, 2017

I recently came across this as well, but it seems like I may have encountered some racy behavior where packer proceeded to run the provisioners and publish the AMI even though it faced an error waiting for the EC2 instance to become ready:

amazon-ebs output will be in this color.

==> amazon-ebs: Force Deregister flag found, skipping prevalidating AMI Name
    amazon-ebs: Found Image ID: ami-acd005d5
==> amazon-ebs: Creating temporary keypair: packer_5a455356-e2c5-0538-8d25-019c9e8bc7e7
==> amazon-ebs: Creating temporary security group for this instance: packer_5a455358-39a4-641f-68d7-5213e892bec3
==> amazon-ebs: Authorizing access to port 22 from 0.0.0.0/0 in the temporary security group...
==> amazon-ebs: Launching a source AWS instance...
==> amazon-ebs: Adding tags to source instance
    amazon-ebs: Adding tag: "Name": "Packer Builder"
    amazon-ebs: Instance ID: i-0cb5fb919dbdbb9ff
==> amazon-ebs: Waiting for instance (i-0cb5fb919dbdbb9ff) to become ready...
==> amazon-ebs: Error waiting for instance (i-0159f003ec5edf03a) to become ready: ResourceNotReady: exceeded wait attempts
==> amazon-ebs: Terminating the source AWS instance...
==> amazon-ebs: No volumes to clean up, skipping
==> amazon-ebs: Deleting temporary security group...
==> amazon-ebs: Deleting temporary keypair...
Build 'amazon-ebs' errored: Error waiting for instance (i-0159f003ec5edf03a) to become ready: ResourceNotReady: exceeded wait attempts
Cleanly cancelled builds after being interrupted.
==> amazon-ebs: Waiting for SSH to become available...
==> amazon-ebs: Connected to SSH!
==> amazon-ebs: Provisioning with shell script: /var/folders/__/1bnpyd4d7t31wkxd9vmk2sz9985fpv/T/packer-shell527809468
    amazon-ebs: var []
==> amazon-ebs: Stopping the source instance...
    amazon-ebs: Stopping instance, attempt 1
==> amazon-ebs: Waiting for the instance to stop...
==> amazon-ebs: Creating the AMI: orawlings packer-example
    amazon-ebs: AMI: ami-6ad75f13
==> amazon-ebs: Waiting for AMI to become ready...
==> amazon-ebs: Terminating the source AWS instance...
==> amazon-ebs: Cleaning up any extra volumes...
==> amazon-ebs: No volumes to clean up, skipping
==> amazon-ebs: Deleting temporary security group...
==> amazon-ebs: Deleting temporary keypair...
Build 'amazon-ebs' finished.

==> Builds finished. The artifacts of successful builds are:
--> amazon-ebs: AMIs were created:
eu-west-1: ami-6ad75f13

It is quite confusing to see that it is starting to delete things and clean up while it simultaneously continues to provision the instance.

Here is the corresponding template.

{
  "builders": [{
    "type": "amazon-ebs",
    "source_ami": "ami-acd005d5",
    "instance_type": "t2.micro",
    "ssh_username": "ec2-user",
    "ami_name": "orawlings packer-example",
    "force_deregister": true,
    "force_delete_snapshot": true,
    "shutdown_behavior": "terminate"
  }],
  "provisioners": [
    {
      "type": "shell",
      "inline": [
        "echo var [{{user `variable`}}]"
      ]
    }
  ]
}

EDIT: It is possible that this could be output from a previous packer invocation that I interrupted with ^C interspersed with output from the invocation that encountered the ResourceNotReady error.

@roman-cnd

I'm using the previous version while this is being fixed; it works without any issues.

@cknowles

cknowles commented Jan 2, 2018

@lenfree's permission fix above also fixed this for us. Could the Packer logs indicate in more detail if the correct permissions are not present? There should be an AWS SDK error being received, but it seems like it is being swallowed somewhere.
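
For reference, a minimal sketch (not Packer's actual code; the instance ID is a placeholder) of the DescribeInstanceStatus call the status waiter polls, which is why ec2:DescribeInstanceStatus is now required, and of how the access-denied error code could be surfaced instead of only timing out with "exceeded wait attempts":

```go
package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/awserr"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

func main() {
	sess := session.Must(session.NewSession())
	svc := ec2.New(sess)

	// This is the API call the WaitUntilInstanceStatusOk waiter repeats;
	// if credentials lack ec2:DescribeInstanceStatus it fails every attempt.
	_, err := svc.DescribeInstanceStatus(&ec2.DescribeInstanceStatusInput{
		InstanceIds: []*string{aws.String("i-0123456789abcdef0")}, // placeholder ID
	})
	if err != nil {
		// Inspect the AWS error code rather than letting the waiter retry
		// silently until "ResourceNotReady: exceeded wait attempts".
		if aerr, ok := err.(awserr.Error); ok && aerr.Code() == "UnauthorizedOperation" {
			log.Fatalf("missing ec2:DescribeInstanceStatus permission: %s", aerr.Message())
		}
		log.Fatal(err)
	}
	fmt.Println("instance status checks are visible to these credentials")
}
```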

@et304383

et304383 commented Jan 3, 2018

The wait time for status checks is excruciating and almost never warranted once SSH is available. Please revert or provide a way to disable this unnecessary wait.

@tmaher

tmaher commented Jan 3, 2018

The new, slow “host is ready” check is only necessary for the new KVM-based C5 instances, correct? Could it be used only there?

I use aws-vault (unrelated to hashicorp vault) with role assumption to manage credentials locally. An extra 2-3 minutes might not seem like a lot, but given AWS is “eventually consistent”, I keep running into different time-outs, each requiring its own artisanally crafted knob to increase.

@mwhooker
Contributor

mwhooker commented Jan 8, 2018

There's a PR in #5773 that should solve this; I would appreciate some eyes on it.

@et304383

When is 1.1.4 going to land? This wait time is driving me nuts, but 1.1.2 has its own bugs I'd rather not deal with.

@SwampDragons
Contributor

We're gonna skip 1.1.4 and go straight to 1.2.0 for the next release; we're going to try to get it out in the next two weeks. Sorry for the long lead time between releases this time.

@SwampDragons modified the milestone from v1.1.4 to v1.2.0 on Jan 30, 2018
sihil added a commit to guardian/amigo that referenced this issue Feb 7, 2018
I've not used 1.1.3 here as it is slower. See hashicorp/packer#5705
@diegofduarte

diegofduarte commented Jul 9, 2018

Just reporting since this is still open: the ec2:DescribeInstanceStatus permission saved my life here too... But I couldn't find anything regarding this in the docs.

Hey guys, isn't there a way to make the Packer errors a little bit more meaningful in this case? I just saw something like "an error happened, we're killing your job" instead of an access-denied error or something.

Thanks

@SwampDragons
Contributor

This issue has been closed for a while; can you open a new issue with your problem, please?

@ghost

ghost commented Mar 31, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@hashicorp locked and limited conversation to collaborators on Mar 31, 2020