Failures due to recoverable AWS errors have returned #1539
Comments
Indeed, I'm seeing these errors too. When running packer build -debug, I've seen these failure modes:
and also, this one, with no detailed error output:
I wonder if it could have to do with the AWS 9/25-9/30 maintenance; unlikely, but possible, I suppose.
This has actually been happening since we upgraded to 0.7.1 from... some older version a few weeks ago. I'll look back at where we upgraded from tomorrow and post here.
We upgraded from 0.5.1 to 0.7.0 on 12 September and these failures began very shortly thereafter.
Thanks, we'll take a look back at this.
We started experiencing this error recently:
It seems that sometimes the security group is not available immediately after creation, so it's worth checking that it has indeed been created.
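For illustration, here is a minimal sketch of that kind of check: poll DescribeSecurityGroups until the newly created group actually shows up, instead of assuming it is visible right away. It uses the modern aws-sdk-go rather than the goamz library Packer relied on at the time, and the region, group ID, poll interval, and timeout are placeholder assumptions, not values from this thread.

```go
package main

import (
	"fmt"
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

// waitForSecurityGroup polls DescribeSecurityGroups until the group is visible
// or the deadline passes. EC2 is eventually consistent, so a group returned by
// CreateSecurityGroup may not show up in describe calls immediately.
func waitForSecurityGroup(svc *ec2.EC2, groupID string, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		out, err := svc.DescribeSecurityGroups(&ec2.DescribeSecurityGroupsInput{
			GroupIds: []*string{aws.String(groupID)},
		})
		if err == nil && len(out.SecurityGroups) > 0 {
			return nil // the group is now visible
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("security group %s not visible after %s: %v", groupID, timeout, err)
		}
		time.Sleep(2 * time.Second) // assumed poll interval
	}
}

func main() {
	sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("us-west-1")}))
	svc := ec2.New(sess)
	// "sg-0123456789abcdef0" is a placeholder ID for illustration only.
	if err := waitForSecurityGroup(svc, "sg-0123456789abcdef0", 2*time.Minute); err != nil {
		fmt.Println(err)
	}
}
```

Polling against a deadline, rather than issuing a single describe call immediately after creation, is what sidesteps the eventual-consistency window.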
With logging enabled:
"Error waiting for AMI" after successful build:
@patricklucas #1533 had a patch intended to solve the problem, but obviously it doesn't work. To my understanding, something is wrong with the error handling in goamz, because no error message is reported.
@renat-sabitov-sirca: actually, since I never saw an update to this issue, I didn't realize a fix had been applied. We are still running 0.7.0, so I'll upgrade to 0.7.2 and see if there is improvement.
@patricklucas To be clear, I tried that patch and it didn't work. See the last logs in #1533.
👍 We frequently run into "Error finding source instance" errors, as the AWS API is sometimes slow to return recently created resources. I'm surprised more users aren't experiencing this.
Got a new one today: 'Error modify AMI attributes'.
We're seeing this "Error finding source instance" issue as well. @mitchellh is there anything I can do in terms of gathering debug info that would be helpful in diagnosing?
Relates to hashicorp#1539. AWS is eventually consistent and an instance may not be visible for some time after creation. This fix eliminates the describe-instances call before going to the proper wait loop.
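The idea behind that fix, roughly, is that "instance not visible yet" should be treated as a retriable condition rather than a fatal error. Below is a function-level sketch of such a wait loop, written against the modern aws-sdk-go purely for illustration (the actual patch touched goamz); the error-code check, attempt count, and interval are assumptions.

```go
package ec2wait

import (
	"errors"
	"fmt"
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/awserr"
	"github.com/aws/aws-sdk-go/service/ec2"
)

// waitForInstance describes a freshly launched instance, treating the
// "InvalidInstanceID.NotFound" error as "not visible yet" rather than fatal,
// since DescribeInstances can lag behind RunInstances.
func waitForInstance(svc *ec2.EC2, instanceID string, attempts int, interval time.Duration) (*ec2.Instance, error) {
	for i := 0; i < attempts; i++ {
		out, err := svc.DescribeInstances(&ec2.DescribeInstancesInput{
			InstanceIds: []*string{aws.String(instanceID)},
		})
		if err != nil {
			var aerr awserr.Error
			if errors.As(err, &aerr) && aerr.Code() == "InvalidInstanceID.NotFound" {
				time.Sleep(interval) // not an error yet: keep waiting
				continue
			}
			return nil, err // anything else is a real failure
		}
		if len(out.Reservations) > 0 && len(out.Reservations[0].Instances) > 0 {
			return out.Reservations[0].Instances[0], nil
		}
		time.Sleep(interval)
	}
	return nil, fmt.Errorf("instance %s never became visible", instanceID)
}
```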
Seeing this too. I have a Jenkins job that will build about a dozen AMIs concurrently. FWIW, I've seen this behavior with it as well.
Hi Rich, try my last two patches to Packer and goamz; we stopped seeing errors. Note:
@renat-sabitov-sirca I saw that and it looks good to me. Just not sure I want to start running a forked version...
👍 I've seen this a few times lately regarding instances.
Seeing this several times this week (using v0.7.5).
We see this several times a week too (using v0.7.2). Is there anything that we can do to help fix or identify the problem better?
Just wanted to add another comment saying that we're seeing this on a very regular basis as well. Let us know if there is any additional information that would make tracking this down easier.
By chance, have you tried running Packer in a different AWS region?
@timurb we've seen this in both us-west-1 and us-west-2.
I've encountered it regularly in ap-southeast-2 as well.
Ours have been across us-east-1 and us-west-2; we haven't run in other regions.
We are experiencing this in us-west-1 inside a VPC with v0.7.5. This was also referenced in
Assuming this is the same issue, we've been seeing this frequently with v0.7.5 in us-west-1:
To add some specific frequency data: over the past 5 days, we've triggered jobs in our deployment pipeline 44 times, 11 of which have failed due to this "Error finding source instance" error. I'm hoping someone has a decent workaround soon.
Getting this error in ap-southeast-2; annoying, as it's sporadic.
+1, also seeing this error. We are running a custom build of Packer with this branch merged in: #1764. It resolves the errors it addresses, but we have now moved on to the (occasional) error tracked by this issue.
@mgcleveland Try also this patch: mitchellh/goamz#180. It fixes retry bugs and increases retry counts in goamz, which is the root cause of the failures on transient AWS errors.
Yep, getting these, too. It's not just a packer problem, as Amazon spews these pretty frequently in other contexts, but this class of error really should trigger a backoff and retry. Yeah, I know that's easier said than done, since I need to build it into my own tooling… :-P
I just got bitten by this too. I'm not sure I'd say this "isn't a packer problem": it's certainly an issue with either Packer or the underlying library it uses. For better or worse, 503s aren't uncommon from AWS, and it's up to the tool to perform a retry (ideally with exponential backoff) before throwing an error.
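For what it's worth, here is a minimal sketch of that retry-with-exponential-backoff pattern in Go. The isTransient heuristic, attempt limit, and delays are illustrative assumptions, not what Packer or goamz actually implement.

```go
package main

import (
	"fmt"
	"math/rand"
	"strings"
	"time"
)

// isTransient is a crude, illustrative check for errors worth retrying
// (throttling and 5xx-style responses from AWS).
func isTransient(err error) bool {
	if err == nil {
		return false
	}
	msg := err.Error()
	return strings.Contains(msg, "503") ||
		strings.Contains(msg, "RequestLimitExceeded") ||
		strings.Contains(msg, "Unavailable")
}

// retryWithBackoff retries op up to maxAttempts times, doubling the delay
// (plus a little jitter) after each transient failure.
func retryWithBackoff(maxAttempts int, op func() error) error {
	delay := 500 * time.Millisecond
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = op(); err == nil || !isTransient(err) {
			return err
		}
		jitter := time.Duration(rand.Int63n(int64(delay) / 2))
		time.Sleep(delay + jitter)
		delay *= 2
	}
	return fmt.Errorf("giving up after %d attempts: %w", maxAttempts, err)
}

func main() {
	// Stand-in for a flaky AWS API call that sometimes returns a 503.
	err := retryWithBackoff(5, func() error {
		return fmt.Errorf("503 Service Unavailable")
	})
	fmt.Println(err)
}
```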
I was hitting timeouts on every run of amazon-ebs even after pre-making a security group. Patching goamz with mitchellh/goamz#180 made it work first try. Thanks for the fix, @renat-sabitov-sirca.
To add some observations to this, working with the really basic 'example.json' from the starter guide, Packer emits:
==> amazon-ebs: Inspecting the source AMI...
[...]
==> Some builds didn't complete successfully and had errors:
==> Builds finished but no artifacts were created.
Hopefully that adds something useful. I am using Packer v0.7.5 from Homebrew on OS X.
+1. I'd love to see this merged and in a new release. |
+1. Any word on when/if the pull request above is going to get merged in?
And failing is fine, but the script needs to clean up after itself. |
+1 also very keen to hear if there is a plan to fix this |
This should be fixed with our switch to the official AWS library that does this automatically. |
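For context, a minimal sketch of how that looks with the official SDK (aws-sdk-go): its default retryer already retries throttling and 5xx responses, and the ceiling can be raised via MaxRetries. The region and retry count below are illustrative assumptions, not values from Packer's configuration.

```go
package main

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

func main() {
	// The official SDK retries throttled and 5xx requests by default;
	// MaxRetries just raises the ceiling for transient failures.
	sess := session.Must(session.NewSession(&aws.Config{
		Region:     aws.String("us-east-1"),
		MaxRetries: aws.Int(10),
	}))
	_ = ec2.New(sess)
}
```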
@mitchellh can you add a link to the relevant issue or PR or commit? |
Updated to 0.8.1 and am getting an intermittent failure on some of our AMI builds that seems related to this: it occasionally fails to create a security group. See logs below.
@richard-hulm Sounds like an exponential back-off should be used for that query, in a similar fashion to this original issue with the instance IDs, since EC2's back-end is eventually consistent. The switch away from goamz may not have helped in that regard, but I don't know enough about that code to confirm.
A workaround could be to specify your own security group.
Most of these were fixed up last fall after I opened #574 and #668, but a number of these transient AWS errors have returned, causing a steady stream of builds to fail.
For example, we are again seeing Packer fail after it can't find the instance immediately after launch:
That one is the most frequent, but we also occasionally see IO timeouts as well:
IIRC, these all used to be covered by retries, so perhaps it is a regression.