Accumulating spot instances that are not attached to ASG's #484
Comments
Sample fields from several TerminateInstanceInAutoScalingGroup API calls. The failing calls have null "request" data, while the working ones include a request blob with an instance ID. Attachments: cloudtrail-api-calls-extract.txt, plus scrubbed CloudWatch logs from two Lambda invocations.
Thanks for reporting this issue. I currently only work on issues and feature requests submitted by paying customers. If you installed AutoSpotting from the marketplace, tell me your account number on Slack and I'll look into this. Otherwise I'm going to leave this open just in case someone else is willing to give it a try and contribute a pull request, which I'm more than happy to consider.
I've recently committed a change to the instance replacement logic that should address this. We now essentially reuse the same replacement logic we use for new instances: instead of keeping the spot instance outside the ASG until the grace period elapses, we simply swap instances once every 30 minutes. Let me know if this addresses your issue.
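For illustration, a minimal Python sketch of what such a periodic swap decision might look like. The real project is written in Go and drives the EC2/Auto Scaling APIs; the `plan_swaps` helper, field names, and data shapes here are assumptions, not the project's actual code:

```python
from datetime import datetime, timedelta, timezone

# Assumed cadence from the comment above: swap once every 30 minutes.
SWAP_INTERVAL = timedelta(minutes=30)

def plan_swaps(asg_instances, unattached_spots, now):
    """Pair each unattached spot replacement that has been running for at
    least SWAP_INTERVAL with an on-demand instance still in the ASG,
    returning (on_demand_id, spot_id) swap actions."""
    ready = [s for s in unattached_spots
             if now - s["launch_time"] >= SWAP_INTERVAL]
    on_demand = [i for i in asg_instances if i["lifecycle"] == "on-demand"]
    # One swap per ready spot instance, as long as on-demand capacity remains.
    return [(od["id"], spot["id"]) for od, spot in zip(on_demand, ready)]
```

Keeping the decision pure (no API calls) like this also makes the "young" spot instance case explicit: a replacement launched 5 minutes ago is simply skipped until the next run.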
Awesome! Thanks for the update...
@jjones-smug did you get the chance to test this out?
Issue type
Build number
This is a custom build, but from an as-is clone of the repo, meaning there are no local code modifications in place.
Configuration
We're running this as a Lambda, invoked by CW Events every 2 minutes.
Environment
Summary
We upgraded our "live" ASGs to the latest release of the AutoSpotting code last week, after running it without issue in our "test" environment for several days. But after about 12 hours we ended up with a group of unattached spot instances that were launched, running, and tagged like one of the AutoSpotting-enabled ASGs. In other words, it looked like spot replacements were provisioned and configured, but never got attached to the ASGs.
Steps to reproduce
I don't know what specific steps/events are required to cause "stranded spot instances". We've just noticed that they accumulate over time (and are never terminated).
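As a purely illustrative sketch, the "stranded" condition described above boils down to a set difference between tagged running spot instances and ASG membership. The `find_stranded_spots` helper, the `launched-for-asg` tag key, and all field names below are assumptions for illustration, not AutoSpotting's actual data model:

```python
def find_stranded_spots(instances, asgs):
    """Return ids of running instances that carry the AutoSpotting tag
    but are not attached to any Auto Scaling group."""
    attached = {iid for g in asgs for iid in g["instance_ids"]}
    return [i["id"] for i in instances
            if i["state"] == "running"
            and i["tags"].get("launched-for-asg")  # assumed tag key
            and i["id"] not in attached]
```

A periodic check like this (or its CLI equivalent) is one way to quantify how fast the stranded instances accumulate between Lambda runs.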
Expected results
N/A
Actual results
What I do have is sample output from CloudWatch Logs and CloudTrail API calls that shows something I think is unusual. I'll submit those as attachments, but will also describe what I think they show.
One invocation of the Lambda appears to terminate an OD instance after replacing it with a spot instance. That appears to work fine, and our EC2 logs show the OD instance is terminated by AWS. But a subsequent Lambda invocation ~4 minutes later logs that it doesn't like the state of that same instance and tries to terminate it again. That API call fails, since the instance is no longer running.
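One way to avoid the failing second call is to re-check instance state immediately before terminating, so a stale invocation becomes a no-op. Sketched here in Python with injected callables; `safe_terminate` is a hypothetical helper, not the project's actual code:

```python
def safe_terminate(instance_id, describe_state, terminate):
    """Terminate only if the instance is still pending/running, so a
    later invocation working from a stale view skips the call instead
    of issuing a failing API request. describe_state and terminate are
    injected (e.g. thin wrappers over DescribeInstances and
    TerminateInstances), which keeps the guard itself testable."""
    if describe_state(instance_id) in ("pending", "running"):
        terminate(instance_id)
        return True
    return False
```

There is still a small race between the state check and the terminate call, so the caller would also want to treat "instance not running" API errors as benign rather than fatal.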