
Accumulating spot instances that are not attached to ASG's #484

Closed
jjones-smug opened this issue Mar 28, 2022 · 5 comments

Comments

@jjones-smug
Contributor


Issue type

  • Bug Report

Build number

This is a custom build, but from an as-is clone of the repo, meaning there are no local code modifications in place.

Configuration

We're running this as a Lambda, invoked by CW Events every 2 minutes.

Environment

  • us-east-1
  • VPC
  • Anonymized launch configuration: I don't know what this means?

Summary

We upgraded our "live" ASGs to the latest release of the AutoSpotting code last week, after running it without issue in our "test" environment for several days. But after about 12 hours we wound up with a group of unattached spot instances that had been launched, were running, and were tagged like one of the AutoSpotting-enabled ASGs. In other words, it looked like spot replacements were provisioned and configured but never ended up getting attached to the ASGs.

Steps to reproduce

I don't know what specific steps/events are required to cause "stranded spot instances". We've just noticed that they accumulate over time (and are never terminated).

Expected results

N/A

Actual results

What I do have is sample output from CloudWatch Logs and CloudTrail API calls that shows something I think is unusual. I'll submit those as attachments, but will also describe what I think they show.

One invocation of the Lambda appears to terminate an OD instance after replacing it with a spot instance. That appears to work fine, and our EC2 logs show the OD instance being terminated by AWS. But a subsequent Lambda invocation ~4 minutes later logs that it doesn't like the state of that same instance and tries to terminate it again. That API call fails, since the instance is no longer running.
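The race described above could be avoided with a state guard before the termination call. A minimal sketch of such a guard, in Python with purely hypothetical names (AutoSpotting itself is written in Go, and this is not its actual logic):

```python
# Hypothetical idempotency guard: skip instances that an earlier
# Lambda invocation already terminated, instead of retrying the
# TerminateInstanceInAutoScalingGroup call and getting an API error.

# EC2 states in which a second terminate call is pointless.
TERMINAL_STATES = {"shutting-down", "terminated"}

def should_terminate(instance_state: str) -> bool:
    """Only act on instances that are still alive."""
    return instance_state not in TERMINAL_STATES

def safe_terminate(instance_id: str, instance_state: str, terminate_fn):
    """Invoke terminate_fn once; later invocations become no-ops."""
    if not should_terminate(instance_state):
        return f"skipped {instance_id} (state: {instance_state})"
    return terminate_fn(instance_id)

# Simulate two Lambda invocations ~4 minutes apart acting on the
# same on-demand instance:
log = []
terminate = lambda iid: (log.append(iid), f"terminated {iid}")[1]
first = safe_terminate("i-0abc123", "running", terminate)
second = safe_terminate("i-0abc123", "terminated", terminate)
print(first)   # → terminated i-0abc123
print(second)  # → skipped i-0abc123 (state: terminated)
```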

@jjones-smug
Contributor Author

Sample fields from several TerminateInstanceInAutoScalingGroup API calls. The ones that are failing have null "request" data; the ones that work have a request blob with an instance id.

cloudtrail-api-calls-extract.txt
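The distinction in the attachment can be expressed as a short filter over CloudTrail records. The sample events below are fabricated placeholders following the standard CloudTrail schema (`eventName`, `requestParameters`); only the null-vs-populated request-blob split is taken from the description above:

```python
# Split TerminateInstanceInAutoScalingGroup CloudTrail records by
# whether the call carried a populated request blob. Sample data is
# fabricated for illustration.
records = [
    {"eventName": "TerminateInstanceInAutoScalingGroup",
     "requestParameters": {"instanceId": "i-0aaa111"}},
    {"eventName": "TerminateInstanceInAutoScalingGroup",
     "requestParameters": None},  # call with no request blob
]

terminations = [r for r in records
                if r["eventName"] == "TerminateInstanceInAutoScalingGroup"]
null_request = [r for r in terminations if r["requestParameters"] is None]
with_request = [r for r in terminations if r["requestParameters"]]

print(len(null_request), len(with_request))  # → 1 1
```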

Scrubbed CloudWatch logs from two Lambda invocations.

cloudwatch-logs-extract.txt

@cristim
Member

cristim commented Mar 29, 2022

Thanks for reporting this issue.

I currently only work on issues and feature requests submitted by paying customers.

If you installed AutoSpotting from the marketplace, tell me your account number on Slack and I'll look into this.

Otherwise I'm going to leave this open just in case someone else is willing to give it a try and contribute a pull request, which I'm more than happy to consider.

@cristim
Member

cristim commented May 19, 2022

I've recently committed a change to the instance replacement logic that should address this.

We now essentially use the same replacement logic we use for new instances: instead of waiting out the grace period with the Spot instance outside the ASG, we just swap the instances once every 30 minutes.
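As a rough illustration of that cadence (hypothetical names, not the actual Go implementation), the replacement run becomes a simple 30-minute rate limit per group instead of a grace-period wait:

```python
# Sketch: swap at most one instance per 30-minute window per ASG,
# tracked by the timestamp of the last swap. Names are illustrative.
SWAP_INTERVAL = 30 * 60  # seconds

def due_for_swap(last_swap: float, now: float) -> bool:
    """True once at least 30 minutes have passed since the last swap."""
    return now - last_swap >= SWAP_INTERVAL

def maybe_swap(asg: dict, now: float) -> bool:
    """Swap an on-demand instance for a spot one, if the window allows."""
    if not due_for_swap(asg["last_swap"], now):
        return False
    asg["last_swap"] = now
    return True

asg = {"name": "live-asg", "last_swap": 0.0}
print(maybe_swap(asg, 1800.0))  # → True  (first 30-minute window elapsed)
print(maybe_swap(asg, 2700.0))  # → False (only 15 minutes since last swap)
print(maybe_swap(asg, 3600.0))  # → True  (30 minutes since last swap)
```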

Let me know if this addresses your issues.

@cristim cristim closed this as completed May 19, 2022
@jjones-smug
Contributor Author

Awesome! Thanks for the update...

@cristim
Member

cristim commented Jul 27, 2022

@jjones-smug did you get the chance to test this out?
