Skip to content

Conversation

@andrew-nowak
Copy link
Member

@andrew-nowak andrew-nowak commented Oct 6, 2025

Allows rotating nodes between two autoscaling groups, to cater for the occasion where the cluster needs relocating. It's possible to do the migration manually, but if we've got an automated system that does 90% of what we need, why not make use of it for the final 10% too?

It works by checking for an autoscaling group with the tag gu:riffraff:new-asg - if found, presume that the node should be rotated into that asg instead of whichever one it's currently located in. The only major difference here is that if they are in different asgs, we create new instances by increasing the desired capacity in that asg, rather than detaching the instance to be rotated out. (If we are not rotating between ASGs, we continue with the existing behaviour)

@andrew-nowak andrew-nowak force-pushed the an/rotate-between-asgs branch 2 times, most recently from 89116a9 to e15c259 Compare October 8, 2025 14:04
@andrew-nowak andrew-nowak marked this pull request as ready for review October 8, 2025 16:22
@andrew-nowak andrew-nowak requested a review from a team as a code owner October 8, 2025 16:22
Copy link

@SHession SHession left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warning

I don't know this codebase at all! I have also not tested these changes. This +1 is not legally binding.

I've reviewed the code changes and they all seem straightforward and sensible.

I suppose this wouldn't work so well with a setup already featuring multiple ASGs. It may be worth mentioning this in the README.

One other thoughts, is the gu:riffraff:new-asg and existing tag? Is there an argument for making it gu:elasticsearch:new-asg ?

@andrew-nowak
Copy link
Member Author

Yep good point about anyone running with multiple ASGs normally, I'll mention that (though as long as at most one ASG was tagged as a "new ASG" I think I'd expect it to still work for that use case, but I haven't and wouldn't be able to test that)

I've selected that tag as it's the one that is expected by riff-raff for the same scenario - but I should mention that too.

@SHession
Copy link

SHession commented Oct 9, 2025

Yep good point about anyone running with multiple ASGs normally, I'll mention that (though as long as at most one ASG was tagged as a "new ASG" I think I'd expect it to still work for that use case, but I haven't and wouldn't be able to test that)

The problem I was anticipating (with Ophan in mind) is that, if your ES cluster has multiple ASGs, you would presumably want to migrate each of those ASGs to a new one. IIRC the rotation lambda will simply picked the oldest node in the cluster regardless of ASG, so you could end up with nodes being moved into an inappropriate ASG e.g. A warm node moving to a Cold node ASG.

@andrew-nowak andrew-nowak force-pushed the an/rotate-between-asgs branch 2 times, most recently from d5e120b to 472a377 Compare October 9, 2025 13:51
@andrew-nowak
Copy link
Member Author

chatted offline with @SHession which helped me realise that I wasn't properly validating that the discovered "new" ASG was the updated equivalent of the "old" ASG, which the target node had previously been a member of. I've added that validation, and I've also added a suite of tests for the logic in getTargetNode (AI generated and human curated/corrected)

@andrew-nowak andrew-nowak requested a review from Copilot October 9, 2025 14:07
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR enables automated node rotation between different autoscaling groups (ASGs) by introducing logic to detect and use a new target ASG tagged with gu:riffraff:new-asg. This allows for automated cluster migration when nodes need to be relocated to a different ASG.

  • Adds support for cross-ASG node rotation using the gu:riffraff:new-asg tag
  • Updates the data flow to use destinationAsgName instead of asgName to distinguish between source and destination ASGs
  • Implements new instance creation logic that scales up the destination ASG when rotating between different ASGs

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
test/getTargetNode.test.ts Comprehensive test coverage for new cross-ASG rotation functionality
src/utils/handlerInputs.ts Updates interface to use destinationAsgName with documentation
src/getTargetNode.ts Core logic for detecting new ASG and matching tags between source and destination
src/reattachTargetInstance.ts Skip reattachment when rotating between different ASGs
src/elasticsearch/types.ts Type annotation fix for constructor parameter
src/clusterSizeCheck.ts Updated to use destinationAsgName
src/aws/autoscaling.ts New function for launching instances with capacity scaling logic
src/autoScalingGroupCheck.ts Updated to use destinationAsgName
src/addNode.ts Updated function call and property names
cloudformation.yaml Added required IAM permission for SetDesiredCapacity
README.md Documentation for the new cross-ASG rotation feature

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Copy link
Contributor

@twrichards twrichards left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome work - the tests look great too!

@andrew-nowak andrew-nowak force-pushed the an/rotate-between-asgs branch from 0be65b2 to 00c7b7a Compare October 14, 2025 11:43
@andrew-nowak andrew-nowak merged commit ca23f0b into main Oct 14, 2025
3 checks passed
@andrew-nowak andrew-nowak deleted the an/rotate-between-asgs branch October 14, 2025 11:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants