feat: add eks_scale_node_group playbook action (#223) #2089
gerardrecinto wants to merge 4 commits into
Adds a new playbook action that increases the maxSize of an EKS managed node group via boto3. Designed as a remediation step when the cluster autoscaler cannot provision nodes because the node group has reached its configured maximum.
- EksNodeGroupParams: cluster_name, region, node_group_name, new_max_size, optional explicit AWS credentials (falls back to instance role/env)
- Guards against no-op updates (new_max_size <= current maxSize)
- Raises ActionException on AWS ClientError for describe or update calls
- Preserves existing minSize and desiredSize during the update
- Adds 6 pytest unit tests covering success, no-op, and error paths

Resolves robusta-dev#223
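The describe, guard, and update flow listed above can be sketched roughly as follows. This is a minimal illustration and not the PR's actual code: the function name `raise_node_group_max` and its return convention are assumptions, and the client is passed in so the sketch runs without AWS access. The two client methods used, `describe_nodegroup` and `update_nodegroup_config`, are the boto3 EKS calls that correspond to the IAM actions named in this PR.

```python
from typing import Any, Dict, Optional


def raise_node_group_max(
    eks_client: Any,
    cluster_name: str,
    node_group_name: str,
    new_max_size: int,
) -> Optional[Dict[str, int]]:
    """Raise maxSize only. Returns the applied scaling config, or None on a no-op.

    Hypothetical helper for illustration; `eks_client` is expected to behave
    like boto3.client("eks").
    """
    response = eks_client.describe_nodegroup(
        clusterName=cluster_name, nodegroupName=node_group_name
    )
    current = response["nodegroup"]["scalingConfig"]

    # Guard: skip the update when the requested maximum is not larger.
    if new_max_size <= current["maxSize"]:
        return None

    # Carry minSize and desiredSize over unchanged; only maxSize moves.
    scaling_config = {
        "minSize": current["minSize"],
        "maxSize": new_max_size,
        "desiredSize": current["desiredSize"],
    }
    eks_client.update_nodegroup_config(
        clusterName=cluster_name,
        nodegroupName=node_group_name,
        scalingConfig=scaling_config,
    )
    return scaling_config
```

Passing the client as an argument is a choice made for this sketch; it makes the no-op and success paths easy to exercise with a stub object.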
🧹 Nitpick comments (1)
playbooks/robusta_playbooks/aws_node_group_actions.py (1)
61-65: ⚡ Quick win: Preserve original exception context when re-raising.
Use explicit exception chaining in both `except ClientError as e` blocks (`raise ... from e`) so the root AWS error remains visible in tracebacks.
Proposed patch:

         except ClientError as e:
             raise ActionException(
                 ErrorCodes.ACTION_UNEXPECTED_ERROR,
                 f"Failed to describe node group '{params.node_group_name}' "
                 f"in cluster '{params.cluster_name}': {e}",
    -        )
    +        ) from e
    @@
         except ClientError as e:
             raise ActionException(
                 ErrorCodes.ACTION_UNEXPECTED_ERROR,
                 f"Failed to update node group '{params.node_group_name}': {e}",
    -        )
    +        ) from e

Also applies to: 94-97
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@playbooks/robusta_playbooks/aws_node_group_actions.py` around lines 61 - 65, The except ClientError as e handlers that raise ActionException should preserve the original exception context by using explicit exception chaining; locate the raise statements that construct ActionException with ErrorCodes.ACTION_UNEXPECTED_ERROR (the blocks referencing params.node_group_name and params.cluster_name and the later similar block at lines 94-97) and change the re-raise to use "raise ActionException(... ) from e" so the original AWS ClientError is retained in the traceback.
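To show what the requested `from e` changes in practice, here is a small self-contained sketch. The two exception classes are hypothetical stand-ins for botocore's `ClientError` and the project's `ActionException`, so the example runs without either dependency; only the chaining behavior itself is standard Python.

```python
class UpstreamError(Exception):
    """Hypothetical stand-in for botocore's ClientError."""


class ActionError(Exception):
    """Hypothetical stand-in for the project's ActionException."""


def describe_or_fail(fail: bool) -> str:
    try:
        if fail:
            raise UpstreamError("AccessDenied")
        return "ok"
    except UpstreamError as e:
        # `from e` records the original error on __cause__, so the traceback
        # prints it as the direct cause of the wrapper exception.
        raise ActionError(f"Failed to describe node group: {e}") from e


if __name__ == "__main__":
    try:
        describe_or_fail(True)
    except ActionError as err:
        print(type(err.__cause__).__name__)
```

Without `from e`, Python still attaches the original error as `__context__` when raising inside an `except` block, but the traceback then describes it as an error that occurred while handling another, instead of as the direct cause.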
📒 Files selected for processing (2)
- playbooks/robusta_playbooks/aws_node_group_actions.py
- tests/test_aws_node_group_actions.py
Preserves original ClientError in tracebacks when re-raising as ActionException, per CodeRabbit review on PR robusta-dev#2089.
Summary
Closes #223
Adds `eks_scale_node_group`, a playbook action that increases the `maxSize` of an EKS managed node group. Intended as a remediation step when the cluster autoscaler is blocked because the node group has reached its configured maximum.

New files:
- `playbooks/robusta_playbooks/aws_node_group_actions.py`: action + params model
- `tests/test_aws_node_group_actions.py`: 6 pytest unit tests

How it works
The action calls `eks:DescribeNodegroup` to read the current scaling config, then calls `eks:UpdateNodegroupConfig` to raise `maxSize`. `minSize` and `desiredSize` are left unchanged.

Params:
- `cluster_name`
- `region` (e.g. `us-east-1`)
- `node_group_name`
- `new_max_size`
- `aws_access_key_id`
- `aws_secret_access_key`

Example playbook config:

Required IAM permissions: `eks:DescribeNodegroup`, `eks:UpdateNodegroupConfig`

Test plan
- `test_scale_up_succeeds`: verifies `update_nodegroup_config` is called with the correct args and a finding is emitted
- `test_no_op_when_new_max_not_larger`: new_max_size < current max → no update, enrichment message returned
- `test_no_op_when_new_max_is_equal`: new_max_size == current max → no update
- `test_raises_on_describe_failure`: `ClientError` on describe → `ActionException` raised, update never called
- `test_raises_on_update_failure`: `ClientError` on update → `ActionException` raised
- `test_boto_client_uses_explicit_credentials`: explicit key/secret passed through to boto3
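The credential fallback exercised by the last test can be sketched as a small helper that builds keyword arguments for `boto3.client`. The helper name `eks_client_kwargs` is hypothetical, and the rule that explicit credentials are used only when both values are present is an assumption of this sketch; the PR does not state how a partial pair is handled.

```python
from typing import Dict, Optional


def eks_client_kwargs(
    region: str,
    aws_access_key_id: Optional[str] = None,
    aws_secret_access_key: Optional[str] = None,
) -> Dict[str, str]:
    """Build kwargs for boto3.client("eks", ...). Hypothetical helper."""
    kwargs: Dict[str, str] = {"region_name": region}
    # Assumption: pass explicit credentials only when both are supplied.
    # Otherwise omit them so boto3 falls back to its default credential
    # chain (environment variables, instance role, and so on).
    if aws_access_key_id and aws_secret_access_key:
        kwargs["aws_access_key_id"] = aws_access_key_id
        kwargs["aws_secret_access_key"] = aws_secret_access_key
    return kwargs
```

A caller would then create the client with something like `boto3.client("eks", **eks_client_kwargs("us-east-1"))`, which keeps the fallback decision in one place that a unit test can assert on without patching boto3.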