CNTRLPLANE-3203: Add autoscaling documentation for self-managed Azure #8239
Pipeline controller notification

For optional jobs, comment …

This repository is configured in: LGTM mode
@bryan-cox: This pull request references CNTRLPLANE-3203, which is a valid Jira issue.

Warning: The referenced Jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.22.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
[APPROVALNOTIFIER] This PR is APPROVED.

This pull request has been approved by: bryan-cox

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing
Force-pushed from a3bec33 to 9467952.
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the …
📝 Walkthrough

This pull request adds documentation for configuring autoscaling for HostedClusters: a general autoscaling guide and an Azure self-managed-specific guide. It documents NodePool-level …

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Scheduler
    participant ClusterAutoscaler
    participant HostedClusterAPI
    participant NodePoolController
    participant CAPZ as CAPZ/Azure
    participant Azure
    Scheduler->>ClusterAutoscaler: Pod pending (no nodes)
    ClusterAutoscaler->>ClusterAutoscaler: Evaluate NodePools and HostedCluster autoscaling config
    ClusterAutoscaler->>HostedClusterAPI: Request scale-up (select NodePool, desired replicas)
    HostedClusterAPI->>NodePoolController: Reconcile NodePool replicas/scale request
    NodePoolController->>CAPZ: Create Machines/VMs for NodePool
    CAPZ->>Azure: Provision VMs
    Azure-->>CAPZ: VM(s) ready
    CAPZ-->>NodePoolController: Machines registered
    NodePoolController-->>HostedClusterAPI: NodePool reports new nodes
    HostedClusterAPI-->>ClusterAutoscaler: Scale-up observed (nodes available)
    ClusterAutoscaler-->>Scheduler: Pods scheduled onto new nodes
```

```mermaid
sequenceDiagram
    participant Metrics as Metrics/Controller
    participant ClusterAutoscaler
    participant HostedClusterAPI
    participant NodePoolController
    participant CAPZ as CAPZ/Azure
    participant Azure
    Metrics->>ClusterAutoscaler: Nodes underutilized (scale-down eligible)
    ClusterAutoscaler->>ClusterAutoscaler: Evaluate scale-down rules, balancing, and timing
    opt Scale-down enabled
        ClusterAutoscaler->>HostedClusterAPI: Request scale-down (reduce replicas)
        HostedClusterAPI->>NodePoolController: Reconcile NodePool replicas/scale request
        NodePoolController->>CAPZ: Delete Machines/VMs
        CAPZ->>Azure: Deallocate/delete VMs
        Azure-->>CAPZ: VMs deleted
        CAPZ-->>NodePoolController: Machines removed
        NodePoolController-->>HostedClusterAPI: NodePool reports fewer nodes
        HostedClusterAPI-->>ClusterAutoscaler: Scale-down observed
    end
```
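Both flows end with the NodePool's replica count moving within its `autoScaling` bounds. As a minimal sketch of only that last step, assuming the autoscaler has already decided how many nodes to add or remove, the hypothetical `nextReplicas` helper below clamps the result to the NodePool's min/max. The `NodePoolAutoScaling` type is an illustration, not HyperShift's API type; the real autoscaler also weighs pending pod requests and instance sizes before it chooses a delta.

```go
package main

// NodePoolAutoScaling is a hypothetical stand-in for a NodePool's
// autoScaling bounds; it is not HyperShift's API type.
type NodePoolAutoScaling struct {
	Min int32
	Max int32
}

// nextReplicas applies a requested replica delta (positive for scale-up,
// negative for scale-down) and clamps the result to [Min, Max], so the
// NodePool never leaves its configured bounds.
func nextReplicas(current, delta int32, bounds NodePoolAutoScaling) int32 {
	desired := current + delta
	if desired > bounds.Max {
		return bounds.Max
	}
	if desired < bounds.Min {
		return bounds.Min
	}
	return desired
}
```

For example, with bounds `{Min: 1, Max: 5}`, a request to add three nodes to a pool of four stops at five.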
🚥 Pre-merge checks: ✅ 10 passed
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/content/how-to/azure/autoscaling-self-managed.md`:
- Around line 98-108: Remove the entire "Expanders" section including the
"Expanders" heading, the table describing `LeastWaste`, `Priority`, and
`Random`, the reference to `spec.autoscaling.expanders`, and the "Default:
[Priority, LeastWaste]" line; ensure no leftover mentions of "Expanders" or
`spec.autoscaling.expanders` remain in the document since the ClusterAutoscaling
API and HyperShift on Azure do not support configurable expander strategies.
- Around line 122-127: Remove the incorrect `maxFreeDifferenceRatioPercent`
table row from the Node Group Balancing section because that field does not
exist on the ClusterAutoscaling API; update the markdown table to only include
`balancingIgnoredLabels` (and any Platform-specific note) so the docs match the
actual ClusterAutoscaling schema and API.
- Around line 88-96: The "Scaling Behavior" table documents fields that don't
exist on the ClusterAutoscaling API; remove the four unsupported rows
(`maxNodesTotal`, `maxPodGracePeriod`, `maxNodeProvisionTime`,
`podPriorityThreshold`) and leave only the `scaling` row in the table so the doc
matches the actual ClusterAutoscaling fields; update any surrounding text
referencing those removed fields to avoid dangling mentions.
- Around line 182-188: Step 2 incorrectly references a non-existent "configured
expander strategy"; update the text so the autoscaler selects a suitable
NodePool based on capacity and scheduling constraints (not an expander) and then
triggers a scale-up by increasing the NodePool replica count—leave references to
NodePool, autoscaler, HyperShift, scaling/ScaleUpAndScaleDown,
utilizationThresholdPercent, and unneededDurationSeconds intact and only remove
the expander strategy wording.
- Around line 129-180: Remove the unsupported autoscaling fields from the
HostedCluster example: delete spec.autoscaling.maxNodesTotal,
spec.autoscaling.expanders (the Random expander list), and
spec.autoscaling.maxFreeDifferenceRatioPercent; keep the remaining autoscaling
entries (scaling, scaleDown, balancingIgnoredLabels) and leave the two NodePool
definitions unchanged (my-cluster-nodepool-1 and my-cluster-nodepool-2).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: 5e66a73a-b4f8-441b-ab75-4055cf0fa5a8
📒 Files selected for processing (3)
- docs/content/how-to/azure/autoscaling-self-managed.md
- docs/content/how-to/azure/self-managed-azure-index.md
- docs/mkdocs.yml
Codecov Report: ✅ All modified and coverable lines are covered by tests.

Coverage diff:

| | main | #8239 | +/- |
|---|---|---|---|
| Coverage | 34.65% | 34.65% | |
| Files | 767 | 767 | |
| Lines | 93263 | 93263 | |
| Hits | 32318 | 32318 | |
| Misses | 58266 | 58266 | |
| Partials | 2679 | 2679 | |
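Codecov computes the headline coverage as hits over all tracked lines, counting partially covered lines as not covered; with the line counts from this report that gives:

$$
\text{coverage} = \frac{\text{hits}}{\text{hits} + \text{misses} + \text{partials}} = \frac{32318}{32318 + 58266 + 2679} = \frac{32318}{93263} \approx 34.65\%
$$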
Force-pushed from 9467952 to 81fba61.
Force-pushed from afccc01 to c9b5279.
@bryan-cox: This pull request references CNTRLPLANE-3203, which is a valid Jira issue.

Warning: The referenced Jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "5.0.0" version, but no target version was set.
Force-pushed from c9b5279 to 404fab6.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/content/how-to/autoscaling.md`:
- Around line 25-28: The table entry for the autoscaler `min` constraint is
incorrect; update the docs table row describing `min` to state that `min` can be
0 for the AWS platform (supporting scale-from-zero) but must be >= 1 for all
other platforms, matching the API validation in
api/hypershift/v1beta1/nodepool_types.go (see the `min` validation rule) and
retain the existing scale-from-zero note; ensure the wording explicitly calls
out the platform-specific behavior for AWS vs non-AWS platforms.
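The platform-specific rule this comment asks the docs to state can be summarized as a small check. This is a hypothetical Go mirror of the behavior described above, not the actual validation in `api/hypershift/v1beta1/nodepool_types.go`; the platform string `"AWS"` and the extra `max >= min` check are assumptions added for illustration.

```go
package main

import "fmt"

// validateAutoScaling is a hypothetical mirror of the NodePool min rule:
// min may be 0 (scale from zero) only on AWS; every other platform needs
// min >= 1. The max >= min check is an assumed extra for completeness.
func validateAutoScaling(platform string, minReplicas, maxReplicas int32) error {
	floor := int32(1)
	if platform == "AWS" {
		floor = 0 // scale-from-zero is AWS-only per the review comment
	}
	if minReplicas < floor {
		return fmt.Errorf("autoScaling.min must be >= %d on platform %q, got %d", floor, platform, minReplicas)
	}
	if maxReplicas < minReplicas {
		return fmt.Errorf("autoScaling.max (%d) must be >= autoScaling.min (%d)", maxReplicas, minReplicas)
	}
	return nil
}
```

Under this sketch, a self-managed Azure NodePool with `min: 0` is rejected, while the same spec on AWS is accepted.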
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: 6bf5244a-ffbe-4621-85da-41dbf8cb11a5
⛔ Files ignored due to path filters (1)
`docs/content/reference/aggregated-docs.md` is excluded by `!docs/content/reference/aggregated-docs.md`
📒 Files selected for processing (4)
- docs/content/how-to/autoscaling.md
- docs/content/how-to/azure/autoscaling-self-managed.md
- docs/content/how-to/azure/self-managed-azure-index.md
- docs/mkdocs.yml
✅ Files skipped from review due to trivial changes (3)
- docs/content/how-to/azure/self-managed-azure-index.md
- docs/mkdocs.yml
- docs/content/how-to/azure/autoscaling-self-managed.md
Force-pushed from 404fab6 to f8e5bcd.
🧹 Nitpick comments (1)
docs/content/how-to/autoscaling.md (1)
61-62: Clarify scope: cluster autoscaling doesn't act on fixed-replica NodePools. Line 61 currently reads as “all NodePools,” which can be interpreted as including NodePools without `spec.autoScaling`. Consider tightening the wording to avoid confusion.

Suggested wording tweak:

Before: Cluster autoscaling configures global autoscaling behavior that applies to all NodePools in a HostedCluster. This includes scale-down policies, node group balancing, and expander strategies.

After: Cluster autoscaling configures global autoscaling behavior for autoscaling-enabled NodePools in a HostedCluster. This includes scale-down policies, node group balancing, and expander strategies.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@docs/content/how-to/autoscaling.md` around lines 61-62, change the wording that claims "applies to all NodePools" to clarify scope: state that cluster autoscaling configures global behavior for NodePools that use autoscaling and does not affect NodePools with a fixed replica count (i.e., those lacking spec.autoScaling). Update the sentence referencing "all NodePools" to explicitly mention "NodePools with spec.autoScaling enabled (it does not act on NodePools without spec.autoScaling / fixed-replica NodePools)" so readers understand the limitation.
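To make the scope concrete: cluster-wide autoscaling settings only come into play for NodePools that set `spec.autoScaling`, and a NodePool with a fixed replica count is left alone. The sketch below expresses that filter with hypothetical, trimmed-down `NodePool` and `AutoScaling` types, where a nil `AutoScaling` stands in for an unset `spec.autoScaling`; it is not HyperShift's API.

```go
package main

// AutoScaling holds the min/max bounds of an autoscaling-enabled NodePool.
type AutoScaling struct {
	Min, Max int32
}

// NodePool is a hypothetical, trimmed-down NodePool: AutoScaling is nil
// for fixed-replica pools, mirroring an unset spec.autoScaling.
type NodePool struct {
	Name        string
	Replicas    *int32
	AutoScaling *AutoScaling
}

// autoscaledPools returns the names of NodePools the cluster autoscaler
// would manage, i.e. only those with spec.autoScaling set.
func autoscaledPools(pools []NodePool) []string {
	var names []string
	for _, p := range pools {
		if p.AutoScaling != nil {
			names = append(names, p.Name)
		}
	}
	return names
}
```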
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: 4984fc07-c4ee-45ce-970f-6d0943624549
⛔ Files ignored due to path filters (1)
`docs/content/reference/aggregated-docs.md` is excluded by `!docs/content/reference/aggregated-docs.md`
📒 Files selected for processing (4)
- docs/content/how-to/autoscaling.md
- docs/content/how-to/azure/autoscaling-self-managed.md
- docs/content/how-to/azure/self-managed-azure-index.md
- docs/mkdocs.yml
✅ Files skipped from review due to trivial changes (3)
- docs/mkdocs.yml
- docs/content/how-to/azure/self-managed-azure-index.md
- docs/content/how-to/azure/autoscaling-self-managed.md
csrwng left a comment
Just one question, otherwise lgtm
| Expander | Description |
|----------|-------------|
| `LeastWaste` | Selects the NodePool with the least idle CPU and memory after scaling. |
| `Priority` | Selects the NodePool with the highest user-defined priority. |
How do you configure priority for a nodepool?
The Priority expander uses the upstream cluster autoscaler's ConfigMap-based mechanism. Users create a ConfigMap named cluster-autoscaler-priority-expander in the kube-system namespace of the guest cluster, mapping integer priorities to node group name patterns (regex). HyperShift doesn't expose a dedicated API field for this on NodePool — it relies on the upstream behavior. Added a note to the docs clarifying this.
AI-assisted response via Claude Code
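The reply above describes the upstream ConfigMap mechanism: priorities map integers to regular expressions over node group names, and the highest priority with a matching node group wins. A minimal Go sketch of that selection rule, assuming the `priorities` data has already been parsed into a map, is below. `pickByPriority` is a hypothetical helper; it returns every node group tied at the winning priority and leaves the final pick to a tie-breaker. Patterns are matched unanchored with Go's `regexp` (upstream's exact anchoring is not confirmed here), and the `my-cluster-nodepool-*` names are placeholders, since how HyperShift names node groups for the autoscaler is not covered in this thread.

```go
package main

import (
	"regexp"
	"sort"
)

// pickByPriority returns the node groups matching the highest priority
// level that has at least one match. Invalid patterns are skipped.
func pickByPriority(priorities map[int][]string, nodeGroups []string) []string {
	levels := make([]int, 0, len(priorities))
	for p := range priorities {
		levels = append(levels, p)
	}
	// Highest priority first.
	sort.Sort(sort.Reverse(sort.IntSlice(levels)))

	for _, p := range levels {
		var res []*regexp.Regexp
		for _, pattern := range priorities[p] {
			if re, err := regexp.Compile(pattern); err == nil {
				res = append(res, re)
			}
		}
		var matched []string
		for _, ng := range nodeGroups {
			for _, re := range res {
				if re.MatchString(ng) {
					matched = append(matched, ng)
					break
				}
			}
		}
		if len(matched) > 0 {
			return matched
		}
	}
	return nil
}
```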
Force-pushed from f8e5bcd to 1ff74c3.

Force-pushed from 1ff74c3 to 17fc2da.
… Azure

Add documentation covering node pool and cluster autoscaling configuration for self-managed Azure HostedClusters, including NodePool autoScaling, ClusterAutoscaling with scale-down policies, expander strategies, and node group balancing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
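Node group balancing, mentioned in the commit message, uses `balancingIgnoredLabels` so that node groups differing only in certain labels can still be treated as similar. The hypothetical `similarLabels` function below sketches only the label comparison, assuming similarity means identical labels once the ignored keys are dropped; upstream balancing also compares node capacity and allocatable resources, which this sketch omits. The label key `example.com/pool` is a placeholder.

```go
package main

// similarLabels reports whether two node groups' labels are identical
// after dropping every key listed in ignored (balancingIgnoredLabels).
func similarLabels(a, b map[string]string, ignored []string) bool {
	skip := make(map[string]bool, len(ignored))
	for _, k := range ignored {
		skip[k] = true
	}
	// Every non-ignored label in a must exist in b with the same value.
	for k, v := range a {
		if skip[k] {
			continue
		}
		if bv, ok := b[k]; !ok || bv != v {
			return false
		}
	}
	// b must not carry extra non-ignored labels that a lacks.
	for k := range b {
		if skip[k] {
			continue
		}
		if _, ok := a[k]; !ok {
			return false
		}
	}
	return true
}
```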
Force-pushed from 17fc2da to 61e94da.
/lgtm
Pipeline controller notification

No second-stage tests were triggered for this PR. This can happen when:

Use …
/verified by @bryan-cox
@bryan-cox: This PR has been marked as verified by …
@bryan-cox: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Summary

… `autoScaling` (min/max), HostedCluster `autoscaling` (scale-down, expanders, balancing), monitoring, and limitations

Context

Autoscaling is validated in CI via `TestAutoscaling` on the `e2e-azure-self-managed` job (PR #77597), but no user-facing documentation existed for configuring it on self-managed Azure.

Test plan

`mkdocs serve`

🤖 Generated with Claude Code

Summary by CodeRabbit