SPLAT-2747: Updated kube cloud config controller to react to feature gate updates#489
Conversation
|
@vr4manta: This pull request references SPLAT-2747 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughReorders operator startup to start informers, run the feature-gate accessor, and block up to 1 minute for initial feature gates before constructing controllers. The kube-cloud-config controller now stores a ChangesFeature Gate Synchronization & Startup Sequencing
sequenceDiagram
participant Startup as Startup Flow
participant Informer as Informers
participant FGGate as FeatureGateAccessor
participant Controller as Controllers
Startup->>Informer: Start dynamic/config/kube informers
Informer-->>Startup: Informers running
Startup->>FGGate: Start feature-gate accessor (Run)
FGGate->>FGGate: Await InitialFeatureGatesObserved (<=1m)
alt Initial observed
FGGate-->>Startup: Initial gates available
else Timeout
FGGate-->>Startup: Return timeout error
end
Startup->>Controller: Construct feature-gate–dependent controllers (use accessor)
Controller->>FGGate: Query CurrentFeatureGates on demand
FGGate-->>Controller: Return latest gates
Controller->>Controller: isFeatureGateEnabled re-fetches gates and logs result
Estimated Code Review Effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRsSuggested labelslgtm Suggested reviewers
Important Pre-merge checks failedPlease resolve all errors before merging. Addressing warnings is optional. ❌ Failed checks (2 warnings, 1 inconclusive)
✅ Passed checks (9 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
/retest |
|
/hold |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/operator/kube_cloud_config/controller.go`:
- Around line 75-83: The field c.currentFeatureGates is written in
featureGateAccess.SetChangeHandler's callback and read concurrently by
isFeatureGateEnabled, causing a data race; add a sync.RWMutex (e.g.,
featureGatesMu) to the controller struct, use featureGatesMu.Lock/Unlock when
assigning c.currentFeatureGates in the SetChangeHandler callback and during the
initial assignment, and use featureGatesMu.RLock/RUnlock inside
isFeatureGateEnabled when reading c.currentFeatureGates to serialize access.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 2cc482a0-127a-4616-a243-4bd79f60a718
📒 Files selected for processing (2)
pkg/operator/kube_cloud_config/controller.gopkg/operator/starter.go
There was a problem hiding this comment.
🧹 Nitpick comments (2)
pkg/operator/kube_cloud_config/controller.go (1)
80-91: 💤 Low valueConsider using
featureChangeparameter directly instead of re-fetching.The
featureChangeparameter passed to the handler likely contains the new feature gates directly (commonly asfeatureChange.New), which would avoid the redundant call toCurrentFeatureGates(). However, this is a minor optimization.Also note: feature gate changes won't trigger an immediate sync - the controller relies on the 1-minute resync interval to pick up changes. This is acceptable given feature gates rarely change in production.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/operator/kube_cloud_config/controller.go` around lines 80 - 91, The handler passed to featureGateAccess.SetChangeHandler should use the provided featureChange argument instead of calling CurrentFeatureGates() again: extract the new gates from featureChange (e.g., featureChange.New or the appropriate field on featuregates.FeatureChange), acquire c.featureGatesMu, assign c.currentFeatureGates to that new value, and unlock; optionally keep a fallback to call featureGateAccess.CurrentFeatureGates() only if featureChange does not contain the new gates. Update the code in the SetChangeHandler closure (referencing featureGateAccess.SetChangeHandler, featureChange, CurrentFeatureGates, c.featureGatesMu, and c.currentFeatureGates) accordingly.pkg/operator/starter.go (1)
201-208: ⚡ Quick winHandle context cancellation in the select for responsive shutdown.
The select statement doesn't handle
ctx.Done(). If the operator receives a shutdown signal during startup, it will wait up to 1 minute before exiting. Adding a context cancellation case improves shutdown responsiveness.♻️ Proposed fix
select { case <-featureGateAccessor.InitialFeatureGatesObserved(): klog.V(4).Info("FeatureGates initialized") case <-time.After(1 * time.Minute): accessError := errors.New("timed out waiting for FeatureGate detection") - klog.Error(accessError, "unable to start operator") + klog.Errorf("unable to start operator: %v", accessError) return accessError + case <-ctx.Done(): + return ctx.Err() }Also:
klog.Error(accessError, "unable to start operator")concatenates both arguments without formatting. Usingklog.Errorforklog.ErrorSwould produce cleaner log output.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/operator/starter.go` around lines 201 - 208, The select waiting on featureGateAccessor.InitialFeatureGatesObserved() currently ignores context cancellation and uses klog.Error with concatenated args; update the select in starter.go to include a case for <-ctx.Done() that logs the cancellation using klog.ErrorS (or klog.Errorf) with the context error and immediately returns ctx.Err(), and also change the timeout branch to log the timeout using klog.ErrorS/klog.Errorf with the accessError (instead of klog.Error) so both branches produce structured/ formatted log output; locate the select around featureGateAccessor.InitialFeatureGatesObserved(), the accessError variable, and use ctx.Done()/ctx.Err() for responsive shutdown.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@pkg/operator/kube_cloud_config/controller.go`:
- Around line 80-91: The handler passed to featureGateAccess.SetChangeHandler
should use the provided featureChange argument instead of calling
CurrentFeatureGates() again: extract the new gates from featureChange (e.g.,
featureChange.New or the appropriate field on featuregates.FeatureChange),
acquire c.featureGatesMu, assign c.currentFeatureGates to that new value, and
unlock; optionally keep a fallback to call
featureGateAccess.CurrentFeatureGates() only if featureChange does not contain
the new gates. Update the code in the SetChangeHandler closure (referencing
featureGateAccess.SetChangeHandler, featureChange, CurrentFeatureGates,
c.featureGatesMu, and c.currentFeatureGates) accordingly.
In `@pkg/operator/starter.go`:
- Around line 201-208: The select waiting on
featureGateAccessor.InitialFeatureGatesObserved() currently ignores context
cancellation and uses klog.Error with concatenated args; update the select in
starter.go to include a case for <-ctx.Done() that logs the cancellation using
klog.ErrorS (or klog.Errorf) with the context error and immediately returns
ctx.Err(), and also change the timeout branch to log the timeout using
klog.ErrorS/klog.Errorf with the accessError (instead of klog.Error) so both
branches produce structured/ formatted log output; locate the select around
featureGateAccessor.InitialFeatureGatesObserved(), the accessError variable, and
use ctx.Done()/ctx.Err() for responsive shutdown.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 7e9e6a22-fce2-4ad8-8bdf-488330eb3b03
📒 Files selected for processing (2)
pkg/operator/kube_cloud_config/controller.gopkg/operator/starter.go
|
/retest |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/operator/kube_cloud_config/controller.go`:
- Line 44: Tests still construct the controller using the old field name
currentFeatureGates; update those struct literals to use the renamed field
featureGateAccessor instead (replace occurrences of currentFeatureGates with
featureGateAccessor in the controller test setup so the controller struct
literal matches the controller.go definition).
- Around line 104-105: The log call using klog.V(4).Infof in
KubeCloudConfigController is malformed: the first argument is a literal
("KubeCloudConfigController") not a format string, causing EXTRA output; fix by
making the first argument a proper format string that includes the platform
placeholder (e.g., "KubeCloudConfigController: Skipping kube-cloud-config
management for platform %s") and pass platformName as the single format
argument, or alternatively use Info with a concatenated message or InfoS with
structured key/value logging; update the klog.V(4).Infof invocation accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 8dbd8198-1a60-4f2a-a50a-e07831809d46
📒 Files selected for processing (2)
pkg/operator/kube_cloud_config/controller.gopkg/operator/starter.go
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/operator/starter.go
|
/retest |
|
/unhold |
|
@coderabbitai summary |
✅ Actions performedSummary regeneration triggered. |
|
/assign @JoelSpeed |
|
/lgtm |
|
/retest |
|
/retest |
|
@vr4manta: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
So upgrade is failing due to : This looks like an issue being caused by the new feature gate accessor being added. @JoelSpeed Not sure why this would be happening. Is this something that may be an issue w/ the accessor? |
|
Current version in the featuregates CR: |
|
The accessor was being started before the feature gate controller starts and processes the upgrade feature gate list. Moving the start of the accessor to after the feature gate controller will fix the trick. |
|
@vr4manta: This pull request references SPLAT-2747 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/retest |
| // Start the feature gate accessor and wait for it to observe initial feature gates | ||
| // This must happen before creating controllers that depend on feature gates | ||
| go featureGateAccessor.Run(ctx) | ||
| go featureGateController.Run(ctx, 1) |
There was a problem hiding this comment.
Nit, this probably should come before the accessor, and also, can we have a comment explaining why the featuregate controller must be started early
// The featuregate controller must never be featuregates. It is responsible for ensuring the featuregate
// object status contains the correct feature gate states for the current release.
// It must run before the featuregeate accessor as the accessor depends on the featuregate status being updated.
There was a problem hiding this comment.
sure, i can add a comment and change order.
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: JoelSpeed, vr4manta The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/retest |
|
@vr4manta: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/verified by @vr4manta |
|
@vr4manta: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
SPLAT-2747
Changes
Summary by CodeRabbit
New Features
Improvements
Tests