CNTRLPLANE-3222: Port v1 lifecycle tests to v2 Ginkgo framework#8527
CNTRLPLANE-3222: Port v1 lifecycle tests to v2 Ginkgo framework#8527bryan-cox wants to merge 9 commits into
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
Skipping CI for Draft Pull Request. |
|
@bryan-cox: This pull request references CNTRLPLANE-3222 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds v2 Ginkgo E2E suites and orchestration tooling: control-plane upgrade, etcd chaos (five scenarios), nodepool autoscaling, and nodepool lifecycle tests. Widened e2e helper signatures to testing.TB, registered additional lifecycle env vars, added e2ev2 CLI binaries (create-guests, run-tests, destroy-guests, dump-guests), and wired Makefile and Dockerfile to build and package those binaries. Sequence Diagram(s)sequenceDiagram
participant CreateCLI as create-guests
participant HypershiftCLI as hypershift
participant MgmtAPI as ManagementCluster API
participant HostedCluster as HostedCluster CR
CreateCLI->>HypershiftCLI: exec "hypershift create cluster azure" (args per variant)
HypershiftCLI->>MgmtAPI: submit HostedCluster CR / related resources
MgmtAPI->>HostedCluster: create/update HostedCluster
HostedCluster->>MgmtAPI: status updates (Available, Version History)
MgmtAPI->>CreateCLI: status observed via controller-runtime watch
Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 7 | ❌ 5❌ Failed checks (5 warnings)
✅ Passed checks (7 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #8527 +/- ##
==========================================
+ Coverage 40.07% 40.10% +0.02%
==========================================
Files 751 755 +4
Lines 92863 93018 +155
==========================================
+ Hits 37215 37303 +88
- Misses 52956 53016 +60
- Partials 2692 2699 +7 see 9 files with indirect coverage changes
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 10
🧹 Nitpick comments (9)
control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go (2)
1543-1544: 💤 Low valueMisleading
%win log message.The
%wverb is afmtdirective for error wrapping, butr.Log.Error()doesn't perform formatting on the message string. The error is already passed as the first argument and will be logged correctly; the%wis printed literally.Suggested fix
if err != nil { - r.Log.Error(err, "failed to remove service ca annotation and secret: %w") + r.Log.Error(err, "failed to remove service ca annotation and secret") }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go` around lines 1543 - 1544, The log call in hostedcontrolplane_controller.go uses the fmt verb "%w" in the message string (`r.Log.Error(err, "failed to remove service ca annotation and secret: %w")`), which is meaningless for the logger and will be printed literally; fix it by removing the "%w" from the message so the error is passed as the first argument and logged correctly (e.g., `r.Log.Error(err, "failed to remove service ca annotation and secret")`), or alternatively keep the error and add structured context via key/value pairs to the same `r.Log.Error` call.
1721-1729: 💤 Low valueSame
%wissue and unnecessary intermediate variable.The log messages at lines 1722 and 1755 have the same misleading
%wissue as flagged earlier. Additionally, thezvariable at lines 1725-1726 and 1757-1759 is unnecessary—the reconcile function result can be returned directly.Suggested fixes
if err := removeServiceCAAnnotationAndSecret(ctx, r.Client, AzureDiskCsiDriverOperatorService, AzureDiskCsiDriverOperatorServingCert); err != nil { - r.Log.Error(err, "failed to remove service ca annotation and secret: %w") + r.Log.Error(err, "failed to remove service ca annotation and secret") } if _, err := createOrUpdate(ctx, r, AzureDiskCsiDriverOperatorServingCert, func() error { - z := pki.ReconcileAzureDiskCsiDriverOperatorMetricsServingCertSecret(AzureDiskCsiDriverOperatorServingCert, rootCASecret, p.OwnerRef) - return z + return pki.ReconcileAzureDiskCsiDriverOperatorMetricsServingCertSecret(AzureDiskCsiDriverOperatorServingCert, rootCASecret, p.OwnerRef) }); err != nil {Apply the same pattern at lines 1754-1759 for the Azure File CSI driver operator.
Also applies to: 1752-1762
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go` around lines 1721 - 1729, The r.Log.Error calls use a printf-style %w placeholder incorrectly and should be called as r.Log.Error(err, "failed to remove service ca annotation and secret") (i.e., pass the error as the first argument to r.Log.Error without formatting), and the reconciliation closures create an unnecessary intermediate variable (z); replace constructs like "z := pki.ReconcileAzureDiskCsiDriverOperatorMetricsServingCertSecret(...); return z" with "return pki.ReconcileAzureDiskCsiDriverOperatorMetricsServingCertSecret(...)" inside the createOrUpdate call (apply the same fixes for AzureDiskCsiDriverOperator* and the analogous AzureFileCsiDriverOperator* usages, and for functions removeServiceCAAnnotationAndSecret, createOrUpdate, and pki.ReconcileAzure...ServingCertSecret).api/karpenter/v1/kubelet_config.go (1)
50-50: 💤 Low valueConsider simplifying or documenting the complex eviction threshold validation.
The XValidation rule on line 50 is correct but extremely complex (>300 characters). It validates that
evictionSoft >= evictionHardfor matching signals, handling both percentage strings and quantity types. While the logic is sound, this level of complexity in a single CEL expression makes it difficult to:
- Review for correctness
- Debug when validation fails
- Maintain if requirements change
Consider adding an inline comment above the validation explaining the high-level intent, or explore whether this could be split into separate validation rules (though CEL may not permit that given the cross-field nature).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@api/karpenter/v1/kubelet_config.go` at line 50, The XValidation CEL expression on the kubebuilder annotation that enforces evictionSoft >= evictionHard for matching signals is correct but unreadable; add a clear inline comment directly above the annotation describing the high-level intent (e.g., "Ensure evictionSoft thresholds are greater than or equal to evictionHard for the same signal, handling percentage strings and resource quantities") and briefly note the two cases handled (percentage strings vs. resource quantities), or alternatively factor the check into multiple, named XValidation annotations if feasible; reference the annotation (the +kubebuilder:validation:XValidation line) and the related fields evictionSoft and evictionHard so reviewers can find and understand the rule quickly.api/hypershift/v1beta1/hostedcluster_types.go (1)
676-681: ⚡ Quick winClarify pull secret propagation mechanism for in-place updates.
Lines 678-680 state that updating the Secret's data in place "does not trigger that rollout," but then immediately say the changes "will still propagate the updated credentials down to the guest cluster and kubelet config" for AWS/Azure Replace strategy. This appears contradictory—how do the credentials propagate without triggering a rollout?
Consider clarifying:
- Does "propagate" mean existing nodes are updated in place without replacement?
- Is this via a separate controller watching the Secret?
- Does "rollout" specifically mean node replacement, while propagation is a different mechanism?
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@api/hypershift/v1beta1/hostedcluster_types.go` around lines 676 - 681, The comment is ambiguous about how Secret data changes "do not trigger that rollout" yet "will still propagate" for AWS/Azure Replace NodePools; update the paragraph (the NodePool/pull secret description) to explicitly state that “rollout” refers to NodePool replacement/in-place upgrade actions (node recreation), and that updating the referenced Secret's data in-place does not automatically trigger node replacement, but cloud-provider-specific controllers or credential reconciliation (e.g., the cloud provider credential controller or kubelet secret refresh mechanisms) will propagate updated credentials to the guest cluster and kubelet config for AWS/Azure Replace strategy; mention whether propagation is in-place (no node replacement) and name the controller or mechanism if available, or note “a separate controller/watch reconciler” if unspecified, so readers understand the distinct behaviors for secret updates vs. NodePool rollouts.control-plane-operator/controllers/azureprivatelinkservice/controller.go (1)
863-881: 💤 Low valueGood refactoring of deletion logic into focused helpers.
The extraction of deletion logic into dedicated helper functions improves maintainability and testability. Using
azPLS.Status.DNSZoneNameto avoid dependency on the HostedControlPlane during deletion is a solid design decision.The verbose log at line 873 when
DNSZoneNameis not set useslog.V(1), which may not be visible by default. Consider whether this should be a regularlog.Infogiven that it represents a potentially unexpected condition during cleanup.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@control-plane-operator/controllers/azureprivatelinkservice/controller.go` around lines 863 - 881, The log for the case where azPLS.Status.DNSZoneName is empty uses log.V(1).Info which may be too quiet for an unexpected cleanup condition; update the logging in the controller deletion path (the dnsZoneName check in the reconcile/finalizer routine that calls deleteDNSResources and deleteBaseDomainResources) to use log.Info (or a higher visibility) instead of log.V(1).Info so the message "DNSZoneName not set in status, skipping DNS cleanup" is visible by default when DNSZoneName is missing during deletion.control-plane-operator/controllers/azureprivatelinkservice/controller_test.go (1)
1747-1774: ⚡ Quick winConsolidate duplicated
errMsgQualifiercoverage.Line 1747 adds
TestDNSZoneConfigErrMsgQualifier, but this is functionally duplicatingTestErrMsgQualifierlater in the file (same inputs and expected outputs). Keeping one avoids drift and redundant maintenance.Suggested cleanup
-func TestDNSZoneConfigErrMsgQualifier(t *testing.T) { - t.Parallel() - tests := []struct { - name string - logPrefix string - expected string - }{ - { - name: "When logPrefix is empty, it should return empty string", - logPrefix: "", - expected: "", - }, - { - name: "When logPrefix is set, it should return prefix followed by a space", - logPrefix: "base domain", - expected: "base domain ", - }, - } - - for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { - t.Parallel() - g := NewGomegaWithT(t) - cfg := dnsZoneConfig{logPrefix: tt.logPrefix} - g.Expect(cfg.errMsgQualifier()).To(Equal(tt.expected)) - }) - } -}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@control-plane-operator/controllers/azureprivatelinkservice/controller_test.go` around lines 1747 - 1774, Remove the duplicated unit test TestDNSZoneConfigErrMsgQualifier and keep the existing TestErrMsgQualifier to avoid redundant coverage; locate the duplicate test function named TestDNSZoneConfigErrMsgQualifier in the controller_test.go file (it constructs dnsZoneConfig{logPrefix: ...} and calls cfg.errMsgQualifier()) and delete that entire test block so only the original TestErrMsgQualifier remains exercising dnsZoneConfig.errMsgQualifier.contrib/repo_metrics/weekly_pr_report.py (2)
965-973: ⚡ Quick winAdd
strict=Truetozip()for safety.Line 965 zips
repos_to_fetchandresultswhich should have the same length by construction, but addingstrict=Truemakes this assumption explicit and catches bugs if the invariant is violated.Note:
strict=requires Python 3.10+.🔒 Proposed fix
- for (owner, name), result in zip(repos_to_fetch, results): + for (owner, name), result in zip(repos_to_fetch, results, strict=True):🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@contrib/repo_metrics/weekly_pr_report.py` around lines 965 - 973, The loop that pairs repos_to_fetch with results uses zip(repos_to_fetch, results) and assumes equal lengths; change it to zip(repos_to_fetch, results, strict=True) to enforce that invariant at runtime and fail fast if lengths differ; update the loop in the block that assigns to repo_fetch_status and repo_prs (the for (owner, name), result in zip(...) loop) so it uses strict=True, ensuring any mismatch between repos_to_fetch and results is caught immediately.
402-403: 💤 Low valueConsider using immutable types for class constants.
Lines 402-403 use mutable list and set for class constants that are never modified. Converting to tuple and frozenset would make the immutability explicit and silence the static analysis warning.
♻️ Proposed refactor
- BOT_PATTERNS = ['-bot', '-robot', '[bot]'] - BOT_LOGINS = {'coderabbitai', 'hypershift-jira-solve-ci'} + BOT_PATTERNS = ('-bot', '-robot', '[bot]') + BOT_LOGINS = frozenset({'coderabbitai', 'hypershift-jira-solve-ci'})🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@contrib/repo_metrics/weekly_pr_report.py` around lines 402 - 403, BOT_PATTERNS and BOT_LOGINS are declared as mutable list/set but never modified; change BOT_PATTERNS = ['-bot', '-robot', '[bot]'] to an immutable tuple BOT_PATTERNS = ('-bot', '-robot', '[bot]') and change BOT_LOGINS = {'coderabbitai', 'hypershift-jira-solve-ci'} to a frozenset: BOT_LOGINS = frozenset({'coderabbitai', 'hypershift-jira-solve-ci'}), and update any membership checks (e.g., "in BOT_LOGINS" or iterating BOT_PATTERNS) which will continue to work without other code changes..claude/commands/pr-report.md (1)
732-755: 💤 Low valueConsider documenting alternative environment variable names for backward compatibility.
The Python script (lines 69-70) accepts both
JIRA_API_TOKEN/JIRA_USERNAMEandJIRA_TOKEN/JIRA_EMAIL, but the documentation only mentions the latter. Consider adding a note about the alternative names for users with existing configurations.📝 Suggested documentation addition
Add a note under the environment variables table:
| `GITHUB_TOKEN` | No | GitHub token (falls back to `gh auth token` if not set) | + +**Note:** For backward compatibility, the script also accepts `JIRA_USERNAME` (alias for `JIRA_EMAIL`) and `JIRA_API_TOKEN` (alias for `JIRA_TOKEN`).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.claude/commands/pr-report.md around lines 732 - 755, Update the environment variables docs to mention the alternative legacy names accepted by the script: add a brief note under the variables table stating that JIRA_TOKEN/JIRA_EMAIL are equivalent to the older JIRA_API_TOKEN/JIRA_USERNAME (the script checks both pairs around lines where env vars are read), so existing configurations using JIRA_API_TOKEN or JIRA_USERNAME will continue to work; reference the variable names JIRA_TOKEN, JIRA_EMAIL, JIRA_API_TOKEN, and JIRA_USERNAME in the note.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.claude/commands/pr-report.md:
- Around line 115-119: The detection patterns in .claude/commands/pr-report.md
don’t match the actual runtime messages; update the checks to use partial or
regex matching against the real strings — e.g., match "Fetching Jira data via
REST API" (prefix of "Fetching Jira data via REST API ({len(tickets)}
tickets)...") and "Jira credentials not set" (prefix of "Jira credentials not
set (need JIRA_USERNAME/JIRA_API_TOKEN or JIRA_EMAIL/JIRA_TOKEN), loading from
cache") instead of the exact old phrases, so the script recognizes both messages
with variable content.
In `@api/.golangci.yml`:
- Around line 322-342: The review asks to fix the API types instead of adding
kubeapilinter exclusions: update the KubeletConfiguration and
OpenshiftEC2NodeClassSpec types to satisfy kube-apilinter rules—specifically
change CPUCFSQuota (in KubeletConfiguration) from a *bool to a string enum with
meaningful constants; add a kubebuilder:validation:MinProperties marker where
the KubeletConfiguration object is used (referenced by
OpenshiftEC2NodeClassSpec.Kubelet) and make OpenshiftEC2NodeClassSpec.Kubelet a
pointer if it must be optional to avoid the optionalfields warning; replace map
values that use Go Durations (EvictionSoftGracePeriod) with integer types whose
names include units (e.g., EvictionSoftGracePeriodSeconds) and, for the nomaps
warnings on SystemReserved, KubeReserved, EvictionHard, EvictionSoft, refactor
map fields into well-defined structured types or resource-list fields (or
concrete slice/struct representations) so they no longer use raw maps; run make
api-lint-fix and iterate until kube-apilinter passes.
In `@api/karpenter/v1/kubelet_config.go`:
- Around line 196-212: HasTypedFields currently lists typed fields manually
(KubeletConfiguration.HasTypedFields) and can get out of sync with the struct,
causing IsZero/omitzero bugs; replace the manual checks with a reflection-based
implementation or add a unit test that fails if any exported non-overflow struct
field is not checked: implement HasTypedFields by reflecting over
KubeletConfiguration, skipping overflow/XXX fields and detecting any non-zero
typed field (or alternatively write a test that compares the set of field names
in kubeletConfigKnownFields/reflection against the ones asserted in
HasTypedFields to force updates), and ensure IsZero/omitzero behavior remains
correct for all current and future typed fields.
In `@cmd/cluster/core/create.go`:
- Around line 476-484: The code currently returns early when opts.PausedUntil ==
"true" so HostedCluster.Spec.PausedUntil is never set; change the logic in the
function handling opts.PausedUntil to: if opts.PausedUntil is empty return nil;
if opts.PausedUntil == "true" set cluster.Spec.PausedUntil = &opts.PausedUntil
and return nil; otherwise parse opts.PausedUntil with time.RFC3339, return a
wrapped error on parse failure and set cluster.Spec.PausedUntil =
&opts.PausedUntil on success (update the unit test expectation for
HostedCluster.Spec.PausedUntil when "--pausedUntil=true" accordingly).
In `@cmd/fix/dr_oidc_iam.go`:
- Around line 524-568: Add unit tests covering the new OIDC helper branch logic:
write table-driven unit tests for ensureOIDCDocuments and ensureOIDCProvider
that exercise combinations of DryRun true/false, ForceRecreate true/false, and
existing-state flags (oidcDocsExist and exists) to assert correct behavior
(no-op, dry-run messages, calls into generateAndUploadOIDCDocuments,
deleteOIDCProviderIfExists, and createOIDCProvider). Mock/stub client
interactions (k8sClient, s3Client, iamClient) and the helper methods
generateAndUploadOIDCDocuments, deleteOIDCProviderIfExists, createOIDCProvider
to verify they are or are not invoked, and assert returned errors are
propagated/wrapped as expected; include tests for success and error propagation
paths.
- Around line 496-511: The check currently collapses all S3 HeadObject failures
into a false result; change checkOIDCDocumentsExist to return (bool, error)
instead of bool, have it return (false, nil) only for a genuine NotFound/404 and
return (false, err) for AccessDenied/network/transient errors so callers can
distinguish causes, then update checkOIDCState to handle the propagated error
from checkOIDCDocumentsExist (inspect error != nil to log/return the error or
abort rather than printing "documents missing") while preserving the existing
branches for oidcDocsExist and o.ForceRecreate; update any call sites of
checkOIDCDocumentsExist accordingly.
In `@Containerfile.cli`:
- Line 22: The Docker base image reference "FROM
registry.redhat.io/ubi9/nginx-124:latest" is using the mutable :latest tag;
update that FROM line to pin to a specific immutable version tag or image digest
(e.g., a released version tag or sha256 digest) so builds are reproducible and
stable—locate the FROM statement in Containerfile.cli and replace ":latest" with
the chosen version tag or digest, then commit the change and optionally add a
short comment noting the pinned version.
In `@contrib/repo_metrics/weekly_pr_report.py`:
- Around line 144-145: In _fetch_json replace the blocking time.sleep(30) with
an await asyncio.sleep(30) so the coroutine yields to the event loop; update the
call to await asyncio.sleep(30) in the same fallback path that currently does
return await self._fetch_json(...), and ensure asyncio is imported at the module
top if not already present.
In `@control-plane-operator/controllers/azureprivatelinkservice/controller.go`:
- Around line 863-1053: The new deletion helpers lack unit tests; add
table-driven unit tests for deleteDNSResources, deleteVNetLink, deleteDNSZone,
deleteBaseDomainResources, deleteBaseDomainARecords, and deletePrivateEndpoint
(and exercise hasSiblingCR behavior) that cover: (1) Azure API failures
returning errors and asserting the wrapper returns the expected formatted error,
(2) NotFound responses are treated as success (no error), (3) correct resource
name derivation by asserting the Azure client is called with expected names
(e.g., kasARecordName/appsARecordName,
vnetLinkName(crName)/baseDomainVNetLinkName, privateEndpointName(crName), and
base domain A record prefixes), and (4) sibling-CR detection flow where
deleteBaseDomainResources deletes or skips the base zone/records depending on
hasSiblingCR; implement mocks/stubs for RecordSets, VirtualNetworkLinks,
PrivateDNSZones, PrivateEndpoints, and the hasSiblingCR dependency and use
context timeouts/poller behavior to simulate BeginDelete/PollUntilDone success,
error, and nil-poller cases.
In
`@control-plane-operator/controllers/hostedcontrolplane/v2/metrics_proxy/deployment_test.go`:
- Around line 61-72: The helper assertCertVolumeCount should fail fast when the
returned slice lengths don't match expectations to avoid downstream panics: in
function assertCertVolumeCount (which calls certVolumesFromMonitors) replace the
t.Errorf checks for volume/mount count mismatches with t.Fatalf (or otherwise
return immediately with t.Fatalf) so the test stops on precondition failures
rather than allowing callers to index invalid slices; keep the existing error
check for certVolumesFromMonitors as-is and ensure callers can rely on asserted
counts when the helper returns.
---
Nitpick comments:
In @.claude/commands/pr-report.md:
- Around line 732-755: Update the environment variables docs to mention the
alternative legacy names accepted by the script: add a brief note under the
variables table stating that JIRA_TOKEN/JIRA_EMAIL are equivalent to the older
JIRA_API_TOKEN/JIRA_USERNAME (the script checks both pairs around lines where
env vars are read), so existing configurations using JIRA_API_TOKEN or
JIRA_USERNAME will continue to work; reference the variable names JIRA_TOKEN,
JIRA_EMAIL, JIRA_API_TOKEN, and JIRA_USERNAME in the note.
In `@api/hypershift/v1beta1/hostedcluster_types.go`:
- Around line 676-681: The comment is ambiguous about how Secret data changes
"do not trigger that rollout" yet "will still propagate" for AWS/Azure Replace
NodePools; update the paragraph (the NodePool/pull secret description) to
explicitly state that “rollout” refers to NodePool replacement/in-place upgrade
actions (node recreation), and that updating the referenced Secret's data
in-place does not automatically trigger node replacement, but
cloud-provider-specific controllers or credential reconciliation (e.g., the
cloud provider credential controller or kubelet secret refresh mechanisms) will
propagate updated credentials to the guest cluster and kubelet config for
AWS/Azure Replace strategy; mention whether propagation is in-place (no node
replacement) and name the controller or mechanism if available, or note “a
separate controller/watch reconciler” if unspecified, so readers understand the
distinct behaviors for secret updates vs. NodePool rollouts.
In `@api/karpenter/v1/kubelet_config.go`:
- Line 50: The XValidation CEL expression on the kubebuilder annotation that
enforces evictionSoft >= evictionHard for matching signals is correct but
unreadable; add a clear inline comment directly above the annotation describing
the high-level intent (e.g., "Ensure evictionSoft thresholds are greater than or
equal to evictionHard for the same signal, handling percentage strings and
resource quantities") and briefly note the two cases handled (percentage strings
vs. resource quantities), or alternatively factor the check into multiple, named
XValidation annotations if feasible; reference the annotation (the
+kubebuilder:validation:XValidation line) and the related fields evictionSoft
and evictionHard so reviewers can find and understand the rule quickly.
In `@contrib/repo_metrics/weekly_pr_report.py`:
- Around line 965-973: The loop that pairs repos_to_fetch with results uses
zip(repos_to_fetch, results) and assumes equal lengths; change it to
zip(repos_to_fetch, results, strict=True) to enforce that invariant at runtime
and fail fast if lengths differ; update the loop in the block that assigns to
repo_fetch_status and repo_prs (the for (owner, name), result in zip(...) loop)
so it uses strict=True, ensuring any mismatch between repos_to_fetch and results
is caught immediately.
- Around line 402-403: BOT_PATTERNS and BOT_LOGINS are declared as mutable
list/set but never modified; change BOT_PATTERNS = ['-bot', '-robot', '[bot]']
to an immutable tuple BOT_PATTERNS = ('-bot', '-robot', '[bot]') and change
BOT_LOGINS = {'coderabbitai', 'hypershift-jira-solve-ci'} to a frozenset:
BOT_LOGINS = frozenset({'coderabbitai', 'hypershift-jira-solve-ci'}), and update
any membership checks (e.g., "in BOT_LOGINS" or iterating BOT_PATTERNS) which
will continue to work without other code changes.
In
`@control-plane-operator/controllers/azureprivatelinkservice/controller_test.go`:
- Around line 1747-1774: Remove the duplicated unit test
TestDNSZoneConfigErrMsgQualifier and keep the existing TestErrMsgQualifier to
avoid redundant coverage; locate the duplicate test function named
TestDNSZoneConfigErrMsgQualifier in the controller_test.go file (it constructs
dnsZoneConfig{logPrefix: ...} and calls cfg.errMsgQualifier()) and delete that
entire test block so only the original TestErrMsgQualifier remains exercising
dnsZoneConfig.errMsgQualifier.
In `@control-plane-operator/controllers/azureprivatelinkservice/controller.go`:
- Around line 863-881: The log for the case where azPLS.Status.DNSZoneName is
empty uses log.V(1).Info which may be too quiet for an unexpected cleanup
condition; update the logging in the controller deletion path (the dnsZoneName
check in the reconcile/finalizer routine that calls deleteDNSResources and
deleteBaseDomainResources) to use log.Info (or a higher visibility) instead of
log.V(1).Info so the message "DNSZoneName not set in status, skipping DNS
cleanup" is visible by default when DNSZoneName is missing during deletion.
In
`@control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go`:
- Around line 1543-1544: The log call in hostedcontrolplane_controller.go uses
the fmt verb "%w" in the message string (`r.Log.Error(err, "failed to remove
service ca annotation and secret: %w")`), which is meaningless for the logger
and will be printed literally; fix it by removing the "%w" from the message so
the error is passed as the first argument and logged correctly (e.g.,
`r.Log.Error(err, "failed to remove service ca annotation and secret")`), or
alternatively keep the error and add structured context via key/value pairs to
the same `r.Log.Error` call.
- Around line 1721-1729: The r.Log.Error calls use a printf-style %w placeholder
incorrectly and should be called as r.Log.Error(err, "failed to remove service
ca annotation and secret") (i.e., pass the error as the first argument to
r.Log.Error without formatting), and the reconciliation closures create an
unnecessary intermediate variable (z); replace constructs like "z :=
pki.ReconcileAzureDiskCsiDriverOperatorMetricsServingCertSecret(...); return z"
with "return
pki.ReconcileAzureDiskCsiDriverOperatorMetricsServingCertSecret(...)" inside the
createOrUpdate call (apply the same fixes for AzureDiskCsiDriverOperator* and
the analogous AzureFileCsiDriverOperator* usages, and for functions
removeServiceCAAnnotationAndSecret, createOrUpdate, and
pki.ReconcileAzure...ServingCertSecret).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 36192739-e1b9-421b-85fb-9abe5b9a217a
⛔ Files ignored due to path filters (37)
api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/AAA_ungated.yamlis excluded by!**/zz_generated.featuregated-crd-manifests/**api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ClusterUpdateAcceptRisks.yamlis excluded by!**/zz_generated.featuregated-crd-manifests/**api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ClusterVersionOperatorConfiguration.yamlis excluded by!**/zz_generated.featuregated-crd-manifests/**api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ExternalOIDC.yamlis excluded by!**/zz_generated.featuregated-crd-manifests/**api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ExternalOIDCWithUIDAndExtraClaimMappings.yamlis excluded by!**/zz_generated.featuregated-crd-manifests/**api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ExternalOIDCWithUpstreamParity.yamlis excluded by!**/zz_generated.featuregated-crd-manifests/**api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/GCPPlatform.yamlis excluded by!**/zz_generated.featuregated-crd-manifests/**api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/HCPEtcdBackup.yamlis excluded by!**/zz_generated.featuregated-crd-manifests/**api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/HyperShiftOnlyDynamicResourceAllocation.yamlis excluded by!**/zz_generated.featuregated-crd-manifests/**api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ImageStreamImportMode.yamlis excluded by!**/zz_generated.featuregated-crd-manifests/**api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/KMSEncryptionProvider.yamlis excluded by!**/zz_generated.featuregated-crd-manifests/**api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/OpenStack.yamlis excluded by!**/zz_generated.featuregated-crd-manifests/**api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/TLSAdherence.yamlis excluded by!**/zz_generated.featuregated-crd-manifests/**api/karpenter/v1/zz_generated.deepcopy.gois excluded by!**/zz_generated*.go,!**/zz_generated*client/applyconfiguration/karpenter/v1/kubeletconfiguration.gois excluded by!client/**client/applyconfiguration/karpenter/v1/openshiftec2nodeclassspec.gois excluded by!client/**client/applyconfiguration/utils.gois excluded by!client/**cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedclusters-Hypershift-CustomNoUpgrade.crd.yamlis excluded by!**/zz_generated.crd-manifests/**,!cmd/install/assets/**/*.yamlcmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedclusters-Hypershift-Default.crd.yamlis excluded by!**/zz_generated.crd-manifests/**,!cmd/install/assets/**/*.yamlcmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedclusters-Hypershift-TechPreviewNoUpgrade.crd.yamlis excluded by!**/zz_generated.crd-manifests/**,!cmd/install/assets/**/*.yamlcontrol-plane-operator/controllers/hostedcontrolplane/infra/testdata/zz_fixture_TestReconcileInfrastructure_AWS_Private_KAS_LoadBalancer.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/infra/testdata/zz_fixture_TestReconcileInfrastructure_AWS_Private_Route.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/infra/testdata/zz_fixture_TestReconcileInfrastructure_AWS_PublicAndPrivate_Route.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/infra/testdata/zz_fixture_TestReconcileInfrastructure_AWS_Public_Route.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/control-plane-pki-operator/AROSwift/zz_fixture_TestControlPlaneComponents_control_plane_pki_operator_role.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/control-plane-pki-operator/GCP/zz_fixture_TestControlPlaneComponents_control_plane_pki_operator_role.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/control-plane-pki-operator/IBMCloud/zz_fixture_TestControlPlaneComponents_control_plane_pki_operator_role.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/control-plane-pki-operator/TechPreviewNoUpgrade/zz_fixture_TestControlPlaneComponents_control_plane_pki_operator_role.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/control-plane-pki-operator/zz_fixture_TestControlPlaneComponents_control_plane_pki_operator_role.yamlis excluded by!**/testdata/**docs/content/reference/aggregated-docs.mdis excluded by!docs/content/reference/aggregated-docs.mddocs/content/reference/api.mdis excluded by!docs/content/reference/api.mdkarpenter-operator/controllers/karpenter/assets/karpenter.hypershift.openshift.io_openshiftec2nodeclasses.yamlis excluded by!karpenter-operator/controllers/karpenter/assets/*.yamlkarpenter-operator/controllers/karpenter/assets/zz_generated.crd-manifests/openshiftec2nodeclasses.crd.yamlis excluded by!**/zz_generated.crd-manifests/**vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/hostedcluster_types.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/hypershift/api/karpenter/v1/karpenter_types.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/hypershift/api/karpenter/v1/kubelet_config.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/hypershift/api/karpenter/v1/zz_generated.deepcopy.gois excluded by!vendor/**,!**/vendor/**,!**/zz_generated*.go,!**/zz_generated*
📒 Files selected for processing (156)
.claude/agents/api-sme.md.claude/commands/pr-report.md.claude/rules/webhook-validation.md.github/workflows/dependabot-commit-fix-reusable.yaml.github/workflows/docs-deploy.yaml.golangci.yml.tekton/hypershift-gomaxprocs-webhook-pull-request.yaml.tekton/hypershift-gomaxprocs-webhook-push.yamlAGENTS.mdCLAUDE.mdCLAUDE.mdContainerfile.cliContainerfile.control-planeContainerfile.operatorMakefileapi/.golangci.ymlapi/AGENTS.mdapi/CLAUDE.mdapi/hypershift/v1beta1/hostedcluster_types.goapi/karpenter/v1/karpenter_types.goapi/karpenter/v1/kubelet_config.goapi/karpenter/v1/kubelet_config_test.gocmd/cluster/core/create.gocmd/cluster/core/create_test.gocmd/fix/dr_oidc_iam.gocmd/infra/aws/create.gocmd/infra/azure/create.gocmd/infra/gcp/iam-bindings.jsoncmd/infra/gcp/iam.gocmd/infra/gcp/iam_test.gocmd/install/assets/hypershift_operator.gocmd/install/assets/hypershift_operator_test.gocmd/install/install.gocmd/install/install_render.gocmd/install/install_render_test.gocmd/install/install_test.gocontrib/ai/adding-marketplace-plugins.mdcontrib/repo_metrics/weekly_pr_report.pycontrol-plane-operator/CLAUDE.mdcontrol-plane-operator/controllers/awsprivatelink/awsprivatelink_controller.gocontrol-plane-operator/controllers/awsprivatelink/awsprivatelink_controller_test.gocontrol-plane-operator/controllers/azureprivatelinkservice/controller.gocontrol-plane-operator/controllers/azureprivatelinkservice/controller_test.gocontrol-plane-operator/controllers/gcpprivateserviceconnect/observer.gocontrol-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.gocontrol-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller_test.gocontrol-plane-operator/controllers/hostedcontrolplane/infra/infra_test.gocontrol-plane-operator/controllers/hostedcontrolplane/kas/service.gocontrol-plane-operator/controllers/hostedcontrolplane/kas/service_test.gocontrol-plane-operator/controllers/hostedcontrolplane/oauth/idp_convert.gocontrol-plane-operator/controllers/hostedcontrolplane/oauth/idp_convert_test.gocontrol-plane-operator/controllers/hostedcontrolplane/v2/assets/control-plane-pki-operator/role.yamlcontrol-plane-operator/controllers/hostedcontrolplane/v2/kas/auth.gocontrol-plane-operator/controllers/hostedcontrolplane/v2/kas/auth_test.gocontrol-plane-operator/controllers/hostedcontrolplane/v2/metrics_proxy/deployment_test.gocontrol-plane-operator/controllers/hostedcontrolplane/v2/metrics_proxy/scrape_config.gocontrol-plane-operator/controllers/hostedcontrolplane/v2/metrics_proxy/scrape_config_test.gocontrol-plane-operator/controllers/hostedcontrolplane/v2/oauth/idp_convert.gocontrol-plane-operator/controllers/hostedcontrolplane/v2/oauth/idp_convert_test.gocontrol-plane-operator/hostedclusterconfigoperator/controllers/drainer/drainer.gocontrol-plane-operator/hostedclusterconfigoperator/controllers/globalps/globalps.gocontrol-plane-operator/hostedclusterconfigoperator/controllers/globalps/setup.gocontrol-plane-operator/hostedclusterconfigoperator/controllers/globalps/setup_test.gocontrol-plane-operator/hostedclusterconfigoperator/controllers/hcpstatus/hcpstatus.gocontrol-plane-operator/hostedclusterconfigoperator/controllers/inplaceupgrader/inplaceupgrader.gocontrol-plane-operator/hostedclusterconfigoperator/controllers/inplaceupgrader/inplaceupgrader_test.gocontrol-plane-operator/hostedclusterconfigoperator/controllers/machine/machine.gocontrol-plane-operator/hostedclusterconfigoperator/controllers/node/node.gocontrol-plane-operator/hostedclusterconfigoperator/controllers/nodecount/controller.gocontrol-plane-operator/hostedclusterconfigoperator/controllers/resources/kas/admissionpolicies.gocontrol-plane-operator/hostedclusterconfigoperator/controllers/resources/registry/admissionpolicies.gocontrol-plane-operator/hostedclusterconfigoperator/controllers/resources/resources.gocontrol-plane-operator/hostedclusterconfigoperator/controllers/resources/resources_test.gocontrol-plane-operator/main.gocontrol-plane-pki-operator/certificaterevocationcontroller/certificaterevocationcontroller.gocontrol-plane-pki-operator/certificaterevocationcontroller/certificaterevocationcontroller_test.godocs/content/how-to/ci/docs-preview.mddocs/content/how-to/ci/github-actions.mddocs/content/how-to/common/global-pull-secret.mddocs/content/how-to/gcp/configure-image-registry.mddocs/content/how-to/gcp/create-gcp-hosted-cluster.mddocs/content/how-to/gcp/index.mddocs/content/how-to/kubevirt/configuring-vm-with-jsonpatch.mddocs/mkdocs.ymlhack/github-actions-runner/README.mdhack/github-actions-runner/cache-warming-cronjob.yamlhack/kubelet-ratcheting-gen/main.gohypershift-operator/controllers/auditlogpersistence/snapshot_controller.gohypershift-operator/controllers/auditlogpersistence/snapshot_controller_test.gohypershift-operator/controllers/etcdbackup/reconciler.gohypershift-operator/controllers/etcdbackup/reconciler_test.gohypershift-operator/controllers/hostedcluster/etcd_recovery.gohypershift-operator/controllers/hostedcluster/etcd_recovery_test.gohypershift-operator/controllers/hostedcluster/hostedcluster_controller.gohypershift-operator/controllers/hostedcluster/hostedcluster_controller_test.gohypershift-operator/controllers/hostedcluster/internal/platform/platform.gohypershift-operator/controllers/hostedcluster/karpenter.gohypershift-operator/controllers/hostedcluster/karpenter_test.gohypershift-operator/controllers/hostedcluster/metrics/metrics.gohypershift-operator/controllers/hostedcluster/network_policies.gohypershift-operator/controllers/hostedcluster/network_policies_test.gohypershift-operator/controllers/hostedclustersizing/hostedclustersizing_controller.gohypershift-operator/controllers/hostedclustersizing/hostedclustersizing_controller_test.gohypershift-operator/controllers/hostedclustersizing/hostedclustersizing_validation_controller.gohypershift-operator/controllers/nodepool/aws.gohypershift-operator/controllers/nodepool/aws_test.gohypershift-operator/controllers/nodepool/capi.gohypershift-operator/controllers/nodepool/capi_test.gohypershift-operator/controllers/nodepool/metrics/metrics.gohypershift-operator/controllers/nodepool/nodepool_controller.gohypershift-operator/controllers/platform/aws/controller.gohypershift-operator/controllers/platform/aws/controller_test.gohypershift-operator/controllers/scheduler/aws/autoscaler.gohypershift-operator/controllers/scheduler/aws/autoscaler_test.gohypershift-operator/controllers/scheduler/aws/dedicated_request_serving_nodes.gohypershift-operator/controllers/scheduler/aws/dedicated_request_serving_nodes_test.gohypershift-operator/controllers/scheduler/azure/controller.gohypershift-operator/main.goignition-server/controllers/local_ignitionprovider.goignition-server/controllers/local_ignitionprovider_test.gokarpenter-operator/controllers/karpenter/assets/tests/openshiftec2nodeclasses.karpenter.hypershift.openshift.io/stable.openshiftec2nodeclasses.kubelet-field-promotion.testsuite.yamlkarpenter-operator/controllers/karpenter/assets/tests/openshiftec2nodeclasses.karpenter.hypershift.openshift.io/stable.openshiftec2nodeclasses.kubelet-ratcheting.testsuite.yamlkarpenter-operator/controllers/karpenter/assets/tests/openshiftec2nodeclasses.karpenter.hypershift.openshift.io/stable.openshiftec2nodeclasses.kubelet.testsuite.yamlkarpenter-operator/controllers/karpenter/karpenter_controller.gokarpenter-operator/controllers/karpenter/karpenter_controller_test.gokarpenter-operator/controllers/karpenterignition/karpenterignition_controller.gokarpenter-operator/controllers/karpenterignition/karpenterignition_controller_test.gokarpenter-operator/controllers/nodeclass/ec2_nodeclass_controller.gokarpenter-operator/controllers/nodeclass/karpenter_util.gokarpenter-operator/controllers/nodeclass/karpenter_util_test.gosupport/api/scheme.gosupport/controlplane-component/CLAUDE.mdsupport/karpenter/karpenter.gosupport/karpenter/karpenter_test.gosupport/podspec/containers.gosupport/podspec/containers_test.gosupport/validations/authentication.gosupport/validations/authentication_test.gotest/e2e/create_cluster_test.gotest/e2e/karpenter_kubelet_checker_pod.yamltest/e2e/karpenter_test.gotest/e2e/util/dump/journals.gotest/e2e/util/hypershift_framework.gotest/e2e/util/reqserving/verifycp.gotest/e2e/util/reqserving/verifyenv.gotest/e2e/util/util.gotest/e2e/v2/backuprestore/cleanup.gotest/e2e/v2/cmd/create-guests/main.gotest/e2e/v2/internal/env_vars.gotest/e2e/v2/tests/backup_restore_test.gotest/e2e/v2/tests/control_plane_upgrade_test.gotest/e2e/v2/tests/etcd_chaos_test.gotest/e2e/v2/tests/hosted_cluster_image_registry_test.gotest/e2e/v2/tests/nodepool_autoscaling_test.gotest/e2e/v2/tests/nodepool_lifecycle_test.gotest/integration/control_plane_pki_operator.go
💤 Files with no reviewable changes (9)
- control-plane-operator/hostedclusterconfigoperator/controllers/hcpstatus/hcpstatus.go
- control-plane-operator/hostedclusterconfigoperator/controllers/drainer/drainer.go
- control-plane-operator/hostedclusterconfigoperator/controllers/machine/machine.go
- control-plane-operator/hostedclusterconfigoperator/controllers/resources/registry/admissionpolicies.go
- control-plane-operator/hostedclusterconfigoperator/controllers/nodecount/controller.go
- control-plane-operator/controllers/gcpprivateserviceconnect/observer.go
- control-plane-operator/hostedclusterconfigoperator/controllers/node/node.go
- control-plane-operator/hostedclusterconfigoperator/controllers/globalps/globalps.go
- control-plane-operator/hostedclusterconfigoperator/controllers/resources/kas/admissionpolicies.go
| // HasTypedFields reports whether any explicitly defined struct fields are set. | ||
| // This is used by IsZero, but is separate so we can differentiate the zero case | ||
| // from "only overflow fields set". This must be kept in sync with the typed | ||
| // fields in KubeletConfiguration. | ||
| func (k KubeletConfiguration) HasTypedFields() bool { | ||
| return k.MaxPods != 0 || | ||
| k.PodsPerCore != 0 || | ||
| k.SystemReserved != nil || | ||
| k.KubeReserved != nil || | ||
| k.EvictionHard != nil || | ||
| k.EvictionSoft != nil || | ||
| k.EvictionSoftGracePeriod != nil || | ||
| k.EvictionMaxPodGracePeriod != nil || | ||
| k.ImageGCHighThresholdPercent != nil || | ||
| k.ImageGCLowThresholdPercent != nil || | ||
| k.CPUCFSQuota != nil | ||
| } |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
HasTypedFields requires manual synchronization with struct fields.
This method must be manually updated whenever a new typed field is added to KubeletConfiguration. Unlike kubeletConfigKnownFields (which uses reflection to stay in sync automatically), this method has no compile-time or runtime check to ensure completeness. If a field is added but this method isn't updated, IsZero() will return incorrect results, breaking omitzero serialization behavior.
Consider one of:
- Add a test that uses reflection to verify all typed fields are checked here
- Implement this method using reflection (trading runtime cost for correctness)
- Add a prominent comment referencing the code review guideline about keeping this in sync
As per coding guidelines, API types should use omitzero correctly, and this is a potential source of bugs.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@api/karpenter/v1/kubelet_config.go` around lines 196 - 212, HasTypedFields
currently lists typed fields manually (KubeletConfiguration.HasTypedFields) and
can get out of sync with the struct, causing IsZero/omitzero bugs; replace the
manual checks with a reflection-based implementation or add a unit test that
fails if any exported non-overflow struct field is not checked: implement
HasTypedFields by reflecting over KubeletConfiguration, skipping overflow/XXX
fields and detecting any non-zero typed field (or alternatively write a test
that compares the set of field names in kubeletConfigKnownFields/reflection
against the ones asserted in HasTypedFields to force updates), and ensure
IsZero/omitzero behavior remains correct for all current and future typed
fields.
There was a problem hiding this comment.
Not part of this PR's changes — this file was modified in a different commit on main.
There was a problem hiding this comment.
🧩 Analysis chain
@bryan-cox, thanks for the clarification — I'll resolve this comment now since the concern doesn't apply to this PR.
🏁 Script executed:
#!/bin/bash
# List my (coderabbitai) comments on PR `#8527` to find the one to resolve
gh api repos/openshift/hypershift/pulls/8527/comments --jq '.[] | select(.user.login == "coderabbitai[bot]") | {id: .id, body: .body[0:80]}'
Length of output: 2593
🏁 Script executed:
#!/bin/bash
# Resolve the review thread containing comment 3248462110
# First, find the pull request review thread ID via GraphQL
gh api graphql -f query='
{
repository(owner: "openshift", name: "hypershift") {
pullRequest(number: 8527) {
reviewThreads(first: 100) {
nodes {
id
isResolved
comments(first: 1) {
nodes {
databaseId
body
}
}
}
}
}
}
}' --jq '.data.repository.pullRequest.reviewThreads.nodes[] | select(.comments.nodes[0].databaseId == 3248462110) | {threadId: .id, resolved: .isResolved}'
Length of output: 54
🏁 Script executed:
#!/bin/bash
# Resolve the review thread PRRT_kwDOE7ekcc6CXmgM
gh api graphql -f query='
mutation {
resolveReviewThread(input: {threadId: "PRRT_kwDOE7ekcc6CXmgM"}) {
thread {
id
isResolved
}
}
}'
Length of output: 266
@bryan-cox, thanks for the clarification! I attempted to resolve this comment automatically but don't have the permissions to do so on this repository — you're welcome to resolve it manually.
Since the concern about HasTypedFields being prone to drift from the struct fields is still valid for the codebase, would you like me to open a follow-up GitHub issue to track it?
| if len(opts.PausedUntil) == 0 || opts.PausedUntil == "true" { | ||
| return nil | ||
| } | ||
| _, err := time.Parse(time.RFC3339, opts.PausedUntil) | ||
| if err != nil { | ||
| return fmt.Errorf("invalid pausedUntil value, should be \"true\" or a valid RFC3339 date format: %w", err) | ||
| } | ||
| cluster.Spec.PausedUntil = &opts.PausedUntil | ||
| return nil |
There was a problem hiding this comment.
Set HostedCluster.Spec.PausedUntil when --pausedUntil=true is used.
Line 476 returns early for "true", so Line 483 never sets cluster.Spec.PausedUntil. That contradicts the flag contract and creates inconsistent behavior (NodePools get paused later, HostedCluster does not).
💡 Proposed fix
func applyPausedUntil(cluster *hyperv1.HostedCluster, opts *CreateOptions) error {
// validate pausedUntil value
// valid values are either "true" or RFC3339 format date
- if len(opts.PausedUntil) == 0 || opts.PausedUntil == "true" {
+ if len(opts.PausedUntil) == 0 {
return nil
}
- _, err := time.Parse(time.RFC3339, opts.PausedUntil)
- if err != nil {
- return fmt.Errorf("invalid pausedUntil value, should be \"true\" or a valid RFC3339 date format: %w", err)
+ if opts.PausedUntil != "true" {
+ _, err := time.Parse(time.RFC3339, opts.PausedUntil)
+ if err != nil {
+ return fmt.Errorf("invalid pausedUntil value, should be \"true\" or a valid RFC3339 date format: %w", err)
+ }
}
cluster.Spec.PausedUntil = &opts.PausedUntil
return nil
}And align the unit test expectation accordingly:
- pausedUntil: "true",
- expectSet: false,
+ pausedUntil: "true",
+ expectSet: true,🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cmd/cluster/core/create.go` around lines 476 - 484, The code currently
returns early when opts.PausedUntil == "true" so HostedCluster.Spec.PausedUntil
is never set; change the logic in the function handling opts.PausedUntil to: if
opts.PausedUntil is empty return nil; if opts.PausedUntil == "true" set
cluster.Spec.PausedUntil = &opts.PausedUntil and return nil; otherwise parse
opts.PausedUntil with time.RFC3339, return a wrapped error on parse failure and
set cluster.Spec.PausedUntil = &opts.PausedUntil on success (update the unit
test expectation for HostedCluster.Spec.PausedUntil when "--pausedUntil=true"
accordingly).
There was a problem hiding this comment.
Not part of this PR's changes — this file was modified in a different commit on main.
There was a problem hiding this comment.
@bryan-cox, thanks for the clarification — I'll drop this comment. The concern may still be worth tracking as a separate bug. Would you like me to open a follow-up GitHub issue for it?
| func (o *DrOidcIamOptions) checkOIDCState(ctx context.Context, s3Client *s3.Client, iamClient *iam.Client) (bool, string, bool, error) { | ||
| fmt.Println("Step 1: Checking OIDC documents in S3") | ||
| oidcDocsExist := o.checkOIDCDocumentsExist(ctx, s3Client) | ||
| if oidcDocsExist && !o.ForceRecreate { | ||
| fmt.Printf("- (%s) OIDC documents found in S3\n", greenCheck()) | ||
| } else if o.ForceRecreate { | ||
| fmt.Printf("- (%s) Force recreate enabled - will regenerate OIDC documents\n", yellowForce()) | ||
| } else { | ||
| fmt.Printf("- (%s) OIDC documents missing in S3 - will create them\n", redX()) | ||
| } | ||
|
|
||
| fmt.Println("Step 2: Checking OIDC identity provider in IAM") | ||
| providerARN, exists, err := o.checkOIDCProvider(ctx, iamClient) | ||
| if err != nil { | ||
| return false, "", false, fmt.Errorf("failed to check OIDC provider: %w", err) | ||
| } |
There was a problem hiding this comment.
❓ Verification inconclusive
Script executed:
# First, locate and examine the file
find . -name "dr_oidc_iam.go" -type fRepository: openshift/hypershift
Repository: openshift/hypershift
Exit code: 0
stdout:
./cmd/fix/dr_oidc_iam.go
Script executed:
# Once located, examine the checkOIDCDocumentsExist method implementation
rg -A 20 "func \(o \*DrOidcIamOptions\) checkOIDCDocumentsExist" cmd/fix/dr_oidc_iam.goRepository: openshift/hypershift
Repository: openshift/hypershift
Exit code: 0
stdout:
func (o *DrOidcIamOptions) checkOIDCDocumentsExist(ctx context.Context, s3Client *s3.Client) bool {
documents := []string{
o.InfraID + oidcConfigPath,
o.InfraID + oidc.JWKSURI,
}
for _, doc := range documents {
_, err := s3Client.HeadObject(ctx, &s3.HeadObjectInput{
Bucket: aws.String(o.OIDCStorageProviderS3Bucket),
Key: aws.String(doc),
})
if err != nil {
return false
}
}
return true
}
// configureBucketPublicAccess sets public read access on an S3 bucket by
// disabling the Block Public Access settings and applying a read-all policy.
func configureBucketPublicAccess(ctx context.Context, s3Client *s3.Client, bucketName string) error {
Script executed:
# Also examine the full checkOIDCState method
rg -A 50 "func \(o \*DrOidcIamOptions\) checkOIDCState" cmd/fix/dr_oidc_iam.goRepository: openshift/hypershift
Repository: openshift/hypershift
Exit code: 0
stdout:
func (o *DrOidcIamOptions) checkOIDCState(ctx context.Context, s3Client *s3.Client, iamClient *iam.Client) (bool, string, bool, error) {
fmt.Println("Step 1: Checking OIDC documents in S3")
oidcDocsExist := o.checkOIDCDocumentsExist(ctx, s3Client)
if oidcDocsExist && !o.ForceRecreate {
fmt.Printf("- (%s) OIDC documents found in S3\n", greenCheck())
} else if o.ForceRecreate {
fmt.Printf("- (%s) Force recreate enabled - will regenerate OIDC documents\n", yellowForce())
} else {
fmt.Printf("- (%s) OIDC documents missing in S3 - will create them\n", redX())
}
fmt.Println("Step 2: Checking OIDC identity provider in IAM")
providerARN, exists, err := o.checkOIDCProvider(ctx, iamClient)
if err != nil {
return false, "", false, fmt.Errorf("failed to check OIDC provider: %w", err)
}
if exists && !o.ForceRecreate {
fmt.Printf("- (%s) OIDC identity provider exists\n", greenCheck())
} else if o.ForceRecreate {
fmt.Printf("- (%s) Force recreate enabled - will regenerate OIDC provider\n", yellowForce())
} else {
fmt.Printf("- (%s) OIDC identity provider missing - needs to be recreated\n", redX())
}
return oidcDocsExist, providerARN, exists, nil
}
func (o *DrOidcIamOptions) ensureOIDCDocuments(ctx context.Context, k8sClient client.Client, s3Client *s3.Client, oidcDocsExist bool) error {
if oidcDocsExist && !o.ForceRecreate {
fmt.Println("Step 4: OIDC documents")
fmt.Printf("- (%s) OIDC documents already exist\n", greenCheck())
return nil
}
fmt.Println("Step 4: Generating and uploading OIDC documents")
if o.DryRun {
fmt.Printf("- (%s) DRY RUN: Would generate and upload OIDC documents using existing cluster signing key\n", yellowQuestion())
return nil
}
if err := o.generateAndUploadOIDCDocuments(ctx, k8sClient, s3Client); err != nil {
return fmt.Errorf("failed to generate OIDC documents: %w", err)
}
fmt.Printf("- (%s) OIDC documents generated and uploaded\n", greenCheck())
return nil
}
func (o *DrOidcIamOptions) ensureOIDCProvider(ctx context.Context, iamClient *iam.Client, providerARN, thumbprint string, exists bool) error {
if exists && !o.ForceRecreate {
fmt.Println("Step 6: OIDC identity provider")
Don't collapse all S3 errors into a "documents missing" check.
The checkOIDCDocumentsExist method (line ~498) returns bool only and treats every HeadObject error identically—AccessDenied, transient failures, and missing objects all return false. This masks the root cause and drives the wrong recovery path in checkOIDCState (lines 502–506), which prints "documents missing" when the actual problem may be a permission error or network failure.
Change checkOIDCDocumentsExist to return (bool, error), propagate non-404 errors, and handle them explicitly in checkOIDCState instead of treating all failures as missing documents.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cmd/fix/dr_oidc_iam.go` around lines 496 - 511, The check currently collapses
all S3 HeadObject failures into a false result; change checkOIDCDocumentsExist
to return (bool, error) instead of bool, have it return (false, nil) only for a
genuine NotFound/404 and return (false, err) for AccessDenied/network/transient
errors so callers can distinguish causes, then update checkOIDCState to handle
the propagated error from checkOIDCDocumentsExist (inspect error != nil to
log/return the error or abort rather than printing "documents missing") while
preserving the existing branches for oidcDocsExist and o.ForceRecreate; update
any call sites of checkOIDCDocumentsExist accordingly.
There was a problem hiding this comment.
Not part of this PR's changes — this file was modified in a different commit on main.
There was a problem hiding this comment.
🧩 Analysis chain
@bryan-cox, understood — thanks for the clarification. I'll resolve this comment now since it doesn't apply to this PR.
🏁 Script executed:
#!/bin/bash
# List coderabbitai's review comments on PR `#8527` to find the one to resolve
gh api repos/openshift/hypershift/pulls/8527/comments --jq '.[] | select(.user.login == "coderabbitai[bot]") | {id: .id, body: .body[0:80]}'
Length of output: 2593
🏁 Script executed:
#!/bin/bash
# Resolve the review comment with ID 3248462117
# First get the review thread ID using GraphQL
gh api graphql -f query='
{
repository(owner: "openshift", name: "hypershift") {
pullRequest(number: 8527) {
reviewThreads(first: 100) {
nodes {
id
isResolved
comments(first: 1) {
nodes {
databaseId
}
}
}
}
}
}
}' | jq '.data.repository.pullRequest.reviewThreads.nodes[] | select(.comments.nodes[0].databaseId == 3248462117) | {id: .id, isResolved: .isResolved}'
Length of output: 59
🏁 Script executed:
#!/bin/bash
# Resolve the review thread
gh api graphql -f query='
mutation {
resolveReviewThread(input: {threadId: "PRRT_kwDOE7ekcc6CXmgT"}) {
thread {
id
isResolved
}
}
}'
Length of output: 266
@bryan-cox, apologies for the noise — you're right that cmd/fix/dr_oidc_iam.go isn't part of this PR's diff. I wasn't able to auto-resolve the thread (insufficient API permissions), so could you please resolve it manually?
Would you like me to open a follow-up GitHub issue to track the checkOIDCDocumentsExist error-handling concern for a future PR?
| find ./bin -type f ! -name "*.tar.gz" -delete | ||
|
|
||
| FROM registry.redhat.io/ubi9/nginx-124:1-1767745334 | ||
| FROM registry.redhat.io/ubi9/nginx-124:latest |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Expectation: no production/runtime stage uses :latest
rg -n '^\s*FROM\s+\S+:latest(\s+AS\s+\S+)?\s*$' --iglob 'Containerfile*' --iglob 'Dockerfile*'Repository: openshift/hypershift
Length of output: 662
Pin the base image to an immutable version.
The :latest tag on line 22 produces non-reproducible builds and can introduce unexpected runtime changes into releases. Replace with a specific version tag or digest.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Containerfile.cli` at line 22, The Docker base image reference "FROM
registry.redhat.io/ubi9/nginx-124:latest" is using the mutable :latest tag;
update that FROM line to pin to a specific immutable version tag or image digest
(e.g., a released version tag or sha256 digest) so builds are reproducible and
stable—locate the FROM statement in Containerfile.cli and replace ":latest" with
the chosen version tag or digest, then commit the change and optionally add a
short comment noting the pinned version.
There was a problem hiding this comment.
Not part of this PR's changes — this file was modified in a different commit on main.
There was a problem hiding this comment.
🧩 Analysis chain
@bryan-cox, understood — apologies for the noise. I'll resolve this comment now.
Would you like me to open a follow-up issue to track the :latest tag concern in Containerfile.cli so it's addressed in the appropriate PR?
🏁 Script executed:
#!/bin/bash
# Resolve the review comment made by coderabbitai on PR `#8527`
gh api graphql -f query='
{
repository(owner: "openshift", name: "hypershift") {
pullRequest(number: 8527) {
reviewThreads(first: 50) {
nodes {
id
isResolved
comments(first: 5) {
nodes {
author { login }
body
url
}
}
}
}
}
}
}'
Length of output: 75563
🏁 Script executed:
#!/bin/bash
gh api graphql -f query='
mutation {
resolveReviewThread(input: {threadId: "PRRT_kwDOE7ekcc6CXmgY"}) {
thread {
id
isResolved
}
}
}'
Length of output: 266
@bryan-cox, thanks for the clarification — apologies for the noise. This comment isn't applicable to your PR. I wasn't able to auto-resolve the thread due to a permissions limitation; feel free to resolve it manually.
Would you like me to open a follow-up GitHub issue to track pinning the nginx-124 base image in Containerfile.cli to an immutable version tag or digest?
beea747 to
f19b53f
Compare
Register 6 new environment variables for lifecycle tests: release images (latest, previous, N-1, N-2), Azure DES ID, and Azure credentials file path.
f19b53f to
002d435
Compare
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: bryan-cox The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 10
🧹 Nitpick comments (4)
test/e2e/v2/tests/nodepool_autoscaling_test.go (1)
221-221: 💤 Low valueClarify or remove unused
nodeSelectorparameter.The
nodeSelectorparameter is always passed as an empty string (lines 83, 149) and the conditional at line 269 prevents the code from executing. If this parameter is intended for future use, the implementation at lines 270-272 appears incorrect—it uses the parameter value as the map key rather than parsing it as a key-value pair or label selector. If the parameter is not needed, consider removing it to simplify the function signature.Also applies to: 269-273
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/v2/tests/nodepool_autoscaling_test.go` at line 221, The function newAutoscalingWorkload currently accepts nodeSelector but never uses it (call sites pass ""), and the existing code that attempts to use it in the nodeSelector map is incorrect (it uses the whole string as a map key). Fix by either removing the unused nodeSelector parameter from newAutoscalingWorkload and all its callers to simplify the signature, or if you intend to support selectors, replace the current logic with proper parsing of a "key=value" string into a map (split on '=' and set map[key]=value) and validate empty/invalid input; update the call sites accordingly and remove the dead conditional that prevents execution when nodeSelector == "".test/e2e/v2/tests/etcd_chaos_test.go (3)
139-150: 💤 Low valueConsider tracking deletion failures.
The test logs deletion errors but doesn't verify that at least some deletions succeeded. If all deletions fail (e.g., due to a permissions issue), the test would still pass because it only checks convergence afterward.
Optional: Track successful deletions
+ deletionCount := 0 for time.Now().Before(deadline) { pod := randomEtcdPods(etcdPods.Items, 1)[0] err := testCtx.MgmtClient.Delete(ctx, &pod, &crclient.DeleteOptions{ GracePeriodSeconds: ptr.To[int64](0), }) if err != nil { GinkgoWriter.Printf("Warning: failed to delete pod %s: %v\n", pod.Name, err) } else { GinkgoWriter.Printf("Deleted pod %s\n", pod.Name) + deletionCount++ } time.Sleep(period) } + Expect(deletionCount).To(BeNumerically(">", 0), "at least one pod deletion should succeed")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/v2/tests/etcd_chaos_test.go` around lines 139 - 150, The loop that repeatedly deletes pods using randomEtcdPods and testCtx.MgmtClient.Delete only logs failures and never asserts that any deletion actually succeeded; update the loop to track successful deletions (e.g., increment a successCount when testCtx.MgmtClient.Delete returns nil inside the for time.Now().Before(deadline) loop), and after the loop assert/fail the test if successCount == 0 (using the test framework's failure helper or Ginkgo.Fail) so the test fails fast if no pod deletions were successful.
131-131: ⚡ Quick winAdd cleanup for marker ConfigMap.
The marker ConfigMap is created in the hosted cluster but never cleaned up. Consider adding cleanup logic in a
DeferCleanup()call or using an AfterEach block to remove test artifacts.🧹 Proposed cleanup pattern
// Create marker data that should survive the chaos markerCM := createMarkerConfigMap(ctx, guestClient) + DeferCleanup(func() { + if err := guestClient.Delete(ctx, markerCM); err != nil && !apierrors.IsNotFound(err) { + GinkgoWriter.Printf("Warning: failed to cleanup marker ConfigMap: %v\n", err) + } + })Apply the same pattern at line 170 in
EtcdKillAllMembersTest.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/v2/tests/etcd_chaos_test.go` at line 131, The marker ConfigMap created by createMarkerConfigMap(ctx, guestClient) (assigned to markerCM) is not cleaned up; add teardown to remove it by calling DeferCleanup(func() { delete the ConfigMap using guestClient and markerCM.Name/Namespace }) immediately after creation or add an AfterEach that deletes the marker ConfigMap; apply the same pattern used in EtcdKillAllMembersTest cleanup to ensure the ConfigMap is removed from the hosted cluster after the test finishes.
381-381: 💤 Low valueHandle error from MarshalText.
While
time.Time.MarshalText()should never fail in practice, ignoring the error violates error-handling best practices.📝 Proposed fix
- value, _ := time.Now().MarshalText() + value, err := time.Now().MarshalText() + Expect(err).NotTo(HaveOccurred(), "failed to marshal timestamp") cm := &corev1.ConfigMap{🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/v2/tests/etcd_chaos_test.go` at line 381, The assignment value, _ := time.Now().MarshalText() ignores the error from MarshalText; change it to capture the error (e.g., value, err := time.Now().MarshalText()) and handle it appropriately in the test by failing the test on error (use t.Fatalf/t.Fatal or the test helper you use, e.g., require.NoError) so the test does not silently ignore a potential MarshalText failure.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/e2e/v2/cmd/create-guests/main.go`:
- Around line 138-243: The PR adds non-trivial functions (run, buildCLIArgs,
waitForClusterAvailable) with no unit tests; add table-driven unit tests: 1)
tests for buildCLIArgs covering combinations of flags (assignSPRoles,
generateSSH, optional fields like oidcIssuerURL, saTokenIssuerPrivateKeyPath,
workloadIdentitiesFile, dnsZoneRGName, etc.) asserting expected arg slices; 2)
tests for waitForClusterAvailable polling error-paths by faking a management
client (or using a client stub) to return not-found, transient errors, and a
final Available condition and asserting correct retry/poll behavior and error
wrapping; and 3) a minimal test for run that stubs exec.CommandContext (or
factor command execution out) to simulate a failing cmd.Run and verify the
returned wrapped error; locate these tests by referencing the functions
buildCLIArgs, waitForClusterAvailable, and run.
- Around line 227-230: The polling currently treats every error from
mgmtClient.Get(ctx, crclient.ObjectKey{Namespace: namespace, Name: name}, hc) as
retryable; change it to fail fast on non-retryable API/transport errors by
importing k8s.io/apimachinery/pkg/api/errors as apierrors and updating the error
handling: if err is not nil, then if the error is not a transient API error
(check apierrors.IsForbidden(err), apierrors.IsUnauthorized(err),
apierrors.IsBadRequest(err), apierrors.IsMethodNotAllowed(err) or any
non-Status/transport error) return false, err immediately; otherwise keep
logging and return false, nil to continue polling. Ensure you reference
mgmtClient.Get and the HostedCluster variable hc when making the change.
- Around line 145-146: The test binary is forwarding subprocess stdout to
os.Stdout (cmd.Stdout = os.Stdout), which can emit non-JSON text and break the
OTE JSON-on-stdout contract; change the child process stdout to not write to
process stdout (e.g., set cmd.Stdout = os.Stderr or a dedicated buffer) while
leaving cmd.Stderr = os.Stderr so that only the main process emits JSON on
os.Stdout; update the code that sets cmd.Stdout and cmd.Stderr (referenced as
cmd.Stdout and cmd.Stderr in main.go) accordingly.
In `@test/e2e/v2/tests/etcd_chaos_test.go`:
- Around line 285-301: The removal command uses shell substitution ($(…)) but is
passed as an argument slice to e2eutil.RunCommandInPod, so the substitution
won't be evaluated; update the call to run via a shell (e.g., build a single
command string from baseCommand/memberDiscoveryCommand and invoke sh -c) or
alternatively run member discovery separately (call e2eutil.RunCommandInPod with
the member-discovery command, parse its output to get the member ID, then call
RunCommandInPod with baseCommand + ["member","remove", memberID]); adjust usage
of baseCommand, memberDiscoveryCommand, command and the RunCommandInPod
invocation accordingly so the shell substitution is evaluated or replaced by the
discovered ID.
In `@test/e2e/v2/tests/nodepool_autoscaling_test.go`:
- Line 236: The test hardcodes Image:
"registry.access.redhat.com/ubi9/ubi-minimal:latest" which pulls from an
external registry; change the test to use an internal/mirrored image or a test
fixture image (e.g., replace the literal with a constant or config value such as
testImage or localRegistryImage) and/or gate the test so it uses a pre-populated
image when running in air-gapped CI; update the Image assignment in
nodepool_autoscaling_test.go (the struct/variable that sets Image) to reference
that internal/mirrored config value or a test-only image name and document the
expectation in the test setup so CI can pre-pull if needed.
In `@test/e2e/v2/tests/nodepool_lifecycle_test.go`:
- Around line 1179-1221: The test currently only waits for nodes to be ready
(e2eutil.WaitForReadyNodesByNodePool) but does not verify the AzureMachine
DiskEncryptionSetID (TODO), producing a false-positive; either implement the
verification by importing the CAPI Azure type (capiazure.AzureMachineList) and
asserting each AzureMachine.Spec.OSDisk.DiskEncryptionSetID equals the provided
E2E_AZURE_DISK_ENCRYPTION_SET_ID (using guestClient/testCtx.MgmtClient and the
NodePool np to list/filter machines in the control-plane namespace), or mark the
test to explicitly Skip() with a clear message until that AzureMachine-based
assertion is implemented. Ensure references to buildTestNodePool, np.Name,
cleanupNodePool remain unchanged.
- Around line 811-832: Before mutating HostedCluster.Spec.AdditionalTrustBundle,
capture the original value (e.g., orig := hc.Spec.AdditionalTrustBundle) and in
your deferred cleanup use e2eutil.UpdateObject (same helper used earlier) to
restore that original pointer/value to HostedCluster.Spec.AdditionalTrustBundle
instead of setting it to nil; additionally, in the same defer remove the created
trustBundle ConfigMap via testCtx.MgmtClient.Delete(ctx, trustBundle) so the
generated ConfigMap is cleaned up. Ensure you reference the existing variables
and functions: HostedCluster.Spec.AdditionalTrustBundle, trustBundle,
e2eutil.UpdateObject, and testCtx.MgmtClient.
- Around line 150-170: Add cleanup for the created management-cluster ConfigMap
(mcConfigMap) and the hosted-cluster DaemonSet (ds) by deleting them after the
test (preferably in an AfterEach or via defer immediately after creation); call
testCtx.MgmtClient.Delete(ctx, mcConfigMap) and guestClient.Delete(ctx, ds) and
ignore apierrors.IsNotFound() on delete, returning any other errors, so the
resources created by mcConfigMap, buildMachineConfigVerificationDaemonSet (ds),
guestClient, and testCtx.MgmtClient are removed and do not accumulate across
runs.
- Around line 1280-1283: The DaemonSet containers are using the unqualified
public image "alpine" (in the Containers slice for the DS created with variable
dsName), which breaks disconnected CI; replace that literal with a
repo-managed/mirrored test image reference (e.g., a shared test constant or
helper such as testImage, images.MirroredBusybox, or
testutils.MirroredTestImage) and update both occurrences (around the Containers
block at dsName and the other instance at lines ~1351-1354) so the test pulls
from the internal/mirrored registry instead of the public alpine image.
- Around line 1133-1174: The test always calls Skip() but only after creating a
NodePool and waiting for readiness; move the early Skip so no resources are
provisioned. In NodePoolAutoRepairTest, call Skip("auto-repair instance
termination not yet implemented for v2 framework") immediately after the
platform check (or at the start of the It block) and before invoking
getDefaultNodePool, buildTestNodePool, testCtx.MgmtClient.Create, or
e2eutil.WaitForReadyNodesByNodePool so the test exits without creating the
NodePool or waiting for nodes.
---
Nitpick comments:
In `@test/e2e/v2/tests/etcd_chaos_test.go`:
- Around line 139-150: The loop that repeatedly deletes pods using
randomEtcdPods and testCtx.MgmtClient.Delete only logs failures and never
asserts that any deletion actually succeeded; update the loop to track
successful deletions (e.g., increment a successCount when
testCtx.MgmtClient.Delete returns nil inside the for time.Now().Before(deadline)
loop), and after the loop assert/fail the test if successCount == 0 (using the
test framework's failure helper or Ginkgo.Fail) so the test fails fast if no pod
deletions were successful.
- Line 131: The marker ConfigMap created by createMarkerConfigMap(ctx,
guestClient) (assigned to markerCM) is not cleaned up; add teardown to remove it
by calling DeferCleanup(func() { delete the ConfigMap using guestClient and
markerCM.Name/Namespace }) immediately after creation or add an AfterEach that
deletes the marker ConfigMap; apply the same pattern used in
EtcdKillAllMembersTest cleanup to ensure the ConfigMap is removed from the
hosted cluster after the test finishes.
- Line 381: The assignment value, _ := time.Now().MarshalText() ignores the
error from MarshalText; change it to capture the error (e.g., value, err :=
time.Now().MarshalText()) and handle it appropriately in the test by failing the
test on error (use t.Fatalf/t.Fatal or the test helper you use, e.g.,
require.NoError) so the test does not silently ignore a potential MarshalText
failure.
In `@test/e2e/v2/tests/nodepool_autoscaling_test.go`:
- Line 221: The function newAutoscalingWorkload currently accepts nodeSelector
but never uses it (call sites pass ""), and the existing code that attempts to
use it in the nodeSelector map is incorrect (it uses the whole string as a map
key). Fix by either removing the unused nodeSelector parameter from
newAutoscalingWorkload and all its callers to simplify the signature, or if you
intend to support selectors, replace the current logic with proper parsing of a
"key=value" string into a map (split on '=' and set map[key]=value) and validate
empty/invalid input; update the call sites accordingly and remove the dead
conditional that prevents execution when nodeSelector == "".
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 4e829337-e54e-4140-aee6-909410e6f303
📒 Files selected for processing (8)
Makefiletest/e2e/util/util.gotest/e2e/v2/cmd/create-guests/main.gotest/e2e/v2/internal/env_vars.gotest/e2e/v2/tests/control_plane_upgrade_test.gotest/e2e/v2/tests/etcd_chaos_test.gotest/e2e/v2/tests/nodepool_autoscaling_test.gotest/e2e/v2/tests/nodepool_lifecycle_test.go
✅ Files skipped from review due to trivial changes (2)
- test/e2e/v2/internal/env_vars.go
- test/e2e/v2/tests/control_plane_upgrade_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- Makefile
Port autoscaling scale-up/down and multi-NodePool balancing tests from v1 to v2 Ginkgo framework. Label: nodepool-autoscaling
Port TestHAEtcdChaos from v1 to v2 Ginkgo framework. Uses Ordered decorator so tests run sequentially (etcd needs recovery between tests). Assumes HA control plane. Label: etcd-chaos Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Port TestUpgradeControlPlane from v1 to v2 Ginkgo framework. Inlines upgrade wait logic using EventuallyObject/EventuallyObjects (testing.TB) since the v1 wrapper functions require *testing.T which is incompatible with GinkgoTB(). The test updates the hosted cluster release image, then waits for: 1. ControlPlaneComponent resources to report RolloutComplete (4.20+) 2. ControlPlaneVersion status to reach Completed state (4.22+) 3. Data plane (CVO) version history to show CompletedUpdate Post-upgrade validation checks (EnsureFeatureGateStatus, EnsureNoCrashingPods, etc.) are deferred until the e2eutil functions are refactored from *testing.T to testing.TB. Label: control-plane-upgrade Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
GinkgoTB() returns *GinkgoTBWrapper which implements testing.TB but not *testing.T. Widening these signatures lets v2 tests call shared helpers directly instead of inlining them. The change is backwards-compatible since *testing.T satisfies testing.TB. Functions that use t.Run() (Ensure*) cannot be widened and remain as TODOs in the v2 upgrade test. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Port 13 NodePool sub-tests from v1 to v2 Ginkgo framework. Each sub-test creates its own uniquely-named NodePool and cleans up, making tests parallelism-safe for shared clusters. Tests ported: MachineConfig rollout, NTO rollout (Replace + InPlace), NodePool upgrade (Replace + InPlace), rolling upgrade, previous release (N-1, N-2), mirror configs, trust bundle propagation, NTO performance profile, auto-repair skeleton, and disk encryption skeleton. Label: nodepool-lifecycle Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Go binary that wraps the hypershift CLI to create HA clusters for lifecycle tests. CI calls this from the create-selfmanaged-guests step instead of inline shell scripts, keeping cluster creation logic co-located with the tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
002d435 to
03750f5
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
test/e2e/v2/cmd/create-guests/main.go (1)
17-17: ⚡ Quick winUse hosted-cluster terminology in v2 helper text.
Line 17 still uses “create-guests” wording. In
test/e2e/v2, keep terminology aligned to “hosted cluster” / “control plane”.As per coding guidelines "
test/e2e/v2/**/*.go: Use terminology 'hosted cluster' and 'control plane'; never use 'guest cluster'".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/v2/cmd/create-guests/main.go` at line 17, Update the helper text to use "hosted cluster" / "control plane" terminology instead of "guest" — replace the comment and any CLI/help strings that say "create-guests" with wording like "create-hosted-clusters" or "create hosted clusters" (and update any related usage in package main / func main or CLI flag descriptions) so all references in this file (e.g., the top-line comment that currently reads "create-guests creates HA HostedClusters for v2 e2e lifecycle tests.") use the approved "hosted cluster" / "control plane" terminology.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@test/e2e/v2/cmd/create-guests/main.go`:
- Line 17: Update the helper text to use "hosted cluster" / "control plane"
terminology instead of "guest" — replace the comment and any CLI/help strings
that say "create-guests" with wording like "create-hosted-clusters" or "create
hosted clusters" (and update any related usage in package main / func main or
CLI flag descriptions) so all references in this file (e.g., the top-line
comment that currently reads "create-guests creates HA HostedClusters for v2 e2e
lifecycle tests.") use the approved "hosted cluster" / "control plane"
terminology.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: fc5108cd-991e-4e6e-9648-36b14975a5a1
📒 Files selected for processing (7)
Makefiletest/e2e/util/util.gotest/e2e/v2/cmd/create-guests/main.gotest/e2e/v2/tests/control_plane_upgrade_test.gotest/e2e/v2/tests/etcd_chaos_test.gotest/e2e/v2/tests/nodepool_autoscaling_test.gotest/e2e/v2/tests/nodepool_lifecycle_test.go
🚧 Files skipped from review as they are similar to previous changes (6)
- Makefile
- test/e2e/v2/tests/control_plane_upgrade_test.go
- test/e2e/v2/tests/nodepool_autoscaling_test.go
- test/e2e/util/util.go
- test/e2e/v2/tests/etcd_chaos_test.go
- test/e2e/v2/tests/nodepool_lifecycle_test.go
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/e2e/v2/cmd/create-guests/main.go`:
- Around line 708-709: The comment above envOrDefaultInt uses "unparseable"
which trips codespell; update the comment to use the preferred spelling
"unparsable" (e.g., change "// default if empty or unparseable." to "// default
if empty or unparsable.") so the function comment passes make verify-codespell
without changing behavior of envOrDefaultInt.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: da616eb2-58b4-42f8-92e3-6d924aa62d9e
📒 Files selected for processing (6)
Dockerfile.e2eMakefiletest/e2e/v2/cmd/create-guests/main.gotest/e2e/v2/cmd/destroy-guests/main.gotest/e2e/v2/cmd/dump-guests/main.gotest/e2e/v2/cmd/run-tests/main.go
e6229f7 to
87b8c03
Compare
|
/test e2e-azure-v2-self-managed |
Replace the bash scripts in the openshift/release step registry with Go binaries that live in the hypershift repo. This moves CI orchestration closer to the code it tests, making it easier to maintain and evolve. Four binaries under test/e2e/v2/cmd/: - create-guests: creates 4 Azure HostedClusters in parallel (public, private, oauth-lb, HA upgrade) with watch-based waiting instead of poll loops, OperatorConfiguration patching, and JUnit XML output - run-tests: dispatches test-e2e-v2 per cluster with Ginkgo label filters, running upgrade + etcd-chaos sequentially on the HA cluster - destroy-guests: tears down all 4 clusters with best-effort semantics - dump-guests: collects diagnostic artifacts, always exits 0 Cluster names are derived deterministically from PROW_JOB_ID via sha256, matching the existing bash convention. The Dockerfile.e2e and Makefile are updated to build and ship all four binaries. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
87b8c03 to
bc776a2
Compare
|
/test e2e-azure-v2-self-managed |
1 similar comment
|
/test e2e-azure-v2-self-managed |
dc8eca5 to
2e3dbb1
Compare
|
/test pull-ci-openshift-hypershift-main-e2e-azure-v2-self-managed |
|
/test e2e-azure-v2-self-managed |
|
/test pull-ci-openshift-hypershift-main-e2e-azure-v2-self-managed |
|
/test e2e-azure-v2-self-managed |
|
@bryan-cox: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
- Etcd chaos: delete entire member directory instead of a single WAL file so etcd enters CrashLoopBackOff and the recovery controller detects it - Autoscaling scale-up: add node labels to NodePool and nodeSelector to workload so pods target only the autoscaling NP, forcing the autoscaler to scale it up - Autoscaling balancing: configure HostedCluster.Spec.Autoscaling with RandomExpander and BalancingIgnoredLabels so the autoscaler distributes across NodePools instead of favoring one - Split autoscaling tests onto a dedicated cluster to reduce sequential test count in the public group below the step timeout Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2e3dbb1 to
b6a8b20
Compare
|
/test e2e-azure-v2-self-managed |
Summary
Ports 4 remaining v1 lifecycle tests to the v2 Ginkgo framework with platform-agnostic labels, enabling them to run in the Azure self-managed CI pipeline (and any future platform CI).
nodepool-autoscaling): scale-up/down and multi-NodePool balancingnodepool-lifecycle): 13 sub-tests covering machineconfig, NTO, upgrades (replace/in-place/rolling), previous releases, mirror configs, trust bundles, performance profiles, auto-repair, and disk encryptioncontrol-plane-upgrade): N-1 → latest release upgrade with component and version rollout verificationetcd-chaos): 5 sub-tests for single member recovery, random/all member kills, WAL corruption, and missing member recoveryAlso includes:
e2eutilhelper functions from*testing.Ttotesting.TBfor Ginkgo compatibility (GinkgoTB()returnstesting.TB, not*testing.T)create-guestsbinary for CI cluster creation (wraps hypershift CLI, waits for availability)Design decisions
t.Run()(e.g.EnsureNoCrashingPods) remain as TODOs sincetesting.TBdoesn't exposeRun()Companion PR needed:
openshift/releasechanges for CI wiring (create/destroy/dump scripts, label filters, job config)Test plan
make verifypassesmake e2ebuilds clean (v1 backwards-compatible aftertesting.TBchange)make e2ev2builds cleango build -tags e2ev2 ./test/e2e/v2/cmd/create-guests/compilesgo vet -tags e2ev2 ./test/e2e/v2/...clean/test e2e-azure-v2-self-managed(existing tests still pass)🤖 Generated with Claude Code
Summary by CodeRabbit
Tests
New Features
Chores