Skip to content

Migrate self-managed Azure e2e tests to v2 Ginkgo framework#8204

Open
bryan-cox wants to merge 2 commits intoopenshift:mainfrom
bryan-cox:migrate-self-azure-e2e
Open

Migrate self-managed Azure e2e tests to v2 Ginkgo framework#8204
bryan-cox wants to merge 2 commits intoopenshift:mainfrom
bryan-cox:migrate-self-azure-e2e

Conversation

@bryan-cox
Copy link
Copy Markdown
Member

@bryan-cox bryan-cox commented Apr 10, 2026

Summary

  • Migrate self-managed Azure e2e tests from the v1 framework (*testing.T, t.Run()) to v2 (Ginkgo v2 It() blocks) so each test case produces a separate JUnit XML entry for individual Sippy reporting
  • Extract Validate* helpers from Ensure* wrappers accepting testing.TB for cross-framework compatibility
  • Refactor EventuallyObject/EventuallyObjects to accept testing.TB (backward-compatible since *testing.T implements testing.TB)
  • Add GetGuestClient()/SetupGuestClient() to TestContext for guest cluster validation

New test specs (9 total across 3 label groups)

self-managed-azure-public (3 specs):

  • Workload identity webhook mutation
  • KAS allowed CIDRs
  • Ingress Operator configuration

self-managed-azure-private (4 specs, Ordered):

  • Internal LB annotation on private-router Service
  • AzurePrivateLinkService CR with PLS alias
  • Private Endpoint IP populated
  • Private DNS Zone ID populated

self-managed-azure-oauth-lb (2 specs):

  • OAuth Service as LoadBalancer with external IP
  • OAuth token flow through LoadBalancer endpoint

CI workflow (openshift/release PR to follow)

The test binary is invoked 3 times with --ginkgo.label-filter against different guest clusters (public, private, OAuth LB), each producing its own JUnit XML.

Test plan

  • make e2e — all test binaries compile including bin/test-e2e-self-managed-azure
  • bin/test-e2e-self-managed-azure --ginkgo.dry-run --ginkgo.v — lists 9 specs
  • Label filtering works: --ginkgo.label-filter="self-managed-azure-public" runs 3 specs, skips 6
  • E2E_SHOW_ENV_HELP=1 prints Azure-specific env vars
  • go build -tags e2e ./test/e2e/... — v1 tests still compile (backward compatible)
  • Full CI run against real Azure clusters (pending openshift/release PR)

Ref: CNTRLPLANE-3222

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Tests
    • Added comprehensive Azure test suite: workload identity webhook, private topology checks, and OAuth via load balancer.
    • Refactored tests to centralize validation logic and improve polling/timeout robustness.
  • Chores
    • Broadened test helper interfaces for wider reuse across test contexts.
    • Registered two optional Azure environment variables for private network/subscription scenarios.

@openshift-merge-bot
Copy link
Copy Markdown
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci bot commented Apr 10, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bryan-cox

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 10, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 10, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR extracts validation logic from several e2e test wrappers into standalone helpers that accept testing.TB, widens multiple helper signatures from *testing.T to testing.TB, and adds focused validation functions for Azure workload identity webhook mutation, OAuth via LoadBalancer, kube-apiserver allowed CIDRs, and ingress operator configuration. It registers two optional Azure env vars and introduces a new e2e test suite (build tag e2ev2) with Azure-specific tests for public clusters, private topology, and OAuth LoadBalancer flows. Minor adjustments to polling utilities and kubeconfig validation were made.

Sequence Diagram(s)

mermaid
sequenceDiagram
participant TestRunner
participant MgmtAPI as Management API (hosted-cluster CR)
participant GuestAPI as Guest Cluster API (guestClient)
participant AzureLB as Azure LoadBalancer
participant External as External HTTP Client

TestRunner->>MgmtAPI: Read HostedCluster (platform, topology, publishingStrategy)
TestRunner->>GuestAPI: Wait for oauth-openshift Service to be LoadBalancer & have ingress
GuestAPI->>AzureLB: Request/observe LoadBalancer provisioning
AzureLB-->>GuestAPI: Provisioned ingress (IP/hostname)
GuestAPI-->>TestRunner: Service has external endpoint
TestRunner->>External: Perform OAuth token flow via LoadBalancer endpoint
External-->>GuestAPI: HTTP requests to OAuth endpoint (token exchange)
GuestAPI-->>TestRunner: Return token response (validated)


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 2 warnings)

Check name Status Explanation Resolution
Ote Binary Stdout Contract ❌ Error The PrintEnvVarHelp() function writes to stdout using fmt.Print*(), violating the OTE Binary Stdout Contract by corrupting test listing JSON. Replace all fmt.Print*() calls with fmt.Fprint(os.Stderr, ...) variants to redirect help output to stderr.
Docstring Coverage ⚠️ Warning Docstring coverage is 47.06% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Structure And Quality ⚠️ Warning Test code calls GetHostedClusterClient() before kubeconfig is ready, permanently poisoning shared TestContext via sync.Once guard. Review provided specific fixes but they were not implemented. Call WaitForGuestKubeConfig before GetHostedClusterClient in workload identity and ingress specs; remove the call from CIDR spec entirely; delete redundant NotNil assertions.
✅ Passed checks (7 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title directly and clearly describes the main change: migrating self-managed Azure e2e tests from Go testing to Ginkgo v2 framework, which is the primary objective of the changeset.
Stable And Deterministic Test Names ✅ Passed All Ginkgo test titles in the new Azure test file are stable and deterministic with descriptive static strings and no dynamic content in test names.
Microshift Test Compatibility ✅ Passed New Ginkgo e2e tests use only HyperShift custom APIs and standard Kubernetes APIs with proper platform/topology gating. No MicroShift-unavailable OpenShift APIs present.
Single Node Openshift (Sno) Test Compatibility ✅ Passed All new Azure e2e tests are gated on Azure platform type and topology configurations; they validate Azure-specific features only applicable to Hypershift-managed hosted clusters, not SNO deployments.
Topology-Aware Scheduling Compatibility ✅ Passed PR modifies only test infrastructure in test/ directory; no deployment manifests, operator code, or workload specifications that could impose scheduling constraints are changed.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed The new Ginkgo e2e tests in hosted_cluster_azure_test.go do not contain IPv4 assumptions or external internet connectivity requirements that would fail in IPv6-only disconnected environments.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci bot added area/documentation Indicates the PR includes changes for documentation area/platform/azure PR/issue for Azure (AzurePlatform) platform area/testing Indicates the PR includes changes for e2e testing and removed do-not-merge/needs-area labels Apr 10, 2026
@openshift-ci openshift-ci bot requested review from csrwng and jparrill April 10, 2026 19:56
@bryan-cox bryan-cox force-pushed the migrate-self-azure-e2e branch from f5acfaf to a2c6727 Compare April 10, 2026 19:59
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (3)
test/e2e/util/util_ingress_operator_configuration.go (1)

20-20: Unused mgmtClient parameter.

The mgmtClient parameter is declared but not used within ValidateIngressOperatorConfiguration. If it's not needed, consider removing it to keep the function signature clean.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/util/util_ingress_operator_configuration.go` at line 20, The
function ValidateIngressOperatorConfiguration declares an unused parameter
mgmtClient; remove mgmtClient from the function signature and update all call
sites to stop passing that argument (or, if removal is risky, replace the
parameter with _ to explicitly ignore it). Search for
ValidateIngressOperatorConfiguration usages and adjust them accordingly so the
signature and callers remain consistent.
test/e2e/v2/internal/test_context.go (1)

178-182: Consider checking for empty kubeconfig data.

The code checks if the kubeconfig key exists but doesn't verify if the data is non-empty. An empty byte slice would pass this check but fail during REST config creation with a potentially confusing error.

Suggested improvement
 	kubeconfigData, ok := kubeconfigSecret.Data["kubeconfig"]
-	if !ok {
-		return fmt.Errorf("kubeconfig key not found in secret %s/%s", hc.Namespace, kubeconfigSecretName)
+	if !ok || len(kubeconfigData) == 0 {
+		return fmt.Errorf("kubeconfig key not found or empty in secret %s/%s", hc.Namespace, kubeconfigSecretName)
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/v2/internal/test_context.go` around lines 178 - 182, The current
extraction of kubeconfigData only checks presence of the "kubeconfig" key but
not that its value is non-empty; update the logic around kubeconfigData (the
variable extracted from kubeconfigSecret.Data["kubeconfig"]) to verify
len(kubeconfigData) > 0 and return a clear error (including hc.Namespace and
kubeconfigSecretName) when the slice is empty so the subsequent REST config
creation doesn't fail with a confusing error.
test/e2e/util/azure.go (1)

23-25: Consider adding cleanup for test resources.

The test creates a namespace, service account, and pod but doesn't clean them up. While this may be acceptable if tests run in ephemeral clusters, adding deferred cleanup would make the test more robust and prevent resource accumulation in long-running test environments.

Suggested approach
// After creating the namespace
t.Cleanup(func() {
    _ = guestClient.Delete(context.Background(), testNamespace)
})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/util/azure.go` around lines 23 - 25, Add deferred cleanup after
creating test resources: after creating testNamespace (nsName) call t.Cleanup
(or defer) to delete the namespace and any created ServiceAccount/Pod using
guestClient.Delete so resources are removed when the test finishes; reference
guestClient.Delete(ctx, testNamespace) and similar deletes for the
ServiceAccount and Pod objects (use context.Background() or the existing ctx) to
ensure robust teardown.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/util/util.go`:
- Around line 3478-3490: The ValidateKubeAPIServerAllowedCIDRs function mutates
hc.Spec.Networking.APIServer.AllowedCIDRBlocks (via ensureAPIServerAllowedCIDRs)
and does not restore it; capture the original APIServer networking value at the
start (e.g., save a copy of hc.Spec.Networking.APIServer or AllowedCIDRBlocks),
then defer an UpdateObject call that restores the saved value to the hosted
cluster before returning. Use the same mgmtClient.Update/UpdateObject helper
used elsewhere, reference ValidateKubeAPIServerAllowedCIDRs,
hc.Spec.Networking.APIServer.AllowedCIDRBlocks, and ensureAPIServerAllowedCIDRs
so the original CIDRs are reinstated even if the test fails.

In `@test/e2e/v2/selfmanagedazure/private_topology_test.go`:
- Around line 67-101: The current EventuallyObjects usage applies per-object
predicates to every AzurePrivateLinkService returned by the List, causing
failures when unrelated or still-reconciling PLSes exist; update the logic in
the AzurePrivateLinkService check (the anonymous collection predicate passed to
EventuallyObjects that inspects AzurePrivateLinkServiceList /
AzurePrivateLinkService items) so it either filters the list down to the target
object(s) before returning or implements an "any-match" readiness test (i.e.,
scan items and succeed as soon as one AzurePrivateLinkService has a non-empty
Status.PrivateLinkServiceAlias / endpoint IP / DNS zone), rather than requiring
all items to be ready. Ensure this change is applied to the three similar blocks
that check alias/endpoint IP/DNS zone so EventuallyObjects observes at least one
ready item.

---

Nitpick comments:
In `@test/e2e/util/azure.go`:
- Around line 23-25: Add deferred cleanup after creating test resources: after
creating testNamespace (nsName) call t.Cleanup (or defer) to delete the
namespace and any created ServiceAccount/Pod using guestClient.Delete so
resources are removed when the test finishes; reference guestClient.Delete(ctx,
testNamespace) and similar deletes for the ServiceAccount and Pod objects (use
context.Background() or the existing ctx) to ensure robust teardown.

In `@test/e2e/util/util_ingress_operator_configuration.go`:
- Line 20: The function ValidateIngressOperatorConfiguration declares an unused
parameter mgmtClient; remove mgmtClient from the function signature and update
all call sites to stop passing that argument (or, if removal is risky, replace
the parameter with _ to explicitly ignore it). Search for
ValidateIngressOperatorConfiguration usages and adjust them accordingly so the
signature and callers remain consistent.

In `@test/e2e/v2/internal/test_context.go`:
- Around line 178-182: The current extraction of kubeconfigData only checks
presence of the "kubeconfig" key but not that its value is non-empty; update the
logic around kubeconfigData (the variable extracted from
kubeconfigSecret.Data["kubeconfig"]) to verify len(kubeconfigData) > 0 and
return a clear error (including hc.Namespace and kubeconfigSecretName) when the
slice is empty so the subsequent REST config creation doesn't fail with a
confusing error.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 1799fad7-46e9-4a05-9392-fccbd51ba988

📥 Commits

Reviewing files that changed from the base of the PR and between 4402b11 and f5acfaf.

📒 Files selected for processing (14)
  • Makefile
  • docs/superpowers/plans/2026-04-10-migrate-self-managed-azure-e2e-to-v2.md
  • docs/superpowers/specs/2026-04-10-migrate-self-managed-azure-e2e-to-v2-design.md
  • test/e2e/util/azure.go
  • test/e2e/util/eventually.go
  • test/e2e/util/oauth.go
  • test/e2e/util/util.go
  • test/e2e/util/util_ingress_operator_configuration.go
  • test/e2e/v2/internal/env_vars.go
  • test/e2e/v2/internal/test_context.go
  • test/e2e/v2/selfmanagedazure/oauth_lb_test.go
  • test/e2e/v2/selfmanagedazure/private_topology_test.go
  • test/e2e/v2/selfmanagedazure/public_cluster_test.go
  • test/e2e/v2/selfmanagedazure/suite_test.go

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
test/e2e/util/oauth.go (1)

128-137: ⚠️ Potential issue | 🟠 Major

Add per-attempt request context and client timeout in OAuth token polling.

httpClient.Do(request) reuses a request not bound to the poll callback context and uses no client timeout. A slow/hung network call can outlive PollUntilContextTimeout, causing long or stuck test runs.

Suggested fix
-	httpClient := &http.Client{Transport: transport}
+	httpClient := &http.Client{
+		Transport: transport,
+		Timeout:   15 * time.Second,
+	}
 	httpClient.CheckRedirect = func(req *http.Request, via []*http.Request) error {
 		// don't resolve redirects and return the response instead
 		return http.ErrUseLastResponse
 	}
@@
-		resp, err := httpClient.Do(request)
+		req := request.Clone(ctx)
+		resp, err := httpClient.Do(req)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/util/oauth.go` around lines 128 - 137, The polling callback uses a
pre-built request and an http.Client with no timeout, so individual attempts can
hang beyond PollUntilContextTimeout; inside the wait.PollUntilContextTimeout
callback create a new per-attempt request bound to the provided ctx (use
http.NewRequestWithContext or request.Clone().WithContext(ctx)) and use an
http.Client with a per-attempt timeout (set httpClient.Timeout or construct a
short-lived client inside the callback) before calling httpClient.Do(request) so
each attempt is cancellable by the poll context and cannot hang indefinitely;
update references around httpClient.Do, request, and the transport/CheckRedirect
setup accordingly.
🧹 Nitpick comments (1)
test/e2e/v2/selfmanagedazure/suite_test.go (1)

27-39: Fail fast when only one hosted-cluster env var is set.

Line 36-Line 39 currently skips guest setup silently if one of the two hosted-cluster env vars is missing, which can defer failures into specs with less actionable errors. Add an explicit pair-validation check in BeforeSuite.

Suggested patch
 var _ = BeforeSuite(func() {
 	ctx := context.Background()
 	ctrl.SetLogger(zap.New())
 
+	hcName := internal.GetEnvVarValue("E2E_HOSTED_CLUSTER_NAME")
+	hcNamespace := internal.GetEnvVarValue("E2E_HOSTED_CLUSTER_NAMESPACE")
+	if (hcName == "") != (hcNamespace == "") {
+		Fail("E2E_HOSTED_CLUSTER_NAME and E2E_HOSTED_CLUSTER_NAMESPACE must be set together")
+	}
+
 	testCtx, err := internal.SetupTestContextFromEnv(ctx)
 	Expect(err).NotTo(HaveOccurred(), "failed to setup test context")
 	Expect(testCtx).NotTo(BeNil(), "test context should not be nil")

As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/v2/selfmanagedazure/suite_test.go` around lines 27 - 39, BeforeSuite
currently silently skips guest client setup when only one of testCtx.ClusterName
or testCtx.ClusterNamespace is set; add an explicit pair-validation right after
testCtx is created: if exactly one of testCtx.ClusterName or
testCtx.ClusterNamespace is empty, fail fast (use Expect/Fail) with a clear
message indicating both env vars must be provided together. Locate the
BeforeSuite block and add the check referencing testCtx.ClusterName,
testCtx.ClusterNamespace and testCtx.SetupGuestClient so the suite errors
immediately instead of deferring to later specs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@test/e2e/util/oauth.go`:
- Around line 128-137: The polling callback uses a pre-built request and an
http.Client with no timeout, so individual attempts can hang beyond
PollUntilContextTimeout; inside the wait.PollUntilContextTimeout callback create
a new per-attempt request bound to the provided ctx (use
http.NewRequestWithContext or request.Clone().WithContext(ctx)) and use an
http.Client with a per-attempt timeout (set httpClient.Timeout or construct a
short-lived client inside the callback) before calling httpClient.Do(request) so
each attempt is cancellable by the poll context and cannot hang indefinitely;
update references around httpClient.Do, request, and the transport/CheckRedirect
setup accordingly.

---

Nitpick comments:
In `@test/e2e/v2/selfmanagedazure/suite_test.go`:
- Around line 27-39: BeforeSuite currently silently skips guest client setup
when only one of testCtx.ClusterName or testCtx.ClusterNamespace is set; add an
explicit pair-validation right after testCtx is created: if exactly one of
testCtx.ClusterName or testCtx.ClusterNamespace is empty, fail fast (use
Expect/Fail) with a clear message indicating both env vars must be provided
together. Locate the BeforeSuite block and add the check referencing
testCtx.ClusterName, testCtx.ClusterNamespace and testCtx.SetupGuestClient so
the suite errors immediately instead of deferring to later specs.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: e4476d71-f433-4cea-8376-4e688ec4a6fb

📥 Commits

Reviewing files that changed from the base of the PR and between f5acfaf and a2c6727.

📒 Files selected for processing (12)
  • Makefile
  • test/e2e/util/azure.go
  • test/e2e/util/eventually.go
  • test/e2e/util/oauth.go
  • test/e2e/util/util.go
  • test/e2e/util/util_ingress_operator_configuration.go
  • test/e2e/v2/internal/env_vars.go
  • test/e2e/v2/internal/test_context.go
  • test/e2e/v2/selfmanagedazure/oauth_lb_test.go
  • test/e2e/v2/selfmanagedazure/private_topology_test.go
  • test/e2e/v2/selfmanagedazure/public_cluster_test.go
  • test/e2e/v2/selfmanagedazure/suite_test.go
✅ Files skipped from review due to trivial changes (3)
  • test/e2e/v2/selfmanagedazure/oauth_lb_test.go
  • test/e2e/v2/selfmanagedazure/private_topology_test.go
  • test/e2e/v2/selfmanagedazure/public_cluster_test.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • test/e2e/util/util_ingress_operator_configuration.go
  • test/e2e/v2/internal/test_context.go
  • test/e2e/util/azure.go
  • test/e2e/util/util.go

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 34.63%. Comparing base (957e86b) to head (6b4f9ea).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #8204   +/-   ##
=======================================
  Coverage   34.63%   34.63%           
=======================================
  Files         767      767           
  Lines       93186    93186           
=======================================
  Hits        32277    32277           
  Misses      58236    58236           
  Partials     2673     2673           
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
test/e2e/v2/selfmanagedazure/suite_test.go (1)

27-42: Consider using Ginkgo-managed context in BeforeSuite for proper cancellation handling.

BeforeSuite in Ginkgo v2 can accept a context.Context parameter that Ginkgo manages. Using context.Background() means the context won't be cancelled if the suite is interrupted during setup, potentially causing hanging operations.

♻️ Suggested improvement
-var _ = BeforeSuite(func() {
-	ctx := context.Background()
+var _ = BeforeSuite(func(ctx context.Context) {
 	ctrl.SetLogger(zap.New())
 
 	testCtx, err := internal.SetupTestContextFromEnv(ctx)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/v2/selfmanagedazure/suite_test.go` around lines 27 - 42, The
BeforeSuite is creating ctx with context.Background() so it isn’t cancelled by
Ginkgo; change the BeforeSuite handler to accept the Ginkgo-managed context
parameter (use BeforeSuite(func(ctx context.Context) { ... })), then pass that
ctx into internal.SetupTestContextFromEnv and into
testCtx.SetupHostedClusterClient (instead of the background context) so both
SetupTestContextFromEnv and SetupHostedClusterClient use the Ginkgo-managed
cancellable context, and keep the final internal.SetTestContext(testCtx) call
as-is.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@test/e2e/v2/selfmanagedazure/suite_test.go`:
- Around line 27-42: The BeforeSuite is creating ctx with context.Background()
so it isn’t cancelled by Ginkgo; change the BeforeSuite handler to accept the
Ginkgo-managed context parameter (use BeforeSuite(func(ctx context.Context) {
... })), then pass that ctx into internal.SetupTestContextFromEnv and into
testCtx.SetupHostedClusterClient (instead of the background context) so both
SetupTestContextFromEnv and SetupHostedClusterClient use the Ginkgo-managed
cancellable context, and keep the final internal.SetTestContext(testCtx) call
as-is.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: c527c971-2ae7-499f-9221-8499b7350e2e

📥 Commits

Reviewing files that changed from the base of the PR and between a2c6727 and 689ece0.

📒 Files selected for processing (5)
  • test/e2e/v2/internal/test_context.go
  • test/e2e/v2/selfmanagedazure/oauth_lb_test.go
  • test/e2e/v2/selfmanagedazure/private_topology_test.go
  • test/e2e/v2/selfmanagedazure/public_cluster_test.go
  • test/e2e/v2/selfmanagedazure/suite_test.go
✅ Files skipped from review due to trivial changes (3)
  • test/e2e/v2/selfmanagedazure/oauth_lb_test.go
  • test/e2e/v2/selfmanagedazure/public_cluster_test.go
  • test/e2e/v2/selfmanagedazure/private_topology_test.go

@hypershift-jira-solve-ci
Copy link
Copy Markdown

Now I have all the evidence I need for a comprehensive report. Let me compile the final analysis.

Test Failure Analysis Complete

Job Information

  • PR: openshift/hypershift#8204Migrate self-managed Azure e2e tests to v2 Ginkgo framework
  • Head Commit: 689ece099c54419cba2fedf21d1860b6776c547a (pushed 2026-04-13T13:34:26Z)
  • Failed Checks:
    1. Red Hat Konflux / hypershift-operator-enterprise-contract / hypershift-operator-main2 failures / 242 successes / 26 warnings
    2. Red Hat Konflux / hypershift-operator-main-enterprise-contract / hypershift-operator-main2 failures / 242 successes / 26 warnings
    3. Red Hat Konflux / enterprise-contract-mce-217 / hypershift-release-mce-2172 failures / 258 successes / 25 warnings

Key Finding: These Are NOT Prow CI Jobs

These are Red Hat Konflux Enterprise Contract (EC) verification checks, not Prow CI test jobs. EC verification runs after Konflux builds the container image and checks the image against compliance policies (signatures, attestations, provenance, SBOM, etc.). The failure details are only accessible through the Konflux UI (a single-page app requiring authentication), not through the GitHub API or GCS artifacts.


Root Cause Analysis

The PR Does Not Cause These Failures

PR #8204 modifies only test files and the Makefile — none of which affect the container image:

File Change
Makefile Adds selfmanaged-azure-e2e target
test/e2e/util/*.go Refactors *testing.Ttesting.TB
test/e2e/v2/selfmanagedazure/*.go 4 new test files (build-tagged e2ev2)
test/e2e/v2/internal/*.go Adds HostedClusterClient helper

The Containerfile.operator (unchanged by this PR) builds hypershift and hypershift-operator binaries — it does not compile e2ev2-tagged test files. The previous commit on this same PR (a2c6727, April 10) passed all 4 EC checks with identical container build instructions.

Root Cause: Transient Attestation/Signing Race Condition

Timing analysis of the Konflux pipeline runs reveals a clear pattern — EC checks that started too soon after the image build completed failed, while the one that had more propagation time passed:

Check Build Completed EC Started Gap Result
hypershift-operator-enterprise-contract 13:46:24 13:46:26 2 seconds ❌ FAILED
hypershift-operator-main-enterprise-contract 13:46:24 13:46:26 2 seconds ❌ FAILED
enterprise-contract-mce-217 / release 13:46:17 13:34:40 Started BEFORE build ❌ FAILED
enterprise-contract-mce-217 / cli 13:56:37 13:56:40 3 seconds (but build took 10 min longer) ✅ PASSED

The hypershift-cli-mce-217 build took ~10 minutes longer to complete (13:56:37 vs 13:46:24), giving the Tekton Chains signing/attestation pipeline more time to propagate cosign signatures and SBOM attestations to the registry. Its EC check passed.

The enterprise-contract-mce-217 / release EC check started at 13:34:40, 12 minutes before the release image build completed at 13:46:17 — it was polling/waiting for the image and likely evaluated it before attestations were fully propagated.

Corroborating Evidence

  1. Same EC checks pass on other PRs: PRs CNTRLPLANE-3233: ci(gha): add release-4.22 branch to GitHub Actions workflows #8217 and CNTRLPLANE-3215: fix envtest GHA workflow path filters for CRD test suites #8200 (merged same day) pass all EC checks with identical pipeline configuration
  2. Same PR, previous commit passed: Commit a2c6727 (April 10) passed all 4 EC checks
  3. No container build changes: The Containerfile, .tekton/ pipeline definitions, and build dependencies are untouched by this PR
  4. Partial failure pattern: 3 of 4 EC checks failed, but the 4th (hypershift-cli-mce-217) passed — ruling out a global EC policy change

Evidence

Stack Trace / Error Details

The specific 2 policy violations per check are not accessible through the GitHub API (only summary counts are exposed). The detailed violation messages are in the Konflux UI pipeline logs at:

Based on the timing pattern, the 2 failures are most likely in attestation-related policies (e.g., attestation_task.known, slsa_build_scripted_build, or similar provenance checks that require cosign signatures to be available in the registry).


Recommendations

  1. Retrigger the Konflux build — push a trivial update (whitespace, comment) to the PR branch to force a new on-pull-request build. The newly built image will have attestations propagated before EC verification starts (the scheduling race is non-deterministic and typically self-resolves on retry).

  2. No code changes needed — the PR's test-only changes do not affect the container image and are not the cause of these failures.

  3. If failures persist after retrigger — check the Konflux UI verify logs (links above) for the specific 2 policy violations to rule out a Konflux infrastructure issue or an EC policy update that coincided with this build.


Artifacts


@bryan-cox bryan-cox force-pushed the migrate-self-azure-e2e branch from 689ece0 to a6b95c7 Compare April 13, 2026 17:41
@bryan-cox
Copy link
Copy Markdown
Member Author

Addressed all CodeRabbit review feedback in 161bb4a:

Major fixes:

  • CIDR restore (util.go): ValidateKubeAPIServerAllowedCIDRs now saves original APIServer networking state and defers restoration after test
  • PLS any-match (hosted_cluster_azure_test.go): All three EventuallyObjects PLS checks now use collection-level any-match predicates instead of per-item predicates that require all items to pass
  • OAuth polling (oauth.go): Added request.Clone(ctx) for per-attempt context binding and httpClient.Timeout = 15s to prevent hung requests outliving the poll

Nitpick fixes:

  • Unused mgmtClient (util_ingress_operator_configuration.go): Removed from ValidateIngressOperatorConfiguration and EnsureIngressOperatorConfiguration signatures + updated all callers
  • Namespace cleanup (azure.go): Added defer guestClient.Delete() for the test namespace in ValidateAzureWorkloadIdentityWebhookMutation
  • Empty kubeconfig (test_context.go): GetHostedClusterClient now checks len(kubeconfigData) == 0 in addition to key presence

Not applicable (files deleted):

  • suite_test.go comments (Ginkgo-managed context, env var pair-validation): The selfmanagedazure/ package was deleted — tests now live in the shared v2 binary at test/e2e/v2/tests/hosted_cluster_azure_test.go

bryan-cox and others added 2 commits April 13, 2026 14:30
…pers

Widen *testing.T parameters to testing.TB in shared e2e util functions so
they work with both v1 (testing.T) and v2 (GinkgoTB()) callers. Extract
Validate* functions from Ensure* wrappers to separate core logic from
t.Run() framing:

- EventuallyObject/EventuallyObjects: *testing.T -> testing.TB
- UpdateObject/WaitForGuestKubeConfig: *testing.T -> testing.TB
- Extract ValidateAzureWorkloadIdentityWebhookMutation from Ensure*
- Extract ValidateKubeAPIServerAllowedCIDRs from Ensure*
- Extract ValidateIngressOperatorConfiguration from Ensure*
- Extract ValidateOAuthWithIdentityProviderViaLoadBalancer from Ensure*

Ensure* functions remain unchanged for v1 compatibility.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@bryan-cox bryan-cox force-pushed the migrate-self-azure-e2e branch from 161bb4a to 6b4f9ea Compare April 13, 2026 18:31
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
test/e2e/util/util.go (1)

320-324: ⚠️ Potential issue | 🟠 Major

Require non-empty kubeconfig data before returning.

This helper still treats secret.Data["kubeconfig"] as ready when the value is empty, and several callers immediately pass the bytes into RESTConfigFromKubeConfig. That reintroduces the same empty-secret flake you just fixed in test/e2e/v2/internal/test_context.go.

💡 Suggested fix
 		[]Predicate[*corev1.Secret]{
 			func(secret *corev1.Secret) (done bool, reasons string, err error) {
 				var hasData bool
 				data, hasData = secret.Data["kubeconfig"]
-				return hasData, "expected secret to contain kubeconfig in data", nil
+				return hasData && len(data) > 0, "expected secret to contain non-empty kubeconfig data", nil
 			},
 		},
 	)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/util/util.go` around lines 320 - 324, The predicate that checks
secret.Data["kubeconfig"] should require non-empty bytes before returning
success: in the anonymous Predicate[*corev1.Secret] function (the one that
currently does `data, hasData = secret.Data["kubeconfig"]`), change the
condition to ensure data is not nil and len(data) > 0 (e.g., `hasData &&
len(data) > 0`) and only then return done=true; otherwise return done=false with
the same reason and nil error so callers (like RESTConfigFromKubeConfig) never
receive an empty kubeconfig byte slice.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/v2/tests/hosted_cluster_azure_test.go`:
- Around line 54-56: Do not call TestContext.GetHostedClusterClient() before the
HostedCluster kubeconfig is published because GetHostedClusterClient is cached
via sync.Once and will return nil if hc.Status.KubeConfig isn't set; instead,
for specs that need a guest client (where you currently call getTestCtx() and
hostedClusterClient := testCtx.GetHostedClusterClient()), first wait for the
HostedCluster kubeconfig to be published (e.g., poll/wait for
hc.Status.KubeConfig to be non-nil or use the existing wait helper), and only
then invoke TestContext.GetHostedClusterClient(); for the CIDR spec that doesn't
use the client at all, remove the preflight GetHostedClusterClient() call
entirely.

---

Outside diff comments:
In `@test/e2e/util/util.go`:
- Around line 320-324: The predicate that checks secret.Data["kubeconfig"]
should require non-empty bytes before returning success: in the anonymous
Predicate[*corev1.Secret] function (the one that currently does `data, hasData =
secret.Data["kubeconfig"]`), change the condition to ensure data is not nil and
len(data) > 0 (e.g., `hasData && len(data) > 0`) and only then return done=true;
otherwise return done=false with the same reason and nil error so callers (like
RESTConfigFromKubeConfig) never receive an empty kubeconfig byte slice.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: ef569464-8941-442e-ab92-348ffd21b768

📥 Commits

Reviewing files that changed from the base of the PR and between a6b95c7 and 161bb4a.

📒 Files selected for processing (7)
  • test/e2e/create_cluster_test.go
  • test/e2e/util/azure.go
  • test/e2e/util/oauth.go
  • test/e2e/util/util.go
  • test/e2e/util/util_ingress_operator_configuration.go
  • test/e2e/v2/internal/test_context.go
  • test/e2e/v2/tests/hosted_cluster_azure_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/e2e/util/azure.go

Comment on lines +54 to +56
testCtx := getTestCtx()
hostedClusterClient := testCtx.GetHostedClusterClient()
Expect(hostedClusterClient).NotTo(BeNil(), "hosted cluster client is nil; HostedCluster may not have KubeConfig status set")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Avoid calling GetHostedClusterClient() before kubeconfig is published.

TestContext.GetHostedClusterClient() is guarded by sync.Once, and it returns nil when hc.Status.KubeConfig is not set yet. Calling it too early here can permanently poison the shared TestContext for later specs. The CIDR spec is worse because it doesn't use the returned client at all.

For the specs that need a guest client, wait for kubeconfig first and only then initialize the cached client. For the CIDR spec, just drop the preflight call entirely.

💡 Suggested fix
 		It("should mutate pods with workload identity federated credentials", func() {
 			e2eutil.GinkgoAtLeast(e2eutil.Version422)
 			testCtx := getTestCtx()
+			_ = e2eutil.WaitForGuestKubeConfig(GinkgoTB(), testCtx.Context, testCtx.MgmtClient, testCtx.GetHostedCluster())
 			hostedClusterClient := testCtx.GetHostedClusterClient()
 			Expect(hostedClusterClient).NotTo(BeNil(), "hosted cluster client is nil; HostedCluster may not have KubeConfig status set")

 			e2eutil.ValidateAzureWorkloadIdentityWebhookMutation(GinkgoTB(), testCtx.Context, hostedClusterClient)
 		})
@@
 		It("should have expected KAS allowed CIDRs", func() {
 			testCtx := getTestCtx()
 			hc := testCtx.GetHostedCluster()
-			hostedClusterClient := testCtx.GetHostedClusterClient()
-			Expect(hostedClusterClient).NotTo(BeNil(), "hosted cluster client is nil; HostedCluster may not have KubeConfig status set")

 			kubeconfigData := e2eutil.WaitForGuestKubeConfig(GinkgoTB(), testCtx.Context, testCtx.MgmtClient, hc)
 			restConfig, err := clientcmd.RESTConfigFromKubeConfig(kubeconfigData)
 			Expect(err).NotTo(HaveOccurred(), "failed to create hosted cluster REST config")
@@
 		It("should have Ingress Operator configuration applied", func() {
 			e2eutil.GinkgoAtLeast(e2eutil.Version421)
 			testCtx := getTestCtx()
 			hc := testCtx.GetHostedCluster()
+			_ = e2eutil.WaitForGuestKubeConfig(GinkgoTB(), testCtx.Context, testCtx.MgmtClient, hc)
 			hostedClusterClient := testCtx.GetHostedClusterClient()
 			Expect(hostedClusterClient).NotTo(BeNil(), "hosted cluster client is nil; HostedCluster may not have KubeConfig status set")

Also applies to: 64-65, 78-79

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/v2/tests/hosted_cluster_azure_test.go` around lines 54 - 56, Do not
call TestContext.GetHostedClusterClient() before the HostedCluster kubeconfig is
published because GetHostedClusterClient is cached via sync.Once and will return
nil if hc.Status.KubeConfig isn't set; instead, for specs that need a guest
client (where you currently call getTestCtx() and hostedClusterClient :=
testCtx.GetHostedClusterClient()), first wait for the HostedCluster kubeconfig
to be published (e.g., poll/wait for hc.Status.KubeConfig to be non-nil or use
the existing wait helper), and only then invoke
TestContext.GetHostedClusterClient(); for the CIDR spec that doesn't use the
client at all, remove the preflight GetHostedClusterClient() call entirely.

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci bot commented Apr 13, 2026

@bryan-cox: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/documentation Indicates the PR includes changes for documentation area/platform/azure PR/issue for Azure (AzurePlatform) platform area/testing Indicates the PR includes changes for e2e testing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant