
OCPBUGS-81686: fix(authentication): use v2 auth validation for CEL and expression support.#8246

Open
ShazaAldawamneh wants to merge 1 commit into openshift:main from ShazaAldawamneh:OCPBUGS-81686

Conversation

@ShazaAldawamneh
Contributor

@ShazaAldawamneh ShazaAldawamneh commented Apr 15, 2026

What this PR does / why we need it:

Fixes OCPBUGS-81686
The validation logic was using v1/kas, which doesn't support CEL claim
validation rules or expression-based username mappings. This caused
errors when users configured ExternalOIDC with these features.

Changes:

  • Export GenerateAuthConfig in v2/kas/auth.go to match v1 signature
  • Update support/validations/authentication.go to import v2/kas
  • Add test case for CEL validation rules and username expressions

Which issue(s) this PR fixes:

Fixes OCPBUGS-81686

Checklist:

  • [Y] Subject and description added to both the commit and the PR.
  • [Y] Relevant issues have been referenced.
  • [Y] This change includes unit tests.

Summary by CodeRabbit

  • Refactor

    • Authentication config generation now builds from the cluster Authentication spec with an explicit namespace; validation flows updated to use the revised generation path for consistency.
  • Tests

    • OIDC validation tests expanded with feature-gated cases covering CEL-based claim validation and CEL-driven username mapping, including success and syntax-error failure scenarios.

@openshift-merge-bot
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci-robot openshift-ci-robot added the jira/severity-critical Referenced Jira bug's severity is critical for the branch this PR is targeting. label Apr 15, 2026
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 15, 2026
@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Apr 15, 2026
@openshift-ci-robot

@ShazaAldawamneh: This pull request references Jira Issue OCPBUGS-81686, which is invalid:

  • expected the bug to target the "5.0.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot requested review from cblecker and jparrill April 15, 2026 11:56
@openshift-ci openshift-ci Bot added area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release and removed do-not-merge/needs-area labels Apr 15, 2026
@codecov

codecov Bot commented Apr 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 37.39%. Comparing base (3fc6979) to head (cba1008).
⚠️ Report is 55 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8246      +/-   ##
==========================================
+ Coverage   37.23%   37.39%   +0.16%     
==========================================
  Files         750      751       +1     
  Lines       91798    91809      +11     
==========================================
+ Hits        34181    34336     +155     
+ Misses      54978    54838     -140     
+ Partials     2639     2635       -4     

| Files with missing lines | Coverage Δ |
| --- | --- |
| ...ator/controllers/hostedcontrolplane/v2/kas/auth.go | 83.19% <100.00%> (+0.14%) ⬆️ |
| support/validations/authentication.go | 35.13% <100.00%> (ø) |

... and 34 files with indirect coverage changes

| Flag | Coverage Δ |
| --- | --- |
| cmd-support | 32.56% <100.00%> (+0.49%) ⬆️ |
| cpo-hostedcontrolplane | 36.49% <100.00%> (+<0.01%) ⬆️ |
| cpo-other | 37.73% <ø> (ø) |
| hypershift-operator | 47.85% <ø> (ø) |
| other | 27.77% <ø> (ø) |

Comment thread support/validations/authentication.go
Comment thread control-plane-operator/controllers/hostedcontrolplane/v2/kas/auth_test.go Outdated
@coderabbitai
Contributor

coderabbitai Bot commented Apr 16, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

The change renames and exports generateAuthConfig to GenerateAuthConfig(spec *configv1.AuthenticationSpec, ctx context.Context, c crclient.Reader, namespace string), removing the hyperv1 dependency and accepting an explicit AuthenticationSpec and namespace. adaptAuthConfig was updated to call the new function with cpContext.HCP.Spec.Configuration.Authentication and cpContext.HCP.Namespace. The validations package now imports controllers/hostedcontrolplane/v2/kas and uses its GenerateAuthConfig and HCPAuthConfigToAPIServerAuthConfig paths. Tests add two OIDC-focused cases exercising CEL claim validation (one valid, one with invalid CEL syntax).

🚥 Pre-merge checks | ✅ 11 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (11 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main change: switching to v2 authentication validation to enable CEL and expression support, which directly aligns with the changeset's core objective. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | Test changes introduce two new test cases with static, deterministic names that are descriptive with no dynamic content and stable across multiple runs. |
| Test Structure And Quality | ✅ Passed | Test follows standard Go testing practices with table-driven patterns, proper feature gate setup, single responsibility per case, and meaningful assertion messages consistent with codebase conventions. |
| Microshift Test Compatibility | ✅ Passed | PR modifications do not add any Ginkgo e2e tests, only standard Go unit tests to authentication_test.go for OIDC validation with CEL expressions. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | PR adds only unit tests to authentication_test.go using Go's standard testing package, not Ginkgo e2e tests, so topology check is not applicable. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | PR changes are limited to authentication configuration generation and validation logic with no scheduling constraints introduced. |
| Ote Binary Stdout Contract | ✅ Passed | No process-level code (main, init, TestMain, BeforeSuite) that writes to stdout is introduced. Changes are function refactoring, import updates, and test-contained additions only. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | The PR adds unit tests in support/validations/authentication_test.go, not Ginkgo e2e tests. Tests use standard Go testing.T and validate OIDC configuration locally without network requirements. |

Contributor

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
control-plane-operator/controllers/hostedcontrolplane/v2/kas/auth.go (1)

61-71: ⚠️ Potential issue | 🟡 Minor

Handle nil spec in exported GenerateAuthConfig to avoid panic.

spec is dereferenced at Line 70 without a nil check. Since this function is now exported, any caller that passes a nil spec would trigger a panic.

Suggested fix
 func GenerateAuthConfig(spec *configv1.AuthenticationSpec, ctx context.Context, c crclient.Reader, namespace string) (*AuthenticationConfiguration, error) {
+	if spec == nil {
+		return nil, fmt.Errorf("authentication spec must not be nil")
+	}
+
 	config := &AuthenticationConfiguration{
 		TypeMeta: metav1.TypeMeta{
 			Kind:       "AuthenticationConfiguration",
 			APIVersion: "apiserver.config.k8s.io/v1alpha1",
 		},
 		JWT: []JWTAuthenticator{},
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@control-plane-operator/controllers/hostedcontrolplane/v2/kas/auth.go` around
lines 61 - 71, GenerateAuthConfig currently dereferences the incoming spec
(spec.OIDCProviders) without a nil check which can panic when called with a nil
spec; add an initial nil guard in GenerateAuthConfig that returns a clear error
(or an empty AuthenticationConfiguration and nil error per project convention)
if spec == nil, then proceed to iterate over spec.OIDCProviders and call
generateJWTForProvider as before; ensure the check is in the exported function
GenerateAuthConfig so callers cannot trigger a nil dereference.
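
The nil guard described in this review comment can be sketched in isolation. The snippet below is only an illustration: `AuthenticationSpec`, `OIDCProvider`, and `AuthenticationConfiguration` are simplified, hypothetical stand-ins for the real types, and the function takes only the spec rather than the full signature from the PR. It shows the "return a clear error" option; the review also allows returning an empty configuration if that matches project convention.

```go
package main

import (
	"errors"
	"fmt"
)

// OIDCProvider and AuthenticationSpec are hypothetical, trimmed-down
// stand-ins for the configv1 types; only the fields the sketch needs exist.
type OIDCProvider struct {
	Name string
}

type AuthenticationSpec struct {
	OIDCProviders []OIDCProvider
}

// AuthenticationConfiguration is a stand-in for the generated config.
type AuthenticationConfiguration struct {
	JWT []string
}

var errNilSpec = errors.New("authentication spec must not be nil")

// GenerateAuthConfig mirrors only the shape of the exported function: it
// rejects a nil spec before spec.OIDCProviders is ever dereferenced.
func GenerateAuthConfig(spec *AuthenticationSpec) (*AuthenticationConfiguration, error) {
	if spec == nil {
		return nil, errNilSpec
	}
	config := &AuthenticationConfiguration{JWT: []string{}}
	for _, p := range spec.OIDCProviders {
		config.JWT = append(config.JWT, p.Name)
	}
	return config, nil
}

func main() {
	// A nil spec yields an error instead of a panic.
	if _, err := GenerateAuthConfig(nil); err != nil {
		fmt.Println("nil spec rejected:", err)
	}
	// A populated spec produces one entry per provider.
	cfg, _ := GenerateAuthConfig(&AuthenticationSpec{
		OIDCProviders: []OIDCProvider{{Name: "example"}},
	})
	fmt.Println("authenticators:", len(cfg.JWT))
}
```

Placing the check at the top of the exported function keeps the guarantee local, so callers outside the package do not need to know which fields are dereferenced later.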

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: b57baf8c-183a-49ef-908f-81ee02324bb1

📥 Commits

Reviewing files that changed from the base of the PR and between 387b8be and 95ea6e1.

📒 Files selected for processing (3)
  • control-plane-operator/controllers/hostedcontrolplane/v2/kas/auth.go
  • support/validations/authentication.go
  • support/validations/authentication_test.go

@cblecker
Member

/uncc

@openshift-ci openshift-ci Bot removed the request for review from cblecker April 16, 2026 16:10
@ShazaAldawamneh ShazaAldawamneh changed the title from "[WIP]: OCPBUGS-81686: fix(authentication): use v2 auth validation for CEL and expression support." to "OCPBUGS-81686: fix(authentication): use v2 auth validation for CEL and expression support." Apr 17, 2026
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 17, 2026
Contributor

@everettraven everettraven left a comment

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Apr 17, 2026
@openshift-merge-bot
Contributor

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aks
/test e2e-aws
/test e2e-aws-upgrade-hypershift-operator
/test e2e-azure-self-managed
/test e2e-kubevirt-aws-ovn-reduced
/test e2e-v2-aws

@hypershift-jira-solve-ci

AI Test Failure Analysis

Job: pull-ci-openshift-hypershift-main-e2e-azure-self-managed | Build: 2046677844045598720 | Cost: $3.0425726999999996 | Failed step: hypershift-azure-run-e2e-self-managed

View full analysis report


Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6

@hypershift-jira-solve-ci

AI Test Failure Analysis

Job: pull-ci-openshift-hypershift-main-e2e-aws | Build: 2046677843923963904 | Cost: $1.7779251500000006 | Failed step: hypershift-aws-run-e2e-nested

View full analysis report


Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6

@ShazaAldawamneh
Contributor Author

/retest-required

@hypershift-jira-solve-ci

AI Test Failure Analysis

Job: pull-ci-openshift-hypershift-main-e2e-azure-self-managed | Build: 2046860630912143360 | Cost: $1.8763985000000003 | Failed step: hypershift-azure-run-e2e-self-managed

View full analysis report


Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6

@ShazaAldawamneh
Contributor Author

/retest-required

@hypershift-jira-solve-ci

AI Test Failure Analysis

Job: pull-ci-openshift-hypershift-main-e2e-azure-self-managed | Build: 2046908740782788608 | Cost: $2.0824719000000007 | Failed step: hypershift-azure-run-e2e-self-managed

View full analysis report


Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6

@ehearne-redhat
Contributor

/retest-required

@hypershift-jira-solve-ci

AI Test Failure Analysis

Job: pull-ci-openshift-hypershift-main-e2e-azure-self-managed | Build: 2048685257498038272 | Cost: $2.30102925 | Failed step: hypershift-azure-run-e2e-self-managed

View full analysis report


Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6

@everettraven
Contributor

/retest-required

@jparrill
Contributor

/jira refresh

@openshift-ci-robot

@jparrill: This pull request references Jira Issue OCPBUGS-81686, which is invalid:

  • expected the bug to target the "5.0.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

@jparrill
Contributor

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Apr 30, 2026
@openshift-ci-robot

@jparrill: This pull request references Jira Issue OCPBUGS-81686, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (xxia@redhat.com), skipping review request.

Contributor

@jparrill jparrill left a comment

Dropped some comments. Thanks!

Comment thread control-plane-operator/controllers/hostedcontrolplane/v2/kas/auth.go Outdated
Comment thread control-plane-operator/controllers/hostedcontrolplane/v2/kas/auth.go Outdated
Comment thread control-plane-operator/controllers/hostedcontrolplane/v2/kas/auth.go Outdated
Comment thread support/validations/authentication_test.go Outdated
…pport

Signed-off-by: Shaza Aldawamneh <shaza.aldawamneh@hotmail.com>
@openshift-ci openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label May 5, 2026
-----END CERTIFICATE-----
`

func TestGenerateAuthConfig(t *testing.T) {
Contributor

Do we need to add explicit testing for GenerateAuthConfig? It looks like the testing we would care about here would already be covered by TestAdaptAuthConfig below?

Contributor Author

@jparrill requested a direct unit test for the exported function in their review comment.

I can see both perspectives:

  • The functionality is already covered through TestAdaptAuthConfig
  • However, since GenerateAuthConfig is now exported, testing it directly provides clearer API contract validation and easier debugging when the function fails

@jparrill @everettraven - Would you like me to keep the direct test or remove it in favor of the existing coverage through TestAdaptAuthConfig?

Contributor

I'm not that strongly opinionated here. If @jparrill feels strongly we need an explicit test here, fine by me.

@everettraven
Contributor

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label May 6, 2026
@openshift-merge-bot
Contributor

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aks
/test e2e-aws
/test e2e-aws-upgrade-hypershift-operator
/test e2e-azure-self-managed
/test e2e-kubevirt-aws-ovn-reduced
/test e2e-v2-aws

@ShazaAldawamneh
Contributor Author

/retest-required

@ShazaAldawamneh
Contributor Author

/retest-required

@hypershift-jira-solve-ci

AI Test Failure Analysis

Job: pull-ci-openshift-hypershift-main-e2e-azure-self-managed | Build: 2052772664296083456 | Cost: $4.37084635 | Failed step: hypershift-azure-run-e2e-self-managed

View full analysis report


Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6

@openshift-ci
Contributor

openshift-ci Bot commented May 8, 2026

@ShazaAldawamneh: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-azure-self-managed | cba1008 | link | true | /test e2e-azure-self-managed |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@hypershift-jira-solve-ci

Test Failure Analysis Complete

Job Information

  • Prow Job: pull-ci-openshift-hypershift-main-e2e-azure-self-managed
  • Build ID: 2052772664296083456
  • Target: e2e-azure-self-managed
  • Failed Test: TestNodePool/HostedCluster2/Main/TestAdditionalTrustBundlePropagation/AdditionalTrustBundlePropagationTest
  • Total Tests: 313 (280 passed, 28 skipped, 5 failures — all in same test hierarchy)

Test Failure Analysis

Error

TestNodePool/HostedCluster2/Main/TestAdditionalTrustBundlePropagation/AdditionalTrustBundlePropagationTest:
  Failed to wait for NodePool e2e-clusters-l58xb/node-pool-cvp7b-test-additional-trust-bundle-propagation
  to stop updating in 20m0s: context deadline exceeded

  - incorrect condition: wanted UpdatingConfig=False, got UpdatingConfig=True: AsExpected(Updating config in progress. Target config: 5a3a6852)
  - incorrect condition: wanted AllNodesHealthy=True, got AllNodesHealthy=False: NodeConditionsFailed(1 of 2 machines are not healthy
      Machine node-pool-cvp7b-test-additional-trust-bundle-propagation-wtmm9h: NodeConditionsFailed: Node condition Ready is False)

Summary

The TestAdditionalTrustBundlePropagation test timed out after 20 minutes waiting for a NodePool config update to complete. The test applies an additional trust bundle to a HostedCluster, triggering a rolling replace of NodePool machines. The replacement machine (wtmm9h) took ~28 minutes to reach NodeReady state — exceeding the 20-minute test timeout by ~8 minutes. The rollout eventually completed successfully post-timeout. This failure is unrelated to PR #8246, which only modifies authentication validation code (v2/kas/auth.go, support/validations/authentication.go) for CEL and expression support — files completely outside the NodePool management, trust bundle propagation, and node bootstrapping code paths. All 280 other tests passed.

Root Cause

Slow Azure VM node bootstrapping during Replace-strategy config update (infrastructure flake)

The replacement machine wtmm9h was created at ~16:23:01 UTC when the trust bundle update was applied. Azure VM provisioning completed in ~3 minutes (InfrastructureReady at 16:26:12), but the node took an additional ~25 minutes to reach NodeHealthy=True (at 16:51:55) — far exceeding the 20-minute test timeout. The NodePool update completed successfully at 16:54:56 UTC, approximately 8-9 minutes after the test had already timed out.

Contributing factors:

  1. Abnormally slow node bootstrapping on Azure — VM provisioning was normal (~3 min), but node registration and MachineConfig application (including trust bundle, update-ca-trust, and service restarts) took ~25 minutes instead of the usual ~5-10 minutes.
  2. Ingress operator unavailability — At teardown, the hosted cluster reported ClusterOperatorNotAvailable: the cluster operator ingress is not available, which may have affected node-to-control-plane communication during bootstrapping.
  3. Marginal test timeout — The 20-minute timeout for a Replace-strategy rollout on Azure leaves little margin for infrastructure variability.

The PR changes (v2/kas/auth.go, v2/kas/auth_test.go, support/validations/authentication.go, support/validations/authentication_test.go) are authentication validation code that supports CEL claim validation rules and expression-based username mappings — entirely orthogonal to NodePool management, trust bundle propagation, and node bootstrapping.

Recommendations
  • Retry the job — This is a transient Azure infrastructure slowness issue, not a regression introduced by this PR.
  • Consider increasing the timeout for TestAdditionalTrustBundlePropagation: the 20-minute timeout for Replace-strategy rollouts on Azure is marginal, and in this run the replacement node only became healthy after ~28 minutes, once the test had already timed out.
  • Investigate ingress operator availability — The ingress operator being unavailable at teardown may be a contributing factor to slow node bootstrapping and warrants separate investigation.
  • Check historical flake rate for TestAdditionalTrustBundlePropagation on Azure self-managed jobs to quantify how frequently this timeout is hit.

Evidence

| Evidence | Detail |
| --- | --- |
| Failed test | TestNodePool/HostedCluster2/Main/TestAdditionalTrustBundlePropagation/AdditionalTrustBundlePropagationTest (1210.24s) |
| Failure mode | Context deadline exceeded: 20m timeout waiting for NodePool to stop updating |
| NodePool condition at failure | UpdatingConfig=True (target config: 5a3a6852), AllNodesHealthy=False (machine wtmm9h Node Ready=False) |
| Machine timeline | Created ~16:23:01 → InfrastructureReady 16:26:12 (~3m) → NodeHealthy 16:51:55 (~28m total) |
| NodePool post-test state | UpdatingConfig=False, AllNodesHealthy=True; rollout completed ~8m after timeout |
| HostedCluster condition at teardown | ClusterVersionProgressing=False: ClusterOperatorNotAvailable(ingress is not available) |
| Other tests | 280 passed, 28 skipped; all other test suites (TestCreateCluster, TestAutoscaling, TestHAEtcdChaos, TestUpgradeControlPlane, TestAzureOAuthLoadBalancer, TestAzurePrivateTopology, TestNodePool/HostedCluster0) passed |
| PR changed files | v2/kas/auth.go, v2/kas/auth_test.go, support/validations/authentication.go, support/validations/authentication_test.go |
| Relevance to PR | None: authentication validation code is entirely separate from NodePool/trust bundle/node bootstrapping paths |
| Classification | Infrastructure flake: Azure node bootstrapping slowness |

Labels

area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release jira/severity-critical Referenced Jira bug's severity is critical for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.

7 participants