CNTRLPLANE-1357: add KMSv2 secret encryption e2e v2 test for Self Managed Azure #8653

Merged
openshift-merge-bot[bot] merged 1 commit into openshift:main from bryan-cox:CNTRLPLANE-1357
Jun 3, 2026

Conversation

@bryan-cox
Member

@bryan-cox bryan-cox commented Jun 2, 2026

What this PR does / why we need it:

Adds an e2e v2 test that validates KMSv2 secret encryption on Azure self-managed hosted clusters. This ports the KMS validation from the v1 TestCreateClusterCustomConfig test into the v2 framework.

  • KMS spec validation: asserts ActiveKey fields (KeyVaultName, KeyName, KeyVersion) and auth config (WorkloadIdentity for self-managed, ManagedIdentity for ARO HCP)
  • KMS functional validation: creates test secret in hosted cluster, execs etcdctl in etcd pod, asserts k8s:enc:kms:v2 prefix in raw etcd value
  • Adds --encryption-key-id to the public cluster variant when AZURE_ENCRYPTION_KEY_ID is set (env var or vault-mounted file)
  • Adds secret-encryption label to the public test group's label filter
  • Skips gracefully when KMS is not configured, so existing CI jobs are unaffected

CI Prerequisites

Azure resources created in self-managed-azure-ci resource group:

  • Key Vault: sm-azure-ci-kms
  • Encryption key: e2e-encryption-key
  • KMS managed identity with Key Vault Crypto User role and federated credential

Vault secrets updated in hypershift-ci-jobs-self-managed-azure-e2e:

  • workload-identities.json: added kmsClientID
  • AZURE_ENCRYPTION_KEY_ID: new field with key URL

Which issue(s) this PR fixes:

Fixes https://issues.redhat.com/browse/CNTRLPLANE-1357

Special notes for your reviewer:

The test file is designed to be platform-extensible — future AWS and GCP KMS contexts can be added without touching the Azure block.

Checklist:

  • Subject and description added to both commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Summary by CodeRabbit

  • Tests
    • Azure E2E test suite expanded to include secret encryption validation scenarios
    • New tests verify that secrets are properly encrypted at rest in etcd using KMS v2
    • Encryption key ID configuration now supported for Azure E2E tests via environment variable
    • Added validation tests for secret encryption configuration on hosted clusters

@openshift-merge-bot
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all second-stage jobs, use the /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 2, 2026
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 2, 2026
@openshift-ci-robot

openshift-ci-robot commented Jun 2, 2026

@bryan-cox: This pull request references CNTRLPLANE-1357 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Summary

  • Adds an e2e v2 test that validates KMSv2 secret encryption on Azure self-managed hosted clusters
  • KMS spec validation: asserts ActiveKey fields (KeyVaultName, KeyName, KeyVersion) and auth config (WorkloadIdentity for self-managed, ManagedIdentity for ARO HCP)
  • KMS functional validation: creates test secret in hosted cluster, execs etcdctl in etcd pod, asserts k8s:enc:kms:v2 prefix in raw etcd value
  • Adds --encryption-key-id to the public cluster variant when AZURE_ENCRYPTION_KEY_ID is set (env var or vault-mounted file)
  • Adds secret-encryption label to the public test group's label filter
  • Skips gracefully when KMS is not configured, so existing CI jobs are unaffected

CI Prerequisites

Azure resources created in self-managed-azure-ci resource group:

  • Key Vault: sm-azure-ci-kms
  • Encryption key: e2e-encryption-key
  • KMS managed identity with Key Vault Crypto User role and federated credential

Vault secrets updated in hypershift-ci-jobs-self-managed-azure-e2e:

  • workload-identities.json: added kmsClientID
  • AZURE_ENCRYPTION_KEY_ID: new field with key URL

Test plan

  • Verify build: go build -tags e2ev2 ./test/e2e/v2/...
  • Run /test e2e-azure-v2-self-managed to validate KMS test runs against the public cluster
  • Confirm test skips cleanly on jobs without KMS configured

🤖 Generated with Claude Code

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci
Contributor

openshift-ci Bot commented Jun 2, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@coderabbitai
Contributor

coderabbitai Bot commented Jun 2, 2026

Walkthrough

Azure platform config now reads an encryption key ID from env or a default file and injects it into public cluster specs. The Azure public test matrix includes a secret-encryption label. A new e2ev2 Ginkgo suite validates KMSv2 spec fields and verifies hosted-cluster secrets are stored in etcd with the k8s:enc:kms:v2 marker.

Changes

Azure platform config and test matrix

Layer | File(s) | Summary
Config constant and constructor | test/e2e/v2/lifecycle/azure.go | Adds default encryption-key file path; extends AzurePlatformConfig with encryptionKeyID; NewAzurePlatformConfig loads from AZURE_ENCRYPTION_KEY_ID or the default file and logs if empty.
ClusterSpecs and TestMatrix wiring | test/e2e/v2/lifecycle/azure.go | ClusterSpecs conditionally appends --encryption-key-id=<id> for the public variant; TestMatrix for public group adds secret-encryption to the label filter.

Hosted cluster secret encryption tests

Layer | File(s) | Summary
Test scaffold and imports | test/e2e/v2/tests/hosted_cluster_secret_encryption_test.go | Adds e2ev2 build tag and required Ginkgo/Gomega, Kubernetes, and Hypershift E2E imports.
KMS spec validation test | test/e2e/v2/tests/hosted_cluster_secret_encryption_test.go | KMSSpecValidationTest asserts Azure SecretEncryption.KMS ActiveKey fields are set and exactly one auth mechanism is configured; checks ObjectEncoding for managed-identity.
Functional etcd encryption verification | test/e2e/v2/tests/hosted_cluster_secret_encryption_test.go | KMSFunctionalValidationTest creates a hosted-cluster Secret, reads the corresponding etcd key via etcdctl in a management pod, and asserts the value contains k8s:enc:kms:v2 and not the plaintext secret.
Test registration and suite wiring | test/e2e/v2/tests/hosted_cluster_secret_encryption_test.go | Registers tests under a secret-encryption Ginkgo suite, initializes test context, and skips when SecretEncryption.KMS is unset.
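The functional check summarized above comes down to two string conditions on the raw etcd value. A minimal sketch follows, assuming the value has already been fetched as a string. `checkEncryptedAtRest` and the sample inputs in `main` are made up for illustration; the real test expresses these conditions as Gomega assertions.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// kmsV2Marker is the marker the test looks for in the raw etcd value.
const kmsV2Marker = "k8s:enc:kms:v2"

// checkEncryptedAtRest is a hypothetical helper. It reports an error when the
// raw value lacks the KMSv2 marker or still contains the plaintext payload.
func checkEncryptedAtRest(rawEtcdValue, plaintext string) error {
	if !strings.Contains(rawEtcdValue, kmsV2Marker) {
		return errors.New("raw etcd value lacks the k8s:enc:kms:v2 marker")
	}
	if plaintext != "" && strings.Contains(rawEtcdValue, plaintext) {
		return errors.New("raw etcd value still contains the plaintext secret data")
	}
	return nil
}

func main() {
	// Both inputs are invented sample strings, not real etcd output.
	fmt.Println(checkEncryptedAtRest("k8s:enc:kms:v2:example:opaque-bytes", "super-secret"))
	fmt.Println(checkEncryptedAtRest("plain super-secret", "super-secret"))
}
```

Checking for the absence of the plaintext as well as the presence of the marker guards against a value that carries the marker but was not actually encrypted.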

Estimated code review effort:
🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers:

  • cblecker
  • sdminonne
🚥 Pre-merge checks | ✅ 5 | ❌ 10

❌ Failed checks (10 inconclusive)

Check name | Status | Explanation | Resolution
Stable And Deterministic Test Names | ❓ Inconclusive | Repository clone failed, so this custom check could not run with code access. | Retry the review run. If this persists, inspect pre-merge custom-check logs for infrastructure or agent runtime failures.
Test Structure And Quality | ❓ Inconclusive | Repository clone failed, so this custom check could not run with code access. | Retry the review run. If this persists, inspect pre-merge custom-check logs for infrastructure or agent runtime failures.
Microshift Test Compatibility | ❓ Inconclusive | Repository clone failed, so this custom check could not run with code access. | Retry the review run. If this persists, inspect pre-merge custom-check logs for infrastructure or agent runtime failures.
Single Node Openshift (Sno) Test Compatibility | ❓ Inconclusive | Repository clone failed, so this custom check could not run with code access. | Retry the review run. If this persists, inspect pre-merge custom-check logs for infrastructure or agent runtime failures.
Topology-Aware Scheduling Compatibility | ❓ Inconclusive | Repository clone failed, so this custom check could not run with code access. | Retry the review run. If this persists, inspect pre-merge custom-check logs for infrastructure or agent runtime failures.
Ote Binary Stdout Contract | ❓ Inconclusive | Repository clone failed, so this custom check could not run with code access. | Retry the review run. If this persists, inspect pre-merge custom-check logs for infrastructure or agent runtime failures.
Ipv6 And Disconnected Network Test Compatibility | ❓ Inconclusive | Repository clone failed, so this custom check could not run with code access. | Retry the review run. If this persists, inspect pre-merge custom-check logs for infrastructure or agent runtime failures.
No-Weak-Crypto | ❓ Inconclusive | Repository clone failed, so this custom check could not run with code access. | Retry the review run. If this persists, inspect pre-merge custom-check logs for infrastructure or agent runtime failures.
Container-Privileges | ❓ Inconclusive | Repository clone failed, so this custom check could not run with code access. | Retry the review run. If this persists, inspect pre-merge custom-check logs for infrastructure or agent runtime failures.
No-Sensitive-Data-In-Logs | ❓ Inconclusive | Repository clone failed, so this custom check could not run with code access. | Retry the review run. If this persists, inspect pre-merge custom-check logs for infrastructure or agent runtime failures.
✅ Passed checks (5 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title accurately summarizes the main change: adding a KMSv2 secret encryption e2e v2 test for Self Managed Azure, which is the primary focus of both file modifications.
Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.


Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.


Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot added area/platform/azure PR/issue for Azure (AzurePlatform) platform area/testing Indicates the PR includes changes for e2e testing and removed do-not-merge/needs-area labels Jun 2, 2026
@openshift-ci
Contributor

openshift-ci Bot commented Jun 2, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bryan-cox

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 2, 2026
@bryan-cox
Member Author

/test ?

@bryan-cox
Member Author

/test e2e-azure-v2-self-managed

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
test/e2e/v2/lifecycle/azure.go (1)

50-81: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Add unit coverage for the new encryption-key config path.

This adds new branching around env/file precedence and ClusterSpecs() arg generation, but there’s no Go test covering it. A small table-driven test here would catch regressions before they land in e2e.

Suggested test cases
+// Cases worth covering:
+// - env var set -> use env value
+// - env var unset and file present -> use file value
+// - encryptionKeyID set -> public spec gets --encryption-key-id
+// - encryptionKeyID unset -> public spec does not get the flag

As per coding guidelines, "Always include unit tests when creating new functions or modifying existing ones".

Also applies to: 103-113

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/v2/lifecycle/azure.go` around lines 50 - 81, NewAzurePlatformConfig
introduced branching around encryptionKeyID (env AZURE_ENCRYPTION_KEY_ID vs file
defaultEncryptionKeyID) and affects ClusterSpecs() generation but lacks unit
tests; add a table-driven test that exercises NewAzurePlatformConfig for: (1)
env var set, (2) file-only value present, and (3) neither set (expect
warning/empty). Instantiate configs via NewAzurePlatformConfig with a temp
sharedDir, write/read the defaultEncryptionKeyID file as needed, and assert
cfg.encryptionKeyID value and that ClusterSpecs() (or the public method that
consumes the config) is generated with the expected argument values; include
cleanup of temp files and use t.Run subtests to cover branches.
🧹 Nitpick comments (1)
test/e2e/v2/tests/hosted_cluster_secret_encryption_test.go (1)

104-108: ⚡ Quick win

Bound the cleanup delete with a timeout.

context.Background() here means cleanup can block without any deadline if the API call stalls. Use a short WithTimeout context instead.

Proposed fix
 			DeferCleanup(func() {
-				if err := hostedClusterClient.Delete(context.Background(), testSecret); err != nil && !apierrors.IsNotFound(err) {
+				cleanupCtx, cancel := context.WithTimeout(context.Background(), time.Minute)
+				defer cancel()
+				if err := hostedClusterClient.Delete(cleanupCtx, testSecret); err != nil && !apierrors.IsNotFound(err) {
 					GinkgoWriter.Printf("WARNING: failed to cleanup test secret: %v\n", err)
 				}
 			})

As per coding guidelines, "context.Context for cancellation and timeouts".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/v2/tests/hosted_cluster_secret_encryption_test.go` around lines 104
- 108, The cleanup Delete call uses context.Background() which can hang; update
the DeferCleanup closure to create a short timeout context (e.g.
context.WithTimeout) and pass that to hostedClusterClient.Delete, ensuring you
call the cancel function (defer cancel()) before the Delete returns; target the
DeferCleanup closure that calls hostedClusterClient.Delete(...) with testSecret
and replace the background context with the bounded context and proper
cancelation.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e/v2/tests/hosted_cluster_secret_encryption_test.go`:
- Around line 91-102: The test uses a fixed Secret name "e2e-kms-test-secret"
which causes flakiness; change creation of the Secret object (testSecret) to use
a generated name (e.g., set ObjectMeta.GenerateName or append a random/UID
suffix) and ensure the same testSecret reference is used for DeferCleanup and
the hostedClusterClient.Create call so cleanup and assertions operate on the
created resource; update any checks that assume the exact name to use the actual
created ObjectMeta.Name from the returned object.
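One way to avoid the fixed name is sketched below using only the standard library. `uniqueName` is a hypothetical helper and an assumption of this sketch. In the actual test, setting `ObjectMeta.GenerateName` and reading the server-assigned name back from the created object, as suggested above, is likely the more idiomatic route.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// uniqueName is a hypothetical helper. It appends eight random hex characters
// to the prefix so that repeated or parallel runs do not collide on one name.
func uniqueName(prefix string) (string, error) {
	suffix := make([]byte, 4)
	if _, err := rand.Read(suffix); err != nil {
		return "", err
	}
	return prefix + "-" + hex.EncodeToString(suffix), nil
}

func main() {
	name, err := uniqueName("e2e-kms-test-secret")
	if err != nil {
		panic(err)
	}
	fmt.Println(name)
}
```

Whichever approach is used, cleanup and the etcd key lookup need to use the name that was actually created, not the prefix.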

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: b6ebd02e-4dd1-4656-95dc-df08f3162a81

📥 Commits

Reviewing files that changed from the base of the PR and between eb04f61 and 7e57a17.

📒 Files selected for processing (2)
  • test/e2e/v2/lifecycle/azure.go
  • test/e2e/v2/tests/hosted_cluster_secret_encryption_test.go

Comment thread test/e2e/v2/tests/hosted_cluster_secret_encryption_test.go
@codecov

codecov Bot commented Jun 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 41.27%. Comparing base (eb04f61) to head (674b977).
⚠️ Report is 24 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #8653   +/-   ##
=======================================
  Coverage   41.26%   41.27%           
=======================================
  Files         755      755           
  Lines       93443    93446    +3     
=======================================
+ Hits        38563    38566    +3     
  Misses      52148    52148           
  Partials     2732     2732           

see 1 file with indirect coverage changes

Flag Coverage Δ
cmd-support 34.86% <ø> (ø)
cpo-hostedcontrolplane 43.50% <ø> (+0.01%) ⬆️
cpo-other 42.79% <ø> (ø)
hypershift-operator 51.00% <ø> (ø)
other 31.64% <ø> (ø)


@bryan-cox bryan-cox marked this pull request as ready for review June 3, 2026 09:49
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 3, 2026
@openshift-ci openshift-ci Bot requested review from csrwng and jparrill June 3, 2026 09:50
@bryan-cox bryan-cox changed the title from "CNTRLPLANE-1357: add KMSv2 secret encryption e2e v2 test for Azure" to "CNTRLPLANE-1357: add KMSv2 secret encryption e2e v2 test for Self Managed Azure" Jun 3, 2026
Contributor

@jparrill jparrill left a comment

Dropped some comments. Thanks!

@@ -0,0 +1,152 @@
//go:build e2ev2
Contributor

nit: All other files in test/e2e/v2/tests/ have the Apache 2.0 license block between the build tag and package. This one's missing it.

Member Author

Done. Added Apache 2.0 license header.


AI-assisted response via Claude Code

Skip("KMS.Azure is not configured on this hosted cluster")
}
})

Contributor

Since KMSSpecValidationTest is exported and could be registered from a different Describe without the outer guard, this can nil-pointer if SecretEncryption or KMS aren't set. I'd collapse it into a single defensive check:

if hc == nil || hc.Spec.Platform.Type != hyperv1.AzurePlatform ||
    hc.Spec.SecretEncryption == nil || hc.Spec.SecretEncryption.KMS == nil ||
    hc.Spec.SecretEncryption.KMS.Azure == nil {
    Skip("Azure KMS spec validation requires Azure platform with KMS configured")
}
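The same guard can be tried in isolation with stand-in types. In this sketch the struct definitions are hypothetical, trimmed-down mirrors of only the fields the check reads, and the `"Azure"` string stands in for `hyperv1.AzurePlatform`. None of it is the real HyperShift API.

```go
package main

import "fmt"

// Stand-in types: hypothetical mirrors of the fields the guard dereferences.
type azureKMS struct{}
type kmsSpec struct{ Azure *azureKMS }
type secretEncryption struct{ KMS *kmsSpec }
type hostedCluster struct {
	PlatformType     string
	SecretEncryption *secretEncryption
}

// azureKMSConfigured walks the pointer chain left to right, so a nil at any
// level short-circuits before the next field is dereferenced.
func azureKMSConfigured(hc *hostedCluster) bool {
	return hc != nil &&
		hc.PlatformType == "Azure" &&
		hc.SecretEncryption != nil &&
		hc.SecretEncryption.KMS != nil &&
		hc.SecretEncryption.KMS.Azure != nil
}

func main() {
	withKMS := &hostedCluster{
		PlatformType:     "Azure",
		SecretEncryption: &secretEncryption{KMS: &kmsSpec{Azure: &azureKMS{}}},
	}
	noEncryption := &hostedCluster{PlatformType: "Azure"}
	fmt.Println(azureKMSConfigured(withKMS), azureKMSConfigured(noEncryption), azureKMSConfigured(nil))
}
```

A test body would call `Skip(...)` when this predicate is false, which keeps the exported test safe to register from any `Describe`.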

Member Author

Done. Collapsed into a single defensive nil-check chain.


AI-assisted response via Claude Code

"--cert=/etc/etcd/tls/client/etcd-client.crt",
"--key=/etc/etcd/tls/client/etcd-client.key",
"get",
secretEtcdKey,
Contributor

This etcdctl command with the TLS cert paths is a verbatim copy of test/e2e/util/util.go:1500-1508 (and the same paths show up again in etcd_chaos_test.go). If those paths ever change, there are 3+ places to update. Not blocking, but it'd be nice to extract something like EtcdctlGetCommand(key string) []string into a shared util as a follow-up.
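A possible shape for such a helper is sketched below. It includes only the two TLS flags visible in this thread. Any other flags the real command passes are not shown here, so the variadic `extraFlags` parameter is an assumption that lets callers supply them. The `etcdctl` binary name as the first element and the example key in `main` are also assumptions.

```go
package main

import "fmt"

// Client certificate paths as they appear in the snippet under review.
const (
	etcdClientCert = "/etc/etcd/tls/client/etcd-client.crt"
	etcdClientKey  = "/etc/etcd/tls/client/etcd-client.key"
)

// EtcdctlGetCommand builds the argument list for reading a single key.
// extraFlags is a hypothetical extension point for flags not shown in this
// thread; they are placed before the TLS flags and the "get" subcommand.
func EtcdctlGetCommand(key string, extraFlags ...string) []string {
	cmd := []string{"etcdctl"}
	cmd = append(cmd, extraFlags...)
	cmd = append(cmd,
		"--cert="+etcdClientCert,
		"--key="+etcdClientKey,
		"get",
		key,
	)
	return cmd
}

func main() {
	// The key below is a placeholder, not a real etcd key layout.
	fmt.Println(EtcdctlGetCommand("/example/secret/key"))
}
```

Keeping the paths in one place means a future change to the certificate location touches a single constant instead of every call site.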

Member Author

Good call — I'll track extracting an EtcdctlGetCommand helper as a follow-up.


AI-assisted response via Claude Code

})
})
}

Contributor

This:

Expect(strings.Contains(output, "k8s:enc:kms:v2")).To(BeTrue(), ...)

gives you "expected true, got false" on failure — not helpful for debugging. With ContainSubstring you get the actual string in the diff:

Expect(output).To(ContainSubstring("k8s:enc:kms:v2"),
    "secret should be encrypted using KMSv2")

Bonus: you can drop the "strings" import.

Member Author

Done. Switched to ContainSubstring for better failure diagnostics.


AI-assisted response via Claude Code

})
}

// RegisterHostedClusterSecretEncryptionTests registers all secret encryption tests.
Contributor

The failure message dumps the full etcdctl output (raw etcd value: %s). The test data here is synthetic so it's fine, but this pattern could leak real data if someone copies it. I'd trim it down:

"secret should be encrypted using KMSv2 (output length: %d bytes)", len(output)

Member Author

Done. Addressed together with the ContainSubstring change — the raw output format string is removed.


AI-assisted response via Claude Code

}

// RegisterHostedClusterSecretEncryptionTests registers all secret encryption tests.
func RegisterHostedClusterSecretEncryptionTests(getTestCtx internal.TestContextGetter) {
Contributor

nit: You could strengthen the check by also verifying the plaintext isn't there in the clear:

Expect(output).NotTo(ContainSubstring("testData"),
    "secret data should not be readable in plaintext from etcd")

Member Author

Done. Added a negative assertion for plaintext data in the etcd output.


AI-assisted response via Claude Code

if cfg.encryptionKeyID == "" {
if data, err := os.ReadFile(defaultEncryptionKeyID); err == nil {
cfg.encryptionKeyID = strings.TrimSpace(string(data))
}
Contributor

You already log a WARNING for AZURE_PRIVATE_NAT_SUBNET_ID when it's missing. Without the same for the encryption key, a misconfigured CI job will just silently skip the tests and nobody will notice. Something like:

if cfg.encryptionKeyID == "" {
    log.Printf("WARNING: AZURE_ENCRYPTION_KEY_ID is not set; secret encryption tests will be skipped")
}

Member Author

Done. Added warning log matching the existing NAT subnet pattern.


AI-assisted response via Claude Code

-LabelFilter: "self-managed-azure-public || nodepool-lifecycle",
+LabelFilter: "self-managed-azure-public || nodepool-lifecycle || secret-encryption",
Skip: "KAS allowed CIDRs",
JUnitFile: "junit_self_managed_azure_public.xml",
Contributor

This should be fine since the BeforeEach skips when KMS isn't configured. Just want to confirm: if AZURE_ENCRYPTION_KEY_ID isn't set, the cluster gets created without encryption, and the tests skip via the guard in the Describe — right? I want to make sure there's no scenario where these tests run against a cluster without KMS and fail instead of skipping.

Member Author

Correct. When AZURE_ENCRYPTION_KEY_ID is unset, the cluster is created without --encryption-key-id, so SecretEncryption is nil on the HostedCluster. The Describe-level BeforeEach checks for SecretEncryption.KMS == nil and skips. No failure path.


AI-assisted response via Claude Code

Add e2e v2 validation for KMSv2 secret encryption on Azure
self-managed hosted clusters.

Lifecycle changes (azure.go):
- Add encryptionKeyID field to AzurePlatformConfig, read from
  AZURE_ENCRYPTION_KEY_ID env var with vault file fallback
- Pass --encryption-key-id to the public cluster variant when set
- Add secret-encryption label to the public test group's LabelFilter

New test file (hosted_cluster_secret_encryption_test.go):
- KMS spec validation: asserts ActiveKey fields and auth config
  (WorkloadIdentity for self-managed, ManagedIdentity for ARO HCP)
- KMS functional validation: creates test secret in hosted cluster,
  execs etcdctl in etcd pod, asserts k8s:enc:kms:v2 prefix
- Skips gracefully when KMS is not configured

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
test/e2e/v2/tests/hosted_cluster_secret_encryption_test.go (1)

51-52: ⚡ Quick win

Use When ... it should ... phrasing for new test case titles.

The new It(...) titles don’t follow the repository’s required Gherkin-style format.

Proposed fix
-			It("should have ActiveKey fields populated", func() {
+			It("When Azure KMS is configured, it should have ActiveKey fields populated", func() {
...
-			It("should have KMS authentication configured", func() {
+			It("When Azure KMS is configured, it should have exactly one authentication mechanism configured", func() {
...
-		It("should encrypt secrets in etcd using KMSv2", func() {
+		It("When a secret is created, it should be encrypted in etcd using KMSv2", func() {

As per coding guidelines: **/*_test.go: Always use "When ... it should ..." format for describing test cases when creating unit tests; Prefer Gherkin Syntax to define unit test cases.

Also applies to: 64-65, 94-95

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/v2/tests/hosted_cluster_secret_encryption_test.go` around lines 51 -
52, The test title uses an It(...) with plain wording—change the description to
the repository's Gherkin-style "When ... it should ..." phrasing; update the It
call(s) in hosted_cluster_secret_encryption_test.go (e.g., the It("should have
ActiveKey fields populated") invocation and the other occurrences at the
referenced nearby blocks) to use When(...) or a When-style wrapper so each test
reads "When <condition>, it should <expected behavior>" (ensure you preserve the
same test body and any test context variables like testCtx and function names).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e/v2/tests/hosted_cluster_secret_encryption_test.go`:
- Around line 69-73: The assertion currently allows both auth methods to be set;
change the check to enforce exclusive-or by replacing the OR condition with an
XOR so exactly one of azureKMS.WorkloadIdentity.ClientID or
azureKMS.KMS.CredentialsSecretName is set (e.g. compute hasWorkloadIdentity :=
azureKMS.WorkloadIdentity.ClientID != "" and hasManagedIdentity :=
azureKMS.KMS.CredentialsSecretName != "" then assert hasWorkloadIdentity !=
hasManagedIdentity). Update the Expect message to indicate that exactly one of
WorkloadIdentity.ClientID or KMS.CredentialsSecretName must be set.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d75b32a4-f5bc-4b08-b8d1-c7f0992d0d05

📥 Commits

Reviewing files that changed from the base of the PR and between 7e57a17 and 674b977.

📒 Files selected for processing (2)
  • test/e2e/v2/lifecycle/azure.go
  • test/e2e/v2/tests/hosted_cluster_secret_encryption_test.go

Comment on lines +69 to +73
hasWorkloadIdentity := azureKMS.WorkloadIdentity.ClientID != ""
hasManagedIdentity := azureKMS.KMS.CredentialsSecretName != ""
Expect(hasWorkloadIdentity || hasManagedIdentity).To(BeTrue(),
"either WorkloadIdentity.ClientID or KMS.CredentialsSecretName must be set")

Contributor

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Enforce exactly one Azure KMS auth mechanism.

This check currently passes when both auth mechanisms are set, but the test intent is exclusive-or. Tighten this assertion so invalid dual-config doesn’t pass.
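
The gap between the two checks can be seen in a minimal, self-contained sketch. The `authConfig` struct and both helper functions here are hypothetical stand-ins for the fields named in this comment, not the HyperShift API types; the point is only that `!=` on two booleans behaves as exclusive-or, while `||` also accepts the dual-config case.

```go
package main

import "fmt"

// authConfig is a hypothetical stand-in for the two Azure KMS auth
// fields discussed in the review; it is not the real HyperShift type.
type authConfig struct {
	WorkloadIdentityClientID string
	CredentialsSecretName    string
}

// atLeastOne mirrors the original assertion: true when either field is set,
// including when both are set.
func atLeastOne(c authConfig) bool {
	return c.WorkloadIdentityClientID != "" || c.CredentialsSecretName != ""
}

// exactlyOne mirrors the proposed assertion: comparing two booleans with !=
// is an exclusive-or, so it is false when both or neither are set.
func exactlyOne(c authConfig) bool {
	hasWorkloadIdentity := c.WorkloadIdentityClientID != ""
	hasManagedIdentity := c.CredentialsSecretName != ""
	return hasWorkloadIdentity != hasManagedIdentity
}

func main() {
	cases := []struct {
		name string
		cfg  authConfig
	}{
		{"neither", authConfig{}},
		{"workload identity only", authConfig{WorkloadIdentityClientID: "client-id"}},
		{"managed identity only", authConfig{CredentialsSecretName: "kms-creds"}},
		{"both", authConfig{WorkloadIdentityClientID: "client-id", CredentialsSecretName: "kms-creds"}},
	}
	for _, tc := range cases {
		fmt.Printf("%-24s atLeastOne=%-5t exactlyOne=%t\n",
			tc.name, atLeastOne(tc.cfg), exactlyOne(tc.cfg))
	}
}
```

Only the "both" row differs between the two helpers, which is the case the tightened assertion is meant to reject.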

Proposed fix
 				hasWorkloadIdentity := azureKMS.WorkloadIdentity.ClientID != ""
 				hasManagedIdentity := azureKMS.KMS.CredentialsSecretName != ""
-				Expect(hasWorkloadIdentity || hasManagedIdentity).To(BeTrue(),
-					"either WorkloadIdentity.ClientID or KMS.CredentialsSecretName must be set")
+				Expect(hasWorkloadIdentity).NotTo(Equal(hasManagedIdentity),
+					"exactly one of WorkloadIdentity.ClientID or KMS.CredentialsSecretName must be set")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-hasWorkloadIdentity := azureKMS.WorkloadIdentity.ClientID != ""
-hasManagedIdentity := azureKMS.KMS.CredentialsSecretName != ""
-Expect(hasWorkloadIdentity || hasManagedIdentity).To(BeTrue(),
-"either WorkloadIdentity.ClientID or KMS.CredentialsSecretName must be set")
+hasWorkloadIdentity := azureKMS.WorkloadIdentity.ClientID != ""
+hasManagedIdentity := azureKMS.KMS.CredentialsSecretName != ""
+Expect(hasWorkloadIdentity).NotTo(Equal(hasManagedIdentity),
+"exactly one of WorkloadIdentity.ClientID or KMS.CredentialsSecretName must be set")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/v2/tests/hosted_cluster_secret_encryption_test.go` around lines 69 -
73, The assertion currently allows both auth methods to be set; change the check
to enforce exclusive-or by replacing the OR condition with an XOR so exactly one
of azureKMS.WorkloadIdentity.ClientID or azureKMS.KMS.CredentialsSecretName is
set (e.g. compute hasWorkloadIdentity := azureKMS.WorkloadIdentity.ClientID !=
"" and hasManagedIdentity := azureKMS.KMS.CredentialsSecretName != "" then
assert hasWorkloadIdentity != hasManagedIdentity). Update the Expect message to
indicate that exactly one of WorkloadIdentity.ClientID or
KMS.CredentialsSecretName must be set.

@jparrill
Contributor

jparrill commented Jun 3, 2026

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Jun 3, 2026
@openshift-merge-bot
Contributor

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aks-4-22
/test e2e-aws-4-22
/test e2e-aks
/test e2e-aws
/test e2e-aws-upgrade-hypershift-operator
/test e2e-azure-self-managed
/test e2e-kubevirt-aws-ovn-reduced
/test e2e-v2-aws
/test e2e-v2-gke

@bryan-cox
Member Author

/test e2e-azure-v2-self-managed

@hypershift-jira-solve-ci

AI Test Failure Analysis

Job: pull-ci-openshift-hypershift-main-e2e-azure-self-managed | Build: 2062150613344456704 | Cost: $3.40 | Failed step: hypershift-azure-run-e2e-self-managed

View full analysis report


Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6

@bryan-cox
Member Author

/test e2e-aws-4-22

@bryan-cox
Member Author

/verified by e2e-azure-v2-self-managed

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Jun 3, 2026
@openshift-ci-robot

@bryan-cox: This PR has been marked as verified by e2e-azure-v2-self-managed.

Details

In response to this:

/verified by e2e-azure-v2-self-managed

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@cwbotbot

cwbotbot commented Jun 3, 2026

Test Results

e2e-aws

e2e-aks

@csrwng
Contributor

csrwng commented Jun 3, 2026

/lgtm

@bryan-cox
Member Author

/test e2e-azure-self-managed

@bryan-cox
Member Author

/test e2e-aks-4-22

@hypershift-jira-solve-ci

hypershift-jira-solve-ci Bot commented Jun 3, 2026

I have all the evidence I need. Here is the analysis:

Test Failure Analysis Complete

Job Information

Test Failure Analysis

Error

google.golang.org/protobuf@v1.36.11: read "https://proxy.golang.org/google.golang.org/protobuf/@v/v1.36.11.zip": stream error: stream ID 347; INTERNAL_ERROR; received from peer

Summary

The verify-deps job failed due to a transient network error when the Go module proxy (proxy.golang.org) dropped an HTTP/2 stream while go mod tidy was downloading google.golang.org/protobuf@v1.36.11. The single broken stream caused 24 cascading import resolution failures across every package that depends on protobuf. This is an infrastructure flake completely unrelated to the PR's code changes.

Root Cause

The go-verify-deps CI step runs go mod tidy to verify that vendored dependencies are consistent with go.mod/go.sum. During this execution, the Go toolchain attempted to download google.golang.org/protobuf@v1.36.11 from proxy.golang.org. The download was initiated successfully (the go: downloading google.golang.org/protobuf v1.36.11 line confirms the request started) but the HTTP/2 connection then broke with stream error: stream ID 347; INTERNAL_ERROR; received from peer.

This is an HTTP/2 INTERNAL_ERROR (error code 2) sent by the Go module proxy server — it indicates the server encountered an internal problem while serving the zip file and aborted the stream. Because go mod tidy processes all packages in the dependency graph, every package that transitively imports google.golang.org/protobuf produced its own error line, resulting in 24 identical errors (different protobuf sub-packages, same underlying broken download). go mod tidy does not retry on stream errors, so the single transient failure cascaded into a total job failure.

This is not caused by the PR changes. The protobuf module is a pre-existing transitive dependency already in the vendor tree. The PR (CNTRLPLANE-1357: add KMSv2 secret encryption e2e v2 test for Self Managed Azure) does not modify go.mod, go.sum, or the vendor/ directory in a way that would introduce protobuf — the module was already present. The failure would have occurred on any PR running at the same time due to the proxy-side issue.

Recommendations
  1. Retest the job — This is a transient infrastructure flake. Simply re-running /retest or /test verify-deps on the PR should produce a clean pass.
  2. No code changes needed — The PR's code changes are not involved in this failure. The verify-deps step failed before it could even finish go mod tidy, so no actual dependency verification ran.
  3. If the failure persists on retry, investigate whether proxy.golang.org is experiencing broader outages by checking proxy.golang.org status or the Go module mirror status page. Persistent failures across multiple retries would suggest a proxy-side problem with serving the google.golang.org/protobuf@v1.36.11 zip artifact.

Evidence
  • Failed step: verify-deps-go-verify-deps (exited code 1 after 37s)
  • Failing command: go mod tidy (never completed; no subsequent verification steps ran)
  • Error type: HTTP/2 stream error: INTERNAL_ERROR (server-side) on stream ID 347
  • Failed URL: https://proxy.golang.org/google.golang.org/protobuf/@v/v1.36.11.zip
  • Error count: 24 identical stream errors across different protobuf sub-packages
  • Module affected: google.golang.org/protobuf@v1.36.11 (pre-existing transitive dependency)
  • PR relevance: None; protobuf is not introduced by this PR and is already in the dependency tree
  • Go version: go1.26.3 (Red Hat 1.26.3-1.el9_8) linux/amd64
  • Failure classification: Infrastructure flake (Go module proxy transient network error)

@openshift-merge-bot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 7461f85 and 2 for PR HEAD 674b977 in total

@openshift-ci
Contributor

openshift-ci Bot commented Jun 3, 2026

@bryan-cox: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot Bot merged commit e5f419c into openshift:main Jun 3, 2026
42 checks passed
@bryan-cox bryan-cox deleted the CNTRLPLANE-1357 branch June 3, 2026 19:35

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/platform/azure PR/issue for Azure (AzurePlatform) platform area/testing Indicates the PR includes changes for e2e testing jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. verified Signifies that the PR passed pre-merge verification criteria

5 participants