CNTRLPLANE-1357: add KMSv2 secret encryption e2e v2 test for Self Managed Azure#8653
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
@bryan-cox: This pull request references CNTRLPLANE-1357 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Skipping CI for Draft Pull Request. |
WalkthroughAzure platform config now reads an encryption key ID from env or a default file and injects it into public cluster specs. The Azure public test matrix includes a secret-encryption label. A new e2ev2 Ginkgo suite validates KMSv2 spec fields and verifies hosted-cluster secrets are stored in etcd with the k8s:enc:kms:v2 marker. ChangesAzure platform config and test matrix
Hosted cluster secret encryption tests
Estimated code review effort: Suggested reviewers:
🚥 Pre-merge checks | ✅ 5 | ❌ 10❌ Failed checks (10 inconclusive)
✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning Review ran into problems🔥 ProblemsGit: Failed to clone repository. Please run the Comment |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: bryan-cox The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/test ? |
|
/test e2e-azure-v2-self-managed |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
test/e2e/v2/lifecycle/azure.go (1)
50-81: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winAdd unit coverage for the new encryption-key config path.
This adds new branching around env/file precedence and
ClusterSpecs()arg generation, but there’s no Go test covering it. A small table-driven test here would catch regressions before they land in e2e.Suggested test cases
+// Cases worth covering: +// - env var set -> use env value +// - env var unset and file present -> use file value +// - encryptionKeyID set -> public spec gets --encryption-key-id +// - encryptionKeyID unset -> public spec does not get the flagAs per coding guidelines, "Always include unit tests when creating new functions or modifying existing ones".
Also applies to: 103-113
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/v2/lifecycle/azure.go` around lines 50 - 81, NewAzurePlatformConfig introduced branching around encryptionKeyID (env AZURE_ENCRYPTION_KEY_ID vs file defaultEncryptionKeyID) and affects ClusterSpecs() generation but lacks unit tests; add a table-driven test that exercises NewAzurePlatformConfig for: (1) env var set, (2) file-only value present, and (3) neither set (expect warning/empty). Instantiate configs via NewAzurePlatformConfig with a temp sharedDir, write/read the defaultEncryptionKeyID file as needed, and assert cfg.encryptionKeyID value and that ClusterSpecs() (or the public method that consumes the config) is generated with the expected argument values; include cleanup of temp files and use t.Run subtests to cover branches.
🧹 Nitpick comments (1)
test/e2e/v2/tests/hosted_cluster_secret_encryption_test.go (1)
104-108: ⚡ Quick winBound the cleanup delete with a timeout.
context.Background()here means cleanup can block without any deadline if the API call stalls. Use a shortWithTimeoutcontext instead.Proposed fix
DeferCleanup(func() { - if err := hostedClusterClient.Delete(context.Background(), testSecret); err != nil && !apierrors.IsNotFound(err) { + cleanupCtx, cancel := context.WithTimeout(context.Background(), time.Minute) + defer cancel() + if err := hostedClusterClient.Delete(cleanupCtx, testSecret); err != nil && !apierrors.IsNotFound(err) { GinkgoWriter.Printf("WARNING: failed to cleanup test secret: %v\n", err) } })As per coding guidelines, "context.Context for cancellation and timeouts".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/v2/tests/hosted_cluster_secret_encryption_test.go` around lines 104 - 108, The cleanup Delete call uses context.Background() which can hang; update the DeferCleanup closure to create a short timeout context (e.g. context.WithTimeout) and pass that to hostedClusterClient.Delete, ensuring you call the cancel function (defer cancel()) before the Delete returns; target the DeferCleanup closure that calls hostedClusterClient.Delete(...) with testSecret and replace the background context with the bounded context and proper cancelation.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/e2e/v2/tests/hosted_cluster_secret_encryption_test.go`:
- Around line 91-102: The test uses a fixed Secret name "e2e-kms-test-secret"
which causes flakiness; change creation of the Secret object (testSecret) to use
a generated name (e.g., set ObjectMeta.GenerateName or append a random/UID
suffix) and ensure the same testSecret reference is used for DeferCleanup and
the hostedClusterClient.Create call so cleanup and assertions operate on the
created resource; update any checks that assume the exact name to use the actual
created ObjectMeta.Name from the returned object.
---
Outside diff comments:
In `@test/e2e/v2/lifecycle/azure.go`:
- Around line 50-81: NewAzurePlatformConfig introduced branching around
encryptionKeyID (env AZURE_ENCRYPTION_KEY_ID vs file defaultEncryptionKeyID) and
affects ClusterSpecs() generation but lacks unit tests; add a table-driven test
that exercises NewAzurePlatformConfig for: (1) env var set, (2) file-only value
present, and (3) neither set (expect warning/empty). Instantiate configs via
NewAzurePlatformConfig with a temp sharedDir, write/read the
defaultEncryptionKeyID file as needed, and assert cfg.encryptionKeyID value and
that ClusterSpecs() (or the public method that consumes the config) is generated
with the expected argument values; include cleanup of temp files and use t.Run
subtests to cover branches.
---
Nitpick comments:
In `@test/e2e/v2/tests/hosted_cluster_secret_encryption_test.go`:
- Around line 104-108: The cleanup Delete call uses context.Background() which
can hang; update the DeferCleanup closure to create a short timeout context
(e.g. context.WithTimeout) and pass that to hostedClusterClient.Delete, ensuring
you call the cancel function (defer cancel()) before the Delete returns; target
the DeferCleanup closure that calls hostedClusterClient.Delete(...) with
testSecret and replace the background context with the bounded context and
proper cancelation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: b6ebd02e-4dd1-4656-95dc-df08f3162a81
📒 Files selected for processing (2)
test/e2e/v2/lifecycle/azure.gotest/e2e/v2/tests/hosted_cluster_secret_encryption_test.go
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #8653 +/- ##
=======================================
Coverage 41.26% 41.27%
=======================================
Files 755 755
Lines 93443 93446 +3
=======================================
+ Hits 38563 38566 +3
Misses 52148 52148
Partials 2732 2732 see 1 file with indirect coverage changes
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
jparrill
left a comment
There was a problem hiding this comment.
Dropped some comments. Thanks!
| @@ -0,0 +1,152 @@ | |||
| //go:build e2ev2 | |||
There was a problem hiding this comment.
nit: All other files in test/e2e/v2/tests/ have the Apache 2.0 license block between the build tag and package. This one's missing it.
There was a problem hiding this comment.
Done. Added Apache 2.0 license header.
AI-assisted response via Claude Code
| Skip("KMS.Azure is not configured on this hosted cluster") | ||
| } | ||
| }) | ||
|
|
There was a problem hiding this comment.
Since KMSSpecValidationTest is exported and could be registered from a different Describe without the outer guard, this can nil-pointer if SecretEncryption or KMS aren't set. I'd collapse it into a single defensive check:
if hc == nil || hc.Spec.Platform.Type != hyperv1.AzurePlatform ||
hc.Spec.SecretEncryption == nil || hc.Spec.SecretEncryption.KMS == nil ||
hc.Spec.SecretEncryption.KMS.Azure == nil {
Skip("Azure KMS spec validation requires Azure platform with KMS configured")
}There was a problem hiding this comment.
Done. Collapsed into a single defensive nil-check chain.
AI-assisted response via Claude Code
| "--cert=/etc/etcd/tls/client/etcd-client.crt", | ||
| "--key=/etc/etcd/tls/client/etcd-client.key", | ||
| "get", | ||
| secretEtcdKey, |
There was a problem hiding this comment.
This etcdctl command with the TLS cert paths is a verbatim copy of test/e2e/util/util.go:1500-1508 (and the same paths show up again in etcd_chaos_test.go). If those paths ever change, there are 3+ places to update. Not blocking, but it'd be nice to extract something like EtcdctlGetCommand(key string) []string into a shared util as a follow-up.
There was a problem hiding this comment.
Good call — I'll track extracting an EtcdctlGetCommand helper as a follow-up.
AI-assisted response via Claude Code
| }) | ||
| }) | ||
| } | ||
|
|
There was a problem hiding this comment.
This:
Expect(strings.Contains(output, "k8s:enc:kms:v2")).To(BeTrue(), ...)gives you "expected true, got false" on failure — not helpful for debugging. With ContainSubstring you get the actual string in the diff:
Expect(output).To(ContainSubstring("k8s:enc:kms:v2"),
"secret should be encrypted using KMSv2")Bonus: you can drop the "strings" import.
There was a problem hiding this comment.
Done. Switched to ContainSubstring for better failure diagnostics.
AI-assisted response via Claude Code
| }) | ||
| } | ||
|
|
||
| // RegisterHostedClusterSecretEncryptionTests registers all secret encryption tests. |
There was a problem hiding this comment.
The failure message dumps the full etcdctl output (raw etcd value: %s). The test data here is synthetic so it's fine, but this pattern could leak real data if someone copies it. I'd trim it down:
"secret should be encrypted using KMSv2 (output length: %d bytes)", len(output)There was a problem hiding this comment.
Done. Addressed together with the ContainSubstring change — the raw output format string is removed.
AI-assisted response via Claude Code
| } | ||
|
|
||
| // RegisterHostedClusterSecretEncryptionTests registers all secret encryption tests. | ||
| func RegisterHostedClusterSecretEncryptionTests(getTestCtx internal.TestContextGetter) { |
There was a problem hiding this comment.
nit: You could strengthen the check by also verifying the plaintext isn't there in the clear:
Expect(output).NotTo(ContainSubstring("testData"),
"secret data should not be readable in plaintext from etcd")There was a problem hiding this comment.
Done. Added a negative assertion for plaintext data in the etcd output.
AI-assisted response via Claude Code
| if cfg.encryptionKeyID == "" { | ||
| if data, err := os.ReadFile(defaultEncryptionKeyID); err == nil { | ||
| cfg.encryptionKeyID = strings.TrimSpace(string(data)) | ||
| } |
There was a problem hiding this comment.
You already log a WARNING for AZURE_PRIVATE_NAT_SUBNET_ID when it's missing. Without the same for the encryption key, a misconfigured CI job will just silently skip the tests and nobody will notice. Something like:
if cfg.encryptionKeyID == "" {
log.Printf("WARNING: AZURE_ENCRYPTION_KEY_ID is not set; secret encryption tests will be skipped")
}There was a problem hiding this comment.
Done. Added warning log matching the existing NAT subnet pattern.
AI-assisted response via Claude Code
| LabelFilter: "self-managed-azure-public || nodepool-lifecycle", | ||
| LabelFilter: "self-managed-azure-public || nodepool-lifecycle || secret-encryption", | ||
| Skip: "KAS allowed CIDRs", | ||
| JUnitFile: "junit_self_managed_azure_public.xml", |
There was a problem hiding this comment.
This should be fine since the BeforeEach skips when KMS isn't configured. Just want to confirm: if AZURE_ENCRYPTION_KEY_ID isn't set, the cluster gets created without encryption, and the tests skip via the guard in the Describe — right? I want to make sure there's no scenario where these tests run against a cluster without KMS and fail instead of skipping.
There was a problem hiding this comment.
Correct. When AZURE_ENCRYPTION_KEY_ID is unset, the cluster is created without --encryption-key-id, so SecretEncryption is nil on the HostedCluster. The Describe-level BeforeEach checks for SecretEncryption.KMS == nil and skips. No failure path.
AI-assisted response via Claude Code
Add e2e v2 validation for KMSv2 secret encryption on Azure self-managed hosted clusters. Lifecycle changes (azure.go): - Add encryptionKeyID field to AzurePlatformConfig, read from AZURE_ENCRYPTION_KEY_ID env var with vault file fallback - Pass --encryption-key-id to the public cluster variant when set - Add secret-encryption label to the public test group's LabelFilter New test file (hosted_cluster_secret_encryption_test.go): - KMS spec validation: asserts ActiveKey fields and auth config (WorkloadIdentity for self-managed, ManagedIdentity for ARO HCP) - KMS functional validation: creates test secret in hosted cluster, execs etcdctl in etcd pod, asserts k8s:enc:kms:v2 prefix - Skips gracefully when KMS is not configured Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
7e57a17 to
674b977
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
test/e2e/v2/tests/hosted_cluster_secret_encryption_test.go (1)
51-52: ⚡ Quick winUse
When ... it should ...phrasing for new test case titles.The new
It(...)titles don’t follow the repository’s required Gherkin-style format.Proposed fix
- It("should have ActiveKey fields populated", func() { + It("When Azure KMS is configured, it should have ActiveKey fields populated", func() { ... - It("should have KMS authentication configured", func() { + It("When Azure KMS is configured, it should have exactly one authentication mechanism configured", func() { ... - It("should encrypt secrets in etcd using KMSv2", func() { + It("When a secret is created, it should be encrypted in etcd using KMSv2", func() {As per coding guidelines:
**/*_test.go: Always use "When ... it should ..." format for describing test cases when creating unit tests; Prefer Gherkin Syntax to define unit test cases.Also applies to: 64-65, 94-95
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/v2/tests/hosted_cluster_secret_encryption_test.go` around lines 51 - 52, The test title uses an It(...) with plain wording—change the description to the repository's Gherkin-style "When ... it should ..." phrasing; update the It call(s) in hosted_cluster_secret_encryption_test.go (e.g., the It("should have ActiveKey fields populated") invocation and the other occurrences at the referenced nearby blocks) to use When(...) or a When-style wrapper so each test reads "When <condition>, it should <expected behavior>" (ensure you preserve the same test body and any test context variables like testCtx and function names).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/e2e/v2/tests/hosted_cluster_secret_encryption_test.go`:
- Around line 69-73: The assertion currently allows both auth methods to be set;
change the check to enforce exclusive-or by replacing the OR condition with an
XOR so exactly one of azureKMS.WorkloadIdentity.ClientID or
azureKMS.KMS.CredentialsSecretName is set (e.g. compute hasWorkloadIdentity :=
azureKMS.WorkloadIdentity.ClientID != "" and hasManagedIdentity :=
azureKMS.KMS.CredentialsSecretName != "" then assert hasWorkloadIdentity !=
hasManagedIdentity). Update the Expect message to indicate that exactly one of
WorkloadIdentity.ClientID or KMS.CredentialsSecretName must be set.
---
Nitpick comments:
In `@test/e2e/v2/tests/hosted_cluster_secret_encryption_test.go`:
- Around line 51-52: The test title uses an It(...) with plain wording—change
the description to the repository's Gherkin-style "When ... it should ..."
phrasing; update the It call(s) in hosted_cluster_secret_encryption_test.go
(e.g., the It("should have ActiveKey fields populated") invocation and the other
occurrences at the referenced nearby blocks) to use When(...) or a When-style
wrapper so each test reads "When <condition>, it should <expected behavior>"
(ensure you preserve the same test body and any test context variables like
testCtx and function names).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: d75b32a4-f5bc-4b08-b8d1-c7f0992d0d05
📒 Files selected for processing (2)
test/e2e/v2/lifecycle/azure.gotest/e2e/v2/tests/hosted_cluster_secret_encryption_test.go
| hasWorkloadIdentity := azureKMS.WorkloadIdentity.ClientID != "" | ||
| hasManagedIdentity := azureKMS.KMS.CredentialsSecretName != "" | ||
| Expect(hasWorkloadIdentity || hasManagedIdentity).To(BeTrue(), | ||
| "either WorkloadIdentity.ClientID or KMS.CredentialsSecretName must be set") | ||
|
|
There was a problem hiding this comment.
Enforce exactly one Azure KMS auth mechanism.
This check currently passes when both auth mechanisms are set, but the test intent is exclusive-or. Tighten this assertion so invalid dual-config doesn’t pass.
Proposed fix
hasWorkloadIdentity := azureKMS.WorkloadIdentity.ClientID != ""
hasManagedIdentity := azureKMS.KMS.CredentialsSecretName != ""
- Expect(hasWorkloadIdentity || hasManagedIdentity).To(BeTrue(),
- "either WorkloadIdentity.ClientID or KMS.CredentialsSecretName must be set")
+ Expect(hasWorkloadIdentity).NotTo(Equal(hasManagedIdentity),
+ "exactly one of WorkloadIdentity.ClientID or KMS.CredentialsSecretName must be set")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| hasWorkloadIdentity := azureKMS.WorkloadIdentity.ClientID != "" | |
| hasManagedIdentity := azureKMS.KMS.CredentialsSecretName != "" | |
| Expect(hasWorkloadIdentity || hasManagedIdentity).To(BeTrue(), | |
| "either WorkloadIdentity.ClientID or KMS.CredentialsSecretName must be set") | |
| hasWorkloadIdentity := azureKMS.WorkloadIdentity.ClientID != "" | |
| hasManagedIdentity := azureKMS.KMS.CredentialsSecretName != "" | |
| Expect(hasWorkloadIdentity).NotTo(Equal(hasManagedIdentity), | |
| "exactly one of WorkloadIdentity.ClientID or KMS.CredentialsSecretName must be set") |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/e2e/v2/tests/hosted_cluster_secret_encryption_test.go` around lines 69 -
73, The assertion currently allows both auth methods to be set; change the check
to enforce exclusive-or by replacing the OR condition with an XOR so exactly one
of azureKMS.WorkloadIdentity.ClientID or azureKMS.KMS.CredentialsSecretName is
set (e.g. compute hasWorkloadIdentity := azureKMS.WorkloadIdentity.ClientID !=
"" and hasManagedIdentity := azureKMS.KMS.CredentialsSecretName != "" then
assert hasWorkloadIdentity != hasManagedIdentity). Update the Expect message to
indicate that exactly one of WorkloadIdentity.ClientID or
KMS.CredentialsSecretName must be set.
|
/lgtm |
|
Scheduling tests matching the |
|
/test e2e-azure-v2-self-managed |
AI Test Failure AnalysisJob: Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6 |
|
/test e2e-aws-4-22 |
|
/verified by e2e-azure-v2-self-managed |
|
@bryan-cox: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Test Resultse2e-aws
e2e-aks
|
|
/lgtm |
|
/test e2e-azure-self-managed |
|
/test e2e-aks-4-22 |
|
I have all the evidence I need. Here is the analysis: Test Failure Analysis CompleteJob Information
Test Failure AnalysisErrorSummaryThe Root CauseThe This is an HTTP/2 This is not caused by the PR changes. The protobuf module is a pre-existing transitive dependency already in the vendor tree. The PR (CNTRLPLANE-1357: add KMSv2 secret encryption e2e v2 test for Self Managed Azure) does not modify Recommendations
Evidence
|
|
@bryan-cox: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
What this PR does / why we need it:
Adds an e2e v2 test that validates KMSv2 secret encryption on Azure self-managed hosted clusters. This ports the KMS validation from the v1
TestCreateClusterCustomConfigtest into the v2 framework.k8s:enc:kms:v2prefix in raw etcd value--encryption-key-idto the public cluster variant whenAZURE_ENCRYPTION_KEY_IDis set (env var or vault-mounted file)secret-encryptionlabel to the public test group's label filterCI Prerequisites
Azure resources created in
self-managed-azure-ciresource group:sm-azure-ci-kmse2e-encryption-keyVault secrets updated in
hypershift-ci-jobs-self-managed-azure-e2e:workload-identities.json: addedkmsClientIDAZURE_ENCRYPTION_KEY_ID: new field with key URLWhich issue(s) this PR fixes:
Fixes https://issues.redhat.com/browse/CNTRLPLANE-1357
Special notes for your reviewer:
The test file is designed to be platform-extensible — future AWS and GCP KMS contexts can be added without touching the Azure block.
Checklist:
Summary by CodeRabbit