CNTRLPLANE-2157: Migrate tests of Prometheus, Audit and TLS to OTE #31360
CNTRLPLANE-2157: Migrate tests of Prometheus, Audit and TLS to OTE #31360YamunadeviShanmugam wants to merge 4 commits into
Conversation
Migrate audit tests to OTE
|
Pipeline controller notification For optional jobs, comment This repository is configured in: automatic mode |
|
@YamunadeviShanmugam: This pull request references CNTRLPLANE-2157 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Skipping CI for Draft Pull Request. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: YamunadeviShanmugam The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughThis PR adds extended test coverage for apiserver TLS and encryption behavior, audit profile and log validation, and Prometheus alert-rule checks. ChangesApiserver TLS and encryption
Audit profile and log validation
Prometheus alert rules
Estimated code review effort: 4 (Complex) | ~60 minutes 🚥 Pre-merge checks | ✅ 14 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (14 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Migrate prometheus tests to OTE
1664313 to
56a5da4
Compare
There was a problem hiding this comment.
Actionable comments posted: 6
🧹 Nitpick comments (8)
.work/prow-job-analyze-test-failure/2071867594297577472/logs/prowjob.json (1)
1-470: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueConsider if raw ProwJob JSONs should be committed to the repository.
This 470-line raw ProwJob JSON is being added as a committed file under
.work/. These artifacts are typically generated during CI and may be better suited as:
- Generated artifacts stored elsewhere (GCS/S3 as configured by
gcs_configuration)- Local debugging files that should be
.gitignoredIf these are intended as permanent documentation of the failure analysis, consider whether a summarized excerpt or a script to fetch the ProwJob would be more maintainable than committing the full JSON. The
pkg/riskanalysis/types.gostruct only capturesName, so any tooling consuming this JSON would need to use a different type or unstructured parsing.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.work/prow-job-analyze-test-failure/2071867594297577472/logs/prowjob.json around lines 1 - 470, This raw ProwJob JSON should not be committed under .work/ as a permanent tracked artifact. Remove the committed JSON and either ignore generated ProwJob logs via .gitignore or move them to external CI storage; if the intent is to keep failure references, add a small summary or fetch script instead, and make sure any code in pkg/riskanalysis/types.go or related tooling does not assume this full JSON shape..work/prow-job-analyze-test-failure/2071867594297577472/analysis.md (1)
18-18: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueAdd language specifier to fenced code block.
The fenced code block following the
### Errorheading is missing a language tag. Per markdownlint MD040, specify a language for syntax highlighting.-``` +```text the server could not find the requested resource (get clusterversions.config.openshift.io version)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.work/prow-job-analyze-test-failure/2071867594297577472/analysis.md at line 18, Add a language specifier to the fenced code block under the Error section so it satisfies markdownlint MD040; update the markdown snippet that contains the “the server could not find the requested resource” message to use a tagged fence in the analysis.md content. Use the existing fenced block near the “### Error” heading as the target and keep the text unchanged.test/extended/cluster/audit.go (2)
250-320: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winDuplicate master-node listing/HyperShift-skip logic across both tests.
Lines 255-262 and 283-290 repeat the same HyperShift skip check plus master-node listing and empty-check. Consider extracting a shared helper (e.g.
getMasterNodesOrSkip(ctx, oc)) to reduce duplication and centralize the master-node selection logic for future maintenance.♻️ Example helper extraction
func getMasterNodesOrSkip(ctx g.SpecContext, oc *exutil.CLI) []corev1.Node { if ok, _ := exutil.IsHypershift(ctx, oc.AdminConfigClient()); ok { g.Skip("HyperShift hosts the control plane externally; master node logs are not accessible via node-logs") } masters, err := oc.AsAdmin().KubeClient().CoreV1().Nodes().List( ctx, metav1.ListOptions{LabelSelector: "node-role.kubernetes.io/master"}) o.Expect(err).NotTo(o.HaveOccurred()) o.Expect(masters.Items).NotTo(o.BeEmpty(), "expected at least one master node") return masters.Items }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/extended/cluster/audit.go` around lines 250 - 320, The two audit tests duplicate the same HyperShift skip and master-node lookup logic, so centralize it in a shared helper such as getMasterNodesOrSkip used by both g.It blocks. Move the exutil.IsHypershift check, Nodes().List call, and empty-list expectation into that helper, returning the master nodes for the audit-file and audit-JSON checks to consume. Keep the test bodies focused on their specific assertions and reference the existing oc, ctx, and master node selection logic when wiring the helper in.
186-225: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick winAssert rejection reason, not just presence of an error.
The three admission-rejection tests only check
err).To(o.HaveOccurred()). A transient CLI/network failure (not an actual server-side validation rejection) would also satisfy this assertion, masking a regression where the API server incorrectly accepts the invalid value. Consider asserting the error/output contains an expected substring (e.g. field name or "Invalid value") to confirm the rejection is due to the intended validation rule.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/extended/cluster/audit.go` around lines 186 - 225, The three audit admission tests in the cluster audit spec currently only assert that an error occurred, which can pass for unrelated CLI or network failures. Update the checks in the three g.It cases that patch apiserver cluster so they also verify the returned error/output contains the expected validation message or field-specific substring, using the same oc.AsAdmin().WithoutNamespace().Run("patch") flow and the existing o.Expect assertions to confirm the rejection is from server-side validation.test/extended/apiserver/tls.go (1)
966-983: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick winAnchor the encryption-key name matcher.
The current unanchored patterns can match unintended names before extracting the suffix. Use anchored patterns like
^encryption-key-openshift-apiserver-(\d+)$and parse the captured group.As per path instructions, regexes should be anchored.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/extended/apiserver/tls.go` around lines 966 - 983, The encryption-key name matcher is using an unanchored regex that can match partial names before splitting the suffix. Update the matching logic around regexp.Compile and the loop over strings.Fields(out) to use an anchored pattern such as ^encryption-key-openshift-apiserver-(\d+)$, then extract and parse the captured numeric group directly instead of splitting the full name. Keep the existing maxNum selection behavior, but ensure only exact key names are accepted.Source: Path instructions
test/extended/prometheus/prometheus.go (3)
1268-1268: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueRedundant type conversion.
promv1.AlertingRule.Queryis already typed asstringinclient_golang'sv1package, sostring(rule.Query)is a no-op conversion.- o.Expect(string(rule.Query)).To(o.MatchRegexp(`audit`), + o.Expect(rule.Query).To(o.MatchRegexp(`audit`),🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/extended/prometheus/prometheus.go` at line 1268, The test in the Prometheus extended suite is doing a redundant string conversion on AlertingRule.Query, since that field is already a string. Update the assertion that references rule.Query to use it directly without wrapping it in string(), keeping the existing regexp match logic unchanged.
1296-1329: 🩺 Stability & Availability | 🔵 TrivialConsider run-isolation for the "not firing" assertion.
Asserting
KubeAPIDownis not firing could be flaky if this suite runs alongside/after disruptive tests in the same conformance run (e.g., the etcd encryption key rotation/secret deletion and custom-cert tests added elsewhere in this PR stack, which can transiently affect apiserver availability). Worth confirming test ordering/isolation guarantees or documenting the assumption.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/extended/prometheus/prometheus.go` around lines 1296 - 1329, The KubeAPIDown “not currently firing” check in the Prometheus alert test may be flaky when run alongside disruptive tests that can transiently affect apiserver availability. Update the test around lookupAlertingRule, wait.PollUntilContextTimeout, and the ALERTS query to either enforce stronger isolation/order assumptions or gate the assertion so it only runs when the run environment guarantees no concurrent disruptive operations. If no isolation guarantee exists, add a clear skip/documentation path in this spec to avoid false failures.
1203-1330: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winExtract duplicated rule-lookup/severity-check logic into helpers.
The lookup-and-nil-check sequence is repeated identically across all five
Itblocks, and the severity-label presence/non-empty check is repeated verbatim three times. Consolidating reduces duplication and centralizes future changes to messages/behavior.♻️ Example consolidation
func requireAlertingRule(ctx g.SpecContext, oc *exutil.CLI, alertName string) *promv1.AlertingRule { prometheusURL, bearerToken := mustGetPrometheusURLAndToken(ctx, oc) g.By(fmt.Sprintf("looking for alert rule %q", alertName)) rule := lookupAlertingRule(prometheusURL, bearerToken, alertName) o.Expect(rule).NotTo(o.BeNil(), "alert rule %q was not found in Prometheus", alertName) return rule } func expectSeverityLabel(rule *promv1.AlertingRule, alertName string) string { severity, ok := rule.Labels["severity"] o.Expect(ok).To(o.BeTrue(), "alert rule %q is missing 'severity' label", alertName) o.Expect(string(severity)).NotTo(o.BeEmpty(), "alert rule %q has an empty 'severity' label", alertName) return string(severity) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/extended/prometheus/prometheus.go` around lines 1203 - 1330, The alert tests repeat the same rule lookup, nil check, and severity-label validation across multiple `g.It` cases in `Prometheus Alerts`, so extract that shared behavior into helpers. Create a reusable helper around `mustGetPrometheusURLAndToken`/`lookupAlertingRule` that returns a verified rule for a given alert name, and another helper for the repeated severity presence/non-empty check using `rule.Labels["severity"]`. Update the existing `It` blocks to call those helpers instead of duplicating the logic, keeping the per-alert assertions like `summary`, `description`, `Query`, and `KubeAPIDown`’s `critical` check in place.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/extended/apiserver/tls.go`:
- Around line 701-744: The recreated secret UID check in the polling block is
order-dependent because the jsonpath output loses secret names and the
comparison in the wait.PollUntilContextTimeout callback relies on slice
positions. Update the logic around the secret retrieval and UID comparison to
parse the returned data into a name-to-UID mapping, then compare the new UIDs
against the originals by secret name instead of index. Keep the fix localized to
the recreated encryption-config secret validation in this test helper so the
stability check no longer flakes on output ordering.
- Around line 398-408: Cleanup in the deferred restore path is ignoring failures
and still depends on the spec context, so restore may be skipped on
cancellation. Update the deferred cleanup around the cluster restore logic in
the test to run under a detached context created with context.WithoutCancel(ctx)
plus an explicit timeout, and make the cleanup operations fail loudly by
logging/asserting errors from the patch, kubeconfig copy, and restoreCluster
steps instead of discarding them. Apply the same cleanup pattern to the other
deferred restore blocks referenced in this test file, using the existing
restoreCluster and oc.AsAdmin().WithoutNamespace().Run("patch") flow as the
anchor points.
- Around line 823-839: The certificate validation helper currently only logs
parse failures and still succeeds when no certificates were parsed. Update the
logic around x509.ParseCertificate and the final count check so the routine
fails if count remains zero after scanning all PEM blocks, using the existing
e2e.Logf and o.Expect flow in this helper to report malformed or empty
certificate data.
- Around line 622-627: Wait for both encryption migrations before asserting the
route prefix: the current test only calls waitEncryptionKeyMigration for
newKASEncSecretName, so the route prefix check can still race
openshift-apiserver. Update the test flow in tls.go to also wait on the
OpenShift API server migration using the relevant migration helper for
openshift-apiserver, and keep the existing kube-apiserver wait plus the
completed checks before proceeding to the route prefix assertion.
- Around line 854-858: Make getServerCertInfo use the provided test context for
the TLS probe so a stalled handshake can be canceled with the spec timeout.
Update the tls.Dial call to use a tls.Dialer with DialContext and thread ctx
through the connection setup, and make sure the helper’s call site still passes
the context. Also verify the return value from pool.AppendCertsFromPEM in
getServerCertInfo and fail fast if the CA bundle cannot be parsed.
- Around line 404-405: The test setup in apiserver/tls.go is shelling out
through bash -c for copy and rewrite operations, which is unsafe and brittle
when using originKubeconfig from KUBECONFIG. Update the logic around the cp/sed
flow to use Go file APIs for copying and editing the kubeconfig, and where a
command is still needed, pass explicit args directly via exec.Command instead of
constructing a shell string. Keep the change localized to the setup helpers that
manipulate originKubeconfig and originKubeconfigBkp.
---
Nitpick comments:
In @.work/prow-job-analyze-test-failure/2071867594297577472/analysis.md:
- Line 18: Add a language specifier to the fenced code block under the Error
section so it satisfies markdownlint MD040; update the markdown snippet that
contains the “the server could not find the requested resource” message to use a
tagged fence in the analysis.md content. Use the existing fenced block near the
“### Error” heading as the target and keep the text unchanged.
In @.work/prow-job-analyze-test-failure/2071867594297577472/logs/prowjob.json:
- Around line 1-470: This raw ProwJob JSON should not be committed under .work/
as a permanent tracked artifact. Remove the committed JSON and either ignore
generated ProwJob logs via .gitignore or move them to external CI storage; if
the intent is to keep failure references, add a small summary or fetch script
instead, and make sure any code in pkg/riskanalysis/types.go or related tooling
does not assume this full JSON shape.
In `@test/extended/apiserver/tls.go`:
- Around line 966-983: The encryption-key name matcher is using an unanchored
regex that can match partial names before splitting the suffix. Update the
matching logic around regexp.Compile and the loop over strings.Fields(out) to
use an anchored pattern such as ^encryption-key-openshift-apiserver-(\d+)$, then
extract and parse the captured numeric group directly instead of splitting the
full name. Keep the existing maxNum selection behavior, but ensure only exact
key names are accepted.
In `@test/extended/cluster/audit.go`:
- Around line 250-320: The two audit tests duplicate the same HyperShift skip
and master-node lookup logic, so centralize it in a shared helper such as
getMasterNodesOrSkip used by both g.It blocks. Move the exutil.IsHypershift
check, Nodes().List call, and empty-list expectation into that helper, returning
the master nodes for the audit-file and audit-JSON checks to consume. Keep the
test bodies focused on their specific assertions and reference the existing oc,
ctx, and master node selection logic when wiring the helper in.
- Around line 186-225: The three audit admission tests in the cluster audit spec
currently only assert that an error occurred, which can pass for unrelated CLI
or network failures. Update the checks in the three g.It cases that patch
apiserver cluster so they also verify the returned error/output contains the
expected validation message or field-specific substring, using the same
oc.AsAdmin().WithoutNamespace().Run("patch") flow and the existing o.Expect
assertions to confirm the rejection is from server-side validation.
In `@test/extended/prometheus/prometheus.go`:
- Line 1268: The test in the Prometheus extended suite is doing a redundant
string conversion on AlertingRule.Query, since that field is already a string.
Update the assertion that references rule.Query to use it directly without
wrapping it in string(), keeping the existing regexp match logic unchanged.
- Around line 1296-1329: The KubeAPIDown “not currently firing” check in the
Prometheus alert test may be flaky when run alongside disruptive tests that can
transiently affect apiserver availability. Update the test around
lookupAlertingRule, wait.PollUntilContextTimeout, and the ALERTS query to either
enforce stronger isolation/order assumptions or gate the assertion so it only
runs when the run environment guarantees no concurrent disruptive operations. If
no isolation guarantee exists, add a clear skip/documentation path in this spec
to avoid false failures.
- Around line 1203-1330: The alert tests repeat the same rule lookup, nil check,
and severity-label validation across multiple `g.It` cases in `Prometheus
Alerts`, so extract that shared behavior into helpers. Create a reusable helper
around `mustGetPrometheusURLAndToken`/`lookupAlertingRule` that returns a
verified rule for a given alert name, and another helper for the repeated
severity presence/non-empty check using `rule.Labels["severity"]`. Update the
existing `It` blocks to call those helpers instead of duplicating the logic,
keeping the per-alert assertions like `summary`, `description`, `Query`, and
`KubeAPIDown`’s `critical` check in place.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: dac4f419-66f8-4232-ab51-a4d61629f023
📒 Files selected for processing (7)
.work/prow-job-analyze-test-failure/2071867594297577472/analysis.md.work/prow-job-analyze-test-failure/2071867594297577472/logs/build-log.txt.work/prow-job-analyze-test-failure/2071867594297577472/logs/junit_e2e.xml.work/prow-job-analyze-test-failure/2071867594297577472/logs/prowjob.jsontest/extended/apiserver/tls.gotest/extended/cluster/audit.gotest/extended/prometheus/prometheus.go
| defer oc.AsAdmin().WithoutNamespace().Run("delete").Args( | ||
| "secret", "custom-api-cert", "-n", "openshift-config", "--ignore-not-found").Execute() | ||
| defer func() { | ||
| g.By("restoring cluster to original state") | ||
| _, _ = oc.AsAdmin().WithoutNamespace().Run("patch").Args( | ||
| "apiserver/cluster", "--type=merge", "-p", patchToRecover).Output() | ||
| if _, cpErr := exec.Command("bash", "-c", | ||
| fmt.Sprintf("cp %s %s", originKubeconfigBkp, originKubeconfig)).Output(); cpErr != nil { | ||
| e2e.Logf("warning: failed to restore kubeconfig: %v", cpErr) | ||
| } | ||
| restoreCluster() |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
Handle cleanup failures and bound deferred cleanup.
These disruptive tests restore cluster-wide API server/encryption state, but cleanup errors are ignored. Please log/assert cleanup failures and use a context.WithoutCancel(ctx) + timeout context for deferred cleanup paths so cancellation does not skip restoration.
As per path instructions, Go code should never ignore error returns. Based on learnings, deferred cleanup in test/extended/ should detach cancellation from the spec context and apply an explicit timeout.
Also applies to: 553-559, 588-592, 690-696
🧰 Tools
🪛 ast-grep (0.44.0)
[error] 403-404: A shell (sh/bash) is invoked with -c and a dynamically built command string (string concatenation, fmt.Sprintf, or a variable holding the command) passed to exec.Command / exec.CommandContext. Untrusted input embedded in the command lets an attacker inject arbitrary shell commands. Avoid the shell: call the target binary directly with exec.Command(name, arg1, arg2, ...) so each argument is passed as a separate, non-interpreted token, and never build a shell command string from external input.
Context: exec.Command("bash", "-c",
fmt.Sprintf("cp %s %s", originKubeconfigBkp, originKubeconfig))
Note: [CWE-78] Improper Neutralization of Special Elements used in an OS Command ('OS Command Injection').
(command-injection-exec-sh-c-go)
[error] 403-404: An argument to exec.Command/exec.CommandContext is built with fmt.Sprintf that interpolates a non-constant value. When the command is run through a shell (e.g. sh -c) this allows command injection. Pass untrusted input as a separate, explicit argument to exec.Command (which does not invoke a shell) instead of formatting it into the command string, and validate or allowlist any values that must be embedded.
Context: exec.Command("bash", "-c",
fmt.Sprintf("cp %s %s", originKubeconfigBkp, originKubeconfig))
Note: [CWE-78] Improper Neutralization of Special Elements used in an OS Command ('OS Command Injection').
(command-injection-exec-sprintf-arg-go)
🪛 OpenGrep (1.23.0)
[ERROR] 404-405: Dynamic command passed to exec.Command with a shell invocation. Pass arguments directly to exec.Command without a shell wrapper.
(coderabbit.command-injection.go-exec-command)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/extended/apiserver/tls.go` around lines 398 - 408, Cleanup in the
deferred restore path is ignoring failures and still depends on the spec
context, so restore may be skipped on cancellation. Update the deferred cleanup
around the cluster restore logic in the test to run under a detached context
created with context.WithoutCancel(ctx) plus an explicit timeout, and make the
cleanup operations fail loudly by logging/asserting errors from the patch,
kubeconfig copy, and restoreCluster steps instead of discarding them. Apply the
same cleanup pattern to the other deferred restore blocks referenced in this
test file, using the existing restoreCluster and
oc.AsAdmin().WithoutNamespace().Run("patch") flow as the anchor points.
Sources: Path instructions, Learnings
| if _, cpErr := exec.Command("bash", "-c", | ||
| fmt.Sprintf("cp %s %s", originKubeconfigBkp, originKubeconfig)).Output(); cpErr != nil { |
There was a problem hiding this comment.
🔒 Security & Privacy | 🟠 Major | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify shell-wrapped commands remain in the changed TLS test.
rg -n 'exec\.Command\("bash",\s*"-c"' test/extended/apiserver/tls.goRepository: openshift/origin
Length of output: 478
🏁 Script executed:
#!/bin/bash
sed -n '380,490p' test/extended/apiserver/tls.goRepository: openshift/origin
Length of output: 5602
🏁 Script executed:
python3 - <<'PY'
import base64, subprocess, tempfile, os, textwrap
data = b"-----BEGIN CERTIFICATE-----\n" + bytes(range(1, 64)) * 4 + b"\n-----END CERTIFICATE-----\n"
b64 = base64.b64encode(data).decode()
print("base64 contains '/':", '/' in b64)
cmd = f'sed -i "s/certificate-authority-data: .*/certificate-authority-data: {b64}/" /tmp/nonexistent'
print("sample sed command:", cmd[:120] + "...")
p = subprocess.run(["bash", "-lc", cmd], capture_output=True, text=True)
print("exit:", p.returncode)
print("stderr:", p.stderr.strip())
PYRepository: openshift/origin
Length of output: 500
🏁 Script executed:
rg -n 'originKubeconfig|originKubeconfigBkp|updateKubeconfCmd|caCmd|bash", "-c"|cp %s %s|sed -i' test/extended/apiserver/tls.goRepository: openshift/origin
Length of output: 1081
Avoid bash -c for these test commands. originKubeconfig comes from KUBECONFIG, so the cp/sed command strings depend on external input, and the sed rewrite is fragile for base64 payloads. Use direct Go file APIs and explicit command args instead.
🧰 Tools
🪛 OpenGrep (1.23.0)
[ERROR] 404-405: Dynamic command passed to exec.Command with a shell invocation. Pass arguments directly to exec.Command without a shell wrapper.
(coderabbit.command-injection.go-exec-command)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/extended/apiserver/tls.go` around lines 404 - 405, The test setup in
apiserver/tls.go is shelling out through bash -c for copy and rewrite
operations, which is unsafe and brittle when using originKubeconfig from
KUBECONFIG. Update the logic around the cp/sed flow to use Go file APIs for
copying and editing the kubeconfig, and where a command is still needed, pass
explicit args directly via exec.Command instead of constructing a shell string.
Keep the change localized to the setup helpers that manipulate originKubeconfig
and originKubeconfigBkp.
Sources: Path instructions, Linters/SAST tools
| uidsOld, err := oc.WithoutNamespace().Run("get").Args( | ||
| "secret", | ||
| "encryption-config-openshift-apiserver", | ||
| "encryption-config-openshift-kube-apiserver", | ||
| "-n", "openshift-config-managed", | ||
| "-o=jsonpath={.items[*].metadata.uid}", | ||
| ).Output() | ||
| o.Expect(err).NotTo(o.HaveOccurred()) | ||
|
|
||
| g.By("2. delete encryption-config-* secrets from openshift-config-managed") | ||
| for _, item := range []string{ | ||
| "encryption-config-openshift-apiserver", | ||
| "encryption-config-openshift-kube-apiserver", | ||
| } { | ||
| e2e.Logf("removing finalizers from secret %s", item) | ||
| err := oc.WithoutNamespace().Run("patch").Args( | ||
| "secret", item, "-n", "openshift-config-managed", | ||
| `-p={"metadata":{"finalizers":null}}`).Execute() | ||
| o.Expect(err).NotTo(o.HaveOccurred()) | ||
|
|
||
| e2e.Logf("deleting secret %s", item) | ||
| err = oc.WithoutNamespace().Run("delete").Args( | ||
| "secret", item, "-n", "openshift-config-managed").Execute() | ||
| o.Expect(err).NotTo(o.HaveOccurred()) | ||
| } | ||
|
|
||
| uidsOldSlice := strings.Split(uidsOld, " ") | ||
| e2e.Logf("original secret UIDs: %v", uidsOldSlice) | ||
| errSecret := wait.PollUntilContextTimeout(ctx, 3*time.Second, 60*time.Second, false, | ||
| func(pollCtx context.Context) (bool, error) { | ||
| uidsNew, err := oc.WithoutNamespace().Run("get").Args( | ||
| "secret", | ||
| "encryption-config-openshift-apiserver", | ||
| "encryption-config-openshift-kube-apiserver", | ||
| "-n", "openshift-config-managed", | ||
| "-o=jsonpath={.items[*].metadata.uid}", | ||
| ).Output() | ||
| if err != nil { | ||
| e2e.Logf("encryption-config-* secrets not yet recreated: %v — retrying", err) | ||
| return false, nil | ||
| } | ||
| uidsNewSlice := strings.Split(uidsNew, " ") | ||
| e2e.Logf("new secret UIDs: %v", uidsNewSlice) | ||
| if len(uidsNewSlice) >= 2 && uidsNewSlice[0] != uidsOldSlice[0] && uidsNewSlice[1] != uidsOldSlice[1] { |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Compare recreated secret UIDs by name, not slice position.
The jsonpath output drops secret names, then Line 744 assumes positional ordering is stable. Parse JSON and compare metadata.uid keyed by metadata.name to avoid order-dependent flakes.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/extended/apiserver/tls.go` around lines 701 - 744, The recreated secret
UID check in the polling block is order-dependent because the jsonpath output
loses secret names and the comparison in the wait.PollUntilContextTimeout
callback relies on slice positions. Update the logic around the secret retrieval
and UID comparison to parse the returned data into a name-to-UID mapping, then
compare the new UIDs against the originals by secret name instead of index. Keep
the fix localized to the recreated encryption-config secret validation in this
test helper so the stability check no longer flakes on output ordering.
| cert, err := x509.ParseCertificate(block.Bytes) | ||
| if err != nil { | ||
| e2e.Logf(" [%s] failed to parse certificate block: %v", source, err) | ||
| continue | ||
| } | ||
| count++ | ||
| now := time.Now() | ||
| e2e.Logf(" [%s] cert #%d CN=%q NotBefore=%s NotAfter=%s", | ||
| source, count, cert.Subject.CommonName, | ||
| cert.NotBefore.Format(time.RFC3339), cert.NotAfter.Format(time.RFC3339)) | ||
| o.Expect(cert.NotAfter.After(now)).To(o.BeTrue(), | ||
| "certificate %q from %s has expired (NotAfter=%s)", cert.Subject.CommonName, source, cert.NotAfter) | ||
| o.Expect(cert.NotBefore.Before(now)).To(o.BeTrue(), | ||
| "certificate %q from %s is not yet valid (NotBefore=%s)", cert.Subject.CommonName, source, cert.NotBefore) | ||
| } | ||
| e2e.Logf(" [%s] checked %d certificates", source, count) | ||
| } |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Fail when certificate parsing finds no valid certs.
Line 825 only logs parse failures, and Line 838 allows count == 0, so malformed or empty certificate data can pass this validation.
Proposed fix
cert, err := x509.ParseCertificate(block.Bytes)
- if err != nil {
- e2e.Logf(" [%s] failed to parse certificate block: %v", source, err)
- continue
- }
+ o.Expect(err).NotTo(o.HaveOccurred(), "failed to parse certificate block from %s", source)
count++
@@
}
+ o.Expect(count).To(o.BeNumerically(">", 0), "no PEM certificates found in %s", source)
e2e.Logf(" [%s] checked %d certificates", source, count)
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| cert, err := x509.ParseCertificate(block.Bytes) | |
| if err != nil { | |
| e2e.Logf(" [%s] failed to parse certificate block: %v", source, err) | |
| continue | |
| } | |
| count++ | |
| now := time.Now() | |
| e2e.Logf(" [%s] cert #%d CN=%q NotBefore=%s NotAfter=%s", | |
| source, count, cert.Subject.CommonName, | |
| cert.NotBefore.Format(time.RFC3339), cert.NotAfter.Format(time.RFC3339)) | |
| o.Expect(cert.NotAfter.After(now)).To(o.BeTrue(), | |
| "certificate %q from %s has expired (NotAfter=%s)", cert.Subject.CommonName, source, cert.NotAfter) | |
| o.Expect(cert.NotBefore.Before(now)).To(o.BeTrue(), | |
| "certificate %q from %s is not yet valid (NotBefore=%s)", cert.Subject.CommonName, source, cert.NotBefore) | |
| } | |
| e2e.Logf(" [%s] checked %d certificates", source, count) | |
| } | |
| cert, err := x509.ParseCertificate(block.Bytes) | |
| o.Expect(err).NotTo(o.HaveOccurred(), "failed to parse certificate block from %s", source) | |
| count++ | |
| now := time.Now() | |
| e2e.Logf(" [%s] cert #%d CN=%q NotBefore=%s NotAfter=%s", | |
| source, count, cert.Subject.CommonName, | |
| cert.NotBefore.Format(time.RFC3339), cert.NotAfter.Format(time.RFC3339)) | |
| o.Expect(cert.NotAfter.After(now)).To(o.BeTrue(), | |
| "certificate %q from %s has expired (NotAfter=%s)", cert.Subject.CommonName, source, cert.NotAfter) | |
| o.Expect(cert.NotBefore.Before(now)).To(o.BeTrue(), | |
| "certificate %q from %s is not yet valid (NotBefore=%s)", cert.Subject.CommonName, source, cert.NotBefore) | |
| } | |
| o.Expect(count).To(o.BeNumerically(">", 0), "no PEM certificates found in %s", source) | |
| e2e.Logf(" [%s] checked %d certificates", source, count) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/extended/apiserver/tls.go` around lines 823 - 839, The certificate
validation helper currently only logs parse failures and still succeeds when no
certificates were parsed. Update the logic around x509.ParseCertificate and the
final count check so the routine fails if count remains zero after scanning all
PEM blocks, using the existing e2e.Logf and o.Expect flow in this helper to
report malformed or empty certificate data.
| pool := x509.NewCertPool() | ||
| pool.AppendCertsFromPEM(caData) | ||
|
|
||
| conn, err := tls.Dial("tcp", fmt.Sprintf("%s:%s", fqdn, port), | ||
| &tls.Config{RootCAs: pool, ServerName: fqdn}) |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect the helper around the cited lines
sed -n '820,900p' test/extended/apiserver/tls.go
# Find the helper signature and its callers
rg -n "func .*getServerCertInfo|getServerCertInfo\(" test/extended/apiserver/tls.go test/extended/apiserver -S
# Check whether this file already uses context-aware dialing elsewhere
rg -n "DialContext|tls\.Dialer|context\.Context|WithTimeout|Deadline" test/extended/apiserver/tls.go test/extended/apiserver -SRepository: openshift/origin
Length of output: 7087
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Show the call sites with surrounding context
sed -n '470,515p' test/extended/apiserver/tls.goRepository: openshift/origin
Length of output: 2749
Make the TLS probe context-aware. getServerCertInfo ignores the available test context, so a stuck TCP/TLS handshake can outlive the spec timeout. Pass ctx through and use tls.Dialer.DialContext; also check the return from pool.AppendCertsFromPEM so a bad CA bundle fails fast.
🧰 Tools
🪛 ast-grep (0.44.0)
[warning] 857-857: MinVersionis missing from this TLS configuration. By default, TLS 1.2 is currently used as the minimum when acting as a client, and TLS 1.0 when acting as a server. General purpose web applications should default to TLS 1.3 with all other protocols disabled. Only where it is known that a web server must support legacy clients with unsupported an insecure browsers (such as Internet Explorer 10), it may be necessary to enable TLS 1.0 to provide support. AddMinVersion: tls.VersionTLS13' to the TLS configuration to bump the minimum version to TLS 1.3.
Context: tls.Config{RootCAs: pool, ServerName: fqdn}
Note: [CWE-327]: Use of a Broken or Risky Cryptographic Algorithm [OWASP A03:2017]: Sensitive Data Exposure [OWASP A02:2021]: Cryptographic Failures
(missing-ssl-minversion-go)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/extended/apiserver/tls.go` around lines 854 - 858, Make
getServerCertInfo use the provided test context for the TLS probe so a stalled
handshake can be canceled with the spec timeout. Update the tls.Dial call to use
a tls.Dialer with DialContext and thread ctx through the connection setup, and
make sure the helper’s call site still passes the context. Also verify the
return value from pool.AppendCertsFromPEM in getServerCertInfo and fail fast if
the CA bundle cannot be parsed.
Source: Path instructions
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/extended/apiserver/tls.go`:
- Around line 296-355: The TLS/CA validation in tls.go is currently only logging
and continuing on decode/read/parse failures, which lets broken inputs pass
unnoticed. In the secret loop and the ConfigMap handling path, replace the
silent Logf-then-continue behavior with assertions that fail the test when
base64 decode, JSON unmarshal, or configmap retrieval fails, and use
parseAndCheckPEMs only after successful validation. Also update the fallback key
scan logic to assert that a matching CA/crt key was actually found before
proceeding, using the existing secret/configmap iteration code and
parseAndCheckPEMs as the main validation entry point.
- Around line 419-423: The kubeconfig/CA extraction in the TLS test still shells
out through bash -c using caCmd, which is brittle and injection-prone. Update
the apiserver TLS helper logic in tls.go to parse the kubeconfig directly in Go
instead of using grep/base64, and replace the later OpenSSL command construction
with explicit exec.CommandContext arguments or crypto/x509-based certificate
handling. Keep the changes localized around the kubeconfig/OpenSSL helper paths
so the test setup no longer depends on shell pipelines.
- Around line 907-922: The condition check in the ClusterOperator polling logic
is incorrectly treating missing requested conditions as success. In the
condition loop that builds on co.Status.Conditions, add an observed-condition
map in the relevant check function and verify every requested key from
conditions is present before returning true; if any requested condition is
absent or mismatched, keep returning false and preserve the existing throttled
logging behavior. Focus the fix around the condition validation path that
currently iterates over co.Status.Conditions and the conditions lookup.
- Around line 971-985: The key-scanning logic in the function that parses `out`
and returns `maxNum` currently falls back to `0, nil` when no matching
encryption key is found, which masks setup failures. Update this function to
detect the no-match case after iterating over `strings.Fields(out)` and return a
non-nil error instead of zero; keep the existing parsing behavior for valid
matches, but make sure callers can distinguish “no key found” from a real key
numbered zero.
- Around line 615-618: Do not ignore the diagnostic output from the two oc get
secret calls in tls.go; instead of assigning both Execute() results to
underscore, capture and log any errors/output so failures are visible when the
wait path is hit. Update the diagnostic helper around
WithoutNamespace().Run("get").Args(...) to surface the results for both the
openshift-apiserver and openshift-kube-apiserver secret lookups, preserving the
existing namespace and label selectors.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: ff3e2265-e1e3-49bd-b2d9-e784e5b3dcf9
📒 Files selected for processing (2)
test/extended/apiserver/tls.gotest/extended/prometheus/prometheus.go
🚧 Files skipped from review as they are similar to previous changes (1)
- test/extended/prometheus/prometheus.go
56a5da4 to
9f22025
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
test/extended/apiserver/tls.go (1)
443-480: 🔒 Security & Privacy | 🔵 Trivial | 💤 Low valueOptional: switch the test certs to a short-lived P-256 key.
openssl genrsaplus-days 100000is unnecessary for this temporary secret, and a shorter-lived cert keeps the fixture aligned with current crypto guidance if this path accepts it.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/extended/apiserver/tls.go` around lines 443 - 480, The TLS test certificate generation in the openssl setup still uses long-lived RSA keys and an excessively large validity window; update the certificate fixture in the openssl command sequence to use a short-lived P-256 key/cert path if supported. Keep the changes localized to the certificate creation flow that builds the CA, server key, CSR, and signed server cert, and preserve the existing SAN/subject handling in the same helper.Source: Path instructions
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/extended/apiserver/tls.go`:
- Around line 630-632: The polling in the TLS spec should stay bound to the
current spec context instead of starting from context.Background(). Update the
polling setup around the context.WithTimeout calls in the test helpers to derive
from the existing ctx so Ginkgo cancellation still stops the waits, while
preserving the explicit timeout; apply the same change in the related poll
blocks referenced in the TLS test flow.
- Around line 374-405: The cleanup in restoreCluster and the deferred patching
around apiserver/cluster currently wipes serving cert configuration by always
restoring namedCertificates to null and deleting a fixed custom-api-cert secret.
Update the test setup to snapshot the original apiserver/cluster
namedCertificates state before modifying it, then restore that exact value in
the deferred patch instead of hardcoding null. Also change the secret handling
to use a per-test secret name derived from this test’s context and only delete
that secret, so existing cluster config and pre-existing secrets are not
touched.
- Around line 601-604: The regexes passed to getEncryptionKeyNumber are too
loose and can match unintended substrings or similarly named secrets; update the
encryption-key pattern matching in tls.go to use anchored numeric-only regexes
so only the exact expected secret names are selected. Apply the same anchored
pattern fix at each call site that uses getEncryptionKeyNumber, including the
other matching blocks referenced in this test file, and keep the change
localized to the regex arguments rather than the helper itself.
---
Nitpick comments:
In `@test/extended/apiserver/tls.go`:
- Around line 443-480: The TLS test certificate generation in the openssl setup
still uses long-lived RSA keys and an excessively large validity window; update
the certificate fixture in the openssl command sequence to use a short-lived
P-256 key/cert path if supported. Keep the changes localized to the certificate
creation flow that builds the CA, server key, CSR, and signed server cert, and
preserve the existing SAN/subject handling in the same helper.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 1dab033a-d54d-451c-90a5-de3f069fe630
📒 Files selected for processing (1)
test/extended/apiserver/tls.go
| oasEncNumberBefore, err := getEncryptionKeyNumber(oc, `encryption-key-openshift-apiserver-[^ ]*`) | ||
| o.Expect(err).NotTo(o.HaveOccurred()) | ||
| kasEncNumberBefore, err := getEncryptionKeyNumber(oc, `encryption-key-openshift-kube-apiserver-[^ ]*`) | ||
| o.Expect(err).NotTo(o.HaveOccurred()) |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Anchor encryption-key name matching.
The regexes currently match substrings and arbitrary suffixes. Use anchored numeric patterns so similarly named secrets cannot be selected accidentally.
Suggested fix
- oasEncNumberBefore, err := getEncryptionKeyNumber(oc, `encryption-key-openshift-apiserver-[^ ]*`)
+ oasEncNumberBefore, err := getEncryptionKeyNumber(oc, `^encryption-key-openshift-apiserver-[0-9]+$`)
...
- kasEncNumberBefore, err := getEncryptionKeyNumber(oc, `encryption-key-openshift-kube-apiserver-[^ ]*`)
+ kasEncNumberBefore, err := getEncryptionKeyNumber(oc, `^encryption-key-openshift-kube-apiserver-[0-9]+$`)
...
- oasEncNumber, err := getEncryptionKeyNumber(oc, `encryption-key-openshift-apiserver-[^ ]*`)
+ oasEncNumber, err := getEncryptionKeyNumber(oc, `^encryption-key-openshift-apiserver-[0-9]+$`)
...
- kasEncNumber, err := getEncryptionKeyNumber(oc, `encryption-key-openshift-kube-apiserver-[^ ]*`)
+ kasEncNumber, err := getEncryptionKeyNumber(oc, `^encryption-key-openshift-kube-apiserver-[0-9]+$`)As per path instructions, regexes should be anchored.
Also applies to: 826-829, 1057-1065
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/extended/apiserver/tls.go` around lines 601 - 604, The regexes passed to
getEncryptionKeyNumber are too loose and can match unintended substrings or
similarly named secrets; update the encryption-key pattern matching in tls.go to
use anchored numeric-only regexes so only the exact expected secret names are
selected. Apply the same anchored pattern fix at each call site that uses
getEncryptionKeyNumber, including the other matching blocks referenced in this
test file, and keep the change localized to the regex arguments rather than the
helper itself.
Source: Path instructions
9f22025 to
91037b8
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (1)
test/extended/apiserver/tls.go (1)
374-374: 🩺 Stability & Availability | 🟡 Minor | ⚡ Quick winUse nanosecond entropy for the temporary secret name.
time.Now().Unix()can collide when disruptive tests are retried or run in parallel within the same second, causingoc create secretfailures or cross-test cleanup interference.Suggested fix
- testSecretName := fmt.Sprintf("custom-api-cert-test-%d", time.Now().Unix()) + testSecretName := fmt.Sprintf("custom-api-cert-test-%d", time.Now().UnixNano())🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/extended/apiserver/tls.go` at line 374, The temporary secret name in the tls test setup uses second-level timestamps and can collide across parallel or retried runs; update the name generation in the code path that assigns testSecretName to use nanosecond-level entropy instead of time.Now().Unix(). Keep the change local to the temporary secret naming logic so the secret created for oc create secret is unique per test invocation and does not interfere with cleanup.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@test/extended/apiserver/tls.go`:
- Line 374: The temporary secret name in the tls test setup uses second-level
timestamps and can collide across parallel or retried runs; update the name
generation in the code path that assigns testSecretName to use nanosecond-level
entropy instead of time.Now().Unix(). Keep the change local to the temporary
secret naming logic so the secret created for oc create secret is unique per
test invocation and does not interfere with cleanup.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: e86831f6-ace6-41b6-a54b-c8af61691e6c
📒 Files selected for processing (1)
test/extended/apiserver/tls.go
Migrate etcd encryption tests to OTE
91037b8 to
5943904
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (8)
test/extended/apiserver/tls.go (8)
913-928: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winFail when certificate parsing finds no valid certs.
ParseCertificatefailures are still only logged, andcount == 0still succeeds. Assert parse success for certificate blocks and assertcount > 0. This was already flagged in the previous review.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/extended/apiserver/tls.go` around lines 913 - 928, The certificate-checking logic in the helper that iterates parsed PEM blocks still lets all parse failures pass and succeeds when no valid certificates are found. Update the flow around x509.ParseCertificate so parsing failures are asserted instead of only logged, and add an explicit expectation that the certificate counter is greater than zero before returning; use the existing loop and count handling in the tls.go certificate verification helper to locate the fix.
620-622: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick winAnchor encryption-key name matching.
These patterns still match substrings and non-numeric suffixes. Use
^encryption-key-openshift-apiserver-[0-9]+$and^encryption-key-openshift-kube-apiserver-[0-9]+$. This was already flagged in the previous review. As per path instructions, regexes should be anchored.Also applies to: 654-655, 834-837
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/extended/apiserver/tls.go` around lines 620 - 622, The regexes used in getEncryptionKeyNumber matching are still too loose because they can match substrings and non-numeric suffixes; update the encryption-key name patterns in the tls.go test helpers to use anchored numeric-only matches. Specifically, wherever the test calls getEncryptionKeyNumber for openshift-apiserver and openshift-kube-apiserver keys, replace the current patterns with anchored forms like ^encryption-key-openshift-apiserver-[0-9]+$ and ^encryption-key-openshift-kube-apiserver-[0-9]+$ so the matches are exact.Source: Path instructions
311-315: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winFail on malformed TLS secrets instead of skipping them.
A
kubernetes.io/tlssecret withouttls.crtis a broken input for this certificate-validation test, so this should assert rather thanLogfand continue. This is the same validation gap already flagged in the earlier review.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/extended/apiserver/tls.go` around lines 311 - 315, The TLS secret validation in the certificate test currently skips over malformed kubernetes.io/tls secrets when tls.crt is missing, but this should fail the test instead. Update the logic in the secret-processing path around rawcrt handling to assert or require tls.crt rather than using e2e.Logf and continue, so broken inputs are reported immediately. Keep the existing validation flow in the tls.go test helper, but make malformed TLS secrets a hard failure instead of a skipped case.
420-421: 🔒 Security & Privacy | 🟠 Major | ⚡ Quick winAvoid shelling out for kubeconfig copy/rewrite.
originKubeconfigis environment-derived, and thesedreplacement is brittle for base64 payloads. Use Go file APIs to copy and rewrite the kubeconfig instead. This was already flagged in the previous review.Also applies to: 524-528
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/extended/apiserver/tls.go` around lines 420 - 421, The kubeconfig backup/copy logic in the tls test helpers still shells out via exec.Command("bash", "-c", ...) and relies on brittle text replacement, which should be removed. Update the affected helper code around originKubeconfig handling to use Go file APIs for copying and rewriting the kubeconfig contents directly, and apply the same change in the later duplicate block referenced in the comment. Keep the fix localized to the kubeconfig copy/rewrite flow so the behavior stays the same without using bash, cp, or sed-like replacement.Sources: Path instructions, Linters/SAST tools
416-430: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick winHandle cleanup failures and bound cleanup contexts.
These disruptive cleanup paths still discard restore errors and are not using a detached timeout context, so failed cleanup can leave cluster-wide API server/encryption state modified. This was already flagged in the previous review.
Also applies to: 599-605, 636-640, 762-768, 1183-1188
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/extended/apiserver/tls.go` around lines 416 - 430, Cleanup in the TLS/apiserver test teardown is still ignoring restore failures and using the wrong context lifetime, which can leave cluster state partially modified. Update the deferred cleanup in the apiserver TLS test helpers around the restore logic in the same cleanup block to capture and surface patch/restore errors instead of discarding them, and run the cleanup operations under a detached timeout context rather than the parent test context. Apply the same fix to the other matching cleanup blocks referenced by the same teardown pattern so all restore paths in the TLS test suite handle failures consistently.Sources: Path instructions, Learnings
693-699: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winWait for the OpenShift API server migration too.
The test rotates both components but still waits only for
newKASEncSecretName; the route-prefix assertion can race theopenshift-apiservermigration. This was already flagged in the previous review.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/extended/apiserver/tls.go` around lines 693 - 699, The migration wait in the TLS test only covers the kube-apiserver secret, so the route-prefix check can run before the openshift-apiserver migration finishes. Update the logic around waitEncryptionKeyMigration in the tls.go test to also wait for the OpenShift API server encryption migration using the corresponding openshift-apiserver secret name before continuing to the assertion.
939-948: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick winMake the TLS probe context-aware and validate the CA pool.
getServerCertInfostill ignores the spec context and does not checkAppendCertsFromPEM, so a stalled handshake can outlive cancellation and bad CA data is not failed fast. This was already flagged in the previous review.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/extended/apiserver/tls.go` around lines 939 - 948, The TLS probe in getServerCertInfo is still not using the spec context and it does not validate the result of AppendCertsFromPEM. Update getServerCertInfo to accept and use the context from the caller when performing the tls.Dial handshake so cancellation can stop a stalled probe, and explicitly check the cert pool append result after reading caPath so invalid CA PEM fails fast with an error.Source: Path instructions
773-825: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick winCompare recreated secret UIDs by name, not slice position.
The jsonpath output still drops secret names, so Line 825 assumes stable ordering. Parse names and UIDs together, then compare
metadata.uidkeyed bymetadata.name. This was already flagged in the previous review.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/extended/apiserver/tls.go` around lines 773 - 825, The UID verification in the recreated secret check is still matching by slice position from the jsonpath output, so the comparison can be wrong if the order changes. Update the logic in the secret polling block that uses `uidsOld`, `uidsNew`, and `wait.PollUntilContextTimeout` to fetch and compare each secret’s `metadata.uid` keyed by `metadata.name` instead of relying on array order. Keep the existing secret names `encryption-config-openshift-apiserver` and `encryption-config-openshift-kube-apiserver`, but parse the name/UID pairs and compare them by name before deciding the secrets were recreated.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/extended/apiserver/tls.go`:
- Around line 462-498: The certificate-generation flow in the TLS test helper
still shells out with plain exec.Command and uses RSA-2048 keys, which conflicts
with the cancellation and crypto guidance. Update the openssl invocations in the
certificate setup path to use exec.CommandContext with the existing ctx so child
processes stop on timeout/cancel, and switch the key/certificate generation in
this helper to Ed25519 or ECDSA P-256+ instead of RSA-2048. Keep the changes
localized around the openssl genrsa/req/x509 sequence in the TLS generation
logic so the helper remains consistent with the rest of the certificate setup.
---
Duplicate comments:
In `@test/extended/apiserver/tls.go`:
- Around line 913-928: The certificate-checking logic in the helper that
iterates parsed PEM blocks still lets all parse failures pass and succeeds when
no valid certificates are found. Update the flow around x509.ParseCertificate so
parsing failures are asserted instead of only logged, and add an explicit
expectation that the certificate counter is greater than zero before returning;
use the existing loop and count handling in the tls.go certificate verification
helper to locate the fix.
- Around line 620-622: The regexes used in getEncryptionKeyNumber matching are
still too loose because they can match substrings and non-numeric suffixes;
update the encryption-key name patterns in the tls.go test helpers to use
anchored numeric-only matches. Specifically, wherever the test calls
getEncryptionKeyNumber for openshift-apiserver and openshift-kube-apiserver
keys, replace the current patterns with anchored forms like
^encryption-key-openshift-apiserver-[0-9]+$ and
^encryption-key-openshift-kube-apiserver-[0-9]+$ so the matches are exact.
- Around line 311-315: The TLS secret validation in the certificate test
currently skips over malformed kubernetes.io/tls secrets when tls.crt is
missing, but this should fail the test instead. Update the logic in the
secret-processing path around rawcrt handling to assert or require tls.crt
rather than using e2e.Logf and continue, so broken inputs are reported
immediately. Keep the existing validation flow in the tls.go test helper, but
make malformed TLS secrets a hard failure instead of a skipped case.
- Around line 420-421: The kubeconfig backup/copy logic in the tls test helpers
still shells out via exec.Command("bash", "-c", ...) and relies on brittle text
replacement, which should be removed. Update the affected helper code around
originKubeconfig handling to use Go file APIs for copying and rewriting the
kubeconfig contents directly, and apply the same change in the later duplicate
block referenced in the comment. Keep the fix localized to the kubeconfig
copy/rewrite flow so the behavior stays the same without using bash, cp, or
sed-like replacement.
- Around line 416-430: Cleanup in the TLS/apiserver test teardown is still
ignoring restore failures and using the wrong context lifetime, which can leave
cluster state partially modified. Update the deferred cleanup in the apiserver
TLS test helpers around the restore logic in the same cleanup block to capture
and surface patch/restore errors instead of discarding them, and run the cleanup
operations under a detached timeout context rather than the parent test context.
Apply the same fix to the other matching cleanup blocks referenced by the same
teardown pattern so all restore paths in the TLS test suite handle failures
consistently.
- Around line 693-699: The migration wait in the TLS test only covers the
kube-apiserver secret, so the route-prefix check can run before the
openshift-apiserver migration finishes. Update the logic around
waitEncryptionKeyMigration in the tls.go test to also wait for the OpenShift API
server encryption migration using the corresponding openshift-apiserver secret
name before continuing to the assertion.
- Around line 939-948: The TLS probe in getServerCertInfo is still not using the
spec context and it does not validate the result of AppendCertsFromPEM. Update
getServerCertInfo to accept and use the context from the caller when performing
the tls.Dial handshake so cancellation can stop a stalled probe, and explicitly
check the cert pool append result after reading caPath so invalid CA PEM fails
fast with an error.
- Around line 773-825: The UID verification in the recreated secret check is
still matching by slice position from the jsonpath output, so the comparison can
be wrong if the order changes. Update the logic in the secret polling block that
uses `uidsOld`, `uidsNew`, and `wait.PollUntilContextTimeout` to fetch and
compare each secret’s `metadata.uid` keyed by `metadata.name` instead of relying
on array order. Keep the existing secret names
`encryption-config-openshift-apiserver` and
`encryption-config-openshift-kube-apiserver`, but parse the name/UID pairs and
compare them by name before deciding the secrets were recreated.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 5bd7a7a2-46e2-40e2-81cd-4d3a78b4c6b3
📒 Files selected for processing (1)
test/extended/apiserver/tls.go
Extract helpers, fix security, prevent panics
f28c1e4 to
dcaf11e
Compare
|
/test verify |
|
@YamunadeviShanmugam: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Results_apiserver_migration.txt
Migrate tests to OTE for the following tests
Note for the audit tests added:
The added audit tests verify server-side validation using --dry-run=server, which exercises the complete API server admission chain including CRD schemas, validating webhooks, CEL rules, and built-in validation logic to catch integration issues like misconfigured webhooks, incorrect CRD deployments, or schema bugs in production by testing the deployed API contract rather than code implementation, providing essential regression protection for multi-component validation logic spread across openshift/api, operators, and admission controllers.
Commands :
TLS
./openshift-tests run-test "[sig-api-machinery] [Jira:apiserver-auth] Operators / Certs [OTP] should have valid TLS certificates for authentication and encryption between API server components [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]"
./openshift-tests run-test "[sig-api-machinery] [Jira:apiserver-auth] Operators / Certs [OTP] should support adding a custom TLS certificate for the cluster API [Disruptive][Slow][apigroup:config.openshift.io] [Serial]"
./openshift-tests run-test "[sig-api-machinery] [Jira:apiserver-auth] Operators / Certs [OTP] should force etcd encryption key rotation and verify resources are re-encrypted [Disruptive][Slow][apigroup:config.openshift.io] [Serial]"
./openshift-tests run-test "[sig-api-machinery] [Jira:apiserver-auth] Operators / Certs [OTP] should self-recover when etcd encryption configuration secrets are deleted [Disruptive][Slow][apigroup:config.openshift.io] [Serial]"
Prometheus
./openshift-tests run-test "[sig-instrumentation] [Jira:apiserver-auth] Prometheus Alerts [OTP] should have ExtremelyHighIndividualControlPlaneCPU alert rule defined with required annotations [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]"
./openshift-tests run-test "[sig-instrumentation] [Jira:apiserver-auth] Prometheus Alerts [OTP] should have KubeAPIErrorBudgetBurn alert rule defined with a severity label [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]"
./openshift-tests run-test "[sig-instrumentation] [Jira:apiserver-auth] Prometheus Alerts [OTP] should have AuditLogError alert rule defined and referencing audit metrics [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]"
./openshift-tests run-test "[sig-instrumentation] [Jira:apiserver-auth] Prometheus Alerts [OTP] should have KubeAggregatedAPIErrors alert rule defined with severity label [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]"
./openshift-tests run-test "[sig-instrumentation] [Jira:apiserver-auth] Prometheus Alerts [OTP] should have KubeAPIDown alert rule defined and not currently firing [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]"
Audit
./openshift-tests run-test "[sig-api-machinery] [Jira:apiserver-auth] Audit [OTP] should report the current audit profile as a known valid value [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]"
./openshift-tests run-test "[sig-api-machinery] [Jira:apiserver-auth] Audit [OTP] should reject an invalid audit profile name in the APIServer configuration [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]"
./openshift-tests run-test "[sig-api-machinery] [Jira:apiserver-auth] Audit [OTP] should reject a custom audit rule with an empty group name [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]"
./openshift-tests run-test "[sig-api-machinery] [Jira:apiserver-auth] Audit [OTP] should reject a custom audit rule with an invalid profile name [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]"
./openshift-tests run-test "[sig-api-machinery] [Jira:apiserver-auth] Audit [OTP] should accept all valid audit profile names via server-side dry-run [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]"
./openshift-tests run-test "[sig-api-machinery] [Jira:apiserver-auth] Audit [OTP] should have audit log files present on master nodes [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]"
./openshift-tests run-test "[sig-api-machinery] [Jira:apiserver-auth] Audit [OTP] should write audit log entries in valid JSON format with required fields [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]"
./openshift-tests run-test "[sig-network][Feature:Network Policy Audit logging] when using openshift ovn-kubernetes should ensure acl logs are created and correct [apigroup:project.openshift.io][apigroup:network.openshift.io] [Suite:openshift/conformance/parallel]"
./openshift-tests run-test "[sig-instrumentation] [Jira:apiserver-auth] Prometheus Alerts [OTP] should have AuditLogError alert rule defined and referencing audit metrics [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]"
Summary by CodeRabbit
kubernetes.io/tlscertificates and CA bundle ConfigMaps.