OCPBUGS-85763: Fix metrics-proxy deployment failure due to dots in volume names#8530
Conversation
The certVolumesFromMonitors function used Secret/ConfigMap resource names directly as Kubernetes volume names. Volume names must conform to RFC 1123 DNS label rules which prohibit dots. When a ServiceMonitor references a ConfigMap named "openshift-service-ca.crt" in its TLS config, the resulting volume name is rejected by the API server. Replace dots with dashes in volume and volumeMount names while preserving the original resource name in ConfigMap/Secret source references and mount paths. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
@muraee: This pull request references Jira Issue OCPBUGS-85763, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
📝 WalkthroughWalkthroughThis PR introduces volume name sanitization in the metrics proxy deployment controller to address Kubernetes naming constraints. The change adds a 🚥 Pre-merge checks | ✅ 10 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (10 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: muraee The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/jira refresh |
|
@muraee: This pull request references Jira Issue OCPBUGS-85763, which is invalid:
Comment DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
control-plane-operator/controllers/hostedcontrolplane/v2/metrics_proxy/deployment.go (1)
125-150:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAdd collision-safe volume name handling to prevent duplicate
volumes[].nameentries.The current code calls
sanitizeVolumeName()independently on each resource name without checking for collisions. Since distinct names likea.banda-bboth sanitize toa-b, they produce duplicate volume names, causing Deployment validation to fail.Implement the suggested counter-based collision resolution:
Fix with collision detection
@@ var volumes []corev1.Volume var mounts []corev1.VolumeMount + usedVolumeNames := map[string]int{} for _, name := range names { ref := refs[name] - volName := sanitizeVolumeName(name) + baseVolumeName := sanitizeVolumeName(name) + volName := baseVolumeName + if n := usedVolumeNames[baseVolumeName]; n > 0 { + volName = fmt.Sprintf("%s-%d", baseVolumeName, n) + } + usedVolumeNames[baseVolumeName]++ vol := corev1.Volume{ Name: volName, }Also add a unit test with both
a.banda-bresource names to verify collision handling works correctly.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@control-plane-operator/controllers/hostedcontrolplane/v2/metrics_proxy/deployment.go` around lines 125 - 150, The volume name collision occurs because sanitizeVolumeName(name) can produce duplicates (e.g., "a.b" and "a-b"); update the logic around volName creation in the loop that builds volumes/mounts to detect existing vol names (check the volumes slice or maintain a map[string]int), and if a sanitized name already exists append a counter suffix (e.g., "-1", "-2") to produce a unique volName and use that unique name for both the corev1.Volume and the corresponding corev1.VolumeMount (update where volName is assigned and where mounts are appended); also add a unit test that constructs two resources named "a.b" and "a-b" and asserts that the produced volumes have distinct Name values and mounts point to the unique names.
🧹 Nitpick comments (1)
control-plane-operator/controllers/hostedcontrolplane/v2/metrics_proxy/deployment_test.go (1)
312-342: ⚡ Quick winConsider using gomega assertions for consistency with project guidelines, but note this requires file-wide refactoring.
Gomega is available (
v1.39.1), and project guidelines recommend it for unit test assertions. However, the entire file usest.Errorf()style (lines 68, 71, 134, 137, 174, 210, 262, 308, etc.), and converting only the new subtest would create inconsistency. If adopting gomega, refactor the entire file rather than just the new test.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@control-plane-operator/controllers/hostedcontrolplane/v2/metrics_proxy/deployment_test.go` around lines 312 - 342, The new subtest "When resource names contain dots, it should sanitize volume names but preserve mount paths" currently uses t.Errorf assertions; do not introduce Gomega here (that would make this file inconsistent). Keep the existing t.Errorf-style checks in this subtest (the calls that inspect volumes[0].Name, mounts[0].Name, mounts[0].MountPath, and volumes[0].VolumeSource.ConfigMap.Name produced by newServiceMonitorWithTLS, newCertVolumeTestContext and assertCertVolumeCount), and avoid adding any gomega imports or Expect/Ω calls; if you want Gomega instead, refactor the entire file consistently rather than changing only this test.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In
`@control-plane-operator/controllers/hostedcontrolplane/v2/metrics_proxy/deployment.go`:
- Around line 125-150: The volume name collision occurs because
sanitizeVolumeName(name) can produce duplicates (e.g., "a.b" and "a-b"); update
the logic around volName creation in the loop that builds volumes/mounts to
detect existing vol names (check the volumes slice or maintain a
map[string]int), and if a sanitized name already exists append a counter suffix
(e.g., "-1", "-2") to produce a unique volName and use that unique name for both
the corev1.Volume and the corresponding corev1.VolumeMount (update where volName
is assigned and where mounts are appended); also add a unit test that constructs
two resources named "a.b" and "a-b" and asserts that the produced volumes have
distinct Name values and mounts point to the unique names.
---
Nitpick comments:
In
`@control-plane-operator/controllers/hostedcontrolplane/v2/metrics_proxy/deployment_test.go`:
- Around line 312-342: The new subtest "When resource names contain dots, it
should sanitize volume names but preserve mount paths" currently uses t.Errorf
assertions; do not introduce Gomega here (that would make this file
inconsistent). Keep the existing t.Errorf-style checks in this subtest (the
calls that inspect volumes[0].Name, mounts[0].Name, mounts[0].MountPath, and
volumes[0].VolumeSource.ConfigMap.Name produced by newServiceMonitorWithTLS,
newCertVolumeTestContext and assertCertVolumeCount), and avoid adding any gomega
imports or Expect/Ω calls; if you want Gomega instead, refactor the entire file
consistently rather than changing only this test.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 81c7a0e8-2fc1-4cd7-841e-cc9604a59bc4
📒 Files selected for processing (2)
control-plane-operator/controllers/hostedcontrolplane/v2/metrics_proxy/deployment.gocontrol-plane-operator/controllers/hostedcontrolplane/v2/metrics_proxy/deployment_test.go
|
/jira refresh |
|
@muraee: This pull request references Jira Issue OCPBUGS-85763, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #8530 +/- ##
=======================================
Coverage 40.07% 40.07%
=======================================
Files 751 751
Lines 92863 92866 +3
=======================================
+ Hits 37215 37218 +3
Misses 52956 52956
Partials 2692 2692
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
|
/lgtm |
|
Scheduling tests matching the |
|
/verified by unit-test |
|
@muraee: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Test Resultse2e-aws
e2e-aks
|
|
/retest |
|
@muraee: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
@muraee: Jira Issue Verification Checks: Jira Issue OCPBUGS-85763 Jira Issue OCPBUGS-85763 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓 DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/cherry-pick release-4.22 |
|
@joshbranham: new pull request created: #8534 DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Summary
Details
The
certVolumesFromMonitorsfunction in the metrics-proxy component lists all ServiceMonitors and PodMonitors in the HCP namespace and collects their TLS certificate references to create volumes. It used the resource names directly as Kubernetes volume names, which fails validation when a resource name contains dots (e.g.,openshift-service-ca.crt).On a 4.22 ROSA HCP cluster with metrics forwarding enabled, the
audit-webhookServiceMonitor references a ConfigMap namedopenshift-service-ca.crtin its TLS CA config, causing the metrics-proxy Deployment to be rejected by the API server with:Fixes https://issues.redhat.com/browse/OCPBUGS-85763
Test plan
hypershift.openshift.io/enable-metrics-forwarding: "true"annotation — metrics-proxy deployment should be created successfully🤖 Generated with Claude Code
Summary by CodeRabbit