OCPBUGS-85763: Fix metrics-proxy deployment failure due to dots in volume names #8530

Merged
openshift-merge-bot[bot] merged 1 commit into openshift:main from muraee:fix-metrics-proxy-volume-name-dots
May 15, 2026

Conversation

@muraee
Contributor

@muraee muraee commented May 15, 2026

Summary

  • Sanitize Kubernetes volume names derived from Secret/ConfigMap resource names by replacing dots with dashes to comply with RFC 1123 DNS label rules
  • Preserve the original resource name in ConfigMap/Secret source references and mount paths so file path resolution in scrape configs remains correct
  • Add test coverage for the dot-sanitization behavior
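A minimal Go sketch of the fix summarized above. The helper name `sanitizeVolumeName` comes from the review summary, but its body (`strings.ReplaceAll`) is an assumption about how the dots are replaced. `certVolume`, `newCertVolume`, and the `certMountRoot` path are hypothetical stand-ins for the `corev1.Volume`/`corev1.VolumeMount` fields the controller fills in; the real mount root is not shown in this PR.

```go
package main

import (
	"path"
	"strings"
)

// certVolume is a hypothetical, trimmed-down stand-in for the fields the
// controller sets on corev1.Volume and corev1.VolumeMount.
type certVolume struct {
	VolumeName string // Volume.Name and VolumeMount.Name (must be a DNS-1123 label)
	SourceName string // ConfigMap or Secret name backing the volume
	MountPath  string // directory the certificate files appear under
}

// certMountRoot is an assumed base directory, not the controller's real one.
const certMountRoot = "/etc/metrics-proxy/certs"

// sanitizeVolumeName replaces every dot with a dash so a resource name can be
// used as a Kubernetes volume name.
func sanitizeVolumeName(name string) string {
	return strings.ReplaceAll(name, ".", "-")
}

// newCertVolume sanitizes only the volume name. The source reference and the
// mount path keep the original resource name, so file paths referenced by
// scrape configs still resolve.
func newCertVolume(resourceName string) certVolume {
	return certVolume{
		VolumeName: sanitizeVolumeName(resourceName),
		SourceName: resourceName,
		MountPath:  path.Join(certMountRoot, resourceName),
	}
}
```

ConfigMap and Secret names are already DNS-1123 subdomains (lowercase alphanumerics, `-`, and `.`), so swapping dots for dashes is enough to satisfy the label character rules. The sketch does not address the 63-character label limit, which a longer resource name would still exceed.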

Details

The certVolumesFromMonitors function in the metrics-proxy component lists all ServiceMonitors and PodMonitors in the HCP namespace and collects their TLS certificate references to create volumes. It used the resource names directly as Kubernetes volume names, which fails validation when a resource name contains dots (e.g., openshift-service-ca.crt).
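The collection step can be sketched as follows. This is a hypothetical reconstruction: `tlsRef` and `collectCertRefs` are illustrative names, and the `names`/`refs` pair only mirrors the variables visible in the review diff, not the controller's actual code. References are deduplicated by resource name and sorted so the resulting volume list is deterministic; the sorting is an assumption here.

```go
package main

import "sort"

// tlsRef is a hypothetical stand-in for a CA, cert, or key selector found in
// a ServiceMonitor or PodMonitor TLS config.
type tlsRef struct {
	Name        string // ConfigMap or Secret name
	IsConfigMap bool   // false means the reference points at a Secret
}

// collectCertRefs deduplicates references across all monitors by resource
// name and returns the unique names in sorted order plus a lookup map.
func collectCertRefs(monitorRefs [][]tlsRef) ([]string, map[string]tlsRef) {
	refs := map[string]tlsRef{}
	for _, perMonitor := range monitorRefs {
		for _, r := range perMonitor {
			refs[r.Name] = r
		}
	}
	names := make([]string, 0, len(refs))
	for name := range refs {
		names = append(names, name)
	}
	sort.Strings(names)
	return names, refs
}
```

Keying only by name means a Secret and a ConfigMap that share a name would collapse into one entry; an implementation that has to support that case would need to key by kind as well.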

On a 4.22 ROSA HCP cluster with metrics forwarding enabled, the audit-webhook ServiceMonitor references a ConfigMap named openshift-service-ca.crt in its TLS CA config, causing the metrics-proxy Deployment to be rejected by the API server with:

Deployment.apps "metrics-proxy" is invalid: spec.template.spec.volumes[8].name: Invalid value: "openshift-service-ca.crt": must not contain dots
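This rejection can be reproduced with the RFC 1123 label rule that Kubernetes applies to volume names. The sketch uses a stdlib regular expression to stay self-contained. In controller code, `validation.IsDNS1123Label` from `k8s.io/apimachinery/pkg/util/validation` performs the same check and returns a list of error messages instead of a boolean.

```go
package main

import "regexp"

// dns1123Label matches an RFC 1123 label: lowercase alphanumerics or '-',
// starting and ending with an alphanumeric character. Dots are not allowed.
var dns1123Label = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`)

// dns1123LabelMaxLength is the maximum length of a DNS-1123 label.
const dns1123LabelMaxLength = 63

// isValidVolumeName reports whether name satisfies the label rules that the
// API server enforces on spec.template.spec.volumes[*].name.
func isValidVolumeName(name string) bool {
	return len(name) <= dns1123LabelMaxLength && dns1123Label.MatchString(name)
}
```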

Fixes https://issues.redhat.com/browse/OCPBUGS-85763

Test plan

  • Existing unit tests pass (no regressions in volume deduplication, optional flags, mount paths)
  • New test case verifies dots are replaced with dashes in volume/mount names while preserving original names in ConfigMap source and mount paths
  • Manual verification on a 4.22 ROSA HCP cluster with hypershift.openshift.io/enable-metrics-forwarding: "true" annotation — metrics-proxy deployment should be created successfully

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes
    • Fixed the metrics-proxy Deployment being rejected by the API server when a referenced Secret or ConfigMap name contains dots, by sanitizing the derived Kubernetes volume names.

The certVolumesFromMonitors function used Secret/ConfigMap resource
names directly as Kubernetes volume names. Volume names must conform
to RFC 1123 DNS label rules which prohibit dots. When a ServiceMonitor
references a ConfigMap named "openshift-service-ca.crt" in its TLS
config, the resulting volume name is rejected by the API server.

Replace dots with dashes in volume and volumeMount names while
preserving the original resource name in ConfigMap/Secret source
references and mount paths.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@openshift-merge-bot
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels May 15, 2026
@openshift-ci-robot

@muraee: This pull request references Jira Issue OCPBUGS-85763, which is invalid:

  • expected the bug to target either version "5.0." or "openshift-5.0.", but it targets "4.22" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai
Contributor

coderabbitai Bot commented May 15, 2026

📝 Walkthrough


This PR introduces volume name sanitization in the metrics proxy deployment controller to address Kubernetes naming constraints. The change adds a sanitizeVolumeName helper that replaces dots with dashes in volume names derived from Secret/ConfigMap resource names, since dots are invalid in Kubernetes volume identifiers. The sanitized name is used only for the Kubernetes Volume and VolumeMount Name fields, while original resource names are preserved for SecretName/ConfigMap references and mount path subdirectories. A corresponding test case validates that sanitization occurs correctly when resource names contain dots.

🚥 Pre-merge checks | ✅ 10 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 75.00%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Structure And Quality | ❓ Inconclusive | The custom check requires reviewing Ginkgo test code, but the PR contains standard Go tests (t.Run subtests), so the check specification does not apply to this test file. | Clarify the check's scope: review only Ginkgo tests, or standard Go tests as well. The standard Go tests here meet all quality criteria.

✅ Passed checks (10 passed)

Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped because CodeRabbit's high-level summary is enabled.
Title check | ✅ Passed | The title clearly and specifically describes the main change: fixing a metrics-proxy deployment failure caused by dots in volume names, which aligns with the changeset's focus on sanitizing Kubernetes volume names.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names | ✅ Passed | Test names are stable and deterministic. The new test uses static strings with no dynamic values or generated identifiers. All test names follow BDD patterns and clearly express what they validate.
Microshift Test Compatibility | ✅ Passed | This PR adds only standard Go unit tests (not Ginkgo e2e tests). The tests use fake clients and internal helpers, not OpenShift APIs, so the custom check for Ginkgo e2e tests does not apply.
Single Node Openshift (Sno) Test Compatibility | ✅ Passed | The PR adds only standard Go unit tests, not Ginkgo e2e tests. The check applies only to Ginkgo e2e tests, so it does not apply to this PR.
Topology-Aware Scheduling Compatibility | ✅ Passed | Changes are limited to volume name sanitization (dots to dashes for DNS compliance). No scheduling constraints are added for any topology.
Ote Binary Stdout Contract | ✅ Passed | Controller library code, not OTE binaries. No main(), TestMain(), or suite setup, and no process-level stdout writes. fmt.Errorf is used only in error returns; testing.T methods only in test blocks.
Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | The PR does not add any Ginkgo e2e tests. The changes include only standard Go unit tests, with no IPv4 assumptions or external connectivity requirements.


@openshift-ci openshift-ci Bot requested review from clebs and sjenning May 15, 2026 15:17
@openshift-ci openshift-ci Bot added the area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release label May 15, 2026
@openshift-ci
Contributor

openshift-ci Bot commented May 15, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: muraee

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added approved Indicates a PR has been approved by an approver from all required OWNERS files. and removed do-not-merge/needs-area labels May 15, 2026
@muraee
Contributor Author

muraee commented May 15, 2026

/jira refresh

@openshift-ci-robot

@muraee: This pull request references Jira Issue OCPBUGS-85763, which is invalid:

  • expected the bug to target only the "5.0.0" version, but multiple target versions were set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.


Contributor

@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
control-plane-operator/controllers/hostedcontrolplane/v2/metrics_proxy/deployment.go (1)

125-150: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add collision-safe volume name handling to prevent duplicate volumes[].name entries.

The current code calls sanitizeVolumeName() independently on each resource name without checking for collisions. Since distinct names like a.b and a-b both sanitize to a-b, they produce duplicate volume names, causing Deployment validation to fail.

Implement the suggested counter-based collision resolution:

Fix with collision detection
@@
 	var volumes []corev1.Volume
 	var mounts []corev1.VolumeMount
+	usedVolumeNames := map[string]int{}
 
 	for _, name := range names {
 		ref := refs[name]
-		volName := sanitizeVolumeName(name)
+		baseVolumeName := sanitizeVolumeName(name)
+		volName := baseVolumeName
+		if n := usedVolumeNames[baseVolumeName]; n > 0 {
+			volName = fmt.Sprintf("%s-%d", baseVolumeName, n)
+		}
+		usedVolumeNames[baseVolumeName]++
 		vol := corev1.Volume{
 			Name: volName,
 		}

Also add a unit test with both a.b and a-b resource names to verify collision handling works correctly.
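A sketch of collision-safe naming, in Go; the function name `uniqueVolumeNames` is hypothetical. It differs from the suggested diff in one respect: instead of keeping a per-base counter, it tracks every name already handed out and keeps incrementing until it finds a free one. A per-base counter can still collide when an input name already ends in a numeric suffix. For example, with the sorted inputs `a-b`, `a-b-1`, and `a.b`, the counter scheme would assign `a-b-1` twice.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitizeVolumeName replaces dots with dashes (assumed behavior, as above).
func sanitizeVolumeName(name string) string {
	return strings.ReplaceAll(name, ".", "-")
}

// uniqueVolumeNames maps each resource name to a volume name that is unique
// within the returned set. Names are processed in the given order, so callers
// should pass a deterministically ordered slice to get stable results.
func uniqueVolumeNames(resourceNames []string) map[string]string {
	used := map[string]bool{}
	out := make(map[string]string, len(resourceNames))
	for _, name := range resourceNames {
		base := sanitizeVolumeName(name)
		candidate := base
		// Append -1, -2, ... until the candidate is not already taken,
		// including by an input whose own name carries such a suffix.
		for n := 1; used[candidate]; n++ {
			candidate = fmt.Sprintf("%s-%d", base, n)
		}
		used[candidate] = true
		out[name] = candidate
	}
	return out
}
```

The same unique name must then be used for both the `corev1.Volume` and its `corev1.VolumeMount`. A suffix can push a long name past the 63-character label limit, and this sketch does not truncate.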

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@control-plane-operator/controllers/hostedcontrolplane/v2/metrics_proxy/deployment.go`
around lines 125-150: the volume name collision occurs because
sanitizeVolumeName(name) can produce duplicates (e.g., "a.b" and "a-b"); update
the logic around volName creation in the loop that builds volumes/mounts to
detect existing vol names (check the volumes slice or maintain a
map[string]int), and if a sanitized name already exists append a counter suffix
(e.g., "-1", "-2") to produce a unique volName and use that unique name for both
the corev1.Volume and the corresponding corev1.VolumeMount (update where volName
is assigned and where mounts are appended); also add a unit test that constructs
two resources named "a.b" and "a-b" and asserts that the produced volumes have
distinct Name values and mounts point to the unique names.
🧹 Nitpick comments (1)
control-plane-operator/controllers/hostedcontrolplane/v2/metrics_proxy/deployment_test.go (1)

312-342: ⚡ Quick win

Consider using gomega assertions for consistency with project guidelines, but note this requires file-wide refactoring.

Gomega is available (v1.39.1), and project guidelines recommend it for unit test assertions. However, the entire file uses t.Errorf() style (lines 68, 71, 134, 137, 174, 210, 262, 308, etc.), and converting only the new subtest would create inconsistency. If adopting gomega, refactor the entire file rather than just the new test.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@control-plane-operator/controllers/hostedcontrolplane/v2/metrics_proxy/deployment_test.go`
around lines 312-342: the new subtest "When resource names contain dots, it
should sanitize volume names but preserve mount paths" currently uses t.Errorf
assertions; do not introduce Gomega here (that would make this file
inconsistent). Keep the existing t.Errorf-style checks in this subtest (the
calls that inspect volumes[0].Name, mounts[0].Name, mounts[0].MountPath, and
volumes[0].VolumeSource.ConfigMap.Name produced by newServiceMonitorWithTLS,
newCertVolumeTestContext and assertCertVolumeCount), and avoid adding any gomega
imports or Expect/Ω calls; if you want Gomega instead, refactor the entire file
consistently rather than changing only this test.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 81c7a0e8-2fc1-4cd7-841e-cc9604a59bc4

📥 Commits

Reviewing files that changed from the base of the PR and between 10fd799 and d5f9198.

📒 Files selected for processing (2)
  • control-plane-operator/controllers/hostedcontrolplane/v2/metrics_proxy/deployment.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/metrics_proxy/deployment_test.go

@muraee
Contributor Author

muraee commented May 15, 2026

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels May 15, 2026
@openshift-ci-robot

@muraee: This pull request references Jira Issue OCPBUGS-85763, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

@codecov

codecov Bot commented May 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 40.07%. Comparing base (10fd799) to head (d5f9198).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #8530   +/-   ##
=======================================
  Coverage   40.07%   40.07%           
=======================================
  Files         751      751           
  Lines       92863    92866    +3     
=======================================
+ Hits        37215    37218    +3     
  Misses      52956    52956           
  Partials     2692     2692           
Files with missing lines | Coverage Δ
.../hostedcontrolplane/v2/metrics_proxy/deployment.go | 72.18% <100.00%> (+0.64%) ⬆️

Flag | Coverage Δ
cmd-support | 34.31% <ø> (ø)
cpo-hostedcontrolplane | 40.57% <100.00%> (+0.01%) ⬆️
cpo-other | 40.14% <ø> (ø)
hypershift-operator | 50.52% <ø> (ø)
other | 31.54% <ø> (ø)

Flags with carried forward coverage won't be shown.


@joshbranham
Contributor

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label May 15, 2026
@openshift-merge-bot
Contributor

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aks
/test e2e-aws
/test e2e-aws-upgrade-hypershift-operator
/test e2e-azure-self-managed
/test e2e-kubevirt-aws-ovn-reduced
/test e2e-v2-aws
/test e2e-v2-gke

@muraee
Contributor Author

muraee commented May 15, 2026

/verified by unit-test

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label May 15, 2026
@openshift-ci-robot

@muraee: This PR has been marked as verified by unit-test.


@cwbotbot

cwbotbot commented May 15, 2026

Test Results

e2e-aws

e2e-aks

@openshift-merge-bot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 10fd799 and 2 for PR HEAD d5f9198 in total

@joshbranham
Contributor

/retest

@openshift-ci
Contributor

openshift-ci Bot commented May 15, 2026

@muraee: all tests passed!

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot Bot merged commit 7111d4d into openshift:main May 15, 2026
42 checks passed
@openshift-ci-robot

@muraee: Jira Issue Verification Checks: Jira Issue OCPBUGS-85763
✔️ This pull request was pre-merge verified.
✔️ All associated pull requests have merged.
✔️ All associated, merged pull requests were pre-merge verified.

Jira Issue OCPBUGS-85763 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓


@joshbranham
Contributor

/cherry-pick release-4.22

@openshift-cherrypick-robot

@joshbranham: new pull request created: #8534


Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. verified Signifies that the PR passed pre-merge verification criteria
