Skip to content

fix: remove duplicate metrics service, correct certificate DNS name, and add tolerations docs#44

Merged
ian-flores merged 3 commits intomainfrom
fix-migration-safety
Jan 21, 2026
Merged

fix: remove duplicate metrics service, correct certificate DNS name, and add tolerations docs#44
ian-flores merged 3 commits intomainfrom
fix-migration-safety

Conversation

@ian-flores
Copy link
Collaborator

@ian-flores ian-flores commented Jan 20, 2026

Summary

  • Remove dist/chart/templates/rbac/auth_proxy_service.yaml which duplicated the metrics service
  • Fix incorrect service name in certificate DNS names (team-operator-metrics-serviceteam-operator-controller-manager-metrics-service)
  • Add troubleshooting documentation for pod scheduling failures due to node taints

Why

Two templates were creating the same team-operator-controller-manager-metrics-service, causing Helm conflicts during installation. This was a contributing factor in migration failures.

The certificate DNS name fix ensures TLS validation works correctly when Prometheus scrapes metrics over HTTPS with cert-manager enabled.

The tolerations documentation helps operators diagnose and resolve scheduling failures on clusters with tainted nodes.

Related

Bean: ptd-fvuq

- Remove auth_proxy_service.yaml which duplicated metrics-service.yaml
- Add troubleshooting docs for pod scheduling failures due to taints
@claude
Copy link

claude bot commented Jan 20, 2026

Claude finished @ian-flores's task —— View job


PR Review

  • Read review guidelines
  • Get PR diff
  • Review deleted file
  • Create pending review
  • Review documentation changes
  • Submit review

Review submitted. The PR correctly removes a duplicate metrics service and adds well-structured troubleshooting documentation for tolerations that aligns with the existing Helm chart configuration.

Copy link

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

Deleted file (dist/chart/templates/rbac/auth_proxy_service.yaml): ✅ Correct removal

Verified that this was indeed a duplicate of dist/chart/templates/metrics/metrics-service.yaml. Both created team-operator-controller-manager-metrics-service with identical specs. The only difference was the conditional guard (.Values.rbac.enable vs .Values.metrics.enable). Removing the duplicate resolves the Helm conflict.

Documentation (docs/guides/troubleshooting.md): ✅ Well-structured

The new section follows the existing troubleshooting pattern with clear:

  • Symptoms
  • Diagnosis commands
  • Cause explanation
  • Solution with examples
  • Common patterns table

Verified that the Helm values (values.yaml lines 52-67) and template (manager.yaml lines 68-75) support the documented tolerations and nodeSelector configuration.

Minor observation: In dist/chart/templates/certmanager/certificate.yaml:54, the service name is team-operator-metrics-service while elsewhere it's team-operator-controller-manager-metrics-service. Unrelated to this PR but might be worth checking separately.

Replace hardcoded service name with .Values.controllerManager.serviceAccountName
@ian-flores ian-flores marked this pull request as ready for review January 20, 2026 22:45
The certificate referenced 'team-operator-metrics-service' but the actual
service is named 'team-operator-controller-manager-metrics-service'.
This mismatch would cause TLS validation failures for metrics scraping.
@ian-flores ian-flores changed the title fix: remove duplicate metrics service and add tolerations docs fix: remove duplicate metrics service, correct certificate DNS name, and add tolerations docs Jan 20, 2026
Copy link
Collaborator

@stevenolen stevenolen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@ian-flores ian-flores merged commit 0361fc1 into main Jan 21, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Comments