Skip to content

fix(helm): preserve STS serviceName + networkPolicy.egress back-compat#4569

Merged
waleedlatif1 merged 6 commits into
stagingfrom
waleedlatif1/yeosu-v3
May 12, 2026
Merged

fix(helm): preserve STS serviceName + networkPolicy.egress back-compat#4569
waleedlatif1 merged 6 commits into
stagingfrom
waleedlatif1/yeosu-v3

Conversation

@waleedlatif1
Copy link
Copy Markdown
Collaborator

Summary

  • Revert postgres StatefulSet spec.serviceName to <name>-postgresql — the field is immutable, so the rename to -headless would break helm upgrade on every existing install. Headless Service is still rendered alongside.
  • Restore networkPolicy.egress as a list of custom egress rules (back-compat). Cloud-metadata blocking moves to a sibling field networkPolicy.egressExceptCidrs. This prevents the silent-drop of user-supplied egress rules when upgrading.
  • Add upgrade notes to NOTES.txt covering the above plus the ESO apiVersion default flip (v1 → v1beta1) from the previous PR.

Type of Change

Testing

helm lint . --strict clean; helm unittest . 51/51 pass.

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

Greptile flagged two real upgrade-breaking changes vs the prior chart:

1. statefulset-postgresql spec.serviceName flipped from <name>-postgresql
   to <name>-postgresql-headless. spec.serviceName is immutable, so any
   existing install would hit 'Forbidden: updates to statefulset spec ...'
   on helm upgrade. Revert to the original name (the headless Service in
   services.yaml is added alongside, not as a swap).

2. networkPolicy.egress changed from a list to a map ({extraRules, exceptCidrs}),
   silently dropping any custom egress list set by existing users. Restore
   the original list semantics for networkPolicy.egress and move cloud-metadata
   blocking to a sibling top-level field networkPolicy.egressExceptCidrs.

Adds NOTES.txt upgrade-notes entry covering both + the ESO v1→v1beta1 default
flip (functionally a no-op, but worth surfacing).
@vercel
Copy link
Copy Markdown

vercel Bot commented May 12, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
docs Ready Ready Preview, Comment May 12, 2026 9:18pm

Request Review

@cursor
Copy link
Copy Markdown

cursor Bot commented May 12, 2026

PR Summary

Medium Risk
Touches Helm-rendered StatefulSet and NetworkPolicy templates/values; mistakes could break upgrades or unexpectedly restrict/permit egress, though the intent is explicitly back-compat and is covered by updated unit tests.

Overview
Prevents upgrade failures by restoring PostgreSQL (and copilot Postgres) StatefulSet.spec.serviceName to the previously shipped non-headless Service name and documenting the immutability constraint.

Restores backward compatibility for network policy customization by keeping networkPolicy.egress as a list of user-supplied egress rules and moving cloud-metadata blocking to networkPolicy.egressExceptCidrs; updates the template accordingly and adds a dedicated telemetry collector NetworkPolicy when telemetry.enabled=true.

Adds startupProbe support for both Postgres StatefulSets and updates chart docs/NOTES.txt and Helm unit tests to reflect the new/renamed values and upgrade notes.

Reviewed by Cursor Bugbot for commit 32f8ca4. Configure here.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 12, 2026

Greptile Summary

This PR fixes two upgrade-path regressions introduced by #4565: it reverts spec.serviceName on both PostgreSQL StatefulSets to their original (immutable) values to prevent helm upgrade failures on existing installs, and restores networkPolicy.egress as a flat list of custom rules while moving CIDR-exclusion config to the new sibling field networkPolicy.egressExceptCidrs.

  • StatefulSet serviceName revertspec.serviceName is immutable in Kubernetes; both statefulset-postgresql.yaml and statefulset-copilot-postgres.yaml are restored to <name>-postgresql / <name>-copilot-postgresql with an explanatory comment.
  • NetworkPolicy egress back-compatnetworkPolicy.egress is a flat list again (matching pre-improvement(helm): helm chart updates with security, ESO, and docs overhaul #4565 shape), preventing silent drops of user-supplied egress rules; cloud-metadata CIDR blocking moves to networkPolicy.egressExceptCidrs.
  • OTel Collector NetworkPolicy + startupProbe defaults — a new NetworkPolicy for the telemetry collector is added alongside startupProbe defaults for both Postgres instances to handle slow first-boot scenarios.

Confidence Score: 5/5

Safe to merge — all changes are targeted rollbacks of immutable-field renames and value-schema restructuring that would have broken existing installs.

The StatefulSet spec.serviceName revert is a straightforward correctness fix for an immutable Kubernetes field; no data or state is affected. The networkPolicy.egress shape change restores backward compatibility and is covered by the updated unit tests (51/51 pass). The only open item is a documentation gap in NOTES.txt for users who briefly ran the intermediate #4565 chart with customized egress.exceptCidrs — a narrow window with minimal real-world exposure given that #4565 targeted staging.

helm/sim/templates/NOTES.txt — the upgrade note could more explicitly guide users who customised networkPolicy.egress.exceptCidrs or networkPolicy.egress.extraRules on the short-lived intermediate chart.

Important Files Changed

Filename Overview
helm/sim/templates/networkpolicy.yaml Fixes egress field paths (extraRules→egress, exceptCidrs→egressExceptCidrs) and adds an OTel Collector NetworkPolicy; logic is correct and matches updated values schema.
helm/sim/templates/statefulset-postgresql.yaml Reverts spec.serviceName to the original immutable value and adds conditional startupProbe support; fix is correct.
helm/sim/templates/statefulset-copilot-postgres.yaml Same serviceName immutability fix as the main postgresql StatefulSet, plus startupProbe support.
helm/sim/values.yaml Restructures networkPolicy egress to a flat list with sibling egressExceptCidrs; adds startupProbe defaults for both postgresql instances.
helm/sim/tests/networkpolicy_test.yaml Replaces the combined extraRules test with split app/realtime assertions and adds the new telemetry collector render test; coverage is now adequate.
helm/sim/templates/NOTES.txt Adds upgrade notes for ESO apiVersion flip and egress restructure, but misses explicit migration guidance for the egress.exceptCidrs/extraRules rename from the intermediate #4565 chart.
helm/sim/README.md Updates production checklist to reference egressExceptCidrs and clarify that egress is now a list.
helm/sim/.claude/skills/sim-helm/references/values-model.md Minor documentation consolidation; no functional impact.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    UP[helm upgrade] --> STS_CHECK{spec.serviceName\nimmutable check}
    STS_CHECK -- "was -headless\n(broke upgrade)" --> FAIL[❌ Forbidden error]
    STS_CHECK -- "reverted to -postgresql\n(this PR)" --> OK[✅ Upgrade succeeds]

    subgraph networkPolicy.egress shape
        OLD_MAP["#4565: egress:\n  exceptCidrs: [...]\n  extraRules: [...]"] -- "silent drop\nof user rules" --> BROKEN[❌ user egress lost]
        NEW_FLAT["This PR: egress: []\negressExceptCidrs: [...]"] --> COMPAT[✅ back-compat\nwith pre-1.0 shape]
    end
Loading

Reviews (2): Last reviewed commit: "docs(helm-skill): trim narrative bloat i..." | Re-trigger Greptile

…bility issue)

Audit caught that the main fix in d5c2e8e missed statefulset-copilot-postgres.yaml,
which had the identical immutable-field rename from -copilot-postgresql to
-copilot-postgresql-headless. Same upgrade-break vector for anyone running
copilot.enabled=true on a prior chart version. Mirrors the fix and comment
from the main postgresql STS.
- add startupProbe defaults for both postgresql + copilot-postgresql STSs
  to shield liveness from slow first-boot (pgvector init, WAL replay)
- render a dedicated NetworkPolicy for the otel-collector when
  telemetry.enabled=true (OTLP ingress from app/realtime/copilot, DNS +
  HTTPS egress for forwarding to external observability backends)
- document why copilot + copilot-postgresql intentionally do NOT ship
  dedicated NetworkPolicies (Redis URL is unknowable at render time)
- regression test pins the otel-collector NP at documentIndex 3
The prior test claimed coverage of both app and realtime NPs but only
asserted documentIndex 0. Split into two tests so a regression that drops
custom egress from realtime would fail loudly.
Cut the historical 'Layer 2 was added in chart 1.0.0' note and the
generic 'single source of truth' framing. Kept the two actionable
points: ESO requires mapping Layer 1 keys; app.env overrides
envDefaults.
@waleedlatif1
Copy link
Copy Markdown
Collaborator Author

@greptile

@waleedlatif1
Copy link
Copy Markdown
Collaborator Author

@cursor review

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit 32f8ca4. Configure here.

@waleedlatif1 waleedlatif1 merged commit d1eb79e into staging May 12, 2026
14 checks passed
@waleedlatif1 waleedlatif1 deleted the waleedlatif1/yeosu-v3 branch May 12, 2026 21:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant