Skip to content

fix(snd): propagate Spec.Peers from SND template to existing child SeiNodes#371

Merged
bdchatham merged 1 commit into
mainfrom
fix/snd-peers-propagation
May 30, 2026
Merged

fix(snd): propagate Spec.Peers from SND template to existing child SeiNodes#371
bdchatham merged 1 commit into
mainfrom
fix/snd-peers-propagation

Conversation

@bdchatham
Copy link
Copy Markdown
Collaborator

@bdchatham bdchatham commented May 29, 2026

Summary

ensureSeiNode only propagated Spec.Peers from SND.spec.template.spec on the create-path (via generateSeiNode's deep copy). The update-path used an allow-list of fields to diff and re-apply — Labels, Annotations, Spec.Image, Spec.Sidecar, Spec.PodLabels, Spec.ExternalAddress — and Spec.Peers was silently missing.

This contradicted the existing templateHash godoc in labels.go:85-97, which explicitly states peers "propagate in-place via ensureSeiNode" and so are excluded from the hash. Adding a peer source to a live SND (e.g. platform#759 adding a LabelPeerSource to existing Harbor syncers) appeared to work at the SND CR but never reached the child SeiNodes — their spec.peers stayed frozen at child-creation time, the resolver kept re-emitting the stale list, and Status.ResolvedPeers never picked up new sources.

Fix

Add a Spec.Peers diff/update branch between PodLabels and ExternalAddress in ensureSeiNode, using equality.Semantic.DeepEqual (so etcd nil/empty round-trips don't churn the diff):

// Peers are intentionally in-place propagatable (templateHash godoc
// in labels.go). Semantic.DeepEqual treats nil and empty maps/slices
// as equal — etcd round-trips normalize empties to nil, so reflect
// would spuriously diff after the first reconcile post-restart.
if !equality.Semantic.DeepEqual(existing.Spec.Peers, desired.Spec.Peers) {
    existing.Spec.Peers = desired.Spec.Peers
    updated = true
}

Coral cross-review

Both kubernetes-specialist and platform-engineer voted GO independently. Validated load-bearing concern: does this fix open a deployment-trigger feedback loop via live resolver fluctuations?

  • detectDeploymentNeeded (nodes.go:108) gates on templateHash, which hashes only ChainID + Image + Sidecar.Image. Peers are excluded by design.
  • childPhaseChangedPredicate (predicates.go:18-38) filters out status-only updates — pure Status.ResolvedPeers churn on a child does NOT enqueue the SND.
  • The Update() only fires when SND.spec.peers != child.spec.peers — i.e., a human/Flux edit, not a resolver fluctuation.

No loop. Fix completes the contract the templateHash godoc already promised.

Test coverage

Unit (nodes_test.go):

  • TestEnsureSeiNode_PropagatesPeersOnUpdate — adds a LabelPeerSource to a live SND, asserts the child carries 2 entries.
  • TestEnsureSeiNode_PropagatesPeerRemoval — shrinks the source list, asserts the child shrinks too.
  • TestEnsureSeiNode_PeersNoOpWhenUnchanged — identical template across reconciles must not bump resourceVersion.

Envtest (envtest/peers_propagation_test.go):

  • TestPeersPropagation_AddSourceOnLiveSND — live patch against a real apiserver.
  • TestPeersPropagation_RemoveSourceOnLiveSND — shrink case end-to-end.

Fixture: fixtures.WithPeers(...) added to envtest fixtures package.

Genesis interaction (analyzed, not a concern)

task/genesis_peers.go:92 writes child.Spec.Peers = [{Static{cohort}}] during the genesis ceremony. After this fix, the post-ceremony SND reconcile would revert that to the SND template's peers value.

Not a concern because:

  1. ensureSeiNode is gated by !ConditionPlanInProgress, so my new branch doesn't fire during the active genesis plan.
  2. The static cohort was bootstrap scaffolding, not a contract. Once the plan completes, the chain is live and CometBFT's runtime peer machinery (PEX, addrbook) handles membership — the template's normal peer sources take over for ongoing discovery.

The structural genesis pieces (the plan orchestration, S3 artifact distribution, per-validator genesis.json) are independent of Spec.Peers and unaffected by this fix.

Test plan

  • Unit + envtest pass on CI.
  • After merge + image bump + Flux sync on harbor: pacific-1/syncer-0-0 spec.peers shows all three sources (2 ec2Tags + 1 label); Status.ResolvedPeers populates with composed entries.

🤖 Generated with Claude Code

…iNodes

ensureSeiNode previously only deep-copied Spec.Peers on the create-path
(via generateSeiNode). The update-path used an allow-list of fields to
diff and re-apply — Labels, Annotations, Spec.Image, Spec.Sidecar,
Spec.PodLabels, Spec.ExternalAddress — and Peers was silently missing.

This contradicted the templateHash godoc in labels.go, which explicitly
states peers "propagate in-place via ensureSeiNode" and so are excluded
from the hash. Adding a peer source to a live SND (e.g. platform#759
adding a LabelPeerSource to existing Harbor syncers) appeared to work
at the SND CR level but never reached the child SeiNodes — their
spec.peers stayed frozen at child-creation time, the resolver kept
re-emitting the stale list, and Status.ResolvedPeers never picked up
new sources.

Fix: add a Spec.Peers diff/update branch between PodLabels and
ExternalAddress, using equality.Semantic.DeepEqual so etcd nil/empty
round-trips don't churn the diff.

Coral cross-review (kubernetes-specialist + platform-engineer) confirmed
the fix doesn't open a deployment-trigger feedback loop:
- detectDeploymentNeeded gates on templateHash, which excludes peers.
- childPhaseChangedPredicate filters out status-only updates, so live
  Status.ResolvedPeers churn doesn't enqueue the SND.
- The Update() only fires when SND.spec.peers actually differs from the
  child's spec.peers — a human/Flux edit, not a resolver fluctuation.

Tests:
- Unit (nodes_test.go): PropagatesPeersOnUpdate, PropagatesPeerRemoval,
  PeersNoOpWhenUnchanged.
- Envtest (peers_propagation_test.go): AddSourceOnLiveSND,
  RemoveSourceOnLiveSND.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cursor
Copy link
Copy Markdown

cursor Bot commented May 29, 2026

PR Summary

Medium Risk
Changes node reconciliation and peer discovery inputs for live clusters; behavior is scoped and tested but affects how children converge after SND edits.

Overview
ensureSeiNode now syncs Spec.Peers from the SND template to existing child SeiNodes on the update path, using equality.Semantic.DeepEqual so nil/empty slice differences from etcd do not cause pointless updates. That closes a gap where peers were only applied on create, so live template edits (add/remove peer sources) never reached running children.

Coverage adds unit tests for grow, shrink, and no-op reconciles, envtest tests that patch a live SND, and fixtures.WithPeers for SND builders.

Reviewed by Cursor Bugbot for commit d2cc75a. Bugbot is set up for automated code reviews on this repo. Configure here.

@bdchatham bdchatham merged commit bc181db into main May 30, 2026
5 checks passed
@bdchatham bdchatham deleted the fix/snd-peers-propagation branch May 30, 2026 00:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant