feat: replayer result export, sei-config consolidation, and API cleanup #9

Merged
bdchatham merged 3 commits into main from feat/controller-quality-cleanup on Mar 17, 2026

bdchatham commented Mar 17, 2026

Summary

  • Add ResultExport to ReplayerSpec (exclusive to replayer, not full nodes) for periodic block result export to S3
  • Remove StartHeight from S3SnapshotSource — users provide an explicit S3 URI for a specific snapshot; height is auto-discovered from the archive key name
  • Replace local wellKnownSnapshotChains map with seiconfig.KnownChain() and local port constants with seiconfig.PortRPC, seiconfig.PortSidecar, seiconfig.NodePorts()
  • Use explicit task status from sidecar (running/completed/failed) instead of inferring from CompletedAt
  • Update sample manifest to pacific-1-shadow-replayer.yaml with inferred S3 bucket
  • Bump sei-config to v0.0.7, seictl to v0.0.15
  • Remove stale conditions.go and old pacific-1-replay.yaml sample

Test plan

  • Replayer validation tests (well-known chain, custom chain requires URI, peer requirements)
  • BuildTask tests (inferred bucket, explicit URI, discover peers, result export)
  • Result export task builder tests (replayer-only, nil when absent)
  • Resource generation tests updated for sei-config port constants
  • Plan execution tests updated for explicit task status model
  • CI lint and test pass

- Add ResultExport to ReplayerSpec (not FullNodeSpec) enabling periodic
  block result export to S3 for replay workloads
- Remove StartHeight from S3SnapshotSource; users specify an explicit
  S3 URI for a specific snapshot, height is auto-discovered from the key
- Replace local wellKnownSnapshotChains map with seiconfig.KnownChain()
- Replace local nodePorts/defaultSidecarPort with seiconfig port constants
- Update sample manifest to pacific-1-shadow-replayer with inferred bucket
- Bump sei-config to v0.0.7 and seictl to v0.0.15
- Remove stale conditions.go and pacific-1-replay.yaml

Made-with: Cursor
bdchatham force-pushed the feat/controller-quality-cleanup branch from 1340904 to 729ecab on March 17, 2026 20:34
defaultSnapshotUploadCron = "0 0 * * *"
defaultSnapshotInterval = int64(2000)

resultExportBucket = "sei-node-mvp"
Review comment from bdchatham (Collaborator, Author):

These will be updated to a real bucket once MVP is complete.

bdchatham marked this pull request as ready for review March 17, 2026 20:54
bdchatham merged commit 72402e9 into main Mar 17, 2026
2 checks passed
bdchatham deleted the feat/controller-quality-cleanup branch March 19, 2026 21:00
bdchatham added a commit that referenced this pull request May 19, 2026
…ind race) (#295)

Live manual run on harbor surfaced this: `seictl nd watch --until=Ready`
returns when SeiNode pods report Running (status.phase=Running, all
plan tasks Complete) — but seid's Tendermint RPC server takes a few
more seconds AFTER that to actually bind port 26657. The
compute-target-height bash step's single-shot curl loses that race:

  curl: (7) Failed to connect to <snd>-internal.nightly.svc:26657 after
  10 ms: Could not connect to server
  failed to parse latest_block_height from .../status

Manual curl from a fresh pod 90s later returns HTTP 200 with height
286 — the chain IS up, just not at the instant `nd watch` returned.

Wrap the curl in a 30-attempt retry loop with 3s sleep (90s window)
and a 3s --connect-timeout. Matches the retry pattern
resolve-proposal-id already uses (different step, same shape:
chain-side query that needs to tolerate a brief warmup).

Symptom chain on the live run: compute-target-height exits 1 →
workflow-vars ConfigMap not created → downstream submit-upgrade-proposal
seitask-runner pod stuck in CreateContainerConfigError because its
envFrom configMapRef can't resolve. Single fix at the source resolves
the whole cascade.

Bug #9 in the major-upgrade-runs-end-to-end debugging chain. Same
shape as several earlier ones: an assumption about timing/readiness
that doesn't survive contact with the cluster.