feat(noderesource): lock seid invocation on HOME env var#234
Merged
Conversation
The controller now injects a canonical `seid start --home $(HOME)` invocation across every seid-running container (main, init, bootstrap init). HOME is set on container.env to dataDir; the K8s VariableReference $(HOME) resolves at pod-create. Shell-context seid-init and the sidecar wait wrapper use "$HOME" expansion against the same env var. This removes the silent gap that started failing nodes after #233: distroless base images don't carry an /etc/passwd entry for uid 65532, so HOME is unset for the seid process, and `~/.sei` resolves to `/.sei` on a read-only root filesystem -> ENOENT/EACCES on any subcommand that touches the home dir (e.g. eth_accounts via mkdir /.sei). Tying all four codepaths to the same env var also unblocks the planned move to the conventional Cosmos SDK layout (~/.sei) in a follow-up: flipping dataDir there will Just Work because every container references HOME, not a literal path. spec.Entrypoint is now silently ignored and marked Deprecated. It was the only way to override seid args before; with HOME-based resolution the override is no longer needed (and dangerous, since it bypasses the canonical invocation). Field stays in v1alpha1 for backwards-compat; slated for removal in v1alpha2. Rollout: this is a pod-spec change. After merge + image bump, harbor controller will roll new SeiNodes through the existing NodeUpdate plan (apply-statefulset -> replace-pod -> observe-image -> mark-ready); existing chains keep their PVCs and dataDir layout untouched. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
You have used all Bugbot PR reviews included in your free trial for your GitHub account on this workspace. To continue using Bugbot reviews, enable Bugbot for your team in the Cursor dashboard. |
5 tasks
bdchatham
added a commit
that referenced
this pull request
May 13, 2026
Default image used by clusters that consume `config/default?ref=main` without an overlay-level image pin. This bump propagates to dev and prod on the next Flux reconcile after merge. Harbor is unaffected — it has its own overlay-level pin in the platform repo (already at 0d3a156 via platform#521). Cumulative changes since the previous pin (e99c04c, PR #216): - #220 — keyring containment + non-root sidecar + ReadOnlyRootFilesystem - #233 — non-root seid (uid 65532) + kubelet fsGroup migration - #234 — HOME env var declared on seid containers The controller bump itself is passive: existing dev/prod pods don't restart until each chain's spec.image is bumped to trigger a NodeUpdate plan. The kubelet fsGroup migration from #233 only fires when a pod restarts and the pod template includes fsGroup with FSGroupChangePolicy=OnRootMismatch — chown'ing the PVC root tree at mount time. On fresh harbor PVCs this is instant; on existing multi-TB prod archive PVCs it could take longer, so per-chain image bumps on prod should be sequenced with that latency in mind. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bdchatham
added a commit
that referenced
this pull request
May 14, 2026
Same cleanup shape as sei-protocol/platform#522 applied to the fleet manifests, now for the in-tree samples. Six sample files; all had `spec.entrypoint: { command: ["seid"], args: ["start", "--home", "/sei"] }` which the controller silently ignores as of #234 — the canonical invocation is injected unconditionally. Without this cleanup, copy-paste users would land manifests that: - look functionally meaningful but are silently ignored, and - pin the literal `/sei` arg value, which post-flip is misleading (the data dir is `/.sei`). One sample (`pacific-1-shadow-replayer.yaml`) carried an extra arg — `--skip-app-hash-validation` — that was the only real functional override in this set. #234 removed the surface to pass extra seid args via the CRD; tracked as a follow-up issue for cases that legitimately need non-default args (shadow replayer being the candidate use case). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced May 14, 2026
bdchatham
added a commit
that referenced
this pull request
May 14, 2026
* feat(platform): flip dataDir to conventional /.sei home layout
`platform.DataDir` moves from `/sei` to `/.sei` — the conventional
Cosmos SDK home layout (`~/.sei`). Single source of truth in
`internal/platform/platform.go`; both consumers (`noderesource` for
the StatefulSet pod template, `task/bootstrap_resources` for the
ceremony pod) route through it, so this is a one-line flip plus
test-literal fixups.
What propagates from the flip:
- seid main container: `mountPath`, `HOME` env, and `--home $(HOME)`
K8s VariableReference all become `/.sei`. Same value resolves via
three layers (mount, env, args) automatically.
- sidecar container: `mountPath` and `SEI_HOME` env both become
`/.sei`. `seictl serve` reads `SEI_HOME` via its `--home` flag's
env source (`seictl/main.go:54-66`), so all task handlers operate
against `/.sei` with no coordinated sidecar change required —
verified via cross-repo audit.
- bootstrap container path: same shape.
Tests updated:
- `noderesource_test.go:1119` — substring assertion was
`NotTo ContainSubstring("/sei")`. `/.sei` contains `/sei` as a
substring, so the assertion would falsely match post-flip. Routed
through the `dataDir` constant so it auto-tracks any future flip.
- Two cosmetic literals (`keyringTestMountPath`, an assertion
message) updated for hygiene.
PVC semantics: the mount path is a kernel-side façade. Existing PVC
content lives at the volume root inode, not under `/sei`. Pods rolling
to the new template see the same `config/`, `data/`, `keyring-file/`
subtree, just at `/.sei` instead of `/sei`. No data movement.
Round-trip safe: reverting this constant + re-rolling pods restores
the old mountpath against the same PVC contents.
Rollout: this is a pod-spec change with no image change.
`buildRunningPlan` only triggers NodeUpdate plans on `spec.image` drift
(per `internal/planner/planner.go:708`), so existing SeiNodes stay on
their current template until each chain's `spec.image` is bumped.
That's the same propagation mechanism PR 1 (#234) relied on.
#235 tracks extending drift detection
to pod-template-hash, which would let this flip propagate automatically
to the fleet.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(samples): strip deprecated spec.entrypoint blocks
Same cleanup shape as sei-protocol/platform#522 applied to the fleet
manifests, now for the in-tree samples. Six sample files; all had
`spec.entrypoint: { command: ["seid"], args: ["start", "--home", "/sei"] }`
which the controller silently ignores as of #234 — the canonical
invocation is injected unconditionally.
Without this cleanup, copy-paste users would land manifests that:
- look functionally meaningful but are silently ignored, and
- pin the literal `/sei` arg value, which post-flip is misleading
(the data dir is `/.sei`).
One sample (`pacific-1-shadow-replayer.yaml`) carried an extra arg —
`--skip-app-hash-validation` — that was the only real functional
override in this set. #234 removed the surface to pass extra seid args
via the CRD; tracked as a follow-up issue for cases that legitimately
need non-default args (shadow replayer being the candidate use case).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Single source of truth for the seid data directory across every seid-running container:
container.envdeclaresHOME=dataDir(andTMPDIR=dataDir/tmp)container.argsuse$(HOME)(K8s VariableReference) for the canonicalseid start --home $(HOME)seid-init,sidecarWaitCommand) use"$HOME"(shell expansion) against the same env varspec.Entrypointis now silently ignored and marked Deprecated; the canonical invocation is injected unconditionally. The field stays in v1alpha1 for backwards compat — manifests don't need updates today — and is slated for removal in v1alpha2.The failure mode this fixes
Smoke-testing #233 on harbor surfaced
mkdir /.sei: permission deniedfrometh_accountsand a handful of other seid subcommands. Root cause: distroless base images don't carry an/etc/passwdentry for uid 65532, soHOMEis unset for the seid process, and Cosmos SDK's~/.seidefault resolves to literal/.sei— on a read-only root filesystem.Setting
HOMEat the container level closes that gap: every subcommand that consultsos.UserHomeDir()/~/.seinow seesdataDir, the same path--homeis pointing at.Why $(HOME) (with parens, not braces)
K8s defines two unrelated substitution mechanisms:
container.args[],container.command[],env.value\$(VAR_NAME)sh -c "<script>"orbash -c "<script>"\$VARor\${VAR}The kubelet form uses parens to avoid collision with shell
\$VAR— args[] entries are passed directly viaexecvewith no shell in the loop. Ref: https://kubernetes.io/docs/tasks/inject-data-application/define-environment-variable-container/#use-environment-variables-to-define-arguments.Both codepaths in this PR ultimately resolve from the same
HOMEenv var; only the substitution layer differs.What this unlocks
The 3-PR sequence we're working toward:
HOME, not a literal/sei. Behavior unchanged (still resolves to/sei).spec.entrypointfrom any manifest that still sets it (no-op since it's now ignored, but cleans up state).platform.DataDirfrom/seito the conventional Cosmos SDK layout (~/.sei//.sei). Single-line change; every container picks it up viaHOME. Blocked on Trigger NodeUpdate plan on pod template hash drift (not just spec.image) #235 (pod template hash drift detection).Rollout — corrected
This is a pod-spec change with no image change. Two semantics matter:
buildRunningPlan(internal/planner/planner.go:708) builds aNodeUpdateplan only whenspec.image != status.currentImage. Env/args changes alone do not trigger a plan.OnDeleteupdate strategy (internal/noderesource/noderesource.go:204), so even if the StatefulSet template is patched, existing pods don't roll.Net effect on existing chains:
HOME=/rootis set by the kernel for uid 0,~/.seiresolves to/root/.sei, everything works.mkdir /.sei: permission deniedbug. The new pod template includes the fix, but existing pods need a manual roll (kubectl delete podorkubectl rollout restart sts) for the operator to pick it up. Same operator action as feat: SeiNode validator.operatorKeyring CRD surface (closes #219) #220 and feat(noderesource): non-root seid + seid-init via kubelet fsGroup migration #233 required.Filed as #235 for the long-term fix: trigger NodeUpdate plans on pod template hash drift, not just
spec.imagedrift. PR 3 is blocked on #235.Cross-review (Coral trio)
spec.Entrypointarbitrary-binary primitive). Two follow-ups: Validate SeiNode.spec.chainId with a Pattern regex (shell injection sink in seid-init) #236 (ChainID injection — pre-existing, not widened), Audit snapshot uploader scope: dotfiles under dataDir now writable via HOME #237 (snapshot uploader scope — audit what tars from dataDir now that HOME points there).All three follow-ups are non-blocking for this PR.
Test plan
go build ./...cleango vet ./...cleangofmt -l .cleango test ./...greengo test -race ./...greengolangci-lint run --new-from-rev=origin/maincleanmake manifests generate— CRD doc updates reflect the deprecationeth_accountsresolves on a freshly-rolled SeiNode, nopermission deniedregressions — verified indirectly viaHOME=/.seienv on rolled pods + zeropermission deniederrors in seid logs + chain producing blocks. Literaleth_accountscurl wasn't executed (seid bound port 8545 to a non-loopback address; port-forward returned connection refused). Fix is confirmed working in practice.Follow-up issues filed
SeiNode.spec.chainIdwith a Pattern regex (pre-existing injection sink)🤖 Generated with Claude Code