feat: validator signing-key configuration on SeiNode #136
Merged
Implements docs/design-seinode-validator-signing-key-lld.md (#135). Adds spec.validator.signingKey to SeiNode, enabling migration of an existing external validator identity onto a Kubernetes-managed SeiNode without double-signing risk.

CRD changes:
- New SigningKeySource discriminated union with one v1 variant (Secret). Future TMKMS / RemoteSigner / Vault variants slot in additively under the same XValidation exactly-one rule.
- New SecretSigningKeySource with immutable secretName.
- New ConditionSigningKeyReady status condition; reasons follow the coarse Validated/NotReady/Invalid taxonomy used by ImportPVCReady.

Pod-spec mutation (production StatefulSet only):
- Single Secret volume scoped to priv_validator_key.json (defaultMode 0400, items-filtered).
- subPath mount on the seid main container at /sei/config/priv_validator_key.json. subPath is deliberate: kubelet does not auto-refresh subPath mounts, so a kubectl edit cannot hot-swap the consensus key under a running seid.
- Sidecar container has no signing mount; it has no business reading consensus material.
- Bootstrap-Job pod-spec is untouched; a load-bearing safety comment on task.GenerateBootstrapJob pins the invariant against future refactors.

Pre-flight validation:
- New validate-signing-key task validates the Secret before pod creation (existence, deletionTimestamp, key data present, JSON parse, Tendermint shape). Surfaces failures via the SigningKeyReady condition.
- Inserted into both buildBootstrapPlan (between Phase 0 and Phase 1) and buildBasePlan (between EnsureDataPVC and ApplyStatefulSet) when SigningKey is set.
- Cross-field validation in validatorPlanner.Validate: SigningKey is mutually exclusive with GenesisCeremony.

Other:
- New RBAC marker: secrets get;list;watch on the SeiNode controller.
- Finalizer code comment noting Secrets are externally managed.

priv_validator_state.json is intentionally not injected: CometBFT auto-creates it on first start, and the cutover runbook's "wait M blocks past last-signed height" provides the operational protection against re-orgs at the cutover boundary. Documented in the LLD §11 and §8.

Tests: 8 task unit tests, 4 pod-spec generator tests, 1 bootstrap-Job invariant test, 5 plan-builder tests (Validate + plan ordering), 4 controller integration tests covering happy path, missing-Secret convergence, malformed-Secret terminal failure, and Secret preservation on SeiNode deletion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
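The pre-flight checks on the key data (key data present, JSON parse, Tendermint shape) can be illustrated with a minimal sketch. It assumes only the field names listed in this PR (address, pub_key.{type,value}, priv_key.{type,value}); the function name validateKeyJSON, the sentinel errInvalidKey, and the sample values are hypothetical and not taken from the actual task code. The Secret existence and deletionTimestamp checks are omitted because they need a Kubernetes client.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// typedKey is the {type, value} pair used by both pub_key and priv_key.
type typedKey struct {
	Type  string `json:"type"`
	Value string `json:"value"`
}

// tendermintValidatorKey mirrors the on-disk shape named in the PR.
type tendermintValidatorKey struct {
	Address string   `json:"address"`
	PubKey  typedKey `json:"pub_key"`
	PrivKey typedKey `json:"priv_key"`
}

// errInvalidKey is a hypothetical sentinel marking a malformed key. Retrying
// cannot fix a malformed key, so a caller could treat it as terminal.
var errInvalidKey = errors.New("invalid priv_validator_key.json")

// validateKeyJSON checks that data is present, parses as JSON, and carries
// every field of the Tendermint validator-key shape.
func validateKeyJSON(data []byte) error {
	if len(data) == 0 {
		return fmt.Errorf("%w: key data is empty", errInvalidKey)
	}
	var k tendermintValidatorKey
	if err := json.Unmarshal(data, &k); err != nil {
		return fmt.Errorf("%w: JSON parse: %v", errInvalidKey, err)
	}
	var missing []string
	require := func(name, value string) {
		if value == "" {
			missing = append(missing, name)
		}
	}
	require("address", k.Address)
	require("pub_key.type", k.PubKey.Type)
	require("pub_key.value", k.PubKey.Value)
	require("priv_key.type", k.PrivKey.Type)
	require("priv_key.value", k.PrivKey.Value)
	if len(missing) > 0 {
		return fmt.Errorf("%w: missing fields %v", errInvalidKey, missing)
	}
	return nil
}

func main() {
	// Placeholder values for illustration only, not a real key.
	good := []byte(`{"address":"AB12","pub_key":{"type":"tendermint/PubKeyEd25519","value":"cHVi"},"priv_key":{"type":"tendermint/PrivKeyEd25519","value":"cHJpdg=="}}`)
	fmt.Println(validateKeyJSON(good))
	fmt.Println(validateKeyJSON([]byte(`{"address":"AB12"}`)))
}
```

The sketch checks only that each field is non-empty. It does not verify that the key material is cryptographically valid or that the address matches the public key.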
Apply the no-WHAT, no-breadcrumb comment standard:
- Drop trivial WHAT-comments on self-describing identifiers (privValidatorKeyDataKey, ValidateSigningKeyParams, Execute, signingKeySecretSource, needsValidateSigningKey, validateSigningKeyParams).
- Trim references to PR numbers, runbook phases, and LLD section pointers that belong in the PR description, not the source.
- Keep load-bearing WHY-comments on the bootstrap-Job safety invariant, the subPath-vs-hot-swap rationale, the Terminal/transient error contract, and the externally-managed-Secrets finalizer note.

No behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bdchatham
commented
Apr 28, 2026
PR review:
- Drop the Phase 0.5 inline comment in buildBootstrapPlan (the function's task ordering is self-evident).
- Replace the safety-invariant prose comment on GenerateBootstrapJob with a runtime guard, assertNoSigningKeyOnBootstrapPod, that fails the Job generation if the bootstrap pod-spec ever references the validator's signing-key Secret. The existing test covering this invariant remains.
- Move tendermintValidatorKey to the top of validate_signing_key.go and add a serdes round-trip test against the validKeyJSON fixture, locking the on-disk Tendermint shape against future cosmos-sdk changes.

Scope:
- v1 supports SigningKey set from SeiNode creation only. Mid-life patching of SigningKey onto a Running validator is a no-op today because buildRunningPlan only detects image drift; the LLD §8 cutover-via-patch flow assumed otherwise.
- LLD §8 rewritten as single-shot deployment: stop EC2 → wait M blocks past last-signed → create Secret → apply SeiNode with bootstrapImage + signingKey → seid syncs to tip and starts signing.
- LLD §11 adds a "mid-life SigningKey patch (drift detection)" deferred entry with the implementation sketch.
- .tide/validator-migration.md runbook rewritten to match: single Execution sequence, no Phase 1 pre-sync, no priv_validator_state.json transfer, slashing protection via M-block wait.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
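As a rough illustration of the runtime guard described in this commit, the sketch below rejects any bootstrap pod-spec whose volumes reference the signing-key Secret. The guard name comes from the commit message, but its signature here is an assumption. The podSpec, volume, and secretVolumeSource types are simplified stand-ins for the Kubernetes API types, and "validator-key" is a hypothetical Secret name. The real guard may also inspect other reference paths, such as projected volumes or environment sources.

```go
package main

import "fmt"

// Simplified stand-ins for the Kubernetes pod-spec types (assumption: the
// real guard operates on the corev1 equivalents).
type secretVolumeSource struct{ SecretName string }

type volume struct {
	Name   string
	Secret *secretVolumeSource // nil for non-Secret volumes
}

type podSpec struct{ Volumes []volume }

// assertNoSigningKeyOnBootstrapPod returns an error if any volume in the
// bootstrap pod-spec references the validator's signing-key Secret.
func assertNoSigningKeyOnBootstrapPod(spec podSpec, signingKeySecret string) error {
	if signingKeySecret == "" {
		return nil // no signing key configured, nothing to guard
	}
	for _, v := range spec.Volumes {
		if v.Secret != nil && v.Secret.SecretName == signingKeySecret {
			return fmt.Errorf("bootstrap pod volume %q references signing-key Secret %q",
				v.Name, signingKeySecret)
		}
	}
	return nil
}

func main() {
	clean := podSpec{Volumes: []volume{{Name: "data"}}}
	leaky := podSpec{Volumes: []volume{
		{Name: "data"},
		{Name: "signing-key", Secret: &secretVolumeSource{SecretName: "validator-key"}},
	}}
	fmt.Println(assertNoSigningKeyOnBootstrapPod(clean, "validator-key"))
	fmt.Println(assertNoSigningKeyOnBootstrapPod(leaky, "validator-key"))
}
```

Failing Job generation at this point means a refactor that accidentally shares volume helpers between the bootstrap Job and the production StatefulSet surfaces as an error rather than as a bootstrap pod that could sign.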
Summary
Migrate Sei validators from EC2 to Kubernetes without double-signing. This PR adds `spec.validator.signingKey.secret.secretName` to the `SeiNode` CRD so an operator can deploy a K8s validator that takes over signing from a stopped EC2 instance using its existing consensus identity.

Implements `docs/design-seinode-validator-signing-key-lld.md` (PR #135). Direction doc: `.tide/validator-migration.md`.

Scope

v1 supports single-shot deployment: deploy a `SeiNode` with `validator.signingKey.secret.secretName` set from creation. The bootstrap Job runs without the Secret (a load-bearing safety property: bootstrap pods are physically incapable of signing); the production StatefulSet starts with the Secret mounted, and seid catches up to chain tip and starts signing.

Mid-life patching is NOT supported: adding `signingKey` to a Running validator is a no-op today because `buildRunningPlan` only detects image drift. Documented in LLD §11 with the implementation sketch for the deferred zero-downtime variant.

What's in this PR
CRD additions (`api/v1alpha1/`)
- `SigningKeySource` discriminated union with one v1 variant `Secret *SecretSigningKeySource`. Future TMKMS / RemoteSigner / Vault variants slot in additively under the same exactly-one CEL rule.
- `secretName` immutable (`self == oldSelf`).
- `ConditionSigningKeyReady` + `Validated`/`NotReady`/`Invalid` reason taxonomy (mirrors `ImportPVCReady`).

Pod-spec mutation (`internal/noderesource/`, production StatefulSet only)
- Single Secret volume scoped to `priv_validator_key.json` (defaultMode 0400, items-filtered).
- `subPath` mount on the seid main container at `/sei/config/priv_validator_key.json`. `subPath` is the safety property: kubelet does not auto-refresh `subPath` mounts, so a `kubectl edit secret` cannot hot-swap the consensus key under a running seid.

Bootstrap-Job safety invariant (`internal/task/bootstrap_resources.go`)
- `assertNoSigningKeyOnBootstrapPod` runtime guard fails Job generation if the bootstrap pod-spec ever references the validator's signing-key Secret. Belt-and-suspenders against a future refactor that couples the volume helpers; the existing test `TestTaskGenerateBootstrapJob_NeverHasSigningKeyVolume` is the static-analysis layer.

Pre-flight validation (`internal/task/validate_signing_key.go`)
- `validate-signing-key` task: existence, deletionTimestamp, key data present, JSON parse, Tendermint shape (address, pub_key.{type,value}, priv_key.{type,value}). Direct condition mutation from the task (matches the merged `ensure-data-pvc` precedent).
- Inserted into `buildBootstrapPlan` (after `EnsureDataPVC`, before `DeployBootstrapSvc`) and `buildBasePlan` (after `EnsureDataPVC`, before `ApplyStatefulSet`) when SigningKey is set.

Validator planner cross-field validation: `SigningKey` is mutually exclusive with `GenesisCeremony`.

RBAC: `+kubebuilder:rbac:groups="",resources=secrets,verbs=get;list;watch` on the SeiNode reconciler.

Notable scope decisions

`priv_validator_state.json` is not injected. CometBFT auto-creates it on first start. The runbook's "wait M blocks past last-signed height before deploying" provides the slashing-protection envelope.

What's deferred (LLD §11)

Mid-life SigningKey patch (drift detection); TMKMS / Horcrux / RemoteSigner variants; automated cutover orchestration; double-sign detection; sentry-node topology; consensus-key rotation; cross-namespace Secret references; HSM integration.
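The plan-ordering and cross-field rules described under "What's in this PR" can be sketched as follows. This is a simplified illustration, not the planner's actual code: validatorSpec, validate, and this string-slice form of buildBasePlan are stand-ins, and the task name "apply-statefulset" is an assumption inferred from ApplyStatefulSet ("ensure-data-pvc" and "validate-signing-key" are named in the PR).

```go
package main

import (
	"errors"
	"fmt"
)

// validatorSpec is a simplified stand-in for the validator section of the
// SeiNode spec (hypothetical shape, for illustration only).
type validatorSpec struct {
	SigningKeySecretName string // stand-in for signingKey.secret.secretName
	GenesisCeremony      bool   // stand-in for a configured genesis ceremony
}

// validate enforces the cross-field rule: SigningKey and GenesisCeremony
// are mutually exclusive.
func validate(v validatorSpec) error {
	if v.SigningKeySecretName != "" && v.GenesisCeremony {
		return errors.New("signingKey and genesisCeremony are mutually exclusive")
	}
	return nil
}

// buildBasePlan returns the ordered task names. validate-signing-key is
// inserted after ensure-data-pvc and before apply-statefulset, and only
// when a signing key is configured.
func buildBasePlan(v validatorSpec) []string {
	plan := []string{"ensure-data-pvc"}
	if v.SigningKeySecretName != "" {
		plan = append(plan, "validate-signing-key")
	}
	return append(plan, "apply-statefulset")
}

func main() {
	withKey := validatorSpec{SigningKeySecretName: "validator-key"}
	fmt.Println(validate(withKey), buildBasePlan(withKey))
	fmt.Println(buildBasePlan(validatorSpec{}))
	fmt.Println(validate(validatorSpec{SigningKeySecretName: "validator-key", GenesisCeremony: true}))
}
```

Placing the validation task before the StatefulSet is applied means a missing or malformed Secret is reported through the SigningKeyReady condition before any production pod is created.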
Test plan
- `make test`: all packages green (`internal/task` 41.9% / `internal/noderesource` 96.2% / `internal/planner` 69.3% / `internal/controller/node` 85.6%)
- `make lint`: 0 issues
- `make manifests generate`: regenerated CRD YAML, deepcopy, role.yaml
- `tendermintValidatorKey` serdes round-trip fixture
- `subPath` mount on seid, sidecar mount absent, regression guard for unset SigningKey
- `assertNoSigningKeyOnBootstrapPod`

Operational dry-run before first cutover

The runbook in `.tide/validator-migration.md` is the long pole. Plan: walk the single-shot deployment against a throwaway arctic-1 testnet identity (lowest-staked validator first), confirm the M-block wait works, and confirm seid auto-creates `priv_validator_state.json` and starts signing past the wait window. Document the downtime envelope before applying to pacific-1.

🤖 Generated with Claude Code