FIP: Requirements for Onboarding New Farcaster Validators #272
Replies: 11 comments 2 replies
-
Going to add comments on individual items so each can have its own breakout replies, rather than one large block that has to be quoted and replied to piece by piece.
-
This warrants some clarification: are you saying that a client which accepts messages snapchain rejects is instantly invalid, even if those messages are accepted but never included in snapchain blocks (e.g. they are kept in a separate block structure)?
-
Presently, changes sometimes land on testnet within a day. That is workable when testnet nodes strictly run stock snapchain, but it would not work if the goal is to admit forked clients or from-scratch rewrites. What is the new cadence of testnet releases?
-
Can you clarify what counts as compliant CI for this? GitHub Actions is not free, so if other CI options are sufficient, what are the terms?
-
Appreciate this detail. Is there a required NTP server, or at least one that is considered the "authoritative" source?
-
Unlike the other terms, this one is vague and seems to let opinion serve as an arbitrary basis for rejecting a node operator. Conflicts between people happen: the previous dev call got rather heated, and others found the "why are we still talking about this?" question and the repeated interruptions unprofessional. This requirement needs clarification, a "ceasefire" status, or removal.
-
Genuinely, this is a positive move; I highly approve.
-
What is the duration of the "observation window"?
-
I'll reiterate concerns I've stated previously: the current block time will not survive larger quorums and/or more varied geographies. It's better to rip the bandaid off sooner rather than later.
-
A couple of suggested additions to the acceptance criteria (they make alternative implementations' lives harder, but are a reasonable expectation):
-
Nice
-
Summary
Farcaster consensus is BFT (Malachite) with equal voting power per validator and a >2/3
quorum. On today's ~6-validator shards the fault budget is 1: a single faulty, slow, or
divergent validator uses up the entire margin, and one more failure on the same shard halts it.
Adding a validator is a one-line effective_at edit to validators.toml: cheap to do, expensive
to get wrong.
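The fault-budget arithmetic can be checked directly. A minimal Rust sketch, assuming every validator carries voting power 1 and "quorum" means strictly more than 2/3 of total power; the function names are illustrative, not snapchain's:

```rust
/// Minimum number of equal-weight votes strictly greater than 2/3 of `n`.
/// Integer form of "3 * q > 2 * n": q = floor(2n / 3) + 1.
pub fn quorum(n: u64) -> u64 {
    2 * n / 3 + 1
}

/// How many validators can be offline, slow, or divergent while the
/// remaining ones still reach quorum (n - quorum(n), floored at zero).
pub fn fault_budget(n: u64) -> u64 {
    n.saturating_sub(quorum(n))
}
```

Under these assumptions a 6-validator shard needs 5 votes and tolerates 1 failure. Adding a seventh validator raises the quorum to 5 of 7 and the budget to 2, but a newcomer that is itself faulty spends one of those two; adding a sixth validator to a five-validator shard raises the quorum from 4 to 5 without raising the budget at all.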
The hard case, and the core of this proposal, is a validator running a different codebase
than snapchain (a fork or an independent reimplementation). Such a client must be bit-for-bit
deterministic with the rest of the network or it breaks consensus. This document defines the
requirements an alternative client must meet, how they are verified, and the rollout for
admitting one. Deployment requirements that apply to any new validator (including same-binary
expansion into a new geo/datacenter) are covered in one section near the end.
Motivation
Two things are happening:
see Deployment requirements below).
"Client" here means the consensus node (the validator/hub binary), not an app.
There is no written, testable bar today. This FIP provides one, scaled to risk.
Current phase. Validator membership is governed manually: there are no native protocol
incentives or permissionless staking yet, so the set is maintained by mutual agreement among
operators via validators.toml. Until that changes, admission and removal are partly a trust and
collaboration decision, not only a technical one (see Operator requirements below).
Risk tiers
Tiers are cumulative. The bulk of this document is the B/C determinism contract; Tier A only
needs the Deployment + Operational sections.
Requirements for alternative client implementations
A non-snapchain validator (Tier B/C) must satisfy all of the following before it is added to
validators.toml on mainnet. Each is a hard gate, not a guideline.
R1 — Consensus determinism (the core gate)
Consensus signs over encoded bytes and hashes headers with BLAKE3
(snapchain_codec.rs, blocks.proto). Any divergence in encoding, hashing, or state computation
produces different signed bytes or block hashes → votes are rejected → the shard stalls (or, if a
quorum diverges together, forks). The client must produce, for every input in the shared
conformance corpus:
Gate: 100% byte-exact match against the shared conformance vectors (see Verification below).
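A minimal sketch of what this gate checks, assuming the corpus is a list of (input, expected bytes) pairs generated from snapchain. The Vector layout and check_conformance function are hypothetical. It compares raw output bytes rather than hashes, so a failure points at the first divergent offset; identical bytes necessarily yield identical BLAKE3 hashes.

```rust
/// One conformance-corpus entry (hypothetical layout): an input and the
/// exact bytes the reference client (snapchain) produces for it.
pub struct Vector {
    pub name: &'static str,
    pub input: Vec<u8>,
    pub expected: Vec<u8>,
}

/// Where a candidate's output first departed from the reference bytes.
#[derive(Debug, PartialEq)]
pub struct Mismatch {
    pub name: &'static str,
    pub first_diff: usize,
    pub expected_len: usize,
    pub actual_len: usize,
}

/// Runs the candidate over every vector. The gate passes only if every
/// output is byte-identical; otherwise all mismatches are reported.
pub fn check_conformance<F>(vectors: &[Vector], mut candidate: F) -> Result<(), Vec<Mismatch>>
where
    F: FnMut(&[u8]) -> Vec<u8>,
{
    let mut failures = Vec::new();
    for v in vectors {
        let actual = candidate(v.input.as_slice());
        if actual != v.expected {
            // First differing byte; if one output is a prefix of the other,
            // the divergence starts where the shorter one ends.
            let first_diff = actual
                .iter()
                .zip(v.expected.iter())
                .position(|(a, b)| a != b)
                .unwrap_or(actual.len().min(v.expected.len()));
            failures.push(Mismatch {
                name: v.name,
                first_diff,
                expected_len: v.expected.len(),
                actual_len: actual.len(),
            });
        }
    }
    if failures.is_empty() { Ok(()) } else { Err(failures) }
}
```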
R2 — Validation parity
The client must make the same accept/reject decision on every message as the reference
validation rules (src/core/validations/). One client accepting what another rejects produces
non-deterministic block contents and breaks consensus.
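One way to exercise this gate is a differential run: feed the same message corpus to the reference rules and to the candidate, and collect every disagreement. A minimal sketch, generic over the message type; Verdict, Disagreement, and validation_parity are hypothetical names, and the two closures stand in for real validators.

```rust
/// Accept/reject outcome of validating one message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Accept,
    Reject,
}

/// A corpus entry on which the two implementations disagree.
#[derive(Debug, PartialEq, Eq)]
pub struct Disagreement {
    pub index: usize,
    pub reference: Verdict,
    pub candidate: Verdict,
}

/// Runs both validators over the corpus and returns every index where their
/// verdicts differ. Parity holds only when the result is empty.
pub fn validation_parity<M, R, C>(corpus: &[M], mut reference: R, mut candidate: C) -> Vec<Disagreement>
where
    R: FnMut(&M) -> Verdict,
    C: FnMut(&M) -> Verdict,
{
    corpus
        .iter()
        .enumerate()
        .filter_map(|(index, msg)| {
            let (r, c) = (reference(msg), candidate(msg));
            (r != c).then_some(Disagreement { index, reference: r, candidate: c })
        })
        .collect()
}
```

Reporting both verdicts, not just a mismatch flag, matters here: a candidate that accepts too much and one that rejects too much break consensus in different ways and usually point to different bugs.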
R3 — Protocol compatibility & versioning
definitions, the Malachite consensus wire format/version, and the message-validation rule set it
conforms to.
a defined compatibility window. A client that falls behind is removed until it re-conforms.
R4 — Byzantine safety & crash recovery
crash/restart. Signer design must make double-signing impossible (e.g. height/round high-water
mark persisted before signing).
consistently with its pre-crash self.
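The high-water-mark rule can be sketched with only the Rust standard library. Assumptions: signing positions are ordered by (height, round, step); the mark is a small text file; GuardedSigner, Position, and stub_sign are hypothetical names, and stub_sign is a placeholder, not real cryptography. The key property is the ordering inside sign: the new mark is fsynced before any signature is returned, so a crash can at worst lose one vote, never permit a second, conflicting one.

```rust
use std::fs::{self, File};
use std::io::{ErrorKind, Write};
use std::path::{Path, PathBuf};

/// Vote step within a round; declaration order gives Propose < Prevote < Precommit.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Step { Propose, Prevote, Precommit }

/// A signing position. Derived Ord compares height, then round, then step.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Position { pub height: u64, pub round: u32, pub step: Step }

#[derive(Debug, PartialEq)]
pub enum SignError {
    /// Signing here could produce a second, conflicting signature.
    WouldEquivocate { last: Position, requested: Position },
    Storage(String),
}

fn storage(e: std::io::Error) -> SignError { SignError::Storage(e.to_string()) }

/// Refuses any position at or below the persisted mark. Note: this sketch also
/// refuses re-signing an identical payload at the same position.
pub struct GuardedSigner { state_path: PathBuf, last: Option<Position> }

impl GuardedSigner {
    pub fn open(state_path: &Path) -> Result<Self, SignError> {
        let last = match fs::read_to_string(state_path) {
            // Fail closed: a corrupt mark must never be treated as "never signed".
            Ok(s) => Some(parse(&s).ok_or_else(|| SignError::Storage("corrupt high-water mark".into()))?),
            Err(e) if e.kind() == ErrorKind::NotFound => None, // fresh key
            Err(e) => return Err(storage(e)),
        };
        Ok(Self { state_path: state_path.to_path_buf(), last })
    }

    pub fn sign(&mut self, pos: Position, payload: &[u8]) -> Result<Vec<u8>, SignError> {
        if let Some(last) = self.last {
            if pos <= last {
                return Err(SignError::WouldEquivocate { last, requested: pos });
            }
        }
        self.persist(pos)?; // mark is durable before any signature exists
        self.last = Some(pos);
        Ok(stub_sign(payload))
    }

    /// Write to a temp file, fsync, then rename over the old mark.
    /// A production signer would also fsync the parent directory.
    fn persist(&self, pos: Position) -> Result<(), SignError> {
        let tmp = self.state_path.with_extension("tmp");
        let mut f = File::create(&tmp).map_err(storage)?;
        write!(f, "{} {} {}", pos.height, pos.round, pos.step as u8).map_err(storage)?;
        f.sync_all().map_err(storage)?;
        fs::rename(&tmp, &self.state_path).map_err(storage)
    }
}

fn parse(s: &str) -> Option<Position> {
    let mut parts = s.split_whitespace();
    let height = parts.next()?.parse::<u64>().ok()?;
    let round = parts.next()?.parse::<u32>().ok()?;
    let step = match parts.next()?.parse::<u8>().ok()? {
        0 => Step::Propose,
        1 => Step::Prevote,
        2 => Step::Precommit,
        _ => return None,
    };
    Some(Position { height, round, step })
}

/// Placeholder only: NOT a real signature. Swap in the validator key's signer.
fn stub_sign(payload: &[u8]) -> Vec<u8> {
    let mut sig = b"stub-sig:".to_vec();
    sig.extend_from_slice(payload);
    sig
}
```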
R5 — Networking conformance
The client must speak the existing p2p protocol: libp2p gossipsub over QUIC, the consensus /
mempool / decided-values / contact-info topics, and the contact-info exchange used for peer
discovery and mesh formation
(gossip.rs). It must form and hold the gossip mesh with existing validators.
R6 — Operational & custody
environments; equivocation-safe signer (R4).
exported.
operator, the client should be auditable by the existing operators (source available for review).
R7 — Continuous compliance
Conformance is not one-and-done. The client must:
version;
effective_at if it drifts.
How requirements are verified
The above requirements map onto four test layers. Snapchain is strong at L2 today but thin at L0/L3;
the concrete gaps are tracked under snapchain#924
and must be closed before a Tier B/C client is admitted.
client_parity_tests is one-directional (input validation only); it must become bidirectional with output assertions.
consensus_test.rs TestNetwork harness with the candidate as a node: consensus, sync, crash/recovery, validator-add, cross-shard, partition.
setup_local_testnet, load via src/perf/) with the candidate validating.
Deployment requirements (all new validators)
These apply to every new validator, Tier A included. They matter most for a validator in a new
geo/datacenter, because snapchain's timing budget is tight (block_time 1s; propose_time 1s;
prevote_time/precommit_time 500ms; round timeouts grow by step_delta 500ms; consensus.rs).
the propose/prevote windows with margin. A validator that can't get votes out in time degrades
throughput for everyone. (No latency-injection test exists yet; see #920.)
re-formation under churn.
effective_at.
Operator requirements (all validators)
While the set is manually governed (see Current phase above), recovering from a high-priority
incident — a stalled shard needing a coordinated validator-set cutover, or a fast removal of a
misbehaving node — depends on operators coordinating in real time. A prospective operator must commit
to:
Operators are also added as maintainers of the snapchain repository, sharing responsibility for
review, releases, and incident fixes — reinforcing the auditability expectation in R6 and ensuring
every operator can act during an incident.
The same manual process that admits a validator can remove one — for technical drift (R3/R7) or
for failing to uphold these collaboration expectations. These criteria are interim and expected to
be superseded once native protocol incentives exist.
Rollout
Staged and reversible:
observation window.
effective_at; observe the L3 gates for an observation window, including a partition drill.
effective_at to cut into all shards at around the same time; staggering leaves the validator sets mismatched across shards and risks cross-shard instability.
Because effective_at is a per-shard height and shards advance independently, a near-simultaneous
cutover needs per-shard height estimates coordinated with existing operators.
candidate at a future effective_at. Operators should know how to execute and observe a removal
before scheduling the add.
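The per-shard height estimate for a near-simultaneous cutover can be sketched as a projection from each shard's current height. It assumes operators can read a shard's current height and a recently measured mean block interval (which can exceed block_time when rounds fail); ShardObservation and estimate_effective_at are hypothetical names, not part of snapchain.

```rust
/// Latest observation of one shard: its height, when that height was seen,
/// and the recently measured mean block interval.
pub struct ShardObservation {
    pub shard_id: u32,
    pub height: u64,
    pub observed_at_ms: u64, // unix epoch milliseconds
    pub mean_block_ms: f64,  // measured, not assumed equal to block_time
}

/// Projects the height each shard should reach at a shared wall-clock cutover,
/// for use as that shard's effective_at. Returns None if any shard's cutover
/// is not in the future or its interval is not a positive number.
pub fn estimate_effective_at(shards: &[ShardObservation], cutover_ms: u64) -> Option<Vec<(u32, u64)>> {
    shards
        .iter()
        .map(|s| {
            // `!(x > 0.0)` also rejects NaN.
            if cutover_ms <= s.observed_at_ms || !(s.mean_block_ms > 0.0) {
                return None;
            }
            let elapsed_ms = (cutover_ms - s.observed_at_ms) as f64;
            let blocks_ahead = (elapsed_ms / s.mean_block_ms).floor() as u64;
            Some((s.shard_id, s.height + blocks_ahead))
        })
        .collect()
}
```

Shards with slower measured intervals get proportionally smaller increments. Because the projection drifts with lead time, re-measuring closer to the date and re-confirming the heights with existing operators keeps the cutover near-simultaneous.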
Acceptance checklist (go / no-go)
Complete before a mainnet
validators.toml edit. Tier tags in parentheses; unmarked = all tiers.
candidate-attributable rejections, candidate proposes & is voted in).
verified.
participation in coordinated validator-set changes.
effective_at schedule agreed with all operators.
Open questions
be a hard requirement for Tier C, or is conformance + audit sufficient?
propose_time/prevote_time/block_time: a coordinated, all-node change?
clients can depend on it?
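For reasoning about how these timing parameters interact, a minimal sketch that derives per-step timeouts from the values quoted under Deployment requirements (block_time is a separate target interval and is not modeled). It assumes each step's timeout grows linearly by step_delta per round, as in Tendermint-style engines; that growth rule and the Timeouts and ConsensusStep names are assumptions, not taken from snapchain's source.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy)]
pub enum ConsensusStep { Propose, Prevote, Precommit }

/// Round-0 step timeouts plus the per-round increment.
pub struct Timeouts {
    pub propose: Duration,
    pub prevote: Duration,
    pub precommit: Duration,
    pub step_delta: Duration,
}

impl Timeouts {
    /// The values quoted from consensus.rs in this document.
    pub fn quoted() -> Self {
        Timeouts {
            propose: Duration::from_millis(1000),
            prevote: Duration::from_millis(500),
            precommit: Duration::from_millis(500),
            step_delta: Duration::from_millis(500),
        }
    }

    /// Assumed linear growth: base + round * step_delta.
    pub fn timeout(&self, step: ConsensusStep, round: u32) -> Duration {
        let base = match step {
            ConsensusStep::Propose => self.propose,
            ConsensusStep::Prevote => self.prevote,
            ConsensusStep::Precommit => self.precommit,
        };
        base + self.step_delta * round
    }

    /// Upper bound on one round if every step runs to its timeout; healthy
    /// rounds end earlier, as soon as quorum votes arrive.
    pub fn worst_case_round(&self, round: u32) -> Duration {
        [ConsensusStep::Propose, ConsensusStep::Prevote, ConsensusStep::Precommit]
            .iter()
            .map(|&s| self.timeout(s, round))
            .sum()
    }
}
```

Under this assumption round 0 is bounded at 2 s and round 1 at 3.5 s, so every round lost to a validator that misses its window costs progressively more wall-clock time.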
References
(sub-issues #917–#923)
snapchain_codec.rs, blocks.proto
consensus.rs, validators.toml
gossip.rs
consensus_test.rs, client_parity_tests/, setup_local_testnet.rs, src/perf/