@renaynay @Wondertan
- 2021-11-25: initial draft
- 2022-03-30: update to bridge node definition
Refers to the data availability "halo" network created around the Core network.
A bridge node is a node that is connected to a celestia-core node via RPC. It is given the remote address of a running celestia-core node and listens for new blocks from it. For each new block, the bridge node performs basic validation on the block via `ValidateBasic()`, extends the block data, generates a Data Availability Header (DAH) from the extended block data, creates an `ExtendedHeader` from the block header and the DAH, and finally broadcasts the `ExtendedHeader` to the data availability network (DA network).
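For illustration, a minimal Go sketch of this per-block pipeline is shown below; the `blocks` channel, `extendBlock`, `makeDAH`, and `broadcaster` names are assumptions made for this sketch, not the actual celestia-node API.

```go
// Hypothetical per-block loop for a bridge node. Only ValidateBasic() and
// ExtendedHeader come from the text above; everything else is illustrative.
func (b *bridge) listenForBlocks(ctx context.Context, blocks <-chan *RawBlock) error {
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case block := <-blocks:
			// 1. basic validation of the block received over RPC
			if err := block.ValidateBasic(); err != nil {
				return err
			}
			// 2. erasure code (extend) the block data
			extended, err := extendBlock(block.Data)
			if err != nil {
				return err
			}
			// 3. generate the DAH from the extended data
			dah := makeDAH(extended)
			// 4. build the ExtendedHeader from the raw header + DAH
			eh := &ExtendedHeader{RawHeader: block.Header, DAH: dah}
			// 5. broadcast the ExtendedHeader to the DA network
			if err := b.broadcaster.Broadcast(ctx, eh); err != nil {
				return err
			}
		}
	}
}
```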
A bridge node does not care what kind of celestia-core node it is connected to (validator or regular full node); it only cares that it has a direct RPC connection to a celestia-core node from which it can listen for new blocks.
The name bridge was chosen as the purpose of this node type is to provide a mechanism to relay celestia-core blocks to the data availability network.
A full node is the same as a light node, except that instead of performing `LightAvailability` (the process of DASing to verify that a header is legitimate), it performs `FullAvailability`, which downloads enough shares from the network to fully reconstruct the block and store it, serving shares to the rest of the network.
A light node listens for `ExtendedHeader`s from the DA network and performs DAS on the received headers.
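A hedged Go sketch of how these two behaviours can hang off one shared abstraction; the `Availability` interface mirrors the `LightAvailability`/`FullAvailability` terms above, but its exact signature is an assumption of this sketch.

```go
// Availability abstracts over how a node convinces itself that the data behind
// a DA header is retrievable (illustrative signature, not the real API).
type Availability interface {
	// SharesAvailable returns nil once the implementation is satisfied that
	// the data committed to by the given DAH can be retrieved from the network.
	SharesAvailable(ctx context.Context, dah *DataAvailabilityHeader) error
}

// LightAvailability would sample a small number of random shares (DAS), while
// FullAvailability would download enough shares to reconstruct the full block,
// store it, and serve the shares back to the rest of the network.
```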
This ADR describes a design for the March 2022 Celestia Testnet that was decided on at the Berlin 2021 offsite. Now that we have basic scaffolding and structure for a celestia node, the focus of the next engineering sprint is to continue refactoring and improving this structure to include more features (defined later in this document).
- Introduce a standalone full node and rename current full node implementation to bridge node.
- Remove `dev` as a node type and make it a flag on every available node type.
Bad encoding fraud proofs will be generated by full nodes inside of `ShareService`, upon reconstructing a block via the sampling process.
If fraud is detected, the full node will generate the proof, broadcast it to the `FraudSub` gossip network, and subsequently halt all operations. If no fraud is detected, the full node will continue operations without propagating any messages to the network. Since full nodes reconstruct every block, they do not have to listen to `FraudSub`, as they perform the necessary encoding checks on every block.
Light nodes, however, will listen to `FraudSub` for bad encoding fraud proofs. Light nodes will verify the fraud proofs against the relevant header hash to ensure that the fraud proof is valid.
If the fraud proof is valid, the node should immediately halt all operations. If it is invalid, the node proceeds with operations as usual.
Eventually, we may choose to use the reputation tracking system provided by gossipsub for nodes that broadcast invalid fraud proofs to the network, but that is not a requirement for this iteration.
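A hedged sketch of the light-node handling described above; the `FraudProof` type, its `Validate` method, and the `halt` helper are assumptions for illustration.

```go
// onFraudProof is called for every bad encoding fraud proof received via FraudSub.
func (l *lightNode) onFraudProof(ctx context.Context, proof FraudProof) error {
	// fetch the header the proof claims is fraudulent, by its hash
	header, err := l.headerStore.Get(ctx, proof.HeaderHash())
	if err != nil {
		return err
	}
	// verify the proof against the header's commitments
	if err := proof.Validate(header); err != nil {
		// invalid proof: ignore it and continue normal operation
		return err
	}
	// valid proof: the block was badly encoded, so halt all operations immediately
	return l.halt(ctx)
}
```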
Implement scaffolding for RPC on all node types, such that a user can access the following methods (a rough Go sketch of these interfaces follows the list):
- `HeaderAPI`
  - `Header(_height_)` -> `ExtendedHeader{}`
  - `Header(_hash_)` -> `ExtendedHeader{}`
- `NodeAPI`
  - `P2PInfo()` -> returns a blob of p2p info (can be broken into several subcommands, such as `net_info`)
  - `Config()` -> returns the node's config
  - `NodeType()` -> returns the node's type (e.g. full | bridge | light)
  - `RPCInfo()` -> RPC port, version, available APIs, etc.
- `UserAPI`
  - `AccountBalance(_acct_)` -> returns balance for given account
  - `SubmitTx(_txdata_)` -> submits a transaction to the network
Note: it is likely more methods will be added, but the above listed are the essential ones for this iteration.
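The following Go interfaces are a rough sketch of the method surface listed above; the interface names and return types are assumptions for illustration only.

```go
type HeaderAPI interface {
	// Header by height or by hash; both return an ExtendedHeader.
	HeaderByHeight(ctx context.Context, height uint64) (*ExtendedHeader, error)
	HeaderByHash(ctx context.Context, hash []byte) (*ExtendedHeader, error)
}

type NodeAPI interface {
	P2PInfo(ctx context.Context) (P2PInfo, error) // peer ID, addresses, connected peers, etc.
	Config(ctx context.Context) (Config, error)   // the node's config
	NodeType(ctx context.Context) (string, error) // "bridge" | "full" | "light"
	RPCInfo(ctx context.Context) (RPCInfo, error) // RPC port, version, available APIs
}

type UserAPI interface {
	AccountBalance(ctx context.Context, acct string) (Balance, error)
	SubmitTx(ctx context.Context, txdata []byte) (TxResponse, error)
}
```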
`StateService` is responsible for fetching state relevant to a user being able to submit a transaction, such as account balance, preparing the transaction, and propagating it via `TxSub`. Bridge nodes will be responsible for listening to `TxSub` and relaying the transactions into the Core mempool. Light and full nodes will be able to publish transactions to `TxSub`, but do not need to listen for them.
Celestia-node's state interaction will be detailed further in a subsequent ADR.
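A minimal sketch, assuming libp2p gossipsub is used for `TxSub`, of how a light or full node could publish a transaction; the `StateAccessor` field and method names are hypothetical.

```go
import (
	"context"

	pubsub "github.com/libp2p/go-libp2p-pubsub"
)

// StateService prepares transactions and publishes them to the TxSub topic.
type StateService struct {
	accessor StateAccessor // fetches account balance / nonce (assumed interface)
	txTopic  *pubsub.Topic // joined TxSub gossip topic
}

// SubmitTx publishes a prepared transaction to TxSub. Light and full nodes only
// publish; bridge nodes subscribe and relay transactions into the Core mempool.
func (s *StateService) SubmitTx(ctx context.Context, txdata []byte) error {
	return s.txTopic.Publish(ctx, txdata)
}
```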
Currently, both light and full nodes are unable to perform data availability sampling (DAS) while syncing. They only begin sampling once the node is synced up to the head of the chain.
`HeaderSync` and the `DASer` will be refactored such that the `DASer` will be able to perform sampling on past headers as the node is syncing. A possible approach would be for the syncing algorithms in both the `DASer` and `HeaderSync` to align such that headers received during sync are propagated to the `DASer` for sampling via an internal pubsub.
The `DASer` will maintain a checkpoint of the last sampled header so that it can continue sampling any new headers from that checkpoint.
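A hedged sketch of what such a catch-up loop could look like; the `store`, `getter`, and `availability` fields and their methods are assumptions, not the actual DASer implementation.

```go
// catchUp samples every header between the stored checkpoint and the given head.
func (d *DASer) catchUp(ctx context.Context, head uint64) error {
	checkpoint, err := d.store.Checkpoint(ctx) // height of the last sampled header
	if err != nil {
		return err
	}
	for h := checkpoint + 1; h <= head; h++ {
		// headers below the subjective head are delivered by HeaderSync
		header, err := d.getter.GetByHeight(ctx, h)
		if err != nil {
			return err
		}
		// sample the header's DAH; on failure, stop instead of advancing the checkpoint
		if err := d.availability.SharesAvailable(ctx, header.DAH); err != nil {
			return err
		}
		if err := d.store.SetCheckpoint(ctx, h); err != nil {
			return err
		}
	}
	return nil
}
```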
Initially, we started with `BlockService` being the more "important" component during devnet architecture, but overlooked some problems with regards to sync (we initially made the decision that a celestia full node would have to be started at the same time as a core node).
This led to an issue where we eventually needed to connect to an already-running core node and sync from it. We were missing a component to do that, so we implemented `HeaderExchange` over the core client (wrapping another interface we had previously created for `BlockService` called `BlockFetcher`). This had to be done at the last minute because syncing would not work otherwise, which led to stopgap solutions, such as having to hand both the celestia light and full node a "trusted" hash of a header from the already-running chain so that they could sync from that point and start listening for new headers.
Proposed new architecture: `BlockService` is only responsible for reconstructing the block from the shares handed to it by the `ShareService`.
Right now, the `BlockService` is in charge of fetching new blocks from the core node, erasure coding them, generating the DAH, generating the `ExtendedHeader`, broadcasting the `ExtendedHeader` to the `HeaderSub` network, and storing the block data (after some validation checks).
Instead, a full node will rely on `ShareService` sampling to fetch enough shares to reconstruct the block inside of `BlockService`. Contrastingly, a bridge node will not do block reconstruction via sampling, but will rather rely on the `header.CoreSubscriber` implementation of `header.Subscriber` for blocks. `header.CoreSubscriber` will handle listening for new block events from the core node via RPC, erasure code the new block, generate the `ExtendedHeader`, and pipe the erasure coded block through to `BlockService` via an internal subscription.
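A rough sketch of the full-node reconstruction path under this proposal; `SharesForDAH` and `reconstructSquare` are hypothetical helpers standing in for the actual `ShareService` and erasure-decoding calls.

```go
// getBlock reconstructs a block from shares retrieved by ShareService (full node path).
// Bridge nodes skip this path and receive already-extended blocks from header.CoreSubscriber.
func (s *BlockService) getBlock(ctx context.Context, eh *ExtendedHeader) (*Block, error) {
	// ask ShareService for enough shares to rebuild the extended data square
	shares, err := s.shares.SharesForDAH(ctx, eh.DAH)
	if err != nil {
		return nil, err
	}
	// reconstruct the square and verify it against the DAH's row/column roots
	eds, err := reconstructSquare(shares, eh.DAH)
	if err != nil {
		return nil, err
	}
	// persist the reconstructed block data and return the block
	if err := s.store.Put(ctx, eds); err != nil {
		return nil, err
	}
	return &Block{Header: eh, Data: eds}, nil
}
```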
- Implement disconnect toleration
The light and full nodes are currently prone to long-range attacks. To mitigate this, we should introduce an additional `trustPeriod` variable (equal to the unbonding period) which applies to headers. Suppose a node starts with the period between the subjective head and the objective head being longer than the unbonding period; in that case, the light node must not trust the subjective head anymore, specifically its `ValidatorSet`. Therefore, instead of syncing subsequent headers on top of the untrusted subjective head, the node should request a new objective head from the `trustedPeer` and set it as the new trusted subjective head. This approach follows the Tendermint model for light client attack detection.
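A minimal sketch of this check, assuming hypothetical `Syncer` fields (`trustPeriod`, `trustedPeer`, `exchange`, `store`) and an `ExtendedHeader.Time` field; it is not the actual syncer code.

```go
// trustedHead returns the subjective head if it is still within the trust period,
// otherwise it requests a fresh objective head from the trusted peer.
func (s *Syncer) trustedHead(ctx context.Context) (*ExtendedHeader, error) {
	head, err := s.store.Head(ctx)
	if err != nil {
		return nil, err
	}
	// trustPeriod is equal to the unbonding period; beyond it, the subjective
	// head's ValidatorSet can no longer be trusted.
	if time.Since(head.Time) <= s.trustPeriod {
		return head, nil
	}
	// expired: request a new objective head from the trusted peer and adopt it
	newHead, err := s.exchange.RequestHead(ctx, s.trustedPeer)
	if err != nil {
		return nil, err
	}
	return newHead, s.store.Append(ctx, newHead)
}
```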
- Implement parallelization for retrieving shares by namespace. This issue is already being worked on.
- NMT/Shares/Namespace storage optimizations:
- Right now, we prepend 17 additional bytes to each Share. Luckily, for each reason why the prepended bytes were added, there is an alternative solution: it is possible to determine the NMT node type indirectly, without serializing the type itself, by looking at the number of links (see the sketch after this list). To recover the namespace of the erasured data, we should not encode namespaces into the data itself; it is possible to get the namespace for each share from the inner non-leaf nodes of the NMT tree.
- Pruning for shares.
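As a small illustration of the first point, using the go-ipld-format `Node` interface: the NMT node type can be inferred from the link count alone. The exact rule shown here is an assumption of this sketch.

```go
import ipld "github.com/ipfs/go-ipld-format"

// isNMTLeaf infers an NMT node's type without a serialized type byte:
// inner nodes link to exactly two children, while leaf nodes carry raw
// share data and have no links.
func isNMTLeaf(nd ipld.Node) bool {
	return len(nd.Links()) == 0
}
```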
Since the IPLD package is almost entirely separate from the celestia-node implementation, it makes sense to remove it from the celestia-node repository and maintain it separately. The extraction of IPLD should also include a review and refactoring, as there are still some legacy components that are no longer necessary, and the documentation also needs updating.
At the moment, the syncing logic for light nodes is simple in that each header is synced from a single peer. Instead, the light node should double-check headers with a randomly chosen "witness" peer other than the primary peer from which it received the header, as described in the light client attack detector model from Tendermint.
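A hedged sketch of this witness check, with assumed `peers` and `exchange` helpers; it mirrors the Tendermint attack detector flow rather than reproducing it.

```go
// verifyWithWitness re-requests a header received from the primary peer from a
// randomly chosen witness peer and checks that the two results match.
func (s *Syncer) verifyWithWitness(ctx context.Context, primary *ExtendedHeader) error {
	witness := s.peers.RandomExcluding(s.primaryPeer) // pick a witness other than the primary
	alt, err := s.exchange.RequestByHeight(ctx, witness, primary.Height)
	if err != nil {
		return err
	}
	if !bytes.Equal(alt.Hash(), primary.Hash()) {
		// conflicting headers: treat as a possible light client attack and
		// follow the Tendermint detection procedure (gather evidence, halt).
		return ErrConflictingHeaders
	}
	return nil
}
```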