Overview of Lighthouse Ethereum consensus client implementation
The following is an overview of the Lighthouse Ethereum consensus client implementation, as of version v4.2.0 (May 2023). This overview was prepared in the scope of this issue.
Lighthouse is an Ethereum consensus client written in Rust, which provides a proof-of-stake consensus mechanism to execution clients after The Merge. It has been stable and production-ready since Beacon Chain genesis.
Lighthouse is a fairly complicated piece of software. It is mostly implemented in async Rust, using the `tokio` framework. The `tokio` runtime is encapsulated in the `Environment` structure. To spawn new asynchronous tasks, ordinary Lighthouse code uses the `TaskExecutor` structure from the `RuntimeContext` structure, which is obtained using the `core_context` or `service_context` method of `Environment`. The `service_context` method of `RuntimeContext` itself allows creating sub-contexts by augmenting the diagnostic logger with the sub-service's name.
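A minimal, self-contained sketch of this pattern is shown below. It is not Lighthouse's actual `environment`/`task_executor` code; `TaskExecutor`, `RuntimeContext`, and `service_context` here are simplified stand-ins for the structures described above.

```rust
use std::sync::Arc;

// Simplified stand-in for Lighthouse's TaskExecutor: a handle to the shared
// tokio runtime plus a service name used for diagnostics.
#[derive(Clone)]
struct TaskExecutor {
    handle: tokio::runtime::Handle,
    service_name: Arc<str>,
}

impl TaskExecutor {
    // Spawn an asynchronous task on the shared runtime, tagged with a name.
    fn spawn<F>(&self, task: F, name: &str)
    where
        F: std::future::Future<Output = ()> + Send + 'static,
    {
        println!("[{}] spawning task: {name}", self.service_name);
        self.handle.spawn(task);
    }
}

// Simplified stand-in for RuntimeContext: carries an executor and can derive
// per-service sub-contexts (analogous to RuntimeContext::service_context).
#[derive(Clone)]
struct RuntimeContext {
    executor: TaskExecutor,
}

impl RuntimeContext {
    fn service_context(&self, name: &str) -> RuntimeContext {
        RuntimeContext {
            executor: TaskExecutor {
                handle: self.executor.handle.clone(),
                service_name: Arc::from(name),
            },
        }
    }
}

#[tokio::main]
async fn main() {
    // The "core" context, as would be obtained from Environment::core_context.
    let core = RuntimeContext {
        executor: TaskExecutor {
            handle: tokio::runtime::Handle::current(),
            service_name: Arc::from("core"),
        },
    };
    // Derive a sub-context for a hypothetical "network" service and spawn a task on it.
    let network_ctx = core.service_context("network");
    network_ctx
        .executor
        .spawn(async { /* the service's event loop would run here */ }, "network_service");
    tokio::task::yield_now().await;
}
```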
The code commonly uses traditional synchronization primitives, such as mutexes, reader-writer locks, and condition variables, from the `tokio::sync` module and the `parking_lot` crate, together with reference-counted smart pointers. In some places, e.g. here, it uses a `Mutex<()>` to prevent concurrent execution of certain code sections. Some asynchronous tasks communicate through bounded or unbounded multiple-producer, single-consumer (MPSC) channels. Pieces of data that cannot be immediately processed get stored in various queues and maps, most of them unbounded. It is not clear what could ensure that those unbounded buffers do not grow excessively.
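The two patterns mentioned above (a data-less `Mutex<()>` guarding a code section and an unbounded MPSC channel) can be illustrated with a minimal sketch using `tokio::sync` primitives; this is generic example code, not a fragment of Lighthouse.

```rust
use std::sync::Arc;
use tokio::sync::{mpsc, Mutex};

#[tokio::main]
async fn main() {
    // A Mutex<()> guards no data: holding its guard only prevents two tasks from
    // executing the same critical section concurrently.
    let section_lock = Arc::new(Mutex::new(()));

    // An unbounded MPSC channel: sends never block, so nothing bounds the number
    // of queued messages except available memory.
    let (tx, mut rx) = mpsc::unbounded_channel::<u64>();

    let lock = Arc::clone(&section_lock);
    let producer = tokio::spawn(async move {
        for i in 0..3 {
            let _guard = lock.lock().await; // exclusive section
            tx.send(i).expect("receiver is alive");
        }
    });

    producer.await.unwrap();
    // The sender has been dropped, so recv() yields the buffered items and then None.
    while let Some(msg) = rx.recv().await {
        println!("got {msg}");
    }
}
```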
Different caches and data pools are ubiquitous in the code base, e.g. in the `BeaconChain` structure, clearly for performance optimization reasons. The implementation attempts to make fine-grained optimizations, such as prioritizing certain asynchronous work items, based on protocol specifics, e.g. here.
Lighthouse uses `libp2p` as its networking stack, supporting TCP/IP (optionally WebSockets over TCP/IP), `noise` as the encryption layer, and `mplex`/`yamux` as multiplexing layers. It employs the gossipsub and identify protocols from the `libp2p` stack, and it implements the following custom protocols:
- Eth2 RPC, which facilitates direct peer-to-peer communication, primarily for sending/receiving chain information for syncing;
- Discovery, which handles queries and manages access to the discovery routing table.

In addition to that, there is a `PeerManager`, which handles peers' reputation and connection status.
The communication layer is represented by the `Network` structure, which encapsulates the `libp2p` swarm. Internal services do not interact with `Network` directly; instead, the node spawns the network service with the `start` async method of the `NetworkService` structure and uses the returned instances of the `NetworkGlobals` and `NetworkSenders` structures. `NetworkGlobals` represents a collection of variables that are accessible outside of the network service itself. `NetworkSenders` provides the following methods (see the channel sketch after this list):
- `network_send` returns an unbounded channel to send internal messages of type `NetworkMessage` to the network service;
- `validator_subscription_send` returns a bounded channel to send internal messages of type `ValidatorSubscriptionMessage` to the network service.
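
The split between an unbounded and a bounded sender might look roughly like this. This is a hypothetical simplification: `NetworkMessage` and `ValidatorSubscriptionMessage` are reduced to placeholder enums, and the struct is only an analogue of `NetworkSenders`.

```rust
use tokio::sync::mpsc;

// Placeholder message types standing in for Lighthouse's real ones.
#[derive(Debug)]
enum NetworkMessage { Publish(String) }
#[derive(Debug)]
enum ValidatorSubscriptionMessage { Subscribe(u64) }

// Hypothetical analogue of NetworkSenders: one unbounded and one bounded sender.
#[derive(Clone)]
struct NetworkSenders {
    network_send: mpsc::UnboundedSender<NetworkMessage>,
    validator_subscription_send: mpsc::Sender<ValidatorSubscriptionMessage>,
}

#[tokio::main]
async fn main() {
    let (network_send, mut network_recv) = mpsc::unbounded_channel();
    let (validator_subscription_send, mut subscription_recv) = mpsc::channel(64);
    let senders = NetworkSenders { network_send, validator_subscription_send };

    // Unbounded: send never waits.
    senders
        .network_send
        .send(NetworkMessage::Publish("block".into()))
        .expect("network service running");
    // Bounded: send awaits if the channel is full, applying backpressure.
    senders
        .validator_subscription_send
        .send(ValidatorSubscriptionMessage::Subscribe(42))
        .await
        .expect("network service running");

    // The network service side would receive from both channels.
    if let Some(NetworkMessage::Publish(topic)) = network_recv.recv().await {
        println!("publish on {topic}");
    }
    if let Some(ValidatorSubscriptionMessage::Subscribe(subnet)) = subscription_recv.recv().await {
        println!("subscribe to subnet {subnet}");
    }
}
```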
The `NetworkService::start` method creates, among other things, the `Router`, `AttestationService`, and `SyncCommitteeService` services, and finally spawns an asynchronous task for handling networking-layer events by invoking the `spawn_service` method. Internal messages sent through the `NetworkSenders::network_send` channel are handled by the `on_network_msg` method, those sent through `NetworkSenders::validator_subscription_send` by `on_validator_subscription_msg`, those from `AttestationService` by `on_attestation_service_msg`, those from `SyncCommitteeService` by `on_sync_committee_service_message`, etc. Events from the `libp2p` swarm are first handled by `Network::poll_network` and then directed to `Router` in the `NetworkService::on_libp2p_event` method.
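The overall shape of such a service loop, multiplexing several message sources and dispatching each to its handler, might look roughly as follows. This is a hypothetical sketch rather than the actual `spawn_service` body; the message enums and handler functions are placeholders.

```rust
use tokio::sync::mpsc;

#[derive(Debug)]
enum NetworkMessage { Publish(String) }
#[derive(Debug)]
enum ValidatorSubscriptionMessage { Subscribe(u64) }

// Placeholder handlers standing in for on_network_msg / on_validator_subscription_msg.
fn on_network_msg(msg: NetworkMessage) {
    match msg {
        NetworkMessage::Publish(topic) => println!("publishing on {topic}"),
    }
}

fn on_validator_subscription_msg(msg: ValidatorSubscriptionMessage) {
    match msg {
        ValidatorSubscriptionMessage::Subscribe(subnet) => println!("subscribing to subnet {subnet}"),
    }
}

#[tokio::main]
async fn main() {
    let (network_tx, mut network_rx) = mpsc::unbounded_channel();
    let (subscription_tx, mut subscription_rx) = mpsc::channel(64);

    // The service task multiplexes its input channels with tokio::select!.
    let service = tokio::spawn(async move {
        loop {
            tokio::select! {
                Some(msg) = network_rx.recv() => on_network_msg(msg),
                Some(msg) = subscription_rx.recv() => on_validator_subscription_msg(msg),
                // All senders dropped: shut the service down.
                else => break,
            }
        }
    });

    network_tx.send(NetworkMessage::Publish("beacon_block".into())).unwrap();
    subscription_tx.send(ValidatorSubscriptionMessage::Subscribe(3)).await.unwrap();
    drop(network_tx);
    drop(subscription_tx);
    service.await.unwrap();
}
```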
Beacon nodes communicate with each other over the network in two main ways: peer-to-peer request/response messages (RPC) and gossip.
Outbound RPC requests are initiated by sending them to the `network_send` channel, wrapped into `NetworkMessage::SendRequest`; they first get dispatched by `NetworkService::on_network_msg` to `Network::send_request`, and then to `RPC::send_request`. `RPC` first checks the self rate limit, if configured, passing the message to an instance of the `SelfRateLimiter` structure. If the message exceeds the rate limit, it is pushed to the unbounded `delayed_requests` queue; otherwise, it is pushed to the unbounded `RPC::events` queue.
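The self-limiting step can be pictured roughly as follows. This is a hypothetical, window-based sketch; Lighthouse's actual `SelfRateLimiter` uses per-protocol quotas and retries the delayed requests asynchronously, which is omitted here.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

// Toy rate limiter: at most `max_per_window` requests per time window.
struct SelfRateLimiter {
    window: Duration,
    max_per_window: usize,
    sent_at: VecDeque<Instant>,         // timestamps of recently allowed requests
    delayed_requests: VecDeque<String>, // unbounded queue of requests to retry later
}

impl SelfRateLimiter {
    fn new(window: Duration, max_per_window: usize) -> Self {
        Self { window, max_per_window, sent_at: VecDeque::new(), delayed_requests: VecDeque::new() }
    }

    // Either allow the request now (returning it for the events queue) or delay it.
    fn allows(&mut self, now: Instant, request: String) -> Option<String> {
        // Forget requests that have fallen out of the current window.
        while let Some(&t) = self.sent_at.front() {
            if now.duration_since(t) > self.window {
                self.sent_at.pop_front();
            } else {
                break;
            }
        }
        if self.sent_at.len() < self.max_per_window {
            self.sent_at.push_back(now);
            Some(request)
        } else {
            self.delayed_requests.push_back(request);
            None
        }
    }
}

fn main() {
    let mut limiter = SelfRateLimiter::new(Duration::from_secs(1), 2);
    let now = Instant::now();
    // The first two requests pass straight through; the third is delayed.
    assert!(limiter.allows(now, "BlocksByRange #1".into()).is_some());
    assert!(limiter.allows(now, "BlocksByRange #2".into()).is_some());
    assert!(limiter.allows(now, "BlocksByRange #3".into()).is_none());
    assert_eq!(limiter.delayed_requests.len(), 1);
}
```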
Similarly, outbound RPC responses are sent to the `network_send` channel, wrapped into `NetworkMessage::SendResponse`; they first get dispatched by `NetworkService::on_network_msg` to `Network::send_response`, and then to `RPC::send_response`. `RPC` pushes the message directly to the unbounded `RPC::events` queue.
The `libp2p` swarm pulls outbound messages from the RPC event queue and the self rate limiter by calling the `RPC::poll` method and delivers them to `RPCHandler` by calling its `inject_event` method. For RPC requests, `inject_event` calls the `send_request` method, which normally pushes the message to the `dial_queue` queue. For RPC responses, `inject_event` calls the `send_response` method, which normally pushes the message to the `pending_items` queue. Requests get pulled from `dial_queue` in the `RPCHandler::poll` method and returned to the `libp2p` swarm as the `OutboundSubstreamRequest` variant of `libp2p::swarm::ConnectionHandlerEvent` with a `libp2p::swarm::SubstreamProtocol`, using `OutboundRequestContainer` as the protocol upgrade. Responses get pulled from `pending_items` in the `RPCHandler::poll` method and passed to the `send_message_to_inbound_substream` function, which converts them to a future that sends the message to the `libp2p` substream. The future is then stored as `InboundState::Busy` in the `InboundInfo::state` field of the corresponding entry in the `RPCHandler::inbound_substreams` map. The future gets polled in `RPCHandler::poll` by the `libp2p` swarm.
Inbound RPC request substreams get passed by the `libp2p` swarm to the `RPCHandler::inject_fully_negotiated_inbound` method, which pushes the request to its unbounded `events_out` queue; the `libp2p` swarm drains that queue by calling `RPCHandler::poll` and delivers the events to `RPC` by calling its `inject_event` method. For RPC requests, `inject_event` first checks the RPC rate limit, passing the request to an instance of the `RPCRateLimiter` structure. If the request exceeds the rate limit, `RPC` pushes an event to its unbounded `events` queue to send an error response back to the remote peer; otherwise, it pushes an event to the queue to deliver the request through the `libp2p` swarm event stream.
The `libp2p` swarm pulls the inbound messages from the RPC event queue by calling the `RPC::poll` method. `NetworkService` pulls the inbound RPC requests from `Network`, which, in turn, pulls them from the `libp2p` swarm in `Network::poll_network`. `Network::poll_network` handles RPC events by calling the `inject_rpc_event` method, which fully handles some types of RPC requests itself or returns partially handled requests back for further handling. The returned requests get sent to `Router` in the `NetworkService::on_libp2p_event` method.
Similarly, inbound RPC response substreams get passed by the `libp2p` swarm to the `RPCHandler::inject_fully_negotiated_outbound` method, which pushes the substream into the `outbound_substreams` map. The outbound substreams get pulled in the `RPCHandler::poll` method, called by the `libp2p` swarm. Subsequently, `NetworkService` pulls the inbound RPC responses from `Network`, which, in turn, pulls them from the `libp2p` swarm in `Network::poll_network`. `Network::poll_network` handles RPC events by calling the `inject_rpc_event` method, which fully handles some types of RPC responses itself or returns partially handled responses back for further handling. The returned responses get sent to `Router` in the `NetworkService::on_libp2p_event` method.
Publishing gossip messages is initiated by sending them to the `network_send` channel, wrapped into `NetworkMessage::Publish`; they first get dispatched by `NetworkService::on_network_msg` to `Network::publish`, and then to the `publish` method of `libp2p::gossipsub::Gossipsub`. If `Gossipsub` returns `PublishError::InsufficientPeers`, the message is inserted into the `GossipCache` to be published later. `GossipCache` does not limit the number of cached messages, but it sets an expiration timeout for each gossip topic; messages waiting in the cache longer than the timeout get pruned.
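A simplified sketch of that caching behaviour is shown below. It is hypothetical: the real `GossipCache` is driven by a delay queue, while this stand-in just records insertion times and prunes on demand.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Toy per-topic cache: unbounded in size, but every topic has an expiration
// timeout after which queued messages are pruned instead of being re-published.
struct GossipCache {
    expirations: HashMap<&'static str, Duration>,
    cached: Vec<(&'static str, Vec<u8>, Instant)>, // (topic, message, inserted_at)
}

impl GossipCache {
    fn insert(&mut self, topic: &'static str, message: Vec<u8>, now: Instant) {
        // Only cache messages for topics that have a configured timeout.
        if self.expirations.contains_key(topic) {
            self.cached.push((topic, message, now));
        }
    }

    // Drop expired entries and return the messages still worth retrying.
    fn retain_unexpired(&mut self, now: Instant) -> Vec<(&'static str, Vec<u8>)> {
        let expirations = &self.expirations;
        self.cached
            .retain(|(topic, _, inserted)| now.duration_since(*inserted) < expirations[topic]);
        self.cached.iter().map(|(t, m, _)| (*t, m.clone())).collect()
    }
}

fn main() {
    let mut cache = GossipCache {
        expirations: HashMap::from([("beacon_block", Duration::from_secs(20))]),
        cached: Vec::new(),
    };
    let start = Instant::now();
    cache.insert("beacon_block", vec![0xde, 0xad], start);
    // Still cached shortly afterwards; pruned once the topic timeout has passed.
    assert_eq!(cache.retain_unexpired(start + Duration::from_secs(5)).len(), 1);
    assert_eq!(cache.retain_unexpired(start + Duration::from_secs(30)).len(), 0);
}
```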
The `Network::subscribe` method is used to subscribe to new `gossipsub` topics. There are several places in the code where this happens:
- when starting the networking layer, the node subscribes to the starting topics specified in the network configuration (with the `subscribe_kind` method);
- when switching to another fork, the node also subscribes to new topics;
- when handling the `NetworkMessage::SubscribeCoreTopics` event in `NetworkService::on_network_msg`;
- when handling `SubnetServiceMessage::Subscribe` from `AttestationService`;
- when handling `SubnetServiceMessage::Subscribe` from `SyncCommitteeService`.
`NetworkService` pulls the inbound gossip messages from `Network`, which, in turn, pulls them from the `libp2p` swarm in `Network::poll_network`. `Network::poll_network` handles gossip events by calling the `inject_gs_event` method, which decodes the message and returns it back for further handling. If the returned message is a `PubsubMessage::Attestation`, it is first handled by `AttestationService`; it then gets sent to `Router` in the `NetworkService::on_libp2p_event` method.
`Network` reports the results of inbound gossip message validation to `gossipsub` when it fails to decode a message or when its `report_message_validation_result` method is called. The latter happens when handling the internal `NetworkMessage::ValidationResult` message in `NetworkService::on_network_msg`.
The `Router` service spawned in `NetworkService::start` handles incoming network messages by routing them to the appropriate service. `NetworkService` pushes internal messages of type `RouterMessage` to the unbounded channel returned by the `Router::spawn` method. That method spawns the `SyncManager` and `BeaconProcessor` services, as well as an asynchronous task handling messages received from the receiving end of the returned channel. The received messages are handled with the `handle_message` method, which dispatches each message to the appropriate method (see the sketch after this list):
- `StatusPeer` messages are handled by `Router` itself by calling the `send_status` method;
- `PeerDisconnected` messages are forwarded to `SyncManager` by calling the `send_to_sync` method;
- `RPCRequestReceived`, `RPCResponseReceived`, and `RPCFailed` are handled by calling `handle_rpc_request`, `handle_rpc_response`, and `on_rpc_error`, respectively;
- `PubsubMessage` messages are handled by calling the `handle_gossip` method.
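
The dispatch itself is essentially a `match` over the message type. A heavily reduced, hypothetical sketch (the variants and handler bodies below are simplified stand-ins for Lighthouse's `RouterMessage` and `Router`):

```rust
// Simplified stand-ins for the real RouterMessage variants.
enum RouterMessage {
    StatusPeer(u64),       // peer id
    PeerDisconnected(u64), // peer id
    RPCRequestReceived { peer: u64, request: String },
    PubsubMessage { topic: String },
}

struct Router;

impl Router {
    fn handle_message(&mut self, message: RouterMessage) {
        match message {
            RouterMessage::StatusPeer(peer) => self.send_status(peer),
            RouterMessage::PeerDisconnected(peer) => self.send_to_sync(peer),
            RouterMessage::RPCRequestReceived { peer, request } => {
                self.handle_rpc_request(peer, request)
            }
            RouterMessage::PubsubMessage { topic } => self.handle_gossip(topic),
        }
    }

    fn send_status(&mut self, peer: u64) {
        println!("responding to status request from peer {peer}");
    }
    fn send_to_sync(&mut self, peer: u64) {
        println!("notifying sync manager about peer {peer}");
    }
    fn handle_rpc_request(&mut self, peer: u64, request: String) {
        println!("routing RPC request {request:?} from peer {peer}");
    }
    fn handle_gossip(&mut self, topic: String) {
        println!("routing gossip message on topic {topic:?}");
    }
}

fn main() {
    let mut router = Router;
    router.handle_message(RouterMessage::StatusPeer(1));
    router.handle_message(RouterMessage::PubsubMessage { topic: "beacon_block".into() });
}
```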
`Router` notifies `SyncManager` about failed RPC requests and responds to `Status` RPC requests directly. Valid `BlocksByRange` and `BlocksByRoot` RPC responses are sent to `SyncManager` for handling. All other RPC request and response messages, as well as gossip messages, are sent to `BeaconProcessor` for handling, through a bounded channel.
`BeaconProcessor` manages a number of asynchronous worker tasks performing time-intensive work (consensus messages from the network that require processing) on `BeaconChain`, as well as a set of bounded buffers of work queued for later processing. Long-running worker tasks get spawned with `spawn_blocking`, i.e. outside of the main `tokio` executor. For different kinds of work, it uses two types of bounded queues: `FifoQueue` and `LifoQueue`. When full, `FifoQueue` drops any new item being pushed to it, while `LifoQueue` drops its least recently added item to accommodate the new one.
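The two queue behaviours can be sketched like this (a hypothetical simplification; the real queues in the beacon processor also record metrics and log dropped items):

```rust
use std::collections::VecDeque;

// Bounded FIFO queue: when full, a newly pushed item is simply dropped.
struct FifoQueue<T> {
    queue: VecDeque<T>,
    max_length: usize,
}

impl<T> FifoQueue<T> {
    fn push(&mut self, item: T) {
        if self.queue.len() < self.max_length {
            self.queue.push_back(item);
        }
        // else: drop `item`, the queue is full
    }
    fn pop(&mut self) -> Option<T> {
        self.queue.pop_front()
    }
}

// Bounded LIFO queue: when full, the least recently added item is dropped
// to make room for the new one.
struct LifoQueue<T> {
    queue: VecDeque<T>,
    max_length: usize,
}

impl<T> LifoQueue<T> {
    fn push(&mut self, item: T) {
        if self.queue.len() == self.max_length {
            self.queue.pop_back(); // discard the oldest queued item
        }
        self.queue.push_front(item);
    }
    fn pop(&mut self) -> Option<T> {
        self.queue.pop_front() // most recently added item first
    }
}

fn main() {
    let mut fifo = FifoQueue { queue: VecDeque::new(), max_length: 2 };
    fifo.push(1); fifo.push(2); fifo.push(3); // 3 is dropped
    assert_eq!(fifo.pop(), Some(1));

    let mut lifo = LifoQueue { queue: VecDeque::new(), max_length: 2 };
    lifo.push(1); lifo.push(2); lifo.push(3); // 1 is dropped
    assert_eq!(lifo.pop(), Some(3));
}
```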
`BeaconProcessor` also spawns `ReprocessQueue`, which manages work items that cannot be processed immediately and should be re-processed later. `ReprocessQueue` receives and delivers back work items through a bounded channel; internally, it maintains queued items in a number of buffers, using containers such as the `tokio` `DelayQueue`, hash sets, hash maps, and vectors. The main logic of `ReprocessQueue` is implemented in its `handle_message` method; it is quite complicated and appears to be protocol-specific.
`BeaconProcessor` coordinates activities in its manager task, which can spawn worker tasks by calling the `spawn_worker` method. The `spawn_worker` method spawns worker tasks that execute the corresponding method of the `Worker` structure, depending on the kind of work item. There are numerous such methods, and they appear protocol-specific.
The central component of Lighthouse is the `BeaconChain` structure, which represents the "Beacon Chain" component of Ethereum 2.0. In particular, it maintains an instance of the `CanonicalHead` structure, which represents the "canonical head" of the beacon chain. `CanonicalHead` is a wrapper around the `ForkChoice` structure; it synchronizes concurrent access to `ForkChoice` and `CachedHead` (the cached results of the last execution of `ForkChoice::get_head`). It keeps `ForkChoice` and `CachedHead` each behind its own reader-writer lock. In order to prevent concurrent recomputation of the canonical head, `CanonicalHead` also includes a `Mutex<()>`. As mentioned in the module-level documentation, this requires great care to avoid deadlocks. The reason for this design was to ensure fast concurrent access to `CachedHead`.
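The resulting lock layout might be pictured as follows. This is a hypothetical sketch assuming `parking_lot` locks; the real `CanonicalHead` carries more state and exposes dedicated read/write accessors instead of raw fields.

```rust
use parking_lot::{Mutex, RwLock};

// Placeholder types standing in for the real fork choice state.
struct ForkChoice { /* proto-array, queued attestations, ... */ }
struct CachedHead { head_block_root: [u8; 32] }

// Hypothetical analogue of CanonicalHead's lock layout.
struct CanonicalHead {
    // Taken only while recomputing the head, to keep recomputations sequential.
    recompute_head_lock: Mutex<()>,
    // Fork choice state: written during recomputation, read elsewhere.
    fork_choice: RwLock<ForkChoice>,
    // Cheap, frequently read snapshot of the last get_head result.
    cached_head: RwLock<CachedHead>,
}

impl CanonicalHead {
    fn recompute_head(&self) {
        // Serialize recomputations without holding either RwLock the whole time.
        let _recompute_guard = self.recompute_head_lock.lock();

        // Run fork choice under the write lock, then release it promptly.
        let new_root = {
            let _fork_choice = self.fork_choice.write();
            [0u8; 32] // placeholder for the result of ForkChoice::get_head(...)
        };

        // Publish the result; readers of cached_head are blocked only briefly.
        self.cached_head.write().head_block_root = new_root;
    }

    // Fast path used by the rest of the node: a short read lock on the cache.
    fn head_root(&self) -> [u8; 32] {
        self.cached_head.read().head_block_root
    }
}

fn main() {
    let head = CanonicalHead {
        recompute_head_lock: Mutex::new(()),
        fork_choice: RwLock::new(ForkChoice {}),
        cached_head: RwLock::new(CachedHead { head_block_root: [0; 32] }),
    };
    head.recompute_head();
    println!("head root: {:?}", head.head_root());
}
```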
The `ForkChoice` structure represents the core logic of the consensus protocol -- the fork choice algorithm. It wraps the persistent storage, the `ProtoArrayForkChoice` structure, and attestations queued for later processing; it also caches the values required to be sent to the execution layer and the result of the last invocation of the `get_head` method. `ProtoArrayForkChoice` represents a DAG of recent blocks.
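The general idea of a proto-array-style block DAG can be illustrated with a heavily simplified, hypothetical sketch (not Lighthouse's actual `ProtoArrayForkChoice`, which additionally tracks justified/finalized checkpoints, execution status, and cached best-child/best-descendant links):

```rust
// Blocks live in a flat Vec; each node points to its parent by index, and
// weights accumulate attestation support. find_head greedily follows the
// heaviest child from a starting block down to a leaf.
struct ProtoNode {
    parent: Option<usize>, // index of the parent block in `nodes`
    weight: u64,           // attestation weight supporting this block's subtree
}

#[derive(Default)]
struct ProtoArray {
    nodes: Vec<ProtoNode>,
}

impl ProtoArray {
    /// Append a block to the DAG and return its index.
    fn on_block(&mut self, parent: Option<usize>) -> usize {
        self.nodes.push(ProtoNode { parent, weight: 0 });
        self.nodes.len() - 1
    }

    /// Add attestation weight to a block and all of its ancestors.
    fn add_weight(&mut self, mut index: usize, delta: u64) {
        loop {
            self.nodes[index].weight += delta;
            match self.nodes[index].parent {
                Some(parent) => index = parent,
                None => break,
            }
        }
    }

    /// Greedily follow the heaviest child from `start` to a leaf (the "head").
    fn find_head(&self, start: usize) -> usize {
        let mut head = start;
        loop {
            let best_child = self
                .nodes
                .iter()
                .enumerate()
                .filter(|(_, n)| n.parent == Some(head))
                .max_by_key(|(i, n)| (n.weight, *i))
                .map(|(i, _)| i);
            match best_child {
                Some(child) => head = child,
                None => return head,
            }
        }
    }
}

fn main() {
    let mut dag = ProtoArray::default();
    let genesis = dag.on_block(None);
    let a = dag.on_block(Some(genesis));
    let b = dag.on_block(Some(genesis));
    dag.add_weight(a, 10);
    dag.add_weight(b, 25);
    assert_eq!(dag.find_head(genesis), b); // the heavier branch wins
}
```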
The fork choice algorithm determines the canonical head and the justified/finalized checkpoints of the beacon chain. It is triggered by calling the `recompute_head_at_current_slot` or `recompute_head_at_slot` async methods of `BeaconChain`. Those methods spawn a new blocking task to call the `recompute_head_at_slot_internal` method, which performs long-running, potentially blocking operations. If the head or the checkpoints have changed, this method updates other components, notifies the execution client, and generates corresponding server-sent events.
`BeaconChain::recompute_head_at_slot` is called by the state advance timer; `BeaconChain::recompute_head_at_current_slot` is called by the timer service on every slot, as well as when processing blocks. Apart from `get_head`, `ForkChoice` provides the following methods that can update its state:
- `on_valid_execution_payload`, which updates the DAG with information about validated execution payloads;
- `on_invalid_execution_payload`, which invalidates some blocks in the DAG;
- `on_block`, which adds a block to the DAG;
- `on_attestation`, which registers an attestation in the DAG;
- `on_attester_slashing`, which applies an attester slashing to the fork choice;
- `update_time`, which advances the logical clock to match the supplied current slot.
These methods get called through various methods of `BeaconChain`, e.g. when processing incoming network messages, requests from the validator client, results from the execution client, etc.
Lighthouse uses the `slog` crate for logging diagnostic messages augmented with key-value data context, using tree-structured loggers. The code extensively collects various metrics based on the `prometheus` crate. There is apparently no specific mechanism for controlling execution.
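For illustration, the tree-structured, key-value style of `slog` logging looks roughly like this. This is a generic `slog`/`slog-term` usage sketch, not a fragment of Lighthouse, which configures its own drains.

```rust
use slog::{info, o, Drain, Logger};

fn main() {
    // A simple synchronous terminal drain.
    let decorator = slog_term::PlainSyncDecorator::new(std::io::stdout());
    let drain = slog_term::FullFormat::new(decorator).build().fuse();

    // Root logger with key-value context attached to every record.
    let root = Logger::root(drain, o!("client" => "lighthouse", "version" => "v4.2.0"));

    // Child loggers add the sub-service's name, mirroring how RuntimeContext::service_context
    // augments the diagnostic logger for each sub-service.
    let network_log = root.new(o!("service" => "network"));
    info!(network_log, "peer connected"; "peer_count" => 42, "direction" => "inbound");
}
```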