
Overview of Lighthouse Ethereum consensus client implementation


The following is an overview of the Lighthouse Ethereum consensus client implementation, as of version v4.2.0 (May 2023). This overview was prepared in the scope of this issue.

General Structure

Lighthouse is an Ethereum consensus client written in Rust, which provides a proof-of-stake consensus mechanism to execution clients after The Merge. It has been stable and production-ready since Beacon Chain genesis.

Lighthouse is a fairly complicated piece of software. It is mostly implemented in async Rust, using the tokio framework. The tokio runtime is encapsulated in the Environment structure. To spawn new asynchronous tasks, ordinary Lighthouse code uses the TaskExecutor structure from the RuntimeContext structure, which is obtained using the core_context or service_context method of Environment. The service_context method of RuntimeContext itself allows creating sub-contexts by augmenting the diagnostic logger with the sub-service's name.
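
For illustration, the following sketch shows the general shape of this layering with simplified, hypothetical stand-ins for Environment, RuntimeContext, and TaskExecutor (the real types carry loggers, shutdown channels, and metrics):

```rust
// Illustrative sketch only: stripped-down, hypothetical analogues of the
// Environment / RuntimeContext / TaskExecutor layering described above.
use std::future::Future;
use tokio::runtime::Runtime;

struct TaskExecutor {
    handle: tokio::runtime::Handle,
    service_name: String, // used to tag diagnostics in the real client
}

impl TaskExecutor {
    fn spawn(&self, fut: impl Future<Output = ()> + Send + 'static, name: &str) {
        println!("[{}] spawning task: {name}", self.service_name);
        self.handle.spawn(fut);
    }
}

struct RuntimeContext {
    executor: TaskExecutor,
}

impl RuntimeContext {
    // Sub-contexts augment the diagnostic context with the sub-service's name.
    fn service_context(&self, name: &str) -> RuntimeContext {
        RuntimeContext {
            executor: TaskExecutor {
                handle: self.executor.handle.clone(),
                service_name: format!("{}/{name}", self.executor.service_name),
            },
        }
    }
}

struct Environment {
    runtime: Runtime,
}

impl Environment {
    fn core_context(&self) -> RuntimeContext {
        RuntimeContext {
            executor: TaskExecutor {
                handle: self.runtime.handle().clone(),
                service_name: "core".into(),
            },
        }
    }
}

fn main() {
    let environment = Environment { runtime: Runtime::new().unwrap() };
    let network_context = environment.core_context().service_context("network");
    network_context
        .executor
        .spawn(async { /* service main loop */ }, "network_main");
    std::thread::sleep(std::time::Duration::from_millis(100));
}
```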

The code commonly uses traditional synchronization primitives, such as mutexes, reader-writer locks, and condition variables, from tokio::sync and parking_lot, together with reference-counted smart pointers. In some places, e.g. here, it uses Mutex<()> to prevent concurrent execution of certain code sections. Some asynchronous tasks communicate through bounded or unbounded multiple-producer, single-consumer (MPSC) channels. Pieces of data that cannot be immediately processed get stored in various queues and maps, most of them unbounded. It is not clear what prevents those unbounded buffers from growing excessively.
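
A minimal sketch of two of these patterns, with illustrative names only, is shown below: a Mutex<()> that guards no data but serializes a code section, and an unbounded MPSC channel whose growth is limited only by how fast the receiver drains it:

```rust
// Minimal sketch of two patterns mentioned above, with illustrative names.
use std::sync::Arc;
use parking_lot::Mutex;
use tokio::sync::mpsc;

#[tokio::main]
async fn main() {
    // A lock that guards no data: holding the guard simply prevents
    // concurrent execution of the section below.
    let section_guard: Arc<Mutex<()>> = Arc::new(Mutex::new(()));
    {
        let _guard = section_guard.lock();
        // ... critical section: only one task at a time runs this ...
    }

    // An unbounded channel: senders never block or fail while the receiver
    // is alive, so nothing here bounds the number of queued messages if the
    // receiver falls behind.
    let (tx, mut rx) = mpsc::unbounded_channel::<String>();
    tx.send("first work item".to_string()).unwrap();
    tx.send("second work item".to_string()).unwrap();
    drop(tx); // close the channel so the loop below terminates
    while let Some(item) = rx.recv().await {
        println!("processing: {item}");
    }
}
```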

Various caches and data pools are ubiquitous in the code base, e.g. in the BeaconChain structure, clearly for performance reasons. The implementation also makes fine-grained, protocol-specific optimizations, such as prioritizing certain asynchronous work items, e.g. here.

Communication

Lighthouse uses libp2p as its networking stack, supporting TCP/IP, optionally WebSockets over TCP/IP, Noise as the encryption layer, and mplex/yamux as multiplexing layers. It employs the gossipsub and identify protocols from the libp2p stack and implements the following custom protocols:

  • Eth2 RPC that facilitates direct peer-to-peer communication primarily for sending/receiving chain information for syncing,
  • Discovery that handles queries and manages access to the discovery routing table.

In addition, there is the PeerManager, which handles peer reputation and connection status.

The communication layer is represented by the Network structure, which encapsulates the libp2p swarm. Internal services do not interact with Network directly; instead, the node spawns the network service with the start async method of the NetworkService structure and uses the returned instances of the NetworkGlobals and NetworkSenders structures. NetworkGlobals represents a collection of variables that are accessible outside of the network service itself. NetworkSenders provides the senders for the internal channels used to communicate with the network service, notably network_send and validator_subscription_send (used below).
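
The following sketch shows the general shape of this pattern under simplified, assumed types (the real NetworkGlobals and NetworkSenders are richer): the service owns its event loop, while the rest of the node holds only shared state and channel senders:

```rust
// Simplified sketch of the start-and-return-handles pattern described above.
// Types are illustrative stand-ins, not the real definitions.
use std::sync::Arc;
use parking_lot::RwLock;
use tokio::sync::mpsc;

#[derive(Default)]
struct Globals {
    peer_count: RwLock<usize>, // read-mostly state visible to the rest of the node
}

enum InternalMessage {
    Publish(String),
    Shutdown,
}

struct Senders {
    network_send: mpsc::UnboundedSender<InternalMessage>,
}

fn start_service() -> (Arc<Globals>, Senders) {
    let globals = Arc::new(Globals::default());
    let (network_send, mut network_recv) = mpsc::unbounded_channel();

    let service_globals = globals.clone();
    tokio::spawn(async move {
        while let Some(msg) = network_recv.recv().await {
            match msg {
                InternalMessage::Publish(data) => {
                    // In the real service this is handed to the libp2p swarm;
                    // here we only read the shared state to show how it is used.
                    let peers = *service_globals.peer_count.read();
                    println!("publishing {} bytes to {} peers", data.len(), peers);
                }
                InternalMessage::Shutdown => break,
            }
        }
    });

    (globals, Senders { network_send })
}

#[tokio::main]
async fn main() {
    let (globals, senders) = start_service();
    *globals.peer_count.write() = 3;
    senders.network_send.send(InternalMessage::Publish("hello".into())).unwrap();
    senders.network_send.send(InternalMessage::Shutdown).unwrap();
    tokio::time::sleep(std::time::Duration::from_millis(50)).await;
}
```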

The NetworkService::start method creates, among other things, the Router, AttestationService, and SyncCommitteeService services, and finally spawns an asynchronous task for handling networking layer events by invoking the spawn_service method. Internal messages sent through the NetworkSenders::network_send channel are handled by the on_network_msg method, those sent through NetworkSenders::validator_subscription_send -- by on_validator_subscription_msg, those from AttestationService -- by on_attestation_service_msg, those from SyncCommitteeService -- by on_sync_committee_service_message, etc. Events from the libp2p swarm are first handled by Network::poll_network and then directed to Router in the NetworkService::on_libp2p_event method.
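
The dispatch loop roughly has the shape sketched below, where the channel and handler names are illustrative simplifications of the ones listed above:

```rust
// Sketch of an event-multiplexing loop: one task selects over several
// channels and dispatches each message kind to its own handler.
use tokio::sync::mpsc;

enum NetworkMsg { Publish(Vec<u8>) }
enum SubscriptionMsg { Subscribe(u64) }

async fn service_loop(
    mut network_recv: mpsc::UnboundedReceiver<NetworkMsg>,
    mut subscription_recv: mpsc::UnboundedReceiver<SubscriptionMsg>,
) {
    loop {
        tokio::select! {
            Some(msg) = network_recv.recv() => on_network_msg(msg),
            Some(msg) = subscription_recv.recv() => on_validator_subscription_msg(msg),
            // In the real service there are further branches for the
            // attestation / sync-committee services and libp2p swarm events.
            else => break,
        }
    }
}

fn on_network_msg(_msg: NetworkMsg) { /* dispatch to the network layer */ }
fn on_validator_subscription_msg(_msg: SubscriptionMsg) { /* update subscriptions */ }

#[tokio::main]
async fn main() {
    let (net_tx, net_rx) = mpsc::unbounded_channel();
    let (sub_tx, sub_rx) = mpsc::unbounded_channel();
    net_tx.send(NetworkMsg::Publish(vec![1, 2, 3])).unwrap();
    sub_tx.send(SubscriptionMsg::Subscribe(7)).unwrap();
    drop((net_tx, sub_tx)); // close channels so the loop exits
    service_loop(net_rx, sub_rx).await;
}
```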

There are two main ways the beacon nodes communicate with each other through the network: sending peer-to-peer request/response messages (RPC) and gossip.

RPC

Outbound RPC requests are initiated by sending them to the network_send channel, wrapped into NetworkMessage::SendRequest, which first gets dispatched by NetworkService::on_network_msg to Network::send_request, and then to RPC::send_request. RPC first checks the self rate limit, if configured, passing the message to an instance of the SelfRateLimiter structure. If the message exceeds the rate limit, it is pushed to the unbounded delayed_requests queue; otherwise, it is pushed to the unbounded RPC::events queue.
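
The following is a simplified, token-bucket-style sketch of this behaviour; it is not Lighthouse's SelfRateLimiter, but shows how within-limit requests go straight to the outbound event queue while over-limit requests are parked in an unbounded delayed queue:

```rust
// Illustrative self-rate-limiting sketch (simplified token bucket).
use std::collections::VecDeque;
use std::time::Instant;

struct SelfLimiter<T> {
    tokens: f64,
    max_tokens: f64,
    refill_per_sec: f64,
    last_refill: Instant,
    delayed_requests: VecDeque<T>, // unbounded, drained once tokens return
    events: VecDeque<T>,           // unbounded, polled by the swarm
}

impl<T> SelfLimiter<T> {
    fn send_request(&mut self, req: T) {
        self.refill();
        if self.tokens >= 1.0 && self.delayed_requests.is_empty() {
            self.tokens -= 1.0;
            self.events.push_back(req);
        } else {
            self.delayed_requests.push_back(req);
        }
    }

    fn refill(&mut self) {
        let elapsed = self.last_refill.elapsed();
        self.last_refill = Instant::now();
        self.tokens =
            (self.tokens + elapsed.as_secs_f64() * self.refill_per_sec).min(self.max_tokens);
    }
}

fn main() {
    let mut limiter = SelfLimiter {
        tokens: 2.0,
        max_tokens: 2.0,
        refill_per_sec: 1.0,
        last_refill: Instant::now(),
        delayed_requests: VecDeque::new(),
        events: VecDeque::new(),
    };
    for i in 0..4 {
        limiter.send_request(format!("request {i}"));
    }
    // With two tokens available, two requests go out and two are delayed.
    println!("queued: {}, delayed: {}", limiter.events.len(), limiter.delayed_requests.len());
}
```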

Similarly, outbound RPC responses are sent to the network_send channel, wrapped into NetworkMessage::SendResponse, which first gets dispatched by NetworkService::on_network_msg to Network::send_response, and then to RPC::send_response. RPC pushes the message directly to the unbounded RPC::events queue.

The libp2p swarm pulls outbound messages from the RPC event queue and self-rate limiter by calling the RPC::poll method and delivers them to RPCHandler by calling its inject_event method. For RPC requests, inject_event calls the send_request method, which normally pushes the message to the dial_queue queue. For RPC responses, inject_event calls the send_response method, which normally pushes the message to the pending_items queue. Requests get pulled from dial_queue in the RPCHandler::poll method and returned to the libp2p swarm as the OutboundSubstreamRequest variant of libp2p::swarm::ConnectionHandlerEvent with libp2p::swarm::SubstreamProtocol, using OutboundRequestContainer as the protocol upgrade. Responses get pulled from pending_items in the RPCHandler::poll method and passed to the send_message_to_inbound_substream function, which converts them into a future that sends the message to the libp2p substream. The future is then stored as InboundState::Busy in the InboundInfo::state field of the corresponding entry of the RPCHandler::inbound_substreams map. The future gets polled in RPCHandler::poll by the libp2p swarm.

Inbound RPC request substreams get passed by the libp2p swarm to the RPCHandler::inject_fully_negotiated_inbound method, which pushes the request to its unbounded events_out queue. The libp2p swarm pulls requests from that queue by calling RPCHandler::poll and delivers them to RPC by calling its inject_event method. For RPC requests, inject_event first checks the RPC rate limit, passing the request to an instance of the RPCRateLimiter structure. If the request exceeds the rate limit, RPC pushes an event to its unbounded events queue to send an error response back to the remote peer; otherwise, it pushes an event to the queue to deliver the request through the libp2p swarm event stream.

The libp2p swarm pulls the inbound messages from the RPC event queue by calling the RPC::poll method. NetworkService pulls the inbound RPC requests from Network, which, in turn, pulls them from the libp2p swarm in the Network::poll_network method. Network::poll_network handles RPC events by calling the inject_rpc_event method, which fully handles some types of RPC requests itself or returns partially handled requests back for further handling. The returned requests get sent to Router in the NetworkService::on_libp2p_event method.

Similarly, inbound RPC response substreams get passed by the libp2p swarm to the RPCHandler::inject_fully_negotiated_outbound method, which pushes the substream into the outbound_substreams map. The outbound substreams get pulled in the RPCHandler::poll method, called by the libp2p swarm. Subsequently, NetworkService pulls the inbound RPC responses from Network, which, in turn, pulls them from the libp2p swarm in the Network::poll_network method. Network::poll_network handles RPC events by calling the inject_rpc_event method, which fully handles some types of RPC responses itself or returns partially handled responses back for further handling. The returned responses get sent to Router in the NetworkService::on_libp2p_event method.

Gossip

Publishing gossip messages is initiated by sending them to the network_send channel, wrapped into NetworkMessage::Publish, which first gets dispatched by NetworkService::on_network_msg to Network::publish, and then to the publish method of libp2p::gossipsub::Gossipsub. If Gossipsub returns PublishError::InsufficientPeers, the message is inserted into GossipCache to be published later. GossipCache does not limit the number of cached messages, but sets an expiration timeout for each gossip topic; messages waiting in the cache longer than the timeout get pruned.
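
A simplified sketch of such a cache is shown below, assuming illustrative field and type names: there is no bound on the number of cached messages, only a per-topic deadline after which entries are pruned:

```rust
// Sketch of a per-topic expiring cache; not the real GossipCache.
use std::collections::{HashMap, VecDeque};
use std::time::{Duration, Instant};

struct ExpiringGossipCache {
    timeouts: HashMap<&'static str, Duration>,            // per-topic expiration
    pending: VecDeque<(Instant, &'static str, Vec<u8>)>,  // (deadline, topic, data)
}

impl ExpiringGossipCache {
    fn insert(&mut self, topic: &'static str, data: Vec<u8>) {
        if let Some(timeout) = self.timeouts.get(topic) {
            self.pending.push_back((Instant::now() + *timeout, topic, data));
        }
        // Topics without a configured timeout are simply not cached.
    }

    /// Remove entries whose topic timeout has elapsed; they will never be
    /// published. The remaining entries can be re-published when peers appear.
    fn prune_expired(&mut self) {
        let now = Instant::now();
        self.pending.retain(|(deadline, _, _)| *deadline > now);
    }
}

fn main() {
    let mut cache = ExpiringGossipCache {
        timeouts: HashMap::from([("beacon_block", Duration::from_secs(12))]),
        pending: VecDeque::new(),
    };
    cache.insert("beacon_block", vec![0xaa]);
    cache.insert("unknown_topic", vec![0xbb]); // no timeout configured: not cached
    cache.prune_expired();
    println!("cached messages: {}", cache.pending.len()); // 1
}
```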

The Network::subscribe method is used to subscribe to new gossipsub topics. There are several places in the code where this happens:

NetworkService pulls the inbound gossip messages from Network, which, in turn, pulls them from the libp2p swarm in the Network::poll_network method. Network::poll_network handles gossip events by calling the inject_gs_event method, which decodes the message and returns it back for further handling. If the returned message is PubsubMessage::Attestation, it is first handled by AttestationService. Then it gets sent to Router in the NetworkService::on_libp2p_event method.

Network reports the results of inbound gossip message validation to gossipsub when it fails to decode a message or when its report_message_validation_result method is called. The latter happens when handling the internal NetworkMessage::ValidationResult message in NetworkService::on_network_msg.

Router

The Router service spawned in NetworkService::start handles incoming network messages by routing them to the appropriate service. NetworkService pushes internal messages of type RouterMessage to the unbounded channel returned by the Router::spawn method. The method spawns the SyncManager and BeaconProcessor services, as well as an asynchronous task handling messages received from the receiving end of the returned channel. The received messages are handled with the handle_message method, which dispatches each message to the appropriate method:

Router notifies SyncManager about failed RPC requests and responds to Status RPC requests directly. Valid BlocksByRange and BlocksByRoot RPC responses are sent to SyncManager for handling. All other RPC request and response messages, as well as gossip messages, are sent to BeaconProcessor for handling, through a bounded channel.
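
The routing decision roughly follows the shape sketched below, with illustrative message variants standing in for the real RouterMessage and RPC types:

```rust
// Sketch of the routing decision: answer Status directly, forward sync-related
// responses to the sync manager, send everything else to the beacon processor
// over a bounded channel.
use tokio::sync::mpsc;

enum RouterMessage {
    StatusRequest { peer: u64 },
    BlocksByRangeResponse { peer: u64, block: Vec<u8> },
    GossipMessage { topic: String, data: Vec<u8> },
}

struct Router {
    sync_send: mpsc::UnboundedSender<(u64, Vec<u8>)>,
    beacon_processor_send: mpsc::Sender<(String, Vec<u8>)>, // bounded
}

impl Router {
    fn handle_message(&mut self, msg: RouterMessage) {
        match msg {
            RouterMessage::StatusRequest { peer } => {
                // Answered directly from the chain's current status.
                println!("responding to status request from peer {peer}");
            }
            RouterMessage::BlocksByRangeResponse { peer, block } => {
                let _ = self.sync_send.send((peer, block));
            }
            RouterMessage::GossipMessage { topic, data } => {
                // try_send drops the message if the bounded queue is full.
                let _ = self.beacon_processor_send.try_send((topic, data));
            }
        }
    }
}

#[tokio::main]
async fn main() {
    let (sync_send, mut sync_recv) = mpsc::unbounded_channel();
    let (bp_send, mut bp_recv) = mpsc::channel(16);
    let mut router = Router { sync_send, beacon_processor_send: bp_send };

    router.handle_message(RouterMessage::StatusRequest { peer: 1 });
    router.handle_message(RouterMessage::BlocksByRangeResponse { peer: 1, block: vec![0x01] });
    router.handle_message(RouterMessage::GossipMessage { topic: "beacon_block".into(), data: vec![0x02] });

    println!("sync got: {:?}", sync_recv.recv().await);
    println!("beacon processor got: {:?}", bp_recv.recv().await);
}
```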

BeaconProcessor manages a number of asynchronous worker tasks performing time-intensive work on BeaconChain (processing consensus messages received from the network), together with a set of bounded buffers of work queued for later processing. Long-running worker tasks get spawned with spawn_blocking, i.e. outside the main tokio executor. For different kinds of work, it uses different types of bounded queues: FifoQueue and LifoQueue. When full, FifoQueue drops any new item being pushed to it, while LifoQueue drops the least recently added item to accommodate a new one.
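
The two queue policies can be sketched as follows, assuming that "drops the least recently added item" means the oldest entry is evicted to make room; these are simplified stand-ins, not the actual FifoQueue and LifoQueue:

```rust
// Simplified sketch of the two bounded-queue policies described above.
use std::collections::VecDeque;

struct FifoQueue<T> { items: VecDeque<T>, capacity: usize }
struct LifoQueue<T> { items: VecDeque<T>, capacity: usize }

impl<T> FifoQueue<T> {
    /// When full, the new item is dropped.
    fn push(&mut self, item: T) {
        if self.items.len() < self.capacity {
            self.items.push_back(item);
        }
    }
    fn pop(&mut self) -> Option<T> { self.items.pop_front() }
}

impl<T> LifoQueue<T> {
    /// When full, the oldest item is evicted to accommodate the new one.
    fn push(&mut self, item: T) {
        if self.items.len() >= self.capacity {
            self.items.pop_front();
        }
        self.items.push_back(item);
    }
    fn pop(&mut self) -> Option<T> { self.items.pop_back() }
}

fn main() {
    let mut fifo = FifoQueue { items: VecDeque::new(), capacity: 2 };
    let mut lifo = LifoQueue { items: VecDeque::new(), capacity: 2 };
    for i in 0..3 {
        fifo.push(i);
        lifo.push(i);
    }
    // FIFO kept the oldest two items; LIFO kept the newest two.
    println!("fifo pops {:?}, lifo pops {:?}", fifo.pop(), lifo.pop()); // Some(0), Some(2)
}
```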

BeaconProcessor also spawns ReprocessQueue, which manages work items that cannot be processed immediately and should be re-processed later. ReprocessQueue receives and delivers back work items through a bounded channel; internally it maintains queued items in a number of buffers, using containers such as tokio's DelayQueue, hash sets, hash maps, and vectors. The main logic of ReprocessQueue is implemented in the handle_message method; it is quite complicated and appears to be protocol-specific.
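
The general re-processing idea can be sketched as below; this is a deliberately simplified analogue using a binary heap and sleep_until rather than the DelayQueue and per-kind buffers of the real ReprocessQueue:

```rust
// Simplified sketch: work that cannot be handled yet is parked with a
// deadline and sent back to a "ready" channel once the deadline passes.
use std::cmp::Reverse;
use std::collections::BinaryHeap;
use tokio::sync::mpsc;
use tokio::time::{sleep_until, Duration, Instant};

async fn reprocess_loop(
    mut incoming: mpsc::Receiver<(Instant, String)>, // (when to retry, work id)
    ready_send: mpsc::Sender<String>,
) {
    let mut parked: BinaryHeap<Reverse<(Instant, String)>> = BinaryHeap::new();
    loop {
        // Wait until either new work arrives or the earliest deadline fires.
        let next_deadline = parked.peek().map(|Reverse((t, _))| *t);
        tokio::select! {
            maybe_item = incoming.recv() => match maybe_item {
                Some(item) => { parked.push(Reverse(item)); }
                None => break,
            },
            _ = sleep_until(next_deadline.unwrap_or_else(|| Instant::now() + Duration::from_secs(3600))),
                if next_deadline.is_some() =>
            {
                if let Some(Reverse((_, work))) = parked.pop() {
                    let _ = ready_send.send(work).await;
                }
            }
        }
    }
}

#[tokio::main]
async fn main() {
    let (park_send, park_recv) = mpsc::channel(16);
    let (ready_send, mut ready_recv) = mpsc::channel(16);
    tokio::spawn(reprocess_loop(park_recv, ready_send));

    // Park a work item for 100 ms, then receive it back once it is due.
    let retry_at = Instant::now() + Duration::from_millis(100);
    park_send.send((retry_at, "late attestation".to_string())).await.unwrap();
    println!("ready again: {:?}", ready_recv.recv().await);
}
```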

BeaconProcessor coordinates activities in its manager task, which can spawn worker tasks by calling the spawn_worker method. The spawn_worker method actually spawns worker tasks executing the corresponding method of the Worker structure, depending on the kind of work item. There are numerous such methods, which appear to be protocol-specific.

Consensus Protocol Implementation

The central component of Lighthouse is the BeaconChain structure, which represents the "Beacon Chain" component of Ethereum 2.0. In particular, it maintains an instance of the CanonicalHead structure, which represents the "canonical head" of the beacon chain. CanonicalHead is a wrapper around the ForkChoice structure; it synchronizes concurrent access to ForkChoice and CachedHead (the cached results of the last execution of ForkChoice::get_head). It keeps ForkChoice and CachedHead each behind its own reader-writer lock. In order to prevent concurrent recomputation of the canonical head, CanonicalHead also includes a Mutex<()>. As mentioned in the module-level documentation, this requires great care to avoid deadlocks. The reason for this design is to ensure fast concurrent access to CachedHead.
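
The locking layout can be sketched as follows with simplified stand-in types; the point is that readers of the cached head take only a short read lock, while recomputation is serialized by a data-less mutex:

```rust
// Sketch of the locking layout described above; not the real CanonicalHead.
use tokio::sync::{Mutex, RwLock};

struct ForkChoiceState { /* proto-array, queued attestations, ... */ }

#[derive(Clone)]
struct CachedHead {
    head_block_root: [u8; 32],
}

struct CanonicalHead {
    fork_choice: RwLock<ForkChoiceState>,
    cached_head: RwLock<CachedHead>,
    recompute_head_lock: Mutex<()>, // guards only the recompute section
}

impl CanonicalHead {
    /// Fast path used throughout the client: a short read lock, no fork
    /// choice work.
    async fn cached_head(&self) -> CachedHead {
        self.cached_head.read().await.clone()
    }

    /// Only one recompute runs at a time; concurrent callers wait here
    /// instead of running fork choice in parallel.
    async fn recompute_head(&self) {
        let _recompute_guard = self.recompute_head_lock.lock().await;
        let new_head = {
            let _fork_choice = self.fork_choice.write().await;
            // ... run get_head over the block DAG (long-running) ...
            CachedHead { head_block_root: [0u8; 32] }
        };
        *self.cached_head.write().await = new_head;
    }
}

#[tokio::main]
async fn main() {
    let head = CanonicalHead {
        fork_choice: RwLock::new(ForkChoiceState {}),
        cached_head: RwLock::new(CachedHead { head_block_root: [0u8; 32] }),
        recompute_head_lock: Mutex::new(()),
    };
    head.recompute_head().await;
    println!("head root: {:?}", head.cached_head().await.head_block_root);
}
```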

The ForkChoice structure represents the core logic of the consensus protocol -- the fork choice algorithm. It wraps persistent storage, the ProtoArrayForkChoice structure, and attestations queued for later processing; it also caches the values required to be sent to the execution layer and the result of the last invocation of the get_head method. ProtoArrayForkChoice represents a DAG of recent blocks.
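
A minimal sketch of a proto-array-style structure is shown below; the field names are illustrative, and the real ProtoArrayForkChoice tracks considerably more per-node state:

```rust
// Illustrative proto-array-style block DAG: nodes stored in a flat vector,
// each pointing to its parent by index and carrying an attestation weight.
type Hash256 = [u8; 32];

struct ProtoNode {
    root: Hash256,
    parent: Option<usize>,         // index into `nodes`, None for the finalized root
    weight: u64,                   // accumulated attestation weight
    best_descendant: Option<usize>,
}

struct ProtoArray {
    nodes: Vec<ProtoNode>,
}

impl ProtoArray {
    /// Follow the best-descendant pointer from the given ancestor to find the head.
    fn find_head(&self, ancestor: usize) -> Hash256 {
        let node = &self.nodes[ancestor];
        match node.best_descendant {
            Some(idx) => self.nodes[idx].root,
            None => node.root,
        }
    }
}

fn main() {
    // Genesis -> A -> B, where B is recorded as the best descendant.
    let array = ProtoArray {
        nodes: vec![
            ProtoNode { root: [0; 32], parent: None, weight: 0, best_descendant: Some(2) },
            ProtoNode { root: [1; 32], parent: Some(0), weight: 10, best_descendant: Some(2) },
            ProtoNode { root: [2; 32], parent: Some(1), weight: 7, best_descendant: None },
        ],
    };
    println!("head root starts with: {}", array.find_head(0)[0]); // 2
}
```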

The fork choice algorithm determines the canonical head and the justified/finalized checkpoints of the beacon chain. It is triggered by calling the recompute_head_at_current_slot or recompute_head_at_slot async methods of BeaconChain. Those methods spawn a new blocking task to call the recompute_head_at_slot_internal method, which performs long-running, potentially blocking operations. If the head or the checkpoints have changed, this method updates other components, notifies the execution client, and generates corresponding server-sent events.

BeaconChain::recompute_head_at_slot is called by the state advance timer; BeaconChain::recompute_head_at_current_slot is called by the timer service on every slot, as well as when processing blocks. Apart from get_head, ForkChoice provides the following methods that can update its state:

These methods get called through various methods of BeaconChain, e.g. when processing incoming network messages, requests from the validator client, results from the execution client, etc.

Tracing and Controlling Execution

Lighthouse uses the slog crate for logging diagnostic messages augmented with key-value data context, using tree-structured loggers. The code extensively collects various metrics based on the prometheus crate. There is apparently no specific mechanism for controlling execution.
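
The sketch below illustrates this logging style using the slog crate; the concrete drain setup (slog-term plus slog-async) and the key-value pairs are illustrative choices, not necessarily Lighthouse's:

```rust
// Example of tree-structured, key-value logging with slog.
use slog::{info, o, Drain, Logger};

fn main() {
    let decorator = slog_term::TermDecorator::new().build();
    let drain = slog_term::FullFormat::new(decorator).build().fuse();
    let drain = slog_async::Async::new(drain).build().fuse();
    let root = Logger::root(drain, o!("client" => "lighthouse"));

    // Child loggers inherit and extend the parent's key-value context, which
    // is how per-service diagnostic contexts are built up.
    let network_log = root.new(o!("service" => "network"));
    info!(network_log, "peer connected"; "peer_id" => "example-peer", "direction" => "inbound");
}
```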