Overview of AptosBFT implementation

Following is an overview of the AptosBFT implementation, as of version aptos-cli-v1.0.13 (April 2023). This overview was prepared in the scope of this issue.

General Structure

The consensus component supports state machine replication using the AptosBFT consensus protocol, a variant of the HotStuff consensus protocol. The protocol is mostly implemented in async Rust, using the tokio framework, and generally following the actor programming model.

Actors are typically represented as types with an associated async start function, e.g. the NetworkTask structure and its start method. Such a start function usually runs a loop that polls async streams with the futures::select macro and handles the received values. It is supposed to be executed in its own asynchronous task, e.g. using tokio::runtime::Runtime::spawn or tokio::runtime::Handle::spawn. These tasks typically interact with each other using two kinds of async channels, both implemented in the aptos-channels crate. The channels are constructed using the following constructor functions:

  • aptos_channels::new, which creates a simple bounded FIFO channel with backpressure,
  • aptos_channels::aptos_channel::new, which creates a bounded multi-queue channel with no backpressure and one of the supported policies for dropping and retrieving messages:
    • LIFO (oldest messages are dropped),
    • FIFO (newest messages are dropped),
    • KLAST (oldest messages are dropped, but remaining are retrieved in FIFO order).

Some values that are sent over channels include the sender part of a one-shot channel from futures::channel::oneshot, for communicating back the response. Occasionally, futures::channel::mpsc and tokio::sync::mpsc channels are also used instead of the channels from the aptos-channels crate.
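To make the pattern concrete, below is a minimal, self-contained sketch of such an actor loop. It deliberately uses plain futures::channel::mpsc and futures::channel::oneshot channels instead of the aptos-channels crate, and the EchoActor and Request names are hypothetical; the real consensus actors follow the same shape, but with richer message types and channel policies.

```rust
use futures::{
    channel::{mpsc, oneshot},
    select, FutureExt, StreamExt,
};

/// Hypothetical request type: carries a payload and the sender part of a
/// one-shot channel for communicating the response back to the caller.
struct Request {
    payload: String,
    response_tx: oneshot::Sender<String>,
}

/// A minimal actor in the style described above: it owns its input
/// channels and drains them in a select loop.
struct EchoActor {
    requests: mpsc::Receiver<Request>,
    shutdown: oneshot::Receiver<()>,
}

impl EchoActor {
    async fn start(self) {
        let EchoActor { mut requests, shutdown } = self;
        let mut shutdown = shutdown.fuse();
        loop {
            select! {
                req = requests.select_next_some() => {
                    // Handle the message and reply through the embedded
                    // one-shot channel; ignore a dropped receiver.
                    let _ = req.response_tx.send(format!("echo: {}", req.payload));
                },
                _ = shutdown => break,
                complete => break,
            }
        }
    }
}

#[tokio::main]
async fn main() {
    let (mut req_tx, req_rx) = mpsc::channel(8);
    let (stop_tx, stop_rx) = oneshot::channel();
    // Spawn the actor in its own asynchronous task, as start_consensus
    // does for the real consensus actors.
    let actor = tokio::spawn(EchoActor { requests: req_rx, shutdown: stop_rx }.start());

    let (resp_tx, resp_rx) = oneshot::channel();
    req_tx
        .try_send(Request { payload: "hello".into(), response_tx: resp_tx })
        .unwrap();
    println!("{}", resp_rx.await.unwrap());

    let _ = stop_tx.send(());
    actor.await.unwrap();
}
```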

Communication

Communication between Aptos nodes happens according to the AptosNet protocol. Nodes try to maintain at most one connection with each remote peer; the application protocols, such as consensus, mempool, etc., are multiplexed over the single TCP connection. Application protocols are identified by a numerical ProtocolId. Peers are identified by their account address. AptosNet provides application protocols with two primary interfaces:

  • DirectSend for fire-and-forget style message delivery,
  • RPC for unary Remote Procedure Calls.
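As a rough illustration of the multiplexing idea, the sketch below tags each frame with a protocol identifier and, for RPC, an identifier correlating a response with its request. All names and the exact layout are hypothetical and do not reflect the actual AptosNet wire format.

```rust
/// Hypothetical protocol tag; the real ProtocolId values differ.
#[derive(Clone, Copy, Debug)]
#[repr(u8)]
pub enum ProtocolTag {
    ConsensusDirectSend = 0,
    ConsensusRpc = 1,
    MempoolDirectSend = 2,
}

/// Hypothetical frame carried over the single multiplexed connection.
pub struct Frame {
    pub protocol: ProtocolTag,
    /// Present only for RPC requests and responses.
    pub request_id: Option<u64>,
    pub payload: Vec<u8>,
}
```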

To interact with the networking layer, application protocols are supplied with the NetworkClient and NetworkServiceEvents types, both generic over the message type. The message type has to satisfy the Message trait constraint; the Message trait has a blanket implementation for any type implementing both the serde::Serialize and serde::de::DeserializeOwned traits.
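The blanket implementation described above can be sketched as follows; the exact bounds in the Aptos code base may differ in detail, and the PingMsg type is only an example.

```rust
use serde::{de::DeserializeOwned, Deserialize, Serialize};

/// Sketch of a blanket-implemented marker trait along the lines described
/// above; the bounds in the real code base may differ in detail.
pub trait Message: Serialize + DeserializeOwned {}
impl<T: Serialize + DeserializeOwned> Message for T {}

/// Any serde-serializable type automatically satisfies the constraint.
#[derive(Serialize, Deserialize)]
pub struct PingMsg {
    pub nonce: u64,
}

fn assert_message<M: Message>() {}

fn main() {
    assert_message::<PingMsg>();
}
```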

NetworkClient implements the NetworkClientInterface trait and provides basic support for sending messages, disconnecting from peers, notifying the network stack of new peers, and managing some application-specific metadata for each peer. In particular, its send_to_peer and send_to_peers methods allow sending a message to one or many peers, though there is no guarantee of successful message delivery. The async send_to_peer_rpc method provides RPC-style communication with a timeout: it sends the given message to the remote peer, then awaits and returns a response message, or fails after a timeout.

The implementation of NetworkClient is based on the NetworkSender structure, which is responsible for de-/serializing messages and communicating with the lower network layer. NetworkSender is a wrapper around PeerManagerRequestSender and ConnectionRequestSender, both of which are, in turn, convenience wrappers around aptos_channels::aptos_channel::Sender. PeerManagerRequestSender and ConnectionRequestSender are used to issue communication and connection requests, respectively, and await the responses from PeerManager. For each RPC request, PeerManagerRequestSender creates an instance of the one-shot channel from futures::channel::oneshot, in order to receive the response message.
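The following sketch illustrates this request/one-shot-response pattern combined with a timeout, using plain futures and tokio primitives; RpcRequest, send_rpc, and the queue type are hypothetical stand-ins for the actual PeerManager plumbing.

```rust
use std::time::Duration;

use futures::channel::{mpsc, oneshot};
use tokio::time::timeout;

/// Hypothetical request handed to the lower layer; the peer manager side
/// is expected to deliver the response through `response_tx`.
pub struct RpcRequest {
    pub raw_request: Vec<u8>,
    pub response_tx: oneshot::Sender<Vec<u8>>,
}

/// A sketch of RPC-style communication with a timeout: enqueue a request
/// carrying a one-shot sender, then await the response with a deadline.
pub async fn send_rpc(
    queue: &mut mpsc::Sender<RpcRequest>,
    raw_request: Vec<u8>,
    deadline: Duration,
) -> Result<Vec<u8>, &'static str> {
    let (response_tx, response_rx) = oneshot::channel();
    queue
        .try_send(RpcRequest { raw_request, response_tx })
        .map_err(|_| "failed to enqueue the request")?;
    timeout(deadline, response_rx)
        .await
        .map_err(|_| "rpc timed out")?
        .map_err(|_| "request was dropped before a response was sent")
}
```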

NetworkServiceEvents is used to receive and respond to network events. It is just a collection of NetworkEvents structures, one per NetworkId, which are responsible for de-/serializing messages and communicating with the lower network layer. NetworkEvents represents a stream of events from the lower network layer. It is a wrapper around a pair of aptos_channels::aptos_channel::Receivers, which receive communication and connection notifications from PeerManager. NetworkEvents implements the async Stream interface by merging those two channels into a single stream of Event values.
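Merging two receivers into a single event stream can be sketched as follows; the Event type and merge_events function are hypothetical, and the real NetworkEvents additionally deserializes the raw messages it receives.

```rust
use futures::{channel::mpsc, stream, Stream, StreamExt};

/// Hypothetical unified event type, analogous to the Event values
/// produced by NetworkEvents.
pub enum Event {
    Message(Vec<u8>),
    PeerConnected(u64),
}

/// A sketch of merging two receivers into one event stream, in the spirit
/// of the NetworkEvents wrapper described above.
pub fn merge_events(
    messages: mpsc::Receiver<Vec<u8>>,
    connections: mpsc::Receiver<u64>,
) -> impl Stream<Item = Event> {
    stream::select(
        messages.map(Event::Message),
        connections.map(Event::PeerConnected),
    )
}
```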

Consensus Protocol Implementation

The consensus protocol implementation mostly follows the actor programming model. The top-level component is EpochManager, which manages consensus components across epochs and connects to other components constructed in the start_consensus function.

Messages generated in the consensus components and directed to the node itself are looped back to NetworkTask through a channel created in the start_consensus function. Even though the loopback channel never drops messages and provides backpressure, the intermediate channels internally used in NetworkTask can drop excess messages.
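The loopback idea can be sketched as follows, with hypothetical PeerId, ConsensusMsg, and LoopbackSender types: messages addressed to the local peer are short-circuited into a local channel instead of being handed to the network stack.

```rust
use futures::channel::mpsc;

/// Hypothetical identifier and message types, for illustration only.
pub type PeerId = u64;
pub struct ConsensusMsg(pub Vec<u8>);

/// A sketch of the loopback idea: messages addressed to the local peer are
/// looped back into a local channel instead of going through the network
/// stack.
pub struct LoopbackSender {
    pub own_peer: PeerId,
    pub self_tx: mpsc::UnboundedSender<ConsensusMsg>,
}

impl LoopbackSender {
    pub fn send(&self, to: PeerId, msg: ConsensusMsg) {
        if to == self.own_peer {
            // Loop the message back to the local event loop.
            let _ = self.self_tx.unbounded_send(msg);
        } else {
            // Hand the message to the networking layer (omitted here).
        }
    }
}
```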

EpochManager, in turn, creates a bunch of other components, such as:

  • SafetyRulesManager, which provides an instance of the TSafetyRules trait responsible for the safety of the consensus protocol;
  • EpochState, which implements the Verifier trait, for validating messages in the epoch;
  • RoundState for keeping information about a specific round and moving forward when receiving new certificates;
  • ProposerElection, which incorporates the logic of choosing a leader among multiple candidates;
  • NetworkSender, which implements support for all consensus messaging using ConsensusNetworkClient;
  • QuorumStoreBuilder, which apparently coordinates interaction with the mempool and quorum store through PayloadManager and a bunch of channels;
  • QuorumStoreClient, which implements the PayloadClient trait and can pull information about transactions from the mempool;
  • BlockStore, which implements the BlockReader trait, for maintaining all the blocks of payload and the dependencies of those blocks;
  • ProposalGenerator for generating proposal blocks on demand.

Details of the interaction between those components and further sub-components are complicated, difficult to follow, and therefore fall outside the scope of this overview. It is worth noting, though, that EpochManager uses a bounded async task executor to spawn tasks that verify incoming messages and forward them for further processing to the appropriate internal channels. The implementation also does not fully follow the actor programming model, e.g. BlockStore is accessed in parallel by several sub-components.
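A bounded async task executor of the kind mentioned above can be sketched with a tokio semaphore; the BoundedExecutor name and interface here are hypothetical and only illustrate the idea of capping the number of concurrently running verification tasks.

```rust
use std::{future::Future, sync::Arc};

use tokio::sync::Semaphore;
use tokio::task::JoinHandle;

/// A sketch of a bounded executor: at most `capacity` spawned tasks run
/// concurrently, and further spawns wait until a slot becomes free.
#[derive(Clone)]
pub struct BoundedExecutor {
    semaphore: Arc<Semaphore>,
}

impl BoundedExecutor {
    pub fn new(capacity: usize) -> Self {
        Self { semaphore: Arc::new(Semaphore::new(capacity)) }
    }

    pub async fn spawn<F>(&self, task: F) -> JoinHandle<F::Output>
    where
        F: Future + Send + 'static,
        F::Output: Send + 'static,
    {
        // Wait for a free slot before spawning; the permit is released
        // when the spawned task finishes.
        let permit = self.semaphore.clone().acquire_owned().await.unwrap();
        tokio::spawn(async move {
            let result = task.await;
            drop(permit);
            result
        })
    }
}
```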

Tracing and Controlling Execution

The Aptos node implements its own diagnostic logging mechanism in the aptos-logger crate. It provides a set of logging macros for emitting logs at different levels and for sampled logging when logging a large amount of data would be expensive; it also supports attaching structured data to a formatted text message, as well as typed logging schemas. The code also extensively collects metrics using the prometheus crate, e.g. the channel implementations in the aptos-channels crate accept metrics counters to keep track of the number of items in the queue and the number of dropped items.
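As an illustration of the metrics pattern, the sketch below defines the kind of counters a channel could accept; the ChannelMetrics type and metric names are hypothetical, not the ones actually registered by the aptos-channels crate.

```rust
use prometheus::{IntCounter, IntGauge};

/// Hypothetical metrics a channel could accept: a gauge for the current
/// queue length and a counter for dropped items.
pub struct ChannelMetrics {
    queue_len: IntGauge,
    dropped: IntCounter,
}

impl ChannelMetrics {
    pub fn new() -> prometheus::Result<Self> {
        Ok(Self {
            queue_len: IntGauge::new("queue_len", "Items currently in the queue")?,
            dropped: IntCounter::new("dropped_msgs", "Messages dropped by the channel")?,
        })
    }

    pub fn on_push(&self) { self.queue_len.inc(); }
    pub fn on_pop(&self) { self.queue_len.dec(); }
    pub fn on_drop_item(&self) { self.dropped.inc(); }
}
```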

To dynamically inject errors at runtime in integration tests, the code includes fail points using the fail crate. The consensus implementation does not use system time for time-dependent operations, such as scheduling timeouts or sleeping; instead, it uses a supplied abstraction, represented by the TimeService trait. This allows using simulated time in tests, as implemented by SimulatedTimeService.
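The idea of abstracting time behind a trait can be sketched as follows; the trait shown here is a hypothetical simplification, as the actual TimeService interface is richer and also covers operations such as sleeping.

```rust
use std::{
    sync::Mutex,
    time::{Duration, SystemTime, UNIX_EPOCH},
};

/// A hypothetical, simplified time abstraction in the spirit of the
/// TimeService trait: components ask the injected clock for the current
/// time instead of reading the system clock directly.
pub trait TimeService: Send + Sync {
    fn now(&self) -> Duration;
}

/// Production implementation backed by the system clock.
pub struct RealTimeService;

impl TimeService for RealTimeService {
    fn now(&self) -> Duration {
        SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .expect("system clock before the UNIX epoch")
    }
}

/// Test implementation with manually advanced time, in the spirit of
/// SimulatedTimeService.
#[derive(Default)]
pub struct SimulatedTime {
    now: Mutex<Duration>,
}

impl SimulatedTime {
    pub fn advance(&self, by: Duration) {
        *self.now.lock().unwrap() += by;
    }
}

impl TimeService for SimulatedTime {
    fn now(&self) -> Duration {
        *self.now.lock().unwrap()
    }
}

/// Components compute deadlines from the injected clock, so tests can
/// drive time forward deterministically.
pub fn compute_deadline(clock: &dyn TimeService, timeout: Duration) -> Duration {
    clock.now() + timeout
}
```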