diff --git a/blog/2023-05-30-dynamic-p2p.md b/blog/2023-05-30-dynamic-p2p.md
new file mode 100644
index 00000000..9fe81d9f
--- /dev/null
+++ b/blog/2023-05-30-dynamic-p2p.md
@@ -0,0 +1,347 @@
+---
+slug: 2023-05-30-p2p
+title: "Unveiling Cardano's Dynamic P2P Release: A Leap Forward in Decentralization"
+authors: [bolt12]
+tags: [networking, p2p]
+custom_edit_url: null
+---
+
+## Introduction
+
+As the Cardano ecosystem continues to grow and evolve, we at [IOG] with our partners at
+[Well-Typed], [PNSol] and [CF] are committed to constantly refining and optimising our
+networking infrastructure. Cardano's new highly performant, dynamic Peer-to-Peer (P2P)
+release marks a major milestone in its journey towards creating a fully decentralized and
+secure blockchain platform. Dynamic peer-to-peer (P2P) networking was released with the
+node v.1.35.6 release.
+
+As a real-time stochastic system, Cardano's performance and security are inherently
+intertwined. Our team remains committed to identifying the right equilibrium between
+various factors, such as topological and topographic considerations, that can improve
+timeliness and connectivity.
+
+In this blog post, we'll explore the engineering journey behind the
+development of Cardano's Dynamic P2P design. We'll cover the core design principles, the
+challenges we faced along the way, and the solutions we devised to create a robust and
+scalable networking system.
+
+### What's Dynamic P2P
+
+The Dynamic P2P implementation continuously and dynamically refines the active topology
+through a peer selection process that aims to *reduce overall diffusion time across the
+entire network*. Our research findings indicate that using a policy based solely on local
+information can lead to a nearly optimal global outcome. This is accomplished by tracking
+the timeliness and frequency of peers providing a block header that ultimately gets
+incorporated into the chain.
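This local scoring-and-churn loop can be sketched in a few lines. The sketch below is a small runnable Python illustration only — the real implementation is in Haskell, and the `churn_step` name, the integer score, and the explicitly supplied replacement candidates are all hypothetical simplifications:

```python
# One churn step of a simplified peer-selection policy: rank peers by a
# purely local usefulness score (e.g. how often a peer was the first to
# deliver a block header that ended up on the chain), keep the best, and
# replace the k worst with freshly picked random candidates.
def churn_step(k, scored_peers, replacements):
    """scored_peers: list of (peer, score); replacements: random candidates."""
    ranked = [p for p, _ in sorted(scored_peers, key=lambda ps: ps[1], reverse=True)]
    kept = ranked[: len(ranked) - k]
    return kept + replacements[:k]
```

Iterating this step — score, drop the worst, resample — is what lets purely local information converge towards a near-optimal global topology.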
+
+The primary goal is to eliminate highly "non-optimal" peers while maintaining strong
+connectivity. To achieve this, peers considered less useful based on this metric are
+periodically "churned out" and replaced with randomly selected alternatives. Simulation
+results show that this optimization method approaches a near-optimal global result within
+a relatively small number of iterations.
+
+Practically, Dynamic P2P replaces the manual configuration of peer selection (e.g. using
+the topology updater tool).
+
+With manual configuration, SPOs were required to establish connections with numerous
+peers, for instance, 50, to ensure a consistent minimum of 20 active connections. This was
+necessary due to the static nature of configured peers and the fluctuating availability of
+SPO relays.
+
+However, with dynamic P2P, nodes can be set to target a specific number of active peer
+nodes, such as 20, and choose from all registered SPO relays on the chain. If a connection
+with a peer is lost, the node will automatically select alternative peers and continue
+attempting connections until the desired target is achieved.
+
+As a result, dynamic P2P eliminates the need for over-provisioning connections, providing
+a more efficient and adaptable networking solution.
+
+## The Design Vision
+
+Cardano is ultimately a cooperating system of autonomous nodes. It is not a client-server
+design, so there is fundamentally no central point of control nor any privileged class of
+centrally managed servers. Although, with respect to its network topology, it might have
+started as a federated network (in the Byron era), our goal was to converge towards a
+fully trustless, distributed networking system that could handle the evolving demands of
+the Cardano ecosystem while ensuring good connectivity and performance.
+
+As the networking team embarked on this engineering adventure, we knew we would encounter
+numerous challenges and complexities.
We embraced those challenges head-on,
+constantly refining the set of pivotal ideas that would eventually shape our Dynamic P2P
+design:
+
+1. Modularity and Extensibility: We designed the system with modularity in mind, making it
+   easy to swap out or improve individual components as needed. This extensibility allows
+   for seamless integration of new features and enhancements, ensuring that our design
+   remains adaptive to the evolving needs of the Cardano ecosystem. Modularity is
+   especially helpful if and when we apply formal methods to prove correctness of low
+   level designs with respect to high level specifications. By breaking down the system
+   into smaller, more manageable components, we can apply property-based testing more
+   effectively to each module, ensuring that the behavior of each part is well-defined and
+   adheres to the expected properties. Of course, picking Functional Programming with
+   Haskell as our primary programming language played a significant role in achieving this
+   level of modularity and extensibility.
+
+2. Scalability: As the network grows, so does the need for a system that can handle a
+   larger number of nodes and transactions, while ensuring it still respects Ouroboros
+   timing constraints. For our P2P design vision, we took scalability into account from
+   the outset, incorporating strategies such as intelligent peer selection.
+
+3. Security and Resilience: In a decentralized network, resilience and security are of
+   paramount importance. We focused on building a system that could withstand both
+   internal and external disruptions, by employing techniques such as robust error
+   handling mechanisms, designed (among other things) for resilience to abuse: users
+   should not be able to attack the system using an asymmetric denial-of-service attack
+   that depletes network resources from other users.
+
+   With P2P, each node is able to prioritise its connection to locally configured peers,
+   which ensures it always maintains connections to trusted peers and is able to make
+   progress in the network. Rate limiting inbound connections and configurable peer
+   targets allow the node to adjust its resource consumption. Careful management of
+   connection states allows the re-use of duplex connections, enabling nodes behind
+   firewalls to improve their connectivity safely while reducing the overall attack
+   surface.
+
+4. Performance: A high-performance network is essential for a seamless user experience. We
+   dedicated significant effort to optimizing our design, utilizing techniques such as
+   efficient data transmission via multiplexing and protocols that support pipelining.
+   Intelligent peer selection also contributes to latency reduction, helping guarantee a
+   responsive and reliable network.
+
+Achieving low latency and good connectivity is essential to establishing effective
+communication within the Cardano network. Our Dynamic P2P design has been crafted to
+ensure that these prerequisites are met, providing a robust, scalable, and resilient
+foundation for the continued growth of the ecosystem. However, it is important to
+acknowledge that the trustworthiness of the peers you connect to is also a critical factor
+in maintaining a secure and reliable network. While addressing trustworthiness in depth
+would steer us beyond the scope of this blog post, it is worth noting that our design
+incorporates multiple measures to mitigate potential risks and safeguard the network.
+
+## Dynamic P2P
+
+Ensuring the performance and security of Ouroboros is crucial, and one critical aspect of
+this is the timely relay of new blocks across the network. Ideally, the connections within
+the P2P network should be arranged to minimize the time required for a block to be relayed
+from any node to all other nodes in the network.
+
+However, this turns out to be a complex challenge, with limited prior work available that
+is applicable in a trustless setting. Addressing this problem effectively required
+innovative solutions that balance the need for swift communication with the need to
+maintain the integrity and security of the decentralized network.
+
+### Why it's a hard problem
+
+An effective solution to optimize performance would minimize the number of "hops" a block
+has to take across the network. In terms of a graph, this equates to reducing the average
+number of edges a block traverses. Additionally, the length of each hop or edge is
+significant. Local links exhibit lower latency compared to intercontinental links;
+however, some intercontinental links are necessary for worldwide block relay. For
+instance, a suboptimal solution would involve traversing excessive intercontinental links,
+such as from Europe to Asia and back again.
+
+Existing networking algorithms can generate optimal "spanning trees," which could serve as
+paths for block relay. However, these algorithms depend on nodes trusting each other to
+exchange accurate information, which is unsuitable for a blockchain P2P network where
+nodes cannot inherently trust each other.
+
+An ideal solution must rely on "local" rather than "global" information — information that
+nodes can individually assess without depending on shared, trustworthy data. Nonetheless,
+having an optimal solution that relies on perfect global information serves as a valuable
+reference point.
+
+#### Preliminary research
+
+In collaboration with network researchers from Athens University, specialists in
+decentralized systems and their protocols, we embarked on a crucial task: simulating
+various network iterations and studying the trade-offs in diffusion time.
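To give a flavour of what such a simulation measures, here is a deliberately tiny Python sketch of a synchronous diffusion model: in each round every informed node forwards the block to all of its overlay neighbours, and we count rounds until everyone is informed. The `diffusion_rounds` helper and the synchronous-rounds assumption are ours for illustration; the actual research simulations model link latency, calibration, and much more:

```python
# A toy synchronous diffusion model over a static overlay: in each round,
# every informed node forwards the block to all of its neighbours. Returns
# the number of rounds until all nodes are informed (the overlay is assumed
# connected, so the loop always terminates).
def diffusion_rounds(overlay, source):
    """overlay: dict mapping node -> list of neighbours it forwards to."""
    informed = {source}
    rounds = 0
    while len(informed) < len(overlay):
        informed |= {n for i in informed for n in overlay[i]}
        rounds += 1
    return rounds
```

Running this over overlays built with different heuristics — static close/random links versus periodically recalibrated ones — is, in miniature, the kind of experiment discussed here.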
+
+The pivotal question surrounding dissemination is the establishment of who forwards blocks
+to whom, or more precisely, which dissemination links should be formed among nodes to
+enhance dissemination speed.
+
+In addressing this question, the researchers pursued two primary approaches:
+
+- In the first, links are independent of the dissemination process. We simulated a static
+  overlay, establishing links based on specific heuristics, then ran several
+  disseminations to gauge performance.
+
+- The second strategy adapts the overlay dynamically. Nodes initially establish links with
+  random network nodes, maintaining performance stats for evaluating their neighbors.
+  Periodically, each node recalibrates its neighbor set based on these stats, deciding
+  which neighbors to retain and which to replace.
+
+![Close random policy](/img/calibrations.png)
+
+The figure above compares the outcomes from the two approaches described earlier. The
+simulation involved each node maintaining a fixed number of 6 "close" neighbors (based on
+Round Trip Time (RTT)) and 4 random nodes. These links were kept static throughout the
+entire experiment. In the "2 groups (<=100ms and >100ms)" protocol, each node maintains a
+fixed number of close links and remote links: "close" signifies that the RTT to that
+neighbor is less than or equal to 100 ms, while "remote" implies that the RTT is more than
+100 ms. Nodes start with all random links and periodically calibrate. During this
+calibration, they retain up to a fixed number of neighbors that have an RTT of less than
+100 ms, and they replace some of the remaining neighbors with newly picked random nodes.
+
+The graph lets us infer certain metrics, such as how many churn cycles are needed to reach
+optimal global dissemination, and illustrates that when a node uses local information to
+calibrate its connections to other peers, it can gradually approach an optimal result.
As
+shown, periodic recalibrations lead to a progressive improvement in the overall
+dissemination performance.
+
+![Close random policy](/img/close-random-graph-1.png)
+
+The plot shows how quickly a block disseminates through the network, eventually arriving
+at all nodes.
+
+It also confirms the effectiveness of our approach to the challenge of efficient
+diffusion. In this experiment all nodes use exactly the same Close-Random policy as
+detailed for the previous graph. All nodes start as uninformed nodes, i.e. they have not
+received a given block yet, and they become informed at some point over the course of the
+experiment. The dotted line represents the theoretical optimal solution, which is
+contingent on all informed nodes having complete knowledge about which peers are the most
+beneficial for them, and making exactly those connections.
+
+This analysis allows one to understand whether implementing more localized policies could
+potentially lead us closer to optimal results. It emphasizes how, as the network grows
+increasingly informed at a local level, we might approximate the ideal global performance
+metrics.
+
+![Average dissemination](/img/stake_uniform.png)
+
+This final graph illustrates the average dissemination time under various policy
+combinations, particularly when altering the percentages within a node's neighbor-picking
+strategy. As previously described, the Close (C) and Random (R) policies remain the same.
+The Score (S) based policy introduces a new method, where a node applies a score function
+to evaluate the utility of its current neighbors. The top-performing 'S' nodes, based on
+this evaluation, are retained within this set.
+
+For the purpose of this plot, the scoring function allocates a single 'bounty' point to
+the node that first delivers a new block. After every 100 blocks, the 'S' nodes exhibiting
+the highest performance across all neighbors (including C, R, and S sets) are selected to
+replenish the 'S' set.
This process likely results in the replacement of some previous +nodes within that set. + +This heat map gives us a rich visual representation of how these varying policies interact +and influence overall network performance. It showcases the trade-offs in adopting +different proportions of 'C', 'R', and 'S' policies, highlighting the impact each has on +average dissemination time. This data is invaluable as it guides us in fine-tuning our +system to achieve optimal network performance. + +Our comparative analysis between a trustless, locally-informed solution and the ideal +benchmark offers valuable insights into our proximity to optimal performance. This +investigation underscores not only the extent of our deviation from the ideal but also +illuminates potential strategies to diminish this gap. Through these informed calibrations +and continual advancements, we are able to advance and refine our design confidently. + +### P2P networking based on local information + +To tackle the challenges of optimizing block relay in a trustless setting, we have +developed a dynamic P2P networking solution that utilizes local information. This approach +allows nodes to make informed decisions about their connections based on their +observations and experiences, without depending on potentially untrustworthy global data. + +In our dynamic P2P design, each node maintains a local view of the network and evaluates +potential connections considering factors such as latency, throughput, and historical +performance. Nodes continuously monitor and adjust their connections, seeking +better-performing peers to optimize their network position and minimize the number of hops +required for block relay. 
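This monitor-and-adjust loop corresponds to the calibration policies from the simulations above. A minimal Python sketch, with the `calibrate` name and its parameters purely illustrative (the real node is written in Haskell):

```python
# One calibration step of a simplified close/random policy: keep up to
# close_target neighbours whose measured RTT is at or below the threshold,
# then fill the freed slots with freshly sampled random candidates.
def calibrate(close_target, threshold_ms, neighbours, candidates):
    """neighbours: list of (peer, rtt_ms); candidates: random replacement peers."""
    close = [p for p, rtt in neighbours if rtt <= threshold_ms][:close_target]
    return close + candidates[: len(neighbours) - len(close)]
```

Repeating this step keeps a node anchored to measurably good neighbours while the random slots keep it exploring the rest of the network.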
+
+Each node maintains three sets of known peer nodes:
+
+- *Cold* peers: known peers without an established network connection;
+- *Warm* peers: peers with an established bearer connection, used solely for network
+  measurements and not for any application-level consensus protocols;
+- *Hot* peers: peers with an active bearer connection, utilized for the
+  application-level consensus protocols.
+
+As mentioned earlier, nodes maintain limited information about these peers, based on
+previous direct interactions. For cold peers, this information may often be absent due to
+the lack of prior direct interactions. This information resembles "reputation" in other
+systems, but it is essential to emphasize that it is purely local and not shared with any
+other node.
+
+![Sets of known peers](/img/peer-discovery.jpg)
+
+The illustration above demonstrates the process of peer discovery and the
+promotion/demotion cycle, managed by the Peer Selection Governor (PSG). This component
+aims to achieve specific targets, such as a designated number of known and active peers.
+When an individual node attempts to join the network, it tries to contact root (locally
+configured) nodes. New peers are added to the cold peer set, and the process continues.
+
+Local static configuration can also be used to designate certain known nodes as hot or
+warm peers. This approach allows for fixed relationships between nodes managed by a single
+organization, such as a stake pool with multiple relays. It also facilitates private
+peering arrangements between stake pool operators and other probable deployment scenarios.
+
+In instances of adversarial behavior, a peer can be immediately demoted from the hot,
+warm, and cold sets. We opt not to maintain negative peer information for extended
+periods: in a permission-less system, where Sybil attacks are quite simple to mount, such
+information is easy to evade and would only consume resources.
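The cold/warm/hot lifecycle is a simple state machine. A minimal sketch — illustrative only, not the actual `ouroboros-network` types (which are Haskell):

```python
# The three relationship states a known peer can be in, together with the
# promotion/demotion cycle that the Peer Selection Governor drives.
COLD, WARM, HOT = "cold", "warm", "hot"

# Promotion moves a peer one step towards carrying consensus traffic;
# demotion moves it one step back.
PROMOTE = {
    COLD: WARM,  # establish a bearer connection
    WARM: HOT,   # start the application-level consensus protocols
    HOT: HOT,    # already fully active
}

DEMOTE = {
    HOT: WARM,   # stop consensus protocols, keep the bearer for measurements
    WARM: COLD,  # tear down the bearer connection
    COLD: COLD,  # can only be forgotten, not demoted further
}
```

The PSG's job is then to drive enough promotions to hit its targets for warm and hot peers, while demoting or forgetting peers that misbehave or underperform.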
+
+The Peer Churn Governor (PCG) is a component that plays a pivotal role in managing the
+health and efficiency of the network by mitigating issues such as network partitions and
+eclipse attacks. Its primary task is to dynamically manage the connections to peers in the
+network, categorizing them into hot, warm, or cold states.
+
+In this process, the PCG modifies the frequency at which peers are promoted (upgraded from
+cold to warm, or warm to hot) or demoted (downgraded from hot to warm, or warm to cold).
+This decision is guided by scoring functions that evaluate peers based on their usefulness
+and performance.
+
+These scoring functions include:
+
+- **Hot Demotion Policy**: This function determines which 'hot' (highly active and
+  valuable) peers should be demoted. The score is computed based on a peer's contribution
+  to the network, such as the number of blocks or bytes it's been the first to provide.
+  During normal operation, a combination of these factors is used, whereas during bulk
+  data synchronization, the number of bytes provided takes precedence.
+
+- **Warm Demotion Policy and Cold Forget Policy**: These functions deal with 'warm' and
+  'cold' peers respectively, deciding which should be downgraded or removed (in the case
+  of cold peers) from the network. The decisions are influenced by a degree of randomness,
+  and certain characteristics like previous failures or a tepidity flag, indicating less
+  reliable or less active peers.
+
+The PCG also has the ability to trigger a partial or full re-bootstrapping of the network
+under certain circumstances, essentially a process of resetting and rebuilding network
+connections for optimal performance.
+
+During the process of a node syncing with the network, the PCG ensures that no more than
+two active connections are used, preventing resource over-utilization.
Once the node is
+fully synced, the PCG facilitates a periodic 'churn', wherein it refreshes approximately
+20% of the peers every hour, promoting a robust and adaptable network.
+
+## Development Approach
+
+Cardano's P2P implementation is founded on the use of Haskell, a functional programming
+language renowned for its emphasis on correctness, safety, and maintainability. Haskell's
+powerful type system helps identify potential issues during development, leading to more
+robust and reliable code. In addition, we developed and employ [io-sim], a time-based
+discrete event simulation library that provides precise control over entropy and timing
+during simulations. This tool also faithfully simulates Haskell's runtime system,
+including `TVar`s, `MVar`s, `STM`, and more. This level of control allows for
+reproducibility, regression testing, and examination of worst-case scenarios. By combining
+Haskell and [io-sim], we can rigorously test the exact same code in the P2P production
+system under a wide range of conditions, ensuring it is well-prepared to handle real-world
+challenges.
+
+Our team takes pride in the extensive testing conducted and the rare, elusive corner cases
+uncovered as a result of our meticulous testing approach. By simulating years' worth of
+time, we can identify and address even the most unlikely bugs. To put things in
+perspective, let's assume each test runs between 1 and 5 hours of simulated time per
+input, per PR, per OS. With over 100 simulation tests in the [ouroboros-network]
+repository, each test assesses 100 randomly generated inputs per OS. Considering an
+average of 3 PRs per week, we run approximately 225,000 hours of total simulated time
+weekly—that's over 25 years! Our rigorous approach allows us to find and fix highly
+improbable bugs, run tests for decades of simulated time to ensure stability, and
+confidently deliver a top-notch product.
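As a quick sanity check on the headline figure, converting 225,000 simulated hours per week into years (ignoring leap years) confirms the "over 25 years" claim:

```python
# The quoted weekly simulated-time figure, converted to years.
SIMULATED_HOURS_PER_WEEK = 225_000
HOURS_PER_YEAR = 24 * 365  # 8,760 hours in a non-leap year

simulated_years_per_week = SIMULATED_HOURS_PER_WEEK / HOURS_PER_YEAR  # about 25.7
```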
+ +However, it's important to note that the quality of these tests ultimately depends on the +quality of the generators used, as they play a crucial role in producing diverse and +representative inputs for thorough evaluation. + +[io-sim]: https://github.com/input-output-hk/io-sim +[ouroboros-network]: https://github.com/input-output-hk/ouroboros-network +[IOG]: https://iog.io/ +[Well-Typed]: https://www.well-typed.com/ +[PNSol]: http://www.pnsol.com/ +[CF]: https://cardanofoundation.org/ diff --git a/blog/authors.yml b/blog/authors.yml index 24fed822..e5b231d5 100644 --- a/blog/authors.yml +++ b/blog/authors.yml @@ -37,3 +37,8 @@ bartek: name: Bartłomiej Cieślar title: Haskell DevX Intern @ IOG email: bartlomiej.cieslar@iohk.io + +bolt12: + name: Armando Santos + title: Networking Team Engineer + email: armando.santos@iohk.io diff --git a/static/img/calibrations.png b/static/img/calibrations.png new file mode 100644 index 00000000..45ac431d Binary files /dev/null and b/static/img/calibrations.png differ diff --git a/static/img/close-random-graph-1.png b/static/img/close-random-graph-1.png new file mode 100644 index 00000000..b0a735c3 Binary files /dev/null and b/static/img/close-random-graph-1.png differ diff --git a/static/img/peer-discovery.jpg b/static/img/peer-discovery.jpg new file mode 100644 index 00000000..e371fba1 Binary files /dev/null and b/static/img/peer-discovery.jpg differ diff --git a/static/img/stake_9010.png b/static/img/stake_9010.png new file mode 100644 index 00000000..cdce7579 Binary files /dev/null and b/static/img/stake_9010.png differ diff --git a/static/img/stake_uniform.png b/static/img/stake_uniform.png new file mode 100644 index 00000000..f798af23 Binary files /dev/null and b/static/img/stake_uniform.png differ