Skip to content

log :: 2025‐10

etorreborre edited this page Oct 31, 2025 · 23 revisions

2025-10-31

Simulate a node run

  • I had to introduce a "simulation" feature for the pure-stage crate since the simulation now requires the rand crate to randomly select the next runnable to execute. Unfortunately the rand create cannot be used when compiling to wasm.
  • I am now displaying the external effects as they occur on each stage.
  • The hard part was the "sync" external effects, like store effects, which are not polled but executed right away. This means that they were not included in the TraceBuffer. In order to fix this I introduced a SyncEffectBox where a StageEffect::ExternalSync effect just containing the effect typetag_name to be able to visualize it later. This feels like a hack at this stage and probably @rkuhn will want to find a better way to pass this information around.

2025-10-30

Simulate a node run

  • I improved the display of the execution of stages during a simulation.
  • While working on the simulation I realized that some headers were arriving twice at the forward_header stage.
  • I debugged this and this happens on a pretty specific case of a rollback (with 3 peers and a tree of headers of depth 10) that should not produce a fork.
  • The correctness was still ok in that case (the best chain stays the best downstream) but this is not desirable and potentially marks the current node as adversarial.
  • I added a check in our tests, in both the property test for the chain selection and in the simulation to make sure that we don't send. duplicate information downstream.
  • One interesting this is:
    • I tried to reproduce this issue with the property test for the chain selection (in headers_tree.rs) and even with 1M tests I was not able to trigger it.
    • On the other hand it was easy to reproduce with the simulation.
    • I guess that the simulation, by running some random execution of stages introduces more random interleaving for the processing of events arriving at the select_chain stage.
  • I also restructured the outputs of the simulation:
    • Files are produced by default in target/tests to avoid polluting the source directory.
    • There is one directory for simulation run, timestamped.
      • This one contains the simulation trace (in CBOR + JSON format)
      • The JSON traces can be animated with the simulation/amaru-sim/tests/animations/traces.html page.
    • Then one directory per test, inside the simulation run, timestamped.
      • This one contains the generated entries (messages sent by peers) and "actions" (executable directly by the HeadersTree).
      • The generated entries traces can be animated with the simulation/amaru-sim/tests/animations/entries.html page.
      • The actions can be copy-pasted directly to a unit test in headers_tree.rs to replay a chain selection bug.
  • I still need to add a few elements of visualization, in particular external effects, before submitting a full PR.

Eclipse evasion

From Amaru discord channel:

I am not sure there any public documents related to this.

An eclipse attack happens when an attacker control all the upstream peers of a node. This makes it possible to carry out for example double spend attacks on that operator by feeding an alternative chain.

The cardano-node's big ledger peers is the main protection against eclipse attacks. They are picked randomly weighted by stake from the top 90% of stake pools. With the honest stake assumption it is likely that any big ledger peer picked is honest. Since the pick is made from the ledger it is not something that the attacker can influence.

While it is possible any node to become eclipse, it is very unlikely to remain eclipse for a long time (hours).

A single honest peer (this doesn't have to be a big ledger peer) is enough to break the eclipse attack.

When the node is on tip it keeps connections to 5 big ledger peers. 0.5^5=0.03, so less then 3% chance of having picked all big ledger peers controlled by the attacker. When the node is behind the tip genesis-mode kicks in and the node talks only to 30 big ledger peers which turns the probability of having picked no honest peer to less than 1 in a billion.

As an added bonus because the decision on which peers to replace is made by tracking which peers provided the node with new headers and blocks first a would be attacker would have to start by being a very helpful peer. That is be serving headers and blocks for the main chain as quickly as possible.

An attacker would only peershare about his own nodes so peersharing doesn't really help.

The (Haskell) node operates in 2 modes: Praos or Genesis, where the latter is active iff the node is more than 20 minutes behind the best known chain. It maintains different lists of potential peers:

  • big ledger peers are relays from registered stake pools representing 90% of the stake
  • normal ledger peers are all the registered stake pools' relays
  • trusted roots are pre-defined peers from local configuration
  • untrusted root are pre-defined from local configuration
  • peer sharing relays propagated through eponymous protocol

It has 2 relevant configuration parameters:

  • $k$ is the target number of peers
  • $l$ is the target number of local roots peers

In Praos mode:

  • 5 peers are randomly selected from big ledger peers set based on linear stake distribution
  • $k - l$ peers are selected at random from the peer sharing and normal ledger peers set using the square root of stake as probability distribution
  • $l$ connections to local roots

In Genesis mode:

  • 30 peers are randomly selected from big ledger peers set based on linear stake distribution
  • all trusted roots are selected.

2025-10-29

Amaru next steps

Discussing short term and some medium terms goals for Amaru we would like to work on:

  • Testing: Goal for EOY is to demonstrate Amaru is a proper relay by running simulation and AT tests with the following topology:
    • 2 amaru nodes Amaru1 and Amaru2 are connected to each other
    • each is used as relay by 3 cardano-node BPs
    • network still converge correctly
  • Mempool: This is needed for proper transaction propagation which is mandatory for implementing properly a relay node. This can come in 2 versions:
    • Basic: build enough mempool capabilities to pull and propagate transactions:
      • manage a tx buffer, check tx against current ledger state, remove txs as we adopt them in blocks fetched
      • implement tx client pulling txs from downstream peers
      • implement tx server serving txs to upstream peers
    • Leios ready: for next year
  • Key management: build some solution to manage block producers keys "properly". Talk with people with SPO experience on what this solution should look like and implement first version. We already have CLI capabilities for managing VRF and KES keys, but it's unclear what solution would suit best SPOs: do we need some kind of signing agent? is a CLI enough? How would Amaru handle key changes?
  • Block forging: requires mempool logic, for later
  • Observability (metrics, traces, logs): we need to unify and better document what we currently do
    • ensure structured logging everywhere
    • proper classification of log
    • cleaner traces (maybe provide distributed tracing capability, eg. using header hash as trace id uniformly across nodes)
    • update documentation
  • Ledger:
    • Phase 2 validation: integration of plutus VM, implement all Plutus versions
    • Complete ledger rules: cover existing gaps
    • Epoch boundary offset: current implementation might lead to discrepancies in ledger state because we compute epoch boundary state in effect $k$ blocks after the boundary is crossed
  • Networking:
    • overhaul network stack to be covered by pure-stage and simulation
    • (later) implement p2p
  • Bootstrapping: some work in progress, we plan to join forces with people from HAL team to streamline snapshot extraction process from cardano-node DB and mithril snapshot
  • Hard fork: this is a known unknown as it's unclear how much work we'll need to do for the next hard fork which is due in Feb/March

2025-10-28

Running the simulation with good interleavings

  • I have started a small HTML page to visualize the execution of messages across all stages during the simulation.
  • Once issue I first faced was how to export the trace data as JSON. This is not entirely trivial since part of this data is dynamic (dyn SendData):
    • I had to add a function to recover the underlying data as SendDataValue and: 1. provide a display instance to be able to output more compact logs when necessary (otherwise SendData uses a Debug instance which is not very nice in logs.)
    • I also added some to_json functions to output TraceEntrys as JSON. At some stage, part of that data end-up being SendData which I can only display as a string, which implicitly used a string representation of CBOR. Concretely this means that something like a Point::Specific variant, that is deeply nested in the trace data, ends up showing a header hash as a Vec<u8>. That doesn't not make it easy to follow.
    • Eventually, I managed to export something as JSON and extract some useful data. That allows me to make some progress but that's not great.
  • Then I ran the visualization and realized that,... even we now randomly execute effects in a random order in a node, we feed the node the messages one by one. We should instead feed a message, make a bit of progress, feed another message, make a bit of progress, etc... => So my next step will be to see how I can "run the world" (in simulation parlance) more granularly.

2025-10-27

Flakiness with some select_chain stage tests

  • Those tests have been failing sometimes on CI. Hopefully this is fixed now with this commit.

Demoing the simulation

  • One aspect of simulating the amaru node consists in controlling the interleaving of the threads of execution.
  • I've added some support for randomly executing a given interleaving, controlled by the simulation seed.
  • I've also added a Display instance to TraceEntry to be able to log a trace in a more compact format than the one using Debug instances for the SendData.
  • Now the more interesting questions are:
    • What kind of useful coverage information / statistics can we produce for the random interleavings?
    • Would it be useful to visualize various interleavings? What would that look like?
    • Can we find bugs, or at least introduce a bug and see if we can find it when running the simulation?

2025-10-23

Visualizing the generated input for the simulation

  • There was indeed a bug in the chain selection algorithm introduced by the recent changes in how we access the data backing-up the headers tree (see the notes on PR #521).
  • It is a bit frustrating to realize that our tests didn't catch the regression at the time. I suppose that is because the data generation was not good enough.
  • I also had to rework a bit the oracle used in the property test for that algorithm in order to work ok when we have rollbacks beyond the best chain anchor.

2025-10-23

Visualizing the generated input for the simulation

  • There is now a single-page HTML that shows:
    • The tree of generated headers.
    • The arrival of messages with their time for different peers as labels on the tree nodes as time passes. This means that we see rollbacks as labels going back up the tree.
    • A highlighted node representing the tip of the current best chain
  • The data is generated from one of the test.
  • That has already proved helpful for tweaking the generation of data and inspecting the results.
  • The PR is still not up for review because the chain_selection property test is now broken:
    • I think that this is due to the max_length of the headers tree and the fact that we generate trees of headers with a greater depth.
    • I don't know yet if the fix should be in the generation or maybe just the oracle (For now I don't think there's a bug in the algorithm)

Fixing chain forward stage

  • Having a hard time finding the right angle to this problem as the current code does not lend itself well to the design I am thinking of
  • I have drafted a new ChainFollower::new implementation that takes into account the anchor and the intersection point, leveraging a dedicated ChainStore function to lookup the most recent intersection on the chain given the list of points a client sent
    • This does not yet include the iterator for the immutable part up to the anchor
  • While working on those changes I noticed some diagnostic functions were not needed in the ReadOnlyChainStore interface and moved them out to simplify also Effects and in memory implementation
  • The next major step is to actually store the chain indexed by slots in order to be able to easily iterate over those headers. I looked at the headers_tree code where the store is updated with the new best chain but it would be tricky to change as in most places we only use the HeaderHash and not the Point, therefore we don't have the slot information we need
    • We could push that logic in the update_best_chain or set_best_chain_hash functions, eg. given a hash retrieve the corresponding header and then update the best chain but this would annoyingly put some "business logic" (knowing when to roll forward and rollback) into the store itself
  • I am now considering implementing this in an additional stage, after the select_chain stage, that would use the DecodedChainSyncEvent to update the stored best chain:
    • on a Forward, add the given header's slot to the chain with the hash as value
    • on a Rollback, just rollback the stored chain at a slot before the rollback
    • This would have the benefit of isolating this logic and not requiring significant changes to the existing stages or store interface

2025-10-22

Tweak and present the generated data for the node simulation

  • The arrival times of messages on a simulated node are now properly correlated with the header slots so that the first messages arriving at the node have slots 1, 2, 3, ... and arrival times around 1s, 2s, 3s, etc
  • The arrival times are generated with a small jitters that varies from (upstream) peer to peer and which is occasionally negative.
  • I've started to vibe code an animation to show the arrival of messages on the node in order to show for example that the same FWD header can arrive at 2 different times from different peers. But during the night (yes...) I had a better idea for showing what's happening in the simulation: show the whole generated tree of headers and when each header arrives at the node for a given peer. Let's see if this can be nicely vibe-coded :-).

Fixing chain forward stage

  • Working on making chain sync for downstream clients less of a memory hog
  • Started with encapsulating more ClientState which ahs been renamed ChainFollower: It's now more an opaque structure with proper methods, and the new function ensures the state is primed correctly based on the intersection points provided by the client, it replaces the previous find_headers function
    • We are still filling the internal queue with all headers from intersection point to current tip
    • But at least we have the store available to extract an iterator
    • and tests are expressed through the interface of the ChainFollower, eg. the next_op() function
  • Tricky part will be to update the tests and the test infrastructure as we now have more cases depending on whether or not the intersection point is in the immutable or mutable part of the chain DB
  • While understanding the find_headers function I realised the implementation of chain sync protocol was wrong as we were not sending systematically a Rollback right after the intersection point which should actually be the case (it's prescribed in the network design spec)
  • I put up a small PR to fix that and now trying to test it IRL

2025-10-21

Tweak and present the generated data for the node simulation

  • We are now generating data in a simpler way but with a bit more edge cases:
    • The number of forks have been reduced to 2 main forks at 1/3rd and 2/3rd of the main spine (but this is recursive for subtrees).
    • So the main parameters controlling the generated data now are the number of upstream peers and the length of the best chain.
    • In the generated data there is always at least one peer that behaves exactly like another peer (which is the common case).
    • There is a unit test for the generated data to make sure that we don't regress on what we expect to generate.
    • The size of the "maximum fragment" (2160 in production) is adjusted to be less than the generated best chain length to be in a situation where the anchor of the best chain has to be updated.
  • Some data structures have been introduced to keep track of additional information when generating data:
    • GeneratedTree about the tree of headers that is used to generate upstream messages.
    • GeneratedActions about the actions (roll forward and rollback) generated for each peer.
  • This data is presented at the end of each test run for inspection.

2025-10-21

Design for reducing memory footprint of chain forward

  1. The ChainState is created when the peer connects to us and we compute the intersection. The chain state's main purpose is to hold a Dequeue of ChainSyncMessages that are received from the pipeline (eg. chain selection. There's one ChainState per connected peer, and every time we receive a new chain selection event we duplicate the message to all connected peers
  2. Currently, when the ChainState is created we front-load the Dequeue with the full list of RollForward operations between the intersection point and the tip at the time of creation, which takes a lot of time (>50s on preprod) and consumes a lot of memory.
    • Memory footprint has been significantly reduced by only storing hashes and not the full headers but it's still not quite right: with 100 lagging peers connected, we would need about 20GB of RAM just to store that data
  3. The interaction between the various actors/stages is somewhat tricky and I did not want to have to modify it too much, as we will certainly revisit it again once we support P2P, so I have been looking for a solution that's minimally invasive
  4. The idea would be to extend the ChainState structure to store an iterator over the ReadOnlyChainStore between the intersection point and the anchor, and to only fill the Dequeue with forward operations between the anchor and the tip which caps the amount of data to 2160 x 32B (header hash size) = 64kB / peer
  5. The ChainState's next_op method would then:
    1. either serve a Forward op constructed from the next() method of the iterator until it's empty
    2. or dequeue elements the queue from once the iterator is empty
  6. of course, if the intersection point is between the anchor and tip the iterator part is skipped and we serve operations directly from the queue

Chain state structure

2025-10-16

Running the pure-stage simulation

  • The data generation for the simulation is now working with more complex data:
    • I removed the previous hardcoded data files.
    • I surfaced some data generation parameters as top-level arguments to be able to run different profiles (chain depth, rollback ratio etc...) => I'll create the PR for this tomorrow => Next step: I want to display some form of coverage data around the generation and the results: final chain length, number of forks sent downstream, etc...

2025-10-15

Running the pure-stage simulation

  • Not a lot of work done today:
    • With more diverse generated data validation errors can occur -> this used to terminate the execution in validation_errors_stage. I changed that to just log errors so that the simulation can proceed.
    • The next issue comes from the simulation oracle. We used to extract from the history of received messages the latest chain tip and compare it to the sample chain that was store in a file and used as the starting point for generating data. => I'm now working on fixing that oracle.

2025-10-15

Downstream peers connection improvement

  • Revisiting #426 as it is a critical issue to solve to improve resilience of Amaru in a relay role
  • Took some time to understand the code in the chain_forward module(s) as it's not been visited for a while. We felt some more details about the interactions between the various components were missing to quickly catch up, but in the end things fell in place. There's a nice diagram but it feels a bit static and some more comments in the code could help
  • The source of the issue is pretty clear:
    • When a downstream node requests an intersection, we find the most recent intersection point from the list they send us and then...
    • ... we load all headers from this point up to our current tip into a list of ClientOp ...
    • ... which is then fed to the client's actor delegate
  • The "right thing" to do would rather be to maintain a pointer or cursor for each client which points at our latest known intersection, and send them the next header from our best chain until they reach the tip
  • However, we realise we actually don't store our best chain in such a way as to be easily followed from parent to child:
    • we store the best_chain header
    • we store parent -> child relationship for each header
    • but a header can have multiple children in case of forks/rollbacks
  • It seems the simplest solution would be to store an additional "table" with slots as key and header hash as value representing the one and only best chain.
    • when we roll forward our best chain, we update this table adding an entry for the new tip
    • when we roll back, we delete the entries up to the rollback point and add them back
    • indexing by slots makes it very efficient to do range queries or lookups (e.g. to find intersection)
  • Before doing that, we experiment with a tactical solution: Loading the hashes of the headers instead of the whole header and read the headers when sending them back to peer
  • Trying to test how this tactical solution impacts memory consumption, we try to run the new code on an existing preview DB but run into a schema layout upgrade so we need to resync fully first

2025-10-13

Stage processing graph

  • The store_header and store_block stage have been merged into the receive_header and fetch_block stages respectively.

Running the pure-stage simulation

  • Reusing the data generation from the HeadersTree tests for the simulation is not trivial because we use different representations for headers.
  • While thinking about this I realized that the regular Header data type recomputes its hash (serialize the header + computes the hash) every time we call the hash() method which is inefficient.
    • So I'm proposing this PR: https://github.com/pragma-org/amaru/pull/501 which allows us to:
      • Compute the header hash only once (and set it on a new data type BlockHeader).
      • Recalculate the hash after modifying parts of the header (and possibly make the 2 disconnected).
      • Remove FakeHeader and TestHeader that were used for tests which simplifies our setup. => Some tests are not yet passing, I need to fix them before opening the PR for review.

2025-10-10

Stage processing graph

  • The select_chain stage is now pushed to the end of the graph (right before forward_header).
  • In order to do so, a track_peers stage, right after the receive_header stage has been created to register when our node has caught up to a given peer.
  • The next steps will be to:
    • Squash the store_header and store_block stages into other stages since they just encapsulate a simple external effect.
    • Put the stage.pull stage into pure stage in order to deal with peer selection and in particular peer disconnections (this crashes the node at the moment).

Running the pure-stage simulation

  • I fixed a deadlock when running the simulation with more than 4 downstream peers
    • The queue that is used to collect downstream messages was filling up too quickly.
    • It is now sized proportionally to the number of downstream peers
  • I started integrating the data generation done for testing the HeadersTree to the simulation, instead of using the current hard-coded chain:
    • One drawback is that we will have to bypass the code testing the validity of a header since the headers produced by the generation code are fake headers (not properly signed)
    • On the other hand this allows us to generate more diverse chains sent by peers.

2025-10-09

Fixing the spans parenting for consensus stages

  • Geoff noticed that the consensus spans were not grouped inside the same traces on Jaeger anymore.

  • The main reason is that:

    • Stages are asynchronous pieces of execution.
    • For that reason the current parent span (created when a chain sync event is received) is transferred from stage to stage. as a field on the event that activates each stage.
    • When a stage starts processing a given event, the stage main function needs to have its span as a child of the event span.
    • The stage function span is created with the #[instrument] attribute and then reparented with span.set_parent.
    • Unfortunately this call cannot not succeed.
  • There used to be a work-around with previous versions of the OpenTelemetry crate but I couldn't find a way to use that work-around with the most recent version.

  • The fix consists in manually creating a span with the span! macro and provide the span parent as a field: span!(parent: msg.span()).

  • The PR comes with a way to test the structure of spans produced by a given piece of code, thanks to the new assert_span_trees function in the amaru-tracing-json crate.

2025-10-06

Moving all effects to pure-stage

  • The PR moving the effects used in validate_header to pure-stage is now ready for review: https://github.com/pragma-org/amaru/pull/489.
    • During the implementation I realized that a header validation rule is not yet implemented (the opcert sequence check) and discussed this with Arnaud I now better understand what will need to be done when we decide to implement it but will need a more detailed specification to do so.
  • Before we start running the simulation I need to change the stages graph so that select_chain is done at the end. => I started this rewiring and should be finished tomorrow.

2025-10-01

Moving all effects to pure-stage

  • After some debugging and navigation around the pure-stage simulation code I eventually understood that validation errors are triggering the deadlock. We discussed this with Roland and that makes sense if the error stages never consume their messages. Roland is going to have a look at that.
  • In the meantime I also fixed the issues that were generating the validation errors.
  • One good thing is that I can now also implement some tests on the faulty stage (select_chain in that case) to show that the fixes are working. => The next steps will be to:
    • Use an effect for the validate_header stage when the ledger state is read.
    • Prepare a PR for review (unfortunately this is going to be a large one again but hopefully I can simplify some of the code).

Clone this wiki locally