log :: 2026‐01
Tip
Pure-Stage Networking, Miniprotocols, Observability / OTEL, Relay Performance, Simulation & Replay
Major shift to a pure-stage–based network stack: dynamic/supervised stages, scheduled messages, and miniprotocols reimplemented as explicit state machines with stronger spec conformance and deterministic simulation/replay hooks.
We also document a push to operationalize the relay: fixing reconnections and protocol edge-cases (blockfetch ranges, rollback handling), diagnosing sync performance regressions (blockfetch RTT-bound one-by-one fetching) and sketching bulk/batched fetching designs.
In parallel, observability is being standardized (EDR updates, OTEL macro/schema tooling, better trace/metrics stacks and analysis), plus workflow/tooling fixes (CI/demo scripts, snapshot pipeline improvements, and memory leak investigations incl. uplc + rocksdb tuning).
Finally, there are exploratory threads on ZK use-cases and on building better developer tooling (client to consume traces, deterministic trace comparison, and simulator-led reproduction of ledger/consensus discrepancies).
Had some family and cargo things, and was not as productive as I would have liked.
Responding to feedback on https://github.com/pragma-org/uplc/pull/36, but apparently had my notification filters set up incorrectly. Let me go fix that right now.
- Continued follow-up so as to land the PR to uplc to fix the memory leaks.
- Testing main line amaru with that PR applied to uplc to see if there are other similar memory leaks that do not get hit by the uplc test suite.
- Writing up how to use bytehound to investigate the situation using the PR as an example.
- Investigate the impact of rocksdb configuration on memory usage.
Amended the existing observability EDR to align with recommendations.
Made progress with the OTEL macros implementation as well as a first push to migrate our current instrument usage to trace.
Talked with Juan, Javier and Agustin from modulo-p and friends about riscv, amaru browser support and ZK. Good contacts, and surely we can work with them at some point if need be.
Made a push towards verifying a UPLC program execution on-chain. I first tried to address the on-chain verification part. The challenge is to find a zkVM that relies on a proof system whose proofs can be verified efficiently on cardano. Practically this constrains us to groth16 or plonk/Halo2 over bls12-381. Unfortunately I couldn't find any zkVM relying on such a proof system despite looking pretty hard. This is because they mostly target ethereum, which implements bn254 natively. I made an experiment nonetheless showcasing the e2e with openvm / ak381 lib.
Note that this is a very rapidly evolving field, so I have good hopes that it will eventually make progress that is relevant to cardano.
Also note that it doesn't rule out other zk / amaru use cases involving other verifying targets: native, wasm or MCU (micro-controllers).
I didn't consider pushing an e2e test using circomjs because:
- it supposedly has been proven already by modulo-p/Eryx
- it's not usable for us as we can't realistically create circuits manually
Reminder: look at ZK amaru use-cases document.
- Refine and document 100% of Amaru's traces
- Explore Zero-Knowledge use cases within Amaru
- End-to-end execution and proof of one non-trivial Aiken program using an on-chain Groth16 verifier.
- migrate amaru to the new OTEL macros
- implement tooling / infrastructure to improve the OTEL experience
- address the other side of the zk experiment: create proofs for an aiken program
This week I:
- Adjusted my PR for the responder side of the block fetch protocol
- We decided to materialize the requested headers for a request range but limit them to 1000, since downloading more would go over the protocol limit.
- Also adjusted the PR for testing the proper reconnection
- Roland created an issue to eventually use `netsim` to reproduce low-level network issues (#659).
- Fixed an issue with the ledger that caused a rollback to fail when it shouldn't (#656).
- I then tried to reproduce this kind of issue with our simulator
- I introduced a simplified version of the ledger supporting roll forward / rollback operations by just keeping track of points.
- Unfortunately I introduced a deadlock in that code accidentally and that led to getting stuck on "Busy" when running the simulation which was puzzling for a bit.
- The simple ledger model fails when we try to roll forward an existing point and I can definitely hit this case during simulation!
- The simple ledger model also fails when we try to rollback a missing point and this does not fail the simulation when it should!
- Both are the result of our current structure for processing chainsync events: we apply fetched blocks before we run chain selection, which we shouldn't do.
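A simplified ledger like the one described above can be sketched in a few lines. This is only an illustration, not the model used in the simulator: the `Point`, `SimpleLedger` and `LedgerError` names and their shapes are assumptions made for this example.

```rust
/// Hypothetical point type: a slot plus a (shortened) block hash.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Point {
    slot: u64,
    hash: u64,
}

#[derive(Debug, PartialEq, Eq)]
enum LedgerError {
    /// Roll forward to a point that was already applied.
    AlreadyApplied(Point),
    /// Rollback to a point the ledger does not know.
    UnknownPoint(Point),
}

/// A ledger that only remembers which points it has applied, in order.
#[derive(Debug, Default)]
struct SimpleLedger {
    applied: Vec<Point>,
}

impl SimpleLedger {
    fn roll_forward(&mut self, point: Point) -> Result<(), LedgerError> {
        if self.applied.contains(&point) {
            return Err(LedgerError::AlreadyApplied(point));
        }
        self.applied.push(point);
        Ok(())
    }

    fn rollback(&mut self, to: Point) -> Result<(), LedgerError> {
        match self.applied.iter().position(|p| *p == to) {
            Some(index) => {
                // Keep everything up to and including the rollback point.
                self.applied.truncate(index + 1);
                Ok(())
            }
            None => Err(LedgerError::UnknownPoint(to)),
        }
    }

    fn tip(&self) -> Option<Point> {
        self.applied.last().copied()
    }
}

fn main() {
    let mut ledger = SimpleLedger::default();
    let a = Point { slot: 1, hash: 0xa };
    let b = Point { slot: 2, hash: 0xb };
    ledger.roll_forward(a).unwrap();
    ledger.roll_forward(b).unwrap();
    println!("{:?}", ledger.roll_forward(b)); // already applied: rejected
    ledger.rollback(a).unwrap();
    println!("{:?}", ledger.rollback(b)); // b was dropped by the rollback: rejected
    println!("{:?}", ledger.tip());
}
```

Both failure modes mentioned in the list (rolling forward an existing point, rolling back to a missing point) surface as explicit errors that a simulation property could assert on.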
This supports the correct execution of the base protocols for a node in order to be an effective relay.
- Finish properly replicating the ledger issue with the simulator.
- Start one of the newly created issues for consensus: #658, #659 or #660.
https://github.com/notunrandom/sandbox
- Revise, refine and automate the snapshot production pipeline. Specifically, try to reduce memory usage by introducing lazy decoding of CBOR-in-CBOR in Block.hs
- Try out the different solutions outlined in the sandbox README.
- The reconnections PR is ready to review #653.
- I've made some improvements on the blockfetch protocol implementation to make sure that we don't return a partial range of blocks for a given range request.
- Added an additional check on the initiator side that the returned blocks correspond to the request.
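As an illustration of what such an initiator-side check might look like, here is a minimal sketch. It assumes the initiator can extract a point from every returned block; `Point`, `RangeError` and `check_range` are hypothetical names, not the Amaru API, and the real check may compare more than this.

```rust
/// Hypothetical point type: a slot plus a (shortened) block hash.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Point {
    slot: u64,
    hash: u64,
}

#[derive(Debug, PartialEq, Eq)]
enum RangeError {
    Empty,
    WrongStart,
    WrongEnd,
    NotAscending,
}

/// Checks that the points of the returned blocks cover exactly the requested
/// range: they start at `first`, end at `last`, and slots strictly increase.
fn check_range(first: Point, last: Point, returned: &[Point]) -> Result<(), RangeError> {
    let (Some(head), Some(tail)) = (returned.first(), returned.last()) else {
        return Err(RangeError::Empty);
    };
    if *head != first {
        return Err(RangeError::WrongStart);
    }
    if *tail != last {
        // A truncated (partial) answer ends before the requested last point.
        return Err(RangeError::WrongEnd);
    }
    if returned.windows(2).any(|pair| pair[0].slot >= pair[1].slot) {
        return Err(RangeError::NotAscending);
    }
    Ok(())
}

fn main() {
    let p = |slot| Point { slot, hash: slot * 100 };
    println!("{:?}", check_range(p(1), p(3), &[p(1), p(2), p(3)]));
    println!("{:?}", check_range(p(1), p(3), &[p(1), p(2)]));
}
```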
The work on testing the reconnection of an Amaru node showed that, on reconnection, a rollback chainsync event could be sent to the node and failed to be processed (because the block corresponding to the rollback point could not be fetched):
=> This obviously needs to be fixed!
=> This raises the question of how we could catch this kind of issue with the simulation tests. At the moment we don't really model the ledger in those tests. We should model it in order to reproduce this issue and then be able to catch further ledger/consensus discrepancies.
Following the merge of the new network stack we saw a regression in the e2e tests: the sync on preprod reached only up to epoch 178 (as opposed to the required 182) within 15min, thus failing CI. On preview, the test basically passed but was reported as failed due to some change in the reporting of the make process exit code (which I haven’t been able to understand — it should never have passed before, either).
So I changed scripts/demo to kill only the correct process and then properly report success IFF the target epoch was reached, and I lowered the expectation of syncing preprod to epoch 176, thus making CI green again, albeit with an implied TODO of fixing the performance regression.
@KtorZ and @etorreborre and I then agreed that the most urgent next step was to establish and corroborate a theory as to what is slowing down the sync (which involves both chainsync and blockfetch protocols as well as header and block validation and storage, so there are several possible candidates). To this end, I re-read the observability READMEs and recalled that some fixes were pending, see #654:
- cleaned up tracing targets by moving code `amaru_consensus::consensus::*` → `amaru_consensus::*`
- emitting span metrics more frequently to get smooth sampling for prometheus
- getting the grafana+tempo stack working again (was using `grafana:tempo/latest` which is currently incompatible with any documented contents for `tempo.yml`, so using `grafana:tempo/2.9.1` together with the example config for that same version)
What I wanted to get is statistics on queue latency, processing latency, and waiting times while sending to other stages, for each of the stages in the pure-stage setup. My hypothesis was that performance is entirely driven by the blockfetch stage because we’re fetching each block individually, incurring one full network RTT plus the transfer duration for each block, which takes a lot longer than any per-block processing step. This would show up as all stages upstream of blockfetch always having full queues and always spending most of their processing latency waiting for the privilege of sending a message downstream; and all stages downstream of blockfetch would basically run in zero time compared to the latency of getting from network input into blockfetch.
Firstly, I learnt some things about the observability stack:
- in the `jaeger` stack we can use jaeger to look at individual traces, but jaeger cannot compute aggregations from many traces
- the prometheus available in the `jaeger` stack isn’t really usable because it scrapes the span metrics from `jaeger` which doesn’t update them frequently enough, leading to sampling artefacts that destroy statistics
- the aforementioned problem is not present in the `grafana+tempo` stack where `prometheus` gets the data directly from the otlp-collector; metrics are smooth and analytics work as they should
- `tempo` seems completely useless to me, at least for our use case, because it seems that all it is capable of is retrieving individual traces based on a more complex query language than jaeger supports; but understanding the performance of a system requires statistical aggregation of latency histograms (BY WHICH I DO NOT MEAN THE FAKE HORSESHIT PRODUCED BY PROMETHEUS, DAMMIT), mailbox size histograms, etc.
- maybe someone who knows grafana+tempo can show me what I’m missing or where I’m wrong; this exploration has been extremely frustrating for someone who used vastly superior statistical analysis tools 25 years ago in the context of high energy particle physics
Secondly, being reduced to inspecting individual traces I found all of them to confirm my above hypothesis: timing is entirely determined by blockfetch (I was syncing from a Haskell node running ca. 50ms distant) and upstream queues are all full all of the time. System is working as designed, i.e. very inefficiently. BTW: CPU utilization for Amaru was about 1–5% of one core during my tests.
The key result and insight therefore is that we need to restructure the pipeline to make it possible to fetch blocks in bulk mode, not one by one. This requires knowing a sequence of headers we need the blocks for so that we can send a range with the first and last Point to the upstream peer. This requires validating multiple headers before fetching the first block. This is also what the Haskell implementation does.
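To make the cost of one-by-one fetching concrete, here is a back-of-the-envelope model. It assumes a 50 ms round trip (the peer above was "ca. 50ms distant") and a made-up 5 ms transfer time per block; both functions are illustrative only and ignore validation and storage time.

```rust
use std::time::Duration;

/// One request per block: every block pays a full round trip plus its transfer.
fn one_by_one(blocks: u32, rtt: Duration, transfer: Duration) -> Duration {
    (rtt + transfer) * blocks
}

/// One request per batch: the round trip is paid once per batch,
/// the transfer time is still paid for every block.
fn bulk(blocks: u32, batch_size: u32, rtt: Duration, transfer: Duration) -> Duration {
    assert!(batch_size > 0, "batch_size must be positive");
    let requests = (blocks + batch_size - 1) / batch_size; // ceiling division
    rtt * requests + transfer * blocks
}

fn main() {
    let rtt = Duration::from_millis(50); // assumption taken from the test setup
    let transfer = Duration::from_millis(5); // hypothetical per-block transfer time
    println!("one by one: {:?}", one_by_one(1000, rtt, transfer));
    println!("batches of 100: {:?}", bulk(1000, 100, rtt, transfer));
}
```

With these numbers, 1000 blocks take 55 s when fetched individually and 5.5 s in batches of 100, which matches the observation that the round trip dominates everything else.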
@etorreborre and I will restructure the consensus pipeline such that chainsync (i.e. header sync and validation) is decoupled from blockfetch (for block validation and chain updates). The design sketch goes like this:
- pull headers from upstream, decode and validate them, pick a best chain candidate under the assumption that all not yet falsified blocks will turn out as valid
- communicate best chain candidate updates to a block fetching coordinator that will identify chain fragments (probably with some maximum length) and send blockfetch requests to upstream peers for bulk retrieval
- waiting for one bulk to (nearly) finish before requesting the next one leads to automatic batching, as headers accumulate while waiting for the requested blocks
- validate each block as soon as it hits the store (which means changing the blockfetch miniprotocol handler to emit each streamed block instead of a `Vec<Vec<u8>>`) and the ledger has CPU capacity; update best chain if valid (and thus forward downstream) or mark block invalid and tell the chain candidate selection stage to look for a different best chain
We’ll keep most of the current stages, add batching to blockfetch, but the chain selection stage will probably need to learn a few new tricks. Currently, this stage is more complicated than it needs to be because it also tracks whether each individual upstream peer is well-behaved; we can move that aspect to a stage further upstream to gain the necessary complexity budget for dealing with best chain candidates and invalid blocks — without becoming too large to reason about. This should probably be the very first step.
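The "automatic batching" idea from the design sketch above can be illustrated with a small coordinator that only ever has one range request in flight. This is a sketch under simplifying assumptions (a single upstream peer, no forks or candidate switches); `FetchCoordinator` and its methods are hypothetical names, not the planned stage API.

```rust
use std::collections::VecDeque;

/// Hypothetical point type: a slot plus a (shortened) block hash.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Point {
    slot: u64,
    hash: u64,
}

/// Collects validated headers and turns them into (first, last) range requests.
struct FetchCoordinator {
    pending: VecDeque<Point>,
    in_flight: bool,
    max_batch: usize,
}

impl FetchCoordinator {
    fn new(max_batch: usize) -> Self {
        Self { pending: VecDeque::new(), in_flight: false, max_batch: max_batch.max(1) }
    }

    /// A new header was validated; request its block if nothing is in flight.
    fn on_header(&mut self, point: Point) -> Option<(Point, Point)> {
        self.pending.push_back(point);
        self.next_request()
    }

    /// The previous batch finished; request whatever accumulated meanwhile.
    fn on_batch_done(&mut self) -> Option<(Point, Point)> {
        self.in_flight = false;
        self.next_request()
    }

    fn next_request(&mut self) -> Option<(Point, Point)> {
        if self.in_flight || self.pending.is_empty() {
            return None;
        }
        let take = self.pending.len().min(self.max_batch);
        let batch: Vec<Point> = self.pending.drain(..take).collect();
        self.in_flight = true;
        Some((batch[0], batch[take - 1]))
    }
}

fn main() {
    let p = |slot| Point { slot, hash: slot };
    let mut coordinator = FetchCoordinator::new(3);
    for slot in 1..=5 {
        println!("header {slot}: {:?}", coordinator.on_header(p(slot)));
    }
    println!("batch done: {:?}", coordinator.on_batch_done());
    println!("batch done: {:?}", coordinator.on_batch_done());
}
```

The first header is requested on its own; the headers arriving while that request is in flight are then fetched as one range, capped at the maximum batch length.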
- Continued reworking CBOR decoders under #628; producing conformance tests from the Haskell node.
- Transaction body and its constituents are now "done" but missing proper test coverage
- Block and witnesses as well
- Reviewed @jeluard's proposal on observability
- Get 10+ independent SPO to operate and monitor a relay-capable Amaru node
- Finish #628
Worked on implementing the macros as described in the shared document.
Rust macros come with limitations that make it hard to coordinate 2 macros sharing some data. A macro only sees the tokens it is passed and can't resolve anything outside of them. Macro expansion order is also not guaranteed and can't be controlled. Furthermore, macros might be expanded in different contexts (e.g. in different crates), which prevents sharing data through static vars.
This leads us to 2 options:
- emit local code that will leverage rust type system to enforce schemas (works but cryptic error messages)
- rely on 3rd party lib (e.g. inventory)
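To give a feel for the first option, here is a minimal sketch using a declarative macro that turns a schema into a struct, so that a missing or mistyped required field becomes a compile error. It is not the macro being implemented: `define_schema!`, the generated `ValidateHeader` type and the stand-in field types are all assumptions for this example.

```rust
/// Generates a struct whose constructor demands every required field,
/// while optional fields start out as `None`.
macro_rules! define_schema {
    ($name:ident {
        required { $($req:ident : $req_ty:ty),* $(,)? }
        optional { $($opt:ident : $opt_ty:ty),* $(,)? }
    }) => {
        #[derive(Debug, Clone, PartialEq)]
        pub struct $name {
            $(pub $req: $req_ty,)*
            $(pub $opt: Option<$opt_ty>,)*
        }

        impl $name {
            pub fn new($($req: $req_ty),*) -> Self {
                Self { $($req,)* $($opt: None,)* }
            }
        }
    };
}

// Stand-in types: u64 for a slot, 32 bytes for a hash, String for a peer id.
define_schema! {
    ValidateHeader {
        required {
            point_slot: u64,
            point_hash: [u8; 32],
        }
        optional {
            peer_id: String
        }
    }
}

fn main() {
    // Omitting `point_hash` here would fail to compile.
    let mut fields = ValidateHeader::new(42, [0u8; 32]);
    fields.peer_id = Some("peer-1".to_string());
    println!("{fields:?}");
}
```

The trade-off mentioned above shows up quickly: a schema violation is reported as a type or arity error on the generated constructor, which is correct but not very descriptive.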
Tentative syntax is:

    define_schemas! {
        consensus {
            chain_sync {
                VALIDATE_HEADER {
                    required {
                        point_slot: Slot,
                        point_hash: Hash,
                    }
                    optional {
                        peer_id: PeerId
                    }
                }
            }
        }
    }

    #[trace(consensus::chain_sync::VALIDATE_HEADER)]
    fn import_headers(point_slot: Slot, point_hash: Hash, peer_id: PeerId) {
        do_extra_stuff(peer_id);
    }

    #[augment_trace(consensus::chain_sync::VALIDATE_HEADER)]
    fn do_extra_stuff(peer_id: PeerId) {
    }

Started work on an OTEL parser. It will serve as a foundation for generating schemas and detecting regressions.
Talked to Vitor from CF about his experience at Mina Foundation related to ZK use-cases and how we could apply them to amaru. Getting in touch with Agustin from Modulo-p.
Shared first draft of potential ZK use cases.
- Refine and document 100% of Amaru's traces
- Explore Zero-Knowledge use cases within Amaru
- finish work on OTEL macros
- make progress on OTEL parser
- Extend the test connecting 2 `Managers` over TCP, to show that the blockfetch protocol is working ok (issue #652).
  - First-pass implementation of the blockfetch protocol on the responder side since it wasn't yet implemented.
- Test that disconnections / reconnections from an upstream node work ok (issue: #644)
  - At first an initiator would not connect if the responder node was not started when using the `demo` script -> this was a script issue fixed by @rkuhn in #641.
  - Then the initiator would fail to reconnect if the upstream node was restarted
    -> Some stages (at the `Muxer` level) were missing some supervision.
    -> Some errors were treated as fatal by the chainsync stage (like a block fetch error).
I have a branch in progress fixing the reconnection issues and testing the fix.
- Being a relay node with a full blockfetch protocol support.
- General reliability.
- Addressing all the review comments on the PR implementing and testing the responder side of the blockfetch protocol (#643).
- Probably do a follow-up PR to be more restrictive on the returned blocks in the case of a `RequestRange` message.
  -> The specification is not very explicit about which blocks can be returned and when to return `NoBlocks`
  -> It would be good to extend the Cardano blueprints with a more precise specification.
- Finishing the tests / fixes for the reconnection issues.
  -> I would like to refactor the test to cover the real chainsync pipeline and not a stand-in pipeline, like I have right now.
- Continued learning about laziness, monads, monadic parsing
- Revise, refine and automate the snapshot production pipeline. Specifically, try to reduce memory usage by introducing lazy decoding of CBOR-in-CBOR in Block.hs
- Create a spike in a sandbox to simplify the problem and prototype solutions.
It has been a while since my last confession. When Arnaud, Eric and myself agreed to create a network stack based on pure-stage (for proper integration with our simulation testing) it was clear that pure-stage would need to learn a few new tricks, so that’s where I started:
- dynamic creation of stages was added to allow modelling the network stack for each connection as a set of stages
- stages created dynamically are supervised by their parent, which may choose to terminate when a child terminates or to receive the child’s termination as a message (tombstone) that can be handled; when a parent terminates, all its dynamically spawned child stages are aborted; when a statically created stage terminates, the simulation terminates
- messages can be scheduled to be sent to the current stage at a later point in time; this is required to handle timeouts and periodic activities within protocol handlers
- CallRef has been unified with StageRef, meaning that the API of using another stage’s services no longer has two flavours: the caller chooses whether to wait synchronously for the response (`eff.call(...)`) or to eventually receive the response via the mailbox (`eff.send(...)`)
With the right mental model it turned out that the Tokio implementation for scheduled messages as well as supervision tombstones was a lot simpler than I had feared (based on my Akka experience, but Akka has other features that complicate matters). With supervision in place, it becomes quite natural to model the connection handling as one stage for the muxer plus one stage per miniprotocol (separate for initiator and responder roles), all tied together by a connection stage that spawns them all. Termination is bubbled up, meaning that all protocols are cleared out together and the manager can then dispose of the underlying TCP connection.
Functional tests have been added to pure-stage that test the same scenarios against the simulation and Tokio runners, respectively. Identical behaviour is asserted by comparing the event traces written during the test, ensuring also that tracing works the same in both implementations.
The most difficult issue here was the scheduling test which relies on timers. Tokio uses a real clock, which can have a rather jumpy and unreliable performance on CI runners, so great care was taken to guarantee that the semantics of cancelling a message do not depend on whether the underlying timer already fired — if the message has not yet been processed then it can still be cancelled. Timers are also sorted so that they are delivered in the order of ascending scheduled timestamp.
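The two guarantees described above (cancellation works until the message is actually processed, and delivery follows ascending scheduled time) can be modelled without any real clock. The following is a sketch with a logical clock in milliseconds, not the pure-stage or Tokio implementation; `Schedule` and this `ScheduleId` layout are assumptions for illustration.

```rust
use std::collections::BTreeMap;

/// Ordered by deadline first, then by insertion order (derived `Ord` follows field order).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct ScheduleId {
    at_millis: u64,
    seq: u64,
}

struct Schedule<M> {
    next_seq: u64,
    pending: BTreeMap<ScheduleId, M>,
}

impl<M> Schedule<M> {
    fn new() -> Self {
        Self { next_seq: 0, pending: BTreeMap::new() }
    }

    fn schedule(&mut self, at_millis: u64, msg: M) -> ScheduleId {
        let id = ScheduleId { at_millis, seq: self.next_seq };
        self.next_seq += 1;
        self.pending.insert(id, msg);
        id
    }

    /// Succeeds as long as the message has not been handed out by `pop_due`,
    /// regardless of whether its deadline has already passed.
    fn cancel(&mut self, id: ScheduleId) -> Option<M> {
        self.pending.remove(&id)
    }

    /// Hands out the earliest message whose deadline is not in the future.
    fn pop_due(&mut self, now_millis: u64) -> Option<(ScheduleId, M)> {
        let id = *self.pending.keys().next()?;
        if id.at_millis > now_millis {
            return None;
        }
        self.pending.remove(&id).map(|msg| (id, msg))
    }
}

fn main() {
    let mut schedule = Schedule::new();
    let late = schedule.schedule(20, "late");
    let _early = schedule.schedule(10, "early");
    // The deadline of `late` may have passed, but it was not processed yet.
    println!("cancelled: {:?}", schedule.cancel(late));
    println!("due: {:?}", schedule.pop_due(100));
}
```

Because cancellation only looks at the pending set, its outcome does not depend on how late a timer fires, which is the property that makes the test stable on slow CI runners.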
Infrastructure was added to factor out generic processing logic from specified network state machine and application behaviour: in order to implement a miniprotocol, two state machines need to be given, implemented by way of the ProtocolState and StageState traits, which can then be wrapped up into a pure-stage stage by combining them with a protocol ID. The nice thing about this approach is that the ProtocolState machine can be verified by comparing its state transition graph to the Ouroboros specification. This leads us to the same level of compiler-checked conformance that the Haskell node enjoys.
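As an illustration of what "comparing the state transition graph to the specification" means, here is the blockfetch miniprotocol written as a plain transition function. The states and messages follow the Ouroboros network specification; the `State`, `Message` and `transition` names are assumptions for this sketch and do not reflect the actual `ProtocolState` trait signature.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Idle,
    Busy,
    Streaming,
    Done,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Message {
    RequestRange,
    ClientDone,
    StartBatch,
    NoBlocks,
    Block,
    BatchDone,
}

/// Returns the next state, or `None` if the message is not allowed in `state`.
fn transition(state: State, message: Message) -> Option<State> {
    use Message::*;
    use State::*;
    match (state, message) {
        (Idle, RequestRange) => Some(Busy),
        (Idle, ClientDone) => Some(Done),
        (Busy, StartBatch) => Some(Streaming),
        (Busy, NoBlocks) => Some(Idle),
        (Streaming, Block) => Some(Streaming),
        (Streaming, BatchDone) => Some(Idle),
        _ => None,
    }
}

fn main() {
    let messages = [
        Message::RequestRange,
        Message::StartBatch,
        Message::Block,
        Message::Block,
        Message::BatchDone,
        Message::ClientDone,
    ];
    let mut state = State::Idle;
    for message in messages {
        state = transition(state, message).expect("protocol violation");
        println!("{message:?} -> {state:?}");
    }
}
```

Since the whole graph lives in one `match`, it can be enumerated and checked edge by edge against the specification, and any message outside the graph is rejected as a protocol violation.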
Currently, buffer size limits are static and timeouts aren’t implemented. These will be added to the miniprotocol specification and should then be enforced automatically by the generic execution model, with no code needed in the specific protocol implementations.
Eric and I ported all the needed N2N protocols and their network message codecs from pallas_network to this new network stack, including transcribing the miniprotocol specs from the Ouroboros specification PDF. What we learnt during this exercise is that the handshake protocol as implemented by the Haskell node is not completely described in the PDF, the QueryReply message IS INDEED sent over the network with variant ID 3.
Along the way I replaced a bunch of u64 with their proper types, like BlockHeight, Slot, NetworkMagic, etc. This has affected many files and made rebases painful, which is why this long-running PR #612 ended up merging main to resolve conflicts.
The simulation runner of pure-stage has more internal complexity than it needs to have. This is natural, it happened in every such library that I worked on, it always grows for a while until the real structure becomes visible. And only at that point can it internally be simplified, which always takes some effort. I have not yet done this because the goal is to get to the Amaru Relay Demo ASAP.
The network stack is not yet optimized for latency and throughput, there are some TODOs in this regard in various places, but again, we want to get the relay running first — we are currently not limited by those issues, but they will bite us if we want to have more than a handful of connections.
- #612 is now ready to be merged. Roland has found a robust way to ensure the cancellation of scheduled messages.
  - The only red jobs are the end to end snapshot tests which are failing. They currently take too much time because they rely on too old snapshots.
  - The PR should be merged today.
- The test connecting 2 nodes via their `Manager` in memory but using TCP for their connection is now working (after a few fixes):
  - It shows that the initiator node eventually catches up with the responder node to be on the best chain (this is the chainsync protocol).
  - It shows that the responder node eventually collects all the transactions from the initiator (this is the txsubmission protocol).
- Started reviewing and wrapping up: #628
- Reworked cli for consistency and better UX
- Pair programming with @jeluard regarding the smart-contract part of the summit's code redemption app
- Get 10+ independent SPO to operate and monitor a relay-capable Amaru node
- Finish #628
- Continue Kernel migration from Pallas.
This week I worked on the finalization of the PR reworking the network stack in order to make it amenable to deterministic execution:
- Addressing coderabbit comments.
- Fixing simulation and replay tests.
- Fixing some FIXMEs to remove the use of global mutable state for the creation of `ConnectionIds` and `ScheduleIds`.
- Added roundtrip CBOR tests for the miniprotocols messages.
This supports:
- A tighter implementation of the network stack in `amaru` without using `pallas` crates for networking and primitives (and associated data types which were problematic).
- The possibility to:
  - Obtain production traces allowing us to replay issues deterministically.
  - Simulate the deterministic execution of a larger part of the node including waits and timeouts.
Next up:
- Fixing a flaky test on CI for that PR (related to scheduled tasks).
- Implementing an in-memory test with both a responder `Manager` and an initiator `Manager` showing that upstream <-> downstream connections work for all the mini-protocols (without having to start two full `amaru` nodes).
The first draft of the proposed observability changes is now out and has been shared with some people for feedback.
Experimented with using the otel collector as a helper tool to remotely log span collection. It featured a logging exporter that has been deprecated and can't be used anymore.
Its successor is the debug exporter, which can output JSON when combined with the service/telemetry/logs configuration. Unfortunately it wraps span logs in string-serialized batches inside an internal JSON log format, making it pretty unusable.
Switched to implementing a basic amaru-client crate with a CLI. It makes it very simple to listen remotely to amaru traces and print them as JSON to stderr.
Polishing the document, it now provides context regarding blockchain in general, cardano in particular and specific ideas for amaru. Will share next week for discussion.
Spent some time with @KtorZ brainstorming/learning about contracts and DApps.
- Refine and document 100% of Amaru's traces
- Explore Zero-Knowledge use cases within Amaru
Create PRs on amaru repo to amend existing EDR and move to new OTEL. Create tooling to generate documentation and schema for OTEL. Share first draft of ZK use cases for amaru.
- Set up a VM
- Forked ouroboros-network with a view to making a PR
- Cloned on VM, built and ran tests (OK)
- Tried to understand code of Block.hs (NOK)
- Started learning about laziness (and how to test for it), `forall`, `cborg` and `serialise`, monads...
- Revise, refine and automate the snapshot production pipeline. Specifically, try to reduce memory usage by introducing lazy decoding of CBOR-in-CBOR in Block.hs
- Try to create a spike in a sandbox to simplify the problem, continue learning, and prototype solutions.
- The need for some global mutable state to create `ConnectionId`s has been removed. A `Connections` data type now encapsulates the list of connections by id and deals with the increment of new ids.
- The need for some global mutable state to create `ScheduleId`s to schedule runnables during the simulation has been removed:
  - `ScheduleIds` maintains a counter for schedule ids inside the `SimulationRunner`
  - `ScheduledRunnables` keeps track of the runnables to execute at a later time, based on a scheduled id. Some unit tests have been added to check the behavior for adding/removing runnables (possibly at the same instant), and waking up the first available runnable.
- I added a `pure-stage` resource that was missing during the snapshot tests. Unfortunately they run for a long time and then time out.
  - => This needs to be investigated.
- There are 6 remaining `FIXMEs`:
  - `NETWORK_SEND_TIMEOUT` -> find the right value.
  - `call(fetch_block)` -> which timeout to use?
  - `wait_for_at_least` -> in the memory pool might wait forever.
  - connecting a slow peer in the network `Manager` needs to be delegated to another stage to avoid blocking the manager.
  - The `register_data_deserializer` function needs better documentation.
  - `terminate_stage` should add kill switch to scheduled external effects to terminate them and record source stage for scheduled messages to remove them.
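The move away from global mutable state for identifiers, described in the first item above, boils down to letting the owning data structure hand out ids. Here is a minimal sketch of that pattern; the fields and methods of this `Connections` type are guesses for illustration and not the actual Amaru definition.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct ConnectionId(u64);

/// Owns both the connections and the counter used to name them,
/// so no global counter is needed and tests get reproducible ids.
struct Connections<C> {
    next_id: u64,
    by_id: BTreeMap<ConnectionId, C>,
}

impl<C> Connections<C> {
    fn new() -> Self {
        Self { next_id: 0, by_id: BTreeMap::new() }
    }

    /// Registers a connection under a fresh id; ids are never reused.
    fn add(&mut self, connection: C) -> ConnectionId {
        let id = ConnectionId(self.next_id);
        self.next_id += 1;
        self.by_id.insert(id, connection);
        id
    }

    fn get(&self, id: ConnectionId) -> Option<&C> {
        self.by_id.get(&id)
    }

    fn remove(&mut self, id: ConnectionId) -> Option<C> {
        self.by_id.remove(&id)
    }

    fn len(&self) -> usize {
        self.by_id.len()
    }
}

fn main() {
    let mut connections = Connections::new();
    let alice = connections.add("alice");
    let bob = connections.add("bob");
    connections.remove(alice);
    println!("{bob:?} -> {:?}, {} open", connections.get(bob), connections.len());
}
```

Two independent instances (for example two simulated nodes in one test) each start counting from zero, which is what makes traces comparable between runs.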
This is the continuation of the work already started in December
- Extend the `pure-stage` crate in order to be able to:
  - Start new stages dynamically
  - Supervise stages in parent-child relationships (following classic actor system practices).
- Re-implement the miniprotocols on top of new `pure-stage` stages (in the `amaru_protocols` crate).
- Manage a peer connection as a stage.
- Manage the list of all connections as a stage (see `Manager`)
This massive piece of work is implemented in #612. Most of the work in the beginning of the year was to:
- Finish the implementation and tests of the miniprotocols as 2 state machines:
- A generic state machine representing the high level states and transitions in the network specification.
- A specific state machine holding the implementation state for each protocol, either on the initiator or the responder side.
- Finish the implementation and tests of the `pure-stage` evolutions, including a refactoring for calling a given stage and expecting a response.
As of today:
- The PR is passing all the unit tests.
- Coderabbit suggestions have been addressed.
- A list of `FIXME`s that need to be tackled before we can merge the PR has been tagged as `FIXME(network)`.
- The snapshot tests are failing.
- Drafted: https://github.com/pragma-org/amaru/pull/635
- Reviewed, sometimes tweaked, and integrated:
- https://github.com/pragma-org/amaru/pull/633
- https://github.com/pragma-org/amaru/pull/632
- https://github.com/pragma-org/amaru/pull/631
- https://github.com/pragma-org/amaru/pull/629
- https://github.com/pragma-org/amaru/pull/627
- https://github.com/pragma-org/amaru/pull/626
- https://github.com/pragma-org/amaru/pull/623
- Got a tour of https://github.com/pragma-org/amaru/pull/612 from Roland & Eric
- Relay node capability
- Amaru overall delivery
- Need to review in more depth some of the networking additions (the connection manager) from Roland;
- Create issues / discussions to capture the known work ahead that currently lives in my head.
Initiated the work towards improving our observability stack for both amaru devs and end users. Although we already have a great foundation, we identified some shortcomings we want to address in the short term.
The first step was to get full clarity on both tracing and OpenTelemetry, their respective concepts and how they map from one another. Then identify how they are currently used in the amaru codebase.
Although we already have an EDR detailing how we expect those to be used, it's apparent there are differences in usage. It's also pretty frequent that large changes are introduced without clear communication.
Finally, created a first document detailing the high-level goals and solution so that we can start iterating towards team agreement.
A first PR has been shared. It improves amaru compliance with OpenTelemetry semantic-conventions.
The next steps involve working on those specs, then follow up with code changes and development of the necessary tooling.
First steps at exploring ZK in the context of amaru. Firstly we identify relevant projects in the Cardano ecosystem and reach out to associated contacts. The idea is to gather knowledge about what is being done and what are the existing pain points. This work will continue asynchronously. Then we will list noticeable projects in other ecosystems.
I shared icarus, an experiment with running amaru in both mobile and desktop environments. It leverages the tauri lib, which allows creating complex UIs with HTML + JS.
It bootstraps and syncs the PreProd chain and is pretty smooth :D
This could be used as a foundation for ad-hoc DApps using amaru.
I fixed a regression I introduced in the CI last week. Those changes go in the direction of improving our real chain testing strategy.
- Refine and document 100% of Amaru's traces
- Explore Zero-Knowledge use cases within Amaru
- Share first observability document draft
- Work on zk use cases document