log :: 2026‐02
Tip
Pure-Stage Network Stack, Observability / OTEL UI, Deterministic Simulation, Relay Demo, Snapshots Pipeline
These notes focus on finishing and integrating the pure-stage network stack (upstream + downstream miniprotocols, blockfetch responder, and removal of acto/pallas-network), while reworking simulation to model full peer connections (handshake/keepalive/chainsync) and enable trace replay.
Observability is standardized via trace macros + generated schemas/docs with CI regression checks, plus a dev OTEL UI for runtime trace inspection and upcoming work to separate technical vs applicative traces.
Relay readiness work continues through demos and CI hardening: fixing flaky property tests, diagnosing end-to-end test timeouts, and addressing chainsync intersection/range edge cases and fetch failures.
Snapshot tooling is evolving (polaroid pipeline, legacy-format friction, nightly snapshot tests), alongside ongoing memory/perf investigations (uplc/bytehound, lazy CBOR decoding).
Finally merged trace PR. All new traces should rely on this building block. Initiated work towards having a more comprehensive trace support (e.g. separate technical from applicative traces).
Implemented a dev-oriented, web-based OTEL UI. It makes it easy to grasp the trace architecture at runtime and to spot perf issues. The UI itself is hosted on GitHub Pages and requires a locally running Docker-based bridge to stream local traces over a WebSocket.
PRs:
Started discussing with Pawel about digging into openvm proof verification on-chain.
- Refine and document 100% of Amaru's traces
- some more OTEL experimentation
- start looking at https://github.com/pragma-org/amaru/issues/645
This week was about:
- Putting the final touches to the PR fully integrating the new network stack into the node and reworking the simulation because of that. => It is now merged! (#682)
- Preparing a demo showing the use of `amaru` as a relay node in different topologies
This goes towards using amaru as a Relay node.
Next is:
- Make a PR for the demo
- Fix a flaky simulation test (https://github.com/pragma-org/amaru/actions/runs/22455904466/job/65036887672#step:9:363).
- Update the simulation animations which don't work anymore after the latest changes: https://github.com/pragma-org/amaru/issues/693
- In the process check if the simulation does a reasonably good job at testing the node.
The PR #682 has finally been merged!
The end-to-end tests were not passing because they were just emitting too many / not the right logs after rebasing this PR with the recent observability changes.
Sometimes a property test fails on CI and it's a good thing because that reveals a bug!
Not this time, we had a property test checking the cbor roundtrip for EraHistory that was just failing because it
was generating data triggering an overflow in the test. This is fixed now.
Running those tests on main was taking more and more time, and eventually timing out.
So I set them up to use fixed epochs like they do for PRs and added a nightly job to run them up to the latest epoch
with a timeout of 60 minutes. Eventually we will need to bootstrap the nodes to a more recent epoch to avoid such running times.
The demo of an Amaru node as a relay is now working for a simple cardano-node -> amaru -> amaru case. We can see blocks arriving and epoch transitions. I'm now going to try other topologies.
- Discussions on https://github.com/IntersectMBO/ouroboros-consensus/issues/1875
- Tested the fix (on main, not yet released) - it works.
- Updated the `polaroid` gist: https://gist.github.com/notunrandom/30670d4e7643b6750316e06c7b36965f
- Unfortunately, we can no longer use snapshot-converter to convert the result of `polaroid` to the Legacy format currently used by Amaru: https://github.com/IntersectMBO/ouroboros-consensus/issues/1875#issuecomment-3965431924
- Revise, refine and automate the snapshot production pipeline.
- Discuss next steps, given the snapshot-converter problem:
  - change Amaru so that it can import the snapshots produced by `polaroid`?
  - some other solution?
- Depending on the above decision, design and deploy a production pipeline for `polaroid`.
- The PR #682 has now been rebased with the most recent changes (rust formatting and observability improvements).
- Unfortunately the end-to-end tests don't pass because some blocks can't be fetched.
- I thought that it could be related to a bug in the blockfetch initiator since we end up in a state we shouldn't be in "unexpected action in state Busy: RequestRange".
- I attempted to remove that situation but the jobs still don't finish => I still need to investigate what's wrong there.
I did a review of Roland's PR (#681). I think that the separation with peer management (started in #671) is a great improvement. There are 2 question marks:
- Where do we do anchor management (it was done in the previous chain selection stage).
- Can we add back property tests showing what happens with a random tree of new tips coming in?
I started working on the demo, simply connecting:
- 1 upstream Haskell node
- 1 amaru node
- 1 downstream amaru node
That made me realize that we have an issue with finding the intersection in the chainsync protocol if the downstream node is too far behind. The fix is in #684.
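As a reminder of how intersection finding works in general, here is a minimal sketch. It does not reproduce the fix in #684. The `Point` type, the function names and the exponential spacing of candidate points are assumptions made for illustration: the initiator offers a few points from its own chain, and the responder answers with the first one it recognises.

```rust
/// A point on the chain: a slot number plus a (toy) block hash.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Point {
    pub slot: u64,
    pub hash: u64,
}

/// Responder side: return the first candidate that is on our chain, if any.
pub fn find_intersect(chain: &[Point], candidates: &[Point]) -> Option<Point> {
    candidates.iter().copied().find(|c| chain.contains(c))
}

/// Initiator side (hypothetical strategy): offer points at offsets
/// 0, 1, 2, 4, 8, ... back from the tip, and always finish with the oldest
/// known point so that very different chains can still intersect.
pub fn candidate_points(chain: &[Point]) -> Vec<Point> {
    let mut points = Vec::new();
    if chain.is_empty() {
        return points;
    }
    let last = chain.len() - 1;
    let mut offset = 0usize;
    while offset <= last {
        points.push(chain[last - offset]);
        offset = if offset == 0 { 1 } else { offset * 2 };
    }
    if points.last() != Some(&chain[0]) {
        points.push(chain[0]);
    }
    points
}
```

With a downstream chain whose tip has forked away from the upstream chain, the responder skips the unknown tip and answers with the most recent shared point.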
I'm using tmux for the demo, which, believe it or not, is a first for me, but it seems to be working ok and we see that the downstream node gets new tip updates via the relay amaru node.
I addressed some review comments and fixed two issues I found with:
- Shrinking: the error condition was not correctly diagnosed.
- Evaluation steps for the simulation: so far a simulation is bounded in terms of steps. If there are more peers, the simulation will need to run longer before we can check the chain property. I made the number of steps proportional to the number of peers but it would be nice to stop the simulation when some sort of fixed point has been reached.
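The two ideas in the last bullet (a step budget proportional to the number of peers, and stopping at a fixed point) can be sketched as follows. This is not the simulation code: `step_budget` and `run_until_fixed_point` are hypothetical names, and "fixed point" is taken to mean that one step leaves the state unchanged, which a real simulation would probably need to refine (for instance by also checking that no messages are in flight).

```rust
/// Hypothetical step budget: scale a base number of steps with the peer count.
pub fn step_budget(base_steps: usize, peers: usize) -> usize {
    base_steps.saturating_mul(peers.max(1))
}

/// Apply `step` until the state stops changing or the budget is exhausted.
/// Returns the final state and the number of state-changing steps executed.
pub fn run_until_fixed_point<S, F>(mut state: S, budget: usize, mut step: F) -> (S, usize)
where
    S: PartialEq,
    F: FnMut(&S) -> S,
{
    for executed in 0..budget {
        let next = step(&state);
        if next == state {
            // Nothing changed: a fixed point has been reached.
            return (state, executed);
        }
        state = next;
    }
    (state, budget)
}
```

The budget then only acts as a safety net for simulations that never stabilise.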
I also merged and reformatted all the files after the rustfmt change. It was a bit painful but now it's done :-).
Next steps:
- Finish the review live with Roland.
- Start preparing the demo for the end of the week.
Still holding the trace PR back to give some time to ensure everything is as smooth as expected. Digging into OTEL in general is also an opportunity to introduce tooling to get some understanding of what is going on.
Some things I've been testing:
- metrics from traces
- comprehensive web UI for fast troubleshooting
PRs:
- https://github.com/pragma-org/amaru/pull/679
- https://github.com/pragma-org/amaru/pull/680
- https://github.com/pragma-org/amaru/pull/638
cardano-zkvms is now a complete example and set of experiments showing what can be done with ZK in the Cardano world.
https://jeluard.github.io/cardano-zkvms/ details and shows how to prove the execution of an Aiken program end to end:
- compile an Aiken program to UPLC (in browser)
- execute this program (in browser)
- generate a proof of the same execution (on a remote backend)
- verify the associated commitment (in browser)
- verify the Stark proof (in browser)
It also explores:
- mcu proof verification (via qemu emulation of ESP32)
- what it would take to fork openvm to generate proofs that could be verified on chain
I do not intend to push the experimentation further just yet. It makes sense now to step back and consider what use-cases would be worth pushing those further.
- Refine and document 100% of Amaru's traces
- End-to-end execution and proof of one non-trivial Aiken program using an on-chain Groth16 verifier.
- some more OTEL experimentation
- finally merge the `trace` macro PR
- start looking at https://github.com/pragma-org/amaru/issues/645
I am still doing a pass of refactoring and fixes on the branch I'm currently working on:
- Better checks for schedule ids during replay (the previous version was working but hackily).
- Better check that a node is initialized by using a pure-stage breakpoint.
- Fix for the creation of properly encoded blocks for the simulation.
- Refactoring of the result of a test (`TestResult`) when used as a recursive input for shrinking.
A few more comments / refactorings are still necessary but the biggest thing next is the removal of Entry / Envelope (an entry with an arrival time), since this has an impact on the data we persist for later inspection or animation.
Testing main line amaru with #669 applied to see if there are other similar memory leaks that are not hit by the uplc test suite. I did not see evidence of uplc memory leaks. I did see new memory usage, including the fact that bytehound traces are larger and slower to process, suggesting a large number of short-lived allocations. Because of bytehound overhead I could not take a long enough sample to determine whether the new allocations were leaks or plateaued at some asymptote. I did not succeed in figuring out what the new allocations are and whether they were intentional. I attempted to take traces at each merged PR: finding out when the allocations were introduced would make tracking down the logic easier. Unfortunately, nothing meaningful was found through this search.
- Continue to track down new memory usage.
- Writing up how to use bytehound to investigate the situation using the PR as an example.
- Investigate the impact of rocksdb configuration on memory usage.
A draft PR is now created and passes all the tests, including:
- The production and test of traces (`run_simulator_with_traces`).
- The replay of a captured trace (`test_run_replay`).
- The generation of random roll forward / rollback events from upstream peers and check of the chainsync property (`run_simulator`).
I still need to comment / document / refactor some parts:
- [x] Initialize nodes without having to guess the number of steps necessary for the initialization.
- [.] Properly comparing ScheduleIds when replaying a trace (at the moment there is a discrepancy).
- [.] Remove the previous notion of World/Entries/Envelope that were a high-level view for representing simulation inputs / outputs. We now have nodes communicating via messages serialized as bytes so the situation is a bit different.
When this is done, I will open the PR for review.
Last week I participated in a Dagstuhl seminar that I had organised together with three professors around the topic of «Behavioural Types for Resilience». It is a series of roughly biennial seminars on behavioural types, i.e. the static specification and verification of program behaviour over time: whereas a data type specifies which values can be expressed, a behavioural type specifies which sequences of actions (or effects) are to be performed. This obviously aligns very well with the topic of protocol conformance and the Miniprotocols (which are a specific form of session types, which again is one of the kinds of behavioural types).
The topics discussed ranged from logical foundations and proof theories (mostly implemented in Rocq), via desirable programming language semantics (like the ability to express asynchronous communication instead of only synchronous, or “mixed choice” where two protocol participants may race to perform the next action instead of there only ever being one party to do the next step), to practical matters such as how to embed session typed (aspects of) programs in existing programming languages; here a noticeable shift has happened from Haskell to Rust.
The properties that behavioural typing disciplines seek to guarantee are usually deadlock freedom and communication safety (i.e. no unexpected messages) on one hand and liveness (i.e. the desirable actions will eventually happen) on the other hand. There will be a report with all the discussion topics and presentations later this year.
For Amaru, I brought several concrete problems to the seminar and received solutions or next steps in return (yay!):
-
How to most effectively and efficiently tie our protocol implementations to precise specifications (Miniprotocols)
The current implementation uses a DSL for building up a state machine graph that corresponds to the session type expressed in the Ouroboros network protocol specification. This can be improved using the Rust type-level machinery showcased in PR 675; the main implementation mechanism is called “typestate”, meaning that the state of the computation is tracked using type parameters. Such a refactoring should be considered after we have achieved the “Amaru as Relay Node” milestone, and it will get rid of lots of unit tests, replacing them with type checking.
-
How to avoid back pressure deadlocks in the cyclic consensus StageGraph
This is a worrisome topic that arises in any system that implements strict back pressure and involves cycling messaging patterns. In Akka Streams we didn’t solve it, the programmer needs to figure out where the problematic cycles are and where to put a buffer or conflation stage to unblock the system.
I now think we can do better, using recent research on how to track resource usage in concurrent or distributed systems and inject mitigating actions in case such a fixable deadlock occurs (basically: put a cork in the upstream, provide one extra queue slot somewhere in the cycle to unblock it, let it drain to a sufficiently low level of queue usage, then uncork the upstream). This will allow us to throttle incoming chainsync updates in such a way that the node keeps running at maximal speed without having to put in rate measurements, flow regulators and control loops, which always involve a lot of fine tuning to get optimal performance and avoid (catastrophic) control loop oscillations.
-
How to properly express pipelining such that the session types machinery can actually assess its correctness
This is something the Haskell implementation basically hand-waves and that I hacked into Amaru in a cheating fashion whose correctness requires fallible human reasoning — I see myself incapable of writing a proof that it is actually correct. Coincidentally, this problem is well-known in the session types community, encountering exponentially exploding types or type checking effort in the face of asynchronous messaging (which is what pipelining effectively introduces: initiator sends a message before the protocol foresees it, supposedly knowing when it is safe to do so). Recent progress in session types has moved the expressivity from regular to context-free languages, meaning that in addition to recursion on the prefix of the sequence of actions, we can now also express “bracketing” actions (i.e. the ability of recognising properly balanced parentheses that context free languages can while regular expressions cannot).
Conversations with researchers from Glasgow in particular have led to the idea of checking whether the typestate approach can be extended also to context free protocol specifications, effectively taking it from a regular automaton to a push-down automaton. This is a research direction I intend to follow up on over the coming months. For now, the existing miniprotocol hack seems to work and is all we have.
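To make the “typestate” idea from the first item concrete, here is a minimal sketch. It is not the code from PR 675: the `ChainSync` type, its states and its methods are hypothetical and cover only a tiny fragment of a chainsync-like protocol. The protocol state lives in a type parameter, so sending a message in the wrong state is a compile error rather than a failing unit test.

```rust
use std::marker::PhantomData;

// Protocol states, represented as types only (they carry no data).
pub struct Idle;
pub struct CanAwait;
pub struct Done;

/// A toy protocol session whose current state is tracked by `State`.
pub struct ChainSync<State> {
    transcript: Vec<&'static str>,
    _state: PhantomData<State>,
}

impl<State> ChainSync<State> {
    /// Record a message and move to the next state (private helper).
    fn step<Next>(mut self, message: &'static str) -> ChainSync<Next> {
        self.transcript.push(message);
        ChainSync { transcript: self.transcript, _state: PhantomData }
    }
}

impl ChainSync<Idle> {
    pub fn new() -> Self {
        ChainSync { transcript: Vec::new(), _state: PhantomData }
    }
    /// Only an idle session may request the next header.
    pub fn request_next(self) -> ChainSync<CanAwait> {
        self.step("RequestNext")
    }
    /// Only an idle session may terminate the protocol.
    pub fn done(self) -> ChainSync<Done> {
        self.step("Done")
    }
}

impl ChainSync<CanAwait> {
    /// The reply brings the session back to the idle state.
    pub fn roll_forward(self) -> ChainSync<Idle> {
        self.step("RollForward")
    }
}

impl ChainSync<Done> {
    pub fn transcript(&self) -> &[&'static str] {
        &self.transcript
    }
}
```

A call such as `ChainSync::<Idle>::new().roll_forward()` is rejected by the compiler because `roll_forward` only exists in the `CanAwait` state. Pipelining is exactly what this simple encoding cannot express, which is the motivation for the context-free extension discussed above.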
Finished migrating amaru codebase to the new trace macros. It proved a bit more challenging than expected to keep the existing behavior.
Pushed an initial implementation of uplc program zk proving using openvm. Both the original aiken uplc and the newer uplc-turbo (patched) can be proved and verified natively. Although, as noted before, no existing zkvm can generate proofs suitable for fast verification on the Cardano chain, openvm is a pretty mature option that could theoretically be amended to produce such proofs (thanks to its reliance on Halo2).
I made a quick experiment checking if amaru could be run on edge computing (recently rebranded as AI) platforms like cloudflare workers.
This is currently not doable for the following reasons:
- amaru can't be compiled to wasm32-wasip2 due to missing tokio support (among others)
- lack of comprehensive `wasi` support by the main edge platforms
Note that this is evolving pretty fast lately with the introduction of wasip3, and I would expect things to settle and move positively in the following months. Worth keeping an eye on.
- Refine and document 100% of Amaru's traces
- End-to-end execution and proof of one non-trivial Aiken program using an on-chain Groth16 verifier.
- finish traces e2e tests to ensure there are no regression
- light push towards making more from traces:
- traces call-graph
- e2e performance analysis
- rebuild the ledger from traces (can also help improving internal API)
Integration of the new network stack
This week, I worked on the integration of the new network stack in the amaru node, in order to support a fully deterministic
execution of the node:
- Integration of the network stages for connection management + mini protocols
- Rework of the simulation framework to support:
- The testing of traces across the whole stack.
- The simulation of random chainsync events arriving at a node from several peers.
- The replay of persisted traces.
Point n.1 is really interesting because this means that an Amaru node, whether it acts as an upstream or downstream node, can now execute fully deterministically and have its traces replayed (well, almost, they are not enabled in production yet :-)).
Reworking the simulation was not totally obvious. The generated "messages" used to be injected at the beginning of the consensus pipeline but now they had to be sent via the chainsync miniprotocol. This implies:
- Setting up a listener.
- Accepting a connection.
- Running the handshake miniprotocol.
- Starting the keepalive miniprotocol.
- Starting the chainsync protocol and finding an intersection.
- Sending roll forward and rollback messages.
This means that we effectively need to create peers (upstream and downstream of the node under test) that know how to run those protocols. Then the question is how/where to inject the chainsync events:
-
I've tried to inject them on upstream peers at the chainsync event level. => This is not good since this tests more nodes than necessary and creates the same issue recursively: how to reliably send those events to the upstream nodes?
-
I have tried to inject messages at the level of the Manager that handles all the connections => This is not great since this requires modifying the production code for test-only code paths. Even then it was difficult to set-up the upstream nodes in a consistent way.
-
I added a stage on top of the whole processing graph to:
  - Send `NewTip` messages to the `Manager` (those are production messages that are created once the node has determined a new best chain).
  - Before sending those messages, set up the `ChainStore` so that its best chain (and anchor, and best tip) corresponds exactly to the `NewTip` and the best chain we want to emulate.
Small improvements
On top of those changes, I made some improvements to the test checking the traces emitted by a node:
- It covers the network connection + miniprotocols initialization now.
- There is a bit more information on each captured trace.
- It is configurable to retain more or less targets and different log levels.
- In case of a failure it is easier to see the diff and to paste the actual values into the test expectations.
I will still need to align with the changes that @jeluard did in #638 to have a test that tests exactly the public part of the tracing API.
This supports the objective of making the node a full relay by end of Q1.
Unfortunately I had to tackle 2 issues at once: #663 and #664 and the resulting PR is large. I now need to review it, refactor some pieces, document it, and do a review pass with @rkuhn before we can merge.
Then, I will have to:
-
Rework the simulation animations (generated data + execution).
-
"Play" with the simulation a bit (or evolve it) to check if we really exercise interesting behavior. I'm thinking in particular in terms of:
- Timeouts.
- Reconnections.
- Errors/failures.
-
Add a property verifying that the tx submission protocol works.
- Revise, refine and automate the snapshot production pipeline. Specifically, try to reduce memory usage by introducing lazy decoding of CBOR-in-CBOR in Block.hs
Wait (work on another project)
The integration of the new network stack implies a major rework of the simulation code:
- For the simulation of a node with several peers and the verification of the chainsync property.
- For the simulation of a node to gather traces and make sure that they are not broken.
The second point is now done. We cover a lot more traces than before, since the test now covers the full initialization of a node + the tx submission protocol.
```rust
assert_spans_trees(
    execute,
    vec![json!(
        {
            "name": "handle_msg",
            "target": "amaru_consensus",
            "children": [
                // Protocol manager
                { "name": "manager", "target": "amaru_protocols::manager", "message_type": "AddPeer" },
                { "name": "manager", "target": "amaru_protocols::manager", "message_type": "Listen" },
                { "name": "manager", "target": "amaru_protocols::manager", "message_type": "Listen" },
                { "name": "manager", "target": "amaru_protocols::manager", "message_type": "Connect" },
                // Connection initialization
                { "name": "connection", "target": "amaru_protocols::connection", "conn_id": "0", "peer": "127.0.0.1:3000", "role": "Initiator", "message_type": "Initialize" },
                { "name": "manager", "target": "amaru_protocols::manager", "message_type": "Accepted" },
                { "name": "connection", "target": "amaru_protocols::connection", "conn_id": "1", "peer": "127.0.0.1:0", "role": "Responder", "message_type": "Initialize" },
                // Handshake
                { "name": "handshake.responder", "target": "amaru_protocols::handshake::responder", "message_type": "Propose" },
                { "name": "connection", "target": "amaru_protocols::connection", "conn_id": "1", "peer": "127.0.0.1:0", "role": "Responder", "message_type": "Handshake" },
                { "name": "handshake.initiator", "target": "amaru_protocols::handshake::initiator", "message_type": "Accept" },
                { "name": "connection", "target": "amaru_protocols::connection", "conn_id": "0", "peer": "127.0.0.1:3000", "role": "Initiator", "message_type": "Handshake" },
                // Chainsync + Tx submission
                { "name": "diffusion.chain_sync", "target": "amaru_consensus::stages::pull", "message_type": "Initialize" },
                { "name": "tx_submission.responder", "target": "amaru_protocols::tx_submission::responder", "message_type": "Init" },
                { "name": "chainsync.responder", "target": "amaru_protocols::chainsync::responder", "message_type": "FindIntersect" },
                { "name": "tx_submission.initiator", "target": "amaru_protocols::tx_submission::initiator", "message_type": "RequestTxIdsBlocking" },
                { "name": "chainsync.initiator", "target": "amaru_protocols::chainsync::initiator", "message_type": "IntersectFound" },
                { "name": "diffusion.chain_sync", "target": "amaru_consensus::stages::pull", "message_type": "IntersectFound" },
                { "name": "tx_submission.responder", "target": "amaru_protocols::tx_submission::responder", "message_type": "ReplyTxIds" },
                { "name": "chainsync.responder", "target": "amaru_protocols::chainsync::responder", "message_type": "RequestNext" },
                { "name": "chainsync.responder", "target": "amaru_protocols::chainsync::responder", "message_type": "RequestNext" },
                { "name": "tx_submission.initiator", "target": "amaru_protocols::tx_submission::initiator", "message_type": "RequestTxIdsBlocking" },
                { "name": "chainsync.initiator", "target": "amaru_protocols::chainsync::initiator", "message_type": "RollBackward" },
                {
                    "name": "diffusion.chain_sync",
                    "target": "amaru_consensus::stages::pull",
                    "message_type": "RollBackward",
                    "children": [
                        { "name": "chain_sync.receive_header", "target": "amaru_consensus::stages::receive_header" },
                        { "name": "chain_sync.validate_header", "target": "amaru_consensus::stages::validate_header" },
                        { "name": "diffusion.fetch_block", "target": "amaru_consensus::stages::fetch_block" },
                        { "name": "chain_sync.validate_block", "target": "amaru_consensus::stages::validate_block" },
                        { "name": "chain_sync.select_chain", "target": "amaru_consensus::stages::select_chain" }
                    ]
                },
                { "name": "tx_submission.responder", "target": "amaru_protocols::tx_submission::responder", "message_type": "ReplyTxIds" },
                { "name": "chainsync.initiator", "target": "amaru_protocols::chainsync::initiator", "message_type": "AwaitReply" },
                { "name": "tx_submission.initiator", "target": "amaru_protocols::tx_submission::initiator", "message_type": "RequestTxIdsBlocking" },
                { "name": "tx_submission.responder", "target": "amaru_protocols::tx_submission::responder", "message_type": "ReplyTxIds" },
                { "name": "tx_submission.initiator", "target": "amaru_protocols::tx_submission::initiator", "message_type": "RequestTxIdsBlocking" },
                { "name": "tx_submission.responder", "target": "amaru_protocols::tx_submission::responder", "message_type": "ReplyTxIds" },
                { "name": "tx_submission.initiator", "target": "amaru_protocols::tx_submission::initiator", "message_type": "RequestTxIdsBlocking" },
                { "name": "tx_submission.responder", "target": "amaru_protocols::tx_submission::responder", "message_type": "ReplyTxIds" },
                { "name": "tx_submission.initiator", "target": "amaru_protocols::tx_submission::initiator", "message_type": "RequestTxIdsBlocking" }
            ]
        }
    )],
    vec!["amaru_consensus", "amaru_protocols"],
    vec!["amaru_protocols::mux"],
);
```
Note for @jeluard: I will need to integrate your recent changes to normalize traces.
The rework of the simulation tests is also underway but it is a lot more complex to setup the proper topology and forward the necessary events to simulate roll forwards and rollbacks.
I pushed the macros implementation alongside a first pass at migrating all instrument usage. There will probably be more changes / added traces in the near future.
This also made it possible to:
- have a script to generate a `traces-schema.json` JSON schema of all generated spans
- have a script to generate documentation of those spans
- have CI ensure changes do not lead to unintentional changes to schemas/docs
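The CI check in the last bullet boils down to regenerating the schema and comparing it with the committed copy. The sketch below shows that comparison in isolation. It is not the actual script: `schema_drift` is a hypothetical helper, and a real job would read both inputs from files and fail the build when the result is non-empty.

```rust
/// Compare the committed schema with a freshly generated one.
/// Returns the 1-based numbers of the lines that differ (empty if identical).
pub fn schema_drift(committed: &str, generated: &str) -> Vec<usize> {
    let old: Vec<&str> = committed.lines().collect();
    let new: Vec<&str> = generated.lines().collect();
    (0..old.len().max(new.len()))
        // `get` returns None past the end, so added or removed lines count as drift.
        .filter(|&i| old.get(i) != new.get(i))
        .map(|i| i + 1)
        .collect()
}
```

A common alternative with the same effect is to regenerate the files in CI and run `git diff --exit-code` on them.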
Experimented with generating proofs of uplc compilation. Some feature flag can go a long way. Comprehensive repository with experiments coming soon.
Made a POC of amaru syncing on WearOS (aka android) based watches. See https://github.com/jeluard/amaru-wear/
- Refine and document 100% of Amaru's traces
- End-to-end execution and proof of one non-trivial Aiken program using an on-chain Groth16 verifier.
- look at how traces can be made more comprehensive
- light push towards making more from traces:
- traces call-graph
- e2e performance analysis
- rebuild the ledger from traces (can also help improving internal API)
- share comprehensive zk repos
This week was about:
-
Pushing previous PRs to the finish line (blockfetch protocol responder side / reconnection tests).
-
Integrating the new network stack into the whole node and removing the old acto/gasket code (#663)
- This is actually not the hard part!
- Running the simulation with the new network stack:
- I created another issue for this (#664) and #663 relies on it otherwise the tests can't pass.
- This is actually the difficult part of the new network stack integration since we need to:
- Implement message passing between nodes in-memory and not with TCP.
- Model more or less full nodes to act as peers in the simulation test.
- Incorporate the full connection cycle (handshake, keep alive) to the test.
- Find a way to generate and inject test scenarios in each node (whereas we previously used an abstract tree of rollforwards and rollbacks)
- Find a way to reproduce various arrival times for messages (actually I don't think that this is necessary since the simulation interleaving should be able to simulate this).
- Collect outcomes, either by listening to messages in downstream nodes or checking their state (what's the tip?).
- Planning the rework of the stage graph:
- We had a meeting with Roland to arrive at the diagram presented here.
- The notable elements are:
- A stage dedicated to peer tracking, collecting information from various other stages.
- Block fetch and validation after chain selection.
- Validating a block allows a roll forward event to be sent downstream.
- Blocks are streamed to the validate block stage but we need to control their arrival because the graph is cyclic and would risk deadlocks otherwise.
- Rollback events from select chain to downstream peers is not shown on the diagram.
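The in-memory message passing mentioned in the list above (nodes exchanging bytes without TCP) can be illustrated with a pair of connected endpoints. This is only a sketch under assumptions: `Endpoint` and `connect` are hypothetical names, and standard library channels stand in for whatever the deterministic pure-stage simulation actually uses.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// One side of an in-memory, bidirectional byte connection.
pub struct Endpoint {
    tx: Sender<Vec<u8>>,
    rx: Receiver<Vec<u8>>,
}

impl Endpoint {
    /// Send a message; returns false if the other side has been dropped.
    pub fn send(&self, bytes: &[u8]) -> bool {
        self.tx.send(bytes.to_vec()).is_ok()
    }

    /// Receive the next message without blocking, if one is available.
    pub fn try_recv(&self) -> Option<Vec<u8>> {
        self.rx.try_recv().ok()
    }
}

/// Create two endpoints wired to each other, replacing a TCP connection.
pub fn connect() -> (Endpoint, Endpoint) {
    let (a_tx, b_rx) = channel();
    let (b_tx, a_rx) = channel();
    (Endpoint { tx: a_tx, rx: a_rx }, Endpoint { tx: b_tx, rx: b_rx })
}
```

Messages arrive in the order they were sent, and a dropped peer shows up as a failed `send`, which gives a simulation a simple hook for modelling disconnections.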
This work supports the node eventually being a full relay node with peer management.
Follow the plan for reworking the simulation support.
Continued working on lazy decoding in sandbox. Tried two different approaches without success.
- Revise, refine and automate the snapshot production pipeline. Specifically, try to reduce memory usage by introducing lazy decoding of CBOR-in-CBOR in Block.hs
- Open issue on ouroboros-consensus to notify of db-analyser bug which is blocking the previous approach of producing snapshots using existing tools.
- Switch to Kupo
https://github.com/pragma-org/uplc/pull/36 was merged, so I opened https://github.com/pragma-org/amaru/pull/669, which pointed out that we had missed a spot, and so we merged https://github.com/pragma-org/uplc/pull/37.
https://github.com/txpipe/pallas/pull/721 was merged without comment, so I followed up by opening https://github.com/txpipe/pallas/pull/728 and started preparing the next PR, which I will open when that one merges. This involves setting up a repeatable benchmark; I used the number of allocations created by the test suite. It looks like some small changes should be able to reduce it by at least 50%.
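One way to obtain an allocation count as a repeatable metric is a counting global allocator. The sketch below is an illustration only and is not necessarily how the numbers above were produced (they may well come from bytehound). `CountingAllocator` and `count_allocations` are hypothetical names; the `GlobalAlloc` trait and `System` allocator are from the standard library.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Wraps the system allocator and counts every allocation request.
pub struct CountingAllocator;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        // Delegate the actual work to the system allocator.
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAllocator = CountingAllocator;

/// Total number of allocations made by the process so far.
pub fn allocations() -> usize {
    ALLOCATIONS.load(Ordering::Relaxed)
}

/// Run `f` and report how many allocations happened meanwhile
/// (allocations from other threads are included).
pub fn count_allocations<T>(f: impl FnOnce() -> T) -> (T, usize) {
    let before = allocations();
    let value = f();
    (value, allocations() - before)
}
```

Because the counter is global, the numbers are only stable when the measured code runs on a single thread, for example with a single-threaded test run.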
- Testing main line amaru with that PR applied to uplc to see if there are other similar memory leaks that are not hit by the uplc test suite.
- Writing up how to use bytehound to investigate the situation using the PR as an example.
- Investigate the impact of rocksdb configuration on memory usage.
Now that the new network stack (based on pure-stage) supports both upstream and downstream protocols, it can be
fully integrated in the node. When this is done we can remove previous code that was using acto and pallas-network to
forward chain events to downstream peers.
That integration is now done... on the production side, that is, for the production code that uses a tokio runner.
On the simulation side, things are slightly more complicated. We would like the new network code to also be simulated instead
of just receiving / emitting chainsync events. This means being able to have simulated peers "connect" to a node, and participate
in the mini-protocols with handshake, keepalive and chainsync initialization.
It's not yet clear to me where the best place to inject those messages is, or how to run the simulated peers, but that's the next step!
After the recent move away from pallas crates I had a not-so-small rebase to do on my PR implementing the responder side of the blockfetch protocol (#643). Then some tests started failing and when I tried a real connection with an upstream Cardano Haskell node I got a decoding error.
That eventually led me to more carefully refine 2 notions for blocks:
- Raw blocks: this is the pure encoded form of a block transmitted over the wire. Basically a `Vec<u8>`.
  - This is what is eventually stored in the database, indexed by hash.
  - And when sent over the wire this is what is retrieved by hash, and sent as-is.
- Network blocks: this is an encapsulation of the previous block where we can access:
  - The era tag defining the block encoding.
  - The decoded `amaru_kernel::Block` with all its fields.
  - The block header with all its fields (which means its point, slot, hash).
  - The original `RawBlock`
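The distinction between the two notions can be sketched with a few types. This is not the actual amaru code: only the name `RawBlock` comes from the notes above, while `Header`, `NetworkBlock`, `BlockStore` and their fields are simplified, hypothetical stand-ins, and the decoded `amaru_kernel::Block` is left out to keep the example self-contained.

```rust
use std::collections::HashMap;

/// The encoded bytes of a block, exactly as transmitted over the wire.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RawBlock(pub Vec<u8>);

/// Simplified header: just enough to identify the block.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Header {
    pub slot: u64,
    pub hash: [u8; 32],
}

/// A decoded view of a block that keeps the original bytes around.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct NetworkBlock {
    era_tag: u16,
    header: Header,
    raw: RawBlock,
}

impl NetworkBlock {
    pub fn new(era_tag: u16, header: Header, raw: RawBlock) -> Self {
        NetworkBlock { era_tag, header, raw }
    }
    pub fn era_tag(&self) -> u16 {
        self.era_tag
    }
    pub fn header(&self) -> &Header {
        &self.header
    }
    pub fn raw(&self) -> &RawBlock {
        &self.raw
    }
}

/// Toy store: raw blocks indexed by hash, returned as-is when served.
#[derive(Default)]
pub struct BlockStore {
    blocks: HashMap<[u8; 32], RawBlock>,
}

impl BlockStore {
    pub fn put(&mut self, block: &NetworkBlock) {
        self.blocks.insert(block.header().hash, block.raw().clone());
    }
    pub fn get(&self, hash: &[u8; 32]) -> Option<&RawBlock> {
        self.blocks.get(hash)
    }
}
```

Keeping the original bytes inside the decoded view means a responder never has to re-encode a block before serving it, which avoids any mismatch between the stored and transmitted forms.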