log :: 2026‐05
Tip
Relay Demo, Mempool / TxSubmission, Dynamic Peers, Snapshots, Metrics & Traces
These notes focus on relay readiness: running Amaru nodes on preprod, wiring ChainSync + TxSubmission end-to-end, and packaging a reusable demo with telemetry, wallets, scripts, and configurable networks.
Mempool work reaches MVP status with validation, rollback fixes, local transaction submission, txsubmission integration, and metrics aligned with cardano-node.
Dynamic upstream peer management and chain-store tooling improve robustness, with peer eviction on validation errors, block-fetch coordination across peers, and DB repair/inspection commands.
Snapshot and bootstrapping work is simplified around self-contained bootstrap files and newer snapshot formats, while observability work restores meaningful consensus spans, dashboards, metrics browsing, and Grafana/Tempo traces.
Merged changes based on feedback. Implemented and merged the removal of protocol v9 support.
- Continuously validate Amaru against the latest available ledger snapshot within a day of an epoch ending
- node diversity workshop
- experiment with relay
Quick update:
- The demo code has been rebased on all the latest changes.
- There's an in-progress branch where I'm working on putting the spans back, now that the consensus graph has been updated to a more stable and complete form (open the PR description for a Grafana / Tempo screenshot).
- I have adjusted the demo scripts to use the latest snapshot format, and successfully synchronized my amaru nodes on preprod. -> I did a quick fix to the snapshot code but also enabled the user to specify their own bootstrap config directory.
Next up is finishing the work on spans, at least as a first pass, because we can still discuss the structure of the traces
we want to display, especially since some stages aggregate work, like fetch_block.
For now I have taken the following approach:
- The first reception of a header marks the beginning of a trace.
- If that header leads to fetching a block range, that activity is registered under the same trace.
- This means that the processing of other headers that did not trigger a block fetch will go to separate traces that will be stopped short.
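The trace-assignment rules above can be modelled with a small registry that maps each header to the trace opened when it was first received. This is a minimal sketch, not Amaru's tracing code: `TraceRegistry`, `on_header_received`, `on_block_fetch` and the stage names are hypothetical, and real traces would be OpenTelemetry spans carried between stages rather than strings.

```rust
use std::collections::HashMap;

// Hypothetical identifiers for this sketch; Amaru's real types differ.
type HeaderHash = [u8; 32];
type TraceId = u64;

/// Assigns headers to traces: the first reception of a header opens a trace,
/// and a block fetch triggered by that header is recorded under the same trace.
#[derive(Default)]
struct TraceRegistry {
    next_id: TraceId,
    by_header: HashMap<HeaderHash, TraceId>,
    /// Stage names recorded per trace, in order.
    events: HashMap<TraceId, Vec<&'static str>>,
}

impl TraceRegistry {
    /// Returns the header's trace, opening a new one on first reception.
    fn on_header_received(&mut self, header: HeaderHash) -> TraceId {
        if let Some(&id) = self.by_header.get(&header) {
            return id;
        }
        let id = self.next_id;
        self.next_id += 1;
        self.by_header.insert(header, id);
        self.events.entry(id).or_default().push("receive_header");
        id
    }

    /// Records a block fetch under the trace of the header that triggered it.
    /// Returns `None` when that header was never received (no trace to join).
    fn on_block_fetch(&mut self, header: HeaderHash) -> Option<TraceId> {
        let id = *self.by_header.get(&header)?;
        self.events.entry(id).or_default().push("fetch_block");
        Some(id)
    }
}
```

A header that never triggers a block fetch keeps a single `receive_header` event, which corresponds to the short traces described above.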
Next up is filling in the blanks and making sure that all the other stages are present in the trace.
The PR #756 now works properly, with some management of multiple upstream peers:
- slow peers skip FetchBlocks requests that were already serviced by other peers
- header validation errors kick out peers, as they should
- static peers are preferred for new connections after the cooldown period when looking for new peers
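The three peer-management rules above can be illustrated with a toy peer manager. This is a sketch under assumptions: `PeerManager`, its fields, the `(first slot, last slot)` range key and the cooldown bookkeeping are hypothetical and only show the intended behaviour, not the code in PR #756.

```rust
use std::collections::{HashMap, HashSet};
use std::time::{Duration, Instant};

// Hypothetical identifiers: a peer name and a block range (first slot, last slot).
type PeerId = &'static str;
type BlockRange = (u64, u64);

struct PeerManager {
    static_peers: Vec<PeerId>,
    other_peers: Vec<PeerId>,
    connected: HashSet<PeerId>,
    /// Evicted peers and the instant at which they may be reconnected.
    cooling_down: HashMap<PeerId, Instant>,
    cooldown: Duration,
    /// Block ranges that some peer has already delivered.
    serviced: HashSet<BlockRange>,
}

impl PeerManager {
    fn new(static_peers: Vec<PeerId>, other_peers: Vec<PeerId>, cooldown: Duration) -> Self {
        Self {
            static_peers,
            other_peers,
            connected: HashSet::new(),
            cooling_down: HashMap::new(),
            cooldown,
            serviced: HashSet::new(),
        }
    }

    /// A slow peer skips a FetchBlocks request that another peer already serviced.
    fn should_fetch(&self, range: BlockRange) -> bool {
        !self.serviced.contains(&range)
    }

    fn mark_serviced(&mut self, range: BlockRange) {
        self.serviced.insert(range);
    }

    /// A header validation error disconnects the peer and starts its cooldown.
    fn on_validation_error(&mut self, peer: PeerId, now: Instant) {
        self.connected.remove(peer);
        self.cooling_down.insert(peer, now + self.cooldown);
    }

    /// A peer is available if it is not connected and its cooldown (if any) is over.
    fn is_available(&self, peer: &str, now: Instant) -> bool {
        !self.connected.contains(peer)
            && self.cooling_down.get(peer).map_or(true, |until| now >= *until)
    }

    /// Picks the next peer to connect to, trying static peers first.
    fn next_connection(&self, now: Instant) -> Option<PeerId> {
        self.static_peers
            .iter()
            .chain(self.other_peers.iter())
            .copied()
            .find(|peer| self.is_available(peer, now))
    }
}
```

Trying static peers first in a single ordered scan is what makes them preferred again as soon as their cooldown expires.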
- `dump-chain-db -c <hash or point>` prints all children, including invalid blocks, to diagnose DB issues
- `dump-chain-db -f` runs the best candidate search and prints the result
- `dump-chain-db -a <hash or point>` prints ancestors from a point or hash (because we often have only hashes)
- `remove-validation-status [<hash or point>]` can be used to fix a DB if some blocks were marked invalid due to Amaru bugs
- a `remove-chain` subcommand was added to play with the DB or recover from a state where headers were too far ahead of the ledger
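The children and ancestors queries behind these commands amount to walking parent links in the chain store. The sketch below uses a hypothetical in-memory `ChainDb` with `u32` hashes; it does not reproduce Amaru's ChainStore API or the commands' actual output, and it leaves out the best candidate search.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical short hash type; real headers are identified by 32-byte hashes.
type Hash = u32;

/// Minimal in-memory stand-in for a chain store: parent links plus invalid markers.
#[derive(Default)]
struct ChainDb {
    parents: HashMap<Hash, Option<Hash>>,
    invalid: HashSet<Hash>,
}

impl ChainDb {
    fn insert(&mut self, hash: Hash, parent: Option<Hash>) {
        self.parents.insert(hash, parent);
    }

    /// Direct children of `hash`, including blocks marked invalid (sorted for stable output).
    fn children(&self, hash: Hash) -> Vec<Hash> {
        let mut out: Vec<Hash> = self
            .parents
            .iter()
            .filter(|(_, parent)| **parent == Some(hash))
            .map(|(child, _)| *child)
            .collect();
        out.sort_unstable();
        out
    }

    /// Follows parent links from `hash` back to the first stored header.
    fn ancestors(&self, hash: Hash) -> Vec<Hash> {
        let mut out = Vec::new();
        let mut current = self.parents.get(&hash).copied().flatten();
        while let Some(h) = current {
            out.push(h);
            current = self.parents.get(&h).copied().flatten();
        }
        out
    }

    /// Clears the invalid marker of one block, or of every block when `hash` is `None`.
    fn remove_validation_status(&mut self, hash: Option<Hash>) {
        match hash {
            Some(h) => {
                self.invalid.remove(&h);
            }
            None => self.invalid.clear(),
        }
    }
}
```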
Now works up to epoch 524, failing at point=141879517.ece7b5e6074936e8bc0264f65a8eb5ce834af63a44c0da7215f018f3e69e6f5d with
BlockValidationError: Invalid block: Transaction e1e399ad0bb7 at index 12 is invalid: transaction failed phase one validation: invalid transaction scripts: script integrity hash mismatch
This is probably fixed on some branch, I recall @yHSJ mentioning something on Discord.
Made changes based on feedback:
- bootstrap files are now self-sufficient, no need for bootstrap time peer access
- bootstrap creation requires access to the `db-analyser` binary; no more `docker` usage
- general UX improvements (errors, naming)
Worked on simplifying and consolidating monitoring infrastructure. Waiting for feedback here.
- Continuously validate Amaru against the latest available ledger snapshot within a day of an epoch ending
- Push metrics support on par with cardano-node
- merge PRs
- decide if we want to push the monitoring work
- remove v9 support
- RPI
The mempool + txsubmission fixes PR has now been merged on main:
- This means that anyone should be able to run an `amaru` node participating in the txsubmission protocol.
- A user can also submit transactions locally.
- Follow-up issues
The demo is now running ok on preprod with a pending question around the availability of stake distributions and how far
in advance we should query headers:
- #825 -> to be discussed
On preprod the demo shows:
- A complete panel with running nodes and some ad-hoc scripts
- `1-mithril-refresh` checks out the latest mithril snapshot.
- `2-setup` initializes the database and compiles the executables (except `cardano-node` that needs to be provided).
- `3-cardano-node` starts a `cardano-node`.
- `4-amaru-middle` starts an `amaru` node downstream of the cardano node.
- `5-amaru-downstream` starts an `amaru` node downstream of the `amaru-middle` node.
- `6-refuel-submit-wallet` makes sure that there are enough inputs to create 10 concurrent transactions (an initial wallet address must be funded).
- `7-submit-tx` submits one tx on the `amaru-downstream` node but can be scaled to 10 concurrent instances.
- `8-telemetry` can be started on demand to observe traces (in Grafana Tempo) + a dashboard for mempool metrics.
- `9-watch` is a panel collecting all logs.
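The refuel step exists because concurrent transactions must each spend distinct inputs: two transactions spending the same UTxO would conflict in the mempool. A minimal sketch of the planning logic, assuming hypothetical `usable_inputs` / `refuel_outputs` helpers, amounts in lovelace, a fixed minimum amount per transaction, and ignoring the refuel transaction's own fee:

```rust
/// Number of UTxOs that can each fund one transaction on their own.
fn usable_inputs(utxos: &[u64], min_per_tx: u64) -> usize {
    utxos.iter().filter(|v| **v >= min_per_tx).count()
}

/// Plans the outputs of a refuel transaction so that `target` concurrent
/// transactions each get their own input. Returns an empty plan when the wallet
/// already has enough usable inputs, and `None` when the funds are insufficient.
/// Assumption: the refuel transaction spends the whole wallet and its fee is ignored.
fn refuel_outputs(utxos: &[u64], min_per_tx: u64, target: usize) -> Option<Vec<u64>> {
    if target == 0 || usable_inputs(utxos, min_per_tx) >= target {
        return Some(Vec::new()); // nothing to do
    }
    let total: u64 = utxos.iter().sum();
    let share = total / target as u64;
    if share < min_per_tx {
        return None; // not enough funds to give every transaction its own input
    }
    let mut outputs = vec![share; target];
    // Keep the rounding remainder in the last output so no funds are lost.
    outputs[target - 1] += total - share * target as u64;
    Some(outputs)
}
```

For example, a single funded UTxO of 100 ADA with a 5 ADA minimum per transaction would be split into ten 10 ADA outputs.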
The whole setup has been optimized to be frequently stopped and started again without having to necessarily re-compute everything.
The demo is parametrized with AMARU_NETWORK to be able to switch from one network to another.
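One way a Rust tool could resolve `AMARU_NETWORK` is sketched below. The accepted names, the case-insensitive parsing and the fallback to preprod when the variable is unset are assumptions for illustration; the network magic values (764824473 for mainnet, 1 for preprod, 2 for preview) are the standard ones for the public Cardano networks.

```rust
use std::env;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Network {
    Mainnet,
    Preprod,
    Preview,
}

impl Network {
    /// Network magic of the public Cardano networks.
    fn magic(self) -> u32 {
        match self {
            Network::Mainnet => 764_824_473,
            Network::Preprod => 1,
            Network::Preview => 2,
        }
    }
}

/// Parses a network name (hypothetical, case-insensitive spelling rules).
fn parse_network(name: &str) -> Result<Network, String> {
    match name.to_ascii_lowercase().as_str() {
        "mainnet" => Ok(Network::Mainnet),
        "preprod" => Ok(Network::Preprod),
        "preview" => Ok(Network::Preview),
        other => Err(format!("unknown network: {other}")),
    }
}

/// Reads AMARU_NETWORK, falling back to preprod when it is unset (assumed default).
fn network_from_env() -> Result<Network, String> {
    match env::var("AMARU_NETWORK") {
        Ok(value) => parse_network(&value),
        Err(env::VarError::NotPresent) => Ok(Network::Preprod),
        Err(e) => Err(e.to_string()),
    }
}
```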
I haven't yet been able to show that everything works for anyone using the node on preprod, let alone on preview:
- The new Van Rossem fork introduces new cost models that we haven't implemented yet: #837.
- There is an unmerged fix for duplicate redeemers:
- I fixed it here: #843 but it turns out that Jonathan has a better fix.
- Josh, I haven't had the possibility yet to check that fix in my environment.
- I'm still preparing a PR fixing an ordering issue on proposals
- I need to make a PR for a robustness issue that I found regarding the persistence of the ledger state.
- I need to rebase the `etorreborre/feat/mempool-observability` branch, which now contains:
  - mempool metrics.
  - fixed spans for the consensus stage.
- I want to submit a PR that bypasses the header validations when syncing from a mithril snapshot (I had an issue with this and it is not strictly necessary).
- There were other failing cases when syncing on `preview` that I need to investigate.
- I need to test on `mainnet`.
- Then, when all of this is done, I will go through the demo set-up, review it and submit a PR so that anyone can use it.
Still working on a demo showing the end-to-end use of amaru nodes for chainsync and txsubmission:
- I found a ledger issue regarding consumed inputs: https://github.com/pragma-org/amaru/pull/835.
- I fixed a possible transaction rollback bug: https://github.com/pragma-org/amaru/pull/838.
- I was temporarily blocked by the Van Rossem fork on preprod and had to fix a cost model issue: https://github.com/pragma-org/amaru/pull/837.
- This might become a problem for the demo since we need to produce transactions and see them be included in a block.
- So if there are transactions that we can't validate, we won't be able to show that part.
I started investigating how metrics and traces get produced:
- I have integrated a dashboard for transaction submission and mempool activity into the demo.
- I started updating the traces for the consensus. They were broken because we changed the structure of the consensus graph quite a lot, and we need to carefully pass spans around in order to display meaningful traces.
Since all this work is done on my demo branch, I will backport it to main when it stabilizes.
Iterate on feedback provided on the snapshot PR.
Pushed new metrics PR. Tested using the Prometheus endpoint.
Updated otel-ui so that it can browse metrics directly. Useful as a debugging tool.
- Continuously validate Amaru against the latest available ledger snapshot within a day of an epoch ending
- Push metrics support on par with cardano-node
- merge snapshots PR
- PI love
- Push metrics support on par with cardano-node
This week I worked on improving the demo showing the integration of the amaru node with other nodes, and fixing any issue coming in the way:
- Transaction ids are now correctly created #823
- Transaction ids are now correctly serialized on round-trip #809
- Transaction serialization is fixed and includes the proper era tags #812
- Downloaded but not validated blocks after a restart are now validated #805
- Headers can be ahead of the ledger and not have access to the correct stake distribution #825 -> still being discussed, need @KtorZ opinion on that one.
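The restart fix in #805 concerns blocks that were stored but never validated. The sketch below shows one way to pick them up again, assuming a hypothetical `StoredPoint` view of the best chain and a known ledger tip slot: everything past the ledger tip is re-queued oldest first, stopping at the first block whose body is missing (those would have to be fetched again). It is an illustration, not the code from the PR.

```rust
/// Hypothetical minimal view of a point on the stored best chain.
#[derive(Debug, Clone)]
struct StoredPoint {
    slot: u64,
    /// Whether the block body (not just the header) is in the store.
    has_block: bool,
}

/// After a restart, returns the stored blocks past the ledger tip, oldest first,
/// stopping at the first missing body so validation stays contiguous.
fn blocks_to_revalidate(best_chain: &[StoredPoint], ledger_tip_slot: u64) -> Vec<&StoredPoint> {
    let mut pending: Vec<&StoredPoint> = best_chain
        .iter()
        .filter(|p| p.slot > ledger_tip_slot)
        .collect();
    pending.sort_by_key(|p| p.slot);
    pending.into_iter().take_while(|p| p.has_block).collect()
}
```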
The current demo is working on the branch etorreborre/test/tx-submission-demo. It is based on 2 stacked PRs where the
most important one is the one finishing the implementation of the mempool #788.
Here is a screenshot of the current state.
Make the node ready as a production relay.
Next is:
- Show the telemetry support (and possibly improve some of the traces).
- Test on preview as well and make the script configurable w.r.t. the network.
- Probably some adjustments on the mempool PR after review.
- Eric: a bunch of incoming PRs; trying to merge and close as many as possible
era tag for transaction ids & transactions > adding a check for Conway & merging the transaction id from the body; refactoring; changes to the ledger related to the tags are coming; header validation fix to be reviewed by Roland and Matthias
- Overall discussion: clarify the assumptions (bootstrap from a snapshot; we use Conway blocks only; we currently get rid of history); for the hard fork we will cover the current epoch & the next one
- Joshy: created a new PR for fee calculation for missing reference script validation; using recursive logic might be better, to be looked into by Matthias
- Overall ledger rules discussion:
In the next release: reorganizing the ledger rules modules (complex / simple differentiation) is a good idea; building a set of conformance tests to verify we meet our standards; the Pallas workaround is to be implemented as well
- KtorZ: still reviewing PRs; spent time optimizing the CI and playing with the cache, it's now working and faster
might need to optimise the job scheduler as well; the simulation workflow is to be integrated / reworked later, for now we'll just ignore it; hard fork impacts to be measured by bumping the node versions; epoch transition to be done soon with a PR
- Jonathan: majority of the implementation almost done (to be merged)
branch to pick out the built-ins; changes to make the optimisations standalone and structured; branch to get the cost model logic extension working; once merged, the documentation will be updated
Pushed a PR with first working version of bootstrap creation / import.
Add new metrics to achieve cardano-node parity.
- Continuously validate Amaru against the latest available ledger snapshot within a day of an epoch ending
- Push metrics support on par with cardano-node
- PI love
- Push metrics support on par with cardano-node
This week, I:
- Finished the first version of the mempool logic + its interactions with the txsubmission protocol where I fixed a few things and aligned error cases with the Haskell node: PR #788.
- On top of that PR I updated the metrics to correspond more closely to the Haskell node: PR #791.
- Then I started running the node in a demo where I want to see:
- ChainSync working properly
- TxSubmission working properly
I found a few issues when running tests:
- Rolling back the ledger can fail: #801.
- Fetching blocks can fail if a node is connected to both upstream and downstream peers: #800.
- The initialization of the chain store after a bootstrap is incomplete: #805.
- When a node has been stopped, we might not resume block validation properly: #805.
For the demo, I now have to create and submit transactions. There is now a local submit API to do so, but it's still taking me some time to fully validate everything:
- The demo original setup was largely produced by Claude and needed some refactoring.
- The `tmux` setup that I had originally was a bit troublesome (at least for me) when I wanted to interact with the different panes and windows.
- I eventually decided to use https://f1bonacc1.github.io/process-compose/launcher, which is simpler and actually nicer.
- I encountered the bugs mentioned above (which is a good thing!).
- I managed to corrupt my db, which means that I had to resynchronize my amaru nodes just to get to the point where I could submit a transaction. This takes a while, so having an easy way to bootstrap from a more recent snapshot will really help.
Make the node ready as a production relay.
I need people to review / approve the open PRs. Probably the best people are:
- Matthias -> Rolling back the ledger can fail: #801.
- Roland -> Fetching blocks can fail if a node is connected to both upstream and downstream peers: #800.
- Roland -> The initialization of the chain store after a bootstrap is incomplete: #805.
- Roland -> When a node has been stopped, we might not resume block validation properly: #805.
- Roland or Matthias -> mempool logic.
- Julien -> mempool observability (depends on the previous one).
Then I will continue to run and package a simple demo (and certainly have to take some time to incorporate review comments).
- Eric: Mempool PR done (checking transactions and removing them from the mempool)
Removed interactions with the tx submission protocol; a bunch of shortcuts made in the Haskell implementation (reproducing)
- Eric: Working on observability metrics for the mempool (mirroring the Haskell ones)
Inject crates and transactions to see if they flow; simplified setups
- Eric: Small PR for the Ledger for Matthias
- JSHy: merging PR for the ledger
- JSHy: looking into a more agnostic test suite for a Cardano node (moving over some test vectors)
- JSHy: validation context updates
- KtorZ: will look into validation context
- KtorZ: look into Eric's PR
- KtorZ: epoch boundary transition: first get the logic right with unoptimized performance, then tweak it for performance
- Phule/Jonathan: gathering together the UPLC "dots"
- Roland: Dynamic upstream peer selection PR open, to be looked into & aligned with Eric's comments
Roland: positive that the demo content can be achieved by the end of this week
Used the acceptance process on every existing contract and produced a "proof of acceptance" for the work done on each scope
Trying out the process, which is showing its value, and tweaking it a little bit
The PR has been finally merged!
Due to the PR above I had to carefully rebase the latest mempool PR. In the process of testing it, I found some issues with:
- The block fetch responder (fix here: https://github.com/pragma-org/amaru/pull/800).
- The wake up of a blocking call in the tx-submission protocol.
I've fixed the wake-up call + spent some time checking more error cases with the tx-submission protocol. Next: I will test all of this, using the demo scripts that were used to show various node topologies.
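In TxSubmission, a blocking request for transaction ids must not be answered until the mempool has something new to offer, so the waiting side has to be woken up when a transaction arrives. A minimal sketch of that wake-up pattern with `std::sync::Condvar`; `SharedMempool`, `wait_for_tx_ids` and the timeout are hypothetical, and Amaru's stages are not built on OS threads and condition variables, so this only illustrates the semantics:

```rust
use std::sync::{Condvar, Mutex};
use std::time::Duration;

/// Hypothetical shared mempool: tx ids plus a condition variable that wakes
/// blocked tx-submission requests when new transactions arrive.
#[derive(Default)]
struct SharedMempool {
    tx_ids: Mutex<Vec<String>>,
    new_tx: Condvar,
}

impl SharedMempool {
    fn insert(&self, tx_id: &str) {
        self.tx_ids.lock().unwrap().push(tx_id.to_string());
        // Wake every blocked request; each one re-checks its own condition.
        self.new_tx.notify_all();
    }

    /// Blocking request: returns the ids after the first `already_sent` ones,
    /// waiting until at least one exists or the timeout elapses (then empty).
    fn wait_for_tx_ids(&self, already_sent: usize, timeout: Duration) -> Vec<String> {
        let guard = self.tx_ids.lock().unwrap();
        let (guard, _timed_out) = self
            .new_tx
            .wait_timeout_while(guard, timeout, |ids| ids.len() <= already_sent)
            .unwrap();
        guard.iter().skip(already_sent).cloned().collect()
    }
}
```

Checking the condition under the lock (rather than only reacting to the notification) avoids missing a transaction inserted just before the request starts waiting.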
Julien nudged me to better align the mempool metrics with the ones already exposed by the cardano-node, which I did. Now I need to do a bit of end-to-end testing with Prometheus.
Rebooted the effort by focusing only on simplifying snapshot creation from the CLI. Rely on older work from Arnaud, and make sure we align with Paolo's current initiative. The fewer moving parts, the better. Get rid of older, now useless things. Now directly import cardano-node InMem persistency.
- Continuously validate Amaru against the latest available ledger snapshot within a day of an epoch ending
Finally got zk/openvo proof verification on MCU over BLE working. 2 targets: Raspberry Pi Pico 2 and ESP32-S3. Both require additional RAM as PSRAM (8MB). It takes roughly 1 min to prove an arbitrary UPLC execution. It's not yet clear what realistic use case it unlocks, but it's definitely exciting.
- finish snapshot creation
- make RPI refresh progress
I addressed the remaining comments. There are still quite a few things we could do to improve the consistency of the
ChainStore API but we will tackle that later.
Note: there are now test functions to unit-test functions involved in pure-stage stages without having to run the
full stage code.
This PR adds checks to the mempool:
- It revalidates transactions when a new tip has been adopted.
- It removes transactions that are now part of the ledger.
- It makes sure that the txsubmission protocol does not try to add transactions to the mempool when it is near capacity.
Additionally:
- It exposes some parameters of the mempool and txsubmission protocol to the CLI.
- It adds some missing checks for the txsubmission protocol (and refactors the internals a bit to do so).
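The first three checks can be sketched with a toy mempool: transactions included in an adopted block are removed, the remaining ones are re-checked against the new ledger state, and new transactions are refused once the mempool would exceed its capacity. `Mempool`, `TxId`, `has_room_for` and the `still_valid` callback are hypothetical; in particular, a real revalidation reapplies transactions in order on top of the new ledger state (a transaction may depend on an earlier one), whereas the callback here checks each one independently.

```rust
use std::collections::{HashSet, VecDeque};

// Hypothetical transaction identifier for this sketch.
type TxId = u64;

struct Mempool {
    /// (tx id, size in bytes), kept in arrival order.
    txs: VecDeque<(TxId, usize)>,
    capacity_bytes: usize,
}

impl Mempool {
    fn new(capacity_bytes: usize) -> Self {
        Self { txs: VecDeque::new(), capacity_bytes }
    }

    fn size_bytes(&self) -> usize {
        self.txs.iter().map(|(_, size)| size).sum()
    }

    /// Checked before accepting a tx from txsubmission: refuse when it would not fit.
    fn has_room_for(&self, tx_size: usize) -> bool {
        self.size_bytes() + tx_size <= self.capacity_bytes
    }

    fn add(&mut self, id: TxId, size: usize) -> bool {
        if !self.has_room_for(size) {
            return false;
        }
        self.txs.push_back((id, size));
        true
    }

    /// On tip adoption: drop txs included in the new blocks, then keep only those
    /// that `still_valid` accepts against the new ledger state (order preserved).
    fn on_new_tip(&mut self, included: &HashSet<TxId>, mut still_valid: impl FnMut(TxId) -> bool) {
        self.txs.retain(|(id, _)| !included.contains(id) && still_valid(*id));
    }
}
```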
This PR adds traces and metrics to the mempool.
Make the node ready as a production relay.
Addressing comments on open PRs + doing some manual tests for the mempool & txsubmission protocol