log :: 2026‐05

Matthias Benkort edited this page Jun 12, 2026 · 19 revisions

Tip

KEYWORDS

Relay Demo, Mempool / TxSubmission, Dynamic Peers, Snapshots, Metrics & Traces

SUMMARY

These notes focus on relay readiness: running Amaru nodes on preprod, wiring ChainSync + TxSubmission end-to-end, and packaging a reusable demo with telemetry, wallets, scripts, and configurable networks.

Mempool work reaches MVP status with validation, rollback fixes, local transaction submission, txsubmission integration, and metrics aligned with cardano-node.

Dynamic upstream peer management and chain-store tooling improve robustness, with peer eviction on validation errors, block-fetch coordination across peers, and DB repair/inspection commands.

Snapshot and bootstrapping work is simplified around self-contained bootstrap files and newer snapshot formats, while observability work restores meaningful consensus spans, dashboards, metrics browsing, and Grafana/Tempo traces.

2026-05-29

Weekly Update (@jeluard)

What did you work on this week?

Snapshots

Merged changes based on feedback. Implemented and merged removal of protocol v9 support.

What outcome/key result did it support?

  • Continuously validate Amaru against the latest available ledger snapshot within a day of an epoch ending

What's immediately next?

  • node diversity workshop
  • experiment with relay

2026-05-28

Relay demo and traces (@etorreborre)

Quick update:

  • The demo code has been rebased on all the latest changes.

  • There's an in-progress branch where I'm working on putting the spans back, now that the consensus graph has been updated to a more stable and complete form (open the PR description for a Grafana / Tempo screenshot).

  • I have adjusted the demo scripts to use the latest snapshot format, and successfully synchronized my amaru nodes on preprod. To get there, I made a quick fix to the snapshot code and also enabled users to specify their own bootstrap config directory.

Next up is to finish the work on spans, at least for a first pass: we can still discuss the structure of the traces we want to display, especially since there are stages aggregating work, like fetch_block. For now I have taken the following approach:

  • The first reception of a header marks the beginning of a trace.
  • If that header leads to fetching a block range, that activity is registered under the same trace.
  • This means that the processing of other headers that did not trigger a block fetch will go to separate traces that will be stopped short.
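The approach above can be modelled in a few lines. This is a minimal Python sketch, not the actual Amaru span code: `Trace`, `TraceBook`, and the span names are hypothetical and only illustrate how a block fetch joins the trace of the header that triggered it, while other headers keep short traces.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Trace:
    """A trace opened by the first reception of a header (illustrative model)."""
    trace_id: int
    header: str
    spans: list = field(default_factory=list)


class TraceBook:
    """Hypothetical bookkeeping: one trace per header, block fetches join it."""

    def __init__(self):
        self._ids = count(1)
        self.by_header = {}

    def receive_header(self, header):
        # Only the first reception of a header opens a trace.
        if header not in self.by_header:
            trace = Trace(next(self._ids), header)
            trace.spans.append("receive_header")
            self.by_header[header] = trace
        return self.by_header[header]

    def fetch_blocks(self, trigger_header, block_range):
        # The fetch is recorded under the trace of the header that triggered it.
        trace = self.by_header[trigger_header]
        trace.spans.append(f"fetch_block[{block_range[0]}..{block_range[-1]}]")
        return trace


book = TraceBook()
for h in ["h1", "h2", "h3"]:
    book.receive_header(h)
book.receive_header("h3")  # second reception: no new trace is opened
book.fetch_blocks("h3", ["h1", "h2", "h3"])

# Headers whose traces stop right after reception.
short = [t.header for t in book.by_header.values() if len(t.spans) == 1]
print(short)
```

In this model, h1 and h2 end up in one-span traces, and only h3's trace carries the fetch span.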

Next up is filling in the blanks and making sure that all the other stages are present in the trace.

2026-05-23

Dynamic Upstream Peers

The PR #756 now works properly, with some management of multiple upstream peers:

  • slow peers skip FetchBlocks requests that were already serviced by other peers
  • header validation errors kick out peers as they should be
  • static peers are preferred for new connections after the cooldown period when looking for new peers
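The first two behaviours can be sketched as follows. This is a toy Python model under stated assumptions, not the code from PR #756: `PeerSet` and its method names are hypothetical, and the preference for static peers is left out.

```python
class PeerSet:
    """Illustrative model of upstream peer coordination (not Amaru's actual types)."""

    def __init__(self, peers):
        self.peers = set(peers)
        self.fetched = set()  # block ids already delivered by some peer

    def should_fetch(self, peer, blocks):
        # A peer only requests blocks that nobody else has delivered yet.
        if peer not in self.peers:
            return []
        return [b for b in blocks if b not in self.fetched]

    def on_blocks(self, peer, blocks):
        # Record the delivered blocks so that other peers can skip them.
        self.fetched.update(blocks)

    def on_header_validation_error(self, peer):
        # A header that fails validation gets the offending peer kicked out.
        self.peers.discard(peer)


peers = PeerSet(["fast", "slow", "bad"])
wanted = ["b1", "b2", "b3"]
peers.on_blocks("fast", peers.should_fetch("fast", wanted))

# The slow peer arrives later and only has b4 left to fetch.
remaining = peers.should_fetch("slow", wanted + ["b4"])
peers.on_header_validation_error("bad")
print(remaining, sorted(peers.peers))
```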

Chain store maintenance tooling added

  • dump-chain-db -c <hash or point> prints all children, including invalid blocks, to diagnose DB issues
  • dump-chain-db -f runs the best candidate search and prints the result
  • dump-chain-db -a <hash or point> prints ancestors from a point or hash (because we often have only hashes)
  • remove-validation-status [<hash or point>] can be used to fix a DB if some blocks were marked invalid due to Amaru bugs
  • remove-chain subcommand was added to play with the DB or recover from a state where headers were too far ahead of ledger
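To illustrate what the ancestor and children queries compute, here is a minimal Python sketch over a hypothetical in-memory store (header hash mapped to parent hash and validity flag). The real chain store layout and the output of the commands are not shown in these notes. Treating "children" as direct children only is an assumption.

```python
# Hypothetical chain store: header hash -> (parent hash, validity flag).
store = {
    "a": (None, True),
    "b": ("a", True),
    "c": ("b", True),
    "c2": ("b", False),  # a fork marked invalid
    "d": ("c", True),
}


def ancestors(store, start):
    """Walk parent links from `start` back to the root."""
    path = []
    current = store[start][0]
    while current is not None:
        path.append(current)
        current = store[current][0]
    return path


def children(store, parent):
    """List direct children with their validity flag, invalid ones included."""
    return sorted((h, valid) for h, (p, valid) in store.items() if p == parent)


print(ancestors(store, "d"))
print(children(store, "b"))
```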

Syncing mainnet

Now works up to epoch 524, failing at point=141879517.ece7b5e6074936e8bc0264f65a8eb5ce834af63a44c0da7215f018f3e69e6f5d with

BlockValidationError: Invalid block: Transaction e1e399ad0bb7 at index 12 is invalid: transaction failed phase one validation: invalid transaction scripts: script integrity hash mismatch

This is probably fixed on some branch, I recall @yHSJ mentioning something on Discord.

2026-05-22

Weekly Update (@jeluard)

What did you work on this week?

Snapshots

Made changes based on feedback:

  • bootstrap files are now self-sufficient, no need for bootstrap time peer access
  • bootstrap creation requires having db-analyser binary access; no more docker usage
  • general UX improvements (errors, naming)

Metrics

Worked on simplifying and consolidating monitoring infrastructure. Waiting for feedback here.

What outcome/key result did it support?

  • Continuously validate Amaru against the latest available ledger snapshot within a day of an epoch ending
  • Push metrics support on par with cardano-node

What's immediately next?

  • merge PRs
  • decide if we want to push the monitoring work
  • remove v9 support
  • RPI

2026-05-21

Relay demo

Mempool

The mempool + txsubmission fixes PR has now been merged on main:

  • This means that anyone should be able to run an amaru node participating in the txsubmission protocol.
  • A user can also submit transactions locally.
  • Follow-up issues
    • #840: support chained transactions.
    • #841: use only a subset of validation rules to revalidate transactions.
    • #842: limit the mempool capacity also by mem and cpu units.

The demo itself

The demo is now running ok on preprod with a pending question around the availability of stake distributions and how far in advance we should query headers:

  • #825 -> to be discussed

On preprod the demo shows:

  • A complete panel with running nodes and some ad-hoc scripts
  • 1-mithril-refresh checks out the latest mithril snapshot
  • 2-setup initializes the database and compiles the executables (except cardano-node that needs to be provided).
  • 3-cardano-node starts a cardano-node.
  • 4-amaru-middle starts an amaru node downstream of the cardano node.
  • 5-amaru-downstream starts an amaru node downstream of the amaru-middle node.
  • 6-refuel-submit-wallet makes sure that there are enough inputs to create 10 concurrent transactions (an initial wallet address must be funded).
  • 7-submit-tx submits one tx on the amaru-downstream node but can be scaled to 10 concurrent instances.
  • 8-telemetry can be started on demand to observe traces (in Grafana Tempo) + a dashboard for mempool metrics.
  • 9-watch is a panel collecting all logs.

The whole setup has been optimized to be frequently stopped and started again without having to necessarily re-compute everything.

The demo is parametrized with AMARU_NETWORK to be able to switch from one network to another.
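As an illustration of this kind of parametrization, here is a minimal Python sketch. The demo scripts themselves are not shown here, so the helper name, the `preprod` default, and the case normalization are assumptions. Only the `AMARU_NETWORK` variable and the network names come from these notes.

```python
import os

KNOWN_NETWORKS = ("preprod", "preview", "mainnet")  # networks named in these notes


def selected_network(env=os.environ, default="preprod"):
    """Return the network chosen via AMARU_NETWORK (the default is an assumption)."""
    network = env.get("AMARU_NETWORK", default).strip().lower()
    if network not in KNOWN_NETWORKS:
        raise ValueError(f"unsupported AMARU_NETWORK: {network!r}")
    return network


print(selected_network({"AMARU_NETWORK": "preview"}))
```

Failing early on an unknown value keeps a typo from silently starting the demo against the wrong network.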

Pending issues, questions, PRs

I haven't yet been able to show that everything works for anyone using the node on preprod, and even less so on preview:

  • The new Van Rossem fork introduces new cost models that we haven't implemented yet: #837.
  • There is an unmerged fix for duplicate redeemers:
    • I fixed it here: #843 but it turns out that Jonathan has a better fix.
    • Josh, I haven't had the possibility yet to check that fix in my environment.
  • I'm still preparing a PR fixing an ordering issue on proposals
  • I need to make a PR for a robustness issue that I found regarding the persistence of the ledger state.
  • I need to rebase the etorreborre/feat/mempool-observability branch which now contains:
    • mempool metrics.
    • fixed spans for the consensus stage.
  • I want to submit a PR that bypasses the header validations when syncing from a mithril snapshot (I had an issue with this and it is not strictly necessary).
  • There were other failing cases when syncing on preview that I need to investigate.
  • I need to test on mainnet.
  • Then, when all of this is done, I will go through the demo set-up, review it and submit a PR so that anyone can use it.

2026-05-19

Relay demo issues

Still working on a demo showing the end-to-end use of amaru nodes for chainsync and txsubmission:

Relay demo observability

I started investigating how metrics and traces get produced:

  • I have integrated a dashboard to the demo around transaction submission and mempool activity.
  • I started updating the traces for the consensus. They have been broken since we changed the structure of the consensus graph quite a lot, and we need to carefully pass spans around in order to display meaningful traces.

Since all this work is done on my demo branch, I will backport it to main when it stabilizes.

2026-05-15

Weekly Update (@jeluard)

What did you work on this week?

Snapshots

Iterate on feedback provided on the snapshot PR.

Metrics

Pushed new metrics PR. Tested using prometheus endpoint.

Updated otel-ui so that it can browse metrics directly. Useful as a debugging tool.

What outcome/key result did it support?

  • Continuously validate Amaru against the latest available ledger snapshot within a day of an epoch ending
  • Push metrics support on par with cardano-node

What's immediately next?

  • merge snapshots PR
  • PI love
  • Push metrics support on par with cardano-node

Weekly Update (@etorreborre)

What did you work on this week?

This week I worked on improving the demo showing the integration of the amaru node with other nodes, and fixing any issue coming in the way:

  • Transaction ids are now correctly created #823
  • Transaction ids are now correctly serialized on round-trip #809
  • Transaction serialization is fixed and includes the proper era tags #812
  • Downloaded but not validated blocks after a restart are now validated #805
  • Headers can be ahead of the ledger and not have access to the correct stake distribution #825 -> still being discussed, need @KtorZ's opinion on that one.

The current demo is working on the branch etorreborre/test/tx-submission-demo. It is based on 2 stacked PRs where the most important one is the one finishing the implementation of the mempool #788.

Here is a screenshot of the current state.

What outcome/key result did it support?

Make the node ready as a production relay.

What's immediately next?

Next is:

  • Show the telemetry support (and possibly improve some of the traces).
  • Test on preview as well and make the script configurable w.r.t the network.
  • Probably some adjustments to the mempool PR after review.

2026-05-14

Tactical team planning & Weekly update (@Dam-CZ)

  • Eric: Bunch of PR influx; trying to merge and close as many as possible

    • era tag for transaction ids & transaction -> adding a check for Conway & merge transaction id from body
    • refactoring
    • changes to the ledger coming, related to the tags
    • header validation fix to be reviewed by Roland and Matthias

  • Overall discussion: clarify the assumptions (Bootstrap from a snapshot; we use Conway blocks only; we currently get rid of history); for the hard fork we will cover the current epoch & the next one

  • Joshy: created a new PR for fee calculation for missing reference script validation; using recursive logic might be better, to be looked into by Matthias

  • Overall ledger rules discussion:

    • In the next release: reorganizing the ledger rules modules (complex / simple differentiation) is a good idea
    • Building a set of conformance tests for verifying up to our standards
    • Pallas workaround to be implemented as well

  • KtorZ: still reviewing PR, spent time on optimizing the CI and playing with the cache, it's now working and faster

    • might need to optimise the job scheduler as well
    • simulation workflow to be integrated / reworked in the future; for now we'll just ignore it
    • hard fork impacts to be measured by bumping the node versions
    • epoch transition to be done soon with a PR

  • Jonathan: majority of implementation almost done (to be merged)

    • branch to pick out the built-ins; changes to make the optimisations standalone and structured
    • branch to get the cost model logic extension working
    • once merged, documentation will be updated

2026-05-08

Weekly Update (@jeluard)

What did you work on this week?

Snapshots

Pushed a PR with first working version of bootstrap creation / import.

Metrics

Add new metrics to achieve cardano-node parity.

What outcome/key result did it support?

  • Continuously validate Amaru against the latest available ledger snapshot within a day of an epoch ending
  • Push metrics support on par with cardano-node

What's immediately next?

  • PI love
  • Push metrics support on par with cardano-node

Weekly Update (@etorreborre)

What did you work on this week?

This week, I:

  • Finished the first version of the mempool logic + its interactions with the txsubmission protocol where I fixed a few things and aligned error cases with the Haskell node: PR #788.
  • On top of that PR I updated the metrics to correspond more closely to the Haskell node: PR #791.
  • Then I started running the node in a demo where I want to see:
    1. ChainSync working properly
    2. TxSubmission working properly

Issues

I found a few issues when running tests:

  • Rolling back the ledger can fail: #801.
  • Fetching blocks can fail if a node is connected to both upstream and downstream peers: #800.
  • The initialization of the chain store after a bootstrap is incomplete: #805.
  • When a node has been stopped, we might not resume block validation properly: #805.

Demo

For the demo, I now have to create and submit transactions. There is now a local submit API to do so, but it's still taking me some time to fully validate everything:

  • The demo's original setup was largely produced by Claude and needed some refactoring.
  • The tmux setup that I had originally was a bit troublesome (at least for me) when I wanted to interact with the different panes and windows.
  • I encountered the bugs mentioned above (which is a good thing!).
  • I managed to corrupt my db which means that I had to resynchronize my amaru nodes just to get to the point where I could submit a transaction. And this takes a while, so having an easy way to bootstrap from a more recent snapshot will really help.

What outcome/key result did it support?

Make the node ready as a production relay.

What's immediately next?

I need people to review / approve the open PRs. Probably the best people are

  • Matthias -> Rolling back the ledger can fail: #801.
  • Roland -> Fetching blocks can fail if a node is connected to both upstream and downstream peers: #800.
  • Roland -> The initialization of the chain store after a bootstrap is incomplete: #805.
  • Roland -> When a node has been stopped, we might not resume block validation properly: #805.
  • Roland or Matthias -> mempool logic.
  • Julien -> mempool observability (depends on the previous one).

Then I will continue to run and package a simple demo (and certainly have to take some time to incorporate review comments).

2026-05-07

Tactical team planning & Weekly update (@Dam-CZ)

  • Eric: Mempool PR done (check transaction and removing from the mempool)

    Removed interactions with the tx submission protocol. A bunch of shortcuts made in the Haskell implementation (reproducing).

  • Eric: Working on observability metrics for the mempool (mirroring the Haskell ones)

    Inject crates and transactions to see if they flow. Simplified setups.

  • Eric: Small PR for the Ledger for Matthias

  • JSHy: merging PR for the ledger

  • JSHy: looking into a more agnostic test suite for a Cardano node (moving over some test vectors)

  • JSHy: validation context updates

  • KtorZ: will look into validation context

  • KtorZ: look into eric's PR

  • KtorZ: epoch boundary transition: first get the logic right, even with unoptimized performance, then tune it

  • Phule/Jonathan: gathering together the UPLC "dots"

  • Roland: Dynamic upstream peer selection PR open to be looked into & aligning with the comments of Eric

Roland: positive that the demo content can be achieved by the end of this week

What did you work on this week?

Used the acceptance process on every contract existing and produced "proof of acceptance" for work done on each scope

What outcome/key result did it support?

Trying out the process, showing its value, and tweaking it a little bit

2026-05-06

Removal of sync effects + use of database snapshots for the chain store

The PR has been finally merged!

Mempool MVP

Due to the PR above I had to carefully rebase the latest mempool PR. In the process of testing it, I found some issues with:

I've fixed the wake-up call + spent some time to check more error cases with the tx-submission protocol. Next: I will test all of this, using the demo scripts that were used to show various node topologies.

Mempool observability

Julien nudged me to better align the mempool metrics to the ones already exposed by the cardano-node, which I did. Now I need to do a bit of end to end testing with Prometheus.

2026-05-01

Weekly Update (@jeluard)

What did you work on this week?

Snapshots

Rebooted the effort by focusing on just simplifying snapshot creation from the CLI. Rely on old work from Arnaud, and make sure we align with Paolo's current initiative. The fewer moving parts, the better. Got rid of older, now useless things. Now directly import cardano-node InMem persistency.

What outcome/key result did it support?

  • Continuously validate Amaru against the latest available ledger snapshot within a day of an epoch ending

Extra

Finally got zk/openvo proof verification on MCU over BLE working. 2 targets: raspberry pico2 and esp32 S3. Both require additional RAM as PSRAM (8MB). Takes roughly 1min to prove an arbitrary UPLC execution. Not quite clear what realistic use-case it unlocks now, but definitely exciting.

What's immediately next?

  • finish snapshot creation
  • make RPI refresh progress

Weekly Update (@etorreborre)

What did you work on this week?

Finish the PR removing sync effects

I addressed the remaining comments. There are still quite a few things we could do to improve the consistency of the ChainStore API but we will tackle that later.

Note: there are now test functions to unit-test the functions involved in pure-stage stages without having to run the full stage code.

Mempool validation

This PR adds checks to the mempool:

  • It revalidates transactions when a new tip has been adopted.
  • It removes transactions that are now part of the ledger.
  • It makes sure that the txsubmission protocol does not try to add transactions to the mempool when it is near capacity.
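The three checks can be illustrated with a toy model. This is a minimal Python sketch, not the PR's implementation: the `Mempool` class, the `is_valid` callback standing in for ledger validation, and the simplification of "near capacity" to "full" are all assumptions.

```python
class Mempool:
    """Toy mempool; `is_valid` stands in for real ledger validation."""

    def __init__(self, capacity, is_valid):
        self.capacity = capacity
        self.is_valid = is_valid
        self.txs = {}  # tx id -> tx body, in insertion order

    def has_room(self):
        # Checked by the txsubmission side before asking peers for more transactions.
        return len(self.txs) < self.capacity

    def add(self, tx_id, tx):
        if not self.has_room() or not self.is_valid(tx):
            return False
        self.txs[tx_id] = tx
        return True

    def on_new_tip(self, included_tx_ids):
        # 1. Drop transactions that are now part of the ledger.
        for tx_id in included_tx_ids:
            self.txs.pop(tx_id, None)
        # 2. Revalidate the remaining ones against the new ledger state.
        self.txs = {i: t for i, t in self.txs.items() if self.is_valid(t)}


spent = set()  # inputs consumed by the ledger so far
pool = Mempool(capacity=3, is_valid=lambda tx: tx["input"] not in spent)
pool.add("t1", {"input": "u1"})
pool.add("t2", {"input": "u2"})
pool.add("t3", {"input": "u3"})
accepted = pool.add("t4", {"input": "u4"})  # refused: the pool is full

# A new block includes t1 and also spends u2 through some other transaction.
spent.update({"u1", "u2"})
pool.on_new_tip(["t1"])
print(accepted, list(pool.txs))
```

After the new tip, t1 is removed because it is on chain, t2 is dropped by revalidation because its input is gone, and only t3 remains.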

Additionally:

  • It exposes some parameters of the mempool and txsubmission protocol to the CLI.
  • It adds some missing checks for the txsubmission protocol (and refactors the internals a bit to do so).

Mempool observability

This PR adds traces and metrics to the mempool.

What outcome/key result did it support?

Make the node ready as a production relay.

What's immediately next?

Addressing comments on open PRs + do some manual tests for the mempool & txsubmission protocol
