
log :: 2025‐11

etorreborre edited this page Nov 26, 2025 · 9 revisions

2025-11-27

Tx submission protocol -> client / server / mempool integration

  • I have now implemented a few tests where a client and a server start independently and stop when transactions have been transmitted.
    • Simple connection -> check that txs already present in the client mempool end up in the server mempool.
    • Check that a concurrent process filling the client mempool eventually populates the server mempool as well.
    • Check the blocking vs non-blocking behavior.
    • Vary the window size + fetch size for the server.
    • Send invalid transactions and check that they are not requested again.
    • Check some edge cases:
      • The client requesting a transaction that was not announced by the server.
      • Requesting zero transactions or zero ids.
      • Too many or not enough transactions sent or acknowledged by the client or the server.
  • Next steps:
    • I still have a regression in some earlier tests, so I need to stabilize all of this.
    • Proper error types (I've used anyhow errors in many places).
    • Integration into Amaru as a whole.
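
The window and acknowledgement edge cases listed above can be illustrated with a minimal sketch. It assumes a hypothetical `AnnouncedWindow` kept by the side that announces transaction ids, with `u64` standing in for real transaction ids. None of these names come from Amaru or pallas-network, and the exact rules (for example rejecting a request for zero ids) are assumptions made for the sketch.

```rust
use std::collections::VecDeque;

/// Hypothetical protocol violations; the names are illustrative only.
#[derive(Debug, PartialEq, Eq)]
pub enum WindowError {
    ZeroIdsRequested,
    AckTooMany { acked: usize, outstanding: usize },
    WindowExceeded { requested: usize, available: usize },
    UnknownTx(u64),
}

/// Ids announced to the peer and not yet acknowledged, oldest first.
pub struct AnnouncedWindow {
    max_window: usize,
    outstanding: VecDeque<u64>,
}

impl AnnouncedWindow {
    pub fn new(max_window: usize) -> Self {
        Self { max_window, outstanding: VecDeque::new() }
    }

    /// Handles "acknowledge `ack` ids, then announce up to `req` new ones".
    /// `pending` holds ids from the mempool that were not announced yet.
    /// Nothing is modified when the request is rejected.
    pub fn request_ids(
        &mut self,
        ack: usize,
        req: usize,
        pending: &mut VecDeque<u64>,
    ) -> Result<Vec<u64>, WindowError> {
        if req == 0 {
            return Err(WindowError::ZeroIdsRequested);
        }
        if ack > self.outstanding.len() {
            return Err(WindowError::AckTooMany { acked: ack, outstanding: self.outstanding.len() });
        }
        let available = self.max_window - (self.outstanding.len() - ack);
        if req > available {
            return Err(WindowError::WindowExceeded { requested: req, available });
        }
        // Drop the acknowledged ids, then announce as many new ids as possible.
        drop(self.outstanding.drain(..ack));
        let n = req.min(pending.len());
        let announced: Vec<u64> = pending.drain(..n).collect();
        self.outstanding.extend(announced.iter().copied());
        Ok(announced)
    }

    /// Rejects a request for a transaction whose id was never announced.
    pub fn check_requested(&self, ids: &[u64]) -> Result<(), WindowError> {
        match ids.iter().find(|id| !self.outstanding.contains(*id)) {
            Some(id) => Err(WindowError::UnknownTx(*id)),
            None => Ok(()),
        }
    }
}
```

Validating before mutating keeps the window unchanged when a request is rejected, which makes each edge case easy to assert in isolation.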

2025-11-24

Tx submission protocol -> client side

  • I added more tests and did a bit of refactoring for the client side.

Tx submission protocol -> server side

  • I implemented the server side, which is slightly more complicated in terms of what needs to be tracked.
  • I started some tests for that side, but there's more to be done.
  • Then the next step will be to test the integration of: the client, the server and their respective mempools.

2025-11-21

Tx submission protocol -> client side

  • I implemented a TxSubmissionClient that uses the mempool to serve requests from the tx submission protocol for a given peer.
  • The client uses the pallas-network data types modelling the protocol, but it isolates the use of the pallas client behind a TxSubTransport interface.
  • This also makes it possible to unit test the TxSubmissionClient by just sending requests and checking the responses.
    • There is one initial test getting transactions with blocking requests and a mempool filled with enough transactions for now.
    • I will add more tests before switching to the TxSubmissionServer implementation.
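
A minimal sketch of how a transport interface can make the client unit-testable. `TxSubTransport` and `TxSubmissionClient` are the names used above, but their shape here is an assumption: the real code is built on the pallas-network message types, whereas this sketch uses simplified, hypothetical `Request` and `Reply` enums, `u64` ids and a `Vec` in place of the mempool.

```rust
use std::collections::VecDeque;

/// Simplified, hypothetical stand-ins for the protocol messages.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Request {
    TxIds { blocking: bool, ack: u16, req: u16 },
    Txs(Vec<u64>),
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Reply {
    TxIds(Vec<u64>),
    Txs(Vec<Vec<u8>>),
}

/// Assumed shape of the transport abstraction: receive a request, send a reply.
pub trait TxSubTransport {
    fn next_request(&mut self) -> Option<Request>;
    fn send_reply(&mut self, reply: Reply);
}

/// Client serving requests from an in-memory list of (id, body) pairs.
pub struct TxSubmissionClient<T> {
    transport: T,
    mempool: Vec<(u64, Vec<u8>)>,
    next: usize, // index of the first id not announced yet
}

impl<T: TxSubTransport> TxSubmissionClient<T> {
    pub fn new(transport: T, mempool: Vec<(u64, Vec<u8>)>) -> Self {
        Self { transport, mempool, next: 0 }
    }

    /// Serves requests until the transport has none left.
    pub fn run(&mut self) {
        while let Some(request) = self.transport.next_request() {
            let reply = match request {
                Request::TxIds { req, .. } => {
                    let end = (self.next + req as usize).min(self.mempool.len());
                    let ids: Vec<u64> =
                        self.mempool[self.next..end].iter().map(|(id, _)| *id).collect();
                    self.next = end;
                    Reply::TxIds(ids)
                }
                Request::Txs(ids) => Reply::Txs(
                    self.mempool
                        .iter()
                        .filter(|(id, _)| ids.contains(id))
                        .map(|(_, body)| body.clone())
                        .collect(),
                ),
            };
            self.transport.send_reply(reply);
        }
    }

    pub fn into_transport(self) -> T {
        self.transport
    }
}

/// Test double: replays scripted requests and records the replies.
#[derive(Default)]
pub struct MockTransport {
    pub requests: VecDeque<Request>,
    pub replies: Vec<Reply>,
}

impl TxSubTransport for MockTransport {
    fn next_request(&mut self) -> Option<Request> {
        self.requests.pop_front()
    }
    fn send_reply(&mut self, reply: Reply) {
        self.replies.push(reply);
    }
}
```

A test then scripts the requests in a `MockTransport`, runs the client, and compares the recorded replies with the expected ones, without opening any network connection.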

2025-11-20

Tx submission protocol

  • I started looking into supporting a minimal tx submission protocol.
  • I'm planning to use pallas-network for now, until we can move that mini protocol to pure-stage.
  • Between the client and the server should live the mempool. There is already an amaru-mempool crate with a dummy mempool which is:
    • Not used yet.
    • Geared towards the relationship with the ledger and block forging.
  • I started to revisit the Mempool trait to be able to support the state data required for the tx submission protocol.
    • I'm trying to relax some assumptions on the trait to
      • Be able to support a TxId since this is required for the protocol
      • Be able to use various notions of "keys" for acknowledging a transaction. At the moment only one keys function definition can be used for a given Mempool. It would be nicer, IMO and IIUC, to pass different "keys" functions to acknowledge transactions with the same mempool. I'm still fighting with the type checker to see if this can be implemented without any copying/cloning. But maybe this is not a big deal, since transaction inputs used as transaction keys are only hashes and indices.
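
A minimal sketch of the "keys function per call" idea, assuming a hypothetical `SimpleMempool` rather than the actual Mempool trait from amaru-mempool. The type aliases and the `acknowledge` signature are assumptions. The sketch takes the simple route of returning owned keys from the closure, which copies hashes and indices instead of borrowing them.

```rust
use std::collections::BTreeSet;

// Hypothetical simplified types: a 32-byte hash and an (hash, index) input.
pub type TxId = [u8; 32];
pub type TxInput = (TxId, u16);

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Tx {
    pub id: TxId,
    pub inputs: Vec<TxInput>,
}

#[derive(Default)]
pub struct SimpleMempool {
    txs: Vec<Tx>,
}

impl SimpleMempool {
    pub fn insert(&mut self, tx: Tx) {
        self.txs.push(tx);
    }

    pub fn len(&self) -> usize {
        self.txs.len()
    }

    /// Removes every transaction for which at least one key produced by
    /// `keys` is present in `acked`. The key type is chosen by the caller.
    pub fn acknowledge<K: Ord>(&mut self, acked: &BTreeSet<K>, keys: impl Fn(&Tx) -> Vec<K>) {
        self.txs.retain(|tx| !keys(tx).iter().any(|k| acked.contains(k)));
    }
}
```

The same mempool can then be acknowledged by transaction id with `|tx| vec![tx.id]` or by consumed inputs with `|tx| tx.inputs.clone()`. Returning a borrowed iterator from the closure would avoid the copies, but that ties the return type to the lifetime of the `&Tx` argument, which is harder to express in the signature.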

2025-11-19

Simulation

  • I gave up on using just one trace file as CBOR because the encoding depends closely on the Rust data types and is hard to recover in JavaScript land.
  • There is now a nightly CI job for the simulation.

2025-11-18

Simulation

  • I reworked a bit the trace animation based on the rework of sync effects by Roland.
  • I fixed some issues with animations and made some improvements:
    • Some times were not parsed correctly for the entries animation.
    • I added step forward/step back buttons for the entries animation.
    • In the trace animation, the hash shown was not always correct because we were not serializing that field of the BlockHeader. I changed the serialization to make this work. This should be backwards compatible (because the deserialization stays the same), but I hope it won't break anything.
  • There was indeed a bug in the chain selection, found a few times on CI, that I was able to reproduce.
  • Then it was cool to be able to use all the current tools to diagnose it!
    • There is now an independent trace for each state and I was able to run_replay that trace with the RustRover debugger.
    • The entries.html animation helped me understand the arrival order of messages since one of them was delayed.
    • The traces.html animation helped me understand what was executed by each stage exactly when.
    • Then some careful debug statements with the full HeadersTree display made me realize what the problem was.
  • The bug is here
    • We take the tree_state but don't put it back in case of an error.
    • That error happened because one message for one peer was delayed and did not arrive in order compared to the other peer's messages.
    • The fix is simple but I still need to write a proper unit test for it.
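
The shape of this bug can be sketched as follows. This is not the actual Amaru code: `Stage`, `TreeState` and both handlers are hypothetical, and slots stored as `u64` stand in for the real headers tree. The sketch only shows why taking a state out of an `Option` and returning early on error leaves the state missing.

```rust
/// Hypothetical stand-in for the headers tree state.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct TreeState {
    pub headers: Vec<u64>,
}

impl TreeState {
    /// Fails when a header arrives out of order (slot not strictly increasing).
    pub fn insert(&mut self, slot: u64) -> Result<(), String> {
        let out_of_order = self.headers.last().map_or(false, |last| slot <= *last);
        if out_of_order {
            return Err(format!("header at slot {slot} arrived out of order"));
        }
        self.headers.push(slot);
        Ok(())
    }
}

pub struct Stage {
    tree_state: Option<TreeState>,
}

impl Stage {
    pub fn new() -> Self {
        Self { tree_state: Some(TreeState::default()) }
    }

    pub fn tree_state(&self) -> Option<&TreeState> {
        self.tree_state.as_ref()
    }

    /// Buggy shape: on error `?` returns early and the state is never put back.
    pub fn handle_buggy(&mut self, slot: u64) -> Result<(), String> {
        let mut state = self.tree_state.take().ok_or("tree_state is missing")?;
        state.insert(slot)?; // early return drops `state`
        self.tree_state = Some(state);
        Ok(())
    }

    /// Fixed shape: the state is restored on both the success and error paths.
    pub fn handle_fixed(&mut self, slot: u64) -> Result<(), String> {
        let mut state = self.tree_state.take().ok_or("tree_state is missing")?;
        let result = state.insert(slot);
        self.tree_state = Some(state);
        result
    }
}
```

A unit test for the fix can send one out-of-order message, check that it is rejected, then check that the next in-order message is still accepted.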

2025-11-17

Simulation

  • The rework of sync effects by Roland had a small deadlock and not all sync effects were marked as sync.
    • I was not able to fix the trace replay though and Roland will take care of that.
  • I changed the persistence of simulation data
    • With one trace per test now (previously there was a single trace for the full run).
    • With some isolation for the test_run_simulation and test_run_replay to avoid concurrency issues.
    • Persisted data is now saved when a GitHub CI job fails, for later inspection.

Peer selection

  • We had a meeting with Markus about peer selection
  • There was a previous mention of peer selection in the log book, around countering an Eclipse attack.
  • We agreed that all requirements should end up in the Cardano blueprints, to be shared with other node implementors.
  • We discussed some additional options for selecting peers, in particular running a traceroute for each connected peer and fingerprinting that peer by removing the first 2 or 3 nodes and the last 2 or 3 nodes from that route (to exclude nodes that could be controlled by an attacker). The geolocation of each remaining node can then help with making a more decentralized choice of upstream peers.
  • We also discussed how the Amaru node could send data to the BlockFetch client maintained by Markus's team.
    • This data can then be aggregated and be used by SPOs to improve their peer selection.

2025-11-04

Observability

Simulation

  • The replay functionality can now read data directly from the latest output of a run_simulator run.
  • It turns out that external sync effects are not yet supported by the replay feature of the simulation:
    • The inputs and responses of sync effects are now traced, which makes the simulation richer (now we know how an operation was called and what the result was).
    • However we currently don't have a way to check, during replay, that a sync call gets called with the same inputs and returns the same response as the one recorded in the trace.
  • Overall, I'm actually not sure how useful the replay feature is in its current form:
    • I have the impression that being able to visualize an execution trace with interim states is useful for debugging an actual issue. => This means that we should start working on producing some traces in production.
    • If we want to check that a fix is working, having a full list of trace entries might not be so useful, because if we change the implementation we will not be able to replay exactly what happened. => The next step will be to update the simulation README with updated instructions on how to use what we have now.
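
One possible shape for the missing replay check on sync effects, as a sketch only. `SyncCall`, `ReplayedEffects` and `ReplayError` are hypothetical names, and inputs and responses are plain strings rather than the traced data types. The sketch assumes that, during replay, each sync call is compared with the next recorded call and answered with the recorded response.

```rust
use std::collections::VecDeque;

/// One recorded sync effect: how it was called and what it returned.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SyncCall {
    pub input: String,
    pub response: String,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ReplayError {
    /// The replayed code made a call that the trace does not contain.
    UnexpectedCall(String),
    /// The replayed code made a call with different inputs than recorded.
    InputMismatch { expected: String, actual: String },
}

/// Answers sync calls from the recorded trace, in recorded order.
pub struct ReplayedEffects {
    recorded: VecDeque<SyncCall>,
}

impl ReplayedEffects {
    pub fn new(recorded: Vec<SyncCall>) -> Self {
        Self { recorded: recorded.into() }
    }

    /// Checks the input against the next recorded call and returns the
    /// recorded response instead of executing the real effect.
    pub fn call(&mut self, input: &str) -> Result<String, ReplayError> {
        let next = self
            .recorded
            .pop_front()
            .ok_or_else(|| ReplayError::UnexpectedCall(input.to_string()))?;
        if next.input != input {
            return Err(ReplayError::InputMismatch {
                expected: next.input,
                actual: input.to_string(),
            });
        }
        Ok(next.response)
    }
}
```

Returning the recorded response keeps the replay deterministic, at the cost of failing as soon as a changed implementation issues different calls, which is the limitation noted above.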

2025-11-03

Serve chain from store

  • Finally provided a PR to fix the huge memory consumption we saw with each chain sync downstream connection.

Simulation

  • While making a final pass on the PR to visualize simulation traces I noticed that we were computing the Header hash based on the whole header instead of the header body. @arnaud, would our Antithesis tests have detected this if they ran Cardano nodes as well? Since I think that this is incorrect I fixed this issue as part of my PR and the fix is now on main.
