log :: 2025‐11
etorreborre edited this page Nov 26, 2025
·
9 revisions
- I have now implemented a few tests where a client and a server start independently and stop when transactions have been transmitted.
- Simple connection -> check that txs already present in the client mempool end up in the server mempool.
- Check that a concurrent process filling in the client mempool eventually populates the server mempool as well.
- Check the blocking vs non-blocking behavior.
- Vary the window size + fetch size for the server.
- Send invalid transactions and check that they are not requested again.
- Check some edge cases:
- The client requesting a transaction that was not announced by the server.
- Requesting zero transactions or zero ids.
- Too many or not enough transactions sent or acknowledged by the client or the server.
- Next steps:
- I still have a regression in some earlier tests, so I need to stabilize all of this.
- Proper error types (I've used anyhow errors in many places).
- Integration into Amaru as a whole.
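
The edge cases listed above can be pictured as checks on a small bookkeeping structure. This is only a minimal sketch, not the Amaru code: `Outstanding`, `ProtocolError` and all method names are hypothetical, tx ids are plain `u64` values, and the exact rules (for example that a request for zero ids is always rejected) are assumptions made for illustration.

```rust
use std::collections::VecDeque;

/// Hypothetical error type for protocol violations (not Amaru's).
#[derive(Debug, PartialEq)]
pub enum ProtocolError {
    ZeroRequest,
    AckTooMany { acked: usize, outstanding: usize },
    WindowExceeded { would_be: usize, window: usize },
    NotAnnounced(u64),
}

/// Tracks the tx ids that were announced and not yet acknowledged.
pub struct Outstanding {
    window: usize,
    ids: VecDeque<u64>,
}

impl Outstanding {
    pub fn new(window: usize) -> Self {
        Self { window, ids: VecDeque::new() }
    }

    /// Record newly announced ids, oldest first.
    pub fn announce(&mut self, new_ids: &[u64]) {
        self.ids.extend(new_ids.iter().copied());
    }

    /// Drop the `ack` oldest ids (at most all of them).
    pub fn acknowledge(&mut self, ack: usize) {
        let n = ack.min(self.ids.len());
        self.ids.drain(..n);
    }

    /// Validate a request that acknowledges `ack` ids and asks for `req` more.
    pub fn check_request_ids(&self, ack: usize, req: usize) -> Result<(), ProtocolError> {
        if req == 0 {
            return Err(ProtocolError::ZeroRequest);
        }
        let outstanding = self.ids.len();
        if ack > outstanding {
            return Err(ProtocolError::AckTooMany { acked: ack, outstanding });
        }
        let would_be = outstanding - ack + req;
        if would_be > self.window {
            return Err(ProtocolError::WindowExceeded { would_be, window: self.window });
        }
        Ok(())
    }

    /// Validate a request for transaction bodies: every id must have been announced.
    pub fn check_request_txs(&self, wanted: &[u64]) -> Result<(), ProtocolError> {
        if wanted.is_empty() {
            return Err(ProtocolError::ZeroRequest);
        }
        match wanted.iter().find(|id| !self.ids.contains(id)) {
            Some(&id) => Err(ProtocolError::NotAnnounced(id)),
            None => Ok(()),
        }
    }
}
```

Returning a dedicated enum rather than an `anyhow` error lets a test assert on the precise violation.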
- I added more tests and did a bit of refactoring for the client side.
- I implemented the server side, which is slightly more complicated in terms of what needs to be tracked.
- I started some tests for that side but there's more to be done.
- Then the next step will be to test the integration of: the client, the server and their respective mempools.
- I implemented a `TxSubmissionClient` client using the mempool to serve requests from the tx submission protocol for a given peer.
- The client is using the `pallas-network` data types modelling the protocol but it isolates the use of the pallas client via a `TxSubTransport` interface.
- This also allows unit testing the `TxSubmissionClient` by just sending requests and checking the responses.
- There is one initial test getting transactions with blocking requests and a mempool filled with enough transactions for now.
- I will add more tests before switching to the `TxSubmissionServer` implementation.
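
To illustrate why hiding the network client behind an interface makes unit testing easy, here is a minimal sketch. It is not the actual `TxSubTransport` trait: `Transport`, `MockTransport`, `Request`, `Reply` and `Client` are hypothetical stand-ins, the mempool is a plain list of `u64` ids, and the message types are simplified rather than taken from `pallas-network`.

```rust
use std::collections::VecDeque;

/// Simplified stand-ins for protocol messages (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum Request {
    TxIds { ack: usize, req: usize },
    Done,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Reply {
    TxIds(Vec<u64>),
}

/// Hypothetical transport abstraction: the client only sees this trait,
/// so a test can replace the network with an in-memory queue.
pub trait Transport {
    fn next_request(&mut self) -> Option<Request>;
    fn send_reply(&mut self, reply: Reply);
}

/// In-memory transport: scripted requests in, recorded replies out.
#[derive(Default)]
pub struct MockTransport {
    pub requests: VecDeque<Request>,
    pub replies: Vec<Reply>,
}

impl Transport for MockTransport {
    fn next_request(&mut self) -> Option<Request> {
        self.requests.pop_front()
    }
    fn send_reply(&mut self, reply: Reply) {
        self.replies.push(reply);
    }
}

/// Serves id requests from a toy mempool.
pub struct Client<T: Transport> {
    transport: T,
    mempool: Vec<u64>,
    next: usize, // index of the first id not yet announced
}

impl<T: Transport> Client<T> {
    pub fn new(transport: T, mempool: Vec<u64>) -> Self {
        Self { transport, mempool, next: 0 }
    }

    /// Answer requests until the transport is drained or the peer says `Done`.
    pub fn run(&mut self) {
        while let Some(request) = self.transport.next_request() {
            match request {
                Request::TxIds { req, .. } => {
                    let end = (self.next + req).min(self.mempool.len());
                    let ids = self.mempool[self.next..end].to_vec();
                    self.next = end;
                    self.transport.send_reply(Reply::TxIds(ids));
                }
                Request::Done => break,
            }
        }
    }

    /// Give the transport back so a test can inspect the recorded replies.
    pub fn into_transport(self) -> T {
        self.transport
    }
}
```

A test then scripts a few requests into `MockTransport`, calls `run`, and compares `replies` with the expected ids, without opening any socket.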
- I started looking into supporting a minimal tx submission protocol.
- I'm planning to use pallas-network for now, until we can move that mini protocol to pure-stage.
- Between the client and the server should live the mempool. There is already an `amaru-mempool` crate with a dummy mempool which is:
- Not used yet.
- Geared towards the relationship with the ledger and block forging.
- I started to revisit the `Mempool` trait to be able to support the state data required for the tx submission protocol.
- I'm trying to relax some assumptions on the trait to:
- Be able to support a `TxId` since this is required for the protocol.
- Be able to use various notions of "keys" for acknowledging a transaction. At the moment only a single `keys` function definition can be used for a given `Mempool`. It would be nicer IMO and IIUC to pass different "keys" functions to acknowledge transactions with the same mempool. I'm still fighting with the type checker to see if it is possible to implement this without any copying/cloning. But maybe this is not a big deal since transaction inputs used as transaction keys are only hashes and indices.
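
One way to picture "passing different keys functions to the same mempool" is to make the key extractor a parameter of the acknowledge operation. The sketch below is hypothetical and is not the `Mempool` trait: `Tx`, `SimpleMempool` and `acknowledge` are invented names, and the extractor returns an owned `Vec` of keys, which copies them. That sidesteps the borrowing question mentioned above instead of solving it.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Toy transaction (hypothetical): an id plus the inputs it spends.
#[derive(Debug, Clone, PartialEq)]
pub struct Tx {
    pub id: u64,
    pub inputs: Vec<(u64, u32)>, // (tx hash, output index)
}

#[derive(Default)]
pub struct SimpleMempool {
    txs: Vec<Tx>,
}

impl SimpleMempool {
    pub fn add(&mut self, tx: Tx) {
        self.txs.push(tx);
    }

    /// Ids of the transactions still in the mempool, in insertion order.
    pub fn ids(&self) -> Vec<u64> {
        self.txs.iter().map(|tx| tx.id).collect()
    }

    /// Remove every transaction sharing at least one key with `acked`.
    /// The key extractor is chosen by the caller, so the same mempool can be
    /// acknowledged by id, by input, or by any other notion of key.
    pub fn acknowledge<K, F>(&mut self, acked: &HashSet<K>, keys: F)
    where
        K: Eq + Hash,
        F: Fn(&Tx) -> Vec<K>,
    {
        self.txs
            .retain(|tx| !keys(tx).iter().any(|k| acked.contains(k)));
    }
}
```

For example, `pool.acknowledge(&ids, |tx| vec![tx.id])` acknowledges by transaction id, while `pool.acknowledge(&inputs, |tx| tx.inputs.clone())` acknowledges by spent input.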
- I gave up on using just one trace file as CBOR because the encoding depends closely on the Rust data types and is hard to recover in JavaScript land.
- There is now a nightly CI job for the simulation.
- I reworked the trace animation a bit, based on the rework of sync effects by Roland.
- I fixed some issues with animations and made some improvements:
- Some times were not parsed correctly for the entries animation.
- I added step forward/step back buttons for the entries animation.
- In the trace animation the hash shown was not always correct because we were not serializing that field of the `BlockHeader`. I changed the serialization to make this work. That should be backwards compatible (because the deserialization stays the same), and I hope it won't break anything.
- There was indeed a bug in the chain selection, found a few times on CI, that I was able to reproduce.
- Then it was cool to be able to use all the current tools to diagnose it!
- There is now an independent trace for each state and I was able to `run_replay` that trace with the RustRover debugger.
- The `entries.html` animation helped me understand the arrival order of messages since one of them was delayed.
- The `traces.html` animation helped me understand what was executed by each stage exactly when.
- Then some careful `debug` statements with the full `HeadersTree` display made me realize what the problem was.
- The bug is here
- We take the `tree_state` but don't put it back in case of an error.
- That error happened because one message for one peer was delayed and did not arrive in order compared to the other peer's messages.
- The fix is simple but I still need to write a proper unit test for it.
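
The actual code is not reproduced in this log, so the following is only a toy reconstruction of the "take the state, forget to put it back on error" pattern. `Stage`, `TreeState`, `apply` and the "header must follow the tip" rule are hypothetical; only the shape of the bug and of the fix is meant to match the description above.

```rust
/// Hypothetical stand-in for the chain selection state.
#[derive(Debug, PartialEq)]
pub struct TreeState {
    pub headers: Vec<u64>,
}

pub struct Stage {
    tree_state: Option<TreeState>,
}

/// Toy rule: a header must directly follow the current tip.
fn apply(state: &mut TreeState, header: u64) -> Result<(), String> {
    match state.headers.last() {
        Some(&tip) if header != tip + 1 => {
            Err(format!("header {} does not follow tip {}", header, tip))
        }
        _ => {
            state.headers.push(header);
            Ok(())
        }
    }
}

impl Stage {
    pub fn new(headers: Vec<u64>) -> Self {
        Self { tree_state: Some(TreeState { headers }) }
    }

    pub fn headers(&self) -> Option<&[u64]> {
        self.tree_state.as_ref().map(|s| s.headers.as_slice())
    }

    /// Buggy: the early return through `?` skips the line restoring the state,
    /// so after one error the stage has no state left.
    pub fn handle_buggy(&mut self, header: u64) -> Result<(), String> {
        let mut state = self.tree_state.take().ok_or("state missing")?;
        apply(&mut state, header)?;
        self.tree_state = Some(state);
        Ok(())
    }

    /// Fixed: put the state back first, then propagate the result.
    pub fn handle_fixed(&mut self, header: u64) -> Result<(), String> {
        let mut state = self.tree_state.take().ok_or("state missing")?;
        let result = apply(&mut state, header);
        self.tree_state = Some(state);
        result
    }
}
```

A unit test for the fix can send an out-of-order header, check that an error is returned, and then check that the next in-order header is still accepted.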
- The rework of sync effects by Roland had a small deadlock and not all sync effects were marked as sync.
- I was not able to fix the trace replay though and Roland will take care of that.
- I changed the persistence of simulation data:
- With one trace per test now (before it was the full test for the full run).
- With some isolation for the `test_run_simulation` and `test_run_replay` to avoid concurrency issues.
- Persisted data is now saved when a GitHub CI job fails for later inspection.
- We had a meeting with Markus about peer selection
- There was a previous mention of peer selection in the log book, around countering an Eclipse attack.
- We agreed that all requirements should end up in the Cardano blueprints to be shared with other node implementors.
- We discussed some additional options for selecting peers, in particular running a trace route for each connected peer and fingerprinting that peer by removing the first 2 or 3 nodes from that route and the last 2 or 3 (to avoid relying on nodes that could be controlled by an attacker). Then the geo-localization of each remaining node can help with making a more decentralized choice of upstream peers.
- We also discussed how the Amaru node could send data to the BlockFetch client maintained by Markus's team.
- This data can then be aggregated and be used by SPOs to improve their peer selection.
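
The trace-route fingerprint mentioned above amounts to trimming both ends of a hop list. This is a minimal sketch under assumptions: `route_fingerprint` is a hypothetical helper, hops are opaque values, and nothing here performs an actual trace route or any geo-localization.

```rust
/// Returns the middle of a route after dropping `trim` hops at each end,
/// or `None` when the route is too short to leave anything.
pub fn route_fingerprint<T>(route: &[T], trim: usize) -> Option<&[T]> {
    // Compare against 2 * trim without risking overflow.
    if route.len() / 2 < trim || route.len() == trim.saturating_mul(2) {
        return None;
    }
    Some(&route[trim..route.len() - trim])
}
```

With a seven-hop route and `trim = 2`, the fingerprint is the three middle hops; a four-hop route yields `None`, which a caller could treat as "not enough information to fingerprint this peer".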
- I fixed a regression with the parenting of spans: https://github.com/pragma-org/amaru/pull/540.
- The `replay` functionality can now read data directly from the latest output of a `run_simulator` run.
- It turns out that external sync effects are not yet supported by the `Replay` support in the simulation:
- The sync effects inputs and responses are now traced, which makes the simulation richer (now we know how an operation was called and what the result was).
- However, we currently don't have a way to check, during replay, that a sync call gets called with the same inputs and returns the same response as the one recorded in the trace.
- Overall, I'm actually not sure how useful the replay feature is in its current form:
- I have the impression that being able to visualize an execution trace with interim states is useful for debugging an actual issue. => This means that we should start working on producing some traces in production.
- If we want to check that a fix is working, having a full list of trace entries might not be so useful because if we change the implementation we will not be able to replay exactly what happened. => The next step will be to update the simulation README with updated instructions on how to use what we have now.
- Finally provided a PR to fix the huge memory consumption we saw with each chain sync downstream connection
- While making a final pass on the PR to visualize simulation traces I noticed that we were computing the `Header` hash based on the whole header instead of the header body. @arnaud, would our Antithesis tests have detected this if they ran Cardano nodes as well? Since I think that this is incorrect I fixed this issue as part of my PR and the fix is now on `main`.