
fix(l1): batch GetPooledTransactions requests and skip when not synced #6443

Merged
azteca1998 merged 5 commits into main from fix/p2p-tx-batching-and-sync-guard on Apr 15, 2026

Conversation

@azteca1998
Contributor

Summary

Addresses the medium priority items from #6435:

  • is_synced guard: Skip NewPooledTransactionHashes processing when the node is still syncing — no point requesting transactions we can't use yet.
  • Rate limiting/batching: Instead of sending a GetPooledTransactions request immediately for each NewPooledTransactionHashes announcement, buffer the hashes and flush them as a single batched request every 50ms. This reduces message volume when many peers announce overlapping transaction sets in a short window.

Implementation details

  • New pending_tx_requests buffer on the per-connection Established state accumulates (announcement, hashes) pairs.
  • A periodic FlushPendingTxRequests message (every 50ms) drains the buffer, merges announcements via NewPooledTransactionHashes::merge(), and sends a single GetPooledTransactions (see the sketch after this list).
  • Teardown clears in-flight state for both sent requests and pending buffered hashes.
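
A minimal sketch of the buffer-and-flush pattern (illustrative types, not the actual ethrex definitions; the merged version also chunks each flush at 256 hashes per request, following the review feedback below):

use std::mem;

type Hash = [u8; 32]; // stand-in for H256

const MAX_HASHES_PER_REQUEST: usize = 256; // devp2p soft limit per request

#[derive(Default)]
struct ConnectionSketch {
    /// Accumulates filtered announcement hashes between flush ticks.
    pending_tx_requests: Vec<Vec<Hash>>,
}

impl ConnectionSketch {
    /// On NewPooledTransactionHashes: buffer instead of requesting immediately.
    fn buffer_announcement(&mut self, synced: bool, hashes: Vec<Hash>) {
        if !synced {
            return; // is_synced guard: silently drop announcements while syncing
        }
        self.pending_tx_requests.push(hashes);
    }

    /// Driven by the periodic FlushPendingTxRequests message (every 50ms).
    fn flush(&mut self) {
        if self.pending_tx_requests.is_empty() {
            return;
        }
        // mem::take drains the buffer atomically, leaving it empty for the next tick.
        let pending = mem::take(&mut self.pending_tx_requests);
        let batch: Vec<Hash> = pending.into_iter().flatten().collect();
        for chunk in batch.chunks(MAX_HASHES_PER_REQUEST) {
            send_get_pooled_transactions(chunk.to_vec()); // one request per chunk
        }
    }
}

/// Hypothetical stand-in for building and sending a GetPooledTransactions message.
fn send_get_pooled_transactions(hashes: Vec<Hash>) {
    let _ = hashes;
}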

Test plan

  • cargo build -p ethrex-p2p compiles cleanly
  • cargo build -p ethrex-p2p --features l2 compiles cleanly
  • cargo test -p ethrex-p2p — all 21 tests pass
  • Manual testing on devnet to verify reduced GetPooledTransactions message volume

@github-actions github-actions Bot added the L1 Ethereum client label Apr 2, 2026
@azteca1998 azteca1998 force-pushed the fix/p2p-tx-batching-and-sync-guard branch from 392bd36 to 8d1dde9 on April 2, 2026 10:29
@github-actions

github-actions Bot commented Apr 2, 2026

Lines of code report

Total lines added: 89
Total lines removed: 0
Total lines changed: 89

Detailed view
+-----------------------------------------------------------+-------+------+
| File                                                      | Lines | Diff |
+-----------------------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/rlpx/connection/handshake.rs | 495   | +1   |
+-----------------------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/rlpx/connection/server.rs    | 1364  | +60  |
+-----------------------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/rlpx/eth/transactions.rs     | 324   | +28  |
+-----------------------------------------------------------+-------+------+

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 2, 2026

🤖 Claude Code Review



Code Review: fix(l1): batch GetPooledTransactions requests and skip when not synced

Summary

The changes are well-structured and address a real performance problem. The overall approach — buffering announcements and flushing on a timer — is sound. A few correctness issues are worth addressing before merging.


Correctness Issues

1. GetPooledTransactions 256-hash limit not enforced

The devp2p ETH protocol spec sets a soft limit of 256 hashes per GetPooledTransactions request. In a busy network with many peers, pending_tx_requests can accumulate more than 256 filtered hashes within a 50ms window. The flush sends all of them in a single request, exceeding that limit.

server.rs:1411–1440 (flush_pending_tx_requests) — the batch should be split into chunks of at most 256 hashes, with a separate GetPooledTransactions per chunk:

const MAX_POOLED_TX_HASHES: usize = 256;

for chunk in all_hashes.chunks(MAX_POOLED_TX_HASHES) {
    let request = GetPooledTransactions::new(random(), chunk.to_vec());
    // store and send each chunk independently...
}

2. Divergence between all_hashes and merged_announcement in the flush

server.rs:1420–1438 — all_hashes is built from hashes (the subset already filtered by get_transactions_to_request), while merged_announcement is built by merging the full original announcements. The merged announcement may therefore contain hashes that are NOT being requested (they were filtered out as already known or already in-flight from a different peer).

This mirrors the existing pattern from before the PR (requested_pooled_txs stored the full announcement alongside the filtered hashes), so response validation still works correctly. However, the comment on the field says "announcement is kept for response validation" — it's worth adding a note here to make the intentional divergence explicit, to prevent future confusion when someone looks at the stored pair and wonders why announcement.transaction_hashes is a superset of all_hashes.


3. merge() does not deduplicate hashes in the merged announcement

transactions.rs:131–137 — Two different peers can each announce the same transaction hash. Each call to get_transactions_to_request will filter it correctly (the second peer's hash will be dropped since it's already in-flight), so all_hashes stays clean. But the merged announcement's transaction_hashes array can still contain the same hash twice (once per peer). During response validation (PooledTransactions handler ~line 1265), the lookup uses .position() which returns the first match, so type/size validation still works — but the duplicate entry in the merged announcement is logically wrong and may confuse future readers or future validation logic.

Since the announced hashes aren't used directly in the request (only for validation by lookup), this is not exploitable, but consider deduplicating all_hashes before building the request, or stripping duplicates when building the merged announcement.
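
A minimal sketch of order-preserving deduplication over the request hashes, assuming a plain 32-byte hash type:

use std::collections::HashSet;

type Hash = [u8; 32];

/// Drop duplicate hashes while keeping first-seen order: HashSet::insert
/// returns false for values already present, so filter keeps only the first
/// occurrence of each hash.
fn dedup_preserving_order(hashes: Vec<Hash>) -> Vec<Hash> {
    let mut seen = HashSet::new();
    hashes.into_iter().filter(|h| seen.insert(*h)).collect()
}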


Minor Issues

4. Unnecessary intermediate allocation in merge()

transactions.rs:132–134:

let mut types = self.transaction_types.to_vec();
types.extend_from_slice(&other.transaction_types);
self.transaction_types = types.into();

This allocates a new Vec, copies into it, then converts back to Bytes. Since transaction_types is a Bytes field and the struct fields aren't pub, consider either changing the field to Vec<u8> (which allows extending in place) or using BytesMut:

let mut types = bytes::BytesMut::from(self.transaction_types.as_ref());
types.extend_from_slice(&other.transaction_types);
self.transaction_types = types.freeze();

This is micro-optimization territory, but it's the hot path for the merge.

5. Unbounded pending_tx_requests buffer

There's no cap on how many entries can accumulate in pending_tx_requests between flush ticks. While the 50ms interval is short, a flood of distinct tx announcements from many peers could make the buffer large. Combined with item 1 above, a size cap (e.g. 2 * 256 = 512 filtered hashes, then trigger an early flush or drop) would bound both the memory use and the oversized requests.
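
A sketch of the suggested cap, with the review's numbers and hypothetical names:

type Hash = [u8; 32];

/// Ceiling from the review: two full 256-hash chunks.
const MAX_BUFFERED_HASHES: usize = 512;

/// Returns false when the announcement is dropped; triggering an early flush
/// instead would be the alternative policy.
fn try_buffer(
    buffer: &mut Vec<Vec<Hash>>,
    buffered_count: &mut usize,
    hashes: Vec<Hash>,
) -> bool {
    if *buffered_count + hashes.len() > MAX_BUFFERED_HASHES {
        return false; // buffer full: drop (or flush early)
    }
    *buffered_count += hashes.len();
    buffer.push(hashes);
    true
}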


What's Working Well

  • Teardown correctly clears both requested_pooled_txs and the new pending_tx_requests buffer — the in-flight state accounting is consistent.
  • The is_synced() guard is the right place and silently dropping is the right behavior (no point asking peers for mempool txs we can't use).
  • std::mem::take in flush_pending_tx_requests is the right idiom to drain the buffer atomically.
  • The insert-before-send ordering in the flush (inserting into requested_pooled_txs before calling send) is correct; on error the teardown path handles cleanup.

Automated review by Claude (Anthropic) · sonnet · custom prompt

@github-actions

github-actions Bot commented Apr 2, 2026

🤖 Codex Code Review

  1. flush_pending_tx_requests now concatenates every buffered hash into a single GetPooledTransactions, but the response path still has no size cap. server.rs:1417, server.rs:1435, transactions.rs:197. A peer can flood NewPooledTransactionHashes during the 50ms window and make us request a very large batch, then force a very large PooledTransactions allocation/encode/decode on both sides. This is a network-level DoS regression. The batcher should chunk by hash count and/or expected encoded response size before sending.

  2. The batching change widens response validation from “what we requested” to “anything present in any buffered announcement.” requested_pooled_txs is documented as (original announcement, actually requested hashes), but the new code stores one merged announcement for the whole batch and still validates against announcement contents, not the all_hashes that were actually requested. server.rs:204, server.rs:1437, transactions.rs:257. After this PR, a peer can include any tx from any merged announcement and it will pass validate_requested, even if that tx was not in the wire request. For mempool admission logic, that is the wrong trust boundary.

  3. NewPooledTransactionHashes::merge copies transaction_types on every merge, so batching many announcements turns into repeated realloc/copy work. transactions.rs:131. In the hot path this is avoidable overhead; reserve once and extend in place, or accumulate into fresh vectors during flush_pending_tx_requests instead of repeatedly cloning prior state.
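
For the third point, a sketch of accumulating into a fresh, pre-sized vector during the flush instead of repeatedly merging (the per-announcement (types, hashes) shape is illustrative, not the real ethrex types):

/// Build the batched type list with one up-front allocation.
fn collect_types(pending: &[(Vec<u8>, Vec<[u8; 32]>)]) -> Vec<u8> {
    let total: usize = pending.iter().map(|(types, _)| types.len()).sum();
    let mut all_types = Vec::with_capacity(total); // reserve once
    for (types, _) in pending {
        all_types.extend_from_slice(types); // extend in place, no reallocs
    }
    all_types
}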

I did not run tests. I also did not find coverage for the new batching/flush behavior in transactions_tests.rs.


Automated review by OpenAI Codex · gpt-5.4 · custom prompt

Base automatically changed from fix/dedup-get-pooled-transactions to main April 6, 2026 21:12
@azteca1998 azteca1998 marked this pull request as ready for review April 7, 2026 12:57
@azteca1998 azteca1998 requested a review from a team as a code owner April 7, 2026 12:57
@ethrex-project-sync ethrex-project-sync Bot moved this to In Review in ethrex_l1 Apr 7, 2026
@github-actions

github-actions Bot commented Apr 7, 2026

🤖 Kimi Code Review

Critical Issues

1. Data loss on partial send failure in transaction batching (crates/networking/p2p/rlpx/connection/server.rs:1455-1499)
In flush_pending_tx_requests, std::mem::take drains the pending buffer at line 1462, but the subsequent send calls in the loop (line 1489) can fail with ?. If chunk n of m fails, chunks n+1 to m are lost permanently. These transactions will hang as "in-flight" in the mempool until the 30s timeout expires, causing unnecessary latency.

Fix: Either re-buffer unsent chunks on failure, or only remove successfully sent items from the buffer.

2. Mismatch between reserved in-flight state and actual send (crates/networking/p2p/rlpx/connection/server.rs:1251-1262)
When NewPooledTransactionHashes arrives, get_transactions_to_request immediately marks hashes as in-flight in the mempool (reserving them), but they're only actually sent later during flush. If the connection closes between buffering and flushing (or if flush fails as noted above), the cleanup code at line 242-245 clears them, which is correct. However, if the flush task panics without running teardown, these hashes remain reserved for up to 30 seconds, blocking legitimate re-requests.

Recommendation: Consider a shorter timeout for buffered (not yet sent) requests, or send an empty GetPooledTransactions to un-reserve them on disconnect.

Code Quality & Rust Best Practices

3. Inefficient O(n×m) lookup in filter_to (crates/networking/p2p/rlpx/eth/transactions.rs:144-156)
The nested loop (position inside requested iteration) is acceptable for small batches (≤256), but consider using a HashSet<H256> for the requested parameter if this becomes hot.

4. Unnecessary vector allocations in chunking (crates/networking/p2p/rlpx/connection/server.rs:1470-1473)
all_types and all_sizes collect all data upfront, then slice into chunks. This allocates memory proportional to total pending hashes. Given the 50ms flush interval and 256-hash limit, this is minor, but you could stream chunks directly without the intermediate all_* vectors to reduce memory pressure during high load.

5. Inconsistent visibility in NewPooledTransactionHashes (crates/networking/p2p/rlpx/eth/transactions.rs:69-70)
Fields changed from private to pub(crate) and a new from_raw constructor added. Ensure this is intentional—exposing internal encoding details (transaction_types: Bytes) increases API surface area. If only the server module needs this, pub(crate) is appropriate.

Test Infrastructure

6. Fragile Dockerfile workspace mutation (test/packet-storm/flooder/Dockerfile:41)
Using sed to inject the flooder into the workspace Cargo.toml is brittle. If the members array format changes (e.g., trailing comma, different whitespace), the build breaks. Consider adding the flooder to the workspace permanently with optional = true or using a separate workspace.

7. Large generated file committed (test/packet-storm/docker-compose.yaml)
This 1600-line file is auto-generated by generate.py. Consider removing it from git and generating it in CI/build scripts to avoid drift between the script and the committed version.

DB Explorer Tooling

8. Blocking runtime in library code (tooling/db_explorer/src/lib.rs:28-30)
DbExplorer spawns a single-threaded Tokio runtime for blocking DB operations. This is acceptable for a CLI tool, but document that this type should not be instantiated multiple times in the same process (runtime panics on nested creation) and is unsuitable for use in an async application.

9. Potential database reentrancy (tooling/db_explorer/src/lib.rs:380-385)
state_stats calls storage_slot_count (which accesses the DB) from within fkv_iter.for_each. If the underlying storage uses a single connection or lock, this may deadlock or panic depending on the Store implementation. Verify that iter_fkv_accounts and count_fkv_storage_slots can be used concurrently.

Minor Suggestions

  • Line 1251: The sync check if state.blockchain.is_synced() is good, but consider logging at trace level when skipping requests due to syncing to aid debugging.
  • Line 1465-1473: The filter_to logic assumes the announcement contains all requested hashes. If get_transactions_to_request filtered some out (already known), the filter_to call is redundant but harmless.
  • Line 1481-1482: chunk_types and chunk_sizes slicing uses offset + chunk.len(). Since chunks guarantees only the last chunk may be smaller, this is safe, but consider chunks(...).enumerate() to derive the offset from the chunk index and avoid manual offset tracking.

Summary
The transaction batching logic is sound but needs hardening against partial send failures. The test infrastructure effectively reproduces the amplification attack, and the DB explorer is a useful debugging addition with appropriate caveats about its blocking runtime.


Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt

@github-actions

github-actions Bot commented Apr 7, 2026

🤖 Codex Code Review

Findings:

  1. High: the new db_explorer “state at block N” paths are not actually block-scoped. In tooling/db_explorer/src/lib.rs#L318 the code fetches state_root for block_number, but then scans iter_fkv_accounts() and count_fkv_storage_slots() from the flat tables, which represent the current flattened state, not the historical trie at that root. The same problem shows up in tooling/db_explorer/src/main.rs#L322, where SlotCount --block prints a block number but never uses it in the lookup. This will silently return misleading results for any non-latest block.

  2. Medium: the batching path introduces peer-triggerable quadratic work. crates/networking/p2p/rlpx/eth/transactions.rs#L144 does position() over self.transaction_hashes for every requested hash, and crates/networking/p2p/rlpx/connection/server.rs#L1472 calls that for every buffered announcement during each flush. Since NewPooledTransactionHashes comes from the network and frame size is large, a malicious peer can force very expensive O(n^2) scans on a synced node. Building an index once per announcement, or filtering in a single pass with a HashSet, would avoid this.

  3. Low: state_stats() can panic on valid CLI input. tooling/db_explorer/src/lib.rs#L388 does stats.total_accounts % report_every, but report_every is user-controlled and clap does not prevent --report-every 0. That turns a diagnostic command into an immediate panic instead of a handled error (see the sketch after this list).

  4. Low: slot-count failures are silently converted into 0, which can mask DB corruption or iterator bugs and skew the reported histogram/top accounts. See tooling/db_explorer/src/lib.rs#L356. For an explorer/diagnostic tool, swallowing those errors makes the output look authoritative when it is not.
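
On the third finding, a sketch of rejecting --report-every 0 at parse time (assumes clap 4 with the derive feature; the struct and default are hypothetical, only the flag name comes from the review):

use clap::Parser;

#[derive(Parser)]
struct StateStatsArgs {
    /// Progress-report interval; zero would make `total_accounts % report_every` panic.
    #[arg(long, default_value_t = 1000, value_parser = clap::value_parser!(u64).range(1..))]
    report_every: u64,
}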

Aside from those points, the RLPx cleanup path for pending/in-flight tx requests looks structurally sound.

I couldn’t run cargo check in this sandbox because rustup failed to create temp files under a read-only location.


Automated review by OpenAI Codex · gpt-5.4 · custom prompt

@greptile-apps

greptile-apps Bot commented Apr 7, 2026

Greptile Summary

Implements two medium-priority P2P efficiency improvements: an is_synced() guard that skips NewPooledTransactionHashes processing while the node is syncing, and a 50ms batching window that accumulates announced transaction hashes into a single GetPooledTransactions request per flush tick instead of one request per announcement. Teardown correctly drains both requested_pooled_txs and the new pending_tx_requests buffer via clear_in_flight_txs.

Confidence Score: 5/5

Safe to merge; both remaining findings are P2 style suggestions with no blocking correctness issues.

The core logic — deduplication via reserve_unknown_hashes, chunking at 256 hashes, teardown cleanup of both maps, and blob-validation before the sync guard — is all correct. The two comments are minor: a send-before-insert ordering preference and a missing re-check of is_synced at flush time, neither of which causes data loss or incorrect behaviour.

Comments were left on crates/networking/p2p/rlpx/connection/server.rs in flush_pending_tx_requests (lines 1491-1495, insert-before-send ordering).

Important Files Changed

  • crates/networking/p2p/rlpx/connection/server.rs — Adds pending_tx_requests buffer with 50ms periodic flush and is_synced guard; teardown correctly drains both in-flight maps
  • crates/networking/p2p/rlpx/connection/handshake.rs — Trivially initializes the new pending_tx_requests field to an empty Vec at connection setup
  • crates/networking/p2p/rlpx/eth/transactions.rs — Adds filter_to and from_raw helpers on NewPooledTransactionHashes for batched-request construction
  • test/packet-storm/docker-compose.yaml — Adds a 25+ node packet-storm stress-test cluster for validating reduced GetPooledTransactions message volume
  • tooling/db_explorer/src/lib.rs — New standalone read-only DB explorer library; unrelated to core p2p transaction batching changes

Sequence Diagram

sequenceDiagram
    participant Peer
    participant Handler as handle_incoming_message
    participant Buffer as pending_tx_requests
    participant Flush as flush_pending_tx_requests (50ms tick)
    participant InFlight as requested_pooled_txs
    participant Mempool

    Peer->>Handler: NewPooledTransactionHashes
    Handler->>Handler: is_synced()?
    alt not synced
        Handler-->>Peer: (skip silently)
    else synced
        Handler->>Mempool: reserve_unknown_hashes(hashes)
        Mempool-->>Handler: reserved hashes
        Handler->>Buffer: push (announcement, hashes)
    end

    Note over Flush: Every 50 ms
    Flush->>Buffer: std::mem::take()
    Buffer-->>Flush: pending entries
    loop each ≤256-hash chunk
        Flush->>InFlight: insert (announcement, hashes, now)
        Flush->>Peer: GetPooledTransactions(chunk)
    end

    Peer->>Handler: PooledTransactions(id, txs)
    Handler->>InFlight: remove(id)
    InFlight-->>Handler: (announcement, requested_hashes)
    Handler->>Mempool: clear_in_flight_txs(requested_hashes)
    Handler->>Handler: is_synced()?
    alt synced
        Handler->>Handler: validate_requested()
        Handler->>Mempool: add transactions
    end

    Note over Handler: On teardown
    Handler->>InFlight: drain → clear_in_flight_txs
    Handler->>Buffer: drain → clear_in_flight_txs

Reviews (1): Last reviewed commit: "chore: merge main and resolve conflicts"

Comment on lines +1491 to +1495
let request = GetPooledTransactions::new(random(), chunk.to_vec());
state
    .requested_pooled_txs
    .insert(request.id, (announcement, chunk.to_vec(), Instant::now()));
send(state, Message::GetPooledTransactions(request)).await?;

P2 Insert-before-send ordering leaves stale entries on send failure

The entry is inserted into requested_pooled_txs before send is awaited. If send errors on any chunk in a multi-chunk batch (>256 hashes), subsequent chunks are already registered in the map but no message was ever transmitted. They remain as stale in-flight entries until the 30-second sweep timer clears them. Swapping the order — send first, then insert — eliminates this window at the cost of one extra clone of request.id.

Suggested change
      let request = GetPooledTransactions::new(random(), chunk.to_vec());
-     state
-         .requested_pooled_txs
-         .insert(request.id, (announcement, chunk.to_vec(), Instant::now()));
-     send(state, Message::GetPooledTransactions(request)).await?;
+     send(state, Message::GetPooledTransactions(request.clone())).await?;
+     state
+         .requested_pooled_txs
+         .insert(request.id, (announcement, chunk.to_vec(), Instant::now()));

Comment on lines +1459 to +1462
async fn flush_pending_tx_requests(state: &mut Established) -> Result<(), PeerConnectionError> {
    if state.pending_tx_requests.is_empty() {
        return Ok(());
    }

P2 Sync guard not rechecked at flush time

is_synced() gates buffering in handle_incoming_message, but flush_pending_tx_requests sends whatever is in the buffer unconditionally. If the node transitions synced→non-synced within the 50ms window, the buffered hashes are still dispatched. The hashes are already reserved as in-flight and will be cleaned up on response or by the sweep timer, so this is not a correctness issue — but adding an early-return guard keeps the two paths consistent and avoids unnecessary network traffic.

Suggested change
  async fn flush_pending_tx_requests(state: &mut Established) -> Result<(), PeerConnectionError> {
-     if state.pending_tx_requests.is_empty() {
+     if state.pending_tx_requests.is_empty() || !state.blockchain.is_synced() {
+         state.pending_tx_requests.clear();
          return Ok(());
      }

@github-actions

github-actions Bot commented Apr 7, 2026

🤖 Claude Code Review



PR Review: fix(l1): batch GetPooledTransactions requests and skip when not synced

The overall design is sound — buffering announcements with a 50 ms flush window and deduplicating via reserve_unknown_hashes at receive time correctly prevents duplicate requests from concurrent peer announcements. The is_synced() guard, teardown logic, and std::mem::take idiom are all correct.

A few issues worth addressing before merge:


Medium — In-flight hashes leak on partial flush failure

server.rs, flush_pending_tx_requests

The loop inserts each chunk into requested_pooled_txs and then sends it:

state.requested_pooled_txs.insert(request.id, (announcement, chunk.to_vec(), Instant::now()));
send(state, Message::GetPooledTransactions(request)).await?;

If send fails on chunk i, execution returns early with Err. Chunks i+1..N have not been inserted into requested_pooled_txs, but their hashes were already marked in-flight by reserve_unknown_hashes at receive time. Neither the inflight sweep (which only scans requested_pooled_txs) nor the teardown path (which drains requested_pooled_txs and pending_tx_requests, both now empty) will clear them. Those hashes are permanently stuck in in_flight_txs once the connection closes.

Fix: accumulate all chunks into requested_pooled_txs before entering the send loop, or collect and clear the remaining hashes on early return:

let mut all_chunks: Vec<(GetPooledTransactions, NewPooledTransactionHashes, Vec<H256>)> = ...;
// Pre-register all in requested_pooled_txs
for (req, ann, hashes) in &all_chunks {
    state.requested_pooled_txs.insert(req.id, (ann.clone(), hashes.clone(), Instant::now()));
}
// Send; on failure, teardown will clean up via requested_pooled_txs
for (req, ..) in all_chunks {
    send(state, Message::GetPooledTransactions(req)).await?;
}

Medium — Unbounded pending_tx_requests buffer

server.rs:1261–1270

A malicious (or misbehaving) peer can send an unbounded number of NewPooledTransactionHashes messages in a 50 ms window. Each push reserves hashes in the mempool and grows the buffer. While reserve_unknown_hashes limits the mempool impact once hashes are exhausted, the buffer itself is uncapped and grows proportionally to message count × hashes per message.

Consider a ceiling on total buffered hashes (e.g., 4 × MAX_HASHES_PER_REQUEST = 1024) and silently drop announcements beyond it.


Low — filter_to is O(n × m)

transactions.rs:144–160

For each hash in requested, filter_to does a linear scan of self.transaction_hashes. With announcements of up to 4096 hashes and multiple accumulated announcements per flush, this compounds. A HashMap<H256, (u8, usize)> built once from the announcement reduces this to O(n + m):

let lookup: HashMap<H256, (u8, usize)> = self.transaction_hashes.iter()
    .zip(self.transaction_types.iter().copied().zip(self.transaction_sizes.iter().copied()))
    .map(|(&h, (t, s))| (h, (t, s)))
    .collect();
for &hash in requested {
    if let Some(&(ty, sz)) = lookup.get(&hash) { ... }
}

Low — from_raw skips the length invariant enforced by decode

transactions.rs:76–86

The decode method explicitly rejects messages where the three arrays have mismatched lengths. from_raw silently accepts them. A mismatch would cause an index panic in validate_requested at transaction_types[pos] / transaction_sizes[pos] (line 298–299). This is only called from internal code, but a debug_assert_eq! at the top of from_raw would catch misuse during development at zero release cost.
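
A sketch of the suggested guard; the struct is an illustrative stand-in for NewPooledTransactionHashes, not the real definition:

type Hash = [u8; 32];

struct AnnouncementSketch {
    transaction_types: Vec<u8>,
    transaction_sizes: Vec<usize>,
    transaction_hashes: Vec<Hash>,
}

impl AnnouncementSketch {
    fn from_raw(types: Vec<u8>, sizes: Vec<usize>, hashes: Vec<Hash>) -> Self {
        // decode() rejects mismatched lengths on the wire path; mirror that
        // invariant for internal callers at zero cost in release builds.
        debug_assert_eq!(types.len(), hashes.len());
        debug_assert_eq!(sizes.len(), hashes.len());
        Self {
            transaction_types: types,
            transaction_sizes: sizes,
            transaction_hashes: hashes,
        }
    }
}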


Minor — Stale comment on requested_pooled_txs

server.rs:207

"The announcement is kept for response validation"

After this PR, the value stored is a reconstructed NewPooledTransactionHashes built from raw chunk slices — not the original peer announcement. The comment is still accurate in spirit but should note that it is a trimmed/synthetic announcement, not the original wire message.


Minor — filter_to silently drops hashes not found in the announcement

transactions.rs:149–153

If a hash in requested is absent from self.transaction_hashes, the entry is silently omitted. In normal flow this shouldn't happen (hashes come from reserve_unknown_hashes which filters through the same announcement), but an omitted hash would be in-flight in the mempool with no corresponding entry in the chunk's tracked hashes, so it would leak until connection teardown. A debug_assert or a warn! log on the missing-hash branch would make this kind of divergence visible.


Test coverage gap

No unit tests cover the new flush_pending_tx_requests logic, the filter_to method, or the from_raw constructor. Given that the in-flight state correctness is critical for mempool health, at minimum a unit test for filter_to (including the "hash not found" path) and an integration test that verifies no in-flight hashes are leaked after a failed send would be valuable before merging.


Packet-storm test infrastructure

The test/packet-storm/ directory is a well-structured reproduction harness. The auto-generated docker-compose.yaml (50 nodes, 1597 lines) probably belongs in .gitignore or should be generated on demand rather than committed — it's noisy in diffs and generate.py is already in the repo. Consider adding a note in the compose header or a README confirming it is intentionally committed.


Automated review by Claude (Anthropic) · sonnet · custom prompt

        .insert(request.id, (announcement, chunk.to_vec(), Instant::now()));
    send(state, Message::GetPooledTransactions(request)).await?;
}

Contributor


If send() fails on, say, the 2nd chunk, the ? propagates the error and drops the local all_hashes/all_types/all_sizes vectors. Hashes from un-sent chunks were already removed from pending_tx_requests by std::mem::take (line 1464) but never inserted into requested_pooled_txs, so teardown() won't know to clear them from the mempool's in_flight_txs set. Those hashes become permanently stuck, preventing any peer from re-requesting them.

One fix: before the early ? return, clear in-flight for all hashes that weren't sent. For example, you could collect all hashes into a flat vec up front, track how many were successfully sent, and on error clear the remainder:

let result = send(state, Message::GetPooledTransactions(request)).await;
if result.is_err() {
    // Clear in-flight for all remaining unsent hashes
    let unsent = &all_hashes[offset + chunk.len()..];
    let _ = state.blockchain.mempool.clear_in_flight_txs(unsent);
    return result;
}

@github-project-automation github-project-automation Bot moved this from In Review to In Progress in ethrex_l1 Apr 7, 2026
avilagaston9 approved these changes Apr 9, 2026
Addresses the medium priority items from #6435:

- is_synced guard: skip NewPooledTransactionHashes processing when the
  node is still syncing, since we won't be building blocks soon.

- Rate limiting/batching: instead of sending a GetPooledTransactions
  request immediately for each NewPooledTransactionHashes announcement,
  buffer the hashes and flush them as a single batched request every
  50ms. This reduces message volume when many peers announce overlapping
  transaction sets.
…drop merge()

- Enforce the devp2p 256-hash-per-request limit by chunking batched
  GetPooledTransactions requests.
- Build trimmed announcements containing only the actually-requested
  hashes (with matching types/sizes), fixing the validation divergence
  where the merged announcement could contain un-requested entries.
- Replace the allocation-heavy merge() with filter_to() + from_raw(),
  which builds the final vectors once without repeated reallocs.
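
A sketch of the trimmed-announcement flow this commit describes (written as a free function over slices; the real filter_to is a method on NewPooledTransactionHashes with its own signature):

type Hash = [u8; 32];

/// Keep only the (type, size, hash) entries whose hash was actually requested,
/// so the stored announcement matches the wire request exactly.
fn filter_to(
    hashes: &[Hash],
    types: &[u8],
    sizes: &[usize],
    requested: &[Hash],
) -> (Vec<u8>, Vec<usize>, Vec<Hash>) {
    let mut out = (Vec::new(), Vec::new(), Vec::new());
    for &want in requested {
        if let Some(pos) = hashes.iter().position(|h| *h == want) {
            out.0.push(types[pos]);
            out.1.push(sizes[pos]);
            out.2.push(want);
        }
    }
    out
}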
…stuck in-flight hashes

In flush_pending_tx_requests, the entry was inserted into
requested_pooled_txs before send() was awaited. If send() failed on
any chunk, the current chunk's hashes were already registered but no
message was transmitted, and the cleanup only covered hashes after the
failed chunk (offset + chunk.len()..) missing the current chunk itself.

Fix both issues:
1. Move the insert after a successful send, so only actually-sent
   hashes get tracked in requested_pooled_txs.
2. On send failure, clear in-flight from offset.. (including the
   current chunk) rather than offset + chunk.len().. so no hashes
   become permanently stuck in the mempool's in_flight_txs set.
@azteca1998 azteca1998 force-pushed the fix/p2p-tx-batching-and-sync-guard branch from 9a7aad2 to cc7c2f9 on April 13, 2026 11:21
@github-project-automation github-project-automation Bot moved this from In Progress to In Review in ethrex_l1 Apr 14, 2026
@azteca1998 azteca1998 enabled auto-merge April 15, 2026 13:50
@azteca1998 azteca1998 added this pull request to the merge queue Apr 15, 2026
Merged via the queue into main with commit b4118ed Apr 15, 2026
51 of 52 checks passed
@azteca1998 azteca1998 deleted the fix/p2p-tx-batching-and-sync-guard branch April 15, 2026 15:23
@github-project-automation github-project-automation Bot moved this from In Review to Done in ethrex_l1 Apr 15, 2026

Labels

L1 Ethereum client

Projects

Status: Done


4 participants