[SharovBot] fix(txpool): comprehensive stale tx eviction for pending and queued sub-pools by Giulio2002 · Pull Request #19393 · erigontech/erigon

Giulio2002 · 2026-02-21T20:49:46Z

[SharovBot]

Problem

A Gnosis Chain validator operator running Erigon 3.3.8 discovered two distinct txpool eviction bugs that together cause empty block production and unbounded queued pool bloat. This PR addresses both bugs comprehensively, building on the initial analysis in PRs #19385 and #19386 (which were closed before merge).

Bug #1 — Stale Pending Transactions (AuRa/Gnosis-specific)

Root Cause

onSenderStateChange evicts pending transactions with senderNonce > tx.Nonce (stale). However, onSenderStateChange is only called for senders that appear in the EVM state-diff batch (stateChanges.ChangeBatch) sent from the execution layer.

On AuRa/Gnosis Chain, many transactions (system transactions, validator reward txns) advance nonces without producing a state-diff UPSERT entry for the sender in the batch. As a result:

The sender's nonce is mined to N+1 on-chain
The sender does not appear in sendersWithChangedState
onSenderStateChange is never called for them
Their pooled pending txns with nonce ≤ N remain in the pending pool forever
Block builders attempt to include these, get nonce too low, and produce empty blocks

Fix

In addTxnsOnNewBlock, we now build a minedSenderMinNonce map from the minedTxns argument:

For each mined tx at nonce N: minedSenderMinNonce[senderID] = max(existing, N+1)
Add all mined-tx senders to sendersWithChangedState (ensuring onSenderStateChange is always called)
When reading the nonce from kvcache, use max(kvcache_nonce, minedSenderMinNonce[senderID])

This requires zero extra state queries, adds only O(mined_txns_per_block) overhead, and works even when kvcache hasn't been updated for the sender.
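The mined-nonce floor described in this fix can be illustrated with a minimal, self-contained sketch. The `minedTx` type and both helper functions are hypothetical simplifications written for this example; they are not the actual `addTxnsOnNewBlock` code, and only the idea (highest mined nonce plus one as a lower bound on the cached nonce) comes from the description above.

```go
package main

import "fmt"

// minedTx is a hypothetical, simplified stand-in for a mined transaction.
type minedTx struct {
	SenderID uint64
	Nonce    uint64
}

// buildMinedSenderMinNonce maps each sender to the lowest nonce it can have
// on-chain after the block: its highest mined nonce plus one.
func buildMinedSenderMinNonce(mined []minedTx) map[uint64]uint64 {
	floor := make(map[uint64]uint64, len(mined))
	for _, tx := range mined {
		if next := tx.Nonce + 1; next > floor[tx.SenderID] {
			floor[tx.SenderID] = next
		}
	}
	return floor
}

// effectiveNonce raises a possibly stale cached nonce to the mined floor.
// Senders without mined txns have floor 0, so their cached nonce is unchanged.
func effectiveNonce(cached, senderID uint64, floor map[uint64]uint64) uint64 {
	if f := floor[senderID]; f > cached {
		return f
	}
	return cached
}

func main() {
	floor := buildMinedSenderMinNonce([]minedTx{
		{SenderID: 7, Nonce: 4},
		{SenderID: 7, Nonce: 5},
		{SenderID: 9, Nonce: 0},
	})
	// The cache still reports nonce 4 for sender 7, but nonce 5 was mined.
	fmt.Println(effectiveNonce(4, 7, floor)) // 6
}
```

With the floor applied, a pending tx at nonce 4 or 5 for sender 7 compares as stale even though the cached nonce was never refreshed.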

Bug #2 — Zombie Queued Transactions (Universal)

Root Cause

There is no eviction for queued transactions whose nonce is astronomically higher than the sender's current on-chain nonce. Examples seen on Gnosis Chain:

Account	On-chain nonce	Queued nonce	Gap
`0x0006E9...`	281	16,814–17,013	16,533+
`0x016f6C...`	6,398	144,968	138,570

These transactions can never become pending — they would require filling 16,000+ nonce positions first. Yet they sit in the queued sub-pool indefinitely, bloating it to 4,000+ transactions and wasting memory.

The existing blob-txn nonce-gap eviction covers only BlobTxnType, where any gap triggers eviction; regular transactions had no gap limit at all.

Fix

Added MaxNonceGap uint64 to txpoolcfg.Config (default: 64). In onSenderStateChange, after checking for stale nonces:

} else if p.cfg.MaxNonceGap > 0 && mt.TxnSlot.Nonce > noGapsNonce && mt.TxnSlot.Nonce-noGapsNonce > p.cfg.MaxNonceGap {
    deleteAndContinueReasonLog = "nonce gap too large"
    discardReason = txpoolcfg.NonceTooDistant
}

noGapsNonce tracks the next expected nonce accounting for consecutive txns already in the pool. A tx with nonce 100 won't be evicted if txns 0–99 are already pooled (gap from noGapsNonce is 0). A tx with nonce 16,814 when noGapsNonce=282 has gap 16,532 > 64 → evicted.

A new NonceTooDistant (37) discard reason is added for observability.
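The gap rule can be checked against the examples in this description with a small standalone sketch. The `zombieNonces` helper is hypothetical and only mirrors the condition quoted above; it assumes the pooled nonces are sorted ascending and that stale nonces have already been handled by the separate low-nonce check.

```go
package main

import "fmt"

// zombieNonces returns the nonces that the gap rule would evict.
// pooled must be sorted ascending and contain no nonce below senderNonce.
func zombieNonces(senderNonce uint64, pooled []uint64, maxNonceGap uint64) []uint64 {
	// noGapsNonce is the next nonce expected if the pooled txns are consecutive.
	noGapsNonce := senderNonce
	var evicted []uint64
	for _, nonce := range pooled {
		// Same condition as in the PR: a zero MaxNonceGap disables the rule.
		if maxNonceGap > 0 && nonce > noGapsNonce && nonce-noGapsNonce > maxNonceGap {
			evicted = append(evicted, nonce)
			continue
		}
		if nonce == noGapsNonce {
			noGapsNonce++
		}
	}
	return evicted
}

func main() {
	// 101 consecutive txns: more than MaxNonceGap of them, but no gap at all.
	consecutive := make([]uint64, 101)
	for i := range consecutive {
		consecutive[i] = uint64(i)
	}
	fmt.Println(len(zombieNonces(0, consecutive, 64)))  // 0
	fmt.Println(zombieNonces(282, []uint64{16814}, 64)) // [16814]
	fmt.Println(zombieNonces(10, []uint64{74, 75}, 64)) // [75]
}
```

The last call shows the boundary: a gap of exactly 64 is kept, while a gap of 65 is evicted.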

Additional Fix — Reason Tracking

Previously, all evictions in onSenderStateChange used the single reason NonceTooLow regardless of why the tx was removed. A parallel toDelReasons []txpoolcfg.DiscardReason slice now tracks the correct reason per evicted transaction.
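A minimal sketch of the parallel-slice idea follows. All names here (`discardReason`, `pooledTx`, `collectEvictions`) are hypothetical stand-ins, the reasons are modelled as strings rather than the real numeric `txpoolcfg.DiscardReason` values, and for brevity the gap is measured from the sender nonce instead of `noGapsNonce`.

```go
package main

import "fmt"

// discardReason is a hypothetical stand-in for txpoolcfg.DiscardReason.
type discardReason string

const (
	nonceTooLow     discardReason = "NonceTooLow"
	nonceTooDistant discardReason = "NonceTooDistant"
)

// pooledTx is a hypothetical, simplified pooled transaction.
type pooledTx struct{ Nonce uint64 }

// collectEvictions returns the txns to delete and, at the same index in the
// second slice, the reason each one was selected.
func collectEvictions(senderNonce, maxNonceGap uint64, txs []pooledTx) ([]pooledTx, []discardReason) {
	var toDel []pooledTx
	var toDelReasons []discardReason
	for _, tx := range txs {
		switch {
		case senderNonce > tx.Nonce:
			toDel = append(toDel, tx)
			toDelReasons = append(toDelReasons, nonceTooLow)
		case maxNonceGap > 0 && tx.Nonce-senderNonce > maxNonceGap:
			// No underflow: the first case already handled tx.Nonce < senderNonce.
			toDel = append(toDel, tx)
			toDelReasons = append(toDelReasons, nonceTooDistant)
		}
	}
	return toDel, toDelReasons
}

func main() {
	toDel, reasons := collectEvictions(10, 64, []pooledTx{{Nonce: 5}, {Nonce: 10}, {Nonce: 200}})
	for i, tx := range toDel {
		fmt.Printf("nonce=%d reason=%s\n", tx.Nonce, reasons[i])
	}
}
```

Appending to both slices in the same branch keeps the indices aligned, so the later discard step can report `reasons[i]` for `toDel[i]` instead of a single hard-coded reason.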

Tests Added

TestZombieQueuedEviction — 3 sub-tests:
1. Zombie tx (gap=65 > MaxNonceGap=64) is evicted with NonceTooDistant
2. Tx at exactly MaxNonceGap boundary (gap=64) is kept
3. Consecutive txns are never zombie-evicted even when count > MaxNonceGap
TestStalePendingEvictionViaMineNonce — Verifies stale pending tx is evicted when the sender's tx is mined but the sender does NOT appear in the EVM state-diff batch (exact AuRa scenario)

Testing

go build ./txnprovider/txpool/... ✅
go test ./txnprovider/txpool/... -count=1 -timeout 5m ✅ (all tests pass including full fuzz corpus)

References

Initial incomplete fix (closed): [SharovBot] fix(txpool): advance sender nonce from mined txns to evict stale pending txns #19385, [SharovBot] fix(txpool): advance sender nonce from mined txns to evict stale pending txns #19386
Reported by Gnosis Chain validator operator on Erigon 3.3.8 (empty block production)

…ub-pools

Bug #1 (stale pending txs): onSenderStateChange is only called for senders that appear in the EVM state-diff batch. On AuRa/Gnosis Chain, senders whose txns are mined via system transactions never appear in the state-diff, so their pending txns are never evicted.

Fix: in addTxnsOnNewBlock, build a minedSenderMinNonce map from minedTxns (nonce N mined => on-chain nonce >= N+1), add all mined-tx senders to sendersWithChangedState, and floor the kvcache nonce with the mined floor when calling onSenderStateChange.

Bug #2 (zombie queued txs): There was no eviction for queued transactions whose nonce is impossibly far ahead of the sender's on-chain nonce. Accounts on Gnosis Chain were seen with on-chain nonce=281 but queued nonces=16814+ (gap=16533). These txns can never become pending, yet sit in the pool indefinitely.

Fix: add MaxNonceGap config (default 64) to txpoolcfg.Config. In onSenderStateChange, evict txns where tx.Nonce - noGapsNonce > MaxNonceGap with the new NonceTooDistant discard reason. noGapsNonce accounts for consecutive txns already in the pool, so valid long pending queues are not affected.

Also improve reason tracking in onSenderStateChange: use a parallel toDelReasons slice so each evicted tx gets the correct DiscardReason (NonceTooLow vs NonceTooDistant).

Tests added:
- TestZombieQueuedEviction: verifies zombie txns are evicted, boundary tx kept, consecutive txns never zombie-evicted
- TestStalePendingEvictionViaMineNonce: verifies stale pending tx evicted via mined-nonce floor even without EVM state-diff UPSERT event for sender

lystopad · 2026-02-23T15:49:08Z

I tested on hoodi network and confirm better processing.

Before this fix:

[INFO] [02-22|13:18:55.328] [txpool] stat                            pending=42 baseFee=3 queued=1273
[INFO] [02-22|13:21:55.328] [txpool] stat                            pending=34 baseFee=3 queued=1272
[INFO] [02-22|13:24:55.329] [txpool] stat                            pending=39 baseFee=3 queued=1274

With this fix and after 4+ hours running node:

[INFO] [02-23|15:41:41.243] [txpool] stat                            pending=36 baseFee=1 queued=25
[INFO] [02-23|15:44:41.242] [txpool] stat                            pending=39 baseFee=1 queued=25
[INFO] [02-23|15:47:41.243] [txpool] stat                            pending=36 baseFee=1 queued=25

yperbasis

I'm fine with the fix for bug no 2 (MaxNonceGap), but not with the fix for bug 1 (missing nonce updates for Gnosis). A proper fix for it would be to ensure that such nonce changes are present in the state diffs.

More generally, it's better to tackle separate problems in separate PRs. Let's split this one into two for the two bugs and re-do the fix for bug no 1.

#2) Add MaxNonceGap config (default 64) and NonceTooDistant discard reason.

In onSenderStateChange, evict queued transactions whose nonce exceeds the sender's on-chain nonce (accounting for consecutive txns already in pool) by more than MaxNonceGap. These 'zombie' txns can never become pending and cause unbounded queued pool bloat (e.g. nonce 144968 vs on-chain 6398 on Gnosis Chain Erigon 3.3.8).

Also fix toDelReasons to track correct discard reason per evicted txn instead of always logging NonceTooLow.

Extracted from #19393 per @yperbasis review: split separate bugs into separate PRs.

Closes: related to #19393

Giulio2002 · 2026-02-24T09:36:42Z

[SharovBot] 📋 Addressing @yperbasis review — splitting into separate PRs as requested:

Bug #2 (MaxNonceGap zombie eviction) → extracted to #19449. This contains only the MaxNonceGap config, NonceTooDistant discard reason, and the zombie eviction logic in onSenderStateChange which @yperbasis approved.

Bug #1 (AuRa/Gnosis stale pending — proper state-diffs approach) → investigating the proper fix as suggested: ensuring nonce changes from system/AuRa transactions appear in the StateChangeBatch accumulator. This requires changes in the execution layer's state-diff generation path. Will create a separate PR once the proper approach is confirmed.

This PR (#19393) will be superseded by those two separate PRs. Closing in favour of #19449 (Bug #2) and a follow-up PR for Bug #1.

…ap (#19449)

**[SharovBot]**

## Split from #19393 per @yperbasis review

This PR contains **Bug #2 only** (zombie queued transaction eviction), extracted from #19393 which was asked to be split into separate PRs.

## Problem

Queued transactions with an impossibly large nonce gap (e.g. on-chain nonce=281, queued nonce=16,814, a gap of 16,533) sit in the pool forever. They can never become pending without filling thousands of nonce positions first, causing unbounded queued pool bloat (4,000+ txns observed on Gnosis Chain, Erigon 3.3.8).

The existing blob-txn nonce-gap eviction only covered `BlobTxnType`. Regular transactions had no gap limit.

## Fix

- Add `MaxNonceGap uint64` to `txpoolcfg.Config` (default: 64)
- Add `NonceTooDistant` (DiscardReason 37) for observability
- In `onSenderStateChange`, evict txns whose nonce exceeds `noGapsNonce` by more than `MaxNonceGap`
- `noGapsNonce` accounts for consecutive txns already pooled, so consecutive txns are never zombie-evicted
- Fix `toDelReasons` parallel slice to track correct discard reason per evicted tx (was always logging `NonceTooLow`)

## Tests

- `TestZombieQueuedEviction` (3 sub-tests):
  1. Zombie tx (gap=65 > MaxNonceGap=64) is evicted with `NonceTooDistant`
  2. Tx at exactly MaxNonceGap boundary (gap=64) is kept
  3. Consecutive txns beyond MaxNonceGap are never zombie-evicted

## Testing

```
go build ./txnprovider/txpool/... ✅
go test ./txnprovider/txpool/... -run TestZombieQueuedEviction -count=1 ✅
```

## Related

- Bug #1 (stale pending / AuRa nonce) will be addressed separately per @yperbasis feedback
- Backport to release/3.3 will follow once this is merged
- Original combined PR: #19393

…vict stale pending txns (AuRa/Gnosis fix)

On AuRa/Gnosis Chain, system transactions advance sender nonces without the execution layer emitting state-diff UPSERT entries for those senders. This causes onSenderStateChange to never fire for them, leaving their pooled pending transactions stale forever. Block builders then attempt to include these txns, receive 'nonce too low', and produce empty or near-empty blocks.

Fix: add ensureMinedSendersInStateDiff(), called at the top of OnNewBlock before cache.OnNewBlock(). For each mined-tx sender absent from the state-diff batch, it reads the authoritative post-block account state from coreTx and injects a synthetic AccountChange{Action=UPSERT} entry. This causes:
1. kvcache to be updated with the correct nonce for that sender
2. addTxnsOnNewBlock to add the sender to sendersWithChangedState
3. onSenderStateChange to fire and evict any stale pending transactions

The DB read uses coreTx.GetLatest(AccountsDomain), which falls through to the underlying MDBX store after block execution, so the correct post-block nonce is always available. If the DB has no entry (zero-value), the function floors the injected nonce at maxMinedNonce+1, guaranteeing at minimum the mined nonce is reflected.

This is PR1 (Bug #1) of the split requested by @yperbasis on #19393. Bug #2 (MaxNonceGap zombie queued eviction) was addressed in #19449.
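The injection step in this commit message can be sketched roughly as follows. This is a simplified illustration, not the code from the commit: `accountChange`, `ensureMinedSenders`, and the `readNonce` callback (standing in for the `coreTx.GetLatest` read) are hypothetical, and addresses are plain strings.

```go
package main

import (
	"fmt"
	"sort"
)

// accountChange is a hypothetical, simplified state-diff entry.
type accountChange struct {
	Address   string
	Nonce     uint64
	Synthetic bool // true when the entry was injected rather than emitted by execution
}

// ensureMinedSenders appends a synthetic entry for every mined-tx sender that
// is absent from the batch. readNonce stands in for the post-block DB read and
// reports false when the DB has no entry for the address.
func ensureMinedSenders(
	batch []accountChange,
	maxMinedNonce map[string]uint64,
	readNonce func(addr string) (uint64, bool),
) []accountChange {
	present := make(map[string]bool, len(batch))
	for _, c := range batch {
		present[c.Address] = true
	}
	missing := make([]string, 0, len(maxMinedNonce))
	for addr := range maxMinedNonce {
		if !present[addr] {
			missing = append(missing, addr)
		}
	}
	sort.Strings(missing) // deterministic order for the example
	for _, addr := range missing {
		nonce, _ := readNonce(addr) // zero when the DB has no entry
		if floor := maxMinedNonce[addr] + 1; nonce < floor {
			nonce = floor // never report less than the mined nonce implies
		}
		batch = append(batch, accountChange{Address: addr, Nonce: nonce, Synthetic: true})
	}
	return batch
}

func main() {
	batch := []accountChange{{Address: "0xaa", Nonce: 3}}
	mined := map[string]uint64{"0xaa": 2, "0xbb": 7, "0xcc": 1}
	readNonce := func(addr string) (uint64, bool) {
		if addr == "0xbb" {
			return 9, true
		}
		return 0, false
	}
	for _, c := range ensureMinedSenders(batch, mined, readNonce) {
		fmt.Println(c.Address, c.Nonce, c.Synthetic)
	}
}
```

In this run the sender already in the batch is left alone, the sender found in the DB gets its stored nonce, and the sender missing from the DB falls back to the mined floor.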

Giulio2002 · 2026-03-03T17:48:08Z

[SharovBot]

As requested by @yperbasis, this PR has been split into two separate PRs:

Bug #2 (MaxNonceGap zombie queued eviction) → [SharovBot] fix(txpool): evict zombie queued txns exceeding MaxNonceGap #19449 ✅ merged
- Cherry-picked to release/3.3 → [SharovBot] fix(txpool): evict zombie queued txns exceeding MaxNonceGap (cherry-pick #19449 → release/3.3) #19591
Bug #1 (AuRa stale pending txns / state-diff omission) → [SharovBot] fix(txpool): inject mined-tx senders into state diff to evict stale pending txns (AuRa/Gnosis) #19592 🆕 open
- Implements ensureMinedSendersInStateDiff() — injects synthetic AccountChange{UPSERT} entries for mined-tx senders absent from the state-diff batch, so onSenderStateChange fires and evicts stale pending transactions

This PR (#19393) can be closed in favour of the above two.

… senders to evict stale pending txns

On AuRa/Gnosis Chain, block finalization (engine.Finalize) executes system transactions (validator rewards, bridge calls, etc.) that advance sender nonces. These state changes were NOT reaching the txpool state-diff batch because the block-end stateWriter in exec3_serial.go was constructed with a nil accumulator.

Fix: store the block's accumulator on serialExecutor (se.accumulator) and pass it to the block-end stateWriter so that UpdateAccountData → ChangeAccount calls during Finalize/FinalizeAndAssemble emit UPSERT entries into the batch. The txpool then calls onSenderStateChange for those senders and evicts any pending transactions whose nonces are now stale.

Changes:
- execution/stagedsync/exec3_serial.go:
  - Add accumulator field to serialExecutor
  - Set se.accumulator = accumulator alongside StartChange per block
  - Pass se.accumulator to state.NewWriter in the block-end path
- txnprovider/txpool/pool.go:
  - Remove txpool-level ensureMinedSendersInStateDiff workaround
  - Restore original cache.OnNewBlock ordering
- txnprovider/txpool/pool_test.go:
  - Update TestStalePendingEvictionViaMineNonce to test via correct stateChanges from the EL (as now emitted after this fix)

This is the EL-level fix requested by @yperbasis on #19392/#19393.
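Why a nil accumulator silently hides these nonce changes can be shown with a toy model. The `accumulator` and `stateWriter` types below are hypothetical miniatures invented for this example; they only assume, as the commit message implies, that the writer forwards account updates to the accumulator when one is set.

```go
package main

import "fmt"

// accumulator is a hypothetical miniature of the state-change accumulator:
// it records the UPSERT entries that would be sent to the txpool.
type accumulator struct{ upserts map[string]uint64 }

func (a *accumulator) changeAccount(addr string, nonce uint64) { a.upserts[addr] = nonce }

// stateWriter is a hypothetical miniature of the block-end state writer.
type stateWriter struct {
	acc    *accumulator
	nonces map[string]uint64
}

func newWriter(acc *accumulator) *stateWriter {
	return &stateWriter{acc: acc, nonces: map[string]uint64{}}
}

// updateAccountData always writes state, but only notifies the accumulator
// when one was supplied.
func (w *stateWriter) updateAccountData(addr string, nonce uint64) {
	w.nonces[addr] = nonce
	if w.acc != nil {
		w.acc.changeAccount(addr, nonce)
	}
}

func main() {
	// Before the fix: nil accumulator, so the nonce change never reaches the batch.
	before := newWriter(nil)
	before.updateAccountData("0xsystem", 42)

	// After the fix: the block's accumulator is passed through.
	acc := &accumulator{upserts: map[string]uint64{}}
	after := newWriter(acc)
	after.updateAccountData("0xsystem", 42)
	fmt.Println(len(acc.upserts)) // 1
}
```

In both cases the state itself is written correctly, which is why the omission only showed up downstream as stale entries in the txpool.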

Giulio2002 requested review from taratorio and yperbasis as code owners February 21, 2026 20:49

Giulio2002 mentioned this pull request Feb 21, 2026

[SharovBot] fix(txpool): comprehensive stale tx eviction for pending and queued sub-pools #19394

Closed

style: fix gofmt formatting in pool.go

2ae6be4

lystopad requested a review from mh0lt February 21, 2026 21:49

yperbasis requested changes Feb 24, 2026

Giulio2002 mentioned this pull request Feb 24, 2026

[SharovBot] fix(txpool): evict zombie queued txns exceeding MaxNonceGap #19449

Merged

Giulio2002 mentioned this pull request Mar 3, 2026

[SharovBot] fix(txpool): inject mined-tx senders into state diff to evict stale pending txns (AuRa/Gnosis) #19592

Open

yperbasis closed this Mar 4, 2026

Giulio2002 wants to merge 2 commits into main from fix/txpool-eviction-comprehensive
