feat(mempool): mempool-aware nonce lookahead in assignNonce (batch C PR 2)#885
Conversation
…PR 2)
PR 2 of the 3-PR batch C sequence specified in
`docs/specs/audit-sweep-batch-c-nonce.md`. Extends the PR 1 strict-
equality nonce check to account for txs already queued in this
node's mempool. Caller remains commented out — no live behaviour
change.
Why this is needed:
PR 1 enforced `tx.content.nonce === account.nonce + 1`. A sender
submitting N transactions back-to-back reads `account.nonce` once
per submission via getAddressNonce → SDK sets all N txs with
the same nonce. account.nonce does not advance until block
inclusion, so submissions 2..N would have been rejected.
What this PR ships:
- Mempool.countPendingByAddress(address): new static method that
counts mempool rows whose content.from matches the supplied
lowercase hex pubkey via a `LOWER(content->>'from')` JSON query.
Single-writer mempool (this node) is the source of truth; the
case-insensitive comparison defends against stored mixed-case
data even though every callsite already lowercases.
- assignNonce: extend the expected-nonce formula to
`account.nonce + 1 + pendingMempoolCount`. Wraps the count call
in a try/catch so DB errors return false (reject the tx) rather
than propagating — same failure mode as the existing account-
lookup branch. Error log now includes pendingMempoolCount for
triage.
- JSDoc updated to reflect mempool-aware semantics and clarify
that single-node correctness is the goal here. Cross-node
safety net is PR 3's consensus-side `expectedPrior` check.
Out of scope for PR 2:
- PR 3: GCREdit `expectedPrior` field, nonce-edit emission in
HandleNativeOperations, consensus-side rejection in
GCRNonceRoutines, caller uncomment. Requires the single SDK
publish covered in the design doc.
Verification:
- tsc --noEmit clean on both changed files.
- countPendingByAddress is read-only — no DB mutations.
- assignNonce caller remains commented out, so this is dead code
until PR 3 lands.
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
|
Warning Review limit reached
More reviews will be available in 21 minutes and 54 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Greptile SummaryThis PR extends the PR 1 nonce-enforcement infrastructure with a mempool-aware lookahead so back-to-back same-sender submissions are accepted in sequence. The
Confidence Score: 5/5Safe to merge — the assignNonce caller remains commented out so no live validation path is activated by this change. Both new code paths (countPendingByAddress and the expanded assignNonce) are unreachable through any live call site. The reference_block window filter correctly mirrors cleanMempool's cutoff, addressing the stale-entry concern raised in PR 1 review. The single new finding (missing expression index on the JSON sender field) is a pre-launch performance concern that does not affect correctness. Before the PR 3 caller wire-up, a database migration adding a functional index on LOWER((content->>'from')) in the mempooltx table should accompany or precede that PR. Important Files Changed
Sequence DiagramsequenceDiagram
participant Client
participant confirmTransaction
participant assignNonce
participant GCR as GCR DB (account.nonce)
participant Mempool as Mempool DB
Client->>confirmTransaction: POST /transaction (tx)
Note over confirmTransaction: caller commented out — not live yet
confirmTransaction->>assignNonce: assignNonce(tx)
assignNonce->>GCR: "findOne(pubkey=sender)"
GCR-->>assignNonce: "account.nonce = N"
assignNonce->>Mempool: countPendingByAddress(sender)
Mempool-->>assignNonce: "pendingCount = k"
assignNonce->>assignNonce: "expected = N + 1 + k"
alt "tx.content.nonce === expected"
assignNonce-->>confirmTransaction: true
confirmTransaction->>Mempool: addTransaction(tx)
else nonce mismatch
assignNonce-->>confirmTransaction: false (reject)
end
Reviews (2): Last reviewed commit: "fix(audit-sweep): address Greptile revie..." | Re-trigger Greptile |
Two code findings + one doc finding. P1 (stale-mempool inflation) — countPendingByAddress in mempool.ts Previous query had no reference_block filter, so it counted every row matching the sender — including txs that had aged past the referenceBlockRoom window and were waiting for the next cleanMempool sweep. A sender with one or more expired-but-not-yet- cleaned txs would compute an expected nonce higher than what they could supply, so every subsequent legitimate submission would be rejected until cleanup ran. Add `tx.reference_block >= :cutoff` with `cutoff = lastBlock - referenceBlockRoom`, mirroring the exact rule cleanMempool uses to mark a tx stale and isReferenceBlockAllowed enforces on inbound RPC. Count now matches what will actually be executed. P1 (TOCTOU race) — assignNonce in validateTransaction.ts Two concurrent same-sender submissions both read pendingCount=0 before either is admitted to the mempool, compute the same expected nonce, and both pass — duplicate-nonce txs in the mempool simultaneously. The PR description correctly scoped the cross-node version to PR 3, but missed the same-node concurrent path. PR 2 explicitly does NOT close this window in code (the caller is commented out, so it is unreachable); the JSDoc now documents the deferral and PR 3 will wrap the validate + Mempool.addTransaction sequence in a per-sender critical section via pg_advisory_xact_lock(hashtext(sender)), released at commit, so concurrent same-sender submissions serialise through validation in order. Also added a Risks-table row in docs/specs/audit-sweep-batch-c-nonce.md naming the lock approach. P2 (spec doc) — replace #TBD with the actual PR number (#885) Frontmatter PR list now reflects this PR's identity. Risks table also gained the new TOCTOU + stale-mempool rows so the design doc stays consistent with the iter-1 changes. Verification: - tsc --noEmit clean on both changed files. - cleanMempool's cutoff logic re-used directly (Chain.getLastBlockNumber + getSharedState.referenceBlockRoom); imports already present.
…tch C PR 3) [DRAFT - review needed] (#886) * feat(consensus): nonceEnforcement consensus rule + caller wire-up (batch C PR 3 / Path A) Final PR of the 3-PR batch C sequence specified in `docs/specs/audit-sweep-batch-c-nonce.md`. Closes the cross-RPC replay window end-to-end and activates the `nonceEnforcement` fork on devnet (already active at block 0 since PR 1's genesis fixture). Path A — SDK ships type-only, node populates `expectedPrior` at apply time. Chosen over Path B (SDK-side emit) and Path C (SDK fork introspection) for zero breaking-change risk: old SDK clients continue working post-fork without re-publishing. See spec doc §"Emission-site rationale" for the trade-off matrix. Changes: src/libs/blockchain/gcr/gcr_routines/GCRNonceRoutines.ts - When `expectedPrior` is set on a non-rollback nonce edit, reject the apply iff `accountGCR.nonce !== expectedPrior`. - Rollback path skips the check (unwinding, not validating forward progress). - Cross-RPC double-spend safety net: two competing same-nonce txs bundled into the same block from different RPCs both pass per-node validation but only one survives consensus apply. src/libs/blockchain/gcr/handleGCR.ts - In `applyTransaction`'s per-edit loop, populate `expectedPrior` on every `nonce` edit when `nonceEnforcement` is active at the apply block. Read live from `entities.accounts` so a hypothetical second nonce edit in the same tx sees the post-first-apply value. - Block height: `tx.blockNumber ?? lastBlockNumber ?? 0`. - Skip on rollback. Skip if `expectedPrior` is already set (defence against future SDK-side population). src/libs/network/endpointValidation.ts - Symmetric strip of `expectedPrior` on both `gcrEdits` (regen) and `txEditsBlanked` (SDK-shipped) before hash compare. Mirrors the existing `txhash` blanking pattern. - SDK never writes the field today; the strip is defence in depth against future SDK-side emission and any non-conformant clients. src/libs/blockchain/routines/validateTransaction.ts - Uncomment `assignNonce` caller (lines 80-89). Pre-fork it short-circuits to true (PR 1). Post-fork enforces strict sequential semantics with mempool lookahead (PR 2) + the matching consensus-side rejection above. package.json - Bump `@kynesyslabs/demosdk` from 4.0.3 to 4.0.5 — the published SDK that adds the `expectedPrior?: number` field to `GCREditNonce` (type-only, runtime unchanged). docs/specs/audit-sweep-batch-c-nonce.md - Update PRs list (#885 merged, #884 merged, #TBD = this PR). - Add `sdk_publish: 4.0.5` to frontmatter. - Rewrite §Fork activation Post-fork bullet to describe the Path A flow (node populates, not SDK emits). - Add §Emission-site rationale documenting why Path A was chosen over B / C. - Update PR 3 row in the breakdown to list actual files + summarise the SDK pin bump. Verification: - `tsc --noEmit` clean on all six changed files. - SDK 4.0.5 verified locally: `expectedPrior?: number` present in `node_modules/@kynesyslabs/demosdk/build/types/blockchain/GCREdit.d.ts`. - No new e2e test in this commit; will follow up in a separate PR with the devnet two-RPC double-broadcast scenario once the base wires are merged. Cross-RPC double-spend scenario (post-fork): 1. Client signs tx with `content.nonce = N+1`. 2. Broadcasts to RPC-A and RPC-B in parallel. 3. Both RPCs pass per-node validation (mempool lookahead is single-node). 4. Block-formation gathers both → applies in deterministic order. 5. First apply: account.nonce goes N → N+1. `expectedPrior` matched (N). 6. Second apply: `expectedPrior = N` (populated at apply time from pre-first-apply snapshot), but `accountGCR.nonce = N+1`. Mismatch → `GCRNonceRoutines` rejects → whole tx rolls back at consensus. * fix: move expectedPrior populate from apply-time to validation-time (PR #886 self-review) Caught during draft-PR review: the apply-time populate site I shipped in the initial commit was incorrect. Bug: Populating `expectedPrior` inside `HandleGCR.applyTransaction`'s per-edit loop reads `entities.accounts` which already reflects prior in-block applies. For two competing same-nonce txs in the same block: 1. Tx A applies. Edit populates `expectedPrior = N` from live state. Apply check passes (N === N). Account → N+1. 2. Tx B (replay) applies next. Edit populates `expectedPrior = N+1` (live state already advanced). Apply check passes (N+1 === N+1). Account → N+2. Both succeed. This defeats the entire cross-RPC safety net. Fix: Move populate into `assignNonce` itself (validation time), using the already-loaded account row + pending-mempool-count: `expectedPrior = account.nonce + pendingCount` This locks the value to the pre-block snapshot the sender's nonce was supposed to be at, NOT what it becomes at apply time. For the same scenario: 1. Tx A's edit carries `expectedPrior = 0`. Apply: account=0, expectedPrior=0 → match. Account → 1. 2. Tx B's edit also carries `expectedPrior = 0` (validated on a node that didn't yet see Tx A). Apply: account=1, expectedPrior=0 → MISMATCH → reject. Whole tx rolls back. For back-to-back same-sender submissions (k-th tx): `expectedPrior = account.nonce + (k-1)` matching the apply order because PR 2's mempool lookahead ensures each pending tx applies before this one. Hash compare safety: `endpointValidation` snapshots `tx.content.gcr_edits` at line 48 BEFORE `confirmTransaction` runs (which invokes `assignNonce`). My mutation hits the real array but the snapshot is the pre-mutation shape. Both sides of the compare are still stripped of `expectedPrior`, so signature integrity is preserved. The mutated array flows downstream into the apply pipeline carrying the validation-time-locked `expectedPrior`. Files: - src/libs/blockchain/routines/validateTransaction.ts: walk `tx.content.gcr_edits`, set `expectedPrior` on the matching `nonce` edit (sender address compared case-insensitive). - src/libs/blockchain/gcr/handleGCR.ts: revert the apply-time populate code added in the previous commit. Also drop the now- unused `isForkActive` and `getSharedState` imports. - docs/specs/audit-sweep-batch-c-nonce.md: rewrite the fork activation Post-fork bullet to describe the validation-time populate, plus explicit rationale for why apply-time is wrong. Update PR 3 file list to reflect the moved populate site. Verification: - tsc --noEmit clean. - Manual trace of the cross-RPC scenario now correctly rejects the second replay. * fix(audit-sweep): address Greptile + CodeRabbit review on PR #886 (greploop iter 1) Three real findings + three doc/comment fixes. Critical (CodeRabbit) — fork-gate the consensus rule GCRNonceRoutines.apply enforced the `expectedPrior` mismatch check unconditionally whenever the field was present. The spec says pre-fork nodes must ignore it, and the field is stripped from the hash compare in endpointValidation — so a client could attach `expectedPrior` on a pre-fork chain without breaking signature validation, and upgraded nodes would start rejecting edits that older nodes still accepted. Validator-split risk before the fork activates. Fix: wrap the mismatch check in `isForkActive(nonceEnforcement, blockHeight)` using `getSharedState.lastBlockNumber` (same gate source `assignNonce` uses). Pre-fork ignores the field entirely; post-fork enforces. Imports added: `isForkActive`, `getSharedState`. Major (CodeRabbit) / P1 (Greptile) — forge-account comparison bug In `assignNonce`'s populate loop, `edit.account` can be a forge-key object (non-string), but the comparison was `editAccount === senderAddress` where `senderAddress` is a lowercase hex string. Strict equality between an object and a string is always false, so the populate would silently skip for forge-format accounts and the cross-RPC safety net would never engage for those transactions. Fix: mirror the normalization already used in `GCRNonceRoutines`: `typeof edit.account === "string" ? edit.account : forgeToHex(edit.account)`, then `.toLowerCase()`. Now matches the sender regardless of how the SDK ships the account field. Doc / comment cleanup Three locations still carried "node fills it in at apply time" wording, contradicting the validation-time populate site. Updated in: - `GCRNonceRoutines.ts`: comment now describes the fork gate + validation-time source. - `endpointValidation.ts`: comment now points at `assignNonce` as the populate site. - `docs/specs/audit-sweep-batch-c-nonce.md`: Path A header rewritten as "node populates at validation time"; rationale paragraph now explains the apply-time draft was rejected, with the cross-RPC reasoning inline. Doc — PR 3 row scope The scope-table row for PR 3 listed "e2e test" as a deliverable, but the diff ships no test. Removed the "e2e test" item and added an explicit note that cross-RPC e2e is tracked as a follow-up PR. Also reworded "apply-time rejection" to "consensus- side rejection" to match what `GCRNonceRoutines` actually does (rejection at apply, populate at validation). Out of scope for this iter TOCTOU advisory-lock wrapper documented in `assignNonce`'s PR 2 docstring is still not landed; flagged in this review round but deferred to a follow-up PR since wrapping the validate + `Mempool.addTransaction` sequence in `pg_advisory_xact_lock` is its own structural change that affects every native-tx ingress path. Filed as PR 4 in the follow-up tracker. Verification: - `tsc --noEmit` clean. - Pre-fork trace: `expectedPrior` present on edit, fork inactive, `isForkActive` returns false → mismatch check skipped → legacy apply behaviour preserved bit-identically for re-sync. - Post-fork trace: forge-format `edit.account` now hex-coerces and matches the lowercased sender → `expectedPrior` populates correctly → `GCRNonceRoutines` enforces the mismatch reject. --------- Co-authored-by: tcsenpai <tcsenpai@discus.sh>
Summary
PR 2 of the 3-PR batch C nonce-replay-protection sequence specified in
docs/specs/audit-sweep-batch-c-nonce.md. Extends the PR 1 strict-equality check to account for txs already queued in this node's mempool. Caller remains commented out — no live behaviour change.3 files, +83 / -9.
Why
PR 1 enforced:
That formula breaks for back-to-back submissions. A sender submitting N transactions in a row reads
account.nonceonce per call viagetAddressNonce, and the SDK sets every tx withnonce = currentNonce + 1.account.nonceonly advances on block inclusion (PR 3), so submissions 2..N all carry the same nonce and would have been rejected by PR 1.What this PR ships
Mempool.countPendingByAddress(address)New static method on
Mempool. Counts mempool rows whosecontent.frommatches the supplied lowercase hex pubkey via aLOWER(content->>'from')JSON query.Single-writer mempool (this node) is the source of truth. The case-insensitive comparison is defence in depth — every callsite already lowercases, but stored
content.fromvalues could in principle carry mixed case.assignNoncemempool-aware formulaExpected nonce is now:
The k-th of N back-to-back submissions has
pendingMempoolCount === k - 1(the previous k-1 txs are already in mempool). The+ 1is the current tx slot.Wraps the count call in a try/catch so DB errors return
false(reject the tx) rather than propagating — same failure mode as the existing account-lookup branch. The error log now includespendingMempoolCountfor triage.Spec doc
Frontmatter updated to
status: in-progresswith PRs list referencing #884 (merged), this PR, and the upcoming PR 3.Out of scope (subsequent PRs in this sequence)
GCREdit.expectedPriorfield, nonce-edit emission inHandleNativeOperations, consensus-side rejection inGCRNonceRoutines, caller uncomment. Requires the single SDK type publish covered in the design doc.Cross-node correctness note
Single-node correctness only: another node sees its own mempool, which may not contain the same pending set. A user submitting the same tx through two RPCs would briefly pass validation on both. PR 3's consensus-side
expectedPriorcheck is the cross-node safety net — at block-application time, only one of any pair of competing same-nonce txs survives.Verification
tsc --noEmitproduces zero new errors in the two changed files.countPendingByAddressis read-only.assignNoncecaller inconfirmTransactionremains commented out, so this code is reachable only via direct import (currently none insrc/libs/).Test plan
bun installand confirm clean tscwipe_and_reboot.sh); confirm no boot regressionnonce=5, two pending txs in mempool from that sender →Mempool.countPendingByAddress(addr) === 2,assignNonce(tx with nonce=8) === true,assignNonce(tx with nonce=7) === false,assignNonce(tx with nonce=9) === falsecontent.fromrows are still counted (LOWER(...))