Skip to content

fix(audit-sweep): batch B — sync/consensus process.exit, fee-sum assertion, PROD gas-balance guard#883

Open
tcsenpai wants to merge 7 commits into
stabilisationfrom
bugfix/audit-sweep-batch-b-2026-05-29
Open

fix(audit-sweep): batch B — sync/consensus process.exit, fee-sum assertion, PROD gas-balance guard#883
tcsenpai wants to merge 7 commits into
stabilisationfrom
bugfix/audit-sweep-batch-b-2026-05-29

Conversation

@tcsenpai
Copy link
Copy Markdown
Contributor

Summary

Batch B of the 2026-05-28 audit sweep. Picks up three follow-ups deferred from #882 because they touch the consensus path:

  1. process.exit() removal in Sync.ts + secretaryManager.ts — same pattern Epic-14 + batch A fixed elsewhere, applied to the consensus-critical sites the audit report flagged out-of-scope.
  2. applyGasFeeSeparation fee-sum assertion — small but consensus-relevant.
  3. PROD-only gas balance guard — removes a devnet/staging divergence from production validation semantics.

Larger items (assignNonce stub, broadcastBlockHash vote race, handleGCR GCREdit stubs, hashNativeTables proxy, subOperations asset stubs) remain deferred — they need design work or feature-owner input and will land separately.

Depends on / is logically follow-up to #882 (batch A) but does not require it merged first; the diffs are disjoint.

4 files, +55 / -18.

Fixes

src/libs/blockchain/routines/Sync.ts — 3 process.exit removals

  • Line 282 — genesis hash mismatch: now throws instead of exit(1). The outer fastSync() already has a try/finally that resets inSyncLoop / fastSyncAborted, and main().catch routes to gracefulShutdown.
  • Line 305 — exhausted peers: returns false instead of exit(1). The boolean caller path at line 865 already handles this (now via the same throw at line 867 below).
  • Line 867 — last block not coherent: throws with a descriptive message instead of exit(1). Same propagation path as above.
  • verifyLastBlockIntegrity now has an explicit Promise<boolean> return type (was implicit).

src/libs/consensus/v2/types/secretaryManager.ts — 2 real exits removed, 2 chaos-test exits documented

  • Line 682 — receiveGreenLight unexpected greenlight: was exit(1) mid-consensus round. Now logs at error level and returns false (the function already returns boolean on every other path). Prevents block-state loss when a delayed/duplicate greenlight arrives.
  • Line 887 — endConsensusRoutine hanging-greenlight cleanup: was exit(1) on Promise.all(waiters) rejection. Now logs and continues with pre-held cleanup; the catch was already isolating the wait so cleanup can proceed cleanly.
  • Lines 394, 410 — simulateSecretaryGoingOffline / simulateNormalNodeGoingOffline: kept their process.exit(0) because that IS their documented purpose (chaos-test helpers triggered by hardcoded blockRef === 10 / blockRef === 5 checks; currently unused in production). Added a WARNING block in the docstring explaining the deliberate bypass of graceful shutdown so a future reader does not wire them into production routines without an explicit gating flag.

src/libs/blockchain/routines/applyGasFeeSeparation.ts — fee-sum assertion

After calculateFeeBreakdown(tx) and the existing total integrity check, assert that:

network_fee + rpc_fee + additional_fee === total

Any rounding bug or config drift in calculateFeeBreakdown is now caught at the RPC boundary as a validation failure (with a descriptive message naming every component) instead of surfacing later as a validator-side consensus disagreement on the same tx.

src/libs/blockchain/routines/validateTransaction.ts:304 — drop PROD-only gas-balance guard

Was:

// FIXME Overriding for testing
if (fromBalance < BigInt(compositeFeeAmount) && getSharedState.PROD) {

The && getSharedState.PROD gate let non-prod nodes accept zero-balance transactions, which made devnet/staging diverge from PROD validation semantics. Devnet now uses a funded-genesis fixture (see testing/devnet/genesis.devnet.json on the audit-sweep batch A branch), so unfunded broadcasts are no longer needed for local testing.

Out of scope (deferred)

Same list as #882; nothing here changes the deferred set. Listing for navigation:

  • validateTransaction.ts:358assignNonce() hardcoded true (replay attack vector — needs per-account nonce design)
  • broadcastBlockHash.ts:40-155 — pro/con vote race at block-accept time
  • handleGCR.ts:920,926 — GCREdit assign/smartContract/escrow/subnetsTx stubs returning success
  • createBlock.ts:72-77hashNativeTables() proxy unfinished
  • subOperations.ts:297-305addAsset / removeAsset silent stubs

Verification

  • tsc --noEmit produces zero new errors in the four changed files. (Pre-existing baseline errors in unrelated files — chainBlocks.ts duplicate imports, subOperations.ts missing setGCRNativeBalance, etc — are on stabilisation already and are addressed by other branches including fix(audit-sweep): batch A — trivial bug-hunt fixes (process.exit, error leaks, missing try/catch) #882.)
  • No behavioural regression intended: every removed exit is replaced with the same logical outcome the surrounding code was already prepared to handle (boolean return / thrown error propagating to gracefulShutdown).
  • Chaos-test helpers are unchanged in behaviour, only re-documented.

Test plan

  • bun install; confirm no new tsc errors vs base
  • Run existing unit tests
  • Boot the 4-node devnet (testing/devnet) and confirm block production survives a forced peer mismatch at genesis (should log + retry next peer; previously would exit)
  • Submit a zero-balance tx on a non-PROD node and confirm it is now rejected (previously accepted)
  • Submit a tx with valid fee components and confirm the new sum assertion does not produce false rejections

@qodo-code-review
Copy link
Copy Markdown
Contributor

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 29, 2026

Warning

Review limit reached

@tcsenpai, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 2 minutes and 6 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 83aee574-6ccc-4e93-92d9-eb8dad959e50

📥 Commits

Reviewing files that changed from the base of the PR and between eaf4481 and 369e5a3.

📒 Files selected for processing (5)
  • src/libs/blockchain/routines/Sync.ts
  • src/libs/blockchain/routines/applyGasFeeSeparation.ts
  • src/libs/blockchain/routines/validateTransaction.ts
  • src/libs/consensus/v2/types/secretaryManager.ts
  • src/libs/network/manageConsensusRoutines.ts
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch bugfix/audit-sweep-batch-b-2026-05-29

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented May 29, 2026

Greptile Summary

Batch B of the audit sweep eliminates the remaining process.exit() calls on consensus-critical paths and closes two validation gaps flagged in the earlier round. All previously raised concerns — the missing await on receiveGreenLight, the genesis-hash mismatch throwing instead of retrying the next peer, and the stale PROD-only guard in applyGasFeeSeparation — are correctly addressed in this diff.

  • Sync.ts: Genesis-hash mismatch now logs and continues to the next peer via findNextAvailablePeer + continue; exhausted-peers path returns false; fastSyncRoutine throws a descriptive error when integrity fails after all peers are tried.
  • secretaryManager.ts / manageConsensusRoutines.ts: receiveGreenLight's process.exit(1) is replaced with return false, and the call site in manageConsensusRoutines.ts now correctly awaits the result so the 400-class response is visible to the HTTP layer. The sendGreenlights loop continues past a misbehaving validator instead of exiting; endConsensusRoutine logs and continues on Promise.all rejection. Chaos-test helpers are documented with explicit WARNING blocks.
  • applyGasFeeSeparation.ts / validateTransaction.ts: && getSharedState.PROD balance-check gate removed from both routines; per-component fee validation replaces the previous total-only integrity check.

Confidence Score: 5/5

Safe to merge; all four changed paths behave correctly under their intended failure conditions.

The three issues raised in the prior review round are all corrected in this diff. The only remaining note is that the fee-sum equality check described in the PR description is not literally present in the code, but the code comment explains the reasoning and the independent per-component checks cover the realistic failure modes.

No files require special attention; applyGasFeeSeparation.ts has a minor discrepancy between the PR description's sum-assertion claim and the actual implementation, but the existing per-component validation is adequate for the current calculateFeeBreakdown contract.

Important Files Changed

Filename Overview
src/libs/blockchain/routines/Sync.ts All three process.exit removals are well-formed: genesis hash mismatch now logs + continues to the next peer, exhausted-peers path returns false, and the final integrity-check failure in fastSyncRoutine throws with a descriptive message.
src/libs/blockchain/routines/applyGasFeeSeparation.ts Per-component fee validation replaces the total-only check; PROD-only balance-check gate correctly removed. The PR description claims a sum-equality assertion but the code omits it, relying on the tautology argument in the code comment.
src/libs/blockchain/routines/validateTransaction.ts PROD-only gate removed from defineGas balance check; now enforced in every environment, consistent with the applyGasFeeSeparation change.
src/libs/consensus/v2/types/secretaryManager.ts receiveGreenLight's process.exit(1) replaced with return false; sendGreenlights loop replaced exit with continue; endConsensusRoutine's exit on Promise.all rejection replaced with log+continue. Chaos-test exits documented with WARNING blocks.
src/libs/network/manageConsensusRoutines.ts Missing await on receiveGreenLight correctly added; previously the Promise was always truthy so response.result was unconditionally 200, masking the new return false path.

Reviews (8): Last reviewed commit: "docs(audit-sweep): correct retry-guarant..." | Re-trigger Greptile

…rtion, PROD gas-balance guard

Three audit-sweep follow-ups from the 2026-05-28 bug-hunt that were
deferred from batch A because they touch the consensus path. All are
small, targeted fixes; the larger design-required items (assignNonce
stub, vote race, GCREdit stubs) remain deferred.

Sync.ts:
  - Genesis-hash mismatch (line 282) now throws instead of exit(1); the
    outer fastSync() has try/finally that resets state, and main().catch
    routes to gracefulShutdown.
  - Exhausted-peers branch (line 305) returns false so the existing
    boolean caller path applies, instead of taking the chain down.
  - fastSyncRoutine "Last block not coherent" check (line 867) now
    throws with a descriptive message instead of exit(1).
  - verifyLastBlockIntegrity now has an explicit Promise<boolean>
    return type (was implicit).

secretaryManager.ts:
  - receiveGreenLight unexpected-greenlight path (line 682) now logs
    at error level and returns false (function already returns boolean
    everywhere else), instead of killing the node mid-consensus round.
  - endConsensusRoutine hanging-greenlight cleanup (line 887) now logs
    the failure and continues with pre-held cleanup instead of exit(1);
    the catch was already isolating the wait so cleanup can proceed.
  - simulateSecretaryGoingOffline / simulateNormalNodeGoingOffline
    keep their process.exit(0) because that IS their documented
    purpose (chaos-test helpers), but now carry a WARNING block
    explaining the deliberate bypass of graceful shutdown so a future
    reader does not wire them into production without a gating flag.

applyGasFeeSeparation.ts:
  - Add post-calculation assertion that
    network_fee + rpc_fee + additional_fee === total. Any rounding bug
    or config drift in calculateFeeBreakdown is now caught at the RPC
    boundary as a validation failure instead of surfacing later as a
    validator-side consensus disagreement.

validateTransaction.ts:
  - Drop the `&& getSharedState.PROD` guard from the sender gas-balance
    check (line 304). The previous gate let non-prod nodes accept
    zero-balance transactions, which made devnet/staging diverge from
    PROD validation semantics. Devnet now uses a funded-genesis
    fixture, so unfunded broadcasts are no longer needed for local
    testing.
@tcsenpai tcsenpai force-pushed the bugfix/audit-sweep-batch-b-2026-05-29 branch from 27e43d3 to 153ca61 Compare May 29, 2026 13:17
…on (greploop iter 1)

Greptile review of PR #883 flagged a validation split: this PR dropped
the `&& getSharedState.PROD` guard from `validateTransaction.defineGas`
so the gas-balance check is enforced in every environment, but the
parallel guard in `applyGasFeeSeparation` at line 146 was left intact
with a stale comment that explicitly claimed it matched the legacy
defineGas behaviour — which is no longer true after the defineGas
change.

Drops the `if (getSharedState.PROD)` wrapper around the balance check
in applyGasFeeSeparation as well, so both gas-balance paths enforce
the same semantics in every environment. Comment updated to match.

`getSharedState` import retained — still consumed by
`signingAlgorithm` lookup on line 136.
Comment thread src/libs/blockchain/routines/Sync.ts Outdated
tcsenpai added 5 commits May 29, 2026 15:29
…ROD docstring (greploop iter 2)

Greptile review of PR #883 caught a real correctness regression in iter 1
plus a stale comment:

P1 — Sync.ts:278-285 (genesis-mismatch was breaking the peer-iteration loop)
  verifyLastBlockIntegrity is designed as a multi-peer iteration loop:
  every other failure branch (unavailable genesis block, unavailable
  last block) logs + advances `currentPeer = findNextAvailablePeer(
  seenPeers)` and `continue`s. The previous batch-B fix replaced
  `process.exit(1)` with `throw`, which abandoned all remaining
  candidates after a single bad peer. In a network where one peer is
  misconfigured or forked but the rest are healthy, the node would
  trigger graceful shutdown instead of syncing from a good peer.

  Fix: align with the unavailable-block branches — log the mismatch,
  advance to the next peer via the same helper. If every peer has a
  mismatched genesis, the loop exhausts and falls through to the
  existing `return false` at the bottom of the function, which
  triggers the descriptive throw at the fastSyncRoutine call site
  added in the original batch B.

Stale docstring — applyGasFeeSeparation.ts (file header, step 3)
  The "(PROD only) Reads the sender's GCR balance ..." line in the
  file-level JSDoc was already updated for the runtime change in the
  iter-1 commit but the header docstring still claimed PROD-only.
  Refreshed to describe the new "enforced in every environment"
  semantic and reference batch B as the change point.
…ponent validation (greploop iter 3)

Greptile review of PR #883 caught that the fee-sum assertion added in
the original batch B commit was tautological. calculateFeeBreakdown
computes:

    total: network_fee + rpc_fee + additional_fee

from the same three local variables it returns as components. Asserting
that components-sum === total recomputes the same arithmetic from the
same returned values and can never fail — it offers zero protection
against the failure modes the comment claimed it would catch
("rounding bug or config drift").

The actual failure surface is each component going non-numeric or
out-of-range via the `scalar * surge` multiplication in
calculateFeeBreakdown:

    const network_fee = getSharedState.networkFee * surge
    const rpc_fee = getSharedState.rpcFee * surge
    const additional_fee = getSharedState.additionalFee * surge

A misconfigured scalar (negative governance proposal, accidental float
coefficient, NaN from a broken dynamicSurgeMultiplier) produces one or
more bad components which propagate into tx.content.transaction_fee
and the fee-distribution edits, and finally surface as a validator-side
consensus disagreement.

Replace the tautology with per-component finite/integer/non-negative
validation across all four fields (network_fee, rpc_fee, additional_fee,
total). The first invalid component triggers an actionable error
message naming the bad field and including the full breakdown for
diagnosis, instead of the tx being silently accepted at the RPC
boundary and rejected later by other validators.
…greploop iter 4)

Greptile review of PR #883 caught a third process.exit(1) in this same
file that the original batch B commit missed: line 618 inside
releaseWaitingMembers's send loop (sendGreenlightToValidators).

A single misbehaving validator returning any non-200/400 HTTP status
would kill the secretary mid-consensus round with partial state. Same
class of issue this PR was scoped to fix.

Drop the process.exit and `continue` to the next validator in the
result-iteration loop. The validator stays in `waitingMembers` for the
next greenlight pass, so secretaryRoutine retries via its own
timeout-driven loop. If every validator fails, the round eventually
times out at the existing `_greenlight_timeout` boundary instead of
taking the node down with partial round state.

After this commit secretaryManager.ts has zero remaining real
process.exit calls — only the two chaos-test helpers
(simulateSecretaryGoingOffline, simulateNormalNodeGoingOffline) which
are documented with WARNING blocks explaining the intentional bypass
of graceful shutdown.
…iter 5)

Greptile review of PR #883 caught a pre-existing async-call-site bug
that the batch B receiveGreenLight change exposed.

manageConsensusRoutines.ts:377 calls `manager.receiveGreenLight(...)`
without `await`. The return value is a Promise<boolean>, which is
always truthy, so the response.result on the next line was always 200
regardless of receiveGreenLight's actual return value. Before this
batch, that did not matter much because the unreachable-state path in
receiveGreenLight was a process.exit(1) that killed the node before
the response was sent. The iter-1 batch B commit converted that exit
to `return false`, which the unawaited call site silently masked as a
successful 200 — the very behaviour the conversion was meant to
expose.

Add the missing `await`. The enclosing manageConsensusRoutines
function is already async, so this is a single-keyword change with no
caller-chain implications.

After this commit the receiveGreenLight `return false` path correctly
surfaces as an HTTP 400 to the secretary, matching the original
intent of the conversion.
…t-error comment (greploop iter 6 - comment only)

Greptile review of PR #883 flagged the comment added in the iter-4
commit overstating recovery semantics. The original comment claimed
the validator "stays in waitingMembers for the next greenlight pass"
and that secretaryRoutine would "retry via its own timeout-driven
loop" — but waitStatus has already been cleared by the time the
releaseWaitingMembers result-iteration loop runs, so the validator
does NOT automatically re-enter the wait queue. The actual fallback
is the `_greenlight_timeout` boundary firing, exactly as the second
half of the original comment correctly described.

Comment-only fix. No code behaviour change.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant