fix(consensus): close vote-race in broadcastBlockHash, restore [pro, con] BFT semantics (batch D) [DRAFT - consensus] by tcsenpai · Pull Request #888 · kynesyslabs/node

tcsenpai · 2026-05-31T14:41:55Z

Plain-English summary

What your code does (the consensus voting handshake)

When a node thinks it has a candidate block ready, it asks every other node in the shard: "do you agree the block's hash should be X?" by calling each peer's proposeBlockHash endpoint.

Each peer answers with either:

200 = yes + a signature attesting to that hash, plus any other signatures it's already collected from other peers
401/404 = no (you're not in the shard / I haven't formed a candidate yet)

The sender tallies the yes/no votes. If more than 2/3 voted yes → the block is valid → it gets finalised onto the chain.

What was broken (three things at once, all amplifying each other)

Half-counted votes — the tally code (pro++/con++) ran inside a .then(...) callback that wasn't being waited for. Outer code returned before the votes finished counting. Result: routine sometimes returned pro=0 even though every peer agreed.
Wrong number returned — instead of returning the vote count, the routine returned the signature count — a totally different number. Signatures pile up from multiple sources (peer's own attestation, signatures the peer relayed from other peers, signatures other peers were concurrently writing into the same shared object because everyone calls everyone). The BFT "2/3 majority" check was comparing this inflated number against the threshold → blocks were getting approved on fewer real votes than required.
Shared mutable map race — because all five shard members call each other in parallel, they were all writing signatures into the same JavaScript object at the same time. JS is single-threaded so individual writes don't tear, but the read-verify-write pattern interleaves badly across paths.

What the fix does

Replace fire-and-forget callbacks with awaited promises. Vote tally now finishes before the function returns.
Each per-peer call returns a small typed object: {vote: "pro" | "con", signaturesToMerge: ...}. All outcomes are collected first, then a single serial loop counts votes and merges verified signatures. No more racing writes.
Return [pro, con] (real vote counts), not signature count. BFT check now compares apples to apples.
Network failures count as con votes with an explicit reason, not silent aborts.

Why this matters

The combination of bug 2 + bug 3 was the dangerous one: blocks could pass BFT without enough validators actually agreeing. With the fix, the math under isBlockValid finally matches the protocol's intent (2/3 + 1 of shard size).

Technical detail

Closes audit-sweep CRITICAL #2 from the 2026-05-28 bug hunt: `broadcastBlockHash.ts` vote race.

Three interacting bugs in the pre-existing implementation

1. Fire-and-forget `.then` chains

`peer.longCall(...).then(async response => { ... pro++; signatures[...] = ... })` — the `.then` is discarded. Outer `await Promise.all(promises)` waits for the original longCall promises only. `pro/con` increments and signature writes complete AFTER the routine returns. Caller (`voteOnBlock`) consumes a half-populated tally.

2. Return value contradicts BFT semantics

Last line returned `[signatureCount, shard.length - signatureCount]` instead of `[pro, con]`. Signature count includes:

our own pre-existing signature
signatures peer-merged via `response.extra`
signatures relayed by other shard peers concurrently calling our `manageProposeBlockHash` handler (because `block` IS `getSharedState.candidateBlock` — same object reference)

BFT `isBlockValid(pro, totalVotes)` was being passed signature count vs threshold computed from `shard.length`. Blocks could pass BFT on fewer actual peer-agreement votes than 2/3+1 requires. Consensus safety violation.

3. Concurrent writes to shared signatures map

`block.validation_data.signatures` IS `getSharedState.candidateBlock.validation_data.signatures` (same object — verified at `createBlock.ts:67`). Outbound `.then` callbacks AND inbound `manageProposeBlockHash` handler invocations race to write into it.

Fix shape (Path C — full rewrite with explicit aggregation)

`proposeAndCollect` per-peer flow — try/catch around `longCall` so network failures become `con` votes with reason, not aborts. Matches the `Promise.allSettled` discipline batch A applied elsewhere.
`verifyIncomingSignatures` parallel sig verify, returns verified subset.
Outer routine — `await Promise.all(shard.map(...))`. Aggregates outcomes before counting. Single deterministic serial merge loop.
Return — `[pro, con]` actual vote counts.
Allowed codes — added 404 (candidate not formed) alongside 401, was being treated as retry-worthy before.

Why not Bun workers (asked during implementation)

Race wasn't CPU-bound. Workers fix CPU, not promise misuse. `TxValidatorPool` already uses `worker_threads` for crypto.

Manual verification traces

Race scenario (4 peers, all agree):

Old: `Promise.all(promises)` resolves on HTTP completion. `pro=0`, returns `[0, shard.length]`. Block rejected.
New: `Promise.all(map(proposeAndCollect))` waits on full per-peer flow. Returns `[4, 0]`. Block passes.

Signature-inflation scenario (5-node shard, A relays C+D, B relays D+E):

Old: signatureCount = us+A+B+C+D+E = 6. Returns `[6, -1]`. Threshold = floor(10/3)+1 = 4. Passes on inflated count.
New: `pro=2` (A, B directly voted); C, D, E never directly called us. Returns `[2, 3]`. Correctly rejected. Relayed signatures still merged into map for downstream finalisation.

Specifically wanted feedback on

Whether ANY caller of `broadcastBlockHash` was implicitly relying on the inflated signature-count return value.
Whether `isBlockValid`'s 2/3+1 threshold was tuned with the inflated count in mind (it should be — but want explicit confirmation).
Whether the 404 allowed-codes addition is safe (was a 404 "candidate block not formed" previously triggering 3× `longCall` retry by accident, masking some other behaviour?).

Out of scope

Cross-RPC vote-relay (TODO comment carried over).
e2e test — devnet harness exists, scripted test follow-up.

Test plan

`bun install`, confirm clean tsc.
Boot devnet (5 nodes). Confirm consensus rounds still complete.
Inject chaos: stop one node mid-round. Confirm `pro=3, con=1` shape (down from 4 healthy).
Confirm BFT correctly rejects when 3/5 vote con (threshold = 4, so `pro=2` should fail).

…con] BFT semantics (batch D) Closes audit-sweep CRITICAL #2 (broadcastBlockHash.ts vote race) from the 2026-05-28 bug hunt. The pre-existing implementation had three distinct, interacting consensus-safety defects. Bug 1 — fire-and-forget .then chains Each `peer.longCall` was wrapped in `promise.then(async response => { ... pro++; con++; signatures[...] = ...; })`. The `.then` chain was discarded; only the original `longCall` promises were awaited via `Promise.all(promises)`. That means `pro/con` increments and signature writes ran AFTER the routine returned. The caller (`voteOnBlock`) then consumed a half-populated vote tally and a half-merged signatures map. Bug 2 — return value contradicts BFT semantics Last line returned `[signatureCount, shard.length - signatureCount]` rather than `[pro, con]`. Vote count and signature count are not the same metric. Signature count includes: - our own pre-existing signature - signatures peer-merged via the response.extra payload - signatures relayed by other shard members concurrently calling our `manageProposeBlockHash` handler (because `block` IS `getSharedState.candidateBlock` — same object reference) BFT threshold (2/3 + 1 of shard.length) was being compared against this inflated number. Blocks could pass `isBlockValid` on fewer actual peer-agreement votes than BFT requires. Safety violation. Bug 3 — concurrent writes to shared signatures map `block.validation_data.signatures` is the same object as `getSharedState.candidateBlock.validation_data.signatures`. Outbound `.then` callbacks and inbound `manageProposeBlockHash` handler invocations from every shard peer race to write into it. Individual `obj[key] = value` is atomic in JS, but the read-verify-async-write pattern (incoming sig → await verify → write) interleaves across multiple paths. Fix shape (Path C — full rewrite with explicit aggregation) 1. Per-peer flow extracted to `proposeAndCollect`. Wraps `peer.longCall` in try/catch so a network failure becomes a `con` vote with rejection reason, not a routine-level abort. Implements the same `Promise.allSettled` discipline batch A applied to `broadcastNewBlock`/`broadcastOurSyncData`. 2. `verifyIncomingSignatures` helper runs all per-peer signature verifications in parallel via `Promise.all`, returns the verified subset as `Record<string, string>`. Drops invalid signatures with logging; never throws. 3. Outer routine: `await Promise.all(shard.map(...))` aggregates every peer outcome before counting anything. Single deterministic serial-merge loop over the outcomes increments `pro/con` and writes verified signatures into `block.validation_data.signatures`. No `.then` chains. 4. Return value: `[pro, con]` — actual vote counts. Matches `isBlockValid(pro, totalVotes)` 2/3+1 BFT threshold semantics in `PoRBFT.ts:642-645` and the caller signature in `voteOnBlock` (`PoRBFT.ts:614-617`). 5. Allowed-codes list extended to include 404 (candidate block not formed) in addition to 401 (validator not in shard). Both are valid "con" responses the handler can return; previously 404 was being treated as a retryable failure in `longCall`, producing 3× retries before eventually counting as con. Why not Bun workers Race wasn't CPU-bound; it was async-completion-ordering. Workers fix CPU bottlenecks, not promise misuse. Awaiting the chain fixes the race; threads wouldn't. TxValidatorPool already uses worker_threads for crypto. Verification - tsc --noEmit clean. - Manual trace of the original race: 1. Shard A,B,C,D,E. We call A, B, C, D in parallel. 2. Old code: outer Promise.all(promises) resolves when all four longCall HTTP responses come back. pro/con still 0; signatures still {}. Routine returns [0, shard.length] (signatureCount=0). isBlockValid fails. Block rejected even though all peers agreed. 3. New code: outer Promise.all(map) waits for proposeAndCollect (longCall + verify + classify) to complete on EVERY peer. Outcomes aggregated. Merge loop runs. Routine returns [pro=4, con=0]. isBlockValid passes correctly. - Manual trace of the signature inflation: 1. Shard A,B,C,D,E. We propose. A relays C's and D's signatures. B relays D's and E's signatures. 2. Old code: signatureCount = our + A + C + D + B + E = 6. But shard.length = 5, so [6, -1] gets returned. isBlockValid compares pro=6 against threshold (10/3 + 1) = 4. Passes. Reality: only A and B actually voted pro. C, D, E were relayed in. 3. New code: pro = 2 (A and B), con = 3 (C, D, E never directly called us). Threshold = 4. Correctly rejects. The relayed signatures from A and B still get merged into our signatures map for downstream block-finalisation use, but they don't inflate our vote count. Out of scope for this PR - Cross-RPC vote-relay (the TODO at the end of the previous implementation): peers transmitting their inbound vote counts to other peers to handle network partitions. Tracked separately. - e2e test that exercises the race scenarios. The traces above are walkable on devnet via the existing `testing/devnet/scripts/test-transfer-e2e.sh` harness with a second-validator chaos injection.

coderabbitai · 2026-05-31T14:42:01Z

Warning

Review limit reached

@tcsenpai, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 45 minutes. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ade400fe-e89a-4af4-a28f-6f05efcdf01d

📥 Commits

Reviewing files that changed from the base of the PR and between 8decf73 and 4cd4654.

📒 Files selected for processing (1)

src/libs/consensus/v2/routines/broadcastBlockHash.ts

Walkthrough

The broadcast block hash consensus routine is refactored to eliminate post-return signature mutations. Per-peer calls now execute in parallel via Promise.all, with network failures and non-200 responses converted to deterministic "con" vote outcomes. Signature verification occurs synchronously before merging, and all signatures are merged serially into the block before the function returns.

Changes

Consensus Broadcast Refactoring

Layer / File(s)	Summary
Type contract and signature verification `src/libs/consensus/v2/routines/broadcastBlockHash.ts` (lines 2–72)	Adds `RPCResponse` import, defines `PeerVoteOutcome` type to classify peer responses, and implements `verifyIncomingSignatures` helper that validates each relayed signature against the candidate block hash, dropping invalid ones with error logging and returning only verified `{identity: signature}` pairs.
Per-peer proposal and collection `src/libs/consensus/v2/routines/broadcastBlockHash.ts` (lines 74–201)	Introduces `proposeAndCollect` helper that wraps the consensus_routine call, uses try/catch to convert network errors into `"con"` votes with rejection reason, handles non-200 responses with optional diagnostic tx-set diff logging, and on success invokes signature verification to produce verified signature subsets for merging.
Orchestration, signature merge, and return `src/libs/consensus/v2/routines/broadcastBlockHash.ts` (lines 203–291)	Refactors `broadcastBlockHash` to execute per-peer outcomes via `Promise.all`, counts `pro`/`con` votes from outcomes, performs deterministic serial merge of verified signatures from each `"pro"` outcome into `block.validation_data.signatures`, ensures all merges complete before returning, and returns actual `[pro, con]` peer vote counts instead of signature-derived values.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 A rabbit hops through consensus threads,
Where race conditions filled the dreads.
Now promises align in neat arrays,
Signatures merge before the routine strays.
No ghosts haunt the block when return's displayed! 🎉

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately reflects the main change: fixing a vote-race bug in broadcastBlockHash and restoring proper BFT vote semantics ([pro, con] counts), which aligns with the substantial rewrite eliminating race-prone patterns and async ordering issues.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch bugfix/audit-sweep-batch-d-vote-race-2026-05-31

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

tcsenpai · 2026-05-31T14:48:21Z

@greptile review

qodo-code-review · 2026-05-31T14:59:19Z

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

coderabbitai

🧹 Nitpick comments (1)

src/libs/consensus/v2/routines/broadcastBlockHash.ts (1)

260-263: 💤 Low value

Comment mentions Object.assign but code uses direct property assignment.

The comment describes "each Object.assign operation" but the actual merge at line 269 uses direct property assignment. Consider updating the comment to match the implementation.

📝 Suggested comment update

-    // Serial merge: each `Object.assign` operation against the
-    // shared `signatures` map is atomic, but we want a single
-    // deterministic order so log lines reflect what landed and
-    // operators can replay the merge if needed.
+    // Serial merge: we iterate outcomes in deterministic order so
+    // log lines reflect what landed and operators can replay the
+    // merge if needed. All writes complete before the function
+    // returns, eliminating the post-return mutation race.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/libs/consensus/v2/routines/broadcastBlockHash.ts` around lines 260 - 263,
The comment referencing "each `Object.assign` operation" is inaccurate because
the merge uses direct property assignment into the shared `signatures` map;
update the comment near the merge in broadcastBlockHash (the block merge that
writes into `signatures`) to describe the actual implementation (e.g., "each
property is assigned directly into the shared `signatures` map in a
deterministic, serial order") or alternatively change the merge code to use
`Object.assign(signatures, ...)` if you prefer the comment to stay as-is; ensure
the comment and the implementation (the merge into `signatures` inside
broadcastBlockHash) match.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/libs/consensus/v2/routines/broadcastBlockHash.ts`:
- Around line 260-263: The comment referencing "each `Object.assign` operation"
is inaccurate because the merge uses direct property assignment into the shared
`signatures` map; update the comment near the merge in broadcastBlockHash (the
block merge that writes into `signatures`) to describe the actual implementation
(e.g., "each property is assigned directly into the shared `signatures` map in a
deterministic, serial order") or alternatively change the merge code to use
`Object.assign(signatures, ...)` if you prefer the comment to stay as-is; ensure
the comment and the implementation (the merge into `signatures` inside
broadcastBlockHash) match.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 613d21a8-961e-44ad-8f98-398f647b3d34

📥 Commits

Reviewing files that changed from the base of the PR and between 924df40 and 8decf73.

📒 Files selected for processing (1)

src/libs/consensus/v2/routines/broadcastBlockHash.ts

greptile-apps · 2026-05-31T15:00:56Z

Greptile Summary

This PR rewrites broadcastBlockHash.ts to fix three interacting consensus bugs: fire-and-forget .then callbacks that left vote tallies empty when the caller consumed them, returning signatureCount instead of actual [pro, con] vote counts to the BFT check, and concurrent .then writes racing against inbound manageProposeBlockHash calls on the shared validation_data.signatures object.

Vote-race fix: per-peer calls are wrapped in proposeAndCollect, which always resolves (never rejects), so Promise.all collects every outcome before the serial merge loop increments pro/con and writes signatures.
BFT semantics restored: the function now returns [pro, con] (actual vote counts), not [signatureCount, shard.length - signatureCount], so isBlockValid(pro, shard.length) finally compares apples to apples against the 2/3+1 threshold.
Signature validation tightened: a 200 response now only counts as pro if the responder's own cryptographic signature is present in the returned bundle and passes TxValidatorPool.verify; responses that lack a verifiable own-signature are demoted to con with an explicit rejection reason.

Confidence Score: 4/5

The three correctness bugs are properly fixed and the new BFT vote-counting logic is consistent with isBlockValid's 2/3+1 threshold; the main open question is whether relayed-but-lost signatures from the own-sig-missing con branch could prevent finalization under network partition.

The core fixes — awaited vote tallying, returning [pro, con] instead of [signatureCount, ...], and serial post-collect merge — are all implemented correctly. The remaining edge case (verified third-party signatures silently discarded on the own-sig-missing con path) could affect finalization liveness under network partition, worth addressing before production.

src/libs/consensus/v2/routines/broadcastBlockHash.ts — specifically the con return path at lines 228-238 where verified is populated but not forwarded to the merge loop

Important Files Changed

Filename	Overview
src/libs/consensus/v2/routines/broadcastBlockHash.ts	Full rewrite fixing fire-and-forget vote tallying, wrong BFT return value, and shared-map race; new per-peer validation logic is correct; one edge case where third-party verified signatures on the con-due-to-missing-own-sig path are silently dropped could affect liveness in rare scenarios.

Sequence Diagram

sequenceDiagram
    participant BH as broadcastBlockHash
    participant PAC as proposeAndCollect xN
    participant Peer as Remote Peer
    participant VIS as verifyIncomingSignatures
    participant Merge as Serial Merge Loop
    BH->>BH: structuredClone validation_data snapshot
    BH->>PAC: Promise.all shard.map proposeAndCollect
    par Per peer
        PAC->>Peer: longCall proposeBlockHash
        alt Network error
            PAC-->>BH: vote con
        else HTTP 401 or 404
            PAC-->>BH: vote con
        else HTTP 200 empty sigs
            PAC-->>BH: vote con
        else HTTP 200 with sigs
            PAC->>VIS: verifyIncomingSignatures
            VIS-->>PAC: verified subset
            alt own sig missing
                PAC-->>BH: vote con
            else own sig verified
                PAC-->>BH: vote pro signaturesToMerge
            end
        end
    end
    BH->>Merge: serial loop outcomes
    Merge->>Merge: pro or con increment merge sigs
    BH-->>BH: return pro con

_{Reviews (6): Last reviewed commit: "fix(consensus): require verified self-si..." | Re-trigger Greptile}

greptile-apps · 2026-05-31T15:08:25Z

Greptile Summary

This PR rewrites broadcastBlockHash.ts to fix three interlocking consensus bugs: fire-and-forget .then callbacks that let the vote tally return before peer responses were counted, an incorrect return value (signatureCount instead of [pro, con] vote counts) that made the BFT threshold comparison meaningless, and concurrent writes to the shared validation_data.signatures map. The fix introduces explicit await-ed Promise.all over per-peer proposeAndCollect flows, a serial post-collection merge loop, and restores correct [pro, con] semantics throughout.

Vote-race fix: each peer call is now fully await-ed end-to-end inside proposeAndCollect, so pro/con counts are complete before the function returns.
BFT semantics restored: the return value is now actual vote counts, not an inflated signature count derived from the shared candidateBlock object.
Signature merge serialised: all writes to block.validation_data.signatures happen in a single deterministic loop after all peer outcomes settle, eliminating the old write race between concurrent .then callbacks.

Confidence Score: 3/5

The core vote-race and wrong-return-value bugs are correctly addressed, but a peer that returns 200 with no verifiable signatures still increments the pro counter, meaning the BFT tally can include votes that carry no cryptographic attestation into the block.

The rewrite fixes the three described bugs cleanly and the overall structure is sound. The main residual concern is that proposeAndCollect classifies any HTTP 200 as vote: "pro" regardless of whether the peer's signatures pass verifyIncomingSignatures — so a node with a broken or rotated signing key contributes to quorum without leaving a verifiable trace in validation_data.signatures. Additionally, the double-log pattern in verifyIncomingSignatures emits two distinct error messages for a single verification exception, which will make log-based debugging confusing.

src/libs/consensus/v2/routines/broadcastBlockHash.ts — specifically the pro vote counting in proposeAndCollect and the error-logging path in verifyIncomingSignatures.

Important Files Changed

Filename	Overview
src/libs/consensus/v2/routines/broadcastBlockHash.ts	Full rewrite fixing three interacting bugs (fire-and-forget vote tally, wrong return value, shared-map write race); introduces a double-log on verification exceptions and a gap where a peer returning 200 with no verifiable signatures still increments the pro counter.

Sequence Diagram

sequenceDiagram
    participant CR as consensusRoutine (PoRBFT)
    participant BB as broadcastBlockHash
    participant PAC as proposeAndCollect (×N)
    participant VIS as verifyIncomingSignatures
    participant Peer as Peer N (longCall)
    participant MPB as manageProposeBlockHash (remote)

    CR->>BB: broadcastBlockHash(block, shard)
    BB->>BB: build proposeParams [hash, validation_data ref, ourId]
    BB->>PAC: Promise.all(shard.map(proposeAndCollect))

    par for each shard peer
        PAC->>Peer: longCall(proposeBlockHash, proposeParams)
        Peer->>MPB: HTTP POST consensus_routine/proposeBlockHash
        MPB-->>Peer: 200 + validation_data (or 401/404)
        Peer-->>PAC: RPCResponse
        alt "result === 200"
            PAC->>VIS: verifyIncomingSignatures(extra.signatures, block.hash, peerId)
            VIS->>VIS: Promise.all(verify each sig via TxValidatorPool)
            VIS-->>PAC: "verified: Record<string,string>"
            PAC-->>BB: "{vote:pro, signaturesToMerge: verified}"
        else "result === 401 / 404"
            PAC-->>BB: "{vote:con, signaturesToMerge:{}}"
        else longCall throws
            PAC-->>BB: "{vote:con, rejectionReason: msg}"
        end
    end

    BB->>BB: serial merge loop — count pro/con, write signaturesToMerge into block.validation_data.signatures
    BB-->>CR: [pro, con]
    CR->>CR: isBlockValid(pro, shard.length) → finalizeBlock or throw BlockInvalidError

_{Reviews (2): Last reviewed commit: "fix(consensus): close vote-race in broad..." | Re-trigger Greptile}

… validation_data fan-out (PR #888 iter 1) Four Greptile findings — all real. P1 — pro vote without cryptographic attestation The previous iter classified any HTTP 200 as `vote: "pro"`, even when: (a) `extra.signatures` was empty, or (b) every entry in `extra.signatures` failed verification. BFT quorum could be reached on votes carrying no auditable proof. A node with a broken/rotated signing key contributed to threshold without leaving a verifiable trace. Fix: tighten the pro contract to require the peer's OWN signature on our block hash to be PRESENT in `extra.signatures` AND survive verification. Anything weaker downgrades to a `con` with explicit rejectionReason: - "200 with empty signatures map" - "200 without verifiable own signature on block hash" Relayed third-party signatures (other validators' attestations the peer happens to have collected) are still merged into our signatures map for downstream finalisation, but they no longer qualify as the peer's own vote. P2 — double error log on verify exception `verifyIncomingSignatures` catch path logged "Signature verification threw for X". The outer loop then ran the `else` branch on `isValid: false` and logged "Invalid signature relayed by Y for X; dropping" — two unrelated- looking errors for a single event, confusing log triage. Fix: add `loggedFailure: boolean` to the per-entry result. Outer loop skips its own log when the inner catch already emitted one. Genuine `isValid: false` from a clean verify (not exception) still produces a single "Invalid signature relayed" log. P2 — `proposeParams[1]` live reference race `proposeParams[1]` held `block.validation_data` directly. That is the same object as `getSharedState.candidateBlock.validation_data`. Inbound `manageProposeBlockHash` handlers run concurrently with our outbound broadcast (every shard member calls every other simultaneously) and mutate `signatures` on that same object. Effect: each peer in our `Promise.all` fan-out received a slightly different payload depending on which inbound calls landed first. Receivers see inconsistent snapshots of our validation_data — verifies are still correct (signatures are self-contained crypto), but log-based debugging is harder because "what we sent to peer A" no longer matches "what we sent to peer B". Fix: `structuredClone(block.validation_data)` once before fan-out. Every peer sees the same frozen payload. Receivers still verify and merge what they get; the freeze only affects what we ship. Verification - tsc --noEmit clean. - Pro vote with empty extra.signatures: now logs error + returns con with reason. Cannot inflate BFT tally. - Pro vote where peer's own pubkey is not in verified set: now downgrades to con with explicit "missing own signature" reason. - Double-log scenario: catch path emits "verification threw", outer loop sees `loggedFailure: true`, skips its own log. Single log per event. - Concurrent inbound during fan-out: `validationDataSnapshot` is the immutable frozen copy; mutations to live `block.validation_data.signatures` from inbound handlers do not reach the outbound serialisation. Each peer call gets bit-identical `proposeParams[1]`.

tcsenpai · 2026-05-31T15:20:14Z

@greptile review

tcsenpai · 2026-05-31T15:38:04Z

@greptile review

…ork testing pass Single document covering everything surfaced during the testing pass that ran 2026-05-26 → 2026-05-31 against dev.node2. Organised by topic, not chronology, so the team has one place to look up what was found, what was fixed, and what is still pending — without scrolling Telegram or hopping between PRs. Sections: - L2PS encryption nonce reuse (SDK #87 + 3 follow-up security findings) - Governance hash mismatch on dev.node2 (still broken; real fix in DEM-727) - Test-coverage gap that allowed the governance break to slip through - Side finding — nonces were not checked or incremented per tx (Hovhannes's batch C #884–#887, batch D #888 in flight) - UX surfacing for L2PS (Demo #11) - Operational risks still open - Linear ticket map across DEM-722 epic - Source material cross-links Cross-links the raw battery output and serializer analysis already in flight via Node #876. Closes DEM-729. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

tcsenpai · 2026-06-01T14:11:17Z

@greptile review

tcsenpai marked this pull request as ready for review May 31, 2026 14:59

coderabbitai Bot reviewed May 31, 2026

View reviewed changes

greptile-apps Bot reviewed May 31, 2026

View reviewed changes

Comment thread src/libs/consensus/v2/routines/broadcastBlockHash.ts

Comment thread src/libs/consensus/v2/routines/broadcastBlockHash.ts

Comment thread src/libs/consensus/v2/routines/broadcastBlockHash.ts

greptile-apps Bot reviewed May 31, 2026

View reviewed changes

Comment thread src/libs/consensus/v2/routines/broadcastBlockHash.ts

Comment thread src/libs/consensus/v2/routines/broadcastBlockHash.ts

Comment thread src/libs/consensus/v2/routines/broadcastBlockHash.ts

linear Bot mentioned this pull request Jun 1, 2026

Re-validate dev environment after reboot #890

Open

Shitikyan mentioned this pull request Jun 1, 2026

(closed) #892

Closed

tcsenpai merged commit f62a609 into stabilisation Jun 1, 2026
4 checks passed

tcsenpai mentioned this pull request Jun 1, 2026

test(devnet): cross-RPC nonce-replay e2e test for nonceEnforcement (batch E) #899

Open

3 tasks

Conversation

tcsenpai commented May 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Plain-English summary

Technical detail

Three interacting bugs in the pre-existing implementation

1. Fire-and-forget `.then` chains

2. Return value contradicts BFT semantics

3. Concurrent writes to shared signatures map

Fix shape (Path C — full rewrite with explicit aggregation)

Why not Bun workers (asked during implementation)

Manual verification traces

Specifically wanted feedback on

Out of scope

Test plan

Uh oh!

coderabbitai Bot commented May 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review limit reached

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

tcsenpai commented May 31, 2026

Uh oh!

qodo-code-review Bot commented May 31, 2026

Qodo reviews are paused for this user.

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot commented May 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Sequence Diagram

Uh oh!

Uh oh!

Uh oh!

Uh oh!

greptile-apps Bot commented May 31, 2026

Greptile Summary

Confidence Score: 3/5

Important Files Changed

Sequence Diagram

Uh oh!

Uh oh!

Uh oh!

Uh oh!

tcsenpai commented May 31, 2026

Uh oh!

tcsenpai commented May 31, 2026

Uh oh!

tcsenpai commented Jun 1, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

tcsenpai commented May 31, 2026 •

edited

Loading

coderabbitai Bot commented May 31, 2026 •

edited

Loading

greptile-apps Bot commented May 31, 2026 •

edited

Loading