Skip to content

fix(core): B3b — recompute total_minted from blocks when blob is stale#712

Merged
github-actions[bot] merged 1 commit into
mainfrom
fix/b3-reconcile-total-minted-from-blocks
May 25, 2026
Merged

fix(core): B3b — recompute total_minted from blocks when blob is stale#712
github-actions[bot] merged 1 commit into
mainfrom
fix/b3-reconcile-total-minted-from-blocks

Conversation

@satyakwok
Copy link
Copy Markdown
Collaborator

@satyakwok satyakwok commented May 25, 2026

load_blockchain runs Patch B3 to overwrite stale accounts from the canonical trie. But total_minted lives only in the bincode blob — no trie key, no per-block index. So when B3 fires (trie ahead of blob), accounts snap to canonical but total_minted stays at whatever the last save_blockchain wrote.

That alone wouldn't matter if save_blockchain were always called in lock-step with apply, but there are edges where it isn't: an offline chain.db moved between hosts via cp without first halting, a save that raced a halt window, or a B2 replay path where the per-block apply updates total_minted but the corrupted-from-the-start blob doesn't get rewritten until after B3.

In the 2026-05-24 mainnet incident, treasury and beacon ended up with identical accounts (B3 reconciled both from the same trie) but differing total_minted. STATE-FP fp = SHA256(acc + total_minted) then disagreed across two otherwise-identical validators. Same block content, same state-root in the trie, but the fingerprint trace fired.

This commit adds B3b right after B3:

  • recompute_total_minted_from_blocks(bc) walks every persisted block 1..=tip, sums block.coinbase().amount, and adds TOTAL_PREMINE. C-04 (block_executor.rs:336) enforces coinbase.amount == reward, so the chain has no other source of newly-minted supply — this sum is the canonical value of total_minted at the tip.
  • load_blockchain compares the blob value against the recomputed value and warns + overwrites on mismatch.
  • Save-back happens via the existing atomic B1 path when either B3 repaired an account OR B3b repaired total_minted.

Cost: O(N) MDBX block loads at boot. At mainnet h≈2.2M this is ~30–60s on warm SSD. Acceptable for a once-per-boot sanity pass that only writes back when divergence is detected. Missing blocks are skipped with a warning, matching the B3 fail-soft posture earned during the 2026-05-20 testnet trie-gap incident.

Two unit tests added:

  • test_recompute_total_minted_matches_block_sum — ground-truth check that the helper matches TOTAL_PREMINE + N * BLOCK_REWARD for a freshly-mined chain.
  • test_b3b_repairs_stale_total_minted_on_load — end-to-end via the production load_blockchain path: persist a blob with intentionally- stale total_minted, reload, assert the loaded value is canonical. Fails on main; passes after this commit.

Summary

Risk tier

Check ONE:

  • 🟢 Low — docs, tools/, tests/, CI configs, dependency patch bumps in dev-only crates, comments
  • 🟡 Medium — non-consensus production code (RPC handlers, network plumbing, observability, ops scripts)
  • 🟠 High — consensus-critical crates (sentrix-core, sentrix-trie, sentrix-staking, sentrix-bft), block_executor, apply_block_*, state_root path
  • 🔴 Critical — Voyager activation, fork-height changes, hard-fork rollouts, anything that flips env vars on mainnet

Required by tier

🟢 Low — minimum bar

  • CI green (tests + clippy + audit + gitleaks)

🟡 Medium — adds

  • New public function or behavioural change has at least one corresponding #[test] in same PR
  • Brief description of how this was tested (manual run, integration test, etc.)

🟠 High — adds

  • Regression test that fails on main and passes with this change — paste test name in PR body
  • Designed against documented invariant (link the audit/runbook/design doc)
  • Fresh-brain review by someone other than the author (per the consensus-change review checklist)
  • Single conceptual unit per PR (no bundling — bundling consensus changes burned us on v2.1.12 → 2026-04-25 livelock)

🔴 Critical — adds

  • Testnet rehearsal completed with success criteria + log evidence linked here
  • Bake window observed: minimum 2h on testnet at the same configuration before mainnet
  • Coordinated rollback plan documented in PR body — exact commands operator runs if it fails
  • Operator sign-off at activation moment (not just PR approval — separate moment for the actual flip)

Test plan

Rollback plan

Related

Summary by CodeRabbit

  • Bug Fixes
    • Added automatic validation and repair of the blockchain state on startup to correct inaccurate total-minted tracking.
    • Repaired state is now persisted when either account reconciliation or total-minted correction occurs; logging clarifies both outcomes.
    • Startup recovery is more resilient: missing persisted blocks are skipped with warnings instead of causing startup failure.

Review Change Stack

@github-actions github-actions Bot enabled auto-merge (squash) May 25, 2026 11:25
@satyakwok satyakwok self-assigned this May 25, 2026
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'sentrix-trie benches'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.50.

Benchmark suite Current: 9278cb3 Previous: c6ae261 Ratio
insert_single 172113 ns/iter (± 15633) 83470 ns/iter (± 10721) 2.06

This comment was automatically generated by workflow using github-action-benchmark.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 25, 2026

📝 Walkthrough

Walkthrough

This PR adds load-time repair logic to detect and fix stale total_minted snapshots during blockchain initialization. A new helper function recompute_total_minted_from_blocks computes the canonical total by summing coinbase amounts from all persisted blocks plus genesis premine. The load_blockchain function now invokes this helper after trie reconciliation, compares the computed value against the persisted blob, logs a warning when they diverge, and extends the persist-repair flow to trigger whenever either account reconciliation or total_minted was stale. Tests validate that the helper matches block-sum ground truth and that load_blockchain successfully repairs corrupted snapshots.

Sequence Diagram(s)

sequenceDiagram
  participant Loader as load_blockchain
  participant Recompute as recompute_total_minted_from_blocks
  participant BlockStore as PersistedBlocks(MDBX)
  participant Blockchain as in-memory Blockchain

  Loader->>Recompute: request recomputed_total(tip, TOTAL_PREMINE)
  Recompute->>BlockStore: load block at height i
  BlockStore-->>Recompute: Block{coinbase.amount} or missing (warn & skip)
  Recompute-->>Loader: recomputed_total
  Loader->>Blockchain: compare recomputed_total vs persisted_blob.total_minted
  alt mismatch
    Loader->>Blockchain: overwrite bc.total_minted with recomputed_total
    Loader->>BlockStore: persist repaired state (if needed)
  else match
    Loader-->>Blockchain: no repair needed
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • sentrix-labs/sentrix#696: Modifies boot-time trie reconciliation that this PR relies on; both touch load-time reconciliation and persist-repair decisions.
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: implementing Patch B3b to recompute total_minted from persisted blocks when the bincode blob snapshot is stale, addressing the root cause of state-fingerprint mismatches.
Description check ✅ Passed The PR description is comprehensive and well-structured, covering the problem context, specific solution details, performance implications, and test coverage. However, the formal template sections (Summary, Risk tier, Test plan, Rollback plan) are incompletely filled with minimal details under 'Test plan'.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/b3-reconcile-total-minted-from-blocks

Comment @coderabbitai help to get the list of available commands and usage tips.

`load_blockchain` runs Patch B3 to overwrite stale `accounts` from the
canonical trie. But `total_minted` lives only in the bincode blob — no
trie key, no per-block index. So when B3 fires (trie ahead of blob),
`accounts` snap to canonical but `total_minted` stays at whatever the
last save_blockchain wrote.

That alone wouldn't matter if save_blockchain were always called
in lock-step with apply, but there are edges where it isn't: an offline
chain.db moved between hosts via cp without first halting, a save that
raced a halt window, or a B2 replay path where the per-block apply
updates `total_minted` but the corrupted-from-the-start blob doesn't
get rewritten until after B3.

In the 2026-05-24 mainnet incident, treasury and beacon ended up with
identical `accounts` (B3 reconciled both from the same trie) but
differing `total_minted`. `STATE-FP fp = SHA256(acc + total_minted)`
then disagreed across two otherwise-identical validators. Same block
content, same state-root in the trie, but the fingerprint trace fired.

This commit adds B3b right after B3:

  * `recompute_total_minted_from_blocks(bc)` walks every persisted block
    1..=tip, sums `block.coinbase().amount`, and adds `TOTAL_PREMINE`.
    C-04 (`block_executor.rs:336`) enforces `coinbase.amount == reward`,
    so the chain has no other source of newly-minted supply — this sum
    is the canonical value of `total_minted` at the tip.
  * `load_blockchain` compares the blob value against the recomputed
    value and warns + overwrites on mismatch.
  * Save-back happens via the existing atomic B1 path when either B3
    repaired an account OR B3b repaired `total_minted`.

Cost: O(N) MDBX block loads at boot. At mainnet h~2.2M this is ~30-60s
on warm SSD. Acceptable for a once-per-boot sanity pass that only
writes back when divergence is detected. Missing blocks are skipped
with a warning, matching the B3 fail-soft posture earned during the
2026-05-20 testnet trie-gap incident.

Two unit tests added:
  - `test_recompute_total_minted_matches_block_sum` — ground-truth
    check that the helper matches `TOTAL_PREMINE + N * BLOCK_REWARD`
    for a freshly-mined chain.
  - `test_b3b_repairs_stale_total_minted_on_load` — end-to-end via the
    production `load_blockchain` path: persist a blob with intentionally-
    stale `total_minted`, reload, assert the loaded value is canonical.
    Fails on main; passes after this commit.
@satyakwok satyakwok force-pushed the fix/b3-reconcile-total-minted-from-blocks branch from 9278cb3 to 2866b11 Compare May 25, 2026 13:41
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/sentrix-core/src/storage.rs (1)

272-278: 🧹 Nitpick | 🔵 Trivial | 💤 Low value

Debug log format inconsistency.

The format string "{}/{} accounts checked" uses repaired as the first argument, which will print something like "0/150 accounts checked". This reads awkwardly — when nothing is repaired, the denominator-style fraction is misleading. Consider simplifying to just show the count checked:

📝 Suggested simplification
         } else {
             tracing::debug!(
-                "load_blockchain B3: {}/{} accounts checked, total_minted matches; \
+                "load_blockchain B3: {} accounts checked, total_minted matches; \
                  nothing to reconcile at height {}",
-                repaired,
                 checked,
                 bc.height()
             );
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/sentrix-core/src/storage.rs` around lines 272 - 278, The debug line in
function load_blockchain uses the format "{}/{} accounts checked" with repaired
as the first arg which reads awkwardly; update the tracing::debug call in
load_blockchain to simplify the message (e.g., "{} accounts checked" and pass
checked) or explicitly label both values (e.g., "repaired: {} checked: {}") and
pass repaired, checked, keeping the rest of the context (total_minted/height)
unchanged so the log reads clearly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/sentrix-core/src/storage.rs`:
- Around line 891-908: After deleting the MDBX entry, add an explicit assertion
that the persisted block entry is gone to ensure the test premise is exercised:
call storage.mdbx_arc() (same as used to delete), check TABLE_META for the key
corresponding to format!("block:{}", 2) (i.e. the raw key used b"block:2" in the
diff) using the MDBX read/get API and assert that the lookup returns None or an
empty result before running recompute_total_minted_from_blocks(&bc); this makes
the test fail clearly if the key schema or storage behavior changes.

---

Outside diff comments:
In `@crates/sentrix-core/src/storage.rs`:
- Around line 272-278: The debug line in function load_blockchain uses the
format "{}/{} accounts checked" with repaired as the first arg which reads
awkwardly; update the tracing::debug call in load_blockchain to simplify the
message (e.g., "{} accounts checked" and pass checked) or explicitly label both
values (e.g., "repaired: {} checked: {}") and pass repaired, checked, keeping
the rest of the context (total_minted/height) unchanged so the log reads
clearly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: a1ae5142-27d8-412c-92a7-c4e4c66b262d

📥 Commits

Reviewing files that changed from the base of the PR and between 9278cb3 and 2866b11.

📒 Files selected for processing (1)
  • crates/sentrix-core/src/storage.rs

Comment on lines +891 to +908
// Yank block #2 out of MDBX. The block stays in `bc.chain`
// (in-memory window) but the persisted entry is gone, so the
// recompute loop hits `load_block(2) -> Ok(None)` and counts
// the gap.
let mdbx = storage.mdbx_arc();
mdbx.delete(sentrix_storage::tables::TABLE_META, b"block:2")
.unwrap();

// Sum should be `TOTAL_PREMINE + 2 * BLOCK_REWARD` (block 1 +
// block 3; block 2 was deleted).
let recomputed = storage
.recompute_total_minted_from_blocks(&bc)
.unwrap();
assert_eq!(
recomputed,
TOTAL_PREMINE + 2 * BLOCK_REWARD,
"missing block must be silently skipped (fail-soft per B3 precedent)"
);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check how ChainStorage stores blocks — the test assumes "block:{height}" in TABLE_META
rg -n --type rust -C5 'save_block|TABLE_META|"block:' crates/sentrix-storage/src/

Repository: sentrix-labs/sentrix

Length of output: 24365


🏁 Script executed:

#!/bin/bash
set -euo pipefail
sed -n '840,930p' crates/sentrix-core/src/storage.rs | nl -ba | sed -n '1,120p'

Repository: sentrix-labs/sentrix

Length of output: 107


🏁 Script executed:

#!/bin/bash
set -euo pipefail
sed -n '840,930p' crates/sentrix-core/src/storage.rs | nl -ba | sed -n '1,120p'

Repository: sentrix-labs/sentrix

Length of output: 107


🏁 Script executed:

#!/bin/bash
set -euo pipefail
sed -n '840,930p' crates/sentrix-core/src/storage.rs | cat -n | sed -n '1,130p'

Repository: sentrix-labs/sentrix

Length of output: 3457


🏁 Script executed:

#!/bin/bash
set -euo pipefail
rg -n "fn\s+load_block\s*\(|load_block\(" crates/sentrix-core/src/storage.rs crates/sentrix-core/src -S

Repository: sentrix-labs/sentrix

Length of output: 2614


🏁 Script executed:

#!/bin/bash
set -euo pipefail
sed -n '330,420p' crates/sentrix-core/src/storage.rs | cat -n

Repository: sentrix-labs/sentrix

Length of output: 4381


Add an assertion after the MDBX delete to ensure the missing-block premise is actually exercised

  • sentrix_storage::ChainStorage persists blocks in TABLE_META under keys format!("block:{}", index), so b"block:2" matches the current schema.
  • The test should still assert the block entry is gone, so if the key/schema ever changes the failure mode is explicit.
🛡️ Proposed guard assertion
         mdbx.delete(sentrix_storage::tables::TABLE_META, b"block:2")
             .unwrap();
 
+        // Verify delete succeeded — if this fails, the key format/schema is wrong.
+        assert!(
+            storage.load_block(2).unwrap().is_none(),
+            "delete must remove block:2 from MDBX for test to be valid"
+        );
+
         // Sum should be `TOTAL_PREMINE + 2 * BLOCK_REWARD` (block 1 +
         // block 3; block 2 was deleted).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/sentrix-core/src/storage.rs` around lines 891 - 908, After deleting
the MDBX entry, add an explicit assertion that the persisted block entry is gone
to ensure the test premise is exercised: call storage.mdbx_arc() (same as used
to delete), check TABLE_META for the key corresponding to format!("block:{}", 2)
(i.e. the raw key used b"block:2" in the diff) using the MDBX read/get API and
assert that the lookup returns None or an empty result before running
recompute_total_minted_from_blocks(&bc); this makes the test fail clearly if the
key schema or storage behavior changes.

@github-actions github-actions Bot merged commit 7e4b8cb into main May 25, 2026
18 checks passed
github-actions Bot pushed a commit that referenced this pull request May 25, 2026
Ships PRs #711 (trie prune race augment-walk), #712 (B3b total_minted
self-heal), #713 (pay_one_signer sorted delegators), #714
(collect_reachable depth-aware empty check). See CHANGELOG entry for
the per-PR narrative and the 2026-05-24 incident root-cause analysis.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant