fix(aperture): prevent racy getLatestBlockhash #649

bmuddha · 2025-11-20T07:51:27Z

This fixes an issue where the blockhash returned to the client did not exist in the cache, due to timing differences between the block update and the cache update.
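The invariant the fix aims for can be sketched as follows: any blockhash a client can read as "latest" should already be in the cache used to recognize it. This is a minimal std-only sketch under assumptions. The simplified `LastCachedBlock`, the `known` map, and the `RwLock` standing in for arc-swap are illustrative and are not the aperture implementation.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, RwLock};

// Hypothetical simplified block record (real field types may differ).
#[derive(Clone, Debug, PartialEq)]
struct LastCachedBlock {
    blockhash: [u8; 32],
    slot: u64,
}

// Hypothetical cache: `known` holds every blockhash the node will accept,
// `latest` is what a getLatestBlockhash request would return.
struct BlocksCache {
    known: Mutex<HashMap<[u8; 32], u64>>,
    latest: RwLock<Arc<LastCachedBlock>>,
}

impl BlocksCache {
    fn new(initial: LastCachedBlock) -> Self {
        let mut known = HashMap::new();
        known.insert(initial.blockhash, initial.slot);
        Self {
            known: Mutex::new(known),
            latest: RwLock::new(Arc::new(initial)),
        }
    }

    // Record the hash first, publish it second: a reader that observes the
    // new `latest` can never miss it in `known`. Swapping these two lines
    // reopens the window where a returned hash is not yet cached.
    fn set_latest(&self, block: LastCachedBlock) {
        self.known.lock().unwrap().insert(block.blockhash, block.slot);
        *self.latest.write().unwrap() = Arc::new(block);
    }

    fn get_latest(&self) -> Arc<LastCachedBlock> {
        Arc::clone(&self.latest.read().unwrap())
    }

    fn contains(&self, hash: &[u8; 32]) -> bool {
        self.known.lock().unwrap().contains_key(hash)
    }
}

fn main() {
    let cache = BlocksCache::new(LastCachedBlock { blockhash: [1; 32], slot: 1 });
    cache.set_latest(LastCachedBlock { blockhash: [2; 32], slot: 2 });
    let latest = cache.get_latest();
    assert!(cache.contains(&latest.blockhash));
    println!("slot {} is both latest and cached", latest.slot);
}
```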

Summary by CodeRabbit

Refactor
- Improved block update processing by handling block updates first in the event loop and removing a duplicate branch, making event handling more timely and reliable
- Replaced prior block state tracking with a lock-free, atomic approach to reduce contention and improve performance
- Simplified initialization flow by consolidating how the latest block state is provided
Chores
- Introduced an atomic caching library to support the new lock-free latest-block tracking



github-actions · 2025-11-20T07:51:36Z

Manual Deploy Available

You can trigger a manual deploy of this PR branch to testnet:

Deploy to Testnet 🚀

Alternative: Comment /deploy on this PR to trigger deployment directly.

⚠️ Note: Manual deploy requires authorization. Only authorized users can trigger deployments.

Comment updated automatically when the PR is synchronized.

coderabbitai · 2025-11-20T07:51:42Z

Walkthrough

Moved block-update handling earlier in the processor's `select!` loop and removed a duplicate branch. Replaced the latest-block storage with an `ArcSwapAny`-backed atomic pointer by introducing `LastCachedBlock` and updating `BlocksCache` to use it. Added the `arc-swap` dependency.

Changes

- Event loop reorganization (`magicblock-aperture/src/processor.rs`): Moved block update handling to the top of the `select!` loop and removed the later duplicate block-update branch; other event branches are unchanged except for their ordering relative to blocks.
- Block cache refactor (`magicblock-aperture/src/state/blocks.rs`): Replaced the previous latest-block storage with `ArcSwapAny<Arc<LastCachedBlock>>`; added `LastCachedBlock { blockhash, slot }`; changed the `BlocksCache::new` signature to accept a `LastCachedBlock`; updated `set_latest`, `get_latest`, and `BlockHashInfo` (derives `Default`, keeps hash/validity/slot); adjusted the `RpcBlockhash` conversion accordingly.
- State initialization update (`magicblock-aperture/src/state/mod.rs`): Loads the latest block via `ledger.latest_block.load()`, constructs a `LastCachedBlock` from it, and passes it into `BlocksCache::new(blocktime, latest)` during `SharedState` initialization; imports now include `LastCachedBlock`.
- Dependency addition (`magicblock-aperture/Cargo.toml`): Added `arc-swap = { workspace = true }`.

Sequence Diagram(s)

```mermaid
sequenceDiagram
  participant Processor
  participant BlockUpdateRx as block_update_rx
  participant AccountRx as account_update_rx
  participant TxRx as tx_status_update_rx
  participant BlocksCache

  rect rgba(135,206,250,0.08)
  Note right of Processor: select! loop (new order)
  end

  BlockUpdateRx->>Processor: Ok(latest block)
  Processor->>BlocksCache: set_latest(LastCachedBlock)
  BlocksCache-->>Processor: ack
  alt account update arrives
    AccountRx->>Processor: AccountUpdate
    Processor->>Processor: process account update
  end
  alt tx status arrives
    TxRx->>Processor: TxStatusUpdate
    Processor->>Processor: process tx status update
  end
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Areas needing extra attention:
- magicblock-aperture/src/state/blocks.rs — correctness of ArcSwapAny usage, cloning/lifetimes of Arc, and Default derivation effects on BlockHashInfo.
- magicblock-aperture/src/processor.rs — verify behavioral equivalence after reordering and absence of race conditions.
- magicblock-aperture/Cargo.toml — workspace dependency resolution for arc-swap.

Suggested reviewers

thlorenz

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly describes the main fix: preventing a race condition in getLatestBlockhash that caused cache misses.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.


coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

magicblock-aperture/src/state/blocks.rs (1)

91-107: RPC handlers lack guards for empty block cache case

The concern is valid. get_latest() returns BlockHashInfo::default() when the cache is empty (at server startup before the first block update arrives), and the two RPC handlers that convert this directly to RpcBlockhash have no guards:

get_latest_blockhash handler (line 14): directly converts result

simulate_transaction handler (line 50): directly converts result

Since there's no explicit initialization wait before handling requests, a client request arriving before the first BlockUpdate from the validator would receive a response with zero-valued blockhash/validity fields.

Suggested fixes:

Return Option<BlockHashInfo> from get_latest() to make the empty-cache case explicit, requiring callers to explicitly handle it

Or add validation in handlers to check for default/sentinel values before converting to RpcBlockhash
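The first suggested fix can be sketched as below. This is a minimal std-only illustration: the `RwLock<Option<...>>` field, the simplified `LastCachedBlock`, and the `get_latest_blockhash` handler signature and error string are hypothetical, not the aperture API.

```rust
use std::sync::{Arc, RwLock};

// Hypothetical simplified block record.
#[derive(Clone, Debug, PartialEq)]
struct LastCachedBlock {
    blockhash: [u8; 32],
    slot: u64,
}

// `None` means no block update has been processed yet.
struct BlocksCache {
    latest: RwLock<Option<Arc<LastCachedBlock>>>,
}

impl BlocksCache {
    fn empty() -> Self {
        Self { latest: RwLock::new(None) }
    }

    fn set_latest(&self, block: LastCachedBlock) {
        *self.latest.write().unwrap() = Some(Arc::new(block));
    }

    // Returning Option makes the empty-cache case explicit instead of
    // handing callers a zeroed default.
    fn get_latest(&self) -> Option<Arc<LastCachedBlock>> {
        self.latest.read().unwrap().clone()
    }
}

// Hypothetical RPC handler: maps the empty case to an error response.
fn get_latest_blockhash(cache: &BlocksCache) -> Result<([u8; 32], u64), &'static str> {
    let block = cache.get_latest().ok_or("no block processed yet")?;
    Ok((block.blockhash, block.slot))
}

fn main() {
    let cache = BlocksCache::empty();
    assert!(get_latest_blockhash(&cache).is_err());
    cache.set_latest(LastCachedBlock { blockhash: [7; 32], slot: 3 });
    assert_eq!(get_latest_blockhash(&cache), Ok(([7; 32], 3)));
}
```

The cost of this design is that every caller must handle `None`. The later commit 68a6e26 ("fix: init the cache with latest blockhash") instead seeds the cache at startup.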

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bb08ebd and d24885f.

📒 Files selected for processing (3)

magicblock-aperture/src/processor.rs (1 hunks)
magicblock-aperture/src/state/blocks.rs (5 hunks)
magicblock-aperture/src/state/mod.rs (1 hunks)

🧰 Additional context used

🧠 Learnings (6)

📚 Learning: 2025-11-07T13:20:13.793Z

Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: magicblock-processor/src/scheduler/coordinator.rs:227-238
Timestamp: 2025-11-07T13:20:13.793Z
Learning: In magicblock-processor's ExecutionCoordinator (scheduler/coordinator.rs), the `account_contention` HashMap intentionally does not call `shrink_to_fit()`. Maintaining slack capacity is beneficial for performance by avoiding frequent reallocations during high transaction throughput. As long as empty entries are removed from the map (which `clear_account_contention` does), the capacity overhead is acceptable.

Applied to files:

magicblock-aperture/src/processor.rs
magicblock-aperture/src/state/blocks.rs

📚 Learning: 2025-10-21T14:00:54.642Z

Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579

Applied to files:

magicblock-aperture/src/processor.rs

📚 Learning: 2025-10-28T13:15:42.706Z

Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 596
File: magicblock-processor/src/scheduler.rs:1-1
Timestamp: 2025-10-28T13:15:42.706Z
Learning: In magicblock-processor, transaction indexes were always set to 0 even before the changes in PR #596. The proper transaction indexing within slots will be addressed during the planned ledger rewrite.

Applied to files:

magicblock-aperture/src/processor.rs

📚 Learning: 2025-11-07T14:20:31.457Z

Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 621
File: magicblock-chainlink/src/remote_account_provider/chain_pubsub_actor.rs:457-495
Timestamp: 2025-11-07T14:20:31.457Z
Learning: In magicblock-chainlink/src/remote_account_provider/chain_pubsub_client.rs, the unsubscribe closure returned by PubSubConnection::account_subscribe(...) resolves to () (unit), not a Result. Downstream code should not attempt to inspect an unsubscribe result and can optionally wrap it in a timeout to guard against hangs.

Applied to files:

magicblock-aperture/src/processor.rs

📚 Learning: 2025-10-21T11:00:18.396Z

Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/encoder.rs:176-187
Timestamp: 2025-10-21T11:00:18.396Z
Learning: In the magicblock validator, the current slot is always the root slot. The SlotEncoder in magicblock-aperture/src/encoder.rs correctly sets `root: slot` because there is no lag between current and root slots in this architecture.

Applied to files:

magicblock-aperture/src/state/blocks.rs

📚 Learning: 2025-11-04T10:48:00.070Z

Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: magicblock-processor/src/scheduler/mod.rs:217-219
Timestamp: 2025-11-04T10:48:00.070Z
Learning: In magicblock-validator, the codebase uses a pattern where types containing non-Send/non-Sync fields (like Rc<RefCell<...>>) are marked with unsafe impl Send when they are guaranteed to be confined to a single thread through careful API design and thread spawning patterns.

Applied to files:

magicblock-aperture/src/state/blocks.rs

🧬 Code graph analysis (3)

magicblock-aperture/src/state/mod.rs (2)

magicblock-aperture/src/processor.rs (1)

new (52-61)

magicblock-aperture/src/state/blocks.rs (1)

new (59-73)

magicblock-aperture/src/processor.rs (2)

magicblock-aperture/src/state/subscriptions.rs (1)

send_slot (258-260)

magicblock-aperture/src/state/blocks.rs (1)

set_latest (76-89)

magicblock-aperture/src/state/blocks.rs (1)

magicblock-aperture/src/state/mod.rs (1)

new (76-93)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: run_make_ci_test
GitHub Check: Build Project

🔇 Additional comments (2)

magicblock-aperture/src/processor.rs (1)

85-99: Reordering block updates first in select! looks good

Prioritizing `block_update_rx` with `biased;` and updating subscriptions (`send_slot`) plus the blocks cache (`set_latest`) in a single arm matches the goal of making new blocks visible atomically from the processor’s point of view. This also removes the earlier duplicate block-update branch and keeps the loop easier to reason about.
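The effect of putting the block arm first under `biased;` can be illustrated without an async runtime. Assuming the loop uses `tokio::select!`, the sketch below is only a std-only, non-blocking analogue: it polls with `try_recv` in a fixed order rather than awaiting, and the `Event` enum and channel payloads are hypothetical.

```rust
use std::sync::mpsc::{self, Receiver};

// Hypothetical events standing in for the processor's channels.
#[derive(Debug, PartialEq)]
enum Event {
    Block(u64),
    Account(u32),
}

// Poll in a fixed order, block updates first, mirroring how `biased;`
// makes select! check its arms top to bottom. Returns None when idle.
fn next_event(blocks: &Receiver<u64>, accounts: &Receiver<u32>) -> Option<Event> {
    if let Ok(slot) = blocks.try_recv() {
        return Some(Event::Block(slot));
    }
    accounts.try_recv().ok().map(Event::Account)
}

fn main() {
    let (block_tx, block_rx) = mpsc::channel();
    let (account_tx, account_rx) = mpsc::channel();
    // The account update is sent first, yet the block update is handled first.
    account_tx.send(7).unwrap();
    block_tx.send(100).unwrap();

    let mut handled = Vec::new();
    while let Some(event) = next_event(&block_rx, &account_rx) {
        handled.push(event);
    }
    assert_eq!(handled, vec![Event::Block(100), Event::Account(7)]);
}
```

The usual trade-off of strict priority applies: a continuous stream of block updates would starve the lower arms. At a validator's block cadence that is presumably not a concern here.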

magicblock-aperture/src/state/mod.rs (1)

76-92: Verify and document BlocksCache initialization guarantee in startup sequence

The concern in the original review is valid. RPC handlers call get_latest() and block_height() unconditionally without defensive checks, and these methods return BlockHashInfo::default() (zero blockhash/validity) or 0 (zero slot) respectively when latest is null.

For example, in get_blocks.rs, the handler uses self.blocks.block_height() to determine the query range upper bound, and a zero return would cause incorrect slot range handling. Similarly, get_latest_blockhash directly uses get_latest() without checking for the uninitialized state.

Action required: Verify and document in code comments or architecture notes that the first BlockUpdate is guaranteed to be processed before the RPC server accepts client connections. If this guarantee does not hold, add defensive checks (e.g., return error/Option) at call sites or initialize BlocksCache with a seed blockhash.
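The seed-blockhash option, which commit 68a6e26 ("fix: init the cache with latest blockhash") appears to implement, can be sketched as follows. This is a std-only illustration under assumptions: the `Ledger` stub, the `Duration` type for `blocktime`, and the `RwLock` standing in for arc-swap are hypothetical. Only the `BlocksCache::new(blocktime, latest)` shape comes from the PR walkthrough.

```rust
use std::sync::{Arc, RwLock};
use std::time::Duration;

// Hypothetical simplified block record; Default yields a zeroed hash.
#[derive(Clone, Debug, Default, PartialEq)]
struct LastCachedBlock {
    blockhash: [u8; 32],
    slot: u64,
}

// Hypothetical ledger stub holding the last block persisted before startup.
struct Ledger {
    latest_block: LastCachedBlock,
}

struct BlocksCache {
    #[allow(dead_code)]
    blocktime: Duration,
    latest: RwLock<Arc<LastCachedBlock>>,
}

impl BlocksCache {
    // Seeding from the ledger means get_latest() returns a real blockhash
    // even if an RPC request arrives before the first BlockUpdate.
    fn new(blocktime: Duration, latest: LastCachedBlock) -> Self {
        Self { blocktime, latest: RwLock::new(Arc::new(latest)) }
    }

    fn get_latest(&self) -> Arc<LastCachedBlock> {
        Arc::clone(&self.latest.read().unwrap())
    }
}

fn main() {
    let ledger = Ledger {
        latest_block: LastCachedBlock { blockhash: [5; 32], slot: 42 },
    };
    // Illustrative block time; the real value comes from validator config.
    let cache = BlocksCache::new(Duration::from_millis(50), ledger.latest_block.clone());
    assert_ne!(*cache.get_latest(), LastCachedBlock::default());
    println!("startup blockhash slot: {}", cache.get_latest().slot);
}
```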

magicblock-aperture/src/state/blocks.rs

snawaz · 2025-11-20T09:28:29Z

Alright. I see you also pushed the "fix: UB with use after free" commit. I’m not familiar enough with arc_swap internals to properly review that one right now, so if you’re confident in it, feel free to merge. I might revisit it later once I’ve gone through how arc_swap works under the hood.

first commit

I found the fix in the first commit much simpler and easier to reason about, but it needs two tweaks in my opinion. Here is my thought process:

```diff
- let prev = self.latest.swap(last, Ordering::Release);
+ let prev = self.latest.swap(last, Ordering::AcqRel);
```

`swap` does two things: a LOAD and a STORE. With `Release`, only the STORE is `Release`; the LOAD becomes `Relaxed`. The docs say:

Note that using Acquire makes the store part of this operation Relaxed, and using Release makes the load part Relaxed.

We need Acquire on that LOAD to safely reclaim the old pointer, and we need Release to publish the new one. So AcqRel is the correct ordering here.

And the second change:

```diff
-let latest = self.latest.load(Ordering::Relaxed);
+let latest = self.latest.load(Ordering::Acquire);
```

Now with these two changes, it seems good as far as LOAD and STORE are concerned.
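The first-commit approach with the two suggested orderings applied can be sketched with std's `AtomicPtr`. This is a hypothetical reconstruction, not the code from that commit: the `Latest` type and its methods are illustrative. It deliberately never frees a swapped-out block, because a reader may still be using it, as the review goes on to point out. That leak is the gap a reclamation scheme such as arc-swap fills.

```rust
use std::sync::atomic::{AtomicPtr, Ordering};

// Hypothetical simplified block record.
#[derive(Debug)]
struct LastCachedBlock {
    blockhash: [u8; 32],
    slot: u64,
}

// Hypothetical holder for the latest block behind a raw atomic pointer.
struct Latest {
    ptr: AtomicPtr<LastCachedBlock>,
}

impl Latest {
    fn new(block: LastCachedBlock) -> Self {
        Self { ptr: AtomicPtr::new(Box::into_raw(Box::new(block))) }
    }

    fn set(&self, block: LastCachedBlock) {
        let new = Box::into_raw(Box::new(block));
        // AcqRel: the store half is Release (publishes `new`), the load half
        // is Acquire (synchronizes with whoever stored the old pointer).
        let _old = self.ptr.swap(new, Ordering::AcqRel);
        // `_old` is intentionally leaked: a concurrent `get` may still hold a
        // reference into it, so freeing it here would be a use-after-free.
    }

    fn get(&self) -> &LastCachedBlock {
        // Acquire pairs with the Release half of the swap in `set`.
        let p = self.ptr.load(Ordering::Acquire);
        // SAFETY: `p` came from Box::into_raw and is never freed while `self` lives.
        unsafe { &*p }
    }
}

impl Drop for Latest {
    fn drop(&mut self) {
        // `&mut self` guarantees no outstanding readers, so the current block
        // can be reclaimed; earlier blocks stay leaked in this sketch.
        let p = *self.ptr.get_mut();
        // SAFETY: `p` came from Box::into_raw and is freed exactly once.
        drop(unsafe { Box::from_raw(p) });
    }
}

fn main() {
    let latest = Latest::new(LastCachedBlock { blockhash: [1; 32], slot: 1 });
    let before = latest.get();
    latest.set(LastCachedBlock { blockhash: [2; 32], slot: 2 });
    // Still valid only because the old block was leaked rather than freed.
    assert_eq!(before.slot, 1);
    assert_eq!(latest.get().slot, 2);
    assert_eq!(latest.get().blockhash, [2; 32]);
}
```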

but then...wait!

danger ⚠️

But after I finished writing this comment, I realized there is one danger still lurking around (even with the two tweaks): some code might still be holding a reference to the now-deleted pointer. So I’m guessing arc_swap is exactly what handles that part as well. 🤔
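The missing piece here is reference-counted snapshots: each reader keeps its own `Arc` to the block it loaded, so replacing the latest block cannot free memory a reader is still using. arc-swap provides this without a lock via `ArcSwap` (`load_full`, `store`). The sketch below shows the same semantics with a std `RwLock` as a stand-in, so it illustrates the idea rather than arc-swap's internals. `Latest` and the simplified block are hypothetical.

```rust
use std::sync::{Arc, RwLock};

// Hypothetical simplified block record.
#[derive(Debug)]
struct LastCachedBlock {
    blockhash: [u8; 32],
    slot: u64,
}

// Lock-based stand-in for ArcSwap<LastCachedBlock>; method names mirror
// arc-swap's `load_full` and `store`.
struct Latest {
    inner: RwLock<Arc<LastCachedBlock>>,
}

impl Latest {
    fn new(block: LastCachedBlock) -> Self {
        Self { inner: RwLock::new(Arc::new(block)) }
    }

    // The caller gets its own strong reference to the current block.
    fn load_full(&self) -> Arc<LastCachedBlock> {
        Arc::clone(&self.inner.read().unwrap())
    }

    // Replacing the block only drops the cache's reference; the old block is
    // freed when the last reader's Arc goes away, never earlier.
    fn store(&self, block: LastCachedBlock) {
        *self.inner.write().unwrap() = Arc::new(block);
    }
}

fn main() {
    let latest = Latest::new(LastCachedBlock { blockhash: [1; 32], slot: 1 });
    let held = latest.load_full();
    latest.store(LastCachedBlock { blockhash: [2; 32], slot: 2 });
    // The reader's snapshot survives the swap; it is now the only owner.
    assert_eq!(held.slot, 1);
    assert_eq!(Arc::strong_count(&held), 1);
    assert_eq!(latest.load_full().blockhash, [2; 32]);
}
```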

snawaz

looks good to me. Great use of arc_swap.

coderabbitai

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 62b47f3 and 68a6e26.

📒 Files selected for processing (2)

magicblock-aperture/src/state/blocks.rs (5 hunks)
magicblock-aperture/src/state/mod.rs (2 hunks)

🧰 Additional context used

🧠 Learnings (3)

📓 Common learnings

Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 650
File: magicblock-chainlink/src/submux/subscription_task.rs:13-99
Timestamp: 2025-11-20T08:57:07.189Z
Learning: In the magicblock-validator repository, avoid posting review comments that merely confirm code is correct or matches intended behavior without providing actionable feedback, suggestions for improvement, or identifying potential issues. Such confirmatory comments are considered unhelpful noise by the maintainers.

Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: magicblock-processor/src/scheduler/coordinator.rs:227-238
Timestamp: 2025-11-07T13:20:13.793Z
Learning: In magicblock-processor's ExecutionCoordinator (scheduler/coordinator.rs), the `account_contention` HashMap intentionally does not call `shrink_to_fit()`. Maintaining slack capacity is beneficial for performance by avoiding frequent reallocations during high transaction throughput. As long as empty entries are removed from the map (which `clear_account_contention` does), the capacity overhead is acceptable.

📚 Learning: 2025-11-04T10:48:00.070Z

Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: magicblock-processor/src/scheduler/mod.rs:217-219
Timestamp: 2025-11-04T10:48:00.070Z
Learning: In magicblock-validator, the codebase uses a pattern where types containing non-Send/non-Sync fields (like Rc<RefCell<...>>) are marked with unsafe impl Send when they are guaranteed to be confined to a single thread through careful API design and thread spawning patterns.

Applied to files:

magicblock-aperture/src/state/blocks.rs

📚 Learning: 2025-11-07T13:20:13.793Z

Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: magicblock-processor/src/scheduler/coordinator.rs:227-238
Timestamp: 2025-11-07T13:20:13.793Z
Learning: In magicblock-processor's ExecutionCoordinator (scheduler/coordinator.rs), the `account_contention` HashMap intentionally does not call `shrink_to_fit()`. Maintaining slack capacity is beneficial for performance by avoiding frequent reallocations during high transaction throughput. As long as empty entries are removed from the map (which `clear_account_contention` does), the capacity overhead is acceptable.

Applied to files:

magicblock-aperture/src/state/blocks.rs
magicblock-aperture/src/state/mod.rs

🧬 Code graph analysis (1)

magicblock-aperture/src/state/blocks.rs (2)

magicblock-aperture/src/state/mod.rs (1)

new (76-98)

magicblock-aperture/src/processor.rs (1)

new (52-61)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: run_make_ci_test
GitHub Check: run_make_ci_lint
GitHub Check: Build Project

magicblock-aperture/src/state/blocks.rs

magicblock-aperture/src/state/mod.rs


* master:
  - feat: use latest svm version (#657)
  - chore: update solana account (#660)
  - fix: better transaction diagnostics & rent exemption check (#642)
  - chore: add access-control-max-age header to cors (#654)
  - fix(aperture): prevent racy getLatestBlockhash (#649)
  - fix: await until sub is established and perform them in parallel (#650)
  - feat: persist all accounts (#648)


fix(aperture): prevent racy getLatestBlockhash

d24885f


bmuddha requested review from snawaz and thlorenz November 20, 2025 07:51

coderabbitai bot reviewed Nov 20, 2025


magicblock-aperture/src/state/blocks.rs

GabrielePicco approved these changes Nov 20, 2025


fix: UB with use after free

62b47f3

snawaz approved these changes Nov 20, 2025


fix: init the cache with latest blockhash

68a6e26

coderabbitai bot reviewed Nov 20, 2025


magicblock-aperture/src/state/blocks.rs

magicblock-aperture/src/state/mod.rs

bmuddha merged commit ce85be3 into master Nov 20, 2025
18 checks passed

bmuddha deleted the bmuddha/fix/blockhash-not-found branch November 20, 2025 11:28

coderabbitai bot mentioned this pull request Nov 20, 2025

fix: better transaction diagnostics & rent exemption check #642

Merged
