
Conversation

@bmuddha
Contributor

@bmuddha bmuddha commented Nov 20, 2025

This fixes an issue where the blockhash returned
to the client did not exist in the cache due to timing differences between the block update and the cache update.

Summary by CodeRabbit

  • Refactor

    • Improved block update processing for more timely and reliable event handling by reordering the event branches and removing duplicate paths
    • Replaced prior block state tracking with a lock-free, atomic approach to reduce contention and improve performance
    • Simplified initialization flow by consolidating how the latest block state is provided
  • Chores

    • Introduced an atomic caching library to support the new lock-free latest-block tracking

✏️ Tip: You can customize this high-level summary in your review settings.

@bmuddha bmuddha requested review from snawaz and thlorenz November 20, 2025 07:51
@github-actions

github-actions bot commented Nov 20, 2025

Manual Deploy Available

You can trigger a manual deploy of this PR branch to testnet:

Deploy to Testnet 🚀

Alternative: Comment /deploy on this PR to trigger deployment directly.

⚠️ Note: Manual deploy requires authorization. Only authorized users can trigger deployments.

Comment updated automatically when the PR is synchronized.

@coderabbitai
Contributor

coderabbitai bot commented Nov 20, 2025

Walkthrough

Moved block-update handling earlier in the processor select! loop and removed a duplicate branch; replaced latest-block storage with an ArcSwapAny-backed atomic pointer by introducing LastCachedBlock and updating BlocksCache to use it; added arc-swap dependency.

Changes

  • Event loop reorganization (magicblock-aperture/src/processor.rs): Moved block update handling to the top of the select! loop and removed the later duplicate block-update branch; other event branches are unchanged except for their ordering relative to blocks.
  • Block cache refactor (magicblock-aperture/src/state/blocks.rs): Replaced the previous latest-block storage with ArcSwapAny<Arc<LastCachedBlock>>; added LastCachedBlock { blockhash, slot }; changed the BlocksCache::new signature to accept a LastCachedBlock; updated set_latest, get_latest, and BlockHashInfo (derive Default; maintain hash/validity/slot); adjusted the RpcBlockhash conversion accordingly. A sketch of this shape follows the list.
  • State initialization update (magicblock-aperture/src/state/mod.rs): Load the latest block via ledger.latest_block.load() and construct a LastCachedBlock to pass into BlocksCache::new(blocktime, latest) during SharedState initialization; updated imports to include LastCachedBlock.
  • Dependency addition (magicblock-aperture/Cargo.toml): Added arc-swap = { workspace = true }.
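
For orientation, here is a minimal sketch of what an ArcSwapAny<Arc<LastCachedBlock>>-backed latest-block cache could look like. This is not the repository's code: the real BlocksCache::new takes additional arguments (e.g. the blocktime mentioned above), and Hash below is a plain stand-in for the actual blockhash type.

```rust
use std::sync::Arc;

use arc_swap::ArcSwapAny;

type Hash = [u8; 32]; // stand-in for the real blockhash type
type Slot = u64;

pub struct LastCachedBlock {
    pub blockhash: Hash,
    pub slot: Slot,
}

pub struct BlocksCache {
    // Lock-free pointer to the most recently cached block.
    latest: ArcSwapAny<Arc<LastCachedBlock>>,
}

impl BlocksCache {
    pub fn new(latest: LastCachedBlock) -> Self {
        Self { latest: ArcSwapAny::new(Arc::new(latest)) }
    }

    /// Atomically publish a new latest block; readers observe either the
    /// old or the new value, never a torn mix of the two.
    pub fn set_latest(&self, block: LastCachedBlock) {
        self.latest.store(Arc::new(block));
    }

    /// Wait-free read of the current latest block.
    pub fn get_latest(&self) -> Arc<LastCachedBlock> {
        self.latest.load_full()
    }
}

fn main() {
    let cache = BlocksCache::new(LastCachedBlock { blockhash: [0; 32], slot: 0 });
    cache.set_latest(LastCachedBlock { blockhash: [1; 32], slot: 42 });
    assert_eq!(cache.get_latest().slot, 42);
}
```

Because blockhash and slot travel together inside one Arc swap, readers always see a mutually consistent pair, which is the property the lock-free latest-block tracking is after.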

Sequence Diagram(s)

sequenceDiagram
  participant Processor
  participant BlockUpdateRx as block_update_rx
  participant AccountRx as account_update_rx
  participant TxRx as tx_status_update_rx
  participant BlocksCache

  rect rgba(135,206,250,0.08)
  Note right of Processor: select! loop (new order)
  end

  block_update_rx->>Processor: Ok(latest block)
  Processor->>BlocksCache: set_latest(LastCachedBlock)
  BlocksCache-->>Processor: ack
  alt account update arrives
    account_update_rx->>Processor: AccountUpdate
    Processor->>Processor: process account update
  end
  alt tx status arrives
    TxRx->>Processor: TxStatusUpdate
    Processor->>Processor: process tx status update
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Areas needing extra attention:
    • magicblock-aperture/src/state/blocks.rs — correctness of ArcSwapAny usage, cloning/lifetimes of Arc, and Default derivation effects on BlockHashInfo.
    • magicblock-aperture/src/processor.rs — verify behavioral equivalence after reordering and absence of race conditions.
    • magicblock-aperture/Cargo.toml — workspace dependency resolution for arc-swap.

Suggested reviewers

  • thlorenz

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title clearly describes the main fix: preventing a race condition in getLatestBlockhash that caused cache misses.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
  • 🧪 Generate unit tests (beta)
    • Create PR with unit tests
    • Post copyable unit tests in a comment
    • Commit unit tests in branch bmuddha/fix/blockhash-not-found

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
magicblock-aperture/src/state/blocks.rs (1)

91-107: RPC handlers lack guards for empty block cache case

The concern is valid. get_latest() returns BlockHashInfo::default() when the cache is empty (at server startup before the first block update arrives), and the two RPC handlers that convert this directly to RpcBlockhash have no guards:

  • get_latest_blockhash handler (line 14): directly converts result
  • simulate_transaction handler (line 50): directly converts result

Since there's no explicit initialization wait before handling requests, a client request arriving before the first BlockUpdate from the validator would receive a response with zero-valued blockhash/validity fields.

Suggested fixes (a sketch of the first option follows this list):

  • Return Option<BlockHashInfo> from get_latest() to make the empty-cache case explicit, requiring callers to handle it explicitly
  • Or add validation in handlers to check for default/sentinel values before converting to RpcBlockhash
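
As an illustration of the first option, here is a hedged sketch of an Option-returning accessor built on arc_swap's ArcSwapOption; the field names and the error type are illustrative rather than the repository's actual API.

```rust
use std::sync::Arc;

use arc_swap::ArcSwapOption;

pub struct BlockHashInfo {
    pub hash: [u8; 32], // stand-in for the real blockhash type
    pub last_valid_slot: u64,
}

pub struct BlocksCache {
    // None until the first BlockUpdate has been processed.
    latest: ArcSwapOption<BlockHashInfo>,
}

impl BlocksCache {
    pub fn empty() -> Self {
        Self { latest: ArcSwapOption::empty() }
    }

    pub fn set_latest(&self, info: BlockHashInfo) {
        self.latest.store(Some(Arc::new(info)));
    }

    /// Option makes the "no block cached yet" case explicit for callers.
    pub fn get_latest(&self) -> Option<Arc<BlockHashInfo>> {
        self.latest.load_full()
    }
}

// Handler-side guard instead of silently returning a zeroed blockhash.
fn get_latest_blockhash(cache: &BlocksCache) -> Result<Arc<BlockHashInfo>, &'static str> {
    cache
        .get_latest()
        .ok_or("no block has been cached yet; the validator is still starting up")
}

fn main() {
    let cache = BlocksCache::empty();
    assert!(get_latest_blockhash(&cache).is_err());
    cache.set_latest(BlockHashInfo { hash: [1; 32], last_valid_slot: 64 });
    assert_eq!(get_latest_blockhash(&cache).unwrap().last_valid_slot, 64);
}
```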
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bb08ebd and d24885f.

📒 Files selected for processing (3)
  • magicblock-aperture/src/processor.rs (1 hunks)
  • magicblock-aperture/src/state/blocks.rs (5 hunks)
  • magicblock-aperture/src/state/mod.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (6)
📚 Learning: 2025-11-07T13:20:13.793Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: magicblock-processor/src/scheduler/coordinator.rs:227-238
Timestamp: 2025-11-07T13:20:13.793Z
Learning: In magicblock-processor's ExecutionCoordinator (scheduler/coordinator.rs), the `account_contention` HashMap intentionally does not call `shrink_to_fit()`. Maintaining slack capacity is beneficial for performance by avoiding frequent reallocations during high transaction throughput. As long as empty entries are removed from the map (which `clear_account_contention` does), the capacity overhead is acceptable.

Applied to files:

  • magicblock-aperture/src/processor.rs
  • magicblock-aperture/src/state/blocks.rs
📚 Learning: 2025-10-21T14:00:54.642Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579

Applied to files:

  • magicblock-aperture/src/processor.rs
📚 Learning: 2025-10-28T13:15:42.706Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 596
File: magicblock-processor/src/scheduler.rs:1-1
Timestamp: 2025-10-28T13:15:42.706Z
Learning: In magicblock-processor, transaction indexes were always set to 0 even before the changes in PR #596. The proper transaction indexing within slots will be addressed during the planned ledger rewrite.

Applied to files:

  • magicblock-aperture/src/processor.rs
📚 Learning: 2025-11-07T14:20:31.457Z
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 621
File: magicblock-chainlink/src/remote_account_provider/chain_pubsub_actor.rs:457-495
Timestamp: 2025-11-07T14:20:31.457Z
Learning: In magicblock-chainlink/src/remote_account_provider/chain_pubsub_client.rs, the unsubscribe closure returned by PubSubConnection::account_subscribe(...) resolves to () (unit), not a Result. Downstream code should not attempt to inspect an unsubscribe result and can optionally wrap it in a timeout to guard against hangs.

Applied to files:

  • magicblock-aperture/src/processor.rs
📚 Learning: 2025-10-21T11:00:18.396Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/encoder.rs:176-187
Timestamp: 2025-10-21T11:00:18.396Z
Learning: In the magicblock validator, the current slot is always the root slot. The SlotEncoder in magicblock-aperture/src/encoder.rs correctly sets `root: slot` because there is no lag between current and root slots in this architecture.

Applied to files:

  • magicblock-aperture/src/state/blocks.rs
📚 Learning: 2025-11-04T10:48:00.070Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: magicblock-processor/src/scheduler/mod.rs:217-219
Timestamp: 2025-11-04T10:48:00.070Z
Learning: In magicblock-validator, the codebase uses a pattern where types containing non-Send/non-Sync fields (like Rc<RefCell<...>>) are marked with unsafe impl Send when they are guaranteed to be confined to a single thread through careful API design and thread spawning patterns.

Applied to files:

  • magicblock-aperture/src/state/blocks.rs
🧬 Code graph analysis (3)
magicblock-aperture/src/state/mod.rs (2)
magicblock-aperture/src/processor.rs (1)
  • new (52-61)
magicblock-aperture/src/state/blocks.rs (1)
  • new (59-73)
magicblock-aperture/src/processor.rs (2)
magicblock-aperture/src/state/subscriptions.rs (1)
  • send_slot (258-260)
magicblock-aperture/src/state/blocks.rs (1)
  • set_latest (76-89)
magicblock-aperture/src/state/blocks.rs (1)
magicblock-aperture/src/state/mod.rs (1)
  • new (76-93)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: run_make_ci_test
  • GitHub Check: Build Project
🔇 Additional comments (2)
magicblock-aperture/src/processor.rs (1)

85-99: Reordering block updates first in select! looks good

Prioritizing block_update_rx with biased; and updating subscriptions (send_slot) plus the blocks cache (set_latest) in a single arm matches the goal of making new blocks visible atomically from the processor’s point of view. This also removes the earlier duplication of the block‑update branch and keeps the loop easier to reason about.
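
For reference, a self-contained sketch of the biased select! shape described here, assuming tokio with mpsc channels; the struct, channel, and handler names are hypothetical rather than the processor's real ones.

```rust
use tokio::sync::mpsc;

#[derive(Debug)]
struct BlockUpdate {
    slot: u64,
}

#[derive(Debug)]
struct AccountUpdate;

async fn run(
    mut block_update_rx: mpsc::Receiver<BlockUpdate>,
    mut account_update_rx: mpsc::Receiver<AccountUpdate>,
) {
    loop {
        tokio::select! {
            // `biased;` polls branches in source order, so a ready block
            // update is always consumed before any pending account update.
            biased;
            Some(block) = block_update_rx.recv() => {
                // Stand-in for updating subscriptions and the blocks cache.
                println!("publish block at slot {}", block.slot);
            }
            Some(update) = account_update_rx.recv() => {
                println!("process {:?}", update);
            }
            else => break, // every channel is closed and drained
        }
    }
}

#[tokio::main]
async fn main() {
    let (block_tx, block_rx) = mpsc::channel(8);
    let (account_tx, account_rx) = mpsc::channel(8);
    // Queue one of each; the biased loop handles the block update first.
    account_tx.send(AccountUpdate).await.unwrap();
    block_tx.send(BlockUpdate { slot: 42 }).await.unwrap();
    drop(block_tx);
    drop(account_tx);
    run(block_rx, account_rx).await;
}
```

With biased;, readiness is checked in source order instead of at random, so a queued block update is applied to the cache before any other event in the same iteration is handled.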

magicblock-aperture/src/state/mod.rs (1)

76-92: Verify and document BlocksCache initialization guarantee in startup sequence

The concern in the original review is valid. RPC handlers call get_latest() and block_height() unconditionally without defensive checks, and these methods return BlockHashInfo::default() (zero blockhash/validity) or 0 (zero slot) respectively when latest is null.

For example, in get_blocks.rs, the handler uses self.blocks.block_height() to determine the query range upper bound, and a zero return would cause incorrect slot range handling. Similarly, get_latest_blockhash directly uses get_latest() without checking for the uninitialized state.

Action required: Verify and document in code comments or architecture notes that the first BlockUpdate is guaranteed to be processed before the RPC server accepts client connections. If this guarantee does not hold, add defensive checks (e.g., return error/Option) at call sites or initialize BlocksCache with a seed blockhash.

@snawaz

snawaz commented Nov 20, 2025

Alright. I see you also pushed the "fix: UB with use after free" commit. I'm not familiar enough with arc_swap internals to properly review that one right now... so if you're confident in it, feel free to merge... I might revisit it later once I've gone through how arc_swap works under the hood.

first commit

I found the fix in the first commit much simpler and easier to reason about, but it needs two tweaks in my opinion. Here is my thought process:

- let prev = self.latest.swap(last, Ordering::Release);
+ let prev = self.latest.swap(last, Ordering::AcqRel);

swap does two things: a LOAD and a STORE. With Release, only the STORE is Release, while the LOAD becomes Relaxed. The docs say:

Note that using Acquire makes the store part of this operation Relaxed, and using Release makes the load part Relaxed.

We need Acquire on that LOAD to safely reclaim the old pointer, and we need Release to publish the new one. So AcqRel is the correct ordering here.

And the second change:

-let latest = self.latest.load(Ordering::Relaxed);
+let latest = self.latest.load(Ordering::Acquire);

Now with these two changes, it seems good as far as LOAD and STORE are concerned.

but then...wait!

danger ⚠️

But after I finished writing this comment, I realized there is one danger still lurking around (even with the two tweaks): some code might still be holding a reference to the now-deleted pointer. So I’m guessing arc_swap is exactly what handles that part as well. 🤔
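
For what it's worth, the lingering-reference part is exactly what arc_swap covers: readers get a reference-counted handle, so the previous allocation is freed only after the last such handle is dropped. A self-contained sketch with a toy u64 payload (not the validator's types):

```rust
use std::sync::Arc;
use std::thread;

use arc_swap::ArcSwap;

fn main() {
    let latest = Arc::new(ArcSwap::from_pointee(1u64));

    let reader = {
        let latest = Arc::clone(&latest);
        thread::spawn(move || {
            // load_full() hands back an Arc; even if the writer swaps in a
            // new value right after this line, `seen` stays valid until the
            // Arc is dropped, so there is no use-after-free window.
            let seen: Arc<u64> = latest.load_full();
            *seen
        })
    };

    // Writer publishes a new value; the old allocation is deallocated only
    // once every outstanding reader handle is gone.
    latest.store(Arc::new(2u64));

    let observed = reader.join().unwrap();
    assert!(observed == 1 || observed == 2);
    assert_eq!(**latest.load(), 2);
}
```

With a bare AtomicPtr the writer has no way to know when it is safe to free the old pointer, which is the gap AcqRel ordering alone cannot close; arc_swap's counted handles (plus its internal bookkeeping for the fast load() path) carry that information.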


@snawaz snawaz left a comment


looks good to me. Great use of arc_swap.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 62b47f3 and 68a6e26.

📒 Files selected for processing (2)
  • magicblock-aperture/src/state/blocks.rs (5 hunks)
  • magicblock-aperture/src/state/mod.rs (2 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 650
File: magicblock-chainlink/src/submux/subscription_task.rs:13-99
Timestamp: 2025-11-20T08:57:07.189Z
Learning: In the magicblock-validator repository, avoid posting review comments that merely confirm code is correct or matches intended behavior without providing actionable feedback, suggestions for improvement, or identifying potential issues. Such confirmatory comments are considered unhelpful noise by the maintainers.
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: magicblock-processor/src/scheduler/coordinator.rs:227-238
Timestamp: 2025-11-07T13:20:13.793Z
Learning: In magicblock-processor's ExecutionCoordinator (scheduler/coordinator.rs), the `account_contention` HashMap intentionally does not call `shrink_to_fit()`. Maintaining slack capacity is beneficial for performance by avoiding frequent reallocations during high transaction throughput. As long as empty entries are removed from the map (which `clear_account_contention` does), the capacity overhead is acceptable.
📚 Learning: 2025-11-04T10:48:00.070Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: magicblock-processor/src/scheduler/mod.rs:217-219
Timestamp: 2025-11-04T10:48:00.070Z
Learning: In magicblock-validator, the codebase uses a pattern where types containing non-Send/non-Sync fields (like Rc<RefCell<...>>) are marked with unsafe impl Send when they are guaranteed to be confined to a single thread through careful API design and thread spawning patterns.

Applied to files:

  • magicblock-aperture/src/state/blocks.rs
📚 Learning: 2025-11-07T13:20:13.793Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: magicblock-processor/src/scheduler/coordinator.rs:227-238
Timestamp: 2025-11-07T13:20:13.793Z
Learning: In magicblock-processor's ExecutionCoordinator (scheduler/coordinator.rs), the `account_contention` HashMap intentionally does not call `shrink_to_fit()`. Maintaining slack capacity is beneficial for performance by avoiding frequent reallocations during high transaction throughput. As long as empty entries are removed from the map (which `clear_account_contention` does), the capacity overhead is acceptable.

Applied to files:

  • magicblock-aperture/src/state/blocks.rs
  • magicblock-aperture/src/state/mod.rs
🧬 Code graph analysis (1)
magicblock-aperture/src/state/blocks.rs (2)
magicblock-aperture/src/state/mod.rs (1)
  • new (76-98)
magicblock-aperture/src/processor.rs (1)
  • new (52-61)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: run_make_ci_test
  • GitHub Check: run_make_ci_lint
  • GitHub Check: Build Project

@bmuddha bmuddha merged commit ce85be3 into master Nov 20, 2025
18 checks passed
@bmuddha bmuddha deleted the bmuddha/fix/blockhash-not-found branch November 20, 2025 11:28
bmuddha added a commit that referenced this pull request Nov 20, 2025
This fixes the issue where the blockhash returned
to the client didn't exist in the cache due to timing differences between
block update and cache update.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Refactor**
* Improved block update processing for more timely and reliable event
handling by reordering event handling and removing duplicate paths
* Replaced prior block state tracking with a lock-free, atomic approach
to reduce contention and improve performance
* Simplified initialization flow by consolidating how the latest block
state is provided

* **Chores**
* Introduced an atomic caching library to support the new lock-free
latest-block tracking

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
thlorenz added a commit that referenced this pull request Nov 21, 2025
* master:
  feat: use latest svm version (#657)
  chore: update solana account (#660)
  fix: better transaction diagnostics & rent exemption check (#642)
  chore: add access-control-max-age header to cors (#654)
  fix(aperture): prevent racy getLatestBlockhash (#649)
  fix: await until sub is established and perform them in parallel (#650)
  feat: persist all accounts (#648)
Dodecahedr0x pushed a commit that referenced this pull request Nov 24, 2025
This fixes the issue where the blockhash returned
to the client didn't exist in the cache due to timing differences between
block update and cache update.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Refactor**
* Improved block update processing for more timely and reliable event
handling by reordering event handling and removing duplicate paths
* Replaced prior block state tracking with a lock-free, atomic approach
to reduce contention and improve performance
* Simplified initialization flow by consolidating how the latest block
state is provided

* **Chores**
* Introduced an atomic caching library to support the new lock-free
latest-block tracking

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
