-
Notifications
You must be signed in to change notification settings - Fork 23
fix(aperture): prevent racy getLatestBlockhash #649
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This fixes the issue when the blockhash returned to the client didn't exist in the cache do to timing differences between block update and cache update.
Manual Deploy AvailableYou can trigger a manual deploy of this PR branch to testnet: Alternative: Comment
Comment updated automatically when the PR is synchronized. |
WalkthroughMoved block-update handling earlier in the processor select! loop and removed a duplicate branch; replaced latest-block storage with an ArcSwapAny-backed atomic pointer by introducing LastCachedBlock and updating BlocksCache to use it; added arc-swap dependency. Changes
Sequence Diagram(s)sequenceDiagram
participant Processor
participant BlockUpdateRx as block_update_rx
participant AccountRx as account_update_rx
participant TxRx as tx_status_update_rx
participant BlocksCache
rect rgba(135,206,250,0.08)
Note right of Processor: select! loop (new order)
end
block_update_rx->>Processor: Ok(latest block)
Processor->>BlocksCache: set_latest(LastCachedBlock)
BlocksCache-->>Processor: ack
alt account update arrives
account_update_rx->>Processor: AccountUpdate
Processor->>Processor: process account update
end
alt tx status arrives
TxRx->>Processor: TxStatusUpdate
Processor->>Processor: process tx status update
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Suggested reviewers
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
magicblock-aperture/src/state/blocks.rs (1)
91-107: RPC handlers lack guards for empty block cache caseThe concern is valid.
get_latest()returnsBlockHashInfo::default()when the cache is empty (at server startup before the first block update arrives), and the two RPC handlers that convert this directly toRpcBlockhashhave no guards:
get_latest_blockhashhandler (line 14): directly converts resultsimulate_transactionhandler (line 50): directly converts resultSince there's no explicit initialization wait before handling requests, a client request arriving before the first
BlockUpdatefrom the validator would receive a response with zero-valued blockhash/validity fields.Suggested fixes:
- Return
Option<BlockHashInfo>fromget_latest()to make the empty-cache case explicit, requiring callers to explicitly handle it- Or add validation in handlers to check for default/sentinel values before converting to
RpcBlockhash
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
magicblock-aperture/src/processor.rs(1 hunks)magicblock-aperture/src/state/blocks.rs(5 hunks)magicblock-aperture/src/state/mod.rs(1 hunks)
🧰 Additional context used
🧠 Learnings (6)
📚 Learning: 2025-11-07T13:20:13.793Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: magicblock-processor/src/scheduler/coordinator.rs:227-238
Timestamp: 2025-11-07T13:20:13.793Z
Learning: In magicblock-processor's ExecutionCoordinator (scheduler/coordinator.rs), the `account_contention` HashMap intentionally does not call `shrink_to_fit()`. Maintaining slack capacity is beneficial for performance by avoiding frequent reallocations during high transaction throughput. As long as empty entries are removed from the map (which `clear_account_contention` does), the capacity overhead is acceptable.
Applied to files:
magicblock-aperture/src/processor.rsmagicblock-aperture/src/state/blocks.rs
📚 Learning: 2025-10-21T14:00:54.642Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579
Applied to files:
magicblock-aperture/src/processor.rs
📚 Learning: 2025-10-28T13:15:42.706Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 596
File: magicblock-processor/src/scheduler.rs:1-1
Timestamp: 2025-10-28T13:15:42.706Z
Learning: In magicblock-processor, transaction indexes were always set to 0 even before the changes in PR #596. The proper transaction indexing within slots will be addressed during the planned ledger rewrite.
Applied to files:
magicblock-aperture/src/processor.rs
📚 Learning: 2025-11-07T14:20:31.457Z
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 621
File: magicblock-chainlink/src/remote_account_provider/chain_pubsub_actor.rs:457-495
Timestamp: 2025-11-07T14:20:31.457Z
Learning: In magicblock-chainlink/src/remote_account_provider/chain_pubsub_client.rs, the unsubscribe closure returned by PubSubConnection::account_subscribe(...) resolves to () (unit), not a Result. Downstream code should not attempt to inspect an unsubscribe result and can optionally wrap it in a timeout to guard against hangs.
Applied to files:
magicblock-aperture/src/processor.rs
📚 Learning: 2025-10-21T11:00:18.396Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/encoder.rs:176-187
Timestamp: 2025-10-21T11:00:18.396Z
Learning: In the magicblock validator, the current slot is always the root slot. The SlotEncoder in magicblock-aperture/src/encoder.rs correctly sets `root: slot` because there is no lag between current and root slots in this architecture.
Applied to files:
magicblock-aperture/src/state/blocks.rs
📚 Learning: 2025-11-04T10:48:00.070Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: magicblock-processor/src/scheduler/mod.rs:217-219
Timestamp: 2025-11-04T10:48:00.070Z
Learning: In magicblock-validator, the codebase uses a pattern where types containing non-Send/non-Sync fields (like Rc<RefCell<...>>) are marked with unsafe impl Send when they are guaranteed to be confined to a single thread through careful API design and thread spawning patterns.
Applied to files:
magicblock-aperture/src/state/blocks.rs
🧬 Code graph analysis (3)
magicblock-aperture/src/state/mod.rs (2)
magicblock-aperture/src/processor.rs (1)
new(52-61)magicblock-aperture/src/state/blocks.rs (1)
new(59-73)
magicblock-aperture/src/processor.rs (2)
magicblock-aperture/src/state/subscriptions.rs (1)
send_slot(258-260)magicblock-aperture/src/state/blocks.rs (1)
set_latest(76-89)
magicblock-aperture/src/state/blocks.rs (1)
magicblock-aperture/src/state/mod.rs (1)
new(76-93)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: run_make_ci_test
- GitHub Check: Build Project
🔇 Additional comments (2)
magicblock-aperture/src/processor.rs (1)
85-99: Reordering block updates first inselect!looks goodPrioritizing
block_update_rxwithbiased;and updating subscriptions (send_slot) plus the blocks cache (set_latest) in a single arm matches the goal of making new blocks visible atomically from the processor’s point of view. This also removes the earlier duplication of the block‑update branch and keeps the loop easier to reason about.magicblock-aperture/src/state/mod.rs (1)
76-92: Verify and document BlocksCache initialization guarantee in startup sequenceThe concern in the original review is valid. RPC handlers call
get_latest()andblock_height()unconditionally without defensive checks, and these methods returnBlockHashInfo::default()(zero blockhash/validity) or0(zero slot) respectively whenlatestis null.For example, in
get_blocks.rs, the handler usesself.blocks.block_height()to determine the query range upper bound, and a zero return would cause incorrect slot range handling. Similarly,get_latest_blockhashdirectly usesget_latest()without checking for the uninitialized state.Action required: Verify and document in code comments or architecture notes that the first
BlockUpdateis guaranteed to be processed before the RPC server accepts client connections. If this guarantee does not hold, add defensive checks (e.g., return error/Option) at call sites or initializeBlocksCachewith a seed blockhash.
|
Alright. I see you also pushed the first commitI found the fix in the first commit much simpler and easier to reason about, but it needs two tweaks in my opinion. Here is my thought process: - let prev = self.latest.swap(last, Ordering::Release);
+ let prev = self.latest.swap(last, Ordering::AcqRel);
We need And the second change: -let latest = self.latest.load(Ordering::Relaxed);
+let latest = self.latest.load(Ordering::Acquire);Now with these two changes, it seems good as far as LOAD and STORE are concerned. but then...wait! danger
|
snawaz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me. Great use of arc_swap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
magicblock-aperture/src/state/blocks.rs(5 hunks)magicblock-aperture/src/state/mod.rs(2 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 650
File: magicblock-chainlink/src/submux/subscription_task.rs:13-99
Timestamp: 2025-11-20T08:57:07.189Z
Learning: In the magicblock-validator repository, avoid posting review comments that merely confirm code is correct or matches intended behavior without providing actionable feedback, suggestions for improvement, or identifying potential issues. Such confirmatory comments are considered unhelpful noise by the maintainers.
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: magicblock-processor/src/scheduler/coordinator.rs:227-238
Timestamp: 2025-11-07T13:20:13.793Z
Learning: In magicblock-processor's ExecutionCoordinator (scheduler/coordinator.rs), the `account_contention` HashMap intentionally does not call `shrink_to_fit()`. Maintaining slack capacity is beneficial for performance by avoiding frequent reallocations during high transaction throughput. As long as empty entries are removed from the map (which `clear_account_contention` does), the capacity overhead is acceptable.
📚 Learning: 2025-11-04T10:48:00.070Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: magicblock-processor/src/scheduler/mod.rs:217-219
Timestamp: 2025-11-04T10:48:00.070Z
Learning: In magicblock-validator, the codebase uses a pattern where types containing non-Send/non-Sync fields (like Rc<RefCell<...>>) are marked with unsafe impl Send when they are guaranteed to be confined to a single thread through careful API design and thread spawning patterns.
Applied to files:
magicblock-aperture/src/state/blocks.rs
📚 Learning: 2025-11-07T13:20:13.793Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: magicblock-processor/src/scheduler/coordinator.rs:227-238
Timestamp: 2025-11-07T13:20:13.793Z
Learning: In magicblock-processor's ExecutionCoordinator (scheduler/coordinator.rs), the `account_contention` HashMap intentionally does not call `shrink_to_fit()`. Maintaining slack capacity is beneficial for performance by avoiding frequent reallocations during high transaction throughput. As long as empty entries are removed from the map (which `clear_account_contention` does), the capacity overhead is acceptable.
Applied to files:
magicblock-aperture/src/state/blocks.rsmagicblock-aperture/src/state/mod.rs
🧬 Code graph analysis (1)
magicblock-aperture/src/state/blocks.rs (2)
magicblock-aperture/src/state/mod.rs (1)
new(76-98)magicblock-aperture/src/processor.rs (1)
new(52-61)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: run_make_ci_test
- GitHub Check: run_make_ci_lint
- GitHub Check: Build Project
This fixes the issue when the blockhash returned to the client didn't exist in the cache do to timing differences between block update and cache update. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Refactor** * Improved block update processing for more timely and reliable event handling by reordering event handling and removing duplicate paths * Replaced prior block state tracking with a lock-free, atomic approach to reduce contention and improve performance * Simplified initialization flow by consolidating how the latest block state is provided * **Chores** * Introduced an atomic caching library to support the new lock-free latest-block tracking <sub>✏️ Tip: You can customize this high-level summary in your review settings.</sub> <!-- end of auto-generated comment: release notes by coderabbit.ai -->
* master: feat: use latest svm version (#657) chore: update solana account (#660) fix: better transaction diagnostics & rent exemption check (#642) chore: add access-control-max-age header to cors (#654) fix(aperture): prevent racy getLatestBlockhash (#649) fix: await until sub is established and perform them in parallel (#650) feat: persist all accounts (#648)
This fixes the issue when the blockhash returned to the client didn't exist in the cache do to timing differences between block update and cache update. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Refactor** * Improved block update processing for more timely and reliable event handling by reordering event handling and removing duplicate paths * Replaced prior block state tracking with a lock-free, atomic approach to reduce contention and improve performance * Simplified initialization flow by consolidating how the latest block state is provided * **Chores** * Introduced an atomic caching library to support the new lock-free latest-block tracking <sub>✏️ Tip: You can customize this high-level summary in your review settings.</sub> <!-- end of auto-generated comment: release notes by coderabbit.ai -->
This fixes the issue when the blockhash returned
to the client didn't exist in the cache do to timing differences between block update and cache update.
Summary by CodeRabbit
Refactor
Chores
✏️ Tip: You can customize this high-level summary in your review settings.