Skip to content

Conversation

@GabrielePicco
Copy link
Collaborator

@GabrielePicco GabrielePicco commented Dec 5, 2025

Summary by CodeRabbit

  • Bug Fixes
    • Enhanced concurrent account access handling to improve stability and prevent race conditions during simultaneous operations.

✏️ Tip: You can customize this high-level summary in your review settings.

Summary by CodeRabbit

  • Bug Fixes

    • Improved concurrent account access handling to reduce race conditions and return more consistent results under contention.
  • Refactor

    • Reworked internal retry and locking behavior for more reliable reads during concurrent operations, improving stability without changing external interfaces.

✏️ Tip: You can customize this high-level summary in your review settings.

@github-actions
Copy link

github-actions bot commented Dec 5, 2025

Manual Deploy Available

You can trigger a manual deploy of this PR branch to testnet:

Deploy to Testnet 🚀

Alternative: Comment /deploy on this PR to trigger deployment directly.

⚠️ Note: Manual deploy requires authorization. Only authorized users can trigger deployments.

Comment updated automatically when the PR is synchronized.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 5, 2025

📝 Walkthrough

Walkthrough

The read_locked path in the accounts module keeps the fast (no-concurrent-write) early return. The slow path now only applies to Borrowed accounts and uses a per-iteration retry loop: each iteration calls borrowed.reinit() to get a fresh account view, invokes the reader, then calls lock.relock() to obtain a fresh lock snapshot and check for changes. The loop retries if the lock indicates a change; if no change is detected the result is returned. The previous approach of mutating a reused account variable with a single pre-cloned lock was removed and inline comments were updated.

Suggested reviewers

  • bmuddha
  • taco-paco
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch picco/fix-per-account-livelock

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cf52d6b and 6e8a535.

📒 Files selected for processing (1)
  • magicblock-core/src/link/accounts.rs (2 hunks)
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-accounts-db/src/lib.rs:63-72
Timestamp: 2025-10-21T10:34:59.140Z
Learning: In magicblock-validator, the AccountsDb "stop-the-world" synchronizer is managed at the processor/executor level, not at the AccountsDb API level. Transaction executors in magicblock-processor hold a read lock (sync.read()) for the duration of each slot and release it only at slot boundaries, ensuring all account writes happen under the read lock. Snapshot operations acquire a write lock, blocking until all executors release their read locks. This pattern ensures mutual exclusion between writes and snapshots without requiring read guards in AccountsDb write APIs.
Learnt from: Dodecahedr0x
Repo: magicblock-labs/magicblock-validator PR: 639
File: test-integration/test-committor-service/tests/test_ix_commit_local.rs:867-881
Timestamp: 2025-12-03T09:33:48.707Z
Learning: Repo: magicblock-labs/magicblock-validator PR: 639
Context: test-integration/test-committor-service/tests/test_ix_commit_local.rs (ix_commit_local)
Learning: The PhotonIndexer used for compressed account fetches (get_compressed_account) has built‑in retry logic (defaults to ~10 attempts), so tests should not add separate retry loops around compressed fetches unless there’s a specific need.
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: magicblock-processor/src/scheduler/coordinator.rs:227-238
Timestamp: 2025-11-07T13:20:13.793Z
Learning: In magicblock-processor's ExecutionCoordinator (scheduler/coordinator.rs), the `account_contention` HashMap intentionally does not call `shrink_to_fit()`. Maintaining slack capacity is beneficial for performance by avoiding frequent reallocations during high transaction throughput. As long as empty entries are removed from the map (which `clear_account_contention` does), the capacity overhead is acceptable.
📚 Learning: 2025-12-03T09:36:01.527Z
Learnt from: Dodecahedr0x
Repo: magicblock-labs/magicblock-validator PR: 639
File: magicblock-chainlink/src/remote_account_provider/mod.rs:1350-1353
Timestamp: 2025-12-03T09:36:01.527Z
Learning: Repo: magicblock-labs/magicblock-validator
File: magicblock-chainlink/src/remote_account_provider/mod.rs
Context: consolidate_fetched_remote_accounts
Learning: For unexpected result counts (>2), the project prefers logging an error and returning an empty Vec over panicking; acceptable during development per maintainer (Dodecahedr0x).

Applied to files:

  • magicblock-core/src/link/accounts.rs
📚 Learning: 2025-11-07T13:20:13.793Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: magicblock-processor/src/scheduler/coordinator.rs:227-238
Timestamp: 2025-11-07T13:20:13.793Z
Learning: In magicblock-processor's ExecutionCoordinator (scheduler/coordinator.rs), the `account_contention` HashMap intentionally does not call `shrink_to_fit()`. Maintaining slack capacity is beneficial for performance by avoiding frequent reallocations during high transaction throughput. As long as empty entries are removed from the map (which `clear_account_contention` does), the capacity overhead is acceptable.

Applied to files:

  • magicblock-core/src/link/accounts.rs
📚 Learning: 2025-10-21T10:34:59.140Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-accounts-db/src/lib.rs:63-72
Timestamp: 2025-10-21T10:34:59.140Z
Learning: In magicblock-validator, the AccountsDb "stop-the-world" synchronizer is managed at the processor/executor level, not at the AccountsDb API level. Transaction executors in magicblock-processor hold a read lock (sync.read()) for the duration of each slot and release it only at slot boundaries, ensuring all account writes happen under the read lock. Snapshot operations acquire a write lock, blocking until all executors release their read locks. This pattern ensures mutual exclusion between writes and snapshots without requiring read guards in AccountsDb write APIs.

Applied to files:

  • magicblock-core/src/link/accounts.rs
📚 Learning: 2025-12-03T09:33:48.707Z
Learnt from: Dodecahedr0x
Repo: magicblock-labs/magicblock-validator PR: 639
File: test-integration/test-committor-service/tests/test_ix_commit_local.rs:867-881
Timestamp: 2025-12-03T09:33:48.707Z
Learning: Repo: magicblock-labs/magicblock-validator PR: 639
Context: test-integration/test-committor-service/tests/test_ix_commit_local.rs (ix_commit_local)
Learning: The PhotonIndexer used for compressed account fetches (get_compressed_account) has built‑in retry logic (defaults to ~10 attempts), so tests should not add separate retry loops around compressed fetches unless there’s a specific need.

Applied to files:

  • magicblock-core/src/link/accounts.rs
📚 Learning: 2025-11-19T09:34:37.917Z
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 621
File: test-integration/test-chainlink/tests/ix_remote_account_provider.rs:62-63
Timestamp: 2025-11-19T09:34:37.917Z
Learning: In test-integration/test-chainlink/tests/ix_remote_account_provider.rs and similar test files, the `_fwd_rx` receiver returned by `init_remote_account_provider()` is intentionally kept alive (but unused) to prevent "receiver dropped" errors on the sender side. The pattern `let (remote_account_provider, _fwd_rx) = init_remote_account_provider().await;` should NOT be changed to `let (remote_account_provider, _) = ...` because dropping the receiver would cause send() operations to fail.

Applied to files:

  • magicblock-core/src/link/accounts.rs
🧬 Code graph analysis (1)
magicblock-core/src/link/accounts.rs (1)
magicblock-accounts-db/src/lib.rs (1)
  • reader (207-213)
🔇 Additional comments (3)
magicblock-core/src/link/accounts.rs (3)

90-94: LGTM: Fast path documentation improvement.

The comment clarification helps readers understand that this is the "fast path" where no concurrent write was detected during the optimistic read.


96-103: LGTM: Slow path setup with clear documentation.

The updated comment accurately clarifies that the retry loop only applies to borrowed accounts. The mut lock binding on line 100 is correctly prepared for the relock() calls in the loop.


105-113: LGTM: Critical livelock fix with fresh snapshots per iteration.

This correctly fixes the livelock by ensuring each retry iteration operates on fresh state:

  1. Line 106 (lock.relock()): Refreshes the lock snapshot, preventing stale lock comparisons
  2. Line 107 (borrowed.reinit()): Obtains a fresh account view synchronized with the current buffer state
  3. Lines 109-111: Checks the refreshed lock and exits immediately when no concurrent write is detected

The fix prevents the infinite loop that occurred when the backing buffer was reinitialized by ensuring each iteration compares against the current lock state, not a stale snapshot. The logic is sound and well-documented.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Collaborator

@thlorenz thlorenz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but @bmuddha should do final approval

Copy link
Contributor

@bmuddha bmuddha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@GabrielePicco GabrielePicco merged commit 17c601b into master Dec 5, 2025
18 of 19 checks passed
@GabrielePicco GabrielePicco deleted the picco/fix-per-account-livelock branch December 5, 2025 10:57
thlorenz added a commit that referenced this pull request Dec 9, 2025
* master:
  feat: add program subscription support (#703)
  feat: add a way to remove confined accounts on start (#726)
  chore: remove unused dependencies in the workspace  (#732)
  fix: prevent removal of delegated/undelegating accounts on eviction (#724)
  fix: Fix per-account livelock (#728)
  fix: properly prune program cache upon corruption (#730)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants