fix: make db thread safe #680

Dodecahedr0x · 2025-11-24T15:22:34Z

TaskSchedulerService has unsafe impl Sync for TaskSchedulerService {} despite SchedulerDatabase being a wrapper around rusqlite::Connection, which is not thread-safe. This is not yet a critical issue because it is only used by a single thread but could become a problem.

This solves it by wrapping the database with a RwLock

Summary by CodeRabbit

Refactor
- Database access converted to async and made concurrency-safe; scheduler service APIs converted to async and now await DB operations.
Bug Fixes
- More reliable persistence of execution updates and improved error/recovery propagation during scheduling and cancellation.
Tests
- Integration and unit tests updated to drive async DB calls via a runtime.
Chores
- Startup sequence now awaits scheduler initialization.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

github-actions · 2025-11-24T15:22:46Z

Manual Deploy Available

You can trigger a manual deploy of this PR branch to testnet:

Deploy to Testnet 🚀

Alternative: Comment /deploy on this PR to trigger deployment directly.

⚠️ Note: Manual deploy requires authorization. Only authorized users can trigger deployments.

Comment updated automatically when the PR is synchronized.

coderabbitai · 2025-11-24T15:22:56Z

📝 Walkthrough

Walkthrough

SchedulerDatabase.conn was changed from a synchronous Connection to tokio::sync::Mutex<Connection> and is initialized with Mutex::new(conn). Nearly all DB methods were converted from sync to async and now acquire the mutex via .lock().await before preparing/executing SQL. TaskScheduler service methods (start, process_request, process_cancel_request, register_task, unregister_task, execute paths) were updated to call and await the new async DB APIs. Unit and integration tests were adjusted to drive async DB calls (some via tokio::runtime::Runtime::block_on).

Possibly related PRs

fix: use isolated runtimes for critical components #682: Modifies magicblock-task-scheduler/src/service.rs and changes scheduler start/spawn semantics and async signatures, impacting the same service start/run flow.
feat: replace task context with thread-local storage #614: Converts service and DB interfaces to async (start, register/unregister, process_*), matching the async API changes in this PR.

Suggested reviewers

thlorenz
GabrielePicco
taco-paco

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch fix/thread-safe-crank

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c44beaa and 4d4ab44.

📒 Files selected for processing (9)

magicblock-api/src/magic_validator.rs (1 hunks)
magicblock-task-scheduler/src/db.rs (9 hunks)
magicblock-task-scheduler/src/service.rs (8 hunks)
magicblock-task-scheduler/tests/service.rs (3 hunks)
test-integration/test-task-scheduler/tests/test_cancel_ongoing_task.rs (5 hunks)
test-integration/test-task-scheduler/tests/test_reschedule_task.rs (5 hunks)
test-integration/test-task-scheduler/tests/test_schedule_error.rs (6 hunks)
test-integration/test-task-scheduler/tests/test_schedule_task.rs (5 hunks)
test-integration/test-task-scheduler/tests/test_unauthorized_reschedule.rs (4 hunks)

🧰 Additional context used

🧠 Learnings (5)

📓 Common learnings

Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: magicblock-processor/src/scheduler/locks.rs:110-122
Timestamp: 2025-11-04T10:53:50.922Z
Learning: In magicblock-processor, the TransactionScheduler runs in a single, dedicated thread and will always remain single-threaded. The `next_transaction_id()` function in scheduler/locks.rs uses `unsafe static mut` which is safe given this architectural guarantee.

Learnt from: Dodecahedr0x
Repo: magicblock-labs/magicblock-validator PR: 614
File: magicblock-task-scheduler/src/db.rs:26-0
Timestamp: 2025-11-12T09:46:27.553Z
Learning: In magicblock-task-scheduler, task parameter validation (including ensuring iterations > 0 and enforcing minimum execution intervals) is performed in the Magic program (on-chain) before ScheduleTaskRequest instances reach the scheduler service. The From<&ScheduleTaskRequest> conversion in db.rs does not need additional validation because inputs are already validated at the program level.

Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: magicblock-processor/src/scheduler/mod.rs:217-219
Timestamp: 2025-11-04T10:48:00.070Z
Learning: In magicblock-validator, the codebase uses a pattern where types containing non-Send/non-Sync fields (like Rc<RefCell<...>>) are marked with unsafe impl Send when they are guaranteed to be confined to a single thread through careful API design and thread spawning patterns.

Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-accounts-db/src/lib.rs:63-72
Timestamp: 2025-10-21T10:34:59.140Z
Learning: In magicblock-validator, the AccountsDb "stop-the-world" synchronizer is managed at the processor/executor level, not at the AccountsDb API level. Transaction executors in magicblock-processor hold a read lock (sync.read()) for the duration of each slot and release it only at slot boundaries, ensuring all account writes happen under the read lock. Snapshot operations acquire a write lock, blocking until all executors release their read locks. This pattern ensures mutual exclusion between writes and snapshots without requiring read guards in AccountsDb write APIs.

📚 Learning: 2025-11-07T13:09:52.253Z

Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: test-kit/src/lib.rs:275-0
Timestamp: 2025-11-07T13:09:52.253Z
Learning: In test-kit, the transaction scheduler in ExecutionTestEnv is not expected to shut down during tests. Therefore, using `.unwrap()` in test helper methods like `schedule_transaction` is acceptable and will not cause issues in the test environment.

Applied to files:

test-integration/test-task-scheduler/tests/test_schedule_task.rs
test-integration/test-task-scheduler/tests/test_cancel_ongoing_task.rs
magicblock-task-scheduler/tests/service.rs
magicblock-api/src/magic_validator.rs
test-integration/test-task-scheduler/tests/test_unauthorized_reschedule.rs
test-integration/test-task-scheduler/tests/test_schedule_error.rs
test-integration/test-task-scheduler/tests/test_reschedule_task.rs

📚 Learning: 2025-11-12T09:46:27.553Z

Learnt from: Dodecahedr0x
Repo: magicblock-labs/magicblock-validator PR: 614
File: magicblock-task-scheduler/src/db.rs:26-0
Timestamp: 2025-11-12T09:46:27.553Z
Learning: In magicblock-task-scheduler, task parameter validation (including ensuring iterations > 0 and enforcing minimum execution intervals) is performed in the Magic program (on-chain) before ScheduleTaskRequest instances reach the scheduler service. The From<&ScheduleTaskRequest> conversion in db.rs does not need additional validation because inputs are already validated at the program level.

Applied to files:

test-integration/test-task-scheduler/tests/test_schedule_task.rs
test-integration/test-task-scheduler/tests/test_cancel_ongoing_task.rs
magicblock-api/src/magic_validator.rs
test-integration/test-task-scheduler/tests/test_unauthorized_reschedule.rs
test-integration/test-task-scheduler/tests/test_schedule_error.rs
test-integration/test-task-scheduler/tests/test_reschedule_task.rs
magicblock-task-scheduler/src/db.rs
magicblock-task-scheduler/src/service.rs

📚 Learning: 2025-11-20T17:25:23.444Z

Learnt from: JMirval
Repo: magicblock-labs/magicblock-validator PR: 656
File: programs/guinea/src/lib.rs:33-33
Timestamp: 2025-11-20T17:25:23.444Z
Learning: In the magicblock-validator codebase, task IDs (task_id) are i64 values that can be negative. The task scheduler is designed to handle any i64 value including negative task IDs. Task IDs are randomly generated using rand::random() which produces values across the full i64 range. No validation is needed to restrict task IDs to positive values.

Applied to files:

test-integration/test-task-scheduler/tests/test_schedule_task.rs
test-integration/test-task-scheduler/tests/test_cancel_ongoing_task.rs
magicblock-api/src/magic_validator.rs
test-integration/test-task-scheduler/tests/test_unauthorized_reschedule.rs
test-integration/test-task-scheduler/tests/test_schedule_error.rs
test-integration/test-task-scheduler/tests/test_reschedule_task.rs

📚 Learning: 2025-11-04T10:53:50.922Z

Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: magicblock-processor/src/scheduler/locks.rs:110-122
Timestamp: 2025-11-04T10:53:50.922Z
Learning: In magicblock-processor, the TransactionScheduler runs in a single, dedicated thread and will always remain single-threaded. The `next_transaction_id()` function in scheduler/locks.rs uses `unsafe static mut` which is safe given this architectural guarantee.

Applied to files:

magicblock-task-scheduler/src/db.rs

🧬 Code graph analysis (6)

test-integration/test-task-scheduler/tests/test_schedule_task.rs (2)

magicblock-task-scheduler/src/db.rs (5)

new (67-108)

get_failed_schedulings (300-319)

get_failed_tasks (321-340)

get_task_ids (288-298)

get_task (206-244)

magicblock-task-scheduler/src/service.rs (1)

new (67-100)

test-integration/test-task-scheduler/tests/test_cancel_ongoing_task.rs (1)

magicblock-task-scheduler/src/db.rs (4)

get_failed_schedulings (300-319)

get_failed_tasks (321-340)

get_task_ids (288-298)

get_task (206-244)

test-integration/test-task-scheduler/tests/test_unauthorized_reschedule.rs (1)

magicblock-task-scheduler/src/db.rs (4)

get_failed_schedulings (300-319)

get_failed_tasks (321-340)

get_task_ids (288-298)

get_task (206-244)

test-integration/test-task-scheduler/tests/test_schedule_error.rs (1)

magicblock-task-scheduler/src/db.rs (5)

new (67-108)

get_failed_schedulings (300-319)

get_failed_tasks (321-340)

get_task_ids (288-298)

get_task (206-244)

test-integration/test-task-scheduler/tests/test_reschedule_task.rs (1)

magicblock-task-scheduler/src/db.rs (4)

get_failed_schedulings (300-319)

get_failed_tasks (321-340)

get_task_ids (288-298)

get_task (206-244)

magicblock-task-scheduler/src/service.rs (3)

magicblock-api/src/magic_validator.rs (1)

start (651-727)

magicblock-magic-program-api/src/args.rs (1)

id (163-168)

magicblock-task-scheduler/src/db.rs (2)

remove_task (194-204)

insert_failed_task (169-180)

🔇 Additional comments (12)

magicblock-api/src/magic_validator.rs (1)

702-702: LGTM! Async integration correctly implemented.

The change from task_scheduler.start()? to task_scheduler.start().await? correctly integrates with the async conversion of TaskSchedulerService::start(). The await is properly placed inside the async block spawned on line 701.

test-integration/test-task-scheduler/tests/test_cancel_ongoing_task.rs (1)

15-15: LGTM! Appropriate async handling pattern for integration tests.

The use of Runtime::new() combined with runtime.block_on() is the correct approach for calling async database methods from synchronous test code. The runtime is created once and reused for multiple operations, which is efficient.

Also applies to: 107-110, 119-120, 129-129, 138-138

magicblock-task-scheduler/tests/service.rs (1)

28-58: LGTM! Clean async test setup.

The conversion of setup() to an async function with proper await on .start() (lines 54-55) is correctly implemented. Using #[tokio::test] and native async/await is the cleanest approach for these unit tests.

magicblock-task-scheduler/src/service.rs (6)

102-124: LGTM! Async conversion correctly implemented.

The conversion of start() to async and awaiting get_tasks() is correctly implemented. The task rescheduling logic is preserved, and the method properly returns a JoinHandle for the spawned async task.

126-168: LGTM with observation on error recovery.

The async conversion of process_request() correctly awaits both register_task() and process_cancel_request(), with proper error handling via insert_failed_scheduling(). The use of ProcessingOutcome::Recoverable preserves the existing error recovery semantics.

170-195: LGTM! Cancel request processing correctly converted to async.

The async conversion properly awaits get_task() and unregister_task(), with correct authority checking logic preserved.

228-251: LGTM! Task registration correctly converted to async.

The async conversion properly awaits get_task() and insert_task(), with correct authority validation logic preserved.

253-261: LGTM! Task unregistration correctly converted to async.

The conversion to async with awaited remove_task() is correct. Note that the method signature maintains &self (not &mut self), which is correct since the Mutex provides interior mutability.

263-296: LGTM! Event loop correctly handles async operations.

The run() method properly awaits execute_task() and process_request(), with correct error handling. The use of select! for multiplexing task queue, scheduled tasks, and cancellation token is appropriate. Error paths correctly await database operations (lines 273-274).

magicblock-task-scheduler/src/db.rs (3)

206-340: Lock holding pattern is correct for rusqlite's borrow requirements.

The read operations correctly hold the lock for the entire query duration, which is necessary because rusqlite::Statement and the rows iterator borrow from the Connection. This pattern is appropriate given rusqlite's synchronous API.

However, the same blocking concern from the previous comment applies: these synchronous operations block async runtime threads while holding the mutex. The spawn_blocking pattern would also benefit these read operations.

67-108: Let me check the Cargo.toml to see how rusqlite is configured:
<function_calls>

#!/bin/bash

Find and read Cargo.toml files in the task scheduler area

fd 'Cargo.toml' | xargs -I {} bash -c "echo '=== {} ===' && cat {}" | head -200

</function_calls>

Now let me check the actual methods in db.rs to understand the lock usage patterns:
<function_calls>

#!/bin/bash

View more of db.rs to understand all method patterns

cat -n magicblock-task-scheduler/src/db.rs | head -250

</function_calls>

110-204: Based on my thorough verification, I can now provide the rewritten review comment.

This review comment is based on a misunderstanding of the architecture.

The code is safe as designed. TaskSchedulerService is explicitly documented with a safety comment stating it "is moved into a single Tokio task in start() and never cloned" and "runs exclusively on that task's thread." This single-threaded guarantee (confirmed by tokio::spawn(self.run()) which moves self once) eliminates the blocking concern that spawn_blocking is meant to solve.

While holding a tokio::sync::Mutex across synchronous SQLite calls would be problematic in multi-threaded async code, it is not a concern here because:

Only one task ever holds the Mutex - there are no other tasks competing for it

Blocking the single task does not starve other work since no other tasks exist

Adding spawn_blocking would introduce unnecessary thread pool overhead without benefit

The Mutex wrapper exists for safety (defensive programming), not for actual concurrency control. This pattern is consistent with the intentional single-threaded design of the scheduler.

Likely an incorrect or invalid review comment.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

bmuddha

We should avoid locking in the code which runs inside of an async context, I understand that it's highly unlikely that this lock might result in runtime starvation, but considering that there are scores of locks employed throughout the codebase, it's nigh impossible to analyze the interplay of lock acquisition/release flow. In the future, we should replace all the Lock uses with lock free alternatives.

For now I'd suggest to synchronize via non-blocking primitives like semaphores, in which case unsafe impl is justified always even if connection is used across multiple threads.

struct SchedulerDatabase {
    connection: Connection,
    semaphore: Arc<Semaphore>,
}

impl SchedulerDatabase {
    fn new() -> Self {
        // ...
        let semaphore = Arc::new(Semaphore::new(1));
        let connection = Connection;
        Self {
            connection,
            semaphore,
        }
    }

    async fn connection(&self) -> ConnectionGuard<'_> {
        let permit = self
            .semaphore
            .acquire()
            .await
            .expect("semaphore cannot be closed");
        ConnectionGuard {
            permit,
            connection: &self.connection,
        }
    }

    async fn do_something(&self) {
        let con = self.connection();
        // Do some CRUD with connection
        // ...
        // con goes out of scope, releasing semaphore
        // permit and the next request can proceed
    }
}

struct ConnectionGuard<'db> {
    permit: SemaphorePermit<'db>,
    connection: &'db Connection,
}

impl Deref for ConnectionGuard<'_> {
    type Target = Connection;
    fn deref(&self) -> &Self::Target {
        &self.connection
    }
}

Dodecahedr0x · 2025-11-25T11:33:04Z

Addressed in 5d10cf8

bmuddha

LGTM

GabrielePicco

LGTM!

* master: fix: don't unbork for same delegation slot as remote slot (#702) feat: use simplified configuration management (#685) Report Intent's patched errors (#667) fix: replace cargo-expand with syn-based verification (#611) fix: make db thread safe (#680) fix: reset program cache upon error (#696) chore: use smaller runners (#695) feat: cranked commits (#656) fix: refresh stuck undelegating accounts that are closed on-chain (#691) Fix: abort truncation if validator exited (#689)

feat: make db thread safe

c4745ca

Merge branch 'master' into fix/thread-safe-crank

f75cbb4

Dodecahedr0x requested review from GabrielePicco, bmuddha, taco-paco and thlorenz November 24, 2025 16:33

bmuddha requested changes Nov 24, 2025

View reviewed changes

Dodecahedr0x added 3 commits November 25, 2025 10:45

feat: non-blocking mutex

5d10cf8

fix: await

1093bcb

fix: tests

c44beaa

Dodecahedr0x requested a review from bmuddha November 25, 2025 11:33

bmuddha approved these changes Nov 25, 2025

View reviewed changes

Merge branch 'master' into fix/thread-safe-crank

4d4ab44

GabrielePicco approved these changes Nov 30, 2025

View reviewed changes

Merge branch 'master' into fix/thread-safe-crank

079f635

Dodecahedr0x merged commit 4c96cec into master Nov 30, 2025
18 checks passed

Dodecahedr0x deleted the fix/thread-safe-crank branch November 30, 2025 15:24

Dodecahedr0x added a commit that referenced this pull request Dec 2, 2025

fix: make db thread safe (#680)

ccc97e3

This was referenced Dec 3, 2025

feat: configurable minimum crank frequency #717

Open

feat: replace task executor with rpc client #723

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: make db thread safe #680

fix: make db thread safe #680

Uh oh!

Dodecahedr0x commented Nov 24, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

github-actions bot commented Nov 24, 2025 •

edited

Loading

Uh oh!

coderabbitai bot commented Nov 24, 2025 •

edited

Loading

Walkthrough

Possibly related PRs

Suggested reviewers

Find and read Cargo.toml files in the task scheduler area

View more of db.rs to understand all method patterns

Uh oh!

bmuddha left a comment

Uh oh!

Dodecahedr0x commented Nov 25, 2025

Uh oh!

bmuddha left a comment

Uh oh!

GabrielePicco left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

fix: make db thread safe #680

fix: make db thread safe #680

Uh oh!

Conversation

Dodecahedr0x commented Nov 24, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

github-actions bot commented Nov 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Manual Deploy Available

Uh oh!

coderabbitai bot commented Nov 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Possibly related PRs

Suggested reviewers

Find and read Cargo.toml files in the task scheduler area

View more of db.rs to understand all method patterns

Uh oh!

bmuddha left a comment

Choose a reason for hiding this comment

Uh oh!

Dodecahedr0x commented Nov 25, 2025

Uh oh!

bmuddha left a comment

Choose a reason for hiding this comment

Uh oh!

GabrielePicco left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Dodecahedr0x commented Nov 24, 2025 •

edited by coderabbitai bot

Loading

github-actions bot commented Nov 24, 2025 •

edited

Loading

coderabbitai bot commented Nov 24, 2025 •

edited

Loading