Skip to content

Conversation

@Dodecahedr0x
Copy link
Contributor

@Dodecahedr0x Dodecahedr0x commented Nov 24, 2025

TaskSchedulerService has unsafe impl Sync for TaskSchedulerService {} despite SchedulerDatabase being a wrapper around rusqlite::Connection, which is not thread-safe. This is not yet a critical issue because it is only used by a single thread but could become a problem.

This solves it by wrapping the database with a RwLock

Summary by CodeRabbit

  • Refactor
    • Database access converted to async and made concurrency-safe; scheduler service APIs converted to async and now await DB operations.
  • Bug Fixes
    • More reliable persistence of execution updates and improved error/recovery propagation during scheduling and cancellation.
  • Tests
    • Integration and unit tests updated to drive async DB calls via a runtime.
  • Chores
    • Startup sequence now awaits scheduler initialization.

✏️ Tip: You can customize this high-level summary in your review settings.

@github-actions
Copy link

github-actions bot commented Nov 24, 2025

Manual Deploy Available

You can trigger a manual deploy of this PR branch to testnet:

Deploy to Testnet 🚀

Alternative: Comment /deploy on this PR to trigger deployment directly.

⚠️ Note: Manual deploy requires authorization. Only authorized users can trigger deployments.

Comment updated automatically when the PR is synchronized.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 24, 2025

📝 Walkthrough

Walkthrough

SchedulerDatabase.conn was changed from a synchronous Connection to tokio::sync::Mutex<Connection> and is initialized with Mutex::new(conn). Nearly all DB methods were converted from sync to async and now acquire the mutex via .lock().await before preparing/executing SQL. TaskScheduler service methods (start, process_request, process_cancel_request, register_task, unregister_task, execute paths) were updated to call and await the new async DB APIs. Unit and integration tests were adjusted to drive async DB calls (some via tokio::runtime::Runtime::block_on).

Possibly related PRs

Suggested reviewers

  • thlorenz
  • GabrielePicco
  • taco-paco
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/thread-safe-crank

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c44beaa and 4d4ab44.

📒 Files selected for processing (9)
  • magicblock-api/src/magic_validator.rs (1 hunks)
  • magicblock-task-scheduler/src/db.rs (9 hunks)
  • magicblock-task-scheduler/src/service.rs (8 hunks)
  • magicblock-task-scheduler/tests/service.rs (3 hunks)
  • test-integration/test-task-scheduler/tests/test_cancel_ongoing_task.rs (5 hunks)
  • test-integration/test-task-scheduler/tests/test_reschedule_task.rs (5 hunks)
  • test-integration/test-task-scheduler/tests/test_schedule_error.rs (6 hunks)
  • test-integration/test-task-scheduler/tests/test_schedule_task.rs (5 hunks)
  • test-integration/test-task-scheduler/tests/test_unauthorized_reschedule.rs (4 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: magicblock-processor/src/scheduler/locks.rs:110-122
Timestamp: 2025-11-04T10:53:50.922Z
Learning: In magicblock-processor, the TransactionScheduler runs in a single, dedicated thread and will always remain single-threaded. The `next_transaction_id()` function in scheduler/locks.rs uses `unsafe static mut` which is safe given this architectural guarantee.
Learnt from: Dodecahedr0x
Repo: magicblock-labs/magicblock-validator PR: 614
File: magicblock-task-scheduler/src/db.rs:26-0
Timestamp: 2025-11-12T09:46:27.553Z
Learning: In magicblock-task-scheduler, task parameter validation (including ensuring iterations > 0 and enforcing minimum execution intervals) is performed in the Magic program (on-chain) before ScheduleTaskRequest instances reach the scheduler service. The From<&ScheduleTaskRequest> conversion in db.rs does not need additional validation because inputs are already validated at the program level.
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: magicblock-processor/src/scheduler/mod.rs:217-219
Timestamp: 2025-11-04T10:48:00.070Z
Learning: In magicblock-validator, the codebase uses a pattern where types containing non-Send/non-Sync fields (like Rc<RefCell<...>>) are marked with unsafe impl Send when they are guaranteed to be confined to a single thread through careful API design and thread spawning patterns.
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-accounts-db/src/lib.rs:63-72
Timestamp: 2025-10-21T10:34:59.140Z
Learning: In magicblock-validator, the AccountsDb "stop-the-world" synchronizer is managed at the processor/executor level, not at the AccountsDb API level. Transaction executors in magicblock-processor hold a read lock (sync.read()) for the duration of each slot and release it only at slot boundaries, ensuring all account writes happen under the read lock. Snapshot operations acquire a write lock, blocking until all executors release their read locks. This pattern ensures mutual exclusion between writes and snapshots without requiring read guards in AccountsDb write APIs.
📚 Learning: 2025-11-07T13:09:52.253Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: test-kit/src/lib.rs:275-0
Timestamp: 2025-11-07T13:09:52.253Z
Learning: In test-kit, the transaction scheduler in ExecutionTestEnv is not expected to shut down during tests. Therefore, using `.unwrap()` in test helper methods like `schedule_transaction` is acceptable and will not cause issues in the test environment.

Applied to files:

  • test-integration/test-task-scheduler/tests/test_schedule_task.rs
  • test-integration/test-task-scheduler/tests/test_cancel_ongoing_task.rs
  • magicblock-task-scheduler/tests/service.rs
  • magicblock-api/src/magic_validator.rs
  • test-integration/test-task-scheduler/tests/test_unauthorized_reschedule.rs
  • test-integration/test-task-scheduler/tests/test_schedule_error.rs
  • test-integration/test-task-scheduler/tests/test_reschedule_task.rs
📚 Learning: 2025-11-12T09:46:27.553Z
Learnt from: Dodecahedr0x
Repo: magicblock-labs/magicblock-validator PR: 614
File: magicblock-task-scheduler/src/db.rs:26-0
Timestamp: 2025-11-12T09:46:27.553Z
Learning: In magicblock-task-scheduler, task parameter validation (including ensuring iterations > 0 and enforcing minimum execution intervals) is performed in the Magic program (on-chain) before ScheduleTaskRequest instances reach the scheduler service. The From<&ScheduleTaskRequest> conversion in db.rs does not need additional validation because inputs are already validated at the program level.

Applied to files:

  • test-integration/test-task-scheduler/tests/test_schedule_task.rs
  • test-integration/test-task-scheduler/tests/test_cancel_ongoing_task.rs
  • magicblock-api/src/magic_validator.rs
  • test-integration/test-task-scheduler/tests/test_unauthorized_reschedule.rs
  • test-integration/test-task-scheduler/tests/test_schedule_error.rs
  • test-integration/test-task-scheduler/tests/test_reschedule_task.rs
  • magicblock-task-scheduler/src/db.rs
  • magicblock-task-scheduler/src/service.rs
📚 Learning: 2025-11-20T17:25:23.444Z
Learnt from: JMirval
Repo: magicblock-labs/magicblock-validator PR: 656
File: programs/guinea/src/lib.rs:33-33
Timestamp: 2025-11-20T17:25:23.444Z
Learning: In the magicblock-validator codebase, task IDs (task_id) are i64 values that can be negative. The task scheduler is designed to handle any i64 value including negative task IDs. Task IDs are randomly generated using rand::random() which produces values across the full i64 range. No validation is needed to restrict task IDs to positive values.

Applied to files:

  • test-integration/test-task-scheduler/tests/test_schedule_task.rs
  • test-integration/test-task-scheduler/tests/test_cancel_ongoing_task.rs
  • magicblock-api/src/magic_validator.rs
  • test-integration/test-task-scheduler/tests/test_unauthorized_reschedule.rs
  • test-integration/test-task-scheduler/tests/test_schedule_error.rs
  • test-integration/test-task-scheduler/tests/test_reschedule_task.rs
📚 Learning: 2025-11-04T10:53:50.922Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: magicblock-processor/src/scheduler/locks.rs:110-122
Timestamp: 2025-11-04T10:53:50.922Z
Learning: In magicblock-processor, the TransactionScheduler runs in a single, dedicated thread and will always remain single-threaded. The `next_transaction_id()` function in scheduler/locks.rs uses `unsafe static mut` which is safe given this architectural guarantee.

Applied to files:

  • magicblock-task-scheduler/src/db.rs
🧬 Code graph analysis (6)
test-integration/test-task-scheduler/tests/test_schedule_task.rs (2)
magicblock-task-scheduler/src/db.rs (5)
  • new (67-108)
  • get_failed_schedulings (300-319)
  • get_failed_tasks (321-340)
  • get_task_ids (288-298)
  • get_task (206-244)
magicblock-task-scheduler/src/service.rs (1)
  • new (67-100)
test-integration/test-task-scheduler/tests/test_cancel_ongoing_task.rs (1)
magicblock-task-scheduler/src/db.rs (4)
  • get_failed_schedulings (300-319)
  • get_failed_tasks (321-340)
  • get_task_ids (288-298)
  • get_task (206-244)
test-integration/test-task-scheduler/tests/test_unauthorized_reschedule.rs (1)
magicblock-task-scheduler/src/db.rs (4)
  • get_failed_schedulings (300-319)
  • get_failed_tasks (321-340)
  • get_task_ids (288-298)
  • get_task (206-244)
test-integration/test-task-scheduler/tests/test_schedule_error.rs (1)
magicblock-task-scheduler/src/db.rs (5)
  • new (67-108)
  • get_failed_schedulings (300-319)
  • get_failed_tasks (321-340)
  • get_task_ids (288-298)
  • get_task (206-244)
test-integration/test-task-scheduler/tests/test_reschedule_task.rs (1)
magicblock-task-scheduler/src/db.rs (4)
  • get_failed_schedulings (300-319)
  • get_failed_tasks (321-340)
  • get_task_ids (288-298)
  • get_task (206-244)
magicblock-task-scheduler/src/service.rs (3)
magicblock-api/src/magic_validator.rs (1)
  • start (651-727)
magicblock-magic-program-api/src/args.rs (1)
  • id (163-168)
magicblock-task-scheduler/src/db.rs (2)
  • remove_task (194-204)
  • insert_failed_task (169-180)
🔇 Additional comments (12)
magicblock-api/src/magic_validator.rs (1)

702-702: LGTM! Async integration correctly implemented.

The change from task_scheduler.start()? to task_scheduler.start().await? correctly integrates with the async conversion of TaskSchedulerService::start(). The await is properly placed inside the async block spawned on line 701.

test-integration/test-task-scheduler/tests/test_cancel_ongoing_task.rs (1)

15-15: LGTM! Appropriate async handling pattern for integration tests.

The use of Runtime::new() combined with runtime.block_on() is the correct approach for calling async database methods from synchronous test code. The runtime is created once and reused for multiple operations, which is efficient.

Also applies to: 107-110, 119-120, 129-129, 138-138

magicblock-task-scheduler/tests/service.rs (1)

28-58: LGTM! Clean async test setup.

The conversion of setup() to an async function with proper await on .start() (lines 54-55) is correctly implemented. Using #[tokio::test] and native async/await is the cleanest approach for these unit tests.

magicblock-task-scheduler/src/service.rs (6)

102-124: LGTM! Async conversion correctly implemented.

The conversion of start() to async and awaiting get_tasks() is correctly implemented. The task rescheduling logic is preserved, and the method properly returns a JoinHandle for the spawned async task.


126-168: LGTM with observation on error recovery.

The async conversion of process_request() correctly awaits both register_task() and process_cancel_request(), with proper error handling via insert_failed_scheduling(). The use of ProcessingOutcome::Recoverable preserves the existing error recovery semantics.


170-195: LGTM! Cancel request processing correctly converted to async.

The async conversion properly awaits get_task() and unregister_task(), with correct authority checking logic preserved.


228-251: LGTM! Task registration correctly converted to async.

The async conversion properly awaits get_task() and insert_task(), with correct authority validation logic preserved.


253-261: LGTM! Task unregistration correctly converted to async.

The conversion to async with awaited remove_task() is correct. Note that the method signature maintains &self (not &mut self), which is correct since the Mutex provides interior mutability.


263-296: LGTM! Event loop correctly handles async operations.

The run() method properly awaits execute_task() and process_request(), with correct error handling. The use of select! for multiplexing task queue, scheduled tasks, and cancellation token is appropriate. Error paths correctly await database operations (lines 273-274).

magicblock-task-scheduler/src/db.rs (3)

206-340: Lock holding pattern is correct for rusqlite's borrow requirements.

The read operations correctly hold the lock for the entire query duration, which is necessary because rusqlite::Statement and the rows iterator borrow from the Connection. This pattern is appropriate given rusqlite's synchronous API.

However, the same blocking concern from the previous comment applies: these synchronous operations block async runtime threads while holding the mutex. The spawn_blocking pattern would also benefit these read operations.


67-108: Let me check the Cargo.toml to see how rusqlite is configured:
<function_calls>

#!/bin/bash

Find and read Cargo.toml files in the task scheduler area

fd 'Cargo.toml' | xargs -I {} bash -c "echo '=== {} ===' && cat {}" | head -200


</function_calls>

Now let me check the actual methods in db.rs to understand the lock usage patterns:
<function_calls>

#!/bin/bash

View more of db.rs to understand all method patterns

cat -n magicblock-task-scheduler/src/db.rs | head -250


</function_calls>


110-204: Based on my thorough verification, I can now provide the rewritten review comment.


This review comment is based on a misunderstanding of the architecture.

The code is safe as designed. TaskSchedulerService is explicitly documented with a safety comment stating it "is moved into a single Tokio task in start() and never cloned" and "runs exclusively on that task's thread." This single-threaded guarantee (confirmed by tokio::spawn(self.run()) which moves self once) eliminates the blocking concern that spawn_blocking is meant to solve.

While holding a tokio::sync::Mutex across synchronous SQLite calls would be problematic in multi-threaded async code, it is not a concern here because:

  1. Only one task ever holds the Mutex - there are no other tasks competing for it
  2. Blocking the single task does not starve other work since no other tasks exist
  3. Adding spawn_blocking would introduce unnecessary thread pool overhead without benefit

The Mutex wrapper exists for safety (defensive programming), not for actual concurrency control. This pattern is consistent with the intentional single-threaded design of the scheduler.

Likely an incorrect or invalid review comment.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@bmuddha bmuddha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should avoid locking in the code which runs inside of an async context, I understand that it's highly unlikely that this lock might result in runtime starvation, but considering that there are scores of locks employed throughout the codebase, it's nigh impossible to analyze the interplay of lock acquisition/release flow. In the future, we should replace all the Lock uses with lock free alternatives.

For now I'd suggest to synchronize via non-blocking primitives like semaphores, in which case unsafe impl is justified always even if connection is used across multiple threads.

struct SchedulerDatabase {
    connection: Connection,
    semaphore: Arc<Semaphore>,
}

impl SchedulerDatabase {
    fn new() -> Self {
        // ...
        let semaphore = Arc::new(Semaphore::new(1));
        let connection = Connection;
        Self {
            connection,
            semaphore,
        }
    }

    async fn connection(&self) -> ConnectionGuard<'_> {
        let permit = self
            .semaphore
            .acquire()
            .await
            .expect("semaphore cannot be closed");
        ConnectionGuard {
            permit,
            connection: &self.connection,
        }
    }

    async fn do_something(&self) {
        let con = self.connection();
        // Do some CRUD with connection
        // ...
        // con goes out of scope, releasing semaphore
        // permit and the next request can proceed
    }
}

struct ConnectionGuard<'db> {
    permit: SemaphorePermit<'db>,
    connection: &'db Connection,
}

impl Deref for ConnectionGuard<'_> {
    type Target = Connection;
    fn deref(&self) -> &Self::Target {
        &self.connection
    }
}

@Dodecahedr0x
Copy link
Contributor Author

Addressed in 5d10cf8

@Dodecahedr0x Dodecahedr0x requested a review from bmuddha November 25, 2025 11:33
Copy link
Contributor

@bmuddha bmuddha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator

@GabrielePicco GabrielePicco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@Dodecahedr0x Dodecahedr0x merged commit 4c96cec into master Nov 30, 2025
18 checks passed
@Dodecahedr0x Dodecahedr0x deleted the fix/thread-safe-crank branch November 30, 2025 15:24
thlorenz added a commit that referenced this pull request Dec 2, 2025
* master:
  fix: don't unbork for same delegation slot as remote slot (#702)
  feat: use simplified configuration management (#685)
  Report Intent's patched errors (#667)
  fix: replace cargo-expand with syn-based verification (#611)
  fix: make db thread safe (#680)
  fix: reset program cache upon error (#696)
  chore: use smaller runners (#695)
  feat: cranked commits (#656)
  fix: refresh stuck undelegating accounts that are closed on-chain (#691)
  Fix: abort truncation if validator exited (#689)
Dodecahedr0x added a commit that referenced this pull request Dec 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants