Include job_id in order clause when fetching scheduled jobs to dispatch and when dispatching #164

rosa · 2024-03-05T18:50:57Z

The job_id was missing from the ORDER BY clause, which made the locking prone to deadlocks when we have a bunch of jobs scheduled at the same time with the same priority, since the ordering of those tied rows was non-deterministic.
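As a rough illustration of why the tie-breaker matters, here is a plain-Ruby sketch with no database involved. `ScheduledRow`, `dispatch_key` and `dispatch_order` are hypothetical names made up for this example, not Solid Queue code, and the column order in the key is assumed. It shows that once job_id is part of the sort key, two dispatchers that see the tied rows in different physical orders still end up with the same order.

```ruby
# Hypothetical stand-in for a row in scheduled_executions (not Solid Queue's model).
ScheduledRow = Struct.new(:job_id, :priority, :scheduled_at)

# Sort key mirroring an ORDER BY on scheduled_at, priority, job_id (column order assumed).
# job_id is unique, so no two rows ever share the same key.
def dispatch_key(row)
  [row.scheduled_at, row.priority, row.job_id]
end

# Returns the job ids in the order a dispatcher would fetch (and lock) them.
def dispatch_order(rows)
  rows.sort_by { |row| dispatch_key(row) }.map(&:job_id)
end

# Three jobs scheduled at the same time with the same priority.
tied_rows = [
  ScheduledRow.new(3, 0, 100),
  ScheduledRow.new(1, 0, 100),
  ScheduledRow.new(2, 0, 100)
]

# Two dispatchers may see the tied rows in different physical orders.
order_a = dispatch_order(tied_rows)
order_b = dispatch_order(tied_rows.reverse)

puts order_a.inspect # [1, 2, 3]
puts order_b.inspect # [1, 2, 3]
```

Without job_id in the key, all three rows would compare as equal and nothing would guarantee that both dispatchers see them in the same order.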

Besides, the DELETE query now forces ORDER BY job_id, even though we don't care about the order there. It turns out that, under certain circumstances, when the scheduled_executions table is small enough, MySQL might choose not to use any index for the DELETE by job_id. This means it might lock other rows besides those it's going to delete. Then, when running more than one dispatcher doing this, we might end up with a deadlock: one dispatcher is deleting its rows and also locking some rows that it's not going to delete, the other is doing the same, and each is waiting for the other's lock.
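The scenario above can be sketched with a toy model in plain Ruby. This is a simplification, not a description of how InnoDB actually acquires locks: `can_deadlock?` is a hypothetical helper, and the lock sequences are invented for illustration. The model treats each dispatcher as taking row locks one by one in a given order, and reports a possible deadlock when two dispatchers take some pair of shared rows in opposite orders.

```ruby
# Toy model: each sequence lists the job_ids a dispatcher locks, in acquisition order.
# Two sequences can deadlock if they share at least two rows and take
# some pair of those rows in opposite orders.
def can_deadlock?(seq_a, seq_b)
  shared = seq_a & seq_b
  pos_a = seq_a.each_with_index.to_h # job_id => position in A's sequence
  pos_b = seq_b.each_with_index.to_h # job_id => position in B's sequence

  shared.combination(2).any? do |x, y|
    (pos_a[x] <=> pos_a[y]) != (pos_b[x] <=> pos_b[y])
  end
end

# No index used: each dispatcher also locks rows it won't delete (made-up orders).
without_index = can_deadlock?([2, 1, 3], [3, 4, 1])

# Index on job_id used: each dispatcher locks only its own rows, so nothing is shared.
with_index = can_deadlock?([1, 2], [3, 4])

# Overlapping rows, but both dispatchers lock in job_id order: one just waits for the other.
same_order = can_deadlock?([1, 2, 3], [1, 3, 4])

puts without_index # true
puts with_index    # false
puts same_order    # false
```

In this model, either locking fewer rows or locking them in a consistent order removes the cycle, which is in line with both the ORDER BY job_id change and the index hint avoiding the deadlock.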

This deadlock was happening consistently in CI, for MySQL only, but I didn't manage to reproduce it locally, and it has never happened in production for us. Using an INDEX hint in the DELETE query, to ensure the index on job_id was used, also avoided the deadlock.

rosa merged commit 28f9e05 into main Mar 5, 2024
6 checks passed

rosa deleted the fix-deadlock-on-scheduled-executions branch March 5, 2024 18:51

rosa mentioned this pull request Mar 5, 2024

Some jobs failing due to ActiveRecord::Deadlocked when trying to create a ScheduledExecution #162

Open
