
acquiring locks fails - Deadlock found when trying to get lock #28

Closed
srchulo opened this issue Jul 5, 2020 · 7 comments

@srchulo

srchulo commented Jul 5, 2020

I get this error frequently from my jobs when trying to acquire a lock:

DBD::MariaDB::st execute failed: Deadlock found when trying to get lock; try restarting transaction at /home/admin/local/lib/perl5/Minion/Backend/mysql.pm line 403.

I wonder if there's some way to prevent this, or to retry when it occurs?

https://dev.mysql.com/doc/refman/8.0/en/innodb-deadlocks.html
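Something like the sketch below is what I have in mind as a stopgap: wrap the lock acquisition in an eval and retry only when the error is an InnoDB deadlock. This is just a rough sketch, not anything Minion provides; the helper name, retry count, and backoff are made up.

    use strict;
    use warnings;
    use Time::HiRes qw(sleep);

    # Hypothetical retry helper: run $code, and if it dies with an InnoDB
    # deadlock error, retry up to $max_tries times with a short backoff.
    sub with_deadlock_retry {
        my ($code, $max_tries) = @_;
        $max_tries //= 3;
        for my $try (1 .. $max_tries) {
            my @result = eval { $code->() };
            return wantarray ? @result : $result[0] unless $@;
            # Re-throw anything that isn't a deadlock, and the final failed attempt
            die $@ unless $@ =~ /Deadlock found when trying to get lock/ && $try < $max_tries;
            sleep 0.1 * $try;    # brief backoff before retrying
        }
    }

    # e.g. my $guard = with_deadlock_retry(sub { $minion->guard('my-lock', 3600) });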

@preaction
Owner

Hey, thanks for the report! This doesn't seem to be referring to the Minion lock, but to the underlying MySQL lock. The backend doesn't explicitly lock any tables, though, so I'm going to need some information about how it occurs.

  • What kind of thing are you doing when this happens? Could you make a minimum reproduction case?
  • How many workers do you have? How many enqueued jobs? How many pending jobs trying to obtain a lock on the same Minion resource?
  • Can your environment use DBD::mysql? If so, does the same thing happen?
  • Would you be able to run your workers under a module like Carp::Always or Devel::Confess so it creates a full stack trace from the error?

The only thing I can think of right now is to make sure that all the operations touching multiple tables touch them in the same order, but that doesn't really seem to apply here. Maybe I need to be more aggressive about commits and/or autocommit?
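On the last question, one low-friction way to get the stack trace (assuming the workers start from an ordinary Mojolicious::Lite script; the DSN below is a placeholder) would be to load the module at the top of the application:

    #!/usr/bin/env perl
    use Mojolicious::Lite;
    use Devel::Confess;    # or: use Carp::Always; either turns die messages into full stack traces

    # Placeholder DSN; use whatever the real application already connects with
    plugin Minion => { mysql => 'mysql://user:pass@localhost/minion' };

    app->start;    # then run the worker as usual: ./myapp.pl minion worker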

@srchulo
Author

srchulo commented Jul 8, 2020

I don't think it's referring to a literal lock command; I think it has to do with some kind of deadlock related to the minion_lock function:

CREATE FUNCTION minion_lock( $1 VARCHAR(191), $2 INTEGER, $3 INTEGER) RETURNS BOOL

What kind of thing are you doing when this happens? Could you make a minimum reproduction case?

I have jobs running that call guard to acquire a lock. These jobs are not asking for the same lock (the lock names differ). I will work on making a minimum reproduction case; I suspect that putting a sleep somewhere in the minion_lock function and then requesting two locks may help reproduce it.
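For context, each job does roughly the following (the lock name, duration, and DSN here are made up); guard returns a scope guard that releases the lock when it goes out of scope, or undef if the lock is already held:

    use Minion;

    # Placeholder DSN
    my $minion = Minion->new(mysql => 'mysql://user:pass@localhost/minion');

    my $guard = $minion->guard('sync-products', 3600);
    if ($guard) {
        # ... do the work while holding the lock ...
    }
    else {
        # another instance already holds this lock; skip or try again later
    }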

How many workers do you have? How many enqueued jobs? How many pending jobs trying to obtain a lock on the same Minion resource?

I'm actually just using Minion locks for distributed locking; the jobs themselves run on AWS Batch. I may have 10-15 jobs running at once, each trying to acquire a lock, but with different names (only one instance of each job type runs at a time, so no simultaneous jobs ask for the same lock).

Can your environment use DBD::mysql? If so, does the same thing happen?

I can use DBD::mysql. I am going to switch to this and see if the problem persists (I only see it happen a few times a day, so I may not really be sure if this helps until I make my repro case).

One thing I should note is that for several months before this I was using the exact same code with Minion::Backend::Pg and didn't see this problem (I'm not sure how similar the code is between that and Minion::Backend::mysql, and obviously MySQL and Pg are not the same!) :)

Would you be able to run your workers under a module like Carp::Always or Devel::Confess so it creates a full stack trace from the error?

I will see if I can get my jobs to run using one of those modules.

@srchulo
Author

srchulo commented Jul 8, 2020

Can now confirm that I get the same error using DBD::mysql:

DBD::mysql::st execute failed: Deadlock found when trying to get lock; try restarting transaction at /home/admin/local/lib/perl5/Minion/Backend/mysql.pm line 403.

Still working on a repro case.

@preaction
Owner

Thanks! Also, which version / fork of mysqld are you running? I can add that to the Travis tests to try to reproduce.

@srchulo
Author

srchulo commented Jul 10, 2020

It's 5.7; however, it's Amazon Aurora's version, which could make reproduction hard on Travis (and which could technically be causing the issue, although I would think not).

The exact version is 5.7.mysql_aurora.2.07.2

@preaction
Owner

We started hitting this same problem at work, so I was able to get the InnoDB locking information. It appears to be the job.id = @dequeued_job_id := job.id hack that is causing the deadlock: this requires an exclusive lock on the index for the job.id column, which cannot be acquired while a shared lock is in place (and one appears to be created by the NOT EXISTS (...) subquery).

Doing a proper SELECT ... FOR UPDATE should fix this by removing the need for the exclusive lock, which should prevent any deadlocks.
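As a rough sketch of the pattern (these are not the backend's actual queries; table and column names are illustrative), the dequeue becomes a single transaction that takes the row lock up front:

    use DBI;

    # Placeholder connection details
    my $dbh = DBI->connect('dbi:mysql:database=minion', 'user', 'pass',
        { RaiseError => 1, AutoCommit => 1 });

    $dbh->begin_work;

    # FOR UPDATE takes an exclusive lock on the selected row immediately,
    # so no later lock upgrade (and thus no deadlock) is needed
    my ($id) = $dbh->selectrow_array(q{
        SELECT id FROM minion_jobs
        WHERE state = 'inactive' AND delayed <= NOW()
        ORDER BY priority DESC, id
        LIMIT 1
        FOR UPDATE
    });

    $dbh->do(q{UPDATE minion_jobs SET state = 'active', started = NOW() WHERE id = ?},
        undef, $id) if defined $id;

    $dbh->commit;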

preaction added a commit that referenced this issue Apr 12, 2021
  [NOTE]

  This release tries to fix some reported deadlocks from job dequeuing
  by moving the dequeue process into a transaction with a `SELECT ...
  FOR UPDATE` query. Please make sure to test this release in your
  environment before moving it to production, and be prepared to roll
  back to v0.23. Please report any issues you have so they can get
  fixed!

  [Fixed]

  - Fix compatibility with Mojolicious 9.0: The tests would fail
    because the `delay` helper is no longer available.
  - Fix jobs being run before their parents. Thanks @tn-laslov for
    reporting this issue and providing the solution! [Github #34]
  - Fix slow job dequeue on very large jobs tables. Thanks @znestor
    for reporting this issue and providing the solution! [Github #33]
  - Trying to fix some reported deadlocks in job dequeueing. Thanks
    @srchulo for reporting and help debugging this issue, and Grant
    Street Group for additional debugging information. [Github #28]
  - Fix note values not accepting UTF-8 characters. Thanks @uralm1 for
    reporting this issue and providing a fix! [Github #32]

@srchulo
Author

srchulo commented Apr 13, 2021

Awesome!! Thanks for fixing/figuring this out!
