Database deadlock #200

xiaoronglv · 2015-09-17T07:54:01Z

Hi , I encounter a problem about rpush, and try to fix it. But I can't find the root cause of

Rpush version: 2.4.0
rpush run in two host.

*** (1) TRANSACTION:
mysql tables in use 1, locked 1
LOCK WAIT 22 lock struct(s), heap size 3112, 240 row lock(s)
query id 2154897537 rpushhost1 Creating sort index
SELECT `rpush_notifications`.* FROM `rpush_notifications` WHERE (processing = 0 AND delivered = 0 AND failed = 0 AND (deliver_after IS NULL OR deliver_after < '2015-09-08 06:23:19')) ORDER BY created_at ASC LIMIT 100 FOR UPDATE

*** (2) TRANSACTION:
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1248, 2 row lock(s), undo log entries 1
query id 2154897539 rpushhost2 updating
UPDATE `rpush_notifications` SET processing = 0, delivered = 1, delivered_at = '2015-09-08 06:23:19' WHERE `rpush_notifications`.`id` IN (47324, 47325, 47326)
*** WE ROLL BACK TRANSACTION (2)

The text was updated successfully, but these errors were encountered:

modx-space · 2015-10-12T11:13:38Z

we also met this similar issue, here is the log:

LATEST DETECTED DEADLOCK
------------------------
2015-10-10 15:04:56 7f1aa365b700
*** (1) TRANSACTION:
TRANSACTION 20624196, ACTIVE 1 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 376, 2 row lock(s), undo log entries 1
MySQL thread id 9452, OS thread handle 0x7f1aa193e700, query id 1025858 xxxx update
INSERT INTO `rpush_notifications` (`alert`, `app_id`, `badge`, `created_at`, `data`, `device_token`, `push_setting_id`, `registration_ids`, `sound`, `type`, `updated_at`, `url_args`) VALUES ('xx',...)
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 154827 page no 5 n bits 120 index `index_rpush_notifications_multi` of table `mp_prod`.`rpush_notifications` trx id 20624196 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 7 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
 0: len 1; hex 80; asc  ;;
 1: len 1; hex 81; asc  ;;
 2: len 4; hex 8000f7c1; asc     ;;

*** (2) TRANSACTION:
TRANSACTION 20624195, ACTIVE 2 sec fetching rows
mysql tables in use 1, locked 1
8 lock struct(s), heap size 1248, 101 row lock(s)
MySQL thread id 14610, OS thread handle 0x7f1aa365b700, query id 1025860 xxxx Searching rows for update
UPDATE `rpush_notifications` SET processing = 1 WHERE `rpush_notifications`.`id` IN (63459, 63461, 63463, 63465, 63467, 63489, 63491, 63493, 63495, 63497, 63499, 63501, 63503, 63505, 63507, 63509, 63469, 63471, 63473, 63475, 63477, 63479, 63481, 63483, 63485, 63487)
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 154827 page no 5 n bits 120 index `index_rpush_notifications_multi` of table `mp_prod`.`rpush_notifications` trx id 20624195 lock_mode X locks gap before rec
Record lock, heap no 7 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
 0: len 1; hex 80; asc  ;;
 1: len 1; hex 81; asc  ;;
 2: len 4; hex 8000f7c1; asc     ;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 154827 page no 7 n bits 96 index `PRIMARY` of table `mp_prod`.`rpush_notifications` trx id 20624195 lock_mode X waiting
Record lock, heap no 29 PHYSICAL RECORD: n_fields 33; compact format; info bits 0
 0: len 4; hex 8000f817; asc     ;;
 1: len 6; hex 0000013ab344; asc    : D;;
 2: len 7; hex f4000001380110; asc     8  ;;
 3: len 4; hex 80000009; asc     ;;
 4: len 30; hex 666537316362326362613035616234373734336538336439653532613139; asc fe71cb2cba05ab47743e83d9e52a19; (total 64 bytes);
 5: len 7; hex 64656661756c74; asc default;;
 6: len 30; hex 4361726c61204772616e7420616464656420746f7069633a206e6f746573; asc Carla Grant added topic: notes; (total 40 bytes);
 7: len 30; hex 7b2270223a22632c536170706869726530342c636772616e7431222c2274; asc {"p":"c,Sapphire04,cgrant1","t; (total 99 bytes);
 8: len 4; hex 80015180; asc   Q ;;
 9: len 1; hex 80; asc  ;;
 10: SQL NULL;
 11: len 1; hex 80; asc  ;;
 12: SQL NULL;
 13: SQL NULL;
 14: SQL NULL;
 15: SQL NULL;
 16: len 5; hex 9997556136; asc   Ua6;;
 17: len 5; hex 9997556136; asc   Ua6;;
 18: len 1; hex 80; asc  ;;
 19: len 30; hex 52707573683a3a436c69656e743a3a4163746976655265636f72643a3a41; asc Rpush::Client::ActiveRecord::A; (total 47 bytes);
 20: SQL NULL;
 21: len 1; hex 80; asc  ;;
 22: SQL NULL;
 23: len 4; hex 80000001; asc     ;;
 24: len 4; hex 80000000; asc     ;;
 25: SQL NULL;
 26: SQL NULL;
 27: SQL NULL;
 28: len 1; hex 80; asc  ;;
 29: SQL NULL;
 30: SQL NULL;
 31: SQL NULL;
 32: len 4; hex 8000235a; asc   #Z;;

*** WE ROLL BACK TRANSACTION (1)

But we found that we had two rpush processes running on the host during the issue happened, so I am wondering whether rpush supports multiple instances running in the same environment?

amaierhofer · 2015-11-11T16:25:33Z

We encountered similar Deadlock issues, which appear more frequently for a higher number of notifications, larger batch_size e.g. 2000 and short poll_interval e.g. 0.5. We are using mysql2 0.3.20.

Occasionally the recovering mechanism also fails and the daemon stops delivering messages all together. Sometimes the queue (as queried with rpush status) fills up, sometimes even the feeder thread seems to hang. In both cases no additional information is available in the log files.

It appears that introducing additional sleep time between statements in https://github.com/rpush/rpush/blob/master/lib/rpush/daemon/store/active_record/reconnectable.rb#L55 mitigates the problem but it still occurs occasionally.

Any ideas on who to debug this further? What settings work best for achieving high throughput for APNS on mysql without running the risk of Deadlocks?

xiaoronglv · 2015-11-17T06:04:17Z

Hi @amaierhofer

I have reproduced in my localhost and fixed this problem by refactoring index.

Conflict transactions

Transaction 1

SELECT  `rpush_notifications`.* FROM `rpush_notifications` WHERE (processing = 0 AND delivered = 0 AND failed = 0 AND (deliver_after IS NULL OR deliver_after < '2015-08-21 16:48:10')) ORDER BY created_at ASC LIMIT 100 FOR UPDATE

Transaction 2

UPDATE `rpush_notifications` SET processing = 0, delivered = 1, delivered_at = '2015-08-21 16:48:10' WHERE `rpush_notifications`.`id` IN (647770)

Root cause

The counterintuitive part of transaction1 is that, In Repeated read isolate level, Innodb engine will lock all rows which satisfy 'failed=0 and delivered=0' , it don't care the following conditions.

limit 100;
processing=0
deliver_after IS NULL OR deliver_after < '2015-08-21 17:47:32'
ORDER BY created_at ASC

These conditions are filtered in MySQL sever level, not engine level.

https://github.com/rpush/rpush/blob/master/lib/rpush/daemon/store/active_record.rb#L30

https://github.com/rpush/rpush/blob/master/lib/rpush/daemon/store/active_record.rb#L95

How to reproduce this bug in your localhost. (run it in rails c)

Process A

loop do
    ActiveRecord::Base.connection.execute("SELECT `rpush_notifications`.* FROM `rpush_notifications` WHERE (processing = 0 AND delivered = 0 AND failed = 0 AND (deliver_after IS NULL OR deliver_after < '2015-10-21 17:47:32')) ORDER BY created_at ASC LIMIT 100 for update;")
end

Process B

Rpush::Notification.where(processing: true).find_each do |ntf|
id = ntf.id
ActiveRecord::Base.connection.execute("UPDATE `rpush_notifications` SET processing = 0, delivered = 1, delivered_at = '2015-08-21 16:48:10' WHERE `rpush_notifications`.`id` IN (#{id})")
sleep 0.5
end

Solution

Modify the following index to reduce lock rows when select * from * for update.

DROP INDEX `index_rpush_notifications_multi` ON rpush_notifications;
CREATE INDEX index_rpush_notifications_multi ON rpush_notifications (delivered, failed, processing, deliver_after)

xiaoronglv · 2015-11-17T06:08:47Z

Deadlock will raise an error, then interrupt the thread 'Feeder'.

That's the reason why you rpush stop work.

amaierhofer · 2015-11-17T16:49:35Z

Many thanks @xiaoronglv for sharing your insights! I can confirm, that this solution works!

Without adjusting the index, running the code from Process A and the rpush daemon at the same time, triggered plenty of deadlocks. After making the suggested adjustments, no more deadlocks occured.

The index_rpush_notifications_multi index as created in

rpush/lib/generators/templates/rpush_2_0_0_updates.rb

Line 24 in 22f6530

    
           add_index :rpush_notifications, [:delivered, :failed], name: 'index_rpush_notifications_multi', where: 'NOT delivered AND NOT failed'

should probably be adjusted, so new mysql deployments wont run into this issue at all. Also the partial index is not supported by mysql. @ileitch, what do you think?

daniel-illi · 2015-11-18T20:30:39Z

@xiaoronglv, @amaierhofer What is the impact on the performance when including all attributes in the index instead of using a partial index?

xiaoronglv · 2015-11-18T23:49:34Z

It's a balance about reuse / index size / read and write.

advantage

full Index covered query is more faster.

disadvantage

index is too large.
destroy/insert/update will be more slower(write)

@dzubi

gsusmonzon · 2016-02-26T12:52:45Z

Hello,

I followed a different approach.
In my setup changing the index didnt helped too much (the selectivity was still low and locked many rows).
However sorting by id instead of created_at saved from doing a 'filesort' and the query run faster.
It means monkey patching the ready_for_delivery method https://github.com/rpush/rpush/blob/master/lib/rpush/daemon/store/active_record.rb#L191 . But my approach is kind of dirty.

def ready_for_delivery
  relation = Rpush::Client::ActiveRecord::Notification.where('processing = ? AND delivered = ? AND failed = ? AND (deliver_after IS NULL OR deliver_after < ?)', false, false, false, Time.now)
  @using_oracle ? relation : relation.order('created_at ASC') #change order to `id ASC`
end

Hope it helps. But I guess that it only makes the deadlock less likely. If the .lock(true) method is used ( https://github.com/rpush/rpush/blob/master/lib/rpush/daemon/store/active_record.rb#L208 ) deadlocks cant be avoided completely.

soulfly · 2016-05-06T15:08:52Z

Hi there,

also fighting with similar strange issues

under the high load, we noticed that a lot of notifications remain in 'processing' state forever and the number of such notifications increases time to time

for example,

APNS processing: 429
GCM processing: 215
WPNS processing: 1

and we don't know why

also, we sometimes see the following in rpush.log:

[2016-05-06 14:01:27] [ERROR] ActiveRecord::StatementInvalid, Mysql2::Error: Deadlock found when trying to get lock; try restarting transaction: UPDATE rpush_notifications SET processing = 0, delivered = 1, delivered_at = '2016-05-06 14:01:27' WHERE rpush_notifications.id IN (13615806)

So looks like @xiaoronglv is right

We are going to try his solution, will let you know

We use the latest rpush 2.7.0

modx-space · 2016-09-05T07:24:01Z

Hi @soulfly , how about the solution? any updates?

soulfly · 2016-09-05T07:37:07Z

@shawzt looks like it works better,
we have less records in 'processing' state but anyway we still have it for some reason..

modx-space · 2016-09-06T06:05:38Z

@soulfly Good to know. I will apply the changes to the index in our new project.

soulfly · 2016-09-07T08:01:08Z

At the moment we have another version in-parallel of our project with Redis instead of ActiveRecord.
We don't have such issues there. Still testing Redis for more visibility.

modx-space · 2016-09-08T03:01:45Z

@soulfly , how long you have been running it with Redis?

soulfly · 2016-09-08T06:25:51Z

@shawzt it's already ~3 or 4 months up and running

amaierhofer · 2016-09-23T07:07:56Z

@soulfly how did switching to redis affect message throughput? Do you have any numbers to share? We are currently pushing around 100k in 5 minutes with active record backend.

soulfly · 2016-09-27T15:25:39Z

@amaierhofer we do not have ready-to-share performance metrics
but Redis is in-memory store and all read/write operations work faster here with proper setup

1. Change the index on rpush_notifications to minimized number of locked records and pre-sort the records 2. Minimize the locking duration by moving the row dump code outside of the transaction fixes rpush#200

1. Change the index on rpush_notifications to minimized number of locked records and pre-sort the records 2. Minimize the locking duration by moving the row dump code outside of the transaction fixes #200

@loadhigh

v3.1.1 Fixed - Database deadlock [rpush#200](rpush#200) (by [@loadhigh](https://github.com/loadhigh) in [rpush#428](rpush#428)) Enhancements - Change the index on `rpush_notifications` to minimize number of locked records and pre-sort the records ([rpush#428](rpush#428) by [@loadhigh](https://github.com/loadhigh)) - Minimize the locking duration by moving the row dump code outside of the transaction ([rpush#428](rpush#428) by [@loadhigh](https://github.com/loadhigh))

1. Change the index on rpush_notifications to minimized number of locked records and pre-sort the records 2. Minimize the locking duration by moving the row dump code outside of the transaction fixes rpush#200

modx-space mentioned this issue Oct 13, 2015

fatal: deadlock detected #14

Closed

amaierhofer mentioned this issue Oct 4, 2016

RPUSH Freezes, connection stay in CLOSE_WAIT status #301

Closed

loadhigh mentioned this issue Mar 15, 2018

rpush_notifications active_record performance improvements #426

Closed

aried3r mentioned this issue Apr 11, 2018

rpush_notifications active_record performance improvements #428

Merged

aried3r closed this as completed in #428 Apr 11, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Database deadlock #200

Database deadlock #200

xiaoronglv commented Sep 17, 2015

modx-space commented Oct 12, 2015

amaierhofer commented Nov 11, 2015

xiaoronglv commented Nov 17, 2015

xiaoronglv commented Nov 17, 2015

amaierhofer commented Nov 17, 2015

daniel-illi commented Nov 18, 2015

xiaoronglv commented Nov 18, 2015

gsusmonzon commented Feb 26, 2016

soulfly commented May 6, 2016 •

edited

Loading

modx-space commented Sep 5, 2016

soulfly commented Sep 5, 2016

modx-space commented Sep 6, 2016

soulfly commented Sep 7, 2016 •

edited

Loading

modx-space commented Sep 8, 2016

soulfly commented Sep 8, 2016

amaierhofer commented Sep 23, 2016

soulfly commented Sep 27, 2016

Database deadlock #200

Database deadlock #200

Comments

xiaoronglv commented Sep 17, 2015

modx-space commented Oct 12, 2015

amaierhofer commented Nov 11, 2015

xiaoronglv commented Nov 17, 2015

Conflict transactions

Root cause

How to reproduce this bug in your localhost. (run it in rails c)

Solution

xiaoronglv commented Nov 17, 2015

amaierhofer commented Nov 17, 2015

daniel-illi commented Nov 18, 2015

xiaoronglv commented Nov 18, 2015

advantage

disadvantage

gsusmonzon commented Feb 26, 2016

soulfly commented May 6, 2016 • edited Loading

modx-space commented Sep 5, 2016

soulfly commented Sep 5, 2016

modx-space commented Sep 6, 2016

soulfly commented Sep 7, 2016 • edited Loading

modx-space commented Sep 8, 2016

soulfly commented Sep 8, 2016

amaierhofer commented Sep 23, 2016

soulfly commented Sep 27, 2016

soulfly commented May 6, 2016 •

edited

Loading

soulfly commented Sep 7, 2016 •

edited

Loading