-
Notifications
You must be signed in to change notification settings - Fork 353
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Database deadlock #200
Comments
we also met this similar issue, here is the log:
But we found that we had two rpush processes running on the host during the issue happened, so I am wondering whether rpush supports multiple instances running in the same environment? |
We encountered similar Deadlock issues, which appear more frequently for a higher number of notifications, larger batch_size e.g. 2000 and short poll_interval e.g. 0.5. We are using mysql2 0.3.20. Occasionally the recovering mechanism also fails and the daemon stops delivering messages all together. Sometimes the queue (as queried with rpush status) fills up, sometimes even the feeder thread seems to hang. In both cases no additional information is available in the log files. It appears that introducing additional sleep time between statements in https://github.com/rpush/rpush/blob/master/lib/rpush/daemon/store/active_record/reconnectable.rb#L55 mitigates the problem but it still occurs occasionally. Any ideas on who to debug this further? What settings work best for achieving high throughput for APNS on mysql without running the risk of Deadlocks? |
Hi @amaierhofer I have reproduced in my localhost and fixed this problem by refactoring index. Conflict transactionsTransaction 1 SELECT `rpush_notifications`.* FROM `rpush_notifications` WHERE (processing = 0 AND delivered = 0 AND failed = 0 AND (deliver_after IS NULL OR deliver_after < '2015-08-21 16:48:10')) ORDER BY created_at ASC LIMIT 100 FOR UPDATE Transaction 2 UPDATE `rpush_notifications` SET processing = 0, delivered = 1, delivered_at = '2015-08-21 16:48:10' WHERE `rpush_notifications`.`id` IN (647770) Root causeThe counterintuitive part of transaction1 is that, In Repeated read isolate level, Innodb engine will lock all rows which satisfy 'failed=0 and delivered=0' , it don't care the following conditions.
These conditions are filtered in MySQL sever level, not engine level. https://github.com/rpush/rpush/blob/master/lib/rpush/daemon/store/active_record.rb#L30 https://github.com/rpush/rpush/blob/master/lib/rpush/daemon/store/active_record.rb#L95 How to reproduce this bug in your localhost. (run it in rails c)Process A loop do
ActiveRecord::Base.connection.execute("SELECT `rpush_notifications`.* FROM `rpush_notifications` WHERE (processing = 0 AND delivered = 0 AND failed = 0 AND (deliver_after IS NULL OR deliver_after < '2015-10-21 17:47:32')) ORDER BY created_at ASC LIMIT 100 for update;")
end Process B Rpush::Notification.where(processing: true).find_each do |ntf|
id = ntf.id
ActiveRecord::Base.connection.execute("UPDATE `rpush_notifications` SET processing = 0, delivered = 1, delivered_at = '2015-08-21 16:48:10' WHERE `rpush_notifications`.`id` IN (#{id})")
sleep 0.5
end SolutionModify the following index to reduce lock rows when select * from * for update. DROP INDEX `index_rpush_notifications_multi` ON rpush_notifications;
CREATE INDEX index_rpush_notifications_multi ON rpush_notifications (delivered, failed, processing, deliver_after) |
Deadlock will raise an error, then interrupt the thread 'Feeder'. That's the reason why you rpush stop work. |
Many thanks @xiaoronglv for sharing your insights! I can confirm, that this solution works! Without adjusting the index, running the code from Process A and the rpush daemon at the same time, triggered plenty of deadlocks. After making the suggested adjustments, no more deadlocks occured. The index_rpush_notifications_multi index as created in
|
@xiaoronglv, @amaierhofer What is the impact on the performance when including all attributes in the index instead of using a partial index? |
It's a balance about reuse / index size / read and write. advantagefull Index covered query is more faster. disadvantage
|
Hello, I followed a different approach.
Hope it helps. But I guess that it only makes the deadlock less likely. If the |
Hi there, also fighting with similar strange issues under the high load, we noticed that a lot of notifications remain in 'processing' state forever and the number of such notifications increases time to time for example,
and we don't know why also, we sometimes see the following in rpush.log:
So looks like @xiaoronglv is right We are going to try his solution, will let you know We use the latest rpush 2.7.0 |
Hi @soulfly , how about the solution? any updates? |
@shawzt looks like it works better, |
@soulfly Good to know. I will apply the changes to the index in our new project. |
At the moment we have another version in-parallel of our project with Redis instead of ActiveRecord. |
@soulfly , how long you have been running it with Redis? |
@shawzt it's already ~3 or 4 months up and running |
@soulfly how did switching to redis affect message throughput? Do you have any numbers to share? We are currently pushing around 100k in 5 minutes with active record backend. |
@amaierhofer we do not have ready-to-share performance metrics |
1. Change the index on rpush_notifications to minimized number of locked records and pre-sort the records 2. Minimize the locking duration by moving the row dump code outside of the transaction fixes rpush#200
1. Change the index on rpush_notifications to minimized number of locked records and pre-sort the records 2. Minimize the locking duration by moving the row dump code outside of the transaction fixes #200
v3.1.1 Fixed - Database deadlock [rpush#200](rpush#200) (by [@loadhigh](https://github.com/loadhigh) in [rpush#428](rpush#428)) Enhancements - Change the index on `rpush_notifications` to minimize number of locked records and pre-sort the records ([rpush#428](rpush#428) by [@loadhigh](https://github.com/loadhigh)) - Minimize the locking duration by moving the row dump code outside of the transaction ([rpush#428](rpush#428) by [@loadhigh](https://github.com/loadhigh))
1. Change the index on rpush_notifications to minimized number of locked records and pre-sort the records 2. Minimize the locking duration by moving the row dump code outside of the transaction fixes rpush#200
1. Change the index on rpush_notifications to minimized number of locked records and pre-sort the records 2. Minimize the locking duration by moving the row dump code outside of the transaction fixes rpush#200
1. Change the index on rpush_notifications to minimized number of locked records and pre-sort the records 2. Minimize the locking duration by moving the row dump code outside of the transaction fixes rpush#200
Hi , I encounter a problem about rpush, and try to fix it. But I can't find the root cause of
The text was updated successfully, but these errors were encountered: