What are 'best practices' for dealing with transient database errors (like Deadlocks and constraint violations) in a highly concurrent Rails system? #942
Imported from Lighthouse. Original ticket at: http://rails.lighthouseapp.com/projects/8994/tickets/6596
Question: what are 'best practices' for dealing with transient database errors (like Deadlocks and constraint violations) in a highly concurrent Rails system?
Thank you in advance to anyone reading this; I spoke with Yehuda Katz yesterday at #mwrc and he thought this was a deep enough question to open here. **Disclaimer:** *My understanding of the database adapters supported by Rails is limited to MySQL, so any statement below about database behavior should be read as MySQL behavior. We are also running Rails 2.3.9 and may be unfamiliar with new functionality available in 3.x.*
We have recently scaled our app horizontally using the 'shared nothing' architecture approach, and now that we have multiple servers (and background processes) acting concurrently on the database we are seeing several ActiveRecord::StatementInvalid exceptions caused by deadlocks and other constraint violations that are transient and thus retryable.
The MySQL team considers deadlocks transient, frequent, and not inherently dangerous in transactional databases, and says consuming applications should 'always be prepared to re-issue a transaction'; see http://dev.mysql.com/doc/refman/5.0/en/innodb-deadlocks.html
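Following that advice literally, the most direct approach is to re-issue the transaction itself. A minimal sketch of what we mean (the helper name, retry count, and message matching are our assumptions, not a Rails built-in; in Rails 2.3 a MySQL deadlock surfaces as an ActiveRecord::StatementInvalid whose message contains MySQL error 1213):

```ruby
DEADLOCK_RETRIES = 3

# Re-run the given block when MySQL reports a deadlock, per the InnoDB
# manual's advice to "always be prepared to re-issue a transaction".
def with_deadlock_retry(retries = DEADLOCK_RETRIES)
  attempts = 0
  begin
    yield
  rescue StandardError => e
    # MySQL error 1213: "Deadlock found when trying to get lock".
    # Anything else (or too many attempts) is re-raised untouched.
    raise unless e.message =~ /Deadlock found/ && (attempts += 1) < retries
    retry
  end
end

# Usage: wrap the whole transaction, so the retry re-issues all of it:
#   with_deadlock_retry { Account.transaction { debit!; credit! } }
```

The retry must wrap the entire transaction, not a single statement inside it: InnoDB rolls back the whole transaction on deadlock, so partial retries would silently lose the earlier statements.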
My team has been looking into this for several weeks and has made the following observations:
Example constraint violation
Here, we've implemented a unique index on the email column of the users table to guarantee data integrity under concurrent conditions, paired with an 'optimistic' validates_uniqueness_of validation in the model. In this scenario neither process finds a user matching the email attribute, so both attempt a create. The create runs the uniqueness validation in both processes, and both pass because neither row exists yet. The second INSERT then violates the unique index and raises an ActiveRecord::StatementInvalid.
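The race above can be simulated without a database. The sketch below is a toy model (class and method names are illustrative): both "processes" run the validation's SELECT before either INSERTs, so each one sees no existing row, and the unique index stops the second INSERT:

```ruby
# Stand-in for the duplicate-key error the unique index raises
# (in Rails 2.3 this surfaces as ActiveRecord::StatementInvalid).
class DuplicateKeyError < StandardError; end

# Toy model of the users table with a unique index on email.
class FakeUsersTable
  def initialize
    @emails = []
  end

  # What validates_uniqueness_of does: a SELECT before the INSERT.
  def exists?(email)
    @emails.include?(email)
  end

  # The INSERT, guarded by the unique index.
  def insert(email)
    raise DuplicateKeyError, "Duplicate entry '#{email}'" if exists?(email)
    @emails << email
  end
end

table = FakeUsersTable.new
email = "alice@example.com"

# Step 1: both requests validate; neither sees a row yet, so both pass.
valid_in_a = !table.exists?(email)
valid_in_b = !table.exists?(email)

# Step 2: both proceed to INSERT; the index rejects the second one.
raced = false
table.insert(email) if valid_in_a
begin
  table.insert(email) if valid_in_b
rescue DuplicateKeyError
  raced = true
end
```

The point is that validates_uniqueness_of alone cannot close this window; only the database-level constraint can, which is why the exception has to be handled rather than prevented.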
What we have done
Based on our observations, we believe that wholesale retry of requests, rather than of transactions, is the preferred approach. We believe controller actions should be written so that they never leave data in an inconsistent state; it should therefore be safe to retry any action in its entirety.
We know this is a hack. We know it increases network traffic and only works for HTML requests from browsers with JS enabled; any actions that respond to other formats are still hosed.
We originally attempted to use rescue_from to intercept the exception and re-invoke the controller action; however, this led to double-render errors and a myriad of other problems because instance variables had already been set or changed on the controller object.
We also tried using a background Net::HTTP call to issue a second request that the user wouldn't experience; however, this approach failed to supply the request context - the user's session and cookies weren't available, much less the request headers.
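For reference, one middleware-level variant of the whole-request retry idea avoids both problems above: a Rack middleware sits outside the controller, so each attempt re-dispatches the original env (session, cookies, and headers intact) against a freshly instantiated controller. This is only a sketch under our assumptions (class name, retry count, and deadlock-message matching are ours, not an existing Rails API), and it is only safe given the idempotent-action discipline described above; rewindable POST bodies are also required:

```ruby
# Rack middleware: any object responding to call(env) and returning
# a [status, headers, body] triple. No controller state survives a retry,
# because the whole downstream stack is re-invoked per attempt.
class RetryTransientErrors
  def initialize(app, retries = 2)
    @app = app
    @retries = retries
  end

  def call(env)
    attempts = 0
    begin
      @app.call(env)
    rescue StandardError => e
      # Retry only transient MySQL deadlocks; re-raise everything else.
      raise unless e.message =~ /Deadlock found/ && (attempts += 1) <= @retries
      retry
    end
  end
end

# In Rails 2.3 this would be registered in environment.rb, e.g.:
#   config.middleware.use RetryTransientErrors
```

The same shape could match duplicate-key messages as well, though retrying those is only sensible for actions written to behave idempotently on re-execution.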
What approach can you Rails geniuses come up with?