
Use pessimistic locking to prevent calling destroy callbacks #7965

Closed
wants to merge 1 commit

Conversation

@bdurand (Contributor) commented Oct 16, 2012

This change adds a pessimistic lock attempt before calling destroy callbacks on an ActiveRecord model. It is a workaround for a race condition in which a record is destroyed by concurrent processes: the record will only be deleted once, but the callbacks based on that deletion will be called multiple times. This can corrupt logic like the association counter cache, which is invoked by a callback.

The change adds a select call with lock => true before destroying so the database can lock the row (if it supports this mechanism). If the record no longer exists, the callbacks are not called. The rest of the destroy chain still runs so that the record remains in a consistent state.
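In rough outline, the idea looks like this (a simplified sketch, not the literal patch; Post is a stand-in model):

```ruby
class Post < ActiveRecord::Base
  def destroy
    self.class.transaction do
      # SELECT ... FOR UPDATE: blocks until any concurrent destroyer commits.
      row = self.class.unscoped.lock(true).where(:id => id).first
      if row
        super # the row is still there: run callbacks and delete as usual
      else
        # the row is already gone: still issue the (no-op) delete so the
        # object ends up in a consistent destroyed state, but skip callbacks
        self.class.delete(id)
        freeze
      end
    end
  end
end
```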

… times in a race condition when deleting records.
@jeremy (Member) commented Oct 16, 2012

Good thinking. Wonder about the tradeoffs, though: every destroy is now taking an exclusive lock. Seems like we'd be introducing a whole new slew of app slowdowns and deadlocks.

@NZKoz (Member) commented Oct 16, 2012

Yeah, to me this is forcing a situation on users who otherwise don't care. Much of the time the actions taken by a destroy are basically idempotent, so having a few run in quick succession is unlikely to cause issues.

People who have applications where that's not the case can call lock themselves, right?

@bdurand (Contributor, Author) commented Oct 16, 2012

I don't think it's really that big of a change. The destroy action already happens inside a database transaction, and the DELETE SQL will lock the row being deleted anyway. This would just move the lock up to before the before_destroy callbacks. It could increase the length of time a row is locked if there are significant before_destroy callbacks, and you could in theory cause a deadlock. Of course, you have the same risks now if you have any after_destroy callbacks.

(see full discussion at https://groups.google.com/forum/?fromgroups=#!topic/rubyonrails-core/cl20eykvNsQ)

@NZKoz (Member) commented Oct 17, 2012

Don't confuse explicit locks with database transactions: both transactions can issue the DELETE FROM statement and receive no errors (as evidenced by the fact that your code does this now). With locks in place, however, the second process would likely raise a RecordNotFound or some other error when attempting to destroy the object.

This adds an additional error case that, at present, people don't have to handle.

While it's an important change for your use case, and for anyone else using counter caches or other non-idempotent destroy callbacks like this, there are large numbers of people who can safely issue the destroys in parallel because the operations are completely idempotent and the database just does nothing.

You can work around this problem by using find(params[:id], :lock=>true) in your destroy action, that seems like a good enough option to me?
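For illustration, that workaround in a controller might look like this (a sketch only; PostsController and Post are placeholders):

```ruby
class PostsController < ApplicationController
  def destroy
    Post.transaction do
      # SELECT ... FOR UPDATE: a concurrent destroy either waits here or has
      # already committed, in which case find raises RecordNotFound.
      @post = Post.find(params[:id], :lock => true)
      @post.destroy
    end
    redirect_to posts_path
  end
end
```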

@bdurand (Contributor, Author) commented Oct 18, 2012

Unless I'm missing something, there shouldn't be any new error condition introduced. The database lock is accomplished with a relation finder rather than the lock! method in pessimistic.rb, for just the reason you mention (as well as the fact that I didn't want to potentially alter the object being destroyed, since that could have side effects; I also fell back to executing the raw SQL from the finder so that no after_find callbacks that could have side effects would be called).

My point about the transaction is that (on database engines that support them) the row should already be locked after the DELETE statement executes and remain locked until the transaction is committed or rolled back. The additional logic won't add any row locks that aren't already part of the transaction; what it will do is move them up to occur before the before_destroy callbacks are executed. See this graphic for a timeline of when the locks occur: https://www.evernote.com/shard/s146/sh/c9fef20a-c5c0-49cc-a8d9-6fd1aa110af8/3ae7ff1809ff17295e33ecd5ab4db55c.

In the current situation, when two processes try to destroy the same record, the first one to reach the DELETE statement locks the row. The second waits for the row to be released and then runs the same DELETE statement, which does nothing. With the change I am proposing, the first process to reach the SELECT ... FOR UPDATE statement locks the row. The second waits for the lock to be released and then discovers that the record no longer exists. It will still run the DELETE statement (again, to reduce the risk of breaking existing applications); what it will not do is run the destroy callbacks (which could be a big change for some applications).

I've written a script, runnable against both MySQL and PostgreSQL, that shows what happens when two threads attempt to destroy the same record concurrently. The script can be found here: https://gist.github.com/3913338. Sample output is here: https://gist.github.com/3913376.
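In spirit, the test does something like this (my sketch, not the exact script; Widget is a stand-in model):

```ruby
# Two threads race to destroy the same row; with the patch, the destroy
# callbacks fire in exactly one of them.
widget = Widget.create!
threads = 2.times.map do
  Thread.new do
    ActiveRecord::Base.connection_pool.with_connection do
      begin
        Widget.find(widget.id).destroy
      rescue ActiveRecord::RecordNotFound
        # the slower thread may not see the row at all
      end
    end
  end
end
threads.each(&:join)
```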

I think the real issue is whether this is the correct behavior (i.e. not running destroy callbacks if the record was not deleted). I wouldn't necessarily want to slip this into a minor point release of the framework, because it could have side effects, but I think adding it in a major release is warranted to protect data integrity.

@NZKoz (Member) commented Oct 19, 2012

You're forgetting that you can issue DELETE FROM foos WHERE id = N millions of times without error, but you can't issue the SELECT ... FOR UPDATE after the delete's been issued.

If we were to fix your underlying issue, it'd be best to silently skip the callbacks when the DELETE statement didn't affect any rows, ala optimistic locking:

https://github.com/rails/rails/blob/master/activerecord/lib/active_record/locking/optimistic.rb#L93-97
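A destroy-side analog of that affected-rows check might look like this (illustrative only; destroy_row and @record_destroyed are made-up names, not existing Rails API):

```ruby
class Comment < ActiveRecord::Base
  # Record whether the DELETE actually removed a row, mirroring the
  # affected-rows check optimistic locking performs on update.
  def destroy_row
    affected_rows = self.class.unscoped.where(:id => id).delete_all
    @record_destroyed = affected_rows > 0 # for after_destroy logic to consult
    affected_rows
  end
end
```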

@bdurand (Contributor, Author) commented Oct 19, 2012

I'm still not understanding why you can't issue a SELECT ... FOR UPDATE statement after a DELETE statement. After the row has been deleted, it is simply a query that returns zero rows (and hence locks zero rows). There is no error condition for selecting zero rows for update, either in the database or in the Ruby code.

I've tested this a multitude of ways (inside a transaction, outside a transaction, and with transactions disabled) and cannot produce an error. Do you have some code that shows the error?

My first inclination was to use an optimistic scheme to ignore the callbacks. That would only affect the after_destroy callbacks, not before_destroy and around_destroy, so it wouldn't be a complete solution; I'd only want to fall back to it if this solution does not work. It would also require moving the counter cache callback from a before_destroy to an after_destroy, which could also have an impact on some application code.

@NZKoz (Member) commented Oct 20, 2012

OK, you've convinced me that I'm over-reacting about the risk of errors. However, it still feels wrong to me that a record being destroyed wouldn't execute the before_destroy callbacks. We don't do these queries when saving a record, when it could just as easily have been deleted out from underneath the user.

The AR approach has always been: if you need pessimistic database-level locking, you have to request it explicitly, and if you don't use pessimistic locking, you accept the risk that this kind of thing happens. For instance, last write wins for updates; we don't do a secret hidden FOR UPDATE query there. If that matters, you use optimistic or pessimistic locking.

As for the counter caches, I think you could make a reasonable case that they should be after_destroy callbacks, and that after_destroy callbacks shouldn't fire if the record wasn't actually destroyed. You could accomplish that just by looking at the number of rows affected. We could even have record_destroyed? as a predicate method that lets users opt in to this behaviour for their own after_destroy callbacks. That seems like a fine change for a 4.0 release to me.
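For example (a hypothetical sketch; record_destroyed? is the proposed predicate, and Post/Comment are stand-in models):

```ruby
class Comment < ActiveRecord::Base
  belongs_to :post

  after_destroy do |comment|
    if comment.record_destroyed? # true only when the DELETE removed a row
      Post.decrement_counter(:comments_count, comment.post_id)
    end
  end
end
```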

@bdurand (Contributor, Author) commented Oct 22, 2012

I coded up a couple of less aggressive alternatives to this pull request.

The first introduces an obtain_lock method which implements a lightweight lock mechanism (i.e. it doesn't alter the record like the lock! method does, so it is more suitable for a callback). The lock is then requested by the counter cache before_destroy callback, so it only affects applications using that feature.

bdurand@db74fcd
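Roughly, the helper has this shape (a sketch of the idea, not the exact commit code):

```ruby
# Lock the row via raw SQL (SELECT ... FOR UPDATE) so the record is never
# reloaded and no after_find callbacks fire; returns whether the row exists.
def obtain_lock
  sql = self.class.unscoped.select(:id).lock(true).where(:id => id).to_sql
  !self.class.connection.select_value(sql).nil?
end
```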

The second switches the counter cache callback to after_destroy and adds a row_deleted? method.

bdurand@fc2b863

I prefer the first approach since it feels like less of a hack, and it also provides a hook application developers can use for before_destroy callbacks, not just after_destroy (it would work for before_update as well). I could also see a case being made for including both methods.

@bdurand (Contributor, Author) commented Nov 1, 2012

Any feedback on the approaches in either of these commits and which would be preferred?

bdurand@db74fcd
bdurand@fc2b863

@stouset (Contributor) commented Mar 20, 2013

The second approach looks "safer", but it doesn't prevent non-idempotent before_destroy callbacks from being run, does it?

Obviously the counter_cache issue should be solved on Rails' end (since it's not idempotent), but maybe a simple halfway solution here is just to note in the documentation that callbacks should be idempotent?

@jaggederest (Contributor) commented

This ticket is pretty stale; is anyone still interested in this issue? Does counter_cache still behave badly in the context of multiple simultaneous destroys?
