Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

Data corruption risk: Roll back open transactions when the running thread is killed. #13656

Merged
merged 1 commit into from Sep 13, 2014

Conversation

Projects
None yet
9 participants
Contributor

chanks commented Jan 9, 2014

This is a fix for this issue:

http://coderrr.wordpress.com/2011/05/03/beware-of-threadkill-or-your-activerecord-transactions-are-in-danger-of-being-partially-committed/

In a nutshell: When the Ruby process exits, Thread#kill is used to end all threads that are still running (aside from the main one). Thread#kill runs ensure blocks but not rescue blocks, which means that if one of those threads is in the middle of a transaction, it will be prematurely committed rather than rolled back, which is bad bad news.

Luckily, Ruby 2.0+ sets a thread's status to "aborting" when it is killed, so it's a simple fix to check for it in the ensure block. The problem is not fixable on Ruby 1.9, so the spec doesn't run there.

Member

senny commented Jan 10, 2014

Owner

tenderlove commented Jan 10, 2014

Does this actually fix it? We can't predict when the thread's status will change to aborting, so it's possible the thread could be killed inside the ensure block while committing the transaction. I haven't examined this problem enough to know if that's OK.

/cc @headius

Contributor

chanks commented Jan 10, 2014

If we reach the ensure block without incident, then the thread has already successfully made all its changes to the database. If the thread is killed in the ensure block after the thread status check, it'll still commit, which is fine. If it's killed in the ensure block before the thread status check, it'll be rolled back needlessly, which is unfortunate, but not catastrophic (or avoidable).

Contributor

headius commented Jan 11, 2014

@chanks That's not quite true.

The thread to be killed could have just entered the ensure block and not yet run the commit logic. When that thread gets killed, the ensure block will terminate and the commit will never happen.

The kill could also happen during the commit logic, causing some sort of corruption or partial commit if it's not done atomically.

You have to assume the kill could come in at any time in that logic. It could happen before the thread status check, after the status check but before the commit, during the commit, and so on.

I believe Ruby 2.1 has started to add features to prevent kill/raise from reaching a thread within certain user-specified scopes, but it's not a great solution. The ability to arbitrarily kill threads in Ruby is still a major flaw.

Contributor

chanks commented Jan 11, 2014

Ah, I was assuming that since Thread#kill still executed ensure blocks, it wouldn't interrupt an ensure block that it occurred inside.

But even so, the important thing is that a COMMIT isn't issued when the transaction has been interrupted, and I don't see how that could happen with the code as patched. If it hits around the status check, commit_transaction won't be called. And if the process ends and the connection closes without a COMMIT, the database should roll it back for us (at least I know Postgres will, I'm not sure about other RDBMSes).

And if it hits while we're in the process of executing COMMIT, then it may or may not execute, but either way, we finished the transaction block without incident, so all of the changes to the database have been made successfully, and either committing or rolling back wouldn't be catastrophic.

@chanks chanks referenced this pull request in chanks/que Jan 29, 2014

Closed

@attrs[:run_at] error #24

Contributor

chanks commented Feb 12, 2014

@tenderlove Bumping because I feel it's important to get this into 4.1.0. Even though removing Thread#kill in Ruby might be a more perfect solution, this is still a vast improvement over the current situation, and I believe it is the best we can do until Ruby is fixed.

Owner

tenderlove commented Feb 12, 2014

I'm uneasy about this, so I'm deferring to @jeremy.

Owner

jeremy commented Feb 12, 2014

Would make more sense to nest the begin/rescue scopes anyway. Get rid of the ensure. The rescue shouldn't handle begin_transaction exceptions. Roughly:

txn = begin_transaction
begin
  yield
rescue Exception
  rollback_transaction txn
  raise
else
  commit_transaction txn
end
Contributor

chanks commented Feb 13, 2014

@jeremy Neat, I didn't know you could follow a rescue with an else like that. Unfortunately, it doesn't seem to be executed in the event of a return from inside the transaction block like the ensure is.

@chanks chanks added a commit to chanks/rails that referenced this pull request Feb 19, 2014

@chanks chanks Don't let within_new_transaction catch exceptions raised by begin_tra…
…nsaction.

Per discussion at rails#13656 (comment)
1956940
Contributor

chanks commented Feb 19, 2014

@jeremy I went ahead and made sure that exceptions from begin_transaction wouldn't be caught, but I don't believe there's any way to remove the ensure block without breaking this behavior.

Contributor

chanks commented Mar 24, 2014

@jeremy Bumping this again, I feel it's important to get this into 4.1.0.

Owner

rafaelfranca commented Mar 24, 2014

To be included in 4.1.0 at this time it needs to be a regression between 4.0 and 4.1, but this doesn't seems the case. It will be included in 4.1.1 thought.

Contributor

chanks commented Mar 24, 2014

@rafaelfranca Ok, thanks!

Contributor

joevandyk commented Aug 14, 2014

@rafaelfranca Did this make it in 4.1.1?

Owner

rafaelfranca commented Aug 14, 2014

No, it was not even merged yet.

@rafaelfranca rafaelfranca added this to the 4.1.5 milestone Aug 14, 2014

Contributor

joevandyk commented Aug 14, 2014

@rafaelfranca thanks!

Member

arthurnn commented Aug 15, 2014

not sure if this should make to master at all.
This is a issue indeed, but not much something we can solve in Rails side... as @headius described, the thread kill could happen at any time, so I am 👎 in polluting the code to try to catch all the cases.

Contributor

chanks commented Aug 15, 2014

It's currently impossible in Ruby to guarantee that a Thread#kill inside an ensure block won't mess up what's going on in the ensure block.

It is possible to guarantee that Thread#kill anywhere else in the code won't prematurely commit your transaction and result in data corruption. That's what this patch does.

This is a 99% solution to the problem. The last 1% is up to Ruby to fix, but don't throw out a 99% solution because it's not a 100% solution.

Contributor

joevandyk commented Aug 19, 2014

Why is ActiveRecord committing the transaction inside an ensure block in the first place?

Contributor

chanks commented Aug 19, 2014

@joevandyk It's so that this commits instead of rolling back:

ActiveRecord::Base.transaction do
  # do stuff
  return
end

@chanks chanks referenced this pull request in chanks/que Aug 19, 2014

Closed

Add configuration option for maximum job run time #51

Contributor

joevandyk commented Aug 19, 2014

Why would the commit need to be inside the ensure block? Seems like it could be something like

class ActiveRecord
  def transaction(&block)
    execute "begin"
    yield
    execute "commit" if not_committed?
  rescue
    # whatever
  ensure
    # no need to commit in here?
  end
end
Contributor

chanks commented Aug 19, 2014

The return inside the block doesn't just end the block, it jumps back in the call stack to return from the current method immediately. Only ensure blocks are still run.

Contributor

chanks commented Aug 22, 2014

Can anyone point out a line in the method I patched (or anywhere else in the Rails codebase, for that matter) where a Thread#kill could strike and still cause a "COMMIT" to be issued prematurely? I don't believe there is one. If it strikes before the ensure block begins, the thread status-checking logic issues a rollback. If it strikes inside the ensure block, either a "COMMIT" is issued (as is proper) before it strikes, or all that logic is skipped and neither "COMMIT" nor "ROLLBACK" is issued. Then the connection dies and the transaction is thrown out (also acceptable). I don't see a chain of events wherein a Thread#kill still results in a prematurely committed transaction.

Owner

matthewd commented Aug 22, 2014

@chanks agree.

Can you please rebase this? I think we can lose the if transaction with your rearrangement, and it's probably clearer if we keep the second rescue closer to the commit_transaction.

Throw in a changelog entry, and I think this is good to go.

Contributor

chanks commented Aug 22, 2014

@matthewd Done!

@matthewd matthewd merged commit 997a621 into rails:master Sep 13, 2014

1 check failed

continuous-integration/travis-ci The Travis CI build could not complete due to an error
Details

@matthewd matthewd added a commit that referenced this pull request Sep 13, 2014

@matthewd matthewd Merge pull request #13656 from chanks/rollback_transactions_in_killed…
…_threads

Data corruption risk: Roll back open transactions when the running thread is killed.
1d4d15a
Owner

matthewd commented Sep 13, 2014

There's definitely room for improvement from ruby core here: ideally, we'd have some single place that would tell us, from inside an ensure, what's currently unwinding the stack:

  • end of begin block reached
  • non-local return
  • exception
  • throw
  • thread killed
  • .. others?

With the new aborting status, we're up to being able to detect three of those. 😐

Anyway, this certainly improves things. Thanks for persevering! ❤️ ❤️

@chanks chanks deleted the chanks:rollback_transactions_in_killed_threads branch Sep 13, 2014

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment