Reduced memory leak problem in transactions by lazily updating AR objects #9068

Merged
merged 1 commit into from Feb 20, 2013

Conversation

Projects
None yet
6 participants
Contributor

wangjohn commented Jan 25, 2013

This pull request concerns Issue #776.

I handle memory bloat by having the transaction hold only the AR objects which it absolutely needs to know about. These are the AR objects with callbacks (they need to be updated as soon as something in the transaction occurs).

All other AR objects can be updated lazily by keeping a reference to a TransactionState object. If an AR object is inside a transaction, then the transaction will add its TransactionState to the AR object. When the user makes a call to some attribute on an AR object (which has no callbacks) associated with a transaction, the AR object will call the sync_with_transaction_state method and make sure it is up to date with the transaction. After it has synced with the transaction state, the AR object will return the attribute that was requested.

Most of the logic in the changes are used to handle multiple transactions, in which case the AR object has to recursively follow parent pointers of TransactionState objects.

activerecord/lib/active_record/connection_adapters/abstract/transaction.rb
+ def rolledback?(counter)
+ if @states[counter]
+ return @states[counter] == :rolledback
+ end
@tenderlove

tenderlove Jan 25, 2013

Owner

I think you can simplify this function to just :rolledback == @states[counter]

activerecord/lib/active_record/core.rb
+ elsif transaction_state.rolledback?(count)
+ rolledback!
+ end
+ rescue => e
@tenderlove

tenderlove Jan 25, 2013

Owner

What happens if we just let this exception raise? Are there tests that depend on it?

activerecord/lib/active_record/core.rb
+
+ last_counter = transaction_state.num_states - 1
+ if state_counter < last_counter
+ (state_counter..last_counter).each do |count|
@tenderlove

tenderlove Jan 25, 2013

Owner

This creates a Range object that we don't need. I think you can change to something like this:

state_counter.upto(last_counter) do |count|
   ...
end

It may be off by one though, so you need to check.

activerecord/lib/active_record/connection_adapters/abstract/transaction.rb
@@ -116,7 +134,11 @@ def commit
end
def add_record(record)
- records << record
+ if !record._rollback_callbacks.empty? || !record._commit_callbacks.empty? || !record._create_callbacks.empty?
@tenderlove

tenderlove Jan 25, 2013

Owner

The opposite of xxx.empty? is xxx.any?. So you can change all of these empty?s to any? and remove the !.

Contributor

wangjohn commented Jan 26, 2013

I corrected the code based on your suggestions. Also, it seems that all transactions, once they are rolledback or committed, are never changed so that the list of states in the TransactionState class is unnecessary. I've run a pretty simple test where I do the following:

Topic.transaction do
while true
Topic.new
end
end

On my machine (ruby 1.9.3), the amount of memory seems to stabilize pretty quickly and I was able to created millions of objects without any marked increase in memory. Note that the Topic class does not have callbacks.

activerecord/lib/active_record/connection_adapters/abstract/transaction.rb
@@ -14,11 +14,17 @@ def state
end
class TransactionState
+ attr_reader :parent
@tenderlove

tenderlove Jan 28, 2013

Owner

Change this to attr_accessor.

activerecord/lib/active_record/connection_adapters/abstract/transaction.rb
+ end
+
+ def set_parent_state(parent)
+ @parent = parent
end
@tenderlove

tenderlove Jan 28, 2013

Owner

Is this supposed to set the parent, or the parent state? The method name is confusing.

activerecord/lib/active_record/core.rb
@@ -347,8 +347,52 @@ def slice(*methods)
Hash[methods.map { |method| [method, public_send(method)] }].with_indifferent_access
end
+ def add_transaction_state(state)
@tenderlove

tenderlove Jan 28, 2013

Owner

Change this to set_transaction_state

activerecord/lib/active_record/core.rb
+ return
+ end
+
+ if !@reflects_state[depth]
@tenderlove

tenderlove Jan 28, 2013

Owner

Change to unless @reflects_state[depth]

activerecord/lib/active_record/core.rb
+ @reflects_state[depth] = true
+ end
+
+ if !transaction_state.parent.nil? && !@reflects_state[depth+1]
@rafaelfranca

rafaelfranca Jan 28, 2013

Owner
if transaction_state.parent && !@reflects_state[depth+1]
activerecord/lib/active_record/core.rb
+ end
+
+ def update_attributes_from_transaction_state(transaction_state, depth)
+ if transaction_state.nil? || _rollback_callbacks.any? || _commit_callbacks.any? || _create_callbacks.any?
@rafaelfranca

rafaelfranca Jan 29, 2013

Owner

What about invert the conditional to avoid the short-circuit return?

if transaction_state || _rollback_callbacks.empty? || _commit_callbacks.empty? || _create_callbacks.empty?
  unless @reflects_state[depth]
    if transaction_state.committed?
      committed!
    elsif transaction_state.rolledback?
      rolledback!
    end

    @reflects_state[depth] = true
  end

  if transaction_state.parent && !@reflects_state[depth + 1]
    update_attributes_from_transaction_state(transaction_state.parent, depth + 1)
  end
end
@rafaelfranca

rafaelfranca Jan 29, 2013

Owner

Also would be great if you extracted this conditional to a method with a meaningful name.

activerecord/lib/active_record/connection_adapters/abstract/transaction.rb
@@ -116,7 +122,11 @@ def commit
end
def add_record(record)
- records << record
+ if record._rollback_callbacks.any? || record._commit_callbacks.any? || record._create_callbacks.any?
@rafaelfranca

rafaelfranca Jan 29, 2013

Owner

Maybe we can extract this conditional to a method like

record.has_transactional_callbacks?

Doing this we can reuse it on update_attributes_from_transaction_state

Contributor

wangjohn commented Jan 29, 2013

Thanks for all the comments! I've made changes so that my code reflects all of these. The tests still pass.

activerecord/lib/active_record/core.rb
+ @transaction_state = state
+ end
+
+ def has_transactional_callbacks?
@rafaelfranca

rafaelfranca Jan 29, 2013

Owner

Put # :nodoc: since this method should not be part of the public API

Owner

rafaelfranca commented Jan 29, 2013

Very good! 👍

Contributor

wangjohn commented Jan 29, 2013

Ok, nodocs have been added.

activerecord/lib/active_record/core.rb
+ end
+
+ def has_transactional_callbacks? # :nodoc:
+ _rollback_callbacks.any? || _commit_callbacks.any? || _create_callbacks.any?
@lexmag

lexmag Jan 30, 2013

Contributor

any? definitely not the same as ! + empty?

Contributor

wangjohn commented Jan 31, 2013

While its true that any? is not the same as !empty?, it doesn't matter because we are searching for non-nil callbacks from the _#{transactional}_callbacks stack. Technically, using any? is actually better than using !empty? because a nil callback should not set off the has_transaction_callbacks? method.

Contributor

lexmag commented Jan 31, 2013

_#{transactional}_callbacks could not contain nil object. Right? And ! + empty? has better performance.
There is example 2ff47c4

Contributor

wangjohn commented Feb 20, 2013

I've added a CHANGELOG entry for this PR, and have rebased it with master.

Owner

rafaelfranca commented Feb 20, 2013

It stills doesn't can be automatically merged

activerecord/lib/active_record/core.rb
+ committed!
+ elsif transaction_state.rolledback?
+ rolledback!
+ end
Reduced memory leak problem in transactions by lazily updating AR obj…
…ects with new transaction state. If AR object has a callback, the callback will be performed immediately (non-lazily) so the transaction still has to keep records with callbacks.

rafaelfranca added a commit that referenced this pull request Feb 20, 2013

Merge pull request #9068 from wangjohn/transaction_callback_patch
Reduced memory leak problem in transactions by lazily updating AR objects

@rafaelfranca rafaelfranca merged commit 82a5432 into rails:master Feb 20, 2013

@tenderlove @wangjohn why was this line removed? it introduced a regression. On 3.2 the follow script runs the callback, on 4.0 it does not.

require "active_record"

ActiveRecord::Base.establish_connection(adapter: "sqlite3", database: ":memory:")

ActiveRecord::Schema.define do
  create_table :posts
end

class Post < ActiveRecord::Base
  after_commit { puts "after commit" }
end

Post.transaction(joinable: false) do
  Post.create
end
Contributor

wangjohn replied Nov 19, 2013

@jonleighton Hmm, seems like this is actually a bug. This line should not have been removed, since the the transactional state is actually set in add_record.

I think we can add it back: I'll do that in a PR.

Contributor

wangjohn replied Nov 19, 2013

Sorry I take that back. The original intention of removing this line was that records that weren't already alive before the transaction were created wouldn't have a callback run on them.

It seems like that was a bad assumption to have. I was trying to make sure not all of the objects created in a transaction had strong references to them so that they could potentially get garbage collected.

I'm going to look into this more, because it seems like just adding this line back in doesn't actually solve the problem.

Member

jonleighton replied Nov 20, 2013

@wangjohn Thanks for looking into it. Your patch achieves that for records for which has_transactional_callbacks? returns false. For records that do have transactional callbacks, it's not going to be possible without using weak references in Ruby >= 2. But either way I don't think adding this line back in will affect that, unless I am missing something?

Contributor

wangjohn replied Nov 20, 2013

@jonleighton No you're right, adding that line back in won't fix it.

I think the original reason this architecture was chosen was because weak refs in ruby < 1.9 used a very slow _id2ref implementation. Now that Rails 4.0 has higher Ruby version requirement, maybe we should look into weak refs?

Do you have any idea of how prevalent this use case is?

Member

jonleighton replied Nov 20, 2013

@wangjohn the main thing is that transactions now don't leak for objects which don't use transactional callbacks. that's definitely going to be the most prevalent use case. When I chatted about this with @tenderlove last he didn't mind that it leaks for records that do use transactional callbacks. Using weakrefs where supported could be a nice enhancement but I don't think it's essential. Either way it would be good to fix this regression.

chancancode added a commit to chancancode/rails that referenced this pull request May 16, 2014

`TransactionState` is internal API, so added :nodoc:
This was introduced in 26853e8 / #9011. Its main purpose is to flip the
reference from `Transaction` -> AR objects to AR objects -> `TransactionState`.

This method this was extracted from originally from was a private API, and there
are no other public APIs to make this accessible to the user, so there is no
reason for this class to be public.

See also 67d8bb9 / #9068.

arthurnn added a commit to arthurnn/rails that referenced this pull request Jan 22, 2015

Use WeakRef to store records on transactions.
In order to restore state on records we need to store all records
touched in the transaction. However if we store all records, we will be
holding a hard reference to them, not allowing them to be garbage
collected. #9068 kinda solved the GC issue inversing the dependency.
As we are on ruby 2.2+ we can use WeakRef, and wont need to inverse that
depedency to restore state.

[fixes #15549] - Because records with a 'create' callback were still
been stored, and memory grow was still a problem for those.

arthurnn added a commit to arthurnn/rails that referenced this pull request Jan 22, 2015

Use WeakRef to store records on transactions.
In order to restore state on records we need to store all records
touched in the transaction. However if we store all records, we will be
holding a hard reference to them, not allowing them to be garbage
collected. #9068 kinda solved the GC issue inversing the dependency.
As we are on ruby 2.2+ we can use WeakRef, and wont need to inverse that
depedency to restore state.

[fixes #15549] - Because records with a 'create' callback were still
been stored, and memory grow was still a problem for those.

arthurnn added a commit to arthurnn/rails that referenced this pull request Jan 22, 2015

Use WeakRef to store records on transactions.
In order to restore state on records we need to store all records
touched in the transaction. However if we store all records, we will be
holding a hard reference to them, not allowing them to be garbage
collected. #9068 kinda solved the GC issue inversing the dependency.
As we are on ruby 2.2+ we can use WeakRef, and wont need to inverse that
depedency to restore state.

[fixes #15549] - Because records with a 'create' callback were still
been stored, and memory grow was still a problem for those.

arthurnn added a commit to arthurnn/rails that referenced this pull request Jan 31, 2015

Use WeakRef to store records on transactions.
In order to restore state on records we need to store all records
touched in the transaction. However if we store all records, we will be
holding a hard reference to them, not allowing them to be garbage
collected. #9068 kinda solved the GC issue inversing the dependency.
As we are on ruby 2.2+ we can use WeakRef, and wont need to inverse that
depedency to restore state.

[fixes #15549] - Because records with a 'create' callback were still
been stored, and memory grow was still a problem for those.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment