Excessive redundant object allocation in ActiveRecord #7629

Closed
wlipa opened this Issue Sep 13, 2012 · 11 comments

3 participants

@wlipa

I've been looking into memory behavior in Rails workers. One thing I've noticed is that it's easy to instantiate multiple tens of thousands of objects on the Ruby heap even with find_each operating in batches of 1000. Most of these objects appear to be highly redundant.

Consider loading 1000 instances of an AR object MyClass which has 20 database fields. There will be at least 20 x 1000 strings allocated, as measured by GC.start; ObjectSpace.count_objects[:T_STRING]. Digging deeper, it looks like each instance has an internal attributes hash in an instance variable. The first key is typically the string "id". Each "id" string is an individual object, as determined by object_id, even though all of these strings are frozen for use as hash keys.
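
The duplication described above can be reproduced without a database. The following is a minimal sketch, not ActiveRecord's actual code path: `COLUMNS` and `build_rows` are hypothetical names, and the fresh `String.new` copy stands in for a driver that returns a new string per column name per row.

```ruby
# Hypothetical stand-in for column names coming back from a database driver.
COLUMNS = %w[id name email].freeze

# Builds `count` attribute hashes, each with its own copies of the key strings.
def build_rows(count)
  Array.new(count) do |i|
    attrs = {}
    COLUMNS.each do |col|
      # String.new copies the name, so every row gets a separate key object.
      # Freezing it first means Hash#[]= stores this exact object as the key.
      attrs[String.new(col).freeze] = i
    end
    attrs
  end
end

rows = build_rows(1000)
id_keys = rows.map { |attrs| attrs.keys.first }

# Every key equals "id", yet each one is a distinct object on the heap.
distinct = id_keys.map(&:object_id).uniq.size
puts "rows: #{rows.size}, distinct \"id\" key objects: #{distinct}"
```

With 20 columns instead of 3, the same pattern accounts for the 20 x 1000 key strings mentioned above.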

Would it be possible to take advantage of the very large amount of duplication in the keys of this hash to save thousands of unnecessary objects from being allocated every time a bulk query is run? Maybe something like a StringPool, or getting the column name directly from a lower layer, or using symbols would work.
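
One way the StringPool idea might look is sketched below. This is an illustration under stated assumptions, not an existing Rails class: `StringPool`, `intern`, and `build_pooled_rows` are hypothetical names, and the `String.new` call again simulates a driver handing back a fresh string for each column name.

```ruby
# Hypothetical helper: hands out one canonical frozen String per distinct value.
class StringPool
  def initialize
    @pool = {}
  end

  # Returns the pooled copy of str, storing a frozen duplicate the first time
  # a given value is seen.
  def intern(str)
    @pool[str] ||= str.dup.freeze
  end

  # Number of distinct strings currently held by the pool.
  def size
    @pool.size
  end
end

# Builds `count` attribute hashes whose keys all come from the shared pool.
def build_pooled_rows(count, columns, pool)
  Array.new(count) do |i|
    attrs = {}
    columns.each do |col|
      raw = String.new(col)        # simulated fresh string from the driver
      attrs[pool.intern(raw)] = i  # the frozen pooled key is stored as-is
    end
    attrs
  end
end

pool = StringPool.new
rows = build_pooled_rows(1000, %w[id name email], pool)
distinct = rows.map { |attrs| attrs.keys.first.object_id }.uniq.size
puts "distinct \"id\" key objects: #{distinct}, pooled strings: #{pool.size}"
```

In this sketch the temporary `raw` strings are still allocated, but they become garbage immediately; only one key object per column name is retained. On Ruby 2.5 and later, the built-in unary minus (`-str`) offers similar deduplication of frozen strings without a hand-rolled pool.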

Also please see discussion here:
https://groups.google.com/d/topic/rubyonrails-core/jFlXnFA4rP8/discussion

@rafaelfranca
Ruby on Rails member

Please don't open issues for feature requests here. We are trying to use the issue tracker only for real issues and pull requests. We recommend opening a discussion on the Rails core mailing list.

@wlipa

Sorry, I don't understand. This is not a feature request.

@rafaelfranca
Ruby on Rails member

For me it is not an issue. It is not something that we have to fix. It is a nice-to-have feature.

@rafaelfranca reopened this Sep 13, 2012

@wlipa

It's not a feature - it's a performance / memory optimization. And quite a significant one, I believe.

@rafaelfranca
Ruby on Rails member

I reopened it, but I still think this should not be in the issue tracker, since it is not a real blocker or something that we have to fix.

@wlipa

Is there a better place to track performance bugs? Or do they not get tracked?

@steveklabnik
Ruby on Rails member

Yes, this is a feature request.

There is actually a thread on rails-core about this right now.

@wlipa

Yes, I mentioned it in the report :)

I don't consider performance work to be a feature, but if that's the approach, good to know. Still curious how performance bugs are tracked.

@steveklabnik
Ruby on Rails member
@rafaelfranca
Ruby on Rails member

@wlipa if this is considered a performance regression bug, then it should be here.

Let me explain what we are trying to avoid by closing issues like this. This kind of issue always stays in the tracker for months without any kind of answer. For us it is better to receive a pull request with the fix, or to start a discussion on the mailing list to get someone to work on these performance problems. This is why I closed this one.

I know that @jonleighton is interested in working on this one, and I'm also trying to find spots where we can improve performance, but we usually do this ad hoc. I already added your discussion to my TODO list when you started it.

As @steveklabnik said, we aren't tracking this kind of issue, but we are tracking major regressions. I'll start a discussion with our team to see if we can improve the workflow for performance issues.

@wlipa

Thanks for the explanation. I'll keep an eye out for information on how better to handle performance bugs / problems in Rails.
