I've been looking into memory behavior in Rails workers. One thing I've noticed is that it's easy to end up instantiating tens of thousands of objects on the Ruby heap, even when find_each loads records in batches of 1000. Most of these objects appear to be highly redundant.
Consider loading 1000 instances of an AR model MyClass that has 20 database columns. At least 20 x 1000 strings are allocated, as measured by GC.start; ObjectSpace.count_objects[:T_STRING]. Digging deeper, each instance keeps an internal attributes hash in an instance variable, and the first key is typically the string "id". Each of those "id" strings is a separate object, as determined by object_id, even though all of them are frozen for use as hash keys.
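A minimal plain-Ruby sketch of that measurement, without ActiveRecord. It assumes (hypothetically) that every row arrives with its own freshly allocated copy of each column name, frozen before being used as a hash key. `build_rows`, `live_string_count`, and the `column_N` names are illustrative only, not Rails APIs.

```ruby
# Hypothetical schema: "id" plus 19 other columns, 20 in total.
COLUMN_NAMES = ["id"] + (1..19).map { |i| "column_#{i}" }
ROWS = 1000

# Same measurement as in the report: collect garbage, then count live String slots.
def live_string_count
  GC.start
  ObjectSpace.count_objects[:T_STRING]
end

# Simulates an adapter that hands back a fresh copy of every column name per row.
def build_rows(n)
  Array.new(n) do |row|
    attrs = {}
    COLUMN_NAMES.each_with_index do |name, col|
      key = String.new(name).freeze # distinct object for every row
      attrs[key] = row * 100 + col  # Integer values, so only the keys are strings
    end
    attrs
  end
end

before = live_string_count
rows = build_rows(ROWS)
after = live_string_count

# A frozen String key is stored in the hash as-is, so each row keeps its own "id".
distinct_id_keys = rows.map { |attrs| attrs.keys.first.object_id }.uniq.size
puts "live strings added: #{after - before}"
puts "distinct \"id\" key objects: #{distinct_id_keys}"
```

Every row retains its own 20 key strings, so the live string count grows by roughly rows x columns even though only 20 distinct names exist.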
Would it be possible to take advantage of the heavy duplication among the keys of this hash, so that thousands of unnecessary objects are not allocated every time a bulk query runs? Something like a StringPool, getting the column name directly from a lower layer, or using symbols might work.
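A minimal sketch of the StringPool idea, under the same hypothetical setup as above. `StringPool` and `intern` are illustrative names, not an existing Rails class. Each distinct column name maps to one canonical frozen string, and every row reuses that object as its key.

```ruby
# Hypothetical StringPool: one canonical frozen String per distinct column name.
class StringPool
  def initialize
    @pool = {}
  end

  # Returns the pooled frozen copy of +str+, creating it on first use.
  def intern(str)
    @pool[str] ||= str.dup.freeze
  end

  def size
    @pool.size
  end
end

COLUMN_NAMES = ["id"] + (1..19).map { |i| "column_#{i}" }
ROWS = 1000
POOL = StringPool.new

def build_rows(n, pool)
  Array.new(n) do |row|
    attrs = {}
    COLUMN_NAMES.each_with_index do |name, col|
      raw = String.new(name)                 # simulated fresh name from the adapter
      attrs[pool.intern(raw)] = row * 100 + col
    end
    attrs
  end
end

rows = build_rows(ROWS, POOL)
distinct_keys = rows.flat_map(&:keys).map(&:object_id).uniq.size
puts "pool size: #{POOL.size}, distinct key objects: #{distinct_keys}"
```

This only reduces the strings *retained* per row: the transient `raw` copy is still allocated and then collected, which is why taking the name from a lower layer (so no per-row copy exists at all) could save more. On newer Rubies, `String#-@` interns a string into the VM's own deduplication table and could play the same role as the hand-rolled pool.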
Also, please see the discussion here:
Please don't open issues for feature requests here. We try to use the issue tracker only for real issues and pull requests. We recommend opening a discussion on the Rails core mailing list.
Sorry, I don't understand. This is not a feature request.
For me it is not an issue. It is not something that we have to fix; it is a nice-to-have feature.
It's not a feature - it's a performance / memory optimization. And quite a significant one, I believe.
I've reopened it. But I still think this should not be in the issue tracker, since it is not a real blocker or something that we have to fix.
Is there a better place to track performance bugs? Or do they not get tracked?
Yes, this is a feature request.
There is actually a thread on rails-core about this right now.
Yes, I mentioned it in the report :)
I don't consider performance work to be a feature, but if that's the approach, good to know. Still curious how performance bugs are tracked.
@wlipa, if this is considered a performance regression, then it belongs here.
Let me explain what we are trying to avoid by closing issues like this. This kind of issue always stays in the tracker for months without any kind of answer. For us, it is better to receive a pull request with the fix, or to start a discussion on the mailing list to get someone to work on fixing these performance problems. This is why I closed this one.
I know that @jonleighton is interested in working on this one, and I'm also trying to find spots where we can improve performance, but we usually do this ad hoc. I already added your discussion to my TODO list when you started it.
As @steveklabnik said, we aren't tracking this kind of issue, but we are tracking major regressions. Still, I'll start a discussion with our team to see if we can improve the workflow for performance issues.
Thanks for the explanation. I'll keep an eye out for information on how better to handle performance bugs / problems in Rails.