type_map in Postgres causes memory bloat with large number of pg schemas #19578

Closed
bradrobertson opened this issue Mar 29, 2015 · 3 comments
@bradrobertson
Contributor

I've set up a sample repo that best illustrates the issue.

If an app uses many Postgres schemas, ActiveRecord caches type mappings in memory for all of these schemas, which leads to memory bloat. This is the case in a multi-tenant setup, like the one used by the apartment gem.

You can see the results of these mappings here.

The source of the problem seems to come from the load_additional_types call inside initialize_type_map.

You'll notice in the results sections that the *wo_additional_types files show considerably lower memory usage. I haven't fully traced down exactly where these additional types are used. The only place I can see that does a lookup by oid when the type is unknown is exec_query, but I'm not quite sure how that differs from just query or exec.

From what I can see, the pg_type table that these additional types are loaded from contains rows that look like:

3189159,"_foos_16",3189160,",","array_in",,"b",0

(given the sample app mentioned above).

Again I haven't fully grasped what exactly all this is used for, but I'm wondering if this query can be tightened up a bit to handle this situation, given that those mappings don't actually appear to be useful/used (to my untrained eye).

In production, we're seeing memory usage of ~300+ MB PER CONNECTION which is held onto from the moment the thread connects to the db until it's shut down. This is particularly troubling when we run Sidekiq with 30+ threads.

I'm reaching out for advice or tips on how to make this better as this is about the end of my knowledge of the situation. I'm happy to submit a PR given some guidance on the best way to fix this (if that's possible).

Many Thanks

@sgrif sgrif self-assigned this Mar 29, 2015
@sgrif sgrif added this to the 4.2.2 milestone Mar 29, 2015
@sgrif
Contributor

sgrif commented Mar 29, 2015

> I haven't fully traced down exactly where these additional types are used.

Whenever any query is executed, we'll be grabbing the appropriate type object based on the OID of the resulting column's type. This is the core of how Active Record's type system and automatic schema detection work. In general, we want to preload all of these in one go, so we can avoid expensive checks per query. (The alternative would be an additional iteration through the result set, which we don't want, or potentially an N+1 queries problem in Active Record itself, which we really don't want.)
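To make the lookup-by-OID idea concrete, here is a minimal sketch. It is not Active Record's actual implementation: `SketchType`, `SketchTypeMap` and `cast_row` are hypothetical names invented for illustration. The OIDs 23, 16 and 25 are the standard PostgreSQL OIDs for int4, bool and text.

```ruby
# Hypothetical stand-in for a type object; not Active Record's real class.
SketchType = Struct.new(:name) do
  # Convert the raw string returned by the database into a Ruby value.
  def cast(raw)
    case name
    when "int4" then Integer(raw)
    when "bool" then raw == "t"
    else raw
    end
  end
end

# Hypothetical OID-keyed registry, filled once and then only read.
class SketchTypeMap
  def initialize
    @by_oid = {}
  end

  # Preload every (oid, typname) pair in one pass, e.g. once per connection.
  def preload(rows)
    rows.each { |oid, typname| @by_oid[oid] = SketchType.new(typname) }
    self
  end

  # Per-column lookup is a single hash access; unknown OIDs fall back
  # to a pass-through type instead of triggering another query.
  def lookup(oid)
    @by_oid.fetch(oid) { SketchType.new("unknown") }
  end
end

SKETCH_TYPE_MAP = SketchTypeMap.new.preload([[23, "int4"], [16, "bool"], [25, "text"]])

# Cast one result row using the OID of each column.
def cast_row(column_oids, raw_values)
  column_oids.zip(raw_values).map { |oid, raw| SKETCH_TYPE_MAP.lookup(oid).cast(raw) }
end
```

Because the map is filled up front, casting a row costs one hash lookup per column, which is the trade-off described above: more work at connection time, none per query.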

> In production, we're seeing memory usage of ~300+ MB PER CONNECTION

So as you can tell from your output, we're not actually retaining much memory. Just allocating it. However, the root of the problem is likely that the result object created here is never being cleared, and I'm guessing that the memory_profiler gem isn't picking up on that because it's coming from C. Clearing the result set should resolve the problem.
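A rough sketch of what "clearing the result set" can look like, assuming the result object responds to `clear` (as `PG::Result` from the pg gem does). The `with_cleared_result` helper and the `FakeResult` class are hypothetical and exist only so the example runs without a database; this is not the code from the actual fix.

```ruby
# Run the block with a result object and always release it afterwards,
# even if the block raises. With the pg gem, `result` would be a PG::Result.
def with_cleared_result(result)
  yield result
ensure
  result.clear
end

# Hypothetical stand-in for a database result, so the sketch is runnable.
class FakeResult
  attr_reader :cleared

  def initialize(rows)
    @rows = rows
    @cleared = false
  end

  def each(&block)
    @rows.each(&block)
  end

  # Drop the row data, mimicking the release of the underlying memory.
  def clear
    @rows = nil
    @cleared = true
  end
end

fake_result = FakeResult.new([{ "oid" => "23", "typname" => "int4" }])
loaded_names = []
with_cleared_result(fake_result) do |res|
  res.each { |row| loaded_names << row["typname"] }
end
```

Putting the `clear` call in `ensure` means the result is released on the error path too, so a failure while registering types does not leave the data attached to the connection.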

That said, looking into this also revealed several pieces of ridiculously low-hanging fruit with regard to the performance of this code (since this basically runs once at boot time, it wasn't a focus in the past). The first one is just freezing a bunch of string literals that we were creating in a loop.
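As an illustration of the string-literal point (a sketch, not the actual patch): a frozen string held in a constant is allocated once and then reused on every loop iteration. `SKETCH_ARRAY_IN` and `count_array_types` are hypothetical names, and the rows are made-up sample data.

```ruby
# Allocated once when the file is loaded, then shared by every comparison.
SKETCH_ARRAY_IN = "array_in".freeze

# Count rows whose input function marks them as array types.
# The comparison reuses the frozen constant instead of building
# a new "array_in" string for each row.
def count_array_types(rows)
  rows.count { |row| row["typinput"] == SKETCH_ARRAY_IN }
end

sample_rows = [
  { "typname" => "_foos_16", "typinput" => "array_in" },
  { "typname" => "int4",     "typinput" => "int4in" },
  { "typname" => "_int4",    "typinput" => "array_in" }
]

array_type_count = count_array_types(sample_rows)
```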

The biggest easy thing we can do is limit the size of the resulting query. We're grabbing the entire pg_type table twice, even though in both cases we have a very clearly defined set of conditions for whether we will do anything with the result. In your attached script, this is a difference between 6339 rows and 1618 rows.
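The idea of only fetching rows that will actually be used can be sketched in plain Ruby. The conditions below are assumptions chosen for illustration, not the conditions used in the real fix, and in practice the filter would live in the SQL `WHERE` clause rather than in Ruby. `SKETCH_KNOWN_NAMES`, `SKETCH_HANDLED_TYPTYPES` and `relevant_type_row?` are hypothetical names; the `typtype` codes (`b` base, `c` composite, `d` domain, `e` enum, `r` range) are PostgreSQL's own.

```ruby
# Hypothetical set of type names the adapter already knows how to handle.
SKETCH_KNOWN_NAMES = %w[int4 text bool].freeze

# typtype codes assumed to need registration: range, enum, domain.
SKETCH_HANDLED_TYPTYPES = %w[r e d].freeze

# A row is worth loading only if some registration step would use it.
def relevant_type_row?(row)
  SKETCH_KNOWN_NAMES.include?(row[:typname]) ||
    SKETCH_HANDLED_TYPTYPES.include?(row[:typtype])
end

pg_type_rows = [
  { typname: "int4",     typtype: "b" },  # built-in base type
  { typname: "foos_16",  typtype: "c" },  # composite type created for a tenant table
  { typname: "_foos_16", typtype: "b" },  # array type of that composite
  { typname: "mood",     typtype: "e" }   # user-defined enum
]

kept_type_names = pg_type_rows.select { |row| relevant_type_row?(row) }
                              .map { |row| row[:typname] }
```

Under these assumed conditions the per-table rows, which multiply with every tenant schema, are the ones that drop out.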

Ultimately we're still allocating a bunch of strings in that result object that we don't really need (typdelim, which we only need for array types, and typtype, which is only ever used for comparison). We could eliminate those short-lived strings by breaking the query into 4 separate queries, or by doing some funky conditionals in the query itself. Given that the changes I've already made here reduced allocations in your script by 90%, I'm comfortable with where we're at without doing that, so I'm going to leave them alone unless it continues to be a problem.

Thanks for the report. The fix will be included in 5.0 and 4.2.2. (I'm confirming whether or not the problem is present in 4.1, but the fix there will be a bit more difficult than just cherry-picking these changes.)

@sgrif sgrif closed this as completed in 445c12f Mar 29, 2015
sgrif added a commit that referenced this issue Mar 29, 2015
We were never clearing the `PG::Result` object used to query the types
when the connection is first established. This would lead to a
potentially large amount of memory being retained for the life of the
connection.

Investigating this issue also revealed several low hanging fruit on the
performance of these methods, and the number of allocations has been
reduced by ~90%.

Fixes #19578
@bradrobertson
Contributor Author

Wow, thanks for the quick fix! I never would have come up with those on my own.

I'll play with this patch on production and see how we do. Thanks again.

@sgrif
Contributor

sgrif commented Mar 30, 2015

Worth noting that the impact will be greatest if you are on Ruby 2.2 with this patch.

ethervoid added a commit to CartoDB/cartodb that referenced this issue Jan 31, 2019
ethervoid added a commit to CartoDB/cartodb that referenced this issue Feb 1, 2019
ethervoid added a commit to CartoDB/cartodb that referenced this issue Feb 6, 2019
* Patch for load times with pg_type loading in AR

- See #14615
- See rails/rails#19578