Slow increase in management db memory use when rates_mode = basic #214
@spatula75 ETS tables are not cleaned up by running a runtime GC. Can you enable …
@spatula75 thank you. On the chart above we see memory growth over the course of 24 or so hours. Default retention policies collect some samples for up to 24 hours. Does the growth level out after that? Can you try tweaking retention policies to e.g. 4 hours to compare? Thank you.
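For reference, retention policies of this kind are controlled by sample_retention_policies in the management plugin's configuration. A minimal, illustrative rabbitmq.config snippet (classic 3.6.x Erlang-term format; the values below are examples capping global samples at 4 hours, not anyone's actual settings):

```erlang
%% Illustrative only: each pair is {MaxAgeInSeconds, SampleIntervalInSeconds}.
[
  {rabbitmq_management,
    [{sample_retention_policies,
       [{global,   [{60, 5}, {3600, 60}, {14400, 600}]},  %% keep global samples up to 4 hours
        {basic,    [{60, 5}, {3600, 60}]},
        {detailed, [{10, 5}]}]}]}
].
```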
Our longest retention period is set to just over an hour, and unfortunately the increase we're seeing never really changes, even after 24 hours.
@spatula75 OK, good to know. I guess we'll add ETS table info to the UI or rabbitmq-top to make investigating such issues easier. What kind of workload do you run against this node? Is connection, channel, queue, or binding churn relatively high? We are looking for a way to reproduce.
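Until ETS table information is exposed in the UI or rabbitmq-top, one rough way to list the largest tables is via rabbitmqctl eval; this is a sketch (byte counts are approximate, computed from each table's word count):

```
rabbitmqctl eval 'lists:reverse(lists:keysort(2, [{ets:info(T, name), ets:info(T, memory) * erlang:system_info(wordsize)} || T <- ets:all(), is_integer(ets:info(T, memory))])).'
```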
The workload is mostly RPC-like, so once a minute a job will kick off, create channels, create consumers, publish a bunch of messages, close the channels used for publishing, and the consumers will await responses. On this cluster in particular, there are many messages which will time out, though we see similar growth on a cluster with fewer timeouts. Probably the most notable thing is a somewhat high amount of channel churn. I tried to make the problem worse by writing a script that repeatedly created a huge number of channels and then closed them, but this did not seem to affect the growth rate.
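A churn script of that kind might look roughly like the following, assuming the official Erlang AMQP client (amqp_client); the module name, host, and counts are hypothetical:

```erlang
%% Hypothetical channel-churn loop: open N channels, close them, repeat.
-module(channel_churn).
-include_lib("amqp_client/include/amqp_client.hrl").
-export([run/2]).

run(Iterations, ChannelsPerIteration) ->
    {ok, Conn} = amqp_connection:start(#amqp_params_network{host = "localhost"}),
    churn(Conn, Iterations, ChannelsPerIteration),
    amqp_connection:close(Conn).

churn(_Conn, 0, _N) ->
    ok;
churn(Conn, Iterations, N) ->
    %% Each opened channel produces stats rows in the management database;
    %% closing them exercises the cleanup path under discussion here.
    Channels = [begin {ok, Ch} = amqp_connection:open_channel(Conn), Ch end
                || _ <- lists:seq(1, N)],
    [amqp_channel:close(Ch) || Ch <- Channels],
    churn(Conn, Iterations - 1, N).
```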
@spatula75 OK, thanks, that's also helpful. We'll start by introducing #215 and then using it to see if we can trigger unusual table size growth.
Cool. If/when you have a build containing #215, I'd be happy to run it on this cluster and point our normal workload at it to see if we can see which table(s) are causing trouble.
OK, looks like #217 largely addresses it: per @gmr's comment, his test tool can no longer reproduce the issue even after running for 10 times as long. We will produce a milestone build for @spatula75 to try. Closing this, but feel free to reopen if #217 wasn't sufficient for the workload you have (and please provide a way to reproduce!)
Sounds promising. We'll evaluate it just as soon as the milestone build is available.
@spatula75 I included a pre-built drop-in for the plugin on ticket #217 if you wanted to test with that.
@gmr thanks, giving it a try now.
@gmr: is it expected that this build of the plugin will report many node-specific statistics (e.g., memory, I/O, descriptors, space) as zeros when it's loaded in 3.6.2?
@spatula75 most node-wide stats are reported by …
Looks like it happens if I select a date range for the graphs that's longer than the available history. If I choose something like "last minute" then it's fine. Probably unrelated and not a big deal.
So far this looks encouraging. The management DB has been running for an hour in our environment with no bloat.
3.6.3 Milestone 1 includes this. |
With RMQ 3.6.2, on a system that is fairly close to idle most of the time, we nonetheless see a slow but consistent memory creep from the management DB, but only when rates_mode is basic. Setting it to 'none' seems to eliminate the problem. The slope of the curve is highly consistent (about 15 MB/hr), even if we turn on scripts to try to create lots of channels, messages, etc.:
Most of this space is being used by the management database table:
Management DB configuration for this host (we've adjusted to shorter retention periods):
On a different node, which is identically configured except for setting rates_mode=none (and which is actually busier), we don't see the memory creep:
Manually forcing an Erlang GC (rabbitmqctl eval '[garbage_collect(P) || P <- processes()].') has no effect on the memory consumption. Our workaround is to set rates_mode=none for now.
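For completeness, that workaround is a one-line change in the classic Erlang-term rabbitmq.config (a minimal sketch, not the reporter's full configuration):

```erlang
%% Disable rate collection in the management plugin entirely.
[
  {rabbitmq_management, [{rates_mode, none}]}
].
```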