Excessive memory usage on subqueries #7980

Open
zecke opened this issue Sep 28, 2020 · 1 comment
Comments

@zecke
Contributor

zecke commented Sep 28, 2020

In PRs #6372 and #7930 we have seen problems resulting from massive memory (over-)allocations and the handling of large objects in Go.

There are multiple issues at play and multiple changes are required to address them. Let's start with the original problem description:

I ran into a problem with a rule like count_over_time(some_series[Many d:Small s]) that would end up with the kernel OOM killer sacrificing Prometheus (even with overcommit turned off). Taking a pprof/heap profile quickly pointed me to the Point slices, and looking at the implementation shows numSamples as a possible contributor. It's a WIP PR as I am not that familiar with Go (and Prometheus).

Most interestingly, --query.max-samples only partially helps. It does fail the query, but it still does bad things to the heap (the process seems unable to return memory to the OS, even with GODEBUG=madvdontneed=1). The closest thing I have to a working theory is:

Pooled []Point slices with a large capacity end up being used by smaller queries. This keeps the slices alive with most of their memory unused. A best practice for sync.Pool seems to be to only pool objects of roughly the same memory cost. I peeked into Go's fmt implementation and have a patch that discards large slices (larger than the ones used in the benchmark).
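For illustration, a minimal sketch of that "discard oversized slices" approach, assuming a hypothetical pointPool, Point type, and threshold; this is not the actual Prometheus code:

```go
package pointpool

import "sync"

// Point stands in for PromQL's sample point (timestamp and value).
type Point struct {
	T int64
	V float64
}

// maxPooledPoints is a made-up cutoff: slices larger than this are left to
// the garbage collector instead of being returned to the pool.
const maxPooledPoints = 5000

var pointPool = sync.Pool{
	New: func() interface{} { return make([]Point, 0, 64) },
}

// getPointSlice hands out a pooled slice, falling back to a fresh
// allocation when the pooled one is too small for the request.
func getPointSlice(sz int) []Point {
	p := pointPool.Get().([]Point)
	if cap(p) < sz {
		return make([]Point, 0, sz)
	}
	return p[:0]
}

// putPointSlice mirrors what fmt does with its print buffers: unusually
// large slices are dropped, so they cannot pin memory for the many small
// queries that later draw from the pool.
func putPointSlice(p []Point) {
	if cap(p) > maxPooledPoints {
		return
	}
	pointPool.Put(p[:0])
}
```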

Interestingly, the above does not apply when --query.max-samples kicks in, as many slices are never returned to the pool, yet I can still see Prometheus taking up most of the physical RAM and never returning it to the OS (and then running the risk of getting OOM-killed). My hypothesis is that Go's memory allocator allocates large slices directly, and we fragment the heap because we have many of them.
On a conceptual level, I think the capacity requested by getPointSlice calls in a query should not exceed maxSamples, but that might be rather difficult to implement. Another observation is that numSteps is only an upper bound; in some cases we could compute a tighter bound (min(numSteps, len(result))) or even the exact number.
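As a sketch of that conceptual bound (all names hypothetical, not the engine's real code), the capacity reserved per series could be clamped to whatever sample budget the query has left:

```go
package samplebudget

// pointSliceCap clamps the number of points reserved for one evaluation to
// the sample budget the query has left. numSteps is the usual upper bound;
// maxSamples and currentSamples correspond to --query.max-samples and the
// samples already accounted for.
func pointSliceCap(numSteps, maxSamples, currentSamples int) int {
	remaining := maxSamples - currentSamples
	if remaining < 0 {
		remaining = 0
	}
	if numSteps > remaining {
		// Reserving more than the remaining budget is pointless: the query
		// would be aborted by --query.max-samples before ever filling it.
		return remaining
	}
	return numSteps
}
```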

What's out there and what's needed:

  • The sync.Pool is used incorrectly in getPointSlice: the pooled slices have wildly different capacities (see the bucketed-pool sketch after this list).
  • getPointSlice/putPointSlice and the related benchmark do not cover large subqueries.
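One complementary way to address the first point is to bucket the pool by capacity, so that each sync.Pool only holds objects of roughly the same memory cost; Prometheus's pkg/pool takes a broadly similar bucketed approach for byte buffers. The sketch below is illustrative only, with made-up names and bucket sizes:

```go
package bucketpool

import "sync"

// Point stands in for PromQL's sample point (timestamp and value).
type Point struct {
	T int64
	V float64
}

// bucketCaps are hypothetical capacity classes; anything larger is never pooled.
var bucketCaps = []int{64, 1024, 16384, 262144}

var buckets = func() []*sync.Pool {
	ps := make([]*sync.Pool, len(bucketCaps))
	for i, c := range bucketCaps {
		c := c // capture the per-iteration value for the closure (pre-Go 1.22)
		ps[i] = &sync.Pool{New: func() interface{} { return make([]Point, 0, c) }}
	}
	return ps
}()

// getPointSlice returns a slice from the smallest bucket that can hold sz
// points, so small queries never receive slices sized for huge subqueries.
func getPointSlice(sz int) []Point {
	for i, c := range bucketCaps {
		if sz <= c {
			return buckets[i].Get().([]Point)[:0]
		}
	}
	// Larger than the biggest bucket: allocate directly and never pool it.
	return make([]Point, 0, sz)
}

// putPointSlice files a slice back into the matching bucket and drops
// anything oversized (or smaller than the smallest bucket) instead.
func putPointSlice(p []Point) {
	c := cap(p)
	if c > bucketCaps[len(bucketCaps)-1] {
		return
	}
	for i := len(bucketCaps) - 1; i >= 0; i-- {
		if c >= bucketCaps[i] {
			buckets[i].Put(p[:0])
			return
		}
	}
}
```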
zecke added a commit to zecke/prometheus that referenced this issue Sep 30, 2020
Executing a subquery like count_over_time(something[99999d:1s]) will
eventually create many very large point slices. Since today's Go GC
can't move objects, bad things will happen. Worse, because slices of
different capacities are returned to the pool, the large ones might
stay alive.

Mitigate this by limiting numSteps to the maximum number of samples the
query may return. If more samples need to be added, the query will fail.
zecke added a commit to zecke/prometheus that referenced this issue Sep 30, 2020
Executing a subquery like count_over_time(something[99999d:1s]) will
eventually create many very large point slices. Since today's Go GC
can't move objects, bad things will happen. Worse, because slices of
different capacities are returned to the pool, the large ones might
stay alive.

Mitigate this by limiting numSteps to the maximum possible value,
either by taking maxSamples-currentSamples or by looking at how many
samples were returned.
zecke added a commit to zecke/prometheus that referenced this issue Oct 1, 2020
Add an extremely large subquery to the benchmark. Lower the max
samples to give the query a chance to complete. No other query
gets close to the new limit.
zecke added a commit to zecke/prometheus that referenced this issue Oct 1, 2020
Add an extremely large subquery to the benchmark. Lower the max samples
by an order of magnitude to give the query a chance to complete. No
other query gets close to the new limit.
zecke added a commit to zecke/prometheus that referenced this issue Oct 1, 2020
Executing a subquery like count_over_time(something[99999d:1s]) will
eventually create many very large point slices. Since today's Go GC
can't move objects, bad things will happen. Worse, because slices of
different capacities are returned to the pool, the large ones might
stay alive.

Mitigate this by limiting numSteps to the maximum possible value,
either by taking maxSamples-currentSamples or by looking at how many
samples were returned.
zecke added a commit to zecke/prometheus that referenced this issue Oct 1, 2020
Add an extremely large subquery to the benchmark. Lower the max samples
by an order of magnitude to give the query a chance to complete. No
other query gets close to the new limit.
zecke added a commit to zecke/prometheus that referenced this issue Oct 1, 2020
Executing a subquery like count_over_time(something[99999d:1s]) will
eventually create many very large point slices. Since today's Go GC
can't move objects, bad things will happen. Worse, because slices of
different capacities are returned to the pool, the large ones might
stay alive.

Mitigate this by limiting numSteps to the maximum possible value,
either by taking maxSamples-currentSamples or by looking at how many
samples were returned.
zecke added a commit to zecke/prometheus that referenced this issue Oct 1, 2020
Add an extremely large subquery to the benchmark. Lower the max samples
by an order of magnitude to give the query a chance to complete. No
other query gets close to the new limit.
zecke added a commit to zecke/prometheus that referenced this issue Oct 3, 2020
Executing a subquery like count_over_time(something[99999d:1s]) will
eventually create many very large point slices. Since today's Go GC
can't move objects, bad things will happen. Worse, because slices of
different capacities are returned to the pool, the large ones might
stay alive.

Mitigate this by limiting numSteps to the maximum possible value by
taking maxSamples-currentSamples.
zecke added a commit to zecke/prometheus that referenced this issue Oct 3, 2020
Add an extremely large subquery to the benchmark. Lower the max samples
by an order of magnitude to give the query a chance to complete. No
other query gets close to the new limit.
zecke added a commit to zecke/prometheus that referenced this issue Oct 9, 2020
Executing a subquery like count_over_time(something[99999d:1s]) will
eventually create many very large point slices. Since today's Go GC
can't move objects, bad things will happen. Worse, because slices of
different capacities are returned to the pool, the large ones might
stay alive.

Mitigate this by limiting numSteps to the maximum possible value by
taking maxSamples-currentSamples.

Signed-off-by: Holger Hans Peter Freyther <holger@moiji-mobile.com>
zecke added a commit to zecke/prometheus that referenced this issue Oct 9, 2020
Add an extremely large subquery to the benchmark. Lower the max samples
by an order of magnitude to give the query a chance to complete. No
other query gets close to the new limit.

Signed-off-by: Holger Hans Peter Freyther <holger@moiji-mobile.com>
zecke added a commit to zecke/prometheus that referenced this issue Oct 14, 2020
Executing a subquery like count_over_time(something[99999d:1s]) will
eventually create many very large point slices. Since today's Go GC
can't move objects, bad things will happen. Worse, because slices of
different capacities are returned to the pool, the large ones might
stay alive.

Mitigate this by limiting numSteps to the maximum possible value by
taking maxSamples-currentSamples.

Signed-off-by: Holger Hans Peter Freyther <holger@moiji-mobile.com>
zecke added a commit to zecke/prometheus that referenced this issue Oct 14, 2020
Add an extremely large subquery to the benchmark. Lower the max samples
by an order of magnitude to give the query a chance to complete. No
other query gets close to the new limit.

Signed-off-by: Holger Hans Peter Freyther <holger@moiji-mobile.com>
@beorn7
Member

beorn7 commented May 21, 2024

Hello from the bug scrub.

Almost four years have passed since this was filed, and all the related PRs were abandoned. However, we believe the insight in this issue is still valuable and would like to keep it open as a reminder to work on these points.
