Excessive memory usage on subqueries #7980

Open
zecke opened this issue Sep 28, 2020 · 1 comment
Comments

@zecke
Contributor

zecke commented Sep 28, 2020

In PRs #6372 and #7930 we have seen problems resulting from massive memory (over-)allocations and the handling of large objects in Go.

There are multiple issues at play and multiple changes are required to address them. Let's start with the original problem description:

I ran into a problem with a rule like count_over_time(some_series[Many d:Small s]) that would end up with the kernel OOM killer sacrificing Prometheus (even with overcommit turned off). Taking a pprof/heap profile quickly pointed me to the Point slices, and looking at the implementation shows numSamples as a possible contributor. It's a WIP PR as I am not that familiar with Go (and Prometheus).

Most interestingly, --query.max-samples only partially helps. It does fail the query, but it still does bad things to the heap (the process seems unable to return memory to the OS, even with GODEBUG=madvdontneed=1). The closest thing I have to a working theory is:

Pooled []Point slices with a large capacity end up being used by smaller queries. This keeps the slices alive with most of their memory unused. A best practice for sync.Pool seems to be to only pool objects of roughly the same memory cost. I peeked into Go's fmt implementation and have a patch that discards large slices (larger than the ones used in the benchmark).
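For illustration, a minimal sketch of that "discard oversized slices" approach, assuming a hypothetical pointPool, Point type, and threshold; this is not the actual Prometheus code:

```go
package pointpool

import "sync"

// Point stands in for PromQL's sample point (timestamp and value).
type Point struct {
	T int64
	V float64
}

// maxPooledPoints is a made-up cutoff: slices larger than this are left to
// the garbage collector instead of being returned to the pool.
const maxPooledPoints = 5000

var pointPool = sync.Pool{
	New: func() interface{} { return make([]Point, 0, 64) },
}

// getPointSlice hands out a pooled slice, falling back to a fresh
// allocation when the pooled one is too small for the request.
func getPointSlice(sz int) []Point {
	p := pointPool.Get().([]Point)
	if cap(p) < sz {
		return make([]Point, 0, sz)
	}
	return p[:0]
}

// putPointSlice mirrors what fmt does with its print buffers: unusually
// large slices are dropped, so they cannot pin memory for the many small
// queries that later draw from the pool.
func putPointSlice(p []Point) {
	if cap(p) > maxPooledPoints {
		return
	}
	pointPool.Put(p[:0])
}
```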

Interestingly, the above does not apply when --query.max-samples kicks in, as many slices are never returned to the pool, yet I can still see Prometheus taking up most of the physical RAM and never returning it to the OS (and then running the risk of getting OOM-killed). My hypothesis is that Go's memory allocator allocates large slices directly, and we fragment the heap because we have many of them.
On a conceptual level, I think the capacity requested by getPointSlice calls in a query should not exceed maxSamples, but that might be rather difficult to implement. Another observation is that numSteps is only an upper bound; in some cases we could compute a tighter bound (min(numSteps, len(result))) or even the exact number.
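As a sketch of that conceptual bound (all names hypothetical, not the engine's real code), the capacity reserved per series could be clamped to whatever sample budget the query has left:

```go
package samplebudget

// pointSliceCap clamps the number of points reserved for one evaluation to
// the sample budget the query has left. numSteps is the usual upper bound;
// maxSamples and currentSamples correspond to --query.max-samples and the
// samples already accounted for.
func pointSliceCap(numSteps, maxSamples, currentSamples int) int {
	remaining := maxSamples - currentSamples
	if remaining < 0 {
		remaining = 0
	}
	if numSteps > remaining {
		// Reserving more than the remaining budget is pointless: the query
		// would be aborted by --query.max-samples before ever filling it.
		return remaining
	}
	return numSteps
}
```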

What's out there and what's needed:

  • The sync.Pool is used incorrectly in getPointSlice: the pooled slices have wildly different capacities (see the bucketed-pool sketch after this list).
  • getPointSlice/putPointSlice and the related benchmark do not cover large subqueries.
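One complementary way to address the first point is to bucket the pool by capacity, so that each sync.Pool only holds objects of roughly the same memory cost; Prometheus's pkg/pool takes a broadly similar bucketed approach for byte buffers. The sketch below is illustrative only, with made-up names and bucket sizes:

```go
package bucketpool

import "sync"

// Point stands in for PromQL's sample point (timestamp and value).
type Point struct {
	T int64
	V float64
}

// bucketCaps are hypothetical capacity classes; anything larger is never pooled.
var bucketCaps = []int{64, 1024, 16384, 262144}

var buckets = func() []*sync.Pool {
	ps := make([]*sync.Pool, len(bucketCaps))
	for i, c := range bucketCaps {
		c := c // capture the per-iteration value for the closure (pre-Go 1.22)
		ps[i] = &sync.Pool{New: func() interface{} { return make([]Point, 0, c) }}
	}
	return ps
}()

// getPointSlice returns a slice from the smallest bucket that can hold sz
// points, so small queries never receive slices sized for huge subqueries.
func getPointSlice(sz int) []Point {
	for i, c := range bucketCaps {
		if sz <= c {
			return buckets[i].Get().([]Point)[:0]
		}
	}
	// Larger than the biggest bucket: allocate directly and never pool it.
	return make([]Point, 0, sz)
}

// putPointSlice files a slice back into the matching bucket and drops
// anything oversized (or smaller than the smallest bucket) instead.
func putPointSlice(p []Point) {
	c := cap(p)
	if c > bucketCaps[len(bucketCaps)-1] {
		return
	}
	for i := len(bucketCaps) - 1; i >= 0; i-- {
		if c >= bucketCaps[i] {
			buckets[i].Put(p[:0])
			return
		}
	}
}
```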
zecke added a commit to zecke/prometheus that referenced this issue Sep 30, 2020
Executing a subquery like count_over_time(something[99999d:1s]) will
eventually create many very large point slices. Since today's Go GC
can't move objects, bad things will happen. Worse, because slices of
different capacities are returned to the pool, the large ones might
stay alive.

Mitigate this by limiting numSteps to the maximum number of samples the
query may return. If more samples need to be added, the query will fail.
zecke added a commit to zecke/prometheus that referenced this issue Sep 30, 2020
Executing a subquery like count_over_time(something[99999d:1s]) will
eventually create many very large point slices. Since today's Go GC
can't move objects, bad things will happen. Worse, because slices of
different capacities are returned to the pool, the large ones might
stay alive.

Mitigate this by limiting numSteps to the maximum possible value,
either by taking maxSamples-currentSamples or by looking at how many
samples were returned.
zecke added a commit to zecke/prometheus that referenced this issue Oct 1, 2020
Add an extremely large subquery to the benchmark. Lower the max
samples to give the query a chance to complete. No other query
gets close to the new limit.
zecke added a commit to zecke/prometheus that referenced this issue Oct 1, 2020
Add an extremely large subquery to the benchmark. Lower the max samples
by an order of magnitude to give the query a chance to complete. No
other query gets close to the new limit.
zecke added a commit to zecke/prometheus that referenced this issue Oct 1, 2020
Executing a subquery like count_over_time(something[99999d:1s]) will
eventually create many very large point slices. Since today's Go GC
can't move objects, bad things will happen. Worse, because slices of
different capacities are returned to the pool, the large ones might
stay alive.

Mitigate this by limiting numSteps to the maximum possible value,
either by taking maxSamples-currentSamples or by looking at how many
samples were returned.
zecke added a commit to zecke/prometheus that referenced this issue Oct 1, 2020
Add an extremely large subquery to the benchmark. Lower the max samples
by an order of magnitude to give the query a chance to complete. No
other query gets close to the new limit.
zecke added a commit to zecke/prometheus that referenced this issue Oct 1, 2020
Executing a subquery like count_over_time(something[99999d:1s]) will
eventually create many very large point slices. Since today's Go GC
can't move objects, bad things will happen. Worse, because slices of
different capacities are returned to the pool, the large ones might
stay alive.

Mitigate this by limiting numSteps to the maximum possible value,
either by taking maxSamples-currentSamples or by looking at how many
samples were returned.
zecke added a commit to zecke/prometheus that referenced this issue Oct 1, 2020
Add an extremely large subquery to the benchmark. Lower the max samples
by an order of magnitude to give the query a chance to complete. No
other query gets close to the new limit.
zecke added a commit to zecke/prometheus that referenced this issue Oct 3, 2020
Executing a subquery like count_over_time(something[99999d:1s]) will
eventually create many very large point slices. Since today's Go GC
can't move objects, bad things will happen. Worse, because slices of
different capacities are returned to the pool, the large ones might
stay alive.

Mitigate this by limiting numSteps to the maximum possible value by
taking maxSamples-currentSamples.
zecke added a commit to zecke/prometheus that referenced this issue Oct 3, 2020
Add an extremely large subquery to the benchmark. Lower the max samples
by an order of magnitude to give the query a chance to complete. No
other query gets close to the new limit.
zecke added a commit to zecke/prometheus that referenced this issue Oct 9, 2020
Executing a subquery like count_over_time(something[99999d:1s]) will
eventually create many very large point slices. Since today's Go GC
can't move objects, bad things will happen. Worse, because slices of
different capacities are returned to the pool, the large ones might
stay alive.

Mitigate this by limiting numSteps to the maximum possible value by
taking maxSamples-currentSamples.

Signed-off-by: Holger Hans Peter Freyther <holger@moiji-mobile.com>
zecke added a commit to zecke/prometheus that referenced this issue Oct 9, 2020
Add an extremely large subquery to the benchmark. Lower the max samples
by an order of magnitude to give the query a chance to complete. No
other query gets close to the new limit.

Signed-off-by: Holger Hans Peter Freyther <holger@moiji-mobile.com>
zecke added a commit to zecke/prometheus that referenced this issue Oct 14, 2020
Executing a subquery like count_over_time(something[99999d:1s]) will
eventually create many very large point slices. Since today's Go GC
can't move objects, bad things will happen. Worse, because slices of
different capacities are returned to the pool, the large ones might
stay alive.

Mitigate this by limiting numSteps to the maximum possible value by
taking maxSamples-currentSamples.

Signed-off-by: Holger Hans Peter Freyther <holger@moiji-mobile.com>
zecke added a commit to zecke/prometheus that referenced this issue Oct 14, 2020
Add an extremely large subquery to the benchmark. Lower the max samples
by an order of magnitude to give the query a chance to complete. No
other query gets close to the new limit.

Signed-off-by: Holger Hans Peter Freyther <holger@moiji-mobile.com>
@beorn7
Member

beorn7 commented May 21, 2024

Hello from the bug scrub.

Almost four years have passed since this was filed, and all the related PRs were abandoned. However, we believe the insight in this issue is still valuable and would like to keep it open as a reminder to work on these points.
