
topk and bottomk queries with large k cause OOM panic #3973

Closed
roganartu opened this Issue Mar 15, 2018 · 2 comments

roganartu (Contributor) commented Mar 15, 2018

Query

topk(9999999999, prometheus_build_info)

Expected

Element: prometheus_build_info{goversion="go1.10", ..., revision="f63e7db4cbdb616337ca877b306b9b96f7f4e381", version="2.2.0"}
Value: 1

Actual

Prometheus spins at 100% on one core for a short period, then OOM panics. The same happens with lower values of k for queries against metrics with higher label cardinality.

fatal error: runtime: out of memory

runtime stack:
runtime.throw(0x1af4202, 0x16)
        /usr/local/go/src/runtime/panic.go:619 +0x81
runtime.sysMap(0xc422740000, 0x55e63c0000, 0x0, 0x28a6138)
        /usr/local/go/src/runtime/mem_linux.go:216 +0x20a
runtime.(*mheap).sysAlloc(0x288c9c0, 0x55e63c0000, 0x0)
        /usr/local/go/src/runtime/malloc.go:470 +0xd4
runtime.(*mheap).grow(0x288c9c0, 0x2af31dd, 0x0)
        /usr/local/go/src/runtime/mheap.go:907 +0x60
runtime.(*mheap).allocSpanLocked(0x288c9c0, 0x2af31dd, 0x28a6148, 0xc4204b9ee0)
        /usr/local/go/src/runtime/mheap.go:820 +0x301
runtime.(*mheap).alloc_m(0x288c9c0, 0x2af31dd, 0xffffffffffff0100, 0xc4204b9f10)
        /usr/local/go/src/runtime/mheap.go:686 +0x118
runtime.(*mheap).alloc.func1()
        /usr/local/go/src/runtime/mheap.go:753 +0x4d
runtime.(*mheap).alloc(0x288c9c0, 0x2af31dd, 0xc420010100, 0x41467c)
        /usr/local/go/src/runtime/mheap.go:752 +0x8a
runtime.largeAlloc(0x55e63b88a0, 0x450001, 0x7fb9c24e7000)
        /usr/local/go/src/runtime/malloc.go:826 +0x94
runtime.mallocgc.func1()
        /usr/local/go/src/runtime/malloc.go:721 +0x46
runtime.systemstack(0x0)
        /usr/local/go/src/runtime/asm_amd64.s:409 +0x79
runtime.mstart()
        /usr/local/go/src/runtime/proc.go:1170

Note: Increasing k by appending further digits causes the query to short-circuit with no OOM, so there is already at least some protection here, though the error message suggests it comes from the Go runtime rather than from Prometheus itself:

level=error ts=2018-03-15T18:11:47.103256231Z caller=engine.go:613 component="query engine" msg="runtime panic in parser" err="runtime error: makeslice: cap out of range" 
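
For illustration, here is a minimal standalone Go sketch (not Prometheus code; a 64-bit platform and float64 elements are assumed) of the two failure modes: a capacity beyond the runtime's maximum allocation size is rejected with a recoverable makeslice panic, while a merely huge capacity is actually attempted and drives the process out of memory.

package main

import "fmt"

func tryMake(capHint int) {
	defer func() {
		if r := recover(); r != nil {
			// Capacities beyond the runtime's maximum allocation size are
			// rejected up front: "makeslice: cap out of range" (recoverable).
			fmt.Printf("cap=%d recovered: %v\n", capHint, r)
		}
	}()
	_ = make([]float64, 0, capHint)
	fmt.Printf("cap=%d allocation attempted\n", capHint)
}

func main() {
	// Far beyond the maximum allocation size: panics immediately, which is
	// why appending more digits to k short-circuits instead of OOMing.
	tryMake(1 << 60)

	// Within the allowed range but still ~80 GB of float64 backing array:
	// the runtime really tries to allocate it, which is the OOM path above.
	// Left commented out so the sketch does not exhaust memory.
	// tryMake(9999999999)
}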

Use Case

A Grafana graph with a limit variable that is passed as k. To offer an All option that at least appears to include everything, you have to pass an arbitrarily high k. I'm sure there are other use cases, and in any case I suspect most users would not expect increasing k to keep increasing memory consumption past the number of available samples.

Cause

Calling make(..., k) for each set of unique labels reserves capacity for all k elements up front, regardless of how many series the input actually contains; with a k of ~10^10 that is a single allocation request of several hundred gigabytes (the largeAlloc frame in the stack trace above is asking for roughly 370 GB):

prometheus/promql/engine.go

Lines 1327 to 1339 in e87c6c8

if op == itemTopK || op == itemQuantile {
	result[groupingKey].heap = make(vectorByValueHeap, 0, k)
	heap.Push(&result[groupingKey].heap, &Sample{
		Point:  Point{V: s.V},
		Metric: s.Metric,
	})
} else if op == itemBottomK {
	result[groupingKey].reverseHeap = make(vectorByReverseValueHeap, 0, k)
	heap.Push(&result[groupingKey].reverseHeap, &Sample{
		Point:  Point{V: s.V},
		Metric: s.Metric,
	})
}

Proposed Fix

For k above some threshold, grow the vector exponentially up to a maximum of k instead of statically allocating all of k up front.
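
For illustration, a rough sketch of that idea (the threshold and helper are invented here, not Prometheus code, and k is assumed to be an int64): preallocate at most a small fixed capacity and let append's amortized doubling grow the heap on demand, so memory tracks the number of elements actually pushed rather than k.

// Hypothetical sketch; maxInitialHeapCap is an invented threshold.
const maxInitialHeapCap = 1024

func initialHeapCap(k int64) int {
	// For very large k, start small and rely on append's exponential
	// growth instead of reserving all of k up front.
	if k > maxInitialHeapCap {
		return maxInitialHeapCap
	}
	return int(k)
}

// result[groupingKey].heap = make(vectorByValueHeap, 0, initialHeapCap(k))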

@roganartu changed the title from "`topk` and `bottomk` queries with large `k` causes OOM panic" to "topk and bottomk queries with large k causes OOM panic" Mar 15, 2018

brian-brazil (Member) commented Mar 15, 2018

A simpler approach would be to use the smaller of k and the size of the input vector.
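
For illustration, a sketch of that approach (names such as vec and heapCap are placeholders, k is assumed to be an int64, and this is not necessarily the exact change merged later in #4087): cap the preallocated capacity at the size of the input vector, since a per-group heap can never hold more elements than the input contains.

// Sketch only: cap the heap's preallocated capacity at the input size.
heapCap := k
if int64(len(vec)) < heapCap {
	heapCap = int64(len(vec))
}
result[groupingKey].heap = make(vectorByValueHeap, 0, heapCap)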

@roganartu changed the title from "topk and bottomk queries with large k causes OOM panic" to "topk and bottomk queries with large k cause OOM panic" Mar 15, 2018

davbo added a commit to davbo/prometheus that referenced this issue Apr 15, 2018

Fix OOM when a large K is used in topk queries
This attempts to close prometheus#3973.

Handles cases where the length of the input vector to an aggregate topk
/ bottomk function is less than the K parameter. The change updates
Prometheus to allocate a result vector the same length as the input
vector in these cases.

Previously Prometheus would out-of-memory panic for large K values. This
change makes that unlikely unless the size of the input vector is
equally large.

Signed-off-by: David King <dave@davbo.org>

brian-brazil added a commit that referenced this issue Apr 16, 2018

Fix OOM when a large K is used in topk queries (#4087)
This attempts to close #3973.

Handles cases where the length of the input vector to an aggregate topk
/ bottomk function is less than the K parameter. The change updates
Prometheus to allocate a result vector the same length as the input
vector in these cases.

Previously Prometheus would out-of-memory panic for large K values. This
change makes that unlikely unless the size of the input vector is
equally large.

Signed-off-by: David King <dave@davbo.org>

gouthamve added a commit to gouthamve/prometheus that referenced this issue Aug 1, 2018

Fix OOM when a large K is used in topk queries (prometheus#4087)
This attempts to close prometheus#3973.

Handles cases where the length of the input vector to an aggregate topk
/ bottomk function is less than the K parameter. The change updates
Prometheus to allocate a result vector the same length as the input
vector in these cases.

Previously Prometheus would out-of-memory panic for large K values. This
change makes that unlikely unless the size of the input vector is
equally large.

Signed-off-by: David King <dave@davbo.org>

lock bot commented Mar 22, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 22, 2019
