fix: clear profile buffer at flush #3179
Conversation
```diff
@@ -45,7 +41,7 @@ func NewMergeRowReader(readers []parquet.RowReader, maxValue parquet.Row, less f
 	}
 	its := make([]iter.Iterator[parquet.Row], len(readers))
 	for i := range readers {
-		its[i] = NewBufferedRowReaderIterator(readers[i], defaultRowBufferSize)
+		its[i] = NewBufferedRowReaderIterator(readers[i], 1)
```
Experiments show that buffering here is not helpful; we reference many column readers but could release them earlier, allowing the GC to free unused objects. Rows in our case are too large (approximately 10KB) for batching.
We may want to refactor this piece: a buffer of size 1 does not significantly harm performance, but it also does not provide any benefits.
I noticed that the change causes a notable increase of the working set size (the bottom boundary), which is not expected; debugging. Update: I didn't find an explanation better than google/cadvisor#3286. I think the best we can do is to test it under memory pressure.
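The iterator itself isn't shown in the diff. A minimal, self-contained sketch of how such a buffered row reader iterator might work (the types and names below are illustrative stand-ins, not Pyroscope's actual implementation) shows why the buffer size matters: with a buffer of N, up to N rows of roughly 10KB each stay reachable per reader between `Next` calls, and with a buffer of 1 nothing extra is retained.

```go
package main

import "fmt"

// Row stands in for parquet.Row; in this workload each row is large
// (roughly 10KB), so a buffer of N rows retains N such rows per reader.
type Row []byte

// RowReader is a stand-in for the parquet.RowReader interface.
type RowReader interface {
	ReadRows([]Row) (int, error)
}

// bufferedRowReaderIterator fills a fixed-size buffer from the reader and
// hands rows out one by one. With bufSize == 1 no extra rows are kept
// alive between Next calls, which is the behavior this change opts into.
type bufferedRowReaderIterator struct {
	reader RowReader
	buf    []Row
	n, i   int
}

func newBufferedRowReaderIterator(r RowReader, bufSize int) *bufferedRowReaderIterator {
	return &bufferedRowReaderIterator{reader: r, buf: make([]Row, bufSize)}
}

func (it *bufferedRowReaderIterator) Next() (Row, bool) {
	if it.i == it.n {
		n, err := it.reader.ReadRows(it.buf)
		if n == 0 || err != nil { // a real implementation would surface err
			return nil, false
		}
		it.n, it.i = n, 0
	}
	row := it.buf[it.i]
	it.i++
	return row, true
}

// sliceReader serves rows from memory so the sketch is self-contained.
type sliceReader struct {
	rows []Row
	pos  int
}

func (s *sliceReader) ReadRows(dst []Row) (int, error) {
	n := copy(dst, s.rows[s.pos:])
	s.pos += n
	return n, nil
}

func main() {
	r := &sliceReader{rows: []Row{[]byte("a"), []byte("b"), []byte("c")}}
	it := newBufferedRowReaderIterator(r, 1) // buffer of one: no batching
	count := 0
	for {
		if _, ok := it.Next(); !ok {
			break
		}
		count++
	}
	fmt.Println(count) // 3
}
```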
Force-pushed from 38b9edc to 25038be.
LGTM
Closing it for now: we do have some issues with how buffers are handled and retained; however, this fix does not solve them completely, and the benefit is barely measurable at scale. Instead, it makes sense to check how the row groups are merged at flush (we could probably avoid this altogether).
When writing a chunk of profiles to disk, we allocate a buffer to hold the affected profiles, which we reuse throughout the lifetime of a mutable block in memory. Each entry in this buffer refers to a profile, including its metadata and samples.
The problem is that we do not clear this buffer after use, preventing the written profiles from being garbage collected, and thus consuming memory unnecessarily.
To compound the problem, creating a new parquet row group may require a considerable amount of memory, as evidenced by the spike in the graph. This makes the ingesters susceptible to OOM kills, even in relatively small deployments.
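The fix described above amounts to releasing the buffer entries once the profiles have been written. A minimal sketch of the idea (the `head` and `profile` types here are hypothetical stand-ins, not the actual Pyroscope code): setting each slice entry to nil makes the written profiles unreachable so the GC can collect them, while truncating to length zero keeps the backing array for reuse by the next flush.

```go
package main

import "fmt"

// profile is a stand-in for a buffered profile entry, including its
// metadata and samples.
type profile struct {
	samples []int64 // large payload kept alive through the buffer entry
}

// head is a stand-in for the in-memory block that reuses the buffer
// across flushes for its whole lifetime.
type head struct {
	buf []*profile
}

func (h *head) flush() {
	// (writing h.buf to the on-disk row group elided)

	// Clear the entries so the written profiles become unreachable and can
	// be garbage-collected; keep the backing array for the next flush.
	for i := range h.buf {
		h.buf[i] = nil
	}
	h.buf = h.buf[:0]
}

func main() {
	h := &head{buf: []*profile{{samples: make([]int64, 4)}}}
	h.flush()
	fmt.Println(len(h.buf), cap(h.buf)) // 0 1
}
```

Without the nil-ing loop, truncating with `h.buf = h.buf[:0]` alone would still leave the pointers in the backing array, keeping every written profile reachable until it is overwritten.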
Notable changes proposed in the PR:
No noticeable increase in CPU consumption was identified. From the OS's point of view, the working set size of the process will likely not change substantially because of the FS cache.
The impact of the change is expected to be larger in large-scale deployments, where the amount of erroneously retained objects and buffers is greater.