
Reduce chunk write queue memory usage #131

Merged: 9 commits into main from reduce_chunk_write_queue_memory_usage on Jun 2, 2022

Conversation

@replay (Contributor) commented on Feb 8, 2022

This avoids wasting memory on c.chunkRefMap by re-initializing it regularly. On each re-initialization the map is created with a size equal to half of the peak usage observed since the previous re-init event; to support this, the peak usage is tracked and reset on every re-init event.

Very frequent re-initialization of the map would cause unnecessary allocations. To avoid that, two factors limit how often it can be re-initialized:

  • There is a minimum interval of 10 minutes between re-init events.
  • To re-init the map, the recorded peak usage since the last re-init event must be at least 10 (objects in c.chunkRefMap).

Initializing the map to half of the peak usage since the last re-init event tries to hit the sweet spot in the trade-off between initializing it to a very low size, which could require many allocations to grow it, and initializing it to a large size, which could leave allocated memory unused.

This solution has the following advantages (a sketch of the resulting shrink logic follows the list):

  • If a tenant's number of active series decreases over time, their queue's memory usage should shrink as well. By always resetting the map to half of the previous peak, it shrinks together with the usage over time.
  • We don't want to initialize it to a size of 0, because that would cause many allocations to grow it back to the size it actually needs. By initializing it to half of the previous peak, it rarely has to grow to more than double the initialized size.
  • We don't want to re-initialize it too frequently, because that would also cause avoidable allocations, so there is a minimum interval of 10 minutes between re-init events.
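
To make the behavior above concrete, here is a minimal, self-contained Go sketch of the shrink logic. The field names (chunkRefMapPeakSize, chunkRefMapLastShrink), the constant names, the simplified map type, and the assumption that shrinking only happens while the map is empty are illustrative and may not match the PR's code exactly:

```go
package chunks

import (
	"sync"
	"time"
)

// Assumed constants reflecting the two conditions described above.
const (
	chunkRefMapShrinkThreshold   = 10               // minimum recorded peak size required to shrink
	chunkRefMapMinShrinkInterval = 10 * time.Minute // minimum interval between shrink events
)

// Only the fields relevant to the shrink logic are shown here.
type chunkWriteQueue struct {
	chunkRefMapMtx        sync.Mutex
	chunkRefMap           map[uint64]struct{} // simplified; the real map holds chunk references and chunks
	chunkRefMapPeakSize   int
	chunkRefMapLastShrink time.Time
}

// shrinkChunkRefMap re-creates chunkRefMap with a capacity of half the peak
// usage recorded since the last shrink, provided the conditions are met.
// chunkRefMapMtx must be held when calling this method.
func (c *chunkWriteQueue) shrinkChunkRefMap() {
	if len(c.chunkRefMap) > 0 {
		// Only shrink while no outstanding jobs are referenced by the map.
		return
	}
	if c.chunkRefMapPeakSize < chunkRefMapShrinkThreshold {
		// The recorded peak is too small for shrinking to be worthwhile.
		return
	}
	if time.Since(c.chunkRefMapLastShrink) < chunkRefMapMinShrinkInterval {
		// Shrunk too recently; avoid churning allocations.
		return
	}

	// The Go runtime never releases the memory backing a map, so the only way
	// to give it back is to drop the old map and allocate a smaller one.
	c.chunkRefMap = make(map[uint64]struct{}, c.chunkRefMapPeakSize/2)
	c.chunkRefMapPeakSize = 0
	c.chunkRefMapLastShrink = time.Now()
}
```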

@replay force-pushed the reduce_chunk_write_queue_memory_usage branch from 6448d4b to 20f3746 on February 8, 2022 17:39
@replay changed the title from "[WIP] Reduce chunk write queue memory usage" to "Reduce chunk write queue memory usage" on Feb 8, 2022
@replay marked this pull request as ready for review on February 8, 2022 20:08
@replay force-pushed the reduce_chunk_write_queue_memory_usage branch 2 times, most recently from 8b4bfa2 to 7cb970a on February 9, 2022 22:33
@replay (Contributor, Author) commented on Feb 9, 2022

I don't know how to make this CI pass... It seems to be complaining about stuff which I haven't changed. I don't want to fix everything it complains about as part of this PR because doing so would blow up the scope unnecessarily.

@replay (Contributor, Author) commented on Feb 9, 2022

@codesome @pracucci @bboreham if you get a chance I'd appreciate some feedback on this proposal.
On one hand I don't like adding 2 relatively arbitrary constants; on the other hand, I think this is way too deep in the internals to expose via config flags.
This is to solve the issue which almost blocked the rollout of r171 due to an increase in memory usage; the context is in this Slack thread.

@bboreham (Contributor) left a comment

The functionality seems fine; I had some comments about naming / definition.

@@ -121,6 +168,8 @@ func (c *chunkWriteQueue) addJob(job chunkWriteJob) (err error) {
}
}()

// c.isRunningMtx serializes the adding of jobs to the c.chunkRefMap, if c.jobs is full then c.addJob() will block
Contributor (inline review comment):

For me this statement needs to go on the definition of isRunningMtx.
Ideally things should be named after what they do, so perhaps change the name.

Member (reply):

Moved the comment. Kept the name isRunningMtx because it's also protecting isRunning field.

// freeChunkRefMap checks whether the conditions to free the chunkRefMap are met,
// if so it re-initializes it and frees the memory which it currently uses.
// The chunkRefMapMtx must be held when calling this method.
func (c *chunkWriteQueue) freeChunkRefMap() {
Contributor (inline review comment):

To me "free" is an implementation detail; what you are trying to do is "shrink" the data structure.
(Possibly needs a note that current Go runtime never releases the internal memory used for a map, which is why you are discarding the old one and making a new one)

Member (reply):

Renamed "free" to "shrink" everywhere, and added comment about Go runtime.

@bboreham (Contributor) commented:
> 2 relatively arbitrary constants

I'm fine with them being constants; suggest putting a note in the diary to review the stats after 1 week in dev and after a few weeks in prod.

@codesome (Member) left a comment

The method looks good

(Outdated inline review thread on tsdb/chunks/chunk_write_queue.go; resolved.)
@codesome (Member) commented:
I just had another thought: in the upstream PR we assumed that, because addJob blocks when the channel is full, the map would not grow beyond the queue size. That seems to be right, but there was a subtle bug there. Is the patch below enough, so that we don't need the complex recycling, given that we always need to be ready for the peak usage anyway?

diff --git a/tsdb/chunks/chunk_write_queue.go b/tsdb/chunks/chunk_write_queue.go
index 5cdd2e81f..6502c570e 100644
--- a/tsdb/chunks/chunk_write_queue.go
+++ b/tsdb/chunks/chunk_write_queue.go
@@ -128,12 +128,12 @@ func (c *chunkWriteQueue) addJob(job chunkWriteJob) (err error) {
                return errors.New("queue is not started")
        }
 
+       c.jobs <- job
+
        c.chunkRefMapMtx.Lock()
        c.chunkRefMap[job.ref] = job.chk
        c.chunkRefMapMtx.Unlock()
 
-       c.jobs <- job
-
        return nil
 }

(This basically moves the blocking channel send before the map update; currently we update the map first and then wait for the channel.)

@replay (Contributor, Author) commented on Feb 17, 2022

I don't think we should apply that patch, for two reasons:

  1. I don't think it solves a bug, because the lock c.isRunningMtx is held at the time when an object is pushed into c.jobs. That's what I was trying to point out with this comment on line :171 in this PR:
// c.isRunningMtx serializes the adding of jobs to the c.chunkRefMap, if c.jobs is full then c.addJob() will block
// while holding c.isRunningMtx, this guarantees that c.chunkRefMap won't ever grow beyond the queue size + 1.

So if another thread calls .addJob() it will already block on trying to acquire c.isRunningMtx.

  2. If we push the job into c.jobs before adding it to c.chunkRefMap[job.ref], then there's a chance that the consumer takes it from the channel and tries to delete it from the map before it's there; this would be a bug. (A condensed sketch of the resulting ordering is shown below.)
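
For reference, a condensed sketch of addJob reflecting the ordering being discussed (based on the diff excerpts above; it assumes the surrounding package's types and imports, and omits details such as the deferred recovery around sending to a closed channel):

```go
func (c *chunkWriteQueue) addJob(job chunkWriteJob) error {
	// isRunningMtx serializes producers: if c.jobs is full, the send below
	// blocks while this mutex is held, so no other producer can add to
	// c.chunkRefMap in the meantime. This bounds the map at queue size + 1.
	c.isRunningMtx.Lock()
	defer c.isRunningMtx.Unlock()

	if !c.isRunning {
		return errors.New("queue is not started")
	}

	// The map entry must exist before the job becomes visible to the consumer;
	// otherwise the consumer could process the job and try to delete a map
	// entry that has not been added yet.
	c.chunkRefMapMtx.Lock()
	c.chunkRefMap[job.ref] = job.chk
	c.chunkRefMapMtx.Unlock()

	c.jobs <- job

	return nil
}
```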

@codesome (Member) commented:

Oh right, missed the other lock, makes sense.

@pstibrany (Member) commented:

I've addressed review feedback, and would like to get this merged. Can you please re-review the PR?

I have a doubt about using a 10-minute timeout for shrinking the map. It seems too long to me. If the map is empty, we could shrink it sooner (e.g. after 1 minute) and pay the allocation price after that time instead.

My other concern is that the PR only focuses on the map size and doesn't address the channel size. Allocating a channel with a buffer of size 1000000 allocates all of its memory upfront and never releases it. Since a chunkWriteJob uses 64 bytes, this will use 64 MB of heap for each open TSDB.

But let's see how this works in practice and decide if we need to address timeout and/or channel size based on our experience.
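
For reference, the back-of-the-envelope arithmetic behind the 64 MB figure (the 64-byte job size and the 1,000,000 queue size are taken from the comment above):

```go
package main

import "fmt"

func main() {
	const (
		queueSize    = 1_000_000 // chunk write queue buffer size mentioned above
		jobSizeBytes = 64        // approximate size of one chunkWriteJob
	)
	// A buffered channel allocates its full buffer up front and keeps it for
	// the channel's lifetime, regardless of how many jobs are actually queued.
	total := queueSize * jobSizeBytes
	fmt.Printf("%d bytes ≈ %.0f MiB per open TSDB\n", total, float64(total)/(1<<20))
	// Prints: 64000000 bytes ≈ 61 MiB per open TSDB
}
```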

replay and others added 8 commits on June 1, 2022 14:53
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
@pstibrany force-pushed the reduce_chunk_write_queue_memory_usage branch from d7a75bc to 56e3a4a on June 1, 2022 12:53
@bboreham (Contributor) left a comment

Seems plausible.
Given the title of the PR, I would expect some data on results, i.e. memory usage measured before and after.

A follow-up commit was pushed (Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>)
@replay (Contributor, Author) left a comment

Thanks for picking this up @pstibrany !
Your changes look good to me.
(I can't approve my own PR, otherwise I would)

@pstibrany (Member) commented:

> Given the title of the PR, I would expect some data on results, i.e. memory usage measured before and after.

Testing on Mimir ingesters with 7 tenants shows that memory allocated by chunks.newChunkWriteQueue dropped by half (from 814 MB to 427 MB when using a queue size of 1000000). shrinkChunkRefMap didn't show up in the profile as a place allocating "in-use" objects. This is consistent with expectations: memory for the map isn't kept allocated unless it's needed, but memory for the channel itself is still in use (7 tenants, 1M channel buffer, 64 bytes per entry = 427 MiB).

@pstibrany merged commit 459f599 into main on Jun 2, 2022
@pstibrany (Member) commented:

#247 addresses the memory usage of the channel.

pstibrany added commits to grafana/mimir referencing this pull request on Jun 7 and Jun 8, 2022:

* Update Prometheus with async chunk mapper changes.

Included changes:

grafana/mimir-prometheus#131
grafana/mimir-prometheus#247

These should result in lower memory usage by the chunk mapper.

Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
@pstibrany (Member) commented:

Sent to upstream Prometheus as prometheus/prometheus#10873.

@pstibrany deleted the reduce_chunk_write_queue_memory_usage branch on November 28, 2022 08:16