
Fix loading huge series into RAM when points are overwritten #6556

Merged
merged 1 commit into master from jw-tsm-values
May 5, 2016

Conversation


@jwilder jwilder commented May 4, 2016

Required for all non-trivial PRs
  • Rebased/mergable
  • Tests pass
  • CHANGELOG.md updated

In some query scenarios, if there are a lot of points on disk for the same series, spread across many blocks in TSM files, and a point is overwritten near the beginning of the shard's time range, the full series could be loaded into RAM, triggering huge allocations and OOMs. I believe the same problem exists in the compactor, but this PR only fixes the query path; a separate fix will be needed for compactions.

The issue was that the KeyCursor code that handles overwriting points had a simple implementation that just deduped the whole series in this case. This falls over when the series is quite large.

Instead, the KeyCursor has been changed to decode only blocks containing updated points. It then keeps track of which sections of each block have already been read, so they are not re-read when later points are decoded.
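
Roughly, the bookkeeping amounts to remembering a read range per block. The sketch below is illustrative only; the names and layout are stand-ins, not the actual tsm1 types:

    // blockLocation tracks which part of a decoded block has already
    // been returned to the caller, so an overlapping read can skip it
    // instead of decoding the block again. (Illustrative, not tsm1's
    // real structure.)
    type blockLocation struct {
        minTime, maxTime int64 // block's full time range, from the TSM index
        readMin, readMax int64 // sub-range already returned to the caller
        hasRead          bool  // whether anything was returned yet
    }

    // markRead records that timestamps in [min, max] were handed back.
    func (b *blockLocation) markRead(min, max int64) {
        if !b.hasRead || min < b.readMin {
            b.readMin = min
        }
        if !b.hasRead || max > b.readMax {
            b.readMax = max
        }
        b.hasRead = true
    }

    // fullyRead reports whether the entire block has been consumed.
    func (b *blockLocation) fullyRead() bool {
        return b.hasRead && b.readMin <= b.minTime && b.readMax >= b.maxTime
    }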

Since the points in a block are always sorted, the code was also changed to remove the Deduplicate calls, which end up reallocating the slice (as well as creating an equally sized map). Instead, we do a sorted merge and reuse the slices as much as we can.
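
As a minimal illustration of that merge: the PR's real code operates on the tsm1 value type and reuses the input slices in place; the val type here is a stand-in, and this version allocates the output once for clarity.

    package main

    import "fmt"

    // val stands in for the tsm1 value type; only the timestamp matters.
    type val struct {
        t int64
        v float64
    }

    func (v val) UnixNano() int64 { return v.t }

    // merge combines two time-sorted runs, with b (the newer block)
    // winning on equal timestamps. One pass, no map, unlike a general
    // Deduplicate.
    func merge(a, b []val) []val {
        out := make([]val, 0, len(a)+len(b))
        var i, j int
        for i < len(a) && j < len(b) {
            av, bv := a[i].UnixNano(), b[j].UnixNano()
            switch {
            case av < bv:
                out = append(out, a[i])
                i++
            case av > bv:
                out = append(out, b[j])
                j++
            default: // equal timestamps: the overwrite wins
                out = append(out, b[j])
                i++
                j++
            }
        }
        out = append(out, a[i:]...)
        return append(out, b[j:]...)
    }

    func main() {
        a := []val{{1, 1}, {2, 1}, {4, 1}}
        b := []val{{2, 9}, {3, 9}}
        fmt.Println(merge(a, b)) // [{1 1} {2 9} {3 9} {4 1}]
    }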

To reproduce the issue, I wrote 50M points to a single series and overwrote the first point a few times, until the TSM file blocks were in the state where this issue surfaces. Then I ran a select count(value) from cpu query to count every point in the series.
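
The exact stress tooling isn't shown here, but a hypothetical generator for an equivalent workload against the v1 line-protocol /write endpoint (database stress, measurement cpu, field value, matching the query above) could look like this:

    package main

    // Illustrative workload generator, not the tool used in this PR.
    // Timestamps and batch sizes are arbitrary; in practice each
    // overwrite may need to land in a separate flush/TSM file for the
    // blocks to reach the state described above.

    import (
        "bytes"
        "fmt"
        "log"
        "net/http"
    )

    func main() {
        const url = "http://localhost:8086/write?db=stress&precision=s"

        write := func(body *bytes.Buffer) {
            resp, err := http.Post(url, "text/plain", body)
            if err != nil {
                log.Fatal(err)
            }
            resp.Body.Close()
        }

        // 50M points on one series, one second apart, in ~1MB batches.
        var buf bytes.Buffer
        for i := 0; i < 50000000; i++ {
            fmt.Fprintf(&buf, "cpu value=1 %d\n", i)
            if buf.Len() > 1<<20 {
                write(&buf)
                buf.Reset()
            }
        }
        write(&buf)

        // Overwrite the first point a few times so newer TSM blocks
        // contain a duplicate of a timestamp near the start of the
        // shard's time range.
        for k := 0; k < 3; k++ {
            write(bytes.NewBufferString(fmt.Sprintf("cpu value=%d 0", k)))
        }
    }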

Some before and after system stats:

Before

$ time influx -database stress -execute "select count(value) from cpu"
name: cpu
---------
time    count
0   50000000


real    0m39.994s
user    0m0.004s
sys 0m0.006s

$ ps -o rss,vsz,pid $(pgrep influxd)
   RSS      VSZ   PID
5960276 573563196 20644

After

$ time influx -database stress -execute "select count(value) from cpu"
name: cpu
---------
time    count
0   50000000

real    0m11.352s
user    0m0.004s
sys 0m0.006s
$ ps -o rss,vsz,pid $(pgrep influxd)
   RSS      VSZ   PID
137428 573564744 20907

Query time dropped from 40s to 11.3s, and RSS from 6GB to 137MB.

@jwilder jwilder added this to the 1.0.0 milestone May 4, 2016
@mention-bot

By analyzing the blame information on this pull request, we identified @mark-rushakoff and @joelegasse to be potential reviewers

@jwilder

jwilder commented May 4, 2016

@benbjohnson

    } else if a[i].UnixNano() == b[j].UnixNano() {
        a[i] = b[j]
        i++
        j++
    }
@benbjohnson

Minor nit, but we should probably save UnixNano() since we're calling it twice. Maybe atime & btime?

@benbjohnson

Oh, wait, I was thinking those were time.Time values. Ok, it probably doesn't make a big difference if they're just grabbing int64 values underneath.

@jwilder

Might be a small savings to save the slice indexing call.

@e-dard

minor nit: you could save a few lines with:

if a[i].UnixNano() > b[j].UnixNano() {
    a[i], b[j] = b[j], a[i]
} else if a[i].UnixNano() == b[j].UnixNano() {
    a[i] = b[j]
    j++
}
i++

@jwilder

@e-dard That's nicer. I'll update it to use:

    var i, j int
    for ; i < len(a) && j < len(b); i++ {
        av, bv := a[i].UnixNano(), b[j].UnixNano()
        if av > bv {
            a[i], b[j] = b[j], a[i]
        } else if av == bv {
            a[i] = b[j]
            j++
        }
    }

@benbjohnson

Overall it lgtm. Just a few minor comments.

@e-dard

e-dard commented May 5, 2016

LGTM 👍

In some query scenarios, if there are a lot of points on disk spread
across many blocks in TSM files and a point is overwritten near the
beginning of the shard's time range, the full series could be loaded
into RAM triggering OOMs and huge allocations.

The issue was that the KeyCursor code that handles overwriting points
had a simple implementation that just deduped the whole series in this
case.  This falls over when the series is quite large.

Instead, the KeyCursor has been changed to only decode blocks with
updated points.  It then keeps track of what section of the blocks
have been read so they are not re-read when the later points are
decoded.

Since the points in a block are always sorted, the code was also changed
to remove the Deduplicate calls since they end up
reallocating the slice.  Instead, we do a sorted merge and re-use
the slice as much as we can.
@jwilder jwilder merged commit fbf1e4a into master May 5, 2016
@jwilder jwilder deleted the jw-tsm-values branch May 5, 2016 16:09
jwilder added a commit that referenced this pull request May 6, 2016
If a large series contains a point that is overwritten, the compactor
would load the whole series into RAM during a full compaction.  If
the series was large, it could cause very large RAM spikes and OOMs.

The change reworks the compactor to merge blocks more incrementally
similar to the fix done in #6556.
jwilder added a commit that referenced this pull request May 17, 2016
Fixes #6557
@jwilder jwilder mentioned this pull request May 17, 2016
jwilder added a commit that referenced this pull request May 18, 2016
jwilder added a commit that referenced this pull request May 18, 2016