
Handle high cardinality deletes in TSM engine #9084

Merged
merged 28 commits into master from jw-delete-time on Nov 13, 2017
Conversation

jwilder
Contributor

@jwilder jwilder commented Nov 8, 2017

Required for all non-trivial PRs
  • Rebased/mergable
  • Tests pass
  • CHANGELOG.md updated
  • Sign CLA (if not already signed)

This PR has a number of changes to reduce memory usage in the TSM engine for deletes that could lead to OOMs. This required reworking how deletes are processed altogether.

The initial version of deletes in the engine took a slice of all the series keys to be deleted and then processed those keys. This worked fine for smaller cardinality deletes, but becomes problematic at higher cardinalities and as the data set grows across many TSM files.

The main problems addressed are:

  • The creation of the large slice of series keys has been switched to use an iterator approach.
  • Engine.DeleteSeriesRange needed to create several intermediate slices and maps that were at least as big as the original set of series keys passed in. These slices and maps were used to map the field-less series keys (passed in) to the actual series keys stored in TSM (with fields).
  • Writing tombstones required reading the existing tombstones on disk, adding the new ones to the set, and re-writing the whole set (Avoid re-reading tombstone data when writing tombstones #5503). This made deleting smaller batches of writes very expensive. It has been fixed via a new tombstone format and interfaces to write and commit tombstones in batches, which avoids the previous problems.
  • Deletes and writes to the same series could leave the index in an inconsistent state, because concurrent writes could add series to the index while a delete unknowingly removes them. A temporary fix in 1.3 was to re-add all the series keys after compaction completes. That approach is unworkable at higher cardinalities and is handled better in this PR.
  • Determining whether a series and measurement needs to be removed from the index was expensive. It previously worked by applying the deletes, and then calling containsSeries which determined if any data exists for the series in the engine. If nothing existed, the entries were removed from the index as well. This was racy and memory intensive and has been reworked to use smaller, re-usable batches.
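The iterator-plus-batching shape described above can be sketched as follows. This is a minimal illustration only; the interface and names are hypothetical, not the engine's actual API:

```go
package main

import "fmt"

// keyIterator yields series keys one at a time instead of materializing
// the full set in a single slice.
type keyIterator interface {
	Next() ([]byte, bool)
}

// sliceIterator is a trivial in-memory iterator used for illustration.
type sliceIterator struct {
	keys [][]byte
	i    int
}

func (it *sliceIterator) Next() ([]byte, bool) {
	if it.i >= len(it.keys) {
		return nil, false
	}
	k := it.keys[it.i]
	it.i++
	return k, true
}

// deleteSeries consumes keys lazily and processes them in small, reusable
// batches, so memory stays proportional to the batch size rather than the
// total cardinality. The flush is a stand-in for writing a tombstone batch.
func deleteSeries(itr keyIterator, batchSize int) int {
	batch := make([][]byte, 0, batchSize)
	deleted := 0
	flush := func() {
		deleted += len(batch) // stand-in for committing a tombstone batch
		batch = batch[:0]     // reuse the backing array
	}
	for {
		k, ok := itr.Next()
		if !ok {
			break
		}
		batch = append(batch, k)
		if len(batch) == batchSize {
			flush()
		}
	}
	flush()
	return deleted
}

func main() {
	itr := &sliceIterator{keys: [][]byte{
		[]byte("cpu,host=a"), []byte("cpu,host=b"), []byte("mem,host=a"),
	}}
	fmt.Println(deleteSeries(itr, 2))
}
```

The point of the shape is that nothing ever holds all the series keys at once; only one batch is live at a time.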

The graphs below show a before and after of deleting 8M series from inmem in a single shard with about 20 large TSM files that are not fully compacted. The first graph shows a fairly large spike, correlated with the number of series and TSM files, when writing tombstones. The second still uses a bit too much memory (still investigating), but the large spike is gone.

Inmem Deleting 8M series Before/After


TSI Deleting 32M series Before/After


Master ran for 12+hrs and I eventually killed it. This change completed in 17m, but I should be able to improve that further.

@jwilder jwilder added this to the 1.5.0 milestone Nov 8, 2017
@ghost ghost assigned jwilder Nov 8, 2017
@ghost ghost added the review label Nov 8, 2017
for len(tempKeys) > 0 && bytes.Compare(tempKeys[0], seriesKey) < 0 {
tempKeys = tempKeys[1:]
// Strip off the field portion of the key
seriesKey, _ := SeriesAndFieldFromCompositeKey([]byte(k))
Contributor
It appears k is already a []byte, so []byte(k) is redundant.

for i < len(a) {
// Find the next gap to remove
iStart = i
for ; i < len(a) && a[i] == val; i += width {
Contributor
what do you think about moving the increment inside the for statement block?

for i < len(a) && a[i] == val {
    i += width
}

Contributor Author
I can do that.

Contributor

@stuartcarnie stuartcarnie left a comment

LGTM 👍🏻

Contributor

@e-dard e-dard left a comment

LGTM 👍 just a nit.

return nil
} else if e := itr.Next(); e != nil {
mm := fs.Measurement(mname)
if mm == nil || mm.HasSeries() {
Contributor
nit: if a measurement is nil, then it can't have any series. Why not just make HasSeries check if the receiver is nil and return false? Then you don't need the mm == nil || mm.HasSeries()
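Go permits method calls through a nil pointer receiver, so the reviewer's suggestion is straightforward to implement. A minimal sketch, where the Measurement type and its field are illustrative stand-ins for the real index types:

```go
package main

import "fmt"

// Measurement is a stand-in for the real index type.
type Measurement struct {
	series map[string]struct{}
}

// HasSeries reports whether any series exist. It returns false on a nil
// receiver, so callers can drop the explicit `mm == nil` check.
func (m *Measurement) HasSeries() bool {
	if m == nil {
		return false
	}
	return len(m.series) > 0
}

func main() {
	var mm *Measurement
	fmt.Println(mm.HasSeries()) // calling through a nil pointer is legal in Go

	mm = &Measurement{series: map[string]struct{}{"cpu,host=a": {}}}
	fmt.Println(mm.HasSeries())
}
```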

This adds a new v4 tombstone format that extends the v3 format
by allowing multiple batches of tombstones to be written without
having to re-read all the existing tombstones.  This uses gzip
multistream to append multiple v3 blocks together to create the v4
format.
This removes the in-memory tombstone buffer when writing tombstones
which eliminates one source of large memory spikes during deletes.
Allows callers to use []byte and avoid a string allocation
This is a version of DeleteRange that takes a func predicate to determine
whether a series key should be deleted or not.  This avoids the large
slice allocations with higher cardinalities.
The query language min and max times are slightly different from the
values used in the engine.  This allows faster code paths to be used when
the whole time range is deleted.
If fn returned an error, the goroutines sending keys from TSM files
would get blocked indefinitely and leak.
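The usual fix for this kind of leak is to give the sending goroutine a way out, typically a done channel selected alongside the send. A minimal sketch of the pattern (names are illustrative):

```go
package main

import "fmt"

// sendKeys streams keys to out but aborts when done is closed, so the
// goroutine cannot block forever on a send if the consumer stops early,
// e.g. because its callback returned an error.
func sendKeys(keys []string, out chan<- string, done <-chan struct{}) {
	defer close(out)
	for _, k := range keys {
		select {
		case out <- k:
		case <-done:
			return // consumer gave up; exit instead of leaking
		}
	}
}

func main() {
	out := make(chan string)
	done := make(chan struct{})
	go sendKeys([]string{"a", "b", "c"}, out, done)

	// Simulate a consumer that fails after the first key.
	k := <-out
	fmt.Println(k)
	close(done) // unblocks the sender so it can exit cleanly
}
```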
This removes the containsSeries func which ends up creating a map
sized to the slice of keys passed in.  This doesn't scale well to
high cardinalities and creates a lot of garbage.
If you have lots of data stored locally, this test takes a while to
complete since it loads it all from the user's home dir.
This filters out keys that do not exist in a TSM file to avoid
writing entries that would end up being ignored when applied.
This removes more allocations and speeds up some critical sections.
This fixes a race where writes and deletes to the same series and
measurements could sometimes leave the index in an inconsistent state.
The DropSeries code path ended up creating a MeasurementSeriesIterator
for each dropped series, this was too expensive just to see if a
series exists.

This adds a HasSeries func and fixes an issue where TSI files were
compacted while an iterator was still in use causing a panic.
[ci skip]
@jwilder jwilder merged commit 48e21e6 into master Nov 13, 2017
@ghost ghost removed the review label Nov 13, 2017
@jwilder jwilder deleted the jw-delete-time branch November 13, 2017 21:39
@jwilder jwilder mentioned this pull request Nov 17, 2017