
Handle high cardinality deletes in TSM engine #9084

Merged
merged 28 commits into master from jw-delete-time on Nov 13, 2017
Conversation

jwilder
Contributor

@jwilder jwilder commented Nov 8, 2017

Required for all non-trivial PRs
  • Rebased/mergable
  • Tests pass
  • CHANGELOG.md updated
  • Sign CLA (if not already signed)

This PR has a number of changes to reduce memory usage in the TSM engine for deletes that could lead to OOMs. This required reworking how deletes are processed altogether.

The initial version of deletes in the engine took a slice of all the series keys to be deleted and then processed those keys. This worked fine for smaller cardinality deletes, but becomes problematic at higher cardinalities and as the data set grows across many TSM files.

The main problems addressed are:

  • The creation of the large slice of series keys has been switched to use an iterator approach.
  • Engine.DeleteSeriesRange needed to create several intermediate slices and maps that were at least as big as the original set of series keys passed in. These slices and maps were used to map the field-less series keys (passed in) to the actual series keys stored in TSM (with fields).
  • Writing tombstones required reading the existing tombstones on disk, adding the new ones to the set, and re-writing the whole set (Avoid re-reading tombstone data when writing tombstones #5503). This made deleting smaller batches of writes very expensive. It has been fixed via a new tombstone format and interfaces to write and commit tombstones in batches, which avoids the previous problems.
  • Deletes and writes to the same series could leave the index in an inconsistent state, because concurrent writes could add series to the index while a delete unknowingly removes them. A temporary fix in 1.3 was to re-add all the series keys after compaction completes. That approach is unworkable at higher cardinalities and is handled better in this PR.
  • Determining whether a series and measurement needs to be removed from the index was expensive. It previously worked by applying the deletes, and then calling containsSeries which determined if any data exists for the series in the engine. If nothing existed, the entries were removed from the index as well. This was racy and memory intensive and has been reworked to use smaller, re-usable batches.
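The iterator-plus-batching shape described above can be sketched as follows. This is a minimal illustration only; the interface and names are hypothetical, not the engine's actual API:

```go
package main

import "fmt"

// keyIterator yields series keys one at a time instead of materializing
// the full set in a single slice.
type keyIterator interface {
	Next() ([]byte, bool)
}

// sliceIterator is a trivial in-memory iterator used for illustration.
type sliceIterator struct {
	keys [][]byte
	i    int
}

func (it *sliceIterator) Next() ([]byte, bool) {
	if it.i >= len(it.keys) {
		return nil, false
	}
	k := it.keys[it.i]
	it.i++
	return k, true
}

// deleteSeries consumes keys lazily and processes them in small, reusable
// batches, so memory stays proportional to the batch size rather than the
// total cardinality. The flush is a stand-in for writing a tombstone batch.
func deleteSeries(itr keyIterator, batchSize int) int {
	batch := make([][]byte, 0, batchSize)
	deleted := 0
	flush := func() {
		deleted += len(batch) // stand-in for committing a tombstone batch
		batch = batch[:0]     // reuse the backing array
	}
	for {
		k, ok := itr.Next()
		if !ok {
			break
		}
		batch = append(batch, k)
		if len(batch) == batchSize {
			flush()
		}
	}
	flush()
	return deleted
}

func main() {
	itr := &sliceIterator{keys: [][]byte{
		[]byte("cpu,host=a"), []byte("cpu,host=b"), []byte("mem,host=a"),
	}}
	fmt.Println(deleteSeries(itr, 2))
}
```

The point of the shape is that nothing ever holds all the series keys at once; only one batch is live at a time.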

The graphs below show a before and after of deleting 8M series from inmem in a single shard with about 20 large TSM files that are not fully compacted. The first graph shows a fairly large spike, correlated with the number of series and TSM files, when writing tombstones. The second still uses a bit too much memory (still investigating), but the large spike is gone.

Inmem Deleting 8M series Before/After


TSI Deleting 32M series Before/After


Master ran for 12+hrs and I eventually killed it. This change completed in 17m, but I should be able to improve that further.

@jwilder jwilder added this to the 1.5.0 milestone Nov 8, 2017
@ghost ghost assigned jwilder Nov 8, 2017
@ghost ghost added the review label Nov 8, 2017
for len(tempKeys) > 0 && bytes.Compare(tempKeys[0], seriesKey) < 0 {
tempKeys = tempKeys[1:]
// Strip off the field portion of the key
seriesKey, _ := SeriesAndFieldFromCompositeKey([]byte(k))
Contributor
It appears k is already a []byte, so []byte(k) is redundant.

for i < len(a) {
// Find the next gap to remove
iStart = i
for ; i < len(a) && a[i] == val; i += width {
Contributor
what do you think about moving the increment inside the for statement block?

for i < len(a) && a[i] == val {
    i += width
}

Contributor Author
I can do that.

Contributor

@stuartcarnie stuartcarnie left a comment

LGTM 👍🏻

Contributor

@e-dard e-dard left a comment

LGTM 👍 just a nit.

return nil
} else if e := itr.Next(); e != nil {
mm := fs.Measurement(mname)
if mm == nil || mm.HasSeries() {
Contributor
nit: if a measurement is nil, then it can't have any series. Why not just make HasSeries check if the receiver is nil and return false? Then you don't need the mm == nil || mm.HasSeries()
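Go permits method calls through a nil pointer receiver, so the reviewer's suggestion is straightforward to implement. A minimal sketch, where the Measurement type and its field are illustrative stand-ins for the real index types:

```go
package main

import "fmt"

// Measurement is a stand-in for the real index type.
type Measurement struct {
	series map[string]struct{}
}

// HasSeries reports whether any series exist. It returns false on a nil
// receiver, so callers can drop the explicit `mm == nil` check.
func (m *Measurement) HasSeries() bool {
	if m == nil {
		return false
	}
	return len(m.series) > 0
}

func main() {
	var mm *Measurement
	fmt.Println(mm.HasSeries()) // calling through a nil pointer is legal in Go

	mm = &Measurement{series: map[string]struct{}{"cpu,host=a": {}}}
	fmt.Println(mm.HasSeries())
}
```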

This adds a new v4 tombstone format that extends the v3 format
by allowing multiple batches of tombstones to be written without
having to re-read all the existing tombstones.  This uses gzip
multistream to append multiple v3 blocks together to create the v4
format.
This removes the in-memory tombstone buffer when writing tombstones
which eliminates one source of large memory spikes during deletes.
Allows callers to use []byte and avoid a string allocation
This is a version of DeleteRange that takes a func predicate to determine
whether a series key should be deleted or not.  This avoids the large
slice allocations with higher cardinalities.
The query language min and max times are slightly different from the
values used in the engine.  This allows faster code paths to be used when
the whole time range is deleted.
If fn returned an error, the goroutines sending keys from TSM files
would get blocked indefinitely and leak.
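The usual fix for this kind of leak is to give the sending goroutine a way out, typically a done channel selected alongside the send. A minimal sketch of the pattern (names are illustrative):

```go
package main

import "fmt"

// sendKeys streams keys to out but aborts when done is closed, so the
// goroutine cannot block forever on a send if the consumer stops early,
// e.g. because its callback returned an error.
func sendKeys(keys []string, out chan<- string, done <-chan struct{}) {
	defer close(out)
	for _, k := range keys {
		select {
		case out <- k:
		case <-done:
			return // consumer gave up; exit instead of leaking
		}
	}
}

func main() {
	out := make(chan string)
	done := make(chan struct{})
	go sendKeys([]string{"a", "b", "c"}, out, done)

	// Simulate a consumer that fails after the first key.
	k := <-out
	fmt.Println(k)
	close(done) // unblocks the sender so it can exit cleanly
}
```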
This removes the containsSeries func which ends up creating a map
sized to the slice of keys passed in.  This doesn't scale well to
high cardinalities and creates a lot of garbage.
If you have lots of data stored locally, this test takes a while to
complete since it loads it all from the user's home dir.
This filters out keys that do not exist in a TSM file to avoid
writing entries that would end up being ignored when applied.
This removes more allocations and speeds up some critical sections.
This fixes a race where writes and deletes to the same series and
measurements could sometimes leave the index in an inconsistent state.
The DropSeries code path ended up creating a MeasurementSeriesIterator
for each dropped series, this was too expensive just to see if a
series exists.

This adds a HasSeries func and fixes an issue where TSI files were
compacted while an iterator was still in use causing a panic.
[ci skip]
@jwilder jwilder merged commit 48e21e6 into master Nov 13, 2017
@ghost ghost removed the review label Nov 13, 2017
@jwilder jwilder deleted the jw-delete-time branch November 13, 2017 21:39
@jwilder jwilder mentioned this pull request Nov 17, 2017