Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delete series support for TSM #6483

Merged
merged 19 commits into from
Apr 27, 2016
Merged

Delete series support for TSM #6483

merged 19 commits into from
Apr 27, 2016

Conversation

jwilder
Copy link
Contributor

@jwilder jwilder commented Apr 27, 2016

Required for all non-trivial PRs
  • Rebased/mergable
  • Tests pass
  • CHANGELOG.md updated

This PR adds the capability to the TSM engine to remove points based on time ranges to support the DELETE SERIES statement. The query parser/execution is not part of this PR.

The main changes to support this capability are:

  • Changed the tombstone file format to also store time ranges per key
  • Update indirectIndex to support partially deleted keys
  • Update WAL to store time ranges with deleted keys
  • Update Cache to allow removing keys by ranges of time
  • Update CacheLoader to handle keys that are partially deleted
  • Update Compactor to skip and filter out blocks and values that are partially deleted

There are a few unrelated changes in the PR done to make the implementation simpler as well as fix some bugs:

  • directIndex and indirectIndex implemented the same TSMIndex interface which is used for reading and writing. We really only used directIndex for writing and tests and indirectIndex for reading. These have been split out into separate reader and writer types so that we do not need to continue to implement reading capabilities in the directIndex when they are not used. This will also make it easier to support different version of indexes in the future when we need to extend the TSM file format.
  • The prior change allowed for a lot of code to be removed (mainly fileAccessor)
  • Fixed many places where corrupted WAL segments could panic. Should fix HELP: influxdb cannot start back up - 'panic: runtime error: slice bounds out of range' #6468 and similar ones in the future.
  • Removed NewTSMReaderWithOptions which allowed creating a TSMReader w/ a direct or indirect index. There is only NewTSMReader now that always uses indirectIndex. Create a TSMReader with a directIndex was only useful for writing tests that don't write to disk.
  • Removed several inefficient TSMReader funcs that should never be called (e.g. Keys)

cc @benbjohnson

@mention-bot
Copy link

By analyzing the blame information on this pull request, we identified @joelegasse to be a potential reviewer

@jwilder jwilder added this to the 0.13.0 milestone Apr 27, 2016
c.DeleteRange(keys, math.MinInt64, math.MaxInt64)
}

// DeleteRange will remove the keys containing points between min and max from the cache
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be re-worded to mention that only the values within the given range are removed. Right now, I interpret this to mean the entire key is removed if any value is in that time range.

@jwilder jwilder force-pushed the jw-delete-series branch 2 times, most recently from 914a123 to 1a432f9 Compare April 27, 2016 18:02
@benbjohnson
Copy link
Contributor

Overall it looks really good. There were a couple nits mentioned by @joelegasse but other than that, 👍 .

Also, I'm happy that I'm not the only one submitting 2KLOC PRs. :)

@jwilder
Copy link
Contributor Author

jwilder commented Apr 27, 2016

@benbjohnson It's mostly test code. :)

All comments addressed. I brought back engine.DeleteSeries and had that call engine.DeleteSeriesRange.

This adds support for a time range to tombstone files to allow a subset
of points to be deleted instead of the whole series.  It changes the
tombstone file format to a binary format and maintains backwards compatibility
with the old text format tombstone files.
There are two TSMIndex implementations, the directIndex and the
indirectIndex.  Originally, we only had the directIndex and later
added the indirectIndex and NewTSMReaderWithOptions in order to
allow both indexes to be used in tests and code.  This has created
a problem since we really only use the directIndex for writing and
always use the indirectIndex for reading.

This changes removes the NewTSMReaderWithOptions func so that it is
no longer possible to create a TSMReader with a directIndex.  This
will allow a lot of the block reading code used by the directIndex
to be removed and simplify maintainence.  It also gives better test
coverage of the code that is actually used by the TSM engine now.
No longer used
Allows for future versionion of the TSMIndex as well as removing
a lot of unnecessary code.
It's very inneficient and should never be used.
This will skip blocks that are fully tombstoned as well as remove
points that have been removed within a block.
@joelegasse
Copy link
Contributor

LGTM 👍

@jwilder jwilder merged commit 51b1af4 into master Apr 27, 2016
@jwilder jwilder deleted the jw-delete-series branch April 27, 2016 19:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

HELP: influxdb cannot start back up - 'panic: runtime error: slice bounds out of range'
4 participants