DROP SERIES is terribly inefficient #592

Closed
spantaleev opened this issue May 29, 2014 · 5 comments

spantaleev commented May 29, 2014

Dropping many (small) time series results in many compaction operations, which cause a lot of I/O.

My last tests showed that dropping 24 series (a tiny part of my whole 550 MB database) takes about an hour to finish and results in 33 GB of I/O.

Does it really need to run a full compaction on every DROP SERIES query? Maybe it should do it just once per batch of drops?

Looking at the code, it also seems to run the compaction over the full key range, which may not be ideal, but I don't know enough about LevelDB, or how InfluxDB uses it, to comment on that.

From what I last heard, @jvshahid is working on improving the situation. This is just a place to track the issue.
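
To make the pattern concrete, here is roughly what "compact on every drop" looks like, sketched with goleveldb (InfluxDB itself uses the C LevelDB bindings, so this is an approximation of the pattern, not the actual code):

```go
package example

import (
	"github.com/syndtr/goleveldb/leveldb"
	"github.com/syndtr/goleveldb/leveldb/util"
)

// dropSeries deletes every key carrying the series' prefix and then
// compacts. The costly part is the CompactRange over the *entire*
// keyspace: dropping 24 series this way means 24 full compactions.
func dropSeries(db *leveldb.DB, seriesPrefix []byte) error {
	iter := db.NewIterator(util.BytesPrefix(seriesPrefix), nil)
	defer iter.Release()

	batch := new(leveldb.Batch)
	for iter.Next() {
		// Iterator keys are reused between steps, so copy before batching.
		batch.Delete(append([]byte(nil), iter.Key()...))
	}
	if err := db.Write(batch, nil); err != nil {
		return err
	}
	// A zero util.Range means "compact everything", not just the keys
	// that were deleted above.
	return db.CompactRange(util.Range{})
}
```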

pauldix (Member) commented May 30, 2014

I suppose we could remove the compactions from delete and drop completely and just have a separate command to force a compaction. Then you could do whatever you want and run the compactions in one go. Maybe that'll work better?
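
Roughly this kind of split, as a sketch (the `Shard` type and method names here are hypothetical, not our actual code, with goleveldb again standing in for the real bindings):

```go
package example

import (
	"github.com/syndtr/goleveldb/leveldb"
	"github.com/syndtr/goleveldb/leveldb/util"
)

// Shard is a hypothetical stand-in for wherever the LevelDB handle lives.
type Shard struct {
	db *leveldb.DB
}

// DropSeries deletes the series' keys but never compacts.
func (s *Shard) DropSeries(prefix []byte) error {
	iter := s.db.NewIterator(util.BytesPrefix(prefix), nil)
	defer iter.Release()
	batch := new(leveldb.Batch)
	for iter.Next() {
		batch.Delete(append([]byte(nil), iter.Key()...))
	}
	return s.db.Write(batch, nil)
}

// Compact backs the separate "force a compaction" command: run all your
// drops and deletes first, then call this once.
func (s *Shard) Compact() error {
	return s.db.CompactRange(util.Range{})
}
```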

freeeve commented May 30, 2014

Does compaction run at any other time, or would it be entirely manual? (At startup, for example?)

spantaleev (Author) commented May 30, 2014

If compaction is skipped, would the rest of the "drop series" work reclaim much space at all?

Running compaction manually sounds like a good workaround, but it would be better if InfluxDB users didn't have to worry about details of the underlying storage engine that they don't (need to) understand.

Ideally, InfluxDB could put off compaction for a while (rather than doing it on every "drop series" call) and estimate a good time to run it, probably based on the number of points deleted since the last compaction. It still shouldn't run too often: deleting many large series within a short span of time should trigger it only once.

pauldix (Member) commented May 31, 2014

Yeah, that's a good idea. We could just trigger a compaction some amount of time after a drop or delete occurs, with successive drops and deletes pushing the timer out a little further.
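
As a rough sketch of what I mean (the type, field names, and delay are made up for illustration, not actual InfluxDB code):

```go
package example

import (
	"sync"
	"time"
)

// compactor debounces compactions: every drop or delete pushes the pending
// compaction further into the future, so a burst of drops ends in a single
// compaction instead of one per statement.
type compactor struct {
	mu    sync.Mutex
	timer *time.Timer
	delay time.Duration // quiet period to wait after the last drop/delete
	run   func()        // performs the actual compaction
}

// noteDropOrDelete is called after each DROP SERIES or DELETE.
func (c *compactor) noteDropOrDelete() {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.timer == nil {
		c.timer = time.AfterFunc(c.delay, c.run)
		return
	}
	// Successive drops just push the deadline out again.
	c.timer.Reset(c.delay)
}
```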


pauldix (Member) commented Jul 14, 2014

Marking this one as closed for now. In v0.8.0, drop series is fast from the end user's perspective: it simply removes the series metadata and returns a response, then deletes the series data in the background without calling any compactions.

If it's series data that you're regularly going to be dropping, consider using the shard spaces retention policy feature: http://influxdb.com/docs/v0.8/advanced_topics/sharding_and_storage.html
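
For anyone curious, the new flow is shaped roughly like this. It's an illustrative reconstruction using goleveldb, not the actual implementation, and `removeMeta` stands in for the metadata update:

```go
package example

import (
	"log"

	"github.com/syndtr/goleveldb/leveldb"
	"github.com/syndtr/goleveldb/leveldb/util"
)

// dropSeriesAsync removes the series metadata synchronously, acknowledges
// the client, and deletes the point data in the background, without ever
// requesting a compaction; LevelDB's own background compactions reclaim
// the space over time.
func dropSeriesAsync(db *leveldb.DB, removeMeta func() error, prefix []byte) error {
	if err := removeMeta(); err != nil { // fast: metadata only
		return err
	}
	go func() {
		iter := db.NewIterator(util.BytesPrefix(prefix), nil)
		defer iter.Release()
		batch := new(leveldb.Batch)
		for iter.Next() {
			batch.Delete(append([]byte(nil), iter.Key()...))
		}
		if err := db.Write(batch, nil); err != nil {
			log.Printf("background series drop failed: %v", err)
		}
	}()
	return nil // the user sees an immediate response
}
```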

pauldix closed this as completed Jul 14, 2014