Improved compaction scheduling #8886

Merged
jwilder merged 16 commits into master from jw-planner on Oct 4, 2017

Conversation


@jwilder jwilder commented Sep 27, 2017

Required for all non-trivial PRs
  • Rebased/mergable
  • Tests pass
  • CHANGELOG.md updated
  • Sign CLA (if not already signed)

This fixes some regressions and new issues introduced by the recent changes in #8872 and #8856.

  • Fix long process stalls #8872 fixed some long process stalls due to major page faults, but it also caused a significant reduction in throughput for lower cardinalities. This PR reverts the use of the O_SYNC flag for the WAL for now.
  • Snapshot compaction improvements #8856 increased the throughput of snapshot compactions, which keeps memory usage lower at higher cardinalities, but it also increases the number of level 1 TSM files more quickly. This exposed an issue with the current compaction planning process, where planning for a level could get blocked on a slower compaction within that level. For example, a level 3 compaction might plan for 4 groups to be compacted concurrently; if 3 completed quickly, the level 3 planner would still wait until the last compaction finished. Since lower levels are written more quickly, a large backlog could accumulate. This PR changes the planning to better handle these situations and make use of all the cores allocated for compactions (limited by max-concurrent-compactions); see the sketch after this description.

There are also two bug fixes: the temp TSM index file could be left around when running out of disk space, and the cache mem bytes stat now includes the size of the series key.
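To make the scheduling change above concrete, here is a minimal, self-contained sketch of the idea, not the actual engine code: the type and function names (`group`, `planLevel`, `compact`) are hypothetical, and only the max-concurrent-compactions limit comes from the PR. A single planning pass walks every level and starts as many groups as there are free slots, so a slow group in one level no longer blocks planning for the rest.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// group is a hypothetical stand-in for a planned compaction group.
type group struct{ level, id int }

// planLevel stands in for the per-level planner and returns the groups that
// are ready to be compacted for that level.
func planLevel(level int) []group {
	return []group{{level: level, id: 0}, {level: level, id: 1}}
}

// compact stands in for running a single compaction group to completion.
func compact(g group) {
	time.Sleep(50 * time.Millisecond)
	fmt.Printf("compacted level %d group %d\n", g.level, g.id)
}

func main() {
	const maxConcurrent = 4                     // analogous to max-concurrent-compactions
	slots := make(chan struct{}, maxConcurrent) // free compaction slots
	var wg sync.WaitGroup

	// A single planning goroutine (here, main) walks every level and starts
	// as many groups as there are free slots. A slow group in one level no
	// longer blocks planning for the remaining levels.
	for pass := 0; pass < 3; pass++ {
		for level := 1; level <= 4; level++ {
			for _, g := range planLevel(level) {
				select {
				case slots <- struct{}{}: // acquire a slot without blocking
					wg.Add(1)
					go func(g group) {
						defer wg.Done()
						defer func() { <-slots }() // release the slot when done
						compact(g)
					}(g)
				default:
					// No capacity right now; this group is picked up again
					// on the next planning pass instead of blocking here.
				}
			}
		}
		time.Sleep(100 * time.Millisecond) // planning interval
	}
	wg.Wait()
}
```

Groups that don't fit are simply re-planned on the next pass, which is what keeps the lower levels from piling up behind a single slow level 3 or 4 compaction.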

@e-dard e-dard left a comment

Not necessarily changes, but just a couple of queries about closing/removing files.

return nil
}

if err := d.fd.Close(); err != nil {
Contributor

So if there is a problem closing the file we're not going to clean up, but we're not going to return an error to the caller either? This scenario seems unusual enough that we don't want to ignore it or not log it.

Contributor Author

That should be returning the err here. Will fix.
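For context, a minimal sketch of the pattern being agreed on here; `closeAndRemoveTemp` is a hypothetical helper, not the actual directIndex code, and only the general shape (close the handle, then clean up the temp file) comes from the quoted diff. The point is to propagate a failed Close to the caller rather than silently skipping cleanup.

```go
package main

import "os"

// closeAndRemoveTemp is a hypothetical helper, not the actual directIndex
// code: it closes the temp index file and removes it only if the close
// succeeded, returning any error so the caller can see cleanup did not happen.
func closeAndRemoveTemp(fd *os.File, path string) error {
	if fd == nil {
		return nil
	}
	if err := fd.Close(); err != nil {
		// Propagate the error instead of silently skipping cleanup.
		return err
	}
	return os.Remove(path)
}

func main() {
	fd, err := os.CreateTemp("", "index-*.tmp")
	if err != nil {
		panic(err)
	}
	if err := closeAndRemoveTemp(fd, fd.Name()); err != nil {
		panic(err)
	}
}
```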

}

if f, ok := t.wrapped.(*os.File); ok {
f.Close()
Contributor

In this case we're going to ignore a Close error (fine) but then still try and remove the file. This seems at odds with how the directIndex is handled.

Contributor Author

I'll fix this up too.
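In the same spirit, a hedged sketch of one way to make this path consistent with the directIndex handling; `writerShim` and `removeFile` are illustrative stand-ins (only `t.wrapped` holding an io.Writer that may be an *os.File comes from the quoted diff), and this is not necessarily the fix that landed. It still removes the file, but it surfaces a failed Close instead of discarding it.

```go
package main

import (
	"io"
	"os"
)

// writerShim is an illustrative stand-in for a writer wrapping an io.Writer
// that may or may not be an *os.File (mirrors the t.wrapped shape above).
type writerShim struct {
	wrapped io.Writer
	path    string
}

// removeFile still removes the file, but surfaces a failed Close instead of
// discarding it, so this path behaves like the directIndex handling. This is
// an illustration of the pattern, not necessarily the fix that landed.
func (t *writerShim) removeFile() error {
	var closeErr error
	if f, ok := t.wrapped.(*os.File); ok {
		closeErr = f.Close()
	}
	if err := os.Remove(t.path); err != nil {
		return err
	}
	return closeErr
}

func main() {
	f, err := os.CreateTemp("", "tsm-*.tmp")
	if err != nil {
		panic(err)
	}
	w := &writerShim{wrapped: f, path: f.Name()}
	if err := w.removeFile(); err != nil {
		panic(err)
	}
}
```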

@jwilder jwilder changed the title Fix regressions Fix regressions and improve compaction scheduling Oct 3, 2017
@jwilder jwilder changed the title Fix regressions and improve compaction scheduling Improved compaction scheduling Oct 3, 2017
Doing this for the WAL reduces throughput quite a bit.

This changes the compaction scheduling to better utilize the available cores that are free. Previously, a level was planned in its own goroutine and would kick off a number of compaction groups. The problem with this model was that if there were 4 groups, and 3 completed quickly, the planning would be blocked for that level until the last group finished. If the compactions at the prior level are running more quickly, a large backlog could accumulate.

This now moves the planning to a single goroutine that plans each level in succession and starts as many groups as it can. When one group finishes, the planning will start the next group for the level.

Shows up under high cardinality compactions.

This switches the thresholds that are used for writing snapshots concurrently. This scales better than the prior model.

The chunked slice is unnecessary and we can re-use k.blocks throughout the compaction.

This gives an indication as to whether compactions are backed up or not.

With higher cardinality or larger series keys, the files can roll over early, which causes them to take longer to be compacted by higher levels. This causes larger disk usage and higher numbers of TSM files at times.

One shard might be able to run a compaction, but could fail due to limits being hit. This loop would continue indefinitely as the same task would continue to be rescheduled.

This check doesn't make sense for high cardinality data, as the files typically get big and sparse very quickly. This causes a lot of extra disk space to be used, which is taken up by large indexes and sparse data.

If there is a backlog of level 3 and 4 compactions, but few level 1 and 2 compactions, allow them to use some excess capacity (see the sketch after these commit notes).

Some files seem to get orphaned behind higher levels. This causes the compactions to get blocked, as the lower level files will not get picked up by their lower level planners. This allows the full plan to identify them and pull them into their plans.
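On the excess-capacity note above, here is an illustrative policy sketch, not the scheduler that actually shipped in this PR; `freeSlotsFor` and the equal per-level shares are assumptions. Each level gets a base share of max-concurrent-compactions, and a backlogged level may borrow whatever share idle levels are not using, without ever exceeding the global limit.

```go
package main

import "fmt"

// freeSlotsFor is an illustrative policy, not the actual scheduler in the PR:
// each level gets an equal base share of the global limit, and any share left
// unused by idle levels can be borrowed by levels with a backlog.
func freeSlotsFor(level, maxConcurrent int, running, queued map[int]int) int {
	levels := []int{1, 2, 3, 4}
	base := maxConcurrent / len(levels)
	if base < 1 {
		base = 1
	}

	// Capacity left unused by the other levels that have little or no work.
	spare := 0
	for _, l := range levels {
		if l == level {
			continue
		}
		if idle := base - running[l] - queued[l]; idle > 0 {
			spare += idle
		}
	}

	allowed := base + spare - running[level]
	if allowed < 0 {
		allowed = 0
	}

	// Never exceed the global max-concurrent-compactions limit.
	totalRunning := 0
	for _, l := range levels {
		totalRunning += running[l]
	}
	if remaining := maxConcurrent - totalRunning; allowed > remaining {
		allowed = remaining
	}
	return allowed
}

func main() {
	running := map[int]int{1: 0, 2: 0, 3: 1, 4: 1}
	queued := map[int]int{1: 0, 2: 0, 3: 5, 4: 3} // backlog at levels 3 and 4
	fmt.Println("level 3 may start:", freeSlotsFor(3, 8, running, queued))
}
```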
@jwilder jwilder merged commit 7e2b22f into master Oct 4, 2017
@ghost ghost removed the review label Oct 4, 2017
@jwilder jwilder deleted the jw-planner branch October 4, 2017 15:04