Custom Retention #3642
Conversation
I have added some nits and suggestions. Overall the code looks good to me.
```go
	c.sweeper.Stop()
	wg.Done()
}()
c.sweeper.Start()
```
I think we need some way to lock a table while the mark operation is running on it, so we don't run compaction on it at the same time and end up with deleted index entries re-appearing.
Currently I don't touch tables that are not compacted.
```go
}

if len(objects) != 1 {
	// todo(1): in the future we would want to support more tables so that we can apply retention below 1d.
```
I think it would be hard to find a window in a large cluster where there is just one file. I think we should run the mark phase immediately after compaction, which also avoids conflicting updates between compaction and the retention process, since they run in independent goroutines. We run compaction in our ops cluster every 5 minutes, which increases the chance of conflicts.
This comment is just food for thought. But I thought that maybe in the future we would want to apply this where compaction is not happening.
I would still suggest changing this now, for a couple of reasons:
- One less way to modify the index.
- We compact a table only when it has 4 or more files, which means retention won't run if we stop getting writes while the table has 1 < count(files) < 4.
- With a large cluster we run 30+ ingesters, and a file could get uploaded as soon as compaction ran, which means it would be hard to find a window where there is only one file in an active table.

We can just add a map of tableName->mutex, and both the compaction and retention code will have to acquire the lock to avoid a race. What do you think?
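A minimal sketch of that per-table lock idea, assuming a hypothetical `tableLocks` helper (none of these names exist in the PR):

```go
package compactor

import "sync"

// tableLocks hands out one mutex per table name so that compaction and the
// retention marker never touch the same table concurrently. Illustrative only.
type tableLocks struct {
	mtx   sync.Mutex
	locks map[string]*sync.Mutex
}

func newTableLocks() *tableLocks {
	return &tableLocks{locks: map[string]*sync.Mutex{}}
}

// forTable returns the mutex for the given table, creating it on first use.
func (t *tableLocks) forTable(name string) *sync.Mutex {
	t.mtx.Lock()
	defer t.mtx.Unlock()
	l, ok := t.locks[name]
	if !ok {
		l = &sync.Mutex{}
		t.locks[name] = l
	}
	return l
}
```

Both the compaction loop and the retention marker would acquire the table's mutex before touching its index, which serializes the two code paths per table without blocking work on unrelated tables.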
Awesome work Cyril! It looks good to me!
```go
retentionWorkDir := filepath.Join(cfg.WorkingDirectory, "retention")

sweeper, err := retention.NewSweeper(retentionWorkDir, retention.NewDeleteClient(objectClient), cfg.RetentionDeleteWorkCount, cfg.RetentionDeleteDelay, r)
```
Let's add a comment that we assume chunks are always stored in the same object store as the index files? People who are migrating between stores might face an issue here.
As per our Slack discussion, I'll create a follow-up PR to trigger compaction if required.
This PR adds custom retention. You can set a retention per stream or a global one for each user. The retention is currently applied by the compactor using the limits/overrides. This work is currently targeted only at the boltdb-shipper store.
The design doc: https://docs.google.com/document/d/1xyDNai5MZI2DWnoFm8vrSnS372fx0HohDiULbXkJosQ/edit?usp=sharing
is a bit obsolete but still gives a general idea.
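Roughly, the marker needs a per-tenant lookup over the limits/overrides like the one sketched below; the `Limits` interface, `StreamRetention` struct, and `retentionFor` helper are illustrative assumptions, not the exact types in this PR:

```go
package retention

import (
	"time"

	"github.com/prometheus/prometheus/pkg/labels"
)

// StreamRetention pairs a set of label matchers with a retention period.
// Hypothetical shape, for illustration only.
type StreamRetention struct {
	Matchers []*labels.Matcher
	Period   time.Duration
}

// Limits is what the marker would need from the limits/overrides: a global
// per-tenant retention plus optional per-stream rules.
type Limits interface {
	RetentionPeriod(userID string) time.Duration
	StreamRetention(userID string) []StreamRetention
}

// retentionFor resolves the retention for one stream: the first per-stream
// rule whose matchers all match wins, otherwise the tenant-wide default applies.
func retentionFor(limits Limits, userID string, lbls labels.Labels) time.Duration {
	for _, rule := range limits.StreamRetention(userID) {
		matches := true
		for _, m := range rule.Matchers {
			if !m.Matches(lbls.Get(m.Name)) {
				matches = false
				break
			}
		}
		if matches {
			return rule.Period
		}
	}
	return limits.RetentionPeriod(userID)
}
```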
The compactor now has two new components: a Marker and a Sweeper.
Marker
The marker goes through all tables one by one and marks expired chunks for deletion. Only tables that are already compacted are considered for now (for simplicity).
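A hedged sketch of the mark phase for a single already-compacted table, using an illustrative `ChunkEntry` shape and a retention lookup like the one sketched above (none of these names are taken from the PR):

```go
package retention

import (
	"time"

	"github.com/prometheus/prometheus/pkg/labels"
)

// ChunkEntry is the minimal view of an index entry the marker needs.
// Hypothetical shape, for illustration only.
type ChunkEntry struct {
	UserID  string
	Labels  labels.Labels
	ChunkID string
	Through time.Time // end time of the chunk
}

// markTable walks every chunk entry of one already-compacted table and returns
// the IDs of chunks whose retention has expired. The caller would then drop
// those entries from the boltdb index and append the IDs to a marks file for
// the sweeper. retentionFor is a lookup like the one sketched earlier.
func markTable(entries []ChunkEntry, retentionFor func(userID string, lbls labels.Labels) time.Duration, now time.Time) []string {
	var expired []string
	for _, e := range entries {
		if now.Sub(e.Through) > retentionFor(e.UserID, e.Labels) {
			expired = append(expired, e.ChunkID)
		}
	}
	return expired
}
```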
Sweeper
The sweeper reads all available marks, takes only those that have reached a minimum age (currently defaulting to two hours), and then deletes the chunks. Once a chunk is deleted, the marks file is updated. In case of failure the mark will be retried until it works (404s are not failures).
The min age is there to give all Loki components time to download the new indexes, so that we don't reference a chunk that no longer exists.
The retention as seen by the user should still be pretty precise, i.e. time to sync + time to process all tables.
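A minimal sketch of that sweep loop, assuming hypothetical `DeleteClient` and `Mark` types; the actual PR wires this to the object store delete client and the on-disk marks files:

```go
package retention

import (
	"context"
	"errors"
	"time"
)

// ErrObjectNotFound stands in for a 404 from the object store.
var ErrObjectNotFound = errors.New("object not found")

// DeleteClient deletes a single chunk object; hypothetical interface for this sketch.
type DeleteClient interface {
	DeleteObject(ctx context.Context, chunkID string) error
}

// Mark is one entry read from a marks file: a chunk to delete and when it was marked.
type Mark struct {
	ChunkID  string
	MarkedAt time.Time
}

// sweep deletes every chunk whose mark is older than minAge. A 404 counts as
// success; any other failure keeps the mark so it is retried on the next pass.
// It returns the marks that are still pending, which the caller would write
// back to the marks file.
func sweep(ctx context.Context, marks []Mark, client DeleteClient, minAge time.Duration, now time.Time) []Mark {
	var remaining []Mark
	for _, m := range marks {
		if now.Sub(m.MarkedAt) < minAge {
			remaining = append(remaining, m) // too young: give components time to sync the new index
			continue
		}
		if err := client.DeleteObject(ctx, m.ChunkID); err != nil && !errors.Is(err, ErrObjectNotFound) {
			remaining = append(remaining, m) // real failure: retry on a later pass
		}
	}
	return remaining
}
```

Treating a 404 as success makes the deletes idempotent, so a mark can safely be retried until the object store confirms the chunk is gone.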
Done:
TODO:
Follow-up work (other PRs):