Custom Retention #3642

Merged (43 commits) on Apr 28, 2021

Conversation

cyriltovena (Contributor) commented Apr 22, 2021

This PR adds custom retention. You can set a retention period per stream or a global one for each user (tenant). The retention is applied by the compactor using the limits/overrides. This work currently targets only the boltdb-shipper store.

The design doc: https://docs.google.com/document/d/1xyDNai5MZI2DWnoFm8vrSnS372fx0HohDiULbXkJosQ/edit?usp=sharing
is a bit obsolete but still gives a general idea.
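To make the limits/overrides idea above concrete, here is a minimal Go sketch of how a per-stream retention rule could be resolved against a tenant's global retention period. Everything here (StreamRetention, Limits, retentionPeriodFor, the map-based matcher, the priority tie-break) is an illustrative assumption, not the PR's actual API.

```go
package retention

import "time"

// StreamRetention is an illustrative per-stream override: streams for which
// Matches returns true get Period; Priority breaks ties when several rules
// match (an assumption for this sketch).
type StreamRetention struct {
	Priority int
	Period   time.Duration
	Matches  func(lbs map[string]string) bool // stands in for a label selector
}

// Limits is a minimal stand-in for the per-tenant limits/overrides.
type Limits struct {
	RetentionPeriod  time.Duration     // global retention for the tenant
	StreamRetentions []StreamRetention // per-stream overrides
}

// retentionPeriodFor returns the retention to apply to a stream: the
// highest-priority matching per-stream rule wins, otherwise the tenant's
// global period applies.
func retentionPeriodFor(limits Limits, lbs map[string]string) time.Duration {
	var (
		best  StreamRetention
		found bool
	)
	for _, r := range limits.StreamRetentions {
		if r.Matches(lbs) && (!found || r.Priority > best.Priority) {
			best, found = r, true
		}
	}
	if found {
		return best.Period
	}
	return limits.RetentionPeriod
}
```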

The compactor now has two new components: a Marker and a Sweeper.

Marker

The marker goes through all tables one by one:

  • It processes the whole index once to gather information about series, i.e. labels/metrics.
  • Then it creates marks (i.e. files) listing chunks to be deleted according to the defined retention rules.
  • It also deletes tables that are empty and cleans up series that no longer exist.
  • Finally, it updates the index in the object store.

Only tables that have already been compacted are considered for now (for simplicity).
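A rough sketch of that per-table pass, assuming hypothetical Table, ExpirationChecker, and MarkWriter abstractions (the actual PR defines its own index, rule, and mark-file types):

```go
package retention

import "time"

// ChunkRef is a minimal, illustrative reference to a chunk found in the index.
type ChunkRef struct {
	UserID  string
	ChunkID string
	Labels  map[string]string
	Through time.Time
}

// ExpirationChecker decides, per chunk, whether retention has expired,
// based on the per-tenant / per-stream limits.
type ExpirationChecker interface {
	Expired(ref ChunkRef, now time.Time) bool
}

// Table is a minimal view of a compacted boltdb-shipper index table.
type Table interface {
	// ForEachChunk visits every chunk entry; returning deleteChunk=true
	// removes the entry from the index being rewritten.
	ForEachChunk(fn func(ref ChunkRef) (deleteChunk bool, err error)) error
	Empty() (bool, error)
	Upload() error // re-upload the rewritten index to object storage
	Delete() error // remove the table from object storage when empty
}

// MarkWriter appends chunk IDs to the on-disk mark file (illustrative only).
type MarkWriter struct{ ids []string }

func (m *MarkWriter) Put(chunkID string) error {
	m.ids = append(m.ids, chunkID)
	return nil
}

// MarkForDelete walks a single compacted table, drops expired chunk entries
// from the index, records the chunk IDs in a mark file for the sweeper, and
// either deletes the now-empty table or uploads the updated index.
func MarkForDelete(table Table, expiration ExpirationChecker, marks *MarkWriter, now time.Time) error {
	if err := table.ForEachChunk(func(ref ChunkRef) (bool, error) {
		if !expiration.Expired(ref, now) {
			return false, nil // keep the index entry and the chunk
		}
		// Record the chunk in the mark file; the sweeper deletes it later.
		if err := marks.Put(ref.ChunkID); err != nil {
			return false, err
		}
		return true, nil // drop the index entry now
	}); err != nil {
		return err
	}

	empty, err := table.Empty()
	if err != nil {
		return err
	}
	if empty {
		return table.Delete()
	}
	return table.Upload()
}
```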

Sweeper

The sweeper reads all available marks, takes only those older than a minimum age (currently defaulting to two hours), and then deletes the corresponding chunks. Once a chunk is deleted, the mark file is updated. In case of failure, the mark is retried until the deletion succeeds (a 404 is not a failure).

The minimum age gives all Loki components time to download the new indexes, so that we don't reference a chunk that no longer exists.

Retention as seen by the user should still be fairly precise, i.e. time to sync plus time to process all tables.
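A minimal sketch of that sweep loop, assuming a hypothetical ChunkClient and Mark type; the minimum-age check, 404-as-success, and retry-until-it-works behaviour described above are modelled by keeping a mark whenever deletion fails:

```go
package retention

import (
	"context"
	"errors"
	"time"
)

// ErrChunkNotFound stands in for a 404 from the object store.
var ErrChunkNotFound = errors.New("chunk not found")

// ChunkClient is a minimal deletion client; the real PR wraps the store's
// object client, so this name is illustrative.
type ChunkClient interface {
	DeleteChunk(ctx context.Context, chunkID string) error
}

// Mark is one entry of a mark file written by the marker.
type Mark struct {
	ChunkID   string
	CreatedAt time.Time
}

// sweep deletes chunks for marks older than minAge. A 404 is treated as
// success; any other error leaves the mark in place so the deletion is
// retried on the next pass. It returns the marks that remain to be processed.
func sweep(ctx context.Context, client ChunkClient, marks []Mark, minAge time.Duration, now time.Time) []Mark {
	remaining := make([]Mark, 0, len(marks))
	for _, m := range marks {
		if now.Sub(m.CreatedAt) < minAge {
			// Too recent: other components may still reference this chunk
			// through an index they have not re-downloaded yet.
			remaining = append(remaining, m)
			continue
		}
		err := client.DeleteChunk(ctx, m.ChunkID)
		if err != nil && !errors.Is(err, ErrChunkNotFound) {
			// Keep the mark so the deletion is retried until it works.
			remaining = append(remaining, m)
			continue
		}
		// Deleted (or already gone): the mark file would be updated here.
	}
	return remaining
}
```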

Done:

  • unit tests
  • benchmark
  • metrics instrumentation

TODO:

  • testing in a dev environment.

Follow-up work (other PRs):

  • mixin and dashboard.
  • documentation
  • more code documentation.

CLAassistant commented Apr 22, 2021: All committers have signed the CLA.

sandeepsukhani (Contributor) left a comment:

I have added some nits and suggestions. Overall the code looks good to me.

c.sweeper.Stop()
wg.Done()
}()
c.sweeper.Start()
sandeepsukhani (Contributor):

I think we need some way to lock a table while the mark operation is running on it, to avoid running compaction on it simultaneously and hence prevent deleted index entries from re-appearing.

cyriltovena (Author):

Currently I don't touch tables that are not compacted.

}

if len(objects) != 1 {
// todo(1): in the future we would want to support more tables so that we can apply retention below 1d.
sandeepsukhani (Contributor):

I think it would be hard to find a window in a large cluster where there is just one file. I think we should run the mark phase immediately after compaction, which also avoids conflicting updates between the compaction and retention processes: they run in independent goroutines, and we run compaction in our ops cluster every 5 minutes, which increases the chance of conflicts.

cyriltovena (Author):

This comment is just food for thought.

But I thought that maybe in the future we'd want to apply retention where compaction is not happening.

sandeepsukhani (Contributor):

I would still suggest changing this now for a couple of reasons:

  1. It is one less way to modify the index.
  2. We compact a table only when it has 4 or more files, which means retention won't run if writes stop while the table has more than one but fewer than four files.
  3. In a large cluster we run 30+ ingesters, and a file could get uploaded as soon as compaction ran, so it would be hard to find a window where an active table has only one file.

We can just add a map of tableName -> mutex, and both the compaction and retention code will have to acquire the lock to avoid a race. What do you think?
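A minimal sketch of the suggested tableName -> mutex map (illustrative names only, not the merged code):

```go
package compactor

import "sync"

// tableLocker hands out one mutex per table name so that compaction and
// the retention mark phase never rewrite the same table's index concurrently.
type tableLocker struct {
	mtx   sync.Mutex
	locks map[string]*sync.Mutex
}

func newTableLocker() *tableLocker {
	return &tableLocker{locks: map[string]*sync.Mutex{}}
}

// lockTable returns the mutex for tableName, creating it on first use.
func (t *tableLocker) lockTable(tableName string) *sync.Mutex {
	t.mtx.Lock()
	defer t.mtx.Unlock()
	if _, ok := t.locks[tableName]; !ok {
		t.locks[tableName] = &sync.Mutex{}
	}
	return t.locks[tableName]
}
```

Both the compaction path and the retention mark path would call lockTable(tableName).Lock() before touching a table's index and unlock when done.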

sandeepsukhani (Contributor) left a comment:

Awesome work Cyril! It looks good to me!


retentionWorkDir := filepath.Join(cfg.WorkingDirectory, "retention")

sweeper, err := retention.NewSweeper(retentionWorkDir, retention.NewDeleteClient(objectClient), cfg.RetentionDeleteWorkCount, cfg.RetentionDeleteDelay, r)
sandeepsukhani (Contributor):

Let's add a comment that we assume chunks are always stored in the same object store as the index files? People who are migrating between stores might run into an issue here.

cyriltovena (Author):

As per our Slack discussion, I'll create a follow-up PR to trigger compaction if required.

cyriltovena merged commit 806d6a5 into grafana:main on Apr 28, 2021.