Exemplar resize #8974

Merged
merged 12 commits into from
Jul 20, 2021

mdisibio
Contributor

This PR adds a Resize method to CircularExemplarStorage. The need for this originally stems from Cortex, to support dynamically adjusting exemplar storage at runtime, but it is submitted here because it aligns with the Prometheus direction of hot reload for some configuration. A future PR is expected to complete this functionality and make use of this Resize method. It seemed better to submit this separately, but the work can be combined if preferred.

Storage is resized by allocating a new circular buffer and index and replaying data into it, which rebuilds the index and the links between entries. There is some optimization to reduce allocations and to replay as little as needed (i.e. when shrinking the buffer). Other approaches were considered, but they lacked correctness or were less straightforward. For example, when growing the ring it would be very efficient to simply leave entries in place, which preserves all indexes but loses the guarantee that the oldest exemplars are overwritten first. Instead of the current implementation of absolute indexing between exemplars, we could use relative indexing or direct pointers, but this increases the complexity of adding or selecting.
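
To make the replay approach concrete, here is a minimal sketch, not the code from this PR. The `ring`, `entry`, and `indexEntry` types and all of their fields are hypothetical simplifications. It assumes absolute `next` links between entries of the same series and a per-series index of oldest and newest positions. Resize allocates a fresh buffer and index, then replays only the newest entries that fit, oldest first, so that the oldest exemplars are still the first to be overwritten afterwards. Negative sizes are treated as zero, which is also an assumption.

```go
package main

import "fmt"

// entry is one ring slot. next is the absolute position of the next-newer
// entry for the same series, or -1 if this is the newest one.
type entry struct {
	series string
	value  float64
	next   int
	used   bool
}

// indexEntry records where a series' oldest and newest entries live.
type indexEntry struct{ oldest, newest int }

type ring struct {
	buf   []entry
	index map[string]*indexEntry
	pos   int // slot the next Add writes to
}

func newRing(n int) *ring {
	if n < 0 {
		n = 0
	}
	return &ring{buf: make([]entry, n), index: map[string]*indexEntry{}}
}

// Add overwrites the globally oldest slot and relinks the per-series chain.
func (r *ring) Add(series string, v float64) {
	if len(r.buf) == 0 {
		return // zero-sized storage drops everything
	}
	if old := r.buf[r.pos]; old.used {
		// The evicted slot is always the oldest entry of its series.
		if old.next == -1 {
			delete(r.index, old.series)
		} else {
			r.index[old.series].oldest = old.next
		}
	}
	r.buf[r.pos] = entry{series: series, value: v, next: -1, used: true}
	if ie, ok := r.index[series]; ok {
		r.buf[ie.newest].next = r.pos
		ie.newest = r.pos
	} else {
		r.index[series] = &indexEntry{oldest: r.pos, newest: r.pos}
	}
	r.pos = (r.pos + 1) % len(r.buf)
}

// Resize swaps in a new buffer and index, replays the newest entries that
// fit (oldest first), and returns how many entries were migrated.
func (r *ring) Resize(n int) int {
	if n < 0 {
		n = 0
	}
	old, oldPos := r.buf, r.pos
	r.buf, r.index, r.pos = make([]entry, n), map[string]*indexEntry{}, 0
	if n == 0 || len(old) == 0 {
		return 0
	}
	count := len(old)
	if n < count {
		count = n // shrinking: replay only what will survive
	}
	start := (oldPos - count + len(old)) % len(old)
	migrated := 0
	for i := 0; i < count; i++ {
		if e := old[(start+i)%len(old)]; e.used {
			r.Add(e.series, e.value)
			migrated++
		}
	}
	return migrated
}

// Select walks one series' chain from oldest to newest.
func (r *ring) Select(series string) []float64 {
	ie, ok := r.index[series]
	if !ok {
		return nil
	}
	var out []float64
	for i := ie.oldest; i != -1; i = r.buf[i].next {
		out = append(out, r.buf[i].value)
	}
	return out
}

func main() {
	r := newRing(3)
	for i := 1; i <= 4; i++ {
		r.Add("a", float64(i))
	}
	r.Resize(5)
	r.Add("a", 5)
	fmt.Println(r.Select("a"))
	r.Resize(2)
	fmt.Println(r.Select("a"))
}
```

Because replay goes through the same Add path as normal ingestion, the index and links never need to be patched by hand, at the cost of touching every surviving entry once.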

Benchmark:

$ go test -run=ResizeExemplar -bench=ResizeExemplar
goos: darwin
goarch: amd64
pkg: github.com/prometheus/prometheus/tsdb
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
BenchmarkResizeExemplars/grow-12         	 1726984	       619.9 ns/op
BenchmarkResizeExemplars/shrink-12       	 4756222	       358.8 ns/op
PASS
ok  	github.com/prometheus/prometheus/tsdb	15.775s

Signed-off-by: Martin Disibio <mdisibio@gmail.com>
Signed-off-by: Martin Disibio <mdisibio@gmail.com>
Signed-off-by: Martin Disibio <mdisibio@gmail.com>
Signed-off-by: Martin Disibio <mdisibio@gmail.com>
Signed-off-by: Martin Disibio <mdisibio@gmail.com>
@mdisibio mdisibio requested a review from codesome as a code owner June 21, 2021 15:50
@cstyan
Member

cstyan commented Jun 22, 2021

I'll push the changes to make use of this via a config file option + /reload in the next day or so.

reloadable storage config.

Signed-off-by: Callum Styan <callumstyan@gmail.com>
@cstyan
Member

cstyan commented Jun 24, 2021

I've added a rough implementation of having a storage config section (tsdb/exemplars) in the config file so that resize can be used. Not entirely happy with how the passing of the config works given that TSDB starts up before the full config file is read.

Fixing tests is on my todo list for the morning.

@roidelapluie
Member

I've added a rough implementation of having a storage config section (tsdb/exemplars) in the config file so that resize can be used. Not entirely happy with how the passing of the config works given that TSDB starts up before the full config file is read.

Fixing tests is on my todo list for the morning.

The configuration file is read once before everything. Maybe we can reuse that somehow?

Signed-off-by: Callum Styan <callumstyan@gmail.com>
@cstyan
Member

cstyan commented Jun 29, 2021

Implemented Julien's and Ben's suggestions and tried to clean up the config reloading a bit by having ApplyConfig for db.go/head.go modify the opts structs, and then call the exemplar storage's Resize with the new value from the Head opts struct.

This also means the code can check the opts struct values and skip sending an exemplar to the exemplar storage if the storage itself is disabled or sized to -1 (used to resize the storage back to 0 and use a noop storage).

I'm still not happy with how the default storage size is set, but I don't fully understand the config file parsing yet and how we could inject DefaultExemplarConfig if the feature flag --enable-feature=exemplar-storage is set on the command line.
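
As a rough illustration of the opts-based gating described in the comment above, the following sketch shows one way an append path could check the options before handing an exemplar to storage. It is not the code from this PR: the `headOptions` type and its methods are hypothetical, only the field names `EnableExemplarStorage` and `MaxExemplars` come from the discussion, and treating every non-positive size as "disabled" is an assumption. The size is read and written atomically because a config reload may run concurrently with appends.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// headOptions is a hypothetical stand-in for the Head options struct.
// MaxExemplars comes first so it stays 64-bit aligned for atomic access.
type headOptions struct {
	MaxExemplars          int64 // read and written atomically
	EnableExemplarStorage bool  // fixed at startup in this sketch
}

// applyMaxExemplars is what a config reload would call with the new size.
func (o *headOptions) applyMaxExemplars(n int64) {
	atomic.StoreInt64(&o.MaxExemplars, n)
}

// shouldStoreExemplar reports whether the append path should pass an
// exemplar on to storage at all.
func (o *headOptions) shouldStoreExemplar() bool {
	return o.EnableExemplarStorage && atomic.LoadInt64(&o.MaxExemplars) > 0
}

func main() {
	opts := &headOptions{EnableExemplarStorage: true, MaxExemplars: 100}
	fmt.Println(opts.shouldStoreExemplar())

	opts.applyMaxExemplars(-1) // reload resizes storage back to nothing
	fmt.Println(opts.shouldStoreExemplar())
}
```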

@cstyan cstyan requested a review from roidelapluie June 30, 2021 05:35
Member

@codesome codesome left a comment

Some initial comments; I haven't looked at the resize function yet. Looks good in general, but I have some questions/concerns in the comments below.

I was wondering if resize could affect ingestion of samples with exemplars in the same commit since it takes a lock for resize, but with such low overhead of resize (as seen from benchmarks), I guess that is not an issue.

tsdb/head.go Outdated
if err != nil {
	return err
}
h.exemplars = e
Member

This will cause a race with appending, since nothing protects ApplyConfig/h.exemplars. I think we should get rid of noopExemplarStorage and just start with a 0 size exemplar storage in NewHead, without having to lock ApplyConfig.

Member

Good point. We'd still be allocating some memory with a 0 length buffer and empty map, but I think that's reasonable.

I'll think about this a little before making any changes here.

Member

The race still exists

Member

I'll deal with this one in the morning
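
The suggestion in this thread, starting with a zero-sized storage and resizing it in place instead of swapping the h.exemplars pointer, can be sketched as follows. This is a simplified illustration under stated assumptions, not the PR's implementation: the `storage` type is hypothetical, it holds plain floats, and its Resize discards existing data rather than replaying it. The point is only that Add and Resize serialize on the storage's own mutex, so the caller never reassigns a shared field while appenders are running.

```go
package main

import (
	"fmt"
	"sync"
)

// storage is a hypothetical stand-in for an exemplar store that always
// exists, possibly with zero capacity, so callers never swap the pointer.
type storage struct {
	mtx  sync.RWMutex
	buf  []float64
	next int
}

func newStorage(n int) *storage {
	if n < 0 {
		n = 0
	}
	return &storage{buf: make([]float64, n)}
}

// Add reports false when the storage currently has no capacity.
func (s *storage) Add(v float64) bool {
	s.mtx.Lock()
	defer s.mtx.Unlock()
	if len(s.buf) == 0 {
		return false
	}
	s.buf[s.next] = v
	s.next = (s.next + 1) % len(s.buf)
	return true
}

// Resize replaces the buffer under the same lock that Add takes.
// Unlike the PR, this sketch drops existing data instead of replaying it.
func (s *storage) Resize(n int) {
	if n < 0 {
		n = 0
	}
	s.mtx.Lock()
	defer s.mtx.Unlock()
	s.buf = make([]float64, n)
	s.next = 0
}

func (s *storage) Len() int {
	s.mtx.RLock()
	defer s.mtx.RUnlock()
	return len(s.buf)
}

func main() {
	s := newStorage(0) // disabled at startup, but never nil
	var wg sync.WaitGroup
	for w := 0; w < 4; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				s.Add(float64(i))
			}
		}()
	}
	for _, n := range []int{8, -1, 3} {
		s.Resize(n) // simulated config reloads while appends are in flight
	}
	wg.Wait()
	fmt.Println("final size:", s.Len())
}
```

Running this with the race detector (`go run -race`) is a quick way to check that the lock covers every access to the buffer.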

tsdb/head.go Outdated
Comment on lines 454 to 456
// Head uses opts.MaxExemplars in combination with opts.EnableExemplarStorage
// to decide if it should pass exemplars along to its exemplar storage, so we
// need to update opts.MaxExemplars here.
Member

I think we should only use opts.EnableExemplarStorage for deciding and not opts.MaxExemplars
Reasons:

  • opts is a config given for starting Head, and it would be error prone to keep it up to date with dynamic fields like opts.MaxExemplars. It is best to get this dynamic config from the exemplar storage itself.
  • There could be a case where opts.MaxExemplars is changed from <0 to >0 while the appender has not committed, in which case we should probably not have discarded those exemplars, but kept them in the appender and committed them.

Member

There are a few additional changes that we'd have to make in order to do this.

opts is a config given for starting Head, and it would be error prone to keep it up to date with dynamic fields like opts.MaxExemplars. It is best to get this dynamic config from the exemplar storage itself.

All config for the exemplar storage is passed through TSDB head already, and opts was what we were using before. I was trying to make this change as small as possible. If opts has only the EnableExemplarStorage bool and not the MaxExemplars int then we need ValidateExemplar to return a new error type if MaxExemplars is < 0.

There could be a case where opts.MaxExemplars is changed from <0 to >0 while the appender has not committed, in which case we should probably not have discarded those exemplars, but kept them in the appender and committed them.

I'm not sure what you're saying here/suggesting we change?

Member

Ignore my above comment. I was thinking about interaction between an open appender and resize and what should be the right way to append exemplars in that case. So my new question would be, is it fine to only ingest partial exemplars from a scrape? (although rare case, but can happen during a config reload)

Member

So my new question would be, is it fine to only ingest partial exemplars from a scrape? (although rare case, but can happen during a config reload)

I think for now this would be okay, but we could explore improvements in the future.

Signed-off-by: Callum Styan <callumstyan@gmail.com>
tsdb/head.go Outdated
es, err := NewCircularExemplarStorage(opts.NumExemplars, r)
if err != nil {
return nil, err
var em *ExemplarMetrics
Member

We should avoid this being nil. There are cases where ApplyConfig can panic and we are not updating h.exemplarMetrics in ApplyConfig if we moved from <0 exemplars to >0. So we could start from this being not nil from the beginning.

Suggested change
- var em *ExemplarMetrics
+ em := NewExemplarMetrics(r)

Member

There are cases where ApplyConfig can panic

Can you elaborate?

Signed-off-by: Callum Styan <callumstyan@gmail.com>
@cstyan
Member

cstyan commented Jul 9, 2021

Thanks for the review @codesome. Still not entirely happy with the config file parsing and setting the default config but maybe I'm just hung up on only setting the default config if the feature flag is set, which we're not really set up to do nicely at the moment.

when resizing from Head code.

Signed-off-by: Callum Styan <callumstyan@gmail.com>
Signed-off-by: Callum Styan <callumstyan@gmail.com>
@johannaratliff johannaratliff left a comment

LGTM. I have less context here but didn't see any logic errors - seemed like a good refactor + addition of resize logic.

Member

@codesome codesome left a comment

I haven't checked exemplar.go and exemplar_test.go yet, but all other parts LGTM where I had the most comments. Since @cstyan has reviewed these exemplar files, I will do one last sanity check and merge it after that.

Member

@codesome codesome left a comment

I see some potential panics. Can you update the tests to cover these cases too? That means negative start and end sizes, including resizing from one negative size to another.

Signed-off-by: Callum Styan <callumstyan@gmail.com>
Member

@codesome codesome left a comment

LGTM! 🎉

@LeviHarrison
Member

I was looking at the exemplars storage feature flag docs and noticed that the storage.exemplars.exemplars-limit flag specified in the description had been removed in this PR. Were StorageConfig and ExemplarsConfig, which replaced this flag, ever documented?

cstyan added a commit that referenced this pull request Nov 25, 2021
Signed-off-by: Callum Styan <callumstyan@gmail.com>
juliusv pushed a commit that referenced this pull request Dec 1, 2021
…on (#9868)

* Update exemplar docs based on changes from #8974

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Fix missing code block closing + unindent one level.

Signed-off-by: Callum Styan <callumstyan@gmail.com>
7 participants