Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge upstream prometheus/prometheus at 37b408c #639

Merged
merged 71 commits into from
Jun 6, 2024

Conversation

gotjosh
Copy link
Contributor

@gotjosh gotjosh commented May 30, 2024

Note to reviewers:

Please pay close attention to this diff, in particular to the following files:

Unmerged paths:
  (use "git add/rm <file>..." as appropriate to mark resolution)
        deleted by us:   .github/workflows/ci.yml
        both modified:   model/rulefmt/rulefmt.go
        both modified:   promql/engine_test.go
        both modified:   rules/alerting.go
        both modified:   rules/group.go
        both modified:   rules/manager.go
        both modified:   rules/manager_test.go
        both modified:   rules/recording.go
        both modified:   rules/rule.go
        both modified:   tsdb/index/index_test.go

The change that affected the tsdb/index/index_test.go and promql/engine_test.go is prometheus/prometheus#14071 -- I'll need an in-depth review of those two files and that PR as I modified to the best of my ability with what I understood.

Relevant

Not Relevant

mickael-carl and others added 30 commits December 16, 2023 22:42
Signed-off-by: Mickael Carl <mcarl@apple.com>
Signed-off-by: Mickael Carl <mcarl@apple.com>
…d for transient file writes.

use it in loadDataAsQueryable to make sure the RO Head doesn't truncate or cut new chunks in data/chunks_head/.

add a -sandbox-dir-root flag to "promtool tsdb dump/dump-openmetrics" to control the root of that sandbox dirrectory.

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
…e discovery doc

Signed-off-by: Jiekun <zhujiekun@52tt.com>
Allow users to opt-out of the multi-cluster setup for Prometheus
dashboard, in environments where it isn't applicable.

Refer: prometheus/prometheus#13180.

Signed-off-by: Pranshu Srivastava <rexagod@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
This adds some more test cases for unicode values, and also a benchmark
for zeroOrOneCharacterStringMatcher.Matches()

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
This replaces the custom `moreThanOneRune` function with the standard
`utf8.DecodeRuneInString(s)` that can be used to figure out the size of
the first rune.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Signed-off-by: Neeraj Gartia <neerajgartia211002@gmail.com>
Signed-off-by: Neeraj Gartia <neerajgartia211002@gmail.com>
…to promql testing framework

Signed-off-by: Neeraj Gartia <neerajgartia211002@gmail.com>
Signed-off-by: Neeraj Gartia <neerajgartia211002@gmail.com>
Signed-off-by: Neeraj Gartia <neerajgartia211002@gmail.com>
On Windows, Go will sleep 15ms if you ask for less.  TestAsyncRuleEvaluation
compares actual delay to the nominal time, so using 15ms should work
better on Windows, and be hardly noticeable elsewhere.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Signed-off-by: Oleksandr Redko <oleksandr.red+github@gmail.com>
Enable perfsprint linter and fix up code issues
bugfix: allow opting-out of multi-cluster setups
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
…gMatcher-Matches

Improve `zeroOrOneCharacterStringMatcher` by using `utf8.DecodeRuneInString`
tsdb/index: Refactor Reader tests

Co-authored-by: Björn Rabenstein <github@rabenste.in>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>

---------

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Co-authored-by: Björn Rabenstein <github@rabenste.in>
[TEST] Rules: Sleep 15ms to fit Windows behaviour better
docs: [ovh sd] Added missing label for OVH dedicated server in SD
…cordingly. Signed-off-by: kushagra Shukla <kushalshukla110@gmail.com>

Signed-off-by: kushagra Shukla <kushalshukla110@gmail.com>
defaultStatsRenderer->DefaultStatsRenderer
Add short docstrings.

I'd like to use the stats renderer to peek at the statistics in
grafana/mimir#7966

However I cannot call the original function without this export afterwards.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Converts existing native histogram unit tests to the PromQL testing framework
Follow up on prometheus/prometheus#14096

As promised, I bring a benchmark, which shows a very small improvement
if context is checked every 128 iterations of label instead of every
100.

It's much easier for a computer to check modulo 128 than modulo 100.
This is a very small 0-2% improvement but I'd say this is one of the
hottest paths of the app so this is still relevant.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
@bboreham
Copy link
Contributor

bboreham commented Jun 3, 2024

This came out of a major incident, so it is quite important, but also it has already been merged here as #633.

…alse positives.

Signed-off-by: gotjosh <josue.abreu@gmail.com>
AlignEvaluationTimeOnInterval bool `yaml:"align_evaluation_time_on_interval,omitempty"`
}

func (g *RuleGroup) UnmarshalYAML(value *yaml.Node) error {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm looking for opinions here - is this enough to produce a "migration"? Users only write YAML directly to object storage before the ruler picks it up when encoding/decoding, it'll always show you the equivalent value of evaluation_delay as rule_offset.

Perhaps, not any sort of modification and just supporting both values (whilst having preference for rule_offset is an approach we want?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the way you did it, but I'd fail if both fields are found.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as for being backwards compatible I think it's enough.

Maybe just always set them to the same value so that the code can use either (I haven't reviewed the rest of this PR); this also unblocks rollbacks from newer ruler to older ruler.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just always set them to the same value so that the code can use either

I agree on this. Looks safer, even if once we'll upgrade mimir-prometheus in mimir we'll remove the usage of EvaluationDelay (but better safe than sorry).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made the change to preserve both in a8dec7c

I like the way you did it, but I'd fail if both fields are found.

I think failing on both sounds sensible but seems more of a Mimir validation than a Prometheus one. We do have a function to validate rule groups for Mimir at an API level and I think this validation belongs there: https://github.com/grafana/mimir/blob/main/pkg/mimirtool/rules/rules.go#L255

Maybe just always set them to the same value so that the code can use either (I haven't reviewed the rest of this PR); this also unblocks rollbacks from newer ruler to older ruler.

I'm not sure you meant set them always to the same value if neither of them exist which is the alternative I had in mind but this is what I did:

  1. If either evaluation_delay or query_offset is set, set the other to the same value.
  2. If both of them have values (technically not possible with the validation in Mimir) output them as is
  3. In mimir, we'll only use query_offset

Let me know what you think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there would be two typical use cases:

  1. Always load rules from code. In which case the user will set the delay and we correctly unmarshal into the offset now.
  2. GET the rule from API, update and POST. The offset may or may not be returned on the POST, but both should work as expected I think

LGTM

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. If both of them have values (technically not possible with the validation in Mimir) output them as is

this is the bit I had in mind - if they conflict, then just choose one and set both to it. It's more an argument in principle, than solving for a specific problem right now. A group with different offset and delay is an invalid group. Not feeling strongly about it and I don't need to look at it again.

Copy link
Contributor

@dimitarvdimitrov dimitarvdimitrov Jun 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps there can be some confusion if

  1. the tenant gets a rule group which has both evalution_delay and query_offset
  2. They set only evaluation_delay in the json/yaml we returned.
  3. post back the same json without removing query_offset

if the client only pays attention to evaluation_delay, they won't ever understand why the delay is not being changed in Mimir. If we set them both to the same value, then the client would at least notice that the changes aren't reflected in Mimir.

Somewhat contrived, but I imagine that's how some scripting languages clients might work. Don't want to block this further, sorry

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Mimir, we have to make it work with older clients not supporting query_offset. For this reason, we can't stop exporting evaluation_delay, but we also want to export query_offset. I want a client to be able to successfully upload and exported config. For this reason, I'm not convinced the following is an option:

I like the way you did it, but I'd fail if both fields are found.
I think failing on both sounds sensible but seems more of a Mimir validation than a Prometheus one

I would keep it simple: export both fields, just fail the validation if they're both set but to different values.

Copy link
Contributor

@aknuds1 aknuds1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there might be some mistakes in index_test.go, please see comments.

tsdb/index/index_test.go Show resolved Hide resolved
tsdb/index/index_test.go Show resolved Hide resolved
tsdb/index/index_test.go Outdated Show resolved Hide resolved
tsdb/index/index_test.go Show resolved Hide resolved
- Don't change t to innerT
- Bring back `t.Cleanup` with `reader.Close()`
- Duplicate the createFileReader functions

Signed-off-by: gotjosh <josue.abreu@gmail.com>
Signed-off-by: gotjosh <josue.abreu@gmail.com>
@gotjosh gotjosh marked this pull request as ready for review June 4, 2024 08:28
Signed-off-by: gotjosh <josue.abreu@gmail.com>
Copy link
Contributor

@aknuds1 aknuds1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good from my side.

Maybe you could get input from @bboreham on prometheus/prometheus#14079?

@bboreham
Copy link
Contributor

bboreham commented Jun 4, 2024

What is your question?

@aknuds1
Copy link
Contributor

aknuds1 commented Jun 4, 2024

@bboreham from the PR description:

prometheus/prometheus#14079 seems changelog worthy, TSDB experts, help me out?

@bboreham
Copy link
Contributor

bboreham commented Jun 4, 2024

Yes, put it in the changelog.

aux.QueryOffset = g.EvaluationDelay
}

aux.EvaluationDelay = nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will this break clients that expect eval delay to be there? I'm thinking about grafana specifically

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

old versions of mimirtool might also expect evaluation_delay to be set

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Dimitar here. We need to keep exporting evaluation_delay otherwise old clients break.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good shout, I've made the change to preserve both in a8dec7c

config/config.go Show resolved Hide resolved
AlignEvaluationTimeOnInterval bool `yaml:"align_evaluation_time_on_interval,omitempty"`
}

func (g *RuleGroup) UnmarshalYAML(value *yaml.Node) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just always set them to the same value so that the code can use either

I agree on this. Looks safer, even if once we'll upgrade mimir-prometheus in mimir we'll remove the usage of EvaluationDelay (but better safe than sorry).

aux.QueryOffset = g.EvaluationDelay
}

aux.EvaluationDelay = nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Dimitar here. We need to keep exporting evaluation_delay otherwise old clients break.

rules/manager_test.go Outdated Show resolved Hide resolved
rules/manager_test.go Outdated Show resolved Hide resolved
rules/manager_test.go Outdated Show resolved Hide resolved
tsdb/head_read_test.go Outdated Show resolved Hide resolved
tsdb/ooo_head_read_test.go Outdated Show resolved Hide resolved
This reverts commit c9bb46a.
Signed-off-by: gotjosh <josue.abreu@gmail.com>
Signed-off-by: gotjosh <josue.abreu@gmail.com>
func (f *mmapedFile) Close() error {
err := f.m.Unmap()
if err != nil {
err = fmt.Errorf("mmapedFile: unmapping: %w", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note : there was a fix to this later: prometheus/prometheus#13948

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually fix is further down to an existing code path, but line numbers didn't line up

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should vendor this fix too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's get this PR merged first

Signed-off-by: gotjosh <josue.abreu@gmail.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
…nce when reading the setting

Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
@pracucci pracucci merged commit ef8f745 into main Jun 6, 2024
7 of 8 checks passed
@pracucci pracucci deleted the gotjosh/update-mimir-prometheus-37b408c branch June 6, 2024 16:47
@gotjosh
Copy link
Contributor Author

gotjosh commented Jun 6, 2024

Thank you all for the help 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet