Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update vendored mimir-prometheus #7057

Merged
merged 3 commits into from
Jan 15, 2024
Merged

Conversation

pracucci
Copy link
Collaborator

@pracucci pracucci commented Jan 5, 2024

What this PR does

In this PR I'm upgrading the vendored mimir-prometheus and their deps.

Prometheus changes

The analysis of all Prometheus changes can be found here: grafana/mimir-prometheus#580

In particular, what's affecting Mimir:

  • MetricType moved from github.com/prometheus/prometheus/model/textparse to github.com/prometheus/common/model
  • Histogram.ToFloat() now takes in input *FloatHistogram to eventually reuse it (if nil the old behaviour is preserved). In this PR I'm always passing nil to try not burying any logic change. As separate work we can investigate if we would get any perf benefit recycling the FloatHistogram.
  • discovery.NewManager() now takes in input the metrics Registerer instead of registering metrics on the global one. I've changed Mimir code to pass the per-tenant registerer, so metrics will be renamed and will have the user label attached (see CHANGELOG for the list of renamed metrics). Similar change for refresh.NewDiscovery(). It's important to note that these metrics are only exposed when ruler is configured to discover Alertmanager via service discovery, which is not how we run Mimir in Jsonnet, Helm or at Grafana Labs. None of our dashboards / alerts queries such metrics.
  • Appender interface added AppendCTZeroSample(). It's still a work in progress in Prometheus. We can defer the actual support for now.
  • Prometheus moved from github.com/pkg/errors to the standard errors package. This means that github.com/pkg/errors.Cause() will not work anymore on errors wrapped by Prometheus. To avoid subtle bugs, I've removed the usage of Cause() on errors returned by Prometheus TSDB and PromQL. There are few other places where we're still using Cause(), but these are in few selected places where there shouldn't be any Prometheus error. In a separate PR, I will propose to get rid of errors.Cause() completely (but I prefer to keep it separated, to make the review process safer).
  • There are 3 new PromQL functions. Currently they're all experimental functions and in Prometheus they need to be enabled via a feature flag:
    • mad_over_time: the function is shardable, like other _over_time() functions (conceptually it's not different than quantile_over_time()).
    • sort_by_label and sort_by_label_desc: like sort() and sort_desc(), these functions are currently not safe to be parallelised by query sharding, so I've added them to the exclusion list.
    • I've introduced -querier.promql-experimental-functions-enabled CLI flag (and respective YAML config option) to allow to enable experimental functions (disabled by default).

Other vendored libs changes

-       go.opentelemetry.io/otel v1.19.0
-       go.opentelemetry.io/otel/trace v1.19.0
+       go.opentelemetry.io/otel v1.21.0
+       go.opentelemetry.io/otel/trace v1.21.0

Cheched the release notes. Many changes on tracing, few on metrics. Since many of such changes are related to packages we don't import, I've checked the actual code diff (it's small).

The change in this PR affects us. The PR description explains the rationale, but basically any OTel tracer implementation needs to define a fallback for non implemented functions (I've picked the noop one, we aim to implement all functions anyway).

-       github.com/hashicorp/consul/api v1.25.1 // indirect
+       github.com/hashicorp/consul/api v1.26.1 // indirect

Checked the diff and looks just adding new features:

-       github.com/miekg/dns v1.1.56 // indirect
+       github.com/miekg/dns v1.1.57 // indirect

Checked the diff and looks mostly code cleanup:

Other deps we don't use directly

  • github.com/felixge/httpsnoop
  • github.com/fsnotify/fsnotify
  • github.com/go-logr/logr

Which issue(s) this PR fixes or relates to

N/A

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

@pracucci pracucci force-pushed the upgrade-vendored-mimir-prometheus branch from 45d9b01 to 8b0e00a Compare January 5, 2024 17:27
@pracucci pracucci marked this pull request as ready for review January 5, 2024 17:27
@pracucci pracucci requested review from grafanabot and a team as code owners January 5, 2024 17:27
@@ -60,5 +64,5 @@ func NewPromQLEngineOptions(cfg Config, activityTracker *activitytracker.Activit
NoStepSubqueryIntervalFn: func(int64) int64 {
return cfg.DefaultEvaluationInterval.Milliseconds()
},
}
}, cfg.PromQLExperimentalFunctionsEnabled
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to reviewers: the reason why I propose to return cfg.PromQLExperimentalFunctionsEnabled from NewPromQLEngineOptions() even if it's based on the input cfg is to have a stronger way to signal the callers to also set the PromQL parser's option to enable the experimental functions.

@aknuds1
Copy link
Contributor

aknuds1 commented Jan 8, 2024

I've introduced -querier.promql-experimental-functions-enabled CLI flag (and respective YAML config option) to allow to enable experimental functions (disabled by default).

Should this be in the changelog?

Copy link
Contributor

@aknuds1 aknuds1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just wondering about the following:

I've introduced -querier.promql-experimental-functions-enabled CLI flag (and respective YAML config option) to allow to enable experimental functions (disabled by default).

Should this be in the changelog?

Copy link
Member

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes lgtm. Agree with Arve that we should include new option in the changelog. I think we should also document which functions it affects.

(Personal note: It's funny to see sort by labels function added, as the PR implementing this function was based on my old PR from 7 years ago. Seeing it now in Mimir, I'm not sure it's a good fit, but here we are... 😂).

@@ -44,10 +46,12 @@ func (cfg *Config) RegisterFlags(f *flag.FlagSet) {
f.IntVar(&cfg.MaxSamples, "querier.max-samples", 50e6, sharedWithQueryFrontend("Maximum number of samples a single query can load into memory."))
f.DurationVar(&cfg.DefaultEvaluationInterval, "querier.default-evaluation-interval", time.Minute, sharedWithQueryFrontend("The default evaluation interval or step size for subqueries."))
f.DurationVar(&cfg.LookbackDelta, "querier.lookback-delta", 5*time.Minute, sharedWithQueryFrontend("Time since the last sample after which a time series is considered stale and ignored by expression evaluations."))
f.BoolVar(&cfg.PromQLExperimentalFunctionsEnabled, "querier.promql-experimental-functions-enabled", false, sharedWithQueryFrontend("True to enable support for experimental PromQL functions."))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] It can also be set without passing =true value, simplify wording:

Suggested change
f.BoolVar(&cfg.PromQLExperimentalFunctionsEnabled, "querier.promql-experimental-functions-enabled", false, sharedWithQueryFrontend("True to enable support for experimental PromQL functions."))
f.BoolVar(&cfg.PromQLExperimentalFunctionsEnabled, "querier.promql-experimental-functions-enabled", false, sharedWithQueryFrontend("Enable experimental PromQL functions."))

Shall we include list of experimental functions? How does user know when to use this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplified CLI description in 35db456.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we include list of experimental functions? How does user know when to use this?

Yes, it's in about-versioning.md and I've just added to CHANGELOG too.

Copy link
Contributor

@dimitarvdimitrov dimitarvdimitrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should document the behaviour for the CT zero samples (and implement them eventually because they look cool)

I also agree with Arve about the changelog :)

Copy link
Contributor

@dimitarvdimitrov dimitarvdimitrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you for carrying this out

(Assuming you address Peter's and Arve's comments)

@pracucci pracucci force-pushed the upgrade-vendored-mimir-prometheus branch from 9283af6 to fdfcd20 Compare January 15, 2024 07:41
Added PR number to CHANGELOG
Do not use errors.Cause() on errors returned by Prometheus
Fixed TestConfig_TranslatesToPrometheusTargetGroup
Fixed TestConfig_ConstructsLookupNamesCorrectly
Updated query sharding with new PromQL experimental functions
Added -querier.promql-experimental-functions-enabled support
Fix linter issues
Added new CLI flag to list of experimental features

Signed-off-by: Marco Pracucci <marco@pracucci.com>
…ns-enabled

Signed-off-by: Marco Pracucci <marco@pracucci.com>
@pracucci pracucci force-pushed the upgrade-vendored-mimir-prometheus branch from fdfcd20 to 35db456 Compare January 15, 2024 07:45
@pracucci
Copy link
Collaborator Author

I've introduced -querier.promql-experimental-functions-enabled CLI flag (and respective YAML config option) to allow to enable experimental functions (disabled by default).
Should this be in the changelog?

Yes, absolutely. I just forgot about it, and added in 35db456.

Signed-off-by: Marco Pracucci <marco@pracucci.com>
@pracucci pracucci merged commit ee9b483 into main Jan 15, 2024
28 checks passed
@pracucci pracucci mentioned this pull request Jan 25, 2024
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants