Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mark series ephemeral in distributor based on series matchers in runtime config #3897

Merged
merged 34 commits into from
Jan 18, 2023

Conversation

replay
Copy link
Contributor

@replay replay commented Jan 9, 2023

What this PR does

This PR makes it possible to mark certain series as ephemeral, based on a list of label matchers that can be configured via the runtime config.
All the code in this PR is behind a feature flag marked as experimental.

The Distributor property EphemeralSeriesProvider should be easily replaceable to inject other ephemeral series provider implementations.

Which issue(s) this PR fixes or relates to

Related: #3884

@replay replay force-pushed the mark_series_ephemeral_in_distributor branch 2 times, most recently from dfaeb66 to e323359 Compare January 10, 2023 12:01
@replay replay changed the title WIP Mark series ephemeral in distributor based on data in kvstore Mark series ephemeral in distributor based on data in kvstore Jan 10, 2023
@replay replay changed the title Mark series ephemeral in distributor based on data in kvstore Mark series ephemeral in distributor based on ephemeral metrics config in kvstore Jan 10, 2023
@replay replay marked this pull request as ready for review January 10, 2023 14:12
@replay replay requested review from a team as code owners January 10, 2023 14:12
@replay replay requested a review from pstibrany January 10, 2023 14:13
@replay replay marked this pull request as draft January 10, 2023 14:29
@pstibrany pstibrany mentioned this pull request Jan 10, 2023
@replay replay force-pushed the mark_series_ephemeral_in_distributor branch 5 times, most recently from 01583c8 to 908cc58 Compare January 12, 2023 13:20
@replay replay changed the title Mark series ephemeral in distributor based on ephemeral metrics config in kvstore Mark series ephemeral in distributor based on ephemeral series matchers in runtime config Jan 12, 2023
Comment on lines 1065 to 1068
// No ephemeral series from ruler.
if req.Source == mimirpb.RULE {
return next(ctx, pushReq)
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if we want this, seems like it could be considered a bug if it isn't well documented. WDYT @pstibrany

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will need some way to direct output from rules to persistent storage only, ignoring the config in distributors. This was easiest way, but we can try to come up with a different that fits the design better.

@@ -197,6 +207,7 @@ func (cfg *Config) RegisterFlags(f *flag.FlagSet, logger log.Logger) {
f.Float64Var(&cfg.InstanceLimits.MaxIngestionRate, maxIngestionRateFlag, 0, "Max ingestion rate (samples/sec) that this distributor will accept. This limit is per-distributor, not per-tenant. Additional push requests will be rejected. Current ingestion rate is computed as exponentially weighted moving average, updated every second. 0 = unlimited.")
f.IntVar(&cfg.InstanceLimits.MaxInflightPushRequests, maxInflightPushRequestsFlag, 2000, "Max inflight push requests that this distributor can handle. This limit is per-distributor, not per-tenant. Additional requests will be rejected. 0 = unlimited.")
f.IntVar(&cfg.InstanceLimits.MaxInflightPushRequestsBytes, maxInflightPushRequestsBytesFlag, 0, "The sum of the request sizes in bytes of inflight push requests that this distributor can handle. This limit is per-distributor, not per-tenant. Additional requests will be rejected. 0 = unlimited.")
f.BoolVar(&cfg.EphemeralMetricsEnabled, "distributor.ephemeral-metrics-enabled", false, "Enable marking series as ephemeral based on the given matchers in the runtime config.")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventually we can remove this flag and just leave the feature always enabled, but while it's still experimental I prefer having a kill switch so that we can disable it easily for example if there's a bug in the distributor middleware.

@replay replay marked this pull request as ready for review January 12, 2023 14:26
@replay replay changed the title Mark series ephemeral in distributor based on ephemeral series matchers in runtime config Mark series ephemeral in distributor based on series matchers in runtime config Jan 12, 2023
for ix := 0; ix < len(req.Timeseries); ix++ {
ts := req.Timeseries[ix]

if !ephemeralChecker.IsEphemeral(ts.Labels) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we pass req.Source to IsEphemeral, then we can have checker decide whether series originated from ruler can be marked as ephemeral or not. That's probably better than having if req.Source == mimirpb.RULE check above. (Can be in separate PR)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, we can pass the source to EphemeralChecker call. That would allow us to have a two separate sets of configs, but if someone wants the same config for API and Ruler, then they would need to duplicate them.

Copy link
Contributor Author

@replay replay Jan 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting idea, I like it. So then the list of label matchers would have to be keyed by the source, kind of like:

overrides:
  anonymous:
    ephemeral_series_matchers:
      api:
        - __name__="metric_from_api"
      ruler:
        - __name__="metric_from_ruler"
      any:
        - __name__="always_ephemeral"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implemented this in b8e8c76

Copy link
Member

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice job! I've left some comments that I'd like to see addressed (missing test, some clarifications) before merging.

pkg/distributor/distributor.go Outdated Show resolved Hide resolved
continue
}

if cap(deleteTs) == 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] Normally we use len for such check, why is cap better here?

Copy link
Contributor Author

@replay replay Jan 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the reason why i chose cap() is because if cap() == 0 then the next operation is deleteTs = make([]int, 0, len(req.Timeseries)-ix), this will set the cap() to > 0 but not the len(), so the check and the operation which is decided based on the check are closer together. Relying on len() would work too, but the increased len() on a subsequent iteration is a less direct consequence because it only gets raised further down in the loop via append().

anyway, I'm happy to change it to len() if you prefer that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just relying on the first boolean now instead of len or cap: https://github.com/grafana/mimir/pull/3897/files#diff-8e51742acf8f88fa2ae01855fb19125feb5c54c41023f1eaa8beded9f1b4bcc3R1063-R1076

This made sense because there are two separate actions that need to be taken on the first iteration, only one of which is directly related to the length of deleteTs.

pkg/distributor/distributor.go Outdated Show resolved Hide resolved
pkg/util/ephemeral/interface.go Outdated Show resolved Hide resolved
pkg/util/ephemeral/interface.go Outdated Show resolved Hide resolved
pkg/util/ephemeral/label_matchers.go Outdated Show resolved Hide resolved
pkg/util/validation/limits.go Outdated Show resolved Hide resolved
pkg/util/validation/limits.go Outdated Show resolved Hide resolved
pkg/distributor/distributor.go Outdated Show resolved Hide resolved
for ix := 0; ix < len(req.Timeseries); ix++ {
ts := req.Timeseries[ix]

if !ephemeralChecker.IsEphemeral(ts.Labels) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, we can pass the source to EphemeralChecker call. That would allow us to have a two separate sets of configs, but if someone wants the same config for API and Ruler, then they would need to duplicate them.


# (experimental) Enable marking series as ephemeral based on the given matchers
# in the runtime config.
# CLI flag: -distributor.ephemeral-metrics-enabled
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here it's called "ephemeral metrics" while below it's called "ephemeral series". Pick one of the two, so we standardise naming.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thx, good catch. I started out calling it "ephemeral metrics", but then we changed the checker so it checks the entire labelset instead of only checking the metric name, so I'll rename it to "ephemeral series" everywhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done here: 52465b710c2c347b675cf7c2fb959a8c9631f9ac

Copy link
Collaborator

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job! The overall design LGTM. I left Peter review the tiny bits in the logic ;)

pkg/util/validation/limits.go Outdated Show resolved Hide resolved
@@ -168,6 +169,8 @@ type Limits struct {
ForwardingEndpoint string `yaml:"forwarding_endpoint" json:"forwarding_endpoint" doc:"nocli|description=Remote-write endpoint where metrics specified in forwarding_rules are forwarded to. If set, takes precedence over endpoints specified in forwarding rules."`
ForwardingDropOlderThan model.Duration `yaml:"forwarding_drop_older_than" json:"forwarding_drop_older_than" doc:"nocli|description=If set, forwarding drops samples that are older than this duration. If unset or 0, no samples get dropped."`
ForwardingRules ForwardingRules `yaml:"forwarding_rules" json:"forwarding_rules" doc:"nocli|description=Rules based on which the Distributor decides whether a metric should be forwarded to an alternative remote_write API endpoint."`

EphemeralSeriesMatchers ephemeral.LabelMatchers `yaml:"ephemeral_series_matchers" json:"ephemeral_series_matchers" doc:"nocli|description=Matchers to check if a series should be marked ephemeral, matching series get marked to be stored in the ephemeral store." category:"experimental"`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the "description" here? If we set the description here, the CLI flag description won't be used, but CLI flag description is more detailed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the "description" here?

I just followed the example of the line above.

I removed the description here: 3635f97f3

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to figure out how to make it use the description from the CLI flag.
Do you know if there's something else that I need to do apart from just removing this doc tag?

@replay
Copy link
Contributor Author

replay commented Jan 16, 2023

Going to force-push to rebase on main.

@replay replay force-pushed the mark_series_ephemeral_in_distributor branch from 2a6013f to 169acf2 Compare January 16, 2023 11:46
Copy link
Member

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, looks good to me!

pkg/distributor/distributor.go Outdated Show resolved Hide resolved
@@ -297,7 +297,7 @@ func (t *Mimir) initDistributorService() (serv services.Service, err error) {
// ruler's dependency)
canJoinDistributorsRing := t.Cfg.isAnyModuleEnabled(Distributor, Write, All)

t.Distributor, err = distributor.New(t.Cfg.Distributor, t.Cfg.IngesterClient, t.Overrides, t.ActiveGroupsCleanup, t.Ring, canJoinDistributorsRing, t.Registerer, util_log.Logger)
t.Distributor, err = distributor.New(t.Cfg.Distributor, t.Cfg.IngesterClient, t.Overrides, t.ActiveGroupsCleanup, t.Ring, t.Overrides, canJoinDistributorsRing, t.Registerer, util_log.Logger)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to be able to overrides ephemeral checker in GEM. To do that, we could do the following:

  • introduce t.EphemeralChecker field, and let initDistributorService use that field when constructing distributor
  • add new ephemeral-checker module, with initEphemeralChecker function setting t.EphemeralChecker = t.Overrides. Distributor service module needs to depend on this module.
  • in GEM we can then have yet another new module (say GemEphemeralChecker for this discussion) that depends on ephemeral-checker and sets t.EphemeralChecker to a new value. (We would need to set distributor-service to depend on our module, to make sure that init-methods are called in correct order: initEphemeralChecker, initGemEphemeralChecker and finally initDistributorService)

We can do this preparation in a separate PR if you want.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice idea, I haven't thought further than "inject it into the Distributor's constructor".
I think to keep the PRs small and simple I'd prefer to split that out into another PR, but I'll basically just follow your plan as you described.

pkg/util/ephemeral/label_matchers.go Outdated Show resolved Hide resolved
pkg/util/ephemeral/interface.go Outdated Show resolved Hide resolved
type matcherSet []*labels.Matcher

// matches checks whether all the matchers match the given label set.
func (ms matcherSet) matches(lset []mimirpb.LabelAdapter) bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this bit implemented, so I assume that we will optimize it in separate PR.

pkg/util/validation/limits.go Outdated Show resolved Hide resolved
Copy link
Member

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thank you very much for addressing my feedback.

// EphemeralChecker returns an object which checks a series and decides whether
// this series should be ephemeral based on the given user's configuration.
// Can return nil if the user has no relevant configuration.
EphemeralChecker(string, mimirpb.WriteRequest_SourceEnum) SeriesChecker
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] I'd include names of parameters in this method, to better document their meaning. (It's not clear that first parameter is tenant)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, done here 9e338b7d7

}

// HasMatchers returns true if there is at least one matcher defined, otherwise it returns false.
func (m *MatcherSetsForSource) HasMatchers() bool {
Copy link
Member

@pstibrany pstibrany Jan 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] There doesn't seem to be a good reason why m needs to be a pointer. (No method of MatcherSetsForSource modifies m)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thx, done: c56aaa0cd

# (experimental) Enable marking series as ephemeral based on the given matchers
# in the runtime config.
# CLI flag: -distributor.ephemeral-series-enabled
[ephemeral_metrics_enabled: <boolean> | default = false]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The YAML config option is still "metrics". Please change it to "series" too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for missing those... fixed: 705f50fcb663fa3cd31021a0187f3dda532501c1

@@ -2826,6 +2831,8 @@ The `limits` block configures default and per-tenant limits imposed by component
# Rules based on which the Distributor decides whether a metric should be
# forwarded to an alternative remote_write API endpoint.
[forwarding_rules: <map of string to validation.ForwardingRule> | default = ]

ephemeral_series_matchers:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation is empty here. Can you fix it, please?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking into why this doesn't pick up the description from the CLI flag definition...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed here, I needed to update the handling of custom types in the doc-generator to make it work: 01b5685

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
@replay replay force-pushed the mark_series_ephemeral_in_distributor branch from 01b5685 to 51781c8 Compare January 17, 2023 15:29
@replay replay merged commit 3580f8f into main Jan 18, 2023
@replay replay deleted the mark_series_ephemeral_in_distributor branch January 18, 2023 14:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants