-
Notifications
You must be signed in to change notification settings - Fork 464
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mark series ephemeral in distributor based on series matchers in runtime config #3897
Conversation
dfaeb66
to
e323359
Compare
01583c8
to
908cc58
Compare
pkg/distributor/distributor.go
Outdated
// No ephemeral series from ruler. | ||
if req.Source == mimirpb.RULE { | ||
return next(ctx, pushReq) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if we want this, seems like it could be considered a bug if it isn't well documented. WDYT @pstibrany
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will need some way to direct output from rules to persistent storage only, ignoring the config in distributors. This was easiest way, but we can try to come up with a different that fits the design better.
pkg/distributor/distributor.go
Outdated
@@ -197,6 +207,7 @@ func (cfg *Config) RegisterFlags(f *flag.FlagSet, logger log.Logger) { | |||
f.Float64Var(&cfg.InstanceLimits.MaxIngestionRate, maxIngestionRateFlag, 0, "Max ingestion rate (samples/sec) that this distributor will accept. This limit is per-distributor, not per-tenant. Additional push requests will be rejected. Current ingestion rate is computed as exponentially weighted moving average, updated every second. 0 = unlimited.") | |||
f.IntVar(&cfg.InstanceLimits.MaxInflightPushRequests, maxInflightPushRequestsFlag, 2000, "Max inflight push requests that this distributor can handle. This limit is per-distributor, not per-tenant. Additional requests will be rejected. 0 = unlimited.") | |||
f.IntVar(&cfg.InstanceLimits.MaxInflightPushRequestsBytes, maxInflightPushRequestsBytesFlag, 0, "The sum of the request sizes in bytes of inflight push requests that this distributor can handle. This limit is per-distributor, not per-tenant. Additional requests will be rejected. 0 = unlimited.") | |||
f.BoolVar(&cfg.EphemeralMetricsEnabled, "distributor.ephemeral-metrics-enabled", false, "Enable marking series as ephemeral based on the given matchers in the runtime config.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eventually we can remove this flag and just leave the feature always enabled, but while it's still experimental I prefer having a kill switch so that we can disable it easily for example if there's a bug in the distributor middleware.
pkg/distributor/distributor.go
Outdated
for ix := 0; ix < len(req.Timeseries); ix++ { | ||
ts := req.Timeseries[ix] | ||
|
||
if !ephemeralChecker.IsEphemeral(ts.Labels) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we pass req.Source
to IsEphemeral
, then we can have checker decide whether series originated from ruler can be marked as ephemeral or not. That's probably better than having if req.Source == mimirpb.RULE
check above. (Can be in separate PR)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, we can pass the source to EphemeralChecker
call. That would allow us to have a two separate sets of configs, but if someone wants the same config for API and Ruler, then they would need to duplicate them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting idea, I like it. So then the list of label matchers would have to be keyed by the source, kind of like:
overrides:
anonymous:
ephemeral_series_matchers:
api:
- __name__="metric_from_api"
ruler:
- __name__="metric_from_ruler"
any:
- __name__="always_ephemeral"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I implemented this in b8e8c76
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice job! I've left some comments that I'd like to see addressed (missing test, some clarifications) before merging.
pkg/distributor/distributor.go
Outdated
continue | ||
} | ||
|
||
if cap(deleteTs) == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] Normally we use len
for such check, why is cap
better here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the reason why i chose cap()
is because if cap() == 0
then the next operation is deleteTs = make([]int, 0, len(req.Timeseries)-ix)
, this will set the cap()
to > 0
but not the len()
, so the check and the operation which is decided based on the check are closer together. Relying on len()
would work too, but the increased len()
on a subsequent iteration is a less direct consequence because it only gets raised further down in the loop via append()
.
anyway, I'm happy to change it to len()
if you prefer that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just relying on the first
boolean now instead of len
or cap
: https://github.com/grafana/mimir/pull/3897/files#diff-8e51742acf8f88fa2ae01855fb19125feb5c54c41023f1eaa8beded9f1b4bcc3R1063-R1076
This made sense because there are two separate actions that need to be taken on the first iteration, only one of which is directly related to the length of deleteTs
.
pkg/distributor/distributor.go
Outdated
for ix := 0; ix < len(req.Timeseries); ix++ { | ||
ts := req.Timeseries[ix] | ||
|
||
if !ephemeralChecker.IsEphemeral(ts.Labels) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, we can pass the source to EphemeralChecker
call. That would allow us to have a two separate sets of configs, but if someone wants the same config for API and Ruler, then they would need to duplicate them.
|
||
# (experimental) Enable marking series as ephemeral based on the given matchers | ||
# in the runtime config. | ||
# CLI flag: -distributor.ephemeral-metrics-enabled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here it's called "ephemeral metrics" while below it's called "ephemeral series". Pick one of the two, so we standardise naming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thx, good catch. I started out calling it "ephemeral metrics", but then we changed the checker so it checks the entire labelset instead of only checking the metric name, so I'll rename it to "ephemeral series" everywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done here: 52465b710c2c347b675cf7c2fb959a8c9631f9ac
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good job! The overall design LGTM. I left Peter review the tiny bits in the logic ;)
pkg/util/validation/limits.go
Outdated
@@ -168,6 +169,8 @@ type Limits struct { | |||
ForwardingEndpoint string `yaml:"forwarding_endpoint" json:"forwarding_endpoint" doc:"nocli|description=Remote-write endpoint where metrics specified in forwarding_rules are forwarded to. If set, takes precedence over endpoints specified in forwarding rules."` | |||
ForwardingDropOlderThan model.Duration `yaml:"forwarding_drop_older_than" json:"forwarding_drop_older_than" doc:"nocli|description=If set, forwarding drops samples that are older than this duration. If unset or 0, no samples get dropped."` | |||
ForwardingRules ForwardingRules `yaml:"forwarding_rules" json:"forwarding_rules" doc:"nocli|description=Rules based on which the Distributor decides whether a metric should be forwarded to an alternative remote_write API endpoint."` | |||
|
|||
EphemeralSeriesMatchers ephemeral.LabelMatchers `yaml:"ephemeral_series_matchers" json:"ephemeral_series_matchers" doc:"nocli|description=Matchers to check if a series should be marked ephemeral, matching series get marked to be stored in the ephemeral store." category:"experimental"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the "description" here? If we set the description here, the CLI flag description won't be used, but CLI flag description is more detailed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the "description" here?
I just followed the example of the line above.
I removed the description here: 3635f97f3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to figure out how to make it use the description from the CLI flag.
Do you know if there's something else that I need to do apart from just removing this doc
tag?
Going to force-push to rebase on |
2a6013f
to
169acf2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, looks good to me!
@@ -297,7 +297,7 @@ func (t *Mimir) initDistributorService() (serv services.Service, err error) { | |||
// ruler's dependency) | |||
canJoinDistributorsRing := t.Cfg.isAnyModuleEnabled(Distributor, Write, All) | |||
|
|||
t.Distributor, err = distributor.New(t.Cfg.Distributor, t.Cfg.IngesterClient, t.Overrides, t.ActiveGroupsCleanup, t.Ring, canJoinDistributorsRing, t.Registerer, util_log.Logger) | |||
t.Distributor, err = distributor.New(t.Cfg.Distributor, t.Cfg.IngesterClient, t.Overrides, t.ActiveGroupsCleanup, t.Ring, t.Overrides, canJoinDistributorsRing, t.Registerer, util_log.Logger) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to be able to overrides ephemeral checker in GEM. To do that, we could do the following:
- introduce
t.EphemeralChecker
field, and letinitDistributorService
use that field when constructing distributor - add new
ephemeral-checker
module, withinitEphemeralChecker
function settingt.EphemeralChecker = t.Overrides
. Distributor service module needs to depend on this module. - in GEM we can then have yet another new module (say
GemEphemeralChecker
for this discussion) that depends onephemeral-checker
and setst.EphemeralChecker
to a new value. (We would need to set distributor-service to depend on our module, to make sure that init-methods are called in correct order:initEphemeralChecker
,initGemEphemeralChecker
and finallyinitDistributorService
)
We can do this preparation in a separate PR if you want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice idea, I haven't thought further than "inject it into the Distributor's constructor".
I think to keep the PRs small and simple I'd prefer to split that out into another PR, but I'll basically just follow your plan as you described.
type matcherSet []*labels.Matcher | ||
|
||
// matches checks whether all the matchers match the given label set. | ||
func (ms matcherSet) matches(lset []mimirpb.LabelAdapter) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see this bit implemented, so I assume that we will optimize it in separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thank you very much for addressing my feedback.
pkg/util/ephemeral/interface.go
Outdated
// EphemeralChecker returns an object which checks a series and decides whether | ||
// this series should be ephemeral based on the given user's configuration. | ||
// Can return nil if the user has no relevant configuration. | ||
EphemeralChecker(string, mimirpb.WriteRequest_SourceEnum) SeriesChecker |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] I'd include names of parameters in this method, to better document their meaning. (It's not clear that first parameter is tenant)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, done here 9e338b7d7
pkg/util/ephemeral/label_matchers.go
Outdated
} | ||
|
||
// HasMatchers returns true if there is at least one matcher defined, otherwise it returns false. | ||
func (m *MatcherSetsForSource) HasMatchers() bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] There doesn't seem to be a good reason why m
needs to be a pointer. (No method of MatcherSetsForSource
modifies m)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thx, done: c56aaa0cd
# (experimental) Enable marking series as ephemeral based on the given matchers | ||
# in the runtime config. | ||
# CLI flag: -distributor.ephemeral-series-enabled | ||
[ephemeral_metrics_enabled: <boolean> | default = false] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The YAML config option is still "metrics". Please change it to "series" too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry for missing those... fixed: 705f50fcb663fa3cd31021a0187f3dda532501c1
@@ -2826,6 +2831,8 @@ The `limits` block configures default and per-tenant limits imposed by component | |||
# Rules based on which the Distributor decides whether a metric should be | |||
# forwarded to an alternative remote_write API endpoint. | |||
[forwarding_rules: <map of string to validation.ForwardingRule> | default = ] | |||
|
|||
ephemeral_series_matchers: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation is empty here. Can you fix it, please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking into why this doesn't pick up the description from the CLI flag definition...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed here, I needed to update the handling of custom types in the doc-generator to make it work: 01b5685
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
01b5685
to
51781c8
Compare
What this PR does
This PR makes it possible to mark certain series as ephemeral, based on a list of label matchers that can be configured via the runtime config.
All the code in this PR is behind a feature flag marked as experimental.
The Distributor property
EphemeralSeriesProvider
should be easily replaceable to inject other ephemeral series provider implementations.Which issue(s) this PR fixes or relates to
Related: #3884