
Distributor: add per-tenant counters for total and received requests #2770

Merged: 2 commits from 56quarters/per-user-requests into main on Aug 19, 2022

Conversation

@56quarters 56quarters commented Aug 18, 2022

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>

What this PR does

Add per-tenant counters for total and received requests to provide visibility
into what an appropriate value for a per-tenant request limit might be.
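
For reference, a minimal sketch of how two such counters could be registered with client_golang. The helper name and help strings below are illustrative rather than the PR's exact code; the metric names are the ones discussed later in this conversation.

package distributor

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// newPerTenantRequestCounters is a hypothetical helper: it registers one
// counter for every request that comes in and one for requests that survive
// the initial checks, both labelled by tenant ("user").
func newPerTenantRequestCounters(reg prometheus.Registerer) (requestsIn, receivedRequests *prometheus.CounterVec) {
	requestsIn = promauto.With(reg).NewCounterVec(prometheus.CounterOpts{
		Name: "cortex_distributor_requests_in_total",
		Help: "The total number of requests that have come in to the distributor, per tenant.",
	}, []string{"user"})

	receivedRequests = promauto.With(reg).NewCounterVec(prometheus.CounterOpts{
		Name: "cortex_distributor_received_requests_total",
		Help: "The number of requests that made it past initial checks and will be forwarded to ingesters, per tenant.",
	}, []string{"user"})

	return requestsIn, receivedRequests
}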

Which issue(s) this PR fixes or relates to

N/A

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@56quarters 56quarters marked this pull request as ready for review August 18, 2022 18:58
@@ -629,6 +643,7 @@ func (d *Distributor) wrapPushWithMiddlewares(next push.Func) push.Func {
// To guarantee that, middleware functions will be called in reversed order, wrapping the
// result from previous call.
middlewares = append(middlewares, d.instanceLimitsMiddleware) // should run first
middlewares = append(middlewares, d.prePushUserRequestMiddleware)
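
To illustrate the "reversed order" comment in this hunk, here is a minimal sketch of the wrapping pattern; pushFunc and writeRequest are stand-ins for the real push.Func signature, and the middleware names are hypothetical. Appending a middleware first makes it the outermost wrapper, so it runs first.

package main

import (
	"context"
	"fmt"
)

// Stand-in types; Mimir's real push.Func has a different signature.
type writeRequest struct{}
type pushFunc func(ctx context.Context, req *writeRequest) error
type middleware func(next pushFunc) pushFunc

// wrapPush applies the middlewares in reverse, so middlewares[0] ends up
// outermost and therefore runs first.
func wrapPush(final pushFunc, middlewares []middleware) pushFunc {
	next := final
	for i := len(middlewares) - 1; i >= 0; i-- {
		next = middlewares[i](next)
	}
	return next
}

// named builds a middleware that just logs its name before calling the next one.
func named(name string) middleware {
	return func(next pushFunc) pushFunc {
		return func(ctx context.Context, req *writeRequest) error {
			fmt.Println("enter", name)
			return next(ctx, req)
		}
	}
}

func main() {
	push := wrapPush(
		func(ctx context.Context, req *writeRequest) error { return nil },
		[]middleware{named("instanceLimits"), named("prePushUserRequest")},
	)
	// Prints "enter instanceLimits" before "enter prePushUserRequest".
	_ = push(context.Background(), &writeRequest{})
}
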
Contributor:

the instanceLimitsMiddleware could already reject some requests, shouldn't those be counted too?

@56quarters (author), Aug 18, 2022:

I'm not sure. We don't increment per-tenant metrics for anything else before checking the instance limits.

The way the other per-tenant limits work is:

  • The total metric is "up to this number of X may be accepted"
  • The rejected metric is "this number of X was rejected or dropped for reason Y"
  • The received metric is "this number made it all the way to a distributor and will be sent to an ingester"

I'm having trouble fitting the instance limits into that model. We'd increase the "up to this number of X may be accepted" metric but then they'd all be rejected due to something the user has no control over. We'd need to add a "system is overloaded" reason to the rejection reasons for the various validation metrics which doesn't make sense to me (since this is an operational issue, not a validation one).

@56quarters (author), Aug 18, 2022:

Additionally, hitting the instance limits results in 500 errors, which will be retried by the user. If we didn't run that middleware first, we'd potentially be double counting user samples/exemplars/requests. This is a problem since these counters exist to provide visibility into the dimensions that we're limiting users on. It'd be confusing to increase the count of requests without a corresponding "rejection" metric when hitting instance limits.

@56quarters 56quarters requested a review from replay August 19, 2022 13:26
}

cleanupInDefer = false
d.incomingRequests.WithLabelValues(userID).Add(1)
Member:

This line is literally the only required change in business logic. Shall we include it in d.instanceLimitsMiddleware instead? (We can rename that middleware, if that bothers you.)

@56quarters (author):

Yeah, it's the only line. Sure, I can add it to the instance limits middleware.

@56quarters (author):

Updated in 1777b9c
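
Roughly, the suggestion amounts to something like the sketch below. This is a guess at the shape of the change, not the actual code in 1777b9c: writeRequest, pushFunc, checkInstanceLimits, and incomingRequests are placeholders standing in for the real distributor code.

package distributor

import (
	"context"

	"github.com/grafana/dskit/tenant"
	"github.com/prometheus/client_golang/prometheus"
)

// Stand-in types for the sketch; not Mimir's real push.Func.
type writeRequest struct{}
type pushFunc func(ctx context.Context, req *writeRequest) error

type Distributor struct {
	incomingRequests *prometheus.CounterVec
}

// checkInstanceLimits stands in for the existing per-instance limit checks.
func (d *Distributor) checkInstanceLimits() error { return nil }

// instanceLimitsMiddleware (sketch) bumps the per-tenant counter only after
// the per-instance checks pass, so that requests rejected with a 500 (and
// later retried by the client) are not counted twice.
func (d *Distributor) instanceLimitsMiddleware(next pushFunc) pushFunc {
	return func(ctx context.Context, req *writeRequest) error {
		if err := d.checkInstanceLimits(); err != nil {
			// Rejected by an operational limit; no per-tenant accounting.
			return err
		}

		userID, err := tenant.TenantID(ctx)
		if err != nil {
			return err
		}
		d.incomingRequests.WithLabelValues(userID).Add(1)

		return next(ctx, req)
	}
}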

@pstibrany (Member) left a comment:

Metric names are not quite clear to me. In a month or two I will not remember the difference between cortex_distributor_received_requests_total and cortex_distributor_requests_in_total. Without checking the help text, I'd think that "cortex_distributor_received_requests_total" is actually ALL requests. Can we try to find better names?

Thinking out loud:

  • "incoming" (all) vs "processed" (after initial checks)
  • "incoming" (all) vs "accepted" (as in "accepted" by the distributor, not necessarily the ingester)
  • "incoming" vs "handled"

(those are all variations on the same theme)

@56quarters (author):

> Metric names are not quite clear to me. […] Can we try to find better names?

Yes, BUT these names were picked to match the other per-tenant metrics for samples, exemplars, and metadata. I'd rather not "fix" these ones and break the pattern used by the others.

@pstibrany (Member):

> Yes, BUT these names were picked to match the other per-tenant metrics for samples, exemplars, and metadata. I'd rather not "fix" these ones and break the pattern used by the others.

Oh, I see. I missed that we have similarly named metrics for other stuff.

Looking at those metrics, I think some of them are broken after HA deduplication has been moved to separate middleware. (ping @replay)

@56quarters (author):

> Looking at those metrics, I think some of them are broken after HA deduplication has been moved to separate middleware. (ping @replay)

I'll open an issue to fix those metrics now that we've split various checks into middleware.

@56quarters 56quarters merged commit 3dc50e2 into main Aug 19, 2022
@56quarters 56quarters deleted the 56quarters/per-user-requests branch August 19, 2022 15:22
@pracucci (Collaborator) left a comment:

Can you add a unit test on the metric, please? It should be easy to add the metric check to existing test cases.
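
For example, a standalone sketch along these lines using client_golang's testutil (the help string is illustrative, and a real test would exercise the distributor's push path instead of bumping the counter directly):

package distributor_test

import (
	"strings"
	"testing"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/testutil"
)

func TestPerTenantRequestsInCounter(t *testing.T) {
	reg := prometheus.NewPedanticRegistry()

	// Stand-in for the counter registered by the distributor in this PR.
	requestsIn := prometheus.NewCounterVec(prometheus.CounterOpts{
		Name: "cortex_distributor_requests_in_total",
		Help: "The total number of requests that have come in to the distributor, per tenant.",
	}, []string{"user"})
	reg.MustRegister(requestsIn)

	// Simulate two requests from the same tenant.
	requestsIn.WithLabelValues("user-1").Add(1)
	requestsIn.WithLabelValues("user-1").Add(1)

	expected := `
# HELP cortex_distributor_requests_in_total The total number of requests that have come in to the distributor, per tenant.
# TYPE cortex_distributor_requests_in_total counter
cortex_distributor_requests_in_total{user="user-1"} 2
`
	if err := testutil.GatherAndCompare(reg, strings.NewReader(expected), "cortex_distributor_requests_in_total"); err != nil {
		t.Fatal(err)
	}
}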

@56quarters added two commits that referenced this pull request on Aug 19, 2022.
@56quarters 56quarters self-assigned this Aug 29, 2022