Add metrics for aggregated discovery #115630

Jefftree · 2023-02-08T21:13:31Z

/kind feature

What this PR does / why we need it:

Add metrics for aggregated discovery: the number of requests split by status code, the number of times the discovery cache was aggregated, and the duration of each aggregation.
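As a rough illustration of the first two metrics described above, here is a minimal, stdlib-only sketch of a request counter keyed by status code alongside a plain regeneration counter. The names `codeCounter`, `Inc`, `Get`, and `Codes` are hypothetical and chosen only for this example; the PR itself registers its metrics through the Kubernetes metrics package, which is not reproduced here.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// codeCounter is a hypothetical stand-in for a counter metric labeled by
// HTTP status code. It is not the apiserver's metrics type.
type codeCounter struct {
	mu     sync.Mutex
	counts map[int]int
}

func newCodeCounter() *codeCounter {
	return &codeCounter{counts: make(map[int]int)}
}

// Inc records one served discovery request with the given status code.
func (c *codeCounter) Inc(code int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[code]++
}

// Get returns how many requests were recorded for the given status code.
func (c *codeCounter) Get(code int) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[code]
}

// Codes returns the observed status codes in ascending order.
func (c *codeCounter) Codes() []int {
	c.mu.Lock()
	defer c.mu.Unlock()
	codes := make([]int, 0, len(c.counts))
	for code := range c.counts {
		codes = append(codes, code)
	}
	sort.Ints(codes)
	return codes
}

func main() {
	requests := newCodeCounter()
	regenerations := 0 // incremented each time the cached document is rebuilt

	regenerations++ // initial aggregation
	for _, code := range []int{200, 200, 304, 200} {
		requests.Inc(code)
	}
	regenerations++ // e.g. a CRD was added and the document was rebuilt

	for _, code := range requests.Codes() {
		fmt.Printf("requests{code=%d} = %d\n", code, requests.Get(code))
	}
	fmt.Printf("regenerations = %d\n", regenerations)
}
```

Keeping the two counters separate means request volume and cache rebuilds can be read independently of each other.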

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Changed metrics for aggregated discovery to publish new time series (alpha).

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

/cc @apelisse @alexzielenski

cici37 · 2023-02-09T17:37:44Z

/triage accepted

sftim · 2023-02-13T11:27:37Z

The changelog entry doesn't look right. Try this:

-Yes, additional metrics are published for aggregated discovery
+Changed metrics for aggregated discovery to publish new time series (alpha).

Jefftree · 2023-02-14T21:28:35Z

ack, updated.

Jefftree · 2023-02-15T18:32:04Z

/cc @deads2k

apelisse · 2023-03-07T19:07:31Z

staging/src/k8s.io/apiserver/pkg/endpoints/discovery/aggregated/metrics.go

+			StabilityLevel: metrics.ALPHA,
+		},
+	)
+	regenerationDurationGauge = metrics.NewGauge(


Do we need both? Doesn't the gauge also count the number of occurrences? I think we discussed that, sorry if I keep forgetting 😅

Yes, both are needed. See https://pkg.go.dev/github.com/prometheus/client_golang/prometheus?utm_source=godoc#Gauge; I was planning to use this to record only the latest aggregation duration.

You might be thinking of a histogram, and I'm not sure if that's needed for this. It seems more suitable for counting request count/duration rather than aggregation count/duration.
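To make the distinction in this reply concrete, here is a small stdlib-only sketch contrasting the two semantics. `lastValueGauge` and `simpleHistogram` are hypothetical types written for this illustration, not the Prometheus or Kubernetes implementations: the gauge keeps only the most recent value, while the histogram accumulates a count, a sum, and cumulative buckets.

```go
package main

import "fmt"

// lastValueGauge mimics gauge semantics: Set overwrites the previous value,
// so no history and no occurrence count is retained.
type lastValueGauge struct{ value float64 }

func (g *lastValueGauge) Set(v float64) { g.value = v }

// simpleHistogram mimics histogram semantics: every observation is counted,
// summed, and placed into cumulative buckets.
type simpleHistogram struct {
	bounds  []float64 // bucket upper bounds, ascending
	buckets []uint64  // buckets[i] counts observations <= bounds[i]
	count   uint64
	sum     float64
}

func newSimpleHistogram(bounds ...float64) *simpleHistogram {
	return &simpleHistogram{bounds: bounds, buckets: make([]uint64, len(bounds))}
}

// Observe records one duration in the histogram.
func (h *simpleHistogram) Observe(v float64) {
	h.count++
	h.sum += v
	for i, b := range h.bounds {
		if v <= b {
			h.buckets[i]++
		}
	}
}

// Mean returns the average of all observations, or 0 if there are none.
func (h *simpleHistogram) Mean() float64 {
	if h.count == 0 {
		return 0
	}
	return h.sum / float64(h.count)
}

func main() {
	g := &lastValueGauge{}
	h := newSimpleHistogram(0.5, 1.0)

	// Three made-up aggregation durations, in seconds.
	for _, d := range []float64{0.25, 0.5, 0.75} {
		g.Set(d)
		h.Observe(d)
	}

	fmt.Println("gauge (latest only):", g.value)
	fmt.Println("histogram count:", h.count, "mean:", h.Mean())
}
```

With a gauge alone, the number of aggregations cannot be recovered, which is why a separate counter would be needed next to it.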

Gauge seems like the wrong choice here? Aren't we trying to gather durations so that we can get average/median times?

In a sense. Gathering durations for something like API requests makes sense because we have multiple requests for the same discovery doc, so they can be bucketed.

For reaggregation, an unchanged discovery doc will only have one data point. Realistically, reaggregation only happens when CRDs are modified, and the time should be proportional to the number of custom resources. I think we really only care about the latest time after all the CRDs are installed (final state). Taking the average doesn't make as much sense because we don't really care about the duration for partial states (e.g., half the CRDs are applied).

(I'm using CRD/aggregated apiservers synonymously but the same logic applies to both)

Discussed offline, dropping the duration metric since the number is negligible and we will have pretty good visibility with the regeneration counter as well as the request duration instrumentation.

apelisse · 2023-03-08T21:23:45Z

thanks
/lgtm
/approve

k8s-ci-robot · 2023-03-08T21:23:53Z

LGTM label has been added.

Git tree hash: 8dcb607fdeed872020cfdeec3a8729728d3bd392

Jefftree · 2023-03-08T21:52:26Z

/test pull-kubernetes-integration
Flake: #116364

staging/src/k8s.io/apiserver/pkg/endpoints/discovery/aggregated/handler.go

deads2k · 2023-03-09T14:11:50Z

just the question about component, lgtm otherwise.

Jefftree · 2023-03-09T17:25:04Z

Updated, thanks!

deads2k · 2023-03-10T14:18:10Z

/lgtm
/approve

k8s-ci-robot · 2023-03-10T14:18:17Z

LGTM label has been added.

Git tree hash: e7418e0975caaf033e2c21ba45eea3de81c045f2

k8s-ci-robot · 2023-03-10T14:18:37Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: apelisse, deads2k, Jefftree

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~cmd/kube-apiserver/OWNERS~~ [deads2k]
~~staging/src/k8s.io/apiserver/OWNERS~~ [deads2k]
~~staging/src/k8s.io/kube-aggregator/OWNERS~~ [deads2k]
~~test/integration/apiserver/OWNERS~~ [Jefftree,apelisse,deads2k]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot · 2023-03-10T15:16:32Z

@Jefftree: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-kubernetes-unit | `387d976` | link | unknown | `/test pull-kubernetes-unit` |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Jefftree · 2023-03-10T15:28:36Z

/retest
Flake: #107414

k8s-ci-robot requested a review from alexzielenski February 8, 2023 21:13

k8s-ci-robot added the release-note label Feb 8, 2023

k8s-ci-robot requested a review from apelisse February 8, 2023 21:13

k8s-ci-robot added the triage/accepted label and removed the needs-triage label Feb 9, 2023

Jefftree force-pushed the agg-discovery-metrics branch from 4b1a190 to 3cbb5cd Compare February 9, 2023 17:48

k8s-ci-robot requested a review from deads2k February 15, 2023 18:32

Jefftree mentioned this pull request Feb 27, 2023

Enable Aggregated Discovery for Beta #116108

Merged

apelisse reviewed Feb 27, 2023

staging/src/k8s.io/apiserver/pkg/endpoints/discovery/aggregated/etag.go (outdated, resolved)

alexzielenski reviewed Feb 27, 2023

staging/src/k8s.io/apiserver/pkg/endpoints/discovery/aggregated/metrics.go (outdated, resolved)

k8s-ci-robot added the needs-rebase, area/test, and sig/testing labels Mar 2, 2023

Jefftree force-pushed the agg-discovery-metrics branch from a48d511 to 610d1d0 Compare March 2, 2023 22:44

k8s-ci-robot removed the needs-rebase label Mar 2, 2023

Jefftree force-pushed the agg-discovery-metrics branch 2 times, most recently from 92a29e5 to 29e6d68 Compare March 3, 2023 16:24

apelisse reviewed Mar 7, 2023

Jefftree force-pushed the agg-discovery-metrics branch 2 times, most recently from aee1c51 to 02a110f Compare March 8, 2023 20:58

k8s-ci-robot assigned apelisse Mar 8, 2023

k8s-ci-robot added the lgtm label Mar 8, 2023

deads2k reviewed Mar 9, 2023

staging/src/k8s.io/apiserver/pkg/endpoints/discovery/aggregated/handler.go (outdated, resolved)

Add metrics for aggregated discovery

387d976

Jefftree force-pushed the agg-discovery-metrics branch from 02a110f to 387d976 Compare March 9, 2023 17:24

k8s-ci-robot removed the lgtm label Mar 9, 2023

k8s-ci-robot requested a review from apelisse March 9, 2023 17:24

k8s-ci-robot assigned deads2k Mar 10, 2023

k8s-ci-robot added the lgtm label Mar 10, 2023

k8s-ci-robot added the approved label Mar 10, 2023

k8s-ci-robot merged commit 2e3c500 into kubernetes:master Mar 10, 2023

k8s-ci-robot added this to the v1.27 milestone Mar 10, 2023

Jefftree mentioned this pull request Mar 10, 2023

Aggregated Discovery kubernetes/enhancements#3352

Open

11 tasks

Jefftree deleted the agg-discovery-metrics branch March 21, 2023 15:32
