Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add metrics for aggregated discovery #115630

Merged
merged 1 commit into from Mar 10, 2023

Conversation

Jefftree
Copy link
Member

@Jefftree Jefftree commented Feb 8, 2023

/kind feature

What this PR does / why we need it:

Add metrics for aggregated discovery. Adds metrics for # of requests split by status code, number of times discovery cache was aggregated, and the duration for aggregation.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Changed metrics for aggregated discovery to publish new time series (alpha).

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


/cc @apelisse @alexzielenski

@k8s-ci-robot k8s-ci-robot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Feb 8, 2023
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. area/apiserver sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Feb 8, 2023
@cici37
Copy link
Contributor

cici37 commented Feb 9, 2023

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 9, 2023
@sftim
Copy link
Contributor

sftim commented Feb 13, 2023

The changelog entry doesn't look right. Try this:

-Yes, additional metrics are published for aggregated discovery
+Changed metrics for aggregated discovery to publish new time series (alpha).

@Jefftree
Copy link
Member Author

The changelog entry doesn't look right. Try this:

-Yes, additional metrics are published for aggregated discovery
+Changed metrics for aggregated discovery to publish new time series (alpha).

ack, updated.

@Jefftree
Copy link
Member Author

/cc @deads2k

@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. area/test sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Mar 2, 2023
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 2, 2023
@Jefftree Jefftree force-pushed the agg-discovery-metrics branch 2 times, most recently from 92a29e5 to 29e6d68 Compare March 3, 2023 16:24
StabilityLevel: metrics.ALPHA,
},
)
regenerationDurationGauge = metrics.NewGauge(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need both? Doesn't the gauge also count the number of occurences? I think we discussed that, sorry if I keep forgetting 😅

Copy link
Member Author

@Jefftree Jefftree Mar 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes both are needed. https://pkg.go.dev/github.com/prometheus/client_golang/prometheus?utm_source=godoc#Gauge, I was planning to use this to record only the latest aggregation duration.

You might be thinking of histogram, and I'm not sure if that's needed for this. It seems more suitable for counting request count/duration rather than aggregation count/duration.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gauge seems like the wrong choice here? Aren't we trying to gather duration that we'll want to get averages/median times?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a sense. Gathering durations for something like api requests make sense because we have multiple request for the same discovery doc so they can be bucketed.

For reaggregation, an unchanged discovery doc will only have one data point. Realistically reaggregation only happen when CRDs are modified and the time should be proportional to the # custom resources. I think we really only care about the latest time after all the CRDs are installed (final state). Taking the average doesn't make as much sense because we don't really care about the duration for partial states (eg: half the CRDs are applied).

(I'm using CRD/aggregated apiservers synonymously but the same logic applies to both)

Copy link
Member Author

@Jefftree Jefftree Mar 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline, dropping the duration metric since the number is negligible and we will have pretty good visibility with the regeneration counter as well as the request duration instrumentation.

@Jefftree Jefftree force-pushed the agg-discovery-metrics branch 2 times, most recently from aee1c51 to 02a110f Compare March 8, 2023 20:58
@apelisse
Copy link
Member

apelisse commented Mar 8, 2023

thanks
/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 8, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 8dcb607fdeed872020cfdeec3a8729728d3bd392

@Jefftree
Copy link
Member Author

Jefftree commented Mar 8, 2023

/test pull-kubernetes-integration
Flake: #116364

@deads2k
Copy link
Contributor

deads2k commented Mar 9, 2023

just the question about component, lgtm otherwise.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 9, 2023
@Jefftree
Copy link
Member Author

Jefftree commented Mar 9, 2023

Updated, thanks!

@deads2k
Copy link
Contributor

deads2k commented Mar 10, 2023

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 10, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: e7418e0975caaf033e2c21ba45eea3de81c045f2

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: apelisse, deads2k, Jefftree

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 10, 2023
@k8s-ci-robot
Copy link
Contributor

@Jefftree: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-kubernetes-unit 387d976 link unknown /test pull-kubernetes-unit

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@Jefftree
Copy link
Member Author

/retest
Flake: #107414

@k8s-ci-robot k8s-ci-robot merged commit 2e3c500 into kubernetes:master Mar 10, 2023
@k8s-ci-robot k8s-ci-robot added this to the v1.27 milestone Mar 10, 2023
@Jefftree Jefftree deleted the agg-discovery-metrics branch March 21, 2023 15:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants