kubelet: add support for broadcasting metrics from CRI #113609

Merged
merged 5 commits into from Nov 8, 2022

Conversation

@haircommander haircommander commented Nov 3, 2022

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR includes #113025, as well as support in the kubelet for pulling the metrics from the CRI and broadcasting them to Prometheus.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

There are two open questions where I need input from someone who knows more about how these endpoints work:

  • Should the collection be async for each metric?
  • Is it idiomatic to return the key and value of each label as a map pair/struct? Most examples I see either use constant labels or rebuild each slice on every call, which seems inefficient to me.

Does this PR introduce a user-facing change?

Add alpha support for returning container and pod metrics from CRI, instead of cAdvisor

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/issues/2371

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Nov 3, 2022
@k8s-ci-robot k8s-ci-robot added area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Nov 3, 2022
@haircommander
Contributor Author

/sig node
/priority important-soon

@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Nov 3, 2022
@haircommander
Contributor Author

I was able to get a successful e2e by using https://github.com/haircommander/cri-o/tree/metrics-wip together with this PR. Running a local cluster, I could curl the kubelet's HTTP endpoint with `sudo curl -kv http://10.0.2.15:10255/metrics/cadvisor`, and it returned the metrics cri-o was spoofing 😄


message Metric {
    string name = 1;
    string help = 2;
Member

Are we sure we want help here? This is a bit unfortunate, because it means we need to send the help text with every single metric, even though help is identical for all metrics that share the same name. It seems like this will increase the RPC payload size a lot due to the duplication (since the metric is per container).

Maybe we can omit this for now and figure out the best way to include it later?
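
To make the duplication concern concrete, here is a minimal sketch that compares the help-text bytes sent when every sample carries its own help string against sending each help string once per metric name. The `metricSample` type and both function names are hypothetical and only illustrate the arithmetic; they are not the CRI proto types from this PR.

```go
package main

// metricSample is a hypothetical stand-in for one per-container metric
// as discussed in this review thread. It is not the CRI proto type.
type metricSample struct {
	Name  string
	Help  string
	Value float64
}

// helpBytesInline counts the help-text bytes sent when every sample
// carries its own copy of the help string.
func helpBytesInline(samples []metricSample) int {
	total := 0
	for _, s := range samples {
		total += len(s.Help)
	}
	return total
}

// helpBytesDeduplicated counts the help-text bytes sent when each
// distinct metric name ships its help string only once.
func helpBytesDeduplicated(samples []metricSample) int {
	seen := make(map[string]bool)
	total := 0
	for _, s := range samples {
		if !seen[s.Name] {
			seen[s.Name] = true
			total += len(s.Help)
		}
	}
	return total
}
```

With one metric name reported for N containers, the inline variant grows linearly with N while the deduplicated variant stays constant.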

pkg/kubelet/metrics/collectors/cri_metrics.go (outdated, resolved)
staging/src/k8s.io/cri-api/pkg/apis/runtime/v1/api.proto (outdated, resolved)
pkg/kubelet/metrics/collectors/cri_metrics.go (outdated, resolved)

@LuBingtan LuBingtan left a comment

nice work :)

@k8s-ci-robot k8s-ci-robot added sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. labels Nov 4, 2022
@haircommander
Contributor Author

A couple of highlights from the updates:

  • Separated the CRI calls into ListMetricDescriptors and ListPodSandboxMetrics. This reduces overhead, because the descriptor information can be fetched once and then cached.
  • Added a NewConstMetric call and moved to component-base metrics instead of using Prometheus directly
    • I went for the INTERNAL metric type to start (WDYT @bobbypage)
  • Did a bit of cleanup here and there
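
The fetch-once-then-cache idea from the first bullet could look roughly like the sketch below. It assumes descriptors carry the static parts of a metric (name, help, label keys) and samples carry only the name, label values, and value. The `descriptor`, `sample`, and `descriptorCache` types are hypothetical simplifications, and the `list` callback stands in for a ListMetricDescriptors call; none of this is the kubelet's actual collector code.

```go
package main

import (
	"fmt"
	"sync"
)

// descriptor is a hypothetical, simplified view of a metric descriptor:
// the static parts of a metric that do not change between scrapes.
type descriptor struct {
	Name      string
	Help      string
	LabelKeys []string
}

// sample is a hypothetical per-scrape value that refers to its
// descriptor by name and carries only the label values.
type sample struct {
	Name        string
	LabelValues []string
	Value       float64
}

// descriptorCache fetches descriptors once through list and reuses them.
type descriptorCache struct {
	mu     sync.Mutex
	list   func() ([]descriptor, error) // stand-in for the descriptor RPC
	byName map[string]descriptor
}

func newDescriptorCache(list func() ([]descriptor, error)) *descriptorCache {
	return &descriptorCache{list: list}
}

// lookup returns the cached descriptor for name, filling the cache on
// first use. A failed fetch leaves the cache empty so it is retried.
func (c *descriptorCache) lookup(name string) (descriptor, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.byName == nil {
		descs, err := c.list()
		if err != nil {
			return descriptor{}, err
		}
		c.byName = make(map[string]descriptor, len(descs))
		for _, d := range descs {
			c.byName[d.Name] = d
		}
	}
	d, ok := c.byName[name]
	if !ok {
		return descriptor{}, fmt.Errorf("no descriptor for metric %q", name)
	}
	return d, nil
}

// labels pairs the cached label keys with a sample's label values.
func (c *descriptorCache) labels(s sample) (map[string]string, error) {
	d, err := c.lookup(s.Name)
	if err != nil {
		return nil, err
	}
	if len(d.LabelKeys) != len(s.LabelValues) {
		return nil, fmt.Errorf("metric %q: %d label keys but %d label values",
			s.Name, len(d.LabelKeys), len(s.LabelValues))
	}
	out := make(map[string]string, len(d.LabelKeys))
	for i, k := range d.LabelKeys {
		out[k] = s.LabelValues[i]
	}
	return out, nil
}
```

Keeping the keys in the descriptor and only the values in each sample means the per-scrape payload carries no repeated key or help strings, at the cost of a length check when the two are joined.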

PTAL @dashpole @bobbypage

@haircommander haircommander force-pushed the sandbox-metrics branch 2 times, most recently from 5f6bd86 to 7a71cbf Compare November 4, 2022 20:02
@haircommander
Contributor Author

Thanks for the review @bobbypage, comments addressed.

Member

@derekwaynecarr derekwaynecarr left a comment

/approve
/lgtm

We need to measure performance before promoting this to beta.

r.RawMustRegister(metrics.NewPrometheusMachineCollector(prometheusHostAdapter{s.host}, includedMetrics))
if utilfeature.DefaultFeatureGate.Enabled(features.PodAndContainerStatsFromCRI) {
Member

This looks sufficiently isolated to make it safe for alpha.

All the code falls back to the existing behavior when the feature gate is not enabled.
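
The isolation described here can be pictured with the small sketch below: the new collector is chosen only when the gate is on, and the existing one is kept otherwise. The function name and the collector names are hypothetical placeholders, and the `else` branch is an assumption based on the comment above, since the quoted diff shows only the `if` line.

```go
package main

// selectCollectors returns the names of the collectors that would be
// registered. The names are hypothetical placeholders; criStatsEnabled
// stands in for the PodAndContainerStatsFromCRI feature gate check.
func selectCollectors(criStatsEnabled bool) []string {
	// The machine collector is registered regardless of the gate.
	collectors := []string{"machine"}
	if criStatsEnabled {
		// Alpha path: container and pod metrics come from the CRI.
		collectors = append(collectors, "cri-metrics")
	} else {
		// Assumed fallback: keep the existing cAdvisor-backed collector.
		collectors = append(collectors, "cadvisor-containers")
	}
	return collectors
}
```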

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 8, 2022
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dashpole, derekwaynecarr, haircommander


@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 8, 2022
@dims
Member

dims commented Nov 8, 2022

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 8, 2022
@bobbypage
Member

Thank you for the updates!

/lgtm

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 8, 2022
@k8s-ci-robot k8s-ci-robot removed lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Nov 8, 2022
haircommander and others added 5 commits November 8, 2022 14:47
so that a caller can use the metrics.Metric structure but still handle
errors

Signed-off-by: Peter Hunt <pehunt@redhat.com>
Added new gRPC call 'ListPodSandboxMetrics' which would return additional
container stats currently supported by cAdvisor, but outside the scope
of /stats/summary api. Added new types to support metric exporting of
prometheus, including Metric and other subfields. Added fake runtime
changes associated with the CRI changes.
Signed-off-by: Peter Hunt <pehunt@redhat.com>
that pulls metrics from the CRI

Signed-off-by: Peter Hunt <pehunt@redhat.com>
Signed-off-by: Peter Hunt <pehunt@redhat.com>
@dims
Member

dims commented Nov 8, 2022

re-apply LGTM

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 8, 2022
@bobbypage
Member

/lgtm

@haircommander
Contributor Author

/retest

@k8s-ci-robot k8s-ci-robot merged commit b4040b3 into kubernetes:master Nov 8, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.26 milestone Nov 8, 2022
9 participants