Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add kubelet metrics for ephemeral containers #99000

Merged
merged 2 commits into from Jul 8, 2021

Conversation

verb
Copy link
Contributor

@verb verb commented Feb 11, 2021

What type of PR is this?

/kind feature
/sig node
/priority important-soon

What this PR does / why we need it:

This adds alpha metrics to the kubelet that track pods and containers under management and counters for container creation. This is part of the Ephemeral Containers (kubernetes/enhancements#277) PRR.

Which issue(s) this PR fixes:

Fixes #97974

Special notes for your reviewer:

As part of code cleanup this slightly changes the format of log messages of particular container types. This isn't necessary and can be reverted, but using the same label in log messages in metrics seems cleaner and more supportable.

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://git.k8s.io/enhancements/keps/sig-node/277-ephemeral-containers

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/kubelet sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. labels Feb 11, 2021
@verb
Copy link
Contributor Author

verb commented Feb 11, 2021

/retest

@verb
Copy link
Contributor Author

verb commented Feb 11, 2021

I was going to ask @ehashman for advice on these metrics, but I just found sig-instrumentation docs that make me think that I've got it approximately right.

I'll go ahead finish up my TODO to add tests, remove the WIP hold and then submit it for review.

@ehashman ehashman added this to Triage in SIG Node PR Triage Feb 11, 2021
@ehashman ehashman moved this from Triage to Waiting on Author in SIG Node PR Triage Feb 11, 2021
@ehashman
Copy link
Member

Let me know when this is ready for review and I'll TAL.

@verb verb force-pushed the 1.21-kubelet-metrics branch 2 times, most recently from 314b071 to 437ec78 Compare February 12, 2021 13:13
@verb verb changed the title WIP: Add kubelet managed pod metrics Add kubelet managed pod metrics Feb 12, 2021
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 12, 2021
@verb
Copy link
Contributor Author

verb commented Feb 12, 2021

/retest

@verb
Copy link
Contributor Author

verb commented Feb 12, 2021

Looks like timeouts that are unrelated to this change

/retest

@kikisdeliveryservice kikisdeliveryservice added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Feb 12, 2021
@k8s-ci-robot k8s-ci-robot removed the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Feb 12, 2021
@kikisdeliveryservice
Copy link
Member

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 12, 2021
@verb
Copy link
Contributor Author

verb commented May 25, 2021

@dashpole Thanks for the guidance here. No, I don't need to compare these two. For ephemeral containers I really only want to answer two questions:

  1. How many ephemeral containers are running on this node?
  2. How many ephemeral containers has this node started? were there errors?

The kubelet discards container type information, so the pod manager was the most convenient place to track the first. It's still unclear on what we want to measure. I plan on bringing it up at tonight's sig-node meetup.

@Random-Liu
Copy link
Member

Random-Liu commented May 25, 2021

I always treated "pod manager" a local cache of the pods in apiserver, thus it contains the "desired state", and no much different with counting pods from the apiserver.

However, I do agree with the point brought up in the meeting that, having the pods in the apiserver doesn't necessarily mean that kubelet has seen it:

  1. There might be a bug that kubelet doesn't see a pod;
  2. There might be a delay before kubelet sees the pod.

So in theory, the "pod manager" does contain some fact about the actual state - "whether kubelet has seen the pods".

However, it is not quite clear to me how useful it is to introduce metrics counting pods in the pod manager.
For case 1), if it happens it is a bug we should troubleshoot, I think the pod events/status and kubelet log is a more useful indication of the actual issue, than comparing the pod number between apiserver and managed pods.
For case 2), if the delay is huge, it is also a bug we should troubleshoot, and the pod events/status and kubelet log can be used to debug the issue, just counting the number may not be that useful; if the delay is minimal, the metrics is almost the same with the counting on the apiserver side.

Given so, I'm not sure adding metrics for managed_pods actually brings more value than simply counting objects on the apiserver side.

The started containers/pods make more sense to me, because it exposes more "actual state" from kubelet, and I guess you are going to use it to measure "container creation rate" and "error rate"?

Please note that once we add these metrics, it becomes part of kubelet api and hard to deprecate. people may start depending on it, and may complain about the metrics don't always match the actual pods in the apiserver, like what happened before for running_pods. #99624

The initial problem we are trying to solve does make sense, can we limit the scope of this change to #97974.

@ehashman
Copy link
Member

ehashman commented Jun 3, 2021

Waiting on changes per the above from @Random-Liu, thanks @verb for all the work you've done here :)

@ehashman ehashman moved this from Needs Reviewer to Waiting on Author in SIG Node PR Triage Jun 3, 2021
@verb
Copy link
Contributor Author

verb commented Jun 4, 2021

Ah, thanks for pinging this. If I understand correctly, both @Random-Liu and @dchen1107 are in favor of changing the "managed {containers,pods}" gauge to be narrowly scoped for ephemeral containers, and @derekwaynecarr thought the additional information for all types would be useful.

I would want metrics at every stage of an automation system. I agree that logs are better for debugging kubelet problems, but these metrics are for observing cluster state rather than debugging problems with the kubelet. Surfacing information on the kubelet's internal state, even if it is theoretically available from the API server, is useful. The API server may be using different feature flags or perhaps it's unreachable.

I think @dchen1107's point was that solving this problem deserves a full design rather than a hasty solution to meet the PRR requirements for a single feature, so we should add a metric that's specific to that PRR. On the other hand, it seems strange to me to provide this information for ephemeral containers and withhold it for other types of containers.

I can think of 3 different ways forward:

  1. Add container_type to the existing running_containers metrics. This is more monitoring driving design, so let's avoid it.
  2. All of these metrics are marked alpha. We could continue to add alpha metrics and that way, when someone has time to design the kubelet metrics API, there's a ready list of metrics that people actually want with no commitment to support them indefinitely.
  3. Add a managed_ephemeral_containers metric, as suggested by @Random-Liu.

I like the idea of there being some mechanism of iterating on unstable metrics (as in the second), but I don't have time to work on that right now, so I'll just plan on the third option.

This replaces the generic ManagedPod and ManagedContainer kubelet
metrics with a gauge to track only ephemeral container usage.
@verb verb changed the title Add kubelet managed pod metrics WIP: Add kubelet metrics for ephemeral containers Jun 15, 2021
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 15, 2021
@verb
Copy link
Contributor Author

verb commented Jun 16, 2021

/retest

@verb verb changed the title WIP: Add kubelet metrics for ephemeral containers Add kubelet metrics for ephemeral containers Jun 16, 2021
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 16, 2021
@verb
Copy link
Contributor Author

verb commented Jun 16, 2021

@ehashman PTAL! Thanks 😄

@ehashman ehashman moved this from Waiting on Author to Needs Reviewer in SIG Node PR Triage Jun 23, 2021
Copy link
Member

@ehashman ehashman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@@ -431,6 +445,54 @@ var (
},
[]string{"container_state"},
)
// StartedPodsTotal is a counter that tracks pod sandbox creation operations
StartedPodsTotal = metrics.NewCounter(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: this one is NewCounter but the rest of these new metrics are NewCounterVec because this one doesn't have any additional labels (e.g. container type, error).

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 23, 2021
@ehashman ehashman moved this from Needs Reviewer to Needs Approver in SIG Node PR Triage Jun 23, 2021
@verb
Copy link
Contributor Author

verb commented Jun 25, 2021

/assign @klueska

@verb
Copy link
Contributor Author

verb commented Jun 29, 2021

Hi @ehashman, I see you added this to the "Needs Approver" board. Is there a process for assigning the Approver or should I seek one out? Thanks!

@verb
Copy link
Contributor Author

verb commented Jul 5, 2021

/assign @dchen1107

@dchen1107
Copy link
Member

/lgtm
/approve

Thanks for narrow down the scope to the ephemeral containers, and renamed some of metrics per reviewers' and SIG Node's suggests. Also two reviewers from SIG instruments are ok with the small overlap even before the narrow down.

However, it is not quite clear to me how useful it is to introduce metrics counting pods in the pod manager.
For case 1), if it happens it is a bug we should troubleshoot, I think the pod events/status and kubelet log is a more useful indication of the actual issue, than comparing the pod number between apiserver and managed pods.
For case 2), if the delay is huge, it is also a bug we should troubleshoot, and the pod events/status and kubelet log can be used to debug the issue, just counting the number may not be that useful; if the delay is minimal, the metrics is almost the same with the counting on the apiserver side.

Discussed this at SIG Node a while back, and here is my take on this very topic: Logging and Metrics are serving the different purpose for observability and debuggability. Metrics are usually measured a time series, and can be optimized for the storage and longer retention of data, and easily build dashboards to reflect historical trends and be aggregated into daily or weekly frequency. This is why normally admins / SREs prefer to use metrics to detect the abnormal states, and the developer looking into the logs for the troubleshooting.

Back to this particular PR, I have to admit that the current metrics being introduced here might be the best effort. In a real environment, kubelet might be restarted due to different reasons. Once the kubelet is restarted, the count of the container starts with error would be lost, even the current running containers & pods can be re-generated through the current state. Just want to point it out this.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dchen1107, verb

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 8, 2021
@ehashman ehashman moved this from Needs Approver to Done in SIG Node PR Triage Jul 8, 2021
@ehashman
Copy link
Member

ehashman commented Jul 8, 2021

/milestone v1.22

@k8s-ci-robot k8s-ci-robot added this to the v1.22 milestone Jul 8, 2021
@k8s-ci-robot k8s-ci-robot merged commit 7c84064 into kubernetes:master Jul 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note-none Denotes a PR that doesn't merit a release note. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Development

Successfully merging this pull request may close these issues.

Create kubelet metrics for ephemeral containers creation and usage
9 participants