Add KEP for cAdvisor-less, CRI-full Container and Pod Stats #2364
Welcome @haircommander!
Hi @haircommander. Thanks for your PR. I'm waiting for a Kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected in the PR's labels. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
First time doing this, not sure if I did everything right :) /cc @bobbypage (esteemed coauthor)
/ok-to-test
I like the general idea and would support the efforts :)
Summary API has two interfaces:
* [cAdvisor stats provider](https://github.com/kubernetes/kubernetes/blob/release-1.20/pkg/kubelet/stats/cadvisor_stats_provider.go)
  * Calls cAdvisor directly to obtain node, pod, and container stats
* [CRI stats provider](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/stats/cri_stats_provider.go#L54)
nit: link to a commit or release branch to not implicitly invalidate the link in the future.
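To make the split between the two providers concrete, here is a minimal Go sketch of a Summary-API-style consumer that depends only on a shared interface, with one implementation reading from cAdvisor and one reading from the runtime over CRI (the end state this KEP proposes). Every name in it (`PodStats`, `StatsProvider`, `rawStats`, `cadvisorProvider`, `criProvider`) is hypothetical and heavily trimmed; it is not the Kubelet's actual `stats` package API.

```go
package main

import (
	"fmt"
	"sort"
)

// PodStats is a hypothetical, heavily trimmed stand-in for a Summary API pod
// stats entry; the real object carries many more fields.
type PodStats struct {
	Name            string
	CPUNanoCores    uint64
	WorkingSetBytes uint64
	Source          string // which backend produced the numbers
}

// StatsProvider is a hypothetical interface shared by both providers, so the
// Summary API consumer does not care where the stats come from.
type StatsProvider interface {
	ListPodStats() []PodStats
}

// rawStats is a hypothetical backend snapshot keyed by pod name:
// [0] = CPU usage in nanocores, [1] = working set in bytes.
type rawStats map[string][2]uint64

// cadvisorProvider reads everything from cAdvisor's cgroup-based collection.
type cadvisorProvider struct{ cadvisor rawStats }

// criProvider reads pod and container stats from the runtime over CRI
// (the proposed end state, not today's mixed CRI/cAdvisor behavior).
type criProvider struct{ runtime rawStats }

// toPodStats converts a snapshot into PodStats sorted by pod name, because Go
// map iteration order is randomized.
func toPodStats(src rawStats, source string) []PodStats {
	names := make([]string, 0, len(src))
	for name := range src {
		names = append(names, name)
	}
	sort.Strings(names)
	out := make([]PodStats, 0, len(names))
	for _, name := range names {
		v := src[name]
		out = append(out, PodStats{Name: name, CPUNanoCores: v[0], WorkingSetBytes: v[1], Source: source})
	}
	return out
}

func (p cadvisorProvider) ListPodStats() []PodStats { return toPodStats(p.cadvisor, "cAdvisor") }
func (p criProvider) ListPodStats() []PodStats      { return toPodStats(p.runtime, "CRI") }

func main() {
	snapshot := rawStats{"web": {250000000, 64 << 20}, "db": {500000000, 256 << 20}}
	for _, p := range []StatsProvider{cadvisorProvider{snapshot}, criProvider{snapshot}} {
		for _, s := range p.ListPodStats() {
			fmt.Printf("%-8s %-4s cpu=%dn ws=%dB\n", s.Source, s.Name, s.CPUNanoCores, s.WorkingSetBytes)
		}
	}
}
```

Because the consumer only sees `StatsProvider`, swapping the backend is a construction-time decision rather than a change to every caller.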
| | InodesFree | cAdvisor or N/A | cAdvisor or N/A |
| | Inodes | cAdvisor or N/A | cAdvisor or N/A |
| | InodesUsed | CRI | CRI |
| UserDefinedMetrics | All Fields | cAdvisor | N/A |
Are we proposing to remove them completely?
That depends on how we introduce these changes. As currently defined, the proposal is that we don't implement them in the CRI API. I believe they're deemed less useful because users can register custom metrics with Prometheus in a more idiomatic way, and UserDefinedMetrics were designed for a Prometheus-less world.
If these changes are absorbed into the Kubelet by a new stats provider (leaving the old CRI stats provider intact), then the new one would simply not support user-defined metrics.
If these changes are absorbed into the Kubelet by adapting the old CRI stats provider, we would likely have to scrape these metrics from cAdvisor.
I'm generally more in favor of the former option. I believe there's a TODO below noting that we should decide how to handle it.
I've updated the field to be "cAdvisor or N/A" to better reflect that intention.
e92368d to 3df7def
* There are some fields that are defined in SummaryAPI (e.g. time in a lot). Do we add those?
* How do we tell cAdvisor to not collect container metrics?
  * Give it a bogus root?
  * Flag?
How about a configuration file as a third option?
I can't find evidence there is a config file for cAdvisor. @bobbypage is this an option?
I added "cAdvisor configuration option", which could mean a config file if possible, or the config struct used at cAdvisor startup.
cAdvisor doesn't currently support file-based config. Adding something like that may make sense, or we could simply add a flag if needed, similar to how the kubelet configures cAdvisor options today: https://github.com/kubernetes/kubernetes/blob/5d6dc8d/pkg/kubelet/cadvisor/cadvisor_linux.go#L70-L76
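The "flag" option from this thread could look roughly like the following Go sketch: a start-up switch that narrows collection to the root cgroup so that only (roughly) node-level stats are gathered. The flag name `-disable_container_stats`, the `collectorConfig` type, and both helpers are hypothetical; cAdvisor does not necessarily expose such a flag today.

```go
package main

import (
	"flag"
	"fmt"
)

// collectorConfig is a hypothetical subset of cAdvisor start-up options.
type collectorConfig struct {
	// skipContainerStats is a hypothetical switch: when true, only the root
	// cgroup is watched and per-container cgroups are skipped.
	skipContainerStats bool
}

// parseConfig registers the hypothetical flag on a private FlagSet so the
// sketch does not touch the process-wide flag.CommandLine.
func parseConfig(args []string) (collectorConfig, error) {
	var cfg collectorConfig
	fs := flag.NewFlagSet("cadvisor-sketch", flag.ContinueOnError)
	fs.BoolVar(&cfg.skipContainerStats, "disable_container_stats", false,
		"hypothetical: collect node-level stats only")
	err := fs.Parse(args)
	return cfg, err
}

// cgroupsToWatch returns the cgroup paths the collector would track.
func cgroupsToWatch(cfg collectorConfig, all []string) []string {
	if cfg.skipContainerStats {
		return []string{"/"} // root cgroup only
	}
	return all
}

func main() {
	all := []string{"/", "/kubepods/pod1/ctr-a", "/kubepods/pod1/ctr-b"}
	cfg, err := parseConfig([]string{"-disable_container_stats"})
	if err != nil {
		panic(err)
	}
	fmt.Println(cgroupsToWatch(cfg, all)) // prints [/]
}
```

Compared with giving cAdvisor a bogus root, an explicit switch keeps the root cgroup (and so node-level stats) intact and makes the intent visible in configuration.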
Thank you for the ping, taking a look at this.
/cc
@fuweid: GitHub didn't allow me to request PR reviews from the following users: fuweid. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.
Signed-off-by: Peter Hunt <pehunt@redhat.com>
/approve
PRR looks good
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dchen1107, ehashman, haircommander The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
I think all of Dawn's comments have been addressed so... /lgtm
A note for SIG Windows (@kubernetes/sig-windows-leads): following an offline discussion with @bobbypage, there are still open questions about Windows-specific fields. Since we haven't heard feedback from SIG Windows, we decided to move forward but treat them as part of our alpha investigation. Please raise your concerns and suggestions with us.
xref: initial kubelet implementation in kubernetes/kubernetes#103095
Super excited to see movement on this!
I don't really see any consideration for cgroups v2 in this KEP. The reason I say this is that several of the metrics required in this KEP were deemed optional (because they are unavailable in cgroups v2) in @giuseppe's cgroups v2 proposal, and I don't see the ramifications of that reflected in this KEP. That shouldn't block this work from going alpha, but I feel it's something that will need to be figured out for beta.
Other large metrics efforts like this also took the kubernetes-mixin (by far the most popular collection of recording and alerting rules and dashboards using these metrics) into consideration, in case any metrics do end up changing (I realize the goal is that none do, but I find it hard to believe that 100% will stay unchanged). I think it would be good to keep this in mind for this work as well.
3. Add support for the new CRI additions in supported container runtimes (CRI-O and containerd).
4. Switch Kubelet's CRI stats provider from querying container and pod level stats from cAdvisor to newly added CRI pod and container level stats
5. cAdvisor should stop collecting container and pod level stats. If any other components need container or pod level stats from cAdvisor, the CRI implementation should be queried instead.
Similarly to "Graduation criteria", this should state "cAdvisor should be updated to support no longer collecting stats that are duplicated with CRI implementation", as cAdvisor supports a much larger set of container statistics than CRI is planned to support.
### /metrics/cadvisor

1. Expose the metric fields provided in `/metrics/cadvisor` in an analogous Prometheus endpoint directly from the CRI implementation.
2. cAdvisor should stop collecting container and pod level stats, as well as stop broadcasting from `/metrics/cadvisor`.
Similarly to "Graduation criteria", this should state "cAdvisor should be updated to support no longer collecting stats that are duplicated with CRI implementation", as cAdvisor supports a much larger set of container statistics than CRI is planned to support.
Good idea, I've opened #3003. PTAL.