
cAdvisor-less, CRI-full Container and Pod Stats #2371

Open
10 of 17 tasks
Tracked by #278
haircommander opened this issue Jan 29, 2021 · 136 comments
Labels
sig/node Categorizes an issue or PR as relevant to SIG Node. sig/windows Categorizes an issue or PR as relevant to SIG Windows. stage/beta Denotes an issue tracking an enhancement targeted for Beta status

Comments

@haircommander
Contributor

haircommander commented Jan 29, 2021

Enhancement Description

@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jan 29, 2021
@haircommander
Contributor Author

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 29, 2021
@ehashman
Member

ehashman commented Feb 2, 2021

/milestone v1.21

@k8s-ci-robot k8s-ci-robot added this to the v1.21 milestone Feb 2, 2021
@annajung annajung added stage/alpha Denotes an issue tracking an enhancement targeted for Alpha status tracked/yes Denotes an enhancement issue is actively being tracked by the Release Team labels Feb 3, 2021
@JamesLaverack
Member

Hey @haircommander and @bobbypage, enhancements 1.21 shadow here,

Enhancements Freeze is 2 days away, Feb 9th EOD PST

The enhancements team is aware that KEP update is currently in progress (PR #2364). Please make sure to work on PRR questionnaires and requirements and get it merged before the freeze. For PRR related questions or to boost the PR for PRR review, please reach out in Slack on the #prod-readiness channel.

Any enhancements that do not complete the following requirements by the freeze will require an exception.

  • [IN PROGRESS] The KEP must be merged in an implementable state
  • [IN PROGRESS] The KEP must have test plans
  • [IN PROGRESS] The KEP must have graduation criteria
  • [IN PROGRESS] The KEP must have a production readiness review

@annajung
Contributor

Hi @haircommander and @bobbypage, 1.21 Enhancements Lead here.

Enhancements Freeze is now in effect.

Unfortunately, your KEP needed to be updated and the PR has not yet merged. If you wish to be included in the 1.21 Release, please submit an Exception Request as soon as possible.

/milestone clear

@k8s-ci-robot k8s-ci-robot removed this from the v1.21 milestone Feb 10, 2021
@annajung annajung added tracked/no Denotes an enhancement issue is NOT actively being tracked by the Release Team and removed tracked/yes Denotes an enhancement issue is actively being tracked by the Release Team labels Feb 10, 2021
@ehashman
Copy link
Member

ehashman commented May 4, 2021

/milestone v1.22

@k8s-ci-robot k8s-ci-robot added this to the v1.22 milestone May 4, 2021
@JamesLaverack JamesLaverack added tracked/yes Denotes an enhancement issue is actively being tracked by the Release Team and removed tracked/no Denotes an enhancement issue is NOT actively being tracked by the Release Team labels May 5, 2021
@salaxander
Contributor

Hey @haircommander and @bobbypage - 1.22 enhancements team here! Just a reminder that enhancements freeze is coming up on 5/13. We'll need the KEP merged before then to get this included in the 1.22 release.

Let us know if there's anything we can do to help before then!

@salaxander
Contributor

@haircommander @bobbypage #2364 merged so we've got you tracked for 1.22 :)

@PI-Victor
Member

Hello @haircommander @bobbypage 👋, 1.22 Docs release lead here.
This enhancement is marked as ‘Needs Docs’ for 1.22 release.

Please follow the steps detailed in the documentation to open a PR against dev-1.22 branch in the k/website repo. This PR can be just a placeholder at this time and must be created before Fri July 9, 11:59 PM PDT.
Also, take a look at Documenting for a release to familiarize yourself with the docs requirement for the release.

Thank you!

@haircommander
Contributor Author

thanks for the heads up @PI-Victor !

@ehashman
Member

xref kubernetes/kubernetes#102789

@haircommander
Contributor Author

xref initial kubelet implementation kubernetes/kubernetes#103095

@salaxander
Contributor

Hey @bobbypage and @haircommander - Just checking in as we're about 2 weeks away from 1.22 code freeze. I've got kubernetes/kubernetes#103095 and kubernetes/kubernetes#103095 tracked as the open k/k PRs. Are there any other open or merged PRs we should be tracking? Thanks!

@haircommander
Contributor Author

none opened yet! I will post them here if we do. Thanks for your work @salaxander

@salaxander
Contributor

Hey @haircommander - One more check-in as we're a week out from 1.22 code freeze. Any updates on if you expect kubernetes/kubernetes#103095 and kubernetes/kubernetes#103095 to merge before the deadline?

Thanks!

@haircommander
Contributor Author

Thanks for checking! I do expect them to merge. We're waiting on an e2e POC, which I'm working on :)

@salaxander
Contributor

Hi @haircommander - One last ping (sorry!). Code freeze is tomorrow evening (PST), so those two open PRs will need to merge before then for this to be included in 1.22. Let me know if there's anything we can do to help :)

@zvonkok

zvonkok commented Jan 8, 2025

@haircommander cgroupV2 (https://github.com/containerd/cgroups/blob/main/cgroup2/stats/metrics.pb.go) has no network stats, unlike cgroupsV1 (https://github.com/containerd/cgroups/blob/main/cgroup1/stats/metrics.pb.go). Who or what will provide the network stats?

@haircommander
Contributor Author

containerd will still need to report the network stats (https://github.com/kubernetes/cri-api/blob/master/pkg/apis/runtime/v1/api.proto#L724). I'm not sure how it will do so; CRI-O collects them from /proc directly, AFAIR.

@haircommander
Contributor Author

Do we have a reference list of which metrics need to be available to move this forward?

Technically speaking, k8s doesn't require any metrics to be reported, but to start getting feature parity with cAdvisor, this list will probably be the basis.

@zvonkok

zvonkok commented Jan 9, 2025

@haircommander Thanks for the update. The next question would be: what about filesystem metrics, i.e. all the container_fs_* ones?
There are also PerfMetrics; what about those?

@zvonkok

zvonkok commented Jan 9, 2025

❯ curl -s https://raw.githubusercontent.com/google/cadvisor/refs/heads/master/metrics/prometheus.go | grep 'name:  ' | sed 's/^[[:space:]]*//' 
name:      "container_last_seen",
name:      "container_cpu_user_seconds_total",
name:      "container_cpu_system_seconds_total",
name:        "container_cpu_usage_seconds_total",
name:      "container_cpu_cfs_periods_total",
name:      "container_cpu_cfs_throttled_periods_total",
name:      "container_cpu_cfs_throttled_seconds_total",
name:      "container_cpu_schedstat_run_seconds_total",
name:      "container_cpu_schedstat_runqueue_seconds_total",
name:      "container_cpu_schedstat_run_periods_total",
name:      "container_cpu_load_average_10s",
name:      "container_cpu_load_d_average_10s",
name:        "container_tasks_state",
name:        "container_hugetlb_failcnt",
name:        "container_hugetlb_usage_bytes",
name:        "container_hugetlb_max_usage_bytes",
name:      "container_memory_cache",
name:      "container_memory_rss",
name:      "container_memory_kernel_usage",
name:      "container_memory_mapped_file",
name:      "container_memory_swap",
name:      "container_memory_failcnt",
name:      "container_memory_usage_bytes",
name:      "container_memory_max_usage_bytes",
name:      "container_memory_working_set_bytes",
name:      "container_memory_total_active_file_bytes",
name:      "container_memory_total_inactive_file_bytes",
name:        "container_memory_failures_total",
name:      "container_memory_migrate",
name:        "container_memory_numa_pages",
name:        "container_fs_inodes_free",
name:        "container_fs_inodes_total",
name:        "container_fs_limit_bytes",
name:        "container_fs_usage_bytes",
name:        "container_fs_reads_bytes_total",
name:        "container_fs_reads_total",
name:        "container_fs_sector_reads_total",
name:        "container_fs_reads_merged_total",
name:        "container_fs_read_seconds_total",
name:        "container_fs_writes_bytes_total",
name:        "container_fs_writes_total",
name:        "container_fs_sector_writes_total",
name:        "container_fs_writes_merged_total",
name:        "container_fs_write_seconds_total",
name:        "container_fs_io_current",
name:        "container_fs_io_time_seconds_total",
name:        "container_fs_io_time_weighted_seconds_total",
name:        "container_blkio_device_usage_total",
name:        "container_network_receive_bytes_total",
name:        "container_network_receive_packets_total",
name:        "container_network_receive_packets_dropped_total",
name:        "container_network_receive_errors_total",
name:        "container_network_transmit_bytes_total",
name:        "container_network_transmit_packets_total",
name:        "container_network_transmit_packets_dropped_total",
name:        "container_network_transmit_errors_total",
name:        "container_network_tcp_usage_total",
name:        "container_network_tcp6_usage_total",
name:        "container_network_advance_tcp_stats_total",
name:        "container_network_udp6_usage_total",
name:        "container_network_udp_usage_total",
name:      "container_processes",
name:      "container_file_descriptors",
name:      "container_sockets",
name:      "container_threads_max",
name:      "container_threads",
name:        "container_ulimits_soft",
name:        "container_perf_events_total",
name:        "container_perf_events_scaling_ratio",
name:        "container_perf_events_total",
name:        "container_perf_events_scaling_ratio",
name:        "container_perf_uncore_events_total",
name:        "container_perf_uncore_events_scaling_ratio",
name:      "container_referenced_bytes",
name:        "container_memory_bandwidth_bytes",
name:        "container_memory_bandwidth_local_bytes",
name:        "container_llc_occupancy_bytes",
name:      "container_oom_events_total",

@zvonkok

zvonkok commented Jan 9, 2025

How are the metrics going to be mapped into the proto? There aren't corresponding fields for all of them.

@haircommander
Contributor Author

Filesystem metrics will stay with cAdvisor for now. Perf metrics aren't needed, as the kubelet only requests that cAdvisor collect these metrics: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/cadvisor/cadvisor_linux.go#L85-L94
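
The split described in that reply can be sketched as a partition over the cAdvisor metric names listed earlier in the thread: container_fs_* stays with cAdvisor, container_perf_* is not needed, and the rest are candidates for CRI. The prefix rules and the `classify`/`partition` helpers are illustrative assumptions, not kubelet logic; for simplicity, container_blkio_* is treated as a CRI candidate here.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// classify buckets a cAdvisor metric name by prefix. These rules are an
// illustrative assumption based on the discussion, not kubelet code.
func classify(name string) string {
	switch {
	case strings.HasPrefix(name, "container_fs_"):
		return "cadvisor" // filesystem metrics stay with cAdvisor for now
	case strings.HasPrefix(name, "container_perf_"):
		return "not-needed" // the kubelet does not ask cAdvisor for perf metrics
	default:
		return "cri" // candidate to be served by the runtime via CRI
	}
}

// partition drops duplicate names (some perf metrics appear twice in the
// list above) and groups the rest by bucket, sorted alphabetically.
func partition(names []string) map[string][]string {
	seen := make(map[string]struct{})
	buckets := make(map[string][]string)
	for _, n := range names {
		if _, dup := seen[n]; dup {
			continue
		}
		seen[n] = struct{}{}
		b := classify(n)
		buckets[b] = append(buckets[b], n)
	}
	for _, v := range buckets {
		sort.Strings(v) // sorts each bucket's slice in place
	}
	return buckets
}

func main() {
	names := []string{
		"container_cpu_usage_seconds_total",
		"container_memory_working_set_bytes",
		"container_fs_usage_bytes",
		"container_perf_events_total",
		"container_perf_events_total",
		"container_network_receive_bytes_total",
	}
	b := partition(names)
	for _, k := range []string{"cri", "cadvisor", "not-needed"} {
		fmt.Println(k, b[k])
	}
}
```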

@haircommander
Contributor Author

"Metrics" are defined as arbitrary key-value pairs that cAdvisor reported through Prometheus. "Stats" is a structured object the kubelet reports through the /stats/summary API.

"metrics" in this case will be reported through https://github.com/kubernetes/cri-api/blob/master/pkg/apis/runtime/v1/api.proto#L125-L133

@akhilerm
Member

@haircommander Can you add this to the 1.33 project board?

@haircommander
Contributor Author

/label lead-opted-in
/milestone v1.33

@k8s-ci-robot k8s-ci-robot added this to the v1.33 milestone Jan 31, 2025
@haircommander haircommander moved this from Considered for release to Proposed for consideration in SIG Node 1.33 KEPs planning Jan 31, 2025
@k8s-ci-robot k8s-ci-robot added the lead-opted-in Denotes that an issue has been opted in to a release label Jan 31, 2025
@lzung

lzung commented Feb 5, 2025

Hello @bitoku @bobbypage @haircommander 👋, v1.33 Enhancements team here.

Just checking in as we approach enhancements freeze on 02:00 UTC Friday 14th February 2025 / 19:00 PDT Thursday 13th February 2025.

This enhancement is targeting stage beta for v1.33 (correct me if otherwise).
/stage beta

Here's where this enhancement currently stands:

  • KEP readme using the latest template has been merged into the k/enhancements repo.
  • KEP status is marked as implementable for latest-milestone: v1.33.
  • KEP readme has up-to-date graduation criteria
  • KEP has a production readiness review that has been completed and merged into k/enhancements. (For more information on the PRR process, check here). If your production readiness review is not completed yet, please make sure to fill the production readiness questionnaire in your KEP by the PRR Freeze deadline on Thursday 6th February 2025 so that the PRR team has enough time to review your KEP.

For this KEP, we would just need to update the following:

  • Create the KEP readme using the latest template and merge it in the k/enhancements repo.
    • Missing response for How can someone using this feature know that it is working for their instance? under Monitoring Requirements
  • Update the kep.yaml with the current milestone v1.33
  • Ensure that the KEP has undergone a production readiness review for beta graduation and has been merged into k/enhancements.
  • Also, please make sure to update the issue with the correct stages and PRs. This will assist the release team in tracking progress more effectively.

The status of this enhancement is marked as At risk for enhancements freeze. Please keep the issue description up-to-date with appropriate stages as well.

If you anticipate missing enhancements freeze, you can file an exception request in advance. Thank you!

@k8s-ci-robot k8s-ci-robot added stage/beta Denotes an issue tracking an enhancement targeted for Beta status and removed stage/alpha Denotes an issue tracking an enhancement targeted for Alpha status labels Feb 5, 2025
@lzung lzung moved this to At risk for enhancements freeze in 1.33 Enhancements Tracking Feb 5, 2025
@haircommander
Contributor Author

Thinking about this more, I don't know if containerd/containerd#10691 will land in time to make it into 1.33 (and even if it does, containerd 2.1 will be released after 1.33 anyway). I think we should hold off on beta until we know we'll move forward with it in containerd.

/remove-milestone v1.33
/remove-label lead-opted-in

@haircommander haircommander moved this from Proposed for consideration to Not for release in SIG Node 1.33 KEPs planning Feb 6, 2025
@k8s-ci-robot k8s-ci-robot removed the lead-opted-in Denotes that an issue has been opted in to a release label Feb 6, 2025
@lzung

lzung commented Feb 6, 2025

I see that this issue has been opted-out of v1.33 and is now planned for a future release. I will go ahead and mark it as Deferred on the board for tracking purposes - do let the enhancement team know otherwise.

@lzung lzung moved this from At risk for enhancements freeze to Deferred in 1.33 Enhancements Tracking Feb 6, 2025
@dipesh-rawat
Member

Clearing the milestone as this KEP has been deferred from current v1.33 release (based on #2371 (comment)).

/milestone clear

@k8s-ci-robot k8s-ci-robot removed this from the v1.33 milestone Feb 10, 2025
@Urvashi0109

Hello @haircommander, @bobbypage 👋, v1.33 Docs Shadow here.

Does this enhancement work planned for v1.33 require any new docs or modification to existing docs?

If so, please follow the steps here to open a PR against dev-1.33 branch in the k/website repo. This PR can be just a placeholder at this time and must be created before Thursday 27th February 2025 18:00 PDT.

Also, take a look at Documenting for a release to familiarize yourself with the docs requirement for the release.

Thank you!
