kubelet/stats: deduplicate makePodStorageStats #108855

Merged
merged 2 commits into from Sep 6, 2022


haircommander
Contributor

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

Deduplicates pod storage stats generation between the CRI and cadvisor stats providers, and unifies their behavior. This fixes a bug where the cadvisor stats provider logged excessive "not found" errors for a directory that had already been garbage-collected.
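
As a rough illustration of the deduplication described above, the sketch below moves a method that two providers would otherwise each implement into one free function, which receives its dependencies as parameters plus a flag naming the caller. All names here (`volumeSource`, `mapVolumes`, `podStorage`, `makeStorage`, and both provider types) are hypothetical stand-ins, not the kubelet's actual types, and the arithmetic is deliberately simplified.

```go
package main

import "fmt"

// volumeSource is a hypothetical stand-in for the dependency that both
// providers previously reached through their own receiver.
type volumeSource interface {
	PodVolumeBytes(podUID string) (uint64, bool)
}

// mapVolumes is a toy volumeSource backed by a map.
type mapVolumes map[string]uint64

func (m mapVolumes) PodVolumeBytes(podUID string) (uint64, bool) {
	b, ok := m[podUID]
	return b, ok
}

// podStorage is a simplified result type used only for this illustration.
type podStorage struct {
	UsedBytes uint64
	FromCRI   bool
}

// makeStorage is the single shared helper. It has no receiver: everything it
// needs arrives as an argument, and isCRI carries the one behavioral
// difference between the two callers.
func makeStorage(podUID string, vols volumeSource, logBytes uint64, isCRI bool) podStorage {
	used := logBytes
	if v, ok := vols.PodVolumeBytes(podUID); ok {
		used += v
	}
	return podStorage{UsedBytes: used, FromCRI: isCRI}
}

type criProvider struct{ vols volumeSource }

type cadvisorProvider struct{ vols volumeSource }

// Both providers delegate to the shared helper instead of keeping a copy.
func (p criProvider) storage(podUID string, logBytes uint64) podStorage {
	return makeStorage(podUID, p.vols, logBytes, true)
}

func (p cadvisorProvider) storage(podUID string, logBytes uint64) podStorage {
	return makeStorage(podUID, p.vols, logBytes, false)
}

func main() {
	vols := mapVolumes{"pod-a": 100}
	cri := criProvider{vols: vols}
	cad := cadvisorProvider{vols: vols}
	fmt.Println(cri.storage("pod-a", 5))
	fmt.Println(cad.storage("pod-a", 5))
}
```

With one implementation, a behavior fix lands in both providers at once, which is the point of unifying them.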

Which issue(s) this PR fixes:

Fixes #106957

Special notes for your reviewer:

Does this PR introduce a user-facing change?

none

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Mar 21, 2022
@haircommander
Contributor Author

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 21, 2022
@ehashman
Member

/test pull-kubernetes-node-kubelet-serial-crio-cgroupv2

@haircommander
Contributor Author

/retest

@ehashman
Member

/priority important-longterm
/triage accepted

@k8s-ci-robot k8s-ci-robot added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 22, 2022
@ehashman ehashman moved this from Triage to Needs Reviewer in SIG Node PR Triage Mar 22, 2022
@@ -433,3 +435,29 @@ func addUsage(first, second *uint64) *uint64 {
total := *first + *second
return &total
}

func makePodStorageStats(s *statsapi.PodStats, rootFsInfo *cadvisorapiv2.FsInfo, resourceAnalyzer stats.ResourceAnalyzer, hostStatsProvider HostStatsProvider, isCRIStatsProvider bool) {
Member


Diff:

diff --git a/home/ehashman/tmp/foo2 b/home/ehashman/tmp/foo1
index 4ba09d2bead..2dc73b0ed5d 100644
--- a/home/ehashman/tmp/foo2
+++ b/home/ehashman/tmp/foo1
@@ -1,12 +1,12 @@
-func (p *criStatsProvider) makePodStorageStats(s *statsapi.PodStats, rootFsInfo *cadvisorapiv2.FsInfo) {
+func makePodStorageStats(s *statsapi.PodStats, rootFsInfo *cadvisorapiv2.FsInfo, resourceAnalyzer stats.ResourceAnalyzer, hostStatsProvider HostStatsProvider, isCRIStatsProvider bool) {
        podNs := s.PodRef.Namespace
        podName := s.PodRef.Name
        podUID := types.UID(s.PodRef.UID)
-       vstats, found := p.resourceAnalyzer.GetPodVolumeStats(podUID)
+       vstats, found := resourceAnalyzer.GetPodVolumeStats(podUID)
        if !found {
                return
        }
-       logStats, err := p.hostStatsProvider.getPodLogStats(podNs, podName, podUID, rootFsInfo)
+       logStats, err := hostStatsProvider.getPodLogStats(podNs, podName, podUID, rootFsInfo)
        if err != nil {
                klog.ErrorS(err, "Unable to fetch pod log stats", "pod", klog.KRef(podNs, podName))
                // If people do in-place upgrade, there might be pods still using
@@ -14,12 +14,12 @@ func (p *criStatsProvider) makePodStorageStats(s *statsapi.PodStats, rootFsInfo
                // We should continue generating other stats in that case.
                // calcEphemeralStorage tolerants logStats == nil.
        }
-       etcHostsStats, err := p.hostStatsProvider.getPodEtcHostsStats(podUID, rootFsInfo)
+       etcHostsStats, err := hostStatsProvider.getPodEtcHostsStats(podUID, rootFsInfo)
        if err != nil {
                klog.ErrorS(err, "Unable to fetch pod etc hosts stats", "pod", klog.KRef(podNs, podName))
        }
        ephemeralStats := make([]statsapi.VolumeStats, len(vstats.EphemeralVolumes))
        copy(ephemeralStats, vstats.EphemeralVolumes)
        s.VolumeStats = append(append([]statsapi.VolumeStats{}, vstats.EphemeralVolumes...), vstats.PersistentVolumes...)
-       s.EphemeralStorage = calcEphemeralStorage(s.Containers, ephemeralStats, rootFsInfo, logStats, etcHostsStats, true)
+       s.EphemeralStorage = calcEphemeralStorage(s.Containers, ephemeralStats, rootFsInfo, logStats, etcHostsStats, isCRIStatsProvider)
 }
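
The two slice idioms at the end of the function in the diff above (`make` plus `copy`, and nested `append` onto an empty slice) both produce slices that do not share a backing array with their inputs. The sketch below shows that property in isolation; `combine` and `clone` are hypothetical helper names, and plain strings stand in for `statsapi.VolumeStats`.

```go
package main

import "fmt"

// combine returns a new slice holding a followed by b. Starting from an
// empty literal forces a fresh backing array, so the result aliases neither
// input.
func combine(a, b []string) []string {
	return append(append([]string{}, a...), b...)
}

// clone returns an independent copy of src using make and copy.
func clone(src []string) []string {
	dst := make([]string, len(src))
	copy(dst, src)
	return dst
}

func main() {
	ephemeral := []string{"emptydir", "configmap"}
	persistent := []string{"pvc"}

	all := combine(ephemeral, persistent)
	dup := clone(ephemeral)

	// Mutating the results leaves the inputs untouched.
	all[0] = "changed"
	dup[1] = "changed"

	fmt.Println(ephemeral, persistent, all, dup)
}
```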

pkg/kubelet/stats/cadvisor_stats_provider.go
@ehashman ehashman moved this from Needs Reviewer to Waiting on Author in SIG Node PR Triage Mar 22, 2022
@rpthms

rpthms commented May 11, 2022

@haircommander Any chance you could take a look at the comments raised in this PR if you have time to spare? Thanks!

@haircommander
Contributor Author

/retest

@haircommander
Contributor Author

I personally think this route still makes sense. The behavior change is consistent with the CRI stats manager and makes logical sense to me. I am open to more feedback, though.

@pacoxu pacoxu moved this from Waiting on Author to Needs Reviewer in SIG Node PR Triage Jun 20, 2022
@aneagoe

aneagoe commented Aug 9, 2022

@andrewsykim @tallclair any chance you have some time to review this? It would be great to see some progress on this issue. Thanks!

@haircommander
Contributor Author

/assign @endocrimes @mrunalp @rphillips @saschagrunert

Does anyone have any feedback, or can we move forward with this?

@haircommander
Contributor Author

haircommander commented Aug 16, 2022

Alright, I've updated the PR to:

  • adopt the cadvisor stats provider behavior, where stats collection continues even if volume stats are not present
  • reduce the verbosity of the logs by raising their log level to 6

This standardizes behavior, keeps the old cadvisor behavior (as requested), and reduces verbosity for #106957.
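
A minimal sketch of the two changes listed above, assuming a simplified model: collection continues when volume stats are absent, and the "not found" message is emitted only at verbosity 6 or higher. The `verbosity` variable, `logV`, `collect`, and the `stats` type are hypothetical stand-ins written with the standard library only; they imitate a verbosity-gated logger rather than reproduce the kubelet's logging calls.

```go
package main

import (
	"errors"
	"fmt"
)

// verbosity is a hypothetical process-wide log level, standing in for the
// level a verbosity flag would set.
var verbosity = 2

// logged records emitted messages so the gating is observable.
var logged []string

// logV emits msg only when the configured verbosity is at least level.
func logV(level int, msg string, err error) {
	if verbosity >= level {
		logged = append(logged, fmt.Sprintf("%s: %v", msg, err))
	}
}

// stats is a simplified result; nil fields mean "not available".
type stats struct {
	VolumeBytes *uint64
	LogBytes    *uint64
}

var errNotFound = errors.New("directory not found")

// collect gathers whatever is available for podUID. Missing volume stats do
// not abort collection, and a missing log directory is reported only at
// verbosity 6 or higher.
func collect(volumes, logs map[string]uint64, podUID string) stats {
	var s stats
	if v, ok := volumes[podUID]; ok {
		s.VolumeBytes = &v
	}
	// No early return here: continue even without volume stats.
	if l, ok := logs[podUID]; ok {
		s.LogBytes = &l
	} else {
		logV(6, "Unable to fetch pod log stats", errNotFound)
	}
	return s
}

func main() {
	volumes := map[string]uint64{} // volume stats missing for every pod
	logs := map[string]uint64{"pod-a": 42}

	s := collect(volumes, logs, "pod-a")
	fmt.Println(s.VolumeBytes == nil, *s.LogBytes)

	// A pod whose log directory is gone produces no message at level 2.
	collect(volumes, logs, "pod-gone")
	fmt.Println(len(logged))
}
```

Gating the message behind a high verbosity level keeps it available for debugging without flooding logs at default settings.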

@dashpole @rphillips PTAL

Signed-off-by: Peter Hunt <pehunt@redhat.com>
and by doing so, fix a bug where the stats providers report a directory is not found after a pod's storage is removed

Signed-off-by: Peter Hunt <pehunt@redhat.com>
@dchen1107
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 16, 2022
Member

@endocrimes endocrimes left a comment


/lgtm

@dims
Member

dims commented Aug 20, 2022

/milestone v1.26

cc @mrunalp

@k8s-ci-robot k8s-ci-robot added this to the v1.26 milestone Aug 20, 2022
@pacoxu pacoxu moved this from Needs Reviewer to Needs Approver in SIG Node PR Triage Sep 1, 2022
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: endocrimes, haircommander, mrunalp

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 6, 2022
@mrunalp mrunalp moved this from Needs Approver to Done in SIG Node PR Triage Sep 6, 2022
@k8s-ci-robot
Contributor

k8s-ci-robot commented Sep 6, 2022

@haircommander: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
pull-kubernetes-node-kubelet-serial-crio-cgroupv2 | 5175e278bb8715e7c413c4f6a80bed4be1c698bd | link | false | /test pull-kubernetes-node-kubelet-serial-crio-cgroupv2

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-triage-robot

The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass.

This bot retests PRs for certain kubernetes repos according to the following rules:

  • The PR does not have any do-not-merge/* labels
  • The PR does not have the needs-ok-to-test label
  • The PR is mergeable (does not have a needs-rebase label)
  • The PR is approved (has cncf-cla: yes, lgtm, approved labels)
  • The PR is failing tests required for merge

You can:

/retest

@k8s-ci-robot k8s-ci-robot merged commit 6d1e915 into kubernetes:master Sep 6, 2022
14 checks passed
vrutkovs added a commit to vrutkovs/custom-okd-os that referenced this pull request Oct 13, 2022
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. release-note-none Denotes a PR that doesn't merit a release note. sig/node Categorizes an issue or PR as relevant to SIG Node. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Development

Successfully merging this pull request may close these issues.

Kubelet spamming 'Unable to fetch pod log stats' log messages after running cronjobs