CPU usage of pod always zero #17889

Open
edoardottt opened this issue Jan 4, 2024 · 10 comments
Labels
co/runtime/docker Issues specific to a docker runtime lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@edoardottt

What Happened?

Context: #13898

I've been using the --disable-optimizations flag, but that doesn't solve the problem for me.
I need to gather statistics about the utilization and requests of some pods, but I can't, because the output always reports zero, even under stress tests:

minikube   default                hello-node-59684db6fc-tzkzx                  0m (0%)        0m (0%)      1m (0%)     0Mi (0%)          0Mi (0%)        7Mi (0%)

Attach the log file

log.txt

Operating System

Ubuntu

Driver

Docker

@afbjorklund
Collaborator

afbjorklund commented Jan 4, 2024

You can compare with the output of docker stats. If they differ, it is an issue with cri-dockerd.

EDIT: There is also a dedicated crictl statsp command that can be used to check the explicit CRI feature
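
A minimal sketch of that comparison, run inside the minikube node (the cri-dockerd socket path below is an assumption; adjust it to your setup):

minikube ssh                                                              # open a shell on the minikube node, then:
docker stats --no-stream                                                  # per-container usage straight from the Docker engine
sudo crictl --runtime-endpoint unix:///var/run/cri-dockerd.sock statsp    # pod sandbox stats via the CRI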

@afbjorklund afbjorklund added the co/runtime/docker Issues specific to a docker runtime label Jan 4, 2024
@afbjorklund
Collaborator

afbjorklund commented Jan 4, 2024

I think it is related to ListPodSandboxStats not being implemented for Docker, so it returns nothing.

There is a hardcoded kubelet workaround for cri-o, but I think ListPodSandboxStats itself is only implemented for containerd.

// UsingLegacyCadvisorStats returns true if container stats are provided by cadvisor instead of through the CRI.
// CRI integrations should get container metrics via CRI.
// TODO: cri-o relies on cadvisor as a temporary workaround. The code should
// be removed. Related issue:
// https://github.com/kubernetes/kubernetes/issues/51798
func UsingLegacyCadvisorStats(runtimeEndpoint string) bool {
        return strings.HasSuffix(runtimeEndpoint, CrioSocketSuffix)
}
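
Since that fallback is keyed on the socket suffix, a docker-driver node whose kubelet points at cri-dockerd.sock never takes the cAdvisor path. A quick, hedged way to confirm which CRI endpoint the kubelet is actually using (tool availability and flag name on the node are assumptions):

minikube ssh -- "pgrep -a kubelet | grep -o 'container-runtime-endpoint=[^ ]*'"   # expect something ending in cri-dockerd.sock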

@edoardottt
Author

Thanks for the answer @afbjorklund, really appreciated!
So should I change the container runtime? Something like containerd?
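
A hedged sketch of trying containerd on a fresh profile (the start flag exists; whether it changes the zero readings is not confirmed here):

minikube delete                                  # throws away the current profile
minikube start --container-runtime=containerd    # recreate the cluster with containerd as the CRI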

@afbjorklund
Collaborator

afbjorklund commented Jan 4, 2024

I think the function was implemented incorrectly, so the fallback doesn't work either; it is probably a bug.

// From the generated CRI API stubs: the default server returns "unimplemented".
func (*UnimplementedRuntimeServiceServer) ListPodSandboxStats(ctx context.Context, req *ListPodSandboxStatsRequest) (*ListPodSandboxStatsResponse, error) {
        return nil, status.Errorf(codes.Unimplemented, "method ListPodSandboxStats not implemented")
}

// From cri-dockerd: the Docker shim likewise returns an explicit "not implemented" error.
func (ds *dockerService) ListPodSandboxStats(context.Context, *runtimeapi.ListPodSandboxStatsRequest) (*runtimeapi.ListPodSandboxStatsResponse, error) {
        return nil, fmt.Errorf("ListPodSandboxStats is not implemented")
}

If it (cri-dockerd) is indeed the issue here, there should be some evidence of the error in the kubelet logs.
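
One rough way to look for that evidence (the exact error text to grep for is an assumption):

minikube ssh -- "sudo journalctl -u kubelet --no-pager | grep -i ListPodSandboxStats"
minikube logs --file=minikube-logs.txt    # or dump everything and search the file offline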

@edoardottt
Author

Wow... @afbjorklund, I didn't expect that.
Since I'm a newbie here... how can I collect some kind of resource utilization data?

Is there a way (or more than one) to collect information about the cluster and its components?

@afbjorklund
Collaborator

afbjorklund commented Jan 4, 2024

I can't reproduce it here, though (v1.32.0).

top node

NAME       CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
minikube   229m         2%     866Mi           5%        

top pods -A

NAMESPACE     NAME                               CPU(cores)   MEMORY(bytes)   
kube-system   coredns-5dd5756b68-wl86q           4m           14Mi            
kube-system   etcd-minikube                      31m          34Mi            
kube-system   kube-apiserver-minikube            79m          218Mi           
kube-system   kube-controller-manager-minikube   31m          40Mi            
kube-system   kube-proxy-hgtrw                   1m           16Mi            
kube-system   kube-scheduler-minikube            5m           17Mi            
kube-system   metrics-server-7c66d45ddc-gpvt7    6m           17Mi            
kube-system   storage-provisioner                3m           9Mi        

It uses cri-dockerd 0.3.3; I will try 0.3.9 too
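
Presumably the numbers above come from the metrics-server addon plus kubectl top; a minimal sketch of that kind of check:

minikube addons enable metrics-server
kubectl top node
kubectl top pods -A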

@edoardottt
Author

edoardottt commented Jan 4, 2024

This is the command I'm executing:

kubectl resource-capacity --sort cpu.util --util --pods --pod-count

And this is the complete output I get:

NODE       NAMESPACE              POD                                          CPU REQUESTS   CPU LIMITS   CPU UTIL    MEMORY REQUESTS   MEMORY LIMITS   MEMORY UTIL   POD COUNT
                                                                                                                                                                       
minikube   *                      *                                            850m (7%)      0m (0%)      122m (1%)   370Mi (1%)        170Mi (0%)      642Mi (2%)    11/110
minikube   kube-system            kube-apiserver-minikube                      250m (2%)      0m (0%)      26m (0%)    0Mi (0%)          0Mi (0%)        220Mi (0%)    
minikube   kube-system            etcd-minikube                                100m (0%)      0m (0%)      10m (0%)    100Mi (0%)        0Mi (0%)        43Mi (0%)     
minikube   kube-system            kube-controller-manager-minikube             200m (1%)      0m (0%)      8m (0%)     0Mi (0%)          0Mi (0%)        51Mi (0%)     
minikube   kube-system            coredns-5dd5756b68-8fb7w                     100m (0%)      0m (0%)      2m (0%)     70Mi (0%)         170Mi (0%)      14Mi (0%)     
minikube   kube-system            kube-scheduler-minikube                      100m (0%)      0m (0%)      2m (0%)     0Mi (0%)          0Mi (0%)        18Mi (0%)     
minikube   kube-system            metrics-server-7c66d45ddc-6qfxq              100m (0%)      0m (0%)      2m (0%)     200Mi (0%)        0Mi (0%)        18Mi (0%)     
minikube   kube-system            storage-provisioner                          0m (0%)        0m (0%)      2m (0%)     0Mi (0%)          0Mi (0%)        9Mi (0%)      
minikube   kubernetes-dashboard   dashboard-metrics-scraper-7fd5cb4ddc-4hqsv   0m (0%)        0m (0%)      1m (0%)     0Mi (0%)          0Mi (0%)        7Mi (0%)      
minikube   kubernetes-dashboard   kubernetes-dashboard-8694d4445c-4n4d4        0m (0%)        0m (0%)      1m (0%)     0Mi (0%)          0Mi (0%)        10Mi (0%)     
minikube   default                hello-node-59684db6fc-tzkzx                  0m (0%)        0m (0%)      1m (0%)     0Mi (0%)          0Mi (0%)        7Mi (0%)      
minikube   kube-system            kube-proxy-rnzw2                             0m (0%)        0m (0%)      1m (0%)     0Mi (0%)          0Mi (0%)        15Mi (0%)

So it seems to work for me as well; what I want to say is that when the hello-node test pod is under a stress test, its metrics don't change, they keep reporting 0 for everything... they are not responsive, let's say
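
For reference, kubectl resource-capacity is the kube-capacity plugin (installable via krew; plugin name as documented upstream). A hedged sketch of generating CPU load and re-checking whether the utilization column reacts; the pod name, image, and busy-loop are just one possible approach:

kubectl krew install resource-capacity
kubectl run cpu-burn --image=busybox --restart=Never -- sh -c 'yes > /dev/null'   # simple CPU burner (hypothetical helper pod)
kubectl resource-capacity --sort cpu.util --util --pods --pod-count               # CPU UTIL should move if stats are flowing
kubectl delete pod cpu-burn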

@afbjorklund
Collaborator

afbjorklund commented Jan 4, 2024

I think it is a known issue with cri-dockerd; there might be some workarounds available.

The dockershim code was removed from Kubernetes before the new functionality was available in cri-dockerd:

https://kubernetes.io/docs/tasks/administer-cluster/migrating-from-dockershim/

You can mitigate this issue by running cAdvisor as a standalone DaemonSet.
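
A rough sketch of that mitigation, using the manifests shipped in the upstream cAdvisor repository (the repository layout, kustomize path, and namespace are assumptions; check the cAdvisor deploy docs):

git clone https://github.com/google/cadvisor.git
cd cadvisor
kubectl kustomize deploy/kubernetes/base | kubectl apply -f -
kubectl -n cadvisor get pods    # assumes the manifests create a dedicated cadvisor namespace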

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 3, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 3, 2024