
cAdvisor v0.39.1 is causing massively increased IOPS (from 100-300 IOPS to 10k-20k IOPS) #2947

Open
xiaoping8385 opened this issue Sep 30, 2021 · 10 comments


@xiaoping8385

After we upgraded cAdvisor from v0.36.0 to v0.39.1, we noticed that the load on multiple worker nodes massively increased, up to 150 (load15, 4 vCPU per worker)! At the same time we noticed a massive increase in disk IOPS on our Netapp storage system, which provides the NFS shares for the RWX volumes some workload Pods use (from 100-300 IOPS to about 10k-20k IOPS!).

As soon as we terminate the cAdvisor pods, the node load and Netapp NFS share IOPS immediately drop back to normal values.
We ran a self-made bpftrace program to detect the massive number of "nfs_statfs" and "nfs_getattr" calls coming from the cAdvisor process (the call counts shown in the rightmost column are per 5-second interval):

12:12:19
@[17941, cadvisor, nfs_statfs]: 11993
@[17941, cadvisor, nfs_getattr]: 36093

I found a commit related to fs metric support; I'm not sure whether it is connected to this. Can someone take a look please? Thanks! 8c0666a#diff-dce7cce055566bed799f788cd0048e209a27a473c0f48b956fa1f1780e80d2c1

@xiaoping8385
Author

xiaoping8385 commented Oct 13, 2021

@bobbypage @JohnnyG235711, could you take a look at this issue? Is there a way to opt out of this metric so that we can avoid the extra load?

@bobbypage
Collaborator

bobbypage commented Oct 13, 2021

Can you try running with DiskUsageMetrics disabled to see if it helps?

I think you'll need something like --disable_metrics=disk as the CLI flag to cAdvisor:

if !h.includedMetrics.Has(container.DiskUsageMetrics) {
return nil
}

Let's pinpoint whether the commit you mentioned is the root cause of this.

@xiaoping8385
Author

Thanks @bobbypage, I will give it a try and keep you updated. Thanks!

@PhilipSchmid

PhilipSchmid commented Oct 18, 2021

Hi @bobbypage,

I can confirm that the node load and IOPS on the Netapp go back to the "normal usage area" as soon as I set the --disable_metrics=disk flag.

(Graph: node load1 over time)

  • Before the spike: v0.36.0
  • Start of the spike: Switch to v0.39.1
  • Spike starts decreasing again: Added the --disable_metrics=disk flag

One important question remains: which metrics do we lose when we simply disable all disk metrics? Unfortunately, I'm not yet convinced that this is a feasible workaround.

Thanks & regards,
Philip

@bobbypage
Collaborator

bobbypage commented Oct 18, 2021

Interesting, thanks for the note.

You can see usages of Disk Metrics here.

They are mostly used in the docker and raw handlers to get file system stats. The Docker handler for file system stats is here; it fills the FsStats struct. File system stats can also be obtained for non-Docker containers (i.e. other types of cgroups) by the rawContainerHandler here.
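
For orientation, here is a rough, abridged sketch of the kind of per-filesystem data that collection feeds into (field names are approximated from memory; the authoritative struct lives in info/v1 in this repo):

package main

import "fmt"

// Approximate, abridged sketch of the per-filesystem stats that the docker
// and raw handlers populate when DiskUsageMetrics is enabled. Field names
// may differ from the real info/v1 FsStats definition.
type FsStats struct {
    Device     string // device or mount backing the container filesystem
    Limit      uint64 // total filesystem size, in bytes
    Usage      uint64 // bytes used
    Available  uint64 // bytes still available
    Inodes     uint64 // total inodes, where the filesystem reports them
    InodesFree uint64 // free inodes
}

func main() {
    // Purely illustrative placeholder values.
    fs := FsStats{Device: "/dev/sda1", Limit: 100 << 30, Usage: 42 << 30, Available: 58 << 30}
    fmt.Printf("%s: %d of %d bytes used\n", fs.Device, fs.Usage, fs.Limit)
}

Disabling DiskUsageMetrics means these values are simply not collected, which is why the container_fs_* series disappear.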

@PhilipSchmid

PhilipSchmid commented Oct 19, 2021

Hi @bobbypage

Thanks for your answer. I just had a more detailed look at the returned metrics, once with the --disable_metrics=disk arg set and once without it. Both times I received exactly the same number of unique metrics:

$ curl -sk -H 'Authorization: Bearer ey....abc' https://10.10.10.10:10250/metrics/cadvisor | egrep -v '^#' | cut -d { -f 1 | uniq | wc -l
65
# Restart of cAdvisor with the `--disable_metrics=disk` arg set.
$ curl -sk -H 'Authorization: Bearer ey....abc' https://10.10.10.10:10250/metrics/cadvisor | egrep -v '^#' | cut -d { -f 1 | uniq | wc -l
65

So this workaround seems to work. Nevertheless, it's just a workaround and not a fix for the root cause. Do you think there's something you could do here?

Edit: Here's a Bash script to reproduce the issue: https://gist.github.com/PhilipSchmid/c19d085363809e5ba2356037e5e66551

Thanks & regards,
Philip

@xiaoping8385
Author

xiaoping8385 commented Oct 20, 2021

The https://10.10.10.10:10250/metrics/cadvisor API is not served by the cAdvisor pod; it is actually a request to the kubelet, which uses an embedded cAdvisor that can also provide metrics.
To view the metrics that the cAdvisor pod provides, you need to refer to https://github.com/google/cadvisor/blob/master/docs/api.md
However, I'm not sure which cAdvisor version the kubelet uses or is integrated with. @bobbypage any ideas?

@PhilipSchmid

Hi @xiaoping8385

Thanks for the input! I just redid the same test with the actual cadvisor endpoint:

$ ps aux | grep cadvisor                                                                                                
root     19659  104  0.2 3630068 192468 ?      S<sl 16:07   3:09 /usr/bin/cadvisor -logtostderr --port=31194 --profiling --housekeeping_interval=1s
$ curl -s http://localhost:31194/metrics | egrep -v '^#' | cut -d { -f 1 | uniq | wc -l
97

# Restart of cAdvisor with the `--disable_metrics=disk` arg set.

$ ps aux | grep cadvisor                                                                                                
root     25544  100  0.3 2574520 217828 ?      S<sl 16:14   2:45 /usr/bin/cadvisor -logtostderr --port=31194 --profiling --housekeeping_interval=1s --disable_metrics=disk
$ curl -s http://localhost:31194/metrics | egrep -v '^#' | cut -d { -f 1 | uniq | wc -l
116

As we can see, there is a difference of 19 metrics. Even odder is that there are more metrics available when --disable_metrics=disk is specified. How does this even make sense? I don't get it.

To be more precise, the following 24 metrics are only available when the --disable_metrics=disk arg is specified:

container_cpu_schedstat_run_periods_total
container_cpu_schedstat_run_seconds_total
container_cpu_schedstat_runqueue_seconds_total
container_file_descriptors
container_hugetlb_failcnt
container_hugetlb_max_usage_bytes
container_hugetlb_usage_bytes
container_memory_migrate
container_memory_numa_pages
container_network_advance_tcp_stats_total
container_network_tcp6_usage_total
container_network_tcp_usage_total
container_network_udp6_usage_total
container_network_udp_usage_total
container_processes
container_referenced_bytes
container_sockets
container_threads
container_threads_max
container_ulimits_soft
machine_cpu_cache_capacity_bytes
machine_node_hugepages_count
machine_node_memory_capacity_bytes
machine_thread_siblings_count

On the other hand, the following 5 metrics are only available when the --disable_metrics=disk arg is not set:

container_fs_inodes_free
container_fs_inodes_total
container_fs_io_time_weighted_seconds_total
container_fs_limit_bytes
container_fs_usage_bytes

Is there a better way to test this? These results somehow don't make sense to me.
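
For what it's worth, a small helper like the following might make the comparison more deterministic (just a sketch: it de-duplicates and sorts the metric names from one scrape so two runs can be diffed directly):

package main

import (
    "bufio"
    "fmt"
    "net/http"
    "os"
    "sort"
    "strings"
)

func main() {
    // Endpoint of the standalone cAdvisor instance; override via argv[1].
    url := "http://localhost:31194/metrics"
    if len(os.Args) > 1 {
        url = os.Args[1]
    }
    resp, err := http.Get(url)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    // Collect the unique metric names, ignoring HELP/TYPE comment lines.
    names := map[string]struct{}{}
    scanner := bufio.NewScanner(resp.Body)
    scanner.Buffer(make([]byte, 1024*1024), 1024*1024)
    for scanner.Scan() {
        line := scanner.Text()
        if line == "" || strings.HasPrefix(line, "#") {
            continue
        }
        // The metric name ends at the first '{' (labels) or space (value).
        name := line
        if i := strings.IndexAny(line, "{ "); i >= 0 {
            name = line[:i]
        }
        names[name] = struct{}{}
    }

    // Print the sorted names so that two runs can be compared with diff.
    sorted := make([]string, 0, len(names))
    for n := range names {
        sorted = append(sorted, n)
    }
    sort.Strings(sorted)
    for _, n := range sorted {
        fmt.Println(n)
    }
    fmt.Fprintf(os.Stderr, "%d unique metrics\n", len(sorted))
}

Running it once against each cAdvisor configuration and diffing the two outputs should show exactly which metric names appear or disappear.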

Thanks & regards,
Philip

@xiaoping8385
Author

xiaoping8385 commented Oct 26, 2021

@PhilipSchmid Looks like it works as designed. There is a default ignored-metric list; check this (

cadvisor/cmd/cadvisor.go

Lines 78 to 93 in 8b96dc6

var (
    // Metrics to be ignored.
    // Tcp metrics are ignored by default.
    ignoreMetrics = container.MetricSet{
        container.MemoryNumaMetrics:              struct{}{},
        container.NetworkTcpUsageMetrics:         struct{}{},
        container.NetworkUdpUsageMetrics:         struct{}{},
        container.NetworkAdvancedTcpUsageMetrics: struct{}{},
        container.ProcessSchedulerMetrics:        struct{}{},
        container.ProcessMetrics:                 struct{}{},
        container.HugetlbUsageMetrics:            struct{}{},
        container.ReferencedMemoryMetrics:        struct{}{},
        container.CPUTopologyMetrics:             struct{}{},
        container.ResctrlMetrics:                 struct{}{},
        container.CPUSetMetrics:                  struct{}{},
    }
), which means that by default those metrics are ignored.

However, if you disable only one metric, in this case --disable_metrics=disk, it overwrites the default ignored-metric set (check this line

flag.Var(&ignoreMetrics, "disable_metrics", fmt.Sprintf("comma-separated list of `metrics` to be disabled. Options are %s.", optstr))

), so it ends up disabling only disk while enabling all the other metrics from the default ignored-metric list. That's why you get 24 more metrics with --disable_metrics=disk set. And, of course, you will not see any container fs related metrics, as you already disabled them.

@bobbypage Please confirm whether this is a side effect of how --disable_metrics is handled or whether it is designed this way. Thanks!

@PhilipSchmid

PhilipSchmid commented Oct 26, 2021

@xiaoping8385 Thanks for the clarification! Now it makes sense to me. Nevertheless, I personally think this isn't desired behavior, but let's give @bobbypage some time to confirm this.

Thanks & regards
Philip
