Monitoring Kubernetes PersistentVolumes #2359

Closed
kaarolch opened this issue Jan 30, 2019 · 35 comments


@kaarolch

Since 1.12 the Kubernetes team has removed many core metrics from the kubelet; e.g. the PV metrics were dropped (info). Does someone have an idea what would be the best way to monitor PV usage? Below I describe my workaround, but unfortunately it gives quite high permissions to the node_exporter container :D
What did you do?
Switch node_exporter rootfs to:

"--path.rootfs=/rootfs"

Mount the kubelet disk plugin directory, in my case ceph.rook.io:

"volumeMounts": [
              {
                "name": "proc",
                "readOnly": true,
                "mountPath": "/host/proc"
              },
              {
                "name": "sys",
                "readOnly": true,
                "mountPath": "/host/sys"
              },
              {
                "name": "rootfs",
                "readOnly": true,
                "mountPath": "/rootfs/var/lib/kubelet/plugins/ceph.rook.io/rook-ceph-system/mounts/"
              }
            ]

And add volumes:

"volumes": [
          {
            "name": "proc",
            "hostPath": {
              "path": "/proc",
              "type": ""
            }
          },
          {
            "name": "sys",
            "hostPath": {
              "path": "/sys",
              "type": ""
            }
          },
          {
            "name": "rootfs",
            "hostPath": {
              "path": "/var/lib/kubelet/plugins/ceph.rook.io/rook-ceph-system/mounts/",
              "type": ""
            }
          }
        ],

These options reduce the number of paths mounted by node_exporter.
Next I need to run the pod as root, as below, or set privileged: true on the node_exporter container.

"securityContext": {
          "runAsUser": 0,
          "runAsNonRoot": false
        },
"securityContext": {
          "privileged": true
        },

Unfortunately, neither solution (privileged or root) is nice, and I use it only as a temporary workaround to get PV usage stats from the node. Mounting only the kubelet storage plugin folder increases security a little. From my perspective this is still not enough, because node_exporter has full rights to all persistent data....
What did you expect to see?
The possibility to get persistent volume usage without giving full rights to node_exporter.

Environment

  • Prometheus Operator version:

    0.17

  • Kubernetes version information:

Server Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.4", GitCommit:"f49fa022dbe63faafd0da106ef7e05a29721d3f1", GitTreeState:"clean", BuildDate:"2018-12-14T06:59:37Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}

  • Kubernetes cluster kind:

vanilla Kubernetes via kubespray (kubeadm)

@brancz
Contributor

brancz commented Feb 4, 2019

Yes, this is very unfortunate, and I don't have a good answer. I've heard of people putting node-exporter into their Pods as a sidecar to monitor the mounted volumes. This is something you should take to sig-storage in Kubernetes, as they should make sure these metrics are available.
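For anyone considering the sidecar route, a rough sketch of what it can look like (the app container, image tags, mount path, and claim name are only illustrative):

"containers": [
          {
            "name": "app",
            "image": "nginx:1.17",
            "volumeMounts": [
              { "name": "data", "mountPath": "/data" }
            ]
          },
          {
            "name": "node-exporter",
            "image": "quay.io/prometheus/node-exporter:v0.17.0",
            "ports": [
              { "name": "metrics", "containerPort": 9100 }
            ],
            "volumeMounts": [
              { "name": "data", "mountPath": "/data", "readOnly": true }
            ]
          }
        ],
"volumes": [
          {
            "name": "data",
            "persistentVolumeClaim": { "claimName": "my-claim" }
          }
        ]

The filesystem collector then exposes node_filesystem_size_bytes / node_filesystem_avail_bytes for the /data mountpoint on port 9100, without host mounts or extra privileges; the trade-off is one extra container per pod.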

@andrewsav-bt

@kaarolch can you clarify where you read that a bunch of metrics were removed in 1.12? The issue you linked states that "1.12, the kubelet exposes a number of sources for metrics" and outlines a plan of removing some of them in future versions. I was not able to find any evidence that anything was removed in 1.12.

Persistent volume monitoring is indeed broken, since kubelet_volume_stats_* are no longer in Prometheus; I am, however, not convinced that it's Kubernetes' fault as such.

@kaarolch
Author

@andrewsav-datacom hmm, but in the link from my post there is a summary:

Current kubelet metrics that are not included in core metrics

  • Pod and Node-level Network Metrics
  • Persistent Volume Metrics

So, if I understand correctly, the PV metrics are no longer included in the core metrics and would probably be moved to the CSI storage side.

@AndrewSav

AndrewSav commented Feb 13, 2019

I read this as "desired future state".

@brancz
Contributor

brancz commented Feb 18, 2019

I think it's not so much a removal as these metrics simply not being present/possible with CSI. Previously (as in, before CSI) the kubelet managed mounting/preparing/managing volumes, which allowed it to consistently expose metrics about any volume it mounted. Now that the kubelet doesn't do this, it simply can't expose the metrics either.

@dcardozoo

Hi, is there an update on this issue? Is there any workaround someone can suggest? Currently the only metrics about persistent storage available in Prometheus for me are kube_persistentvolume*.

@metalmatze
Member

It seems there's some progress in this area, but we're not involved directly: kubernetes/kubernetes#76188

@jingxu97

cc @gnufied @msau42

@msau42

msau42 commented May 15, 2019

Are you seeing issues with non-CSI volumes or CSI volumes? Capacity usage for non-CSI volumes should work, and CSI volumes are being fixed in kubernetes/kubernetes#76188

@stale

stale bot commented Aug 14, 2019

This issue has been automatically marked as stale because it has not had any activity in last 60d. Thank you for your contributions.

@stale stale bot added the stale label Aug 14, 2019
@stale stale bot closed this as completed Aug 21, 2019
@paulfantom paulfantom reopened this Aug 21, 2019
@stale stale bot removed the stale label Aug 21, 2019
@bostrt

bostrt commented Sep 10, 2019

Here's a doc on using the sidecar container method, with no need for special privileges. Tested on OpenShift 3.11 (Kubernetes 1.11.0):

https://access.redhat.com/solutions/4406661

@stale

stale bot commented Nov 9, 2019

This issue has been automatically marked as stale because it has not had any activity in last 60d. Thank you for your contributions.

@onedr0p

onedr0p commented Feb 7, 2020

Anyone have a status on this issue in 2020? It would be nice to have these metrics reported without having to deploy a sidecar to every pod. I am using CSI volumes with rook-ceph. Thanks!

@brancz
Contributor

brancz commented Feb 10, 2020

I think that would be best answered by sig storage people on Kubernetes. I don’t know off the top of my head.

@onedr0p

onedr0p commented Feb 10, 2020

Feels like the Spiderman meme with Prometheus operator, rook-ceph, and sig storage team pointing at each other. 😄 I'll still continue to dig into this issue.

@mdgreenwald

mdgreenwald commented Mar 2, 2020

This query worked for me and yields a percentage:

(kubelet_volume_stats_capacity_bytes{persistentvolumeclaim="$volume"} - kubelet_volume_stats_available_bytes{persistentvolumeclaim="$volume"}) / kubelet_volume_stats_capacity_bytes{persistentvolumeclaim="$volume"} * 100
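If the kubelet metrics are there, the same expression can also be dropped into a PrometheusRule so the operator turns it into an alert; a minimal sketch (the names and the 85% threshold are arbitrary, and the metadata labels must match your Prometheus ruleSelector; the ones shown are the kube-prometheus defaults):

{
  "apiVersion": "monitoring.coreos.com/v1",
  "kind": "PrometheusRule",
  "metadata": {
    "name": "pvc-usage",
    "labels": { "prometheus": "k8s", "role": "alert-rules" }
  },
  "spec": {
    "groups": [
      {
        "name": "pvc-usage.rules",
        "rules": [
          {
            "alert": "PersistentVolumeClaimUsageHigh",
            "expr": "(kubelet_volume_stats_capacity_bytes - kubelet_volume_stats_available_bytes) / kubelet_volume_stats_capacity_bytes * 100 > 85",
            "for": "15m",
            "labels": { "severity": "warning" }
          }
        ]
      }
    ]
  }
}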

@onedr0p

onedr0p commented Mar 2, 2020

@mdgreenwald That doesn't help if you're unable to get any kubelet_volume_* metrics gathered.

@stale

stale bot commented May 1, 2020

This issue has been automatically marked as stale because it has not had any activity in last 60d. Thank you for your contributions.

@stale stale bot added the stale label May 1, 2020
@kfirfer

kfirfer commented Jul 20, 2020

Came across this problem and am facing it as well.

@stale stale bot removed the stale label Jul 20, 2020
@lianfulei

I also have this problem

@0xMH

0xMH commented Jul 27, 2020

Are there any workarounds for now, instead of having to deploy this sidecar to every pod?

@lianfulei

@0xMH you could write a detection script.

@gnufied

gnufied commented Jul 28, 2020

Sorry, re-reading this thread: from the sig-storage perspective, and as far as I know, persistent volume metrics are still being reported by the kubelet. For in-tree volume types this should already work. For any CSI volume type, if the driver implements the NodeGetVolumeStats RPC call, then PV metrics should be available from the kubelet.

Note that these metrics are tied to the lifecycle of a pod on a node; that is, they are only reported while a volume is mounted/in use on the node, and that is expected behaviour.

@gnufied

gnufied commented Jul 28, 2020

If you notice that these metrics are missing for a particular driver/volume type, it is most likely a driver bug. If you think the driver is alright, please open a bug against kubernetes/kubernetes and we will do our best to address it.
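As a quick check of whether the kubelet on a given node exposes the volume stats at all, something along these lines works (a sketch, assuming kubectl can reach the node proxy; replace the node name):

kubectl get --raw /api/v1/nodes/<node-name>/proxy/metrics | grep kubelet_volume_stats

If that returns nothing while a pod is actively using the volume, the driver side is the place to look.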

@Saadwalami

Any good news about this issue? Because I am having the same problem.

@xander-sh

We had the same problem on k8s 1.17.4 and the VMware CSI driver v1.1.0.
After updating the VMware CSI driver to v2.0.1, the metrics are present on the kubelet.
I think you should check whether your PV driver supports these metrics (i.e. NodeGetVolumeStats):
https://github.com/kubernetes-sigs/vsphere-csi-driver/pull/108/files

@mickours

Same problem with the cinder csi driver on managed Kubernetes instance at OVHCloud.
kubernetes/cloud-provider-openstack#1064

@jichenjc

Same problem with the cinder csi driver on managed Kubernetes instance at OVHCloud.
kubernetes/cloud-provider-openstack#1064

I used curl -k https://localhost:6443/api/v1/nodes/127.0.0.1/proxy/metrics, and the support at https://github.com/kubernetes/cloud-provider-openstack/blob/master/pkg/csi/cinder/nodeserver.go#L466 is already implemented, but:

# curl -k https://localhost:6443/api/v1/nodes/127.0.0.1/proxy/metrics | grep kubelet_vo
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  107k    0  107k    0     0  2917k      0 --:--:-- --:--:-- --:--:-- 2917k

I see nothing... any comments on what additional stuff needs to be added?

@Davidrjx

Similar case with AWS EFS CSI, which has implemented the NodeGetVolumeStats RPC call, but there are no metrics exposing EFS PV usage in Prometheus; I can't even find the EFS PV.

@lkravi

lkravi commented Jun 9, 2021

Yes, having the same issue for AWS EFS CSI; it's working fine for EBS.

@stale

stale bot commented Aug 8, 2021

This issue has been automatically marked as stale because it has not had any activity in the last 60 days. Thank you for your contributions.

@stale stale bot added the stale label Aug 8, 2021
@mehtameha1

I have the same issue with EBS CSI as well.

@lianfulei

lianfulei commented Jul 22, 2022 via email

@github-actions github-actions bot removed the stale label Jan 11, 2023
@github-actions
Contributor

This issue has been automatically marked as stale because it has not had any activity in the last 60 days. Thank you for your contributions.

@github-actions github-actions bot added the stale label Mar 13, 2023
@github-actions
Contributor

This issue was closed because it has not had any activity in the last 120 days. Please reopen if you feel this is still valid.

@github-actions github-actions bot closed this as not planned Jul 11, 2023