Monitoring Kubernetes PersistentVolumes #2359

Closed
kaarolch opened this issue Jan 30, 2019 · 35 comments


@kaarolch

Since 1.12 the Kubernetes team has removed many core metrics from the kubelet; e.g. the PV metrics were dropped (info). Does someone have an idea what would be the best way to monitor PV usage? Below I describe my workaround, but unfortunately it gives quite high permissions to the node_exporter container :D
What did you do?
Switch node_exporter rootfs to:

"--path.rootfs=/rootfs"

Mount the kubelet disk plugin directory, in my case ceph.rook.io:

"volumeMounts": [
              {
                "name": "proc",
                "readOnly": true,
                "mountPath": "/host/proc"
              },
              {
                "name": "sys",
                "readOnly": true,
                "mountPath": "/host/sys"
              },
              {
                "name": "rootfs",
                "readOnly": true,
                "mountPath": "/rootfs/var/lib/kubelet/plugins/ceph.rook.io/rook-ceph-system/mounts/"
              }
            ]

And add volumes:

"volumes": [
          {
            "name": "proc",
            "hostPath": {
              "path": "/proc",
              "type": ""
            }
          },
          {
            "name": "sys",
            "hostPath": {
              "path": "/sys",
              "type": ""
            }
          },
          {
            "name": "rootfs",
            "hostPath": {
              "path": "/var/lib/kubelet/plugins/ceph.rook.io/rook-ceph-system/mounts/",
              "type": ""
            }
          }
        ],

These options reduce the number of paths mounted by node_exporter.
Next I need to run the pod as root, as below, or set privileged: true on the node_exporter container.

"securityContext": {
          "runAsUser": 0,
          "runAsNonRoot": false
        },
"securityContext": {
          "privileged": true
        },

Unfortunately, neither solution (privileged or root) is nice, and I use it only as a temporary workaround to get PV usage stats from the node. Mounting only the kubelet storage plugin folder increases security a little. From my perspective this is still not enough, because node_exporter has full rights to all persistent data....
What did you expect to see?
The possibility to get persistent volume usage without giving full rights to node_exporter.

Environment

  • Prometheus Operator version:

    0.17

  • Kubernetes version information:

Server Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.4", GitCommit:"f49fa022dbe63faafd0da106ef7e05a29721d3f1", GitTreeState:"clean", BuildDate:"2018-12-14T06:59:37Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}

  • Kubernetes cluster kind:

vanilla Kubernetes via kubespray (kubeadm)

@brancz
Contributor

brancz commented Feb 4, 2019

Yes, this is very unfortunate, and I don't have a good answer. I've heard of people putting node-exporter into their Pods as a sidecar to monitor the mounted volumes. This is something you should take to sig-storage in Kubernetes, as they should make sure these metrics are available.
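For anyone considering the sidecar route, a rough sketch of what it can look like (the app container, image tags, mount path, and claim name are only illustrative):

"containers": [
          {
            "name": "app",
            "image": "nginx:1.17",
            "volumeMounts": [
              { "name": "data", "mountPath": "/data" }
            ]
          },
          {
            "name": "node-exporter",
            "image": "quay.io/prometheus/node-exporter:v0.17.0",
            "ports": [
              { "name": "metrics", "containerPort": 9100 }
            ],
            "volumeMounts": [
              { "name": "data", "mountPath": "/data", "readOnly": true }
            ]
          }
        ],
"volumes": [
          {
            "name": "data",
            "persistentVolumeClaim": { "claimName": "my-claim" }
          }
        ]

The filesystem collector then exposes node_filesystem_size_bytes / node_filesystem_avail_bytes for the /data mountpoint on port 9100, without host mounts or extra privileges; the trade-off is one extra container per pod.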

@andrewsav-bt

@kaarolch can you clarify where you read that a bunch of metrics were removed in 1.12? The issue you linked states that "1.12, the kubelet exposes a number of sources for metrics" and outlines a plan of removing some of them in future versions. I was not able to find any evidence that anything was removed in 1.12.

Persistent volume monitoring is indeed broken, since kubelet_volume_stats_* are no longer in Prometheus; I am, however, not convinced that it's Kubernetes' fault as such.

@kaarolch
Author

@andrewsav-datacom hmm, but in the link from my post there is a summary:

Current kubelet metrics that are not included in core metrics

  • Pod and Node-level Network Metrics
  • Persistent Volume Metrics

So, if I understand correctly, the PV metrics are no longer included in the core metrics and would probably be moved to the CSI storage side.

@AndrewSav

AndrewSav commented Feb 13, 2019

I read this as "desired future state".

@brancz
Contributor

brancz commented Feb 18, 2019

I think it's not so much a removal as these metrics simply not being present/possible with CSI. Previously (as in, before CSI) the kubelet managed mounting/preparing/managing volumes, which allowed it to consistently expose metrics about any volume it mounted. Now that the kubelet doesn't do this, it simply can't expose the metrics either.

@dcardozoo

Hi, is there an update on this issue? Is there any workaround someone can suggest? Currently the only metrics about persistent storage available in Prometheus for me are kube_persistentvolume*.

@metalmatze
Member

It seems there's some progress in this area, but we're not involved directly: kubernetes/kubernetes#76188

@jingxu97

cc @gnufied @msau42

@msau42

msau42 commented May 15, 2019

Are you seeing issues with non-CSI volumes or CSI volumes? Capacity usage for non-CSI volumes should work, and CSI volumes are being fixed in kubernetes/kubernetes#76188

@stale

stale bot commented Aug 14, 2019

This issue has been automatically marked as stale because it has not had any activity in last 60d. Thank you for your contributions.

@stale stale bot added the stale label Aug 14, 2019
@stale stale bot closed this as completed Aug 21, 2019
@paulfantom paulfantom reopened this Aug 21, 2019
@stale stale bot removed the stale label Aug 21, 2019
@bostrt

bostrt commented Sep 10, 2019

Here's a doc on using the sidecar container method, with no need for special privileges. Tested on OpenShift 3.11 (Kubernetes 1.11.0):

https://access.redhat.com/solutions/4406661

@stale

stale bot commented Nov 9, 2019

This issue has been automatically marked as stale because it has not had any activity in last 60d. Thank you for your contributions.

@onedr0p

onedr0p commented Feb 7, 2020

Anyone have a status on this issue in 2020? It would be nice to have these metrics reported without having to deploy a sidecar to every pod. I am using CSI volumes with rook-ceph. Thanks!

@brancz
Contributor

brancz commented Feb 10, 2020

I think that would be best answered by sig storage people on Kubernetes. I don’t know off the top of my head.

@onedr0p

onedr0p commented Feb 10, 2020

Feels like the Spiderman meme with Prometheus operator, rook-ceph, and sig storage team pointing at each other. 😄 I'll still continue to dig into this issue.

@mdgreenwald

mdgreenwald commented Mar 2, 2020

This query worked for me and yields a percentage:

(kubelet_volume_stats_capacity_bytes{persistentvolumeclaim="$volume"} - kubelet_volume_stats_available_bytes{persistentvolumeclaim="$volume"}) / kubelet_volume_stats_capacity_bytes{persistentvolumeclaim="$volume"} * 100
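If the kubelet metrics are there, the same expression can also be dropped into a PrometheusRule so the operator turns it into an alert; a minimal sketch (the names and the 85% threshold are arbitrary, and the metadata labels must match your Prometheus ruleSelector; the ones shown are the kube-prometheus defaults):

{
  "apiVersion": "monitoring.coreos.com/v1",
  "kind": "PrometheusRule",
  "metadata": {
    "name": "pvc-usage",
    "labels": { "prometheus": "k8s", "role": "alert-rules" }
  },
  "spec": {
    "groups": [
      {
        "name": "pvc-usage.rules",
        "rules": [
          {
            "alert": "PersistentVolumeClaimUsageHigh",
            "expr": "(kubelet_volume_stats_capacity_bytes - kubelet_volume_stats_available_bytes) / kubelet_volume_stats_capacity_bytes * 100 > 85",
            "for": "15m",
            "labels": { "severity": "warning" }
          }
        ]
      }
    ]
  }
}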

@onedr0p

onedr0p commented Mar 2, 2020

@mdgreenwald That doesn't help if you're unable to get any kubelet_volume_* metrics gathered.

@stale

stale bot commented May 1, 2020

This issue has been automatically marked as stale because it has not had any activity in last 60d. Thank you for your contributions.

@stale stale bot added the stale label May 1, 2020
@kfirfer

kfirfer commented Jul 20, 2020

Came across this problem and am facing it as well.

@stale stale bot removed the stale label Jul 20, 2020
@lianfulei

I also have this problem

@0xMH

0xMH commented Jul 27, 2020

Are there any workarounds for now, instead of having to deploy this sidecar to every pod?

@lianfulei

@0xMH you could write a detection script.

@gnufied

gnufied commented Jul 28, 2020

Sorry, re-reading this thread: from the sig-storage perspective, and as far as I know, persistent volume metrics are still being reported by the kubelet. For in-tree volume types this should already work. For any CSI volume type, if the driver implements the NodeGetVolumeStats RPC call, then PV metrics should be available from the kubelet.

Note that these metrics are tied to the lifecycle of a pod on a node; that is, they are only reported while a volume is mounted/in use on the node, and that is expected behaviour.

@gnufied

gnufied commented Jul 28, 2020

If you notice that these metrics are missing for a particular driver/volume type, it is most likely a driver bug. If you think the driver is alright, please open a bug against kubernetes/kubernetes and we will do our best to address it.
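As a quick check of whether the kubelet on a given node exposes the volume stats at all, something along these lines works (a sketch, assuming kubectl can reach the node proxy; replace the node name):

kubectl get --raw /api/v1/nodes/<node-name>/proxy/metrics | grep kubelet_volume_stats

If that returns nothing while a pod is actively using the volume, the driver side is the place to look.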

@Saadwalami

Any good news about this issue? Because I am having the same problem.

@xander-sh

We had the same problem on k8s 1.17.4 and the VMware CSI driver v1.1.0.
After updating the VMware CSI driver to v2.0.1, the metrics are present on the kubelet.
I think you should check whether your PV driver supports these metrics (i.e. NodeGetVolumeStats):
https://github.com/kubernetes-sigs/vsphere-csi-driver/pull/108/files

@mickours

Same problem with the cinder csi driver on managed Kubernetes instance at OVHCloud.
kubernetes/cloud-provider-openstack#1064

@jichenjc

Same problem with the cinder csi driver on managed Kubernetes instance at OVHCloud.
kubernetes/cloud-provider-openstack#1064

I used curl -k https://localhost:6443/api/v1/nodes/127.0.0.1/proxy/metrics, and the support at https://github.com/kubernetes/cloud-provider-openstack/blob/master/pkg/csi/cinder/nodeserver.go#L466 is already implemented, but:

# curl -k https://localhost:6443/api/v1/nodes/127.0.0.1/proxy/metrics | grep kubelet_vo
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  107k    0  107k    0     0  2917k      0 --:--:-- --:--:-- --:--:-- 2917k

I see nothing... any comments on what additional stuff needs to be added?

@Davidrjx

Similar case with AWS EFS CSI, which has implemented the NodeGetVolumeStats RPC call, but there are no metrics exposing EFS PV usage in Prometheus; I can't even find the EFS PV.

@lkravi

lkravi commented Jun 9, 2021

Yes, having the same issue for AWS EFS CSI; it's working fine for EBS.

@stale

stale bot commented Aug 8, 2021

This issue has been automatically marked as stale because it has not had any activity in the last 60 days. Thank you for your contributions.

@stale stale bot added the stale label Aug 8, 2021
@mehtameha1

I have the same issue with EBS CSI as well.

@lianfulei

lianfulei commented Jul 22, 2022 via email

@github-actions github-actions bot removed the stale label Jan 11, 2023
@github-actions
Contributor

This issue has been automatically marked as stale because it has not had any activity in the last 60 days. Thank you for your contributions.

@github-actions github-actions bot added the stale label Mar 13, 2023
@github-actions
Contributor

This issue was closed because it has not had any activity in the last 120 days. Please reopen if you feel this is still valid.

@github-actions github-actions bot closed this as not planned Jul 11, 2023