csi: Implement NodeServiceCapability_RPC_GET_VOLUME_STATS rpc call #76188

Merged
merged 1 commit into from May 18, 2019

Conversation

humblec
Contributor

@humblec humblec commented Apr 5, 2019

Signed-off-by: Humble Chirammal hchiramm@redhat.com

What type of PR is this?

Uncomment only one /kind <> line, hit enter to put that in a new line, and remove leading whitespaces from that line:

/kind api-change
/kind bug
/kind cleanup
/kind design
/kind documentation
/kind failing-test
/kind feature
/kind flake

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Expose CSI volume stats via kubelet volume metrics

@k8s-ci-robot k8s-ci-robot added do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Apr 5, 2019
@k8s-ci-robot k8s-ci-robot added needs-priority Indicates a PR lacks a `priority/foo` label and requires one. sig/storage Categorizes an issue or PR as relevant to SIG Storage. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Apr 5, 2019
@humblec
Contributor Author

humblec commented Apr 5, 2019

/retest

@humblec
Contributor Author

humblec commented Apr 5, 2019

/assign @gnufied

@humblec
Contributor Author

humblec commented Apr 5, 2019

/release-note-none

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Apr 5, 2019
@humblec humblec force-pushed the csi-in-m branch 2 times, most recently from 0e44d82 to 9dca72a Compare April 5, 2019 10:29
@humblec humblec changed the title csi: Add capability check for NodeServiceCapability_RPC_GET_VOLUME_STATS csi: Implement NodeServiceCapability_RPC_GET_VOLUME_STATS rpc call Apr 5, 2019
@humblec
Contributor Author

humblec commented Apr 5, 2019

@gnufied Can you please take a look at this PR? :)

@humblec
Contributor Author

humblec commented Apr 5, 2019

/assign @rootfs

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 5, 2019
@humblec humblec force-pushed the csi-in-m branch 3 times, most recently from f6607c0 to e6604b0 Compare April 6, 2019 10:32
@gnufied
Member

gnufied commented May 14, 2019

/assign @msau42

@@ -841,3 +849,94 @@ func (c *csiClientGetter) Get() (csiClient, error) {
c.csiClient = csi
return c.csiClient, nil
}

func (c *csiDriverClient) NodeSupportsVolumeStats(ctx context.Context) (bool, error) {
klog.V(4).Info(log("calling NodeGetCapabilities rpc to determine if NodeSupportsVolumeStats"))
Member

Do we really need this log? Can we raise its verbosity level, e.g. to 5?

Contributor Author

Lifted to V(5).
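
The capability check discussed in this thread can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `rpcType` constants and the `nodeCapability` type are hypothetical stand-ins for the generated CSI bindings, and the only assumption is that the driver's NodeGetCapabilities response is a list of capabilities that may or may not include the volume-stats RPC.

```go
package main

import "fmt"

// rpcType is a hypothetical stand-in for the CSI node capability RPC enum.
// The names and values here are illustrative only.
type rpcType int

const (
	rpcUnknown rpcType = iota
	rpcStageUnstageVolume
	rpcGetVolumeStats
)

// nodeCapability is a simplified, hypothetical capability entry as a driver
// might return it from NodeGetCapabilities.
type nodeCapability struct {
	RPC rpcType
}

// supportsVolumeStats reports whether the advertised capabilities include
// the volume-stats RPC. An empty or nil list means "not supported".
func supportsVolumeStats(caps []nodeCapability) bool {
	for _, c := range caps {
		if c.RPC == rpcGetVolumeStats {
			return true
		}
	}
	return false
}

func main() {
	caps := []nodeCapability{{RPC: rpcStageUnstageVolume}, {RPC: rpcGetVolumeStats}}
	fmt.Println("with stats capability:", supportsVolumeStats(caps))
	fmt.Println("without capabilities:", supportsVolumeStats(nil))
}
```

Checking the capability first lets the caller skip the stats RPC entirely for drivers that do not advertise it.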

}

func (c *csiDriverClient) NodeGetVolumeStats(ctx context.Context, volID string, targetPath string) (*volume.Metrics, error) {
klog.V(4).Info(log("calling NodeGetVolumeStats rpc: [volid=%s, target_path=%s]", volID, targetPath))
Member

How often is this called?

Member

As often as Prometheus is configured to scrape the metrics endpoint.

Unable to scrape any metrics for the AWS EFS CSI driver; not sure whether the RPC call was executed or not.

for _, usage := range usages {
unit := usage.GetUnit()
switch unit.String() {
case "BYTES":
Member

Contributor Author

Converted to consts.

metrics.Inodes = resource.NewQuantity(usage.GetTotal(), resource.BinarySI)
metrics.InodesUsed = resource.NewQuantity(usage.GetUsed(), resource.BinarySI)
default:
klog.Errorf("unknown key %s in usage", unit.String())
Member

Does this mean that the CSI spec added more units that we don't support yet?

Should it be a warning, or even Info at level 5 if we expect this to flood logs?

Member

This should ideally never happen; if it does, it is probably an error worth fixing in either k8s or the CSI driver.
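
The unit handling under review here can be sketched as below, using named constants instead of string literals as suggested in the thread. The `usage` and `volumeMetrics` types are hypothetical simplifications (plain `int64` fields rather than the `resource.Quantity` values the PR uses), and collecting unknown units into a slice is an illustrative choice; the PR itself logs them.

```go
package main

import "fmt"

// Unit names as they appear in the switch under review.
const (
	unitBytes  = "BYTES"
	unitInodes = "INODES"
)

// usage is a hypothetical, simplified form of one usage entry in a
// NodeGetVolumeStats response.
type usage struct {
	Unit      string
	Total     int64
	Used      int64
	Available int64
}

// volumeMetrics is a hypothetical stand-in for the kubelet metrics struct.
type volumeMetrics struct {
	Capacity, Used, Available      int64
	Inodes, InodesUsed, InodesFree int64
}

// toMetrics maps usage entries onto metrics by unit and returns any unit
// names it did not recognize so the caller can decide how to report them.
func toMetrics(usages []usage) (volumeMetrics, []string) {
	var m volumeMetrics
	var unknown []string
	for _, u := range usages {
		switch u.Unit {
		case unitBytes:
			m.Capacity, m.Used, m.Available = u.Total, u.Used, u.Available
		case unitInodes:
			m.Inodes, m.InodesUsed, m.InodesFree = u.Total, u.Used, u.Available
		default:
			unknown = append(unknown, u.Unit)
		}
	}
	return m, unknown
}

func main() {
	m, unknown := toMetrics([]usage{
		{Unit: unitBytes, Total: 10},
		{Unit: "BLOCKS", Total: 1}, // made-up unit to exercise the default branch
	})
	fmt.Printf("%+v unknown=%v\n", m, unknown)
}
```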

@@ -408,6 +408,8 @@ func (p *csiPlugin) NewMounter(
}
klog.V(4).Info(log("created path successfully [%s]", dataDir))

mounter.MetricsProvider = NewMetricsCsi(volumeHandle, dataDir)
Member

Should the path be dir or dataDir? dir is the one we pass as the targetPath to NodePublish?

Member

Good catch, this should indeed be dir and not dataDir.

Member

Can we make sure to validate this with a real CSI driver?

Contributor Author

Replaced the path.

Can we make sure to validate this with a real CSI driver?

Yes, that's the very next plan. I don't think any CSI drivers exist yet that expose these metrics. I have started the work for Ceph-CSI anyway.

Member

This can also be tested via the CSI mock driver, FWIW. The mock driver can implement the NodeGetVolumeStats RPC call, check that the volume path is the same as the publish path, and return an error if it isn't.

This has to go in an e2e test though. @humblec, can you open a GitHub issue for creating an e2e test for this?

Contributor Author

Sure @gnufied. I am opening an issue for e2e tracking, as discussed earlier.

Contributor Author

@gnufied The e2e issue is here: #77933. Thanks!
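
The mock-driver check suggested earlier in this thread might look roughly like the sketch below. Everything here is hypothetical: `mockNode`, `publish`, and `volumeStats` are illustrative names rather than the real CSI mock driver's API, and the returned value of 10 is a handcrafted placeholder.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// mockNode is a hypothetical in-memory node service that remembers the
// target path each volume was published at.
type mockNode struct {
	published map[string]string // volume ID -> publish target path
}

func newMockNode() *mockNode {
	return &mockNode{published: make(map[string]string)}
}

// publish records the target path, as a NodePublishVolume handler might.
func (n *mockNode) publish(volID, targetPath string) {
	n.published[volID] = filepath.Clean(targetPath)
}

// volumeStats returns a handcrafted capacity only if the requested volume
// path matches the path the volume was published at.
func (n *mockNode) volumeStats(volID, volumePath string) (int64, error) {
	want, ok := n.published[volID]
	if !ok {
		return 0, fmt.Errorf("volume %q is not published", volID)
	}
	if filepath.Clean(volumePath) != want {
		return 0, fmt.Errorf("volume path %q does not match publish path %q", volumePath, want)
	}
	return 10, nil
}

func main() {
	n := newMockNode()
	n.publish("vol-1", "/pods/example/mount")

	_, err := n.volumeStats("vol-1", "/pods/example/mount")
	fmt.Println("matching path error:", err)

	_, err = n.volumeStats("vol-1", "/pods/example/data")
	fmt.Println("mismatched path error:", err)
}
```

A check like this would have caught the dir versus dataDir mix-up, because the stats request would arrive with a path the mock never published.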

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 15, 2019
@humblec
Contributor Author

humblec commented May 15, 2019

@msau42 the final review comments are addressed. PTAL. Thanks!

@gnufied
Member

gnufied commented May 15, 2019

@humblec this PR was discussed in the wg-csi-implementation standup, and it was agreed that we should try to manually validate whether CSI volume stats are available through kubelet metrics when using a real CSI driver (even the mock or hostpath driver would do).

We can do e2e as a follow up but could we do the manual validation for ANY CSI driver?

@msau42
Member

msau42 commented May 15, 2019

/approve
/hold
for testing with a real csi driver

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 15, 2019
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: humblec, msau42

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 15, 2019
@humblec
Contributor Author

humblec commented May 17, 2019

@humblec this PR was discussed in the wg-csi-implementation standup, and it was agreed that we should try to manually validate whether CSI volume stats are available through kubelet metrics when using a real CSI driver (even the mock or hostpath driver would do).
We can do e2e as a follow up but could we do the manual validation for ANY CSI driver?

Sure @gnufied and @msau42. I patched the hostpath driver to return a handcrafted response for this RPC call (also note the path /var/lib/kubelet/pods/4891f5a7-2c72-4f33-8e4b-88ba63b5eb1a/volumes/kubernetes.io~csi/pvc-b1e01c02-063a-467c-9f88-621cadbf780f/mount in the request) and tested it. The result is that I could see the metrics in kubelet. Below is example output for the RPC call, followed by the metrics in kubelet.

Logs from CSI driver:

I0516 15:55:17.079907       1 server.go:117] GRPC call: /csi.v1.Node/NodeGetVolumeStats
I0516 15:55:17.079920       1 server.go:118] GRPC request: {"volume_id":"c0419fd7-77f2-11e9-b47e-c85b7636c232","volume_path":"/var/lib/kubelet/pods/4891f5a7-2c72-4f33-8e4b-88ba63b5eb1a/volumes/kubernetes.io~csi/pvc-b1e01c02-063a-467c-9f88-621cadbf780f/mount"}
I0516 15:55:17.080500       1 mount_linux.go:164] Detected OS without systemd
I0516 15:55:17.080510       1 nodeserver.go:302] VolumeStats: stats on targetpath/volumeID /var/lib/kubelet/pods/4891f5a7-2c72-4f33-8e4b-88ba63b5eb1a/volumes/kubernetes.io~csi/pvc-b1e01c02-063a-467c-9f88-621cadbf780f/mount/c0419fd7-77f2-11e9-b47e-c85b7636c232 has been requested.
E0516 15:55:17.080519       1 nodeserver.go:304] VolumeStats: stats on targetpath/volumeID /var/lib/kubelet/pods/4891f5a7-2c72-4f33-8e4b-88ba63b5eb1a/volumes/kubernetes.io~csi/pvc-b1e01c02-063a-467c-9f88-621cadbf780f/mount/c0419fd7-77f2-11e9-b47e-c85b7636c232 has been requested.
I0516 15:55:17.080784       1 nodeserver.go:316] targetpath /var/lib/kubelet/pods/4891f5a7-2c72-4f33-8e4b-88ba63b5eb1a/volumes/kubernetes.io~csi/pvc-b1e01c02-063a-467c-9f88-621cadbf780f/mount: already mounted
I0516 15:55:17.080794       1 server.go:123] GRPC response: {"usage":[{"total":10,"unit":1}]}

Logs/stats collected in kubelet:


kubelet_volume_stats_available_bytes{namespace="default",persistentvolumeclaim="csi-pvc"} 0
kubelet_volume_stats_capacity_bytes{namespace="default",persistentvolumeclaim="csi-pvc"} 10
kubelet_volume_stats_inodes{namespace="default",persistentvolumeclaim="csi-pvc"} 0
kubelet_volume_stats_inodes_free{namespace="default",persistentvolumeclaim="csi-pvc"} 0
kubelet_volume_stats_inodes_used{namespace="default",persistentvolumeclaim="csi-pvc"} 0
kubelet_volume_stats_used_bytes{namespace="default",persistentvolumeclaim="csi-pvc"} 0

Hope this is good to go in, so we can declare metrics support in CSI. Thanks a lot!
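
To make the relationship between the logged GRPC response and the kubelet metrics easier to follow, here is a small sketch that decodes the JSON from the driver log above. The struct and function names are hypothetical; the unit numbering (1 for BYTES, 2 for INODES) follows the CSI spec's usage unit enum. The handcrafted response only sets a byte total of 10, which is consistent with kubelet_volume_stats_capacity_bytes being 10 and the other series being 0.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// volumeUsage mirrors the JSON fields seen in the logged response.
// Fields missing from the JSON decode to zero.
type volumeUsage struct {
	Available int64 `json:"available"`
	Total     int64 `json:"total"`
	Used      int64 `json:"used"`
	Unit      int   `json:"unit"`
}

// statsResponse is a hypothetical wrapper for the logged response body.
type statsResponse struct {
	Usage []volumeUsage `json:"usage"`
}

// unitName translates the numeric unit into its CSI enum name.
func unitName(u int) string {
	switch u {
	case 1:
		return "BYTES"
	case 2:
		return "INODES"
	default:
		return "UNKNOWN"
	}
}

// decodeStats parses a logged response body into a statsResponse.
func decodeStats(body string) (statsResponse, error) {
	var resp statsResponse
	err := json.Unmarshal([]byte(body), &resp)
	return resp, err
}

func main() {
	// Response body copied from the driver log in this comment.
	resp, err := decodeStats(`{"usage":[{"total":10,"unit":1}]}`)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	for _, u := range resp.Usage {
		fmt.Printf("unit=%s total=%d used=%d available=%d\n",
			unitName(u.Unit), u.Total, u.Used, u.Available)
	}
}
```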

and implement Metrics Provider for CSI driver

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
@gnufied
Member

gnufied commented May 17, 2019

/hold cancel
/lgtm

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels May 17, 2019
@fejta-bot

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@k8s-ci-robot k8s-ci-robot merged commit fad34e4 into kubernetes:master May 18, 2019
humblec added a commit to humblec/org that referenced this pull request Apr 1, 2020
irvifa pushed a commit to irvifa/org that referenced this pull request Jul 1, 2020
@Davidrjx

Hi @humblec, did you hard-code a specific PV in the driver and then execute the GetVolumeStats RPC call?

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
9 participants