csi: Implement NodeServiceCapability_RPC_GET_VOLUME_STATS rpc call #76188

Merged
k8s-ci-robot merged 1 commit into kubernetes:master from humblec:csi-in-m on May 18, 2019

Conversation

@humblec
Contributor

commented Apr 5, 2019

Signed-off-by: Humble Chirammal hchiramm@redhat.com

What type of PR is this?

Uncomment only one /kind <> line, hit enter to put that in a new line, and remove leading whitespaces from that line:

/kind api-change
/kind bug
/kind cleanup
/kind design
/kind documentation
/kind failing-test
/kind feature
/kind flake

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Expose CSI volume stats via kubelet volume metrics
@humblec

Contributor Author

commented Apr 5, 2019

/retest

@humblec humblec force-pushed the humblec:csi-in-m branch from 6a2a39f to c16747d Apr 5, 2019

@humblec

Contributor Author

commented Apr 5, 2019

/assign @gnufied

@humblec

Contributor Author

commented Apr 5, 2019

/release-note-none

@humblec humblec force-pushed the humblec:csi-in-m branch 2 times, most recently from 0e44d82 to 9dca72a Apr 5, 2019

@humblec humblec changed the title from "csi: Add capability check for NodeServiceCapability_RPC_GET_VOLUME_STATS" to "csi: Implement NodeServiceCapability_RPC_GET_VOLUME_STATS rpc call" Apr 5, 2019

@humblec humblec force-pushed the humblec:csi-in-m branch from 70bfdd4 to 94b8697 Apr 5, 2019

@humblec

Contributor Author

commented Apr 5, 2019

@gnufied Can you please take a look at this PR? :)

@humblec

Contributor Author

commented Apr 5, 2019

/assign @rootfs

@humblec humblec force-pushed the humblec:csi-in-m branch from 94b8697 to 1f87855 Apr 5, 2019

@k8s-ci-robot k8s-ci-robot added size/L and removed size/M labels Apr 5, 2019

@humblec humblec force-pushed the humblec:csi-in-m branch 3 times, most recently from f6607c0 to e6604b0 Apr 6, 2019

if (*(metrics.Inodes)).Cmp(*inodes) != 0 {
t.Fatalf("for %s: error: expected :%v , got: %v", tc.name, *inodes, *(metrics.Inodes))
}
if (*(metrics.InodesUsed)).Cmp(*usedInodes) != 0 {

@gnufied

gnufied May 14, 2019

Member

same here about use of Cmp function.

@humblec

humblec May 14, 2019

Author Contributor

Done.

@humblec humblec force-pushed the humblec:csi-in-m branch from b920d17 to ce1e86f May 14, 2019

@humblec

Contributor Author

commented May 14, 2019

> One tiny nit and please squash all the commits in one. rest looks good to me.

Sure, it's done in the latest patch set. Thanks @gnufied!

@gnufied

Member

commented May 14, 2019

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label May 14, 2019

@gnufied

Member

commented May 14, 2019

/assign @msau42

@@ -841,3 +849,94 @@ func (c *csiClientGetter) Get() (csiClient, error) {
c.csiClient = csi
return c.csiClient, nil
}

func (c *csiDriverClient) NodeSupportsVolumeStats(ctx context.Context) (bool, error) {
klog.V(4).Info(log("calling NodeGetCapabilities rpc to determine if NodeSupportsVolumeStats"))

@msau42

msau42 May 14, 2019

Member

Do we really need this log? Can we raise its verbosity level, e.g. to 5?

@humblec

humblec May 15, 2019

Author Contributor

lifted to V(5).
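
For readers following the thread, the capability check behind NodeSupportsVolumeStats can be sketched roughly as below. This is a minimal, self-contained illustration and not the code in this PR: `RPCType`, its constants, and `nodeSupports` are hypothetical stand-ins for the CSI capability enum and the kubelet client logic, and the slice stands in for what a NodeGetCapabilities call would return.

```go
package main

import "fmt"

// RPCType is a hypothetical stand-in for the CSI node service
// capability RPC type enum; it is not the real generated type.
type RPCType int

const (
	RPCUnknown RPCType = iota
	RPCStageUnstageVolume
	RPCGetVolumeStats
)

// nodeSupports reports whether want appears among the capabilities
// that a node plugin advertised.
func nodeSupports(caps []RPCType, want RPCType) bool {
	for _, c := range caps {
		if c == want {
			return true
		}
	}
	return false
}

func main() {
	// Assumed example of an advertised capability list.
	advertised := []RPCType{RPCStageUnstageVolume, RPCGetVolumeStats}
	fmt.Println(nodeSupports(advertised, RPCGetVolumeStats))
	fmt.Println(nodeSupports(nil, RPCGetVolumeStats))
}
```

A driver that advertises nothing simply yields `false`, so the caller can skip the stats RPC instead of treating the missing capability as an error.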

}

func (c *csiDriverClient) NodeGetVolumeStats(ctx context.Context, volID string, targetPath string) (*volume.Metrics, error) {
klog.V(4).Info(log("calling NodeGetVolumeStats rpc: [volid=%s, target_path=%s", volID, targetPath))

@msau42

msau42 May 14, 2019

Member

How often is this called?

@gnufied

gnufied May 14, 2019

Member

As often as Prometheus is configured to scrape the metrics endpoint.

for _, usage := range usages {
unit := usage.GetUnit()
switch unit.String() {
case "BYTES":

@msau42

msau42 May 14, 2019

Member

Should this use the consts defined here:

@humblec

humblec May 15, 2019

Author Contributor

Converted to consts.

metrics.Inodes = resource.NewQuantity(usage.GetTotal(), resource.BinarySI)
metrics.InodesUsed = resource.NewQuantity(usage.GetUsed(), resource.BinarySI)
default:
klog.Errorf("unknown key %s in usage", unit.String())

@msau42

msau42 May 14, 2019

Member

Does this mean that the CSI spec added more units that we don't support yet?

Should it be a warning, or even Info at level 5 if we expect this to flood logs?

@gnufied

gnufied May 14, 2019

Member

Ideally this should never happen; if it does, it is probably an error worth fixing in either k8s or the CSI driver.
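
The unit handling discussed in this thread can be sketched as follows. It is an illustration modeled on the diff hunk above, not the merged code: `UsageUnit`, `VolumeUsage`, `Metrics`, and `toMetrics` are hypothetical simplified types (plain `int64` instead of `resource.Quantity`), and collecting unknown units as errors instead of logging them is an assumption made so the behavior is easy to check.

```go
package main

import "fmt"

// UsageUnit is a hypothetical stand-in for the CSI usage unit enum.
type UsageUnit int

const (
	UnitUnknown UsageUnit = iota
	UnitBytes
	UnitInodes
)

// VolumeUsage mirrors one usage entry of a stats response (illustrative).
type VolumeUsage struct {
	Available, Total, Used int64
	Unit                   UsageUnit
}

// Metrics is a simplified stand-in for the kubelet volume metrics.
type Metrics struct {
	Available, Capacity, Used      int64
	InodesFree, Inodes, InodesUsed int64
}

// toMetrics switches on typed constants rather than string literals and
// reports every unit it does not recognize.
func toMetrics(usages []VolumeUsage) (Metrics, []error) {
	var m Metrics
	var errs []error
	for _, u := range usages {
		switch u.Unit {
		case UnitBytes:
			m.Available, m.Capacity, m.Used = u.Available, u.Total, u.Used
		case UnitInodes:
			m.InodesFree, m.Inodes, m.InodesUsed = u.Available, u.Total, u.Used
		default:
			errs = append(errs, fmt.Errorf("unknown unit %d in usage", int(u.Unit)))
		}
	}
	return m, errs
}

func main() {
	m, errs := toMetrics([]VolumeUsage{
		{Total: 10, Unit: UnitBytes},
		{Available: 3, Total: 5, Used: 2, Unit: UnitInodes},
		{Unit: UsageUnit(99)}, // a unit this sketch does not know about
	})
	fmt.Printf("%+v\n", m)
	for _, err := range errs {
		fmt.Println("error:", err)
	}
}
```

Switching on constants means a renamed or removed unit fails at compile time, while a unit added later to the spec falls through to the default branch, which is the case the reviewers are debating how loudly to report.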

@@ -408,6 +408,8 @@ func (p *csiPlugin) NewMounter(
}
klog.V(4).Info(log("created path successfully [%s]", dataDir))

mounter.MetricsProvider = NewMetricsCsi(volumeHandle, dataDir)

@msau42

msau42 May 14, 2019

Member

Should the path be dir or dataDir? dir is the one we pass as the targetPath to NodePublish?

@gnufied

gnufied May 14, 2019

Member

good catch, this indeed should be dir and not dataDir.

@msau42

msau42 May 14, 2019

Member

Can we make sure to validate this with a real CSI driver?

@humblec

humblec May 15, 2019

Author Contributor

Replaced the path.

> Can we make sure to validate this with a real CSI driver?

Yes, that's the very next plan. I don't think any CSI drivers exist yet that expose these metrics. I have started the work for Ceph-CSI anyway.

@gnufied

gnufied May 15, 2019

Member

This can also be tested via the CSI mock driver, FWIW. The mock driver can implement the NodeGetVolumeStats RPC call, check that the volume path is the same as the publish path, and return an error if it isn't.

This has to go in an e2e though. @humblec can you open a GitHub issue for creating an e2e for this?
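
A rough sketch of the mock-driver check suggested above, under stated assumptions: `mockNode`, `Publish`, and `VolumeStats` are hypothetical names, not the CSI mock driver's actual API, and the paths in `main` are made up. The idea is only that the mock remembers the target path it was asked to publish and rejects a stats request for any other path.

```go
package main

import (
	"fmt"
	"sync"
)

// mockNode is a hypothetical, heavily simplified mock node service.
type mockNode struct {
	mu        sync.Mutex
	published map[string]string // volume ID -> target path seen at publish time
}

func newMockNode() *mockNode {
	return &mockNode{published: make(map[string]string)}
}

// Publish records the target path for a volume.
func (n *mockNode) Publish(volID, targetPath string) {
	n.mu.Lock()
	defer n.mu.Unlock()
	n.published[volID] = targetPath
}

// VolumeStats fails unless volumePath matches the recorded publish path.
func (n *mockNode) VolumeStats(volID, volumePath string) error {
	n.mu.Lock()
	defer n.mu.Unlock()
	want, ok := n.published[volID]
	if !ok {
		return fmt.Errorf("volume %q was never published", volID)
	}
	if volumePath != want {
		return fmt.Errorf("volume %q: stats requested for %q, but published at %q", volID, volumePath, want)
	}
	return nil
}

func main() {
	node := newMockNode()
	node.Publish("vol-1", "/pods/example/volumes/csi/pv-1/mount")
	fmt.Println(node.VolumeStats("vol-1", "/pods/example/volumes/csi/pv-1/mount"))
	fmt.Println(node.VolumeStats("vol-1", "/pods/example/volumes/csi/pv-1"))
}
```

With a check like this, passing the parent data directory instead of the publish path surfaces as an RPC error in a test run instead of going unnoticed.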

@humblec

humblec May 15, 2019

Author Contributor

Sure @gnufied. I am opening an issue for e2e tracking, as discussed earlier.

@humblec

humblec May 15, 2019

Author Contributor

@gnufied The e2e issue is here: #77933. Thanks!

@humblec humblec force-pushed the humblec:csi-in-m branch from ce1e86f to a361a87 May 15, 2019

@k8s-ci-robot k8s-ci-robot removed the lgtm label May 15, 2019

@humblec

Contributor Author

commented May 15, 2019

@msau42 The final review comments are addressed. PTAL. Thanks!

@gnufied

Member

commented May 15, 2019

@humblec this PR was discussed in the wg-csi-implementation standup, and it was agreed that we should manually validate whether CSI volume stats are available through kubelet metrics when using a real CSI driver (even the mock or hostpath driver would do).

We can do the e2e as a follow-up, but could we do the manual validation with ANY CSI driver?

@msau42

Member

commented May 15, 2019

/approve
/hold
for testing with a real csi driver

@k8s-ci-robot

Contributor

commented May 15, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: humblec, msau42

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@humblec humblec force-pushed the humblec:csi-in-m branch from a361a87 to 656d83b May 17, 2019

@humblec

Contributor Author

commented May 17, 2019

> @humblec this PR was discussed in the wg-csi-implementation standup, and it was agreed that we should manually validate whether CSI volume stats are available through kubelet metrics when using a real CSI driver (even the mock or hostpath driver would do).
> We can do the e2e as a follow-up, but could we do the manual validation with ANY CSI driver?

Sure @gnufied and @msau42. I patched the hostpath driver to return a handcrafted response for this RPC call and tested it. Please also note and confirm the path /var/lib/kubelet/pods/4891f5a7-2c72-4f33-8e4b-88ba63b5eb1a/volumes/kubernetes.io~csi/pvc-b1e01c02-063a-467c-9f88-621cadbf780f/mount in the request. As a result, I can see the metrics in kubelet. Below is example output for the RPC call, followed by the metrics in kubelet.

Logs from CSI driver:

I0516 15:55:17.079907       1 server.go:117] GRPC call: /csi.v1.Node/NodeGetVolumeStats
I0516 15:55:17.079920       1 server.go:118] GRPC request: {"volume_id":"c0419fd7-77f2-11e9-b47e-c85b7636c232","volume_path":"/var/lib/kubelet/pods/4891f5a7-2c72-4f33-8e4b-88ba63b5eb1a/volumes/kubernetes.io~csi/pvc-b1e01c02-063a-467c-9f88-621cadbf780f/mount"}
I0516 15:55:17.080500       1 mount_linux.go:164] Detected OS without systemd
I0516 15:55:17.080510       1 nodeserver.go:302] VolumeStats: stats on targetpath/volumeID /var/lib/kubelet/pods/4891f5a7-2c72-4f33-8e4b-88ba63b5eb1a/volumes/kubernetes.io~csi/pvc-b1e01c02-063a-467c-9f88-621cadbf780f/mount/c0419fd7-77f2-11e9-b47e-c85b7636c232 has been requested.
E0516 15:55:17.080519       1 nodeserver.go:304] VolumeStats: stats on targetpath/volumeID /var/lib/kubelet/pods/4891f5a7-2c72-4f33-8e4b-88ba63b5eb1a/volumes/kubernetes.io~csi/pvc-b1e01c02-063a-467c-9f88-621cadbf780f/mount/c0419fd7-77f2-11e9-b47e-c85b7636c232 has been requested.
I0516 15:55:17.080784       1 nodeserver.go:316] targetpath /var/lib/kubelet/pods/4891f5a7-2c72-4f33-8e4b-88ba63b5eb1a/volumes/kubernetes.io~csi/pvc-b1e01c02-063a-467c-9f88-621cadbf780f/mount: already mounted
I0516 15:55:17.080794       1 server.go:123] GRPC response: {"usage":[{"total":10,"unit":1}]}

Logs/stats collected in kubelet:


kubelet_volume_stats_available_bytes{namespace="default",persistentvolumeclaim="csi-pvc"} 0
kubelet_volume_stats_capacity_bytes{namespace="default",persistentvolumeclaim="csi-pvc"} 10
kubelet_volume_stats_inodes{namespace="default",persistentvolumeclaim="csi-pvc"} 0
kubelet_volume_stats_inodes_free{namespace="default",persistentvolumeclaim="csi-pvc"} 0
kubelet_volume_stats_inodes_used{namespace="default",persistentvolumeclaim="csi-pvc"} 0
kubelet_volume_stats_used_bytes{namespace="default",persistentvolumeclaim="csi-pvc"} 0

I hope this is good to go in, so that we can declare metrics support in CSI. Thanks a lot!
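
To make the link between the logged GRPC response and the kubelet metrics explicit, here is a small worked sketch. It is illustrative only: `usage`, `statsResponse`, and `kubeletStats` are hypothetical names, the JSON field names other than `usage`, `total`, and `unit` are assumed, and the sketch assumes unit 1 means bytes and unit 2 means inodes, which is consistent with the logged response producing a capacity of 10 bytes.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// usage and statsResponse model the JSON shape seen in the driver log.
type usage struct {
	Available int64 `json:"available"`
	Total     int64 `json:"total"`
	Used      int64 `json:"used"`
	Unit      int   `json:"unit"`
}

type statsResponse struct {
	Usage []usage `json:"usage"`
}

// kubeletStats maps a stats response onto the six kubelet metric names.
// Assumption: unit 1 is bytes and unit 2 is inodes.
func kubeletStats(raw string) (map[string]int64, error) {
	var resp statsResponse
	if err := json.Unmarshal([]byte(raw), &resp); err != nil {
		return nil, err
	}
	const p = "kubelet_volume_stats_"
	stats := map[string]int64{
		p + "available_bytes": 0, p + "capacity_bytes": 0, p + "used_bytes": 0,
		p + "inodes": 0, p + "inodes_free": 0, p + "inodes_used": 0,
	}
	for _, u := range resp.Usage {
		switch u.Unit {
		case 1:
			stats[p+"available_bytes"] = u.Available
			stats[p+"capacity_bytes"] = u.Total
			stats[p+"used_bytes"] = u.Used
		case 2:
			stats[p+"inodes_free"] = u.Available
			stats[p+"inodes"] = u.Total
			stats[p+"inodes_used"] = u.Used
		}
	}
	return stats, nil
}

func main() {
	stats, err := kubeletStats(`{"usage":[{"total":10,"unit":1}]}`)
	if err != nil {
		panic(err)
	}
	names := make([]string, 0, len(stats))
	for name := range stats {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Println(name, stats[name])
	}
}
```

Fed the handcrafted response from the log, only the capacity value is non-zero, which matches the kubelet output quoted above.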

csi: Implement NodeServiceCapability_RPC_GET_VOLUME_STATS rpc call
and implement Metrics Provider for CSI driver

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>

@humblec humblec force-pushed the humblec:csi-in-m branch from 656d83b to c511c90 May 17, 2019

@gnufied

Member

commented May 17, 2019

/hold cancel
/lgtm

@k8s-ci-robot k8s-ci-robot added lgtm and removed do-not-merge/hold labels May 17, 2019

@fejta-bot


commented May 17, 2019

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@k8s-ci-robot k8s-ci-robot merged commit fad34e4 into kubernetes:master May 18, 2019

19 of 20 checks passed

pull-kubernetes-kubemark-e2e-gce-big: Job triggered.
cla/linuxfoundation: humblec authorized
pull-kubernetes-bazel-build: Job succeeded.
pull-kubernetes-bazel-test: Job succeeded.
pull-kubernetes-conformance-image-test: Skipped.
pull-kubernetes-cross: Skipped.
pull-kubernetes-dependencies: Job succeeded.
pull-kubernetes-e2e-gce: Job succeeded.
pull-kubernetes-e2e-gce-100-performance: Job succeeded.
pull-kubernetes-e2e-gce-csi-serial: Job succeeded.
pull-kubernetes-e2e-gce-device-plugin-gpu: Job succeeded.
pull-kubernetes-e2e-gce-storage-slow: Job succeeded.
pull-kubernetes-godeps: Skipped.
pull-kubernetes-integration: Job succeeded.
pull-kubernetes-local-e2e: Skipped.
pull-kubernetes-node-e2e: Job succeeded.
pull-kubernetes-typecheck: Job succeeded.
pull-kubernetes-verify: Job succeeded.
pull-publishing-bot-validate: Skipped.
tide: In merge pool.