New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
csi: Implement NodeServiceCapability_RPC_GET_VOLUME_STATS rpc call #76188
Conversation
/retest |
/assign @gnufied |
/release-note-none |
0e44d82
to
9dca72a
Compare
@gnufied Can you please take a look at this PR ? :) |
/assign @rootfs |
f6607c0
to
e6604b0
Compare
/assign @msau42 |
pkg/volume/csi/csi_client.go
Outdated
@@ -841,3 +849,94 @@ func (c *csiClientGetter) Get() (csiClient, error) { | |||
c.csiClient = csi | |||
return c.csiClient, nil | |||
} | |||
|
|||
func (c *csiDriverClient) NodeSupportsVolumeStats(ctx context.Context) (bool, error) { | |||
klog.V(4).Info(log("calling NodeGetCapabilities rpc to determine if NodeSupportsVolumeStats")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we really need this log? Can we make it higher like 5?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lifted to V(5).
} | ||
|
||
func (c *csiDriverClient) NodeGetVolumeStats(ctx context.Context, volID string, targetPath string) (*volume.Metrics, error) { | ||
klog.V(4).Info(log("calling NodeGetVolumeStats rpc: [volid=%s, target_path=%s", volID, targetPath)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How often is this called?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As often as prometheus is configured to scrap the metrics endpoint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unable to scrap any metric for aws efs csi , not sure whether rpc call was executed or not.
pkg/volume/csi/csi_client.go
Outdated
for _, usage := range usages { | ||
unit := usage.GetUnit() | ||
switch unit.String() { | ||
case "BYTES": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this use the consts defined here: https://github.com/kubernetes/kubernetes/blob/ce1e86fb99a8e36912ebb48b226ebb1bc6a99797/vendor/github.com/container-storage-interface/spec/lib/go/csi/csi.pb.go#L235
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Converted to consts.
metrics.Inodes = resource.NewQuantity(usage.GetTotal(), resource.BinarySI) | ||
metrics.InodesUsed = resource.NewQuantity(usage.GetUsed(), resource.BinarySI) | ||
default: | ||
klog.Errorf("unknown key %s in usage", unit.String()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means that CSI spec added more that we don't support yet?
Should it be a warning, or even Info at level 5 if we expect this to flood logs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should ideally never happen and if it does, it is probably an error and worth either fixing k8s or CSI driver.
pkg/volume/csi/csi_plugin.go
Outdated
@@ -408,6 +408,8 @@ func (p *csiPlugin) NewMounter( | |||
} | |||
klog.V(4).Info(log("created path successfully [%s]", dataDir)) | |||
|
|||
mounter.MetricsProvider = NewMetricsCsi(volumeHandle, dataDir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the path be dir or dataDir? dir is the one we pass as the targetPath to NodePublish?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, this indeed should be dir
and not dataDir
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make sure to validate this with a real CSI driver?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replaced the path.
Can we make sure to validate this with a real CSI driver?
Yes, thats the very next plan. I dont think, the CSI drivers exist which expose the metrics. I have started the work for Ceph-CSI any way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be also tested via CSI mock driver fwiw. Mock driver can implement NodeGetVolumeStats
RPC call and it can check for volume path to be same as publish path and return error if it isn't.
This has to go in an e2e though. @humblec can you add a github issue for creating an e2e for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure @gnufied . I am opening an issue for e2e tracking as discussed earlier..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@msau42 the final review comments are addressed. PTAL .. Thanks !! |
@humblec this PR was discussed in wg-csi-implementation stadnup and it was agreed that we should try and manually validate if CSI volume stats are being available through kubelet metrics if using a real CSI driver (even mock or hostpath driver would do). We can do e2e as a follow up but could we do the manual validation for ANY CSI driver? |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: humblec, msau42 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Sure @gnufied and @msau42 . I patched hostpath driver to return a handcrafted response for this RPC call ( also note/confirm the path Logs from CSI driver:
Logs/stats colllected in kubelet!
Hope this is good to go in and declare the metrics support in CSI !!.. Thanks a lot! |
and implement Metrics Provider for CSI driver Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
/hold cancel |
/retest Review the full test history for this PR. Silence the bot with an |
Justification: Maintainer/Owner of kubernetes volume plugins: https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/glusterfs/OWNERS https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/iscsi/OWNERS Lead contributor @ https://github.com/kubernetes-incubator/external-storage/graphs/contributors Contributed CSI features like metrics support: kubernetes/kubernetes#76188 Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
Justification: Maintainer/Owner of kubernetes volume plugins: https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/glusterfs/OWNERS https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/iscsi/OWNERS Lead contributor @ https://github.com/kubernetes-incubator/external-storage/graphs/contributors Contributed CSI features like metrics support: kubernetes/kubernetes#76188 Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
hi @humblec |
Signed-off-by: Humble Chirammal hchiramm@redhat.com
What type of PR is this?
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?: