Fix a panic for in-tree drivers that partially support Block volume metrics #101587
Hi @nixpanic. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label.
@gnufied, you expressed interest in adding a metrics test for BlockMode volumes, so I am assigning you now, before others have posted review comments.
/assign gnufied
/ok-to-test
The Azure jobs (not required to pass) seem to have failed due to something unrelated:
kubernetes-sigs/azurefile-csi-driver#645 should fix the issue soon.
/retest
/kind bug
Without this change, some of the in-tree storage drivers can cause a panic. See #101431 (comment) for more details.
That should be mentioned in the release note section of the PR description.
Hmm, we are running into Go limitations. :-) But this code has broken in the past and somehow gets past the review process, and hence I think it is in urgent need of fixing. I am thinking we should tweak the way we are embedding interfaces in the volume plugin interface, such that it is possible to query if a MetricsProvider is set:

```go
type BlockVolume interface {
	GetGlobalMapPath(spec *Spec) (string, error)
	GetPodDeviceMapPath() (string, string)
	// if MetricsProvider is set return it, otherwise nil
	GetMetricsProvider() MetricsProvider
	MetricsProvider
}
```

This way, it should be possible to write code like:

```go
if mp := v.GetMetricsProvider(); mp != nil {
	metrics, err := mp.GetMetrics()
	if err == nil {
		fmt.Printf("metrics: %+v\n", metrics)
	}
}
```

That is just one idea, though; there could be other ways of solving this. But I do think this code is a bit fragile. :(
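A minimal, self-contained sketch of the `GetMetricsProvider()` idea, assuming simplified stand-ins for the real types: `Metrics`, `fixedCapacity`, `fakeBlockVolume` and `capacityOf` are hypothetical, and the `BlockVolume` interface is trimmed to the one method that matters here. A volume whose embedded provider was never set returns `nil`, so the caller can skip `GetMetrics()` instead of panicking.

```go
package main

import (
	"errors"
	"fmt"
)

// Metrics is a simplified, hypothetical stand-in for the volume metrics type.
type Metrics struct {
	Capacity int64 // size of the volume in bytes
}

// MetricsProvider mirrors the shape of the interface discussed above.
type MetricsProvider interface {
	GetMetrics() (*Metrics, error)
}

// BlockVolume keeps only the proposed accessor: it returns nil when the
// volume has no metrics support.
type BlockVolume interface {
	GetMetricsProvider() MetricsProvider
}

// fixedCapacity is a hypothetical provider that reports a constant size.
type fixedCapacity int64

func (c fixedCapacity) GetMetrics() (*Metrics, error) {
	return &Metrics{Capacity: int64(c)}, nil
}

// fakeBlockVolume embeds a MetricsProvider that may be left nil, like an
// in-tree driver that only partially supports Block volume metrics.
type fakeBlockVolume struct {
	MetricsProvider
}

func (v *fakeBlockVolume) GetMetricsProvider() MetricsProvider {
	return v.MetricsProvider
}

// capacityOf asks for metrics only when a provider is actually set.
func capacityOf(v BlockVolume) (int64, error) {
	mp := v.GetMetricsProvider()
	if mp == nil {
		return 0, errors.New("metrics not supported")
	}
	m, err := mp.GetMetrics()
	if err != nil {
		return 0, err
	}
	return m.Capacity, nil
}

func main() {
	withMetrics := &fakeBlockVolume{MetricsProvider: fixedCapacity(1 << 30)}
	withoutMetrics := &fakeBlockVolume{}

	for _, v := range []BlockVolume{withMetrics, withoutMetrics} {
		c, err := capacityOf(v)
		fmt.Println(c, err)
	}
}
```

Because the struct field has the interface type itself, an unset field comes back as a true `nil` interface, so the `mp == nil` check is reliable here.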
Similar to how NewMetricsStatFS() works, the new NewMetricsBlock() provides the GetMetrics() interface for Block volumes. Metrics beyond the capacity are difficult to gather for Block volumes: there is no guarantee that the volume contains a filesystem, which makes most of the volume metrics useless. Advanced storage might be able to detect the actual consumption (when thin-provisioned) versus the capacity. However, this is out of scope for a standard helper function and requires intimate knowledge of the storage system in use.
PR kubernetes#97972 added support for gathering metrics for Block PVCs provided by CSI drivers. The in-tree drivers can support at least the most basic metric: Capacity.
The in-tree drivers support gathering the capacity of the Block volume. Make sure that Kubelet exposes these for the matching PVCs.
Force-pushed from af15cd2 to 62fc2b6.
Thanks for the idea, @gnufied! Instead of
Volumes that are provisioned with `VolumeMode: Block` often have a MetricsProvider interface declared in their type. The MetricsProvider is expected to implement a GetMetrics() function. In cases where a storage driver does not implement GetMetrics(), a panic can occur. Ordinary type assertions are not sufficient in this case: they succeed as soon as the interface is declared, and assume that an implementation is present. There is no straightforward way to verify that a valid GetMetrics() function is provided. With SupportsMetrics() added, each storage driver implementation has to be reviewed carefully for metrics support.
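The failure mode and the guard can be reproduced in isolation. A minimal sketch, assuming `SupportsMetrics()` sits next to the embedded `MetricsProvider` in a `BlockVolume`-like interface; `partialMapper`, `fullMapper`, `fixedCapacity`, `callUnchecked` and `capacityOf` are hypothetical names, and the types are simplified stand-ins rather than the Kubernetes ones.

```go
package main

import "fmt"

// Metrics is a simplified, hypothetical stand-in for the volume metrics type.
type Metrics struct {
	Capacity int64
}

// MetricsProvider mirrors the shape of the interface discussed above.
type MetricsProvider interface {
	GetMetrics() (*Metrics, error)
}

// BlockVolume is a trimmed-down, hypothetical version of the interface:
// SupportsMetrics reports whether the embedded provider is usable.
type BlockVolume interface {
	SupportsMetrics() bool
	MetricsProvider
}

// partialMapper embeds MetricsProvider but leaves it nil, like a driver
// that declares the interface without implementing it.
type partialMapper struct {
	MetricsProvider
}

func (partialMapper) SupportsMetrics() bool { return false }

// fullMapper sets the embedded provider and says so.
type fullMapper struct {
	MetricsProvider
}

func (fullMapper) SupportsMetrics() bool { return true }

// fixedCapacity is a hypothetical provider reporting a constant size.
type fixedCapacity int64

func (c fixedCapacity) GetMetrics() (*Metrics, error) {
	return &Metrics{Capacity: int64(c)}, nil
}

// callUnchecked shows why a type assertion is not enough: it succeeds
// because GetMetrics is promoted from the embedded (nil) interface, and
// the call then panics. The panic is recovered and returned as an error.
func callUnchecked(v interface{}) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered panic: %v", r)
		}
	}()
	if mp, ok := v.(MetricsProvider); ok {
		_, err = mp.GetMetrics()
	}
	return err
}

// capacityOf consults SupportsMetrics before touching GetMetrics.
func capacityOf(v BlockVolume) (int64, bool) {
	if !v.SupportsMetrics() {
		return 0, false
	}
	m, err := v.GetMetrics()
	if err != nil {
		return 0, false
	}
	return m.Capacity, true
}

func main() {
	fmt.Println(callUnchecked(partialMapper{}))
	fmt.Println(capacityOf(partialMapper{}))
	fmt.Println(capacityOf(fullMapper{MetricsProvider: fixedCapacity(1 << 20)}))
}
```

The explicit method forces every driver type to state its support, which is what makes the per-driver review mentioned above necessary.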
Force-pushed from 62fc2b6 to b997e0e.
/retest
@nixpanic: Some tests failed; say `/retest` to rerun all failed tests.
/priority important-longterm
Kubelet changes LGTM.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: gnufied, mrunalp, nixpanic.
What type of PR is this?
/kind feature
/sig storage
What this PR does / why we need it:
#97972 added support for gathering metrics for Block volumes provided by CSI drivers. The current in-tree drivers that support Block volumes can return at least the Capacity of the block device.
CSI drivers are currently not tested for metrics gathering in the e2e framework. Adding support for this in the in-tree drivers makes it possible to verify the functionality and prevent regressions.
Which issue(s) this PR fixes:
Fixes #101431
Special notes for your reviewer:
Block volume metrics detection is quite limited. Except for the Capacity (size of the volume), there is little that can be gathered with standard tools. The contents of a Block volume cannot be inspected like a filesystem. In theory, drivers could thin-provision volumes (like a sparse file) and provide Used/Available in addition to the Capacity. However, this needs access to, and detailed knowledge of, the storage platform, and cannot be detected with standard tools.
Does this PR introduce a user-facing change?