Record a volume processing failure event for the pod when the volume manager starts processing volumes. #58272

Closed
xingzhou opened this issue Jan 15, 2018 · 6 comments
Labels: lifecycle/rotten, sig/storage

@xingzhou
Contributor

Happened to see this issue when using FlexVolume.

If we configure a pod with a wrong FlexVolume driver name, the volume manager only reports a timeout event, with no error details, like the following:

Warning  FailedMount            2s                kubelet, 127.0.0.1  Unable to mount volumes for pod "nginx_default(58e1c30f-f9c7-11e7-b72e-080027e3793c)": timeout expired waiting for volumes to attach or mount for pod "default"/"nginx". list of unmounted volumes=[test]. list of unattached volumes=[test default-token-sn7tj]

So the user needs to check the kubelet log and see a lot of errors like:

I0115 07:41:56.386911   10237 server.go:231] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"nginx", UID:"58e1c30f-f9c7-11e7-b72e-080027e3793c", APIVersion:"v1", ResourceVersion:"302", FieldPath:""}): type: 'Warning' reason: 'FailedProcessVolume' Error processing volume "test" for pod "nginx_default(58e1c30f-f9c7-11e7-b72e-080027e3793c)": failed to get Plugin from volumeSpec for volume "pv2" err=no volume plugin matched
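For reference, a minimal reproduction sketch in Go against the current client-go API; the pod, namespace and driver names are made up, and the original report went through a PV ("pv2") rather than an inline FlexVolume, but an unmatched driver name should hit the same "no volume plugin matched" path:

// Reproduction sketch only: create a pod that references a FlexVolume driver
// that is not installed on the node, so the kubelet cannot match a volume plugin.
package main

import (
	"context"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	pod := &v1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "nginx", Namespace: "default"},
		Spec: v1.PodSpec{
			Containers: []v1.Container{{
				Name:         "nginx",
				Image:        "nginx",
				VolumeMounts: []v1.VolumeMount{{Name: "test", MountPath: "/data"}},
			}},
			Volumes: []v1.Volume{{
				Name: "test",
				VolumeSource: v1.VolumeSource{
					// No FlexVolume driver with this name exists on the node.
					FlexVolume: &v1.FlexVolumeSource{Driver: "example.com/no-such-driver"},
				},
			}},
		},
	}

	if _, err := client.CoreV1().Pods("default").Create(context.TODO(), pod, metav1.CreateOptions{}); err != nil {
		panic(err)
	}
}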

At present, this single timeout event is not enough for the user to figure out what happened:
The user has to wait until the volume attach/mount timeout expires before knowing that there is an error with the volume.
The timeout event does not show any details of the volume attach errors.

We need to record volume processing events, such as "no volume plugin matched", at the moment the volume manager reports the error.
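A rough sketch of what that could look like, assuming the desired-state-of-world populator is handed the kubelet's client-go record.EventRecorder; the wrapper type and its names below are made up for illustration, not the actual kubelet code:

// Sketch only: surface the per-volume processing error as an event on the pod, so
// the user sees it in `kubectl describe pod` instead of only the generic timeout.
package volumemanager

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/client-go/tools/record"
)

type volumeErrorReporter struct {
	recorder record.EventRecorder // the kubelet's event recorder
}

func (r *volumeErrorReporter) reportProcessVolumeError(pod *v1.Pod, volumeName string, err error) {
	// Attach the underlying error (e.g. "no volume plugin matched") to the pod,
	// mirroring the message that today only shows up in the kubelet log.
	r.recorder.Eventf(pod, v1.EventTypeWarning, "FailedProcessVolume",
		"Error processing volume %q for pod %q: %v", volumeName, pod.Name, err)
}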

/sig storage
/assign

@k8s-ci-robot added the needs-sig label Jan 15, 2018
@k8s-ci-robot added the sig/storage label and removed the needs-sig label Jan 15, 2018
@pospispa
Contributor

So the user needs to check the kubelet log and see a lot of errors

@xingzhou you mentioned here that there are several log messages per second for the same pod and the same problem.

Do you consider these several log messages per second to be an issue?

@xingzhou
Contributor Author

xingzhou commented Jan 25, 2018

This is because the DSW (desired state of world) populator loops over all the pods at a fixed interval and checks for unprocessed volumes. For the log, I think this is OK, since the log should track the actual code execution; but for events it is too frequent, as an event should report only one message to the user.
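To keep this to roughly one event per pod and volume even though the populator loops several times per second, the reporter could remember what it has already emitted; a rough sketch with made-up names (client-go's event correlator also aggregates repeated events on its own, but skipping the Eventf call entirely is cheaper):

// Sketch only: emit the FailedProcessVolume event only when the error for a given
// pod/volume changes, instead of on every populator iteration.
package volumemanager

import (
	"sync"

	v1 "k8s.io/api/core/v1"
	"k8s.io/client-go/tools/record"
)

type dedupedVolumeErrorReporter struct {
	mu       sync.Mutex
	recorder record.EventRecorder
	lastMsg  map[string]string // "namespace/pod/volume" -> last reported error text
}

func newDedupedVolumeErrorReporter(recorder record.EventRecorder) *dedupedVolumeErrorReporter {
	return &dedupedVolumeErrorReporter{recorder: recorder, lastMsg: map[string]string{}}
}

func (r *dedupedVolumeErrorReporter) report(pod *v1.Pod, volumeName string, err error) {
	key := pod.Namespace + "/" + pod.Name + "/" + volumeName

	r.mu.Lock()
	defer r.mu.Unlock()
	if r.lastMsg[key] == err.Error() {
		return // same error already reported for this pod/volume; stay quiet
	}
	// Entries would need to be dropped when the pod is deleted; omitted here.
	r.lastMsg[key] = err.Error()
	r.recorder.Eventf(pod, v1.EventTypeWarning, "FailedProcessVolume",
		"Error processing volume %q for pod %q: %v", volumeName, pod.Name, err)
}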

@pospispa
Contributor

but for events it is too frequent, as an event should report only one message to the user.

I agree with reporting only one message to the user.

For the log, I think this is OK, since the log should track the actual code execution

Well, the DSW populator checks the unprocessed volumes several times per second. Would it be better to check them less frequently? An error that occurs here can be essentially permanent, e.g. when PVC Protection is enabled and a pod is using a PVC that is being deleted.

So the log may be flooded with error messages for volumes that weren't processed successfully.
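If the main concern is the per-iteration log and event spam for a volume that keeps failing, another option, instead of slowing down the whole populator loop, would be a per-volume backoff, for example with client-go's flowcontrol.Backoff; again only a sketch, with made-up wrapper names:

// Sketch only: skip re-processing (and re-logging) a failing volume until its
// per-volume backoff window has passed; healthy volumes keep the normal loop period.
package volumemanager

import (
	"time"

	"k8s.io/client-go/util/flowcontrol"
)

type failedVolumeBackoff struct {
	backoff *flowcontrol.Backoff
}

func newFailedVolumeBackoff() *failedVolumeBackoff {
	// Retry a failing volume after 1s at first, doubling up to a 5m cap.
	return &failedVolumeBackoff{backoff: flowcontrol.NewBackOff(1*time.Second, 5*time.Minute)}
}

// shouldProcess reports whether the populator should try this volume again now.
func (b *failedVolumeBackoff) shouldProcess(key string) bool {
	return !b.backoff.IsInBackOffSinceUpdate(key, time.Now())
}

// noteFailure extends the backoff after another failed processing attempt.
func (b *failedVolumeBackoff) noteFailure(key string) {
	b.backoff.Next(key, time.Now())
}

// noteSuccess clears the backoff once the volume is processed successfully.
func (b *failedVolumeBackoff) noteSuccess(key string) {
	b.backoff.Reset(key)
}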

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Apr 25, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label May 25, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
