kubelet fails on unmount when IO errors #67072

Closed
flyingcougar opened this issue Aug 7, 2018 · 10 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. sig/storage Categorizes an issue or PR as relevant to SIG Storage.

Comments

@flyingcougar

flyingcougar commented Aug 7, 2018

/kind bug

What happened:
A pod that is being deleted stays stuck in 'Terminating' indefinitely when it has a volume mounted through a flexvolume driver and the XFS filesystem on that volume was shut down because of an I/O error on the underlying drive.
Kubelet log:

E0807 10:15:28.651952   21655 nestedpendingoperations.go:267] Operation for "\"flexvolume-k8s/dummy/d3d357a1-9a27-11e8-b11b-525400daa710-dummy\" (\"d3d357a1-9a27-11e8-b11b-525400daa710\")" failed. No retries permitted until 2018-08-07 10:17:30.651880959 +0000 UTC m=+71706.754210097 (durationBeforeRetry 2m2s). Error: "UnmountVolume.TearDown failed for volume \"dummy\" (UniqueName: \"flexvolume-k8s/dummy/d3d357a1-9a27-11e8-b11b-525400daa710-dummy\") pod \"d3d357a1-9a27-11e8-b11b-525400daa710\" (UID: \"d3d357a1-9a27-11e8-b11b-525400daa710\") : Error checking if path exists: stat /var/lib/kubelet/pods/d3d357a1-9a27-11e8-b11b-525400daa710/volumes/k8s~dummy/dummy: input/output error"

What you expected to happen:
The pod is terminated.

How to reproduce it (as minimally and precisely as possible):

  • schedule a pod with an XFS filesystem mounted through flexvolume
  • simulate an I/O error by injecting a fault into the drive that holds the mounted filesystem (here is how I did this: https://lxadm.com/Using_fault_injection)
  • perform read/write operations on the filesystem until they start failing with input/output errors and you see a kernel log similar to this:
[76280.277926] FAULT_INJECTION: forcing a failure.
name fail_make_request, interval 100, probability 10, space 0, times -1
[76280.277935] CPU: 0 PID: 1397 Comm: kworker/0:0 Tainted: G               ------------ T 3.10.0-862.9.1.el7.x86_64.debug #1
[76280.277938] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[76280.277947] Workqueue: kdmflush dm_wq_work [dm_mod]
[76280.277950] Call Trace:
[76280.277965]  [<ffffffffa89e9211>] dump_stack+0x19/0x1b
[76280.277971]  [<ffffffffa85f2a7c>] should_fail+0x13c/0x160
[76280.277976]  [<ffffffffa858aa4c>] generic_make_request_checks+0x24c/0x520
[76280.277981]  [<ffffffffa8443eb8>] ? kmem_cache_alloc+0x128/0x3e0
[76280.277985]  [<ffffffffa858d580>] generic_make_request+0x30/0x420
[76280.277993]  [<ffffffffc0228d75>] __map_bio+0x135/0x250 [dm_mod]
[76280.278002]  [<ffffffffc0227680>] ? queue_io+0x80/0x80 [dm_mod]
[76280.278010]  [<ffffffffc0229097>] __clone_and_map_data_bio+0x177/0x280 [dm_mod]
[76280.278018]  [<ffffffffc0229531>] __split_and_process_bio+0x391/0x650 [dm_mod]
[76280.278026]  [<ffffffffc022a20e>] dm_wq_work+0x11e/0x150 [dm_mod]
[76280.278037]  [<ffffffffa82cda3c>] process_one_work+0x22c/0x720
[76280.278042]  [<ffffffffa82cd9ca>] ? process_one_work+0x1ba/0x720
[76280.278048]  [<ffffffffa82ce056>] worker_thread+0x126/0x3b0
[76280.278062]  [<ffffffffa82cdf30>] ? process_one_work+0x720/0x720
[76280.278066]  [<ffffffffa82d6a2f>] kthread+0xef/0x100
[76280.278071]  [<ffffffffa82d6940>] ? insert_kthread_work+0x80/0x80
[76280.278077]  [<ffffffffa89ff177>] ret_from_fork_nospec_begin+0x21/0x21
[76280.278081]  [<ffffffffa82d6940>] ? insert_kthread_work+0x80/0x80
[76280.278428] XFS (dm-2): metadata I/O error: block 0x19059 ("xlog_iodone") error 5 numblks 64
[76280.279200] XFS (dm-2): xfs_do_force_shutdown(0x2) called from line 1222 of file fs/xfs/xfs_log.c.  Return address = 0xffffffffc0377403
[76280.279536] XFS (dm-2): Log I/O Error Detected.  Shutting down filesystem
[76280.280150] XFS (dm-2): Please umount the filesystem and rectify the problem(s)
  • delete the pod

Anything else we need to know?:

  • a debug kernel is required to use fault injection

Environment:

  • Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-17T18:53:20Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
  • Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.6", GitCommit:"a21fdbd78dde8f5447f5f6c331f7eb6f80bd684e", GitTreeState:"clean", BuildDate:"2018-07-26T10:04:08Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: 1 VM on local VirtualBox
  • OS (e.g. from /etc/os-release): CentOS 7
  • Kernel (e.g. uname -a): 3.10.0-862.9.1.el7.x86_64.debug
  • Install tools: kubeadm
  • Others: dummy flexvolume driver modified to use xfs instead of tmpfs, debug kernel
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. kind/bug Categorizes issue or PR as related to a bug. labels Aug 7, 2018
@flyingcougar
Author

/sig storage

@k8s-ci-robot k8s-ci-robot added sig/storage Categorizes an issue or PR as relevant to SIG Storage. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Aug 7, 2018
@flyingcougar flyingcougar changed the title from "kubelet fails on" to "kubelet fails on unmount when IO errors" Aug 7, 2018
@gnufied
Member

gnufied commented Aug 10, 2018

Looks like this is related to #67132?

@flyingcougar
Author

@gnufied yes, it's related, but the fix proposed by @nkkashyap won't help me :(
I am getting false and err != nil from util.PathExists(dir)

@gnufied
Member

gnufied commented Aug 10, 2018

@flyingcougar what is the error value? To put it another way: you said err != nil. What is its value?

@mcgrof

mcgrof commented Aug 13, 2018

Can you write an xfstest for this?

@gnufied
Member

gnufied commented Aug 13, 2018

Also - #67097

@chakri-nelluri
Contributor

Fixed by #67097

@chakri-nelluri
Contributor

/assign @chakri-nelluri

@chakri-nelluri
Contributor

Duplicate of #66868

@chakri-nelluri
Contributor

/close
