Commit 965d616

joshhunt-akamai authored and gregkh committed
md/raid10: fix deadlock with check operation and nowait requests
commit 7d96f31 upstream.

When an array check is running it will raise the barrier at which point normal requests will become blocked and increment the nr_pending value to signal there is work pending inside of wait_barrier(). NOWAIT requests do not block and so will return immediately with an error, and additionally do not increment nr_pending in wait_barrier(). Upstream change commit 43806c3 ("raid10: cleanup memleak at raid10_make_request") added a call to raid_end_bio_io() to fix a memory leak when NOWAIT requests hit this condition. raid_end_bio_io() eventually calls allow_barrier() and it will unconditionally do an atomic_dec_and_test(&conf->nr_pending) even though the corresponding increment on nr_pending didn't happen in the NOWAIT case.

This can be easily seen by starting a check operation while an application is doing nowait IO on the same array. This results in a deadlocked state due to nr_pending value underflowing and so the md resync thread gets stuck waiting for nr_pending to == 0.

Output of r10conf state of the array when we hit this condition:

  crash> struct r10conf
    barrier = 1,
    nr_pending = {
      counter = -41
    },
    nr_waiting = 15,
    nr_queued = 0,

Example of md_sync thread stuck waiting on raise_barrier() and other requests stuck in wait_barrier():

  md1_resync
  [<0>] raise_barrier+0xce/0x1c0
  [<0>] raid10_sync_request+0x1ca/0x1ed0
  [<0>] md_do_sync+0x779/0x1110
  [<0>] md_thread+0x90/0x160
  [<0>] kthread+0xbe/0xf0
  [<0>] ret_from_fork+0x34/0x50
  [<0>] ret_from_fork_asm+0x1a/0x30

  kworker/u1040:2+flush-253:4
  [<0>] wait_barrier+0x1de/0x220
  [<0>] regular_request_wait+0x30/0x180
  [<0>] raid10_make_request+0x261/0x1000
  [<0>] md_handle_request+0x13b/0x230
  [<0>] __submit_bio+0x107/0x1f0
  [<0>] submit_bio_noacct_nocheck+0x16f/0x390
  [<0>] ext4_io_submit+0x24/0x40
  [<0>] ext4_do_writepages+0x254/0xc80
  [<0>] ext4_writepages+0x84/0x120
  [<0>] do_writepages+0x7a/0x260
  [<0>] __writeback_single_inode+0x3d/0x300
  [<0>] writeback_sb_inodes+0x1dd/0x470
  [<0>] __writeback_inodes_wb+0x4c/0xe0
  [<0>] wb_writeback+0x18b/0x2d0
  [<0>] wb_workfn+0x2a1/0x400
  [<0>] process_one_work+0x149/0x330
  [<0>] worker_thread+0x2d2/0x410
  [<0>] kthread+0xbe/0xf0
  [<0>] ret_from_fork+0x34/0x50
  [<0>] ret_from_fork_asm+0x1a/0x30

Fixes: 43806c3 ("raid10: cleanup memleak at raid10_make_request")
Cc: stable@vger.kernel.org
Signed-off-by: Josh Hunt <johunt@akamai.com>
Link: https://lore.kernel.org/linux-raid/20260303005619.1352958-1-johunt@akamai.com
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
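
The accounting problem described in the commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: `toy_conf`, `toy_wait_barrier()`, `toy_allow_barrier()` and `toy_run()` are hypothetical names invented here, the model covers only NOWAIT requests arriving while the barrier is raised, and the old and new error paths are reduced to "decrement nr_pending" versus "leave nr_pending alone", which is the difference between raid_end_bio_io() and free_r10bio() that the message relies on.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two r10conf fields that matter here. */
struct toy_conf {
	atomic_int nr_pending;
	int barrier;
};

/*
 * Models the NOWAIT branch of wait_barrier(): with the barrier raised,
 * a NOWAIT request fails immediately and nr_pending is NOT incremented.
 * Without a barrier the request is admitted and counted.
 */
static bool toy_wait_barrier(struct toy_conf *conf)
{
	if (conf->barrier)
		return false;
	atomic_fetch_add(&conf->nr_pending, 1);
	return true;
}

/* Models allow_barrier(): the decrement is unconditional. */
static void toy_allow_barrier(struct toy_conf *conf)
{
	atomic_fetch_sub(&conf->nr_pending, 1);
}

/*
 * Submit `requests` NOWAIT requests against a raised barrier and return
 * the final nr_pending. use_fix == false models the old error path
 * (raid_end_bio_io(), which ends up in allow_barrier()); use_fix == true
 * models the new one (free_r10bio(), which does not touch nr_pending).
 */
static int toy_run(int requests, bool use_fix)
{
	struct toy_conf conf = { .barrier = 1 };

	atomic_init(&conf.nr_pending, 0);
	for (int i = 0; i < requests; i++) {
		if (toy_wait_barrier(&conf))
			continue;	/* admitted; not reached while barrier == 1 */
		if (!use_fix)
			toy_allow_barrier(&conf);	/* unbalanced decrement */
	}
	return atomic_load(&conf.nr_pending);
}
```

In this model `toy_run(41, false)` returns -41 and `toy_run(41, true)` returns 0. The request count 41 is chosen only to mirror the counter in the crash output above; the commit message does not say how many requests failed. A waiter that sleeps until nr_pending reaches 0 can make progress in the second case only, since nothing ever increments the counter to repay the unbalanced decrements.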
1 parent 222055e commit 965d616

1 file changed

Lines changed: 2 additions & 2 deletions

drivers/md/raid10.c

@@ -1204,7 +1204,7 @@ static void raid10_read_request(struct mddev *mddev, struct bio *bio,
 	}
 
 	if (!regular_request_wait(mddev, conf, bio, r10_bio->sectors)) {
-		raid_end_bio_io(r10_bio);
+		free_r10bio(r10_bio);
 		return;
 	}
 
@@ -1425,7 +1425,7 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio,
 
 	sectors = r10_bio->sectors;
 	if (!regular_request_wait(mddev, conf, bio, sectors)) {
-		raid_end_bio_io(r10_bio);
+		free_r10bio(r10_bio);
 		return;
 	}
 