generic/068 hung_task while thawing file system #56

naota · 2021-08-06T10:10:24Z

Running generic/068 on a dm-linear mapped SMR device (40GB = 256MB * 160 sequential-write-required zones) hangs while thawing the file system.

naota · 2021-08-06T10:11:01Z

Full dmesg log

Aug 06 09:27:17 unknown: run fstests generic/068 at 2021-08-06 00:27:17
Aug 06 09:27:18 kernel: BTRFS: device fsid a7c0c778-270b-41b6-be4c-24dfc4300a05 devid 1 transid 5 /dev/mapper/scratch scanned by mkfs.btrfs (4365)
Aug 06 09:27:18 kernel: BTRFS info (device dm-0): has skinny extents
Aug 06 09:27:18 kernel: BTRFS info (device dm-0): flagging fs with big metadata feature
Aug 06 09:27:19 kernel: BTRFS info (device dm-0): host-managed zoned block device /dev/mapper/scratch, 160 zones of 268435456 bytes
Aug 06 09:27:19 kernel: BTRFS info (device dm-0): zoned mode enabled with zone size 268435456
Aug 06 09:27:19 kernel: BTRFS info (device dm-0): checking UUID tree
Aug 06 09:27:28 kernel: BTRFS info (device dm-0): reclaiming chunk 1073741824 with 3% used 96% unusable
Aug 06 09:27:28 kernel: BTRFS info (device dm-0): relocating block group 1073741824 flags data
Aug 06 09:27:36 systemd[1]: systemd-hostnamed.service: Deactivated successfully.
Aug 06 09:27:36 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-hostnamed comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Aug 06 09:27:37 audit: BPF prog-id=102 op=UNLOAD
Aug 06 09:27:37 audit: BPF prog-id=101 op=UNLOAD
Aug 06 09:28:17 systemd[2216]: Starting Mark boot as successful...
Aug 06 09:28:17 systemd[2216]: Finished Mark boot as successful.
Aug 06 09:30:14 kernel: INFO: task kworker/u64:0:2205 blocked for more than 122 seconds.
Aug 06 09:30:14 kernel:       Not tainted 5.14.0-rc3-BTRFS-ZNS+ #27
Aug 06 09:30:14 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 06 09:30:14 kernel: task:kworker/u64:0   state:D stack:    0 pid: 2205 ppid:     2 flags:0x00004000
Aug 06 09:30:14 kernel: Workqueue: events_unbound btrfs_reclaim_bgs_work [btrfs]
Aug 06 09:30:14 kernel: Call Trace:
Aug 06 09:30:14 kernel:  __schedule+0x9fb/0x2370
Aug 06 09:30:14 kernel:  ? io_schedule_timeout+0x160/0x160
Aug 06 09:30:14 kernel:  ? trace_hardirqs_on+0x2b/0x120
Aug 06 09:30:14 kernel:  ? _raw_spin_unlock_irqrestore+0x31/0x40
Aug 06 09:30:14 kernel:  schedule+0xea/0x290
Aug 06 09:30:14 kernel:  btrfs_start_ordered_extent+0x2d0/0x470 [btrfs]
Aug 06 09:30:14 kernel:  ? btrfs_wait_ordered_roots+0x6c0/0x6c0 [btrfs]
Aug 06 09:30:14 kernel:  ? mark_held_locks+0xad/0xf0
Aug 06 09:30:14 kernel:  ? finish_wait+0x280/0x280
Aug 06 09:30:14 kernel:  ? _raw_spin_unlock_irq+0x28/0x40
Aug 06 09:30:14 kernel:  ? trace_hardirqs_on+0x2b/0x120
Aug 06 09:30:14 kernel:  btrfs_wait_ordered_range+0x25d/0x4a0 [btrfs]
Aug 06 09:30:14 kernel:  ? lock_is_held_type+0xa4/0x110
Aug 06 09:30:14 kernel:  ? btrfs_run_ordered_extent_work+0x30/0x30 [btrfs]
Aug 06 09:30:14 kernel:  ? balance_dirty_pages_ratelimited+0x3dc/0xe80
Aug 06 09:30:14 kernel:  relocate_file_extent_cluster+0xeb4/0x1280 [btrfs]
Aug 06 09:30:14 kernel:  ? create_reloc_inode+0x930/0x930 [btrfs]
Aug 06 09:30:14 kernel:  ? kmem_cache_free+0x25a/0x300
Aug 06 09:30:14 kernel:  relocate_data_extent+0x1f8/0x400 [btrfs]
Aug 06 09:30:14 kernel:  relocate_block_group+0x9e2/0xd30 [btrfs]
Aug 06 09:30:14 kernel:  ? merge_reloc_roots+0x950/0x950 [btrfs]
Aug 06 09:30:14 kernel:  ? btrfs_wait_ordered_extents+0xf10/0xf10 [btrfs]
Aug 06 09:30:14 kernel:  btrfs_relocate_block_group+0x345/0x950 [btrfs]
Aug 06 09:30:14 kernel:  ? lock_is_held_type+0xa4/0x110
Aug 06 09:30:14 kernel:  btrfs_relocate_chunk+0x8b/0x220 [btrfs]
Aug 06 09:30:14 kernel:  btrfs_reclaim_bgs_work.cold+0x13e/0x22a [btrfs]
Aug 06 09:30:14 kernel:  ? btrfs_mark_bg_unused+0x3b0/0x3b0 [btrfs]
Aug 06 09:30:14 kernel:  ? lock_is_held_type+0xa4/0x110
Aug 06 09:30:14 kernel:  ? _raw_spin_unlock_irq+0x28/0x40
Aug 06 09:30:14 kernel:  process_one_work+0x7cd/0x1430
Aug 06 09:30:14 kernel:  ? pwq_dec_nr_in_flight+0x290/0x290
Aug 06 09:30:14 kernel:  worker_thread+0x59b/0x1050
Aug 06 09:30:14 kernel:  ? process_one_work+0x1430/0x1430
Aug 06 09:30:15 kernel:  kthread+0x38c/0x460
Aug 06 09:30:15 kernel:  ? set_kthread_struct+0x110/0x110
Aug 06 09:30:15 kernel:  ret_from_fork+0x22/0x30
Aug 06 09:30:15 kernel: INFO: task kworker/u64:6:3435 blocked for more than 123 seconds.
Aug 06 09:30:15 kernel:       Not tainted 5.14.0-rc3-BTRFS-ZNS+ #27
Aug 06 09:30:15 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 06 09:30:15 kernel: task:kworker/u64:6   state:D stack:    0 pid: 3435 ppid:     2 flags:0x00004000
Aug 06 09:30:15 kernel: Workqueue: btrfs-endio-write btrfs_work_helper [btrfs]
Aug 06 09:30:15 kernel: Call Trace:
Aug 06 09:30:15 kernel:  __schedule+0x9fb/0x2370
Aug 06 09:30:15 kernel:  ? io_schedule_timeout+0x160/0x160
Aug 06 09:30:15 kernel:  ? mark_held_locks+0xad/0xf0
Aug 06 09:30:15 kernel:  ? lock_contended+0xd40/0xd40
Aug 06 09:30:15 kernel:  ? percpu_rwsem_wait+0x1f6/0x4e0
Aug 06 09:30:15 kernel:  ? percpu_rwsem_wait+0x228/0x4e0
Aug 06 09:30:15 kernel:  ? percpu_rwsem_wait+0x228/0x4e0
Aug 06 09:30:15 kernel:  schedule+0xea/0x290
Aug 06 09:30:15 kernel:  percpu_rwsem_wait+0x228/0x4e0
Aug 06 09:30:15 kernel:  ? percpu_rwsem_wake_function+0x480/0x480
Aug 06 09:30:15 kernel:  ? percpu_free_rwsem+0xa0/0xa0
Aug 06 09:30:15 kernel:  __percpu_down_read+0xda/0x110
Aug 06 09:30:15 kernel:  start_transaction+0xb6e/0x10b0 [btrfs]
Aug 06 09:30:15 kernel:  btrfs_join_transaction+0x1d/0x20 [btrfs]
Aug 06 09:30:15 kernel:  btrfs_finish_ordered_io.isra.0+0x755/0x1be0 [btrfs]
Aug 06 09:30:15 kernel:  ? lockdep_hardirqs_on_prepare+0x3f0/0x3f0
Aug 06 09:30:15 kernel:  ? btrfs_unlink_subvol+0xda0/0xda0 [btrfs]
Aug 06 09:30:15 kernel:  ? lock_is_held_type+0xa4/0x110
Aug 06 09:30:15 kernel:  ? lock_acquire+0x1a1/0x4b0
Aug 06 09:30:15 kernel:  ? process_one_work+0x719/0x1430
Aug 06 09:30:15 kernel:  ? lock_is_held_type+0xa4/0x110
Aug 06 09:30:15 kernel:  finish_ordered_fn+0x15/0x20 [btrfs]
Aug 06 09:30:15 kernel:  btrfs_work_helper+0x1af/0xa50 [btrfs]
Aug 06 09:30:15 kernel:  ? process_one_work+0x6fa/0x1430
Aug 06 09:30:15 kernel:  ? lock_is_held_type+0xa4/0x110
Aug 06 09:30:15 kernel:  ? _raw_spin_unlock_irq+0x28/0x40
Aug 06 09:30:15 kernel:  process_one_work+0x7cd/0x1430
Aug 06 09:30:15 kernel:  ? pwq_dec_nr_in_flight+0x290/0x290
Aug 06 09:30:15 kernel:  worker_thread+0x59b/0x1050
Aug 06 09:30:15 kernel:  ? _raw_spin_unlock_irqrestore+0x31/0x40
Aug 06 09:30:15 kernel:  ? process_one_work+0x1430/0x1430
Aug 06 09:30:15 kernel:  kthread+0x38c/0x460
Aug 06 09:30:15 kernel:  ? set_kthread_struct+0x110/0x110
Aug 06 09:30:15 kernel:  ret_from_fork+0x22/0x30
Aug 06 09:30:15 kernel: INFO: task fstest:4412 blocked for more than 123 seconds.
Aug 06 09:30:15 kernel:       Not tainted 5.14.0-rc3-BTRFS-ZNS+ #27
Aug 06 09:30:15 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 06 09:30:15 kernel: task:fstest          state:D stack:    0 pid: 4412 ppid:  4409 flags:0x00000000
Aug 06 09:30:15 kernel: Call Trace:
Aug 06 09:30:15 kernel:  __schedule+0x9fb/0x2370
Aug 06 09:30:15 kernel:  ? io_schedule_timeout+0x160/0x160
Aug 06 09:30:15 kernel:  ? mark_held_locks+0xad/0xf0
Aug 06 09:30:15 kernel:  ? lock_contended+0xd40/0xd40
Aug 06 09:30:15 kernel:  ? percpu_rwsem_wait+0x1f6/0x4e0
Aug 06 09:30:15 kernel:  ? percpu_rwsem_wait+0x228/0x4e0
Aug 06 09:30:15 kernel:  ? percpu_rwsem_wait+0x228/0x4e0
Aug 06 09:30:15 kernel:  schedule+0xea/0x290
Aug 06 09:30:15 kernel:  percpu_rwsem_wait+0x228/0x4e0
Aug 06 09:30:15 kernel:  ? percpu_rwsem_wake_function+0x480/0x480
Aug 06 09:30:15 kernel:  ? percpu_free_rwsem+0xa0/0xa0
Aug 06 09:30:15 kernel:  __percpu_down_read+0xda/0x110
Aug 06 09:30:15 kernel:  mnt_want_write+0x261/0x300
Aug 06 09:30:15 kernel:  path_openat+0x2055/0x2610
Aug 06 09:30:15 kernel:  ? path_lookupat+0x6b0/0x6b0
Aug 06 09:30:15 kernel:  ? __fput+0x345/0x860
Aug 06 09:30:15 kernel:  ? ____fput+0xe/0x10
Aug 06 09:30:15 kernel:  ? lock_is_held_type+0xa4/0x110
Aug 06 09:30:15 kernel:  ? sched_clock+0x9/0x10
Aug 06 09:30:15 kernel:  ? sched_clock_cpu+0x18/0x170
Aug 06 09:30:15 kernel:  ? find_held_lock+0x3c/0x130
Aug 06 09:30:15 kernel:  do_filp_open+0x1aa/0x3e0
Aug 06 09:30:15 kernel:  ? may_open_dev+0xd0/0xd0
Aug 06 09:30:15 kernel:  ? __kasan_check_read+0x11/0x20
Aug 06 09:30:15 kernel:  ? do_raw_spin_unlock+0x5c/0x200
Aug 06 09:30:15 kernel:  ? _raw_spin_unlock+0x23/0x30
Aug 06 09:30:15 kernel:  do_sys_openat2+0x12e/0x3b0
Aug 06 09:30:15 kernel:  ? __call_rcu_nocb_wake.part.0+0x790/0x790
Aug 06 09:30:15 kernel:  ? build_open_flags+0x440/0x440
Aug 06 09:30:15 kernel:  __x64_sys_openat+0x128/0x210
Aug 06 09:30:15 kernel:  ? __x64_sys_open+0x1c0/0x1c0
Aug 06 09:30:16 kernel:  ? do_syscall_64+0x16/0x90
Aug 06 09:30:16 kernel:  ? syscall_enter_from_user_mode+0x25/0x80
Aug 06 09:30:16 kernel:  ? trace_hardirqs_on+0x2b/0x120
Aug 06 09:30:16 kernel:  do_syscall_64+0x3b/0x90
Aug 06 09:30:16 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
Aug 06 09:30:16 kernel: RIP: 0033:0x7f673511b767
Aug 06 09:30:16 kernel: RSP: 002b:00007ffd83ac9560 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
Aug 06 09:30:16 kernel: RAX: ffffffffffffffda RBX: 00007ffd83ac9a50 RCX: 00007f673511b767
Aug 06 09:30:16 kernel: RDX: 0000000000000242 RSI: 00007ffd83ac9a50 RDI: 00000000ffffff9c
Aug 06 09:30:16 kernel: RBP: 00007ffd83ac9a50 R08: 0000000000000000 R09: 0000000000000029
Aug 06 09:30:16 kernel: R10: 00000000000001a4 R11: 0000000000000246 R12: 0000000000000242
Aug 06 09:30:16 kernel: R13: 0000000000000022 R14: 0000000000000400 R15: 0000000000000400
Aug 06 09:30:16 kernel: INFO: task fstest:4413 blocked for more than 124 seconds.
Aug 06 09:30:16 kernel:       Not tainted 5.14.0-rc3-BTRFS-ZNS+ #27
Aug 06 09:30:16 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 06 09:30:16 kernel: task:fstest          state:D stack:    0 pid: 4413 ppid:  4409 flags:0x00000000
Aug 06 09:30:16 kernel: Call Trace:
Aug 06 09:30:16 kernel:  __schedule+0x9fb/0x2370
Aug 06 09:30:16 kernel:  ? io_schedule_timeout+0x160/0x160
Aug 06 09:30:16 kernel:  ? mark_held_locks+0xad/0xf0
Aug 06 09:30:16 kernel:  ? lock_contended+0xd40/0xd40
Aug 06 09:30:16 kernel:  ? percpu_rwsem_wait+0x1f6/0x4e0
Aug 06 09:30:16 kernel:  ? percpu_rwsem_wait+0x228/0x4e0
Aug 06 09:30:16 kernel:  ? percpu_rwsem_wait+0x228/0x4e0
Aug 06 09:30:16 kernel:  schedule+0xea/0x290
Aug 06 09:30:16 kernel:  percpu_rwsem_wait+0x228/0x4e0
Aug 06 09:30:16 kernel:  ? percpu_rwsem_wake_function+0x480/0x480
Aug 06 09:30:16 kernel:  ? percpu_free_rwsem+0xa0/0xa0
Aug 06 09:30:16 kernel:  __percpu_down_read+0xda/0x110
Aug 06 09:30:16 kernel:  mnt_want_write+0x261/0x300
Aug 06 09:30:16 kernel:  path_openat+0x2055/0x2610
Aug 06 09:30:16 kernel:  ? path_lookupat+0x6b0/0x6b0
Aug 06 09:30:16 kernel:  ? __fput+0x345/0x860
Aug 06 09:30:16 kernel:  ? ____fput+0xe/0x10
Aug 06 09:30:16 kernel:  ? lock_is_held_type+0xa4/0x110
Aug 06 09:30:16 kernel:  ? sched_clock+0x9/0x10
Aug 06 09:30:16 kernel:  ? sched_clock_cpu+0x18/0x170
Aug 06 09:30:16 kernel:  ? find_held_lock+0x3c/0x130
Aug 06 09:30:16 kernel:  do_filp_open+0x1aa/0x3e0
Aug 06 09:30:16 kernel:  ? may_open_dev+0xd0/0xd0
Aug 06 09:30:16 kernel:  ? __kasan_check_read+0x11/0x20
Aug 06 09:30:16 kernel:  ? do_raw_spin_unlock+0x5c/0x200
Aug 06 09:30:16 kernel:  ? _raw_spin_unlock+0x23/0x30
Aug 06 09:30:16 kernel:  do_sys_openat2+0x12e/0x3b0
Aug 06 09:30:16 kernel:  ? __call_rcu_nocb_wake.part.0+0x790/0x790
Aug 06 09:30:16 kernel:  ? build_open_flags+0x440/0x440
Aug 06 09:30:16 kernel:  __x64_sys_openat+0x128/0x210
Aug 06 09:30:16 kernel:  ? __x64_sys_open+0x1c0/0x1c0
Aug 06 09:30:16 kernel:  ? do_syscall_64+0x16/0x90
Aug 06 09:30:16 kernel:  ? syscall_enter_from_user_mode+0x25/0x80
Aug 06 09:30:16 kernel:  ? trace_hardirqs_on+0x2b/0x120
Aug 06 09:30:17 kernel:  do_syscall_64+0x3b/0x90
Aug 06 09:30:17 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
Aug 06 09:30:17 kernel: RIP: 0033:0x7f673511b767
Aug 06 09:30:17 kernel: RSP: 002b:00007ffd83ac9560 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
Aug 06 09:30:17 kernel: RAX: ffffffffffffffda RBX: 00007ffd83ac9a50 RCX: 00007f673511b767
Aug 06 09:30:17 kernel: RDX: 0000000000000242 RSI: 00007ffd83ac9a50 RDI: 00000000ffffff9c
Aug 06 09:30:17 kernel: RBP: 00007ffd83ac9a50 R08: 0000000000000000 R09: 0000000000000029
Aug 06 09:30:17 kernel: R10: 00000000000001a4 R11: 0000000000000246 R12: 0000000000000242
Aug 06 09:30:17 kernel: R13: 0000000000000022 R14: 0000000000000400 R15: 0000000000000001
Aug 06 09:30:17 kernel: INFO: task kworker/u64:7:4442 blocked for more than 125 seconds.
Aug 06 09:30:17 kernel:       Not tainted 5.14.0-rc3-BTRFS-ZNS+ #27
Aug 06 09:30:17 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 06 09:30:17 kernel: task:kworker/u64:7   state:D stack:    0 pid: 4442 ppid:     2 flags:0x00004000
Aug 06 09:30:17 kernel: Workqueue: btrfs-flush_delalloc btrfs_work_helper [btrfs]
Aug 06 09:30:17 kernel: Call Trace:
Aug 06 09:30:17 kernel:  __schedule+0x9fb/0x2370
Aug 06 09:30:17 kernel:  ? io_schedule_timeout+0x160/0x160
Aug 06 09:30:17 kernel:  ? trace_hardirqs_on+0x2b/0x120
Aug 06 09:30:17 kernel:  ? _raw_spin_unlock_irqrestore+0x31/0x40
Aug 06 09:30:17 kernel:  schedule+0xea/0x290
Aug 06 09:30:17 kernel:  btrfs_start_ordered_extent+0x2d0/0x470 [btrfs]
Aug 06 09:30:17 kernel:  ? btrfs_wait_ordered_roots+0x6c0/0x6c0 [btrfs]
Aug 06 09:30:17 kernel:  ? finish_wait+0x280/0x280
Aug 06 09:30:17 kernel:  ? process_one_work+0x719/0x1430
Aug 06 09:30:17 kernel:  btrfs_run_ordered_extent_work+0x1e/0x30 [btrfs]
Aug 06 09:30:17 kernel:  btrfs_work_helper+0x1af/0xa50 [btrfs]
Aug 06 09:30:17 kernel:  ? process_one_work+0x6fa/0x1430
Aug 06 09:30:17 kernel:  ? lock_is_held_type+0xa4/0x110
Aug 06 09:30:17 kernel:  ? _raw_spin_unlock_irq+0x28/0x40
Aug 06 09:30:17 kernel:  process_one_work+0x7cd/0x1430
Aug 06 09:30:17 kernel:  ? pwq_dec_nr_in_flight+0x290/0x290
Aug 06 09:30:17 kernel:  worker_thread+0x59b/0x1050
Aug 06 09:30:17 kernel:  ? _raw_spin_unlock_irqrestore+0x31/0x40
Aug 06 09:30:17 kernel:  ? process_one_work+0x1430/0x1430
Aug 06 09:30:17 kernel:  kthread+0x38c/0x460
Aug 06 09:30:17 kernel:  ? set_kthread_struct+0x110/0x110
Aug 06 09:30:17 kernel:  ret_from_fork+0x22/0x30
Aug 06 09:30:17 kernel: INFO: task fsstress:4491 blocked for more than 125 seconds.
Aug 06 09:30:17 kernel:       Not tainted 5.14.0-rc3-BTRFS-ZNS+ #27
Aug 06 09:30:17 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 06 09:30:17 kernel: task:fsstress        state:D stack:    0 pid: 4491 ppid:  4490 flags:0x00004000
Aug 06 09:30:17 kernel: Call Trace:
Aug 06 09:30:17 kernel:  __schedule+0x9fb/0x2370
Aug 06 09:30:17 kernel:  ? io_schedule_timeout+0x160/0x160
Aug 06 09:30:17 kernel:  ? __lock_acquire+0x1772/0x5a00
Aug 06 09:30:17 kernel:  ? lock_is_held_type+0xa4/0x110
Aug 06 09:30:17 kernel:  ? sched_clock+0x9/0x10
Aug 06 09:30:17 kernel:  schedule+0xea/0x290
Aug 06 09:30:17 kernel:  schedule_timeout+0x19a/0x250
Aug 06 09:30:18 kernel:  ? usleep_range+0x180/0x180
Aug 06 09:30:18 kernel:  ? mark_held_locks+0xad/0xf0
Aug 06 09:30:18 kernel:  ? lock_contended+0xd40/0xd40
Aug 06 09:30:18 kernel:  ? wait_for_completion+0x192/0x2a0
Aug 06 09:30:18 kernel:  ? _raw_spin_unlock_irq+0x28/0x40
Aug 06 09:30:18 kernel:  ? trace_hardirqs_on+0x2b/0x120
Aug 06 09:30:18 kernel:  wait_for_completion+0x19a/0x2a0
Aug 06 09:30:18 kernel:  ? bit_wait_timeout+0x170/0x170
Aug 06 09:30:18 kernel:  ? __kasan_check_read+0x11/0x20
Aug 06 09:30:18 kernel:  ? do_raw_spin_unlock+0x5c/0x200
Aug 06 09:30:18 kernel:  btrfs_wait_ordered_extents+0x7ff/0xf10 [btrfs]
Aug 06 09:30:18 kernel:  ? btrfs_remove_ordered_extent+0x950/0x950 [btrfs]
Aug 06 09:30:18 kernel:  ? lock_downgrade+0x7b0/0x7b0
Aug 06 09:30:18 kernel:  ? lock_contended+0xd40/0xd40
Aug 06 09:30:18 kernel:  ? __kasan_check_read+0x11/0x20
Aug 06 09:30:18 kernel:  ? do_raw_spin_unlock+0x5c/0x200
Aug 06 09:30:18 kernel:  btrfs_wait_ordered_roots+0x413/0x6c0 [btrfs]
Aug 06 09:30:18 kernel:  ? btrfs_wait_ordered_extents+0xf10/0xf10 [btrfs]
Aug 06 09:30:18 kernel:  ? iterate_supers+0xd2/0x220
Aug 06 09:30:18 kernel:  ? __kasan_check_read+0x11/0x20
Aug 06 09:30:18 kernel:  ? lock_acquired+0x378/0xc00
Aug 06 09:30:18 kernel:  ? lock_is_held_type+0xa4/0x110
Aug 06 09:30:18 kernel:  btrfs_sync_fs+0xbb/0x510 [btrfs]
Aug 06 09:30:18 kernel:  ? btrfs_freeze+0xa0/0xa0 [btrfs]
Aug 06 09:30:18 kernel:  sync_fs_one_sb+0xdf/0x140
Aug 06 09:30:18 kernel:  ? down_read+0x63/0x90
Aug 06 09:30:18 kernel:  iterate_supers+0x11a/0x220
Aug 06 09:30:18 kernel:  ? vfs_fsync_range+0x220/0x220
Aug 06 09:30:18 kernel:  ksys_sync+0xb0/0x150
Aug 06 09:30:18 kernel:  ? vfs_fsync+0x1e0/0x1e0
Aug 06 09:30:18 kernel:  ? do_syscall_64+0x16/0x90
Aug 06 09:30:18 kernel:  ? syscall_enter_from_user_mode+0x25/0x80
Aug 06 09:30:18 kernel:  ? trace_hardirqs_on+0x2b/0x120
Aug 06 09:30:18 kernel:  __do_sys_sync+0xe/0x20
Aug 06 09:30:18 kernel:  do_syscall_64+0x3b/0x90
Aug 06 09:30:18 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
Aug 06 09:30:18 kernel: RIP: 0033:0x7f93c3557b17
Aug 06 09:30:18 kernel: RSP: 002b:00007ffe11bbb9c8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a2
Aug 06 09:30:18 kernel: RAX: ffffffffffffffda RBX: 000055aa7b49f540 RCX: 00007f93c3557b17
Aug 06 09:30:18 kernel: RDX: 0000000000000000 RSI: 00000000612a84c7 RDI: 00000000000000b7
Aug 06 09:30:18 kernel: RBP: 0000000000000064 R08: 0000000000000041 R09: 00007ffe11bbb99c
Aug 06 09:30:18 kernel: R10: 00007ffe11bbb5c7 R11: 0000000000000206 R12: 00000000000000b7
Aug 06 09:30:18 kernel: R13: 00007ffe11bbba30 R14: 00007ffe11bbb9e6 R15: 000055aa7b48c370
Aug 06 09:30:18 kernel: INFO: task xfs_io:4519 blocked for more than 127 seconds.
Aug 06 09:30:18 kernel:       Not tainted 5.14.0-rc3-BTRFS-ZNS+ #27
Aug 06 09:30:18 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 06 09:30:18 kernel: task:xfs_io          state:D stack:    0 pid: 4519 ppid:  4515 flags:0x00000000
Aug 06 09:30:18 kernel: Call Trace:
Aug 06 09:30:18 kernel:  __schedule+0x9fb/0x2370
Aug 06 09:30:18 kernel:  ? io_schedule_timeout+0x160/0x160
Aug 06 09:30:19 kernel:  ? mark_held_locks+0xad/0xf0
Aug 06 09:30:19 kernel:  ? lock_contended+0xd40/0xd40
Aug 06 09:30:19 kernel:  ? rwsem_down_write_slowpath+0x887/0x1220
Aug 06 09:30:19 kernel:  schedule+0xea/0x290
Aug 06 09:30:19 kernel:  rwsem_down_write_slowpath+0x924/0x1220
Aug 06 09:30:19 kernel:  ? rwsem_mark_wake+0x930/0x930
Aug 06 09:30:19 kernel:  ? find_held_lock+0x3c/0x130
Aug 06 09:30:19 kernel:  ? lock_contended+0x578/0xd40
Aug 06 09:30:19 kernel:  ? debug_check_no_locks_held+0xa0/0xa0
Aug 06 09:30:19 kernel:  down_write+0x393/0x400
Aug 06 09:30:19 kernel:  ? down_write+0x393/0x400
Aug 06 09:30:19 kernel:  ? down_read_killable+0xb0/0xb0
Aug 06 09:30:19 kernel:  ? selinux_capable+0x49/0x70
Aug 06 09:30:19 kernel:  ? security_capable+0x5f/0xa0
Aug 06 09:30:19 kernel:  thaw_super+0x17/0x30
Aug 06 09:30:19 kernel:  do_vfs_ioctl+0xec6/0x1450
Aug 06 09:30:19 kernel:  ? vfs_fileattr_set+0xa30/0xa30
Aug 06 09:30:19 kernel:  ? selinux_file_ioctl+0x374/0x510
Aug 06 09:30:19 kernel:  ? selinux_inode_getsecctx+0x80/0x80
Aug 06 09:30:19 kernel:  ? rhashtable_jhash2+0x278/0x2c0
Aug 06 09:30:19 kernel:  ? __do_sys_sysinfo+0x7b/0xd0
Aug 06 09:30:19 kernel:  ? __ia32_compat_sys_sysinfo+0x40/0x40
Aug 06 09:30:19 kernel:  ? handle_mm_fault+0x465/0x690
Aug 06 09:30:19 kernel:  ? security_file_ioctl+0x55/0x90
Aug 06 09:30:19 kernel:  __x64_sys_ioctl+0xd3/0x1a0
Aug 06 09:30:19 kernel:  do_syscall_64+0x3b/0x90
Aug 06 09:30:19 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
Aug 06 09:30:19 kernel: RIP: 0033:0x7f8023b0dcc7
Aug 06 09:30:19 kernel: RSP: 002b:00007fffeeb21aa8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
Aug 06 09:30:19 kernel: RAX: ffffffffffffffda RBX: 000055c53afac3e0 RCX: 00007f8023b0dcc7
Aug 06 09:30:19 kernel: RDX: 00007fffeeb21abc RSI: ffffffffc0045878 RDI: 0000000000000003
Aug 06 09:30:19 kernel: RBP: 0000000000000008 R08: 000055c53afad8f0 R09: 00007f8023bd7be0
Aug 06 09:30:19 kernel: R10: fffffffffffff28a R11: 0000000000000202 R12: 0000000000000001
Aug 06 09:30:19 kernel: R13: 000055c53afad780 R14: 000055c53afad8d0 R15: 000055c53afad8f0
Aug 06 09:30:19 kernel: 
                        Showing all locks held in the system:
Aug 06 09:30:19 kernel: 3 locks held by kworker/10:1/181:
Aug 06 09:30:19 kernel: 1 lock held by khungtaskd/209:
Aug 06 09:30:19 kernel:  #0: ffffffff847e8640 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x5f/0x288
Aug 06 09:30:19 kernel: 4 locks held by kworker/u64:0/2205:
Aug 06 09:30:19 kernel:  #0: ffff888100122148 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x6e8/0x1430
Aug 06 09:30:19 kernel:  #1: ffffc9000dcbfda8 ((work_completion)(&fs_info->reclaim_bgs_work)){+.+.}-{0:0}, at: process_one_work+0x719/0x1430
Aug 06 09:30:19 kernel:  #2: ffff8881aedba368 (&fs_info->reclaim_bgs_lock){+.+.}-{3:3}, at: btrfs_reclaim_bgs_work+0x12b/0x720 [btrfs]
Aug 06 09:30:19 kernel:  #3: ffff8881aedb88e0 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: btrfs_relocate_block_group+0x33d/0x950 [btrfs]
Aug 06 09:30:19 kernel: 3 locks held by kworker/u64:6/3435:
Aug 06 09:30:20 kernel:  #0: ffff888151f76148 ((wq_completion)btrfs-endio-write){+.+.}-{0:0}, at: process_one_work+0x6e8/0x1430
Aug 06 09:30:20 kernel:  #1: ffffc9000ca7fda8 ((work_completion)(&work->normal_work)){+.+.}-{0:0}, at: process_one_work+0x719/0x1430
Aug 06 09:30:20 kernel:  #2: ffff8881dd2906b0 (sb_internal#3){++++}-{0:0}, at: btrfs_join_transaction+0x1d/0x20 [btrfs]
Aug 06 09:30:20 kernel: 2 locks held by docker/3793:
Aug 06 09:30:20 kernel:  #0: ffff888234dd20a0 (&tty->ldisc_sem){++++}-{0:0}, at: ldsem_down_read+0x38/0x40
Aug 06 09:30:20 kernel:  #1: ffffc900006b82f0 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0xa65/0x11b0
Aug 06 09:30:20 kernel: 1 lock held by fstest/4412:
Aug 06 09:30:20 kernel:  #0: ffff8881dd290490 (sb_writers#20){++++}-{0:0}, at: path_openat+0x2055/0x2610
Aug 06 09:30:20 kernel: 1 lock held by fstest/4413:
Aug 06 09:30:20 kernel:  #0: ffff8881dd290490 (sb_writers#20){++++}-{0:0}, at: path_openat+0x2055/0x2610
Aug 06 09:30:20 kernel: 2 locks held by kworker/u64:7/4442:
Aug 06 09:30:20 kernel:  #0: ffff888151f72948 ((wq_completion)btrfs-flush_delalloc){+.+.}-{0:0}, at: process_one_work+0x6e8/0x1430
Aug 06 09:30:20 kernel:  #1: ffffc9000f367da8 ((work_completion)(&work->normal_work)){+.+.}-{0:0}, at: process_one_work+0x719/0x1430
Aug 06 09:30:20 kernel: 3 locks held by fsstress/4491:
Aug 06 09:30:20 kernel:  #0: ffff8881dd2900e8 (&type->s_umount_key#73){++++}-{3:3}, at: iterate_supers+0xd2/0x220
Aug 06 09:30:20 kernel:  #1: ffff8881aedb8ac8 (&fs_info->ordered_operations_mutex){+.+.}-{3:3}, at: btrfs_wait_ordered_roots+0xca/0x6c0 [btrfs]
Aug 06 09:30:20 kernel:  #2: ffff88820413c8d8 (&root->ordered_extent_mutex){+.+.}-{3:3}, at: btrfs_wait_ordered_extents+0x12d/0xf10 [btrfs]
Aug 06 09:30:20 kernel: 1 lock held by xfs_io/4519:
Aug 06 09:30:20 kernel:  #0: ffff8881dd2900e8 (&type->s_umount_key#73){++++}-{3:3}, at: thaw_super+0x17/0x30
Aug 06 09:30:20 kernel: 
Aug 06 09:30:20 kernel: =============================================
Aug 06 09:30:20 kernel: Kernel panic - not syncing: hung_task: blocked tasks

naota · 2021-08-06T10:12:49Z

xfs_io:4519 is trying to thaw the FS and needs s_umount to do so. But s_umount is already held by fsstress/4491, which is itself waiting for ordered extents to finish.

The FS is already frozen, so how can there still be ordered extents to wait for?

naota · 2021-08-07T13:36:14Z

I could fix it with this patch.

commit 1b614c417a2fba624c186c281f38e4c35a163936
Author: Naohiro Aota <naohiro.aota@wdc.com>
Date:   Mon Aug 9 09:10:14 2021 +0900

    btrfs: zoned: mark relocation as sb_*_write

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index a3b830b8410a..9e833d74e8dc 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1560,7 +1560,9 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
 				bg->start, div_u64(bg->used * 100, bg->length),
 				div64_u64(zone_unusable * 100, bg->length));
 		trace_btrfs_reclaim_block_group(bg);
+		sb_start_write(fs_info->sb);
 		ret = btrfs_relocate_chunk(fs_info, bg->start);
+		sb_end_write(fs_info->sb);
 		if (ret && ret != -EAGAIN)
 			btrfs_err(fs_info, "error relocating chunk %llu",
 				  bg->start);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 536e60c6ade3..102419997623 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -8167,7 +8167,9 @@ static int relocating_repair_kthread(void *data)
 	btrfs_info(fs_info,
 		   "zoned: relocating block group %llu to repair IO failure",
 		   target);
+	sb_start_write(fs_info->sb);
 	ret = btrfs_relocate_chunk(fs_info, target);
+	sb_end_write(fs_info->sb);
 
 out:
 	if (cache)
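
To illustrate why registering the relocation as a writer matters, here is a minimal user-space sketch, not kernel code. `toy_fs`, `toy_start_write()`, `toy_end_write()` and `toy_try_freeze()` are hypothetical stand-ins. The real sb_start_write() and the real freeze path sleep until they can proceed, whereas this model simply returns false.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the freeze-related state of a super block. */
struct toy_fs {
	bool frozen;
	int writers;	/* callers between start_write and end_write */
};

/*
 * Model of sb_start_write(): register as a writer.
 * The kernel call sleeps while the FS is frozen; the toy reports failure.
 */
static bool toy_start_write(struct toy_fs *fs)
{
	if (fs->frozen)
		return false;
	fs->writers++;
	return true;
}

/* Model of sb_end_write(): drop the writer registration. */
static void toy_end_write(struct toy_fs *fs)
{
	assert(fs->writers > 0);
	fs->writers--;
}

/*
 * Model of freezing: it only completes once no writer is registered.
 * The kernel waits for writers to drain; the toy refuses instead.
 */
static bool toy_try_freeze(struct toy_fs *fs)
{
	if (fs->writers > 0)
		return false;
	fs->frozen = true;
	return true;
}
```

With no writer registered, toy_try_freeze() succeeds even if a relocation is running in the background, which corresponds to the hang reported here. Once the relocation calls toy_start_write(), the freeze is refused until the matching toy_end_write().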

morbidrsa · 2021-08-09T12:06:46Z

Patch looks good to me, thanks.

I'm just wondering if we should move the sb_start_write() calls into btrfs_relocate_chunk() or at least document the dependency (ASSERT()??) in btrfs_relocate_chunk()?

EDIT:
Something like this:

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 536e60c6ade3..92b6b1178cb1 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3146,6 +3146,11 @@ int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
         */
        lockdep_assert_held(&fs_info->reclaim_bgs_lock);
 
+       /*
+        * Prevent races with FS freezing.
+        */
+       ASSERT(fs_info->sb->s_writers.frozen >= SB_FREEZE_WRITE);
+
        /* step one, relocate all the extents inside this chunk */
        btrfs_scrub_pause(fs_info);
        ret = btrfs_relocate_block_group(fs_info, chunk_offset);

naota · 2021-08-10T02:42:35Z

ASSERT() would be nice. I'm considering some more points to be ASSERTed (e.g. defrag, scrub).

Moving sb_start_write() into btrfs_relocate_chunk() will cause a double lock in the balance path, as it's already taken there by mnt_want_write_file().

naota · 2021-08-10T02:44:52Z

+       ASSERT(fs_info->sb->s_writers.frozen >= SB_FREEZE_WRITE);

This should be s_writers.frozen == SB_UNFROZEN
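
The two conditions under discussion are not the same check. Below is a minimal user-space sketch, not kernel code, that keeps them apart and adds a third check for "the caller registered as a writer". All `toy_*` names are hypothetical; only the ordering of the freeze levels follows the kernel's SB_* constants.

```c
#include <assert.h>
#include <stdbool.h>

/* Freeze levels, mirroring the order of the kernel's SB_* constants. */
enum toy_freeze_level {
	TOY_UNFROZEN = 0,
	TOY_FREEZE_WRITE = 1,
	TOY_FREEZE_PAGEFAULT = 2,
	TOY_FREEZE_FS = 3,
	TOY_FREEZE_COMPLETE = 4,
};

/* Hypothetical stand-in for the freeze-related parts of a super block. */
struct toy_sb {
	enum toy_freeze_level frozen;
	int writers;	/* callers between start_write and end_write */
};

/* frozen == SB_UNFROZEN: no freeze is in progress or complete. */
static bool toy_not_frozen(const struct toy_sb *sb)
{
	return sb->frozen == TOY_UNFROZEN;
}

/* frozen >= SB_FREEZE_WRITE: true only once a freeze has begun. */
static bool toy_freeze_begun(const struct toy_sb *sb)
{
	return sb->frozen >= TOY_FREEZE_WRITE;
}

/* The caller (or someone) has registered as a writer. */
static bool toy_write_started(const struct toy_sb *sb)
{
	return sb->writers > 0;
}
```

In this model an unfrozen super block with no registered writer passes toy_not_frozen() but fails toy_write_started(), so the former does not prove that freeze protection was taken.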

morbidrsa · 2021-08-10T15:13:34Z

```diff
+       ASSERT(fs_info->sb->s_writers.frozen >= SB_FREEZE_WRITE);
```

> This should be s_writers.frozen == SB_UNFROZEN

Ah now I get it, we're ASSERT()ing two different things. Your ASSERT() is that no-one calls into btrfs_relocate_chunk() when the FS is frozen and the other one is ASSERT()ing that sb_start_write() has been called.

naota · 2021-12-17T07:45:37Z

Now, we have the following lockdep error.

We should take sb_start_write() before fs_info->reclaim_bgs_lock. But that would make the critical section of btrfs_reclaim_bgs_work() really large...?

[ 1280.552697][T10865] ======================================================
[ 1280.554381][T10865] WARNING: possible circular locking dependency detected
[ 1280.556065][T10865] 5.16.0-rc5+ #58 Not tainted
[ 1280.557184][T10865] ------------------------------------------------------
[ 1280.558850][T10865] kworker/u8:3/10865 is trying to acquire lock:
[ 1280.560337][T10865] ffff88811e842490 (sb_writers#10){.+.+}-{0:0}, at: process_one_work+0x826/0x14e0
[ 1280.562576][T10865] 
[ 1280.562576][T10865] but task is already holding lock:
[ 1280.564356][T10865] ffff888108a32300 (&fs_info->reclaim_bgs_lock){+.+.}-{3:3}, at: btrfs_reclaim_bgs_work+0x9b/0x4b0 [btrfs]
[ 1280.567155][T10865] 
[ 1280.567155][T10865] which lock already depends on the new lock.
[ 1280.567155][T10865] 
[ 1280.569665][T10865] 
[ 1280.569665][T10865] the existing dependency chain (in reverse order) is:
[ 1280.571813][T10865] 
[ 1280.571813][T10865] -> #1 (&fs_info->reclaim_bgs_lock){+.+.}-{3:3}:
[ 1280.573879][T10865]        __mutex_lock+0x15d/0x1470
[ 1280.575113][T10865]        mutex_lock_nested+0x1b/0x20
[ 1280.576391][T10865]        btrfs_balance+0x111d/0x3190 [btrfs]
[ 1280.577914][T10865]        btrfs_ioctl_balance+0x59d/0x770 [btrfs]
[ 1280.579548][T10865]        btrfs_ioctl+0xbfc/0x89b0 [btrfs]
[ 1280.580985][T10865]        __x64_sys_ioctl+0x891/0x1790
[ 1280.582215][T10865]        do_syscall_64+0x3b/0x90
[ 1280.583367][T10865]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 1280.584837][T10865] 
[ 1280.584837][T10865] -> #0 (sb_writers#10){.+.+}-{0:0}:
[ 1280.586486][T10865]        __lock_acquire+0x2c1b/0x5a10
[ 1280.587667][T10865]        lock_acquire+0x1b1/0x4f0
[ 1280.588771][T10865]        btrfs_reclaim_bgs_work.cold+0x17e/0x3c6 [btrfs]
[ 1280.590343][T10865]        process_one_work+0x826/0x14e0
[ 1280.591490][T10865]        worker_thread+0x59b/0x1050
[ 1280.592599][T10865]        kthread+0x38f/0x460
[ 1280.593523][T10865]        ret_from_fork+0x22/0x30
[ 1280.594507][T10865] 
[ 1280.594507][T10865] other info that might help us debug this:
[ 1280.594507][T10865] 
[ 1280.596657][T10865]  Possible unsafe locking scenario:
[ 1280.596657][T10865] 
[ 1280.598206][T10865]        CPU0                    CPU1
[ 1280.599299][T10865]        ----                    ----
[ 1280.600364][T10865]   lock(&fs_info->reclaim_bgs_lock);
[ 1280.601375][T10865]                                lock(sb_writers#10);
[ 1280.602694][T10865]                                lock(&fs_info->reclaim_bgs_lock);
[ 1280.604280][T10865]   lock(sb_writers#10);
[ 1280.605126][T10865] 
[ 1280.605126][T10865]  *** DEADLOCK ***
[ 1280.605126][T10865] 
[ 1280.606761][T10865] 3 locks held by kworker/u8:3/10865:
[ 1280.607816][T10865]  #0: ffff8881000d9948 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x740/0x14e0
[ 1280.609606][T10865]  #1: ffffc90009927da8 ((work_completion)(&fs_info->reclaim_bgs_work)){+.+.}-{0:0}, at: process_one_work+0x770/0x14e0
[ 1280.611637][T10865]  #2: ffff888108a32300 (&fs_info->reclaim_bgs_lock){+.+.}-{3:3}, at: btrfs_reclaim_bgs_work+0x9b/0x4b0 [btrfs]
[ 1280.613757][T10865]
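
The report above boils down to two paths taking the same two locks in opposite order: the balance ioctl holds sb_writers and then takes reclaim_bgs_lock, while the reclaim worker holds reclaim_bgs_lock and then takes sb_writers. A minimal user-space sketch of that check, not kernel code and far simpler than lockdep itself (the `toy_*` names are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>

enum toy_lock { TOY_SB_WRITERS, TOY_RECLAIM_BGS_LOCK, TOY_NR_LOCKS };

/* seen[a][b] is true once some path has taken b while holding a. */
struct toy_lockdep {
	bool seen[TOY_NR_LOCKS][TOY_NR_LOCKS];
};

/*
 * Record that a path takes "inner" while holding "outer".
 * Returns false if the opposite order was recorded earlier, which is the
 * two-lock form of the circular dependency in the report.
 */
static bool toy_record_order(struct toy_lockdep *ld,
			     enum toy_lock outer, enum toy_lock inner)
{
	if (ld->seen[inner][outer])
		return false;
	ld->seen[outer][inner] = true;
	return true;
}
```

Recording the balance order first and the reclaim worker order second makes toy_record_order() return false. If the worker takes sb_writers before reclaim_bgs_lock, both paths agree on the order and no inversion is recorded.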

There is a hung_task issue with running generic/068 on an SMR device. The hang occurs while a process is trying to thaw the filesystem. The process is trying to take sb->s_umount to thaw the FS. The lock is held by fsstress, which calls btrfs_sync_fs() and is waiting for an ordered extent to finish. However, as the FS is frozen, the ordered extents never finish.

Having an ordered extent while the FS is frozen is the root cause of the hang. The ordered extent is initiated from btrfs_relocate_chunk() which is called from btrfs_reclaim_bgs_work().

This commit adds sb_*_write() around the btrfs_relocate_chunk() call sites. For the usual "btrfs balance" command, we already call it with mnt_want_write_file() in btrfs_ioctl_balance().

Additionally, add an ASSERT in btrfs_relocate_chunk() to check it is properly called.

Fixes: 18bb8bb ("btrfs: zoned: automatically reclaim zones")
CC: stable@vger.kernel.org # 5.13+
Link: naota#56
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

kdave · 2022-02-23T10:38:21Z

btrfs/125 hits the new assertion. It does not happen on all testing VMs, so it's probably some race.

[ 2927.013859] BTRFS warning (device vdc): devid 1 uuid 4335c7a6-652c-4389-8ea9-270c00fa9880 is missing
[ 2927.017693] BTRFS warning (device vdc): devid 1 uuid 4335c7a6-652c-4389-8ea9-270c00fa9880 is missing
[ 2927.022921] BTRFS info (device vdc): bdev /dev/vdd errs: wr 0, rd 0, flush 0, corrupt 6000, gen 0
[ 2927.031780] BTRFS info (device vdc): checking UUID tree
[ 2927.045348] BTRFS: error (device vdc: state X) in __btrfs_free_extent:3199: errno=-5 IO failure
[ 2927.049729] BTRFS info (device vdc: state EX): forced readonly
[ 2927.051787] BTRFS: error (device vdc: state EX) in btrfs_run_delayed_refs:2159: errno=-5 IO failure
[ 2927.058758] BTRFS info (device vdc: state EX): balance: resume -dusage=90 -musage=90 -susage=90
[ 2927.062457] assertion failed: sb_write_started(fs_info->sb), in fs/btrfs/volumes.c:3244
[ 2927.066121] ------------[ cut here ]------------
[ 2927.067682] kernel BUG at fs/btrfs/ctree.h:3552!
[ 2927.069214] invalid opcode: 0000 [#1] PREEMPT SMP
[ 2927.070926] CPU: 2 PID: 22817 Comm: btrfs-balance Not tainted 5.17.0-rc5-default+ #1632
[ 2927.075299] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
[ 2927.080897] RIP: 0010:assertfail.constprop.0+0x18/0x1a [btrfs]
[ 2927.092652] RSP: 0018:ffffaed9c610fdc0 EFLAGS: 00010246
[ 2927.095227] RAX: 000000000000004b RBX: ffffa13a873db000 RCX: 0000000000000000
[ 2927.096898] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 00000000ffffffff
[ 2927.100514] RBP: ffffa13a55324000 R08: 0000000000000003 R09: 0000000000000001
[ 2927.102518] R10: 0000000000000000 R11: 0000000000000001 R12: ffffa13a6922f098
[ 2927.104330] R13: 000000008cfa0000 R14: ffffa13a553262a0 R15: ffffa13a873db000
[ 2927.106025] FS:  0000000000000000(0000) GS:ffffa13abda00000(0000) knlGS:0000000000000000
[ 2927.108652] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2927.110568] CR2: 000055fdf2a94fd0 CR3: 000000005d012005 CR4: 0000000000170ea0
[ 2927.112167] Call Trace:
[ 2927.112801]  <TASK>
[ 2927.113212]  btrfs_relocate_chunk.cold+0x42/0x67 [btrfs]
[ 2927.114328]  __btrfs_balance+0x2ea/0x490 [btrfs]
[ 2927.114871] BTRFS warning (device vdc: state EX): csum failed root 5 ino 258 off 131072 csum 0x7e797e3e expected csum 0x8941f998 mirror 2
[ 2927.115469]  btrfs_balance+0x4ed/0x7e0 [btrfs]
[ 2927.118802] BTRFS warning (device vdc: state EX): csum failed root 5 ino 258 off 139264 csum 0x27df6522 expected csum 0x8941f998 mirror 2
[ 2927.119691]  ? btrfs_balance+0x7e0/0x7e0 [btrfs]
[ 2927.123158] BTRFS warning (device vdc: state EX): csum failed root 5 ino 258 off 143360 csum 0x9f144c35 expected csum 0x8941f998 mirror 2
[ 2927.123965]  balance_kthread+0x37/0x50 [btrfs]
[ 2927.127299] BTRFS warning (device vdc: state EX): csum failed root 5 ino 258 off 147456 csum 0x1027ab9a expected csum 0x8941f998 mirror 2
[ 2927.128016]  kthread+0xea/0x110
[ 2927.128023]  ? kthread_complete_and_exit+0x20/0x20
[ 2927.128027]  ret_from_fork+0x1f/0x30
[ 2927.128031]  </TASK>
[ 2927.128032] Modules linked in:
[ 2927.131390] BTRFS warning (device vdc: state EX): csum failed root 5 ino 258 off 155648 csum 0x428b86d5 expected csum 0x8941f998 mirror 2
[ 2927.131400] BTRFS warning (device vdc: state EX): csum failed root 5 ino 258 off 163840 csum 0x8fff7df2 expected csum 0x8941f998 mirror 2
[ 2927.131401] BTRFS warning (device vdc: state EX): csum failed root 5 ino 258 off 159744 csum 0x9893a835 expected csum 0x8941f998 mirror 2
[ 2927.131416] BTRFS warning (device vdc: state EX): csum failed root 5 ino 258 off 180224 csum 0x83d83877 expected csum 0x8941f998 mirror 2
[ 2927.131832] BTRFS warning (device vdc: state EX): csum failed root 5 ino 258 off 524288 csum 0x1a0c8fd4 expected csum 0x8941f998 mirror 2
[ 2927.132128] BTRFS warning (device vdc: state EX): csum failed root 5 ino 258 off 540672 csum 0xcaaf83cc expected csum 0x8941f998 mirror 2
[ 2927.133105]  dm_flakey dm_mod btrfs blake2b_generic libcrc32c crc32c_intel xor lzo_compress lzo_decompress raid6_pq zstd_decompress zstd_compress xxhash loop
[ 2927.144290] ---[ end trace 0000000000000000 ]---
[ 2927.145080] RIP: 0010:assertfail.constprop.0+0x18/0x1a [btrfs]
[ 2927.147738] RSP: 0018:ffffaed9c610fdc0 EFLAGS: 00010246
[ 2927.148220] RAX: 000000000000004b RBX: ffffa13a873db000 RCX: 0000000000000000
[ 2927.149126] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 00000000ffffffff
[ 2927.150057] RBP: ffffa13a55324000 R08: 0000000000000003 R09: 0000000000000001
[ 2927.150676] R10: 0000000000000000 R11: 0000000000000001 R12: ffffa13a6922f098
[ 2927.151297] R13: 000000008cfa0000 R14: ffffa13a553262a0 R15: ffffa13a873db000
[ 2927.152529] FS:  0000000000000000(0000) GS:ffffa13abda00000(0000) knlGS:0000000000000000
[ 2927.153646] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2927.154280] CR2: 000055fdf2a94fd0 CR3: 000000005d012005 CR4: 0000000000170ea0
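
The call trace above reaches btrfs_relocate_chunk() through balance_kthread(), and the log line just before the assertion reads "balance: resume". That suggests the relocation was started by the resumed-balance kernel thread rather than through btrfs_ioctl_balance(), so the write access taken in the ioctl path would not be held. The sketch below is a userspace model of that reading, not kernel code: all `model_*` names are hypothetical, and a return value of -1 marks the point where the kernel ASSERT would fire.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for the superblock's writer count. */
struct model_sb {
	int writers;
};

/* Models taking write access, as the ioctl path does before balancing. */
static void model_want_write(struct model_sb *sb)
{
	sb->writers++;
}

/* Models dropping write access again. */
static void model_drop_write(struct model_sb *sb)
{
	assert(sb->writers > 0);
	sb->writers--;
}

/* Models the condition checked by ASSERT(sb_write_started(fs_info->sb)). */
static bool model_write_started(const struct model_sb *sb)
{
	return sb->writers > 0;
}

/* Stand-in for btrfs_relocate_chunk(): returns -1 where the kernel would BUG. */
static int model_relocate_chunk(const struct model_sb *sb)
{
	if (!model_write_started(sb)) {
		fprintf(stderr, "assertion failed: sb_write_started\n");
		return -1;
	}
	return 0;
}

/* Models the ioctl path: write access is held around the balance. */
static int model_ioctl_balance(struct model_sb *sb)
{
	int ret;

	model_want_write(sb);
	ret = model_relocate_chunk(sb);
	model_drop_write(sb);
	return ret;
}

/* Models the resumed-balance thread: nothing takes write access first. */
static int model_balance_kthread(struct model_sb *sb)
{
	return model_relocate_chunk(sb);
}
```

Under these assumptions, model_ioctl_balance() passes the check while model_balance_kthread() trips it, which matches the trace: the assertion fails only on the path that never took write access.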


There is a hung_task issue with running generic/068 on an SMR device. The hang occurs while a process is trying to thaw the filesystem. The process is trying to take sb->s_umount to thaw the FS. The lock is held by fsstress, which calls btrfs_sync_fs() and is waiting for an ordered extent to finish. However, as the FS is frozen, the ordered extents never finish.

Having an ordered extent while the FS is frozen is the root cause of the hang. The ordered extent is initiated from btrfs_relocate_chunk() which is called from btrfs_reclaim_bgs_work().

This commit adds sb_*_write() around btrfs_relocate_chunk() call site. For the usual "btrfs balance" command, we already call it with mnt_want_file() in btrfs_ioctl_balance().

Fixes: 18bb8bb ("btrfs: zoned: automatically reclaim zones")
CC: stable@vger.kernel.org # 5.13+
Link: naota#56
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit ca5e4ea upstream.

There is a hung_task issue with running generic/068 on an SMR device. The hang occurs while a process is trying to thaw the filesystem. The process is trying to take sb->s_umount to thaw the FS. The lock is held by fsstress, which calls btrfs_sync_fs() and is waiting for an ordered extent to finish. However, as the FS is frozen, the ordered extents never finish.

Having an ordered extent while the FS is frozen is the root cause of the hang. The ordered extent is initiated from btrfs_relocate_chunk() which is called from btrfs_reclaim_bgs_work().

This commit adds sb_*_write() around btrfs_relocate_chunk() call site. For the usual "btrfs balance" command, we already call it with mnt_want_file() in btrfs_ioctl_balance().

Fixes: 18bb8bb ("btrfs: zoned: automatically reclaim zones")
CC: stable@vger.kernel.org # 5.13+
Link: naota/linux#56
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit ca5e4ea upstream.

There is a hung_task issue with running generic/068 on an SMR device. The hang occurs while a process is trying to thaw the filesystem. The process is trying to take sb->s_umount to thaw the FS. The lock is held by fsstress, which calls btrfs_sync_fs() and is waiting for an ordered extent to finish. However, as the FS is frozen, the ordered extents never finish.

Having an ordered extent while the FS is frozen is the root cause of the hang. The ordered extent is initiated from btrfs_relocate_chunk() which is called from btrfs_reclaim_bgs_work().

This commit adds sb_*_write() around btrfs_relocate_chunk() call site. For the usual "btrfs balance" command, we already call it with mnt_want_file() in btrfs_ioctl_balance().

Fixes: 18bb8bb ("btrfs: zoned: automatically reclaim zones")
CC: stable@vger.kernel.org # 5.13+
Link: naota#56
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


BugLink: https://bugs.launchpad.net/bugs/1968984

commit ca5e4ea upstream.

There is a hung_task issue with running generic/068 on an SMR device. The hang occurs while a process is trying to thaw the filesystem. The process is trying to take sb->s_umount to thaw the FS. The lock is held by fsstress, which calls btrfs_sync_fs() and is waiting for an ordered extent to finish. However, as the FS is frozen, the ordered extents never finish.

Having an ordered extent while the FS is frozen is the root cause of the hang. The ordered extent is initiated from btrfs_relocate_chunk() which is called from btrfs_reclaim_bgs_work().

This commit adds sb_*_write() around btrfs_relocate_chunk() call site. For the usual "btrfs balance" command, we already call it with mnt_want_file() in btrfs_ioctl_balance().

Fixes: 18bb8bb ("btrfs: zoned: automatically reclaim zones")
CC: stable@vger.kernel.org # 5.13+
Link: naota/linux#56
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>

BugLink: https://bugs.launchpad.net/bugs/1969110

commit ca5e4ea upstream.

There is a hung_task issue with running generic/068 on an SMR device. The hang occurs while a process is trying to thaw the filesystem. The process is trying to take sb->s_umount to thaw the FS. The lock is held by fsstress, which calls btrfs_sync_fs() and is waiting for an ordered extent to finish. However, as the FS is frozen, the ordered extents never finish.

Having an ordered extent while the FS is frozen is the root cause of the hang. The ordered extent is initiated from btrfs_relocate_chunk() which is called from btrfs_reclaim_bgs_work().

This commit adds sb_*_write() around btrfs_relocate_chunk() call site. For the usual "btrfs balance" command, we already call it with mnt_want_file() in btrfs_ioctl_balance().

Fixes: 18bb8bb ("btrfs: zoned: automatically reclaim zones")
CC: stable@vger.kernel.org # 5.13+
Link: naota/linux#56
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 68a8120e1647b6c9cf82687d5f7c6a96ee3b8e97)
Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>

naota self-assigned this Sep 1, 2021

naota added the bug label Sep 1, 2021

naota added this to To do in Maintenance via automation Sep 1, 2021

naota moved this from To do to In progress in Maintenance Sep 1, 2021
