Circular locking dependency in Btrfs-transaction #42

Open
morbidrsa opened this issue May 7, 2021 · 0 comments
Labels
bug Something isn't working

Comments

@morbidrsa
Collaborator

[117889.200224] ======================================================
[117889.206570] WARNING: possible circular locking dependency detected
[117889.212934] 5.12.0-git+ torvalds#742 Tainted: G W
[117889.218405] ------------------------------------------------------
[117889.224762] btrfs-transacti/30008 is trying to acquire lock:
[117889.230594] ffff9edbc4f4a200 (&tree->lock#2){+.+.}-{2:2}, at: find_first_extent_bit+0x32/0x150 [btrfs]
[117889.240316]
[117889.240316] but task is already holding lock:
[117889.246416] ffff9edc4d47ac18 (&ctl->tree_lock){+.+.}-{2:2}, at: __btrfs_write_out_cache+0x13e/0x480 [btrfs]
[117889.256525]
[117889.256525] which lock already depends on the new lock.
[117889.256525]
[117889.265070]
[117889.265070] the existing dependency chain (in reverse order) is:
[117889.272838]
[117889.272838] -> #4 (&ctl->tree_lock){+.+.}-{2:2}:
[117889.279214] __lock_acquire+0x582/0xab0
[117889.283751] lock_acquire+0xc2/0x3a0
[117889.288005] _raw_spin_lock+0x31/0x80
[117889.292364] do_allocation.constprop.0+0x30f/0x390 [btrfs]
[117889.298688] find_free_extent+0x425/0xfc0 [btrfs]
[117889.304222] btrfs_reserve_extent+0xc0/0x1c0 [btrfs]
[117889.310004] btrfs_alloc_tree_block+0xc2/0x350 [btrfs]
[117889.315968] alloc_tree_block_no_bg_flush+0x4f/0x60 [btrfs]
[117889.322356] __btrfs_cow_block+0x13c/0x5f0 [btrfs]
[117889.327972] btrfs_cow_block+0x10e/0x240 [btrfs]
[117889.333404] btrfs_search_slot+0x687/0xc80 [btrfs]
[117889.339011] btrfs_insert_empty_items+0x58/0xa0 [btrfs]
[117889.345071] btrfs_new_inode+0x240/0x780 [btrfs]
[117889.350528] btrfs_create+0xbb/0x1f0 [btrfs]
[117889.355642] lookup_open+0x368/0x600
[117889.359913] path_openat+0x274/0x900
[117889.364177] do_filp_open+0xa2/0x110
[117889.368452] do_sys_openat2+0x242/0x310
[117889.372973] do_sys_open+0x44/0x80
[117889.377064] do_syscall_64+0x3f/0xb0
[117889.381337] entry_SYSCALL_64_after_hwframe+0x44/0xae
[117889.387068]
[117889.387068] -> #3 (&fs_info->treelog_bg_lock){+.+.}-{2:2}:
[117889.394335] __lock_acquire+0x582/0xab0
[117889.398853] lock_acquire+0xc2/0x3a0
[117889.403114] _raw_spin_lock+0x31/0x80
[117889.407486] do_allocation.constprop.0+0xd3/0x390 [btrfs]
[117889.413720] find_free_extent+0x425/0xfc0 [btrfs]
[117889.419256] btrfs_reserve_extent+0xc0/0x1c0 [btrfs]
[117889.425038] btrfs_alloc_tree_block+0xc2/0x350 [btrfs]
[117889.430999] alloc_tree_block_no_bg_flush+0x4f/0x60 [btrfs]
[117889.437396] __btrfs_cow_block+0x13c/0x5f0 [btrfs]
[117889.443002] btrfs_cow_block+0x10e/0x240 [btrfs]
[117889.448437] btrfs_search_slot+0x687/0xc80 [btrfs]
[117889.454055] btrfs_insert_empty_items+0x58/0xa0 [btrfs]
[117889.460112] btrfs_new_inode+0x240/0x780 [btrfs]
[117889.465571] btrfs_create+0xbb/0x1f0 [btrfs]
[117889.470683] lookup_open+0x368/0x600
[117889.474944] path_openat+0x274/0x900
[117889.479210] do_filp_open+0xa2/0x110
[117889.483482] do_sys_openat2+0x242/0x310
[117889.488008] do_sys_open+0x44/0x80
[117889.492114] do_syscall_64+0x3f/0xb0
[117889.496369] entry_SYSCALL_64_after_hwframe+0x44/0xae
[117889.502119]
[117889.502119] -> #2 (&cache->lock){+.+.}-{2:2}:
[117889.508239] __lock_acquire+0x582/0xab0
[117889.512760] lock_acquire+0xc2/0x3a0
[117889.517021] _raw_spin_lock+0x31/0x80
[117889.521374] do_allocation.constprop.0+0xcb/0x390 [btrfs]
[117889.527600] find_free_extent+0x425/0xfc0 [btrfs]
[117889.533129] btrfs_reserve_extent+0xc0/0x1c0 [btrfs]
[117889.538916] btrfs_alloc_tree_block+0xc2/0x350 [btrfs]
[117889.544872] alloc_tree_block_no_bg_flush+0x4f/0x60 [btrfs]
[117889.551267] __btrfs_cow_block+0x13c/0x5f0 [btrfs]
[117889.556883] btrfs_cow_block+0x10e/0x240 [btrfs]
[117889.562325] btrfs_search_slot+0x687/0xc80 [btrfs]
[117889.567942] btrfs_insert_empty_items+0x58/0xa0 [btrfs]
[117889.574002] btrfs_new_inode+0x240/0x780 [btrfs]
[117889.579469] btrfs_create+0xbb/0x1f0 [btrfs]
[117889.584571] lookup_open+0x368/0x600
[117889.588824] path_openat+0x274/0x900
[117889.593088] do_filp_open+0xa2/0x110
[117889.597354] do_sys_openat2+0x242/0x310
[117889.601870] do_sys_open+0x44/0x80
[117889.605968] do_syscall_64+0x3f/0xb0
[117889.610231] entry_SYSCALL_64_after_hwframe+0x44/0xae
[117889.615991]
[117889.615991] -> #1 (&space_info->lock){+.+.}-{2:2}:
[117889.622536] __lock_acquire+0x582/0xab0
[117889.627054] lock_acquire+0xc2/0x3a0
[117889.631327] _raw_spin_lock+0x31/0x80
[117889.635678] btrfs_block_rsv_release+0x1a1/0x410 [btrfs]
[117889.641842] btrfs_inode_rsv_release+0x48/0x190 [btrfs]
[117889.647927] btrfs_clear_delalloc_extent+0x1f0/0x500 [btrfs]
[117889.654420] clear_state_bit+0x84/0x1b0 [btrfs]
[117889.659800] __clear_extent_bit+0x266/0x610 [btrfs]
[117889.665519] extent_clear_unlock_delalloc+0x41/0x70 [btrfs]
[117889.671943] cow_file_range+0x3d7/0x440 [btrfs]
[117889.677325] run_delalloc_zoned+0x25/0x80 [btrfs]
[117889.682870] btrfs_run_delalloc_range+0x129/0x680 [btrfs]
[117889.689113] writepage_delalloc+0xae/0x160 [btrfs]
[117889.694751] __extent_writepage+0x10a/0x3f0 [btrfs]
[117889.700483] extent_write_cache_pages+0x268/0x480 [btrfs]
[117889.706742] extent_writepages+0x43/0x90 [btrfs]
[117889.712198] do_writepages+0x40/0xe0
[117889.716468] __writeback_single_inode+0x61/0x610
[117889.721792] writeback_sb_inodes+0x20f/0x510
[117889.726748] __writeback_inodes_wb+0x4c/0xe0
[117889.731715] wb_writeback+0x30c/0x4e0
[117889.736073] wb_do_writeback+0x2f4/0x340
[117889.740692] wb_workfn+0x81/0x370
[117889.744706] process_one_work+0x265/0x5e0
[117889.749401] worker_thread+0x50/0x3b0
[117889.753751] kthread+0x124/0x160
[117889.757678] ret_from_fork+0x1f/0x30
[117889.761934]
[117889.761934] -> #0 (&tree->lock#2){+.+.}-{2:2}:
[117889.768132] check_prev_add+0x91/0xc60
[117889.772573] validate_chain+0xa10/0x1e30
[117889.777193] __lock_acquire+0x582/0xab0
[117889.777201] lock_acquire+0xc2/0x3a0
[117889.777206] _raw_spin_lock+0x31/0x80
[117889.790471] find_first_extent_bit+0x32/0x150 [btrfs]
[117889.796362] write_pinned_extent_entries.isra.0+0xc5/0x100 [btrfs]
[117889.803399] __btrfs_write_out_cache+0x262/0x480 [btrfs]
[117889.809552] btrfs_write_out_cache+0x7a/0x100 [btrfs]
[117889.815469] btrfs_write_dirty_block_groups+0x286/0x3c0 [btrfs]
[117889.822258] commit_cowonly_roots+0x1ec/0x2a0 [btrfs]
[117889.828124] btrfs_commit_transaction+0x57d/0xcb0 [btrfs]
[117889.834348] transaction_kthread+0x130/0x1a0 [btrfs]
[117889.840144] kthread+0x124/0x160
[117889.845888] ret_from_fork+0x1f/0x30
[117889.850143]
[117889.850143] other info that might help us debug this:
[117889.850143]
[117889.858521] Chain exists of:
[117889.858521] &tree->lock#2 --> &fs_info->treelog_bg_lock --> &ctl->tree_lock
[117889.858521]
[117889.870394] Possible unsafe locking scenario:
[117889.870394]
[117889.876571] CPU0 CPU1
[117889.881283] ---- ----
[117889.885971] lock(&ctl->tree_lock);
[117889.889713] lock(&fs_info->treelog_bg_lock);
[117889.896845] lock(&ctl->tree_lock);
[117889.903108] lock(&tree->lock#2);
[117889.906690]
[117889.906690] *** DEADLOCK ***
[117889.906690]
[117889.912987] 5 locks held by btrfs-transacti/30008:
[117889.917945] #0: ffff9edbc6d68840 (&fs_info->transaction_kthread_mutex){+.+.}-{3:3}, at: transaction_kthread+0x56/0x1a0 [btrfs]
[117889.929770] #1: ffff9edbc6d68da8 (&fs_info->reloc_mutex){+.+.}-{3:3}, at: btrfs_commit_transaction+0x4d4/0xcb0 [btrfs]
[117889.940897] #2: ffff9edbc6d687a0 (&fs_info->tree_log_mutex){+.+.}-{3:3}, at: btrfs_commit_transaction+0x53c/0xcb0 [btrfs]
[117889.952277] #3: ffff9edc4d47ad08 (&ctl->cache_writeout_mutex){+.+.}-{3:3}, at: __btrfs_write_out_cache+0x136/0x480 [btrfs]
[117889.963760] #4: ffff9edc4d47ac18 (&ctl->tree_lock){+.+.}-{2:2}, at: __btrfs_write_out_cache+0x13e/0x480 [btrfs]
[117889.974298]
[117889.974298] stack backtrace:
[117889.978933] CPU: 0 PID: 30008 Comm: btrfs-transacti Tainted: G W 5.12.0-git+ torvalds#742
[117889.987910] Hardware name: empty empty/S3993, BIOS PAQEX0-3 02/24/2008
[117889.994607] Call Trace:
[117889.997228] dump_stack+0x6d/0x89
[117890.000720] check_noncircular+0xff/0x110
[117890.004898] ? check_path.constprop.0+0x24/0x50
[117890.009611] ? check_noncircular+0x80/0x110
[117890.013975] check_prev_add+0x91/0xc60
[117890.017897] ? check_prev_add+0xa3/0xc60
[117890.021996] validate_chain+0xa10/0x1e30
[117890.026102] __lock_acquire+0x582/0xab0
[117890.030123] lock_acquire+0xc2/0x3a0
[117890.033867] ? find_first_extent_bit+0x32/0x150 [btrfs]
[117890.039417] ? lock_acquire+0xc2/0x3a0
[117890.043341] ? lock_is_held_type+0x9a/0x110
[117890.047692] ? write_cache_extent_entries+0x130/0x200 [btrfs]
[117890.053794] _raw_spin_lock+0x31/0x80
[117890.057631] ? find_first_extent_bit+0x32/0x150 [btrfs]
[117890.063182] find_first_extent_bit+0x32/0x150 [btrfs]
[117890.068569] write_pinned_extent_entries.isra.0+0xc5/0x100 [btrfs]
[117890.075098] __btrfs_write_out_cache+0x262/0x480 [btrfs]
[117890.080756] btrfs_write_out_cache+0x7a/0x100 [btrfs]
[117890.086163] btrfs_write_dirty_block_groups+0x286/0x3c0 [btrfs]
[117890.092437] ? _raw_spin_unlock+0x1f/0x40
[117890.096606] ? btrfs_run_delayed_refs+0x18a/0x200 [btrfs]
[117890.102306] commit_cowonly_roots+0x1ec/0x2a0 [btrfs]
[117890.107684] btrfs_commit_transaction+0x57d/0xcb0 [btrfs]
[117890.113399] ? start_transaction+0xda/0x760 [btrfs]
[117890.118594] ? lock_release+0x1b0/0x3e0
[117890.122609] transaction_kthread+0x130/0x1a0 [btrfs]
[117890.127877] ? btrfs_cleanup_transaction+0x650/0x650 [btrfs]
[117890.133844] kthread+0x124/0x160
[117890.137252] ? kthread_park+0x90/0x90
[117890.141075] ret_from_fork+0x1f/0x30
[121311.640943] BTRFS info (device nullb0): at unmount dio bytes count 409911296
[122311.431879] BTRFS info (device nullb0): has skinny extents
[122311.447587] BTRFS info (device nullb0): host-managed zoned block device /dev/nullb0, 8 zones of 268435456 bytes
[122311.458261] BTRFS info (device nullb0): zoned mode enabled with zone size 268435456
[122311.468915] BTRFS info (device nullb0): enabling ssd optimizations
[122311.476352] BTRFS info (device nullb0): checking UUID tree
[122335.216131] BTRFS info (device nullb0): has skinny extents
[122335.231773] BTRFS info (device nullb0): host-managed zoned block device /dev/nullb0, 8 zones of 268435456 bytes
[122335.242347] BTRFS info (device nullb0): zoned mode enabled with zone size 268435456
[122335.253489] BTRFS info (device nullb0): enabling ssd optimizations
[122376.470642] print_req_error: 8 callbacks suppressed
[122376.470656] blk_update_request: I/O error, dev nullb0, sector 1048896 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[122376.487046] buffer_io_error: 503 callbacks suppressed
[122376.487049] Buffer I/O error on dev nullb0, logical block 131112, lost async page write
[122376.487096] blk_update_request: I/O error, dev nullb0, sector 1048904 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[122376.511812] Buffer I/O error on dev nullb0, logical block 131113, lost async page write
[122376.511850] blk_update_request: I/O error, dev nullb0, sector 1048912 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[122376.531181] Buffer I/O error on dev nullb0, logical block 131114, lost async page write
[122376.531217] blk_update_request: I/O error, dev nullb0, sector 1048920 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[122376.550549] Buffer I/O error on dev nullb0, logical block 131115, lost async page write
[122376.550586] blk_update_request: I/O error, dev nullb0, sector 1573184 op 0x1:(WRITE) flags 0x800 phys_seg 8 prio class 0
[122376.569939] Buffer I/O error on dev nullb0, logical block 196648, lost async page write
[122376.569954] Buffer I/O error on dev nullb0, logical block 196649, lost async page write
[122376.569965] Buffer I/O error on dev nullb0, logical block 196650, lost async page write
[122376.569972] Buffer I/O error on dev nullb0, logical block 196651, lost async page write
[122376.569985] Buffer I/O error on dev nullb0, logical block 196652, lost async page write
[122376.569995] Buffer I/O error on dev nullb0, logical block 196653, lost async page write
[122376.570029] blk_update_request: I/O error, dev nullb0, sector 1573472 op 0x1:(WRITE) flags 0x4800 phys_seg 31 prio class 0
[122376.630746] blk_update_request: I/O error, dev nullb0, sector 1573720 op 0x1:(WRITE) flags 0x800 phys_seg 13 prio class 0
[122376.642027] blk_update_request: I/O error, dev nullb0, sector 1573856 op 0x1:(WRITE) flags 0x4800 phys_seg 31 prio class 0
[122376.642293] blk_update_request: I/O error, dev nullb0, sector 1574104 op 0x1:(WRITE) flags 0x800 phys_seg 25 prio class 0
[122376.664789] blk_update_request: I/O error, dev nullb0, sector 1574336 op 0x1:(WRITE) flags 0x800 phys_seg 12 prio class 0
[122434.573282] BTRFS: device fsid fccda1af-97ca-44be-abd4-c9ed4c63be24 devid 1 transid 13 /dev/nullb0 scanned by mount (2443)
[122434.631114] BTRFS error (device nullb0): unrecognized or unsupported super flag: 34359738368
[122434.640113] BTRFS error (device nullb0): dev_item UUID does not match metadata fsid: fccda1af-97ca-44be-abd4-c9ed4c63be24 != 5c337095-a42c-4408-82ea-6435ac2e02cc
[122434.654816] BTRFS error (device nullb0): superblock contains fatal errors
[122434.662017] BTRFS error (device nullb0): open_ctree failed
[122510.271979] loop0: detected capacity change from 0 to 4194304
[122510.333887] EXT4-fs (loop0): mounted filesystem with ordered data mode. Opts: (null). Quota mode: disabled.
[122795.575302] sysrq: Changing Loglevel
[122795.579189] sysrq: Loglevel set to 9
[123394.006421] sysrq: Changing Loglevel
[123394.010406] sysrq: Loglevel set to 9
[165612.307454] BTRFS: device label ZONED devid 1 transid 6 /dev/nullb0 scanned by mkfs.btrfs (10407)
[165612.316828] BTRFS: device label ZONED devid 2 transid 6 /dev/nullb1 scanned by mkfs.btrfs (10407)
[165644.309598] BTRFS: device label ZONED devid 1 transid 6 /dev/nullb0 scanned by mkfs.btrfs (10427)
[165644.319118] BTRFS: device label ZONED devid 2 transid 6 /dev/nullb1 scanned by mkfs.btrfs (10427)
[165663.165330] BTRFS: device label ZONED devid 1 transid 6 /dev/nullb0 scanned by mkfs.btrfs (10525)
[165663.174636] BTRFS: device label ZONED devid 2 transid 6 /dev/nullb1 scanned by mkfs.btrfs (10525)
[165756.670426] BTRFS info (device nullb0): setting incompat feature flag for COMPRESS_ZSTD (0x10)
[165756.679376] BTRFS info (device nullb0): use zstd compression, level 15
[165756.686142] BTRFS info (device nullb0): using free space tree
[165756.692087] BTRFS info (device nullb0): has skinny extents
[165756.697761] BTRFS info (device nullb0): flagging fs with big metadata feature
[165756.715880] BTRFS info (device nullb0): host-managed zoned block device /dev/nullb0, 8 zones of 268435456 bytes
[165756.726418] BTRFS info (device nullb0): host-managed zoned block device /dev/nullb1, 8 zones of 268435456 bytes
[165756.736828] BTRFS info (device nullb0): zoned mode enabled with zone size 268435456
[165756.746732] BTRFS info (device nullb0): enabling ssd optimizations
[165756.753912] BTRFS info (device nullb0): checking UUID tree

https://susepaste.org/view/raw/45084267
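
As a reading aid for the splat above, the reported dependency chain can be modeled as a directed graph in which an edge A -> B means "B was acquired while A was held". This is only a minimal sketch of the idea, not lockdep's actual algorithm: the lock names are copied from the report, the edge list is my reading of entries #0 through #4, and `find_cycle`, `EXISTING_CHAIN` and `NEW_DEPENDENCY` are hypothetical names introduced for illustration.

```python
from collections import defaultdict

# Edges read off the "existing dependency chain" (#1..#4 in the report).
# Each pair is (lock already held, lock acquired next).
EXISTING_CHAIN = [
    ("tree->lock#2", "space_info->lock"),      # #1: btrfs_block_rsv_release
    ("space_info->lock", "cache->lock"),       # #2: do_allocation
    ("cache->lock", "treelog_bg_lock"),        # #3: do_allocation
    ("treelog_bg_lock", "ctl->tree_lock"),     # #4: do_allocation
]

# #0: btrfs-transaction holds ctl->tree_lock and calls find_first_extent_bit,
# which takes tree->lock#2.
NEW_DEPENDENCY = ("ctl->tree_lock", "tree->lock#2")


def find_cycle(edges):
    """Return one lock-order cycle as a list of lock names, or None."""
    graph = defaultdict(list)
    for held, wanted in edges:
        graph[held].append(wanted)

    visiting = []   # locks on the current depth-first path
    done = set()    # locks whose outgoing edges are fully explored

    def dfs(node):
        if node in visiting:
            # Back edge: the path from the first visit of `node` closes a loop.
            return visiting[visiting.index(node):] + [node]
        if node in done:
            return None
        visiting.append(node)
        for nxt in graph[node]:
            cycle = dfs(nxt)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for start in list(graph):
        cycle = dfs(start)
        if cycle:
            return cycle
    return None


if __name__ == "__main__":
    print(find_cycle(EXISTING_CHAIN))
    print(find_cycle(EXISTING_CHAIN + [NEW_DEPENDENCY]))
```

Without the new edge the graph is a plain chain and no cycle is found; adding the acquisition made in `write_pinned_extent_entries` closes the loop, which corresponds to the "possible circular locking dependency" lockdep warns about.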

@morbidrsa morbidrsa added the bug Something isn't working label May 7, 2021
naota pushed a commit that referenced this issue Nov 3, 2021
Commit 4dd0d5c ("ice: add lock around Tx timestamp tracker flush")
added a lock around the Tx timestamp tracker flow, which is used to
clean up any leftover SKBs and prepare for device removal.

This lock is problematic because it is being held around a call to
ice_clear_phy_tstamp. The clear function takes a mutex to send a PHY
write command to firmware. This could lead to a deadlock if the mutex
actually sleeps, and causes the following warning on a kernel with
preemption debugging enabled:

[  715.419426] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:573
[  715.427900] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 3100, name: rmmod
[  715.435652] INFO: lockdep is turned off.
[  715.439591] Preemption disabled at:
[  715.439594] [<0000000000000000>] 0x0
[  715.446678] CPU: 52 PID: 3100 Comm: rmmod Tainted: G        W  OE     5.15.0-rc4+ #42 bdd7ec3018e725f159ca0d372ce8c2c0e784891c
[  715.458058] Hardware name: Intel Corporation S2600STQ/S2600STQ, BIOS SE5C620.86B.02.01.0010.010620200716 01/06/2020
[  715.468483] Call Trace:
[  715.470940]  dump_stack_lvl+0x6a/0x9a
[  715.474613]  ___might_sleep.cold+0x224/0x26a
[  715.478895]  __mutex_lock+0xb3/0x1440
[  715.482569]  ? stack_depot_save+0x378/0x500
[  715.486763]  ? ice_sq_send_cmd+0x78/0x14c0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[  715.494979]  ? kfree+0xc1/0x520
[  715.498128]  ? mutex_lock_io_nested+0x12a0/0x12a0
[  715.502837]  ? kasan_set_free_info+0x20/0x30
[  715.507110]  ? __kasan_slab_free+0x10b/0x140
[  715.511385]  ? slab_free_freelist_hook+0xc7/0x220
[  715.516092]  ? kfree+0xc1/0x520
[  715.519235]  ? ice_deinit_lag+0x16c/0x220 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[  715.527359]  ? ice_remove+0x1cf/0x6a0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[  715.535133]  ? pci_device_remove+0xab/0x1d0
[  715.539318]  ? __device_release_driver+0x35b/0x690
[  715.544110]  ? driver_detach+0x214/0x2f0
[  715.548035]  ? bus_remove_driver+0x11d/0x2f0
[  715.552309]  ? pci_unregister_driver+0x26/0x250
[  715.556840]  ? ice_module_exit+0xc/0x2f [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[  715.564799]  ? __do_sys_delete_module.constprop.0+0x2d8/0x4e0
[  715.570554]  ? do_syscall_64+0x3b/0x90
[  715.574303]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
[  715.579529]  ? start_flush_work+0x542/0x8f0
[  715.583719]  ? ice_sq_send_cmd+0x78/0x14c0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[  715.591923]  ice_sq_send_cmd+0x78/0x14c0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[  715.599960]  ? wait_for_completion_io+0x250/0x250
[  715.604662]  ? lock_acquire+0x196/0x200
[  715.608504]  ? do_raw_spin_trylock+0xa5/0x160
[  715.612864]  ice_sbq_rw_reg+0x1e6/0x2f0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[  715.620813]  ? ice_reset+0x130/0x130 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[  715.628497]  ? __debug_check_no_obj_freed+0x1e8/0x3c0
[  715.633550]  ? trace_hardirqs_on+0x1c/0x130
[  715.637748]  ice_write_phy_reg_e810+0x70/0xf0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[  715.646220]  ? do_raw_spin_trylock+0xa5/0x160
[  715.650581]  ? ice_ptp_release+0x910/0x910 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[  715.658797]  ? ice_ptp_release+0x255/0x910 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[  715.667013]  ice_clear_phy_tstamp+0x2c/0x110 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[  715.675403]  ice_ptp_release+0x408/0x910 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[  715.683440]  ice_remove+0x560/0x6a0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[  715.691037]  ? _raw_spin_unlock_irqrestore+0x46/0x73
[  715.696005]  pci_device_remove+0xab/0x1d0
[  715.700018]  __device_release_driver+0x35b/0x690
[  715.704637]  driver_detach+0x214/0x2f0
[  715.708389]  bus_remove_driver+0x11d/0x2f0
[  715.712489]  pci_unregister_driver+0x26/0x250
[  715.716857]  ice_module_exit+0xc/0x2f [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[  715.724637]  __do_sys_delete_module.constprop.0+0x2d8/0x4e0
[  715.730210]  ? free_module+0x6d0/0x6d0
[  715.733963]  ? task_work_run+0xe1/0x170
[  715.737803]  ? exit_to_user_mode_loop+0x17f/0x1d0
[  715.742509]  ? rcu_read_lock_sched_held+0x12/0x80
[  715.747215]  ? trace_hardirqs_on+0x1c/0x130
[  715.751401]  do_syscall_64+0x3b/0x90
[  715.754981]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  715.760033] RIP: 0033:0x7f4dfe59000b
[  715.763612] Code: 73 01 c3 48 8b 0d 6d 1e 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3d 1e 0c 00 f7 d8 64 89 01 48
[  715.782357] RSP: 002b:00007ffe8c891708 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[  715.789923] RAX: ffffffffffffffda RBX: 00005558a20468b0 RCX: 00007f4dfe59000b
[  715.797054] RDX: 000000000000000a RSI: 0000000000000800 RDI: 00005558a2046918
[  715.804189] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[  715.811319] R10: 00007f4dfe603ac0 R11: 0000000000000206 R12: 00007ffe8c891940
[  715.818455] R13: 00007ffe8c8920a3 R14: 00005558a20462a0 R15: 00005558a20468b0

Notice that this is the only case where we use the lock in this way. In
the cleanup kthread and work kthread the lock is only taken around the
bit accesses. This was done intentionally to avoid this kind of issue.
The way the lock is used, we only protect ordering of bit sets vs bit
clears. The Tx writers in the hot path don't need to be protected
against the entire kthread loop. The Tx queue threads only need to
ensure that they do not re-use an index that is currently in use. The
cleanup loop does not need to block all new set bits, since it will
re-queue itself if new timestamps are present.

Fix the tracker flow so that it uses the same flow as the standard
cleanup thread. In addition, ensure the in_use bitmap actually gets
cleared properly.

This fixes the warning and also avoids the potential deadlock that might
have occurred otherwise.

Fixes: 4dd0d5c ("ice: add lock around Tx timestamp tracker flush")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
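
The locking rule described in the commit message above (hold the lock only around the bit accesses, and make the call that may sleep after the lock is dropped) can be illustrated with a small user-space sketch. This is not the ice driver's code: `TxTracker`, `flush` and the `clear_phy_tstamp` callback are hypothetical stand-ins, and a `threading.Lock` takes the place of the kernel spinlock.

```python
import threading


class TxTracker:
    """Hypothetical stand-in for a Tx timestamp tracker."""

    def __init__(self, size):
        self.lock = threading.Lock()     # plays the role of the tracker spinlock
        self.in_use = [False] * size     # plays the role of the in_use bitmap
        self.skbs = [None] * size        # pending buffers, one slot per index


def flush(tracker, clear_phy_tstamp):
    """Release every slot, never holding the lock across a call that may sleep."""
    for idx in range(len(tracker.in_use)):
        # Critical section: only the slot and bit accesses are protected.
        with tracker.lock:
            was_used = tracker.in_use[idx]
            tracker.in_use[idx] = False
            tracker.skbs[idx] = None

        # The lock is dropped here, so a callee that takes a sleeping lock
        # (a mutex in the scenario from the commit message) is safe to call.
        if was_used:
            clear_phy_tstamp(idx)


if __name__ == "__main__":
    tracker = TxTracker(4)
    tracker.in_use[2] = True
    tracker.skbs[2] = "pending-skb"
    flush(tracker, lambda idx: print("cleared PHY timestamp for index", idx))
```

The design point is the same one the commit message makes: the lock only orders bit sets against bit clears, so nothing slow or sleeping needs to run under it.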