Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ASUS Tinker Board Kernel Modules #20

Closed
sghazagh opened this issue Jul 10, 2017 · 11 comments
Closed

ASUS Tinker Board Kernel Modules #20

sghazagh opened this issue Jul 10, 2017 · 11 comments

Comments

@sghazagh
Copy link

sghazagh commented Jul 10, 2017

Hi,
I have tried to test the kernel on this repository for ASUS Tinker Board.(using rk3288_linux_defconfig)
I have compiled the kernel and modules and boot the device with created kernel.

The kernel could load the device successfully however, modules does not load at all as this repository do not create all the required modules.

Device Tree seems different compare to official ASUS DTB file for miniarm.
Can you please advise if modules like WIFI and Bluetooth are included in this repository or not?

Thanks

@sghazagh sghazagh changed the title Tinker Board KErnel Modules ASUS Tinker Board Kernel Modules Jul 10, 2017
@wzyy2
Copy link
Contributor

wzyy2 commented Jul 10, 2017

WIFI and Bluetooth driver for tinker board are not included in main 4.4 kernel branch.

https://github.com/rockchip-linux/kernel/tree/release-20160818-miniarm
You can try this branch.

@Tonymac32
Copy link
Contributor

If that branch does not work for your needs, the Tinkerboard 4.4.76 kernel at Armbian.com has Bluetooth and rtl8723bs driver 4.4.0, as well as the Rockchip reboot fix. The build environment allows for creating user patches for any other adjustments you may wish to make. I follow the patches here and on the Tinker Board repo for any fixes.

@wzyy2
Copy link
Contributor

wzyy2 commented Jul 22, 2017

If that branch does not work for your needs, the Tinkerboard 4.4.76 kernel at Armbian.com has Bluetooth and rtl8723bs driver 4.4.0, as well as the Rockchip reboot fix. The build environment allows for creating user patches for any other adjustments you may wish to make. I follow the patches here and on the Tinker Board repo for any fixes.

I will rebase release-20160818-miniarm branch to the latest release-4.4 and re-organiza the code to make it easy to merge release-4.4 branch.

Maybe armbian could switch to that branch.

@Tonymac32
Copy link
Contributor

Thanks for the update, Armbian tries to use a single kernel source with patches to support all devices using a particular SoC. Switching to the Rockchip miniarm branch would most likely break the MiQi, or we'd have to have an entirely separate config and patchset for it.

@Tonymac32
Copy link
Contributor

Also, I've had very good success (as has @Miouyouyou) running Mainline kernel on RK3288 devices, the patches for Mainline are primarily dts and Mali.

@Miouyouyou
Copy link

Yeah, we're only missing bluetooth and VPU (well, I didn't the test the VPU packages ATM) with the Mainline kernel.

@wzyy2
Copy link
Contributor

wzyy2 commented Jul 24, 2017

Yeah, we're only missing bluetooth and VPU (well, I didn't the test the VPU packages ATM) with the Mainline kernel.

You can look at this repo for VPU driver.
https://github.com/rockchip-linux/rockchip_forwardports

I remember 4K HDMI is also missing, maybe you folks can have a look. ; ).

@Tonymac32
Copy link
Contributor

Haha yes, 4k is missing as well, I was focusing on the reboot thing first. Thanks for the link

@Miouyouyou
Copy link

Miouyouyou commented Jul 24, 2017

I'll take a look at the VPU, yeah !

I also take generous 4K screens donations !
\(・ヮ・)/

Meanwhile, I don't have 4K screens so testing this will be fairly difficult (´・ω・`).

The 4.13.0-rc2 update should be available very soon. This update includes various DTS/DTSI fixes and the reboot patch. Turns out I missed regulator-always-on and regulator-boot-on properties on an SD vcc node.

@Tonymac32
Copy link
Contributor

I have a 4k screen, I was looking at some of the code, let me get the Armbian Dev up to 4.13 first. ;-)

rkchrome pushed a commit that referenced this issue Dec 4, 2017
commit 6f48655 upstream.

This patch fixes a generate_node_acls = 1 + cache_dynamic_acls = 0
regression, that was introduced by

  commit 01d4d67
  Author: Nicholas Bellinger <nab@linux-iscsi.org>
  Date:   Wed Dec 7 12:55:54 2016 -0800

which originally had the proper list_del_init() usage, but was
dropped during list review as it was thought unnecessary by HCH.

However, list_del_init() usage is required during the special
generate_node_acls = 1 + cache_dynamic_acls = 0 case when
transport_free_session() does a list_del(&se_nacl->acl_list),
followed by target_complete_nacl() doing the same thing.

This was manifesting as a general protection fault as reported
by Justin:

kernel: general protection fault: 0000 [#1] SMP
kernel: Modules linked in:
kernel: CPU: 0 PID: 11047 Comm: iscsi_ttx Not tainted 4.13.0-rc2.x86_64.1+ #20
kernel: Hardware name: Intel Corporation S5500BC/S5500BC, BIOS S5500.86B.01.00.0064.050520141428 05/05/2014
kernel: task: ffff88026939e800 task.stack: ffffc90007884000
kernel: RIP: 0010:target_put_nacl+0x49/0xb0
kernel: RSP: 0018:ffffc90007887d70 EFLAGS: 00010246
kernel: RAX: dead000000000200 RBX: ffff8802556ca000 RCX: 0000000000000000
kernel: RDX: dead000000000100 RSI: 0000000000000246 RDI: ffff8802556ce028
kernel: RBP: ffffc90007887d88 R08: 0000000000000001 R09: 0000000000000000
kernel: R10: ffffc90007887df8 R11: ffffea0009986900 R12: ffff8802556ce020
kernel: R13: ffff8802556ce028 R14: ffff8802556ce028 R15: ffffffff88d85540
kernel: FS:  0000000000000000(0000) GS:ffff88027fc00000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 00007fffe36f5f94 CR3: 0000000009209000 CR4: 00000000003406f0
kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
kernel: Call Trace:
kernel:  transport_free_session+0x67/0x140
kernel:  transport_deregister_session+0x7a/0xc0
kernel:  iscsit_close_session+0x92/0x210
kernel:  iscsit_close_connection+0x5f9/0x840
kernel:  iscsit_take_action_for_connection_exit+0xfe/0x110
kernel:  iscsi_target_tx_thread+0x140/0x1e0
kernel:  ? wait_woken+0x90/0x90
kernel:  kthread+0x124/0x160
kernel:  ? iscsit_thread_get_cpumask+0x90/0x90
kernel:  ? kthread_create_on_node+0x40/0x40
kernel:  ret_from_fork+0x22/0x30
kernel: Code: 00 48 89 fb 4c 8b a7 48 01 00 00 74 68 4d 8d 6c 24 08 4c
89 ef e8 e8 28 43 00 48 8b 93 20 04 00 00 48 8b 83 28 04 00 00 4c 89
ef <48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 83 20
kernel: RIP: target_put_nacl+0x49/0xb0 RSP: ffffc90007887d70
kernel: ---[ end trace f12821adbfd46fed ]---

To address this, go ahead and use proper list_del_list() for all
cases of se_nacl->acl_list deletion.

Reported-by: Justin Maggard <jmaggard01@gmail.com>
Tested-by: Justin Maggard <jmaggard01@gmail.com>
Cc: Justin Maggard <jmaggard01@gmail.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
liuqsqq pushed a commit to liuqsqq/kernel-rokchip that referenced this issue Jan 20, 2018
Kms hotplug event may race into fbdev helper initial, that would
cause the bug:

[    0.735411] [00000200] *pgd=00000000f6ffe003, *pud=00000000f6ffe003, *pmd=0000000000000000
[    0.736156] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[    0.736648] Modules linked in:
[    0.736930] CPU: 2 PID: 20 Comm: kworker/2:0 Not tainted 4.4.41 rockchip-linux#20
[    0.737480] Hardware name: Rockchip RK3399 Board rev2 (BOX) (DT)
[    0.738020] Workqueue: events cdn_dp_pd_event_work
[    0.738447] task: ffffffc0f21f3100 ti: ffffffc0f2218000 task.ti: ffffffc0f2218000
[    0.739109] PC is at mutex_lock+0x14/0x44
[    0.739469] LR is at drm_fb_helper_hotplug_event+0x30/0x114
[    0.756253] [<ffffff8008a344f4>] mutex_lock+0x14/0x44
[    0.756260] [<ffffff8008445708>] drm_fb_helper_hotplug_event+0x30/0x114
[    0.756271] [<ffffff8008473c84>] rockchip_drm_output_poll_changed+0x18/0x20
[    0.756280] [<ffffff8008439fcc>] drm_kms_helper_hotplug_event+0x28/0x34
[    0.756286] [<ffffff800846c444>] cdn_dp_pd_event_work+0x394/0x3c4
[    0.756295] [<ffffff80080b2b38>] process_one_work+0x218/0x3e0
[    0.756302] [<ffffff80080b3538>] worker_thread+0x2e8/0x404
[    0.756308] [<ffffff80080b7e70>] kthread+0xe8/0xf0
[    0.756316] [<ffffff8008082690>] ret_from_fork+0x10/0x40

Change-Id: I8d594183cf8187131418a0096dde840cbf01ed6b
Signed-off-by: Mark Yao <mark.yao@rock-chips.com>
liuqsqq pushed a commit to liuqsqq/kernel-rokchip that referenced this issue Jan 20, 2018
This patch aims to fix possible unsafe locking scenario when
dwc3 probe failed. The possible deadlock warning info is:

[4.254320] ======================================================
[4.254327] [ INFO: possible circular locking dependency detected ]
[4.254334] 4.4.55 rockchip-linux#20 Not tainted
[4.254339] -------------------------------------------------------
[4.254346] kworker/4:1/47 is trying to acquire lock:
[4.254352]  (&rockchip->lock){+.+.+.}, at: [<ffffff8008730d58>] dwc3_rockchip_otg_extcon_evt_work+0x30/0x42c
[4.254382]
[4.254382] but task is already holding lock:
[4.254388]  ((&rockchip->otg_work)){+.+...}, at: [<ffffff80080b7a78>] process_one_work+0x1c4/0x6ac
[4.254411]
[4.254411] which lock already depends on the new lock.
[4.254411]
[4.254418]
[4.254418] the existing dependency chain (in reverse order) is:
[4.254424]
-> #1 ((&rockchip->otg_work)){+.+...}:
[4.254440]        [<ffffff80080f6a58>] __lock_acquire+0x15b4/0x1950
[4.254452]        [<ffffff80080f75fc>] lock_acquire+0x190/0x250
[4.254461]        [<ffffff80080b88c8>] flush_work+0x4c/0x274
[4.254471]        [<ffffff80080b8cd8>] __cancel_work_timer+0x130/0x1c0
[4.254480]        [<ffffff80080b8d78>] cancel_work_sync+0x10/0x18
[4.254489]        [<ffffff8008731648>] dwc3_rockchip_probe+0x4f4/0x59c
[4.254498]        [<ffffff8008525a9c>] platform_drv_probe+0x58/0xa4
[4.254509]        [<ffffff800852397c>] driver_probe_device+0x118/0x2b0
[4.254521]        [<ffffff8008523b80>] __driver_attach+0x6c/0x98
[4.254531]        [<ffffff80085229d8>] bus_for_each_dev+0x80/0xb0
[4.254543]        [<ffffff80085234b0>] driver_attach+0x20/0x28
[4.254554]        [<ffffff8008523040>] bus_add_driver+0xe8/0x1ec
[4.254565]        [<ffffff8008524a9c>] driver_register+0x94/0xe0
[4.254576]        [<ffffff80085259f4>] __platform_driver_register+0x48/0x50
[4.254585]        [<ffffff8009130bf4>] dwc3_rockchip_driver_init+0x18/0x20
[4.254598]        [<ffffff8008082ba8>] do_one_initcall+0x180/0x19c
[4.254609]        [<ffffff8009100e58>] kernel_init_freeable+0x208/0x2c0
[4.254621]        [<ffffff8008c00748>] kernel_init+0x10/0xf8
[4.254632]        [<ffffff80080826d0>] ret_from_fork+0x10/0x40
[4.254641]
-> #0 (&rockchip->lock){+.+.+.}:
[4.254656]        [<ffffff80080f3af8>] print_circular_bug+0x64/0x2c4
[4.254666]        [<ffffff80080f6728>] __lock_acquire+0x1284/0x1950
[4.254675]        [<ffffff80080f75fc>] lock_acquire+0x190/0x250
[4.254685]        [<ffffff8008c03aac>] mutex_lock_nested+0x80/0x3d0
[4.254695]        [<ffffff8008730d58>] dwc3_rockchip_otg_extcon_evt_work+0x30/0x42c
[4.254704]        [<ffffff80080b7bec>] process_one_work+0x338/0x6ac
[4.254714]        [<ffffff80080b90e8>] worker_thread+0x300/0x428
[4.254723]        [<ffffff80080bead8>] kthread+0xf4/0xfc
[4.254732]        [<ffffff80080826d0>] ret_from_fork+0x10/0x40
[4.254741]
[4.254741] other info that might help us debug this:
[4.254741]
[4.254749]  Possible unsafe locking scenario:
[4.254749]
[4.254755]        CPU0                    CPU1
[4.254760]        ----                    ----
[4.254765]   lock((&rockchip->otg_work));
[4.254775]                                lock(&rockchip->lock);
[4.254786]                                lock((&rockchip->otg_work));
[4.254796]   lock(&rockchip->lock);
[4.254805]
[4.254805]  *** DEADLOCK ***
[4.254805]
[4.254813] 2 locks held by kworker/4:1/47:
[4.254818]  #0:  ("events"){.+.+.+}, at: [<ffffff80080b7a78>] process_one_work+0x1c4/0x6ac
[4.254839]  #1:  ((&rockchip->otg_work)){+.+...}, at: [<ffffff80080b7a78>] process_one_work+0x1c4/0x6ac
[4.254860]
[4.254860] stack backtrace:
[4.254869] CPU: 4 PID: 47 Comm: kworker/4:1 Not tainted 4.4.55 rockchip-linux#20
[4.254875] Hardware name: Rockchip RK3399 Evaluation Board v3 (Android) (DT)
[4.254885] Workqueue: events dwc3_rockchip_otg_extcon_evt_work
[4.254893] Call trace:
[4.254902] [<ffffff8008088a1c>] dump_backtrace+0x0/0x1c8
[4.254909] [<ffffff8008088bf8>] show_stack+0x14/0x1c
[4.254917] [<ffffff80083a5fa0>] dump_stack+0xb0/0xec
[4.254925] [<ffffff80080f3d3c>] print_circular_bug+0x2a8/0x2c4
[4.254931] [<ffffff80080f6728>] __lock_acquire+0x1284/0x1950
[4.254938] [<ffffff80080f75fc>] lock_acquire+0x190/0x250
[4.254946] [<ffffff8008c03aac>] mutex_lock_nested+0x80/0x3d0
[4.254952] [<ffffff8008730d58>] dwc3_rockchip_otg_extcon_evt_work+0x30/0x42c
[4.254960] [<ffffff80080b7bec>] process_one_work+0x338/0x6ac
[4.254967] [<ffffff80080b90e8>] worker_thread+0x300/0x428
[4.254973] [<ffffff80080bead8>] kthread+0xf4/0xfc
[4.254979] [<ffffff80080826d0>] ret_from_fork+0x10/0x40
[4.275124] rockchip-dwc3 usb@fe800000: USB peripheral connected

Change-Id: I9d34724e1c2af9b8472f79bfe4b088cbdde6394d
Signed-off-by: William Wu <wulf@rock-chips.com>
rkchrome pushed a commit that referenced this issue Jun 11, 2018
[ Upstream commit 560d388 ]

cifs_relock_file() can perform a down_write() on the inode's lock_sem even
though it was already performed in cifs_strict_readv().  Lockdep complains
about this.  AFAICS, there is no problem here, and lockdep just needs to be
told that this nesting is OK.

 =============================================
 [ INFO: possible recursive locking detected ]
 4.11.0+ #20 Not tainted
 ---------------------------------------------
 cat/701 is trying to acquire lock:
  (&cifsi->lock_sem){++++.+}, at: cifs_reopen_file+0x7a7/0xc00

 but task is already holding lock:
  (&cifsi->lock_sem){++++.+}, at: cifs_strict_readv+0x177/0x310

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&cifsi->lock_sem);
   lock(&cifsi->lock_sem);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 1 lock held by cat/701:
  #0:  (&cifsi->lock_sem){++++.+}, at: cifs_strict_readv+0x177/0x310

 stack backtrace:
 CPU: 0 PID: 701 Comm: cat Not tainted 4.11.0+ #20
 Call Trace:
  dump_stack+0x85/0xc2
  __lock_acquire+0x17dd/0x2260
  ? trace_hardirqs_on_thunk+0x1a/0x1c
  ? preempt_schedule_irq+0x6b/0x80
  lock_acquire+0xcc/0x260
  ? lock_acquire+0xcc/0x260
  ? cifs_reopen_file+0x7a7/0xc00
  down_read+0x2d/0x70
  ? cifs_reopen_file+0x7a7/0xc00
  cifs_reopen_file+0x7a7/0xc00
  ? printk+0x43/0x4b
  cifs_readpage_worker+0x327/0x8a0
  cifs_readpage+0x8c/0x2a0
  generic_file_read_iter+0x692/0xd00
  cifs_strict_readv+0x29f/0x310
  generic_file_splice_read+0x11c/0x1c0
  do_splice_to+0xa5/0xc0
  splice_direct_to_actor+0xfa/0x350
  ? generic_pipe_buf_nosteal+0x10/0x10
  do_splice_direct+0xb5/0xe0
  do_sendfile+0x278/0x3a0
  SyS_sendfile64+0xc4/0xe0
  entry_SYSCALL_64_fastpath+0x1f/0xbe

Signed-off-by: Rabin Vincent <rabinv@axis.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
rkchrome pushed a commit that referenced this issue Jul 19, 2018
[ Upstream commit 2bbea6e ]

when mounting an ISO filesystem sometimes (very rarely)
the system hangs because of a race condition between two tasks.

PID: 6766   TASK: ffff88007b2a6dd0  CPU: 0   COMMAND: "mount"
 #0 [ffff880078447ae0] __schedule at ffffffff8168d605
 #1 [ffff880078447b48] schedule_preempt_disabled at ffffffff8168ed49
 #2 [ffff880078447b58] __mutex_lock_slowpath at ffffffff8168c995
 #3 [ffff880078447bb8] mutex_lock at ffffffff8168bdef
 #4 [ffff880078447bd0] sr_block_ioctl at ffffffffa00b6818 [sr_mod]
 #5 [ffff880078447c10] blkdev_ioctl at ffffffff812fea50
 #6 [ffff880078447c70] ioctl_by_bdev at ffffffff8123a8b3
 #7 [ffff880078447c90] isofs_fill_super at ffffffffa04fb1e1 [isofs]
 #8 [ffff880078447da8] mount_bdev at ffffffff81202570
 #9 [ffff880078447e18] isofs_mount at ffffffffa04f9828 [isofs]
#10 [ffff880078447e28] mount_fs at ffffffff81202d09
#11 [ffff880078447e70] vfs_kern_mount at ffffffff8121ea8f
#12 [ffff880078447ea8] do_mount at ffffffff81220fee
#13 [ffff880078447f28] sys_mount at ffffffff812218d6
#14 [ffff880078447f80] system_call_fastpath at ffffffff81698c49
    RIP: 00007fd9ea914e9a  RSP: 00007ffd5d9bf648  RFLAGS: 00010246
    RAX: 00000000000000a5  RBX: ffffffff81698c49  RCX: 0000000000000010
    RDX: 00007fd9ec2bc210  RSI: 00007fd9ec2bc290  RDI: 00007fd9ec2bcf30
    RBP: 0000000000000000   R8: 0000000000000000   R9: 0000000000000010
    R10: 00000000c0ed0001  R11: 0000000000000206  R12: 00007fd9ec2bc040
    R13: 00007fd9eb6b2380  R14: 00007fd9ec2bc210  R15: 00007fd9ec2bcf30
    ORIG_RAX: 00000000000000a5  CS: 0033  SS: 002b

This task was trying to mount the cdrom.  It allocated and configured a
super_block struct and owned the write-lock for the super_block->s_umount
rwsem. While exclusively owning the s_umount lock, it called
sr_block_ioctl and waited to acquire the global sr_mutex lock.

PID: 6785   TASK: ffff880078720fb0  CPU: 0   COMMAND: "systemd-udevd"
 #0 [ffff880078417898] __schedule at ffffffff8168d605
 #1 [ffff880078417900] schedule at ffffffff8168dc59
 #2 [ffff880078417910] rwsem_down_read_failed at ffffffff8168f605
 #3 [ffff880078417980] call_rwsem_down_read_failed at ffffffff81328838
 #4 [ffff8800784179d0] down_read at ffffffff8168cde0
 #5 [ffff8800784179e8] get_super at ffffffff81201cc7
 #6 [ffff880078417a10] __invalidate_device at ffffffff8123a8de
 #7 [ffff880078417a40] flush_disk at ffffffff8123a94b
 #8 [ffff880078417a88] check_disk_change at ffffffff8123ab50
 #9 [ffff880078417ab0] cdrom_open at ffffffffa00a29e1 [cdrom]
#10 [ffff880078417b68] sr_block_open at ffffffffa00b6f9b [sr_mod]
#11 [ffff880078417b98] __blkdev_get at ffffffff8123ba86
#12 [ffff880078417bf0] blkdev_get at ffffffff8123bd65
#13 [ffff880078417c78] blkdev_open at ffffffff8123bf9b
#14 [ffff880078417c90] do_dentry_open at ffffffff811fc7f7
#15 [ffff880078417cd8] vfs_open at ffffffff811fc9cf
#16 [ffff880078417d00] do_last at ffffffff8120d53d
#17 [ffff880078417db0] path_openat at ffffffff8120e6b2
#18 [ffff880078417e48] do_filp_open at ffffffff8121082b
#19 [ffff880078417f18] do_sys_open at ffffffff811fdd33
#20 [ffff880078417f70] sys_open at ffffffff811fde4e
#21 [ffff880078417f80] system_call_fastpath at ffffffff81698c49
    RIP: 00007f29438b0c20  RSP: 00007ffc76624b78  RFLAGS: 00010246
    RAX: 0000000000000002  RBX: ffffffff81698c49  RCX: 0000000000000000
    RDX: 00007f2944a5fa70  RSI: 00000000000a0800  RDI: 00007f2944a5fa70
    RBP: 00007f2944a5f540   R8: 0000000000000000   R9: 0000000000000020
    R10: 00007f2943614c40  R11: 0000000000000246  R12: ffffffff811fde4e
    R13: ffff880078417f78  R14: 000000000000000c  R15: 00007f2944a4b010
    ORIG_RAX: 0000000000000002  CS: 0033  SS: 002b

This task tried to open the cdrom device, the sr_block_open function
acquired the global sr_mutex lock. The call to check_disk_change()
then saw an event flag indicating a possible media change and tried
to flush any cached data for the device.
As part of the flush, it tried to acquire the super_block->s_umount
lock associated with the cdrom device.
This was the same super_block as created and locked by the previous task.

The first task acquires the s_umount lock and then the sr_mutex_lock;
the second task acquires the sr_mutex_lock and then the s_umount lock.

This patch fixes the issue by moving check_disk_change() out of
cdrom_open() and let the caller take care of it.

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
rkchrome pushed a commit that referenced this issue Sep 26, 2018
[ Upstream commit 2bbea6e ]

when mounting an ISO filesystem sometimes (very rarely)
the system hangs because of a race condition between two tasks.

PID: 6766   TASK: ffff88007b2a6dd0  CPU: 0   COMMAND: "mount"
 #0 [ffff880078447ae0] __schedule at ffffffff8168d605
 #1 [ffff880078447b48] schedule_preempt_disabled at ffffffff8168ed49
 #2 [ffff880078447b58] __mutex_lock_slowpath at ffffffff8168c995
 #3 [ffff880078447bb8] mutex_lock at ffffffff8168bdef
 #4 [ffff880078447bd0] sr_block_ioctl at ffffffffa00b6818 [sr_mod]
 #5 [ffff880078447c10] blkdev_ioctl at ffffffff812fea50
 #6 [ffff880078447c70] ioctl_by_bdev at ffffffff8123a8b3
 #7 [ffff880078447c90] isofs_fill_super at ffffffffa04fb1e1 [isofs]
 #8 [ffff880078447da8] mount_bdev at ffffffff81202570
 #9 [ffff880078447e18] isofs_mount at ffffffffa04f9828 [isofs]
#10 [ffff880078447e28] mount_fs at ffffffff81202d09
#11 [ffff880078447e70] vfs_kern_mount at ffffffff8121ea8f
#12 [ffff880078447ea8] do_mount at ffffffff81220fee
#13 [ffff880078447f28] sys_mount at ffffffff812218d6
#14 [ffff880078447f80] system_call_fastpath at ffffffff81698c49
    RIP: 00007fd9ea914e9a  RSP: 00007ffd5d9bf648  RFLAGS: 00010246
    RAX: 00000000000000a5  RBX: ffffffff81698c49  RCX: 0000000000000010
    RDX: 00007fd9ec2bc210  RSI: 00007fd9ec2bc290  RDI: 00007fd9ec2bcf30
    RBP: 0000000000000000   R8: 0000000000000000   R9: 0000000000000010
    R10: 00000000c0ed0001  R11: 0000000000000206  R12: 00007fd9ec2bc040
    R13: 00007fd9eb6b2380  R14: 00007fd9ec2bc210  R15: 00007fd9ec2bcf30
    ORIG_RAX: 00000000000000a5  CS: 0033  SS: 002b

This task was trying to mount the cdrom.  It allocated and configured a
super_block struct and owned the write-lock for the super_block->s_umount
rwsem. While exclusively owning the s_umount lock, it called
sr_block_ioctl and waited to acquire the global sr_mutex lock.

PID: 6785   TASK: ffff880078720fb0  CPU: 0   COMMAND: "systemd-udevd"
 #0 [ffff880078417898] __schedule at ffffffff8168d605
 #1 [ffff880078417900] schedule at ffffffff8168dc59
 #2 [ffff880078417910] rwsem_down_read_failed at ffffffff8168f605
 #3 [ffff880078417980] call_rwsem_down_read_failed at ffffffff81328838
 #4 [ffff8800784179d0] down_read at ffffffff8168cde0
 #5 [ffff8800784179e8] get_super at ffffffff81201cc7
 #6 [ffff880078417a10] __invalidate_device at ffffffff8123a8de
 #7 [ffff880078417a40] flush_disk at ffffffff8123a94b
 #8 [ffff880078417a88] check_disk_change at ffffffff8123ab50
 #9 [ffff880078417ab0] cdrom_open at ffffffffa00a29e1 [cdrom]
#10 [ffff880078417b68] sr_block_open at ffffffffa00b6f9b [sr_mod]
#11 [ffff880078417b98] __blkdev_get at ffffffff8123ba86
#12 [ffff880078417bf0] blkdev_get at ffffffff8123bd65
#13 [ffff880078417c78] blkdev_open at ffffffff8123bf9b
#14 [ffff880078417c90] do_dentry_open at ffffffff811fc7f7
#15 [ffff880078417cd8] vfs_open at ffffffff811fc9cf
#16 [ffff880078417d00] do_last at ffffffff8120d53d
#17 [ffff880078417db0] path_openat at ffffffff8120e6b2
#18 [ffff880078417e48] do_filp_open at ffffffff8121082b
#19 [ffff880078417f18] do_sys_open at ffffffff811fdd33
#20 [ffff880078417f70] sys_open at ffffffff811fde4e
#21 [ffff880078417f80] system_call_fastpath at ffffffff81698c49
    RIP: 00007f29438b0c20  RSP: 00007ffc76624b78  RFLAGS: 00010246
    RAX: 0000000000000002  RBX: ffffffff81698c49  RCX: 0000000000000000
    RDX: 00007f2944a5fa70  RSI: 00000000000a0800  RDI: 00007f2944a5fa70
    RBP: 00007f2944a5f540   R8: 0000000000000000   R9: 0000000000000020
    R10: 00007f2943614c40  R11: 0000000000000246  R12: ffffffff811fde4e
    R13: ffff880078417f78  R14: 000000000000000c  R15: 00007f2944a4b010
    ORIG_RAX: 0000000000000002  CS: 0033  SS: 002b

This task tried to open the cdrom device, the sr_block_open function
acquired the global sr_mutex lock. The call to check_disk_change()
then saw an event flag indicating a possible media change and tried
to flush any cached data for the device.
As part of the flush, it tried to acquire the super_block->s_umount
lock associated with the cdrom device.
This was the same super_block as created and locked by the previous task.

The first task acquires the s_umount lock and then the sr_mutex_lock;
the second task acquires the sr_mutex_lock and then the s_umount lock.

This patch fixes the issue by moving check_disk_change() out of
cdrom_open() and let the caller take care of it.

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
@Caesar-github
Copy link
Member

Obsolete issue

0lvin pushed a commit to free-z4u/roc-rk3328-cc-official that referenced this issue Apr 27, 2019
Function ib_create_qp() was failing to return an error when
rdma_rw_init_mrs() fails, causing a crash further down in ib_create_qp()
when trying to dereferece the qp pointer which was actually a negative
errno.

The crash:

crash> log|grep BUG
[  136.458121] BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
crash> bt
PID: 3736   TASK: ffff8808543215c0  CPU: 2   COMMAND: "kworker/u64:2"
 #0 [ffff88084d323340] machine_kexec at ffffffff8105fbb0
 FireflyTeam#1 [ffff88084d3233b0] __crash_kexec at ffffffff81116758
 FireflyTeam#2 [ffff88084d323480] crash_kexec at ffffffff8111682d
 FireflyTeam#3 [ffff88084d3234b0] oops_end at ffffffff81032bd6
 FireflyTeam#4 [ffff88084d3234e0] no_context at ffffffff8106e431
 FireflyTeam#5 [ffff88084d323530] __bad_area_nosemaphore at ffffffff8106e610
 FireflyTeam#6 [ffff88084d323590] bad_area_nosemaphore at ffffffff8106e6f4
 FireflyTeam#7 [ffff88084d3235a0] __do_page_fault at ffffffff8106ebdc
 FireflyTeam#8 [ffff88084d323620] do_page_fault at ffffffff8106f057
 FireflyTeam#9 [ffff88084d323660] page_fault at ffffffff816e3148
    [exception RIP: ib_create_qp+427]
    RIP: ffffffffa02554fb  RSP: ffff88084d323718  RFLAGS: 00010246
    RAX: 0000000000000004  RBX: fffffffffffffff4  RCX: 000000018020001f
    RDX: ffff880830997fc0  RSI: 0000000000000001  RDI: ffff88085f407200
    RBP: ffff88084d323778   R8: 0000000000000001   R9: ffffea0020bae210
    R10: ffffea0020bae218  R11: 0000000000000001  R12: ffff88084d3237c8
    R13: 00000000fffffff4  R14: ffff880859fa5000  R15: ffff88082eb89800
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
FireflyTeam#10 [ffff88084d323780] rdma_create_qp at ffffffffa0782681 [rdma_cm]
FireflyTeam#11 [ffff88084d3237b0] nvmet_rdma_create_queue_ib at ffffffffa07c43f3 [nvmet_rdma]
FireflyTeam#12 [ffff88084d323860] nvmet_rdma_alloc_queue at ffffffffa07c5ba9 [nvmet_rdma]
FireflyTeam#13 [ffff88084d323900] nvmet_rdma_queue_connect at ffffffffa07c5c96 [nvmet_rdma]
FireflyTeam#14 [ffff88084d323980] nvmet_rdma_cm_handler at ffffffffa07c6450 [nvmet_rdma]
FireflyTeam#15 [ffff88084d3239b0] iw_conn_req_handler at ffffffffa0787480 [rdma_cm]
FireflyTeam#16 [ffff88084d323a60] cm_conn_req_handler at ffffffffa0775f06 [iw_cm]
rockchip-linux#17 [ffff88084d323ab0] process_event at ffffffffa0776019 [iw_cm]
rockchip-linux#18 [ffff88084d323af0] cm_work_handler at ffffffffa0776170 [iw_cm]
rockchip-linux#19 [ffff88084d323cb0] process_one_work at ffffffff810a1483
rockchip-linux#20 [ffff88084d323d90] worker_thread at ffffffff810a211d
rockchip-linux#21 [ffff88084d323ec0] kthread at ffffffff810a6c5c
rockchip-linux#22 [ffff88084d323f50] ret_from_fork at ffffffff816e1ebf

Fixes: 632bc3f ("IB/core, RDMA RW API: Do not exceed QP SGE send limit")
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Cc: stable@vger.kernel.org
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
0lvin pushed a commit to free-z4u/roc-rk3328-cc-official that referenced this issue Jun 2, 2019
There have been several reports over the years of NULL pointer
dereferences in xfs_trans_log_inode during xfs_fsr processes,
when the process is doing an fput and tearing down extents
on the temporary inode, something like:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
PID: 29439  TASK: ffff880550584fa0  CPU: 6   COMMAND: "xfs_fsr"
    [exception RIP: xfs_trans_log_inode+0x10]
 FireflyTeam#9 [ffff8800a57bbbe0] xfs_bunmapi at ffffffffa037398e [xfs]
FireflyTeam#10 [ffff8800a57bbce8] xfs_itruncate_extents at ffffffffa0391b29 [xfs]
FireflyTeam#11 [ffff8800a57bbd88] xfs_inactive_truncate at ffffffffa0391d0c [xfs]
FireflyTeam#12 [ffff8800a57bbdb8] xfs_inactive at ffffffffa0392508 [xfs]
FireflyTeam#13 [ffff8800a57bbdd8] xfs_fs_evict_inode at ffffffffa035907e [xfs]
FireflyTeam#14 [ffff8800a57bbe00] evict at ffffffff811e1b67
FireflyTeam#15 [ffff8800a57bbe28] iput at ffffffff811e23a5
FireflyTeam#16 [ffff8800a57bbe58] dentry_kill at ffffffff811dcfc8
rockchip-linux#17 [ffff8800a57bbe88] dput at ffffffff811dd06c
rockchip-linux#18 [ffff8800a57bbea8] __fput at ffffffff811c823b
rockchip-linux#19 [ffff8800a57bbef0] ____fput at ffffffff811c846e
rockchip-linux#20 [ffff8800a57bbf00] task_work_run at ffffffff81093b27
rockchip-linux#21 [ffff8800a57bbf30] do_notify_resume at ffffffff81013b0c
rockchip-linux#22 [ffff8800a57bbf50] int_signal at ffffffff8161405d

As it turns out, this is because the i_itemp pointer, along
with the d_ops pointer, has been overwritten with zeros
when we tear down the extents during truncate.  When the in-core
inode fork on the temporary inode used by xfs_fsr was originally
set up during the extent swap, we mistakenly looked at di_nextents
to determine whether all extents fit inline, but this misses extents
generated by speculative preallocation; we should be using if_bytes
instead.

This mistake corrupts the in-memory inode, and code in
xfs_iext_remove_inline eventually gets bad inputs, causing
it to memmove and memset incorrect ranges; this became apparent
because the two values in ifp->if_u2.if_inline_ext[1] contained
what should have been in d_ops and i_itemp; they were memmoved due
to incorrect array indexing and then the original locations
were zeroed with memset, again due to an array overrun.

Fix this by properly using i_df.if_bytes to determine the number
of extents, not di_nextents.

Thanks to dchinner for looking at this with me and spotting the
root cause.

Cc: stable@vger.kernel.org
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
0lvin pushed a commit to free-z4u/roc-rk3328-cc-official that referenced this issue Jul 17, 2019
Remove spinlock and use the "rtc->ops_lock" from RTC subsystem instead.
spin_lock_irqsave() is not needed here because we do not have hard IRQs.

This patch fixes the following issue:

root@GE004097290448 b850v3:~# hwclock --systohc
root@GE004097290448 b850v3:~# hwclock --systohc
root@GE004097290448 b850v3:~# hwclock --systohc
root@GE004097290448 b850v3:~# hwclock --systohc
root@GE004097290448 b850v3:~# hwclock --systohc
[   82.108175] BUG: spinlock wrong CPU on CPU#0, hwclock/855
[   82.113660]  lock: 0xedb4899c, .magic: dead4ead, .owner: hwclock/855, .owner_cpu: 1
[   82.121329] CPU: 0 PID: 855 Comm: hwclock Not tainted 4.8.0-00042-g09d5410-dirty rockchip-linux#20
[   82.129078] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[   82.135609] Backtrace:
[   82.138090] [<8010d378>] (dump_backtrace) from [<8010d5c0>] (show_stack+0x20/0x24)
[   82.145664]  r7:ec936000 r6:600a0013 r5:00000000 r4:81031680
[   82.151402] [<8010d5a0>] (show_stack) from [<80401518>] (dump_stack+0xb4/0xe8)
[   82.158636] [<80401464>] (dump_stack) from [<8017b8b0>] (spin_dump+0x84/0xcc)
[   82.165775]  r10:00000000 r9:ec936000 r8:81056090 r7:600a0013 r6:edb4899c r5:edb4899c
[   82.173691]  r4:e5033e00 r3:00000000
[   82.177308] [<8017b82c>] (spin_dump) from [<8017bcb0>] (do_raw_spin_unlock+0x108/0x130)
[   82.185314]  r5:edb4899c r4:edb4899c
[   82.188938] [<8017bba8>] (do_raw_spin_unlock) from [<8094b93c>] (_raw_spin_unlock_irqrestore+0x34/0x54)
[   82.198333]  r5:edb4899c r4:600a0013
[   82.201953] [<8094b908>] (_raw_spin_unlock_irqrestore) from [<8065b090>] (rx8010_set_time+0x14c/0x188)
[   82.211261]  r5:00000020 r4:edb48990
[   82.214882] [<8065af44>] (rx8010_set_time) from [<80653fe4>] (rtc_set_time+0x70/0x104)
[   82.222801]  r7:00000051 r6:edb39da0 r5:edb39c00 r4:ec937e8c
[   82.228535] [<80653f74>] (rtc_set_time) from [<80655774>] (rtc_dev_ioctl+0x3c4/0x674)
[   82.236368]  r7:00000051 r6:7ecf1b74 r5:00000000 r4:edb39c00
[   82.242106] [<806553b0>] (rtc_dev_ioctl) from [<80284034>] (do_vfs_ioctl+0xa4/0xa6c)
[   82.249851]  r8:00000003 r7:80284a40 r6:ed1e9c80 r5:edb44e60 r4:7ecf1b74
[   82.256642] [<80283f90>] (do_vfs_ioctl) from [<80284a40>] (SyS_ioctl+0x44/0x6c)
[   82.263953]  r10:00000000 r9:ec936000 r8:7ecf1b74 r7:4024700a r6:ed1e9c80 r5:00000003
[   82.271869]  r4:ed1e9c80
[   82.274432] [<802849fc>] (SyS_ioctl) from [<80108520>] (ret_fast_syscall+0x0/0x1c)
[   82.282005]  r9:ec936000 r8:801086c4 r7:00000036 r6:00000000 r5:00000003 r4:0008e1bc
root@GE004097290448 b850v3:~#
Message from syslogd@GE004097290448 at Dec  3 11:17:08 ...
 kernel:[   82.108175] BUG: spinlock wrong CPU on CPU#0, hwclock/855

Message from syslogd@GE004097290448 at Dec  3 11:17:08 ...
 kernel:[   82.113660]  lock: 0xedb4899c, .magic: dead4ead, .owner: hwclock/855, .owner_cpu: 1
hwclock --systohc
root@GE004097290448 b850v3:~#

Signed-off-by: Fabien Lahoudere <fabien.lahoudere@collabora.co.uk>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
0lvin pushed a commit to free-z4u/roc-rk3328-cc-official that referenced this issue Jul 18, 2019
As Eric Dumazet pointed out this also needs to be fixed in IPv6.
v2: Contains the IPv6 tcp/Ipv6 dccp patches as well.

We have seen a few incidents lately where a dst_enty has been freed
with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
dst_entry. If the conditions/timings are right a crash then ensues when the
freed dst_entry is referenced later on. A Common crashing back trace is:

 FireflyTeam#8 [] page_fault at ffffffff8163e648
    [exception RIP: __tcp_ack_snd_check+74]
.
.
 FireflyTeam#9 [] tcp_rcv_established at ffffffff81580b64
FireflyTeam#10 [] tcp_v4_do_rcv at ffffffff8158b54a
FireflyTeam#11 [] tcp_v4_rcv at ffffffff8158cd02
FireflyTeam#12 [] ip_local_deliver_finish at ffffffff815668f4
FireflyTeam#13 [] ip_local_deliver at ffffffff81566bd9
FireflyTeam#14 [] ip_rcv_finish at ffffffff8156656d
FireflyTeam#15 [] ip_rcv at ffffffff81566f06
FireflyTeam#16 [] __netif_receive_skb_core at ffffffff8152b3a2
rockchip-linux#17 [] __netif_receive_skb at ffffffff8152b608
rockchip-linux#18 [] netif_receive_skb at ffffffff8152b690
rockchip-linux#19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
rockchip-linux#20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
rockchip-linux#21 [] net_rx_action at ffffffff8152bac2
rockchip-linux#22 [] __do_softirq at ffffffff81084b4f
rockchip-linux#23 [] call_softirq at ffffffff8164845c
rockchip-linux#24 [] do_softirq at ffffffff81016fc5
rockchip-linux#25 [] irq_exit at ffffffff81084ee5
rockchip-linux#26 [] do_IRQ at ffffffff81648ff8

Of course it may happen with other NIC drivers as well.

It's found the freed dst_entry here:

 224 static bool tcp_in_quickack_mode(struct sock *sk)↩
 225 {↩
 226 ▹       const struct inet_connection_sock *icsk = inet_csk(sk);↩
 227 ▹       const struct dst_entry *dst = __sk_dst_get(sk);↩
 228 ↩
 229 ▹       return (dst && dst_metric(dst, RTAX_QUICKACK)) ||↩
 230 ▹       ▹       (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);↩
 231 }↩

But there are other backtraces attributed to the same freed dst_entry in
netfilter code as well.

All the vmcores showed 2 significant clues:

- Remote hosts behind the default gateway had always been redirected to a
different gateway. A rtable/dst_entry will be added for that host. Making
more dst_entrys with lower reference counts. Making this more probable.

- All vmcores showed a postitive LockDroppedIcmps value, e.g:

LockDroppedIcmps                  267

A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
regardless of whether user space has the socket locked. This can result in a
race condition where the same dst_entry cached in sk->sk_dst_entry can be
decremented twice for the same socket via:

do_redirect()->__sk_dst_check()-> dst_release().

Which leads to the dst_entry being prematurely freed with another socket
pointing to it via sk->sk_dst_cache and a subsequent crash.

To fix this skip do_redirect() if usespace has the socket locked. Instead let
the redirect take place later when user space does not have the socket
locked.

The dccp/IPv6 code is very similar in this respect, so fixing it there too.

As Eric Garver pointed out the following commit now invalidates routes. Which
can set the dst->obsolete flag so that ipv4_dst_check() returns null and
triggers the dst_release().

Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.")
Cc: Eric Garver <egarver@redhat.com>
Cc: Hannes Sowa <hsowa@redhat.com>
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
0lvin pushed a commit to free-z4u/roc-rk3328-cc-official that referenced this issue Jul 19, 2019
Commit 57e5568 ("sata_via: Implement hotplug for VT6421") adds
hotplug IRQ handler for VT6421 but enables hotplug on all chips. This
is a bug because it causes "irq xx: nobody cared" error on VT6420 when
hot-(un)plugging a drive:

[  381.839948] irq 20: nobody cared (try booting with the "irqpoll" option)
[  381.840014] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.0-rc5+ rockchip-linux#148
[  381.840066] Hardware name:          P4VM800/P4VM800, BIOS P1.60 05/29/2006
[  381.840117] Call Trace:
[  381.840167]  <IRQ>
[  381.840225]  ? dump_stack+0x44/0x58
[  381.840278]  ? __report_bad_irq+0x14/0x97
[  381.840327]  ? handle_edge_irq+0xa5/0xa5
[  381.840376]  ? note_interrupt+0x155/0x1cf
[  381.840426]  ? handle_edge_irq+0xa5/0xa5
[  381.840474]  ? handle_irq_event_percpu+0x32/0x38
[  381.840524]  ? handle_irq_event+0x1f/0x38
[  381.840573]  ? handle_fasteoi_irq+0x69/0xb8
[  381.840625]  ? handle_irq+0x4f/0x5d
[  381.840672]  </IRQ>
[  381.840726]  ? do_IRQ+0x2e/0x8b
[  381.840782]  ? common_interrupt+0x2c/0x34
[  381.840836]  ? mwait_idle+0x60/0x82
[  381.840892]  ? arch_cpu_idle+0x6/0x7
[  381.840949]  ? do_idle+0x96/0x18e
[  381.841002]  ? cpu_startup_entry+0x16/0x1a
[  381.841057]  ? start_kernel+0x319/0x31c
[  381.841111]  ? startup_32_smp+0x166/0x168
[  381.841165] handlers:
[  381.841219] [<c12a7263>] ata_bmdma_interrupt
[  381.841274] Disabling IRQ rockchip-linux#20

Seems that VT6420 can do hotplug too (there's no documentation) but the
comments say that SCR register access (required for detecting hotplug
events) can cause problems on these chips.

For now, just keep hotplug disabled on anything other than VT6421.

Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
0lvin pushed a commit to free-z4u/roc-rk3328-cc-official that referenced this issue Jul 28, 2019
cifs_relock_file() can perform a down_write() on the inode's lock_sem even
though it was already performed in cifs_strict_readv().  Lockdep complains
about this.  AFAICS, there is no problem here, and lockdep just needs to be
told that this nesting is OK.

 =============================================
 [ INFO: possible recursive locking detected ]
 4.11.0+ rockchip-linux#20 Not tainted
 ---------------------------------------------
 cat/701 is trying to acquire lock:
  (&cifsi->lock_sem){++++.+}, at: cifs_reopen_file+0x7a7/0xc00

 but task is already holding lock:
  (&cifsi->lock_sem){++++.+}, at: cifs_strict_readv+0x177/0x310

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&cifsi->lock_sem);
   lock(&cifsi->lock_sem);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 1 lock held by cat/701:
  #0:  (&cifsi->lock_sem){++++.+}, at: cifs_strict_readv+0x177/0x310

 stack backtrace:
 CPU: 0 PID: 701 Comm: cat Not tainted 4.11.0+ rockchip-linux#20
 Call Trace:
  dump_stack+0x85/0xc2
  __lock_acquire+0x17dd/0x2260
  ? trace_hardirqs_on_thunk+0x1a/0x1c
  ? preempt_schedule_irq+0x6b/0x80
  lock_acquire+0xcc/0x260
  ? lock_acquire+0xcc/0x260
  ? cifs_reopen_file+0x7a7/0xc00
  down_read+0x2d/0x70
  ? cifs_reopen_file+0x7a7/0xc00
  cifs_reopen_file+0x7a7/0xc00
  ? printk+0x43/0x4b
  cifs_readpage_worker+0x327/0x8a0
  cifs_readpage+0x8c/0x2a0
  generic_file_read_iter+0x692/0xd00
  cifs_strict_readv+0x29f/0x310
  generic_file_splice_read+0x11c/0x1c0
  do_splice_to+0xa5/0xc0
  splice_direct_to_actor+0xfa/0x350
  ? generic_pipe_buf_nosteal+0x10/0x10
  do_splice_direct+0xb5/0xe0
  do_sendfile+0x278/0x3a0
  SyS_sendfile64+0xc4/0xe0
  entry_SYSCALL_64_fastpath+0x1f/0xbe

Signed-off-by: Rabin Vincent <rabinv@axis.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
0lvin pushed a commit to free-z4u/roc-rk3328-cc-official that referenced this issue Jul 29, 2019
This prevents a deadlock that somehow results from the suspend() ->
forbid() -> resume() callchain.

[  125.266960] [drm] Initialized nouveau 1.3.1 20120801 for 0000:02:00.0 on minor 1
[  370.120872] INFO: task kworker/4:1:77 blocked for more than 120 seconds.
[  370.120920]       Tainted: G           O    4.12.0-rc3 rockchip-linux#20
[  370.120947] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  370.120982] kworker/4:1     D13808    77      2 0x00000000
[  370.120998] Workqueue: pm pm_runtime_work
[  370.121004] Call Trace:
[  370.121018]  __schedule+0x2bf/0xb40
[  370.121025]  ? mark_held_locks+0x5f/0x90
[  370.121038]  schedule+0x3d/0x90
[  370.121044]  rpm_resume+0x107/0x870
[  370.121052]  ? finish_wait+0x90/0x90
[  370.121065]  ? pci_pm_runtime_resume+0xa0/0xa0
[  370.121070]  pm_runtime_forbid+0x4c/0x60
[  370.121129]  nouveau_pmops_runtime_suspend+0xaf/0xc0 [nouveau]
[  370.121139]  pci_pm_runtime_suspend+0x5f/0x170
[  370.121147]  ? pci_pm_runtime_resume+0xa0/0xa0
[  370.121152]  __rpm_callback+0xb9/0x1e0
[  370.121159]  ? pci_pm_runtime_resume+0xa0/0xa0
[  370.121166]  rpm_callback+0x24/0x80
[  370.121171]  ? pci_pm_runtime_resume+0xa0/0xa0
[  370.121176]  rpm_suspend+0x138/0x6e0
[  370.121192]  pm_runtime_work+0x7b/0xc0
[  370.121199]  process_one_work+0x253/0x6a0
[  370.121216]  worker_thread+0x4d/0x3b0
[  370.121229]  kthread+0x133/0x150
[  370.121234]  ? process_one_work+0x6a0/0x6a0
[  370.121238]  ? kthread_create_on_node+0x70/0x70
[  370.121246]  ret_from_fork+0x2a/0x40
[  370.121283]
               Showing all locks held in the system:
[  370.121291] 2 locks held by kworker/4:1/77:
[  370.121298]  #0:  ("pm"){.+.+.+}, at: [<ffffffffac0d3530>] process_one_work+0x1d0/0x6a0
[  370.121315]  FireflyTeam#1:  ((&dev->power.work)){+.+.+.}, at: [<ffffffffac0d3530>] process_one_work+0x1d0/0x6a0
[  370.121330] 1 lock held by khungtaskd/81:
[  370.121333]  #0:  (tasklist_lock){.+.+..}, at: [<ffffffffac10fc8d>] debug_show_all_locks+0x3d/0x1a0
[  370.121355] 1 lock held by dmesg/1639:
[  370.121358]  #0:  (&user->lock){+.+.+.}, at: [<ffffffffac124b6d>] devkmsg_read+0x4d/0x360

[  370.121377] =============================================

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
0lvin pushed a commit to free-z4u/roc-rk3328-cc-official that referenced this issue Aug 4, 2019
This patch fixes a generate_node_acls = 1 + cache_dynamic_acls = 0
regression, that was introduced by

  commit 01d4d67
  Author: Nicholas Bellinger <nab@linux-iscsi.org>
  Date:   Wed Dec 7 12:55:54 2016 -0800

which originally had the proper list_del_init() usage, but was
dropped during list review as it was thought unnecessary by HCH.

However, list_del_init() usage is required during the special
generate_node_acls = 1 + cache_dynamic_acls = 0 case when
transport_free_session() does a list_del(&se_nacl->acl_list),
followed by target_complete_nacl() doing the same thing.

This was manifesting as a general protection fault as reported
by Justin:

kernel: general protection fault: 0000 [FireflyTeam#1] SMP
kernel: Modules linked in:
kernel: CPU: 0 PID: 11047 Comm: iscsi_ttx Not tainted 4.13.0-rc2.x86_64.1+ rockchip-linux#20
kernel: Hardware name: Intel Corporation S5500BC/S5500BC, BIOS S5500.86B.01.00.0064.050520141428 05/05/2014
kernel: task: ffff88026939e800 task.stack: ffffc90007884000
kernel: RIP: 0010:target_put_nacl+0x49/0xb0
kernel: RSP: 0018:ffffc90007887d70 EFLAGS: 00010246
kernel: RAX: dead000000000200 RBX: ffff8802556ca000 RCX: 0000000000000000
kernel: RDX: dead000000000100 RSI: 0000000000000246 RDI: ffff8802556ce028
kernel: RBP: ffffc90007887d88 R08: 0000000000000001 R09: 0000000000000000
kernel: R10: ffffc90007887df8 R11: ffffea0009986900 R12: ffff8802556ce020
kernel: R13: ffff8802556ce028 R14: ffff8802556ce028 R15: ffffffff88d85540
kernel: FS:  0000000000000000(0000) GS:ffff88027fc00000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 00007fffe36f5f94 CR3: 0000000009209000 CR4: 00000000003406f0
kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
kernel: Call Trace:
kernel:  transport_free_session+0x67/0x140
kernel:  transport_deregister_session+0x7a/0xc0
kernel:  iscsit_close_session+0x92/0x210
kernel:  iscsit_close_connection+0x5f9/0x840
kernel:  iscsit_take_action_for_connection_exit+0xfe/0x110
kernel:  iscsi_target_tx_thread+0x140/0x1e0
kernel:  ? wait_woken+0x90/0x90
kernel:  kthread+0x124/0x160
kernel:  ? iscsit_thread_get_cpumask+0x90/0x90
kernel:  ? kthread_create_on_node+0x40/0x40
kernel:  ret_from_fork+0x22/0x30
kernel: Code: 00 48 89 fb 4c 8b a7 48 01 00 00 74 68 4d 8d 6c 24 08 4c
89 ef e8 e8 28 43 00 48 8b 93 20 04 00 00 48 8b 83 28 04 00 00 4c 89
ef <48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 83 20
kernel: RIP: target_put_nacl+0x49/0xb0 RSP: ffffc90007887d70
kernel: ---[ end trace f12821adbfd46fed ]---

To address this, go ahead and use proper list_del_list() for all
cases of se_nacl->acl_list deletion.

Reported-by: Justin Maggard <jmaggard01@gmail.com>
Tested-by: Justin Maggard <jmaggard01@gmail.com>
Cc: Justin Maggard <jmaggard01@gmail.com>
Cc: stable@vger.kernel.org # 4.1+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
0lvin pushed a commit to free-z4u/roc-rk3328-cc-official that referenced this issue Aug 4, 2019
syzkaller reported a refcount_t warning [1]

Issue here is that noop_qdisc refcnt was never really considered as
a true refcount, since qdisc_destroy() found TCQ_F_BUILTIN set :

if (qdisc->flags & TCQ_F_BUILTIN ||
    !refcount_dec_and_test(&qdisc->refcnt)))
	return;

Meaning that all atomic_inc() we did on noop_qdisc.refcnt were not
really needed, but harmless until refcount_t came.

To fix this problem, we simply need to not increment noop_qdisc.refcnt,
since we never decrement it.

[1]
refcount_t: increment on 0; use-after-free.
------------[ cut here ]------------
WARNING: CPU: 0 PID: 21754 at lib/refcount.c:152 refcount_inc+0x47/0x50 lib/refcount.c:152
Kernel panic - not syncing: panic_on_warn set ...

CPU: 0 PID: 21754 Comm: syz-executor7 Not tainted 4.13.0-rc6+ rockchip-linux#20
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:16 [inline]
 dump_stack+0x194/0x257 lib/dump_stack.c:52
 panic+0x1e4/0x417 kernel/panic.c:180
 __warn+0x1c4/0x1d9 kernel/panic.c:541
 report_bug+0x211/0x2d0 lib/bug.c:183
 fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:190
 do_trap_no_signal arch/x86/kernel/traps.c:224 [inline]
 do_trap+0x260/0x390 arch/x86/kernel/traps.c:273
 do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:310
 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:323
 invalid_op+0x1e/0x30 arch/x86/entry/entry_64.S:846
RIP: 0010:refcount_inc+0x47/0x50 lib/refcount.c:152
RSP: 0018:ffff8801c43477a0 EFLAGS: 00010282
RAX: 000000000000002b RBX: ffffffff86093c14 RCX: 0000000000000000
RDX: 000000000000002b RSI: ffffffff8159314e RDI: ffffed0038868ee8
RBP: ffff8801c43477a8 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff86093ac0
R13: 0000000000000001 R14: ffff8801d0f3bac0 R15: dffffc0000000000
 attach_default_qdiscs net/sched/sch_generic.c:792 [inline]
 dev_activate+0x7d3/0xaa0 net/sched/sch_generic.c:833
 __dev_open+0x227/0x330 net/core/dev.c:1380
 __dev_change_flags+0x695/0x990 net/core/dev.c:6726
 dev_change_flags+0x88/0x140 net/core/dev.c:6792
 dev_ifsioc+0x5a6/0x930 net/core/dev_ioctl.c:256
 dev_ioctl+0x2bc/0xf90 net/core/dev_ioctl.c:554
 sock_do_ioctl+0x94/0xb0 net/socket.c:968
 sock_ioctl+0x2c2/0x440 net/socket.c:1058
 vfs_ioctl fs/ioctl.c:45 [inline]
 do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685
 SYSC_ioctl fs/ioctl.c:700 [inline]
 SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691

Fixes: 7b93640 ("net, sched: convert Qdisc.refcnt from atomic_t to refcount_t")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Reshetova, Elena <elena.reshetova@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
0lvin pushed a commit to free-z4u/roc-rk3328-cc-official that referenced this issue Aug 11, 2019
According to the kerneldoc[0], should do fbdev setup before calling
drm_kms_helper_poll_init(), otherwise, Kms hotplug event may race
into fbdev helper initial, and fb_helper->dev may be NULL pointer,
that would cause the bug:
[    0.735411] [00000200] *pgd=00000000f6ffe003, *pud=00000000f6ffe003, *pmd=0000000000000000
[    0.736156] Internal error: Oops: 96000005 [FireflyTeam#1] PREEMPT SMP
[    0.736648] Modules linked in:
[    0.736930] CPU: 2 PID: 20 Comm: kworker/2:0 Not tainted 4.4.41 rockchip-linux#20
[    0.737480] Hardware name: Rockchip RK3399 Board rev2 (BOX) (DT)
[    0.738020] Workqueue: events cdn_dp_pd_event_work
[    0.738447] task: ffffffc0f21f3100 ti: ffffffc0f2218000 task.ti: ffffffc0f2218000
[    0.739109] PC is at mutex_lock+0x14/0x44
[    0.739469] LR is at drm_fb_helper_hotplug_event+0x30/0x114
[    0.756253] [<ffffff8008a344f4>] mutex_lock+0x14/0x44
[    0.756260] [<ffffff8008445708>] drm_fb_helper_hotplug_event+0x30/0x114
[    0.756271] [<ffffff8008473c84>] rockchip_drm_output_poll_changed+0x18/0x20
[    0.756280] [<ffffff8008439fcc>] drm_kms_helper_hotplug_event+0x28/0x34
[    0.756286] [<ffffff800846c444>] cdn_dp_pd_event_work+0x394/0x3c4
[    0.756295] [<ffffff80080b2b38>] process_one_work+0x218/0x3e0
[    0.756302] [<ffffff80080b3538>] worker_thread+0x2e8/0x404
[    0.756308] [<ffffff80080b7e70>] kthread+0xe8/0xf0
[    0.756316] [<ffffff8008082690>] ret_from_fork+0x10/0x40

[0]: https://01.org/linuxgraphics/gfx-docs/drm/gpu/drm-kms-helpers.html

Signed-off-by: Mark Yao <mark.yao@rock-chips.com>
Reviewed-by: Sandy huang <sandy.huang@rock-chips.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1501575103-20136-1-git-send-email-mark.yao@rock-chips.com
0lvin pushed a commit to free-z4u/roc-rk3328-cc-official that referenced this issue Aug 19, 2019
…-text symbols"

This reverts commit 83e840c ("powerpc64/elfv1: Only dereference
function descriptor for non-text symbols").

Chandan reported that on newer kernels, trying to enable function_graph
tracer on ppc64 (BE) locks up the system with the following trace:

  Unable to handle kernel paging request for data at address 0x600000002fa30010
  Faulting instruction address: 0xc0000000001f1300
  Thread overran stack, or stack corrupted
  Oops: Kernel access of bad area, sig: 11 [FireflyTeam#1]
  BE SMP NR_CPUS=2048 DEBUG_PAGEALLOC NUMA pSeries
  Modules linked in:
  CPU: 1 PID: 6586 Comm: bash Not tainted 4.14.0-rc3-00162-g6e51f1f-dirty rockchip-linux#20
  task: c000000625c07200 task.stack: c000000625c07310
  NIP:  c0000000001f1300 LR: c000000000121cac CTR: c000000000061af8
  REGS: c000000625c088c0 TRAP: 0380   Not tainted  (4.14.0-rc3-00162-g6e51f1f-dirty)
  MSR:  8000000000001032 <SF,ME,IR,DR,RI>  CR: 28002848  XER: 00000000
  CFAR: c0000000001f1320 SOFTE: 0
  ...
  NIP [c0000000001f1300] .__is_insn_slot_addr+0x30/0x90
  LR [c000000000121cac] .kernel_text_address+0x18c/0x1c0
  Call Trace:
  [c000000625c08b40] [c0000000001bd040] .is_module_text_address+0x20/0x40 (unreliable)
  [c000000625c08bc0] [c000000000121cac] .kernel_text_address+0x18c/0x1c0
  [c000000625c08c50] [c000000000061960] .prepare_ftrace_return+0x50/0x130
  [c000000625c08cf0] [c000000000061b10] .ftrace_graph_caller+0x14/0x34
  [c000000625c08d60] [c000000000121b40] .kernel_text_address+0x20/0x1c0
  [c000000625c08df0] [c000000000061960] .prepare_ftrace_return+0x50/0x130
  ...
  [c000000625c0ab30] [c000000000061960] .prepare_ftrace_return+0x50/0x130
  [c000000625c0abd0] [c000000000061b10] .ftrace_graph_caller+0x14/0x34
  [c000000625c0ac40] [c000000000121b40] .kernel_text_address+0x20/0x1c0
  [c000000625c0acd0] [c000000000061960] .prepare_ftrace_return+0x50/0x130
  [c000000625c0ad70] [c000000000061b10] .ftrace_graph_caller+0x14/0x34
  [c000000625c0ade0] [c000000000121b40] .kernel_text_address+0x20/0x1c0

This is because ftrace is using ppc_function_entry() for obtaining the
address of return_to_handler() in prepare_ftrace_return(). The call to
kernel_text_address() itself gets traced and we end up in a recursive
loop.

Fixes: 83e840c ("powerpc64/elfv1: Only dereference function descriptor for non-text symbols")
Cc: stable@vger.kernel.org # v4.13+
Reported-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
0lvin pushed a commit to free-z4u/roc-rk3328-cc-official that referenced this issue Aug 19, 2019
When a GSO skb of truesize O is segmented into 2 new skbs of truesize N1
and N2, we want to transfer socket ownership to the new fresh skbs.

In order to avoid expensive atomic operations on a cache line subject to
cache bouncing, we replace the sequence :

refcount_add(N1, &sk->sk_wmem_alloc);
refcount_add(N2, &sk->sk_wmem_alloc); // repeated by number of segments

refcount_sub(O, &sk->sk_wmem_alloc);

by a single

refcount_add(sum_of(N) - O, &sk->sk_wmem_alloc);

Problem is :

In some pathological cases, sum(N) - O might be a negative number, and
syzkaller bot was apparently able to trigger this trace [1]

atomic_t was ok with this construct, but we need to take care of the
negative delta with refcount_t

[1]
refcount_t: saturated; leaking memory.
------------[ cut here ]------------
WARNING: CPU: 0 PID: 8404 at lib/refcount.c:77 refcount_add_not_zero+0x198/0x200 lib/refcount.c:77
Kernel panic - not syncing: panic_on_warn set ...

CPU: 0 PID: 8404 Comm: syz-executor2 Not tainted 4.14.0-rc5-mm1+ rockchip-linux#20
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:16 [inline]
 dump_stack+0x194/0x257 lib/dump_stack.c:52
 panic+0x1e4/0x41c kernel/panic.c:183
 __warn+0x1c4/0x1e0 kernel/panic.c:546
 report_bug+0x211/0x2d0 lib/bug.c:183
 fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:177
 do_trap_no_signal arch/x86/kernel/traps.c:211 [inline]
 do_trap+0x260/0x390 arch/x86/kernel/traps.c:260
 do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:297
 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:310
 invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905
RIP: 0010:refcount_add_not_zero+0x198/0x200 lib/refcount.c:77
RSP: 0018:ffff8801c606e3a0 EFLAGS: 00010282
RAX: 0000000000000026 RBX: 0000000000001401 RCX: 0000000000000000
RDX: 0000000000000026 RSI: ffffc900036fc000 RDI: ffffed0038c0dc68
RBP: ffff8801c606e430 R08: 0000000000000001 R09: 0000000000000000
R10: ffff8801d97f5eba R11: 0000000000000000 R12: ffff8801d5acf73c
R13: 1ffff10038c0dc75 R14: 00000000ffffffff R15: 00000000fffff72f
 refcount_add+0x1b/0x60 lib/refcount.c:101
 tcp_gso_segment+0x10d0/0x16b0 net/ipv4/tcp_offload.c:155
 tcp4_gso_segment+0xd4/0x310 net/ipv4/tcp_offload.c:51
 inet_gso_segment+0x60c/0x11c0 net/ipv4/af_inet.c:1271
 skb_mac_gso_segment+0x33f/0x660 net/core/dev.c:2749
 __skb_gso_segment+0x35f/0x7f0 net/core/dev.c:2821
 skb_gso_segment include/linux/netdevice.h:3971 [inline]
 validate_xmit_skb+0x4ba/0xb20 net/core/dev.c:3074
 __dev_queue_xmit+0xe49/0x2070 net/core/dev.c:3497
 dev_queue_xmit+0x17/0x20 net/core/dev.c:3538
 neigh_hh_output include/net/neighbour.h:471 [inline]
 neigh_output include/net/neighbour.h:479 [inline]
 ip_finish_output2+0xece/0x1460 net/ipv4/ip_output.c:229
 ip_finish_output+0x85e/0xd10 net/ipv4/ip_output.c:317
 NF_HOOK_COND include/linux/netfilter.h:238 [inline]
 ip_output+0x1cc/0x860 net/ipv4/ip_output.c:405
 dst_output include/net/dst.h:459 [inline]
 ip_local_out+0x95/0x160 net/ipv4/ip_output.c:124
 ip_queue_xmit+0x8c6/0x18e0 net/ipv4/ip_output.c:504
 tcp_transmit_skb+0x1ab7/0x3840 net/ipv4/tcp_output.c:1137
 tcp_write_xmit+0x663/0x4de0 net/ipv4/tcp_output.c:2341
 __tcp_push_pending_frames+0xa0/0x250 net/ipv4/tcp_output.c:2513
 tcp_push_pending_frames include/net/tcp.h:1722 [inline]
 tcp_data_snd_check net/ipv4/tcp_input.c:5050 [inline]
 tcp_rcv_established+0x8c7/0x18a0 net/ipv4/tcp_input.c:5497
 tcp_v4_do_rcv+0x2ab/0x7d0 net/ipv4/tcp_ipv4.c:1460
 sk_backlog_rcv include/net/sock.h:909 [inline]
 __release_sock+0x124/0x360 net/core/sock.c:2264
 release_sock+0xa4/0x2a0 net/core/sock.c:2776
 tcp_sendmsg+0x3a/0x50 net/ipv4/tcp.c:1462
 inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:763
 sock_sendmsg_nosec net/socket.c:632 [inline]
 sock_sendmsg+0xca/0x110 net/socket.c:642
 ___sys_sendmsg+0x31c/0x890 net/socket.c:2048
 __sys_sendmmsg+0x1e6/0x5f0 net/socket.c:2138

Fixes: 14afee4 ("net: convert sock.sk_wmem_alloc from atomic_t to refcount_t")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
rkchrome pushed a commit that referenced this issue Aug 23, 2019
commit 46cc0b4 upstream.

Current snapshot implementation swaps two ring_buffers even though their
sizes are different from each other, that can cause an inconsistency
between the contents of buffer_size_kb file and the current buffer size.

For example:

  # cat buffer_size_kb
  7 (expanded: 1408)
  # echo 1 > events/enable
  # grep bytes per_cpu/cpu0/stats
  bytes: 1441020
  # echo 1 > snapshot             // current:1408, spare:1408
  # echo 123 > buffer_size_kb     // current:123,  spare:1408
  # echo 1 > snapshot             // current:1408, spare:123
  # grep bytes per_cpu/cpu0/stats
  bytes: 1443700
  # cat buffer_size_kb
  123                             // != current:1408

And also, a similar per-cpu case hits the following WARNING:

Reproducer:

  # echo 1 > per_cpu/cpu0/snapshot
  # echo 123 > buffer_size_kb
  # echo 1 > per_cpu/cpu0/snapshot

WARNING:

  WARNING: CPU: 0 PID: 1946 at kernel/trace/trace.c:1607 update_max_tr_single.part.0+0x2b8/0x380
  Modules linked in:
  CPU: 0 PID: 1946 Comm: bash Not tainted 5.2.0-rc6 #20
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
  RIP: 0010:update_max_tr_single.part.0+0x2b8/0x380
  Code: ff e8 dc da f9 ff 0f 0b e9 88 fe ff ff e8 d0 da f9 ff 44 89 ee bf f5 ff ff ff e8 33 dc f9 ff 41 83 fd f5 74 96 e8 b8 da f9 ff <0f> 0b eb 8d e8 af da f9 ff 0f 0b e9 bf fd ff ff e8 a3 da f9 ff 48
  RSP: 0018:ffff888063e4fca0 EFLAGS: 00010093
  RAX: ffff888066214380 RBX: ffffffff99850fe0 RCX: ffffffff964298a8
  RDX: 0000000000000000 RSI: 00000000fffffff5 RDI: 0000000000000005
  RBP: 1ffff1100c7c9f96 R08: ffff888066214380 R09: ffffed100c7c9f9b
  R10: ffffed100c7c9f9a R11: 0000000000000003 R12: 0000000000000000
  R13: 00000000ffffffea R14: ffff888066214380 R15: ffffffff99851060
  FS:  00007f9f8173c700(0000) GS:ffff88806d000000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000714dc0 CR3: 0000000066fa6000 CR4: 00000000000006f0
  Call Trace:
   ? trace_array_printk_buf+0x140/0x140
   ? __mutex_lock_slowpath+0x10/0x10
   tracing_snapshot_write+0x4c8/0x7f0
   ? trace_printk_init_buffers+0x60/0x60
   ? selinux_file_permission+0x3b/0x540
   ? tracer_preempt_off+0x38/0x506
   ? trace_printk_init_buffers+0x60/0x60
   __vfs_write+0x81/0x100
   vfs_write+0x1e1/0x560
   ksys_write+0x126/0x250
   ? __ia32_sys_read+0xb0/0xb0
   ? do_syscall_64+0x1f/0x390
   do_syscall_64+0xc1/0x390
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

This patch adds resize_buffer_duplicate_size() to check if there is a
difference between current/spare buffer sizes and resize a spare buffer
if necessary.

Link: http://lkml.kernel.org/r/20190625012910.13109-1-devel@etsukata.com

Cc: stable@vger.kernel.org
Fixes: ad909e2 ("tracing: Add internal tracing_snapshot() functions")
Signed-off-by: Eiichi Tsukata <devel@etsukata.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
0lvin pushed a commit to free-z4u/roc-rk3328-cc-official that referenced this issue Sep 20, 2019
When RCU stall warning triggers, it can print out a lot of messages
while holding spinlocks.  If the console device is slow (e.g. an
actual or IPMI serial console), it may end up triggering NMI hard
lockup watchdog like the following.

*** CPU printking while holding RCU spinlock

  PID: 4149739  TASK: ffff881a46baa880  CPU: 13  COMMAND: "CPUThreadPool8"
   #0 [ffff881fff945e48] crash_nmi_callback at ffffffff8103f7d0
   FireflyTeam#1 [ffff881fff945e58] nmi_handle at ffffffff81020653
   FireflyTeam#2 [ffff881fff945eb0] default_do_nmi at ffffffff81020c36
   FireflyTeam#3 [ffff881fff945ed0] do_nmi at ffffffff81020d32
   FireflyTeam#4 [ffff881fff945ef0] end_repeat_nmi at ffffffff81956a7e
      [exception RIP: io_serial_in+21]
      RIP: ffffffff81630e55  RSP: ffff881fff943b88  RFLAGS: 00000002
      RAX: 000000000000ca00  RBX: ffffffff8230e188  RCX: 0000000000000000
      RDX: 00000000000002fd  RSI: 0000000000000005  RDI: ffffffff8230e188
      RBP: ffff881fff943bb0   R8: 0000000000000000   R9: ffffffff820cb3c4
      R10: 0000000000000019  R11: 0000000000002000  R12: 00000000000026e1
      R13: 0000000000000020  R14: ffffffff820cd398  R15: 0000000000000035
      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
  --- <NMI exception stack> ---
   FireflyTeam#5 [ffff881fff943b88] io_serial_in at ffffffff81630e55
   FireflyTeam#6 [ffff881fff943b90] wait_for_xmitr at ffffffff8163175c
   FireflyTeam#7 [ffff881fff943bb8] serial8250_console_putchar at ffffffff816317dc
   FireflyTeam#8 [ffff881fff943bd8] uart_console_write at ffffffff8162ac00
   FireflyTeam#9 [ffff881fff943c08] serial8250_console_write at ffffffff81634691
  FireflyTeam#10 [ffff881fff943c80] univ8250_console_write at ffffffff8162f7c2
  FireflyTeam#11 [ffff881fff943c90] console_unlock at ffffffff810dfc55
  FireflyTeam#12 [ffff881fff943cf0] vprintk_emit at ffffffff810dffb5
  FireflyTeam#13 [ffff881fff943d50] vprintk_default at ffffffff810e01bf
  FireflyTeam#14 [ffff881fff943d60] vprintk_func at ffffffff810e1127
  FireflyTeam#15 [ffff881fff943d70] printk at ffffffff8119a8a4
  FireflyTeam#16 [ffff881fff943dd0] print_cpu_stall_info at ffffffff810eb78c
  rockchip-linux#17 [ffff881fff943e88] rcu_check_callbacks at ffffffff810ef133
  rockchip-linux#18 [ffff881fff943ee8] update_process_times at ffffffff810f3497
  rockchip-linux#19 [ffff881fff943f10] tick_sched_timer at ffffffff81103037
  rockchip-linux#20 [ffff881fff943f38] __hrtimer_run_queues at ffffffff810f3f38
  rockchip-linux#21 [ffff881fff943f88] hrtimer_interrupt at ffffffff810f442b

*** CPU triggering the hardlockup watchdog

  PID: 4149709  TASK: ffff88010f88c380  CPU: 26  COMMAND: "CPUThreadPool35"
   #0 [ffff883fff1059d0] machine_kexec at ffffffff8104a874
   FireflyTeam#1 [ffff883fff105a30] __crash_kexec at ffffffff811116cc
   FireflyTeam#2 [ffff883fff105af0] __crash_kexec at ffffffff81111795
   FireflyTeam#3 [ffff883fff105b08] panic at ffffffff8119a6ae
   FireflyTeam#4 [ffff883fff105b98] watchdog_overflow_callback at ffffffff81135dbd
   FireflyTeam#5 [ffff883fff105bb0] __perf_event_overflow at ffffffff81186866
   FireflyTeam#6 [ffff883fff105be8] perf_event_overflow at ffffffff81192bc4
   FireflyTeam#7 [ffff883fff105bf8] intel_pmu_handle_irq at ffffffff8100b265
   FireflyTeam#8 [ffff883fff105df8] perf_event_nmi_handler at ffffffff8100489f
   FireflyTeam#9 [ffff883fff105e58] nmi_handle at ffffffff81020653
  FireflyTeam#10 [ffff883fff105eb0] default_do_nmi at ffffffff81020b94
  FireflyTeam#11 [ffff883fff105ed0] do_nmi at ffffffff81020d32
  FireflyTeam#12 [ffff883fff105ef0] end_repeat_nmi at ffffffff81956a7e
      [exception RIP: queued_spin_lock_slowpath+248]
      RIP: ffffffff810da958  RSP: ffff883fff103e68  RFLAGS: 00000046
      RAX: 0000000000000000  RBX: 0000000000000046  RCX: 00000000006d0000
      RDX: ffff883fff49a950  RSI: 0000000000d10101  RDI: ffffffff81e54300
      RBP: ffff883fff103e80   R8: ffff883fff11a950   R9: 0000000000000000
      R10: 000000000e5873ba  R11: 000000000000010f  R12: ffffffff81e54300
      R13: 0000000000000000  R14: ffff88010f88c380  R15: ffffffff81e54300
      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
  --- <NMI exception stack> ---
  FireflyTeam#13 [ffff883fff103e68] queued_spin_lock_slowpath at ffffffff810da958
  FireflyTeam#14 [ffff883fff103e70] _raw_spin_lock_irqsave at ffffffff8195550b
  FireflyTeam#15 [ffff883fff103e88] rcu_check_callbacks at ffffffff810eed18
  FireflyTeam#16 [ffff883fff103ee8] update_process_times at ffffffff810f3497
  rockchip-linux#17 [ffff883fff103f10] tick_sched_timer at ffffffff81103037
  rockchip-linux#18 [ffff883fff103f38] __hrtimer_run_queues at ffffffff810f3f38
  rockchip-linux#19 [ffff883fff103f88] hrtimer_interrupt at ffffffff810f442b
  --- <IRQ stack> ---

Avoid spuriously triggering NMI hardlockup watchdog by touching it
from the print functions.  show_state_filter() shares the same problem
and solution.

v2: Relocate the comment to where it belongs.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
0lvin pushed a commit to free-z4u/roc-rk3328-cc-official that referenced this issue Sep 21, 2019
when mounting an ISO filesystem sometimes (very rarely)
the system hangs because of a race condition between two tasks.

PID: 6766   TASK: ffff88007b2a6dd0  CPU: 0   COMMAND: "mount"
 #0 [ffff880078447ae0] __schedule at ffffffff8168d605
 FireflyTeam#1 [ffff880078447b48] schedule_preempt_disabled at ffffffff8168ed49
 FireflyTeam#2 [ffff880078447b58] __mutex_lock_slowpath at ffffffff8168c995
 FireflyTeam#3 [ffff880078447bb8] mutex_lock at ffffffff8168bdef
 FireflyTeam#4 [ffff880078447bd0] sr_block_ioctl at ffffffffa00b6818 [sr_mod]
 FireflyTeam#5 [ffff880078447c10] blkdev_ioctl at ffffffff812fea50
 FireflyTeam#6 [ffff880078447c70] ioctl_by_bdev at ffffffff8123a8b3
 FireflyTeam#7 [ffff880078447c90] isofs_fill_super at ffffffffa04fb1e1 [isofs]
 FireflyTeam#8 [ffff880078447da8] mount_bdev at ffffffff81202570
 FireflyTeam#9 [ffff880078447e18] isofs_mount at ffffffffa04f9828 [isofs]
FireflyTeam#10 [ffff880078447e28] mount_fs at ffffffff81202d09
FireflyTeam#11 [ffff880078447e70] vfs_kern_mount at ffffffff8121ea8f
FireflyTeam#12 [ffff880078447ea8] do_mount at ffffffff81220fee
FireflyTeam#13 [ffff880078447f28] sys_mount at ffffffff812218d6
FireflyTeam#14 [ffff880078447f80] system_call_fastpath at ffffffff81698c49
    RIP: 00007fd9ea914e9a  RSP: 00007ffd5d9bf648  RFLAGS: 00010246
    RAX: 00000000000000a5  RBX: ffffffff81698c49  RCX: 0000000000000010
    RDX: 00007fd9ec2bc210  RSI: 00007fd9ec2bc290  RDI: 00007fd9ec2bcf30
    RBP: 0000000000000000   R8: 0000000000000000   R9: 0000000000000010
    R10: 00000000c0ed0001  R11: 0000000000000206  R12: 00007fd9ec2bc040
    R13: 00007fd9eb6b2380  R14: 00007fd9ec2bc210  R15: 00007fd9ec2bcf30
    ORIG_RAX: 00000000000000a5  CS: 0033  SS: 002b

This task was trying to mount the cdrom.  It allocated and configured a
super_block struct and owned the write-lock for the super_block->s_umount
rwsem. While exclusively owning the s_umount lock, it called
sr_block_ioctl and waited to acquire the global sr_mutex lock.

PID: 6785   TASK: ffff880078720fb0  CPU: 0   COMMAND: "systemd-udevd"
 #0 [ffff880078417898] __schedule at ffffffff8168d605
 FireflyTeam#1 [ffff880078417900] schedule at ffffffff8168dc59
 FireflyTeam#2 [ffff880078417910] rwsem_down_read_failed at ffffffff8168f605
 FireflyTeam#3 [ffff880078417980] call_rwsem_down_read_failed at ffffffff81328838
 FireflyTeam#4 [ffff8800784179d0] down_read at ffffffff8168cde0
 FireflyTeam#5 [ffff8800784179e8] get_super at ffffffff81201cc7
 FireflyTeam#6 [ffff880078417a10] __invalidate_device at ffffffff8123a8de
 FireflyTeam#7 [ffff880078417a40] flush_disk at ffffffff8123a94b
 FireflyTeam#8 [ffff880078417a88] check_disk_change at ffffffff8123ab50
 FireflyTeam#9 [ffff880078417ab0] cdrom_open at ffffffffa00a29e1 [cdrom]
FireflyTeam#10 [ffff880078417b68] sr_block_open at ffffffffa00b6f9b [sr_mod]
FireflyTeam#11 [ffff880078417b98] __blkdev_get at ffffffff8123ba86
FireflyTeam#12 [ffff880078417bf0] blkdev_get at ffffffff8123bd65
FireflyTeam#13 [ffff880078417c78] blkdev_open at ffffffff8123bf9b
FireflyTeam#14 [ffff880078417c90] do_dentry_open at ffffffff811fc7f7
FireflyTeam#15 [ffff880078417cd8] vfs_open at ffffffff811fc9cf
FireflyTeam#16 [ffff880078417d00] do_last at ffffffff8120d53d
rockchip-linux#17 [ffff880078417db0] path_openat at ffffffff8120e6b2
rockchip-linux#18 [ffff880078417e48] do_filp_open at ffffffff8121082b
rockchip-linux#19 [ffff880078417f18] do_sys_open at ffffffff811fdd33
rockchip-linux#20 [ffff880078417f70] sys_open at ffffffff811fde4e
rockchip-linux#21 [ffff880078417f80] system_call_fastpath at ffffffff81698c49
    RIP: 00007f29438b0c20  RSP: 00007ffc76624b78  RFLAGS: 00010246
    RAX: 0000000000000002  RBX: ffffffff81698c49  RCX: 0000000000000000
    RDX: 00007f2944a5fa70  RSI: 00000000000a0800  RDI: 00007f2944a5fa70
    RBP: 00007f2944a5f540   R8: 0000000000000000   R9: 0000000000000020
    R10: 00007f2943614c40  R11: 0000000000000246  R12: ffffffff811fde4e
    R13: ffff880078417f78  R14: 000000000000000c  R15: 00007f2944a4b010
    ORIG_RAX: 0000000000000002  CS: 0033  SS: 002b

This task tried to open the cdrom device, the sr_block_open function
acquired the global sr_mutex lock. The call to check_disk_change()
then saw an event flag indicating a possible media change and tried
to flush any cached data for the device.
As part of the flush, it tried to acquire the super_block->s_umount
lock associated with the cdrom device.
This was the same super_block as created and locked by the previous task.

The first task acquires the s_umount lock and then the sr_mutex_lock;
the second task acquires the sr_mutex_lock and then the s_umount lock.

This patch fixes the issue by moving check_disk_change() out of
cdrom_open() and let the caller take care of it.

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
0lvin pushed a commit to free-z4u/roc-rk3328-cc-official that referenced this issue Sep 29, 2019
commit 46cc0b4 upstream.

Current snapshot implementation swaps two ring_buffers even though their
sizes are different from each other, that can cause an inconsistency
between the contents of buffer_size_kb file and the current buffer size.

For example:

  # cat buffer_size_kb
  7 (expanded: 1408)
  # echo 1 > events/enable
  # grep bytes per_cpu/cpu0/stats
  bytes: 1441020
  # echo 1 > snapshot             // current:1408, spare:1408
  # echo 123 > buffer_size_kb     // current:123,  spare:1408
  # echo 1 > snapshot             // current:1408, spare:123
  # grep bytes per_cpu/cpu0/stats
  bytes: 1443700
  # cat buffer_size_kb
  123                             // != current:1408

And also, a similar per-cpu case hits the following WARNING:

Reproducer:

  # echo 1 > per_cpu/cpu0/snapshot
  # echo 123 > buffer_size_kb
  # echo 1 > per_cpu/cpu0/snapshot

WARNING:

  WARNING: CPU: 0 PID: 1946 at kernel/trace/trace.c:1607 update_max_tr_single.part.0+0x2b8/0x380
  Modules linked in:
  CPU: 0 PID: 1946 Comm: bash Not tainted 5.2.0-rc6 rockchip-linux#20
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
  RIP: 0010:update_max_tr_single.part.0+0x2b8/0x380
  Code: ff e8 dc da f9 ff 0f 0b e9 88 fe ff ff e8 d0 da f9 ff 44 89 ee bf f5 ff ff ff e8 33 dc f9 ff 41 83 fd f5 74 96 e8 b8 da f9 ff <0f> 0b eb 8d e8 af da f9 ff 0f 0b e9 bf fd ff ff e8 a3 da f9 ff 48
  RSP: 0018:ffff888063e4fca0 EFLAGS: 00010093
  RAX: ffff888066214380 RBX: ffffffff99850fe0 RCX: ffffffff964298a8
  RDX: 0000000000000000 RSI: 00000000fffffff5 RDI: 0000000000000005
  RBP: 1ffff1100c7c9f96 R08: ffff888066214380 R09: ffffed100c7c9f9b
  R10: ffffed100c7c9f9a R11: 0000000000000003 R12: 0000000000000000
  R13: 00000000ffffffea R14: ffff888066214380 R15: ffffffff99851060
  FS:  00007f9f8173c700(0000) GS:ffff88806d000000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000714dc0 CR3: 0000000066fa6000 CR4: 00000000000006f0
  Call Trace:
   ? trace_array_printk_buf+0x140/0x140
   ? __mutex_lock_slowpath+0x10/0x10
   tracing_snapshot_write+0x4c8/0x7f0
   ? trace_printk_init_buffers+0x60/0x60
   ? selinux_file_permission+0x3b/0x540
   ? tracer_preempt_off+0x38/0x506
   ? trace_printk_init_buffers+0x60/0x60
   __vfs_write+0x81/0x100
   vfs_write+0x1e1/0x560
   ksys_write+0x126/0x250
   ? __ia32_sys_read+0xb0/0xb0
   ? do_syscall_64+0x1f/0x390
   do_syscall_64+0xc1/0x390
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

This patch adds resize_buffer_duplicate_size() to check if there is a
difference between current/spare buffer sizes and resize a spare buffer
if necessary.

Link: http://lkml.kernel.org/r/20190625012910.13109-1-devel@etsukata.com

Cc: stable@vger.kernel.org
Fixes: ad909e2 ("tracing: Add internal tracing_snapshot() functions")
Signed-off-by: Eiichi Tsukata <devel@etsukata.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
0lvin pushed a commit to free-z4u/roc-rk3328-cc-official that referenced this issue Oct 5, 2019
Crash dump shows following instructions

crash> bt
PID: 0      TASK: ffffffffbe412480  CPU: 0   COMMAND: "swapper/0"
 #0 [ffff891ee0003868] machine_kexec at ffffffffbd063ef1
 FireflyTeam#1 [ffff891ee00038c8] __crash_kexec at ffffffffbd12b6f2
 FireflyTeam#2 [ffff891ee0003998] crash_kexec at ffffffffbd12c84c
 FireflyTeam#3 [ffff891ee00039b8] oops_end at ffffffffbd030f0a
 FireflyTeam#4 [ffff891ee00039e0] no_context at ffffffffbd074643
 FireflyTeam#5 [ffff891ee0003a40] __bad_area_nosemaphore at ffffffffbd07496e
 FireflyTeam#6 [ffff891ee0003a90] bad_area_nosemaphore at ffffffffbd074a64
 FireflyTeam#7 [ffff891ee0003aa0] __do_page_fault at ffffffffbd074b0a
 FireflyTeam#8 [ffff891ee0003b18] do_page_fault at ffffffffbd074fc8
 FireflyTeam#9 [ffff891ee0003b50] page_fault at ffffffffbda01925
    [exception RIP: qlt_schedule_sess_for_deletion+15]
    RIP: ffffffffc02e526f  RSP: ffff891ee0003c08  RFLAGS: 00010046
    RAX: 0000000000000000  RBX: 0000000000000000  RCX: ffffffffc0307847
    RDX: 00000000000020e6  RSI: ffff891edbc377c8  RDI: 0000000000000000
    RBP: ffff891ee0003c18   R8: ffffffffc02f0b20   R9: 0000000000000250
    R10: 0000000000000258  R11: 000000000000b780  R12: ffff891ed9b43000
    R13: 00000000000000f0  R14: 0000000000000006  R15: ffff891edbc377c8
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 FireflyTeam#10 [ffff891ee0003c20] qla2x00_fcport_event_handler at ffffffffc02853d3 [qla2xxx]
 FireflyTeam#11 [ffff891ee0003cf0] __dta_qla24xx_async_gnl_sp_done_333 at ffffffffc0285a1d [qla2xxx]
 FireflyTeam#12 [ffff891ee0003de8] qla24xx_process_response_queue at ffffffffc02a2eb5 [qla2xxx]
 FireflyTeam#13 [ffff891ee0003e88] qla24xx_msix_rsp_q at ffffffffc02a5403 [qla2xxx]
 FireflyTeam#14 [ffff891ee0003ec0] __handle_irq_event_percpu at ffffffffbd0f4c59
 FireflyTeam#15 [ffff891ee0003f10] handle_irq_event_percpu at ffffffffbd0f4e02
 FireflyTeam#16 [ffff891ee0003f40] handle_irq_event at ffffffffbd0f4e90
 rockchip-linux#17 [ffff891ee0003f68] handle_edge_irq at ffffffffbd0f8984
 rockchip-linux#18 [ffff891ee0003f88] handle_irq at ffffffffbd0305d5
 rockchip-linux#19 [ffff891ee0003fb8] do_IRQ at ffffffffbda02a18
 --- <IRQ stack> ---
 rockchip-linux#20 [ffffffffbe403d30] ret_from_intr at ffffffffbda0094e
    [exception RIP: unknown or invalid address]
    RIP: 000000000000001f  RSP: 0000000000000000  RFLAGS: fff3b8c2091ebb3f
    RAX: ffffbba5a0000200  RBX: 0000be8cdfa8f9fa  RCX: 0000000000000018
    RDX: 0000000000000101  RSI: 000000000000015d  RDI: 0000000000000193
    RBP: 0000000000000083   R8: ffffffffbe403e38   R9: 0000000000000002
    R10: 0000000000000000  R11: ffffffffbe56b820  R12: ffff891ee001cf00
    R13: ffffffffbd11c0a4  R14: ffffffffbe403d60  R15: 0000000000000001
    ORIG_RAX: ffff891ee0022ac0  CS: 0000  SS: ffffffffffffffb9
 bt: WARNING: possibly bogus exception frame
 rockchip-linux#21 [ffffffffbe403dd8] cpuidle_enter_state at ffffffffbd67c6fd
 rockchip-linux#22 [ffffffffbe403e40] cpuidle_enter at ffffffffbd67c907
 rockchip-linux#23 [ffffffffbe403e50] call_cpuidle at ffffffffbd0d98f3
 rockchip-linux#24 [ffffffffbe403e60] do_idle at ffffffffbd0d9b42
 rockchip-linux#25 [ffffffffbe403e98] cpu_startup_entry at ffffffffbd0d9da3
 rockchip-linux#26 [ffffffffbe403ec0] rest_init at ffffffffbd81d4aa
 rockchip-linux#27 [ffffffffbe403ed0] start_kernel at ffffffffbe67d2ca
 rockchip-linux#28 [ffffffffbe403f28] x86_64_start_reservations at ffffffffbe67c675
 rockchip-linux#29 [ffffffffbe403f38] x86_64_start_kernel at ffffffffbe67c6eb
 rockchip-linux#30 [ffffffffbe403f50] secondary_startup_64 at ffffffffbd0000d5

Fixes: 040036b ("scsi: qla2xxx: Delay loop id allocation at login")
Cc: <stable@vger.kernel.org> # v4.17+
Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
fanck0605 pushed a commit to fanck0605/friendlywrt-kernel that referenced this issue Apr 27, 2020
[ Upstream commit 1bc7896 ]

When experimenting with bpf_send_signal() helper in our production
environment (5.2 based), we experienced a deadlock in NMI mode:
   friendlyarm#5 [ffffc9002219f770] queued_spin_lock_slowpath at ffffffff8110be24
   friendlyarm#6 [ffffc9002219f770] _raw_spin_lock_irqsave at ffffffff81a43012
   friendlyarm#7 [ffffc9002219f780] try_to_wake_up at ffffffff810e7ecd
   friendlyarm#8 [ffffc9002219f7e0] signal_wake_up_state at ffffffff810c7b55
   rockchip-linux#9 [ffffc9002219f7f0] __send_signal at ffffffff810c8602
  rockchip-linux#10 [ffffc9002219f830] do_send_sig_info at ffffffff810ca31a
  rockchip-linux#11 [ffffc9002219f868] bpf_send_signal at ffffffff8119d227
  rockchip-linux#12 [ffffc9002219f988] bpf_overflow_handler at ffffffff811d4140
  rockchip-linux#13 [ffffc9002219f9e0] __perf_event_overflow at ffffffff811d68cf
  rockchip-linux#14 [ffffc9002219fa10] perf_swevent_overflow at ffffffff811d6a09
  rockchip-linux#15 [ffffc9002219fa38] ___perf_sw_event at ffffffff811e0f47
  rockchip-linux#16 [ffffc9002219fc30] __schedule at ffffffff81a3e04d
  rockchip-linux#17 [ffffc9002219fc90] schedule at ffffffff81a3e219
  rockchip-linux#18 [ffffc9002219fca0] futex_wait_queue_me at ffffffff8113d1b9
  rockchip-linux#19 [ffffc9002219fcd8] futex_wait at ffffffff8113e529
  rockchip-linux#20 [ffffc9002219fdf0] do_futex at ffffffff8113ffbc
  rockchip-linux#21 [ffffc9002219fec0] __x64_sys_futex at ffffffff81140d1c
  rockchip-linux#22 [ffffc9002219ff38] do_syscall_64 at ffffffff81002602
  rockchip-linux#23 [ffffc9002219ff50] entry_SYSCALL_64_after_hwframe at ffffffff81c00068

The above call stack is actually very similar to an issue
reported by Commit eac9153 ("bpf/stackmap: Fix deadlock with
rq_lock in bpf_get_stack()") by Song Liu. The only difference is
bpf_send_signal() helper instead of bpf_get_stack() helper.

The above deadlock is triggered with a perf_sw_event.
Similar to Commit eac9153, the below almost identical reproducer
used tracepoint point sched/sched_switch so the issue can be easily caught.
  /* stress_test.c */
  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/mman.h>
  #include <pthread.h>
  #include <sys/types.h>
  #include <sys/stat.h>
  #include <fcntl.h>

  #define THREAD_COUNT 1000
  char *filename;
  void *worker(void *p)
  {
        void *ptr;
        int fd;
        char *pptr;

        fd = open(filename, O_RDONLY);
        if (fd < 0)
                return NULL;
        while (1) {
                struct timespec ts = {0, 1000 + rand() % 2000};

                ptr = mmap(NULL, 4096 * 64, PROT_READ, MAP_PRIVATE, fd, 0);
                usleep(1);
                if (ptr == MAP_FAILED) {
                        printf("failed to mmap\n");
                        break;
                }
                munmap(ptr, 4096 * 64);
                usleep(1);
                pptr = malloc(1);
                usleep(1);
                pptr[0] = 1;
                usleep(1);
                free(pptr);
                usleep(1);
                nanosleep(&ts, NULL);
        }
        close(fd);
        return NULL;
  }

  int main(int argc, char *argv[])
  {
        void *ptr;
        int i;
        pthread_t threads[THREAD_COUNT];

        if (argc < 2)
                return 0;

        filename = argv[1];

        for (i = 0; i < THREAD_COUNT; i++) {
                if (pthread_create(threads + i, NULL, worker, NULL)) {
                        fprintf(stderr, "Error creating thread\n");
                        return 0;
                }
        }

        for (i = 0; i < THREAD_COUNT; i++)
                pthread_join(threads[i], NULL);
        return 0;
  }
and the following command:
  1. run `stress_test /bin/ls` in one windown
  2. hack bcc trace.py with the following change:
#     --- a/tools/trace.py
#     +++ b/tools/trace.py
     @@ -513,6 +513,7 @@ BPF_PERF_OUTPUT(%s);
              __data.tgid = __tgid;
              __data.pid = __pid;
              bpf_get_current_comm(&__data.comm, sizeof(__data.comm));
     +        bpf_send_signal(10);
      %s
      %s
              %s.perf_submit(%s, &__data, sizeof(__data));
  3. in a different window run
     ./trace.py -p $(pidof stress_test) t:sched:sched_switch

The deadlock can be reproduced in our production system.

Similar to Song's fix, the fix is to delay sending signal if
irqs is disabled to avoid deadlocks involving with rq_lock.
With this change, my above stress-test in our production system
won't cause deadlock any more.

I also implemented a scale-down version of reproducer in the
selftest (a subsequent commit). With latest bpf-next,
it complains for the following potential deadlock.
  [   32.832450] -> friendlyarm#1 (&p->pi_lock){-.-.}:
  [   32.833100]        _raw_spin_lock_irqsave+0x44/0x80
  [   32.833696]        task_rq_lock+0x2c/0xa0
  [   32.834182]        task_sched_runtime+0x59/0xd0
  [   32.834721]        thread_group_cputime+0x250/0x270
  [   32.835304]        thread_group_cputime_adjusted+0x2e/0x70
  [   32.835959]        do_task_stat+0x8a7/0xb80
  [   32.836461]        proc_single_show+0x51/0xb0
  ...
  [   32.839512] -> #0 (&(&sighand->siglock)->rlock){....}:
  [   32.840275]        __lock_acquire+0x1358/0x1a20
  [   32.840826]        lock_acquire+0xc7/0x1d0
  [   32.841309]        _raw_spin_lock_irqsave+0x44/0x80
  [   32.841916]        __lock_task_sighand+0x79/0x160
  [   32.842465]        do_send_sig_info+0x35/0x90
  [   32.842977]        bpf_send_signal+0xa/0x10
  [   32.843464]        bpf_prog_bc13ed9e4d3163e3_send_signal_tp_sched+0x465/0x1000
  [   32.844301]        trace_call_bpf+0x115/0x270
  [   32.844809]        perf_trace_run_bpf_submit+0x4a/0xc0
  [   32.845411]        perf_trace_sched_switch+0x10f/0x180
  [   32.846014]        __schedule+0x45d/0x880
  [   32.846483]        schedule+0x5f/0xd0
  ...

  [   32.853148] Chain exists of:
  [   32.853148]   &(&sighand->siglock)->rlock --> &p->pi_lock --> &rq->lock
  [   32.853148]
  [   32.854451]  Possible unsafe locking scenario:
  [   32.854451]
  [   32.855173]        CPU0                    CPU1
  [   32.855745]        ----                    ----
  [   32.856278]   lock(&rq->lock);
  [   32.856671]                                lock(&p->pi_lock);
  [   32.857332]                                lock(&rq->lock);
  [   32.857999]   lock(&(&sighand->siglock)->rlock);

  Deadlock happens on CPU0 when it tries to acquire &sighand->siglock
  but it has been held by CPU1 and CPU1 tries to grab &rq->lock
  and cannot get it.

  This is not exactly the callstack in our production environment,
  but sympotom is similar and both locks are using spin_lock_irqsave()
  to acquire the lock, and both involves rq_lock. The fix to delay
  sending signal when irq is disabled also fixed this issue.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200304191104.2796501-1-yhs@fb.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
fanck0605 pushed a commit to fanck0605/friendlywrt-kernel that referenced this issue Apr 27, 2020
[ no upstream commit ]

Switch the comparison, so that is_branch_taken() will recognize that below
branch is never taken:

  [...]
  17: [...] R1_w=inv0 [...] R8_w=inv(id=0,smin_value=-2147483648,smax_value=-1,umin_value=18446744071562067968,var_off=(0xffffffff80000000; 0x7fffffff)) [...]
  17: (67) r8 <<= 32
  18: [...] R8_w=inv(id=0,smax_value=-4294967296,umin_value=9223372036854775808,umax_value=18446744069414584320,var_off=(0x8000000000000000; 0x7fffffff00000000)) [...]
  18: (c7) r8 s>>= 32
  19: [...] R8_w=inv(id=0,smin_value=-2147483648,smax_value=-1,umin_value=18446744071562067968,var_off=(0xffffffff80000000; 0x7fffffff)) [...]
  19: (6d) if r1 s> r8 goto pc+16
  [...] R1_w=inv0 [...] R8_w=inv(id=0,smin_value=-2147483648,smax_value=-1,umin_value=18446744071562067968,var_off=(0xffffffff80000000; 0x7fffffff)) [...]
  [...]

Currently we check for is_branch_taken() only if either K is source, or source
is a scalar value that is const. For upstream it would be good to extend this
properly to check whether dst is const and src not.

For the sake of the test_verifier, it is probably not needed here:

  # ./test_verifier 101
  rockchip-linux#101/p bpf_get_stack return R0 within range OK
  Summary: 1 PASSED, 0 SKIPPED, 0 FAILED

I haven't seen this issue in test_progs* though, they are passing fine:

  # ./test_progs-no_alu32 -t get_stack
  Switching to flavor 'no_alu32' subdirectory...
  rockchip-linux#20 get_stack_raw_tp:OK
  Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

  # ./test_progs -t get_stack
  rockchip-linux#20 get_stack_raw_tp:OK
  Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
friendlyarm pushed a commit to friendlyarm/kernel-rockchip that referenced this issue Aug 31, 2020
[ Upstream commit e24c644 ]

I compiled with AddressSanitizer and I had these memory leaks while I
was using the tep_parse_format function:

    Direct leak of 28 byte(s) in 4 object(s) allocated from:
        #0 0x7fb07db49ffe in __interceptor_realloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dffe)
        #1 0x7fb07a724228 in extend_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:985
        #2 0x7fb07a724c21 in __read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1140
        #3 0x7fb07a724f78 in read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1206
        #4 0x7fb07a725191 in __read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1291
        #5 0x7fb07a7251df in read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1299
        #6 0x7fb07a72e6c8 in process_dynamic_array_len /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:2849
        #7 0x7fb07a7304b8 in process_function /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3161
        #8 0x7fb07a730900 in process_arg_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3207
        rockchip-linux#9 0x7fb07a727c0b in process_arg /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1786
        rockchip-linux#10 0x7fb07a731080 in event_read_print_args /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3285
        rockchip-linux#11 0x7fb07a731722 in event_read_print /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3369
        rockchip-linux#12 0x7fb07a740054 in __tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6335
        rockchip-linux#13 0x7fb07a74047a in __parse_event /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6389
        rockchip-linux#14 0x7fb07a740536 in tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6431
        rockchip-linux#15 0x7fb07a785acf in parse_event ../../../src/fs-src/fs.c:251
        rockchip-linux#16 0x7fb07a785ccd in parse_systems ../../../src/fs-src/fs.c:284
        rockchip-linux#17 0x7fb07a786fb3 in read_metadata ../../../src/fs-src/fs.c:593
        rockchip-linux#18 0x7fb07a78760e in ftrace_fs_source_init ../../../src/fs-src/fs.c:727
        rockchip-linux#19 0x7fb07d90c19c in add_component_with_init_method_data ../../../../src/lib/graph/graph.c:1048
        rockchip-linux#20 0x7fb07d90c87b in add_source_component_with_initialize_method_data ../../../../src/lib/graph/graph.c:1127
        rockchip-linux#21 0x7fb07d90c92a in bt_graph_add_source_component ../../../../src/lib/graph/graph.c:1152
        rockchip-linux#22 0x55db11aa632e in cmd_run_ctx_create_components_from_config_components ../../../src/cli/babeltrace2.c:2252
        rockchip-linux#23 0x55db11aa6fda in cmd_run_ctx_create_components ../../../src/cli/babeltrace2.c:2347
        rockchip-linux#24 0x55db11aa780c in cmd_run ../../../src/cli/babeltrace2.c:2461
        rockchip-linux#25 0x55db11aa8a7d in main ../../../src/cli/babeltrace2.c:2673
        rockchip-linux#26 0x7fb07d5460b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)

The token variable in the process_dynamic_array_len function is
allocated in the read_expect_type function, but is not freed before
calling the read_token function.

Free the token variable before calling read_token in order to plug the
leak.

Signed-off-by: Philippe Duplessis-Guindon <pduplessis@efficios.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lore.kernel.org/linux-trace-devel/20200730150236.5392-1-pduplessis@efficios.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
friendlyarm pushed a commit to friendlyarm/kernel-rockchip that referenced this issue Aug 31, 2020
[ Upstream commit ab0db04 ]

When running with -o enospc_debug you can get the following splat if one
of the dump_space_info's trip

  ======================================================
  WARNING: possible circular locking dependency detected
  5.8.0-rc5+ rockchip-linux#20 Tainted: G           OE
  ------------------------------------------------------
  dd/563090 is trying to acquire lock:
  ffff9e7dbf4f1e18 (&ctl->tree_lock){+.+.}-{2:2}, at: btrfs_dump_free_space+0x2b/0xa0 [btrfs]

  but task is already holding lock:
  ffff9e7e2284d428 (&cache->lock){+.+.}-{2:2}, at: btrfs_dump_space_info+0xaa/0x120 [btrfs]

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #3 (&cache->lock){+.+.}-{2:2}:
	 _raw_spin_lock+0x25/0x30
	 btrfs_add_reserved_bytes+0x3c/0x3c0 [btrfs]
	 find_free_extent+0x7ef/0x13b0 [btrfs]
	 btrfs_reserve_extent+0x9b/0x180 [btrfs]
	 btrfs_alloc_tree_block+0xc1/0x340 [btrfs]
	 alloc_tree_block_no_bg_flush+0x4a/0x60 [btrfs]
	 __btrfs_cow_block+0x122/0x530 [btrfs]
	 btrfs_cow_block+0x106/0x210 [btrfs]
	 commit_cowonly_roots+0x55/0x300 [btrfs]
	 btrfs_commit_transaction+0x4ed/0xac0 [btrfs]
	 sync_filesystem+0x74/0x90
	 generic_shutdown_super+0x22/0x100
	 kill_anon_super+0x14/0x30
	 btrfs_kill_super+0x12/0x20 [btrfs]
	 deactivate_locked_super+0x36/0x70
	 cleanup_mnt+0x104/0x160
	 task_work_run+0x5f/0x90
	 __prepare_exit_to_usermode+0x1bd/0x1c0
	 do_syscall_64+0x5e/0xb0
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9

  -> #2 (&space_info->lock){+.+.}-{2:2}:
	 _raw_spin_lock+0x25/0x30
	 btrfs_block_rsv_release+0x1a6/0x3f0 [btrfs]
	 btrfs_inode_rsv_release+0x4f/0x170 [btrfs]
	 btrfs_clear_delalloc_extent+0x155/0x480 [btrfs]
	 clear_state_bit+0x81/0x1a0 [btrfs]
	 __clear_extent_bit+0x25c/0x5d0 [btrfs]
	 clear_extent_bit+0x15/0x20 [btrfs]
	 btrfs_invalidatepage+0x2b7/0x3c0 [btrfs]
	 truncate_cleanup_page+0x47/0xe0
	 truncate_inode_pages_range+0x238/0x840
	 truncate_pagecache+0x44/0x60
	 btrfs_setattr+0x202/0x5e0 [btrfs]
	 notify_change+0x33b/0x490
	 do_truncate+0x76/0xd0
	 path_openat+0x687/0xa10
	 do_filp_open+0x91/0x100
	 do_sys_openat2+0x215/0x2d0
	 do_sys_open+0x44/0x80
	 do_syscall_64+0x52/0xb0
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9

  -> #1 (&tree->lock#2){+.+.}-{2:2}:
	 _raw_spin_lock+0x25/0x30
	 find_first_extent_bit+0x32/0x150 [btrfs]
	 write_pinned_extent_entries.isra.0+0xc5/0x100 [btrfs]
	 __btrfs_write_out_cache+0x172/0x480 [btrfs]
	 btrfs_write_out_cache+0x7a/0xf0 [btrfs]
	 btrfs_write_dirty_block_groups+0x286/0x3b0 [btrfs]
	 commit_cowonly_roots+0x245/0x300 [btrfs]
	 btrfs_commit_transaction+0x4ed/0xac0 [btrfs]
	 close_ctree+0xf9/0x2f5 [btrfs]
	 generic_shutdown_super+0x6c/0x100
	 kill_anon_super+0x14/0x30
	 btrfs_kill_super+0x12/0x20 [btrfs]
	 deactivate_locked_super+0x36/0x70
	 cleanup_mnt+0x104/0x160
	 task_work_run+0x5f/0x90
	 __prepare_exit_to_usermode+0x1bd/0x1c0
	 do_syscall_64+0x5e/0xb0
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9

  -> #0 (&ctl->tree_lock){+.+.}-{2:2}:
	 __lock_acquire+0x1240/0x2460
	 lock_acquire+0xab/0x360
	 _raw_spin_lock+0x25/0x30
	 btrfs_dump_free_space+0x2b/0xa0 [btrfs]
	 btrfs_dump_space_info+0xf4/0x120 [btrfs]
	 btrfs_reserve_extent+0x176/0x180 [btrfs]
	 __btrfs_prealloc_file_range+0x145/0x550 [btrfs]
	 cache_save_setup+0x28d/0x3b0 [btrfs]
	 btrfs_start_dirty_block_groups+0x1fc/0x4f0 [btrfs]
	 btrfs_commit_transaction+0xcc/0xac0 [btrfs]
	 btrfs_alloc_data_chunk_ondemand+0x162/0x4c0 [btrfs]
	 btrfs_check_data_free_space+0x4c/0xa0 [btrfs]
	 btrfs_buffered_write.isra.0+0x19b/0x740 [btrfs]
	 btrfs_file_write_iter+0x3cf/0x610 [btrfs]
	 new_sync_write+0x11e/0x1b0
	 vfs_write+0x1c9/0x200
	 ksys_write+0x68/0xe0
	 do_syscall_64+0x52/0xb0
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9

  other info that might help us debug this:

  Chain exists of:
    &ctl->tree_lock --> &space_info->lock --> &cache->lock

   Possible unsafe locking scenario:

	 CPU0                    CPU1
	 ----                    ----
    lock(&cache->lock);
				 lock(&space_info->lock);
				 lock(&cache->lock);
    lock(&ctl->tree_lock);

   *** DEADLOCK ***

  6 locks held by dd/563090:
   #0: ffff9e7e21d18448 (sb_writers#14){.+.+}-{0:0}, at: vfs_write+0x195/0x200
   #1: ffff9e7dd0410ed8 (&sb->s_type->i_mutex_key#19){++++}-{3:3}, at: btrfs_file_write_iter+0x86/0x610 [btrfs]
   #2: ffff9e7e21d18638 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x40b/0x5b0 [btrfs]
   #3: ffff9e7e1f05d688 (&cur_trans->cache_write_mutex){+.+.}-{3:3}, at: btrfs_start_dirty_block_groups+0x158/0x4f0 [btrfs]
   #4: ffff9e7e2284ddb8 (&space_info->groups_sem){++++}-{3:3}, at: btrfs_dump_space_info+0x69/0x120 [btrfs]
   #5: ffff9e7e2284d428 (&cache->lock){+.+.}-{2:2}, at: btrfs_dump_space_info+0xaa/0x120 [btrfs]

  stack backtrace:
  CPU: 3 PID: 563090 Comm: dd Tainted: G           OE     5.8.0-rc5+ rockchip-linux#20
  Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./890FX Deluxe5, BIOS P1.40 05/03/2011
  Call Trace:
   dump_stack+0x96/0xd0
   check_noncircular+0x162/0x180
   __lock_acquire+0x1240/0x2460
   ? wake_up_klogd.part.0+0x30/0x40
   lock_acquire+0xab/0x360
   ? btrfs_dump_free_space+0x2b/0xa0 [btrfs]
   _raw_spin_lock+0x25/0x30
   ? btrfs_dump_free_space+0x2b/0xa0 [btrfs]
   btrfs_dump_free_space+0x2b/0xa0 [btrfs]
   btrfs_dump_space_info+0xf4/0x120 [btrfs]
   btrfs_reserve_extent+0x176/0x180 [btrfs]
   __btrfs_prealloc_file_range+0x145/0x550 [btrfs]
   ? btrfs_qgroup_reserve_data+0x1d/0x60 [btrfs]
   cache_save_setup+0x28d/0x3b0 [btrfs]
   btrfs_start_dirty_block_groups+0x1fc/0x4f0 [btrfs]
   btrfs_commit_transaction+0xcc/0xac0 [btrfs]
   ? start_transaction+0xe0/0x5b0 [btrfs]
   btrfs_alloc_data_chunk_ondemand+0x162/0x4c0 [btrfs]
   btrfs_check_data_free_space+0x4c/0xa0 [btrfs]
   btrfs_buffered_write.isra.0+0x19b/0x740 [btrfs]
   ? ktime_get_coarse_real_ts64+0xa8/0xd0
   ? trace_hardirqs_on+0x1c/0xe0
   btrfs_file_write_iter+0x3cf/0x610 [btrfs]
   new_sync_write+0x11e/0x1b0
   vfs_write+0x1c9/0x200
   ksys_write+0x68/0xe0
   do_syscall_64+0x52/0xb0
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

This is because we're holding the block_group->lock while trying to dump
the free space cache.  However we don't need this lock, we just need it
to read the values for the printk, so move the free space cache dumping
outside of the block group lock.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
friendlyarm pushed a commit to friendlyarm/kernel-rockchip that referenced this issue Aug 31, 2020
commit 01d01ca upstream.

We are currently getting this lockdep splat in btrfs/161:

  ======================================================
  WARNING: possible circular locking dependency detected
  5.8.0-rc5+ rockchip-linux#20 Tainted: G            E
  ------------------------------------------------------
  mount/678048 is trying to acquire lock:
  ffff9b769f15b6e0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: clone_fs_devices+0x4d/0x170 [btrfs]

  but task is already holding lock:
  ffff9b76abdb08d0 (&fs_info->chunk_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0x6a/0x800 [btrfs]

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #1 (&fs_info->chunk_mutex){+.+.}-{3:3}:
	 __mutex_lock+0x8b/0x8f0
	 btrfs_init_new_device+0x2d2/0x1240 [btrfs]
	 btrfs_ioctl+0x1de/0x2d20 [btrfs]
	 ksys_ioctl+0x87/0xc0
	 __x64_sys_ioctl+0x16/0x20
	 do_syscall_64+0x52/0xb0
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9

  -> #0 (&fs_devs->device_list_mutex){+.+.}-{3:3}:
	 __lock_acquire+0x1240/0x2460
	 lock_acquire+0xab/0x360
	 __mutex_lock+0x8b/0x8f0
	 clone_fs_devices+0x4d/0x170 [btrfs]
	 btrfs_read_chunk_tree+0x330/0x800 [btrfs]
	 open_ctree+0xb7c/0x18ce [btrfs]
	 btrfs_mount_root.cold+0x13/0xfa [btrfs]
	 legacy_get_tree+0x30/0x50
	 vfs_get_tree+0x28/0xc0
	 fc_mount+0xe/0x40
	 vfs_kern_mount.part.0+0x71/0x90
	 btrfs_mount+0x13b/0x3e0 [btrfs]
	 legacy_get_tree+0x30/0x50
	 vfs_get_tree+0x28/0xc0
	 do_mount+0x7de/0xb30
	 __x64_sys_mount+0x8e/0xd0
	 do_syscall_64+0x52/0xb0
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9

  other info that might help us debug this:

   Possible unsafe locking scenario:

	 CPU0                    CPU1
	 ----                    ----
    lock(&fs_info->chunk_mutex);
				 lock(&fs_devs->device_list_mutex);
				 lock(&fs_info->chunk_mutex);
    lock(&fs_devs->device_list_mutex);

   *** DEADLOCK ***

  3 locks held by mount/678048:
   #0: ffff9b75ff5fb0e0 (&type->s_umount_key#63/1){+.+.}-{3:3}, at: alloc_super+0xb5/0x380
   #1: ffffffffc0c2fbc8 (uuid_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0x54/0x800 [btrfs]
   #2: ffff9b76abdb08d0 (&fs_info->chunk_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0x6a/0x800 [btrfs]

  stack backtrace:
  CPU: 2 PID: 678048 Comm: mount Tainted: G            E     5.8.0-rc5+ rockchip-linux#20
  Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./890FX Deluxe5, BIOS P1.40 05/03/2011
  Call Trace:
   dump_stack+0x96/0xd0
   check_noncircular+0x162/0x180
   __lock_acquire+0x1240/0x2460
   ? asm_sysvec_apic_timer_interrupt+0x12/0x20
   lock_acquire+0xab/0x360
   ? clone_fs_devices+0x4d/0x170 [btrfs]
   __mutex_lock+0x8b/0x8f0
   ? clone_fs_devices+0x4d/0x170 [btrfs]
   ? rcu_read_lock_sched_held+0x52/0x60
   ? cpumask_next+0x16/0x20
   ? module_assert_mutex_or_preempt+0x14/0x40
   ? __module_address+0x28/0xf0
   ? clone_fs_devices+0x4d/0x170 [btrfs]
   ? static_obj+0x4f/0x60
   ? lockdep_init_map_waits+0x43/0x200
   ? clone_fs_devices+0x4d/0x170 [btrfs]
   clone_fs_devices+0x4d/0x170 [btrfs]
   btrfs_read_chunk_tree+0x330/0x800 [btrfs]
   open_ctree+0xb7c/0x18ce [btrfs]
   ? super_setup_bdi_name+0x79/0xd0
   btrfs_mount_root.cold+0x13/0xfa [btrfs]
   ? vfs_parse_fs_string+0x84/0xb0
   ? rcu_read_lock_sched_held+0x52/0x60
   ? kfree+0x2b5/0x310
   legacy_get_tree+0x30/0x50
   vfs_get_tree+0x28/0xc0
   fc_mount+0xe/0x40
   vfs_kern_mount.part.0+0x71/0x90
   btrfs_mount+0x13b/0x3e0 [btrfs]
   ? cred_has_capability+0x7c/0x120
   ? rcu_read_lock_sched_held+0x52/0x60
   ? legacy_get_tree+0x30/0x50
   legacy_get_tree+0x30/0x50
   vfs_get_tree+0x28/0xc0
   do_mount+0x7de/0xb30
   ? memdup_user+0x4e/0x90
   __x64_sys_mount+0x8e/0xd0
   do_syscall_64+0x52/0xb0
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

This is because btrfs_read_chunk_tree() can come upon DEV_EXTENT's and
then read the device, which takes the device_list_mutex.  The
device_list_mutex needs to be taken before the chunk_mutex, so this is a
problem.  We only really need the chunk mutex around adding the chunk,
so move the mutex around read_one_chunk.

An argument could be made that we don't even need the chunk_mutex here
as it's during mount, and we are protected by various other locks.
However we already have special rules for ->device_list_mutex, and I'd
rather not have another special case for ->chunk_mutex.

CC: stable@vger.kernel.org # 4.19+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
friendlyarm pushed a commit to friendlyarm/kernel-rockchip that referenced this issue Apr 2, 2021
commit 4d14c5c upstream

Calling btrfs_qgroup_reserve_meta_prealloc from
btrfs_delayed_inode_reserve_metadata can result in flushing delalloc
while holding a transaction and delayed node locks. This is deadlock
prone. In the past multiple commits:

 * ae5e070 ("btrfs: qgroup: don't try to wait flushing if we're
already holding a transaction")

 * 6f23277 ("btrfs: qgroup: don't commit transaction when we already
 hold the handle")

Tried to solve various aspects of this but this was always a
whack-a-mole game. Unfortunately those 2 fixes don't solve a deadlock
scenario involving btrfs_delayed_node::mutex. Namely, one thread
can call btrfs_dirty_inode as a result of reading a file and modifying
its atime:

  PID: 6963   TASK: ffff8c7f3f94c000  CPU: 2   COMMAND: "test"
  #0  __schedule at ffffffffa529e07d
  #1  schedule at ffffffffa529e4ff
  #2  schedule_timeout at ffffffffa52a1bdd
  #3  wait_for_completion at ffffffffa529eeea             <-- sleeps with delayed node mutex held
  #4  start_delalloc_inodes at ffffffffc0380db5
  #5  btrfs_start_delalloc_snapshot at ffffffffc0393836
  #6  try_flush_qgroup at ffffffffc03f04b2
  #7  __btrfs_qgroup_reserve_meta at ffffffffc03f5bb6     <-- tries to reserve space and starts delalloc inodes.
  #8  btrfs_delayed_update_inode at ffffffffc03e31aa      <-- acquires delayed node mutex
  rockchip-linux#9  btrfs_update_inode at ffffffffc0385ba8
 rockchip-linux#10  btrfs_dirty_inode at ffffffffc038627b               <-- TRANSACTIION OPENED
 rockchip-linux#11  touch_atime at ffffffffa4cf0000
 rockchip-linux#12  generic_file_read_iter at ffffffffa4c1f123
 rockchip-linux#13  new_sync_read at ffffffffa4ccdc8a
 rockchip-linux#14  vfs_read at ffffffffa4cd0849
 rockchip-linux#15  ksys_read at ffffffffa4cd0bd1
 rockchip-linux#16  do_syscall_64 at ffffffffa4a052eb
 rockchip-linux#17  entry_SYSCALL_64_after_hwframe at ffffffffa540008c

This will cause an asynchronous work to flush the delalloc inodes to
happen which can try to acquire the same delayed_node mutex:

  PID: 455    TASK: ffff8c8085fa4000  CPU: 5   COMMAND: "kworker/u16:30"
  #0  __schedule at ffffffffa529e07d
  #1  schedule at ffffffffa529e4ff
  #2  schedule_preempt_disabled at ffffffffa529e80a
  #3  __mutex_lock at ffffffffa529fdcb                    <-- goes to sleep, never wakes up.
  #4  btrfs_delayed_update_inode at ffffffffc03e3143      <-- tries to acquire the mutex
  #5  btrfs_update_inode at ffffffffc0385ba8              <-- this is the same inode that pid 6963 is holding
  #6  cow_file_range_inline.constprop.78 at ffffffffc0386be7
  #7  cow_file_range at ffffffffc03879c1
  #8  btrfs_run_delalloc_range at ffffffffc038894c
  rockchip-linux#9  writepage_delalloc at ffffffffc03a3c8f
 rockchip-linux#10  __extent_writepage at ffffffffc03a4c01
 rockchip-linux#11  extent_write_cache_pages at ffffffffc03a500b
 rockchip-linux#12  extent_writepages at ffffffffc03a6de2
 rockchip-linux#13  do_writepages at ffffffffa4c277eb
 rockchip-linux#14  __filemap_fdatawrite_range at ffffffffa4c1e5bb
 rockchip-linux#15  btrfs_run_delalloc_work at ffffffffc0380987         <-- starts running delayed nodes
 rockchip-linux#16  normal_work_helper at ffffffffc03b706c
 rockchip-linux#17  process_one_work at ffffffffa4aba4e4
 rockchip-linux#18  worker_thread at ffffffffa4aba6fd
 rockchip-linux#19  kthread at ffffffffa4ac0a3d
 rockchip-linux#20  ret_from_fork at ffffffffa54001ff

To fully address those cases the complete fix is to never issue any
flushing while holding the transaction or the delayed node lock. This
patch achieves it by calling qgroup_reserve_meta directly which will
either succeed without flushing or will fail and return -EDQUOT. In the
latter case that return value is going to be propagated to
btrfs_dirty_inode which will fallback to start a new transaction. That's
fine as the majority of time we expect the inode will have
BTRFS_DELAYED_NODE_INODE_DIRTY flag set which will result in directly
copying the in-memory state.

Fixes: c53e965 ("btrfs: qgroup: try to flush qgroup space when we get -EDQUOT")
CC: stable@vger.kernel.org # 5.10+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[sudip: adjust context]
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
friendlyarm pushed a commit to friendlyarm/kernel-rockchip that referenced this issue Mar 1, 2022
commit 8b59b0a upstream.

arm32 uses software to simulate the instruction replaced
by kprobe. some instructions may be simulated by constructing
assembly functions. therefore, before executing instruction
simulation, it is necessary to construct assembly function
execution environment in C language through binding registers.
after kasan is enabled, the register binding relationship will
be destroyed, resulting in instruction simulation errors and
causing kernel panic.

the kprobe emulate instruction function is distributed in three
files: actions-common.c actions-arm.c actions-thumb.c, so disable
KASAN when compiling these files.

for example, use kprobe insert on cap_capable+20 after kasan
enabled, the cap_capable assembly code is as follows:
<cap_capable>:
e92d47f0	push	{r4, r5, r6, r7, r8, r9, sl, lr}
e1a05000	mov	r5, r0
e280006c	add	r0, r0, rockchip-linux#108    ; 0x6c
e1a04001	mov	r4, r1
e1a06002	mov	r6, r2
e59fa090	ldr	sl, [pc, rockchip-linux#144]  ;
ebfc7bf8	bl	c03aa4b4 <__asan_load4>
e595706c	ldr	r7, [r5, rockchip-linux#108]  ; 0x6c
e2859014	add	r9, r5, rockchip-linux#20
......
The emulate_ldr assembly code after enabling kasan is as follows:
c06f1384 <emulate_ldr>:
e92d47f0	push	{r4, r5, r6, r7, r8, r9, sl, lr}
e282803c	add	r8, r2, rockchip-linux#60     ; 0x3c
e1a05000	mov	r5, r0
e7e37855	ubfx	r7, r5, rockchip-linux#16, #4
e1a00008	mov	r0, r8
e1a09001	mov	r9, r1
e1a04002	mov	r4, r2
ebf35462	bl	c03c6530 <__asan_load4>
e357000f	cmp	r7, rockchip-linux#15
e7e36655	ubfx	r6, r5, rockchip-linux#12, #4
e205a00f	and	sl, r5, rockchip-linux#15
0a000001	beq	c06f13bc <emulate_ldr+0x38>
e0840107	add	r0, r4, r7, lsl #2
ebf3545c	bl	c03c6530 <__asan_load4>
e084010a	add	r0, r4, sl, lsl #2
ebf3545a	bl	c03c6530 <__asan_load4>
e2890010	add	r0, r9, rockchip-linux#16
ebf35458	bl	c03c6530 <__asan_load4>
e5990010	ldr	r0, [r9, rockchip-linux#16]
e12fff30	blx	r0
e356000f	cm	r6, rockchip-linux#15
1a000014	bne	c06f1430 <emulate_ldr+0xac>
e1a06000	mov	r6, r0
e2840040	add	r0, r4, rockchip-linux#64     ; 0x40
......

when running in emulate_ldr to simulate the ldr instruction, panic
occurred, and the log is as follows:
Unable to handle kernel NULL pointer dereference at virtual address
00000090
pgd = ecb46400
[00000090] *pgd=2e0fa003, *pmd=00000000
Internal error: Oops: 206 [#1] SMP ARM
PC is at cap_capable+0x14/0xb0
LR is at emulate_ldr+0x50/0xc0
psr: 600d0293 sp : ecd63af8  ip : 00000004  fp : c0a7c30c
r10: 00000000  r9 : c30897f4  r8 : ecd63cd4
r7 : 0000000f  r6 : 0000000a  r5 : e59fa090  r4 : ecd63c98
r3 : c06ae294  r2 : 00000000  r1 : b7611300  r0 : bf4ec008
Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 32c5387d  Table: 2d546400  DAC: 55555555
Process bash (pid: 1643, stack limit = 0xecd60190)
(cap_capable) from (kprobe_handler+0x218/0x340)
(kprobe_handler) from (kprobe_trap_handler+0x24/0x48)
(kprobe_trap_handler) from (do_undefinstr+0x13c/0x364)
(do_undefinstr) from (__und_svc_finish+0x0/0x30)
(__und_svc_finish) from (cap_capable+0x18/0xb0)
(cap_capable) from (cap_vm_enough_memory+0x38/0x48)
(cap_vm_enough_memory) from
(security_vm_enough_memory_mm+0x48/0x6c)
(security_vm_enough_memory_mm) from
(copy_process.constprop.5+0x16b4/0x25c8)
(copy_process.constprop.5) from (_do_fork+0xe8/0x55c)
(_do_fork) from (SyS_clone+0x1c/0x24)
(SyS_clone) from (__sys_trace_return+0x0/0x10)
Code: 0050a0e1 6c0080e2 0140a0e1 0260a0e1 (f801f0e7)

Fixes: 35aa1df ("ARM kprobes: instruction single-stepping support")
Fixes: 4210157 ("ARM: 9017/2: Enable KASan for ARM")
Signed-off-by: huangshaobo <huangshaobo6@huawei.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
SquallATF pushed a commit to SquallATF/kernel that referenced this issue May 31, 2022
commit f9c6456 upstream.

Masoud Sharbiani noticed that commit 29ef680 ("memcg, oom: move
out_of_memory back to the charge path") broke memcg OOM called from
__xfs_filemap_fault() path.  It turned out that try_charge() is retrying
forever without making forward progress because mem_cgroup_oom(GFP_NOFS)
cannot invoke the OOM killer due to commit 3da88fb ("mm, oom:
move GFP_NOFS check to out_of_memory").

Allowing forced charge due to being unable to invoke memcg OOM killer will
lead to global OOM situation.  Also, just returning -ENOMEM will be risky
because OOM path is lost and some paths (e.g.  get_user_pages()) will leak
-ENOMEM.  Therefore, invoking memcg OOM killer (despite GFP_NOFS) will be
the only choice we can choose for now.

Until 29ef680, we were able to invoke memcg OOM killer when
GFP_KERNEL reclaim failed [1].  But since 29ef680, we need to
invoke memcg OOM killer when GFP_NOFS reclaim failed [2].  Although in the
past we did invoke memcg OOM killer for GFP_NOFS [3], we might get
pre-mature memcg OOM reports due to this patch.

[1]

 leaker invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
 CPU: 0 PID: 2746 Comm: leaker Not tainted 4.18.0+ rockchip-linux#19
 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
 Call Trace:
  dump_stack+0x63/0x88
  dump_header+0x67/0x27a
  ? mem_cgroup_scan_tasks+0x91/0xf0
  oom_kill_process+0x210/0x410
  out_of_memory+0x10a/0x2c0
  mem_cgroup_out_of_memory+0x46/0x80
  mem_cgroup_oom_synchronize+0x2e4/0x310
  ? high_work_func+0x20/0x20
  pagefault_out_of_memory+0x31/0x76
  mm_fault_error+0x55/0x115
  ? handle_mm_fault+0xfd/0x220
  __do_page_fault+0x433/0x4e0
  do_page_fault+0x22/0x30
  ? page_fault+0x8/0x30
  page_fault+0x1e/0x30
 RIP: 0033:0x4009f0
 Code: 03 00 00 00 e8 71 fd ff ff 48 83 f8 ff 49 89 c6 74 74 48 89 c6 bf c0 0c 40 00 31 c0 e8 69 fd ff ff 45 85 ff 7e 21 31 c9 66 90 <41> 0f be 14 0e 01 d3 f7 c1 ff 0f 00 00 75 05 41 c6 04 0e 2a 48 83
 RSP: 002b:00007ffe29ae96f0 EFLAGS: 00010206
 RAX: 000000000000001b RBX: 0000000000000000 RCX: 0000000001ce1000
 RDX: 0000000000000000 RSI: 000000007fffffe5 RDI: 0000000000000000
 RBP: 000000000000000c R08: 0000000000000000 R09: 00007f94be09220d
 R10: 0000000000000002 R11: 0000000000000246 R12: 00000000000186a0
 R13: 0000000000000003 R14: 00007f949d845000 R15: 0000000002800000
 Task in /leaker killed as a result of limit of /leaker
 memory: usage 524288kB, limit 524288kB, failcnt 158965
 memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
 kmem: usage 2016kB, limit 9007199254740988kB, failcnt 0
 Memory cgroup stats for /leaker: cache:844KB rss:521136KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:132KB writeback:0KB inactive_anon:0KB active_anon:521224KB inactive_file:1012KB active_file:8KB unevictable:0KB
 Memory cgroup out of memory: Kill process 2746 (leaker) score 998 or sacrifice child
 Killed process 2746 (leaker) total-vm:536704kB, anon-rss:521176kB, file-rss:1208kB, shmem-rss:0kB
 oom_reaper: reaped process 2746 (leaker), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

[2]

 leaker invoked oom-killer: gfp_mask=0x600040(GFP_NOFS), nodemask=(null), order=0, oom_score_adj=0
 CPU: 1 PID: 2746 Comm: leaker Not tainted 4.18.0+ rockchip-linux#20
 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
 Call Trace:
  dump_stack+0x63/0x88
  dump_header+0x67/0x27a
  ? mem_cgroup_scan_tasks+0x91/0xf0
  oom_kill_process+0x210/0x410
  out_of_memory+0x109/0x2d0
  mem_cgroup_out_of_memory+0x46/0x80
  try_charge+0x58d/0x650
  ? __radix_tree_replace+0x81/0x100
  mem_cgroup_try_charge+0x7a/0x100
  __add_to_page_cache_locked+0x92/0x180
  add_to_page_cache_lru+0x4d/0xf0
  iomap_readpages_actor+0xde/0x1b0
  ? iomap_zero_range_actor+0x1d0/0x1d0
  iomap_apply+0xaf/0x130
  iomap_readpages+0x9f/0x150
  ? iomap_zero_range_actor+0x1d0/0x1d0
  xfs_vm_readpages+0x18/0x20 [xfs]
  read_pages+0x60/0x140
  __do_page_cache_readahead+0x193/0x1b0
  ondemand_readahead+0x16d/0x2c0
  page_cache_async_readahead+0x9a/0xd0
  filemap_fault+0x403/0x620
  ? alloc_set_pte+0x12c/0x540
  ? _cond_resched+0x14/0x30
  __xfs_filemap_fault+0x66/0x180 [xfs]
  xfs_filemap_fault+0x27/0x30 [xfs]
  __do_fault+0x19/0x40
  __handle_mm_fault+0x8e8/0xb60
  handle_mm_fault+0xfd/0x220
  __do_page_fault+0x238/0x4e0
  do_page_fault+0x22/0x30
  ? page_fault+0x8/0x30
  page_fault+0x1e/0x30
 RIP: 0033:0x4009f0
 Code: 03 00 00 00 e8 71 fd ff ff 48 83 f8 ff 49 89 c6 74 74 48 89 c6 bf c0 0c 40 00 31 c0 e8 69 fd ff ff 45 85 ff 7e 21 31 c9 66 90 <41> 0f be 14 0e 01 d3 f7 c1 ff 0f 00 00 75 05 41 c6 04 0e 2a 48 83
 RSP: 002b:00007ffda45c9290 EFLAGS: 00010206
 RAX: 000000000000001b RBX: 0000000000000000 RCX: 0000000001a1e000
 RDX: 0000000000000000 RSI: 000000007fffffe5 RDI: 0000000000000000
 RBP: 000000000000000c R08: 0000000000000000 R09: 00007f6d061ff20d
 R10: 0000000000000002 R11: 0000000000000246 R12: 00000000000186a0
 R13: 0000000000000003 R14: 00007f6ce59b2000 R15: 0000000002800000
 Task in /leaker killed as a result of limit of /leaker
 memory: usage 524288kB, limit 524288kB, failcnt 7221
 memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
 kmem: usage 1944kB, limit 9007199254740988kB, failcnt 0
 Memory cgroup stats for /leaker: cache:3632KB rss:518232KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:0KB active_anon:518408KB inactive_file:3908KB active_file:12KB unevictable:0KB
 Memory cgroup out of memory: Kill process 2746 (leaker) score 992 or sacrifice child
 Killed process 2746 (leaker) total-vm:536704kB, anon-rss:518264kB, file-rss:1188kB, shmem-rss:0kB
 oom_reaper: reaped process 2746 (leaker), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

[3]

 leaker invoked oom-killer: gfp_mask=0x50, order=0, oom_score_adj=0
 leaker cpuset=/ mems_allowed=0
 CPU: 1 PID: 3206 Comm: leaker Not tainted 3.10.0-957.27.2.el7.x86_64 FireflyTeam#1
 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
 Call Trace:
  [<ffffffffaf364147>] dump_stack+0x19/0x1b
  [<ffffffffaf35eb6a>] dump_header+0x90/0x229
  [<ffffffffaedbb456>] ? find_lock_task_mm+0x56/0xc0
  [<ffffffffaee32a38>] ? try_get_mem_cgroup_from_mm+0x28/0x60
  [<ffffffffaedbb904>] oom_kill_process+0x254/0x3d0
  [<ffffffffaee36c36>] mem_cgroup_oom_synchronize+0x546/0x570
  [<ffffffffaee360b0>] ? mem_cgroup_charge_common+0xc0/0xc0
  [<ffffffffaedbc194>] pagefault_out_of_memory+0x14/0x90
  [<ffffffffaf35d072>] mm_fault_error+0x6a/0x157
  [<ffffffffaf3717c8>] __do_page_fault+0x3c8/0x4f0
  [<ffffffffaf371925>] do_page_fault+0x35/0x90
  [<ffffffffaf36d768>] page_fault+0x28/0x30
 Task in /leaker killed as a result of limit of /leaker
 memory: usage 524288kB, limit 524288kB, failcnt 20628
 memory+swap: usage 524288kB, limit 9007199254740988kB, failcnt 0
 kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
 Memory cgroup stats for /leaker: cache:840KB rss:523448KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:523448KB inactive_file:464KB active_file:376KB unevictable:0KB
 Memory cgroup out of memory: Kill process 3206 (leaker) score 970 or sacrifice child
 Killed process 3206 (leaker) total-vm:536692kB, anon-rss:523304kB, file-rss:412kB, shmem-rss:0kB

Bisected by Masoud Sharbiani.

Link: http://lkml.kernel.org/r/cbe54ed1-b6ba-a056-8899-2dc42526371d@i-love.sakura.ne.jp
Fixes: 3da88fb ("mm, oom: move GFP_NOFS check to out_of_memory") [necessary after 29ef680]
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: Masoud Sharbiani <msharbiani@apple.com>
Tested-by: Masoud Sharbiani <msharbiani@apple.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org>	[4.19+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
friendlyarm pushed a commit to friendlyarm/kernel-rockchip that referenced this issue Dec 5, 2023
[ Upstream commit a154f5f ]

The following call trace shows a deadlock issue due to recursive locking of
mutex "device_mutex". First lock acquire is in target_for_each_device() and
second in target_free_device().

 PID: 148266   TASK: ffff8be21ffb5d00  CPU: 10   COMMAND: "iscsi_ttx"
  #0 [ffffa2bfc9ec3b18] __schedule at ffffffffa8060e7f
  #1 [ffffa2bfc9ec3ba0] schedule at ffffffffa8061224
  #2 [ffffa2bfc9ec3bb8] schedule_preempt_disabled at ffffffffa80615ee
  #3 [ffffa2bfc9ec3bc8] __mutex_lock at ffffffffa8062fd7
  #4 [ffffa2bfc9ec3c40] __mutex_lock_slowpath at ffffffffa80631d3
  #5 [ffffa2bfc9ec3c50] mutex_lock at ffffffffa806320c
  #6 [ffffa2bfc9ec3c68] target_free_device at ffffffffc0935998 [target_core_mod]
  #7 [ffffa2bfc9ec3c90] target_core_dev_release at ffffffffc092f975 [target_core_mod]
  #8 [ffffa2bfc9ec3ca0] config_item_put at ffffffffa79d250f
  rockchip-linux#9 [ffffa2bfc9ec3cd0] config_item_put at ffffffffa79d2583
 rockchip-linux#10 [ffffa2bfc9ec3ce0] target_devices_idr_iter at ffffffffc0933f3a [target_core_mod]
 rockchip-linux#11 [ffffa2bfc9ec3d00] idr_for_each at ffffffffa803f6fc
 rockchip-linux#12 [ffffa2bfc9ec3d60] target_for_each_device at ffffffffc0935670 [target_core_mod]
 rockchip-linux#13 [ffffa2bfc9ec3d98] transport_deregister_session at ffffffffc0946408 [target_core_mod]
 rockchip-linux#14 [ffffa2bfc9ec3dc8] iscsit_close_session at ffffffffc09a44a6 [iscsi_target_mod]
 rockchip-linux#15 [ffffa2bfc9ec3df0] iscsit_close_connection at ffffffffc09a4a88 [iscsi_target_mod]
 rockchip-linux#16 [ffffa2bfc9ec3df8] finish_task_switch at ffffffffa76e5d07
 rockchip-linux#17 [ffffa2bfc9ec3e78] iscsit_take_action_for_connection_exit at ffffffffc0991c23 [iscsi_target_mod]
 rockchip-linux#18 [ffffa2bfc9ec3ea0] iscsi_target_tx_thread at ffffffffc09a403b [iscsi_target_mod]
 rockchip-linux#19 [ffffa2bfc9ec3f08] kthread at ffffffffa76d8080
 rockchip-linux#20 [ffffa2bfc9ec3f50] ret_from_fork at ffffffffa8200364

Fixes: 36d4cb4 ("scsi: target: Avoid that EXTENDED COPY commands trigger lock inversion")
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Link: https://lore.kernel.org/r/20230918225848.66463-1-junxiao.bi@oracle.com
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants