Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pinctrl: RK3288 different GPIO pin numbers #15

Closed
mcerveny opened this issue May 20, 2017 · 2 comments
Closed

pinctrl: RK3288 different GPIO pin numbers #15

mcerveny opened this issue May 20, 2017 · 2 comments

Comments

@mcerveny
Copy link

mcerveny commented May 20, 2017

Hello.

This kernel is using different pin numbers (better) for GPIO (due to commit 1d9964a) than other kernels (TinkerOS, upstream...). Problem is here "PIN_BANK_IOMUX_FLAGS(0, 32, "gpio0",..." but in other kernels here or here "PIN_BANK_IOMUX_FLAGS(0, 24, "gpio0"...).
This kernel: GPIO5_B2 = 5*32 + ("B") 1*8 + 2 = gpio170
Other kernels: GPIO5_B2 = 5*32 + ("B") 1*8 + 2 - 8 = gpio162 (subtract 8 for >gpio24)

This leads to incompatibility of "/sys/class/gpio/*" and libraries for TinkerBoard GPIO_API_for_C and GPIO_API_for_Python.

How to resolve this ?
Will this commit be pushed to upstream + TinkerOS ?

Thanks, Martin

ref: http://tinkerboarding.co.uk/forum/thread-306-post-916.html#pid916

@wzyy2
Copy link
Contributor

wzyy2 commented May 22, 2017

It seems GPIO0_D0 ~ GPIO0_D7 are in TRM, not sure why they don't add it.

@mmind
304f077#diff-3635f39427067fc9fdb5c98648e2e1a5R1813

wzyy2 pushed a commit that referenced this issue May 24, 2017
[ Upstream commit 45caeaa ]

As Eric Dumazet pointed out this also needs to be fixed in IPv6.
v2: Contains the IPv6 tcp/Ipv6 dccp patches as well.

We have seen a few incidents lately where a dst_enty has been freed
with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
dst_entry. If the conditions/timings are right a crash then ensues when the
freed dst_entry is referenced later on. A Common crashing back trace is:

 #8 [] page_fault at ffffffff8163e648
    [exception RIP: __tcp_ack_snd_check+74]
.
.
 #9 [] tcp_rcv_established at ffffffff81580b64
#10 [] tcp_v4_do_rcv at ffffffff8158b54a
#11 [] tcp_v4_rcv at ffffffff8158cd02
#12 [] ip_local_deliver_finish at ffffffff815668f4
#13 [] ip_local_deliver at ffffffff81566bd9
#14 [] ip_rcv_finish at ffffffff8156656d
#15 [] ip_rcv at ffffffff81566f06
#16 [] __netif_receive_skb_core at ffffffff8152b3a2
#17 [] __netif_receive_skb at ffffffff8152b608
#18 [] netif_receive_skb at ffffffff8152b690
#19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
#20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
#21 [] net_rx_action at ffffffff8152bac2
#22 [] __do_softirq at ffffffff81084b4f
#23 [] call_softirq at ffffffff8164845c
#24 [] do_softirq at ffffffff81016fc5
#25 [] irq_exit at ffffffff81084ee5
#26 [] do_IRQ at ffffffff81648ff8

Of course it may happen with other NIC drivers as well.

It's found the freed dst_entry here:

 224 static bool tcp_in_quickack_mode(struct sock *sk)↩
 225 {↩
 226 ▹       const struct inet_connection_sock *icsk = inet_csk(sk);↩
 227 ▹       const struct dst_entry *dst = __sk_dst_get(sk);↩
 228 ↩
 229 ▹       return (dst && dst_metric(dst, RTAX_QUICKACK)) ||↩
 230 ▹       ▹       (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);↩
 231 }↩

But there are other backtraces attributed to the same freed dst_entry in
netfilter code as well.

All the vmcores showed 2 significant clues:

- Remote hosts behind the default gateway had always been redirected to a
different gateway. A rtable/dst_entry will be added for that host. Making
more dst_entrys with lower reference counts. Making this more probable.

- All vmcores showed a postitive LockDroppedIcmps value, e.g:

LockDroppedIcmps                  267

A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
regardless of whether user space has the socket locked. This can result in a
race condition where the same dst_entry cached in sk->sk_dst_entry can be
decremented twice for the same socket via:

do_redirect()->__sk_dst_check()-> dst_release().

Which leads to the dst_entry being prematurely freed with another socket
pointing to it via sk->sk_dst_cache and a subsequent crash.

To fix this skip do_redirect() if usespace has the socket locked. Instead let
the redirect take place later when user space does not have the socket
locked.

The dccp/IPv6 code is very similar in this respect, so fixing it there too.

As Eric Garver pointed out the following commit now invalidates routes. Which
can set the dst->obsolete flag so that ipv4_dst_check() returns null and
triggers the dst_release().

Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.")
Cc: Eric Garver <egarver@redhat.com>
Cc: Hannes Sowa <hsowa@redhat.com>
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
wzyy2 pushed a commit that referenced this issue May 24, 2017
commit 4dfce57 upstream.

There have been several reports over the years of NULL pointer
dereferences in xfs_trans_log_inode during xfs_fsr processes,
when the process is doing an fput and tearing down extents
on the temporary inode, something like:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
PID: 29439  TASK: ffff880550584fa0  CPU: 6   COMMAND: "xfs_fsr"
    [exception RIP: xfs_trans_log_inode+0x10]
 #9 [ffff8800a57bbbe0] xfs_bunmapi at ffffffffa037398e [xfs]
#10 [ffff8800a57bbce8] xfs_itruncate_extents at ffffffffa0391b29 [xfs]
#11 [ffff8800a57bbd88] xfs_inactive_truncate at ffffffffa0391d0c [xfs]
#12 [ffff8800a57bbdb8] xfs_inactive at ffffffffa0392508 [xfs]
#13 [ffff8800a57bbdd8] xfs_fs_evict_inode at ffffffffa035907e [xfs]
#14 [ffff8800a57bbe00] evict at ffffffff811e1b67
#15 [ffff8800a57bbe28] iput at ffffffff811e23a5
#16 [ffff8800a57bbe58] dentry_kill at ffffffff811dcfc8
#17 [ffff8800a57bbe88] dput at ffffffff811dd06c
#18 [ffff8800a57bbea8] __fput at ffffffff811c823b
#19 [ffff8800a57bbef0] ____fput at ffffffff811c846e
#20 [ffff8800a57bbf00] task_work_run at ffffffff81093b27
#21 [ffff8800a57bbf30] do_notify_resume at ffffffff81013b0c
#22 [ffff8800a57bbf50] int_signal at ffffffff8161405d

As it turns out, this is because the i_itemp pointer, along
with the d_ops pointer, has been overwritten with zeros
when we tear down the extents during truncate.  When the in-core
inode fork on the temporary inode used by xfs_fsr was originally
set up during the extent swap, we mistakenly looked at di_nextents
to determine whether all extents fit inline, but this misses extents
generated by speculative preallocation; we should be using if_bytes
instead.

This mistake corrupts the in-memory inode, and code in
xfs_iext_remove_inline eventually gets bad inputs, causing
it to memmove and memset incorrect ranges; this became apparent
because the two values in ifp->if_u2.if_inline_ext[1] contained
what should have been in d_ops and i_itemp; they were memmoved due
to incorrect array indexing and then the original locations
were zeroed with memset, again due to an array overrun.

Fix this by properly using i_df.if_bytes to determine the number
of extents, not di_nextents.

Thanks to dchinner for looking at this with me and spotting the
root cause.

[nborisov: backported to 4.4]

Cc: stable@vger.kernel.org
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
--
 fs/xfs/xfs_bmap_util.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)
@wzyy2
Copy link
Contributor

wzyy2 commented Jun 15, 2017

It's a mistake and it will be deleted.

@wzyy2 wzyy2 closed this as completed Jun 15, 2017
wzyy2 pushed a commit that referenced this issue Sep 22, 2017
commit 2474623 upstream.

When a process runs out of stack the parisc kernel wrongly faults with SIGBUS
instead of the expected SIGSEGV signal.

This example shows how the kernel faults:
do_page_fault() command='a.out' type=15 address=0xfaac2000 in libc-2.24.so[f8308000+16c000]
trap #15: Data TLB miss fault, vm_start = 0xfa2c2000, vm_end = 0xfaac2000

The vma->vm_end value is the first address which does not belong to the vma, so
adjust the check to include vma->vm_end to the range for which to send the
SIGSEGV signal.

This patch unbreaks building the debian libsigsegv package.

Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
wzyy2 pushed a commit that referenced this issue Jan 30, 2018
[ Upstream commit 293d264 ]

drv->cpumask defaults to cpu_possible_mask in __cpuidle_driver_init().
On PowerNV platform cpu_present could be less than cpu_possible in cases
where firmware detects the cpu, but it is not available to the OS.  When
CONFIG_HOTPLUG_CPU=n, such cpus are not hotplugable at runtime and hence
we skip creating cpu_device.

This breaks cpuidle on powernv where register_cpu() is not called for
cpus in cpu_possible_mask that cannot be hot-added at runtime.

Trying cpuidle_register_device() on cpu without cpu_device will cause
crash like this:

cpu 0xf: Vector: 380 (Data SLB Access) at [c000000ff1503490]
    pc: c00000000022c8bc: string+0x34/0x60
    lr: c00000000022ed78: vsnprintf+0x284/0x42c
    sp: c000000ff1503710
   msr: 9000000000009033
   dar: 6000000060000000
  current = 0xc000000ff1480000
  paca    = 0xc00000000fe82d00   softe: 0        irq_happened: 0x01
    pid   = 1, comm = swapper/8
Linux version 4.11.0-rc2 (sv@sagarika) (gcc version 4.9.4
(Buildroot 2017.02-00004-gc28573e) ) #15 SMP Fri Mar 17 19:32:02 IST 2017
enter ? for help
[link register   ] c00000000022ed78 vsnprintf+0x284/0x42c
[c000000ff1503710] c00000000022ebb8 vsnprintf+0xc4/0x42c (unreliable)
[c000000ff1503800] c00000000022ef40 vscnprintf+0x20/0x44
[c000000ff1503830] c0000000000ab61c vprintk_emit+0x94/0x2cc
[c000000ff15038a0] c0000000000acc9c vprintk_func+0x60/0x74
[c000000ff15038c0] c000000000619694 printk+0x38/0x4c
[c000000ff15038e0] c000000000224950 kobject_get+0x40/0x60
[c000000ff1503950] c00000000022507c kobject_add_internal+0x60/0x2c4
[c000000ff15039e0] c000000000225350 kobject_init_and_add+0x70/0x78
[c000000ff1503a60] c00000000053c288 cpuidle_add_sysfs+0x9c/0xe0
[c000000ff1503ae0] c00000000053aeac cpuidle_register_device+0xd4/0x12c
[c000000ff1503b30] c00000000053b108 cpuidle_register+0x98/0xcc
[c000000ff1503bc0] c00000000085eaf0 powernv_processor_idle_init+0x140/0x1e0
[c000000ff1503c60] c00000000000cd60 do_one_initcall+0xc0/0x15c
[c000000ff1503d20] c000000000833e84 kernel_init_freeable+0x1a0/0x25c
[c000000ff1503dc0] c00000000000d478 kernel_init+0x24/0x12c
[c000000ff1503e30] c00000000000b564 ret_from_kernel_thread+0x5c/0x78

This patch fixes the bug by passing correct cpumask from
powernv-cpuidle driver.

Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
[ rjw: Comment massage ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
rkchrome pushed a commit that referenced this issue Jul 19, 2018
[ Upstream commit 2bbea6e ]

when mounting an ISO filesystem sometimes (very rarely)
the system hangs because of a race condition between two tasks.

PID: 6766   TASK: ffff88007b2a6dd0  CPU: 0   COMMAND: "mount"
 #0 [ffff880078447ae0] __schedule at ffffffff8168d605
 #1 [ffff880078447b48] schedule_preempt_disabled at ffffffff8168ed49
 #2 [ffff880078447b58] __mutex_lock_slowpath at ffffffff8168c995
 #3 [ffff880078447bb8] mutex_lock at ffffffff8168bdef
 #4 [ffff880078447bd0] sr_block_ioctl at ffffffffa00b6818 [sr_mod]
 #5 [ffff880078447c10] blkdev_ioctl at ffffffff812fea50
 #6 [ffff880078447c70] ioctl_by_bdev at ffffffff8123a8b3
 #7 [ffff880078447c90] isofs_fill_super at ffffffffa04fb1e1 [isofs]
 #8 [ffff880078447da8] mount_bdev at ffffffff81202570
 #9 [ffff880078447e18] isofs_mount at ffffffffa04f9828 [isofs]
#10 [ffff880078447e28] mount_fs at ffffffff81202d09
#11 [ffff880078447e70] vfs_kern_mount at ffffffff8121ea8f
#12 [ffff880078447ea8] do_mount at ffffffff81220fee
#13 [ffff880078447f28] sys_mount at ffffffff812218d6
#14 [ffff880078447f80] system_call_fastpath at ffffffff81698c49
    RIP: 00007fd9ea914e9a  RSP: 00007ffd5d9bf648  RFLAGS: 00010246
    RAX: 00000000000000a5  RBX: ffffffff81698c49  RCX: 0000000000000010
    RDX: 00007fd9ec2bc210  RSI: 00007fd9ec2bc290  RDI: 00007fd9ec2bcf30
    RBP: 0000000000000000   R8: 0000000000000000   R9: 0000000000000010
    R10: 00000000c0ed0001  R11: 0000000000000206  R12: 00007fd9ec2bc040
    R13: 00007fd9eb6b2380  R14: 00007fd9ec2bc210  R15: 00007fd9ec2bcf30
    ORIG_RAX: 00000000000000a5  CS: 0033  SS: 002b

This task was trying to mount the cdrom.  It allocated and configured a
super_block struct and owned the write-lock for the super_block->s_umount
rwsem. While exclusively owning the s_umount lock, it called
sr_block_ioctl and waited to acquire the global sr_mutex lock.

PID: 6785   TASK: ffff880078720fb0  CPU: 0   COMMAND: "systemd-udevd"
 #0 [ffff880078417898] __schedule at ffffffff8168d605
 #1 [ffff880078417900] schedule at ffffffff8168dc59
 #2 [ffff880078417910] rwsem_down_read_failed at ffffffff8168f605
 #3 [ffff880078417980] call_rwsem_down_read_failed at ffffffff81328838
 #4 [ffff8800784179d0] down_read at ffffffff8168cde0
 #5 [ffff8800784179e8] get_super at ffffffff81201cc7
 #6 [ffff880078417a10] __invalidate_device at ffffffff8123a8de
 #7 [ffff880078417a40] flush_disk at ffffffff8123a94b
 #8 [ffff880078417a88] check_disk_change at ffffffff8123ab50
 #9 [ffff880078417ab0] cdrom_open at ffffffffa00a29e1 [cdrom]
#10 [ffff880078417b68] sr_block_open at ffffffffa00b6f9b [sr_mod]
#11 [ffff880078417b98] __blkdev_get at ffffffff8123ba86
#12 [ffff880078417bf0] blkdev_get at ffffffff8123bd65
#13 [ffff880078417c78] blkdev_open at ffffffff8123bf9b
#14 [ffff880078417c90] do_dentry_open at ffffffff811fc7f7
#15 [ffff880078417cd8] vfs_open at ffffffff811fc9cf
#16 [ffff880078417d00] do_last at ffffffff8120d53d
#17 [ffff880078417db0] path_openat at ffffffff8120e6b2
#18 [ffff880078417e48] do_filp_open at ffffffff8121082b
#19 [ffff880078417f18] do_sys_open at ffffffff811fdd33
#20 [ffff880078417f70] sys_open at ffffffff811fde4e
#21 [ffff880078417f80] system_call_fastpath at ffffffff81698c49
    RIP: 00007f29438b0c20  RSP: 00007ffc76624b78  RFLAGS: 00010246
    RAX: 0000000000000002  RBX: ffffffff81698c49  RCX: 0000000000000000
    RDX: 00007f2944a5fa70  RSI: 00000000000a0800  RDI: 00007f2944a5fa70
    RBP: 00007f2944a5f540   R8: 0000000000000000   R9: 0000000000000020
    R10: 00007f2943614c40  R11: 0000000000000246  R12: ffffffff811fde4e
    R13: ffff880078417f78  R14: 000000000000000c  R15: 00007f2944a4b010
    ORIG_RAX: 0000000000000002  CS: 0033  SS: 002b

This task tried to open the cdrom device, the sr_block_open function
acquired the global sr_mutex lock. The call to check_disk_change()
then saw an event flag indicating a possible media change and tried
to flush any cached data for the device.
As part of the flush, it tried to acquire the super_block->s_umount
lock associated with the cdrom device.
This was the same super_block as created and locked by the previous task.

The first task acquires the s_umount lock and then the sr_mutex_lock;
the second task acquires the sr_mutex_lock and then the s_umount lock.

This patch fixes the issue by moving check_disk_change() out of
cdrom_open() and let the caller take care of it.

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
rkchrome pushed a commit that referenced this issue Sep 26, 2018
[ Upstream commit 2bbea6e ]

when mounting an ISO filesystem sometimes (very rarely)
the system hangs because of a race condition between two tasks.

PID: 6766   TASK: ffff88007b2a6dd0  CPU: 0   COMMAND: "mount"
 #0 [ffff880078447ae0] __schedule at ffffffff8168d605
 #1 [ffff880078447b48] schedule_preempt_disabled at ffffffff8168ed49
 #2 [ffff880078447b58] __mutex_lock_slowpath at ffffffff8168c995
 #3 [ffff880078447bb8] mutex_lock at ffffffff8168bdef
 #4 [ffff880078447bd0] sr_block_ioctl at ffffffffa00b6818 [sr_mod]
 #5 [ffff880078447c10] blkdev_ioctl at ffffffff812fea50
 #6 [ffff880078447c70] ioctl_by_bdev at ffffffff8123a8b3
 #7 [ffff880078447c90] isofs_fill_super at ffffffffa04fb1e1 [isofs]
 #8 [ffff880078447da8] mount_bdev at ffffffff81202570
 #9 [ffff880078447e18] isofs_mount at ffffffffa04f9828 [isofs]
#10 [ffff880078447e28] mount_fs at ffffffff81202d09
#11 [ffff880078447e70] vfs_kern_mount at ffffffff8121ea8f
#12 [ffff880078447ea8] do_mount at ffffffff81220fee
#13 [ffff880078447f28] sys_mount at ffffffff812218d6
#14 [ffff880078447f80] system_call_fastpath at ffffffff81698c49
    RIP: 00007fd9ea914e9a  RSP: 00007ffd5d9bf648  RFLAGS: 00010246
    RAX: 00000000000000a5  RBX: ffffffff81698c49  RCX: 0000000000000010
    RDX: 00007fd9ec2bc210  RSI: 00007fd9ec2bc290  RDI: 00007fd9ec2bcf30
    RBP: 0000000000000000   R8: 0000000000000000   R9: 0000000000000010
    R10: 00000000c0ed0001  R11: 0000000000000206  R12: 00007fd9ec2bc040
    R13: 00007fd9eb6b2380  R14: 00007fd9ec2bc210  R15: 00007fd9ec2bcf30
    ORIG_RAX: 00000000000000a5  CS: 0033  SS: 002b

This task was trying to mount the cdrom.  It allocated and configured a
super_block struct and owned the write-lock for the super_block->s_umount
rwsem. While exclusively owning the s_umount lock, it called
sr_block_ioctl and waited to acquire the global sr_mutex lock.

PID: 6785   TASK: ffff880078720fb0  CPU: 0   COMMAND: "systemd-udevd"
 #0 [ffff880078417898] __schedule at ffffffff8168d605
 #1 [ffff880078417900] schedule at ffffffff8168dc59
 #2 [ffff880078417910] rwsem_down_read_failed at ffffffff8168f605
 #3 [ffff880078417980] call_rwsem_down_read_failed at ffffffff81328838
 #4 [ffff8800784179d0] down_read at ffffffff8168cde0
 #5 [ffff8800784179e8] get_super at ffffffff81201cc7
 #6 [ffff880078417a10] __invalidate_device at ffffffff8123a8de
 #7 [ffff880078417a40] flush_disk at ffffffff8123a94b
 #8 [ffff880078417a88] check_disk_change at ffffffff8123ab50
 #9 [ffff880078417ab0] cdrom_open at ffffffffa00a29e1 [cdrom]
#10 [ffff880078417b68] sr_block_open at ffffffffa00b6f9b [sr_mod]
#11 [ffff880078417b98] __blkdev_get at ffffffff8123ba86
#12 [ffff880078417bf0] blkdev_get at ffffffff8123bd65
#13 [ffff880078417c78] blkdev_open at ffffffff8123bf9b
#14 [ffff880078417c90] do_dentry_open at ffffffff811fc7f7
#15 [ffff880078417cd8] vfs_open at ffffffff811fc9cf
#16 [ffff880078417d00] do_last at ffffffff8120d53d
#17 [ffff880078417db0] path_openat at ffffffff8120e6b2
#18 [ffff880078417e48] do_filp_open at ffffffff8121082b
#19 [ffff880078417f18] do_sys_open at ffffffff811fdd33
#20 [ffff880078417f70] sys_open at ffffffff811fde4e
#21 [ffff880078417f80] system_call_fastpath at ffffffff81698c49
    RIP: 00007f29438b0c20  RSP: 00007ffc76624b78  RFLAGS: 00010246
    RAX: 0000000000000002  RBX: ffffffff81698c49  RCX: 0000000000000000
    RDX: 00007f2944a5fa70  RSI: 00000000000a0800  RDI: 00007f2944a5fa70
    RBP: 00007f2944a5f540   R8: 0000000000000000   R9: 0000000000000020
    R10: 00007f2943614c40  R11: 0000000000000246  R12: ffffffff811fde4e
    R13: ffff880078417f78  R14: 000000000000000c  R15: 00007f2944a4b010
    ORIG_RAX: 0000000000000002  CS: 0033  SS: 002b

This task tried to open the cdrom device, the sr_block_open function
acquired the global sr_mutex lock. The call to check_disk_change()
then saw an event flag indicating a possible media change and tried
to flush any cached data for the device.
As part of the flush, it tried to acquire the super_block->s_umount
lock associated with the cdrom device.
This was the same super_block as created and locked by the previous task.

The first task acquires the s_umount lock and then the sr_mutex_lock;
the second task acquires the sr_mutex_lock and then the s_umount lock.

This patch fixes the issue by moving check_disk_change() out of
cdrom_open() and let the caller take care of it.

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
0lvin referenced this issue in free-z4u/roc-rk3328-cc-official Feb 2, 2019
Don't free the object until the file handle has been closed.  Fixes
use-after-free bug which occurs when I disconnect my DVB-S received
while VDR is running.

This is a crash dump of such a use-after-free:

    general protection fault: 0000 [FireflyTeam#1] SMP
    CPU: 0 PID: 2541 Comm: CI adapter on d Not tainted 4.7.0-rc1-hosting+ rockchip-linux#49
    Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
    task: ffff880027d7ce00 ti: ffff88003d8f8000 task.ti: ffff88003d8f8000
    RIP: 0010:[<ffffffff812f3d1f>]  [<ffffffff812f3d1f>] dvb_ca_en50221_io_read_condition.isra.7+0x6f/0x150
    RSP: 0018:ffff88003d8fba98  EFLAGS: 00010206
    RAX: 0000000059534255 RBX: 000000753d470f90 RCX: ffff88003c74d181
    RDX: 00000001bea04ba9 RSI: ffff88003d8fbaf4 RDI: 3a3030a56d763fc0
    RBP: ffff88003d8fbae0 R08: ffff88003c74d180 R09: 0000000000000000
    R10: 0000000000000001 R11: 0000000000000000 R12: ffff88003c480e00
    R13: 00000000ffffffff R14: 0000000059534255 R15: 0000000000000000
    FS:  00007fb4209b4700(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007f06445f4078 CR3: 000000003c55b000 CR4: 00000000000006b0
    Stack:
     ffff88003d8fbaf4 000000003c2170c0 0000000000004000 0000000000000000
     ffff88003c480e00 ffff88003d8fbc80 ffff88003c74d180 ffff88003d8fbb8c
     0000000000000000 ffff88003d8fbb10 ffffffff812f3e37 ffff88003d8fbb00
    Call Trace:
     [<ffffffff812f3e37>] dvb_ca_en50221_io_poll+0x37/0xa0
     [<ffffffff8113109b>] do_sys_poll+0x2db/0x520

This is a backtrace of the kernel attempting to lock a freed mutex:

    #0  0xffffffff81083d40 in rep_nop () at ./arch/x86/include/asm/processor.h:569
    FireflyTeam#1  cpu_relax () at ./arch/x86/include/asm/processor.h:574
    FireflyTeam#2  virt_spin_lock (lock=<optimized out>) at ./arch/x86/include/asm/qspinlock.h:57
    FireflyTeam#3  native_queued_spin_lock_slowpath (lock=0xffff88003c480e90, val=761492029) at kernel/locking/qspinlock.c:304
    FireflyTeam#4  0xffffffff810d1a06 in pv_queued_spin_lock_slowpath (val=<optimized out>, lock=<optimized out>) at ./arch/x86/include/asm/paravirt.h:669
    FireflyTeam#5  queued_spin_lock_slowpath (val=<optimized out>, lock=<optimized out>) at ./arch/x86/include/asm/qspinlock.h:28
    FireflyTeam#6  queued_spin_lock (lock=<optimized out>) at include/asm-generic/qspinlock.h:107
    FireflyTeam#7  __mutex_lock_common (use_ww_ctx=<optimized out>, ww_ctx=<optimized out>, ip=<optimized out>, nest_lock=<optimized out>, subclass=<optimized out>,
        state=<optimized out>, lock=<optimized out>) at kernel/locking/mutex.c:526
    FireflyTeam#8  mutex_lock_interruptible_nested (lock=0xffff88003c480e88, subclass=<optimized out>) at kernel/locking/mutex.c:647
    FireflyTeam#9  0xffffffff812f49fe in dvb_ca_en50221_io_do_ioctl (file=<optimized out>, cmd=761492029, parg=0x1 <irq_stack_union+1>)
        at drivers/media/dvb-core/dvb_ca_en50221.c:1210
    FireflyTeam#10 0xffffffff812ee660 in dvb_usercopy (file=<optimized out>, cmd=761492029, arg=<optimized out>, func=<optimized out>) at drivers/media/dvb-core/dvbdev.c:883
    FireflyTeam#11 0xffffffff812f3410 in dvb_ca_en50221_io_ioctl (file=<optimized out>, cmd=<optimized out>, arg=<optimized out>) at drivers/media/dvb-core/dvb_ca_en50221.c:1284
    FireflyTeam#12 0xffffffff8112eddd in vfs_ioctl (arg=<optimized out>, cmd=<optimized out>, filp=<optimized out>) at fs/ioctl.c:43
    FireflyTeam#13 do_vfs_ioctl (filp=0xffff88003c480e90, fd=<optimized out>, cmd=<optimized out>, arg=<optimized out>) at fs/ioctl.c:674
    FireflyTeam#14 0xffffffff8112f30c in SYSC_ioctl (arg=<optimized out>, cmd=<optimized out>, fd=<optimized out>) at fs/ioctl.c:689
    FireflyTeam#15 SyS_ioctl (fd=6, cmd=2148298626, arg=140734533693696) at fs/ioctl.c:680
    FireflyTeam#16 0xffffffff8103feb2 in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:207

Signed-off-by: Max Kellermann <max@duempel.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
0lvin referenced this issue in free-z4u/roc-rk3328-cc-official Feb 24, 2019
Done, because line6_stream_stop() locks and calls line6_unlink_audio_urbs(),
which in turn invokes audio_out_callback(), which tries to lock 2nd time.

Fixes:

=============================================
[ INFO: possible recursive locking detected ]
4.4.15+ FireflyTeam#15 Not tainted
---------------------------------------------
mplayer/3591 is trying to acquire lock:
 (&(&line6pcm->out.lock)->rlock){-.-...}, at: [<bfa27655>] audio_out_callback+0x70/0x110 [snd_usb_line6]

but task is already holding lock:
 (&(&line6pcm->out.lock)->rlock){-.-...}, at: [<bfa26aad>] line6_stream_stop+0x24/0x5c [snd_usb_line6]

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(&line6pcm->out.lock)->rlock);
  lock(&(&line6pcm->out.lock)->rlock);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

3 locks held by mplayer/3591:
 #0:  (snd_pcm_link_rwlock){.-.-..}, at: [<bf8d49a7>] snd_pcm_stream_lock+0x1e/0x40 [snd_pcm]
 FireflyTeam#1:  (&(&substream->self_group.lock)->rlock){-.-...}, at: [<bf8d49af>] snd_pcm_stream_lock+0x26/0x40 [snd_pcm]
 FireflyTeam#2:  (&(&line6pcm->out.lock)->rlock){-.-...}, at: [<bfa26aad>] line6_stream_stop+0x24/0x5c [snd_usb_line6]

stack backtrace:
CPU: 0 PID: 3591 Comm: mplayer Not tainted 4.4.15+ FireflyTeam#15
Hardware name: Generic AM33XX (Flattened Device Tree)
[<c0015d85>] (unwind_backtrace) from [<c001253d>] (show_stack+0x11/0x14)
[<c001253d>] (show_stack) from [<c02f1bdf>] (dump_stack+0x8b/0xac)
[<c02f1bdf>] (dump_stack) from [<c0076f43>] (__lock_acquire+0xc8b/0x1780)
[<c0076f43>] (__lock_acquire) from [<c007810d>] (lock_acquire+0x99/0x1c0)
[<c007810d>] (lock_acquire) from [<c06171e7>] (_raw_spin_lock_irqsave+0x3f/0x4c)
[<c06171e7>] (_raw_spin_lock_irqsave) from [<bfa27655>] (audio_out_callback+0x70/0x110 [snd_usb_line6])
[<bfa27655>] (audio_out_callback [snd_usb_line6]) from [<c04294db>] (__usb_hcd_giveback_urb+0x53/0xd0)
[<c04294db>] (__usb_hcd_giveback_urb) from [<c046388d>] (musb_giveback+0x3d/0x98)
[<c046388d>] (musb_giveback) from [<c04647f5>] (musb_urb_dequeue+0x6d/0x114)
[<c04647f5>] (musb_urb_dequeue) from [<c042ac11>] (usb_hcd_unlink_urb+0x39/0x98)
[<c042ac11>] (usb_hcd_unlink_urb) from [<bfa26a87>] (line6_unlink_audio_urbs+0x6a/0x6c [snd_usb_line6])
[<bfa26a87>] (line6_unlink_audio_urbs [snd_usb_line6]) from [<bfa26acb>] (line6_stream_stop+0x42/0x5c [snd_usb_line6])
[<bfa26acb>] (line6_stream_stop [snd_usb_line6]) from [<bfa26fe7>] (snd_line6_trigger+0xb6/0xf4 [snd_usb_line6])
[<bfa26fe7>] (snd_line6_trigger [snd_usb_line6]) from [<bf8d47b7>] (snd_pcm_do_stop+0x36/0x38 [snd_pcm])
[<bf8d47b7>] (snd_pcm_do_stop [snd_pcm]) from [<bf8d462f>] (snd_pcm_action_single+0x22/0x40 [snd_pcm])
[<bf8d462f>] (snd_pcm_action_single [snd_pcm]) from [<bf8d46f9>] (snd_pcm_action+0xac/0xb0 [snd_pcm])
[<bf8d46f9>] (snd_pcm_action [snd_pcm]) from [<bf8d4b61>] (snd_pcm_drop+0x38/0x64 [snd_pcm])
[<bf8d4b61>] (snd_pcm_drop [snd_pcm]) from [<bf8d6233>] (snd_pcm_common_ioctl1+0x7fe/0xbe8 [snd_pcm])
[<bf8d6233>] (snd_pcm_common_ioctl1 [snd_pcm]) from [<bf8d6779>] (snd_pcm_playback_ioctl1+0x15c/0x51c [snd_pcm])
[<bf8d6779>] (snd_pcm_playback_ioctl1 [snd_pcm]) from [<bf8d6b59>] (snd_pcm_playback_ioctl+0x20/0x28 [snd_pcm])
[<bf8d6b59>] (snd_pcm_playback_ioctl [snd_pcm]) from [<c016714b>] (do_vfs_ioctl+0x3af/0x5c8)

Fixes: 63e20df ('ALSA: line6: Reorganize PCM stream handling')
Cc: <stable@vger.kernel.org> # v4.0+
Reviewed-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Andrej Krutak <dev@andree.sk>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
0lvin referenced this issue in free-z4u/roc-rk3328-cc-official Feb 28, 2019
rsc_lookup steals the passed-in memory to avoid doing an allocation of
its own, so we can't just pass in a pointer to memory that someone else
is using.

If we really want to avoid allocation there then maybe we should
preallocate somwhere, or reference count these handles.

For now we should revert.

On occasion I see this on my server:

kernel: kernel BUG at /home/cel/src/linux/linux-2.6/mm/slub.c:3851!
kernel: invalid opcode: 0000 [FireflyTeam#1] SMP
kernel: Modules linked in: cts rpcsec_gss_krb5 sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd btrfs xor iTCO_wdt iTCO_vendor_support raid6_pq pcspkr i2c_i801 i2c_smbus lpc_ich mfd_core mei_me sg mei shpchp wmi ioatdma ipmi_si ipmi_msghandler acpi_pad acpi_power_meter rpcrdma ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm nfsd nfs_acl lockd grace auth_rpcgss sunrpc ip_tables xfs libcrc32c mlx4_ib mlx4_en ib_core sr_mod cdrom sd_mod ast drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel igb mlx4_core ahci libahci libata ptp pps_core dca i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod
kernel: CPU: 7 PID: 145 Comm: kworker/7:2 Not tainted 4.8.0-rc4-00006-g9d06b0b FireflyTeam#15
kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015
kernel: Workqueue: events do_cache_clean [sunrpc]
kernel: task: ffff8808541d8000 task.stack: ffff880854344000
kernel: RIP: 0010:[<ffffffff811e7075>]  [<ffffffff811e7075>] kfree+0x155/0x180
kernel: RSP: 0018:ffff880854347d70  EFLAGS: 00010246
kernel: RAX: ffffea0020fe7660 RBX: ffff88083f9db064 RCX: 146ff0f9d5ec5600
kernel: RDX: 000077ff80000000 RSI: ffff880853f01500 RDI: ffff88083f9db064
kernel: RBP: ffff880854347d88 R08: ffff8808594ee000 R09: ffff88087fdd8780
kernel: R10: 0000000000000000 R11: ffffea0020fe76c0 R12: ffff880853f01500
kernel: R13: ffffffffa013cf76 R14: ffffffffa013cff0 R15: ffffffffa04253a0
kernel: FS:  0000000000000000(0000) GS:ffff88087fdc0000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 00007fed60b020c3 CR3: 0000000001c06000 CR4: 00000000001406e0
kernel: Stack:
kernel: ffff8808589f2f00 ffff880853f01500 0000000000000001 ffff880854347da0
kernel: ffffffffa013cf76 ffff8808589f2f00 ffff880854347db8 ffffffffa013d006
kernel: ffff8808589f2f20 ffff880854347e00 ffffffffa0406f60 0000000057c7044f
kernel: Call Trace:
kernel: [<ffffffffa013cf76>] rsc_free+0x16/0x90 [auth_rpcgss]
kernel: [<ffffffffa013d006>] rsc_put+0x16/0x30 [auth_rpcgss]
kernel: [<ffffffffa0406f60>] cache_clean+0x2e0/0x300 [sunrpc]
kernel: [<ffffffffa04073ee>] do_cache_clean+0xe/0x70 [sunrpc]
kernel: [<ffffffff8109a70f>] process_one_work+0x1ff/0x3b0
kernel: [<ffffffff8109b15c>] worker_thread+0x2bc/0x4a0
kernel: [<ffffffff8109aea0>] ? rescuer_thread+0x3a0/0x3a0
kernel: [<ffffffff810a0ba4>] kthread+0xe4/0xf0
kernel: [<ffffffff8169c47f>] ret_from_fork+0x1f/0x40
kernel: [<ffffffff810a0ac0>] ? kthread_stop+0x110/0x110
kernel: Code: f7 ff ff eb 3b 65 8b 05 da 30 e2 7e 89 c0 48 0f a3 05 a0 38 b8 00 0f 92 c0 84 c0 0f 85 d1 fe ff ff 0f 1f 44 00 00 e9 f5 fe ff ff <0f> 0b 49 8b 03 31 f6 f6 c4 40 0f 85 62 ff ff ff e9 61 ff ff ff
kernel: RIP  [<ffffffff811e7075>] kfree+0x155/0x180
kernel: RSP <ffff880854347d70>
kernel: ---[ end trace 3fdec044969def26 ]---

It seems to be most common after a server reboot where a client has been
using a Kerberos mount, and reconnects to continue its workload.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
0lvin referenced this issue in free-z4u/roc-rk3328-cc-official Apr 27, 2019
Function ib_create_qp() was failing to return an error when
rdma_rw_init_mrs() fails, causing a crash further down in ib_create_qp()
when trying to dereferece the qp pointer which was actually a negative
errno.

The crash:

crash> log|grep BUG
[  136.458121] BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
crash> bt
PID: 3736   TASK: ffff8808543215c0  CPU: 2   COMMAND: "kworker/u64:2"
 #0 [ffff88084d323340] machine_kexec at ffffffff8105fbb0
 FireflyTeam#1 [ffff88084d3233b0] __crash_kexec at ffffffff81116758
 FireflyTeam#2 [ffff88084d323480] crash_kexec at ffffffff8111682d
 FireflyTeam#3 [ffff88084d3234b0] oops_end at ffffffff81032bd6
 FireflyTeam#4 [ffff88084d3234e0] no_context at ffffffff8106e431
 FireflyTeam#5 [ffff88084d323530] __bad_area_nosemaphore at ffffffff8106e610
 FireflyTeam#6 [ffff88084d323590] bad_area_nosemaphore at ffffffff8106e6f4
 FireflyTeam#7 [ffff88084d3235a0] __do_page_fault at ffffffff8106ebdc
 FireflyTeam#8 [ffff88084d323620] do_page_fault at ffffffff8106f057
 FireflyTeam#9 [ffff88084d323660] page_fault at ffffffff816e3148
    [exception RIP: ib_create_qp+427]
    RIP: ffffffffa02554fb  RSP: ffff88084d323718  RFLAGS: 00010246
    RAX: 0000000000000004  RBX: fffffffffffffff4  RCX: 000000018020001f
    RDX: ffff880830997fc0  RSI: 0000000000000001  RDI: ffff88085f407200
    RBP: ffff88084d323778   R8: 0000000000000001   R9: ffffea0020bae210
    R10: ffffea0020bae218  R11: 0000000000000001  R12: ffff88084d3237c8
    R13: 00000000fffffff4  R14: ffff880859fa5000  R15: ffff88082eb89800
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
FireflyTeam#10 [ffff88084d323780] rdma_create_qp at ffffffffa0782681 [rdma_cm]
FireflyTeam#11 [ffff88084d3237b0] nvmet_rdma_create_queue_ib at ffffffffa07c43f3 [nvmet_rdma]
FireflyTeam#12 [ffff88084d323860] nvmet_rdma_alloc_queue at ffffffffa07c5ba9 [nvmet_rdma]
FireflyTeam#13 [ffff88084d323900] nvmet_rdma_queue_connect at ffffffffa07c5c96 [nvmet_rdma]
FireflyTeam#14 [ffff88084d323980] nvmet_rdma_cm_handler at ffffffffa07c6450 [nvmet_rdma]
FireflyTeam#15 [ffff88084d3239b0] iw_conn_req_handler at ffffffffa0787480 [rdma_cm]
FireflyTeam#16 [ffff88084d323a60] cm_conn_req_handler at ffffffffa0775f06 [iw_cm]
rockchip-linux#17 [ffff88084d323ab0] process_event at ffffffffa0776019 [iw_cm]
rockchip-linux#18 [ffff88084d323af0] cm_work_handler at ffffffffa0776170 [iw_cm]
rockchip-linux#19 [ffff88084d323cb0] process_one_work at ffffffff810a1483
rockchip-linux#20 [ffff88084d323d90] worker_thread at ffffffff810a211d
rockchip-linux#21 [ffff88084d323ec0] kthread at ffffffff810a6c5c
rockchip-linux#22 [ffff88084d323f50] ret_from_fork at ffffffff816e1ebf

Fixes: 632bc3f ("IB/core, RDMA RW API: Do not exceed QP SGE send limit")
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Cc: stable@vger.kernel.org
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
0lvin referenced this issue in free-z4u/roc-rk3328-cc-official May 1, 2019
Since commit c32b5bc ("ARM: dts: at91: Fix USB endpoint nodes"),
atmel_usba_udc fails with:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at include/linux/usb/gadget.h:405
ecm_do_notify+0x188/0x1a0
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 4.7.0+ FireflyTeam#15
Hardware name: Atmel SAMA5
[<c010ccfc>] (unwind_backtrace) from [<c010a7ec>] (show_stack+0x10/0x14)
[<c010a7ec>] (show_stack) from [<c0115c10>] (__warn+0xe4/0xfc)
[<c0115c10>] (__warn) from [<c0115cd8>] (warn_slowpath_null+0x20/0x28)
[<c0115cd8>] (warn_slowpath_null) from [<c04377ac>] (ecm_do_notify+0x188/0x1a0)
[<c04377ac>] (ecm_do_notify) from [<c04379a4>] (ecm_set_alt+0x74/0x1ac)
[<c04379a4>] (ecm_set_alt) from [<c042f74c>] (composite_setup+0xfc0/0x19f8)
[<c042f74c>] (composite_setup) from [<c04356e8>] (usba_udc_irq+0x8f4/0xd9c)
[<c04356e8>] (usba_udc_irq) from [<c013ec9c>] (handle_irq_event_percpu+0x9c/0x158)
[<c013ec9c>] (handle_irq_event_percpu) from [<c013ed80>] (handle_irq_event+0x28/0x3c)
[<c013ed80>] (handle_irq_event) from [<c01416d4>] (handle_fasteoi_irq+0xa0/0x168)
[<c01416d4>] (handle_fasteoi_irq) from [<c013e3f8>] (generic_handle_irq+0x24/0x34)
[<c013e3f8>] (generic_handle_irq) from [<c013e640>] (__handle_domain_irq+0x54/0xa8)
[<c013e640>] (__handle_domain_irq) from [<c010b214>] (__irq_svc+0x54/0x70)
[<c010b214>] (__irq_svc) from [<c0107eb0>] (arch_cpu_idle+0x38/0x3c)
[<c0107eb0>] (arch_cpu_idle) from [<c0137300>] (cpu_startup_entry+0x9c/0xdc)
[<c0137300>] (cpu_startup_entry) from [<c0900c40>] (start_kernel+0x354/0x360)
[<c0900c40>] (start_kernel) from [<20008078>] (0x20008078)
---[ end trace e7cf9dcebf4815a6 ]---

Fixes: c32b5bc ("ARM: dts: at91: Fix USB endpoint nodes")
Cc: <stable@vger.kernel.org>
Reported-by: Richard Genoud <richard.genoud@gmail.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
0lvin referenced this issue in free-z4u/roc-rk3328-cc-official May 3, 2019
Prior to commit c0371da ("put iov_iter into msghdr") in v3.19, there
was no check that the iovec contained enough bytes for an ICMP header,
and the read loop would walk across neighboring stack contents. Since the
iov_iter conversion, bad arguments are noticed, but the returned error is
EFAULT. Returning EINVAL is a clearer error and also solves the problem
prior to v3.19.

This was found using trinity with KASAN on v3.18:

BUG: KASAN: stack-out-of-bounds in memcpy_fromiovec+0x60/0x114 at addr ffffffc071077da0
Read of size 8 by task trinity-c2/9623
page:ffffffbe034b9a08 count:0 mapcount:0 mapping:          (null) index:0x0
flags: 0x0()
page dumped because: kasan: bad access detected
CPU: 0 PID: 9623 Comm: trinity-c2 Tainted: G    BU         3.18.0-dirty FireflyTeam#15
Hardware name: Google Tegra210 Smaug Rev 1,3+ (DT)
Call trace:
[<ffffffc000209c98>] dump_backtrace+0x0/0x1ac arch/arm64/kernel/traps.c:90
[<ffffffc000209e54>] show_stack+0x10/0x1c arch/arm64/kernel/traps.c:171
[<     inline     >] __dump_stack lib/dump_stack.c:15
[<ffffffc000f18dc4>] dump_stack+0x7c/0xd0 lib/dump_stack.c:50
[<     inline     >] print_address_description mm/kasan/report.c:147
[<     inline     >] kasan_report_error mm/kasan/report.c:236
[<ffffffc000373dcc>] kasan_report+0x380/0x4b8 mm/kasan/report.c:259
[<     inline     >] check_memory_region mm/kasan/kasan.c:264
[<ffffffc00037352c>] __asan_load8+0x20/0x70 mm/kasan/kasan.c:507
[<ffffffc0005b9624>] memcpy_fromiovec+0x5c/0x114 lib/iovec.c:15
[<     inline     >] memcpy_from_msg include/linux/skbuff.h:2667
[<ffffffc000ddeba0>] ping_common_sendmsg+0x50/0x108 net/ipv4/ping.c:674
[<ffffffc000dded30>] ping_v4_sendmsg+0xd8/0x698 net/ipv4/ping.c:714
[<ffffffc000dc91dc>] inet_sendmsg+0xe0/0x12c net/ipv4/af_inet.c:749
[<     inline     >] __sock_sendmsg_nosec net/socket.c:624
[<     inline     >] __sock_sendmsg net/socket.c:632
[<ffffffc000cab61c>] sock_sendmsg+0x124/0x164 net/socket.c:643
[<     inline     >] SYSC_sendto net/socket.c:1797
[<ffffffc000cad270>] SyS_sendto+0x178/0x1d8 net/socket.c:1761

CVE-2016-8399

Reported-by: Qidan He <i@flanker017.me>
Fixes: c319b4d ("net: ipv4: add IPPROTO_ICMP socket kind")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
0lvin referenced this issue in free-z4u/roc-rk3328-cc-official Jun 2, 2019
There have been several reports over the years of NULL pointer
dereferences in xfs_trans_log_inode during xfs_fsr processes,
when the process is doing an fput and tearing down extents
on the temporary inode, something like:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
PID: 29439  TASK: ffff880550584fa0  CPU: 6   COMMAND: "xfs_fsr"
    [exception RIP: xfs_trans_log_inode+0x10]
 FireflyTeam#9 [ffff8800a57bbbe0] xfs_bunmapi at ffffffffa037398e [xfs]
FireflyTeam#10 [ffff8800a57bbce8] xfs_itruncate_extents at ffffffffa0391b29 [xfs]
FireflyTeam#11 [ffff8800a57bbd88] xfs_inactive_truncate at ffffffffa0391d0c [xfs]
FireflyTeam#12 [ffff8800a57bbdb8] xfs_inactive at ffffffffa0392508 [xfs]
FireflyTeam#13 [ffff8800a57bbdd8] xfs_fs_evict_inode at ffffffffa035907e [xfs]
FireflyTeam#14 [ffff8800a57bbe00] evict at ffffffff811e1b67
FireflyTeam#15 [ffff8800a57bbe28] iput at ffffffff811e23a5
FireflyTeam#16 [ffff8800a57bbe58] dentry_kill at ffffffff811dcfc8
rockchip-linux#17 [ffff8800a57bbe88] dput at ffffffff811dd06c
rockchip-linux#18 [ffff8800a57bbea8] __fput at ffffffff811c823b
rockchip-linux#19 [ffff8800a57bbef0] ____fput at ffffffff811c846e
rockchip-linux#20 [ffff8800a57bbf00] task_work_run at ffffffff81093b27
rockchip-linux#21 [ffff8800a57bbf30] do_notify_resume at ffffffff81013b0c
rockchip-linux#22 [ffff8800a57bbf50] int_signal at ffffffff8161405d

As it turns out, this is because the i_itemp pointer, along
with the d_ops pointer, has been overwritten with zeros
when we tear down the extents during truncate.  When the in-core
inode fork on the temporary inode used by xfs_fsr was originally
set up during the extent swap, we mistakenly looked at di_nextents
to determine whether all extents fit inline, but this misses extents
generated by speculative preallocation; we should be using if_bytes
instead.

This mistake corrupts the in-memory inode, and code in
xfs_iext_remove_inline eventually gets bad inputs, causing
it to memmove and memset incorrect ranges; this became apparent
because the two values in ifp->if_u2.if_inline_ext[1] contained
what should have been in d_ops and i_itemp; they were memmoved due
to incorrect array indexing and then the original locations
were zeroed with memset, again due to an array overrun.

Fix this by properly using i_df.if_bytes to determine the number
of extents, not di_nextents.

Thanks to dchinner for looking at this with me and spotting the
root cause.

Cc: stable@vger.kernel.org
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
rkchrome pushed a commit that referenced this issue Jun 27, 2019
[ Upstream commit 0c81585 ]

After offlining a memory block, kmemleak scan will trigger a crash, as
it encounters a page ext address that has already been freed during
memory offlining.  At the beginning in alloc_page_ext(), it calls
kmemleak_alloc(), but it does not call kmemleak_free() in
free_page_ext().

    BUG: unable to handle kernel paging request at ffff888453d00000
    PGD 128a01067 P4D 128a01067 PUD 128a04067 PMD 47e09e067 PTE 800ffffbac2ff060
    Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
    CPU: 1 PID: 1594 Comm: bash Not tainted 5.0.0-rc8+ #15
    Hardware name: HP ProLiant DL180 Gen9/ProLiant DL180 Gen9, BIOS U20 10/25/2017
    RIP: 0010:scan_block+0xb5/0x290
    Code: 85 6e 01 00 00 48 b8 00 00 30 f5 81 88 ff ff 48 39 c3 0f 84 5b 01 00 00 48 89 d8 48 c1 e8 03 42 80 3c 20 00 0f 85 87 01 00 00 <4c> 8b 3b e8 f3 0c fa ff 4c 39 3d 0c 6b 4c 01 0f 87 08 01 00 00 4c
    RSP: 0018:ffff8881ec57f8e0 EFLAGS: 00010082
    RAX: 0000000000000000 RBX: ffff888453d00000 RCX: ffffffffa61e5a54
    RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff888453d00000
    RBP: ffff8881ec57f920 R08: fffffbfff4ed588d R09: fffffbfff4ed588c
    R10: fffffbfff4ed588c R11: ffffffffa76ac463 R12: dffffc0000000000
    R13: ffff888453d00ff9 R14: ffff8881f80cef48 R15: ffff8881f80cef48
    FS:  00007f6c0e3f8740(0000) GS:ffff8881f7680000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: ffff888453d00000 CR3: 00000001c4244003 CR4: 00000000001606a0
    Call Trace:
     scan_gray_list+0x269/0x430
     kmemleak_scan+0x5a8/0x10f0
     kmemleak_write+0x541/0x6ca
     full_proxy_write+0xf8/0x190
     __vfs_write+0xeb/0x980
     vfs_write+0x15a/0x4f0
     ksys_write+0xd2/0x1b0
     __x64_sys_write+0x73/0xb0
     do_syscall_64+0xeb/0xaaa
     entry_SYSCALL_64_after_hwframe+0x44/0xa9
    RIP: 0033:0x7f6c0dad73b8
    Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 65 63 2d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
    RSP: 002b:00007ffd5b863cb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
    RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f6c0dad73b8
    RDX: 0000000000000005 RSI: 000055a9216e1710 RDI: 0000000000000001
    RBP: 000055a9216e1710 R08: 000000000000000a R09: 00007ffd5b863840
    R10: 000000000000000a R11: 0000000000000246 R12: 00007f6c0dda9780
    R13: 0000000000000005 R14: 00007f6c0dda4740 R15: 0000000000000005
    Modules linked in: nls_iso8859_1 nls_cp437 vfat fat kvm_intel kvm irqbypass efivars ip_tables x_tables xfs sd_mod ahci libahci igb i2c_algo_bit libata i2c_core dm_mirror dm_region_hash dm_log dm_mod efivarfs
    CR2: ffff888453d00000
    ---[ end trace ccf646c7456717c5 ]---
    Kernel panic - not syncing: Fatal exception
    Shutting down cpus with NMI
    Kernel Offset: 0x24c00000 from 0xffffffff81000000 (relocation range:
    0xffffffff80000000-0xffffffffbfffffff)
    ---[ end Kernel panic - not syncing: Fatal exception ]---

Link: http://lkml.kernel.org/r/20190227173147.75650-1-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
rkchrome pushed a commit that referenced this issue Jun 27, 2019
[ Upstream commit 42dfa45 ]

Using gcc's ASan, Changbin reports:

  =================================================================
  ==7494==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 48 byte(s) in 1 object(s) allocated from:
      #0 0x7f0333a89138 in calloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee138)
      #1 0x5625e5330a5e in zalloc util/util.h:23
      #2 0x5625e5330a9b in perf_counts__new util/counts.c:10
      #3 0x5625e5330ca0 in perf_evsel__alloc_counts util/counts.c:47
      #4 0x5625e520d8e5 in __perf_evsel__read_on_cpu util/evsel.c:1505
      #5 0x5625e517a985 in perf_evsel__read_on_cpu /home/work/linux/tools/perf/util/evsel.h:347
      #6 0x5625e517ad1a in test__openat_syscall_event tests/openat-syscall.c:47
      #7 0x5625e51528e6 in run_test tests/builtin-test.c:358
      #8 0x5625e5152baf in test_and_print tests/builtin-test.c:388
      #9 0x5625e51543fe in __cmd_test tests/builtin-test.c:583
      #10 0x5625e515572f in cmd_test tests/builtin-test.c:722
      #11 0x5625e51c3fb8 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
      #12 0x5625e51c44f7 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
      #13 0x5625e51c48fb in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
      #14 0x5625e51c5069 in main /home/changbin/work/linux/tools/perf/perf.c:520
      #15 0x7f033214d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

  Indirect leak of 72 byte(s) in 1 object(s) allocated from:
      #0 0x7f0333a89138 in calloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee138)
      #1 0x5625e532560d in zalloc util/util.h:23
      #2 0x5625e532566b in xyarray__new util/xyarray.c:10
      #3 0x5625e5330aba in perf_counts__new util/counts.c:15
      #4 0x5625e5330ca0 in perf_evsel__alloc_counts util/counts.c:47
      #5 0x5625e520d8e5 in __perf_evsel__read_on_cpu util/evsel.c:1505
      #6 0x5625e517a985 in perf_evsel__read_on_cpu /home/work/linux/tools/perf/util/evsel.h:347
      #7 0x5625e517ad1a in test__openat_syscall_event tests/openat-syscall.c:47
      #8 0x5625e51528e6 in run_test tests/builtin-test.c:358
      #9 0x5625e5152baf in test_and_print tests/builtin-test.c:388
      #10 0x5625e51543fe in __cmd_test tests/builtin-test.c:583
      #11 0x5625e515572f in cmd_test tests/builtin-test.c:722
      #12 0x5625e51c3fb8 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
      #13 0x5625e51c44f7 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
      #14 0x5625e51c48fb in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
      #15 0x5625e51c5069 in main /home/changbin/work/linux/tools/perf/perf.c:520
      #16 0x7f033214d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

His patch took care of evsel->prev_raw_counts, but the above backtraces
are about evsel->counts, so fix that instead.

Reported-by: Changbin Du <changbin.du@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lkml.kernel.org/n/tip-hd1x13g59f0nuhe4anxhsmfp@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
0lvin referenced this issue in free-z4u/roc-rk3328-cc-official Jul 18, 2019
Patch series "userfaultfd non-cooperative further update for 4.11 merge
window".

Unfortunately I noticed one relevant bug in userfaultfd_exit while doing
more testing.  I've been doing testing before and this was also tested
by kbuild bot and exercised by the selftest, but this bug never
reproduced before.

I dropped userfaultfd_exit as result.  I dropped it because of
implementation difficulty in receiving signals in __mmput and because I
think -ENOSPC as result from the background UFFDIO_COPY should be enough
already.

Before I decided to remove userfaultfd_exit, I noticed userfaultfd_exit
wasn't exercised by the selftest and when I tried to exercise it, after
moving it to a more correct place in __mmput where it would make more
sense and where the vma list is stable, it resulted in the
event_wait_completion in D state.  So then I added the second patch to
be sure even if we call userfaultfd_event_wait_completion too late
during task exit(), we won't risk to generate tasks in D state.  The
same check exists in handle_userfault() for the same reason, except it
makes a difference there, while here is just a robustness check and it's
run under WARN_ON_ONCE.

While looking at the userfaultfd_event_wait_completion() function I
looked back at its callers too while at it and I think it's not ok to
stop executing dup_fctx on the fcs list because we relay on
userfaultfd_event_wait_completion to execute
userfaultfd_ctx_put(fctx->orig) which is paired against
userfaultfd_ctx_get(fctx->orig) in dup_userfault just before
list_add(fcs).  This change only takes care of fctx->orig but this area
also needs further review looking for similar problems in fctx->new.

The only patch that is urgent is the first because it's an use after
free during a SMP race condition that affects all processes if
CONFIG_USERFAULTFD=y.  Very hard to reproduce though and probably
impossible without SLUB poisoning enabled.

This patch (of 3):

I once reproduced this oops with the userfaultfd selftest, it's not
easily reproducible and it requires SLUB poisoning to reproduce.

    general protection fault: 0000 [FireflyTeam#1] SMP
    Modules linked in:
    CPU: 2 PID: 18421 Comm: userfaultfd Tainted: G               ------------ T 3.10.0+ FireflyTeam#15
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.1-0-g8891697-prebuilt.qemu-project.org 04/01/2014
    task: ffff8801f83b9440 ti: ffff8801f833c000 task.ti: ffff8801f833c000
    RIP: 0010:[<ffffffff81451299>]  [<ffffffff81451299>] userfaultfd_exit+0x29/0xa0
    RSP: 0018:ffff8801f833fe80  EFLAGS: 00010202
    RAX: ffff8801f833ffd8 RBX: 6b6b6b6b6b6b6b6b RCX: ffff8801f83b9440
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8800baf18600
    RBP: ffff8801f833fee8 R08: 0000000000000000 R09: 0000000000000001
    R10: 0000000000000000 R11: ffffffff8127ceb3 R12: 0000000000000000
    R13: ffff8800baf186b0 R14: ffff8801f83b99f8 R15: 00007faed746c700
    FS:  0000000000000000(0000) GS:ffff88023fc80000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 00007faf0966f028 CR3: 0000000001bc6000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Call Trace:
      do_exit+0x297/0xd10
      SyS_exit+0x17/0x20
      tracesys+0xdd/0xe2
    Code: 00 00 66 66 66 66 90 55 48 89 e5 41 54 53 48 83 ec 58 48 8b 1f 48 85 db 75 11 eb 73 66 0f 1f 44 00 00 48 8b 5b 10 48 85 db 74 64 <4c> 8b a3 b8 00 00 00 4d 85 e4 74 eb 41 f6 84 24 2c 01 00 00 80
    RIP  [<ffffffff81451299>] userfaultfd_exit+0x29/0xa0
     RSP <ffff8801f833fe80>
    ---[ end trace 9fecd6dcb442846a ]---

In the debugger I located the "mm" pointer in the stack and walking
mm->mmap->vm_next through the end shows the vma->vm_next list is fully
consistent and it is null terminated list as expected.  So this has to
be an SMP race condition where userfaultfd_exit was running while the
vma list was being modified by another CPU.

When userfaultfd_exit() run one of the ->vm_next pointers pointed to
SLAB_POISON (RBX is the vma pointer and is 0x6b6b..).

The reason is that it's not running in __mmput but while there are still
other threads running and it's not holding the mmap_sem (it can't as it
has to wait the even to be received by the manager).  So this is an use
after free that was happening for all processes.

One more implementation problem aside from the race condition:
userfaultfd_exit has really to check a flag in mm->flags before walking
the vma or it's going to slowdown the exit() path for regular tasks.

One more implementation problem: at that point signals can't be
delivered so it would also create a task in D state if the manager
doesn't read the event.

The major design issue: it overall looks superfluous as the manager can
check for -ENOSPC in the background transfer:

	if (mmget_not_zero(ctx->mm)) {
[..]
	} else {
		return -ENOSPC;
	}

It's safer to roll it back and re-introduce it later if at all.

[rppt@linux.vnet.ibm.com: documentation fixup after removal of UFFD_EVENT_EXIT]
  Link: http://lkml.kernel.org/r/1488345437-4364-1-git-send-email-rppt@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/20170224181957.19736-2-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
0lvin referenced this issue in free-z4u/roc-rk3328-cc-official Jul 18, 2019
As Eric Dumazet pointed out this also needs to be fixed in IPv6.
v2: Contains the IPv6 tcp/Ipv6 dccp patches as well.

We have seen a few incidents lately where a dst_enty has been freed
with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
dst_entry. If the conditions/timings are right a crash then ensues when the
freed dst_entry is referenced later on. A Common crashing back trace is:

 FireflyTeam#8 [] page_fault at ffffffff8163e648
    [exception RIP: __tcp_ack_snd_check+74]
.
.
 FireflyTeam#9 [] tcp_rcv_established at ffffffff81580b64
FireflyTeam#10 [] tcp_v4_do_rcv at ffffffff8158b54a
FireflyTeam#11 [] tcp_v4_rcv at ffffffff8158cd02
FireflyTeam#12 [] ip_local_deliver_finish at ffffffff815668f4
FireflyTeam#13 [] ip_local_deliver at ffffffff81566bd9
FireflyTeam#14 [] ip_rcv_finish at ffffffff8156656d
FireflyTeam#15 [] ip_rcv at ffffffff81566f06
FireflyTeam#16 [] __netif_receive_skb_core at ffffffff8152b3a2
rockchip-linux#17 [] __netif_receive_skb at ffffffff8152b608
rockchip-linux#18 [] netif_receive_skb at ffffffff8152b690
rockchip-linux#19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
rockchip-linux#20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
rockchip-linux#21 [] net_rx_action at ffffffff8152bac2
rockchip-linux#22 [] __do_softirq at ffffffff81084b4f
rockchip-linux#23 [] call_softirq at ffffffff8164845c
rockchip-linux#24 [] do_softirq at ffffffff81016fc5
rockchip-linux#25 [] irq_exit at ffffffff81084ee5
rockchip-linux#26 [] do_IRQ at ffffffff81648ff8

Of course it may happen with other NIC drivers as well.

It's found the freed dst_entry here:

 224 static bool tcp_in_quickack_mode(struct sock *sk)↩
 225 {↩
 226 ▹       const struct inet_connection_sock *icsk = inet_csk(sk);↩
 227 ▹       const struct dst_entry *dst = __sk_dst_get(sk);↩
 228 ↩
 229 ▹       return (dst && dst_metric(dst, RTAX_QUICKACK)) ||↩
 230 ▹       ▹       (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);↩
 231 }↩

But there are other backtraces attributed to the same freed dst_entry in
netfilter code as well.

All the vmcores showed 2 significant clues:

- Remote hosts behind the default gateway had always been redirected to a
different gateway. A rtable/dst_entry will be added for that host. Making
more dst_entrys with lower reference counts. Making this more probable.

- All vmcores showed a postitive LockDroppedIcmps value, e.g:

LockDroppedIcmps                  267

A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
regardless of whether user space has the socket locked. This can result in a
race condition where the same dst_entry cached in sk->sk_dst_entry can be
decremented twice for the same socket via:

do_redirect()->__sk_dst_check()-> dst_release().

Which leads to the dst_entry being prematurely freed with another socket
pointing to it via sk->sk_dst_cache and a subsequent crash.

To fix this skip do_redirect() if usespace has the socket locked. Instead let
the redirect take place later when user space does not have the socket
locked.

The dccp/IPv6 code is very similar in this respect, so fixing it there too.

As Eric Garver pointed out the following commit now invalidates routes. Which
can set the dst->obsolete flag so that ipv4_dst_check() returns null and
triggers the dst_release().

Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.")
Cc: Eric Garver <egarver@redhat.com>
Cc: Hannes Sowa <hsowa@redhat.com>
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
0lvin referenced this issue in free-z4u/roc-rk3328-cc-official Jul 19, 2019
drv->cpumask defaults to cpu_possible_mask in __cpuidle_driver_init().
On PowerNV platform cpu_present could be less than cpu_possible in cases
where firmware detects the cpu, but it is not available to the OS.  When
CONFIG_HOTPLUG_CPU=n, such cpus are not hotplugable at runtime and hence
we skip creating cpu_device.

This breaks cpuidle on powernv where register_cpu() is not called for
cpus in cpu_possible_mask that cannot be hot-added at runtime.

Trying cpuidle_register_device() on cpu without cpu_device will cause
crash like this:

cpu 0xf: Vector: 380 (Data SLB Access) at [c000000ff1503490]
    pc: c00000000022c8bc: string+0x34/0x60
    lr: c00000000022ed78: vsnprintf+0x284/0x42c
    sp: c000000ff1503710
   msr: 9000000000009033
   dar: 6000000060000000
  current = 0xc000000ff1480000
  paca    = 0xc00000000fe82d00   softe: 0        irq_happened: 0x01
    pid   = 1, comm = swapper/8
Linux version 4.11.0-rc2 (sv@sagarika) (gcc version 4.9.4
(Buildroot 2017.02-00004-gc28573e) ) FireflyTeam#15 SMP Fri Mar 17 19:32:02 IST 2017
enter ? for help
[link register   ] c00000000022ed78 vsnprintf+0x284/0x42c
[c000000ff1503710] c00000000022ebb8 vsnprintf+0xc4/0x42c (unreliable)
[c000000ff1503800] c00000000022ef40 vscnprintf+0x20/0x44
[c000000ff1503830] c0000000000ab61c vprintk_emit+0x94/0x2cc
[c000000ff15038a0] c0000000000acc9c vprintk_func+0x60/0x74
[c000000ff15038c0] c000000000619694 printk+0x38/0x4c
[c000000ff15038e0] c000000000224950 kobject_get+0x40/0x60
[c000000ff1503950] c00000000022507c kobject_add_internal+0x60/0x2c4
[c000000ff15039e0] c000000000225350 kobject_init_and_add+0x70/0x78
[c000000ff1503a60] c00000000053c288 cpuidle_add_sysfs+0x9c/0xe0
[c000000ff1503ae0] c00000000053aeac cpuidle_register_device+0xd4/0x12c
[c000000ff1503b30] c00000000053b108 cpuidle_register+0x98/0xcc
[c000000ff1503bc0] c00000000085eaf0 powernv_processor_idle_init+0x140/0x1e0
[c000000ff1503c60] c00000000000cd60 do_one_initcall+0xc0/0x15c
[c000000ff1503d20] c000000000833e84 kernel_init_freeable+0x1a0/0x25c
[c000000ff1503dc0] c00000000000d478 kernel_init+0x24/0x12c
[c000000ff1503e30] c00000000000b564 ret_from_kernel_thread+0x5c/0x78

This patch fixes the bug by passing correct cpumask from
powernv-cpuidle driver.

Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
[ rjw: Comment massage ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
0lvin referenced this issue in free-z4u/roc-rk3328-cc-official Jul 22, 2019
Saeed Mahameed says:

====================
Mellanox, mlx5 RDMA net device support

This series provides the lower level mlx5 support of RDMA netdevice
creation API [1] suggested and introduced by Intel's HFI OPA VNIC
netdevice driver [2], to enable IPoIB mlx5 RDMA netdevice creation.

mlx5 IPoIB RDMA netdev will serve as an acceleration netdevice for the current
IPoIB ULP generic netdevice, providing:
	- mlx5 RSS support.
	- mlx5 HW RX,TX offloads (checksum, TSO, LRO, etc ..).
	- Full mlx5 HW features transparent to the ULP itself.

The idea here is to reuse and benefit from the already implemented mlx5e netdevice
management and channels API for both etherent and RDMA netdevices, since both IPoIB
and Ethernet netdevices share same common mlx5 HW resources (with some small
exceptions) and share most of the control/data path logic, it is more natural to
have them share the same code.

The differences between IPoIB and Ethernet netdevices can be summarized to:

Steering:
In mlx5, IPoIB traffic is sent and received from an underlay special QP, and in Ethernet
the traffic is handled by vports and vport steering is managed by e-switch or FW.

For IPoIB traffic to get steered correctly the only thing we need to do is to create RSS
HW contexts for RX and TX HW contexts for TX (similar to mlx5e) with the underlay QP attached to
them (underlay QP will be 0 in case of Ethernet).

RX,TX:
Since IPoIB traffic is different, slightly modified RX and TX handlers are required,
still we do some code reuse in data path via common helper functions.

All of the other generic netdevice and mlx5 aspects will be shared between mlx5 Ethernet
and IPoIB netdevices, e.g.
	- Channels creation and handling (RQs,SQs,CQs, NAPI, interrupt moderation, etc..)
	- Offloads, checksum, GRO, LRO, TSO, and more.
        - netdevice logic and non Ethernet specific ndos (open/close, etc..)

In order to achieve what we want:

In patchet 1 to 3, Erez added the supported for underlay QP in mlx5_ifc and refactored
the mlx5 steering code to accept the underlay QP as a parameter for creating steering
objects and enabled flow steering for IB link.

Then we are going to use the mlx5e netdevice profile, which is already used to separate between
NIC and VF representors netdevices, to create new type of IPoIB netdevice profile.

For that, one small refactoring is required to make mlx5e netdevice profile management
more genetic and agnostic to link type which is done in patch FireflyTeam#4.

In patch FireflyTeam#5, we introduce ipoib.c to host all of mlx5 IPoIB (mlx5i) specific logic and a
skeleton for the IPoIB mlx5 netdevice profile, and we will start filling it in next patches,
using mlx5e already existing APIs.

Patch FireflyTeam#6 and FireflyTeam#7, Implement init/cleanup RX mlx5i netdev profile handlers to create mlx5 RSS
resources, same as mlx5e but without vlan and L2 steering tables.

Patch FireflyTeam#8, Implement init/cleanup TX mlx5i netdev profile handlers, to create TX resources
same as mlx5e but with one TC (tc = 0) support.

Patch FireflyTeam#9, Implement mlx5i open/close ndos, where we reuese the mlx5e channels API, to start/stop TX/RX channels.

Patch FireflyTeam#10, Create the underlay QP and attach it to mlx5i RSS and TX HW contexts.

Patch FireflyTeam#11 and FireflyTeam#12, Break down the mlx5e xmit flow into smaller helper function and implement the
mlx5i IPoIB xmit routine.

Patch FireflyTeam#13 and FireflyTeam#14, Have an RX handler per netdevice profile. We already do this before this series
in a non clean way to separate between NIC netdev and VF representor RX handlers, in patch 13 we make
the RX handler generic and bound to a profile and in patch 14 we implement the IPoIB RX handlers.

Patch FireflyTeam#15, Small cleanup to avoid e-switch with IPoIB netdev.

In order to enable mlx5 IPoIB, a merge between the IPoIB RDMA netdev offolad support [3]
- which was alread submitted to the rdma mailing list - and this series is required
plus an extra small patch [4] which will connect between both sides and actually enables the offload.

Once both patch-sets are merged into linux we will have to submit the extra small patch [4], to enable
the feature.

Thanks,
Saeed.

[1] https://patchwork.kernel.org/patch/9676637/

[2] https://lwn.net/Articles/715453/
    https://patchwork.kernel.org/patch/9587815/

[3] https://patchwork.kernel.org/patch/9672069/
[4] https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git/commit/?id=0141db6a686e32294dee015b7d07706162ba48d8
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
0lvin referenced this issue in free-z4u/roc-rk3328-cc-official Jul 26, 2019
The below backtrace can be observed on -rt kernel with
CONFIG_DEBUG_MODULE_RONX (4.9 kernel CONFIG_DEBUG_RODATA) option enabled:

 BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:993
 in_atomic(): 1, irqs_disabled(): 128, pid: 14, name: migration/0
 1 lock held by migration/0/14:
  #0:  (tasklist_lock){+.+...}, at: [<c01183e8>] update_sections_early+0x24/0xdc
 irq event stamp: 38
 hardirqs last  enabled at (37): [<c08f6f7c>] _raw_spin_unlock_irq+0x24/0x68
 hardirqs last disabled at (38): [<c01fdfe8>] multi_cpu_stop+0xd8/0x138
 softirqs last  enabled at (0): [<c01303ec>] copy_process.part.5+0x238/0x1b64
 softirqs last disabled at (0): [<  (null)>]   (null)
 Preemption disabled at: [<c01fe244>] cpu_stopper_thread+0x80/0x10c
 CPU: 0 PID: 14 Comm: migration/0 Not tainted 4.9.21-rt16-02220-g49e319c FireflyTeam#15
 Hardware name: Generic DRA74X (Flattened Device Tree)
 [<c0112014>] (unwind_backtrace) from [<c010d370>] (show_stack+0x10/0x14)
 [<c010d370>] (show_stack) from [<c049beb8>] (dump_stack+0xa8/0xd4)
 [<c049beb8>] (dump_stack) from [<c01631a0>] (___might_sleep+0x1bc/0x2ac)
 [<c01631a0>] (___might_sleep) from [<c08f7244>] (__rt_spin_lock+0x1c/0x30)
 [<c08f7244>] (__rt_spin_lock) from [<c08f77a4>] (rt_read_lock+0x54/0x68)
 [<c08f77a4>] (rt_read_lock) from [<c01183e8>] (update_sections_early+0x24/0xdc)
 [<c01183e8>] (update_sections_early) from [<c01184b0>] (__fix_kernmem_perms+0x10/0x1c)
 [<c01184b0>] (__fix_kernmem_perms) from [<c01fe010>] (multi_cpu_stop+0x100/0x138)
 [<c01fe010>] (multi_cpu_stop) from [<c01fe24c>] (cpu_stopper_thread+0x88/0x10c)
 [<c01fe24c>] (cpu_stopper_thread) from [<c015edc4>] (smpboot_thread_fn+0x174/0x31c)
 [<c015edc4>] (smpboot_thread_fn) from [<c015a988>] (kthread+0xf0/0x108)
 [<c015a988>] (kthread) from [<c0108818>] (ret_from_fork+0x14/0x3c)
 Freeing unused kernel memory: 1024K (c0d00000 - c0e00000)

The stop_machine() is called with cpus = NULL from fix_kernmem_perms() and
mark_rodata_ro() which means only one CPU will execute
update_sections_early() while all other CPUs will spin and wait. Hence,
it's safe to remove tasklist locking from update_sections_early(). As part
of this change also mark functions which are local to this module as
static.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Acked-by: Laura Abbott <labbott@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
0lvin referenced this issue in free-z4u/roc-rk3328-cc-official Jul 30, 2019
When a process runs out of stack the parisc kernel wrongly faults with SIGBUS
instead of the expected SIGSEGV signal.

This example shows how the kernel faults:
do_page_fault() command='a.out' type=15 address=0xfaac2000 in libc-2.24.so[f8308000+16c000]
trap FireflyTeam#15: Data TLB miss fault, vm_start = 0xfa2c2000, vm_end = 0xfaac2000

The vma->vm_end value is the first address which does not belong to the vma, so
adjust the check to include vma->vm_end to the range for which to send the
SIGSEGV signal.

This patch unbreaks building the debian libsigsegv package.

Cc: stable@vger.kernel.org
Signed-off-by: Helge Deller <deller@gmx.de>
0lvin referenced this issue in free-z4u/roc-rk3328-cc-official Aug 4, 2019
Currently we are allocating drm_device in rockchip_drm_bind, so if the
suspend/resume code access it when drm is not bound, we would hit this
crash:

[  253.402836] Unable to handle kernel NULL pointer dereference at virtual address 00000028
[  253.402837] pgd = ffffffc06c9b0000
[  253.402841] [00000028] *pgd=0000000000000000, *pud=0000000000000000
[  253.402844] Internal error: Oops: 96000005 [FireflyTeam#1] PREEMPT SMP
[  253.402859] Modules linked in: btusb btrtl btbcm btintel bluetooth ath10k_pci ath10k_core ar10k_ath ar10k_mac80211 cfg80211 ip6table_filter asix usbnet mii
[  253.402864] CPU: 4 PID: 1331 Comm: cat Not tainted 4.4.70 FireflyTeam#15
[  253.402865] Hardware name: Google Scarlet (DT)
[  253.402867] task: ffffffc076c0ce00 ti: ffffffc06c2c8000 task.ti: ffffffc06c2c8000
[  253.402871] PC is at rockchip_drm_sys_suspend+0x20/0x5c

Add sanity checks to prevent that.

Reported-by: Brian Norris <briannorris@chromium.com>
Signed-off-by: Jeffy Chen <jeffy.chen@rock-chips.com>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Link: https://patchwork.kernel.org/patch/9890297/
0lvin referenced this issue in free-z4u/roc-rk3328-cc-official Aug 31, 2019
James Morris reported kernel stack corruption bug [1] while
running the SELinux testsuite, and bisected to a recent
commit bffa72c ("net: sk_buff rbnode reorg")

We believe this commit is fine, but exposes an older bug.

SELinux code runs from tcp_filter() and might send an ICMP,
expecting IP options to be found in skb->cb[] using regular IPCB placement.

We need to defer TCP mangling of skb->cb[] after tcp_filter() calls.

This patch adds tcp_v4_fill_cb()/tcp_v4_restore_cb() in a very
similar way we added them for IPv6.

[1]
[  339.806024] SELinux: failure in selinux_parse_skb(), unable to parse packet
[  339.822505] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff81745af5
[  339.822505]
[  339.852250] CPU: 4 PID: 3642 Comm: client Not tainted 4.15.0-rc1-test FireflyTeam#15
[  339.868498] Hardware name: LENOVO 10FGS0VA1L/30BC, BIOS FWKT68A   01/19/2017
[  339.885060] Call Trace:
[  339.896875]  <IRQ>
[  339.908103]  dump_stack+0x63/0x87
[  339.920645]  panic+0xe8/0x248
[  339.932668]  ? ip_push_pending_frames+0x33/0x40
[  339.946328]  ? icmp_send+0x525/0x530
[  339.958861]  ? kfree_skbmem+0x60/0x70
[  339.971431]  __stack_chk_fail+0x1b/0x20
[  339.984049]  icmp_send+0x525/0x530
[  339.996205]  ? netlbl_skbuff_err+0x36/0x40
[  340.008997]  ? selinux_netlbl_err+0x11/0x20
[  340.021816]  ? selinux_socket_sock_rcv_skb+0x211/0x230
[  340.035529]  ? security_sock_rcv_skb+0x3b/0x50
[  340.048471]  ? sk_filter_trim_cap+0x44/0x1c0
[  340.061246]  ? tcp_v4_inbound_md5_hash+0x69/0x1b0
[  340.074562]  ? tcp_filter+0x2c/0x40
[  340.086400]  ? tcp_v4_rcv+0x820/0xa20
[  340.098329]  ? ip_local_deliver_finish+0x71/0x1a0
[  340.111279]  ? ip_local_deliver+0x6f/0xe0
[  340.123535]  ? ip_rcv_finish+0x3a0/0x3a0
[  340.135523]  ? ip_rcv_finish+0xdb/0x3a0
[  340.147442]  ? ip_rcv+0x27c/0x3c0
[  340.158668]  ? inet_del_offload+0x40/0x40
[  340.170580]  ? __netif_receive_skb_core+0x4ac/0x900
[  340.183285]  ? rcu_accelerate_cbs+0x5b/0x80
[  340.195282]  ? __netif_receive_skb+0x18/0x60
[  340.207288]  ? process_backlog+0x95/0x140
[  340.218948]  ? net_rx_action+0x26c/0x3b0
[  340.230416]  ? __do_softirq+0xc9/0x26a
[  340.241625]  ? do_softirq_own_stack+0x2a/0x40
[  340.253368]  </IRQ>
[  340.262673]  ? do_softirq+0x50/0x60
[  340.273450]  ? __local_bh_enable_ip+0x57/0x60
[  340.285045]  ? ip_finish_output2+0x175/0x350
[  340.296403]  ? ip_finish_output+0x127/0x1d0
[  340.307665]  ? nf_hook_slow+0x3c/0xb0
[  340.318230]  ? ip_output+0x72/0xe0
[  340.328524]  ? ip_fragment.constprop.54+0x80/0x80
[  340.340070]  ? ip_local_out+0x35/0x40
[  340.350497]  ? ip_queue_xmit+0x15c/0x3f0
[  340.361060]  ? __kmalloc_reserve.isra.40+0x31/0x90
[  340.372484]  ? __skb_clone+0x2e/0x130
[  340.382633]  ? tcp_transmit_skb+0x558/0xa10
[  340.393262]  ? tcp_connect+0x938/0xad0
[  340.403370]  ? ktime_get_with_offset+0x4c/0xb0
[  340.414206]  ? tcp_v4_connect+0x457/0x4e0
[  340.424471]  ? __inet_stream_connect+0xb3/0x300
[  340.435195]  ? inet_stream_connect+0x3b/0x60
[  340.445607]  ? SYSC_connect+0xd9/0x110
[  340.455455]  ? __audit_syscall_entry+0xaf/0x100
[  340.466112]  ? syscall_trace_enter+0x1d0/0x2b0
[  340.476636]  ? __audit_syscall_exit+0x209/0x290
[  340.487151]  ? SyS_connect+0xe/0x10
[  340.496453]  ? do_syscall_64+0x67/0x1b0
[  340.506078]  ? entry_SYSCALL64_slow_path+0x25/0x25

Fixes: 971f10e ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: James Morris <james.l.morris@oracle.com>
Tested-by: James Morris <james.l.morris@oracle.com>
Tested-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
0lvin referenced this issue in free-z4u/roc-rk3328-cc-official Sep 20, 2019
When RCU stall warning triggers, it can print out a lot of messages
while holding spinlocks.  If the console device is slow (e.g. an
actual or IPMI serial console), it may end up triggering NMI hard
lockup watchdog like the following.

*** CPU printking while holding RCU spinlock

  PID: 4149739  TASK: ffff881a46baa880  CPU: 13  COMMAND: "CPUThreadPool8"
   #0 [ffff881fff945e48] crash_nmi_callback at ffffffff8103f7d0
   FireflyTeam#1 [ffff881fff945e58] nmi_handle at ffffffff81020653
   FireflyTeam#2 [ffff881fff945eb0] default_do_nmi at ffffffff81020c36
   FireflyTeam#3 [ffff881fff945ed0] do_nmi at ffffffff81020d32
   FireflyTeam#4 [ffff881fff945ef0] end_repeat_nmi at ffffffff81956a7e
      [exception RIP: io_serial_in+21]
      RIP: ffffffff81630e55  RSP: ffff881fff943b88  RFLAGS: 00000002
      RAX: 000000000000ca00  RBX: ffffffff8230e188  RCX: 0000000000000000
      RDX: 00000000000002fd  RSI: 0000000000000005  RDI: ffffffff8230e188
      RBP: ffff881fff943bb0   R8: 0000000000000000   R9: ffffffff820cb3c4
      R10: 0000000000000019  R11: 0000000000002000  R12: 00000000000026e1
      R13: 0000000000000020  R14: ffffffff820cd398  R15: 0000000000000035
      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
  --- <NMI exception stack> ---
   FireflyTeam#5 [ffff881fff943b88] io_serial_in at ffffffff81630e55
   FireflyTeam#6 [ffff881fff943b90] wait_for_xmitr at ffffffff8163175c
   FireflyTeam#7 [ffff881fff943bb8] serial8250_console_putchar at ffffffff816317dc
   FireflyTeam#8 [ffff881fff943bd8] uart_console_write at ffffffff8162ac00
   FireflyTeam#9 [ffff881fff943c08] serial8250_console_write at ffffffff81634691
  FireflyTeam#10 [ffff881fff943c80] univ8250_console_write at ffffffff8162f7c2
  FireflyTeam#11 [ffff881fff943c90] console_unlock at ffffffff810dfc55
  FireflyTeam#12 [ffff881fff943cf0] vprintk_emit at ffffffff810dffb5
  FireflyTeam#13 [ffff881fff943d50] vprintk_default at ffffffff810e01bf
  FireflyTeam#14 [ffff881fff943d60] vprintk_func at ffffffff810e1127
  FireflyTeam#15 [ffff881fff943d70] printk at ffffffff8119a8a4
  FireflyTeam#16 [ffff881fff943dd0] print_cpu_stall_info at ffffffff810eb78c
  rockchip-linux#17 [ffff881fff943e88] rcu_check_callbacks at ffffffff810ef133
  rockchip-linux#18 [ffff881fff943ee8] update_process_times at ffffffff810f3497
  rockchip-linux#19 [ffff881fff943f10] tick_sched_timer at ffffffff81103037
  rockchip-linux#20 [ffff881fff943f38] __hrtimer_run_queues at ffffffff810f3f38
  rockchip-linux#21 [ffff881fff943f88] hrtimer_interrupt at ffffffff810f442b

*** CPU triggering the hardlockup watchdog

  PID: 4149709  TASK: ffff88010f88c380  CPU: 26  COMMAND: "CPUThreadPool35"
   #0 [ffff883fff1059d0] machine_kexec at ffffffff8104a874
   FireflyTeam#1 [ffff883fff105a30] __crash_kexec at ffffffff811116cc
   FireflyTeam#2 [ffff883fff105af0] __crash_kexec at ffffffff81111795
   FireflyTeam#3 [ffff883fff105b08] panic at ffffffff8119a6ae
   FireflyTeam#4 [ffff883fff105b98] watchdog_overflow_callback at ffffffff81135dbd
   FireflyTeam#5 [ffff883fff105bb0] __perf_event_overflow at ffffffff81186866
   FireflyTeam#6 [ffff883fff105be8] perf_event_overflow at ffffffff81192bc4
   FireflyTeam#7 [ffff883fff105bf8] intel_pmu_handle_irq at ffffffff8100b265
   FireflyTeam#8 [ffff883fff105df8] perf_event_nmi_handler at ffffffff8100489f
   FireflyTeam#9 [ffff883fff105e58] nmi_handle at ffffffff81020653
  FireflyTeam#10 [ffff883fff105eb0] default_do_nmi at ffffffff81020b94
  FireflyTeam#11 [ffff883fff105ed0] do_nmi at ffffffff81020d32
  FireflyTeam#12 [ffff883fff105ef0] end_repeat_nmi at ffffffff81956a7e
      [exception RIP: queued_spin_lock_slowpath+248]
      RIP: ffffffff810da958  RSP: ffff883fff103e68  RFLAGS: 00000046
      RAX: 0000000000000000  RBX: 0000000000000046  RCX: 00000000006d0000
      RDX: ffff883fff49a950  RSI: 0000000000d10101  RDI: ffffffff81e54300
      RBP: ffff883fff103e80   R8: ffff883fff11a950   R9: 0000000000000000
      R10: 000000000e5873ba  R11: 000000000000010f  R12: ffffffff81e54300
      R13: 0000000000000000  R14: ffff88010f88c380  R15: ffffffff81e54300
      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
  --- <NMI exception stack> ---
  FireflyTeam#13 [ffff883fff103e68] queued_spin_lock_slowpath at ffffffff810da958
  FireflyTeam#14 [ffff883fff103e70] _raw_spin_lock_irqsave at ffffffff8195550b
  FireflyTeam#15 [ffff883fff103e88] rcu_check_callbacks at ffffffff810eed18
  FireflyTeam#16 [ffff883fff103ee8] update_process_times at ffffffff810f3497
  rockchip-linux#17 [ffff883fff103f10] tick_sched_timer at ffffffff81103037
  rockchip-linux#18 [ffff883fff103f38] __hrtimer_run_queues at ffffffff810f3f38
  rockchip-linux#19 [ffff883fff103f88] hrtimer_interrupt at ffffffff810f442b
  --- <IRQ stack> ---

Avoid spuriously triggering NMI hardlockup watchdog by touching it
from the print functions.  show_state_filter() shares the same problem
and solution.

v2: Relocate the comment to where it belongs.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
0lvin referenced this issue in free-z4u/roc-rk3328-cc-official Sep 21, 2019
when mounting an ISO filesystem sometimes (very rarely)
the system hangs because of a race condition between two tasks.

PID: 6766   TASK: ffff88007b2a6dd0  CPU: 0   COMMAND: "mount"
 #0 [ffff880078447ae0] __schedule at ffffffff8168d605
 FireflyTeam#1 [ffff880078447b48] schedule_preempt_disabled at ffffffff8168ed49
 FireflyTeam#2 [ffff880078447b58] __mutex_lock_slowpath at ffffffff8168c995
 FireflyTeam#3 [ffff880078447bb8] mutex_lock at ffffffff8168bdef
 FireflyTeam#4 [ffff880078447bd0] sr_block_ioctl at ffffffffa00b6818 [sr_mod]
 FireflyTeam#5 [ffff880078447c10] blkdev_ioctl at ffffffff812fea50
 FireflyTeam#6 [ffff880078447c70] ioctl_by_bdev at ffffffff8123a8b3
 FireflyTeam#7 [ffff880078447c90] isofs_fill_super at ffffffffa04fb1e1 [isofs]
 FireflyTeam#8 [ffff880078447da8] mount_bdev at ffffffff81202570
 FireflyTeam#9 [ffff880078447e18] isofs_mount at ffffffffa04f9828 [isofs]
FireflyTeam#10 [ffff880078447e28] mount_fs at ffffffff81202d09
FireflyTeam#11 [ffff880078447e70] vfs_kern_mount at ffffffff8121ea8f
FireflyTeam#12 [ffff880078447ea8] do_mount at ffffffff81220fee
FireflyTeam#13 [ffff880078447f28] sys_mount at ffffffff812218d6
FireflyTeam#14 [ffff880078447f80] system_call_fastpath at ffffffff81698c49
    RIP: 00007fd9ea914e9a  RSP: 00007ffd5d9bf648  RFLAGS: 00010246
    RAX: 00000000000000a5  RBX: ffffffff81698c49  RCX: 0000000000000010
    RDX: 00007fd9ec2bc210  RSI: 00007fd9ec2bc290  RDI: 00007fd9ec2bcf30
    RBP: 0000000000000000   R8: 0000000000000000   R9: 0000000000000010
    R10: 00000000c0ed0001  R11: 0000000000000206  R12: 00007fd9ec2bc040
    R13: 00007fd9eb6b2380  R14: 00007fd9ec2bc210  R15: 00007fd9ec2bcf30
    ORIG_RAX: 00000000000000a5  CS: 0033  SS: 002b

This task was trying to mount the cdrom.  It allocated and configured a
super_block struct and owned the write-lock for the super_block->s_umount
rwsem. While exclusively owning the s_umount lock, it called
sr_block_ioctl and waited to acquire the global sr_mutex lock.

PID: 6785   TASK: ffff880078720fb0  CPU: 0   COMMAND: "systemd-udevd"
 #0 [ffff880078417898] __schedule at ffffffff8168d605
 FireflyTeam#1 [ffff880078417900] schedule at ffffffff8168dc59
 FireflyTeam#2 [ffff880078417910] rwsem_down_read_failed at ffffffff8168f605
 FireflyTeam#3 [ffff880078417980] call_rwsem_down_read_failed at ffffffff81328838
 FireflyTeam#4 [ffff8800784179d0] down_read at ffffffff8168cde0
 FireflyTeam#5 [ffff8800784179e8] get_super at ffffffff81201cc7
 FireflyTeam#6 [ffff880078417a10] __invalidate_device at ffffffff8123a8de
 FireflyTeam#7 [ffff880078417a40] flush_disk at ffffffff8123a94b
 FireflyTeam#8 [ffff880078417a88] check_disk_change at ffffffff8123ab50
 FireflyTeam#9 [ffff880078417ab0] cdrom_open at ffffffffa00a29e1 [cdrom]
FireflyTeam#10 [ffff880078417b68] sr_block_open at ffffffffa00b6f9b [sr_mod]
FireflyTeam#11 [ffff880078417b98] __blkdev_get at ffffffff8123ba86
FireflyTeam#12 [ffff880078417bf0] blkdev_get at ffffffff8123bd65
FireflyTeam#13 [ffff880078417c78] blkdev_open at ffffffff8123bf9b
FireflyTeam#14 [ffff880078417c90] do_dentry_open at ffffffff811fc7f7
FireflyTeam#15 [ffff880078417cd8] vfs_open at ffffffff811fc9cf
FireflyTeam#16 [ffff880078417d00] do_last at ffffffff8120d53d
rockchip-linux#17 [ffff880078417db0] path_openat at ffffffff8120e6b2
rockchip-linux#18 [ffff880078417e48] do_filp_open at ffffffff8121082b
rockchip-linux#19 [ffff880078417f18] do_sys_open at ffffffff811fdd33
rockchip-linux#20 [ffff880078417f70] sys_open at ffffffff811fde4e
rockchip-linux#21 [ffff880078417f80] system_call_fastpath at ffffffff81698c49
    RIP: 00007f29438b0c20  RSP: 00007ffc76624b78  RFLAGS: 00010246
    RAX: 0000000000000002  RBX: ffffffff81698c49  RCX: 0000000000000000
    RDX: 00007f2944a5fa70  RSI: 00000000000a0800  RDI: 00007f2944a5fa70
    RBP: 00007f2944a5f540   R8: 0000000000000000   R9: 0000000000000020
    R10: 00007f2943614c40  R11: 0000000000000246  R12: ffffffff811fde4e
    R13: ffff880078417f78  R14: 000000000000000c  R15: 00007f2944a4b010
    ORIG_RAX: 0000000000000002  CS: 0033  SS: 002b

This task tried to open the cdrom device, the sr_block_open function
acquired the global sr_mutex lock. The call to check_disk_change()
then saw an event flag indicating a possible media change and tried
to flush any cached data for the device.
As part of the flush, it tried to acquire the super_block->s_umount
lock associated with the cdrom device.
This was the same super_block as created and locked by the previous task.

The first task acquires the s_umount lock and then the sr_mutex_lock;
the second task acquires the sr_mutex_lock and then the s_umount lock.

This patch fixes the issue by moving check_disk_change() out of
cdrom_open() and let the caller take care of it.

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
0lvin referenced this issue in free-z4u/roc-rk3328-cc-official Sep 21, 2019
Both cm-x300 and pxa3xx-ulpi use the plain clk_{en,dis}able() API.
With the new clocking framework this results in warnings of type:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at drivers/clk/clk.c:714 clk_core_enable+0x90/0x9c
Modules linked in:
CPU: 0 PID: 1 Comm: swapper Not tainted 4.15.0-rc5-cm-x300+ FireflyTeam#15
Hardware name: CM-X300 module
[<c001007c>] (unwind_backtrace) from [<c000df94>] (show_stack+0x10/0x14)
[<c000df94>] (show_stack) from [<c00199a8>] (__warn+0xd8/0x100)
[<c00199a8>] (__warn) from [<c0019a0c>] (warn_slowpath_null+0x3c/0x48)
[<c0019a0c>] (warn_slowpath_null) from [<c024e8c0>] (clk_core_enable+0x90/0x9c)
[<c024e8c0>] (clk_core_enable) from [<c024ea54>] (clk_core_enable_lock+0x18/0x2c)
[<c024ea54>] (clk_core_enable_lock) from [<c0016994>] (cm_x300_u2d_init+0x4c/0xe8)
[<c0016994>] (cm_x300_u2d_init) from [<c00163e0>] (pxa3xx_u2d_probe+0xe0/0x244)
[<c00163e0>] (pxa3xx_u2d_probe) from [<c0283de0>] (platform_drv_probe+0x38/0x88)
...
------------[ cut here ]------------
and alike...

And finally, it results in:
------------[ cut here ]------------
pxa310_ulpi_poll: ULPI access timed out!
OTG transceiver init failed
------------[ cut here ]------------

It might be that disabling the warning in kernel config would also do
the job, but IMO a better solution would be to switch to
clk_prepare_enable() and clk_disable_unprepare() APIs.

Signed-off-by: Igor Grinberg <grinberg@compulab.co.il>
Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
0lvin referenced this issue in free-z4u/roc-rk3328-cc-official Sep 29, 2019
[ Upstream commit 36a2ba0 ]

In a system where, through IORT firmware mappings, the SMMU device is
mapped to a NUMA node that is not online, the kernel bootstrap results
in the following crash:

  Unable to handle kernel paging request at virtual address 0000000000001388
  Mem abort info:
    ESR = 0x96000004
    Exception class = DABT (current EL), IL = 32 bits
    SET = 0, FnV = 0
    EA = 0, S1PTW = 0
  Data abort info:
    ISV = 0, ISS = 0x00000004
    CM = 0, WnR = 0
  [0000000000001388] user address but active_mm is swapper
  Internal error: Oops: 96000004 [FireflyTeam#1] SMP
  Modules linked in:
  CPU: 5 PID: 1 Comm: swapper/0 Not tainted 5.0.0 FireflyTeam#15
  pstate: 80c00009 (Nzcv daif +PAN +UAO)
  pc : __alloc_pages_nodemask+0x13c/0x1068
  lr : __alloc_pages_nodemask+0xdc/0x1068
  ...
  Process swapper/0 (pid: 1, stack limit = 0x(____ptrval____))
  Call trace:
   __alloc_pages_nodemask+0x13c/0x1068
   new_slab+0xec/0x570
   ___slab_alloc+0x3e0/0x4f8
   __slab_alloc+0x60/0x80
   __kmalloc_node_track_caller+0x10c/0x478
   devm_kmalloc+0x44/0xb0
   pinctrl_bind_pins+0x4c/0x188
   really_probe+0x78/0x2b8
   driver_probe_device+0x64/0x110
   device_driver_attach+0x74/0x98
   __driver_attach+0x9c/0xe8
   bus_for_each_dev+0x84/0xd8
   driver_attach+0x30/0x40
   bus_add_driver+0x170/0x218
   driver_register+0x64/0x118
   __platform_driver_register+0x54/0x60
   arm_smmu_driver_init+0x24/0x2c
   do_one_initcall+0xbc/0x328
   kernel_init_freeable+0x304/0x3ac
   kernel_init+0x18/0x110
   ret_from_fork+0x10/0x1c
  Code: f90013b5 b9410fa1 1a9f0694 b50014c2 (b9400804)
  ---[ end trace dfeaed4c373a32da ]--

Change the dev_set_proximity() hook prototype so that it returns a
value and make it return failure if the PXM->NUMA-node mapping
corresponds to an offline node, fixing the crash.

Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Link: https://lore.kernel.org/linux-arm-kernel/20190315021940.86905-1-wangkefeng.wang@huawei.com/
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
0lvin referenced this issue in free-z4u/roc-rk3328-cc-official Sep 29, 2019
commit 30d4057 upstream.

[BUG]
When a fs has orphan reloc tree along with unfinished balance:
  ...
        item 16 key (TREE_RELOC ROOT_ITEM FS_TREE) itemoff 12090 itemsize 439
                generation 12 root_dirid 256 bytenr 300400640 level 1 refs 0 <<<
                lastsnap 8 byte_limit 0 bytes_used 1359872 flags 0x0(none)
                uuid 7c48d938-33a3-4aae-ab19-6e5c9d406e46
        item 17 key (BALANCE TEMPORARY_ITEM 0) itemoff 11642 itemsize 448
                temporary item objectid BALANCE offset 0
                balance status flags 14

Then at mount time, we can hit the following kernel BUG_ON():
  BTRFS info (device dm-3): relocating block group 298844160 flags metadata|dup
  ------------[ cut here ]------------
  kernel BUG at fs/btrfs/relocation.c:1413!
  invalid opcode: 0000 [FireflyTeam#1] PREEMPT SMP NOPTI
  CPU: 1 PID: 897 Comm: btrfs-balance Tainted: G           O      5.2.0-rc1-custom FireflyTeam#15
  RIP: 0010:create_reloc_root+0x1eb/0x200 [btrfs]
  Call Trace:
   btrfs_init_reloc_root+0x96/0xb0 [btrfs]
   record_root_in_trans+0xb2/0xe0 [btrfs]
   btrfs_record_root_in_trans+0x55/0x70 [btrfs]
   select_reloc_root+0x7e/0x230 [btrfs]
   do_relocation+0xc4/0x620 [btrfs]
   relocate_tree_blocks+0x592/0x6a0 [btrfs]
   relocate_block_group+0x47b/0x5d0 [btrfs]
   btrfs_relocate_block_group+0x183/0x2f0 [btrfs]
   btrfs_relocate_chunk+0x4e/0xe0 [btrfs]
   btrfs_balance+0x864/0xfa0 [btrfs]
   balance_kthread+0x3b/0x50 [btrfs]
   kthread+0x123/0x140
   ret_from_fork+0x27/0x50

[CAUSE]
In btrfs, reloc trees are used to record swapped tree blocks during
balance.
Reloc tree either get merged (replace old tree blocks of its parent
subvolume) in next transaction if its ref is 1 (fresh).
Or is already merged and will be cleaned up if its ref is 0 (orphan).

After commit d2311e6 ("btrfs: relocation: Delay reloc tree deletion
after merge_reloc_roots"), reloc tree cleanup is delayed until one block
group is balanced.

Since fresh reloc roots are recorded during merge, as long as there
is no power loss, those orphan reloc roots converted from fresh ones are
handled without problem.

However when power loss happens, orphan reloc roots can be recorded
on-disk, thus at next mount time, we will have orphan reloc roots from
on-disk data directly, and ignored by clean_dirty_subvols() routine.

Then when background balance starts to balance another block group, and
needs to create new reloc root for the same root, btrfs_insert_item()
returns -EEXIST, and trigger that BUG_ON().

[FIX]
For orphan reloc roots, also queue them to rc->dirty_subvol_roots, so
all reloc roots no matter orphan or not, can be cleaned up properly and
avoid above BUG_ON().

And to cooperate with above change, clean_dirty_subvols() will check if
the queued root is a reloc root or a subvol root.
For a subvol root, do the old work, and for a orphan reloc root, clean it
up.

Fixes: d2311e6 ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots")
CC: stable@vger.kernel.org # 5.1
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
0lvin referenced this issue in free-z4u/roc-rk3328-cc-official Sep 29, 2019
commit f01098c upstream.

Just like the case of commit 8b05a3a ("tracing/kprobes: Fix NULL
pointer dereference in trace_kprobe_create()"), writing an incorrectly
formatted string to uprobe_events can trigger NULL pointer dereference.

Reporeducer:

  # echo r > /sys/kernel/debug/tracing/uprobe_events

dmesg:

  BUG: kernel NULL pointer dereference, address: 0000000000000000
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 8000000079d12067 P4D 8000000079d12067 PUD 7b7ab067 PMD 0
  Oops: 0000 [FireflyTeam#1] PREEMPT SMP PTI
  CPU: 0 PID: 1903 Comm: bash Not tainted 5.2.0-rc3+ FireflyTeam#15
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
  RIP: 0010:strchr+0x0/0x30
  Code: c0 eb 0d 84 c9 74 18 48 83 c0 01 48 39 d0 74 0f 0f b6 0c 07 3a 0c 06 74 ea 19 c0 83 c8 01 c3 31 c0 c3 0f 1f 84 00 00 00 00 00 <0f> b6 07 89 f2 40 38 f0 75 0e eb 13 0f b6 47 01 48 83 c
  RSP: 0018:ffffb55fc0403d10 EFLAGS: 00010293

  RAX: ffff993ffb793400 RBX: 0000000000000000 RCX: ffffffffa4852625
  RDX: 0000000000000000 RSI: 000000000000002f RDI: 0000000000000000
  RBP: ffffb55fc0403dd0 R08: ffff993ffb793400 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
  R13: ffff993ff9cc1668 R14: 0000000000000001 R15: 0000000000000000
  FS:  00007f30c5147700(0000) GS:ffff993ffda00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000000 CR3: 000000007b628000 CR4: 00000000000006f0
  Call Trace:
   trace_uprobe_create+0xe6/0xb10
   ? __kmalloc_track_caller+0xe6/0x1c0
   ? __kmalloc+0xf0/0x1d0
   ? trace_uprobe_create+0xb10/0xb10
   create_or_delete_trace_uprobe+0x35/0x90
   ? trace_uprobe_create+0xb10/0xb10
   trace_run_command+0x9c/0xb0
   trace_parse_run_command+0xf9/0x1eb
   ? probes_open+0x80/0x80
   __vfs_write+0x43/0x90
   vfs_write+0x14a/0x2a0
   ksys_write+0xa2/0x170
   do_syscall_64+0x7f/0x200
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

Link: http://lkml.kernel.org/r/20190614074026.8045-1-devel@etsukata.com

Cc: stable@vger.kernel.org
Fixes: 0597c49 ("tracing/uprobes: Use dyn_event framework for uprobe events")
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Eiichi Tsukata <devel@etsukata.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
0lvin referenced this issue in free-z4u/roc-rk3328-cc-official Oct 5, 2019
Crash dump shows following instructions

crash> bt
PID: 0      TASK: ffffffffbe412480  CPU: 0   COMMAND: "swapper/0"
 #0 [ffff891ee0003868] machine_kexec at ffffffffbd063ef1
 FireflyTeam#1 [ffff891ee00038c8] __crash_kexec at ffffffffbd12b6f2
 FireflyTeam#2 [ffff891ee0003998] crash_kexec at ffffffffbd12c84c
 FireflyTeam#3 [ffff891ee00039b8] oops_end at ffffffffbd030f0a
 FireflyTeam#4 [ffff891ee00039e0] no_context at ffffffffbd074643
 FireflyTeam#5 [ffff891ee0003a40] __bad_area_nosemaphore at ffffffffbd07496e
 FireflyTeam#6 [ffff891ee0003a90] bad_area_nosemaphore at ffffffffbd074a64
 FireflyTeam#7 [ffff891ee0003aa0] __do_page_fault at ffffffffbd074b0a
 FireflyTeam#8 [ffff891ee0003b18] do_page_fault at ffffffffbd074fc8
 FireflyTeam#9 [ffff891ee0003b50] page_fault at ffffffffbda01925
    [exception RIP: qlt_schedule_sess_for_deletion+15]
    RIP: ffffffffc02e526f  RSP: ffff891ee0003c08  RFLAGS: 00010046
    RAX: 0000000000000000  RBX: 0000000000000000  RCX: ffffffffc0307847
    RDX: 00000000000020e6  RSI: ffff891edbc377c8  RDI: 0000000000000000
    RBP: ffff891ee0003c18   R8: ffffffffc02f0b20   R9: 0000000000000250
    R10: 0000000000000258  R11: 000000000000b780  R12: ffff891ed9b43000
    R13: 00000000000000f0  R14: 0000000000000006  R15: ffff891edbc377c8
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 FireflyTeam#10 [ffff891ee0003c20] qla2x00_fcport_event_handler at ffffffffc02853d3 [qla2xxx]
 FireflyTeam#11 [ffff891ee0003cf0] __dta_qla24xx_async_gnl_sp_done_333 at ffffffffc0285a1d [qla2xxx]
 FireflyTeam#12 [ffff891ee0003de8] qla24xx_process_response_queue at ffffffffc02a2eb5 [qla2xxx]
 FireflyTeam#13 [ffff891ee0003e88] qla24xx_msix_rsp_q at ffffffffc02a5403 [qla2xxx]
 FireflyTeam#14 [ffff891ee0003ec0] __handle_irq_event_percpu at ffffffffbd0f4c59
 FireflyTeam#15 [ffff891ee0003f10] handle_irq_event_percpu at ffffffffbd0f4e02
 FireflyTeam#16 [ffff891ee0003f40] handle_irq_event at ffffffffbd0f4e90
 rockchip-linux#17 [ffff891ee0003f68] handle_edge_irq at ffffffffbd0f8984
 rockchip-linux#18 [ffff891ee0003f88] handle_irq at ffffffffbd0305d5
 rockchip-linux#19 [ffff891ee0003fb8] do_IRQ at ffffffffbda02a18
 --- <IRQ stack> ---
 rockchip-linux#20 [ffffffffbe403d30] ret_from_intr at ffffffffbda0094e
    [exception RIP: unknown or invalid address]
    RIP: 000000000000001f  RSP: 0000000000000000  RFLAGS: fff3b8c2091ebb3f
    RAX: ffffbba5a0000200  RBX: 0000be8cdfa8f9fa  RCX: 0000000000000018
    RDX: 0000000000000101  RSI: 000000000000015d  RDI: 0000000000000193
    RBP: 0000000000000083   R8: ffffffffbe403e38   R9: 0000000000000002
    R10: 0000000000000000  R11: ffffffffbe56b820  R12: ffff891ee001cf00
    R13: ffffffffbd11c0a4  R14: ffffffffbe403d60  R15: 0000000000000001
    ORIG_RAX: ffff891ee0022ac0  CS: 0000  SS: ffffffffffffffb9
 bt: WARNING: possibly bogus exception frame
 rockchip-linux#21 [ffffffffbe403dd8] cpuidle_enter_state at ffffffffbd67c6fd
 rockchip-linux#22 [ffffffffbe403e40] cpuidle_enter at ffffffffbd67c907
 rockchip-linux#23 [ffffffffbe403e50] call_cpuidle at ffffffffbd0d98f3
 rockchip-linux#24 [ffffffffbe403e60] do_idle at ffffffffbd0d9b42
 rockchip-linux#25 [ffffffffbe403e98] cpu_startup_entry at ffffffffbd0d9da3
 rockchip-linux#26 [ffffffffbe403ec0] rest_init at ffffffffbd81d4aa
 rockchip-linux#27 [ffffffffbe403ed0] start_kernel at ffffffffbe67d2ca
 rockchip-linux#28 [ffffffffbe403f28] x86_64_start_reservations at ffffffffbe67c675
 rockchip-linux#29 [ffffffffbe403f38] x86_64_start_kernel at ffffffffbe67c6eb
 rockchip-linux#30 [ffffffffbe403f50] secondary_startup_64 at ffffffffbd0000d5

Fixes: 040036b ("scsi: qla2xxx: Delay loop id allocation at login")
Cc: <stable@vger.kernel.org> # v4.17+
Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
fanck0605 pushed a commit to fanck0605/friendlywrt-kernel that referenced this issue Apr 27, 2020
[ Upstream commit 1bc7896 ]

When experimenting with bpf_send_signal() helper in our production
environment (5.2 based), we experienced a deadlock in NMI mode:
   friendlyarm#5 [ffffc9002219f770] queued_spin_lock_slowpath at ffffffff8110be24
   friendlyarm#6 [ffffc9002219f770] _raw_spin_lock_irqsave at ffffffff81a43012
   rockchip-linux#7 [ffffc9002219f780] try_to_wake_up at ffffffff810e7ecd
   rockchip-linux#8 [ffffc9002219f7e0] signal_wake_up_state at ffffffff810c7b55
   rockchip-linux#9 [ffffc9002219f7f0] __send_signal at ffffffff810c8602
  rockchip-linux#10 [ffffc9002219f830] do_send_sig_info at ffffffff810ca31a
  rockchip-linux#11 [ffffc9002219f868] bpf_send_signal at ffffffff8119d227
  rockchip-linux#12 [ffffc9002219f988] bpf_overflow_handler at ffffffff811d4140
  rockchip-linux#13 [ffffc9002219f9e0] __perf_event_overflow at ffffffff811d68cf
  rockchip-linux#14 [ffffc9002219fa10] perf_swevent_overflow at ffffffff811d6a09
  rockchip-linux#15 [ffffc9002219fa38] ___perf_sw_event at ffffffff811e0f47
  rockchip-linux#16 [ffffc9002219fc30] __schedule at ffffffff81a3e04d
  rockchip-linux#17 [ffffc9002219fc90] schedule at ffffffff81a3e219
  rockchip-linux#18 [ffffc9002219fca0] futex_wait_queue_me at ffffffff8113d1b9
  rockchip-linux#19 [ffffc9002219fcd8] futex_wait at ffffffff8113e529
  rockchip-linux#20 [ffffc9002219fdf0] do_futex at ffffffff8113ffbc
  rockchip-linux#21 [ffffc9002219fec0] __x64_sys_futex at ffffffff81140d1c
  rockchip-linux#22 [ffffc9002219ff38] do_syscall_64 at ffffffff81002602
  rockchip-linux#23 [ffffc9002219ff50] entry_SYSCALL_64_after_hwframe at ffffffff81c00068

The above call stack is actually very similar to an issue
reported by Commit eac9153 ("bpf/stackmap: Fix deadlock with
rq_lock in bpf_get_stack()") by Song Liu. The only difference is
bpf_send_signal() helper instead of bpf_get_stack() helper.

The above deadlock is triggered with a perf_sw_event.
Similar to Commit eac9153, the below almost identical reproducer
used tracepoint point sched/sched_switch so the issue can be easily caught.
  /* stress_test.c */
  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/mman.h>
  #include <pthread.h>
  #include <sys/types.h>
  #include <sys/stat.h>
  #include <fcntl.h>

  #define THREAD_COUNT 1000
  char *filename;
  void *worker(void *p)
  {
        void *ptr;
        int fd;
        char *pptr;

        fd = open(filename, O_RDONLY);
        if (fd < 0)
                return NULL;
        while (1) {
                struct timespec ts = {0, 1000 + rand() % 2000};

                ptr = mmap(NULL, 4096 * 64, PROT_READ, MAP_PRIVATE, fd, 0);
                usleep(1);
                if (ptr == MAP_FAILED) {
                        printf("failed to mmap\n");
                        break;
                }
                munmap(ptr, 4096 * 64);
                usleep(1);
                pptr = malloc(1);
                usleep(1);
                pptr[0] = 1;
                usleep(1);
                free(pptr);
                usleep(1);
                nanosleep(&ts, NULL);
        }
        close(fd);
        return NULL;
  }

  int main(int argc, char *argv[])
  {
        void *ptr;
        int i;
        pthread_t threads[THREAD_COUNT];

        if (argc < 2)
                return 0;

        filename = argv[1];

        for (i = 0; i < THREAD_COUNT; i++) {
                if (pthread_create(threads + i, NULL, worker, NULL)) {
                        fprintf(stderr, "Error creating thread\n");
                        return 0;
                }
        }

        for (i = 0; i < THREAD_COUNT; i++)
                pthread_join(threads[i], NULL);
        return 0;
  }
and the following command:
  1. run `stress_test /bin/ls` in one windown
  2. hack bcc trace.py with the following change:
#     --- a/tools/trace.py
#     +++ b/tools/trace.py
     @@ -513,6 +513,7 @@ BPF_PERF_OUTPUT(%s);
              __data.tgid = __tgid;
              __data.pid = __pid;
              bpf_get_current_comm(&__data.comm, sizeof(__data.comm));
     +        bpf_send_signal(10);
      %s
      %s
              %s.perf_submit(%s, &__data, sizeof(__data));
  3. in a different window run
     ./trace.py -p $(pidof stress_test) t:sched:sched_switch

The deadlock can be reproduced in our production system.

Similar to Song's fix, the fix is to delay sending signal if
irqs is disabled to avoid deadlocks involving with rq_lock.
With this change, my above stress-test in our production system
won't cause deadlock any more.

I also implemented a scale-down version of reproducer in the
selftest (a subsequent commit). With latest bpf-next,
it complains for the following potential deadlock.
  [   32.832450] -> friendlyarm#1 (&p->pi_lock){-.-.}:
  [   32.833100]        _raw_spin_lock_irqsave+0x44/0x80
  [   32.833696]        task_rq_lock+0x2c/0xa0
  [   32.834182]        task_sched_runtime+0x59/0xd0
  [   32.834721]        thread_group_cputime+0x250/0x270
  [   32.835304]        thread_group_cputime_adjusted+0x2e/0x70
  [   32.835959]        do_task_stat+0x8a7/0xb80
  [   32.836461]        proc_single_show+0x51/0xb0
  ...
  [   32.839512] -> #0 (&(&sighand->siglock)->rlock){....}:
  [   32.840275]        __lock_acquire+0x1358/0x1a20
  [   32.840826]        lock_acquire+0xc7/0x1d0
  [   32.841309]        _raw_spin_lock_irqsave+0x44/0x80
  [   32.841916]        __lock_task_sighand+0x79/0x160
  [   32.842465]        do_send_sig_info+0x35/0x90
  [   32.842977]        bpf_send_signal+0xa/0x10
  [   32.843464]        bpf_prog_bc13ed9e4d3163e3_send_signal_tp_sched+0x465/0x1000
  [   32.844301]        trace_call_bpf+0x115/0x270
  [   32.844809]        perf_trace_run_bpf_submit+0x4a/0xc0
  [   32.845411]        perf_trace_sched_switch+0x10f/0x180
  [   32.846014]        __schedule+0x45d/0x880
  [   32.846483]        schedule+0x5f/0xd0
  ...

  [   32.853148] Chain exists of:
  [   32.853148]   &(&sighand->siglock)->rlock --> &p->pi_lock --> &rq->lock
  [   32.853148]
  [   32.854451]  Possible unsafe locking scenario:
  [   32.854451]
  [   32.855173]        CPU0                    CPU1
  [   32.855745]        ----                    ----
  [   32.856278]   lock(&rq->lock);
  [   32.856671]                                lock(&p->pi_lock);
  [   32.857332]                                lock(&rq->lock);
  [   32.857999]   lock(&(&sighand->siglock)->rlock);

  Deadlock happens on CPU0 when it tries to acquire &sighand->siglock
  but it has been held by CPU1 and CPU1 tries to grab &rq->lock
  and cannot get it.

  This is not exactly the callstack in our production environment,
  but sympotom is similar and both locks are using spin_lock_irqsave()
  to acquire the lock, and both involves rq_lock. The fix to delay
  sending signal when irq is disabled also fixed this issue.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200304191104.2796501-1-yhs@fb.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
fanck0605 pushed a commit to fanck0605/friendlywrt-kernel that referenced this issue May 31, 2020
commit f8f482d upstream.

When the kernel is built with lockdep support and the owl-dma driver is
used, the following message is shown:

[    2.496939] INFO: trying to register non-static key.
[    2.501889] the code is fine but needs lockdep annotation.
[    2.507357] turning off the locking correctness validator.
[    2.512834] CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.6.3+ rockchip-linux#15
[    2.519084] Hardware name: Generic DT based system
[    2.523878] Workqueue: events_freezable mmc_rescan
[    2.528681] [<801127f0>] (unwind_backtrace) from [<8010da58>] (show_stack+0x10/0x14)
[    2.536420] [<8010da58>] (show_stack) from [<8080fbe8>] (dump_stack+0xb4/0xe0)
[    2.543645] [<8080fbe8>] (dump_stack) from [<8017efa4>] (register_lock_class+0x6f0/0x718)
[    2.551816] [<8017efa4>] (register_lock_class) from [<8017b7d0>] (__lock_acquire+0x78/0x25f0)
[    2.560330] [<8017b7d0>] (__lock_acquire) from [<8017e5e4>] (lock_acquire+0xd8/0x1f4)
[    2.568159] [<8017e5e4>] (lock_acquire) from [<80831fb0>] (_raw_spin_lock_irqsave+0x3c/0x50)
[    2.576589] [<80831fb0>] (_raw_spin_lock_irqsave) from [<8051b5fc>] (owl_dma_issue_pending+0xbc/0x120)
[    2.585884] [<8051b5fc>] (owl_dma_issue_pending) from [<80668cbc>] (owl_mmc_request+0x1b0/0x390)
[    2.594655] [<80668cbc>] (owl_mmc_request) from [<80650ce0>] (mmc_start_request+0x94/0xbc)
[    2.602906] [<80650ce0>] (mmc_start_request) from [<80650ec0>] (mmc_wait_for_req+0x64/0xd0)
[    2.611245] [<80650ec0>] (mmc_wait_for_req) from [<8065aa10>] (mmc_app_send_scr+0x10c/0x144)
[    2.619669] [<8065aa10>] (mmc_app_send_scr) from [<80659b3c>] (mmc_sd_setup_card+0x4c/0x318)
[    2.628092] [<80659b3c>] (mmc_sd_setup_card) from [<80659f0c>] (mmc_sd_init_card+0x104/0x430)
[    2.636601] [<80659f0c>] (mmc_sd_init_card) from [<8065a3e0>] (mmc_attach_sd+0xcc/0x16c)
[    2.644678] [<8065a3e0>] (mmc_attach_sd) from [<8065301c>] (mmc_rescan+0x3ac/0x40c)
[    2.652332] [<8065301c>] (mmc_rescan) from [<80143244>] (process_one_work+0x2d8/0x780)
[    2.660239] [<80143244>] (process_one_work) from [<80143730>] (worker_thread+0x44/0x598)
[    2.668323] [<80143730>] (worker_thread) from [<8014b5f8>] (kthread+0x148/0x150)
[    2.675708] [<8014b5f8>] (kthread) from [<801010b4>] (ret_from_fork+0x14/0x20)
[    2.682912] Exception stack(0xee8fdfb0 to 0xee8fdff8)
[    2.687954] dfa0:                                     00000000 00000000 00000000 00000000
[    2.696118] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    2.704277] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000

The obvious fix would be to use 'spin_lock_init()' on 'pchan->lock'
before attempting to call 'spin_lock_irqsave()' in 'owl_dma_get_pchan()'.

However, according to Manivannan Sadhasivam, 'pchan->lock' was supposed
to only protect 'pchan->vchan' while 'od->lock' does a similar job in
'owl_dma_terminate_pchan()'.

Therefore, this patch substitutes 'pchan->lock' with 'od->lock' and
removes the 'lock' attribute in 'owl_dma_pchan' struct.

Fixes: 47e2057 ("dmaengine: Add Actions Semi Owl family S900 DMA driver")
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@gmail.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Acked-by: Andreas Färber <afaerber@suse.de>
Link: https://lore.kernel.org/r/c6e6cdaca252b5364bd294093673951036488cf0.1588439073.git.cristian.ciocaltea@gmail.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
friendlyarm pushed a commit to friendlyarm/kernel-rockchip that referenced this issue Aug 31, 2020
[ Upstream commit e24c644 ]

I compiled with AddressSanitizer and I had these memory leaks while I
was using the tep_parse_format function:

    Direct leak of 28 byte(s) in 4 object(s) allocated from:
        #0 0x7fb07db49ffe in __interceptor_realloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dffe)
        #1 0x7fb07a724228 in extend_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:985
        #2 0x7fb07a724c21 in __read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1140
        #3 0x7fb07a724f78 in read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1206
        #4 0x7fb07a725191 in __read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1291
        #5 0x7fb07a7251df in read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1299
        #6 0x7fb07a72e6c8 in process_dynamic_array_len /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:2849
        rockchip-linux#7 0x7fb07a7304b8 in process_function /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3161
        rockchip-linux#8 0x7fb07a730900 in process_arg_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3207
        rockchip-linux#9 0x7fb07a727c0b in process_arg /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1786
        rockchip-linux#10 0x7fb07a731080 in event_read_print_args /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3285
        rockchip-linux#11 0x7fb07a731722 in event_read_print /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3369
        rockchip-linux#12 0x7fb07a740054 in __tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6335
        rockchip-linux#13 0x7fb07a74047a in __parse_event /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6389
        rockchip-linux#14 0x7fb07a740536 in tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6431
        rockchip-linux#15 0x7fb07a785acf in parse_event ../../../src/fs-src/fs.c:251
        rockchip-linux#16 0x7fb07a785ccd in parse_systems ../../../src/fs-src/fs.c:284
        rockchip-linux#17 0x7fb07a786fb3 in read_metadata ../../../src/fs-src/fs.c:593
        rockchip-linux#18 0x7fb07a78760e in ftrace_fs_source_init ../../../src/fs-src/fs.c:727
        rockchip-linux#19 0x7fb07d90c19c in add_component_with_init_method_data ../../../../src/lib/graph/graph.c:1048
        rockchip-linux#20 0x7fb07d90c87b in add_source_component_with_initialize_method_data ../../../../src/lib/graph/graph.c:1127
        rockchip-linux#21 0x7fb07d90c92a in bt_graph_add_source_component ../../../../src/lib/graph/graph.c:1152
        rockchip-linux#22 0x55db11aa632e in cmd_run_ctx_create_components_from_config_components ../../../src/cli/babeltrace2.c:2252
        rockchip-linux#23 0x55db11aa6fda in cmd_run_ctx_create_components ../../../src/cli/babeltrace2.c:2347
        rockchip-linux#24 0x55db11aa780c in cmd_run ../../../src/cli/babeltrace2.c:2461
        rockchip-linux#25 0x55db11aa8a7d in main ../../../src/cli/babeltrace2.c:2673
        rockchip-linux#26 0x7fb07d5460b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)

The token variable in the process_dynamic_array_len function is
allocated in the read_expect_type function, but is not freed before
calling the read_token function.

Free the token variable before calling read_token in order to plug the
leak.

Signed-off-by: Philippe Duplessis-Guindon <pduplessis@efficios.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lore.kernel.org/linux-trace-devel/20200730150236.5392-1-pduplessis@efficios.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
friendlyarm pushed a commit to friendlyarm/kernel-rockchip that referenced this issue Apr 2, 2021
commit 4d14c5c upstream

Calling btrfs_qgroup_reserve_meta_prealloc from
btrfs_delayed_inode_reserve_metadata can result in flushing delalloc
while holding a transaction and delayed node locks. This is deadlock
prone. In the past multiple commits:

 * ae5e070 ("btrfs: qgroup: don't try to wait flushing if we're
already holding a transaction")

 * 6f23277 ("btrfs: qgroup: don't commit transaction when we already
 hold the handle")

Tried to solve various aspects of this but this was always a
whack-a-mole game. Unfortunately those 2 fixes don't solve a deadlock
scenario involving btrfs_delayed_node::mutex. Namely, one thread
can call btrfs_dirty_inode as a result of reading a file and modifying
its atime:

  PID: 6963   TASK: ffff8c7f3f94c000  CPU: 2   COMMAND: "test"
  #0  __schedule at ffffffffa529e07d
  #1  schedule at ffffffffa529e4ff
  #2  schedule_timeout at ffffffffa52a1bdd
  #3  wait_for_completion at ffffffffa529eeea             <-- sleeps with delayed node mutex held
  #4  start_delalloc_inodes at ffffffffc0380db5
  #5  btrfs_start_delalloc_snapshot at ffffffffc0393836
  #6  try_flush_qgroup at ffffffffc03f04b2
  rockchip-linux#7  __btrfs_qgroup_reserve_meta at ffffffffc03f5bb6     <-- tries to reserve space and starts delalloc inodes.
  rockchip-linux#8  btrfs_delayed_update_inode at ffffffffc03e31aa      <-- acquires delayed node mutex
  rockchip-linux#9  btrfs_update_inode at ffffffffc0385ba8
 rockchip-linux#10  btrfs_dirty_inode at ffffffffc038627b               <-- TRANSACTIION OPENED
 rockchip-linux#11  touch_atime at ffffffffa4cf0000
 rockchip-linux#12  generic_file_read_iter at ffffffffa4c1f123
 rockchip-linux#13  new_sync_read at ffffffffa4ccdc8a
 rockchip-linux#14  vfs_read at ffffffffa4cd0849
 rockchip-linux#15  ksys_read at ffffffffa4cd0bd1
 rockchip-linux#16  do_syscall_64 at ffffffffa4a052eb
 rockchip-linux#17  entry_SYSCALL_64_after_hwframe at ffffffffa540008c

This will cause an asynchronous work to flush the delalloc inodes to
happen which can try to acquire the same delayed_node mutex:

  PID: 455    TASK: ffff8c8085fa4000  CPU: 5   COMMAND: "kworker/u16:30"
  #0  __schedule at ffffffffa529e07d
  #1  schedule at ffffffffa529e4ff
  #2  schedule_preempt_disabled at ffffffffa529e80a
  #3  __mutex_lock at ffffffffa529fdcb                    <-- goes to sleep, never wakes up.
  #4  btrfs_delayed_update_inode at ffffffffc03e3143      <-- tries to acquire the mutex
  #5  btrfs_update_inode at ffffffffc0385ba8              <-- this is the same inode that pid 6963 is holding
  #6  cow_file_range_inline.constprop.78 at ffffffffc0386be7
  rockchip-linux#7  cow_file_range at ffffffffc03879c1
  rockchip-linux#8  btrfs_run_delalloc_range at ffffffffc038894c
  rockchip-linux#9  writepage_delalloc at ffffffffc03a3c8f
 rockchip-linux#10  __extent_writepage at ffffffffc03a4c01
 rockchip-linux#11  extent_write_cache_pages at ffffffffc03a500b
 rockchip-linux#12  extent_writepages at ffffffffc03a6de2
 rockchip-linux#13  do_writepages at ffffffffa4c277eb
 rockchip-linux#14  __filemap_fdatawrite_range at ffffffffa4c1e5bb
 rockchip-linux#15  btrfs_run_delalloc_work at ffffffffc0380987         <-- starts running delayed nodes
 rockchip-linux#16  normal_work_helper at ffffffffc03b706c
 rockchip-linux#17  process_one_work at ffffffffa4aba4e4
 rockchip-linux#18  worker_thread at ffffffffa4aba6fd
 rockchip-linux#19  kthread at ffffffffa4ac0a3d
 rockchip-linux#20  ret_from_fork at ffffffffa54001ff

To fully address those cases the complete fix is to never issue any
flushing while holding the transaction or the delayed node lock. This
patch achieves it by calling qgroup_reserve_meta directly which will
either succeed without flushing or will fail and return -EDQUOT. In the
latter case that return value is going to be propagated to
btrfs_dirty_inode which will fallback to start a new transaction. That's
fine as the majority of time we expect the inode will have
BTRFS_DELAYED_NODE_INODE_DIRTY flag set which will result in directly
copying the in-memory state.

Fixes: c53e965 ("btrfs: qgroup: try to flush qgroup space when we get -EDQUOT")
CC: stable@vger.kernel.org # 5.10+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[sudip: adjust context]
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
friendlyarm pushed a commit to friendlyarm/kernel-rockchip that referenced this issue Mar 1, 2022
commit 8b59b0a upstream.

arm32 uses software to simulate the instruction replaced
by kprobe. some instructions may be simulated by constructing
assembly functions. therefore, before executing instruction
simulation, it is necessary to construct assembly function
execution environment in C language through binding registers.
after kasan is enabled, the register binding relationship will
be destroyed, resulting in instruction simulation errors and
causing kernel panic.

the kprobe emulate instruction function is distributed in three
files: actions-common.c actions-arm.c actions-thumb.c, so disable
KASAN when compiling these files.

for example, use kprobe insert on cap_capable+20 after kasan
enabled, the cap_capable assembly code is as follows:
<cap_capable>:
e92d47f0	push	{r4, r5, r6, r7, r8, r9, sl, lr}
e1a05000	mov	r5, r0
e280006c	add	r0, r0, rockchip-linux#108    ; 0x6c
e1a04001	mov	r4, r1
e1a06002	mov	r6, r2
e59fa090	ldr	sl, [pc, rockchip-linux#144]  ;
ebfc7bf8	bl	c03aa4b4 <__asan_load4>
e595706c	ldr	r7, [r5, rockchip-linux#108]  ; 0x6c
e2859014	add	r9, r5, rockchip-linux#20
......
The emulate_ldr assembly code after enabling kasan is as follows:
c06f1384 <emulate_ldr>:
e92d47f0	push	{r4, r5, r6, r7, r8, r9, sl, lr}
e282803c	add	r8, r2, rockchip-linux#60     ; 0x3c
e1a05000	mov	r5, r0
e7e37855	ubfx	r7, r5, rockchip-linux#16, #4
e1a00008	mov	r0, r8
e1a09001	mov	r9, r1
e1a04002	mov	r4, r2
ebf35462	bl	c03c6530 <__asan_load4>
e357000f	cmp	r7, rockchip-linux#15
e7e36655	ubfx	r6, r5, rockchip-linux#12, #4
e205a00f	and	sl, r5, rockchip-linux#15
0a000001	beq	c06f13bc <emulate_ldr+0x38>
e0840107	add	r0, r4, r7, lsl #2
ebf3545c	bl	c03c6530 <__asan_load4>
e084010a	add	r0, r4, sl, lsl #2
ebf3545a	bl	c03c6530 <__asan_load4>
e2890010	add	r0, r9, rockchip-linux#16
ebf35458	bl	c03c6530 <__asan_load4>
e5990010	ldr	r0, [r9, rockchip-linux#16]
e12fff30	blx	r0
e356000f	cm	r6, rockchip-linux#15
1a000014	bne	c06f1430 <emulate_ldr+0xac>
e1a06000	mov	r6, r0
e2840040	add	r0, r4, rockchip-linux#64     ; 0x40
......

when running in emulate_ldr to simulate the ldr instruction, panic
occurred, and the log is as follows:
Unable to handle kernel NULL pointer dereference at virtual address
00000090
pgd = ecb46400
[00000090] *pgd=2e0fa003, *pmd=00000000
Internal error: Oops: 206 [#1] SMP ARM
PC is at cap_capable+0x14/0xb0
LR is at emulate_ldr+0x50/0xc0
psr: 600d0293 sp : ecd63af8  ip : 00000004  fp : c0a7c30c
r10: 00000000  r9 : c30897f4  r8 : ecd63cd4
r7 : 0000000f  r6 : 0000000a  r5 : e59fa090  r4 : ecd63c98
r3 : c06ae294  r2 : 00000000  r1 : b7611300  r0 : bf4ec008
Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 32c5387d  Table: 2d546400  DAC: 55555555
Process bash (pid: 1643, stack limit = 0xecd60190)
(cap_capable) from (kprobe_handler+0x218/0x340)
(kprobe_handler) from (kprobe_trap_handler+0x24/0x48)
(kprobe_trap_handler) from (do_undefinstr+0x13c/0x364)
(do_undefinstr) from (__und_svc_finish+0x0/0x30)
(__und_svc_finish) from (cap_capable+0x18/0xb0)
(cap_capable) from (cap_vm_enough_memory+0x38/0x48)
(cap_vm_enough_memory) from
(security_vm_enough_memory_mm+0x48/0x6c)
(security_vm_enough_memory_mm) from
(copy_process.constprop.5+0x16b4/0x25c8)
(copy_process.constprop.5) from (_do_fork+0xe8/0x55c)
(_do_fork) from (SyS_clone+0x1c/0x24)
(SyS_clone) from (__sys_trace_return+0x0/0x10)
Code: 0050a0e1 6c0080e2 0140a0e1 0260a0e1 (f801f0e7)

Fixes: 35aa1df ("ARM kprobes: instruction single-stepping support")
Fixes: 4210157 ("ARM: 9017/2: Enable KASan for ARM")
Signed-off-by: huangshaobo <huangshaobo6@huawei.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
friendlyarm pushed a commit to friendlyarm/kernel-rockchip that referenced this issue Mar 1, 2022
[ Upstream commit f53a2ce ]

As explained in commits:
74b6d7d ("net: dsa: realtek: register the MDIO bus under devres")
5135e96 ("net: dsa: don't allocate the slave_mii_bus using devres")

mdiobus_free() will panic when called from devm_mdiobus_free() <-
devres_release_all() <- __device_release_driver(), and that mdiobus was
not previously unregistered.

The mv88e6xxx is an MDIO device, so the initial set of constraints that
I thought would cause this (I2C or SPI buses which call ->remove on
->shutdown) do not apply. But there is one more which applies here.

If the DSA master itself is on a bus that calls ->remove from ->shutdown
(like dpaa2-eth, which is on the fsl-mc bus), there is a device link
between the switch and the DSA master, and device_links_unbind_consumers()
will unbind the Marvell switch driver on shutdown.

systemd-shutdown[1]: Powering off.
mv88e6085 0x0000000008b96000:00 sw_gl0: Link is Down
fsl-mc dpbp.9: Removing from iommu group 7
fsl-mc dpbp.8: Removing from iommu group 7
------------[ cut here ]------------
kernel BUG at drivers/net/phy/mdio_bus.c:677!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 5.16.5-00040-gdc05f73788e5 rockchip-linux#15
pc : mdiobus_free+0x44/0x50
lr : devm_mdiobus_free+0x10/0x20
Call trace:
 mdiobus_free+0x44/0x50
 devm_mdiobus_free+0x10/0x20
 devres_release_all+0xa0/0x100
 __device_release_driver+0x190/0x220
 device_release_driver_internal+0xac/0xb0
 device_links_unbind_consumers+0xd4/0x100
 __device_release_driver+0x4c/0x220
 device_release_driver_internal+0xac/0xb0
 device_links_unbind_consumers+0xd4/0x100
 __device_release_driver+0x94/0x220
 device_release_driver+0x28/0x40
 bus_remove_device+0x118/0x124
 device_del+0x174/0x420
 fsl_mc_device_remove+0x24/0x40
 __fsl_mc_device_remove+0xc/0x20
 device_for_each_child+0x58/0xa0
 dprc_remove+0x90/0xb0
 fsl_mc_driver_remove+0x20/0x5c
 __device_release_driver+0x21c/0x220
 device_release_driver+0x28/0x40
 bus_remove_device+0x118/0x124
 device_del+0x174/0x420
 fsl_mc_bus_remove+0x80/0x100
 fsl_mc_bus_shutdown+0xc/0x1c
 platform_shutdown+0x20/0x30
 device_shutdown+0x154/0x330
 kernel_power_off+0x34/0x6c
 __do_sys_reboot+0x15c/0x250
 __arm64_sys_reboot+0x20/0x30
 invoke_syscall.constprop.0+0x4c/0xe0
 do_el0_svc+0x4c/0x150
 el0_svc+0x24/0xb0
 el0t_64_sync_handler+0xa8/0xb0
 el0t_64_sync+0x178/0x17c

So the same treatment must be applied to all DSA switch drivers, which
is: either use devres for both the mdiobus allocation and registration,
or don't use devres at all.

The Marvell driver already has a good structure for mdiobus removal, so
just plug in mdiobus_free and get rid of devres.

Fixes: ac3a68d ("net: phy: don't abuse devres in devm_mdiobus_register()")
Reported-by: Rafael Richter <Rafael.Richter@gin.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Daniel Klauer <daniel.klauer@gin.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
friendlyarm pushed a commit to friendlyarm/kernel-rockchip that referenced this issue Jul 22, 2022
commit 050133e upstream.

commit 0622cab ("bonding: fix 802.3ad aggregator reselection"),
resolve case, when there is several aggregation groups in the same bond.
bond_3ad_unbind_slave will invalidate (clear) aggregator when
__agg_active_ports return zero. So, ad_clear_agg can be executed even, when
num_of_ports!=0. Than bond_3ad_unbind_slave can be executed again for,
previously cleared aggregator. NOTE: at this time bond_3ad_unbind_slave
will not update slave ports list, because lag_ports==NULL. So, here we
got slave ports, pointing to freed aggregator memory.

Fix with checking actual number of ports in group (as was before
commit 0622cab ("bonding: fix 802.3ad aggregator reselection") ),
before ad_clear_agg().

The KASAN logs are as follows:

[  767.617392] ==================================================================
[  767.630776] BUG: KASAN: use-after-free in bond_3ad_state_machine_handler+0x13dc/0x1470
[  767.638764] Read of size 2 at addr ffff00011ba9d430 by task kworker/u8:7/767
[  767.647361] CPU: 3 PID: 767 Comm: kworker/u8:7 Tainted: G           O 5.15.11 rockchip-linux#15
[  767.655329] Hardware name: DNI AmazonGo1 A7040 board (DT)
[  767.660760] Workqueue: lacp_1 bond_3ad_state_machine_handler
[  767.666468] Call trace:
[  767.668930]  dump_backtrace+0x0/0x2d0
[  767.672625]  show_stack+0x24/0x30
[  767.675965]  dump_stack_lvl+0x68/0x84
[  767.679659]  print_address_description.constprop.0+0x74/0x2b8
[  767.685451]  kasan_report+0x1f0/0x260
[  767.689148]  __asan_load2+0x94/0xd0
[  767.692667]  bond_3ad_state_machine_handler+0x13dc/0x1470

Fixes: 0622cab ("bonding: fix 802.3ad aggregator reselection")
Co-developed-by: Maksym Glubokiy <maksym.glubokiy@plvision.eu>
Signed-off-by: Maksym Glubokiy <maksym.glubokiy@plvision.eu>
Signed-off-by: Yevhen Orlov <yevhen.orlov@plvision.eu>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Link: https://lore.kernel.org/r/20220629012914.361-1-yevhen.orlov@plvision.eu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
friendlyarm pushed a commit to friendlyarm/kernel-rockchip that referenced this issue Nov 12, 2022
commit 4bb26f2 upstream.

When inode is created and written to using direct IO, there is nothing
to clear the EXT4_STATE_MAY_INLINE_DATA flag. Thus when inode gets
truncated later to say 1 byte and written using normal write, we will
try to store the data as inline data. This confuses the code later
because the inode now has both normal block and inline data allocated
and the confusion manifests for example as:

kernel BUG at fs/ext4/inode.c:2721!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 359 Comm: repro Not tainted 5.19.0-rc8-00001-g31ba1e3b8305-dirty rockchip-linux#15
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014
RIP: 0010:ext4_writepages+0x363d/0x3660
RSP: 0018:ffffc90000ccf260 EFLAGS: 00010293
RAX: ffffffff81e1abcd RBX: 0000008000000000 RCX: ffff88810842a180
RDX: 0000000000000000 RSI: 0000008000000000 RDI: 0000000000000000
RBP: ffffc90000ccf650 R08: ffffffff81e17d58 R09: ffffed10222c680b
R10: dfffe910222c680c R11: 1ffff110222c680a R12: ffff888111634128
R13: ffffc90000ccf880 R14: 0000008410000000 R15: 0000000000000001
FS:  00007f72635d2640(0000) GS:ffff88811b000000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000565243379180 CR3: 000000010aa74000 CR4: 0000000000150eb0
Call Trace:
 <TASK>
 do_writepages+0x397/0x640
 filemap_fdatawrite_wbc+0x151/0x1b0
 file_write_and_wait_range+0x1c9/0x2b0
 ext4_sync_file+0x19e/0xa00
 vfs_fsync_range+0x17b/0x190
 ext4_buffered_write_iter+0x488/0x530
 ext4_file_write_iter+0x449/0x1b90
 vfs_write+0xbcd/0xf40
 ksys_write+0x198/0x2c0
 __x64_sys_write+0x7b/0x90
 do_syscall_64+0x3d/0x90
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
 </TASK>

Fix the problem by clearing EXT4_STATE_MAY_INLINE_DATA when we are doing
direct IO write to a file.

Cc: stable@kernel.org
Reported-by: Tadeusz Struk <tadeusz.struk@linaro.org>
Reported-by: syzbot+bd13648a53ed6933ca49@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=a1e89d09bbbcbd5c4cb45db230ee28c822953984
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Tested-by: Tadeusz Struk<tadeusz.struk@linaro.org>
Link: https://lore.kernel.org/r/20220727155753.13969-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
FanX-Tek pushed a commit to FanX-Tek/kernel that referenced this issue Nov 21, 2022
Co-authored-by: Catalin Toda <catalinii@yahoo.com>
Cassolette pushed a commit to Cassolette/rk-kernel that referenced this issue Dec 17, 2022
Signed-off-by: Catalin Toda <catalinii@yahoo.com>
Signed-off-by: Stephen Chen <stephen@radxa.com>
FanX-Tek pushed a commit to FanX-Tek/kernel that referenced this issue Jan 3, 2023
RK3128 vop-iommu skip request the irq due to register read issue.
So don't free irq when shut down or belowing error will output

[  102.107589] WARNING: CPU: 3 PID: 1013 at kernel/irq/devres.c:143 devm_free_irq+0x68/0x9c
[  102.115720] Modules linked in:
[  102.118862] CPU: 3 PID: 1013 Comm: init Not tainted 5.10.110 rockchip-linux#15
[  102.124907] Hardware name: Generic DT based system
[  102.129732] Backtrace:
[  102.132230] [<c0bb9218>] (dump_backtrace) from [<c0bb95b8>] (show_stack+0x20/0x24)
[  102.139843]  r7:600f0013 r6:c0e96442 r5:00000000 r4:c1219d6c
[  102.145553] [<c0bb9598>] (show_stack) from [<c0bbc7f8>] (dump_stack_lvl+0x94/0xac)
[  102.153171] [<c0bbc764>] (dump_stack_lvl) from [<c0bbc824>] (dump_stack+0x14/0x1c)
[  102.160780]  r7:00000000 r6:00000000 r5:00000009 r4:c017ac34
[  102.166488] [<c0bbc810>] (dump_stack) from [<c011fd80>] (__warn+0xd4/0x100)
[  102.173497] [<c011fcac>] (__warn) from [<c0bb9cd8>] (warn_slowpath_fmt+0x8c/0xc4)
[  102.181026]  r9:00000000 r8:00000009 r7:c017ac34 r6:0000008f r5:c0df4ea4 r4:c353c000
[  102.188820] [<c0bb9c50>] (warn_slowpath_fmt) from [<c017ac34>] (devm_free_irq+0x68/0x9c)
[  102.196962]  r9:c0e44260 r8:c128d018 r7:c12ea690 r6:c1a17f40 r5:0000002e r4:c353c000
[  102.204759] [<c017abcc>] (devm_free_irq) from [<c051e934>] (rk_iommu_shutdown+0x58/0x5c)
[  102.212895]  r6:c1a17f40 r5:00000001 r4:c197ec00
[  102.217558] [<c051e8dc>] (rk_iommu_shutdown) from [<c05b134c>] (platform_drv_shutdown+0x2c/0x30)
[  102.226385]  r7:c12ea690 r6:c12285f0 r5:c197ec10 r4:c197ec14
[  102.232094] [<c05b1320>] (platform_drv_shutdown) from [<c05ad464>] (device_shutdown+0x15c/0x1dc)
[  102.240938] [<c05ad308>] (device_shutdown) from [<c01441d4>] (kernel_restart_prepare+0x3c/0x48)
[  102.249684]  r10:00000058 r9:01234567 r8:00000010 r7:c353c000 r6:4321fedc r5:c11168ec
[  102.257548]  r4:00000000
[  102.260124] [<c0144198>] (kernel_restart_prepare) from [<c01442f4>] (kernel_restart+0x1c/0x60)
[  102.268785] [<c01442d8>] (kernel_restart) from [<c01445d8>] (__do_sys_reboot+0x154/0x1e0)
[  102.277004]  r5:c11168ec r4:00000000
[  102.280617] [<c0144484>] (__do_sys_reboot) from [<c01446d4>] (sys_reboot+0x18/0x1c)
[  102.288326]  r9:c353c000 r8:c01002c4 r7:00000058 r6:005241c4 r5:00524140 r4:00000000
[  102.296121] [<c01446bc>] (sys_reboot) from [<c0100060>] (ret_fast_syscall+0x0/0x54)
[  102.303821] Exception stack(0xc353dfa8 to 0xc353dff0)
[  102.308912] dfa0:                   00000000 00524140 fee1dead 28121969 01234567 00000010
[  102.317141] dfc0: 00000000 00524140 005241c4 00000058 beddbefc 0044b1a8 b6f39d00 b6f3a010
[  102.325364] dfe0: 00523b5c beddbc90 004dadf4 b6e584c8
[  102.330545] ---[ end trace adc766c58fa6634f ]---
[  102.335275] ------------[ cut here ]------------
[  102.339948] WARNING: CPU: 3 PID: 1013 at kernel/irq/manage.c:1756 free_irq+0x26c/0x29c
[  102.347907] Trying to free already-free IRQ 46
[  102.352381] Modules linked in:
[  102.355479] CPU: 3 PID: 1013 Comm: init Tainted: G        W         5.10.110 rockchip-linux#15
[  102.362907] Hardware name: Generic DT based system
[  102.367721] Backtrace:
[  102.370212] [<c0bb9218>] (dump_backtrace) from [<c0bb95b8>] (show_stack+0x20/0x24)
[  102.377824]  r7:600f0093 r6:c0e96442 r5:00000000 r4:c1219d6c
[  102.383532] [<c0bb9598>] (show_stack) from [<c0bbc7f8>] (dump_stack_lvl+0x94/0xac)
[  102.391150] [<c0bbc764>] (dump_stack_lvl) from [<c0bbc824>] (dump_stack+0x14/0x1c)
[  102.398759]  r7:c353dd34 r6:00000000 r5:00000009 r4:c017823c
[  102.404466] [<c0bbc810>] (dump_stack) from [<c011fd80>] (__warn+0xd4/0x100)
[  102.411473] [<c011fcac>] (__warn) from [<c0bb9cd8>] (warn_slowpath_fmt+0x8c/0xc4)
[  102.419002]  r9:c0df48e7 r8:00000009 r7:c017823c r6:000006dc r5:c0df4854 r4:c353c000
[  102.426799] [<c0bb9c50>] (warn_slowpath_fmt) from [<c017823c>] (free_irq+0x26c/0x29c)
[  102.434676]  r9:600f0013 r8:0000002e r7:c1a17f40 r6:c1978a6c r5:00000000 r4:c1978a00
[  102.442471] [<c0177fd0>] (free_irq) from [<c017ac40>] (devm_free_irq+0x74/0x9c)
[  102.449832]  r10:c197ec54 r9:c0e44260 r8:c128d018 r7:c12ea690 r6:c1a17f40 r5:0000002e
[  102.457700]  r4:c353c000
[  102.460278] [<c017abcc>] (devm_free_irq) from [<c051e934>] (rk_iommu_shutdown+0x58/0x5c)
[  102.468414]  r6:c1a17f40 r5:00000001 r4:c197ec00
[  102.473075] [<c051e8dc>] (rk_iommu_shutdown) from [<c05b134c>] (platform_drv_shutdown+0x2c/0x30)
[  102.481901]  r7:c12ea690 r6:c12285f0 r5:c197ec10 r4:c197ec14
[  102.487611] [<c05b1320>] (platform_drv_shutdown) from [<c05ad464>] (device_shutdown+0x15c/0x1dc)
[  102.496453] [<c05ad308>] (device_shutdown) from [<c01441d4>] (kernel_restart_prepare+0x3c/0x48)
[  102.505200]  r10:00000058 r9:01234567 r8:00000010 r7:c353c000 r6:4321fedc r5:c11168ec
[  102.513066]  r4:00000000
[  102.515642] [<c0144198>] (kernel_restart_prepare) from [<c01442f4>] (kernel_restart+0x1c/0x60)
[  102.524302] [<c01442d8>] (kernel_restart) from [<c01445d8>] (__do_sys_reboot+0x154/0x1e0)
[  102.532521]  r5:c11168ec r4:00000000
[  102.536134] [<c0144484>] (__do_sys_reboot) from [<c01446d4>] (sys_reboot+0x18/0x1c)
[  102.543833]  r9:c353c000 r8:c01002c4 r7:00000058 r6:005241c4 r5:00524140 r4:00000000
[  102.551626] [<c01446bc>] (sys_reboot) from [<c0100060>] (ret_fast_syscall+0x0/0x54)
[  102.559324] Exception stack(0xc353dfa8 to 0xc353dff0)
[  102.564414] dfa0:                   00000000 00524140 fee1dead 28121969 01234567 00000010
[  102.572641] dfc0: 00000000 00524140 005241c4 00000058 beddbefc 0044b1a8 b6f39d00 b6f3a010
[  102.580864] dfe0: 00523b5c beddbc90 004dadf4 b6e584c8
[  102.585947] ---[ end trace adc766c58fa66350 ]---

Change-Id: Ic0603d4d00528dc6b5ef6d480b15d3c14585dec3
Signed-off-by: Simon Xue <xxm@rock-chips.com>
hejiawencc pushed a commit to LubanCat/kernel that referenced this issue Sep 5, 2023
RK3128 vop-iommu skip request the irq due to register read issue.
So don't free irq when shut down or belowing error will output

[  102.107589] WARNING: CPU: 3 PID: 1013 at kernel/irq/devres.c:143 devm_free_irq+0x68/0x9c
[  102.115720] Modules linked in:
[  102.118862] CPU: 3 PID: 1013 Comm: init Not tainted 5.10.110 rockchip-linux#15
[  102.124907] Hardware name: Generic DT based system
[  102.129732] Backtrace:
[  102.132230] [<c0bb9218>] (dump_backtrace) from [<c0bb95b8>] (show_stack+0x20/0x24)
[  102.139843]  r7:600f0013 r6:c0e96442 r5:00000000 r4:c1219d6c
[  102.145553] [<c0bb9598>] (show_stack) from [<c0bbc7f8>] (dump_stack_lvl+0x94/0xac)
[  102.153171] [<c0bbc764>] (dump_stack_lvl) from [<c0bbc824>] (dump_stack+0x14/0x1c)
[  102.160780]  r7:00000000 r6:00000000 r5:00000009 r4:c017ac34
[  102.166488] [<c0bbc810>] (dump_stack) from [<c011fd80>] (__warn+0xd4/0x100)
[  102.173497] [<c011fcac>] (__warn) from [<c0bb9cd8>] (warn_slowpath_fmt+0x8c/0xc4)
[  102.181026]  r9:00000000 r8:00000009 r7:c017ac34 r6:0000008f r5:c0df4ea4 r4:c353c000
[  102.188820] [<c0bb9c50>] (warn_slowpath_fmt) from [<c017ac34>] (devm_free_irq+0x68/0x9c)
[  102.196962]  r9:c0e44260 r8:c128d018 r7:c12ea690 r6:c1a17f40 r5:0000002e r4:c353c000
[  102.204759] [<c017abcc>] (devm_free_irq) from [<c051e934>] (rk_iommu_shutdown+0x58/0x5c)
[  102.212895]  r6:c1a17f40 r5:00000001 r4:c197ec00
[  102.217558] [<c051e8dc>] (rk_iommu_shutdown) from [<c05b134c>] (platform_drv_shutdown+0x2c/0x30)
[  102.226385]  r7:c12ea690 r6:c12285f0 r5:c197ec10 r4:c197ec14
[  102.232094] [<c05b1320>] (platform_drv_shutdown) from [<c05ad464>] (device_shutdown+0x15c/0x1dc)
[  102.240938] [<c05ad308>] (device_shutdown) from [<c01441d4>] (kernel_restart_prepare+0x3c/0x48)
[  102.249684]  r10:00000058 r9:01234567 r8:00000010 r7:c353c000 r6:4321fedc r5:c11168ec
[  102.257548]  r4:00000000
[  102.260124] [<c0144198>] (kernel_restart_prepare) from [<c01442f4>] (kernel_restart+0x1c/0x60)
[  102.268785] [<c01442d8>] (kernel_restart) from [<c01445d8>] (__do_sys_reboot+0x154/0x1e0)
[  102.277004]  r5:c11168ec r4:00000000
[  102.280617] [<c0144484>] (__do_sys_reboot) from [<c01446d4>] (sys_reboot+0x18/0x1c)
[  102.288326]  r9:c353c000 r8:c01002c4 r7:00000058 r6:005241c4 r5:00524140 r4:00000000
[  102.296121] [<c01446bc>] (sys_reboot) from [<c0100060>] (ret_fast_syscall+0x0/0x54)
[  102.303821] Exception stack(0xc353dfa8 to 0xc353dff0)
[  102.308912] dfa0:                   00000000 00524140 fee1dead 28121969 01234567 00000010
[  102.317141] dfc0: 00000000 00524140 005241c4 00000058 beddbefc 0044b1a8 b6f39d00 b6f3a010
[  102.325364] dfe0: 00523b5c beddbc90 004dadf4 b6e584c8
[  102.330545] ---[ end trace adc766c58fa6634f ]---
[  102.335275] ------------[ cut here ]------------
[  102.339948] WARNING: CPU: 3 PID: 1013 at kernel/irq/manage.c:1756 free_irq+0x26c/0x29c
[  102.347907] Trying to free already-free IRQ 46
[  102.352381] Modules linked in:
[  102.355479] CPU: 3 PID: 1013 Comm: init Tainted: G        W         5.10.110 rockchip-linux#15
[  102.362907] Hardware name: Generic DT based system
[  102.367721] Backtrace:
[  102.370212] [<c0bb9218>] (dump_backtrace) from [<c0bb95b8>] (show_stack+0x20/0x24)
[  102.377824]  r7:600f0093 r6:c0e96442 r5:00000000 r4:c1219d6c
[  102.383532] [<c0bb9598>] (show_stack) from [<c0bbc7f8>] (dump_stack_lvl+0x94/0xac)
[  102.391150] [<c0bbc764>] (dump_stack_lvl) from [<c0bbc824>] (dump_stack+0x14/0x1c)
[  102.398759]  r7:c353dd34 r6:00000000 r5:00000009 r4:c017823c
[  102.404466] [<c0bbc810>] (dump_stack) from [<c011fd80>] (__warn+0xd4/0x100)
[  102.411473] [<c011fcac>] (__warn) from [<c0bb9cd8>] (warn_slowpath_fmt+0x8c/0xc4)
[  102.419002]  r9:c0df48e7 r8:00000009 r7:c017823c r6:000006dc r5:c0df4854 r4:c353c000
[  102.426799] [<c0bb9c50>] (warn_slowpath_fmt) from [<c017823c>] (free_irq+0x26c/0x29c)
[  102.434676]  r9:600f0013 r8:0000002e r7:c1a17f40 r6:c1978a6c r5:00000000 r4:c1978a00
[  102.442471] [<c0177fd0>] (free_irq) from [<c017ac40>] (devm_free_irq+0x74/0x9c)
[  102.449832]  r10:c197ec54 r9:c0e44260 r8:c128d018 r7:c12ea690 r6:c1a17f40 r5:0000002e
[  102.457700]  r4:c353c000
[  102.460278] [<c017abcc>] (devm_free_irq) from [<c051e934>] (rk_iommu_shutdown+0x58/0x5c)
[  102.468414]  r6:c1a17f40 r5:00000001 r4:c197ec00
[  102.473075] [<c051e8dc>] (rk_iommu_shutdown) from [<c05b134c>] (platform_drv_shutdown+0x2c/0x30)
[  102.481901]  r7:c12ea690 r6:c12285f0 r5:c197ec10 r4:c197ec14
[  102.487611] [<c05b1320>] (platform_drv_shutdown) from [<c05ad464>] (device_shutdown+0x15c/0x1dc)
[  102.496453] [<c05ad308>] (device_shutdown) from [<c01441d4>] (kernel_restart_prepare+0x3c/0x48)
[  102.505200]  r10:00000058 r9:01234567 r8:00000010 r7:c353c000 r6:4321fedc r5:c11168ec
[  102.513066]  r4:00000000
[  102.515642] [<c0144198>] (kernel_restart_prepare) from [<c01442f4>] (kernel_restart+0x1c/0x60)
[  102.524302] [<c01442d8>] (kernel_restart) from [<c01445d8>] (__do_sys_reboot+0x154/0x1e0)
[  102.532521]  r5:c11168ec r4:00000000
[  102.536134] [<c0144484>] (__do_sys_reboot) from [<c01446d4>] (sys_reboot+0x18/0x1c)
[  102.543833]  r9:c353c000 r8:c01002c4 r7:00000058 r6:005241c4 r5:00524140 r4:00000000
[  102.551626] [<c01446bc>] (sys_reboot) from [<c0100060>] (ret_fast_syscall+0x0/0x54)
[  102.559324] Exception stack(0xc353dfa8 to 0xc353dff0)
[  102.564414] dfa0:                   00000000 00524140 fee1dead 28121969 01234567 00000010
[  102.572641] dfc0: 00000000 00524140 005241c4 00000058 beddbefc 0044b1a8 b6f39d00 b6f3a010
[  102.580864] dfe0: 00523b5c beddbc90 004dadf4 b6e584c8
[  102.585947] ---[ end trace adc766c58fa66350 ]---

Change-Id: Ic0603d4d00528dc6b5ef6d480b15d3c14585dec3
Signed-off-by: Simon Xue <xxm@rock-chips.com>
friendlyarm pushed a commit to friendlyarm/kernel-rockchip that referenced this issue Dec 5, 2023
[ Upstream commit a154f5f ]

The following call trace shows a deadlock issue due to recursive locking of
mutex "device_mutex". First lock acquire is in target_for_each_device() and
second in target_free_device().

 PID: 148266   TASK: ffff8be21ffb5d00  CPU: 10   COMMAND: "iscsi_ttx"
  #0 [ffffa2bfc9ec3b18] __schedule at ffffffffa8060e7f
  #1 [ffffa2bfc9ec3ba0] schedule at ffffffffa8061224
  #2 [ffffa2bfc9ec3bb8] schedule_preempt_disabled at ffffffffa80615ee
  #3 [ffffa2bfc9ec3bc8] __mutex_lock at ffffffffa8062fd7
  #4 [ffffa2bfc9ec3c40] __mutex_lock_slowpath at ffffffffa80631d3
  #5 [ffffa2bfc9ec3c50] mutex_lock at ffffffffa806320c
  #6 [ffffa2bfc9ec3c68] target_free_device at ffffffffc0935998 [target_core_mod]
  rockchip-linux#7 [ffffa2bfc9ec3c90] target_core_dev_release at ffffffffc092f975 [target_core_mod]
  rockchip-linux#8 [ffffa2bfc9ec3ca0] config_item_put at ffffffffa79d250f
  rockchip-linux#9 [ffffa2bfc9ec3cd0] config_item_put at ffffffffa79d2583
 rockchip-linux#10 [ffffa2bfc9ec3ce0] target_devices_idr_iter at ffffffffc0933f3a [target_core_mod]
 rockchip-linux#11 [ffffa2bfc9ec3d00] idr_for_each at ffffffffa803f6fc
 rockchip-linux#12 [ffffa2bfc9ec3d60] target_for_each_device at ffffffffc0935670 [target_core_mod]
 rockchip-linux#13 [ffffa2bfc9ec3d98] transport_deregister_session at ffffffffc0946408 [target_core_mod]
 rockchip-linux#14 [ffffa2bfc9ec3dc8] iscsit_close_session at ffffffffc09a44a6 [iscsi_target_mod]
 rockchip-linux#15 [ffffa2bfc9ec3df0] iscsit_close_connection at ffffffffc09a4a88 [iscsi_target_mod]
 rockchip-linux#16 [ffffa2bfc9ec3df8] finish_task_switch at ffffffffa76e5d07
 rockchip-linux#17 [ffffa2bfc9ec3e78] iscsit_take_action_for_connection_exit at ffffffffc0991c23 [iscsi_target_mod]
 rockchip-linux#18 [ffffa2bfc9ec3ea0] iscsi_target_tx_thread at ffffffffc09a403b [iscsi_target_mod]
 rockchip-linux#19 [ffffa2bfc9ec3f08] kthread at ffffffffa76d8080
 rockchip-linux#20 [ffffa2bfc9ec3f50] ret_from_fork at ffffffffa8200364

Fixes: 36d4cb4 ("scsi: target: Avoid that EXTENDED COPY commands trigger lock inversion")
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Link: https://lore.kernel.org/r/20230918225848.66463-1-junxiao.bi@oracle.com
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants