Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RK3288 usb_host1 (dwc2) incomplete MJPEG frames #35

Closed
teseo-sw opened this issue Sep 12, 2017 · 3 comments
Closed

RK3288 usb_host1 (dwc2) incomplete MJPEG frames #35

teseo-sw opened this issue Sep 12, 2017 · 3 comments

Comments

@teseo-sw
Copy link

Hi,
has anyone ever get into an incomplete MJPG frame error streaming a UVC camera connected on RK3288 host#1 (dw)? connecting the same camera on host#0 (ehci) it work without any error. This happens only with format equal or grater than 1920x1080 so I suppose could be a bug in dwc2 kernel driver. Is it possible?

I check it using a command like

v4l2-ctl --verbose -d /dev/video2 --stream-mmap=2 --stream-to=./stream.jpg --stream-count=10 --stream-poll

I do not see any error on dmseg log but I obtain partial MJPG frames.
thank you
Andrea

@wzyy2
Copy link
Contributor

wzyy2 commented Sep 13, 2017

Could you try this patch?

 drivers/phy/phy-rockchip-usb.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/phy/phy-rockchip-usb.c b/drivers/phy/phy-rockchip-usb.c
index 39f419d..c8b6ab3 100644
--- a/drivers/phy/phy-rockchip-usb.c
+++ b/drivers/phy/phy-rockchip-usb.c
@@ -1,5 +1,5 @@
 /*
- * Rockchip usb PHY driver
+ * Rockchip usb PHY driver
  *
  * Copyright (C) 2014 Yunzhi Li <lyz@rock-chips.com>
  * Copyright (C) 2014 ROCKCHIP, Inc.
@@ -52,11 +52,14 @@ struct rockchip_usb_phys {
 	const char *pll_name;
 };
 
+struct rockchip_usb_phy_base;
+
 struct rockchip_usb_phy_pdata {
 	struct rockchip_usb_phys *phys;
 	unsigned int phy_pw_on;
 	unsigned int phy_pw_off;
 	bool siddq_ctl;
+	int (*phy_tuning)(struct rockchip_usb_phy_base *);
 };
 
 struct rockchip_usb_phy_base {
@@ -281,6 +284,16 @@ err_clk:
 	return err;
 }
 
+static int rk3288_usb_phy_tuning(struct rockchip_usb_phy_base *phy_base)
+{
+	unsigned int uoc1_word_if_control = 0x08000000;
+	int ret = 0;
+
+	ret |= regmap_write(phy_base->reg_base, 0x0334, uoc1_word_if_control);
+
+	return ret;
+}
+
 static const struct rockchip_usb_phy_pdata rk3066a_pdata = {
 	.phys = (struct rockchip_usb_phys[]){
 		{ .reg = 0x17c, .pll_name = "sclk_otgphy0_480m" },
@@ -313,6 +326,7 @@ static const struct rockchip_usb_phy_pdata rk3288_pdata = {
 	.phy_pw_on  = SIDDQ_WRITE_ENA | SIDDQ_OFF,
 	.phy_pw_off = SIDDQ_WRITE_ENA | SIDDQ_ON,
 	.siddq_ctl  = true,
+	.phy_tuning = rk3288_usb_phy_tuning,
 };
 
 static const struct rockchip_usb_phy_pdata rk336x_pdata = {
@@ -366,6 +380,12 @@ static int rockchip_usb_phy_probe(struct platform_device *pdev)
 		return PTR_ERR(phy_base->reg_base);
 	}
 
+	if (phy_base->pdata->phy_tuning) {
+		err = phy_base->pdata->phy_tuning(phy_base);
+		if (err)
+			dev_err(dev, "phy tuning fail\n");
+	}
+
 	/* Request the vbus_drv GPIO asserted */
 	phy_base->vbus_drv_gpio =
 		devm_gpiod_get_optional(dev, "vbus_drv", GPIOD_OUT_HIGH);
-- 
2.7.4

@teseo-sw
Copy link
Author

I have tried it, but nothing seems to change about my issue

@teseo-sw
Copy link
Author

Could someone tell me what could cause that USB problem? Is it something wrong with usb_host1 or with dwc2 usb driver? I need to delve into it cause I need to use both RK3288 usb hosts.

Could it be something related to USB bandwidth? I suppose that cause if I reduce image size this problem disappears or, at least, it becomes less frequent. Could someone give me some hints?

best regards

Kwiboo referenced this issue in Kwiboo/linux-rockchip Dec 15, 2018
Increase kasan instrumented kernel stack size from 32k to 64k. Other
architectures seems to get away with just doubling kernel stack size under
kasan, but on s390 this appears to be not enough due to bigger frame size.
The particular pain point is kasan inlined checks (CONFIG_KASAN_INLINE
vs CONFIG_KASAN_OUTLINE). With inlined checks one particular case hitting
stack overflow is fs sync on xfs filesystem:

 #0 [9a0681e8]  704 bytes  check_usage at 34b1fc
 #1 [9a0684a8]  432 bytes  check_usage at 34c710
 #2 [9a068658]  1048 bytes  validate_chain at 35044a
 #3 [9a068a70]  312 bytes  __lock_acquire at 3559fe
 #4 [9a068ba8]  440 bytes  lock_acquire at 3576ee
 #5 [9a068d60]  104 bytes  _raw_spin_lock at 21b44e0
 #6 [9a068dc8]  1992 bytes  enqueue_entity at 2dbf72
 #7 [9a069590]  1496 bytes  enqueue_task_fair at 2df5f0
 #8 [9a069b68]  64 bytes  ttwu_do_activate at 28f438
 #9 [9a069ba8]  552 bytes  try_to_wake_up at 298c4c
 #10 [9a069dd0]  168 bytes  wake_up_worker at 23f97c
 #11 [9a069e78]  200 bytes  insert_work at 23fc2e
 #12 [9a069f40]  648 bytes  __queue_work at 2487c0
 #13 [9a06a1c8]  200 bytes  __queue_delayed_work at 24db28
 #14 [9a06a290]  248 bytes  mod_delayed_work_on at 24de84
 #15 [9a06a388]  24 bytes  kblockd_mod_delayed_work_on at 153e2a0
 #16 [9a06a3a0]  288 bytes  __blk_mq_delay_run_hw_queue at 158168c
 #17 [9a06a4c0]  192 bytes  blk_mq_run_hw_queue at 1581a3c
 #18 [9a06a580]  184 bytes  blk_mq_sched_insert_requests at 15a2192
 #19 [9a06a638]  1024 bytes  blk_mq_flush_plug_list at 1590f3a
 #20 [9a06aa38]  704 bytes  blk_flush_plug_list at 1555028
 #21 [9a06acf8]  320 bytes  schedule at 219e476
 #22 [9a06ae38]  760 bytes  schedule_timeout at 21b0aac
 #23 [9a06b130]  408 bytes  wait_for_common at 21a1706
 #24 [9a06b2c8]  360 bytes  xfs_buf_iowait at fa1540
 #25 [9a06b430]  256 bytes  __xfs_buf_submit at fadae6
 #26 [9a06b530]  264 bytes  xfs_buf_read_map at fae3f6
 #27 [9a06b638]  656 bytes  xfs_trans_read_buf_map at 10ac9a8
 #28 [9a06b8c8]  304 bytes  xfs_btree_kill_root at e72426
 #29 [9a06b9f8]  288 bytes  xfs_btree_lookup_get_block at e7bc5e
 #30 [9a06bb18]  624 bytes  xfs_btree_lookup at e7e1a6
 #31 [9a06bd88]  2664 bytes  xfs_alloc_ag_vextent_near at dfa070
 #32 [9a06c7f0]  144 bytes  xfs_alloc_ag_vextent at dff3ca
 #33 [9a06c880]  1128 bytes  xfs_alloc_vextent at e05fce
 #34 [9a06cce8]  584 bytes  xfs_bmap_btalloc at e58342
 #35 [9a06cf30]  1336 bytes  xfs_bmapi_write at e618de
 #36 [9a06d468]  776 bytes  xfs_iomap_write_allocate at ff678e
 #37 [9a06d770]  720 bytes  xfs_map_blocks at f82af8
 rockchip-linux#38 [9a06da40]  928 bytes  xfs_writepage_map at f83cd6
 rockchip-linux#39 [9a06dde0]  320 bytes  xfs_do_writepage at f85872
 rockchip-linux#40 [9a06df20]  1320 bytes  write_cache_pages at 73dfe8
 rockchip-linux#41 [9a06e448]  208 bytes  xfs_vm_writepages at f7f892
 rockchip-linux#42 [9a06e518]  88 bytes  do_writepages at 73fe6a
 rockchip-linux#43 [9a06e570]  872 bytes  __writeback_single_inode at a20cb6
 rockchip-linux#44 [9a06e8d8]  664 bytes  writeback_sb_inodes at a23be2
 rockchip-linux#45 [9a06eb70]  296 bytes  __writeback_inodes_wb at a242e0
 rockchip-linux#46 [9a06ec98]  928 bytes  wb_writeback at a2500e
 rockchip-linux#47 [9a06f038]  848 bytes  wb_do_writeback at a260ae
 rockchip-linux#48 [9a06f388]  536 bytes  wb_workfn at a28228
 rockchip-linux#49 [9a06f5a0]  1088 bytes  process_one_work at 24a234
 rockchip-linux#50 [9a06f9e0]  1120 bytes  worker_thread at 24ba26
 rockchip-linux#51 [9a06fe40]  104 bytes  kthread at 26545a
 rockchip-linux#52 [9a06fea8]             kernel_thread_starter at 21b6b62

To be able to increase the stack size to 64k reuse LLILL instruction
in __switch_to function to load 64k - STACK_FRAME_OVERHEAD - __PT_SIZE
(65192) value as unsigned.

Reported-by: Benjamin Block <bblock@linux.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
0lvin pushed a commit to free-z4u/roc-rk3328-cc-official that referenced this issue May 1, 2019
PCI devices that are 64-bit DMA capable should set the coherent
DMA mask as well as the streaming DMA mask. On some architectures,
these are managed separately, and so the coherent DMA mask will be
left at its default value of 32 if it is not set explicitly. This
results in errors such as

     r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
     hwdev DMA mask = 0x00000000ffffffff, dev_addr = 0x00000080fbfff000
     swiotlb: coherent allocation failed for device 0000:02:00.0 size=4096
     CPU: 0 PID: 1062 Comm: systemd-udevd Not tainted 4.8.0+ rockchip-linux#35
     Hardware name: AMD Seattle/Seattle, BIOS 10:53:24 Oct 13 2016

on systems without memory that is 32-bit addressable by PCI devices.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
0lvin pushed a commit to free-z4u/roc-rk3328-cc-official that referenced this issue May 9, 2019
relay_open() may return NULL, check the return value to avoid the crash.

BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
IP: [<ffffffffa01a95c5>] ath_cmn_process_fft+0xd5/0x700 [ath9k_common]
PGD 41cf28067 PUD 41be92067 PMD 0
Oops: 0000 [FireflyTeam#1] SMP
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.6+ rockchip-linux#35
Hardware name: Hewlett-Packard h8-1080t/2A86, BIOS 6.15    07/04/2011
task: ffffffff81e0c4c0 task.stack: ffffffff81e00000
RIP: 0010:[<ffffffffa01a95c5>] [<ffffffffa01a95c5>] ath_cmn_process_fft+0xd5/0x700 [ath9k_common]
RSP: 0018:ffff88041f203ca0 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 000000000000059f RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000040 RDI: ffffffff81f0ca98
RBP: ffff88041f203dc8 R08: ffffffffffffffff R09: 00000000000000ff
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffffffff81f0ca98 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88041f200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000040 CR3: 000000041b6ec000 CR4: 00000000000006f0
Stack:
0000000000000363 00000000000003f3 00000000000003f3 00000000000001f9
000000000000049a 0000000001252c04 ffff88041f203e44 ffff880417b4bfd0
0000000000000008 ffff88041785b9c0 0000000000000002 ffff88041613dc60

Call Trace:
<IRQ>
[<ffffffffa01b6441>] ath9k_tasklet+0x1b1/0x220 [ath9k]
[<ffffffff8105d8dd>] tasklet_action+0x4d/0xf0
[<ffffffff8105dde2>] __do_softirq+0x92/0x2a0

Reported-by: Devin Tuchsen <devin.tuchsen@gmail.com>
Tested-by: Devin Tuchsen <devin.tuchsen@gmail.com>
Signed-off-by: Miaoqing Pan <miaoqing@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
0lvin pushed a commit to free-z4u/roc-rk3328-cc-official that referenced this issue Jul 19, 2019
Sean Wang says:

====================
mediatek: Fix crash caused by reporting inconsistent skb->len to BQL

Changes since v1:
- fix inconsistent enumeration which easily causes the potential bug

The series fixes kernel BUG caused by inconsistent SKB length reported
into BQL. The reason for inconsistent length comes from hardware BUG which
results in different port number carried on the TXD within the lifecycle of
SKB. So patch 2) is proposed for use a software way to track which port
the SKB involving instead of hardware way. And patch 1) is given for another
issue I found which causes TXD and SKB inconsistency that is not expected
in the initial logic, so it is also being corrected it in the series.

The log for the kernel BUG caused by the issue is posted as below.

[  120.825955] kernel BUG at ... lib/dynamic_queue_limits.c:26!
[  120.837684] Internal error: Oops - BUG: 0 [FireflyTeam#1] SMP ARM
[  120.842778] Modules linked in:
[  120.845811] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.11.0-rc1-191576-gdbcef47 rockchip-linux#35
[  120.853488] Hardware name: Mediatek Cortex-A7 (Device Tree)
[  120.859012] task: c1007480 task.stack: c1000000
[  120.863510] PC is at dql_completed+0x108/0x17c
[  120.867915] LR is at 0x46
[  120.870512] pc : [<c03c19c8>]    lr : [<00000046>]    psr: 80000113
[  120.870512] sp : c1001d58  ip : c1001d80  fp : c1001d7c
[  120.881895] r10: 0000003e  r9 : df6b3400  r8 : 0ed86506
[  120.887075] r7 : 00000001  r6 : 00000001  r5 : 0ed8654c  r4 : df0135d8
[  120.893546] r3 : 00000001  r2 : df016800  r1 : 0000fece  r0 : df6b3480
[  120.900018] Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[  120.907093] Control: 10c5387d  Table: 9e27806a  DAC: 00000051
[  120.912789] Process swapper/0 (pid: 0, stack limit = 0xc1000218)
[  120.918744] Stack: (0xc1001d58 to 0xc1002000)

....

121.085331] 1fc0: 00000000 c0a52a28 00000000 c10855d4 c1003c58 c0a52a24 c100885c 8000406a
[  121.093444] 1fe0: 410fc073 00000000 00000000 c1001ff8 8000807c c0a009cc 00000000 00000000
[  121.101575] [<c03c19c8>] (dql_completed) from [<c04cb010>] (mtk_napi_tx+0x1d0/0x37c)
[  121.109263] [<c04cb010>] (mtk_napi_tx) from [<c05e28cc>] (net_rx_action+0x24c/0x3b8)
[  121.116951] [<c05e28cc>] (net_rx_action) from [<c010152c>] (__do_softirq+0xe4/0x35c)
[  121.124638] [<c010152c>] (__do_softirq) from [<c012a624>] (irq_exit+0xe8/0x150)
[  121.131895] [<c012a624>] (irq_exit) from [<c017750c>] (__handle_domain_irq+0x70/0xc4)
[  121.139666] [<c017750c>] (__handle_domain_irq) from [<c0101404>] (gic_handle_irq+0x58/0x9c)
[  121.147953] [<c0101404>] (gic_handle_irq) from [<c010e18c>] (__irq_svc+0x6c/0x90)
[  121.155373] Exception stack(0xc1001ef8 to 0xc1001f40)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
0lvin pushed a commit to free-z4u/roc-rk3328-cc-official that referenced this issue Jul 21, 2019
When device open races with device shutdown, we can get the following
oops in scsi_disk_get():

[11863.044351] general protection fault: 0000 [FireflyTeam#1] SMP
[11863.045561] Modules linked in: scsi_debug xfs libcrc32c netconsole btrfs raid6_pq zlib_deflate lzo_compress xor [last unloaded: loop]
[11863.047853] CPU: 3 PID: 13042 Comm: hald-probe-stor Tainted: G W      4.10.0-rc2-xen+ rockchip-linux#35
[11863.048030] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[11863.048030] task: ffff88007f438200 task.stack: ffffc90000fd0000
[11863.048030] RIP: 0010:scsi_disk_get+0x43/0x70
[11863.048030] RSP: 0018:ffffc90000fd3a08 EFLAGS: 00010202
[11863.048030] RAX: 6b6b6b6b6b6b6b6b RBX: ffff88007f56d000 RCX: 0000000000000000
[11863.048030] RDX: 0000000000000001 RSI: 0000000000000004 RDI: ffffffff81a8d880
[11863.048030] RBP: ffffc90000fd3a18 R08: 0000000000000000 R09: 0000000000000001
[11863.059217] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000fffffffa
[11863.059217] R13: ffff880078872800 R14: ffff880070915540 R15: 000000000000001d
[11863.059217] FS:  00007f2611f71800(0000) GS:ffff88007f0c0000(0000) knlGS:0000000000000000
[11863.059217] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[11863.059217] CR2: 000000000060e048 CR3: 00000000778d4000 CR4: 00000000000006e0
[11863.059217] Call Trace:
[11863.059217]  ? disk_get_part+0x22/0x1f0
[11863.059217]  sd_open+0x39/0x130
[11863.059217]  __blkdev_get+0x69/0x430
[11863.059217]  ? bd_acquire+0x7f/0xc0
[11863.059217]  ? bd_acquire+0x96/0xc0
[11863.059217]  ? blkdev_get+0x350/0x350
[11863.059217]  blkdev_get+0x126/0x350
[11863.059217]  ? _raw_spin_unlock+0x2b/0x40
[11863.059217]  ? bd_acquire+0x7f/0xc0
[11863.059217]  ? blkdev_get+0x350/0x350
[11863.059217]  blkdev_open+0x65/0x80
...

As you can see RAX value is already poisoned showing that gendisk we got
is already freed. The problem is that get_gendisk() looks up device
number in ext_devt_idr and then does get_disk() which does kobject_get()
on the disks kobject. However the disk gets removed from ext_devt_idr
only in disk_release() (through blk_free_devt()) at which moment it has
already 0 refcount and is already on its way to be freed. Indeed we've
got a warning from kobject_get() about 0 refcount shortly before the
oops.

We fix the problem by using kobject_get_unless_zero() in get_disk() so
that get_disk() cannot get reference on a disk that is already being
freed.

Tested-by: Lekshmi Pillai <lekshmicpillai@in.ibm.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
0lvin pushed a commit to free-z4u/roc-rk3328-cc-official that referenced this issue Jul 29, 2019
Using the syzkaller kernel fuzzer, Andrey Konovalov generated the
following error in gadgetfs:

> BUG: KASAN: use-after-free in __lock_acquire+0x3069/0x3690
> kernel/locking/lockdep.c:3246
> Read of size 8 at addr ffff88003a2bdaf8 by task kworker/3:1/903
>
> CPU: 3 PID: 903 Comm: kworker/3:1 Not tainted 4.12.0-rc4+ rockchip-linux#35
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Workqueue: usb_hub_wq hub_event
> Call Trace:
>  __dump_stack lib/dump_stack.c:16 [inline]
>  dump_stack+0x292/0x395 lib/dump_stack.c:52
>  print_address_description+0x78/0x280 mm/kasan/report.c:252
>  kasan_report_error mm/kasan/report.c:351 [inline]
>  kasan_report+0x230/0x340 mm/kasan/report.c:408
>  __asan_report_load8_noabort+0x19/0x20 mm/kasan/report.c:429
>  __lock_acquire+0x3069/0x3690 kernel/locking/lockdep.c:3246
>  lock_acquire+0x22d/0x560 kernel/locking/lockdep.c:3855
>  __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
>  _raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:151
>  spin_lock include/linux/spinlock.h:299 [inline]
>  gadgetfs_suspend+0x89/0x130 drivers/usb/gadget/legacy/inode.c:1682
>  set_link_state+0x88e/0xae0 drivers/usb/gadget/udc/dummy_hcd.c:455
>  dummy_hub_control+0xd7e/0x1fb0 drivers/usb/gadget/udc/dummy_hcd.c:2074
>  rh_call_control drivers/usb/core/hcd.c:689 [inline]
>  rh_urb_enqueue drivers/usb/core/hcd.c:846 [inline]
>  usb_hcd_submit_urb+0x92f/0x20b0 drivers/usb/core/hcd.c:1650
>  usb_submit_urb+0x8b2/0x12c0 drivers/usb/core/urb.c:542
>  usb_start_wait_urb+0x148/0x5b0 drivers/usb/core/message.c:56
>  usb_internal_control_msg drivers/usb/core/message.c:100 [inline]
>  usb_control_msg+0x341/0x4d0 drivers/usb/core/message.c:151
>  usb_clear_port_feature+0x74/0xa0 drivers/usb/core/hub.c:412
>  hub_port_disable+0x123/0x510 drivers/usb/core/hub.c:4177
>  hub_port_init+0x1ed/0x2940 drivers/usb/core/hub.c:4648
>  hub_port_connect drivers/usb/core/hub.c:4826 [inline]
>  hub_port_connect_change drivers/usb/core/hub.c:4999 [inline]
>  port_event drivers/usb/core/hub.c:5105 [inline]
>  hub_event+0x1ae1/0x3d40 drivers/usb/core/hub.c:5185
>  process_one_work+0xc08/0x1bd0 kernel/workqueue.c:2097
>  process_scheduled_works kernel/workqueue.c:2157 [inline]
>  worker_thread+0xb2b/0x1860 kernel/workqueue.c:2233
>  kthread+0x363/0x440 kernel/kthread.c:231
>  ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:424
>
> Allocated by task 9958:
>  save_stack_trace+0x1b/0x20 arch/x86/kernel/stacktrace.c:59
>  save_stack+0x43/0xd0 mm/kasan/kasan.c:513
>  set_track mm/kasan/kasan.c:525 [inline]
>  kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:617
>  kmem_cache_alloc_trace+0x87/0x280 mm/slub.c:2745
>  kmalloc include/linux/slab.h:492 [inline]
>  kzalloc include/linux/slab.h:665 [inline]
>  dev_new drivers/usb/gadget/legacy/inode.c:170 [inline]
>  gadgetfs_fill_super+0x24f/0x540 drivers/usb/gadget/legacy/inode.c:1993
>  mount_single+0xf6/0x160 fs/super.c:1192
>  gadgetfs_mount+0x31/0x40 drivers/usb/gadget/legacy/inode.c:2019
>  mount_fs+0x9c/0x2d0 fs/super.c:1223
>  vfs_kern_mount.part.25+0xcb/0x490 fs/namespace.c:976
>  vfs_kern_mount fs/namespace.c:2509 [inline]
>  do_new_mount fs/namespace.c:2512 [inline]
>  do_mount+0x41b/0x2d90 fs/namespace.c:2834
>  SYSC_mount fs/namespace.c:3050 [inline]
>  SyS_mount+0xb0/0x120 fs/namespace.c:3027
>  entry_SYSCALL_64_fastpath+0x1f/0xbe
>
> Freed by task 9960:
>  save_stack_trace+0x1b/0x20 arch/x86/kernel/stacktrace.c:59
>  save_stack+0x43/0xd0 mm/kasan/kasan.c:513
>  set_track mm/kasan/kasan.c:525 [inline]
>  kasan_slab_free+0x72/0xc0 mm/kasan/kasan.c:590
>  slab_free_hook mm/slub.c:1357 [inline]
>  slab_free_freelist_hook mm/slub.c:1379 [inline]
>  slab_free mm/slub.c:2961 [inline]
>  kfree+0xed/0x2b0 mm/slub.c:3882
>  put_dev+0x124/0x160 drivers/usb/gadget/legacy/inode.c:163
>  gadgetfs_kill_sb+0x33/0x60 drivers/usb/gadget/legacy/inode.c:2027
>  deactivate_locked_super+0x8d/0xd0 fs/super.c:309
>  deactivate_super+0x21e/0x310 fs/super.c:340
>  cleanup_mnt+0xb7/0x150 fs/namespace.c:1112
>  __cleanup_mnt+0x1b/0x20 fs/namespace.c:1119
>  task_work_run+0x1a0/0x280 kernel/task_work.c:116
>  exit_task_work include/linux/task_work.h:21 [inline]
>  do_exit+0x18a8/0x2820 kernel/exit.c:878
>  do_group_exit+0x14e/0x420 kernel/exit.c:982
>  get_signal+0x784/0x1780 kernel/signal.c:2318
>  do_signal+0xd7/0x2130 arch/x86/kernel/signal.c:808
>  exit_to_usermode_loop+0x1ac/0x240 arch/x86/entry/common.c:157
>  prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
>  syscall_return_slowpath+0x3ba/0x410 arch/x86/entry/common.c:263
>  entry_SYSCALL_64_fastpath+0xbc/0xbe
>
> The buggy address belongs to the object at ffff88003a2bdae0
>  which belongs to the cache kmalloc-1024 of size 1024
> The buggy address is located 24 bytes inside of
>  1024-byte region [ffff88003a2bdae0, ffff88003a2bdee0)
> The buggy address belongs to the page:
> page:ffffea0000e8ae00 count:1 mapcount:0 mapping:          (null)
> index:0x0 compound_mapcount: 0
> flags: 0x100000000008100(slab|head)
> raw: 0100000000008100 0000000000000000 0000000000000000 0000000100170017
> raw: ffffea0000ed3020 ffffea0000f5f820 ffff88003e80efc0 0000000000000000
> page dumped because: kasan: bad access detected
>
> Memory state around the buggy address:
>  ffff88003a2bd980: fb fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>  ffff88003a2bda00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> >ffff88003a2bda80: fc fc fc fc fc fc fc fc fc fc fc fc fb fb fb fb
>                                                                 ^
>  ffff88003a2bdb00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>  ffff88003a2bdb80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ==================================================================

What this means is that the gadgetfs_suspend() routine was trying to
access dev->lock after it had been deallocated.  The root cause is a
race in the dummy_hcd driver; the dummy_udc_stop() routine can race
with the rest of the driver because it contains no locking.  And even
when proper locking is added, it can still race with the
set_link_state() function because that function incorrectly drops the
private spinlock before invoking any gadget driver callbacks.

The result of this race, as seen above, is that set_link_state() can
invoke a callback in gadgetfs even after gadgetfs has been unbound
from dummy_hcd's UDC and its private data structures have been
deallocated.

include/linux/usb/gadget.h documents that the ->reset, ->disconnect,
->suspend, and ->resume callbacks may be invoked in interrupt context.
In general this is necessary, to prevent races with gadget driver
removal.  This patch fixes dummy_hcd to retain the spinlock across
these calls, and it adds a spinlock acquisition to dummy_udc_stop() to
prevent the race.

The net2280 driver makes the same mistake of dropping the private
spinlock for its ->disconnect and ->reset callback invocations.  The
patch fixes it too.

Lastly, since gadgetfs_suspend() may be invoked in interrupt context,
it cannot assume that interrupts are enabled when it runs.  It must
use spin_lock_irqsave() instead of spin_lock_irq().  The patch fixes
that bug as well.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-and-tested-by: Andrey Konovalov <andreyknvl@google.com>
CC: <stable@vger.kernel.org>
Acked-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
0lvin pushed a commit to free-z4u/roc-rk3328-cc-official that referenced this issue Aug 4, 2019
If madvise(..., MADV_FREE) split a transparent hugepage, it called
put_page() before unlock_page().

This was wrong because put_page() can free the page, e.g. if a
concurrent madvise(..., MADV_DONTNEED) has removed it from the memory
mapping. put_page() then rightfully complained about freeing a locked
page.

Fix this by moving the unlock_page() before put_page().

This bug was found by syzkaller, which encountered the following splat:

    BUG: Bad page state in process syzkaller412798  pfn:1bd800
    page:ffffea0006f60000 count:0 mapcount:0 mapping:          (null) index:0x20a00
    flags: 0x200000000040019(locked|uptodate|dirty|swapbacked)
    raw: 0200000000040019 0000000000000000 0000000000020a00 00000000ffffffff
    raw: ffffea0006f60020 ffffea0006f60020 0000000000000000 0000000000000000
    page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
    bad because of flags: 0x1(locked)
    Modules linked in:
    CPU: 1 PID: 3037 Comm: syzkaller412798 Not tainted 4.13.0-rc5+ rockchip-linux#35
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
     __dump_stack lib/dump_stack.c:16 [inline]
     dump_stack+0x194/0x257 lib/dump_stack.c:52
     bad_page+0x230/0x2b0 mm/page_alloc.c:565
     free_pages_check_bad+0x1f0/0x2e0 mm/page_alloc.c:943
     free_pages_check mm/page_alloc.c:952 [inline]
     free_pages_prepare mm/page_alloc.c:1043 [inline]
     free_pcp_prepare mm/page_alloc.c:1068 [inline]
     free_hot_cold_page+0x8cf/0x12b0 mm/page_alloc.c:2584
     __put_single_page mm/swap.c:79 [inline]
     __put_page+0xfb/0x160 mm/swap.c:113
     put_page include/linux/mm.h:814 [inline]
     madvise_free_pte_range+0x137a/0x1ec0 mm/madvise.c:371
     walk_pmd_range mm/pagewalk.c:50 [inline]
     walk_pud_range mm/pagewalk.c:108 [inline]
     walk_p4d_range mm/pagewalk.c:134 [inline]
     walk_pgd_range mm/pagewalk.c:160 [inline]
     __walk_page_range+0xc3a/0x1450 mm/pagewalk.c:249
     walk_page_range+0x200/0x470 mm/pagewalk.c:326
     madvise_free_page_range.isra.9+0x17d/0x230 mm/madvise.c:444
     madvise_free_single_vma+0x353/0x580 mm/madvise.c:471
     madvise_dontneed_free mm/madvise.c:555 [inline]
     madvise_vma mm/madvise.c:664 [inline]
     SYSC_madvise mm/madvise.c:832 [inline]
     SyS_madvise+0x7d3/0x13c0 mm/madvise.c:760
     entry_SYSCALL_64_fastpath+0x1f/0xbe

Here is a C reproducer:

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define MADV_FREE	8
    #define PAGE_SIZE	4096

    static void *mapping;
    static const size_t mapping_size = 0x1000000;

    static void *madvise_thrproc(void *arg)
    {
        madvise(mapping, mapping_size, (long)arg);
    }

    int main(void)
    {
        pthread_t t[2];

        for (;;) {
            mapping = mmap(NULL, mapping_size, PROT_WRITE,
                           MAP_POPULATE|MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);

            munmap(mapping + mapping_size / 2, PAGE_SIZE);

            pthread_create(&t[0], 0, madvise_thrproc, (void*)MADV_DONTNEED);
            pthread_create(&t[1], 0, madvise_thrproc, (void*)MADV_FREE);
            pthread_join(t[0], NULL);
            pthread_join(t[1], NULL);
            munmap(mapping, mapping_size);
        }
    }

Note: to see the splat, CONFIG_TRANSPARENT_HUGEPAGE=y and
CONFIG_DEBUG_VM=y are needed.

Google Bug Id: 64696096

Link: http://lkml.kernel.org/r/20170823205235.132061-1-ebiggers3@gmail.com
Fixes: 854e9ed ("mm: support madvise(MADV_FREE)")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>	[v4.5+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
0lvin pushed a commit to free-z4u/roc-rk3328-cc-official that referenced this issue Aug 17, 2019
There are quite a few machine check exceptions that can be caused by
kernel bugs. To make debugging easier, use the kernel crash path in
cases of synchronous machine checks that occur in kernel mode, if that
would not result in the machine going straight to panic or crash dump.

There is a downside here that die()ing the process in kernel mode can
still leave the system unstable. panic_on_oops will always force the
system to fail-stop, so systems where that behaviour is important will
still do the right thing.

As a test, when triggering an i-side 0111b error (ifetch from foreign
address) in kernel mode process context on POWER9, the kernel currently
dies quickly like this:

  Severe Machine check interrupt [Not recovered]
    NIP [ffff000000000000]: 0xffff000000000000
    Initiator: CPU
    Error type: Real address [Instruction fetch (foreign)]
  [  127.426651616,0] OPAL: Reboot requested due to Platform error.
      Effective[  127.426693712,3] OPAL: Reboot requested due to Platform error. address: ffff000000000000
  opal: Reboot type 1 not supported
  Kernel panic - not syncing: PowerNV Unrecovered Machine Check
  CPU: 56 PID: 4425 Comm: syscall Tainted: G   M            4.12.0-rc1-13857-ga4700a261072-dirty rockchip-linux#35
  Call Trace:
  [  128.017988928,4] IPMI: BUG: Dropping ESEL on the floor due to
    buggy/mising code in OPAL for this BMC
    Rebooting in 10 seconds..
  Trying to free IRQ 496 from IRQ context!

After this patch, the process is killed and the kernel continues with
this message, which gives enough information to identify the offending
branch (i.e., with CFAR):

  Severe Machine check interrupt [Not recovered]
    NIP [ffff000000000000]: 0xffff000000000000
    Initiator: CPU
    Error type: Real address [Instruction fetch (foreign)]
      Effective address: ffff000000000000
  Oops: Machine check, sig: 7 [FireflyTeam#1]
  SMP NR_CPUS=2048
  NUMA
  PowerNV
  Modules linked in: iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 ...
  CPU: 22 PID: 4436 Comm: syscall Tainted: G   M            4.12.0-rc1-13857-ga4700a261072-dirty rockchip-linux#36
  task: c000000932300000 task.stack: c000000932380000
  NIP: ffff000000000000 LR: 00000000217706a4 CTR: ffff000000000000
  REGS: c00000000fc8fd80 TRAP: 0200   Tainted: G   M             (4.12.0-rc1-13857-ga4700a261072-dirty)
  MSR: 90000000001c1003 <SF,HV,ME,RI,LE>
    CR: 24000484  XER: 20000000
  CFAR: c000000000004c80 DAR: 0000000021770a90 DSISR: 0a000000 SOFTE: 1
  GPR00: 0000000000001ebe 00007fffce4818b0 0000000021797f00 0000000000000000
  GPR04: 00007fff8007ac24 0000000044000484 0000000000004000 00007fff801405e8
  GPR08: 900000000280f033 0000000024000484 0000000000000000 0000000000000030
  GPR12: 9000000000001003 00007fff801bc370 0000000000000000 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR28: 00007fff801b0000 0000000000000000 00000000217707a0 00007fffce481918
  NIP [ffff000000000000] 0xffff000000000000
  LR [00000000217706a4] 0x217706a4
  Call Trace:
  Instruction dump:
  XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
  XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
0lvin pushed a commit to free-z4u/roc-rk3328-cc-official that referenced this issue Aug 17, 2019
syzkaller reported crashes in bpf map creation or map update [1]

Problem is that nr_node_ids is a signed integer,
NUMA_NO_NODE is also an integer, so it is very tempting
to declare numa_node as a signed integer.

This means the typical test to validate a user provided value :

        if (numa_node != NUMA_NO_NODE &&
            (numa_node >= nr_node_ids ||
             !node_online(numa_node)))

must be written :

        if (numa_node != NUMA_NO_NODE &&
            ((unsigned int)numa_node >= nr_node_ids ||
             !node_online(numa_node)))

[1]
kernel BUG at mm/slab.c:3256!
invalid opcode: 0000 [FireflyTeam#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 2946 Comm: syzkaller916108 Not tainted 4.13.0-rc7+ rockchip-linux#35
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
task: ffff8801d2bc60c0 task.stack: ffff8801c0c90000
RIP: 0010:____cache_alloc_node+0x1d4/0x1e0 mm/slab.c:3292
RSP: 0018:ffff8801c0c97638 EFLAGS: 00010096
RAX: ffffffffffff8b7b RBX: 0000000001080220 RCX: 0000000000000000
RDX: 00000000ffff8b7b RSI: 0000000001080220 RDI: ffff8801dac00040
RBP: ffff8801c0c976c0 R08: 0000000000000000 R09: 0000000000000000
R10: ffff8801c0c97620 R11: 0000000000000001 R12: ffff8801dac00040
R13: ffff8801dac00040 R14: 0000000000000000 R15: 00000000ffff8b7b
FS:  0000000002119940(0000) GS:ffff8801db200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020001fec CR3: 00000001d2980000 CR4: 00000000001406f0
Call Trace:
 __do_kmalloc_node mm/slab.c:3688 [inline]
 __kmalloc_node+0x33/0x70 mm/slab.c:3696
 kmalloc_node include/linux/slab.h:535 [inline]
 alloc_htab_elem+0x2a8/0x480 kernel/bpf/hashtab.c:740
 htab_map_update_elem+0x740/0xb80 kernel/bpf/hashtab.c:820
 map_update_elem kernel/bpf/syscall.c:587 [inline]
 SYSC_bpf kernel/bpf/syscall.c:1468 [inline]
 SyS_bpf+0x20c5/0x4c40 kernel/bpf/syscall.c:1443
 entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x440409
RSP: 002b:00007ffd1f1792b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000440409
RDX: 0000000000000020 RSI: 0000000020006000 RDI: 0000000000000002
RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000401d70
R13: 0000000000401e00 R14: 0000000000000000 R15: 0000000000000000
Code: 83 c2 01 89 50 18 4c 03 70 08 e8 38 f4 ff ff 4d 85 f6 0f 85 3e ff ff ff 44 89 fe 4c 89 ef e8 94 fb ff ff 49 89 c6 e9 2b ff ff ff <0f> 0b 0f 0b 0f 0b 66 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41
RIP: ____cache_alloc_node+0x1d4/0x1e0 mm/slab.c:3292 RSP: ffff8801c0c97638
---[ end trace d745f355da2e33ce ]---
Kernel panic - not syncing: Fatal exception

Fixes: 96eabe7 ("bpf: Allow selecting numa node during map creation")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
0lvin pushed a commit to free-z4u/roc-rk3328-cc-official that referenced this issue Sep 29, 2019
[ Upstream commit 7602843 ]

Dulicate call of null_del_dev() will trigger null pointer error like below.
The reason is a race condition between nullb_device_power_store() and
nullb_group_drop_item().

  CPU#0                         CPU#1
  ----------------              -----------------
  do_rmdir()
   >configfs_rmdir()
    >client_drop_item()
     >nullb_group_drop_item()
                                nullb_device_power_store()
				>null_del_dev()

      >test_and_clear_bit(NULLB_DEV_FL_UP
       >null_del_dev()
       ^^^^^
       Duplicated null_dev_dev() triger null pointer error

				>clear_bit(NULLB_DEV_FL_UP

The fix could be keep the sequnce of clear NULLB_DEV_FL_UP and null_del_dev().

[  698.613600] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[  698.613608] #PF error: [normal kernel read fault]
[  698.613611] PGD 0 P4D 0
[  698.613619] Oops: 0000 [FireflyTeam#1] SMP PTI
[  698.613627] CPU: 3 PID: 6382 Comm: rmdir Not tainted 5.0.0+ rockchip-linux#35
[  698.613631] Hardware name: LENOVO 20LJS2EV08/20LJS2EV08, BIOS R0SET33W (1.17 ) 07/18/2018
[  698.613644] RIP: 0010:null_del_dev+0xc/0x110 [null_blk]
[  698.613649] Code: 00 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 0b eb 97 e8 47 bb 2a e8 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 54 53 <8b> 77 18 48 89 fb 4c 8b 27 48 c7 c7 40 57 1e c1 e8 bf c7 cb e8 48
[  698.613654] RSP: 0018:ffffb887888bfde0 EFLAGS: 00010286
[  698.613659] RAX: 0000000000000000 RBX: ffff9d436d92bc00 RCX: ffff9d43a9184681
[  698.613663] RDX: ffffffffc11e5c30 RSI: 0000000068be6540 RDI: 0000000000000000
[  698.613667] RBP: ffffb887888bfdf0 R08: 0000000000000001 R09: 0000000000000000
[  698.613671] R10: ffffb887888bfdd8 R11: 0000000000000f16 R12: ffff9d436d92bc08
[  698.613675] R13: ffff9d436d94e630 R14: ffffffffc11e5088 R15: ffffffffc11e5000
[  698.613680] FS:  00007faa68be6540(0000) GS:ffff9d43d14c0000(0000) knlGS:0000000000000000
[  698.613685] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  698.613689] CR2: 0000000000000018 CR3: 000000042f70c002 CR4: 00000000003606e0
[  698.613693] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  698.613697] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  698.613700] Call Trace:
[  698.613712]  nullb_group_drop_item+0x50/0x70 [null_blk]
[  698.613722]  client_drop_item+0x29/0x40
[  698.613728]  configfs_rmdir+0x1ed/0x300
[  698.613738]  vfs_rmdir+0xb2/0x130
[  698.613743]  do_rmdir+0x1c7/0x1e0
[  698.613750]  __x64_sys_rmdir+0x17/0x20
[  698.613759]  do_syscall_64+0x5a/0x110
[  698.613768]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
friendlyarm pushed a commit to friendlyarm/kernel-rockchip that referenced this issue Feb 26, 2020
[ Upstream commit a9d6227 ]

Because 'q->flags' starts as zero, and zero is a valid value, we
aren't able to detect the transition from zero to something else
during "runtime".

The solution is to initialize 'q->flags' with an invalid value, so we
can detect if 'q->flags' was set by the user or not.

To better solidify the behavior, 'flags' handling is moved to a
separate function. The behavior is:
 - 'flags' if unspecified by the user, is assumed to be zero;
 - 'flags' cannot change during "runtime" (i.e. a change() request
 cannot modify it);

With this new function we can remove taprio_flags, which should reduce
the risk of future accidents.

Allowing flags to be changed was causing the following RCU stall:

[ 1730.558249] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 1730.558258] rcu: 	  6-...0: (190 ticks this GP) idle=922/0/0x1 softirq=25580/25582 fqs=16250
[ 1730.558264] 		  (detected by 2, t=65002 jiffies, g=33017, q=81)
[ 1730.558269] Sending NMI from CPU 2 to CPUs 6:
[ 1730.559277] NMI backtrace for cpu 6
[ 1730.559277] CPU: 6 PID: 0 Comm: swapper/6 Tainted: G            E     5.5.0-rc6+ rockchip-linux#35
[ 1730.559278] Hardware name: Gigabyte Technology Co., Ltd. Z390 AORUS ULTRA/Z390 AORUS ULTRA-CF, BIOS F7 03/14/2019
[ 1730.559278] RIP: 0010:__hrtimer_run_queues+0xe2/0x440
[ 1730.559278] Code: 48 8b 43 28 4c 89 ff 48 8b 75 c0 48 89 45 c8 e8 f4 bb 7c 00 0f 1f 44 00 00 65 8b 05 40 31 f0 68 89 c0 48 0f a3 05 3e 5c 25 01 <0f> 82 fc 01 00 00 48 8b 45 c8 48 89 df ff d0 89 45 c8 0f 1f 44 00
[ 1730.559279] RSP: 0018:ffff9970802d8f10 EFLAGS: 00000083
[ 1730.559279] RAX: 0000000000000006 RBX: ffff8b31645bff38 RCX: 0000000000000000
[ 1730.559280] RDX: 0000000000000000 RSI: ffffffff9710f2ec RDI: ffffffff978daf0e
[ 1730.559280] RBP: ffff9970802d8f68 R08: 0000000000000000 R09: 0000000000000000
[ 1730.559280] R10: 0000018336d7944e R11: 0000000000000001 R12: ffff8b316e39f9c0
[ 1730.559281] R13: ffff8b316e39f940 R14: ffff8b316e39f998 R15: ffff8b316e39f7c0
[ 1730.559281] FS:  0000000000000000(0000) GS:ffff8b316e380000(0000) knlGS:0000000000000000
[ 1730.559281] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1730.559281] CR2: 00007f1105303760 CR3: 0000000227210005 CR4: 00000000003606e0
[ 1730.559282] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1730.559282] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1730.559282] Call Trace:
[ 1730.559282]  <IRQ>
[ 1730.559283]  ? taprio_dequeue_soft+0x2d0/0x2d0 [sch_taprio]
[ 1730.559283]  hrtimer_interrupt+0x104/0x220
[ 1730.559283]  ? irqtime_account_irq+0x34/0xa0
[ 1730.559283]  smp_apic_timer_interrupt+0x6d/0x230
[ 1730.559284]  apic_timer_interrupt+0xf/0x20
[ 1730.559284]  </IRQ>
[ 1730.559284] RIP: 0010:cpu_idle_poll+0x35/0x1a0
[ 1730.559285] Code: 88 82 ff 65 44 8b 25 12 7d 73 68 0f 1f 44 00 00 e8 90 c3 89 ff fb 65 48 8b 1c 25 c0 7e 01 00 48 8b 03 a8 08 74 0b eb 1c f3 90 <48> 8b 03 a8 08 75 13 8b 05 be a8 a8 00 85 c0 75 ed e8 75 48 84 ff
[ 1730.559285] RSP: 0018:ffff997080137ea8 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
[ 1730.559285] RAX: 0000000000000001 RBX: ffff8b316bc3c580 RCX: 0000000000000000
[ 1730.559286] RDX: 0000000000000001 RSI: 000000002819aad9 RDI: ffffffff978da730
[ 1730.559286] RBP: ffff997080137ec0 R08: 0000018324a6d387 R09: 0000000000000000
[ 1730.559286] R10: 0000000000000400 R11: 0000000000000001 R12: 0000000000000006
[ 1730.559286] R13: ffff8b316bc3c580 R14: 0000000000000000 R15: 0000000000000000
[ 1730.559287]  ? cpu_idle_poll+0x20/0x1a0
[ 1730.559287]  ? cpu_idle_poll+0x20/0x1a0
[ 1730.559287]  do_idle+0x4d/0x1f0
[ 1730.559287]  ? complete+0x44/0x50
[ 1730.559288]  cpu_startup_entry+0x1b/0x20
[ 1730.559288]  start_secondary+0x142/0x180
[ 1730.559288]  secondary_startup_64+0xb6/0xc0
[ 1776.686313] nvme nvme0: I/O 96 QID 1 timeout, completion polled

Fixes: 4cfd577 ("taprio: Add support for txtime-assist mode")
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Joern-P pushed a commit to Joern-P/kernel that referenced this issue Sep 6, 2020
Change-Id: I9f2c1b0d8d56a8d3da1774f51a4701212ac6ffff

added IPGRE module to enable: (rockchip-linux#18)

ip tunnel add grx mode gre

Change-Id: I26bda9ece68f9f6be1277da0779076694d5d845c

ayufan: rockchip_linux_defconfig compile ZRAM as module

Change-Id: I2ee48973cb370130cc5d75008a07d5777c69208a

ayufan: rockchip_defconfig: disable CONFIG_SCHED_WALT as it makes system unstable

Change-Id: Id2a150311fcaad9f11127320558dc3caafb250e4

config: add more USB wifi chipsets (rockchip-linux#19)

* Add some more USB wifi chipsets to default config.

* Fix typo in default config.

Added openvswitch kernel module compilation (rockchip-linux#22)

ayufan: defconfig: add RK805 pinctrl

Change-Id: I9144206544db4e86e73a4d553fb8db3aa194c619

ayufan: enable additional realtek devices

Change-Id: Ie3598c47154b648540f161ad4aedd597bc33f83f

config: Add support for modules/features requested by issues rockchip-linux#24 (AUTOFS4), rockchip-linux#54 (DRBD), rockchip-linux#64 (Multiple Routing Tables), rockchip-linux#87 (RTL8188EU), rockchip-linux#107 (iSCSI), rockchip-linux#148 (1-W GPIO) (rockchip-linux#24)

ayufan: rtl8812au: add Edimax 600 USB adapter

Add requested kernel modules/features for issue rockchip-linux#153 and LIRC, PPS GPIO support, remove wifi staging drivers. (rockchip-linux#25)

Fix compile error (rockchip-linux#26)

Multiple definition error due to midgard and bifrost both selected

ayufan: defconfig: add a bunch of kernel modules

Change-Id: I77f5c4809c35b6c74a3bb668487eba93a0a15169

ayufan: rockchip_wlan: revert changes

Change-Id: I6aa0bbc24e2fdfb6da350ed167d59c4eea836c2a

ayufan: defconfig: enable CONFIG_MEMTEST

Change-Id: I82c3d1b2d35fb10172f891571ef600a692ac4e61

ayufan: defconfig: remove unusued kernel configs

Change-Id: I2a4a2e5785dd5796bb7404718898127718b86425

ayufan: defconfig: enable initrd in bzip2/lzma/lzo/lz4

Change-Id: I88ea1f8127d8fdadf90dd4428a51a1582adc2687

ayufan: defconfig: enable PWM FAN

Change-Id: I87a364e95d937c354e06eb41fead43d8d5cb0b4a

external: defconfig: Add support for f2fs and crc32 (rockchip-linux#32)

Add support for f2fs and it's dependency crc32 to rock64.

This patch may or may not need to be included as well due to this bug https://bugs.debian.org/819725    A official patch was added by Debian https://salsa.debian.org/kernel-team/linux/blob/master/debian/patches/bugfix/all/fs-add-module_softdep-declarations-for-hard-coded-cr.patch

external: defconfig: enable kernel modules for RBD and IPVS (rockchip-linux#33)

external: defconfig: Make crc, crc32, f2fs to be compiled into kernel. (rockchip-linux#35)

This changes crc, crc32, and f2fs to be compiled into the kernel instead of as module. This also fixes the issue of crc32-arm64 not being compiled due to a incorrect driver name.  CONFIG_CRYPTO_CRC32_ARM64 became CONFIG_CRYPTO_CRC32_ARM64_CE in newer kernel sources. That's why I had to change that here.

ayufan: defconfig: use CONFIG_HZ=250

Change-Id: I8009db93ee7c5af5e7804a004ab03d0ba97d20e2

CONFIG_SQUASHFS_XZ=y (rockchip-linux#41)

cyberp: defconfig: squashfs xz for snap support

defconfig: enable binfmt-misc (rockchip-linux#36)

ayufan: defconfig: organize with savedefconfig

ayufan: defconfig: support tehuti 10gbps adapter

ayufan: defconfig: enable usb gadget via configfs

ayufan: dts: compile-in audio codecs

Change-Id: I7c70870395a10a0eafff006aabc17faf1d33355e
rkchrome pushed a commit that referenced this issue Oct 10, 2020
[ Upstream commit 96298f6 ]

According to Core Spec Version 5.2 | Vol 3, Part A 6.1.5,
the incoming L2CAP_ConfigReq should be handled during
OPEN state.

The section below shows the btmon trace when running
L2CAP/COS/CFD/BV-12-C before and after this change.

=== Before ===
...
> ACL Data RX: Handle 256 flags 0x02 dlen 12                #22
      L2CAP: Connection Request (0x02) ident 2 len 4
        PSM: 1 (0x0001)
        Source CID: 65
< ACL Data TX: Handle 256 flags 0x00 dlen 16                #23
      L2CAP: Connection Response (0x03) ident 2 len 8
        Destination CID: 64
        Source CID: 65
        Result: Connection successful (0x0000)
        Status: No further information available (0x0000)
< ACL Data TX: Handle 256 flags 0x00 dlen 12                #24
      L2CAP: Configure Request (0x04) ident 2 len 4
        Destination CID: 65
        Flags: 0x0000
> HCI Event: Number of Completed Packets (0x13) plen 5      #25
        Num handles: 1
        Handle: 256
        Count: 1
> HCI Event: Number of Completed Packets (0x13) plen 5      #26
        Num handles: 1
        Handle: 256
        Count: 1
> ACL Data RX: Handle 256 flags 0x02 dlen 16                #27
      L2CAP: Configure Request (0x04) ident 3 len 8
        Destination CID: 64
        Flags: 0x0000
        Option: Unknown (0x10) [hint]
        01 00                                            ..
< ACL Data TX: Handle 256 flags 0x00 dlen 18                #28
      L2CAP: Configure Response (0x05) ident 3 len 10
        Source CID: 65
        Flags: 0x0000
        Result: Success (0x0000)
        Option: Maximum Transmission Unit (0x01) [mandatory]
          MTU: 672
> HCI Event: Number of Completed Packets (0x13) plen 5      #29
        Num handles: 1
        Handle: 256
        Count: 1
> ACL Data RX: Handle 256 flags 0x02 dlen 14                #30
      L2CAP: Configure Response (0x05) ident 2 len 6
        Source CID: 64
        Flags: 0x0000
        Result: Success (0x0000)
> ACL Data RX: Handle 256 flags 0x02 dlen 20                #31
      L2CAP: Configure Request (0x04) ident 3 len 12
        Destination CID: 64
        Flags: 0x0000
        Option: Unknown (0x10) [hint]
        01 00 91 02 11 11                                ......
< ACL Data TX: Handle 256 flags 0x00 dlen 14                #32
      L2CAP: Command Reject (0x01) ident 3 len 6
        Reason: Invalid CID in request (0x0002)
        Destination CID: 64
        Source CID: 65
> HCI Event: Number of Completed Packets (0x13) plen 5      #33
        Num handles: 1
        Handle: 256
        Count: 1
...
=== After ===
...
> ACL Data RX: Handle 256 flags 0x02 dlen 12               #22
      L2CAP: Connection Request (0x02) ident 2 len 4
        PSM: 1 (0x0001)
        Source CID: 65
< ACL Data TX: Handle 256 flags 0x00 dlen 16               #23
      L2CAP: Connection Response (0x03) ident 2 len 8
        Destination CID: 64
        Source CID: 65
        Result: Connection successful (0x0000)
        Status: No further information available (0x0000)
< ACL Data TX: Handle 256 flags 0x00 dlen 12               #24
      L2CAP: Configure Request (0x04) ident 2 len 4
        Destination CID: 65
        Flags: 0x0000
> HCI Event: Number of Completed Packets (0x13) plen 5     #25
        Num handles: 1
        Handle: 256
        Count: 1
> HCI Event: Number of Completed Packets (0x13) plen 5     #26
        Num handles: 1
        Handle: 256
        Count: 1
> ACL Data RX: Handle 256 flags 0x02 dlen 16               #27
      L2CAP: Configure Request (0x04) ident 3 len 8
        Destination CID: 64
        Flags: 0x0000
        Option: Unknown (0x10) [hint]
        01 00                                            ..
< ACL Data TX: Handle 256 flags 0x00 dlen 18               #28
      L2CAP: Configure Response (0x05) ident 3 len 10
        Source CID: 65
        Flags: 0x0000
        Result: Success (0x0000)
        Option: Maximum Transmission Unit (0x01) [mandatory]
          MTU: 672
> HCI Event: Number of Completed Packets (0x13) plen 5     #29
        Num handles: 1
        Handle: 256
        Count: 1
> ACL Data RX: Handle 256 flags 0x02 dlen 14               #30
      L2CAP: Configure Response (0x05) ident 2 len 6
        Source CID: 64
        Flags: 0x0000
        Result: Success (0x0000)
> ACL Data RX: Handle 256 flags 0x02 dlen 20               #31
      L2CAP: Configure Request (0x04) ident 3 len 12
        Destination CID: 64
        Flags: 0x0000
        Option: Unknown (0x10) [hint]
        01 00 91 02 11 11                                .....
< ACL Data TX: Handle 256 flags 0x00 dlen 18               #32
      L2CAP: Configure Response (0x05) ident 3 len 10
        Source CID: 65
        Flags: 0x0000
        Result: Success (0x0000)
        Option: Maximum Transmission Unit (0x01) [mandatory]
          MTU: 672
< ACL Data TX: Handle 256 flags 0x00 dlen 12               #33
      L2CAP: Configure Request (0x04) ident 3 len 4
        Destination CID: 65
        Flags: 0x0000
> HCI Event: Number of Completed Packets (0x13) plen 5     #34
        Num handles: 1
        Handle: 256
        Count: 1
> HCI Event: Number of Completed Packets (0x13) plen 5     #35
        Num handles: 1
        Handle: 256
        Count: 1
...

Signed-off-by: Howard Chung <howardchung@google.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
rkchrome pushed a commit that referenced this issue Nov 3, 2020
[ Upstream commit 52a3974 ]

Get and put the reference to the ctrl in the nvme_dev_open() and
nvme_dev_release() before and after module get/put for ctrl in char
device file operations.

Introduce char_dev relase function, get/put the controller and module
which allows us to fix the potential Oops which can be easily reproduced
with a passthru ctrl (although the problem also exists with pure user
access):

Entering kdb (current=0xffff8887f8290000, pid 3128) on processor 30 Oops: (null)
due to oops @ 0xffffffffa01019ad
CPU: 30 PID: 3128 Comm: bash Tainted: G        W  OE     5.8.0-rc4nvme-5.9+ #35
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.4
RIP: 0010:nvme_free_ctrl+0x234/0x285 [nvme_core]
Code: 57 10 a0 e8 73 bf 02 e1 ba 3d 11 00 00 48 c7 c6 98 33 10 a0 48 c7 c7 1d 57 10 a0 e8 5b bf 02 e1 8
RSP: 0018:ffffc90001d63de0 EFLAGS: 00010246
RAX: ffffffffa05c0440 RBX: ffff8888119e45a0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff8888177e9550 RDI: ffff8888119e43b0
RBP: ffff8887d4768000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: ffffc90001d63c90 R12: ffff8888119e43b0
R13: ffff8888119e5108 R14: dead000000000100 R15: ffff8888119e5108
FS:  00007f1ef27b0740(0000) GS:ffff888817600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffa05c0470 CR3: 00000007f6bee000 CR4: 00000000003406e0
Call Trace:
 device_release+0x27/0x80
 kobject_put+0x98/0x170
 nvmet_passthru_ctrl_disable+0x4a/0x70 [nvmet]
 nvmet_passthru_enable_store+0x4c/0x90 [nvmet]
 configfs_write_file+0xe6/0x150
 vfs_write+0xba/0x1e0
 ksys_write+0x5f/0xe0
 do_syscall_64+0x52/0xb0
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f1ef1eb2840
Code: Bad RIP value.
RSP: 002b:00007fffdbff0eb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f1ef1eb2840
RDX: 0000000000000002 RSI: 00007f1ef27d2000 RDI: 0000000000000001
RBP: 00007f1ef27d2000 R08: 000000000000000a R09: 00007f1ef27b0740
R10: 0000000000000001 R11: 0000000000000246 R12: 00007f1ef2186400
R13: 0000000000000002 R14: 0000000000000001 R15: 0000000000000000

With this patch fix we take the module ref count in nvme_dev_open() and
release that ref count in newly introduced nvme_dev_release().

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
friendlyarm pushed a commit to friendlyarm/kernel-rockchip that referenced this issue Nov 17, 2020
[ Upstream commit 52a3974 ]

Get and put the reference to the ctrl in the nvme_dev_open() and
nvme_dev_release() before and after module get/put for ctrl in char
device file operations.

Introduce char_dev relase function, get/put the controller and module
which allows us to fix the potential Oops which can be easily reproduced
with a passthru ctrl (although the problem also exists with pure user
access):

Entering kdb (current=0xffff8887f8290000, pid 3128) on processor 30 Oops: (null)
due to oops @ 0xffffffffa01019ad
CPU: 30 PID: 3128 Comm: bash Tainted: G        W  OE     5.8.0-rc4nvme-5.9+ rockchip-linux#35
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.4
RIP: 0010:nvme_free_ctrl+0x234/0x285 [nvme_core]
Code: 57 10 a0 e8 73 bf 02 e1 ba 3d 11 00 00 48 c7 c6 98 33 10 a0 48 c7 c7 1d 57 10 a0 e8 5b bf 02 e1 8
RSP: 0018:ffffc90001d63de0 EFLAGS: 00010246
RAX: ffffffffa05c0440 RBX: ffff8888119e45a0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff8888177e9550 RDI: ffff8888119e43b0
RBP: ffff8887d4768000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: ffffc90001d63c90 R12: ffff8888119e43b0
R13: ffff8888119e5108 R14: dead000000000100 R15: ffff8888119e5108
FS:  00007f1ef27b0740(0000) GS:ffff888817600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffa05c0470 CR3: 00000007f6bee000 CR4: 00000000003406e0
Call Trace:
 device_release+0x27/0x80
 kobject_put+0x98/0x170
 nvmet_passthru_ctrl_disable+0x4a/0x70 [nvmet]
 nvmet_passthru_enable_store+0x4c/0x90 [nvmet]
 configfs_write_file+0xe6/0x150
 vfs_write+0xba/0x1e0
 ksys_write+0x5f/0xe0
 do_syscall_64+0x52/0xb0
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f1ef1eb2840
Code: Bad RIP value.
RSP: 002b:00007fffdbff0eb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f1ef1eb2840
RDX: 0000000000000002 RSI: 00007f1ef27d2000 RDI: 0000000000000001
RBP: 00007f1ef27d2000 R08: 000000000000000a R09: 00007f1ef27b0740
R10: 0000000000000001 R11: 0000000000000246 R12: 00007f1ef2186400
R13: 0000000000000002 R14: 0000000000000001 R15: 0000000000000000

With this patch fix we take the module ref count in nvme_dev_open() and
release that ref count in newly introduced nvme_dev_release().

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants