Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spinlock recursion in sdhci-bcm2708.c #76

Closed
ddv2005 opened this issue Aug 3, 2012 · 4 comments
Closed

Spinlock recursion in sdhci-bcm2708.c #76

ddv2005 opened this issue Aug 3, 2012 · 4 comments

Comments

@ddv2005
Copy link

ddv2005 commented Aug 3, 2012

Hi,

sdhci-bcm2708.c has spinlock recursion in sdhci_bcm2708_dma_complete_irq and sdhci_bcm2708_platdma_reset. Both functions does not need spin_lock_irqsave/spin_unlock_irqrestore because spinlock already locked by main irq handler.

@popcornmix
Copy link
Collaborator

Do you have a suggested patch/pull request?

@ddv2005
Copy link
Author

ddv2005 commented Aug 5, 2012

@alphaonex86
Copy link

Can be linked with my issue:
#52

bmwiedemann pushed a commit to bmwiedemann/raspberrypi-linux that referenced this issue Aug 11, 2012
Plug merge calls two elevator callbacks outside queue lock -
elevator_allow_merge_fn() and elevator_bio_merged_fn().  Although
attempt_plug_merge() suggests that elevator is guaranteed to be there
through the existing request on the plug list, nothing prevents plug
merge from calling into dying or initializing elevator.

For regular merges, bypass ensures elvpriv count to reach zero, which
in turn prevents merges as all !ELVPRIV requests get REQ_SOFTBARRIER
from forced back insertion.  Plug merge doesn't check ELVPRIV, and, as
the requests haven't gone through elevator insertion yet, it doesn't
have SOFTBARRIER set allowing merges on a bypassed queue.

This, for example, leads to the following crash during elevator
switch.

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
 IP: [<ffffffff813b34e9>] cfq_allow_merge+0x49/0xa0
 PGD 112cbc067 PUD 115d5c067 PMD 0
 Oops: 0000 [raspberrypi#1] PREEMPT SMP
 CPU 1
 Modules linked in: deadline_iosched

 Pid: 819, comm: dd Not tainted 3.3.0-rc2-work+ raspberrypi#76 Bochs Bochs
 RIP: 0010:[<ffffffff813b34e9>]  [<ffffffff813b34e9>] cfq_allow_merge+0x49/0xa0
 RSP: 0018:ffff8801143a38f8  EFLAGS: 00010297
 RAX: 0000000000000000 RBX: ffff88011817ce28 RCX: ffff880116eb6cc0
 RDX: 0000000000000000 RSI: ffff880118056e20 RDI: ffff8801199512f8
 RBP: ffff8801143a3908 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000001 R11: 0000000000000000 R12: ffff880118195708
 R13: ffff880118052aa0 R14: ffff8801143a3d50 R15: ffff880118195708
 FS:  00007f19f82cb700(0000) GS:ffff88011fc80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
 CR2: 0000000000000008 CR3: 0000000112c6a000 CR4: 00000000000006e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
 Process dd (pid: 819, threadinfo ffff8801143a2000, task ffff880116eb6cc0)
 Stack:
  ffff88011817ce28 ffff880118195708 ffff8801143a3928 ffffffff81391bba
  ffff88011817ce28 ffff880118195708 ffff8801143a3948 ffffffff81391bf1
  ffff88011817ce28 0000000000000000 ffff8801143a39a8 ffffffff81398e3e
 Call Trace:
  [<ffffffff81391bba>] elv_rq_merge_ok+0x4a/0x60
  [<ffffffff81391bf1>] elv_try_merge+0x21/0x40
  [<ffffffff81398e3e>] blk_queue_bio+0x8e/0x390
  [<ffffffff81396a5a>] generic_make_request+0xca/0x100
  [<ffffffff81396b04>] submit_bio+0x74/0x100
  [<ffffffff811d45c2>] __blockdev_direct_IO+0x1ce2/0x3450
  [<ffffffff811d0dc7>] blkdev_direct_IO+0x57/0x60
  [<ffffffff811460b5>] generic_file_aio_read+0x6d5/0x760
  [<ffffffff811986b2>] do_sync_read+0xe2/0x120
  [<ffffffff81199345>] vfs_read+0xc5/0x180
  [<ffffffff81199501>] sys_read+0x51/0x90
  [<ffffffff81aeac12>] system_call_fastpath+0x16/0x1b

There are multiple ways to fix this including making plug merge check
ELVPRIV; however,

* Calling into elevator outside queue lock is confusing and
  error-prone.

* Requests on plug list aren't known to the elevator.  They aren't on
  the elevator yet, so there's no elevator specific state to update.

* Given the nature of plug merges - collecting bio's for the same
  purpose from the same issuer - elevator specific restrictions aren't
  applicable.

So, simply don't call into elevator methods from plug merge by moving
elv_bio_merged() from bio_attempt_*_merge() to blk_queue_bio(), and
using blk_try_merge() in attempt_plug_merge().

This is based on Jens' patch to skip elevator_allow_merge_fn() from
plug merge.

Note that this makes per-cgroup merged stats skip plug merging.

Signed-off-by: Tejun Heo <tj@kernel.org>
LKML-Reference: <4F16F3CA.90904@kernel.dk>
Original-patch-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
@popcornmix
Copy link
Collaborator

Patch has been applied.

popcornmix pushed a commit that referenced this issue Oct 13, 2012
When a virtio_9p pci device is being removed, we should close down any
active channels and free up resources, we're not supposed to BUG() if there's
still an open channel since it's a valid case when removing the PCI device.

Otherwise, removing the PCI device with an open channel would cause the
following BUG():

[ 1184.671416] ------------[ cut here ]------------
[ 1184.672057] kernel BUG at net/9p/trans_virtio.c:618!
[ 1184.672057] invalid opcode: 0000 [#1] PREEMPT SMP
[ 1184.672057] CPU 3
[ 1184.672057] Pid: 5, comm: kworker/u:0 Tainted: G        W    3.4.0-rc2-next-20120413-sasha-dirty #76
[ 1184.672057] RIP: 0010:[<ffffffff825c9116>]  [<ffffffff825c9116>] p9_virtio_remove+0x16/0x90
[ 1184.672057] RSP: 0018:ffff88000d653ac0  EFLAGS: 00010202
[ 1184.672057] RAX: ffffffff836bfb40 RBX: ffff88000c9b2148 RCX: ffff88000d658978
[ 1184.672057] RDX: 0000000000000006 RSI: 0000000000000000 RDI: ffff880028868000
[ 1184.672057] RBP: ffff88000d653ad0 R08: 0000000000000000 R09: 0000000000000000
[ 1184.672057] R10: 0000000000000000 R11: 0000000000000001 R12: ffff880028868000
[ 1184.672057] R13: ffffffff835aa7c0 R14: ffff880041630000 R15: ffff88000d653da0
[ 1184.672057] FS:  0000000000000000(0000) GS:ffff880035a00000(0000) knlGS:0000000000000000
[ 1184.672057] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1184.672057] CR2: 0000000001181000 CR3: 000000000eba1000 CR4: 00000000000406e0
[ 1184.672057] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
x000000000117a190 *[ 1184.672057] DR3: 00000000000000**
00 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1184.672057] Process kworker/u:0 (pid: 5, threadinfo ffff88000d652000, task ffff88000d658000)
[ 1184.672057] Stack:
[ 1184.672057]  ffff880028868000 ffffffff836bfb40 ffff88000d653af0 ffffffff8193661b
[ 1184.672057]  ffff880028868008 ffffffff836bfb40 ffff88000d653b10 ffffffff81af1c81
[ 1184.672057]  ffff880028868068 ffff880028868008 ffff88000d653b30 ffffffff81af257a
[ 1184.795301] Call Trace:
[ 1184.795301]  [<ffffffff8193661b>] virtio_dev_remove+0x1b/0x60
[ 1184.795301]  [<ffffffff81af1c81>] __device_release_driver+0x81/0xd0
[ 1184.795301]  [<ffffffff81af257a>] device_release_driver+0x2a/0x40
[ 1184.795301]  [<ffffffff81af0d48>] bus_remove_device+0x138/0x150
[ 1184.795301]  [<ffffffff81aef08d>] device_del+0x14d/0x1b0
[ 1184.795301]  [<ffffffff81aef138>] device_unregister+0x48/0x60
[ 1184.795301]  [<ffffffff8193694d>] unregister_virtio_device+0xd/0x10
[ 1184.795301]  [<ffffffff8265fc74>] virtio_pci_remove+0x2a/0x6c
[ 1184.795301]  [<ffffffff818a95ad>] pci_device_remove+0x4d/0x110
[ 1184.795301]  [<ffffffff81af1c81>] __device_release_driver+0x81/0xd0
[ 1184.795301]  [<ffffffff81af257a>] device_release_driver+0x2a/0x40
[ 1184.795301]  [<ffffffff81af0d48>] bus_remove_device+0x138/0x150
[ 1184.795301]  [<ffffffff81aef08d>] device_del+0x14d/0x1b0
[ 1184.795301]  [<ffffffff81aef138>] device_unregister+0x48/0x60
[ 1184.795301]  [<ffffffff818a36fa>] pci_stop_bus_device+0x6a/0x90
[ 1184.795301]  [<ffffffff818a3791>] pci_stop_and_remove_bus_device+0x11/0x20
[ 1184.795301]  [<ffffffff818c21d9>] remove_callback+0x9/0x10
[ 1184.795301]  [<ffffffff81252d91>] sysfs_schedule_callback_work+0x21/0x60
[ 1184.795301]  [<ffffffff810cb1a1>] process_one_work+0x281/0x430
[ 1184.795301]  [<ffffffff810cb140>] ? process_one_work+0x220/0x430
[ 1184.795301]  [<ffffffff81252d70>] ? sysfs_read_file+0x1c0/0x1c0
[ 1184.795301]  [<ffffffff810cc613>] worker_thread+0x1f3/0x320
[ 1184.795301]  [<ffffffff810cc420>] ? manage_workers.clone.13+0x130/0x130
[ 1184.795301]  [<ffffffff810d30b2>] kthread+0xb2/0xc0
[ 1184.795301]  [<ffffffff826783f4>] kernel_thread_helper+0x4/0x10
[ 1184.795301]  [<ffffffff810deb18>] ? finish_task_switch+0x78/0xf0
[ 1184.795301]  [<ffffffff82676574>] ? retint_restore_args+0x13/0x13
[ 1184.795301]  [<ffffffff810d3000>] ? kthread_flush_work_fn+0x10/0x10
[ 1184.795301]  [<ffffffff826783f0>] ? gs_change+0x13/0x13
[ 1184.795301] Code: c1 9e 0a 00 48 83 c4 08 5b c9 c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 54 49 89 fc 53 48 8b 9f a8 04 00 00 80 3b 00 74 0a <0f> 0b 0f 1f 84 00 00 00 00 00 48 8b 87 88 04 00 00 ff 50 30 31
[ 1184.795301] RIP  [<ffffffff825c9116>] p9_virtio_remove+0x16/0x90
[ 1184.795301]  RSP <ffff88000d653ac0>
[ 1184.952618] ---[ end trace a307b3ed40206b4c ]---

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
anholt referenced this issue in anholt/linux Oct 17, 2016
This commit fixes a stack corruption in the pseries specific code dealing
with the huge pages.

In __pSeries_lpar_hugepage_invalidate() the buffer used to pass arguments
to the hypervisor is not large enough. This leads to a stack corruption
where a previously saved register could be corrupted leading to unexpected
result in the caller, like the following panic:

  Oops: Kernel access of bad area, sig: 11 [#1]
  SMP NR_CPUS=2048 NUMA pSeries
  Modules linked in: virtio_balloon ip_tables x_tables autofs4
  virtio_blk 8139too virtio_pci virtio_ring 8139cp virtio
  CPU: 11 PID: 1916 Comm: mmstress Not tainted 4.8.0 #76
  task: c000000005394880 task.stack: c000000005570000
  NIP: c00000000027bf6c LR: c00000000027bf64 CTR: 0000000000000000
  REGS: c000000005573820 TRAP: 0300   Not tainted  (4.8.0)
  MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 84822884  XER: 20000000
  CFAR: c00000000010a924 DAR: 420000000014e5e0 DSISR: 40000000 SOFTE: 1
  GPR00: c00000000027bf64 c000000005573aa0 c000000000e02800 c000000004447964
  GPR04: c00000000404de18 c000000004d38810 00000000042100f5 00000000f5002104
  GPR08: e0000000f5002104 0000000000000001 042100f5000000e0 00000000042100f5
  GPR12: 0000000000002200 c00000000fe02c00 c00000000404de18 0000000000000000
  GPR16: c1ffffffffffe7ff 00003fff62000000 420000000014e5e0 00003fff63000000
  GPR20: 0008000000000000 c0000000f7014800 0405e600000000e0 0000000000010000
  GPR24: c000000004d38810 c000000004447c10 c00000000404de18 c000000004447964
  GPR28: c000000005573b10 c000000004d38810 00003fff62000000 420000000014e5e0
  NIP [c00000000027bf6c] zap_huge_pmd+0x4c/0x470
  LR [c00000000027bf64] zap_huge_pmd+0x44/0x470
  Call Trace:
  [c000000005573aa0] [c00000000027bf64] zap_huge_pmd+0x44/0x470 (unreliable)
  [c000000005573af0] [c00000000022bbd8] unmap_page_range+0xcf8/0xed0
  [c000000005573c30] [c00000000022c2d4] unmap_vmas+0x84/0x120
  [c000000005573c80] [c000000000235448] unmap_region+0xd8/0x1b0
  [c000000005573d80] [c0000000002378f0] do_munmap+0x2d0/0x4c0
  [c000000005573df0] [c000000000237be4] SyS_munmap+0x64/0xb0
  [c000000005573e30] [c000000000009560] system_call+0x38/0x108
  Instruction dump:
  fbe1fff8 fb81ffe0 7c7f1b78 7ca32b78 7cbd2b78 f8010010 7c9a2378 f821ffb1
  7cde3378 4bfffea9 7c7b1b79 41820298 <e87f0000> 48000130 7fa5eb78 7fc4f378

Most of the time, the bug is surfacing in a caller up in the stack from
__pSeries_lpar_hugepage_invalidate() which is quite confusing.

This bug is pending since v3.11 but was hidden if a caller of the
caller of __pSeries_lpar_hugepage_invalidate() has pushed the corruped
register (r18 in this case) in the stack and is not using it until
restoring it. GCC 6.2.0 seems to raise it more frequently.

This commit also change the definition of the parameter buffer in
pSeries_lpar_flush_hash_range() to rely on the global define
PLPAR_HCALL9_BUFSIZE (no functional change here).

Fixes: 1a52728 ("powerpc: Optimize hugepage invalidate")
Cc: stable@vger.kernel.org # v3.11+
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
davet321 pushed a commit to davet321/rpi-linux that referenced this issue Oct 28, 2016
commit 05af40e upstream.

This commit fixes a stack corruption in the pseries specific code dealing
with the huge pages.

In __pSeries_lpar_hugepage_invalidate() the buffer used to pass arguments
to the hypervisor is not large enough. This leads to a stack corruption
where a previously saved register could be corrupted leading to unexpected
result in the caller, like the following panic:

  Oops: Kernel access of bad area, sig: 11 [raspberrypi#1]
  SMP NR_CPUS=2048 NUMA pSeries
  Modules linked in: virtio_balloon ip_tables x_tables autofs4
  virtio_blk 8139too virtio_pci virtio_ring 8139cp virtio
  CPU: 11 PID: 1916 Comm: mmstress Not tainted 4.8.0 raspberrypi#76
  task: c000000005394880 task.stack: c000000005570000
  NIP: c00000000027bf6c LR: c00000000027bf64 CTR: 0000000000000000
  REGS: c000000005573820 TRAP: 0300   Not tainted  (4.8.0)
  MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 84822884  XER: 20000000
  CFAR: c00000000010a924 DAR: 420000000014e5e0 DSISR: 40000000 SOFTE: 1
  GPR00: c00000000027bf64 c000000005573aa0 c000000000e02800 c000000004447964
  GPR04: c00000000404de18 c000000004d38810 00000000042100f5 00000000f5002104
  GPR08: e0000000f5002104 0000000000000001 042100f5000000e0 00000000042100f5
  GPR12: 0000000000002200 c00000000fe02c00 c00000000404de18 0000000000000000
  GPR16: c1ffffffffffe7ff 00003fff62000000 420000000014e5e0 00003fff63000000
  GPR20: 0008000000000000 c0000000f7014800 0405e600000000e0 0000000000010000
  GPR24: c000000004d38810 c000000004447c10 c00000000404de18 c000000004447964
  GPR28: c000000005573b10 c000000004d38810 00003fff62000000 420000000014e5e0
  NIP [c00000000027bf6c] zap_huge_pmd+0x4c/0x470
  LR [c00000000027bf64] zap_huge_pmd+0x44/0x470
  Call Trace:
  [c000000005573aa0] [c00000000027bf64] zap_huge_pmd+0x44/0x470 (unreliable)
  [c000000005573af0] [c00000000022bbd8] unmap_page_range+0xcf8/0xed0
  [c000000005573c30] [c00000000022c2d4] unmap_vmas+0x84/0x120
  [c000000005573c80] [c000000000235448] unmap_region+0xd8/0x1b0
  [c000000005573d80] [c0000000002378f0] do_munmap+0x2d0/0x4c0
  [c000000005573df0] [c000000000237be4] SyS_munmap+0x64/0xb0
  [c000000005573e30] [c000000000009560] system_call+0x38/0x108
  Instruction dump:
  fbe1fff8 fb81ffe0 7c7f1b78 7ca32b78 7cbd2b78 f8010010 7c9a2378 f821ffb1
  7cde3378 4bfffea9 7c7b1b79 41820298 <e87f0000> 48000130 7fa5eb78 7fc4f378

Most of the time, the bug is surfacing in a caller up in the stack from
__pSeries_lpar_hugepage_invalidate() which is quite confusing.

This bug is pending since v3.11 but was hidden if a caller of the
caller of __pSeries_lpar_hugepage_invalidate() has pushed the corruped
register (r18 in this case) in the stack and is not using it until
restoring it. GCC 6.2.0 seems to raise it more frequently.

This commit also change the definition of the parameter buffer in
pSeries_lpar_flush_hash_range() to rely on the global define
PLPAR_HCALL9_BUFSIZE (no functional change here).

Fixes: 1a52728 ("powerpc: Optimize hugepage invalidate")
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
popcornmix pushed a commit that referenced this issue Oct 28, 2016
commit 05af40e upstream.

This commit fixes a stack corruption in the pseries specific code dealing
with the huge pages.

In __pSeries_lpar_hugepage_invalidate() the buffer used to pass arguments
to the hypervisor is not large enough. This leads to a stack corruption
where a previously saved register could be corrupted leading to unexpected
result in the caller, like the following panic:

  Oops: Kernel access of bad area, sig: 11 [#1]
  SMP NR_CPUS=2048 NUMA pSeries
  Modules linked in: virtio_balloon ip_tables x_tables autofs4
  virtio_blk 8139too virtio_pci virtio_ring 8139cp virtio
  CPU: 11 PID: 1916 Comm: mmstress Not tainted 4.8.0 #76
  task: c000000005394880 task.stack: c000000005570000
  NIP: c00000000027bf6c LR: c00000000027bf64 CTR: 0000000000000000
  REGS: c000000005573820 TRAP: 0300   Not tainted  (4.8.0)
  MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 84822884  XER: 20000000
  CFAR: c00000000010a924 DAR: 420000000014e5e0 DSISR: 40000000 SOFTE: 1
  GPR00: c00000000027bf64 c000000005573aa0 c000000000e02800 c000000004447964
  GPR04: c00000000404de18 c000000004d38810 00000000042100f5 00000000f5002104
  GPR08: e0000000f5002104 0000000000000001 042100f5000000e0 00000000042100f5
  GPR12: 0000000000002200 c00000000fe02c00 c00000000404de18 0000000000000000
  GPR16: c1ffffffffffe7ff 00003fff62000000 420000000014e5e0 00003fff63000000
  GPR20: 0008000000000000 c0000000f7014800 0405e600000000e0 0000000000010000
  GPR24: c000000004d38810 c000000004447c10 c00000000404de18 c000000004447964
  GPR28: c000000005573b10 c000000004d38810 00003fff62000000 420000000014e5e0
  NIP [c00000000027bf6c] zap_huge_pmd+0x4c/0x470
  LR [c00000000027bf64] zap_huge_pmd+0x44/0x470
  Call Trace:
  [c000000005573aa0] [c00000000027bf64] zap_huge_pmd+0x44/0x470 (unreliable)
  [c000000005573af0] [c00000000022bbd8] unmap_page_range+0xcf8/0xed0
  [c000000005573c30] [c00000000022c2d4] unmap_vmas+0x84/0x120
  [c000000005573c80] [c000000000235448] unmap_region+0xd8/0x1b0
  [c000000005573d80] [c0000000002378f0] do_munmap+0x2d0/0x4c0
  [c000000005573df0] [c000000000237be4] SyS_munmap+0x64/0xb0
  [c000000005573e30] [c000000000009560] system_call+0x38/0x108
  Instruction dump:
  fbe1fff8 fb81ffe0 7c7f1b78 7ca32b78 7cbd2b78 f8010010 7c9a2378 f821ffb1
  7cde3378 4bfffea9 7c7b1b79 41820298 <e87f0000> 48000130 7fa5eb78 7fc4f378

Most of the time, the bug is surfacing in a caller up in the stack from
__pSeries_lpar_hugepage_invalidate() which is quite confusing.

This bug is pending since v3.11 but was hidden if a caller of the
caller of __pSeries_lpar_hugepage_invalidate() has pushed the corruped
register (r18 in this case) in the stack and is not using it until
restoring it. GCC 6.2.0 seems to raise it more frequently.

This commit also change the definition of the parameter buffer in
pSeries_lpar_flush_hash_range() to rely on the global define
PLPAR_HCALL9_BUFSIZE (no functional change here).

Fixes: 1a52728 ("powerpc: Optimize hugepage invalidate")
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ED6E0F17 pushed a commit to ED6E0F17/linux that referenced this issue Jan 18, 2017
… HIDs

While disabling ConfigFS Android gadget, android_disconnect() calls
kill_all_hid_devices(), if CONFIG_USB_CONFIGFS_F_ACC is enabled, to free
the registered HIDs without checking whether the USB accessory device
really exist or not. If USB accessory device doesn't exist then we run into
following kernel panic:
----8<----
[  136.724761] Unable to handle kernel NULL pointer dereference at virtual address 00000064
[  136.724809] pgd = c0204000
[  136.731924] [00000064] *pgd=00000000
[  136.737830] Internal error: Oops: 5 [raspberrypi#1] SMP ARM
[  136.738108] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.18.0-rc4-00400-gf75300e-dirty raspberrypi#76
[  136.742788] task: c0fb19d8 ti: c0fa4000 task.ti: c0fa4000
[  136.750890] PC is at _raw_spin_lock_irqsave+0x24/0x60
[  136.756246] LR is at kill_all_hid_devices+0x24/0x114
---->8----

This patch adds a test to check if USB Accessory device exists before freeing HIDs.

Change-Id: Ie229feaf0de3f4f7a151fcaa9a994e34e15ff73b
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
ED6E0F17 pushed a commit to ED6E0F17/linux that referenced this issue Jan 21, 2017
… HIDs

While disabling ConfigFS Android gadget, android_disconnect() calls
kill_all_hid_devices(), if CONFIG_USB_CONFIGFS_F_ACC is enabled, to free
the registered HIDs without checking whether the USB accessory device
really exist or not. If USB accessory device doesn't exist then we run into
following kernel panic:
----8<----
[  136.724761] Unable to handle kernel NULL pointer dereference at virtual address 00000064
[  136.724809] pgd = c0204000
[  136.731924] [00000064] *pgd=00000000
[  136.737830] Internal error: Oops: 5 [raspberrypi#1] SMP ARM
[  136.738108] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.18.0-rc4-00400-gf75300e-dirty raspberrypi#76
[  136.742788] task: c0fb19d8 ti: c0fa4000 task.ti: c0fa4000
[  136.750890] PC is at _raw_spin_lock_irqsave+0x24/0x60
[  136.756246] LR is at kill_all_hid_devices+0x24/0x114
---->8----

This patch adds a test to check if USB Accessory device exists before freeing HIDs.

Change-Id: Ie229feaf0de3f4f7a151fcaa9a994e34e15ff73b
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
ED6E0F17 pushed a commit to ED6E0F17/linux that referenced this issue Feb 1, 2017
… HIDs

While disabling ConfigFS Android gadget, android_disconnect() calls
kill_all_hid_devices(), if CONFIG_USB_CONFIGFS_F_ACC is enabled, to free
the registered HIDs without checking whether the USB accessory device
really exist or not. If USB accessory device doesn't exist then we run into
following kernel panic:
----8<----
[  136.724761] Unable to handle kernel NULL pointer dereference at virtual address 00000064
[  136.724809] pgd = c0204000
[  136.731924] [00000064] *pgd=00000000
[  136.737830] Internal error: Oops: 5 [raspberrypi#1] SMP ARM
[  136.738108] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.18.0-rc4-00400-gf75300e-dirty raspberrypi#76
[  136.742788] task: c0fb19d8 ti: c0fa4000 task.ti: c0fa4000
[  136.750890] PC is at _raw_spin_lock_irqsave+0x24/0x60
[  136.756246] LR is at kill_all_hid_devices+0x24/0x114
---->8----

This patch adds a test to check if USB Accessory device exists before freeing HIDs.

Change-Id: Ie229feaf0de3f4f7a151fcaa9a994e34e15ff73b
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
ED6E0F17 pushed a commit to ED6E0F17/linux that referenced this issue Apr 11, 2017
… HIDs

While disabling ConfigFS Android gadget, android_disconnect() calls
kill_all_hid_devices(), if CONFIG_USB_CONFIGFS_F_ACC is enabled, to free
the registered HIDs without checking whether the USB accessory device
really exist or not. If USB accessory device doesn't exist then we run into
following kernel panic:
----8<----
[  136.724761] Unable to handle kernel NULL pointer dereference at virtual address 00000064
[  136.724809] pgd = c0204000
[  136.731924] [00000064] *pgd=00000000
[  136.737830] Internal error: Oops: 5 [raspberrypi#1] SMP ARM
[  136.738108] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.18.0-rc4-00400-gf75300e-dirty raspberrypi#76
[  136.742788] task: c0fb19d8 ti: c0fa4000 task.ti: c0fa4000
[  136.750890] PC is at _raw_spin_lock_irqsave+0x24/0x60
[  136.756246] LR is at kill_all_hid_devices+0x24/0x114
---->8----

This patch adds a test to check if USB Accessory device exists before freeing HIDs.

Change-Id: Ie229feaf0de3f4f7a151fcaa9a994e34e15ff73b
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
ED6E0F17 pushed a commit to ED6E0F17/linux that referenced this issue Apr 11, 2017
… HIDs

While disabling ConfigFS Android gadget, android_disconnect() calls
kill_all_hid_devices(), if CONFIG_USB_CONFIGFS_F_ACC is enabled, to free
the registered HIDs without checking whether the USB accessory device
really exist or not. If USB accessory device doesn't exist then we run into
following kernel panic:
----8<----
[  136.724761] Unable to handle kernel NULL pointer dereference at virtual address 00000064
[  136.724809] pgd = c0204000
[  136.731924] [00000064] *pgd=00000000
[  136.737830] Internal error: Oops: 5 [raspberrypi#1] SMP ARM
[  136.738108] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.18.0-rc4-00400-gf75300e-dirty raspberrypi#76
[  136.742788] task: c0fb19d8 ti: c0fa4000 task.ti: c0fa4000
[  136.750890] PC is at _raw_spin_lock_irqsave+0x24/0x60
[  136.756246] LR is at kill_all_hid_devices+0x24/0x114
---->8----

This patch adds a test to check if USB Accessory device exists before freeing HIDs.

Change-Id: Ie229feaf0de3f4f7a151fcaa9a994e34e15ff73b
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
ED6E0F17 pushed a commit to ED6E0F17/linux that referenced this issue Apr 12, 2017
… HIDs

While disabling ConfigFS Android gadget, android_disconnect() calls
kill_all_hid_devices(), if CONFIG_USB_CONFIGFS_F_ACC is enabled, to free
the registered HIDs without checking whether the USB accessory device
really exist or not. If USB accessory device doesn't exist then we run into
following kernel panic:
----8<----
[  136.724761] Unable to handle kernel NULL pointer dereference at virtual address 00000064
[  136.724809] pgd = c0204000
[  136.731924] [00000064] *pgd=00000000
[  136.737830] Internal error: Oops: 5 [raspberrypi#1] SMP ARM
[  136.738108] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.18.0-rc4-00400-gf75300e-dirty raspberrypi#76
[  136.742788] task: c0fb19d8 ti: c0fa4000 task.ti: c0fa4000
[  136.750890] PC is at _raw_spin_lock_irqsave+0x24/0x60
[  136.756246] LR is at kill_all_hid_devices+0x24/0x114
---->8----

This patch adds a test to check if USB Accessory device exists before freeing HIDs.

Change-Id: Ie229feaf0de3f4f7a151fcaa9a994e34e15ff73b
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
ED6E0F17 pushed a commit to ED6E0F17/linux that referenced this issue Apr 15, 2017
… HIDs

While disabling ConfigFS Android gadget, android_disconnect() calls
kill_all_hid_devices(), if CONFIG_USB_CONFIGFS_F_ACC is enabled, to free
the registered HIDs without checking whether the USB accessory device
really exist or not. If USB accessory device doesn't exist then we run into
following kernel panic:
----8<----
[  136.724761] Unable to handle kernel NULL pointer dereference at virtual address 00000064
[  136.724809] pgd = c0204000
[  136.731924] [00000064] *pgd=00000000
[  136.737830] Internal error: Oops: 5 [raspberrypi#1] SMP ARM
[  136.738108] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.18.0-rc4-00400-gf75300e-dirty raspberrypi#76
[  136.742788] task: c0fb19d8 ti: c0fa4000 task.ti: c0fa4000
[  136.750890] PC is at _raw_spin_lock_irqsave+0x24/0x60
[  136.756246] LR is at kill_all_hid_devices+0x24/0x114
---->8----

This patch adds a test to check if USB Accessory device exists before freeing HIDs.

Change-Id: Ie229feaf0de3f4f7a151fcaa9a994e34e15ff73b
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
ED6E0F17 pushed a commit to ED6E0F17/linux that referenced this issue Apr 15, 2017
… HIDs

While disabling ConfigFS Android gadget, android_disconnect() calls
kill_all_hid_devices(), if CONFIG_USB_CONFIGFS_F_ACC is enabled, to free
the registered HIDs without checking whether the USB accessory device
really exist or not. If USB accessory device doesn't exist then we run into
following kernel panic:
----8<----
[  136.724761] Unable to handle kernel NULL pointer dereference at virtual address 00000064
[  136.724809] pgd = c0204000
[  136.731924] [00000064] *pgd=00000000
[  136.737830] Internal error: Oops: 5 [raspberrypi#1] SMP ARM
[  136.738108] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.18.0-rc4-00400-gf75300e-dirty raspberrypi#76
[  136.742788] task: c0fb19d8 ti: c0fa4000 task.ti: c0fa4000
[  136.750890] PC is at _raw_spin_lock_irqsave+0x24/0x60
[  136.756246] LR is at kill_all_hid_devices+0x24/0x114
---->8----

This patch adds a test to check if USB Accessory device exists before freeing HIDs.

Change-Id: Ie229feaf0de3f4f7a151fcaa9a994e34e15ff73b
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
ED6E0F17 pushed a commit to ED6E0F17/linux that referenced this issue Apr 18, 2017
… HIDs

While disabling ConfigFS Android gadget, android_disconnect() calls
kill_all_hid_devices(), if CONFIG_USB_CONFIGFS_F_ACC is enabled, to free
the registered HIDs without checking whether the USB accessory device
really exist or not. If USB accessory device doesn't exist then we run into
following kernel panic:
----8<----
[  136.724761] Unable to handle kernel NULL pointer dereference at virtual address 00000064
[  136.724809] pgd = c0204000
[  136.731924] [00000064] *pgd=00000000
[  136.737830] Internal error: Oops: 5 [raspberrypi#1] SMP ARM
[  136.738108] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.18.0-rc4-00400-gf75300e-dirty raspberrypi#76
[  136.742788] task: c0fb19d8 ti: c0fa4000 task.ti: c0fa4000
[  136.750890] PC is at _raw_spin_lock_irqsave+0x24/0x60
[  136.756246] LR is at kill_all_hid_devices+0x24/0x114
---->8----

This patch adds a test to check if USB Accessory device exists before freeing HIDs.

Change-Id: Ie229feaf0de3f4f7a151fcaa9a994e34e15ff73b
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
ED6E0F17 pushed a commit to ED6E0F17/linux that referenced this issue Apr 24, 2017
… HIDs

While disabling ConfigFS Android gadget, android_disconnect() calls
kill_all_hid_devices(), if CONFIG_USB_CONFIGFS_F_ACC is enabled, to free
the registered HIDs without checking whether the USB accessory device
really exist or not. If USB accessory device doesn't exist then we run into
following kernel panic:
----8<----
[  136.724761] Unable to handle kernel NULL pointer dereference at virtual address 00000064
[  136.724809] pgd = c0204000
[  136.731924] [00000064] *pgd=00000000
[  136.737830] Internal error: Oops: 5 [raspberrypi#1] SMP ARM
[  136.738108] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.18.0-rc4-00400-gf75300e-dirty raspberrypi#76
[  136.742788] task: c0fb19d8 ti: c0fa4000 task.ti: c0fa4000
[  136.750890] PC is at _raw_spin_lock_irqsave+0x24/0x60
[  136.756246] LR is at kill_all_hid_devices+0x24/0x114
---->8----

This patch adds a test to check if USB Accessory device exists before freeing HIDs.

Change-Id: Ie229feaf0de3f4f7a151fcaa9a994e34e15ff73b
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
ED6E0F17 pushed a commit to ED6E0F17/linux that referenced this issue May 1, 2017
… HIDs

While disabling ConfigFS Android gadget, android_disconnect() calls
kill_all_hid_devices(), if CONFIG_USB_CONFIGFS_F_ACC is enabled, to free
the registered HIDs without checking whether the USB accessory device
really exist or not. If USB accessory device doesn't exist then we run into
following kernel panic:
----8<----
[  136.724761] Unable to handle kernel NULL pointer dereference at virtual address 00000064
[  136.724809] pgd = c0204000
[  136.731924] [00000064] *pgd=00000000
[  136.737830] Internal error: Oops: 5 [raspberrypi#1] SMP ARM
[  136.738108] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.18.0-rc4-00400-gf75300e-dirty raspberrypi#76
[  136.742788] task: c0fb19d8 ti: c0fa4000 task.ti: c0fa4000
[  136.750890] PC is at _raw_spin_lock_irqsave+0x24/0x60
[  136.756246] LR is at kill_all_hid_devices+0x24/0x114
---->8----

This patch adds a test to check if USB Accessory device exists before freeing HIDs.

Change-Id: Ie229feaf0de3f4f7a151fcaa9a994e34e15ff73b
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
ED6E0F17 pushed a commit to ED6E0F17/linux that referenced this issue May 29, 2017
… HIDs

While disabling ConfigFS Android gadget, android_disconnect() calls
kill_all_hid_devices(), if CONFIG_USB_CONFIGFS_F_ACC is enabled, to free
the registered HIDs without checking whether the USB accessory device
really exist or not. If USB accessory device doesn't exist then we run into
following kernel panic:
----8<----
[  136.724761] Unable to handle kernel NULL pointer dereference at virtual address 00000064
[  136.724809] pgd = c0204000
[  136.731924] [00000064] *pgd=00000000
[  136.737830] Internal error: Oops: 5 [raspberrypi#1] SMP ARM
[  136.738108] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.18.0-rc4-00400-gf75300e-dirty raspberrypi#76
[  136.742788] task: c0fb19d8 ti: c0fa4000 task.ti: c0fa4000
[  136.750890] PC is at _raw_spin_lock_irqsave+0x24/0x60
[  136.756246] LR is at kill_all_hid_devices+0x24/0x114
---->8----

This patch adds a test to check if USB Accessory device exists before freeing HIDs.

Change-Id: Ie229feaf0de3f4f7a151fcaa9a994e34e15ff73b
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
ED6E0F17 pushed a commit to ED6E0F17/linux that referenced this issue Mar 22, 2018
…disconnecting HIDs

While disabling ConfigFS Android gadget, android_disconnect() calls
kill_all_hid_devices(), if CONFIG_USB_CONFIGFS_F_ACC is enabled, to free
the registered HIDs without checking whether the USB accessory device
really exist or not. If USB accessory device doesn't exist then we run into
following kernel panic:
----8<----
[  136.724761] Unable to handle kernel NULL pointer dereference at virtual address 00000064
[  136.724809] pgd = c0204000
[  136.731924] [00000064] *pgd=00000000
[  136.737830] Internal error: Oops: 5 [raspberrypi#1] SMP ARM
[  136.738108] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.18.0-rc4-00400-gf75300e-dirty raspberrypi#76
[  136.742788] task: c0fb19d8 ti: c0fa4000 task.ti: c0fa4000
[  136.750890] PC is at _raw_spin_lock_irqsave+0x24/0x60
[  136.756246] LR is at kill_all_hid_devices+0x24/0x114
---->8----

This patch adds a test to check if USB Accessory device exists before freeing HIDs.

Change-Id: Ie229feaf0de3f4f7a151fcaa9a994e34e15ff73b
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
popcornmix pushed a commit that referenced this issue Dec 10, 2018
This reverts commit 3aa2177.

That commit triggered a new WARN when unloading the module (see at the
end of the commit message). When a class_dev is embedded in a structure
then that class_dev is the thing that controls the lifetime of that
structure, for that reason device managed allocations can't be used here.
See Documentation/kobject.txt.

Revert the above patch, so the struct is allocated using kzalloc and we
have a release function for it that frees the allocated memory, otherwise
it is broken.

 ------------[ cut here ]------------
 Device 'cros_ec' does not have a release() function, it is broken and must be fixed.
 WARNING: CPU: 3 PID: 3675 at drivers/base/core.c:895 device_release+0x80/0x90
 Modules linked in: btusb btrtl btintel btbcm bluetooth ...
 CPU: 3 PID: 3675 Comm: rmmod Not tainted 4.20.0-rc4 #76
 Hardware name: Google Kevin (DT)
 pstate: 40000005 (nZcv daif -PAN -UAO)
 pc : device_release+0x80/0x90
 lr : device_release+0x80/0x90
 sp : ffff00000c47bc70
 x29: ffff00000c47bc70 x28: ffff8000e86b0d40
 x27: 0000000000000000 x26: 0000000000000000
 x25: 0000000056000000 x24: 0000000000000015
 x23: ffff8000f0bbf860 x22: ffff000000d320a0
 x21: ffff8000ee93e100 x20: ffff8000ed931428
 x19: ffff8000ed931418 x18: 0000000000000020
 x17: 0000000000000000 x16: 0000000000000000
 x15: 0000000000000400 x14: 0000000000000143
 x13: 0000000000000000 x12: 0000000000000400
 x11: 0000000000000157 x10: 0000000000000960
 x9 : ffff00000c47b9b0 x8 : ffff8000e86b1700
 x7 : 0000000000000000 x6 : ffff8000f7d520b8
 x5 : ffff8000f7d520b8 x4 : 0000000000000000
 x3 : ffff8000f7d58e68 x2 : ffff8000e86b0d40
 x1 : 37d859939c964800 x0 : 0000000000000000
 Call trace:
  device_release+0x80/0x90
  kobject_put+0x74/0xe8
  device_unregister+0x20/0x30
  ec_device_remove+0x34/0x48 [cros_ec_dev]
  platform_drv_remove+0x28/0x48
  device_release_driver_internal+0x1a8/0x240
  driver_detach+0x40/0x80
  bus_remove_driver+0x54/0xa8
  driver_unregister+0x2c/0x58
  platform_driver_unregister+0x10/0x18
  cros_ec_dev_exit+0x1c/0x2d8 [cros_ec_dev]
  __arm64_sys_delete_module+0x16c/0x1f8
  el0_svc_common+0x84/0xd8
  el0_svc_handler+0x2c/0x80
  el0_svc+0x8/0xc
 ---[ end trace a57c4625f3c60ae8 ]---

Cc: stable@vger.kernel.org
Fixes: 3aa2177 ("mfd: cros_ec: Use devm_kzalloc for private data")
Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Reviewed-by: Dmitry Torokhov <dtor@chromium.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
popcornmix pushed a commit that referenced this issue Dec 14, 2018
commit 48a2ca0 upstream.

This reverts commit 3aa2177.

That commit triggered a new WARN when unloading the module (see at the
end of the commit message). When a class_dev is embedded in a structure
then that class_dev is the thing that controls the lifetime of that
structure, for that reason device managed allocations can't be used here.
See Documentation/kobject.txt.

Revert the above patch, so the struct is allocated using kzalloc and we
have a release function for it that frees the allocated memory, otherwise
it is broken.

 ------------[ cut here ]------------
 Device 'cros_ec' does not have a release() function, it is broken and must be fixed.
 WARNING: CPU: 3 PID: 3675 at drivers/base/core.c:895 device_release+0x80/0x90
 Modules linked in: btusb btrtl btintel btbcm bluetooth ...
 CPU: 3 PID: 3675 Comm: rmmod Not tainted 4.20.0-rc4 #76
 Hardware name: Google Kevin (DT)
 pstate: 40000005 (nZcv daif -PAN -UAO)
 pc : device_release+0x80/0x90
 lr : device_release+0x80/0x90
 sp : ffff00000c47bc70
 x29: ffff00000c47bc70 x28: ffff8000e86b0d40
 x27: 0000000000000000 x26: 0000000000000000
 x25: 0000000056000000 x24: 0000000000000015
 x23: ffff8000f0bbf860 x22: ffff000000d320a0
 x21: ffff8000ee93e100 x20: ffff8000ed931428
 x19: ffff8000ed931418 x18: 0000000000000020
 x17: 0000000000000000 x16: 0000000000000000
 x15: 0000000000000400 x14: 0000000000000143
 x13: 0000000000000000 x12: 0000000000000400
 x11: 0000000000000157 x10: 0000000000000960
 x9 : ffff00000c47b9b0 x8 : ffff8000e86b1700
 x7 : 0000000000000000 x6 : ffff8000f7d520b8
 x5 : ffff8000f7d520b8 x4 : 0000000000000000
 x3 : ffff8000f7d58e68 x2 : ffff8000e86b0d40
 x1 : 37d859939c964800 x0 : 0000000000000000
 Call trace:
  device_release+0x80/0x90
  kobject_put+0x74/0xe8
  device_unregister+0x20/0x30
  ec_device_remove+0x34/0x48 [cros_ec_dev]
  platform_drv_remove+0x28/0x48
  device_release_driver_internal+0x1a8/0x240
  driver_detach+0x40/0x80
  bus_remove_driver+0x54/0xa8
  driver_unregister+0x2c/0x58
  platform_driver_unregister+0x10/0x18
  cros_ec_dev_exit+0x1c/0x2d8 [cros_ec_dev]
  __arm64_sys_delete_module+0x16c/0x1f8
  el0_svc_common+0x84/0xd8
  el0_svc_handler+0x2c/0x80
  el0_svc+0x8/0xc
 ---[ end trace a57c4625f3c60ae8 ]---

Cc: stable@vger.kernel.org
Fixes: 3aa2177 ("mfd: cros_ec: Use devm_kzalloc for private data")
Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Reviewed-by: Dmitry Torokhov <dtor@chromium.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
popcornmix pushed a commit that referenced this issue Jan 23, 2019
commit 94a2c3a upstream.

We recently got a stack by syzkaller like this:

BUG: sleeping function called from invalid context at mm/slab.h:361
in_atomic(): 1, irqs_disabled(): 0, pid: 6644, name: blkid
INFO: lockdep is turned off.
CPU: 1 PID: 6644 Comm: blkid Not tainted 4.4.163-514.55.6.9.x86_64+ #76
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
 0000000000000000 5ba6a6b879e50c00 ffff8801f6b07b10 ffffffff81cb2194
 0000000041b58ab3 ffffffff833c7745 ffffffff81cb2080 5ba6a6b879e50c00
 0000000000000000 0000000000000001 0000000000000004 0000000000000000
Call Trace:
 <IRQ>  [<ffffffff81cb2194>] __dump_stack lib/dump_stack.c:15 [inline]
 <IRQ>  [<ffffffff81cb2194>] dump_stack+0x114/0x1a0 lib/dump_stack.c:51
 [<ffffffff8129a981>] ___might_sleep+0x291/0x490 kernel/sched/core.c:7675
 [<ffffffff8129ac33>] __might_sleep+0xb3/0x270 kernel/sched/core.c:7637
 [<ffffffff81794c13>] slab_pre_alloc_hook mm/slab.h:361 [inline]
 [<ffffffff81794c13>] slab_alloc_node mm/slub.c:2610 [inline]
 [<ffffffff81794c13>] slab_alloc mm/slub.c:2692 [inline]
 [<ffffffff81794c13>] kmem_cache_alloc_trace+0x2c3/0x5c0 mm/slub.c:2709
 [<ffffffff81cbe9a7>] kmalloc include/linux/slab.h:479 [inline]
 [<ffffffff81cbe9a7>] kzalloc include/linux/slab.h:623 [inline]
 [<ffffffff81cbe9a7>] kobject_uevent_env+0x2c7/0x1150 lib/kobject_uevent.c:227
 [<ffffffff81cbf84f>] kobject_uevent+0x1f/0x30 lib/kobject_uevent.c:374
 [<ffffffff81cbb5b9>] kobject_cleanup lib/kobject.c:633 [inline]
 [<ffffffff81cbb5b9>] kobject_release+0x229/0x440 lib/kobject.c:675
 [<ffffffff81cbb0a2>] kref_sub include/linux/kref.h:73 [inline]
 [<ffffffff81cbb0a2>] kref_put include/linux/kref.h:98 [inline]
 [<ffffffff81cbb0a2>] kobject_put+0x72/0xd0 lib/kobject.c:692
 [<ffffffff8216f095>] put_device+0x25/0x30 drivers/base/core.c:1237
 [<ffffffff81c4cc34>] delete_partition_rcu_cb+0x1d4/0x2f0 block/partition-generic.c:232
 [<ffffffff813c08bc>] __rcu_reclaim kernel/rcu/rcu.h:118 [inline]
 [<ffffffff813c08bc>] rcu_do_batch kernel/rcu/tree.c:2705 [inline]
 [<ffffffff813c08bc>] invoke_rcu_callbacks kernel/rcu/tree.c:2973 [inline]
 [<ffffffff813c08bc>] __rcu_process_callbacks kernel/rcu/tree.c:2940 [inline]
 [<ffffffff813c08bc>] rcu_process_callbacks+0x59c/0x1c70 kernel/rcu/tree.c:2957
 [<ffffffff8120f509>] __do_softirq+0x299/0xe20 kernel/softirq.c:273
 [<ffffffff81210496>] invoke_softirq kernel/softirq.c:350 [inline]
 [<ffffffff81210496>] irq_exit+0x216/0x2c0 kernel/softirq.c:391
 [<ffffffff82c2cd7b>] exiting_irq arch/x86/include/asm/apic.h:652 [inline]
 [<ffffffff82c2cd7b>] smp_apic_timer_interrupt+0x8b/0xc0 arch/x86/kernel/apic/apic.c:926
 [<ffffffff82c2bc25>] apic_timer_interrupt+0xa5/0xb0 arch/x86/entry/entry_64.S:746
 <EOI>  [<ffffffff814cbf40>] ? audit_kill_trees+0x180/0x180
 [<ffffffff8187d2f7>] fd_install+0x57/0x80 fs/file.c:626
 [<ffffffff8180989e>] do_sys_open+0x45e/0x550 fs/open.c:1043
 [<ffffffff818099c2>] SYSC_open fs/open.c:1055 [inline]
 [<ffffffff818099c2>] SyS_open+0x32/0x40 fs/open.c:1050
 [<ffffffff82c299e1>] entry_SYSCALL_64_fastpath+0x1e/0x9a

In softirq context, we call rcu callback function delete_partition_rcu_cb(),
which may allocate memory by kzalloc with GFP_KERNEL flag. If the
allocation cannot be satisfied, it may sleep. However, That is not allowed
in softirq contex.

Although we found this problem on linux 4.4, the latest kernel version
seems to have this problem as well. And it is very similar to the
previous one:
	https://lkml.org/lkml/2018/7/9/391

Fix it by using RCU workqueue, which allows sleep.

Reviewed-by: Paul E. McKenney <paulmck@linux.ibm.com>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
popcornmix pushed a commit that referenced this issue Jan 24, 2019
commit 94a2c3a upstream.

We recently got a stack by syzkaller like this:

BUG: sleeping function called from invalid context at mm/slab.h:361
in_atomic(): 1, irqs_disabled(): 0, pid: 6644, name: blkid
INFO: lockdep is turned off.
CPU: 1 PID: 6644 Comm: blkid Not tainted 4.4.163-514.55.6.9.x86_64+ #76
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
 0000000000000000 5ba6a6b879e50c00 ffff8801f6b07b10 ffffffff81cb2194
 0000000041b58ab3 ffffffff833c7745 ffffffff81cb2080 5ba6a6b879e50c00
 0000000000000000 0000000000000001 0000000000000004 0000000000000000
Call Trace:
 <IRQ>  [<ffffffff81cb2194>] __dump_stack lib/dump_stack.c:15 [inline]
 <IRQ>  [<ffffffff81cb2194>] dump_stack+0x114/0x1a0 lib/dump_stack.c:51
 [<ffffffff8129a981>] ___might_sleep+0x291/0x490 kernel/sched/core.c:7675
 [<ffffffff8129ac33>] __might_sleep+0xb3/0x270 kernel/sched/core.c:7637
 [<ffffffff81794c13>] slab_pre_alloc_hook mm/slab.h:361 [inline]
 [<ffffffff81794c13>] slab_alloc_node mm/slub.c:2610 [inline]
 [<ffffffff81794c13>] slab_alloc mm/slub.c:2692 [inline]
 [<ffffffff81794c13>] kmem_cache_alloc_trace+0x2c3/0x5c0 mm/slub.c:2709
 [<ffffffff81cbe9a7>] kmalloc include/linux/slab.h:479 [inline]
 [<ffffffff81cbe9a7>] kzalloc include/linux/slab.h:623 [inline]
 [<ffffffff81cbe9a7>] kobject_uevent_env+0x2c7/0x1150 lib/kobject_uevent.c:227
 [<ffffffff81cbf84f>] kobject_uevent+0x1f/0x30 lib/kobject_uevent.c:374
 [<ffffffff81cbb5b9>] kobject_cleanup lib/kobject.c:633 [inline]
 [<ffffffff81cbb5b9>] kobject_release+0x229/0x440 lib/kobject.c:675
 [<ffffffff81cbb0a2>] kref_sub include/linux/kref.h:73 [inline]
 [<ffffffff81cbb0a2>] kref_put include/linux/kref.h:98 [inline]
 [<ffffffff81cbb0a2>] kobject_put+0x72/0xd0 lib/kobject.c:692
 [<ffffffff8216f095>] put_device+0x25/0x30 drivers/base/core.c:1237
 [<ffffffff81c4cc34>] delete_partition_rcu_cb+0x1d4/0x2f0 block/partition-generic.c:232
 [<ffffffff813c08bc>] __rcu_reclaim kernel/rcu/rcu.h:118 [inline]
 [<ffffffff813c08bc>] rcu_do_batch kernel/rcu/tree.c:2705 [inline]
 [<ffffffff813c08bc>] invoke_rcu_callbacks kernel/rcu/tree.c:2973 [inline]
 [<ffffffff813c08bc>] __rcu_process_callbacks kernel/rcu/tree.c:2940 [inline]
 [<ffffffff813c08bc>] rcu_process_callbacks+0x59c/0x1c70 kernel/rcu/tree.c:2957
 [<ffffffff8120f509>] __do_softirq+0x299/0xe20 kernel/softirq.c:273
 [<ffffffff81210496>] invoke_softirq kernel/softirq.c:350 [inline]
 [<ffffffff81210496>] irq_exit+0x216/0x2c0 kernel/softirq.c:391
 [<ffffffff82c2cd7b>] exiting_irq arch/x86/include/asm/apic.h:652 [inline]
 [<ffffffff82c2cd7b>] smp_apic_timer_interrupt+0x8b/0xc0 arch/x86/kernel/apic/apic.c:926
 [<ffffffff82c2bc25>] apic_timer_interrupt+0xa5/0xb0 arch/x86/entry/entry_64.S:746
 <EOI>  [<ffffffff814cbf40>] ? audit_kill_trees+0x180/0x180
 [<ffffffff8187d2f7>] fd_install+0x57/0x80 fs/file.c:626
 [<ffffffff8180989e>] do_sys_open+0x45e/0x550 fs/open.c:1043
 [<ffffffff818099c2>] SYSC_open fs/open.c:1055 [inline]
 [<ffffffff818099c2>] SyS_open+0x32/0x40 fs/open.c:1050
 [<ffffffff82c299e1>] entry_SYSCALL_64_fastpath+0x1e/0x9a

In softirq context, we call rcu callback function delete_partition_rcu_cb(),
which may allocate memory by kzalloc with GFP_KERNEL flag. If the
allocation cannot be satisfied, it may sleep. However, That is not allowed
in softirq contex.

Although we found this problem on linux 4.4, the latest kernel version
seems to have this problem as well. And it is very similar to the
previous one:
	https://lkml.org/lkml/2018/7/9/391

Fix it by using RCU workqueue, which allows sleep.

Reviewed-by: Paul E. McKenney <paulmck@linux.ibm.com>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
popcornmix pushed a commit that referenced this issue Apr 8, 2019
[ Upstream commit 070e64d ]

Devices that make up DPU, i.e. graphics card, request their interrupts
from this "virtual" interrupt chip. The interrupt chip builds upon a GIC
SPI interrupt that raises high when any of the interrupts in the DPU's
irq status register are triggered. From the kernel's perspective this is
a chained irq chip, so requesting a flow handler for the GIC SPI and
then calling generic IRQ handling code from that irq handler is not
completely proper. It's better to convert this to a chained irq so that
the GIC SPI irq doesn't appear in /proc/interrupts, can't have CPU
affinity changed, and won't be accounted for with irq stats. Doing this
also silences a recursive lockdep warning because we can specify a
different lock class for the chained interrupts, silencing a warning
that is easy to see with 'threadirqs' on the kernel commandline.

 WARNING: inconsistent lock state
 4.19.10 #76 Tainted: G        W
 --------------------------------
 inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
 irq/40-dpu_mdss/203 [HC0[0]:SC0[2]:HE1:SE0] takes:
 0000000053ea9021 (&irq_desc_lock_class){?.-.}, at: handle_level_irq+0x34/0x26c
 {IN-HARDIRQ-W} state was registered at:
   lock_acquire+0x244/0x360
   _raw_spin_lock+0x64/0xa0
   handle_fasteoi_irq+0x54/0x2ec
   generic_handle_irq+0x44/0x5c
   __handle_domain_irq+0x9c/0x11c
   gic_handle_irq+0x208/0x260
   el1_irq+0xb4/0x130
   arch_cpu_idle+0x178/0x3cc
   default_idle_call+0x3c/0x54
   do_idle+0x1a8/0x3dc
   cpu_startup_entry+0x24/0x28
   rest_init+0x240/0x270
   start_kernel+0x5a8/0x6bc
 irq event stamp: 18
 hardirqs last  enabled at (17): [<ffffff9042385e80>] _raw_spin_unlock_irq+0x40/0xc0
 hardirqs last disabled at (16): [<ffffff904237a1f4>] __schedule+0x20c/0x1bbc
 softirqs last  enabled at (0): [<ffffff9040f318d0>] copy_process+0xb50/0x3964
 softirqs last disabled at (18): [<ffffff9041036364>] local_bh_disable+0x8/0x20

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&irq_desc_lock_class);
   <Interrupt>
     lock(&irq_desc_lock_class);

  *** DEADLOCK ***

 no locks held by irq/40-dpu_mdss/203.

 stack backtrace:
 CPU: 0 PID: 203 Comm: irq/40-dpu_mdss Tainted: G        W         4.19.10 #76
 Call trace:
  dump_backtrace+0x0/0x2f8
  show_stack+0x20/0x2c
  __dump_stack+0x20/0x28
  dump_stack+0xcc/0x10c
  mark_lock+0xbe0/0xe24
  __lock_acquire+0x4cc/0x2708
  lock_acquire+0x244/0x360
  _raw_spin_lock+0x64/0xa0
  handle_level_irq+0x34/0x26c
  generic_handle_irq+0x44/0x5c
  dpu_mdss_irq+0x64/0xec
  irq_forced_thread_fn+0x58/0x9c
  irq_thread+0x120/0x1dc
  kthread+0x248/0x260
  ret_from_fork+0x10/0x18
 ------------[ cut here ]------------
 irq 169 handler irq_default_primary_handler+0x0/0x18 enabled interrupts

Cc: Sean Paul <seanpaul@chromium.org>
Cc: Jordan Crouse <jcrouse@codeaurora.org>
Cc: Jayant Shekhar <jshekhar@codeaurora.org>
Cc: Rajesh Yadav <ryadav@codeaurora.org>
Cc: Jeykumar Sankaran <jsanka@codeaurora.org>
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
popcornmix pushed a commit that referenced this issue Apr 23, 2019
[ Upstream commit bd18bff ]

A VMEnter that VMFails (as opposed to VMExits) does not touch host
state beyond registers that are explicitly noted in the VMFail path,
e.g. EFLAGS.  Host state does not need to be loaded because VMFail
is only signaled for consistency checks that occur before the CPU
starts to load guest state, i.e. there is no need to restore any
state as nothing has been modified.  But in the case where a VMFail
is detected by hardware and not by KVM (due to deferring consistency
checks to hardware), KVM has already loaded some amount of guest
state.  Luckily, "loaded" only means loaded to KVM's software model,
i.e. vmcs01 has not been modified.  So, unwind our software model to
the pre-VMEntry host state.

Not restoring host state in this VMFail path leads to a variety of
failures because we end up with stale data in vcpu->arch, e.g. CR0,
CR4, EFER, etc... will all be out of sync relative to vmcs01.  Any
significant delta in the stale data is all but guaranteed to crash
L1, e.g. emulation of SMEP, SMAP, UMIP, WP, etc... will be wrong.

An alternative to this "soft" reload would be to load host state from
vmcs12 as if we triggered a VMExit (as opposed to VMFail), but that is
wildly inconsistent with respect to the VMX architecture, e.g. an L1
VMM with separate VMExit and VMFail paths would explode.

Note that this approach does not mean KVM is 100% accurate with
respect to VMX hardware behavior, even at an architectural level
(the exact order of consistency checks is microarchitecture specific).
But 100% emulation accuracy isn't the goal (with this patch), rather
the goal is to be consistent in the information delivered to L1, e.g.
a VMExit should not fall-through VMENTER, and a VMFail should not jump
to HOST_RIP.

This technically reverts commit "5af4157388ad (KVM: nVMX: Fix mmu
context after VMLAUNCH/VMRESUME failure)", but retains the core
aspects of that patch, just in an open coded form due to the need to
pull state from vmcs01 instead of vmcs12.  Restoring host state
resolves a variety of issues introduced by commit "4f350c6dbcb9
(kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure properly)",
which remedied the incorrect behavior of treating VMFail like VMExit
but in doing so neglected to restore arch state that had been modified
prior to attempting nested VMEnter.

A sample failure that occurs due to stale vcpu.arch state is a fault
of some form while emulating an LGDT (due to emulated UMIP) from L1
after a failed VMEntry to L3, in this case when running the KVM unit
test test_tpr_threshold_values in L1.  L0 also hits a WARN in this
case due to a stale arch.cr4.UMIP.

L1:
  BUG: unable to handle kernel paging request at ffffc90000663b9e
  PGD 276512067 P4D 276512067 PUD 276513067 PMD 274efa067 PTE 8000000271de2163
  Oops: 0009 [#1] SMP
  CPU: 5 PID: 12495 Comm: qemu-system-x86 Tainted: G        W         4.18.0-rc2+ #2
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  RIP: 0010:native_load_gdt+0x0/0x10

  ...

  Call Trace:
   load_fixmap_gdt+0x22/0x30
   __vmx_load_host_state+0x10e/0x1c0 [kvm_intel]
   vmx_switch_vmcs+0x2d/0x50 [kvm_intel]
   nested_vmx_vmexit+0x222/0x9c0 [kvm_intel]
   vmx_handle_exit+0x246/0x15a0 [kvm_intel]
   kvm_arch_vcpu_ioctl_run+0x850/0x1830 [kvm]
   kvm_vcpu_ioctl+0x3a1/0x5c0 [kvm]
   do_vfs_ioctl+0x9f/0x600
   ksys_ioctl+0x66/0x70
   __x64_sys_ioctl+0x16/0x20
   do_syscall_64+0x4f/0x100
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

L0:
  WARNING: CPU: 2 PID: 3529 at arch/x86/kvm/vmx.c:6618 handle_desc+0x28/0x30 [kvm_intel]
  ...
  CPU: 2 PID: 3529 Comm: qemu-system-x86 Not tainted 4.17.2-coffee+ #76
  Hardware name: Intel Corporation Kabylake Client platform/KBL S
  RIP: 0010:handle_desc+0x28/0x30 [kvm_intel]

  ...

  Call Trace:
   kvm_arch_vcpu_ioctl_run+0x863/0x1840 [kvm]
   kvm_vcpu_ioctl+0x3a1/0x5c0 [kvm]
   do_vfs_ioctl+0x9f/0x5e0
   ksys_ioctl+0x66/0x70
   __x64_sys_ioctl+0x16/0x20
   do_syscall_64+0x49/0xf0
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 5af4157 (KVM: nVMX: Fix mmu context after VMLAUNCH/VMRESUME failure)
Fixes: 4f350c6 (kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure properly)
Cc: Jim Mattson <jmattson@google.com>
Cc: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim KrÄmář <rkrcmar@redhat.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
popcornmix pushed a commit that referenced this issue Jan 14, 2020
Following scenario easily break driver logic and crash the kernel:
1. Add rule with mirred actions to same device.
2. Delete this rule.
In described scenario rule is not added to database and on deletion
driver access invalid entry.
Example:

 $ tc filter add dev ens1f0_0 ingress protocol ip prio 1 \
       flower skip_sw \
       action mirred egress mirror dev ens1f0_1 pipe \
       action mirred egress redirect dev ens1f0_1
 $ tc filter del dev ens1f0_0 ingress protocol ip prio 1

Dmesg output:

[  376.634396] mlx5_core 0000:82:00.0: mlx5_cmd_check:756:(pid 3439): DESTROY_FLOW_GROUP(0x934) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0x563e2f)
[  376.654983] mlx5_core 0000:82:00.0: del_hw_flow_group:567:(pid 3439): flow steering can't destroy fg 89 of ft 3145728
[  376.673433] kasan: CONFIG_KASAN_INLINE enabled
[  376.683769] kasan: GPF could be caused by NULL-ptr deref or user memory access
[  376.695229] general protection fault: 0000 [#1] PREEMPT SMP KASAN PTI
[  376.705069] CPU: 7 PID: 3439 Comm: tc Not tainted 5.4.0-rc5+ #76
[  376.714959] Hardware name: Supermicro SYS-2028TP-DECTR/X10DRT-PT, BIOS 2.0a 08/12/2016
[  376.726371] RIP: 0010:mlx5_del_flow_rules+0x105/0x960 [mlx5_core]
[  376.735817] Code: 01 00 00 00 48 83 eb 08 e8 28 d9 ff ff 4c 39 e3 75 d8 4c 8d bd c0 02 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 03 0f 8e 84 04 00 00 48 8d 7d 28 8b 9 d
[  376.761261] RSP: 0018:ffff888847c56db8 EFLAGS: 00010202
[  376.770054] RAX: dffffc0000000000 RBX: ffff8888582a6da0 RCX: ffff888847c56d60
[  376.780743] RDX: 0000000000000058 RSI: 0000000000000008 RDI: 0000000000000282
[  376.791328] RBP: 0000000000000000 R08: fffffbfff0c60ea6 R09: fffffbfff0c60ea6
[  376.802050] R10: fffffbfff0c60ea5 R11: ffffffff8630752f R12: ffff8888582a6da0
[  376.812798] R13: dffffc0000000000 R14: ffff8888582a6da0 R15: 00000000000002c0
[  376.823445] FS:  00007f675f9a8840(0000) GS:ffff88886d200000(0000) knlGS:0000000000000000
[  376.834971] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  376.844179] CR2: 00000000007d9640 CR3: 00000007d3f26003 CR4: 00000000001606e0
[  376.854843] Call Trace:
[  376.868542]  __mlx5_eswitch_del_rule+0x49/0x300 [mlx5_core]
[  376.877735]  mlx5e_tc_del_fdb_flow+0x6ec/0x9e0 [mlx5_core]
[  376.921549]  mlx5e_flow_put+0x2b/0x50 [mlx5_core]
[  376.929813]  mlx5e_delete_flower+0x5b6/0xbd0 [mlx5_core]
[  376.973030]  tc_setup_cb_reoffload+0x29/0xc0
[  376.980619]  fl_reoffload+0x50a/0x770 [cls_flower]
[  377.015087]  tcf_block_playback_offloads+0xbd/0x250
[  377.033400]  tcf_block_setup+0x1b2/0xc60
[  377.057247]  tcf_block_offload_cmd+0x195/0x240
[  377.098826]  tcf_block_offload_unbind+0xe7/0x180
[  377.107056]  __tcf_block_put+0xe5/0x400
[  377.114528]  ingress_destroy+0x3d/0x60 [sch_ingress]
[  377.122894]  qdisc_destroy+0xf1/0x5a0
[  377.129993]  qdisc_graft+0xa3d/0xe50
[  377.151227]  tc_get_qdisc+0x48e/0xa20
[  377.165167]  rtnetlink_rcv_msg+0x35d/0x8d0
[  377.199528]  netlink_rcv_skb+0x11e/0x340
[  377.219638]  netlink_unicast+0x408/0x5b0
[  377.239913]  netlink_sendmsg+0x71b/0xb30
[  377.267505]  sock_sendmsg+0xb1/0xf0
[  377.273801]  ___sys_sendmsg+0x635/0x900
[  377.312784]  __sys_sendmsg+0xd3/0x170
[  377.338693]  do_syscall_64+0x95/0x460
[  377.344833]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  377.352321] RIP: 0033:0x7f675e58e090

To avoid this, for every mirred action check if output device was
already processed. If so - drop rule with EOPNOTSUPP error.

Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
popcornmix pushed a commit that referenced this issue Nov 22, 2021
commit ab00f3e upstream.

In the case of MDIO bus registration failure due to no external PHY
devices is connected to the MAC, clk_disable_unprepare() is called in
stmmac_bus_clk_config() and intel_eth_pci_probe() respectively.

The second call in intel_eth_pci_probe() will caused the following:-

[   16.578605] intel-eth-pci 0000:00:1e.5: No PHY found
[   16.583778] intel-eth-pci 0000:00:1e.5: stmmac_dvr_probe: MDIO bus (id: 2) registration failed
[   16.680181] ------------[ cut here ]------------
[   16.684861] stmmac-0000:00:1e.5 already disabled
[   16.689547] WARNING: CPU: 13 PID: 2053 at drivers/clk/clk.c:952 clk_core_disable+0x96/0x1b0
[   16.697963] Modules linked in: dwc3 iTCO_wdt mei_hdcp iTCO_vendor_support udc_core x86_pkg_temp_thermal kvm_intel marvell10g kvm sch_fq_codel nfsd irqbypass dwmac_intel(+) stmmac uio ax88179_178a pcs_xpcs phylink uhid spi_pxa2xx_platform usbnet mei_me pcspkr tpm_crb mii i2c_i801 dw_dmac dwc3_pci thermal dw_dmac_core intel_rapl_msr libphy i2c_smbus mei tpm_tis intel_th_gth tpm_tis_core tpm intel_th_acpi intel_pmc_core intel_th i915 fuse configfs snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_pcm snd_timer snd soundcore
[   16.746785] CPU: 13 PID: 2053 Comm: systemd-udevd Tainted: G     U            5.13.0-rc3-intel-lts #76
[   16.756134] Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-S ADP-S DRR4 CRB, BIOS ADLIFSI1.R00.1494.B00.2012031421 12/03/2020
[   16.769465] RIP: 0010:clk_core_disable+0x96/0x1b0
[   16.774222] Code: 00 8b 05 45 96 17 01 85 c0 7f 24 48 8b 5b 30 48 85 db 74 a5 8b 43 7c 85 c0 75 93 48 8b 33 48 c7 c7 6e 32 cc b7 e8 b2 5d 52 00 <0f> 0b 5b 5d c3 65 8b 05 76 31 18 49 89 c0 48 0f a3 05 bc 92 1a 01
[   16.793016] RSP: 0018:ffffa44580523aa0 EFLAGS: 00010086
[   16.798287] RAX: 0000000000000000 RBX: ffff8d7d0eb70a00 RCX: 0000000000000000
[   16.805435] RDX: 0000000000000002 RSI: ffffffffb7c62d5f RDI: 00000000ffffffff
[   16.812610] RBP: 0000000000000287 R08: 0000000000000000 R09: ffffa445805238d0
[   16.819759] R10: 0000000000000001 R11: 0000000000000001 R12: ffff8d7d0eb70a00
[   16.826904] R13: ffff8d7d027370c8 R14: 0000000000000006 R15: ffffa44580523ad0
[   16.834047] FS:  00007f9882fa2600(0000) GS:ffff8d80a0940000(0000) knlGS:0000000000000000
[   16.842177] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   16.847966] CR2: 00007f9882bea3d8 CR3: 000000010b126001 CR4: 0000000000370ee0
[   16.855144] Call Trace:
[   16.857614]  clk_core_disable_lock+0x1b/0x30
[   16.861941]  intel_eth_pci_probe.cold+0x11d/0x136 [dwmac_intel]
[   16.867913]  pci_device_probe+0xcf/0x150
[   16.871890]  really_probe+0xf5/0x3e0
[   16.875526]  driver_probe_device+0x64/0x150
[   16.879763]  device_driver_attach+0x53/0x60
[   16.883998]  __driver_attach+0x9f/0x150
[   16.887883]  ? device_driver_attach+0x60/0x60
[   16.892288]  ? device_driver_attach+0x60/0x60
[   16.896698]  bus_for_each_dev+0x77/0xc0
[   16.900583]  bus_add_driver+0x184/0x1f0
[   16.904469]  driver_register+0x6c/0xc0
[   16.908268]  ? 0xffffffffc07ae000
[   16.911598]  do_one_initcall+0x4a/0x210
[   16.915489]  ? kmem_cache_alloc_trace+0x305/0x4e0
[   16.920247]  do_init_module+0x5c/0x230
[   16.924057]  load_module+0x2894/0x2b70
[   16.927857]  ? __do_sys_finit_module+0xb5/0x120
[   16.932441]  __do_sys_finit_module+0xb5/0x120
[   16.936845]  do_syscall_64+0x42/0x80
[   16.940476]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   16.945586] RIP: 0033:0x7f98830e5ccd
[   16.949177] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 93 31 0c 00 f7 d8 64 89 01 48
[   16.967970] RSP: 002b:00007ffc66b60168 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[   16.975583] RAX: ffffffffffffffda RBX: 000055885de35ef0 RCX: 00007f98830e5ccd
[   16.982725] RDX: 0000000000000000 RSI: 00007f98832541e3 RDI: 0000000000000012
[   16.989868] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000000
[   16.997042] R10: 0000000000000012 R11: 0000000000000246 R12: 00007f98832541e3
[   17.004222] R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffc66b60328
[   17.011369] ---[ end trace df06a3dab26b988c ]---
[   17.016062] ------------[ cut here ]------------
[   17.020701] stmmac-0000:00:1e.5 already unprepared

Removing the stmmac_bus_clks_config() call in stmmac_dvr_probe and let
dwmac-intel to handle the unprepare and disable of the clk device.

Fixes: 5ec5582 ("net: stmmac: add clocks management for gmac driver")
Cc: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
Reviewed-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Meng Li <Meng.Li@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
0lxb pushed a commit to 0lxb/rpi_linux that referenced this issue Jan 30, 2024
popcornmix pushed a commit that referenced this issue Apr 23, 2024
On arm64 machines, swsusp_save() faults if it attempts to access
MEMBLOCK_NOMAP memory ranges. This can be reproduced in QEMU using UEFI
when booting with rodata=off debug_pagealloc=off and CONFIG_KFENCE=n:

  Unable to handle kernel paging request at virtual address ffffff8000000000
  Mem abort info:
    ESR = 0x0000000096000007
    EC = 0x25: DABT (current EL), IL = 32 bits
    SET = 0, FnV = 0
    EA = 0, S1PTW = 0
    FSC = 0x07: level 3 translation fault
  Data abort info:
    ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000
    CM = 0, WnR = 0, TnD = 0, TagAccess = 0
    GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
  swapper pgtable: 4k pages, 39-bit VAs, pgdp=00000000eeb0b000
  [ffffff8000000000] pgd=180000217fff9803, p4d=180000217fff9803, pud=180000217fff9803, pmd=180000217fff8803, pte=0000000000000000
  Internal error: Oops: 0000000096000007 [#1] SMP
  Internal error: Oops: 0000000096000007 [#1] SMP
  Modules linked in: xt_multiport ipt_REJECT nf_reject_ipv4 xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_filter bpfilter rfkill at803x snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg dwmac_generic stmmac_platform snd_hda_codec stmmac joydev pcs_xpcs snd_hda_core phylink ppdev lp parport ramoops reed_solomon ip_tables x_tables nls_iso8859_1 vfat multipath linear amdgpu amdxcp drm_exec gpu_sched drm_buddy hid_generic usbhid hid radeon video drm_suballoc_helper drm_ttm_helper ttm i2c_algo_bit drm_display_helper cec drm_kms_helper drm
  CPU: 0 PID: 3663 Comm: systemd-sleep Not tainted 6.6.2+ #76
  Source Version: 4e22ed63a0a48e7a7cff9b98b7806d8d4add7dc0
  Hardware name: Greatwall GW-XXXXXX-XXX/GW-XXXXXX-XXX, BIOS KunLun BIOS V4.0 01/19/2021
  pstate: 600003c5 (nZCv DAIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  pc : swsusp_save+0x280/0x538
  lr : swsusp_save+0x280/0x538
  sp : ffffffa034a3fa40
  x29: ffffffa034a3fa40 x28: ffffff8000001000 x27: 0000000000000000
  x26: ffffff8001400000 x25: ffffffc08113e248 x24: 0000000000000000
  x23: 0000000000080000 x22: ffffffc08113e280 x21: 00000000000c69f2
  x20: ffffff8000000000 x19: ffffffc081ae2500 x18: 0000000000000000
  x17: 6666662074736420 x16: 3030303030303030 x15: 3038666666666666
  x14: 0000000000000b69 x13: ffffff9f89088530 x12: 00000000ffffffea
  x11: 00000000ffff7fff x10: 00000000ffff7fff x9 : ffffffc08193f0d0
  x8 : 00000000000bffe8 x7 : c0000000ffff7fff x6 : 0000000000000001
  x5 : ffffffa0fff09dc8 x4 : 0000000000000000 x3 : 0000000000000027
  x2 : 0000000000000000 x1 : 0000000000000000 x0 : 000000000000004e
  Call trace:
   swsusp_save+0x280/0x538
   swsusp_arch_suspend+0x148/0x190
   hibernation_snapshot+0x240/0x39c
   hibernate+0xc4/0x378
   state_store+0xf0/0x10c
   kobj_attr_store+0x14/0x24

The reason is swsusp_save() -> copy_data_pages() -> page_is_saveable()
-> kernel_page_present() assuming that a page is always present when
can_set_direct_map() is false (all of rodata_full,
debug_pagealloc_enabled() and arm64_kfence_can_set_direct_map() false),
irrespective of the MEMBLOCK_NOMAP ranges. Such MEMBLOCK_NOMAP regions
should not be saved during hibernation.

This problem was introduced by changes to the pfn_valid() logic in
commit a7d9f30 ("arm64: drop pfn_valid_within() and simplify
pfn_valid()").

Similar to other architectures, drop the !can_set_direct_map() check in
kernel_page_present() so that page_is_savable() skips such pages.

Fixes: a7d9f30 ("arm64: drop pfn_valid_within() and simplify pfn_valid()")
Cc: <stable@vger.kernel.org> # 5.14.x
Suggested-by: Mike Rapoport <rppt@kernel.org>
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Co-developed-by: xiongxin <xiongxin@kylinos.cn>
Signed-off-by: xiongxin <xiongxin@kylinos.cn>
Signed-off-by: Yaxiong Tian <tianyaxiong@kylinos.cn>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Link: https://lore.kernel.org/r/20240417025248.386622-1-tianyaxiong@kylinos.cn
[catalin.marinas@arm.com: rework commit message]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
popcornmix pushed a commit that referenced this issue Apr 29, 2024
commit 50449ca upstream.

On arm64 machines, swsusp_save() faults if it attempts to access
MEMBLOCK_NOMAP memory ranges. This can be reproduced in QEMU using UEFI
when booting with rodata=off debug_pagealloc=off and CONFIG_KFENCE=n:

  Unable to handle kernel paging request at virtual address ffffff8000000000
  Mem abort info:
    ESR = 0x0000000096000007
    EC = 0x25: DABT (current EL), IL = 32 bits
    SET = 0, FnV = 0
    EA = 0, S1PTW = 0
    FSC = 0x07: level 3 translation fault
  Data abort info:
    ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000
    CM = 0, WnR = 0, TnD = 0, TagAccess = 0
    GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
  swapper pgtable: 4k pages, 39-bit VAs, pgdp=00000000eeb0b000
  [ffffff8000000000] pgd=180000217fff9803, p4d=180000217fff9803, pud=180000217fff9803, pmd=180000217fff8803, pte=0000000000000000
  Internal error: Oops: 0000000096000007 [#1] SMP
  Internal error: Oops: 0000000096000007 [#1] SMP
  Modules linked in: xt_multiport ipt_REJECT nf_reject_ipv4 xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_filter bpfilter rfkill at803x snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg dwmac_generic stmmac_platform snd_hda_codec stmmac joydev pcs_xpcs snd_hda_core phylink ppdev lp parport ramoops reed_solomon ip_tables x_tables nls_iso8859_1 vfat multipath linear amdgpu amdxcp drm_exec gpu_sched drm_buddy hid_generic usbhid hid radeon video drm_suballoc_helper drm_ttm_helper ttm i2c_algo_bit drm_display_helper cec drm_kms_helper drm
  CPU: 0 PID: 3663 Comm: systemd-sleep Not tainted 6.6.2+ #76
  Source Version: 4e22ed63a0a48e7a7cff9b98b7806d8d4add7dc0
  Hardware name: Greatwall GW-XXXXXX-XXX/GW-XXXXXX-XXX, BIOS KunLun BIOS V4.0 01/19/2021
  pstate: 600003c5 (nZCv DAIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  pc : swsusp_save+0x280/0x538
  lr : swsusp_save+0x280/0x538
  sp : ffffffa034a3fa40
  x29: ffffffa034a3fa40 x28: ffffff8000001000 x27: 0000000000000000
  x26: ffffff8001400000 x25: ffffffc08113e248 x24: 0000000000000000
  x23: 0000000000080000 x22: ffffffc08113e280 x21: 00000000000c69f2
  x20: ffffff8000000000 x19: ffffffc081ae2500 x18: 0000000000000000
  x17: 6666662074736420 x16: 3030303030303030 x15: 3038666666666666
  x14: 0000000000000b69 x13: ffffff9f89088530 x12: 00000000ffffffea
  x11: 00000000ffff7fff x10: 00000000ffff7fff x9 : ffffffc08193f0d0
  x8 : 00000000000bffe8 x7 : c0000000ffff7fff x6 : 0000000000000001
  x5 : ffffffa0fff09dc8 x4 : 0000000000000000 x3 : 0000000000000027
  x2 : 0000000000000000 x1 : 0000000000000000 x0 : 000000000000004e
  Call trace:
   swsusp_save+0x280/0x538
   swsusp_arch_suspend+0x148/0x190
   hibernation_snapshot+0x240/0x39c
   hibernate+0xc4/0x378
   state_store+0xf0/0x10c
   kobj_attr_store+0x14/0x24

The reason is swsusp_save() -> copy_data_pages() -> page_is_saveable()
-> kernel_page_present() assuming that a page is always present when
can_set_direct_map() is false (all of rodata_full,
debug_pagealloc_enabled() and arm64_kfence_can_set_direct_map() false),
irrespective of the MEMBLOCK_NOMAP ranges. Such MEMBLOCK_NOMAP regions
should not be saved during hibernation.

This problem was introduced by changes to the pfn_valid() logic in
commit a7d9f30 ("arm64: drop pfn_valid_within() and simplify
pfn_valid()").

Similar to other architectures, drop the !can_set_direct_map() check in
kernel_page_present() so that page_is_savable() skips such pages.

Fixes: a7d9f30 ("arm64: drop pfn_valid_within() and simplify pfn_valid()")
Cc: <stable@vger.kernel.org> # 5.14.x
Suggested-by: Mike Rapoport <rppt@kernel.org>
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Co-developed-by: xiongxin <xiongxin@kylinos.cn>
Signed-off-by: xiongxin <xiongxin@kylinos.cn>
Signed-off-by: Yaxiong Tian <tianyaxiong@kylinos.cn>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Link: https://lore.kernel.org/r/20240417025248.386622-1-tianyaxiong@kylinos.cn
[catalin.marinas@arm.com: rework commit message]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
popcornmix pushed a commit that referenced this issue Apr 29, 2024
commit 50449ca upstream.

On arm64 machines, swsusp_save() faults if it attempts to access
MEMBLOCK_NOMAP memory ranges. This can be reproduced in QEMU using UEFI
when booting with rodata=off debug_pagealloc=off and CONFIG_KFENCE=n:

  Unable to handle kernel paging request at virtual address ffffff8000000000
  Mem abort info:
    ESR = 0x0000000096000007
    EC = 0x25: DABT (current EL), IL = 32 bits
    SET = 0, FnV = 0
    EA = 0, S1PTW = 0
    FSC = 0x07: level 3 translation fault
  Data abort info:
    ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000
    CM = 0, WnR = 0, TnD = 0, TagAccess = 0
    GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
  swapper pgtable: 4k pages, 39-bit VAs, pgdp=00000000eeb0b000
  [ffffff8000000000] pgd=180000217fff9803, p4d=180000217fff9803, pud=180000217fff9803, pmd=180000217fff8803, pte=0000000000000000
  Internal error: Oops: 0000000096000007 [#1] SMP
  Internal error: Oops: 0000000096000007 [#1] SMP
  Modules linked in: xt_multiport ipt_REJECT nf_reject_ipv4 xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_filter bpfilter rfkill at803x snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg dwmac_generic stmmac_platform snd_hda_codec stmmac joydev pcs_xpcs snd_hda_core phylink ppdev lp parport ramoops reed_solomon ip_tables x_tables nls_iso8859_1 vfat multipath linear amdgpu amdxcp drm_exec gpu_sched drm_buddy hid_generic usbhid hid radeon video drm_suballoc_helper drm_ttm_helper ttm i2c_algo_bit drm_display_helper cec drm_kms_helper drm
  CPU: 0 PID: 3663 Comm: systemd-sleep Not tainted 6.6.2+ #76
  Source Version: 4e22ed63a0a48e7a7cff9b98b7806d8d4add7dc0
  Hardware name: Greatwall GW-XXXXXX-XXX/GW-XXXXXX-XXX, BIOS KunLun BIOS V4.0 01/19/2021
  pstate: 600003c5 (nZCv DAIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  pc : swsusp_save+0x280/0x538
  lr : swsusp_save+0x280/0x538
  sp : ffffffa034a3fa40
  x29: ffffffa034a3fa40 x28: ffffff8000001000 x27: 0000000000000000
  x26: ffffff8001400000 x25: ffffffc08113e248 x24: 0000000000000000
  x23: 0000000000080000 x22: ffffffc08113e280 x21: 00000000000c69f2
  x20: ffffff8000000000 x19: ffffffc081ae2500 x18: 0000000000000000
  x17: 6666662074736420 x16: 3030303030303030 x15: 3038666666666666
  x14: 0000000000000b69 x13: ffffff9f89088530 x12: 00000000ffffffea
  x11: 00000000ffff7fff x10: 00000000ffff7fff x9 : ffffffc08193f0d0
  x8 : 00000000000bffe8 x7 : c0000000ffff7fff x6 : 0000000000000001
  x5 : ffffffa0fff09dc8 x4 : 0000000000000000 x3 : 0000000000000027
  x2 : 0000000000000000 x1 : 0000000000000000 x0 : 000000000000004e
  Call trace:
   swsusp_save+0x280/0x538
   swsusp_arch_suspend+0x148/0x190
   hibernation_snapshot+0x240/0x39c
   hibernate+0xc4/0x378
   state_store+0xf0/0x10c
   kobj_attr_store+0x14/0x24

The reason is swsusp_save() -> copy_data_pages() -> page_is_saveable()
-> kernel_page_present() assuming that a page is always present when
can_set_direct_map() is false (all of rodata_full,
debug_pagealloc_enabled() and arm64_kfence_can_set_direct_map() false),
irrespective of the MEMBLOCK_NOMAP ranges. Such MEMBLOCK_NOMAP regions
should not be saved during hibernation.

This problem was introduced by changes to the pfn_valid() logic in
commit a7d9f30 ("arm64: drop pfn_valid_within() and simplify
pfn_valid()").

Similar to other architectures, drop the !can_set_direct_map() check in
kernel_page_present() so that page_is_savable() skips such pages.

Fixes: a7d9f30 ("arm64: drop pfn_valid_within() and simplify pfn_valid()")
Cc: <stable@vger.kernel.org> # 5.14.x
Suggested-by: Mike Rapoport <rppt@kernel.org>
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Co-developed-by: xiongxin <xiongxin@kylinos.cn>
Signed-off-by: xiongxin <xiongxin@kylinos.cn>
Signed-off-by: Yaxiong Tian <tianyaxiong@kylinos.cn>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Link: https://lore.kernel.org/r/20240417025248.386622-1-tianyaxiong@kylinos.cn
[catalin.marinas@arm.com: rework commit message]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
popcornmix pushed a commit that referenced this issue Sep 2, 2024
…hunk()

[BUG]
There is an internal report that KASAN is reporting use-after-free, with
the following backtrace:

  BUG: KASAN: slab-use-after-free in btrfs_check_read_bio+0xa68/0xb70 [btrfs]
  Read of size 4 at addr ffff8881117cec28 by task kworker/u16:2/45
  CPU: 1 UID: 0 PID: 45 Comm: kworker/u16:2 Not tainted 6.11.0-rc2-next-20240805-default+ #76
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
  Workqueue: btrfs-endio btrfs_end_bio_work [btrfs]
  Call Trace:
   dump_stack_lvl+0x61/0x80
   print_address_description.constprop.0+0x5e/0x2f0
   print_report+0x118/0x216
   kasan_report+0x11d/0x1f0
   btrfs_check_read_bio+0xa68/0xb70 [btrfs]
   process_one_work+0xce0/0x12a0
   worker_thread+0x717/0x1250
   kthread+0x2e3/0x3c0
   ret_from_fork+0x2d/0x70
   ret_from_fork_asm+0x11/0x20

  Allocated by task 20917:
   kasan_save_stack+0x37/0x60
   kasan_save_track+0x10/0x30
   __kasan_slab_alloc+0x7d/0x80
   kmem_cache_alloc_noprof+0x16e/0x3e0
   mempool_alloc_noprof+0x12e/0x310
   bio_alloc_bioset+0x3f0/0x7a0
   btrfs_bio_alloc+0x2e/0x50 [btrfs]
   submit_extent_page+0x4d1/0xdb0 [btrfs]
   btrfs_do_readpage+0x8b4/0x12a0 [btrfs]
   btrfs_readahead+0x29a/0x430 [btrfs]
   read_pages+0x1a7/0xc60
   page_cache_ra_unbounded+0x2ad/0x560
   filemap_get_pages+0x629/0xa20
   filemap_read+0x335/0xbf0
   vfs_read+0x790/0xcb0
   ksys_read+0xfd/0x1d0
   do_syscall_64+0x6d/0x140
   entry_SYSCALL_64_after_hwframe+0x4b/0x53

  Freed by task 20917:
   kasan_save_stack+0x37/0x60
   kasan_save_track+0x10/0x30
   kasan_save_free_info+0x37/0x50
   __kasan_slab_free+0x4b/0x60
   kmem_cache_free+0x214/0x5d0
   bio_free+0xed/0x180
   end_bbio_data_read+0x1cc/0x580 [btrfs]
   btrfs_submit_chunk+0x98d/0x1880 [btrfs]
   btrfs_submit_bio+0x33/0x70 [btrfs]
   submit_one_bio+0xd4/0x130 [btrfs]
   submit_extent_page+0x3ea/0xdb0 [btrfs]
   btrfs_do_readpage+0x8b4/0x12a0 [btrfs]
   btrfs_readahead+0x29a/0x430 [btrfs]
   read_pages+0x1a7/0xc60
   page_cache_ra_unbounded+0x2ad/0x560
   filemap_get_pages+0x629/0xa20
   filemap_read+0x335/0xbf0
   vfs_read+0x790/0xcb0
   ksys_read+0xfd/0x1d0
   do_syscall_64+0x6d/0x140
   entry_SYSCALL_64_after_hwframe+0x4b/0x53

[CAUSE]
Although I cannot reproduce the error, the report itself is good enough
to pin down the cause.

The call trace is the regular endio workqueue context, but the
free-by-task trace is showing that during btrfs_submit_chunk() we
already hit a critical error, and is calling btrfs_bio_end_io() to error
out.  And the original endio function called bio_put() to free the whole
bio.

This means a double freeing thus causing use-after-free, e.g.:

1. Enter btrfs_submit_bio() with a read bio
   The read bio length is 128K, crossing two 64K stripes.

2. The first run of btrfs_submit_chunk()

2.1 Call btrfs_map_block(), which returns 64K
2.2 Call btrfs_split_bio()
    Now there are two bios, one referring to the first 64K, the other
    referring to the second 64K.
2.3 The first half is submitted.

3. The second run of btrfs_submit_chunk()

3.1 Call btrfs_map_block(), which by somehow failed
    Now we call btrfs_bio_end_io() to handle the error

3.2 btrfs_bio_end_io() calls the original endio function
    Which is end_bbio_data_read(), and it calls bio_put() for the
    original bio.

    Now the original bio is freed.

4. The submitted first 64K bio finished
   Now we call into btrfs_check_read_bio() and tries to advance the bio
   iter.
   But since the original bio (thus its iter) is already freed, we
   trigger the above use-after free.

   And even if the memory is not poisoned/corrupted, we will later call
   the original endio function, causing a double freeing.

[FIX]
Instead of calling btrfs_bio_end_io(), call btrfs_orig_bbio_end_io(),
which has the extra check on split bios and do the proper refcounting
for cloned bios.

Furthermore there is already one extra btrfs_cleanup_bio() call, but
that is duplicated to btrfs_orig_bbio_end_io() call, so remove that
label completely.

Reported-by: David Sterba <dsterba@suse.com>
Fixes: 852eee6 ("btrfs: allow btrfs_submit_bio to split bios")
CC: stable@vger.kernel.org # 6.6+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
ncopa pushed a commit to ncopa/linux that referenced this issue Sep 4, 2024
…hunk()

commit 10d9d8c upstream.

[BUG]
There is an internal report that KASAN is reporting use-after-free, with
the following backtrace:

  BUG: KASAN: slab-use-after-free in btrfs_check_read_bio+0xa68/0xb70 [btrfs]
  Read of size 4 at addr ffff8881117cec28 by task kworker/u16:2/45
  CPU: 1 UID: 0 PID: 45 Comm: kworker/u16:2 Not tainted 6.11.0-rc2-next-20240805-default+ raspberrypi#76
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
  Workqueue: btrfs-endio btrfs_end_bio_work [btrfs]
  Call Trace:
   dump_stack_lvl+0x61/0x80
   print_address_description.constprop.0+0x5e/0x2f0
   print_report+0x118/0x216
   kasan_report+0x11d/0x1f0
   btrfs_check_read_bio+0xa68/0xb70 [btrfs]
   process_one_work+0xce0/0x12a0
   worker_thread+0x717/0x1250
   kthread+0x2e3/0x3c0
   ret_from_fork+0x2d/0x70
   ret_from_fork_asm+0x11/0x20

  Allocated by task 20917:
   kasan_save_stack+0x37/0x60
   kasan_save_track+0x10/0x30
   __kasan_slab_alloc+0x7d/0x80
   kmem_cache_alloc_noprof+0x16e/0x3e0
   mempool_alloc_noprof+0x12e/0x310
   bio_alloc_bioset+0x3f0/0x7a0
   btrfs_bio_alloc+0x2e/0x50 [btrfs]
   submit_extent_page+0x4d1/0xdb0 [btrfs]
   btrfs_do_readpage+0x8b4/0x12a0 [btrfs]
   btrfs_readahead+0x29a/0x430 [btrfs]
   read_pages+0x1a7/0xc60
   page_cache_ra_unbounded+0x2ad/0x560
   filemap_get_pages+0x629/0xa20
   filemap_read+0x335/0xbf0
   vfs_read+0x790/0xcb0
   ksys_read+0xfd/0x1d0
   do_syscall_64+0x6d/0x140
   entry_SYSCALL_64_after_hwframe+0x4b/0x53

  Freed by task 20917:
   kasan_save_stack+0x37/0x60
   kasan_save_track+0x10/0x30
   kasan_save_free_info+0x37/0x50
   __kasan_slab_free+0x4b/0x60
   kmem_cache_free+0x214/0x5d0
   bio_free+0xed/0x180
   end_bbio_data_read+0x1cc/0x580 [btrfs]
   btrfs_submit_chunk+0x98d/0x1880 [btrfs]
   btrfs_submit_bio+0x33/0x70 [btrfs]
   submit_one_bio+0xd4/0x130 [btrfs]
   submit_extent_page+0x3ea/0xdb0 [btrfs]
   btrfs_do_readpage+0x8b4/0x12a0 [btrfs]
   btrfs_readahead+0x29a/0x430 [btrfs]
   read_pages+0x1a7/0xc60
   page_cache_ra_unbounded+0x2ad/0x560
   filemap_get_pages+0x629/0xa20
   filemap_read+0x335/0xbf0
   vfs_read+0x790/0xcb0
   ksys_read+0xfd/0x1d0
   do_syscall_64+0x6d/0x140
   entry_SYSCALL_64_after_hwframe+0x4b/0x53

[CAUSE]
Although I cannot reproduce the error, the report itself is good enough
to pin down the cause.

The call trace is the regular endio workqueue context, but the
free-by-task trace is showing that during btrfs_submit_chunk() we
already hit a critical error, and is calling btrfs_bio_end_io() to error
out.  And the original endio function called bio_put() to free the whole
bio.

This means a double freeing thus causing use-after-free, e.g.:

1. Enter btrfs_submit_bio() with a read bio
   The read bio length is 128K, crossing two 64K stripes.

2. The first run of btrfs_submit_chunk()

2.1 Call btrfs_map_block(), which returns 64K
2.2 Call btrfs_split_bio()
    Now there are two bios, one referring to the first 64K, the other
    referring to the second 64K.
2.3 The first half is submitted.

3. The second run of btrfs_submit_chunk()

3.1 Call btrfs_map_block(), which by somehow failed
    Now we call btrfs_bio_end_io() to handle the error

3.2 btrfs_bio_end_io() calls the original endio function
    Which is end_bbio_data_read(), and it calls bio_put() for the
    original bio.

    Now the original bio is freed.

4. The submitted first 64K bio finished
   Now we call into btrfs_check_read_bio() and tries to advance the bio
   iter.
   But since the original bio (thus its iter) is already freed, we
   trigger the above use-after free.

   And even if the memory is not poisoned/corrupted, we will later call
   the original endio function, causing a double freeing.

[FIX]
Instead of calling btrfs_bio_end_io(), call btrfs_orig_bbio_end_io(),
which has the extra check on split bios and do the proper refcounting
for cloned bios.

Furthermore there is already one extra btrfs_cleanup_bio() call, but
that is duplicated to btrfs_orig_bbio_end_io() call, so remove that
label completely.

Reported-by: David Sterba <dsterba@suse.com>
Fixes: 852eee6 ("btrfs: allow btrfs_submit_bio to split bios")
CC: stable@vger.kernel.org # 6.6+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
popcornmix pushed a commit that referenced this issue Sep 6, 2024
…hunk()

commit 10d9d8c upstream.

[BUG]
There is an internal report that KASAN is reporting use-after-free, with
the following backtrace:

  BUG: KASAN: slab-use-after-free in btrfs_check_read_bio+0xa68/0xb70 [btrfs]
  Read of size 4 at addr ffff8881117cec28 by task kworker/u16:2/45
  CPU: 1 UID: 0 PID: 45 Comm: kworker/u16:2 Not tainted 6.11.0-rc2-next-20240805-default+ #76
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
  Workqueue: btrfs-endio btrfs_end_bio_work [btrfs]
  Call Trace:
   dump_stack_lvl+0x61/0x80
   print_address_description.constprop.0+0x5e/0x2f0
   print_report+0x118/0x216
   kasan_report+0x11d/0x1f0
   btrfs_check_read_bio+0xa68/0xb70 [btrfs]
   process_one_work+0xce0/0x12a0
   worker_thread+0x717/0x1250
   kthread+0x2e3/0x3c0
   ret_from_fork+0x2d/0x70
   ret_from_fork_asm+0x11/0x20

  Allocated by task 20917:
   kasan_save_stack+0x37/0x60
   kasan_save_track+0x10/0x30
   __kasan_slab_alloc+0x7d/0x80
   kmem_cache_alloc_noprof+0x16e/0x3e0
   mempool_alloc_noprof+0x12e/0x310
   bio_alloc_bioset+0x3f0/0x7a0
   btrfs_bio_alloc+0x2e/0x50 [btrfs]
   submit_extent_page+0x4d1/0xdb0 [btrfs]
   btrfs_do_readpage+0x8b4/0x12a0 [btrfs]
   btrfs_readahead+0x29a/0x430 [btrfs]
   read_pages+0x1a7/0xc60
   page_cache_ra_unbounded+0x2ad/0x560
   filemap_get_pages+0x629/0xa20
   filemap_read+0x335/0xbf0
   vfs_read+0x790/0xcb0
   ksys_read+0xfd/0x1d0
   do_syscall_64+0x6d/0x140
   entry_SYSCALL_64_after_hwframe+0x4b/0x53

  Freed by task 20917:
   kasan_save_stack+0x37/0x60
   kasan_save_track+0x10/0x30
   kasan_save_free_info+0x37/0x50
   __kasan_slab_free+0x4b/0x60
   kmem_cache_free+0x214/0x5d0
   bio_free+0xed/0x180
   end_bbio_data_read+0x1cc/0x580 [btrfs]
   btrfs_submit_chunk+0x98d/0x1880 [btrfs]
   btrfs_submit_bio+0x33/0x70 [btrfs]
   submit_one_bio+0xd4/0x130 [btrfs]
   submit_extent_page+0x3ea/0xdb0 [btrfs]
   btrfs_do_readpage+0x8b4/0x12a0 [btrfs]
   btrfs_readahead+0x29a/0x430 [btrfs]
   read_pages+0x1a7/0xc60
   page_cache_ra_unbounded+0x2ad/0x560
   filemap_get_pages+0x629/0xa20
   filemap_read+0x335/0xbf0
   vfs_read+0x790/0xcb0
   ksys_read+0xfd/0x1d0
   do_syscall_64+0x6d/0x140
   entry_SYSCALL_64_after_hwframe+0x4b/0x53

[CAUSE]
Although I cannot reproduce the error, the report itself is good enough
to pin down the cause.

The call trace is the regular endio workqueue context, but the
free-by-task trace is showing that during btrfs_submit_chunk() we
already hit a critical error, and is calling btrfs_bio_end_io() to error
out.  And the original endio function called bio_put() to free the whole
bio.

This means a double freeing thus causing use-after-free, e.g.:

1. Enter btrfs_submit_bio() with a read bio
   The read bio length is 128K, crossing two 64K stripes.

2. The first run of btrfs_submit_chunk()

2.1 Call btrfs_map_block(), which returns 64K
2.2 Call btrfs_split_bio()
    Now there are two bios, one referring to the first 64K, the other
    referring to the second 64K.
2.3 The first half is submitted.

3. The second run of btrfs_submit_chunk()

3.1 Call btrfs_map_block(), which by somehow failed
    Now we call btrfs_bio_end_io() to handle the error

3.2 btrfs_bio_end_io() calls the original endio function
    Which is end_bbio_data_read(), and it calls bio_put() for the
    original bio.

    Now the original bio is freed.

4. The submitted first 64K bio finished
   Now we call into btrfs_check_read_bio() and tries to advance the bio
   iter.
   But since the original bio (thus its iter) is already freed, we
   trigger the above use-after free.

   And even if the memory is not poisoned/corrupted, we will later call
   the original endio function, causing a double freeing.

[FIX]
Instead of calling btrfs_bio_end_io(), call btrfs_orig_bbio_end_io(),
which has the extra check on split bios and do the proper refcounting
for cloned bios.

Furthermore there is already one extra btrfs_cleanup_bio() call, but
that is duplicated to btrfs_orig_bbio_end_io() call, so remove that
label completely.

Reported-by: David Sterba <dsterba@suse.com>
Fixes: 852eee6 ("btrfs: allow btrfs_submit_bio to split bios")
CC: stable@vger.kernel.org # 6.6+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants