Kernel OOPS at boot and shutdown #53

Closed
shenki opened this issue Feb 24, 2016 · 8 comments

@shenki
Member

shenki commented Feb 24, 2016

Reported by Doug:

There are some problems getting to boot successfully, and I copied some console
data to the attached file. 2 out of 6 boot attempts made it to the login prompt, and
power was cycled between each attempt.

[  OK  ] Started Create Volatile Files and Directories.
[  OK  ] Started udev Kernel Device Manager.
         Starting Network Time Synchronization...
         Starting Update UTMP about System Boot/Shutdown...
Unable to handle kernel paging request at virtual address ffffffff
pgd = cfb20000
[ffffffff] *pgd=4fffd871, *pte=00000000, *ppte=00000000
Internal error: Oops: 1 [#1] ARM
Modules linked in:
CPU: 0 PID: 512 Comm: kworker/0:2 Not tainted 4.3.5-openbmc-20160212-1 #1
Hardware name: ASpeed SoC
Workqueue: events cache_reap
task: cfa3ac00 ti: ce444000 task.ti: ce444000
PC is at drain_array+0x18/0xec
LR is at cache_reap+0x54/0x114
pc : [<c0087d8c>]    lr : [<c0088060>]    psr: a0000013
sp : ce445ec8  ip : 00000000  fp : 00000008
r10: c05084d8  r9 : c050b7a0  r8 : c0509f00
r7 : ce445ed0  r6 : c050b51c  r5 : cf9c9400  r4 : ffffffff
r3 : 00000000  r2 : ffffffff  r1 : cf9c9400  r0 : cf856b20
[  OK  ] Started Create Static Device Nodes in /dev.
systemd-journald[522]: Received request to flush runtime journal from PID 1
[  OK  ] Reached target Local File Systems (Pre).
         Mounting /var/volatile...
         Starting udev Kernel Device Manager...
[  OK  ] Started Flush Journal to Persistent Storage.
[  OK  ] Mounted /var/volatile.
[  OK  ] Started udev Coldplug all Devices.
[  OK  ] Reached target Local File Systems.
Unable to handle kernel paging request at virtual address ffffffff
pgd = cdcd0000
[ffffffff] *pgd=4fffd871, *pte=00000000, *ppte=00000000
Internal error: Oops: 1 [#1] ARM
Modules linked in:
CPU: 0 PID: 1 Comm: systemd Not tainted 4.3.5-openbmc-20160212-1 #1
Hardware name: ASpeed SoC
task: cf810aa0 ti: cf826000 task.ti: cf826000
PC is at kfree+0x44/0xa8
LR is at remove_proc_subtree+0xc8/0xd4
pc : [<c0088164>]    lr : [<c00d4900>]    psr: 40000093
sp : cf827e90  ip : cf827e98  fp : be969770
r10: cdca12c8  r9 : cdc8e630  r8 : 00000000
r7 : cf800100  r6 : a0000013  r5 : ce45f580  r4 : ffffffff
r3 : cffc1be0  r2 : 00000080  r1 : cfdf9000  r0 : ce45f580
Flags: nZcv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 0005317f  Table: 4dcd0000  DAC: 00000051
Process systemd (pid: 1, stack limit = 0xcf826190)
From Shutdown:

+ umount /oldroot/dev/pts
+ umount /oldroot/dev
+ umount /oldroot
------------[ cut here ]------------
Kernel BUG at c005ec10 [verbose debug info unavailable]
Internal error: Oops - BUG: 0 [#1] ARM
Modules linked in:
CPU: 0 PID: 1095 Comm: umount Tainted: G    B           4.3.5-openbmc-20160212-1 #1
Hardware name: ASpeed SoC
task: cf994860 ti: cde32000 task.ti: cde32000
PC is at __delete_from_page_cache+0x204/0x2a8
LR is at __delete_from_page_cache+0x11c/0x2a8
pc : [<c005ec10>]    lr : [<c005eb28>]    psr: 60000093
sp : cde33df0  ip : c0525400  fp : 00000000
r10: 00000001  r9 : cf4852d8  r8 : 00000003
r7 : 00000000  r6 : 00000000  r5 : cf4852d4  r4 : cffadb20
r3 : 00000000  r2 : 000002bc  r1 : c053a334  r0 : 00003602
Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 0005317f  Table: 4de14000  DAC: 00000051
Process umount (pid: 1095, stack limit = 0xcde32190)
Stack: (0xcde33df0 to 0xcde34000)
3de0:                                     00000000 0000000a cf48cd40 cf48cd5c
3e00: cde33ea8 cffadb20 00000000 00000013 cf4852d4 00000001 00000000 00000001
3e20: cffadb20 c005ecf0 cffadb20 cf4852d4 00000001 c0069428 00001000 00000000
3e40: 00000000 cffadba0 00000000 00000000 ffffffff 00000000 cf4852d4 c00695cc
3e60: cde33e68 00000000 00000000 00000001 00000002 00000003 00000004 00000005
3e80: 00000006 00000007 00000008 00000009 0000000a 0000000b 0000000c 0000000d
3ea0: 0000000a 00000000 cffadba0 cffadb20 cffadbe0 cffadc00 cffadc20 cffadc40
3ec0: cffadc60 cffadc80 cffadca0 cffadcc0 cffad8c0 cffad8a0 cffad880 cffad860
3ee0: c000a424 ffffffff ffffffff cde32000 cdcf6800 c000a424 cde32000 00000000
3f00: bedaaecc c0069990 ffffffff ffffffff 00000000 cf485218 c037d180 c00a1038
3f20: cde32000 cde33f38 cf7b5a38 c00a1100 cdcf69fc c00a1d3c cf48513c cf7b5abc
3f40: 00000000 cdcf6800 c037d180 c052e62c 00000034 c008e5b0 00000000 cf403b80
3f60: 00000081 c008e810 cdcf6800 c050c548 c052e62c c008e910 ffff0001 cdcb0c20
3f80: cfae387c c00a42ac cf994860 c002a97c c000a424 cde32000 cde33fb0 c000cba8
3fa0: 000bc1a0 000bc290 000bc1c0 c000a2d0 00000000 00000000 00000000 000bc1a0
3fc0: 000bc1a0 000bc290 000bc1c0 00000034 00000000 000bc2c0 00000000 bedaaecc
3fe0: 000aa3a8 bedaacd4 0004ebc4 4f77407c 60000010 000bc1c0 00000000 00000000
[<c005ec10>] (__delete_from_page_cache) from [<c005ecf0>] (delete_from_page_cache+0x3c/0x5c)
[<c005ecf0>] (delete_from_page_cache) from [<c0069428>] (truncate_inode_page+0x98/0xa4)
[<c0069428>] (truncate_inode_page) from [<c00695cc>] (truncate_inode_pages_range+0x168/0x518)
[<c00695cc>] (truncate_inode_pages_range) from [<c0069990>] (truncate_inode_pages+0x14/0x1c)
[<c0069990>] (truncate_inode_pages) from [<c00a1038>] (evict+0x90/0x128)
[<c00a1038>] (evict) from [<c00a1100>] (dispose_list+0x30/0x3c)
[<c00a1100>] (dispose_list) from [<c00a1d3c>] (evict_inodes+0xb4/0xbc)
[<c00a1d3c>] (evict_inodes) from [<c008e5b0>] (generic_shutdown_super+0x40/0xc0)
[<c008e5b0>] (generic_shutdown_super) from [<c008e810>] (kill_block_super+0x18/0x68)
[<c008e810>] (kill_block_super) from [<c008e910>] (deactivate_locked_super+0x44/0x74)
[<c008e910>] (deactivate_locked_super) from [<c00a42ac>] (cleanup_mnt+0x4c/0x70)
[<c00a42ac>] (cleanup_mnt) from [<c002a97c>] (task_work_run+0x6c/0x80)
[<c002a97c>] (task_work_run) from [<c000cba8>] (do_work_pending+0xa4/0xc0)
[<c000cba8>] (do_work_pending) from [<c000a2d0>] (slow_work_pending+0xc/0x20)
Code: e121f002 e594300c e3530000 ba000000 (e7f001f2) 
---[ end trace 85608810d6a88310 ]---
Segmentation fault
[  OK  ] Reached target Local File Systems.
         Starting Create Volatile Files and Directories...
[  OK  ] Started Load/Save Random Seed.
[  OK  ] Started udev Kernel Device Manager.
[  OK  ] Started Create Volatile Files and Directories.
         Starting Update UTMP about System Boot/Shutdown...
         Starting Network Time Synchronization...
Unable to handle kernel paging request at virtual address ffffffff
pgd = ce4b0000
[ffffffff] *pgd=4fffd871, *pte=00000000, *ppte=00000000
Internal error: Oops: 1 [#1] ARM
Modules linked in:
CPU: 0 PID: 515 Comm: kworker/0:2 Not tainted 4.3.5-openbmc-20160212-1 #1
Hardware name: ASpeed SoC
Workqueue: events cache_reap
task: cf99f1e0 ti: ce462000 task.ti: ce462000
PC is at drain_array+0x18/0xec
LR is at cache_reap+0x54/0x114
pc : [<c0087d8c>]    lr : [<c0088060>]    psr: a0000013
sp : ce463ec8  ip : cffb2934  fp : 00000008
r10: c05084d8  r9 : c050b7a0  r8 : c0509f00
r7 : ce463ed0  r6 : c050b51c  r5 : cf801280  r4 : ffffffff
r3 : 00000000  r2 : ffffffff  r1 : cf801280  r0 : cf800340
Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 0005317f  Table: 4e4b0000  DAC: 00000053
Process kworker/0:2 (pid: 515, stack limit = 0xce462190)
Stack: (0xce463ec8 to 0xce464000)
3ec0:                   cf800340 cf801280 ce463ed0 ce463ed0 cf800340 cf801280
3ee0: c050b51c 00000000 c0509f00 c0088060 00000000 00000000 c05084a4 cfa036a0
3f00: c050b7a0 c0508494 00000000 cfdebb00 00000000 c0027be8 cfa036a0 c050b7a0
3f20: cf9f1500 cfa036a0 c0508494 cfa036b8 ce462000 c05084a4 c0508494 c05084d8
3f40: 00000008 c0028280 00000000 ce43dfe0 00000000 cfa036a0 c0027fd0 00000000
3f60: 00000000 00000000 00000000 c002bf74 00000000 00000000 000001f6 cfa036a0
3f80: 00000000 ce463f84 ce463f84 00000000 ce463f90 ce463f90 ce463fac ce43dfe0
3fa0: c002beb0 00000000 00000000 c000a330 00000000 00000000 00000000 00000000
3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
3fe0: 00000000 00000000 00000000 00000000 00000013 00000000 6af75684 6868d53b
[<c0087d8c>] (drain_array) from [<c0088060>] (cache_reap+0x54/0x114)
[<c0088060>] (cache_reap) from [<c0027be8>] (process_one_work+0x1bc/0x2f8)
[<c0027be8>] (process_one_work) from [<c0028280>] (worker_thread+0x2b0/0x3ec)
[<c0028280>] (worker_thread) from [<c002bf74>] (kthread+0xc4/0xd8)
[<c002bf74>] (kthread) from [<c000a330>] (ret_from_fork+0x14/0x24)
Code: e28d7008 e58d7008 e58d700c 0a000032 (e5942000) 
---[ end trace 0a57a7db8754ec3b ]---
Unable to handle kernel paging request at virtual address fffffff0
pgd = c0004000
[fffffff0] *pgd=4fffd871, *pte=00000000, *ppte=00000000
Internal error: Oops: 37 [#2] ARM
Modules linked in:
CPU: 0 PID: 515 Comm: kworker/0:2 Tainted: G      D         4.3.5-openbmc-20160212-1 #1
Hardware name: ASpeed SoC
task: cf99f1e0 ti: ce462000 task.ti: ce462000
PC is at kthread_data+0x4/0xc
LR is at wq_worker_sleeping+0xc/0xb8
pc : [<c002c2f0>]    lr : [<c002854c>]    psr: 20000093
sp : ce463c48  ip : c0374b14  fp : ce463c74
r10: 00000001  r9 : cf99f380  r8 : cf99f404
r7 : 00000000  r6 : c0508da0  r5 : ce463adc  r4 : 00000000
r3 : 00000000  r2 : ffffffff  r1 : 00000000  r0 : cf99f1e0
Flags: nzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 0005317f  Table: 4e4b0000  DAC: 00000051
Process kworker/0:2 (pid: 515, stack limit = 0xce462190)
Freeing unused kernel memory: 176K (c04d6000 - c0502000)
rofs = mtd5 squashfs rwfs = mtd6 ext4
/dev/mtdblock6 was not cleanly unmounted, check forced.
/dev/mtdblock6: 124/1024 files (0.0% non-contiguous), 215/1024 blocks
EXT4-fs (mtdblock6): mounted filesystem without journal. Opts: (null)
systemd[1]: Failed to insert module 'ipv6': Function not implemented
systemd[1]: Failed to insert module 'kdbus': Function not implemented
random: systemd urandom read with 20 bits of entropy available
systemd[1]: systemd 225 running in system mode. (-PAM -AUDIT -SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP -LIBCRYPTSETUP -GCRYPT -GNUTLS +ACL -XZ -LZ4 -SECCOMP +BLKID -ELFUTILS +KMOD -IDN)
systemd[1]: Detected architecture arm.

Welcome to Phosphor OpenBMC (Phosphor OpenBMC Project Reference Distro) 0.1.0 (master)!

...............

[  OK  ] Reached target Local File Systems.
         Starting Create Volatile Files and Directories...
[  OK  ] Started Load/Save Random Seed.
Unable to handle kernel paging request at virtual address ffffffff
pgd = cdcd4000
[ffffffff] *pgd=4fffd871, *pte=00000000, *ppte=00000000
Internal error: Oops: 1 [#1] ARM
Modules linked in:
CPU: 0 PID: 1 Comm: systemd Not tainted 4.3.5-openbmc-20160212-1 #1
Hardware name: ASpeed SoC
task: cf810aa0 ti: cf826000 task.ti: cf826000
PC is at kmem_cache_free+0x54/0xb8
LR is at kernfs_put+0xf4/0x188
pc : [<c0087c5c>]    lr : [<c00dacc4>]    psr: a0000093
sp : cf827e90  ip : 00000000  fp : cf827f64
r10: 7f754f00  r9 : c0525b60  r8 : c0525b60
r7 : a0000013  r6 : ce47b088  r5 : ffffffff  r4 : cf800a00
r3 : a0000093  r2 : 00000080  r1 : cfdf9000  r0 : cf800a00
Flags: NzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 0005317f  Table: 4dcd4000  DAC: 00000051
Process systemd (pid: 1, stack limit = 0xcf826190)
Stack: (0xcf827e90 to 0xcf828000)
7e80:                                     c053c9cc ce47b088 ce47b178 cdcb7ec8
7ea0: cdcb7ec0 c00dacc4 ce47b088 ce47b1a4 a0000013 ce47b088 ce47b178 00000000
7ec0: c0525b60 cf827ee4 ffffff9c 7f754f00 cf827f64 c00db478 c009e33c cf827f2c
7ee0: 0000004c 00000000 00000000 c009da50 c050bed8 ce47b178 ce4f4524 00000000
7f00: cf827f70 c00dbd30 00000000 ce4f44a0 00000000 c005a0dc c050ae4c ce47b178
7f20: cf4e07f0 c005a114 c005a0fc c050ae4c ce47b178 c00db844 c00db7f8 cf4e56e8
7f40: cf4e56e8 c009843c 00000000 cf4e56e8 cfb0b000 c0098d2c cf827f70 cf827f64
7f60: 40000010 00000000 cfaf26f0 cf4e3b28 b8d446ca 0000001e cfb0b034 ce456000
7f80: c000a424 00000000 00000000 b6f47f10 00000028 c000a424 cf826000 00000000
7fa0: 7f6bacbc c000a280 00000000 00000000 7f754f00 00000000 b6e13ba0 00000000
7fc0: 00000000 00000000 b6f47f10 00000028 b6f4053c 7f750f18 7f722ee0 7f6bacbc
7fe0: 7f700d4c bed08734 7f6181d0 b6d9999c 20000010 7f754f00 f4ad8c1b 8b9d0ac4
[<c0087c5c>] (kmem_cache_free) from [<c00dacc4>] (kernfs_put+0xf4/0x188)
[<c00dacc4>] (kernfs_put) from [<c00db478>] (__kernfs_remove+0x200/0x22c)
[<c00db478>] (__kernfs_remove) from [<c00dbd30>] (kernfs_remove+0x1c/0x2c)
[<c00dbd30>] (kernfs_remove) from [<c005a0dc>] (cgroup_destroy_locked+0x54/0x74)
[<c005a0dc>] (cgroup_destroy_locked) from [<c005a114>] (cgroup_rmdir+0x18/0x34)
[<c005a114>] (cgroup_rmdir) from [<c00db844>] (kernfs_iop_rmdir+0x4c/0x70)
[<c00db844>] (kernfs_iop_rmdir) from [<c009843c>] (vfs_rmdir+0x70/0xfc)
[<c009843c>] (vfs_rmdir) from [<c0098d2c>] (do_rmdir+0xd4/0x124)
[<c0098d2c>] (do_rmdir) from [<c000a280>] (ret_fast_syscall+0x0/0x38)
Code: e10f7000 e3873080 e121f003 e5945000 (e895000c) 
---[ end trace 9e1fd1527d8f4e91 ]---
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
@shenki
Member Author

shenki commented Feb 24, 2016

14:08 < NormJ> that issue only occurs on a EVT barreleye.  doesn't happen ever 
               on DVT (newer) systems
14:08 < miltonm> NormJ one EVT or all ?
14:09 < NormJ> all the ones doug tried, but not all.  i have a an EVT barreleye 
               that doesn't have issue.

@shenki
Member Author

shenki commented Feb 24, 2016

Unable to handle kernel paging request at virtual address 01b50325
pgd = ce4a4000
[01b50325] *pgd=00000000
Internal error: Oops: 1 [#1] ARM
Modules linked in:
CPU: 0 PID: 1103 Comm: umount Not tainted 4.3.5-openbmc-20160212-1 #1
Hardware name: ASpeed SoC
task: cfa3a180 ti: cfb3e000 task.ti: cfb3e000
PC is at kfree+0x3c/0xa8
LR is at squashfs_cache_delete+0x60/0x9c
pc : [<c008815c>]    lr : [<c01236e4>]    psr: 00000093
sp : cfb3ff18  ip : 00000010  fp : beeb0ebc
r10: 00000000  r9 : cfb3e000  r8 : 00000038
r7 : 00000004  r6 : a0000013  r5 : cf8db000  r4 : cfadad40
r3 : 01b50309  r2 : ffffffff  r1 : cfdf9000  r0 : cf8db000
Flags: nzcv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 0005317f  Table: 4e4a4000  DAC: 00000051
Process umount (pid: 1103, stack limit = 0xcfb3e190)
Stack: (0xcfb3ff18 to 0xcfb40000)
ff00:                                                       cdcafa00 cfadad40
ff20: 00000000 00000000 00000004 c01236e4 cdcaf900 cdcf4800 c052e62c 00000034
ff40: c000a424 c0125dc8 c0125d9c cdcf4800 c037d180 c008e5dc 00000000 cf403b80
ff60: 00000081 c008e810 cdcf4800 c050c548 c052e62c c008e910 ffff0001 cdcb0c20
ff80: cfae387c c00a42ac cfa3a180 c002a97c c000a424 cfb3e000 cfb3ffb0 c000cba8
ffa0: 000bc1a0 000bc290 000bc1c0 c000a2d0 00000000 00000000 00000000 000bc1a0
ffc0: 000bc1a0 000bc290 000bc1c0 00000034 00000000 000bc2c0 00000000 beeb0ebc
ffe0: 000aa3a8 beeb0cc4 0004ebc4 4f77407c 60000010 000bc1c0 cfbe4a40 00000000
[<c008815c>] (kfree) from [<c01236e4>] (squashfs_cache_delete+0x60/0x9c)
[<c01236e4>] (squashfs_cache_delete) from [<c0125dc8>] (squashfs_put_super+0x2c/0x70)
[<c0125dc8>] (squashfs_put_super) from [<c008e5dc>] (generic_shutdown_super+0x6c/0xc0)
[<c008e5dc>] (generic_shutdown_super) from [<c008e810>] (kill_block_super+0x18/0x68)
[<c008e810>] (kill_block_super) from [<c008e910>] (deactivate_locked_super+0x44/0x74)
[<c008e910>] (deactivate_locked_super) from [<c00a42ac>] (cleanup_mnt+0x4c/0x70)
[<c00a42ac>] (cleanup_mnt) from [<c002a97c>] (task_work_run+0x6c/0x80)
[<c002a97c>] (task_work_run) from [<c000cba8>] (do_work_pending+0xa4/0xc0)
[<c000cba8>] (do_work_pending) from [<c000a2d0>] (slow_work_pending+0xc/0x20)
Code: e0813282 e7912282 e3120902 1593301c (e593701c) 
---[ end trace 293ad3070fe69835 ]---
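
As a cross-check on the traces above, the bracketed word at the end of each `Code:` line is the instruction that faulted. The sketch below decodes only the A32 load/store form with a 12-bit immediate offset and recomputes the address it accessed from the register dump. The function names are illustrative, not part of any kernel tooling, and other instruction classes are deliberately not handled.

```python
def decode_ldr_str_imm(word):
    """Decode an A32 load/store (word/byte) with a 12-bit immediate offset.

    Returns None for any other instruction class.
    """
    if (word >> 25) & 0b111 != 0b010:
        return None
    return {
        "pre_indexed": bool(word & (1 << 24)),  # P bit
        "add": bool(word & (1 << 23)),          # U bit
        "load": bool(word & (1 << 20)),         # L bit
        "rn": (word >> 16) & 0xF,               # base register
        "rd": (word >> 12) & 0xF,               # destination register
        "imm": word & 0xFFF,
    }


def access_address(word, regs):
    """Address the instruction touches, given a {register number: value} map."""
    insn = decode_ldr_str_imm(word)
    if insn is None:
        raise ValueError("not an immediate-offset load/store")
    base = regs[insn["rn"]]
    if not insn["pre_indexed"]:
        return base  # post-indexed forms access the unmodified base
    offset = insn["imm"] if insn["add"] else -insn["imm"]
    return (base + offset) & 0xFFFFFFFF


# kfree oops: Code ends in (e593701c) and r3 = 01b50309.
print(hex(access_address(0xE593701C, {3: 0x01B50309})))
# drain_array oops: Code ends in (e5942000) and r4 = ffffffff.
print(hex(access_address(0xE5942000, {4: 0xFFFFFFFF})))
```

The computed addresses agree with the "Unable to handle kernel paging request" lines of the corresponding traces (01b50325 and ffffffff), which suggests the pointers held in r3 and r4 were already bad before the load was attempted.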

@nkskjames

This happens occasionally on my EVT system with the latest kernel, which includes jffs2. It looks similar to failure #4 above:

root@barreleye:~# reboot
[  OK  ] Stopped target System Time Synchronized.
[  OK  ] Stopped Dispatch Password Requests to Console Directory Watch.
[  OK  ] Stopped target Swap.
         Stopping SSH Per-Connection Server (9.3.62.94:40890)...
[  OK  ] Stopped Create Static Device Nodes in /dev.
[  OK  ] Stopped target Network.
[  OK  ] Stopped Apply Kernel Variables.
         Stopping Load/Save Random Seed...
         Stopping Network Time Synchronization...
         Stopping SSH Per-Connection Server (9.123.229.28:49692)...
[  OK  ] Stopped target Local File Systems.
         Unmounting /var/volatile...
BUG: Bad page map in process systemd-timesyn  pte:4da32100 pmd:4fb2b831
addr:b6e00000 vm_flags:00000075 anon_vma:  (null) mapping:cf48b874 index:0
file:libpthread-2.22.so fault:filemap_fault mmap:generic_file_readonly_mmap readpage:squashfs_readpage
CPU: 0 PID: 850 Comm: systemd-timesyn Not tainted 4.3.6-openbmc-20160222-1 #1
Hardware name: ASpeed SoC
[<c000f28c>] (unwind_backtrace) from [<c000cf0c>] (show_stack+0x10/0x14)
[<c000cf0c>] (show_stack) from [<c00787c8>] (print_bad_pte+0x158/0x190)
[<c00787c8>] (print_bad_pte) from [<c0079cec>] (unmap_single_vma+0x4ec/0x500)
[<c0079cec>] (unmap_single_vma) from [<c007a438>] (unmap_vmas+0x44/0x54)
[<c007a438>] (unmap_vmas) from [<c007e97c>] (exit_mmap+0xc8/0x1dc)
[<c007e97c>] (exit_mmap) from [<c0014770>] (mmput+0x38/0xb8)
[<c0014770>] (mmput) from [<c0017380>] (do_exit+0x2f0/0x7b0)
[<c0017380>] (do_exit) from [<c0018654>] (do_group_exit+0x4c/0xb8)
[<c0018654>] (do_group_exit) from [<c00186d0>] (__wake_up_parent+0x0/0x18)
Disabling lock debugging due to kernel taint
         Unmounting /run/initramfs/rw...
         Stopping SSH Per-Connection Server (9.109.165.198:54558)...
[  OK  ] Removed slice system-getty.slice.
[  OK  ] Stopped target Remote File Systems.
[  OK  ] Stopped target Multi-User System.
         Stopping Phosphor OpenBMC BT to DBUS...
         Stopping D-Bus System Message Bus...
         Stopping Temp placeholder for skeleton function...
         Stopping DBUS introspecting REST server....
         Stopping Phosphor OpenBMC event management daemon...
         Stopping System Logging Service...
[  OK  ] Stopped target Login Prompts.
         Stopping Serial Getty on ttyS4...
         Stopping Phosphor OpenBMC user management daemon...
         Stopping Login Service...
         Stopping Phosphor OpenBMC DBus REST daemon...
[  OK  ] Stopped Forward Password Requests to Wall Directory Watch.
         Stopping SSH Per-Connection Server (9.109.165.198:54737)...
         Unmounting /run/initramfs/ro...
         Stopping Kernel Logging Service...
[  OK  ] Stopped Login Service.
[  OK  ] Stopped DBUS introspecting REST server..
[  OK  ] Stopped Phosphor OpenBMC BT to DBUS.
[  OK  ] Stopped Phosphor OpenBMC event management daemon.
[  OK  ] Stopped Kernel Logging Service.
BUG: Bad rss-counter state mm:ce491640 idx:0 val:1
BUG: Bad rss-counter state mm:ce491640 idx:2 val:-1
[  OK  ] Stopped D-Bus System Message Bus.
[  OK  ] Stopped System Logging Service.
[  OK  ] Stopped Network Time Synchronization.
[  OK  ] Stopped Phosphor OpenBMC user management daemon.
[  OK  ] Stopped Serial Getty on ttyS4.
[  OK  ] Stopped SSH Per-Connection Server (9.3.62.94:40890).
[  OK  ] Stopped SSH Per-Connection Server (9.123.229.28:49692).
[  OK  ] Stopped Temp placeholder for skeleton function.
[  OK  ] Stopped SSH Per-Connection Server (9.109.165.198:54558).
[  OK  ] Stopped SSH Per-Connection Server (9.109.165.198:54737).
[  OK  ] Stopped Load/Save Random Seed.
[  OK  ] Unmounted /var/volatile.
[  OK  ] Unmounted /run/initramfs/rw.
[  OK  ] Unmounted /run/initramfs/ro.
[  OK  ] Stopped Phosphor OpenBMC DBus REST daemon.
[  OK  ] Stopped Phosphor OpenBMC DBus service management daemon.
[  OK  ] Reached target Unmount All Filesystems.
[  OK  ] Removed slice system-dropbear.slice.
[  OK  ] Closed dropbear.socket.
[  OK  ] Removed slice system-serial\x2dgetty.slice.
[  OK  ] Closed Syslog Socket.
[  OK  ] Removed slice User and Session Slice.
[  OK  ] Reached target Shutdown.
systemd-shutdown[1]: Sending SIGTERM to remaining processes...
systemd-journald[520]: Received SIGTERM from PID 1 (systemd-shutdow).
systemd-shutdown[1]: Sending SIGKILL to remaining processes...
systemd-shutdown[1]: Sending SIGKILL to PID 1321 (sh).
systemd-shutdown[1]: Hardware watchdog 'aspeed_wdt', version 0
systemd-shutdown[1]: Unmounting file systems.
systemd-shutdown[1]: Unmounting /tmp.
systemd-shutdown[1]: All filesystems unmounted.
systemd-shutdown[1]: Deactivating swaps.
systemd-shutdown[1]: All swaps deactivated.
systemd-shutdown[1]: Detaching loop devices.
systemd-shutdown[1]: All loop devices detached.
systemd-shutdown[1]: Detaching DM devices.
systemd-shutdown[1]: All DM devices detached.
systemd-shutdown[1]: Successfully changed into root pivot.
systemd-shutdown[1]: Returning to initrd...
shutdown: reboot --log-level 6 --log-target kmsg --log-color
+ awk /oldroot|mnt/ { print $2 }
+ sort -r
+ umount /oldroot/sys/kernel/debug
+ umount /oldroot/sys/kernel/config
+ umount /oldroot/sys/fs/cgroup/systemd
+ umount /oldroot/sys/fs/cgroup
+ umount /oldroot/sys
+ umount /oldroot/proc
+ umount /oldroot/dev/shm
+ umount /oldroot/dev/pts
+ umount /oldroot/dev
+ umount /oldroot
------------[ cut here ]------------
Kernel BUG at c005ec1c [verbose debug info unavailable]
Internal error: Oops - BUG: 0 [#1] ARM
Modules linked in:
CPU: 0 PID: 2097 Comm: umount Tainted: G    B           4.3.6-openbmc-20160222-1 #1
Hardware name: ASpeed SoC
task: ce88a3c0 ti: cde50000 task.ti: cde50000
PC is at __delete_from_page_cache+0x204/0x2a8
LR is at __delete_from_page_cache+0x11c/0x2a8
pc : [<c005ec1c>]    lr : [<c005eb34>]    psr: 60000093
sp : cde51df0  ip : c0527400  fp : 00000000
r10: 00000000  r9 : cf48b878  r8 : 00000003
r7 : 00000000  r6 : 00000000  r5 : cf48b874  r4 : cffad640
r3 : 00000000  r2 : 000002bc  r1 : c053c334  r0 : 0000592f
Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 0005317f  Table: 4de9c000  DAC: 00000051
Process umount (pid: 2097, stack limit = 0xcde50190)
Stack: (0xcde51df0 to 0xcde52000)
1de0:                                     00000000 00000000 cf48a9b0 cf48a9c8
1e00: cde51ea8 cffad640 00000000 00000013 cf48b874 00000000 00000000 00000000
1e20: cffad640 c005ecfc cffad640 cf48b874 00000000 c0069430 00001000 00000000
1e40: 00000000 00000fff 00000000 00000000 ffffffff 00000000 cf48b874 c00695d4
1e60: cde51e68 00000000 00000000 00000001 00000002 00000003 00000004 00000005
1e80: 00000006 00000007 00000008 00000009 0000000a 0000000b 0000000c 0000000d
1ea0: 0000000e 00000000 cffad640 cffad6a0 cffad6c0 cffad6e0 cffad700 cffad720
1ec0: cffad740 cffad760 cffad780 cffad7a0 cffad7c0 cffad7e0 cffad800 cffad820
1ee0: c000a424 ffffffff ffffffff cde50000 cdcb8800 c000a424 cde50000 00000000
1f00: bef19ebc c0069998 ffffffff ffffffff 00000000 cf48b7b8 c037d180 c00a1060
1f20: cde50000 cde51f38 cec760b8 c00a1128 cdcb89fc c00a1d64 cf48b6dc cec7613c
1f40: 00000000 cdcb8800 c037d180 c053062c 00000034 c008e5d0 00000000 cf4039e0
1f60: 00000081 c008e830 cdcb8800 c050e548 c053062c c008e930 ffff0001 cdc8e9e0
1f80: cdc8463c c00a42d4 ce88a3c0 c002a97c c000a424 cde50000 cde51fb0 c000cba8
1fa0: 000bc1a0 000bc290 000bc1c0 c000a2d0 00000000 00000000 00000000 000bc1a0
1fc0: 000bc1a0 000bc290 000bc1c0 00000034 00000000 000bc2c0 00000000 bef19ebc
1fe0: 000aa3a8 bef19cc4 0004ebc4 4e0b407c 60000010 000bc1c0 00000000 00000000
[<c005ec1c>] (__delete_from_page_cache) from [<c005ecfc>] (delete_from_page_cache+0x3c/0x5c)
[<c005ecfc>] (delete_from_page_cache) from [<c0069430>] (truncate_inode_page+0x98/0xa4)
[<c0069430>] (truncate_inode_page) from [<c00695d4>] (truncate_inode_pages_range+0x168/0x518)
[<c00695d4>] (truncate_inode_pages_range) from [<c0069998>] (truncate_inode_pages+0x14/0x1c)
[<c0069998>] (truncate_inode_pages) from [<c00a1060>] (evict+0x90/0x128)
[<c00a1060>] (evict) from [<c00a1128>] (dispose_list+0x30/0x3c)
[<c00a1128>] (dispose_list) from [<c00a1d64>] (evict_inodes+0xb4/0xbc)
[<c00a1d64>] (evict_inodes) from [<c008e5d0>] (generic_shutdown_super+0x40/0xc0)
[<c008e5d0>] (generic_shutdown_super) from [<c008e830>] (kill_block_super+0x18/0x68)
[<c008e830>] (kill_block_super) from [<c008e930>] (deactivate_locked_super+0x44/0x74)
[<c008e930>] (deactivate_locked_super) from [<c00a42d4>] (cleanup_mnt+0x4c/0x70)
[<c00a42d4>] (cleanup_mnt) from [<c002a97c>] (task_work_run+0x6c/0x80)
[<c002a97c>] (task_work_run) from [<c000cba8>] (do_work_pending+0xa4/0xc0)
[<c000cba8>] (do_work_pending) from [<c000a2d0>] (slow_work_pending+0xc/0x20)
Code: e121f002 e594300c e3530000 ba000000 (e7f001f2) 
---[ end trace c053299951a20b73 ]---
Segmentation fault
+ umount /mnt
+ set +x
update: reboot --log-level 6 --log-target kmsg --log-color
Updating bmc...
Erasing block: 8192/8192 (100%) 
Writing kb: 32768/32768 (100%) 
Verifying kb: 32768/32768 (100%) 
Remaining mounts:
tmpfs / tmpfs rw,nosuid,nodev,mode=755 0 0
dev /dev devtmpfs rw,relatime,size=126380k,nr_inodes=31595,mode=755 0 0
proc /proc proc rw,relatime 0 0
sys /sys sysfs rw,relatime 0 0
tmpfs /run tmpfs rw,nosuid,nodev,mode=755 0 0

@nkskjames

I think we have made some important observations:

  1. The EVT system was plugged directly into the lab switch. The DVT system was plugged into a local switch.
  2. If the BMC network is plugged in and we do a cold reboot, the seg fault is likely to occur.
  3. The seg fault does not occur on warm reboots (with the jffs2 filesystem).
  4. If we wait at the U-Boot prompt for around 10 seconds before letting the kernel start, the seg fault does not occur.
  5. If we unplug the BMC network, the seg fault does not occur.
  6. I went back to the original U-Boot FB patches and the seg fault still occurs.

@nkskjames nkskjames added the bug label Mar 4, 2016
@gwshan

gwshan commented Mar 7, 2016

It's most likely caused by a race condition between U-Boot and the kernel, as follows:

  1. U-Boot has enabled NCSI, and there are buffer descriptors (BDs) pointing to valid memory buffers.
  2. The kernel boots and creates many slab caches (struct kmem_cache). One of them ("kernfs_node_cache" in the following example) uses the memory block that U-Boot reserved for storing ingress frames.
  3. An ingress ARP request is received and written to system memory through the buffers U-Boot set up, corrupting the slab cache "kernfs_node_cache".
  4. The kernel tries to allocate a slab object from the per-CPU hot list and crashes because of the corrupted slab cache "kernfs_node_cache".

The serial port can't be accessed at the moment. Once it is available again, I'll work out a U-Boot command that stops activity on the MAC before the "bootcmd" command runs.
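
The race described above can be illustrated with a toy model. This is only a sketch: `ToyMac`, the `RX_EN` bit, the buffer address 0x40, and the pointer value are all made up for illustration and do not reflect the real MAC register layout or the kernel's slab allocator.

```python
import struct

RX_EN = 1 << 0  # hypothetical "receive enable" bit in the toy control register


class ToyMac:
    """Stand-in for a NIC that DMAs received frames into a fixed buffer."""

    def __init__(self, memory, rx_buffer_addr):
        self.memory = memory
        self.rx_buffer_addr = rx_buffer_addr  # set up by the "bootloader"
        self.control = RX_EN                  # left enabled at handoff

    def stop(self):
        # Models clearing the MAC control register before booting the kernel.
        self.control = 0

    def receive(self, frame):
        if not self.control & RX_EN:
            return False  # frame dropped, memory untouched
        end = self.rx_buffer_addr + len(frame)
        self.memory[self.rx_buffer_addr:end] = frame
        return True


def boot_and_receive(stop_mac_first):
    memory = bytearray(256)
    mac = ToyMac(memory, rx_buffer_addr=0x40)
    if stop_mac_first:
        mac.stop()
    # The "kernel" reuses the same region and stores a free-list pointer there.
    struct.pack_into("<I", memory, 0x40, 0xCF801280)
    # A broadcast frame arrives; it starts with the ff:ff:ff:ff:ff:ff address.
    mac.receive(bytes.fromhex("ffffffffffff5cf3fc5f3f600806"))
    (pointer,) = struct.unpack_from("<I", memory, 0x40)
    return pointer


print(hex(boot_and_receive(stop_mac_first=False)))
print(hex(boot_and_receive(stop_mac_first=True)))
```

In the model, a pointer overwritten by the start of a broadcast frame reads back as 0xffffffff. That is consistent with the ffffffff fault address in several of the traces in this issue, although the thread does not confirm that this is where the value came from.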

I added some code to dump the corrupted slab cache and got the dump below for "kernfs_node_cache". The data is clearly an ARP request whose source IP address is 9.3.40.155:

Corrupted CPU list detected on [kernfs_node_cache]
ff ff ff ff ff ff 5c f3 fc 5f 3f 60 08 06 00 01 
08 00 06 04 00 01 5c f3 fc 5f 3f 60 09 03 28 9b 
00 00 00 00 00 00 09 03 28 96 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 c1 98 e5 ab 
c8 05 44 c0 24 71 80 cf 84 7f 80 cf 01 00 00 00 
50 00 00 00 08 00 00 00 52 12 00 00 b9 12 00 00 
52 12 00 00 7c 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
ea 12 00 00 78 01 00 00 12 02 00 00 06 00 00 00 
08 00 00 00 40 1e 80 cf 
gwshan@gwshan:~$ ping 9.3.40.155
PING 9.3.40.155 (9.3.40.155) 56(84) bytes of data.
64 bytes from 9.3.40.155: icmp_seq=1 ttl=50 time=192 ms
64 bytes from 9.3.40.155: icmp_seq=2 ttl=50 time=193 ms
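
To double-check that reading of the dump, the first 42 bytes can be parsed as an Ethernet header followed by an ARP payload. This is a minimal sketch using only the Python standard library; the helper names are illustrative, and only the bytes copied from the dump above are real data.

```python
import socket
import struct

# First 42 bytes of the "kernfs_node_cache" dump: 14-byte Ethernet header
# followed by a 28-byte ARP payload.
DUMP_PREFIX = """
ff ff ff ff ff ff 5c f3 fc 5f 3f 60 08 06 00 01
08 00 06 04 00 01 5c f3 fc 5f 3f 60 09 03 28 9b
00 00 00 00 00 00 09 03 28 96
"""
RAW = bytes.fromhex("".join(DUMP_PREFIX.split()))


def mac_to_str(raw):
    return ":".join(f"{b:02x}" for b in raw)


def parse_arp_frame(raw):
    """Split an Ethernet frame carrying IPv4-over-Ethernet ARP into fields."""
    dst, src, ethertype = struct.unpack("!6s6sH", raw[:14])
    (htype, ptype, hlen, plen, oper,
     sha, spa, tha, tpa) = struct.unpack("!HHBBH6s4s6s4s", raw[14:42])
    return {
        "dst_mac": mac_to_str(dst),
        "src_mac": mac_to_str(src),
        "ethertype": ethertype,   # 0x0806 means ARP
        "operation": oper,        # 1 means request, 2 means reply
        "sender_mac": mac_to_str(sha),
        "sender_ip": socket.inet_ntoa(spa),
        "target_ip": socket.inet_ntoa(tpa),
    }


for name, value in parse_arp_frame(RAW).items():
    print(f"{name}: {value}")
```

The parsed fields show a broadcast ARP request (ethertype 0x0806, operation 1) from 9.3.40.155 asking for 9.3.40.150, which matches the interpretation given in the comment.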

@gwshan

gwshan commented Mar 7, 2016

Writing 0x0 to MACCR (offset 0x50) should stop the MAC from receiving any frames. It's just a workaround. The final solution is to change U-Boot to do that before loading and booting the kernel image. I'm not sure who is responsible for that. If nobody is in charge of it, I can do it, given some more information:

(A) The U-Boot source code
(B) The command to flash the U-Boot image

@nkskjames

(A) We have our own fork of U-Boot: https://github.com/openbmc/u-boot/tree/v2013.07-aspeed-openbmc

Can we use the workaround until the final fix is ready? If so, can you open a pull request for it?

@shenki
Member Author

shenki commented Mar 29, 2016

These have been committed to our branch at https://github.com/openbmc/u-boot/tree/v2013.07-aspeed-openbmc as of openbmc/u-boot@fecb84a

@shenki shenki closed this as completed Mar 29, 2016
amboar pushed a commit to amboar/linux that referenced this issue Jul 12, 2016
If mvneta_mdio_probe() fails, a kernel warning is triggered due to
missing cleanup in the error path.  Add the necessary cleanup.

------------[ cut here ]------------
WARNING: CPU: 1 PID: 281 at kernel/irq/manage.c:1814 __free_percpu_irq+0xfc/0x130
percpu IRQ 38 still enabled on CPU0!
Modules linked in: bnep bluetooth xhci_plat_hcd xhci_hcd marvell_cesa armada_thermal des_generic ehci_orion mcp3021 spi_orion sfp mdio_i2c evbug fuse
CPU: 1 PID: 281 Comm: connmand Not tainted 4.7.0-rc2+ openbmc#53
Hardware name: Marvell Armada 380/385 (Device Tree)
Backtrace:
[<c0013488>] (dump_backtrace) from [<c00137d0>] (show_stack+0x18/0x1c)
 r6:60010093 r5:ffffffff r4:00000000 r3:dc8ba500
[<c00137b8>] (show_stack) from [<c02c6fe0>] (dump_stack+0xa4/0xdc)
[<c02c6f3c>] (dump_stack) from [<c002d4ec>] (__warn+0xd8/0x104)
 r6:c081e6a0 r5:00000000 r4:edfe5d50 r3:dc8ba500
[<c002d414>] (__warn) from [<c002d5d0>] (warn_slowpath_fmt+0x40/0x48)
 r10:a0010013 r8:c09356f8 r7:00000026 r6:ef11a260 r5:edd7b980 r4:ef11a200
[<c002d594>] (warn_slowpath_fmt) from [<c008c8e0>] (__free_percpu_irq+0xfc/0x130)
 r3:00000026 r2:c081e7ac
[<c008c7e4>] (__free_percpu_irq) from [<c008c95c>] (free_percpu_irq+0x48/0x74)
 r10:00008914 r8:00000000 r7:ffffffed r6:c09356f8 r5:00000026 r4:ef11a200
[<c008c914>] (free_percpu_irq) from [<c043dd70>] (mvneta_open+0x118/0x134)
 r6:ffffffed r5:ef01e640 r4:ef01e000 r3:ef01e000
[<c043dc58>] (mvneta_open) from [<c055f5b4>] (__dev_open+0xa4/0x108)
 r7:ef01e030 r6:c06ff3d8 r5:ffff9003 r4:ef01e000
[<c055f510>] (__dev_open) from [<c055f844>] (__dev_change_flags+0x94/0x150)
 r7:00001002 r6:00000001 r5:ffff9003 r4:ef01e000
[<c055f7b0>] (__dev_change_flags) from [<c055f938>] (dev_change_flags+0x20/0x50)
 r8:00000000 r7:c09334c8 r6:00001002 r5:00000148 r4:ef01e000 r3:00008914
[<c055f918>] (dev_change_flags) from [<c05de044>] (devinet_ioctl+0x6f4/0x7e0)
 r8:00000000 r7:c09334c8 r6:00000000 r5:ee87200c r4:00000000 r3:00008914
[<c05dd950>] (devinet_ioctl) from [<c05e0168>] (inet_ioctl+0x1b8/0x1c8)
 r10:beb4499c r9:edfe4000 r8:ecf13280 r7:c096cf00 r6:beb4499c r5:eef7c240
 r4:00008914
[<c05dffb0>] (inet_ioctl) from [<c053c898>] (sock_ioctl+0x78/0x300)
[<c053c820>] (sock_ioctl) from [<c0155ecc>] (do_vfs_ioctl+0x98/0xa60)
 r7:00000011 r6:00008914 r5:00000011 r4:c01568d0
[<c0155e34>] (do_vfs_ioctl) from [<c01568d0>] (SyS_ioctl+0x3c/0x60)
 r10:00000000 r9:edfe4000 r8:beb4499c r7:00000011 r6:00008914 r5:ecf13280
 r4:ecf13280
[<c0156894>] (SyS_ioctl) from [<c000fe60>] (ret_fast_syscall+0x0/0x1c)
 r8:c0010004 r7:00000036 r6:00000011 r5:000a2978 r4:00000000 r3:00009003
---[ end trace 711f625d5b04b3a7 ]---

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Tested-by: Jon Nettleton <jon@solid-run.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
shenki pushed a commit that referenced this issue Jun 5, 2017
commit 6f6266a upstream.

Reserving a runtime region results in splitting the EFI memory
descriptors for the runtime region. This results in runtime region
descriptors with bogus memory mappings, leading to interesting crashes
like the following during a kexec:

  general protection fault: 0000 [#1] SMP
  Modules linked in:
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.11.0-rc1 #53
  Hardware name: Wiwynn Leopard-Orv2/Leopard-DDR BW, BIOS LBM05   09/30/2016
  RIP: 0010:virt_efi_set_variable()
  ...
  Call Trace:
   efi_delete_dummy_variable()
   efi_enter_virtual_mode()
   start_kernel()
   ? set_init_arg()
   x86_64_start_reservations()
   x86_64_start_kernel()
   start_cpu()
  ...
  Kernel panic - not syncing: Fatal exception

Runtime regions will not be freed and do not need to be reserved, so
skip the memmap modification in this case.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Dave Young <dyoung@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Jones <pjones@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Fixes: 8e80632 ("efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()")
Link: http://lkml.kernel.org/r/20170412152719.9779-2-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
shenki pushed a commit that referenced this issue Feb 13, 2019
[ Upstream commit 6fa19f5 ]

syzbot was able to catch a bug in rds [1]

The issue here is that the socket might be found in a hash table
but that its refcount has already been set to 0 by another cpu.

We need to use refcount_inc_not_zero() to be safe here.
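
The "increment only if not already zero" idea can be sketched in userspace with C11 atomics. This is a minimal illustration, not the kernel's refcount_t implementation; `struct ref`, `ref_inc_not_zero()` and `ref_put()` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a reference counter. */
struct ref {
	atomic_uint count;
};

/* Take a reference only if the object is still live (count > 0). */
static bool ref_inc_not_zero(struct ref *r)
{
	unsigned int old = atomic_load(&r->count);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&r->count, &old, old + 1))
			return true;
	}
	return false;	/* already dropped to 0: caller must not use the object */
}

/* Drop a reference; returns true when the last reference went away. */
static bool ref_put(struct ref *r)
{
	unsigned int old = atomic_fetch_sub(&r->count, 1);

	assert(old > 0);
	return old == 1;
}
```

A lookup path would call `ref_inc_not_zero()` on the object it found and treat a `false` return as "not found", instead of unconditionally incrementing.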

[1]

refcount_t: increment on 0; use-after-free.
WARNING: CPU: 1 PID: 23129 at lib/refcount.c:153 refcount_inc_checked lib/refcount.c:153 [inline]
WARNING: CPU: 1 PID: 23129 at lib/refcount.c:153 refcount_inc_checked+0x61/0x70 lib/refcount.c:151
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 23129 Comm: syz-executor3 Not tainted 5.0.0-rc4+ #53
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1db/0x2d0 lib/dump_stack.c:113
 panic+0x2cb/0x65c kernel/panic.c:214
 __warn.cold+0x20/0x48 kernel/panic.c:571
 report_bug+0x263/0x2b0 lib/bug.c:186
 fixup_bug arch/x86/kernel/traps.c:178 [inline]
 fixup_bug arch/x86/kernel/traps.c:173 [inline]
 do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
 do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:290
 invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:973
RIP: 0010:refcount_inc_checked lib/refcount.c:153 [inline]
RIP: 0010:refcount_inc_checked+0x61/0x70 lib/refcount.c:151
Code: 1d 51 63 c8 06 31 ff 89 de e8 eb 1b f2 fd 84 db 75 dd e8 a2 1a f2 fd 48 c7 c7 60 9f 81 88 c6 05 31 63 c8 06 01 e8 af 65 bb fd <0f> 0b eb c1 90 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 54 49
RSP: 0018:ffff8880a0cbf1e8 EFLAGS: 00010282
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffc90006113000
RDX: 000000000001047d RSI: ffffffff81685776 RDI: 0000000000000005
RBP: ffff8880a0cbf1f8 R08: ffff888097c9e100 R09: ffffed1015ce5021
R10: ffffed1015ce5020 R11: ffff8880ae728107 R12: ffff8880723c20c0
R13: ffff8880723c24b0 R14: dffffc0000000000 R15: ffffed1014197e64
 sock_hold include/net/sock.h:647 [inline]
 rds_sock_addref+0x19/0x20 net/rds/af_rds.c:675
 rds_find_bound+0x97c/0x1080 net/rds/bind.c:82
 rds_recv_incoming+0x3be/0x1430 net/rds/recv.c:362
 rds_loop_xmit+0xf3/0x2a0 net/rds/loop.c:96
 rds_send_xmit+0x1355/0x2a10 net/rds/send.c:355
 rds_sendmsg+0x323c/0x44e0 net/rds/send.c:1368
 sock_sendmsg_nosec net/socket.c:621 [inline]
 sock_sendmsg+0xdd/0x130 net/socket.c:631
 __sys_sendto+0x387/0x5f0 net/socket.c:1788
 __do_sys_sendto net/socket.c:1800 [inline]
 __se_sys_sendto net/socket.c:1796 [inline]
 __x64_sys_sendto+0xe1/0x1a0 net/socket.c:1796
 do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x458089
Code: 6d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fc266df8c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 0000000000458089
RDX: 0000000000000000 RSI: 00000000204b3fff RDI: 0000000000000005
RBP: 000000000073bf00 R08: 00000000202b4000 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fc266df96d4
R13: 00000000004c56e4 R14: 00000000004d94a8 R15: 00000000ffffffff

Fixes: cc4dfb7 ("rds: fix two RCU related problems")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Cc: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Cc: rds-devel@oss.oracle.com
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
shenki pushed a commit that referenced this issue Mar 4, 2019
commit 70ed714 upstream.

KASAN detects a use-after-free when vop devices are removed.

This problem was introduced by commit 0063e8b ("virtio_vop:
don't kfree device on register failure").  That patch moved the freeing
of the struct _vop_vdev to the release function, but failed to ensure
that vop holds a reference to the device when it doesn't want it to go
away.  A kfree() was replaced with a put_device() in the unregistration
path, but the last reference to the device is already dropped in
unregister_virtio_device() so the struct is freed before vop is done
with it.

Fix it by holding a reference until cleanup is done.  This is similar to
the fix in virtio_pci in commit 2989be0 ("virtio_pci: fix use
after free on release").
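
The pattern of holding a reference across unregistration can be modelled in plain C. This is a toy sketch only: every `toy_*` name is hypothetical, and the real driver uses the kernel's device reference counting rather than this hand-rolled counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy device whose memory is freed by its release path. */
struct toy_dev {
	int refs;
	int id;
	bool *released;	/* set on release so a caller can observe it */
};

static struct toy_dev *toy_register(int id, bool *released)
{
	struct toy_dev *d = malloc(sizeof(*d));

	if (!d)
		return NULL;
	d->refs = 1;	/* reference owned by the registration */
	d->id = id;
	d->released = released;
	return d;
}

static void toy_get(struct toy_dev *d)
{
	d->refs++;
}

static void toy_put(struct toy_dev *d)
{
	assert(d->refs > 0);
	if (--d->refs == 0) {
		*d->released = true;
		free(d);
	}
}

/* Unregistering drops the registration's reference. */
static void toy_unregister(struct toy_dev *d)
{
	toy_put(d);
}

/* Remove path: keep our own reference until cleanup is finished. */
static int toy_remove(struct toy_dev *d)
{
	int id;

	toy_get(d);		/* d cannot be freed under us now */
	toy_unregister(d);	/* without toy_get() this would free d */
	id = d->id;		/* cleanup that still touches d */
	toy_put(d);		/* last reference: release runs here */
	return id;
}
```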

 ==================================================================
 BUG: KASAN: use-after-free in vop_scan_devices+0xc6c/0xe50 [vop]
 Read of size 8 at addr ffff88800da18580 by task kworker/0:1/12

 CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.0.0-rc4+ #53
 Workqueue: events vop_hotplug_devices [vop]
 Call Trace:
  dump_stack+0x74/0xbb
  print_address_description+0x5d/0x2b0
  ? vop_scan_devices+0xc6c/0xe50 [vop]
  kasan_report+0x152/0x1aa
  ? vop_scan_devices+0xc6c/0xe50 [vop]
  ? vop_scan_devices+0xc6c/0xe50 [vop]
  vop_scan_devices+0xc6c/0xe50 [vop]
  ? vop_loopback_free_irq+0x160/0x160 [vop_loopback]
  process_one_work+0x7c0/0x14b0
  ? pwq_dec_nr_in_flight+0x2d0/0x2d0
  ? do_raw_spin_lock+0x120/0x280
  worker_thread+0x8f/0xbf0
  ? __kthread_parkme+0x78/0xf0
  ? process_one_work+0x14b0/0x14b0
  kthread+0x2ae/0x3a0
  ? kthread_park+0x120/0x120
  ret_from_fork+0x3a/0x50

 Allocated by task 12:
  kmem_cache_alloc_trace+0x13a/0x2a0
  vop_scan_devices+0x473/0xe50 [vop]
  process_one_work+0x7c0/0x14b0
  worker_thread+0x8f/0xbf0
  kthread+0x2ae/0x3a0
  ret_from_fork+0x3a/0x50

 Freed by task 12:
  kfree+0x104/0x310
  device_release+0x73/0x1d0
  kobject_put+0x14f/0x420
  unregister_virtio_device+0x32/0x50
  vop_scan_devices+0x19d/0xe50 [vop]
  process_one_work+0x7c0/0x14b0
  worker_thread+0x8f/0xbf0
  kthread+0x2ae/0x3a0
  ret_from_fork+0x3a/0x50

 The buggy address belongs to the object at ffff88800da18008
  which belongs to the cache kmalloc-2k of size 2048
 The buggy address is located 1400 bytes inside of
  2048-byte region [ffff88800da18008, ffff88800da18808)
 The buggy address belongs to the page:
 page:ffffea0000368600 count:1 mapcount:0 mapping:ffff88801440dbc0 index:0x0 compound_mapcount: 0
 flags: 0x4000000000010200(slab|head)
 raw: 4000000000010200 ffffea0000378608 ffffea000037a008 ffff88801440dbc0
 raw: 0000000000000000 00000000000d000d 00000001ffffffff 0000000000000000
 page dumped because: kasan: bad access detected

 Memory state around the buggy address:
  ffff88800da18480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff88800da18500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 >ffff88800da18580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                    ^
  ffff88800da18600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff88800da18680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ==================================================================

Fixes: 0063e8b ("virtio_vop: don't kfree device on register failure")
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
shenki pushed a commit that referenced this issue May 5, 2020
commit 351cbf6 upstream.

Zygo reported the following lockdep splat while testing the balance
patches:

======================================================
WARNING: possible circular locking dependency detected
5.6.0-c6f0579d496a+ #53 Not tainted
------------------------------------------------------
kswapd0/1133 is trying to acquire lock:
ffff888092f622c0 (&delayed_node->mutex){+.+.}, at: __btrfs_release_delayed_node+0x7c/0x5b0

but task is already holding lock:
ffffffff8fc5f860 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (fs_reclaim){+.+.}:
       fs_reclaim_acquire.part.91+0x29/0x30
       fs_reclaim_acquire+0x19/0x20
       kmem_cache_alloc_trace+0x32/0x740
       add_block_entry+0x45/0x260
       btrfs_ref_tree_mod+0x6e2/0x8b0
       btrfs_alloc_tree_block+0x789/0x880
       alloc_tree_block_no_bg_flush+0xc6/0xf0
       __btrfs_cow_block+0x270/0x940
       btrfs_cow_block+0x1ba/0x3a0
       btrfs_search_slot+0x999/0x1030
       btrfs_insert_empty_items+0x81/0xe0
       btrfs_insert_delayed_items+0x128/0x7d0
       __btrfs_run_delayed_items+0xf4/0x2a0
       btrfs_run_delayed_items+0x13/0x20
       btrfs_commit_transaction+0x5cc/0x1390
       insert_balance_item.isra.39+0x6b2/0x6e0
       btrfs_balance+0x72d/0x18d0
       btrfs_ioctl_balance+0x3de/0x4c0
       btrfs_ioctl+0x30ab/0x44a0
       ksys_ioctl+0xa1/0xe0
       __x64_sys_ioctl+0x43/0x50
       do_syscall_64+0x77/0x2c0
       entry_SYSCALL_64_after_hwframe+0x49/0xbe

-> #0 (&delayed_node->mutex){+.+.}:
       __lock_acquire+0x197e/0x2550
       lock_acquire+0x103/0x220
       __mutex_lock+0x13d/0xce0
       mutex_lock_nested+0x1b/0x20
       __btrfs_release_delayed_node+0x7c/0x5b0
       btrfs_remove_delayed_node+0x49/0x50
       btrfs_evict_inode+0x6fc/0x900
       evict+0x19a/0x2c0
       dispose_list+0xa0/0xe0
       prune_icache_sb+0xbd/0xf0
       super_cache_scan+0x1b5/0x250
       do_shrink_slab+0x1f6/0x530
       shrink_slab+0x32e/0x410
       shrink_node+0x2a5/0xba0
       balance_pgdat+0x4bd/0x8a0
       kswapd+0x35a/0x800
       kthread+0x1e9/0x210
       ret_from_fork+0x3a/0x50

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(fs_reclaim);
                               lock(&delayed_node->mutex);
                               lock(fs_reclaim);
  lock(&delayed_node->mutex);

 *** DEADLOCK ***

3 locks held by kswapd0/1133:
 #0: ffffffff8fc5f860 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30
 #1: ffffffff8fc380d8 (shrinker_rwsem){++++}, at: shrink_slab+0x1e8/0x410
 #2: ffff8881e0e6c0e8 (&type->s_umount_key#42){++++}, at: trylock_super+0x1b/0x70

stack backtrace:
CPU: 2 PID: 1133 Comm: kswapd0 Not tainted 5.6.0-c6f0579d496a+ #53
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
Call Trace:
 dump_stack+0xc1/0x11a
 print_circular_bug.isra.38.cold.57+0x145/0x14a
 check_noncircular+0x2a9/0x2f0
 ? print_circular_bug.isra.38+0x130/0x130
 ? stack_trace_consume_entry+0x90/0x90
 ? save_trace+0x3cc/0x420
 __lock_acquire+0x197e/0x2550
 ? btrfs_inode_clear_file_extent_range+0x9b/0xb0
 ? register_lock_class+0x960/0x960
 lock_acquire+0x103/0x220
 ? __btrfs_release_delayed_node+0x7c/0x5b0
 __mutex_lock+0x13d/0xce0
 ? __btrfs_release_delayed_node+0x7c/0x5b0
 ? __asan_loadN+0xf/0x20
 ? pvclock_clocksource_read+0xeb/0x190
 ? __btrfs_release_delayed_node+0x7c/0x5b0
 ? mutex_lock_io_nested+0xc20/0xc20
 ? __kasan_check_read+0x11/0x20
 ? check_chain_key+0x1e6/0x2e0
 mutex_lock_nested+0x1b/0x20
 ? mutex_lock_nested+0x1b/0x20
 __btrfs_release_delayed_node+0x7c/0x5b0
 btrfs_remove_delayed_node+0x49/0x50
 btrfs_evict_inode+0x6fc/0x900
 ? btrfs_setattr+0x840/0x840
 ? do_raw_spin_unlock+0xa8/0x140
 evict+0x19a/0x2c0
 dispose_list+0xa0/0xe0
 prune_icache_sb+0xbd/0xf0
 ? invalidate_inodes+0x310/0x310
 super_cache_scan+0x1b5/0x250
 do_shrink_slab+0x1f6/0x530
 shrink_slab+0x32e/0x410
 ? do_shrink_slab+0x530/0x530
 ? do_shrink_slab+0x530/0x530
 ? __kasan_check_read+0x11/0x20
 ? mem_cgroup_protected+0x13d/0x260
 shrink_node+0x2a5/0xba0
 balance_pgdat+0x4bd/0x8a0
 ? mem_cgroup_shrink_node+0x490/0x490
 ? _raw_spin_unlock_irq+0x27/0x40
 ? finish_task_switch+0xce/0x390
 ? rcu_read_lock_bh_held+0xb0/0xb0
 kswapd+0x35a/0x800
 ? _raw_spin_unlock_irqrestore+0x4c/0x60
 ? balance_pgdat+0x8a0/0x8a0
 ? finish_wait+0x110/0x110
 ? __kasan_check_read+0x11/0x20
 ? __kthread_parkme+0xc6/0xe0
 ? balance_pgdat+0x8a0/0x8a0
 kthread+0x1e9/0x210
 ? kthread_create_worker_on_cpu+0xc0/0xc0
 ret_from_fork+0x3a/0x50

This is because we hold that delayed node's mutex while doing tree
operations.  Fix this by just wrapping the searches in nofs.
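
A toy userspace model of a scoped "nofs" section follows. It assumes the fix uses the save/restore pattern of the kernel's memalloc_nofs_save() and memalloc_nofs_restore(); the commit message itself only says the searches are wrapped in nofs. All `toy_*` names and the flag value are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NOFS 0x1u	/* hypothetical per-task "no fs reclaim" flag */

static unsigned int toy_task_flags;

/* Enter a nofs scope; returns the previous state so scopes can nest. */
static unsigned int toy_nofs_save(void)
{
	unsigned int old = toy_task_flags & TOY_NOFS;

	toy_task_flags |= TOY_NOFS;
	return old;
}

/* Leave a nofs scope, restoring whatever state the matching save saw. */
static void toy_nofs_restore(unsigned int old)
{
	assert((old & ~TOY_NOFS) == 0);
	toy_task_flags = (toy_task_flags & ~TOY_NOFS) | old;
}

/* Would an allocation made right now be allowed to recurse into the fs? */
static bool toy_alloc_may_enter_fs(void)
{
	return !(toy_task_flags & TOY_NOFS);
}

/* A search done with the delayed node's mutex held. */
static bool toy_search_with_mutex_held(void)
{
	unsigned int nofs = toy_nofs_save();
	bool may_recurse = toy_alloc_may_enter_fs();	/* allocations happen here */

	toy_nofs_restore(nofs);
	return may_recurse;
}
```

Because the previous state is saved and restored, the wrapper behaves correctly whether or not the caller is already inside a nofs scope.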

CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
shenki pushed a commit that referenced this issue Aug 10, 2020
commit 51415b6 upstream.

[BUG]
When balance is canceled, there is a pretty high chance that unmounting
the fs leads to a NULL pointer dereference:

  BTRFS warning (device dm-3): page private not zero on page 223158272
  ...
  BTRFS warning (device dm-3): page private not zero on page 223162368
  BTRFS error (device dm-3): leaked root 18446744073709551608-304 refcount 1
  BUG: kernel NULL pointer dereference, address: 0000000000000168
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: 0000 [#1] PREEMPT SMP NOPTI
  CPU: 2 PID: 5793 Comm: umount Tainted: G           O      5.7.0-rc5-custom+ #53
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  RIP: 0010:__lock_acquire+0x5dc/0x24c0
  Call Trace:
   lock_acquire+0xab/0x390
   _raw_spin_lock+0x39/0x80
   btrfs_release_extent_buffer_pages+0xd7/0x200 [btrfs]
   release_extent_buffer+0xb2/0x170 [btrfs]
   free_extent_buffer+0x66/0xb0 [btrfs]
   btrfs_put_root+0x8e/0x130 [btrfs]
   btrfs_check_leaked_roots.cold+0x5/0x5d [btrfs]
   btrfs_free_fs_info+0xe5/0x120 [btrfs]
   btrfs_kill_super+0x1f/0x30 [btrfs]
   deactivate_locked_super+0x3b/0x80
   deactivate_super+0x3e/0x50
   cleanup_mnt+0x109/0x160
   __cleanup_mnt+0x12/0x20
   task_work_run+0x67/0xa0
   exit_to_usermode_loop+0xc5/0xd0
   syscall_return_slowpath+0x205/0x360
   do_syscall_64+0x6e/0xb0
   entry_SYSCALL_64_after_hwframe+0x49/0xb3
  RIP: 0033:0x7fd028ef740b

[CAUSE]
When balance is canceled, all reloc roots are marked as orphan, and
orphan reloc roots are going to be cleaned up.

However, for orphan reloc roots and merged reloc roots, their lifespans
are quite different:

	Merged reloc roots	|	Orphan reloc roots by cancel
--------------------------------------------------------------------
create_reloc_root()		| create_reloc_root()
|- refs == 1			| |- refs == 1
				|
btrfs_grab_root(reloc_root);	| btrfs_grab_root(reloc_root);
|- refs == 2			| |- refs == 2
				|
root->reloc_root = reloc_root;	| root->reloc_root = reloc_root;
		>>> No difference so far <<<
				|
prepare_to_merge()		| prepare_to_merge()
|- btrfs_set_root_refs(item, 1);| |- if (!err) (err == -EINTR)
				|
merge_reloc_roots()		| merge_reloc_roots()
|- merge_reloc_root()		| |- Doing nothing to put reloc root
   |- insert_dirty_subvol()	| |- refs == 2
      |- __del_reloc_root()	|
         |- btrfs_put_root()	|
            |- refs == 1	|
		>>> Now orphan reloc roots still have refs 2 <<<
				|
clean_dirty_subvols()		| clean_dirty_subvols()
|- btrfs_drop_snapshot()	| |- btrfs_drop_snapshot()
   |- reloc_root get freed	|    |- reloc_root still has refs 2
				|	related ebs get freed, but
				|	reloc_root still recorded in
				|	allocated_roots
btrfs_check_leaked_roots()	| btrfs_check_leaked_roots()
|- No leaked roots		| |- Leaked reloc_roots detected
				| |- btrfs_put_root()
				|    |- free_extent_buffer(root->node);
				|       |- eb already freed, caused NULL
				|	   pointer dereference

[FIX]
The fix is to clear fs_root->reloc_root and put it at
merge_reloc_roots() time, so that we won't leak reloc roots.

Fixes: d2311e6 ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots")
CC: stable@vger.kernel.org # 5.1+
Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[Manually solve the conflicts due to no btrfs root refs rework]
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
shenki pushed a commit that referenced this issue Jul 26, 2021
[ Upstream commit 05cf8ff ]

The to_ti_syscon_reset_data macro currently only works if the
parameter passed into it is called 'rcdev'.

Fixes a checkpatch --strict issue:

  CHECK: Macro argument reuse 'rcdev' - possible side-effects?
  #53: FILE: drivers/reset/reset-ti-syscon.c:53:
  +#define to_ti_syscon_reset_data(rcdev)	\
  +	container_of(rcdev, struct ti_syscon_reset_data, rcdev)
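
A userspace sketch of why the parameter name matters, and one way to avoid the clash. `container_of` is redefined locally here, the struct fields are hypothetical, and the `_rcdev` parameter name is an assumption rather than a quote from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Local userspace equivalent of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct reset_controller_dev {
	int nr_resets;		/* hypothetical field */
};

struct ti_syscon_reset_data {
	int id;			/* hypothetical field */
	struct reset_controller_dev rcdev;
};

/*
 * With a parameter named 'rcdev', the member name in the expansion is
 * substituted too, so to_ti_syscon_reset_data(p) would look for a member
 * called 'p'. Giving the parameter a distinct name avoids that.
 */
#define to_ti_syscon_reset_data(_rcdev) \
	container_of(_rcdev, struct ti_syscon_reset_data, rcdev)

/* Works with any argument name, here 'dev'. */
static int toy_reset_id(struct reset_controller_dev *dev)
{
	assert(dev != NULL);
	return to_ti_syscon_reset_data(dev)->id;
}
```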

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
shenki pushed a commit that referenced this issue Feb 17, 2022
[ Upstream commit c0bf3d8 ]

We encountered a crash in smc_setsockopt() and it is caused by
accessing smc->clcsock after clcsock was released.

 BUG: kernel NULL pointer dereference, address: 0000000000000020
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: 0000 [#1] PREEMPT SMP PTI
 CPU: 1 PID: 50309 Comm: nginx Kdump: loaded Tainted: G E     5.16.0-rc4+ #53
 RIP: 0010:smc_setsockopt+0x59/0x280 [smc]
 Call Trace:
  <TASK>
  __sys_setsockopt+0xfc/0x190
  __x64_sys_setsockopt+0x20/0x30
  do_syscall_64+0x34/0x90
  entry_SYSCALL_64_after_hwframe+0x44/0xae
 RIP: 0033:0x7f16ba83918e
  </TASK>

This patch tries to fix it by holding clcsock_release_lock and
checking whether clcsock has already been released before access.

To prevent the same kind of crash in smc_getsockopt() or
smc_switch_to_fallback(), this patch also checks smc->clcsock in
them. The caller of smc_switch_to_fallback() determines whether the
fallback succeeded from the return value.
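
The check-under-lock pattern described above can be sketched in userspace with a pthread mutex. The `toy_*` types and functions are hypothetical, and the `-EBADF` return value is an assumption chosen for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

struct toy_clcsock {
	int opt;
};

struct toy_smc {
	pthread_mutex_t clcsock_release_lock;
	struct toy_clcsock *clcsock;	/* NULL once released */
};

/* Release path: clears the pointer under the lock. */
static void toy_release(struct toy_smc *smc)
{
	pthread_mutex_lock(&smc->clcsock_release_lock);
	smc->clcsock = NULL;
	pthread_mutex_unlock(&smc->clcsock_release_lock);
}

/* Access path: only touch clcsock while holding the lock and after
 * confirming it has not been released. */
static int toy_setsockopt(struct toy_smc *smc, int val)
{
	int rc = 0;

	assert(smc != NULL);
	pthread_mutex_lock(&smc->clcsock_release_lock);
	if (!smc->clcsock) {
		rc = -EBADF;	/* assumed error code */
		goto out;
	}
	smc->clcsock->opt = val;
out:
	pthread_mutex_unlock(&smc->clcsock_release_lock);
	return rc;
}
```

Holding the lock across both the check and the access is what closes the window in which the release path could clear the pointer between the two.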

Fixes: fd57770 ("net/smc: wait for pending work before clcsock release_sock")
Link: https://lore.kernel.org/lkml/5dd7ffd1-28e2-24cc-9442-1defec27375e@linux.ibm.com/T/
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Acked-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
shenki pushed a commit that referenced this issue Aug 18, 2022
commit 2af89eb upstream.

coresight devices track their connections (output connections) and
hold a reference to the fwnode. When a device goes away, we walk through
the devices on the coresight bus and make sure that the references
are dropped. This happens both ways:
 a) For all output connections from the device, drop the reference to
    the target device via coresight_release_platform_data()

 b) Iterate over all the devices on the coresight bus and drop the
    reference to fwnode if *this* device is the target of the output
    connection, via coresight_remove_conns()->coresight_remove_match().

However, coresight_remove_match() doesn't clear the fwnode field
after dropping the reference. This causes a use-after-free and
additional refcount drops on the fwnode.
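
A toy model of the "drop the reference, then clear the pointer" pattern follows. All `toy_*` names are hypothetical, and the plain counter stands in for the real fwnode reference counting.

```c
#include <assert.h>
#include <stddef.h>

struct toy_node {
	int refs;
};

struct toy_conn {
	struct toy_node *child_fwnode;	/* reference held by the connection */
};

static void toy_node_put(struct toy_node *n)
{
	if (!n)
		return;
	assert(n->refs > 0);	/* a second drop would trip this */
	n->refs--;
}

/* Target device goes away: drop the reference and forget the pointer. */
static void toy_remove_match(struct toy_conn *c)
{
	toy_node_put(c->child_fwnode);
	c->child_fwnode = NULL;	/* the step the commit message says was missing */
}

/* Source device goes away later: nothing left to drop. */
static void toy_release_platform_data(struct toy_conn *c)
{
	toy_node_put(c->child_fwnode);
	c->child_fwnode = NULL;
}
```

With the pointer cleared in the first path, the second path sees NULL and the reference is dropped exactly once.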

For example, take two devices, A and B, with a connection A -> B.
If B is removed first, coresight_remove_match() drops A's reference
to B. But A's connection still has its fwnode pointing to B, so when
A is removed it tries to drop the reference again in
coresight_release_platform_data(), triggering a warning like:

[   91.990153] ------------[ cut here ]------------
[   91.990163] refcount_t: addition on 0; use-after-free.
[   91.990212] WARNING: CPU: 0 PID: 461 at lib/refcount.c:25 refcount_warn_saturate+0xa0/0x144
[   91.990260] Modules linked in: coresight_funnel coresight_replicator coresight_etm4x(-)
 crct10dif_ce coresight ip_tables x_tables ipv6 [last unloaded: coresight_cpu_debug]
[   91.990398] CPU: 0 PID: 461 Comm: rmmod Tainted: G        W       T 5.19.0-rc2+ #53
[   91.990418] Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform, BIOS EDK II Feb  1 2019
[   91.990434] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   91.990454] pc : refcount_warn_saturate+0xa0/0x144
[   91.990476] lr : refcount_warn_saturate+0xa0/0x144
[   91.990496] sp : ffff80000c843640
[   91.990509] x29: ffff80000c843640 x28: ffff800009957c28 x27: ffff80000c8439a8
[   91.990560] x26: ffff00097eff1990 x25: ffff8000092b6ad8 x24: ffff00097eff19a8
[   91.990610] x23: ffff80000c8439a8 x22: 0000000000000000 x21: ffff80000c8439c2
[   91.990659] x20: 0000000000000000 x19: ffff00097eff1a10 x18: ffff80000ab99c40
[   91.990708] x17: 0000000000000000 x16: 0000000000000000 x15: ffff80000abf6fa0
[   91.990756] x14: 000000000000001d x13: 0a2e656572662d72 x12: 657466612d657375
[   91.990805] x11: 203b30206e6f206e x10: 6f69746964646120 x9 : ffff8000081aba28
[   91.990854] x8 : 206e6f206e6f6974 x7 : 69646461203a745f x6 : 746e756f63666572
[   91.990903] x5 : ffff00097648ec58 x4 : 0000000000000000 x3 : 0000000000000027
[   91.990952] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff00080260ba00
[   91.991000] Call trace:
[   91.991012]  refcount_warn_saturate+0xa0/0x144
[   91.991034]  kobject_get+0xac/0xb0
[   91.991055]  of_node_get+0x2c/0x40
[   91.991076]  of_fwnode_get+0x40/0x60
[   91.991094]  fwnode_handle_get+0x3c/0x60
[   91.991116]  fwnode_get_nth_parent+0xf4/0x110
[   91.991137]  fwnode_full_name_string+0x48/0xc0
[   91.991158]  device_node_string+0x41c/0x530
[   91.991178]  pointer+0x320/0x3ec
[   91.991198]  vsnprintf+0x23c/0x750
[   91.991217]  vprintk_store+0x104/0x4b0
[   91.991238]  vprintk_emit+0x8c/0x360
[   91.991257]  vprintk_default+0x44/0x50
[   91.991276]  vprintk+0xcc/0xf0
[   91.991295]  _printk+0x68/0x90
[   91.991315]  of_node_release+0x13c/0x14c
[   91.991334]  kobject_put+0x98/0x114
[   91.991354]  of_node_put+0x24/0x34
[   91.991372]  of_fwnode_put+0x40/0x5c
[   91.991390]  fwnode_handle_put+0x38/0x50
[   91.991411]  coresight_release_platform_data+0x74/0xb0 [coresight]
[   91.991472]  coresight_unregister+0x64/0xcc [coresight]
[   91.991525]  etm4_remove_dev+0x64/0x78 [coresight_etm4x]
[   91.991563]  etm4_remove_amba+0x1c/0x2c [coresight_etm4x]
[   91.991598]  amba_remove+0x3c/0x19c

To reproduce, build all coresight components as modules and run:

  #!/bin/sh
  while true
  do
     for m in tmc stm cpu_debug etm4x replicator funnel
     do
     	modprobe coresight_${m}
     done

     for m in tmc stm cpu_debug etm4x replicator funnel
     do
     	rmmod coresight_${m}
     done
  done

Cc: stable@vger.kernel.org
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Fixes: 37ea1ff ("coresight: Use fwnode handle instead of device names")
Link: https://lore.kernel.org/r/20220614214024.3005275-1-suzuki.poulose@arm.com
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
amboar pushed a commit to amboar/linux that referenced this issue Apr 15, 2024
When unregistering PD capabilities in tcpm, KASAN reports the double-free
below. The root cause is that the same capability is kfreed twice:
first by pd_capabilities_release(), and then explicitly by
tcpm_port_unregister_pd().

[    3.988059] BUG: KASAN: double-free in tcpm_port_unregister_pd+0x1a4/0x3dc
[    3.995001] Free of addr ffff0008164d3000 by task kworker/u16:0/10
[    4.001206]
[    4.002712] CPU: 2 PID: 10 Comm: kworker/u16:0 Not tainted 6.8.0-rc5-next-20240220-05616-g52728c567a55 #53
[    4.012402] Hardware name: Freescale i.MX8QXP MEK (DT)
[    4.017569] Workqueue: events_unbound deferred_probe_work_func
[    4.023456] Call trace:
[    4.025920]  dump_backtrace+0x94/0xec
[    4.029629]  show_stack+0x18/0x24
[    4.032974]  dump_stack_lvl+0x78/0x90
[    4.036675]  print_report+0xfc/0x5c0
[    4.040289]  kasan_report_invalid_free+0xa0/0xc0
[    4.044937]  __kasan_slab_free+0x124/0x154
[    4.049072]  kfree+0xb4/0x1e8
[    4.052069]  tcpm_port_unregister_pd+0x1a4/0x3dc
[    4.056725]  tcpm_register_port+0x1dd0/0x2558
[    4.061121]  tcpci_register_port+0x420/0x71c
[    4.065430]  tcpci_probe+0x118/0x2e0

To fix the issue, remove the kfree() from tcpm_port_unregister_pd().

Fixes: cd099cd ("usb: typec: tcpm: Support multiple capabilities")
cc: stable@vger.kernel.org
Suggested-by: Aisheng Dong <aisheng.dong@nxp.com>
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://lore.kernel.org/r/20240311065219.777037-1-xu.yang_2@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>