Kernel OOPS at boot and shutdown #53

shenki · 2016-02-24T03:39:49Z

Reported by Doug:

There are some problems getting the system to boot successfully; I copied some console
data to the attached file. Two out of six boot attempts made it to the login prompt, and
power was cycled between each attempt.

[  OK  ] Started Create Volatile Files and Directories.
[  OK  ] Started udev Kernel Device Manager.
         Starting Network Time Synchronization...
         Starting Update UTMP about System Boot/Shutdown...
Unable to handle kernel paging request at virtual address ffffffff
pgd = cfb20000
[ffffffff] *pgd=4fffd871, *pte=00000000, *ppte=00000000
Internal error: Oops: 1 [#1] ARM
Modules linked in:
CPU: 0 PID: 512 Comm: kworker/0:2 Not tainted 4.3.5-openbmc-20160212-1 #1
Hardware name: ASpeed SoC
Workqueue: events cache_reap
task: cfa3ac00 ti: ce444000 task.ti: ce444000
PC is at drain_array+0x18/0xec
LR is at cache_reap+0x54/0x114
pc : [<c0087d8c>]    lr : [<c0088060>]    psr: a0000013
sp : ce445ec8  ip : 00000000  fp : 00000008
r10: c05084d8  r9 : c050b7a0  r8 : c0509f00
r7 : ce445ed0  r6 : c050b51c  r5 : cf9c9400  r4 : ffffffff
r3 : 00000000  r2 : ffffffff  r1 : cf9c9400  r0 : cf856b20

[  OK  ] Started Create Static Device Nodes in /dev.
systemd-journald[522]: Received request to flush runtime journal from PID 1
[  OK  ] Reached target Local File Systems (Pre).
         Mounting /var/volatile...
         Starting udev Kernel Device Manager...
[  OK  ] Started Flush Journal to Persistent Storage.
[  OK  ] Mounted /var/volatile.
[  OK  ] Started udev Coldplug all Devices.
[  OK  ] Reached target Local File Systems.
Unable to handle kernel paging request at virtual address ffffffff
pgd = cdcd0000
[ffffffff] *pgd=4fffd871, *pte=00000000, *ppte=00000000
Internal error: Oops: 1 [#1] ARM
Modules linked in:
CPU: 0 PID: 1 Comm: systemd Not tainted 4.3.5-openbmc-20160212-1 #1
Hardware name: ASpeed SoC
task: cf810aa0 ti: cf826000 task.ti: cf826000
PC is at kfree+0x44/0xa8
LR is at remove_proc_subtree+0xc8/0xd4
pc : [<c0088164>]    lr : [<c00d4900>]    psr: 40000093
sp : cf827e90  ip : cf827e98  fp : be969770
r10: cdca12c8  r9 : cdc8e630  r8 : 00000000
r7 : cf800100  r6 : a0000013  r5 : ce45f580  r4 : ffffffff
r3 : cffc1be0  r2 : 00000080  r1 : cfdf9000  r0 : ce45f580
Flags: nZcv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 0005317f  Table: 4dcd0000  DAC: 00000051
Process systemd (pid: 1, stack limit = 0xcf826190)

From Shutdown:

+ umount /oldroot/dev/pts
+ umount /oldroot/dev
+ umount /oldroot
------------[ cut here ]------------
Kernel BUG at c005ec10 [verbose debug info unavailable]
Internal error: Oops - BUG: 0 [#1] ARM
Modules linked in:
CPU: 0 PID: 1095 Comm: umount Tainted: G    B           4.3.5-openbmc-20160212-1 #1
Hardware name: ASpeed SoC
task: cf994860 ti: cde32000 task.ti: cde32000
PC is at __delete_from_page_cache+0x204/0x2a8
LR is at __delete_from_page_cache+0x11c/0x2a8
pc : [<c005ec10>]    lr : [<c005eb28>]    psr: 60000093
sp : cde33df0  ip : c0525400  fp : 00000000
r10: 00000001  r9 : cf4852d8  r8 : 00000003
r7 : 00000000  r6 : 00000000  r5 : cf4852d4  r4 : cffadb20
r3 : 00000000  r2 : 000002bc  r1 : c053a334  r0 : 00003602
Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 0005317f  Table: 4de14000  DAC: 00000051
Process umount (pid: 1095, stack limit = 0xcde32190)
Stack: (0xcde33df0 to 0xcde34000)
3de0:                                     00000000 0000000a cf48cd40 cf48cd5c
3e00: cde33ea8 cffadb20 00000000 00000013 cf4852d4 00000001 00000000 00000001
3e20: cffadb20 c005ecf0 cffadb20 cf4852d4 00000001 c0069428 00001000 00000000
3e40: 00000000 cffadba0 00000000 00000000 ffffffff 00000000 cf4852d4 c00695cc
3e60: cde33e68 00000000 00000000 00000001 00000002 00000003 00000004 00000005
3e80: 00000006 00000007 00000008 00000009 0000000a 0000000b 0000000c 0000000d
3ea0: 0000000a 00000000 cffadba0 cffadb20 cffadbe0 cffadc00 cffadc20 cffadc40
3ec0: cffadc60 cffadc80 cffadca0 cffadcc0 cffad8c0 cffad8a0 cffad880 cffad860
3ee0: c000a424 ffffffff ffffffff cde32000 cdcf6800 c000a424 cde32000 00000000
3f00: bedaaecc c0069990 ffffffff ffffffff 00000000 cf485218 c037d180 c00a1038
3f20: cde32000 cde33f38 cf7b5a38 c00a1100 cdcf69fc c00a1d3c cf48513c cf7b5abc
3f40: 00000000 cdcf6800 c037d180 c052e62c 00000034 c008e5b0 00000000 cf403b80
3f60: 00000081 c008e810 cdcf6800 c050c548 c052e62c c008e910 ffff0001 cdcb0c20
3f80: cfae387c c00a42ac cf994860 c002a97c c000a424 cde32000 cde33fb0 c000cba8
3fa0: 000bc1a0 000bc290 000bc1c0 c000a2d0 00000000 00000000 00000000 000bc1a0
3fc0: 000bc1a0 000bc290 000bc1c0 00000034 00000000 000bc2c0 00000000 bedaaecc
3fe0: 000aa3a8 bedaacd4 0004ebc4 4f77407c 60000010 000bc1c0 00000000 00000000
[<c005ec10>] (__delete_from_page_cache) from [<c005ecf0>] (delete_from_page_cache+0x3c/0x5c)
[<c005ecf0>] (delete_from_page_cache) from [<c0069428>] (truncate_inode_page+0x98/0xa4)
[<c0069428>] (truncate_inode_page) from [<c00695cc>] (truncate_inode_pages_range+0x168/0x518)
[<c00695cc>] (truncate_inode_pages_range) from [<c0069990>] (truncate_inode_pages+0x14/0x1c)
[<c0069990>] (truncate_inode_pages) from [<c00a1038>] (evict+0x90/0x128)
[<c00a1038>] (evict) from [<c00a1100>] (dispose_list+0x30/0x3c)
[<c00a1100>] (dispose_list) from [<c00a1d3c>] (evict_inodes+0xb4/0xbc)
[<c00a1d3c>] (evict_inodes) from [<c008e5b0>] (generic_shutdown_super+0x40/0xc0)
[<c008e5b0>] (generic_shutdown_super) from [<c008e810>] (kill_block_super+0x18/0x68)
[<c008e810>] (kill_block_super) from [<c008e910>] (deactivate_locked_super+0x44/0x74)
[<c008e910>] (deactivate_locked_super) from [<c00a42ac>] (cleanup_mnt+0x4c/0x70)
[<c00a42ac>] (cleanup_mnt) from [<c002a97c>] (task_work_run+0x6c/0x80)
[<c002a97c>] (task_work_run) from [<c000cba8>] (do_work_pending+0xa4/0xc0)
[<c000cba8>] (do_work_pending) from [<c000a2d0>] (slow_work_pending+0xc/0x20)
Code: e121f002 e594300c e3530000 ba000000 (e7f001f2) 
---[ end trace 85608810d6a88310 ]---
Segmentation fault

[  OK  ] Reached target Local File Systems.
         Starting Create Volatile Files and Directories...
[  OK  ] Started Load/Save Random Seed.
[  OK  ] Started udev Kernel Device Manager.
[  OK  ] Started Create Volatile Files and Directories.
         Starting Update UTMP about System Boot/Shutdown...
         Starting Network Time Synchronization...
Unable to handle kernel paging request at virtual address ffffffff
pgd = ce4b0000
[ffffffff] *pgd=4fffd871, *pte=00000000, *ppte=00000000
Internal error: Oops: 1 [#1] ARM
Modules linked in:
CPU: 0 PID: 515 Comm: kworker/0:2 Not tainted 4.3.5-openbmc-20160212-1 #1
Hardware name: ASpeed SoC
Workqueue: events cache_reap
task: cf99f1e0 ti: ce462000 task.ti: ce462000
PC is at drain_array+0x18/0xec
LR is at cache_reap+0x54/0x114
pc : [<c0087d8c>]    lr : [<c0088060>]    psr: a0000013
sp : ce463ec8  ip : cffb2934  fp : 00000008
r10: c05084d8  r9 : c050b7a0  r8 : c0509f00
r7 : ce463ed0  r6 : c050b51c  r5 : cf801280  r4 : ffffffff
r3 : 00000000  r2 : ffffffff  r1 : cf801280  r0 : cf800340
Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 0005317f  Table: 4e4b0000  DAC: 00000053
Process kworker/0:2 (pid: 515, stack limit = 0xce462190)
Stack: (0xce463ec8 to 0xce464000)
3ec0:                   cf800340 cf801280 ce463ed0 ce463ed0 cf800340 cf801280
3ee0: c050b51c 00000000 c0509f00 c0088060 00000000 00000000 c05084a4 cfa036a0
3f00: c050b7a0 c0508494 00000000 cfdebb00 00000000 c0027be8 cfa036a0 c050b7a0
3f20: cf9f1500 cfa036a0 c0508494 cfa036b8 ce462000 c05084a4 c0508494 c05084d8
3f40: 00000008 c0028280 00000000 ce43dfe0 00000000 cfa036a0 c0027fd0 00000000
3f60: 00000000 00000000 00000000 c002bf74 00000000 00000000 000001f6 cfa036a0
3f80: 00000000 ce463f84 ce463f84 00000000 ce463f90 ce463f90 ce463fac ce43dfe0
3fa0: c002beb0 00000000 00000000 c000a330 00000000 00000000 00000000 00000000
3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
3fe0: 00000000 00000000 00000000 00000000 00000013 00000000 6af75684 6868d53b
[<c0087d8c>] (drain_array) from [<c0088060>] (cache_reap+0x54/0x114)
[<c0088060>] (cache_reap) from [<c0027be8>] (process_one_work+0x1bc/0x2f8)
[<c0027be8>] (process_one_work) from [<c0028280>] (worker_thread+0x2b0/0x3ec)
[<c0028280>] (worker_thread) from [<c002bf74>] (kthread+0xc4/0xd8)
[<c002bf74>] (kthread) from [<c000a330>] (ret_from_fork+0x14/0x24)
Code: e28d7008 e58d7008 e58d700c 0a000032 (e5942000) 
---[ end trace 0a57a7db8754ec3b ]---
Unable to handle kernel paging request at virtual address fffffff0
pgd = c0004000
[fffffff0] *pgd=4fffd871, *pte=00000000, *ppte=00000000
Internal error: Oops: 37 [#2] ARM
Modules linked in:
CPU: 0 PID: 515 Comm: kworker/0:2 Tainted: G      D         4.3.5-openbmc-20160212-1 #1
Hardware name: ASpeed SoC
task: cf99f1e0 ti: ce462000 task.ti: ce462000
PC is at kthread_data+0x4/0xc
LR is at wq_worker_sleeping+0xc/0xb8
pc : [<c002c2f0>]    lr : [<c002854c>]    psr: 20000093
sp : ce463c48  ip : c0374b14  fp : ce463c74
r10: 00000001  r9 : cf99f380  r8 : cf99f404
r7 : 00000000  r6 : c0508da0  r5 : ce463adc  r4 : 00000000
r3 : 00000000  r2 : ffffffff  r1 : 00000000  r0 : cf99f1e0
Flags: nzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 0005317f  Table: 4e4b0000  DAC: 00000051
Process kworker/0:2 (pid: 515, stack limit = 0xce462190)

Freeing unused kernel memory: 176K (c04d6000 - c0502000)
rofs = mtd5 squashfs rwfs = mtd6 ext4
/dev/mtdblock6 was not cleanly unmounted, check forced.
/dev/mtdblock6: 124/1024 files (0.0% non-contiguous), 215/1024 blocks
EXT4-fs (mtdblock6): mounted filesystem without journal. Opts: (null)
systemd[1]: Failed to insert module 'ipv6': Function not implemented
systemd[1]: Failed to insert module 'kdbus': Function not implemented
random: systemd urandom read with 20 bits of entropy available
systemd[1]: systemd 225 running in system mode. (-PAM -AUDIT -SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP -LIBCRYPTSETUP -GCRYPT -GNUTLS +ACL -XZ -LZ4 -SECCOMP +BLKID -ELFUTILS +KMOD -IDN)
systemd[1]: Detected architecture arm.

Welcome to Phosphor OpenBMC (Phosphor OpenBMC Project Reference Distro) 0.1.0 (master)!

...............

[  OK  ] Reached target Local File Systems.
         Starting Create Volatile Files and Directories...
[  OK  ] Started Load/Save Random Seed.
Unable to handle kernel paging request at virtual address ffffffff
pgd = cdcd4000
[ffffffff] *pgd=4fffd871, *pte=00000000, *ppte=00000000
Internal error: Oops: 1 [#1] ARM
Modules linked in:
CPU: 0 PID: 1 Comm: systemd Not tainted 4.3.5-openbmc-20160212-1 #1
Hardware name: ASpeed SoC
task: cf810aa0 ti: cf826000 task.ti: cf826000
PC is at kmem_cache_free+0x54/0xb8
LR is at kernfs_put+0xf4/0x188
pc : [<c0087c5c>]    lr : [<c00dacc4>]    psr: a0000093
sp : cf827e90  ip : 00000000  fp : cf827f64
r10: 7f754f00  r9 : c0525b60  r8 : c0525b60
r7 : a0000013  r6 : ce47b088  r5 : ffffffff  r4 : cf800a00
r3 : a0000093  r2 : 00000080  r1 : cfdf9000  r0 : cf800a00
Flags: NzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 0005317f  Table: 4dcd4000  DAC: 00000051
Process systemd (pid: 1, stack limit = 0xcf826190)
Stack: (0xcf827e90 to 0xcf828000)
7e80:                                     c053c9cc ce47b088 ce47b178 cdcb7ec8
7ea0: cdcb7ec0 c00dacc4 ce47b088 ce47b1a4 a0000013 ce47b088 ce47b178 00000000
7ec0: c0525b60 cf827ee4 ffffff9c 7f754f00 cf827f64 c00db478 c009e33c cf827f2c
7ee0: 0000004c 00000000 00000000 c009da50 c050bed8 ce47b178 ce4f4524 00000000
7f00: cf827f70 c00dbd30 00000000 ce4f44a0 00000000 c005a0dc c050ae4c ce47b178
7f20: cf4e07f0 c005a114 c005a0fc c050ae4c ce47b178 c00db844 c00db7f8 cf4e56e8
7f40: cf4e56e8 c009843c 00000000 cf4e56e8 cfb0b000 c0098d2c cf827f70 cf827f64
7f60: 40000010 00000000 cfaf26f0 cf4e3b28 b8d446ca 0000001e cfb0b034 ce456000
7f80: c000a424 00000000 00000000 b6f47f10 00000028 c000a424 cf826000 00000000
7fa0: 7f6bacbc c000a280 00000000 00000000 7f754f00 00000000 b6e13ba0 00000000
7fc0: 00000000 00000000 b6f47f10 00000028 b6f4053c 7f750f18 7f722ee0 7f6bacbc
7fe0: 7f700d4c bed08734 7f6181d0 b6d9999c 20000010 7f754f00 f4ad8c1b 8b9d0ac4
[<c0087c5c>] (kmem_cache_free) from [<c00dacc4>] (kernfs_put+0xf4/0x188)
[<c00dacc4>] (kernfs_put) from [<c00db478>] (__kernfs_remove+0x200/0x22c)
[<c00db478>] (__kernfs_remove) from [<c00dbd30>] (kernfs_remove+0x1c/0x2c)
[<c00dbd30>] (kernfs_remove) from [<c005a0dc>] (cgroup_destroy_locked+0x54/0x74)
[<c005a0dc>] (cgroup_destroy_locked) from [<c005a114>] (cgroup_rmdir+0x18/0x34)
[<c005a114>] (cgroup_rmdir) from [<c00db844>] (kernfs_iop_rmdir+0x4c/0x70)
[<c00db844>] (kernfs_iop_rmdir) from [<c009843c>] (vfs_rmdir+0x70/0xfc)
[<c009843c>] (vfs_rmdir) from [<c0098d2c>] (do_rmdir+0xd4/0x124)
[<c0098d2c>] (do_rmdir) from [<c000a280>] (ret_fast_syscall+0x0/0x38)
Code: e10f7000 e3873080 e121f003 e5945000 (e895000c) 
---[ end trace 9e1fd1527d8f4e91 ]---
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b


shenki · 2016-02-24T03:41:14Z

14:08 < NormJ> that issue only occurs on a EVT barreleye.  doesn't happen ever 
               on DVT (newer) systems
14:08 < miltonm> NormJ one EVT or all ?
14:09 < NormJ> all the ones doug tried, but not all.  i have an EVT barreleye 
               that doesn't have issue.

shenki · 2016-02-24T04:29:12Z

Unable to handle kernel paging request at virtual address 01b50325
pgd = ce4a4000
[01b50325] *pgd=00000000
Internal error: Oops: 1 [#1] ARM
Modules linked in:
CPU: 0 PID: 1103 Comm: umount Not tainted 4.3.5-openbmc-20160212-1 #1
Hardware name: ASpeed SoC
task: cfa3a180 ti: cfb3e000 task.ti: cfb3e000
PC is at kfree+0x3c/0xa8
LR is at squashfs_cache_delete+0x60/0x9c
pc : [<c008815c>]    lr : [<c01236e4>]    psr: 00000093
sp : cfb3ff18  ip : 00000010  fp : beeb0ebc
r10: 00000000  r9 : cfb3e000  r8 : 00000038
r7 : 00000004  r6 : a0000013  r5 : cf8db000  r4 : cfadad40
r3 : 01b50309  r2 : ffffffff  r1 : cfdf9000  r0 : cf8db000
Flags: nzcv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 0005317f  Table: 4e4a4000  DAC: 00000051
Process umount (pid: 1103, stack limit = 0xcfb3e190)
Stack: (0xcfb3ff18 to 0xcfb40000)
ff00:                                                       cdcafa00 cfadad40
ff20: 00000000 00000000 00000004 c01236e4 cdcaf900 cdcf4800 c052e62c 00000034
ff40: c000a424 c0125dc8 c0125d9c cdcf4800 c037d180 c008e5dc 00000000 cf403b80
ff60: 00000081 c008e810 cdcf4800 c050c548 c052e62c c008e910 ffff0001 cdcb0c20
ff80: cfae387c c00a42ac cfa3a180 c002a97c c000a424 cfb3e000 cfb3ffb0 c000cba8
ffa0: 000bc1a0 000bc290 000bc1c0 c000a2d0 00000000 00000000 00000000 000bc1a0
ffc0: 000bc1a0 000bc290 000bc1c0 00000034 00000000 000bc2c0 00000000 beeb0ebc
ffe0: 000aa3a8 beeb0cc4 0004ebc4 4f77407c 60000010 000bc1c0 cfbe4a40 00000000
[<c008815c>] (kfree) from [<c01236e4>] (squashfs_cache_delete+0x60/0x9c)
[<c01236e4>] (squashfs_cache_delete) from [<c0125dc8>] (squashfs_put_super+0x2c/0x70)
[<c0125dc8>] (squashfs_put_super) from [<c008e5dc>] (generic_shutdown_super+0x6c/0xc0)
[<c008e5dc>] (generic_shutdown_super) from [<c008e810>] (kill_block_super+0x18/0x68)
[<c008e810>] (kill_block_super) from [<c008e910>] (deactivate_locked_super+0x44/0x74)
[<c008e910>] (deactivate_locked_super) from [<c00a42ac>] (cleanup_mnt+0x4c/0x70)
[<c00a42ac>] (cleanup_mnt) from [<c002a97c>] (task_work_run+0x6c/0x80)
[<c002a97c>] (task_work_run) from [<c000cba8>] (do_work_pending+0xa4/0xc0)
[<c000cba8>] (do_work_pending) from [<c000a2d0>] (slow_work_pending+0xc/0x20)
Code: e0813282 e7912282 e3120902 1593301c (e593701c) 
---[ end trace 293ad3070fe69835 ]---

nkskjames · 2016-02-26T19:24:05Z

This happens occasionally on my EVT system with the latest kernel, including jffs2. It looks similar to fail #4 above:

root@barreleye:~# reboot
[  OK  ] Stopped target System Time Synchronized.
[  OK  ] Stopped Dispatch Password Requests to Console Directory Watch.
[  OK  ] Stopped target Swap.
         Stopping SSH Per-Connection Server (9.3.62.94:40890)...
[  OK  ] Stopped Create Static Device Nodes in /dev.
[  OK  ] Stopped target Network.
[  OK  ] Stopped Apply Kernel Variables.
         Stopping Load/Save Random Seed...
         Stopping Network Time Synchronization...
         Stopping SSH Per-Connection Server (9.123.229.28:49692)...
[  OK  ] Stopped target Local File Systems.
         Unmounting /var/volatile...
BUG: Bad page map in process systemd-timesyn  pte:4da32100 pmd:4fb2b831
addr:b6e00000 vm_flags:00000075 anon_vma:  (null) mapping:cf48b874 index:0
file:libpthread-2.22.so fault:filemap_fault mmap:generic_file_readonly_mmap readpage:squashfs_readpage
CPU: 0 PID: 850 Comm: systemd-timesyn Not tainted 4.3.6-openbmc-20160222-1 #1
Hardware name: ASpeed SoC
[<c000f28c>] (unwind_backtrace) from [<c000cf0c>] (show_stack+0x10/0x14)
[<c000cf0c>] (show_stack) from [<c00787c8>] (print_bad_pte+0x158/0x190)
[<c00787c8>] (print_bad_pte) from [<c0079cec>] (unmap_single_vma+0x4ec/0x500)
[<c0079cec>] (unmap_single_vma) from [<c007a438>] (unmap_vmas+0x44/0x54)
[<c007a438>] (unmap_vmas) from [<c007e97c>] (exit_mmap+0xc8/0x1dc)
[<c007e97c>] (exit_mmap) from [<c0014770>] (mmput+0x38/0xb8)
[<c0014770>] (mmput) from [<c0017380>] (do_exit+0x2f0/0x7b0)
[<c0017380>] (do_exit) from [<c0018654>] (do_group_exit+0x4c/0xb8)
[<c0018654>] (do_group_exit) from [<c00186d0>] (__wake_up_parent+0x0/0x18)
Disabling lock debugging due to kernel taint
         Unmounting /run/initramfs/rw...
         Stopping SSH Per-Connection Server (9.109.165.198:54558)...
[  OK  ] Removed slice system-getty.slice.
[  OK  ] Stopped target Remote File Systems.
[  OK  ] Stopped target Multi-User System.
         Stopping Phosphor OpenBMC BT to DBUS...
         Stopping D-Bus System Message Bus...
         Stopping Temp placeholder for skeleton function...
         Stopping DBUS introspecting REST server....
         Stopping Phosphor OpenBMC event management daemon...
         Stopping System Logging Service...
[  OK  ] Stopped target Login Prompts.
         Stopping Serial Getty on ttyS4...
         Stopping Phosphor OpenBMC user management daemon...
         Stopping Login Service...
         Stopping Phosphor OpenBMC DBus REST daemon...
[  OK  ] Stopped Forward Password Requests to Wall Directory Watch.
         Stopping SSH Per-Connection Server (9.109.165.198:54737)...
         Unmounting /run/initramfs/ro...
         Stopping Kernel Logging Service...
[  OK  ] Stopped Login Service.
[  OK  ] Stopped DBUS introspecting REST server..
[  OK  ] Stopped Phosphor OpenBMC BT to DBUS.
[  OK  ] Stopped Phosphor OpenBMC event management daemon.
[  OK  ] Stopped Kernel Logging Service.
BUG: Bad rss-counter state mm:ce491640 idx:0 val:1
BUG: Bad rss-counter state mm:ce491640 idx:2 val:-1
[  OK  ] Stopped D-Bus System Message Bus.
[  OK  ] Stopped System Logging Service.
[  OK  ] Stopped Network Time Synchronization.
[  OK  ] Stopped Phosphor OpenBMC user management daemon.
[  OK  ] Stopped Serial Getty on ttyS4.
[  OK  ] Stopped SSH Per-Connection Server (9.3.62.94:40890).
[  OK  ] Stopped SSH Per-Connection Server (9.123.229.28:49692).
[  OK  ] Stopped Temp placeholder for skeleton function.
[  OK  ] Stopped SSH Per-Connection Server (9.109.165.198:54558).
[  OK  ] Stopped SSH Per-Connection Server (9.109.165.198:54737).
[  OK  ] Stopped Load/Save Random Seed.
[  OK  ] Unmounted /var/volatile.
[  OK  ] Unmounted /run/initramfs/rw.
[  OK  ] Unmounted /run/initramfs/ro.
[  OK  ] Stopped Phosphor OpenBMC DBus REST daemon.
[  OK  ] Stopped Phosphor OpenBMC DBus service management daemon.
[  OK  ] Reached target Unmount All Filesystems.
[  OK  ] Removed slice system-dropbear.slice.
[  OK  ] Closed dropbear.socket.
[  OK  ] Removed slice system-serial\x2dgetty.slice.
[  OK  ] Closed Syslog Socket.
[  OK  ] Removed slice User and Session Slice.
[  OK  ] Reached target Shutdown.
systemd-shutdown[1]: Sending SIGTERM to remaining processes...
systemd-journald[520]: Received SIGTERM from PID 1 (systemd-shutdow).
systemd-shutdown[1]: Sending SIGKILL to remaining processes...
systemd-shutdown[1]: Sending SIGKILL to PID 1321 (sh).
systemd-shutdown[1]: Hardware watchdog 'aspeed_wdt', version 0
systemd-shutdown[1]: Unmounting file systems.
systemd-shutdown[1]: Unmounting /tmp.
systemd-shutdown[1]: All filesystems unmounted.
systemd-shutdown[1]: Deactivating swaps.
systemd-shutdown[1]: All swaps deactivated.
systemd-shutdown[1]: Detaching loop devices.
systemd-shutdown[1]: All loop devices detached.
systemd-shutdown[1]: Detaching DM devices.
systemd-shutdown[1]: All DM devices detached.
systemd-shutdown[1]: Successfully changed into root pivot.
systemd-shutdown[1]: Returning to initrd...
shutdown: reboot --log-level 6 --log-target kmsg --log-color
+ awk /oldroot|mnt/ { print $2 }
+ sort -r
+ umount /oldroot/sys/kernel/debug
+ umount /oldroot/sys/kernel/config
+ umount /oldroot/sys/fs/cgroup/systemd
+ umount /oldroot/sys/fs/cgroup
+ umount /oldroot/sys
+ umount /oldroot/proc
+ umount /oldroot/dev/shm
+ umount /oldroot/dev/pts
+ umount /oldroot/dev
+ umount /oldroot
------------[ cut here ]------------
Kernel BUG at c005ec1c [verbose debug info unavailable]
Internal error: Oops - BUG: 0 [#1] ARM
Modules linked in:
CPU: 0 PID: 2097 Comm: umount Tainted: G    B           4.3.6-openbmc-20160222-1 #1
Hardware name: ASpeed SoC
task: ce88a3c0 ti: cde50000 task.ti: cde50000
PC is at __delete_from_page_cache+0x204/0x2a8
LR is at __delete_from_page_cache+0x11c/0x2a8
pc : [<c005ec1c>]    lr : [<c005eb34>]    psr: 60000093
sp : cde51df0  ip : c0527400  fp : 00000000
r10: 00000000  r9 : cf48b878  r8 : 00000003
r7 : 00000000  r6 : 00000000  r5 : cf48b874  r4 : cffad640
r3 : 00000000  r2 : 000002bc  r1 : c053c334  r0 : 0000592f
Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 0005317f  Table: 4de9c000  DAC: 00000051
Process umount (pid: 2097, stack limit = 0xcde50190)
Stack: (0xcde51df0 to 0xcde52000)
1de0:                                     00000000 00000000 cf48a9b0 cf48a9c8
1e00: cde51ea8 cffad640 00000000 00000013 cf48b874 00000000 00000000 00000000
1e20: cffad640 c005ecfc cffad640 cf48b874 00000000 c0069430 00001000 00000000
1e40: 00000000 00000fff 00000000 00000000 ffffffff 00000000 cf48b874 c00695d4
1e60: cde51e68 00000000 00000000 00000001 00000002 00000003 00000004 00000005
1e80: 00000006 00000007 00000008 00000009 0000000a 0000000b 0000000c 0000000d
1ea0: 0000000e 00000000 cffad640 cffad6a0 cffad6c0 cffad6e0 cffad700 cffad720
1ec0: cffad740 cffad760 cffad780 cffad7a0 cffad7c0 cffad7e0 cffad800 cffad820
1ee0: c000a424 ffffffff ffffffff cde50000 cdcb8800 c000a424 cde50000 00000000
1f00: bef19ebc c0069998 ffffffff ffffffff 00000000 cf48b7b8 c037d180 c00a1060
1f20: cde50000 cde51f38 cec760b8 c00a1128 cdcb89fc c00a1d64 cf48b6dc cec7613c
1f40: 00000000 cdcb8800 c037d180 c053062c 00000034 c008e5d0 00000000 cf4039e0
1f60: 00000081 c008e830 cdcb8800 c050e548 c053062c c008e930 ffff0001 cdc8e9e0
1f80: cdc8463c c00a42d4 ce88a3c0 c002a97c c000a424 cde50000 cde51fb0 c000cba8
1fa0: 000bc1a0 000bc290 000bc1c0 c000a2d0 00000000 00000000 00000000 000bc1a0
1fc0: 000bc1a0 000bc290 000bc1c0 00000034 00000000 000bc2c0 00000000 bef19ebc
1fe0: 000aa3a8 bef19cc4 0004ebc4 4e0b407c 60000010 000bc1c0 00000000 00000000
[<c005ec1c>] (__delete_from_page_cache) from [<c005ecfc>] (delete_from_page_cache+0x3c/0x5c)
[<c005ecfc>] (delete_from_page_cache) from [<c0069430>] (truncate_inode_page+0x98/0xa4)
[<c0069430>] (truncate_inode_page) from [<c00695d4>] (truncate_inode_pages_range+0x168/0x518)
[<c00695d4>] (truncate_inode_pages_range) from [<c0069998>] (truncate_inode_pages+0x14/0x1c)
[<c0069998>] (truncate_inode_pages) from [<c00a1060>] (evict+0x90/0x128)
[<c00a1060>] (evict) from [<c00a1128>] (dispose_list+0x30/0x3c)
[<c00a1128>] (dispose_list) from [<c00a1d64>] (evict_inodes+0xb4/0xbc)
[<c00a1d64>] (evict_inodes) from [<c008e5d0>] (generic_shutdown_super+0x40/0xc0)
[<c008e5d0>] (generic_shutdown_super) from [<c008e830>] (kill_block_super+0x18/0x68)
[<c008e830>] (kill_block_super) from [<c008e930>] (deactivate_locked_super+0x44/0x74)
[<c008e930>] (deactivate_locked_super) from [<c00a42d4>] (cleanup_mnt+0x4c/0x70)
[<c00a42d4>] (cleanup_mnt) from [<c002a97c>] (task_work_run+0x6c/0x80)
[<c002a97c>] (task_work_run) from [<c000cba8>] (do_work_pending+0xa4/0xc0)
[<c000cba8>] (do_work_pending) from [<c000a2d0>] (slow_work_pending+0xc/0x20)
Code: e121f002 e594300c e3530000 ba000000 (e7f001f2) 
---[ end trace c053299951a20b73 ]---
Segmentation fault
+ umount /mnt
+ set +x
update: reboot --log-level 6 --log-target kmsg --log-color
Updating bmc...
Erasing block: 8192/8192 (100%) 
Writing kb: 32768/32768 (100%) 
Verifying kb: 32768/32768 (100%) 
Remaining mounts:
tmpfs / tmpfs rw,nosuid,nodev,mode=755 0 0
dev /dev devtmpfs rw,relatime,size=126380k,nr_inodes=31595,mode=755 0 0
proc /proc proc rw,relatime 0 0
sys /sys sysfs rw,relatime 0 0
tmpfs /run tmpfs rw,nosuid,nodev,mode=755 0 0

nkskjames · 2016-03-03T23:20:38Z

I think we have made some important observations:

The EVT system was plugged directly into the lab switch. The DVT system was plugged into a local switch.
If the BMC network is plugged in and we do a cold reboot, the seg fault is likely to occur.
The seg fault does not occur on warm reboots (with the jffs2 filesystem).
If we wait at the uboot prompt for around 10 seconds before letting the kernel start, the seg fault does not occur.
If we unplug the BMC network, the seg fault does not occur.
I went back to the original uboot FB patches and the seg fault still occurs.

gwshan · 2016-03-07T10:09:19Z

It's most likely caused by a race condition between uboot and the kernel, as follows:

Uboot has enabled NCSI, and there are BDs (buffer descriptors) pointing to valid memory buffers.
The kernel boots and creates many slab caches (struct kmem_cache). One of them ("kernfs_node_cache" in the following example) uses the memory block that uboot reserved to store ingress frames.
An ingress ARP request is received and stored to system memory via the buffers uboot set up, corrupting the slab cache "kernfs_node_cache".
The kernel tries to allocate a slab object from the per-CPU hot list and crashes because of the corrupted slab cache "kernfs_node_cache".
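
The sequence above can be illustrated with a toy model. This is only a sketch under stated assumptions, not kernel or uboot code: memory is a flat Python bytearray, the buffer offset and the "valid pointer" value are made up for illustration, and the frame bytes are the first 16 bytes of the dump shown later in this comment.

```python
import struct

# First 16 bytes of the frame from the "Corrupted CPU list" dump in this issue.
ARP_FRAME_HEAD = bytes.fromhex("ffffffffffff5cf3fc5f3f6008060001")

# Hypothetical layout: a flat 256-byte "RAM" and an arbitrary buffer offset.
RAM_SIZE = 256
RX_BUFFER_OFFSET = 0x40
VALID_POINTER = 0xCF800340  # made-up kernel pointer, for illustration only


def simulate_stale_dma():
    """Model a stale receive descriptor overwriting a reused memory block."""
    ram = bytearray(RAM_SIZE)

    # Step 1: the bootloader leaves a receive descriptor pointing at this buffer.
    stale_descriptor = RX_BUFFER_OFFSET

    # Step 2: the kernel reuses the same memory for an object whose first
    # 32-bit little-endian word is a pointer.
    struct.pack_into("<I", ram, RX_BUFFER_OFFSET, VALID_POINTER)
    before = struct.unpack_from("<I", ram, RX_BUFFER_OFFSET)[0]

    # Step 3: the still-enabled MAC writes an incoming frame to the old buffer.
    end = stale_descriptor + len(ARP_FRAME_HEAD)
    ram[stale_descriptor:end] = ARP_FRAME_HEAD

    # Step 4: the kernel reads its pointer back and gets frame data instead.
    after = struct.unpack_from("<I", ram, RX_BUFFER_OFFSET)[0]
    return before, after


before, after = simulate_stale_dma()
print(f"pointer before DMA: {before:#010x}")
print(f"pointer after DMA:  {after:#010x}")
```

In this model the pointer reads back as 0xffffffff, the first four bytes of the broadcast destination address. That would be consistent with the "virtual address ffffffff" faults in the oopses earlier in this thread, although the model does not prove which kernel field was overwritten.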

The serial port can't be accessed at the moment. Once it becomes available again, I'll figure out a command to stop activity on the MAC before running the "bootcmd" command from the uboot side.

I added some code to dump the corrupted slab cache and got the dump below for "kernfs_node_cache". The data is clearly an ARP request whose source IP address is 9.3.40.155:

Corrupted CPU list detected on [kernfs_node_cache]
ff ff ff ff ff ff 5c f3 fc 5f 3f 60 08 06 00 01 
08 00 06 04 00 01 5c f3 fc 5f 3f 60 09 03 28 9b 
00 00 00 00 00 00 09 03 28 96 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 c1 98 e5 ab 
c8 05 44 c0 24 71 80 cf 84 7f 80 cf 01 00 00 00 
50 00 00 00 08 00 00 00 52 12 00 00 b9 12 00 00 
52 12 00 00 7c 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
ea 12 00 00 78 01 00 00 12 02 00 00 06 00 00 00 
08 00 00 00 40 1e 80 cf
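
To double-check that reading of the dump, the first 42 bytes can be decoded as an Ethernet header followed by an ARP payload. The sketch below uses only the Python standard library; the helper names `format_mac` and `decode_arp_frame` are hypothetical, and the field layout is the standard Ethernet/ARP one rather than anything specific to this kernel.

```python
import socket
import struct

# First 42 bytes of the dump: 14-byte Ethernet header + 28-byte ARP payload.
DUMP_HEX = """
ff ff ff ff ff ff 5c f3 fc 5f 3f 60 08 06 00 01
08 00 06 04 00 01 5c f3 fc 5f 3f 60 09 03 28 9b
00 00 00 00 00 00 09 03 28 96
"""


def format_mac(raw):
    """Render six raw bytes as a colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)


def decode_arp_frame(frame):
    """Decode an Ethernet frame carrying an IPv4-over-Ethernet ARP packet."""
    dst, src, ethertype = struct.unpack_from("!6s6sH", frame, 0)
    (htype, ptype, hlen, plen, oper,
     sha, spa, tha, tpa) = struct.unpack_from("!HHBBH6s4s6s4s", frame, 14)
    return {
        "dst_mac": format_mac(dst),
        "src_mac": format_mac(src),
        "ethertype": ethertype,   # 0x0806 means ARP
        "operation": oper,        # 1 = request, 2 = reply
        "sender_mac": format_mac(sha),
        "sender_ip": socket.inet_ntoa(spa),
        "target_ip": socket.inet_ntoa(tpa),
    }


frame = bytes.fromhex("".join(DUMP_HEX.split()))
fields = decode_arp_frame(frame)
for name, value in fields.items():
    print(f"{name}: {value}")
```

Decoded this way, the bytes form a broadcast ARP request (EtherType 0x0806, operation 1) from 9.3.40.155 asking for 9.3.40.150.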

gwshan@gwshan:~$ ping 9.3.40.155
PING 9.3.40.155 (9.3.40.155) 56(84) bytes of data.
64 bytes from 9.3.40.155: icmp_seq=1 ttl=50 time=192 ms
64 bytes from 9.3.40.155: icmp_seq=2 ttl=50 time=193 ms

gwshan · 2016-03-07T10:18:08Z

Writing 0x0 to MACCR (offset: 0x50) should stop the MAC from receiving any frames. It's just a workaround. The final solution is to change uboot to do that before loading and booting the kernel image. I'm not sure who is responsible for that. If nobody is in charge of it, I can do it if someone provides more info:

(A) The source code of uboot
(B) Command to flash the uboot image
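
As a rough illustration of why writing zero to MACCR quiesces the MAC, the sketch below models the low enable bits of the register. It assumes this MAC follows the bit layout in the Linux ftgmac100 driver header (TXDMA_EN, RXDMA_EN, TXMAC_EN and RXMAC_EN in bits 0 to 3), which the thread does not confirm. The helper `uboot_mw_command` is hypothetical and only formats a uboot `mw.l` memory-write command for a caller-supplied base address.

```python
MACCR_OFFSET = 0x50  # register offset quoted in this thread

# Assumed bit layout, taken from the Linux ftgmac100 driver header.
MACCR_BITS = {
    "TXDMA_EN": 1 << 0,
    "RXDMA_EN": 1 << 1,
    "TXMAC_EN": 1 << 2,
    "RXMAC_EN": 1 << 3,
}


def enabled_paths(maccr_value):
    """List which transmit/receive enables are set in a MACCR value."""
    return [name for name, mask in MACCR_BITS.items() if maccr_value & mask]


def uboot_mw_command(mac_base, value=0):
    """Format a uboot 'mw.l' command that writes `value` to MACCR.

    mac_base must come from the SoC documentation; it is not given in
    this thread.
    """
    return f"mw.l {mac_base + MACCR_OFFSET:08x} {value:x}"


print(enabled_paths(0xF))  # all four paths enabled
print(enabled_paths(0x0))  # nothing enabled once MACCR is cleared

# 0x1E660000 is an assumed base address used only for illustration.
print(uboot_mw_command(0x1E660000))
```

With all four bits cleared, neither the receive MAC nor the receive DMA engine is enabled, so no further frames should be written to memory. The base address in the last line needs to be verified against the SoC datasheet before the command is used on hardware.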

nkskjames · 2016-03-07T15:22:12Z

(A) We have our own fork of uboot: https://github.com/openbmc/u-boot/tree/v2013.07-aspeed-openbmc

So can we use the "workaround" until the final fix? If so, can you do a pull request for that?

shenki · 2016-03-29T07:44:45Z

These have been committed to our branch at https://github.com/openbmc/u-boot/tree/v2013.07-aspeed-openbmc as of openbmc/u-boot@fecb84a

If mvneta_mdio_probe() fails, a kernel warning is triggered due to missing cleanup in the error path. Add the necessary cleanup. ------------[ cut here ]------------ WARNING: CPU: 1 PID: 281 at kernel/irq/manage.c:1814 __free_percpu_irq+0xfc/0x130 percpu IRQ 38 still enabled on CPU0! Modules linked in: bnep bluetooth xhci_plat_hcd xhci_hcd marvell_cesa armada_thermal des_generic ehci_orion mcp3021 spi_orion sfp mdio_i2c evbug fuse CPU: 1 PID: 281 Comm: connmand Not tainted 4.7.0-rc2+ openbmc#53 Hardware name: Marvell Armada 380/385 (Device Tree) Backtrace: [<c0013488>] (dump_backtrace) from [<c00137d0>] (show_stack+0x18/0x1c) r6:60010093 r5:ffffffff r4:00000000 r3:dc8ba500 [<c00137b8>] (show_stack) from [<c02c6fe0>] (dump_stack+0xa4/0xdc) [<c02c6f3c>] (dump_stack) from [<c002d4ec>] (__warn+0xd8/0x104) r6:c081e6a0 r5:00000000 r4:edfe5d50 r3:dc8ba500 [<c002d414>] (__warn) from [<c002d5d0>] (warn_slowpath_fmt+0x40/0x48) r10:a0010013 r8:c09356f8 r7:00000026 r6:ef11a260 r5:edd7b980 r4:ef11a200 [<c002d594>] (warn_slowpath_fmt) from [<c008c8e0>] (__free_percpu_irq+0xfc/0x130) r3:00000026 r2:c081e7ac [<c008c7e4>] (__free_percpu_irq) from [<c008c95c>] (free_percpu_irq+0x48/0x74) r10:00008914 r8:00000000 r7:ffffffed r6:c09356f8 r5:00000026 r4:ef11a200 [<c008c914>] (free_percpu_irq) from [<c043dd70>] (mvneta_open+0x118/0x134) r6:ffffffed r5:ef01e640 r4:ef01e000 r3:ef01e000 [<c043dc58>] (mvneta_open) from [<c055f5b4>] (__dev_open+0xa4/0x108) r7:ef01e030 r6:c06ff3d8 r5:ffff9003 r4:ef01e000 [<c055f510>] (__dev_open) from [<c055f844>] (__dev_change_flags+0x94/0x150) r7:00001002 r6:00000001 r5:ffff9003 r4:ef01e000 [<c055f7b0>] (__dev_change_flags) from [<c055f938>] (dev_change_flags+0x20/0x50) r8:00000000 r7:c09334c8 r6:00001002 r5:00000148 r4:ef01e000 r3:00008914 [<c055f918>] (dev_change_flags) from [<c05de044>] (devinet_ioctl+0x6f4/0x7e0) r8:00000000 r7:c09334c8 r6:00000000 r5:ee87200c r4:00000000 r3:00008914 [<c05dd950>] (devinet_ioctl) from [<c05e0168>] 
(inet_ioctl+0x1b8/0x1c8) r10:beb4499c r9:edfe4000 r8:ecf13280 r7:c096cf00 r6:beb4499c r5:eef7c240 r4:00008914 [<c05dffb0>] (inet_ioctl) from [<c053c898>] (sock_ioctl+0x78/0x300) [<c053c820>] (sock_ioctl) from [<c0155ecc>] (do_vfs_ioctl+0x98/0xa60) r7:00000011 r6:00008914 r5:00000011 r4:c01568d0 [<c0155e34>] (do_vfs_ioctl) from [<c01568d0>] (SyS_ioctl+0x3c/0x60) r10:00000000 r9:edfe4000 r8:beb4499c r7:00000011 r6:00008914 r5:ecf13280 r4:ecf13280 [<c0156894>] (SyS_ioctl) from [<c000fe60>] (ret_fast_syscall+0x0/0x1c) r8:c0010004 r7:00000036 r6:00000011 r5:000a2978 r4:00000000 r3:00009003 ---[ end trace 711f625d5b04b3a7 ]--- Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Tested-by: Jon Nettleton <jon@solid-run.com> Signed-off-by: David S. Miller <davem@davemloft.net>

commit 6f6266a upstream. Reserving a runtime region results in splitting the EFI memory descriptors for the runtime region. This results in runtime region descriptors with bogus memory mappings, leading to interesting crashes like the following during a kexec: general protection fault: 0000 [#1] SMP Modules linked in: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.11.0-rc1 #53 Hardware name: Wiwynn Leopard-Orv2/Leopard-DDR BW, BIOS LBM05 09/30/2016 RIP: 0010:virt_efi_set_variable() ... Call Trace: efi_delete_dummy_variable() efi_enter_virtual_mode() start_kernel() ? set_init_arg() x86_64_start_reservations() x86_64_start_kernel() start_cpu() ... Kernel panic - not syncing: Fatal exception Runtime regions will not be freed and do not need to be reserved, so skip the memmap modification in this case. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Dave Young <dyoung@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Jones <pjones@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Fixes: 8e80632 ("efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()") Link: http://lkml.kernel.org/r/20170412152719.9779-2-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

[ Upstream commit 6fa19f5 ]

syzbot was able to catch a bug in rds [1]

The issue here is that the socket might be found in a hash table but that its refcount has already been set to 0 by another cpu.

We need to use refcount_inc_not_zero() to be safe here.

[1]
refcount_t: increment on 0; use-after-free.
WARNING: CPU: 1 PID: 23129 at lib/refcount.c:153 refcount_inc_checked lib/refcount.c:153 [inline]
WARNING: CPU: 1 PID: 23129 at lib/refcount.c:153 refcount_inc_checked+0x61/0x70 lib/refcount.c:151
Kernel panic - not syncing: panic_on_warn set ...

CPU: 1 PID: 23129 Comm: syz-executor3 Not tainted 5.0.0-rc4+ #53
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1db/0x2d0 lib/dump_stack.c:113
 panic+0x2cb/0x65c kernel/panic.c:214
 __warn.cold+0x20/0x48 kernel/panic.c:571
 report_bug+0x263/0x2b0 lib/bug.c:186
 fixup_bug arch/x86/kernel/traps.c:178 [inline]
 fixup_bug arch/x86/kernel/traps.c:173 [inline]
 do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
 do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:290
 invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:973
RIP: 0010:refcount_inc_checked lib/refcount.c:153 [inline]
RIP: 0010:refcount_inc_checked+0x61/0x70 lib/refcount.c:151
Code: 1d 51 63 c8 06 31 ff 89 de e8 eb 1b f2 fd 84 db 75 dd e8 a2 1a f2 fd 48 c7 c7 60 9f 81 88 c6 05 31 63 c8 06 01 e8 af 65 bb fd <0f> 0b eb c1 90 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 54 49
RSP: 0018:ffff8880a0cbf1e8 EFLAGS: 00010282
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffc90006113000
RDX: 000000000001047d RSI: ffffffff81685776 RDI: 0000000000000005
RBP: ffff8880a0cbf1f8 R08: ffff888097c9e100 R09: ffffed1015ce5021
R10: ffffed1015ce5020 R11: ffff8880ae728107 R12: ffff8880723c20c0
R13: ffff8880723c24b0 R14: dffffc0000000000 R15: ffffed1014197e64
 sock_hold include/net/sock.h:647 [inline]
 rds_sock_addref+0x19/0x20 net/rds/af_rds.c:675
 rds_find_bound+0x97c/0x1080 net/rds/bind.c:82
 rds_recv_incoming+0x3be/0x1430 net/rds/recv.c:362
 rds_loop_xmit+0xf3/0x2a0 net/rds/loop.c:96
 rds_send_xmit+0x1355/0x2a10 net/rds/send.c:355
 rds_sendmsg+0x323c/0x44e0 net/rds/send.c:1368
 sock_sendmsg_nosec net/socket.c:621 [inline]
 sock_sendmsg+0xdd/0x130 net/socket.c:631
 __sys_sendto+0x387/0x5f0 net/socket.c:1788
 __do_sys_sendto net/socket.c:1800 [inline]
 __se_sys_sendto net/socket.c:1796 [inline]
 __x64_sys_sendto+0xe1/0x1a0 net/socket.c:1796
 do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x458089
Code: 6d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fc266df8c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 0000000000458089
RDX: 0000000000000000 RSI: 00000000204b3fff RDI: 0000000000000005
RBP: 000000000073bf00 R08: 00000000202b4000 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fc266df96d4
R13: 00000000004c56e4 R14: 00000000004d94a8 R15: 00000000ffffffff

Fixes: cc4dfb7 ("rds: fix two RCU related problems")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Cc: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Cc: rds-devel@oss.oracle.com
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
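The rds commit message above relies on "increment only if not already zero" semantics. A minimal user-space sketch of that idea with C11 atomics follows; it is an illustration only, not the kernel's refcount_inc_not_zero() implementation, and `struct obj`, `obj_get_unless_zero()` and `obj_put()` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical object with an embedded reference count. */
struct obj {
    atomic_uint refs;
};

/*
 * Take a reference only if the count is still non-zero.
 * Returns false when another thread has already dropped the last
 * reference, in which case the caller must treat the lookup as a miss.
 */
static bool obj_get_unless_zero(struct obj *o)
{
    unsigned int old = atomic_load(&o->refs);

    while (old != 0) {
        /* On failure, 'old' is refreshed with the current value. */
        if (atomic_compare_exchange_weak(&o->refs, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop a reference; returns true when the caller released the last one. */
static bool obj_put(struct obj *o)
{
    unsigned int old = atomic_fetch_sub(&o->refs, 1);

    assert(old != 0); /* dropping a reference that was never held */
    return old == 1;
}
```

A lookup path that finds an object in a shared table would call `obj_get_unless_zero()` and skip the entry on `false`, instead of incrementing unconditionally and resurrecting an object that is being freed.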

commit 70ed714 upstream. KASAN detects a use-after-free when vop devices are removed. This problem was introduced by commit 0063e8b ("virtio_vop: don't kfree device on register failure"). That patch moved the freeing of the struct _vop_vdev to the release function, but failed to ensure that vop holds a reference to the device when it doesn't want it to go away. A kfree() was replaced with a put_device() in the unregistration path, but the last reference to the device is already dropped in unregister_virtio_device() so the struct is freed before vop is done with it. Fix it by holding a reference until cleanup is done. This is similar to the fix in virtio_pci in commit 2989be0 ("virtio_pci: fix use after free on release"). ================================================================== BUG: KASAN: use-after-free in vop_scan_devices+0xc6c/0xe50 [vop] Read of size 8 at addr ffff88800da18580 by task kworker/0:1/12 CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.0.0-rc4+ #53 Workqueue: events vop_hotplug_devices [vop] Call Trace: dump_stack+0x74/0xbb print_address_description+0x5d/0x2b0 ? vop_scan_devices+0xc6c/0xe50 [vop] kasan_report+0x152/0x1aa ? vop_scan_devices+0xc6c/0xe50 [vop] ? vop_scan_devices+0xc6c/0xe50 [vop] vop_scan_devices+0xc6c/0xe50 [vop] ? vop_loopback_free_irq+0x160/0x160 [vop_loopback] process_one_work+0x7c0/0x14b0 ? pwq_dec_nr_in_flight+0x2d0/0x2d0 ? do_raw_spin_lock+0x120/0x280 worker_thread+0x8f/0xbf0 ? __kthread_parkme+0x78/0xf0 ? process_one_work+0x14b0/0x14b0 kthread+0x2ae/0x3a0 ? 
kthread_park+0x120/0x120 ret_from_fork+0x3a/0x50 Allocated by task 12: kmem_cache_alloc_trace+0x13a/0x2a0 vop_scan_devices+0x473/0xe50 [vop] process_one_work+0x7c0/0x14b0 worker_thread+0x8f/0xbf0 kthread+0x2ae/0x3a0 ret_from_fork+0x3a/0x50 Freed by task 12: kfree+0x104/0x310 device_release+0x73/0x1d0 kobject_put+0x14f/0x420 unregister_virtio_device+0x32/0x50 vop_scan_devices+0x19d/0xe50 [vop] process_one_work+0x7c0/0x14b0 worker_thread+0x8f/0xbf0 kthread+0x2ae/0x3a0 ret_from_fork+0x3a/0x50 The buggy address belongs to the object at ffff88800da18008 which belongs to the cache kmalloc-2k of size 2048 The buggy address is located 1400 bytes inside of 2048-byte region [ffff88800da18008, ffff88800da18808) The buggy address belongs to the page: page:ffffea0000368600 count:1 mapcount:0 mapping:ffff88801440dbc0 index:0x0 compound_mapcount: 0 flags: 0x4000000000010200(slab|head) raw: 4000000000010200 ffffea0000378608 ffffea000037a008 ffff88801440dbc0 raw: 0000000000000000 00000000000d000d 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88800da18480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88800da18500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff88800da18580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff88800da18600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88800da18680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Fixes: 0063e8b ("virtio_vop: don't kfree device on register failure") Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 351cbf6 upstream. Zygo reported the following lockdep splat while testing the balance patches ====================================================== WARNING: possible circular locking dependency detected 5.6.0-c6f0579d496a+ #53 Not tainted ------------------------------------------------------ kswapd0/1133 is trying to acquire lock: ffff888092f622c0 (&delayed_node->mutex){+.+.}, at: __btrfs_release_delayed_node+0x7c/0x5b0 but task is already holding lock: ffffffff8fc5f860 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (fs_reclaim){+.+.}: fs_reclaim_acquire.part.91+0x29/0x30 fs_reclaim_acquire+0x19/0x20 kmem_cache_alloc_trace+0x32/0x740 add_block_entry+0x45/0x260 btrfs_ref_tree_mod+0x6e2/0x8b0 btrfs_alloc_tree_block+0x789/0x880 alloc_tree_block_no_bg_flush+0xc6/0xf0 __btrfs_cow_block+0x270/0x940 btrfs_cow_block+0x1ba/0x3a0 btrfs_search_slot+0x999/0x1030 btrfs_insert_empty_items+0x81/0xe0 btrfs_insert_delayed_items+0x128/0x7d0 __btrfs_run_delayed_items+0xf4/0x2a0 btrfs_run_delayed_items+0x13/0x20 btrfs_commit_transaction+0x5cc/0x1390 insert_balance_item.isra.39+0x6b2/0x6e0 btrfs_balance+0x72d/0x18d0 btrfs_ioctl_balance+0x3de/0x4c0 btrfs_ioctl+0x30ab/0x44a0 ksys_ioctl+0xa1/0xe0 __x64_sys_ioctl+0x43/0x50 do_syscall_64+0x77/0x2c0 entry_SYSCALL_64_after_hwframe+0x49/0xbe -> #0 (&delayed_node->mutex){+.+.}: __lock_acquire+0x197e/0x2550 lock_acquire+0x103/0x220 __mutex_lock+0x13d/0xce0 mutex_lock_nested+0x1b/0x20 __btrfs_release_delayed_node+0x7c/0x5b0 btrfs_remove_delayed_node+0x49/0x50 btrfs_evict_inode+0x6fc/0x900 evict+0x19a/0x2c0 dispose_list+0xa0/0xe0 prune_icache_sb+0xbd/0xf0 super_cache_scan+0x1b5/0x250 do_shrink_slab+0x1f6/0x530 shrink_slab+0x32e/0x410 shrink_node+0x2a5/0xba0 balance_pgdat+0x4bd/0x8a0 kswapd+0x35a/0x800 kthread+0x1e9/0x210 ret_from_fork+0x3a/0x50 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 
---- ---- lock(fs_reclaim); lock(&delayed_node->mutex); lock(fs_reclaim); lock(&delayed_node->mutex); *** DEADLOCK *** 3 locks held by kswapd0/1133: #0: ffffffff8fc5f860 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30 #1: ffffffff8fc380d8 (shrinker_rwsem){++++}, at: shrink_slab+0x1e8/0x410 #2: ffff8881e0e6c0e8 (&type->s_umount_key#42){++++}, at: trylock_super+0x1b/0x70 stack backtrace: CPU: 2 PID: 1133 Comm: kswapd0 Not tainted 5.6.0-c6f0579d496a+ #53 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 Call Trace: dump_stack+0xc1/0x11a print_circular_bug.isra.38.cold.57+0x145/0x14a check_noncircular+0x2a9/0x2f0 ? print_circular_bug.isra.38+0x130/0x130 ? stack_trace_consume_entry+0x90/0x90 ? save_trace+0x3cc/0x420 __lock_acquire+0x197e/0x2550 ? btrfs_inode_clear_file_extent_range+0x9b/0xb0 ? register_lock_class+0x960/0x960 lock_acquire+0x103/0x220 ? __btrfs_release_delayed_node+0x7c/0x5b0 __mutex_lock+0x13d/0xce0 ? __btrfs_release_delayed_node+0x7c/0x5b0 ? __asan_loadN+0xf/0x20 ? pvclock_clocksource_read+0xeb/0x190 ? __btrfs_release_delayed_node+0x7c/0x5b0 ? mutex_lock_io_nested+0xc20/0xc20 ? __kasan_check_read+0x11/0x20 ? check_chain_key+0x1e6/0x2e0 mutex_lock_nested+0x1b/0x20 ? mutex_lock_nested+0x1b/0x20 __btrfs_release_delayed_node+0x7c/0x5b0 btrfs_remove_delayed_node+0x49/0x50 btrfs_evict_inode+0x6fc/0x900 ? btrfs_setattr+0x840/0x840 ? do_raw_spin_unlock+0xa8/0x140 evict+0x19a/0x2c0 dispose_list+0xa0/0xe0 prune_icache_sb+0xbd/0xf0 ? invalidate_inodes+0x310/0x310 super_cache_scan+0x1b5/0x250 do_shrink_slab+0x1f6/0x530 shrink_slab+0x32e/0x410 ? do_shrink_slab+0x530/0x530 ? do_shrink_slab+0x530/0x530 ? __kasan_check_read+0x11/0x20 ? mem_cgroup_protected+0x13d/0x260 shrink_node+0x2a5/0xba0 balance_pgdat+0x4bd/0x8a0 ? mem_cgroup_shrink_node+0x490/0x490 ? _raw_spin_unlock_irq+0x27/0x40 ? finish_task_switch+0xce/0x390 ? rcu_read_lock_bh_held+0xb0/0xb0 kswapd+0x35a/0x800 ? _raw_spin_unlock_irqrestore+0x4c/0x60 ? 
balance_pgdat+0x8a0/0x8a0 ? finish_wait+0x110/0x110 ? __kasan_check_read+0x11/0x20 ? __kthread_parkme+0xc6/0xe0 ? balance_pgdat+0x8a0/0x8a0 kthread+0x1e9/0x210 ? kthread_create_worker_on_cpu+0xc0/0xc0 ret_from_fork+0x3a/0x50 This is because we hold that delayed node's mutex while doing tree operations. Fix this by just wrapping the searches in nofs. CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 51415b6 upstream.

[BUG]
When balance is canceled, there is a pretty high chance that unmounting the fs can lead to a NULL pointer dereference:

  BTRFS warning (device dm-3): page private not zero on page 223158272
  ...
  BTRFS warning (device dm-3): page private not zero on page 223162368
  BTRFS error (device dm-3): leaked root 18446744073709551608-304 refcount 1
  BUG: kernel NULL pointer dereference, address: 0000000000000168
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: 0000 [#1] PREEMPT SMP NOPTI
  CPU: 2 PID: 5793 Comm: umount Tainted: G O 5.7.0-rc5-custom+ #53
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  RIP: 0010:__lock_acquire+0x5dc/0x24c0
  Call Trace:
   lock_acquire+0xab/0x390
   _raw_spin_lock+0x39/0x80
   btrfs_release_extent_buffer_pages+0xd7/0x200 [btrfs]
   release_extent_buffer+0xb2/0x170 [btrfs]
   free_extent_buffer+0x66/0xb0 [btrfs]
   btrfs_put_root+0x8e/0x130 [btrfs]
   btrfs_check_leaked_roots.cold+0x5/0x5d [btrfs]
   btrfs_free_fs_info+0xe5/0x120 [btrfs]
   btrfs_kill_super+0x1f/0x30 [btrfs]
   deactivate_locked_super+0x3b/0x80
   deactivate_super+0x3e/0x50
   cleanup_mnt+0x109/0x160
   __cleanup_mnt+0x12/0x20
   task_work_run+0x67/0xa0
   exit_to_usermode_loop+0xc5/0xd0
   syscall_return_slowpath+0x205/0x360
   do_syscall_64+0x6e/0xb0
   entry_SYSCALL_64_after_hwframe+0x49/0xb3
  RIP: 0033:0x7fd028ef740b

[CAUSE]
When balance is canceled, all reloc roots are marked as orphan, and orphan reloc roots are going to be cleaned up.

However, for orphan reloc roots and merged reloc roots, the lifespans are quite different:

      Merged reloc roots          |      Orphan reloc roots by cancel
  --------------------------------------------------------------------
  create_reloc_root()             | create_reloc_root()
  |- refs == 1                    | |- refs == 1
                                  |
  btrfs_grab_root(reloc_root);    | btrfs_grab_root(reloc_root);
  |- refs == 2                    | |- refs == 2
                                  |
  root->reloc_root = reloc_root;  | root->reloc_root = reloc_root;
               >>> No difference so far <<<
                                  |
  prepare_to_merge()              | prepare_to_merge()
  |- btrfs_set_root_refs(item, 1);| |- if (!err) (err == -EINTR)
                                  |
  merge_reloc_roots()             | merge_reloc_roots()
  |- merge_reloc_root()           | |- Doing nothing to put reloc root
     |- insert_dirty_subvol()     | |- refs == 2
        |- __del_reloc_root()     |
           |- btrfs_put_root()    |
              |- refs == 1        |
          >>> Now orphan reloc roots still have refs 2 <<<
                                  |
  clean_dirty_subvols()           | clean_dirty_subvols()
  |- btrfs_drop_snapshot()        | |- btrfs_drop_snapshot()
     |- reloc_root get freed      |    |- reloc_root still has refs 2
                                  |       related ebs get freed, but
                                  |       reloc_root still recorded in
                                  |       allocated_roots
  btrfs_check_leaked_roots()      | btrfs_check_leaked_roots()
  |- No leaked roots              | |- Leaked reloc_roots detected
                                  |    |- btrfs_put_root()
                                  |       |- free_extent_buffer(root->node);
                                  |          |- eb already freed, caused NULL
                                  |             pointer dereference

[FIX]
The fix is to clear fs_root->reloc_root and put it at merge_reloc_roots() time, so that we won't leak reloc roots.

Fixes: d2311e6 ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots")
CC: stable@vger.kernel.org # 5.1+
Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[Manually solve the conflicts due to no btrfs root refs rework]
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

[ Upstream commit 05cf8ff ]

The to_ti_syscon_reset_data macro currently only works if the parameter passed into it is called 'rcdev'.

Fixes a checkpatch --strict issue:

  CHECK: Macro argument reuse 'rcdev' - possible side-effects?
  #53: FILE: drivers/reset/reset-ti-syscon.c:53:
  +#define to_ti_syscon_reset_data(rcdev) \
  +	container_of(rcdev, struct ti_syscon_reset_data, rcdev)

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
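The macro only worked with an argument named 'rcdev' because the preprocessor substitutes the parameter everywhere in the body, including where 'rcdev' is meant as the struct member name. The sketch below shows one common way to avoid the collision by renaming the parameter. It assumes a simplified user-space `container_of()` and cut-down structures, and `reset_data_id()` is a hypothetical helper; the actual upstream diff is not quoted in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, cut-down versions of the driver's structures. */
struct reset_controller_dev {
    int nr_resets;
};

struct ti_syscon_reset_data {
    int id;
    struct reset_controller_dev rcdev;
};

/*
 * The parameter is named _rcdev so it no longer collides with the
 * 'rcdev' member name passed as the third argument of container_of().
 */
#define to_ti_syscon_reset_data(_rcdev) \
    container_of(_rcdev, struct ti_syscon_reset_data, rcdev)

/* Works even though the local pointer is called 'dev', not 'rcdev'. */
static int reset_data_id(struct reset_controller_dev *dev)
{
    assert(dev != NULL);
    return to_ti_syscon_reset_data(dev)->id;
}
```

With the original definition, `to_ti_syscon_reset_data(dev)` would expand to `container_of(dev, struct ti_syscon_reset_data, dev)` and fail to compile, because the structure has no member named `dev`.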

[ Upstream commit c0bf3d8 ]

We encountered a crash in smc_setsockopt() and it is caused by accessing smc->clcsock after clcsock was released.

  BUG: kernel NULL pointer dereference, address: 0000000000000020
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: 0000 [#1] PREEMPT SMP PTI
  CPU: 1 PID: 50309 Comm: nginx Kdump: loaded Tainted: G E 5.16.0-rc4+ #53
  RIP: 0010:smc_setsockopt+0x59/0x280 [smc]
  Call Trace:
   <TASK>
   __sys_setsockopt+0xfc/0x190
   __x64_sys_setsockopt+0x20/0x30
   do_syscall_64+0x34/0x90
   entry_SYSCALL_64_after_hwframe+0x44/0xae
  RIP: 0033:0x7f16ba83918e
   </TASK>

This patch tries to fix it by holding clcsock_release_lock and checking whether clcsock has already been released before access.

In case a crash for the same reason happens in smc_getsockopt() or smc_switch_to_fallback(), this patch also checks smc->clcsock in them. The caller of smc_switch_to_fallback() will identify whether fallback succeeds according to the return value.

Fixes: fd57770 ("net/smc: wait for pending work before clcsock release_sock")
Link: https://lore.kernel.org/lkml/5dd7ffd1-28e2-24cc-9442-1defec27375e@linux.ibm.com/T/
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Acked-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>

commit 2af89eb upstream.

coresight devices track their connections (output connections) and hold a reference to the fwnode. When a device goes away, we walk through the devices on the coresight bus and make sure that the references are dropped. This happens both ways:

 a) For all output connections from the device, drop the reference to the target device via coresight_release_platform_data()

 b) Iterate over all the devices on the coresight bus and drop the reference to fwnode if *this* device is the target of the output connection, via coresight_remove_conns()->coresight_remove_match().

However, coresight_remove_match() doesn't clear the fwnode field after dropping the reference. This causes a use-after-free and additional refcount drops on the fwnode.

e.g., if we have two devices, A and B, with a connection, A -> B. If we remove B first, B would clear the reference on B, from A via coresight_remove_match(). But when A is removed, it still has a connection with fwnode still pointing to B. Thus it tries to drop the reference in coresight_release_platform_data(), raising the bells like:

[ 91.990153] ------------[ cut here ]------------
[ 91.990163] refcount_t: addition on 0; use-after-free.
[ 91.990212] WARNING: CPU: 0 PID: 461 at lib/refcount.c:25 refcount_warn_saturate+0xa0/0x144
[ 91.990260] Modules linked in: coresight_funnel coresight_replicator coresight_etm4x(-) crct10dif_ce coresight ip_tables x_tables ipv6 [last unloaded: coresight_cpu_debug]
[ 91.990398] CPU: 0 PID: 461 Comm: rmmod Tainted: G W T 5.19.0-rc2+ #53
[ 91.990418] Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform, BIOS EDK II Feb 1 2019
[ 91.990434] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 91.990454] pc : refcount_warn_saturate+0xa0/0x144
[ 91.990476] lr : refcount_warn_saturate+0xa0/0x144
[ 91.990496] sp : ffff80000c843640
[ 91.990509] x29: ffff80000c843640 x28: ffff800009957c28 x27: ffff80000c8439a8
[ 91.990560] x26: ffff00097eff1990 x25: ffff8000092b6ad8 x24: ffff00097eff19a8
[ 91.990610] x23: ffff80000c8439a8 x22: 0000000000000000 x21: ffff80000c8439c2
[ 91.990659] x20: 0000000000000000 x19: ffff00097eff1a10 x18: ffff80000ab99c40
[ 91.990708] x17: 0000000000000000 x16: 0000000000000000 x15: ffff80000abf6fa0
[ 91.990756] x14: 000000000000001d x13: 0a2e656572662d72 x12: 657466612d657375
[ 91.990805] x11: 203b30206e6f206e x10: 6f69746964646120 x9 : ffff8000081aba28
[ 91.990854] x8 : 206e6f206e6f6974 x7 : 69646461203a745f x6 : 746e756f63666572
[ 91.990903] x5 : ffff00097648ec58 x4 : 0000000000000000 x3 : 0000000000000027
[ 91.990952] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff00080260ba00
[ 91.991000] Call trace:
[ 91.991012] refcount_warn_saturate+0xa0/0x144
[ 91.991034] kobject_get+0xac/0xb0
[ 91.991055] of_node_get+0x2c/0x40
[ 91.991076] of_fwnode_get+0x40/0x60
[ 91.991094] fwnode_handle_get+0x3c/0x60
[ 91.991116] fwnode_get_nth_parent+0xf4/0x110
[ 91.991137] fwnode_full_name_string+0x48/0xc0
[ 91.991158] device_node_string+0x41c/0x530
[ 91.991178] pointer+0x320/0x3ec
[ 91.991198] vsnprintf+0x23c/0x750
[ 91.991217] vprintk_store+0x104/0x4b0
[ 91.991238] vprintk_emit+0x8c/0x360
[ 91.991257] vprintk_default+0x44/0x50
[ 91.991276] vprintk+0xcc/0xf0
[ 91.991295] _printk+0x68/0x90
[ 91.991315] of_node_release+0x13c/0x14c
[ 91.991334] kobject_put+0x98/0x114
[ 91.991354] of_node_put+0x24/0x34
[ 91.991372] of_fwnode_put+0x40/0x5c
[ 91.991390] fwnode_handle_put+0x38/0x50
[ 91.991411] coresight_release_platform_data+0x74/0xb0 [coresight]
[ 91.991472] coresight_unregister+0x64/0xcc [coresight]
[ 91.991525] etm4_remove_dev+0x64/0x78 [coresight_etm4x]
[ 91.991563] etm4_remove_amba+0x1c/0x2c [coresight_etm4x]
[ 91.991598] amba_remove+0x3c/0x19c

Reproducible by: (Build all coresight components as modules):

  #!/bin/sh
  while true
  do
      for m in tmc stm cpu_debug etm4x replicator funnel
      do
          modprobe coresight_${m}
      done

      for m in tmc stm cpu_debug etm4x replicator funnel
      do
          rmmod coresight_${m}
      done
  done

Cc: stable@vger.kernel.org
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Fixes: 37ea1ff ("coresight: Use fwnode handle instead of device names")
Link: https://lore.kernel.org/r/20220614214024.3005275-1-suzuki.poulose@arm.com
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
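The failure mode in the coresight message is a stale pointer left behind after its reference was dropped, so a second teardown path drops it again. A minimal user-space sketch of the "put, then clear the field" pattern follows; `struct node`, `struct conn`, `node_put()` and `conn_release()` are hypothetical stand-ins, not the coresight or fwnode API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted node standing in for a fwnode. */
struct node {
    int refs;
};

/* Hypothetical connection record that holds one counted reference. */
struct conn {
    struct node *child;
};

/* Drop one reference; a NULL pointer means nothing is held. */
static void node_put(struct node *n)
{
    if (!n)
        return;
    assert(n->refs > 0); /* catches an extra drop */
    n->refs--;
}

/*
 * Drop the reference and clear the field, so that a later teardown
 * pass over the same connection cannot drop the reference again.
 */
static void conn_release(struct conn *c)
{
    node_put(c->child);
    c->child = NULL;
}
```

Calling `conn_release()` twice on the same connection is then harmless: the second call sees a NULL `child` and does nothing, which corresponds to the A -> B case where both removal paths visit the same connection.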

When unregistering pd capabilities in tcpm, KASAN will capture the double-free issue below. The root cause is that the same capability is kfreed twice: the first time by pd_capabilities_release() and the second time explicitly by tcpm_port_unregister_pd().

[ 3.988059] BUG: KASAN: double-free in tcpm_port_unregister_pd+0x1a4/0x3dc
[ 3.995001] Free of addr ffff0008164d3000 by task kworker/u16:0/10
[ 4.001206]
[ 4.002712] CPU: 2 PID: 10 Comm: kworker/u16:0 Not tainted 6.8.0-rc5-next-20240220-05616-g52728c567a55 openbmc#53
[ 4.012402] Hardware name: Freescale i.MX8QXP MEK (DT)
[ 4.017569] Workqueue: events_unbound deferred_probe_work_func
[ 4.023456] Call trace:
[ 4.025920] dump_backtrace+0x94/0xec
[ 4.029629] show_stack+0x18/0x24
[ 4.032974] dump_stack_lvl+0x78/0x90
[ 4.036675] print_report+0xfc/0x5c0
[ 4.040289] kasan_report_invalid_free+0xa0/0xc0
[ 4.044937] __kasan_slab_free+0x124/0x154
[ 4.049072] kfree+0xb4/0x1e8
[ 4.052069] tcpm_port_unregister_pd+0x1a4/0x3dc
[ 4.056725] tcpm_register_port+0x1dd0/0x2558
[ 4.061121] tcpci_register_port+0x420/0x71c
[ 4.065430] tcpci_probe+0x118/0x2e0

To fix the issue, this will remove kfree() from tcpm_port_unregister_pd().

Fixes: cd099cd ("usb: typec: tcpm: Support multiple capabilities")
cc: stable@vger.kernel.org
Suggested-by: Aisheng Dong <aisheng.dong@nxp.com>
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://lore.kernel.org/r/20240311065219.777037-1-xu.yang_2@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
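The double free described above comes from two places both believing they own the final free. A minimal user-space sketch of the intended ownership follows, with the release callback as the only place that frees. The `struct pd_caps` layout and the `caps_*` functions are hypothetical and are not the tcpm or usb_power_delivery API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical capability object that is freed by its release callback. */
struct pd_caps {
    int refs;
    void (*release)(struct pd_caps *caps);
};

static int caps_freed; /* counts how many times the release callback ran */

static void caps_release(struct pd_caps *caps)
{
    caps_freed++;
    free(caps);
}

static struct pd_caps *caps_register(void)
{
    struct pd_caps *caps = calloc(1, sizeof(*caps));

    if (!caps)
        return NULL;
    caps->refs = 1;
    caps->release = caps_release;
    return caps;
}

/* Dropping the last reference frees the object through ->release(). */
static void caps_unregister(struct pd_caps *caps)
{
    assert(caps->refs > 0);
    if (--caps->refs == 0)
        caps->release(caps);
    /* No free(caps) here: the release callback is the single owner of it. */
}
```

Adding a `free(caps)` after the `release()` call in `caps_unregister()` would reproduce the shape of the reported bug, which is why the fix removes the explicit free rather than the callback.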

nkskjames added the bug label Mar 4, 2016

shenki closed this as completed Mar 29, 2016


shenki commented Feb 24, 2016

shenki commented Feb 24, 2016

shenki commented Feb 24, 2016

nkskjames commented Feb 26, 2016

nkskjames commented Mar 3, 2016

gwshan commented Mar 7, 2016

gwshan commented Mar 7, 2016

nkskjames commented Mar 7, 2016

shenki commented Mar 29, 2016
