-
Notifications
You must be signed in to change notification settings - Fork 466
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
VGA does not work with 3.0.31 #25
Comments
zgrep -i "pty" /proc/config.gz CONFIG_LEGACY_PTYS is not setMaybe CONFIG_LEGACY_PTYS should be enabled. I'll check tomorrow if nobody has... |
Oh well, it was just because the disp, hdmi and lcd are build as modules with the config provided in github... |
Adding disp, hdmi and lcd to /etc/modules fixes a "no screen" issue in Xorg.o.log, but the monitor remain black (both for HDMI and VGA), and I'm not sure what to do since I can't see errors. |
"Fixed". The modules need to be loaded in a particular order: If hdmi is loaded before lcd, it will fail. |
…d reasons commit 5cf02d0 upstream. We've had some reports of a deadlock where rpciod ends up with a stack trace like this: PID: 2507 TASK: ffff88103691ab40 CPU: 14 COMMAND: "rpciod/14" #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9 #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs] #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8 #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs] #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs] #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670 #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271 #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638 #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942 #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9 #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808 #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6 #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7 #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc] #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc] #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0 #24 [ffff8810343bfee8] kthread at ffffffff8108dd96 #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca rpciod is trying to allocate memory for a new socket to talk to the server. The VM ends up calling ->releasepage to get more memory, and it tries to do a blocking commit. That commit can't succeed however without a connected socket, so we deadlock. Fix this by setting PF_FSTRANS on the workqueue task prior to doing the socket allocation, and having nfs_release_page check for that flag when deciding whether to do a commit call. Also, set PF_FSTRANS unconditionally in rpc_async_schedule since that function can also do allocations sometimes. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
…d reasons commit 5cf02d0 upstream. We've had some reports of a deadlock where rpciod ends up with a stack trace like this: PID: 2507 TASK: ffff88103691ab40 CPU: 14 COMMAND: "rpciod/14" #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9 #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs] #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8 #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs] #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs] #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670 #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271 #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638 #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942 #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9 #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808 #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6 #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7 #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc] #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc] #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0 #24 [ffff8810343bfee8] kthread at ffffffff8108dd96 #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca rpciod is trying to allocate memory for a new socket to talk to the server. The VM ends up calling ->releasepage to get more memory, and it tries to do a blocking commit. That commit can't succeed however without a connected socket, so we deadlock. Fix this by setting PF_FSTRANS on the workqueue task prior to doing the socket allocation, and having nfs_release_page check for that flag when deciding whether to do a commit call. Also, set PF_FSTRANS unconditionally in rpc_async_schedule since that function can also do allocations sometimes. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
LogFS does not use a specialized area to maintain the inodes. The inodes information is kept in a specialized file called inode file. Similarly, the segment information is kept in a segment file. Since the segment file also has an inode which is kept in the inode file, the inode for segment file must be evicted before the inode for inode file. The change fixes the following BUG during unmount Pid: 2057, comm: umount Not tainted 3.5.0-rc6+ #25 Bochs Bochs RIP: 0010:[<ffffffffa005c5f2>] [<ffffffffa005c5f2>] move_page_to_btree+0x32/0x1f0 [logfs] Process umount (pid: 2057, threadinfo ...) Call Trace: [<ffffffff8112adca>] ? find_get_pages+0x2a/0x180 [<ffffffffa00549f5>] logfs_invalidatepage+0x85/0x90 [logfs] [<ffffffff81136c51>] truncate_inode_page+0xb1/0xd0 [<ffffffff81136dcf>] truncate_inode_pages_range+0x15f/0x490 [<ffffffff81558549>] ? printk+0x78/0x7a [<ffffffff81137185>] truncate_inode_pages+0x15/0x20 [<ffffffffa005b7fc>] logfs_evict_inode+0x6c/0x190 [logfs] [<ffffffff8155c75b>] ? _raw_spin_unlock+0x2b/0x40 [<ffffffff8119e3d7>] evict+0xa7/0x1b0 [<ffffffff8119ea6e>] dispose_list+0x3e/0x60 [<ffffffff8119f1c4>] evict_inodes+0xf4/0x110 [<ffffffff81185b53>] generic_shutdown_super+0x53/0xf0 [<ffffffffa005d8f2>] logfs_kill_sb+0x52/0xf0 [logfs] [<ffffffff81185ec5>] deactivate_locked_super+0x45/0x80 [<ffffffff81186a4a>] deactivate_super+0x4a/0x70 [<ffffffff811a228e>] mntput_no_expire+0xde/0x140 [<ffffffff811a30ff>] sys_umount+0x6f/0x3a0 [<ffffffff8155d8e9>] system_call_fastpath+0x16/0x1b ---[ end trace 45f7752082cefafd ]--- Signed-off-by: Prasad Joshi <prasadjoshi.linux@gmail.com>
…d reasons We've had some reports of a deadlock where rpciod ends up with a stack trace like this: PID: 2507 TASK: ffff88103691ab40 CPU: 14 COMMAND: "rpciod/14" #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9 #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs] #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8 #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs] #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs] #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670 #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271 #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638 #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942 #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9 #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808 #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6 #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7 #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc] #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc] #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0 #24 [ffff8810343bfee8] kthread at ffffffff8108dd96 #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca rpciod is trying to allocate memory for a new socket to talk to the server. The VM ends up calling ->releasepage to get more memory, and it tries to do a blocking commit. That commit can't succeed however without a connected socket, so we deadlock. Fix this by setting PF_FSTRANS on the workqueue task prior to doing the socket allocation, and having nfs_release_page check for that flag when deciding whether to do a commit call. Also, set PF_FSTRANS unconditionally in rpc_async_schedule since that function can also do allocations sometimes. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
This patch fixes stmmac_pltfr_remove function, which is broken because, it is accessing plat variable via freed memory priv pointer which gets freed by free_netdev called from stmmac_dvr_remove. In short this patch caches the plat pointer in local variable before calling stmmac_dvr_remove to prevent code accessing freed memory. Without this patch any attempt to remove the stmmac device will fail as below: Unregistering eth 0 ... Unable to handle kernel paging request at virtual address 6b6b6bab pgd = de5dc000 [6b6b6bab] *pgd=00000000 Internal error: Oops: 5 [#1] PREEMPT SMP Modules linked in: cdev(O+) CPU: 0 Tainted: G O (3.3.1_stm24_0210-b2000+ #25) PC is at stmmac_pltfr_remove+0x2c/0xa0 LR is at stmmac_pltfr_remove+0x28/0xa0 pc : [<c01b8908>] lr : [<c01b8904>] psr: 60000013 sp : def6be78 ip : de6c5a00 fp : 00000000 r10: 00000028 r9 : c082d81d r8 : 00000001 r7 : de65a600 r6 : df81b240 r5 : c0413fd8 r4 : 00000000 r3 : 6b6b6b6b r2 : def6be6c r1 : c0355e2b r0 : 00000020 Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user Control: 10c53c7d Table: 5e5dc04a DAC: 00000015 Process insmod (pid: 738, stack limit = 0xdef6a2f0) Stack: (0xdef6be78 to 0xdef6c000) be60: c0413fe0 c0403658 be80: c0400bb0 c019270c c01926f8 c0191478 00000000 c0414014 c0413fe0 c01914d8 bea0: 00000000 c0413fe0 df8045d0 c019109c c0413fe0 c0400bf0 c0413fd8 c018f04c bec0: 00000000 bf000000 c0413fd8 c01929a0 c0413fd8 bf000000 00000000 c0192bfc bee0: bf00009c bf000014 def6a000 c000859c 00000000 00000001 bf00009c bf00009c bf00: 00000001 bf00009c 00000001 bf0000e4 de65a600 00000001 c082d81d c0058cd0 bf20: bf0000a8 c004fbd8 c0056414 c082d815 c02aea20 bf0001f0 00b0b008 e0846208 bf40: c03ec8a0 e0846000 0000db0d e0850604 e08504de e0853a24 00000204 000002d4 bf60: 00000000 00000000 0000001c 0000001d 00000009 00000000 00000006 00000000 bf80: 00000003 f63d4e2e 0000db0d bef02ed8 00000080 c000d2e8 def6a000 00000000 bfa0: 00000000 c000d140 f63d4e2e 0000db0d 00b0b018 0000db0d 00b0b008 b6f4f298 bfc0: f63d4e2e 0000db0d bef02ed8 00000080 00000003 00000000 00010000 00000000 bfe0: 00b0b008 bef02c64 00008d20 b6ef3784 60000010 00b0b018 5a5a5a5a 5a5a5a5a [<c01b8908>] (stmmac_pltfr_remove+0x2c/0xa0) from [<c019270c>] (platform_drv_remove+0x14/0x18) [<c019270c>] (platform_drv_remove+0x14/0x18) from [<c0191478>] (__device_release_driver+0x64/0xa4) [<c0191478>] (__device_release_driver+0x64/0xa4) from [<c01914d8>] (device_release_driver+0x20/0x2c) [<c01914d8>] (device_release_driver+0x20/0x2c) from [<c019109c>] (bus_remove_device+0xcc/0xdc) [<c019109c>] (bus_remove_device+0xcc/0xdc) from [<c018f04c>] (device_del+0x104/0x160) [<c018f04c>] (device_del+0x104/0x160) from [<c01929a0>] (platform_device_del+0x18/0x58) [<c01929a0>] (platform_device_del+0x18/0x58) from [<c0192bfc>] (platform_device_unregister+0xc/0x18) [<c0192bfc>] (platform_device_unregister+0xc/0x18) from [<bf000014>] (r_init+0x14/0x2c [cdev]) [<bf000014>] (r_init+0x14/0x2c [cdev]) from [<c000859c>] (do_one_initcall+0x90/0x160) [<c000859c>] (do_one_initcall+0x90/0x160) from [<c0058cd0>] (sys_init_module+0x15c4/0x1794) [<c0058cd0>] (sys_init_module+0x15c4/0x1794) from [<c000d140>] (ret_fast_syscall+0x0/0x30) Code: e1a04000 e59f0070 eb039b65 e59636e4 (e5933040) Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Check for the presence of the '/cpus' OF node before dereferencing it blindly: [ 4.181793] Unable to handle kernel NULL pointer dereference at virtual address 0000001c [ 4.181793] pgd = c0004000 [ 4.181823] [0000001c] *pgd=00000000 [ 4.181823] Internal error: Oops: 5 [#1] SMP ARM [ 4.181823] Modules linked in: [ 4.181823] CPU: 1 Tainted: G W (3.8.0-15-generic #25~hbankD) [ 4.181854] PC is at of_get_next_child+0x64/0x70 [ 4.181854] LR is at of_get_next_child+0x24/0x70 [ 4.181854] pc : [<c04fda18>] lr : [<c04fd9d8>] psr: 60000113 [ 4.181854] sp : ed891ec0 ip : ed891ec0 fp : ed891ed4 [ 4.181884] r10: c04dafd0 r9 : c098690c r8 : c0936208 [ 4.181884] r7 : ed890000 r6 : c0a63d00 r5 : 00000000 r4 : 00000000 [ 4.181884] r3 : 00000000 r2 : 00000000 r1 : 00000000 r0 : c0b2acc8 [ 4.181884] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel [ 4.181884] Control: 10c5387d Table: adcb804a DAC: 00000015 [ 4.181915] Process swapper/0 (pid: 1, stack limit = 0xed890238) [ 4.181915] Stack: (0xed891ec0 to 0xed892000) [ 4.181915] 1ec0: c09b7b70 00000007 ed891efc ed891ed8 c04daff4 c04fd9c0 00000000 c09b7b70 [ 4.181915] 1ee0: 00000007 c0a63d00 ed890000 c0936208 ed891f54 ed891f00 c00088e0 c04dafdc [ 4.181945] 1f00: ed891f54 ed891f10 c006e940 00000000 00000000 00000007 00000007 c08a4914 [ 4.181945] 1f20: 00000000 c07dbd30 c0a63d00 c09b7b70 00000007 c0a63d00 000000bc c0936208 [ 4.181945] 1f40: c098690c c0986914 ed891f94 ed891f58 c0936a40 c00087bc 00000007 00000007 [ 4.181976] 1f60: c0936208 be8bda20 b6eea010 c0a63d00 c064547c 00000000 00000000 00000000 [ 4.181976] 1f80: 00000000 00000000 ed891fac ed891f98 c0645498 c09368c8 00000000 00000000 [ 4.181976] 1fa0: 00000000 ed891fb0 c0014658 c0645488 00000000 00000000 00000000 00000000 [ 4.182006] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 4.182006] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000 [ 4.182037] [<c04fda18>] (of_get_next_child+0x64/0x70) from [<c04daff4>] (cpu0_cpufreq_driver_init+0x24/0x284) [ 4.182067] [<c04daff4>] (cpu0_cpufreq_driver_init+0x24/0x284) from [<c00088e0>] (do_one_initcall+0x130/0x1b0) [ 4.182067] [<c00088e0>] (do_one_initcall+0x130/0x1b0) from [<c0936a40>] (kernel_init_freeable+0x184/0x24c) [ 4.182098] [<c0936a40>] (kernel_init_freeable+0x184/0x24c) from [<c0645498>] (kernel_init+0x1c/0xf4) [ 4.182128] [<c0645498>] (kernel_init+0x1c/0xf4) from [<c0014658>] (ret_from_fork+0x14/0x20) [ 4.182128] Code: f57ff04f e320f004 e89da830 e89da830 (e595001c) [ 4.182128] ---[ end trace 634903a22e8609cb ]--- [ 4.182189] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b [ 4.182189] [ 4.642395] CPU0: stopping [rjw: Changelog] Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Several people reported the warning: "kernel BUG at kernel/timer.c:729!" and the stack trace is: #7 [ffff880214d25c10] mod_timer+501 at ffffffff8106d905 #8 [ffff880214d25c50] br_multicast_del_pg.isra.20+261 at ffffffffa0731d25 [bridge] #9 [ffff880214d25c80] br_multicast_disable_port+88 at ffffffffa0732948 [bridge] #10 [ffff880214d25cb0] br_stp_disable_port+154 at ffffffffa072bcca [bridge] #11 [ffff880214d25ce8] br_device_event+520 at ffffffffa072a4e8 [bridge] #12 [ffff880214d25d18] notifier_call_chain+76 at ffffffff8164aafc #13 [ffff880214d25d50] raw_notifier_call_chain+22 at ffffffff810858f6 #14 [ffff880214d25d60] call_netdevice_notifiers+45 at ffffffff81536aad #15 [ffff880214d25d80] dev_close_many+183 at ffffffff81536d17 #16 [ffff880214d25dc0] rollback_registered_many+168 at ffffffff81537f68 #17 [ffff880214d25de8] rollback_registered+49 at ffffffff81538101 #18 [ffff880214d25e10] unregister_netdevice_queue+72 at ffffffff815390d8 #19 [ffff880214d25e30] __tun_detach+272 at ffffffffa074c2f0 [tun] #20 [ffff880214d25e88] tun_chr_close+45 at ffffffffa074c4bd [tun] #21 [ffff880214d25ea8] __fput+225 at ffffffff8119b1f1 #22 [ffff880214d25ef0] ____fput+14 at ffffffff8119b3fe #23 [ffff880214d25f00] task_work_run+159 at ffffffff8107cf7f #24 [ffff880214d25f30] do_notify_resume+97 at ffffffff810139e1 #25 [ffff880214d25f50] int_signal+18 at ffffffff8164f292 this is due to I forgot to check if mp->timer is armed in br_multicast_del_pg(). This bug is introduced by commit 9f00b2e (bridge: only expire the mdb entry when query is received). Same for __br_mdb_del(). Tested-by: poma <pomidorabelisima@gmail.com> Reported-by: LiYonghua <809674045@qq.com> Reported-by: Robert Hancock <hancockrwd@gmail.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
As the new x86 CPU bootup printout format code maintainer, I am taking immediate action to improve and clean (and thus indulge my OCD) the reporting of the cores when coming up online. Fix padding to a right-hand alignment, cleanup code and bind reporting width to the max number of supported CPUs on the system, like this: [ 0.074509] smpboot: Booting Node 0, Processors: #1 #2 #3 #4 #5 #6 #7 OK [ 0.644008] smpboot: Booting Node 1, Processors: #8 #9 #10 #11 #12 #13 #14 #15 OK [ 1.245006] smpboot: Booting Node 2, Processors: #16 #17 #18 #19 #20 #21 #22 #23 OK [ 1.864005] smpboot: Booting Node 3, Processors: #24 #25 #26 #27 #28 #29 #30 #31 OK [ 2.489005] smpboot: Booting Node 4, Processors: #32 #33 #34 #35 #36 #37 #38 #39 OK [ 3.093005] smpboot: Booting Node 5, Processors: #40 #41 #42 #43 #44 #45 #46 #47 OK [ 3.698005] smpboot: Booting Node 6, Processors: #48 #49 #50 #51 #52 #53 #54 #55 OK [ 4.304005] smpboot: Booting Node 7, Processors: #56 #57 #58 #59 #60 #61 #62 #63 OK [ 4.961413] Brought up 64 CPUs and this: [ 0.072367] smpboot: Booting Node 0, Processors: #1 #2 #3 #4 #5 #6 #7 OK [ 0.686329] Brought up 8 CPUs Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Libin <huawei.libin@huawei.com> Cc: wangyijing@huawei.com Cc: fenghua.yu@intel.com Cc: guohanjun@huawei.com Cc: paul.gortmaker@windriver.com Link: http://lkml.kernel.org/r/20130927143554.GF4422@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>
Turn it into (for example): [ 0.073380] x86: Booting SMP configuration: [ 0.074005] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 [ 0.603005] .... node #1, CPUs: #8 #9 #10 #11 #12 #13 #14 #15 [ 1.200005] .... node #2, CPUs: #16 #17 #18 #19 #20 #21 #22 #23 [ 1.796005] .... node #3, CPUs: #24 #25 #26 #27 #28 #29 #30 #31 [ 2.393005] .... node #4, CPUs: #32 #33 #34 #35 #36 #37 #38 #39 [ 2.996005] .... node #5, CPUs: #40 #41 #42 #43 #44 #45 #46 #47 [ 3.600005] .... node #6, CPUs: #48 #49 #50 #51 #52 #53 #54 #55 [ 4.202005] .... node #7, CPUs: #56 #57 #58 #59 #60 #61 #62 #63 [ 4.811005] .... node #8, CPUs: #64 #65 #66 #67 #68 #69 #70 #71 [ 5.421006] .... node #9, CPUs: #72 #73 #74 #75 #76 #77 #78 #79 [ 6.032005] .... node #10, CPUs: #80 #81 #82 #83 #84 #85 #86 #87 [ 6.648006] .... node #11, CPUs: #88 #89 #90 #91 #92 #93 #94 #95 [ 7.262005] .... node #12, CPUs: #96 #97 #98 #99 #100 #101 #102 #103 [ 7.865005] .... node #13, CPUs: #104 #105 #106 #107 #108 #109 #110 #111 [ 8.466005] .... node #14, CPUs: #112 #113 #114 #115 #116 #117 #118 #119 [ 9.073006] .... node #15, CPUs: #120 #121 #122 #123 #124 #125 #126 #127 [ 9.679901] x86: Booted up 16 nodes, 128 CPUs and drop useless elements. Change num_digits() to hpa's division-avoiding, cell-phone-typed version which he went at great lengths and pains to submit on a Saturday evening. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: huawei.libin@huawei.com Cc: wangyijing@huawei.com Cc: fenghua.yu@intel.com Cc: guohanjun@huawei.com Cc: paul.gortmaker@windriver.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20130930095624.GB16383@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit 7cb2ef5 ("mm: fix aio performance regression for database caused by THP") can cause dereference of a dangling pointer if split_huge_page runs during PageHuge() if there are updates to the tail_page->private field. Also it is repeating compound_head twice for hugetlbfs and it is running compound_head+compound_trans_head for THP when a single one is needed in both cases. The new code within the PageSlab() check doesn't need to verify that the THP page size is never bigger than the smallest hugetlbfs page size, to avoid memory corruption. A longstanding theoretical race condition was found while fixing the above (see the change right after the skip_unlock label, that is relevant for the compound_lock path too). By re-establishing the _mapcount tail refcounting for all compound pages, this also fixes the below problem: echo 0 >/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages BUG: Bad page state in process bash pfn:59a01 page:ffffea000139b038 count:0 mapcount:10 mapping: (null) index:0x0 page flags: 0x1c00000000008000(tail) Modules linked in: CPU: 6 PID: 2018 Comm: bash Not tainted 3.12.0+ #25 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 Call Trace: dump_stack+0x55/0x76 bad_page+0xd5/0x130 free_pages_prepare+0x213/0x280 __free_pages+0x36/0x80 update_and_free_page+0xc1/0xd0 free_pool_huge_page+0xc2/0xe0 set_max_huge_pages.part.58+0x14c/0x220 nr_hugepages_store_common.isra.60+0xd0/0xf0 nr_hugepages_store+0x13/0x20 kobj_attr_store+0xf/0x20 sysfs_write_file+0x189/0x1e0 vfs_write+0xc5/0x1f0 SyS_write+0x55/0xb0 system_call_fastpath+0x16/0x1b Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Tested-by: Khalid Aziz <khalid.aziz@oracle.com> Cc: Pravin Shelar <pshelar@nicira.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Christoph Lameter <cl@linux.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
…after split thp Memory failures on thp tail pages cause kernel panic like below: mce: [Hardware Error]: Machine check events logged MCE exception done on CPU 7 BUG: unable to handle kernel NULL pointer dereference at 0000000000000058 IP: [<ffffffff811b7cd1>] dequeue_hwpoisoned_huge_page+0x131/0x1e0 PGD bae42067 PUD ba47d067 PMD 0 Oops: 0000 [#1] SMP ... CPU: 7 PID: 128 Comm: kworker/7:2 Tainted: G M O 3.13.0-rc4-131217-1558-00003-g83b7df08e462 #25 ... Call Trace: me_huge_page+0x3e/0x50 memory_failure+0x4bb/0xc20 mce_process_work+0x3e/0x70 process_one_work+0x171/0x420 worker_thread+0x11b/0x3a0 ? manage_workers.isra.25+0x2b0/0x2b0 kthread+0xe4/0x100 ? kthread_create_on_node+0x190/0x190 ret_from_fork+0x7c/0xb0 ? kthread_create_on_node+0x190/0x190 ... RIP dequeue_hwpoisoned_huge_page+0x131/0x1e0 CR2: 0000000000000058 The reasoning of this problem is shown below: - when we have a memory error on a thp tail page, the memory error handler grabs a refcount of the head page to keep the thp under us. - Before unmapping the error page from processes, we split the thp, where page refcounts of both of head/tail pages don't change. - Then we call try_to_unmap() over the error page (which was a tail page before). We didn't pin the error page to handle the memory error, this error page is freed and removed from LRU list. - We never have the error page on LRU list, so the first page state check returns "unknown page," then we move to the second check with the saved page flag. - The saved page flag have PG_tail set, so the second page state check returns "hugepage." - We call me_huge_page() for freed error page, then we hit the above panic. The root cause is that we didn't move refcount from the head page to the tail page after split thp. So this patch suggests to do this. This panic was introduced by commit 524fca1 ("HWPOISON: fix misjudgement of page_action() for errors on mlocked pages"). Note that we did have the same refcount problem before this commit, but it was just ignored because we had only first page state check which returned "unknown page." The commit changed the refcount problem from "doesn't work" to "kernel panic." Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: <stable@vger.kernel.org> [3.9+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
…ce_activate power_supply_register() calls device_init_wakeup() to register a wakeup source before initializing dev_name. As a result, device_wakeup_enable() end up registering wakeup source with a null name when wakeup_source_register() gets called with dev_name(dev) which is null at the time. When kernel is booted with wakeup_source_activate enabled, it will panic when the trace point code tries to dereference ws->name. Fixed the problem by moving up the kobject_set_name() call prior to accesses to dev_name(). Replaced kobject_set_name() with dev_set_name() which is the right interface to be called from drivers. Fixed the call to device_del() prior to device_add() in for wakeup_init_failed error handling code. Trace after the change: bash-2143 [003] d... 132.280697: wakeup_source_activate: BAT1 state=0x20001 kworker/3:2-1169 [003] d... 132.281305: wakeup_source_deactivate: BAT1 state=0x30000 Oops message: [ 819.769934] device: 'BAT1': device_add [ 819.770078] PM: Adding info for No Bus:BAT1 [ 819.770235] BUG: unable to handle kernel NULL pointer dereference at (null) [ 819.770435] IP: [<ffffffff813381c0>] skip_spaces+0x30/0x30 [ 819.770572] PGD 3efd90067 PUD 3eff61067 PMD 0 [ 819.770716] Oops: 0000 [#1] SMP [ 819.770829] Modules linked in: arc4 iwldvm mac80211 x86_pkg_temp_thermal coretemp kvm_intel joydev i915 kvm uvcvideo ghash_clmulni_intel videobuf2_vmalloc aesni_intel videobuf2_memops videobuf2_core aes_x86_64 ablk_helper cryptd videodev iwlwifi lrw rfcomm gf128mul glue_helper bnep btusb media bluetooth parport_pc hid_generic ppdev snd_hda_codec_hdmi drm_kms_helper snd_hda_codec_realtek cfg80211 drm tpm_infineon samsung_laptop snd_hda_intel usbhid snd_hda_codec hid snd_hwdep snd_pcm microcode snd_page_alloc snd_timer psmouse i2c_algo_bit lpc_ich tpm_tis video wmi mac_hid serio_raw ext2 lp parport r8169 mii [ 819.771802] CPU: 0 PID: 2167 Comm: bash Not tainted 3.12.0+ #25 [ 819.771876] Hardware name: SAMSUNG ELECTRONICS CO., LTD. 900X3C/900X3D/900X4C/900X4D/SAMSUNG_NP1234567890, BIOS P03AAC 07/12/2012 [ 819.772022] task: ffff88002e6ddcc0 ti: ffff8804015ca000 task.ti: ffff8804015ca000 [ 819.772119] RIP: 0010:[<ffffffff813381c0>] [<ffffffff813381c0>] skip_spaces+0x30/0x30 [ 819.772242] RSP: 0018:ffff8804015cbc70 EFLAGS: 00010046 [ 819.772310] RAX: 0000000000000003 RBX: ffff88040cfd6d40 RCX: 0000000000000018 [ 819.772397] RDX: 0000000000020001 RSI: 0000000000000000 RDI: 0000000000000000 [ 819.772484] RBP: ffff8804015cbcc0 R08: 0000000000000000 R09: ffff8803f0768d40 [ 819.772570] R10: ffffea001033b800 R11: 0000000000000000 R12: ffffffff81c519c0 [ 819.772656] R13: 0000000000020001 R14: 0000000000000000 R15: 0000000000020001 [ 819.772744] FS: 00007ff98309b740(0000) GS:ffff88041f200000(0000) knlGS:0000000000000000 [ 819.772845] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 819.772917] CR2: 0000000000000000 CR3: 00000003f59dc000 CR4: 00000000001407f0 [ 819.773001] Stack: [ 819.773030] ffffffff81114003 ffff8804015cbcb0 0000000000000000 0000000000000046 [ 819.773146] ffff880409757a18 ffff8803f065a160 0000000000000000 0000000000020001 [ 819.773273] 0000000000000000 0000000000000000 ffff8804015cbce8 ffffffff8143e388 [ 819.773387] Call Trace: [ 819.773434] [<ffffffff81114003>] ? ftrace_raw_event_wakeup_source+0x43/0xe0 [ 819.773520] [<ffffffff8143e388>] wakeup_source_report_event+0xb8/0xd0 [ 819.773595] [<ffffffff8143e3cd>] __pm_stay_awake+0x2d/0x50 [ 819.773724] [<ffffffff8153395c>] power_supply_changed+0x3c/0x90 [ 819.773795] [<ffffffff8153407c>] power_supply_register+0x18c/0x250 [ 819.773869] [<ffffffff813d8d18>] sysfs_add_battery+0x61/0x7b [ 819.773935] [<ffffffff813d8d69>] battery_notify+0x37/0x3f [ 819.774001] [<ffffffff816ccb7c>] notifier_call_chain+0x4c/0x70 [ 819.774071] [<ffffffff81073ded>] __blocking_notifier_call_chain+0x4d/0x70 [ 819.774149] [<ffffffff81073e26>] blocking_notifier_call_chain+0x16/0x20 [ 819.774227] [<ffffffff8109397a>] pm_notifier_call_chain+0x1a/0x40 [ 819.774316] [<ffffffff81095b66>] hibernate+0x66/0x1c0 [ 819.774407] [<ffffffff81093931>] state_store+0x71/0xa0 [ 819.774507] [<ffffffff81331d8f>] kobj_attr_store+0xf/0x20 [ 819.774613] [<ffffffff811f8618>] sysfs_write_file+0x128/0x1c0 [ 819.774735] [<ffffffff8118579d>] vfs_write+0xbd/0x1e0 [ 819.774841] [<ffffffff811861d9>] SyS_write+0x49/0xa0 [ 819.774939] [<ffffffff816d1052>] system_call_fastpath+0x16/0x1b [ 819.775055] Code: 89 f8 48 89 e5 f6 82 c0 a6 84 81 20 74 15 0f 1f 44 00 00 48 83 c0 01 0f b6 10 f6 82 c0 a6 84 81 20 75 f0 5d c3 66 0f 1f 44 00 00 <80> 3f 00 55 48 89 e5 74 15 48 89 f8 0f 1f 40 00 48 83 c0 01 80 [ 819.775760] RIP [<ffffffff813381c0>] skip_spaces+0x30/0x30 [ 819.775881] RSP <ffff8804015cbc70> [ 819.775949] CR2: 0000000000000000 [ 819.794175] ---[ end trace c4ef25127039952e ]--- Signed-off-by: Shuah Khan <shuah.kh@samsung.com> Acked-by: Anton Vorontsov <anton@enomsg.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: stable@vger.kernel.org Signed-off-by: Anton Vorontsov <anton@enomsg.org>
Printing the "start_ip" for every secondary cpu is very noisy on a large system - and doesn't add any value. Drop this message. Console log before: Booting Node 0, Processors #1 smpboot cpu 1: start_ip = 96000 #2 smpboot cpu 2: start_ip = 96000 #3 smpboot cpu 3: start_ip = 96000 #4 smpboot cpu 4: start_ip = 96000 ... torvalds#31 smpboot cpu 31: start_ip = 96000 Brought up 32 CPUs Console log after: Booting Node 0, Processors #1 #2 #3 #4 #5 torvalds#6 torvalds#7 Ok. Booting Node 1, Processors torvalds#8 torvalds#9 torvalds#10 torvalds#11 torvalds#12 torvalds#13 torvalds#14 torvalds#15 Ok. Booting Node 0, Processors torvalds#16 torvalds#17 torvalds#18 torvalds#19 torvalds#20 torvalds#21 torvalds#22 torvalds#23 Ok. Booting Node 1, Processors torvalds#24 torvalds#25 torvalds#26 torvalds#27 torvalds#28 torvalds#29 torvalds#30 torvalds#31 Brought up 32 CPUs Acked-by: Borislav Petkov <bp@amd64.org> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/4f452eb42507460426@agluck-desktop.sc.intel.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>
commit 27c73ae upstream. Commit 7cb2ef5 ("mm: fix aio performance regression for database caused by THP") can cause dereference of a dangling pointer if split_huge_page runs during PageHuge() if there are updates to the tail_page->private field. Also it is repeating compound_head twice for hugetlbfs and it is running compound_head+compound_trans_head for THP when a single one is needed in both cases. The new code within the PageSlab() check doesn't need to verify that the THP page size is never bigger than the smallest hugetlbfs page size, to avoid memory corruption. A longstanding theoretical race condition was found while fixing the above (see the change right after the skip_unlock label, that is relevant for the compound_lock path too). By re-establishing the _mapcount tail refcounting for all compound pages, this also fixes the below problem: echo 0 >/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages BUG: Bad page state in process bash pfn:59a01 page:ffffea000139b038 count:0 mapcount:10 mapping: (null) index:0x0 page flags: 0x1c00000000008000(tail) Modules linked in: CPU: 6 PID: 2018 Comm: bash Not tainted 3.12.0+ #25 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 Call Trace: dump_stack+0x55/0x76 bad_page+0xd5/0x130 free_pages_prepare+0x213/0x280 __free_pages+0x36/0x80 update_and_free_page+0xc1/0xd0 free_pool_huge_page+0xc2/0xe0 set_max_huge_pages.part.58+0x14c/0x220 nr_hugepages_store_common.isra.60+0xd0/0xf0 nr_hugepages_store+0x13/0x20 kobj_attr_store+0xf/0x20 sysfs_write_file+0x189/0x1e0 vfs_write+0xc5/0x1f0 SyS_write+0x55/0xb0 system_call_fastpath+0x16/0x1b Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Tested-by: Khalid Aziz <khalid.aziz@oracle.com> Cc: Pravin Shelar <pshelar@nicira.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Christoph Lameter <cl@linux.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Guillaume Morin <guillaume@morinfr.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This fixes the following spacing issues detected by checkpatch.pl: WARNING: line over 80 characters torvalds#357: FILE: drivers/staging/ozwpan/ozhcd.c:357: +static struct oz_urb_link *oz_uncancel_urb(struct oz_hcd *ozhcd, struct urb *urb) ERROR: trailing whitespace #25: FILE: drivers/staging/ozwpan/ozpd.h:25: +/* $ Signed-off-by: Jerome Pinot <ngc891@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
… built as a module [ 12.761956] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 [ 12.762016] IP: [<ffffffffa0005277>] handle_thermal_trip+0x47/0x130 [thermal_sys] [ 12.762060] PGD 1fec74067 PUD 1fee5b067 PMD 0 [ 12.762127] Oops: 0000 [linux-sunxi#1] SMP [ 12.762177] Modules linked in: hid_generic crc32c_intel usbhid hid firewire_ohci(+) e1000e(+) firewire_core crc_itu_t xhci_hcd(+) thermal(+) fan thermal_sys hwmon [ 12.762423] CPU 1 [ 12.762443] Pid: 187, comm: modprobe Tainted: G A 3.7.0-thermal-module+ linux-sunxi#25 /DH77DF [ 12.762496] RIP: 0010:[<ffffffffa0005277>] [<ffffffffa0005277>] handle_thermal_trip+0x47/0x130 [thermal_sys] [ 12.762682] RSP: 0018:ffff8801fe7ddc18 EFLAGS: 00010282 [ 12.762704] RAX: 0000000000000000 RBX: ffff8801ff3e9c00 RCX: ffff8801fdc39800 [ 12.762728] RDX: ffff8801fe7ddc24 RSI: 0000000000000001 RDI: ffff8801ff3e9c00 [ 12.762764] RBP: ffff8801fe7ddc48 R08: 0000000004000000 R09: ffffffffa001f568 [ 12.762797] R10: ffffffff81363083 R11: 0000000000000001 R12: 0000000000000001 [ 12.762832] R13: 0000000000000000 R14: 0000000000000001 R15: ffff8801fde73e68 [ 12.762866] FS: 00007f5548516700(0000) GS:ffff88021f240000(0000) knlGS:0000000000000000 [ 12.762912] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 12.762946] CR2: 0000000000000018 CR3: 00000001fefe2000 CR4: 00000000001407e0 [ 12.762979] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 12.763014] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 12.763048] Process modprobe (pid: 187, threadinfo ffff8801fe7dc000, task ffff8801fe5bdb40) [ 12.763095] Stack: [ 12.763122] 0000000000019640 00000000fdc39800 ffff8801fe7ddc48 ffff8801ff3e9c00 [ 12.763225] 0000000000000002 0000000000000000 ffff8801fe7ddc78 ffffffffa00053e7 [ 12.763338] ffff8801ff3e9c00 0000000000006c98 ffffffffa0007480 ffff8801ff3e9c00 [ 12.763440] Call Trace: [ 12.763470] [<ffffffffa00053e7>] thermal_zone_device_update+0x77/0xa0 [thermal_sys] [ 12.763515] [<ffffffffa0006d38>] thermal_zone_device_register+0x788/0xa88 [thermal_sys] [ 12.763562] [<ffffffffa001f394>] acpi_thermal_add+0x360/0x4c8 [thermal] [ 12.763598] [<ffffffff8133902a>] acpi_device_probe+0x50/0x190 [ 12.763632] [<ffffffff811bd793>] ? sysfs_create_link+0x13/0x20 [ 12.763666] [<ffffffff813cc41b>] driver_probe_device+0x7b/0x240 [ 12.763699] [<ffffffff813cc68b>] __driver_attach+0xab/0xb0 [ 12.763732] [<ffffffff813cc5e0>] ? driver_probe_device+0x240/0x240 [ 12.763766] [<ffffffff813ca836>] bus_for_each_dev+0x56/0x90 [ 12.763799] [<ffffffff813cbf4e>] driver_attach+0x1e/0x20 [ 12.763831] [<ffffffff813cbac0>] bus_add_driver+0x190/0x290 [ 12.763864] [<ffffffffa0022000>] ? 0xffffffffa0021fff [ 12.763896] [<ffffffff813ccbea>] driver_register+0x7a/0x160 [ 12.763928] [<ffffffffa0022000>] ? 0xffffffffa0021fff [ 12.763960] [<ffffffff813399fb>] acpi_bus_register_driver+0x43/0x45 [ 12.763995] [<ffffffffa002203a>] acpi_thermal_init+0x3a/0x42 [thermal] [ 12.764029] [<ffffffff8100207f>] do_one_initcall+0x3f/0x170 [ 12.764063] [<ffffffff810b1a5f>] sys_init_module+0x8f/0x200 [ 12.764097] [<ffffffff815ff259>] system_call_fastpath+0x16/0x1b [ 12.764129] Code: 48 8b 87 c8 02 00 00 41 89 f4 48 8d 55 dc ff 50 28 44 8b 6d dc 41 8d 45 fe 83 f8 01 76 5e 48 8b 83 d8 02 00 00 44 89 e6 48 89 df <ff> 50 18 4c 8d a3 10 03 00 00 4c 89 e7 e8 87 f1 5e e1 8b 83 bc [ 12.765164] RIP [<ffffffffa0005277>] handle_thermal_trip+0x47/0x130 [thermal_sys] [ 12.765223] RSP <ffff8801fe7ddc18> [ 12.765252] CR2: 0000000000000018 [ 12.765284] ---[ end trace 7723294cdfb00d2a ]--- This is because thermal_zone_device_update() is invoked before any thermal governors being registered. Signed-off-by: Zhang Rui <rui.zhang@intel.com>
airq_iv_(alloc|free) is called by some users with interrupts enabled and by some with interrupts disabled which leads to the following lockdep warning: [ INFO: possible irq lock inversion dependency detected ] 3.14.0-15249-gbf29b7b-dirty #25 Not tainted --------------------------------------------------------- insmod/2108 just changed the state of lock: (&(&iv->lock)->rlock){+.....}, at: [<000000000046ee3e>] airq_iv_alloc+0x62/0x228 but this lock was taken by another, HARDIRQ-READ-safe lock in the past: (&info->lock){.-.-..} and interrupts could create inverse lock ordering between them. other info that might help us debug this: Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&(&iv->lock)->rlock); local_irq_disable(); lock(&info->lock); lock(&(&iv->lock)->rlock); <Interrupt> lock(&info->lock); *** DEADLOCK *** Although this is a false alarm (since each airq user consistently calls these functions from the same context) fix this by ensuring that interrupts are disabled when the airq lock is held. Reported-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
mm could be removed from current task struct, using previous vma->vm_mm It will crash on blackfin after updated to Linux 3.15. The commit "mm: per-thread vma caching" caused the crash. mm could be removed from current task struct before mmput()-> exit_mmap()-> delete_vma_from_mm() the detailed fault information: NULL pointer access Kernel OOPS in progress Deferred Exception context CURRENT PROCESS: COMM=modprobe PID=278 CPU=0 invalid mm return address: [0x000531de]; contents of: 0x000531b0: c727 acea 0c42 181d 0000 0000 0000 a0a8 0x000531c0: b090 acaa 0c42 1806 0000 0000 0000 a0e8 0x000531d0: b0d0 e801 0000 05b3 0010 e522 0046 [a090] 0x000531e0: 6408 b090 0c00 17cc 3042 e3ff f37b 2fc8 CPU: 0 PID: 278 Comm: modprobe Not tainted 3.15.0-ADI-2014R1-pre-00345-gea9f446 #25 task: 0572b720 ti: 0569e000 task.ti: 0569e000 Compiled for cpu family 0x27fe (Rev 0), but running on:0x0000 (Rev 0) ADSP-BF609-0.0 500(MHz CCLK) 125(MHz SCLK) (mpu off) Linux version 3.15.0-ADI-2014R1-pre-00345-gea9f446 (steven@steven-OptiPlex-390) (gcc version 4.3.5 (ADI-trunk/svn-5962) ) #25 Tue Jun 10 17:47:46 CST 2014 SEQUENCER STATUS: Not tainted SEQSTAT: 00000027 IPEND: 8008 IMASK: ffff SYSCFG: 2806 EXCAUSE : 0x27 physical IVG3 asserted : <0xffa00744> { _trap + 0x0 } physical IVG15 asserted : <0xffa00d68> { _evt_system_call + 0x0 } logical irq 6 mapped : <0xffa003bc> { _bfin_coretmr_interrupt + 0x0 } logical irq 7 mapped : <0x00008828> { _bfin_fault_routine + 0x0 } logical irq 11 mapped : <0x00007724> { _l2_ecc_err + 0x0 } logical irq 13 mapped : <0x00008828> { _bfin_fault_routine + 0x0 } logical irq 39 mapped : <0x00150788> { _bfin_twi_interrupt_entry + 0x0 } logical irq 40 mapped : <0x00150788> { _bfin_twi_interrupt_entry + 0x0 } RETE: <0x00000000> /* Maybe null pointer? */ RETN: <0x0569fe50> /* kernel dynamic memory (maybe user-space) */ RETX: <0x00000480> /* Maybe fixed code section */ RETS: <0x00053384> { _exit_mmap + 0x28 } PC : <0x000531de> { _delete_vma_from_mm + 0x92 } DCPLB_FAULT_ADDR: <0x00000008> /* Maybe null pointer? */ ICPLB_FAULT_ADDR: <0x000531de> { _delete_vma_from_mm + 0x92 } PROCESSOR STATE: R0 : 00000004 R1 : 0569e000 R2 : 00bf3db4 R3 : 00000000 R4 : 057f9800 R5 : 00000001 R6 : 0569ddd0 R7 : 0572b720 P0 : 0572b854 P1 : 00000004 P2 : 00000000 P3 : 0569dda0 P4 : 0572b720 P5 : 0566c368 FP : 0569fe5c SP : 0569fd74 LB0: 057f523f LT0: 057f523e LC0: 00000000 LB1: 0005317c LT1: 00053172 LC1: 00000002 B0 : 00000000 L0 : 00000000 M0 : 0566f5bc I0 : 00000000 B1 : 00000000 L1 : 00000000 M1 : 00000000 I1 : ffffffff B2 : 00000001 L2 : 00000000 M2 : 00000000 I2 : 00000000 B3 : 00000000 L3 : 00000000 M3 : 00000000 I3 : 057f8000 A0.w: 00000000 A0.x: 00000000 A1.w: 00000000 A1.x: 00000000 USP : 056ffcf8 ASTAT: 02003024 Hardware Trace: 0 Target : <0x00003fb8> { _trap_c + 0x0 } Source : <0xffa006d8> { _exception_to_level5 + 0xa0 } JUMP.L 1 Target : <0xffa00638> { _exception_to_level5 + 0x0 } Source : <0xffa004f2> { _bfin_return_from_exception + 0x6 } RTX 2 Target : <0xffa004ec> { _bfin_return_from_exception + 0x0 } Source : <0xffa00590> { _ex_trap_c + 0x70 } JUMP.S 3 Target : <0xffa00520> { _ex_trap_c + 0x0 } Source : <0xffa0076e> { _trap + 0x2a } JUMP (P4) 4 Target : <0xffa00744> { _trap + 0x0 } FAULT : <0x000531de> { _delete_vma_from_mm + 0x92 } P0 = W[P2 + 2] Source : <0x000531da> { _delete_vma_from_mm + 0x8e } P2 = [P4 + 0x18] 5 Target : <0x000531da> { _delete_vma_from_mm + 0x8e } Source : <0x00053176> { _delete_vma_from_mm + 0x2a } IF CC JUMP pcrel 6 Target : <0x0005314c> { _delete_vma_from_mm + 0x0 } Source : <0x00053380> { _exit_mmap + 0x24 } JUMP.L 7 Target : <0x00053378> { _exit_mmap + 0x1c } Source : <0x00053394> { _exit_mmap + 0x38 } IF !CC JUMP pcrel (BP) 8 Target : <0x00053390> { _exit_mmap + 0x34 } Source : <0xffa020e0> { __cond_resched + 0x20 } RTS 9 Target : <0xffa020c0> { __cond_resched + 0x0 } Source : <0x0005338c> { _exit_mmap + 0x30 } JUMP.L 10 Target : <0x0005338c> { _exit_mmap + 0x30 } Source : <0x0005333a> { _delete_vma + 0xb2 } RTS 11 Target : <0x00053334> { _delete_vma + 0xac } Source : <0x0005507a> { _kmem_cache_free + 0xba } RTS 12 Target : <0x00055068> { _kmem_cache_free + 0xa8 } Source : <0x0005505e> { _kmem_cache_free + 0x9e } IF !CC JUMP pcrel (BP) 13 Target : <0x00055052> { _kmem_cache_free + 0x92 } Source : <0x0005501a> { _kmem_cache_free + 0x5a } IF CC JUMP pcrel 14 Target : <0x00054ff4> { _kmem_cache_free + 0x34 } Source : <0x00054fce> { _kmem_cache_free + 0xe } IF CC JUMP pcrel (BP) 15 Target : <0x00054fc0> { _kmem_cache_free + 0x0 } Source : <0x00053330> { _delete_vma + 0xa8 } JUMP.L Kernel Stack Stack info: SP: [0x0569ff24] <0x0569ff24> /* kernel dynamic memory (maybe user-space) */ Memory from 0x0569ff20 to 056a0000 0569ff20: 00000001 [04e8da5a] 00008000 00000000 00000000 056a0000 04e8da5a 04e8da5a 0569ff40: 04eb9eea ffa00dce 02003025 04ea09c5 057f523f 04ea09c4 057f523e 00000000 0569ff60: 00000000 00000000 00000000 00000000 00000000 00000000 00000001 00000000 0569ff80: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 0569ffa0: 0566f5bc 057f8000 057f8000 00000001 04ec0170 056ffcf8 056ffd04 057f9800 0569ffc0: 04d1d498 057f9800 057f8fe4 057f8ef0 00000001 057f928c 00000001 00000001 0569ffe0: 057f9800 00000000 00000008 00000007 00000001 00000001 00000001 <00002806> Return addresses in stack: address : <0x00002806> { _show_cpuinfo + 0x2d2 } Modules linked in: Kernel panic - not syncing: Kernel exception [ end Kernel panic - not syncing: Kernel exception Signed-off-by: Steven Miao <realmz6@gmail.com> Acked-by: Davidlohr Bueso <davidlohr@hp.com> Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Cc: <stable@vger.kernel.org> [3.15.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I'm getting the spew below when booting with Haswell (Xeon E5-2699 v3) CPUs and the "Cluster-on-Die" (CoD) feature enabled in the BIOS. It seems similar to the issue that some folks from AMD ran in to on their systems and addressed in this commit: 161270f ("x86/smp: Fix topology checks on AMD MCM CPUs") Both these Intel and AMD systems break an assumption which is being enforced by topology_sane(): a socket may not contain more than one NUMA node. AMD special-cased their system by looking for a cpuid flag. The Intel mode is dependent on BIOS options and I do not know of a way which it is enumerated other than the tables being parsed during the CPU bringup process. In other words, we have to trust the ACPI tables <shudder>. This detects the situation where a NUMA node occurs at a place in the middle of the "CPU" sched domains. It replaces the default topology with one that relies on the NUMA information from the firmware (SRAT table) for all levels of sched domains above the hyperthreads. This also fixes a sysfs bug. We used to freak out when we saw the "mc" group cross a node boundary, so we stopped building the MC group. MC gets exported as the 'core_siblings_list' in /sys/devices/system/cpu/cpu*/topology/ and this caused CPUs with the same 'physical_package_id' to not be listed together in 'core_siblings_list'. This violates a statement from Documentation/ABI/testing/sysfs-devices-system-cpu: core_siblings: internal kernel map of cpu#'s hardware threads within the same physical_package_id. core_siblings_list: human-readable list of the logical CPU numbers within the same physical_package_id as cpu#. The sysfs effects here cause an issue with the hwloc tool where it gets confused and thinks there are more sockets than are physically present. Before this patch, there are two packages: # cd /sys/devices/system/cpu/ # cat cpu*/topology/physical_package_id | sort | uniq -c 18 0 18 1 But 4 _sets_ of core siblings: # cat cpu*/topology/core_siblings_list | sort | uniq -c 9 0-8 9 18-26 9 27-35 9 9-17 After this set, there are only 2 sets of core siblings, which is what we expect for a 2-socket system. # cat cpu*/topology/physical_package_id | sort | uniq -c 18 0 18 1 # cat cpu*/topology/core_siblings_list | sort | uniq -c 18 0-17 18 18-35 Example spew: ... NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter. #2 #3 #4 #5 #6 #7 #8 .... node #1, CPUs: #9 ------------[ cut here ]------------ WARNING: CPU: 9 PID: 0 at /home/ak/hle/linux-hle-2.6/arch/x86/kernel/smpboot.c:306 topology_sane.isra.2+0x74/0x90() sched: CPU #9's mc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency. Modules linked in: CPU: 9 PID: 0 Comm: swapper/9 Not tainted 3.17.0-rc1-00293-g8e01c4d-dirty torvalds#631 Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRNDSDP1.86B.0036.R05.1407140519 07/14/2014 0000000000000009 ffff88046ddabe00 ffffffff8172e485 ffff88046ddabe48 ffff88046ddabe38 ffffffff8109691d 000000000000b001 0000000000000009 ffff88086fc12580 000000000000b020 0000000000000009 ffff88046ddabe98 Call Trace: [<ffffffff8172e485>] dump_stack+0x45/0x56 [<ffffffff8109691d>] warn_slowpath_common+0x7d/0xa0 [<ffffffff8109698c>] warn_slowpath_fmt+0x4c/0x50 [<ffffffff81074f94>] topology_sane.isra.2+0x74/0x90 [<ffffffff8107530e>] set_cpu_sibling_map+0x31e/0x4f0 [<ffffffff8107568d>] start_secondary+0x1ad/0x240 ---[ end trace 3fe5f587a9fcde61 ]--- #10 #11 #12 #13 #14 #15 #16 #17 .... node #2, CPUs: #18 #19 #20 #21 #22 #23 #24 #25 #26 .... node #3, CPUs: #27 #28 #29 #30 #31 #32 #33 #34 #35 Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> [ Added LLC domain and s/match_mc/match_die/ ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: David Rientjes <rientjes@google.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Toshi Kani <toshi.kani@hp.com> Cc: brice.goglin@gmail.com Cc: "H. Peter Anvin" <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/20140918193334.C065EBCE@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch wires up the new syscall sys_bpf() on powerpc. Passes the tests in samples/bpf: #0 add+sub+mul OK #1 unreachable OK #2 unreachable2 OK #3 out of range jump OK #4 out of range jump2 OK #5 test1 ld_imm64 OK #6 test2 ld_imm64 OK #7 test3 ld_imm64 OK #8 test4 ld_imm64 OK #9 test5 ld_imm64 OK #10 no bpf_exit OK #11 loop (back-edge) OK #12 loop2 (back-edge) OK #13 conditional loop OK #14 read uninitialized register OK #15 read invalid register OK #16 program doesn't init R0 before exit OK #17 stack out of bounds OK #18 invalid call insn1 OK #19 invalid call insn2 OK #20 invalid function call OK #21 uninitialized stack1 OK #22 uninitialized stack2 OK #23 check valid spill/fill OK #24 check corrupted spill/fill OK #25 invalid src register in STX OK #26 invalid dst register in STX OK #27 invalid dst register in ST OK #28 invalid src register in LDX OK #29 invalid dst register in LDX OK #30 junk insn OK #31 junk insn2 OK #32 junk insn3 OK #33 junk insn4 OK #34 junk insn5 OK #35 misaligned read from stack OK #36 invalid map_fd for function call OK #37 don't check return value before access OK #38 access memory with incorrect alignment OK #39 sometimes access memory with incorrect alignment OK #40 jump test 1 OK #41 jump test 2 OK #42 jump test 3 OK #43 jump test 4 OK Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> [mpe: test using samples/bpf] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The ehci_init_driver is used to initialize hcd APIs for each ehci controller driver, it is designed to be called only one time and before driver register is called. The current design will cause ehci_init_driver is called multiple times at probe process, it will cause hc_driver's initialization affect current running hcd. We run out NULL pointer dereference problem when one hcd is started by module_init, and the other is started by otg thread at SMP platform. The reason for this problem is ehci_init_driver will do memory copy for current uniform hc_driver, and this memory copy will do memset (as 0) first, so when the first hcd is running usb_add_hcd, and the second hcd may clear the uniform hc_driver's space (at ehci_init_driver), then the first hcd will meet NULL pointer at the same time. See below two logs: LOG_1: ci_hdrc ci_hdrc.0: EHCI Host Controller ci_hdrc ci_hdrc.0: new USB bus registered, assigned bus number 1 ci_hdrc ci_hdrc.1: doesn't support gadget Unable to handle kernel NULL pointer dereference at virtual address 00000014 pgd = 80004000 [00000014] *pgd=00000000 Internal error: Oops: 805 [linux-sunxi#1] PREEMPT SMP ARM Modules linked in: CPU: 0 PID: 108 Comm: kworker/u8:2 Not tainted 3.14.38-222193-g24b2734-dirty linux-sunxi#25 Workqueue: ci_otg ci_otg_work task: d839ec00 ti: d8400000 task.ti: d8400000 PC is at ehci_run+0x4c/0x284 LR is at _raw_spin_unlock_irqrestore+0x28/0x54 pc : [<8041f9a0>] lr : [<8070ea84>] psr: 60000113 sp : d8401e30 ip : 00000000 fp : d8004400 r10: 00000001 r9 : 00000001 r8 : 00000000 r7 : 00000000 r6 : d8419940 r5 : 80dd24c0 r4 : d8419800 r3 : 8001d060 r2 : 00000000 r1 : 00000001 r0 : 00000000 Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel Control: 10c53c7d Table: 1000404a DAC: 00000015 Process kworker/u8:2 (pid: 108, stack limit = 0xd8400238) Stack: (0xd8401e30 to 0xd8402000) 1e20: d87523c0 d8401e48 66667562 d8419800 1e40: 00000000 00000000 d8419800 00000000 00000000 00000000 d84198b0 8040fcdc 1e60: 00000000 80dd320c d8477610 d8419c00 d803d010 d8419800 00000000 00000000 1e80: d8004400 00000000 d8400008 80431494 80431374 d803d100 d803d010 d803d1ac 1ea0: 00000000 80432428 804323d4 d803d100 00000001 80435eb8 80e0d0bc d803d100 1ec0: 00000006 80436458 00000000 d803d100 80e92ec8 80436f44 d803d010 d803d100 1ee0: d83fde00 8043292c d8752710 d803d1f4 d803d010 8042ddfc 8042ddb8 d83f3b00 1f00: d803d1f4 80042b60 00000000 00000003 00000001 00000001 80054598 d83f3b00 1f20: d8004400 d83f3b18 d8004414 d8400000 80e3957b 00000089 d8004400 80043814 1f40: d839ec00 00000000 d83fcd80 d83f3b00 800436e4 00000000 00000000 00000000 1f60: 00000000 80048f34 00000000 00000000 00000000 d83f3b00 00000000 00000000 1f80: d8401f80 d8401f80 00000000 00000000 d8401f90 d8401f90 d8401fac d83fcd80 1fa0: 80048e68 00000000 00000000 8000e538 00000000 00000000 00000000 00000000 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000 [<8041f9a0>] (ehci_run) from [<8040fcdc>] (usb_add_hcd+0x248/0x6e8) [<8040fcdc>] (usb_add_hcd) from [<80431494>] (host_start+0x120/0x2e4) [<80431494>] (host_start) from [<80432428>] (ci_otg_start_host+0x54/0xbc) [<80432428>] (ci_otg_start_host) from [<80435eb8>] (otg_set_protocol+0xa4/0xd0) [<80435eb8>] (otg_set_protocol) from [<80436458>] (otg_set_state+0x574/0xc58) [<80436458>] (otg_set_state) from [<80436f44>] (otg_statemachine+0x408/0x46c) [<80436f44>] (otg_statemachine) from [<8043292c>] (ci_otg_fsm_work+0x3c/0x190) [<8043292c>] (ci_otg_fsm_work) from [<8042ddfc>] (ci_otg_work+0x44/0x1c4) [<8042ddfc>] (ci_otg_work) from [<80042b60>] (process_one_work+0xf4/0x35c) [<80042b60>] (process_one_work) from [<80043814>] (worker_thread+0x130/0x3bc) [<80043814>] (worker_thread) from [<80048f34>] (kthread+0xcc/0xe4) [<80048f34>] (kthread) from [<8000e538>] (ret_from_fork+0x14/0x3c) Code: e5953018 e3530000 0a000000 e12fff33 (e5878014) LOG_2: ci_hdrc ci_hdrc.0: EHCI Host Controller ci_hdrc ci_hdrc.0: new USB bus registered, assigned bus number 1 ci_hdrc ci_hdrc.1: doesn't support gadget Unable to handle kernel NULL pointer dereference at virtual address 00000000 pgd = 80004000 [00000000] *pgd=00000000 In Online 00:00ternal e Offline rror: Oops: 80000005 [linux-sunxi#1] PREEMPT SMP ARM Modules linked in: CPU: 0 PID: 108 Comm: kworker/u8:2 Not tainted 3.14.38-02007-g24b2734-dirty linux-sunxi#127 Workque Online 00:00ue: ci_o Offline tg ci_otg_work Online 00:00task: d8 Offline 39ec00 ti: d83ea000 task.ti: d83ea000 PC is at 0x0 LR is at usb_add_hcd+0x248/0x6e8 pc : [<00000000>] lr : [<8040f644>] psr: 60000113 sp : d83ebe60 ip : 00000000 fp : d8004400 r10: 00000001 r9 : 00000001 r8 : d85fd4b0 r7 : 00000000 r6 : 00000000 r5 : 00000000 r4 : d85fd400 r3 : 00000000 r2 : d85fd4f4 r1 : 80410178 r0 : d85fd400 Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel Control: 10c53c7d Table: 1000404a DAC: 00000015 Process kworker/u8:2 (pid: 108, stack limit = 0xd83ea238) Stack: (0xd83ebe60 to 0xd83ec000) be60: 00000000 80dd920c d8654e10 d85fd800 d803e010 d85fd400 00000000 00000000 be80: d8004400 00000000 d83ea008 80430e34 80430d14 d803e100 d803e010 d803e1ac bea0: 00000000 80431dc8 80431d74 d803e100 00000001 80435858 80e130bc d803e100 bec0: 00000006 80435df8 00000000 d803e100 80e98ec8 804368e4 d803e010 d803e100 bee0: d86e8100 804322cc d86cf050 d803e1f4 d803e010 8042d79c 8042d758 d83cf900 bf00: d803e1f4 80042b78 00000000 00000003 00000001 00000001 800545e8 d83cf900 bf20: d8004400 d83cf918 d8004414 d83ea000 80e3f57b 00000089 d8004400 8004382c bf40: d839ec00 00000000 d8393780 d83cf900 800436fc 00000000 00000000 00000000 bf60: 00000000 80048f50 80e019f4 00000000 0000264c d83cf900 00000000 00000000 bf80: d83ebf80 d83ebf80 00000000 00000000 d83ebf90 d83ebf90 d83ebfac d8393780 bfa0: 80048e84 00000000 00000000 8000e538 00000000 00000000 00000000 00000000 bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 bfe0: 00000000 00000000 00000000 00000000 00000013 00000000 ee66e85d 133ebd03 [<804 Online 00:000f644>] Offline (usb_add_hcd) from [<80430e34>] (host_start+0x120/0x2e4) [<80430e34>] (host_start) from [<80431dc8>] (ci_otg_start_host+0x54/0xbc) [<80431dc8>] (ci_otg_start_host) from [<80435858>] (otg_set_protocol+0xa4/0xd0) [<80435858>] (otg_set_protocol) from [<80435df8>] (otg_set_state+0x574/0xc58) [<80435df8>] (otg_set_state) from [<804368e4>] (otg_statemachine+0x408/0x46c) [<804368e4>] (otg_statemachine) from [<804322cc>] (ci_otg_fsm_work+0x3c/0x190) [<804322cc>] (ci_otg_fsm_work) from [<8042d79c>] (ci_otg_work+0x44/0x1c4) [<8042d79c>] (ci_otg_work) from [<80042b78>] (process_one_work+0xf4/0x35c) [<80042b78>] (process_one_work) from [<8004382c>] (worker_thread+0x130/0x3bc) [<8004382c>] (worker_thread) from [<80048f50>] (kthread+0xcc/0xe4) [<80048f50>] (kthread) from [<8000e538>] (ret_from_fork+0x14/0x3c) Code: bad PC value Cc: Jun Li <jun.li@freescale.com> Cc: <stable@vger.kernel.org> Cc: Alan Stern <stern@rowland.harvard.edu> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Peter Chen <peter.chen@freescale.com>
Nikolay has reported a hang when a memcg reclaim got stuck with the following backtrace: PID: 18308 TASK: ffff883d7c9b0a30 CPU: 1 COMMAND: "rsync" #0 __schedule at ffffffff815ab152 linux-sunxi#1 schedule at ffffffff815ab76e linux-sunxi#2 schedule_timeout at ffffffff815ae5e5 linux-sunxi#3 io_schedule_timeout at ffffffff815aad6a linux-sunxi#4 bit_wait_io at ffffffff815abfc6 linux-sunxi#5 __wait_on_bit at ffffffff815abda5 linux-sunxi#6 wait_on_page_bit at ffffffff8111fd4f linux-sunxi#7 shrink_page_list at ffffffff81135445 linux-sunxi#8 shrink_inactive_list at ffffffff81135845 linux-sunxi#9 shrink_lruvec at ffffffff81135ead linux-sunxi#10 shrink_zone at ffffffff811360c3 linux-sunxi#11 shrink_zones at ffffffff81136eff linux-sunxi#12 do_try_to_free_pages at ffffffff8113712f linux-sunxi#13 try_to_free_mem_cgroup_pages at ffffffff811372be linux-sunxi#14 try_charge at ffffffff81189423 linux-sunxi#15 mem_cgroup_try_charge at ffffffff8118c6f5 linux-sunxi#16 __add_to_page_cache_locked at ffffffff8112137d linux-sunxi#17 add_to_page_cache_lru at ffffffff81121618 linux-sunxi#18 pagecache_get_page at ffffffff8112170b linux-sunxi#19 grow_dev_page at ffffffff811c8297 linux-sunxi#20 __getblk_slow at ffffffff811c91d6 linux-sunxi#21 __getblk_gfp at ffffffff811c92c1 linux-sunxi#22 ext4_ext_grow_indepth at ffffffff8124565c linux-sunxi#23 ext4_ext_create_new_leaf at ffffffff81246ca8 linux-sunxi#24 ext4_ext_insert_extent at ffffffff81246f09 linux-sunxi#25 ext4_ext_map_blocks at ffffffff8124a848 linux-sunxi#26 ext4_map_blocks at ffffffff8121a5b7 linux-sunxi#27 mpage_map_one_extent at ffffffff8121b1fa linux-sunxi#28 mpage_map_and_submit_extent at ffffffff8121f07b linux-sunxi#29 ext4_writepages at ffffffff8121f6d5 linux-sunxi#30 do_writepages at ffffffff8112c490 linux-sunxi#31 __filemap_fdatawrite_range at ffffffff81120199 linux-sunxi#32 filemap_flush at ffffffff8112041c linux-sunxi#33 ext4_alloc_da_blocks at ffffffff81219da1 linux-sunxi#34 ext4_rename at ffffffff81229b91 linux-sunxi#35 ext4_rename2 at ffffffff81229e32 linux-sunxi#36 vfs_rename at ffffffff811a08a5 linux-sunxi#37 SYSC_renameat2 at ffffffff811a3ffc linux-sunxi#38 sys_renameat2 at ffffffff811a408e linux-sunxi#39 sys_rename at ffffffff8119e51e linux-sunxi#40 system_call_fastpath at ffffffff815afa89 Dave Chinner has properly pointed out that this is a deadlock in the reclaim code because ext4 doesn't submit pages which are marked by PG_writeback right away. The heuristic was introduced by commit e62e384 ("memcg: prevent OOM with too many dirty pages") and it was applied only when may_enter_fs was specified. The code has been changed by c3b94f4 ("memcg: further prevent OOM with too many dirty pages") which has removed the __GFP_FS restriction with a reasoning that we do not get into the fs code. But this is not sufficient apparently because the fs doesn't necessarily submit pages marked PG_writeback for IO right away. ext4_bio_write_page calls io_submit_add_bh but that doesn't necessarily submit the bio. Instead it tries to map more pages into the bio and mpage_map_one_extent might trigger memcg charge which might end up waiting on a page which is marked PG_writeback but hasn't been submitted yet so we would end up waiting for something that never finishes. Fix this issue by replacing __GFP_IO by may_enter_fs check (for case 2) before we go to wait on the writeback. The page fault path, which is the only path that triggers memcg oom killer since 3.12, shouldn't require GFP_NOFS and so we shouldn't reintroduce the premature OOM killer issue which was originally addressed by the heuristic. As per David Chinner the xfs is doing similar thing since 2.6.15 already so ext4 is not the only affected filesystem. Moreover he notes: : For example: IO completion might require unwritten extent conversion : which executes filesystem transactions and GFP_NOFS allocations. The : writeback flag on the pages can not be cleared until unwritten : extent conversion completes. Hence memory reclaim cannot wait on : page writeback to complete in GFP_NOFS context because it is not : safe to do so, memcg reclaim or otherwise. Cc: stable@vger.kernel.org # 3.9+ [tytso@mit.edu: corrected the control flow] Fixes: c3b94f4 ("memcg: further prevent OOM with too many dirty pages") Reported-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We can handle the special case of num_stripes == 0 directly inside btrfs_read_sys_array. The BUG_ON in btrfs_chunk_item_size is there to catch other unhandled cases where we fail to validate external data. A crafted or corrupted image crashes at mount time: BTRFS: device fsid 9006933e-2a9a-44f0-917f-514252aeec2c devid 1 transid 7 /dev/loop0 BTRFS info (device loop0): disk space caching is enabled BUG: failure at fs/btrfs/ctree.h:337/btrfs_chunk_item_size()! Kernel panic - not syncing: BUG! CPU: 0 PID: 313 Comm: mount Not tainted 4.2.5-00657-ge047887-dirty #25 Stack: 637af890 60062489 602aeb2e 604192ba 60387961 00000011 637af8a0 6038a835 637af9c0 6038776b 634ef32b 00000000 Call Trace: [<6001c86d>] show_stack+0xfe/0x15b [<6038a835>] dump_stack+0x2a/0x2c [<6038776b>] panic+0x13e/0x2b3 [<6020f099>] btrfs_read_sys_array+0x25d/0x2ff [<601cfbbe>] open_ctree+0x192d/0x27af [<6019c2c1>] btrfs_mount+0x8f5/0xb9a [<600bc9a7>] mount_fs+0x11/0xf3 [<600d5167>] vfs_kern_mount+0x75/0x11a [<6019bcb0>] btrfs_mount+0x2e4/0xb9a [<600bc9a7>] mount_fs+0x11/0xf3 [<600d5167>] vfs_kern_mount+0x75/0x11a [<600d710b>] do_mount+0xa35/0xbc9 [<600d7557>] SyS_mount+0x95/0xc8 [<6001e884>] handle_syscall+0x6b/0x8e Reported-by: Jiri Slaby <jslaby@suse.com> Reported-by: Vegard Nossum <vegard.nossum@oracle.com> CC: stable@vger.kernel.org # 3.19+ Signed-off-by: David Sterba <dsterba@suse.com>
Intel Tangier SoC is known to have 64-bit dual core CPU. Enable 64-bit build for it. The kernel has been tested on Intel Edison board: Linux buildroot 4.4.0-next-20160115+ #25 SMP Fri Jan 15 22:03:19 EET 2016 x86_64 GNU/Linux processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 74 model name : Genuine Intel(R) CPU 4000 @ 500MHz stepping : 8 Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1452888668-147116-1-git-send-email-andriy.shevchenko@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
sunxi-3.4 fixes and (mainline) U-Boot compatibility
In a case when a ptp chardev (like /dev/ptp0) is open but an underlying device is removed, closing this file leads to a race. This reproduces easily in a kvm virtual machine: ts# cat openptp0.c int main() { ... fp = fopen("/dev/ptp0", "r"); ... sleep(10); } ts# uname -r 5.5.0-rc3-46cf053e ts# cat /proc/cmdline ... slub_debug=FZP ts# modprobe ptp_kvm ts# ./openptp0 & [1] 670 opened /dev/ptp0, sleeping 10s... ts# rmmod ptp_kvm ts# ls /dev/ptp* ls: cannot access '/dev/ptp*': No such file or directory ts# ...woken up [ 48.010809] general protection fault: 0000 [#1] SMP [ 48.012502] CPU: 6 PID: 658 Comm: openptp0 Not tainted 5.5.0-rc3-46cf053e linux-sunxi#25 [ 48.014624] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ... [ 48.016270] RIP: 0010:module_put.part.0+0x7/0x80 [ 48.017939] RSP: 0018:ffffb3850073be00 EFLAGS: 00010202 [ 48.018339] RAX: 000000006b6b6b6b RBX: 6b6b6b6b6b6b6b6b RCX: ffff89a476c00ad0 [ 48.018936] RDX: fffff65a08d3ea08 RSI: 0000000000000247 RDI: 6b6b6b6b6b6b6b6b [ 48.019470] ... ^^^ a slub poison [ 48.023854] Call Trace: [ 48.024050] __fput+0x21f/0x240 [ 48.024288] task_work_run+0x79/0x90 [ 48.024555] do_exit+0x2af/0xab0 [ 48.024799] ? vfs_write+0x16a/0x190 [ 48.025082] do_group_exit+0x35/0x90 [ 48.025387] __x64_sys_exit_group+0xf/0x10 [ 48.025737] do_syscall_64+0x3d/0x130 [ 48.026056] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 48.026479] RIP: 0033:0x7f53b12082f6 [ 48.026792] ... [ 48.030945] Modules linked in: ptp i6300esb watchdog [last unloaded: ptp_kvm] [ 48.045001] Fixing recursive fault but reboot is needed! This happens in: static void __fput(struct file *file) { ... if (file->f_op->release) file->f_op->release(inode, file); <<< cdev is kfree'd here if (unlikely(S_ISCHR(inode->i_mode) && inode->i_cdev != NULL && !(mode & FMODE_PATH))) { cdev_put(inode->i_cdev); <<< cdev fields are accessed here Namely: __fput() posix_clock_release() kref_put(&clk->kref, delete_clock) <<< the last reference delete_clock() delete_ptp_clock() kfree(ptp) <<< cdev is embedded in ptp cdev_put module_put(p->owner) <<< *p is kfree'd, bang! Here cdev is embedded in posix_clock which is embedded in ptp_clock. The race happens because ptp_clock's lifetime is controlled by two refcounts: kref and cdev.kobj in posix_clock. This is wrong. Make ptp_clock's sysfs device a parent of cdev with cdev_device_add() created especially for such cases. This way the parent device with its ptp_clock is not released until all references to the cdev are released. This adds a requirement that an initialized but not exposed struct device should be provided to posix_clock_register() by a caller instead of a simple dev_t. This approach was adopted from the commit 72139df ("watchdog: Fix the race between the release of watchdog_core_data and cdev"). See details of the implementation in the commit 233ed09 ("chardev: add helper function to register char devs with a struct device"). Link: https://lore.kernel.org/linux-fsdevel/20191125125342.6189-1-vdronov@redhat.com/T/#u Analyzed-by: Stephen Johnston <sjohnsto@redhat.com> Analyzed-by: Vern Lovejoy <vlovejoy@redhat.com> Signed-off-by: Vladis Dronov <vdronov@redhat.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The hidden aliasing-ppgtt's size is never revealed, as we only inspect the front GTT when engaged. However, we were "fixing" the hidden ppgtt to match, with the net result that we ended up leaking the unused portion on Braswell were we preallocated the entire set of top level PDP, see gen8_preallocate_top_level_pdp(). [ 26.025364] DMA-API: pci 0000:00:02.0: device driver has pending DMA allocations while released from device [count=2] [ 26.025364] One of leaked entries details: [device address=0x0000000230778000] [size=4096 bytes] [mapped with DMA_BIDIRECTIONAL] [mapped as single] [ 26.025683] WARNING: CPU: 0 PID: 415 at kernel/dma/debug.c:894 dma_debug_device_change+0x1a4/0x1f0 [ 26.025905] Modules linked in: i915(E-) intel_powerclamp(E) nls_ascii(E) nls_cp437(E) crct10dif_pclmul(E) crc32_pclmul(E) vfat(E) crc32c_intel(E) fat(E) ghash_clmulni_intel(E) prime_numbers(E) intel_gtt(E) i2c_algo_bit(E) efi_pstore(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) evdev(E) drm(E) aesni_intel(E) glue_helper(E) crypto_simd(E) cryptd(E) intel_cstate(E) sg(E) efivars(E) pcspkr(E) video(E) button(E) efivarfs(E) ip_tables(E) x_tables(E) autofs4(E) sd_mod(E) lpc_ich(E) ahci(E) mfd_core(E) i2c_i801(E) libahci(E) i2c_designware_pci(E) i2c_designware_core(E) [ 26.026613] CPU: 0 PID: 415 Comm: rmmod Tainted: G E 5.4.0-rc6+ #25 [ 26.026837] Hardware name: /, BIOS PYBSWCEL.86A.0027.2015.0507.1758 05/07/2015 [ 26.027080] RIP: 0010:dma_debug_device_change+0x1a4/0x1f0 [ 26.027319] Code: 89 54 24 08 e8 ad 60 62 00 48 8b 54 24 08 48 89 c6 41 57 4d 89 e9 49 89 d8 44 89 f1 41 54 48 c7 c7 e0 61 06 82 e8 c1 aa f5 ff <0f> 0b 5a 59 48 83 3c 24 00 0f 85 97 26 00 00 8b 05 77 47 92 01 85 [ 26.027600] RSP: 0018:ffff888228d2fcc8 EFLAGS: 00010282 [ 26.027831] RAX: 0000000000000000 RBX: 0000000230778000 RCX: 0000000000000000 [ 26.028053] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffed10451a5f8f [ 26.028279] RBP: ffff88823480c0b0 R08: 0000000000000001 R09: ffffed1046e83eb1 [ 26.028500] R10: ffffed1046e83eb0 R11: ffff88823741f587 R12: ffffffff82067340 [ 26.028725] R13: 0000000000001000 R14: 0000000000000002 R15: ffffffff82067480 [ 26.028952] FS: 00007fdf3ed174c0(0000) GS:ffff888237400000(0000) knlGS:0000000000000000 [ 26.029185] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 26.029405] CR2: 000055e211109030 CR3: 0000000230139000 CR4: 00000000001006f0 [ 26.029622] Call Trace: [ 26.029846] notifier_call_chain+0x67/0xa0 [ 26.030076] blocking_notifier_call_chain+0x5a/0x80 [ 26.030305] device_release_driver_internal+0x20d/0x260 [ 26.030535] driver_detach+0x7b/0xe1 [ 26.030761] bus_remove_driver+0x8c/0x153 [ 26.030993] pci_unregister_driver+0x2d/0xf0 [ 26.032603] i915_exit+0x16/0x1c [i915] Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Fixes: 1eda701 ("drm/i915/gtt: Recursive cleanup for gen8") References: c082afa ("drm/i915: Move aliasing_ppgtt underneath its i915_ggtt") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191106221223.7437-1-chris@chris-wilson.co.uk
[ Upstream commit f6766ff ] We need to check mddev->del_work before flush workqueu since the purpose of flush is to ensure the previous md is disappeared. Otherwise the similar deadlock appeared if LOCKDEP is enabled, it is due to md_open holds the bdev->bd_mutex before flush workqueue. kernel: [ 154.522645] ====================================================== kernel: [ 154.522647] WARNING: possible circular locking dependency detected kernel: [ 154.522650] 5.6.0-rc7-lp151.27-default linux-sunxi#25 Tainted: G O kernel: [ 154.522651] ------------------------------------------------------ kernel: [ 154.522653] mdadm/2482 is trying to acquire lock: kernel: [ 154.522655] ffff888078529128 ((wq_completion)md_misc){+.+.}, at: flush_workqueue+0x84/0x4b0 kernel: [ 154.522673] kernel: [ 154.522673] but task is already holding lock: kernel: [ 154.522675] ffff88804efa9338 (&bdev->bd_mutex){+.+.}, at: __blkdev_get+0x79/0x590 kernel: [ 154.522691] kernel: [ 154.522691] which lock already depends on the new lock. kernel: [ 154.522691] kernel: [ 154.522694] kernel: [ 154.522694] the existing dependency chain (in reverse order) is: kernel: [ 154.522696] kernel: [ 154.522696] -> jwrdegoede#4 (&bdev->bd_mutex){+.+.}: kernel: [ 154.522704] __mutex_lock+0x87/0x950 kernel: [ 154.522706] __blkdev_get+0x79/0x590 kernel: [ 154.522708] blkdev_get+0x65/0x140 kernel: [ 154.522709] blkdev_get_by_dev+0x2f/0x40 kernel: [ 154.522716] lock_rdev+0x3d/0x90 [md_mod] kernel: [ 154.522719] md_import_device+0xd6/0x1b0 [md_mod] kernel: [ 154.522723] new_dev_store+0x15e/0x210 [md_mod] kernel: [ 154.522728] md_attr_store+0x7a/0xc0 [md_mod] kernel: [ 154.522732] kernfs_fop_write+0x117/0x1b0 kernel: [ 154.522735] vfs_write+0xad/0x1a0 kernel: [ 154.522737] ksys_write+0xa4/0xe0 kernel: [ 154.522745] do_syscall_64+0x64/0x2b0 kernel: [ 154.522748] entry_SYSCALL_64_after_hwframe+0x49/0xbe kernel: [ 154.522749] kernel: [ 154.522749] -> jwrdegoede#3 (&mddev->reconfig_mutex){+.+.}: kernel: [ 154.522752] __mutex_lock+0x87/0x950 kernel: [ 154.522756] new_dev_store+0xc9/0x210 [md_mod] kernel: [ 154.522759] md_attr_store+0x7a/0xc0 [md_mod] kernel: [ 154.522761] kernfs_fop_write+0x117/0x1b0 kernel: [ 154.522763] vfs_write+0xad/0x1a0 kernel: [ 154.522765] ksys_write+0xa4/0xe0 kernel: [ 154.522767] do_syscall_64+0x64/0x2b0 kernel: [ 154.522769] entry_SYSCALL_64_after_hwframe+0x49/0xbe kernel: [ 154.522770] kernel: [ 154.522770] -> jwrdegoede#2 (kn->count#253){++++}: kernel: [ 154.522775] __kernfs_remove+0x253/0x2c0 kernel: [ 154.522778] kernfs_remove+0x1f/0x30 kernel: [ 154.522780] kobject_del+0x28/0x60 kernel: [ 154.522783] mddev_delayed_delete+0x24/0x30 [md_mod] kernel: [ 154.522786] process_one_work+0x2a7/0x5f0 kernel: [ 154.522788] worker_thread+0x2d/0x3d0 kernel: [ 154.522793] kthread+0x117/0x130 kernel: [ 154.522795] ret_from_fork+0x3a/0x50 kernel: [ 154.522796] kernel: [ 154.522796] -> jwrdegoede#1 ((work_completion)(&mddev->del_work)){+.+.}: kernel: [ 154.522800] process_one_work+0x27e/0x5f0 kernel: [ 154.522802] worker_thread+0x2d/0x3d0 kernel: [ 154.522804] kthread+0x117/0x130 kernel: [ 154.522806] ret_from_fork+0x3a/0x50 kernel: [ 154.522807] kernel: [ 154.522807] -> #0 ((wq_completion)md_misc){+.+.}: kernel: [ 154.522813] __lock_acquire+0x1392/0x1690 kernel: [ 154.522816] lock_acquire+0xb4/0x1a0 kernel: [ 154.522818] flush_workqueue+0xab/0x4b0 kernel: [ 154.522821] md_open+0xb6/0xc0 [md_mod] kernel: [ 154.522823] __blkdev_get+0xea/0x590 kernel: [ 154.522825] blkdev_get+0x65/0x140 kernel: [ 154.522828] do_dentry_open+0x1d1/0x380 kernel: [ 154.522831] path_openat+0x567/0xcc0 kernel: [ 154.522834] do_filp_open+0x9b/0x110 kernel: [ 154.522836] do_sys_openat2+0x201/0x2a0 kernel: [ 154.522838] do_sys_open+0x57/0x80 kernel: [ 154.522840] do_syscall_64+0x64/0x2b0 kernel: [ 154.522842] entry_SYSCALL_64_after_hwframe+0x49/0xbe kernel: [ 154.522844] kernel: [ 154.522844] other info that might help us debug this: kernel: [ 154.522844] kernel: [ 154.522846] Chain exists of: kernel: [ 154.522846] (wq_completion)md_misc --> &mddev->reconfig_mutex --> &bdev->bd_mutex kernel: [ 154.522846] kernel: [ 154.522850] Possible unsafe locking scenario: kernel: [ 154.522850] kernel: [ 154.522852] CPU0 CPU1 kernel: [ 154.522853] ---- ---- kernel: [ 154.522854] lock(&bdev->bd_mutex); kernel: [ 154.522856] lock(&mddev->reconfig_mutex); kernel: [ 154.522858] lock(&bdev->bd_mutex); kernel: [ 154.522860] lock((wq_completion)md_misc); kernel: [ 154.522861] kernel: [ 154.522861] *** DEADLOCK *** kernel: [ 154.522861] kernel: [ 154.522864] 1 lock held by mdadm/2482: kernel: [ 154.522865] #0: ffff88804efa9338 (&bdev->bd_mutex){+.+.}, at: __blkdev_get+0x79/0x590 kernel: [ 154.522868] kernel: [ 154.522868] stack backtrace: kernel: [ 154.522873] CPU: 1 PID: 2482 Comm: mdadm Tainted: G O 5.6.0-rc7-lp151.27-default linux-sunxi#25 kernel: [ 154.522875] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 kernel: [ 154.522878] Call Trace: kernel: [ 154.522881] dump_stack+0x8f/0xcb kernel: [ 154.522884] check_noncircular+0x194/0x1b0 kernel: [ 154.522888] ? __lock_acquire+0x1392/0x1690 kernel: [ 154.522890] __lock_acquire+0x1392/0x1690 kernel: [ 154.522893] lock_acquire+0xb4/0x1a0 kernel: [ 154.522895] ? flush_workqueue+0x84/0x4b0 kernel: [ 154.522898] flush_workqueue+0xab/0x4b0 kernel: [ 154.522900] ? flush_workqueue+0x84/0x4b0 kernel: [ 154.522905] ? md_open+0xb6/0xc0 [md_mod] kernel: [ 154.522908] md_open+0xb6/0xc0 [md_mod] kernel: [ 154.522910] __blkdev_get+0xea/0x590 kernel: [ 154.522912] ? bd_acquire+0xc0/0xc0 kernel: [ 154.522914] blkdev_get+0x65/0x140 kernel: [ 154.522916] ? bd_acquire+0xc0/0xc0 kernel: [ 154.522918] do_dentry_open+0x1d1/0x380 kernel: [ 154.522921] path_openat+0x567/0xcc0 kernel: [ 154.522923] ? __lock_acquire+0x380/0x1690 kernel: [ 154.522926] do_filp_open+0x9b/0x110 kernel: [ 154.522929] ? __alloc_fd+0xe5/0x1f0 kernel: [ 154.522935] ? kmem_cache_alloc+0x28c/0x630 kernel: [ 154.522939] ? do_sys_openat2+0x201/0x2a0 kernel: [ 154.522941] do_sys_openat2+0x201/0x2a0 kernel: [ 154.522944] do_sys_open+0x57/0x80 kernel: [ 154.522946] do_syscall_64+0x64/0x2b0 kernel: [ 154.522948] entry_SYSCALL_64_after_hwframe+0x49/0xbe kernel: [ 154.522951] RIP: 0033:0x7f98d279d9ae And md_alloc also flushed the same workqueue, but the thing is different here. Because all the paths call md_alloc don't hold bdev->bd_mutex, and the flush is necessary to avoid race condition, so leave it as it is. Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e24c644 ] I compiled with AddressSanitizer and I had these memory leaks while I was using the tep_parse_format function: Direct leak of 28 byte(s) in 4 object(s) allocated from: #0 0x7fb07db49ffe in __interceptor_realloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dffe) jwrdegoede#1 0x7fb07a724228 in extend_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:985 jwrdegoede#2 0x7fb07a724c21 in __read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1140 jwrdegoede#3 0x7fb07a724f78 in read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1206 jwrdegoede#4 0x7fb07a725191 in __read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1291 jwrdegoede#5 0x7fb07a7251df in read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1299 jwrdegoede#6 0x7fb07a72e6c8 in process_dynamic_array_len /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:2849 linux-sunxi#7 0x7fb07a7304b8 in process_function /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3161 linux-sunxi#8 0x7fb07a730900 in process_arg_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3207 linux-sunxi#9 0x7fb07a727c0b in process_arg /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1786 linux-sunxi#10 0x7fb07a731080 in event_read_print_args /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3285 linux-sunxi#11 0x7fb07a731722 in event_read_print /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3369 linux-sunxi#12 0x7fb07a740054 in __tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6335 linux-sunxi#13 0x7fb07a74047a in __parse_event /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6389 linux-sunxi#14 0x7fb07a740536 in tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6431 linux-sunxi#15 0x7fb07a785acf in parse_event ../../../src/fs-src/fs.c:251 linux-sunxi#16 0x7fb07a785ccd in parse_systems ../../../src/fs-src/fs.c:284 linux-sunxi#17 0x7fb07a786fb3 in read_metadata ../../../src/fs-src/fs.c:593 linux-sunxi#18 0x7fb07a78760e in ftrace_fs_source_init ../../../src/fs-src/fs.c:727 linux-sunxi#19 0x7fb07d90c19c in add_component_with_init_method_data ../../../../src/lib/graph/graph.c:1048 linux-sunxi#20 0x7fb07d90c87b in add_source_component_with_initialize_method_data ../../../../src/lib/graph/graph.c:1127 linux-sunxi#21 0x7fb07d90c92a in bt_graph_add_source_component ../../../../src/lib/graph/graph.c:1152 linux-sunxi#22 0x55db11aa632e in cmd_run_ctx_create_components_from_config_components ../../../src/cli/babeltrace2.c:2252 linux-sunxi#23 0x55db11aa6fda in cmd_run_ctx_create_components ../../../src/cli/babeltrace2.c:2347 linux-sunxi#24 0x55db11aa780c in cmd_run ../../../src/cli/babeltrace2.c:2461 linux-sunxi#25 0x55db11aa8a7d in main ../../../src/cli/babeltrace2.c:2673 linux-sunxi#26 0x7fb07d5460b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2) The token variable in the process_dynamic_array_len function is allocated in the read_expect_type function, but is not freed before calling the read_token function. Free the token variable before calling read_token in order to plug the leak. Signed-off-by: Philippe Duplessis-Guindon <pduplessis@efficios.com> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lore.kernel.org/linux-trace-devel/20200730150236.5392-1-pduplessis@efficios.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Patch series "mm: fix memory to node bad links in sysfs", v3. Sometimes, firmware may expose interleaved memory layout like this: Early memory node ranges node 1: [mem 0x0000000000000000-0x000000011fffffff] node 2: [mem 0x0000000120000000-0x000000014fffffff] node 1: [mem 0x0000000150000000-0x00000001ffffffff] node 0: [mem 0x0000000200000000-0x000000048fffffff] node 2: [mem 0x0000000490000000-0x00000007ffffffff] In that case, we can see memory blocks assigned to multiple nodes in sysfs: $ ls -l /sys/devices/system/memory/memory21 total 0 lrwxrwxrwx 1 root root 0 Aug 24 05:27 node1 -> ../../node/node1 lrwxrwxrwx 1 root root 0 Aug 24 05:27 node2 -> ../../node/node2 -rw-r--r-- 1 root root 65536 Aug 24 05:27 online -r--r--r-- 1 root root 65536 Aug 24 05:27 phys_device -r--r--r-- 1 root root 65536 Aug 24 05:27 phys_index drwxr-xr-x 2 root root 0 Aug 24 05:27 power -r--r--r-- 1 root root 65536 Aug 24 05:27 removable -rw-r--r-- 1 root root 65536 Aug 24 05:27 state lrwxrwxrwx 1 root root 0 Aug 24 05:25 subsystem -> ../../../../bus/memory -rw-r--r-- 1 root root 65536 Aug 24 05:25 uevent -r--r--r-- 1 root root 65536 Aug 24 05:27 valid_zones The same applies in the node's directory with a memory21 link in both the node1 and node2's directory. This is wrong but doesn't prevent the system to run. However when later, one of these memory blocks is hot-unplugged and then hot-plugged, the system is detecting an inconsistency in the sysfs layout and a BUG_ON() is raised: kernel BUG at /Users/laurent/src/linux-ppc/mm/memory_hotplug.c:1084! LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: rpadlpar_io rpaphp pseries_rng rng_core vmx_crypto gf128mul binfmt_misc ip_tables x_tables xfs libcrc32c crc32c_vpmsum autofs4 CPU: 8 PID: 10256 Comm: drmgr Not tainted 5.9.0-rc1+ #25 Call Trace: add_memory_resource+0x23c/0x340 (unreliable) __add_memory+0x5c/0xf0 dlpar_add_lmb+0x1b4/0x500 dlpar_memory+0x1f8/0xb80 handle_dlpar_errorlog+0xc0/0x190 dlpar_store+0x198/0x4a0 kobj_attr_store+0x30/0x50 sysfs_kf_write+0x64/0x90 kernfs_fop_write+0x1b0/0x290 vfs_write+0xe8/0x290 ksys_write+0xdc/0x130 system_call_exception+0x160/0x270 system_call_common+0xf0/0x27c This has been seen on PowerPC LPAR. The root cause of this issue is that when node's memory is registered, the range used can overlap another node's range, thus the memory block is registered to multiple nodes in sysfs. There are two issues here: (a) The sysfs memory and node's layouts are broken due to these multiple links (b) The link errors in link_mem_sections() should not lead to a system panic. To address (a) register_mem_sect_under_node should not rely on the system state to detect whether the link operation is triggered by a hot plug operation or not. This is addressed by the patches 1 and 2 of this series. Issue (b) will be addressed separately. This patch (of 2): The memmap_context enum is used to detect whether a memory operation is due to a hot-add operation or happening at boot time. Make it general to the hotplug operation and rename it as meminit_context. There is no functional change introduced by this patch Suggested-by: David Hildenbrand <david@redhat.com> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J . Wysocki" <rafael@kernel.org> Cc: Nathan Lynch <nathanl@linux.ibm.com> Cc: Scott Cheloha <cheloha@linux.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200915094143.79181-1-ldufour@linux.ibm.com Link: https://lkml.kernel.org/r/20200915132624.9723-1-ldufour@linux.ibm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In register_mem_sect_under_node() the system_state's value is checked to detect whether the call is made during boot time or during an hot-plug operation. Unfortunately, that check against SYSTEM_BOOTING is wrong because regular memory is registered at SYSTEM_SCHEDULING state. In addition, memory hot-plug operation can be triggered at this system state by the ACPI [1]. So checking against the system state is not enough. The consequence is that on system with interleaved node's ranges like this: Early memory node ranges node 1: [mem 0x0000000000000000-0x000000011fffffff] node 2: [mem 0x0000000120000000-0x000000014fffffff] node 1: [mem 0x0000000150000000-0x00000001ffffffff] node 0: [mem 0x0000000200000000-0x000000048fffffff] node 2: [mem 0x0000000490000000-0x00000007ffffffff] This can be seen on PowerPC LPAR after multiple memory hot-plug and hot-unplug operations are done. At the next reboot the node's memory ranges can be interleaved and since the call to link_mem_sections() is made in topology_init() while the system is in the SYSTEM_SCHEDULING state, the node's id is not checked, and the sections registered to multiple nodes: $ ls -l /sys/devices/system/memory/memory21/node* total 0 lrwxrwxrwx 1 root root 0 Aug 24 05:27 node1 -> ../../node/node1 lrwxrwxrwx 1 root root 0 Aug 24 05:27 node2 -> ../../node/node2 In that case, the system is able to boot but if later one of theses memory blocks is hot-unplugged and then hot-plugged, the sysfs inconsistency is detected and this is triggering a BUG_ON(): kernel BUG at /Users/laurent/src/linux-ppc/mm/memory_hotplug.c:1084! Oops: Exception in kernel mode, sig: 5 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: rpadlpar_io rpaphp pseries_rng rng_core vmx_crypto gf128mul binfmt_misc ip_tables x_tables xfs libcrc32c crc32c_vpmsum autofs4 CPU: 8 PID: 10256 Comm: drmgr Not tainted 5.9.0-rc1+ #25 Call Trace: add_memory_resource+0x23c/0x340 (unreliable) __add_memory+0x5c/0xf0 dlpar_add_lmb+0x1b4/0x500 dlpar_memory+0x1f8/0xb80 handle_dlpar_errorlog+0xc0/0x190 dlpar_store+0x198/0x4a0 kobj_attr_store+0x30/0x50 sysfs_kf_write+0x64/0x90 kernfs_fop_write+0x1b0/0x290 vfs_write+0xe8/0x290 ksys_write+0xdc/0x130 system_call_exception+0x160/0x270 system_call_common+0xf0/0x27c This patch addresses the root cause by not relying on the system_state value to detect whether the call is due to a hot-plug operation. An extra parameter is added to link_mem_sections() detailing whether the operation is due to a hot-plug operation. [1] According to Oscar Salvador, using this qemu command line, ACPI memory hotplug operations are raised at SYSTEM_SCHEDULING state: $QEMU -enable-kvm -machine pc -smp 4,sockets=4,cores=1,threads=1 -cpu host -monitor pty \ -m size=$MEM,slots=255,maxmem=4294967296k \ -numa node,nodeid=0,cpus=0-3,mem=512 -numa node,nodeid=1,mem=512 \ -object memory-backend-ram,id=memdimm0,size=134217728 -device pc-dimm,node=0,memdev=memdimm0,id=dimm0,slot=0 \ -object memory-backend-ram,id=memdimm1,size=134217728 -device pc-dimm,node=0,memdev=memdimm1,id=dimm1,slot=1 \ -object memory-backend-ram,id=memdimm2,size=134217728 -device pc-dimm,node=0,memdev=memdimm2,id=dimm2,slot=2 \ -object memory-backend-ram,id=memdimm3,size=134217728 -device pc-dimm,node=0,memdev=memdimm3,id=dimm3,slot=3 \ -object memory-backend-ram,id=memdimm4,size=134217728 -device pc-dimm,node=1,memdev=memdimm4,id=dimm4,slot=4 \ -object memory-backend-ram,id=memdimm5,size=134217728 -device pc-dimm,node=1,memdev=memdimm5,id=dimm5,slot=5 \ -object memory-backend-ram,id=memdimm6,size=134217728 -device pc-dimm,node=1,memdev=memdimm6,id=dimm6,slot=6 \ Fixes: 4fbce63 ("mm/memory_hotplug.c: make register_mem_sect_under_node() a callback of walk_memory_range()") Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Nathan Lynch <nathanl@linux.ibm.com> Cc: Scott Cheloha <cheloha@linux.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200915094143.79181-3-ldufour@linux.ibm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This fix is for a failure that occurred in the DWARF unwind perf test. Stack unwinders may probe memory when looking for frames. Memory sanitizer will poison and track uninitialized memory on the stack, and on the heap if the value is copied to the heap. This can lead to false memory sanitizer failures for the use of an uninitialized value. Avoid this problem by removing the poison on the copied stack. The full msan failure with track origins looks like: ==2168==WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x559ceb10755b in handle_cfi elfutils/libdwfl/frame_unwind.c:648:8 #1 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4 #2 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7 #3 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10 #4 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17 #5 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17 #6 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14 linux-sunxi#7 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10 linux-sunxi#8 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8 linux-sunxi#9 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8 linux-sunxi#10 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26 linux-sunxi#11 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0) linux-sunxi#12 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2 linux-sunxi#13 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9 linux-sunxi#14 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9 linux-sunxi#15 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8 linux-sunxi#16 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9 linux-sunxi#17 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9 linux-sunxi#18 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4 linux-sunxi#19 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9 linux-sunxi#20 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11 linux-sunxi#21 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8 linux-sunxi#22 0x559cea95fbce in run_argv tools/perf/perf.c:409:2 linux-sunxi#23 0x559cea95fbce in main tools/perf/perf.c:539:3 Uninitialized value was stored to memory at #0 0x559ceb106acf in __libdwfl_frame_reg_set elfutils/libdwfl/frame_unwind.c:77:22 #1 0x559ceb106acf in handle_cfi elfutils/libdwfl/frame_unwind.c:627:13 #2 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4 #3 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7 #4 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10 #5 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17 #6 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17 linux-sunxi#7 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14 linux-sunxi#8 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10 linux-sunxi#9 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8 linux-sunxi#10 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8 linux-sunxi#11 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26 linux-sunxi#12 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0) linux-sunxi#13 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2 linux-sunxi#14 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9 linux-sunxi#15 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9 linux-sunxi#16 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8 linux-sunxi#17 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9 linux-sunxi#18 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9 linux-sunxi#19 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4 linux-sunxi#20 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9 linux-sunxi#21 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11 linux-sunxi#22 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8 linux-sunxi#23 0x559cea95fbce in run_argv tools/perf/perf.c:409:2 linux-sunxi#24 0x559cea95fbce in main tools/perf/perf.c:539:3 Uninitialized value was stored to memory at #0 0x559ceb106a54 in handle_cfi elfutils/libdwfl/frame_unwind.c:613:9 #1 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4 #2 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7 #3 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10 #4 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17 #5 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17 #6 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14 linux-sunxi#7 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10 linux-sunxi#8 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8 linux-sunxi#9 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8 linux-sunxi#10 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26 linux-sunxi#11 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0) linux-sunxi#12 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2 linux-sunxi#13 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9 linux-sunxi#14 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9 linux-sunxi#15 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8 linux-sunxi#16 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9 linux-sunxi#17 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9 linux-sunxi#18 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4 linux-sunxi#19 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9 linux-sunxi#20 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11 linux-sunxi#21 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8 linux-sunxi#22 0x559cea95fbce in run_argv tools/perf/perf.c:409:2 linux-sunxi#23 0x559cea95fbce in main tools/perf/perf.c:539:3 Uninitialized value was stored to memory at #0 0x559ceaff8800 in memory_read tools/perf/util/unwind-libdw.c:156:10 #1 0x559ceb10f053 in expr_eval elfutils/libdwfl/frame_unwind.c:501:13 #2 0x559ceb1060cc in handle_cfi elfutils/libdwfl/frame_unwind.c:603:18 #3 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4 #4 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7 #5 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10 #6 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17 linux-sunxi#7 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17 linux-sunxi#8 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14 linux-sunxi#9 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10 linux-sunxi#10 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8 linux-sunxi#11 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8 linux-sunxi#12 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26 linux-sunxi#13 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0) linux-sunxi#14 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2 linux-sunxi#15 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9 linux-sunxi#16 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9 linux-sunxi#17 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8 linux-sunxi#18 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9 linux-sunxi#19 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9 linux-sunxi#20 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4 linux-sunxi#21 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9 linux-sunxi#22 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11 linux-sunxi#23 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8 linux-sunxi#24 0x559cea95fbce in run_argv tools/perf/perf.c:409:2 linux-sunxi#25 0x559cea95fbce in main tools/perf/perf.c:539:3 Uninitialized value was stored to memory at #0 0x559cea9027d9 in __msan_memcpy llvm/llvm-project/compiler-rt/lib/msan/msan_interceptors.cpp:1558:3 #1 0x559cea9d2185 in sample_ustack tools/perf/arch/x86/tests/dwarf-unwind.c:41:2 #2 0x559cea9d202c in test__arch_unwind_sample tools/perf/arch/x86/tests/dwarf-unwind.c:72:9 #3 0x559ceabc9cbd in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:106:6 #4 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26 #5 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0) #6 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2 linux-sunxi#7 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9 linux-sunxi#8 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9 linux-sunxi#9 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8 linux-sunxi#10 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9 linux-sunxi#11 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9 linux-sunxi#12 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4 linux-sunxi#13 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9 linux-sunxi#14 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11 linux-sunxi#15 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8 linux-sunxi#16 0x559cea95fbce in run_argv tools/perf/perf.c:409:2 linux-sunxi#17 0x559cea95fbce in main tools/perf/perf.c:539:3 Uninitialized value was created by an allocation of 'bf' in the stack frame of function 'perf_event__synthesize_mmap_events' #0 0x559ceafc5f60 in perf_event__synthesize_mmap_events tools/perf/util/synthetic-events.c:445 SUMMARY: MemorySanitizer: use-of-uninitialized-value elfutils/libdwfl/frame_unwind.c:648:8 in handle_cfi Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: clang-built-linux@googlegroups.com Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandeep Dasgupta <sdasgup@google.com> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20201113182053.754625-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We have the following potential deadlock condition: ======================================================== WARNING: possible irq lock inversion dependency detected 5.10.0-rc2+ linux-sunxi#25 Not tainted -------------------------------------------------------- swapper/3/0 just changed the state of lock: ffff8880063bd618 (&host->lock){-...}-{2:2}, at: ata_bmdma_interrupt+0x27/0x200 but this lock took another, HARDIRQ-READ-unsafe lock in the past: (&trig->leddev_list_lock){.+.?}-{2:2} and interrupts could create inverse lock ordering between them. other info that might help us debug this: Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&trig->leddev_list_lock); local_irq_disable(); lock(&host->lock); lock(&trig->leddev_list_lock); <Interrupt> lock(&host->lock); *** DEADLOCK *** no locks held by swapper/3/0. the shortest dependencies between 2nd lock and 1st lock: -> (&trig->leddev_list_lock){.+.?}-{2:2} ops: 46 { HARDIRQ-ON-R at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 rfkill_global_led_trigger_worker+0x94/0xb0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 IN-SOFTIRQ-R at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 kbd_bh+0x9e/0xc0 tasklet_action_common.constprop.0+0xe9/0x100 tasklet_action+0x22/0x30 __do_softirq+0xcc/0x46d run_ksoftirqd+0x3f/0x70 smpboot_thread_fn+0x116/0x1f0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 SOFTIRQ-ON-R at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 rfkill_global_led_trigger_worker+0x94/0xb0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 INITIAL READ USE at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 rfkill_global_led_trigger_worker+0x94/0xb0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 } ... key at: [<ffffffff83da4c00>] __key.0+0x0/0x10 ... acquired at: _raw_read_lock+0x42/0x90 led_trigger_blink_oneshot+0x3b/0x90 ledtrig_disk_activity+0x3c/0xa0 ata_qc_complete+0x26/0x450 ata_do_link_abort+0xa3/0xe0 ata_port_freeze+0x2e/0x40 ata_hsm_qc_complete+0x94/0xa0 ata_sff_hsm_move+0x177/0x7a0 ata_sff_pio_task+0xc7/0x1b0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 -> (&host->lock){-...}-{2:2} ops: 69 { IN-HARDIRQ-W at: lock_acquire+0x15f/0x420 _raw_spin_lock_irqsave+0x52/0xa0 ata_bmdma_interrupt+0x27/0x200 __handle_irq_event_percpu+0xd5/0x2b0 handle_irq_event+0x57/0xb0 handle_edge_irq+0x8c/0x230 asm_call_irq_on_stack+0xf/0x20 common_interrupt+0x100/0x1c0 asm_common_interrupt+0x1e/0x40 native_safe_halt+0xe/0x10 arch_cpu_idle+0x15/0x20 default_idle_call+0x59/0x1c0 do_idle+0x22c/0x2c0 cpu_startup_entry+0x20/0x30 start_secondary+0x11d/0x150 secondary_startup_64_no_verify+0xa6/0xab INITIAL USE at: lock_acquire+0x15f/0x420 _raw_spin_lock_irqsave+0x52/0xa0 ata_dev_init+0x54/0xe0 ata_link_init+0x8b/0xd0 ata_port_alloc+0x1f1/0x210 ata_host_alloc+0xf1/0x130 ata_host_alloc_pinfo+0x14/0xb0 ata_pci_sff_prepare_host+0x41/0xa0 ata_pci_bmdma_prepare_host+0x14/0x30 piix_init_one+0x21f/0x600 local_pci_probe+0x48/0x80 pci_device_probe+0x105/0x1c0 really_probe+0x221/0x490 driver_probe_device+0xe9/0x160 device_driver_attach+0xb2/0xc0 __driver_attach+0x91/0x150 bus_for_each_dev+0x81/0xc0 driver_attach+0x1e/0x20 bus_add_driver+0x138/0x1f0 driver_register+0x91/0xf0 __pci_register_driver+0x73/0x80 piix_init+0x1e/0x2e do_one_initcall+0x5f/0x2d0 kernel_init_freeable+0x26f/0x2cf kernel_init+0xe/0x113 ret_from_fork+0x1f/0x30 } ... key at: [<ffffffff83d9fdc0>] __key.6+0x0/0x10 ... acquired at: __lock_acquire+0x9da/0x2370 lock_acquire+0x15f/0x420 _raw_spin_lock_irqsave+0x52/0xa0 ata_bmdma_interrupt+0x27/0x200 __handle_irq_event_percpu+0xd5/0x2b0 handle_irq_event+0x57/0xb0 handle_edge_irq+0x8c/0x230 asm_call_irq_on_stack+0xf/0x20 common_interrupt+0x100/0x1c0 asm_common_interrupt+0x1e/0x40 native_safe_halt+0xe/0x10 arch_cpu_idle+0x15/0x20 default_idle_call+0x59/0x1c0 do_idle+0x22c/0x2c0 cpu_startup_entry+0x20/0x30 start_secondary+0x11d/0x150 secondary_startup_64_no_verify+0xa6/0xab This lockdep splat is reported after: commit e918188 ("locking: More accurate annotations for read_lock()") To clarify: - read-locks are recursive only in interrupt context (when in_interrupt() returns true) - after acquiring host->lock in CPU1, another cpu (i.e. CPU2) may call write_lock(&trig->leddev_list_lock) that would be blocked by CPU0 that holds trig->leddev_list_lock in read-mode - when CPU1 (ata_ac_complete()) tries to read-lock trig->leddev_list_lock, it would be blocked by the write-lock waiter on CPU2 (because we are not in interrupt context, so the read-lock is not recursive) - at this point if an interrupt happens on CPU0 and ata_bmdma_interrupt() is executed it will try to acquire host->lock, that is held by CPU1, that is currently blocked by CPU2, so: * CPU0 blocked by CPU1 * CPU1 blocked by CPU2 * CPU2 blocked by CPU0 *** DEADLOCK *** The deadlock scenario is better represented by the following schema (thanks to Boqun Feng <boqun.feng@gmail.com> for the schema and the detailed explanation of the deadlock condition): CPU 0: CPU 1: CPU 2: ----- ----- ----- led_trigger_event(): read_lock(&trig->leddev_list_lock); <workqueue> ata_hsm_qc_complete(): spin_lock_irqsave(&host->lock); write_lock(&trig->leddev_list_lock); ata_port_freeze(): ata_do_link_abort(): ata_qc_complete(): ledtrig_disk_activity(): led_trigger_blink_oneshot(): read_lock(&trig->leddev_list_lock); // ^ not in in_interrupt() context, so could get blocked by CPU 2 <interrupt> ata_bmdma_interrupt(): spin_lock_irqsave(&host->lock); Fix by using read_lock_irqsave/irqrestore() in led_trigger_event(), so that no interrupt can happen in between, preventing the deadlock condition. Apply the same change to led_trigger_blink_setup() as well, since the same deadlock scenario can also happen in power_supply_update_bat_leds() -> led_trigger_blink() -> led_trigger_blink_setup() (workqueue context), and potentially prevent other similar usages. Link: https://lore.kernel.org/lkml/20201101092614.GB3989@xps-13-7390/ Fixes: eb25cb9 ("leds: convert IDE trigger to common disk trigger") Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: Pavel Machek <pavel@ucw.cz>
We have the following potential deadlock condition: ======================================================== WARNING: possible irq lock inversion dependency detected 5.10.0-rc2+ linux-sunxi#25 Not tainted -------------------------------------------------------- swapper/3/0 just changed the state of lock: ffff8880063bd618 (&host->lock){-...}-{2:2}, at: ata_bmdma_interrupt+0x27/0x200 but this lock took another, HARDIRQ-READ-unsafe lock in the past: (&trig->leddev_list_lock){.+.?}-{2:2} and interrupts could create inverse lock ordering between them. other info that might help us debug this: Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&trig->leddev_list_lock); local_irq_disable(); lock(&host->lock); lock(&trig->leddev_list_lock); <Interrupt> lock(&host->lock); *** DEADLOCK *** no locks held by swapper/3/0. the shortest dependencies between 2nd lock and 1st lock: -> (&trig->leddev_list_lock){.+.?}-{2:2} ops: 46 { HARDIRQ-ON-R at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 rfkill_global_led_trigger_worker+0x94/0xb0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 IN-SOFTIRQ-R at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 kbd_bh+0x9e/0xc0 tasklet_action_common.constprop.0+0xe9/0x100 tasklet_action+0x22/0x30 __do_softirq+0xcc/0x46d run_ksoftirqd+0x3f/0x70 smpboot_thread_fn+0x116/0x1f0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 SOFTIRQ-ON-R at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 rfkill_global_led_trigger_worker+0x94/0xb0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 INITIAL READ USE at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 rfkill_global_led_trigger_worker+0x94/0xb0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 } ... key at: [<ffffffff83da4c00>] __key.0+0x0/0x10 ... acquired at: _raw_read_lock+0x42/0x90 led_trigger_blink_oneshot+0x3b/0x90 ledtrig_disk_activity+0x3c/0xa0 ata_qc_complete+0x26/0x450 ata_do_link_abort+0xa3/0xe0 ata_port_freeze+0x2e/0x40 ata_hsm_qc_complete+0x94/0xa0 ata_sff_hsm_move+0x177/0x7a0 ata_sff_pio_task+0xc7/0x1b0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 -> (&host->lock){-...}-{2:2} ops: 69 { IN-HARDIRQ-W at: lock_acquire+0x15f/0x420 _raw_spin_lock_irqsave+0x52/0xa0 ata_bmdma_interrupt+0x27/0x200 __handle_irq_event_percpu+0xd5/0x2b0 handle_irq_event+0x57/0xb0 handle_edge_irq+0x8c/0x230 asm_call_irq_on_stack+0xf/0x20 common_interrupt+0x100/0x1c0 asm_common_interrupt+0x1e/0x40 native_safe_halt+0xe/0x10 arch_cpu_idle+0x15/0x20 default_idle_call+0x59/0x1c0 do_idle+0x22c/0x2c0 cpu_startup_entry+0x20/0x30 start_secondary+0x11d/0x150 secondary_startup_64_no_verify+0xa6/0xab INITIAL USE at: lock_acquire+0x15f/0x420 _raw_spin_lock_irqsave+0x52/0xa0 ata_dev_init+0x54/0xe0 ata_link_init+0x8b/0xd0 ata_port_alloc+0x1f1/0x210 ata_host_alloc+0xf1/0x130 ata_host_alloc_pinfo+0x14/0xb0 ata_pci_sff_prepare_host+0x41/0xa0 ata_pci_bmdma_prepare_host+0x14/0x30 piix_init_one+0x21f/0x600 local_pci_probe+0x48/0x80 pci_device_probe+0x105/0x1c0 really_probe+0x221/0x490 driver_probe_device+0xe9/0x160 device_driver_attach+0xb2/0xc0 __driver_attach+0x91/0x150 bus_for_each_dev+0x81/0xc0 driver_attach+0x1e/0x20 bus_add_driver+0x138/0x1f0 driver_register+0x91/0xf0 __pci_register_driver+0x73/0x80 piix_init+0x1e/0x2e do_one_initcall+0x5f/0x2d0 kernel_init_freeable+0x26f/0x2cf kernel_init+0xe/0x113 ret_from_fork+0x1f/0x30 } ... key at: [<ffffffff83d9fdc0>] __key.6+0x0/0x10 ... acquired at: __lock_acquire+0x9da/0x2370 lock_acquire+0x15f/0x420 _raw_spin_lock_irqsave+0x52/0xa0 ata_bmdma_interrupt+0x27/0x200 __handle_irq_event_percpu+0xd5/0x2b0 handle_irq_event+0x57/0xb0 handle_edge_irq+0x8c/0x230 asm_call_irq_on_stack+0xf/0x20 common_interrupt+0x100/0x1c0 asm_common_interrupt+0x1e/0x40 native_safe_halt+0xe/0x10 arch_cpu_idle+0x15/0x20 default_idle_call+0x59/0x1c0 do_idle+0x22c/0x2c0 cpu_startup_entry+0x20/0x30 start_secondary+0x11d/0x150 secondary_startup_64_no_verify+0xa6/0xab This lockdep splat is reported after: commit e918188 ("locking: More accurate annotations for read_lock()") To clarify: - read-locks are recursive only in interrupt context (when in_interrupt() returns true) - after acquiring host->lock in CPU1, another cpu (i.e. CPU2) may call write_lock(&trig->leddev_list_lock) that would be blocked by CPU0 that holds trig->leddev_list_lock in read-mode - when CPU1 (ata_ac_complete()) tries to read-lock trig->leddev_list_lock, it would be blocked by the write-lock waiter on CPU2 (because we are not in interrupt context, so the read-lock is not recursive) - at this point if an interrupt happens on CPU0 and ata_bmdma_interrupt() is executed it will try to acquire host->lock, that is held by CPU1, that is currently blocked by CPU2, so: * CPU0 blocked by CPU1 * CPU1 blocked by CPU2 * CPU2 blocked by CPU0 *** DEADLOCK *** The deadlock scenario is better represented by the following schema (thanks to Boqun Feng <boqun.feng@gmail.com> for the schema and the detailed explanation of the deadlock condition): CPU 0: CPU 1: CPU 2: ----- ----- ----- led_trigger_event(): read_lock(&trig->leddev_list_lock); <workqueue> ata_hsm_qc_complete(): spin_lock_irqsave(&host->lock); write_lock(&trig->leddev_list_lock); ata_port_freeze(): ata_do_link_abort(): ata_qc_complete(): ledtrig_disk_activity(): led_trigger_blink_oneshot(): read_lock(&trig->leddev_list_lock); // ^ not in in_interrupt() context, so could get blocked by CPU 2 <interrupt> ata_bmdma_interrupt(): spin_lock_irqsave(&host->lock); Fix by using read_lock_irqsave/irqrestore() in led_trigger_event(), so that no interrupt can happen in between, preventing the deadlock condition. Apply the same change to led_trigger_blink_setup() as well, since the same deadlock scenario can also happen in power_supply_update_bat_leds() -> led_trigger_blink() -> led_trigger_blink_setup() (workqueue context), and potentially prevent other similar usages. Link: https://lore.kernel.org/lkml/20201101092614.GB3989@xps-13-7390/ Fixes: eb25cb9 ("leds: convert IDE trigger to common disk trigger") Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: Pavel Machek <pavel@ucw.cz>
We have the following potential deadlock condition: ======================================================== WARNING: possible irq lock inversion dependency detected 5.10.0-rc2+ linux-sunxi#25 Not tainted -------------------------------------------------------- swapper/3/0 just changed the state of lock: ffff8880063bd618 (&host->lock){-...}-{2:2}, at: ata_bmdma_interrupt+0x27/0x200 but this lock took another, HARDIRQ-READ-unsafe lock in the past: (&trig->leddev_list_lock){.+.?}-{2:2} and interrupts could create inverse lock ordering between them. other info that might help us debug this: Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&trig->leddev_list_lock); local_irq_disable(); lock(&host->lock); lock(&trig->leddev_list_lock); <Interrupt> lock(&host->lock); *** DEADLOCK *** no locks held by swapper/3/0. the shortest dependencies between 2nd lock and 1st lock: -> (&trig->leddev_list_lock){.+.?}-{2:2} ops: 46 { HARDIRQ-ON-R at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 rfkill_global_led_trigger_worker+0x94/0xb0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 IN-SOFTIRQ-R at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 kbd_bh+0x9e/0xc0 tasklet_action_common.constprop.0+0xe9/0x100 tasklet_action+0x22/0x30 __do_softirq+0xcc/0x46d run_ksoftirqd+0x3f/0x70 smpboot_thread_fn+0x116/0x1f0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 SOFTIRQ-ON-R at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 rfkill_global_led_trigger_worker+0x94/0xb0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 INITIAL READ USE at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 rfkill_global_led_trigger_worker+0x94/0xb0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 } ... key at: [<ffffffff83da4c00>] __key.0+0x0/0x10 ... acquired at: _raw_read_lock+0x42/0x90 led_trigger_blink_oneshot+0x3b/0x90 ledtrig_disk_activity+0x3c/0xa0 ata_qc_complete+0x26/0x450 ata_do_link_abort+0xa3/0xe0 ata_port_freeze+0x2e/0x40 ata_hsm_qc_complete+0x94/0xa0 ata_sff_hsm_move+0x177/0x7a0 ata_sff_pio_task+0xc7/0x1b0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 -> (&host->lock){-...}-{2:2} ops: 69 { IN-HARDIRQ-W at: lock_acquire+0x15f/0x420 _raw_spin_lock_irqsave+0x52/0xa0 ata_bmdma_interrupt+0x27/0x200 __handle_irq_event_percpu+0xd5/0x2b0 handle_irq_event+0x57/0xb0 handle_edge_irq+0x8c/0x230 asm_call_irq_on_stack+0xf/0x20 common_interrupt+0x100/0x1c0 asm_common_interrupt+0x1e/0x40 native_safe_halt+0xe/0x10 arch_cpu_idle+0x15/0x20 default_idle_call+0x59/0x1c0 do_idle+0x22c/0x2c0 cpu_startup_entry+0x20/0x30 start_secondary+0x11d/0x150 secondary_startup_64_no_verify+0xa6/0xab INITIAL USE at: lock_acquire+0x15f/0x420 _raw_spin_lock_irqsave+0x52/0xa0 ata_dev_init+0x54/0xe0 ata_link_init+0x8b/0xd0 ata_port_alloc+0x1f1/0x210 ata_host_alloc+0xf1/0x130 ata_host_alloc_pinfo+0x14/0xb0 ata_pci_sff_prepare_host+0x41/0xa0 ata_pci_bmdma_prepare_host+0x14/0x30 piix_init_one+0x21f/0x600 local_pci_probe+0x48/0x80 pci_device_probe+0x105/0x1c0 really_probe+0x221/0x490 driver_probe_device+0xe9/0x160 device_driver_attach+0xb2/0xc0 __driver_attach+0x91/0x150 bus_for_each_dev+0x81/0xc0 driver_attach+0x1e/0x20 bus_add_driver+0x138/0x1f0 driver_register+0x91/0xf0 __pci_register_driver+0x73/0x80 piix_init+0x1e/0x2e do_one_initcall+0x5f/0x2d0 kernel_init_freeable+0x26f/0x2cf kernel_init+0xe/0x113 ret_from_fork+0x1f/0x30 } ... key at: [<ffffffff83d9fdc0>] __key.6+0x0/0x10 ... acquired at: __lock_acquire+0x9da/0x2370 lock_acquire+0x15f/0x420 _raw_spin_lock_irqsave+0x52/0xa0 ata_bmdma_interrupt+0x27/0x200 __handle_irq_event_percpu+0xd5/0x2b0 handle_irq_event+0x57/0xb0 handle_edge_irq+0x8c/0x230 asm_call_irq_on_stack+0xf/0x20 common_interrupt+0x100/0x1c0 asm_common_interrupt+0x1e/0x40 native_safe_halt+0xe/0x10 arch_cpu_idle+0x15/0x20 default_idle_call+0x59/0x1c0 do_idle+0x22c/0x2c0 cpu_startup_entry+0x20/0x30 start_secondary+0x11d/0x150 secondary_startup_64_no_verify+0xa6/0xab This lockdep splat is reported after: commit e918188 ("locking: More accurate annotations for read_lock()") To clarify: - read-locks are recursive only in interrupt context (when in_interrupt() returns true) - after acquiring host->lock in CPU1, another cpu (i.e. CPU2) may call write_lock(&trig->leddev_list_lock) that would be blocked by CPU0 that holds trig->leddev_list_lock in read-mode - when CPU1 (ata_ac_complete()) tries to read-lock trig->leddev_list_lock, it would be blocked by the write-lock waiter on CPU2 (because we are not in interrupt context, so the read-lock is not recursive) - at this point if an interrupt happens on CPU0 and ata_bmdma_interrupt() is executed it will try to acquire host->lock, that is held by CPU1, that is currently blocked by CPU2, so: * CPU0 blocked by CPU1 * CPU1 blocked by CPU2 * CPU2 blocked by CPU0 *** DEADLOCK *** The deadlock scenario is better represented by the following schema (thanks to Boqun Feng <boqun.feng@gmail.com> for the schema and the detailed explanation of the deadlock condition): CPU 0: CPU 1: CPU 2: ----- ----- ----- led_trigger_event(): read_lock(&trig->leddev_list_lock); <workqueue> ata_hsm_qc_complete(): spin_lock_irqsave(&host->lock); write_lock(&trig->leddev_list_lock); ata_port_freeze(): ata_do_link_abort(): ata_qc_complete(): ledtrig_disk_activity(): led_trigger_blink_oneshot(): read_lock(&trig->leddev_list_lock); // ^ not in in_interrupt() context, so could get blocked by CPU 2 <interrupt> ata_bmdma_interrupt(): spin_lock_irqsave(&host->lock); Fix by using read_lock_irqsave/irqrestore() in led_trigger_event(), so that no interrupt can happen in between, preventing the deadlock condition. Apply the same change to led_trigger_blink_setup() as well, since the same deadlock scenario can also happen in power_supply_update_bat_leds() -> led_trigger_blink() -> led_trigger_blink_setup() (workqueue context), and potentially prevent other similar usages. Link: https://lore.kernel.org/lkml/20201101092614.GB3989@xps-13-7390/ Fixes: eb25cb9 ("leds: convert IDE trigger to common disk trigger") Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: Pavel Machek <pavel@ucw.cz>
We have the following potential deadlock condition: ======================================================== WARNING: possible irq lock inversion dependency detected 5.10.0-rc2+ linux-sunxi#25 Not tainted -------------------------------------------------------- swapper/3/0 just changed the state of lock: ffff8880063bd618 (&host->lock){-...}-{2:2}, at: ata_bmdma_interrupt+0x27/0x200 but this lock took another, HARDIRQ-READ-unsafe lock in the past: (&trig->leddev_list_lock){.+.?}-{2:2} and interrupts could create inverse lock ordering between them. other info that might help us debug this: Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&trig->leddev_list_lock); local_irq_disable(); lock(&host->lock); lock(&trig->leddev_list_lock); <Interrupt> lock(&host->lock); *** DEADLOCK *** no locks held by swapper/3/0. the shortest dependencies between 2nd lock and 1st lock: -> (&trig->leddev_list_lock){.+.?}-{2:2} ops: 46 { HARDIRQ-ON-R at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 rfkill_global_led_trigger_worker+0x94/0xb0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 IN-SOFTIRQ-R at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 kbd_bh+0x9e/0xc0 tasklet_action_common.constprop.0+0xe9/0x100 tasklet_action+0x22/0x30 __do_softirq+0xcc/0x46d run_ksoftirqd+0x3f/0x70 smpboot_thread_fn+0x116/0x1f0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 SOFTIRQ-ON-R at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 rfkill_global_led_trigger_worker+0x94/0xb0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 INITIAL READ USE at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 rfkill_global_led_trigger_worker+0x94/0xb0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 } ... key at: [<ffffffff83da4c00>] __key.0+0x0/0x10 ... acquired at: _raw_read_lock+0x42/0x90 led_trigger_blink_oneshot+0x3b/0x90 ledtrig_disk_activity+0x3c/0xa0 ata_qc_complete+0x26/0x450 ata_do_link_abort+0xa3/0xe0 ata_port_freeze+0x2e/0x40 ata_hsm_qc_complete+0x94/0xa0 ata_sff_hsm_move+0x177/0x7a0 ata_sff_pio_task+0xc7/0x1b0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 -> (&host->lock){-...}-{2:2} ops: 69 { IN-HARDIRQ-W at: lock_acquire+0x15f/0x420 _raw_spin_lock_irqsave+0x52/0xa0 ata_bmdma_interrupt+0x27/0x200 __handle_irq_event_percpu+0xd5/0x2b0 handle_irq_event+0x57/0xb0 handle_edge_irq+0x8c/0x230 asm_call_irq_on_stack+0xf/0x20 common_interrupt+0x100/0x1c0 asm_common_interrupt+0x1e/0x40 native_safe_halt+0xe/0x10 arch_cpu_idle+0x15/0x20 default_idle_call+0x59/0x1c0 do_idle+0x22c/0x2c0 cpu_startup_entry+0x20/0x30 start_secondary+0x11d/0x150 secondary_startup_64_no_verify+0xa6/0xab INITIAL USE at: lock_acquire+0x15f/0x420 _raw_spin_lock_irqsave+0x52/0xa0 ata_dev_init+0x54/0xe0 ata_link_init+0x8b/0xd0 ata_port_alloc+0x1f1/0x210 ata_host_alloc+0xf1/0x130 ata_host_alloc_pinfo+0x14/0xb0 ata_pci_sff_prepare_host+0x41/0xa0 ata_pci_bmdma_prepare_host+0x14/0x30 piix_init_one+0x21f/0x600 local_pci_probe+0x48/0x80 pci_device_probe+0x105/0x1c0 really_probe+0x221/0x490 driver_probe_device+0xe9/0x160 device_driver_attach+0xb2/0xc0 __driver_attach+0x91/0x150 bus_for_each_dev+0x81/0xc0 driver_attach+0x1e/0x20 bus_add_driver+0x138/0x1f0 driver_register+0x91/0xf0 __pci_register_driver+0x73/0x80 piix_init+0x1e/0x2e do_one_initcall+0x5f/0x2d0 kernel_init_freeable+0x26f/0x2cf kernel_init+0xe/0x113 ret_from_fork+0x1f/0x30 } ... key at: [<ffffffff83d9fdc0>] __key.6+0x0/0x10 ... acquired at: __lock_acquire+0x9da/0x2370 lock_acquire+0x15f/0x420 _raw_spin_lock_irqsave+0x52/0xa0 ata_bmdma_interrupt+0x27/0x200 __handle_irq_event_percpu+0xd5/0x2b0 handle_irq_event+0x57/0xb0 handle_edge_irq+0x8c/0x230 asm_call_irq_on_stack+0xf/0x20 common_interrupt+0x100/0x1c0 asm_common_interrupt+0x1e/0x40 native_safe_halt+0xe/0x10 arch_cpu_idle+0x15/0x20 default_idle_call+0x59/0x1c0 do_idle+0x22c/0x2c0 cpu_startup_entry+0x20/0x30 start_secondary+0x11d/0x150 secondary_startup_64_no_verify+0xa6/0xab This lockdep splat is reported after: commit e918188 ("locking: More accurate annotations for read_lock()") To clarify: - read-locks are recursive only in interrupt context (when in_interrupt() returns true) - after acquiring host->lock in CPU1, another cpu (i.e. CPU2) may call write_lock(&trig->leddev_list_lock) that would be blocked by CPU0 that holds trig->leddev_list_lock in read-mode - when CPU1 (ata_ac_complete()) tries to read-lock trig->leddev_list_lock, it would be blocked by the write-lock waiter on CPU2 (because we are not in interrupt context, so the read-lock is not recursive) - at this point if an interrupt happens on CPU0 and ata_bmdma_interrupt() is executed it will try to acquire host->lock, that is held by CPU1, that is currently blocked by CPU2, so: * CPU0 blocked by CPU1 * CPU1 blocked by CPU2 * CPU2 blocked by CPU0 *** DEADLOCK *** The deadlock scenario is better represented by the following schema (thanks to Boqun Feng <boqun.feng@gmail.com> for the schema and the detailed explanation of the deadlock condition): CPU 0: CPU 1: CPU 2: ----- ----- ----- led_trigger_event(): read_lock(&trig->leddev_list_lock); <workqueue> ata_hsm_qc_complete(): spin_lock_irqsave(&host->lock); write_lock(&trig->leddev_list_lock); ata_port_freeze(): ata_do_link_abort(): ata_qc_complete(): ledtrig_disk_activity(): led_trigger_blink_oneshot(): read_lock(&trig->leddev_list_lock); // ^ not in in_interrupt() context, so could get blocked by CPU 2 <interrupt> ata_bmdma_interrupt(): spin_lock_irqsave(&host->lock); Fix by using read_lock_irqsave/irqrestore() in led_trigger_event(), so that no interrupt can happen in between, preventing the deadlock condition. Apply the same change to led_trigger_blink_setup() as well, since the same deadlock scenario can also happen in power_supply_update_bat_leds() -> led_trigger_blink() -> led_trigger_blink_setup() (workqueue context), and potentially prevent other similar usages. Link: https://lore.kernel.org/lkml/20201101092614.GB3989@xps-13-7390/ Fixes: eb25cb9 ("leds: convert IDE trigger to common disk trigger") Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: Pavel Machek <pavel@ucw.cz>
We have the following potential deadlock condition: ======================================================== WARNING: possible irq lock inversion dependency detected 5.10.0-rc2+ linux-sunxi#25 Not tainted -------------------------------------------------------- swapper/3/0 just changed the state of lock: ffff8880063bd618 (&host->lock){-...}-{2:2}, at: ata_bmdma_interrupt+0x27/0x200 but this lock took another, HARDIRQ-READ-unsafe lock in the past: (&trig->leddev_list_lock){.+.?}-{2:2} and interrupts could create inverse lock ordering between them. other info that might help us debug this: Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&trig->leddev_list_lock); local_irq_disable(); lock(&host->lock); lock(&trig->leddev_list_lock); <Interrupt> lock(&host->lock); *** DEADLOCK *** no locks held by swapper/3/0. the shortest dependencies between 2nd lock and 1st lock: -> (&trig->leddev_list_lock){.+.?}-{2:2} ops: 46 { HARDIRQ-ON-R at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 rfkill_global_led_trigger_worker+0x94/0xb0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 IN-SOFTIRQ-R at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 kbd_bh+0x9e/0xc0 tasklet_action_common.constprop.0+0xe9/0x100 tasklet_action+0x22/0x30 __do_softirq+0xcc/0x46d run_ksoftirqd+0x3f/0x70 smpboot_thread_fn+0x116/0x1f0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 SOFTIRQ-ON-R at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 rfkill_global_led_trigger_worker+0x94/0xb0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 INITIAL READ USE at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 rfkill_global_led_trigger_worker+0x94/0xb0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 } ... key at: [<ffffffff83da4c00>] __key.0+0x0/0x10 ... acquired at: _raw_read_lock+0x42/0x90 led_trigger_blink_oneshot+0x3b/0x90 ledtrig_disk_activity+0x3c/0xa0 ata_qc_complete+0x26/0x450 ata_do_link_abort+0xa3/0xe0 ata_port_freeze+0x2e/0x40 ata_hsm_qc_complete+0x94/0xa0 ata_sff_hsm_move+0x177/0x7a0 ata_sff_pio_task+0xc7/0x1b0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 -> (&host->lock){-...}-{2:2} ops: 69 { IN-HARDIRQ-W at: lock_acquire+0x15f/0x420 _raw_spin_lock_irqsave+0x52/0xa0 ata_bmdma_interrupt+0x27/0x200 __handle_irq_event_percpu+0xd5/0x2b0 handle_irq_event+0x57/0xb0 handle_edge_irq+0x8c/0x230 asm_call_irq_on_stack+0xf/0x20 common_interrupt+0x100/0x1c0 asm_common_interrupt+0x1e/0x40 native_safe_halt+0xe/0x10 arch_cpu_idle+0x15/0x20 default_idle_call+0x59/0x1c0 do_idle+0x22c/0x2c0 cpu_startup_entry+0x20/0x30 start_secondary+0x11d/0x150 secondary_startup_64_no_verify+0xa6/0xab INITIAL USE at: lock_acquire+0x15f/0x420 _raw_spin_lock_irqsave+0x52/0xa0 ata_dev_init+0x54/0xe0 ata_link_init+0x8b/0xd0 ata_port_alloc+0x1f1/0x210 ata_host_alloc+0xf1/0x130 ata_host_alloc_pinfo+0x14/0xb0 ata_pci_sff_prepare_host+0x41/0xa0 ata_pci_bmdma_prepare_host+0x14/0x30 piix_init_one+0x21f/0x600 local_pci_probe+0x48/0x80 pci_device_probe+0x105/0x1c0 really_probe+0x221/0x490 driver_probe_device+0xe9/0x160 device_driver_attach+0xb2/0xc0 __driver_attach+0x91/0x150 bus_for_each_dev+0x81/0xc0 driver_attach+0x1e/0x20 bus_add_driver+0x138/0x1f0 driver_register+0x91/0xf0 __pci_register_driver+0x73/0x80 piix_init+0x1e/0x2e do_one_initcall+0x5f/0x2d0 kernel_init_freeable+0x26f/0x2cf kernel_init+0xe/0x113 ret_from_fork+0x1f/0x30 } ... key at: [<ffffffff83d9fdc0>] __key.6+0x0/0x10 ... acquired at: __lock_acquire+0x9da/0x2370 lock_acquire+0x15f/0x420 _raw_spin_lock_irqsave+0x52/0xa0 ata_bmdma_interrupt+0x27/0x200 __handle_irq_event_percpu+0xd5/0x2b0 handle_irq_event+0x57/0xb0 handle_edge_irq+0x8c/0x230 asm_call_irq_on_stack+0xf/0x20 common_interrupt+0x100/0x1c0 asm_common_interrupt+0x1e/0x40 native_safe_halt+0xe/0x10 arch_cpu_idle+0x15/0x20 default_idle_call+0x59/0x1c0 do_idle+0x22c/0x2c0 cpu_startup_entry+0x20/0x30 start_secondary+0x11d/0x150 secondary_startup_64_no_verify+0xa6/0xab This lockdep splat is reported after: commit e918188 ("locking: More accurate annotations for read_lock()") To clarify: - read-locks are recursive only in interrupt context (when in_interrupt() returns true) - after acquiring host->lock in CPU1, another cpu (i.e. CPU2) may call write_lock(&trig->leddev_list_lock) that would be blocked by CPU0 that holds trig->leddev_list_lock in read-mode - when CPU1 (ata_ac_complete()) tries to read-lock trig->leddev_list_lock, it would be blocked by the write-lock waiter on CPU2 (because we are not in interrupt context, so the read-lock is not recursive) - at this point if an interrupt happens on CPU0 and ata_bmdma_interrupt() is executed it will try to acquire host->lock, that is held by CPU1, that is currently blocked by CPU2, so: * CPU0 blocked by CPU1 * CPU1 blocked by CPU2 * CPU2 blocked by CPU0 *** DEADLOCK *** The deadlock scenario is better represented by the following schema (thanks to Boqun Feng <boqun.feng@gmail.com> for the schema and the detailed explanation of the deadlock condition): CPU 0: CPU 1: CPU 2: ----- ----- ----- led_trigger_event(): read_lock(&trig->leddev_list_lock); <workqueue> ata_hsm_qc_complete(): spin_lock_irqsave(&host->lock); write_lock(&trig->leddev_list_lock); ata_port_freeze(): ata_do_link_abort(): ata_qc_complete(): ledtrig_disk_activity(): led_trigger_blink_oneshot(): read_lock(&trig->leddev_list_lock); // ^ not in in_interrupt() context, so could get blocked by CPU 2 <interrupt> ata_bmdma_interrupt(): spin_lock_irqsave(&host->lock); Fix by using read_lock_irqsave/irqrestore() in led_trigger_event(), so that no interrupt can happen in between, preventing the deadlock condition. Apply the same change to led_trigger_blink_setup() as well, since the same deadlock scenario can also happen in power_supply_update_bat_leds() -> led_trigger_blink() -> led_trigger_blink_setup() (workqueue context), and potentially prevent other similar usages. Link: https://lore.kernel.org/lkml/20201101092614.GB3989@xps-13-7390/ Fixes: eb25cb9 ("leds: convert IDE trigger to common disk trigger") Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: Pavel Machek <pavel@ucw.cz>
We have the following potential deadlock condition: ======================================================== WARNING: possible irq lock inversion dependency detected 5.10.0-rc2+ linux-sunxi#25 Not tainted -------------------------------------------------------- swapper/3/0 just changed the state of lock: ffff8880063bd618 (&host->lock){-...}-{2:2}, at: ata_bmdma_interrupt+0x27/0x200 but this lock took another, HARDIRQ-READ-unsafe lock in the past: (&trig->leddev_list_lock){.+.?}-{2:2} and interrupts could create inverse lock ordering between them. other info that might help us debug this: Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&trig->leddev_list_lock); local_irq_disable(); lock(&host->lock); lock(&trig->leddev_list_lock); <Interrupt> lock(&host->lock); *** DEADLOCK *** no locks held by swapper/3/0. the shortest dependencies between 2nd lock and 1st lock: -> (&trig->leddev_list_lock){.+.?}-{2:2} ops: 46 { HARDIRQ-ON-R at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 rfkill_global_led_trigger_worker+0x94/0xb0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 IN-SOFTIRQ-R at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 kbd_bh+0x9e/0xc0 tasklet_action_common.constprop.0+0xe9/0x100 tasklet_action+0x22/0x30 __do_softirq+0xcc/0x46d run_ksoftirqd+0x3f/0x70 smpboot_thread_fn+0x116/0x1f0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 SOFTIRQ-ON-R at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 rfkill_global_led_trigger_worker+0x94/0xb0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 INITIAL READ USE at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 rfkill_global_led_trigger_worker+0x94/0xb0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 } ... key at: [<ffffffff83da4c00>] __key.0+0x0/0x10 ... acquired at: _raw_read_lock+0x42/0x90 led_trigger_blink_oneshot+0x3b/0x90 ledtrig_disk_activity+0x3c/0xa0 ata_qc_complete+0x26/0x450 ata_do_link_abort+0xa3/0xe0 ata_port_freeze+0x2e/0x40 ata_hsm_qc_complete+0x94/0xa0 ata_sff_hsm_move+0x177/0x7a0 ata_sff_pio_task+0xc7/0x1b0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 -> (&host->lock){-...}-{2:2} ops: 69 { IN-HARDIRQ-W at: lock_acquire+0x15f/0x420 _raw_spin_lock_irqsave+0x52/0xa0 ata_bmdma_interrupt+0x27/0x200 __handle_irq_event_percpu+0xd5/0x2b0 handle_irq_event+0x57/0xb0 handle_edge_irq+0x8c/0x230 asm_call_irq_on_stack+0xf/0x20 common_interrupt+0x100/0x1c0 asm_common_interrupt+0x1e/0x40 native_safe_halt+0xe/0x10 arch_cpu_idle+0x15/0x20 default_idle_call+0x59/0x1c0 do_idle+0x22c/0x2c0 cpu_startup_entry+0x20/0x30 start_secondary+0x11d/0x150 secondary_startup_64_no_verify+0xa6/0xab INITIAL USE at: lock_acquire+0x15f/0x420 _raw_spin_lock_irqsave+0x52/0xa0 ata_dev_init+0x54/0xe0 ata_link_init+0x8b/0xd0 ata_port_alloc+0x1f1/0x210 ata_host_alloc+0xf1/0x130 ata_host_alloc_pinfo+0x14/0xb0 ata_pci_sff_prepare_host+0x41/0xa0 ata_pci_bmdma_prepare_host+0x14/0x30 piix_init_one+0x21f/0x600 local_pci_probe+0x48/0x80 pci_device_probe+0x105/0x1c0 really_probe+0x221/0x490 driver_probe_device+0xe9/0x160 device_driver_attach+0xb2/0xc0 __driver_attach+0x91/0x150 bus_for_each_dev+0x81/0xc0 driver_attach+0x1e/0x20 bus_add_driver+0x138/0x1f0 driver_register+0x91/0xf0 __pci_register_driver+0x73/0x80 piix_init+0x1e/0x2e do_one_initcall+0x5f/0x2d0 kernel_init_freeable+0x26f/0x2cf kernel_init+0xe/0x113 ret_from_fork+0x1f/0x30 } ... key at: [<ffffffff83d9fdc0>] __key.6+0x0/0x10 ... acquired at: __lock_acquire+0x9da/0x2370 lock_acquire+0x15f/0x420 _raw_spin_lock_irqsave+0x52/0xa0 ata_bmdma_interrupt+0x27/0x200 __handle_irq_event_percpu+0xd5/0x2b0 handle_irq_event+0x57/0xb0 handle_edge_irq+0x8c/0x230 asm_call_irq_on_stack+0xf/0x20 common_interrupt+0x100/0x1c0 asm_common_interrupt+0x1e/0x40 native_safe_halt+0xe/0x10 arch_cpu_idle+0x15/0x20 default_idle_call+0x59/0x1c0 do_idle+0x22c/0x2c0 cpu_startup_entry+0x20/0x30 start_secondary+0x11d/0x150 secondary_startup_64_no_verify+0xa6/0xab This lockdep splat is reported after: commit e918188 ("locking: More accurate annotations for read_lock()") To clarify: - read-locks are recursive only in interrupt context (when in_interrupt() returns true) - after acquiring host->lock in CPU1, another cpu (i.e. CPU2) may call write_lock(&trig->leddev_list_lock) that would be blocked by CPU0 that holds trig->leddev_list_lock in read-mode - when CPU1 (ata_ac_complete()) tries to read-lock trig->leddev_list_lock, it would be blocked by the write-lock waiter on CPU2 (because we are not in interrupt context, so the read-lock is not recursive) - at this point if an interrupt happens on CPU0 and ata_bmdma_interrupt() is executed it will try to acquire host->lock, that is held by CPU1, that is currently blocked by CPU2, so: * CPU0 blocked by CPU1 * CPU1 blocked by CPU2 * CPU2 blocked by CPU0 *** DEADLOCK *** The deadlock scenario is better represented by the following schema (thanks to Boqun Feng <boqun.feng@gmail.com> for the schema and the detailed explanation of the deadlock condition): CPU 0: CPU 1: CPU 2: ----- ----- ----- led_trigger_event(): read_lock(&trig->leddev_list_lock); <workqueue> ata_hsm_qc_complete(): spin_lock_irqsave(&host->lock); write_lock(&trig->leddev_list_lock); ata_port_freeze(): ata_do_link_abort(): ata_qc_complete(): ledtrig_disk_activity(): led_trigger_blink_oneshot(): read_lock(&trig->leddev_list_lock); // ^ not in in_interrupt() context, so could get blocked by CPU 2 <interrupt> ata_bmdma_interrupt(): spin_lock_irqsave(&host->lock); Fix by using read_lock_irqsave/irqrestore() in led_trigger_event(), so that no interrupt can happen in between, preventing the deadlock condition. Apply the same change to led_trigger_blink_setup() as well, since the same deadlock scenario can also happen in power_supply_update_bat_leds() -> led_trigger_blink() -> led_trigger_blink_setup() (workqueue context), and potentially prevent other similar usages. Link: https://lore.kernel.org/lkml/20201101092614.GB3989@xps-13-7390/ Fixes: eb25cb9 ("leds: convert IDE trigger to common disk trigger") Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: Pavel Machek <pavel@ucw.cz>
commit 27af8e2 upstream. We have the following potential deadlock condition: ======================================================== WARNING: possible irq lock inversion dependency detected 5.10.0-rc2+ linux-sunxi#25 Not tainted -------------------------------------------------------- swapper/3/0 just changed the state of lock: ffff8880063bd618 (&host->lock){-...}-{2:2}, at: ata_bmdma_interrupt+0x27/0x200 but this lock took another, HARDIRQ-READ-unsafe lock in the past: (&trig->leddev_list_lock){.+.?}-{2:2} and interrupts could create inverse lock ordering between them. other info that might help us debug this: Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&trig->leddev_list_lock); local_irq_disable(); lock(&host->lock); lock(&trig->leddev_list_lock); <Interrupt> lock(&host->lock); *** DEADLOCK *** no locks held by swapper/3/0. the shortest dependencies between 2nd lock and 1st lock: -> (&trig->leddev_list_lock){.+.?}-{2:2} ops: 46 { HARDIRQ-ON-R at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 rfkill_global_led_trigger_worker+0x94/0xb0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 IN-SOFTIRQ-R at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 kbd_bh+0x9e/0xc0 tasklet_action_common.constprop.0+0xe9/0x100 tasklet_action+0x22/0x30 __do_softirq+0xcc/0x46d run_ksoftirqd+0x3f/0x70 smpboot_thread_fn+0x116/0x1f0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 SOFTIRQ-ON-R at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 rfkill_global_led_trigger_worker+0x94/0xb0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 INITIAL READ USE at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 rfkill_global_led_trigger_worker+0x94/0xb0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 } ... key at: [<ffffffff83da4c00>] __key.0+0x0/0x10 ... acquired at: _raw_read_lock+0x42/0x90 led_trigger_blink_oneshot+0x3b/0x90 ledtrig_disk_activity+0x3c/0xa0 ata_qc_complete+0x26/0x450 ata_do_link_abort+0xa3/0xe0 ata_port_freeze+0x2e/0x40 ata_hsm_qc_complete+0x94/0xa0 ata_sff_hsm_move+0x177/0x7a0 ata_sff_pio_task+0xc7/0x1b0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 -> (&host->lock){-...}-{2:2} ops: 69 { IN-HARDIRQ-W at: lock_acquire+0x15f/0x420 _raw_spin_lock_irqsave+0x52/0xa0 ata_bmdma_interrupt+0x27/0x200 __handle_irq_event_percpu+0xd5/0x2b0 handle_irq_event+0x57/0xb0 handle_edge_irq+0x8c/0x230 asm_call_irq_on_stack+0xf/0x20 common_interrupt+0x100/0x1c0 asm_common_interrupt+0x1e/0x40 native_safe_halt+0xe/0x10 arch_cpu_idle+0x15/0x20 default_idle_call+0x59/0x1c0 do_idle+0x22c/0x2c0 cpu_startup_entry+0x20/0x30 start_secondary+0x11d/0x150 secondary_startup_64_no_verify+0xa6/0xab INITIAL USE at: lock_acquire+0x15f/0x420 _raw_spin_lock_irqsave+0x52/0xa0 ata_dev_init+0x54/0xe0 ata_link_init+0x8b/0xd0 ata_port_alloc+0x1f1/0x210 ata_host_alloc+0xf1/0x130 ata_host_alloc_pinfo+0x14/0xb0 ata_pci_sff_prepare_host+0x41/0xa0 ata_pci_bmdma_prepare_host+0x14/0x30 piix_init_one+0x21f/0x600 local_pci_probe+0x48/0x80 pci_device_probe+0x105/0x1c0 really_probe+0x221/0x490 driver_probe_device+0xe9/0x160 device_driver_attach+0xb2/0xc0 __driver_attach+0x91/0x150 bus_for_each_dev+0x81/0xc0 driver_attach+0x1e/0x20 bus_add_driver+0x138/0x1f0 driver_register+0x91/0xf0 __pci_register_driver+0x73/0x80 piix_init+0x1e/0x2e do_one_initcall+0x5f/0x2d0 kernel_init_freeable+0x26f/0x2cf kernel_init+0xe/0x113 ret_from_fork+0x1f/0x30 } ... key at: [<ffffffff83d9fdc0>] __key.6+0x0/0x10 ... acquired at: __lock_acquire+0x9da/0x2370 lock_acquire+0x15f/0x420 _raw_spin_lock_irqsave+0x52/0xa0 ata_bmdma_interrupt+0x27/0x200 __handle_irq_event_percpu+0xd5/0x2b0 handle_irq_event+0x57/0xb0 handle_edge_irq+0x8c/0x230 asm_call_irq_on_stack+0xf/0x20 common_interrupt+0x100/0x1c0 asm_common_interrupt+0x1e/0x40 native_safe_halt+0xe/0x10 arch_cpu_idle+0x15/0x20 default_idle_call+0x59/0x1c0 do_idle+0x22c/0x2c0 cpu_startup_entry+0x20/0x30 start_secondary+0x11d/0x150 secondary_startup_64_no_verify+0xa6/0xab This lockdep splat is reported after: commit e918188 ("locking: More accurate annotations for read_lock()") To clarify: - read-locks are recursive only in interrupt context (when in_interrupt() returns true) - after acquiring host->lock in CPU1, another cpu (i.e. CPU2) may call write_lock(&trig->leddev_list_lock) that would be blocked by CPU0 that holds trig->leddev_list_lock in read-mode - when CPU1 (ata_ac_complete()) tries to read-lock trig->leddev_list_lock, it would be blocked by the write-lock waiter on CPU2 (because we are not in interrupt context, so the read-lock is not recursive) - at this point if an interrupt happens on CPU0 and ata_bmdma_interrupt() is executed it will try to acquire host->lock, that is held by CPU1, that is currently blocked by CPU2, so: * CPU0 blocked by CPU1 * CPU1 blocked by CPU2 * CPU2 blocked by CPU0 *** DEADLOCK *** The deadlock scenario is better represented by the following schema (thanks to Boqun Feng <boqun.feng@gmail.com> for the schema and the detailed explanation of the deadlock condition): CPU 0: CPU 1: CPU 2: ----- ----- ----- led_trigger_event(): read_lock(&trig->leddev_list_lock); <workqueue> ata_hsm_qc_complete(): spin_lock_irqsave(&host->lock); write_lock(&trig->leddev_list_lock); ata_port_freeze(): ata_do_link_abort(): ata_qc_complete(): ledtrig_disk_activity(): led_trigger_blink_oneshot(): read_lock(&trig->leddev_list_lock); // ^ not in in_interrupt() context, so could get blocked by CPU 2 <interrupt> ata_bmdma_interrupt(): spin_lock_irqsave(&host->lock); Fix by using read_lock_irqsave/irqrestore() in led_trigger_event(), so that no interrupt can happen in between, preventing the deadlock condition. Apply the same change to led_trigger_blink_setup() as well, since the same deadlock scenario can also happen in power_supply_update_bat_leds() -> led_trigger_blink() -> led_trigger_blink_setup() (workqueue context), and potentially prevent other similar usages. Link: https://lore.kernel.org/lkml/20201101092614.GB3989@xps-13-7390/ Fixes: eb25cb9 ("leds: convert IDE trigger to common disk trigger") Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After commit 997acaf (lockdep: report broken irq restoration), the guest splatting below during boot: raw_local_irq_restore() called with IRQs enabled WARNING: CPU: 1 PID: 169 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x26/0x30 Modules linked in: hid_generic usbhid hid CPU: 1 PID: 169 Comm: systemd-udevd Not tainted 5.11.0+ linux-sunxi#25 RIP: 0010:warn_bogus_irq_restore+0x26/0x30 Call Trace: kvm_wait+0x76/0x90 __pv_queued_spin_lock_slowpath+0x285/0x2e0 do_raw_spin_lock+0xc9/0xd0 _raw_spin_lock+0x59/0x70 lockref_get_not_dead+0xf/0x50 __legitimize_path+0x31/0x60 legitimize_root+0x37/0x50 try_to_unlazy_next+0x7f/0x1d0 lookup_fast+0xb0/0x170 path_openat+0x165/0x9b0 do_filp_open+0x99/0x110 do_sys_openat2+0x1f1/0x2e0 do_sys_open+0x5c/0x80 __x64_sys_open+0x21/0x30 do_syscall_64+0x32/0x50 entry_SYSCALL_64_after_hwframe+0x44/0xae The new consistency checking, expects local_irq_save() and local_irq_restore() to be paired and sanely nested, and therefore expects local_irq_restore() to be called with irqs disabled. The irqflags handling in kvm_wait() which ends up doing: local_irq_save(flags); safe_halt(); local_irq_restore(flags); instead triggers it. This patch fixes it by using local_irq_disable()/enable() directly. Cc: Thomas Gleixner <tglx@linutronix.de> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1615791328-2735-1-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
commit f02d408 upstream. Commit a6dcfe0 ("scsi: qla2xxx: Limit interrupt vectors to number of CPUs") lowers the number of allocated MSI-X vectors to the number of CPUs. That breaks vector allocation assumptions in qla83xx_iospace_config(), qla24xx_enable_msix() and qla2x00_iospace_config(). Either of the functions computes maximum number of qpairs as: ha->max_qpairs = ha->msix_count - 1 (MB interrupt) - 1 (default response queue) - 1 (ATIO, in dual or pure target mode) max_qpairs is set to zero in case of two CPUs and initiator mode. The number is then used to allocate ha->queue_pair_map inside qla2x00_alloc_queues(). No allocation happens and ha->queue_pair_map is left NULL but the driver thinks there are queue pairs available. qla2xxx_queuecommand() tries to find a qpair in the map and crashes: if (ha->mqenable) { uint32_t tag; uint16_t hwq; struct qla_qpair *qpair = NULL; tag = blk_mq_unique_tag(cmd->request); hwq = blk_mq_unique_tag_to_hwq(tag); qpair = ha->queue_pair_map[hwq]; # <- HERE if (qpair) return qla2xxx_mqueuecommand(host, cmd, qpair); } BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [jwrdegoede#1] SMP PTI CPU: 0 PID: 72 Comm: kworker/u4:3 Tainted: G W 5.10.0-rc1+ linux-sunxi#25 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 Workqueue: scsi_wq_7 fc_scsi_scan_rport [scsi_transport_fc] RIP: 0010:qla2xxx_queuecommand+0x16b/0x3f0 [qla2xxx] Call Trace: scsi_queue_rq+0x58c/0xa60 blk_mq_dispatch_rq_list+0x2b7/0x6f0 ? __sbitmap_get_word+0x2a/0x80 __blk_mq_sched_dispatch_requests+0xb8/0x170 blk_mq_sched_dispatch_requests+0x2b/0x50 __blk_mq_run_hw_queue+0x49/0xb0 __blk_mq_delay_run_hw_queue+0xfb/0x150 blk_mq_sched_insert_request+0xbe/0x110 blk_execute_rq+0x45/0x70 __scsi_execute+0x10e/0x250 scsi_probe_and_add_lun+0x228/0xda0 __scsi_scan_target+0xf4/0x620 ? __pm_runtime_resume+0x4f/0x70 scsi_scan_target+0x100/0x110 fc_scsi_scan_rport+0xa1/0xb0 [scsi_transport_fc] process_one_work+0x1ea/0x3b0 worker_thread+0x28/0x3b0 ? process_one_work+0x3b0/0x3b0 kthread+0x112/0x130 ? kthread_park+0x80/0x80 ret_from_fork+0x22/0x30 The driver should allocate enough vectors to provide every CPU it's own HW queue and still handle reserved (MB, RSP, ATIO) interrupts. The change fixes the crash on dual core VM and prevents unbalanced QP allocation where nr_hw_queues is two less than the number of CPUs. Link: https://lore.kernel.org/r/20210412165740.39318-1-r.bolshakov@yadro.com Fixes: a6dcfe0 ("scsi: qla2xxx: Limit interrupt vectors to number of CPUs") Cc: Daniel Wagner <daniel.wagner@suse.com> Cc: Himanshu Madhani <himanshu.madhani@oracle.com> Cc: Quinn Tran <qutran@marvell.com> Cc: Nilesh Javali <njavali@marvell.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: stable@vger.kernel.org # 5.11+ Reported-by: Aleksandr Volkov <a.y.volkov@yadro.com> Reported-by: Aleksandr Miloserdov <a.miloserdov@yadro.com> Reviewed-by: Daniel Wagner <dwagner@suse.de> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d412137 ] The perf_buffer fails on system with offline cpus: # test_progs -t perf_buffer test_perf_buffer:PASS:nr_cpus 0 nsec test_perf_buffer:PASS:nr_on_cpus 0 nsec test_perf_buffer:PASS:skel_load 0 nsec test_perf_buffer:PASS:attach_kprobe 0 nsec test_perf_buffer:PASS:perf_buf__new 0 nsec test_perf_buffer:PASS:epoll_fd 0 nsec skipping offline CPU linux-sunxi#24 skipping offline CPU linux-sunxi#25 skipping offline CPU linux-sunxi#26 skipping offline CPU linux-sunxi#27 skipping offline CPU linux-sunxi#28 skipping offline CPU linux-sunxi#29 skipping offline CPU linux-sunxi#30 skipping offline CPU linux-sunxi#31 test_perf_buffer:PASS:perf_buffer__poll 0 nsec test_perf_buffer:PASS:seen_cpu_cnt 0 nsec test_perf_buffer:FAIL:buf_cnt got 24, expected 32 Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED Changing the test to check online cpus instead of possible. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20211021114132.8196-2-jolsa@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
After commit d3256ef ("veth: allow enabling NAPI even without XDP"), if GRO is enabled on a veth device and TSO is disabled on the peer device, TCP skbs will go through the NAPI callback. If there is no XDP program attached, the veth code does not perform any share check, and shared/cloned skbs could enter the GRO engine. Ignat reported a BUG triggered later-on due to the above condition: [ 53.970529][ C1] kernel BUG at net/core/skbuff.c:3574! [ 53.981755][ C1] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI [ 53.982634][ C1] CPU: 1 PID: 19 Comm: ksoftirqd/1 Not tainted 5.16.0-rc5+ linux-sunxi#25 [ 53.982634][ C1] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 [ 53.982634][ C1] RIP: 0010:skb_shift+0x13ef/0x23b0 [ 53.982634][ C1] Code: ea 03 0f b6 04 02 48 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 41 0c 00 00 41 80 7f 02 00 4d 8d b5 d0 00 00 00 0f 85 74 f5 ff ff <0f> 0b 4d 8d 77 20 be 04 00 00 00 4c 89 44 24 78 4c 89 f7 4c 89 8c [ 53.982634][ C1] RSP: 0018:ffff8881008f7008 EFLAGS: 00010246 [ 53.982634][ C1] RAX: 0000000000000000 RBX: ffff8881180b4c80 RCX: 0000000000000000 [ 53.982634][ C1] RDX: 0000000000000002 RSI: ffff8881180b4d3c RDI: ffff88810bc9cac2 [ 53.982634][ C1] RBP: ffff8881008f70b8 R08: ffff8881180b4cf4 R09: ffff8881180b4cf0 [ 53.982634][ C1] R10: ffffed1022999e5c R11: 0000000000000002 R12: 0000000000000590 [ 53.982634][ C1] R13: ffff88810f940c80 R14: ffff88810f940d50 R15: ffff88810bc9cac0 [ 53.982634][ C1] FS: 0000000000000000(0000) GS:ffff888235880000(0000) knlGS:0000000000000000 [ 53.982634][ C1] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 53.982634][ C1] CR2: 00007ff5f9b86680 CR3: 0000000108ce8004 CR4: 0000000000170ee0 [ 53.982634][ C1] Call Trace: [ 53.982634][ C1] <TASK> [ 53.982634][ C1] tcp_sacktag_walk+0xaba/0x18e0 [ 53.982634][ C1] tcp_sacktag_write_queue+0xe7b/0x3460 [ 53.982634][ C1] tcp_ack+0x2666/0x54b0 [ 53.982634][ C1] tcp_rcv_established+0x4d9/0x20f0 [ 53.982634][ C1] tcp_v4_do_rcv+0x551/0x810 [ 53.982634][ C1] tcp_v4_rcv+0x22ed/0x2ed0 [ 53.982634][ C1] ip_protocol_deliver_rcu+0x96/0xaf0 [ 53.982634][ C1] ip_local_deliver_finish+0x1e0/0x2f0 [ 53.982634][ C1] ip_sublist_rcv_finish+0x211/0x440 [ 53.982634][ C1] ip_list_rcv_finish.constprop.0+0x424/0x660 [ 53.982634][ C1] ip_list_rcv+0x2c8/0x410 [ 53.982634][ C1] __netif_receive_skb_list_core+0x65c/0x910 [ 53.982634][ C1] netif_receive_skb_list_internal+0x5f9/0xcb0 [ 53.982634][ C1] napi_complete_done+0x188/0x6e0 [ 53.982634][ C1] gro_cell_poll+0x10c/0x1d0 [ 53.982634][ C1] __napi_poll+0xa1/0x530 [ 53.982634][ C1] net_rx_action+0x567/0x1270 [ 53.982634][ C1] __do_softirq+0x28a/0x9ba [ 53.982634][ C1] run_ksoftirqd+0x32/0x60 [ 53.982634][ C1] smpboot_thread_fn+0x559/0x8c0 [ 53.982634][ C1] kthread+0x3b9/0x490 [ 53.982634][ C1] ret_from_fork+0x22/0x30 [ 53.982634][ C1] </TASK> Address the issue by skipping the GRO stage for shared or cloned skbs. To reduce the chance of OoO, try to unclone the skbs before giving up. v1 -> v2: - use avoid skb_copy and fallback to netif_receive_skb - Eric Reported-by: Ignat Korchagin <ignat@cloudflare.com> Fixes: d3256ef ("veth: allow enabling NAPI even without XDP") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Tested-by: Ignat Korchagin <ignat@cloudflare.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/b5f61c5602aab01bac8d711d8d1bfab0a4817db7.1640197544.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
[ Upstream commit b132030 ] A out-of-bounds bug can be triggered by an interrupt, the reason for this bug is the lack of checking of register values. In flexcop_pci_isr, the driver reads value from a register and uses it as a dma address. Finally, this address will be passed to the count parameter of find_next_packet. If this value is larger than the size of dma, the index of buffer will be out-of-bounds. Fix this by adding a check after reading the value of the register. The following KASAN report reveals it: BUG: KASAN: slab-out-of-bounds in find_next_packet drivers/media/dvb-core/dvb_demux.c:528 [inline] BUG: KASAN: slab-out-of-bounds in _dvb_dmx_swfilter drivers/media/dvb-core/dvb_demux.c:572 [inline] BUG: KASAN: slab-out-of-bounds in dvb_dmx_swfilter+0x3fa/0x420 drivers/media/dvb-core/dvb_demux.c:603 Read of size 1 at addr ffff8880608c00a0 by task swapper/2/0 CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.19.177-gdba4159c14ef linux-sunxi#25 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xec/0x156 lib/dump_stack.c:118 print_address_description+0x78/0x290 mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report+0x25b/0x380 mm/kasan/report.c:412 __asan_report_load1_noabort+0x19/0x20 mm/kasan/report.c:430 find_next_packet drivers/media/dvb-core/dvb_demux.c:528 [inline] _dvb_dmx_swfilter drivers/media/dvb-core/dvb_demux.c:572 [inline] dvb_dmx_swfilter+0x3fa/0x420 drivers/media/dvb-core/dvb_demux.c:603 flexcop_pass_dmx_data+0x2e/0x40 drivers/media/common/b2c2/flexcop.c:167 flexcop_pci_isr+0x3d1/0x5d0 drivers/media/pci/b2c2/flexcop-pci.c:212 __handle_irq_event_percpu+0xfb/0x770 kernel/irq/handle.c:149 handle_irq_event_percpu+0x79/0x150 kernel/irq/handle.c:189 handle_irq_event+0xac/0x140 kernel/irq/handle.c:206 handle_fasteoi_irq+0x232/0x5c0 kernel/irq/chip.c:725 generic_handle_irq_desc include/linux/irqdesc.h:155 [inline] handle_irq+0x230/0x3a0 arch/x86/kernel/irq_64.c:87 do_IRQ+0xa7/0x1e0 arch/x86/kernel/irq.c:247 common_interrupt+0xf/0xf arch/x86/entry/entry_64.S:670 </IRQ> RIP: 0010:native_safe_halt+0x28/0x30 arch/x86/include/asm/irqflags.h:61 Code: 00 00 55 be 04 00 00 00 48 c7 c7 00 62 2f 8c 48 89 e5 e8 fb 31 e8 f8 8b 05 75 4f 8e 03 85 c0 7e 07 0f 00 2d 8a 61 66 00 fb f4 <5d> c3 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 RSP: 0018:ffff88806b71fcc8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffde RAX: 0000000000000000 RBX: ffffffff8bde44c8 RCX: ffffffff88a11285 RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff8c2f6200 RBP: ffff88806b71fcc8 R08: fffffbfff185ec40 R09: fffffbfff185ec40 R10: 0000000000000001 R11: fffffbfff185ec40 R12: 0000000000000002 R13: ffffffff8be9d6e0 R14: 0000000000000000 R15: 0000000000000000 arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline] default_idle+0x6f/0x360 arch/x86/kernel/process.c:557 arch_cpu_idle+0xf/0x20 arch/x86/kernel/process.c:548 default_idle_call+0x3b/0x60 kernel/sched/idle.c:93 cpuidle_idle_call kernel/sched/idle.c:153 [inline] do_idle+0x2ab/0x3c0 kernel/sched/idle.c:263 cpu_startup_entry+0xcb/0xe0 kernel/sched/idle.c:369 start_secondary+0x3b8/0x4e0 arch/x86/kernel/smpboot.c:271 secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243 Allocated by task 1: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:553 kasan_slab_alloc+0x11/0x20 mm/kasan/kasan.c:490 slab_post_alloc_hook mm/slab.h:445 [inline] slab_alloc_node mm/slub.c:2741 [inline] slab_alloc mm/slub.c:2749 [inline] kmem_cache_alloc+0xeb/0x280 mm/slub.c:2754 kmem_cache_zalloc include/linux/slab.h:699 [inline] __kernfs_new_node+0xe2/0x6f0 fs/kernfs/dir.c:633 kernfs_new_node+0x9a/0x120 fs/kernfs/dir.c:693 __kernfs_create_file+0x5f/0x340 fs/kernfs/file.c:992 sysfs_add_file_mode_ns+0x22a/0x4e0 fs/sysfs/file.c:306 create_files fs/sysfs/group.c:63 [inline] internal_create_group+0x34e/0xc30 fs/sysfs/group.c:147 sysfs_create_group fs/sysfs/group.c:173 [inline] sysfs_create_groups+0x9c/0x140 fs/sysfs/group.c:200 driver_add_groups+0x3e/0x50 drivers/base/driver.c:129 bus_add_driver+0x3a5/0x790 drivers/base/bus.c:684 driver_register+0x1cd/0x410 drivers/base/driver.c:170 __pci_register_driver+0x197/0x200 drivers/pci/pci-driver.c:1411 cx88_audio_pci_driver_init+0x23/0x25 drivers/media/pci/cx88/cx88-alsa.c: 1017 do_one_initcall+0xe0/0x610 init/main.c:884 do_initcall_level init/main.c:952 [inline] do_initcalls init/main.c:960 [inline] do_basic_setup init/main.c:978 [inline] kernel_init_freeable+0x4d0/0x592 init/main.c:1145 kernel_init+0x18/0x190 init/main.c:1062 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:415 Freed by task 0: (stack is not available) The buggy address belongs to the object at ffff8880608c0000 which belongs to the cache kernfs_node_cache of size 160 The buggy address is located 0 bytes to the right of 160-byte region [ffff8880608c0000, ffff8880608c00a0) The buggy address belongs to the page: page:ffffea0001823000 count:1 mapcount:0 mapping:ffff88806bed1e00 index:0x0 compound_mapcount: 0 flags: 0x100000000008100(slab|head) raw: 0100000000008100 dead000000000100 dead000000000200 ffff88806bed1e00 raw: 0000000000000000 0000000000240024 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8880608bff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff8880608c0000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff8880608c0080: 00 00 00 00 fc fc fc fc fc fc fc fc 00 00 00 00 ^ ffff8880608c0100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff8880608c0180: fc fc fc fc fc fc fc fc 00 00 00 00 00 00 00 00 ================================================================== Link: https://lore.kernel.org/linux-media/1620723603-30912-1-git-send-email-zheyuma97@gmail.com Reported-by: Zheyu Ma <zheyuma97@gmail.com> Signed-off-by: Zheyu Ma <zheyuma97@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4224cfd ] When bringing down the netdevice or system shutdown, a panic can be triggered while accessing the sysfs path because the device is already removed. [ 755.549084] mlx5_core 0000:12:00.1: Shutdown was called [ 756.404455] mlx5_core 0000:12:00.0: Shutdown was called ... [ 757.937260] BUG: unable to handle kernel NULL pointer dereference at (null) [ 758.031397] IP: [<ffffffff8ee11acb>] dma_pool_alloc+0x1ab/0x280 crash> bt ... PID: 12649 TASK: ffff8924108f2100 CPU: 1 COMMAND: "amsd" ... linux-sunxi#9 [ffff89240e1a38b0] page_fault at ffffffff8f38c778 [exception RIP: dma_pool_alloc+0x1ab] RIP: ffffffff8ee11acb RSP: ffff89240e1a3968 RFLAGS: 00010046 RAX: 0000000000000246 RBX: ffff89243d874100 RCX: 0000000000001000 RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff89243d874090 RBP: ffff89240e1a39c0 R8: 000000000001f080 R9: ffff8905ffc03c00 R10: ffffffffc04680d4 R11: ffffffff8edde9fd R12: 00000000000080d0 R13: ffff89243d874090 R14: ffff89243d874080 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 linux-sunxi#10 [ffff89240e1a39c8] mlx5_alloc_cmd_msg at ffffffffc04680f3 [mlx5_core] linux-sunxi#11 [ffff89240e1a3a18] cmd_exec at ffffffffc046ad62 [mlx5_core] linux-sunxi#12 [ffff89240e1a3ab8] mlx5_cmd_exec at ffffffffc046b4fb [mlx5_core] linux-sunxi#13 [ffff89240e1a3ae8] mlx5_core_access_reg at ffffffffc0475434 [mlx5_core] linux-sunxi#14 [ffff89240e1a3b40] mlx5e_get_fec_caps at ffffffffc04a7348 [mlx5_core] linux-sunxi#15 [ffff89240e1a3bb0] get_fec_supported_advertised at ffffffffc04992bf [mlx5_core] linux-sunxi#16 [ffff89240e1a3c08] mlx5e_get_link_ksettings at ffffffffc049ab36 [mlx5_core] linux-sunxi#17 [ffff89240e1a3ce8] __ethtool_get_link_ksettings at ffffffff8f25db46 linux-sunxi#18 [ffff89240e1a3d48] speed_show at ffffffff8f277208 linux-sunxi#19 [ffff89240e1a3dd8] dev_attr_show at ffffffff8f0b70e3 linux-sunxi#20 [ffff89240e1a3df8] sysfs_kf_seq_show at ffffffff8eedbedf linux-sunxi#21 [ffff89240e1a3e18] kernfs_seq_show at ffffffff8eeda596 linux-sunxi#22 [ffff89240e1a3e28] seq_read at ffffffff8ee76d10 linux-sunxi#23 [ffff89240e1a3e98] kernfs_fop_read at ffffffff8eedaef5 linux-sunxi#24 [ffff89240e1a3ed8] vfs_read at ffffffff8ee4e3ff linux-sunxi#25 [ffff89240e1a3f08] sys_read at ffffffff8ee4f27f linux-sunxi#26 [ffff89240e1a3f50] system_call_fastpath at ffffffff8f395f92 crash> net_device.state ffff89443b0c0000 state = 0x5 (__LINK_STATE_START| __LINK_STATE_NOCARRIER) To prevent this scenario, we also make sure that the netdevice is present. Signed-off-by: suresh kumar <suresh2514@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 1e41e69 upstream. UBSAN complains about array-index-out-of-bounds: [ 1.980703] kernel: UBSAN: array-index-out-of-bounds in /build/linux-9H675w/linux-5.15.0/drivers/ata/libahci.c:968:41 [ 1.980709] kernel: index 15 is out of range for type 'ahci_em_priv [8]' [ 1.980713] kernel: CPU: 0 PID: 209 Comm: scsi_eh_8 Not tainted 5.15.0-25-generic linux-sunxi#25-Ubuntu [ 1.980716] kernel: Hardware name: System manufacturer System Product Name/P5Q3, BIOS 1102 06/11/2010 [ 1.980718] kernel: Call Trace: [ 1.980721] kernel: <TASK> [ 1.980723] kernel: show_stack+0x52/0x58 [ 1.980729] kernel: dump_stack_lvl+0x4a/0x5f [ 1.980734] kernel: dump_stack+0x10/0x12 [ 1.980736] kernel: ubsan_epilogue+0x9/0x45 [ 1.980739] kernel: __ubsan_handle_out_of_bounds.cold+0x44/0x49 [ 1.980742] kernel: ahci_qc_issue+0x166/0x170 [libahci] [ 1.980748] kernel: ata_qc_issue+0x135/0x240 [ 1.980752] kernel: ata_exec_internal_sg+0x2c4/0x580 [ 1.980754] kernel: ? vprintk_default+0x1d/0x20 [ 1.980759] kernel: ata_exec_internal+0x67/0xa0 [ 1.980762] kernel: sata_pmp_read+0x8d/0xc0 [ 1.980765] kernel: sata_pmp_read_gscr+0x3c/0x90 [ 1.980768] kernel: sata_pmp_attach+0x8b/0x310 [ 1.980771] kernel: ata_eh_revalidate_and_attach+0x28c/0x4b0 [ 1.980775] kernel: ata_eh_recover+0x6b6/0xb30 [ 1.980778] kernel: ? ahci_do_hardreset+0x180/0x180 [libahci] [ 1.980783] kernel: ? ahci_stop_engine+0xb0/0xb0 [libahci] [ 1.980787] kernel: ? ahci_do_softreset+0x290/0x290 [libahci] [ 1.980792] kernel: ? trace_event_raw_event_ata_eh_link_autopsy_qc+0xe0/0xe0 [ 1.980795] kernel: sata_pmp_eh_recover.isra.0+0x214/0x560 [ 1.980799] kernel: sata_pmp_error_handler+0x23/0x40 [ 1.980802] kernel: ahci_error_handler+0x43/0x80 [libahci] [ 1.980806] kernel: ata_scsi_port_error_handler+0x2b1/0x600 [ 1.980810] kernel: ata_scsi_error+0x9c/0xd0 [ 1.980813] kernel: scsi_error_handler+0xa1/0x180 [ 1.980817] kernel: ? scsi_unjam_host+0x1c0/0x1c0 [ 1.980820] kernel: kthread+0x12a/0x150 [ 1.980823] kernel: ? set_kthread_struct+0x50/0x50 [ 1.980826] kernel: ret_from_fork+0x22/0x30 [ 1.980831] kernel: </TASK> This happens because sata_pmp_init_links() initialize link->pmp up to SATA_PMP_MAX_PORTS while em_priv is declared as 8 elements array. I can't find the maximum Enclosure Management ports specified in AHCI spec v1.3.1, but "12.2.1 LED message type" states that "Port Multiplier Information" can utilize 4 bits, which implies it can support up to 16 ports. Hence, use SATA_PMP_MAX_PORTS as EM_MAX_SLOTS to resolve the issue. BugLink: https://bugs.launchpad.net/bugs/1970074 Cc: stable@vger.kernel.org Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
…der memory pressure When destroying a queue, when calling sock_release, the network stack might need to allocate an skb to send a FIN/RST. When that happens during memory pressure, there is a need to reclaim memory, which in turn may ask the nvme-tcp device to write out dirty pages, however this is not possible due to a ctrl teardown that is going on. Set PF_MEMALLOC to the task that releases the socket to grant access to PF_MEMALLOC reserves. In addition, do the same for the nvme-tcp thread as this may also originate from the swap itself and should be more resilient to memory pressure situations. This fixes the following lockdep complaint: -- ====================================================== WARNING: possible circular locking dependency detected 6.0.0-rc2+ linux-sunxi#25 Tainted: G W ------------------------------------------------------ kswapd0/92 is trying to acquire lock: ffff888114003240 (sk_lock-AF_INET-NVME){+.+.}-{0:0}, at: tcp_sendpage+0x23/0xa0 but task is already holding lock: ffffffff97e95ca0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x987/0x10d0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (fs_reclaim){+.+.}-{0:0}: fs_reclaim_acquire+0x11e/0x160 kmem_cache_alloc_node+0x44/0x530 __alloc_skb+0x158/0x230 tcp_send_active_reset+0x7e/0x730 tcp_disconnect+0x1272/0x1ae0 __tcp_close+0x707/0xd90 tcp_close+0x26/0x80 inet_release+0xfa/0x220 sock_release+0x85/0x1a0 nvme_tcp_free_queue+0x1fd/0x470 [nvme_tcp] nvme_do_delete_ctrl+0x130/0x13d [nvme_core] nvme_sysfs_delete.cold+0x8/0xd [nvme_core] kernfs_fop_write_iter+0x356/0x530 vfs_write+0x4e8/0xce0 ksys_write+0xfd/0x1d0 do_syscall_64+0x58/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd -> #0 (sk_lock-AF_INET-NVME){+.+.}-{0:0}: __lock_acquire+0x2a0c/0x5690 lock_acquire+0x18e/0x4f0 lock_sock_nested+0x37/0xc0 tcp_sendpage+0x23/0xa0 inet_sendpage+0xad/0x120 kernel_sendpage+0x156/0x440 nvme_tcp_try_send+0x48a/0x2630 [nvme_tcp] nvme_tcp_queue_rq+0xefb/0x17e0 [nvme_tcp] __blk_mq_try_issue_directly+0x452/0x660 blk_mq_plug_issue_direct.constprop.0+0x207/0x700 blk_mq_flush_plug_list+0x6f5/0xc70 __blk_flush_plug+0x264/0x410 blk_finish_plug+0x4b/0xa0 shrink_lruvec+0x1263/0x1ea0 shrink_node+0x736/0x1a80 balance_pgdat+0x740/0x10d0 kswapd+0x5f2/0xaf0 kthread+0x256/0x2f0 ret_from_fork+0x1f/0x30 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(fs_reclaim); lock(sk_lock-AF_INET-NVME); lock(fs_reclaim); lock(sk_lock-AF_INET-NVME); *** DEADLOCK *** 3 locks held by kswapd0/92: #0: ffffffff97e95ca0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x987/0x10d0 #1: ffff88811f21b0b0 (q->srcu){....}-{0:0}, at: blk_mq_flush_plug_list+0x6b3/0xc70 #2: ffff888170b11470 (&queue->send_mutex){+.+.}-{3:3}, at: nvme_tcp_queue_rq+0xeb9/0x17e0 [nvme_tcp] Fixes: 3f2304f ("nvme-tcp: add NVMe over TCP host driver") Reported-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Tested-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
`hostname` needs to be set as null-pointer after free in `cifs_put_tcp_session` function, or when `cifsd` thread attempts to resolve hostname and reconnect the host, the thread would deref the invalid pointer. Here is one of practical backtrace examples as reference: Task 477 --------------------------- do_mount path_mount do_new_mount vfs_get_tree smb3_get_tree smb3_get_tree_common cifs_smb3_do_mount cifs_mount mount_put_conns cifs_put_tcp_session --> kfree(server->hostname) cifsd --------------------------- kthread cifs_demultiplex_thread cifs_reconnect reconn_set_ipaddr_from_hostname --> if (!server->hostname) --> if (server->hostname[0] == '\0') // !! UAF fault here CIFS: VFS: cifs_mount failed w/return code = -112 mount error(112): Host is down BUG: KASAN: use-after-free in reconn_set_ipaddr_from_hostname+0x2ba/0x310 Read of size 1 at addr ffff888108f35380 by task cifsd/480 CPU: 2 PID: 480 Comm: cifsd Not tainted 6.1.0-rc2-00106-gf705792f89dd-dirty linux-sunxi#25 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x68/0x85 print_report+0x16c/0x4a3 kasan_report+0x95/0x190 reconn_set_ipaddr_from_hostname+0x2ba/0x310 __cifs_reconnect.part.0+0x241/0x800 cifs_reconnect+0x65f/0xb60 cifs_demultiplex_thread+0x1570/0x2570 kthread+0x2c5/0x380 ret_from_fork+0x22/0x30 </TASK> Allocated by task 477: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 __kasan_kmalloc+0x7e/0x90 __kmalloc_node_track_caller+0x52/0x1b0 kstrdup+0x3b/0x70 cifs_get_tcp_session+0xbc/0x19b0 mount_get_conns+0xa9/0x10c0 cifs_mount+0xdf/0x1970 cifs_smb3_do_mount+0x295/0x1660 smb3_get_tree+0x352/0x5e0 vfs_get_tree+0x8e/0x2e0 path_mount+0xf8c/0x1990 do_mount+0xee/0x110 __x64_sys_mount+0x14b/0x1f0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd Freed by task 477: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 kasan_save_free_info+0x2a/0x50 __kasan_slab_free+0x10a/0x190 __kmem_cache_free+0xca/0x3f0 cifs_put_tcp_session+0x30c/0x450 cifs_mount+0xf95/0x1970 cifs_smb3_do_mount+0x295/0x1660 smb3_get_tree+0x352/0x5e0 vfs_get_tree+0x8e/0x2e0 path_mount+0xf8c/0x1990 do_mount+0xee/0x110 __x64_sys_mount+0x14b/0x1f0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd The buggy address belongs to the object at ffff888108f35380 which belongs to the cache kmalloc-16 of size 16 The buggy address is located 0 bytes inside of 16-byte region [ffff888108f35380, ffff888108f35390) The buggy address belongs to the physical page: page:00000000333f8e58 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888108f350e0 pfn:0x108f35 flags: 0x200000000000200(slab|node=0|zone=2) raw: 0200000000000200 0000000000000000 dead000000000122 ffff8881000423c0 raw: ffff888108f350e0 000000008080007a 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888108f35280: fa fb fc fc fa fb fc fc fa fb fc fc fa fb fc fc ffff888108f35300: fa fb fc fc fa fb fc fc fa fb fc fc fa fb fc fc >ffff888108f35380: fa fb fc fc fa fb fc fc fa fb fc fc fa fb fc fc ^ ffff888108f35400: fa fb fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff888108f35480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc Fixes: 7be3248 ("cifs: To match file servers, make sure the server hostname matches") Signed-off-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
[ Upstream commit 7d984da ] rxe_mr_cleanup() which tries to free mr->map again will be called when rxe_mr_init_user() fails: CPU: 0 PID: 4917 Comm: rdma_flush_serv Kdump: loaded Not tainted 6.1.0-rc1-roce-flush+ linux-sunxi#25 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x45/0x5d panic+0x19e/0x349 end_report.part.0+0x54/0x7c kasan_report.cold+0xa/0xf rxe_mr_cleanup+0x9d/0xf0 [rdma_rxe] __rxe_cleanup+0x10a/0x1e0 [rdma_rxe] rxe_reg_user_mr+0xb7/0xd0 [rdma_rxe] ib_uverbs_reg_mr+0x26a/0x480 [ib_uverbs] ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x1a2/0x250 [ib_uverbs] ib_uverbs_cmd_verbs+0x1397/0x15a0 [ib_uverbs] This issue was firstly exposed since commit b18c7da ("RDMA/rxe: Fix memory leak in error path code") and then we fixed it in commit 8ff5f5d ("RDMA/rxe: Prevent double freeing rxe_map_set()") but this fix was reverted together at last by commit 1e75550 (Revert "RDMA/rxe: Create duplicate mapping tables for FMRs") Simply let rxe_mr_cleanup() always handle freeing the mr->map once it is successfully allocated. Fixes: 1e75550 ("Revert "RDMA/rxe: Create duplicate mapping tables for FMRs"") Link: https://lore.kernel.org/r/1667099073-2-1-git-send-email-lizhijian@fujitsu.com Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The cited commit adds a compeletion to remove dependency on rtnl lock. But it causes a deadlock for multiple encapsulations: crash> bt ffff8aece8a64000 PID: 1514557 TASK: ffff8aece8a64000 CPU: 3 COMMAND: "tc" #0 [ffffa6d14183f368] __schedule at ffffffffb8ba7f45 #1 [ffffa6d14183f3f8] schedule at ffffffffb8ba8418 #2 [ffffa6d14183f418] schedule_preempt_disabled at ffffffffb8ba8898 #3 [ffffa6d14183f428] __mutex_lock at ffffffffb8baa7f8 #4 [ffffa6d14183f4d0] mutex_lock_nested at ffffffffb8baabeb #5 [ffffa6d14183f4e0] mlx5e_attach_encap at ffffffffc0f48c17 [mlx5_core] #6 [ffffa6d14183f628] mlx5e_tc_add_fdb_flow at ffffffffc0f39680 [mlx5_core] linux-sunxi#7 [ffffa6d14183f688] __mlx5e_add_fdb_flow at ffffffffc0f3b636 [mlx5_core] linux-sunxi#8 [ffffa6d14183f6f0] mlx5e_tc_add_flow at ffffffffc0f3bcdf [mlx5_core] linux-sunxi#9 [ffffa6d14183f728] mlx5e_configure_flower at ffffffffc0f3c1d1 [mlx5_core] linux-sunxi#10 [ffffa6d14183f790] mlx5e_rep_setup_tc_cls_flower at ffffffffc0f3d529 [mlx5_core] linux-sunxi#11 [ffffa6d14183f7a0] mlx5e_rep_setup_tc_cb at ffffffffc0f3d714 [mlx5_core] linux-sunxi#12 [ffffa6d14183f7b0] tc_setup_cb_add at ffffffffb8931bb8 linux-sunxi#13 [ffffa6d14183f810] fl_hw_replace_filter at ffffffffc0dae901 [cls_flower] linux-sunxi#14 [ffffa6d14183f8d8] fl_change at ffffffffc0db5c57 [cls_flower] linux-sunxi#15 [ffffa6d14183f970] tc_new_tfilter at ffffffffb8936047 linux-sunxi#16 [ffffa6d14183fac8] rtnetlink_rcv_msg at ffffffffb88c7c31 linux-sunxi#17 [ffffa6d14183fb50] netlink_rcv_skb at ffffffffb8942853 linux-sunxi#18 [ffffa6d14183fbc0] rtnetlink_rcv at ffffffffb88c1835 linux-sunxi#19 [ffffa6d14183fbd0] netlink_unicast at ffffffffb8941f27 linux-sunxi#20 [ffffa6d14183fc18] netlink_sendmsg at ffffffffb8942245 linux-sunxi#21 [ffffa6d14183fc98] sock_sendmsg at ffffffffb887d482 linux-sunxi#22 [ffffa6d14183fcb8] ____sys_sendmsg at ffffffffb887d81a linux-sunxi#23 [ffffa6d14183fd38] ___sys_sendmsg at ffffffffb88806e2 linux-sunxi#24 [ffffa6d14183fe90] __sys_sendmsg at ffffffffb88807a2 linux-sunxi#25 [ffffa6d14183ff28] __x64_sys_sendmsg at ffffffffb888080f linux-sunxi#26 [ffffa6d14183ff38] do_syscall_64 at ffffffffb8b9b6a8 linux-sunxi#27 [ffffa6d14183ff50] entry_SYSCALL_64_after_hwframe at ffffffffb8c0007c crash> bt 0xffff8aeb07544000 PID: 1110766 TASK: ffff8aeb07544000 CPU: 0 COMMAND: "kworker/u20:9" #0 [ffffa6d14e6b7bd8] __schedule at ffffffffb8ba7f45 #1 [ffffa6d14e6b7c68] schedule at ffffffffb8ba8418 #2 [ffffa6d14e6b7c88] schedule_timeout at ffffffffb8baef88 #3 [ffffa6d14e6b7d10] wait_for_completion at ffffffffb8ba968b #4 [ffffa6d14e6b7d60] mlx5e_take_all_encap_flows at ffffffffc0f47ec4 [mlx5_core] #5 [ffffa6d14e6b7da0] mlx5e_rep_update_flows at ffffffffc0f3e734 [mlx5_core] #6 [ffffa6d14e6b7df8] mlx5e_rep_neigh_update at ffffffffc0f400bb [mlx5_core] linux-sunxi#7 [ffffa6d14e6b7e50] process_one_work at ffffffffb80acc9c linux-sunxi#8 [ffffa6d14e6b7ed0] worker_thread at ffffffffb80ad012 linux-sunxi#9 [ffffa6d14e6b7f10] kthread at ffffffffb80b615d linux-sunxi#10 [ffffa6d14e6b7f50] ret_from_fork at ffffffffb8001b2f After the first encap is attached, flow will be added to encap entry's flows list. If neigh update is running at this time, the following encaps of the flow can't hold the encap_tbl_lock and sleep. If neigh update thread is waiting for that flow's init_done, deadlock happens. Fix it by holding lock outside of the for loop. If neigh update is running, prevent encap flows from offloading. Since the lock is held outside of the for loop, concurrent creation of encap entries is not allowed. So remove unnecessary wait_for_completion call for res_ready. Fixes: 95435ad ("net/mlx5e: Only access fully initialized flows in neigh update") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Sebastian reports that commit 1c913a1 ("KVM: arm64: Iterate arm_pmus list to probe for default PMU") introduced the following splat with CONFIG_DEBUG_PREEMPT enabled: [70506.110187] BUG: using smp_processor_id() in preemptible [00000000] code: qemu-system-aar/3078242 [70506.119077] caller is debug_smp_processor_id+0x20/0x30 [70506.124229] CPU: 129 PID: 3078242 Comm: qemu-system-aar Tainted: G W 6.4.0-rc5 linux-sunxi#25 [70506.133176] Hardware name: GIGABYTE R181-T92-00/MT91-FS4-00, BIOS F34 08/13/2020 [70506.140559] Call trace: [70506.142993] dump_backtrace+0xa4/0x130 [70506.146737] show_stack+0x20/0x38 [70506.150040] dump_stack_lvl+0x48/0x60 [70506.153704] dump_stack+0x18/0x28 [70506.157007] check_preemption_disabled+0xe4/0x108 [70506.161701] debug_smp_processor_id+0x20/0x30 [70506.166046] kvm_arm_pmu_v3_set_attr+0x460/0x628 [70506.170662] kvm_arm_vcpu_arch_set_attr+0x88/0xd8 [70506.175363] kvm_arch_vcpu_ioctl+0x258/0x4a8 [70506.179632] kvm_vcpu_ioctl+0x32c/0x6b8 [70506.183465] __arm64_sys_ioctl+0xb4/0x100 [70506.187467] invoke_syscall+0x78/0x108 [70506.191205] el0_svc_common.constprop.0+0x4c/0x100 [70506.195984] do_el0_svc+0x34/0x50 [70506.199287] el0_svc+0x34/0x108 [70506.202416] el0t_64_sync_handler+0xf4/0x120 [70506.206674] el0t_64_sync+0x194/0x198 Fix the issue by using the raw variant that bypasses the debug assertion. While at it, stick all of the nuance and UAPI baggage into a comment for posterity. Fixes: 1c913a1 ("KVM: arm64: Iterate arm_pmus list to probe for default PMU") Reported-by: Sebastian Ott <sebott@redhat.com> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230606184814.456743-1-oliver.upton@linux.dev
The following processes run into a deadlock. CPU 41 was waiting for CPU 29 to handle a CSD request while holding spinlock "crashdump_lock", but CPU 29 was hung by that spinlock with IRQs disabled. PID: 17360 TASK: ffff95c1090c5c40 CPU: 41 COMMAND: "mrdiagd" !# 0 [ffffb80edbf37b58] __read_once_size at ffffffff9b871a40 include/linux/compiler.h:185:0 !# 1 [ffffb80edbf37b58] atomic_read at ffffffff9b871a40 arch/x86/include/asm/atomic.h:27:0 !# 2 [ffffb80edbf37b58] dump_stack at ffffffff9b871a40 lib/dump_stack.c:54:0 # 3 [ffffb80edbf37b78] csd_lock_wait_toolong at ffffffff9b131ad5 kernel/smp.c:364:0 # 4 [ffffb80edbf37b78] __csd_lock_wait at ffffffff9b131ad5 kernel/smp.c:384:0 # 5 [ffffb80edbf37bf8] csd_lock_wait at ffffffff9b13267a kernel/smp.c:394:0 # 6 [ffffb80edbf37bf8] smp_call_function_many at ffffffff9b13267a kernel/smp.c:843:0 # 7 [ffffb80edbf37c50] smp_call_function at ffffffff9b13279d kernel/smp.c:867:0 # 8 [ffffb80edbf37c50] on_each_cpu at ffffffff9b13279d kernel/smp.c:976:0 # 9 [ffffb80edbf37c78] flush_tlb_kernel_range at ffffffff9b085c4b arch/x86/mm/tlb.c:742:0 linux-sunxi#10 [ffffb80edbf37cb8] __purge_vmap_area_lazy at ffffffff9b23a1e0 mm/vmalloc.c:701:0 linux-sunxi#11 [ffffb80edbf37ce0] try_purge_vmap_area_lazy at ffffffff9b23a2cc mm/vmalloc.c:722:0 linux-sunxi#12 [ffffb80edbf37ce0] free_vmap_area_noflush at ffffffff9b23a2cc mm/vmalloc.c:754:0 linux-sunxi#13 [ffffb80edbf37cf8] free_unmap_vmap_area at ffffffff9b23bb3b mm/vmalloc.c:764:0 linux-sunxi#14 [ffffb80edbf37cf8] remove_vm_area at ffffffff9b23bb3b mm/vmalloc.c:1509:0 linux-sunxi#15 [ffffb80edbf37d18] __vunmap at ffffffff9b23bb8a mm/vmalloc.c:1537:0 linux-sunxi#16 [ffffb80edbf37d40] vfree at ffffffff9b23bc85 mm/vmalloc.c:1612:0 linux-sunxi#17 [ffffb80edbf37d58] megasas_free_host_crash_buffer [megaraid_sas] at ffffffffc020b7f2 drivers/scsi/megaraid/megaraid_sas_fusion.c:3932:0 linux-sunxi#18 [ffffb80edbf37d80] fw_crash_state_store [megaraid_sas] at ffffffffc01f804d drivers/scsi/megaraid/megaraid_sas_base.c:3291:0 linux-sunxi#19 [ffffb80edbf37dc0] dev_attr_store at ffffffff9b56dd7b drivers/base/core.c:758:0 linux-sunxi#20 [ffffb80edbf37dd0] sysfs_kf_write at ffffffff9b326acf fs/sysfs/file.c:144:0 linux-sunxi#21 [ffffb80edbf37de0] kernfs_fop_write at ffffffff9b325fd4 fs/kernfs/file.c:316:0 linux-sunxi#22 [ffffb80edbf37e20] __vfs_write at ffffffff9b29418a fs/read_write.c:480:0 linux-sunxi#23 [ffffb80edbf37ea8] vfs_write at ffffffff9b294462 fs/read_write.c:544:0 linux-sunxi#24 [ffffb80edbf37ee8] SYSC_write at ffffffff9b2946ec fs/read_write.c:590:0 linux-sunxi#25 [ffffb80edbf37ee8] SyS_write at ffffffff9b2946ec fs/read_write.c:582:0 linux-sunxi#26 [ffffb80edbf37f30] do_syscall_64 at ffffffff9b003ca9 arch/x86/entry/common.c:298:0 linux-sunxi#27 [ffffb80edbf37f58] entry_SYSCALL_64 at ffffffff9ba001b1 arch/x86/entry/entry_64.S:238:0 PID: 17355 TASK: ffff95c1090c3d80 CPU: 29 COMMAND: "mrdiagd" !# 0 [ffffb80f2d3c7d30] __read_once_size at ffffffff9b0f2ab0 include/linux/compiler.h:185:0 !# 1 [ffffb80f2d3c7d30] native_queued_spin_lock_slowpath at ffffffff9b0f2ab0 kernel/locking/qspinlock.c:368:0 # 2 [ffffb80f2d3c7d58] pv_queued_spin_lock_slowpath at ffffffff9b0f244b arch/x86/include/asm/paravirt.h:674:0 # 3 [ffffb80f2d3c7d58] queued_spin_lock_slowpath at ffffffff9b0f244b arch/x86/include/asm/qspinlock.h:53:0 # 4 [ffffb80f2d3c7d68] queued_spin_lock at ffffffff9b8961a6 include/asm-generic/qspinlock.h:90:0 # 5 [ffffb80f2d3c7d68] do_raw_spin_lock_flags at ffffffff9b8961a6 include/linux/spinlock.h:173:0 # 6 [ffffb80f2d3c7d68] __raw_spin_lock_irqsave at ffffffff9b8961a6 include/linux/spinlock_api_smp.h:122:0 # 7 [ffffb80f2d3c7d68] _raw_spin_lock_irqsave at ffffffff9b8961a6 kernel/locking/spinlock.c:160:0 # 8 [ffffb80f2d3c7d88] fw_crash_buffer_store [megaraid_sas] at ffffffffc01f8129 drivers/scsi/megaraid/megaraid_sas_base.c:3205:0 # 9 [ffffb80f2d3c7dc0] dev_attr_store at ffffffff9b56dd7b drivers/base/core.c:758:0 linux-sunxi#10 [ffffb80f2d3c7dd0] sysfs_kf_write at ffffffff9b326acf fs/sysfs/file.c:144:0 linux-sunxi#11 [ffffb80f2d3c7de0] kernfs_fop_write at ffffffff9b325fd4 fs/kernfs/file.c:316:0 linux-sunxi#12 [ffffb80f2d3c7e20] __vfs_write at ffffffff9b29418a fs/read_write.c:480:0 linux-sunxi#13 [ffffb80f2d3c7ea8] vfs_write at ffffffff9b294462 fs/read_write.c:544:0 linux-sunxi#14 [ffffb80f2d3c7ee8] SYSC_write at ffffffff9b2946ec fs/read_write.c:590:0 linux-sunxi#15 [ffffb80f2d3c7ee8] SyS_write at ffffffff9b2946ec fs/read_write.c:582:0 linux-sunxi#16 [ffffb80f2d3c7f30] do_syscall_64 at ffffffff9b003ca9 arch/x86/entry/common.c:298:0 linux-sunxi#17 [ffffb80f2d3c7f58] entry_SYSCALL_64 at ffffffff9ba001b1 arch/x86/entry/entry_64.S:238:0 The lock is used to synchronize different sysfs operations, it doesn't protect any resource that will be touched by an interrupt. Consequently it's not required to disable IRQs. Replace the spinlock with a mutex to fix the deadlock. Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Link: https://lore.kernel.org/r/20230828221018.19471-1-junxiao.bi@oracle.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
[ Upstream commit 37c3b9f ] The cited commit adds a compeletion to remove dependency on rtnl lock. But it causes a deadlock for multiple encapsulations: crash> bt ffff8aece8a64000 PID: 1514557 TASK: ffff8aece8a64000 CPU: 3 COMMAND: "tc" #0 [ffffa6d14183f368] __schedule at ffffffffb8ba7f45 jwrdegoede#1 [ffffa6d14183f3f8] schedule at ffffffffb8ba8418 jwrdegoede#2 [ffffa6d14183f418] schedule_preempt_disabled at ffffffffb8ba8898 jwrdegoede#3 [ffffa6d14183f428] __mutex_lock at ffffffffb8baa7f8 jwrdegoede#4 [ffffa6d14183f4d0] mutex_lock_nested at ffffffffb8baabeb jwrdegoede#5 [ffffa6d14183f4e0] mlx5e_attach_encap at ffffffffc0f48c17 [mlx5_core] jwrdegoede#6 [ffffa6d14183f628] mlx5e_tc_add_fdb_flow at ffffffffc0f39680 [mlx5_core] linux-sunxi#7 [ffffa6d14183f688] __mlx5e_add_fdb_flow at ffffffffc0f3b636 [mlx5_core] linux-sunxi#8 [ffffa6d14183f6f0] mlx5e_tc_add_flow at ffffffffc0f3bcdf [mlx5_core] linux-sunxi#9 [ffffa6d14183f728] mlx5e_configure_flower at ffffffffc0f3c1d1 [mlx5_core] linux-sunxi#10 [ffffa6d14183f790] mlx5e_rep_setup_tc_cls_flower at ffffffffc0f3d529 [mlx5_core] linux-sunxi#11 [ffffa6d14183f7a0] mlx5e_rep_setup_tc_cb at ffffffffc0f3d714 [mlx5_core] linux-sunxi#12 [ffffa6d14183f7b0] tc_setup_cb_add at ffffffffb8931bb8 linux-sunxi#13 [ffffa6d14183f810] fl_hw_replace_filter at ffffffffc0dae901 [cls_flower] linux-sunxi#14 [ffffa6d14183f8d8] fl_change at ffffffffc0db5c57 [cls_flower] linux-sunxi#15 [ffffa6d14183f970] tc_new_tfilter at ffffffffb8936047 linux-sunxi#16 [ffffa6d14183fac8] rtnetlink_rcv_msg at ffffffffb88c7c31 linux-sunxi#17 [ffffa6d14183fb50] netlink_rcv_skb at ffffffffb8942853 linux-sunxi#18 [ffffa6d14183fbc0] rtnetlink_rcv at ffffffffb88c1835 linux-sunxi#19 [ffffa6d14183fbd0] netlink_unicast at ffffffffb8941f27 linux-sunxi#20 [ffffa6d14183fc18] netlink_sendmsg at ffffffffb8942245 linux-sunxi#21 [ffffa6d14183fc98] sock_sendmsg at ffffffffb887d482 linux-sunxi#22 [ffffa6d14183fcb8] ____sys_sendmsg at ffffffffb887d81a linux-sunxi#23 [ffffa6d14183fd38] ___sys_sendmsg at ffffffffb88806e2 linux-sunxi#24 [ffffa6d14183fe90] __sys_sendmsg at ffffffffb88807a2 linux-sunxi#25 [ffffa6d14183ff28] __x64_sys_sendmsg at ffffffffb888080f linux-sunxi#26 [ffffa6d14183ff38] do_syscall_64 at ffffffffb8b9b6a8 linux-sunxi#27 [ffffa6d14183ff50] entry_SYSCALL_64_after_hwframe at ffffffffb8c0007c crash> bt 0xffff8aeb07544000 PID: 1110766 TASK: ffff8aeb07544000 CPU: 0 COMMAND: "kworker/u20:9" #0 [ffffa6d14e6b7bd8] __schedule at ffffffffb8ba7f45 jwrdegoede#1 [ffffa6d14e6b7c68] schedule at ffffffffb8ba8418 jwrdegoede#2 [ffffa6d14e6b7c88] schedule_timeout at ffffffffb8baef88 jwrdegoede#3 [ffffa6d14e6b7d10] wait_for_completion at ffffffffb8ba968b jwrdegoede#4 [ffffa6d14e6b7d60] mlx5e_take_all_encap_flows at ffffffffc0f47ec4 [mlx5_core] jwrdegoede#5 [ffffa6d14e6b7da0] mlx5e_rep_update_flows at ffffffffc0f3e734 [mlx5_core] jwrdegoede#6 [ffffa6d14e6b7df8] mlx5e_rep_neigh_update at ffffffffc0f400bb [mlx5_core] linux-sunxi#7 [ffffa6d14e6b7e50] process_one_work at ffffffffb80acc9c linux-sunxi#8 [ffffa6d14e6b7ed0] worker_thread at ffffffffb80ad012 linux-sunxi#9 [ffffa6d14e6b7f10] kthread at ffffffffb80b615d linux-sunxi#10 [ffffa6d14e6b7f50] ret_from_fork at ffffffffb8001b2f After the first encap is attached, flow will be added to encap entry's flows list. If neigh update is running at this time, the following encaps of the flow can't hold the encap_tbl_lock and sleep. If neigh update thread is waiting for that flow's init_done, deadlock happens. Fix it by holding lock outside of the for loop. If neigh update is running, prevent encap flows from offloading. Since the lock is held outside of the for loop, concurrent creation of encap entries is not allowed. So remove unnecessary wait_for_completion call for res_ready. Fixes: 95435ad ("net/mlx5e: Only access fully initialized flows in neigh update") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
KMSAN reported uninit-value access in __ip_make_skb() [1]. __ip_make_skb() tests HDRINCL to know if the skb has icmphdr. However, HDRINCL can cause a race condition. If calling setsockopt(2) with IP_HDRINCL changes HDRINCL while __ip_make_skb() is running, the function will access icmphdr in the skb even if it is not included. This causes the issue reported by KMSAN. Check FLOWI_FLAG_KNOWN_NH on fl4->flowi4_flags instead of testing HDRINCL on the socket. Also, fl4->fl4_icmp_type and fl4->fl4_icmp_code are not initialized. These are union in struct flowi4 and are implicitly initialized by flowi4_init_output(), but we should not rely on specific union layout. Initialize these explicitly in raw_sendmsg(). [1] BUG: KMSAN: uninit-value in __ip_make_skb+0x2b74/0x2d20 net/ipv4/ip_output.c:1481 __ip_make_skb+0x2b74/0x2d20 net/ipv4/ip_output.c:1481 ip_finish_skb include/net/ip.h:243 [inline] ip_push_pending_frames+0x4c/0x5c0 net/ipv4/ip_output.c:1508 raw_sendmsg+0x2381/0x2690 net/ipv4/raw.c:654 inet_sendmsg+0x27b/0x2a0 net/ipv4/af_inet.c:851 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x274/0x3c0 net/socket.c:745 __sys_sendto+0x62c/0x7b0 net/socket.c:2191 __do_sys_sendto net/socket.c:2203 [inline] __se_sys_sendto net/socket.c:2199 [inline] __x64_sys_sendto+0x130/0x200 net/socket.c:2199 do_syscall_64+0xd8/0x1f0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x6d/0x75 Uninit was created at: slab_post_alloc_hook mm/slub.c:3804 [inline] slab_alloc_node mm/slub.c:3845 [inline] kmem_cache_alloc_node+0x5f6/0xc50 mm/slub.c:3888 kmalloc_reserve+0x13c/0x4a0 net/core/skbuff.c:577 __alloc_skb+0x35a/0x7c0 net/core/skbuff.c:668 alloc_skb include/linux/skbuff.h:1318 [inline] __ip_append_data+0x49ab/0x68c0 net/ipv4/ip_output.c:1128 ip_append_data+0x1e7/0x260 net/ipv4/ip_output.c:1365 raw_sendmsg+0x22b1/0x2690 net/ipv4/raw.c:648 inet_sendmsg+0x27b/0x2a0 net/ipv4/af_inet.c:851 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x274/0x3c0 net/socket.c:745 __sys_sendto+0x62c/0x7b0 net/socket.c:2191 __do_sys_sendto net/socket.c:2203 [inline] __se_sys_sendto net/socket.c:2199 [inline] __x64_sys_sendto+0x130/0x200 net/socket.c:2199 do_syscall_64+0xd8/0x1f0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x6d/0x75 CPU: 1 PID: 15709 Comm: syz-executor.7 Not tainted 6.8.0-11567-gb3603fcb79b1 linux-sunxi#25 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-1.fc39 04/01/2014 Fixes: 99e5aca ("ipv4: Fix potential uninit variable access bug in __ip_make_skb()") Reported-by: syzkaller <syzkaller@googlegroups.com> Signed-off-by: Shigeru Yoshida <syoshida@redhat.com> Link: https://lore.kernel.org/r/20240430123945.2057348-1-syoshida@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
[ Upstream commit fc1092f ] KMSAN reported uninit-value access in __ip_make_skb() [1]. __ip_make_skb() tests HDRINCL to know if the skb has icmphdr. However, HDRINCL can cause a race condition. If calling setsockopt(2) with IP_HDRINCL changes HDRINCL while __ip_make_skb() is running, the function will access icmphdr in the skb even if it is not included. This causes the issue reported by KMSAN. Check FLOWI_FLAG_KNOWN_NH on fl4->flowi4_flags instead of testing HDRINCL on the socket. Also, fl4->fl4_icmp_type and fl4->fl4_icmp_code are not initialized. These are union in struct flowi4 and are implicitly initialized by flowi4_init_output(), but we should not rely on specific union layout. Initialize these explicitly in raw_sendmsg(). [1] BUG: KMSAN: uninit-value in __ip_make_skb+0x2b74/0x2d20 net/ipv4/ip_output.c:1481 __ip_make_skb+0x2b74/0x2d20 net/ipv4/ip_output.c:1481 ip_finish_skb include/net/ip.h:243 [inline] ip_push_pending_frames+0x4c/0x5c0 net/ipv4/ip_output.c:1508 raw_sendmsg+0x2381/0x2690 net/ipv4/raw.c:654 inet_sendmsg+0x27b/0x2a0 net/ipv4/af_inet.c:851 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x274/0x3c0 net/socket.c:745 __sys_sendto+0x62c/0x7b0 net/socket.c:2191 __do_sys_sendto net/socket.c:2203 [inline] __se_sys_sendto net/socket.c:2199 [inline] __x64_sys_sendto+0x130/0x200 net/socket.c:2199 do_syscall_64+0xd8/0x1f0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x6d/0x75 Uninit was created at: slab_post_alloc_hook mm/slub.c:3804 [inline] slab_alloc_node mm/slub.c:3845 [inline] kmem_cache_alloc_node+0x5f6/0xc50 mm/slub.c:3888 kmalloc_reserve+0x13c/0x4a0 net/core/skbuff.c:577 __alloc_skb+0x35a/0x7c0 net/core/skbuff.c:668 alloc_skb include/linux/skbuff.h:1318 [inline] __ip_append_data+0x49ab/0x68c0 net/ipv4/ip_output.c:1128 ip_append_data+0x1e7/0x260 net/ipv4/ip_output.c:1365 raw_sendmsg+0x22b1/0x2690 net/ipv4/raw.c:648 inet_sendmsg+0x27b/0x2a0 net/ipv4/af_inet.c:851 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x274/0x3c0 net/socket.c:745 __sys_sendto+0x62c/0x7b0 net/socket.c:2191 __do_sys_sendto net/socket.c:2203 [inline] __se_sys_sendto net/socket.c:2199 [inline] __x64_sys_sendto+0x130/0x200 net/socket.c:2199 do_syscall_64+0xd8/0x1f0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x6d/0x75 CPU: 1 PID: 15709 Comm: syz-executor.7 Not tainted 6.8.0-11567-gb3603fcb79b1 linux-sunxi#25 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-1.fc39 04/01/2014 Fixes: 99e5aca ("ipv4: Fix potential uninit variable access bug in __ip_make_skb()") Reported-by: syzkaller <syzkaller@googlegroups.com> Signed-off-by: Shigeru Yoshida <syoshida@redhat.com> Link: https://lore.kernel.org/r/20240430123945.2057348-1-syoshida@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Previously I used kernel 3.0.8 and modules released with the Ubuntu 10.04 image and could enable VGA output with the following program
int main(int argc, char const *argv[])
{
}
I've build 3.0.31 (allwinner-v3.0-android-v2) today, replaced uImage and the modules. And the program above does not seem to work anymore.
I've also got a lot of "init: Failed to create pty - disabling logging for job" messages that did not appear before. So maybe we need to enable something in the kernel that is not enabled by default.
The text was updated successfully, but these errors were encountered: