New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mt6625l: some fixes for R2 internal WiFi driver #65
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good so far...have you tested function as builtin and module?
Re normalizing tags - sure, will do. |
Signed-off-by: Alexey Loukianov <lx2@lexa2.ru>
Signed-off-by: Alexey Loukianov <lx2@lexa2.ru>
Original-Author: Nikolay Amiantov <ab@fmap.me> Signed-off-by: Alexey Loukianov <lx2@lexa2.ru>
cc4a00c
to
880f164
Compare
Normalization: done. |
Signed-off-by: Alexey Loukianov <lx2@lexa2.ru>
Signed-off-by: Alexey Loukianov <lx2@lexa2.ru>
880f164
to
c444512
Compare
Frank, I had updated this PR to match the current state of your 5.4-wifi branch. Please check if I had addressed all points from your comments and feel free to merge this one if everything seems to be more or less okay. Thanks. |
Hi |
Hadn't had any luck with tracing potentional memory corruption issue when built as a module - last time I was testing wifi as a module board had been working fine with 6 days uptime without any issues. So it seems to be hard to track intermittent issue. |
Signed-off-by: Alexey Loukianov <lx2@lexa2.ru>
already merged to 5.4-main, thank you ;) |
After previous fix for zero extension test_verifier tests #65 and #66 now fail. Before the fix we can see the alu32 mov op at insn 10 10: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=invP(id=0, smin_value=4294967168,smax_value=4294967423, umin_value=4294967168,umax_value=4294967423, var_off=(0x0; 0x1ffffffff), s32_min_value=-2147483648,s32_max_value=2147483647, u32_min_value=0,u32_max_value=-1) R10=fp0 fp-8_w=mmmmmmmm 10: (bc) w1 = w1 11: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=invP(id=0, smin_value=0,smax_value=2147483647, umin_value=0,umax_value=4294967295, var_off=(0x0; 0xffffffff), s32_min_value=-2147483648,s32_max_value=2147483647, u32_min_value=0,u32_max_value=-1) R10=fp0 fp-8_w=mmmmmmmm After the fix at insn 10 because we have 's32_min_value < 0' the following step 11 now has 'smax_value=U32_MAX' where before we pulled the s32_max_value bound into the smax_value as seen above in 11 with smax_value=2147483647. 10: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=inv(id=0, smin_value=4294967168,smax_value=4294967423, umin_value=4294967168,umax_value=4294967423, var_off=(0x0; 0x1ffffffff), s32_min_value=-2147483648, s32_max_value=2147483647, u32_min_value=0,u32_max_value=-1) R10=fp0 fp-8_w=mmmmmmmm 10: (bc) w1 = w1 11: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=inv(id=0, smin_value=0,smax_value=4294967295, umin_value=0,umax_value=4294967295, var_off=(0x0; 0xffffffff), s32_min_value=-2147483648, s32_max_value=2147483647, u32_min_value=0, u32_max_value=-1) R10=fp0 fp-8_w=mmmmmmmm The fall out of this is by the time we get to the failing instruction at step 14 where previously we had the following: 14: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=inv(id=0, smin_value=72057594021150720,smax_value=72057594029539328, umin_value=72057594021150720,umax_value=72057594029539328, var_off=(0xffffffff000000; 0xffffff), s32_min_value=-16777216,s32_max_value=-1, u32_min_value=-16777216,u32_max_value=-1) R10=fp0 fp-8_w=mmmmmmmm 14: (0f) r0 += r1 We now have, 14: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=inv(id=0, smin_value=0,smax_value=72057594037927935, umin_value=0,umax_value=72057594037927935, var_off=(0x0; 0xffffffffffffff), s32_min_value=-2147483648,s32_max_value=2147483647, u32_min_value=0,u32_max_value=-1) R10=fp0 fp-8_w=mmmmmmmm 14: (0f) r0 += r1 In the original step 14 'smin_value=72057594021150720' this trips the logic in the verifier function check_reg_sane_offset(), if (smin >= BPF_MAX_VAR_OFF || smin <= -BPF_MAX_VAR_OFF) { verbose(env, "value %lld makes %s pointer be out of bounds\n", smin, reg_type_str[type]); return false; } Specifically, the 'smin <= -BPF_MAX_VAR_OFF' check. But with the fix at step 14 we have bounds 'smin_value=0' so the above check is not tripped because BPF_MAX_VAR_OFF=1<<29. We have a smin_value=0 here because at step 10 the smaller smin_value=0 means the subtractions at steps 11 and 12 bring the smin_value negative. 11: (17) r1 -= 2147483584 12: (17) r1 -= 2147483584 13: (77) r1 >>= 8 Then the shift clears the top bit and smin_value is set to 0. Note we still have the smax_value in the fixed code so any reads will fail. An alternative would be to have reg_sane_check() do both smin and smax value tests. To fix the test we can omit the 'r1 >>=8' at line 13. This will change the err string, but keeps the intention of the test as suggseted by the title, "check after truncation of boundary-crossing range". If the verifier logic changes a different value is likely to be thrown in the error or the error will no longer be thrown forcing this test to be examined. With this change we see the new state at step 13. 13: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=invP(id=0, smin_value=-4294967168,smax_value=127, umin_value=0,umax_value=18446744073709551615, s32_min_value=-2147483648,s32_max_value=2147483647, u32_min_value=0,u32_max_value=-1) R10=fp0 fp-8_w=mmmmmmmm Giving the expected out of bounds error, "value -4294967168 makes map_value pointer be out of bounds" However, for unpriv case we see a different error now because of the mixed signed bounds pointer arithmatic. This seems OK so I've only added the unpriv_errstr for this. Another optino may have been to do addition on r1 instead of subtraction but I favor the approach above slightly. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/159077333942.6014.14004320043595756079.stgit@john-Precision-5820-Tower
commit b094961 upstream. ncsi_rsp_handler_gc() allocates the filter arrays using GFP_KERNEL in softirq context, causing the below backtrace. This allocation is only a few dozen bytes during probing so allocate with GFP_ATOMIC instead. [ 42.813372] BUG: sleeping function called from invalid context at mm/slab.h:416 [ 42.820900] in_atomic(): 1, irqs_disabled(): 0, pid: 213, name: kworker/0:1 [ 42.827893] INFO: lockdep is turned off. [ 42.832023] CPU: 0 PID: 213 Comm: kworker/0:1 Tainted: G W 4.13.16-01441-gad99b38 #65 [ 42.841007] Hardware name: Generic DT based system [ 42.845966] Workqueue: events ncsi_dev_work [ 42.850251] [<8010a494>] (unwind_backtrace) from [<80107510>] (show_stack+0x20/0x24) [ 42.858046] [<80107510>] (show_stack) from [<80612770>] (dump_stack+0x20/0x28) [ 42.865309] [<80612770>] (dump_stack) from [<80148248>] (___might_sleep+0x230/0x2b0) [ 42.873241] [<80148248>] (___might_sleep) from [<80148334>] (__might_sleep+0x6c/0xac) [ 42.881129] [<80148334>] (__might_sleep) from [<80240d6c>] (__kmalloc+0x210/0x2fc) [ 42.888737] [<80240d6c>] (__kmalloc) from [<8060ad54>] (ncsi_rsp_handler_gc+0xd0/0x170) [ 42.896770] [<8060ad54>] (ncsi_rsp_handler_gc) from [<8060b454>] (ncsi_rcv_rsp+0x16c/0x1d4) [ 42.905314] [<8060b454>] (ncsi_rcv_rsp) from [<804d86c8>] (__netif_receive_skb_core+0x3c8/0xb50) [ 42.914158] [<804d86c8>] (__netif_receive_skb_core) from [<804d96cc>] (__netif_receive_skb+0x20/0x7c) [ 42.923420] [<804d96cc>] (__netif_receive_skb) from [<804de4b0>] (netif_receive_skb_internal+0x78/0x6a4) [ 42.932931] [<804de4b0>] (netif_receive_skb_internal) from [<804df980>] (netif_receive_skb+0x78/0x158) [ 42.942292] [<804df980>] (netif_receive_skb) from [<8042f204>] (ftgmac100_poll+0x43c/0x4e8) [ 42.950855] [<8042f204>] (ftgmac100_poll) from [<804e094c>] (net_rx_action+0x278/0x4c4) [ 42.958918] [<804e094c>] (net_rx_action) from [<801016a8>] (__do_softirq+0xe0/0x4c4) [ 42.966716] [<801016a8>] (__do_softirq) from [<8011cd9c>] (do_softirq.part.4+0x50/0x78) [ 42.974756] [<8011cd9c>] (do_softirq.part.4) from [<8011cebc>] (__local_bh_enable_ip+0xf8/0x11c) [ 42.983579] [<8011cebc>] (__local_bh_enable_ip) from [<804dde08>] (__dev_queue_xmit+0x260/0x890) [ 42.992392] [<804dde08>] (__dev_queue_xmit) from [<804df1f0>] (dev_queue_xmit+0x1c/0x20) [ 43.000689] [<804df1f0>] (dev_queue_xmit) from [<806099c0>] (ncsi_xmit_cmd+0x1c0/0x244) [ 43.008763] [<806099c0>] (ncsi_xmit_cmd) from [<8060dc14>] (ncsi_dev_work+0x2e0/0x4c8) [ 43.016725] [<8060dc14>] (ncsi_dev_work) from [<80133dfc>] (process_one_work+0x214/0x6f8) [ 43.024940] [<80133dfc>] (process_one_work) from [<80134328>] (worker_thread+0x48/0x558) [ 43.033070] [<80134328>] (worker_thread) from [<8013ba80>] (kthread+0x130/0x174) [ 43.040506] [<8013ba80>] (kthread) from [<80102950>] (ret_from_fork+0x14/0x24) Fixes: 062b3e1 ("net/ncsi: Refactor MAC, VLAN filters") Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
…fixes WARNING: line over 80 characters #65: FILE: mm/hugetlb.c:4469: + * an active hugepage in pagecache. This goto expects the 2nd page fault, total: 0 errors, 1 warnings, 48 lines checked NOTE: For some of the reported defects, checkpatch may be able to mechanically convert to the typical style using --fix or --fix-inplace. ./patches/mm-hugetlb-fix-a-typo-in-comment-manitained-maintained-v2.patch has style problems, please review. NOTE: If any of the errors are false positives, please report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Cc: Ethon Paul <ethp@qq.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
[ Upstream commit 230db82 ] When a console stack dump is initiated with CONFIG_GCOV_PROFILE_ALL enabled, show_trace_log_lvl() gets out of sync with the ORC unwinder, causing the stack trace to show all text addresses as unreliable: # echo l > /proc/sysrq-trigger [ 477.521031] sysrq: Show backtrace of all active CPUs [ 477.523813] NMI backtrace for cpu 0 [ 477.524492] CPU: 0 PID: 1021 Comm: bash Not tainted 6.0.0 #65 [ 477.525295] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-1.fc36 04/01/2014 [ 477.526439] Call Trace: [ 477.526854] <TASK> [ 477.527216] ? dump_stack_lvl+0xc7/0x114 [ 477.527801] ? dump_stack+0x13/0x1f [ 477.528331] ? nmi_cpu_backtrace.cold+0xb5/0x10d [ 477.528998] ? lapic_can_unplug_cpu+0xa0/0xa0 [ 477.529641] ? nmi_trigger_cpumask_backtrace+0x16a/0x1f0 [ 477.530393] ? arch_trigger_cpumask_backtrace+0x1d/0x30 [ 477.531136] ? sysrq_handle_showallcpus+0x1b/0x30 [ 477.531818] ? __handle_sysrq.cold+0x4e/0x1ae [ 477.532451] ? write_sysrq_trigger+0x63/0x80 [ 477.533080] ? proc_reg_write+0x92/0x110 [ 477.533663] ? vfs_write+0x174/0x530 [ 477.534265] ? handle_mm_fault+0x16f/0x500 [ 477.534940] ? ksys_write+0x7b/0x170 [ 477.535543] ? __x64_sys_write+0x1d/0x30 [ 477.536191] ? do_syscall_64+0x6b/0x100 [ 477.536809] ? entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 477.537609] </TASK> This happens when the compiled code for show_stack() has a single word on the stack, and doesn't use a tail call to show_stack_log_lvl(). (CONFIG_GCOV_PROFILE_ALL=y is the only known case of this.) Then the __unwind_start() skip logic hits an off-by-one bug and fails to unwind all the way to the intended starting frame. Fix it by reverting the following commit: f1d9a2a ("x86/unwind/orc: Don't skip the first frame for inactive tasks") The original justification for that commit no longer exists. That original issue was later fixed in a different way, with the following commit: f2ac57a ("x86/unwind/orc: Fix inactive tasks with stack pointer in %sp on GCC 10 compiled kernels") Fixes: f1d9a2a ("x86/unwind/orc: Don't skip the first frame for inactive tasks") Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com> [jpoimboe: rewrite commit log] Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 230db82 ] When a console stack dump is initiated with CONFIG_GCOV_PROFILE_ALL enabled, show_trace_log_lvl() gets out of sync with the ORC unwinder, causing the stack trace to show all text addresses as unreliable: # echo l > /proc/sysrq-trigger [ 477.521031] sysrq: Show backtrace of all active CPUs [ 477.523813] NMI backtrace for cpu 0 [ 477.524492] CPU: 0 PID: 1021 Comm: bash Not tainted 6.0.0 #65 [ 477.525295] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-1.fc36 04/01/2014 [ 477.526439] Call Trace: [ 477.526854] <TASK> [ 477.527216] ? dump_stack_lvl+0xc7/0x114 [ 477.527801] ? dump_stack+0x13/0x1f [ 477.528331] ? nmi_cpu_backtrace.cold+0xb5/0x10d [ 477.528998] ? lapic_can_unplug_cpu+0xa0/0xa0 [ 477.529641] ? nmi_trigger_cpumask_backtrace+0x16a/0x1f0 [ 477.530393] ? arch_trigger_cpumask_backtrace+0x1d/0x30 [ 477.531136] ? sysrq_handle_showallcpus+0x1b/0x30 [ 477.531818] ? __handle_sysrq.cold+0x4e/0x1ae [ 477.532451] ? write_sysrq_trigger+0x63/0x80 [ 477.533080] ? proc_reg_write+0x92/0x110 [ 477.533663] ? vfs_write+0x174/0x530 [ 477.534265] ? handle_mm_fault+0x16f/0x500 [ 477.534940] ? ksys_write+0x7b/0x170 [ 477.535543] ? __x64_sys_write+0x1d/0x30 [ 477.536191] ? do_syscall_64+0x6b/0x100 [ 477.536809] ? entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 477.537609] </TASK> This happens when the compiled code for show_stack() has a single word on the stack, and doesn't use a tail call to show_stack_log_lvl(). (CONFIG_GCOV_PROFILE_ALL=y is the only known case of this.) Then the __unwind_start() skip logic hits an off-by-one bug and fails to unwind all the way to the intended starting frame. Fix it by reverting the following commit: f1d9a2a ("x86/unwind/orc: Don't skip the first frame for inactive tasks") The original justification for that commit no longer exists. That original issue was later fixed in a different way, with the following commit: f2ac57a ("x86/unwind/orc: Fix inactive tasks with stack pointer in %sp on GCC 10 compiled kernels") Fixes: f1d9a2a ("x86/unwind/orc: Don't skip the first frame for inactive tasks") Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com> [jpoimboe: rewrite commit log] Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 230db82 ] When a console stack dump is initiated with CONFIG_GCOV_PROFILE_ALL enabled, show_trace_log_lvl() gets out of sync with the ORC unwinder, causing the stack trace to show all text addresses as unreliable: # echo l > /proc/sysrq-trigger [ 477.521031] sysrq: Show backtrace of all active CPUs [ 477.523813] NMI backtrace for cpu 0 [ 477.524492] CPU: 0 PID: 1021 Comm: bash Not tainted 6.0.0 #65 [ 477.525295] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-1.fc36 04/01/2014 [ 477.526439] Call Trace: [ 477.526854] <TASK> [ 477.527216] ? dump_stack_lvl+0xc7/0x114 [ 477.527801] ? dump_stack+0x13/0x1f [ 477.528331] ? nmi_cpu_backtrace.cold+0xb5/0x10d [ 477.528998] ? lapic_can_unplug_cpu+0xa0/0xa0 [ 477.529641] ? nmi_trigger_cpumask_backtrace+0x16a/0x1f0 [ 477.530393] ? arch_trigger_cpumask_backtrace+0x1d/0x30 [ 477.531136] ? sysrq_handle_showallcpus+0x1b/0x30 [ 477.531818] ? __handle_sysrq.cold+0x4e/0x1ae [ 477.532451] ? write_sysrq_trigger+0x63/0x80 [ 477.533080] ? proc_reg_write+0x92/0x110 [ 477.533663] ? vfs_write+0x174/0x530 [ 477.534265] ? handle_mm_fault+0x16f/0x500 [ 477.534940] ? ksys_write+0x7b/0x170 [ 477.535543] ? __x64_sys_write+0x1d/0x30 [ 477.536191] ? do_syscall_64+0x6b/0x100 [ 477.536809] ? entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 477.537609] </TASK> This happens when the compiled code for show_stack() has a single word on the stack, and doesn't use a tail call to show_stack_log_lvl(). (CONFIG_GCOV_PROFILE_ALL=y is the only known case of this.) Then the __unwind_start() skip logic hits an off-by-one bug and fails to unwind all the way to the intended starting frame. Fix it by reverting the following commit: f1d9a2a ("x86/unwind/orc: Don't skip the first frame for inactive tasks") The original justification for that commit no longer exists. That original issue was later fixed in a different way, with the following commit: f2ac57a ("x86/unwind/orc: Fix inactive tasks with stack pointer in %sp on GCC 10 compiled kernels") Fixes: f1d9a2a ("x86/unwind/orc: Don't skip the first frame for inactive tasks") Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com> [jpoimboe: rewrite commit log] Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 230db82 ] When a console stack dump is initiated with CONFIG_GCOV_PROFILE_ALL enabled, show_trace_log_lvl() gets out of sync with the ORC unwinder, causing the stack trace to show all text addresses as unreliable: # echo l > /proc/sysrq-trigger [ 477.521031] sysrq: Show backtrace of all active CPUs [ 477.523813] NMI backtrace for cpu 0 [ 477.524492] CPU: 0 PID: 1021 Comm: bash Not tainted 6.0.0 #65 [ 477.525295] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-1.fc36 04/01/2014 [ 477.526439] Call Trace: [ 477.526854] <TASK> [ 477.527216] ? dump_stack_lvl+0xc7/0x114 [ 477.527801] ? dump_stack+0x13/0x1f [ 477.528331] ? nmi_cpu_backtrace.cold+0xb5/0x10d [ 477.528998] ? lapic_can_unplug_cpu+0xa0/0xa0 [ 477.529641] ? nmi_trigger_cpumask_backtrace+0x16a/0x1f0 [ 477.530393] ? arch_trigger_cpumask_backtrace+0x1d/0x30 [ 477.531136] ? sysrq_handle_showallcpus+0x1b/0x30 [ 477.531818] ? __handle_sysrq.cold+0x4e/0x1ae [ 477.532451] ? write_sysrq_trigger+0x63/0x80 [ 477.533080] ? proc_reg_write+0x92/0x110 [ 477.533663] ? vfs_write+0x174/0x530 [ 477.534265] ? handle_mm_fault+0x16f/0x500 [ 477.534940] ? ksys_write+0x7b/0x170 [ 477.535543] ? __x64_sys_write+0x1d/0x30 [ 477.536191] ? do_syscall_64+0x6b/0x100 [ 477.536809] ? entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 477.537609] </TASK> This happens when the compiled code for show_stack() has a single word on the stack, and doesn't use a tail call to show_stack_log_lvl(). (CONFIG_GCOV_PROFILE_ALL=y is the only known case of this.) Then the __unwind_start() skip logic hits an off-by-one bug and fails to unwind all the way to the intended starting frame. Fix it by reverting the following commit: f1d9a2a ("x86/unwind/orc: Don't skip the first frame for inactive tasks") The original justification for that commit no longer exists. That original issue was later fixed in a different way, with the following commit: f2ac57a ("x86/unwind/orc: Fix inactive tasks with stack pointer in %sp on GCC 10 compiled kernels") Fixes: f1d9a2a ("x86/unwind/orc: Don't skip the first frame for inactive tasks") Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com> [jpoimboe: rewrite commit log] Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 230db82 ] When a console stack dump is initiated with CONFIG_GCOV_PROFILE_ALL enabled, show_trace_log_lvl() gets out of sync with the ORC unwinder, causing the stack trace to show all text addresses as unreliable: # echo l > /proc/sysrq-trigger [ 477.521031] sysrq: Show backtrace of all active CPUs [ 477.523813] NMI backtrace for cpu 0 [ 477.524492] CPU: 0 PID: 1021 Comm: bash Not tainted 6.0.0 #65 [ 477.525295] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-1.fc36 04/01/2014 [ 477.526439] Call Trace: [ 477.526854] <TASK> [ 477.527216] ? dump_stack_lvl+0xc7/0x114 [ 477.527801] ? dump_stack+0x13/0x1f [ 477.528331] ? nmi_cpu_backtrace.cold+0xb5/0x10d [ 477.528998] ? lapic_can_unplug_cpu+0xa0/0xa0 [ 477.529641] ? nmi_trigger_cpumask_backtrace+0x16a/0x1f0 [ 477.530393] ? arch_trigger_cpumask_backtrace+0x1d/0x30 [ 477.531136] ? sysrq_handle_showallcpus+0x1b/0x30 [ 477.531818] ? __handle_sysrq.cold+0x4e/0x1ae [ 477.532451] ? write_sysrq_trigger+0x63/0x80 [ 477.533080] ? proc_reg_write+0x92/0x110 [ 477.533663] ? vfs_write+0x174/0x530 [ 477.534265] ? handle_mm_fault+0x16f/0x500 [ 477.534940] ? ksys_write+0x7b/0x170 [ 477.535543] ? __x64_sys_write+0x1d/0x30 [ 477.536191] ? do_syscall_64+0x6b/0x100 [ 477.536809] ? entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 477.537609] </TASK> This happens when the compiled code for show_stack() has a single word on the stack, and doesn't use a tail call to show_stack_log_lvl(). (CONFIG_GCOV_PROFILE_ALL=y is the only known case of this.) Then the __unwind_start() skip logic hits an off-by-one bug and fails to unwind all the way to the intended starting frame. Fix it by reverting the following commit: f1d9a2a ("x86/unwind/orc: Don't skip the first frame for inactive tasks") The original justification for that commit no longer exists. That original issue was later fixed in a different way, with the following commit: f2ac57a ("x86/unwind/orc: Fix inactive tasks with stack pointer in %sp on GCC 10 compiled kernels") Fixes: f1d9a2a ("x86/unwind/orc: Don't skip the first frame for inactive tasks") Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com> [jpoimboe: rewrite commit log] Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Alexey Loukianov lx2@lexa2.ru