fixup! arm64: dts: qcom: add device-tree for Redmi Note 6 Pro (tulip) #22

M0Rf30 · 2024-02-11T20:21:59Z

After some analysis, the removed regions are the same as those declared in sdm660.dtsi.
This PR adds:

wifi enablement
spmi-haptics node
remoteprocfs fixes (the rmtfs daemon now sees /dev/qcom_rmtfs_mem1e)
hall sensor

Known issues:

[   65.457083] qcom-q6v5-mss 4080000.remoteproc: fatal error received: err_qdi.c:459:EX:wlan_process:1:WLAN RT:1087:PC=b00c87e0
[   65.465433] remoteproc remoteproc1: crash detected in 4080000.remoteproc: type fatal error
[   65.473827] remoteproc remoteproc1: handling crash #3 in 4080000.remoteproc
[   65.482046] remoteproc remoteproc1: recovering 4080000.remoteproc
[   65.524203] ath10k_snoc 18800000.wifi: firmware crashed! (guid ad8cb34b-385c-4465-8e32-c02cb94593f6)
[   65.532747] ath10k_snoc 18800000.wifi: wcn3990 hw1.0 target 0x00000008 chip_id 0x00000000 sub 0000:0000
[   65.540995] ath10k_snoc 18800000.wifi: kconfig debug 1 debugfs 1 tracing 0 dfs 0 testmode 0
[   65.549050] ath10k_snoc 18800000.wifi: firmware ver 1.0.0.591 api 5 features wowlan,mgmt-tx-by-reference,non-bmi crc32 b3d4b790
[   65.557163] ath10k_snoc 18800000.wifi: board_file api 2 bmi_id N/A crc32 00000000
[   65.565221] ath10k_snoc 18800000.wifi: htt-ver 3.58 wmi-op 4 htt-op 3 cal file max-sta 32 raw 0 hwcrypto 1
[   65.594267] qcom-q6v5-mss 4080000.remoteproc: port failed halt
[   65.608939] remoteproc remoteproc1: stopped remote processor 4080000.remoteproc
[   65.730295] qcom-q6v5-mss 4080000.remoteproc: MBA booted without debug policy, loading mpss
[   67.384927] remoteproc remoteproc1: remote processor 4080000.remoteproc is now up
[   68.574331] ath10k_snoc 18800000.wifi: failed to set PS Mode 1 for vdev 0: -108
[   68.594242] ath10k_snoc 18800000.wifi: failed to setup powersave: -108
[   68.610239] ath10k_snoc 18800000.wifi: failed to setup ps on vdev 0: -108
[   68.626277] ath10k_snoc 18800000.wifi: failed to set vdev wmm params on vdev 1: -108
[   68.642312] ath10k_snoc 18800000.wifi: failed to set vdev wmm params on vdev 1: -108
[   68.658252] ath10k_snoc 18800000.wifi: failed to set vdev wmm params on vdev 1: -108
[   68.674252] ath10k_snoc 18800000.wifi: failed to set vdev wmm params on vdev 1: -108
[   68.690251] ath10k_snoc 18800000.wifi: device successfully recovered
[   70.015739] ath10k_snoc 18800000.wifi: failed to start hw scan: -108
[   70.436591] ieee80211 phy0: Hardware restart was requested

I've already made many attempts with qca-swissarmyknife and the official ath10k board-2.bin and firmware-5.bin, but the firmware still crashes after a while.

Wifi APs are detected and associated.
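
As a reading aid for the log above: on Linux, errno 108 is ESHUTDOWN, and every `-108` failure in the log follows the firmware crash reported at 65.4 s. A minimal Python sketch for tallying such error codes in dmesg lines follows; the helper name, the regular expression, and the hard-coded errno table are illustrative assumptions and not part of ath10k.

```python
import re
from collections import Counter

# Linux errno value seen in the log above; hard-coded (assumption for this
# sketch) so the result does not depend on the host platform's errno table.
LINUX_ERRNO_NAMES = {108: "ESHUTDOWN"}

# Matches ath10k_snoc messages that end in a negative error code, e.g.
# "... failed to start hw scan: -108"
ERR_RE = re.compile(r"ath10k_snoc \S+: (?P<msg>.+): -(?P<code>\d+)$")


def tally_errors(lines):
    """Count (errno name, message) pairs found in dmesg lines."""
    counts = Counter()
    for line in lines:
        m = ERR_RE.search(line.strip())
        if m:
            code = int(m.group("code"))
            name = LINUX_ERRNO_NAMES.get(code, f"errno {code}")
            counts[(name, m.group("msg"))] += 1
    return counts


# Lines copied from the log above.
sample = [
    "[   68.594242] ath10k_snoc 18800000.wifi: failed to setup powersave: -108",
    "[   68.626277] ath10k_snoc 18800000.wifi: failed to set vdev wmm params on vdev 1: -108",
    "[   68.642312] ath10k_snoc 18800000.wifi: failed to set vdev wmm params on vdev 1: -108",
    "[   70.015739] ath10k_snoc 18800000.wifi: failed to start hw scan: -108",
]

counts = tally_errors(sample)
for (name, msg), n in counts.items():
    print(f"{n}x {name}: {msg}")
```

Feeding it the whole dmesg output instead of `sample` gives a quick view of which operations fail with which code.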

minlexx · 2024-02-11T22:31:03Z

Can you post the full dmesg log starting from timestamp 0? Or at least from the first moment ath10k probes.
I'm interested in messages like

[ 23.219133] ath10k_snoc 18800000.wifi: qmi chip_id 0x140 chip_family 0x4002 board_id 0xff soc_id 0x40050000
[ 23.226119] ath10k_snoc 18800000.wifi: qmi fw_version 0x10128219 fw_build_timestamp 2018-11-05 21:43 fw_build_id QC_IMAGE_VERSION_STRING=WLAN.HL.1.0.1.c2-00537-QCAHLSWMTPLZ-1

M0Rf30 · 2024-02-11T22:40:28Z

> Can you post the full dmesg log starting from timestamp 0? Or at least from the first moment ath10k probes. I'm interested in messages like
> [ 23.219133] ath10k_snoc 18800000.wifi: qmi chip_id 0x140 chip_family 0x4002 board_id 0xff soc_id 0x40050000
> [ 23.226119] ath10k_snoc 18800000.wifi: qmi fw_version 0x10128219 fw_build_timestamp 2018-11-05 21:43 fw_build_id QC_IMAGE_VERSION_STRING=WLAN.HL.1.0.1.c2-00537-QCAHLSWMTPLZ-1

[   31.507183] ath10k_snoc 18800000.wifi: qmi chip_id 0x140 chip_family 0x4002 board_id 0xff soc_id 0x40050000
[   31.507412] zram0: detected capacity change from 0 to 3776512
[   31.514917] ath10k_snoc 18800000.wifi: qmi fw_version 0x101c821a fw_build_timestamp 2019-07-25 03:17 fw_build_id QC_IMAGE_VERSION_STRING=WLAN.HL.1.0.1.c2-00538-QCAHLSWMTPLZ-1.214870.1
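
The `board_id 0xff` value in the first line is what the `qmi-board-id=ff` name used later in this thread refers to. A minimal Python sketch for extracting the QMI identifiers from such a dmesg line follows; the function names are hypothetical, and the name format is copied from the `bus=snoc,qmi-board-id=ff` string in this conversation, not taken from the ath10k source.

```python
import re

# Matches the "qmi chip_id ... soc_id ..." message printed by ath10k_snoc.
QMI_RE = re.compile(
    r"qmi chip_id (?P<chip_id>0x[0-9a-fA-F]+) "
    r"chip_family (?P<chip_family>0x[0-9a-fA-F]+) "
    r"board_id (?P<board_id>0x[0-9a-fA-F]+) "
    r"soc_id (?P<soc_id>0x[0-9a-fA-F]+)"
)


def parse_qmi_ids(line):
    """Return the QMI identifiers in a dmesg line as integers, or None."""
    m = QMI_RE.search(line)
    if m is None:
        return None
    return {key: int(value, 16) for key, value in m.groupdict().items()}


def board_name(ids):
    """Build a board name in the form quoted in this thread (assumed format)."""
    return f"bus=snoc,qmi-board-id={ids['board_id']:x}"


# Line copied from the log above.
line = (
    "[   31.507183] ath10k_snoc 18800000.wifi: qmi chip_id 0x140 "
    "chip_family 0x4002 board_id 0xff soc_id 0x40050000"
)
ids = parse_qmi_ids(line)
print(ids)
print(board_name(ids))
```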

minlexx · 2024-02-12T00:11:53Z

Is the diag-router service running?
Have you tried leaving only the entry with board_id 0xff in the JSON that's used as input to ath10k-bdencoder?

In other words, use only

[
  {
          "data": "bdf/bdwlan.bin",
          "names": ["bus=snoc,qmi-board-id=ff"]
  }
]

as input to ath10k-bdencoder, without all other entries. Will it crash less?
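
A minimal Python sketch of producing that single-entry file, assuming the `bdf/bdwlan.bin` path from the snippet above. The output file name `board-2.json` and the helper name are hypothetical choices; the resulting file is what would then be passed to ath10k-bdencoder.

```python
import json
import tempfile
from pathlib import Path


def write_single_entry(path, data="bdf/bdwlan.bin", board_id=0xFF):
    """Write a JSON list holding only the entry for the given board id."""
    entries = [
        {
            "data": data,
            "names": [f"bus=snoc,qmi-board-id={board_id:x}"],
        }
    ]
    Path(path).write_text(json.dumps(entries, indent=2) + "\n")
    return entries


# Write into a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "board-2.json"  # hypothetical file name
    write_single_entry(out)
    loaded = json.loads(out.read_text())

print(json.dumps(loaded, indent=2))
```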

M0Rf30 · 2024-02-12T00:26:57Z

> Is the diag-router service running?
>
> Have you tried leaving only the entry with board_id 0xff in the JSON that's used as input to ath10k-bdencoder?
>
> In other words, use only
> [
>   {
>           "data": "bdf/bdwlan.bin",
>           "names": ["bus=snoc,qmi-board-id=ff"]
>   }
> ]
> as input to ath10k-bdencoder, without all other entries. Will it crash less?

Already tried, also with chip-id=140, but it made no meaningful difference.
What should I expect from diag-router? I only see something related to the modem.

minlexx · 2024-02-12T10:41:59Z

> What should I expect from diag-router?

Nothing specific; it should be running, and you should see DIAG services in the qrtr-lookup output.

Well, I have no other suggestions, and I'm not an expert in tulip's reserved memory; I'm assuming you know better.

minlexx approved these changes Feb 12, 2024

fixup! arm64: dts: qcom: add device-tree for Redmi Note 6 Pro (tulip)

fae89ae

M0Rf30 force-pushed the sdm660-6.8.0-rc3/sdm636-xiaomi-tulip branch from bf11f73 to fae89ae Compare February 12, 2024 17:48

M0Rf30 merged this pull request into sdm660-mainline:qcom-sdm660-6.8.0-rc3 Feb 12, 2024

M0Rf30 deleted the sdm660-6.8.0-rc3/sdm636-xiaomi-tulip branch February 12, 2024 18:45

M0Rf30 restored the sdm660-6.8.0-rc3/sdm636-xiaomi-tulip branch February 18, 2024 12:07
