riscv: Extension parsing fixes #920

bjoto · 2024-04-24T16:33:22Z

Pull request for series with
subject: riscv: Extension parsing fixes
version: 1
url: https://patchwork.kernel.org/project/linux-riscv/list/?series=847525

bjoto · 2024-04-24T16:33:23Z

Upstream branch: ba5ea59
series: https://patchwork.kernel.org/project/linux-riscv/list/?series=847525
version: 1

bjoto · 2024-04-26T22:03:08Z

Upstream branch: ba5ea59
series: https://patchwork.kernel.org/project/linux-riscv/list/?series=848414
version: 2

When the cpus in the same cluster are all in the idle state, the kernel might put the cluster into a deeper low power state. Call the cluster_pm_enter() before entering the low power state and call the cluster_pm_exit() after the cluster woken up. Signed-off-by: Nick Hu <nick.hu@sifive.com> Link: https://lore.kernel.org/r/20240226065113.1690534-1-nick.hu@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

Alexandre Ghiti <alexghiti@rivosinc.com> says: patch 1 removes a useless memory barrier and patch 2 actually fixes the issue with IPI in the patching code. * b4-shazam-merge: riscv: Fix text patching when IPI are used riscv: Remove superfluous smp_mb() Link: https://lore.kernel.org/r/20240229121056.203419-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

…USH_CTX prctl" Charlie Jenkins <charlie@rivosinc.com> says: Improve the performance of icache flushing by creating a new prctl flag PR_RISCV_SET_ICACHE_FLUSH_CTX. The interface is left generic to allow for future expansions such as with the proposed J extension [1]. Documentation is also provided to explain the use case. Patch sent to add PR_RISCV_SET_ICACHE_FLUSH_CTX to man-pages [2]. [1] https://github.com/riscv/riscv-j-extension [2] https://lore.kernel.org/linux-man/20240124-fencei_prctl-v1-1-0bddafcef331@rivosinc.com * b4-shazam-merge: cpumask: Add assign cpu documentation: Document PR_RISCV_SET_ICACHE_FLUSH_CTX prctl riscv: Include riscv_set_icache_flush_ctx prctl riscv: Remove unnecessary irqflags processor.h include Link: https://lore.kernel.org/r/20240312-fencei-v13-0-4b6bdc2bbf32@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

IS_ENABLED(CONFIG_64BIT) in initialization of pgtable_l{4,5}_enabled is redundant, remove it. Signed-off-by: Dawei Li <dawei.li@shingroup.cn> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20240320064712.442579-2-dawei.li@shingroup.cn Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

pgtable_l{4,5}_enabled are read only after initialization, make explicit annotation of __ro_after_init on them. Signed-off-by: Dawei Li <dawei.li@shingroup.cn> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20240320064712.442579-3-dawei.li@shingroup.cn Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

…ired After commit f51f7a0 ("riscv: enable DMA_BOUNCE_UNALIGNED_KMALLOC for !dma_coherent"), for non-coherent platforms with less than 4GB memory, we rely on users to pass "swiotlb=mmnn,force" kernel parameters to enable DMA bouncing for unaligned kmalloc() buffers. Now let's go further: If no bouncing needed for ZONE_DMA, let kernel automatically allocate 1MB swiotlb buffer per 1GB of RAM for kmalloc() bouncing on non-coherent platforms, so that no need to pass "swiotlb=mmnn,force" any more. The math of "1MB swiotlb buffer per 1GB of RAM for kmalloc() bouncing" is taken from arm64. Users can still force smaller swiotlb buffer by passing "swiotlb=mmnn". Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20240325110036.1564-1-jszhang@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

Jisheng Zhang <jszhang@kernel.org> says: This series selects ARCH_USE_CMPXCHG_LOCKREF to enable the cmpxchg-based lockless lockref implementation for riscv. Then, implement arch_cmpxchg64_{relaxed|acquire|release}. After patch1: Using Linus' test case[1] on TH1520 platform, I see a 11.2% improvement. On JH7110 platform, I see 12.0% improvement. After patch2: on both TH1520 and JH7110 platforms, I didn't see obvious performance improvement with Linus' test case [1]. IMHO, this may be related with the fence and lr.d/sc.d hw implementations. In theory, lr/sc without fence could give performance improvement over lr/sc plus fence, so add the code here to leave performance improvement room on newer HW platforms. * b4-shazam-merge: riscv: cmpxchg: implement arch_cmpxchg64_{relaxed|acquire|release} riscv: select ARCH_USE_CMPXCHG_LOCKREF Link: http://marc.info/?l=linux-fsdevel&m=137782380714721&w=4 [1] Link: https://lore.kernel.org/r/20240325111038.1700-1-jszhang@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

Currently, riscv linux requires at least IMA, so all platforms have a multiplier. And I assume the 'mul' efficiency is comparable or better than a sequence of five or so register-dependent arithmetic instructions. Select ARCH_HAS_FAST_MULTIPLIER to get slightly nicer codegen. Refer to commit f9b4192 ("[PATCH] bitops: hweight() speedup") for more details. In a simple benchmark test calling hweight64() in a loop, it got: about 14% performance improvement on JH7110, tested on Milkv Mars. about 23% performance improvement on TH1520 and SG2042, tested on Sipeed LPI4A and SG2042 platform. a slight performance drop on CV1800B, tested on milkv duo. Among all riscv platforms in my hands, this is the only one which sees a slight performance drop. It means the 'mul' isn't quick enough. However, the situation exists on x86 too, for example, P4 doesn't have fast integer multiplies as said in the above commit, x86 also selects ARCH_HAS_FAST_MULTIPLIER. So let's select ARCH_HAS_FAST_MULTIPLIER which can benefit almost riscv platforms. Samuel also provided some performance numbers: On Unmatched: 20% speedup for __sw_hweight32 and 30% speedup for __sw_hweight64. On D1: 8% speedup for __sw_hweight32 and 8% slowdown for __sw_hweight64. Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Reviewed-by: Samuel Holland <samuel.holland@sifive.com> Tested-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20240325105823.1483-1-jszhang@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

bjoto · 2024-04-28T22:00:11Z

Upstream branch: 0a16a17
series: https://patchwork.kernel.org/project/linux-riscv/list/?series=848414
version: 2

bjoto · 2024-04-29T22:35:08Z

Upstream branch: 0a16a17
series: https://patchwork.kernel.org/project/linux-riscv/list/?series=849066
version: 3

bjoto · 2024-05-03T05:02:47Z

Upstream branch: 0a16a17
series: https://patchwork.kernel.org/project/linux-riscv/list/?series=850129
version: 4

The riscv_cpuinfo struct that contains mvendorid and marchid is not populated until all harts are booted which happens after the DT parsing. Use the mvendorid/marchid from the boot hart to determine if the DT contains an invalid V. Fixes: d82f322 ("RISC-V: Ignore V from the riscv,isa DT property on older T-Head CPUs") Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Guo Ren <guoren@kernel.org> Signed-off-by: Björn Töpel <bjorn@rivosinc.com>

This loop is supposed to check if ext->subset_ext_ids[j] is valid, rather than if ext->subset_ext_ids[i] is valid, before setting the extension id ext->subset_ext_ids[j] in isainfo->isa. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Fixes: 0d8295e ("riscv: add ISA extension parsing for scalar crypto") Signed-off-by: Björn Töpel <bjorn@rivosinc.com>

bjoto added new V1 for-next labels Apr 24, 2024

bjoto added V2 and removed V1 labels Apr 26, 2024

bjoto force-pushed the series/847525=>for-next branch from 5bfd8e0 to f1fc9ba Compare April 26, 2024 22:03

nick650823 and others added 9 commits April 28, 2024 14:50

adding ci files

ff9921f

bjoto force-pushed the for-next_base branch from ba6cb25 to ff9921f Compare April 28, 2024 21:58

bjoto force-pushed the series/847525=>for-next branch from f1fc9ba to 6167b34 Compare April 28, 2024 22:00

bjoto added V3 and removed V2 labels Apr 29, 2024

bjoto force-pushed the series/847525=>for-next branch from 6167b34 to 54e22df Compare April 29, 2024 22:35

bjoto added V4 and removed V3 labels May 3, 2024

charlie-rivos added 2 commits May 3, 2024 05:02

bjoto force-pushed the series/847525=>for-next branch from 54e22df to 141d385 Compare May 3, 2024 05:02

bjoto force-pushed the for-next_base branch from ff9921f to 68c5281 Compare May 14, 2024 14:02

bjoto closed this May 14, 2024

bjoto deleted the series/847525=>for-next branch May 16, 2024 19:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

riscv: Extension parsing fixes #920

riscv: Extension parsing fixes #920

bjoto commented Apr 24, 2024

bjoto commented Apr 24, 2024

bjoto commented Apr 26, 2024

bjoto commented Apr 28, 2024

bjoto commented Apr 29, 2024

bjoto commented May 3, 2024

riscv: Extension parsing fixes #920

riscv: Extension parsing fixes #920

Conversation

bjoto commented Apr 24, 2024

bjoto commented Apr 24, 2024

bjoto commented Apr 26, 2024

bjoto commented Apr 28, 2024

bjoto commented Apr 29, 2024

bjoto commented May 3, 2024