arm64: mm: fix linear mapping mem access performance degradation
arm64 can build 2M/1G block/section mappings. When the DMA/DMA32 zones are used (crashkernel enabled, rodata full disabled, kfence disabled), mem_map falls back to non-block/section mappings, because the crashkernel region must be shrinkable at page granularity. This degrades performance for large contiguous memory accesses in the kernel (memcpy, memmove, etc.).

This has been the subject of several changes and discussions:

  commit 0314956 ("arm64: Do not defer reserve_crashkernel() for platforms with no DMA memory zones")
  commit 0a30c53 ("arm64: mm: Move reserve_crashkernel() into mem_init()")
  commit 2687275 ("arm64: Force NO_BLOCK_MAPPINGS if crashkernel reservation is required")

This patch changes mem_map to use block/section mappings even when crashkernel is enabled. First, map all available memory in mem_map with block/section mappings (normally 2M or 1G) and reserve the crashkernel memory. Then walk the page table and split the block/section mappings into non-block/section mappings (normally 4K) only for the crashkernel memory. The linear mapping therefore uses block/section mappings as much as possible, which noticeably reduces CPU dTLB misses and speeds up memory access by about 10-20%.

I have tested it with pft (Page Fault Test) and fio and obtained significant performance improvements.
For the fio test:

1. Prepare a ramdisk:

    modprobe -r brd
    modprobe brd rd_nr=1 rd_size=67108864
    dmsetup remove_all
    wipefs -a --force /dev/ram0
    mkfs -t ext4 -E lazy_itable_init=0,lazy_journal_init=0 -q -F /dev/ram0
    mkdir -p /fs/ram0
    mount -t ext4 /dev/ram0 /fs/ram0

2. Prepare the fio parameters in the file x.fio:

    [global]
    bs=4k
    ioengine=psync
    iodepth=128
    size=32G
    direct=1
    invalidate=1
    group_reporting
    thread=1
    rw=read
    directory=/fs/ram0
    numjobs=1

    [task_0]
    cpus_allowed=16
    stonewall=1

3. Run the test case:

    perf stat -e dTLB-load-misses fio x.fio

4. Contrast:

                          without patch        with patch
    ---------------------------------------------------------
    fio READ              aggrb=1493.2MB/s     aggrb=1775.3MB/s
    dTLB-load-misses      1,818,320,693        438,729,774
    time elapsed (s)      70.500326434         62.877316408
    user (s)              15.926332000         15.684721000
    sys (s)               54.211939000         47.046165000

5. Conclusion:

    In this test the patch reduces dTLB load misses by about 76% and improves fio read bandwidth by about 19%.

Signed-off-by: Guanghui Feng <guanghuifeng@linux.alibaba.com>
Parent: 03c765b
Commit: 168f0d5
3 changed files with 116 additions and 56 deletions.