-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
enable LZ4_FAST_DEC_LOOP build macro on aarch64 by default #707
Conversation
Your results are in line with several of our observations. However, the issue is, In general, server-class So I believe we need something more accurate than just |
I see your point. What about if we enable the build macro for |
Well, at least it would match our experiments. |
I will try this on kernel Lz4 module with GCC build to see the behavior. |
Sounds like a good idea. Let us know how it goes. |
The patch looks fine @prekageo . |
Upstream lz4 mentioned a performance regression on Qualcomm SoCs when built with Clang, but not with GCC [1]. However, according to my testing on sm8350 with LLVM Clang 15, this patch does offer a nice 10% boost in decompression, so enable the fast dec loop for Clang as well. Testing procedure: - pre-fill zram with 1GB of real-word zram data dumped under memory pressure, for example $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000 - $ fio --readonly --name=randread --direct=1 --rw=randread \ --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \ --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M Results: - vanilla lz4: read: IOPS=1646k, BW=6431MiB/s (6743MB/s)(4000MiB/622msec) - lz4 fast dec: read: IOPS=1775k, BW=6932MiB/s (7269MB/s)(4000MiB/577msec) [1] lz4/lz4#707 Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com>
Upstream lz4 mentioned a performance regression on Qualcomm SoCs when built with Clang, but not with GCC [1]. However, according to my testing on sdm845 with LLVM Clang 15, this patch does offer a nice 5-10% boost in decompression, so enable the fast dec loop for Clang as well. Testing procedure: - pre-fill zram with 1GB of real-word zram data dumped under memory pressure, for example $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000 - $ fio --readonly --name=randread --direct=1 --rw=randread \ --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \ --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M Results: - vanilla lz4: read: IOPS=1282k, BW=5006MiB/s (5249MB/s)(4000MiB/799msec) - lz4 fast dec: read: IOPS=1382k, BW=5398MiB/s (5660MB/s)(4000MiB/741msec) [1] lz4/lz4#707 Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com>
Upstream lz4 mentioned a performance regression on Qualcomm SoCs when built with Clang, but not with GCC [1]. However, according to my testing on sdm845 with LLVM Clang 15, this patch does offer a nice 5-10% boost in decompression, so enable the fast dec loop for Clang as well. Testing procedure: - pre-fill zram with 1GB of real-word zram data dumped under memory pressure, for example $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000 - $ fio --readonly --name=randread --direct=1 --rw=randread \ --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \ --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M Results: - vanilla lz4: read: IOPS=1282k, BW=5006MiB/s (5249MB/s)(4000MiB/799msec) - lz4 fast dec: read: IOPS=1382k, BW=5398MiB/s (5660MB/s)(4000MiB/741msec) [1] lz4/lz4#707 Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com>
Upstream lz4 mentioned a performance regression on Qualcomm SoCs when built with Clang, but not with GCC [1]. However, according to my testing on sdm845 with LLVM Clang 15, this patch does offer a nice 5-10% boost in decompression, so enable the fast dec loop for Clang as well. Testing procedure: - pre-fill zram with 1GB of real-word zram data dumped under memory pressure, for example $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000 - $ fio --readonly --name=randread --direct=1 --rw=randread \ --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \ --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M Results: - vanilla lz4: read: IOPS=1282k, BW=5006MiB/s (5249MB/s)(4000MiB/799msec) - lz4 fast dec: read: IOPS=1382k, BW=5398MiB/s (5660MB/s)(4000MiB/741msec) [1] lz4/lz4#707 Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com>
Upstream lz4 mentioned a performance regression on Qualcomm SoCs when built with Clang, but not with GCC [1]. However, according to my testing on sdm845 with LLVM Clang 15, this patch does offer a nice 5-10% boost in decompression, so enable the fast dec loop for Clang as well. Testing procedure: - pre-fill zram with 1GB of real-word zram data dumped under memory pressure, for example $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000 - $ fio --readonly --name=randread --direct=1 --rw=randread \ --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \ --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M Results: - vanilla lz4: read: IOPS=1282k, BW=5006MiB/s (5249MB/s)(4000MiB/799msec) - lz4 fast dec: read: IOPS=1382k, BW=5398MiB/s (5660MB/s)(4000MiB/741msec) [1] lz4/lz4#707 Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com>
Upstream lz4 mentioned a performance regression on Qualcomm SoCs when built with Clang, but not with GCC [1]. However, according to my testing on sdm845 with LLVM Clang 15, this patch does offer a nice 5-10% boost in decompression, so enable the fast dec loop for Clang as well. Testing procedure: - pre-fill zram with 1GB of real-word zram data dumped under memory pressure, for example $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000 - $ fio --readonly --name=randread --direct=1 --rw=randread \ --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \ --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M Results: - vanilla lz4: read: IOPS=1282k, BW=5006MiB/s (5249MB/s)(4000MiB/799msec) - lz4 fast dec: read: IOPS=1382k, BW=5398MiB/s (5660MB/s)(4000MiB/741msec) [1] lz4/lz4#707 Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com>
Upstream lz4 mentioned a performance regression on Qualcomm SoCs when built with Clang, but not with GCC [1]. However, according to my testing on sdm845 with LLVM Clang 15, this patch does offer a nice 5-10% boost in decompression, so enable the fast dec loop for Clang as well. Testing procedure: - pre-fill zram with 1GB of real-word zram data dumped under memory pressure, for example $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000 - $ fio --readonly --name=randread --direct=1 --rw=randread \ --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \ --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M Results: - vanilla lz4: read: IOPS=1282k, BW=5006MiB/s (5249MB/s)(4000MiB/799msec) - lz4 fast dec: read: IOPS=1382k, BW=5398MiB/s (5660MB/s)(4000MiB/741msec) [1] lz4/lz4#707 Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com>
Upstream lz4 mentioned a performance regression on Qualcomm SoCs when built with Clang, but not with GCC [1]. However, according to my testing on sm8350 with LLVM Clang 15, this patch does offer a nice 10% boost in decompression, so enable the fast dec loop for Clang as well. Testing procedure: - pre-fill zram with 1GB of real-word zram data dumped under memory pressure, for example $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000 - $ fio --readonly --name=randread --direct=1 --rw=randread \ --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \ --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M Results: - vanilla lz4: read: IOPS=1646k, BW=6431MiB/s (6743MB/s)(4000MiB/622msec) - lz4 fast dec: read: IOPS=1775k, BW=6932MiB/s (7269MB/s)(4000MiB/577msec) [1] lz4/lz4#707 Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com>
Upstream lz4 mentioned a performance regression on Qualcomm SoCs when built with Clang, but not with GCC [1]. However, according to my testing on sdm845 with LLVM Clang 15, this patch does offer a nice 5-10% boost in decompression, so enable the fast dec loop for Clang as well. Testing procedure: - pre-fill zram with 1GB of real-word zram data dumped under memory pressure, for example $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000 - $ fio --readonly --name=randread --direct=1 --rw=randread \ --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \ --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M Results: - vanilla lz4: read: IOPS=1282k, BW=5006MiB/s (5249MB/s)(4000MiB/799msec) - lz4 fast dec: read: IOPS=1382k, BW=5398MiB/s (5660MB/s)(4000MiB/741msec) [1] lz4/lz4#707 Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com>
Upstream lz4 mentioned a performance regression on Qualcomm SoCs when built with Clang, but not with GCC [1]. However, according to my testing on sdm845 with LLVM Clang 15, this patch does offer a nice 5-10% boost in decompression, so enable the fast dec loop for Clang as well. Testing procedure: - pre-fill zram with 1GB of real-word zram data dumped under memory pressure, for example $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000 - $ fio --readonly --name=randread --direct=1 --rw=randread \ --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \ --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M Results: - vanilla lz4: read: IOPS=1282k, BW=5006MiB/s (5249MB/s)(4000MiB/799msec) - lz4 fast dec: read: IOPS=1382k, BW=5398MiB/s (5660MB/s)(4000MiB/741msec) [1] lz4/lz4#707 Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com>
Upstream lz4 mentioned a performance regression on Qualcomm SoCs when built with Clang, but not with GCC [1]. However, according to my testing on sdm845 with LLVM Clang 15, this patch does offer a nice 5-10% boost in decompression, so enable the fast dec loop for Clang as well. Testing procedure: - pre-fill zram with 1GB of real-word zram data dumped under memory pressure, for example $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000 - $ fio --readonly --name=randread --direct=1 --rw=randread \ --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \ --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M Results: - vanilla lz4: read: IOPS=1282k, BW=5006MiB/s (5249MB/s)(4000MiB/799msec) - lz4 fast dec: read: IOPS=1382k, BW=5398MiB/s (5660MB/s)(4000MiB/741msec) [1] lz4/lz4#707 Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com>
Upstream lz4 mentioned a performance regression on Qualcomm SoCs when built with Clang, but not with GCC [1]. However, according to my testing on sdm845 with LLVM Clang 15, this patch does offer a nice 5-10% boost in decompression, so enable the fast dec loop for Clang as well. Testing procedure: - pre-fill zram with 1GB of real-word zram data dumped under memory pressure, for example $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000 - $ fio --readonly --name=randread --direct=1 --rw=randread \ --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \ --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M Results: - vanilla lz4: read: IOPS=1282k, BW=5006MiB/s (5249MB/s)(4000MiB/799msec) - lz4 fast dec: read: IOPS=1382k, BW=5398MiB/s (5660MB/s)(4000MiB/741msec) [1] lz4/lz4#707 Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com>
Upstream lz4 mentioned a performance regression on Qualcomm SoCs when built with Clang, but not with GCC [1]. However, according to my testing on sdm845 with LLVM Clang 15, this patch does offer a nice 5-10% boost in decompression, so enable the fast dec loop for Clang as well. Testing procedure: - pre-fill zram with 1GB of real-word zram data dumped under memory pressure, for example $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000 - $ fio --readonly --name=randread --direct=1 --rw=randread \ --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \ --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M Results: - vanilla lz4: read: IOPS=1282k, BW=5006MiB/s (5249MB/s)(4000MiB/799msec) - lz4 fast dec: read: IOPS=1382k, BW=5398MiB/s (5660MB/s)(4000MiB/741msec) [1] lz4/lz4#707 Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com>
Upstream lz4 mentioned a performance regression on Qualcomm SoCs when built with Clang, but not with GCC [1]. However, according to my testing on sdm845 with LLVM Clang 15, this patch does offer a nice 5-10% boost in decompression, so enable the fast dec loop for Clang as well. Testing procedure: - pre-fill zram with 1GB of real-word zram data dumped under memory pressure, for example $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000 - $ fio --readonly --name=randread --direct=1 --rw=randread \ --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \ --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M Results: - vanilla lz4: read: IOPS=1282k, BW=5006MiB/s (5249MB/s)(4000MiB/799msec) - lz4 fast dec: read: IOPS=1382k, BW=5398MiB/s (5660MB/s)(4000MiB/741msec) [1] lz4/lz4#707 Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com>
Upstream lz4 mentioned a performance regression on Qualcomm SoCs when built with Clang, but not with GCC [1]. However, according to my testing on sdm845 with LLVM Clang 15, this patch does offer a nice 5-10% boost in decompression, so enable the fast dec loop for Clang as well. Testing procedure: - pre-fill zram with 1GB of real-word zram data dumped under memory pressure, for example $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000 - $ fio --readonly --name=randread --direct=1 --rw=randread \ --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \ --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M Results: - vanilla lz4: read: IOPS=1282k, BW=5006MiB/s (5249MB/s)(4000MiB/799msec) - lz4 fast dec: read: IOPS=1382k, BW=5398MiB/s (5660MB/s)(4000MiB/741msec) [1] lz4/lz4#707 Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com>
Upstream lz4 mentioned a performance regression on Qualcomm SoCs when built with Clang, but not with GCC [1]. However, according to my testing on sdm845 with LLVM Clang 15, this patch does offer a nice 5-10% boost in decompression, so enable the fast dec loop for Clang as well. Testing procedure: - pre-fill zram with 1GB of real-word zram data dumped under memory pressure, for example $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000 - $ fio --readonly --name=randread --direct=1 --rw=randread \ --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \ --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M Results: - vanilla lz4: read: IOPS=1282k, BW=5006MiB/s (5249MB/s)(4000MiB/799msec) - lz4 fast dec: read: IOPS=1382k, BW=5398MiB/s (5660MB/s)(4000MiB/741msec) [1] lz4/lz4#707 Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com>
Upstream lz4 mentioned a performance regression on Qualcomm SoCs when built with Clang, but not with GCC [1]. However, according to my testing on sdm845 with LLVM Clang 15, this patch does offer a nice 5-10% boost in decompression, so enable the fast dec loop for Clang as well. Testing procedure: - pre-fill zram with 1GB of real-word zram data dumped under memory pressure, for example $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000 - $ fio --readonly --name=randread --direct=1 --rw=randread \ --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \ --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M Results: - vanilla lz4: read: IOPS=1282k, BW=5006MiB/s (5249MB/s)(4000MiB/799msec) - lz4 fast dec: read: IOPS=1382k, BW=5398MiB/s (5660MB/s)(4000MiB/741msec) [1] lz4/lz4#707 Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com>
Upstream lz4 mentioned a performance regression on Qualcomm SoCs when built with Clang, but not with GCC [1]. However, according to my testing on sdm845 with LLVM Clang 15, this patch does offer a nice 5-10% boost in decompression, so enable the fast dec loop for Clang as well. Testing procedure: - pre-fill zram with 1GB of real-word zram data dumped under memory pressure, for example $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000 - $ fio --readonly --name=randread --direct=1 --rw=randread \ --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \ --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M Results: - vanilla lz4: read: IOPS=1282k, BW=5006MiB/s (5249MB/s)(4000MiB/799msec) - lz4 fast dec: read: IOPS=1382k, BW=5398MiB/s (5660MB/s)(4000MiB/741msec) [1] lz4/lz4#707 Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com>
Upstream lz4 mentioned a performance regression on Qualcomm SoCs when built with Clang, but not with GCC [1]. However, according to my testing on sdm845 with LLVM Clang 15, this patch does offer a nice 5-10% boost in decompression, so enable the fast dec loop for Clang as well. Testing procedure: - pre-fill zram with 1GB of real-word zram data dumped under memory pressure, for example $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000 - $ fio --readonly --name=randread --direct=1 --rw=randread \ --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \ --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M Results: - vanilla lz4: read: IOPS=1282k, BW=5006MiB/s (5249MB/s)(4000MiB/799msec) - lz4 fast dec: read: IOPS=1382k, BW=5398MiB/s (5660MB/s)(4000MiB/741msec) [1] lz4/lz4#707 Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com>
Upstream lz4 mentioned a performance regression on Qualcomm SoCs when built with Clang, but not with GCC [1]. However, according to my testing on sm8350 with LLVM Clang 15, this patch does offer a nice 10% boost in decompression, so enable the fast dec loop for Clang as well. Testing procedure: - pre-fill zram with 1GB of real-word zram data dumped under memory pressure, for example $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000 - $ fio --readonly --name=randread --direct=1 --rw=randread \ --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \ --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M Results: - vanilla lz4: read: IOPS=1646k, BW=6431MiB/s (6743MB/s)(4000MiB/622msec) - lz4 fast dec: read: IOPS=1775k, BW=6932MiB/s (7269MB/s)(4000MiB/577msec) [1] lz4/lz4#707 Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com>
Upstream lz4 mentioned a performance regression on Qualcomm SoCs when built with Clang, but not with GCC [1]. However, according to my testing on sm8350 with LLVM Clang 15, this patch does offer a nice 10% boost in decompression, so enable the fast dec loop for Clang as well. Testing procedure: - pre-fill zram with 1GB of real-word zram data dumped under memory pressure, for example $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000 - $ fio --readonly --name=randread --direct=1 --rw=randread \ --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \ --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M Results: - vanilla lz4: read: IOPS=1646k, BW=6431MiB/s (6743MB/s)(4000MiB/622msec) - lz4 fast dec: read: IOPS=1775k, BW=6932MiB/s (7269MB/s)(4000MiB/577msec) [1] lz4/lz4#707 Change-Id: Ie5c1671068770758d0557f3ec00f1e7545d28b4e Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com>
Upstream lz4 mentioned a performance regression on Qualcomm SoCs when built with Clang, but not with GCC [1]. However, according to my testing on sm8350 with LLVM Clang 15, this patch does offer a nice 10% boost in decompression, so enable the fast dec loop for Clang as well. Testing procedure: - pre-fill zram with 1GB of real-word zram data dumped under memory pressure, for example $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000 - $ fio --readonly --name=randread --direct=1 --rw=randread \ --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \ --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M Results: - vanilla lz4: read: IOPS=1646k, BW=6431MiB/s (6743MB/s)(4000MiB/622msec) - lz4 fast dec: read: IOPS=1775k, BW=6932MiB/s (7269MB/s)(4000MiB/577msec) [1] lz4/lz4#707 Change-Id: Ie5c1671068770758d0557f3ec00f1e7545d28b4e Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com>
Upstream lz4 mentioned a performance regression on Qualcomm SoCs when built with Clang, but not with GCC [1]. However, according to my testing on sdm845 with LLVM Clang 15, this patch does offer a nice 5-10% boost in decompression, so enable the fast dec loop for Clang as well. Testing procedure: - pre-fill zram with 1GB of real-word zram data dumped under memory pressure, for example $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000 - $ fio --readonly --name=randread --direct=1 --rw=randread \ --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \ --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M Results: - vanilla lz4: read: IOPS=1282k, BW=5006MiB/s (5249MB/s)(4000MiB/799msec) - lz4 fast dec: read: IOPS=1382k, BW=5398MiB/s (5660MB/s)(4000MiB/741msec) [1] lz4/lz4#707 Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com>
Upstream lz4 mentioned a performance regression on Qualcomm SoCs when built with Clang, but not with GCC [1]. However, according to my testing on sm8350 with LLVM Clang 15, this patch does offer a nice 10% boost in decompression, so enable the fast dec loop for Clang as well. Testing procedure: - pre-fill zram with 1GB of real-word zram data dumped under memory pressure, for example $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000 - $ fio --readonly --name=randread --direct=1 --rw=randread \ --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \ --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M Results: - vanilla lz4: read: IOPS=1646k, BW=6431MiB/s (6743MB/s)(4000MiB/622msec) - lz4 fast dec: read: IOPS=1775k, BW=6932MiB/s (7269MB/s)(4000MiB/577msec) [1] lz4/lz4#707 Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com>
Upstream lz4 mentioned a performance regression on Qualcomm SoCs when built with Clang, but not with GCC [1]. However, according to my testing on sm8350 with LLVM Clang 15, this patch does offer a nice 10% boost in decompression, so enable the fast dec loop for Clang as well. Testing procedure: - pre-fill zram with 1GB of real-word zram data dumped under memory pressure, for example $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000 - $ fio --readonly --name=randread --direct=1 --rw=randread \ --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \ --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M Results: - vanilla lz4: read: IOPS=1646k, BW=6431MiB/s (6743MB/s)(4000MiB/622msec) - lz4 fast dec: read: IOPS=1775k, BW=6932MiB/s (7269MB/s)(4000MiB/577msec) [1] lz4/lz4#707 Change-Id: Ib284c1688942109ec12ccf62998d70f55cbc7296
Upstream lz4 mentioned a performance regression on Qualcomm SoCs when built with Clang, but not with GCC [1]. However, according to my testing on sm8350 with LLVM Clang 15, this patch does offer a nice 10% boost in decompression, so enable the fast dec loop for Clang as well. Testing procedure: - pre-fill zram with 1GB of real-word zram data dumped under memory pressure, for example $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000 - $ fio --readonly --name=randread --direct=1 --rw=randread \ --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \ --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M Results: - vanilla lz4: read: IOPS=1646k, BW=6431MiB/s (6743MB/s)(4000MiB/622msec) - lz4 fast dec: read: IOPS=1775k, BW=6932MiB/s (7269MB/s)(4000MiB/577msec) [1] lz4/lz4#707 Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com> Signed-off-by: Adithya R <gh0strider.2k18.reborn@gmail.com>
Upstream lz4 mentioned a performance regression on Qualcomm SoCs when built with Clang, but not with GCC [1]. However, according to my testing on sm8350 with LLVM Clang 15, this patch does offer a nice 10% boost in decompression, so enable the fast dec loop for Clang as well. Testing procedure: - pre-fill zram with 1GB of real-word zram data dumped under memory pressure, for example $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000 - $ fio --readonly --name=randread --direct=1 --rw=randread \ --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \ --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M Results: - vanilla lz4: read: IOPS=1646k, BW=6431MiB/s (6743MB/s)(4000MiB/622msec) - lz4 fast dec: read: IOPS=1775k, BW=6932MiB/s (7269MB/s)(4000MiB/577msec) [1] lz4/lz4#707 Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com>
Upstream lz4 mentioned a performance regression on Qualcomm SoCs when built with Clang, but not with GCC [1]. However, according to my testing on sdm845 with LLVM Clang 15, this patch does offer a nice 5-10% boost in decompression, so enable the fast dec loop for Clang as well. Testing procedure: - pre-fill zram with 1GB of real-word zram data dumped under memory pressure, for example $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000 - $ fio --readonly --name=randread --direct=1 --rw=randread \ --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \ --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M Results: - vanilla lz4: read: IOPS=1282k, BW=5006MiB/s (5249MB/s)(4000MiB/799msec) - lz4 fast dec: read: IOPS=1382k, BW=5398MiB/s (5660MB/s)(4000MiB/741msec) [1] lz4/lz4#707 Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com>
Upstream lz4 mentioned a performance regression on Qualcomm SoCs when built with Clang, but not with GCC [1]. However, according to my testing on sm8350 with LLVM Clang 15, this patch does offer a nice 10% boost in decompression, so enable the fast dec loop for Clang as well. Testing procedure: - pre-fill zram with 1GB of real-word zram data dumped under memory pressure, for example $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000 - $ fio --readonly --name=randread --direct=1 --rw=randread \ --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \ --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M Results: - vanilla lz4: read: IOPS=1646k, BW=6431MiB/s (6743MB/s)(4000MiB/622msec) - lz4 fast dec: read: IOPS=1775k, BW=6932MiB/s (7269MB/s)(4000MiB/577msec) [1] lz4/lz4#707 Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com> Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
Upstream lz4 mentioned a performance regression on Qualcomm SoCs when built with Clang, but not with GCC [1]. However, according to my testing on sm8350 with LLVM Clang 15, this patch does offer a nice 10% boost in decompression, so enable the fast dec loop for Clang as well. Testing procedure: - pre-fill zram with 1GB of real-word zram data dumped under memory pressure, for example $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000 - $ fio --readonly --name=randread --direct=1 --rw=randread \ --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \ --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M Results: - vanilla lz4: read: IOPS=1646k, BW=6431MiB/s (6743MB/s)(4000MiB/622msec) - lz4 fast dec: read: IOPS=1775k, BW=6932MiB/s (7269MB/s)(4000MiB/577msec) [1] lz4/lz4#707 Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com> Signed-off-by: atndko <z1281552865@gmail.com> Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
Upstream lz4 mentioned a performance regression on Qualcomm SoCs when built with Clang, but not with GCC [1]. However, according to my testing on sm8350 with LLVM Clang 15, this patch does offer a nice 10% boost in decompression, so enable the fast dec loop for Clang as well. Testing procedure: - pre-fill zram with 1GB of real-word zram data dumped under memory pressure, for example $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000 - $ fio --readonly --name=randread --direct=1 --rw=randread \ --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \ --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M Results: - vanilla lz4: read: IOPS=1646k, BW=6431MiB/s (6743MB/s)(4000MiB/622msec) - lz4 fast dec: read: IOPS=1775k, BW=6932MiB/s (7269MB/s)(4000MiB/577msec) [1] lz4/lz4#707 Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com> Signed-off-by: atndko <z1281552865@gmail.com> Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
Upstream lz4 mentioned a performance regression on Qualcomm SoCs when built with Clang, but not with GCC [1]. However, according to my testing on sm8350 with LLVM Clang 15, this patch does offer a nice 10% boost in decompression, so enable the fast dec loop for Clang as well. Testing procedure: - pre-fill zram with 1GB of real-word zram data dumped under memory pressure, for example $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000 - $ fio --readonly --name=randread --direct=1 --rw=randread \ --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \ --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M Results: - vanilla lz4: read: IOPS=1646k, BW=6431MiB/s (6743MB/s)(4000MiB/622msec) - lz4 fast dec: read: IOPS=1775k, BW=6932MiB/s (7269MB/s)(4000MiB/577msec) [1] lz4/lz4#707 Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com> Signed-off-by: atndko <z1281552865@gmail.com> Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
Upstream lz4 mentioned a performance regression on Qualcomm SoCs when built with Clang, but not with GCC [1]. However, according to my testing on sm8350 with LLVM Clang 15, this patch does offer a nice 10% boost in decompression, so enable the fast dec loop for Clang as well. Testing procedure: - pre-fill zram with 1GB of real-word zram data dumped under memory pressure, for example $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000 - $ fio --readonly --name=randread --direct=1 --rw=randread \ --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \ --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M Results: - vanilla lz4: read: IOPS=1646k, BW=6431MiB/s (6743MB/s)(4000MiB/622msec) - lz4 fast dec: read: IOPS=1775k, BW=6932MiB/s (7269MB/s)(4000MiB/577msec) [1] lz4/lz4#707 Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com> Signed-off-by: atndko <z1281552865@gmail.com> Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
Upstream lz4 mentioned a performance regression on Qualcomm SoCs when built with Clang, but not with GCC [1]. However, according to my testing on sm8350 with LLVM Clang 15, this patch does offer a nice 10% boost in decompression, so enable the fast dec loop for Clang as well. Testing procedure: - pre-fill zram with 1GB of real-word zram data dumped under memory pressure, for example $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000 - $ fio --readonly --name=randread --direct=1 --rw=randread \ --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \ --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M Results: - vanilla lz4: read: IOPS=1646k, BW=6431MiB/s (6743MB/s)(4000MiB/622msec) - lz4 fast dec: read: IOPS=1775k, BW=6932MiB/s (7269MB/s)(4000MiB/577msec) [1] lz4/lz4#707 Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com> Signed-off-by: atndko <z1281552865@gmail.com> Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
Upstream lz4 mentioned a performance regression on Qualcomm SoCs when built with Clang, but not with GCC [1]. However, according to my testing on sm8350 with LLVM Clang 15, this patch does offer a nice 10% boost in decompression, so enable the fast dec loop for Clang as well. Testing procedure: - pre-fill zram with 1GB of real-word zram data dumped under memory pressure, for example $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000 - $ fio --readonly --name=randread --direct=1 --rw=randread \ --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \ --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M Results: - vanilla lz4: read: IOPS=1646k, BW=6431MiB/s (6743MB/s)(4000MiB/622msec) - lz4 fast dec: read: IOPS=1775k, BW=6932MiB/s (7269MB/s)(4000MiB/577msec) [1] lz4/lz4#707 Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com> Signed-off-by: atndko <z1281552865@gmail.com> Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
Upstream lz4 mentioned a performance regression on Qualcomm SoCs when built with Clang, but not with GCC [1]. However, according to my testing on sm8350 with LLVM Clang 15, this patch does offer a nice 10% boost in decompression, so enable the fast dec loop for Clang as well. Testing procedure: - pre-fill zram with 1GB of real-word zram data dumped under memory pressure, for example $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000 - $ fio --readonly --name=randread --direct=1 --rw=randread \ --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \ --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M Results: - vanilla lz4: read: IOPS=1646k, BW=6431MiB/s (6743MB/s)(4000MiB/622msec) - lz4 fast dec: read: IOPS=1775k, BW=6932MiB/s (7269MB/s)(4000MiB/577msec) [1] lz4/lz4#707 Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com> Signed-off-by: atndko <z1281552865@gmail.com> Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
Upstream lz4 mentioned a performance regression on Qualcomm SoCs when built with Clang, but not with GCC [1]. However, according to my testing on sm8350 with LLVM Clang 15, this patch does offer a nice 10% boost in decompression, so enable the fast dec loop for Clang as well. Testing procedure: - pre-fill zram with 1GB of real-word zram data dumped under memory pressure, for example $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000 - $ fio --readonly --name=randread --direct=1 --rw=randread \ --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \ --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M Results: - vanilla lz4: read: IOPS=1646k, BW=6431MiB/s (6743MB/s)(4000MiB/622msec) - lz4 fast dec: read: IOPS=1775k, BW=6932MiB/s (7269MB/s)(4000MiB/577msec) [1] lz4/lz4#707 Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com> Signed-off-by: atndko <z1281552865@gmail.com> Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
Upstream lz4 mentioned a performance regression on Qualcomm SoCs when built with Clang, but not with GCC [1]. However, according to my testing on sm8350 with LLVM Clang 15, this patch does offer a nice 10% boost in decompression, so enable the fast dec loop for Clang as well. Testing procedure: - pre-fill zram with 1GB of real-word zram data dumped under memory pressure, for example $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000 - $ fio --readonly --name=randread --direct=1 --rw=randread \ --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \ --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M Results: - vanilla lz4: read: IOPS=1646k, BW=6431MiB/s (6743MB/s)(4000MiB/622msec) - lz4 fast dec: read: IOPS=1775k, BW=6932MiB/s (7269MB/s)(4000MiB/577msec) [1] lz4/lz4#707 Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com> Signed-off-by: atndko <z1281552865@gmail.com> Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
Upstream lz4 mentioned a performance regression on Qualcomm SoCs when built with Clang, but not with GCC [1]. However, according to my testing on sdm845 with LLVM Clang 15, this patch does offer a nice 5-10% boost in decompression, so enable the fast dec loop for Clang as well. Testing procedure: - pre-fill zram with 1GB of real-word zram data dumped under memory pressure, for example $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000 - $ fio --readonly --name=randread --direct=1 --rw=randread \ --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \ --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M Results: - vanilla lz4: read: IOPS=1282k, BW=5006MiB/s (5249MB/s)(4000MiB/799msec) - lz4 fast dec: read: IOPS=1382k, BW=5398MiB/s (5660MB/s)(4000MiB/741msec) [1] lz4/lz4#707 Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com>
Upstream lz4 mentioned a performance regression on Qualcomm SoCs when built with Clang, but not with GCC [1]. However, according to my testing on sm8350 with LLVM Clang 15, this patch does offer a nice 10% boost in decompression, so enable the fast dec loop for Clang as well. Testing procedure: - pre-fill zram with 1GB of real-word zram data dumped under memory pressure, for example $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000 - $ fio --readonly --name=randread --direct=1 --rw=randread \ --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \ --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M Results: - vanilla lz4: read: IOPS=1646k, BW=6431MiB/s (6743MB/s)(4000MiB/622msec) - lz4 fast dec: read: IOPS=1775k, BW=6932MiB/s (7269MB/s)(4000MiB/577msec) [1] lz4/lz4#707 Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com>
Upstream lz4 mentioned a performance regression on Qualcomm SoCs when built with Clang, but not with GCC [1]. However, according to my testing on sm8350 with LLVM Clang 15, this patch does offer a nice 10% boost in decompression, so enable the fast dec loop for Clang as well. Testing procedure: - pre-fill zram with 1GB of real-word zram data dumped under memory pressure, for example $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000 - $ fio --readonly --name=randread --direct=1 --rw=randread \ --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \ --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M Results: - vanilla lz4: read: IOPS=1646k, BW=6431MiB/s (6743MB/s)(4000MiB/622msec) - lz4 fast dec: read: IOPS=1775k, BW=6932MiB/s (7269MB/s)(4000MiB/577msec) [1] lz4/lz4#707 Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com>
Upstream lz4 mentioned a performance regression on Qualcomm SoCs when built with Clang, but not with GCC [1]. However, according to my testing on sm8350 with LLVM Clang 15, this patch does offer a nice 10% boost in decompression, so enable the fast dec loop for Clang as well. Testing procedure: - pre-fill zram with 1GB of real-word zram data dumped under memory pressure, for example $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000 - $ fio --readonly --name=randread --direct=1 --rw=randread \ --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \ --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M Results: - vanilla lz4: read: IOPS=1646k, BW=6431MiB/s (6743MB/s)(4000MiB/622msec) - lz4 fast dec: read: IOPS=1775k, BW=6932MiB/s (7269MB/s)(4000MiB/577msec) [1] lz4/lz4#707 Change-Id: Ie5c1671068770758d0557f3ec00f1e7545d28b4e Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com> Signed-off-by: HeroBuxx <me@herobuxx.me>
Upstream lz4 mentioned a performance regression on Qualcomm SoCs when built with Clang, but not with GCC [1]. However, according to my testing on sm8350 with LLVM Clang 15, this patch does offer a nice 10% boost in decompression, so enable the fast dec loop for Clang as well. Testing procedure: - pre-fill zram with 1GB of real-word zram data dumped under memory pressure, for example $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000 - $ fio --readonly --name=randread --direct=1 --rw=randread \ --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \ --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M Results: - vanilla lz4: read: IOPS=1646k, BW=6431MiB/s (6743MB/s)(4000MiB/622msec) - lz4 fast dec: read: IOPS=1775k, BW=6932MiB/s (7269MB/s)(4000MiB/577msec) [1] lz4/lz4#707 Change-Id: Ie5c1671068770758d0557f3ec00f1e7545d28b4e Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com>
Upstream lz4 mentioned a performance regression on Qualcomm SoCs when built with Clang, but not with GCC [1]. However, according to my testing on sm8350 with LLVM Clang 15, this patch does offer a nice 10% boost in decompression, so enable the fast dec loop for Clang as well. Testing procedure: - pre-fill zram with 1GB of real-word zram data dumped under memory pressure, for example $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000 - $ fio --readonly --name=randread --direct=1 --rw=randread \ --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \ --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M Results: - vanilla lz4: read: IOPS=1646k, BW=6431MiB/s (6743MB/s)(4000MiB/622msec) - lz4 fast dec: read: IOPS=1775k, BW=6932MiB/s (7269MB/s)(4000MiB/577msec) [1] lz4/lz4#707 Change-Id: Ie5c1671068770758d0557f3ec00f1e7545d28b4e Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com>
Upstream lz4 mentioned a performance regression on Qualcomm SoCs when built with Clang, but not with GCC [1]. However, according to my testing on sm8350 with LLVM Clang 15, this patch does offer a nice 10% boost in decompression, so enable the fast dec loop for Clang as well. Testing procedure: - pre-fill zram with 1GB of real-word zram data dumped under memory pressure, for example $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000 - $ fio --readonly --name=randread --direct=1 --rw=randread \ --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \ --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M Results: - vanilla lz4: read: IOPS=1646k, BW=6431MiB/s (6743MB/s)(4000MiB/622msec) - lz4 fast dec: read: IOPS=1775k, BW=6932MiB/s (7269MB/s)(4000MiB/577msec) [1] lz4/lz4#707 Change-Id: Ie5c1671068770758d0557f3ec00f1e7545d28b4e Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com>
Upstream lz4 mentioned a performance regression on Qualcomm SoCs when built with Clang, but not with GCC [1]. However, according to my testing on sm8350 with LLVM Clang 15, this patch does offer a nice 10% boost in decompression, so enable the fast dec loop for Clang as well. Testing procedure: - pre-fill zram with 1GB of real-word zram data dumped under memory pressure, for example $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000 - $ fio --readonly --name=randread --direct=1 --rw=randread \ --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \ --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M Results: - vanilla lz4: read: IOPS=1646k, BW=6431MiB/s (6743MB/s)(4000MiB/622msec) - lz4 fast dec: read: IOPS=1775k, BW=6932MiB/s (7269MB/s)(4000MiB/577msec) [1] lz4/lz4#707 Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com> Change-Id: I94a04ce371b2459db59be44e35fbaae14f35b941 Signed-off-by: Pranav Vashi <neobuddy89@gmail.com> (cherry picked from commit 9debe32) Signed-off-by: TogoFire <togofire@mailfence.com>
Upstream lz4 mentioned a performance regression on Qualcomm SoCs when built with Clang, but not with GCC [1]. However, according to my testing on sm8350 with LLVM Clang 15, this patch does offer a nice 10% boost in decompression, so enable the fast dec loop for Clang as well. Testing procedure: - pre-fill zram with 1GB of real-word zram data dumped under memory pressure, for example $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000 - $ fio --readonly --name=randread --direct=1 --rw=randread \ --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \ --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M Results: - vanilla lz4: read: IOPS=1646k, BW=6431MiB/s (6743MB/s)(4000MiB/622msec) - lz4 fast dec: read: IOPS=1775k, BW=6932MiB/s (7269MB/s)(4000MiB/577msec) [1] lz4/lz4#707 Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com> Change-Id: I94a04ce371b2459db59be44e35fbaae14f35b941 Signed-off-by: Pranav Vashi <neobuddy89@gmail.com> (cherry picked from commit 9debe32) Signed-off-by: TogoFire <togofire@mailfence.com>
Upstream lz4 mentioned a performance regression on Qualcomm SoCs when built with Clang, but not with GCC [1]. However, according to my testing on sm8350 with LLVM Clang 15, this patch does offer a nice 10% boost in decompression, so enable the fast dec loop for Clang as well. Testing procedure: - pre-fill zram with 1GB of real-word zram data dumped under memory pressure, for example $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000 - $ fio --readonly --name=randread --direct=1 --rw=randread \ --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \ --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M Results: - vanilla lz4: read: IOPS=1646k, BW=6431MiB/s (6743MB/s)(4000MiB/622msec) - lz4 fast dec: read: IOPS=1775k, BW=6932MiB/s (7269MB/s)(4000MiB/577msec) [1] lz4/lz4#707 Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com> Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
Upstream lz4 mentioned a performance regression on Qualcomm SoCs when built with Clang, but not with GCC [1]. However, according to my testing on sm8350 with LLVM Clang 15, this patch does offer a nice 10% boost in decompression, so enable the fast dec loop for Clang as well. Testing procedure: - pre-fill zram with 1GB of real-word zram data dumped under memory pressure, for example $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000 - $ fio --readonly --name=randread --direct=1 --rw=randread \ --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \ --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M Results: - vanilla lz4: read: IOPS=1646k, BW=6431MiB/s (6743MB/s)(4000MiB/622msec) - lz4 fast dec: read: IOPS=1775k, BW=6932MiB/s (7269MB/s)(4000MiB/577msec) [1] lz4/lz4#707 Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com> Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
Upstream lz4 mentioned a performance regression on Qualcomm SoCs when built with Clang, but not with GCC [1]. However, according to my testing on sm8350 with LLVM Clang 15, this patch does offer a nice 10% boost in decompression, so enable the fast dec loop for Clang as well. Testing procedure: - pre-fill zram with 1GB of real-word zram data dumped under memory pressure, for example $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000 - $ fio --readonly --name=randread --direct=1 --rw=randread \ --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \ --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M Results: - vanilla lz4: read: IOPS=1646k, BW=6431MiB/s (6743MB/s)(4000MiB/622msec) - lz4 fast dec: read: IOPS=1775k, BW=6932MiB/s (7269MB/s)(4000MiB/577msec) [1] lz4/lz4#707 Change-Id: Ie5c1671068770758d0557f3ec00f1e7545d28b4e Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com> Signed-off-by: HeroBuxx <me@herobuxx.me>
Pull request #645 has introduced the build macro LZ4_FAST_DEC_LOOP which by default enables an optimization only for x86/x64.
I propose to enable this optimization for aarch64 as well. Here are the benchmark results for this pull request running on a1.4xlarge AWS EC2 instance. The final percent is how much faster this patchset is vs. the current dev branch.