SPU LLVM: Use VDBPSADBW in SUMB #10937

Whatcookie · 2021-09-27T09:33:32Z

I overlooked this instruction earlier when optimizing the VNNI path for SUMB.

This AVX-512 instruction computes the absolute differences between the bytes of two vectors and then sums them horizontally. With an all-zero second vector, each absolute difference is just the input byte, so the instruction reduces to a plain horizontal byte sum.

This instruction should be faster than the VNNI alternative since we don't need to zero out the destination register, and we don't need to load a vector full of the constant 0x01. Additionally, when op.ra == op.rb, we only need a single instruction for the correct behavior.

- This instruction can be used to sum bytes horizontally if the second input vector is all zeroes.

- The first element can be extracted via vmovd rather than vpextrd, which saves 1 uop.
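
To illustrate the reasoning above, here is a minimal scalar sketch, not the actual RPCS3 code. All names (`sad_groups`, `dot_accumulate`, `sum_bytes_sad`, `sum_bytes_dot`) are hypothetical. It assumes the VNNI path behaves like a byte dot product that accumulates into its destination, and it models the results as 32-bit values for simplicity. It shows that a sum of absolute differences against zero gives the same per-group byte sums as a dot product with a vector of 0x01, without needing a zeroed accumulator or a constant vector.

```cpp
#include <array>
#include <cstdint>
#include <cstdlib>

using bytes16 = std::array<std::uint8_t, 16>;
using sums4 = std::array<std::uint32_t, 4>;

// Hypothetical scalar model of a SAD over 4-byte groups:
// out[w] = sum of |a[i] - b[i]| over the four bytes of group w.
inline sums4 sad_groups(const bytes16& a, const bytes16& b)
{
    sums4 out{};
    for (int w = 0; w < 4; w++)
        for (int i = 0; i < 4; i++)
            out[w] += static_cast<std::uint32_t>(
                std::abs(int(a[w * 4 + i]) - int(b[w * 4 + i])));
    return out;
}

// Hypothetical scalar model of an accumulating byte dot product
// (assumed shape of the VNNI alternative): acc[w] += sum of a[i] * b[i].
inline sums4 dot_accumulate(sums4 acc, const bytes16& a, const bytes16& b)
{
    for (int w = 0; w < 4; w++)
        for (int i = 0; i < 4; i++)
            acc[w] += std::uint32_t(a[w * 4 + i]) * std::uint32_t(b[w * 4 + i]);
    return acc;
}

// SAD path: the only extra operand is an all-zero vector.
inline sums4 sum_bytes_sad(const bytes16& v)
{
    return sad_groups(v, bytes16{});
}

// Dot-product path: needs a zeroed accumulator and a vector of 0x01 bytes.
inline sums4 sum_bytes_dot(const bytes16& v)
{
    bytes16 ones;
    ones.fill(0x01);
    return dot_accumulate(sums4{}, v, ones);
}
```

Both helpers return the same sums for any input; the difference is only in how many extra operands have to be prepared before the main operation.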

Whatcookie · 2021-09-29T09:14:14Z

Added a commit to optimize some branches following a byteswap.

Nekotekina · 2021-09-29T23:15:02Z

Hmm, how about using the new syntax for intrinsics? See fre/fmax.

Megamouse added the CPU and Optimization (Optimizes existing code) labels Sep 28, 2021

Whatcookie added 2 commits September 29, 2021 00:44

SPU LLVM: Use VDBPSADBW in SUMB

b3e7e5d

- This instruction can be used to sum bytes horizontally if the second input vector is all zeroes.

SPU LLVM: Optimize branches following byteswaps

06585d9

- The first element can be extracted via vmovd rather than vpextrd, which saves 1 uop.

Whatcookie force-pushed the spu branch from c7e10eb to 06585d9 Compare September 29, 2021 09:11

LLVM DSL: reimplement vdbpsadbw

e4624d8

Nekotekina merged commit 2cfa540 into RPCS3:master Sep 30, 2021

Asinin3 mentioned this pull request Sep 30, 2021

Error while linking PPU Modules #10947

Closed
