
SPU LLVM: Use VDBPSADBW in SUMB #10937

Merged
merged 3 commits into from Sep 30, 2021

Whatcookie (Member) commented Sep 27, 2021

I overlooked this instruction earlier when optimizing the VNNI path for SUMB.

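This AVX-512 instruction computes the absolute differences between the unsigned bytes of two vectors and then sums them horizontally in groups of four, producing 16-bit results. If the second vector is all zeroes, each absolute difference is just the byte itself (|x - 0| = x), so the instruction effectively sums bytes horizontally.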
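To make the zero-operand trick concrete, here is a scalar C model of one 128-bit lane of VDBPSADBW, written from the Intel SDM description. It is a sketch, not RPCS3 code: the names `sad4` and `dbpsadbw_lane` are hypothetical, and the imm8 dword shuffle that the instruction applies to its second operand is assumed to have been applied already. That omission is harmless here, because shuffling an all-zero vector changes nothing.

```c
#include <assert.h>
#include <stdint.h>

/* Sum of absolute differences of four unsigned bytes (max 4 * 255 = 1020). */
static uint16_t sad4(const uint8_t *a, const uint8_t *b)
{
    uint16_t s = 0;
    for (int i = 0; i < 4; i++)
        s += (uint16_t)(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
    return s;
}

/* Hypothetical scalar model of one 128-bit lane of VDBPSADBW.
 * tmp is the second operand AFTER the imm8 dword shuffle (shuffle omitted).
 * Each 64-bit chunk yields four 16-bit SADs at sliding tmp offsets. */
static void dbpsadbw_lane(const uint8_t src1[16], const uint8_t tmp[16],
                          uint16_t dst[8])
{
    for (int c = 0; c < 2; c++) {
        const uint8_t *s = src1 + 8 * c;
        const uint8_t *t = tmp + 8 * c;
        dst[4 * c + 0] = sad4(s,     t + 0); /* src1 bytes 0..3 vs tmp 0..3 */
        dst[4 * c + 1] = sad4(s,     t + 1); /* src1 bytes 0..3 vs tmp 1..4 */
        dst[4 * c + 2] = sad4(s + 4, t + 2); /* src1 bytes 4..7 vs tmp 2..5 */
        dst[4 * c + 3] = sad4(s + 4, t + 3); /* src1 bytes 4..7 vs tmp 3..6 */
    }
}
```

With bytes 1 through 16 in `src1` and an all-zero `tmp`, `dst` comes out as 10, 10, 26, 26, 42, 42, 58, 58: each pair of 16-bit words (one 32-bit element) holds the sum of that element's four bytes in both halves.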

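This instruction should be faster than the VNNI alternative, since we don't need to zero out the destination register and we don't need to load a vector full of the constant 0x01. Additionally, when op.ra == op.rb, a single instruction produces the correct result: SUMB then needs the same four-byte sum in both halfwords of each word, and with a zero second operand VDBPSADBW writes that sum to both 16-bit halves of each 32-bit element.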
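The `op.ra == op.rb` shortcut can be checked with a scalar model. The sketch below compares a reference SUMB (per the SPU ISA: the upper halfword of each word receives the sum of RB's four bytes, the lower halfword the sum of RA's) against an emulation built from the zero-operand SAD. All helper names are hypothetical, and the ra != rb path (two SADs plus a 16-bit blend) is only an assumed way to combine the results, not necessarily what the PR emits.

```c
#include <assert.h>
#include <stdint.h>

/* Sum of the four bytes of a word; byte order does not affect the sum. */
static uint16_t byte_sum(uint32_t w)
{
    return (uint16_t)((w & 0xFF) + ((w >> 8) & 0xFF) +
                      ((w >> 16) & 0xFF) + (w >> 24));
}

/* Reference SUMB per word slot: hi halfword = sum(rb), lo halfword = sum(ra). */
static void sumb_ref(const uint32_t ra[4], const uint32_t rb[4], uint32_t rt[4])
{
    for (int i = 0; i < 4; i++)
        rt[i] = ((uint32_t)byte_sum(rb[i]) << 16) | byte_sum(ra[i]);
}

/* Model of one 32-bit element of VDBPSADBW(x, zero): both halves hold the sum. */
static uint32_t sad_zero_dword(uint32_t w)
{
    uint32_t s = byte_sum(w);
    return (s << 16) | s;
}

/* Hypothetical emulation: one SAD when ra and rb are the same register,
 * otherwise two SADs, taking the low halfword from ra's result and the
 * high halfword from rb's (a 16-bit blend). */
static void sumb_via_sad(const uint32_t ra[4], const uint32_t rb[4], uint32_t rt[4])
{
    for (int i = 0; i < 4; i++) {
        uint32_t a = sad_zero_dword(ra[i]);
        if (ra == rb) {
            rt[i] = a; /* single instruction is already correct */
        } else {
            uint32_t b = sad_zero_dword(rb[i]);
            rt[i] = (b & 0xFFFF0000u) | (a & 0x0000FFFFu);
        }
    }
}
```

The pointer comparison `ra == rb` stands in for the register-index check `op.ra == op.rb`.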

Megamouse added the CPU and Optimization labels Sep 28, 2021
- This instruction can be used to sum bytes horizontally if the second input vector is all zeroes.
- The first element can be extracted via vmovd rather than vpextrd, which saves 1 uop.
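On x86-64, the two ways of extracting the first element mentioned above correspond to these C intrinsics. This is a minimal sketch assuming GCC or Clang on x86-64 (the `target` attribute enables SSE4.1 for the one function that needs it); the function names are hypothetical. Both return the same value; the difference is only in encoding and uop count. Optimizing compilers may lower `_mm_extract_epi32(v, 0)` to MOVD on their own, so the PEXTRD form is mainly illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <immintrin.h>

/* _mm_cvtsi128_si32 (SSE2) corresponds to (V)MOVD r32, xmm. */
static int32_t first_dword_movd(__m128i v)
{
    return _mm_cvtsi128_si32(v);
}

/* _mm_extract_epi32 (SSE4.1) corresponds to (V)PEXTRD r32, xmm, imm8.
 * For element 0 it yields the same value, at the cost of one extra uop. */
__attribute__((target("sse4.1")))
static int32_t first_dword_pextrd(__m128i v)
{
    return _mm_extract_epi32(v, 0);
}
```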
Whatcookie (Member, Author) commented:

Added a commit to optimize some branches following a byteswap.

Nekotekina (Member) commented:

Hmm, how about new syntax for intrinsics? See fre/fmax.

Nekotekina merged commit 2cfa540 into RPCS3:master Sep 30, 2021