PPU/SPU LLVM: Emulate VPERM2B with a 256 bit wide VPERMB (AVX512 optimization) #10959

Merged
merged 1 commit into RPCS3:master on Oct 13, 2021

Whatcookie
Member

Saves 1 uop by using a 256-bit-wide VPERMB instead of VPERM2B. The new sequence compiles down to a vinserti128 and a vpermb.

On my Tiger Lake laptop, the AVX2 path has a throughput of one shufb per 4 cycles. The old AVX512 path manages one per 3 cycles, and the new AVX512 path one per 2.3 cycles. (Tested in a small benchmark I wrote in asm before rewriting it in LLVM IR.)
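For readers unfamiliar with the trick, here is a minimal scalar sketch of why a single-table 256-bit byte permute can stand in for a two-table 128-bit one. It is not the code from this PR: it assumes "VPERM2B" refers to the AVX-512 VBMI two-source permutes (VPERMI2B/VPERMT2B), models only the index-selection semantics in plain C++, and ignores the rest of the shufb emulation. All function and type names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Vec128 = std::array<std::uint8_t, 16>;
using Vec256 = std::array<std::uint8_t, 32>;

// Reference model: 128-bit two-table byte permute.
// Index bits 3:0 pick the byte, bit 4 picks the table (0 = a, 1 = b).
Vec128 permute2_128(const Vec128& a, const Vec128& b, const Vec128& idx) {
    Vec128 out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        const std::uint8_t sel = idx[i] & 0x1F;
        out[i] = (sel & 0x10) ? b[sel & 0x0F] : a[sel & 0x0F];
    }
    return out;
}

// Model of vinserti128: lo fills bytes 0..15, hi fills bytes 16..31.
Vec256 concat_256(const Vec128& lo, const Vec128& hi) {
    Vec256 out{};
    for (std::size_t i = 0; i < 16; ++i) {
        out[i] = lo[i];
        out[i + 16] = hi[i];
    }
    return out;
}

// Model of 256-bit VPERMB: index bits 4:0 pick one of the 32 table bytes.
Vec256 permute_256(const Vec256& table, const Vec256& idx) {
    Vec256 out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = table[idx[i] & 0x1F];
    }
    return out;
}

// Emulation: one concatenation plus one single-table permute.
Vec128 permute2_via_256(const Vec128& a, const Vec128& b, const Vec128& idx) {
    const Vec256 table = concat_256(a, b);
    Vec256 wide_idx{};  // upper 16 index bytes are don't-care; left as zero
    for (std::size_t i = 0; i < 16; ++i) {
        wide_idx[i] = idx[i];
    }
    const Vec256 wide = permute_256(table, wide_idx);
    Vec128 out{};
    for (std::size_t i = 0; i < 16; ++i) {
        out[i] = wide[i];  // only the low 128 bits of the result are used
    }
    return out;
}

// Checks that both models agree for every possible index byte value.
void check_equivalence() {
    Vec128 a{};
    Vec128 b{};
    for (std::size_t i = 0; i < 16; ++i) {
        a[i] = static_cast<std::uint8_t>(i);
        b[i] = static_cast<std::uint8_t>(0x80 + i);
    }
    for (int v = 0; v < 256; ++v) {
        Vec128 idx{};
        idx.fill(static_cast<std::uint8_t>(v));
        assert(permute2_via_256(a, b, idx) == permute2_128(a, b, idx));
    }
}
```

The two models agree because bit 4 of each index, which selects the table in the two-source form, is simply the top bit of a 5-bit offset into the 32-byte concatenation.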

- Save 1 uop by using 256 wide VPERMB instead of VPERM2B. (Compiles down to a vinserti128 and vpermb)
@Megamouse added the CPU and Optimization (Optimizes existing code) labels on Oct 5, 2021
@Nekotekina merged commit f06c8b2 into RPCS3:master on Oct 13, 2021