PPU LLVM: Optimize altivec FMA with 0 addend #8013

Merged
merged 1 commit into from Apr 12, 2020


Whatcookie
Member

One quirk of the AltiVec ISA is that it provides only floating-point multiply-add (FMA) and floating-point add instructions, with no standalone multiply. To execute a floating-point multiply without an add, you have to issue an FMA with an addend of 0.

Let's detect this case and emit only a floating multiply when a constant addend of 0 is used.
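For illustration, the detection described above could look roughly like the sketch below. This is not the code from `PPUTranslator.cpp`: `Operand`, `Lowering`, `is_constant_zero` and `select_vmaddfp_lowering` are hypothetical names, and the real translator works on LLVM IR values rather than this toy operand type.

```cpp
#include <array>
#include <optional>

// Four 32-bit float lanes, as in an AltiVec vector register.
using Vec4 = std::array<float, 4>;

// Hypothetical operand model: `constant` is set only when the value
// is known at translation time; otherwise it is a runtime value.
struct Operand {
    std::optional<Vec4> constant;
};

// Which operation the (hypothetical) translator would emit.
enum class Lowering { Multiply, FusedMultiplyAdd };

// True only when the operand is a known constant and every lane is zero.
// Note: +0.0f and -0.0f compare equal, so both are accepted here.
inline bool is_constant_zero(const Operand& op) {
    if (!op.constant) return false;
    for (float lane : *op.constant) {
        if (lane != 0.0f) return false;
    }
    return true;
}

// Pick a plain multiply when the addend is a constant zero,
// and fall back to the full FMA in every other case.
inline Lowering select_vmaddfp_lowering(const Operand& addend) {
    return is_constant_zero(addend) ? Lowering::Multiply
                                    : Lowering::FusedMultiplyAdd;
}
```

The key point is that the check only fires for an addend known to be zero at translation time; a runtime addend that merely happens to be zero still takes the FMA path.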

On Skylake the gains are very small, since FMA and floating-point multiply ops execute with the same latency, but on Ryzen floating-point multiply has lower latency than FMA, so it may benefit more. Anything without native FMA support should also benefit considerably.

@Whatcookie Whatcookie force-pushed the ppu_vpu branch 2 times, most recently from 5aa0f7b to a350dd3 Compare April 11, 2020 23:31
- When VMADDFP and VNMSUBFP are used with a constant addend of 0, they can be simplified into a single floating multiply
@AniLeo AniLeo merged commit 6b0f7a8 into RPCS3:master Apr 12, 2020