SPU LLVM: Rearrange FM instruction for better performance #9896

Merged
merged 1 commit into from Mar 8, 2021

Whatcookie
Member

Rearranges the FM instruction to allow the comparisons and the multiplication to be processed in parallel. This doesn't save any instructions, but still results in a speedup.

In the Mandelbrot homebrew, performance increased from 120 to 122 fps on my 7700K at 5 GHz.

On a CPU with more out-of-order execution resources, such as my i5-1135G7 at 2.6 GHz, performance increased from 69 to 74 fps.

- Doesn't eliminate any instructions, but allows for better out of order execution.
Contributor

Yahfz commented Mar 4, 2021

Got a nice boost here.
149-151 -> 161
9900KS 5.3

const auto cb = eval(bitcast<f32[4]>(bitcast<s32[4]>(b) & ma));
set_vr(op.rt, fm(ca, cb));
const auto cx = eval(ma & mb);
const auto x = fm(a, b);
Contributor

Shouldn't you modify fm function as well? llvm expressions detection relies on this.

Member

fm is currently just an unnecessary alias for the multiplication operator.

Member Author

> Shouldn't you modify fm function as well? llvm expressions detection relies on this.

The only time we look for the fm pattern is in is_input_positive, which should still work, since it only looks for the case where a = b.
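
As a rough illustration of the "a = b" case mentioned above, the toy check below accepts a multiply only when both operands are the same node, on the grounds that a real number multiplied by itself is never negative. The `Expr` type and `is_input_positive_sketch` are hypothetical stand-ins written for this example. They are not RPCS3's expression matcher or its `is_input_positive`.

```cpp
// Toy expression node (hypothetical; not RPCS3's expression types).
struct Expr
{
    enum class Kind { Input, Mul };
    Kind kind;
    const Expr* lhs = nullptr;
    const Expr* rhs = nullptr;
};

// Accept only a multiply whose two operands are the very same node (x * x).
bool is_input_positive_sketch(const Expr& e)
{
    return e.kind == Expr::Kind::Mul && e.lhs != nullptr && e.lhs == e.rhs;
}
```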

@Megamouse Megamouse added the Optimization Optimizes existing code label Mar 6, 2021
const auto ca = eval(bitcast<f32[4]>(bitcast<s32[4]>(a) & mb));
const auto cb = eval(bitcast<f32[4]>(bitcast<s32[4]>(b) & ma));
set_vr(op.rt, fm(ca, cb));
const auto cx = eval(ma & mb);
Member

I wonder if it could be more correct to & first, then use sext, but I won't bother trying it for now.
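
The two orderings in that suggestion can be compared with a small scalar sketch. It assumes that `ma` and `mb` are sign-extended "input is nonzero" compare results, which the excerpts above do not show, and the function names are hypothetical.

```cpp
#include <cstdint>

// Sign-extend a 1-bit result to a 32-bit lane mask: true -> -1 (all ones), false -> 0.
static std::int32_t sext_i1(bool bit) { return bit ? -1 : 0; }

// Assumed current order: extend each compare result, then AND the two masks.
std::int32_t mask_sext_then_and(float a, float b)
{
    const std::int32_t ma = sext_i1(a != 0.0f);
    const std::int32_t mb = sext_i1(b != 0.0f);
    return ma & mb;
}

// Suggested order: AND the 1-bit compare results, then extend once.
std::int32_t mask_and_then_sext(float a, float b)
{
    return sext_i1((a != 0.0f) && (b != 0.0f));
}
```

Both functions produce the same mask for every input pair. Whether the second form generates better vector code is a separate question that this sketch does not answer.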

@Nekotekina Nekotekina merged commit e5d0e03 into RPCS3:master Mar 8, 2021