Aarch64 - improve after lea lowering #7929
@mxw - Hi Max, Can you inspect this one?
@swalk-cavium No new failures in either the unit tests or in OSS Performance suite.
@@ -99,6 +99,25 @@ bool simplify(Env& env, const loadb& inst, Vlabel b, size_t i) {

///////////////////////////////////////////////////////////////////////////////

bool simplify(Env& env, const ldimmq& inst, Vlabel b, size_t i) {
  return if_inst<Vinstr::lea>(env, b, i + 1, [&] (const lea& ea) {
    // ldimmq{s, tmp}; lea{tmp, d} -> subqi{s, d}
This description makes no sense. ldimmq{} has a Vreg dst, while lea{} has a Vptr src; those are incompatible.
Please update this as follows:
// ldimmq{s, index}; lea{base[index], d} -> lea{base[s], d}
      auto sf = env.unit.makeReg();
      v << subqi{-inst.s.l(), ea.s.base, ea.d, sf};
      return 2;
  });
What is this simplification intended to do? Currently, it could transform the following sequence:
ldimmq{-42, rimm};
lea{rvmfp()[0], d};
store{rimm, m};
into:
subqi{42, rvmfp(), d, sf};
store{rimm, m};
This is totally nonsensical and unsound, and I'm not really sure what you meant to do here.
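To make the objection above concrete, here is a minimal sketch of the precondition the rewrite seems to need: the lea must actually consume the register loaded by ldimmq, and nothing else may read that register. The types `LdImm` and `Lea`, the `kNoReg` sentinel, and the helper `canFoldIntoLea` are hypothetical stand-ins invented for illustration; they are not HHVM's vasm API, and the real peephole may need further checks (for example on scale or an existing displacement).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins for vasm instructions. These are NOT
// HHVM's real types; they only illustrate the soundness condition.
using Reg = int;
constexpr Reg kNoReg = -1;

struct LdImm { std::int64_t imm; Reg dst; };   // models ldimmq{imm, dst}
struct Lea   { Reg base; Reg index; Reg dst; }; // models lea{base[index], dst}

// Folding the immediate into the lea is only safe when the lea's index is
// exactly the register written by the load, and that register has no other
// readers (otherwise deleting the load would break them).
bool canFoldIntoLea(const LdImm& ld, const Lea& ea,
                    const std::vector<int>& useCounts) {
  if (ea.index == kNoReg || ea.index != ld.dst) return false;
  return useCounts.at(static_cast<std::size_t>(ld.dst)) == 1;
}
```

In the sequence quoted above, the lea addresses `rvmfp()[0]` and never reads `rimm`, so a check of this shape would reject the rewrite and leave the `ldimmq` in place for the later `store`.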
@mxw - Hi Max, the code is intended to detect this sequence:
Initial generation
lea { Vptr{baseReg, DISP}, Vreg64{leaDstReg} }
After lowerVptr
ldimmq {DISP, newReg}
After emission
movn newReg, -DISP
When the final sequence could really be one
I think you're right. The conditions could be tighter to not fire in your
@swalk-cavium updated the pull request - view changes
@mxw - Hi Max, I think this corrects the issue you noted. Regression test run with six
@swalk-cavium—Thinking on this a little more:
@mxw - Hi Max, I don't think this could be caught in the imm-folder. The immediate values used there are coming from the Vunit 554 // figure out which Vregs are constants and stash their values. Won't the VisitSF pass take care of the status flags?
Maybe the ImmFolder should also look for The flags strength-reduction pass would take care of this, but in general, optimization passes should avoid dependencies on one another. Why use a
@mxw - Hi Max, This peephole is needed because lea cannot do everything we need. The extra instruction (movn in this case) is introduced during the lowering of the Vptr I think updating lowerVptr() to put the ldimmq immediates into the Vunit would have a Updating the lea emitter is too late since the instruction creating the immediate has The lea instruction is emitted as an 'add rD,rN,uimm12'. There is no form for a signed hhvm already generates the subqi in other contexts including the folder.
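The encoding constraint mentioned above can be sketched as follows. AArch64 add/sub immediate instructions take an unsigned 12-bit immediate, so a negative displacement cannot be expressed as an `add` and has to be flipped into a `sub`. The enum `AddrOp` and function `selectAddrOp` are hypothetical names used only for this illustration, and the sketch deliberately ignores the optional shift-by-12 form of the immediate.

```cpp
#include <cstdint>

// Hypothetical classification of how base + disp could be computed on
// AArch64; not part of HHVM.
enum class AddrOp { Add, Sub, NeedsRegister };

AddrOp selectAddrOp(std::int64_t disp) {
  constexpr std::int64_t kMaxUimm12 = 4095;  // largest unsigned 12-bit value
  if (disp >= 0 && disp <= kMaxUimm12) return AddrOp::Add;   // add rD, rN, #disp
  if (disp < 0 && disp >= -kMaxUimm12) return AddrOp::Sub;   // sub rD, rN, #-disp
  return AddrOp::NeedsRegister;  // immediate must be materialized first
}
```

Under these assumptions, a displacement of -16 maps to `Sub`, while -4096 falls outside the unshifted range and would still need a separate instruction to build the constant.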
Can you change the
@mxw - Hi Max. By the time you get to the emitter the constant is not available. lowerVptr() has
I realize that what you're doing here is to rewrite an
@mxw - Hi Max, by the time you get to the emitter the constant is no longer available. Are you suggesting making a special case of lower(..., lea&, ...) ? In that case it would still have
@mxw has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@swalk-cavium updated the pull request - view changes - changes since last import
@mxw - Hi Max, Updated per our extended IRC last week. Ran regression tests 6 option sets, no new regressions. Examined hphp.log and it cleans up the extra movn instructions as expected.
    // ldimmq{s, tmp}; lea{tmp, d} -> subqi{s, d}
    if (!(env.use_counts[inst.d] == 1 &&
          inst.s.q() <= -1 &&
          inst.s.q() >= -4095 &&
Are positive values up through 4095 not also suitable for this optimization?
    return simplify_impl(env, b, i, [&] (Vout& v) {
      // eXtend ea - lowerVptr() too conservative.
      Vptr xea{ea.s.base, inst.s.l()};
      v << lea{xea, ea.d};
Just
v << lea{ea.s.base[inst.s.l()], ea.d};
This notation reads more clearly than the opaque Vptr constructor.
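For readers unfamiliar with the `base[disp]` notation, the idea can be sketched with a toy register type whose `operator[]` builds the same pointer value as the explicit constructor. `ToyReg` and `ToyPtr` are hypothetical types made up for this example; they only mimic the shape of the suggestion and are not HHVM's `Vreg` or `Vptr`.

```cpp
#include <cstdint>

// Toy pointer operand: a base register plus a signed displacement.
struct ToyPtr {
  int base;
  std::int32_t disp;
};

// Toy register whose subscript operator yields base + displacement, so
// call sites can write r[disp] instead of ToyPtr{r.id, disp}.
struct ToyReg {
  int id;
  ToyPtr operator[](std::int32_t disp) const { return ToyPtr{id, disp}; }
};
```

With these definitions, `ToyReg{3}[-16]` and `ToyPtr{3, -16}` describe the same operand; the subscript form simply reads like the address expression it represents.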
@swalk-cavium updated the pull request - view changes - changes since last import
@mxw - Updated per comments, retested.
@mxw - Hi Max, Any further comments on this one?
@mxw has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@mxw - Hi Max, Any further comments on this one?
@mxw is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@max - Hi Max, I retested this one this morning. It still saves 1 instruction whenever it fires.
This change improves the sequence generated when lowering the Vptr element of the lea
Vinstr by taking advantage of the sub immediate instruction. This saves 1 instruction
per instance.
Before
After
This was seen approximately 300 times in hphp/test/quick/all_type_comparison_test.php
and around 1000 times in hphp/test/zend/good/ext/intl/tests/grapheme.php.
The standard regression tests were run with six option sets. No new failures were observed.