Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Aarch64 - improve after lea lowering #7929

Closed
wants to merge 4 commits into from

Conversation

swalk-cavium
Copy link
Contributor

This change improves the sequence generated when lowering the Vptr element of the lea
Vinstr by taking advantage of the sub immediate instruction. This saves 1 instruction
per instance.

Before

(07) StLocRange<[0, 108)> t0:FramePtr, Uninit
    Main:   
        0x52c0035c  d10043a0              sub x0, x29, #0x10 (16)
        0x52c00360  9280d9e1              movn x1, #0x6cf
        0x52c00364  8b0103a1              add x1, x29, x1 
        0x52c00368  3900201f              strb wzr, [x0, #8]

After

(07) StLocRange<[0, 108)> t0:FramePtr, Uninit
    Main:   
        0x50400404  d10043a0              sub x0, x29, #0x10 (16)
        0x50400408  d11b43a1              sub x1, x29, #0x6d0 (1744)  //<<---
        0x5040040c  3900201f              strb wzr, [x0, #8]

This was seen approximately 300 times in hphp/test/quick/all_type_comparison_test.php
and around 1000 times in hphp/test/zend/good/ext/intl/tests/grapheme.php

The standard regression tests were run with six option sets. No new failures were observed.

@swalk-cavium
Copy link
Contributor Author

@mxw - Hi Max, Can you inspect this one?

@jim-saxman
Copy link
Contributor

jim-saxman commented Jul 25, 2017

@swalk-cavium No new failures in either the unit tests or in OSS Performance suite.

@@ -99,6 +99,25 @@ bool simplify(Env& env, const loadb& inst, Vlabel b, size_t i) {

///////////////////////////////////////////////////////////////////////////////

bool simplify(Env& env, const ldimmq& inst, Vlabel b, size_t i) {
return if_inst<Vinstr::lea>(env, b, i + 1, [&] (const lea& ea) {
// ldimmq{s, tmp}; lea{tmp, d} -> subqi{s, d}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This description makes no sense. ldimmq{} has a Vreg dst, while lea{} has a Vptr src; those are incompatible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update this as follows:

// ldimmq{s, index}; lea{base[index], d} -> lea{base[s], d}

auto sf = env.unit.makeReg();
v << subqi{-inst.s.l(), ea.s.base, ea.d, sf};
return 2;
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this simplification intended to do? Currently, it could transform the following sequence:

ldimmq{-42, rimm};
lea{rvmfp()[0], d};
store{rimm, m};

into:

subqi{42, rvmfp(), d, sf};
store{rimm, m};

This is totally nonsensical and unsound, and I'm not really sure what you meant to do here.

@swalk-cavium
Copy link
Contributor Author

@mxw - Hi Max, the code is intended to detect this sequence

Initial generation

lea { Vptr{baseReg, DISP}, Vreg64{leaDstReg} }

After lowerVptr

ldimmq {DISP, newReg}
lea { Vptr{baseReg, newReg}, Vreg64{leaDstReg} }

After emission

movn newReg, -DISP
add leaDstReg, baseReg, newReg

When the final sequence could really be one
subtract instruction.

I think you're right. The conditions could be tighter to not fire in your
example. I'll update the request after testing finishes.

@facebook-github-bot
Copy link
Contributor

@swalk-cavium updated the pull request - view changes

@swalk-cavium
Copy link
Contributor Author

@mxw - Hi Max, I think this corrects the issue you noted. Regression test run with six
options sets. No additional failures. Thanks for finding it.

@mxw
Copy link
Contributor

mxw commented Aug 18, 2017

@swalk-cavium—Thinking on this a little more:

  • Can this go in the imm-folder instead of in simplify?
  • If not, why not? And can we rewrite the lea{} as an lea{} with immediate displacement instead of a subqi{}, to avoid setting the flags register?

@swalk-cavium
Copy link
Contributor Author

@mxw - Hi Max, I don't think this could be caught in the imm-folder. The immediate values used there are coming from the Vunit

554 // figure out which Vregs are constants and stash their values.
555 for (auto& entry : unit.constToReg) {
556 folder.valid.set(entry.second);
557 folder.vals[entry.second] = entry.first.val;
558 }
runtime/vm/jit/vasm-fold-imms.cpp

Won't the VisitSF pass take care of the status flags?

@mxw
Copy link
Contributor

mxw commented Aug 19, 2017

Maybe the ImmFolder should also look for ldimm instructions.

The flags strength-reduction pass would take care of this, but in general, optimization passes should avoid dependencies on one another. Why use a subqi{} when an lea{} is strictly weaker and does everything you need?

@swalk-cavium
Copy link
Contributor Author

@mxw - Hi Max, This peephole is needed because lea cannot do everything we need.

The extra instruction (movn in this case) is introduced during the lowering of the Vptr
in the initial lea instruction. lowerForARM --> lower(..., lea&, ...) --> lowerVptr().
I don't think we should pull the addressing logic into lower(..., lea&,...) function.

I think updating lowerVptr() to put the ldimmq immediates into the Vunit would have a
negative impact because it could potentially increase the live range. Right now the
range is only 1 instruction.

Updating the lea emitter is too late since the instruction creating the immediate has
already generated.

The lea instruction is emitted as an 'add rD,rN,uimm12'. There is no form for a signed
value. There is a 'sub rD,rN,uimm12' instruction which is what this is trying to utilize.

hhvm already generates the subqi in other contexts including the folder.

@mxw
Copy link
Contributor

mxw commented Aug 23, 2017

Can you change the lea{} emitter to emit either an add or a sub depending on whether the displacement is negative?

@swalk-cavium
Copy link
Contributor Author

@mxw - Hi Max. By the time you get to the emitter the constant is not available. lowerVptr() has
changed it into a register with a mov immediate.

@mxw
Copy link
Contributor

mxw commented Aug 23, 2017

I realize that what you're doing here is to rewrite an lea{} with a 0 displacement and a nontrivial constant-valued index register as a constant displacement. My question is—why can't you implement that constant displacement via an lea{}, instead of via a sub{} of some sort?

@swalk-cavium
Copy link
Contributor Author

@mxw - Hi Max, by the time you get to the emitter the constant is no longer available.
The lowerVptr() has removed constants outside the range of a simm9 into a separate register.
That range is the common denominator for all the addressing modes.

Are you suggesting making a special case of lower(..., lea&, ...) ? In that case it would still have
to know some specifics of the Vptr internals like the peephole.

@facebook-github-bot
Copy link
Contributor

@mxw has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@swalk-cavium updated the pull request - view changes - changes since last import

@swalk-cavium
Copy link
Contributor Author

@mxw - Hi Max, Updated per our extended IRC last week. Ran regression tests 6 option sets, no new regressions. Examined hphp.log and it cleans up the extra movn instructions as expected.

// ldimmq{s, tmp}; lea{tmp, d} -> subqi{s, d}
if (!(env.use_counts[inst.d] == 1 &&
inst.s.q() <= -1 &&
inst.s.q() >= -4095 &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are positive values up through 4095 not also suitable for this optimization?

return simplify_impl(env, b, i, [&] (Vout& v) {
// eXtend ea - lowerVptr() too conservative.
Vptr xea{ea.s.base, inst.s.l()};
v << lea{xea, ea.d};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just

v << lea{ea.s.base[inst.s.l()], ea.d};

This notation reads more clearly than the opaque Vptr constructor.

@@ -99,6 +99,25 @@ bool simplify(Env& env, const loadb& inst, Vlabel b, size_t i) {

///////////////////////////////////////////////////////////////////////////////

bool simplify(Env& env, const ldimmq& inst, Vlabel b, size_t i) {
return if_inst<Vinstr::lea>(env, b, i + 1, [&] (const lea& ea) {
// ldimmq{s, tmp}; lea{tmp, d} -> subqi{s, d}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update this as follows:

// ldimmq{s, index}; lea{base[index], d} -> lea{base[s], d}

@facebook-github-bot
Copy link
Contributor

@swalk-cavium updated the pull request - view changes - changes since last import

@swalk-cavium
Copy link
Contributor Author

@mxw - Updated per comments, retested.

@swalk-cavium
Copy link
Contributor Author

@mxw - Hi Max, Any further comments on this one?

@facebook-github-bot
Copy link
Contributor

@mxw has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@swalk-cavium
Copy link
Contributor Author

@mxw - Hi Max, Any further comments on this one?

Copy link
Contributor

@hhvm-bot hhvm-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mxw is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@swalk-cavium
Copy link
Contributor Author

@max - Hi Max, I retested this one this morning. It still saves 1 instruction whenever it fires.
Anything else I need to do?

Copy link
Contributor

@hhvm-bot hhvm-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mxw is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@hhvm-bot hhvm-bot closed this in 57cf791 Feb 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants