
AArch64 Fix test/slow/async/simple_meth.php #7943

Conversation

jim-saxman
Contributor

On AArch64 systems, the unit test test/slow/async/simple_meth.php
fails when run with the -r (repoAuth) flag due to the early-
truncation policy (see comment at top of vasm-arm.cpp). This patch
fixes the test case.

@jim-saxman
Contributor Author

Hi @mxw, would you please review this?
Hi @swalk-cavium, is there a chance you can test this?
I believe it was overlooked in #7851.

@dave-estes-QCOM
Contributor

After a second look, I'm a little uneasy about this change. At first I thought it was in keeping with the changes made in 61b3b0d to do special truncation for ARM soon after the defs of narrow Vregs (8/16). That work was primarily in vasm-lower.cpp, not in irlower-ret.cpp where this patch puts it. I wonder if there's some way to move it there?

Is everyone aware of what's happening with this issue? It comes down to one of the key changes in f9ecdf1 that was made possible by this truncation. The semantics of the movz* vasm instructions are to zero-extend when moving from a narrow to a wider Vreg. These changes bypassed that requirement by attempting to ensure that any Vreg8/16 that is ever def'd is immediately truncated, so that it can later be simply copied from one ARM register to another using the 32-bit "w" register names.
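
To illustrate what that policy buys (a minimal standalone sketch, not HHVM's actual code): if the narrow value is masked once, right where it is def'd, a later zero-extending move really can be lowered to a plain copy.

#include <cstdint>
#include <cstdio>

int main() {
  // A 64-bit register whose low byte is the Vreg8 value of interest and whose
  // upper bits are whatever the producing instruction happened to leave there.
  uint64_t def = 0x1122334455667741ULL;

  // Option A: the movz* instruction does real work at the use site.
  uint64_t zeroExtendedAtUse = def & 0xff;

  // Option B: the early-truncation policy masks once, right after the def,
  // so the later widening can be lowered to a plain register copy.
  uint64_t truncatedAtDef = def & 0xff;
  uint64_t copiedAtUse    = truncatedAtDef;

  std::printf("zero-extend at the use:       %#llx\n",
              (unsigned long long)zeroExtendedAtUse);
  std::printf("truncate at def, copy at use: %#llx\n",
              (unsigned long long)copiedAtUse);  // same result, 0x41
}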

This particular case is a sticky spot that we seem to have missed. I was never really comfortable with this series of changes, but there seemed to be a lot of desire to get them in, since doing explicit narrowing before vasm like cmpb was expensive on some platforms. My discomfort arises because I think this approach is too fragile.

I'm probably OK with spot fixing this one bug and hoping there aren't more existing or yet to be produced. However, I'm more of the opinion that we should rethink all of our ARM specific narrowing. Instead of doing the required signed/unsigned narrowing/widening when lowering, perhaps we should have implemented a pass for ARM with some use-def analysis in order to raise the narrowing/widening as close to the def as possible instead of repeatedly doing it near the uses like we were doing in the original port.
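
A toy sketch of that kind of lifting pass (hypothetical instruction representation, nothing from HHVM's actual Vunit/vasm API): one truncation is inserted right after each def of a narrow register, which makes the per-use truncations redundant.

#include <cstdio>
#include <string>
#include <vector>

// Hypothetical, non-SSA toy IR: "def" produces a narrow 8-bit value in reg,
// "trunc" masks reg to 8 bits in place, "use" consumes reg.
struct Instr { std::string op; int reg; };
using Program = std::vector<Instr>;

// Lift the narrowing toward the def: emit one trunc immediately after each
// def and drop the truncs that are now redundant. A real vasm pass would also
// have to introduce fresh Vregs to stay in SSA form, as noted later in this
// thread.
Program liftTruncation(const Program& in) {
  Program out;
  std::vector<bool> clean(16, false);
  for (auto const& i : in) {
    if (i.op == "trunc" && clean[i.reg]) continue;  // already truncated at def
    out.push_back(i);
    if (i.op == "def") {
      out.push_back({"trunc", i.reg});
      clean[i.reg] = true;
    }
  }
  return out;
}

int main() {
  // Original-port style: re-truncate near every use.
  Program p = {
    {"def", 1},
    {"trunc", 1}, {"use", 1},
    {"trunc", 1}, {"use", 1},
  };
  for (auto const& i : liftTruncation(p)) {
    std::printf("%s r%d\n", i.op.c_str(), i.reg);
  }
}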

Anyhow, my look into this whole thing is still pretty fresh. I'm mostly just bringing it up in hopes of a good discussion.

@swalk-cavium
Contributor

swalk-cavium commented Aug 2, 2017 via email

@swalk-cavium
Contributor

@jim-saxman - I ran the regression tests with six option sets with your change cherry-picked onto the latest commit at that time. I think we're back to the baseline.

@mxw self-assigned this on Aug 16, 2017
@mxw
Contributor

mxw commented Aug 17, 2017

I fully agree with @dave-estes. While I think the current solution is better than what we had before, I think a narrowing-lifting pass would be better still than the current solution. (Notably, it avoids an existing unsoundness around signedness tests for, e.g., byte compares that we implement as zero-extended dword compares. It happens that we don't ever test signedness on bytes or words, but if we ever did (and vasm allows it), things would break immediately.)
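
To make that parenthetical concrete (a standalone illustration, not vasm): a signed byte comparison and a signed comparison of the zero-extended values disagree as soon as a negative byte is involved.

#include <cstdint>
#include <cstdio>

int main() {
  int8_t a = -1;  // bit pattern 0xff
  int8_t b = 1;

  bool byteSignedLess = a < b;               // true: -1 < 1
  bool zextSignedLess = int32_t(uint8_t(a))  // 255
                      < int32_t(uint8_t(b)); // 1  -> false

  std::printf("signed byte compare: %d, zero-extended dword compare: %d\n",
              byteSignedLess, zextSignedLess);
}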

As far as this specific fix goes, it makes no sense to me. The movzbq{} instruction is required to zero bits 8..63 already. Moreover, doing an andqi{} with a byte register operand is illegal in vasm. As far as I can tell, the vasm we're emitting is already supposed to do what you intend, and you've just reimplemented it in a way that breaks invariants. If movzbq{} doesn't zero all the upper bits currently, it means your implementation of vasm is incorrect.
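
For reference, one way an AArch64 back end could honor that movzbq{} contract with a single instruction (a hedged, standalone demonstration using inline assembly, not HHVM's emitter): uxtb to a W register zero-extends the low byte, and because any W-register write also clears bits 32..63, the destination ends up with bits 8..63 all zero.

#include <cstdint>
#include <cstdio>

int main() {
#if defined(__aarch64__)
  uint64_t src = 0xdeadbeefcafe40ffULL;  // low byte 0xff, garbage above
  uint64_t dst = 0;
  // uxtb wD, wS: zero-extend the low byte; writing the W view also clears
  // bits 32..63 of the X register, so dst has bits 8..63 all zero.
  asm("uxtb %w0, %w1" : "=r"(dst) : "r"(src));
  std::printf("%#llx\n", (unsigned long long)dst);  // prints 0xff
#else
  std::puts("AArch64 only");
#endif
}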

@mxw
Contributor

mxw left a comment

Back to you.

@dave-estes-QCOM
Contributor

Yeah movzbq{} is technically broken as of f9ecdf1 in the name of optimization. It's that change which lowers all movz*{} to simple copies. Overall, I don't see these changes as a win across all ARM architectures. My vote, FWIW, would be to revert these changes from f9ecdf1 and the related changes that allowed this optimization to function correctly. Then any interested parties that really need optimizations like this can do a more careful pass on vasm alone. This might require new vasm, but at least it wouldn't be changing the semantics of existing vasm.

We've already got a patch prepared and tested that reverts these changes, but I'd like to hear what people think before even posting it. In particular, were these original optimizations a huge win on OSS or other workloads?

@swalk-cavium
Contributor

@dave-estes - Sorry, I don't have performance info before/after that particular commit. Does this new patch still clean up all the zero-extensions?

@dave-estes-QCOM
Contributor

Also, it looks like f9ecdf1 is the cause of an assert that's crashing mediawiki when using a DebugOpt build. The following is failing its safe cast when imm.b() is 128.

uint8_t bAsUb(Immed imm) {
  return static_cast<uint8_t>(imm.b());
}

@swalk-cavium
Contributor

@dave-estes - So, typo? Wrong method used?

util/immed.h:84:
uint8_t ub() const { return safe_cast<uint8_t>(m_int); }
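
To spell out the failure mode (using a standalone stand-in for the checked cast, since HHVM's safe_cast itself isn't quoted here): a checked narrowing to int8_t rejects 128, while the uint8_t accessor quoted above accepts it.

#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for a checked narrowing cast like HHVM's safe_cast.
template <typename To>
To checkedCast(long long v) {
  assert(v >= std::numeric_limits<To>::min() &&
         v <= std::numeric_limits<To>::max());
  return static_cast<To>(v);
}

int main() {
  long long imm = 128;              // e.g. an immediate with value 128
  (void)checkedCast<uint8_t>(imm);  // fine: 128 fits in 0..255
  (void)checkedCast<int8_t>(imm);   // asserts: 128 > INT8_MAX
}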

@mxw
Contributor

mxw commented Aug 23, 2017

@dave-estes—IIRC, there were already width-punning inconsistencies before f9ecdf1. Sounds like f9ecdf1 fixed some of them but created others.

I'm not sure I remember very well what exactly it was that was wrong before f9ecdf1, but I'm fairly confident things weren't quite correct. As I said, what you've described (a vasm pass which lifts zero-extensions) would be ideal, but I'm not sure what a good intermediate state would be before we have that. Such a pass might not be terribly difficult, though. We can talk more on IRC about the details of what's happening here.

@mxw
Contributor

mxw commented Aug 24, 2017

After talking with @dave-estes about this issue in general, I think that this change is basically reasonable, and as you said, analogous to the changes in #7851. However, it's still pretty weird that we're interpreting type as a Vreg8 on x64 and PPC, but as a Vreg64 on ARM.

I think you should keep the movzbq{} on all platforms, but then add the andqi{} afterwards, for ARM only. I'm not a fan of how we're imposing different widths on different platforms for the type register in #7851, and it would be nice to avoid doing that here if possible.

case Arch::ARM:
  // For ARM64 we need to clear the bits 8..63 from the type value.
  v << andqi{0xFF, type, extended, v.makeReg()};
  break;
Contributor

Perhaps we can use the following:

// Explicitly enforce the invariant that bits 31..8 are zero for Vreg8's
// until movzbq{} and similar instructions are optimized to simple mov{}
// in a more careful and selective way.
v << movzbq{type, extended};
v << andqi{0xFF, extended, extended, v.makeReg()};

Contributor

You need to create a new Vreg because of SSA, but this is the idea.

@mxw
Contributor

mxw commented Aug 24, 2017

I'd do this:

switch (arch()) {
  case Arch::X64:
  case Arch::PPC64:
    v << movzbq{type, extended};
    break;
  case Arch::ARM:
    auto const tmp = v.makeReg();
    v << movzbq{type, tmp};
    v << andqi{0xff, tmp, extended, v.makeReg()};
    break;
}

@facebook-github-bot
Contributor

@jim-saxman updated the pull request - view changes

@facebook-github-bot
Contributor

@mxw has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@mxw
Contributor

mxw commented Aug 29, 2017

@markw65 reminded me that the current guarantee is that u8 registers are only zero-extended to 32 bits. The actual bug here is that movzbq{}, movzwq{}, and movzlq{} are no-ops. All of those need to actually do work (unlike movzbl{}, which should be safe according to the ARM invariant).
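
In other words (an illustrative standalone sketch of that invariant, not vasm itself): when a byte value is only guaranteed zero through bit 31, eliding movzbl{} is harmless, but eliding movzbq{} lets whatever sits in bits 32..63 leak into the Vreg64 result.

#include <cstdint>
#include <cstdio>

int main() {
  // Physical register holding a Vreg8 under the assumed ARM invariant:
  // bits 8..31 are zero, bits 32..63 are unspecified.
  uint64_t vreg8 = 0xabcdef0100000042ULL;

  uint32_t movzbl_as_copy = (uint32_t)vreg8;  // still 0x42: eliding is safe
  uint64_t movzbq_as_copy = vreg8;            // upper garbage survives

  std::printf("movzbl as plain copy: %#x\n", movzbl_as_copy);
  std::printf("movzbq as plain copy: %#llx (should be 0x42)\n",
              (unsigned long long)movzbq_as_copy);
}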

@jim-saxman force-pushed the 0017-aarch64-fix-simple-meth.php branch from 7b31506 to 7faf573 on September 1, 2017 18:45
@facebook-github-bot
Contributor

@jim-saxman updated the pull request - view changes - changes since last import

@jim-saxman
Contributor Author

jim-saxman commented Sep 1, 2017

@mxw I force-pushed the requested changes. I also squashed the commits, since half of them were reverts.

@facebook-github-bot
Contributor

@mxw has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@mxw
Contributor

mxw commented Sep 2, 2017

Looks good, @jim-saxman, and sorry for all the churn on this PR. I'm glad we've sorted this stuff out a bit better.

@hhvm-bot closed this in 2dad767 on Sep 8, 2017