AArch64 Fix test/slow/async/simple_meth.php #7943
Hi @mxw, would you please review this? |
After a second look, I'm a little uneasy about this change. At first I thought it was in keeping with the changes made in 61b3b0d to do special truncation for ARM soon after the defs of narrow Vregs (8/16). That was primarily in vasm-lower.cpp, not in irlower-ret.cpp where this change puts it. I wonder if there's some way to move it there?
Is everyone aware of what's happening with this issue? It comes down to one of the key changes in f9ecdf1 that was made possible by this truncation. The semantics of the movzXX vasm instructions are to zero-extend when moving from a narrow to a wider Vreg. These changes bypassed that requirement by attempting to ensure that any Vreg8/16 that is ever def'd is immediately truncated, so that it can later be simply copied from one ARM register to another using the 32-bit "r" register names.
This particular case is a sticky spot that we seem to have missed. I was never really comfortable with this series of changes, but there seemed to be a lot of desire to get them in, since doing explicit narrowing before vasm instructions like cmpb was expensive on some platforms. My discomfort arises because I think this approach is too fragile.
I'm probably OK with spot-fixing this one bug and hoping there aren't more, either existing or yet to be produced. However, I'm more of the opinion that we should rethink all of our ARM-specific narrowing. Instead of doing the required signed/unsigned narrowing/widening when lowering, perhaps we should have implemented a pass for ARM with some use-def analysis, in order to raise the narrowing/widening as close to the def as possible instead of repeatedly doing it near the uses like we were doing in the original port.
Anyhow, my look into this whole thing is still pretty fresh. I'm mostly just bringing it up in hopes of a good discussion.
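A minimal sketch of the early-truncation invariant discussed in this comment, using plain 64-bit integers as stand-ins for machine registers. The helpers `movzbq` and `mov32` are illustrative models only, not HHVM's vasm API. The one hardware fact assumed is that on AArch64 a write to a 32-bit register name clears the upper 32 bits of the full register.

```cpp
#include <cstdint>

// Model of movzbq: keep the low 8 bits and zero everything above them.
constexpr std::uint64_t movzbq(std::uint64_t src) { return src & 0xFFu; }

// Model of a plain 32-bit register copy: only bits 63..32 are cleared,
// so any junk in bits 31..8 of the source is carried along.
constexpr std::uint64_t mov32(std::uint64_t src) { return src & 0xFFFFFFFFu; }

// A register whose low byte is 0x42 but whose upper bits were never cleared.
constexpr std::uint64_t kDirty = 0x123456789ABCDE42ULL;

// A real zero-extension always yields the byte value.
static_assert(movzbq(kDirty) == 0x42u, "movzbq clears bits 63..8");

// Replacing it with a 32-bit copy is only correct if the def was truncated.
static_assert(mov32(kDirty) == 0x9ABCDE42u, "junk in bits 31..8 survives");
static_assert(mov32(movzbq(kDirty)) == 0x42u, "truncate-at-def makes the copy safe");
```

The second assertion is the fragile part: a single def that skips the truncation is enough for a later copy to observe stale bits.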
Hi Dave,
I thought a pass was just added to vasm-simplify to validate
some width assumptions. Isn't that part of what this one is
doing?
6e1488a
Steve Walk
@jim-saxman - I ran the regression tests with six option sets with your change cherry-picked on to the latest commit at that time. I think we're back to the baseline. |
I fully agree with @dave-estes. While I think the current solution is better than what we had before, I think a narrowing-lifting pass would be better still than the current solution. (Notably, it avoids an existing unsoundness around signedness tests for, e.g., byte compares that we implement as zero-extended dword compares. It happens that we don't ever test signedness on bytes or words, but if we ever did (and vasm allows it), things would break immediately.) As far as this specific fix goes, it makes no sense to me. The |
Back to you.
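The signedness concern raised above can be illustrated with ordinary integers. This is a sketch rather than vasm: `signBitAsByte`, `signBitAsZextDword`, and `equalAsZextDword` are hypothetical helpers that model which bit a signed test would inspect at each width.

```cpp
#include <cstdint>

// A signed test on a byte looks at bit 7.
constexpr bool signBitAsByte(std::uint8_t b) { return (b & 0x80u) != 0; }

// The same test on the zero-extended dword looks at bit 31, which
// zero-extension guarantees is clear.
constexpr bool signBitAsZextDword(std::uint8_t b) {
  return (static_cast<std::uint32_t>(b) & 0x80000000u) != 0;
}

// Equality is unaffected by zero-extension.
constexpr bool equalAsZextDword(std::uint8_t a, std::uint8_t b) {
  return static_cast<std::uint32_t>(a) == static_cast<std::uint32_t>(b);
}

// 0x80 is negative as a signed byte, but never negative once zero-extended.
static_assert(signBitAsByte(0x80), "byte 0x80 has its sign bit set");
static_assert(!signBitAsZextDword(0x80), "zero-extended dword is non-negative");
static_assert(equalAsZextDword(0x80, 0x80) && !equalAsZextDword(0x80, 0x7F),
              "equality tests still agree");
```

Equality and unsigned comparisons survive the widening, which is presumably why the issue has not surfaced in practice; only a sign-dependent condition would give a different answer.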
Yeah, we've already got a patch prepared and tested that reverts these changes, but I'd like to hear what people think before even posting it. In particular, were these original optimizations a huge win on OSS or other workloads?
@dave-estes - Sorry, I don't have performance info before/after that particular commit. Does this new patch still clean up all the zero-extensions? |
Also, it looks like f9ecdf1 is also the cause of an assert that's crashing mediawiki when using a DebugOpt build. The following is failing to safe cast when
|
@dave-estes - So, typo? Wrong method used |
@dave-estes—IIRC, there were already width-punning inconsistencies before f9ecdf1. Sounds like f9ecdf1 fixed some of them but created others. I'm not sure I remember very well what exactly it was that was wrong before f9ecdf1, but I'm fairly confident things weren't quite correct. As I said, what you've described (a vasm pass which lifts zero-extensions) would be ideal, but I'm not sure what a good intermediate state would be before we have that. Such a pass might not be terribly difficult, though. We can talk more on IRC about the details of what's happening here. |
After talking with @dave-estes about this issue in general, I think that this change is basically reasonable, and as you said, analogous to the changes in #7851. However, it's still pretty weird that we're interpreting I think you should keep the |
hphp/runtime/vm/jit/irlower-ret.cpp
Outdated
case Arch::ARM:
// For ARM64 we need to clear the bits 8..63 from the type value.
v << andqi{0xFF, type, extended, v.makeReg()};
break;
Perhaps we can use the following:
// Explicitly enforce the invariant that bits 31..8 are zero for Vreg8's
// until movzbq{} and similar instructions are optimized to simple mov{}
// in a more careful and selective way.
v << movzbq{type, extended};
v << andqi{0xFF, extended, extended, v.makeReg()};
You need to create a new Vreg because of SSA, but this is the idea.
I'd do this:
switch (arch()) {
case Arch::X64:
case Arch::PPC64:
v << movzbq{type, extended};
break;
case Arch::ARM:
auto const tmp = v.makeReg();
v << movzbq{type, tmp};
v << andqi{0xff, tmp, extended, v.makeReg()};
break;
}
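The SSA point behind the temporary register can be sketched with a toy model. Everything here is hypothetical: `Instr` is a simplified stand-in with one source and one destination (the flags destination that `andqi` also defines is ignored), and `isSingleAssignment` is an illustrative check, not an HHVM function.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Simplified stand-in for a vasm instruction: one source, one destination.
struct Instr {
  std::string op;
  std::string src;
  std::string dst;
};

// Returns true if no register name is written by more than one instruction.
inline bool isSingleAssignment(const std::vector<Instr>& code) {
  std::set<std::string> defined;
  for (const auto& instr : code) {
    // insert().second is false when the name was already defined earlier.
    if (!defined.insert(instr.dst).second) return false;
  }
  return true;
}

// Small demonstration comparing the two shapes proposed in this thread.
inline void checkProposals() {
  // First proposal: `extended` is written twice.
  assert(!isSingleAssignment({{"movzbq", "type", "extended"},
                              {"andqi", "extended", "extended"}}));
  // Revised proposal: a fresh temporary keeps every name single-assignment.
  assert(isSingleAssignment({{"movzbq", "type", "tmp"},
                             {"andqi", "tmp", "extended"}}));
}
```

Under this model the revised switch statement is the only one of the two that defines each register exactly once.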
@jim-saxman updated the pull request - view changes |
@mxw has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
@markw65 reminded me that the current guarantee is that u8 registers are only zero-extended to 32 bits. The actual bug here is that |
On AArch64 systems, the unit test test/slow/async/simple_meth.php fails when run with the -r (repoAuth) flag due to the early-truncation policy (see comment at top of vasm-arm.cpp). This patch fixes the test case.
7b31506
to
7faf573
Compare
@jim-saxman updated the pull request - view changes - changes since last import |
@mxw I have force-pushed the requested changes. I also squashed the commits, since half of them were reverts.
@mxw has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
Looks good, @jim-saxman, and sorry for all the churn on this PR. I'm glad we've sorted this stuff out a bit better. |