
AArch64 Fix test/slow/async/simple_meth.php #7943

Conversation

jim-saxman
Contributor

On AArch64 systems, the unit test test/slow/async/simple_meth.php
fails when run with the -r (repoAuth) flag due to the early-
truncation policy (see comment at top of vasm-arm.cpp). This patch
fixes the test case.

@jim-saxman
Contributor Author

Hi @mxw, would you please review this?
Hi @swalk-cavium, is there a chance you can test this?
I believe it was overlooked in #7851.

@dave-estes-QCOM
Contributor

After a second look, I'm a little uneasy about this change. At first I thought it was in keeping with the changes made in 61b3b0d to do special truncation for ARM soon after the defs of narrow Vregs (8/16). That work was primarily in vasm-lower.cpp, not in irlower-ret.cpp where this patch puts it. I wonder if there's some way to move it there?

Is everyone aware of what's happening with this issue? It comes down to one of the key changes in f9ecdf1 that was made possible by this truncation. The semantics of the movz* vasm instructions are to zero-extend when moving from a narrow to a wider Vreg. These changes bypassed that requirement by attempting to ensure that any Vreg8/16 that is ever def'd is immediately truncated, so that it can later be simply copied from one ARM register to another using the 32-bit "w" register names.
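
To illustrate what that policy buys (a minimal standalone sketch, not HHVM's actual code): if the narrow value is masked once, right where it is def'd, a later zero-extending move really can be lowered to a plain copy.

#include <cstdint>
#include <cstdio>

int main() {
  // A 64-bit register whose low byte is the Vreg8 value of interest and whose
  // upper bits are whatever the producing instruction happened to leave there.
  uint64_t def = 0x1122334455667741ULL;

  // Option A: the movz* instruction does real work at the use site.
  uint64_t zeroExtendedAtUse = def & 0xff;

  // Option B: the early-truncation policy masks once, right after the def,
  // so the later widening can be lowered to a plain register copy.
  uint64_t truncatedAtDef = def & 0xff;
  uint64_t copiedAtUse    = truncatedAtDef;

  std::printf("zero-extend at the use:       %#llx\n",
              (unsigned long long)zeroExtendedAtUse);
  std::printf("truncate at def, copy at use: %#llx\n",
              (unsigned long long)copiedAtUse);  // same result, 0x41
}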

This particular case is a sticky spot that we seem to have missed. I was never really comfortable with this series of changes, but there seemed to be a lot of desire to get them in, since doing explicit narrowing before vasm like cmpb was expensive on some platforms. My discomfort arises because I think this approach is too fragile.

I'm probably OK with spot fixing this one bug and hoping there aren't more existing or yet to be produced. However, I'm more of the opinion that we should rethink all of our ARM specific narrowing. Instead of doing the required signed/unsigned narrowing/widening when lowering, perhaps we should have implemented a pass for ARM with some use-def analysis in order to raise the narrowing/widening as close to the def as possible instead of repeatedly doing it near the uses like we were doing in the original port.
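
A toy sketch of that kind of lifting pass (hypothetical instruction representation, nothing from HHVM's actual Vunit/vasm API): one truncation is inserted right after each def of a narrow register, which makes the per-use truncations redundant.

#include <cstdio>
#include <string>
#include <vector>

// Hypothetical, non-SSA toy IR: "def" produces a narrow 8-bit value in reg,
// "trunc" masks reg to 8 bits in place, "use" consumes reg.
struct Instr { std::string op; int reg; };
using Program = std::vector<Instr>;

// Lift the narrowing toward the def: emit one trunc immediately after each
// def and drop the truncs that are now redundant. A real vasm pass would also
// have to introduce fresh Vregs to stay in SSA form, as noted later in this
// thread.
Program liftTruncation(const Program& in) {
  Program out;
  std::vector<bool> clean(16, false);
  for (auto const& i : in) {
    if (i.op == "trunc" && clean[i.reg]) continue;  // already truncated at def
    out.push_back(i);
    if (i.op == "def") {
      out.push_back({"trunc", i.reg});
      clean[i.reg] = true;
    }
  }
  return out;
}

int main() {
  // Original-port style: re-truncate near every use.
  Program p = {
    {"def", 1},
    {"trunc", 1}, {"use", 1},
    {"trunc", 1}, {"use", 1},
  };
  for (auto const& i : liftTruncation(p)) {
    std::printf("%s r%d\n", i.op.c_str(), i.reg);
  }
}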

Anyhow, my look into this whole thing is still pretty fresh. I'm mostly just bringing it up in hopes of a good discussion.

@swalk-cavium
Contributor

swalk-cavium commented Aug 2, 2017 via email

@swalk-cavium
Contributor

@jim-saxman - I ran the regression tests with six option sets with your change cherry-picked onto the latest commit at that time. I think we're back to the baseline.

@mxw self-assigned this on Aug 16, 2017
@mxw
Contributor

mxw commented Aug 17, 2017

I fully agree with @dave-estes. While I think the current solution is better than what we had before, I think a narrowing-lifting pass would be better still than the current solution. (Notably, it avoids an existing unsoundness around signedness tests for, e.g., byte compares that we implement as zero-extended dword compares. It happens that we don't ever test signedness on bytes or words, but if we ever did (and vasm allows it), things would break immediately.)
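
To make that parenthetical concrete (a standalone illustration, not vasm): a signed byte comparison and a signed comparison of the zero-extended values disagree as soon as a negative byte is involved.

#include <cstdint>
#include <cstdio>

int main() {
  int8_t a = -1;  // bit pattern 0xff
  int8_t b = 1;

  bool byteSignedLess = a < b;               // true: -1 < 1
  bool zextSignedLess = int32_t(uint8_t(a))  // 255
                      < int32_t(uint8_t(b)); // 1  -> false

  std::printf("signed byte compare: %d, zero-extended dword compare: %d\n",
              byteSignedLess, zextSignedLess);
}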

As far as this specific fix goes, it makes no sense to me. The movzbq{} instruction is required to zero bits 8..63 already. Moreover, doing an andqi{} with a byte register operand is illegal in vasm. As far as I can tell, the vasm we're emitting is already supposed to do what you intend, and you've just reimplemented it in a way that breaks invariants. If movzbq{} doesn't zero all the upper bits currently, it means your implementation of vasm is incorrect.
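
For reference, one way an AArch64 back end could honor that movzbq{} contract with a single instruction (a hedged, standalone demonstration using inline assembly, not HHVM's emitter): uxtb to a W register zero-extends the low byte, and because any W-register write also clears bits 32..63, the destination ends up with bits 8..63 all zero.

#include <cstdint>
#include <cstdio>

int main() {
#if defined(__aarch64__)
  uint64_t src = 0xdeadbeefcafe40ffULL;  // low byte 0xff, garbage above
  uint64_t dst = 0;
  // uxtb wD, wS: zero-extend the low byte; writing the W view also clears
  // bits 32..63 of the X register, so dst has bits 8..63 all zero.
  asm("uxtb %w0, %w1" : "=r"(dst) : "r"(src));
  std::printf("%#llx\n", (unsigned long long)dst);  // prints 0xff
#else
  std::puts("AArch64 only");
#endif
}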

@mxw
Contributor

mxw left a comment

Back to you.

@dave-estes-QCOM
Contributor

Yeah movzbq{} is technically broken as of f9ecdf1 in the name of optimization. It's that change which lowers all movz*{} to simple copies. Overall, I don't see these changes as a win across all ARM architectures. My vote, FWIW, would be to revert these changes from f9ecdf1 and the related changes that allowed this optimization to function correctly. Then any interested parties that really need optimizations like this can do a more careful pass on vasm alone. This might require new vasm, but at least it wouldn't be changing the semantics of existing vasm.

We've already got a patch prepared and tested that reverts these changes, but I'd like to hear what people think before even posting it. In particular, were these original optimizations a huge win on OSS or other workloads?

@swalk-cavium
Contributor

@dave-estes - Sorry, I don't have performance info before/after that particular commit. Does this new patch still clean up all the zero-extensions?

@dave-estes-QCOM
Contributor

Also, it looks like f9ecdf1 is the cause of an assert that's crashing mediawiki when using a DebugOpt build. The following is failing its safe cast when imm.b() is 128.

uint8_t bAsUb(Immed imm) {
  return static_cast<uint8_t>(imm.b());
}

@swalk-cavium
Contributor

@dave-estes - So, typo? Wrong method used?

util/immed.h:84:
uint8_t ub() const { return safe_cast<uint8_t>(m_int); }
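
To spell out the failure mode (using a standalone stand-in for the checked cast, since HHVM's safe_cast itself isn't quoted here): a checked narrowing to int8_t rejects 128, while the uint8_t accessor quoted above accepts it.

#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for a checked narrowing cast like HHVM's safe_cast.
template <typename To>
To checkedCast(long long v) {
  assert(v >= std::numeric_limits<To>::min() &&
         v <= std::numeric_limits<To>::max());
  return static_cast<To>(v);
}

int main() {
  long long imm = 128;              // e.g. an immediate with value 128
  (void)checkedCast<uint8_t>(imm);  // fine: 128 fits in 0..255
  (void)checkedCast<int8_t>(imm);   // asserts: 128 > INT8_MAX
}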

@mxw
Contributor

mxw commented Aug 23, 2017

@dave-estes—IIRC, there were already width-punning inconsistencies before f9ecdf1. Sounds like f9ecdf1 fixed some of them but created others.

I'm not sure I remember very well what exactly it was that was wrong before f9ecdf1, but I'm fairly confident things weren't quite correct. As I said, what you've described (a vasm pass which lifts zero-extensions) would be ideal, but I'm not sure what a good intermediate state would be before we have that. Such a pass might not be terribly difficult, though. We can talk more on IRC about the details of what's happening here.

@mxw
Contributor

mxw commented Aug 24, 2017

After talking with @dave-estes about this issue in general, I think that this change is basically reasonable, and as you said, analogous to the changes in #7851. However, it's still pretty weird that we're interpreting type as a Vreg8 on x64 and PPC, but as a Vreg64 on ARM.

I think you should keep the movzbq{} on all platforms, but then add the andqi{} afterwards, for ARM only. I'm not a fan of how we're imposing different widths on different platforms for the type register in #7851, and it would be nice to avoid doing that here if possible.

case Arch::ARM:
  // For ARM64 we need to clear the bits 8..63 from the type value.
  v << andqi{0xFF, type, extended, v.makeReg()};
  break;
Contributor

Perhaps we can use the following:

// Explicitly enforce the invariant that bits 31..8 are zero for Vreg8's
// until movzbq{} and similar instructions are optimized to simple mov{}
// in a more careful and selective way.
v << movzbq{type, extended};
v << andqi{0xFF, extended, extended, v.makeReg()};

Contributor

You need to create a new Vreg because of SSA, but this is the idea.

@mxw
Contributor

mxw commented Aug 24, 2017

I'd do this:

switch (arch()) {
  case Arch::X64:
  case Arch::PPC64:
    v << movzbq{type, extended};
    break;
  case Arch::ARM:
    auto const tmp = v.makeReg();
    v << movzbq{type, tmp};
    v << andqi{0xff, tmp, extended, v.makeReg()};
    break;
}

@facebook-github-bot
Contributor

@jim-saxman updated the pull request - view changes

@facebook-github-bot
Contributor

@mxw has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@mxw
Contributor

mxw commented Aug 29, 2017

@markw65 reminded me that the current guarantee is that u8 registers are only zero-extended to 32 bits. The actual bug here is that movzbq{}, movzwq{}, and movzlq{} are no-ops. All of those need to actually do work (unlike movzbl{}, which should be safe according to the ARM invariant).
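
In other words (an illustrative standalone sketch of that invariant, not vasm itself): when a byte value is only guaranteed zero through bit 31, eliding movzbl{} is harmless, but eliding movzbq{} lets whatever sits in bits 32..63 leak into the Vreg64 result.

#include <cstdint>
#include <cstdio>

int main() {
  // Physical register holding a Vreg8 under the assumed ARM invariant:
  // bits 8..31 are zero, bits 32..63 are unspecified.
  uint64_t vreg8 = 0xabcdef0100000042ULL;

  uint32_t movzbl_as_copy = (uint32_t)vreg8;  // still 0x42: eliding is safe
  uint64_t movzbq_as_copy = vreg8;            // upper garbage survives

  std::printf("movzbl as plain copy: %#x\n", movzbl_as_copy);
  std::printf("movzbq as plain copy: %#llx (should be 0x42)\n",
              (unsigned long long)movzbq_as_copy);
}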

@jim-saxman force-pushed the 0017-aarch64-fix-simple-meth.php branch from 7b31506 to 7faf573 on September 1, 2017 18:45
@facebook-github-bot
Contributor

@jim-saxman updated the pull request - view changes - changes since last import

@jim-saxman
Contributor Author

jim-saxman commented Sep 1, 2017

@mxw I force-pushed the requested changes. I also squashed the commits, since half of them were reverts.

@facebook-github-bot
Contributor

@mxw has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@mxw
Contributor

mxw commented Sep 2, 2017

Looks good, @jim-saxman, and sorry for all the churn on this PR. I'm glad we've sorted this stuff out a bit better.

@hhvm-bot closed this in 2dad767 on Sep 8, 2017