Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: possible latent codegen issue on amd64 removing zero extensions #36897

Open
josharian opened this issue Jan 30, 2020 · 4 comments
Open
Milestone

Comments

@josharian
Copy link
Contributor

@josharian josharian commented Jan 30, 2020

I'm not sure whether it is possible to trigger this bug right now, but I think there may be an issue lurking.

Consider code like:

func f(x uint32) uint64 {
	return uint64(x & 0xFFFFFFFF)
}

The outmost op is ZeroExt32to64, which gets lowered to MOVLQZX. The innermost op ends up being (ANDLconst [0xFFFFFFFF] x).

Then this rule triggers: (MOVLQZX x) && zeroUpper32Bits(x,3) -> x. ANDLconst is among the ops listed as zeroing the upper 32 bits of a register, so we eliminate the zero extension.

Then this rule triggers: (ANDLconst [c] x) && int32(c)==-1 -> x, eliminating the ANDLconst.

This leaves us with just x, but without any zero extension, which could leave junk in the top half of the register.

As it stands, this doesn't happen in this function, because the And32 is eliminated during generic optimization. But there are ways to sneak an & -1 past generic optimization, e.g. with a constant shift, which is how I discovered this issue. We are also saved in this case by using a MOVL to load the value from an arg. But we could be using a computed value instead. So I haven't convinced myself that this couldn't actually cause bad code generation right now for just the right input.

There are two possible diagnoses/fixes:

  • zeroUpper32Bits assumes that the inner op actually gets executed, which is wrong, in which case we need to pare back the list of accepted ops, probably to only loads, which I think cannot be eliminated at that point (?), since dse occurs earlier
  • the ANDLconst rule is unsound because eliminating it eliminates the side-effects of zeroing the top half of the register

I'm strongly inclined to the first diagnosis and fix. Thoughts?

cc @randall77 @cherrymui

@randall77

This comment has been minimized.

Copy link
Contributor

@randall77 randall77 commented Jan 30, 2020

I've never been super happy about our whole sign extension story. In machine-independent space I think we're good, but we play very fast and loose in the machine-dependent space. Maybe there's some invariant we can use where we guarantee to only use the bits of a value that exist in its type. So there would be two effective ANDLconsts, one with a 32-bit type and one with a 64-bit type. ANDLconst [-1] could be removed in the former but not in the latter.

@josharian

This comment has been minimized.

Copy link
Contributor Author

@josharian josharian commented Jan 30, 2020

Agreed. And at the same time, we also fail to remove a bunch of extraneous extensions

That approach sounds good to me, assuming the details work out ok. It could cause an explosion in number of ops and rules (which is already a problem), and might be tricky with shifts.

I don't think I want to sign up for tackling this one. :)

@josharian josharian added this to the Go1.15 milestone Jan 30, 2020
@cherrymui

This comment has been minimized.

Copy link
Contributor

@cherrymui cherrymui commented Jan 30, 2020

Same for me. I'm also not super happy about the extension removing.

(ANDLconst [c] x) && int32(c)==-1 -> x

I think we should only do this if the type is 32-bit wide or smaller. But how good are we keeping the types accurate?

(MOVLQZX x) && zeroUpper32Bits(x,3) -> x

The LHS is 64-bit wide, whereas the RHS is 32-bit wide. This kind of rules have always troubled me. Maybe we could have a way of doing this in a type-preserving way? Something like
(op x:(op2 ...)) -> (op2 <t> ...) ?
(this basically makes copy of x (not OpCopy, but copy the expression of x) in the new type.)

@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented Feb 21, 2020

Change https://golang.org/cl/220499 mentions this issue: cmd/compile: mark Lsyms as readonly earlier

gopherbot pushed a commit that referenced this issue Feb 26, 2020
The SSA backend has rules to read the contents of readonly Lsyms.
However, this rule was failing to trigger for many readonly Lsyms.
This is because the readonly attribute that was set on the Node.Name
was not propagated to its Lsym until the dump globals phase, after SSA runs.

To work around this phase ordering problem, introduce Node.SetReadonly,
which sets Node.Name.Readonly and also configures the Lsym
enough that SSA can use it.

This change also fixes a latent problem in the rewrite rule function,
namely that reads past the end of lsym.P were treated as entirely zero,
instead of merely requiring padding with trailing zeros.

This change also adds an amd64 rule needed to fully optimize
the results of this change. It would be better not to need this,
but the zero extension that should handle this for us
gets optimized away too soon (see #36897 for a similar problem).
I have not investigated whether other platforms also need new
rules to take full advantage of the new optimizations.

Compiled code for (interface{})(true) on amd64 goes from:

LEAQ	type.bool(SB), AX
MOVBLZX	""..stmp_0(SB), BX
LEAQ	runtime.staticbytes(SB), CX
ADDQ	CX, BX

to

LEAQ	type.bool(SB), AX
LEAQ	runtime.staticbytes+1(SB), BX

Prior to this change, the readonly symbol rewrite rules
fired a total of 884 times during make.bash.
Afterwards they fire 1807 times.

file    before    after     Δ       %
cgo     4827832   4823736   -4096   -0.085%
compile 24907768  24895656  -12112  -0.049%
fix     3376952   3368760   -8192   -0.243%
pprof   14751700  14747604  -4096   -0.028%
total   120343528 120315032 -28496  -0.024%

Change-Id: I59ea52138276c37840f69e30fb109fd376d579ec
Reviewed-on: https://go-review.googlesource.com/c/go/+/220499
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
5 participants
You can’t perform that action at this time.