New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: loop optimization #24240

Open
MichaelTJones opened this Issue Mar 4, 2018 · 10 comments

Comments

Projects
None yet
7 participants
@MichaelTJones
Contributor

MichaelTJones commented Mar 4, 2018

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

master (1.11 dev)

Does this issue reproduce with the latest release?

this is a compilation strategy proposal

What operating system and processor architecture are you using (go env)?

amd64 / Darwin (macbook pro)

What did you do?

I have code that iterates without referencing the loop variable. Changing that from:

for i := 0; i < n; i++ {
// no access to i
}

...to...

for i := -n; i < 0; i++ {
// no access to i
}

saves me 4%-5% in my test cases.

Looking at the .S files, I see that there are two gains (as expected): one is that the loop limit needs only be addressable and referenced at the start of the loop--which frees a register over the body of the loop; and also, because the test is against zero and there are dedicated instructions for that rather than loading immediate literals or otherwise.

It seems (from afar) that there could be a rule along the lines of:

if loop variable not referenced in body
and assignment has block structure (i:=)
and type of loop variable is signed
and incr RHS does not depend on additive offset to variable (i++ ok, i*=2 not),
then transform "for v := low; v OP high; incr" into "for v := low-high; v OP 0; incr"

I don't have any comprehensive data about how common this situation may be, but it does help.

What did you expect to see?

no change because i expected the compiler to be doing this.

What did you see instead?

5% speedup

@bradfitz bradfitz changed the title from Loop optimization to cmd/compile: loop optimization Mar 4, 2018

@bradfitz bradfitz added this to the Unplanned milestone Mar 4, 2018

@rasky

This comment has been minimized.

Member

rasky commented Mar 4, 2018

Since you have the code handy, can you provide a working example that includes a benchmark?

@MichaelTJones

This comment has been minimized.

Contributor

MichaelTJones commented Mar 4, 2018

@RalphCorderoy

This comment has been minimized.

RalphCorderoy commented Mar 4, 2018

When n is negated, if that's the style chosen to re-write the loop, then look out for n starting off as the smallest possible value as two's complement representation is asymmetrical.

@RLH

This comment has been minimized.

Contributor

RLH commented Mar 4, 2018

Isn't the decrement of i followed by a jz (or jnz) another way to write this.
for i := n; i != 0; i-- { }

@RalphCorderoy

This comment has been minimized.

RalphCorderoy commented Mar 4, 2018

Writing assembly for various architectures, a pre-decrement test against zero is the normal way I'm used to seeing, yes. But I wanted to warn against a flaw in the OP's suggestion.

@MichaelTJones

This comment has been minimized.

Contributor

MichaelTJones commented Mar 4, 2018

@RalphCorderoy

This comment has been minimized.

RalphCorderoy commented Mar 5, 2018

Something else to ponder in implementing this... I hear some folk use a debugger; might they try and inspect the loop's variable and be confused that it's changed sign, or running in the opposite direction? Or perhaps DWARF can represent the transformation for the debugger to reverse? If other compilers have implemented this for other languages then they might give ideas.

@dgryski

This comment has been minimized.

Contributor

dgryski commented Mar 5, 2018

OTOH, stepping through C code that's been built withgcc -O3 is pretty painful. It's fine as long as you're reading the assembly and not trying to match it up to the C code that produced it.

@MichaelTJones

This comment has been minimized.

Contributor

MichaelTJones commented Mar 5, 2018

Ok, here is a simplified and extracted example. I called it "loop_test.go"

https://play.golang.org/p/uRdymTPtTCi

Alas, i don't see such a big difference in this case. The code is not the same, not so much register pressure.

Summary: maybe this is not so important.

@RalphCorderoy

This comment has been minimized.

RalphCorderoy commented Mar 5, 2018

I think it's still worth trying to measure gains across more tests, especially as some platforms have more register pressure than others. But I've also a niggling recollection that this has been suggested before and turned down because of debugger confusion, probably by @rsc. :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment