Join GitHub today
GitHub is home to over 36 million developers working together to host and review code, manage projects, and build software together.Sign up
cmd/compile: looprotate picking wrong jump target #20355
This is easier to explain with some visuals. They're a bit complicated; bear with me. Ask if you have questions about how to read them.
First, here's looprotate doing a good job.
The nodes are blocks. The node labels contain block id and block kind; the number in parens indicates its index in f.Blocks (i.e. where is has been laid out). Edges are labeled with successor numbers. Dashed edges are unlikely. Green edges indicate adjacency in f.Blocks. Dotted green edges aren't part of the CFG; they are there only to show layout adjacency. Node coloring was done manually, by clicking on them.
You can see that b3 is the header for the loop b4 b5 b6, which can panic (via b9) or terminate normally (via b7). Looprotate changes the layout so that after b3 comes b5, not b4. Observe the new dotted green edge and updated parenthesized indices.
Now here's looprotate doing something untoward. Same graph, different highlighted nodes. The second screenshot shows the bottom of the graph.
b7 is a loop header for a very big loop starting at b11 (observe the edge coming in from b12 offscreen below). But b11 is not the node that decides whether the loop should continue. That's b61, which (if still looping) continues to b12, which returns to b11.
However, looprotate bypasses b11 in favor of b14, which doesn't really make much sense. The third screenshot shows what this looks like after the trim pass. Definitely doesn't seem like ideal code layout.
(You don't mean to say that b7 is the loop header, do you? b11 is the header. b7 isn't even in the loop.)
I agree that this second reordering is not desirable. From my perspective, that's just because the branch at the end of b11 isn't an exit-loop branch. Maybe that should be part of the condition for triggering the rotate? Then we'd only move single blocks.
Maybe the condition is that if we rotate the loop, the non-local edge out of the header is now a fallthrough edge (the non-loop successor is the immediately next block).
I think your CL 43491 is correct, but it may be overkill.
I ran some benchmarks with CL 43491 ("josh") and CL 43502 ("khr") on top of 5548f7d ("tip"). For each, I'll show all three together, tip -> khr, tip -> josh, and khr -> josh. It's lots of data, but it's hard to make sense of without all the pairwise stats.
Looking through these, I'm weakly inclined to go for the only slightly more complicated josh CL, mostly on the basis of the macro (compilebench) benchmark and the big append regression. I also think it is interesting that the khr CL has a big +/- impact on lots of go1 benchmarks, but the josh CL does not. It makes me suspect that there are better heuristics available to decide whether to not unroll (khr) or move many blocks (josh). I think we should pull in either khr or josh for 1.9 and then revisit in 1.10 whether they can be fruitfully combined. Other changes to the layout pass may also affect this.
@randall77 now that we have some data, please make an executive decision about how to proceed.
Fannkuch running alone:
All go1 benchmarks:
append benchmarks from #14758:
Another data point. For the code in #16122:
Again, seems like josh is a bit better than khr, but an appropriate combo might be better yet.