The program linked below generates an array of random uint8s and then repeatedly runs a function that finds the second-highest value in that array. The logic inside the inner loop is simple: it tracks the highest and second-highest values using an if-else chain.
The problem is that the compiler generates slow code for the naive implementation (shown as max2_slow() in the example): the test takes 810 ms. Adding a continue statement to the final else block produces better code that runs the test in 540 ms.
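Since the linked program isn't reproduced here, the two loop shapes being compared can be sketched roughly as follows. This is a hypothetical reconstruction, not the original code: the names max2Slow and max2Fast, and the slice-based signature, are assumptions. The only difference is the `continue` in the final else block.

```go
package main

import (
	"fmt"
	"math/rand"
)

// max2Slow (hypothetical reconstruction): every path through the if/else-if
// chain falls through to a shared merge point at the bottom of the loop body.
func max2Slow(a []uint8) uint8 {
	var max1, max2 uint8
	for _, v := range a {
		if v > max1 {
			max2 = max1
			max1 = v
		} else if v > max2 {
			max2 = v
		}
	}
	return max2
}

// max2Fast (hypothetical reconstruction): identical logic, but the final else
// jumps straight to the next iteration instead of reaching the merge point.
func max2Fast(a []uint8) uint8 {
	var max1, max2 uint8
	for _, v := range a {
		if v > max1 {
			max2 = max1
			max1 = v
		} else if v > max2 {
			max2 = v
		} else {
			continue
		}
	}
	return max2
}

func main() {
	a := make([]uint8, 1<<20)
	for i := range a {
		a[i] = uint8(rand.Intn(256))
	}
	// Both versions must agree; only the generated code differs.
	fmt.Println(max2Slow(a) == max2Fast(a))
}
```

The two functions are semantically identical, so any timing difference comes from code generation alone. Timings depend on the machine and Go version, so a `testing.B` benchmark is the better way to compare them.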
For reference, the same logic in C runs much faster (490 ms without loop unrolling, 350 ms with unrolling), but the C compiler appears to use conditional move instructions, which neither Go nor V8 seems able to generate here.
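To illustrate what the conditional-move version computes, the same update can be written as a pair of min/max selects with no if/else chain in the loop body. This is a sketch under stated assumptions: maxU8, minU8 and max2Select are hypothetical names, and whether the Go compiler actually lowers these selects to CMOV instructions depends on the Go version and target, so it is not guaranteed.

```go
package main

import "fmt"

// maxU8 and minU8 are hypothetical helpers. Simple selects like these are
// candidates for lowering to conditional moves, but that is not guaranteed.
func maxU8(a, b uint8) uint8 {
	if a > b {
		return a
	}
	return b
}

func minU8(a, b uint8) uint8 {
	if a < b {
		return a
	}
	return b
}

// max2Select (hypothetical) computes the same result as the if/else chain,
// expressed as selects instead of a three-way branch.
func max2Select(a []uint8) uint8 {
	var max1, max2 uint8
	for _, v := range a {
		// The smaller of the old max1 and v is the candidate runner-up.
		// This must use the old max1, so it comes before max1 is updated.
		max2 = maxU8(max2, minU8(max1, v))
		max1 = maxU8(max1, v)
	}
	return max2
}

func main() {
	fmt.Println(max2Select([]uint8{3, 9, 1, 7}))
}
```

This works because max1 >= max2 always holds. When v > max1, min(max1, v) is the old max1, which becomes the new max2. Otherwise min(max1, v) is v, and max2 only changes if v exceeds it.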
This one is a bit tricky. More or less by luck of the draw, the val > max1 path is processed first during register allocation. Because of the way regalloc works, the register-to-register moves from that branch end up at the end of the loop, where all the paths merge. The other paths through the loop then have to issue countervailing moves ahead of time to undo what those moves will do. Using "continue" avoids that merge point, and with it the need to do and undo the moves on every iteration.
At least, I think that is what is going on.
I don't see any easy fix for this. If we knew that both ifs were unlikely to be taken, the layout pass would fix it for us, and FDO (feedback-directed optimization) would supply that information. Without FDO, I don't see an easy way to know. The register allocator is correctly optimizing for the case where the first if is likely to be taken.
Maybe there's a way to detect that we would need to undo register moves and lay out the blocks a bit differently as a result. Not sure.
In any case, it is unlikely that this will be fixed for 1.14. Moving to unplanned.