Compiler patch to avoid useless float boxing #4558
Original bug ID: 4558
I would like to propose two patches to the compiler that prevent some useless boxing and unboxing of intermediate floats.
The following code:
let f x = x -. 3.
is compiled into the following CMM code:
(function camlTest13__g_63 (x/64: addr)
While the compiler generally optimizes patterns like "unbox (box v) -> v", here it does not work because of the let that appears in between. This happens every time a function is called with a non-trivial argument and returns a float that is used for further computations. The attached patch diff1.txt is a trivial fix that implements the optimization across the "let". I am fairly convinced this patch is correct. FYI, in the previous example, it generates the following optimized code:
(function camlTest13__g_64 (x/65: addr)
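To make the rewrite concrete, here is a minimal sketch on a toy expression type. It is not the code from diff1.txt: the type and its constructor names (Box, Unbox, Let, If, Seq, Sub) are hypothetical and only loosely modelled on the compiler's intermediate representation. The idea is to push the unbox under the binder until it meets the box it cancels.

```ocaml
(* Toy expression language. All constructor names are assumptions made
   for this illustration; they are not the compiler's real definitions. *)
type expr =
  | Var of string
  | Const of float
  | Box of expr                  (* allocate a boxed float *)
  | Unbox of expr                (* read the float out of a box *)
  | Sub of expr * expr           (* subtraction on unboxed floats *)
  | Let of string * expr * expr
  | If of expr * expr * expr
  | Seq of expr * expr

(* Smart constructor for unboxing: cancels [Unbox (Box e)] and looks
   through [Let], [If] and [Seq] to find a box in result position. *)
let rec unbox = function
  | Box e -> e
  | Let (id, def, body) -> Let (id, def, unbox body)
  | If (cond, a, b) -> If (cond, unbox a, unbox b)
  | Seq (a, b) -> Seq (a, unbox b)
  | e -> Unbox e

(* A boxed result hidden behind a "let", then unboxed by the consumer. *)
let simplified =
  unbox
    (Let ("y", Sub (Unbox (Var "x"), Const 3.),
          Box (Sub (Var "y", Const 1.))))
```

In this sketch the box under the "let" disappears entirely, so no allocation remains for the intermediate result. An expression that does not end in a box simply keeps its Unbox wrapper.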
Here is the second example:
let min (x : float) y = if x < y then x else y
(function camlTest13__f_108 (x/109: addr)
When the compiler needs to access both the value of a float and the same boxed float (usually as a return value), it generates code like: "let z = float_value in let zbox = box(z) in body". Note in the example above that two boxed floats are always allocated, even though only one of them is ever used. Moreover, if "unbox(zbox)" appears in "body" (because of inlining), expanding the binding lets the unbox cancel against the box. Generally speaking, whenever "zbox" appears only once in "body" (and not in a loop), it is strictly better to expand the definition. If it appears twice, it might also be better to expand twice and potentially allocate twice, but that is harder to know in advance.
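The expand-when-used-once rule can be sketched as follows. This is an illustration under stated assumptions, not the code from diff2.txt: the toy type and the helper names (occurrences, subst, delay_boxing) are hypothetical, identifiers are assumed to be unique (so shadowing is ignored), and only bindings of the form "box(variable)" are moved, since those have no side effects.

```ocaml
(* Toy expression language; names are assumptions for this sketch only. *)
type expr =
  | Var of string
  | Const of float
  | Box of expr
  | Unbox of expr
  | Lt of expr * expr            (* comparison on unboxed floats *)
  | Let of string * expr * expr
  | If of expr * expr * expr
  | Loop of expr

(* Number of uses of [id]; a use inside a loop is counted twice so that
   it never qualifies as "used exactly once". *)
let rec occurrences id = function
  | Var v -> if v = id then 1 else 0
  | Const _ -> 0
  | Box e | Unbox e -> occurrences id e
  | Lt (a, b) -> occurrences id a + occurrences id b
  | Let (_, def, body) -> occurrences id def + occurrences id body
  | If (c, a, b) -> occurrences id c + occurrences id a + occurrences id b
  | Loop e -> 2 * occurrences id e

(* Replace every use of [id] by [by] (identifiers assumed unique). *)
let rec subst id by = function
  | Var v when v = id -> by
  | (Var _ | Const _) as e -> e
  | Box e -> Box (subst id by e)
  | Unbox e -> Unbox (subst id by e)
  | Lt (a, b) -> Lt (subst id by a, subst id by b)
  | Let (v, def, body) -> Let (v, subst id by def, subst id by body)
  | If (c, a, b) -> If (subst id by c, subst id by a, subst id by b)
  | Loop e -> Loop (subst id by e)

(* Move "let zbox = box(z)" to its point of use when zbox is used once. *)
let rec delay_boxing = function
  | Let (id, (Box (Var _) as boxed), body) when occurrences id body = 1 ->
      delay_boxing (subst id boxed body)
  | Let (id, def, body) -> Let (id, delay_boxing def, delay_boxing body)
  | If (c, a, b) -> If (delay_boxing c, delay_boxing a, delay_boxing b)
  | Loop e -> Loop (delay_boxing e)
  | Box e -> Box (delay_boxing e)
  | Unbox e -> Unbox (delay_boxing e)
  | Lt (a, b) -> Lt (delay_boxing a, delay_boxing b)
  | (Var _ | Const _) as e -> e

(* A "min"-like shape: both floats are boxed up front, one is returned. *)
let min_like =
  Let ("xbox", Box (Var "x"),
       Let ("ybox", Box (Var "y"),
            If (Lt (Var "x", Var "y"), Var "xbox", Var "ybox")))

let min_delayed = delay_boxing min_like
```

On the "min"-like input, each branch ends up with its own box, so only the branch actually taken allocates. A binding whose only use sits inside a loop is left alone, because expanding it there would allocate on every iteration.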
The second proposed patch (see file diff2.txt, includes diff1.txt) implements this idea. Here is the new code for the example:
(function camlTest13__f_109 (x/110: addr)
All my test cases worked fine with this patch, but others should definitely review it carefully before including it.
Comment author: @mmottl
It seems there are a few more cases one can optimise in diff1.txt. Besides Clet, Cifthenelse, and Csequence, which are already considered, it would be great to also optimise Cswitch, Ctrywith and Ccatch (and any other cases we may have missed).
Comment author: @xavierleroy
Thanks for two very interesting suggestions. The first (unboxing across let and others) is really sweet; well spotted! I think it is almost always a win -- I can see a corner case on x86 32 bits where pushing the unbox down the branches of a conditional would be less efficient (because this platform has no float registers), but that's certainly negligible compared with other benefits. I have integrated it in the CVS working sources, generalized it to a few more constructs as suggested by Markus (but I don't think this makes much of a difference in practice), and applied it also to the unboxing of boxed integers.
I'm still evaluating the second suggestion (delayed boxing). For "min"-like functions, it's a win. For some straight-line functions, it can generate slightly less efficient code. Consider:
let test fn x =
If "x" and "y" are boxed at the point of the "let", the two allocations are coalesced, resulting in just one test whether to call the GC. If the boxing is performed at the point of use, we end up with two allocations in two different basic blocks, no coalescing, and two GC tests. In my (limited) tests, this doesn't seem to happen often: the number of GC check points globally decreases, but without noticeable performance gains. I'll keep this PR open until I have more data.