Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compiler patch to avoid useless float boxing #4558

Closed
vicuna opened this issue Jun 2, 2008 · 2 comments

Comments

Projects
None yet
2 participants
@vicuna
Copy link

commented Jun 2, 2008

Original bug ID: 4558
Reporter: pzimmer
Assigned to: @xavierleroy
Status: closed (set by @xavierleroy on 2010-04-29T12:25:51Z)
Resolution: open
Priority: normal
Severity: feature
Version: 3.10.2
Category: ~DO NOT USE (was: OCaml general)
Monitored by: wagerlabs @lefessan dvaillancourt BenediktGrundmann @mshinwell sweeks sds @hcarty @dbuenzli yminsky @Chris00 @yakobowski pzimmer @alainfrisch @mmottl

Bug description

I would like to propose two patches to the compiler that prevent some useless boxing and unboxing of intermediate floats.

The following code:

let f x = x -. 3.
let g x = 2. *. f (x +. 1.)

is compiled into the following CMM code:

(function camlTest13__g_63 (x/64: addr)
(alloc 2301
(*f 2.
(load float64u
(let x/192 (+f (load float64u x/64) 1.) (alloc 2301 (-f x/192 3.)))))))

While the compiler generally optimizes patterns like "unbox (box v) -> v", here it does not work because of the let that appears inbetween. This happens every time a function is called with a non-trivial argument and returns a float that is used for further computations. The attached patch diff1.txt is a trivial fix to implement the optimization over the "let". I am fairly convinced this patch is correct. FYI, in the previous example, it generates the following optimized code:

(function camlTest13__g_64 (x/65: addr)
(alloc 2301 (*f 2. (let x/190 (+f (load float64u x/65) 1.) (-f x/190 3.)))))

Here is the second example:

let min (x : float) y = if x < y then x else y
let f x = min (x +. 1.) (x *. 2.)

(function camlTest13__f_108 (x/109: addr)
(let
(y/181 (*f (load float64u x/109) 2.) y/174 (alloc 2301 y/181)
x/182 (+f (load float64u x/109) 1.) x/175 (alloc 2301 x/182))
(if (<f x/182 y/181) x/175 y/174)))

When the compiler needs to access both the value of a float and the same boxed float (usually as a return value), it generates code like: "let z = float_value in let zbox = box(z) in body". Note in the example above that two boxed floats are always allocated, even though we will only use one of them. Moreover, if "unbox(zbox)" appears in "body" (because of inlining), we could optimize by expanding the binding. Generally speaking, whenever "zbox" appears only once in "body" (but not in a loop), it is strictly better to expand the definition (if it appears twice, it might also be better to expand twice and potentially allocate twice but it is harder to know in advance).

The second proposed patch (see file diff2.txt, includes diff1.txt) implements this idea. Here is the new code for the example:

(function camlTest13__f_109 (x/110: addr)
(let
(y/179 (*f (load float64u x/110) 2.) x/180 (+f (load float64u x/110) 1.))
(if (<f x/180 y/179) (alloc 2301 x/180) (alloc 2301 y/179))))

All my test cases worked fine with this patch, but some other people should definitely consider it carefully before including it.

File attachments

@vicuna

This comment has been minimized.

Copy link
Author

commented Jun 11, 2008

Comment author: @mmottl

It seems there are a few more cases one can optimise in diff1.txt. Besides Clet, Cifthenelse, and Csequence, which are already considered, it would be great to also optimise Cswitch, Ctrywith and Ccatch (and any other cases we may have missed).

@vicuna

This comment has been minimized.

Copy link
Author

commented Aug 5, 2008

Comment author: @xavierleroy

Thanks for two very interesting suggestions. The first (unboxing across let and others) is really sweet; well spotted! I think it is almost always a win -- I can see a corner case on x86 32 bits where pushing the unbox down the branches of a conditional would be less efficient (because this platform has no float registers), but that's certainly negligible compared with other benefits. I have integrated it in the CVS working sources, generalized it to a few more constructs as suggested by Markus (but I don't think this makes much of a difference in practice), and applied it also to the unboxing of boxed integers.

I'm still evaluating the second suggestion (delayed boxing). For "min"-like functions, it's a win. For some straight-line functions, it can generate slightly less efficient code. Consider:

let test fn x =
let y = x +. 1.0 and z = x *. 2.0 in
fn z (fn x y)

If "x" and "y" are boxed at the point of "let", the two allocations are coalesced, resulting in just one test whether to call the GC. If the boxing is performed at point of use, we end up with two allocations in two different basic blocks, no coalescing, and two GC tests. On my (limited) tests, this doesn't seem to happen often: the number of GC check points globally decreases, but without noticeable performance gains. I keep this PR open until I have more data.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.