Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lazily box float and int references #109

Closed
wants to merge 1 commit into from

Conversation

Projects
None yet
4 participants
@vbrankov
Copy link

commented Oct 24, 2014

An argument follows that the penalty for early boxing of floats and ints is bigger than lazy boxing. It is supported by a benchmark of a production application.

Cmmgen.transl_unbox_let will unbox a float reference if it only exists in:

  1. Left side of assignments: x := <value>
  2. Its value is read in an operation, because operations accept unboxed values: !x +. 5.

Unboxing a float reference means that the variable will contain the very value of float. This is okay since the variable will never end up in OCaml heap. This gives a significant speedup since assigning it a value will never result in memory allocation. For instance in s := !s +. a *. b, the aggregate cost of allocating and cleaning a float will be considerable compared to the cost of computing the right side.

Unboxing will not occur if the value is read but not in an operation. This is done in language constructs that accept only boxed references. Probably the most frequent situations are:

  1. Using the value as an argument to a call: let y = foo !x in .... Call to "float" native functions are excluded (exp, sin, cos), because they accept unboxed floats.
  2. Assigning the value to a variable: let y = !x in
  3. Returning the value:
let foo () =
  let s = ref 0. in
  ...
  !s

I'm advocating the approach of lazy boxing, which means that float references are always unboxed. For the language constructs which accept boxed floats, the value is boxed in place. This is advantageous to early boxing: more likely is that assignments will be much more frequent than such language constructs. In the unlikely situation where this approach is indeed slower because of frequent boxing, it is much easier to write a workaround than with the current compiler:

let s = ref 0. in
...
let s_value = !s in
while true do
  foo s_value (* instead of !s *)
  ...

I've seen a lot of slow code because the compiler chooses not to unbox a reference due to a single call or assignment. Here's an example illustrating it:

https://github.com/vbrankov/ocaml-demos/blob/master/lazy-boxing/test.ml

A benchmark of a production application shows speed improvement and less garbage:

Name Time/Run mWd/Run mjWd/Run Prom/Run Percentage
no patch 5.12s 1.71Gw 2.55Mw 2.32Mw 100.00%
with patch 5.01s 1.61Gw 2.45Mw 2.22Mw 100.00%
Vladimir Brankov
Lazily box float and int references because it is assumed that the pe…
…nalty for early boxing is bigger. Late boxing most often occurs when the reference value is used as an argument to a function call, when it is assigned to a record field or when the value is returned. Early boxing most often occurs in assignment, where the speed of boxing and consequently cleaning the garbage can often dwarf the speed of computing the right side of assignment.
@Chris00

This comment has been minimized.

Copy link
Member

commented Oct 25, 2014

Using the value as an argument to a call: let y = foo !x in .... Call to "float" native functions are excluded (exp, sin, cos), because they accept unboxed floats.

I always wondered if the compiler should not produce two versions of some functions: for, say, let f x = x +. sin(x**2.), shouldn't f be compiled to a version accepting and returning unboxed floats, the boxing occurring only when needed (thus in essence f is like a primitive function with the "float" qualifier).

Assigning the value to a variable: let y = !x in

Shouldn't this be recursive? If y can be unboxed, then one may want to do it for x too.

Returning the value:

This is more tricky. See http://caml.inria.fr/mantis/view.php?id=5133

@vbrankov

This comment has been minimized.

Copy link
Author

commented Oct 25, 2014

The bug 5133 should be fixed with this patch.

@alainfrisch

This comment has been minimized.

Copy link
Contributor

commented Nov 2, 2014

See also http://caml.inria.fr/mantis/view.php?id=5204 and the more_unboxing branch in OCaml SVN. The idea there was to box lazily, by keeping a reference to the boxed value for future use (before it is re-used, its content is compared to the current unboxed variable). This ensures that (i) there is never more boxing than with the current strategy (ii) there is no boxing if the boxed value is not actually required.

@alainfrisch

This comment has been minimized.

Copy link
Contributor

commented Jun 2, 2015

let s_value = !s in

Why are references (turned in local mutable variables) treated differently from non-mutable variables in your strategy? It would make sense to apply the same treatment to non-mutable float variables (bound to an expression which yields an unboxed float), no? In which case your work-around above (to force boxing) would not work anymore, but this still seem a better default. One could argue that a non-mutable variable would be boxed at most once in the current strategy, while it could be boxed several times if we delay the boxing, but the same is also the case for mutable variables, and a general way to address that would be the strategy I suggest (i.e. box really lazily, i.e. by caching the boxed value computed on demand).

@alainfrisch

This comment has been minimized.

Copy link
Contributor

commented Jul 10, 2015

This has been discussed yesterday during a caml-devel meeting, and there is an agreement in principle about switching to the proposed strategy (without my "caching" variant). I'll do some more tests on LexiFi's numerical code to see the impact on performance and if everything goes well, I'll merge the patch.

@alainfrisch

This comment has been minimized.

Copy link
Contributor

commented Jul 17, 2015

I cannot observe big impacts on LexiFi's tests, performance wise (perhaps partly because we optimized around the previous unboxing scheme; but also because most of the time is spent in actual computation, not boxing). There is a consistent gain in terms of total allocations, which is good anyway. And I trust @vbrankov that on some real applications, the gain in runtime could be noticeable (this can be trivially seen on micro-benchmarks and some real-world code is not so different from such tight loops). Moreover, the change makes the optimizer more predictable and avoids the need/temptation to add "... +. 0.", while simplifying the code of the optimizer itself.

I'll merge the patch.

It could be interesting to refine the new strategy to box only once, as before, when the identifier is not mutated, but the boxed value is used (potentially/certainly) multiple times. I doubt this would lead to significant improvements, though (although again, micro-benchmarks could show the difference).

@alainfrisch

This comment has been minimized.

Copy link
Contributor

commented Jul 17, 2015

Merged in trunk, commit 16215.

@gasche gasche closed this Jul 18, 2015

@vbrankov

This comment has been minimized.

Copy link
Author

commented Jul 22, 2015

Why are references (turned in local mutable variables) treated differently from non-mutable variables in your strategy? It would make sense to apply the same treatment to non-mutable float variables (bound to an expression which yields an unboxed float), no? In which case your work-around above (to force boxing) would not work anymore, but this still seem a better default. One could argue that a non-mutable variable would be boxed at most once in the current strategy, while it could be boxed several times if we delay the boxing, but the same is also the case for mutable variables, and a general way to address that would be the strategy I suggest (i.e. box really lazily, i.e. by caching the boxed value computed on demand).

In the current strategy, cmmgen.ml determines that a variable is float only if it is a result of a float operation. In the case of my workaround the variable is a result of dereferencing so the value is boxed. I completely agree with you that the compiler should use more clues to determine whether a variable is float, which would break the workaround. I already discussed it with @lpw25 and @def-lkb to make the compiler smarter in that matter but as far as I know it's still sitting on our to-do lists.

Apologies for a delayed response.

@alainfrisch

This comment has been minimized.

Copy link
Contributor

commented Jul 28, 2015

@vrbankov : can you have a look at http://caml.inria.fr/mantis/view.php?id=6940 and review the attached patch?

@alainfrisch

This comment has been minimized.

Copy link
Contributor

commented Oct 16, 2015

This new unboxing strategy, which decides whether to unbox an id or not purely based on the bound expression, actually enables to simplify the whole unboxing scheme, by doing it "on the fly" instead of requiring a post processing part. See http://caml.inria.fr/mantis/view.php?id=7022 and https://github.com/alainfrisch/ocaml/tree/unbox_earlier .

@vbrankov

This comment has been minimized.

Copy link
Author

commented Oct 27, 2015

I'm sorry Alain, I completely missed your email from Jul 28th. I'll review the code right away.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.