Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[patch] Avoid boxing float/int32/int64 when doing direct call #5894

vicuna opened this issue Jan 17, 2013 · 5 comments

[patch] Avoid boxing float/int32/int64 when doing direct call #5894

vicuna opened this issue Jan 17, 2013 · 5 comments


Copy link

@vicuna vicuna commented Jan 17, 2013

Original bug ID: 5894
Reporter: @chambart
Status: acknowledged (set by @damiendoligez on 2013-06-19T18:48:03Z)
Resolution: open
Priority: normal
Severity: feature
Target version: later
Category: back end (clambda to assembly)
Tags: patch
Related to: #5917
Monitored by: meyer tgazagna meurer @diml @jmeber @hcarty

Bug description

Functions taking floating points parameters always receive them boxed.
When a direct call is done, it is possible to specialise the call to
pass those parameters in registers. This also applies to boxed integers.
The return value can also be unboxed.

This allow avoiding all allocations in a function like

let (+) = Nativeint.add
let (-) = Nativeint.sub
let rec fib n =
if n < 2n then 0n + 1n else fib(n-1n) + fib(n-2n)

Additional information

This patch take care of not unboxing parameters when it is possible that this
could increase the number of allocation.

For instance a function like this

let bad (x:float) =
ignore [x] (* force boxing of x *)

bad will not be specialised because its parameter is used boxed inside its body.

In a case like:

let rec f x y =
let _ = x +. 1. in
let _ = y +. 1. in
h 1. x

and g a b =
let _ = a+.1. in
let _ = b+.1. in
f b 1.

and h c d =
let _ = c +. 1. in
let _ = d +. 1. in
let _ = g 1. c in
let _ = g 1. d in
bad d;

only the y parameter of f and a of g will be unboxed.

The return value will be unboxed if every possible returned value is unboxed.

For instance

x +. y is considered unboxed


if b
then x +. y
else List.hd l

is considered boxed because List.hd return an already boxed value.

Current problems:

  • Constants are considered boxed because they are already boxed at
    compile time, but this may not be the right default choice. It can still be
    circumvented by things like 0n + 1n in the fibonacci example.

  • Specialised functions code is duplicated:

    • It is possible to know that some functions will never be used in direct call:
      those can be compiled only one time
    • specialisation could be forbidden over a certain size
    • the generic version of function specialised only for parameters (not return
      value) could be compiled by adding a stub before the specialised version.

File attachments

Copy link

@vicuna vicuna commented Jan 17, 2013

Comment author: @gasche

Have you tested the efficiency of this optimization on some of the float-heavy benchmark code that started flowing in (from Alain for example)?

Specialised functions code is duplicated

It should be possible to have the boxed version just unbox, and then call the unboxed version; that would result in little additional compiled code. But this is only efficient if the inliner reliably removes this indirection layer (which amounts to unboxing at call-site).

Can the inliner be relied upon for this? How much does this degrade the performances in the case where no inlining can happen anyway (no .cmx, functor application...)?

(A way to work around the "not sure if boxing will be needed" problem is to specialize to a function that takes two arguments, the boxed and the unboxed data. This is better than the boxing version because you don't have to unbox at definition site, could actually not have much overhead if the unboxed data is passed efficiently, and guarantees that no additional allocation is necessary -- it's essentially a specialization of the technique of "On the runtime complexity of type-directed unboxing", Garrigue and Minamide, 1998. I don't expect it to work in practice, but you never know.)

Copy link

@vicuna vicuna commented Jan 18, 2013

Comment author: @alainfrisch

This is great news. I fully support this project!

will not be specialised because its parameter is used boxed inside its body.

I'm not sure this is a good criterion, although it is in line with the current intra-function behavior (don't unbox if there is a potential use site in boxed form). It could very well be the case that the parameter might be used boxed inside the body, but only in rare cases (e.g. to print an error message if some invariant is broken). If this disable the optimization, it means the caller might need to box the parameter.

Have you tried to combine your patch with the strategy I propose to deal with boxing (#5204), i.e. box "lazily" (on demand + memoization)?

Copy link

@vicuna vicuna commented Jan 21, 2013

Comment author: @chambart

On Alain's bench on my computer:
without patch:
time: 21.0s
minor_words: 2243147047
promoted_words: 450541

with patch (and a few forgoten cases present in my git branch):
time: 20.1s
minor_words: 1007129106
promoted_words: 238960

Notice that more than half the running time is taken in __ieee754_exp

The majority of other float heavy benchmarks I encountered, where written with
a big while loop such that nothing was boxed.

I will update the patch with a few more things: function parameters
where not considered as unboxable float so a function like

let f x = let a = if Random.bool () then x else x in a +.a

was compiled allocating x and loading from it.

This is fixed in my git branch:

testing it with opam:

opam remote add comp
opam switch install trunk+floatarg

No the inliner cannot (yet) generate direct calls so a code like:

let apply f x = f x
let g x = apply (+.) x

will generate a generic call to (+.) I have another patch in preparation
for that kind of things.

The main interest of avoiding boxing is to avoid some allocation, passing both
parameters would loose that. The cost of loading from the boxed value is not
that much and you would still have to pay for if at each function call (reloading
spilled registers).

I didn't merge both patch yet, I think that I will need to change the
stategy a bit. I am not completely sure of which cases will not be worth
unboxing, maybe when we can show that every possible path in the function
use the value boxed.

Off topic: How do I remove an attached file ? Is the bug reporter
supposed to have the rights to do that ?

Copy link

@vicuna vicuna commented Dec 4, 2015

Comment author: @alainfrisch

Will the many changes since Jan. 2013, and the ones coming with flambda, this is going to take some work bringing this up to date. But it is certainly something worth pushing forward.

Copy link

@vicuna vicuna commented Sep 6, 2017

Comment author: @alainfrisch

I've started to work on a similar approach: #1322

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
1 participant