
Honor inline always even for functions with optional arguments and default values #12526

Merged


alainfrisch
Contributor

(Note: this is only about the Closure middle-end; I haven't checked Flambda.)

Functions with optional arguments and default values are split into an inner function and a wrapper, which is responsible for filling in missing arguments with their default values (and then tail-calling the inner function). The rationale is that the wrapper can then be inlined on call sites, avoiding in particular the need to wrap passed optional arguments with Some and deconstruct that Some immediately.
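To make the split concrete, here is a rough source-level picture of it. This is only an illustration under assumptions: the compiler performs the split on its Lambda IR rather than on OCaml source, and the names `f_inner` and `f_split` are hypothetical.

```ocaml
(* What the user writes. *)
let f ?(x = 1.) a b = a +. b *. x

(* Roughly what the split produces (hypothetical names; the real
   transformation happens on Lambda, not on source code). *)
let f_inner x a b = a +. b *. x

let f_split ?x a b =
  (* The wrapper fills in the default, then tail-calls the inner function. *)
  let x = match x with Some x -> x | None -> 1. in
  f_inner x a b
```

If `f_split ~x:2. 1. 1.` is inlined at a call site, the `Some 2.` built by the caller meets the `match` in the wrapper and the two can cancel, leaving a direct call `f_inner 2. 1. 1.`; the body of `f_inner` itself stays out of line.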

However, we don't want this split in two cases:

  • When the function is explicitly marked with inline always ==> we want to inline the whole function, not just the wrapper. (The way inlining currently works, after a split, the inner function is never inlined in the wrapper.)
  • When the function is explicitly marked with inline never ==> we should honor that, and not even inline the wrapper (which makes the split useless).

This PR disables the split in these cases. One could try to be more clever, and also avoid the split if the inlining heuristics trigger on the whole function, but this is trickier (the function size that determines whether we are below the inlining threshold is computed on Clambda, while the split is done at the Lambda level). Honoring the inline attribute is simple and could be seen as a bug fix.
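The rule as originally proposed can be modelled in a few lines. This is a hypothetical sketch: `inline_attribute` is a simplified stand-in for the compiler's own type (which has more cases), and `split_wrapper_as_proposed` is an illustrative name, not a function in the compiler.

```ocaml
(* Simplified stand-in for the compiler's inline attribute (hypothetical). *)
type inline_attribute = Always_inline | Never_inline | Default_inline

(* Rule as originally proposed: split a function with optional-argument
   defaults into wrapper + inner function, unless the user gave an
   explicit [@inline always] or [@inline never]. *)
let split_wrapper_as_proposed ~has_optional_defaults attr =
  has_optional_defaults
  && (match attr with
      | Always_inline | Never_inline -> false
      | Default_inline -> true)
```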

Here is a micro-benchmark to illustrate the benefits of this PR:

let[@ocaml.inline always] f ?(x = 1.) a b = a +. b *. x
let () = let r = ref 0. in for _ = 1 to 1000000000 do r := !r +. f 1. 1. done

With trunk, this takes 2.8s on my machine. With this PR, it takes 0.45s (6x faster), thanks to the float unboxing made possible by inlining.

Without the inline attributes, trunk and PR behave the same.

With inline never, we have trunk->2.7s and PR->2.9s. The slowdown is intentional; it corresponds to the lack of inlining of the wrapper with the PR.

Shall I create a non-regression test (perhaps based on observing Gc.allocated_bytes in the example above)?
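One possible shape for such a test, sketched here as an assumption rather than the test that was eventually written, is to measure `Gc.allocated_bytes` around the loop. The helper names `allocated_during` and `run` are hypothetical. Compiled natively with Closure and this PR, the fully inlined, unboxed loop should presumably allocate close to nothing (the calls to `Gc.allocated_bytes` themselves may box a float or two), whereas the split version boxes a float result on each call; bytecode allocates regardless, so such a check would only make sense for native code.

```ocaml
let[@ocaml.inline always] f ?(x = 1.) a b = a +. b *. x

(* Hypothetical helper: run [thunk] and report the bytes allocated meanwhile. *)
let allocated_during thunk =
  let before = Gc.allocated_bytes () in
  let result = thunk () in
  let after = Gc.allocated_bytes () in
  (result, after -. before)

(* The loop from the micro-benchmark, with a configurable iteration count. *)
let run n =
  let r = ref 0. in
  for _ = 1 to n do r := !r +. f 1. 1. done;
  !r

let () =
  let sum, bytes = allocated_during (fun () -> run 1_000_000) in
  Printf.printf "sum = %.1f, allocated = %.0f bytes\n" sum bytes
```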

@alainfrisch alainfrisch changed the title Honor ocaml.inline even for function with optional arguments and default values Honor ocaml.inline even for functions with optional arguments and default values Sep 1, 2023
@alainfrisch
Contributor Author

> With trunk, this takes 2.8s on my machine. With this PR, it takes 0.45s (6x faster), thanks to the float unboxing made possible by inlining.

Just tested with Flambda: it doesn't seem to be affected; the function is fully inlined (resulting in a constant), but surprisingly the code takes 0.9s to run.

Here is the cmm code for the loop with flambda:

             (if (> _for/306 2000000001) (exit 4)
               (catch rec (exit 5) with(5)
                 (let Paddfloat/312 (+f r/310 2.)
                   (assign r/310 Paddfloat/312))
                 (let *id_prev*/311 _for/306 (assign _for/306 (+ _for/306 2))
                   (if (== *id_prev*/311 2000000001) (exit 4) []))
                 (exit 5)))
           with(4) []))

and here is the code with Closure (and this PR):

           (if (> _for/276 2000000001) (exit 4)
             (catch rec (exit 5) with(5)
               (assign r/282
                         (+f r/282
                           (let x/281 "camlTmp.1"
                             (+f 1. (*f 1. (load float64 x/281))))))
               (let *id_prev*/283 _for/276 (assign _for/276 (+ _for/276 2))
                 (if (== *id_prev*/283 2000000001) (exit 4) []))
               (exit 5)))
         with(4) []))

(I'm a bit puzzled as to why this is twice as fast as the Flambda version...)

@lthls
Contributor

lthls commented Sep 4, 2023

Let's leave aside the "Just use flambda" answer and focus on this PR.
The part about not splitting functions that should be inlined makes sense. Using an inlining attribute instead of a proper inlining decision is unfortunate, but that's a consequence of how Closure is designed, so I think the choice of this PR is appropriate.
I don't understand why the special case for [@inline never] is there though. I could understand it for Flambda, which does inline these wrappers a bit too aggressively even though they might not be small (these default expressions can be arbitrarily big), but with Closure either these wrappers are small and it doesn't hurt to inline them, or they are not small and they won't be inlined anyway.

As a last note, I can confirm the weird performance results (where the Flambda code appears faster but runs slower), and the answer is just alignment issues. Both versions end up producing the same code for the loop (the reference is not used, so its contents are not even computed; this is detected in the Deadcode pass I believe), but the Closure version has a few more instructions before the loop. Changing the example to add a few more instructions before the loop even for Flambda makes it as fast as Closure (the number of instructions reported by perf is the same in all cases).
If I add an instruction that uses the result at the end, then I get the same time with Closure and Flambda (although the Closure one executes about 50% more instructions, as expected).
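A variant of the benchmark that consumes the result might look like the following. The exact change lthls made is not shown in the thread, so printing `!r` is an assumption; the iteration count is also made configurable here (the original used 1_000_000_000), and `sum` and `n` are illustrative names.

```ocaml
let[@ocaml.inline always] f ?(x = 1.) a b = a +. b *. x

(* Iteration count: taken from the command line if given, otherwise a
   smaller default than the 1_000_000_000 of the original benchmark. *)
let n =
  match Sys.argv with
  | [| _; count |] -> Option.value (int_of_string_opt count) ~default:1_000_000
  | _ -> 1_000_000

let sum n =
  let r = ref 0. in
  for _ = 1 to n do r := !r +. f 1. 1. done;
  !r

(* Printing the result means the contents of [r] must actually be computed,
   so the loop body cannot be discarded as dead code. *)
let () = Printf.printf "%f\n" (sum n)
```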

@alainfrisch
Contributor Author

> I don't understand why the special case for [@inline never] is there though.

Well, it's just that if the user asks to avoid inlining, it's better to honor the request. One might argue that it could be interpreted as "don't inline the body, but ok to inline the wrapper". But that doesn't fit immediately in the current code, which simply ignores the attribute (whatever its payload is) after the split (because we end up with several functions, and the attribute is only supported for a single function). But I think that what is implemented in the current PR is more faithful to the author's request of not inlining.

@lthls
Contributor

lthls commented Sep 4, 2023

> Well, it's just that if the user asks to avoid inlining, it's better to honor the request. One might argue that it could be interpreted as "don't inline the body, but ok to inline the wrapper".

Yes, that's what Flambda does.

> But that doesn't fit immediately in the current code, which simply ignores the attribute (whatever its payload is) after the split (because we end up with several functions, and the attribute is only supported for a single function).

I'm not sure what you meant by that, but the attribute for the original function is faithfully propagated to the inner function. For Closure, this doesn't matter, as the only call to the inner function is in the wrapper and Closure can't inline the worker in the wrapper anyway.
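The propagation described here can be pictured at the source level as follows. This is a sketch under assumptions: the real split happens on Lambda, and `g_inner` and `g_split` are hypothetical names.

```ocaml
(* What the user writes. *)
let[@inline never] g ?(x = 1.) a b = a +. b *. x

(* Rough shape after the split (hypothetical names): the attribute stays
   on the inner function, and the wrapper carries none. *)
let[@inline never] g_inner x a b = a +. b *. x

let g_split ?x a b =
  g_inner (match x with Some x -> x | None -> 1.) a b
```

Under Closure, `g_inner` is only reached through the wrapper, and Closure does not inline it there anyway, so its `[@inline never]` changes nothing; the unannotated wrapper can still be inlined at call sites when it is small.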

> But I think that what is implemented in the current PR is more faithful to the author's request of not inlining.

I don't agree with that, and I don't think we should change something that is not broken without more evidence that it bothers users.
We received a number of complaints about code that was inlined where it wasn't meant to be, but I don't remember any such case involving default wrappers (Flambda does inline them even for functions with [@inline never]).

@alainfrisch
Contributor Author

Ok, I've pushed a commit so that the behavior only changes in the inline always case.

FTR, with Closure and a function with optional arguments (with default values):

  • If no inline attribute: the wrapper can be inlined on call sites, but never the body.
  • In case of inline never... : same!
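The merged behaviour summarised above can be restated as a small lookup in code. This is a hypothetical model for illustration only: `inline_attribute` is a simplified stand-in for the compiler's own type, and `inlining_under_closure` is not a compiler function.

```ocaml
type inline_attribute = Always_inline | Never_inline | Default_inline

type call_site_inlining = Whole_function | At_most_wrapper

(* Closure after this PR, for a function with optional-argument defaults. *)
let inlining_under_closure = function
  | Always_inline -> Whole_function
      (* no split, so [@inline always] applies to the whole function *)
  | Never_inline | Default_inline -> At_most_wrapper
      (* split: the wrapper may be inlined if small, the body never is *)
```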

I don't find that particularly useful, but I don't really care, and I can see the benefit of having a behavior slightly closer to flambda.

@alainfrisch
Contributor Author

@lthls : I added a changelog entry with you as a reviewer, but it's not clear to me whether you actually reviewed the code change or only agreed in principle with its description. Let me know!

@alainfrisch alainfrisch force-pushed the afrisch_inline_function_with_opt_args branch from adf48aa to 0803762 September 6, 2023 20:51
@alainfrisch
Contributor Author

Do you think I should add a non-regression test (e.g. tracking memory allocation in my example)? I don't see any actual tests of inlining in the testsuite.

@alainfrisch alainfrisch changed the title Honor ocaml.inline even for functions with optional arguments and default values Honor inline always even for functions with optional arguments and default values Sep 6, 2023
Contributor

@lthls lthls left a comment


There's a test for optimisations around optional arguments in testsuite/tests/asmcomp/optargs.ml. You could add something with inlining attributes in this file if you have a test in mind.
And since the discussion here may prove interesting to readers of the code, you might want to add a reference to this PR in the comment that you've updated.
I'm fine with the PR as it currently stands though.

@alainfrisch alainfrisch force-pushed the afrisch_inline_function_with_opt_args branch from 0803762 to 665f5ae September 7, 2023 19:30
@alainfrisch
Contributor Author

> There's a test for optimisations around optional arguments in testsuite/tests/asmcomp/optargs.ml. You could add something with inlining attributes in this file if you have a test in mind.

Thanks for reminding me about the test I wrote :-) I've extended it to check non-regression for the problem fixed by this PR. I've rewritten the history of the branch to put the test first. This makes it easy to check that the test fails before the fix (and succeeds after!). The PR should be squash-merged to avoid keeping the commit that doesn't pass the testsuite.

@lthls : if you are happy with the PR, can you merge it?

@alainfrisch alainfrisch force-pushed the afrisch_inline_function_with_opt_args branch 4 times, most recently from 154290d to d80896d September 7, 2023 19:40
@alainfrisch alainfrisch force-pushed the afrisch_inline_function_with_opt_args branch from d80896d to 3535c36 September 7, 2023 19:41
@lthls lthls merged commit fcf87c4 into ocaml:trunk Sep 7, 2023
9 checks passed
eutro pushed a commit to eutro/ocaml that referenced this pull request Sep 17, 2023
…default values (ocaml#12526)

* Add a non regression test

* Honor ocaml.inline even for function with option arguments and default values

---------

Co-authored-by: Alain Frisch <alain@frisch>