Cmm arithmetic optimisations #17

Closed
wants to merge 3 commits into
from

Conversation

Projects
None yet
6 participants
Contributor

stedolan commented Feb 18, 2014

Some optimisations on Cmm expressions involving tagging or constants. Here's a (best-case) example:

let int_of_digits a b c = 
  100 * (Char.code a - Char.code '0') + 
   10 * (Char.code b - Char.code '0') +
    1 * (Char.code c - Char.code '0')

Trunk compiles this to:

 (+
   (+ (+ (* 200 (>>s (+ a/1020 -96) 1)) (* 20 (>>s (+ b/1021 -96) 1)))
     (* 2 (>>s (+ c/1022 -96) 1)))
   1)

This branch compiles this to:

(+ (+ (+ ( * a/1020 100) ( * b/1021 10)) c/1022) -10766)

Floating all of the constant operations, tags, etc. out of arithmetic means they can be merged into one addition.

@stedolan stedolan Cmm arithmetic optimisations.
Constant additions and tagging operations are moved out of
subexpressions when possible. Often they can be merged.
0c520e4

@chambart chambart commented on an outdated diff Feb 19, 2014

asmcomp/cmmgen.ml
@@ -196,6 +194,11 @@ let untag_int = function
| Cop(Cor, [c; Cconst_int 1]) -> Cop(Casr, [c; Cconst_int 1])
| c -> Cop(Casr, [c; Cconst_int 1])
+let mul_int_tagged c1 c2 =
+ match (c1, c2) with
+ | (Cconst_int n1, c2) -> mul_int (untag_int c1) (decr_int c2)
+ | (_, _) -> mul_int (decr_int c1) (untag_int c2)
+
@chambart

chambart Feb 19, 2014

Contributor

From the name of the function, it is not obvious that the result is the tagged result minus one. You should probably move the 'incr_int' here.

@chambart chambart commented on an outdated diff Feb 19, 2014

asmcomp/cmmgen.ml
| (c1, c2) ->
Cop(Csubi, [c1; c2])
-let mul_int c1 c2 =
+let rec lsl_int c1 c2 =
+ match (c1, c2) with
+ (Cop(Clsl, [c; Cconst_int n1]), Cconst_int n2)
+ when n1 > 0 && n2 > 0 && n1 + n2 < size_int * 8 ->
+ Cop(Clsl, [c; Cconst_int (n1 + n2)])
+ | (Cop(Caddi, [c1; Cconst_int n1]), Cconst_int n2)
+ when no_overflow_lsl n1 n2 ->
@chambart

chambart Feb 19, 2014

Contributor

It is probably a good idea to check that n2 is positive (or do that in no_overflow_lsl), I am not certain that the behaviour of shift is the same on all architectures.

@chambart chambart commented on an outdated diff Feb 19, 2014

asmcomp/cmmgen.ml
@@ -133,16 +134,13 @@ let mul_int c1 c2 =
sub_int (Cconst_int 0) c
| (c, Cconst_int n) | (Cconst_int n, c) when n = 1 lsl Misc.log2 n->
Cop(Clsl, [c; Cconst_int(Misc.log2 n)])
@chambart

chambart Feb 19, 2014

Contributor

You should use lsl_int here

Contributor

stedolan commented Feb 23, 2014

Thanks for the quick response, sorry it took me a while to get back. I've fixed the things you pointed out.

A couple of days ago, you said that it might have O(n^2) behaviour. I'm fairly sure it doesn't: add_int only recurses when it actually finds constants, which will only happen at the root of an expression generated by these functions.

In a test, compiling a function f x = x + x + ... + x, I did observe O(n^2) runtimes. But add_int was only called O(n) times, and the problem exists on older versions of ocamlopt.

You can use Sys.word_size instead of redetecting it

Contributor

chambart commented Feb 23, 2014

I know, I deleted the comment for that reason... The quadratic behaviour for a long addition is probably due to register allocation (or more precisely constraints generation).

Now this patch looks ok to me.

Member

vouillon commented Oct 10, 2014

It would be great to optimize logical operations as well.

Contributor

bobot commented Nov 11, 2014

This PR have been discussed during the last developers' meeting, where I had the pleasure to be. To sum-up:

  • Style: add bar at the start of match
  • Overflow? seems good, no obvious mistake
  • Use Sys.word_size.
  • For the record: This optimization can break other optimization like let-introduction of shared terms but it should be a general win.
  • OK for integration when style and Sys.word_size will be fixed.
Contributor

DemiMarie commented Dec 17, 2014

Would it be possible for me to write & submit a fixed patch?

Contributor

stedolan commented Dec 18, 2014

@drbo no objection from me! Sorry I haven't had a chance to fix this up recently.

@stedolan stedolan Coding style fixes, finally.
Add | on initial match case, and use Sys.word_size
432ce7a
Contributor

stedolan commented Dec 29, 2014

Finally fixed this, sorry everyone for the delay.

Member

gasche commented Feb 7, 2015

Merged in trunk@15820, thanks!

gasche closed this Feb 7, 2015

@bactrian bactrian pushed a commit that referenced this pull request Feb 7, 2015

@gasche gasche Cmm arithmetic optimisations.
Constant additions and tagging operations are moved out of
subexpressions when possible. Often they can be merged.

From Stephen Dolan:
  #17

git-svn-id: http://caml.inria.fr/svn/ocaml/trunk@15820 f963ae5c-01c2-4b8c-9fe0-0dff7051ff02
81e8d5b

@nojb nojb added a commit to nojb/ocaml that referenced this pull request Apr 12, 2015

@gasche @nojb gasche + nojb Cmm arithmetic optimisations.
Constant additions and tagging operations are moved out of
subexpressions when possible. Often they can be merged.

From Stephen Dolan:
  ocaml#17

git-svn-id: http://caml.inria.fr/svn/ocaml/trunk@15820 f963ae5c-01c2-4b8c-9fe0-0dff7051ff02
5e094d6

@bactrian bactrian pushed a commit that referenced this pull request Apr 18, 2015

@gasche gasche Cmm arithmetic optimisations.
Constant additions and tagging operations are moved out of
subexpressions when possible. Often they can be merged.

From Stephen Dolan:
  #17

git-svn-id: http://caml.inria.fr/svn/ocaml/trunk@15820 f963ae5c-01c2-4b8c-9fe0-0dff7051ff02
e2907b8

@stedolan stedolan added a commit to stedolan/ocaml that referenced this pull request Aug 18, 2015

@stedolan stedolan Use CAML_STATIC_ASSERT only once per line.
Fixes #17. (hopefully!)
c2a54cc

@mshinwell mshinwell pushed a commit to mshinwell/ocaml that referenced this pull request Jul 1, 2016

@lpw25 lpw25 Merge pull request #17 from janestreet/patch/better-short-paths
Better short paths
be3b862

@lpw25 lpw25 pushed a commit to lpw25/ocaml that referenced this pull request Aug 22, 2016

@kayceesrk kayceesrk Merge pull request #17 from ocamllabs/clone
Added continuation cloning
d40f41e

stedolan deleted the stedolan:linear-constant-opts branch Mar 10, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment