Use native_compute rather than vm_compute #394

JasonGross · 2018-07-13T21:48:04Z

This should hopefully speed things up. See also coq/coq#8055.

Except it doesn't speed things up (maybe because we are frequently computing enormous PHOAS syntax trees?), so we probably shouldn't merge this. cc @ppedrot

After     | File Name                                          | Before    || Change    | % Change
--------------------------------------------------------------------------------------------------
15m44.56s | Total                                              | 10m54.17s || +4m50.38s | +44.39%
--------------------------------------------------------------------------------------------------
11m49.08s | Experiments/NewPipeline/SlowPrimeSynthesisExamples | 6m50.36s  || +4m58.72s | +72.79%
2m14.24s  | Experiments/NewPipeline/Toplevel2                  | 2m23.03s  || -0m08.78s | -6.14%
1m37.61s  | Experiments/NewPipeline/Toplevel1                  | 1m37.18s  || +0m00.42s | +0.44%
0m01.31s  | Experiments/NewPipeline/CLI                        | 0m01.30s  || +0m00.01s | +0.76%
0m01.20s  | Experiments/NewPipeline/StandaloneOCamlMain        | 0m01.20s  || +0m00.00s | +0.00%
0m01.12s  | Experiments/NewPipeline/StandaloneHaskellMain      | 0m01.10s  || +0m00.02s | +1.81%

ppedrot · 2018-07-15T16:59:55Z

For some reason, fiat-crypto at f79fdb7 doesn't compile on my machine, and thus I cannot observe the slow native compilation instances.

I get an error:

make: execvp: /bin/sh : Argument list too long

Any idea on how to work around this problem?

JasonGross · 2018-07-15T20:24:00Z

Try git rm -rf src/Specific && make update-_CoqProject? You might have to manually rm Makefile.coq or whatever also.

JasonGross · 2018-07-15T20:24:53Z

(It is probably because some line in coq_makefile's Makefile is unhappy with ~5000 .v files in _CoqProject.)

ppedrot · 2018-07-16T09:01:04Z

Still not able to compile it. Now it seems there are missing dependency rules regarding coqprime, but I don't see anything to do from reading the diffs or the makefile.

ppedrot · 2018-07-16T13:35:02Z

OK, I managed to compile it at last. There seems indeed to be an important overhead due to native compilation. I can try to submit a patch working with OCaml 4.06+ so that you can check whether things get noticeably better.

ppedrot · 2018-07-16T14:03:48Z

@JasonGross could you try timing with native-fast-is-accu+linscan?

ppedrot · 2018-07-16T14:05:56Z

Forgot to mention but you need OCaml >= 4.06.0 to take advantage of this.

JasonGross · 2018-07-16T14:09:20Z

@ppedrot Did you forget to git submodule update --init? (re coqprime issues)

Building now with that branch, output of configure is

You have OCaml 4.06.0. Good!
You have OCamlfind 1.7.3. Good!
You have Camlp5 7.05. Good!
You have native-code compilation. Good!
You have the Num library installed. Good!
CoqIde manually disabled:
=> no CoqIde will be built.

  Architecture                : Linux
  Coq VM bytecode link flags  : -dllib -lcoqrun -dllpath /home/jgross/Documents/repos/coq-native-fast-is-accu+linscan/kernel/byterun
  Other bytecode link flags   :
  OCaml version               : 4.06.0
  OCaml binaries in           : /home/jgross/.opam/4.06.0+fp/bin/
  OCaml library in            : /home/jgross/.opam/4.06.0+fp/lib/ocaml
  OCaml flambda flags         :
  Camlp5 version              : 7.05
  Camlp5 binaries in          : /home/jgross/.opam/4.06.0+fp/bin
  Camlp5 library in           : /home/jgross/.opam/4.06.0+fp/lib/camlp5
  Native dynamic link support : true
  CoqIde                      : no
  Documentation               : None
  Web browser                 : firefox -remote "OpenURL(%s,new-tab)" || firefox %s &
  Coq web site                : http://coq.inria.fr/
  Bytecode VM enabled         : true
  Native Compiler enabled     : true

  Local build, no installation...

If anything is wrong above, please restart './configure'.

*Warning* To compile the system for a new architecture
          don't forget to do a 'make clean' before './configure'.

ppedrot · 2018-07-16T14:20:38Z

@JasonGross Yes, my submodules were in some crappy state. I never quite understood how this whole mess is supposed to work, they're really a wart in my git mental model.

The configure output looks OK, go ahead.

JasonGross · 2018-07-16T14:25:31Z

@ppedrot It fails with

ocamlopt.opt: unknown option '-Oclassic'.
Usage: ocamlopt <options> <files>
Options are:
  -fPIC  Generate position-independent machine code (default)
  -fno-PIC  Generate position-dependent machine code
  -a  Build a library
  -absname  Show absolute filenames in error messages
  -annot  Save information in <filename>.annot
  -bin-annot  Save typedtree in <filename>.cmt
  -c  Compile only (do not link)
  -cc <command>  Use <command> as the C compiler and linker
  -cclib <opt>  Pass option <opt> to the C linker
  -ccopt <opt>  Pass option <opt> to the C compiler and linker
  -compact  Optimize code size rather than speed
  -config  Print configuration values and exit
  -dtypes  (deprecated) same as -annot
  -for-pack <ident>  Generate code that can later be `packed' with
     ocamlopt -pack -o <ident>.cmx
  -g  Record debugging information for exception backtrace
  -i  Print inferred interface
  -I <dir>  Add <dir> to the list of include directories
  -impl <file>  Compile <file> as a .ml file
  -inline <n>  Set aggressiveness of inlining to <n>
  -intf <file>  Compile <file> as a .mli file
  -intf-suffix <string>  Suffix for interface files (default: .mli)
  -keep-docs  Keep documentation strings in .cmi files
  -keep-locs  Keep locations in .cmi files
  -labels  Use commuting label mode
  -linkall  Link all modules, even unused ones
  -no-alias-deps  Do not record dependencies for module aliases
  -no-app-funct  Deactivate applicative functors
  -no-float-const-prop  Deactivate constant propagation for floating-point operations
  -noassert  Do not compile assertion checks
  -noautolink  Do not automatically link C libraries specified in .cmxa files
  -nodynlink  Enable optimizations for code that will not be dynlinked
  -nolabels  Ignore non-optional labels in types
  -nostdlib  Do not add default directory to the list of include directories
  -o <file>  Set output file name to <file>
  -open <module>  Opens the module <module> before typing
  -output-obj  Output an object file instead of an executable
  -output-complete-obj  Output an object file, including runtime, instead of an executable
  -p  Compile and link with profiling support for "gprof"
     (not supported on all platforms)
  -pack  Package the given .cmx files into one .cmx
  -pp <command>  Pipe sources through preprocessor <command>
  -ppx <command>  Pipe abstract syntax trees through preprocessor <command>
  -principal  Check principality of type inference
  -rectypes  Allow arbitrary recursive types
  -runtime-variant <str>  Use the <str> variant of the run-time system
  -S  Keep intermediate assembly file
  -safe-string  Make strings immutable
  -shared  Produce a dynlinkable plugin
  -short-paths  Shorten paths in types
  -strict-sequence  Left-hand part of a sequence must have type unit
  -strict-formats  Reject invalid formats accepted by legacy implementations
     (Warning: Invalid formats may behave differently from
      previous OCaml versions, and will become always-rejected
      in future OCaml versions. You should use this flag
      to detect and fix invalid formats.)
  -thread  Generate code that supports the system threads library
  -unsafe  Do not compile bounds checking on array and string access
  -unsafe-string  Make strings mutable (default)
  -v  Print compiler version and location of standard library and exit
  -verbose  Print calls to external commands
  -version  Print version and exit
  -vnum  Print version number and exit
  -w <list>  Enable or disable warnings according to <list>:
        +<spec>   enable warnings in <spec>
        -<spec>   disable warnings in <spec>
        @<spec>   enable warnings in <spec> and treat them as errors
     <spec> can be:
        <num>             a single warning number
        <num1>..<num2>    a range of consecutive warning numbers
        <letter>          a predefined set
     default setting is "+a-4-6-7-9-27-29-32..39-41..42-44-45-48-50"
  -warn-error <list>  Enable or disable error status for warnings according
     to <list>.  See option -w for the syntax of <list>.
     Default setting is "-a"
  -warn-help  Show description of warning numbers
  -where  Print location of standard library and exit
  - <file>  Treat <file> as a file name (even if it starts with `-')
  -nopervasives  (undocumented)
  -dsource  (undocumented)
  -dparsetree  (undocumented)
  -dtypedtree  (undocumented)
  -drawlambda  (undocumented)
  -dlambda  (undocumented)
  -dclambda  (undocumented)
  -dcmm  (undocumented)
  -dsel  (undocumented)
  -dcombine  (undocumented)
  -dcse  (undocumented)
  -dlive  (undocumented)
  -dspill  (undocumented)
  -dsplit  (undocumented)
  -dinterf  (undocumented)
  -dprefer  (undocumented)
  -dalloc  (undocumented)
  -dreload  (undocumented)
  -dscheduling  (undocumented)
  -dlinear  (undocumented)
  -dstartup  (undocumented)
  -opaque  Does not generate cross-module optimization information
     (reduces necessary recompilation on module change)
  -help  Display this list of options
  --help  Display this list of options
File "./src/Experiments/NewPipeline/Toplevel2.v", line 92, characters 9-44:
Warning: Native compiler exited with status 2
[native-compiler-failed,native-compiler]
Finished failing transaction in 1.465 secs (1.38u,0.047s) (failure)
File "./src/Experiments/NewPipeline/Toplevel2.v", line 92, characters 9-44:
Error: Anomaly "Compilation failure."
Please report at http://coq.inria.fr/bugs/.

Command exited with non-zero status 129
src/Experiments/NewPipeline/Toplevel2 (real: 2.74, user: 2.37, sys: 0.36, mem: 658172 ko)
Makefile.coq:656: recipe for target 'src/Experiments/NewPipeline/Toplevel2.vo' failed
make: *** [src/Experiments/NewPipeline/Toplevel2.vo] Error 129

JasonGross · 2018-07-16T18:21:26Z

Oops, I forgot to eval `opam config env`.

@ppedrot I'm not sure how much of a fair comparison this is, given that I have the linscan branch on 4.06.0+fp, and everything else on 4.02.3, but there are some things that got better and other things that got worse, and overall it's a wash.

linscan   | File Name                                          | 8.8       || Change    | % Change
--------------------------------------------------------------------------------------------------
15m43.54s | Total                                              | 15m44.56s || -0m01.01s | -0.10%  
--------------------------------------------------------------------------------------------------
1m52.82s  | Experiments/NewPipeline/Toplevel2                  | 2m14.24s  || -0m21.42s | -15.95% 
12m06.26s | Experiments/NewPipeline/SlowPrimeSynthesisExamples | 11m49.08s || +0m17.17s | +2.42%  
1m41.45s  | Experiments/NewPipeline/Toplevel1                  | 1m37.61s  || +0m03.84s | +3.93%  
0m01.12s  | Experiments/NewPipeline/CLI                        | 0m01.31s  || -0m00.18s | -14.50% 
0m01.00s  | Experiments/NewPipeline/StandaloneHaskellMain      | 0m01.12s  || -0m00.12s | -10.71% 
0m00.90s  | Experiments/NewPipeline/StandaloneOCamlMain        | 0m01.20s  || -0m00.29s | -24.99%

maximedenes · 2018-07-17T12:45:04Z

I have the linscan branch on 4.06.0+fp

fp can have significant cost, so it would be nice to have a more fair comparison (4.07.0 with linscan vs 4.07.0 without, for example).

ppedrot · 2018-07-17T12:51:07Z

@maximedenes Really? I though fp essentially made one register unavailable. So at worst it should be on par with the linscan flag.

maximedenes · 2018-07-17T12:54:06Z

Really? I though fp essentially made one register unavailable. So at worst it should be on par with the linscan flag.

Yeah, but they add. And things are not linear there, probably spilling vs not spilling at all matters, also not all registers play the same role in the instruction set, etc.

So it would be safer to make a more direct comparison.

ppedrot · 2018-07-18T13:57:25Z

@JasonGross Not sure if my correct, but the commented first example marked too long in SlowPrimeSynthesisExamples is not slow because of vm/native reduction but because it's taking all of its time in kernel reduction. I suspect that this is due to some fancy semantics of Ltac, as I have discovered that the following 2-line patch makes the script reach native compilation instantly (and then run out of memory while computing):

diff --git a/src/Experiments/NewPipeline/Toplevel1.v b/src/Experiments/NewPipeline/Toplevel1.v
index dde6bd6c4..9dd4e23f6 100644
--- a/src/Experiments/NewPipeline/Toplevel1.v
+++ b/src/Experiments/NewPipeline/Toplevel1.v
@@ -1563,7 +1563,7 @@ Module Import UnsaturatedSolinas.
 End UnsaturatedSolinas.
 
 Ltac peel_interp_app _ :=
-  lazymatch goal with
+  [> lazymatch goal with
   | [ |- ?R' (?InterpE ?arg) (?f ?arg) ]
     => apply fg_equal_rel; [ | reflexivity ];
        try peel_interp_app ()
@@ -1585,7 +1585,7 @@ Ltac peel_interp_app _ :=
                  => let rc := constr:(GallinaReify.Reify c) in
                     unify ev rc; reflexivity
                end ] ]
-  end.
+  end ].
 Ltac pre_cache_reify _ :=
   cbv [type.app_curried];
   let arg := fresh "arg" in

Is the patch correct? If so, maybe there is something to find out.

JasonGross · 2018-07-18T14:19:09Z

That patch looks correct. It shouldn't change behavior except in cases where the new version errors with an incorrect number of goals, so if that's not happening, it seems very sketchy that it should change performance. Can you figure out what's going on?

JasonGross · 2018-07-18T17:24:17Z

For historians: Turns out the confusion here was that Print Ltac prints constr:(ltac:(...)) as _. And it was a reflexivity after unify that was taking forever.

@ppedrot

With some help from @ppedrot in tracking things down. C.f. #394 (comment) After | File Name | Before || Change | % Change ------------------------------------------------------------------------------------------------ 8m48.43s | Total | 9m07.60s || -0m19.16s | -3.50% ------------------------------------------------------------------------------------------------ 5m35.98s | Experiments/NewPipeline/SlowPrimeSynthesisExamples | 5m45.63s || -0m09.64s | -2.79% 1m29.20s | Experiments/NewPipeline/Toplevel2 | 1m38.67s || -0m09.46s | -9.59% 1m39.78s | Experiments/NewPipeline/Toplevel1 | 1m39.77s || +0m00.00s | +0.01% 0m01.18s | Experiments/NewPipeline/CLI | 0m01.31s || -0m00.13s | -9.92% 0m01.16s | Experiments/NewPipeline/StandaloneHaskellMain | 0m01.09s || +0m00.06s | +6.42% 0m01.14s | Experiments/NewPipeline/StandaloneOCamlMain | 0m01.14s || +0m00.00s | +0.00%

This should hopefully speed things up. See also coq/coq#8055. Before thunking nat_rect: After | File Name | Before || Change | % Change -------------------------------------------------------------------------------------------------- 15m44.56s | Total | 10m54.17s || +4m50.38s | +44.39% -------------------------------------------------------------------------------------------------- 11m49.08s | Experiments/NewPipeline/SlowPrimeSynthesisExamples | 6m50.36s || +4m58.72s | +72.79% 2m14.24s | Experiments/NewPipeline/Toplevel2 | 2m23.03s || -0m08.78s | -6.14% 1m37.61s | Experiments/NewPipeline/Toplevel1 | 1m37.18s || +0m00.42s | +0.44% 0m01.31s | Experiments/NewPipeline/CLI | 0m01.30s || +0m00.01s | +0.76% 0m01.20s | Experiments/NewPipeline/StandaloneOCamlMain | 0m01.20s || +0m00.00s | +0.00% 0m01.12s | Experiments/NewPipeline/StandaloneHaskellMain | 0m01.10s || +0m00.02s | +1.81% After thunking nat_rect After | File Name | Before || Change | % Change ------------------------------------------------------------------------------------------------- 14m43.99s | Total | 8m48.75s || +5m55.24s | +67.18% ------------------------------------------------------------------------------------------------- 11m14.49s | Experiments/NewPipeline/SlowPrimeSynthesisExamples | 5m35.50s || +5m38.99s | +101.04% 1m46.30s | Experiments/NewPipeline/Toplevel2 | 1m29.24s || +0m17.06s | +19.11% 1m39.51s | Experiments/NewPipeline/Toplevel1 | 1m40.40s || -0m00.89s | -0.88% 0m01.28s | Experiments/NewPipeline/CLI | 0m01.22s || +0m00.06s | +4.91% 0m01.21s | Experiments/NewPipeline/StandaloneHaskellMain | 0m01.18s || +0m00.03s | +2.54% 0m01.20s | Experiments/NewPipeline/StandaloneOCamlMain | 0m01.21s || -0m00.01s | -0.82%

ppedrot · 2018-07-19T12:22:44Z

I have reported an algorithmic bug in the OCaml compiler that affects native_compute in fiat-crypto: https://caml.inria.fr/mantis/view.php?id=7826.

JasonGross · 2018-07-19T15:01:10Z

Thanks! I guess this is why it takes forever (a couple of minutes) to compile the extracted code. I wonder if Haskell has a similar issue; it takes ghc longer than it takes ocamlopt.

idkvpz · 2018-12-26T10:18:26Z

this is helpful

JasonGross · 2020-04-24T17:35:38Z

This is old and operates on outdated code, so I'm going to close it

JasonGross added the work in progress label Jul 13, 2018

JasonGross added this to To Do in Pipeline without CPS via automation Jul 13, 2018

JasonGross mentioned this pull request Jul 13, 2018

native_compute (and vm_compute) is about 14x-16x slower (about 25x-27x, resp.) than extraction+ocamlopt coq/coq#8019

Open

JasonGross force-pushed the use-native-compute branch from f09f397 to 5ae53b1 Compare July 18, 2018 23:33

JasonGross closed this Apr 24, 2020

JasonGross deleted the use-native-compute branch April 24, 2020 17:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use native_compute rather than vm_compute #394

Use native_compute rather than vm_compute #394

JasonGross commented Jul 13, 2018

ppedrot commented Jul 15, 2018

JasonGross commented Jul 15, 2018

JasonGross commented Jul 15, 2018

ppedrot commented Jul 16, 2018

ppedrot commented Jul 16, 2018

ppedrot commented Jul 16, 2018

ppedrot commented Jul 16, 2018

JasonGross commented Jul 16, 2018

ppedrot commented Jul 16, 2018

JasonGross commented Jul 16, 2018

JasonGross commented Jul 16, 2018

maximedenes commented Jul 17, 2018

ppedrot commented Jul 17, 2018

maximedenes commented Jul 17, 2018

ppedrot commented Jul 18, 2018

JasonGross commented Jul 18, 2018

JasonGross commented Jul 18, 2018

ppedrot commented Jul 19, 2018

JasonGross commented Jul 19, 2018

idkvpz commented Dec 26, 2018

JasonGross commented Apr 24, 2020

Use native_compute rather than vm_compute #394

Use native_compute rather than vm_compute #394

Conversation

JasonGross commented Jul 13, 2018

ppedrot commented Jul 15, 2018

JasonGross commented Jul 15, 2018

JasonGross commented Jul 15, 2018

ppedrot commented Jul 16, 2018

ppedrot commented Jul 16, 2018

ppedrot commented Jul 16, 2018

ppedrot commented Jul 16, 2018

JasonGross commented Jul 16, 2018

ppedrot commented Jul 16, 2018

JasonGross commented Jul 16, 2018

JasonGross commented Jul 16, 2018

maximedenes commented Jul 17, 2018

ppedrot commented Jul 17, 2018

maximedenes commented Jul 17, 2018

ppedrot commented Jul 18, 2018

JasonGross commented Jul 18, 2018

JasonGross commented Jul 18, 2018

ppedrot commented Jul 19, 2018

JasonGross commented Jul 19, 2018

idkvpz commented Dec 26, 2018

JasonGross commented Apr 24, 2020