Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use native_compute rather than vm_compute #394

Closed
wants to merge 1 commit into from

Conversation

JasonGross
Copy link
Collaborator

This should hopefully speed things up. See also coq/coq#8055.

Except it doesn't speed things up (maybe because we are frequently computing enormous PHOAS syntax trees?), so we probably shouldn't merge this. cc @ppedrot

After     | File Name                                          | Before    || Change    | % Change
--------------------------------------------------------------------------------------------------
15m44.56s | Total                                              | 10m54.17s || +4m50.38s | +44.39%
--------------------------------------------------------------------------------------------------
11m49.08s | Experiments/NewPipeline/SlowPrimeSynthesisExamples | 6m50.36s  || +4m58.72s | +72.79%
2m14.24s  | Experiments/NewPipeline/Toplevel2                  | 2m23.03s  || -0m08.78s | -6.14%
1m37.61s  | Experiments/NewPipeline/Toplevel1                  | 1m37.18s  || +0m00.42s | +0.44%
0m01.31s  | Experiments/NewPipeline/CLI                        | 0m01.30s  || +0m00.01s | +0.76%
0m01.20s  | Experiments/NewPipeline/StandaloneOCamlMain        | 0m01.20s  || +0m00.00s | +0.00%
0m01.12s  | Experiments/NewPipeline/StandaloneHaskellMain      | 0m01.10s  || +0m00.02s | +1.81%

@ppedrot
Copy link
Contributor

ppedrot commented Jul 15, 2018

For some reason, fiat-crypto at f79fdb7 doesn't compile on my machine, and thus I cannot observe the slow native compilation instances.

I get an error:

make: execvp: /bin/sh : Argument list too long

Any idea on how to work around this problem?

@JasonGross
Copy link
Collaborator Author

Try git rm -rf src/Specific && make update-_CoqProject? You might have to manually rm Makefile.coq or whatever also.

@JasonGross
Copy link
Collaborator Author

(It is probably because some line in coq_makefile's Makefile is unhappy with ~5000 .v files in _CoqProject.)

@ppedrot
Copy link
Contributor

ppedrot commented Jul 16, 2018

Still not able to compile it. Now it seems there are missing dependency rules regarding coqprime, but I don't see anything to do from reading the diffs or the makefile.

@ppedrot
Copy link
Contributor

ppedrot commented Jul 16, 2018

OK, I managed to compile it at last. There seems indeed to be an important overhead due to native compilation. I can try to submit a patch working with OCaml 4.06+ so that you can check whether things get noticeably better.

@ppedrot
Copy link
Contributor

ppedrot commented Jul 16, 2018

@JasonGross could you try timing with native-fast-is-accu+linscan?

@ppedrot
Copy link
Contributor

ppedrot commented Jul 16, 2018

Forgot to mention but you need OCaml >= 4.06.0 to take advantage of this.

@JasonGross
Copy link
Collaborator Author

@ppedrot Did you forget to git submodule update --init? (re coqprime issues)

Building now with that branch, output of configure is

You have OCaml 4.06.0. Good!
You have OCamlfind 1.7.3. Good!
You have Camlp5 7.05. Good!
You have native-code compilation. Good!
You have the Num library installed. Good!
CoqIde manually disabled:
=> no CoqIde will be built.

  Architecture                : Linux
  Coq VM bytecode link flags  : -dllib -lcoqrun -dllpath /home/jgross/Documents/repos/coq-native-fast-is-accu+linscan/kernel/byterun
  Other bytecode link flags   :
  OCaml version               : 4.06.0
  OCaml binaries in           : /home/jgross/.opam/4.06.0+fp/bin/
  OCaml library in            : /home/jgross/.opam/4.06.0+fp/lib/ocaml
  OCaml flambda flags         :
  Camlp5 version              : 7.05
  Camlp5 binaries in          : /home/jgross/.opam/4.06.0+fp/bin
  Camlp5 library in           : /home/jgross/.opam/4.06.0+fp/lib/camlp5
  Native dynamic link support : true
  CoqIde                      : no
  Documentation               : None
  Web browser                 : firefox -remote "OpenURL(%s,new-tab)" || firefox %s &
  Coq web site                : http://coq.inria.fr/
  Bytecode VM enabled         : true
  Native Compiler enabled     : true

  Local build, no installation...

If anything is wrong above, please restart './configure'.

*Warning* To compile the system for a new architecture
          don't forget to do a 'make clean' before './configure'.

@ppedrot
Copy link
Contributor

ppedrot commented Jul 16, 2018

@JasonGross Yes, my submodules were in some crappy state. I never quite understood how this whole mess is supposed to work, they're really a wart in my git mental model.

The configure output looks OK, go ahead.

@JasonGross
Copy link
Collaborator Author

@ppedrot It fails with

ocamlopt.opt: unknown option '-Oclassic'.
Usage: ocamlopt <options> <files>
Options are:
  -fPIC  Generate position-independent machine code (default)
  -fno-PIC  Generate position-dependent machine code
  -a  Build a library
  -absname  Show absolute filenames in error messages
  -annot  Save information in <filename>.annot
  -bin-annot  Save typedtree in <filename>.cmt
  -c  Compile only (do not link)
  -cc <command>  Use <command> as the C compiler and linker
  -cclib <opt>  Pass option <opt> to the C linker
  -ccopt <opt>  Pass option <opt> to the C compiler and linker
  -compact  Optimize code size rather than speed
  -config  Print configuration values and exit
  -dtypes  (deprecated) same as -annot
  -for-pack <ident>  Generate code that can later be `packed' with
     ocamlopt -pack -o <ident>.cmx
  -g  Record debugging information for exception backtrace
  -i  Print inferred interface
  -I <dir>  Add <dir> to the list of include directories
  -impl <file>  Compile <file> as a .ml file
  -inline <n>  Set aggressiveness of inlining to <n>
  -intf <file>  Compile <file> as a .mli file
  -intf-suffix <string>  Suffix for interface files (default: .mli)
  -keep-docs  Keep documentation strings in .cmi files
  -keep-locs  Keep locations in .cmi files
  -labels  Use commuting label mode
  -linkall  Link all modules, even unused ones
  -no-alias-deps  Do not record dependencies for module aliases
  -no-app-funct  Deactivate applicative functors
  -no-float-const-prop  Deactivate constant propagation for floating-point operations
  -noassert  Do not compile assertion checks
  -noautolink  Do not automatically link C libraries specified in .cmxa files
  -nodynlink  Enable optimizations for code that will not be dynlinked
  -nolabels  Ignore non-optional labels in types
  -nostdlib  Do not add default directory to the list of include directories
  -o <file>  Set output file name to <file>
  -open <module>  Opens the module <module> before typing
  -output-obj  Output an object file instead of an executable
  -output-complete-obj  Output an object file, including runtime, instead of an executable
  -p  Compile and link with profiling support for "gprof"
     (not supported on all platforms)
  -pack  Package the given .cmx files into one .cmx
  -pp <command>  Pipe sources through preprocessor <command>
  -ppx <command>  Pipe abstract syntax trees through preprocessor <command>
  -principal  Check principality of type inference
  -rectypes  Allow arbitrary recursive types
  -runtime-variant <str>  Use the <str> variant of the run-time system
  -S  Keep intermediate assembly file
  -safe-string  Make strings immutable
  -shared  Produce a dynlinkable plugin
  -short-paths  Shorten paths in types
  -strict-sequence  Left-hand part of a sequence must have type unit
  -strict-formats  Reject invalid formats accepted by legacy implementations
     (Warning: Invalid formats may behave differently from
      previous OCaml versions, and will become always-rejected
      in future OCaml versions. You should use this flag
      to detect and fix invalid formats.)
  -thread  Generate code that supports the system threads library
  -unsafe  Do not compile bounds checking on array and string access
  -unsafe-string  Make strings mutable (default)
  -v  Print compiler version and location of standard library and exit
  -verbose  Print calls to external commands
  -version  Print version and exit
  -vnum  Print version number and exit
  -w <list>  Enable or disable warnings according to <list>:
        +<spec>   enable warnings in <spec>
        -<spec>   disable warnings in <spec>
        @<spec>   enable warnings in <spec> and treat them as errors
     <spec> can be:
        <num>             a single warning number
        <num1>..<num2>    a range of consecutive warning numbers
        <letter>          a predefined set
     default setting is "+a-4-6-7-9-27-29-32..39-41..42-44-45-48-50"
  -warn-error <list>  Enable or disable error status for warnings according
     to <list>.  See option -w for the syntax of <list>.
     Default setting is "-a"
  -warn-help  Show description of warning numbers
  -where  Print location of standard library and exit
  - <file>  Treat <file> as a file name (even if it starts with `-')
  -nopervasives  (undocumented)
  -dsource  (undocumented)
  -dparsetree  (undocumented)
  -dtypedtree  (undocumented)
  -drawlambda  (undocumented)
  -dlambda  (undocumented)
  -dclambda  (undocumented)
  -dcmm  (undocumented)
  -dsel  (undocumented)
  -dcombine  (undocumented)
  -dcse  (undocumented)
  -dlive  (undocumented)
  -dspill  (undocumented)
  -dsplit  (undocumented)
  -dinterf  (undocumented)
  -dprefer  (undocumented)
  -dalloc  (undocumented)
  -dreload  (undocumented)
  -dscheduling  (undocumented)
  -dlinear  (undocumented)
  -dstartup  (undocumented)
  -opaque  Does not generate cross-module optimization information
     (reduces necessary recompilation on module change)
  -help  Display this list of options
  --help  Display this list of options
File "./src/Experiments/NewPipeline/Toplevel2.v", line 92, characters 9-44:
Warning: Native compiler exited with status 2
[native-compiler-failed,native-compiler]
Finished failing transaction in 1.465 secs (1.38u,0.047s) (failure)
File "./src/Experiments/NewPipeline/Toplevel2.v", line 92, characters 9-44:
Error: Anomaly "Compilation failure."
Please report at http://coq.inria.fr/bugs/.

Command exited with non-zero status 129
src/Experiments/NewPipeline/Toplevel2 (real: 2.74, user: 2.37, sys: 0.36, mem: 658172 ko)
Makefile.coq:656: recipe for target 'src/Experiments/NewPipeline/Toplevel2.vo' failed
make: *** [src/Experiments/NewPipeline/Toplevel2.vo] Error 129

@JasonGross
Copy link
Collaborator Author

Oops, I forgot to eval `opam config env`.

@ppedrot I'm not sure how much of a fair comparison this is, given that I have the linscan branch on 4.06.0+fp, and everything else on 4.02.3, but there are some things that got better and other things that got worse, and overall it's a wash.

linscan   | File Name                                          | 8.8       || Change    | % Change
--------------------------------------------------------------------------------------------------
15m43.54s | Total                                              | 15m44.56s || -0m01.01s | -0.10%  
--------------------------------------------------------------------------------------------------
1m52.82s  | Experiments/NewPipeline/Toplevel2                  | 2m14.24s  || -0m21.42s | -15.95% 
12m06.26s | Experiments/NewPipeline/SlowPrimeSynthesisExamples | 11m49.08s || +0m17.17s | +2.42%  
1m41.45s  | Experiments/NewPipeline/Toplevel1                  | 1m37.61s  || +0m03.84s | +3.93%  
0m01.12s  | Experiments/NewPipeline/CLI                        | 0m01.31s  || -0m00.18s | -14.50% 
0m01.00s  | Experiments/NewPipeline/StandaloneHaskellMain      | 0m01.12s  || -0m00.12s | -10.71% 
0m00.90s  | Experiments/NewPipeline/StandaloneOCamlMain        | 0m01.20s  || -0m00.29s | -24.99% 

@maximedenes
Copy link
Contributor

I have the linscan branch on 4.06.0+fp

fp can have significant cost, so it would be nice to have a more fair comparison (4.07.0 with linscan vs 4.07.0 without, for example).

@ppedrot
Copy link
Contributor

ppedrot commented Jul 17, 2018

@maximedenes Really? I though fp essentially made one register unavailable. So at worst it should be on par with the linscan flag.

@maximedenes
Copy link
Contributor

Really? I though fp essentially made one register unavailable. So at worst it should be on par with the linscan flag.

Yeah, but they add. And things are not linear there, probably spilling vs not spilling at all matters, also not all registers play the same role in the instruction set, etc.

So it would be safer to make a more direct comparison.

@ppedrot
Copy link
Contributor

ppedrot commented Jul 18, 2018

@JasonGross Not sure if my correct, but the commented first example marked too long in SlowPrimeSynthesisExamples is not slow because of vm/native reduction but because it's taking all of its time in kernel reduction. I suspect that this is due to some fancy semantics of Ltac, as I have discovered that the following 2-line patch makes the script reach native compilation instantly (and then run out of memory while computing):

diff --git a/src/Experiments/NewPipeline/Toplevel1.v b/src/Experiments/NewPipeline/Toplevel1.v
index dde6bd6c4..9dd4e23f6 100644
--- a/src/Experiments/NewPipeline/Toplevel1.v
+++ b/src/Experiments/NewPipeline/Toplevel1.v
@@ -1563,7 +1563,7 @@ Module Import UnsaturatedSolinas.
 End UnsaturatedSolinas.
 
 Ltac peel_interp_app _ :=
-  lazymatch goal with
+  [> lazymatch goal with
   | [ |- ?R' (?InterpE ?arg) (?f ?arg) ]
     => apply fg_equal_rel; [ | reflexivity ];
        try peel_interp_app ()
@@ -1585,7 +1585,7 @@ Ltac peel_interp_app _ :=
                  => let rc := constr:(GallinaReify.Reify c) in
                     unify ev rc; reflexivity
                end ] ]
-  end.
+  end ].
 Ltac pre_cache_reify _ :=
   cbv [type.app_curried];
   let arg := fresh "arg" in

Is the patch correct? If so, maybe there is something to find out.

@JasonGross
Copy link
Collaborator Author

That patch looks correct. It shouldn't change behavior except in cases where the new version errors with an incorrect number of goals, so if that's not happening, it seems very sketchy that it should change performance. Can you figure out what's going on?

@JasonGross
Copy link
Collaborator Author

For historians: Turns out the confusion here was that Print Ltac prints constr:(ltac:(...)) as _. And it was a reflexivity after unify that was taking forever.

JasonGross added a commit that referenced this pull request Jul 18, 2018
With some help from @ppedrot in tracking things down.  C.f.
#394 (comment)

After    | File Name                                          | Before   || Change    | % Change
------------------------------------------------------------------------------------------------
8m48.43s | Total                                              | 9m07.60s || -0m19.16s | -3.50%
------------------------------------------------------------------------------------------------
5m35.98s | Experiments/NewPipeline/SlowPrimeSynthesisExamples | 5m45.63s || -0m09.64s | -2.79%
1m29.20s | Experiments/NewPipeline/Toplevel2                  | 1m38.67s || -0m09.46s | -9.59%
1m39.78s | Experiments/NewPipeline/Toplevel1                  | 1m39.77s || +0m00.00s | +0.01%
0m01.18s | Experiments/NewPipeline/CLI                        | 0m01.31s || -0m00.13s | -9.92%
0m01.16s | Experiments/NewPipeline/StandaloneHaskellMain      | 0m01.09s || +0m00.06s | +6.42%
0m01.14s | Experiments/NewPipeline/StandaloneOCamlMain        | 0m01.14s || +0m00.00s | +0.00%
This should hopefully speed things up.  See also coq/coq#8055.

Before thunking nat_rect:

After     | File Name                                          | Before    || Change    | % Change
--------------------------------------------------------------------------------------------------
15m44.56s | Total                                              | 10m54.17s || +4m50.38s | +44.39%
--------------------------------------------------------------------------------------------------
11m49.08s | Experiments/NewPipeline/SlowPrimeSynthesisExamples | 6m50.36s  || +4m58.72s | +72.79%
2m14.24s  | Experiments/NewPipeline/Toplevel2                  | 2m23.03s  || -0m08.78s | -6.14%
1m37.61s  | Experiments/NewPipeline/Toplevel1                  | 1m37.18s  || +0m00.42s | +0.44%
0m01.31s  | Experiments/NewPipeline/CLI                        | 0m01.30s  || +0m00.01s | +0.76%
0m01.20s  | Experiments/NewPipeline/StandaloneOCamlMain        | 0m01.20s  || +0m00.00s | +0.00%
0m01.12s  | Experiments/NewPipeline/StandaloneHaskellMain      | 0m01.10s  || +0m00.02s | +1.81%

After thunking nat_rect

After     | File Name                                          | Before   || Change    | % Change
-------------------------------------------------------------------------------------------------
14m43.99s | Total                                              | 8m48.75s || +5m55.24s | +67.18%
-------------------------------------------------------------------------------------------------
11m14.49s | Experiments/NewPipeline/SlowPrimeSynthesisExamples | 5m35.50s || +5m38.99s | +101.04%
1m46.30s  | Experiments/NewPipeline/Toplevel2                  | 1m29.24s || +0m17.06s | +19.11%
1m39.51s  | Experiments/NewPipeline/Toplevel1                  | 1m40.40s || -0m00.89s | -0.88%
0m01.28s  | Experiments/NewPipeline/CLI                        | 0m01.22s || +0m00.06s | +4.91%
0m01.21s  | Experiments/NewPipeline/StandaloneHaskellMain      | 0m01.18s || +0m00.03s | +2.54%
0m01.20s  | Experiments/NewPipeline/StandaloneOCamlMain        | 0m01.21s || -0m00.01s | -0.82%
@ppedrot
Copy link
Contributor

ppedrot commented Jul 19, 2018

I have reported an algorithmic bug in the OCaml compiler that affects native_compute in fiat-crypto: https://caml.inria.fr/mantis/view.php?id=7826.

@JasonGross
Copy link
Collaborator Author

Thanks! I guess this is why it takes forever (a couple of minutes) to compile the extracted code. I wonder if Haskell has a similar issue; it takes ghc longer than it takes ocamlopt.

@idkvpz
Copy link

idkvpz commented Dec 26, 2018

this is helpful

@JasonGross
Copy link
Collaborator Author

This is old and operates on outdated code, so I'm going to close it

@JasonGross JasonGross closed this Apr 24, 2020
@JasonGross JasonGross deleted the use-native-compute branch April 24, 2020 17:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

4 participants