-
Notifications
You must be signed in to change notification settings - Fork 144
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use native_compute rather than vm_compute #394
Conversation
For some reason, fiat-crypto at f79fdb7 doesn't compile on my machine, and thus I cannot observe the slow native compilation instances. I get an error:
Any idea on how to work around this problem? |
Try |
(It is probably because some line in |
Still not able to compile it. Now it seems there are missing dependency rules regarding coqprime, but I don't see anything to do from reading the diffs or the makefile. |
OK, I managed to compile it at last. There seems indeed to be an important overhead due to native compilation. I can try to submit a patch working with OCaml 4.06+ so that you can check whether things get noticeably better. |
@JasonGross could you try timing with native-fast-is-accu+linscan? |
Forgot to mention but you need OCaml >= 4.06.0 to take advantage of this. |
@ppedrot Did you forget to Building now with that branch, output of configure is
|
@JasonGross Yes, my submodules were in some crappy state. I never quite understood how this whole mess is supposed to work, they're really a wart in my git mental model. The configure output looks OK, go ahead. |
@ppedrot It fails with
|
Oops, I forgot to @ppedrot I'm not sure how much of a fair comparison this is, given that I have the linscan branch on 4.06.0+fp, and everything else on 4.02.3, but there are some things that got better and other things that got worse, and overall it's a wash.
|
|
@maximedenes Really? I though fp essentially made one register unavailable. So at worst it should be on par with the linscan flag. |
Yeah, but they add. And things are not linear there, probably spilling vs not spilling at all matters, also not all registers play the same role in the instruction set, etc. So it would be safer to make a more direct comparison. |
@JasonGross Not sure if my correct, but the commented first example marked too long in SlowPrimeSynthesisExamples is not slow because of vm/native reduction but because it's taking all of its time in kernel reduction. I suspect that this is due to some fancy semantics of Ltac, as I have discovered that the following 2-line patch makes the script reach native compilation instantly (and then run out of memory while computing):
Is the patch correct? If so, maybe there is something to find out. |
That patch looks correct. It shouldn't change behavior except in cases where the new version errors with an incorrect number of goals, so if that's not happening, it seems very sketchy that it should change performance. Can you figure out what's going on? |
For historians: Turns out the confusion here was that |
With some help from @ppedrot in tracking things down. C.f. #394 (comment) After | File Name | Before || Change | % Change ------------------------------------------------------------------------------------------------ 8m48.43s | Total | 9m07.60s || -0m19.16s | -3.50% ------------------------------------------------------------------------------------------------ 5m35.98s | Experiments/NewPipeline/SlowPrimeSynthesisExamples | 5m45.63s || -0m09.64s | -2.79% 1m29.20s | Experiments/NewPipeline/Toplevel2 | 1m38.67s || -0m09.46s | -9.59% 1m39.78s | Experiments/NewPipeline/Toplevel1 | 1m39.77s || +0m00.00s | +0.01% 0m01.18s | Experiments/NewPipeline/CLI | 0m01.31s || -0m00.13s | -9.92% 0m01.16s | Experiments/NewPipeline/StandaloneHaskellMain | 0m01.09s || +0m00.06s | +6.42% 0m01.14s | Experiments/NewPipeline/StandaloneOCamlMain | 0m01.14s || +0m00.00s | +0.00%
This should hopefully speed things up. See also coq/coq#8055. Before thunking nat_rect: After | File Name | Before || Change | % Change -------------------------------------------------------------------------------------------------- 15m44.56s | Total | 10m54.17s || +4m50.38s | +44.39% -------------------------------------------------------------------------------------------------- 11m49.08s | Experiments/NewPipeline/SlowPrimeSynthesisExamples | 6m50.36s || +4m58.72s | +72.79% 2m14.24s | Experiments/NewPipeline/Toplevel2 | 2m23.03s || -0m08.78s | -6.14% 1m37.61s | Experiments/NewPipeline/Toplevel1 | 1m37.18s || +0m00.42s | +0.44% 0m01.31s | Experiments/NewPipeline/CLI | 0m01.30s || +0m00.01s | +0.76% 0m01.20s | Experiments/NewPipeline/StandaloneOCamlMain | 0m01.20s || +0m00.00s | +0.00% 0m01.12s | Experiments/NewPipeline/StandaloneHaskellMain | 0m01.10s || +0m00.02s | +1.81% After thunking nat_rect After | File Name | Before || Change | % Change ------------------------------------------------------------------------------------------------- 14m43.99s | Total | 8m48.75s || +5m55.24s | +67.18% ------------------------------------------------------------------------------------------------- 11m14.49s | Experiments/NewPipeline/SlowPrimeSynthesisExamples | 5m35.50s || +5m38.99s | +101.04% 1m46.30s | Experiments/NewPipeline/Toplevel2 | 1m29.24s || +0m17.06s | +19.11% 1m39.51s | Experiments/NewPipeline/Toplevel1 | 1m40.40s || -0m00.89s | -0.88% 0m01.28s | Experiments/NewPipeline/CLI | 0m01.22s || +0m00.06s | +4.91% 0m01.21s | Experiments/NewPipeline/StandaloneHaskellMain | 0m01.18s || +0m00.03s | +2.54% 0m01.20s | Experiments/NewPipeline/StandaloneOCamlMain | 0m01.21s || -0m00.01s | -0.82%
f09f397
to
5ae53b1
Compare
I have reported an algorithmic bug in the OCaml compiler that affects native_compute in fiat-crypto: https://caml.inria.fr/mantis/view.php?id=7826. |
Thanks! I guess this is why it takes forever (a couple of minutes) to compile the extracted code. I wonder if Haskell has a similar issue; it takes ghc longer than it takes ocamlopt. |
this is helpful |
This is old and operates on outdated code, so I'm going to close it |
This should hopefully speed things up. See also coq/coq#8055.
Except it doesn't speed things up (maybe because we are frequently computing enormous PHOAS syntax trees?), so we probably shouldn't merge this. cc @ppedrot