
Dramatic performance increase when compiling with annotations #6030

Closed
vicuna opened this issue Jun 4, 2013 · 15 comments

@vicuna vicuna commented Jun 4, 2013

Original bug ID: 6030
Reporter: gmelquiond
Assigned to: @alainfrisch
Status: closed (set by @xavierleroy on 2015-12-11T18:19:29Z)
Resolution: fixed
Priority: normal
Severity: major
Version: 4.00.1
Fixed in version: 4.01.0+dev
Category: typing

Bug description

When compiling the core library of Why3 (lib/why3/why3.cma) on my computer, it takes about 28 seconds. What is interesting is the ratio between user time and system time: actual compilation takes 7 seconds, while the kernel wastes 21 seconds outputting files.

A look at the top four entries of the kernel profile shows:

49,56% ocamlc.opt [kernel.kallsyms] [k] aes_encrypt
7,17% ocamlc.opt [kernel.kallsyms] [k] memcpy
4,08% ocamlc.opt [kernel.kallsyms] [k] crypto_xor
1,82% ocamlc.opt [kernel.kallsyms] [k] crypto_cbc_encrypt

(As you can guess from the profile, the disk is encrypted.)

If we now take a look at the compilation of a single file, e.g. src/core/term.cmo, strace shows that ocamlc does 47,000 writes to src/core/term.annot. In some cases, it performs one system call per two characters!

The culprits are the print_ident_annot and print_info functions from typing/stypes.ml. They contain format strings such as "@.)@." — and since each "@." performs a flush, they are definitely a recipe for disaster.
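To see why "@." is so costly, here is a minimal sketch (not the compiler's code; the function name `flushes_for` is invented). In Format, "@." prints a newline and then flushes, while "@\n" prints a forced newline without flushing; every flush on an unbuffered sink can become a separate write(2) syscall:

```ocaml
(* Count how many times Format calls its flush function for a given
   printing action. "@." flushes on every call; "@\n" does not. *)
let flushes_for emit =
  let flushes = ref 0 in
  let ppf =
    Format.make_formatter
      (fun _s _pos _len -> ())   (* discard the text itself *)
      (fun () -> incr flushes)   (* invoked on every flush *)
  in
  for _i = 1 to 1000 do emit ppf done;
  Format.pp_print_flush ppf ();  (* one final flush *)
  !flushes

let () =
  (* "@." = newline + flush: two flushes per call here *)
  let hot = flushes_for (fun ppf -> Format.fprintf ppf "@.)@.") in
  (* "@\n" = newline only: just the single final flush *)
  let cold = flushes_for (fun ppf -> Format.fprintf ppf "@\n)@\n") in
  Printf.printf "\"@.\": %d flushes, \"@\\n\": %d flush\n" hot cold
```

If each of those flushes hits an out_channel backed by a file descriptor, the syscall counts reported by strace below follow directly.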

With the attached patch, overall compilation speed is multiplied by 3.5, system time is divided by 12, and the kernel profile now looks a lot saner:

6,60% ocamlc.opt ocamlc.opt [.] mark_slice
5,21% ocamlc.opt ocamlc.opt [.] caml_page_table_lookup
3,74% ocamlc.opt ocamlc.opt [.] caml_oldify_one
2,48% ocamlc.opt [kernel.kallsyms] [k] aes_encrypt

Please apply.

File attachments


@vicuna vicuna commented Jun 4, 2013

Comment author: @alainfrisch

You should try the trunk version. It has already been optimized a lot, after observing (as you did) how slow the previous version was.

Commits:

http://caml.inria.fr/cgi-bin/viewvc.cgi?view=revision&revision=13070

http://caml.inria.fr/cgi-bin/viewvc.cgi?view=revision&revision=13071


@vicuna vicuna commented Jun 4, 2013

Comment author: @gasche

Alain has already made those changes in trunk, commits 13070 and 13071. Can you confirm that compiling trunk as-is solves your problem?


@vicuna vicuna commented Jun 4, 2013

Comment author: gmelquiond

This is a bit better with trunk, but nowhere near as good as with my patch. The profile with trunk looks like:

35,36% ocamlc.opt [kernel.kallsyms] [k] aes_encrypt
4,84% ocamlc.opt [kernel.kallsyms] [k] memcpy
3,80% ocamlc.opt ocamlc.opt [.] mark_slice
2,98% ocamlc.opt [kernel.kallsyms] [k] crypto_xor

System time is at 8 seconds, while total time is at 14 seconds. A decent compiler should never waste 60% of its time doing I/O. (With my patch, system time is at 1.5 seconds.)

For instance, if we take a look again at the compilation of src/core/term.cmo in Why3, we see that:

  • ocaml-4.00.01 performs 47,000 calls to write,
  • ocaml-trunk performs 15,000 calls to write,
  • my patch performs only 18 calls to write.


@vicuna vicuna commented Jun 4, 2013

Comment author: @gasche

The only difference I found (by code inspection) that may be relevant to performance is that Alain's code has a flush in the [Ti_expr] case of [print_info], after the opening parenthesis, while Guillaume's patch only uses "@\n" (newline with no flush). If this hypothesis can be confirmed on some long file, we can validate the change.


@vicuna vicuna commented Jun 4, 2013

Comment author: @alainfrisch

I'm working on it.


@vicuna vicuna commented Jun 4, 2013

Comment author: @alainfrisch

Commit 13742 on trunk should fix the problem. When compiling typing/typecore.cmo (with -annot), the number of writes is reduced from 29494 to 39.

Can you try it and report how it affects performance on your example?


@vicuna vicuna commented Jun 4, 2013

Comment author: gmelquiond

Note that this is not the only flush in Alain's code; there is also one hidden behind pp_print_newline. Also, I don't know the internals of Format, but the fact that there is no final flush for annotations in trunk worries me a bit. Could annotations be abruptly truncated once the compiler terminates? (If not, then the explicit flush I put in my patch is redundant.)


@vicuna vicuna commented Jun 4, 2013

Comment author: @alainfrisch

Could annotations be abruptly truncated once the compiler terminates?

Annotations are collected and all dumped at once (Stypes.dump), so I don't see how this could happen, unless you kill the process explicitly.


@vicuna vicuna commented Jun 4, 2013

Comment author: @alainfrisch

Note that this is not the only flush in Alain's code; there is also one hidden behind pp_print_newline.

The commit also gets rid of this one. All uses of Format now go through Format.str_formatter (no syscalls).
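A minimal sketch of the buffering strategy described here (the function name and annotation layout are invented, not the actual Stypes code): pretty-print everything into Format.str_formatter, which is backed by an in-memory buffer and performs no syscalls, then emit the whole result with a single write:

```ocaml
(* Render all annotations into Format's string formatter; nothing
   touches an out_channel until the final, single output_string. *)
let render_annotations annots =
  let ppf = Format.str_formatter in
  List.iter
    (fun (loc, ty) -> Format.fprintf ppf "%s(@\n  %s@\n)@\n" loc ty)
    annots;
  Format.flush_str_formatter ()  (* returns and resets the buffer *)

let () =
  let s =
    render_annotations
      [ ("File \"t.ml\", line 1, characters 8-9", "int") ]
  in
  (* a single output_string s on the .annot channel then amounts to
     one (or a handful of) write(2) calls, however many annotations *)
  print_string s
```

Since "@\n" never flushes, the only I/O left is the one bulk write at the end, which matches the drop from tens of thousands of writes to a few dozen.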


@vicuna vicuna commented Jun 4, 2013

Comment author: gmelquiond

I understand that they are all dumped at once, but I don't see where the final half-filled buffer is flushed to disk in the code.

Anyway, as far as I am concerned, the number of system calls for writing annotations is now optimal, so the bug can be marked as fixed.


@vicuna vicuna commented Jun 4, 2013

Comment author: @alainfrisch

I understand that they are all dumped at once, but I don't see where the final half-filled buffer is flushed to disk in the code.

All open out_channels are flushed when the process exits (Pervasives.flush_all is the default exit_function in Pervasives).
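This can be observed directly. In the sketch below (file name generated, not from the bug report), data written to an out_channel sits in the channel buffer until a flush; at normal exit the runtime runs this same flush_all for you, which is why the half-filled buffer is not lost. Here flush_all is called explicitly just to watch the effect mid-program:

```ocaml
(* Size of a file as seen on disk right now. *)
let file_size path =
  let ic = open_in_bin path in
  let n = in_channel_length ic in
  close_in ic;
  n

let () =
  let path = Filename.temp_file "annot_demo" ".annot" in
  let oc = open_out path in
  output_string oc "half-filled buffer";
  (* still only in the channel buffer: the file on disk is empty *)
  let before = file_size path in
  flush_all ();  (* what the runtime does at normal process exit *)
  let after = file_size path in
  Printf.printf "before flush: %d bytes, after: %d bytes\n" before after;
  close_out oc;
  Sys.remove path
```

The data would only be lost if the process were killed before the at_exit actions run (e.g. by SIGKILL), which matches Alain's remark above.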


@vicuna vicuna commented Jun 4, 2013

Comment author: @alainfrisch

Guillaume, thanks for the detailed report and quick feedback.


@vicuna vicuna commented Jun 4, 2013

Comment author: @alainfrisch

Actually, it would be useful to compare the total time between the trunk and your patch (which is much less invasive than my attempt to reduce the use of Format and of format strings).


@vicuna vicuna commented Jun 4, 2013

Comment author: gmelquiond

I have measured again how long it takes to compile why3.cma. There is no doubt about it: your version is so much better than mine. In fact, the huge timing difference on the actual compilation time (20% overhead) kind of frightens me about the performance of Format.

Trunk:

5.30user 1.80system 0:08.72elapsed 81%CPU
5.30user 1.80system 0:08.71elapsed 81%CPU
5.29user 1.78system 0:08.66elapsed 81%CPU
5.25user 1.96system 0:08.84elapsed 81%CPU
5.35user 1.75system 0:08.71elapsed 81%CPU

Trunk + reverting stypes.ml + applying my patch minus the final flush:

6.36user 1.79system 0:09.75elapsed 83%CPU
6.34user 1.78system 0:09.72elapsed 83%CPU
6.25user 1.80system 0:09.68elapsed 83%CPU
6.18user 1.92system 0:09.72elapsed 83%CPU
6.34user 1.82system 0:09.77elapsed 83%CPU


@vicuna vicuna commented Jun 4, 2013

Comment author: @alainfrisch

Thanks again for the results. They justify the extra effort put into de-Formatting this piece of code. And indeed, there are reasons to be concerned with the performance of Format and format strings. See:

http://www.lexifi.com/blog/note-about-performance-printf-and-format

Hopefully, #6017 will remove some overhead related to parsing format strings at runtime (but not the overhead related to Format itself, I guess).
