Compile non-speed-critical tools to bytecode only #11993

Merged
gasche merged 3 commits into ocaml:trunk from install-fewer-opt-progs on Feb 22, 2023

xavierleroy
Contributor

Currently, when native-code compilation is supported, the following auxiliary tools are compiled both to bytecode and to native-code, and both binaries are installed in $(PREFIX)/bin:

ocamldep ocamlcmt ocamlprof ocamlcp ocamlmklib ocamlmktop ocamlobjinfo

This PR proposes to not compile to native-code the following tools, and to install only a bytecode executable:

ocamlcmt ocamlprof ocamlcp ocamlmklib ocamlmktop ocamlobjinfo

ocamldep is still compiled to native-code whenever possible, as this tool is heavily used and execution speed matters. The other tools are rarely used and not speed-critical (to the best of my knowledge), so compilation to native-code is not warranted.

(I'm not 100% sure about ocamlcmt; please contradict me if there are speed-critical uses.)

The intent of this change is to reduce the size of OCaml installations further, now that #11981 is merged. On x86-64 linux:

  • whole installation: 385M currently, 336M with this PR (-13%)
  • bin subdirectory: 165M currently, 120M with this PR (-27%)
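
The percentages quoted above can be re-derived from the reported sizes. A minimal sketch, assuming a POSIX shell with awk; `pct_saved` is a hypothetical helper name, not something from this PR:

```shell
# pct_saved BEFORE AFTER: print the size reduction as a rounded percentage.
# Hypothetical helper, only for checking the figures quoted in this thread.
pct_saved() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.0f\n", (before - after) * 100 / before }'
}

pct_saved 385 336   # whole installation, sizes as reported (385M -> 336M)
pct_saved 165 120   # bin subdirectory, sizes as reported (165M -> 120M)
```

The two calls print 13 and 27, matching the reductions stated in the list.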

CC: @shindere

@dbuenzli
Contributor

dbuenzli commented Feb 5, 2023

ocamlobjinfo can be used to discover dependencies of binary objects, which I do in various build settings; I'd rather have it treated like ocamldep.

@xavierleroy
Contributor Author

My guess is that ocamlobjinfo spends most of its time in I/O and input_value and printf, so native-code compilation should not speed it up significantly. Can you try to measure and compare running times? There should be ocamlobjinfo.byte and ocamlobjinfo.opt in your bin/ directory.

@dbuenzli
Contributor

dbuenzli commented Feb 5, 2023

Native code (total 51ms):

> brzo log | grep ocamlobjinfo
[019:spawn 11.5ms exec-build e:85b4a82ad6807c0a] ['…/4.14.0/bin/ocamlobjinfo' '-no-approx' '-no-code' '/private/tmp/brzo/_b0/brzo/ocaml-exec-native/poke.cmx' > '/private/tmp/brzo/_b0/brzo/ocaml-exec-native/brzo.ocaml.linkdep'][0]
[021:spawn 12.1ms ocaml.mod_resolver e:0561bb5a3678077e] ['…/4.14.0/bin/ocamlobjinfo' '-no-approx' '-no-code' '…/4.14.0/lib/brr/brr.cmxa' > '/private/tmp/brzo/_b0/brzo/ocaml-exec-native/brr-3c8c1c9c2857053d.cmxa.info'][0]
[024:spawn 11.2ms ocaml.mod_resolver e:44c6a41f69417cbb] ['…/4.14.0/bin/ocamlobjinfo' '-no-approx' '-no-code' '…/4.14.0/lib/js_of_ocaml-compiler/runtime/jsoo_runtime.cmxa' > '/private/tmp/brzo/_b0/brzo/ocaml-exec-native/runtime-5eeea1b9768745ce.cmxa.info'][0]
[026:spawn 16.3ms ocaml.mod_resolver e:4dc6384c080889eb] ['…/4.14.0/bin/ocamlobjinfo' '-no-approx' '-no-code' '…/4.14.0/lib/ocaml/dynlink.cmxa' '…/4.14.0/lib/ocaml/unix.cmxa' '…/4.14.0/lib/ocaml/bigarray.cmxa' '…/4.14.0/lib/ocaml/stdlib.cmxa' '…/4.14.0/lib/ocaml/str.cmxa' > '/private/tmp/brzo/_b0/brzo/ocaml-exec-native/ocaml-d7813c8b723ad37c.cmxa.info'][0]

Byte code (total 80.3ms):

> brzo log | grep ocamlobjinfo
[019:spawn 17.9ms exec-build e:de54b3a2511de238] ['…/4.14.0/bin/ocamlobjinfo' '-no-approx' '-no-code' '/private/tmp/brzo/_b0/brzo/ocaml-exec-native/poke.cmx' > '/private/tmp/brzo/_b0/brzo/ocaml-exec-native/brzo.ocaml.linkdep'][0]
[021:spawn 18.9ms ocaml.mod_resolver e:da880a0484cfd64a] ['…/4.14.0/bin/ocamlobjinfo' '-no-approx' '-no-code' '…/4.14.0/lib/brr/brr.cmxa' > '/private/tmp/brzo/_b0/brzo/ocaml-exec-native/brr-3c8c1c9c2857053d.cmxa.info'][0]
[024:spawn 21ms ocaml.mod_resolver e:49550d0ce03eb0d2] ['…/4.14.0/bin/ocamlobjinfo' '-no-approx' '-no-code' '…/4.14.0/lib/js_of_ocaml-compiler/runtime/jsoo_runtime.cmxa' > '/private/tmp/brzo/_b0/brzo/ocaml-exec-native/runtime-5eeea1b9768745ce.cmxa.info'][0]
[026:spawn 22.5ms ocaml.mod_resolver e:c6ca7f13fca89746] ['…/4.14.0/bin/ocamlobjinfo' '-no-approx' '-no-code' '…/4.14.0/lib/ocaml/dynlink.cmxa' '…/4.14.0/lib/ocaml/unix.cmxa' '…/4.14.0/lib/ocaml/bigarray.cmxa' '…/4.14.0/lib/ocaml/stdlib.cmxa' '…/4.14.0/lib/ocaml/str.cmxa' > '/private/tmp/brzo/_b0/brzo/ocaml-exec-native/ocaml-d7813c8b723ad37c.cmxa.info'][0]

So in this case native code takes 64% of the time of byte code.
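
For reference, the totals and the 64% figure follow from the per-spawn timings in the two logs. A minimal sketch, assuming a POSIX shell with awk; `sum_ms` and `ratio_pct` are hypothetical helper names:

```shell
# sum_ms T1 T2 ...: add up durations given in milliseconds.
sum_ms() {
    printf '%s\n' "$@" | awk '{ total += $1 } END { printf "%.1f\n", total }'
}

# ratio_pct A B: print A as a rounded percentage of B.
ratio_pct() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.0f\n", a * 100 / b }'
}

native=$(sum_ms 11.5 12.1 11.2 16.3)   # spawn times from the native-code log
byte=$(sum_ms 17.9 18.9 21 22.5)       # spawn times from the bytecode log
echo "native: ${native}ms, bytecode: ${byte}ms"
ratio_pct "$native" "$byte"
```

The sums come out at 51.1ms and 80.3ms, and the ratio rounds to 64.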

@nojb
Contributor

nojb commented Feb 5, 2023

> ocamldep is still compiled to native-code whenever possible, as this tool is heavily used and execution speed matters.

I'm not suggesting we do this necessarily, but just to mention it: ocamldep is the same as ocamlc -depend, and it has been suggested in the past to install ocamldep as a link to ocamlc and have ocamlc act as ocamldep if Sys.argv.(0) matches ocamldep.
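
The name-based dispatch described here can be sketched in a few lines. This is only a shell illustration of the general trick, not how ocamlc is implemented; `mode_for`, the mode labels, and the example paths are hypothetical.

```shell
# mode_for PATH: decide which personality to adopt from the name the
# program was invoked under (the shell analogue of inspecting Sys.argv.(0)).
mode_for() {
    name=$(basename "$1")
    name=${name%.exe}            # tolerate a Windows-style suffix
    case "$name" in
        ocamldep|ocamldep.*) echo depend ;;   # behave like "ocamlc -depend"
        *)                   echo compile ;;  # normal compiler behaviour
    esac
}

mode_for /usr/local/bin/ocamlc       # prints "compile"
mode_for /usr/local/bin/ocamldep     # prints "depend"
mode_for ocamldep.opt                # prints "depend"
```

A real script would call `mode_for "$0"`; whether the invocation name survives linking or copying on a given platform is exactly the portability question this approach raises.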

@xavierleroy
Contributor Author

> So in this case native code takes 64% of the time of byte code.

That's typical of programs that spend a lot of time in the runtime system. Computation-intensive programs are 2 to 10 times faster in native code.

At any rate, we're seeing 10 ms gains in execution time. Is this worth the extra disk space used for the installation? Hard to know.

@xavierleroy
Contributor Author

> ocamldep is the same as ocamlc -depend, and it has been suggested in the past to install ocamldep as a link to ocamlc and have ocamlc act as ocamldep if Sys.argv.(0) matches ocamldep.

I thought this trick was discouraged e.g. in GNU coding guidelines, but I can't find a reference. Also, does it work under Windows?

@dbuenzli
Contributor

dbuenzli commented Feb 6, 2023

> At any rate, we're seeing 10 ms gains in execution time. Is this worth the extra disk space used for the installation? Hard to know.

Some people say disk is cheap and others like their builds to be as fast as possible. Both viewpoints seem unreasonable. In any case I'm not sure I care about these 10ms, you only get them on cold builds – after that they are cached away.

I also suspect I'm the only person who actually uses ocamlobjinfo this way in builds. But it's more stable than devising my own via compiler-libs, which can break from version to version.

That being said, personally I'm not very fond of the mix of byte/native install this PR proposes, but I can't exactly pinpoint what disturbs me, so why not.

@alainfrisch
Contributor

> > ocamldep is still compiled to native-code whenever possible, as this tool is heavily used and execution speed matters.
>
> I'm not suggesting we do this necessarily, but just to mention it: ocamldep is the same as ocamlc -depend, and it has been suggested in the past to install ocamldep as a link to ocamlc and have ocamlc act as ocamldep if Sys.argv.(0) matches ocamldep.

Why don't we just stop shipping ocamldep altogether and tell people to use ocamlc -depend? Is it a backward compatibility concern / support for multiple versions of OCaml? (Not shipping ocamldep.opt anymore can also break existing build systems!) The transition could be softened by instructing dune to use either ocamldep or ocamlc -depend depending on the version of OCaml; and/or providing an OPAM package that installs an actual ocamldep.

Related: what about doing the same for ocamllex? This would also remove one binary from boot/ (and reduce the "cost" of bootstrapping in terms of increase in the repository size).

@shindere
Contributor

shindere commented Feb 7, 2023 via email

@shindere
Contributor

shindere commented Feb 7, 2023 via email

@alainfrisch
Contributor

> Here I do not understand, though. Are you suggesting to embed the lexer in the compiler?

Yes, I was suggesting to embed ocamllex as a sub-command of ocamlc, like we did for ocamldep. I understand this is more controversial, since ocamllex is "less related" to the compiler than ocamldep. But in practice, ocamllex seems to be here to stay, and it induces some storage overhead in the source repository (as a bytecode in boot/) and in installations; so why not?

@xavierleroy
Contributor Author

I feel there's too much "whataboutism" in the comments and not enough reviewing of the actual PR.

  • What about replacing ocamldep with ocamlc -depend? Well, it's not the purpose of this PR, and personally I'm not in favor (this would cause huge amounts of breakage).
  • What about merging ocamllex in ocamlc? Again, it's not the purpose of this PR, and personally I'm very much not in favor. (What's next? Merge opam, dune, and ocamlc?)
  • What about controlling all this with configure flags? Well, there are too many configure flags already, most of them not documented, most of them never exercised (I'm afraid). Plus, it's our job to set reasonable defaults that will be acceptable to all users, rather than letting them opt in or out to fix our inadequate default choices.
  • What about fixing the -linkall mode on compilerlibs? Believe me, I tried, but the 5 or so modules that need -linkall drag in 80 to 85% of the code, so we don't save enough code space. The proper fix would be to introduce a notion of "group of compilation units" (which must always be linked together) in the linker and the librarian, but it means finding an appropriate command-line interface for the new feature, so we're getting into RFC territory.

This PR is a low-hanging fruit. With minor Makefile hacking (nothing worse than what @shindere has been feeding us with recently) we cut installation size by 13% without any user-visible change (except @dbuenzli losing a few tens of milliseconds in his uses of ocamlobjinfo). What about reviewing this PR instead of whatabouting?

@lpw25
Contributor

lpw25 commented Feb 7, 2023

> I also suspect I'm the only person who actually uses ocamlobjinfo this way in builds.

Our build rules rely on ocamlobjinfo to work out which transitive dependency libraries are exposed in interfaces, so we also rely on ocamlobjinfo in the hot path of our builds. Not sure whether using bytecode would be observable in our build times or not.

@xavierleroy
Contributor Author

> Our build rules rely on ocamlobjinfo to work out which transitive dependency libraries are exposed in interfaces, so we also rely on ocamlobjinfo in the hot path of our builds.

Thanks for the data point. I'm OK with keeping a native-code build of ocamlobjinfo if you and @dbuenzli think it's safer this way.

@kit-ty-kate
Member

kit-ty-kate commented Feb 8, 2023

I’m not sure I understand the train of thought behind this change. If we want to make the installation smaller, why not just remove all the .byte suffixed binaries and just ship all binaries with the best backend possible (native if available and byte if not) as it is currently?

I have personally never seen those .byte suffixed binaries being used anywhere in opam so that’s why I’m rather surprised to see the solution where the compiler would be shipping less efficient ocaml binaries on purpose, instead of the other way around.

Furthermore, the bytecode binaries seem to be substantially bigger than the native ones, so it would be even more of a plus.

-rwxr-xr-x  1 kit_ty_kate  staff   8.4M 19 Dec 13:01 /Users/kit_ty_kate/.opam/default/bin/ocamlprof.byte
-rwxr-xr-x  1 kit_ty_kate  staff   3.2M 19 Dec 13:02 /Users/kit_ty_kate/.opam/default/bin/ocamlprof.opt

@dbuenzli
Contributor

dbuenzli commented Feb 8, 2023

On my current local install I also have 50 MB of .opt and 107 MB of .byte, but I guess this PR only makes sense once you take #11981 into account.

@xavierleroy
Contributor Author

Yes, this PR favors bytecode executables over native-code executables for non-speed-critical commands because, now that #11981 is merged, bytecode executables are consistently smaller than native-code executables.

  • What about "remov[ing] all the .byte suffixed binaries and just ship[ping] all binaries with the best backend possible (native if available and byte if not) as it is currently", as you wrote? I would not mind getting rid of dual .byte/.opt installations, but this is the topic for a different PR, and I would still prefer to use bytecode for non-speed-critical commands.

Patiently waiting for the next "what about" comment...

@gasche
Member

gasche commented Feb 18, 2023

I would be in favor of moving forward with the proposed change if ocamlobjinfo remains available in native code -- following the feedback of @dbuenzli and @lpw25.

Makefile Outdated
TOOLS_TO_INSTALL = \
ocamldep ocamlprof ocamlcp ocamlmklib ocamlmktop ocamlobjinfo
# Tools to be compiled to native and bytecode, then installed
TOOLS_TO_INSTALL_NAT = ocamldep
Member

I find the convention that _NAT means "both in native and bytecode" confusing. Could we:

  • not use a suffix in this case, stick to TOOLS_TO_INSTALL, OCAML_PROGRAMS etc. (the _BYT would clearly mean "byte only")
  • or use _BYT_NAT or _BYT_AND_NAT?

Contributor Author

I feel you're over-thinking it. I found it pretty clear and visually appealing to split X into X_NAT and X_BYT.

for i in $(TOOLS_TO_INSTALL_BYT); \
do \
$(INSTALL_PROG) "tools/$$i$(EXE)" "$(INSTALL_BINDIR)";\
done
Member

gasche commented Feb 18, 2023

When INSTALL_BYTECODE_PROGRAMS is true, the layout we get is that foo.byte and foo.opt are available, and foo is a symbolic link to foo.opt. For bytecode-only programs I would expect to have foo.byte available, and foo as a symbolic link to foo.byte.

If I understand correctly, this is not what this code does; it looks like it just installs foo and not foo.byte. Should we change this?
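
To make the two layouts concrete, the sketch below builds them in a scratch directory with empty placeholder files. It only illustrates the naming scheme under discussion and is not the Makefile's install logic; `foo` and `bar` are stand-in tool names.

```shell
# Build both layouts in a throwaway directory, using empty placeholder files.
bindir=$(mktemp -d)

# Dual install: foo.byte and foo.opt, with foo a symlink to foo.opt.
touch "$bindir/foo.byte" "$bindir/foo.opt"
ln -s foo.opt "$bindir/foo"

# Bytecode-only install as in the reviewed loop: one file under the final name,
# with no bar.byte and no symlink.
touch "$bindir/bar"

ls -l "$bindir"
```

In the listing, `foo` shows up as a link to `foo.opt`, while `bar` is a regular file with no `.byte` sibling.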

Contributor Author

No. The .byte / .opt / symbolic link dance is intended to let users choose between the two implementations if they must, and pick the best implementation by default otherwise. If there's only one implementation, there's no choice to be made. Plus, there's @kit-ty-kate's suggestion above to get rid of all this symbolic link nonsense, and I'm in favor.

Member

Before this PR, if users for some reason wanted to use a bytecode executable instead of its native counterpart, they would enable INSTALL_BYTECODE_PROGRAMS and then explicitly invoke foo.byte. Unless I am missing something, your change breaks this use-case by making foo.byte unavailable for some tools. (But then: maybe no one was doing that, and especially for those tools nobody cares about in practice.)

Independently of this backward-compatibility aspect, I don't see the problem with having a symlink. (My Linux box is full of symlinks all over, typically many of the /usr/lib/*.so files are symlinks.) It arguably even adds discoverability, if I do ls -l .../bin/ocamlc I can immediately tell that I have the native version on my system.

I also didn't interpret Kate's comment #11993 (comment) to be complaining about symlinks, I read it as suggesting that we install at most one version of each tool, with the absence of symlinks a side-effect of that (or in fact we could keep symlinks if we really wanted, but I agree that in this case it makes less sense).

To summarize: my opinion is still that having the symlinks would be nicer, and I think that they help with compatibility in INSTALL_BYTECODE_PROGRAMS mode. It would probably be okay (not my preference but okay) to do without the links outside the INSTALL_BYTECODE_PROGRAMS mode. Then we could discuss (in a separate PR?) disabling this mode by default, and we would get closer to your ideal world (with a configure option still working for .byte aficionados).

Contributor Author

I'm not convinced by your argument. For these rarely-used, non-performance-critical tools, I just want to go back to what we did in OCaml 4.02 and earlier, namely compilation to bytecode only and installation under the final name (no symlink to .byte). So, we will agree to disagree here and try to move on.

Member

Okay, if you insist, that works for me.

tools/ocamlcmt.opt$(EXE) "$(INSTALL_BINDIR)/ocamlcmt$(EXE)"; \
else \
$(INSTALL_PROG) tools/ocamlcmt$(EXE) "$(INSTALL_BINDIR)"; \
fi
Member

(removing this ad-hoc logic is a nice simplification)

@xavierleroy
Contributor Author

> I would be in favor of moving forward with the proposed change if ocamlobjinfo remains available in native code

Thank you my good sir. Done in edc4193.

Member

gasche left a comment

This is good to go with a Changes entry (we should let users know about the change).

There is an observable change in the installation layout of the tools affected by this change, even their bytecode version, which I think is unnecessary. But then we believe that these tools are almost never used, and probably not explicitly in their .byte version, so this is probably okay.

@gasche gasche merged commit 39a6e64 into ocaml:trunk Feb 22, 2023
@xavierleroy xavierleroy deleted the install-fewer-opt-progs branch February 28, 2023 06:52