-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.
Already on GitHub? Sign in to your account
Construct the bytecode executable header in ocamlc #12751
Conversation
0480299
to
b521109
Compare
if !Clflags.verbose then | ||
Printf.eprintf "+ %s\n" cmd; | ||
if Sys.command cmd = 0 then | ||
In_channel.with_open_text output_file input_line |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't want with_open_bin
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're reading the output of a command, so morally it's text (not that this path is likely ever hit by a Windows build of the compiler, even with cross-compilation)
23ef6d2
to
1e0b48a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for revisiting this old pile of hacks, and for the detailed write-up!
I read the diff and understand most of it. See below for a suggestion for more comments.
The one part I'm not sure about is the late determination of the path to sh by ocamlc. Hard-coding /bin/sh
would work on 99.99% of Unix systems. But even if you prefer not to hard-code, why not determine the path at configure-time?
- "sh" - use a shebang-style launcher. If sh is needed, determine its | ||
location from [command -p -v sh] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why the shell location needs to be determined so late (when ocamlc is generating a bytecode executable), as opposed to being determined by the configure script and stored in runtime-launch-info as /path/to/bin/sh
.
let find_bin_sh () = | ||
let output_file = Filename.temp_file "caml_bin_sh" "" in | ||
let result = | ||
try | ||
let cmd = | ||
Filename.quote_command ~stdout:output_file "command" ["-p"; "-v"; "sh"] | ||
in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice use of command -p -v
, but again I was expecting this to take place at configure-time.
configure.ac
Outdated
AC_CONFIG_COMMANDS([shebang], | ||
[printf '%s\n%s\000\n' "$launch_method" "$ocaml_bindir" \ | ||
> stdlib/runtime.info | ||
printf '%s\n%s\000\n' "$target_launch_method" "$target_bindir" \ | ||
> stdlib/target_runtime.info], | ||
[launch_method='$(echo "$launch_method" | sed -e "s/'/&\"&\"&/g")' | ||
target_launch_method=\ | ||
'$(echo "$target_launch_method" | sed -e "s/'/&\"&\"&/g")' | ||
ocaml_bindir='$(echo "$ocaml_bindir" | sed -e "s/'/&\"&\"&/g")' | ||
target_bindir='$(echo "$target_bindir" | sed -e "s/'/&\"&\"&/g")']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You lost me :-) The printf
I understand, but not the sed-scripting. A short comment would help...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ouch, yes - doubly confusing by the use of &
instead of '
which I think came from an earlier version. It's our old friend the escaped quotation mark. I added some comments. Overall it means that ./configure --with-target-bindir="/home/'quoted'/bin"
actually makes it to the substitution files correctly!
Many thanks for having gone thorugh the description and the code Xavier!
|
3ea0fff
to
4361ab4
Compare
Thanks, @xavierleroy! For the location of
I don't mind changing it, although I'd be happy if that bit of over-thinking is humourable 馃榿 |
The merge of #12843 has created a conflict here that now needs to be |
4361ab4
to
00c08ad
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the code and it looks reasonable to me.
There is only one thing that I do not fully understand from the code and your (otherwise very clear) description, namely which runtime-launch-info
is read by boot/ocamlc
(IIUC, stdlib/runtime-launch-info
?).
@@ -75,6 +75,7 @@ META | |||
|
|||
/boot/ocamlrun | |||
/boot/camlheader | |||
/boot/runtime-launch-info |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file doesn鈥檛 seem to ever be generated. Or am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The LIBFILES
variable in the root Makefile includes runtime-launch-info
(as $(HEADER_NAME)
) and, along with stdlib.cma
, std_exit.cmo
and all the .cmi files, that gets copied to boot/
from stdlib/
in the coldstart
target.
@@ -2407,7 +2407,7 @@ distclean: clean | |||
rm -f tools/eventlog_metadata tools/*.bak | |||
rm -f utils/config.common.ml utils/config.generated.ml | |||
rm -f compilerlibs/META | |||
rm -f boot/ocamlrun boot/ocamlrun.exe boot/camlheader \ | |||
rm -f boot/ocamlrun boot/ocamlrun.exe boot/$(HEADER_NAME) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like in .gitignore
, boot/$(HEADER_NAME)
should be removed here if it is never generated.
Thanks, @OlivierNicole! It is indeed The fact that's not been clear to you says there should be more comments, but I'm not immediately clear what to put where. It does hint at a mild inconsistency that the |
David Allsopp (2024/01/18 15:42 -0800):
Thanks, @olivernicole! It is indeed `stdlib/runtime-launch-info` which gets promoted to `boot/runtime-launch-info` in `coldstart` (which exactly replaces what happens with `stdlib/camlheader` at the moment).
The fact that's not been clear to you says there should be more
comments, but I'm not immediately clear what to put where. It does
hint at a mild inconsistency that the `camlheader` files - which are
morally part of the bytecode linker - are built in stdlib/ at all,
although from a technical perspective it makes sense inasmuch as
`stdlib` and `runtime` have to be processed as part of `coldstart`
_before_ the bytecode compiler itself is built.
I believe we _did_ discuss moving the header files to `runtime/` rather
than leaving them in `stdlib/`, didn't we?
I just can't remember why we endedup not doing it. Was it because we
wanted to keep that for later / keep changes separate?
|
I'm not sure - moving those files is indeed related to #11198, and it's probably better to move the file as part of doing that (i.e. move where it's built at the same time as moving where it's installed). |
David Allsopp (2024/01/19 02:54 -0800):
I'm not sure - moving those files is indeed related to #11198, and
it's probably better to move the file as part of doing that (i.e. move
where it's built at the same time as moving where it's installed).
Ah, I see. You mean that, install-wise, stdlib is a better choice (we
do'nt even have a reasonable alternative, do we?)?
|
#11198 got the otherlibs into subdirectories of |
David Allsopp (2024/01/19 05:45 -0800):
#11198 got the otherlibs into subdirectories of `/usr/local/lib/ocaml`
#... the next stage would to have `/usr/local/lib/ocaml/stdlib` with
#_just_ the standard library and `/usr/local/lib/ocaml/runtime`
#containing all the various libasmrun and libcamlrun files _and_
#camlheader/runtime-launch-info (not at present planned - it's useful,
#but not as useful as shifting unix, dynlink, etc. to subdirectories
#was)
This target definitely sounds nice and pleasing to me!
|
This sounds nice, but keep in mind that while testing the compiler, we often use |
Vincent Laviron (2024/01/19 06:11 -0800):
This sounds nice, but keep in mind that while testing the compiler, we
often use `-nostdlib -I stdlib` and that magically finds the runtime
and the header files. It's not a huge pain to have to also add `-I
runtime`, but it's not a huge gain either.
That seems indeed very useful to keep in mind!
Althogh I'm wondering whether it does make sense to use the same option,
-I, for both OCaml files and other library files.
Perhaps this is an area where relocatable willhelp because then the path
to the runtime files will be relative to the path of the compiler and
will thus no longer need to be specified?
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to approve this PR, I just have one last, small incomprehension: shouldn鈥檛 the bootstrap commit add a new file boot/runtime-launch-info
?
Makefile.common
Outdated
&& $(INSTALL_PROG) $(1).tmp $(2) \ | ||
&& rm $(1).tmp | ||
endef # INSTALL_STRIPPED_BYTE_PROG | ||
|
||
HEADER_NAME = runtime-launch-info |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You were mentioning the possibility to add a brief comment to explain the role of runtime-launch-info
and how it is bootstrapped. I think here would be a good place to put it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha - yes! I have pushed some extra text there.
I want to approve this PR, I just have one last, small incomprehension: shouldn鈥檛 the bootstrap commit add a new file
boot/runtime-launch-info
?
I've clarified that in the comment, but no - boot/runtime-launch-info
is the part which can't be committed (the platform- and configuration-specific information).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the clarification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good modulo one small suggestion.
bytecomp/bytelink.ml
Outdated
let kind = String.sub buffer 0 (bindir_start - 1) in | ||
if kind = "exe" then | ||
Executable | ||
else if bindir_start > 1 && (kind.[0] = '/' || kind = "sh") then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be easier to read if you just tested kind
?
else if bindir_start > 1 && (kind.[0] = '/' || kind = "sh") then | |
else if kind <> "" && (kind.[0] = '/' || kind = "sh") then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, indeed! Changed, thanks.
Makefile.config.in
Outdated
@@ -230,6 +221,9 @@ NAKED_POINTERS=false | |||
STDLIB_MANPAGES=@build_libraries_manpages@ | |||
WITH_OCAMLDOC=@with_ocamldoc@ | |||
WITH_OCAMLTEST=@ocamltest@ | |||
# This is still processed in configure, but is not needed by the build system |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fear that this comment might become confusing in the future, it seems better to document the current state rather than the history of this Makefile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, the idea was meant to be a reminder that if/when SHEBANGSCRIPTS
is removed, it would also be possible to remove shebangscripts
from configure.ac
, but I see what you mean. I think it best therefore to remove the comment entirely and trust that whoever removes SHEBANGSCRIPTS
to consider re-checking for any other uses of shebangscripts
(and so remove it, if it is truly unused at that point)
00c08ad
to
69ba73f
Compare
This comment was marked as outdated.
This comment was marked as outdated.
This option was present in configure since 4.08.0, but not actually implemented.
Newer glibc headers add __attribute__((warn_unused_result)) to write. Switch to fputs instead for the error messages in the header program.
The various camlheader files are now folded into a single runtime-launch-info file. On all systems, this file now contains the executable stub launcher (from stdlib/header{,nt}.c) as well as the following pieces of configuration information: - Whether shebang scripts are supported - How to find sh if the path to the interpreter is not valid for a shebang - The location of the runtime executables (i.e. the bindir OCaml was configure'd with) Since it no longer knows the name of the runtime (ocamlrun, ocamlrund, etc.), the executable header no longer contains a default runtime path. If the executable header is used, then the RNTM section must always be written to the bytecode image. In passing, tools/ocamlsize is updated to handle this and the display of default runtimes correctly. In refactoring this code, several bugs are fixed. The validity checking of shebangs is now correct in all cases. Previously, if OCaml was configure'd with a prefix longer than 125 characters, the compiler's tools were installed with correct shebang headers, but executables produced by ocamlc after installation had invalid shebang headers. In addition to spaces, newline and tab are now recognised as being invalid characters for a shebang header. Finally, ocamlc no longer assumes that sh resides at /bin/sh but by default uses the correct Posix mechanism of querying `command -p -v sh` to determine the location. Since 4.02.2, the compiler has generated a host and target set of header files, permitting external cross-compilation solutions. The path to ocamlrun on the target system can already be customised by specifying --with-target-bindir=<path>. This support is extended further with --with-target-sh which allows the location of sh on the target system to be specified (for when that differs from the host system on which the compiler is run). Furthermore, specifying --without-target-sh will cause the executable header to be used instead.
Historically, shebang headers were disabled on Cygwin partly because of issues in Cygwin itself and partly because it allowed executables produced by ocamlc to be started outside of a Cygwin shell (i.e. from Windows itself, either at the Command Prompt or from Explorer). The build has long since supported a separate set of headers to use when building the compiler vs for use by ocamlc after installation, so the compiler is now configured to default to shebang scripts for the tools themselves (ocamlc.byte, etc.) while still retaining the previous behaviour for executables produced by ocamlc. The user may override this by configuring --with-target-sh=sh
69ba73f
to
6ddbd7e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me as well.
Thank you, everyone... here goes... |
Construct the bytecode executable header in ocamlc The commits were squashed from the origin PR, the bootstrap can be replayed with: - From the previous commit (dbc2800), run ./configure && make -j coldstart - From this commit, reset the boot compiler with git checkout HEAD~1 -- boot - Then verify repeatability with ./configure && make -j coreall && make -j bootstrap (the second call to ./configure is critical) (cherry picked from commit 733c2c5)
This PR is a prerequisite for the "Relocatable Compiler" project, but the changes here are independent of it. It addresses five currently known bugs (see below) in
#!
("shebang") handling in the bytecode compiler and its implementation considerably simplifiesstdlib/Makefile
, in advance of @shindere merging that into the rootMakefile
.Background: Bytecode Executables
When not using one of its C-based linking modes (
-custom
,-output-complete-exe
, etc.),ocamlc
creates bytecode executables by prepending a launch header to the bytecode image. This header's sole responsibility is to locate the actual OCaml runtime and transfer execution to it. There are three ways in which this can be done:#!
(or "shebang") header is used with the full path to the runtime, e.g.#!/usr/local/bin/ocamlrun
. This is the default on Unix systems (except Cygwin, at least before this PR).exec
the runtime. Presently this is only used with-use-runtime
when the path given is too long or contains a space, for example:stdlib/header{,nt}.c
), which is able both to execute a runtime at an absolute location (i.e./usr/local/bin/ocamlrun
) or to search$PATH
for a runtime (i.e. to search forocamlrun
in$PATH
).At present, the choice between shebang scripts (mechanisms 1 and 2) and executable (mechanism 3) is made at
configure
time ($(SHEBANGSCRIPTS)
and$(LONG_SHEBANGS)
), and the result is written by the build system to the filecamlheader
which is kept in the Standard Library directory. This file is either the compiled executable header or it is the full path to whereocamlrun
will be installed.The runtime variants (
-runtime-variant d
and-runtime-variant i
) are supported by building multiple versions of this file, so there are in fact 3 of them:camlheader
,camlheaderd
andcamlheaderi
.Finally, in order to support the
-use-runtime
option, a different filecamlheader_ur
is created.ocamlc
copies this file and immediately starts the bytecode TOC recorder. It then writes the name of the runtime followed by a newline and then marks theRNTM
section.Now, when shebang headers are supported,
camlheader_ur
is exactly the string#!
. This means thatocamlc
's procedure writes a shebang header, though pointlessly records it in theRNTM
section. When mechanism 3 (the small C program) is in use,camlheader_ur
is exactly the same ascamlheader
. The C program, in addition to knowing how to search$PATH
, is also able to read theRNTM
section of the bytecode image. It reads this data, cunningly converts the newline character whichocamlc
wrote into a nul character, in order to make theRNTM
payload a valid C string, and then proceeds to execute that runtime.For
-use-runtime
only,ocamlc
performs some validation on the runtime path to check if it's valid to use in a shebang line. If it's not, then it elects to write a mechanism 2 header (using/bin/sh
).That gets us to OCaml 4.02.1. In OCaml 4.02.2, in order to assist the iOS and Android cross-compilation projects, an additional set of headers was added:
target_camlheader
,target_camlheaderd
andtarget_camlheaderi
. These are the same as their unprefixed counterparts, except that the directory written to them is$(TARGET_BINDIR)
(which can be overridden when callingmake
) instead of$(BINDIR)
. There is notarget_camlheader_ur
because there are no paths embedded in the file (so it never differs).The 馃悰s
As announced above, the five following bugs are present in this mechanism in OCaml. In decreasing order of severity, they are:
target_camlheader*
files are incorrectly generated, and bytecode executables produced by the installedocamlc
will have invalid shebang lines. (this is a very real bug, originally identified by the Sandmark project; see also Add -with-runtime/-without-runtime options聽#2309 (comment) and compiling ocaml in a folder with space in its name creates a binary with an invalid shebang聽#12709)-use-runtime
which is longer than 125 characters causes ocamlc to generate a corrupt executable since it uses a#!/bin/sh
header.configure.ac
andbytecomp/bytelink.ml
, these have subsequently diverged and are (still) both incorrect. In particular,configure.ac
only checks the length (and, even then, in a conservative check which rejects one otherwise valid possible header) and whilebytecomp/bytelink.ml
checks for space characters, both places fail to check for tabs and newline characters, which are also not permitted in a shebang line. (this issue has been separately reported in Avoid shebang fallback on 128 on linux聽#10724, and is also one reason that Windows CI actually includes a stronger test of strange characters in--prefix
than the Linux/macOS one)stdlib/Makefile
(generating the headers) andbytecomp/bytelink.ml
(processing-use-runtime
) assumesh
resides in/bin/sh
, which is not guaranteed by POSIX (and, indeed, is not the case on some, admittedly obscure, systems)camlheader_ur
is just the string#!
,bytecomp/bytelink.ml
still (unnecessarily) records theRNTM
section. This in itself isn't a bug, but when writing a#!/bin/sh
script version, theRNTM
section incorrectly contains the entire/bin/sh
script, rather than just the name of the runtime.Current implementation
The current implementation of all this goes to some lengths to ensure that it is enough for
bytecomp/bytelink.ml
to copy the header blindly and never actually have to inspect its content. Regardless, the header is a relatively subtle piece of configuration state. The compiler and most of the tools will be compiled withboot/ocamlc
which is built with a genericConfig
module. With the current setup of the build, therefore, it is not possible forboot/ocamlc
to use values fromConfig
(althoughConfig.bindir
exists,boot/ocamlc
will see the value it was built with during the bootstrap, not the value used during configuration and, for this reason, there is noConfig.shebangscripts
to mirror the$(SHEBANGSCRIPTS)
variable in the build system). Althoughocamlc
doesn't at present actually analyse the content of the header it's copying, the decisions it takes as to which file to read (based on-use-runtime
and-runtime-variant
) mean that the header is effectively acting as a series of "ghost" command-line arguments! While this is sort of neat, it's causing a few problems:-use-runtime
whencamlheader_ur
is#!
,ocamlc
ends up writing the full path of the runtime twice (even allowing for bug 5)-use-runtime
mode,ocamlc
only needs to mark theRNTM
section for the executable header, but it's unnecessarily marking it even whencamlheader_ur
is just#!
(whichocamlc
has in fact read!)RNTM
section is needed, the string ends with the wrong terminator in order to keep the format valid for the shebang case (i.e. theRNTM
section is written unnecessarily in one case and, in order to ensure that the string is correct in that unnecessary case, the string has to be mangled in the necessary case 馃く)%PATH%
for the runtime, and never use an absolute path. This is very subtly encoded instdlib/Makefile
.camlheader
et al means that the code for validating shebang lines is at present implemented in m4sh (inconfigure.ac
), in OCaml (inbytecomp/bytelink.ml
) and should be being implemented in GNU make (instdlib/Makefile
).Proposed implementation in this PR
Presently, the processing of the header in ocamlc is simple, because it boils down to copying the correct file. I think it's possible to fix these 5 bugs while maintaining that. However, the code (in GNU make and m4sh) won't be terribly tasteful and the various checks will still be duplicated in several places. It is not possible to do Relocatable's switch cloning this way, where
camlheader
instead of being#!/usr/local/bin/ocamlrun
wants to be something akin to#!../../bin/ocamlrun
with the../
interpreted relative to the header itself.So, at last, to the details.
The principle here is to allow
ocamlc
to do all of the work, being given only the information which it can't know in advance via the "header". Since the header is now really a data-file, it's calledruntime-launch-info
. It contains the following three pieces of information:sh
in a shebang$prefix/bin
)stdlib/header.c
, orstdlib/headernt.c
on Windows)ocamlc
is responsible for:-runtime-variant
and the bindir read fromruntime-launch-info
)sh
, ifruntime-launch-info
doesn't contain an absolute path to itRNTM
section is used only with the executable header (and is now null-terminated). Furthermore, when using executable headers, it is required that theRNTM
section is present in the image (this stops the executable header from ever containing an absolute path).It makes sense while overhauling all this to implement (finally) the
--with-target-bindir
option, which was added in the switch to autoconf in 4.08 but never plumbed in. Previously, cross-compilation systems will have specifiedTARGET_BINDIR
tomake
instead. Additionally, a new switch--with-target-sh
has been added to complete the cross-compilation picture, at least in terms of the shell scripting. This allows every aspect ofstdlib/target_runtime-launch-info
to be controlled in the build. In particular, it allows an improvement to Cygwin's compilation (see below), also providing an immediate upstream use-case for this change.While the implementation necessarily adds quite a lot of code to
bytecomp/bytelink.ml
, it removes a relatively complex bit of m4sh fromconfigure.ac
and an exceedingly complex mess fromstdlib/Makefile
. In passing, there are a couple of related issues which can be trivially fixed:sh
may not be found. This is "solved" by always compiling the executable launcher, even on Unix. A minor side-effect of this is to reduce bit-rot in this file, which had started to happen (see first commit).--with-target-sh
allows the use of shebangs on Cygwin to be improved.The approach I've adopted in this PR is to allow
ocamlc
to look at the data it reads fromcamlheader
and act accordingly. There is more than one way to do this! I think it is possible to achieve this using a mix of-use-runtime
in the build system and carefully ensuring that theConfig
module's values are only ever used by a compiler which has been actually installed. Likewise, we have previously discussed being able to dynamically load the completeConfig
module into the boot compiler (see #9291). I think the-use-runtime
approach is likely to be a bit too brittle and although I have a possible approach for loadingConfig
at runtime, it's not a trivial change, and we are also fixing bugs here and now.