Please replace cmi files atomically when writing new versions #7472
Original bug ID: 7472
I've recently found a problem in the omake build rules, and while it is possible to tackle it there I think it is better to do so in the compiler. In short, when both bytecode and native code build paths are enabled, it is generally possible that both of these paths create the cmi file for a module at different points in time, and that the build in the other path sees for a short moment a destroyed cmi file (because it is being overwritten with identical contents). When there is an mli file, it is easy to write the build rules so that the problem is avoided (only either the bytecode or native code path compile the mli file). However, when there is no mli file, it is not that easy.
Apparently, omake used to contain a special ruleset to tackle that case, and the bytecode and native code build is then done together, i.e. ocamlc and ocamlopt are called in the same build step, just to avoid that one of the build paths sees a corrupt cmi file. I recently discovered the problem again for the macro that creates packs (see ocaml-omake/omake#73). The solution works, but there is a price to pay. When the user only requests, say the native code version of an executable, and packs or units are involved that do not have an mli file, some parts of the bytecode build are also being conducted.
There is an easy solution of the problem in the compiler: when writing a cmi file, do not directly overwrite the cmi file, but first write to a second file, and finally rename that file so it has the cmi suffix. This way there is no point in time where a truncated cmi file is visible to other processes of the build. This way build tools need not to take care of the problem.
Steps to reproduce
Assume there is x.ml but no x.mli. We do both a bytecode and a native code build. Let's say the bytecode build was a little bit quicker so far, and the build runs
ocamlc -c x.ml
The bytecode build then continues, assuming that x.cmi is present. Somewhat later the native code build is also at this step, and runs
ocamlopt -c x.ml
when the file x.cmi is written, and at the same time the bytecode compiler accesses x.cmi, there is the danger that the bytecode compiler sees a truncated x.cmi file.
Note that this is more likely to occur for big cmi files. I guess it's because of this I recently stubled upon the problem when creating packs.
The workaround for the problem in omake consists of these lines:
This build rule is only in effect when there (a) both bytecode and native code are enabled, and (b) there is no mli file. As you see both compilers are called, even if this is not needed.
In a recent PR I am now introducing something similar for creating packs:
Comment author: @alainfrisch
I like the proposal.
What about .cmt files? In addition to the risk of concurrent writes (which will become more serious when tools that process .cmt files -- such as documentation generators -- start to be more widely used), there is the problem that .cmt files generated by ocamlc and ocamlopt for the same input file will be different since they keep a copy of the compiler's command-line.
Comment author: @dra27
I too like this proposal. My past solution with make either drops parallelisation or has to use ugly locking for the no .mli case.
Unrelated to this PR, but possibly related to Alain's comment, I wonder given safe-string, flambda, various available runtimes and eventually proper cross-compiling if it's time to review why we have a separate ocamlc and ocamlopt (rather than one driver with output options)? Much of this disappears if it were possible to output bytecode and native code in a single driver invocation.
Comment author: gerd
dra: not sure what the benefit is (besides maybe cleaning up the code base). If we generate both bytecode and native code in one step, this is very much like always calling ocamlc and ocamlopt together. Ok, you can save time by avoiding doing the typing twice, but typically this isn't the time-consuming part of compilation.
Just for the record, there is still another radical solution: simply do not use the same cmi files in both bytecode and native builds. Separate the toolchains completely. I want to mention this because I think the bytecode compiler will become unimportant at some point (in particular when we have a native toplevel), at least for the majority of the users. At the same time ocamlopt advances into the field of whole-project optimizations, and unifying cmi and cmx could become interesting. In this world, the overhead of having parallel builds could become a burden anyway.
Comment author: @diml
Note that at Jane Street we use a different trick. We choose ocamlc to build the .cmi and we pass [-intf .ml] to ocamlopt. ocamlopt then thinks that the mli exists and will read the .cmi rather than re-create it.
It'd still be nice to improve the situation in general, but that's a good workaround for existing versions of OCaml.
Comment author: @dra27
TxF would provide this guarantee on Windows (Vista+ only) but it's partially deprecated. Given how easy these conflicts are to generate with build systems currently, it would not be too hard to see if ReplaceFile provides sufficient guarantees. Its API doesn't mention atomicity, though the TxF migration pages recommend using it as an alternative for this exact situation. For our purposes wouldn't this be OK - the problem is a truncated file existing, rather than no file existing? I think it may also be that the scenarios under which this call can fail would already be a problem for the build system (out of disk space, permissions)?
Comment author: @xavierleroy
All right, ReplaceFile might be atomic enough... I welcome a patch to byterun/sys.c and byterun/win32.c and byterun/include/misc.h that reimplements Sys.rename in terms of ReplaceFile under Win32. The idea being to have Sys.rename behave as much as possible the same under POSIX and under Win32.
Comment author: aha
The current solution (from the Github Pull Requests linked above) doesn't work very well on Windows.
'File "foo.ml", line 1:
So the original hope remains unfulfilled. It must still be solved by build tools.