-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Honor BUILD_PATH_PREFIX_MAP #1515
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
05f5c92
to
67f79f7
Compare
Echoing what was said on the GCC patches list: setting build parameters via environment variables is fragile (a pretty good way of ensuring you don't get reproducible builds, in fact). I would rather see this as an option to the compiler used in the build systems of the relevant packages, whose maintainers are interested in this reproducibility property, rather than another It might be worth investigating how Debian ensures this property in the C world. (Is it relying on most things using autoconf, for example, so a standardised option can be passed?) |
If I understood correctly, the reproducible-builds.org infrastructure uses Ximin's gcc patches for its Debian package testing, so that's the approach they use for C. One counter-argument in support of an environment variable, I think, is that reproducibility is a cross-cutting concern. We want all the packages on a user system to be reproducible, not just some of them for which the authors made specific build-system choices. Also, following shared conventions across different languages and build systems makes it much easier to have reproducible build in practice. (Of course, from the perspective of someone who does not care about reproducibility, I think it is fair to ask how much complexity the change adds to the normal workings of the system.) I'm not trying to use a slippery slope, but just as a remark: 4.06 already respects reproducible-builds specifications/conventions for another environment variable, SOURCE_DATE_EPOCH, that can be used to make programs reproducible when they include timestamps in their output. |
One other thing that comes to mind: the notion of "reproducible" might be worth investigating a little more. Is it acceptable for certain sections of the executable to differ? @lpw25 and I have discussed, actually in the context of Spacetime, moving the debug info out of the frametable (which is in the data section) into a separate section. Doing this might help our reproducibility story if only a subset (i.e. all those that matter) of sections are compared without any rewriting of paths. Are there any cases which don't relate to debug info? I think there might be a problem with paths being embedded in the names of Flambda anonymous function symbols, although we should be able to make that go away once the gdb patches land, as the user-visible names can be specified separately in DWARF. I'm not sure I know of any others, but there probably are some. |
Not really I think, see the definition (the bit-by-bit bit). As an end user I really just want to simply hash my binary and compare it to the hash of a binary someone else produced on another machine. |
I include below an exhaustive list of all paths rewritten by the current patch. A word on terminology, I call "completed path" a path that was passed as a relative path, and is turned into an absolute path by the compiler.
Whenever I mentioned that completed paths are rewritten, note that this only applies if the path was passed as a relative path to that part of the compiler. In particular, if the user explicitly specifies an absolute path, then this path is not rewritten. Another way to think of the behavior is that the only paths that are rewritten are those resulting from a |
I take from it than filenames in assertion or pattern matching failure messages, or string literals generated by FILE, are not currently rewritten in case when |
I'm not sure about warning messages, this may be a case of (4) that I overlooked. I'm sure that FILE currently reuses the With the proposed patchset, (Outside Location, |
Note that in the particular case of |
This is normally the case for variables unrelated to reproducible builds, but this variable was specifically designed for achieving reproducible builds, and there are some differences: For things like From another angle: if this were implemented via a command-line flag, one would have to duplicate absolute paths in the calls to And yes, the specific GCC reviewer for my patch ignored this argument, but nevertheless I maintain that this is the preferred approach. |
I'm not sure I fully understand that point. You don't do the rewrite on absolute paths given as arguments (e.g. the From a build system perspective it's much better from a usability point if your build system passes absolute paths to all the tools it schedules (it makes the context in which the invocation occurs obvious in the logs and allows to repeat the a build tool invocation by simply c&p it regardless of your current environment). |
Note that this is the case for |
There is nothing in this patch that rewrites absolute paths given by users, but of course it would be possible to decide to do it -- and probably fairly easy. (I'd rather keep the scope of the current PR fairly small, though.) I am not sure whether systematically passing absolute paths is the best choice for build systems. For ocamlbuild specifically, if I remember correctly we rely on the fact that the paths are relative to the project root to be able to transfer paths from |
I personally really think it is for the usability reasons I mention (build understanding and easy manual replay; it becomes even more obvious when you deal with build system that support simultaenous side-by-side build variants). In any case there are quite a few things that will come to you under the form of absolute paths, like paths to installed libraries.
A trick by which I lost hours of work on documentation numerous time (as soon as last week again) because Making your builds in a separate directory structure that mirrors the source tree and play relative tricks is a brittle idea that has to die in my opinion. I don't think it is unreasonable to ask compiler tools to be able to read sources and writes build artefacts to/from arbitrary locations. |
For now my work on reproducible-builds has been focused on the experimental setup proposed by the reproducible-builds.org setup, which tries to eliminate build variation coming from various factors, which do not include the installation paths of dependencies, but does include variation of the build directory (from where the package is installed). The current PR (and its possible evolution) is more flexible than that, as arbitrary prefix rewrites may be specified through the prefix map (typically capturing and rewriting your I am of course open to making further PRs to improve the reproducibility situation in different scenarios (for example, displacement of the opam build root; but that may quickly run into relocatability issues), provided that an adequate testing and reproduction procedure is available. (This is the joke about the person who search their keys at night under the street lamp; I have enough keys to find under the very helpful lamp they provide to keep me busy before I look at the rest of the street.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few remarks:
- why do you need the
encode_
* functions? - couldn't you do the parsing with
ocamllex
rather than by hand?
The way I went at this problem is to write a self-contained external library (that other projects could also use), https://gitlab.com/gasche/build_path_prefix_map/, and then include it wholesale in my PR. The I would rather keep the included-in-the-compiler source identical to the outside source if that's not to costly, as it makes it easier to see what version it is and whether it should be updated. (I don't expect to update this library much, given that its scope is well-defined and its feature-set satisfying, unless the specification was to change.)
Possibly, but then I'm making the whole thing sensibly more complex to build for little gain. Are you suggesting that using |
No, I was just wondering whether it might be better. I accept your judgement that it isn't. |
@damiendoligez would you, by any chance, be interested in "Approving" this PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good except for the fatal error message.
This library is imported from <https://gitlab.com/gasche/build_path_prefix_map/> commit 811691e479b38c0e6c1c0e5c853c555a63177ee3
67f79f7
to
6c55877
Compare
Indeed, that's an excellent point. I pushed a new version using
|
@gasche is there any reason why this new variable and its effects is not mentioned in the manual ? |
That's a very good point. |
Indeed, I should have thought of it. I am not sure where to document this, and would appreciate a recommendation here. It seems that there is no place in which all our environment variables are listed. comp.html has a "Contextual control of command-line options" section, but this is rather specific and not meant to cover all environment variables. I'm not sure if I should start a "environment variables" section, accepting that it will be very non-exhaustive at the start, or whether I should place the BUILD_PATH_PREFIX_MAP documentation in a specific place (which one?). I would like to propose the BUILD_PATH_PREFIX_MAP documentation for cherry-pick in 4.07, so a not-too-invasive change would be welcome -- although we could do a small change in 4.07 and a larger reorganization in trunk. |
An option might be to add a |
That's the section I would be looking for. I would move the contents of "Contextual control of command-line options" there and also since this seems to be shared between P.S. I realized this because it has been mentioned to me that the reproducible build people seem to be struggling with ocaml, I guess that one could solve the problem for these packages. /cc @lamby |
(I often grep for an "environment variables" section in manpages so +1 for that part.)
Well, not so much "struggling" per-se, more that we have been rather busy and overworked!
So I haven't confirmed that this is exactly the same issue or even that these are caused by the same underlying path; there have been other, multiple, ones in the past… but hopefully so. :) Thanks! |
A recent question from @hannesm made me realize that I failed to provide a simple user-friendly example of usage of BUILD_PATH_PREFIX_MAP. The syntax is a list of colon-separated rewrite rules of the form So for example, if in my $ ocamlc -absname -i foo.ml
File "/tmp/foo.ml", line 2, characters 0-0:
Error: Syntax error
# rewrite /tmp/ into nothing: "=/tmp/"
$ BUILD_PATH_PREFIX_MAP="=/tmp/" ocamlc -absname -i foo.ml
File "./foo.ml", line 2, characters 0-0:
Error: Syntax error
# more generaly, rewrite the current directory into nothing
$ BUILD_PATH_PREFIX_MAP="=$(pwd)/" ocamlc -absname -i foo.ml
File "./foo.ml", line 2, characters 0-0:
Error: Syntax error
# rewrite $(pwd) into /PWD
$ BUILD_PATH_PREFIX_MAP="/PWD/=$(pwd)/" ocamlc -absname -i foo.ml
File "/PWD/foo.ml", line 2, characters 0-0:
Error: Syntax error Note that this particular part of the compiler logic will helpfully(?) prefix |
Note that your example doesn't work on macOS because on that platform
you likely want to look for But that's a bit besides the point, I still think that the right strategy for reproducible builds is for your build system to always work with absolute paths (bonus point: it's easy to you understand what's happening when things go wrong and you don't need to be in a particular |
BUILD_PATH_PREFIX_MAP is a particular mechanism to allow you to chop prefixes, and then it's a matter of integrating it with whatever the compiler does internally; if it handles path in a simple, clean, consistent way, then you get good results, otherwise (as today) you get a mixed bag, but this is independent of this specific mechanism. |
That's not really the point. In that particular case it's not about handling things in a consistent way. It's a mismatch between the perception of what the |
* update solvuu link * fix broken links * remove OCaml as fast as C slides link (404), point DKML to the gitlab repo * remove flying frog consultancy and ocaml for scientists since there is no official website for both of these anymore --------- Co-authored-by: sabine <sabine@users.noreply.github.com> Co-authored-by: Christine Rose <christinerose@users.noreply.github.com>
BUILD_PATH_PREFIX_MAP is an environment variable designed for "reproducible builds" (by @infinity0). It allows to specify a series of rewriting rules for absolute paths, to make the output of build tools resilient to build-directory changes. See the BUILD_PATH_PREFIX_MAP specification -- the details are a bit tricky due to the need to portably escape strings within the environment variable, but this can be ignored for the purpose of this PR.
This PR adds a new module in
utils/
,build_path_prefix_map.{ml,mli}
, that implements encoding and decoding of build path prefix maps, according to the specification, and rewriting of absolute paths under a given map. Then a new functionLocation.rewrite_absolute_path
is added to do the lookup the environment variable and do the rewriting (the environment lookup, and map decoding, are cached).For example, when using this PR, the build of the
findlib
library becomes reproducible, while it currently is not, because it uses-g
by default, which includes absolute file names (since #14 and #157).Note that this PR does not change the behaviour of the compiler unless the BUILD_PATH_PREFIX_MAP variable is set (I expect build scripts for distributions interested in reproducible builds, such as Debian, to set it). In particular, it does not disable the use of absolute names in any way.
Final remarks:
build_path_prefix_map.{ml,mli}
as its own standalone library, https://gitlab.com/gasche/build_path_prefix_map/, to let other OCaml projects use it if they would like (see Relocation issues [ocamlfind] opam-repository#10863 for example).