New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Honor BUILD_PATH_PREFIX_MAP #1515

Merged
merged 3 commits into from Mar 5, 2018

Conversation

Projects
None yet
8 participants
@gasche
Member

gasche commented Dec 4, 2017

BUILD_PATH_PREFIX_MAP is an environment variable designed for "reproducible builds" (by @infinity0). It allows to specify a series of rewriting rules for absolute paths, to make the output of build tools resilient to build-directory changes. See the BUILD_PATH_PREFIX_MAP specification -- the details are a bit tricky due to the need to portably escape strings within the environment variable, but this can be ignored for the purpose of this PR.

This PR adds a new module in utils/, build_path_prefix_map.{ml,mli}, that implements encoding and decoding of build path prefix maps, according to the specification, and rewriting of absolute paths under a given map. Then a new function Location.rewrite_absolute_path is added to do the lookup the environment variable and do the rewriting (the environment lookup, and map decoding, are cached).

For example, when using this PR, the build of the findlib library becomes reproducible, while it currently is not, because it uses -g by default, which includes absolute file names (since #14 and #157).

Note that this PR does not change the behaviour of the compiler under the BUILD_PATH_PREFIX_MAP variable is set (I expect build scripts for distributions interested in reproducible builds, such as Debian, to set it). In particular, it does not disable the use of absolute names in any way.

Final remarks:

@mshinwell

This comment has been minimized.

Show comment
Hide comment
@mshinwell

mshinwell Dec 6, 2017

Contributor

Echoing what was said on the GCC patches list: setting build parameters via environment variables is fragile (a pretty good way of ensuring you don't get reproducible builds, in fact). I would rather see this as an option to the compiler used in the build systems of the relevant packages, whose maintainers are interested in this reproducibility property, rather than another OCAMLPARAM-like hack.

It might be worth investigating how Debian ensures this property in the C world. (Is it relying on most things using autoconf, for example, so a standardised option can be passed?)

Contributor

mshinwell commented Dec 6, 2017

Echoing what was said on the GCC patches list: setting build parameters via environment variables is fragile (a pretty good way of ensuring you don't get reproducible builds, in fact). I would rather see this as an option to the compiler used in the build systems of the relevant packages, whose maintainers are interested in this reproducibility property, rather than another OCAMLPARAM-like hack.

It might be worth investigating how Debian ensures this property in the C world. (Is it relying on most things using autoconf, for example, so a standardised option can be passed?)

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Dec 6, 2017

Member

If I understood correctly, the reproducible-builds.org infrastructure uses Ximin's gcc patches for its Debian package testing, so that's the approach they use for C.

One counter-argument in support of an environment variable, I think, is that reproducibility is a cross-cutting concern. We want all the packages on a user system to be reproducible, not just some of them for which the authors made specific build-system choices. Also, following shared conventions across different languages and build systems makes it much easier to have reproducible build in practice.

(Of course, from the perspective of someone who does not care about reproducibility, I think it is fair to ask how much complexity the change adds to the normal workings of the system.)

I'm not trying to use a slippery slope, but just as a remark: 4.06 already respects reproducible-builds specifications/conventions for another environment variable, SOURCE_DATE_EPOCH, that can be used to make programs reproducible when they include timestamps in their output.

Member

gasche commented Dec 6, 2017

If I understood correctly, the reproducible-builds.org infrastructure uses Ximin's gcc patches for its Debian package testing, so that's the approach they use for C.

One counter-argument in support of an environment variable, I think, is that reproducibility is a cross-cutting concern. We want all the packages on a user system to be reproducible, not just some of them for which the authors made specific build-system choices. Also, following shared conventions across different languages and build systems makes it much easier to have reproducible build in practice.

(Of course, from the perspective of someone who does not care about reproducibility, I think it is fair to ask how much complexity the change adds to the normal workings of the system.)

I'm not trying to use a slippery slope, but just as a remark: 4.06 already respects reproducible-builds specifications/conventions for another environment variable, SOURCE_DATE_EPOCH, that can be used to make programs reproducible when they include timestamps in their output.

@mshinwell

This comment has been minimized.

Show comment
Hide comment
@mshinwell

mshinwell Dec 6, 2017

Contributor

One other thing that comes to mind: the notion of "reproducible" might be worth investigating a little more. Is it acceptable for certain sections of the executable to differ?

@lpw25 and I have discussed, actually in the context of Spacetime, moving the debug info out of the frametable (which is in the data section) into a separate section. Doing this might help our reproducibility story if only a subset (i.e. all those that matter) of sections are compared without any rewriting of paths.

Are there any cases which don't relate to debug info? I think there might be a problem with paths being embedded in the names of Flambda anonymous function symbols, although we should be able to make that go away once the gdb patches land, as the user-visible names can be specified separately in DWARF. I'm not sure I know of any others, but there probably are some.

Contributor

mshinwell commented Dec 6, 2017

One other thing that comes to mind: the notion of "reproducible" might be worth investigating a little more. Is it acceptable for certain sections of the executable to differ?

@lpw25 and I have discussed, actually in the context of Spacetime, moving the debug info out of the frametable (which is in the data section) into a separate section. Doing this might help our reproducibility story if only a subset (i.e. all those that matter) of sections are compared without any rewriting of paths.

Are there any cases which don't relate to debug info? I think there might be a problem with paths being embedded in the names of Flambda anonymous function symbols, although we should be able to make that go away once the gdb patches land, as the user-visible names can be specified separately in DWARF. I'm not sure I know of any others, but there probably are some.

@dbuenzli

This comment has been minimized.

Show comment
Hide comment
@dbuenzli

dbuenzli Dec 6, 2017

Contributor

One other thing that comes to mind: the notion of "reproducible" might be worth investigating a little more. Is it acceptable for certain sections of the executable to differ?

Not really I think, see the definition (the bit-by-bit bit). As an end user I really just want to simply hash my binary and compare it to the hash of a binary someone else produced on another machine.

Contributor

dbuenzli commented Dec 6, 2017

One other thing that comes to mind: the notion of "reproducible" might be worth investigating a little more. Is it acceptable for certain sections of the executable to differ?

Not really I think, see the definition (the bit-by-bit bit). As an end user I really just want to simply hash my binary and compare it to the hash of a binary someone else produced on another machine.

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Dec 7, 2017

Member

Are there any cases which don't relate to debug info?

I include below an exhaustive list of all paths rewritten by the current patch. A word on terminology, I call "completed path" a path that was passed as a relative path, and is turned into an absolute path by the compiler.

  1. the completed paths to file names debug information,
    and the absolute path to the current build directory,
    which the debugger may lookup to find source files
    (they use absolute/completed paths since PRs #14 and #157)
  2. the completed path to a custom bytecode interpreter stored in the RNTM section of cmo files,
    passed using the -use-runtime option
  3. the path to the "build directory" stored in .cmt files
  4. the completed paths shown in error locations in the compiler output

Whenever I mentioned that completed paths are rewritten, note that this only applies if the path was passed as a relative path to that part of the compiler. In particular, if the user explicitly specifies an absolute path, then this path is not rewritten. Another way to think of the behavior is that the only paths that are rewritten are those resulting from a Sys.getcwd () call by the compiler itself.

Member

gasche commented Dec 7, 2017

Are there any cases which don't relate to debug info?

I include below an exhaustive list of all paths rewritten by the current patch. A word on terminology, I call "completed path" a path that was passed as a relative path, and is turned into an absolute path by the compiler.

  1. the completed paths to file names debug information,
    and the absolute path to the current build directory,
    which the debugger may lookup to find source files
    (they use absolute/completed paths since PRs #14 and #157)
  2. the completed path to a custom bytecode interpreter stored in the RNTM section of cmo files,
    passed using the -use-runtime option
  3. the path to the "build directory" stored in .cmt files
  4. the completed paths shown in error locations in the compiler output

Whenever I mentioned that completed paths are rewritten, note that this only applies if the path was passed as a relative path to that part of the compiler. In particular, if the user explicitly specifies an absolute path, then this path is not rewritten. Another way to think of the behavior is that the only paths that are rewritten are those resulting from a Sys.getcwd () call by the compiler itself.

@alainfrisch

This comment has been minimized.

Show comment
Hide comment
@alainfrisch

alainfrisch Dec 7, 2017

Contributor

I take from it than filenames in assertion or pattern matching failure messages, or string literals generated by FILE, are not currently rewritten in case when -absname is used. I'm not sure we want to prevent doing so in the future, though.

Contributor

alainfrisch commented Dec 7, 2017

I take from it than filenames in assertion or pattern matching failure messages, or string literals generated by FILE, are not currently rewritten in case when -absname is used. I'm not sure we want to prevent doing so in the future, though.

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Dec 7, 2017

Member

I'm not sure about warning messages, this may be a case of (4) that I overlooked. I'm sure that FILE currently reuses the pos_fname value of the lexing location of its AST node. I believe that whether those are absolute or relative depends on how the file path was given to the OCaml compiler.

With the proposed patchset, Location.absolute_path always applies the prefix_map rewrite. This means that if we later decide to complete FILE paths using this function, no change would be required from the point of view of this PR.

(Outside Location, rewrite_absolute_path is only used on one direct getcwd () call and two independent reimplementations of absolute_path.)

Member

gasche commented Dec 7, 2017

I'm not sure about warning messages, this may be a case of (4) that I overlooked. I'm sure that FILE currently reuses the pos_fname value of the lexing location of its AST node. I believe that whether those are absolute or relative depends on how the file path was given to the OCaml compiler.

With the proposed patchset, Location.absolute_path always applies the prefix_map rewrite. This means that if we later decide to complete FILE paths using this function, no change would be required from the point of view of this PR.

(Outside Location, rewrite_absolute_path is only used on one direct getcwd () call and two independent reimplementations of absolute_path.)

@dbuenzli

This comment has been minimized.

Show comment
Hide comment
@dbuenzli

dbuenzli Dec 7, 2017

Contributor

Note that in the particular case of cmt[i] files this will not be sufficient to make them reproducible because of cmt_args.

Contributor

dbuenzli commented Dec 7, 2017

Note that in the particular case of cmt[i] files this will not be sufficient to make them reproducible because of cmt_args.

@infinity0

This comment has been minimized.

Show comment
Hide comment
@infinity0

infinity0 Dec 7, 2017

Contributor

setting build parameters via environment variables is fragile (a pretty good way of ensuring you don't get reproducible builds, in fact).

This is normally the case for variables unrelated to reproducible builds, but this variable was specifically designed for achieving reproducible builds, and there are some differences:

For things like OCAMLPARAM, one is supplying extra information to a build process via a mechanism that is not very transparent (envvars) and more transparent mechanisms (command-line args) are preferred. For BUILD_PATH_PREFIX_MAP however, it is more accurate to think of this as removing information that the build process is already taking via a non-transparent mechanism - i.e. absolute paths on the filesystem.

From another angle: if this were implemented via a command-line flag, one would have to duplicate absolute paths in the calls to ocamlc et.al, polluting what would otherwise be a constant invocation ocamlc $args, where $args is not dependent on a specific build machine, with information that is specific to a build machine. (Then sometimes, higher-level buildsystems also like to save $args into the build output, making it unreproducible again.)

And yes, the specific GCC reviewer for my patch ignored this argument, but nevertheless I maintain that this is the preferred approach.

Contributor

infinity0 commented Dec 7, 2017

setting build parameters via environment variables is fragile (a pretty good way of ensuring you don't get reproducible builds, in fact).

This is normally the case for variables unrelated to reproducible builds, but this variable was specifically designed for achieving reproducible builds, and there are some differences:

For things like OCAMLPARAM, one is supplying extra information to a build process via a mechanism that is not very transparent (envvars) and more transparent mechanisms (command-line args) are preferred. For BUILD_PATH_PREFIX_MAP however, it is more accurate to think of this as removing information that the build process is already taking via a non-transparent mechanism - i.e. absolute paths on the filesystem.

From another angle: if this were implemented via a command-line flag, one would have to duplicate absolute paths in the calls to ocamlc et.al, polluting what would otherwise be a constant invocation ocamlc $args, where $args is not dependent on a specific build machine, with information that is specific to a build machine. (Then sometimes, higher-level buildsystems also like to save $args into the build output, making it unreproducible again.)

And yes, the specific GCC reviewer for my patch ignored this argument, but nevertheless I maintain that this is the preferred approach.

@dbuenzli

This comment has been minimized.

Show comment
Hide comment
@dbuenzli

dbuenzli Dec 7, 2017

Contributor

Whenever I mentioned that completed paths are rewritten, note that this only applies if the path was passed as a relative path to that part of the compiler. In particular, if the user explicitly specifies an absolute path, then this path is not rewritten.

I'm not sure I fully understand that point. You don't do the rewrite on absolute paths given as arguments (e.g. the -o, the cmo args, etc.) ?

From a build system perspective it's much better from a usability point if your build system passes absolute paths to all the tools it schedules (it makes the context in which the invocation occurs obvious in the logs and allows to repeat the a build tool invocation by simply c&p it regardless of your current environment).

Contributor

dbuenzli commented Dec 7, 2017

Whenever I mentioned that completed paths are rewritten, note that this only applies if the path was passed as a relative path to that part of the compiler. In particular, if the user explicitly specifies an absolute path, then this path is not rewritten.

I'm not sure I fully understand that point. You don't do the rewrite on absolute paths given as arguments (e.g. the -o, the cmo args, etc.) ?

From a build system perspective it's much better from a usability point if your build system passes absolute paths to all the tools it schedules (it makes the context in which the invocation occurs obvious in the logs and allows to repeat the a build tool invocation by simply c&p it regardless of your current environment).

@dbuenzli

This comment has been minimized.

Show comment
Hide comment
@dbuenzli

dbuenzli Dec 7, 2017

Contributor

(Then sometimes, higher-level buildsystems also like to save $args into the build output, making it unreproducible again.)

Note that this is the case for ocamlc (see my comment about cmt files).

Contributor

dbuenzli commented Dec 7, 2017

(Then sometimes, higher-level buildsystems also like to save $args into the build output, making it unreproducible again.)

Note that this is the case for ocamlc (see my comment about cmt files).

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Dec 7, 2017

Member

There is nothing in this patch that rewrites absolute paths given by users, but of course it would be possible to decide to do it -- and probably fairly easy. (I'd rather keep the scope of the current PR fairly small, though.)

I am not sure whether systematically passing absolute paths is the best choice for build systems. For ocamlbuild specifically, if I remember correctly we rely on the fact that the paths are relative to the project root to be able to transfer paths from _build to . easily -- typically for error messages sent back to users by the compiler.

Member

gasche commented Dec 7, 2017

There is nothing in this patch that rewrites absolute paths given by users, but of course it would be possible to decide to do it -- and probably fairly easy. (I'd rather keep the scope of the current PR fairly small, though.)

I am not sure whether systematically passing absolute paths is the best choice for build systems. For ocamlbuild specifically, if I remember correctly we rely on the fact that the paths are relative to the project root to be able to transfer paths from _build to . easily -- typically for error messages sent back to users by the compiler.

@dbuenzli

This comment has been minimized.

Show comment
Hide comment
@dbuenzli

dbuenzli Dec 7, 2017

Contributor

I am not sure whether systematically passing absolute paths is the best choice for build systems.

I personally really think it is for the usability reasons I mention (build understanding and easy manual replay; it becomes even more obvious when you deal with build system that support simultaenous side-by-side build variants). In any case there are quite a few things that will come to you under the form of absolute paths, like paths to installed libraries.

For ocamlbuild specifically, if I remember correctly we rely on the fact that the paths are relative to the project root to be able to transfer paths from _build to . easily -- typically for error messages sent back to users by the compiler.

A trick by which I lost hours of work on documentation numerous time (as soon as last week again) because ocamldoc was making its paths absolute in certain circumstances until recently.

Making your builds in a separate directory structure that mirrors the source tree and play relative tricks is a brittle idea that has to die in my opinion. I don't think it is unreasonable to ask compiler tools to be able to read sources and writes build artefacts to/from arbitrary locations.

Contributor

dbuenzli commented Dec 7, 2017

I am not sure whether systematically passing absolute paths is the best choice for build systems.

I personally really think it is for the usability reasons I mention (build understanding and easy manual replay; it becomes even more obvious when you deal with build system that support simultaenous side-by-side build variants). In any case there are quite a few things that will come to you under the form of absolute paths, like paths to installed libraries.

For ocamlbuild specifically, if I remember correctly we rely on the fact that the paths are relative to the project root to be able to transfer paths from _build to . easily -- typically for error messages sent back to users by the compiler.

A trick by which I lost hours of work on documentation numerous time (as soon as last week again) because ocamldoc was making its paths absolute in certain circumstances until recently.

Making your builds in a separate directory structure that mirrors the source tree and play relative tricks is a brittle idea that has to die in my opinion. I don't think it is unreasonable to ask compiler tools to be able to read sources and writes build artefacts to/from arbitrary locations.

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Dec 7, 2017

Member

In any case there are quite a few things that will come to you under the form of absolute paths, like paths to installed libraries.

For now my work on reproducible-builds has been focused on the experimental setup proposed by the reproducible-builds.org setup, which tries to eliminate build variation coming from various factors, which do not include the installation paths of dependencies, but does include variation of the build directory (from where the package is installed).

The current PR (and its possible evolution) is more flexible than that, as arbitrary prefix rewrites may be specified through the prefix map (typically capturing and rewriting your opam root). But in the experimental testing that actually gets tested in practice, only the current directory is part of the build path, and it is its variations that I want to make sure are prevented by the PR. For Debian packages (they get tested in practice, but this would also apply to most other distributions), the installed libraries will be in a place that does not vary from one user system to another.

I am of course open to making further PRs to improve the reproducibility situation in different scenarios (for example, displacement of the opam build root; but that may quickly run into relocatability issues), provided that an adequate testing and reproduction procedure is available.

(This is the joke about the person who search their keys at night under the street lamp; I have enough keys to find under the very helpful lamp they provide to keep me busy before I look at the rest of the street.)

Member

gasche commented Dec 7, 2017

In any case there are quite a few things that will come to you under the form of absolute paths, like paths to installed libraries.

For now my work on reproducible-builds has been focused on the experimental setup proposed by the reproducible-builds.org setup, which tries to eliminate build variation coming from various factors, which do not include the installation paths of dependencies, but does include variation of the build directory (from where the package is installed).

The current PR (and its possible evolution) is more flexible than that, as arbitrary prefix rewrites may be specified through the prefix map (typically capturing and rewriting your opam root). But in the experimental testing that actually gets tested in practice, only the current directory is part of the build path, and it is its variations that I want to make sure are prevented by the PR. For Debian packages (they get tested in practice, but this would also apply to most other distributions), the installed libraries will be in a place that does not vary from one user system to another.

I am of course open to making further PRs to improve the reproducibility situation in different scenarios (for example, displacement of the opam build root; but that may quickly run into relocatability issues), provided that an adequate testing and reproduction procedure is available.

(This is the joke about the person who search their keys at night under the street lamp; I have enough keys to find under the very helpful lamp they provide to keep me busy before I look at the rest of the street.)

@damiendoligez

A few remarks:

  • why do you need the encode_* functions?
  • couldn't you do the parsing with ocamllex rather than by hand?
String.sub str (equal_pos + 1) (String.length str - equal_pos - 1) in
match decode_prefix encoded_target, decode_prefix encoded_source with
| Ok target, Ok source -> Ok { target; source }
| ((Error _ as err), _) | (_, (Error _ as err)) -> err

This comment has been minimized.

@damiendoligez

damiendoligez Dec 13, 2017

Member

@gasche why doesn't this trigger your favorite warning? (ambiguous or-pattern)

@damiendoligez

damiendoligez Dec 13, 2017

Member

@gasche why doesn't this trigger your favorite warning? (ambiguous or-pattern)

This comment has been minimized.

@gasche

gasche Dec 13, 2017

Member

The warning only triggers when there is a guard clause. In the case here (and many similar other cases), the or-pattern actually behaves as a commutative operator if you only look at matcheability (and not which specific variables get returned in the environment) -- the problem is that guard clauses break this property, as then which side gets matched first influences whether the guards works or not, and thus whether the clause is taken or not.

@gasche

gasche Dec 13, 2017

Member

The warning only triggers when there is a guard clause. In the case here (and many similar other cases), the or-pattern actually behaves as a commutative operator if you only look at matcheability (and not which specific variables get returned in the environment) -- the problem is that guard clauses break this property, as then which side gets matched first influences whether the guards works or not, and thus whether the clause is taken or not.

fun path ->
if not !init then begin
init := true;
match Sys.getenv "BUILD_PATH_PREFIX_MAP" with

This comment has been minimized.

@damiendoligez

damiendoligez Dec 13, 2017

Member

You should probably emit a warning (or at least some kind of informational message) when the environment variable is found.

@damiendoligez

damiendoligez Dec 13, 2017

Member

You should probably emit a warning (or at least some kind of informational message) when the environment variable is found.

This comment has been minimized.

@gasche

gasche Dec 13, 2017

Member

This is an interesting idea but I don't completely get it.

The idea is to make it possible for someone to check that the behavior correctly took that variable into account? That seems sensible.

Am I correct that we don't have anything existing in the compiler right now for "informational messages which are not warnings"? (That is, to factually let the user know about something, without suggesting that they are guilty of some offense.) How would you suggest to go about it?

@gasche

gasche Dec 13, 2017

Member

This is an interesting idea but I don't completely get it.

The idea is to make it possible for someone to check that the behavior correctly took that variable into account? That seems sensible.

Am I correct that we don't have anything existing in the compiler right now for "informational messages which are not warnings"? (That is, to factually let the user know about something, without suggesting that they are guilty of some offense.) How would you suggest to go about it?

This comment has been minimized.

@damiendoligez

damiendoligez Feb 7, 2018

Member

The idea is to let me known if I have this variable defined in my environment and I forgot about it. But you're right that it doesn't fit in any of the classes of message that the compiler currently outputs.

@damiendoligez

damiendoligez Feb 7, 2018

Member

The idea is to let me known if I have this variable defined in my environment and I forgot about it. But you're right that it doesn't fit in any of the classes of message that the compiler currently outputs.

This comment has been minimized.

@gasche

gasche Feb 20, 2018

Member

@damiendoligez I have thought about this again, but I don't see an easy way to do it.

It sounds like a lot of work to implement (1) a new way for the compiler to print information about its execution (Information, so having an Error/Warning/Information continuum) and (2) user-interface mechanism to enable/disable printing of this new kind of messages.

I think your general wish is to have a way to observe the reliance of the compiler process over its execution environment. We could implement this as a new compiler option (-trace-environment-access), but note that there are third-party tools that already provide that monitoring:

$ echo "string secure_getenv(string)" > ltrace.conf
$ OCAML_COLOR=auto ltrace -F ./ltrace.conf -e secure_getenv -- ocamlc -c foo.ml
ocamlc->secure_getenv("OCAMLRUNPARAM")                 = nil
ocamlc->secure_getenv("CAMLRUNPARAM")                  = nil
ocamlc->secure_getenv("OCAMLRUNPARAM")                 = nil
ocamlc->secure_getenv("CAMLRUNPARAM")                  = nil
ocamlc->secure_getenv("TMPDIR")                        = nil
ocamlc->secure_getenv("TEMP")                          = nil
ocamlc->secure_getenv("OCAMLLIB")                      = nil
ocamlc->secure_getenv("CAMLLIB")                       = nil
ocamlc->secure_getenv("OCAML_BINANNOT_WITHENV")        = nil
ocamlc->secure_getenv("OCAMLPARAM")                    = nil
ocamlc->secure_getenv("OCAML_COLOR")                   = "auto"
ocamlc->secure_getenv("OCAMLPARAM")                    = nil
ocamlc->secure_getenv("OCAMLPARAM")                    = nil
@gasche

gasche Feb 20, 2018

Member

@damiendoligez I have thought about this again, but I don't see an easy way to do it.

It sounds like a lot of work to implement (1) a new way for the compiler to print information about its execution (Information, so having an Error/Warning/Information continuum) and (2) user-interface mechanism to enable/disable printing of this new kind of messages.

I think your general wish is to have a way to observe the reliance of the compiler process over its execution environment. We could implement this as a new compiler option (-trace-environment-access), but note that there are third-party tools that already provide that monitoring:

$ echo "string secure_getenv(string)" > ltrace.conf
$ OCAML_COLOR=auto ltrace -F ./ltrace.conf -e secure_getenv -- ocamlc -c foo.ml
ocamlc->secure_getenv("OCAMLRUNPARAM")                 = nil
ocamlc->secure_getenv("CAMLRUNPARAM")                  = nil
ocamlc->secure_getenv("OCAMLRUNPARAM")                 = nil
ocamlc->secure_getenv("CAMLRUNPARAM")                  = nil
ocamlc->secure_getenv("TMPDIR")                        = nil
ocamlc->secure_getenv("TEMP")                          = nil
ocamlc->secure_getenv("OCAMLLIB")                      = nil
ocamlc->secure_getenv("CAMLLIB")                       = nil
ocamlc->secure_getenv("OCAML_BINANNOT_WITHENV")        = nil
ocamlc->secure_getenv("OCAMLPARAM")                    = nil
ocamlc->secure_getenv("OCAML_COLOR")                   = "auto"
ocamlc->secure_getenv("OCAMLPARAM")                    = nil
ocamlc->secure_getenv("OCAMLPARAM")                    = nil
@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Dec 13, 2017

Member

why do you need the encode_* functions?

The way I went at this problem is to write a self-contained external library (that other projects could also use), https://gitlab.com/gasche/build_path_prefix_map/, and then include it wholesale in my PR. The encode_* functions are not used in the compiler today (although they could be, for example in the testsuite), but they were very useful to test the decode_* function code -- I used Crowbar/afl-fuzz to check the roundtrip properties in all directions.

I would rather keep the included-in-the-compiler source identical to the outside source if that's not to costly, as it makes it easier to see what version it is and whether it should be updated. (I don't expect to update this library much, given that its scope is well-defined and its feature-set satisfying, unless the specification was to change.)

couldn't you do the parsing with ocamllex rather than by hand?

Possibly, but then I'm making the whole thing sensibly more complex to build for little gain. Are you suggesting that using ocamllex would be better?

Member

gasche commented Dec 13, 2017

why do you need the encode_* functions?

The way I went at this problem is to write a self-contained external library (that other projects could also use), https://gitlab.com/gasche/build_path_prefix_map/, and then include it wholesale in my PR. The encode_* functions are not used in the compiler today (although they could be, for example in the testsuite), but they were very useful to test the decode_* function code -- I used Crowbar/afl-fuzz to check the roundtrip properties in all directions.

I would rather keep the included-in-the-compiler source identical to the outside source if that's not to costly, as it makes it easier to see what version it is and whether it should be updated. (I don't expect to update this library much, given that its scope is well-defined and its feature-set satisfying, unless the specification was to change.)

couldn't you do the parsing with ocamllex rather than by hand?

Possibly, but then I'm making the whole thing sensibly more complex to build for little gain. Are you suggesting that using ocamllex would be better?

@damiendoligez

This comment has been minimized.

Show comment
Hide comment
@damiendoligez

damiendoligez Feb 7, 2018

Member

Possibly, but then I'm making the whole thing sensibly more complex to build for little gain. Are you suggesting that using ocamllex would be better?

No, I was just wondering whether it might be better. I accept your judgement that it isn't.

Member

damiendoligez commented Feb 7, 2018

Possibly, but then I'm making the whole thing sensibly more complex to build for little gain. Are you suggesting that using ocamllex would be better?

No, I was just wondering whether it might be better. I accept your judgement that it isn't.

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Feb 20, 2018

Member

@damiendoligez would you, by any chance, be interested in "Approving" this PR?

Member

gasche commented Feb 20, 2018

@damiendoligez would you, by any chance, be interested in "Approving" this PR?

@damiendoligez

This looks good except for the fatal error message.

Show outdated Hide outdated parsing/location.ml Outdated
@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Mar 1, 2018

Member

IIUC, this failwith means if I launch the compiler with an ill-formed value for BUILD_PATH_PREFIX_MAP I might get a compiler fatal error with a message like:

Fatal error: Failure "invalid character '=' in key or value"

This is not nice. The user-visible error message must at least mention the environment variable name.

Indeed, that's an excellent point.

I pushed a new version using Misc.fatal_error:

$ BUILD_PATH_PREFIX_MAP="\"" ocamlopt -absname test2.cmx
>> Fatal error: Invalid value for the environment variable BUILD_PATH_PREFIX_MAP: invalid key/value pair "\"", no '=' separator
Fatal error: exception Misc.Fatal_error
Member

gasche commented Mar 1, 2018

IIUC, this failwith means if I launch the compiler with an ill-formed value for BUILD_PATH_PREFIX_MAP I might get a compiler fatal error with a message like:

Fatal error: Failure "invalid character '=' in key or value"

This is not nice. The user-visible error message must at least mention the environment variable name.

Indeed, that's an excellent point.

I pushed a new version using Misc.fatal_error:

$ BUILD_PATH_PREFIX_MAP="\"" ocamlopt -absname test2.cmx
>> Fatal error: Invalid value for the environment variable BUILD_PATH_PREFIX_MAP: invalid key/value pair "\"", no '=' separator
Fatal error: exception Misc.Fatal_error

@gasche gasche removed the high-priority label Mar 1, 2018

@gasche gasche merged commit d5eff72 into ocaml:trunk Mar 5, 2018

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@dra27 dra27 referenced this pull request Jun 8, 2018

Merged

Tweaks for check-typo #1829

4 of 4 tasks complete
@dbuenzli

This comment has been minimized.

Show comment
Hide comment
@dbuenzli

dbuenzli Jun 19, 2018

Contributor

@gasche is there any reason why this new variable and its effects is not mentioned in the manual ?

Contributor

dbuenzli commented Jun 19, 2018

@gasche is there any reason why this new variable and its effects is not mentioned in the manual ?

@damiendoligez

This comment has been minimized.

Show comment
Hide comment
@damiendoligez

damiendoligez Jun 20, 2018

Member

@gasche is there any reason why this new variable and its effects is not mentioned in the manual ?

That's a very good point.

Member

damiendoligez commented Jun 20, 2018

@gasche is there any reason why this new variable and its effects is not mentioned in the manual ?

That's a very good point.

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Jun 20, 2018

Member

Indeed, I should have thought of it. I am not sure where to document this, and would appreciate a recommendation here. It seems that there is no place in which all our environment variables are listed. comp.html has a "Contextual control of command-line options" section, but this is rather specific and not meant to cover all environment variables.

I'm not sure if I should start a "environment variables" section, accepting that it will be very non-exhaustive at the start, or whether I should place the BUILD_PATH_PREFIX_MAP documentation in a specific place (which one?). I would like to propose the BUILD_PATH_PREFIX_MAP documentation for cherry-pick in 4.07, so a not-too-invasive change would be welcome -- although we could do a small change in 4.07 and a larger reorganization in trunk.

Member

gasche commented Jun 20, 2018

Indeed, I should have thought of it. I am not sure where to document this, and would appreciate a recommendation here. It seems that there is no place in which all our environment variables are listed. comp.html has a "Contextual control of command-line options" section, but this is rather specific and not meant to cover all environment variables.

I'm not sure if I should start a "environment variables" section, accepting that it will be very non-exhaustive at the start, or whether I should place the BUILD_PATH_PREFIX_MAP documentation in a specific place (which one?). I would like to propose the BUILD_PATH_PREFIX_MAP documentation for cherry-pick in 4.07, so a not-too-invasive change would be welcome -- although we could do a small change in 4.07 and a larger reorganization in trunk.

@Octachron

This comment has been minimized.

Show comment
Hide comment
@Octachron

Octachron Jun 20, 2018

Contributor

An option might be to add a Reproducible builds section to comp and document BUILD_PATH_PREFIX_MAP here. If an environment variable index is added at a later date, the BUILD_PATH_PREFIX_MAP entry could simply link to this section.

Contributor

Octachron commented Jun 20, 2018

An option might be to add a Reproducible builds section to comp and document BUILD_PATH_PREFIX_MAP here. If an environment variable index is added at a later date, the BUILD_PATH_PREFIX_MAP entry could simply link to this section.

@dbuenzli

This comment has been minimized.

Show comment
Hide comment
@dbuenzli

dbuenzli Jun 20, 2018

Contributor

I'm not sure if I should start a "environment variables" section

That's the section I would be looking for. I would move the contents of "Contextual control of command-line options" there and also since this seems to be shared between ocamlc and ocamlopt simply have all the contents in ocamlc and a link from the ocamlopt section to the ocamlc section.

P.S. I realized this because it has been mentioned to me that the reproducible build people seem to be struggling with ocaml, I guess that one could solve the problem for these packages. /cc @lamby

Contributor

dbuenzli commented Jun 20, 2018

I'm not sure if I should start a "environment variables" section

That's the section I would be looking for. I would move the contents of "Contextual control of command-line options" there and also since this seems to be shared between ocamlc and ocamlopt simply have all the contents in ocamlc and a link from the ocamlopt section to the ocamlc section.

P.S. I realized this because it has been mentioned to me that the reproducible build people seem to be struggling with ocaml, I guess that one could solve the problem for these packages. /cc @lamby

@lamby

This comment has been minimized.

Show comment
Hide comment
@lamby

lamby Jun 20, 2018

(I often grep for an "environment variables" section in manpages so +1 for that part.)

I realized this because it has been mentioned to me that the reproducible build people seem to be struggling with ocaml, I guess that one could solve the problem

Well, not so much "struggling" per-se, more that we have been rather busy and overworked!

for these packages.

So I haven't confirmed that this is exactly the same issue or even that these are caused by the same underlying path; there have been other, multiple, ones in the past… but hopefully so. :) Thanks!

lamby commented Jun 20, 2018

(I often grep for an "environment variables" section in manpages so +1 for that part.)

I realized this because it has been mentioned to me that the reproducible build people seem to be struggling with ocaml, I guess that one could solve the problem

Well, not so much "struggling" per-se, more that we have been rather busy and overworked!

for these packages.

So I haven't confirmed that this is exactly the same issue or even that these are caused by the same underlying path; there have been other, multiple, ones in the past… but hopefully so. :) Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment