Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reproducible command line + determinism #450

Closed
3 tasks done
danakj opened this issue Jul 27, 2021 · 4 comments
Closed
3 tasks done

Reproducible command line + determinism #450

danakj opened this issue Jul 27, 2021 · 4 comments
Labels
major-change A proposal to make a major change to rustc major-change-accepted A major change proposal that was accepted T-compiler Add this label so rfcbot knows to poll the compiler team

Comments

@danakj
Copy link

danakj commented Jul 27, 2021

Proposal

Hello,

We in Chromium are working on integrating Rust into our distributed build system. A key component of that is reproducible builds. Currently, we can not achieve the level of reproducibility required by our infrastructure teams with rustc, and we would like to propose a command line flag that would get us there.

Background

How Rust is being invoked in Chromium The Chromium build system consists of 3 major components:

GN: this generates the set of inputs and outputs and the command lines to get from one to the other. The output of GN feeds into Ninja.
Ninja: this orders and distributes the set of actions to run across many processes, in order to build in a highly efficient parallel way.
Goma: this acts as a compiler proxy on developers machines, and on build bots. Goma provides a rustc binary that proxies the inputs to a distributed build system, which is distributed across a fleet of machines. It also acts as a build cache, such that most compilation steps, for most developers, simply hit the cache and can immediately send back the resulting build output.

Our builds are run by developers and bots running Linux, Mac, and Windows, and Goma supports the same platforms.

There are 4 forms of build determinism, or reproducibility:
Basic: building the project twice produces the same output.
Incremental: rebuilding just part of the project (change the timestamp on a single file for example) produces the same output.
Local: the build output does not depend on where the source code, build directory, or output is located
Universal: the build output does not depend on any machine-specific state, and produces the same output on any machine.

These are explained in more detail in this blog post: https://blog.llvm.org/2019/11/deterministic-builds-with-clang-and-lld.html

Like Gecko, we have no external crates, they will all be under a single source root.

This was originally brought up on #38322, and a separate issue was created at #87325, which has now moved to this MCP.

Chromium bug: https://crbug.com/1230302

Reproducibility Issue

When building with rustc, the (absolute path to the) current working directory must appear either:
in the build outputs (DW_AT_comp_dir, DW_AT_decl_file, and coverage output), or
on the rustc command line.

The current working directory is a machine-specific (and typically user-specific) state. Thus it should not appear in any build output from the Rust compiler when determinism is needed. The compiler provides a --remap-path-prefix switch that can replace any path prefix, including the current working directory, within all paths. It was designed and implemented on issue #38322.

While --remap-path-prefix is often used specifically to strip off the current working directory, it is designed to be more general, requiring the user to specify the path to be remapped. This was done to support external crates located in paths that are not all relative to a single root. However, this places a requirement on the build system to include the (absolute path to the) current working directory in the rustc command line in order to rewrite paths in build outputs to be relative.

In a distributed build system, the actual rustc invocation is not run locally, but is proxied over to another machine. With --remap-path-prefix, the developer’s command line includes their own current working directory, so this directory must also be proxied over to the build bot and replace the PWD environment variable there for the command line to work correctly. This places user/machine specific information in both the command line and the environment.

It is a desirable property of a compiler to provide reproducible builds across build environments.

  • A given command line (and input files) on one machine should produce the same execution and the same outputs on another machine.
  • Additionally, the command line should not need to change to see the same execution on different machines.

These are critically important for debugging distributed build systems, in order to locally reproduce and debug problems as they arise on build bots. In particular, this includes if a bug should occur in rustc that needs to be diagnosed. For example, to reproduce and debug a build on a bot means grabbing logs, generated with ltrace, which shows the exact command lines being run. We do this to reproduce builds on a developer machine, and we expect the same commands to run exactly the same way there. There should be nothing machine-specific in the invocation or execution of rustc or the commands can fail to correctly reproduce failures. For this reason, we have a single source root, and Chromium always passes relative paths to the compiler. But by introducing the current working directory into the command line for --remap-path-prefix, this property would be lost.

Secondly, any build cache must maintain a mapping from its inputs to the build output. One of those inputs is the command line, as compiler flags can change the output. A build cache requires the input command line to be machine- and user-independent in order to share the cache. Such a cache will not be reliable unless it is keyed off the fully-resolved command line. If it used a command line with shell invocations embedded (such as --remap-path-prefix=$(pwd)=.), then it would produce incorrect results in other situations like -C debuginfo=$(determine_debug_info.sh). If a script were used to wrap the compiler, not only does this increase the complexity of the build system, but if any flags introduced in the wrapper script would not be visible to the build cache, which could produce caching mistakes.

Wrapper scripts impose other complexity and problems which are not warranted and can be avoided with fairly simple rustc support. As a cross-platform distributed build system, many platforms are supported, and the runtime characteristics and limitations of each platform differ. Of note is that on Windows, spawning a subprocess is expensive, and at scale would be both visible in time and cost dollars. Second, the --remap-path-prefix must be part of the command line on developers’ machines as well, in case the compilation is not proxied (a distributed build decides this at compile time, while command lines are decided at an earlier step), which prevents the server from injecting the flag itself. And beyond that, it does not change the actual invocation of rustc, which continues to interfere with the ability to reproduce a build from a buildbot on a developer machine, as captured rustc executions would be incorrect.

We see the reproducibility of compiler execution as an important property of the Rust compiler in order to scale into large development infrastructure.

Prior work

The same set of problems exist for us with Clang, which must also generate a DW_AT_comp_dir record in its output. Clang provides a -fdebug-compilation-dir which allows the user to specify what should be written as the current working directory for that field explicitly. The flag is described as:

-fdebug-compilation-dir=<arg>, -fdebug-compilation-dir <arg>
The compilation directory to embed in the debug info

This flag supports reproducible command lines by not having the current working directory appear as part of its input. Only the desired DW_AT_comp_dir (and DW_AT_decl_file) path is specified, not the source path.

There exists a related RFC #3127 for cargo, as mentioned by @jyn514, to remove absolute paths in its build. However this does not resolve this issue, as Chromium will not be using cargo to invoke rustc and we are already passing relative paths to rustc with our GN-generated rules.

There is an issue #64839 which states that there are privacy concerns with --remap-path-prefix requiring the user’s absolute path to their project. This proposal may solve that issue as well, as no absolute paths need to be passed, and they will also be stripped from the compiler’s output.

Proposal

There are a few high-level approaches that can resolve this issue.

  1. A command line switch to modify DW_AT_comp_dir and DW_AT_decl_file specifically. This would be similar to Clang’s -fdebug-compilation-dir flag. Internally, the flag’s value would replace rustc asking for the std::env::current_dir(). However, this would not change all outputs, such as coverage output, and --remap-path-prefix already provides machinery for this with the ability to reach everything.

  2. When no external crates are present, and all paths provided to the compiler are relative, then all absolute paths come with a common prefix of the current working directory. Thus the missing functionality for universally reproducible command lines is a way to map path prefixes of the current working directory.

This could be done with a special codeword in --remap-path-prefix such as --remap-path-prefix=%CWD%=. Then the compiler would substitute %CWD% with std::env::current_dir() when doing file path mapping.

  1. A separate command line flag can be provided, which we would propose as --remap-cwd-prefix (based on naming feedback from @jsgf). In order to experiment with the flag, we would land it initially as -Zremap-cwd-prefix. This flag would take precedence over any --remap-path-prefix switches, and otherwise behave exactly like --remap-path-prefix=$(pwd)=, mapping the current working directory prefix to a specified value without requiring shell expansion.

@jsgf flagged the question of if the current working directory should include the trailing path separator. The concern raised was that if the current working directory is /home/me it would also remap /home/meal. The implementation of remapping is found in FilePathMapping::map_prefix(&self, path: PathBuf), in compiler/rustc_span/src/source_map.rs. The remap is done only if PathBuf::strip_prefix() succeeds, which only maps prefixes that are full path components, meaning that /home/meal would (correctly) not be matched by this proposal without any extra work.

Thus, based on the discussion so far, we believe that the last option seems like the simplest and best path forward. This is done in PR #87320.

Future work

Clang has recently landed a flag to rewrite relative paths into generated coverage profiling data, now known as -fcoverage-compilation-dir. Clang also added a switch that rewrites the current working directory in both debug and coverage data. With the --remap-cwd-prefix proposal, Rust should already solve this issue, as long as the coverage profiling data is written with FileName::prefer_remapped(). If not, it should be a simple correction.

Thank you for reading.

Mentors or Reviewers

@michaelwoerister

Process

The main points of the Major Change Process are as follows:

  • File an issue describing the proposal.
  • A compiler team member or contributor who is knowledgeable in the area can second by writing @rustbot second.
    • Finding a "second" suffices for internal changes. If however, you are proposing a new public-facing feature, such as a -C flag, then full team check-off is required.
    • Compiler team members can initiate a check-off via @rfcbot fcp merge on either the MCP or the PR.
  • Once an MCP is seconded, the Final Comment Period begins. If no objections are raised after 10 days, the MCP is considered approved.

You can read more about Major Change Proposals on forge.

Comments

This issue is not meant to be used for technical discussion. There is a Zulip stream for that. Use this issue to leave procedural comments, such as volunteering to review, indicating that you second the proposal (or third, etc), or raising a concern that you would like to be addressed.

@danakj danakj added major-change A proposal to make a major change to rustc T-compiler Add this label so rfcbot knows to poll the compiler team labels Jul 27, 2021
@rustbot
Copy link
Collaborator

rustbot commented Jul 27, 2021

This issue is not meant to be used for technical discussion. There is a Zulip stream for that. Use this issue to leave procedural comments, such as volunteering to review, indicating that you second the proposal (or third, etc), or raising a concern that you would like to be addressed.

cc @rust-lang/compiler @rust-lang/compiler-contributors

@rustbot rustbot added the to-announce Announce this issue on triage meeting label Jul 27, 2021
@apiraino apiraino removed the to-announce Announce this issue on triage meeting label Jul 31, 2021
@michaelwoerister
Copy link
Member

@rustbot second

I think it's generally a good idea to provide this functionality. Let's move forward with implementing this as an unstable feature so we can get some real-world experience with it.

Note that stabilizing this feature needs a separate decision by the whole compiler team.

@rustbot rustbot added the final-comment-period The FCP has started, most (if not all) team members are in agreement label Aug 19, 2021
@michaelwoerister
Copy link
Member

And thanks a lot for the detail proposal, @danakj!

@apiraino
Copy link
Contributor

apiraino commented Sep 1, 2021

@rustbot label -final-comment-period +major-change-accepted

@apiraino apiraino closed this as completed Sep 1, 2021
@rustbot rustbot added major-change-accepted A major change proposal that was accepted to-announce Announce this issue on triage meeting and removed final-comment-period The FCP has started, most (if not all) team members are in agreement labels Sep 1, 2021
@apiraino apiraino removed the to-announce Announce this issue on triage meeting label Sep 2, 2021
Manishearth added a commit to Manishearth/rust that referenced this issue Sep 14, 2021
…haelwoerister

Introduce -Z remap-cwd-prefix switch

This switch remaps any absolute paths rooted under the current
working directory to a new value. This includes remapping the
debug info in `DW_AT_comp_dir` and `DW_AT_decl_file`.

Importantly, this flag does not require passing the current working
directory to the compiler, such that the command line can be
run on any machine (with the same input files) and produce the
same results. This is critical property for debugging compiler
issues that crop up on remote machines.

This is based on adetaylor's rust-lang@dbc4ae7

Major Change Proposal: rust-lang/compiler-team#450
Discussed on rust-lang#38322. Would resolve issue rust-lang#87325.
Manishearth added a commit to Manishearth/rust that referenced this issue Sep 14, 2021
…haelwoerister

Introduce -Z remap-cwd-prefix switch

This switch remaps any absolute paths rooted under the current
working directory to a new value. This includes remapping the
debug info in `DW_AT_comp_dir` and `DW_AT_decl_file`.

Importantly, this flag does not require passing the current working
directory to the compiler, such that the command line can be
run on any machine (with the same input files) and produce the
same results. This is critical property for debugging compiler
issues that crop up on remote machines.

This is based on adetaylor's rust-lang@dbc4ae7

Major Change Proposal: rust-lang/compiler-team#450
Discussed on rust-lang#38322. Would resolve issue rust-lang#87325.
Manishearth added a commit to Manishearth/rust that referenced this issue Sep 14, 2021
…haelwoerister

Introduce -Z remap-cwd-prefix switch

This switch remaps any absolute paths rooted under the current
working directory to a new value. This includes remapping the
debug info in `DW_AT_comp_dir` and `DW_AT_decl_file`.

Importantly, this flag does not require passing the current working
directory to the compiler, such that the command line can be
run on any machine (with the same input files) and produce the
same results. This is critical property for debugging compiler
issues that crop up on remote machines.

This is based on adetaylor's rust-lang@dbc4ae7

Major Change Proposal: rust-lang/compiler-team#450
Discussed on rust-lang#38322. Would resolve issue rust-lang#87325.
Manishearth added a commit to Manishearth/rust that referenced this issue Sep 14, 2021
…haelwoerister

Introduce -Z remap-cwd-prefix switch

This switch remaps any absolute paths rooted under the current
working directory to a new value. This includes remapping the
debug info in `DW_AT_comp_dir` and `DW_AT_decl_file`.

Importantly, this flag does not require passing the current working
directory to the compiler, such that the command line can be
run on any machine (with the same input files) and produce the
same results. This is critical property for debugging compiler
issues that crop up on remote machines.

This is based on adetaylor's rust-lang@dbc4ae7

Major Change Proposal: rust-lang/compiler-team#450
Discussed on rust-lang#38322. Would resolve issue rust-lang#87325.
Manishearth added a commit to Manishearth/rust that referenced this issue Sep 15, 2021
…haelwoerister

Introduce -Z remap-cwd-prefix switch

This switch remaps any absolute paths rooted under the current
working directory to a new value. This includes remapping the
debug info in `DW_AT_comp_dir` and `DW_AT_decl_file`.

Importantly, this flag does not require passing the current working
directory to the compiler, such that the command line can be
run on any machine (with the same input files) and produce the
same results. This is critical property for debugging compiler
issues that crop up on remote machines.

This is based on adetaylor's rust-lang@dbc4ae7

Major Change Proposal: rust-lang/compiler-team#450
Discussed on rust-lang#38322. Would resolve issue rust-lang#87325.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
major-change A proposal to make a major change to rustc major-change-accepted A major change proposal that was accepted T-compiler Add this label so rfcbot knows to poll the compiler team
Projects
None yet
Development

No branches or pull requests

4 participants