From 8e0d6e4cd3b00ea7adee3b7c934438df7d5cab7c Mon Sep 17 00:00:00 2001 From: "james.baker@helsing.ai" Date: Sat, 16 Sep 2023 13:00:53 +0100 Subject: [PATCH 1/2] Much faster symlinking, safer interactions with `rustc` This MR represents a large performance improvement in practical terms in simple Crane situations. For some unscientific numbers, I tested in the following fashion. For each of the two versions of crane (master, this branch), and my simple closed source software project (it's 1kloc, about 170 crate deps, mostly just hyper): 1. I ran `nix flake check` to ensure that any crate downloads were done, any supporting derivations complete. 1. I then added a new crate (anyhow) and ran `nix flake check` again. This times the whole flow after a dependency is added. 1. I then changed a constant and re-ran, simulating the usual flow. Because this project is relatively small, I would expect that this represents a 'worst case' scenario. For example, when uncompressed, it contains 500MB of dependencies, whereas another project I work on represents 3.6GB. The results were as follows: Before this PR: - build with new dep: 2m15s - build with new code: 1m19s After this PR: - build with new dep: 1m22s - build with new code: 32s In addition, it is more robust to crate rebuilds. How it works/why it's better: 1. Drop the diffing behaviour when doing symlinking. This is an explicit tradeoff - if one is doing symlinking on inheritance, we would expect any duplicate data to be in the form of symlinks, for which diffing file content is unhelpful. Given that this only helps the case where we are not symlinking on inheritance, are not archiving on install, it seems reasonable for it to be potentially slower in this case. I say potentially slower since if we have target dirs of 1GB, we are trading 2GB of reads for up to 1GB fewer writes. I'd note here that Nix store optimisation will cover for space savings. But, main argument: common case should be archival or symlinking, and we can boost the performance of the common case by removing this behaviour. 1. Instead, we build a `symlinks.tar` containing symlinks to the outputs of this derivation. 1. When inheriting, instead of traversing the tree and creating symlinks, we just extract this tar. This is great beacuse it means that on both derivation end and derivation start, we avoid forking O(num files produced by cargo build) processes. Since even small projects have thousands of files emitted (my own has 2033 output files). Effectively, GNU tar is much more optimised than the pre-existing bash script. 1. At this point, we still have the problem where rustc may try to write to a file. We use a `RUSTC_WRAPPER` to instead write to a temporary directory, and after the command is finished we copy the artifacts from the out dir back to the target location. There is a potential (small) slowdown caused by this - I observe cargo to use rustc's stderr to kick off new builds as soon as it can, and so had to capture rustc's stdout. However, this effect is most likely very minor. --- CHANGELOG.md | 8 +++- lib/cargoClippy.nix | 2 + lib/mkCargoDerivation.nix | 43 +++++++++++++++++++++ lib/setupHooks/inheritCargoArtifactsHook.sh | 31 +++++---------- lib/setupHooks/installCargoArtifactsHook.sh | 23 ++++------- 5 files changed, 68 insertions(+), 39 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index b0de00d9..47d2d8f9 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -18,8 +18,12 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0. `Cargo.lock` is exactly what is expected (without implicit changes at build time) but this may end up rejecting builds which were previously passing. To get the old behavior back, set `cargoExtraArgs = "";` -* **Breaking**: `cargoDoc` will no longer install cargo artifacts by default. - Set `doInstallCargoArtifacts = true;` to get the old behavior back. +* **Breaking**: `cargoDoc` and `cargoClippy` will no longer install cargo artifacts + by default. Set `doInstallCargoArtifacts = true;` to get the old behavior back. +* **Breaking**: rustc is now executed with a wrapper script. Therefore providing + your own value of `RUSTC_WRAPPER` will no longer work. We do not believe that + this is commonly used. +* Large performance improvement when symlinking artifacts. * `cargoDoc` will now install generated documentation in `$out/share/doc` * Fixed a bug when testing proc macro crates with `cargoNextest` on macOS. ([#376](https://github.com/ipetkov/crane/pull/376)) diff --git a/lib/cargoClippy.nix b/lib/cargoClippy.nix index 3d9f3a68..ed784156 100644 --- a/lib/cargoClippy.nix +++ b/lib/cargoClippy.nix @@ -19,5 +19,7 @@ mkCargoDerivation (args // { buildPhaseCargoCommand = "cargoWithProfile clippy ${cargoExtraArgs} ${cargoClippyExtraArgs}"; + doInstallCargoArtifacts = false; + nativeBuildInputs = (args.nativeBuildInputs or [ ]) ++ [ clippy ]; }) diff --git a/lib/mkCargoDerivation.nix b/lib/mkCargoDerivation.nix index 929acb45..36328ad2 100644 --- a/lib/mkCargoDerivation.nix +++ b/lib/mkCargoDerivation.nix @@ -8,6 +8,7 @@ , rsync , stdenv , vendorCargoDeps +, writeShellApplication , zstd }: @@ -39,6 +40,46 @@ let "pnameSuffix" "stdenv" ]; + + rustcWrapper = writeShellApplication { + name = "rustc-wrapper"; + text = '' + set -euo pipefail + + args=("$@") + + temp_dir=$(mktemp -d) + trap 'rm -rf -- "$temp_dir"' EXIT + + for i in "''${!args[@]}"; do + if [[ "''${args[i]}" == --out-dir=* ]]; then + current_out_dir="''${args[i]#--out-dir=}" + args[i]="--out-dir=$temp_dir" + break + elif [[ "''${args[i]}" == --out-dir ]]; then + current_out_dir="''${args[i+1]}" + args[i+1]="$temp_dir" + break + fi + done + + if [[ -v current_out_dir ]]; then + stderr="$temp_dir/crane_stderr" + set +e + stdout=$("''${args[@]}" 2>"$stderr") + exit_code=$? + set -e + stderr_text=$(cat "$stderr") + rm "$stderr" + cp -r "$temp_dir/." "$current_out_dir" + echo -n "$stdout" + echo -n "$stderr_text" >&2 + exit $exit_code + else + exec "$@" + fi + ''; + }; in chosenStdenv.mkDerivation (cleanedArgs // { inherit cargoArtifacts; @@ -46,6 +87,8 @@ chosenStdenv.mkDerivation (cleanedArgs // { pname = "${args.pname or crateName.pname}${args.pnameSuffix or ""}"; version = args.version or crateName.version; + RUSTC_WRAPPER="${rustcWrapper}/bin/rustc-wrapper"; + # Controls whether cargo's `target` directory should be copied as an output doInstallCargoArtifacts = args.doInstallCargoArtifacts or true; diff --git a/lib/setupHooks/inheritCargoArtifactsHook.sh b/lib/setupHooks/inheritCargoArtifactsHook.sh index 66dab689..177588a3 100644 --- a/lib/setupHooks/inheritCargoArtifactsHook.sh +++ b/lib/setupHooks/inheritCargoArtifactsHook.sh @@ -50,10 +50,9 @@ inheritCargoArtifacts() { # being built (since changing `Cargo.lock` would rebuild everything anyway), which makes them # good candidates for symlinking (esp. since they can make up 60-70% of the artifact directory # on most projects). Thus we ignore them when copying all other artifacts below as we will - # symlink them afterwards. Note that we scope these checks to the `/deps` subdirectory; the - # workspace's own .rlib and .rmeta files appear one directory up (and these may require being - # writable depending on how the actual workspace build is being invoked, so we'll leave them - # alone). + # symlink them afterwards. Note that we scope these checks to the `/deps` subdirectory, as + # this directory contains only rustc output. We have a rustc wrapper which ensures that any + # writes to these symlinks do not try to change the (immutable) symlink targets. # # NB: keep the executable bit only if set on the original file # but make all files writable as sometimes read-only files will make the build choke @@ -66,27 +65,15 @@ inheritCargoArtifacts() { --times \ --chmod=u+w \ --executability \ - --exclude 'deps/*.rlib' \ - --exclude 'deps/*.rmeta' \ + --exclude '*/deps/*' \ --exclude '.cargo-lock' \ "${preparedArtifacts}/" \ "${cargoTargetDir}/" - - local linkCandidates=$(mktemp linkCandidatesXXXX.txt) - find "${preparedArtifacts}" \ - '(' -path '*/deps/*.rlib' -or -path '*/deps/*.rmeta' ')' \ - -printf "%P\n" \ - >"${linkCandidates}" - - # Next create any missing directories up front so we can avoid redundant checks later - cat "${linkCandidates}" \ - | xargs --no-run-if-empty -n1 dirname \ - | sort -u \ - | (cd "${cargoTargetDir}"; xargs --no-run-if-empty mkdir -p) - - # Lastly do the actual symlinking - cat "${linkCandidates}" \ - | xargs -P ${NIX_BUILD_CORES} -I '##{}##' ln -s "${preparedArtifacts}/##{}##" "${cargoTargetDir}/##{}##" + + symlinks="${cargoArtifacts}/symlinks.tar" + if [[ -f "${symlinks}" ]]; then + tar -xf "${symlinks}" -C "${cargoTargetDir}" --strip-components=1 + fi fi else echo unable to copy cargo artifacts, \"${preparedArtifacts}\" looks invalid diff --git a/lib/setupHooks/installCargoArtifactsHook.sh b/lib/setupHooks/installCargoArtifactsHook.sh index 46fbe244..791f10c7 100644 --- a/lib/setupHooks/installCargoArtifactsHook.sh +++ b/lib/setupHooks/installCargoArtifactsHook.sh @@ -25,23 +25,16 @@ dedupAndInstallCargoArtifactsDir() { mkdir -p "${dest}" - if [ -d "${prevCargoTargetDir}" ]; then - echo "symlinking duplicates in ${cargoTargetDir} to ${prevCargoTargetDir}" - - while read -r fullTargetFile; do - # Strip the common prefix of the current target directory - local targetFile="${fullTargetFile#"${cargoTargetDir}"}" - # Join the path and ensure we don't have a duplicate `/` separator - local candidateOrigFile="${prevCargoTargetDir}/${targetFile#/}" - - if cmp --silent "${candidateOrigFile}" "${fullTargetFile}"; then - ln --symbolic --force --logical "${candidateOrigFile}" "${fullTargetFile}" - fi - done < <(find "${cargoTargetDir}" -type f) - fi - echo installing "${cargoTargetDir}" to "${dest}" mv "${cargoTargetDir}" --target-directory="${dest}" + + local symlinksDir="$(mktemp -d)" + cp -Rs "${dir}/target" "${symlinksDir}" + if test -n "$(shopt -s nullglob; echo $symlinksDir/target/*/deps)"; then + pushd "$symlinksDir" + tar -cf "${dir}/symlinks.tar" target/*/deps + popd + fi } prepareAndInstallCargoArtifactsDir() { From 835b90e377e88f45ee9d33894fe18b904b40ce0b Mon Sep 17 00:00:00 2001 From: "james.baker@helsing.ai" Date: Sat, 16 Sep 2023 13:52:45 +0100 Subject: [PATCH 2/2] api docs --- docs/API.md | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/docs/API.md b/docs/API.md index 9270d654..1470a577 100644 --- a/docs/API.md +++ b/docs/API.md @@ -1439,10 +1439,6 @@ identical files against a directory of previously prepared cargo artifacts. It takes three positional arguments: 1. the installation directory for the output. * An error will be raised if not specified - * If the specified path is a directory which exists then the current cargo - artifacts will be compared with the contents of said directory. Any files - whose contents and paths match will be symbolically linked together to - reduce the size of the data stored in the Nix store. 1. the path to cargo's artifact directory * An error will be raised if not specified 1. a path to the previously prepared cargo artifacts @@ -1450,6 +1446,9 @@ It takes three positional arguments: * `/dev/null` can be specified here if there is no previous directory to deduplicate against +'symlinks.tar' will be created in the output directory in this case. It contains +symlinks to all of the dep files in this output, simplifying the untar behaviour. + Defines `prepareAndInstallCargoArtifactsDir()` which handles installing cargo's artifact directory to the derivation's output. It takes three positional arguments: