Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Much faster symlinking, safer interactions with rustc #386

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
8 changes: 6 additions & 2 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -18,8 +18,12 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
`Cargo.lock` is exactly what is expected (without implicit changes at build
time) but this may end up rejecting builds which were previously passing. To
get the old behavior back, set `cargoExtraArgs = "";`
* **Breaking**: `cargoDoc` will no longer install cargo artifacts by default.
Set `doInstallCargoArtifacts = true;` to get the old behavior back.
* **Breaking**: `cargoDoc` and `cargoClippy` will no longer install cargo artifacts
by default. Set `doInstallCargoArtifacts = true;` to get the old behavior back.
* **Breaking**: rustc is now executed with a wrapper script. Therefore providing
your own value of `RUSTC_WRAPPER` will no longer work. We do not believe that
this is commonly used.
* Large performance improvement when symlinking artifacts.
* `cargoDoc` will now install generated documentation in `$out/share/doc`
* Fixed a bug when testing proc macro crates with `cargoNextest` on macOS.
([#376](https://github.com/ipetkov/crane/pull/376))
Expand Down
7 changes: 3 additions & 4 deletions docs/API.md
Original file line number Diff line number Diff line change
Expand Up @@ -1439,17 +1439,16 @@ identical files against a directory of previously prepared cargo artifacts.
It takes three positional arguments:
1. the installation directory for the output.
* An error will be raised if not specified
* If the specified path is a directory which exists then the current cargo
artifacts will be compared with the contents of said directory. Any files
whose contents and paths match will be symbolically linked together to
reduce the size of the data stored in the Nix store.
1. the path to cargo's artifact directory
* An error will be raised if not specified
1. a path to the previously prepared cargo artifacts
* An error will be raised if not specified
* `/dev/null` can be specified here if there is no previous directory to
deduplicate against

'symlinks.tar' will be created in the output directory in this case. It contains
symlinks to all of the dep files in this output, simplifying the untar behaviour.

Defines `prepareAndInstallCargoArtifactsDir()` which handles installing cargo's
artifact directory to the derivation's output. It takes three positional
arguments:
Expand Down
2 changes: 2 additions & 0 deletions lib/cargoClippy.nix
Original file line number Diff line number Diff line change
Expand Up @@ -19,5 +19,7 @@ mkCargoDerivation (args // {

buildPhaseCargoCommand = "cargoWithProfile clippy ${cargoExtraArgs} ${cargoClippyExtraArgs}";

doInstallCargoArtifacts = false;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this change needed?

I don't agree with changing the defaults here because cargoClippy is intended to be allow chaining with other builds by default

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I observed this to be a large slowdown in practice - specifically around copying and patching interpreter paths in the output. My intuition was that folks are using cargoClippy as a linting task, and so like cargoDoc in the last version, it was an oversight which would primarily affect new users.

What would cargoClippy chaining look like for you? Is there an example?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I agree that cargo clippy should be a a leaf node 99% of the time, if someone has a good reason to run it other way it should work. It technically can produce artifacts (same as cargo check iirc).

If I read the code here right this change doesn't just change the default, but hardcodes false.

Probably best as a separate PR with changelog info about change in behavior.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep if you'd like to propose a different default value here feel free to submit a separate PR for it (then we can at least have a more focused discussion on it!)


nativeBuildInputs = (args.nativeBuildInputs or [ ]) ++ [ clippy ];
})
43 changes: 43 additions & 0 deletions lib/mkCargoDerivation.nix
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@
, rsync
, stdenv
, vendorCargoDeps
, writeShellApplication
, zstd
}:

Expand Down Expand Up @@ -39,13 +40,55 @@ let
"pnameSuffix"
"stdenv"
];

rustcWrapper = writeShellApplication {
name = "rustc-wrapper";
text = ''
set -euo pipefail

args=("$@")

temp_dir=$(mktemp -d)
trap 'rm -rf -- "$temp_dir"' EXIT

for i in "''${!args[@]}"; do
if [[ "''${args[i]}" == --out-dir=* ]]; then
current_out_dir="''${args[i]#--out-dir=}"
args[i]="--out-dir=$temp_dir"
break
elif [[ "''${args[i]}" == --out-dir ]]; then
current_out_dir="''${args[i+1]}"
args[i+1]="$temp_dir"
break
fi
done

if [[ -v current_out_dir ]]; then
stderr="$temp_dir/crane_stderr"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why bother with capturing stderr and stdout?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh. Cargo uses it to know when a process finished, instead of waiting for it to fully terminate?

:/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought: rm -Rf out-dir before starting original, would take care of this, get rid of an EXIT hook, temp dir altogether. If it worked.

But it just occurred to me that it's out-DIR and probably oftentimes the same dir is used to write many files, so that probably doesn't fly.

Copy link
Contributor

@dpc dpc Sep 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One could use LD_PRELOAD to intercept open, write syscalls in a rustc-wrapper and basically always follow the link if file is being read, and delete the symlink if being written to.

https://osterlund.xyz/posts/2018-03-12-interceptiong-functions-c.html

A bit involved, but could probably work perfectly and be 100% invisible. Though I don't know if it works on MacOS, and it will only work as long as rustc is using dynamically linked libc.

Copy link
Contributor Author

@j-baker j-baker Sep 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no good ld_preload equivalent on macos which is why i didn't go down that path - and that seems extremely brittle to recompiles. Ideally Nix would let us mount fuse filesystems inside a build - that'd be the 'reliable' solution to this.

I felt this traded off nicely between being robust to change but still getting the job done - like - it depends on public apis of rustc rather than private.

Copy link
Contributor

@dpc dpc Sep 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't see a better way than the current one.

set +e
stdout=$("''${args[@]}" 2>"$stderr")
exit_code=$?
set -e
stderr_text=$(cat "$stderr")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a thought - if you know out dir ... wouldn't it be faster & easier to delete it beforehand instead of changing it and copying from it? Technically cargo might want to read something from it first... but something tells me it never actually happen.

Copy link
Contributor

@dpc dpc Sep 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also be a bit worries about these temp paths being written somewhere and then causing rebuilds etc. Instead of making them random, maybe they should be fixed. Eg. always $TMPDIR/rustc-wrapper/${orig-outdir-absolute-path}. Probably not, but who knows.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I considered this. The problem is that what rustc outputs is dependent on what rustc has been asked to do (tell it to make a dylib? it'll make one!), so it's much more tightly coupled. Whereas the output dir is loosely coupled.

Regarding the temp paths being written somewhere - I logicked myself that this should never happen. In particular, compilers put a lot of energy into being reproducible - it's dodgy if the output is dependent on the path the output is to be written to.

rm "$stderr"
cp -r "$temp_dir/." "$current_out_dir"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cp -a?

echo -n "$stdout"
echo -n "$stderr_text" >&2
exit $exit_code
else
exec "$@"
fi
'';
};
in
chosenStdenv.mkDerivation (cleanedArgs // {
inherit cargoArtifacts;

pname = "${args.pname or crateName.pname}${args.pnameSuffix or ""}";
version = args.version or crateName.version;

RUSTC_WRAPPER="${rustcWrapper}/bin/rustc-wrapper";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't even know crane has functionality like this and I was using my wrapper myself anyway. It seems to me like an unnecessary functionality to expose to the user.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While OK for PoC use, I think a proper wrapper should be handled by crane internally:

  • check where the rustc is (with which rustc)
  • set some CRANE_RUSTC_ORIG_BIN to it
  • prepend PATH with a path to a wrapper binary
  • inside the wrapper binary exec $CRANE_RUSTC_ORIG_BIN $@ ...

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm strongly opposed to the forced usage of RUSTC_WRAPPER here. Just because we believe it to be rarely used is not sufficient to outright preclude projects from using their own RUSTC_WRAPPER.

(not to mention that plenty of builds work just fine without needing this workaround, so it's the exceptional builds which should opt into different behavior than forcing it on everyone)

Copy link
Contributor Author

@j-baker j-baker Sep 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack - I can handle the wrapper internally (and then also wrap with the user's wrapper). Like, it should be easily possible to wrap twice.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still concerned that this would be too brittle to maintain at the library level (you can set a rustc-wrapper in .cargo/config.toml, and those can be littered in different places in the project).


# Controls whether cargo's `target` directory should be copied as an output
doInstallCargoArtifacts = args.doInstallCargoArtifacts or true;

Expand Down
31 changes: 9 additions & 22 deletions lib/setupHooks/inheritCargoArtifactsHook.sh
Original file line number Diff line number Diff line change
Expand Up @@ -50,10 +50,9 @@ inheritCargoArtifacts() {
# being built (since changing `Cargo.lock` would rebuild everything anyway), which makes them
# good candidates for symlinking (esp. since they can make up 60-70% of the artifact directory
# on most projects). Thus we ignore them when copying all other artifacts below as we will
# symlink them afterwards. Note that we scope these checks to the `/deps` subdirectory; the
# workspace's own .rlib and .rmeta files appear one directory up (and these may require being
# writable depending on how the actual workspace build is being invoked, so we'll leave them
# alone).
# symlink them afterwards. Note that we scope these checks to the `/deps` subdirectory, as
# this directory contains only rustc output. We have a rustc wrapper which ensures that any
# writes to these symlinks do not try to change the (immutable) symlink targets.
#
# NB: keep the executable bit only if set on the original file
# but make all files writable as sometimes read-only files will make the build choke
Expand All @@ -66,27 +65,15 @@ inheritCargoArtifacts() {
--times \
--chmod=u+w \
--executability \
--exclude 'deps/*.rlib' \
--exclude 'deps/*.rmeta' \
--exclude '*/deps/*' \
--exclude '.cargo-lock' \
"${preparedArtifacts}/" \
"${cargoTargetDir}/"

local linkCandidates=$(mktemp linkCandidatesXXXX.txt)
find "${preparedArtifacts}" \
'(' -path '*/deps/*.rlib' -or -path '*/deps/*.rmeta' ')' \
-printf "%P\n" \
>"${linkCandidates}"

# Next create any missing directories up front so we can avoid redundant checks later
cat "${linkCandidates}" \
| xargs --no-run-if-empty -n1 dirname \
| sort -u \
| (cd "${cargoTargetDir}"; xargs --no-run-if-empty mkdir -p)

# Lastly do the actual symlinking
cat "${linkCandidates}" \
| xargs -P ${NIX_BUILD_CORES} -I '##{}##' ln -s "${preparedArtifacts}/##{}##" "${cargoTargetDir}/##{}##"

symlinks="${cargoArtifacts}/symlinks.tar"
if [[ -f "${symlinks}" ]]; then
tar -xf "${symlinks}" -C "${cargoTargetDir}" --strip-components=1
Copy link
Contributor

@dpc dpc Sep 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would still compress. Probably lots of redundancy in these paths and that point CPU use will pay for itself in saved IO.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Now that everything is a symlink, but the symlink is invisible to Nix because it's in a tar archive, what is going to prevent Nix from GCing the step being linked to?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dpc's observation here is correct. Nix will perform a naive scan of the results for any store paths, but will not notice them if they are behind a tarball, meaning a store GC would end up invalidating references.

Copy link
Contributor Author

@j-baker j-baker Sep 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An uncompressed tar file is just the uncompressed data with some special headers and padding - no fancy behaviour. Likewise, the content of a symlink is just the path. My understanding is that Nix will have no trouble at all understanding the dependency. Hex editor image placed below.

image

Copy link
Contributor Author

@j-baker j-baker Sep 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, when compressed, I'd expect it's hard for nix to find what it's looking for, hence the explicit choice to not compress. Does that make sense?

Copy link
Contributor

@dpc dpc Sep 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That might work, but I wonder: would compressing + putting 1 symlink somewhere in the output (outside of what is copied to ./target), keep the GC happy? Aren't all these paths pointing to the same artifact in the nix store (or the thing it points to in a daisy chain)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it'd definitely work - mostly was going for simplicity vs optimality. A 345MB deps dir on my machine ends up with a 900kB symlinks.tar. Might be more clean to explicitly add a symlink though from the perspective of making the purpose clear.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as there is a single line somewhere in the output (which can be a separate text file and not necessarily the tarball) which includes /nix/store/...-blah then Nix will consider that path a dependency of the derivation. So it's okay (as far as the GC is concerned) to create a tarball of symlinks, so long as there is some other symlink or text entry which points to the same "parent" directory that the tarball's symlinks will end up pointing to

fi
fi
else
echo unable to copy cargo artifacts, \"${preparedArtifacts}\" looks invalid
Expand Down
23 changes: 8 additions & 15 deletions lib/setupHooks/installCargoArtifactsHook.sh
Original file line number Diff line number Diff line change
Expand Up @@ -25,23 +25,16 @@ dedupAndInstallCargoArtifactsDir() {

mkdir -p "${dest}"

if [ -d "${prevCargoTargetDir}" ]; then
echo "symlinking duplicates in ${cargoTargetDir} to ${prevCargoTargetDir}"

while read -r fullTargetFile; do
# Strip the common prefix of the current target directory
local targetFile="${fullTargetFile#"${cargoTargetDir}"}"
# Join the path and ensure we don't have a duplicate `/` separator
local candidateOrigFile="${prevCargoTargetDir}/${targetFile#/}"

if cmp --silent "${candidateOrigFile}" "${fullTargetFile}"; then
ln --symbolic --force --logical "${candidateOrigFile}" "${fullTargetFile}"
fi
done < <(find "${cargoTargetDir}" -type f)
fi

echo installing "${cargoTargetDir}" to "${dest}"
mv "${cargoTargetDir}" --target-directory="${dest}"

local symlinksDir="$(mktemp -d)"
cp -Rs "${dir}/target" "${symlinksDir}"
if test -n "$(shopt -s nullglob; echo $symlinksDir/target/*/deps)"; then
pushd "$symlinksDir"
tar -cf "${dir}/symlinks.tar" target/*/deps
popd
fi
}

prepareAndInstallCargoArtifactsDir() {
Expand Down