Rework rustc output file tracking. #8210

ehuss · 2020-05-05T21:32:27Z

This is some cleanup around how Cargo computes the outputs from rustc, and changes how cargo clean -p works. This also includes some minor observable differences detailed below.

clean -p changes

Previously cargo clean -p would build the unit graph and attempt to guess what all the filename hashes are. This has several drawbacks. It incorrectly guesses the hashes in many cases (such as different features). It also tends to cause bugs because it passes every permutation of every setting into Cargo's internals, which may not be prepared to accept unsupported combinations like "test a build-script".

This new implementation uses a strategy of querying rustc for the filename prefix/suffix, and then deletes the files using globs.

cargo clean -p should now be more complete in deleting a package's artifacts, for example:

Now deletes incremental files.
Deletes dep-info files (both rustc and cargo).
Handles changes in profiles, features, (anything in the hash).
Covers a few more files (for example, dSYM in some cases, etc.). Should delete almost all files for most targets.

Internal changes

There are a bunch of internal changes to make Cargo do a better job of tracking the outputs from rustc, and to make the code easier to understand:

The list of output files is now solely computed in TargetInfo. The files which are uplifted are solely computed in CompilationFiles::uplift_to. Previously both responsibilities were awkwardly spread in both locations.
Whether or not a file should have a hyphen or underscore is now determined in one place (FileType::should_replace_hyphens).
Added CrateType to replace LibKind, to avoid usage of strings, and to use the same structure for all of the target kinds.
Added FileFlavor::Rmeta to be more explicit as to which output files are ".rmeta". (Previously the Linkable{rmeta} flag was specific for pipelining, and rmeta was false for things like cargo check, which was a bit hard to deal with.)
Removed hyphen/underscore renaming in rustc. This was mostly unused, because it did not consider hashes in the filename, so it only applied to binaries without hashes, which is essentially just wasm32 and msvc. This hyphen/underscore translation still happens during "uplift".
Changed it so that Metadata is always computed for every unit. The logic of whether or not something should use it is moved to should_use_metadata. I didn't realize that multiple units were sharing the same fingerprint directory (when they don't have a hash), which caused some bugs (like bad output caching).

Behavioral changes

Cargo now does more complete tracking of the files generated by rustc (and the linker). This means that some files will now be uplifted that previously weren't. It also means they will show up in the artifact JSON notifications. The newly tracked files are:

.exp export files for Windows MSVC dynamic libraries. I don't know if these are actually useful (nobody has asked for them AFAIK). Presumably the linker creates them for some reason. Note that lld doesn't generate these files, this is only link.exe.
Proc-macros on Windows track import/export files.
All targets (like tests, etc.) that generate separate debug files (pdb/dSYM) are tracked.
Added .map files for wasm32-unknown-emscripten.

Some other subtle changes:

A binary example with a hyphen on Windows MSVC will now show up as examples/foo_bar.exe and examples/foo-bar.exe. Previously Cargo would just rename it to contain the hyphen. This is a consequence of simplifying the code, and I was reluctant to add a special case for this very narrow situation.
Example libs now follow the same rules for hyphen/underscore translation as normal libs (they will now use underscores).
Fingerprint changes:
- Fingerprint files no longer have a hash in them. Their parent directory already contained the hash.
- The fingerprint filename now uses hyphens instead of converting to underscores.
- The fingerprint directory is now separated even if the unit doesn't use Metadata, to fix a caching bug.
macOS: dSYM is uplifted for all dynamic libraries (dylib/cdylib/proc-macro) and for build-script-build (in case someone wants to debug a build script?).

Notes

I suspect that the implementation of clean -p may change again in the future. If and when Cargo implements some kind of on-disk database that tracks artifacts (for the purpose of garbage collection), then cargo clean -p can be rewritten to use that mechanism if appropriate.
The build_script_build incremental directory isn't deleted because Cargo doesn't know which one belongs to which package. I'm uncertain if that's reasonably fixable. The only option I've thought of is to place each package's incremental output in a separate directory.
Should Cargo use globset to handle non-utf-8 filenames? I suspect that Cargo's support for non-utf-8 names is pretty poor, so I'm uncertain how important that is.

Closes #8149
Closes #6937
Closes #5788
Closes #5375
Closes #3530

rust-highfive · 2020-05-05T21:32:31Z

r? @Eh2406

(rust_highfive has picked a reviewer for you, use r? to override)

alexcrichton

Thanks for pushing on this @ehuss! I only sort of lightly skimmed the internal refactorings. It all sounds fine at a high-level and everything internally looked "correct enough" that it should either be covered by tests or we'll figure it out on nightly pretty soon anyway.

I'm left with really only one minor comment, but otherwise this I think is a great improvement to cargo clean -p, so r=me when ready

alexcrichton · 2020-05-06T14:54:28Z

src/cargo/ops/cargo_clean.rs

+        if matches.is_empty() {
+            anyhow::bail!("package ID specification `{}` matched no packages", spec);
+        }
+        packages.extend(pkg_set.get_many(matches)?);


FWIW the get_many here starts a new download session each time, so maybe these could all be collected from the opts.spec.iter() loop and one get_many could be used?

When a unit does not have Metadata, the computation of fingerprints depends on reusing the same fingerprint file to detect if the output needs to be rebuilt. The previous change that put each unit's fingerprint into a separate directory was wrong, and broke change detection in this case (for example, executables on MSVC). Instead, use a different approach to deal with compiler output caching by using the same naming convention as the fingerprint file itself.

[beta] Fix `clean -p` with a build dependency. This is a temporary and simple fix for #8149 where `cargo clean -p foo` would fail if `foo` has a build dependency. The full fix is in #8210, but I think that PR is way too risky to backport.

ehuss · 2020-05-06T20:43:14Z

I pushed an update that reverted the "always compute Metadata" change. It doesn't work for targets like MSVC in the following scenario:

cargo build --bin foo --features f1 <- generates deps/foo.exe
cargo build --bin foo --features f2 <- Overwrites deps/foo.exe
cargo build --bin foo --features f1 <- If there is a separate fingerprint, this is "fresh", and doesn't regenerate the exe when it should.

In the case where there is no metadata hash in the filename, the units must reuse the same fingerprint file.

I changed the fix for the compiler output cache by changing the output filename to use the same format as fingerprints, like output-bin-foo instead of just output.

alexcrichton · 2020-05-07T15:35:29Z

@bors: r+

bors · 2020-05-07T15:35:30Z

📌 Commit 7438770 has been approved by alexcrichton

bors · 2020-05-07T15:35:39Z

⌛ Testing commit 7438770 with merge c719a05...

bors · 2020-05-07T15:59:13Z

☀️ Test successful - checks-azure
Approved by: alexcrichton
Pushing c719a05 to master...

Fix fingerprinting for lld on Windows with dylib. This fixes an issue where if `lld` is used on Windows, dynamic libraries will never be treated as "fresh". This is a regression from #8210 where Cargo is expecting export files to be created, but lld does not create these. The solution is to ignore "Auxiliary" files in fingerprinting, which AFAIK aren't really needed (only the primary output files really matter). Fixes #8284

Fix fingerprinting for lld on Windows with dylib. This fixes an issue where if `lld` is used on Windows, dynamic libraries will never be treated as "fresh". This is a regression from rust-lang#8210 where Cargo is expecting export files to be created, but lld does not create these. The solution is to ignore "Auxiliary" files in fingerprinting, which AFAIK aren't really needed (only the primary output files really matter). Fixes rust-lang#8284

Fix overzealous `clean -p` for reserved names. #8210 changed the way `clean -p` worked, but in some ways it is a little too sloppy. If a package has a test named `build`, then it would delete the `build` directory thinking an executable named "build" exists. This changes it so that it does not attempt to delete tests/benches from the uplift directory.

rust-highfive assigned Eh2406 May 5, 2020

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 5, 2020

ehuss added 3 commits May 5, 2020 14:32

Add CrateType to replace LibKind.

7a06be1

Rework how Cargo computes the rustc file outputs.

eac3b66

Implement new clean -p using globs.

a8997cb

ehuss force-pushed the rework-rustc-output branch from 0cd7d91 to a8997cb Compare May 5, 2020 21:34

alexcrichton reviewed May 6, 2020

View reviewed changes

clean -p: call get_many once.

bd5f563

ehuss mentioned this pull request May 6, 2020

[beta] Fix clean -p with a build dependency. #8216

Merged

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 7, 2020

Eh2406 mentioned this pull request May 7, 2020

Suport new fingerprint format. holmgr/cargo-sweep#36

Closed

bors merged commit c719a05 into rust-lang:master May 7, 2020

ehuss mentioned this pull request May 27, 2020

Fix fingerprinting for lld on Windows with dylib. #8290

Merged

not-matthias mentioned this pull request Jun 2, 2020

DLL Suffix Panic #8308

Closed

ehuss mentioned this pull request Jun 22, 2020

Fix overzealous clean -p for reserved names. #8398

Merged

taiki-e mentioned this pull request Jun 22, 2020

Add --clean-per-run flag taiki-e/cargo-hack#49

Merged

ehuss mentioned this pull request Sep 27, 2020

Duplicate artifact tracking issue. #6313

Open

ehuss added this to the 1.45.0 milestone Feb 6, 2022

ehuss mentioned this pull request May 3, 2023

Support cargo clean rust-lang/wg-cargo-std-aware#21

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rework rustc output file tracking. #8210

Rework rustc output file tracking. #8210

ehuss commented May 5, 2020

rust-highfive commented May 5, 2020

alexcrichton left a comment

alexcrichton May 6, 2020

ehuss May 6, 2020

ehuss commented May 6, 2020

alexcrichton commented May 7, 2020

bors commented May 7, 2020

bors commented May 7, 2020

bors commented May 7, 2020

Rework rustc output file tracking. #8210

Rework rustc output file tracking. #8210

Conversation

ehuss commented May 5, 2020

rust-highfive commented May 5, 2020

alexcrichton left a comment

Choose a reason for hiding this comment

alexcrichton May 6, 2020

Choose a reason for hiding this comment

ehuss May 6, 2020

Choose a reason for hiding this comment

ehuss commented May 6, 2020

alexcrichton commented May 7, 2020

bors commented May 7, 2020

bors commented May 7, 2020

bors commented May 7, 2020