Cache usage meta tracking issue #7150

Open
ehuss opened this issue Jul 19, 2019 · 0 comments

@ehuss
Contributor

commented Jul 19, 2019

This issue provides an overview of the various issues around Cargo's excessive disk usage and, tangentially, around reducing compile time by reusing artifacts in a shared cache.

Cleaning outdated artifacts

Cargo's target directory can grow substantially over time. The only built-in tool for cleaning it, cargo clean, has limited capabilities, a fair number of bugs, and is generally underwhelming.

Various issues and links of interest:

  • #5026 — cargo ./target fills with outdated artifacts as toolchains are updated/changed
  • #5885 — How to effectively clean target folder for CI caching
  • #6229 — Have an option to make Cargo attempt to clean up after itself.
  • #6435 — Remove artifacts for deps removed from Cargo.lock.
  • cargo-sweep — A tool to prune unused files.
  • The -Z mtime-on-use flag is an experiment to have Cargo update the mtime of used files to make it easier for tools like cargo-sweep to detect which files are stale.
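To illustrate the kind of check a time-based pruning tool can do once mtimes reflect last use, here is a minimal Rust sketch. It is not cargo-sweep's actual implementation; the function name `stale_files` is hypothetical, and it simply treats any regular file whose mtime is older than a cutoff as a removal candidate. That is only meaningful if something like `-Z mtime-on-use` keeps mtimes current.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::{Duration, SystemTime};

/// Hypothetical helper: recursively collect regular files under `dir` whose
/// modification time is older than `max_age`. Assumes mtime approximates
/// last use (as with `-Z mtime-on-use`); it only reports, never deletes.
pub fn stale_files(dir: &Path, max_age: Duration) -> io::Result<Vec<PathBuf>> {
    // If `max_age` reaches before the epoch, treat the epoch as the cutoff.
    let cutoff = SystemTime::now()
        .checked_sub(max_age)
        .unwrap_or(SystemTime::UNIX_EPOCH);
    let mut stale = Vec::new();
    // Iterative depth-first walk to avoid deep recursion on large trees.
    let mut pending = vec![dir.to_path_buf()];
    while let Some(current) = pending.pop() {
        for entry in fs::read_dir(&current)? {
            let entry = entry?;
            // DirEntry::file_type does not follow symlinks on Unix.
            let file_type = entry.file_type()?;
            if file_type.is_dir() {
                pending.push(entry.path());
            } else if file_type.is_file() {
                let modified = entry.metadata()?.modified()?;
                if modified < cutoff {
                    stale.push(entry.path());
                }
            }
        }
    }
    // Sort so the output is deterministic regardless of directory order.
    stale.sort();
    Ok(stale)
}
```

For example, `stale_files(Path::new("target"), Duration::from_secs(30 * 24 * 60 * 60))` would list files not modified in roughly 30 days; deciding whether to delete them is left to the caller.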

I think a way forward here is to experiment with different ways of tracking artifacts and their last-use timestamps. mtime-on-use has an issue with cached files in Docker. Also, the hash in an artifact's filename is opaque: it provides no insight into the metadata that would indicate whether the artifact can be removed.

Cargo currently tracks a variety of things in different ways. It writes a .json fingerprint file that is generally unused (only for debug logging). It also has an invoked.timestamp file used for some change tracking, and it uses mtime information in a few different ways. It might be interesting to experiment with a different way to coordinate all this information, perhaps a single unified file tracking all artifacts, or a change to how the per-artifact .json file works. The key points are that it must be fast and reliable, and it should work well in Docker.
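As a rough illustration of the "single, unified file" idea, the sketch below keeps one record per artifact holding the kind of metadata an opaque filename hash hides. Everything here is hypothetical: the `ArtifactIndex` and `ArtifactRecord` types, their fields, and the tab-separated on-disk format are assumptions for illustration, not anything Cargo writes.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-artifact record (not a real Cargo type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ArtifactRecord {
    pub package: String,
    pub version: String,
    pub toolchain: String,
    /// Last use, in seconds since the Unix epoch.
    pub last_used: u64,
}

/// Hypothetical unified index: one file describing every artifact in a
/// target directory, keyed by the artifact's relative path.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct ArtifactIndex {
    records: BTreeMap<String, ArtifactRecord>,
}

impl ArtifactIndex {
    /// Insert or refresh the record for `path`.
    pub fn record_use(&mut self, path: &str, record: ArtifactRecord) {
        self.records.insert(path.to_string(), record);
    }

    /// Paths last used before `cutoff`, or built by a toolchain other than
    /// `current_toolchain`; returned in path order.
    pub fn removable(&self, cutoff: u64, current_toolchain: &str) -> Vec<&str> {
        self.records
            .iter()
            .filter(|(_, r)| r.last_used < cutoff || r.toolchain != current_toolchain)
            .map(|(p, _)| p.as_str())
            .collect()
    }

    /// One tab-separated line per artifact; assumes no field contains
    /// a tab or newline.
    pub fn to_text(&self) -> String {
        let mut out = String::new();
        for (path, r) in &self.records {
            out.push_str(&format!(
                "{}\t{}\t{}\t{}\t{}\n",
                path, r.package, r.version, r.toolchain, r.last_used
            ));
        }
        out
    }

    /// Parse the format written by `to_text`, skipping malformed lines.
    pub fn from_text(text: &str) -> ArtifactIndex {
        let mut index = ArtifactIndex::default();
        for line in text.lines() {
            let fields: Vec<&str> = line.split('\t').collect();
            if let [path, package, version, toolchain, last_used] = fields.as_slice() {
                if let Ok(last_used) = last_used.parse::<u64>() {
                    index.record_use(
                        path,
                        ArtifactRecord {
                            package: package.to_string(),
                            version: version.to_string(),
                            toolchain: toolchain.to_string(),
                            last_used,
                        },
                    );
                }
            }
        }
        index
    }
}
```

Because last-use times live in the file's contents rather than in filesystem mtimes, an approach like this could avoid the Docker mtime problem, but it would need atomic updates to stay reliable when several Cargo processes share a target directory; the sketch does not address concurrency.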

Cleaning cargo's home

Cargo's home directory ~/.cargo grows without bound, and there is currently no built-in way to shrink it.

The cargo-cache package is currently the foremost way to manage it (besides rm -rf).

The main issue tracking this is #3289 — cargo clean ~/.cargo.

There has not been much discussion about this. Ideally, Cargo would have this capability built in, perhaps with some of the easier and safer tasks automated on a periodic basis.
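A first step toward built-in cleanup is reporting where the space goes. This minimal sketch sums file sizes under the Cargo home subdirectories that typically grow largest. The function names are hypothetical, and the subdirectory list (`registry/index`, `registry/cache`, `registry/src`, `git/db`, `git/checkouts`) is an assumption based on Cargo's usual layout; missing directories are skipped.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: total size in bytes of regular files under `path`.
/// Symlinks are neither followed nor counted.
pub fn dir_size(path: &Path) -> io::Result<u64> {
    let mut total = 0;
    for entry in fs::read_dir(path)? {
        let entry = entry?;
        let file_type = entry.file_type()?;
        if file_type.is_dir() {
            total += dir_size(&entry.path())?;
        } else if file_type.is_file() {
            total += entry.metadata()?.len();
        }
    }
    Ok(total)
}

/// Hypothetical report: (subdirectory, bytes) for each assumed-large
/// subdirectory of a Cargo home that exists, in the order listed.
pub fn cargo_home_report(home: &Path) -> io::Result<Vec<(&'static str, u64)>> {
    const SUBDIRS: [&str; 5] = [
        "registry/index",
        "registry/cache",
        "registry/src",
        "git/db",
        "git/checkouts",
    ];
    let mut report = Vec::new();
    for &sub in SUBDIRS.iter() {
        let dir = home.join(sub);
        if dir.is_dir() {
            report.push((sub, dir_size(&dir)?));
        }
    }
    Ok(report)
}
```

Safer automated tasks could then be layered on top of such a report, for example removing extracted sources under `registry/src` that can be re-extracted from the cached `.crate` files.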

Reusing shared dependencies

sccache is the primary way to share artifacts across projects. It is also possible for several projects to share one target directory by setting the CARGO_TARGET_DIR environment variable.

Issues:

  • #4301 — Suggestion: re-use built dependencies across directories
  • #4436 — Cache compilations of everything from crates.io
  • #5931 — Per-user compiled artefact cache
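A per-user cache needs a key that decides when a compiled dependency can be reused. The sketch below is hypothetical and is not how Cargo computes its metadata hash: it assumes the relevant inputs are the package name, version, source, rustc version, profile, and enabled features (a real key would need more, such as the target triple and the hashes of the crate's own dependencies). It builds a directory name from a readable prefix plus an FNV-1a hash of the full key, so a pruning tool could see which package a directory holds.

```rust
/// Hypothetical inputs that decide whether a compiled dependency is reusable.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CacheKey {
    pub package: String,
    pub version: String,
    pub source: String,
    pub rustc_version: String,
    pub profile: String,
    pub features: Vec<String>,
}

/// 64-bit FNV-1a: tiny and stable across runs and platforms, unlike
/// std's DefaultHasher, whose output may change between Rust releases.
pub fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

impl CacheKey {
    /// Canonical text form; features are sorted and deduplicated so that
    /// their order does not change the key.
    pub fn canonical(&self) -> String {
        let mut features = self.features.clone();
        features.sort();
        features.dedup();
        format!(
            "{}\n{}\n{}\n{}\n{}\n{}",
            self.package,
            self.version,
            self.source,
            self.rustc_version,
            self.profile,
            features.join(",")
        )
    }

    /// Readable prefix plus the full-key hash as 16 hex digits.
    pub fn dir_name(&self) -> String {
        format!(
            "{}-{}-{:016x}",
            self.package,
            self.version,
            fnv1a64(self.canonical().as_bytes())
        )
    }
}
```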

Since a shared cache has the potential to use a substantial amount of disk space, it would be desirable to have better support for pruning, as discussed above.
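For pruning a shared cache, one simple policy is least-recently-used eviction under a size budget. This sketch assumes each cache entry records its size and a last-use time; `CacheEntry` and `plan_eviction` are hypothetical names, and the function only plans evictions rather than deleting anything.

```rust
/// Hypothetical entry in a shared artifact cache.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CacheEntry {
    pub name: String,
    pub size: u64,
    /// Last use, in seconds since the Unix epoch.
    pub last_used: u64,
}

/// Return the names of entries to evict, least recently used first,
/// until the remaining total size is at most `budget` bytes.
pub fn plan_eviction(entries: &[CacheEntry], budget: u64) -> Vec<String> {
    let mut total: u64 = entries.iter().map(|e| e.size).sum();
    let mut by_age: Vec<&CacheEntry> = entries.iter().collect();
    // Oldest first; ties broken by name so the plan is deterministic.
    by_age.sort_by(|a, b| {
        a.last_used
            .cmp(&b.last_used)
            .then_with(|| a.name.cmp(&b.name))
    });
    let mut evicted = Vec::new();
    for entry in by_age {
        if total <= budget {
            break;
        }
        total -= entry.size;
        evicted.push(entry.name.clone());
    }
    evicted
}
```

LRU is only one choice; a policy could also weigh how expensive an artifact is to rebuild, which this sketch ignores.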

A fairly large number of tools dig into the target directory. They would all be broken by this change, so we would need to figure out a migration strategy before making it. I began this in #6668, but have not finished. Ideally, #6668 and #6577 would be finished before making this change.
