
cargo build --dependencies-only #2644

Open
nagisa opened this issue May 4, 2016 · 329 comments
Labels
C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` S-needs-design Status: Needs someone to work further on the design for the feature or fix. NOT YET accepted.

Comments

@nagisa
Member

nagisa commented May 4, 2016

cargo team notes:


There should be an option to only build dependencies.

@alexcrichton alexcrichton added the A-configuration Area: cargo config files and env vars label May 4, 2016
@KalitaAlexey
Contributor

@nagisa,
Why do you want it?

@nagisa
Member Author

nagisa commented Jan 17, 2017

I do not remember exactly why, but I do remember that I ended up just running rustc manually.

@KalitaAlexey
Contributor

@posborne, @mcarton, @devyn,
You reacted with thumbs up.
Why do you want it?

@mcarton
Member

mcarton commented Jan 17, 2017

Sometimes you add a bunch of dependencies to your project and know the next cargo build will take a while; you want your computer to start on that while you begin coding, so the next cargo build is actually fast.
But I guess I got here searching for a cargo doc --dependencies-only, which would let you get the docs for your dependencies while your own project doesn't compile, because you need those docs to figure out exactly how to fix the compilation error you've been staring at for half an hour 😄

@gregwebs

As described in #3615, this is useful with build to set up a cache of all dependencies.

@alexcrichton
Member

@gregwebs out of curiosity, do you want to cache compiled dependencies or just downloaded dependencies? Caching compiled dependencies isn't implemented today (but would be with a command such as this); downloading dependencies is already available via cargo fetch.

@gregwebs

gregwebs commented Jan 31, 2017

Generally, as with my caching use case, the dependencies change infrequently and it makes sense to cache their compilation.

The Haskell tool stack went through all this, and they generally decided to merge things into a single command where possible. For fetch they ended up with something kind of confusing: build --dry-run --prefetch. For the build --dependencies-only discussed here they have the equivalent: build --only-dependencies.

@alexcrichton
Member

@gregwebs ok thanks for the info!

@KalitaAlexey
Contributor

@alexcrichton,
It looks like I should continue my work on the PR.
Will the Cargo team accept it?

@alexcrichton
Member

@KalitaAlexey I personally wouldn't be convinced just yet, but it'd be good to canvass opinions from others on @rust-lang/tools as well.

@KalitaAlexey
Contributor

@alexcrichton,
Anyway, I have no time right now.

@nrc
Member

nrc commented Feb 2, 2017

I don't see much of a use case: you can just do cargo build and ignore the output for the last crate. If you really need to do this (for efficiency), there is an API you can use.

@gregwebs

gregwebs commented Feb 4, 2017

What's the API?

@nrc
Member

nrc commented Feb 6, 2017

Implement an Executor. That lets you intercept every call to rustc, and you can do nothing if it is the last crate.

@gregwebs

gregwebs commented Feb 6, 2017

I wasn't able to find any information about an Executor for cargo. Do you have any links to documentation?

@nrc
Member

nrc commented Feb 6, 2017

Docs are a little thin, but start here:

/// A glorified callback for executing calls to rustc. Rather than calling rustc
/// directly, we'll use an Executor, giving clients an opportunity to intercept
/// the build calls.

You can look at the RLS for an example of how to use them: https://github.com/rust-lang-nursery/rls/blob/master/src/build.rs#L288
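A minimal sketch of that idea follows. This is pseudocode: the Executor trait is part of Cargo's internal API, its exact signature has changed across Cargo versions, and the names below are approximate rather than copied from any particular release.

```rust
// Pseudocode sketch, not a working implementation.
// Skips the rustc invocation for the final (workspace root) crate
// while letting every dependency build normally.
struct DepsOnlyExecutor {
    root: PackageId, // the crate whose build we want to skip
}

impl Executor for DepsOnlyExecutor {
    fn exec(&self, cmd: ProcessBuilder, id: PackageId /* , ... */) -> CargoResult<()> {
        if id == self.root {
            Ok(()) // do nothing for the last crate
        } else {
            cmd.exec() // run rustc as usual for dependencies
        }
    }
}
```

The RLS source linked above shows the real trait in use against a concrete Cargo version.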

@shepmaster
Member

A question on Stack Overflow wanted this feature. In that case, the OP wanted to build the dependencies for a Docker layer.

A similar situation exists for the playground, where I compile all the crates once. In my case, I just put in a dummy lib.rs / main.rs. All the dependencies are built, and the real code is added later.

@alexcrichton
Member

@shepmaster unfortunately the proposed solution wouldn't satisfy that question because a Cargo.toml won't parse without associated files in src (e.g. src/lib.rs, etc). So that question would still require "dummy files", in which case it wouldn't specifically be serviced by this change.

@lolgesten

lolgesten commented Oct 9, 2017

I ended up here because I'm also thinking about the Docker case. For a good docker build I want to:

COPY Cargo.toml Cargo.lock /mything/

RUN cargo build-deps --release  # creates a layer that is cached

COPY src /mything/src

RUN cargo build --release       # only rebuilt when src files change

This means the dependencies would be cached between docker builds as long as Cargo.toml and Cargo.lock don't change.

I understand src/lib.rs / src/main.rs are needed to do a proper build, but maybe build-deps could simply build all the deps.

@ghost

ghost commented Oct 9, 2017

The Dockerfile template in shepmaster's linked Stack Overflow post above solves this problem.

I came to this thread because I also wanted the Docker image to be cached after building the dependencies. After eventually resolving the issue myself, I posted an explanation of Docker caching and was informed that the answer was already linked in the Stack Overflow post. I made this mistake, someone else made this mistake, so it's time to clarify.

RUN cd / && cargo new playground
# a freshly created project comes with a dummy src/main.rs
WORKDIR /playground

ADD Cargo.toml /playground/Cargo.toml
# dependencies are BUILT and CACHED here
RUN cargo build
RUN cargo build --release
# delete the dummy src files
RUN rm src/*.rs

# here you add your project src to the docker image

After building, changing only the source and rebuilding starts from the cached image with dependencies already built.

@lolgesten

someone needs to relax...

@lolgesten

Also @KarlFish, what you're proposing doesn't actually work when using FROM rust:1.20.0:

  1. cargo new playground fails because it wants the USER env variable to be set.
  2. RUN cargo build builds dependencies for debug, not for release. Why do you need that?

@lolgesten

lolgesten commented Oct 9, 2017

Here's a better version.

FROM rust:1.20.0

WORKDIR /usr/src

# Create blank project
RUN USER=root cargo new umar

# We want dependencies cached, so copy those first.
COPY Cargo.toml Cargo.lock /usr/src/umar/

WORKDIR /usr/src/umar

# This is a dummy build to get the dependencies cached.
RUN cargo build --release

# Now copy in the rest of the sources
COPY src /usr/src/umar/src/

# This is the actual build.
RUN cargo build --release \
    && mv target/release/umar /bin \
    && rm -rf /usr/src/umar

WORKDIR /

EXPOSE 3000

CMD ["/bin/umar"]

@shepmaster
Member

You can always review the complete Dockerfile for the playground.

@maelvls

maelvls commented Nov 10, 2017

Hi!
What is the current state of the --deps-only idea? (mainly for dockerization)

@AdrienneCohea

I agree that it would be really cool to have a --deps-only option so that we could cache our filesystem layers better in Docker.

I haven't tried replicating this yet, but it looks very promising. This is with glibc and not musl, by the way. My main priority is a build that doesn't take 3-5 minutes every time, not a 5 MB alpine-based image.

@mcronce

mcronce commented Apr 5, 2023

Speaking for myself, I deploy 100% of my services in containers, built with docker build; things like cargo clippy and cargo test are run before starting the container build. I don't have a need to spin up external services in most projects, and never in the same container.

In concrete terms, my container builds always look something like this:

FROM rust:1.68 AS builder

WORKDIR /repo
# Cache downloaded+built dependencies
COPY Cargo.toml Cargo.lock /repo/
RUN \
    mkdir /repo/src && \
    echo 'fn main() {}' > /repo/src/main.rs && \
    cargo build --release && \
    rm -Rvf /repo/src

# Build our actual code
COPY src /repo/src
RUN \
    touch src/main.rs && \
    cargo build --release

FROM gcr.io/distroless/cc-debian11
COPY --from=builder /repo/target/release/whatever /usr/local/bin/
ENTRYPOINT ["/usr/local/bin/whatever"]

EDIT to add: It's worth noting that this workflow works fine; while it makes for a slightly verbose Dockerfile, it's something that changes rarely/never, and there's not much I can really think of that cargo could do to improve the situation without tightly integrating a container runtime. This has probably been mentioned upthread already (apologies for commenting without reading through it first), but cargo build --dependencies-only would eliminate the need for that echo 'fn main() {}' bit, but wouldn't change anything else; adding your own src directory prior to the dependency build will still blow the cache for that layer, causing dependencies to be rebuilt when your code changes.

@scottlamb

> There can be multiple motivations for using Cargo in Docker, for example:

I think these examples are both off the mark:

> [Cargo in Docker] can be used to provide a containerized Rust development environment

If I'm using a containerized Rust development environment, I'm running cargo build from within a relatively long-lived docker run invocation. I'll have the source code and target dir in a volume mount. There's no layer caching of my builds or any need for it.

> It can be used to provide an environment with external services required for running the Rust application.

That can be done without cargo running in the same Docker container as those external services, or any Docker container at all.

If folks are trying to use Docker's layer cache, they're by definition running docker build on a Dockerfile that has cargo build in a RUN step. (Some folks might be doing this during development but I'd argue they'd be much happier with the long-lived container and volume mount.)

I don't think cargo chef-style layer caching is the most effective way to cache the target dir (see my previous comment for my alternative and its advantages) but it's better than not caching at all.

I also think they're often (not always) doing these builds on a stateless CI system, where there's the additional complication of how you save/restore the layer cache between invocations. There are various options but they all seem to have (different) caveats.

@tcmal
Contributor

tcmal commented Apr 5, 2023

> * It can be used to provide a containerized Rust development environment, which is used for all kinds of development activities - performing a type check, compiling, running unit tests, benchmarks, formatting, lints, trying if the project compiles with multiple features enabled, etc. In this case, the user might want to execute a lot of different types of builds (`cargo test`, `cargo check`, `cargo build`, `cargo build --release`, `cargo build --features foo`), which could make the cached Docker layers quite bloated.

In this case, you're most likely to use something like docker run -it --rm -v $PWD:/app rust bash, which means the target directory is stored in the normal place. I suppose you could also set up CARGO_HOME if you did this often (so the crates.io index is preserved).

I understand the desire to have a more efficient solution also, but it's been nearly 7 years since this issue was opened and the described use-cases still require a third-party tool and some googling.

If I started to implement this feature close to originally described, would it be likely to be accepted?

@jtran

jtran commented Apr 6, 2023

I'd like to cache all sorts of commands that I repeatedly execute in CI. For me, this has nothing to do with my local development machine.

But I honestly don't understand the distinction. Why is just cargo build special for the purposes of this issue?

Whether the size of the resulting cache is large is all relative. If it's many gigabytes but cuts CI time by 90%, that's a win in my book.

BTW, I run cargo test --release partly to prevent rebuilding dependencies with multiple profiles. This saves both time and space.

@Kobzol
Contributor

Kobzol commented Apr 6, 2023

> If I'm using a containerized Rust development environment, I'm running cargo build from within a relatively long-lived docker run invocation. I'll have the source code and target dir in a volume mount. There's no layer caching of my builds or any need for it.

Fair enough, for a development use-case the user will probably indeed have a long-lived container where caching is not such a concern.

> That can be done without cargo running in the same Docker container as those external services, or any Docker container at all.

That's true; the services use-case is probably handled with docker compose or something similar. What I meant was using cargo build in a Docker image as a means of preparing a production image (the external-services thing is related, but probably orthogonal).

> If I started to implement this feature close to originally described, would it be likely to be accepted?

What do you mean by "close to originally described"? Something akin to --deps-only has been implemented in this issue several times, but it has been shown repeatedly that it does not help with Docker layer caching.

Implementing a command for building dependencies only, while ignoring all source files, could in theory remove the need to create dummy files, but users would still need to copy all Cargo.toml, config.toml, and Cargo.lock files. We have examined this and found that it would require a non-trivial refactoring of Cargo. So this solution does not currently seem viable: it wouldn't solve the problem completely, it would just remove part of the friction.

> Whether the size of the resulting cache is large is all relative. If it's many gigabytes but cuts CI time by 90%, that's a win in my book.

That's one of the questions we wanted answered: whether users care about the cached layer size or not. My personal expectation is that users don't care about it and just want caching to work.

@brownjohnf

brownjohnf commented Apr 6, 2023 via email

@scottlamb

scottlamb commented Apr 7, 2023

I want to thank @Kobzol for writing a doc that lays the problem out nicely and explains why cargo build --dependencies-only isn't really that helpful for Docker.

I've described my use case and alternative approach above. Below is my opinion about what cargo changes might be helpful.

I'd still like to see cargo doc --dependencies-only. I tried --keep-going (mentioned here); it might seem silly, but it's just not reassuring to have it fail with a bunch of compilation errors, even if it has first produced the output I actually need. If I had both options available, I'd go for --dependencies-only every time.

I'm not sure cargo should make code changes specifically for Docker:

  • For development, I don't think cargo's code needs to change at all. Maybe the Cargo Book can have a section on setting up a dev container with a volume mount of source, and volume or cache mounts of ~/.cargo and target. This approach works well with e.g. VS Code's dev container support, including making stuff like rust-analyzer and IDE debugger integration just work, so to me it's unquestionably the best way.
  • For building docker images, particularly on CI, Docker is an added complication to caching, but it shouldn't necessarily be addressed with code changes either. The problem is getting stuff from the CI cache to within the Docker container during a build and vice versa.
    • Some people do this by layer caching with cargo-chef or equivalent, but it's not my favorite, in part because a single changed dep means the entire dep cache is thrown away.
    • I did this by COPY in and docker create; docker cp out. A more efficient option would be to directly talk to the GHA cache API from within the Docker container via some tool. Neither belongs in cargo IMHO; again, the most helpful thing might be a Book section recommending a way, maybe linking to an example project on GitHub.
    • Alternatively, with sccache-like fine-grained cloud storage (see below), it's just as easy to do it within Docker as outside Docker.

I do think cargo interacts really badly with CI caching in general and this should be improved:

  • I'd love to see (Option to) Fingerprint by file contents instead of mtime #6529 prioritized, so all the "local" crates (current crate/workspace, things referenced by path rather than crates.io or GH hash) aren't rebuilt every time. I worked around this with my retimer code. Others use sccache (with either local or cloud storage) which does support fingerprint caching. But both of those have downsides as compared to it being built in to cargo.
  • Using GHA caches well, keeping in mind their limits (keys start getting evicted when the whole repo hits 10 GiB) and key semantics (existing keys' values are never rewritten, and you can have multiple "restore keys"). cargo really could help more with this:
    • Add cache pruning tools. The third-party cargo-cache and cargo-sweep crates are...okay...but a bit quirky...and extra downloads...
    • Recommend a cache structure in the Book, maybe link to a canned GHA action that sets up ~/.cargo and target/ caching with the right key structure, which is subtle, as I mentioned. Maybe one example that does the builds directly in CI; another that has to deal with getting the cached stuff into/out of Docker (see above).
    • Support separating the dependencies' target dir and the "local" target dir, as the most efficient cache setup will separate less and more frequently modified stuff to separate keys. (Until (Option to) Fingerprint by file contents instead of mtime #6529 is addressed, it's a complete waste to write the local deps' target dir to CI cache at all!) IMHO, this is the nearest thing to cargo build --dependencies-only that would really help, but I think it'd be better to support both build stages in a single command, and having the documentation on how to use it is key.
  • Or, as an alternative to using GHA caches well, build in something like sccache's cloud storage backends, so if you have a fast object store available like S3, you can use it for fine-grained caching without the downsides of sccache: installing an external tool, imposing extra fork+exec overhead, and confusing the --timings instrumentation. But not everyone will want this, if for no other reason than that the GHA actions cache is free and general-purpose cloud storage is not. It's extra setup, and I wonder whether the latency is good enough if there are lots of sequential fetches, etc.
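As a concrete illustration of the key structure under discussion, a typical actions/cache step looks something like the sketch below. The key layout here is one common community pattern, not an official recommendation; paths assume the default CARGO_HOME.

```yaml
- uses: actions/cache@v3
  with:
    path: |
      ~/.cargo/registry
      ~/.cargo/git
      target/
    key: cargo-${{ runner.os }}-${{ hashFiles('**/Cargo.lock') }}
    restore-keys: |
      cargo-${{ runner.os }}-
```

The restore-keys fallback lets a build start from the most recent cache when Cargo.lock has changed, at the cost of carrying stale artifacts until they are pruned.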

@scottlamb

scottlamb commented Apr 7, 2023

btw, I think a relatively slick outside-cargo solution would be to have a single tool, let's call it cargo-gha-cache, that could be used from non-docker CI or from within docker build via something like the following:

RUN \
    --mount=type=cache,id=target,target=.../target \
    --mount=type=cache,id=cargo,target=/root/.cargo \
    cargo gha-cache load; \
    cargo clippy --release --workspace && \
    cargo fmt --all && \
    cargo build --release --workspace && \
    cargo test --release; \
    rv=$?; \
    cargo gha-cache save; \
    exit $rv

The tool would:

  • save/restore the fingerprint/mtime stuff like my retimer tool, so the local deps don't get rebuilt unnecessarily
  • pick the right GHA keys structure
    • the key format I mentioned above, and even stuff like looking up the current concrete rustc version (as opposed to a placeholder like stable)
    • separate ~/.cargo vs dependencies' target vs local target, combining the target dirs on load and separating them on save
  • directly talk to GHA via their API, so you don't have to deal with copying into and out of Docker. Ideally it would handle all keys in parallel for latency and even stream files -> archive -> HTTP request and vice versa, rather than writing a tarball to disk as actions/cache does, saving that I/O and disk space.
  • do automatic cargo-cache and cargo-sweep-like pruning on save as it constructs that archive
  • have nice documentation

As a user, if you're going to need an extra tool, better to have one that doesn't need much config. For Docker, I think you need to plumb in the right env variables for the cache API access but otherwise it could be as easy as outside Docker.

It wouldn't be as fun to be the maintainer of the tool. It'd have to know about both GHA's API and cargo's cache structure and adapt to changes in either.

@scottlamb

scottlamb commented Apr 7, 2023

> It wouldn't be as fun to be the maintainer of the tool. It'd have to know about both GHA's API and cargo's cache structure and adapt to changes in either.

One more thought: maybe the most minimal thing cargo itself could do to help is:

  • have a post-build command that lists the cache contents by (filename, recommended cache key), in a stable JSON output format, maybe in priority order and/or with a size limit argument for it to prune the list.
  • and a pre-build command that lists what cache keys would be helpful, in a stable JSON output format.

My hypothetical gha-cache helper command could use this so it doesn't have to know cargo's cache data format (which likely changes between cargo versions), just this stable API with cargo and the GHA API. If folks want another CI cache API supported, it could add it, or another tool could. It would deal with the compression/archiving. I think that'd be a better division of responsibility.
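To make that concrete, such a post-build listing might look something like this. This JSON is entirely hypothetical: the field names, paths, and key format are made up for illustration, since no such command exists in cargo.

```json
[
  {"filename": "target/release/deps/libserde-abc123.rlib",
   "recommended_cache_key": "deps-x86_64-linux-rustc-1.68.2",
   "size_bytes": 1048576},
  {"filename": "target/release/whatever",
   "recommended_cache_key": "local-x86_64-linux-rustc-1.68.2",
   "size_bytes": 524288}
]
```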

And have I mentioned #6529 enough yet? ;-)

@jalaziz

jalaziz commented Jul 15, 2023

> we realized that we're unsure what is the primary motivation of using Cargo in Docker, more specifically, if it is used more commonly for performing development activities, or simply for running a production build.

Our use case is running a production build, but that's trivializing it. For us, the main reason for running cargo inside Docker is cross-compilation and build consistency. While we could run cargo outside of Docker and copy in the final binary, cross-compilation can make things much more complicated. By using Docker for builds, building for alternative architectures tends to be quite simple.

In the cases where you don't have native builder instances available and cross-compilation is actually necessary, running Cargo in Docker can still help simplify things (see https://github.com/tonistiigi/xx for example).

@UlyssesZh

I ran into a use case for this requested feature today. I am working on a non-Rust project, but it depends on a CLI tool that can be installed with cargo. It would be nice if I could specify all cargo dependencies in a virtual manifest and install them using the cargo command.

@epage
Contributor

epage commented Nov 6, 2023

@UlyssesZh it sounds like you'd want something like #5120 though that was closed because the given use case would be better served by #2267.

@idelvall

idelvall commented Dec 5, 2023

Hi, we've created an Earthly function that significantly eases caching cargo builds on CI.

Even if you are not into Earthly, I think these implementation details might be useful for you:

  • It stores Cargo caches in cache mounts rather than in the layer cache.
  • One mount cache is for $CARGO_HOME, shared across all targets of the same Earthfile under the same Linux OS release version, supporting concurrent builds.
  • A second family of mount caches is for ./target, shared across all the builds of the same Earthly target but in a blocking mode, resulting in a serial order of execution across them.
  • Includes $CARGO_HOME/.package-cache in the mount cache so Cargo locking can work and the cache is not corrupted by parallel builds.
  • It also makes sure that $CARGO_HOME/bin binaries are still accessible, after mounting the caches.
  • It ensures installed binaries are stored in the build layers rather than in the mount cache.
  • Also, it uses cargo-sweep to keep the mount caches under control.
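The cache-mount approach described in those bullets can also be sketched with plain BuildKit, outside Earthly. This is a minimal illustration, not their implementation; the image tag and binary name are placeholders, and it omits the locking and pruning details listed above.

```dockerfile
# syntax=docker/dockerfile:1
FROM rust:1.74
WORKDIR /app
COPY . .
# Persist the registry and target dir across builds via BuildKit cache mounts
# instead of relying on layer caching.
RUN --mount=type=cache,target=/usr/local/cargo/registry \
    --mount=type=cache,target=/app/target \
    cargo build --release && \
    cp target/release/myapp /usr/local/bin/myapp
```

Note the final cp: because /app/target lives in a cache mount, the binary must be copied out to survive in the image layer.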

sakisv added a commit to sakisv/canibeloud that referenced this issue Feb 15, 2024
Cargo generates a bunch of `.fingerprint` directories for each of the
packages that it builds. This tracks changes to files and allows it to
skip compilation on things that have not changed [^1].

The reason that this was not being picked up *even though* the contents
of `main.rs` were different in the two stages seems to be that the
fingerprint is based on the `mtime` (modification time) of a file and
not on its contents[^2].

Relevant discussion here: rust-lang/cargo#2644 (comment)

[^1]: https://doc.rust-lang.org/stable/nightly-rustc/cargo/core/compiler/fingerprint/index.html#
[^2]: rust-lang/cargo#6529