cargo build --dependencies-only #2644
Open · nagisa opened this issue May 4, 2016 · 312 comments
Labels: A-configuration (Area: cargo config files and env vars), C-feature-request (Category: proposal for a feature; before opening a PR, ping rust-lang/cargo if this is not marked "Feature accepted")

Comments

@nagisa (Member) commented May 4, 2016:

cargo team notes:


There should be an option to only build dependencies.

@alexcrichton added the A-configuration (Area: cargo config files and env vars) label on May 4, 2016
@KalitaAlexey (Contributor):

@nagisa,
Why do you want it?

@nagisa (Member, Author) commented Jan 17, 2017:

I do not remember exactly why, but I do remember that I ended up just running rustc manually.

@KalitaAlexey (Contributor):

@posborne, @mcarton, @devyn,
You reacted with thumbs up.
Why do you want it?

@mcarton (Member) commented Jan 17, 2017:

Sometimes you add a bunch of dependencies to your project and know it will take a while to compile the next time you cargo build, but you want your computer to get started on that while you begin coding, so that the next cargo build is actually fast.
But I guess I got here searching for a cargo doc --dependencies-only, which would let you get the docs for your dependencies while your project does not compile, because you'd need the docs to know how exactly to fix that compilation error you've had for half an hour 😄

@gregwebs:

As described in #3615, this is useful with build to set up a cache of all dependencies.

@alexcrichton (Member):

@gregwebs out of curiosity, do you want to cache compiled dependencies or just downloaded dependencies? Caching compiled dependencies isn't implemented today (but would be with a command such as this), while downloading dependencies is already available via cargo fetch.
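
(For reference, a minimal shell sketch of the download-only workflow mentioned above; cargo fetch is the long-standing command, while the --target filter is an assumption about newer cargo versions:)

# Download every dependency in Cargo.lock into $CARGO_HOME without compiling anything
cargo fetch

# Newer cargo versions can restrict the download to one platform, e.g.:
cargo fetch --target x86_64-unknown-linux-gnu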

@gregwebs commented Jan 31, 2017:

Generally, as with my caching use case, the dependencies change infrequently and it makes sense to cache their compilation.

The Haskell tool stack went through all this, and they generally decided to merge things into a single command where possible. For fetch they ended up with something kind of confusing: build --dry-run --prefetch. For the build --dependencies-only discussed here they have the equivalent: build --only-dependencies.

@alexcrichton (Member):

@gregwebs ok thanks for the info!

@KalitaAlexey (Contributor):

@alexcrichton,
It looks like I should continue my work on the PR.
Will the Cargo team accept it?

@alexcrichton (Member):

@KalitaAlexey I personally wouldn't be convinced just yet, but it'd be good to canvass opinions from others on @rust-lang/tools as well.

@KalitaAlexey (Contributor):

@alexcrichton,
Anyway, I have no time right now. :)

@nrc (Member) commented Feb 2, 2017:

I don't see much of a use case: you can just do cargo build and ignore the output for the last crate. If you really need to do this (for efficiency), then there is an API you can use.

@gregwebs commented Feb 4, 2017:

What's the API?

@nrc (Member) commented Feb 6, 2017:

Implement an Executor. That lets you intercept every call to rustc and you can do nothing if it is the last crate.

@gregwebs commented Feb 6, 2017:

I wasn't able to find any information about an Executor for cargo. Do you have any links to documentation?

@nrc (Member) commented Feb 6, 2017:

Docs are a little thin, but start here:

/// A glorified callback for executing calls to rustc. Rather than calling rustc
/// directly, we'll use an Executor, giving clients an opportunity to intercept
/// the build calls.

You can look at the RLS for an example of how to use them: https://github.com/rust-lang-nursery/rls/blob/master/src/build.rs#L288

@shepmaster (Member):

A question on Stack Overflow wanted this feature. In that case, the OP wanted to build the dependencies for a Docker layer.

A similar situation exists for the playground, where I compile all the crates once. In my case, I just put in a dummy lib.rs / main.rs. All the dependencies are built, and the real code is added later.

@alexcrichton (Member):

@shepmaster unfortunately the proposed solution wouldn't satisfy that question because a Cargo.toml won't parse without associated files in src (e.g. src/lib.rs, etc). So that question would still require "dummy files", in which case it wouldn't specifically be serviced by this change.

@lolgesten commented Oct 9, 2017:

I ended up here because I'm also thinking about the Docker case. To do a good Docker build I want to:

COPY Cargo.toml Cargo.lock /mything

RUN cargo build-deps --release  # creates a layer that is cached

COPY src /mything/src

RUN cargo build --release       # only rebuild this when src files changes

This means the dependencies would be cached between Docker builds as long as Cargo.toml and Cargo.lock don't change.

I understand src/lib.rs / src/main.rs are needed to do a proper build, but maybe build-deps could simply build all the deps.

@ghost commented Oct 9, 2017:

The Dockerfile template in shepmaster's linked Stack Overflow post above SOLVES this problem.

I came to this thread because I also wanted the Docker image to be cached after building the dependencies. After later resolving this myself, I posted something explaining Docker caching and was informed that the answer was already linked in the Stack Overflow post. I made this mistake, someone else made this mistake, so it's time to clarify.

RUN cd / && \
    cargo new playground
# a new project has a src/main.rs file
WORKDIR /playground

ADD Cargo.toml /playground/Cargo.toml 
RUN cargo build                          # DEPENDENCIES ARE BUILT and CACHED
RUN cargo build --release
RUN rm src/*.rs                          # delete dummy src files

# here you add your project src to the docker image

After building, changing only the source and rebuilding starts from the cached image with dependencies already built.

@lolgesten:

someone needs to relax...

@lolgesten:

Also @karlfish, what you're proposing doesn't actually work when using FROM rust:1.20.0:

  1. cargo new playground fails because it wants the USER env variable to be set.
  2. RUN cargo build does not build dependencies for release, but for debug. Why do you need that?

@lolgesten commented Oct 9, 2017:

Here's a better version.

FROM rust:1.20.0

WORKDIR /usr/src

# Create blank project
RUN USER=root cargo new umar

# We want dependencies cached, so copy those first.
COPY Cargo.toml Cargo.lock /usr/src/umar/

WORKDIR /usr/src/umar

# This is a dummy build to get the dependencies cached.
RUN cargo build --release

# Now copy in the rest of the sources
COPY src /usr/src/umar/src/

# This is the actual build.
RUN cargo build --release \
    && mv target/release/umar /bin \
    && rm -rf /usr/src/umar

WORKDIR /

EXPOSE 3000

CMD ["/bin/umar"]

@shepmaster (Member):

You can always review the complete Dockerfile for the playground.

@maelvls commented Nov 10, 2017:

Hi!
What is the current state of the --deps-only idea? (mainly for dockerization)

@AdrienneCohea:

I agree that it would be really cool to have a --deps-only option so that we could cache our filesystem layers better in Docker.

I haven't tried replicating this yet, but it looks very promising. This is with glibc and not musl, by the way. My main priority is to get to a build that doesn't take 3-5 minutes every time, not a 5 MB Alpine-based image.

@intgr commented Mar 23, 2023:

My interpretation (I'm also just a subscribed lurker here): maintainers have indicated that this feature isn't as easy as it may seem at first glance, and have no interest in implementing it themselves right now.

To push this forward, someone needs to come up with a proposal that maintainers could evaluate, that takes into account the complexities and proposes resolutions and a path forward. That's how open source is supposed to work: if you want work done, you need to get involved.

But instead of any in-depth discussion, what I see is hundreds of "I need this, why isn't this done yet?" comments, which drown out any remaining signal in this thread.

@epage (Contributor) commented Mar 23, 2023:

To add to intgr's comments, the cargo team is a small group of volunteers. So small, in fact, that we have this statement in our contribution docs:

Due to limited review capacity, the Cargo team is not accepting new features or major changes at this time. Please consult with the team before opening a new PR. Only issues that have been explicitly marked as accepted will be reviewed.

With some new team members, we are starting to get a little more breathing room and are discussing how to communicate out our mentorship capacity. As for where we contribute with our limited time, that is mostly driven by our own priorities. There are many different important things and it's ok that we each have different needs and priorities. For me, my priorities are on lowering the barrier for contributors and improving the MSRV experience.

@bkolligs commented Mar 23, 2023:

With some new team members, we are starting to get a little more breathing room and are discussing how to communicate out our mentorship capacity. As for where we contribute with our limited time, that is mostly driven by our own priorities. There are many different important things and it’s ok that we each have different needs and priorities. For me, my priorities are on lowering the barrier for contributors and improving the MSRV experience.

So in your eyes what would this issue need to see before it was explicitly accepted by the cargo team? Obviously there are thousands of open cargo issues at the moment, and priorities of the team will not align with all of them. In particular I see this issue as a way to improve the experience of using Rust in production environments.

@epage (Contributor) commented Mar 23, 2023:

So in your eyes what would this issue need to see before it was explicitly accepted by the cargo team? Obviously there are thousands of open cargo issues at the moment, and priorities of the team will not align with all of them. In particular I see this issue as a way to improve the experience of using Rust in production environments.

Considering it would take time for a cargo team member to catch up on this when it isn't any of our priorities, what would be most helpful is someone summarizing the current state of this issue, including

  • what are all of the use cases (e.g. caching for faster rebuilds in docker)
  • what requirements do those use cases have
  • what are the solution alternatives / workarounds (e.g. sccache, cache mounts, etc)

From there, as time allows, we could provide guidance on what next steps someone could take to move this along.

@bkolligs:

Here's my attempt to sum up the current status of this issue; forgive anything I miss, as this thread is long! At the time of writing it is 7 years old with 104 participants.

Overview

This issue has converged on a feature request: add a flag to cargo build that generates build output for just the dependencies of a Rust project while intentionally skipping compilation of that project's own source.

The majority of participants are interested in this feature so that Rust projects can leverage Docker's layer caching system. Right now, when rebuilding a Rust project in Docker, you must recompile every dependency even if only the project's own source files changed.

There were a couple other use cases discussed in this thread that don't have as much community support:

  1. Profiling crates at build time by distinguishing between compiling dependencies and compiling source.
  2. Preemptively compiling a project's build dependencies while work still occurs on the main source code.

Therefore I will focus on the first use case I outlined for the remainder of this summary.

Requirements

A hypothetical cargo build --deps-only command shall:

  1. Cache compilation artifacts for all of the build dependencies of a particular crate, such that the same output would be generated by a vanilla cargo build.
  2. Enable Docker layer caching by somehow separating the target and src directories during the layer build.
  3. Respect dependency overrides made with the patch manifest key.
  4. Respect dependencies specified through the path key (which could point inside or outside the workspace) or the git key, in addition to crates.io dependencies.
  5. Not create extraneous files (a dummy lib.rs or main.rs) during the build process.
  6. Enable workspace-wide caching.
  7. Invalidate the Docker layer or previous local cache only when a dependency version changes.
  8. Support multiple target triples in the same project.

Current Solution

The fundamental solution that most of the active work builds on is outlined nicely in a blog post by @LukeMathWalker here. The basic idea consists of:

  1. Copy the lock file to a Docker container
  2. Create a dummy main.rs file
  3. Build the project, which tricks cargo into building your dependencies and caching them in a first layer
  4. Delete the dummy file
  5. Copy over all of your source code
  6. Build again in a second layer

Existing tools

There has been some community effort put into this since this issue was opened; these tools largely derive from the solution above, but in a more scalable fashion:

  1. cargo-chef - provides a ready-made Docker image with chef installed for convenience (a condensed usage sketch follows this list)
  2. cargo-build-deps
  3. cargo-wharf
  4. Using a remote server with sccache
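
(For reference, a condensed sketch of the multi-stage Dockerfile pattern described in the cargo-chef documentation; the base image tag and paths are illustrative:)

FROM rust:1.68 AS chef
RUN cargo install cargo-chef
WORKDIR /app

FROM chef AS planner
COPY . .
# Produce recipe.json, which describes only the dependency graph
RUN cargo chef prepare --recipe-path recipe.json

FROM chef AS builder
COPY --from=planner /app/recipe.json recipe.json
# Build dependencies only; this layer stays cached until the recipe changes
RUN cargo chef cook --release --recipe-path recipe.json
# Now copy the real sources and build the application itself
COPY . .
RUN cargo build --release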

Prior Art

There are several examples of build systems and package managers allowing for this type of integration. This table was originally created by @mroth:

Ecosystem | Dependency files | Build dependencies only
Ruby | Gemfile, Gemfile.lock | bundle install --deployment
Node | package.json, package-lock.json | npm ci
Go | go.mod, go.sum | go mod download
Elixir | mix.exs, mix.lock | mix deps.get, mix deps.compile

Rust in Production

I would say that the overall goal of this thread is to lower the friction of deploying Rust in production systems. There do exist tools that ease the pain (listed above). That said, I think this is a worthwhile endeavor to bring into the standard tooling of the Rust ecosystem, for the following reasons:

  1. CI/CD builds don't need to install additional tools, just the Rust toolchain from https://www.rust-lang.org/tools/install
  2. Cargo can already discriminate source files from dependency files
  3. It is cheaper than using a solution like sccache, which requires you to support and maintain a separate cloud cache, meaning it is more expensive to deploy with Rust than with another modern language in this scenario.
  4. Inherently supports multiple compilation targets

Thanks all!

@epage (Contributor) commented Mar 23, 2023:

@bkolligs while that is a good summary of that specific solution, I was asking for a further step back to make sure the problems are understood, that we've identified the needs behind them, and that we've explored alternatives. For example, how does RUN caching fit into the solution space?

Were there any challenges raised in the thread that need addressing?

What are the trade-offs of those workarounds with each other and with the built-in solution? You mention one in passing, buried in the "Rust in Production" section, related to sccache, but that is an important topic to address, either for people to work around this in the meantime or for prioritizing mentoring people on solving this.

As for the prior art,

  • Without being familiar with them, only Elixir's actually looks like prior art for this proposal
  • The Ruby/Node ones are similar, since just initializing the environment is the closest those tools have to a compile step, but there are still differences and that should be called out
  • go mod download seems more like "populate the index and cache" than "build dependencies" (i.e. "do network operations")
  • One of the important aspects of prior art is to learn from it. The more similar the requirements, the more we are likely to learn. What trade-offs did these tools make in supporting these? How do they fit within the solutions and challenges raised in this thread?

And as a heads up, with our team's capacity, we can't regularly provide this level of hand-holding for areas we aren't focused on; we need people to step up, take a wider view, and tackle things like this. I made an exception here to try to help turn this thread around, but I'm likely to bow out at this point.

@bkolligs:

And as a heads up, with our team's capacity, we can't regularly provide this level of hand-holding for areas we aren't focused on; we need people to step up, take a wider view, and tackle things like this. I made an exception here to try to help turn this thread around, but I'm likely to bow out at this point.

Thank you for the feedback. I'll reflect on some of the questions you posed (and invite others to do the same).

@wyfo commented Mar 24, 2023:

As this thread has become hot again, I've decided to try my hand at implementing the feature.
Actually, my implementation is very simple, only a dozen lines of code (not counting tests and documentation); it's available here: a007e79. To sum it up, it simply prevents the root compilation units from being added to the job queue, so everything is built except the root packages, which is my understanding of "dependencies only".

It's my first contribution to cargo (and my first dive into the code), and I'm quite new to the ecosystem, so I may have missed some points. Please tell me if I'm off the mark.

@epage I understand you are part of the cargo team, so may I consult you about opening a PR?

@epage (Contributor) commented Mar 24, 2023:

@wyfo I prefer to not look at PRs until the design phase is resolved. See my above comments on that topic.

@wyfo commented Mar 24, 2023:

Does that mean this issue has finally entered a design phase?
So, my two cents on it: Elixir does indeed have mix deps.compile, but that command also has a --skip-local-deps flag to skip local dependencies. That could be a good feature to have alongside --dependencies-only, but it could complicate the interface (how do you add a flag to a flag?).

However, how can this design phase be driven if, as you pointed out, the cargo team is overwhelmed and you may yourself bow out? This feature isn't accepted yet, but it is still the most popular feature request ever, by far. No criticism here, I'm just not familiar with the cargo design process and I truly wonder how this issue can leave its stale state this way.

Anyway, I'll leave my POC here, hoping it demonstrates that the feature is quite simple to implement (again, if I haven't missed anything).

@Kobzol commented Mar 24, 2023:

It's unfortunately not so simple, and your changes won't solve this issue. I think that is also a problem with this issue in general: it seems like it should be a tiny change to cargo, and that's possibly why people are frustrated with the cargo team not implementing it.

But in reality, as has already been mentioned in this thread several times, the issue is much more complicated. Just avoiding the build of the final crates won't help you with Docker.

The core of the problem isn't avoiding the build of the leaf crate, it's how to cooperate with Docker in a way that will allow proper layer caching. Even with your change, there is no easy way to copy only the files that define all the workspace dependencies into Docker so that the layer is not invalidated when only leaf code changes.

For that, you'd probably need to export some kind of build plan from Cargo that would allow building the project (potentially without the leaf crate); this is more or less what cargo chef does. And this has a lot of edge cases and complications (in any case, it's not a trivial feature!).

This is much harder for cargo than e.g. for npm, simply because they define dependencies in a different way. npm basically contains a flat list of dependencies in a JSON file. There are no workspaces in npm! That's mostly trivial to resolve (although there can be local dependencies and other things). The reality is that cargo uses a quite different format for defining project dependencies than some of the other mentioned tools, and this makes it much more complicated (although surely not impossible) to add proper support for Docker caching to it.

Regarding the design phase: I don't think that it has to be done by the cargo team :) But this would indeed need a thorough design document.

@wyfo commented Mar 25, 2023:

Actually, I should have added a test to show the case where source code is not present; it's done now 38302f8, and it works fine.

Therefore, I think this POC plays nicely with Docker, as it doesn't require source code to be added, only manifest(s). However, there is one caveat: it requires explicit targets (cf. the commit linked above for an example), so otherwise-inferred targets like main.rs must be declared in Cargo.toml.

IMO this is not a big issue, as it is still less to add than what using cargo chef requires, for example. Also, the feature could be improved later to allow inferred targets.

So it seemed to me that this solution fulfilled all the requirements listed in the summary above (keeping the manifest untouched was not listed as a requirement), but I should have mentioned the case without source, mea culpa. Did I miss something else?

P.S. Regarding inferred targets, it's not really hard to add them, but it requires a lot more code modification. However, it may introduce a small edge case with workspaces: if a workspace member has a target with the same name as another member, wrongly inferring a target for the second member will cause a name collision. I admit this edge case is quite convoluted, so it might not be an issue if it's properly documented.

@Kobzol commented Mar 26, 2023:

Again, to reiterate: there are many edge cases that would need to be resolved (with a properly written design, not an implementation) before this could be considered for inclusion in cargo. A "solution" that works for one particular use case, requires changes in the Cargo.toml of end crates, and doesn't deal with edge cases is fine for a third-party cargo crate, but not for an official cargo command.

If you wanted to demonstrate that your approach solves the issues described here, you should show various situations (normal crate, workspace, patch sections, etc.) and how they would work with Docker. Your current solution doesn't seem to make life any easier for workspace projects, for example: you still need to copy the Cargo.toml files manually. It's important to think about the use case: unless the command helps the Docker use case (or the other use cases mentioned here), it's not useful on its own.

This should really be sketched in some design document first.

@teohhanhui:

you still need to copy the Cargo.toml files manually

I don't think anyone who's used to writing Dockerfiles would consider that a problem at all. If anything, that's a Dockerfile issue for not having support for better COPY directives.

@Kobzol commented Mar 26, 2023:

Yes, but in that case the latest implementation presented here (the simple "just don't build the leaf crate(s)" code) doesn't help you at all. To get Docker layer caching working for a (workspace) cargo project, you'd need to:

  1. Copy all Cargo.toml files, manually
  2. Create dummy lib.rs/main.rs files, corresponding to the project structure, manually
  3. (Possibly perform other tasks)

And you'd need to do these things regardless of whether you use cargo build or cargo build --deps-only! This flag, as implemented above, won't help with Docker caching at all.

Btw, all this has been discovered and mentioned in this long issue thread several times over. That's why it's not beneficial to offer simplistic implementations; rather, a design should be proposed, one which takes into account this whole issue thread, realistic use cases, and the implementations from cargo chef and other tools (npm ci etc.), and sketches out how the feature would need to work to actually enable Docker layer caching without requiring the user to do all these hacks.

I know that it's tempting to think that the simple implementation will help, but as has been demonstrated repeatedly in this thread, the "naive" simple solution of simply building only the dependencies doesn't help with Docker caching, because of how Cargo projects are structured.

@wyfo commented Mar 26, 2023:

I need to clarify that I wrote my POC a few hours before the latest discussion started, with the cargo team intervening to ask for a design phase and the impressive summary written by @bkolligs. I discovered all these comments just before writing mine, but since the implementation was ready, I decided to post it anyway. I didn't mean to skip the design phase on purpose. But I should have presented it more clearly, because wrong assumptions were made about it.

@Kobzol I think you misunderstood my previous explanation: with the latest implementation presented here, you don't have to create a dummy lib.rs/main.rs. Here are the steps:

  1. Make your targets explicit in the manifests, i.e. add [lib]/[[bin]] sections with name and path (this only has to be done once in the project's lifetime, as long as the targets don't change, of course). Actually, this "limitation" of the feature could even be refined later to allow inferred targets, making this step unnecessary. But after all, isn't explicit better than implicit? (joking)
  2. Copy all Cargo.toml/Cargo.lock files manually. I know it can be tedious for workspaces, but as @teohhanhui wrote, this may be seen more as a Dockerfile limitation (see moby/moby#15858, "Dockerfile COPY with file globs will copy files from subdirectories to the destination directory").
  3. Run cargo build --dependencies-only
  4. Copy the source code
  5. Run cargo build

Even if this implementation is simplistic (and, again, written before the start of the design discussion), I do believe it helps with Docker caching. However, I don't want to present it as the solution; I just think it's one possible solution, and if I keep discussing it in particular, that's mostly because it seems to me it was not correctly understood (especially the "works without source" part). But I agree that the last sentence of my second comment was inappropriate, and I apologize.

That being said, let's talk about design now.
I currently see two different working approaches:

  • generate a "build plan" in a first stage, and build from it before copying the package in a second stage, as done by cargo chef
  • work with manifests only, as done by cargo build-deps or my POC.

Here is a quick and subjective attempt at comparing both in terms of Docker integration (I've implemented the latter, but use the former daily in production):

 | build plan | manifests only
build workflow | multi-stage | linear
caching | build plan is regenerated at each build | non-build-related changes in Cargo.toml (e.g. adding a benchmark, bumping the package version) invalidate the cache
file copying | must be done twice, for build plan generation and the final build (you may need a .dockerignore if you have a complex project) | for workspaces, Docker lacks glob copy, making it tedious to add all manifests
inferred targets | not an issue | may introduce a small edge case for workspaces

Personally, both are fine to me. I find "manifests only" more straightforward, and slightly more cache-efficient (no regeneration), but at the cost of any Cargo.toml change invalidating the cache. Going into implementation detail, "manifests only" seems a lot simpler (without inferred targets, it's definitely simple), although inferred-target handling may sound quite hacky. "Build plan", on the other hand, would require both plan generation and its parsing/compilation.

However, I haven't seen it mentioned, but the more familiar I get with this part of the cargo code, the more I think the unstable unit-graph feature could actually be used directly as the "build plan". Capitalizing on existing features makes things more consistent as well as simplifying the implementation, and it may be the best thing we can do IMO.
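
(For reference, a minimal sketch of inspecting that unstable feature; the flag is nightly-only and its JSON output format is not stabilized:)

# Emit the resolved unit graph as JSON instead of building
cargo +nightly build --unit-graph -Z unstable-options > unit-graph.json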

@Kobzol commented Mar 26, 2023:

I did not want to diminish your work, I just wanted to note that implementations similar to the one you have posted have already been proposed (and implemented) in this issue, and then abandoned, as they wouldn't solve the Docker issue. I don't want to speak for the Cargo team, but I think that something akin to an RFC should be written first and discussed before we get to an implementation. Some discussion about this, in which the author of cargo chef has participated, has happened on Zulip. I think that creating a HackMD document or something like that with an analysis of the current state of affairs would be a good next step. I'll try to create it if I can find some time in April.

I don't want to continue the endless stream of comments debating the usefulness of the simple approach, but to present a counter-point to --dependencies-only: if the user is required to manually enumerate the Cargo.toml files in the Dockerfile, then they might as well just create the dummy files and it would work the same even with normal cargo build. The flag "only" avoids the need for the dummy files, but doesn't solve the whole Docker use-case. Probably because we haven't described the use case properly yet :) And as has also been noted in this issue, there might be other use cases for this flag, outside of Docker.

Maybe to put it another way, which is how I interpret it: even if some implementation (e.g. yours) magically solved all of the problems in this issue, I think it still wouldn't get merged before the author can describe exactly what problems it solves, what the state was before, how things differ after the new implementation, and whether there are any alternatives.

@teohhanhui:

if the user is required to manually enumerate the Cargo.toml files in the Dockerfile, then they might as well just create the dummy files and it would work the same even with normal cargo build

That's demonstrably false. You'd quickly run into roadblocks for all but the simplest projects, which is why this issue has a lot of interest. Otherwise most of us would just use that workaround and call it a day (it's not a feasible workaround).

@Kobzol commented Mar 26, 2023:

I agree that it's not feasible. That's why I'd like to see a solution that removes the need for manual copying, and also resolves the other issues mentioned here.

@Mange commented Mar 26, 2023:

The flag "only" avoids the need for the dummy files, but doesn't solve the whole Docker use-case.

I'm very confused by this. The problem is that you need to create dummy files and then figure out how to invalidate the cache of just those dummy files. Are there other problems with building in Docker? Dynamic linking preventing FROM scratch in an easy way?

@kskalski:

As @wyfo mentions, the calculation of what to build needs to be based on some part of the source code, but not all of it, since using all of it would invalidate the cache and make the whole feature moot (for the Docker use case).
Copying only the relevant parts of the source code that define the dependency graph is a standard approach in Dockerfiles to achieve this, so the "manual copy of several Cargo.toml files" is not the worst of it; in fact, the need to generate fake sources is a much bigger hassle.

However, if there is a better approach, e.g. based on unit-graph, or even better, simply using the single Cargo.lock from the workspace to download and compile all the external crates mentioned there... it would indeed give an even better experience for the Docker use case.

Regarding the "design doc", I'm not totally clear that having this discussion somewhere else would be better than here; in both places you need people to comment and provide feedback, but it might be easier to keep track of / pin the freshest conclusions, spec, and open issues there. In any case, I feel the person most knowledgeable about those would be best placed to create it and paste the relevant content into it, which I would encourage @Kobzol to do if you feel it would keep the info better organized.

@Kobzol commented Mar 27, 2023:

Ok, I started writing the document; I'll post it here once it's presentable. It will take some time to try all the workarounds mentioned here, find out how other languages do it, examine the existing third-party tools, etc.

@Kobzol commented Mar 29, 2023:

I put my findings here. The document contains an overview of the problem, a description of possible workarounds, and some potential solutions. I'll try to follow up with the cargo team to find out what the next steps could be.

@Kobzol commented Mar 30, 2023:

After discussing this with @epage, we think that the next step should be to better understand how people are using Cargo with Docker: primarily, whether they are using some of the workarounds mentioned in the document (mainly cargo chef and/or Docker cache mounts), and whether they have any problems with them.

If you're using Cargo with Docker and you have a problem with dependencies being rebuilt unnecessarily because of missing Docker layer caching, could you please answer the questions below in a comment on this issue? The cargo chef and Docker cache mounts workarounds are described in the summary document.

  • Why is Docker critical to your workflow?
  • Have you tried using cargo chef? If not, why not? If yes, what issues do you have with using the tool? Is it somehow insufficient for your use-case?
  • Have you tried using Docker cache mounts? If not, why not? If yes, what issues do you have with using this approach? Is it somehow insufficient for your use-case?

@teohhanhui commented Mar 31, 2023:

Have you tried using Docker cache mounts? If not, why not? If yes, what issues do you have with using this approach? Is it somehow insufficient for your use-case?

Tried it but had to give up, because I'd need the cargo build artifacts to be available in the built image in order to actually use them in CI runs (e.g. cargo test).

Docker's cache mount does not provide a way to keep the cached path in the image, and trying to copy manually (in the Dockerfile) ran into SELinux permission issues on my local machine (using podman).

It also has serious limitations with locking.

@jorgecarleitao:

Why is Docker critical to your workflow?

When deploying an AWS Lambda, a microservice on AWS ECS, or a Helm chart in Kubernetes, and almost every time people talk about serverless, Docker enters the picture. Many organizations rely on hyperscaler managed services to deploy and operate software.

In this context, one common pattern is to use Docker as part of the development process, so that the "works on my machine" risk is mitigated. Rust is no exception. When building a microservice, it is more reliable to use docker-compose to set up the scene with the different services (i.e. docker-compose up --build) than to run docker-compose with the correct port bindings and the Rust microservice locally (i.e. docker-compose up --build && cargo run). The former requires docker build . (through docker-compose) locally.

The former is more reliable because (this is docker-compose specific, but the same considerations apply to Helm charts):

  • it allows the use of EXPOSE on the services; the latter requires PORTS (which opens all services to outside the docker network, thus increasing the attack surface in production).
  • the former reproduces much more closely what you will see deployed; the latter is a modification of the environment.

Since Rust does not have a good story around Docker (compared with other languages), people are forced to use the latter, which

  • reduces software reliability (increases the risk of integration bugs)
  • potentially reduces security (by incorrectly using PORTS for convenience in local dev).

Alternatively, if they use the former, it:

  • reduces productivity (especially because Rust compiles almost everything from scratch for every project)
  • reduces software reliability (by introducing hacks to the Dockerfile that are not correct)

I.e. developers need to choose between two "bad" options. This makes the case for Rust less compelling compared to other languages.

Coincidentally, the two "bad" options hurt exactly the things Rust aims to improve: reliability, productivity, and security. This is why this GitHub issue is so popular: people who value these aspects gravitate towards Rust. Those same people are trying to explain that Rust does not deliver on them when they integrate it with Docker (for the reasons given in the summary above).

Have you tried using cargo chef? If not, why not? If yes, what issues do you have with using the tool? Is it somehow insufficient for your use-case?

Yes, and I believe it solves this issue. It is superior to what this thread proposes in that changes to Cargo.toml that do not result in different dependencies / a different recipe (e.g. a change to the author, URL, or other metadata) do not trigger a recompilation of the dependencies.

Have you tried using Docker cache mounts? If not, why not? If yes, what issues do you have with using this approach? Is it somehow insufficient for your use-case?

Yes. Docker cache mounts are difficult to get right and I avoid them as much as possible. Docker caching through layers is usually easier to maintain as the layers can be stored in the registry and re-used across (remote) builds.

@Kobzol commented Mar 31, 2023:

Docker's cache mount does not provide a way to keep the cached path in the image, and trying to copy manually (in the Dockerfile) ran into SELinux permission issues on my local machine (using podman).

I also encountered this, it's pretty annoying. The solutions to this (AFAIK) are:

  • Copy everything you need out of the target directory inside the same RUN command that uses the mounted cache (sketched below). This is unwieldy and the commands can get large quickly.
  • Use the same cache mount for subsequent RUN commands that need to access these files. This is also not ideal, because often you want to do e.g. COPY instead of RUN and that does not support cache mounts (AFAIK).
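
(For illustration, a minimal sketch of the first workaround above, assuming the official rust image, where CARGO_HOME is /usr/local/cargo, and a hypothetical binary name myapp:)

# syntax=docker/dockerfile:1
FROM rust:1.68
WORKDIR /app
COPY . .
# The cache mounts persist target/ and the registry between builds, but their
# contents are not part of the image layer, so anything needed later must be
# copied out inside the same RUN command.
RUN --mount=type=cache,target=/app/target \
    --mount=type=cache,target=/usr/local/cargo/registry \
    cargo build --release && \
    cp target/release/myapp /usr/local/bin/myapp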

@scottlamb commented Mar 31, 2023:

My two cents:

  • Docker isn't actually the reason I've been following this issue. What I want is cargo doc --dependencies-only as mentioned in this comment. I'm an...let's say undisciplined...code editor sometimes. I'll realize I want to pull up my dependencies' documentation, but the code in the current crate isn't even close to compiling. I can't just do cargo doc now. And maybe I've done a cargo add since it last compiled, so even if I remembered to do cargo doc before, it's not enough. Obviously there are ways around this (hunting through docs.rs for the right version of whatever crate I want at the moment, doing a git stash push or whatever to get my working copy back in order, etc.) but it'd be convenient to just be able to pull up the right version of all my deps' docs no matter what stupid thing I've done.
  • But I have used cargo with docker, so I'll answer these questions anyway.

Why is Docker critical to your workflow?

Deployment to AWS EC2/ECS, so a Docker image is the intended build product of my GitHub Actions CI infrastructure. Also, we use crusty third-party/proprietary native libraries that work best with a particular distro version, so it's important to build and test our Rust binary within that Docker image, rather than build then copy into a Docker image.

Have you tried using cargo chef? If not, why not? If yes, what issues do you have with using the tool? Is it somehow insufficient for your use-case?

I haven't tried cargo chef. But from looking it over, I think my own setup is better for my needs. It's clunky and was a pain to set up but achieves pretty decent speed-ups. Rough description:

  1. we have two GitHub actions caches:
    a. one for the .cargo dir. It's keyed as ${{ runner.os }}-build-${{ env.cache-name }}-${{ hashFiles('**/Cargo.lock') }}, with an extra restore key of ${{ runner.os }}-build-${{ env.cache-name }}-, so that ideally we pick one with the same Cargo.lock but will still use the next best thing if it's not available. We copy this into and out of the Docker build. We tidy it via cargo cache commands so it doesn't take up too much of the repo-wide GHA cache size limit of 10 GiB.
    b. one for the target dir. It's keyed as ${{ runner.os }}-build-${{ env.cache-name }}-${{ hashFiles('**/Cargo.lock') }}-${{ github.run_id }} with extra restore keys of ${{ runner.os }}-build-${{ env.cache-name }}-${{ hashFiles('**/Cargo.lock') }}- and ${{ runner.os }}-build-${{ env.cache-name }}-. This one caches outputs generated from our current workspace's *.rs files, so Cargo.lock is not a sufficient cache key (if the cache key already exists, contents don't get rewritten!), and we found it best just to use a fresh cache key every single time. We also tidy this, via cargo sweep. Having the less frequently changed and more frequently changed stuff in separate cache names lets us make better use of the 10 GiB limit I mentioned.
  2. We copy the caches into and out of the Docker container. It'd be better to just do the GHA cache stuff from within Docker and avoid this extra IO and disk space, but it was much easier to just use actions/cache from our workflow file.
  3. We specifically work around #6529 ((Option to) Fingerprint by file contents instead of mtime) not being done. Out of the box, cargo build caching doesn't work well on GHA (even without Docker involved) because each check-out has fresh mtimes, and cargo treats all the files as changed. So I wrote a utility called retimer. After the build, we run retimer save, which traverses the source directory tree and saves each file's mtime and blake3 hash. We put this into the Docker cache. Before the build, we pull this from the cache and run retimer restore, which will go through and set the mtimes back to those in the file if the fingerprint hash matches. This is a clunky extra workflow but it means we more or less have working fingerprint caching.
  4. We prep our base image (with apt dependencies, Rust install, etc., but notably not our cargo deps) and push it to a separate Docker image, built only once a week or when we need to change it. We tried using Docker layer caching for this instead but it was just way too fussy in a variety of ways I can go into, but I think it's a little off-topic here.
  5. We do all the cargo build/test operations in a single Docker layer. We don't save the Docker layer cache between CI runs, instead copying the stuff we want to keep into those GHA action caches I mentioned.
  6. We use docker multi-stage builds so our deploy images don't have extra layers with the build deps, source, target dirs, etc. in them.
  7. and finally, the caching improvements aren't quite as critical for us since we opted into Github's larger runners beta. Throwing CPU at the problem is inelegant but does work.

The reason I say this setup is better for us than cargo chef is we do a decent job of minimizing rebuilds of the (large) workspace we're building, not just cargo deps. It doesn't look like cargo chef can do that.

cargo build --dependencies-only might give us some value by letting us split our target dir cache in two: a part that only changes when Cargo.lock does, and a part that actually depends on our *.rs files, and thus make better use of that 10 GiB GHA limit. But it wouldn't let us do so in a super straightforward fashion. We'd have to do some manipulation to save the *.rs-dependent target dir as a delta from the Cargo.lock-only target dir, so it'd be a bit involved.

Have you tried using Docker cache mounts? If not, why not? If yes, what issues do you have with using this approach? Is it somehow insufficient for your use-case?

Docker cache mounts don't get saved on the host at a consistent location, so there's no straightforward way to save and restore them in a GHA cache. We'd have a fresh cache on each CI invocation, which is not useful.

@epage (Contributor) commented Mar 31, 2023:

I'm an...let's say undisciplined...code editor sometimes. I'll realize I want to pull up my dependencies' documentation, but the code in the current crate isn't even close to compiling. I can't just do cargo doc now.

I just verified the unstable --keep-going flag allows you to still generate documentation for your dependencies.

I ran:

RUSTC_BOOTSTRAP=1 cargo doc --keep-going -Z unstable-options

--open won't run. We could force it to still run, but the crate it opens to won't have been rebuilt. Maybe that's sufficient, or maybe there are tweaks we can make to what gets --opened.

Tracking issue is #10496
