Feature selection in workspace depends on the set of packages compiled #4463
Comments
So, it has to do with features. Namely, two cargo invocations produce two different libcs:
The only difference is So, I get two different libcs in target:
But I get a single memchr:
The file name is the same for both cargo commands, but the actual contents differ. |
Hm, so this looks more serious than a spurious rebuild! Depending on what |
Minimized example here: https://github.com/matklad/workspace-vs-feaures |
@alexcrichton continuing discussion here, instead of #4469 which is somewhat orthogonal, as you've rightly pointed out!
Yeah, it looks like what we ideally want here is that each final artifact gets the minimal set of features. And this should work even within a single package: currently, activating feature in Though such fine-grained feature activation will cause more compilation work overall, so using the union of features might be a pragmatic choice, as long as we keep features additive, and it sort of makes sense, because crates in a workspace share dependencies anyway. And it definitely seems better than some random unrelated target activating features for you depending on the command line flags. |
I think one of the main problems right now is that we're doing feature resolution far too soon, during the crate graph resolution. Instead what we should be doing is assuming all features are activated until we actually start compiling crates. That way if you have multiple targets all requesting different sets of features they'll all get separately compiled copies with the correct set of features. Does that make sense? Or perhaps solving a different problem? |
Yeah, totally, "they'll all get separately compiled copies with the correct set of features" is the perfect solution here, and it could be implemented by moving feature selection after the dependency resolution. But I am really worried about additional work to get separately compiled copies, because it is multiplicative. Let's say you have a workspace with the following layout:
Because A and B require different features from libc, and because libc happens to be at the bottom of the dependency graph, that means that for So it's not that only libc will get duplicated, the whole graph may be duplicated in the worst case. |
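To make the worst case concrete, here is a minimal sketch of which crates end up compiled twice when two workspace members each need their own variant of a crate at the bottom of the graph. The graph, the crate names (`a`, `b`, `util`, `mid`) and the helper functions are hypothetical and only model the reasoning above; none of this is Cargo's actual implementation.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical dependency graph: crate name -> direct dependencies.
type Graph = BTreeMap<&'static str, Vec<&'static str>>;

/// All crates reachable from `root`, including `root` itself.
fn reachable(graph: &Graph, root: &'static str) -> BTreeSet<&'static str> {
    let mut seen = BTreeSet::new();
    let mut stack = vec![root];
    while let Some(krate) = stack.pop() {
        // Only expand a crate the first time we see it.
        if seen.insert(krate) {
            if let Some(deps) = graph.get(krate) {
                stack.extend(deps.iter().copied());
            }
        }
    }
    seen
}

/// Crates shared by both roots that are `leaf` itself or sit above it.
/// If each root needs a different variant of `leaf`, all of these are built twice.
fn duplicated_crates(
    graph: &Graph,
    root_a: &'static str,
    root_b: &'static str,
    leaf: &'static str,
) -> BTreeSet<&'static str> {
    let from_a = reachable(graph, root_a);
    let from_b = reachable(graph, root_b);
    from_a
        .intersection(&from_b)
        .copied()
        .filter(|&krate| reachable(graph, krate).contains(leaf))
        .collect()
}

/// Assumed layout: both members share `util`, which reaches `libc` through `mid`.
fn example_graph() -> Graph {
    Graph::from([
        ("a", vec!["util"]),
        ("b", vec!["util"]),
        ("util", vec!["mid"]),
        ("mid", vec!["libc"]),
        ("libc", vec![]),
    ])
}
```

In this assumed layout, `duplicated_crates(&example_graph(), "a", "b", "libc")` contains `libc`, `mid` and `util`: the duplication propagates upward through everything that links against the differing leaf.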
If we assume that features are additive (as intended), then the innermost crate could be compiled once with the union of all features. Additive features are a bit of a subtle point though (see #3620). Recompiling is the safest way, though expensive. |
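As a rough illustration of the two strategies being compared, the sketch below models the features each workspace member requests from one shared dependency. The member names, feature names and functions are assumptions made for the example, not Cargo's API.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical model: workspace member -> features it requests from one
/// shared dependency.
type Requests = BTreeMap<&'static str, BTreeSet<&'static str>>;

/// "Union" strategy: build the dependency once, with every feature that
/// any member asked for. Only sound if features are additive.
fn unified_features(requests: &Requests) -> BTreeSet<&'static str> {
    requests.values().flatten().copied().collect()
}

/// "Recompile" strategy: one build of the dependency per distinct feature set.
fn distinct_builds(requests: &Requests) -> BTreeSet<BTreeSet<&'static str>> {
    requests.values().cloned().collect()
}

/// Assumed example: two members asking for different feature sets.
fn example_requests() -> Requests {
    let mut requests = Requests::new();
    requests.insert("member_a", BTreeSet::from(["feat_one"]));
    requests.insert("member_b", BTreeSet::from(["feat_one", "feat_two"]));
    requests
}
```

With these assumed requests, the union strategy yields a single build with two features, while the recompile strategy yields two separate builds of the same crate.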
@matklad yeah you're definitely right that the more aggressively we cache the more we end up caching :). @nipunn1313 you're also right that it should be safe for features to be unioned, but they often come with runtime or linkage implications. For example if a workspace has a I basically see this as there's a specification of what Cargo should be doing here. We've got, for example, two crates in a workspace, each of which activates various sets of features in shared dependencies. Today Cargo does the "thing that caches too much" if you compile each separately (and also suffers a bug where it recompiles too much when you switch between projects). Cargo also does the "union all the features" if you build both crates simultaneously (e.g. I'd advocate that Cargo should try to stick to the "caches too much" solution as it's following the letter of the law of what you wrote down for a workspace. It also means that crates in a workspace don't need to worry too much about interfering with other crates in a workspace. Projects that run into problems of the "too much is cached" nature I'd imagine could then do the investigation to figure out what features are turned on where, and try to get each workspace member to share more dependencies by unifying the features. |
This somewhat resolves my concern about build times, but not entirely. I am worried that it might not be easy to unify features manually, if they are turned on by private transitive dependencies. It would be possible to do so by adding this private transitive dependency as an explicit and unused dependency, but this looks accidental. But now I too lean towards the fine-grained features solution. |
For what it's worth, we've done that exact trick with the parallel feature
of the gcc crate. It does happen, but the workaround is ok.
|
Servo relies on the current behavior to some extent: two "top-level" crates (one executable and one C-compatible static library) depend on a shared library crate but enable different Cargo features. These features are mutually exclusive; enabling the union would not work. Maybe the "right" thing to do here is to have separate workspaces for the different top-level things? Does it make sense for shared (Servo’s build system sets |
I would be in support of What makes this problem so insidious is that there's no way to enforce or even encourage the union property of features. If a project pulls in even one dependency that doesn't obey this property, it could potentially create an incorrect binary. In @SimonSapin's case with Servo, I think Servo is lucky that the feature'd crate (
then I believe that compiling Our project at Dropbox ran into a similar issue with itertools -> libeither, where libeither was compiled with two different features. Lucky for us, libeither's features are union-safe, so the code was correct, but it did create spurious recompiles depending on which sub-crate we were compiling. |
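The hazard with non-additive features can be sketched as a simple check: each top-level crate's feature set is fine on its own, but the union enables a pair that must not coexist. The feature names (`backend_x`, `backend_y`) and both functions are hypothetical; Cargo has no such declaration of exclusive features, which is part of the problem described above.

```rust
use std::collections::BTreeSet;

/// Hypothetical declaration that two features cannot be enabled together.
type Exclusive = (&'static str, &'static str);

/// Returns every exclusive pair whose two features are both in `features`.
fn conflicts(features: &BTreeSet<&'static str>, exclusive: &[Exclusive]) -> Vec<Exclusive> {
    exclusive
        .iter()
        .copied()
        .filter(|(x, y)| features.contains(x) && features.contains(y))
        .collect()
}

/// Feature set a shared dependency would get if two members were unified.
fn unified(
    a: &BTreeSet<&'static str>,
    b: &BTreeSet<&'static str>,
) -> BTreeSet<&'static str> {
    a.union(b).copied().collect()
}
```

For example, if one member enables only `backend_x` and another only `backend_y`, `conflicts` is empty for each of them but reports the pair for `unified(&a, &b)`, which is the situation where building both members together would be incorrect.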
I agree with @nipunn1313 -- I think |
This all sounds like agreement on what should happen. @alexcrichton, what code changes need to happen (on a high level) to get there? |
That's what I was discussing with @alexcrichton at the RustFest impl days, and I have a bunch of refactoring done that I'm still tweaking. Will post a PR ASAP. Do you have a particular dependency/urgency relating to Gecko or Servo on this? |
Nothing urgent. I thought this bug could cause spurious rebuilds after selectively building a crate with |
We've had to fork some deps to unify feature selection to work around this
issue. It's definitely not sustainable for us, but not urgent yet.
|
@nipunn1313 for my understanding, can you point me at a commit or otherwise elaborate on what problems you've had due to this issue? |
Here's an example of a problem we had to work around In that particular case, either and itertools were both present in our workspace. |
@SimonSapin taking on this issue will require a relatively significant refactoring of Cargo's backend. Right now feature resolution happens during crate graph resolution, but we need to defer it all the way until the very end when we're actually compiling crates. |
…ild' … and 'cargo test', etc. Include Servo and its unit tests, but not Stylo because that would try to compile the style crate with incompatible feature flags: rust-lang/cargo#4463 `workspace.default-members` was added in rust-lang/cargo#4743. Older Cargo versions ignore it.
I find myself in need of this as well. There was a recent discussion on Zulip between @ehuss, myself, and several others, which came to the following rough conclusion:
|
@joshtriplett it sounds like you want feature selection to be dependent on the set of packages compiled? That's the opposite of what this bug is about. If what this bug is about has changed over the years, maybe a moderator could update the title & summary? |
I'm not a mod - but I've been here following with this task for 3+ years now. Here's a writeup of my summary - from reading through this.
Sounds like there are two modes of operating on the table:
Crate Specific Feature Selection (CSF): Each crate in a workspace is built with its deps (incl vendored) compiled with the minimal features needed for just that crate
Pros
Cons
Workspace Aware Feature Selection (WAF): Each crate in a workspace is built with its deps having the union of features needed for the whole workspace
Pros
Cons
Today, we're awkwardly in between, causing a lot of recompiles and some confusion:
Ideas/Proposals
A) Default CSF
B) Default WAF
C) Keep today's behavior - CSF in crates, WAF at root
My personal thoughts. W.r.t developer environment within workspace, WAF seems better for developer build times when switching crates often. CSF seems better when focusing on one crate. W.r.t. output bloat Hopefully this is helpful! |
I would agree that the workspace-aware unification is probably the right default, personally. When this is combined with For your option (A), though, I'm not sure if we could actually implement that or if it would have the desired effect. We need to unify shared dependencies somehow, and if we ended up doing separate feature resolution for crates that would cause just as many rebuilds as if you did |
I haven't read all comments so maybe this has already been noted: If we have dependencies |
…888) kafka-ssl cannot build on OS X, so running make test-integration is broken atm Therefore we cannot use --all-features for local development (CI does not use makefile) --features does not work at workspace-level, it seems rust-lang/cargo#4463 is related (they just completely disabled it because it was so broken) Solution: cd relay like we do for release
I'm currently implementing the "workspace-aware unification" model described above using guppy, see the documentation for hakari for more details. We're about to ship it in Diem Core. Seems to work well for our use case though there are definitely issues around dependency bloat that we're still working through and iterating on. (We may have to eventually go with a more fine-grained approach where we unify features for certain subsets of the workspace differently, but we expect to do all this through the guppy toolset without involving Cargo.) |
Just wanted to chime in and note that we're relying on per-crate feature unification in Hubris, so, I agree with @joshtriplett's comment that if the default were to change, there should be an override. |
Quick followup as well: I've turned our automatic workspace-hack generator, hakari, into a command-line tool called cargo-hakari. It has a number of config knobs which:
I think any sort of unification strategy within cargo is likely going to need these knobs as well. |
This unhygienic behavior is not specific to workspaces, is it? In the case shown in the cargo documentation, if |
I think the problem you're describing is a different (related) problem - a designed behavior documented here https://doc.rust-lang.org/cargo/reference/features.html#feature-unification. Cargo expects features to be "additive". |
* Now that `control` is a `flowctl-rs` subcommand, we can rely on it being built along with everything else. The tricky spot however is that `control` itself relies on the `flowctl` binary. * Split the CI job into "stage1" (rust/musl) and "stage2" (control tests & publish). This allows us to build `flowctl-rs` with the dependency on `control` as a library, but allows the `control` tests to pull in the built artifact for `flowctl` to run its own tests. * Also splits musl builds from the main `flowctl` job. This avoids accidentally rebuilding things for multiple architectures, while still producing the binaries we need. There are a lot of improvements we can make to speed the build up more, but those are probably beyond the scope of this PR. Particularly, I think we're rebuilding a lot of things due to different feature selection (rust-lang/cargo#4463). Something like cargo hakari (https://facebookincubator.github.io/cargo-guppy/rustdoc/hakari/) might help us quite a bit here.
ehuss note: The recompilation was fixed, but this issue is still open regarding having features change based on what is being built simultaneously.
Reproduction:
Check out this commit: matklad/fall@3022be4
Build some test with
cargo test -p fall_test -p fall_test -p lang_rust -p lang_rust -p lang_json --verbose --no-run
Build other tests with
cargo test --all --verbose --no-run
Run
cargo test -p fall_test -p fall_test -p lang_rust -p lang_rust -p lang_json --verbose --no-run
again and observe that memchr and some other dependencies are recompiled.
Run
cargo test --all --verbose --no-run
and observe memchr recompiled again.
The verbose flag gives the following commands for memchr:
Here's the single difference:
Versions (whyyyyy cargo is 0.21 and rustc is 1.20??? This is soo confusing)