Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert the suggestion to use the derive feature instead of serde_derive directly #2584

Open
CryZe opened this issue Aug 20, 2023 · 20 comments
Open

Comments

@CryZe
Copy link

CryZe commented Aug 20, 2023

Here it was decided that serde_derive should not be imported directly and instead the derive feature should be used. This however is directly responsible to the incredibly bad compile times that the new prebuilt binary is meant to address. Here's why:

Cargo itself is able to heavily parallelize compilation, however serde with the derive feature active, causes a huge non-parallelizable dependency chain:

proc-macro2 compile build script > proc-macro2 run build script > proc-macro2 > quote > syn > serde_derive > serde > serde_json (or any crate that depends on serde)

This means that while cargo can compile lots of crates in parallel in the beginning, all the crates depending on serde can only start compiling after serde is done compiling, which is very late. So there might be a reasonably long time where cargo isn't able to compile much in parallel at all, resulting in few cores being utilized.

This can be seen here:

https://i.imgur.com/Gs8gKDV.png

However, you can easily break this dependency chain, by not activating the derive and instead depending on serde_derive yourself. This allows serde and its dependents like serde_json and a made up needs-serde in my example to compile from the beginning of the compilation, completely in parallel to all the derive related crates.

https://i.imgur.com/z3ZG7gZ.png

This cuts compile times from 4.7s to 2.6s (and honestly I don't even know where the dubious claim of 9s even came from, the serde with derive feature release build takes 3.8s in this scenario (excluding everything that comes after)).

So this basically gives you the same, if not better compile time performance improvements than the prebuilt binary in almost every case:

  1. It works on every host.
  2. serde has no dependency on any derive crate at all, allowing it to start compiling even earlier than with a prebuilt binary.
  3. syn is still a part of the dependency chain, which realistically is going to be the case for most real world projects anyway, as other derive crates will need it to be built anyway, so the prebuilt binary would only help in mostly toy projects anyway.
  4. The derive dependency chain is now a lot shorter, meaning the bottleneck caused by it is likely to be only perceived on CPUs with over 40 cores or so now anyway.
@dtolnay
Copy link
Member

dtolnay commented Aug 20, 2023

I've planned this for a long time but it's stuck on rust-lang/rust#54363 (comment). By depending separately on serde and serde_derive you no longer get this version equality constraint, which causes mayhem: for example Dependabot will try to upgrade them independently. One version of serde_derive is specific to exactly one version of serde, because it references private APIs that serde exposes specifically for that version of the derive.

With first-class macro dependencies, we'd want to add:

[package]
name = "serde_derive"
version = "1.0.Z"

[lib]
proc-macro = true

[dependencies]
# ...

[macro-dependencies]
serde = "=1.0.Z"

which correctly encodes the version equality constraint.

@CryZe
Copy link
Author

CryZe commented Aug 20, 2023

Dependabot is actually able to correctly upgrade crates in pairs (or more) for other languages, such as JavaScript. Not sure if they require some sort of special support from the package manager like you suggested or they can detect it automatically. It might be a reasonably easily PR to Dependabot to support that. I'll try to look further into it.

But yeah having a way to specify crates that need to be in lock step would be really good too for other tooling. wasmtime for example has various supporting crates (for WASI and co.) that also always get bumped in lock step, so having a way to specify that would be great too.

@soqb
Copy link
Contributor

soqb commented Aug 20, 2023

why not introduce a new crate serde_lockstep that manages the same version for both serde and serde_derive and encourage user's to use the following:

[dependencies]
serde_lockstep = "1"
serde = "1"
serde_derive = "1"

@Nemo157
Copy link

Nemo157 commented Aug 20, 2023

One version of serde_derive is specific to exactly one version of serde, because it references private APIs that serde exposes specifically for that version of the derive.

This should be mentioned on the serde_derive docs then. I know at least one crate that is using it directly, and looking at the docs there is no mention that this is an unsupported setup.

@soqb
Copy link
Contributor

soqb commented Aug 20, 2023

why not introduce a new crate serde_lockstep that manages the same version for both serde and serde_derive and encourage user's to use the following:

[dependencies]
serde_lockstep = "1"
serde = "1"
serde_derive = "1"

i suppose this could be further improved by migrating the meat and potatoes of serde into a common serde_inner (bikeshed) crate and having the serde crate wrap both.

@matklad
Copy link

matklad commented Aug 20, 2023

I think I have a horrible idea here! In serde's Cargo.toml we could add

[package]
name = "serde"
version = "1.0.101"

[target.'cfg(never)'.dependencies]
serde-derive = { version = "=1.0.101", path = "./serde-derive" }

If I understand Cargo's innner workings correctly, this "associated macro" pattern would constrain Cargo to always pick serde and serde derive in lockstep, without putting serde-derive in front of serde in compilation graph. That is, serde derive would still be present in lockfile, with a correct version, but it won't be otherwise downloaded or built.

Light testing with https://github.com/matklad/macro-dep-test seems to work!

@Nemo157
Copy link

Nemo157 commented Aug 20, 2023

Light testing with https://github.com/matklad/macro-dep-test seems to work!

Did you test from outside the workspace? I've found that deactivated dependencies version requirements (using optional, I haven't tested cfg) influence version selection inside the workspace, but not when the crate is used as a dependency or via cargo install.

@weihanglo
Copy link

I am quite interested in the macro-dependencies idea. Is rust-lang/rust#54363 (comment) the full story in your mind, or could you share a generalized semantic meaning of macro-dependencies?

(sorry if that is off-topic 😆)

@matklad
Copy link

matklad commented Aug 20, 2023

@Nemo157 seems to work as (perhaps not) intended?

λ bat Cargo.toml 
[package]
name = "foo"
version = "0.1.0"
edition = "2021"

[dependencies]
macro-dep-test = { version = "=0.1.2" }
macro-dep-test-macros = { version = "=0.1.1"  }

22:39:19|~/tmp/foo|master⚡?
λ cargo build
    Updating crates.io index
error: failed to select a version for `macro-dep-test-macros`.
    ... required by package `macro-dep-test v0.1.2`
    ... which satisfies dependency `macro-dep-test = "=0.1.2"` of package `foo v0.1.0 (/home/matklad/tmp/foo)`
versions that meet the requirements `=0.1.2` are: 0.1.2

all possible versions conflict with previously selected packages.

I've found that deactivated dependencies version requirements (using optional, I haven't tested cfg)

That's the thing! optional dependencies are not present in the lockfile --- Cargo knows what features exist in the root package being built, so it can correct infer that some transitive feature in a dependency can not be activated, and so this dep can be pruned from Cargo.lock.

With cfg-guarded deps though, Cargo has to conservatively assume that they could be present. So they are always present in lockfile, and consequently constraint resolver. (that's why Linux-only projects always include like half-a-dozen of windows-specific crates in their lockfile)

@soqb
Copy link
Contributor

soqb commented Aug 20, 2023

perhaps we could change it to cfg(all(not(x), x)) which (i think) makes it impossible to accidentally enable.

@matklad
Copy link

matklad commented Aug 20, 2023

@dtolnay
Copy link
Member

dtolnay commented Aug 20, 2023

Thanks @matkad (#2584 (comment)) and @soqb (#2584 (comment)), that looks great. I didn't think of that. I would be open to shipping that solution in serde.

@soqb
Copy link
Contributor

soqb commented Aug 20, 2023

im working on a PR now, trying to get something for CI to guarantee it!

@epage
Copy link

epage commented Aug 21, 2023

Alternatively (or in addition), pull out a serde_core for serde_json and others to depend on directly and have serde re-export it and serde_derive using = operators.

  • Parallel builds for serde and the derive machinery
  • No accidental enabling of derive in a way that slows down everything
  • serde_derive and serde (but not serde_core) can break compatibility
  • Minimal retrofit costs to the ecosystem (only packages that don't use derives)

I've done something similar in clap and its worked out well.

@jhpratt
Copy link

jhpratt commented Aug 21, 2023

^^ that is what time will be doing once there's an improvement to rustdoc

@jplatte
Copy link
Contributor

jplatte commented Aug 21, 2023

One possibly big problem with suggesting users to go back to depending on serde and serde_derive separately is that it could re-introduce a problem I saw when serde's derive feature was still pretty new and not everybody was using it: libraries that depended separately on them and didn't activate the derive feature on serde would sometimes import both serde::Serialize and serde_derive::Serialize (or Deserialize from both) in one module, which worked fine at first because importing a macro and type/trait of the same name is not a conflict; however once that library was compiled as part of a larger project where some other thing enabled serde's derive feature, rustc would complain that the two imports conflict as both would now refer to a macro (and one also to the trait).

Maybe this is solved nowadays, I guess rustc should be able to tell that the macro-namespace serde::Serialize is exactly the same as serde_derive::Serialize, so it could allow importing from both locations redundantly. I should be able to test that later, if nobody beats me to it.

If this problem still persists, I think that's a good reason to favor the solution suggested by @epage (I personally prefer if anyways because of convenience and users not having to do anything for the compile-time gains).

@mitsuhiko
Copy link
Contributor

To @jplatte’s point here is the original issue that mentiones that problem: #1441

@RalfJung
Copy link

I wonder if it would be possible to at least detect faulty crate dependency resolutions by having serde export a public constant like const pub SERDE_VERSION: (i32, i32, i32) = (1, 0, 175); and having the derive macro emit code like

const pub SERDE_VERSION_CHECK: () = {
  if SERDE_VERSION.0 != 1 || SERDE_VERSION.1 != 0 || SERDE_VERSION.2 != 175 {
    panic!("serde_derive and serde version mismatch detected");
  }
};

Then at least users would be left less confused about what to do when the versions get out of sync, and it would be guaranteed to fail immediately rather than only if there happened to be an actual conflict.

@Nemo157
Copy link

Nemo157 commented Aug 21, 2023

Maybe this is solved nowadays, I guess rustc should be able to tell that the macro-namespace serde::Serialize is exactly the same as serde_derive::Serialize, so it could allow importing from both locations redundantly. I should be able to rest that later, if nobody beats me to it.

No, this is still not solved. A workaround is to use #[derive(serde_derive::Serialize, serde_derive::Deserialize)] rather than importing them.

CryZe added a commit to CryZe/samply that referenced this issue Aug 27, 2023
By not activating the `derive` feature on `serde`, the compilation speed
can be improved by a lot. This is because `serde` can then compile in
parallel to `serde_derive`, allowing it to finish compilation possibly
even before `serde_derive`, unblocking all the crates waiting for
`serde` to start compiling much sooner.

As it turns out the main deciding factor for how long the compile time of a
project is, is primarly determined by the depth of dependencies rather
than the width. In other words, a crate's compile times aren't affected
by how many crates it depends on, but rather by the longest chain of
dependencies that it needs to wait on. In many cases `serde` is part of
that long chain, as it is part of a long chain if the `derive` feature
is active:

`proc-macro2` compile build script > `proc-macro2` run build script >
`proc-macro2` > `quote` > `syn` > `serde_derive` > `serde` >
`serde_json` (or any crate that depends on serde)

By decoupling it from `serde_derive`, the chain is shortened and compile
times get much better.

Check this issue for a deeper elaboration:
serde-rs/serde#2584

For `samply` I'm seeing a reduction from 5.43s to 3.70s when compiling
`fxprof-processed-profile` in `release` mode.
mstange pushed a commit to mstange/samply that referenced this issue Aug 27, 2023
By not activating the `derive` feature on `serde`, the compilation speed
can be improved by a lot. This is because `serde` can then compile in
parallel to `serde_derive`, allowing it to finish compilation possibly
even before `serde_derive`, unblocking all the crates waiting for
`serde` to start compiling much sooner.

As it turns out the main deciding factor for how long the compile time of a
project is, is primarly determined by the depth of dependencies rather
than the width. In other words, a crate's compile times aren't affected
by how many crates it depends on, but rather by the longest chain of
dependencies that it needs to wait on. In many cases `serde` is part of
that long chain, as it is part of a long chain if the `derive` feature
is active:

`proc-macro2` compile build script > `proc-macro2` run build script >
`proc-macro2` > `quote` > `syn` > `serde_derive` > `serde` >
`serde_json` (or any crate that depends on serde)

By decoupling it from `serde_derive`, the chain is shortened and compile
times get much better.

Check this issue for a deeper elaboration:
serde-rs/serde#2584

For `samply` I'm seeing a reduction from 5.43s to 3.70s when compiling
`fxprof-processed-profile` in `release` mode.
CryZe added a commit to CryZe/wasmtime that referenced this issue Aug 27, 2023
By not activating the `derive` feature on `serde`, the compilation speed
can be improved by a lot. This is because `serde` can then compile in
parallel to `serde_derive`, allowing it to finish compilation possibly
even before `serde_derive`, unblocking all the crates waiting for
`serde` to start compiling much sooner.

As it turns out the main deciding factor for how long the compile time of a
project is, is primarly determined by the depth of dependencies rather
than the width. In other words, a crate's compile times aren't affected
by how many crates it depends on, but rather by the longest chain of
dependencies that it needs to wait on. In many cases `serde` is part of
that long chain, as it is part of a long chain if the `derive` feature
is active:

`proc-macro2` compile build script > `proc-macro2` run build script >
`proc-macro2` > `quote` > `syn` > `serde_derive` > `serde` >
`serde_json` (or any crate that depends on serde)

By decoupling it from `serde_derive`, the chain is shortened and compile
times get much better.

Check this issue for a deeper elaboration:
serde-rs/serde#2584

For `wasmtime` I'm seeing a reduction from 24.75s to 22.45s when
compiling in `release` mode. This is because wasmtime through `gimli`
has a dependency on `indexmap` which can only start compiling when
`serde` is finished, which you want to happen as early as possible so
some of wasmtime's dependencies can start compiling.

To measure the full effect, the dependencies can't by themselves
activate the `derive` feature. I've upstreamed a patch for
`fxprof-processed-profile` which was the only dependency that activated
it for `wasmtime` (not yet published to crates.io). `wasmtime-cli` and
co. may need patches for their dependencies to see a similar
improvement.
CryZe added a commit to CryZe/wasmtime that referenced this issue Aug 28, 2023
By not activating the `derive` feature on `serde`, the compilation speed
can be improved by a lot. This is because `serde` can then compile in
parallel to `serde_derive`, allowing it to finish compilation possibly
even before `serde_derive`, unblocking all the crates waiting for
`serde` to start compiling much sooner.

As it turns out the main deciding factor for how long the compile time of a
project is, is primarly determined by the depth of dependencies rather
than the width. In other words, a crate's compile times aren't affected
by how many crates it depends on, but rather by the longest chain of
dependencies that it needs to wait on. In many cases `serde` is part of
that long chain, as it is part of a long chain if the `derive` feature
is active:

`proc-macro2` compile build script > `proc-macro2` run build script >
`proc-macro2` > `quote` > `syn` > `serde_derive` > `serde` >
`serde_json` (or any crate that depends on serde)

By decoupling it from `serde_derive`, the chain is shortened and compile
times get much better.

Check this issue for a deeper elaboration:
serde-rs/serde#2584

For `wasmtime` I'm seeing a reduction from 24.75s to 22.45s when
compiling in `release` mode. This is because wasmtime through `gimli`
has a dependency on `indexmap` which can only start compiling when
`serde` is finished, which you want to happen as early as possible so
some of wasmtime's dependencies can start compiling.

To measure the full effect, the dependencies can't by themselves
activate the `derive` feature. I've upstreamed a patch for
`fxprof-processed-profile` which was the only dependency that activated
it for `wasmtime` (not yet published to crates.io). `wasmtime-cli` and
co. may need patches for their dependencies to see a similar
improvement.
github-merge-queue bot pushed a commit to bytecodealliance/wasmtime that referenced this issue Aug 29, 2023
By not activating the `derive` feature on `serde`, the compilation speed
can be improved by a lot. This is because `serde` can then compile in
parallel to `serde_derive`, allowing it to finish compilation possibly
even before `serde_derive`, unblocking all the crates waiting for
`serde` to start compiling much sooner.

As it turns out the main deciding factor for how long the compile time of a
project is, is primarly determined by the depth of dependencies rather
than the width. In other words, a crate's compile times aren't affected
by how many crates it depends on, but rather by the longest chain of
dependencies that it needs to wait on. In many cases `serde` is part of
that long chain, as it is part of a long chain if the `derive` feature
is active:

`proc-macro2` compile build script > `proc-macro2` run build script >
`proc-macro2` > `quote` > `syn` > `serde_derive` > `serde` >
`serde_json` (or any crate that depends on serde)

By decoupling it from `serde_derive`, the chain is shortened and compile
times get much better.

Check this issue for a deeper elaboration:
serde-rs/serde#2584

For `wasmtime` I'm seeing a reduction from 24.75s to 22.45s when
compiling in `release` mode. This is because wasmtime through `gimli`
has a dependency on `indexmap` which can only start compiling when
`serde` is finished, which you want to happen as early as possible so
some of wasmtime's dependencies can start compiling.

To measure the full effect, the dependencies can't by themselves
activate the `derive` feature. I've upstreamed a patch for
`fxprof-processed-profile` which was the only dependency that activated
it for `wasmtime` (not yet published to crates.io). `wasmtime-cli` and
co. may need patches for their dependencies to see a similar
improvement.
osiewicz added a commit to osiewicz/serde that referenced this issue Sep 4, 2023
This should help with untangling serde's serialized build. Addresses
serde-rs#2584
@jplatte
Copy link
Contributor

jplatte commented Sep 5, 2023

Maybe this is solved nowadays, I guess rustc should be able to tell that the macro-namespace serde::Serialize is exactly the same as serde_derive::Serialize, so it could allow importing from both locations redundantly. I should be able to rest that later, if nobody beats me to it.

No, this is still not solved. A workaround is to use #[derive(serde_derive::Serialize, serde_derive::Deserialize)] rather than importing them.

It's also possible to always import the traits from serde::de::Deserialize and serde::ser::Serialize in modules where the derive traits are also refered to. The problem I see with both is that they're annoying to enforce. I guess you could have serde with the derive feature as a dev-dependency so cargo check ensures your stuff compiles w/o the derive feature, and cargo test / cargo check --tests ensures that it compiles with the feature. I think that would also mean you'll constantly be recompiling serde when switching between the two build configurations though.

eduardomourar pushed a commit to eduardomourar/wasmtime that referenced this issue Sep 6, 2023
By not activating the `derive` feature on `serde`, the compilation speed
can be improved by a lot. This is because `serde` can then compile in
parallel to `serde_derive`, allowing it to finish compilation possibly
even before `serde_derive`, unblocking all the crates waiting for
`serde` to start compiling much sooner.

As it turns out the main deciding factor for how long the compile time of a
project is, is primarly determined by the depth of dependencies rather
than the width. In other words, a crate's compile times aren't affected
by how many crates it depends on, but rather by the longest chain of
dependencies that it needs to wait on. In many cases `serde` is part of
that long chain, as it is part of a long chain if the `derive` feature
is active:

`proc-macro2` compile build script > `proc-macro2` run build script >
`proc-macro2` > `quote` > `syn` > `serde_derive` > `serde` >
`serde_json` (or any crate that depends on serde)

By decoupling it from `serde_derive`, the chain is shortened and compile
times get much better.

Check this issue for a deeper elaboration:
serde-rs/serde#2584

For `wasmtime` I'm seeing a reduction from 24.75s to 22.45s when
compiling in `release` mode. This is because wasmtime through `gimli`
has a dependency on `indexmap` which can only start compiling when
`serde` is finished, which you want to happen as early as possible so
some of wasmtime's dependencies can start compiling.

To measure the full effect, the dependencies can't by themselves
activate the `derive` feature. I've upstreamed a patch for
`fxprof-processed-profile` which was the only dependency that activated
it for `wasmtime` (not yet published to crates.io). `wasmtime-cli` and
co. may need patches for their dependencies to see a similar
improvement.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests