
Investigate compile time of serde crate #1146

Open
dtolnay opened this issue Jan 21, 2018 · 7 comments

Comments

dtolnay (Member) commented Jan 21, 2018

We were doing so well for a few rustc releases. Building the current master branch of serde:

| compiler | time in seconds |
|----------|-----------------|
| 1.12.0   | 29.5 |
| 1.13.0   | 8.3  |
| 1.14.0   | 7.2  |
| 1.15.0   | 6.1  |
| 1.16.0   | 5.8  |
| 1.17.0   | 6.2  |
| 1.18.0   | 5.8  |
| 1.19.0   | 5.8  |

But since then it has been going up and up:

| compiler | time in seconds |
|----------|-----------------|
| 1.20.0         | 6.2 |
| 1.21.0         | 6.7 |
| 1.22.0         | 6.9 |
| 1.23.0         | 6.4 |
| 1.24.0-beta    | 8.6 |
| 1.25.0-nightly | 8.6 |

Will need to look into what makes the current code take 50% longer to build with a modern compiler compared to an old compiler.

dtolnay (Member Author) commented Jan 21, 2018

Here is part of it: rust-lang/rust#47628.

dtolnay (Member Author) commented Feb 1, 2019

Building the same commit (59017aa19b48a2fff795f4e1915dbb596ac427db) with some newer compilers:

| compiler | time in seconds |
|----------|-----------------|
| 1.24.0         | 8.6 |
| 1.25.0         | 7.7 |
| 1.26.0         | 8.5 |
| 1.27.0         | 8.5 |
| 1.28.0         | 8.5 |
| 1.29.0         | 8.1 |
| 1.30.0         | 8.0 |
| 1.31.0         | 8.2 |
| 1.32.0         | 7.6 |
| 1.33.0-beta    | 7.6 |
| 1.32.0-nightly | 7.8 |

alexcrichton (Contributor) commented Sep 17, 2019

One thought I had today about the compile time of the serde crate is that it very frequently nowadays has the derive feature enabled, which means that the compile time of serde is actually that of serde_derive plus serde itself. That dependency edge can add up to ~20s just to compile, magnifying the compile time much more than the code in serde itself.

There's not really a solution to this today per se. A workaround is to meticulously depend on serde_derive directly in all your crates, so that crates which only depend on serde can hopefully finish compiling before serde_derive does. Practically, though, this is unlikely to bring much benefit. Actually solving this would require extensive compiler changes, and I'm not even sure what those would look like!

Put another way, I don't see the compile time of serde itself as all that important; serde_derive, which almost everyone using serde needs, matters more because it takes longer to compile than serde itself.

One "quick" way to improve the compile time of the serde_derive crate is to make the crate itself an empty shell which immediately calls into another crate. Cargo can't pipeline compilation of procedural macros, so a procedural macro fully waits for its dependencies to finish (in this case syn). Codegen for syn often takes a while, though (even 40% of the time in a debug build), and that's an opportunity for parallelism which isn't currently being taken advantage of. If serde_derive were a tiny crate that simply called serde_derive_impl::expand(the_inputs...), that would enable pipelining, and the compile time of serde_derive itself would be near instantaneous (it's just a link step). That'd improve compile times modestly, I think, but it's no slam dunk by any means.

Edit: For a more visceral view of what I mean about serde_derive, here are some compile charts for a project which just depends on serde but enables the derive feature: debug mode, release mode. You can see that serde_derive can't be pipelined at all, but its compile time is quite large, and being able to pipeline it with syn would likely translate directly into build-time wins equivalent to the codegen time of syn (~2s).

dtolnay (Member Author) commented Sep 17, 2019

Thanks, I can see how that would help. That is a neat visualization.

It is unfortunate that most people would probably perceive it the opposite way if we make this change -- more crates = more bloat = perceived as slow, even if it builds end-to-end faster than the current setup.

dtolnay (Member Author) commented Sep 20, 2019

Before splitting serde_derive: (compile timing chart)

🎉 After: (compile timing chart)

alexcrichton (Contributor) commented

Here's another data point of a different graph I've seen:

(compile timing chart)

That's got a lot of dependencies pruned, but the entire compilation of serde_derive could fit within the codegen time of syn. That may happen when syn has all of its features enabled, perhaps? In any case I'm getting similar 1-2s wins locally in my graphs, but I think the wins get magnified on smaller machines and in larger compilation graphs. For example the compilation here could probably save at least 10-20s with a pipelining opportunity.

Anyway, as I already wrote on the forums, there's a feature request on rustc to enable this pipelining and I agree that in general users (even serde) shouldn't have to, in the long term, use a crate organization like this just to get better compile times.

matklad commented Aug 28, 2021

I have a bunch of observations/questions here!

To me, it seems that another issue is that code which uses serde_derive is itself slow to compile. Specifically, JSON serialization turns out to be a non-trivial part of rust-analyzer's overall compile time.


I can't measure that directly, but the following two high-level benches nod in that direction:

First, if I run RUSTC_BOOTSTRAP=1 cargo build -Z timings --release -p rust-analyzer --bin rust-analyzer, I get the following output:

(cargo `-Z timings` chart)

cargo_metadata and especially lsp-types are among the heaviest dependencies (rust-analyzer's own crates are noticeably bigger, but they contain a lot of domain logic). Both cargo_metadata and lsp-types are mostly struct and enum definitions with derive(serde).

Second, if I run cargo llvm-lines --release -p rust-analyzer --bin rust-analyzer, I get the following:

```
  326920 (100%)  9611 (100%)  (TOTAL)
   12281 (3.8%)   196 (2.0%)  core::result::Result<T,E>::map
   10576 (3.2%)    49 (0.5%)  <serde_json::value::de::MapDeserializer as serde::de::MapAccess>::next_key_seed
   10080 (3.1%)    63 (0.7%)  serde_json::value::de::visit_array
    9487 (2.9%)    88 (0.9%)  <serde_json::value::de::SeqDeserializer as serde::de::SeqAccess>::next_element_seed
    7987 (2.4%)    50 (0.5%)  serde_json::value::de::visit_object
    6981 (2.1%)    48 (0.5%)  serde_json::value::de::<impl serde::de::Deserializer for serde_json::value::Value>::deserialize_struct
    6921 (2.1%)    77 (0.8%)  <serde_json::value::de::MapDeserializer as serde::de::MapAccess>::next_value_seed
    6534 (2.0%)    54 (0.6%)  <serde_json::value::ser::SerializeMap as serde::ser::SerializeMap>::serialize_value
    4273 (1.3%)     1 (0.0%)  <lsp_types::_::<impl serde::de::Deserialize for lsp_types::TextDocumentClientCapabilities>::deserialize::__Visitor as serde::de::Visitor>::visit_map
    3808 (1.2%)    28 (0.3%)  alloc::raw_vec::RawVec<T,A>::grow_amortized
    3692 (1.1%)    26 (0.3%)  alloc::raw_vec::RawVec<T,A>::allocate_in
    3395 (1.0%)    56 (0.6%)  serde_json::value::de::<impl serde::de::Deserializer for serde_json::value::Value>::deserialize_option
    3192 (1.0%)    56 (0.6%)  alloc::raw_vec::RawVec<T,A>::current_memory
    2877 (0.9%)    49 (0.5%)  <serde_json::value::de::BorrowedCowStrDeserializer as serde::de::Deserializer>::deserialize_any
    2757 (0.8%)    49 (0.5%)  <serde_json::value::de::MapKeyDeserializer as serde::de::Deserializer>::deserialize_any
    2624 (0.8%)   148 (1.5%)  <core::marker::PhantomData<T> as serde::de::DeserializeSeed>::deserialize
    2538 (0.8%)    51 (0.5%)  serde::de::Visitor::visit_string
    2520 (0.8%)    28 (0.3%)  core::alloc::layout::Layout::array
    2388 (0.7%)     1 (0.0%)  <lsp_types::_::<impl serde::de::Deserialize for lsp_types::TextDocumentClientCapabilities>::deserialize::__Visitor as serde::de::Visitor>::visit_seq
    2078 (0.6%)     1 (0.0%)  lsp_types::_::<impl serde::ser::Serialize for lsp_types::ServerCapabilities>::serialize
    2020 (0.6%)     1 (0.0%)  rust_analyzer::run_server
```

It seems that serde-using crates are both heavy to compile themselves and add downstream monomorphizations to their consumers.


With this in mind, I have the following questions (extremely vague, mind you!):

  • Do we understand how much of this perceived cost is essential? Hypothetically, if I remove serde and just write JSON serialization/deserialization "by hand", would I be able to save much in terms of compilation time?
  • This should have been a question about dynamic-dispatch based serialization, but I figured I should just check how much faster cargo_metadata would be to compile with miniserde!
