-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Greatly speed up doctests by compiling compatible doctests in one file #126245
Greatly speed up doctests by compiling compatible doctests in one file #126245
Conversation
This comment was marked as resolved.
This comment was marked as resolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First round of review! Wondering if it'd make sense to run this on crater (with the edition-handling temporarily removed so it'd run unconditionally) to check breakage?
src/librustdoc/doctest.rs
Outdated
|
||
run_tests( | ||
test_args, | ||
nocapture, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Isn't nocapture
now passed as part of opts
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's actually part of RustdocOptions
, good catch!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same goes for test_args
, I love when code gets simpler like that. :3
The comment in documentation claims that the slowest part that limits doctests is the compilation time (not the runtime to actually run all the tests). Instead of merging all the doctests into a single program, so that it runs in the same process (which can cause issues mentioned in the previous PR), could we instead perform the compilation in a smarter way, to generate N binaries, but still invoking the compiler just once, to amortize the compilation costs (something like my rustc daemon experiments)? Do you have any expectation of how much time would be saved if compilation was approx. as fast as compiling a single program, but we still invoked N binaries to run the tests? Or, as a different alternative, compile everything into a single program, but then execute it in N processes (each process would run just a single test), to avoid the issues? |
Interesting. I was actually wondering if there was some way to daemonize rustc. However IIRC one of the slowest parts is linking, which would still be an issue with the approach you suggest :( |
It would also be very non-trivial, implementation wise, I imagine. But what about the second approach? We could compile a single binary that looks something like this: fn test_0() {}
fn test_1() {}
fn main() {
let arg = args()[0];
if arg == "test_0" { test_0(); }
if arg == "test_1" { test_1(); }
} And the rustdoc driver would then execute this binary N times (in parallel for all threads on the given machine), and run each test in a separate process. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem to handle the fact that different doctests may use different #![feature)]
s.
I think this would unfortunately break the second approach too @Kobzol |
Sure, there are possibly additional issues with the "add everything to a single binary" approach, but as far as we can get that working somehow, it seems to me that moving from "run all tests within a single process" to "run each test in a separate process" should be relatively straightforward, and should resolve the issues with shared state between tests. |
Agreed. So sounds like the idea would be:
This should obviate the need for doing this over an edition boundary since it shouldn't break any code. It would also automatically work with code that configures global state, without needing to know a |
Would nightly's new linker (LLD, I think) be a relevant thing to look through for insight? |
I'm not sure. It'd be worth testing. We should also look into turning off debuginfo for doctests since it's unused. |
LLD should help slightly, but it's not a panacea. I tried LLD for doctesting That would also sadly suggest that the overhead from spawning the test processes (even on Linux) is the dominant factor here, which means that my proposed approach wouldn't help that much. |
I agree but with some small changes: However I"m curious about:
I generate merged doctests like rustc does with |
This approach would resolve the problems with sharing state between tests. But it's quite possible that it would also make the perf. gains much smaller. |
Forking on linux has a huge overhead, so the test code itself would likely take less time than spawning it. I'm really not convinced by this approach. Let me check locally the diff. |
If you have a recent-ish glibc and kernel, forking is relatively fast (you could spawn thousands of doctests per second, if needed). Actually I'm pretty sure that forking on Linux would be much faster than on macOS and Windows, so I would worry about these more 😆 But based on some experiments that I have posted above, it seems to me that for crates that have a lot of doc-tests (e.g. But maybe it's only because the rustc test machinery has a lot of costs when starting. I wonder what would happen if we just literally did this, or, as an alternative, set up everything with libtest, and then fork the process just before executing each test. |
True, forking on linux should be pretty fast nowadays. ^^'
Hum... I'm definitely not convinced. Especially since you mentioned Windows and MacOS being slow to fork. I can provide a big source code for what rustdoc generates for |
Finished benchmarking commit (e5b3e68): comparison URL. Overall result: no relevant changes - no action needed@rustbot label: -perf-regression Instruction countThis benchmark run did not return any relevant results for this metric. Max RSS (memory usage)Results (primary -1.1%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResults (secondary 0.2%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 752.976s -> 752.981s (0.00%) |
this is really cool. nice work all. |
Awesome work! 🚀 This should probably receive |
I got an absolutely wild speed-up in Jiff. Before:
After:
|
Going down from 4min39 to 7 secs is quite wild indeed. We did a good job here. :) |
Note that this feature only takes effect if you set |
Can you say more about why this should be relnotes? It's currently unstable (under the edition). |
...because I forgot about that :) Sorry. It should definitely be mentioned in the 2024 Edition relnotes, as I imagine it will be :) |
Crater for rustdoc test merging Crater run for rust-lang#126245
Fixes #75341.
Take 2 at #123974. It should now be much simpler to review since it's based on #125798.
As discussed, this feature will only be enabled starting with the 2024 edition.
I split the changes as much as possible to make it as easy as possible to review (even though it's a huge lot of code changes...).
The following tests are not included into the combined doctests:
compile_fail
deny
/allow
/warn
are ok)test_harness
--show-output
(because the output is nicer without the extra code surrounding it)Everything else is merged into one file. If this one file fails to compile, we go back to the current strategy: compile each doctest separately.
Because of the
edition
codeblock attribute, I couldn't make them all into one file directly so I grouped them by edition, but it should be pretty anecdotic.In case the users want a specific doctest to be opted-out from this doctest merge, I added the
standalone
codeblock attribute:Now the interesting part, I ran it on a few crates and here are the results (with
cargo test --doc
to only include doctests):r? @camelid
try-job: x86_64-msvc
try-job: aarch64-apple