Diagnose extremely slow generate_hash_key time #497

Open

kwaegel opened this issue Aug 15, 2019 · 14 comments

kwaegel commented Aug 15, 2019

I have an (unfortunately private) project that is seeing very little speedup from sccache. Checking the server logs, generate_hash_key is taking an extremely long time, anywhere from 30 to 200 seconds per file. Is there a good way to diagnose what is going on here?

My initial guess is that the preprocessor is just extremely slow, so the lack of direct mode is hurting me (ccache builds the entire project in 3 minutes with a hot cache), but I'm not sure how to verify this.

apriori commented Aug 27, 2019

@kwaegel How did you diagnose this? I'd be curious to check hash creation times for my own project, since I see questionably slow speedups as well. I work on a multi-platform project: ccache improves build times from 5 minutes to 20 seconds, but I see nothing comparable so far with sccache on Windows (I have not checked sccache on Linux yet).

apriori commented Aug 27, 2019

Forget the remark, I was able to diagnose it myself now (using RUST_LOG="sccache::compiler::compiler=debug").

Do you happen to have code with very deep header include hierarchies? In other words: is a preprocessor-expanded translation unit extremely large for your project?

apriori commented Aug 27, 2019

See the just-created related issue #504.

kwaegel commented Aug 27, 2019

@apriori Your suspicion is correct. I haven't checked the preprocessed source size, but deep (and wide) include chains are an ongoing problem with the project in question. The include-processing step accounts for over half of the compilation time for many of the files. I have (long-pending...) plans to fix this, but it's going to require some significant reorganization.

Your suspicion about SHA-512 seems like a good candidate. If you create a fork with a different hash function, I'd be happy to give it a shot and share performance deltas. (I'd be willing to help with development too, but I'm not quite as familiar with Rust as I am with C++.)
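
As a quick sanity check before trying a fork, a standalone micro-benchmark along these lines could give a feel for what SHA-512 alone costs on a preprocessed file of the sizes discussed later in this thread (a sketch assuming the sha2 crate; this is not sccache's actual hashing path):

```rust
// Standalone sketch: time SHA-512 over a buffer roughly the size of a large
// preprocessed translation unit (~5 MB). Assumes the `sha2` crate; it is not
// sccache's actual code path.
use sha2::{Digest, Sha512};
use std::time::Instant;

fn main() {
    let data = vec![0u8; 5 * 1024 * 1024]; // stand-in for ~5 MB of preprocessor output

    let start = Instant::now();
    let digest = Sha512::digest(&data);

    println!(
        "SHA-512 over {} bytes took {:?} (first byte: {:02x})",
        data.len(),
        start.elapsed(),
        digest[0]
    );
}
```

On most modern hardware this finishes in a small fraction of a second, which would already suggest the hash function itself is unlikely to explain multi-second generate_hash_key times.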

apriori commented Aug 27, 2019

@kwaegel
This will take a bit (I've got a functional local version with wyhash, but pretty much zero speedup). Something different seems to be going on here. Btw, what is your platform/OS? Windows 64-bit as well?

apriori commented Aug 27, 2019

@kwaegel
Set

RUST_LOG="sccache::compiler::c=trace, sccache::compiler::compiler=debug" 

and investigate these lines

sccache::compiler::c: [bla.cpp.obj]: Preprocessor output is 4598766 bytes
sccache::compiler::compiler: [bla.cpp.obj]: generate_hash_key took 1.494 s

What is your size/time ratio?
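
(For a rough sense of scale from the example numbers above: 4,598,766 bytes hashed in 1.494 s is only about 3 MB/s, while SHA-512 alone typically manages several hundred MB/s per core, so a ratio this low already hints the time is going somewhere other than hashing.)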

apriori commented Aug 27, 2019

Ok, this is weird output:

2019-08-27T19:09:39Z: sccache::compiler::c: [bla.cpp.obj]: Preprocessor output is 5040185 bytes
2019-08-27T19:09:39Z: sccache::util: Hashed 0 files in 2.836 s
2019-08-27T19:09:39Z: sccache::compiler::compiler: [bla.cpp.obj]: generate_hash_key took 2.837 s
2019-08-27T19:09:39Z: sccache::compiler::compiler: [bla.cpp.obj]: Hash key: 204970d2a9f06737
2019-08-27T19:09:39Z: sccache::compiler::compiler: [bla.cpp.obj]: Cache hit in 0.038 s

Well, it's just wrong logging/measuring. The "Hashed 0 files in 2.836 s" timer runs as a continuation of "future::join_all", so it measures whatever ran before it, not just the portion in question; in my case that is the preprocessor running. Is that maybe the actual bottleneck?

luser commented Aug 27, 2019

I think the "Hashed 0 files in..." bit is just an artifact of how the code is written. hash_all starts counting when it's initially called, but the code that invokes it while hashing a C compilation doesn't ask for the result of the future until after running the preprocessor, so it ends up counting the time spent running the preprocessor (oops).
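
A minimal standalone sketch of that pitfall, using a closure as a stand-in for the future and made-up names (not the real sccache code):

```rust
// The clock starts when the "future" is created, but the elapsed time is only
// read when the result is finally requested, so everything that runs in
// between (here, a fake preprocessor) gets billed to "hashing".
use std::thread::sleep;
use std::time::{Duration, Instant};

fn hash_all_files() -> impl FnOnce() -> Duration {
    let start = Instant::now(); // timer starts here...
    move || start.elapsed()     // ...but is only read much later
}

fn main() {
    let pending_hash = hash_all_files(); // "future" created, clock running

    // The C-compilation path runs the (slow) preprocessor before polling
    // the hash result.
    sleep(Duration::from_secs(2)); // stand-in for running cl.exe /E or gcc -E

    // Reported as "Hashed 0 files in ~2 s" even though no time went to hashing.
    println!("Hashed 0 files in {:?}", pending_hash());
}
```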

apriori commented Aug 27, 2019

Yes it is. Measuring futures invocations without a composable abstraction (hello monad stacks?) is a pain, as it becomes very intrusive and therefore error-prone.
However, it seems we were just misled by these logs. A slight extension to the logging leads to this:

2019-08-27T20:09:56Z: sccache::util: Command invocation took 7.121 s
2019-08-27T20:09:56Z: sccache::compiler::msvc: Weird processing took 0.000 s
2019-08-27T20:09:56Z: sccache::util: Hashed 0 files in 7.121 s
2019-08-27T20:09:56Z: sccache::compiler::c: actual generate_hash_key took 0.001 s
2019-08-27T20:09:56Z: sccache::compiler::compiler: [bla.cpp.obj]: generate_hash_key took 7.124 s
2019-08-27T20:09:56Z: sccache::compiler::compiler: [bla.cpp.obj]: Cache hit in 0.040 s

FYI: "actual generate_hash_key" is really measuring "hash_key", and at least for my local fork its runtime is completely negligible.

So the runtime is completely dominated by running the preprocessor. Even more curious: when I time the exact command line used for the respective file myself, with the output piped to nul, I end up with a ridiculous 18.91 s. I guess this is some additional weirdness of the Windows console host. However, if I instead preprocess to a file, it takes just 3.8 seconds. So I wonder, might it be a viable approach on Windows to preprocess to a file and read that back in?

Edit: Additional remark. Piping the output to a file has a runtime equivalent to asking cl.exe to preprocess to a file (~3.8 s), so somewhere a lot of time gets lost.
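
To reproduce that comparison outside sccache, something along these lines could be used (a sketch only: it assumes cl.exe is on PATH, that bla.cpp builds with the flags shown, and it uses the standard /E and /P preprocessor switches):

```rust
// Sketch: time cl.exe preprocessing the same translation unit twice, once to
// a captured stdout pipe (/E, roughly what sccache does) and once to a file
// on disk (/P, which writes bla.i). Real invocations would also need the
// project's /I include paths.
use std::process::Command;
use std::time::Instant;

fn main() {
    // 1) Preprocess to stdout and capture it through a pipe.
    let start = Instant::now();
    let out = Command::new("cl.exe")
        .args(["/nologo", "/E", "bla.cpp"])
        .output()
        .expect("failed to run cl.exe /E");
    println!(
        "/E via pipe: {:?} ({} bytes captured)",
        start.elapsed(),
        out.stdout.len()
    );

    // 2) Preprocess to a file next to the source.
    let start = Instant::now();
    let status = Command::new("cl.exe")
        .args(["/nologo", "/P", "bla.cpp"])
        .status()
        .expect("failed to run cl.exe /P");
    println!("/P to file:  {:?} ({})", start.elapsed(), status);
}
```

If the pipe path is consistently much slower than the file path for the same input, that would support preprocessing to a temporary file and reading it back.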

apriori commented Aug 28, 2019

Additional remark: even running MSVC with "/Zs" (syntax only) combined with "/showIncludes" results in nearly the same runtime as full preprocessing. So unless one wanted to roll a completely separate parser, this would be the lower bound for the execution time on Windows.

For my case I get (due to the use of Boost and Eigen) 4784 includes for a single translation unit, of which 2337 are unique. So even if one implemented something like direct mode for sccache, these duplicates would need to be cached as well, so one does not calculate insane amounts of redundant hashes (context: #219).
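
A hypothetical sketch of the kind of per-header digest cache direct mode would need (not sccache code; assumes the sha2 crate and placeholder paths):

```rust
// Each unique include is hashed at most once, however many times it shows up
// in the include list (e.g. 4784 entries from /showIncludes, 2337 unique).
// Hypothetical illustration only.
use sha2::{Digest, Sha512};
use std::collections::HashMap;
use std::fs;
use std::path::{Path, PathBuf};

#[derive(Default)]
struct HeaderHashCache {
    digests: HashMap<PathBuf, Vec<u8>>,
}

impl HeaderHashCache {
    /// Return the digest for `path`, reading and hashing the file only on a miss.
    fn digest(&mut self, path: &Path) -> std::io::Result<Vec<u8>> {
        if let Some(d) = self.digests.get(path) {
            return Ok(d.clone());
        }
        let bytes = fs::read(path)?;
        let d = Sha512::digest(&bytes).to_vec();
        self.digests.insert(path.to_path_buf(), d.clone());
        Ok(d)
    }
}

fn main() -> std::io::Result<()> {
    // Placeholder paths; in practice this would be the /showIncludes output.
    let includes = ["a.hpp", "b.hpp", "a.hpp"];
    let mut cache = HeaderHashCache::default();
    for inc in includes {
        let d = cache.digest(Path::new(inc))?;
        println!("{inc}: {:02x}{:02x}...", d[0], d[1]);
    }
    Ok(())
}
```

Whether something like this belongs in sccache is part of the direct-mode discussion in #219.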

luser commented Aug 28, 2019

It sounds like the core problem here is just "you have a lot of includes and the preprocessor is slow". There's not much we can do about that, unfortunately. You might look into using precompiled headers if you haven't already. I'm not sure whether sccache can cache PCH compilations, but if PCH speeds up your build, that would be a reasonable feature request.

apriori commented Aug 28, 2019

Yes, the sad thing is that this is merely a 200k LOC codebase. All the file in question does is use boost.filesystem and Eigen/Dense, and it immediately blows up in your face.

kwaegel commented Aug 28, 2019

Hmm. Let me add a couple of logs to see if they make any sense here.

This is from one file I've identified as reliably problematic, with a hot cache. Total wall time (including a couple of dependencies that I can't easily remove) is 26 s, so the 13 s below is half the build time.

TRACE 2019-08-28T18:58:01Z: sccache::compiler::c: [myfile.cpp.o]: Preprocessor output is 5779551 bytes
TRACE 2019-08-28T18:58:01Z: sccache::util: Hashed 0 files in 13.111 s
DEBUG 2019-08-28T18:58:01Z: sccache::compiler::compiler: [myfile.cpp.o]: generate_hash_key took 13.127 s
TRACE 2019-08-28T18:58:01Z: sccache::compiler::compiler: [myfile.cpp.o]: Hash key: a9a552a8a8ca0f83e88e5b3426ecba9ea2872f16bd3cfe781431a697bdb7b188335cc671c6329262addb3b0cf4bc31e6a32dec36fa28acd901668ad046f8fc94
TRACE 2019-08-28T18:58:01Z: sccache::cache::disk: DiskCache::get(a9a552a8a8ca0f83e88e5b3426ecba9ea2872f16bd3cfe781431a697bdb7b188335cc671c6329262addb3b0cf4bc31e6a32dec36fa28acd901668ad046f8fc94)
DEBUG 2019-08-28T18:58:01Z: sccache::compiler::compiler: [myfile.cpp.o]: Cache hit in 0.002 s

For reference, here's a comparison with ccache. This isn't an ideal test case, since there are a couple of third-party dependencies of my file that I can't easily remove. For longer builds with more of my problematic files, the generate_hash_key time starts to dominate the hot-cache case.
no caching: real 1m37.955s
ccache (cold): real 2m7.236s
ccache (hot): real 0m6.227s
sccache (cold): real 2m11.385s
sccache (hot): real 0m26.947s

Also of interest, here is a copy of the -ftime-report for myfile.cpp, trimmed to only include lines above 1%:

Time variable                                   usr           sys          wall               GGC
 phase parsing                      :   5.16 ( 14%)  18.51 ( 52%)  23.71 ( 33%)  525635 kB ( 21%)
 phase lang. deferred               :   9.58 ( 26%)   6.25 ( 18%)  15.83 ( 22%)  780607 kB ( 32%)
 phase opt and generate             :  21.12 ( 57%)  10.35 ( 29%)  31.47 ( 43%) 1109970 kB ( 45%)
 phase last asm                     :   1.23 (  3%)   0.31 (  1%)   1.54 (  2%)   52878 kB (  2%)
 |name lookup                       :   2.39 (  6%)   3.16 (  9%)   5.86 (  8%)   46543 kB (  2%)
 |overload resolution               :   6.64 ( 18%)   5.24 ( 15%)  11.81 ( 16%)  611795 kB ( 25%)
 callgraph construction             :   3.17 (  9%)   0.39 (  1%)   3.50 (  5%)  177057 kB (  7%)
 ipa SRA                            :   0.53 (  1%)   0.45 (  1%)   1.00 (  1%)   63436 kB (  3%)
 preprocessing                      :   0.83 (  2%)  13.68 ( 39%)  14.35 ( 20%)   48450 kB (  2%)
 parser (global)                    :   0.92 (  2%)   1.35 (  4%)   2.37 (  3%)  105254 kB (  4%)
 parser struct body                 :   0.38 (  1%)   0.40 (  1%)   0.79 (  1%)   49612 kB (  2%)
 template instantiation             :   9.47 ( 26%)   7.73 ( 22%)  17.22 ( 24%)  788265 kB ( 32%)
 integration                        :   2.15 (  6%)   1.70 (  5%)   4.13 (  6%)  238885 kB ( 10%)
 tree gimplify                      :   0.19 (  1%)   0.06 (  0%)   0.29 (  0%)   41195 kB (  2%)
 tree operand scan                  :   1.45 (  4%)   3.05 (  9%)   4.63 (  6%)   49272 kB (  2%)
 expand                             :   0.20 (  1%)   0.01 (  0%)   0.19 (  0%)   63027 kB (  3%)
 final                              :   0.42 (  1%)   0.15 (  0%)   0.60 (  1%)   46345 kB (  2%)
 symout                             :   2.23 (  6%)   0.64 (  2%)   2.88 (  4%)  403218 kB ( 16%)
 var-tracking emit                  :   0.50 (  1%)   0.01 (  0%)   0.54 (  1%)   45976 kB (  2%)
 TOTAL                              :  37.09         35.45         72.57        2470562 kB

apriori commented Aug 29, 2019

@kwaegel What is your ccache performance if you force it to not use direct mode?
