incr.comp.: Explore delayed read-edge deduplication or getting rid of it entirely. #45873
Comments
wesleywiser: I'm working on this.
wesleywiser: I've collected some data from a few crates. It appears that duplicate reads happen much more frequently than originally thought. Of the crates I tested, the lowest was 19.2% duplicated reads. @michaelwoerister, since this doesn't match what you were expecting, would you mind looking at my changes and verifying that I didn't do something wrong? My changes are available in my incr_duplicate_read_stats branch (diff).
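For context, the measurement described above boils down to something like the following standalone sketch: given the sequence of reads a task recorded, count how many of them would have been filtered out by deduplication. The names and the data in main are invented for illustration; this is not the code from the branch.

```rust
use std::collections::HashSet;

// Stand-in for rustc's DepNodeIndex; the real type is an interned index.
type DepNodeIndex = u32;

/// Returns (total reads, duplicate reads) for one task's read sequence.
fn count_duplicate_reads(reads: &[DepNodeIndex]) -> (u64, u64) {
    let mut seen = HashSet::new();
    let mut duplicates = 0u64;
    for &read in reads {
        if !seen.insert(read) {
            // `insert` returns false when the edge was already recorded,
            // i.e. this read would have been filtered out by deduplication.
            duplicates += 1;
        }
    }
    (reads.len() as u64, duplicates)
}

fn main() {
    // Made-up read sequence: 3 of the 8 reads repeat an earlier edge.
    let reads = [4, 7, 4, 9, 7, 1, 4, 2];
    let (total, dups) = count_duplicate_reads(&reads);
    println!("{:.1}% duplicated reads", 100.0 * dups as f64 / total as f64);
}
```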
michaelwoerister: Huh, this is very interesting. Thank you so much for collecting this data. It goes to show again that one should always measure before assuming anything. The data collection in your branch looks correct, except for one thing: for anonymous nodes we can't really delay deduplication, so counting them too will skew the assessment of what effect the proposed optimization would have. Sorry for not mentioning this earlier. Would you mind adapting your code so that it uses a separate counter for anonymous nodes? I'm not going to make a prediction on how this will affect the numbers.
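A minimal way to implement the requested split is to keep two pairs of counters and route each read based on whether the reading task is an anonymous node. This is only a sketch of the bookkeeping; the field and parameter names are invented and not taken from the actual branch.

```rust
/// Read-edge statistics, with anonymous nodes tracked separately because
/// their deduplication cannot be delayed and therefore should not count
/// toward the proposed optimization.
#[derive(Default, Debug)]
struct ReadStats {
    total_reads: u64,
    duplicate_reads: u64,
    total_anon_reads: u64,
    duplicate_anon_reads: u64,
}

impl ReadStats {
    fn record(&mut self, is_duplicate: bool, reader_is_anon: bool) {
        let (total, duplicates) = if reader_is_anon {
            (&mut self.total_anon_reads, &mut self.duplicate_anon_reads)
        } else {
            (&mut self.total_reads, &mut self.duplicate_reads)
        };
        *total += 1;
        if is_duplicate {
            *duplicates += 1;
        }
    }
}

fn main() {
    let mut stats = ReadStats::default();
    stats.record(true, false); // duplicate read from a regular task
    stats.record(false, true); // unique read from an anonymous task
    println!("{:?}", stats);
}
```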
wesleywiser: I reran the same tests with the updated code. The results aren't significantly different (the above "Full results here" link now points to the updated results).
michaelwoerister: Thanks, @wesleywiser! OK, let's see, I see two options on how to proceed:
I'll let you decide, @wesleywiser.
wesleywiser: I'm game to implement delayed deduplication and see how it performs. Do you mind if I go ahead and push up a PR for the stats collection? Also, what's the best way to measure the performance before and after the change? Just use […]?
michaelwoerister: Cool, I'm really curious how it will do.
No, please go ahead!
I usually use […]. That's for local testing. Once you have a version that you think is optimized enough, you can open a PR and @Mark-Simulacrum will trigger a perf.rust-lang.org measurement for us. That will give us a good idea of what performance will look like.
wesleywiser added a commit to wesleywiser/rust that referenced this issue on Nov 17, 2017
wesleywiser referenced this issue on Nov 17, 2017: [incremental] Collect stats about duplicated edge reads from queries #46068 (merged)
bors added a commit that referenced this issue on Nov 20, 2017
wesleywiser: @michaelwoerister I've tried delaying deduplication until serialization, but I'm not seeing much of a difference in compilation time. I've set a […]. If the issue is reallocations, would preallocating the […]? (For reference purposes, my code is in my incr_delay_dedup branch.)
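To make the task-side change concrete, here is a rough sketch (not rustc's actual types) of what recording reads looks like once eager deduplication is dropped: the per-task hash set disappears and every read is simply appended, with deduplication deferred until the graph is written out. The preallocation capacity shown is a hypothetical knob, not a measured value.

```rust
// Stand-in for rustc's DepNodeIndex.
type DepNodeIndex = u32;

struct OpenTask {
    // Reads are appended in order, duplicates included; deduplication is
    // deferred until the graph is serialized.
    reads: Vec<DepNodeIndex>,
}

impl OpenTask {
    fn new(preallocate: usize) -> Self {
        OpenTask {
            // Hypothetical preallocation to avoid repeated reallocations
            // while the task is running.
            reads: Vec::with_capacity(preallocate),
        }
    }

    fn record_read(&mut self, source: DepNodeIndex) {
        // No hash-set lookup here; just push and move on.
        self.reads.push(source);
    }
}

fn main() {
    let mut task = OpenTask::new(8);
    for &read in &[1, 2, 2, 3] {
        task.record_read(read);
    }
    // Duplicates are kept at this stage.
    assert_eq!(task.reads, vec![1, 2, 2, 3]);
}
```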
michaelwoerister: I think the potential for improvement here is about 2% for a debug build with an empty cache. For a rebuild this might actually slow things down, since we'll be deduplicating already deduplicated vectors.
Pretty much, but you have to be careful to measure the correct thing. A few remarks:
There are a few things in our implementation that can be improved:
wesleywiser: Thanks, that's really helpful! I implemented your feedback, but the results don't look very good. (The source is available in the branch I linked above.)
michaelwoerister: Well, that's sad. But your implementation looks correct (except for not preserving edge order in the […]). I hope you still found it interesting to try a few things out! I'll make sure to mention your efforts in the next impl period newsletter.
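For reference, order-preserving deduplication at serialization time can look roughly like the following: keep the first occurrence of each edge and drop later repeats, so the order in which the edges were recorded survives. This is a simplified illustration, not the code from the branch.

```rust
use std::collections::HashSet;

type DepNodeIndex = u32;

/// Deduplicates a task's reads while preserving the order in which the
/// edges were first recorded.
fn dedup_preserving_order(reads: &[DepNodeIndex]) -> Vec<DepNodeIndex> {
    let mut seen = HashSet::with_capacity(reads.len());
    reads.iter().copied().filter(|read| seen.insert(*read)).collect()
}

fn main() {
    let reads = [3, 1, 3, 2, 1];
    // The first occurrences (3, 1, 2) survive, in their original order.
    assert_eq!(dedup_preserving_order(&reads), vec![3, 1, 2]);
}
```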
wesleywiser: Thanks! One thing I'm left wondering is whether the time saved by delaying the deduplication is getting eaten by resizing the […].
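One way to answer that question is the kind of measurement that follows in the thread: record how many reads each task ends up with and look at the distribution, which shows whether a fixed preallocation size would help or mostly be wasted. A hypothetical sketch of that tally (all numbers below are invented):

```rust
use std::collections::BTreeMap;

/// Maps a read count to the number of tasks that recorded that many reads.
fn read_count_histogram(reads_per_task: &[usize]) -> BTreeMap<usize, usize> {
    let mut histogram = BTreeMap::new();
    for &count in reads_per_task {
        *histogram.entry(count).or_insert(0) += 1;
    }
    histogram
}

fn main() {
    // Invented distribution: most tasks record no reads at all.
    let reads_per_task = [0, 0, 0, 5, 0, 120, 0, 3];
    for (reads, tasks) in read_count_histogram(&reads_per_task) {
        println!("{:>4} reads: {} task(s)", reads, tasks);
    }
}
```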
michaelwoerister: If you are up for that, you can certainly try.
wesleywiser: OK! I'll try to collect some data and report back.
michaelwoerister: This graph looks like it's missing some axes. Could you upload the raw data?
wesleywiser: @michaelwoerister Given the huge number of empty vectors, I think preallocating would probably cause a huge memory blowup. Unless you have any ideas, I think we can probably close this issue.
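For a rough sense of scale, here is a back-of-the-envelope illustration with entirely invented numbers; it only shows how a fixed per-task preallocation multiplies out when most of the vectors stay empty.

```rust
fn main() {
    // All of these figures are invented, purely to illustrate the scaling.
    let tasks: u64 = 1_000_000; // tasks created during a compilation session
    let preallocated_slots: u64 = 8; // e.g. a hypothetical Vec::with_capacity(8)
    let bytes_per_edge: u64 = 4; // a u32 dep-node index

    let preallocated_bytes = tasks * preallocated_slots * bytes_per_edge;
    // 32 MB of capacity paid up front, even if most vectors never record a read.
    println!("{} MB preallocated", preallocated_bytes / 1_000_000);
}
```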
michaelwoerister: @wesleywiser Yeah. Also, pre-allocating isn't free either. Thanks again for all your work on this!
michaelwoerister commented Nov 8, 2017
At the moment the compiler will always eagerly deduplicate read-edges in CurrentDepGraph::read_index. In order to be able to do this, the compiler has to allocate a HashSet for each task. My suspicion is that there is not much duplication to begin with and that there's potential for optimization here.

So the first step would be to collect some data on how much read-edge duplication there even is. This is most easily done by modifying the compiler to count duplicates (in librustc/dep_graph/graph.rs) and print the number/percentage in -Zincremental-info. This modified compiler can then be used to compile a number of crates to get an idea of what's going on.

If duplication is low (e.g. less than 2% of reads get filtered out), then we could just remove deduplication and test the effect with a try-build. Otherwise, we can move deduplication to DepGraph::serialize() and measure the performance impact of that.
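To make the status quo described above concrete, the eager scheme amounts to something like the following sketch: each task carries its own hash set, and a read is only recorded if that set has not seen the edge before. This is a simplified illustration of the mechanism, not the actual code behind CurrentDepGraph::read_index.

```rust
use std::collections::HashSet;

// Stand-in for rustc's interned DepNodeIndex.
type DepNodeIndex = u32;

/// Per-task state under the current, eager deduplication scheme.
struct OpenTask {
    reads: Vec<DepNodeIndex>,
    // This set has to be allocated for every task, which is the cost the
    // issue is asking about.
    read_set: HashSet<DepNodeIndex>,
}

impl OpenTask {
    fn new() -> Self {
        OpenTask { reads: Vec::new(), read_set: HashSet::new() }
    }

    /// Records a read edge unless it has already been seen in this task
    /// (the deduplication currently done eagerly on every read).
    fn record_read(&mut self, source: DepNodeIndex) {
        if self.read_set.insert(source) {
            self.reads.push(source);
        }
    }
}

fn main() {
    let mut task = OpenTask::new();
    for &read in &[1, 2, 1, 3, 2] {
        task.record_read(read);
    }
    // Duplicates were filtered eagerly.
    assert_eq!(task.reads, vec![1, 2, 3]);
}
```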