Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exponential growth in compile time, memory usage for obligation evaluation #74456

Closed
syntacticsugarglider opened this issue Jul 17, 2020 · 5 comments
Labels
C-bug Category: This is a bug. E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example I-compilemem Issue: Problems and improvements with respect to memory usage during compilation. I-compiletime Issue: Problems and improvements with respect to compile times. ICEBreaker-Cleanup-Crew Helping to "clean up" bugs with minimal examples and bisections

Comments

@syntacticsugarglider
Copy link
Contributor

The source provided here, also used for reproduction of the previously filed #73593, takes an unexpectedly long time to compile and consumes substantial memory, growing exponentially as seemingly trivial complexity is added to the source. For example, uncommenting this line results in rustc crashing due to OOM killer on my system.

I apologize for the lack of MCVE but my attempts to investigate this have been stymied by my lack of understanding of the relevant portions of the compiler. This zulip thread contains the discussion of investigation so far, but the concise overview is that through use of massif and perf I've narrowed the performance issues (both compile time and compile memory) down to the obligations Vec used here in rustc and, more specifically, the subsequent retain call for deduplication. Over half the CPU time for a full build of this crate is spent computing the Hash of ObligationCauseCode in those insert calls.

Identifying this hot path as the problem is informative but insufficient to allow me to minimize this. The most granular causative information I have is from the output of -Zself-profile, which suggests that the time spent is in evaluate_obligation (as is clear from the above information as well) for queries such as

Canonical { max_universe: U0, variables: [CanonicalVarInfo { kind: Region(U0) }], value: ParamEnvAnd { param_env: ParamEnv { caller_bounds: [], reveal: UserFacing, def_id: None }, value: Binder(TraitPredicate(<Item as __DERIVE_PROTOCOL_Id::__protocol::Unravel<protocol_mve_transport::Transport<std::boxed::Box<dyn CloneSpawn>, std::convert::Infallible, std::convert::Infallible, Item>>>)) } }

The unifying theme among the queries that consume the vast majority of the total CPU time is that they're all root-level evaluations of predication on the Unravel trait, and there doesn't appear to be a way to achieve further granularity with -Zself-profile such that I could establish what resultant predicates are being produced that in turn would result in the voluminous contents of that obligations Vec, whatever they might be. I'm not sure if the issue here is a failure/missed opportunity in deduplication, an accumulation of a huge number of predicates before deduplication such that a single iteration consumes all this time (perf records using sampling so I don't know whether many small calls or fewer more time-consuming calls are being made), or something else entirely.

For profiling I'm using a locally built stage 1 rustc on x86_64-unknown-linux-gnu from commit 8aa18cbdc5d4bc33bd61e2d9a4b643d87f5d21de though this issue occurs on current stable etc. as well.

@syntacticsugarglider syntacticsugarglider added the C-bug Category: This is a bug. label Jul 17, 2020
@wesleywiser wesleywiser added E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example I-compilemem Issue: Problems and improvements with respect to memory usage during compilation. I-compiletime Issue: Problems and improvements with respect to compile times. labels Jul 17, 2020
@wesleywiser
Copy link
Member

@rustbot ping icebreakers-cleanup-crew

Help creating a smaller MCVE would be appreciated!

@rustbot
Copy link
Collaborator

rustbot commented Jul 17, 2020

Hey Cleanup Crew ICE-breakers! This bug has been identified as a good
"Cleanup ICE-breaking candidate". In case it's useful, here are some
instructions for tackling these sorts of bugs. Maybe take a look?
Thanks! <3

cc @AminArria @camelid @chrissimpkins @contrun @DutchGhost @elshize @ethanboxx @h-michael @HallerPatrick @hdhoang @hellow554 @imtsuki @kanru @KarlK90 @LeSeulArtichaut @MAdrianMattocks @matheus-consoli @mental32 @nmccarty @Noah-Kennedy @pard68 @PeytonT @pierreN @Redblueflame @RobbieClarken @RobertoSnap @robjtede @SarthakSingh31 @senden9 @shekohex @sinato @spastorino @turboladen @woshilapin @yerke

@rustbot rustbot added the ICEBreaker-Cleanup-Crew Helping to "clean up" bugs with minimal examples and bisections label Jul 17, 2020
@tanriol
Copy link
Contributor

tanriol commented Aug 4, 2020

I haven't looked in the protocol crate yet, but the numbers I've seen from println-based debugging look really big

A predicate added 3855177 obligations; its debug repr takes 35034 bytes
At depth 3 I saw 3856627 obligations from 7 predicates, deduplicated to 1532
A predicate added 3855069 obligations; its debug repr takes 35034 bytes
At depth 3 I saw 3855080 obligations from 7 predicates, deduplicated to 29
A predicate added 8793514 obligations; its debug repr takes 40258 bytes
At depth 1 I saw 8795078 obligations from 7 predicates, deduplicated to 1641
A predicate added 3855069 obligations; its debug repr takes 35034 bytes
At depth 3 I saw 3855080 obligations from 7 predicates, deduplicated to 29
A predicate added 3855177 obligations; its debug repr takes 35046 bytes
At depth 3 I saw 3856627 obligations from 7 predicates, deduplicated to 1532
A predicate added 3855069 obligations; its debug repr takes 35046 bytes
At depth 3 I saw 3855080 obligations from 7 predicates, deduplicated to 29
A predicate added 8793514 obligations; its debug repr takes 40270 bytes                
At depth 1 I saw 8795078 obligations from 7 predicates, deduplicated to 1641         
A predicate added 3855069 obligations; its debug repr takes 35046 bytes                
At depth 3 I saw 3855080 obligations from 7 predicates, deduplicated to 29           
A predicate added 3855177 obligations; its debug repr takes 35034 bytes                
At depth 2 I saw 3856627 obligations from 7 predicates, deduplicated to 1532         
A predicate added 3855069 obligations; its debug repr takes 35034 bytes                
At depth 2 I saw 3855080 obligations from 7 predicates, deduplicated to 29           
A predicate added 3855177 obligations; its debug repr takes 35034 bytes                
At depth 3 I saw 3856627 obligations from 7 predicates, deduplicated to 1532         
A predicate added 3855069 obligations; its debug repr takes 35034 bytes
At depth 3 I saw 3855080 obligations from 7 predicates, deduplicated to 29
A predicate added 8793514 obligations; its debug repr takes 40258 bytes
At depth 1 I saw 8795078 obligations from 7 predicates, deduplicated to 1641
A predicate added 3855069 obligations; its debug repr takes 35034 bytes                
At depth 3 I saw 3855080 obligations from 7 predicates, deduplicated to 29           
A predicate added 3855177 obligations; its debug repr takes 35034 bytes
At depth 3 I saw 3856627 obligations from 7 predicates, deduplicated to 1532
A predicate added 3855069 obligations; its debug repr takes 35034 bytes
At depth 3 I saw 3855080 obligations from 7 predicates, deduplicated to 29
A predicate added 3855177 obligations; its debug repr takes 35034 bytes
At depth 3 I saw 3856627 obligations from 7 predicates, deduplicated to 1532
A predicate added 3855069 obligations; its debug repr takes 35034 bytes
At depth 3 I saw 3855080 obligations from 7 predicates, deduplicated to 29
A predicate added 3855177 obligations; its debug repr takes 35034 bytes
At depth 1 I saw 3856627 obligations from 7 predicates, deduplicated to 1532

One of the (much smaller!) predicates causing this looked like

this
TraitPredicate(<
__DERIVE_PROTOCOL_Id::__protocol::future::MapErr<
__DERIVE_PROTOCOL_Id::__protocol::future::MapErr<
__DERIVE_PROTOCOL_Id::__protocol::future::finalize::VecFinalize<
protocol_mve_transport::Transport<
std::boxed::Box<
dyn CloneSpawn>,
std::convert::Infallible,
std::convert::Infallible,
Item>,
Binding>,
fn(_) -> __DERIVE_PROTOCOL_Id::__protocol::allocated::vec::VecUnravelError<
protocol_mve_transport::SerdeWriteError<
_>,
_,
__DERIVE_PROTOCOL_Id::__protocol::tuples::FlatUnravelError<
protocol_mve_transport::SerdeWriteError<
_>,
_,
__DERIVE_PROTOCOL_Id::__protocol::tuples::Tuple2UnravelError<
protocol_mve_transport::SerdeWriteError<
_>,
_,
_,
__DERIVE_PROTOCOL_Id::__protocol::future::finalize::Tuple2Error<
__DERIVE_PROTOCOL_Id::__protocol::option::OptionUnravelError<
protocol_mve_transport::SerdeWriteError<
_>,
_,protocol_mve_transport::SerdeWriteError<
_>,
_>,
__DERIVE_PROTOCOL_Id::__protocol::tuples::FlatUnravelError<
protocol_mve_transport::SerdeWriteError<
_>,
_,
__DERIVE_PROTOCOL_Id::__protocol::result::ResultUnravelError<
_,
_,
protocol_mve_transport::SerdeWriteError<
_>,
protocol_mve_transport::SerdeWriteError<
_>,
__DERIVE_PROTOCOL_Id::__protocol::tuples::FlatUnravelError<
protocol_mve_transport::SerdeWriteError<
_>,
_,
__DERIVE_PROTOCOL_Id::__protocol::arrays::ArrayUnravelError<
protocol_mve_transport::SerdeWriteError<
_>,
_,
protocol_mve_transport::SerdeWriteError<
_>,
_>>,
_,
_>>>,
__DERIVE_PROTOCOL_Id::__protocol::future::finalize::Tuple2Error<
__DERIVE_PROTOCOL_Id::__protocol::option::OptionUnravelError<
protocol_mve_transport::SerdeWriteError<
_>,
_,
protocol_mve_transport::SerdeWriteError<
_>,
_>,
__DERIVE_PROTOCOL_Id::__protocol::tuples::FlatUnravelError<
protocol_mve_transport::SerdeWriteError<
_>,
_,
__DERIVE_PROTOCOL_Id::__protocol::result::ResultUnravelError<
_,
_,
protocol_mve_transport::SerdeWriteError<
_>,
protocol_mve_transport::SerdeWriteError<
_>,
__DERIVE_PROTOCOL_Id::__protocol::tuples::FlatUnravelError<
protocol_mve_transport::SerdeWriteError<
_>,
_,
__DERIVE_PROTOCOL_Id::__protocol::arrays::ArrayUnravelError<
protocol_mve_transport::SerdeWriteError<
_>,
_,
protocol_mve_transport::SerdeWriteError<
_>,
_>>,
_,
_>>>>>,
_>>,
fn(_) -> __DERIVE_PROTOCOL_Id::__protocol::tuples::FlatUnravelError<
protocol_mve_transport::SerdeWriteError<
_>,
_,
__DERIVE_PROTOCOL_Id::__protocol::allocated::vec::VecUnravelError<
protocol_mve_transport::SerdeWriteError<
_>,
_,
__DERIVE_PROTOCOL_Id::__protocol::tuples::FlatUnravelError<
protocol_mve_transport::SerdeWriteError<
_>,
_,
__DERIVE_PROTOCOL_Id::__protocol::tuples::Tuple2UnravelError<
protocol_mve_transport::SerdeWriteError<
_>,
_,
_,
__DERIVE_PROTOCOL_Id::__protocol::future::finalize::Tuple2Error<
__DERIVE_PROTOCOL_Id::__protocol::option::OptionUnravelError<
protocol_mve_transport::SerdeWriteError<
_>,
_,
protocol_mve_transport::SerdeWriteError<
_>,
_>,
__DERIVE_PROTOCOL_Id::__protocol::tuples::FlatUnravelError<
protocol_mve_transport::SerdeWriteError<
_>,
_,
__DERIVE_PROTOCOL_Id::__protocol::result::ResultUnravelError<
_,
_,
protocol_mve_transport::SerdeWriteError<
_>,
protocol_mve_transport::SerdeWriteError<
_>,
__DERIVE_PROTOCOL_Id::__protocol::tuples::FlatUnravelError<
protocol_mve_transport::SerdeWriteError<
_>,
_,
__DERIVE_PROTOCOL_Id::__protocol::arrays::ArrayUnravelError<
protocol_mve_transport::SerdeWriteError<
_>,
_,
protocol_mve_transport::SerdeWriteError<
_>,
_>>,
_,
_>>>,
__DERIVE_PROTOCOL_Id::__protocol::future::finalize::Tuple2Error<
__DERIVE_PROTOCOL_Id::__protocol::option::OptionUnravelError<
protocol_mve_transport::SerdeWriteError<
_>,
_,
protocol_mve_transport::SerdeWriteError<
_>,
_>,
__DERIVE_PROTOCOL_Id::__protocol::tuples::FlatUnravelError<
protocol_mve_transport::SerdeWriteError<
_>,
_,
__DERIVE_PROTOCOL_Id::__protocol::result::ResultUnravelError<
_,
_,
protocol_mve_transport::SerdeWriteError<
_>,
protocol_mve_transport::SerdeWriteError<
_>,
__DERIVE_PROTOCOL_Id::__protocol::tuples::FlatUnravelError<
protocol_mve_transport::SerdeWriteError<
_>,
_,
__DERIVE_PROTOCOL_Id::__protocol::arrays::ArrayUnravelError<
protocol_mve_transport::SerdeWriteError<
_>,
_,
protocol_mve_transport::SerdeWriteError<
_>,
_>>,
_,
_>>>>>,
_>>> as std::marker::Unpin>)

@syntacticsugarglider
Copy link
Contributor Author

syntacticsugarglider commented Aug 6, 2020

(copy-paste from Zulip)

at this point I think what's happening is that normalization generates a bunch of extra obligations. i'm getting output for example "with 885017 obligations" from here: https://doc.rust-lang.org/nightly/nightly-rustc/src/rustc_trait_selection/traits/project.rs.html#270

presumably, after that happens deduplication occurs at some point. i've been messing around with RUSTC_LOG and have noticed that the actual process of generating those obligations is pretty quick but after each execution of normalize_with_depth_to the compiler takes longer and longer to process the next iteration, presumably due to time spent deduplicating (as I earlier ascertained that doing so is a substantial portion of the CPU time spent, in the other thread in #t-compiler/help. it does seem like the amount of time spent deduplicating is nonlinear over the number of obligations generated, which doesn't make sense to me given that it's just inserting them all into a hashset, but I'm not sure. this output from projection is consistent with Denis's findings above i.e. that most obligations are redundant

what needs to happen I suppose is some sort of deduplication of the accumulated obligations during normalization during that process as opposed to after the fact. I don't know what the best path is to do that: I don't know whether storing them all in an FxHashSet in the first place is feasible (maybe some duplicates are actually necessary? i seem to remember seeing some comments about ordering guarantees, how does that impact potential deduplication strategies? etc.) and I'm not sure what the compiler's general design philosophy requires for this to be implemented elegantly and soundly

so I guess at this point I'm in need of

  • confirmation of whether my assessment of the problem is correct
  • confirmation of whether my assumption that earlier deduplication is the best (most feasible & efficient) strategy as opposed to some higher-level algorithmic improvement
  • guidance on how to implement something of the sort without interfering with guarantees of which I might be unaware or causing other problems

@syntacticsugarglider
Copy link
Contributor Author

After further review and as noted on Zulip this appears to just be #20304.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example I-compilemem Issue: Problems and improvements with respect to memory usage during compilation. I-compiletime Issue: Problems and improvements with respect to compile times. ICEBreaker-Cleanup-Crew Helping to "clean up" bugs with minimal examples and bisections
Projects
None yet
Development

No branches or pull requests

4 participants