
Fix exponential execution time by memoizing format_expr #5139

Merged
merged 1 commit on Apr 17, 2022

Conversation

HKalbasi
Member

fix #4476
fix #4867
fix #5128

The problem in those issues is that when max_width is tight, rustfmt has to check all possible formattings, which is exponential in the input size. By memoizing the format_expr function, the worst-case execution time becomes O(n * max_width^3). I tried the code from those issues and it completed in a fraction of a second.
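The idea can be sketched in miniature (the name `QueryId` mirrors the PR; the `expr`/`width` fields and the "fits or not" rule below are simplified stand-ins, not rustfmt's actual types or logic):

```rust
use std::collections::HashMap;

// Simplified stand-in for rustfmt's QueryId: an expression plus the
// width budget (Shape) it is being formatted under.
#[derive(Clone, PartialEq, Eq, Hash)]
struct QueryId {
    expr: String,
    width: usize,
}

// Without memoization, a formatter that retries sub-expressions under
// several widths re-solves identical (expr, width) queries exponentially
// often. Caching Option<String> (None meaning "doesn't fit") bounds the
// work to one computation per distinct query.
fn format_expr(
    expr: &str,
    width: usize,
    cache: &mut HashMap<QueryId, Option<String>>,
) -> Option<String> {
    let key = QueryId { expr: expr.to_string(), width };
    if let Some(cached) = cache.get(&key) {
        return cached.clone(); // cache hit: a clone instead of a recomputation
    }
    // Placeholder "formatting": succeed only if the text fits the budget.
    let result = if expr.len() <= width {
        Some(expr.to_string())
    } else {
        None
    };
    cache.insert(key, result.clone());
    result
}
```

The clone on a hit (and the clone when storing) is the space/time trade-off debated later in this thread.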

@HKalbasi
Member Author

Tests are failing strangely. I think my QueryId is not unique enough. What needs to be added?

@calebcartwright
Member

Tests are failing strangely. I think my QueryId is not unique enough. What needs to be added?

I can't say definitively whether there actually is a combination that will prove sufficiently unique, so you'll probably have to continue experimenting to gauge feasibility. Perhaps the expression's NodeId?

I also wouldn't be terribly keen on introducing a mutexed global; would it be feasible to attach the map to the rewrite context somehow?

@HKalbasi
Member Author

I can't say definitively whether there actually is a combination that will prove sufficiently unique, so you'll probably have to continue experimenting to gauge feasibility. Perhaps the expression's NodeId?

An expression's node id is shared between all expressions of a def id. But I found that if I exclude macros (memoization doesn't help there anyway) it passes the tests. Probably the span value is unique outside of macros.

I also wouldn't be terribly keen on introducing a mutexed global; would it be feasible to attach the map to the rewrite context somehow?

I tried adding a RefCell to the rewrite context (last commit), but it doesn't improve performance and those issues still take minutes. But the very same code with a global mutex works. I don't know what is happening.

@calebcartwright
Member

I tried adding a RefCell to the rewrite context (last commit), but it doesn't improve performance and those issues still take minutes. But the very same code with a global mutex works. I don't know what is happening.

Sorry, I haven't had a chance to dig into this in detail and probably won't for the next couple of weeks due to the holidays. I'd suspect that's a case of a cache miss with the former, perhaps due to some slightly different logic being needed vs. the global.

Overall I think you're on to something here, so I really appreciate you digging in! I would like to continue iterating on the implementation specifics though, as well as incorporating some tests.

@HKalbasi force-pushed the master branch 2 times, most recently from a888951 to 6ff4a71 on December 17, 2021 13:03
@HKalbasi
Member Author

The problem was that the rewrite context doesn't live long enough, so I moved the map one layer up and it is now fixed.

as well as incorporating some tests

I added the examples from those issues as regular tests, so a regression would cause cargo test to not terminate in a reasonable time (but it won't fail outright). A global timeout for all tests would make it fail.

@calebcartwright
Member

Fantastic thank you!

src/rewrite.rs Outdated
pub(crate) span: SpanData,
}

pub(crate) type Memoize = Option<HashMap<QueryId, Option<String>>>;
Contributor

Is the outer Option necessary?

Member Author

Thanks for reviewing! Yes: in the None case the function should clear the store at the end, but with Some(empty) it should fill the store and keep it.

Contributor

Hmm wouldn't it be the same to do clean = map.is_empty()?

Member Author

*map = Some(HashMap::default());
clean = true;

If we set clean = map.is_empty(), then *map = Some(HashMap::default()); would have to become inserting some dummy entry into the map (which isn't good); otherwise subsequent calls would also receive an empty map and clear it, when they shouldn't.
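For readers following along, the control flow being discussed looks roughly like this (a paraphrase with placeholder types and placeholder "work", not the PR's exact code):

```rust
use std::collections::HashMap;

type QueryId = u64; // placeholder for the real QueryId
type Memoize = Option<HashMap<QueryId, Option<String>>>;

// Paraphrase of the discussed logic: the outermost call finds None,
// installs an empty map, and is therefore responsible for clearing it
// at the end; nested calls find Some(map) and must leave it in place.
// A plain `clean = map.is_empty()` check cannot distinguish "no cache
// active" from "cache active but still empty".
fn memoized_rewrite(map: &mut Memoize, query_id: QueryId) -> Option<String> {
    let mut clean = false;
    if map.is_none() {
        *map = Some(HashMap::default());
        clean = true; // this call owns the cache's lifetime
    }
    if let Some(cached) = map.as_ref().unwrap().get(&query_id) {
        return cached.clone();
    }
    // Placeholder for the recursive formatting work.
    let result = Some(format!("formatted-{}", query_id));
    map.as_mut().unwrap().insert(query_id, result.clone());
    if clean {
        *map = None; // only the owning (outermost) call clears the store
    }
    result
}
```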

Contributor

I see. I wonder if there is a cleaner solution here, such as clearing the map after each Item is formatted.

src/rewrite.rs Outdated (resolved)
src/rewrite.rs Outdated (resolved)
src/rewrite.rs Outdated (resolved)
@camsteffen
Contributor

An expression's node id is shared between all expressions of a def id.

Maybe this is because ids aren't assigned until macro expansion (code), which I assume doesn't run for rustfmt? If that's the case, I wonder if it would be worthwhile to make a change upstream so that node ids can be assigned without macro expansion.

@calebcartwright
Member

Thanks again @HKalbasi! A few questions/asks:

  • Could you please incorporate some inline comments explaining the why and the thinking? Even after just a couple of weeks I found I'd forgotten a lot of context, and I can only imagine someone looking at this for the first time months or years from now.
  • I feel like we should be able to avoid at least one of the clones, most likely once we have a hit. Would using a mutable borrow on the map and remove_entry instead of get here do the trick?
  • Any thoughts on some of the additional optimizations @camsteffen suggested, e.g. in #5139 (comment)?

@HKalbasi
Member Author

I feel like we should be able to avoid at least one of the clones, most likely once we have a hit. Would using a mutable borrow on the map and remove_entry instead of get here do the trick?

Nothing prevents it from hitting more than once, so if we remove it from the map on the first hit, we'd need to calculate it again. If the clones are too costly, I think we can use Rc, since the caller just reads the result, but I don't feel it is necessary.

Any thoughts on some of the additional optimizations @camsteffen suggested

I don't think that's an optimization, but it would make the code cleaner. I'm not confident about where to clear the cache so that it clears enough but not more, and in the current approach all the memoization-related code is in the same function, so let's keep it as is. I tried to explain why there is an Option<HashMap> in the comments.

The second one (RefCell vs Cell) was an optimization, and it's done.

@calebcartwright
Member

calebcartwright commented Jan 29, 2022

re: my question in #5139 (comment) and 👇

Nothing prevents it from hitting more than once, so if we remove it from the map on the first hit, we'd need to calculate it again.

I was trying to think of a scenario where we'd have to recompute a Some(...) with the exact same shape (and I still can't see that), but I think I could see a scenario where it's a None that we could potentially hit twice. Does that sound right or is there a scenario for the former you can think of?

In short, I'm trying to be mindful of time vs. space complexity, especially since the originating issues occur in only the most extreme of circumstances derived from generated code whereas this change is going to impact quite literally everything. As such, I do want to be mindful of clones. If there's no way around it then so be it, but it's certainly worth the thought exercise

@HKalbasi
Member Author

HKalbasi commented Jan 29, 2022

I'm not sure at this point whether we ever hit a Some in the cache. If there is no such case, we could use a HashSet instead of a HashMap and return None if we have calculated it before. But would it format correctly if this conjecture turned out to be wrong? (Edit: I tried it and tests failed.)

In short, I'm trying to be mindful of time vs. space complexity, especially since the originating issues occur in only the most extreme of circumstances derived from generated code whereas this change is going to impact quite literally everything. As such, I do want to be mindful of clones. If there's no way around it then so be it, but it's certainly worth the thought exercise

Yes, now I see the second clone is pure overhead in normal cases. Rc is still an option.

.and_then(|state| {
state.sequence(|state| {
self::space(state)
.and_then(|state| super::hidden::skip(state))
Member

The indentation here is quite off. I can't say offhand whether that's directly related, because we don't really have a good frame of reference (as IIRC rustfmt on master is crashing on this input), but it could be indicative of a shape issue. The human-level diff comparison is also tough to do by eye given the heavy ident reuse.

Would you be willing to try to boil this down to a smaller repro, starting with the containing function at the same indentation level, to see if we can rule out any shape-related oddities in the map entries? If it turns out to just be a chain issue (which wouldn't surprise me in the least) that'll be good to know, but it definitely doesn't need to be resolved here.

Member Author

Any example needs to hit the max_width limit, either through indentation or by decreasing the limit. The example in #4867 is fairly minimal, and I tried to minimize it further:

fn func() {
    state . rule (Rule :: myrule , | state | { state . sequence (| state | { state . sequence (| state | { state . match_string ("abc") . and_then (| state | { super :: hidden :: skip (state) }) . and_then (| state | { state . match_string ("def") }) }) . and_then (| state | { super :: hidden :: skip (state) }) . and_then (| state | { state . sequence (| state | { state . optional (| state | { state . sequence (| state | { state . match_string ("abc") . and_then (| state | { super :: hidden :: skip (state) }) . and_then (| state | { state . match_string ("def") }) }) . and_then (| state | { state . repeat (| state | { state . sequence (| state | { super :: hidden :: skip (state) . and_then (| state | { state . sequence (| state | { state . match_string ("abc") }) }) }) }) }) }) }) }) }) });
}

This takes 7s on my system (with max_width=70) and then returns no solution. Examples that have a solution within the max_width limit are faster, since the solution is not the last thing we try, so minimizing them is harder because each step halves the execution time. Does this example help?

Member

Sorry but I don't understand your response in relation to the question that was posed. The issue here is that the emitted formatting is wrong. The runtime performance of how long it takes to produce the incorrect formatting is orthogonal.

Do you have any insight into why this is producing an incorrect result?

Member Author

Ah, I completely overlooked the selected lines and misunderstood your message. This seems very bad; I will look at it.

Member

no worries, and thank you!

Member Author

It seems that in lines 192-207 of the output the indentation unit is 2 spaces, but in the rest it is 4 spaces; any ideas?

Member

The input had two spaces IIRC, so perhaps it's somehow "successfully" falling back to the span contents of the input? FWIW, it might be easier on human eyes to butcher the input formatting a bit more and/or use some different idents, given how much similarity there is (that would make differences between input and output stand out more).

Member Author

You are right: if I change the spaces to 1, the output uses 1 as well. And I can reproduce it with the rustfmt installed on my machine after ~50s, so it is not caused by this PR.

Member

Okay, thank you for confirming. That strikes me as very odd behavior considering that we're dealing with a chain (a massive one, but a chain nevertheless) and how chain formatting behaves in less... intense scenarios, but as long as it's not something being introduced/modified as part of this PR, we should be good.

Member Author

Slightly minimized, takes 3s on rustfmt master: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=d36a8d8e3ddc4b6b0223144a4b73ded9

Increasing max_width fixes the problem, so there is a possibility that this problem has the same root cause as the long running time.

@calebcartwright
Member

I'm not sure at this point whether we ever hit a Some in the cache. If there is no such case, we could use a HashSet instead of a HashMap and return None if we have calculated it before. But would it format correctly if this conjecture turned out to be wrong? (Edit: I tried it and tests failed.)

I won't speak to the set vs. map piece, though to the original question, the only way rustfmt would ever format one of these expressions incorrectly would be if we'd done something egregious and were storing the wrong Some(String...) for the wrong query id. In practice the real worst case would be returning a None speciously, in which case rustfmt simply emits the original span contents of the expression (functionally skipping attempting to format that expression). Not ideal of course, but it's something done elsewhere on other code paths so certainly not disastrous either.

Yes, now I see the second clone is pure overhead in normal cases. Rc is still an option.

Will defer to your preference, but I want to note the execution time numbers I'm seeing locally with your current version (clone) and with the adjustment I suggested, removing the entry from the map via a mutable ref so as to avoid the need for the clone in the return case:

(note I've elided the --check contents as it's irrelevant noise in this context):

Current with the get + clone:

$ time target/debug/rustfmt tests/target/performance/issue-5128.rs --check
.....

real	0m0.632s
user	0m0.619s
sys	0m0.012s

With take_entry skipping the clone:

$ time target/debug/rustfmt tests/target/performance/issue-5128.rs --check
.....

real	0m0.635s
user	0m0.622s
sys	0m0.012s

And using the original/source input, as-is with get + clone

$ time target/debug/rustfmt tests/source/performance/issue-5128.rs --check
.....

real	0m8.926s
user	0m8.822s
sys	0m0.069s

with take_entry:

$ time target/debug/rustfmt tests/source/performance/issue-5128.rs --check
.....

real	0m8.776s
user	0m8.734s
sys	0m0.041s

I realize the above doesn't exactly qualify as a statistically significant analysis, but I'm seeing the same pattern consistently across multiple runs, and for the other input files as well. This gives me sufficient confidence that the extra clone has no substantial execution time impact, at least not in any context known to us today, and that we should go with one of the discussed approaches to avoid it.

@calebcartwright
Member

Also just want to say thanks again for all your efforts on this! I think it'll be a great step forward and address several known problems, but also want to be very careful and thoughtful given the code paths that are getting touched.

@HKalbasi
Member Author

HKalbasi commented Feb 3, 2022

Third hits are not impossible, we can see by testing this (it fails):

if let Some(r) = map.remove(&query_id) {
    map.insert(query_id, Some("we don't hit again".to_string()));
    context.memoize.set(Some(map)); // restore map in the memoize cell for other users
    return r;
}

Your numbers show that things we calculate exactly twice outnumber things we calculate more times. The existence of third hits can theoretically lead to exponential time, though it probably isn't important in practical cases. Anyway, that clone is on a cold path in normal code (issue-5128.rs isn't normal code), and what we did before memoizing was to recalculate the string from the ground up, which is definitely more costly than cloning, and that was fine.

The clone that is IMO more important, and that adds overhead relative to the previous state for normal code, is the second clone:

if let Some(mut map) = context.memoize.take() {
    map.insert(query_id, r.clone()); // insert the result in the memoize map
    context.memoize.set(Some(map)); // so it won't be computed again
}

In normal cases, most expressions won't even hit once, so their clones are wasted. I don't have a solution for that. I tried Rc and it doesn't work either, because some (most?) of the callers want to mutate the result (for example, to append something), so they need to clone the actual data anyway.

Also just want to say thanks again for all your efforts on this! I think it'll be a great step forward and address several known problems, but also want to be very careful and thoughtful given the code paths that are getting touched.

I completely understand your concerns, and thanks for maintaining rustfmt!

@calebcartwright
Member

The clone that is IMO more important, and that adds overhead relative to the previous state for normal code, is the second clone

Yes, though I'd basically accepted that we're going to have to introduce at least one, as one of the inherent drawbacks of memoization. While I agree that the second clone won't always be at play in normal (non-generated, non-borderline-pathological 😆) code, I feel like it may be with certain expressions in scenarios such as RHS comparisons.

Third hits are not impossible, we can see by testing this (it fails):

That's a much more practical way of testing it than my trying to mentally execute code flows!

Based on what we're seeing here I'm starting to suspect there's probably a middle ground somewhere that would both be a marked improvement in the generated code cases while also still minimizing unnecessary clones in more standard cases, even if the result is still less than optimal from an asymptotic complexity pov.

I wonder if we could use a more advanced type for the map values that included some kind of hit counter, used to conditionally decide whether to clone and attach the result to the map entry, and potentially to conditionally remove the entry as well, splitting the difference on some of the cloning I was previously referring to.
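One hypothetical shape for that idea (illustrative only; `CacheEntry`, `max_hits`, and the eviction rule are inventions for this sketch, not anything concretely proposed in the PR):

```rust
use std::collections::HashMap;

// Hypothetical cache value carrying a hit counter, so lookups can decide
// between cloning the stored result (more hits expected) and moving it
// out of the map (treating this as the last expected hit).
struct CacheEntry {
    result: Option<String>,
    hits: u32,
}

struct Cache {
    map: HashMap<u64, CacheEntry>,
    max_hits: u32, // hits allowed before the entry is evicted
}

impl Cache {
    // Outer Option: None = miss (caller computes and calls `store`);
    // Some(r) = hit, where r is the memoized formatting result.
    fn lookup(&mut self, id: u64) -> Option<Option<String>> {
        let keep = match self.map.get_mut(&id) {
            Some(entry) => {
                entry.hits += 1;
                entry.hits < self.max_hits
            }
            None => return None,
        };
        if keep {
            self.map.get(&id).map(|e| e.result.clone())
        } else {
            // Last expected hit: move the value out instead of cloning.
            self.map.remove(&id).map(|e| e.result)
        }
    }

    fn store(&mut self, id: u64, result: Option<String>) {
        self.map.insert(id, CacheEntry { result, hits: 0 });
    }
}
```

With `max_hits = 1` this degenerates to the remove-on-first-hit variant discussed above; larger values trade memory for fewer recomputations on repeated hits.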

@HKalbasi
Member Author

I measured the execution time of the system and idempotence tests (I know they are not necessarily good data; it was just easy to add a timer in run_test_with) in some scenarios, excluding the tests in this PR. Results have a ±0.03 second error; I tried to take the mean over multiple runs.

  1. Without caching:
Time elapsed in system: 0.579 seconds
Time elapsed in idempotence: 0.704 seconds
  2. This PR at its current state (it seems to help on normal code as well!):
Time elapsed in system: 0.531 seconds
Time elapsed in idempotence: 0.640 seconds
  3. Removing from the map at first hit:
Time elapsed in system: 0.544 seconds
Time elapsed in idempotence: 0.632 seconds

Results are mixed, but I think 3 is slightly faster. The difference is the cost of the second clone, minus the cost of recalculating third hits.

I tried to make a version of 1 with an additional clone, but the optimizer seems to catch me and remove the clone.

Based on what we're seeing here I'm starting to suspect there's probably a middle ground somewhere that would both be a marked improvement in the generated code cases while also still minimizing unnecessary clones in more standard cases, even if the result is still less than optimal from an asymptotic complexity pov.

I'm fine with removing from the map on the first hit, but the data above doesn't show that it makes a noticeable difference.

And I want to mention Rc/Cow again: using it would require changes across most of the codebase (so it's out of scope for this PR), but it would probably save the first clones in addition to the second ones.

@calebcartwright
Member

calebcartwright commented Apr 1, 2022

I'd put this on hold for a bit as the release we had pending was already large and I didn't want to try to pull this in on top, but am circling back to it now.

I am, however, unsure what you were trying to test/assert with your last comment, as it seems duplicative of what was established earlier relative to timing: my prior point was that the run/execution time was largely the same regardless of whether there was an extra clone, so I'd rather avoid allocating increasingly long strings that provide no benefit at runtime, nor IMO any benefit at the source-code level.

And I want to mention Rc/Cow again, using it needs some change in most of the codebase (so out of scope for this PR), but it probably saves the first clones in addition to the second clones.

Apologies if I forgot to respond to this thread previously, but I saw it and agreed completely that I don't want to go down this route.

@calebcartwright
Member

calebcartwright commented Apr 1, 2022

Also, the review comment I left in #5139 (comment) is still outstanding and is something we'd need clarity on before proceeding. I can't tell if the shape is just wrong or if we've perhaps got a mapping issue, but something is wrong there.

If we can come to a resolution on that then I think I'll be good to move forward with this as-is, though I'll probably spend a little time afterwards doing more experimentation.

@HKalbasi
Member Author

HKalbasi commented Apr 2, 2022

so I'd rather avoid allocating increasingly long strings that provide no benefit at runtime, nor IMO any benefit at the source-code level.

I don't think its effect on memory usage would be more than ten megabytes even for machine-generated code (we can measure it), so is it that bad?

@calebcartwright
Member

I don't think its effect on memory usage would be more than ten megabytes even for machine-generated code (we can measure it), so is it that bad?

This is an increasingly moot topic of debate because as noted previously the only outstanding issue needing resolution is relative to this change producing incorrect formatting.

I'm not requesting that you make any changes on this piece; however, I want to stress that we don't want to incorporate unhelpful/unnecessary steps into the codebase, even when they don't clear some threshold of being "bad enough" to matter; we very much have a strategy of "only do that which is necessary".

There are plenty of cases, both easily contrived ones and practical ones within the rustfmt codebase, where some minimally (or even negligibly) impactful step could be taken unnecessarily, e.g. requiring a clone instead of working with a slice. However, we'd still want to avoid such an unnecessary action regardless, i.e. go with the slice instead of the unnecessary clone.

I'm simply pointing out that the exact same principle is applicable here: we need to have a rationale for the use of a clone; we don't want to use clones freely/loosely and then have to rationalize their removal/avoidance.

@calebcartwright
Member

Alrighty, let's give this a try!

@calebcartwright calebcartwright merged commit a37d3ab into rust-lang:master Apr 17, 2022
@ytmimi
Contributor

ytmimi commented Jun 22, 2022

@calebcartwright given that we just had to revert these changes in #5403 do we want to reopen all the linked issues?

@HKalbasi
Member Author

Ah, sorry for breaking rustfmt.

If I understand correctly, the problem is that the result is not a function of just the shape and the expr. Would including the fields of RewriteContext (except config, parse_sess, and snippet_provider, which seem to be global) in QueryId be enough to solve this problem?
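A sketch of what that widened key might look like (the extra fields below are merely illustrative of the kind of per-context flags RewriteContext carries; whether any particular set of fields would actually be sufficient is exactly the open question):

```rust
// Hypothetical widened cache key: besides the span and shape, include any
// contextual state that can change the formatting result. The field names
// below are illustrative, not a verified-complete list.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct QueryId {
    span_lo: u32,
    span_hi: u32,
    shape_width: usize,
    shape_indent: usize,
    inside_macro: bool,     // illustrative contextual flag
    is_if_else_block: bool, // illustrative contextual flag
}
```

Any context field left out of the key makes two genuinely different queries collide in the cache, which is the failure mode behind the revert.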

@calebcartwright
Member

No worries! This is a fairly tricky area and I didn't foresee this either.

I suspect there will be more to it, and that we may need a more advanced/conditional type of memoization flow like the one I was alluding to earlier in these discussions (albeit with even more conditions).
