LLVM/MIR optimizations to simplify the new RFC3058 `?` desugaring #85133

scottmcm · 2021-05-10T04:50:59Z

With #84767 implementing the new ? desugaring, there's no mir-opt that can simplify the double-match it emits, as the arm identity pass only does Result->Result, and thus cannot handle Result->ControlFlow->Result.

That's not a blocker for checkin, as the arm identity pass is disabled by default currently because of things like #78628 and it depends on the MIR inliner being enabled anyway, but it sure would be nice to do better here.

Zulip conversation with some more context: https://rust-lang.zulipchat.com/#narrow/stream/189540-t-compiler.2Fwg-mir-opt/topic/Suggestions.20for.20a.20beginner/near/237739360

The text was updated successfully, but these errors were encountered:

Moxinilian · 2021-05-10T06:30:48Z

I am interested in taking this one as a first contribution. I will probably have time to work on it this week.

Moxinilian · 2021-05-21T17:00:08Z

For interested people, conversation on Zulip moved to https://rust-lang.zulipchat.com/#narrow/stream/189540-t-compiler.2Fwg-mir-opt/topic/Optimizing.20a.20match.20on.20a.20just-constructed.20enum.20.2385133

michaelmaitland · 2021-07-12T19:57:31Z

I see this issue marked as open. Is it being worked on or available for me to pickup?

Moxinilian · 2021-07-13T11:29:02Z

It is being worked on, see #85646. Any suggestion/help is welcome, though!

…jgillot MIR opt: separate constant predecessors of a switch For each block S ending with a switch, this pass copies S for each of S's predecessors that seem to assign the value being switched over as a const. This is done using a somewhat simple heuristic to determine what seems to be a const transitively. More precisely, this is what the pass does: - find a block that ends in a switch - track if there is an unique place set before the current basic block that determines the result of the switch (this is the part that resolves switching over discriminants) - if there is, iterate over the parents that have a reasonable terminator and find if the found determining place is likely to be (transitively) set from a const within that parent block - if so, add the corresponding edge to a vector of edges to duplicate - once this is done, iterate over the found edges: copy the target block and replace the reference to the target block in the origin block with the new block This pass is not optimal and could probably duplicate in more cases, but the intention was mostly to address cases like in rust-lang#85133 or rust-lang#85365, to avoid creating new enums that get destroyed immediately afterwards (notably making the new try v2 `?` desugar zero-cost). A benefit of this pass working the way it does is that it is easy to ensure its correctness: the worst that can happen is for it to needlessly copy a basic block, which is likely to be destroyed by cleanup passes afterwards. The complex parts where aliasing matters are only heuristics and the hard work is left to further passes like ConstProp. # LLVM blocker Unfortunately, I believe it would be unwise to enable this optimization by default for now. Indeed, currently switch lowering passes like SimplifyCFG in LLVM lose the information on the set of possible variant values, which means it tends to actually generate worse code with this optimization enabled. A fix would have to be done in LLVM itself. This is something I also want to look into. I have opened [a bug report at the LLVM bug tracker](https://bugs.llvm.org/show_bug.cgi?id=50455). When this is done, I hope we can enable this pass by default. It should be fairly fast and I think it is beneficial in many cases. Notably, it should be a sound alternative to simplify-arm-identity. By the way, ConstProp only seems to pick up the optimization in functions that are not generic. This is however most likely an issue in ConstProp that I will look into afterwards. This is my first contribution to rustc, and I would like to thank everyone on the Zulip mir-opt chat for the help and support, and especially `@scottmcm` for the guidance.

BGR360 · 2021-08-23T22:07:52Z

Is this issue now resolved after @Moxinilian 's work?

Moxinilian · 2021-08-23T22:15:49Z

Right now it is not. Right now this is blocked by LLVM doing a sub-optimal transformation that loses the optimization at a very early stage, preventing the duplicated tails to be merged back into one. I've tried investigating if we could not merge them back directly in MIR. While possible in theory, my understanding was that MIR still abstracts away too much details (notably on the discriminant aspect) for the algorithm to be reasonably complex. So right now we either need a breakthrough on that front or work in the LLVM codebase. I want to start investigating more in the LLVM side of things, but I have not successfully allocated enough time for it as of now.

Moxinilian · 2021-08-23T22:32:12Z

More on fixing LLVM:

Right now, rustc calls into LLVM with a pass pipeline that starts with the SimplifyCFG LLVM pass that uses the LowerSwitch utility to turn switches into lower level constructs. An issue with that pass is that it is opinionated in what is worth simplifying away, which includes unreachable switch targets. So for example, the following LLVM IR:

define i64 @demo(i64 %x) {
entry:
    switch i64 %x, label %bb3 [
        i64 0, label %bb1
        i64 1, label %bb2
    ]
bb1:
    ret i64 0
bb2:
    %0 = icmp eq i64 %x, 100 ; this will necessarily be false because %x == 1
    br i1 %0, label %bb4, label %bb5
bb3:
    unreachable
bb4:
    ret i64 200
bb5:
    ret i64 10
}

is turned into

define i64 @demo(i64 %x) {
entry:
    %switch = icmp ult i64 %x, 1
    %0 = icmp eq i64 %x, 100
    %. = select i1 %0, i64 200, i64 10
    %merge = select i1 %switch, i64 0, i64 %.
    ret i64 %merge
}

which throws away the exhaustive nature of the switch, making in this example further passes not able to optimize the useless comparison away.

The code of LowerSwitch seems to contain an explicit part where it removes unreachable branches in switches. Forcing this off might change this behavior and make it work as intended. I have not gotten to try that yet, but plan to.

BGR360 · 2021-08-24T01:21:02Z

Oh, that's annoying. But very intriguing. Thanks for the explanation!

…nge comparison and branch until after at least the IPSCCP That transformation is lossy, as discussed in #53853 and rust-lang/rust#85133 (comment) This is an alternative to D119839, which would add a limited IPSCCP into SimplifyCFG. Unlike lowering switch to lookup, we still want this transformation to happen relatively early, but after giving a chance for the things like CVP to do their thing. It seems like deferring it just until the IPSCCP is enough for the tests at hand, but perhaps we need to be more aggressive and disable it until CVP. Fixes #53853 Refs. rust-lang/rust#85133 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D119854

LebedevRI · 2022-02-17T09:15:11Z

FWIW this has been fixed in llvm trunk in llvm/llvm-project@371fcb7.

…nge comparison and branch until after at least the IPSCCP That transformation is lossy, as discussed in llvm#53853 and rust-lang/rust#85133 (comment) This is an alternative to D119839, which would add a limited IPSCCP into SimplifyCFG. Unlike lowering switch to lookup, we still want this transformation to happen relatively early, but after giving a chance for the things like CVP to do their thing. It seems like deferring it just until the IPSCCP is enough for the tests at hand, but perhaps we need to be more aggressive and disable it until CVP. Fixes llvm#53853 Refs. rust-lang/rust#85133 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D119854

nikic · 2022-02-27T16:57:51Z

So ... what is the actual Rust code we're supposed to be optimizing here? Is the goal to get rid of the icmp + zext in https://rust.godbolt.org/z/nd9fEx3ed or is it something else? (The referenced LLVM change does not get rid of those.)

scottmcm · 2022-02-27T18:49:12Z

@nikic That's right. try { x? } not being a nop, as in your example, is the easiest demonstration.

As I understand it, the icmp shows up because there's a trunc in there originally, but LLVM has lost track of the fact that the truncation doesn't ever wrap.

(My uninformed thought: It's odd to me that there's no nsw/nuw flags for trunc, as a way to say "I promise this came from a [sz]ext". If there were, then we could emit those flags in the discriminant loading, and avoid the icmp.)

scottmcm · 2022-02-27T19:05:07Z

Actually, here's a better cut-down version of the example. This is (essentially) the first part of ?, but with no trait calls involved to muddle things: https://rust.godbolt.org/z/MWxdTcGn6

use std::ops::ControlFlow;
pub fn demo(x: Result<u32, i32>) -> ControlFlow<i32, u32> {
    match x {
        Ok(v) => ControlFlow::Continue(v),
        Err(e) => ControlFlow::Break(e),
    }
}

Ok and Continue are both discriminant 0; Err and Break are both discriminant 1.

It looks like the core of what we emit there is

  %8 = load i32, i32* %7, align 4, !dbg !10, !range !11
  %_2 = zext i32 %8 to i64, !dbg !10
  switch i64 %_2, label %bb2 [
    i64 0, label %bb3
    i64 1, label %bb1
  ], !dbg !12

where that range metadata is

!11 = !{i32 0, i32 2}

So AFAIK that PR above was about trying to avoid turning the switch into an icmp+br, which lost that range info.

nikic · 2022-02-27T19:47:57Z

Okay, so I think the PhaseOrdering change is necessary to avoid folding the switch before SROA (of course, the alternative solution is to make SROA emit range assumptions). But it's not sufficient:

*** IR Dump After InstCombinePass on _ZN4test4test17h60e5e3f4ec590b13E ***
; Function Attrs: nonlazybind uwtable
define { i64, i64 } @_ZN4test4test17h60e5e3f4ec590b13E(i64 %x.0, i64 %x.1) unnamed_addr #0 {
start:
  %0 = call fastcc { i64, i64 } @"_ZN79_$LT$core..result..Result$LT$T$C$E$GT$$u20$as$u20$core..ops..try_trait..Try$GT$6branch17hed04f6fc65ae78f0E"(i64 %x.0, i64 %x.1)
  %.fca.0.extract1 = extractvalue { i64, i64 } %0, 0
  %.fca.1.extract2 = extractvalue { i64, i64 } %0, 1
  switch i64 %.fca.0.extract1, label %bb3 [
    i64 0, label %bb2
    i64 1, label %bb4
  ]

bb3:                                              ; preds = %start
  unreachable

bb2:                                              ; preds = %start
  br label %bb6

bb4:                                              ; preds = %start
  %1 = call fastcc i64 @"_ZN153_$LT$core..result..Result$LT$T$C$F$GT$$u20$as$u20$core..ops..try_trait..FromResidual$LT$core..result..Result$LT$core..convert..Infallible$C$E$GT$$GT$$GT$13from_residual17h7fc30301c010dde1E"(i64 %.fca.1.extract2)
  br label %bb6

bb6:                                              ; preds = %bb2, %bb4
  %.sroa.3.0 = phi i64 [ %1, %bb4 ], [ %.fca.1.extract2, %bb2 ]
  %.sroa.0.0 = phi i64 [ 1, %bb4 ], [ 0, %bb2 ]
  %2 = insertvalue { i64, i64 } undef, i64 %.sroa.0.0, 0
  %3 = insertvalue { i64, i64 } %2, i64 %.sroa.3.0, 1
  ret { i64, i64 } %3
}
*** IR Dump After SimplifyCFGPass on _ZN4test4test17h60e5e3f4ec590b13E ***
; Function Attrs: nonlazybind uwtable
define { i64, i64 } @_ZN4test4test17h60e5e3f4ec590b13E(i64 %x.0, i64 %x.1) unnamed_addr #0 {
start:
  %0 = call fastcc { i64, i64 } @"_ZN79_$LT$core..result..Result$LT$T$C$E$GT$$u20$as$u20$core..ops..try_trait..Try$GT$6branch17hed04f6fc65ae78f0E"(i64 %x.0, i64 %x.1)
  %.fca.0.extract1 = extractvalue { i64, i64 } %0, 0
  %.fca.1.extract2 = extractvalue { i64, i64 } %0, 1
  %switch = icmp ult i64 %.fca.0.extract1, 1
  br i1 %switch, label %bb6, label %bb4

bb4:                                              ; preds = %start
  %1 = call fastcc i64 @"_ZN153_$LT$core..result..Result$LT$T$C$F$GT$$u20$as$u20$core..ops..try_trait..FromResidual$LT$core..result..Result$LT$core..convert..Infallible$C$E$GT$$GT$$GT$13from_residual17h7fc30301c010dde1E"(i64 %.fca.1.extract2)
  br label %bb6

bb6:                                              ; preds = %start, %bb4
  %.sroa.3.0 = phi i64 [ %1, %bb4 ], [ %.fca.1.extract2, %start ]
  %.sroa.0.0 = phi i64 [ 1, %bb4 ], [ 0, %start ]
  %2 = insertvalue { i64, i64 } undef, i64 %.sroa.0.0, 0
  %3 = insertvalue { i64, i64 } %2, i64 %.sroa.3.0, 1
  ret { i64, i64 } %3
}

We'd need either InstCombine or SimplifyCFG to fold %.sroa.0.0 into %.fca.0.extract1.

I believe SimplifyCFG would do that as part of ForwardSwitchConditionToPHI, but this is only done at the very end of the pipeline.

nikic · 2022-02-27T22:17:58Z

Just remembered that we have https://github.com/llvm/llvm-project/blob/fdfe26ddbeb1a5eb6106a6a27d6f8240deca543f/llvm/lib/Transforms/InstCombine/InstCombinePHI.cpp#L1256 in InstCombine. I guess it would be natural to extend to that to switches and not just conditional branches.

…nge comparison and branch until after at least the IPSCCP That transformation is lossy, as discussed in llvm#53853 and rust-lang/rust#85133 (comment) This is an alternative to D119839, which would add a limited IPSCCP into SimplifyCFG. Unlike lowering switch to lookup, we still want this transformation to happen relatively early, but after giving a chance for the things like CVP to do their thing. It seems like deferring it just until the IPSCCP is enough for the tests at hand, but perhaps we need to be more aggressive and disable it until CVP. Fixes llvm#53853 Refs. rust-lang/rust#85133 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D119854 (cherry picked from commit 371fcb7)

…nge comparison and branch until after at least the IPSCCP That transformation is lossy, as discussed in llvm#53853 and rust-lang/rust#85133 (comment) This is an alternative to D119839, which would add a limited IPSCCP into SimplifyCFG. Unlike lowering switch to lookup, we still want this transformation to happen relatively early, but after giving a chance for the things like CVP to do their thing. It seems like deferring it just until the IPSCCP is enough for the tests at hand, but perhaps we need to be more aggressive and disable it until CVP. Fixes llvm#53853 Refs. rust-lang/rust#85133 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D119854

nikic · 2022-03-18T08:34:16Z

Upstream fix is llvm/llvm-project@4010a7a.

nikic · 2022-08-30T12:55:44Z

The LLVM side of this is fixed now, and #100693 added a codegen test. Is a MIR transform still needed?

scottmcm · 2022-08-30T17:21:45Z

I'll close this one because it does look like the match-of-match, as ? does, is now working great.

Unfortunately, however, when just matching the result directly and rebuilding, it seems to only work for some widths. The codegen test works with Result<i32, u32>, but doesn't optimize away any more for Result<i64, u64>. I've opened #101210 to track that separately, @nikic .

…nge comparison and branch until after at least the IPSCCP That transformation is lossy, as discussed in llvm/llvm-project#53853 and rust-lang/rust#85133 (comment) This is an alternative to D119839, which would add a limited IPSCCP into SimplifyCFG. Unlike lowering switch to lookup, we still want this transformation to happen relatively early, but after giving a chance for the things like CVP to do their thing. It seems like deferring it just until the IPSCCP is enough for the tests at hand, but perhaps we need to be more aggressive and disable it until CVP. Fixes llvm/llvm-project#53853 Refs. rust-lang/rust#85133 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D119854

scottmcm added C-enhancement Category: An issue proposing an enhancement or a PR with one. A-mir-opt Area: MIR optimizations F-try_trait_v2 Tracking issue for RFC#3058 labels May 10, 2021

This was referenced May 10, 2021

Implement the new desugaring from try_trait_v2 #84767

Merged

Tracking Issue for try_trait_v2, A new design for the ? desugaring (RFC#3058) #84277

Open

scottmcm added the E-help-wanted Call for participation: Help is requested to fix this issue. label May 10, 2021

scottmcm mentioned this issue May 18, 2021

Vec::pop() sometimes returns value inefficiently. #85365

Open

Moxinilian mentioned this issue May 24, 2021

MIR opt: separate constant predecessors of a switch #85646

Merged

scottmcm mentioned this issue Jun 9, 2021

Option->Result->Option used in ?-on-Option isn't a no-op #49437

Closed

scottmcm mentioned this issue Nov 27, 2021

take on bool rust-lang/rfcs#3189

Closed

scottmcm mentioned this issue Jan 19, 2022

The panic branch of Some(x).unwrap() is not removed for some types #93011

Closed

scottmcm mentioned this issue Jan 27, 2022

Extend uninhabited enum variant branch elimination to also affect fallthrough #93387

Merged

wweiandrew mentioned this issue Feb 15, 2022

[SimplifyCFG] Need to remove useless comparison after turning switch to icmp llvm/llvm-project#53853

Closed

Amanieu mentioned this issue Feb 26, 2022

[SimplifyCFG][PhaseOrdering] Defer lowering switch into an integer range comparison and branch until after at least the IPSCCP rust-lang/llvm-project#132

Closed

nikic self-assigned this Mar 18, 2022

nikic added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code. labels Mar 18, 2022

scottmcm mentioned this issue Aug 17, 2022

Reduced performance when using question mark operator instead of try! #37939

Closed

nikic removed their assignment Aug 30, 2022

scottmcm changed the title ~~Wishlist: a MIR optimization to simplify the new RFC3058 ? desugaring~~ LLVM/MIR optimizations to simplify the new RFC3058 ? desugaring Aug 30, 2022

scottmcm closed this as completed Aug 30, 2022

scottmcm removed the E-help-wanted Call for participation: Help is requested to fix this issue. label Aug 30, 2022

scottmcm mentioned this issue Oct 21, 2022

Use br instead of switch in more cases. #103331

Merged

scottmcm mentioned this issue Nov 15, 2023

Change Entity::generation from u32 to NonZeroU32 for niche optimization bevyengine/bevy#9907

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LLVM/MIR optimizations to simplify the new RFC3058 `?` desugaring #85133

LLVM/MIR optimizations to simplify the new RFC3058 `?` desugaring #85133

scottmcm commented May 10, 2021

Moxinilian commented May 10, 2021

Moxinilian commented May 21, 2021 •

edited

Loading

michaelmaitland commented Jul 12, 2021

Moxinilian commented Jul 13, 2021 •

edited

Loading

BGR360 commented Aug 23, 2021

Moxinilian commented Aug 23, 2021 •

edited

Loading

Moxinilian commented Aug 23, 2021 •

edited

Loading

Moxinilian commented Aug 23, 2021

BGR360 commented Aug 24, 2021

LebedevRI commented Feb 17, 2022

nikic commented Feb 27, 2022

scottmcm commented Feb 27, 2022

scottmcm commented Feb 27, 2022

nikic commented Feb 27, 2022

nikic commented Feb 27, 2022

nikic commented Mar 18, 2022

nikic commented Aug 30, 2022

scottmcm commented Aug 30, 2022

LLVM/MIR optimizations to simplify the new RFC3058 ? desugaring #85133

LLVM/MIR optimizations to simplify the new RFC3058 ? desugaring #85133

Comments

scottmcm commented May 10, 2021

Moxinilian commented May 10, 2021

Moxinilian commented May 21, 2021 • edited Loading

michaelmaitland commented Jul 12, 2021

Moxinilian commented Jul 13, 2021 • edited Loading

BGR360 commented Aug 23, 2021

Moxinilian commented Aug 23, 2021 • edited Loading

Moxinilian commented Aug 23, 2021 • edited Loading

Moxinilian commented Aug 23, 2021

BGR360 commented Aug 24, 2021

LebedevRI commented Feb 17, 2022

nikic commented Feb 27, 2022

scottmcm commented Feb 27, 2022

scottmcm commented Feb 27, 2022

nikic commented Feb 27, 2022

nikic commented Feb 27, 2022

nikic commented Mar 18, 2022

nikic commented Aug 30, 2022

scottmcm commented Aug 30, 2022

LLVM/MIR optimizations to simplify the new RFC3058 `?` desugaring #85133

LLVM/MIR optimizations to simplify the new RFC3058 `?` desugaring #85133

Moxinilian commented May 21, 2021 •

edited

Loading

Moxinilian commented Jul 13, 2021 •

edited

Loading

Moxinilian commented Aug 23, 2021 •

edited

Loading

Moxinilian commented Aug 23, 2021 •

edited

Loading