Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[mir] or-patterns duplicate the same branch #75439

Open
tesuji opened this issue Aug 12, 2020 · 13 comments
Open

[mir] or-patterns duplicate the same branch #75439

tesuji opened this issue Aug 12, 2020 · 13 comments
Labels
A-mir Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html A-mir-opt Area: MIR optimizations A-patterns Relating to patterns and pattern matching C-enhancement Category: An issue proposing an enhancement or a PR with one. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such F-or_patterns `#![feature(or_patterns)]` T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@tesuji
Copy link
Contributor

tesuji commented Aug 12, 2020

Consider this snippet:

#![feature(const_fn_transmute)]
#![feature(or_patterns)]

use std::mem::transmute;

pub const fn foo3(bytes: [u8; 16]) -> Option<[u8; 4]> {
    // big endian `u32`s
    let dwords: [u32; 4] = unsafe { transmute(bytes) };
    const FF: u32 = 0x0000_ffff_u32.to_be();
    if let [0, 0, 0 | FF, ip] = dwords {
        Some(unsafe { transmute(ip) })
    } else {
        None
    }
}

The or-pattern 0 | FF has this MIR:

    bb3: {
        switchInt(_2[2 of 4]) -> [0_u32: bb6, 4294901760_u32: bb7, otherwise: bb4]; // scope 1 at src/lib.rs:10:19: 10:20
    }

While bb6 and bb7 points to the same branch:

    bb6: {
        StorageLive(_4);                 // scope 1 at src/lib.rs:10:27: 10:29
        _4 = _2[3 of 4];                 // scope 1 at src/lib.rs:10:27: 10:29
        goto -> bb5;                     // scope 1 at src/lib.rs:10:5: 14:6
    }

    bb7: {
        StorageLive(_4);                 // scope 1 at src/lib.rs:10:27: 10:29
        _4 = _2[3 of 4];                 // scope 1 at src/lib.rs:10:27: 10:29
        goto -> bb5;                     // scope 1 at src/lib.rs:10:5: 14:6
    }

I would expect the or-pattern points the same branch.

@tesuji

This comment has been minimized.

@rustbot rustbot added A-mir Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html A-patterns Relating to patterns and pattern matching F-or_patterns `#![feature(or_patterns)]` labels Aug 12, 2020
@jonas-schievink jonas-schievink added C-enhancement Category: An issue proposing an enhancement or a PR with one. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Aug 12, 2020
@JulianKnodt
Copy link
Contributor

Was this resolved by #75537? I believe that this would actually match on it, just because all the statements are the same

@tesuji
Copy link
Contributor Author

tesuji commented Aug 16, 2020

I've just checked. The MIR isn't improved with #75537 .

@JulianKnodt
Copy link
Contributor

Ah woops yea I can't count, this has 3 branches not 2, which is the only permitted number of branches being optimized currently.

@wesleywiser wesleywiser added the A-mir-opt Area: MIR optimizations label Aug 18, 2020
matklad added a commit to matklad/rust that referenced this issue Sep 4, 2020
Add test for checking duplicated branch or-patterns

This adds a regression test for checking `or-patterns` in MIR as shown in rust-lang#75439.
This doesn't introduce a fix as I'm not sure where it would go(I suspect maybe here: src/librustc_mir_build/build/matches/mod.rs), and I'm not particularly able to fix it.

cc: @lzutao
matklad added a commit to matklad/rust that referenced this issue Sep 4, 2020
Add test for checking duplicated branch or-patterns

This adds a regression test for checking `or-patterns` in MIR as shown in rust-lang#75439.
This doesn't introduce a fix as I'm not sure where it would go(I suspect maybe here: src/librustc_mir_build/build/matches/mod.rs), and I'm not particularly able to fix it.

cc: @lzutao
matklad added a commit to matklad/rust that referenced this issue Sep 4, 2020
Add test for checking duplicated branch or-patterns

This adds a regression test for checking `or-patterns` in MIR as shown in rust-lang#75439.
This doesn't introduce a fix as I'm not sure where it would go(I suspect maybe here: src/librustc_mir_build/build/matches/mod.rs), and I'm not particularly able to fix it.

cc: @lzutao
matklad added a commit to matklad/rust that referenced this issue Sep 4, 2020
Add test for checking duplicated branch or-patterns

This adds a regression test for checking `or-patterns` in MIR as shown in rust-lang#75439.
This doesn't introduce a fix as I'm not sure where it would go(I suspect maybe here: src/librustc_mir_build/build/matches/mod.rs), and I'm not particularly able to fix it.

cc: @lzutao
RalfJung added a commit to RalfJung/rust that referenced this issue Sep 19, 2020
Add test for checking duplicated branch or-patterns

This adds a regression test for checking `or-patterns` in MIR as shown in rust-lang#75439.
This doesn't introduce a fix as I'm not sure where it would go(I suspect maybe here: src/librustc_mir_build/build/matches/mod.rs), and I'm not particularly able to fix it.

cc: @lzutao
RalfJung added a commit to RalfJung/rust that referenced this issue Sep 19, 2020
Add test for checking duplicated branch or-patterns

This adds a regression test for checking `or-patterns` in MIR as shown in rust-lang#75439.
This doesn't introduce a fix as I'm not sure where it would go(I suspect maybe here: src/librustc_mir_build/build/matches/mod.rs), and I'm not particularly able to fix it.

cc: @lzutao
@mark-i-m
Copy link
Member

mark-i-m commented Nov 7, 2020

This might be a suitable way to solve the same problem, but I'm not really sure:

fn merge_trivial_subcandidates(

i.e. combine them when building the MIR in the first place.

@osa1
Copy link
Contributor

osa1 commented Jan 24, 2021

I was able to repro this today (85e355e).

@osa1
Copy link
Contributor

osa1 commented Feb 14, 2021

@rustbot claim

I'm not sure if I'll be able to fix this, but I spent some time this weekend studying lowering to MIR and tried to understand the issue here.

First of all, the inefficiency here exists even in the final LLVM (with -O), so I think this may worth fixing. LLVM for the original repro:

issue75439 $ rustc +stage1_2 test.rs --crate-type=rlib --emit=llvm-ir -O
issue75439 $ cat test.ll
; ModuleID = 'test.3a1fbbbh-cgu.0'
source_filename = "test.3a1fbbbh-cgu.0"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

; test::foo3
; Function Attrs: norecurse nounwind nonlazybind readnone uwtable
define i40 @_ZN4test4foo317h93f9bb6eba5df219E(i128 %0) unnamed_addr #0 {
start:
  %1 = and i128 %0, 18446744073709551615
  %2 = icmp eq i128 %1, 0
  br i1 %2, label %bb3, label %bb9

bb3:                                              ; preds = %start
  %_3.sroa.5.0.extract.shift = lshr i128 %0, 64
  %_3.sroa.5.0.extract.trunc = trunc i128 %_3.sroa.5.0.extract.shift to i32
  switch i32 %_3.sroa.5.0.extract.trunc, label %bb9 [
    i32 0, label %bb6
    i32 -65536, label %bb7
  ]

bb6:                                              ; preds = %bb3
  br label %bb9

bb7:                                              ; preds = %bb3
  br label %bb9

bb9:                                              ; preds = %bb7, %bb6, %start, %bb3
  %.sroa.0.0 = phi i40 [ 0, %bb3 ], [ 0, %start ], [ 1, %bb6 ], [ 1, %bb7 ]
  %3 = lshr i128 %0, 88
  %4 = trunc i128 %3 to i40
  %.sroa.3.0.insert.shift = and i40 %4, -256
  %.sroa.0.0.insert.insert = or i40 %.sroa.0.0, %.sroa.3.0.insert.shift
  ret i40 %.sroa.0.0.insert.insert
}

attributes #0 = { norecurse nounwind nonlazybind readnone uwtable "probe-stack"="__rust_probestack" "target-cpu"="x86-64" }

!llvm.module.flags = !{!0, !1}

!0 = !{i32 7, !"PIC Level", i32 2}
!1 = !{i32 2, !"RtLibUseGOT", i32 1}

bb6 and bb7 are the targets of the or pattern and they're duplicates. I'm not sure why LLVM can't eliminate those.

Now, I looked at why these duplicate blocks are generated, in MIR. I believe the reason is the same as the reason why we generate duplicate blocks in this simpler program:

pub fn test(dwords: [u32; 2]) {
    match dwords {
        [0 | 1, _a] => (),
        _ => (),
    }
}

LLVM is able to optimize the duplicate blocks in this program, but in MIR we still have duplication very similar to the original repro:

fn test(_1: [u32; 2]) -> () {
    debug dwords => _1;
    let mut _0: ();
    let _2: u32;
    scope 1 {
        debug _a => _2;
    }

    bb0: {
        switchInt(_1[0 of 2]) -> [0_u32: bb2, 1_u32: bb3, otherwise: bb4];
    }

    bb1: {
        StorageDead(_2);
        goto -> bb4;
    }

    bb2: {
        StorageLive(_2);
        _2 = _1[1 of 2];
        goto -> bb1;
    }

    bb3: {
        StorageLive(_2);
        _2 = _1[1 of 2];
        goto -> bb1;
    }

    bb4: {
        return;
    }
}

Namely bb2 and bb3.

I think if we could generate better MIR for this simpler program that would also fix the problem with the original repro.

Here's the original MIR for this program (before any passes):

fn test(_1: [u32; 2]) -> () {
    debug dwords => _1;
    let mut _0: ();
    let _2: u32;
    scope 1 {
        debug _a => _2;
    }

    bb0: {
        FakeRead(ForMatchedPlace, _1);
        switchInt(_1[0 of 2]) -> [0_u32: bb1, 1_u32: bb2, otherwise: bb3];
    }

    bb1: {
        falseEdge -> [real: bb5, imaginary: bb2];
    }

    bb2: {
        falseEdge -> [real: bb6, imaginary: bb3];
    }

    bb3: {
        _0 = ();
        goto -> bb7;
    }

    bb4: {
        _0 = ();
        StorageDead(_2);
        goto -> bb7;
    }

    bb5: {
        StorageLive(_2);
        _2 = _1[1 of 2];
        goto -> bb4;
    }

    bb6: {
        StorageLive(_2);
        _2 = _1[1 of 2];
        goto -> bb4;
    }

    bb7: {
        return;
    }
}

The reason for this duplication is: originally for this program, in the first arm, we have a candidate with two "match pairs": the or pattern, and the binding (_a). We simplify this (simplify_candidate) to a candidate with a binding (_a) and two subcandidates, for the branches of the or pattern.

When generating MIR for a subcandidate, we first generate a block just to be able to chain blocks of arms for borrow checking (using "false eges"). This is why we create bb5 when generating bb1, and bb6 when generating bb2.

Then in those new blocks (bb5 and bb6) we introduce the bindings of the parent candidate, with some extra stuff for ascriptions and guards. This generates the duplicate code

        StorageLive(_2);
        _2 = _1[1 of 2];
        goto -> ...;

for all alternatives of the or pattern.

(The intermediate blocks bb1 and bb2 are removed after SimplifyBranches and the next SimplifyCfg)

I think it's possible to generate better code here when the subcandidates (transitively) don't have any bindings, ascriptions, and guards (I don't understand the guards in details yet so I don't know if there are room for optimization also). In those cases, before generating MIR for the subcandidates, we create a "target" block where we will generate the binding code. Then create branches for each subcandidate as we do today (with the false edges), but these subcandidates jump to the "target" block instead of to fresh blocks. The target block will then have the code for the binding in the parent candidate.

In other words we generate one block for the parent candidate's bindings instead of introducing the bindings it in each subcandidate.

I actually have an implementation of this, but it causes linker errors when compiling libraries. I'll submit a PR once it's working.

One problem is the idea is the check "do the subcandidates have any bindings, ascriptions, or guards" requires traversing the entire candidate tree. I don't know if we can implement this more efficiently somehow.

The MIR (before passes) with this implementation would look like:

fn test(_1: [u32; 2]) -> () {
    debug dwords => _1;
    let mut _0: ();
    let _2: u32;
    scope 1 {
        debug _a => _2;
    }

    bb0: {
        FakeRead(ForMatchedPlace, _1);
        switchInt(_1[0 of 2]) -> [0_u32: bb1, 1_u32: bb2, otherwise: bb3];
    }

    bb1: {
        falseEdge -> [real: bb5, imaginary: bb2];
    }

    bb2: {
        falseEdge -> [real: bb5, imaginary: bb3];
    }

    bb3: {
        _0 = ();
        goto -> bb6;
    }

    bb4: {
        _0 = ();
        StorageDead(_2);
        goto -> bb6;
    }

    bb5: {
        StorageLive(_2);
        _2 = _1[1 of 2];
        goto -> bb4;
    }

    bb6: {
        return;
    }
}

Because we no longer have duplicate blocks for or pattern alternatives the issue is fixed.

@mark-i-m
Copy link
Member

It looks like you already figured out most of this, but here some notes I wrote from the last time I went through some of that code: rust-lang/rustc-dev-guide#943

@osa1
Copy link
Contributor

osa1 commented Feb 17, 2021

Thanks @mark-i-m, that's very helpful.

Update: I figured this out and have a working prototype. On the way I was also able to come up with examples where the duplication is much larger then the original repro or my simpler repro above, which my code fixes. I'm hoping to submit a PR this weekend.

@osa1
Copy link
Contributor

osa1 commented Feb 20, 2021

Update: my patch works mostly fine. There are two failures in the test suite, both are caused by disappeared StorageDead instructions after merging the duplicate blocks. I don't make any changes related to drop scheduling so I'm not sure what the reason is. I asked a question about this on Zulip. Once that's fixed it will be passing the test suite and I'll submit a PR.

In the meantime, if anyone knows about the question I linked above, that would be very helpful.

@workingjubilee workingjubilee added the C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such label Oct 8, 2023
@Nadrieril
Copy link
Member

One problem is the idea is the check "do the subcandidates have any bindings, ascriptions, or guards" requires traversing the entire candidate tree.

Isn't that already what we do in merge_trivial_subcandidates though? My surface understanding is that merge_trivial_subcandidates should solve exactly this, but clearly it doesn't :/

@osa1 osa1 removed their assignment Feb 7, 2024
@JulianKnodt
Copy link
Contributor

JulianKnodt commented Feb 7, 2024

since this is reviving a long dead old post, you probably want to double check this behavior is still happening, and close this issue if not.

Even if it's not still happening, it might be good to close the issue and start a new one so it's more easily tracked (and doesn't ping people who aren't working on it actively anymore)

@Nadrieril
Copy link
Member

I did check, it's still duplicated

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-mir Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html A-mir-opt Area: MIR optimizations A-patterns Relating to patterns and pattern matching C-enhancement Category: An issue proposing an enhancement or a PR with one. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such F-or_patterns `#![feature(or_patterns)]` T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

9 participants