Missed optimization comparing TypeId arrays #84253

AngelicosPhosphoros · 2021-04-16T18:48:41Z

Edit: Look nikic example below.

I tried this code:

use std::any::TypeId;

trait CheckEqTuple{
    type OutSliceType: AsRef<[TypeId]> + AsMut<[TypeId]>;
    fn get_unsorted_ids()->Self::OutSliceType;
    fn get_quick_sorted_ids()->Self::OutSliceType{
        let mut ret = Self::get_unsorted_ids();
        ret.as_mut().sort_unstable();
        ret
    }
    fn get_slow_sorted_ids()->Self::OutSliceType{
        let mut ret = Self::get_unsorted_ids();
        ret.as_mut().sort();
        ret
    }
}

impl<A: 'static, B: 'static> CheckEqTuple for (A, B){
    type OutSliceType = [TypeId; 2];
    fn get_unsorted_ids()->Self::OutSliceType{
        [TypeId::of::<A>(), TypeId::of::<B>()]
    }
}

fn cmp_unord<A: CheckEqTuple, B: CheckEqTuple>()->bool{
    A::get_unsorted_ids().as_ref()
     == B::get_unsorted_ids().as_ref()
}

fn cmp_ord_q<A: CheckEqTuple, B: CheckEqTuple>()->bool{
    A::get_quick_sorted_ids().as_ref() 
        == B::get_quick_sorted_ids().as_ref()
}

fn cmp_ord_s<A: CheckEqTuple, B: CheckEqTuple>()->bool{
    A::get_slow_sorted_ids().as_ref()
     == B::get_slow_sorted_ids().as_ref()
}

pub fn cmp_same()->bool{
    cmp_unord::<(i32, bool), (i32, bool)>()
}

pub fn cmp_diff_order()->bool{
    cmp_unord::<(bool, i32), (i32, bool)>()
}

pub fn cmp_ord()->bool{
    cmp_ord_q::<(bool, i32), (i32, bool)>()
}

pub fn cmp_ord2()->bool{
    cmp_ord_s::<(bool, i32), (i32, bool)>()
}

I expected to see this happen:

This code should compile to return true or return false.

example::cmp_same:
        mov     al, 1
        ret

example::cmp_diff_order:
        xor     eax, eax
        ret

example::cmp_ord:
        mov     al, 1
        ret

Instead, this happened:

asm contains useless operations

example::cmp_same:
        sub     rsp, 32
        movabs  rax, -8661621401413125213
        mov     qword ptr [rsp + 8], rax
        mov     al, 1
        add     rsp, 32
        ret

example::cmp_diff_order:
        sub     rsp, 32
        movabs  rax, -5015437470765251660
        mov     qword ptr [rsp + 8], rax
        xor     eax, eax
        add     rsp, 32
        ret

example::cmp_ord:
        sub     rsp, 32
        movabs  rax, -5015437470765251660
        mov     qword ptr [rsp + 8], rax
        mov     al, 1
        add     rsp, 32
        ret

Proposed fix

If I add additional SROA flag, useless code is eliminated. I think, we need to add another SROA to pipeline somewhere before last instcombine and dead code elimination.

Here godbolt link.

The text was updated successfully, but these errors were encountered:

nikic · 2021-04-16T20:56:58Z

A simpler starting point that already doesn't optimize: https://godbolt.org/z/rvMah5b5o

AngelicosPhosphoros · 2021-04-16T21:31:36Z

@nikic Do you know if rustc provides some custom LLVM passes to pipeline? After skimming of source, it looks like it just uses default optimization levels from LLVM.

nikic · 2021-04-17T10:55:41Z

@AngelicosPhosphoros Yes, rust uses only the default optimization pipeline.

In this case, I believe the problem lies with unrolling. After some cleanup, we have this IR going into loop-unroll-full: https://gist.github.com/nikic/9516dede0acedd8d320e9d067559e667

If I understand correctly, it is not unrolled fully because we know small upper bound on the tripcount (3), but the target (X86) does not enable UpperBound unrolling. It does work with -mtriple=arm, which enables it. This was kinda surprising to me, given LLVMs usual propensity to vectorize and unroll everything beyond any reasonable bounds. There's this comment in the implementation:

  // We can unroll by the upper bound amount if it's generally allowed or if
  // we know that the loop is executed either the upper bound or zero times.
  // (MaxOrZero unrolling keeps only the first loop test, so the number of
  // loop tests remains the same compared to the non-unrolled version, whereas
  // the generic upper bound unrolling keeps all but the last loop test so the
  // number of loop tests goes up which may end up being worse on targets with
  // constrained branch predictor resources so is controlled by an option.)
  // In addition we only unroll small upper bounds.

Which makes me think that this option should probably enabled on X86, which is certainly not a system with "constrained branch predictor resources".

Though there is another important factor here: What the constant upper bound doesn't capture, is the fact that in this case, there's two loop exits. One of the exits will be replicated by unrolling. However, the other one is fully determined by unrolling. These exits will be eliminated, together with the induction variable.

This means that the loop unrolling heuristic is inconsistent: If the loop had an interior (non-exiting) branch, then it would get fully unrolled, even though that replicates the exact same number of branches. It's just a difference between whether those branches are exiting or not.

nikic · 2021-04-17T16:30:04Z

Looking a bit further, there is this code: https://github.com/llvm/llvm-project/blob/ebc6608fb79057eaed27435d62d5dea0979bd9d3/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp#L1099-L1108 For exact trip counts, LoopUnroll will actually not use the loop trip count, but rather the trip count of a single exit or the latch exit. This is really the case we want to fall into. Unfortunately, in our case the latch exit is the comparison of the loads (which has an unknown exit count) rather than the comparison of the induction variant (which is known).

There's two ways that could be addressed: The LoopUnroll logic could be more general about which exit count it uses. Or LoopRotate could be adjusted to perform a rotation in this case despite already having an exiting latch (there is already a heuristic for that, but it does not account for this case).

nikic · 2021-06-26T20:49:41Z

This issue should be addressed by https://reviews.llvm.org/D102982 (plus about a dozen preparatory patches to make loop unrolling robust against multiple exits).

nikic · 2021-08-22T13:40:40Z

This is fixed by #87570. New IR folds to true/false:

@_ZN5test28cmp_same17hf172c8bcc9b7859cE = unnamed_addr alias i1 (), i1 ()* @_ZN5test27cmp_ord17ha9ee7b72078df8e5E
@_ZN5test28cmp_ord217h23f0a5da1c84ecaaE = unnamed_addr alias i1 (), i1 ()* @_ZN5test27cmp_ord17ha9ee7b72078df8e5E

; test2::cmp_diff_order
; Function Attrs: mustprogress nofree norecurse nosync nounwind nonlazybind readnone uwtable willreturn
define zeroext i1 @_ZN5test214cmp_diff_order17hd0c16acee48eb39bE() unnamed_addr #0 personality i32 (i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*)* @rust_eh_personality {
start:
  ret i1 false
}

; test2::cmp_ord
; Function Attrs: mustprogress nofree norecurse nosync nounwind nonlazybind readnone uwtable willreturn
define zeroext i1 @_ZN5test27cmp_ord17ha9ee7b72078df8e5E() unnamed_addr #0 personality i32 (i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*)* @rust_eh_personality {
start:
  ret i1 true
}

AngelicosPhosphoros added the C-bug Category: This is a bug. label Apr 16, 2021

AngelicosPhosphoros changed the title ~~Missed optimization with tuple equality~~ Missed optimization with tuple type equality Apr 16, 2021

AngelicosPhosphoros changed the title ~~Missed optimization with tuple type equality~~ Missed optimization comparing TypeId arrays Apr 17, 2021

nikic added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code. labels Apr 17, 2021

nikic self-assigned this Jun 26, 2021

nikic closed this as completed Aug 22, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Missed optimization comparing TypeId arrays #84253

Missed optimization comparing TypeId arrays #84253

AngelicosPhosphoros commented Apr 16, 2021 •

edited

Loading

nikic commented Apr 16, 2021

AngelicosPhosphoros commented Apr 16, 2021

nikic commented Apr 17, 2021

nikic commented Apr 17, 2021

nikic commented Jun 26, 2021

nikic commented Aug 22, 2021

Missed optimization comparing TypeId arrays #84253

Missed optimization comparing TypeId arrays #84253

Comments

AngelicosPhosphoros commented Apr 16, 2021 • edited Loading

Meta

Proposed fix

nikic commented Apr 16, 2021

AngelicosPhosphoros commented Apr 16, 2021

nikic commented Apr 17, 2021

nikic commented Apr 17, 2021

nikic commented Jun 26, 2021

nikic commented Aug 22, 2021

AngelicosPhosphoros commented Apr 16, 2021 •

edited

Loading