
[X86] Garbage in undemanded vector elements can cause fdiv performance drops #61002

@RKSimon

This is related to #60632

We've noticed that when dealing with partially demanded or short vectors, the values in the undemanded elements can cause performance drops in some fp instructions (most notably divps, but also sqrtps/dpps), even with DAZ/FTZ enabled. This has been seen most often on btver2 targets, but I expect there are other CPUs that can be affected in other ways.

Sometimes this appears to be caused by values that would raise fp exceptions (divide-by-zero etc., even if those exceptions have been disabled); other times it's just because the values are particularly large or poorly canonicalized. Basically, if an element's bits don't represent a typical float value, some weaker fdiv units seem likely to drop to a slower execution path.
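As a rough illustration of what "not a typical float value" can mean at the bit level, the Python sketch below classifies a 32-bit lane as an IEEE-754 binary32 normal, zero, denormal, infinity or NaN. The helpers `f32_bits`, `classify_f32_bits` and `lane_is_suspect` are hypothetical, and the "suspect" rule is an assumption for illustration only: it does not model the actual slow-path criteria of btver2 or any other CPU, which this issue does not pin down (it also ignores "particularly large" normal values).

```python
import struct


def f32_bits(x: float) -> int:
    """Return the IEEE-754 binary32 bit pattern of x (rounded to float32)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def classify_f32_bits(bits: int) -> str:
    """Classify a 32-bit pattern by its exponent and mantissa fields."""
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent == 0xFF:
        return "nan" if mantissa else "inf"
    if exponent == 0:
        return "denormal" if mantissa else "zero"
    return "normal"


def lane_is_suspect(num_bits: int, den_bits: int) -> bool:
    """Hypothetical heuristic (an assumption, not a CPU model): flag a
    divps lane unless the dividend is normal or zero and the divisor is
    a normal number."""
    num = classify_f32_bits(num_bits)
    den = classify_f32_bits(den_bits)
    return num not in ("normal", "zero") or den != "normal"


# Garbage left in an undemanded lane can easily form any of these patterns.
for pattern in (0x00000001, 0x7F800000, 0xFFFFFFFF, 0x00000000):
    print(hex(pattern), classify_f32_bits(pattern))
```

Under this assumed rule, a lane dividing 6.0 by 3.0 is not flagged, while a lane whose divisor is zero, or whose dividend is an all-ones NaN pattern, is.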

Pulling out exact examples is proving to be tricky, but something like:

define <2 x float> @fdiv_post_shuffle(<2 x float> %a0, <2 x float> %a1) {
    %d = fdiv <2 x float> %a0, %a1
    %s = shufflevector <2 x float> %d, <2 x float> poison, <2 x i32> <i32 1, i32 0>
    ret <2 x float> %s
}
fdiv_post_shuffle:
        vdivps  %xmm1, %xmm0, %xmm0
        vpermilps       $225, %xmm0, %xmm0      # xmm0 = xmm0[1,0,2,3]
        retq

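would be better if the shuffle were applied to the operands before the divide, so that every lane of the vdivps inputs holds a copy of a demanded element, i.e. something like: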

fdiv_pre_shuffle:
        vpermilps       $17, %xmm0, %xmm0       # xmm0 = xmm0[1,0,1,0]
        vpermilps       $17, %xmm1, %xmm1       # xmm1 = xmm1[1,0,1,0]
        vdivps  %xmm1, %xmm0, %xmm0
        retq
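The difference between the two sequences can be sketched with a symbolic lane model in Python. It decodes the vpermilps imm8 (each 2-bit field selects the source element for one destination lane) and treats vdivps as four independent lane divisions. The `?` strings stand in for the undefined upper lanes of the `<2 x float>` arguments; the helper names are hypothetical, and the model ignores timing entirely. It only shows which operands each lane of the divide sees.

```python
def vpermilps_order(imm8: int) -> list[int]:
    """Decode a vpermilps xmm/imm8 control: lane i takes source element
    (imm8 >> 2*i) & 3."""
    return [(imm8 >> (2 * lane)) & 0b11 for lane in range(4)]


def permute(v: list[str], imm8: int) -> list[str]:
    """Symbolic vpermilps on a 4-lane vector."""
    return [v[i] for i in vpermilps_order(imm8)]


def vdivps(a: list[str], b: list[str]) -> list[str]:
    """Symbolic 4-lane vdivps: lane i computes a[i] / b[i]."""
    return [f"{x}/{y}" for x, y in zip(a, b)]


# Only lanes 0 and 1 of each <2 x float> argument are defined.
A = ["a0", "a1", "?", "?"]
B = ["b0", "b1", "?", "?"]

# fdiv_post_shuffle: divide all 4 lanes, then swap lanes 0 and 1 ($225).
post = permute(vdivps(A, B), 225)

# fdiv_pre_shuffle: splat the demanded pair into every lane ($17), then divide.
pre = vdivps(permute(A, 17), permute(B, 17))

print("post:", post)
print("pre: ", pre)
```

Both versions agree on the two demanded result lanes, but only the post-shuffle version feeds the undefined `?` lanes into the divide; the pre-shuffle version divides copies of the real inputs in every lane.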
