// assume idx >= 1
void foo(int idx, int16_t* arr) {
  for (int i = idx; i < 93; i++)
    arr[i - 1] = (int16_t)(3 * i);
}
The loop vectorize pass decides to vectorize this loop.
The vector body code seems to be correct (VF = 16). On the first iteration it stores the value %15 which is the following:
If %iv.start is 1 then %induction is just <1, 2, ..., 16> and %19 is equal to %arr. Then the array after the first vector iteration would look like this:
So this vector body is correct.
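As a sanity check on the claim above, here is a minimal C sketch of what one 16-lane iteration of the main vector body stores. The function name `vector_body_first_iteration` and the lane loop are illustrative stand-ins for the vector instructions, not the actual IR; the sketch only assumes that lane k holds `%iv.start + k`.

```c
#include <stdint.h>

enum { VF = 16 };  /* vectorization factor of the main vector loop */

/* Hypothetical model of the first main-loop iteration: lane k holds the
   induction value iv_start + k, multiplies it by 3 in 16 bits, and stores
   the result at arr[(iv_start - 1) + k]. */
void vector_body_first_iteration(int iv_start, int16_t *arr) {
    for (int k = 0; k < VF; k++) {
        int i = iv_start + k;            /* lane k of %induction */
        arr[i - 1] = (int16_t)(3 * i);   /* same store as the scalar loop */
    }
}
```

With `iv_start` equal to 1, this fills `arr[0]` through `arr[15]` with 3, 6, ..., 48, which is what the scalar loop produces for i = 1..16.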
However, since we don't know the exact number of iterations, after some number of 16-element iterations we try a vectorized 8-element loop epilogue.
And this is where the miscompile is.
Let's look at the store performed at the first iteration in vec.epilog.vector.body:
store <8 x i16> %27, <8 x i16>* %31, align 2
%27 is the following:
%21 = trunc i64 %iv.start to i32
%.splatinsert8 = insertelement <8 x i32> poison, i32 %21, i32 0
%.splat9 = shufflevector <8 x i32> %.splatinsert8, <8 x i32> poison, <8 x i32> zeroinitializer
%induction10 = add <8 x i32> %.splat9, <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
%vec.ind11 = %induction10
%25 = mul <8 x i32> %vec.ind11, <i32 196608, i32 196608, i32 196608, i32 196608, i32 196608, i32 196608, i32 196608, i32 196608>
%26 = lshr exact <8 x i32> %25, <i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16>
%27 = trunc <8 x i32> %26 to <8 x i16>
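The arithmetic in `%25` to `%27` may look odd, but 196608 is 3 << 16, so the sequence amounts to a multiplication by 3 truncated to 16 bits. The following C sketch models a single lane to make that visible; the function names are hypothetical and only the constants come from the IR above.

```c
#include <stdint.h>

/* Models one lane of the epilogue arithmetic: multiply by 196608 (= 3 << 16)
   in 32 bits, logical shift right by 16, then truncate to 16 bits. */
uint16_t epilogue_lane_value(uint32_t lane) {
    uint32_t product = lane * 196608u;  /* %25: wraps modulo 2^32 */
    uint32_t shifted = product >> 16;   /* %26: the low 16 bits shifted out are zero */
    return (uint16_t)shifted;           /* %27: trunc to i16 */
}

/* Reference: multiplication by 3 truncated to 16 bits. */
uint16_t times_three_i16(uint32_t lane) {
    return (uint16_t)(3u * lane);
}
```

Both functions return the same value for any lane, so the multiplication itself is not the problem; the question is which values the lanes of `%induction10` hold.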
%induction10, whose elements are multiplied by 3 and actually get stored, is always (no matter whether we entered the 16-element loop or not) this: <%iv.start, %iv.start + 1, ..., %iv.start + 7>. This is incorrect. After n iterations of the 16-element loop it must be <%iv.start + n * 16, %iv.start + n * 16 + 1, ..., %iv.start + n * 16 + 7>. The indices are computed correctly:
Here %index7 is the number of array elements processed so far by both loops.
Say we did one 16-element iteration and also one iteration in the vectorized epilogue.
The array would look like this:
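The scenario can be approximated with a small C simulation. This is a sketch under stated assumptions, not the vectorizer's output: the function names and the `buggy` flag are hypothetical, the store address is driven by the number of elements processed so far (as the report says the indices are computed correctly), and only the starting value of the epilogue's induction vector differs between the two modes.

```c
#include <stdint.h>

enum { MAIN_VF = 16, EPILOGUE_VF = 8 };

/* Scalar reference: the original loop body, run for `count` iterations. */
void scalar_reference(int iv_start, int count, int16_t *arr) {
    for (int i = iv_start; i < iv_start + count; i++)
        arr[i - 1] = (int16_t)(3 * i);
}

/* Hypothetical model: `main_iters` iterations of the 16-lane loop followed by
   one 8-lane epilogue iteration. If `buggy` is nonzero, the epilogue's
   induction vector restarts at iv_start (the behaviour described in this
   report); otherwise it resumes at iv_start + 16 * main_iters. */
void simulate(int iv_start, int main_iters, int buggy, int16_t *arr) {
    int index = 0;  /* elements processed so far; drives the store address */
    for (int n = 0; n < main_iters; n++) {
        for (int k = 0; k < MAIN_VF; k++)
            arr[iv_start - 1 + index + k] = (int16_t)(3 * (iv_start + index + k));
        index += MAIN_VF;
    }
    int resume = buggy ? iv_start : iv_start + index;
    for (int k = 0; k < EPILOGUE_VF; k++)
        arr[iv_start - 1 + index + k] = (int16_t)(3 * (resume + k));
}
```

With `iv_start` equal to 1 and one main iteration, both modes leave 3, 6, ..., 48 in `arr[0]` through `arr[15]`. The expected values in `arr[16]` through `arr[23]` are 51, 54, ..., 72, while the buggy mode stores 3, 6, ..., 24 there.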
opt 14.0.0 generates correct code: https://godbolt.org/z/73Wosh1jd (the output is different: it doesn't use insertelement and shufflevector, it just computes the values as scalars). Look at the vec.epilog.vector.body block: %index9, which is the number of array elements processed by the vector loop, is added to each stored value in the epilogue.
Repro: https://godbolt.org/z/vWK3Yx8c4
This IR does something similar to the C function foo shown at the top of this report.
For the main vector body, the stored value %15 is correct (%13 and %14 just perform the multiplication by 3 in i16), and the store index is also correct.