SIMD Int64x2 code fails to build in optimized mode. #3788

juj · 2015-09-18T14:44:46Z

Building code that uses Int64x2 fails in optimized builds, when the PNaCl ExpandI64.cpp pass attempts to operate on the vectors. Debug builds work fine.

As a result, the following SSE2 Int64x2 functions are currently not available in optimized builds and they are skipped in test_sse2_full:

_mm_add_epi64
_mm_mul_epu32
_mm_sub_epi64
_mm_cvtsd_si64
_mm_cvtsi128_si64
_mm_cvtsi64_sd
_mm_cvtsi64_si128
_mm_cvttsd_si64
_mm_move_epi64
_mm_sll_epi64
_mm_slli_epi64
_mm_srl_epi64
_mm_srli_epi64
_mm_storel_epi64
_mm_stream_si64
_mm_unpackhi_epi64
_mm_unpacklo_epi64

As a small test case, one can attempt to build the following with -O1 or higher:

#include <emmintrin.h>
#include <stdio.h>
#include <stdint.h>

__m128i set_64x2(int64_t a, int64_t b) { union { int64_t x[2]; __m128i m; } u; u.x[0] = a; u.x[1] = b; return u.m; }
int64_t get_64x2_lo(__m128i m) { union { int64_t x[2]; __m128i m; } u; u.m = m; return u.x[0]; }
int64_t get_64x2_hi(__m128i m) { union { int64_t x[2]; __m128i m; } u; u.m = m; return u.x[1]; }

int main()
{
    __m128i a = set_64x2(1, 2);
    __m128i b = set_64x2(3, 4);
    __m128i c = _mm_add_epi64(a, b);
    printf("%lld %lld\n", get_64x2_lo(c), get_64x2_hi(c));
}

which fails with LLVM ERROR: I->getType() == I->getOperand(0)->getType(), raised at https://github.com/kripken/emscripten-fastcomp/blob/master/lib/Transforms/NaCl/ExpandI64.cpp#L831.
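For reference, once the build succeeds the test case should print "4 6", since _mm_add_epi64 adds the two 64-bit lanes independently. The following is only a scalar sketch of that lane-wise behavior; the type i64x2 and the function add_i64x2 are hypothetical names chosen for illustration and are not part of emmintrin.h or Emscripten.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit vector holding two 64-bit lanes. */
typedef struct { int64_t lane[2]; } i64x2;

/* Lane-wise wrapping add, modelling what _mm_add_epi64 computes. */
i64x2 add_i64x2(i64x2 a, i64x2 b) {
    i64x2 r;
    for (int i = 0; i < 2; ++i) {
        /* Add as unsigned so that overflow wraps instead of being undefined. */
        r.lane[i] = (int64_t)((uint64_t)a.lane[i] + (uint64_t)b.lane[i]);
    }
    return r;
}
```

With a = {1, 2} and b = {3, 4}, the result lanes are 4 and 6, matching what the SIMD test case is expected to print.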


kripken · 2015-09-18T20:33:56Z

We end up with %5 = bitcast i64 %a to <2 x i32>. This can be handled by emscripten-core/emscripten-fastcomp@ce4b824

However, @sunfishcode, is it normal to have vector types like this? I thought our backend reports to LLVM that it can only support certain kinds.

Btw, after that patch, the testcase then fails on extractelement <2 x i64>. Same question about that type, should we be seeing that?
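For readers less familiar with the IR: a bitcast from i64 to <2 x i32> does not change any bits, it only reinterprets the same 64 bits as two 32-bit lanes. A rough C analogue is sketched below, assuming the usual model of a bitcast as a store followed by a load. The names u32x2, bitcast_i64_to_u32x2 and bitcast_u32x2_to_i64 are hypothetical, and this is not the code from the fastcomp patch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for LLVM's <2 x i32>. */
typedef struct { uint32_t lane[2]; } u32x2;

/* Reinterpret the 64 bits of `a` as two 32-bit lanes (no value conversion). */
u32x2 bitcast_i64_to_u32x2(int64_t a) {
    u32x2 v;
    memcpy(&v, &a, sizeof v);
    return v;
}

/* The inverse reinterpretation: two 32-bit lanes back to one 64-bit integer. */
int64_t bitcast_u32x2_to_i64(u32x2 v) {
    int64_t a;
    memcpy(&a, &v, sizeof a);
    return a;
}
```

On a little-endian target such as asm.js, lane 0 holds the low 32 bits of the integer and lane 1 the high 32 bits.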

juj · 2015-10-02T23:09:26Z

Checked this again on current incoming, and it still persists.

kripken · 2015-10-03T04:06:17Z

Note that the testcase here passes with -O2, but fails on -O1.

kripken · 2015-10-03T04:07:54Z

@sunfishcode: this problem happens due to SROA's isVectorPromotionViable. There doesn't seem to be a way to tell it not to do this.

It seems that if integer promotion fails, it tries vector promotion. I wonder if vector promotion is ever worth it for us? Do you know if this enables other optimizations, or if we could safely just disable vector promotion in SROA?

sunfishcode · 2015-10-06T00:33:17Z

My sense is that we can safely disable this. SIMD.js doesn't have Int64x2 anyway, so we can revisit this issue when that changes.

kripken · 2015-10-07T23:14:13Z

Fixed on incoming.

vilie · 2016-02-26T11:57:12Z

The test case presented by @juj still fails on 1.36.0.

kripken · 2016-02-29T19:10:18Z

It fails for me with -O1, but the compiler prints the hint "possible hint: optimize with -O0 or -O2+, and not -O1", and -O2 does indeed fix it.

vrabaud · 2017-02-01T13:24:53Z

Hi, I get something similar with a different integer type.

Unsupported:   %516 = bitcast <8 x i16> %481 to i128
LLVM ERROR: BitCast Instruction not yet supported for integer types larger than 64 bits

Does the same section need to be patched to also handle int16 and int8, or am I misunderstanding the issue? Thx

juj · 2017-02-01T20:01:32Z

@vrabaud: would you have a test case by any chance?

skal65535 · 2017-02-02T15:27:52Z

Hi,
Here's a repro script. I get the error:

Unsupported:   %32 = bitcast <16 x i8> %6 to i128
LLVM ERROR: BitCast Instruction not yet supported for integer types larger than 64 bits

I could not shrink the example further.

crash.txt

skal/

Note that if I move the function func() inside the main.c file, I get a slightly different error:

Unsupported:   %expanded4 = bitcast <16 x i8> <i8 extractelement (<16 x i8> bitcast (<4 x i32> <i32 3, i32 0, i32 undef, i32 undef> to <16 x i8>), i32 0), i8 extractelement (<16 x i8> bitcast (<4 x i32> <i32 3, i32 0, i32 undef, i32 undef> to <16 x i8>), i32 0), i8 extractelement (<16 x i8> bitcast (<4 x i32> <i32 3, i32 0, i32 undef, i32 undef> to <16 x i8>), i32 1), i8 extractelement (<16 x i8> bitcast (<4 x i32> <i32 3, i32 0, i32 undef, i32 undef> to <16 x i8>), i32 1), i8 extractelement (<16 x i8> bitcast (<4 x i32> <i32 3, i32 0, i32 undef, i32 undef> to <16 x i8>), i32 2), i8 extractelement (<16 x i8> bitcast (<4 x i32> <i32 3, i32 0, i32 undef, i32 undef> to <16 x i8>), i32 2), i8 extractelement (<16 x i8> bitcast (<4 x i32> <i32 3, i32 0, i32 undef, i32 undef> to <16 x i8>), i32 3), i8 extractelement (<16 x i8> bitcast (<4 x i32> <i32 3, i32 0, i32 undef, i32 undef> to <16 x i8>), i32 3), i8 extractelement (<16 x i8> bitcast (<4 x i32> <i32 3, i32 0, i32 undef, i32 undef> to <16 x i8>), i32 4), i8 extractelement (<16 x i8> bitcast (<4 x i32> <i32 3, i32 0, i32 undef, i32 undef> to <16 x i8>), i32 4), i8 extractelement (<16 x i8> bitcast (<4 x i32> <i32 3, i32 0, i32 undef, i32 undef> to <16 x i8>), i32 5), i8 extractelement (<16 x i8> bitcast (<4 x i32> <i32 3, i32 0, i32 undef, i32 undef> to <16 x i8>), i32 5), i8 extractelement (<16 x i8> bitcast (<4 x i32> <i32 3, i32 0, i32 undef, i32 undef> to <16 x i8>), i32 6), i8 extractelement (<16 x i8> bitcast (<4 x i32> <i32 3, i32 0, i32 undef, i32 undef> to <16 x i8>), i32 6), i8 extractelement (<16 x i8> bitcast (<4 x i32> <i32 3, i32 0, i32 undef, i32 undef> to <16 x i8>), i32 7), i8 extractelement (<16 x i8> bitcast (<4 x i32> <i32 3, i32 0, i32 undef, i32 undef> to <16 x i8>), i32 7)> to i128
LLVM ERROR: BitCast Instruction not yet supported for integer types larger than 64 bits

skal65535 · 2017-02-02T15:49:14Z

Hi again,

actually, here's a single-file version that produces the same error as the original:

#include <emmintrin.h>
int main(int argc,const char** argv) {  
  __m128i ref0 = _mm_cvtsi32_si128(argv[0][0]);
  ref0 = _mm_unpacklo_epi8(ref0, ref0);
  ref0 = _mm_packus_epi16(ref0, ref0);
  return _mm_cvtsi128_si32(ref0);
}

I compile with: emcc -o main.js main.c -O3 -msse2
Note that the error doesn't occur with -O1 or -Os.

hope it helps,
skal/
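To make it easier to check a fixed build against the single-file test case above, here is a scalar sketch of what its three SSE2 steps compute. It assumes a little-endian target; repro_scalar and saturate_u8 are hypothetical helper names, not Emscripten or emmintrin.h functions.

```c
#include <stdint.h>
#include <string.h>

/* Clamp a signed 16-bit value into the unsigned 8-bit range. */
static uint8_t saturate_u8(int16_t v) {
    return v < 0 ? 0 : (v > 255 ? 255 : (uint8_t)v);
}

/* Scalar model of the repro, assuming little-endian byte order. */
int32_t repro_scalar(int32_t x) {
    uint8_t in[16] = {0}, unpacked[16], packed[16];
    int16_t words[8];
    int32_t out;

    memcpy(in, &x, sizeof x);          /* _mm_cvtsi32_si128: x in the low lane, rest zero */
    for (int i = 0; i < 8; ++i) {      /* _mm_unpacklo_epi8(v, v): duplicate each low byte */
        unpacked[2 * i] = in[i];
        unpacked[2 * i + 1] = in[i];
    }
    memcpy(words, unpacked, sizeof words);
    for (int i = 0; i < 8; ++i) {      /* _mm_packus_epi16(v, v): saturate, then repeat */
        packed[i] = saturate_u8(words[i]);
        packed[i + 8] = packed[i];
    }
    memcpy(&out, packed, sizeof out);  /* _mm_cvtsi128_si32: read back the low 32 bits */
    return out;
}
```

For any first character in the range 1..127, the duplicated bytes form a 16-bit value above 255, so the low byte saturates to 255 and the function returns 255.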

juj added SIMD fastcomp labels Sep 18, 2015

kripken self-assigned this Sep 18, 2015

This was referenced Oct 6, 2015

Handle bitcast of <4 x i32> to i128 emscripten-core/emscripten-fastcomp#123

Closed

Prevent LLVM's optimizer from emitting silly vector types emscripten-core/emscripten-fastcomp#124

Closed

kripken added a commit that referenced this issue Oct 6, 2015

add testcase for #3788

aac4abd

kripken closed this as completed Oct 7, 2015
