build from source failed #299

Closed
sunnymoon155 opened this issue Feb 17, 2020 · 4 comments

@sunnymoon155

Scanning dependencies of target cpuid-dump
Scanning dependencies of target gtest
Scanning dependencies of target clog
Scanning dependencies of target fbgemm_avx512
Scanning dependencies of target fbgemm_avx2
Scanning dependencies of target fbgemm_generic
Scanning dependencies of target asmjit
[ 0%] Building C object cpuinfo/deps/clog/CMakeFiles/clog.dir/src/clog.c.o
[ 0%] Building C object cpuinfo/CMakeFiles/cpuid-dump.dir/tools/cpuid-dump.c.o
[ 0%] Building CXX object CMakeFiles/fbgemm_avx512.dir/src/FbgemmBfloat16ConvertAvx512.cc.o
[ 0%] Building CXX object CMakeFiles/fbgemm_avx512.dir/src/FbgemmFloat16ConvertAvx512.cc.o
[ 1%] Building CXX object CMakeFiles/fbgemm_avx512.dir/src/UtilsAvx512.cc.o
[ 1%] Building CXX object CMakeFiles/fbgemm_avx512.dir/src/FbgemmFP16UKernelsAvx512.cc.o
[ 1%] Building CXX object CMakeFiles/fbgemm_avx512.dir/src/FbgemmFP16UKernelsAvx512_256.cc.o
[ 1%] Building CXX object CMakeFiles/fbgemm_avx2.dir/src/EmbeddingSpMDMAvx2.cc.o
[ 2%] Building CXX object CMakeFiles/fbgemm_avx2.dir/src/FbgemmI8Depthwise3DAvx2.cc.o
[ 2%] Building CXX object CMakeFiles/fbgemm_avx2.dir/src/FbgemmI8Depthwise3x3Avx2.cc.o
[ 2%] Building CXX object CMakeFiles/fbgemm_avx2.dir/src/FbgemmI8DepthwiseAvx2.cc.o
[ 2%] Building CXX object CMakeFiles/fbgemm_avx2.dir/src/FbgemmI8DepthwisePerChannelQuantAvx2.cc.o
[ 3%] Building CXX object CMakeFiles/fbgemm_avx2.dir/src/OptimizedKernelsAvx2.cc.o
[ 3%] Building CXX object CMakeFiles/fbgemm_avx2.dir/src/PackDepthwiseConvMatrixAvx2.cc.o
[ 1%] Building CXX object CMakeFiles/fbgemm_avx2.dir/src/FbgemmBfloat16ConvertAvx2.cc.o
[ 1%] Building CXX object CMakeFiles/fbgemm_avx2.dir/src/FbgemmFloat16ConvertAvx2.cc.o
[ 3%] Building CXX object CMakeFiles/fbgemm_avx2.dir/src/UtilsAvx2.cc.o
[ 4%] Building CXX object CMakeFiles/fbgemm_avx2.dir/src/FbgemmFP16UKernelsAvx2.cc.o
[ 4%] Building CXX object googletest/googlemock/gtest/CMakeFiles/gtest.dir/src/gtest-all.cc.o
[ 4%] Building CXX object CMakeFiles/fbgemm_avx2.dir/src/QuantUtilsAvx2.cc.o
[ 4%] Building CXX object asmjit/CMakeFiles/asmjit.dir/src/asmjit/core/arch.cpp.o
[ 4%] Building CXX object asmjit/CMakeFiles/asmjit.dir/src/asmjit/core/assembler.cpp.o
[ 4%] Linking C executable cpuid-dump
[ 4%] Building CXX object asmjit/CMakeFiles/asmjit.dir/src/asmjit/core/builder.cpp.o
[ 4%] Building CXX object CMakeFiles/fbgemm_generic.dir/src/EmbeddingSpMDM.cc.o
[ 5%] Building CXX object CMakeFiles/fbgemm_generic.dir/src/EmbeddingSpMDMNBit.cc.o
[ 5%] Linking C static library libclog.a
[ 6%] Building CXX object asmjit/CMakeFiles/asmjit.dir/src/asmjit/core/callconv.cpp.o
[ 6%] Building CXX object CMakeFiles/fbgemm_generic.dir/src/ExecuteKernel.cc.o
[ 6%] Building CXX object asmjit/CMakeFiles/asmjit.dir/src/asmjit/core/codeholder.cpp.o
[ 6%] Building CXX object CMakeFiles/fbgemm_generic.dir/src/ExecuteKernelU8S8.cc.o
[ 6%] Building CXX object asmjit/CMakeFiles/asmjit.dir/src/asmjit/core/compiler.cpp.o
[ 6%] Building CXX object CMakeFiles/fbgemm_generic.dir/src/Fbgemm.cc.o
[ 6%] Building CXX object asmjit/CMakeFiles/asmjit.dir/src/asmjit/core/constpool.cpp.o
[ 7%] Building CXX object asmjit/CMakeFiles/asmjit.dir/src/asmjit/core/cpuinfo.cpp.o
[ 8%] Building CXX object CMakeFiles/fbgemm_generic.dir/src/FbgemmBfloat16Convert.cc.o
[ 8%] Built target cpuid-dump
[ 8%] Building CXX object CMakeFiles/fbgemm_generic.dir/src/FbgemmConv.cc.o
[ 8%] Building CXX object asmjit/CMakeFiles/asmjit.dir/src/asmjit/core/emitter.cpp.o
[ 8%] Building CXX object CMakeFiles/fbgemm_generic.dir/src/FbgemmFP16.cc.o
[ 8%] Building CXX object asmjit/CMakeFiles/asmjit.dir/src/asmjit/core/func.cpp.o
[ 8%] Built target clog
[ 9%] Building CXX object CMakeFiles/fbgemm_generic.dir/src/FbgemmFloat16Convert.cc.o
[ 10%] Building CXX object asmjit/CMakeFiles/asmjit.dir/src/asmjit/core/globals.cpp.o
[ 10%] Building CXX object CMakeFiles/fbgemm_generic.dir/src/FbgemmI64.cc.o
[ 10%] Building CXX object CMakeFiles/fbgemm_generic.dir/src/FbgemmI8Spmdm.cc.o
Scanning dependencies of target cpuinfo_internals
[ 10%] Building CXX object asmjit/CMakeFiles/asmjit.dir/src/asmjit/core/inst.cpp.o
[ 10%] Building CXX object asmjit/CMakeFiles/asmjit.dir/src/asmjit/core/jitallocator.cpp.o
Scanning dependencies of target cpuinfo
[ 10%] Building CXX object CMakeFiles/fbgemm_generic.dir/src/FbgemmSpConv.cc.o
[ 10%] Building CXX object asmjit/CMakeFiles/asmjit.dir/src/asmjit/core/jitruntime.cpp.o
[ 11%] Building CXX object asmjit/CMakeFiles/asmjit.dir/src/asmjit/core/logging.cpp.o
[ 11%] Building C object cpuinfo/CMakeFiles/cpuinfo_internals.dir/src/init.c.o
[ 11%] Building C object cpuinfo/CMakeFiles/cpuinfo_internals.dir/src/api.c.o
[ 12%] Building CXX object CMakeFiles/fbgemm_generic.dir/src/FbgemmSpMM.cc.o
[ 13%] Building C object cpuinfo/CMakeFiles/cpuinfo.dir/src/init.c.o
[ 14%] Building C object cpuinfo/CMakeFiles/cpuinfo_internals.dir/src/x86/init.c.o
[ 14%] Building CXX object CMakeFiles/fbgemm_generic.dir/src/GenerateKernelU8S8S32ACC16.cc.o
[ 14%] Building C object cpuinfo/CMakeFiles/cpuinfo.dir/src/api.c.o
[ 14%] Building CXX object asmjit/CMakeFiles/asmjit.dir/src/asmjit/core/osutils.cpp.o
[ 14%] Building C object cpuinfo/CMakeFiles/cpuinfo.dir/src/x86/init.c.o
[ 14%] Building CXX object asmjit/CMakeFiles/asmjit.dir/src/asmjit/core/operand.cpp.o
[ 14%] Building CXX object CMakeFiles/fbgemm_generic.dir/src/GenerateKernelU8S8S32ACC16Avx512.cc.o
[ 14%] Building CXX object asmjit/CMakeFiles/asmjit.dir/src/asmjit/core/ralocal.cpp.o
[ 14%] Building C object cpuinfo/CMakeFiles/cpuinfo_internals.dir/src/x86/info.c.o
[ 14%] Building CXX object CMakeFiles/fbgemm_generic.dir/src/GenerateKernelU8S8S32ACC16Avx512VNNI.cc.o
[ 14%] Building C object cpuinfo/CMakeFiles/cpuinfo_internals.dir/src/x86/vendor.c.o
[ 15%] Building CXX object CMakeFiles/fbgemm_generic.dir/src/GenerateKernelU8S8S32ACC32.cc.o
[ 15%] Building C object cpuinfo/CMakeFiles/cpuinfo.dir/src/x86/info.c.o
[ 15%] Building C object cpuinfo/CMakeFiles/cpuinfo_internals.dir/src/x86/uarch.c.o
[ 16%] Building CXX object CMakeFiles/fbgemm_generic.dir/src/GenerateKernelU8S8S32ACC32Avx512.cc.o
[ 16%] Building CXX object asmjit/CMakeFiles/asmjit.dir/src/asmjit/core/rapass.cpp.o
[ 16%] Building CXX object asmjit/CMakeFiles/asmjit.dir/src/asmjit/core/rastack.cpp.o
[ 17%] Building C object cpuinfo/CMakeFiles/cpuinfo.dir/src/x86/vendor.c.o
[ 17%] Building C object cpuinfo/CMakeFiles/cpuinfo.dir/src/x86/uarch.c.o
[ 16%] Building CXX object asmjit/CMakeFiles/asmjit.dir/src/asmjit/core/string.cpp.o
[ 18%] Building CXX object asmjit/CMakeFiles/asmjit.dir/src/asmjit/core/support.cpp.o
[ 18%] Building CXX object CMakeFiles/fbgemm_generic.dir/src/GenerateKernelU8S8S32ACC32Avx512VNNI.cc.o
[ 19%] Building C object cpuinfo/CMakeFiles/cpuinfo.dir/src/x86/name.c.o
[ 19%] Building C object cpuinfo/CMakeFiles/cpuinfo_internals.dir/src/x86/name.c.o
[ 19%] Building C object cpuinfo/CMakeFiles/cpuinfo_internals.dir/src/x86/topology.c.o
[ 20%] Building CXX object CMakeFiles/fbgemm_generic.dir/src/GroupwiseConvAcc32Avx2.cc.o
[ 20%] Building C object cpuinfo/CMakeFiles/cpuinfo_internals.dir/src/x86/isa.c.o
[ 21%] Building CXX object CMakeFiles/fbgemm_generic.dir/src/PackAMatrix.cc.o
[ 21%] Building C object cpuinfo/CMakeFiles/cpuinfo_internals.dir/src/x86/cache/init.c.o
[ 22%] Building C object cpuinfo/CMakeFiles/cpuinfo.dir/src/x86/topology.c.o
[ 22%] Building CXX object CMakeFiles/fbgemm_generic.dir/src/PackAWithIm2Col.cc.o
[ 22%] Building C object cpuinfo/CMakeFiles/cpuinfo.dir/src/x86/isa.c.o
[ 22%] Building CXX object asmjit/CMakeFiles/asmjit.dir/src/asmjit/core/target.cpp.o
[ 22%] Building CXX object asmjit/CMakeFiles/asmjit.dir/src/asmjit/core/type.cpp.o
[ 22%] Building C object cpuinfo/CMakeFiles/cpuinfo.dir/src/x86/cache/init.c.o
[ 22%] Building CXX object asmjit/CMakeFiles/asmjit.dir/src/asmjit/core/virtmem.cpp.o
[ 22%] Building CXX object CMakeFiles/fbgemm_generic.dir/src/PackBMatrix.cc.o
[ 22%] Building C object cpuinfo/CMakeFiles/cpuinfo.dir/src/x86/cache/descriptor.c.o
[ 23%] Building CXX object asmjit/CMakeFiles/asmjit.dir/src/asmjit/core/zone.cpp.o
[ 23%] Building C object cpuinfo/CMakeFiles/cpuinfo_internals.dir/src/x86/cache/descriptor.c.o
[ 23%] Building C object cpuinfo/CMakeFiles/cpuinfo_internals.dir/src/x86/cache/deterministic.c.o
[ 22%] Building C object cpuinfo/CMakeFiles/cpuinfo_internals.dir/src/x86/linux/init.c.o
[ 23%] Building CXX object asmjit/CMakeFiles/asmjit.dir/src/asmjit/core/zonehash.cpp.o
[ 24%] Building CXX object CMakeFiles/fbgemm_generic.dir/src/PackMatrix.cc.o
[ 25%] Building C object cpuinfo/CMakeFiles/cpuinfo.dir/src/x86/cache/deterministic.c.o
[ 26%] Building CXX object CMakeFiles/fbgemm_generic.dir/src/PackAWithQuantRowOffset.cc.o
[ 26%] Building C object cpuinfo/CMakeFiles/cpuinfo_internals.dir/src/linux/smallfile.c.o
[ 26%] Building C object cpuinfo/CMakeFiles/cpuinfo_internals.dir/src/x86/linux/cpuinfo.c.o
[ 26%] Building CXX object CMakeFiles/fbgemm_generic.dir/src/PackAWithRowOffset.cc.o
[ 26%] Building CXX object CMakeFiles/fbgemm_generic.dir/src/PackWeightMatrixForGConv.cc.o
[ 26%] Building C object cpuinfo/CMakeFiles/cpuinfo.dir/src/x86/linux/init.c.o
[ 26%] Building C object cpuinfo/CMakeFiles/cpuinfo_internals.dir/src/linux/multiline.c.o
[ 26%] Building CXX object asmjit/CMakeFiles/asmjit.dir/src/asmjit/core/zonelist.cpp.o
[ 26%] Building C object cpuinfo/CMakeFiles/cpuinfo.dir/src/x86/linux/cpuinfo.c.o
[ 26%] Building C object cpuinfo/CMakeFiles/cpuinfo.dir/src/linux/smallfile.c.o
[ 26%] Building C object cpuinfo/CMakeFiles/cpuinfo_internals.dir/src/linux/current.c.o
[ 26%] Building CXX object CMakeFiles/fbgemm_generic.dir/src/QuantUtils.cc.o
[ 26%] Building CXX object asmjit/CMakeFiles/asmjit.dir/src/asmjit/core/zonestack.cpp.o
[ 28%] Building CXX object asmjit/CMakeFiles/asmjit.dir/src/asmjit/core/zonetree.cpp.o
[ 29%] Building CXX object CMakeFiles/fbgemm_generic.dir/src/RefImplementations.cc.o
[ 29%] Building CXX object CMakeFiles/fbgemm_generic.dir/src/PackWeightsForConv.cc.o
[ 29%] Building C object cpuinfo/CMakeFiles/cpuinfo_internals.dir/src/linux/processors.c.o
[ 29%] Building C object cpuinfo/CMakeFiles/cpuinfo.dir/src/linux/multiline.c.o
[ 30%] Building CXX object CMakeFiles/fbgemm_generic.dir/src/SparseAdagrad.cc.o
[ 30%] Building CXX object asmjit/CMakeFiles/asmjit.dir/src/asmjit/core/zonevector.cpp.o
[ 31%] Building C object cpuinfo/CMakeFiles/cpuinfo_internals.dir/src/linux/cpulist.c.o
[ 31%] Building C object cpuinfo/CMakeFiles/cpuinfo.dir/src/linux/current.c.o
[ 31%] Building C object cpuinfo/CMakeFiles/cpuinfo.dir/src/linux/cpulist.c.o
[ 31%] Building CXX object CMakeFiles/fbgemm_generic.dir/src/Utils.cc.o
[ 31%] Building CXX object asmjit/CMakeFiles/asmjit.dir/src/asmjit/x86/x86assembler.cpp.o
[ 32%] Building CXX object asmjit/CMakeFiles/asmjit.dir/src/asmjit/x86/x86builder.cpp.o
[ 32%] Building CXX object asmjit/CMakeFiles/asmjit.dir/src/asmjit/x86/x86callconv.cpp.o
[ 33%] Building C object cpuinfo/CMakeFiles/cpuinfo.dir/src/linux/processors.c.o
[ 33%] Building CXX object asmjit/CMakeFiles/asmjit.dir/src/asmjit/x86/x86compiler.cpp.o
[ 33%] Building CXX object asmjit/CMakeFiles/asmjit.dir/src/asmjit/x86/x86features.cpp.o
[ 34%] Building CXX object asmjit/CMakeFiles/asmjit.dir/src/asmjit/x86/x86internal.cpp.o
[ 34%] Building CXX object asmjit/CMakeFiles/asmjit.dir/src/asmjit/x86/x86instdb.cpp.o
/tmp/ccWuobBM.s: Assembler messages:
/tmp/ccWuobBM.s:52: Error: operand size mismatch for `vbroadcastss'
/tmp/ccWuobBM.s:54: Error: operand size mismatch for `vbroadcastss'
make[2]: *** [CMakeFiles/fbgemm_avx512.dir/src/FbgemmFloat16ConvertAvx512.cc.o] Error 1
make[2]: *** Waiting for unfinished jobs....

@jiecaoyu
Contributor

Hi @sunnymoon155, could you run make -d to print more debug information?

@XiaobingSuper
Contributor

@jiecaoyu, GCC 5 does not support AVX512 well, so it is better to use a newer GCC version, at least 6.3. Here is a test case you can try:

#include <immintrin.h>
#include <cstdint>
#include <cstdlib>

#define float16 std::uint16_t

inline void FloatToFloat16KernelAvx512WithClip(const float* src, float16* dst) {
  constexpr float FP16_MAX = 65504.f;
  __m512 neg_fp16_max_vector = _mm512_set1_ps(-FP16_MAX);
  __m512 pos_fp16_max_vector = _mm512_set1_ps(FP16_MAX);

  __m512 float_vector = _mm512_loadu_ps(src);

  // Do the clipping.
  float_vector = _mm512_max_ps(
      neg_fp16_max_vector, _mm512_min_ps(float_vector, pos_fp16_max_vector));

  __m256i half_vector = _mm512_cvtps_ph(
      float_vector, (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC));
  _mm256_storeu_si256((__m256i*)dst, half_vector);
}

void FloatToFloat16_avx512(
    const float* src,
    float16* dst,
    int size,
    bool do_clip) {
  if (do_clip) {
    int i = 0;
    for (i = 0; i + 16 <= size; i += 16) {
      FloatToFloat16KernelAvx512WithClip(src + i, dst + i);
    }
    //FloatToFloat16_avx2(src + i, dst + i, size - i, do_clip);
  } else {
    int i = 0;
    for (i = 0; i + 16 <= size; i += 16) {
      //FloatToFloat16KernelAvx512(src + i, dst + i);
    }
    //FloatToFloat16_avx2(src + i, dst + i, size - i);
  }
}

int main() {
  return 0;
}

If we compile with the -O1, -O2, or -O3 flag on GCC 5 (5.3.1 on my side), for example:

g++ -O1  -mavx512f -mavx512bw -mavx512dq -mavx512vl -masm=intel -std=c++11 test.cpp

the build fails with this error:

/tmp/cchHLPDt.s: Assembler messages:
/tmp/cchHLPDt.s:14: Error: operand size mismatch for `vbroadcastss'
/tmp/cchHLPDt.s:15: Error: operand size mismatch for `vbroadcastss'

With GCC 6.3, however, all optimization levels work.

@jianyuh, on the PyTorch side, fbgemm is always built with the -O3 optimization flag:

[1194/4048] Building CXX object third_party/fbgemm/CMakeFiles/fbgemm_avx512.dir/src/FbgemmFloat16ConvertAvx512.cc.o
FAILED: third_party/fbgemm/CMakeFiles/fbgemm_avx512.dir/src/FbgemmFloat16ConvertAvx512.cc.o
/opt/rh/devtoolset-4/root/usr/bin/c++  -DFBGEMM_STATIC -DTH_BLAS_MKL -I../third_party/cpuinfo/include -I../third_party/fbgemm/third_party/asmjit/src -I../third_party/fbgemm/include -I../third_party/fbgemm -I../cmake/../third_party/benchmark/include -isystem ../cmake/../third_party/googletest/googlemock/include -isystem ../cmake/../third_party/googletest/googletest/include -isystem ../third_party/protobuf/src -isystem /home/xiaobinz/anaconda3/envs/pytorch-3.7/include -isystem ../third_party/gemmlowp -isystem ../third_party/neon2sse -isystem ../third_party/XNNPACK/include -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -O3 -DNDEBUG -fPIC -fvisibility=hidden   -m64 -mavx2 -mfma -mavx512f -mavx512bw -mavx512dq -mavx512vl -masm=intel -std=c++14 -MD -MT third_party/fbgemm/CMakeFiles/fbgemm_avx512.dir/src/FbgemmFloat16ConvertAvx512.cc.o -MF third_party/fbgemm/CMakeFiles/fbgemm_avx512.dir/src/FbgemmFloat16ConvertAvx512.cc.o.d -o third_party/fbgemm/CMakeFiles/fbgemm_avx512.dir/src/FbgemmFloat16ConvertAvx512.cc.o -c ../third_party/fbgemm/src/FbgemmFloat16ConvertAvx512.cc
/tmp/ccAI34An.s: Assembler messages:
/tmp/ccAI34An.s:53: Error: operand size mismatch for `vbroadcastss'
/tmp/ccAI34An.s:55: Error: operand size mismatch for `vbroadcastss'

So should we set the optimization level to -O0 when the user builds with GCC 5.x, or tell users that FBGEMM needs a newer GCC version to use the AVX512 path?

@dskhudia
Contributor

Hi @sunnymoon155, this is an issue with the GCC assembler. Please use a newer version of GCC.

For the underlying reason, please see: https://stackoverflow.com/questions/35758644/gcc4-8-3-generating-invalid-asm-from-intrinsics-operand-size-mismatch

@dskhudia
Contributor

It works fine with gcc >= 5.4.

https://godbolt.org/z/2S9fRR
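
For anyone who wants the build to fail early with a readable message instead of an assembler error, a compile-time guard along these lines might help. This is only a sketch, not something FBGEMM does: the helper names (encode_version, gcc_ok_for_avx512, kMinGccForAvx512) are made up for illustration, and the 5.4 threshold is simply the version reported as working in this thread. The predefined macros __GNUC__, __GNUC_MINOR__, and __GNUC_PATCHLEVEL__ are standard GCC macros; Clang and the Intel compiler also define __GNUC__, so they are excluded explicitly.

```cpp
// Hypothetical helper (not part of FBGEMM): pack a compiler version into one
// integer so that versions can be compared with a single >=.
constexpr int encode_version(int major, int minor, int patch = 0) {
  return major * 10000 + minor * 100 + patch;
}

// Assumption taken from this thread: GCC 5.4 and later assemble the AVX512
// conversion kernels correctly, while 5.3.1 does not.
constexpr int kMinGccForAvx512 = encode_version(5, 4);

// Returns true if the given GCC version is at or above the assumed minimum.
constexpr bool gcc_ok_for_avx512(int major, int minor, int patch = 0) {
  return encode_version(major, minor, patch) >= kMinGccForAvx512;
}

// Apply the check only to real GCC; Clang and ICC define __GNUC__ as well.
#if defined(__GNUC__) && !defined(__clang__) && !defined(__INTEL_COMPILER)
static_assert(
    gcc_ok_for_avx512(__GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__),
    "GCC is too old for the AVX512 path; please use GCC 5.4 or newer");
#endif
```

Placed at the top of an AVX512 translation unit, the static_assert would stop compilation on an older GCC before the assembler ever runs. Whether 5.4 is the right cutoff for every configuration is not established here, so treat the constant as adjustable.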
