SIGILL from libtorch_cpu.so at import with a CPU without AVX #37577

Closed
jasaarim opened this issue Apr 30, 2020 · 11 comments
Labels: high priority, triaged

@jasaarim

jasaarim commented Apr 30, 2020

🐛 Bug

I'm unable to import PyTorch due to a SIGILL, presumably related to the lack of AVX in my CPU. This happens with the nightly build and also when building from source.

To Reproduce

gdb -ex r --args python -c "import torch" gives

Program received signal SIGILL, Illegal instruction.
0x00007fffe9076f17 in _GLOBAL__sub_I_BinaryOpsKernel.cpp.AVX2.cpp ()
   from /home/j/src/pytorch/torch/lib/libtorch_cpu.so

And a backtrace

#0  0x00007fffe9076f17 in _GLOBAL__sub_I_BinaryOpsKernel.cpp.AVX2.cpp () from /home/j/src/pytorch/torch/lib/libtorch_cpu.so
#1  0x00007ffff7de5733 in call_init (env=0x7fffffffde78, argv=0x7fffffffde58, argc=3, l=<optimized out>) at dl-init.c:72
#2  _dl_init (main_map=main_map@entry=0x555555a015a0, argc=3, argv=0x7fffffffde58, env=0x7fffffffde78) at dl-init.c:119
#3  0x00007ffff7dea1ff in dl_open_worker (a=a@entry=0x7fffffffb970) at dl-open.c:522
#4  0x00007ffff792c2df in __GI__dl_catch_exception (exception=0x7fffffffb950, operate=0x7ffff7de9dc0 <dl_open_worker>, args=0x7fffffffb970) at dl-error-skeleton.c:196
#5  0x00007ffff7de97ca in _dl_open (file=0x7ffff600b2f0 "/home/j/src/pytorch/torch/_C.cpython-38-x86_64-linux-gnu.so", mode=-2147483646, 
    caller_dlopen=0x555555795178 <_PyImport_FindSharedFuncptr+136>, nsid=<optimized out>, argc=3, argv=<optimized out>, env=0x7fffffffde78) at dl-open.c:605
#6  0x00007ffff75c1f96 in dlopen_doit (a=a@entry=0x7fffffffbba0) at dlopen.c:66
#7  0x00007ffff792c2df in __GI__dl_catch_exception (exception=exception@entry=0x7fffffffbb40, operate=0x7ffff75c1f40 <dlopen_doit>, args=0x7fffffffbba0)
    at dl-error-skeleton.c:196
#8  0x00007ffff792c36f in __GI__dl_catch_error (objname=0x55555596fd50, errstring=0x55555596fd58, mallocedp=0x55555596fd48, operate=<optimized out>, 
    args=<optimized out>) at dl-error-skeleton.c:215
#9  0x00007ffff75c2735 in _dlerror_run (operate=operate@entry=0x7ffff75c1f40 <dlopen_doit>, args=args@entry=0x7fffffffbba0) at dlerror.c:162
#10 0x00007ffff75c2051 in __dlopen (file=<optimized out>, mode=<optimized out>) at dlopen.c:87
#11 0x0000555555795178 in _PyImport_FindSharedFuncptr () at /tmp/build/80754af9/python_1585235154784/work/Python/dynload_shlib.c:99
#12 0x00005555557abb86 in _PyImport_LoadDynamicModuleWithSpec () at /tmp/build/80754af9/python_1585235154784/work/Python/importdl.c:134
#13 0x00005555557acca2 in _imp_create_dynamic_impl.isra.18 (file=0x0, spec=0x7ffff5ff0fa0) at /tmp/build/80754af9/python_1585235154784/work/Python/import.c:2220
#14 _imp_create_dynamic () at /tmp/build/80754af9/python_1585235154784/work/Python/clinic/import.c.h:330
#15 0x000055555569f0cd in cfunction_vectorcall_FASTCALL () at /tmp/build/80754af9/python_1585235154784/work/Objects/methodobject.c:421
#16 0x000055555569afff in PyVectorcall_Call () at /tmp/build/80754af9/python_1585235154784/work/Objects/call.c:199
#17 0x000055555573eaf6 in do_call_core (kwdict=0x7ffff5f7e540, callargs=0x7ffff5fe5ac0, func=0x7ffff7f85540, tstate=<optimized out>)
    at /tmp/build/80754af9/python_1585235154784/work/Python/ceval.c:5007
#18 _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python_1585235154784/work/Python/ceval.c:3559
#19 0x00005555556e56bc in _PyEval_EvalCodeWithName () at /tmp/build/80754af9/python_1585235154784/work/Python/ceval.c:4298
#20 0x00005555556e67be in _PyFunction_Vectorcall.localalias.353 () at /tmp/build/80754af9/python_1585235154784/work/Objects/call.c:435
#21 0x000055555573ddb1 in _PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7ffff62f21d0, callable=0x7ffff7f844c0)
    at /tmp/build/80754af9/python_1585235154784/work/Include/cpython/abstract.h:127
#22 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x555555901b40)
    at /tmp/build/80754af9/python_1585235154784/work/Python/ceval.c:4987
#23 _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python_1585235154784/work/Python/ceval.c:3469
#24 0x00005555556e669b in function_code_fastcall (globals=<optimized out>, nargs=2, args=<optimized out>, co=<optimized out>)
    at /tmp/build/80754af9/python_1585235154784/work/Objects/call.c:283
#25 _PyFunction_Vectorcall.localalias.353 () at /tmp/build/80754af9/python_1585235154784/work/Objects/call.c:410
#26 0x000055555573943a in _PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7ffff7fb5a30, callable=0x7ffff7f2e790)
    at /tmp/build/80754af9/python_1585235154784/work/Include/cpython/abstract.h:127
#27 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x555555901b40)
    at /tmp/build/80754af9/python_1585235154784/work/Python/ceval.c:4987
#28 _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python_1585235154784/work/Python/ceval.c:3486
#29 0x00005555556e669b in function_code_fastcall (globals=<optimized out>, nargs=1, args=<optimized out>, co=<optimized out>)
    at /tmp/build/80754af9/python_1585235154784/work/Objects/call.c:283
#30 _PyFunction_Vectorcall.localalias.353 () at /tmp/build/80754af9/python_1585235154784/work/Objects/call.c:410
#31 0x0000555555739680 in _PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7ffff62f1f50, callable=0x7ffff7f84e50)
    at /tmp/build/80754af9/python_1585235154784/work/Include/cpython/abstract.h:127
#32 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x555555901b40)
    at /tmp/build/80754af9/python_1585235154784/work/Python/ceval.c:4987
#33 _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python_1585235154784/work/Python/ceval.c:3500
#34 0x00005555556e669b in function_code_fastcall (globals=<optimized out>, nargs=1, args=<optimized out>, co=<optimized out>)
    at /tmp/build/80754af9/python_1585235154784/work/Objects/call.c:283
#35 _PyFunction_Vectorcall.localalias.353 () at /tmp/build/80754af9/python_1585235154784/work/Objects/call.c:410
#36 0x0000555555739680 in _PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7ffff6274df0, callable=0x7ffff7f860d0)
    at /tmp/build/80754af9/python_1585235154784/work/Include/cpython/abstract.h:127
#37 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x555555901b40)
    at /tmp/build/80754af9/python_1585235154784/work/Python/ceval.c:4987
#38 _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python_1585235154784/work/Python/ceval.c:3500
#39 0x00005555556e669b in function_code_fastcall (globals=<optimized out>, nargs=2, args=<optimized out>, co=<optimized out>)
    at /tmp/build/80754af9/python_1585235154784/work/Objects/call.c:283
#40 _PyFunction_Vectorcall.localalias.353 () at /tmp/build/80754af9/python_1585235154784/work/Objects/call.c:410
#41 0x0000555555739680 in _PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7ffff62f17a8, callable=0x7ffff7f89310)
    at /tmp/build/80754af9/python_1585235154784/work/Include/cpython/abstract.h:127
#42 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x555555901b40)
    at /tmp/build/80754af9/python_1585235154784/work/Python/ceval.c:4987
#43 _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python_1585235154784/work/Python/ceval.c:3500
#44 0x00005555556e669b in function_code_fastcall (globals=<optimized out>, nargs=2, args=<optimized out>, co=<optimized out>)
    at /tmp/build/80754af9/python_1585235154784/work/Objects/call.c:283
#45 _PyFunction_Vectorcall.localalias.353 () at /tmp/build/80754af9/python_1585235154784/work/Objects/call.c:410
#46 0x000055555569bc54 in _PyObject_Vectorcall (kwnames=0x0, nargsf=2, args=0x7fffffffc990, callable=0x7ffff7f893a0)
    at /tmp/build/80754af9/python_1585235154784/work/Include/cpython/abstract.h:127
#47 _PyObject_FastCall () at /tmp/build/80754af9/python_1585235154784/work/Include/cpython/abstract.h:147
#48 object_vacall () at /tmp/build/80754af9/python_1585235154784/work/Objects/call.c:1186
#49 0x00005555556d8787 in _PyObject_CallMethodIdObjArgs () at /tmp/build/80754af9/python_1585235154784/work/Objects/call.c:1244
#50 0x0000555555689c8c in import_find_and_load (abs_name=0x7ffff7e50b30) at /tmp/build/80754af9/python_1585235154784/work/Python/import.c:1697
#51 PyImport_ImportModuleLevelObject () at /tmp/build/80754af9/python_1585235154784/work/Python/import.c:1797
#52 0x000055555573c3e2 in import_name (level=0x5555558fad20 <small_ints+160>, fromlist=0x7ffff7ec4ca0, name=0x7ffff7e50b30, f=0x5555559556e0, tstate=<optimized out>)
    at /tmp/build/80754af9/python_1585235154784/work/Python/ceval.c:5163
#53 _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python_1585235154784/work/Python/ceval.c:2993
#54 0x00005555556e56bc in _PyEval_EvalCodeWithName () at /tmp/build/80754af9/python_1585235154784/work/Python/ceval.c:4298
#55 0x00005555556e6564 in PyEval_EvalCodeEx () at /tmp/build/80754af9/python_1585235154784/work/Python/ceval.c:4327
#56 0x00005555556e658c in PyEval_EvalCode (co=<optimized out>, globals=<optimized out>, locals=<optimized out>)
    at /tmp/build/80754af9/python_1585235154784/work/Python/ceval.c:718
#57 0x0000555555757276 in builtin_exec_impl.isra.14 (locals=0x7ffff7ea5440, globals=0x7ffff7ea5440, source=0x7ffff7e51500)
    at /tmp/build/80754af9/python_1585235154784/work/Python/bltinmodule.c:1033
#58 builtin_exec () at /tmp/build/80754af9/python_1585235154784/work/Python/clinic/bltinmodule.c.h:396
#59 0x000055555569f0cd in cfunction_vectorcall_FASTCALL () at /tmp/build/80754af9/python_1585235154784/work/Objects/methodobject.c:421
#60 0x000055555569afff in PyVectorcall_Call () at /tmp/build/80754af9/python_1585235154784/work/Objects/call.c:199
#61 0x000055555573eaf6 in do_call_core (kwdict=0x7ffff7e41d40, callargs=0x7ffff7e52300, func=0x7ffff7f6b590, tstate=<optimized out>)
    at /tmp/build/80754af9/python_1585235154784/work/Python/ceval.c:5007
#62 _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python_1585235154784/work/Python/ceval.c:3559
#63 0x00005555556e56bc in _PyEval_EvalCodeWithName () at /tmp/build/80754af9/python_1585235154784/work/Python/ceval.c:4298
#64 0x00005555556e67be in _PyFunction_Vectorcall.localalias.353 () at /tmp/build/80754af9/python_1585235154784/work/Objects/call.c:435
#65 0x000055555573ddb1 in _PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7ffff7ec2550, callable=0x7ffff7f844c0)
    at /tmp/build/80754af9/python_1585235154784/work/Include/cpython/abstract.h:127
#66 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x555555901b40)
    at /tmp/build/80754af9/python_1585235154784/work/Python/ceval.c:4987
#67 _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python_1585235154784/work/Python/ceval.c:3469
#68 0x00005555556e669b in function_code_fastcall (globals=<optimized out>, nargs=2, args=<optimized out>, co=<optimized out>)
    at /tmp/build/80754af9/python_1585235154784/work/Objects/call.c:283
#69 _PyFunction_Vectorcall.localalias.353 () at /tmp/build/80754af9/python_1585235154784/work/Objects/call.c:410
#70 0x000055555573943a in _PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7ffff7eae790, callable=0x7ffff7f2a5e0)
    at /tmp/build/80754af9/python_1585235154784/work/Include/cpython/abstract.h:127
#71 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x555555901b40)
    at /tmp/build/80754af9/python_1585235154784/work/Python/ceval.c:4987
#72 _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python_1585235154784/work/Python/ceval.c:3486
#73 0x00005555556e669b in function_code_fastcall (globals=<optimized out>, nargs=1, args=<optimized out>, co=<optimized out>)
    at /tmp/build/80754af9/python_1585235154784/work/Objects/call.c:283
#74 _PyFunction_Vectorcall.localalias.353 () at /tmp/build/80754af9/python_1585235154784/work/Objects/call.c:410
#75 0x0000555555739680 in _PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7ffff7e375f0, callable=0x7ffff7f860d0)
    at /tmp/build/80754af9/python_1585235154784/work/Include/cpython/abstract.h:127
#76 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x555555901b40)
    at /tmp/build/80754af9/python_1585235154784/work/Python/ceval.c:4987
#77 _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python_1585235154784/work/Python/ceval.c:3500
#78 0x00005555556e669b in function_code_fastcall (globals=<optimized out>, nargs=2, args=<optimized out>, co=<optimized out>)
    at /tmp/build/80754af9/python_1585235154784/work/Objects/call.c:283
#79 _PyFunction_Vectorcall.localalias.353 () at /tmp/build/80754af9/python_1585235154784/work/Objects/call.c:410
#80 0x0000555555739680 in _PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7ffff7eae1d8, callable=0x7ffff7f89310)
    at /tmp/build/80754af9/python_1585235154784/work/Include/cpython/abstract.h:127
#81 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x555555901b40)
    at /tmp/build/80754af9/python_1585235154784/work/Python/ceval.c:4987
#82 _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python_1585235154784/work/Python/ceval.c:3500
#83 0x00005555556e669b in function_code_fastcall (globals=<optimized out>, nargs=2, args=<optimized out>, co=<optimized out>)
    at /tmp/build/80754af9/python_1585235154784/work/Objects/call.c:283
#84 _PyFunction_Vectorcall.localalias.353 () at /tmp/build/80754af9/python_1585235154784/work/Objects/call.c:410
#85 0x000055555569bc54 in _PyObject_Vectorcall (kwnames=0x0, nargsf=2, args=0x7fffffffd730, callable=0x7ffff7f893a0)
    at /tmp/build/80754af9/python_1585235154784/work/Include/cpython/abstract.h:127
#86 _PyObject_FastCall () at /tmp/build/80754af9/python_1585235154784/work/Include/cpython/abstract.h:147
#87 object_vacall () at /tmp/build/80754af9/python_1585235154784/work/Objects/call.c:1186
#88 0x00005555556d8787 in _PyObject_CallMethodIdObjArgs () at /tmp/build/80754af9/python_1585235154784/work/Objects/call.c:1244
#89 0x0000555555689c8c in import_find_and_load (abs_name=0x7ffff7e41bf0) at /tmp/build/80754af9/python_1585235154784/work/Python/import.c:1697
#90 PyImport_ImportModuleLevelObject () at /tmp/build/80754af9/python_1585235154784/work/Python/import.c:1797
#91 0x000055555573c3e2 in import_name (level=0x5555558fad20 <small_ints+160>, fromlist=0x5555558da360 <_Py_NoneStruct>, name=0x7ffff7e41bf0, f=0x7ffff7ede800, 
    tstate=<optimized out>) at /tmp/build/80754af9/python_1585235154784/work/Python/ceval.c:5163
#92 _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python_1585235154784/work/Python/ceval.c:2993
#93 0x00005555556e56bc in _PyEval_EvalCodeWithName () at /tmp/build/80754af9/python_1585235154784/work/Python/ceval.c:4298
#94 0x00005555556e6564 in PyEval_EvalCodeEx () at /tmp/build/80754af9/python_1585235154784/work/Python/ceval.c:4327
#95 0x00005555556e658c in PyEval_EvalCode (co=<optimized out>, globals=<optimized out>, locals=<optimized out>)
    at /tmp/build/80754af9/python_1585235154784/work/Python/ceval.c:718
#96 0x00005555557873a4 in run_eval_code_obj () at /tmp/build/80754af9/python_1585235154784/work/Python/pythonrun.c:1125
#97 0x00005555557b3a64 in run_mod () at /tmp/build/80754af9/python_1585235154784/work/Python/pythonrun.c:1147
#98 0x00005555557b6cdd in PyRun_StringFlags () at /tmp/build/80754af9/python_1585235154784/work/Python/pythonrun.c:1034
#99 0x00005555557b6d3f in PyRun_SimpleStringFlags () at /tmp/build/80754af9/python_1585235154784/work/Python/pythonrun.c:460
#100 0x00005555557b6ed5 in pymain_run_command (cf=0x7fffffffdc58, command=<optimized out>) at /tmp/build/80754af9/python_1585235154784/work/Modules/main.c:264
#101 pymain_run_python (exitcode=0x7fffffffdc50) at /tmp/build/80754af9/python_1585235154784/work/Modules/main.c:562
#102 Py_RunMain () at /tmp/build/80754af9/python_1585235154784/work/Modules/main.c:650
#103 0x00005555557b72d9 in Py_BytesMain () at /tmp/build/80754af9/python_1585235154784/work/Modules/main.c:1082
#104 0x00007ffff77e6b97 in __libc_start_main (main=0x555555666ae0 <main>, argc=3, argv=0x7fffffffde58, init=<optimized out>, fini=<optimized out>, 
    rtld_fini=<optimized out>, stack_end=0x7fffffffde48) at ../csu/libc-start.c:310
#105 0x0000555555757493 in _start () at ../sysdeps/x86_64/elf/start.S:103

Expected behavior

No error

Environment

  • PyTorch Version: 1.6.0.dev20200429
  • OS: Linux
  • How you installed PyTorch: conda and source
  • Build command you used: python setup.py develop
  • Python version: 3.8.2
  • CUDA/cuDNN version: No CUDA
  • GPU models and configuration: No GPU
  • Any other relevant information: CPU AMD A8-3500M

Additional context

I have also tried building with march=native, and setting the environment variable ATEN_CPU_CAPABILITY=default both when building and when running.

cc @ezyang @gchanan @zou3519

@colesbury
Member

Thanks for the bug report. Can you try running disas in gdb to show the instructions before/after the one that caused the SIGILL? (Preferably from the nightly build).

@colesbury
Member

Never mind -- I see the problem. Seems like a recent change because it's not present in slightly older versions.

000000000053ebc0 <_GLOBAL__sub_I_BinaryOpsKernel.cpp.AVX2.cpp>:
  53ebc0:       55                      push   %rbp
  53ebc1:       48 8d 3d e0 6d 42 04    lea    0x4426de0(%rip),%rdi        # 49659a8 <_ZStL8__ioinit>
  53ebc8:       48 89 e5                mov    %rsp,%rbp
  53ebcb:       41 52                   push   %r10
  53ebcd:       48 83 ec 08             sub    $0x8,%rsp
  53ebd1:       e8 ea ac fd ff          callq  5198c0 <_ZNSt8ios_base4InitC1Ev@plt>
  53ebd6:       48 8b 3d 8b 33 40 04    mov    0x440338b(%rip),%rdi        # 4941f68 <_ZNSt8ios_base4InitD1Ev@GLIBCXX_3.4>
  53ebdd:       48 8d 15 bc 93 41 04    lea    0x44193bc(%rip),%rdx        # 4957fa0 <__dso_handle>
  53ebe4:       48 8d 35 bd 6d 42 04    lea    0x4426dbd(%rip),%rsi        # 49659a8 <_ZStL8__ioinit>
  53ebeb:       e8 60 42 fc ff          callq  502e50 <__cxa_atexit@plt>
  53ebf0:       c5 fd 6f 15 28 9f 71    vmovdqa 0x3719f28(%rip),%ymm2        # 3c58b20 <_ZZN3c104impl19boxAndCallBoxedFuncIlJRKN2at6TensorEEEENSt9enable_ifIXaasrNS_4guts8negationINS7_11disjunctionIJSt19is_lvalue_referenceIT_ENS8_INS9_IJSt16is_constructibleINS_6IValueEJSB_EESt7is_sameINS_13TensorOptionsESB_ESG_IvSB_EEEEEESG_INS_8ArrayRefIlEESB_EDpNS8_INS9_IJSD_ISE_JNSt5decayIT0_E4typeEEESG_ISH_SS_ESG_IvSS_EEEEEEEEEEE5valuentsrSJ_5valueESB_E4typeEPFvPNS_14OperatorKernelERKNS_14OperatorHandleEPSt6vectorISE_SaISE_EEES14_S17_DpSQ_E8__func__+0x7d0>
  53ebf7:       03
  53ebf8:       48 c7 05 9d 6d 42 04    movq   $0x600000,0x4426d9d(%rip)        # 49659a0 <_ZN3c104implL15always_includedE>
  53ebff:       00 00 60 00
  53ec03:       c5 fd 6f 05 75 ee 71    vmovdqa 0x371ee75(%rip),%ymm0        # 3c5da80 <_ZL8_ps256_1+0x11a0>
  53ec0a:       03
  53ec0b:       c5 fd 7f 15 0d 6d 42    vmovdqa %ymm2,0x4426d0d(%rip)        # 4965920 <_ZN2at6vec25612_GLOBAL__N_16Vec256IlE4onesE>
  53ec12:       04
  53ec13:       c5 fd 6f 15 05 b3 71    vmovdqa 0x371b305(%rip),%ymm2        # 3c59f20 <_ZZZN2at6native12_GLOBAL__N_122unfolded2d_copy_kernelERNS_6TensorES3_lllllllllllENKUlvE_clEvE8__func__+0x48>
  53ec1a:       03
  53ec1b:       c5 fc 28 0d 9d 9e 71    vmovaps 0x3719e9d(%rip),%ymm1        # 3c58ac0 <_ZZN3c104impl19boxAndCallBoxedFuncIlJRKN2at6TensorEEEENSt9enable_ifIXaasrNS_4guts8negationINS7_11disjunctionIJSt19is_lvalue_referenceIT_ENS8_INS9_IJSt16is_constructibleINS_6IValueEJSB_EESt7is_sameINS_13TensorOptionsESB_ESG_IvSB_EEEEEESG_INS_8ArrayRefIlEESB_EDpNS8_INS9_IJSD_ISE_JNSt5decayIT0_E4typeEEESG_ISH_SS_ESG_IvSS_EEEEEEEEEEE5valuentsrSJ_5valueESB_E4typeEPFvPNS_14OperatorKernelERKNS_14OperatorHandleEPSt6vectorISE_SaISE_EEES14_S17_DpSQ_E8__func__+0x770>
  53ec22:       03
  53ec23:       c5 fd 7f 05 35 6d 42    vmovdqa %ymm0,0x4426d35(%rip)        # 4965960 <_ZN2at6vec25612_GLOBAL__N_16Vec256IN3c108BFloat16EE4onesE>
  53ec2a:       04
  53ec2b:       c5 fd 7f 15 cd 6c 42    vmovdqa %ymm2,0x4426ccd(%rip)        # 4965900 <_ZN2at6vec25612_GLOBAL__N_16Vec256IiE4onesE>
  53ec32:       04
  53ec33:       c5 fd 28 05 a5 ec 71    vmovapd 0x371eca5(%rip),%ymm0        # 3c5d8e0 <_ZL8_ps256_1+0x1000>
  53ec3a:       03
  53ec3b:       c5 fd 6f 15 5d e0 62    vmovdqa 0x362e05d(%rip),%ymm2        # 3b6cca0 <_ZN6caffe212_GLOBAL__N_1L11ld_st_masksE+0x160>
  53ec42:       03
  53ec43:       c5 fc 29 0d 35 6d 42    vmovaps %ymm1,0x4426d35(%rip)        # 4965980 <_ZN2at6vec25612_GLOBAL__N_16Vec256IfE4onesE>
  53ec4a:       04
  53ec4b:       c5 fd 29 05 ed 6c 42    vmovapd %ymm0,0x4426ced(%rip)        # 4965940 <_ZN2at6vec25612_GLOBAL__N_16Vec256IdE4onesE>
  53ec52:       04
  53ec53:       c5 fd 7f 15 85 6c 42    vmovdqa %ymm2,0x4426c85(%rip)        # 49658e0 <_ZN2at6vec25612_GLOBAL__N_16Vec256IsE4onesE>
  53ec5a:       04
  53ec5b:       c5 fc 29 0d 5d 6c 42    vmovaps %ymm1,0x4426c5d(%rip)        # 49658c0 <_ZN2at6vec25612_GLOBAL__N_16Vec256ISt7complexIfEE4onesE>
  53ec62:       04
  53ec63:       c5 fd 29 05 35 6c 42    vmovapd %ymm0,0x4426c35(%rip)        # 49658a0 <_ZN2at6vec25612_GLOBAL__N_16Vec256ISt7complexIdEE4onesE>
  53ec6a:       04
  53ec6b:       c5 f8 77                vzeroupper
  53ec6e:       48 83 c4 08             add    $0x8,%rsp
  53ec72:       41 5a                   pop    %r10
  53ec74:       5d                      pop    %rbp
  53ec75:       c3                      retq
  53ec76:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)

@jasaarim
Author

Yes, that's about the same assembler code I get. I don't get this with v1.5.0, but I get another error with mkldnn so I was trying to see if it's already fixed.

@colesbury
Member

@ezyang these lines look suspicious to me:

static const Vec256<float> ones;

const Vec256<float> Vec256<float>::ones(1.0f);

and similar in other files (e.g. complex).

We should stop accepting patches that declare data in Vec256 files. There isn't a good reason for these "1" constants and they cause trouble -- use Vec256<float>(1.0f) instead.
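The difference between the two patterns can be sketched with a hypothetical stand-in type. `Vec4` and `add_one` below are illustrative names, not PyTorch's `Vec256`; the sketch assumes only that the class has a non-constexpr broadcast constructor, so a static data member of that type would need a dynamic initializer that runs when the library is loaded.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a SIMD wrapper such as Vec256<float>.
// The real class wraps an AVX register; a plain array keeps this portable.
struct Vec4 {
  std::array<float, 4> lanes;

  // Broadcast constructor, analogous to Vec256<float>(1.0f).
  explicit Vec4(float v) { lanes.fill(v); }

  // Lane-wise addition.
  Vec4 operator+(const Vec4& other) const {
    Vec4 out(0.0f);
    for (std::size_t i = 0; i < lanes.size(); ++i) {
      out.lanes[i] = lanes[i] + other.lanes[i];
    }
    return out;
  }
};

// Pattern to avoid in a file compiled with AVX flags:
//   static const Vec4 ones;          // declared inside the class
//   const Vec4 Vec4::ones(1.0f);     // dynamic initializer, runs at load time

// Alternative: build the constant at the point of use, so it is only
// constructed when the kernel itself is called.
inline Vec4 add_one(const Vec4& x) {
  return x + Vec4(1.0f);
}
```

Calling `add_one(Vec4(2.0f))` yields a vector whose lanes are all 3.0f, and nothing in this translation unit has to run before the first call.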

@colesbury
Member

Also cc @xuhdev in case you want to take a look at it. To get the assembly I ran objdump -d ./build/lib/libtorch_cpu.so | less and then typed / for search and entered _GLOBAL__sub_I_BinaryOpsKernel.cpp.AVX2.cpp to find the function.

@malfet
Contributor

malfet commented May 1, 2020

@jasaarim can you please try running ATEN_CPU_CAPABILITY=default python -c "import torch" and tell us whether that makes the SIGILL go away?

@colesbury
Member

@malfet the bug triggers before ATEN_CPU_CAPABILITY is parsed. You can tell from the stacktrace that it occurs when the shared library is loaded and initializes global static data. The initialization uses AVX instructions because that object file is compiled with AVX enabled.

We need to avoid non-trivial global data in these architecture specific files because there's no way to guard the global initializers with CPU capability detection.
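The ordering problem described above can be illustrated with a small, portable sketch. All names here (`event_log`, `Table`, `cpu_has_feature`, `lazy_table`, `run_kernel`) are hypothetical, and the capability probe is a placeholder rather than a real CPUID query. It shows only that a namespace-scope object is constructed before any check can run, whereas a function-local static is constructed on first call. This is not the fix that was applied in the linked pull request.

```cpp
#include <string>
#include <vector>

// Records the order in which things run. A function-local static is used so
// the log exists before any other global constructor touches it.
inline std::vector<std::string>& event_log() {
  static std::vector<std::string> log;
  return log;
}

// Hypothetical stand-in for constant data with a non-trivial constructor.
struct Table {
  explicit Table(const char* tag) { event_log().push_back(tag); }
};

// Namespace-scope object: its constructor runs during static initialization
// (for a shared library, inside dlopen), before any capability check.
static const Table eager_table("eager");

// Placeholder capability probe; a real one would query the CPU.
inline bool cpu_has_feature() {
  event_log().push_back("probe");
  return true;
}

// Function-local static: constructed on the first call only, so a caller
// can check the CPU first and never reach this code on unsupported hardware.
inline const Table& lazy_table() {
  static const Table table("lazy");
  return table;
}

inline void run_kernel() {
  if (cpu_has_feature()) {
    lazy_table();
  }
}
```

After one call to `run_kernel()`, the log reads "eager", "probe", "lazy": the eager object was built before the probe had a chance to run, which is the situation the backtrace shows.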

ezyang added a commit that referenced this issue May 4, 2020
Fixes #37577

Needs tests, and maybe a lint.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

[ghstack-poisoned]
ezyang added a commit that referenced this issue May 4, 2020
Fixes #37577

Needs tests, and maybe a lint.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

ghstack-source-id: a0088fd0e7e12780d23c582a6ddc109f4432bbf5
Pull Request resolved: #37767
@ezyang ezyang self-assigned this May 4, 2020
@gchanan
Contributor

gchanan commented May 4, 2020

Can someone confirm this isn't present in 1.5? I know @jasaarim mentioned they didn't see it, but did we check that the actual patch that introduced this problem isn't in 1.5?

@pbelevich
Contributor

pbelevich added the triaged label and removed the triage review label on May 4, 2020
@ezyang
Contributor

ezyang commented May 4, 2020

@colesbury while trying to verify that my fix worked, I realized that we set up other static initializers inside these files (e.g., to do registrations). This seems like another hazard for SIGILL if the compiler decides that it can start autovectorizing this code. Not sure if there's a good way around it :/

@colesbury
Member

@ezyang yeah, if someone is feeling ambitious maybe they can figure out an alternative implementation instead of REGISTER_DISPATCH. In practice those haven't seemed to generate these sorts of instructions yet so it seems less urgent.
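One direction such an alternative could take, sketched here purely as an assumption and not as how REGISTER_DISPATCH or any PyTorch code works, is to make the registration data constant-initialized. A `constexpr` table of function pointers is emitted as data, so the translation unit contributes no constructor code to run at load time. `KernelEntry`, `kKernels`, `add_one_kernel`, and `find_kernel` are hypothetical names.

```cpp
#include <cstring>

// Hypothetical kernel signature and registration record.
using kernel_fn = int (*)(int);

struct KernelEntry {
  const char* name;
  kernel_fn fn;
};

// Placeholder kernel; a real one would be the vectorized implementation.
inline int add_one_kernel(int x) { return x + 1; }

// Constant-initialized table: no dynamic initializer is generated for it,
// so nothing from this file executes while the library is being loaded.
constexpr KernelEntry kKernels[] = {
    {"add_one", &add_one_kernel},
};

// Lookup happens at call time, after the caller has had a chance to check
// which CPU features are available.
inline kernel_fn find_kernel(const char* name) {
  for (const KernelEntry& entry : kKernels) {
    if (std::strcmp(entry.name, name) == 0) {
      return entry.fn;
    }
  }
  return nullptr;
}
```

Here `find_kernel("add_one")(41)` returns 42 and an unknown name returns `nullptr`. The trade-off is that the dispatcher has to know where each table lives, which self-registering static objects avoid.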

ezyang added a commit that referenced this issue May 5, 2020
Fixes #37577

Needs tests, and maybe a lint.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: [D21386704](https://our.internmc.facebook.com/intern/diff/D21386704)

[ghstack-poisoned]