Arithmetic exception in Xbyak::util::Cpu::Cpu() when libmkldnn running in virtual machine KVM #215

moting9 · 2018-04-16T02:30:40Z

we build intel caffe with mkldnn as default engine, and running on KVM with caffe time command, there is float exception

gdb error

Program received signal SIGFPE, Arithmetic exception.
0x00007fffee1faacc in Xbyak::util::Cpu::Cpu() ()
from /home/intel/caffe/external/mkldnn/install/lib/libmkldnn.so.0

kvm config

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 2
Core(s) per socket: 32
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Platinum 8163 CPU @ 2.50GHz
Stepping: 4
CPU MHz: 2499.996
BogoMIPS: 4999.99
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 33792K
NUMA node0 CPU(s): 0-63
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm cons tant_tsc rep_good nopl eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1

emfomenk · 2018-04-16T05:39:34Z

Hi @moting9,

Thanks for the report!
Did you try the latest master?
If it still fails, could you please build mkl-dnn with debug (-DCMAKE_BUILD_TYPE=Debug), run under gdb and point to exact place and circumstances of this FPE?

moting9 · 2018-04-16T09:24:25Z

Thanks for the quick response!
I tried to use latest mkldnn and set debug flag

[root@iZhp37p4s3u8hd5b67tpboZ caffe]# ./build/tools/caffe time -model ../intel-c affe-model-v1/caffe_tianrang.prototxt
caffe: /home/work/caffe/external/mkldnn/src/src/cpu/xbyak/xbyak_util.h:180: void Xbyak::util::Cpu::setCacheHierarchy(): Assertion `smt_width != 0' failed.
Aborted

moting9 · 2018-04-16T09:49:05Z

backtrace

(gdb) bt
#0 Xbyak::util::Cpu::setCacheHierarchy (this=0x7fffee86f7e0 <mkldnn::impl::cpu::(anonymous namespace)::cpu>) at /home/work/caffe/external/mkldnn/src/src/cpu/xbyak/xbyak_util.h:179
#1 0x00007fffee188384 in Xbyak::util::Cpu::Cpu (this=0x7fffee86f7e0 <mkldnn::impl::cpu::(anonymous namespace)::cpu>) at /home/work/caffe/external/mkldnn/src/src/cpu/xbyak/xbyak_util.h:405
#2 0x00007fffee17fa6e in __static_initialization_and_destruction_0 (__initialize_p=1, __priority=65535) at /home/work/caffe/external/mkldnn/src/src/cpu/jit_generator.hpp:168
#3 0x00007fffee17faaa in _GLOBAL__sub_I_jit_avx2_1x1_conv_kernel_f32.cpp(void) () at /home/work/caffe/external/mkldnn/src/src/cpu/jit_avx2_1x1_conv_kernel_f32.cpp:677
#4 0x00007ffff7dea503 in _dl_init_internal () from /lib64/ld-linux-x86-64.so.2
#5 0x00007ffff7ddc1aa in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#6 0x0000000000000004 in ?? ()
#7 0x00007fffffffe7e8 in ?? ()
#8 0x00007fffffffe80d in ?? ()
#9 0x00007fffffffe812 in ?? ()
#10 0x00007fffffffe819 in ?? ()
#11 0x0000000000000000 in ?? ()

vpirogov · 2018-04-16T17:21:37Z

Based on this backtrace the issue is the same as #208 and is resolved in revision a5f6077. Would you please update the library to the latest revision from the master branch (not v0.13) and verify that it is resolved?

sinkingsugar · 2018-12-31T02:27:04Z

@vpirogov I am still having similar issues when using a VM (KVM, qemu, host cpu features)
e.g.:

Python 3.7.1 (default, Dec 14 2018, 19:28:38) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
zsh: floating point exception (core dumped)  python

Pretty sure its mkl-dnn cos I had to skip using it when building our framework:
https://github.com/fragcolor-xyz/nimtorch

vpirogov added the bug A confirmed library bug label Apr 16, 2018

vpirogov closed this as completed Apr 17, 2018

vvishal mentioned this issue Jan 26, 2019

core dumped (ver1.0.0) pytorch/pytorch#16183

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Arithmetic exception in Xbyak::util::Cpu::Cpu() when libmkldnn running in virtual machine KVM #215

Arithmetic exception in Xbyak::util::Cpu::Cpu() when libmkldnn running in virtual machine KVM #215

moting9 commented Apr 16, 2018 •

edited

emfomenk commented Apr 16, 2018 •

edited

moting9 commented Apr 16, 2018

moting9 commented Apr 16, 2018

vpirogov commented Apr 16, 2018

sinkingsugar commented Dec 31, 2018 •

edited

Arithmetic exception in Xbyak::util::Cpu::Cpu() when libmkldnn running in virtual machine KVM #215

Arithmetic exception in Xbyak::util::Cpu::Cpu() when libmkldnn running in virtual machine KVM #215

Comments

moting9 commented Apr 16, 2018 • edited

gdb error

kvm config

emfomenk commented Apr 16, 2018 • edited

moting9 commented Apr 16, 2018

moting9 commented Apr 16, 2018

vpirogov commented Apr 16, 2018

sinkingsugar commented Dec 31, 2018 • edited

moting9 commented Apr 16, 2018 •

edited

emfomenk commented Apr 16, 2018 •

edited

sinkingsugar commented Dec 31, 2018 •

edited