Build MKL-DNN without linking with OpenMP #166

Closed
xyzsam opened this issue Dec 12, 2017 · 11 comments

@xyzsam

xyzsam commented Dec 12, 2017

Is it possible to build MKL-DNN without OpenMP and only use a single thread? I understand that this will lead to poorer performance; however, my goal is to run some of this code inside a simulator that does not support pthreads, and for the purposes of my experiments, I only need to look at single-threaded runtime.

Also, is there a CMake flag that would allow me to specify which SIMD extension to target (for example only using SSE2 instead of AVX2)? Again, yes, I understand the performance implications of such a change.

Any guidance for how to accomplish this would be greatly appreciated.

@emfomenk

emfomenk commented Dec 12, 2017

Hi @xyzsam,

To disable OpenMP, I made the following hack.

For SIMD (assuming you are using Intel Compiler): currently the only way is to adjust the cmake/platform.cmake file and change the following line:

set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} -xHOST")

Please replace "-xHOST" with the appropriate value, such as "-xSSE2". If you are using gcc, no target is specified, though that will change in the future. We are also adding -DARCH_OPT_FLAGS= so that you can make the adjustment on the fly (rather than hacking the cmake file).

@xyzsam
Author

xyzsam commented Dec 12, 2017

What about the external dependencies that get downloaded? Sounds like I would need to build a custom version of MKL that doesn't use the SIMD extensions I don't want and link MKL-DNN with those instead?

@emfomenk

MKL-DNN doesn't download anything implicitly.
You have two options:

  • do not link MKL-DNN against Intel MKL (do not specify MKLROOT)
  • link MKL-DNN against full Intel MKL, and set
export MKL_THREADING_LAYER=sequential # so that MKL does not use OpenMP
export MKL_ENABLE_INSTRUCTIONS=SSE4_2

For more details see Intel MKL user's guide.
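
As a rough sketch of the second option, the same two variables can also be set from inside the program instead of the shell. Whether Intel MKL honors values set this late is an assumption here (exporting them before launch, as above, is the approach given in this thread), and `configure_mkl_for_simulator` is a hypothetical helper name, not part of MKL or MKL-DNN.

```cpp
#include <stdlib.h>  // setenv (POSIX, not ISO C++)
#include <string>

// Hypothetical helper mirroring the two `export` lines above.
// Assumption: it is called before any MKL or MKL-DNN function runs.
// Returns true only if both variables were set successfully.
bool configure_mkl_for_simulator(const std::string& isa = "SSE4_2") {
    // overwrite = 1 replaces any value inherited from the shell
    if (setenv("MKL_THREADING_LAYER", "sequential", 1) != 0) return false;
    if (setenv("MKL_ENABLE_INSTRUCTIONS", isa.c_str(), 1) != 0) return false;
    return true;
}
```

Calling `configure_mkl_for_simulator()` as the first statement of `main` would be the natural place for it; if that turns out to be too late for a given MKL build, the shell exports remain the safe route.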

@xyzsam
Author

xyzsam commented Dec 12, 2017

Thanks. I linked MKL-DNN against the full Intel MKL and was able to disable OpenMP and use SSE4. It looks like the simulated CPU doesn't play too nicely with the JIT though. I get an Xbyak::Error ("internal error") thrown during the simulation. What is the purpose of the JIT?

Also, I used MKL_CBWR to specify which code branch I wanted to use - this should not pose a problem for MKL-DNN since it's acting on the underlying MKL library, right?

@rsdubtso

The JIT is used to actually generate the computational code at run-time. Does your simulator execute online, or does it collect traces and then simulate them offline? I would guess that the error you are getting is due to Xbyak not being able to map a page with executable permissions...
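
One way to test that guess in isolation is a small probe that asks the (simulated) OS for an executable page without involving MKL-DNN at all. The sketch below is not Xbyak's allocator; it assumes a POSIX environment, and `probe_executable_page` and `ExecProbe` are hypothetical names.

```cpp
#include <sys/mman.h>  // mmap, mprotect, munmap (POSIX)
#include <unistd.h>    // sysconf
#include <cstddef>

// Result of the probe: which step, if any, was refused.
enum class ExecProbe { Ok, MmapFailed, MprotectFailed };

// Hypothetical probe: maps one anonymous read/write page, then asks
// mprotect to add execute permission, which a JIT needs before it can
// run generated code. Assumption: the JIT under test obtains executable
// memory through one of these two system calls.
ExecProbe probe_executable_page() {
    const long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) return ExecProbe::MmapFailed;
    const std::size_t len = static_cast<std::size_t>(page);

    // Step 1: plain read/write anonymous mapping.
    void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return ExecProbe::MmapFailed;

    // Step 2: upgrade the page to read/write/execute.
    const int rc = mprotect(p, len, PROT_READ | PROT_WRITE | PROT_EXEC);
    munmap(p, len);
    return rc == 0 ? ExecProbe::Ok : ExecProbe::MprotectFailed;
}
```

If this returns `Ok` under the simulator, the system-call emulation is probably not the cause and the failure lies elsewhere.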

@xyzsam
Author

xyzsam commented Dec 13, 2017

It's possible that this is what is happening. Is there a way to turn off the JIT and use ahead-of-time compilation?

@rsdubtso

That is not possible, unfortunately. However, code generation happens at primitive creation time. Can you somehow execute but not simulate that part?

@xyzsam
Author

xyzsam commented Dec 13, 2017

Hm, fast-forwarding the simulation is an idea that I will look into, thanks. Does MKL-DNN do anything at binary load time? As far as I can tell, the simulator never even gets to the main function before it crashes with that Xbyak::Error, and gdb produces the following stack trace:

#0 0x00007ffff714b8b0 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#1 0x00007ffff747abbd in mkldnn::impl::cpu::(anonymous namespace)::get_cache_size(int, bool) [clone .constprop.170] () from /usr/local/mkl-dnn/lib/libmkldnn.so.0
#2 0x00007ffff747abea in _GLOBAL__sub_I_jit_avx512_common_conv_winograd_kernel_f32.cpp () from /usr/local/mkl-dnn/lib/libmkldnn.so.0
#3 0x00007ffff7de76ba in call_init (l=<optimized out>, argc=argc@entry=2, argv=argv@entry=0x7fffffffedc8, env=env@entry=0x7fffffffede0) at dl-init.c:72
#4 0x00007ffff7de77cb in call_init (env=0x7fffffffede0, argv=0x7fffffffedc8, argc=2, l=<optimized out>) at dl-init.c:30
#5 _dl_init (main_map=0x7ffff7ffe168, argc=2, argv=0x7fffffffedc8, env=0x7fffffffede0) at dl-init.c:120
#6 0x00007ffff7dd7c6a in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#7 0x0000000000000002 in ?? ()
#8 0x00007fffffffef45 in ?? ()
#9 0x00007fffffffef89 in ?? ()
#10 0x0000000000000000 in ?? ()
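
Frames #2 to #5 are what a load-time initializer looks like: `_GLOBAL__sub_I_...` is the compiler-generated routine that runs the dynamic initialization of namespace-scope objects in that source file, and the dynamic loader calls it from `_dl_init` before `main` starts. A minimal sketch of the mechanism, with hypothetical names and a placeholder value (this is not MKL-DNN code):

```cpp
namespace demo {

int probe_calls = 0;  // zero-initialized before any dynamic initialization

// Stand-in for a hardware query such as a cache-size lookup.
int probe_cache_levels() {
    ++probe_calls;
    return 3;  // placeholder value, not read from hardware
}

// Eager: a namespace-scope object whose initializer runs when the binary
// or shared library is loaded, i.e. before main() is entered (this is the
// behavior of GCC and Clang on Linux).
int eager_levels = probe_cache_levels();

// Lazy: a function-local static, initialized on the first call only.
int lazy_levels() {
    static const int levels = probe_cache_levels();
    return levels;
}

}  // namespace demo
```

If `probe_cache_levels` threw in the eager case, no handler in `main` could catch it, and the process would terminate during loading, which is consistent with never reaching `main`.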

Let me know if that stack trace helps at all. From what I can guess, it's one or more of these reasons:

  1. I know that the mmap system call emulation is somewhat basic, so if not being able to map an executable page is the reason, I can probably fix it.
  2. MKL-DNN is trying to use cpuid to get the cache and core topology of the simulated system, and I know that the cpuid instruction is only partially implemented; in particular, EAX=2 and EAX=7 are unimplemented.
  3. This simulator only supports SSE instructions, not AVX instructions, so the fact that this is happening inside an AVX-512-related function means it wasn't able to figure out what ISA extensions the simulated CPU supports.
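
Reasons 2 and 3 can be checked directly by asking the simulated CPU what it reports. The sketch below assumes GCC or Clang on x86 (it relies on their `<cpuid.h>` helpers); `IsaSupport` and `query_isa_support` are hypothetical names, and this is not how MKL-DNN itself performs detection.

```cpp
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  // GCC/Clang cpuid helpers (toolchain assumption)
#endif

// CPU-reported feature bits only; OS support for saving AVX state
// (checked via XGETBV on real systems) is deliberately not covered.
struct IsaSupport {
    unsigned max_basic_leaf = 0;  // highest basic cpuid leaf reported
    bool sse4_2 = false;          // leaf 1, ECX bit 20
    bool avx = false;             // leaf 1, ECX bit 28
    bool avx2 = false;            // leaf 7 subleaf 0, EBX bit 5
    bool avx512f = false;         // leaf 7 subleaf 0, EBX bit 16
};

IsaSupport query_isa_support() {
    IsaSupport s;  // stays all-false on non-x86 builds
#if defined(__x86_64__) || defined(__i386__)
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    s.max_basic_leaf = __get_cpuid_max(0, nullptr);
    // __get_cpuid returns 0 when the requested leaf is not supported.
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        s.sse4_2 = ((ecx >> 20) & 1u) != 0;
        s.avx = ((ecx >> 28) & 1u) != 0;
    }
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        s.avx2 = ((ebx >> 5) & 1u) != 0;
        s.avx512f = ((ebx >> 16) & 1u) != 0;
    }
#endif
    return s;
}
```

On an SSE-only simulator one would expect `sse4_2` to be true and the AVX fields false; a `max_basic_leaf` below 7 would indicate that leaf 7 is not even advertised.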

@xyzsam
Author

xyzsam commented Dec 13, 2017

Looks like the cpuid instruction is the culprit for this current crash. The INTERNAL_ERROR is thrown after get_cache_size() returns data_cache_levels = 0. I'll implement the missing cpuid functionality in the simulator and report back.
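
For reference, this is roughly how cache levels can be enumerated from cpuid leaf 4. It is an illustrative sketch, not MKL-DNN's `get_cache_size`; it assumes GCC or Clang on x86, and `CacheLevel` and `data_caches` are hypothetical names. A simulator that returns zeros for leaf 4 would yield an empty list here.

```cpp
#include <vector>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  // GCC/Clang cpuid helpers (toolchain assumption)
#endif

struct CacheLevel {
    unsigned level;                 // 1 = L1, 2 = L2, ...
    unsigned long long size_bytes;  // ways * partitions * line size * sets
};

// Walks the subleaves of cpuid leaf 4 and keeps data and unified caches.
std::vector<CacheLevel> data_caches() {
    std::vector<CacheLevel> out;
#if defined(__x86_64__) || defined(__i386__)
    if (__get_cpuid_max(0, nullptr) < 4) return out;  // leaf 4 not reported
    for (unsigned sub = 0; sub < 16; ++sub) {         // 16 is a safety bound
        unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
        __cpuid_count(4, sub, eax, ebx, ecx, edx);
        const unsigned type = eax & 0x1f;  // 0 none, 1 data, 2 instr, 3 unified
        if (type == 0) break;              // no more cache descriptors
        if (type != 1 && type != 3) continue;
        const unsigned long long ways = ((ebx >> 22) & 0x3ff) + 1;
        const unsigned long long partitions = ((ebx >> 12) & 0x3ff) + 1;
        const unsigned long long line = (ebx & 0xfff) + 1;
        const unsigned long long sets = static_cast<unsigned long long>(ecx) + 1;
        out.push_back({(eax >> 5) & 0x7, ways * partitions * line * sets});
    }
#endif
    return out;
}
```

An empty result from `data_caches()` corresponds to the "zero data cache levels" situation described above.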

@rsdubtso

Okay, closing this for now. Feel free to reopen if you have any additional questions. And if you have any pointers to what you are working on, please do post them here; it sounds very interesting.

@xyzsam
Author

xyzsam commented Dec 15, 2017

I was finally able to run MKL-DNN under the simulator, which only supports up to SSE4.2. It was mostly a matter of hacking the cpuid instruction implementation to return "GenuineIntel" for the vendor string and the right cache parameters for leaf 4. I disabled OpenMP by linking against the full MKL and setting the appropriate env variables, and I was able to tell MKL to use only SSE4.2 instructions with the env variable as well.
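
For anyone reproducing this, a quick way to see what vendor string the simulated CPU returns is to read cpuid leaf 0 directly. A minimal sketch, assuming GCC or Clang on x86; `cpu_vendor` and `is_genuine_intel` are hypothetical helper names, and how exactly MKL or MKL-DNN uses the string is not shown here.

```cpp
#include <cstring>
#include <string>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  // GCC/Clang cpuid helpers (toolchain assumption)
#endif

// Returns the 12-character vendor string from cpuid leaf 0,
// or an empty string on non-x86 builds.
std::string cpu_vendor() {
#if defined(__x86_64__) || defined(__i386__)
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx)) return std::string();
    char buf[12];
    // The string is laid out across EBX, EDX, ECX, in that order.
    std::memcpy(buf, &ebx, 4);
    std::memcpy(buf + 4, &edx, 4);
    std::memcpy(buf + 8, &ecx, 4);
    return std::string(buf, sizeof(buf));
#else
    return std::string();
#endif
}

bool is_genuine_intel() { return cpu_vendor() == "GenuineIntel"; }
```

Running this under the simulator before and after the cpuid change makes it easy to confirm that the new vendor string is actually visible to user code.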
