
Huge performance difference between Intel OpenMP and GCC OpenMP #230

Closed
yinghai opened this issue Apr 30, 2018 · 21 comments
@yinghai

yinghai commented Apr 30, 2018

Hi folks,
I am trying out mkl-dnn. We have the Intel MKL library installed, but for some reason we don't want to use Intel's OpenMP (to prevent a clash with an upstream project that could potentially use GCC OpenMP). So I hacked cmake/OpenMP.cmake to remove the lines below https://github.com/intel/mkl-dnn/blob/40eb4d898e4caf7f23fdd25b348915134e878080/cmake/OpenMP.cmake#L51.

After that I ran examples/simple-net-cpp and noticed a huge difference in performance. Here is the comparison:

# With Intel OpenMP
[yinghai:build: (master)]$ ldd examples/simple-net-cpp
        linux-vdso.so.1 =>  (0x00007ffc2c1a2000)
        libmkldnn.so.0 => /home/yinghai/local/aml/mkl-dnn/build/src/libmkldnn.so.0 (0x00007f1e76a70000)
        libmkl_rt.so => /home/engshare/third-party2/IntelComposerXE/2017.0.098/gcc-5-glibc-2.23/9bc6787/mkl/lib/intel64/libmkl_rt.so (0x00007f1e7645f000)
        libiomp5.so => /home/engshare/third-party2/IntelComposerXE/2017.0.098/gcc-5-glibc-2.23/9bc6787/compiler/lib/intel64/libiomp5.so (0x00007f1e760bb000)
        libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f1e75d9f000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f1e75a9c000)
        libgomp.so.1 => /lib64/libgomp.so.1 (0x00007f1e75876000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f1e75660000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f1e75443000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f1e75080000)
        /lib64/ld-linux-x86-64.so.2 (0x000056408a68c000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f1e74e7c000)
[yinghai:build: (master)]$ examples/simple-net-cpp
Use time 13.506

# With GCC OpenMP
[yinghai:build: (master)]$ ldd examples/simple-net-cpp
        linux-vdso.so.1 =>  (0x00007fff35b54000)
        libmkldnn.so.0 => /home/yinghai/local/aml/mkl-dnn/build/src/libmkldnn.so.0 (0x00007f04bfb61000)
        libmkl_rt.so => /home/engshare/third-party2/IntelComposerXE/2017.0.098/gcc-5-glibc-2.23/9bc6787/mkl/lib/intel64/libmkl_rt.so (0x00007f04bf550000)
        libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f04bf234000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f04bef32000)
        libgomp.so.1 => /lib64/libgomp.so.1 (0x00007f04bed0b000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f04beaf5000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f04be8d9000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f04be515000)
        /lib64/ld-linux-x86-64.so.2 (0x000055d488ad4000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f04be311000)
[yinghai:build: (master)]$ examples/simple-net-cpp
Use time 182.613

My question is whether this is expected. And how does a change of OpenMP implementation have such a large impact? Thanks.

@emfomenk

Hi @yinghai,

OpenMP is used for parallelization. The more cores you have, the higher the performance you get.

A rough estimate of the scaling is linear: 2x more cores, 2x faster MKL-DNN. There are other factors as well, of course. Based on the data you sent, it seems you have a ~12-core system, for which using OpenMP is essential to get good performance.

@yinghai
Author

yinghai commented Apr 30, 2018

@emfomenk Thanks. But my question is not about using or not using OpenMP. It's about the performance difference between using Intel's OpenMP library (libiomp) and using GCC's OpenMP. In both cases I can see from top that it's burning multi-core CPUs. But when using libiomp it's just much faster, which I didn't expect.

@emfomenk

emfomenk commented Apr 30, 2018

I beg your pardon, I read the original message inaccurately...
Could you please export MKL_THREADING_LAYER=GNU and run the example with GNU OpenMP once more?

 $ MKL_THREADING_LAYER=GNU examples/simple-net-cpp

@yinghai
Author

yinghai commented Apr 30, 2018

@emfomenk Cool! With this env var, it gives performance similar to the iomp results. Thanks for the tip. However, MKL_THREADING_LAYER=GNU seems to be undocumented and I cannot find it in the mkl-dnn code base. I think using an env var as a switch is error-prone in production, especially when the flag is undocumented. Is there a different way to force using the GNU threading layer at compile time? Thanks.

@emfomenk

Typically Intel MKL-DNN is built with Intel MKL-ML (the small subset of Intel MKL). In that case, if you want to use GNU OpenMP, you should link against libmklml_gnu.so (instead of libmklml_intel.so) and no environment variables are required.

In your case you built Intel MKL-DNN with the full Intel MKL (by linking with libmkl_rt.so). The libmkl_rt.so library is basically a dispatcher and has logic inside that chooses which threading layer to load. If nothing is specified, then Intel OpenMP threading is chosen. And that was the cause of your issue -- you are not allowed to mix two OpenMP run-times (Intel OpenMP and GNU OpenMP) in one application.

If either the MKL_THREADING_LAYER environment variable is set to GNU or mkl_set_threading_layer is called with MKL_THREADING_GNU, then GNU OpenMP is used.

Is there a different way to force using the GNU threading layer at compile time?

You have 4 choices:

  • link with the full Intel MKL (via libmkl_rt.so) and use mkl_set_threading_layer with MKL_THREADING_GNU. Please note that in this case the function must be called at the very beginning of the program (see the sketch after this list).
  • link with the full Intel MKL (via libmkl_rt.so) and use the environment variable
  • link with the full Intel MKL (via libmkl_intel_lp64.so libmkl_gnu_thread.so libmkl_core.so). Alas, this will require changes in our CMake build system
  • link with libmklml_gnu.so from Intel MKL-ML, e.g. mklml_lnx_2018.0.3.20180406.tgz

In the latter two cases no extra settings are required.
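
For illustration, a minimal sketch of the first option might look like the following (assuming mkl.h from the full Intel MKL is on the include path and the application links against libmkl_rt.so):

#include <mkl.h>   // declares mkl_set_threading_layer() and MKL_THREADING_GNU

int main() {
    // Must run before any other Intel MKL / Intel MKL-DNN call, otherwise
    // libmkl_rt.so may already have loaded the default Intel OpenMP layer.
    mkl_set_threading_layer(MKL_THREADING_GNU);

    // ... the rest of the application (e.g. MKL-DNN primitives) ...
    return 0;
}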

@yinghai
Author

yinghai commented Apr 30, 2018

link with the full Intel MKL (via libmkl_intel_lp64.so libmkl_gnu_thread.so libmkl_core.so). Alas, this will require changes in our CMake build system

We do have those libraries in our MKL installation. Let me try that! Thanks!

@yinghai
Author

yinghai commented May 1, 2018

@emfomenk Thanks. I tried option 3 by changing the CMake file and it worked!

https://github.com/yinghai/mkl-dnn/blob/mkl/cmake/MKL.cmake#L141-L158

@vpirogov
Member

vpirogov commented May 4, 2018

Good to hear that it works!

@vpirogov vpirogov closed this as completed May 4, 2018
@ftian1

ftian1 commented Jan 3, 2019

you are not allowed to mix two OpenMP run-times (Intel OpenMP and GNU OpenMP) in one application.

@emfomenk @yinghai I beg your pardon, but I don't understand this. According to yinghai's first ldd output, the slower run only uses GNU OpenMP, while the faster one seems to mix GOMP and IOMP? Am I missing something?

With Intel OpenMP
libiomp5.so => /home/engshare/third-party2/IntelComposerXE/2017.0.098/gcc-5-glibc-2.23/9bc6787/compiler/lib/intel64/libiomp5.so (0x00007f1e760bb000)
libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f1e75d9f000)
libm.so.6 => /lib64/libm.so.6 (0x00007f1e75a9c000)
libgomp.so.1 => /lib64/libgomp.so.1 (0x00007f1e75876000)
Use time 13.506

With GCC OpenMP
libgomp.so.1 => /lib64/libgomp.so.1 (0x00007f04bed0b000)
Use time 182.613

@emfomenk

emfomenk commented Jan 3, 2019

Hi @ftian1,

The missing part is:

With GCC OpenMP
libmkl_rt.so => ...
libgomp.so.1 => ...
Use time 182.613

Here, libmkl_rt by default loads Intel OpenMP. That was the cause of the issue, and that is why I recommended using:

export MKL_THREADING_LAYER=GNU

The example was compiled with GNU OpenMP, but Intel MKL-DNN was not instructed to use GNU OpenMP, so it used Intel OpenMP. When you mix two OpenMP run-times in one application you might get huge performance penalties and even incorrect results. So this is prohibited.

In general there is no big difference whether you use GNU OpenMP or Intel OpenMP (as long as you use only one of them). Both are very well optimized.

@ftian1

ftian1 commented Jan 3, 2019

Thanks, Evarist.

"ldd libmklml_intel.so" shows it depends on libiomp5.so.
"ldd libmklml_gnu.so" shows it depends on libgomp.so.
Do you know why "ldd libmkl_rt.so" doesn't show a dependency on libiomp5.so by default? Is it a static build?

@emfomenk

emfomenk commented Jan 3, 2019

Both libmklml_intel.so and libmklml_gnu.so are built from the static Intel MKL for simplicity of redistribution.

libmklml_intel.so = DSO{ libmkl_intel_lp64.a libmkl_intel_thread.a libmkl_core.a }
libmklml_gnu.so   = DSO{ libmkl_intel_lp64.a libmkl_gnu_thread.a   libmkl_core.a }

However, libmkl_rt.so is just a dispatcher for the different configurations. It doesn't have any explicit dependencies, but loads all the necessary pieces at runtime (depending on environment variables or function calls). We call it the Intel MKL Single Dynamic Library. You can read more about it here.

@ftian1

ftian1 commented Jan 3, 2019

Sorry, I may not have described my doubt clearly.

I am still confused by the slow performance of simple_net_cpp with GCC OpenMP, although you have explained that the libmkl_rt.so dispatcher invokes IOMP or GOMP according to an env var or a function call, and that this app mixed the use of IOMP and GOMP.

My understanding is that the "ldd" command shows all dynamic library dependencies of an executable. From the ldd log shared by yinghai, the simple_net_cpp built with GCC OpenMP doesn't show a dependency on libiomp, so how could it end up invoking the IOMP implementation through libmkl_rt.so? The only explanation I can think of is that libmkl_rt.so statically links the IOMP library, but you have said it doesn't. I am confused on this point. Hoping I presented my question clearly :(

With GCC OpenMP
libmkl_rt.so => ...
libgomp.so.1 => ...
Use time 182.613

@emfomenk

emfomenk commented Jan 3, 2019

ldd shows only the explicit dependencies that you pass to the linker at DSO or application link time:

$ gcc -shared foo.o -olibfoo.so -lbar
$ ldd libfoo.so
libbar.so => ...

However, you can also have implicit dependencies on other libraries via dlopen() and dlsym(). At runtime a library or an application can load other libraries and call functions from them. In this case ldd is not able to show those dependencies.

That's exactly what libmkl_rt.so does, and that is why ldd libmkl_rt.so doesn't show anything.

One of the reasons for that is that libmkl_rt should be able to work with different (incompatible) configurations. For instance, if it were explicitly linked with both libmkl_intel_thread.so and libmkl_gnu_thread.so that would be a disaster -- these libraries conflict with each other, hence libmkl_rt.so would become useless.
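
To make this concrete, a minimal sketch of the dlopen()/dlsym() mechanism could look like the code below (libbar.so and bar() are just placeholder names for this example, not anything from Intel MKL):

#include <dlfcn.h>   // dlopen, dlsym, dlclose
#include <cstdio>

int main() {
    // Load the library at runtime; ldd on this binary will not list libbar.so,
    // because the dependency only appears when dlopen() actually runs.
    void *handle = dlopen("libbar.so", RTLD_NOW);
    if (!handle) {
        std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }

    // Look up a symbol by name and call it through a function pointer.
    using bar_fn = void (*)();
    auto bar = reinterpret_cast<bar_fn>(dlsym(handle, "bar"));
    if (bar)
        bar();

    dlclose(handle);
    return 0;
}

Build with something like: g++ runtime_load.cpp -o runtime_load -ldl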

@ftian1

ftian1 commented Jan 4, 2019

@emfomenk I see. I forgot about dlopen() after not having used it for a long time... :(

@cjolivier01
Contributor

I can't find any references to omp:

[chriso@chriso-ripper:~/src/mxnet/build (master)]ls /opt/intel/_mkl/lib/intel64/l
libmkl_avx2.so                  libmkl_blacs_sgimpt_lp64.so     libmkl_intel_ilp64.so           libmkl_scalapack_lp64.so
libmkl_avx512_mic.so            libmkl_blas95_ilp64.a           libmkl_intel_lp64.a             libmkl_sequential.a
libmkl_avx512.so                libmkl_blas95_lp64.a            libmkl_intel_lp64.so            libmkl_sequential.so
libmkl_avx.so                   libmkl_cdft_core.a              libmkl_intel_thread.a           libmkl_tbb_thread.a
libmkl_blacs_intelmpi_ilp64.a   libmkl_cdft_core.so             libmkl_intel_thread.so          libmkl_tbb_thread.so
libmkl_blacs_intelmpi_ilp64.so  libmkl_core.a                   libmkl_lapack95_ilp64.a         libmkl_vml_avx2.so
libmkl_blacs_intelmpi_lp64.a    libmkl_core.so                  libmkl_lapack95_lp64.a          libmkl_vml_avx512_mic.so
libmkl_blacs_intelmpi_lp64.so   libmkl_def.so                   libmkl_mc3.so                   libmkl_vml_avx512.so
libmkl_blacs_openmpi_ilp64.a    libmkl_gf_ilp64.a               libmkl_mc.so                    libmkl_vml_avx.so
libmkl_blacs_openmpi_ilp64.so   libmkl_gf_ilp64.so              libmkl_pgi_thread.a             libmkl_vml_cmpt.so
libmkl_blacs_openmpi_lp64.a     libmkl_gf_lp64.a                libmkl_pgi_thread.so            libmkl_vml_def.so
libmkl_blacs_openmpi_lp64.so    libmkl_gf_lp64.so               libmkl_rt.so                    libmkl_vml_mc2.so
libmkl_blacs_sgimpt_ilp64.a     libmkl_gnu_thread.a             libmkl_scalapack_ilp64.a        libmkl_vml_mc3.so
libmkl_blacs_sgimpt_ilp64.so    libmkl_gnu_thread.so            libmkl_scalapack_ilp64.so       libmkl_vml_mc.so
libmkl_blacs_sgimpt_lp64.a      libmkl_intel_ilp64.a            libmkl_scalapack_lp64.a         locale/
[chriso@chriso-ripper:~/src/mxnet/build (master)]ls /opt/intel/_mkl/lib/intel64/libmklml_intel.so
/bin/ls: cannot access '/opt/intel/_mkl/lib/intel64/libmklml_intel.so': No such file or directory
[chriso@chriso-ripper:~/src/mxnet/build (master)]grep libomp  /opt/intel/_mkl/lib/intel64/libmklml_intel.so
grep: /opt/intel/_mkl/lib/intel64/libmklml_intel.so: No such file or directory
[chriso@chriso-ripper:~/src/mxnet/build (master)]grep libomp  /opt/intel/_mkl/lib/intel64/*
grep: /opt/intel/_mkl/lib/intel64/locale: Is a directory
[chriso@chriso-ripper:~/src/mxnet/build (master)]grep libomp  /opt/intel/_mkl/lib/intel64/libmkl_gnu_thread.so
[chriso@chriso-ripper:~/src/mxnet/build (master)]ldd  /opt/intel/_mkl/lib/intel64/libmkl_gnu_thread.so
        linux-vdso.so.1 (0x00007ffe6412e000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fe301f1a000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fe301cfb000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fe30190a000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fe3039f7000)
[chriso@chriso-ripper:~/src/mxnet/build (master)]ldd  /opt/intel/_mkl/lib/intel64/libmkl_intel_thread.so 
        linux-vdso.so.1 (0x00007ffe0eff8000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f3abe103000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f3abdee4000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f3abdaf3000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f3ac0873000)
[chriso@chriso-ripper:~/src/mxnet/build (master)]ldd  /opt/intel/_mkl/lib/intel64/libmkl_intel_ilp64.so
        linux-vdso.so.1 (0x00007fff56f76000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f1f04714000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1f04323000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f1f053d7000)
[chriso@chriso-ripper:~/src/mxnet/build (master)]ldd  /opt/intel/_mkl/lib/intel64/libmkl_intel_lp64.so
        linux-vdso.so.1 (0x00007fff68b6f000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fdd3b729000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fdd3b338000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fdd3c499000)
[chriso@chriso-ripper:~/src/mxnet/build (master)]

@cjolivier01
Contributor

^^ btw, this is the latest 2020 version of MKL that I just downloaded today

@vpirogov
Member

vpirogov commented Feb 28, 2020

@cjolivier01,

Intel MKL manages the OpenMP runtime in its dynamic libraries using dlopen, so there's no explicit dependency on libomp, libiomp or libgomp. The dependency is localized in the threading layer:

  • libmkl_intel_thread.so loads libiomp5.so
  • libmkl_gnu_thread.so loads libgomp.so
  • libmkl_tbb_thread.so and libmkl_sequential.so do not load OpenMP libraries

@cjolivier01
Contributor

How can I tell it which one to load, or know which one it will load? It's not reasonable to disallow a previously loaded OpenMP, especially LLVM OpenMP, since this is largely out of our control via Clang or transitive dependencies. It's actually quite offensive that MKL tries to dictate to me stuff that should be under my discretion.

@rsdubtso

You choose the OpenMP threading layer to link to the library based on the compiler you use. If you use TBB, you link to the TBB library. If you link to libmkl_rt.so, you need to set it up using the controls provided.

The fact that there are multiple OpenMP threading layers is an unfortunate consequence of the fact that OpenMP libraries from different compilers are not compatible. Also, note that Clang OpenMP is not supported, as far as I know. Maybe @aaraujom can correct me here.

@aaraujom
Contributor

aaraujom commented Mar 2, 2020

As far as I know, Clang is not supported by Intel MKL. I recommend checking the Intel MKL Link Line Advisor for supported linking configurations.
