
multithreaded dftd3 SCF grad fail with intel compiler #1007

Closed
TermeHansen opened this issue May 4, 2018 · 6 comments

@TermeHansen

When running psi4 (both version 1.1 and 1.2rc1) compiled with the Intel compilers, a dftd3 optimization (optimize('pbe0-d3bj')) hangs at the start of the SCF gradient, and I can see CPU usage drop from all cores to just one. If I set -n 1, it runs fine.

I didn't have this problem with psi4 compiled with gcc. Any ideas why this happens, and what I can do?

@loriab loriab added this to the Psi4 1.2 milestone May 4, 2018
@dgasmith
Copy link
Member

dgasmith commented May 9, 2018

@TermeHansen Did you compile this with Intel yourself, or did you use the prebuilt conda environments? If you built it yourself, please post your CMake script; I suspect this is a NumPy/Psi4 OMP shared-object conflict.

@TermeHansen
Copy link
Author

Yes, I believe you are right. I've only seen it with my own from-scratch build, never with a conda environment. Also, I am migrating to Ubuntu 18.04 right now, and I don't think I have seen the problem with gcc 7.3.0, compared to the gcc 5.4 in Ubuntu 16.04. I use the Python/NumPy from a PPA; does this mean I should also consider compiling NumPy myself with the same compiler to avoid this type of issue? What is your experience on how to avoid it?

@dgasmith
Copy link
Member

dgasmith commented May 9, 2018

@loriab can discuss this more, but the largest issue is when NumPy and Psi4 have different OMP runtimes via BLAS: GOMP and IOMP do not play well together. So naively you would either need to use GCC (as this is likely what NumPy's BLAS is linked against), or build your own NumPy and link it against the same Intel BLAS you plan to use with Psi4.

At this point we highly recommend using a conda environment or our binaries. Both are ICC-compiled with MKL and are optimized for multiple architectures, from SSE2 (?) to AVX-512. If you really want to compile Psi4 yourself, we recommend using the p4dev environment with the built-in path manager (see here) to help avoid these conflicts.

Lots of fun intricacies to make something like Python really work with large C++ backends :)

LAB EDIT: the 1.1 binaries statically link MKL in psi4 and are potentially susceptible to the problems mentioned. Binaries after mid-July 2017 are safe (provided NumPy is MKL RT).
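A quick way to see which side of this you are on is to ask NumPy what it was built against. This minimal sketch uses only NumPy's public show_config() helper; checking Psi4's own linking still requires ldd on its shared objects, as discussed further down the thread.

```python
# Minimal check of which BLAS/LAPACK NumPy was built against.
# If NumPy reports MKL while your Psi4 build links a GOMP-threaded BLAS
# (or vice versa), you are in the mixed-OMP-runtime situation described above.
import numpy as np

np.show_config()  # prints the BLAS/LAPACK libraries NumPy was linked with
```

If the output mentions mkl_rt, NumPy is on MKL RT; if it lists mkl_intel_lp64/mkl_intel_thread/mkl_core, it is the "MKL Trio" described in the next comment.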

@loriab
Copy link
Member

loriab commented May 10, 2018

I hadn't seen exactly this manifestation of the BLAS issue, but yes, it's why I was strongly advising against statically linking MKL in the other thread.

  • "MKL Trio" := mkl_intel_lp64 mkl_intel_thread mkl_core
  • "MKL RT" := libmkl_rt
  • if the Psi4 CMake can find MKL RT, that's what it'll link to for BLAS rather than MKL trio
  • current NumPy from defaults conda channel links against MKL RT. (until recently, it linked against MKL Trio, which was why we sometimes advised getting numpy from the Intel conda channel, which has long linked against RT)
  • so it's safe to get psi4 & numpy via conda install psi4 -c psi4/label/dev because that'll pull numpy from defaults and psi4 from psi4 and both have the same BLAS linking. (may want to conda update numpy to make sure your numpy is the recent build.) can always ldd them to inspect.
  • for the same reasons as above, it's safe to build psi4 from source against the conda psi4-dev package. That package provides MKL from conda and instructs psi4 cmake to use MKL RT. It also provides NumPy which (so long as recent build) uses MKL RT.
  • unless you want to build NumPy yourself, the numpy you have pretty much determines how you must build or use psi4. So it's dangerous to:
    • use statically linked MKL in psi4 and use NumPy with dynamically linked MKL
    • use system BLAS like Apple's Accelerate with psi4 and use NumPy with dynamically linked MKL
    • use MKL RT with psi4 and use NumPy with MKL trio

Since the PPA probably doesn't have a license to distribute MKL, it's likely that Psi4 w/ MKL RT plus their NumPy is also dangerous. What are they linked against, if you can find out easily through ldd (you have to burrow pretty deep into numpy to find a .so)?

@dgasmith
Copy link
Member

dgasmith commented May 10, 2018

The NumPy .so that links BLAS can be found at python -c "import os; import numpy as np; print(os.path.dirname(np.__file__))" + /core/multiarray.*.so, where the star depends on your Python installation.
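That lookup can be sketched as a single script. The glob pattern here is an assumption on my part: the module is named multiarray.*.so in NumPy releases of this era and _multiarray_umath.*.so (under numpy/_core) in later ones, so a "*core/*multiarray*" pattern should catch both.

```python
# Sketch: locate NumPy's BLAS-bearing extension module, then run
# `ldd <path>` on each printed file to see which BLAS/OMP runtime it links.
import glob
import os

import numpy as np

numpy_dir = os.path.dirname(np.__file__)
# "*core" matches numpy/core (older releases) and numpy/_core (newer ones).
pattern = os.path.join(numpy_dir, "*core", "*multiarray*")
for so_path in glob.glob(pattern):
    print(so_path)
```

On Linux, pipe each printed path through ldd and grep for mkl, omp, or blas to see which runtime NumPy actually pulls in at load time.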

@loriab
Copy link
Member

loriab commented May 22, 2018

This sounds resolved. Please reopen if further issues.

@loriab loriab closed this as completed May 22, 2018