
multithreaded dftd3 SCF grad fail with intel compiler #1007

Closed
TermeHansen opened this issue May 4, 2018 · 6 comments

@TermeHansen

When running psi4 (both version 1.1 and 1.2rc1) compiled with the Intel compilers, a dftd3 optimization (optimize('pbe0-d3bj')) hangs at the start of the SCF gradient, and I can see CPU usage drop from all cores to just one. If I set -n 1, it runs fine.

I didn't have this problem with psi4 compiled with gcc. Any ideas why this happens, and what I can do?

@loriab loriab added this to the Psi4 1.2 milestone May 4, 2018
@dgasmith
Copy link
Member

dgasmith commented May 9, 2018

@TermeHansen Did you compile this with Intel yourself, or did you use the prebuilt conda environments? If you built it yourself, please post your CMake script; I suspect this is a NumPy/Psi4 OMP shared-object conflict.

@TermeHansen
Copy link
Author

Yes, I believe you are right. I've only seen it with my own from-scratch build, never with a conda environment. Also, I am migrating to Ubuntu 18.04 right now, and I don't think I have seen the problem with gcc 7.3.0, compared to the gcc 5.4 in Ubuntu 16.04. I use the Python/NumPy from a PPA; does this mean I should also consider compiling NumPy myself with the same compiler to avoid this type of issue? What is your experience on how to avoid it?

@dgasmith
Copy link
Member

dgasmith commented May 9, 2018

@loriab can discuss this more, but the largest issue is when NumPy and Psi4 have different OMP runtimes via BLAS: GOMP and IOMP do not play well together. So naively you would either need to use GCC (as this is likely what NumPy's BLAS is linked against), or build your own NumPy and link it against the same Intel BLAS you plan to use with Psi4.

At this point we highly recommend using a conda environment or our binaries. Both are ICC-compiled with MKL and are optimized for multiple architectures, from SSE2 (?) to AVX-512. If you really want to compile Psi4 yourself, we recommend using the p4dev environment with the built-in path manager (see here) to help avoid these conflicts.

Lots of fun intricacies to make something like Python really work with large C++ backends :)

LAB EDIT: the 1.1 binaries statically link MKL in psi4 and are potentially susceptible to the problems mentioned. Binaries after mid-July 2017 are safe (provided NumPy is MKL RT).
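A quick way to see which side of this you are on is to ask NumPy what it was built against. This minimal sketch uses only NumPy's public show_config() helper; checking Psi4's own linking still requires ldd on its shared objects, as discussed further down the thread.

```python
# Minimal check of which BLAS/LAPACK NumPy was built against.
# If NumPy reports MKL while your Psi4 build links a GOMP-threaded BLAS
# (or vice versa), you are in the mixed-OMP-runtime situation described above.
import numpy as np

np.show_config()  # prints the BLAS/LAPACK libraries NumPy was linked with
```

If the output mentions mkl_rt, NumPy is on MKL RT; if it lists mkl_intel_lp64/mkl_intel_thread/mkl_core, it is the "MKL Trio" described in the next comment.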

@loriab
Copy link
Member

loriab commented May 10, 2018

I hadn't seen exactly this manifestation of the BLAS issue, but yes, it's why I was strongly advising against statically linking MKL in the other thread.

  • "MKL Trio" := mkl_intel_lp64 mkl_intel_thread mkl_core
  • "MKL RT" := libmkl_rt
  • if the Psi4 CMake can find MKL RT, that's what it'll link to for BLAS rather than MKL trio
  • current NumPy from defaults conda channel links against MKL RT. (until recently, it linked against MKL Trio, which was why we sometimes advised getting numpy from the Intel conda channel, which has long linked against RT)
  • so it's safe to get psi4 & numpy via conda install psi4 -c psi4/label/dev because that'll pull numpy from defaults and psi4 from psi4 and both have the same BLAS linking. (may want to conda update numpy to make sure your numpy is the recent build.) can always ldd them to inspect.
  • for the same reasons as above, it's safe to build psi4 from source against the conda psi4-dev package. That package provides MKL from conda and instructs psi4 cmake to use MKL RT. It also provides NumPy which (so long as recent build) uses MKL RT.
  • unless you want to build NumPy yourself, the numpy you have pretty much determines how you must build or use psi4. So it's dangerous to:
    • use statically linked MKL in psi4 and use NumPy with dynamically linked MKL
    • use system BLAS like Apple's Accelerate with psi4 and use NumPy with dynamically linked MKL
    • use MKL RT with psi4 and use NumPy with MKL trio

Since the PPA probably doesn't have a license to distribute MKL, it's likely that Psi4 w/ MKL RT plus their NumPy is also dangerous. What are they linked against, if you can find out easily through ldd (you have to burrow pretty deep into numpy to find a .so)?

@dgasmith
Copy link
Member

dgasmith commented May 10, 2018

The NumPy .so that links BLAS can be found at python -c "import os; import numpy as np; print(os.path.dirname(np.__file__))" + /core/multiarray.*.so, where the star depends on your Python installation.
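That lookup can be sketched as a single script. The glob pattern here is an assumption on my part: the module is named multiarray.*.so in NumPy releases of this era and _multiarray_umath.*.so (under numpy/_core) in later ones, so a "*core/*multiarray*" pattern should catch both.

```python
# Sketch: locate NumPy's BLAS-bearing extension module, then run
# `ldd <path>` on each printed file to see which BLAS/OMP runtime it links.
import glob
import os

import numpy as np

numpy_dir = os.path.dirname(np.__file__)
# "*core" matches numpy/core (older releases) and numpy/_core (newer ones).
pattern = os.path.join(numpy_dir, "*core", "*multiarray*")
for so_path in glob.glob(pattern):
    print(so_path)
```

On Linux, pipe each printed path through ldd and grep for mkl, omp, or blas to see which runtime NumPy actually pulls in at load time.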

@loriab
Copy link
Member

loriab commented May 22, 2018

This sounds resolved. Please reopen if further issues.

@loriab loriab closed this as completed May 22, 2018