psi4/numpy import order affects threading #755

Closed
loriab opened this Issue Jun 30, 2017 · 10 comments

Member

loriab commented Jun 30, 2017

Looks like we lost threading again at some point, especially in PSIthon. Kudos to @schiebermc for convincing me, even after pytest profiling showed all was in working order. I don't know when this actually happened. Fix seems easy enough for PSIthon, though I still wanted to run it by the core-devs, but PsiAPI is a little trickier.

  • PsiAPI, import psi4 before numpy: python thread.py --> NO thread
  • PsiAPI, import numpy before psi4: python thread.py --> THREAD
  • PSIthon, bin/psi4 imports psi4 which imports numpy: psi4 thread.py --> NO thread (yes, even if you comment out the thread setting line)
  • PSIthon, bin/psi4 inserting import numpy as np here to effectively import numpy before psi4: psi4 thread.py --> THREAD

thread.py test file (essentially this)

import time

# good
import numpy as np
import psi4

# bad
#import psi4
#import numpy as np

def test_threaded_blas():
    threads = 20

    times = {}

    size = [200, 500, 2000, 4000]
    threads = [1, threads]

    for th in threads:
        psi4.set_num_threads(th)

        for sz in size:
            nruns = max(1, int(1.e10 / (sz ** 3)))

            a = psi4.core.Matrix(sz, sz)
            b = psi4.core.Matrix(sz, sz)
            c = psi4.core.Matrix(sz, sz)

            tp4 = time.time()
            for n in range(nruns):
                c.gemm(False, False, 1.0, a, b, 0.0)

            retp4 = (time.time() - tp4) / nruns

            tnp = time.time()
            for n in range(nruns):
                np.dot(a, b, out=np.asarray(c))

            retnp = (time.time() - tnp) / nruns
            print("Time for threads %2d, size %5d: Psi4: %12.6f  NumPy: %12.6f" % (th, sz, retp4, retnp))
            if sz == 4000:
                times["p4-n{}".format(th)] = retp4
                times["np-n{}".format(th)] = retnp
                assert psi4.get_num_threads() == th

    rat1 = times["np-n" + str(threads[-1])] / times["p4-n" + str(threads[-1])]
    rat2 = times["p4-n" + str(threads[0])] / times["p4-n" + str(threads[-1])]
    print("  NumPy@n%d : Psi4@n%d ratio (want ~1): %.2f" % (threads[-1], threads[-1], rat1))
    print("   Psi4@n%d : Psi4@n%d ratio (want ~%d): %.2f" % (threads[0], threads[-1], threads[-1], rat2))

if __name__ == '__main__':
    test_threaded_blas()
Contributor

rmcgibbo commented Jun 30, 2017

This just has to be connected with the same underlying bug/issue as #748.

Member

loriab commented Jun 30, 2017

I agree, although this issue was seen on Linux, and both psi4 and numpy are using MKL, just different ones:

  • Psi4: libmkl_rt.so from a local c. 2016 Intel install
  • NumPy: libmkl_core.so, libmkl_intel_thread.so, and libmkl_intel_lp64.so from a default-channel c. 2017 conda install
Contributor

rmcgibbo commented Jun 30, 2017

I can't reproduce this behavior on my Linux installation of add49b9 (icc 2017.2.050, mkl 2017.1.143, numpy is also linked to the same libmkl_rt.so). I turned down threads = 20 to threads = 4, but regardless of the import order I see Psi4@n1 : Psi4@n4 ratio (want ~4) close to 4.
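
For comparing runs from different machines, the scaling ratio can also be recomputed from the "Time for threads ..." lines that thread.py prints. This is only a rough sketch: `parse_timings` and `psi4_speedup` are hypothetical helper names, not part of Psi4, and the regular expression assumes the exact print format used in thread.py above.

```python
import re

# Matches the lines printed by thread.py, e.g.
# "Time for threads 20, size  4000: Psi4:     3.947274  NumPy:     0.440798"
LINE_RE = re.compile(
    r"Time for threads\s+(\d+), size\s+(\d+): "
    r"Psi4:\s+([\d.]+)\s+NumPy:\s+([\d.]+)"
)


def parse_timings(output):
    """Return {(threads, size): (psi4_seconds, numpy_seconds)} from thread.py output."""
    timings = {}
    for match in LINE_RE.finditer(output):
        th, sz, p4, npy = match.groups()
        timings[(int(th), int(sz))] = (float(p4), float(npy))
    return timings


def psi4_speedup(timings, size, serial=1, threaded=20):
    """Serial-to-threaded Psi4 time ratio at one matrix size.

    With working threading this should approach the thread count;
    a value near 1 means the threaded run gained nothing.
    """
    return timings[(serial, size)][0] / timings[(threaded, size)][0]
```

Pass `threaded=4` to match a run where the thread count was turned down to 4.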

Contributor

rmcgibbo commented Jun 30, 2017

Member

loriab commented Jul 1, 2017

Test subjects:

  • thread.py from above (uses psi4 and np internally)
    • PsiAPI – has to import both
    • Psithon – comment out both good and bad blocks above
  • tu1.py below (uses psi4 internally so imports it)
  • Psi4 1.1 (add49) and current devel head Psi4

Findings:

  • The SCF (tu1.py) scales as expected
    • Psithon doesn't care whether NumPy is imported in bin/psi4
    • Psithon takes orders from psi4 -nN
    • Psithon & PsiAPI take orders preferentially from psi4.set_num_threads(N)
    • Psithon & PsiAPI ignore the MKL_NUM_THREADS environment variable
  • The DGEMM scaling test (thread.py) behaves as previously described
    • Psithon and PsiAPI thread if NumPy is imported before Psi4, whether in the file itself or in bin/psi4 (where relevant)
    • Psithon and PsiAPI don't thread otherwise
  • No difference between 1.1 and head (bad news for @schiebermc, whose tests indicate something happened around June 14-15)
  • This contradicts @rmcgibbo's findings above, so maybe my MKLs are fighting
  • Seems to be OK for most use cases, but not if doing detailed thread setting from input

tu1.py test file

import psi4
#psi4.set_num_threads(6)

def test_psi4_basic():
    """tu1-h2o-energy"""
    #! Sample HF/cc-pVDZ H2O computation

    h2o = psi4.geometry("""
      O
      H 1 0.96
      H 1 0.96 2 104.5
    """)

    psi4.set_options({'basis': "aug-cc-pV5Z"})
    psi4.energy('scf')

if __name__ == '__main__':
    test_psi4_basic()
Contributor

rmcgibbo commented Jul 1, 2017

If I have time this weekend, I'm going to try making a small little pair of Python extension modules that are each linked to a separate copy of MKL and see if I can reproduce anything like this. I think that must be the relevant difference between my test and yours. Weirdness about two simultaneous copies of BLAS libraries being loaded + threads seems to be involved in #748 as well.

Contributor

rmcgibbo commented Jul 3, 2017

If you still have this environment around on Linux, can you add an os.system('grep mkl /proc/%d/maps' % os.getpid()) to the bottom of the script?
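
If the raw grep output is too noisy, the same information can be condensed to one entry per library, grouped by install directory, which makes two coexisting MKL copies easy to spot. A minimal sketch, assuming Linux and the usual six-column /proc/<pid>/maps layout; `mapped_libraries` and `group_by_directory` are hypothetical helper names, not part of Psi4 or NumPy.

```python
import os


def mapped_libraries(maps_text, needle="mkl"):
    """Return unique mapped file paths whose basename contains `needle`.

    `maps_text` is the content of /proc/<pid>/maps; order of first
    appearance is preserved.
    """
    seen = []
    for line in maps_text.splitlines():
        # Columns: address, perms, offset, dev, inode, pathname (optional).
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no backing file
        path = fields[5].strip()
        if needle in os.path.basename(path) and path not in seen:
            seen.append(path)
    return seen


def group_by_directory(paths):
    """Group library basenames by the directory they were loaded from."""
    groups = {}
    for path in paths:
        groups.setdefault(os.path.dirname(path), []).append(os.path.basename(path))
    return groups


if __name__ == "__main__":
    # Call this at the bottom of the test script, after psi4 and numpy
    # have been imported and exercised, so that MKL is actually loaded.
    with open("/proc/self/maps") as fh:
        libs = mapped_libraries(fh.read())
    for directory, names in group_by_directory(libs).items():
        print(directory, sorted(names))
```

More than one directory in the output means the process has more than one MKL installation mapped.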

Member

loriab commented Jul 3, 2017

  • Threading fails (Psithon, import psi4, then numpy)
>>> stage/usr/local/psi4/bin/psi4 thread.py
Time for threads  1, size   200: Psi4:     0.000577  NumPy:     0.000628
Time for threads  1, size   500: Psi4:     0.008359  NumPy:     0.008199
Time for threads  1, size  2000: Psi4:     0.502508  NumPy:     0.895125
Time for threads  1, size  4000: Psi4:     3.920520  NumPy:     3.914676
Time for threads 20, size   200: Psi4:     0.000609  NumPy:     0.000175
Time for threads 20, size   500: Psi4:     0.009030  NumPy:     0.001299
Time for threads 20, size  2000: Psi4:     0.569735  NumPy:     0.062893
Time for threads 20, size  4000: Psi4:     3.947274  NumPy:     0.440798
  NumPy@n20 : Psi4@n20 ratio (want ~1): 0.11
   Psi4@n1 : Psi4@n20 ratio (want ~20): 0.99
7f945dbfe000-7f9460a53000 r-xp 00000000 fd:02 203799109                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_avx2.so
7f9460a53000-7f9460c52000 ---p 02e55000 fd:02 203799109                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_avx2.so
7f9460c52000-7f9460c59000 r--p 02e54000 fd:02 203799109                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_avx2.so
7f9460c59000-7f9460c62000 rw-p 02e5b000 fd:02 203799109                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_avx2.so
7f9460c63000-7f9460cb8000 rw-p 03037000 fd:02 203799109                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_avx2.so
7f9461156000-7f9461a4d000 r-xp 00000000 00:28 66346094                   /theoryfs2/common/software/intel2016/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so
7f9461a4d000-7f9461c4c000 ---p 008f7000 00:28 66346094                   /theoryfs2/common/software/intel2016/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so
7f9461c4c000-7f9461c4d000 r--p 008f6000 00:28 66346094                   /theoryfs2/common/software/intel2016/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so
7f9461c4d000-7f9461c61000 rw-p 008f7000 00:28 66346094                   /theoryfs2/common/software/intel2016/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so
7f9461c66000-7f94631cd000 r-xp 00000000 00:28 66346095                   /theoryfs2/common/software/intel2016/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so
7f94631cd000-7f94633cd000 ---p 01567000 00:28 66346095                   /theoryfs2/common/software/intel2016/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so
7f94633cd000-7f94633d0000 r--p 01567000 00:28 66346095                   /theoryfs2/common/software/intel2016/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so
7f94633d0000-7f946358c000 rw-p 0156a000 00:28 66346095                   /theoryfs2/common/software/intel2016/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so
7f94638d6000-7f946506b000 r-xp 00000000 00:28 66346091                   /theoryfs2/common/software/intel2016/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64_lin/libmkl_core.so
7f946506b000-7f946526b000 ---p 01795000 00:28 66346091                   /theoryfs2/common/software/intel2016/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64_lin/libmkl_core.so
7f946526b000-7f9465273000 r--p 01795000 00:28 66346091                   /theoryfs2/common/software/intel2016/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64_lin/libmkl_core.so
7f9465273000-7f9465294000 rw-p 0179d000 00:28 66346091                   /theoryfs2/common/software/intel2016/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64_lin/libmkl_core.so
7f946a231000-7f946ba5c000 r-xp 00000000 fd:02 203799112                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_core.so
7f946ba5c000-7f946bc5c000 ---p 0182b000 fd:02 203799112                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_core.so
7f946bc5c000-7f946bc64000 r--p 0182b000 fd:02 203799112                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_core.so
7f946bc64000-7f946bc85000 rw-p 01833000 fd:02 203799112                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_core.so
7f946bcd8000-7f946bd29000 rw-p 01a3d000 fd:02 203799112                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_core.so
7f946bd29000-7f946d358000 r-xp 00000000 fd:02 203799116                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_intel_thread.so
7f946d358000-7f946d557000 ---p 0162f000 fd:02 203799116                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_intel_thread.so
7f946d557000-7f946d55a000 r--p 0162e000 fd:02 203799116                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_intel_thread.so
7f946d55a000-7f946d731000 rw-p 01631000 fd:02 203799116                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_intel_thread.so
7f946d737000-7f946d78e000 rw-p 01a8b000 fd:02 203799116                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_intel_thread.so
7f946d78e000-7f946df64000 r-xp 00000000 fd:02 203799115                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_intel_lp64.so
7f946df64000-7f946e164000 ---p 007d6000 fd:02 203799115                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_intel_lp64.so
7f946e164000-7f946e165000 r--p 007d6000 fd:02 203799115                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_intel_lp64.so
7f946e165000-7f946e176000 rw-p 007d7000 fd:02 203799115                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_intel_lp64.so
7f946e17b000-7f946e1b0000 rw-p 008ff000 fd:02 203799115                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_intel_lp64.so
7f9472a4a000-7f9472e17000 r-xp 00000000 00:28 66346098                   /theoryfs2/common/software/intel2016/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64_lin/libmkl_rt.so
7f9472e17000-7f9473017000 ---p 003cd000 00:28 66346098                   /theoryfs2/common/software/intel2016/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64_lin/libmkl_rt.so
7f9473017000-7f947301d000 r--p 003cd000 00:28 66346098                   /theoryfs2/common/software/intel2016/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64_lin/libmkl_rt.so
7f947301d000-7f947301e000 rw-p 003d3000 00:28 66346098                   /theoryfs2/common/software/intel2016/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64_lin/libmkl_rt.so
(p4dev36) psilocaluser@bash:psinet:/home/psilocaluser/gits/hrw-lab/objdir36-2: (threadjune) vi stage/usr/local/psi4/bin/psi4 
  • Threading succeeds (Psithon, import numpy, then psi4)
>>> stage/usr/local/psi4/bin/psi4 thread.py
Time for threads  1, size   200: Psi4:     0.000583  NumPy:     0.000615
Time for threads  1, size   500: Psi4:     0.008012  NumPy:     0.008171
Time for threads  1, size  2000: Psi4:     0.753459  NumPy:     0.499194
Time for threads  1, size  4000: Psi4:     3.896844  NumPy:     3.914840
Time for threads 20, size   200: Psi4:     0.000059  NumPy:     0.000118
Time for threads 20, size   500: Psi4:     0.000570  NumPy:     0.000776
Time for threads 20, size  2000: Psi4:     0.033862  NumPy:     0.035520
Time for threads 20, size  4000: Psi4:     0.237118  NumPy:     0.248809
  NumPy@n20 : Psi4@n20 ratio (want ~1): 1.05
   Psi4@n1 : Psi4@n20 ratio (want ~20): 16.43
7f6653882000-7f66566d7000 r-xp 00000000 fd:02 203799109                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_avx2.so
7f66566d7000-7f66568d6000 ---p 02e55000 fd:02 203799109                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_avx2.so
7f66568d6000-7f66568dd000 r--p 02e54000 fd:02 203799109                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_avx2.so
7f66568dd000-7f66568e6000 rw-p 02e5b000 fd:02 203799109                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_avx2.so
7f66568e7000-7f665693c000 rw-p 03037000 fd:02 203799109                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_avx2.so
7f665c18e000-7f665c55b000 r-xp 00000000 00:28 66346098                   /theoryfs2/common/software/intel2016/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64_lin/libmkl_rt.so
7f665c55b000-7f665c75b000 ---p 003cd000 00:28 66346098                   /theoryfs2/common/software/intel2016/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64_lin/libmkl_rt.so
7f665c75b000-7f665c761000 r--p 003cd000 00:28 66346098                   /theoryfs2/common/software/intel2016/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64_lin/libmkl_rt.so
7f665c761000-7f665c762000 rw-p 003d3000 00:28 66346098                   /theoryfs2/common/software/intel2016/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64_lin/libmkl_rt.so
7f667047c000-7f6671ca7000 r-xp 00000000 fd:02 203799112                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_core.so
7f6671ca7000-7f6671ea7000 ---p 0182b000 fd:02 203799112                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_core.so
7f6671ea7000-7f6671eaf000 r--p 0182b000 fd:02 203799112                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_core.so
7f6671eaf000-7f6671ed0000 rw-p 01833000 fd:02 203799112                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_core.so
7f6671f23000-7f6671f74000 rw-p 01a3d000 fd:02 203799112                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_core.so
7f6671f74000-7f66735a3000 r-xp 00000000 fd:02 203799116                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_intel_thread.so
7f66735a3000-7f66737a2000 ---p 0162f000 fd:02 203799116                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_intel_thread.so
7f66737a2000-7f66737a5000 r--p 0162e000 fd:02 203799116                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_intel_thread.so
7f66737a5000-7f667397c000 rw-p 01631000 fd:02 203799116                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_intel_thread.so
7f6673982000-7f66739d9000 rw-p 01a8b000 fd:02 203799116                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_intel_thread.so
7f66739d9000-7f66741af000 r-xp 00000000 fd:02 203799115                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_intel_lp64.so
7f66741af000-7f66743af000 ---p 007d6000 fd:02 203799115                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_intel_lp64.so
7f66743af000-7f66743b0000 r--p 007d6000 fd:02 203799115                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_intel_lp64.so
7f66743b0000-7f66743c1000 rw-p 007d7000 fd:02 203799115                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_intel_lp64.so
7f66743c6000-7f66743fb000 rw-p 008ff000 fd:02 203799115                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_intel_lp64.so
  • Linkages of suspects
>>> ldd stage/usr/local/psi4/lib/psi4/core.so 
	linux-vdso.so.1 =>  (0x00007ffea93e5000)
	libpcm.so.1 => /home/psilocaluser/miniconda3/envs/p4dev36/lib/libpcm.so.1 (0x00007fbe1888f000)
	libxc.so => /home/psilocaluser/miniconda3/envs/p4dev36/lib/libxc.so (0x00007fbe1851c000)
	libgdma.so => /home/psilocaluser/miniconda3/envs/p4dev36/lib/libgdma.so (0x00007fbe18130000)
	libderiv.so => /home/psilocaluser/miniconda3/envs/p4dev36/lib/libderiv.so (0x00007fbe15523000)
	libint.so => /home/psilocaluser/miniconda3/envs/p4dev36/lib/libint.so (0x00007fbe148ab000)
	libdkh.so => /home/psilocaluser/miniconda3/envs/p4dev36/lib/libdkh.so (0x00007fbe14590000)
	liberd.so => /home/psilocaluser/miniconda3/envs/p4dev36/lib/liberd.so (0x00007fbe14064000)
	libsimint.so => /home/psilocaluser/miniconda3/envs/p4dev36/lib/libsimint.so (0x00007fbe11a68000)
	libefp.so => /home/psilocaluser/miniconda3/envs/p4dev36/lib/libefp.so (0x00007fbe10f0d000)
	libmkl_rt.so => /theoryfs2/common/software/intel2016/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64/libmkl_rt.so (0x00007fbe10928000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fbe106e6000)
	libm.so.6 => /lib64/libm.so.6 (0x00007fbe103e4000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007fbe101df000)
	libchemps2.so.2 => /home/psilocaluser/miniconda3/envs/p4dev36/lib/libchemps2.so.2 (0x00007fbe0f086000)
	libimf.so => /theoryfs2/common/software/intel2016/compilers_and_libraries_2016.3.210/linux/compiler/lib/intel64/libimf.so (0x00007fbe0eb88000)
	libsvml.so => /theoryfs2/common/software/intel2016/compilers_and_libraries_2016.3.210/linux/compiler/lib/intel64/libsvml.so (0x00007fbe0dc7b000)
	libirng.so => /theoryfs2/common/software/intel2016/compilers_and_libraries_2016.3.210/linux/compiler/lib/intel64/libirng.so (0x00007fbe0d909000)
	libstdc++.so.6 => /home/psilocaluser/miniconda3/envs/p4dev36/lib/libstdc++.so.6 (0x00007fbe0d57c000)
	libiomp5.so => /home/psilocaluser/miniconda3/envs/p4dev36/lib/libiomp5.so (0x00007fbe0d1d1000)
	libgcc_s.so.1 => /home/psilocaluser/miniconda3/envs/p4dev36/lib/libgcc_s.so.1 (0x00007fbe0cfbb000)
	libintlc.so.5 => /theoryfs2/common/software/intel2016/compilers_and_libraries_2016.3.210/linux/compiler/lib/intel64/libintlc.so.5 (0x00007fbe0cd4f000)
	libc.so.6 => /lib64/libc.so.6 (0x00007fbe0c98d000)
	libz.so.1 => /home/psilocaluser/miniconda3/envs/p4dev36/lib/./libz.so.1 (0x00007fbe0c777000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fbe21483000)
	librt.so.1 => /lib64/librt.so.1 (0x00007fbe0c56e000)
	libhdf5.so.10 => /home/psilocaluser/miniconda3/envs/p4dev36/lib/./libhdf5.so.10 (0x00007fbe0c0a9000)
	libhdf5_hl.so.10 => /home/psilocaluser/miniconda3/envs/p4dev36/lib/./libhdf5_hl.so.10 (0x00007fbe0be8b000)
>>> ldd /home/psilocaluser/miniconda3/envs/p4dev36/lib/python3.6/site-packages/numpy/core/multiarray.cpython-36m-x86_64-linux-gnu.so 
	linux-vdso.so.1 =>  (0x00007ffceebb6000)
	libmkl_intel_lp64.so => /home/psilocaluser/miniconda3/envs/p4dev36/lib/python3.6/site-packages/numpy/core/../../../../libmkl_intel_lp64.so (0x00007fe5de978000)
	libmkl_intel_thread.so => /home/psilocaluser/miniconda3/envs/p4dev36/lib/python3.6/site-packages/numpy/core/../../../../libmkl_intel_thread.so (0x00007fe5dcf13000)
	libmkl_core.so => /home/psilocaluser/miniconda3/envs/p4dev36/lib/python3.6/site-packages/numpy/core/../../../../libmkl_core.so (0x00007fe5db41a000)
	libiomp5.so => /home/psilocaluser/miniconda3/envs/p4dev36/lib/python3.6/site-packages/numpy/core/../../../../libiomp5.so (0x00007fe5db070000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fe5dae2e000)
	libm.so.6 => /lib64/libm.so.6 (0x00007fe5dab2c000)
	libpython3.6m.so.1.0 => /home/psilocaluser/miniconda3/envs/p4dev36/lib/python3.6/site-packages/numpy/core/../../../../libpython3.6m.so.1.0 (0x00007fe5da624000)
	libc.so.6 => /lib64/libc.so.6 (0x00007fe5da263000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fe5df768000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007fe5da05e000)
	libgcc_s.so.1 => /home/psilocaluser/miniconda3/envs/p4dev36/lib/python3.6/site-packages/numpy/core/../../../.././libgcc_s.so.1 (0x00007fe5d9e48000)
	libutil.so.1 => /lib64/libutil.so.1 (0x00007fe5d9c44000)
	librt.so.1 => /lib64/librt.so.1 (0x00007fe5d9a3c000)
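The two `ldd` listings above can also be compared mechanically. What follows is only a minimal sketch, not part of the original report: the helper name `mkl_links` is hypothetical, and the sample strings are abbreviated copies of the `ldd` lines above. It parses text that `ldd` has already printed, so it does not need MKL installed.

```python
import posixpath
import re

# Matches ldd lines such as "libmkl_rt.so => /path/libmkl_rt.so (0x00007f...)".
_LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(\S+)\s+\(0x[0-9a-fA-F]+\)\s*$")


def mkl_links(ldd_output):
    """Return {library name: normalized path} for each MKL library in ldd output."""
    links = {}
    for line in ldd_output.splitlines():
        match = _LDD_LINE.match(line)
        if match and match.group(1).startswith("libmkl_"):
            # normpath collapses the "numpy/core/../../../.." detours.
            links[match.group(1)] = posixpath.normpath(match.group(2))
    return links


# Abbreviated from the core.so listing above.
core_ldd = """
    libefp.so => /home/psilocaluser/miniconda3/envs/p4dev36/lib/libefp.so (0x00007fbe10f0d000)
    libmkl_rt.so => /theoryfs2/common/software/intel2016/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64/libmkl_rt.so (0x00007fbe10928000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fbe106e6000)
"""

# Abbreviated from the NumPy multiarray listing above.
numpy_ldd = """
    libmkl_intel_lp64.so => /home/psilocaluser/miniconda3/envs/p4dev36/lib/python3.6/site-packages/numpy/core/../../../../libmkl_intel_lp64.so (0x00007fe5de978000)
    libmkl_intel_thread.so => /home/psilocaluser/miniconda3/envs/p4dev36/lib/python3.6/site-packages/numpy/core/../../../../libmkl_intel_thread.so (0x00007fe5dcf13000)
    libmkl_core.so => /home/psilocaluser/miniconda3/envs/p4dev36/lib/python3.6/site-packages/numpy/core/../../../../libmkl_core.so (0x00007fe5db41a000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fe5dae2e000)
"""

core_mkl = mkl_links(core_ldd)
numpy_mkl = mkl_links(numpy_ldd)

print("core.so links:", sorted(core_mkl))
print("numpy links:  ", sorted(numpy_mkl))
print("same MKL set: ", set(core_mkl) == set(numpy_mkl))
```

On the listings above this reports that core.so and NumPy resolve different MKL libraries (the single `libmkl_rt.so` versus the three-library set), from different install locations.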
Member

loriab commented Jul 5, 2017

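  • Case A, the MKL trio (libmkl_intel_lp64, libmkl_intel_thread, libmkl_core)
    • NumPy from the defaults channel, linked to the MKL trio
    • Psi4 forced to link to the MKL trio dynamically
    • Result: importing numpy before psi4 threads; importing psi4 before numpy doesn't thread. The psi4-first run never loads libmkl_avx2.so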
(p4dev36) objdir-conda >>> ldd stage/usr/local/psi4/lib/psi4/core.so | grep mkl
	libmkl_intel_lp64.so => /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_intel_lp64.so (0x00007f1379f68000)
	libmkl_intel_thread.so => /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_intel_thread.so (0x00007f1378503000)
	libmkl_core.so => /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_core.so (0x00007f1376a0b000)

(p4dev36) >>> ldd ~/miniconda3/envs/p4dev36/lib/python3.6/site-packages/numpy/core/multiarray.cpython-36m-x86_64-linux-gnu.so | grep mkl
	libmkl_intel_lp64.so => /home/psilocaluser/miniconda3/envs/p4dev36/lib/python3.6/site-packages/numpy/core/../../../../libmkl_intel_lp64.so (0x00007fb96e623000)
	libmkl_intel_thread.so => /home/psilocaluser/miniconda3/envs/p4dev36/lib/python3.6/site-packages/numpy/core/../../../../libmkl_intel_thread.so (0x00007fb96cbbe000)
	libmkl_core.so => /home/psilocaluser/miniconda3/envs/p4dev36/lib/python3.6/site-packages/numpy/core/../../../../libmkl_core.so (0x00007fb96b0c5000)

(p4dev36) >>> head -13 thread.py 
import os
import time

# none for psithon

# good psiapi
import numpy as np
import psi4

# bad psiapi
#import psi4
#import numpy as np

(p4dev36) >>> PYTHONPATH=/home/psilocaluser/gits/hrw-lab/objdir-conda/stage/usr/local/psi4/lib python thread.py
  Threads set to 1 by Python driver.
Time for threads  1, size   200: Psi4:     0.000772  NumPy:     0.000621
Time for threads  1, size   500: Psi4:     0.008650  NumPy:     0.009042
Time for threads  1, size  2000: Psi4:     0.839143  NumPy:     0.508247
Time for threads  1, size  4000: Psi4:     3.970799  NumPy:     4.251713
  Threads set to 20 by Python driver.
Time for threads 20, size   200: Psi4:     0.000180  NumPy:     0.000201
Time for threads 20, size   500: Psi4:     0.001111  NumPy:     0.001470
Time for threads 20, size  2000: Psi4:     0.065655  NumPy:     0.064244
Time for threads 20, size  4000: Psi4:     0.520653  NumPy:     0.495231
  NumPy@n20 : Psi4@n20 ratio (want ~1): 0.95
   Psi4@n1 : Psi4@n20 ratio (want ~20): 7.63
7ff581622000-7ff584477000 r-xp 00000000 fd:02 203799109                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_avx2.so
7ff584477000-7ff584676000 ---p 02e55000 fd:02 203799109                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_avx2.so
7ff584676000-7ff58467d000 r--p 02e54000 fd:02 203799109                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_avx2.so
7ff58467d000-7ff584686000 rw-p 02e5b000 fd:02 203799109                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_avx2.so
7ff584687000-7ff5846dc000 rw-p 03037000 fd:02 203799109                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_avx2.so
7ff59c365000-7ff59db90000 r-xp 00000000 fd:02 203799112                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_core.so
7ff59db90000-7ff59dd90000 ---p 0182b000 fd:02 203799112                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_core.so
7ff59dd90000-7ff59dd98000 r--p 0182b000 fd:02 203799112                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_core.so
7ff59dd98000-7ff59ddb9000 rw-p 01833000 fd:02 203799112                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_core.so
7ff59de0c000-7ff59de5d000 rw-p 01a3d000 fd:02 203799112                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_core.so
7ff59de5d000-7ff59f48c000 r-xp 00000000 fd:02 203799116                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_intel_thread.so
7ff59f48c000-7ff59f68b000 ---p 0162f000 fd:02 203799116                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_intel_thread.so
7ff59f68b000-7ff59f68e000 r--p 0162e000 fd:02 203799116                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_intel_thread.so
7ff59f68e000-7ff59f865000 rw-p 01631000 fd:02 203799116                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_intel_thread.so
7ff59f86b000-7ff59f8c2000 rw-p 01a8b000 fd:02 203799116                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_intel_thread.so
7ff59f8c2000-7ff5a0098000 r-xp 00000000 fd:02 203799115                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_intel_lp64.so
7ff5a0098000-7ff5a0298000 ---p 007d6000 fd:02 203799115                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_intel_lp64.so
7ff5a0298000-7ff5a0299000 r--p 007d6000 fd:02 203799115                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_intel_lp64.so
7ff5a0299000-7ff5a02aa000 rw-p 007d7000 fd:02 203799115                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_intel_lp64.so
7ff5a02af000-7ff5a02e4000 rw-p 008ff000 fd:02 203799115                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_intel_lp64.so

(p4dev36) >>> head -13 thread.py 
import os
import time

# none for psithon

# good psiapi
#import numpy as np
#import psi4

# bad psiapi
import psi4
import numpy as np

(p4dev36) >>> PYTHONPATH=/home/psilocaluser/gits/hrw-lab/objdir-conda/stage/usr/local/psi4/lib python thread.py
  Threads set to 1 by Python driver.
Time for threads  1, size   200: Psi4:     0.000577  NumPy:     0.000640
Time for threads  1, size   500: Psi4:     0.009111  NumPy:     0.009104
Time for threads  1, size  2000: Psi4:     0.509619  NumPy:     0.516402
Time for threads  1, size  4000: Psi4:     3.993908  NumPy:     3.988628
  Threads set to 20 by Python driver.
Time for threads 20, size   200: Psi4:     0.000613  NumPy:     0.000677
Time for threads 20, size   500: Psi4:     0.008511  NumPy:     0.010932
Time for threads 20, size  2000: Psi4:     0.507125  NumPy:     0.512660
Time for threads 20, size  4000: Psi4:     3.970990  NumPy:     3.986875
  NumPy@n20 : Psi4@n20 ratio (want ~1): 1.00
   Psi4@n1 : Psi4@n20 ratio (want ~20): 1.01
7f849dadd000-7f849f308000 r-xp 00000000 fd:02 203799112                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_core.so
7f849f308000-7f849f508000 ---p 0182b000 fd:02 203799112                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_core.so
7f849f508000-7f849f510000 r--p 0182b000 fd:02 203799112                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_core.so
7f849f510000-7f849f531000 rw-p 01833000 fd:02 203799112                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_core.so
7f849f584000-7f849f5d5000 rw-p 01a3d000 fd:02 203799112                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_core.so
7f849f5d5000-7f84a0c04000 r-xp 00000000 fd:02 203799116                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_intel_thread.so
7f84a0c04000-7f84a0e03000 ---p 0162f000 fd:02 203799116                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_intel_thread.so
7f84a0e03000-7f84a0e06000 r--p 0162e000 fd:02 203799116                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_intel_thread.so
7f84a0e06000-7f84a0fdd000 rw-p 01631000 fd:02 203799116                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_intel_thread.so
7f84a0fe3000-7f84a103a000 rw-p 01a8b000 fd:02 203799116                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_intel_thread.so
7f84a103a000-7f84a1810000 r-xp 00000000 fd:02 203799115                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_intel_lp64.so
7f84a1810000-7f84a1a10000 ---p 007d6000 fd:02 203799115                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_intel_lp64.so
7f84a1a10000-7f84a1a11000 r--p 007d6000 fd:02 203799115                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_intel_lp64.so
7f84a1a11000-7f84a1a22000 rw-p 007d7000 fd:02 203799115                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_intel_lp64.so
7f84a1a27000-7f84a1a5c000 rw-p 008ff000 fd:02 203799115                  /home/psilocaluser/miniconda3/envs/p4dev36/lib/libmkl_intel_lp64.so
  • Case B: the MKL runtime library (mkl_rt)
    • NumPy from the intel channel, linked to libmkl_rt.so
    • Psi4 dynamically linked to the MKL runtime (libmkl_rt.so)
    • Result: both import orders (numpy before psi4, and psi4 before numpy) run threaded, and both load the same MKL libraries.
(idp3) objdir-idp3-4: >>> ldd stage/usr/local/psi4/lib/psi4/core.so | grep mkl
	libmkl_rt.so => /home/psilocaluser/miniconda3/envs/idp3/lib/libmkl_rt.so (0x00007f3db8ab8000)

(idp3) >>> ldd ~/miniconda3/envs/idp3/lib/python3.5/site-packages/numpy/core/multiarray.cpython-35m-x86_64-linux-gnu.so | grep mkl
	libmkl_rt.so => /home/psilocaluser/miniconda3/envs/idp3/lib/python3.5/site-packages/numpy/core/../../../../libmkl_rt.so (0x00007fa3164df000)

(idp3) >>> head -13 thread.py 
import os
import time

# none for psithon

# good psiapi
import numpy as np
import psi4

# bad psiapi
#import psi4
#import numpy as np

(idp3) >>> PYTHONPATH=/home/psilocaluser/gits/hrw-lab/objdir-idp3-4/stage/usr/local/psi4/lib python thread.py 
  Threads set to 1 by Python driver.
Time for threads  1, size   200: Psi4:     0.000581  NumPy:     0.000603
Time for threads  1, size   500: Psi4:     0.008048  NumPy:     0.008170
Time for threads  1, size  2000: Psi4:     1.063873  NumPy:     0.500018
Time for threads  1, size  4000: Psi4:     3.915369  NumPy:     3.934089
  Threads set to 20 by Python driver.
Time for threads 20, size   200: Psi4:     0.000091  NumPy:     0.000158
Time for threads 20, size   500: Psi4:     0.001002  NumPy:     0.001107
Time for threads 20, size  2000: Psi4:     0.032268  NumPy:     0.036324
Time for threads 20, size  4000: Psi4:     0.334342  NumPy:     0.259047
  NumPy@n20 : Psi4@n20 ratio (want ~1): 0.77
   Psi4@n1 : Psi4@n20 ratio (want ~20): 11.71
7f317948a000-7f317c339000 r-xp 00000000 fd:02 252025838                  /home/psilocaluser/miniconda3/envs/idp3/lib/libmkl_avx2.so
7f317c339000-7f317c539000 ---p 02eaf000 fd:02 252025838                  /home/psilocaluser/miniconda3/envs/idp3/lib/libmkl_avx2.so
7f317c539000-7f317c540000 r--p 02eaf000 fd:02 252025838                  /home/psilocaluser/miniconda3/envs/idp3/lib/libmkl_avx2.so
7f317c540000-7f317c54a000 rw-p 02eb6000 fd:02 252025838                  /home/psilocaluser/miniconda3/envs/idp3/lib/libmkl_avx2.so
7f317c554000-7f317cd73000 r-xp 00000000 fd:02 251997410                  /home/psilocaluser/miniconda3/envs/idp3/lib/libmkl_intel_lp64.so
7f317cd73000-7f317cf73000 ---p 0081f000 fd:02 251997410                  /home/psilocaluser/miniconda3/envs/idp3/lib/libmkl_intel_lp64.so
7f317cf73000-7f317cf74000 r--p 0081f000 fd:02 251997410                  /home/psilocaluser/miniconda3/envs/idp3/lib/libmkl_intel_lp64.so
7f317cf74000-7f317cf85000 rw-p 00820000 fd:02 251997410                  /home/psilocaluser/miniconda3/envs/idp3/lib/libmkl_intel_lp64.so
7f317cf8b000-7f317e667000 r-xp 00000000 fd:02 252025832                  /home/psilocaluser/miniconda3/envs/idp3/lib/libmkl_intel_thread.so
7f317e667000-7f317e866000 ---p 016dc000 fd:02 252025832                  /home/psilocaluser/miniconda3/envs/idp3/lib/libmkl_intel_thread.so
7f317e866000-7f317e869000 r--p 016db000 fd:02 252025832                  /home/psilocaluser/miniconda3/envs/idp3/lib/libmkl_intel_thread.so
7f317e869000-7f317ea49000 rw-p 016de000 fd:02 252025832                  /home/psilocaluser/miniconda3/envs/idp3/lib/libmkl_intel_thread.so
7f317ea50000-7f3180303000 r-xp 00000000 fd:02 252025831                  /home/psilocaluser/miniconda3/envs/idp3/lib/libmkl_core.so
7f3180303000-7f3180502000 ---p 018b3000 fd:02 252025831                  /home/psilocaluser/miniconda3/envs/idp3/lib/libmkl_core.so
7f3180502000-7f318050a000 r--p 018b2000 fd:02 252025831                  /home/psilocaluser/miniconda3/envs/idp3/lib/libmkl_core.so
7f318050a000-7f318052b000 rw-p 018ba000 fd:02 252025831                  /home/psilocaluser/miniconda3/envs/idp3/lib/libmkl_core.so
7f318f3bf000-7f318f7b7000 r-xp 00000000 fd:02 252003863                  /home/psilocaluser/miniconda3/envs/idp3/lib/libmkl_rt.so
7f318f7b7000-7f318f9b7000 ---p 003f8000 fd:02 252003863                  /home/psilocaluser/miniconda3/envs/idp3/lib/libmkl_rt.so
7f318f9b7000-7f318f9bd000 r--p 003f8000 fd:02 252003863                  /home/psilocaluser/miniconda3/envs/idp3/lib/libmkl_rt.so
7f318f9bd000-7f318f9be000 rw-p 003fe000 fd:02 252003863                  /home/psilocaluser/miniconda3/envs/idp3/lib/libmkl_rt.so

(idp3) >>> head -13 thread.py 
import os
import time

# none for psithon

# good psiapi
#import numpy as np
#import psi4

# bad psiapi
import psi4
import numpy as np

(idp3) >>> PYTHONPATH=/home/psilocaluser/gits/hrw-lab/objdir-idp3-4/stage/usr/local/psi4/lib python thread.py 
  Threads set to 1 by Python driver.
Time for threads  1, size   200: Psi4:     0.000568  NumPy:     0.000595
Time for threads  1, size   500: Psi4:     0.008035  NumPy:     0.008161
Time for threads  1, size  2000: Psi4:     1.103378  NumPy:     0.499456
Time for threads  1, size  4000: Psi4:     3.909842  NumPy:     3.929370
  Threads set to 20 by Python driver.
Time for threads 20, size   200: Psi4:     0.000057  NumPy:     0.000126
Time for threads 20, size   500: Psi4:     0.000521  NumPy:     0.000836
Time for threads 20, size  2000: Psi4:     0.033082  NumPy:     0.036403
Time for threads 20, size  4000: Psi4:     0.343739  NumPy:     0.297706
  NumPy@n20 : Psi4@n20 ratio (want ~1): 0.87
   Psi4@n1 : Psi4@n20 ratio (want ~20): 11.37
7efc9618a000-7efc99039000 r-xp 00000000 fd:02 252025838                  /home/psilocaluser/miniconda3/envs/idp3/lib/libmkl_avx2.so
7efc99039000-7efc99239000 ---p 02eaf000 fd:02 252025838                  /home/psilocaluser/miniconda3/envs/idp3/lib/libmkl_avx2.so
7efc99239000-7efc99240000 r--p 02eaf000 fd:02 252025838                  /home/psilocaluser/miniconda3/envs/idp3/lib/libmkl_avx2.so
7efc99240000-7efc9924a000 rw-p 02eb6000 fd:02 252025838                  /home/psilocaluser/miniconda3/envs/idp3/lib/libmkl_avx2.so
7efc99254000-7efc99a73000 r-xp 00000000 fd:02 251997410                  /home/psilocaluser/miniconda3/envs/idp3/lib/libmkl_intel_lp64.so
7efc99a73000-7efc99c73000 ---p 0081f000 fd:02 251997410                  /home/psilocaluser/miniconda3/envs/idp3/lib/libmkl_intel_lp64.so
7efc99c73000-7efc99c74000 r--p 0081f000 fd:02 251997410                  /home/psilocaluser/miniconda3/envs/idp3/lib/libmkl_intel_lp64.so
7efc99c74000-7efc99c85000 rw-p 00820000 fd:02 251997410                  /home/psilocaluser/miniconda3/envs/idp3/lib/libmkl_intel_lp64.so
7efc99c8b000-7efc9b367000 r-xp 00000000 fd:02 252025832                  /home/psilocaluser/miniconda3/envs/idp3/lib/libmkl_intel_thread.so
7efc9b367000-7efc9b566000 ---p 016dc000 fd:02 252025832                  /home/psilocaluser/miniconda3/envs/idp3/lib/libmkl_intel_thread.so
7efc9b566000-7efc9b569000 r--p 016db000 fd:02 252025832                  /home/psilocaluser/miniconda3/envs/idp3/lib/libmkl_intel_thread.so
7efc9b569000-7efc9b749000 rw-p 016de000 fd:02 252025832                  /home/psilocaluser/miniconda3/envs/idp3/lib/libmkl_intel_thread.so
7efc9b750000-7efc9d003000 r-xp 00000000 fd:02 252025831                  /home/psilocaluser/miniconda3/envs/idp3/lib/libmkl_core.so
7efc9d003000-7efc9d202000 ---p 018b3000 fd:02 252025831                  /home/psilocaluser/miniconda3/envs/idp3/lib/libmkl_core.so
7efc9d202000-7efc9d20a000 r--p 018b2000 fd:02 252025831                  /home/psilocaluser/miniconda3/envs/idp3/lib/libmkl_core.so
7efc9d20a000-7efc9d22b000 rw-p 018ba000 fd:02 252025831                  /home/psilocaluser/miniconda3/envs/idp3/lib/libmkl_core.so
7efca5e25000-7efca621d000 r-xp 00000000 fd:02 252003863                  /home/psilocaluser/miniconda3/envs/idp3/lib/libmkl_rt.so
7efca621d000-7efca641d000 ---p 003f8000 fd:02 252003863                  /home/psilocaluser/miniconda3/envs/idp3/lib/libmkl_rt.so
7efca641d000-7efca6423000 r--p 003f8000 fd:02 252003863                  /home/psilocaluser/miniconda3/envs/idp3/lib/libmkl_rt.so
7efca6423000-7efca6424000 rw-p 003fe000 fd:02 252003863                  /home/psilocaluser/miniconda3/envs/idp3/lib/libmkl_rt.so
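
Comparing the two dumps above by eye is tedious. Below is a minimal sketch of how the set of loaded MKL libraries could be extracted, assuming the dumps are in the Linux /proc/<pid>/maps format (the column layout above matches it). The helper names `mkl_libraries` and `loaded_mkl_libraries` are hypothetical and are not part of thread.py or psi4.

```python
import os


def mkl_libraries(maps_text):
    """Return the distinct MKL shared-library basenames found in a maps dump."""
    libs = set()
    for line in maps_text.splitlines():
        # Columns: address range, perms, offset, device, inode, [path].
        # Anonymous mappings have no path column and are skipped.
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue
        name = os.path.basename(fields[5].strip())
        if name.startswith("libmkl"):
            libs.add(name)
    return libs


def loaded_mkl_libraries(pid="self"):
    """Read /proc/<pid>/maps (Linux only) and list the MKL libraries mapped."""
    with open(f"/proc/{pid}/maps") as fh:
        return mkl_libraries(fh.read())
```

Calling `loaded_mkl_libraries()` at the end of the script for each import order and comparing the two sets would show whether both orders end up with the same libraries, for example libmkl_rt.so together with libmkl_intel_thread.so.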
  • Conclusions
    • If we don't want to enforce an import order for numpy and psi4, we must use the intel channel NumPy.
    • When I was only using the MKL headers from the intel channel, its lack of py36 didn't matter. NumPy, however, has to be compiled for a specific Python version.
    • An ordinary psi4 core.so is 37 MB. One compiled for avx2 and avx, with sse4.1 as the default, is 53 MB. Not bad.
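
The "THREAD" versus "NO thread" verdict in the logs above comes down to the two ratio lines. A small sketch of how they could be recomputed from the timing lines, assuming the log format shown above. The printed ratios are consistent with being taken at size 4000 (for instance 3.915369 / 0.334342 ≈ 11.71), but that is inferred from the numbers rather than from the script, and the helper names here are hypothetical.

```python
import re

# Matches lines such as:
# "Time for threads 20, size  4000: Psi4:     0.334342  NumPy:     0.259047"
TIME_LINE = re.compile(
    r"Time for threads\s+(\d+), size\s+(\d+): "
    r"Psi4:\s+([\d.]+)\s+NumPy:\s+([\d.]+)"
)


def parse_timings(log_text):
    """Map (threads, size) -> (psi4_seconds, numpy_seconds)."""
    timings = {}
    for m in TIME_LINE.finditer(log_text):
        key = (int(m.group(1)), int(m.group(2)))
        timings[key] = (float(m.group(3)), float(m.group(4)))
    return timings


def psi4_speedup(timings, size, n_lo, n_hi):
    """Psi4@n_lo : Psi4@n_hi ratio; close to 1 means no threading."""
    return timings[(n_lo, size)][0] / timings[(n_hi, size)][0]


def numpy_to_psi4(timings, size, threads):
    """NumPy : Psi4 ratio at the same thread count (want ~1)."""
    psi4_time, numpy_time = timings[(threads, size)]
    return numpy_time / psi4_time
```

A speedup near 1 at 20 threads corresponds to the "NO thread" cases, while the value near 11.7 seen in Case B indicates that threading is active.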
@loriab

Member

loriab commented Apr 13, 2018

See also #748

@loriab loriab closed this Apr 13, 2018
