Multiprocessing slows down matrix multiplication (strange interaction with MKL) #10145
Description
Summary: Multiprocessing slows down completely independent matrix multiplications.
Multiprocessing seems to make completely independent numpy computations slower, even when the number of OpenMP/MKL threads is controlled explicitly.
Test code example:
import time
import multiprocessing as mp
import numpy as np
import mkl
def work(n=5000, tries=5):
    """Compute the product of two random n x n matrices; return the mean timing over [tries] runs."""
    np.random.seed()
    timings = []
    for i in range(tries):
        start = time.time()
        a = np.random.rand(n, n)
        b = np.random.rand(n, n)
        res = np.sum(a.dot(b))
        stop = time.time()
        timings.append(stop - start)
    return np.mean(timings)
num_cores = mp.cpu_count()
print("# Cores: {}".format(num_cores))

# Print table header by hand
tab = "\t\t"
print("procs", end=tab)
for i in range(1, num_cores + 1):
    print("{}".format(i), end=tab)
print("")
print("threads")

# Try all combinations of threads/processes
for threads in range(1, num_cores + 1):
    mkl.set_num_threads(threads)
    print("set {} get {}".format(threads, mkl.get_max_threads()), end=tab)
    for procs in range(1, num_cores + 1):
        pool = mp.Pool(procs)
        jobs = [() for _ in range(procs)]
        print("{:0.4f}".format(np.mean(pool.starmap(work, jobs))), end=tab)
        pool.close()  # release the workers before the next iteration
        pool.join()
    print("")
Expectation: with threads set to 1, running 4 processes on 4 cores should take roughly the same time per process as running 1 process on 1 core.
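As an aside, the thread count can also be capped without the mkl module by setting the standard BLAS environment variables before numpy is first imported; a minimal sketch (the variable names cover the common backends, and which one is honored depends on how numpy was built):

```python
import os

# Cap BLAS threading via environment variables. These must be set
# *before* numpy (and thus MKL/OpenBLAS/OpenMP) is first imported.
for var in ("MKL_NUM_THREADS", "OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
    os.environ[var] = "1"

import numpy as np  # picks up the limits set above

a = np.random.rand(200, 200)
b = np.random.rand(200, 200)
print(a.dot(b).shape)  # each process now uses a single BLAS thread
```

Because the limits are inherited through fork, workers spawned by multiprocessing after this point are capped the same way.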
Result on a laptop with 4 logical cores (Intel i7-6600U: 2 physical cores with hyper-threading) and 16 GB memory:
# Cores: 4
procs 1 2 3 4
threads
set 1 get 1 6.4352 7.9569 11.3162 18.4924
set 2 get 2 4.3469 9.6891 14.3225 18.6104
set 3 get 2 4.6828 8.5717 13.1268 16.9085
set 4 get 2 4.1497 8.6765 13.4662 17.2045
The numbers of processes and cores in use, as observed in htop, agree with the reported settings. For example, with 2 procs and 2 threads, htop shows 2 processes each using ~200% CPU and all 4 cores at ~100% load. Yet even with 1 thread, adding independent processes slows the code down substantially. The computation here is not memory-bound, and there is no communication between processes.
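The claim that the workload is not memory-bound can be backed with a back-of-envelope arithmetic-intensity estimate; a rough sketch (counting only the three dense float64 matrices, ignoring caches and temporaries):

```python
# Arithmetic intensity of the n=5000 matmul in the test above:
# a dense matmul needs ~2*n**3 floating-point operations but only
# touches 3 matrices of n**2 float64 values each.
n = 5000
flops = 2 * n**3                 # multiply-adds for a dense matmul
bytes_moved = 3 * n**2 * 8       # a, b, and the result as float64
intensity = flops / bytes_moved  # FLOPs per byte of unique data
print(flops, bytes_moved, round(intensity, 1))
```

Hundreds of FLOPs per byte is far above the machine balance of any recent CPU, so a tuned BLAS here is limited by arithmetic throughput, not memory bandwidth.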
System info:
$ uname -a
Linux xxx 4.13.12-1-ARCH #1 SMP PREEMPT Wed Nov 8 11:54:06 CET 2017 x86_64 GNU/Linux
$ python --version
Python 3.6.2 :: Anaconda, Inc.
In [9]: numpy.distutils.system_info.get_info("mkl")
Out[9]:
{'define_macros': [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)],
'include_dirs': ['/home/larry/programs/anaconda3/include'],
'libraries': ['mkl_rt', 'pthread'],
'library_dirs': ['/home/larry/programs/anaconda3/lib']}
In [10]: numpy.version.full_version
Out[10]: '1.13.1'
Related reports:
https://stackoverflow.com/questions/15639779/why-does-multiprocessing-use-only-a-single-core-after-i-import-numpy
https://stackoverflow.com/questions/15414027/multiprocessing-pool-makes-numpy-matrix-multiplication-slower
https://stackoverflow.com/questions/26258728/parallel-processing-with-multiprocessing-is-slower-than-sequential
https://stackoverflow.com/questions/47380366/dramatic-slow-down-using-multiprocess-and-numpy-in-python
I tried all the "solutions" mentioned in those reports; none seems relevant here. I can confirm that the processor affinity is correct and all cores are utilized.
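The affinity check can be reproduced with the standard library alone; a minimal sketch (os.sched_getaffinity is Linux-only, hence the fallback, and the pool size of 2 is arbitrary):

```python
import os
import multiprocessing as mp

def report_affinity():
    # Set of CPUs the calling process may run on; a restricted set
    # (e.g. inherited from taskset or a prior affinity fix) would
    # explain idle cores.
    if hasattr(os, "sched_getaffinity"):
        return sorted(os.sched_getaffinity(0))
    return list(range(mp.cpu_count()))  # fallback on non-Linux platforms

if __name__ == "__main__":
    print("parent affinity:", report_affinity())
    with mp.Pool(2) as pool:
        # Each worker reports its own affinity mask.
        print("worker affinities:", pool.starmap(report_affinity, [(), ()]))
```

If every process reports the full set of CPUs, affinity is not the cause of the slowdown.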