Problems with multiprocessing and numpy.fft #8140
Is that with 1.11.x or current master? Could be related to gh-7712.
Will have a look at master. Currently running 1.11.2.
I still get the same problem on
What kernel version are you using? Note that `cpu_count` is not the right method to get the usable CPUs; that is `os.sched_getaffinity(0)`, which respects CPU affinity.
Just checked it on a 16-core machine with a 4.4 kernel, using `np.ones` instead of an FFT: the lock used for memory mapping gets heavily contended with a higher number of processes, and contention explodes when oversubscribing.
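A reproduction along the lines described above can be sketched as follows. This is not the commenter's actual test; the array size, repetition count, and worker function are illustrative, and `os.sched_getaffinity` is Linux-only.

```python
import multiprocessing as mp
import os
import time

import numpy as np

N = 1_000_000   # illustrative array size
REPS = 50       # allocations per worker

def worker(_):
    # Each call allocates a fresh large array, exercising the memory
    # allocator (and, on affected kernels, the mmap lock) rather than
    # doing any significant computation.
    for _ in range(REPS):
        np.ones(N)

if __name__ == "__main__":
    nproc = len(os.sched_getaffinity(0))  # Linux-only
    start = time.perf_counter()
    with mp.Pool(nproc) as pool:
        pool.map(worker, range(nproc))
    print(f"{nproc} workers: {time.perf_counter() - start:.2f}s wall time")
```

On an affected setup, wall time grows much faster than the work would suggest as the worker count rises, since the processes serialize on the allocation path.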
Thanks for the hints @juliantaylor. I believe the server has 48 physical cores but will check with the person who set it up. Assuming you were referring to the Linux kernel version, I get the following from running. I will move the memory allocation into the initializer of the multiprocessing call, reduce the allocated memory, and use a larger. Update: the CPU configuration in
It turns out the memory allocation was indeed the problem. Changing the script to allocate the memory ahead of time eliminates the kernel wait (using 1.11.2 and master). Closing this issue because my underlying problem probably has nothing to do with numpy but rather with the way I allocate memory. Thanks for the help, @juliantaylor and @rgommers.
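The fix described above, allocating ahead of time in the pool initializer, might look roughly like this. The buffer size, worker count, and function names are illustrative, not the author's actual script.

```python
import multiprocessing as mp

import numpy as np

_buf = None  # per-process scratch buffer, filled in by the initializer

def init_worker(n):
    # Allocate once per worker process, before any FFT work starts,
    # so the hot loop never hits the contended allocation path for
    # its input data.
    global _buf
    _buf = np.empty(n, dtype=np.complex128)

def do_fft(seed):
    # Reuse the preallocated buffer as FFT input. Note np.fft.fft
    # still allocates its output array; this only avoids fresh
    # allocations for the input.
    rng = np.random.default_rng(seed)
    _buf[:] = rng.standard_normal(_buf.shape[0])
    return np.abs(np.fft.fft(_buf)).max()

if __name__ == "__main__":
    with mp.Pool(4, initializer=init_worker, initargs=(1 << 16,)) as pool:
        print(pool.map(do_fft, range(8)))
```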
The `len()` of that set is the number of available CPUs; consider
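The difference between the two calls can be seen directly (Linux-only; `os.sched_getaffinity` is not available on all platforms):

```python
import os

# Total CPUs known to the OS, regardless of this process's affinity mask.
total = os.cpu_count()

# CPUs this process is actually allowed to run on (respects taskset,
# cgroup cpusets as used by e.g. docker --cpuset-cpus, etc.).
usable = len(os.sched_getaffinity(0))

print(f"cpu_count: {total}, usable: {usable}")
```

On an unrestricted machine the two agree; inside a container or under `taskset` the affinity-based count is the one that matters for sizing a process pool.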
The memory allocation contention is still an issue for us when computing spectrograms in different processes on the same machine. I've done a bit more digging and created a minimal reproducible example here. I'm surprised that calculating spectrograms is so intensive in terms of memory allocation; I would have assumed it to be limited by the compute power of the machine. Do you think it is possible to make improvements in terms of memory allocation, or are we facing a fundamental limit here? In particular, running a single process with
Doing the same when running 64 processes gives me:
The above experiments were run in a Docker container on a Linux host. Here are the details:
Whilst I ran the experiments in a container to make them easier to reproduce, running the code outside the container exhibits the same problem.
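To illustrate why spectrograms are allocation-heavy, here is a minimal pure-NumPy sketch (not the reporter's actual code; frame and hop sizes are illustrative): every frame's windowing and FFT produces fresh temporary arrays, and that allocation rate is multiplied by the number of worker processes.

```python
import numpy as np

def spectrogram(x, nfft=256, hop=128):
    # Split the signal into overlapping frames, window each frame,
    # and take the squared magnitude of its real FFT. The list of
    # windowed frames, the stacked array, and the rfft output are
    # all fresh allocations on every call.
    window = np.hanning(nfft)
    frames = [x[i:i + nfft] * window
              for i in range(0, len(x) - nfft + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1)) ** 2

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    spec = spectrogram(rng.standard_normal(10_000))
    print(spec.shape)  # (n_frames, nfft // 2 + 1)
```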
Hmm, that's strange. We do cache some things, but I cannot see how that would cause trouble, and I'm not sure how we allocate memory otherwise. Does scipy's fft have the same problem?
I would like to compute a set of FFTs in parallel using `numpy.fft.fft` and `multiprocessing`. Unfortunately, running the FFTs in parallel results in a large kernel load. Here is a minimal example that reproduces the problem:

Running `time python fft_test.py` gives me the following results:

Running with a single core, i.e. `python fft_test.py -s`, gives:

I thought that using `spawn` rather than `fork` might resolve the problem, but I had no luck. Any idea what might cause the large kernel wait?

I originally posted this issue on stackoverflow but realised this may be a more appropriate place.