# Multiprocessing with Powerbox

If we want to run `powerbox` in parallel via multiprocessing, it is important to control the number of threads used during the FFT operation. The simplest approach is to uninstall `pyFFTW` from the environment in which you run `powerbox`, so that only the single-threaded `numpy` FFT routine is used. If you want to keep `pyFFTW` installed, there are two options: (i) use `pyFFTW` but manually set the number of threads to 1, or (ii) fall back to the single-threaded `numpy` FFT implementation. In this tutorial, we demonstrate both methods using the `nthreads` argument.

Let's import `powerbox` and check the installed version. By default, the number of threads is the number of CPUs available.

In [1]:
import powerbox as pb
pb.__version__

'0.7.4.dev28+gc727b58.d20240325'

First, let's define a simple `powerbox` operation to test. This function calculates a power spectrum on a random box of $200^3$ cells, with a box length of 300 along each axis, and returns the computation time.

In [2]:
from powerbox import get_power
import numpy as np 
from time import time
from multiprocessing import Pool

def run_pb(an_arg):
    shape = (200,200,200) # Size of one chunk
    arr = np.random.rand(np.prod(shape)).reshape(shape) * an_arg
    t0 = time()
    out = get_power(arr, (300,300,300),
                    bins = 50)                                      # default is nthreads = None which uses nthreads = number of available CPUs.
    print('Done: ', np.round(time() - t0,2), flush = True)
    return time() - t0

start = time()
ncalls = 10
all_times = []
for i in range(ncalls):
    all_times.append(run_pb(i))
print('Single iteration time:', np.round(np.mean(all_times),2),'s')

Done:  0.91
Done:  0.88
Done:  0.89
Done:  0.96
Done:  0.9
Done:  0.88
Done:  0.93
Done:  0.9
Done:  0.93
Done:  0.91
Single iteration time: 0.91 s


Now, let's run the same function with multiple processes while leaving the number of threads at its default value.
This will be very slow: each process starts as many FFT threads as there are CPUs, so the processes end up competing with one another for the same cores.
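
To see why this oversubscribes the machine, the thread count can be estimated with simple arithmetic. The following is a minimal sketch, not part of `powerbox`: the helper `total_fft_threads` is a hypothetical name, and it assumes (as described in this tutorial) that `nthreads=None` means one thread per available CPU and that `nthreads=False` means the single-threaded `numpy` FFT.

```python
import os

def total_fft_threads(nprocs, nthreads=None):
    """Estimate (concurrent FFT threads, available CPUs) for a process pool.

    Hypothetical helper for illustration only; not part of powerbox.
    """
    ncpus = os.cpu_count() or 1  # os.cpu_count() can return None
    if nthreads is None:
        # Assumed default: one FFT thread per available CPU.
        threads_per_proc = ncpus
    else:
        # nthreads=False (numpy FFT) counts as one thread; bool is an int.
        threads_per_proc = max(int(nthreads), 1)
    return nprocs * threads_per_proc, ncpus

for setting in (None, 1, False):
    total, ncpus = total_fft_threads(nprocs=8, nthreads=setting)
    print(f"nthreads={setting}: {total} FFT threads on {ncpus} CPUs")
```

With 8 processes and the default setting, the estimate is 8 times the CPU count, whereas `nthreads=1` or `nthreads=False` gives one thread per process.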

In [None]:
nprocs = 8
ncalls = 8
start = time()
p = Pool(processes=nprocs)
for i in range(ncalls):
    p.apply_async(run_pb, args = (i,))
p.close()
p.join()
print('Total:', np.round(time() - start,2),'s')
# Runs for more than 3 minutes (interrupted manually)

KeyboardInterrupt: 

## First solution: Setting the number of `pyFFTW` threads to 1

In [7]:
def run_pb(an_arg):
    shape = (200,200,200) # Size of one chunk
    arr = np.random.rand(np.prod(shape)).reshape(shape) * an_arg
    t0 = time()
    out = get_power(arr, (300,300,300),
                    bins = 50,
                    nthreads = 1)                                   # Set number of threads
    print('Done: ', np.round(time() - t0,2), flush = True)
    return time() - t0

With a single `pyFFTW` thread and multiprocessing, the calculation takes about 0.9 s per call on average (roughly 9 s for all ten calls):

In [8]:
nprocs = 4
ncalls = 10
start = time()
p = Pool(processes=nprocs)
for i in range(ncalls):
    p.apply_async(run_pb, args = (i,))
p.close()
p.join()
print('Single iteration time:', np.round((time() - start)/ncalls,2),'s')

Done:  3.45
Done:  3.63
Done:  3.58
Done:  3.69
Done:  3.25
Done:  3.19
Done:  3.19
Done:  3.38
Done:  1.58
Done:  1.58
Single iteration time: 0.89 s
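
The individual `Done` times (about 3 s) are much larger than the printed single iteration time because the latter is the total wall-clock time divided by the number of calls, while several calls run at once. As a rough check using only the numbers printed above, dividing the summed per-call durations by the total wall-clock time estimates how many calls were effectively running concurrently. The variable names below are illustrative.

```python
# Per-call durations copied from the "Done:" lines printed above (seconds).
call_times = [3.45, 3.63, 3.58, 3.69, 3.25, 3.19, 3.19, 3.38, 1.58, 1.58]
ncalls = len(call_times)
avg_wall_per_call = 0.89  # the printed "Single iteration time"

total_wall = avg_wall_per_call * ncalls  # total wall-clock time of the pool run
busy_time = sum(call_times)              # time spent in get_power, summed over calls
effective_concurrency = busy_time / total_wall

print(f"Total wall time: {total_wall:.1f} s")
print(f"Effective concurrency: {effective_concurrency:.1f} of 4 processes")
```

A value below `nprocs = 4` is expected here, since the wall-clock time presumably also includes pool start-up, and the final two of the ten calls leave two of the four workers idle.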


## Second solution: Using `numpy` instead of `pyFFTW`

We can also just use the `numpy` FFT routine instead of setting the number of threads in `pyFFTW` to 1.

In [9]:
def run_pb(an_arg):
    shape = (200,200,200) # Size of one chunk
    arr = np.random.rand(np.prod(shape)).reshape(shape) * an_arg
    t0 = time()
    out = get_power(arr, (300,300,300),
                    bins = 50,
                    nthreads = False)                               # Setting nthreads = False disables the use of pyFFTW
    print('Done: ', np.round(time() - t0,2), flush = True)
    return time() - t0

In [10]:
nprocs = 4
ncalls = 10
start = time()
p = Pool(processes=nprocs)
for i in range(ncalls):
    p.apply_async(run_pb, args = (i,))
p.close()
p.join()
print('Single iteration time:', np.round((time() - start)/ncalls,2),'s')

Done:  3.12
Done:  3.25
Done:  3.26
Done:  3.4
Done:  3.14
Done:  3.24
Done:  3.07
Done:  3.38
Done:  1.97
Done:  1.88
Single iteration time: 0.89 s


The runtime is roughly the same whether we use `numpy` or single-threaded `pyFFTW`.
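
Since the versions of `run_pb` above differ only in the `nthreads` argument, one could define the function once and bind the setting with `functools.partial`, which stays picklable for use with `Pool`. This is a sketch rather than the tutorial's own code: `run_pb_nthreads`, `run_single_thread`, and `run_numpy` are hypothetical names, and `numpy` and `powerbox` are imported inside the function only so that the definitions themselves can be loaded without them.

```python
from functools import partial
from time import time

def run_pb_nthreads(an_arg, nthreads=None):
    """Time one get_power call with the given nthreads setting (hypothetical helper)."""
    import numpy as np
    from powerbox import get_power

    shape = (200, 200, 200)
    arr = np.random.rand(*shape) * an_arg
    t0 = time()
    get_power(arr, (300, 300, 300), bins=50, nthreads=nthreads)
    return time() - t0

# Same settings as the two solutions above, without redefining the function.
run_single_thread = partial(run_pb_nthreads, nthreads=1)  # single-threaded pyFFTW
run_numpy = partial(run_pb_nthreads, nthreads=False)      # numpy FFT instead of pyFFTW
```

Either partial can then be passed to the pool in place of `run_pb`, for example `p.apply_async(run_single_thread, args=(i,))`.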