# Multiprocessing with Powerbox

There are two ways to parallelize the FFT calculations in `powerbox`. If you have `pyfftw` installed, you can take advantage of the multithreaded FFT computations that it offers (`numpy` does not support this) simply by setting `nthreads` to a number greater than one. However, if you would like to run many FFTs simultaneously, you may wish to parallelize at a higher level, i.e. run each FFT in a different process (using `multiprocessing`, MPI or similar). In this case, it is important that the underlying FFT library uses only a single thread, or you will get **VERY SLOW** computation times because the two layers of parallelism do not coordinate with each other. In this notebook, we show how to use both multiple threads via `pyfftw` and multiple processes with either the `numpy` or `pyfftw` backend.
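The rule described above can be summarised in a small helper. This is only a minimal sketch: `choose_nthreads` and its arguments are hypothetical names and are not part of `powerbox`. It assumes, as in the cells of this notebook, that `nthreads=1` gives single-threaded `pyfftw` and that `nthreads=False` selects the `numpy` backend.

```python
from importlib.util import find_spec


def choose_nthreads(parallel_over_processes, ncores=4):
    """Hypothetical helper: pick a value for the `nthreads` keyword.

    parallel_over_processes: True if several FFTs will run in separate
        processes (e.g. via multiprocessing), False for one FFT at a time.
    ncores: number of cores to give a single multithreaded FFT (assumed value).
    """
    have_pyfftw = find_spec("pyfftw") is not None
    if not have_pyfftw:
        # Without pyfftw only the single-threaded numpy backend is available.
        return False
    if parallel_over_processes:
        # One thread per process, so the two layers do not fight each other.
        return 1
    # A single FFT at a time: let pyfftw use several threads.
    return ncores


print(choose_nthreads(parallel_over_processes=True))
print(choose_nthreads(parallel_over_processes=False, ncores=4))
```

The returned value would then be passed on as `get_power(..., nthreads=value)`.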

In [1]:
import powerbox as pb
pb.__version__

'0.7.4.dev19+g811310f'

In [2]:
from powerbox import get_power
import numpy as np 
from time import time
from multiprocessing import Pool
from functools import partial

First, let's define a simple `powerbox` operation to test. This function calculates a power spectrum on a random box of dimension $256^3$ and returns the computation time.

In [3]:
shape = (256,) * 3  # Size of the box to FT
arr = np.random.rand(np.prod(shape)).reshape(shape) # Random box on which to calculate the FFT
ncalls = 4
nthreads = 4

def run_pb(idx, **kwargs):
    # idx is unused; it only lets run_pb be mapped over range(ncalls).
    t0 = time()
    # The default, nthreads=None, uses as many threads as there are available CPUs.
    get_power(arr, shape, bins=50, **kwargs)
    return time() - t0

### Single-threaded `pyFFTW`

In [5]:
start = time()
all_times = [run_pb(i, nthreads=1) for i in range(ncalls)]
end = time()
print(f'Total wall time: {end - start:.2f} sec')
print(f"Total CPU time: {np.sum(all_times):.2f} sec")

Total wall time: 12.31 sec
Total CPU time: 12.31 sec


### Multi-threaded `pyFFTW`

In [6]:
start = time()
all_times = [run_pb(i, nthreads=nthreads) for i in range(ncalls)]
end = time()
print(f'Total wall time: {end - start:.2f} sec')
print(f"Total CPU time: {np.sum(all_times):.2f} sec")

Total wall time: 9.86 sec
Total CPU time: 9.86 sec


Here, we see that if `pyFFTW` is installed, it can use multiple threads to compute the FFTs, reducing the wall time by ~20% (from 12.31 sec to 9.86 sec with 4 threads).
This is the fastest way to compute the power spectrum in `powerbox` if you have multiple cores available and only one FFT to perform.
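The ~20% figure can be restated as a speedup and a parallel efficiency. The snippet below is a small worked example using the two wall times printed above; the function names are illustrative only.

```python
def speedup(t_reference, t_parallel):
    """Ratio of the reference wall time to the parallel wall time."""
    return t_reference / t_parallel


def parallel_efficiency(t_reference, t_parallel, nworkers):
    """Speedup divided by the number of workers (1.0 would be ideal scaling)."""
    return speedup(t_reference, t_parallel) / nworkers


t_single = 12.31    # wall time with nthreads=1, from the output above
t_threaded = 9.86   # wall time with nthreads=4, from the output above
n_threads = 4

s = speedup(t_single, t_threaded)
e = parallel_efficiency(t_single, t_threaded, n_threads)
reduction = 1 - t_threaded / t_single

print(f"speedup: {s:.2f}x")
print(f"efficiency: {e:.2f}")
print(f"wall-time reduction: {reduction:.1%}")
```

An efficiency well below 1 suggests that much of the time in `get_power` is spent outside the threaded FFT itself, although this notebook does not measure that directly.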

## Multiprocessing with `pyFFTW` as a backend

We can keep using `pyFFTW` as a backend by setting the `nthreads` argument to 1.

In [9]:
nprocs = ncalls

run_pb1 = partial(run_pb, nthreads=1)

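start = time()
# The context manager shuts the worker processes down once the block exits.
with Pool(processes=nprocs) as p:
    all_times = p.map(run_pb1, range(ncalls))
    end = time()  # stop the clock before the pool is torn down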
print(f'Total wall time: {end - start:.2f} sec')
print(f"Total CPU time: {np.sum(all_times):.2f} sec")

Total wall time: 5.01 sec
Total CPU time: 16.77 sec


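Here, the total wall time is reduced by ~50% relative to the multi-threaded run
(and by ~60% relative to the single-threaded run) because we are doing each of the
4 FFTs in parallel. The "CPU time" printed above is the sum of the per-call elapsed
times; it is larger than in the serial case (16.77 sec versus 12.31 sec). Note also
that there is significant overhead in starting the processes, which is why the gain
falls well short of the ideal factor of 4.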
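A rough back-of-the-envelope breakdown of these timings is sketched below. It uses only the numbers printed in this notebook and assumes that each worker process handled exactly one call, which `Pool.map` does not guarantee, so the result should be read as an estimate rather than a measurement.

```python
ncalls = 4

serial_total = 12.31  # sum of per-call times, single-threaded serial run
pool_wall = 5.01      # wall time of the Pool run (includes starting the pool)
pool_total = 16.77    # sum of per-call times inside the Pool run

# Average time a single call took in each setting.
per_call_serial = serial_total / ncalls
per_call_pool = pool_total / ncalls

# How much slower each call ran while the processes shared the machine.
slowdown = per_call_pool / per_call_serial

# Wall time not explained by the average call: a crude estimate of the
# cost of starting the processes and collecting the results.
unaccounted = pool_wall - per_call_pool

print(f"per call, serial: {per_call_serial:.2f} sec")
print(f"per call, pool:   {per_call_pool:.2f} sec")
print(f"slowdown per call: {slowdown:.2f}x")
print(f"unaccounted wall time: {unaccounted:.2f} sec")
```

Under these assumptions, both effects contribute: each call runs slower than in the serial case, and some wall time is spent outside the calls themselves.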

## Multiprocessing with `numpy` as a backend

We can also just use the `numpy` FFT backend by setting `nthreads` to `False`.

In [10]:
run_pb1 = partial(run_pb, nthreads=False)

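start = time()
# The context manager shuts the worker processes down once the block exits.
with Pool(processes=nprocs) as p:
    all_times = p.map(run_pb1, range(ncalls))
    end = time()  # stop the clock before the pool is torn down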
print(f'Total wall time: {end - start:.2f} sec')
print(f"Total CPU time: {np.sum(all_times):.2f} sec")

Total wall time: 4.70 sec
Total CPU time: 15.61 sec


The runtime is roughly the same whether we use `numpy` or single-threaded `pyFFTW` (4.70 sec versus 5.01 sec of wall time).