# Multiprocessing in Python
What do you do if you want to run Python code in parallel and use all of your cores? Python has a `threading` package, but in the standard CPython interpreter the Global Interpreter Lock (GIL) lets only one thread execute Python bytecode at a time, so threads cannot spread CPU-bound work across multiple cores (or multiple CPUs).

Creating multiple processes instead of threads is the way to go here. Creating a process is more expensive than creating a thread, but once the processes are running, they are basically the same as threads in terms of speed.

In Python, the `multiprocessing` package does that for you.
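To see the difference for yourself, here is a minimal sketch (not a careful benchmark) that runs the same CPU-bound loop on a pool of threads and on a pool of processes. The names `busy_sum` and `timed_map` are made up for this illustration, and the worker count and input size are arbitrary assumptions. On a standard CPython build I would expect the thread pool to take about as long as a plain loop and the process pool to be faster, but the actual numbers depend on your machine, so none are shown here.

```python
import time
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool


def busy_sum(n):
    """CPU-bound toy workload: add up the integers 0..n-1 in a Python loop."""
    total = 0
    for i in range(n):
        total += i
    return total


def timed_map(pool_cls, inputs, workers=4):
    """Map busy_sum over inputs with the given pool class; return (results, seconds)."""
    start = time.perf_counter()
    with pool_cls(workers) as pool:
        results = pool.map(busy_sum, inputs)
    return results, time.perf_counter() - start


if __name__ == "__main__":
    inputs = [2_000_000] * 4  # arbitrary size, chosen only to make the loop noticeable
    for pool_cls in (ThreadPool, Pool):
        _, seconds = timed_map(pool_cls, inputs)
        print("{}: {:.2f} s".format(pool_cls.__name__, seconds))
```

`ThreadPool` has the same interface as `Pool` but uses threads, which is why a single helper can time both. The `if __name__ == "__main__":` guard matters when this is run as a script on platforms that start child processes by re-importing the main module.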

In [1]:
import multiprocessing as mp
import time # For measuring speedup

I know two ways of spawning processes:
## 1. Doing it manually
In this case, you should
1. Create as many processes as you need.
2. Assign a function to each process.
3. Run them explicitly.
4. Wait for them to finish.

Let's see an example.

We first define an expensive-to-compute function, some input values, and a constant.

In [2]:
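def factorial(n):
    # NOTE: despite its name, this function accumulates a sum, not a product:
    # it returns 1 + (1 + 2 + ... + n), which is what the outputs below show.
    # It only serves as an expensive loop that keeps a core busy.
    if n == 0:
        return 1
    res = 1
    for i in range(1, n+1):
        res += i
    print(res)
    return res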

NUM_PROCESSES = 10
INPUTS = [10000000+i for i in range(NUM_PROCESSES)]

Now let's run the function on the set of inputs, sequentially.

In [3]:
t_0 = time.time()

for arg in INPUTS:
    factorial(arg)

print('elapsed time = {}'.format(time.time() - t_0))

50000005000001
50000015000002
50000025000004
50000035000007
50000045000011
50000055000016
50000065000022
50000075000029
50000085000037
50000095000046
elapsed time = 7.363128185272217


Let's try the same by creating a process for each input and running them in parallel.

In [4]:
from multiprocessing import Process
    
t_0 = time.time()
process_list = []
for arg in INPUTS:
    # Steps 1 and 2
    process = Process(target=factorial, args=(arg,))
    # Step 3
    process.start()
    process_list.append(process)
    
for process in process_list:
    # Step 4
    process.join()
    
print('elapsed time = {}'.format(time.time() - t_0))

50000025000004
50000015000002
50000005000001
50000045000011
50000035000007
50000085000037
50000055000016
50000095000046
50000065000022
50000075000029
elapsed time = 5.464713096618652
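Note that a `Process` throws away the return value of its target function, so the example above only sees the results because the function prints them. One way to get values back to the parent is a `multiprocessing.Queue`. The following is a minimal sketch under that assumption; `sum_to`, `worker` and `run_with_queue` are hypothetical names introduced here, not part of `multiprocessing`.

```python
import multiprocessing as mp


def sum_to(n):
    """Toy workload: 1 + 2 + ... + n."""
    return sum(range(1, n + 1))


def worker(n, queue):
    """Compute sum_to(n) and send (input, result) back to the parent."""
    queue.put((n, sum_to(n)))


def run_with_queue(inputs):
    """Start one process per input and collect the results in a dict."""
    queue = mp.Queue()
    processes = [mp.Process(target=worker, args=(n, queue)) for n in inputs]
    for process in processes:
        process.start()
    # Read one item per process *before* joining: a child that has put data
    # on a queue may not exit until that data has been consumed.
    results = dict(queue.get() for _ in processes)
    for process in processes:
        process.join()
    return results


if __name__ == "__main__":
    print(run_with_queue([10, 100, 1000]))
```

Results arrive in whatever order the processes finish, which is why each one is sent together with its input.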


## 2. Let it figure it out
What if you can't split the whole task into a few chunks of almost the same size, i.e. a set of tasks that each take about as long as the others? In this case, you can use `Pool`. You just need to
- Specify the number of processes you want to create. (You can also omit it; the pool size then defaults to the machine's CPU count as reported by `os.cpu_count()`, or by `os.process_cpu_count()` in newer Python versions.)
- Use the `map` function to distribute the work of running a function on every element of a list over this pool of processes. It returns the function's return values in another list, in the same order as the inputs.

Let's try out the same example using a process pool.

In [5]:
from multiprocessing import Pool

t_0 = time.time()

# The with-block shuts the worker processes down once the results are in
with Pool(NUM_PROCESSES) as pool:
    results = pool.map(factorial, INPUTS)

print('elapsed time = {}'.format(time.time() - t_0))

50000015000002
50000025000004
50000035000007
50000055000016
50000065000022
50000045000011
50000005000001
50000075000029
50000085000037
50000095000046
elapsed time = 5.701684951782227
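The inputs in this example all take about the same time, so it does not really show the case that motivated `Pool`. Here is a small sketch with deliberately uneven tasks. It assumes you do not need the results in input order and therefore uses `imap_unordered`, which hands back each result as soon as it is ready. `uneven_task` and `run_unordered` are names invented for this illustration, and the input sizes are arbitrary.

```python
import multiprocessing as mp


def uneven_task(n):
    """Toy task whose running time grows with n; returns (n, sum of squares below n)."""
    return n, sum(i * i for i in range(n))


def run_unordered(inputs):
    """Run uneven_task over inputs and return results in completion order."""
    # Pool() without an argument sizes the pool from the CPU count.
    with mp.Pool() as pool:
        # imap_unordered yields each result as soon as its task finishes,
        # so one slow task does not hold back the results of the fast ones.
        return list(pool.imap_unordered(uneven_task, inputs))


if __name__ == "__main__":
    for n, value in run_unordered([3_000_000, 10, 1_000_000, 1_000]):
        print(n, value)
```

Because the order of the results is not guaranteed, each task returns its input alongside its value so the two can be matched up afterwards.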


## Some useful functions and attributes

- `mp.current_process()` returns the `Process` object corresponding to the current process. Two useful attributes of this class are `name` and `pid`.
- `mp.cpu_count()` returns the number of CPUs in the system. On most machines this counts logical cores, and it can be larger than the number of CPUs the current process is actually allowed to use.
- `daemon`: There are two types of processes: daemonic and non-daemonic. When a process exits, it attempts to terminate all of its daemonic child processes, while it joins its non-daemonic child processes right before exiting. Note that `process.daemon` must be set before the process is started.

In [6]:
print(mp.current_process())
print(mp.current_process().name)
print(mp.current_process().pid)

<_MainProcess(MainProcess, started)>
MainProcess
58219
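The other two items can be tried out in the same way. The sketch below prints the CPU count and starts a daemonic child process. `describe` and `sleepy` are hypothetical helpers written for this example, and the 60-second sleep is an arbitrary stand-in for a long-running background task. It is meant to be run as a script: in a notebook the parent process is the kernel, so the daemonic child would only be cleaned up when the kernel shuts down.

```python
import multiprocessing as mp
import time


def describe():
    """Return the name and pid of the process that calls this function."""
    current = mp.current_process()
    return current.name, current.pid


def sleepy():
    """Stand-in for a background task that would otherwise outlive the parent."""
    time.sleep(60)


if __name__ == "__main__":
    print("CPUs reported:", mp.cpu_count())
    print("this process:", describe())

    background = mp.Process(target=sleepy, name="background-sleeper")
    background.daemon = True  # must be set before start()
    background.start()
    print(background.name, "daemon =", background.daemon)
    # The script ends here. Because the child is daemonic, it is terminated
    # rather than joined, so the program does not wait for the sleep to finish.
```

If you comment out the `background.daemon = True` line, the script should instead wait for `sleepy` to finish before exiting, since non-daemonic children are joined.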
