These are exercise problems from the book Python Programming and Numerical Methods:

https://pythonnumericalmethods.berkeley.edu/notebooks/Index.html

# Chapter 13: Parallelize Your Python

https://pythonnumericalmethods.berkeley.edu/notebooks/chapter13.04-Summary-and-Problems.html#problems

Only those problems that need to be solved in a notebook environment are covered.

---

1. What's parallel computing?

Parallel computing means breaking a single task into smaller subtasks that run simultaneously, then combining their results to produce the final answer.

For example, let's say we need to take the mean of the 10 numbers in the following array.

[3,5,7,8,9,10,4,6,8,11]


Let's say we have a computer with 5 cores.

Each core can take 2 numbers and compute their mean, and the final answer is the mean of all five means. The five operations can run side by side on the individual cores.

<pre>

[3,5]            [7,8]              [9,10]              [4,6]           [8,11]
 c1                c2                 c3                 c4              c5
(3+5)/2 = 4      (7+8)/2 = 7.5       (9+10)/2 = 9.5      (4+6)/2 = 5      (8+11)/2 = 9.5

final_ans = (4 + 7.5 + 9.5 + 5 + 9.5)/5 = 7.1
</pre>

This is an example of parallel processing.

In [None]:
import numpy as np
np.mean([3,5,7,8,9,10,4,6,8,11])

7.1
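
The five-core picture above can be sketched in code. This is a minimal illustration rather than the book's solution: it uses `ThreadPool` from `multiprocessing.pool` (an assumption made so the snippet runs anywhere without a `__main__` guard; it offers the same `map` interface as `mp.Pool`), and `chunk_mean` is a hypothetical helper name.

```python
from multiprocessing.pool import ThreadPool

data = [3, 5, 7, 8, 9, 10, 4, 6, 8, 11]

# Split the data into five chunks of two numbers, one chunk per worker.
chunks = [data[i:i + 2] for i in range(0, len(data), 2)]

def chunk_mean(chunk):
    # Hypothetical helper: the mean of one chunk.
    return sum(chunk) / len(chunk)

# ThreadPool stands in for mp.Pool here (assumption, see the note above).
with ThreadPool(processes=5) as pool:
    partial_means = pool.map(chunk_mean, chunks)

# Combine the partial results into the final answer.
final_ans = sum(partial_means) / len(partial_means)
print(partial_means)
print(final_ans)
```

Averaging the partial means is only valid because every chunk has the same length; with unequal chunks each worker would need to return its sum and count instead.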

---

2. Please specify the difference between process and thread.

A process is an independent running instance of a program with its own memory space. The operating system can schedule separate processes on separate CPU cores, so a single operation can be broken into several sub-operations that run in parallel, one per process.

A thread is a unit of execution inside a process. One process can contain several threads, so threads can be seen as a further subdivision of the work within a single process.

So, if we had a single process with 5 threads, the operation demonstrated above could be performed by giving each thread one pair of numbers. Every thread in a process has access to all of the data in that process.

With 5 processes on 5 cores the pieces genuinely run at the same time, whereas 5 threads on a single core have to take turns on that core, so for CPU-bound work the multi-core version will generally be faster.

The major difference between a process and a thread is that processes are independent of one another whereas threads share memory with each other. If one thread changes a variable, the change is visible to every other thread in the same process. For processes that's not the case: even if one process changes its variables or data, the other processes are not affected.
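
The shared-memory difference described above can be observed directly. The sketch below is an illustration under stated assumptions: it requests the `fork` start method explicitly, which exists only on POSIX systems, and `shared` and `bump` are hypothetical names.

```python
import multiprocessing as mp
import threading

shared = [0]  # module-level data that each worker tries to modify

def bump():
    # Increment the first element in place.
    shared[0] += 1

# A thread runs inside the parent process and shares its memory,
# so the parent sees the change.
t = threading.Thread(target=bump)
t.start()
t.join()
after_thread = shared[0]

# A child process works on its own copy of the data,
# so its change never reaches the parent.
ctx = mp.get_context("fork")  # assumption: POSIX-only start method
p = ctx.Process(target=bump)
p.start()
p.join()
after_process = shared[0]

print(after_thread, after_process)
```

The thread's increment is visible in the parent, while the child process increments only its private copy, so the parent's value is the same after the process finishes as it was after the thread finished.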

---

3. Find out the number of your processors on your computer using the multiprocessing package.

In [None]:
import multiprocessing as mp
print(f'Number of cpu: {mp.cpu_count()}')

Number of cpu: 2
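
The count above is the number of logical CPUs on the machine, which is not always the number this process may use (for example inside a container or under CPU pinning). A hedged sketch of checking both follows; `os.sched_getaffinity` is available only on some Unix platforms, so the call is guarded.

```python
import os

# Logical CPUs on the machine; may be None if it cannot be determined.
total_cpus = os.cpu_count()

# CPUs this process is actually allowed to run on, where the platform reports it.
if hasattr(os, "sched_getaffinity"):
    usable_cpus = len(os.sched_getaffinity(0))
else:
    usable_cpus = total_cpus or 1  # fallback assumption when affinity is unavailable

print(f"total: {total_cpus}, usable by this process: {usable_cpus}")
```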


---

4. Use multiprocessing package to parallel the following code, and record the running time. Hint: You may need to check out the pool.apply function.

In [None]:
%%time

results = []

def plus_cube(x, y):
    return (x+y)**3

for x, y in zip(range(100), range(100)):
    results.append(plus_cube(x, y))

print(results[:10])

[0, 8, 64, 216, 512, 1000, 1728, 2744, 4096, 5832]
CPU times: user 858 µs, sys: 2 µs, total: 860 µs
Wall time: 872 µs


In [None]:
%%time

# Submit every call without blocking, then collect the results in order.
with mp.Pool(processes=mp.cpu_count()) as pool:
    async_results = [pool.apply_async(plus_cube, args=(x, y)) for x, y in zip(range(100), range(100))]
    results = [r.get() for r in async_results]
print(results[:10])

[0, 8, 64, 216, 512, 1000, 1728, 2744, 4096, 5832]
CPU times: user 24.2 ms, sys: 16.1 ms, total: 40.3 ms
Wall time: 46.8 ms


Using starmap

In [None]:
%%time

# starmap unpacks each (x, y) tuple into plus_cube's two arguments.
with mp.Pool(processes=mp.cpu_count()) as pool:
    results = pool.starmap(plus_cube, zip(range(100), range(100)))
print(results[:10])

[0, 8, 64, 216, 512, 1000, 1728, 2744, 4096, 5832]
CPU times: user 5.57 ms, sys: 15.3 ms, total: 20.8 ms
Wall time: 26 ms


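Both parallel versions are slower than the serial loop here (about 47 ms and 26 ms of wall time versus under 1 ms), because starting worker processes and passing arguments between them costs far more than computing 100 cubes. Parallelism pays off only when each task is expensive enough to outweigh that overhead.

More details on passing multiple arguments to a pool: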

https://superfastpython.com/multiprocessing-pool-map-multiple-arguments/
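
Outside a notebook the `%%time` magic is unavailable, so the running time has to be recorded by hand. The sketch below shows one way to do that with `time.perf_counter`. It is an illustration, not the book's solution: `ThreadPool` is used as a stand-in for `mp.Pool` (an assumption, so the snippet runs without a `__main__` guard), and the pool size of 2 is arbitrary.

```python
import time
from multiprocessing.pool import ThreadPool

def plus_cube(x, y):
    return (x + y) ** 3

pairs = list(zip(range(100), range(100)))

# Time the plain serial loop.
start = time.perf_counter()
serial_results = [plus_cube(x, y) for x, y in pairs]
serial_s = time.perf_counter() - start

# Time the pooled version, including pool start-up and shutdown.
start = time.perf_counter()
with ThreadPool(processes=2) as pool:  # stand-in for mp.Pool (assumption)
    pooled_results = pool.starmap(plus_cube, pairs)
pooled_s = time.perf_counter() - start

print(serial_results[:10])
print(f"serial: {serial_s * 1e3:.3f} ms, pooled: {pooled_s * 1e3:.3f} ms")
```

The measured times vary from run to run and machine to machine, so only the two result lists are expected to match exactly.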

---

5. Can you provide an example to illustrate the difference of pool.map and pool.map_async?

pool.map() applies a function to each item of an iterable and blocks until every result is ready, then returns the results as a list in the original order.

In [None]:
def square(num):
    return num**2

n_cpu = mp.cpu_count()
pool = mp.Pool(processes=n_cpu)
results = pool.map(square, range(21))
results

[0,
 1,
 4,
 9,
 16,
 25,
 36,
 49,
 64,
 81,
 100,
 121,
 144,
 169,
 196,
 225,
 256,
 289,
 324,
 361,
 400]

Using map_async()

map_async() takes the same arguments as map() but returns immediately with an AsyncResult object, without waiting for the work to finish. Calling get() on that object blocks until all the results are ready and returns them as a list.

In [None]:
async_results = pool.map_async(square, range(21))
results = async_results.get()
results

[0,
 1,
 4,
 9,
 16,
 25,
 36,
 49,
 64,
 81,
 100,
 121,
 144,
 169,
 196,
 225,
 256,
 289,
 324,
 361,
 400]
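
Because both calls return the same list, the difference is easier to see with a deliberately slow function. The sketch below is an illustration under assumptions: `slow_square` is a hypothetical helper that sleeps for 0.1 s, and `ThreadPool` stands in for `mp.Pool` (same `map` and `map_async` interface) so the snippet runs without a `__main__` guard.

```python
import time
from multiprocessing.pool import ThreadPool

def slow_square(num):
    # Hypothetical helper: sleep to simulate an expensive task.
    time.sleep(0.1)
    return num ** 2

with ThreadPool(processes=2) as pool:  # stand-in for mp.Pool (assumption)
    # map() blocks: this line finishes only after all four items are done.
    start = time.perf_counter()
    blocking = pool.map(slow_square, range(4))
    map_elapsed = time.perf_counter() - start

    # map_async() returns an AsyncResult right away; the work continues in the background.
    start = time.perf_counter()
    pending = pool.map_async(slow_square, range(4))
    submit_elapsed = time.perf_counter() - start
    ready_at_submit = pending.ready()

    # get() is the point where the caller finally waits for the results.
    nonblocking = pending.get()

print(f"map took {map_elapsed:.2f} s, map_async returned after {submit_elapsed:.4f} s")
print(ready_at_submit, blocking == nonblocking)
```

Between the `map_async` call and `get()`, the main program is free to do other work, which is the practical reason to prefer it over `map`.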

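Like map(), map_async() also accepts a chunksize argument, which splits the iterable into chunks of roughly that many items and sends each chunk to a worker as a single task. With chunksize = 10, the 21 items below are sent as chunks of 10, 10, and 1.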

In [None]:
lst = list(range(21))
results = pool.map_async(square, lst, chunksize = 10).get()
results

[0,
 1,
 4,
 9,
 16,
 25,
 36,
 49,
 64,
 81,
 100,
 121,
 144,
 169,
 196,
 225,
 256,
 289,
 324,
 361,
 400]

---

6. What is Python’s GIL?

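Python's GIL (Global Interpreter Lock) is a lock in the standard CPython interpreter that lets only one thread execute Python bytecode at a time, so multiple threads in the same process cannot run Python code simultaneously. This is why CPU-bound work is usually parallelized with processes, each of which has its own interpreter and its own GIL, rather than with threads.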
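
The effect of the GIL can be observed with a small timing experiment. The sketch below is an illustration only: `count_down` and the loop size `N` are hypothetical choices, and the timings depend on the machine. On a standard (not free-threaded) CPython build, the threaded run usually takes about as long as the serial one, because the two threads take turns holding the lock.

```python
import threading
import time

def count_down(n):
    # Pure-Python CPU-bound loop; the running thread holds the GIL while it executes.
    while n > 0:
        n -= 1

N = 2_000_000  # arbitrary workload size (assumption)

# Run the loop twice, one call after the other.
start = time.perf_counter()
count_down(N)
count_down(N)
serial_s = time.perf_counter() - start

# Run the same two loops in two threads.
threads = [threading.Thread(target=count_down, args=(N,)) for _ in range(2)]
start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded_s = time.perf_counter() - start

print(f"serial:   {serial_s:.3f} s")
print(f"threaded: {threaded_s:.3f} s")
```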

---

7. Use joblib to parallel the above example, and use the multiprocessing as the backend.

In [None]:
# !pip install joblib
from joblib import Parallel, delayed
import numpy as np

In [None]:
results = Parallel(n_jobs=mp.cpu_count(), backend = 'multiprocessing')(delayed(square)(x) for x in range(21))
results

[0,
 1,
 4,
 9,
 16,
 25,
 36,
 49,
 64,
 81,
 100,
 121,
 144,
 169,
 196,
 225,
 256,
 289,
 324,
 361,
 400]