# High Performance Computing

## 1. Introduction

### 1.1. Working Environment

In [1]:
import sys
print(sys.version)

3.11.7 | packaged by Anaconda, Inc. | (main, Dec 15 2023, 18:05:47) [MSC v.1916 64 bit (AMD64)]


In [12]:
import numpy as np
import matplotlib.pyplot as plt
import time

We now have a working knowledge of Python, and soon we will start using it for data analysis and numerical analysis. Before going deeper, we need to know how to improve the performance of our programs.

### 1.2. References

## 4. Parallel Computing

In this section we cover **parallel computing** in Python. With it, you can run your code simultaneously on multiple CPU cores, or speed it up by using the CPU cycles that would otherwise be wasted while your program waits for external resources (e.g., downloading files, API calls, etc.). The fundamental idea of parallel computing is to do multiple tasks at the same time in order to reduce the running time of your program. The following figure illustrates the simple idea of parallel computing versus the serial computing that we have used so far.

For example, if you have one million data files and need to apply the same operations to each of them, you can process one file at a time, or you can process multiple files at the same time. Similarly, if you are downloading one million websites, you can download 10 at a time to reduce the total download time. Learning the basics of parallel computing will therefore help you design more efficient code.

Most modern computers use a multi-core design: a single computing component contains multiple independent processing units, the so-called cores, which are available to do different tasks.

In Python, there are two basic approaches to parallel computing: the `multiprocessing` library and the `threading` library. Let's first take a look at the differences between a process and a thread.

### 4.1. Process and Thread

A **process** is an instance of a program (such as the Python interpreter, a Jupyter notebook, etc.). A process is created by the operating system to run a program, and each process has its own memory block. A **thread** is a unit of execution that resides within a process. Each process can have multiple threads, and these threads share the same memory block within the process. Because of this shared memory space, variables and objects are shared among all threads in a process: if you change a variable in one thread, the change is visible to all the other threads. Things are different across processes: changing a variable in one process does not change it in any other process. Processes and threads each have advantages and disadvantages, and each suits different kinds of tasks.
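The difference in memory sharing can be seen in a small sketch. The names `counter`, `increment`, `run_in_thread`, and `run_in_process` are made up for this illustration; the example assumes it is run as a script rather than from a notebook cell.

```python
import multiprocessing as mp
import threading

# Module-level object used by both demos (hypothetical name).
counter = {"value": 0}


def increment():
    """Add 1 to the module-level counter."""
    counter["value"] += 1


def run_in_thread():
    """Run increment() in a thread; return the counter as seen by the caller."""
    counter["value"] = 0
    t = threading.Thread(target=increment)
    t.start()
    t.join()
    return counter["value"]


def run_in_process():
    """Run increment() in a child process; return the counter seen by the parent."""
    counter["value"] = 0
    p = mp.Process(target=increment)
    p.start()
    p.join()
    return counter["value"]


if __name__ == "__main__":
    # The thread shares our memory, so its change is visible here.
    print("after thread: ", run_in_thread())
    # The child process works on its own copy, so the parent's value is unchanged.
    print("after process:", run_in_process())
```

The thread's increment shows up in the caller, while the child process only changes its own copy of `counter`.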

### 4.2. Python's GIL Problem

The standard Python interpreter (CPython) has something called the Global Interpreter Lock (GIL), which allows only one native thread to execute Python code at a time, preventing multiple threads from running simultaneously. This is because Python was designed before multi-core processors became common in personal computers. Even though there are workarounds for multithreaded programming in Python, we will only cover the `multiprocessing` library (Section 4.4 below), since it is what we will use most of the time to take advantage of multi-core parallel computing.
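The GIL is released while a thread is waiting (for example during `time.sleep` or a blocking network call), which is why threads can still help with the "waiting for external resources" case mentioned earlier. The sketch below is only an illustration: `fake_download` and `timed_threaded_run` are hypothetical names, and the sleep stands in for a real download.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(i, delay=0.2):
    """Stand-in for an I/O-bound task: wait, then return the input."""
    time.sleep(delay)  # sleeping releases the GIL, like waiting on a network call
    return i


def timed_threaded_run(n_tasks=4):
    """Run n_tasks waits in n_tasks threads; return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_tasks) as executor:
        results = list(executor.map(fake_download, range(n_tasks)))
    return results, time.perf_counter() - start


results, elapsed = timed_threaded_run()
# Four 0.2 s waits overlap, so the total is well below the 0.8 s a serial loop needs.
print(results, f"{elapsed:.2f} s")
```

For CPU-bound work such as the squaring example in Section 4.4, threads would not give this kind of speed-up, which is why processes are used there.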

### 4.3. Disadvantages of Parallel Computing

Of course, parallel computing also has disadvantages, such as more complicated code and the overhead of spawning new processes and maintaining them. Thus, if your task is small, the parallel version may take longer than the serial one, because the system needs time to initialize and manage the new processes.
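A rough way to see this overhead is to time a deliberately tiny task both ways. This is a sketch under assumptions: it uses `multiprocessing.Pool` (covered in Section 4.4), the helper names `square`, `time_serial`, and `time_parallel` are made up, and it is meant to be saved and run as a script. The actual timings depend on the machine, so none are shown.

```python
import multiprocessing as mp
import time


def square(x):
    """A deliberately tiny task."""
    return x * x


def time_serial(n):
    """Square 0..n-1 in a plain loop; return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = [square(i) for i in range(n)]
    return results, time.perf_counter() - start


def time_parallel(n, processes=2):
    """Square 0..n-1 with a process pool; the timing includes pool start-up."""
    start = time.perf_counter()
    with mp.Pool(processes=processes) as pool:
        results = pool.map(square, range(n))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    serial_results, serial_time = time_serial(1000)
    parallel_results, parallel_time = time_parallel(1000)
    assert serial_results == parallel_results
    print(f"serial:   {serial_time:.4f} s")
    print(f"parallel: {parallel_time:.4f} s")
```

With so little work per item, the time spent creating worker processes and passing data to them is likely to dominate the parallel run.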

### 4.4. Multiprocessing

The `multiprocessing` library is part of Python's standard library and supports parallel computing using processes. It has many features; if you want to know all the details, you can check the official documentation. Here we introduce the basics to get you started with parallel computing. Let's start by importing the library.

In [2]:
import multiprocessing as mp

Let's first print the number of CPUs on this machine that can be used for parallel computing.

In [3]:
print(f'Number of CPUs: {mp.cpu_count()}')

Number of CPUs: 4


Let's use an example to show how to use multiple cores on one machine to reduce the execution time. We will generate 10,000,000 random integers from 0 to 9 one by one (`np.random.randint(0, 10)` excludes the upper bound), square them, and store the results in a list.

In [11]:
def random_square(seed):
    np.random.seed(seed)
    random_num = np.random.randint(0, 10)
    return random_num**2

The code below is the serial version of the example (what we have done so far).

In [15]:
%%time 
results = []
for i in range(10_000_000):
    results.append(random_square(i))

CPU times: total: 36.7 s
Wall time: 1min 23s


Next we see the parallel version of the example.

In [None]:
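%%time
n_cpu = mp.cpu_count()
# On Windows, random_square may need to live in an importable .py file.
# The context manager shuts the pool down when the block exits.
with mp.Pool(processes=n_cpu) as pool:
    # map already returns a list, so no extra brackets are needed.
    results = pool.map(random_square, range(10_000_000))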

The simplest way to do parallel computing with `multiprocessing` is to use the `Pool` class. It has four methods that we may use often: `apply`, `map`, `apply_async`, and `map_async`. Have a look at the documentation for the differences. Here we only use the `map` method to parallelize the example above. The `map(func, iterable)` method takes two arguments, applies the function `func` to each element of `iterable`, and then collects the results.
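As a quick orientation, the four methods can be compared side by side. This is a minimal sketch, not a full treatment: `cube` and `demo_pool_methods` are hypothetical names, and the code assumes it is saved and run as a script so that the worker processes can import `cube`.

```python
import multiprocessing as mp


def cube(x):
    """Toy function to run in the worker processes."""
    return x ** 3


def demo_pool_methods():
    """Call the four Pool methods once each and return their results."""
    with mp.Pool(processes=2) as pool:
        # apply: a single call that blocks until the result is ready.
        single = pool.apply(cube, args=(3,))
        # map: one call per element; blocks and keeps the input order.
        mapped = pool.map(cube, [1, 2, 3])
        # apply_async / map_async: return an AsyncResult right away;
        # calling .get() blocks until the value is available.
        async_single = pool.apply_async(cube, args=(4,))
        async_mapped = pool.map_async(cube, [4, 5, 6])
        return single, mapped, async_single.get(), async_mapped.get()


if __name__ == "__main__":
    print(demo_pool_methods())
```

The blocking methods are the simplest to reason about; the `_async` variants are useful when the main process has other work to do while the workers run.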