# Python Multithreading and Multiprocessing
#### Sheridan B. Green

## Threads vs. Processes
* **Process**
    * Executing instance of an application
    * Used for “heavyweight” tasks
    * Has its own memory space
    * Independent of other processes
    * **Contains** at least one thread
    
    
* **Thread**
    * Used for “lightweight” subroutines of a program
    * Shares memory address space with other threads of the same process
    * Can read and write data structures & variables belonging to sibling threads
    * Each thread is associated with a different task of the parent program process
    
    
* Multithreaded applications must be programmed carefully to avoid threads “stepping on each other”


* Implemented in Python via the **threading** and **multiprocessing** modules
    * `multiprocessing.dummy` replicates the `multiprocessing` API but uses threads in place of processes
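
The memory-sharing difference described above can be seen in a minimal sketch. The names `shared`, `append_marker`, `run_in_thread` and `run_in_process` are hypothetical and chosen only for illustration: a thread appends to the parent's list directly, while a child process only changes its own copy.

```python
import multiprocessing
import threading

# Module-level list used to observe which workers share the parent's memory
shared = []

def append_marker(marker):
    """Append a marker to the module-level list."""
    shared.append(marker)

def run_in_thread():
    """Run append_marker in a sibling thread and wait for it."""
    t = threading.Thread(target=append_marker, args=("thread",))
    t.start()
    t.join()

def run_in_process():
    """Run append_marker in a child process and wait for it."""
    p = multiprocessing.Process(target=append_marker, args=("process",))
    p.start()
    p.join()

if __name__ == '__main__':
    run_in_thread()
    run_in_process()
    # The thread's append is visible here; the child process
    # modified only its own copy of the list.
    print(shared)
```

Run as a script, this should print a list containing only the thread's marker, since the process's append never reaches the parent.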


## threading module
`import threading`
* **Pros:**
    * Simple to run any callable function in its own thread
    * Sharing data is fairly simple between threads
    * Useful for I/O bound tasks (networking, writing to disk, and so on)
* **Cons:**
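    * Python Global Interpreter Lock (GIL)
        * Only one thread can execute Python bytecode at a time, so CPU-bound threads do not run in parallel
        * The lock protects interpreter internals and code that is not thread-safe
        * It prevents threads from stepping on each other inside the interpreter, at the cost of parallelism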

**Result**: In Python, the best way to run CPU-bound computations in parallel is through **parallel processes**, rather than threads.
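
Threads still pay off for the I/O-bound case listed under the pros, because a blocking call such as `time.sleep()` releases the GIL while it waits. The sketch below uses a sleep as a stand-in for real I/O; `fake_io_task` and `run_threads` are hypothetical names, and the delays are arbitrary.

```python
import threading
import time

def fake_io_task(delay):
    """Stand-in for a blocking I/O call; sleeping releases the GIL."""
    time.sleep(delay)

def run_threads(n_threads, delay):
    """Start n_threads sleeping tasks at once and return the elapsed wall time."""
    threads = [threading.Thread(target=fake_io_task, args=(delay,))
               for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == '__main__':
    elapsed = run_threads(4, 0.5)
    # The waits overlap, so the total is close to one delay, not four
    print('4 threads, 0.5 s each: %.2f s total' % elapsed)
```

Replacing the sleep with a pure-Python computation would remove most of the benefit, which is the situation the GIL bullet describes.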

## multiprocessing module
`import multiprocessing`

* **Pros:**
    * True parallelism for all tasks (each process has its own interpreter and its own GIL)
    * Takes full advantage of **multicore systems**, and can even span *multiple machines*
    * Can be used to take advantage of high-throughput cluster computing, e.g. Killdevil/Kure
* **Cons:**
    * Individually, processes are more expensive to create and manage than threads
        * For CPU-bound work, **multiprocessing** remains faster than **threading** due to the GIL (in Python!)
    * Data sharing between processes is more difficult than with threads
        * multiprocessing is better for independent tasks, i.e. those which do not require data from one another
    
Ultimately, we’ll focus on the **multiprocessing module** for this tutorial since **we all have multi-core processors and should take advantage of them!**


## Basic Example

The basic example of how to use the `multiprocessing` module creates tasks that run in separate processes using the `Process` object. Each `Process` object is given a `target`, the function to be run, as well as the necessary parameters, `args`.

These processes run asynchronously, so their results may not arrive in order. This method requires more organization than is necessary for most applications that we would be interested in...

### Note:
The `if __name__ == '__main__':` line is crucial when using `multiprocessing`. 


For those who have written code in Java, C/C++, or any other language which uses `main` functions, this should be familiar. When the user runs this file directly, the code under `__main__` is run. However, on platforms where `multiprocessing` starts each new process by importing the main module afresh (Windows, for example), every child process re-executes the module's top-level code in order to find the definition of `funSquare`.

To have the `__main__` part run only once, in the parent process, it must be within the `if` statement. 

An alternative solution is to define the functions of interest in a separate file and then `import` them.

In [18]:
import multiprocessing

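def funSquare(num):
    # Each child process prints its own result
    print(num ** 2)

if __name__ == '__main__':
    jobs = []
    for i in range(5):
        # One Process per task; args must be a tuple, hence the trailing comma
        p = multiprocessing.Process(target=funSquare, args=(i,))
        jobs.append(p)
        p.start()

    # Wait for every child process to finish before moving on
    for p in jobs:
        p.join()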

1
0
4
9
16
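
The extra organization mentioned above usually means getting return values back to the parent. One possible sketch, assuming a `multiprocessing.Queue` is acceptable for the job, tags each result with its input so the arbitrary arrival order can be undone. The helper `square_to_queue` is hypothetical.

```python
import multiprocessing

def square_to_queue(num, queue):
    """Put (input, result) on the queue so the parent can match them up."""
    queue.put((num, num ** 2))

if __name__ == '__main__':
    queue = multiprocessing.Queue()
    jobs = [multiprocessing.Process(target=square_to_queue, args=(i, queue))
            for i in range(5)]
    for p in jobs:
        p.start()

    # Drain the queue before joining; one result is expected per process
    pairs = [queue.get() for _ in jobs]
    for p in jobs:
        p.join()

    # Arrival order is arbitrary, so sort by the input value
    print(sorted(pairs))
```

This bookkeeping is what `Pool.map()` takes care of automatically.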


## Introducing Pool and Pool.map

The **Pool** class contained within `multiprocessing`, i.e. `multiprocessing.Pool`, is used to control a pool of worker processes.

Instead of submitting and starting each successive job in the fashion used above, one can submit an entire suite of jobs at once to the worker pool and let `multiprocessing` handle the distribution of tasks. 

`multiprocessing.Pool()` initializes the worker process pool. The number of worker processes defaults to the number of CPU cores reported by the system. However, the class initialization takes an optional integer argument giving the number of worker processes desired. **This could be useful in high-performance computing (HPC) contexts.**

The example below is quite simple: First, the worker process pool is initialized, here as `pool`. Second, the `pool.map()` method is called. This method applies a specified function across each element in an *iterable* and returns the results, **in the same order as the values of the iterable.** 

For example, we give the process pool a list of numbers and tell it to square all of them and return the squared values in a list.

In [20]:
import multiprocessing

def funSquare(num):
    return num ** 2

if __name__ == '__main__':
    pool = multiprocessing.Pool() #initializes N workers, where N = # of CPU cores
    results = pool.map(funSquare, range(10))
    print(results)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
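
The same `map` interface is available with threads through `multiprocessing.dummy`, mentioned at the start of this tutorial. The sketch below is an illustration only: `square_all` is a hypothetical helper and the worker count of 4 is arbitrary. As with the process pool, the results come back in input order.

```python
from multiprocessing.dummy import Pool as ThreadPool

def funSquare(num):
    return num ** 2

def square_all(values, n_workers=4):
    """Map funSquare over values with a pool of threads."""
    with ThreadPool(n_workers) as pool:
        return pool.map(funSquare, values)

if __name__ == '__main__':
    print(square_all(range(10)))
```

Because these workers are threads, this variant is subject to the GIL and is better suited to I/O-bound functions than to the arithmetic used here.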


## More Detailed Example

In the case below, we define a simple "simulation" in the function `runSimulation(params)` which takes an object and performs some calculation using the various parameters, returning an array.

The crucial advancement in this implementation compared to the previous one is the introduction of a zipped set of parameters. In the `__main__` initialization of the code, two separate parameters to be swept over, `param1` and `param2`, are defined as sequences of equal length. `pool.map()` only takes **one** iterable object, so the built-in Python function `zip()` is used to pair up the two sequences element by element into a single iterable of tuples.

The same procedure as above is then used, first initializing the worker process pool and then giving `pool.map()` the function and the zipped parameters. The worker function `runSimulation()` then unpacks each tuple as `param1, param2 = params`, **very simple!**

**This will return a list of the output arrays from the function in the same order as the input parameters.**

### Note:

In this case, the parameters are supplied as a set of **pairs** (the i-th value of `param1` with the i-th value of `param2`), not as two arrays to be looped over independently. More careful thought is needed to use `pool.map()` to search over two independent parameter spaces. A simple first-order approach is to generate an array containing **every possible combination** of the two parameters, send that to `pool.map()` as the iterable, and then use `numpy` to reshape the final output array.
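
The "every possible combination" approach can be sketched with `itertools.product`. To keep the example dependency-free, the regrouping is done with plain list slicing; calling `numpy.array(flat).reshape(len(param1), len(param2))` would perform the same regrouping. The scalar `runSimulation` here is a toy stand-in, and `sweep_grid` and its `mapper` argument are hypothetical names.

```python
import itertools
import multiprocessing

def runSimulation(params):
    """Toy scalar 'simulation' for one (param1, param2) pair."""
    param1, param2 = params
    return param1 * 10 - param2 ** 2

def sweep_grid(mapper, param1, param2):
    """Evaluate runSimulation on every combination and regroup rows by param1."""
    combos = list(itertools.product(param1, param2))
    flat = list(mapper(runSimulation, combos))
    n2 = len(param2)
    # Row i holds the results for param1[i] paired with each value of param2
    return [flat[i * n2:(i + 1) * n2] for i in range(len(param1))]

if __name__ == '__main__':
    param1 = [0, 1, 2]
    param2 = [2, 4]
    with multiprocessing.Pool() as pool:
        grid = sweep_grid(pool.map, param1, param2)
    print(grid)
```

Passing the built-in `map` as `mapper` gives the serial equivalent, which is handy for checking the parallel result.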

### Warning:

Jupyter notebooks do not handle the complete use of all cores on the computer very well. I had to deal with several complete freezes of my computer while testing this code out. Perhaps try using `multiprocessing.Pool(N-1)`, where N is the number of cores in your processor. This way you should be guaranteed to have one free core for regular computer processing while your code is running. Ideally, this will prevent hang-ups.
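
A minimal sketch of the `N-1` suggestion, assuming `multiprocessing.cpu_count()` reports a sensible value on your machine. The helper `worker_count` and its `reserve` argument are hypothetical; the `max(1, ...)` guard keeps at least one worker on a single-core system.

```python
import multiprocessing

def worker_count(reserve=1):
    """Number of workers that leaves `reserve` cores free, never below one."""
    return max(1, multiprocessing.cpu_count() - reserve)

def funSquare(num):
    return num ** 2

if __name__ == '__main__':
    n_workers = worker_count()
    # The with-block shuts the pool down once the work is finished
    with multiprocessing.Pool(processes=n_workers) as pool:
        results = pool.map(funSquare, range(10))
    print(n_workers, results)
```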

In [2]:
import multiprocessing
import numpy as np

def runSimulation(params):
    """This is the main processing function. It will contain whatever
    code should be run on multiple processors.
    
    """
    param1, param2 = params
    # Example computation
    processedData = []
    for ctr in range(10000):
        processedData.append(param1 * ctr - param2 ** 2)

    return processedData

if __name__ == '__main__':
    # Define the parameters to test
    param1 = range(100)
    param2 = range(2, 202, 2)

    # Zip the parameters because pool.map() takes only one iterable
    params = zip(param1, param2)

    pool = multiprocessing.Pool()
    results = pool.map(runSimulation, params)
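
In Python 3.3 and later, `Pool.starmap()` offers an alternative to unpacking the tuple by hand: it calls the function with each tuple expanded into separate arguments. The sketch below is a shortened variant of the example above, with a smaller loop and parameter set chosen only for illustration.

```python
import multiprocessing

def runSimulation(param1, param2):
    """Same toy computation, taking the two parameters as separate arguments."""
    return [param1 * ctr - param2 ** 2 for ctr in range(5)]

if __name__ == '__main__':
    # Pairs (0, 2), (1, 4), (2, 6)
    params = list(zip(range(3), range(2, 8, 2)))
    with multiprocessing.Pool() as pool:
        # starmap calls runSimulation(*pair) for each pair
        results = pool.starmap(runSimulation, params)
    print(results[1])
```

Whether this reads better than `param1, param2 = params` is a matter of taste; the zipped-tuple form also works on older Python versions.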


## Using Objects

We briefly note that a class can be used to instantiate objects which encapsulate several different parameters and methods and which can then be passed through to the function of interest with `pool.map()`.

### Note:
We implement two loops across the two parameters in order to create a list of all possible combinations of the two parameters, as introduced in the note above the previous code block.

In [1]:
import multiprocessing
import numpy as np

class simObject():
    def __init__(self, params):
        self.param1, self.param2 = params

def runSimulation(objInstance):
    """This is the main processing function. It will contain whatever
    code should be run on multiple processors.
    
    """
    param1, param2 = objInstance.param1, objInstance.param2
    # Example computation
    processedData = []
    for ctr in range(100):
        processedData.append(param1 * ctr - param2 ** 2)

    return processedData

if __name__ == '__main__':
    # Define the parameters to test
    param1 = range(10)
    param2 = range(2, 202, 2)

    objList = []
    # Create a list of objects to feed into pool.map()
    for p1 in param1:
        for p2 in param2:
            objList.append(simObject((p1, p2)))

    pool = multiprocessing.Pool()
    results = pool.map(runSimulation, objList)
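
One practical constraint when passing objects: `multiprocessing` sends arguments to the worker processes by pickling them, so the objects must be picklable. Instances of a class defined at module level, like `simObject`, normally are; a lambda is not. The check below is a sketch, and `survives_pickling` is a hypothetical helper.

```python
import pickle

class simObject():
    def __init__(self, params):
        self.param1, self.param2 = params

def survives_pickling(obj):
    """Return True if obj can be pickled and restored, as pool.map() requires."""
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False

if __name__ == '__main__':
    # Instance of a class defined at module level
    print(survives_pickling(simObject((1, 2))))
    # A lambda has no importable name, so pickling it fails
    print(survives_pickling(lambda x: x ** 2))
```

Running a check like this on a sample object before submitting a large job can save a confusing error from inside the pool.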

## Time Comparisons

By running the function across our parameter set using `pool.map()` as well as the built-in Python function `map()`, which takes the same arguments, we can compare the total elapsed time of each. The Python `time` module provides clock routines for this: `time.clock()` in Python 2, and `time.perf_counter()` in Python 3 (where `time.clock()` was removed in version 3.8). Taking the difference between two clock readings gives the time spent in a code block.

For my four-core laptop, the non-parallelized version of the code runs slower by approximately a factor of 4, as we would expect!

In [9]:
import multiprocessing
import time

def runSimulation(params):
    """This is the main processing function. It will contain whatever
    code should be run on multiple processors.
    
    """
    param1, param2 = params
    # Example computation
    processedData = []
    for ctr in range(100000):
        processedData.append(param1 * ctr - param2 ** 2)

    return processedData

if __name__ == '__main__':
    # Define the parameters to test
    param1 = range(100)
    param2 = range(2, 202, 2)

    # Materialize the pairs so they can be iterated over twice
    params = list(zip(param1, param2))

    pool = multiprocessing.Pool()

    # Parallel map
    tic = time.perf_counter()
    results = pool.map(runSimulation, params)
    toc = time.perf_counter()

    # Serial map (list() forces the lazy Python 3 map to run)
    tic2 = time.perf_counter()
    results = list(map(runSimulation, params))
    toc2 = time.perf_counter()

    print('Parallel processing time: %r\nSerial processing time: %r'
          % (toc - tic, toc2 - tic2))

    print('Speedup factor: %r' % ((toc2 - tic2) / (toc - tic)))

Parallel processing time: 1.246099000000001
Serial processing time: 4.796491000000003
Speedup factor: 3.849205400212984
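
When timing `pool.map()`, wall-clock time is the relevant quantity, because the work happens in child processes whose CPU time is not charged to the parent. The sketch below illustrates the difference between the two clocks in Python 3, using a sleep as a stand-in for waiting on workers. The helper `measure` is hypothetical.

```python
import time

def measure(func, *args):
    """Return (wall_seconds, cpu_seconds) spent in func(*args) by this process."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func(*args)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu

if __name__ == '__main__':
    # Time passes on the wall clock while the calling process
    # uses almost no CPU time of its own.
    wall, cpu = measure(time.sleep, 0.5)
    print('wall: %.2f s, cpu: %.2f s' % (wall, cpu))
```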


## Usage Ideas

Among many other uses, `multiprocessing` can easily replace any computationally intensive loop in which the computations occurring inside each iteration do not depend on computations elsewhere in the loop. A problem of this type (often called *embarrassingly parallel*) can utilize **asynchronous parallel processing.**

Two simple examples relevant to astrostatistics:
* Calculation of posterior distributions in a Bayesian model
* Monte Carlo sampling of a parameter space (**not MCMC, as this requires knowledge of the previous step**)
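
As an illustration of the Monte Carlo case, the sketch below estimates pi from independent batches of random points, with each batch handled by one `pool.map()` task. This is a toy problem chosen for brevity, not an astrostatistics workflow; `count_hits`, `estimate_pi`, the batch sizes and the per-batch seeds are all assumptions made for the example.

```python
import multiprocessing
import random

def count_hits(task):
    """Count random points in the unit square that land inside the quarter circle."""
    seed, n_samples = task
    rng = random.Random(seed)  # separate, reproducible stream per batch
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits

def estimate_pi(mapper, n_batches=4, n_samples=50000):
    """Combine independent batches into one Monte Carlo estimate of pi."""
    tasks = [(seed, n_samples) for seed in range(n_batches)]
    total_hits = sum(mapper(count_hits, tasks))
    return 4.0 * total_hits / (n_batches * n_samples)

if __name__ == '__main__':
    with multiprocessing.Pool() as pool:
        print(estimate_pi(pool.map))
```

Giving each batch its own seed avoids every worker drawing the same random sequence, which would otherwise waste the extra samples.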

**With only a couple of simple lines, you can drastically speed up any Python code which involves large amounts of independent calculations. The benefits from this are too big to pass up!**

## References and Helpful Links

### Primary code tutorial adapted from:
http://kmdouglass.github.io/posts/learning-pythons-multiprocessing-module.html

### More rigorous explanation of the multiprocessing.Process() method:
https://pymotw.com/2/multiprocessing/basics.html

### Extremely simple alternative explanation of Pool.map():
http://chriskiehl.com/article/parallelism-in-one-line/

### Pros and cons of multithreading and multiprocessing in Python (Stack Overflow):
http://stackoverflow.com/questions/1190206/threading-in-python

### Official multiprocessing documentation from the Python documentation site:
https://docs.python.org/2/library/multiprocessing.html

### One last method of asynchronously utilizing multiprocessing.Pool():
http://pyinsci.blogspot.com/2009/02/usage-pattern-for-multiprocessing.html

### Documentation and list of official included Python functions, such as map(), zip():
https://docs.python.org/3/library/functions.html
