# Parallel computing the easy way
## or
## How to boost performance by 300% in 5 minutes (\*)

(\*) Depends on how many cores you have, and the first implementation will probably take longer than 5 minutes, but I wanted a catchy title.

---
In this tutorial we look at an easy way to implement CPU-based parallelization for 'embarrassingly parallel' problems (see the [Wikipedia article](https://en.wikipedia.org/wiki/Embarrassingly_parallel) for details). In short, 'embarrassingly parallel' means that the individual tasks run independently of each other and need no communication between them.

As an example, say you are running three different integrated assessment models (IAMs) for three different scenarios each. None of these calculations influences any other, so in principle they can be run in arbitrary order.

Generally speaking, the chance is high that you are looking at an embarrassingly parallel task whenever you have something like the following pattern:

In [1]:
import time

def some_computation(model, scenario):
    # simulate some expensive calculation for the model, scenario combination
    time.sleep(1)
    # return what we have calculated so that the caller can collect it
    return f'{model = }, {scenario = }'

models = ["MESSAGE", "GCAM", "REMIND"]
scenarios = ["Current Policies", "1.5C target"]

res = []
start_time = time.time()
for m in models:
    for s in scenarios:
        res.append(some_computation(m, s))
            
print(f'That took {time.time()-start_time:.2f} seconds')
print(res)

That took 6.03 seconds
["model = 'MESSAGE', scenario = 'Current Policies'", "model = 'MESSAGE', scenario = '1.5C target'", "model = 'GCAM', scenario = 'Current Policies'", "model = 'GCAM', scenario = '1.5C target'", "model = 'REMIND', scenario = 'Current Policies'", "model = 'REMIND', scenario = '1.5C target'"]


### Side Notes:

* The function `some_computation` can take an arbitrary number of arguments.
* Feeding `m` and `s` into the function with a nested loop is not the most concise way. Using e.g. [`itertools.product`](https://docs.python.org/3/library/itertools.html#itertools.product) from the Python standard library flattens the two loops into one. The nested loop does, however, illustrate the point that the individual iterations do not depend on each other.
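
As a minimal sketch of the second side note, the snippet below compares the nested loop with a single loop over `itertools.product`. The helper `describe` is a hypothetical stand-in for `some_computation` (without the sleep), introduced only for this illustration.

```python
import itertools

models = ["MESSAGE", "GCAM", "REMIND"]
scenarios = ["Current Policies", "1.5C target"]

def describe(model, scenario):
    # hypothetical stand-in for some_computation, without the sleep
    return f"{model} / {scenario}"

# nested loops, as in the cell above
nested = [describe(m, s) for m in models for s in scenarios]

# one flat loop over all (model, scenario) pairs
flat = [describe(m, s) for m, s in itertools.product(models, scenarios)]

print(nested == flat)
```

Both variants visit the same combinations in the same order, so the choice between them is mostly a matter of readability.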

## Using [`joblib.Parallel`](https://joblib.readthedocs.io/en/latest/generated/joblib.Parallel.html) to speed up computation

In [2]:
from joblib import Parallel, delayed
import itertools

start_time = time.time()
res = Parallel(n_jobs=-1)(delayed(some_computation)(m, s) for m in models for s in scenarios)
# using itertools.product we need to unpack the output tuple.
# res = Parallel(n_jobs=-1)(delayed(some_computation)(*arg_tuple) for arg_tuple in itertools.product(models,scenarios))
print(f'That took {time.time()-start_time:.2f} seconds')
print(res)

That took 2.52 seconds
["model = 'MESSAGE', scenario = 'Current Policies'", "model = 'MESSAGE', scenario = '1.5C target'", "model = 'GCAM', scenario = 'Current Policies'", "model = 'GCAM', scenario = '1.5C target'", "model = 'REMIND', scenario = 'Current Policies'", "model = 'REMIND', scenario = '1.5C target'"]
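
The call `delayed(some_computation)(m, s)` in the cell above can look odd at first. The idea, roughly, is that the function is not called on the spot; the function and its arguments are only recorded so that a worker can run the call later. The sketch below imitates that idea in plain Python. `sketch_delayed` and `add` are hypothetical names, and this is a simplified stand-in, not joblib's actual implementation.

```python
def sketch_delayed(func):
    """Simplified stand-in for joblib.delayed (hypothetical, for illustration only)."""
    def capture(*args, **kwargs):
        # do not call func here; just record what should be called later
        return func, args, kwargs
    return capture

def add(a, b=10):
    return a + b

# two recorded calls; nothing has been computed yet
tasks = [sketch_delayed(add)(1), sketch_delayed(add)(2, b=3)]

# a sequential "runner" that finally performs the recorded calls
results = [func(*args, **kwargs) for func, args, kwargs in tasks]
print(results)  # [11, 5]
```

In the real library, `Parallel` plays the role of the runner and distributes the recorded calls over the workers.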


### Taking a closer look at the parallel implementation

There are three main components to using `joblib` to parallelize computation:

1. **Using the `Parallel` object**: `Parallel` takes a number of parameters, the most important being `n_jobs`, which specifies the number of parallel jobs to run. If `n_jobs=-1`, all CPUs are used. If `n_jobs=1`, no parallel computing code is used at all, which is useful for debugging. For `n_jobs` below -1, `(n_cpus + 1 + n_jobs)` are used, so for `n_jobs=-2` all CPUs but one are used (from the [joblib documentation](https://joblib.readthedocs.io/en/latest/generated/joblib.Parallel.html)). The documentation also describes the additional input parameters and gives more examples.
2. **Wrapping the function in the `delayed` decorator**: In our case this step involves nothing more than calling `delayed(some_computation)`.
3. **Feeding in the input values**: Finally we need to 'feed' the data for each iteration into the function. In the previous example we did this using a generator expression: `(m, s) for m in models for s in scenarios`. In that example and in the one using `itertools.product` we passed the data as positional arguments. Sometimes it is more convenient to pass them as keyword arguments, for instance when a function has default arguments that we do not all want to change. Say we have a function `def f(a, b, c=3, d=5)` and want to keep the default `c=3` but change `d`. We then need to call the function with keyword arguments, like this: `f(a=1, b=3, d=17)`. The same can be achieved with `joblib.Parallel`, as the following example illustrates.

In [3]:
input_kwargs = ({'model':m, 'scenario':s} for m in models for s in scenarios)
res = Parallel(n_jobs=-1)(delayed(some_computation)(**kwargs) for kwargs in input_kwargs)
print(res)

["model = 'MESSAGE', scenario = 'Current Policies'", "model = 'MESSAGE', scenario = '1.5C target'", "model = 'GCAM', scenario = 'Current Policies'", "model = 'GCAM', scenario = '1.5C target'", "model = 'REMIND', scenario = 'Current Policies'", "model = 'REMIND', scenario = '1.5C target'"]
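
To make the `n_jobs` rule from point 1 above concrete, here is a small arithmetic sketch. The helper `sketch_worker_count` is hypothetical and not part of joblib. It assumes `n_jobs` is a non-zero integer, and clamping the result to at least one worker is my own assumption for very negative values.

```python
def sketch_worker_count(n_jobs, n_cpus):
    """Hypothetical helper mirroring the n_jobs rule from the joblib docs."""
    if n_jobs < 0:
        # -1 -> all CPUs, -2 -> all CPUs but one, and so on
        # (clamping to 1 is an assumption of this sketch)
        return max(n_cpus + 1 + n_jobs, 1)
    # positive values are taken literally
    return n_jobs

for n_jobs in (-1, -2, 1, 4):
    print(n_jobs, "->", sketch_worker_count(n_jobs, n_cpus=8))
# -1 -> 8
# -2 -> 7
# 1 -> 1
# 4 -> 4
```

On a shared machine, `n_jobs=-2` can be a polite choice because it leaves one CPU free for other work.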


### Return values

As seen in the examples, the results of the parallel executions are simply returned as a list of all return values. If the parallel function returns, for example, a pandas DataFrame, the following pattern can be useful:

In [4]:
import pandas as pd
import numpy as np

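i = 0

def return_pandas_df():
    # the global variable i acts as a counter so that the
    # index of the dataframe corresponds to how many times
    # the function was called in total
    # (this relies on n_jobs=1: with several worker processes
    # the counter is not shared between the workers)
    global i
    i += 1
    return pd.DataFrame(np.random.rand(1, 3), columns=('a', 'b', 'c'), index=[i])

# catch the results as a list of pandas dataframes
# (the loop variable is unused, so it is named _ to avoid confusion with the global i)
res = Parallel(n_jobs=1)(delayed(return_pandas_df)() for _ in range(3))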
# concatenate the list of data frames into a single one
pd.concat(res)

|   | a | b | c |
|---|---|---|---|
| 1 | 0.949969 | 0.539793 | 0.192273 |
| 2 | 0.947121 | 0.418268 | 0.338098 |
| 3 | 0.146354 | 0.726021 | 0.84628 |
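
A global counter like the one above is not shared between separate worker processes, so for truly parallel runs it is usually safer to pass the label into the function as an argument. The following is a minimal sketch of that idea using only plain Python. `make_row` and `run_ids` are hypothetical names, and the list comprehension is a sequential stand-in for the `Parallel(...)` call.

```python
def make_row(run_id):
    # the label arrives as an argument, so no shared counter is needed
    return {"run": run_id, "value": run_id ** 2}

run_ids = [1, 2, 3]

# sequential stand-in for Parallel(...)(delayed(make_row)(r) for r in run_ids)
rows = [make_row(r) for r in run_ids]

# every result carries its own label and can be looked up by it
by_run = {row["run"]: row["value"] for row in rows}
print(by_run)  # {1: 1, 2: 4, 3: 9}
```

The same approach should carry over to DataFrames: if each call receives its label as an argument, it can set its own index before the results are concatenated.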
