# Machine Learning and Parallel Computing

Parallel computing is a very important technique when building machine-learning models. A
computation-intensive task written as sequential (single-threaded) code runs on only one
core, even if the computer has several. Unless the work is explicitly split across cores,
much of the available computing power is wasted.
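A quick way to check how many cores are available is the standard library's `os` module. This is a minimal sketch: `os.sched_getaffinity` exists only on some platforms (e.g., Linux), so the helper `usable_cores` (a name chosen for this example) falls back to `os.cpu_count()` elsewhere.

```python
import os

def usable_cores() -> int:
    """Return the number of CPU cores this process is allowed to use."""
    # sched_getaffinity respects CPU-affinity restrictions (e.g. set with
    # taskset) but is only available on some platforms such as Linux.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    # cpu_count() reports the logical CPUs of the machine and may return None.
    return os.cpu_count() or 1

print(f"Logical CPUs in the machine: {os.cpu_count()}")
print(f"Cores usable by this process: {usable_cores()}")
```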

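In machine learning, one task that is easy to distribute across many cores is **hyperparameter testing**: each hyperparameter configuration can be trained and evaluated independently of the others, so evaluating several configurations at once significantly reduces the time needed to explore the hyperparameter space.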

## Distributed Computing

Parallelization can also be performed by means of **distributed computing**. 

In distributed computing, a large number of separate computers, called nodes, connected
through a network (e.g., the Internet) devote some or all of their computation time to
solving a common problem. Each node receives and completes many small tasks and reports
the results to a central server, which integrates them into the overall solution. Because
each node has its own local memory, tasks that run on different computers do not need to
coordinate access to shared memory. However, since information is exchanged over the
network, which is much slower than local memory, the amount of data passed between nodes
should be kept small so that communication does not dominate the computation time.

# Parallelization for Hyperparameter Testing

## Manual Parallelization

Step 1: Put all hyperparameter configurations in a list, one dictionary per combination of values:

```python
from itertools import product

# param_test (assumed to be defined earlier) maps each hyperparameter name to
# a list of candidate values. Build one configuration dict per combination,
# i.e. the Cartesian product of all lists, in the same order as nested loops.
keys = ['learning_rate', 'n_estimators', 'max_depth', 'min_child_weight',
        'gamma', 'subsample', 'colsample_bytree', 'nthread',
        'scale_pos_weight', 'seed']
configurations = []
for values in product(*(param_test[k] for k in keys)):
    param = dict(zip(keys, values))
    param['objective'] = 'reg:squarederror'
    configurations.append(param)
```
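The number of configurations is the product of the numbers of candidate values, so the grid grows multiplicatively with every hyperparameter added. A quick check, using a hypothetical `example_grid` with three candidate values for each of the ten hyperparameters above (the values themselves are placeholders; only the counts matter):

```python
import math

# Hypothetical grid: 3 placeholder values for each of 10 hyperparameters.
example_grid = {name: [0, 1, 2] for name in [
    'learning_rate', 'n_estimators', 'max_depth', 'min_child_weight', 'gamma',
    'subsample', 'colsample_bytree', 'nthread', 'scale_pos_weight', 'seed']}

# Number of models to train = product of the list lengths.
n_configurations = math.prod(len(values) for values in example_grid.values())
print(n_configurations)  # 3**10 = 59049
```

Even this modest grid means tens of thousands of model fits, which is why spreading them over all cores pays off.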

Step 2: Use the joblib library to train and evaluate the configurations in parallel:

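```python
from joblib import Parallel, delayed

# Fit and evaluate one model per configuration, using all cores (n_jobs=-1).
# XGB_fit, train, val and model_type are defined earlier; XGB_fit must return
# a 5-tuple (model, params, selection, val_error, trn_error).
# Each config's 'nthread' adds XGBoost threads on top of the joblib workers;
# nthread=1 avoids oversubscribing the cores.
results = Parallel(n_jobs=-1, verbose=50, backend="multiprocessing")(
    delayed(XGB_fit)(train, val, config, model_type)
    for config in configurations
)
# Unzip the list of 5-tuples into five tuples of results.
models, params, selections, val_errors, trn_errors = zip(*results)
```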
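To try this pattern outside the notebook, here is a self-contained sketch. `fake_xgb_fit` is a hypothetical stand-in for `XGB_fit` that returns a 5-tuple of the shape the unpacking above expects, and the toy data live inside `run_demo` so the notebook's own variables are not overwritten. It uses joblib's default process backend (`loky`) instead of `"multiprocessing"`.

```python
from joblib import Parallel, delayed

def fake_xgb_fit(train, val, config, model_type):
    """Hypothetical stand-in for XGB_fit that returns the same 5-tuple shape."""
    model = (model_type, config["max_depth"])  # placeholder "model"
    selection = sorted(config)                  # placeholder "feature selection"
    val_error = abs(config["max_depth"] - 4) / len(val)
    trn_error = abs(config["max_depth"] - 5) / len(train)
    return model, config, selection, val_error, trn_error

def run_demo():
    # Local toy data, so the notebook's own train/val are not overwritten.
    train, val = list(range(100)), list(range(20))
    configurations = [{"max_depth": d} for d in (2, 3, 4, 5, 6)]
    # Two workers; Parallel returns the results in the order of the inputs.
    results = Parallel(n_jobs=2)(
        delayed(fake_xgb_fit)(train, val, config, "xgb") for config in configurations
    )
    models, params, selections, val_errors, trn_errors = zip(*results)
    return params, val_errors

demo_params, demo_val_errors = run_demo()
print(demo_val_errors)
```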

The joblib user manual explains the trade-offs between thread-based and process-based parallelism: https://joblib.readthedocs.io/en/latest/parallel.html#thread-based-parallelism-vs-process-based-parallelism
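Instead of hard-coding `backend=...` in every `Parallel` call, joblib's `parallel_backend` context manager sets the backend (and a default `n_jobs`) for every `Parallel` call inside the `with` block; newer joblib versions also provide `parallel_config` for the same purpose. Thread-based workers avoid copying data between processes but only speed things up when the work releases Python's global interpreter lock (GIL), as many NumPy routines do; pure-Python work generally needs processes. The toy `square` function is an illustrative assumption.

```python
from joblib import Parallel, delayed, parallel_backend

def square(x):
    """Toy task used only to compare the two backends."""
    return x * x

# Every Parallel call inside this block uses two threads.
with parallel_backend("threading", n_jobs=2):
    squares_threads = Parallel()(delayed(square)(x) for x in range(10))

# Same computation with two worker processes (loky is joblib's default).
with parallel_backend("loky", n_jobs=2):
    squares_procs = Parallel()(delayed(square)(x) for x in range(10))

print(squares_threads == squares_procs)  # True: same results, different workers
```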

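Alternatively, the standard library's `multiprocessing.Pool` provides process-based parallelism without joblib:

```python
from multiprocessing import Pool
```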

## Built-In Parallelization

Some libraries, such as scikit-learn, use joblib internally. For them, the number of workers (threads or processes) spawned in parallel is controlled via the `n_jobs` parameter; in scikit-learn, `n_jobs=-1` uses all available cores.

```python
from sklearn.model_selection import GridSearchCV
model_cv = GridSearchCV(lrg, param_grid, scoring=scoring, n_jobs=3, cv=cross_val, verbose=10, return_train_score=True)
```
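A self-contained sketch of the same call on a small built-in dataset. The estimator (`LogisticRegression`), the iris dataset and the grid are illustrative assumptions, not the `lrg`, `param_grid` and `cross_val` objects used above, and the `demo_*` names avoid overwriting them.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, KFold

X_iris, y_iris = load_iris(return_X_y=True)
demo_grid = {"C": [0.01, 0.1, 1.0, 10.0]}
demo_cv = KFold(n_splits=5, shuffle=True, random_state=0)

# 4 candidates x 5 folds = 20 independent fits, spread over all cores.
demo_search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    demo_grid,
    scoring="accuracy",
    n_jobs=-1,
    cv=demo_cv,
    return_train_score=True,
)
demo_search.fit(X_iris, y_iris)
print(demo_search.best_params_, round(demo_search.best_score_, 3))
```

The fits are independent, so the speed-up is bounded mainly by the number of cores and by the candidates times folds available to share out.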