### Tuning Machine Learning Hyperparameters using Evolutionary Computing 

Many machine learning algorithms require a set of hyperparameters during their learning phase. These hyperparameters control the training procedure and therefore affect the accuracy of the resulting model.

For example, in the scikit-learn library, the decision tree has hyperparameters such as the maximum depth of the tree, the tree splitting strategy, and the minimum number of samples required to split a node. Each of these parameters affects the way the decision tree is constructed during the learning process, and their combined effect on the results of the learning process, and consequently on the performance of the model, can be significant.

Because the choice of hyperparameters affects the performance of the machine learning model, we need to tune them to obtain the best possible results.

A common way to find the best hyperparameters is to use grid search. This approach searches systematically: it picks a set of candidate values for each hyperparameter and tries the combinations one by one.

As an example, given the Decision Tree classifier, we can choose the subset of values {2, 5, 10} for the `max_depth` parameter, while, for the `splitter` parameter, we choose both possible values, {"best", "random"}. Then, we try out all six possible combinations of these values. For each combination, the classifier is trained and evaluated for a certain performance criterion, for example, accuracy. At the end of the process, we pick the combination of hyperparameter values that yielded the best performance.

The main drawback of grid search is the exhaustive search it conducts over all the possible combinations, which can prove very lengthy.
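
The six combinations from this example can be enumerated with `itertools.product`. The following is only a minimal sketch: the `evaluate` function is a hypothetical placeholder that returns a made-up score, where a real version would train a classifier and measure its accuracy.

```python
from itertools import product

# Candidate values taken from the example above.
max_depth_values = [2, 5, 10]
splitter_values = ["best", "random"]

# Every pairing of the two lists: 3 * 2 = 6 combinations.
grid = list(product(max_depth_values, splitter_values))

def evaluate(max_depth, splitter):
    """Hypothetical placeholder: a real version would train and score a model."""
    return -abs(max_depth - 5) + (0.5 if splitter == "best" else 0.0)

# Exhaustive grid search: score every combination and keep the best one.
best = max(grid, key=lambda combo: evaluate(*combo))
print(len(grid), best)
```

The cost grows multiplicatively with every hyperparameter added to the grid, which is what makes the exhaustive search lengthy.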

A better option, and one of particular interest to this course, is to use an evolutionary algorithm to look for the best combination(s) of hyperparameters within the predefined grid. This method has the potential to find the best grid combinations in less time than the original, exhaustive grid search.
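
To make the idea concrete, here is a toy sketch of an evolutionary search over a predefined grid, written in plain Python rather than with DEAP. Each individual is a list of indices into the grid axes. The function name `evolve_on_grid`, the operator choices, the rates, and the `toy_score` function are all assumptions made for illustration; they are not the method used later in this notebook.

```python
import random

def evolve_on_grid(grid_axes, score, pop_size=6, generations=10, seed=0):
    """Toy evolutionary search over index vectors into a hyperparameter grid."""
    rng = random.Random(seed)

    def random_individual():
        return [rng.randrange(len(axis)) for axis in grid_axes]

    def decode(ind):
        return tuple(axis[i] for axis, i in zip(grid_axes, ind))

    def fitness(ind):
        return score(*decode(ind))

    population = [random_individual() for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        offspring = []
        while len(offspring) < pop_size:
            # Tournament selection (size 2) of two parents.
            p1 = max(rng.sample(population, 2), key=fitness)
            p2 = max(rng.sample(population, 2), key=fitness)
            # Uniform crossover: take each gene from either parent.
            child = [rng.choice(pair) for pair in zip(p1, p2)]
            # Mutation: occasionally re-draw one gene within its axis.
            if rng.random() < 0.3:
                g = rng.randrange(len(child))
                child[g] = rng.randrange(len(grid_axes[g]))
            offspring.append(child)
        population = offspring
        # Remember the best individual seen so far.
        best = max(population + [best], key=fitness)
    return decode(best), fitness(best)

grid_axes = [[2, 5, 10], ["best", "random"]]

def toy_score(max_depth, splitter):
    """Made-up score; a real one would train and evaluate a classifier."""
    return -abs(max_depth - 5) + (0.5 if splitter == "best" else 0.0)

params, score_value = evolve_on_grid(grid_axes, toy_score)
print(params, score_value)
```

On a grid this small the exhaustive search is cheaper; the evolutionary approach only pays off when the grid is too large to enumerate.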


### Dataset 

Let's get the Wine dataset from the UCI Machine Learning Repository. The dataset contains the results of a chemical analysis of 178 different wines grown in the same region in Italy, and categorizes these wines into one of three types.



The chemical analysis consists of 13 different measurements, representing the quantities of the following constituents that are found in each wine:
   - Alcohol
   - Malic acid
   - Ash
   - Alkalinity of ash
   - Magnesium
   - Total phenols
   - Flavanoids
   - Nonflavanoid phenols
   - Proanthocyanins
   - Color intensity
   - Hue
   - OD280/OD315 of diluted wines
   - Proline

Columns 2-14 of the dataset contain the values for the preceding measurements, while the classification outcome – the wine type itself (1, 2, or 3) – is found in the first column.

In [1]:
import numpy as np
from pandas import read_csv

url = 'riceClassification.csv'

data = read_csv(url, header=None, usecols=range(0, 11))
X = data.iloc[:, 0:10]
y = data.iloc[:, 10]
X

|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 4537 | 92.229316 | 64.012769 | 0.719916 | 4677 | 76.004525 | 0.657536 | 273.085 | 0.764510 | 1.440796 |
| 1 | 2872 | 74.691881 | 51.400454 | 0.725553 | 3015 | 60.471018 | 0.713009 | 208.317 | 0.831658 | 1.453137 |
| 2 | 3048 | 76.293164 | 52.043491 | 0.731211 | 3132 | 62.296341 | 0.759153 | 210.012 | 0.868434 | 1.465950 |
| 3 | 3073 | 77.033628 | 51.928487 | 0.738639 | 3157 | 62.551300 | 0.783529 | 210.657 | 0.870203 | 1.483456 |
| 4 | 3693 | 85.124785 | 56.374021 | 0.749282 | 3802 | 68.571668 | 0.769375 | 230.332 | 0.874743 | 1.510000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 18180 | 5853 | 148.624571 | 51.029281 | 0.939210 | 6008 | 86.326537 | 0.498594 | 332.960 | 0.663444 | 2.912535 |
| 18181 | 7585 | 169.593996 | 58.141659 | 0.939398 | 7806 | 98.272692 | 0.647461 | 385.506 | 0.641362 | 2.916910 |
| 18182 | 6365 | 154.777085 | 52.908085 | 0.939760 | 6531 | 90.023162 | 0.561287 | 342.253 | 0.682832 | 2.925396 |
| 18183 | 5960 | 151.397924 | 51.474600 | 0.940427 | 6189 | 87.112041 | 0.492399 | 343.371 | 0.635227 | 2.941216 |


In the code above, `X` holds the feature columns (columns 0 to 9 of the loaded file), and `y` holds column 10, the class label.

For the classifier, we are going to use the AdaBoost classifier. It uses a boosting algorithm in which individual weak classifiers are combined to create a powerful classifier. The sklearn library's implementation of this model, `AdaBoostClassifier`, uses several hyperparameters, some of which are as follows:

 - n_estimators : integer : the number of weak learners
 - learning_rate : float : the weight applied to each weak learner at each boosting iteration
 - algorithm : {SAMME, SAMME.R} : discrete (SAMME) or real (SAMME.R) boosting

These are the parameters we are going to optimize. Suppose we define the following search grid for them:

 - The n_estimators parameter is tested across 10 values, linearly spaced between 10 and 100.
 - The learning_rate parameter is tested across 10 values, logarithmically spaced between 0.01 ($10^{-2}$) and 1 ($10^0$).
 - Both possible values of the algorithm parameter, 'SAMME' and 'SAMME.R', are tested.

Together, these choices create a total of $10 \times 10 \times 2 = 200$ combinations to be tested.
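
The size of this grid can be checked with a few lines of plain Python. This sketch assumes the learning-rate range runs from $10^{-2}$ to $10^0$, as the exponent notation in the list indicates, and that both endpoints of each range are included; the variable names are illustrative.

```python
from itertools import product

# 10 values linearly spaced between 10 and 100 (inclusive): 10, 20, ..., 100.
n_estimators_values = [10 + 10 * i for i in range(10)]

# 10 values logarithmically spaced between 10**-2 and 10**0 (inclusive):
# the exponents are evenly spaced between -2 and 0.
learning_rate_values = [10 ** (-2 + 2 * i / 9) for i in range(10)]

algorithm_values = ["SAMME", "SAMME.R"]

# Cartesian product of the three axes.
grid = list(product(n_estimators_values, learning_rate_values, algorithm_values))
print(len(grid))
```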

Notes: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.AdaBoostClassifier.html

### Chromosomes 

We need to encode these hyperparameters as genetic individuals (chromosomes). Let's see how we can create them.

Remember that we first need to define the `Individual` type with `creator`. We then register an `IndividualCreator` function that fills an individual with values, and finally use `IndividualCreator` to fill up the population.

In [2]:
from deap import base
from deap import creator
from deap import tools
import random

# define a single objective, maximizing fitness strategy:
creator.create("FitnessMax", base.Fitness, weights=(1.0,))
# create the Individual class based on list:
creator.create("Individual", list, fitness=creator.FitnessMax)


### Creating the individual

To fill up the individual chromosome, the DEAP toolbox provides three initializers:

- initRepeat
- initIterate
- initCycle

`initRepeat` fills the Individual by calling a single function `n` times.

`initIterate` fills the Individual from the iterable returned by a single call to a generator function.

`initCycle` fills the Individual by calling each function in a sequence of functions in order, repeating the whole cycle `n` times (so each function is called more than once if `n` is greater than 1).
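
The difference between the three is easiest to see side by side. The functions below are simplified plain-Python stand-ins written for this illustration (note the snake_case names); they mirror the behaviour described above but are not the DEAP functions themselves.

```python
from itertools import count

def init_repeat(container, func, n):
    """Stand-in for initRepeat: call one function n times."""
    return container(func() for _ in range(n))

def init_iterate(container, generator):
    """Stand-in for initIterate: call the generator once, consume its iterable."""
    return container(generator())

def init_cycle(container, seq_func, n=1):
    """Stand-in for initCycle: call every function in order, n times over."""
    return container(func() for _ in range(n) for func in seq_func)

counter = count(1)

def next_value():
    return next(counter)

print(init_repeat(list, next_value, 3))                    # one function, three calls
print(init_iterate(list, lambda: range(3)))                # one call returning an iterable
print(init_cycle(list, (lambda: "a", lambda: "b"), n=2))   # two functions, two cycles
```

`initCycle` is the natural fit when each gene has its own range, as is the case for the three hyperparameters here.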

In [3]:
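toolbox = base.Toolbox()
# register one random-float generator per hyperparameter gene
toolbox.register("h1", random.uniform, 1, 100)     # decoded by getParams as n_estimators
toolbox.register("h2", random.uniform, 0.01, 1.0)  # decoded by getParams as the criterion index
toolbox.register("h3", random.uniform, 1, 20)      # decoded by getParams as max_depth

# build an Individual by calling h1, h2 and h3 once each (n=1 cycle)
toolbox.register("IndividualCreator", tools.initCycle, creator.Individual,
                 (toolbox.h1, toolbox.h2, toolbox.h3), n=1)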

In [16]:
# Test IndividualCreator
for i in range(2):
    print(toolbox.IndividualCreator())

[69.21775141184514, 0.5312243671762628, 2.700150888494897]
[9.04765911758967, 0.33753062575009884, 19.1110698361401]


### Individual Chromosome to Hyper Parameter

We need to convert the Individual chromosome back to hyperparameter values.


In [8]:
a = [32.96578879203422, 0.43511523698489224, 1.4889191294943279]

In [9]:
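def getParams(individual):
    # gene 0: number of estimators, rounded to the nearest integer
    n_est = round(individual[0])
    # gene 1: rounded to an index (0 or 1) into the list of split criteria
    crit = ["gini", "entropy"][round(individual[1])]
    # gene 2: maximum tree depth, rounded to the nearest integer
    mx_dep = round(individual[2])
    return n_est, crit, mx_dep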

In [10]:
getParams(a)

(33, 'gini', 1)
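
Genes are plain floats, so after mutation they can drift outside the ranges they were created in: the second gene may round to 2 (an invalid list index), and the first or third may round to zero or a negative number. One possible safeguard is to clamp each decoded value. `decode_bounded` below is a hypothetical variant of `getParams` written for illustration, assuming that clamping to the nearest valid value is acceptable.

```python
CRITERIA = ["gini", "entropy"]

def decode_bounded(individual):
    """Hypothetical variant of getParams that clamps each gene to a valid value."""
    # At least one estimator.
    n_est = max(1, round(individual[0]))
    # Keep the criterion index inside the list bounds.
    crit_index = min(max(round(individual[1]), 0), len(CRITERIA) - 1)
    # Tree depth of at least 1.
    mx_dep = max(1, round(individual[2]))
    return n_est, CRITERIA[crit_index], mx_dep

# An in-range individual decodes exactly as before.
print(decode_bounded([32.96578879203422, 0.43511523698489224, 1.4889191294943279]))
# An out-of-range individual is pulled back to valid values.
print(decode_bounded([150.3, 1.7, -3.2]))
```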

## Fitness Evaluation

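We use the mean cross-validated accuracy of the classifier as the fitness of a set of hyperparameters. Note that the code below builds a `RandomForestClassifier` from the decoded values (`n_estimators`, `criterion`, `max_depth`); `AdaBoostClassifier` is imported but not used.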

In [11]:
from sklearn import model_selection
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier

kfold = model_selection.KFold(n_splits=10, random_state=42, shuffle=True)

def getAccuracy(individual):
    n_estimators, criterion, max_depth = getParams(individual)
    classifier = RandomForestClassifier(random_state=42,
                                        n_estimators=n_estimators,
                                        criterion=criterion,
                                        max_depth=max_depth
                                        )

    cv_results = model_selection.cross_val_score(classifier,
                                                 X,
                                                 y,
                                                 cv=kfold,
                                                 scoring='accuracy')
    return cv_results.mean()
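
The fitness is the mean of the per-fold accuracies. The following plain-Python sketch shows that computation under simplifying assumptions: the shuffling will not reproduce scikit-learn's exact splits, the "classifier" is a hypothetical majority-class predictor, and the labels are made up for illustration.

```python
import random

def kfold_indices(n_samples, n_splits, seed=42):
    """Yield (train, test) index lists for shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # The first n_samples % n_splits folds get one extra sample.
    fold_sizes = [n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
                  for i in range(n_splits)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

def mean_cv_accuracy(labels, n_splits=10):
    """Score a majority-class predictor on each fold, then average the scores."""
    scores = []
    for train, test in kfold_indices(len(labels), n_splits):
        train_labels = [labels[i] for i in train]
        majority = max(set(train_labels), key=train_labels.count)
        correct = sum(labels[i] == majority for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)

toy_labels = [0] * 70 + [1] * 30   # made-up labels, for illustration only
print(mean_cv_accuracy(toy_labels))
```

Every sample is used for testing exactly once, so the averaged score is less sensitive to one lucky or unlucky split than a single train/test partition.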

### Population and Evaluation Fitness

We register `populationCreator` and `evaluate` with the toolbox.

In [12]:
# create the population operator to generate a list of individuals:
toolbox.register("populationCreator", tools.initRepeat, list, 
                 toolbox.IndividualCreator)
# fitness calculation
def classificationAccuracy(individual):
    return getAccuracy(individual),

toolbox.register("evaluate", classificationAccuracy)
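
Note the trailing comma in `classificationAccuracy`: DEAP expects the evaluation function to return a sequence with one value per objective, matching the length of `weights=(1.0,)`, even when there is only a single objective. The snippet below illustrates just that Python detail; `as_fitness` is a hypothetical helper and 0.97 is an arbitrary value.

```python
def as_fitness(accuracy):
    # The trailing comma builds a one-element tuple rather than a bare float.
    return accuracy,

weights = (1.0,)              # single objective, maximized
fitness = as_fitness(0.97)    # arbitrary illustrative accuracy

# One fitness value per weight.
print(fitness, len(fitness) == len(weights))
```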


###  Selection, Mutation and Crossover

See the DEAP tools reference for the available operators: https://deap.readthedocs.io/en/master/api/tools.html#module-deap.tools

In [13]:
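# genetic operators:
# YOUR TASK fill in your own operators

# uniform crossover: swap each pair of genes with probability indpb
toolbox.register("mate", tools.cxUniform, indpb=0.7)
# Gaussian mutation: with probability indpb, add noise drawn from N(mu, sigma) to a gene
toolbox.register("mutate", tools.mutGaussian, mu=[20, 0.2, 0.2], sigma=[5, 0.1, 0.1], indpb=0.2)
# tournament selection: keep the best of 3 randomly drawn individuals
toolbox.register("select", tools.selTournament, tournsize=3)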
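
When choosing your own operators, it helps to know what the three registered here do to a list of floats. The functions below are simplified plain-Python stand-ins written for this illustration (snake_case names, not the DEAP API), and the example individuals are made up.

```python
import random

def cx_uniform(ind1, ind2, indpb):
    """Swap each pair of genes with probability indpb (modifies both in place)."""
    for i in range(min(len(ind1), len(ind2))):
        if random.random() < indpb:
            ind1[i], ind2[i] = ind2[i], ind1[i]
    return ind1, ind2

def mut_gaussian(individual, mu, sigma, indpb):
    """With probability indpb, add a draw from N(mu[i], sigma[i]) to gene i."""
    for i, (m, s) in enumerate(zip(mu, sigma)):
        if random.random() < indpb:
            individual[i] += random.gauss(m, s)
    return individual,

def sel_tournament(individuals, k, tournsize, key):
    """Pick k winners, each the best of tournsize randomly drawn individuals."""
    chosen = []
    for _ in range(k):
        aspirants = [random.choice(individuals) for _ in range(tournsize)]
        chosen.append(max(aspirants, key=key))
    return chosen

random.seed(0)
parent1, parent2 = [50.0, 0.3, 5.0], [80.0, 0.9, 15.0]
print(cx_uniform(parent1[:], parent2[:], indpb=0.7))
print(mut_gaussian(parent1[:], mu=[20, 0.2, 0.2], sigma=[5, 0.1, 0.1], indpb=0.2))
print(sel_tournament([parent1, parent2], k=2, tournsize=3, key=lambda ind: ind[0]))
```

Because the noise is added to the gene, a non-zero `mu` shifts mutated genes in one direction on average, and nothing keeps the result inside the range the gene was initialized in.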

## The main algorithm

In [17]:
from deap import algorithms

# Genetic Algorithm constants:
POPULATION_SIZE = 5
P_CROSSOVER = 0.9  # probability for crossover
P_MUTATION = 0.5   # probability for mutating an individual
MAX_GENERATIONS = 10
HALL_OF_FAME_SIZE = 5

# create initial population (generation 0):
population = toolbox.populationCreator(n=POPULATION_SIZE)

# prepare the statistics object:
stats = tools.Statistics(lambda ind: ind.fitness.values)
stats.register("max", np.max)
stats.register("avg", np.mean)

# define the hall-of-fame object:
hof = tools.HallOfFame(HALL_OF_FAME_SIZE)

# perform the Genetic Algorithm flow with hof feature added:
population, logbook = algorithms.eaSimple(population,
                                          toolbox,
                                          cxpb=P_CROSSOVER,
                                          mutpb=P_MUTATION,
                                          ngen=MAX_GENERATIONS,
                                          stats=stats,
                                          halloffame=hof,
                                          verbose=True)

# print best solution found:
print("- Best solution is: ")
print("params = ", hof.items[0])
print("Accuracy = %1.5f" % hof.items[0].fitness.values[0])

gen	nevals	max     	avg     
0  	5     	0.990487	0.989651
1  	5     	0.990487	0.990377
2  	5     	0.990487	0.990443
3  	5     	0.990487	0.990476
4  	5     	0.990487	0.990487
5  	5     	0.990487	0.990443
- Best solution is: 
params =  [78.75844315985037, 0.18640307534680517, 15.017021730830045]
Accuracy = 0.99049
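
The best solution is reported as raw genes. Passing it through `getParams` (repeated here so the snippet is self-contained) shows the hyperparameter values that the classifier actually received for this individual.

```python
def getParams(individual):
    n_est = round(individual[0])
    crit = ["gini", "entropy"][round(individual[1])]
    mx_dep = round(individual[2])
    return n_est, crit, mx_dep

# Best individual copied from the run output above.
best = [78.75844315985037, 0.18640307534680517, 15.017021730830045]
n_estimators, criterion, max_depth = getParams(best)
print(n_estimators, criterion, max_depth)
```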
