### Visualizing Coarse to Fine
- Undertake the first part of a coarse-to-fine search: analyze the results of an initial random search over a large search space, then decide on the next logical step to make the hyperparameter search finer.

We have available:

- combinations_list - a list of the possible hyperparameter combinations the random search was undertaken on.
- results_df - a DataFrame holding each hyperparameter combination and the resulting accuracy for all 500 trials. Each hyperparameter is a column, with the hyperparameter name as the column header.
- visualize_hyperparameter() - a function that takes in a column of the DataFrame (as a string) and produces a scatter plot of this column's values compared to the accuracy scores. An example call of the function would be visualize_hyperparameter('accuracy')

In [None]:
import matplotlib.pyplot as plt

def visualize_hyperparameter(name):
    # Scatter one hyperparameter column of results_df against accuracy
    plt.clf()
    plt.scatter(results_df[name], results_df['accuracy'], c='blue')
    plt.gca().set(xlabel='{}'.format(name), ylabel='accuracy', title='Accuracy for different {}s'.format(name))
    plt.gca().set_ylim([0, 100])
    plt.show()

In [None]:
# Confirm the size of the combinations_list
print(len(combinations_list))

# Sort the results_df by accuracy and print the top 10 rows
print(results_df.sort_values(by='accuracy', ascending=False).head(10))

# Confirm which hyperparameters were used in this search
print(results_df.columns)

# Call visualize_hyperparameter() with each hyperparameter in turn
visualize_hyperparameter('max_depth')
visualize_hyperparameter('min_samples_leaf')
visualize_hyperparameter('learn_rate')
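
The scatter plots can be complemented with a numeric summary. The sketch below is one possible approach and is not part of the original exercise: it splits a hyperparameter into equal-width bins and compares the mean accuracy in each bin. `toy_results` and `mean_accuracy_by_bin` are hypothetical names, and the data is made up purely for illustration.

```python
import pandas as pd

# Hypothetical stand-in for results_df: accuracy is higher at low learn_rate
toy_results = pd.DataFrame({
    "learn_rate": [0.1, 0.3, 0.6, 0.9, 1.2, 1.8, 2.4, 2.9],
    "accuracy":   [91, 93, 92, 90, 74, 60, 55, 52],
})

def mean_accuracy_by_bin(df, name, bins=3):
    """Mean accuracy within equal-width bins of one hyperparameter column."""
    binned = pd.cut(df[name], bins=bins)
    return df.groupby(binned, observed=True)["accuracy"].mean()

summary = mean_accuracy_by_bin(toy_results, "learn_rate")
best_interval = summary.idxmax()  # the bin with the highest mean accuracy

print(summary)
print("Best-performing range:", best_interval)
```

The bin with the highest mean accuracy suggests where a finer follow-up search could concentrate. The number of bins is an arbitrary choice and worth varying.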

### Coarse to Fine Iterations
- Now visualize the first random search, construct a tighter grid and check the results. We will have available:

- results_df - a DataFrame that has the hyperparameter combination and the resulting accuracy of all 500 trials. Only the hyperparameters that had the strongest visualizations are included (max_depth and learn_rate)
- visualize_first() - This function takes no arguments but will visualize each of our hyperparameters against accuracy for our first random search.


In [2]:
def visualize_first():
    for name in results_df.columns[0:2]:
        plt.clf()
        plt.scatter(results_df[name], results_df['accuracy'], c='blue')
        plt.gca().set(xlabel='{}'.format(name), ylabel='accuracy', title='Accuracy for different {}s'.format(name))
        plt.gca().set_ylim([0, 100])
        # Red line marks the upper bound of the better-performing region:
        # 20 for max_depth, 1 for learn_rate
        x_line = 20
        if name == "learn_rate":
            x_line = 1
        # Draw the line and show the plot for every hyperparameter, not only learn_rate
        plt.axvline(x=x_line, color="red", linewidth=4)
        plt.show()

- Use the visualize_first() function to check which values of max_depth and learn_rate tend to perform better. A red line is drawn on each plot to make the cut-off explicit.
- Now create a narrower grid search, testing max_depth values between 1 and 20 and 50 learning rates between 0.001 and 1.

In [3]:
def visualize_second():
    # Same plots as visualize_first(), but for the second (narrower) search in results_df2
    for name in results_df2.columns[0:2]:
        plt.clf()
        plt.scatter(results_df2[name], results_df2['accuracy'], c='blue')
        plt.gca().set(xlabel='{}'.format(name), ylabel='accuracy', title='Accuracy for different {}s'.format(name))
        plt.gca().set_ylim([0, 100])
        plt.show()

In [None]:
# Use the provided function to visualize the first results
visualize_first()

# Create the narrower lists of values for each hyperparameter
max_depth_list = list(range(1, 21))
learn_rate_list = np.linspace(0.001, 1, 50)

# Call the function to visualize the second results
# (assumes results_df2 already holds the results of the second search)
visualize_second()
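
The two lists above still need to be combined into a grid before models can be fitted. A minimal sketch of that step, assuming a full Cartesian product is wanted (fitting the models and building `results_df2` is not shown):

```python
from itertools import product

import numpy as np

max_depth_list = list(range(1, 21))          # 20 depths: 1..20
learn_rate_list = np.linspace(0.001, 1, 50)  # 50 evenly spaced learning rates

# Every (max_depth, learn_rate) pair in the narrower search space
combinations_list = [list(combo) for combo in product(max_depth_list, learn_rate_list)]

print(len(combinations_list))  # 20 * 50 = 1000
```

The 1000 combinations match the number of points plotted by visualize_second().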

### Bayes Rule in Python
- Undertake a practical example of setting up Bayes' formula, obtaining new evidence and updating our 'beliefs' to get a more accurate result. The example concerns the probability that someone will close their account for our online software product.
- These are the probabilities we know:

- 7% (0.07) of people are likely to close their account next month
- 15% (0.15) of people with accounts are unhappy with our product (we don't know who though!)
- 35% (0.35) of people who are likely to close their account are unhappy with our product


- Assign the different probabilities (as decimals) to variables: p_unhappy is the probability that someone is unhappy, and p_unhappy_close is the probability that someone is unhappy with the product, given that they are going to close their account.
- Assign the probability that someone will close their account next month to the variable p_close as a decimal.
- We interview one of our customers and discover they are unhappy. What is the probability they will close their account, now that we know this evidence? Assign the result to p_close_unhappy and print it.
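
Written out with the three known probabilities, Bayes' rule for this question is:

```latex
P(\text{close} \mid \text{unhappy})
  = \frac{P(\text{unhappy} \mid \text{close}) \, P(\text{close})}{P(\text{unhappy})}
  = \frac{0.35 \times 0.07}{0.15}
  \approx 0.163
```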

In [1]:
# Assign probabilities to variables
p_unhappy = 0.15
p_unhappy_close = 0.35

# Probability someone will close
p_close = 0.07

# Probability an unhappy person will close (Bayes' rule)
p_close_unhappy = (p_unhappy_close * p_close) / p_unhappy
print(p_close_unhappy)

0.16333333333333336


- We were able to frame this problem in a Bayesian way and update our beliefs using new evidence. Given that a customer is unhappy, there is a 16.3% chance they will close their account, up from the 7% prior.
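
As a sanity check on these numbers, the sketch below wraps the update in a small helper and uses the law of total probability to derive the one quantity the exercise does not state: the share of customers who are unhappy but not closing. The names `bayes_update`, `p_stay` and `p_unhappy_stay` are illustrative and do not come from the exercise.

```python
import math

def bayes_update(p_evidence_given_h, p_h, p_evidence):
    """Return P(H | evidence) using Bayes' rule."""
    return p_evidence_given_h * p_h / p_evidence

p_close = 0.07
p_unhappy = 0.15
p_unhappy_close = 0.35

p_close_unhappy = bayes_update(p_unhappy_close, p_close, p_unhappy)

# Law of total probability:
# P(unhappy) = P(unhappy | close) P(close) + P(unhappy | stay) P(stay)
# Solve it for P(unhappy | stay).
p_stay = 1 - p_close
p_unhappy_stay = (p_unhappy - p_unhappy_close * p_close) / p_stay

# Rebuild P(unhappy) from its two parts and confirm it matches the given 0.15
p_unhappy_rebuilt = p_unhappy_close * p_close + p_unhappy_stay * p_stay
assert math.isclose(p_unhappy_rebuilt, p_unhappy)

print(round(p_close_unhappy, 4))
print(round(p_unhappy_stay, 4))
```

The derived value is a valid probability (roughly 13.5%), so the three given figures are mutually consistent.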

### Bayesian Hyperparameter Tuning with Hyperopt
- Set up and run a Bayesian hyperparameter optimization process using the Hyperopt package. We will set up the domain (similar to setting up the grid for a grid search), then the objective function. Finally, we will run the optimizer for 20 iterations.


- We will need to set up the domain using these values:


- `max_depth` using quniform distribution (between 2 and 10, increasing by 2)
- `learning_rate` using uniform distribution (0.001 to 0.9)
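
Hyperopt's documentation describes `hp.quniform(label, low, high, q)` as returning a value like `round(uniform(low, high) / q) * q`. The sketch below mimics that formula with the standard library's `random` module (it does not call Hyperopt's own sampler) to show which `max_depth` values the domain can produce. `quniform_like` is a hypothetical helper.

```python
import random

def quniform_like(rng, low, high, q):
    """Mimic the documented quniform formula: round(uniform(low, high) / q) * q."""
    return float(round(rng.uniform(low, high) / q) * q)

rng = random.Random(0)  # arbitrary seed, for repeatability only
samples = [quniform_like(rng, 2, 10, 2) for _ in range(1000)]

# The distinct values are the even numbers from 2 to 10, stored as floats
print(sorted(set(samples)))

# scikit-learn expects an integer max_depth, hence the int() cast in the objective
print(int(samples[0]))
```

Because the sampled values are floats, the objective function has to cast `max_depth` to `int` before passing it to the classifier.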


In [22]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from hyperopt import hp, fmin, tpe
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

data = load_iris()
X = data['data']
y = data['target']
X_train, X_test, y_train, y_test = train_test_split(X, y)

In [23]:
# Set up space dictionary with specified hyperparameters
space = {
    'max_depth': hp.quniform('max_depth', 2, 10, 2),
    'learning_rate': hp.uniform('learning_rate', 0.001, 0.9),
}

# Set up objective function
def objective(params):
    # quniform samples floats (e.g. 6.0), so cast max_depth to int for scikit-learn
    params = {'max_depth': int(params['max_depth']), 'learning_rate': params['learning_rate']}
    gbm_clf = GradientBoostingClassifier(n_estimators=100, **params)
    # Mean 2-fold cross-validated accuracy on the training data
    mean_score = cross_val_score(gbm_clf, X_train, y_train, scoring='accuracy', cv=2, n_jobs=4).mean()
    # fmin minimizes, so return 1 - accuracy as the loss
    loss = 1 - mean_score
    return loss

# Run the algorithm
# Note: newer Hyperopt releases expect a NumPy Generator here instead,
# e.g. rstate=np.random.default_rng(42)
best = fmin(fn=objective, space=space, max_evals=20, rstate=np.random.RandomState(42), algo=tpe.suggest)
print(best)

100%|████████████████████████████████████████████████| 20/20 [00:07<00:00,  2.56trial/s, best loss: 0.0535714285714286]
{'learning_rate': 0.7495361970511488, 'max_depth': 6.0}
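
The dictionary returned by `fmin` holds raw sampled values, so `max_depth` is a float. A minimal sketch of one way to turn the printed result into a final model, assuming the same iris data; the `random_state=0` split is an arbitrary choice made here for repeatability and was not used in the run above:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Values copied from the printed `best` dictionary above
best = {'learning_rate': 0.7495361970511488, 'max_depth': 6.0}

# max_depth comes back as a float, so cast it before building the model
final_params = {
    'max_depth': int(best['max_depth']),
    'learning_rate': best['learning_rate'],
}

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Refit on the full training split and evaluate once on the held-out test split
final_clf = GradientBoostingClassifier(n_estimators=100, **final_params)
final_clf.fit(X_train, y_train)
test_accuracy = final_clf.score(X_test, y_test)

print(final_params)
print(test_accuracy)
```

Scoring on the test split gives an estimate that is independent of the cross-validation folds used during the search.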


### Genetic Hyperparameter Tuning with TPOT
- Work through an example of genetic hyperparameter tuning. TPOT is a very powerful library with a lot of features, and we are only scratching the surface here.
- In real use, TPOT is designed to run for many hours to find the best model. We would use a much larger population and offspring size, as well as hundreds more generations, to find a good model.
- We will create the estimator, fit the estimator to the training data and then score this on the test data.

- For this example we wish to use:

- 3 generations
- a population size of 4
- 3 offspring in each generation
- accuracy for scoring

- A random_state of 2 has been set for consistency of results.
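
These settings determine how much work TPOT does. Its documentation gives the total number of pipelines evaluated as `population_size + generations * offspring_size`. The sketch below applies that formula to the settings above and to a larger configuration whose values are chosen here only for comparison; `tpot_pipelines_evaluated` is a hypothetical helper, not a TPOT function.

```python
def tpot_pipelines_evaluated(population_size, generations, offspring_size):
    """Initial population plus the offspring produced in each generation."""
    return population_size + generations * offspring_size

# Settings used in this example
small_run = tpot_pipelines_evaluated(population_size=4, generations=3, offspring_size=3)
print(small_run)  # 4 + 3 * 3 = 13

# A larger, purely illustrative configuration
large_run = tpot_pipelines_evaluated(population_size=100, generations=100, offspring_size=100)
print(large_run)  # 100 + 100 * 100 = 10100
```

Each pipeline is also cross-validated, so the number of model fits is the pipeline count multiplied by the number of folds (`cv`).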


In [27]:
from tpot import TPOTClassifier

In [28]:
# Assign the values outlined to the inputs
number_generations = 3
population_size = 4
offspring_size = 3
scoring_function = 'accuracy'

# Create the tpot classifier
tpot_clf = TPOTClassifier(generations=number_generations, population_size=population_size,
                          offspring_size=offspring_size, scoring=scoring_function,
                          verbosity=2, random_state=2, cv=2)

# Fit the classifier to the training data
tpot_clf.fit(X_train, y_train)

# Score on the test set
print(tpot_clf.score(X_test, y_test))


Generation 1 - Current best internal CV score: 0.9642857142857143
Generation 2 - Current best internal CV score: 0.9642857142857143
Generation 3 - Current best internal CV score: 0.9642857142857143
Best pipeline: GaussianNB(VarianceThreshold(input_matrix, threshold=0.0001))
0.9736842105263158


- The output shows the best internal cross-validation score after each generation, followed by the best pipeline found and its accuracy on the test set. This is a useful first example of using TPOT for automated hyperparameter tuning.