# Visualizing Coarse to Fine

You're going to undertake the first part of a Coarse to Fine search. This involves analyzing the results of an initial random search that took place over a large search space, then deciding what would be the next logical step to make your hyperparameter search finer.

In [16]:
from itertools import product
import numpy as np
a = list(range(1,6))
b = list(range(3,17))
c = np.linspace(0.01, 1.33, 143)
combinations_list = list(product(a,b,c))

In [21]:
import matplotlib.pyplot as plt

# Confirm the size of the combinations_list
print(len(combinations_list))


10010
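The grid above holds 10,010 combinations, while the first random search analyzed in the next section covers 500 trials. One way such a coarse search could be drawn from the grid is sketched below. The use of `random.sample`, the seed, and the name `coarse_sample` are assumptions for illustration, not the code that produced `results_df`.

```python
import random
from itertools import product

import numpy as np

# Rebuild the same coarse grid as above so the sketch is self-contained
a = list(range(1, 6))
b = list(range(3, 17))
c = np.linspace(0.01, 1.33, 143)
combinations_list = list(product(a, b, c))

# Draw 500 distinct combinations without replacement (the seed is arbitrary)
random.seed(0)
coarse_sample = random.sample(combinations_list, 500)

print(len(combinations_list), len(coarse_sample))
```

Sampling without replacement keeps every trial unique, so no model is trained twice on the same settings.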


In [19]:
# def visualize_hyperparameter(name):
#   plt.clf()
#   plt.scatter(results_df[name],results_df['accuracy'], c=['blue']*500)
#   plt.gca().set(xlabel='{}'.format(name), ylabel='accuracy', title='Accuracy for different {}s'.format(name))
#   plt.gca().set_ylim([0,100])
#   plt.show()

In [18]:
# # Confirm the size of the combinations_list
# print(len(combinations_list))

# # Sort the results_df by accuracy and print the top 10 rows
# print(results_df.sort_values(by='accuracy', ascending=False).head(10))

# # Confirm which hyperparameters were used in this search
# print(results_df.columns)

# # Call visualize_hyperparameter() with each hyperparameter in turn
# visualize_hyperparameter("max_depth")
# visualize_hyperparameter("min_samples_leaf")
# visualize_hyperparameter("learn_rate")

# Coarse to Fine Iterations

You will now visualize the first random search undertaken, construct a tighter grid and check the results. You will have available:

- `results_df` - a DataFrame that has the hyperparameter combination and the resulting accuracy of all 500 trials. Only the hyperparameters that had the strongest visualizations from the previous exercise are included (`max_depth` and `learn_rate`).
- `visualize_first()` - This function takes no arguments but will visualize each of your hyperparameters against accuracy for your first random search.

In [23]:
# def visualize_first():
#   for name in results_df.columns[0:2]:
#     plt.clf()
#     plt.scatter(results_df[name],results_df['accuracy'], c=['blue']*500)
#     plt.gca().set(xlabel='{}'.format(name), ylabel='accuracy', title='Accuracy for different {}s'.format(name))
#     plt.gca().set_ylim([0,100])
#     x_line = 20
#     if name == "learn_rate":
#       x_line = 1
#     plt.axvline(x=x_line, color="red", linewidth=4)
#     plt.show()

In [26]:
# def visualize_second():
#   for name in results_df2.columns[0:2]:
#     plt.clf()
#     plt.scatter(results_df2[name],results_df2['accuracy'], c=['blue']*1000)
#     plt.gca().set(xlabel='{}'.format(name), ylabel='accuracy', title='Accuracy for different {}s'.format(name))
#     plt.gca().set_ylim([0,100])
#     plt.show()

In [24]:
# # Use the provided function to visualize the first results
# visualize_first()

In [25]:
# # Create some combinations lists & combine
# max_depth_list = list(range(1, 21))
# learn_rate_list = np.linspace(0.001, 1, 50)
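The cell above stops short of the "combine" step named in its comment. A minimal sketch of that step follows, assuming the two lists are crossed with `itertools.product` as in the first cell. The name `finer_combinations` is hypothetical, and how `results_df2` was actually produced is not shown in this notebook.

```python
from itertools import product

import numpy as np

# Finer ranges, bounded by the red lines drawn in visualize_first()
# (max_depth up to 20, learn_rate up to 1)
max_depth_list = list(range(1, 21))
learn_rate_list = np.linspace(0.001, 1, 50)

# Cross the two lists to get every (max_depth, learn_rate) pair
finer_combinations = [list(combo) for combo in product(max_depth_list, learn_rate_list)]

print(len(finer_combinations))
```

The 20 depths and 50 learning rates give 1,000 combinations, which is consistent with the 1,000 points plotted by `visualize_second()`.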

In [27]:

# # Call the function to visualize the second results
# visualize_second()

# Bayes Rule in Python

In this exercise you will work through a practical example of setting up Bayes' formula, obtaining new evidence, and updating your 'beliefs' to get a more accurate result. The example concerns the likelihood that someone will close their account for your online software product.

These are the probabilities we know:

- 7% (0.07) of people are likely to close their account next month
- 15% (0.15) of people with accounts are unhappy with your product (you don't know who though!)
- 35% (0.35) of people who are likely to close their account are unhappy with your product
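Written out with the numbers above, Bayes' rule gives the probability that an unhappy customer closes their account. The event names "close" and "unhappy" are shorthand introduced here for the quantities in the list.

```latex
\[
P(\text{close} \mid \text{unhappy})
  = \frac{P(\text{unhappy} \mid \text{close}) \, P(\text{close})}{P(\text{unhappy})}
  = \frac{0.35 \times 0.07}{0.15}
  \approx 0.163
\]
```

Learning that a customer is unhappy therefore raises the estimated chance of closure from 7% to roughly 16%.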

In [28]:
# Assign probabilities to variables
p_unhappy = 0.15        # P(unhappy)
p_unhappy_close = 0.35  # P(unhappy | close)

# Probability someone will close
p_close = 0.07          # P(close)

# Probability an unhappy person will close, P(close | unhappy), via Bayes' rule
p_close_unhappy = (p_unhappy_close * p_close) / p_unhappy
print(p_close_unhappy)

0.16333333333333336


# Bayesian Hyperparameter tuning with Hyperopt

In this example you will set up and run a Bayesian hyperparameter optimization process using the package Hyperopt (already imported as hp for you). You will set up the domain (which is similar to setting up the grid for a grid search), then set up the objective function. Finally, you will run the optimizer over 20 iterations.

In [29]:
import pandas as pd
df = pd.read_csv("dataset/credit-card-full.csv")
# df.head()
# df.select_dtypes(include="int")
# df['default payment next month']


In [30]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Separate the target column from the features
y = df['default payment next month']
X = df.drop('default payment next month', axis=1)

# Hold out 20% of the rows as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)

In [40]:
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from hyperopt import fmin, tpe, hp, Trials
# Set up space dictionary with specified hyperparameters
space = {
    'max_depth': hp.quniform('max_depth', 2, 10, 3),
    'learning_rate': hp.uniform('learning_rate', 0.1, 0.5),
}

# Set up objective function
def objective(params):
    # hp.quniform yields floats, so cast max_depth to int for scikit-learn
    params = {'max_depth': int(params['max_depth']), 'learning_rate': params['learning_rate']}
    gbm_clf = GradientBoostingClassifier(n_estimators=100, **params)
    # Mean 2-fold cross-validated accuracy on the training data
    cv_score = cross_val_score(gbm_clf, X_train, y_train, scoring='accuracy', cv=2, n_jobs=4).mean()
    # fmin minimizes, so return 1 - accuracy as the loss
    return 1 - cv_score

# Run the algorithm
best = fmin(fn=objective, space=space, max_evals=20, rstate=np.random.default_rng(42), algo=tpe.suggest)
print(best)

100%|██████████| 20/20 [09:23<00:00, 28.16s/trial, best loss: 0.17829166666666674]
{'learning_rate': 0.11650414294836509, 'max_depth': 3.0}
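`hp.quniform(label, low, high, q)` is documented as returning `round(uniform(low, high) / q) * q`, which is why `best` reports `max_depth` as the float `3.0`. The sketch below imitates that rule with the standard library only. It does not call Hyperopt, and `quniform_like` is a hypothetical helper. It shows which depths the space above can produce, then casts the reported values to the types `GradientBoostingClassifier` expects.

```python
import random


def quniform_like(rng, low, high, q):
    """Imitate the documented quniform rule: round(uniform(low, high) / q) * q."""
    return round(rng.uniform(low, high) / q) * q


# Sample many times to see which max_depth values a (2, 10, q=3) space can yield
rng = random.Random(42)
depths = {quniform_like(rng, 2, 10, 3) for _ in range(10_000)}
print(sorted(depths))

# Values copied from the printed result above
best = {'learning_rate': 0.11650414294836509, 'max_depth': 3.0}

# Cast max_depth to int before passing the parameters to a final model
final_params = {
    'max_depth': int(best['max_depth']),
    'learning_rate': best['learning_rate'],
}
print(final_params)
```

With a step of 3 over the range 2 to 10, only depths 3, 6, and 9 are reachable, so a smaller `q` would be needed to explore intermediate depths.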


# Genetic Hyperparameter Tuning with TPOT

You're going to undertake a simple example of genetic hyperparameter tuning. TPOT is a very powerful library that has a lot of features. You're just scratching the surface in this lesson, but you are highly encouraged to explore in your own time.

This is a very small example. In real life, TPOT is designed to be run for many hours to find the best model. You would have a much larger population and offspring size as well as hundreds more generations to find a good model.

You will create the estimator, fit the estimator to the training data and then score this on the test data.
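To put "very small" in numbers: the TPOT documentation states that a run evaluates `population_size + generations * offspring_size` pipelines in total. The helper below (a hypothetical name, not part of TPOT) applies that formula to the settings used in this exercise and, for comparison, to a run with 100 generations and a population and offspring size of 100, which are assumed here to be the classic TPOT defaults.

```python
def pipelines_evaluated(generations, population_size, offspring_size):
    """Total pipelines evaluated: the initial population plus offspring per generation."""
    return population_size + generations * offspring_size


# Settings used in this exercise
small_run = pipelines_evaluated(generations=3, population_size=4, offspring_size=3)

# A much larger run (assumed classic defaults)
large_run = pipelines_evaluated(generations=100, population_size=100, offspring_size=100)

print(small_run, large_run)
```

The exercise evaluates 13 pipelines against 10,100 for the larger run, which gives a sense of how little of the search space this example covers.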

In [34]:
from tpot import TPOTClassifier
# Assign the values outlined to the inputs
number_generations = 3
population_size = 4
offspring_size = 3
scoring_function = 'accuracy'

# Create the tpot classifier
tpot_clf = TPOTClassifier(generations=number_generations, population_size=population_size,
                          offspring_size=offspring_size, scoring=scoring_function,
                          verbosity=2, random_state=2, cv=2)

# Fit the classifier to the training data
tpot_clf.fit(X_train, y_train)

# Score on the test set
print(tpot_clf.score(X_test, y_test))

                                                                           
Generation 1 - Current best internal CV score: 0.8185
                                                                            
Generation 2 - Current best internal CV score: 0.8185
                                                                            
Generation 3 - Current best internal CV score: 0.8188333333333333
                                                                            
Best pipeline: RandomForestClassifier(input_matrix, bootstrap=False, criterion=entropy, max_features=0.6000000000000001, min_samples_leaf=17, min_samples_split=11, n_estimators=100)
0.8191666666666667


# Analysing TPOT's stability

You will now see the random nature of TPOT by constructing the classifier with different random states and comparing which model the algorithm finds to be best. This helps show that TPOT is quite unstable when it is not run for a reasonable amount of time.

In [35]:
# Create the tpot classifier 
tpot_clf = TPOTClassifier(generations=2, population_size=4, offspring_size=3, scoring='accuracy', cv=2,
                          verbosity=2, random_state=42)

# Fit the classifier to the training data
tpot_clf.fit(X_train, y_train)

# Score on the test set
print(tpot_clf.score(X_test, y_test))

                                                                           
Generation 1 - Current best internal CV score: 0.8200416666666667
                                                                            
Generation 2 - Current best internal CV score: 0.8200416666666667
                                                                            
Best pipeline: ExtraTreesClassifier(input_matrix, bootstrap=True, criterion=entropy, max_features=0.6000000000000001, min_samples_leaf=15, min_samples_split=10, n_estimators=100)
0.8198333333333333


In [36]:
# Create the tpot classifier 
tpot_clf = TPOTClassifier(generations=2, population_size=4, offspring_size=3, scoring='accuracy', cv=2,
                          verbosity=2, random_state=122)

# Fit the classifier to the training data
tpot_clf.fit(X_train, y_train)

# Score on the test set
print(tpot_clf.score(X_test, y_test))

                                                                           
Generation 1 - Current best internal CV score: 0.8113333333333332
                                                                            
Generation 2 - Current best internal CV score: 0.8113333333333332
                                                                            
Best pipeline: LogisticRegression(RobustScaler(input_matrix), C=0.5, dual=False, penalty=l2)
0.8023333333333333


In [37]:
# Create the tpot classifier 
tpot_clf = TPOTClassifier(generations=2, population_size=4, offspring_size=3, scoring='accuracy', cv=2,
                          verbosity=2, random_state=99)

# Fit the classifier to the training data
tpot_clf.fit(X_train, y_train)

# Score on the test set
print(tpot_clf.score(X_test, y_test))

                                                                           
Generation 1 - Current best internal CV score: 0.8160000000000001
                                                                            
Generation 2 - Current best internal CV score: 0.8189583333333333
                                                                            
Best pipeline: GradientBoostingClassifier(RFE(input_matrix, criterion=entropy, max_features=0.8500000000000001, n_estimators=100, step=0.6000000000000001), learning_rate=0.5, max_depth=1, max_features=0.7000000000000001, min_samples_leaf=10, min_samples_split=17, n_estimators=100, subsample=0.6000000000000001)
0.8121666666666667
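The three runs above can be compared side by side. The sketch below copies the winning estimators and scores from the printed output (rounded to four decimals) and reports how many distinct estimators were selected and the spread in test accuracy. The variable names are illustrative.

```python
# (random_state, final estimator of the best pipeline, best internal CV score, test accuracy)
runs = [
    (42, "ExtraTreesClassifier", 0.8200, 0.8198),
    (122, "LogisticRegression", 0.8113, 0.8023),
    (99, "GradientBoostingClassifier", 0.8190, 0.8122),
]

# Which estimators were chosen, and how far apart are the test scores?
distinct_models = {model for _, model, _, _ in runs}
test_scores = [test for *_, test in runs]
spread = max(test_scores) - min(test_scores)

print(f"{len(distinct_models)} distinct estimators across {len(runs)} seeds")
print(f"test accuracy spread: {spread:.4f}")
```

Three seeds yield three different model families, which is the instability this section describes for runs this short.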
