# Hyperparameter Tuning with Optuna
Purpose: Select the best classifier and its hyperparameters for predicting customer churn

Data: [Telco Churn Data](https://www.kaggle.com/datasets/blastchar/telco-customer-churn)

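Metric: Recall (intended). The tuning and test cells below call `model.score`, which returns mean accuracy for scikit-learn classifiers, so the values they report are accuracies.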

In [11]:
import pandas as pd
import sklearn
import optuna

# Optuna

Option 1: AdaBoost

Option 2: LogisticRegression

---

In [12]:
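df = pd.read_csv('preprocessed.csv')  # path to the preprocessed dataset

y = df.pop('Churn')  # target column; pop() also removes it from df
X = df               # the remaining columns are the features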

In [15]:
%%time

from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Split once, outside the objective, so every trial is scored on the same
# hold-out set and the split stays available to later cells.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

def objective(trial):
    # Select the better of the two candidate models
    classifier_name = trial.suggest_categorical("classifier", ["Logistic", "AdaBoost"])
    if classifier_name == "Logistic":
        # Logistic regression: penalty type and inverse regularization strength C
        penalty = trial.suggest_categorical('penalty', ['l2', 'l1'])
        # lbfgs does not support the l1 penalty, so use saga for l1
        solver = 'saga' if penalty == 'l1' else 'lbfgs'
        # suggest_float replaces the deprecated suggest_uniform
        regularization = trial.suggest_float('logistic-regularization', 0.01, 10)
        model = LogisticRegression(penalty=penalty,
                                   C=regularization,
                                   solver=solver,
                                   random_state=0)
    else:
        # AdaBoost: number of weak learners and learning rate
        ada_n_estimators = trial.suggest_int("n_estimators", 10, 500, step=10)
        ada_learning_rate = trial.suggest_float("learning_rate", 0.1, 3)

        model = AdaBoostClassifier(
            n_estimators=ada_n_estimators,
            learning_rate=ada_learning_rate,
            random_state=0
        )

    model.fit(X_train, y_train)
    # NOTE: score() returns mean accuracy for classifiers, not recall
    accuracy = model.score(X_test, y_test)
    return accuracy


# Create a study object and maximize the objective function.
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=200)

print("Number of finished trials: {}".format(len(study.trials)))

print("Best trial:")
trial = study.best_trial

print("  Value: {}".format(trial.value))

print("  Params: ")
for key, value in trial.params.items():
    print("    {}: {}".format(key, value))

[I 2022-09-06 20:24:14,715] A new study created in memory with name: no-name-365e0396-9684-4f56-82ca-80a980bad6b9
[I 2022-09-06 20:24:15,247] Trial 0 finished with value: 0.2601279317697228 and parameters: {'classifier': 'AdaBoost', 'n_estimators': 350, 'learning_rate': 2.6820523278655486}. Best is trial 0 with value: 0.2601279317697228.
[I 2022-09-06 20:24:16,446] Trial 1 finished with value: 0.8002842928216063 and parameters: {'classifier': 'AdaBoost', 'n_estimators': 320, 'learning_rate': 1.4970886285073715}. Best is trial 1 with value: 0.8002842928216063.
[I 2022-09-06 20:24:16,672] Trial 2 finished with value: 0.7981520966595593 and parameters: {'classifier': 'AdaBoost', 'n_estimators': 60, 'learning_rate': 1.737100491636285}. Best is trial 1 with value: 0.8002842928216063.
[I 2022-09-06 20:24:16,774] Trial 3 finished with value: 0.7398720682302772 and parameters: {'classifier': 'Logistic', 'penalty': 'l1', 'logistic-reg

[I 2022-09-06 20:24:28,280] Trial 22 finished with value: 0.806680881307747 and parameters: {'classifier': 'AdaBoost', 'n_estimators': 110, 'learning_rate': 0.18839990969001152}. Best is trial 21 with value: 0.806680881307747.
[I 2022-09-06 20:24:28,326] Trial 23 finished with value: 0.8137882018479033 and parameters: {'classifier': 'AdaBoost', 'n_estimators': 10, 'learning_rate': 0.555344080589061}. Best is trial 23 with value: 0.8137882018479033.
[I 2022-09-06 20:24:28,372] Trial 24 finished with value: 0.8137882018479033 and parameters: {'classifier': 'AdaBoost', 'n_estimators': 10, 'learning_rate': 0.5518318016037921}. Best is trial 23 with value: 0.8137882018479033.
[I 2022-09-06 20:24:28,455] Trial 25 finished with value: 0.7974413646055437 and parameters: {'classifier': 'AdaBoost', 'n_estimators': 20, 'learning_rate': 0.6416417273836444}. Best is trial 23 with value: 0.8137882018479033.
STOP: TOTAL NO. of ITERATIONS REACHED LIM

[I 2022-09-06 20:24:32,873] Trial 48 finished with value: 0.7739872068230277 and parameters: {'classifier': 'AdaBoost', 'n_estimators': 10, 'learning_rate': 1.5950159801270372}. Best is trial 23 with value: 0.8137882018479033.
[I 2022-09-06 20:24:34,115] Trial 49 finished with value: 0.7995735607675906 and parameters: {'classifier': 'AdaBoost', 'n_estimators': 320, 'learning_rate': 0.39890582787312556}. Best is trial 23 with value: 0.8137882018479033.
[I 2022-09-06 20:24:34,313] Trial 50 finished with value: 0.8024164889836531 and parameters: {'classifier': 'AdaBoost', 'n_estimators': 50, 'learning_rate': 0.8811385664908157}. Best is trial 23 with value: 0.8137882018479033.
[I 2022-09-06 20:24:34,624] Trial 51 finished with value: 0.806680881307747 and parameters: {'classifier': 'AdaBoost', 'n_estimators': 80, 'learning_rate': 0.34382822070239466}. Best is trial 23 with value: 0.8137882018479033.
[I 2022-09-06 20:24:35,164] T

[I 2022-09-06 20:24:42,453] Trial 77 finished with value: 0.7903340440653873 and parameters: {'classifier': 'Logistic', 'penalty': 'l2', 'logistic-regularization': 2.4337697273176335}. Best is trial 23 with value: 0.8137882018479033.
[I 2022-09-06 20:24:42,500] Trial 78 finished with value: 0.8052594171997157 and parameters: {'classifier': 'AdaBoost', 'n_estimators': 10, 'learning_rate': 0.5056083613266483}. Best is trial 23 with value: 0.8137882018479033.
[I 2022-09-06 20:24:42,955] Trial 79 finished with value: 0.8059701492537313 and parameters: {'classifier': 'AdaBoost', 'n_estimators': 120, 'learning_rate': 0.20999445073531886}. Best is trial 23 with value: 0.8137882018479033.
[I 2022-09-06 20:24:44,780] Trial 80 finished with value: 0.7981520966595593 and parameters: {'classifier': 'AdaBoost', 'n_estimators': 490, 'learning_rate': 0.6363931733956808}. Best is trial 23 with value: 0.8137882018479033.
[I 2022-09-06 20:24:44,90

[I 2022-09-06 20:24:50,876] Trial 109 finished with value: 0.8059701492537313 and parameters: {'classifier': 'AdaBoost', 'n_estimators': 20, 'learning_rate': 0.44101928726051476}. Best is trial 23 with value: 0.8137882018479033.
[I 2022-09-06 20:24:51,038] Trial 110 finished with value: 0.8009950248756219 and parameters: {'classifier': 'AdaBoost', 'n_estimators': 40, 'learning_rate': 0.16470916247413295}. Best is trial 23 with value: 0.8137882018479033.
[I 2022-09-06 20:24:51,160] Trial 111 finished with value: 0.8073916133617626 and parameters: {'classifier': 'AdaBoost', 'n_estimators': 30, 'learning_rate': 0.5644761482016566}. Best is trial 23 with value: 0.8137882018479033.
[I 2022-09-06 20:24:51,426] Trial 112 finished with value: 0.8045486851457001 and parameters: {'classifier': 'AdaBoost', 'n_estimators': 70, 'learning_rate': 0.34733539259857593}. Best is trial 23 with value: 0.8137882018479033.
[I 2022-09-06 20:24:51,472]

[I 2022-09-06 20:24:55,603] Trial 140 finished with value: 0.7945984363894811 and parameters: {'classifier': 'AdaBoost', 'n_estimators': 10, 'learning_rate': 0.7100757857705924}. Best is trial 23 with value: 0.8137882018479033.
[I 2022-09-06 20:24:55,653] Trial 141 finished with value: 0.8017057569296375 and parameters: {'classifier': 'AdaBoost', 'n_estimators': 10, 'learning_rate': 0.6366340980881569}. Best is trial 23 with value: 0.8137882018479033.
[I 2022-09-06 20:24:55,741] Trial 142 finished with value: 0.8059701492537313 and parameters: {'classifier': 'AdaBoost', 'n_estimators': 20, 'learning_rate': 0.5330743800814837}. Best is trial 23 with value: 0.8137882018479033.
[I 2022-09-06 20:24:55,870] Trial 143 finished with value: 0.8052594171997157 and parameters: {'classifier': 'AdaBoost', 'n_estimators': 30, 'learning_rate': 0.43379237528741166}. Best is trial 23 with value: 0.8137882018479033.
[I 2022-09-06 20:24:55,962]

[I 2022-09-06 20:25:01,602] Trial 173 finished with value: 0.7945984363894811 and parameters: {'classifier': 'AdaBoost', 'n_estimators': 10, 'learning_rate': 0.6719038855753406}. Best is trial 23 with value: 0.8137882018479033.
[I 2022-09-06 20:25:01,654] Trial 174 finished with value: 0.8095238095238095 and parameters: {'classifier': 'AdaBoost', 'n_estimators': 10, 'learning_rate': 0.6044886019302786}. Best is trial 23 with value: 0.8137882018479033.
[I 2022-09-06 20:25:01,743] Trial 175 finished with value: 0.8024164889836531 and parameters: {'classifier': 'AdaBoost', 'n_estimators': 20, 'learning_rate': 0.7235054183179682}. Best is trial 23 with value: 0.8137882018479033.
[I 2022-09-06 20:25:01,792] Trial 176 finished with value: 0.8137882018479033 and parameters: {'classifier': 'AdaBoost', 'n_estimators': 10, 'learning_rate': 0.5896560165280798}. Best is trial 23 with value: 0.8137882018479033.
[I 2022-09-06 20:25:01,842]

Number of finished trials: 200
Best trial:
  Value: 0.8137882018479033
  Params: 
    classifier: AdaBoost
    n_estimators: 10
    learning_rate: 0.555344080589061
CPU times: user 51.5 s, sys: 10.3 s, total: 1min 1s
Wall time: 48.9 s


# Test

In [17]:
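%%time

# AdaBoost with the best hyperparameters found by the study
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Recreate the hold-out split used during tuning (same test_size and random_state)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

model = AdaBoostClassifier(n_estimators=10,
                           learning_rate=0.555344080589061,
                           random_state=0)
model.fit(X_train, y_train)

# score() returns mean accuracy for classifiers, not recall
accuracy = model.score(X_test, y_test)
print(f"Accuracy is {round(accuracy, 4)}\n")

Accuracy is 0.8138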

CPU times: user 67.1 ms, sys: 3.18 ms, total: 70.3 ms
Wall time: 69.4 ms
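
The cell above reports `model.score`, which is accuracy. To report recall next to it on a hold-out set, `sklearn.metrics.recall_score` can be applied to the predictions. The sketch below uses a synthetic stand-in dataset and assumes a 0/1 target with 1 as the positive (churn) class; the names `clf`, `X_te`, and `y_te` are illustrative, and the numbers it prints do not describe the Telco data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the churn data (assumption: binary 0/1 target).
X_demo, y_demo = make_classification(
    n_samples=600, n_features=10, weights=[0.73, 0.27], random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=1
)

# Same hyperparameters as the best trial reported above.
clf = AdaBoostClassifier(n_estimators=10, learning_rate=0.555344080589061, random_state=0)
clf.fit(X_tr, y_tr)
y_pred = clf.predict(X_te)

accuracy = accuracy_score(y_te, y_pred)  # same value as clf.score(X_te, y_te)
recall = recall_score(y_te, y_pred)      # TP / (TP + FN) for the positive class (label 1)
print(f"Accuracy: {accuracy:.4f}")
print(f"Recall:   {recall:.4f}")
```

On imbalanced data the two numbers can differ considerably, because a model that misses positives can still be accurate overall, so it helps to report both.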
