# LAB | Hyperparameter Tuning

**Load the data**

This is the final step: maximizing the performance of your Spaceship Titanic model.

The data can be found here:

https://raw.githubusercontent.com/data-bootcamp-v4/data/main/spaceship_titanic.csv

Metadata

https://github.com/data-bootcamp-v4/data/blob/main/spaceship_titanic.md

So far we've been training and evaluating models with default values for hyperparameters.

Today we will perform the same feature engineering as before, then take the best-performing models you have so far and fine-tune their hyperparameters.
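
To make "default values" concrete, here is a minimal sketch that prints a few of the defaults a scikit-learn estimator starts from. The choice of `LogisticRegression` and of these three keys is an assumption made for illustration; any estimator exposes the same `get_params()` method.

```python
from sklearn.linear_model import LogisticRegression

# get_params() returns every constructor argument with its current value
defaults = LogisticRegression().get_params()

# Print a few hyperparameters that are commonly tuned
for key in ("C", "max_iter", "solver"):
    print(f"{key} = {defaults[key]!r}")
```

These are the values a model uses whenever nothing else is passed to its constructor, and they are the starting point that tuning tries to improve on.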

In [3]:
#Libraries
import pandas as pd
import numpy as np

import optuna
import optuna.visualization as vis
import time

import scipy.stats as st

import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor

from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error, root_mean_squared_error, make_scorer

from sklearn.model_selection import cross_val_score

In [4]:
spaceship = pd.read_csv("https://raw.githubusercontent.com/data-bootcamp-v4/data/main/spaceship_titanic.csv")
spaceship.head()

Unnamed: 0,PassengerId,HomePlanet,CryoSleep,Cabin,Destination,Age,VIP,RoomService,FoodCourt,ShoppingMall,Spa,VRDeck,Name,Transported
0,0001_01,Europa,False,B/0/P,TRAPPIST-1e,39.0,False,0.0,0.0,0.0,0.0,0.0,Maham Ofracculy,False
1,0002_01,Earth,False,F/0/S,TRAPPIST-1e,24.0,False,109.0,9.0,25.0,549.0,44.0,Juanna Vines,True
2,0003_01,Europa,False,A/0/S,TRAPPIST-1e,58.0,True,43.0,3576.0,0.0,6715.0,49.0,Altark Susent,False
3,0003_02,Europa,False,A/0/S,TRAPPIST-1e,33.0,False,0.0,1283.0,371.0,3329.0,193.0,Solam Susent,False
4,0004_01,Earth,False,F/1/S,TRAPPIST-1e,16.0,False,303.0,70.0,151.0,565.0,2.0,Willy Santantines,True


In [5]:
spaceship = spaceship.dropna(axis=0)

In [6]:
spaceship['Cabin'] = spaceship['Cabin'].str[0]

In [7]:
spaceship = spaceship.drop(columns=['PassengerId', 'Name'])

In [11]:
#defining target and features
cols_non_numeric = ['HomePlanet','CryoSleep','Cabin','VIP','Transported']

X = spaceship.drop(columns=['Destination'])
y = spaceship['Destination']

# Drop rows with missing values
X = X.dropna(subset=cols_non_numeric)
y = y.loc[X.index]
X['Transported'].value_counts(dropna=False)
X['Transported'] = X['Transported'].astype(str)

In [22]:
#Divide test and train samples
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# One-hot encode: transform categorical columns into numerical ones
from sklearn.preprocessing import OneHotEncoder

ohe = OneHotEncoder(
    handle_unknown='ignore',   # ignore categories not seen during fit
    sparse_output=False,       # return a dense array instead of a sparse matrix
    drop='first'               # avoid redundant columns
)

ohe.fit(X_train[cols_non_numeric])

# Separate the numeric columns
X_train_num = X_train.drop(columns=cols_non_numeric)
X_test_num  = X_test.drop(columns=cols_non_numeric)

# Transform the categorical columns
X_train_cat = ohe.transform(X_train[cols_non_numeric])
X_test_cat  = ohe.transform(X_test[cols_non_numeric])

feature_names_out = ohe.get_feature_names_out(cols_non_numeric)

X_train_ohe_df = pd.DataFrame(X_train_cat, columns=feature_names_out, index=X_train.index)
X_test_ohe_df  = pd.DataFrame(X_test_cat,  columns=feature_names_out, index=X_test.index)

X_train_final = pd.concat([X_train_num, X_train_ohe_df], axis=1)
X_test_final  = pd.concat([X_test_num,  X_test_ohe_df], axis=1)

# Recombine (concatenating)
import numpy as np
#X_train_trans_np = np.concatenate([X_train_num, X_train_cat], axis=1) - wrong
#X_test_trans_np  = np.concatenate([X_test_num, X_test_cat], axis=1) - wrong

OneHotEncoder (scikit-learn)

✅ Ideal when you are building Machine Learning pipelines.
✅ It "learns" the categories during fit() and can later transform new data with the same mapping.
✅ Can produce sparse matrices (lighter for large datasets).

🔸 A bit more technical: it requires fit() and transform().

In [23]:
X_train.select_dtypes(include='object').columns

Index(['HomePlanet', 'CryoSleep', 'Cabin', 'VIP', 'Transported'], dtype='object')

In [24]:
# Using pd.get_dummies() (pandas) here because the next cells were raising an error saying that 'Mars' had not been converted

# ✅ Simpler and faster to use directly on DataFrames.
# ✅ Ideal when you are exploring data or preprocessing manually.
# ✅ Returns a ready-to-use DataFrame with the new columns.

X_train = pd.get_dummies(X_train, drop_first=True)
X_test  = pd.get_dummies(X_test, drop_first=True)

# Ensure both have the same columns
X_test = X_test.reindex(columns=X_train.columns, fill_value=0)
# 🔸 Limitation:

# It encodes immediately and does not store how to transform new data (if new data arrives, you have to redo the process manually).

Now perform the same as before:
- Feature Scaling
- Feature Selection


In [25]:
from sklearn.preprocessing import MinMaxScaler, StandardScaler
normalizer = MinMaxScaler() # x_new = (x - min(x)) / (max(x) -min(x))

In [26]:
normalizer.fit(X_train)  # fit on the same frame that is transformed in the next cell

In [27]:
X_train_norm_np = normalizer.transform(X_train)
X_test_norm_np = normalizer.transform(X_test)

In [28]:
X_train_norm_df = pd.DataFrame(X_train_norm_np, columns = X_train.columns, index=X_train.index)
X_test_norm_df = pd.DataFrame(X_test_norm_np, columns = X_test.columns, index=X_test.index)
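
As a quick check of the formula in the `MinMaxScaler` comment, the sketch below compares the scaler against a manual computation. The toy values are hypothetical; the point is that the minimum and maximum come from the training data only and are then reused on the test data.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy training and test columns (hypothetical values)
x_train = np.array([[0.0], [10.0], [40.0]])
x_test = np.array([[20.0], [50.0]])

scaler = MinMaxScaler().fit(x_train)  # min and max are learned from the training data

# Manual version of x_new = (x - min(x)) / (max(x) - min(x)), using training min and max
train_min, train_max = x_train.min(), x_train.max()
manual_train = (x_train - train_min) / (train_max - train_min)
manual_test = (x_test - train_min) / (train_max - train_min)

print(np.allclose(scaler.transform(x_train), manual_train))
print(np.allclose(scaler.transform(x_test), manual_test))
print(scaler.transform(x_test).ravel())  # test values may fall outside [0, 1]
```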

Now let's use the best model we got so far to see how much it improves when we fine-tune its hyperparameters.

In [31]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def evaluate_classifier(name, model, X_train, X_test, y_train, y_test):
    """
    Train the model, make predictions and return evaluation metrics.
    """
    # Train the model
    model.fit(X_train, y_train)
    
    # Make predictions
    y_pred = model.predict(X_test)
    
    # Compute metrics
    acc = accuracy_score(y_test, y_pred)
    prec = precision_score(y_test, y_pred, average='weighted', zero_division=0)
    rec = recall_score(y_test, y_pred, average='weighted', zero_division=0)
    f1 = f1_score(y_test, y_pred, average='weighted', zero_division=0)
    
    # Print the results
    print(f"\n{name}")
    print(f"Accuracy: {acc:.3f}")
    print(f"Precision: {prec:.3f}")
    print(f"Recall: {rec:.3f}")
    print(f"F1-score: {f1:.3f}")
    
    # Return a dictionary (to build a DataFrame later)
    return {
        'Model': name,
        'Accuracy': acc,
        'Precision': prec,
        'Recall': rec,
        'F1': f1
    }

In [32]:
# Required imports
from sklearn.ensemble import GradientBoostingClassifier, AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

log_model = LogisticRegression(max_iter=1000, random_state=42)

# List of models
models = [
    ("Logistic Regression", log_model)
]

# Evaluate the model
results = []
for name, model in models:
    results.append(evaluate_classifier(name, model, X_train_final, X_test_final, y_train, y_test))

# Organize the results
results_df = pd.DataFrame(results)
results_df = results_df.sort_values(by='F1', ascending=False)
print(results_df)



Logistic Regression
Accuracy: 0.718
Precision: 0.602
Recall: 0.718
F1-score: 0.614
                 Model  Accuracy  Precision    Recall        F1
0  Logistic Regression  0.717852   0.601706  0.717852  0.614085


STOP: TOTAL NO. OF ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


Evaluate your model

With an accuracy of about 72%, the model looks reasonable at first glance, but the number needs context.

Recall (0.72) is the weighted average over classes, which always equals accuracy, so it adds no new information here.

Precision (0.60) is moderate, meaning a noticeable share of the predicted labels are wrong.

F1-score (0.61) sits well below accuracy, which hints that performance is uneven across classes.

Note also the convergence warning above: this baseline was trained on the unscaled features. Overall there is room for improvement, especially in precision, possibly through hyperparameter tuning or feature optimization.
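
One way to put these numbers in context is to compare them with a model that always predicts the most frequent class. The sketch below uses hypothetical labels with a 72 / 20 / 8 split (an assumption for illustration, not the lab data) and scikit-learn's `DummyClassifier`.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical imbalanced labels: 72 / 20 / 8 split across three classes
y = np.array(["A"] * 72 + ["B"] * 20 + ["C"] * 8)
X = np.zeros((len(y), 1))  # the dummy model ignores the features

baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
y_pred = baseline.predict(X)

acc = accuracy_score(y, y_pred)
weighted_rec = recall_score(y, y_pred, average="weighted", zero_division=0)
macro_rec = recall_score(y, y_pred, average="macro", zero_division=0)

print(f"accuracy={acc:.2f}  weighted recall={weighted_rec:.2f}  macro recall={macro_rec:.2f}")
```

Weighted recall matches accuracy exactly, while macro recall exposes that two of the three classes are never predicted. A trained model is only useful to the extent that it beats this kind of baseline.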

**Grid/Random Search**

For this lab we will use Grid Search.

- Define hyperparameters to fine tune.

In [None]:
#your code here

- Run Grid Search

In [42]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
import time
import pandas as pd
import numpy as np
import scipy.stats as st

# 1️⃣ Set up a dictionary with all the values we want to try for each hyperparameter
param_grid = {
    "C": [0.01, 0.1, 1, 10],          # inverse of regularization strength (smaller = stronger)
    "solver": ["liblinear", "lbfgs"], # optimizers (liblinear works well on small datasets)
    "penalty": ["l2"],                # type of regularization
    "max_iter": [500, 1000]           # maximum number of iterations
}

# 2️⃣ Create an instance of our machine learning model
log_reg = LogisticRegression(random_state=123)

# 3️⃣ Set these two variables to be able to compute a confidence interval
confidence_level = 0.95
folds = 10

# 4️⃣ Create an instance of the GridSearchCV class and fit it
gs = GridSearchCV(log_reg, param_grid=param_grid, cv=folds, verbose=10, scoring="accuracy")

start_time = time.time()
gs.fit(X_train_norm_df, y_train)
end_time = time.time()

print("\n")
print(f"⏱️ Time taken to find best hyperparameters: {end_time - start_time:.4f} seconds\n")
# 5️⃣ Show the best results
print(f"🏆 Best parameters found: {gs.best_params_}")

print(f"📊 Best cross-validated accuracy: {gs.best_score_:.4f}\n")
results_gs_df = pd.DataFrame(gs.cv_results_).sort_values(by="mean_test_score", ascending=False)

# Confidence interval for the mean cross-validated accuracy of the best candidate
gs_mean_score = results_gs_df.iloc[0]["mean_test_score"]
gs_sem = results_gs_df.iloc[0]["std_test_score"] / np.sqrt(folds)
gs_tc = st.t.ppf(1 - ((1 - confidence_level) / 2), df=folds - 1)
gs_lower_bound = gs_mean_score - (gs_tc * gs_sem)
gs_upper_bound = gs_mean_score + (gs_tc * gs_sem)

# 6️⃣ Evaluate the best model on the test set
best_model = gs.best_estimator_
y_pred_test = best_model.predict(X_test_norm_df)


print(f"The accuracy confidence interval for the best combination of hyperparameters is: \
    ({gs_lower_bound: .4f}, {gs_mean_score: .4f}, {gs_upper_bound: .4f}) ")

print("\n✅ Test set evaluation:")
print(f"Accuracy:  {accuracy_score(y_test, y_pred_test):.3f}")
print(f"Precision: {precision_score(y_test, y_pred_test, average='weighted'):.3f}")
print(f"Recall:    {recall_score(y_test, y_pred_test, average='weighted'):.3f}")
print(f"F1-score:  {f1_score(y_test, y_pred_test, average='weighted'):.3f}")

Fitting 10 folds for each of 16 candidates, totalling 160 fits
[CV 1/10; 1/16] START C=0.01, max_iter=500, penalty=l2, solver=liblinear........
[CV 1/10; 1/16] END C=0.01, max_iter=500, penalty=l2, solver=liblinear;, score=0.686 total time=   0.0s
[CV 2/10; 1/16] START C=0.01, max_iter=500, penalty=l2, solver=liblinear........
[CV 2/10; 1/16] END C=0.01, max_iter=500, penalty=l2, solver=liblinear;, score=0.686 total time=   0.0s
[CV 3/10; 1/16] START C=0.01, max_iter=500, penalty=l2, solver=liblinear........
[CV 3/10; 1/16] END C=0.01, max_iter=500, penalty=l2, solver=liblinear;, score=0.686 total time=   0.2s
[CV 4/10; 1/16] START C=0.01, max_iter=500, penalty=l2, solver=liblinear........
[CV 4/10; 1/16] END C=0.01, max_iter=500, penalty=l2, solver=liblinear;, score=0.684 total time=   0.0s
[CV 5/10; 1/16] START C=0.01, max_iter=500, penalty=l2, solver=liblinear........
[CV 5/10; 1/16] END C=0.01, max_iter=500, penalty=l2, solver=liblinear;, score=0.686 total time=   0.0s
[CV 6/10; 1/

  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
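
The "16 candidates, totalling 160 fits" reported in the log can be reproduced without training anything. The sketch below repeats the grid from the cell above and enumerates it with scikit-learn's `ParameterGrid`; it only counts combinations and is not part of the search itself.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the Grid Search cell
param_grid = {
    "C": [0.01, 0.1, 1, 10],
    "solver": ["liblinear", "lbfgs"],
    "penalty": ["l2"],
    "max_iter": [500, 1000],
}
folds = 10

# Every combination of values is one candidate; each candidate is fitted once per fold
candidates = list(ParameterGrid(param_grid))
total_fits = len(candidates) * folds

print(len(candidates), "candidates,", total_fits, "fits")
print(candidates[0])
```

Because the number of candidates is the product of the list lengths (4 × 2 × 1 × 2), adding values to any list multiplies the cost of the search.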


In [43]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report, confusion_matrix
import time
import pandas as pd
import numpy as np
import scipy.stats as st

# 1️⃣ Define the hyperparameter values to sample from
param_distributions = {
    "C": np.logspace(-3, 3, 10),               # inverse of regularization strength (smaller = stronger)
    "solver": ["liblinear", "lbfgs"],          # optimizers
    "penalty": ["l2"],                         # type of regularization
    "max_iter": [500, 1000, 2000]              # maximum number of iterations
}

# 2️⃣ Instantiate the base model
log_reg = LogisticRegression(random_state=123)

# 3️⃣ Set the RandomizedSearchCV parameters
folds = 10
confidence_level = 0.95

rs = RandomizedSearchCV(
    estimator=log_reg,
    param_distributions=param_distributions,
    n_iter=16,                     # number of random combinations to try
    cv=folds,
    verbose=10,
    random_state=123,
    scoring="accuracy"
)

# 4️⃣ Fit the RandomizedSearch
start_time = time.time()
rs.fit(X_train_norm_df, y_train)
end_time = time.time()

print("\n⏱️ Time taken to find best hyperparameters: {:.4f} seconds\n".format(end_time - start_time))
print(f"🏆 Best parameters found: {rs.best_params_}")
print(f"📊 Best cross-validated accuracy: {rs.best_score_:.4f}")

# 5️⃣ Compute a confidence interval for the score of the best candidate
results_rs_df = pd.DataFrame(rs.cv_results_).sort_values(by="mean_test_score", ascending=False)

rs_mean_score = results_rs_df.iloc[0]["mean_test_score"]
rs_sem = results_rs_df.iloc[0]["std_test_score"] / np.sqrt(folds)
rs_tc = st.t.ppf(1 - ((1 - confidence_level) / 2), df=folds - 1)
rs_lower_bound = rs_mean_score - (rs_tc * rs_sem)
rs_upper_bound = rs_mean_score + (rs_tc * rs_sem)

print(f"📈 Confidence interval (Accuracy): ({rs_lower_bound:.4f}, {rs_mean_score:.4f}, {rs_upper_bound:.4f})")

# 6️⃣ Evaluate the best model on the test set
best_model = rs.best_estimator_
y_pred_test = best_model.predict(X_test_norm_df)

print("\n✅ Test set evaluation:")
print(f"Accuracy:  {accuracy_score(y_test, y_pred_test):.3f}")
print(f"Precision: {precision_score(y_test, y_pred_test, average='weighted'):.3f}")
print(f"Recall:    {recall_score(y_test, y_pred_test, average='weighted'):.3f}")
print(f"F1-score:  {f1_score(y_test, y_pred_test, average='weighted'):.3f}")

print("\n📊 Classification Report:")
print(classification_report(y_test, y_pred_test))

print("\n🧩 Confusion Matrix:")
print(confusion_matrix(y_test, y_pred_test))


Fitting 10 folds for each of 16 candidates, totalling 160 fits
[CV 1/10; 1/16] START C=0.46415888336127775, max_iter=500, penalty=l2, solver=liblinear
[CV 1/10; 1/16] END C=0.46415888336127775, max_iter=500, penalty=l2, solver=liblinear;, score=0.679 total time=   0.0s
[CV 2/10; 1/16] START C=0.46415888336127775, max_iter=500, penalty=l2, solver=liblinear
[CV 2/10; 1/16] END C=0.46415888336127775, max_iter=500, penalty=l2, solver=liblinear;, score=0.682 total time=   0.0s
[CV 3/10; 1/16] START C=0.46415888336127775, max_iter=500, penalty=l2, solver=liblinear
[CV 3/10; 1/16] END C=0.46415888336127775, max_iter=500, penalty=l2, solver=liblinear;, score=0.692 total time=   0.0s
[CV 4/10; 1/16] START C=0.46415888336127775, max_iter=500, penalty=l2, solver=liblinear
[CV 4/10; 1/16] END C=0.46415888336127775, max_iter=500, penalty=l2, solver=liblinear;, score=0.669 total time=   0.0s
[CV 5/10; 1/16] START C=0.46415888336127775, max_iter=500, penalty=l2, solver=liblinear
[CV 5/10; 1/16] END C

  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
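
The confidence interval computed in the cell above is a two-sided t-interval around the mean fold score. A minimal standalone sketch follows; the helper name `mean_confidence_interval` and the fold accuracies are hypothetical. One difference is an assumption worth flagging: this version uses the sample standard deviation (`ddof=1`), whereas `std_test_score` in `cv_results_` is computed with NumPy's default (`ddof=0`), so the notebook's interval is slightly narrower.

```python
import numpy as np
import scipy.stats as st

def mean_confidence_interval(scores, confidence_level=0.95):
    """Return (lower, mean, upper) of a two-sided t-interval for the mean score."""
    scores = np.asarray(scores, dtype=float)
    n = scores.size
    mean = scores.mean()
    sem = scores.std(ddof=1) / np.sqrt(n)  # standard error of the mean
    t_crit = st.t.ppf(1 - (1 - confidence_level) / 2, df=n - 1)
    return mean - t_crit * sem, mean, mean + t_crit * sem

# Hypothetical fold accuracies, not taken from the runs above
fold_scores = [0.68, 0.69, 0.70, 0.67, 0.69, 0.68, 0.70, 0.69, 0.68, 0.69]
lower, mean, upper = mean_confidence_interval(fold_scores)
print(f"({lower:.4f}, {mean:.4f}, {upper:.4f})")
```

With only ten folds the t critical value is noticeably larger than the normal 1.96, which is why the t distribution is used here.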


- Evaluate your model

Grid Search
⏱️ Time taken to find best hyperparameters: 6.5766 seconds

🏆 Best parameters found: {'C': 0.01, 'max_iter': 500, 'penalty': 'l2', 'solver': 'liblinear'}
📊 Best cross-validated accuracy: 0.6857

The accuracy confidence interval for the best combination of hyperparameters is:     ( 0.6853,  0.6857,  0.6860) 

✅ Test set evaluation:
Accuracy:  0.721
Precision: 0.520
Recall:    0.721
F1-score:  0.604

Random Search
⏱️ Time taken to find best hyperparameters: 8.1345 seconds

🏆 Best parameters found: {'solver': 'lbfgs', 'penalty': 'l2', 'max_iter': 2000, 'C': np.float64(0.004641588833612777)}
📊 Best cross-validated accuracy: 0.6857
📈 Confidence interval (Accuracy): (0.6853, 0.6857, 0.6860)

✅ Test set evaluation:
Accuracy:  0.721
Precision: 0.520
Recall:    0.721
F1-score:  0.604

📊 Classification Report:
               precision    recall  f1-score   support

  55 Cancri e       0.00      0.00      0.00       260
PSO J318.5-22       0.00      0.00      0.00       109
  TRAPPIST-1e       0.72      1.00      0.84       953

     accuracy                           0.72      1322
    macro avg       0.24      0.33      0.28      1322
 weighted avg       0.52      0.72      0.60      1322


🧩 Confusion Matrix:
[[  0   0 260]
 [  0   0 109]
 [  0   0 953]]


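Both Grid Search and Random Search produced identical test results. The confusion matrix explains why: the tuned model predicts TRAPPIST-1e for every passenger, so the hyperparameter choice makes no difference to the predictions.

⚙️ Best parameters: Both searches selected strong regularization (very small C). With the coefficients shrunk this far, the model falls back on predicting the majority class.

📊 Cross-validated accuracy: ~0.686 in both methods, with a very narrow confidence interval (0.6853 to 0.6860). Near-identical fold scores are what one would expect if each fold simply scores the majority-class share.

🧠 Test performance: Accuracy = 0.721, F1 = 0.604. The accuracy equals the share of TRAPPIST-1e in the test set (953 of 1322), so it reflects the class distribution rather than anything the model learned.

⚠️ Class imbalance: The classification report and confusion matrix show that the model predicts only the dominant class (TRAPPIST-1e), giving zero recall for 55 Cancri e and PSO J318.5-22.

🔍 Conclusion: Although the tuning methods agree, neither meaningfully improves on the untuned baseline (accuracy 0.718, F1 0.614). Future steps should address class imbalance (e.g., class_weight='balanced' or oversampling) and consider scoring the search with a metric that is sensitive to minority classes, such as macro F1, instead of accuracy.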
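
The `class_weight='balanced'` suggestion can be tried out on synthetic data. The sketch below is an illustration under stated assumptions, not a result for the lab dataset: `make_classification` generates a stand-in three-class problem with roughly a 72 / 20 / 8 split, and `C=0.01` is chosen to mimic the strong regularization found by the searches.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic 3-class data with an imbalanced split (stand-in for the lab data)
X, y = make_classification(
    n_samples=3000, n_features=10, n_informative=5,
    n_classes=3, weights=[0.72, 0.20, 0.08], random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

recalls = {}
for class_weight in (None, "balanced"):
    model = LogisticRegression(C=0.01, max_iter=1000, class_weight=class_weight)
    model.fit(X_tr, y_tr)
    # average=None returns one recall value per class instead of a single summary
    recalls[class_weight] = recall_score(y_te, model.predict(X_te), average=None, zero_division=0)
    print(class_weight, recalls[class_weight].round(2))
```

Per-class recall is the number to watch here: reweighting typically raises recall on the rare classes at the cost of some accuracy on the majority class, and whether that trade is worthwhile depends on the task.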