# Day 9: Hyperparameter Tuning

So far, we have trained Random Forest and Gradient Boosting with default settings.  
Today, we will perform **hyperparameter tuning** with `GridSearchCV` to find the best configuration for each model.  

Steps:
1. Load train-test splits from Day 7.  
2. Define parameter grids for Random Forest and Gradient Boosting.  
3. Run GridSearchCV to find the best hyperparameters.  
4. Evaluate the best models on the test set.  
5. Save the best-performing model for future use.


In [2]:
import joblib

# Load train-test splits
X_train = joblib.load("X_train.pkl")
X_test = joblib.load("X_test.pkl")
y_train = joblib.load("y_train.pkl")
y_test = joblib.load("y_test.pkl")

print("✅ Train-test data loaded successfully!")


✅ Train-test data loaded successfully!


In [3]:
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestRegressor

# Define parameter grid
rf_params = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 5, 10],
    "min_samples_split": [2, 5, 10]
}

rf_grid = GridSearchCV(
    RandomForestRegressor(random_state=42),
    rf_params,
    cv=3,
    scoring="neg_mean_squared_error",
    n_jobs=-1
)

rf_grid.fit(X_train, y_train)

print("🔹 Best RF Params:", rf_grid.best_params_)
# best_score_ is the mean negative MSE across the CV folds; negate it to report MSE
print("🔹 Best RF CV Score:", -rf_grid.best_score_)


🔹 Best RF Params: {'max_depth': 5, 'min_samples_split': 2, 'n_estimators': 100}
🔹 Best RF CV Score: 0.28519176077257774
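
To get a feel for the cost of the search above, it helps to count how many models are trained. The sketch below is only an illustration of how an exhaustive grid is enumerated, written with `itertools.product` rather than scikit-learn internals; the grid is repeated so the snippet runs on its own, and the names `candidates`, `n_candidates`, and `n_fits` are introduced here for illustration.

```python
from itertools import product

# Same grid as rf_params above, repeated so the snippet is self-contained.
rf_params = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 5, 10],
    "min_samples_split": [2, 5, 10],
}
cv_folds = 3  # matches cv=3 in the GridSearchCV call

# Each candidate takes one value per hyperparameter (Cartesian product).
names = list(rf_params)
candidates = [dict(zip(names, values)) for values in product(*rf_params.values())]

n_candidates = len(candidates)
n_fits = n_candidates * cv_folds  # one fit per candidate per fold

print("Candidates:", n_candidates)
print("Cross-validation fits:", n_fits)
```

With the default `refit=True`, `GridSearchCV` also trains the winning configuration once more on the full training set, which is the model returned by `best_estimator_`.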


In [4]:
from sklearn.ensemble import GradientBoostingRegressor

# Define parameter grid
gb_params = {
    "n_estimators": [100, 200],
    "learning_rate": [0.05, 0.1, 0.2],
    "max_depth": [2, 3, 4]
}

gb_grid = GridSearchCV(
    GradientBoostingRegressor(random_state=42),
    gb_params,
    cv=3,
    scoring="neg_mean_squared_error",
    n_jobs=-1
)

gb_grid.fit(X_train, y_train)

print("🔹 Best GB Params:", gb_grid.best_params_)
# best_score_ is the mean negative MSE across the CV folds; negate it to report MSE
print("🔹 Best GB CV Score:", -gb_grid.best_score_)


🔹 Best GB Params: {'learning_rate': 0.05, 'max_depth': 2, 'n_estimators': 200}
🔹 Best GB CV Score: 0.27374185355154196
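
The best score alone hides how close the other candidates were. A fitted search object also exposes `cv_results_`, which holds the mean and standard deviation of the fold scores for every candidate. The sketch below shows one way to read it; it uses synthetic data from `make_regression` and a smaller grid as stand-ins (both are assumptions, not the Day 7 data or the grid above), so the numbers it prints are not comparable to the notebook's.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption): the real X_train / y_train come from Day 7.
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=42)

demo_grid = GridSearchCV(
    GradientBoostingRegressor(n_estimators=50, random_state=42),
    {"learning_rate": [0.05, 0.1], "max_depth": [2, 3]},  # reduced grid for speed
    cv=3,
    scoring="neg_mean_squared_error",
)
demo_grid.fit(X_demo, y_demo)

results = demo_grid.cv_results_
mean_mse = -results["mean_test_score"]  # scores are negative MSE, so negate
std_mse = results["std_test_score"]     # spread of the score across folds

# rank_test_score is 1 for the best candidate; list candidates best-first.
order = np.argsort(results["rank_test_score"], kind="stable")
for i in order:
    print(results["params"][i], "MSE = %.1f +/- %.1f" % (mean_mse[i], std_mse[i]))
```

If the top few candidates differ by less than the fold-to-fold spread, the choice between them is probably not significant, and the simpler configuration is a reasonable pick.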


In [5]:
from sklearn.metrics import mean_squared_error, r2_score

# Evaluate best RF
best_rf = rf_grid.best_estimator_
rf_preds = best_rf.predict(X_test)
rf_mse = mean_squared_error(y_test, rf_preds)
rf_r2 = r2_score(y_test, rf_preds)

# Evaluate best GB
best_gb = gb_grid.best_estimator_
gb_preds = best_gb.predict(X_test)
gb_mse = mean_squared_error(y_test, gb_preds)
gb_r2 = r2_score(y_test, gb_preds)

print("✅ Test Results after Hyperparameter Tuning")
print("Random Forest → MSE:", round(rf_mse, 2), " R²:", round(rf_r2, 2))
print("Gradient Boosting → MSE:", round(gb_mse, 2), " R²:", round(gb_r2, 2))


✅ Test Results after Hyperparameter Tuning
Random Forest → MSE: 0.23  R²: 0.98
Gradient Boosting → MSE: 0.21  R²: 0.98
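
MSE is expressed in squared target units, which makes it hard to read directly. Taking the square root gives the RMSE, a typical error size in the same units as the target. The sketch below applies this to the rounded MSE values printed above; the dictionary names are introduced here for illustration.

```python
import math

# Rounded test MSE values copied from the output above.
test_mse = {"Random Forest": 0.23, "Gradient Boosting": 0.21}

# RMSE is the square root of MSE, in the same units as the target.
rmse = {name: math.sqrt(mse) for name, mse in test_mse.items()}

for name, value in rmse.items():
    print(f"{name}: RMSE = {value:.2f}")
```

Because the inputs are already rounded to two decimals, the resulting RMSE values are approximate.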


In [6]:
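# Save the model with the lower test MSE.
# Caveat: selecting on test MSE makes the reported test scores slightly optimistic;
# comparing rf_grid.best_score_ and gb_grid.best_score_ would keep the test set untouched.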
if gb_mse < rf_mse:
    joblib.dump(best_gb, "best_gb_model.pkl")
    print("🌟 Best Model: Gradient Boosting saved as best_gb_model.pkl")
else:
    joblib.dump(best_rf, "best_rf_model.pkl")
    print("🌟 Best Model: Random Forest saved as best_rf_model.pkl")


🌟 Best Model: Gradient Boosting saved as best_gb_model.pkl
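
A saved model is only useful if it can be loaded back and gives the same predictions. The sketch below checks that round trip with `joblib.dump` and `joblib.load`. It trains a small stand-in model on synthetic data and writes to a temporary file named `demo_model.pkl`; these are assumptions made so the snippet runs on its own, not the notebook's actual model or file.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Stand-in model and data (assumption): in the notebook these would be best_gb and X_test.
X_demo, y_demo = make_regression(n_samples=100, n_features=4, random_state=42)
model = GradientBoostingRegressor(n_estimators=20, random_state=42).fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_model.pkl")  # hypothetical file name
    joblib.dump(model, path)       # serialize the fitted estimator
    reloaded = joblib.load(path)   # read it back into memory

# The reloaded estimator should reproduce the original predictions.
same = np.allclose(model.predict(X_demo), reloaded.predict(X_demo))
print("Predictions match after reload:", same)
```

In this notebook the equivalent call would be `joblib.load("best_gb_model.pkl")` followed by `.predict(X_test)`. Pickled estimators are best loaded with the same scikit-learn version that saved them.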


### ✅ Conclusion
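- Hyperparameter tuning improved both models’ performance; grid search chose `max_depth=5` with 100 trees for Random Forest, and `learning_rate=0.05`, `max_depth=2` with 200 trees for Gradient Boosting.  
- Gradient Boosting performed slightly better on both cross-validated MSE (0.274 vs. 0.285) and test MSE (0.21 vs. 0.23); both models reached a test R² of 0.98.  
- The **best model was saved** for future predictions: in this run, Gradient Boosting as `best_gb_model.pkl` (the code saves `best_rf_model.pkl` instead if Random Forest wins).  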
