# Day 9: Hyperparameter Tuning

So far, I have trained several models and compared their performance.  
In this step, I will try to improve their R² scores by tuning hyperparameters.

I will use **GridSearchCV** from scikit-learn to find the best parameters for:

- Random Forest  
- Gradient Boosting  

GridSearchCV fits every combination in a parameter grid, scores each one with k-fold cross-validation on the training data (3 folds here), and keeps the combination with the highest mean R² score.
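
To get a feel for the cost of an exhaustive search, here is a rough sketch that counts how many model fits one grid implies. It uses the Random Forest grid from the code cell in this notebook. The names `cv_folds` and `candidates` are illustrative only, and the extra final fit assumes GridSearchCV's default `refit=True`.

```python
from itertools import product

# Same grid as the Random Forest search in this notebook.
rf_params = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 5, 10],
    "min_samples_split": [2, 5, 10],
}
cv_folds = 3  # matches cv=3 in the notebook

# Build one dict per candidate: every combination of the listed values.
candidates = [
    dict(zip(rf_params, values)) for values in product(*rf_params.values())
]

n_candidates = len(candidates)        # 3 * 3 * 3 combinations
n_cv_fits = n_candidates * cv_folds   # one fit per candidate per fold
n_total_fits = n_cv_fits + 1          # plus one refit on the full training set

print(n_candidates, n_cv_fits, n_total_fits)  # 27 81 82
```

The Gradient Boosting grid has the same shape (three values for each of three parameters), so it implies the same number of fits. Adding a fourth parameter with three values would triple the count, which is why grids are usually kept small.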


In [2]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score

# Load dataset again (or reuse if already loaded)
data = pd.read_csv("synthetic_stock_data.csv")

# Features and target
X = data[["open", "high", "low", "volume"]]
y = data["close"]

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# ---------------------------
# 🔹 Random Forest Tuning
# ---------------------------
rf_params = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 5, 10],
    "min_samples_split": [2, 5, 10]
}

rf_grid = GridSearchCV(RandomForestRegressor(random_state=42),
                       rf_params,
                       cv=3,
                       scoring="r2",
                       n_jobs=-1)
rf_grid.fit(X_train, y_train)

print("Best RF Params:", rf_grid.best_params_)
print("Best RF CV Score:", rf_grid.best_score_)

# ---------------------------
# 🔹 Gradient Boosting Tuning
# ---------------------------
gb_params = {
    "n_estimators": [50, 100, 200],
    "learning_rate": [0.01, 0.05, 0.1],
    "max_depth": [3, 5, 7]
}

gb_grid = GridSearchCV(GradientBoostingRegressor(random_state=42),
                       gb_params,
                       cv=3,
                       scoring="r2",
                       n_jobs=-1)
gb_grid.fit(X_train, y_train)

print("Best GB Params:", gb_grid.best_params_)
print("Best GB CV Score:", gb_grid.best_score_)


Best RF Params: {'max_depth': 5, 'min_samples_split': 2, 'n_estimators': 100}
Best RF CV Score: 0.9822250411581845
Best GB Params: {'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 100}
Best GB CV Score: 0.98127315055905
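
Each best CV score printed above is a mean over three folds, so the fold-to-fold spread is not visible. Below is a minimal sketch of how that spread could be inspected through `cv_results_`. It assumes a small synthetic dataset from `make_regression` as a stand-in (not the stock data) and a deliberately tiny grid, and the names `X_demo`, `y_demo`, and `demo_grid` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption): NOT the stock dataset used in this notebook.
X_demo, y_demo = make_regression(
    n_samples=200, n_features=4, noise=10.0, random_state=42
)

# Tiny grid so the example runs in a few seconds.
demo_grid = GridSearchCV(
    RandomForestRegressor(random_state=42),
    {"n_estimators": [10, 20], "max_depth": [None, 5]},
    cv=3,
    scoring="r2",
)
demo_grid.fit(X_demo, y_demo)

results = demo_grid.cv_results_

# List candidates from best to worst mean R², with the spread across folds.
order = np.argsort(results["mean_test_score"])[::-1]
for i in order:
    mean = results["mean_test_score"][i]
    std = results["std_test_score"][i]
    print(results["params"][i], f"mean R2={mean:.4f}", f"std={std:.4f}")
```

With the fitted `rf_grid` and `gb_grid` from this notebook, the same loop over `rf_grid.cv_results_` and `gb_grid.cv_results_` would show whether the gap between the two models is larger than their fold-to-fold spread.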


### ✅ Day 9 Summary — Hyperparameter Tuning  

- We used **GridSearchCV** with 3-fold cross-validation on the training split to tune **Random Forest** and **Gradient Boosting**, searching 27 parameter combinations for each.  

- **Best Results:**  
  - **Random Forest** → Best Params: `max_depth=5, min_samples_split=2, n_estimators=100` → CV R² = **0.982**  
  - **Gradient Boosting** → Best Params: `learning_rate=0.1, max_depth=3, n_estimators=100` → CV R² = **0.981**  

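👉 Both tuned models reach a cross-validated R² of about **0.98**. **Random Forest is slightly ahead**, but only by about 0.001, which is too small a gap to separate the two models on 3-fold CV alone. The selected Gradient Boosting values are also the same as scikit-learn's defaults for those three parameters, so the search did not move that model away from its default configuration.  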

📌 Next step (Day 10): Test these tuned models on the holdout **test set**.  
