# Day 7: Trying Advanced Models - Random Forest & Gradient Boosting

On Day 6, we built a basic model.  
Now, we’ll try **Random Forest** and **Gradient Boosting**, two powerful ensemble methods.  
We’ll evaluate both on a held-out test set using MSE and R², then save the train-test splits so later experiments reuse exactly the same data.


In [4]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score

# Load the dataset
stock_data = pd.read_csv("synthetic_stock_data.csv")

# Features and target
X = stock_data[["open", "high", "low", "volume"]]
y = stock_data["close"]

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# ---- Random Forest ----
rf_model = RandomForestRegressor(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)
rf_preds = rf_model.predict(X_test)

rf_mse = mean_squared_error(y_test, rf_preds)
rf_r2 = r2_score(y_test, rf_preds)

# ---- Gradient Boosting ----
gb_model = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, random_state=42)
gb_model.fit(X_train, y_train)
gb_preds = gb_model.predict(X_test)

gb_mse = mean_squared_error(y_test, gb_preds)
gb_r2 = r2_score(y_test, gb_preds)

# ---- Print results ----
print("🔹 Random Forest → MSE:", round(rf_mse, 2), " R²:", round(rf_r2, 2))
print("🔹 Gradient Boosting → MSE:", round(gb_mse, 2), " R²:", round(gb_r2, 2))


🔹 Random Forest → MSE: 0.24  R²: 0.98
🔹 Gradient Boosting → MSE: 0.18  R²: 0.99
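
MSE is expressed in squared units of the target, so the numbers above are easier to interpret once converted to RMSE, which is in the same units as `close`. Here is a minimal sketch of that arithmetic, using the rounded MSE values copied from the output above. The `reported_mse` dictionary and the `rmse_from_mse` helper are names introduced here for illustration and are not part of the notebook.

```python
import math

# Rounded MSE values copied from the printed results above.
reported_mse = {"Random Forest": 0.24, "Gradient Boosting": 0.18}


def rmse_from_mse(mse: float) -> float:
    """Convert a mean squared error into a root mean squared error."""
    return math.sqrt(mse)


# RMSE is in the same units as the target column (`close`).
rmse = {name: rmse_from_mse(mse) for name, mse in reported_mse.items()}

for name, value in rmse.items():
    print(f"{name}: RMSE ≈ {value:.2f}")
```

This gives roughly 0.49 for Random Forest and 0.42 for Gradient Boosting. Because the inputs are already rounded, treat these as approximate.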


In [5]:
import joblib

# Save train-test splits
joblib.dump(X_train, "X_train.pkl")
joblib.dump(X_test, "X_test.pkl")
joblib.dump(y_train, "y_train.pkl")
joblib.dump(y_test, "y_test.pkl")

print("✅ Train-test splits saved for later use!")


✅ Train-test splits saved for later use!
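
A later notebook can read the splits back with `joblib.load`. The sketch below shows one way to do that. The `load_splits` helper and the `SPLIT_NAMES` tuple are hypothetical names introduced here, and the demo uses tiny stand-in data in a temporary folder so it runs without the real `.pkl` files.

```python
import tempfile
from pathlib import Path

import joblib
import pandas as pd

# File stems used when the splits were saved (one "<name>.pkl" per split).
SPLIT_NAMES = ("X_train", "X_test", "y_train", "y_test")


def load_splits(directory="."):
    """Load the four saved splits from `directory` and return them as a dict."""
    directory = Path(directory)
    return {name: joblib.load(directory / f"{name}.pkl") for name in SPLIT_NAMES}


# Demo on stand-in data (NOT the real stock data), saved to a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    demo = {
        "X_train": pd.DataFrame({"open": [1.0, 2.0], "volume": [10, 20]}),
        "X_test": pd.DataFrame({"open": [3.0], "volume": [30]}),
        "y_train": pd.Series([1.5, 2.5], name="close"),
        "y_test": pd.Series([3.5], name="close"),
    }
    for name, obj in demo.items():
        joblib.dump(obj, Path(tmp) / f"{name}.pkl")

    splits = load_splits(tmp)

# Show the shape of each reloaded object as a quick sanity check.
print({name: obj.shape for name, obj in splits.items()})
```

In a follow-up notebook, `splits = load_splits()` would read from the current working directory, assuming the files were saved there as in the cell above.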


In [6]:
import os
print(os.listdir())


['.ipynb_checkpoints', 'best_gb_model.pkl', 'data', 'day_1_introduction.ipynb', 'day_2_data_creation.ipynb', 'day_3_eda.ipynb', 'day_4_feature_engineering.ipynb', 'day_5_baseline_model.ipynb', 'day_6_model_building.ipynb', 'day_7_advanced_models.ipynb', 'day_8_model_comparison.ipynb', 'day_9_hyperparameter_tuning.ipynb', 'notebooks', 'src', 'synthetic_stock_data.csv', 'synthetic_stock_data_with_features.csv', 'Untitled.ipynb', 'X_test.pkl', 'X_train.pkl', 'y_test.pkl', 'y_train.pkl']
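
The listing above confirms that the four `.pkl` files are present, but they are easy to miss among the other entries. A more targeted check could look like the sketch below. `EXPECTED_FILES` and `missing_files` are names introduced here for illustration.

```python
from pathlib import Path

# The four files written by the save step above.
EXPECTED_FILES = ["X_train.pkl", "X_test.pkl", "y_train.pkl", "y_test.pkl"]


def missing_files(names, directory="."):
    """Return the names from `names` that do not exist as files in `directory`."""
    base = Path(directory)
    return [name for name in names if not (base / name).is_file()]


# Check the current working directory, where the notebook saved the splits.
missing = missing_files(EXPECTED_FILES)
print("Missing:", missing if missing else "none")
```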


### ✅ Conclusion
- Random Forest and Gradient Boosting both fit the test set closely, with R² of 0.98 and 0.99 respectively.  
- Gradient Boosting scored slightly better on this split, with an MSE of 0.18 versus 0.24 for Random Forest.  
- We saved the train-test splits (`X_train.pkl`, `X_test.pkl`, `y_train.pkl`, `y_test.pkl`) for use in later experiments.
