# 🎓 Chapter 5: Gradient Boosting with XGBoost

In this notebook, we train a gradient boosting model with XGBoost to predict power consumption (`Global_active_power`). The workflow covers 5-fold cross-validation with early stopping, training the final model, saving it, plotting the cross-validation RMSE curve, and inspecting feature importances.

## 1. Install & Import Required Libraries

In [None]:
!pip install xgboost --quiet

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import xgboost as xgb

## 2. Load Dataset and Prepare Features

In [None]:
df = pd.read_csv('data/features.csv')

# Target column; the raw timestamp is dropped because DMatrix needs numeric features
X = df.drop(columns=['Global_active_power', 'datetime'])
y = df['Global_active_power']
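Unless categorical support is enabled, XGBoost's `DMatrix` rejects string (`object`) columns. A leftover text column in `X` would therefore make the conversion in the next step fail. The check below is a minimal sketch that is not part of the original notebook. `non_numeric_columns` is a hypothetical helper, and the toy frame stands in for `data/features.csv`, whose full column list is not shown here.

```python
import pandas as pd


def non_numeric_columns(frame: pd.DataFrame) -> list:
    """Return the names of columns whose dtype is not numeric (hypothetical helper)."""
    return [col for col in frame.columns
            if not pd.api.types.is_numeric_dtype(frame[col])]


# Toy stand-in for the real feature table (column names are illustrative only)
toy = pd.DataFrame({
    "hour": [0, 1, 2],
    "Voltage": [240.1, 239.8, 241.0],
    "season": ["winter", "winter", "spring"],  # string column DMatrix would reject
})

print(non_numeric_columns(toy))  # ['season']
```

In the notebook, calling `non_numeric_columns(X)` should return an empty list before you build `dtrain`.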

## 3. Convert Dataset to DMatrix Format

In [None]:
dtrain = xgb.DMatrix(X, label=y)
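`DMatrix` stores feature values as 32-bit floats. By default it treats `NaN` as a missing value, which each tree routes down a learned default branch. Labels, by contrast, must be finite. The pre-check below is a numpy-only sketch under those assumptions. It confirms that features and labels line up row for row and counts the cells that will be treated as missing. `summarize_for_dmatrix` is a hypothetical helper.

```python
import numpy as np


def summarize_for_dmatrix(features, labels):
    """Hypothetical sanity check to run before building a DMatrix."""
    X_arr = np.asarray(features, dtype=np.float32)  # single precision, as XGBoost stores features
    y_arr = np.asarray(labels, dtype=np.float32)
    if X_arr.shape[0] != y_arr.shape[0]:
        raise ValueError(f"{X_arr.shape[0]} feature rows but {y_arr.shape[0]} labels")
    n_missing = int(np.isnan(X_arr).sum())           # NaN feature cells become "missing"
    n_bad_labels = int((~np.isfinite(y_arr)).sum())  # NaN/inf labels cannot be trained on
    return {"rows": X_arr.shape[0], "missing_cells": n_missing, "bad_labels": n_bad_labels}


X_demo = [[1.0, np.nan], [2.0, 3.0], [np.nan, 4.0]]
y_demo = [0.5, 1.2, 0.9]
print(summarize_for_dmatrix(X_demo, y_demo))  # {'rows': 3, 'missing_cells': 2, 'bad_labels': 0}
```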

## 4. Define XGBoost Training Parameters

In [None]:
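params = {
    'objective': 'reg:squarederror',  # regression with squared-error loss
    'max_depth': 6,                   # maximum depth of each tree
    'eta': 0.1,                       # learning rate: shrinks each tree's contribution
    'subsample': 0.8,                 # fraction of rows sampled for each tree
    'colsample_bytree': 0.8,          # fraction of columns sampled for each tree
    'seed': 42                        # random seed for reproducible sampling
}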
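One boosting round shows what `objective`, `eta`, and XGBoost's L2 leaf penalty `lambda` (default 1, not set above) actually do. For `reg:squarederror`, each row's gradient is `pred - y` and its hessian is `1`. A leaf then receives the weight `-G / (H + lambda)`, where `G` and `H` are the summed gradients and hessians of the rows that fall into it. `eta` scales that weight before it is added to the prediction. The sketch below applies one round with a fixed two-leaf split. The split mask and toy numbers are assumptions for illustration; real XGBoost also searches for the best split.

```python
import numpy as np


def boosting_round(y, pred, left_mask, eta=0.1, lam=1.0):
    """One squared-error boosting round with a fixed two-leaf split (illustrative)."""
    grad = pred - y                    # d/dpred of 0.5 * (pred - y)**2
    hess = np.ones_like(y)             # second derivative is constant 1
    new_pred = pred.copy()
    for leaf in (left_mask, ~left_mask):
        G, H = grad[leaf].sum(), hess[leaf].sum()
        weight = -G / (H + lam)        # optimal leaf value under the L2 penalty
        new_pred[leaf] += eta * weight # eta shrinks this tree's contribution
    return new_pred


y_toy = np.array([1.0, 2.0, 3.0, 4.0])
pred_toy = np.zeros_like(y_toy)
left_toy = np.array([True, True, False, False])
print(boosting_round(y_toy, pred_toy, left_toy))  # ≈ [0.1, 0.1, 0.2333, 0.2333]
```

With `eta=1.0` and `lam=0.0`, each leaf would jump straight to the mean residual of its rows (1.5 and 3.5 here). Smaller `eta` values trade more rounds for smoother fitting.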

## 5. Cross-Validation

In [None]:
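cv_results = xgb.cv(
    params=params,
    dtrain=dtrain,
    num_boost_round=100,       # upper bound on boosting rounds
    nfold=5,                   # 5-fold cross-validation
    metrics='rmse',
    early_stopping_rounds=10,  # stop if mean validation RMSE does not improve for 10 rounds
    verbose_eval=False
)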

cv_results.tail()
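By default, `xgb.cv` shuffles rows before splitting them into `nfold` folds. With time-ordered power readings, this interleaves validation rows in time with training rows, so the model is scored on periods whose immediate neighbours it has trained on. That can make the CV RMSE optimistic. If the rows of `features.csv` are in chronological order (an assumption this notebook does not check), one alternative is expanding-window folds, similar in spirit to scikit-learn's `TimeSeriesSplit`. `xgb.cv` accepts a `folds` argument that can be a list of `(train_idx, test_idx)` pairs. The helper name below is hypothetical.

```python
import numpy as np


def expanding_window_folds(n_rows, n_splits):
    """Time-ordered (train_idx, test_idx) pairs: train on the past, validate on the next block."""
    block = n_rows // (n_splits + 1)
    if block == 0:
        raise ValueError("not enough rows for the requested number of splits")
    folds = []
    for k in range(1, n_splits + 1):
        train_end = k * block
        # The last validation block absorbs any leftover rows
        test_end = n_rows if k == n_splits else train_end + block
        folds.append((np.arange(0, train_end), np.arange(train_end, test_end)))
    return folds


demo_folds = expanding_window_folds(n_rows=12, n_splits=3)
for train_idx, test_idx in demo_folds:
    print(f"train rows 0-{train_idx[-1]}, validate rows {test_idx[0]}-{test_idx[-1]}")
# train rows 0-2, validate rows 3-5
# train rows 0-5, validate rows 6-8
# train rows 0-8, validate rows 9-11

# Possible use in the notebook (not run here):
# time_folds = expanding_window_folds(dtrain.num_row(), n_splits=5)
# cv_results = xgb.cv(params=params, dtrain=dtrain, num_boost_round=100,
#                     folds=time_folds, metrics='rmse',
#                     early_stopping_rounds=10, verbose_eval=False)
```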

## 6. Train Final Model

In [None]:
# Retrain on all rows; with early stopping, cv_results ends at the best round (one row per round)
model = xgb.train(params=params, dtrain=dtrain, num_boost_round=cv_results.shape[0])
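`cv_results.shape[0]` is a valid round count because of how `early_stopping_rounds=10` works. Training continues only while the mean validation RMSE keeps improving at least once every 10 rounds. The returned history ends at the best round. Below is a simplified plain-Python sketch of that patience rule, meant as an illustration rather than XGBoost's own implementation.

```python
def best_num_rounds(val_scores, patience):
    """Rounds to keep under a 'stop after `patience` rounds without improvement' rule."""
    best_score = float("inf")
    best_round = 0
    for i, score in enumerate(val_scores):
        if score < best_score:           # lower RMSE is better
            best_score, best_round = score, i
        elif i - best_round >= patience:
            break                        # no improvement for `patience` rounds: stop
    return best_round + 1                # rounds are 0-indexed


# Made-up mean validation RMSE per round
print(best_num_rounds([5.0, 4.0, 3.0, 3.5, 3.2, 3.1], patience=2))  # 3
```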

## 7. Save the Trained Model

In [None]:
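import os

# Create the output directory first; saving fails if 'models/' does not exist
os.makedirs('models', exist_ok=True)
model.save_model('models/xgb_energy_model.json')  # the .json extension selects the JSON format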

## 8. Plot Learning Curve

In [None]:
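plt.figure(figsize=(10, 5))
rounds = cv_results.index
# xgb.cv names the held-out fold scores 'test-*'; they are validation scores
for split, label in [('train', 'Train'), ('test', 'Validation')]:
    mean = cv_results[f'{split}-rmse-mean']
    std = cv_results[f'{split}-rmse-std']
    plt.plot(rounds, mean, label=f'{label} (mean over folds)')
    plt.fill_between(rounds, mean - std, mean + std, alpha=0.2)  # ±1 std across folds
plt.title('Cross-Validation RMSE by Boosting Round')
plt.xlabel('Boosting Round')
plt.ylabel('RMSE')
plt.legend()
plt.grid(True)
plt.show()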
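Each point on these curves summarises the five folds. The RMSE, `sqrt(mean((y - y_pred)**2))`, is computed on each fold separately and then reported as a mean (the `-mean` columns) and a spread (the `-std` columns). The numpy sketch below uses two made-up folds. It assumes the spread is a population standard deviation (numpy's default, `ddof=0`).

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Made-up (actual, predicted) pairs for two validation folds
toy_folds = [
    ([1.0, 2.0], [1.0, 4.0]),  # errors 0 and 2 -> RMSE sqrt(2)
    ([3.0, 3.0], [2.0, 4.0]),  # errors 1 and 1 -> RMSE 1
]
fold_scores = np.array([rmse(y_true, y_pred) for y_true, y_pred in toy_folds])
print(fold_scores.mean(), fold_scores.std())  # ≈ 1.2071 0.2071
```

A validation curve that flattens while the training curve keeps falling is the usual sign that extra rounds mostly fit noise. Early stopping is meant to cut training off near that point.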

## 9. Feature Importance

In [None]:
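# 'gain' ranks features by the average loss reduction of their splits;
# the default 'weight' only counts how often each feature is used in a split
xgb.plot_importance(model, max_num_features=10, importance_type='gain',
                    title='Top 10 Feature Importances (gain)')
plt.show()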
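XGBoost's `gain` importance is the average loss reduction of the splits that use a feature. Its default `weight` importance simply counts those splits. A split's gain is `0.5 * (G_L**2/(H_L+lambda) + G_R**2/(H_R+lambda) - (G_L+G_R)**2/(H_L+H_R+lambda)) - gamma`, where `G`/`H` are the summed gradients and hessians of each child. The sketch below computes that score and aggregates both importance types from a hypothetical list of `(feature, gain)` split records. The feature names are placeholders, not columns of `features.csv`.

```python
from collections import defaultdict


def split_gain(G_L, H_L, G_R, H_R, lam=1.0, gamma=0.0):
    """Loss reduction of a split, from summed gradients/hessians of each child."""
    def score(G, H):
        return G * G / (H + lam)
    return 0.5 * (score(G_L, H_L) + score(G_R, H_R) - score(G_L + G_R, H_L + H_R)) - gamma


def importances(split_records):
    """Aggregate 'weight' (split count) and 'gain' (mean gain) per feature."""
    counts, totals = defaultdict(int), defaultdict(float)
    for feature, gain in split_records:
        counts[feature] += 1
        totals[feature] += gain
    weight = dict(counts)
    avg_gain = {f: totals[f] / counts[f] for f in counts}
    return weight, avg_gain


print(split_gain(-2.0, 2.0, -10.0, 2.0))  # ≈ 2.933

# Hypothetical split records: (feature name, gain of that split)
records = [("feature_a", 3.0), ("feature_b", 1.0), ("feature_a", 1.0), ("feature_c", 5.0)]
weight, avg_gain = importances(records)
print(weight)    # {'feature_a': 2, 'feature_b': 1, 'feature_c': 1}
print(avg_gain)  # {'feature_a': 2.0, 'feature_b': 1.0, 'feature_c': 5.0}
```

Here `feature_c` ranks first by gain but ties for last by weight. This is why the two importance types can tell different stories about the same model.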