# 🤝 Chapter 5: 2-Hour Pair Programming Task – Gradient Boosting with XGBoost

This notebook contains seven core tasks and three **stretch challenges**. Replace every `___` blank to complete your implementation.

## ✅ Core Task 1: Install and Import Packages

In [None]:
!pip install ___ --quiet

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import xgboost as ___

## ✅ Core Task 2: Load & Prepare Dataset

In [None]:
df = pd.read_csv('data/features.csv')
X = df.drop(columns=['Global_active_power', 'datetime'])
y = df['___']
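Before converting to a DMatrix, it can help to check that the feature matrix has no leftover non-numeric columns or missing values. The sketch below runs on a small made-up frame; `demo`, `X_demo`, and the `hour` and `note` columns are hypothetical and only stand in for whatever `data/features.csv` actually contains.

```python
import pandas as pd

# Hypothetical stand-in for df; only 'Global_active_power' and 'datetime'
# come from the notebook, the other columns are invented for illustration.
demo = pd.DataFrame({
    'datetime': ['2007-01-01 00:00', '2007-01-01 01:00'],
    'Global_active_power': [1.2, 1.4],
    'hour': [0, 1],
    'note': ['a', 'b'],  # a non-numeric column that would need encoding or dropping
})

X_demo = demo.drop(columns=['Global_active_power', 'datetime'])

# Columns that are not numeric and would need attention before building a DMatrix
non_numeric = X_demo.select_dtypes(exclude='number').columns.tolist()

# Count of missing values per feature column
missing_counts = X_demo.isna().sum()

print('Non-numeric columns:', non_numeric)
print('Missing values per column:')
print(missing_counts)
```

Running the same two checks on the real `X` should show quickly whether any column needs encoding or dropping first.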

## ✅ Core Task 3: Convert Data to DMatrix

In [None]:
dtrain = xgb.___(X, label=y)

## ✅ Core Task 4: Define Training Parameters

In [None]:
params = {
    'objective': 'reg:___',
    'max_depth': ___,
    'eta': ___,
    'subsample': ___,
    'colsample_bytree': ___,
    'seed': 42
}

## ✅ Core Task 5: Cross-Validation

In [None]:
cv_results = xgb.cv(
    params=params,
    dtrain=___,
    num_boost_round=100,
    nfold=___,
    metrics='___',
    early_stopping_rounds=10,
    verbose_eval=False
)

cv_results.tail()
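To read the cross-validation table, look for the round with the lowest mean test error. The sketch below uses a hand-made frame, `cv_demo`, with invented numbers and the same column names the learning-curve cell expects; it is not real output from `xgb.cv`.

```python
import pandas as pd

# Hypothetical stand-in for cv_results; the values are made up.
cv_demo = pd.DataFrame({
    'train-rmse-mean': [0.90, 0.60, 0.45, 0.38, 0.35],
    'test-rmse-mean':  [0.95, 0.70, 0.58, 0.55, 0.56],
})

# The row index is zero-based, so add 1 to get a count of boosting rounds
best_round = int(cv_demo['test-rmse-mean'].idxmin()) + 1
best_rmse = float(cv_demo['test-rmse-mean'].min())

print(f'Best round: {best_round}, mean test RMSE: {best_rmse:.3f}')
```

When early stopping triggers, `xgb.cv` trims the returned table to the best round, so the last row of `cv_results` is normally the best one and `cv_results.shape[0]` gives the number of rounds to train with.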

## ✅ Core Task 6: Train and Save Model

In [None]:
import os
os.makedirs('models', exist_ok=True)  # save_model does not create missing directories
model = xgb.___(params=params, dtrain=dtrain, num_boost_round=cv_results.shape[0])
model.save_model('models/xgb_energy_model.json')

## ✅ Core Task 7: Plot Learning Curve

In [None]:
plt.figure(figsize=(10, 5))
plt.plot(cv_results['train-rmse-mean'], label='Train')
plt.plot(cv_results['test-rmse-mean'], label='Test')
plt.legend()
plt.title('Learning Curve (RMSE)')
plt.xlabel('Boosting Round')
plt.ylabel('RMSE')
plt.grid(True)
plt.show()

---

## 💪 Stretch Task 1: Feature Importance Visualization

In [None]:
# Default importance_type is 'weight': how many times each feature is used in a split
xgb.plot_importance(model, max_num_features=10)
plt.title('Top 10 Feature Importances')
plt.show()

## 💪 Stretch Task 2: Model Inference (Test Run)

In [None]:
# Predict for a single row (here, the first row of X); double brackets keep it a DataFrame
X_example = X.iloc[[0]]
dmatrix = xgb.DMatrix(X_example)
y_pred_example = model.predict(dmatrix)
print('Predicted power usage:', y_pred_example[0])

## 💪 Stretch Task 3: Save Predictions to CSV

In [None]:
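# Predict on the full dataset (in-sample: the model was trained on these same rows)
dall = xgb.DMatrix(X)
y_all_preds = model.predict(dall)
df['Predicted_Power'] = y_all_preds
# Save timestamps, actual values, and predictions side by side
df[['datetime', 'Global_active_power', 'Predicted_Power']].to_csv('data/predictions.csv', index=False)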
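As a quick check on the saved file, you can compute the RMSE between the actual and predicted columns. The sketch below uses a tiny invented frame, `preds_demo`, in place of `data/predictions.csv`. Because the notebook predicts on rows the model was trained on, an error measured this way is in-sample and will likely look better than the cross-validated RMSE.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the predictions file; the values are made up.
preds_demo = pd.DataFrame({
    'Global_active_power': [1.0, 2.0, 3.0, 4.0],
    'Predicted_Power':     [1.5, 2.0, 2.5, 4.0],
})

# Root mean squared error between actual and predicted values
residuals = preds_demo['Global_active_power'] - preds_demo['Predicted_Power']
rmse = float(np.sqrt(np.mean(residuals ** 2)))

print(f'In-sample RMSE: {rmse:.4f}')
```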