# Housing Prices: Gradient Boosting Fine-Tuning and Optimization

**Date:** 2025-11-02  
**Author:** *Luis Renteria Lezano*

## Executive Summary
- **Goal:** Perform advanced hyperparameter tuning and optimization of a Gradient Boosting Regressor to improve predictive performance on house sale prices.  
- **Source:** The cleaned and engineered dataset produced in the previous EDA stage, [`01_eda_and_feature_engineering.ipynb`](./01_eda_and_feature_engineering.ipynb).  
- **Data:** [`../data/interim/features_cleaned.csv`](../data/interim/features_cleaned.csv)  
- **Scope:** This notebook focuses exclusively on the Gradient Boosting model, exploring hyperparameter tuning, learning curves, feature importance, and residual analysis to identify the best-performing configuration.


## 0. Reproducibility & Environment Setup
- Pin versions in [`../requirements.txt`](../requirements.txt).
- Keep raw data immutable in [`../data/raw/`](../data/raw/).
- Export model output tables (metrics, predictions, feature importance) to [`../reports/tables/`](../reports/tables/) and figures to [`../reports/figures/gradient_boosting`](../reports/figures/gradient_boosting).
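
As a rough illustration of the version-pinning step, the sketch below prints `name==version` lines for the libraries this notebook imports, using the standard-library `importlib.metadata`. The `PACKAGES` list and the `pinned_requirements` helper are hypothetical names introduced here, not part of the project; the list is an assumption based on the imports in the setup cell and may not match the actual contents of `requirements.txt`.

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical list: distribution names for the libraries imported in the setup cell.
PACKAGES = ["numpy", "pandas", "matplotlib", "seaborn", "scikit-learn", "joblib"]


def pinned_requirements(packages):
    """Return 'name==version' lines for installed packages, skipping missing ones."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            # Not installed in this environment; leave it out rather than guess a version.
            continue
    return lines


# Print the pins so they can be reviewed before being copied into requirements.txt.
print("\n".join(pinned_requirements(PACKAGES)))
```

The output is only printed, not written to disk, so the existing `requirements.txt` is never overwritten without review.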

In [None]:
# Imports & basic setup
import os
import sys
import joblib
from time import time
from pathlib import Path
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV, KFold, cross_val_score
from sklearn.metrics import root_mean_squared_error, r2_score

from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.ensemble import GradientBoostingRegressor

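# Reproducibility
# Note: np.random.seed only seeds NumPy's global RNG. Pass random_state=SEED
# explicitly to scikit-learn splitters, searches, and estimators as well.
SEED = 42
np.random.seed(SEED)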

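# Make ../src importable (relative to the notebook's working directory),
# then load the project's feature-building helper
sys.path.append('../src')
from build_features import build_features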

# Paths
DATA_RAW = Path('../data/raw')
DATA_INTERIM = Path('../data/interim')
DATA_PROCESSED = Path('../data/processed/gradient_boosting')
REPORTS_TABLES = Path('../reports/tables')
REPORTS_FIGURES = Path('../reports/figures/gradient_boosting')
MODELS = Path('../models')

# Create directories if missing
for folder in [DATA_RAW, DATA_INTERIM, DATA_PROCESSED, REPORTS_TABLES, REPORTS_FIGURES, MODELS]:
    folder.mkdir(parents=True, exist_ok=True)

# Plot themes and palettes defaults
plt.rcParams['figure.figsize'] = (10, 5)
plt.rcParams['axes.titlesize'] = 14
plt.rcParams['axes.labelsize'] = 12
plt.rcParams['axes.facecolor'] = '#fafafa'
plt.rcParams['grid.alpha'] = 0.3
sns.set_theme(style='whitegrid', palette='colorblind', context='notebook')

# Report every folder created above, including MODELS
print('Folder statuses:')
for d in [DATA_RAW, DATA_INTERIM, DATA_PROCESSED, REPORTS_TABLES, REPORTS_FIGURES, MODELS]:
    print('\t', d.resolve(), '- Ready' if d.exists() else '- Missing')
print('> Environment setup completed.')