# Carseats Sales Prediction using Tree-Based Models

## Objective
The goal of this project is to predict numerical sales values using:
- Regression Trees
- Pruned Regression Trees (via Cross-Validation)
- Bagging

In [6]:
import pandas as pd
import numpy as np
import sys
import os
sys.path.append(os.path.abspath(".."))

from src.data_loader import load_carseats
from src.preprocess import preprocess
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

In [8]:
# Load and Prepare data

df = load_carseats("../data/Carseats.csv")
df = preprocess(df)

X = df.drop("Sales", axis=1)
y = df["Sales"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)


## Regression Tree

We first evaluate a regression tree on the test set. The model was fitted on the training data beforehand and is loaded here from `../models/tree_model.joblib`.


In [11]:
from joblib import load

tree = load("../models/tree_model.joblib")
y_pred_tree = tree.predict(X_test)

mse_tree = mean_squared_error(y_test, y_pred_tree)
mse_tree




6.093451666666666
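
The notebook only loads the saved model, so the fitting step is not shown. A minimal sketch of how such a tree could be fitted and persisted follows. It runs on synthetic data (an assumption, so that the example is self-contained); `X_demo`, `y_demo`, and the temporary file location are illustrative names, and the hyperparameters of the actual saved model are not known from this notebook.

```python
import tempfile
from pathlib import Path

import numpy as np
from joblib import dump, load
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the preprocessed Carseats features (assumption:
# in the real notebook, X_train / y_train would be used instead).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(400, 5))
y_demo = 2.0 * X_demo[:, 0] - X_demo[:, 1] ** 2 + rng.normal(scale=0.5, size=400)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)

# Fit an unpruned regression tree; random_state makes tie-breaking reproducible.
demo_tree = DecisionTreeRegressor(random_state=42).fit(X_tr, y_tr)

# Persist and reload, mirroring the load(...) call used in this notebook.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "tree_model.joblib"
    dump(demo_tree, model_path)
    reloaded_tree = load(model_path)

demo_mse = mean_squared_error(y_te, reloaded_tree.predict(X_te))
print(f"Test MSE of the reloaded demo tree: {demo_mse:.3f}")
```

A joblib file is only guaranteed to load correctly with the scikit-learn version that wrote it, so keeping a fitting script next to the saved model makes it possible to refit when the environment changes.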

## Pruned Regression Tree

The tree was simplified with cost-complexity pruning, with the tree size selected by cross-validation. The resulting model is loaded from disk and evaluated on the same test set.


In [12]:
pruned_tree = load("../models/pruned_tree_model.joblib")
y_pred_pruned = pruned_tree.predict(X_test)

mse_pruned = mean_squared_error(y_test, y_pred_pruned)
mse_pruned




4.615269382245914
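
The selection step itself happens outside this notebook. One common way to do it in scikit-learn is sketched below: take candidate `ccp_alpha` values from the pruning path of a fully grown tree and pick the one with the best cross-validated MSE. The synthetic data, the 5-fold setting, and the thinning of the alpha grid are all assumptions made for illustration, not the procedure used to produce `pruned_tree_model.joblib`.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption; replace with X_train / y_train).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 5))
y_demo = 2.0 * X_demo[:, 0] - X_demo[:, 1] ** 2 + rng.normal(scale=0.5, size=300)

# Candidate alphas come from the pruning path of a fully grown tree.
full_tree = DecisionTreeRegressor(random_state=0).fit(X_demo, y_demo)
path = full_tree.cost_complexity_pruning_path(X_demo, y_demo)

# Drop the last alpha (it prunes down to the root), guard against tiny
# negative values from floating-point error, and thin the grid for speed.
candidate_alphas = np.unique(np.clip(path.ccp_alphas[:-1], 0.0, None))[::5]

# 5-fold cross-validation over the candidate alphas, scored by (negative) MSE.
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"ccp_alpha": candidate_alphas},
    scoring="neg_mean_squared_error",
    cv=5,
)
search.fit(X_demo, y_demo)

best_alpha = search.best_params_["ccp_alpha"]
pruned_demo_tree = search.best_estimator_  # refit on all of X_demo, y_demo
print(f"Selected ccp_alpha: {best_alpha:.4f}")
print(f"Leaves: {full_tree.get_n_leaves()} -> {pruned_demo_tree.get_n_leaves()}")
```

Larger `ccp_alpha` values penalize each additional leaf more heavily, so the selected value directly controls the size of the final tree.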

## Bagging Analysis
Bagging (bootstrap aggregating) averages the predictions of many trees, each trained on a bootstrap sample of the training data.
Averaging reduces variance and usually improves predictive performance over a single tree.


In [13]:
bagging = load("../models/bagging_model.joblib")
y_pred_bagging = bagging.predict(X_test)

mse_bagging = mean_squared_error(y_test, y_pred_bagging)
mse_bagging




2.5759218064999994
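
To make the mechanism concrete, here is a hand-rolled sketch of bagging: draw bootstrap samples, fit one tree per sample, and average the predictions. It uses synthetic data and an ensemble size of 100, both of which are assumptions; the configuration of the saved `bagging_model.joblib` is not stated in this notebook.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption; replace with the Carseats split).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(400, 5))
y_demo = 2.0 * X_demo[:, 0] - X_demo[:, 1] ** 2 + rng.normal(scale=0.5, size=400)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)

n_trees = 100  # assumption: the saved model's ensemble size is unknown
per_tree_predictions = []
for seed in range(n_trees):
    # Bootstrap sample: draw len(X_tr) row indices with replacement.
    idx = rng.integers(0, len(X_tr), size=len(X_tr))
    boot_tree = DecisionTreeRegressor(random_state=seed).fit(X_tr[idx], y_tr[idx])
    per_tree_predictions.append(boot_tree.predict(X_te))

# The bagged prediction is the average of the individual tree predictions.
per_tree_predictions = np.asarray(per_tree_predictions)
bagged_prediction = per_tree_predictions.mean(axis=0)

mean_individual_mse = np.mean(
    [mean_squared_error(y_te, p) for p in per_tree_predictions]
)
bagged_mse = mean_squared_error(y_te, bagged_prediction)
print(f"Average MSE of individual trees: {mean_individual_mse:.3f}")
print(f"MSE of the bagged prediction:    {bagged_mse:.3f}")
```

Because squared error is convex, the MSE of the averaged prediction can never exceed the average MSE of the individual trees. scikit-learn's `BaggingRegressor` packages the same procedure, and its default base estimator is a decision tree.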

In [14]:
pd.DataFrame({
    "Model": ["Regression Tree", "Pruned Tree", "Bagging"],
    "Test MSE": [mse_tree, mse_pruned, mse_bagging]
})


|   | Model           | Test MSE |
|---|-----------------|----------|
| 0 | Regression Tree | 6.093452 |
| 1 | Pruned Tree     | 4.615269 |
| 2 | Bagging         | 2.575922 |


## Model Performance Analysis

We evaluated three tree-based regression models on the same train–test split using **Mean Squared Error (MSE)** as the primary evaluation metric.  
Lower MSE values indicate better predictive performance.

| Model            | Test MSE |
|------------------|----------|
| Regression Tree  | 6.093452 |
| Pruned Tree      | 4.615269 |
| Bagging          | 2.575922 |
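
For reference, the test MSE in the table above is the average squared difference between the observed sales $y_i$ and the predictions $\hat{y}_i$ over the $n$ test observations (the symbols are introduced here for illustration):

$$
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
$$

This is the quantity returned by `mean_squared_error(y_test, y_pred)`. It is expressed in squared units of `Sales`, so its square root is easier to compare with typical sales values.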

---

## 1. Regression Tree (Baseline)

The **unpruned regression tree** has the highest test MSE (**6.09**) of the three models.

### Interpretation
- Single decision trees are **high-variance** models.
- They tend to overfit the training data by capturing noise.
- As a result, generalization to unseen data is limited.
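
The overfitting behaviour described above can be checked by comparing training and test error of a fully grown tree. The sketch below does this on synthetic data (an assumption made to keep the example self-contained); in this notebook the same comparison could be made with `tree`, `X_train`, and `y_train`.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data with additive noise (assumption, for illustration).
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(400, 5))
y_demo = 2.0 * X_demo[:, 0] - X_demo[:, 1] ** 2 + rng.normal(scale=0.5, size=400)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)

# With default settings the tree grows until every leaf is pure,
# so it reproduces the training targets, including their noise.
deep_tree = DecisionTreeRegressor(random_state=0).fit(X_tr, y_tr)

train_mse = mean_squared_error(y_tr, deep_tree.predict(X_tr))
test_mse = mean_squared_error(y_te, deep_tree.predict(X_te))
print(f"Train MSE: {train_mse:.3f}")
print(f"Test MSE:  {test_mse:.3f}")
```

A training error near zero combined with a clearly larger test error is the typical signature of an unpruned tree that has memorized noise.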

### Conclusion
This model is useful as a **baseline**, but its relatively high test MSE indicates that it is not suitable for reliable prediction on its own.

---

## 2. Pruned Tree (Improved Bias–Variance Tradeoff)

Applying **cost-complexity pruning** reduces the test MSE from **6.09 to 4.62**, representing an improvement of approximately **24%**.

### Interpretation
- Pruning removes splits that do not contribute meaningfully to predictive performance.
- This reduces model variance while slightly increasing bias.
- The net effect is improved generalization.
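
The trade-off in the points above can be stated with the standard bias–variance decomposition. Assuming $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\mathrm{Var}(\varepsilon) = \sigma^2$ (notation introduced here, not taken from the project), the expected squared error of a fitted model $\hat{f}$ at a point $x$ is:

$$
\mathbb{E}\left[ \left( y - \hat{f}(x) \right)^2 \right]
= \underbrace{\left( \mathbb{E}[\hat{f}(x)] - f(x) \right)^2}_{\text{bias}^2}
+ \underbrace{\mathrm{Var}\left[ \hat{f}(x) \right]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{irreducible noise}}
$$

Pruning lowers the variance term at the cost of a somewhat larger bias term. Test error improves whenever the drop in variance outweighs the increase in squared bias.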

### Key Insight
The performance gain is consistent with the original regression tree **overfitting** the training data; pruning helped control model complexity.

### Conclusion
Pruning is an important step when using a single decision tree in practice, but a single pruned tree is still sensitive to the particular training sample, so the variance issue is only partly addressed.

---

## 3. Bagging (Ensemble Learning)

Bagging delivers the strongest performance, reducing test MSE to **2.58**, which corresponds to:
- ~44% improvement over the pruned tree
- ~58% improvement over the unpruned tree
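
The relative improvements quoted in this report can be reproduced from the test MSEs reported earlier. The helper name `relative_improvement` is introduced here for illustration only.

```python
# Test MSEs as reported in this notebook.
test_mse = {
    "Regression Tree": 6.093452,
    "Pruned Tree": 4.615269,
    "Bagging": 2.575922,
}


def relative_improvement(baseline: float, improved: float) -> float:
    """Fractional reduction in MSE relative to the baseline model."""
    return (baseline - improved) / baseline


pruned_vs_tree = relative_improvement(
    test_mse["Regression Tree"], test_mse["Pruned Tree"]
)
bagging_vs_pruned = relative_improvement(test_mse["Pruned Tree"], test_mse["Bagging"])
bagging_vs_tree = relative_improvement(test_mse["Regression Tree"], test_mse["Bagging"])

print(f"Pruned vs. unpruned tree:  {pruned_vs_tree:.1%}")     # 24.3%
print(f"Bagging vs. pruned tree:   {bagging_vs_pruned:.1%}")  # 44.2%
print(f"Bagging vs. unpruned tree: {bagging_vs_tree:.1%}")    # 57.7%
```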

### Interpretation
- Bagging aggregates predictions from multiple trees trained on bootstrapped samples.
- This averaging process significantly reduces variance.
- Decision trees benefit greatly from bagging due to their instability.

### Theoretical Justification
This result aligns with learning theory:
> Ensemble methods improve predictive accuracy by stabilizing high-variance base learners.
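
A standard way to quantify this stabilizing effect: suppose the $B$ trees in the ensemble produce predictions $T_1(x), \dots, T_B(x)$ that are identically distributed with variance $\sigma_T^2$ and pairwise correlation $\rho$ (symbols introduced here as assumptions for illustration). The variance of their average is then:

$$
\mathrm{Var}\left[ \frac{1}{B} \sum_{b=1}^{B} T_b(x) \right]
= \rho \, \sigma_T^2 + \frac{1 - \rho}{B} \, \sigma_T^2
$$

The second term shrinks as $B$ grows, which is the variance reduction bagging provides. The first term does not depend on $B$, so the benefit is limited by how strongly the bootstrap trees are correlated with one another.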

---

## Comparative Summary

| Aspect          | Regression Tree | Pruned Tree     | Bagging  |
|-----------------|-----------------|-----------------|----------|
| Overfitting     | High            | Moderate        | Low      |
| Bias            | Low             | Slightly Higher | Low      |
| Variance        | Very High       | Reduced         | Very Low |
| Generalization  | Poor            | Improved        | Best     |

---

## Final Conclusion

1. **Controlling model complexity is necessary**  
   Pruning improves generalization, but the variance reduction it achieves is smaller than that of bagging.

2. **Variance reduction is the key driver of performance**  
   Bagging effectively addresses the primary weakness of decision trees.

3. **Recommended model**  
   The **Bagging model** is the most suitable choice for deployment: it has the lowest test MSE (**2.58**) of the three models on this train–test split.

---

## Project Takeaway

This experiment demonstrates how moving from a single decision tree to a pruned tree and finally to an ensemble method leads to syste
