# Decision Tree for Regression
In this document, we develop a decision tree regressor using scikit-learn for the California Housing dataset. The task is to predict the median house value of California districts from features such as median income, house age, and population.

## Import Libraries

In [1]:
import numpy as np
from sklearn import datasets
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV       # for hyper-parameter tuning

import warnings
warnings.filterwarnings('ignore')

## Load Dataset

In [2]:
dataset = datasets.fetch_california_housing()
X, y = dataset.data, dataset.target

In [3]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
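As a quick sanity check on the split, the sketch below reproduces the same `train_test_split` call on placeholder arrays. It assumes the dataset has 20,640 rows and 8 features, as stated in the scikit-learn dataset description; the names `X_demo` and `y_demo` are hypothetical, and the zeros stand in for the real data so that nothing needs to be downloaded.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the assumed California Housing shape
# (20,640 rows, 8 features). Only the shapes matter here.
X_demo = np.zeros((20640, 8))
y_demo = np.zeros(20640)

# Same split settings as the cell above: 20% held out, fixed seed.
X_tr_demo, X_te_demo, y_tr_demo, y_te_demo = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

print(X_tr_demo.shape, X_te_demo.shape)
```

With these settings, 80% of the rows go to training and 20% are held out for the final evaluation; the fixed `random_state` makes the split reproducible.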

## Fit Model

In [4]:
param_grid = {
    'criterion'         : ['squared_error', 'friedman_mse'], # function used to measure the quality of a split
    'max_depth'         : [None, 10, 50, 100],               # maximum depth of the tree (None = no depth limit)
    'min_samples_split' : [2, 5, 10],                        # minimum number of samples required to split an internal node
    'min_samples_leaf'  : [1, 2, 4]                          # minimum number of samples required at a leaf node
}

reg = DecisionTreeRegressor(random_state=42)

grid_search = GridSearchCV(
    estimator  = reg,
    param_grid = param_grid,
    cv         = 5,
    scoring    = 'neg_mean_squared_error',
    n_jobs     = -1
)

grid_search.fit(X_train, y_train)
best_parameters = grid_search.best_params_
best_score      = grid_search.best_score_
best_reg        = grid_search.best_estimator_

print("Best Parameters:")
for k, v in best_parameters.items():
    print(f"{k}: {v}")

print(f"Best CV score (negative MSE): {best_score:.4f}")

Best Parameters:
criterion: friedman_mse
max_depth: 10
min_samples_leaf: 4
min_samples_split: 2
Best CV score (negative MSE): -0.3868
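To get a feel for the cost of the search above, the following sketch counts the candidate settings with `ParameterGrid` from scikit-learn. The name `param_grid_demo` is hypothetical, and the sketch assumes the unlimited-depth option is the Python value `None`.

```python
from sklearn.model_selection import ParameterGrid

# Hypothetical copy of the grid, used only for counting candidates.
param_grid_demo = {
    'criterion': ['squared_error', 'friedman_mse'],
    'max_depth': [None, 10, 50, 100],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
}

n_candidates = len(ParameterGrid(param_grid_demo))  # 2 * 4 * 3 * 3 combinations
n_folds = 5                                         # matches cv=5 in the search
n_fits = n_candidates * n_folds                     # one fit per candidate per fold

print(f"{n_candidates} candidates x {n_folds} folds = {n_fits} fits")
```

The search therefore trains 72 candidates on 5 folds each, plus one final refit of the best candidate on the full training set (the default `refit=True` behaviour), which is why `n_jobs=-1` is worthwhile here.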


## Testing

In [5]:
from sklearn.metrics import mean_squared_error

y_pred = best_reg.predict(X_test)
loss   = mean_squared_error(y_test, y_pred)  # scikit-learn convention: (y_true, y_pred)
print(f"Loss: {loss:.4f}")

Loss: 0.4084
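The two numbers reported in this document can be put on a common, more readable scale. The score returned by the grid search (-0.3868) is a negated mean squared error averaged over the cross-validation folds, so flipping its sign gives an MSE that is directly comparable to the test loss (0.4084). The sketch below does that conversion and takes square roots to obtain RMSE values. It assumes the target is expressed in units of $100,000, as described in the scikit-learn dataset documentation; the variable names are hypothetical.

```python
import math

cv_score = -0.3868  # best cross-validated score reported by the grid search (negated MSE)
test_mse = 0.4084   # MSE reported on the held-out test set

cv_mse = -cv_score               # undo the sign flip applied by 'neg_mean_squared_error'
cv_rmse = math.sqrt(cv_mse)      # back to the units of the target
test_rmse = math.sqrt(test_mse)

# Assumption: the target is median house value in units of $100,000.
print(f"CV RMSE:   {cv_rmse:.3f} (about ${cv_rmse * 100_000:,.0f})")
print(f"Test RMSE: {test_rmse:.3f} (about ${test_rmse * 100_000:,.0f})")
```

Read this way, the test error is only slightly higher than the cross-validated error, which suggests the selected tree generalizes about as well as the search estimated.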

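An MSE value is easier to interpret next to a trivial baseline. The sketch below is not part of the original experiment: it uses synthetic data from `make_regression` (so it runs without downloading anything) and compares a depth-limited tree against `DummyRegressor`, which always predicts the training mean. All names and settings in it are illustrative assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression problem standing in for the real dataset.
X_syn, y_syn = make_regression(
    n_samples=500, n_features=5, noise=10.0, random_state=0
)
X_syn_tr, X_syn_te, y_syn_tr, y_syn_te = train_test_split(
    X_syn, y_syn, test_size=0.2, random_state=42
)

# Baseline: always predict the mean of the training targets.
baseline = DummyRegressor(strategy='mean').fit(X_syn_tr, y_syn_tr)

# Depth-limited tree, fitted on the same training data.
tree = DecisionTreeRegressor(max_depth=5, random_state=42).fit(X_syn_tr, y_syn_tr)

baseline_mse = mean_squared_error(y_syn_te, baseline.predict(X_syn_te))
tree_mse = mean_squared_error(y_syn_te, tree.predict(X_syn_te))

print(f"Baseline MSE: {baseline_mse:.2f}")
print(f"Tree MSE:     {tree_mse:.2f}")
```

Running the same comparison with `X_train`, `X_test`, and `best_reg` from this document would show how much of the 0.4084 test loss is an improvement over simply predicting the mean.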