## Regression Tree

A regression tree is a type of decision tree used for regression tasks, where the goal is to predict a continuous target variable rather than a categorical one. In a regression tree, each leaf node represents a prediction for the target variable based on the input features, and the tree structure is learned from the training data by recursively partitioning the feature space into regions that minimize the variance of the target variable within each region.

### How a Regression Tree Works:

1. **Node Splitting**: 
   - Starting from the root node, the algorithm selects the best feature and the best split point to partition the dataset into two subsets.
   - The "best" split is determined based on a criterion that minimizes the variance of the target variable within each subset.
   - This process is repeated recursively for each subset until a stopping criterion is met (e.g., maximum depth, minimum samples per leaf).

2. **Leaf Node Prediction**:
   - Once a stopping criterion is reached, the algorithm assigns a prediction value to each leaf node.
   - The prediction value is typically the mean of the target values in the leaf, which minimizes the squared error. When an absolute-error criterion is used, the median is used instead.

3. **Prediction**:
   - To make predictions for new data points, the algorithm traverses the tree from the root node to a leaf node based on the values of the input features.
   - The predicted value for the target variable is the prediction value associated with the leaf node reached.
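The three steps above can be sketched from scratch with NumPy. This is a simplified illustration, not scikit-learn's implementation. The helper names `best_split`, `build_tree` and `predict` are hypothetical. Candidate thresholds are assumed to be midpoints between consecutive distinct feature values. Only `max_depth`, `min_samples_split` and `min_samples_leaf` are supported as stopping criteria.

```python
import numpy as np


def best_split(X, y, min_samples_leaf=1):
    """Return (feature, threshold, sse) for the split with the lowest total
    squared error in the two children, or None if no valid split exists.

    Minimising the summed SSE of the children is equivalent to minimising the
    size-weighted sum of their variances, since SSE = n_child * variance.
    """
    n_samples, n_features = X.shape
    best = None
    for j in range(n_features):
        values = np.unique(X[:, j])
        # Assumed choice: try midpoints between consecutive distinct values.
        for t in (values[:-1] + values[1:]) / 2:
            left = X[:, j] <= t
            n_left = int(left.sum())
            if n_left < min_samples_leaf or n_samples - n_left < min_samples_leaf:
                continue
            sse = (((y[left] - y[left].mean()) ** 2).sum()
                   + ((y[~left] - y[~left].mean()) ** 2).sum())
            if best is None or sse < best[2]:
                best = (j, t, sse)
    return best


def build_tree(X, y, depth=0, max_depth=3, min_samples_split=2, min_samples_leaf=1):
    """Step 1 (node splitting) and step 2 (leaf prediction = mean target)."""
    if depth >= max_depth or len(y) < min_samples_split:
        return {"value": y.mean()}
    split = best_split(X, y, min_samples_leaf)
    if split is None:  # e.g. every feature is constant in this node
        return {"value": y.mean()}
    j, t, _ = split
    left = X[:, j] <= t
    kwargs = dict(max_depth=max_depth, min_samples_split=min_samples_split,
                  min_samples_leaf=min_samples_leaf)
    return {
        "feature": j,
        "threshold": t,
        "left": build_tree(X[left], y[left], depth + 1, **kwargs),
        "right": build_tree(X[~left], y[~left], depth + 1, **kwargs),
    }


def predict(tree, X):
    """Step 3: route each row from the root to a leaf and return that leaf's value."""
    out = []
    for x in X:
        node = tree
        while "value" not in node:
            node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
        out.append(node["value"])
    return np.array(out)


# Usage on a tiny one-feature dataset with two obvious clusters.
X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y = np.array([5.0, 6.0, 7.0, 20.0, 21.0, 22.0])
tree = build_tree(X, y, max_depth=1)
print(tree["feature"], tree["threshold"])        # 0 6.5
print(predict(tree, np.array([[0.0], [15.0]])))  # [ 6. 21.]
```

The stump splits halfway between the two clusters. Each leaf predicts its cluster's mean.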

### Splitting Criteria:

In regression trees, the splitting criterion is typically based on the reduction in variance achieved by the split. Common splitting criteria include:

1. **Variance Reduction (MSE)**:
   - Measures the reduction in mean squared error (MSE) achieved by splitting the dataset based on a particular feature and split point.
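   - The split that minimizes the weighted sum of the variances of the target variable within each subset is chosen.

2. **Mean Absolute Error (MAE)**:
   - Measures the reduction in the sum of absolute deviations of the target from the median of each subset.
   - It is less sensitive than MSE to outliers in the target, but slower to compute. In scikit-learn it is selected with `criterion="absolute_error"` (named `"mae"` before version 1.0).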
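Take a node with sample set $S$ and a candidate split on feature $j$ at threshold $t$. The variance-reduction criterion can then be written as follows (the notation is introduced here for illustration):

$$
\begin{aligned}
S_L &= \{\, i \in S : x_{ij} \le t \,\}, \qquad S_R = \{\, i \in S : x_{ij} > t \,\},\\
\operatorname{Var}(S) &= \frac{1}{|S|} \sum_{i \in S} \left(y_i - \bar{y}_S\right)^2,\\
\Delta(j, t) &= \operatorname{Var}(S) - \left[\frac{|S_L|}{|S|}\operatorname{Var}(S_L) + \frac{|S_R|}{|S|}\operatorname{Var}(S_R)\right].
\end{aligned}
$$

$\operatorname{Var}(S)$ does not depend on the split. Maximizing $\Delta(j, t)$ is therefore the same as minimizing the bracketed weighted sum of child variances. Here $\bar{y}_S$ is the node mean, which is also the value the node would predict if it became a leaf.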

### Hyperparameters:

Some common hyperparameters for regression trees include:

1. **Max Depth** (`max_depth`): The maximum depth of the tree.
2. **Min Samples Split** (`min_samples_split`): The minimum number of samples required to split an internal node.
3. **Min Samples Leaf** (`min_samples_leaf`): The minimum number of samples required to be at a leaf node.
4. **Max Features** (`max_features`): The number of features to consider when looking for the best split.
5. **Max Leaf Nodes** (`max_leaf_nodes`): The maximum number of leaf nodes in the tree.
6. **Min Impurity Decrease** (`min_impurity_decrease`): A node is split only if the split decreases the impurity by at least this value.
7. **Min Impurity Split** (`min_impurity_split`): A node is split only if its impurity is above this threshold. This parameter was deprecated in scikit-learn 0.19 and removed in version 1.0. Use `min_impurity_decrease` instead.
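In scikit-learn these hyperparameters are keyword arguments of `DecisionTreeRegressor`. The sketch below sets most of them and then inspects the fitted tree against those limits. The synthetic data and the specific values are assumptions chosen only for illustration. `min_impurity_split` is left out because current scikit-learn versions no longer accept it.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data for illustration: the target depends on the first feature only.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 4))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)

tree = DecisionTreeRegressor(
    max_depth=6,                # Max Depth
    min_samples_split=10,       # Min Samples Split
    min_samples_leaf=5,         # Min Samples Leaf
    max_features=None,          # Max Features: None = consider all 4 features
    max_leaf_nodes=20,          # Max Leaf Nodes (tree is grown best-first)
    min_impurity_decrease=0.0,  # Min Impurity Decrease
    random_state=0,
)
tree.fit(X, y)

# Inspect the fitted structure: depth, number of leaves, smallest leaf.
is_leaf = tree.tree_.children_left == -1
print("depth:", tree.get_depth())
print("leaves:", tree.get_n_leaves())
print("smallest leaf:", tree.tree_.n_node_samples[is_leaf].min())
```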

### Advantages of Regression Trees:

- Interpretability: Regression trees are easy to interpret and visualize, making them suitable for exploratory data analysis.
- Non-linearity: They can capture non-linear relationships between the input features and the target variable.
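- Robustness to outliers: Splits depend only on the ordering of feature values. Regression trees are therefore less sensitive than linear regression models to outliers and skew in the input features. Outliers in the target still affect the squared-error criterion and the leaf means.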
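The interpretability claim can be checked directly: scikit-learn's `export_text` prints a fitted tree as nested if/else rules. The single-feature values below are invented for illustration. The column is labelled `RM` only to echo the housing example later in this notebook.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Invented toy values: rooms per dwelling vs. price, for illustration only.
rooms = np.array([[4.0], [5.0], [5.5], [6.0], [6.5], [7.0], [7.5], [8.0]])
price = np.array([12.0, 15.0, 16.0, 21.0, 23.0, 30.0, 33.0, 45.0])

tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(rooms, price)

# One line per node: split thresholds on internal nodes, predicted value on leaves.
rules = export_text(tree, feature_names=["RM"])
print(rules)
```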

### Limitations of Regression Trees:

- Overfitting: They are prone to overfitting, especially when the tree depth is not controlled.
- Instability: Small changes in the data can lead to large changes in the structure of the tree.
- Lack of smoothness: Regression trees produce piecewise constant predictions, which may not capture continuous trends in the data as effectively as other models.
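Both overfitting and piecewise-constant output are easy to see on a noisy one-dimensional curve. The sketch below uses a synthetic sine signal, which is an assumption for illustration. It compares depth-limited trees with an unrestricted one.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic noisy sine curve, for illustration only.
rng = np.random.default_rng(42)
X = rng.uniform(0, 6, size=(300, 1))
y = np.sin(X[:, 0]) + 0.3 * rng.normal(size=300)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for depth in (2, 4, None):  # None: keep splitting until leaves are pure (or too small)
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0).fit(X_train, y_train)
    print(f"max_depth={depth}: train R2={tree.score(X_train, y_train):.3f}, "
          f"test R2={tree.score(X_test, y_test):.3f}, leaves={tree.get_n_leaves()}")

# `tree` is now the unrestricted tree from the last loop iteration.
# Piecewise-constant predictions: 1,000 grid points collapse onto leaf values,
# so there are never more distinct predictions than leaves.
grid = np.linspace(0, 6, 1000).reshape(-1, 1)
n_distinct = len(np.unique(tree.predict(grid)))
print(n_distinct, "distinct predictions from", tree.get_n_leaves(), "leaves")
```

With distinct inputs, the unrestricted tree reproduces the training targets exactly. Its test R² stays lower, which is the overfitting described above.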

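```python
import numpy as np
import pandas as pd
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor
```

```python
# `load_boston` was removed in scikit-learn 1.2. This loader follows the
# replacement snippet from its deprecation message: the raw file stores each
# record across two lines (11 values, then the last 2 features and the target).
data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
target = raw_df.values[1::2, 2]

feature_names = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
                 "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT"]
df = pd.DataFrame(data, columns=feature_names)
df["MEDV"] = target
```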

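```python
df.head()
```

|   | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | MEDV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.09 | 1.0 | 296.0 | 15.3 | 396.9 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8 | 396.9 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 | 242.0 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0.0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3.0 | 222.0 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0.0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3.0 | 222.0 | 18.7 | 396.9 | 5.33 | 36.2 |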


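```python
# Every column except the target is a feature; MEDV (median home value, in $1000s) is the target.
X = df.drop(columns="MEDV")
y = df["MEDV"]
```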

```python
# Hold out 20% of the rows for testing; random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

```python
# criterion="squared_error" (the default) was named "mse" before scikit-learn 1.0.
rt = DecisionTreeRegressor(criterion="squared_error", max_depth=5)
rt.fit(X_train, y_train)
```

```
DecisionTreeRegressor(max_depth=5)
```

```python
y_pred = rt.predict(X_test)
r2_score(y_test, y_pred)
```

```
0.8833565347917997
```
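`r2_score` reports the coefficient of determination, $R^2 = 1 - SS_{res}/SS_{tot}$. A value of 1.0 is a perfect fit, and 0.0 matches always predicting the mean of `y_test`. The check below compares that formula with scikit-learn on small made-up arrays.

```python
import numpy as np
from sklearn.metrics import r2_score

# Made-up targets and predictions, for illustration only.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
manual_r2 = 1 - ss_res / ss_tot

print(manual_r2, r2_score(y_true, y_pred))  # both about 0.9486
```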

### Hyperparameter Tuning:

```python
# GridSearchCV tries every combination (5 x 2 x 3 x 3 = 90), each with
# 5-fold cross-validation by default. Float values for max_features and
# min_samples_split are fractions of the features and of the samples.
param_grid = {
    "max_depth": [2, 4, 8, 10, None],
    "criterion": ["squared_error", "absolute_error"],  # "mse" / "mae" before scikit-learn 1.0
    "max_features": [0.25, 0.5, 1.0],
    "min_samples_split": [0.25, 0.5, 1.0],
}

reg = GridSearchCV(DecisionTreeRegressor(), param_grid=param_grid)
reg.fit(X_train, y_train)
```

```python
reg.best_score_
```

```
0.6452352174104019
```

`best_score_` is the mean cross-validated R² of the best parameter combination, computed on folds of `X_train`. It is therefore not directly comparable with the test-set R² computed for `rt` above.

```python
reg.best_params_
```

```
{'criterion': 'squared_error',
 'max_depth': None,
 'max_features': 0.5,
 'min_samples_split': 0.25}
```
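A common follow-up is to score the refitted `best_estimator_` on the held-out test set, as was done for `rt`. The self-contained sketch below shows that pattern. It uses synthetic data from `make_regression` as a stand-in for the housing data, and its smaller grid is also illustrative, not the one above.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data, for illustration only.
X, y = make_regression(n_samples=400, n_features=8, n_informative=4,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

param_grid = {
    "max_depth": [2, 4, 8, None],
    "min_samples_leaf": [1, 5, 10],
}
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid=param_grid,
    cv=5,          # 5-fold cross-validation on the training split only
    scoring="r2",  # same metric as r2_score
)
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print("mean CV R2 :", round(search.best_score_, 3))
# refit=True (the default) retrains the best configuration on all of X_train,
# so the search object can be scored directly on the untouched test split.
print("test R2    :", round(search.score(X_test, y_test), 3))
```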

### Feature Importance:

```python
# Impurity-based importances of the max_depth=5 tree `rt`, from most to least important.
for importance, name in sorted(zip(rt.feature_importances_, X_train.columns), reverse=True):
    print(name, importance)
```

```
RM 0.6344993240692652
LSTAT 0.19426427075925173
CRIM 0.07395590730917082
DIS 0.06744514557703153
B 0.011905660139828182
AGE 0.006176126174365511
PTRATIO 0.004391097507128497
NOX 0.0035610403857026535
INDUS 0.002627468726682041
RAD 0.0011739593515739223
ZN 0.0
TAX 0.0
CHAS 0.0
```
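The same ranking can be built as a sorted pandas Series, which is easier to plot or filter. The sketch below assumes synthetic data in which only `x0` and `x1` drive the target, so those two features should dominate. Impurity-based importances are normalised to sum to 1.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: only x0 and x1 influence the target (assumption for illustration).
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(500, 4)), columns=["x0", "x1", "x2", "x3"])
y = 3 * X["x0"] + np.where(X["x1"] > 0, 2.0, -2.0) + 0.1 * rng.normal(size=500)

tree = DecisionTreeRegressor(max_depth=5, random_state=0).fit(X, y)

# Importances indexed by feature name, sorted from most to least important.
importances = pd.Series(tree.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importances)
print("sum:", importances.sum())
```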
