# Bagged Trees Regression

* Bagged trees combine the predictions of several decision trees. Each tree is trained on a different bootstrap sample of the training data (rows drawn with replacement), and the individual predictions are averaged.
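
To make the idea concrete, here is a minimal hand-rolled sketch of bagging. It uses synthetic data and hypothetical names (`X_demo`, `y_demo`, `trees`), not the Hitters data used below, and it is not how `BaggingRegressor` is implemented internally; it only illustrates the bootstrap-then-average principle.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic data, assumed purely for illustration
X_demo = rng.uniform(-3, 3, size=(200, 1))
y_demo = np.sin(X_demo[:, 0]) + rng.normal(scale=0.3, size=200)

n_trees = 10
trees = []
for _ in range(n_trees):
    # Bootstrap sample: draw as many row indices as there are rows, with replacement
    idx = rng.integers(0, len(X_demo), size=len(X_demo))
    tree = DecisionTreeRegressor(random_state=0).fit(X_demo[idx], y_demo[idx])
    trees.append(tree)

# Bagged prediction: average the predictions of the individual trees
all_preds = np.stack([t.predict(X_demo) for t in trees])
bagged_pred = all_preds.mean(axis=0)
print(bagged_pred[:5])
```

Averaging over trees that saw different samples tends to reduce the variance of a single deep tree, which is the motivation for the method.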

## 1-) Data Preprocessing

In [2]:
import numpy as np
import pandas as pd 
from sklearn.model_selection import train_test_split

In [5]:
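hit = pd.read_csv("Hitters.csv")
df = hit.copy()
df = df.dropna()  # drop rows with missing values (e.g. missing Salary)
# One-hot encode the categorical columns
dms = pd.get_dummies(df[['League', 'Division', 'NewLeague']])
y = df["Salary"]  # target variable
# Numeric predictors only
X_ = df.drop(['Salary', 'League', 'Division', 'NewLeague'], axis=1).astype('float64')
# Keep one dummy column per categorical variable to avoid redundant columns
X = pd.concat([X_, dms[['League_N', 'Division_W', 'NewLeague_N']]], axis=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                    test_size=0.25, 
                                                    random_state=42)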
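
The cell above keeps one dummy column per categorical variable by selecting the columns manually. As a small sketch on a toy frame (the `toy` data is an assumption for illustration, not taken from `Hitters.csv`), `pd.get_dummies` can do the same selection with `drop_first=True`:

```python
import pandas as pd

# Toy data, assumed for illustration only
toy = pd.DataFrame({"League": ["A", "N", "A"], "Division": ["E", "W", "W"]})

# drop_first=True drops the first category of each variable,
# leaving one indicator column per two-level variable
dummies = pd.get_dummies(toy, drop_first=True)
print(list(dummies.columns))
```

Selecting the columns by name, as the notebook does, makes the kept category explicit; `drop_first=True` is shorter but always drops the alphabetically first level.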

## 2-) Model

In [6]:
from sklearn.ensemble import BaggingRegressor

In [7]:
bag_model = BaggingRegressor(bootstrap_features = True)
bag_model.fit(X_train, y_train)

BaggingRegressor(bootstrap_features=True)

In [9]:
bag_model.n_estimators

# n_estimators = 10 (the default) means the ensemble is built from 10 different trees.

10

In [10]:
bag_model.estimators_


[DecisionTreeRegressor(random_state=1637822621),
 DecisionTreeRegressor(random_state=732526238),
 DecisionTreeRegressor(random_state=217250275),
 DecisionTreeRegressor(random_state=1142408285),
 DecisionTreeRegressor(random_state=1458852853),
 DecisionTreeRegressor(random_state=99234303),
 DecisionTreeRegressor(random_state=1865253721),
 DecisionTreeRegressor(random_state=1026835606),
 DecisionTreeRegressor(random_state=759358013),
 DecisionTreeRegressor(random_state=1479874635)]

* The output above lists the 10 fitted DecisionTreeRegressor models.

* Each of these 10 trees is fitted on its own bootstrap sample (a different set of row indices), and each is given a different random_state.

In [12]:
bag_model.estimators_[1]

DecisionTreeRegressor(random_state=732526238)

In [13]:
bag_model.estimators_[2]

DecisionTreeRegressor(random_state=217250275)
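
To see which rows and columns each tree was actually given, the fitted ensemble exposes `estimators_samples_` and `estimators_features_`. The sketch below uses synthetic data from `make_regression` and a hypothetical `demo_bag` model rather than the notebook's `bag_model`, so the printed indices will differ from anything above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

# Synthetic data, assumed for illustration
X_demo, y_demo = make_regression(n_samples=100, n_features=6, noise=10.0, random_state=0)

demo_bag = BaggingRegressor(n_estimators=3, bootstrap_features=True, random_state=0)
demo_bag.fit(X_demo, y_demo)

# One entry per tree: the row indices drawn and the feature columns drawn
for i, (rows, cols) in enumerate(zip(demo_bag.estimators_samples_,
                                     demo_bag.estimators_features_)):
    print(f"tree {i}: {len(np.unique(rows))} unique rows, feature columns {cols}")
```

With `bootstrap_features=True` the columns are also drawn with replacement, so a tree may see some features more than once and others not at all.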

## 3-) Prediction

In [15]:
from sklearn.metrics import mean_squared_error

In [16]:
y_pred = bag_model.predict(X_test)

In [17]:
test_error_before = np.sqrt(mean_squared_error(y_test, y_pred))
test_error_before  # test RMSE before model tuning

366.9114623177455

### 3-1) Prediction using the 2nd DecisionTreeRegressor

* We take the 2nd of the 10 trees in the ensemble. Because indexing starts at 0, the 2nd tree has index 1.

* We want to see the test error produced by this single tree.

* We will compare this value with the test error of the full model that uses all 10 trees.

In [43]:
second_tree_model=bag_model.estimators_[1]

In [44]:
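# Note: refitting on the full training set replaces the bootstrap sample and
# feature subset this tree was originally trained on inside the ensemble.
second_tree_model.fit(X_train, y_train)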

DecisionTreeRegressor(random_state=732526238)

In [45]:
y_pred_second=second_tree_model.predict(X_test)

In [46]:
test_error_second_tree = np.sqrt(mean_squared_error(y_test, y_pred_second))
test_error_second_tree  # test RMSE of the single tree

434.61636089466094
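
The cell above refits the tree before predicting. As an alternative sketch, a tree can be evaluated exactly as it exists inside the ensemble by passing it only the feature columns it was trained on, taken from `estimators_features_`. The data and the names `demo_bag`, `Xtr`, `Xte` are assumptions for illustration; the numbers will not match the Hitters results.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data, assumed for illustration
X_demo, y_demo = make_regression(n_samples=300, n_features=8, noise=20.0, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X_demo, y_demo, test_size=0.25, random_state=42)

demo_bag = BaggingRegressor(n_estimators=10, bootstrap_features=True, random_state=0)
demo_bag.fit(Xtr, ytr)

# The 2nd tree and the feature columns it was fitted on (no refit)
tree = demo_bag.estimators_[1]
cols = demo_bag.estimators_features_[1]
single_pred = tree.predict(Xte[:, cols])

rmse_single = np.sqrt(mean_squared_error(yte, single_pred))
rmse_bag = np.sqrt(mean_squared_error(yte, demo_bag.predict(Xte)))
print(rmse_single, rmse_bag)
```

This keeps the comparison between one member of the ensemble and the ensemble itself, rather than between the ensemble and a freshly trained standalone tree.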

## 4-) Model Tuning

* In this section, we will try to determine the optimum **n_estimators** with the GridSearchCV method.

* GridSearchCV: grid search with cross-validation. It fits the model for every candidate value and keeps the one with the best cross-validated score.

* Then we will build the final model using the optimum **n_estimators**.

* **n_estimators** is a hyperparameter: it is not learned from the data, so we have to choose it ourselves.

* Instead of relying on intuition to pick this value, we will let the grid search find it.


In [51]:
from sklearn.model_selection import GridSearchCV

In [52]:
bag_model = BaggingRegressor(bootstrap_features = True)
bag_model.fit(X_train, y_train)

BaggingRegressor(bootstrap_features=True)

In [53]:
bag_params = {"n_estimators": range(2,20)}

In [54]:
bag_cv_model = GridSearchCV(bag_model, bag_params, cv = 10)

In [55]:
bag_cv_model.fit(X_train, y_train)

GridSearchCV(cv=10, estimator=BaggingRegressor(bootstrap_features=True),
             param_grid={'n_estimators': range(2, 20)})

In [56]:
bag_cv_model.best_params_

{'n_estimators': 16}
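
For regressors, `GridSearchCV` ranks candidates by R² unless a `scoring` argument is given. Since this notebook reports RMSE, one option is to search on RMSE directly. The sketch below assumes synthetic data, a shortened grid, and a fixed `random_state` so that the search is repeatable; none of these settings come from the notebook.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic data, assumed for illustration
X_demo, y_demo = make_regression(n_samples=200, n_features=8, noise=20.0, random_state=0)

search = GridSearchCV(
    BaggingRegressor(bootstrap_features=True, random_state=0),
    param_grid={"n_estimators": [5, 10, 20]},
    cv=5,
    scoring="neg_root_mean_squared_error",  # higher is better, so RMSE is negated
)
search.fit(X_demo, y_demo)

best_n = search.best_params_["n_estimators"]
best_cv_rmse = -search.best_score_  # flip the sign back to a plain RMSE
print(best_n, best_cv_rmse)
```

Without a fixed `random_state` on the estimator, repeated runs of the search can return different best values, because each fit draws new bootstrap samples.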

### 4-1) Tuned Model

In [57]:
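# Note: bootstrap_features is left at its default (False) here,
# unlike the model used in the grid search above.
bag_tuned = BaggingRegressor(n_estimators=16, random_state=45)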

In [58]:
bag_tuned.fit(X_train, y_train)

BaggingRegressor(n_estimators=16, random_state=45)

In [59]:
y_pred2= bag_tuned.predict(X_test)

In [60]:
test_error_after = np.sqrt(mean_squared_error(y_test, y_pred2))
test_error_after  # test RMSE after model tuning

344.6182842519663