# Exercise M6.01

The aim of this notebook is to investigate whether tuning the hyperparameters of a bagging regressor improves its generalization performance, and to evaluate the gain obtained.

In [3]:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

data, target = fetch_california_housing(as_frame=True, return_X_y=True)
target *= 100
data_train, data_test, target_train, target_test = train_test_split(
    data, target, random_state=0, test_size=0.5
)

Create a `BaggingRegressor` and provide a `DecisionTreeRegressor` to its parameter `base_estimator`. Train the regressor and evaluate its generalization performance on the testing set using the mean absolute error.

In [4]:
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import BaggingRegressor

In [6]:
bagged_trees = BaggingRegressor(
    base_estimator=DecisionTreeRegressor(), n_jobs=2
)
_ = bagged_trees.fit(data_train, target_train)
target_predicted = bagged_trees.predict(data_test)

In [7]:
from sklearn.metrics import mean_absolute_error
print(f"Basic mean absolute error of the bagging regressor:\n"
      f"{mean_absolute_error(target_test, target_predicted):.2f} k$")

Basic mean absolute error of the bagging regressor:
36.15 k$
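
As a reminder of what this metric measures, the mean absolute error is the average of the absolute differences between true and predicted targets. The snippet below is a minimal sketch on hypothetical prices (the three values are made up for illustration and are not taken from the housing dataset):

```python
import numpy as np

# Hypothetical prices in k$, chosen only to illustrate the computation.
y_true = np.array([150.0, 200.0, 350.0])
y_pred = np.array([140.0, 220.0, 350.0])

# Mean absolute error: average of the absolute differences (10, 20 and 0 here).
mae = np.mean(np.abs(y_true - y_pred))
print(f"{mae:.2f} k$")  # prints "10.00 k$"
```

Because the target was multiplied by 100 at the start of the notebook, the error reads directly in thousands of dollars.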


Now, create a `RandomizedSearchCV` instance using the previous model and tune the important parameters of the bagging regressor. Find the best parameters and check whether you can find a set of parameters that improves on the default regressor, still using the mean absolute error as the metric.

In [8]:
bagged_trees.get_params()

{'base_estimator__ccp_alpha': 0.0,
 'base_estimator__criterion': 'squared_error',
 'base_estimator__max_depth': None,
 'base_estimator__max_features': None,
 'base_estimator__max_leaf_nodes': None,
 'base_estimator__min_impurity_decrease': 0.0,
 'base_estimator__min_samples_leaf': 1,
 'base_estimator__min_samples_split': 2,
 'base_estimator__min_weight_fraction_leaf': 0.0,
 'base_estimator__random_state': None,
 'base_estimator__splitter': 'best',
 'base_estimator': DecisionTreeRegressor(),
 'bootstrap': True,
 'bootstrap_features': False,
 'max_features': 1.0,
 'max_samples': 1.0,
 'n_estimators': 10,
 'n_jobs': 2,
 'oob_score': False,
 'random_state': None,
 'verbose': 0,
 'warm_start': False}
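
The double underscore in names such as `base_estimator__max_depth` is how scikit-learn addresses a parameter of a nested estimator. The sketch below shows that such a name can be passed to `set_params`. It assumes that the inner estimator is exposed as `estimator` in recent scikit-learn releases and as `base_estimator` in older ones such as the one used in this notebook, so the prefix is detected at runtime rather than hard-coded:

```python
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

# Pass the tree positionally so the snippet does not depend on the keyword name.
model = BaggingRegressor(DecisionTreeRegressor(), n_estimators=5)

# Assumption: the inner estimator is called "estimator" in recent releases
# and "base_estimator" in older ones; pick whichever this installation uses.
params = model.get_params()
prefix = "estimator" if "estimator" in params else "base_estimator"

# "<prefix>__max_depth" targets the tree; "max_samples" targets the ensemble.
model.set_params(**{f"{prefix}__max_depth": 4, "max_samples": 0.8})

inner_tree = model.get_params()[prefix]
print(prefix, inner_tree.max_depth, model.max_samples)
```

The same naming scheme is what allows a search object to tune the tree depth and the ensemble size in a single parameter dictionary.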

In [9]:
from scipy.stats import randint
from sklearn.model_selection import RandomizedSearchCV

In [17]:
param_grid = {
    "n_estimators": randint(10, 50),
    "max_samples": [0.5, 0.8, 1.0],
    "max_features": [0.5, 0.8, 1.0],
    "base_estimator__max_depth": randint(3, 10),
}
search = RandomizedSearchCV(
    bagged_trees, param_grid, n_iter=30, scoring="neg_mean_absolute_error"
)
_ = search.fit(data_train, target_train)
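
To get a feel for what the search explores, the sketch below draws a few candidates from the same distributions with `ParameterSampler`, the scikit-learn utility for sampling parameter settings from distributions. The seed and the number of draws are arbitrary choices for illustration. Note that the upper bound of `scipy.stats.randint` is exclusive, so `randint(3, 10)` only yields depths from 3 to 9:

```python
from scipy.stats import randint
from sklearn.model_selection import ParameterSampler

# Same search space as in the cell above.
param_distributions = {
    "n_estimators": randint(10, 50),
    "max_samples": [0.5, 0.8, 1.0],
    "max_features": [0.5, 0.8, 1.0],
    "base_estimator__max_depth": randint(3, 10),
}

# Draw three illustrative candidates; random_state is an arbitrary seed.
candidates = list(ParameterSampler(param_distributions, n_iter=3, random_state=0))
for candidate in candidates:
    print(candidate)

# The upper bound of randint is exclusive: sampled depths stay within 3..9.
depths = randint(3, 10).rvs(size=1000, random_state=0)
print(depths.min(), depths.max())
```

Lists are sampled uniformly, while `randint` objects are sampled through their `rvs` method.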

In [18]:
import pandas as pd

columns = [f"param_{name}" for name in param_grid.keys()]
columns += ["mean_test_score", "std_test_score", "rank_test_score"]
cv_results = pd.DataFrame(search.cv_results_)
cv_results = cv_results[columns].sort_values(by="rank_test_score")
cv_results["mean_test_score"] = -cv_results["mean_test_score"]
cv_results

| | param_n_estimators | param_max_samples | param_max_features | param_base_estimator__max_depth | mean_test_score | std_test_score | rank_test_score |
|---|---|---|---|---|---|---|---|
| 18 | 28 | 1.0 | 0.8 | 9 | 38.306286 | 1.188887 | 1 |
| 26 | 47 | 0.8 | 0.8 | 9 | 38.666826 | 1.489754 | 2 |
| 9 | 11 | 0.8 | 0.8 | 9 | 39.760116 | 0.61854 | 3 |
| 27 | 28 | 0.8 | 1.0 | 8 | 40.739568 | 1.120723 | 4 |
| 12 | 33 | 0.5 | 1.0 | 8 | 40.815774 | 0.857368 | 5 |
| 29 | 15 | 0.5 | 0.8 | 8 | 41.30931 | 0.982196 | 6 |
| 22 | 41 | 0.5 | 0.8 | 7 | 42.418271 | 1.132872 | 7 |
| 8 | 34 | 0.8 | 1.0 | 7 | 43.016294 | 1.130351 | 8 |
| 6 | 41 | 1.0 | 0.5 | 9 | 45.307578 | 1.644684 | 9 |
| 3 | 47 | 1.0 | 0.5 | 8 | 45.827778 | 1.798991 | 10 |
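
The sign of `mean_test_score` was flipped before display because scikit-learn scorers follow a "higher is better" convention, so `"neg_mean_absolute_error"` reports the negated error. The sketch below illustrates this on synthetic data with a `DummyRegressor`. Both the data and the model are assumptions made for illustration and are unrelated to the housing experiment:

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.model_selection import cross_val_score

# Synthetic regression problem, used only to show the sign convention.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 10 * X[:, 0] + rng.normal(size=100)

# The scorer returns the negated mean absolute error for each fold.
neg_mae = cross_val_score(
    DummyRegressor(strategy="mean"), X, y, cv=5,
    scoring="neg_mean_absolute_error",
)

# Negate again to read the scores as ordinary (non-negative) errors.
mae = -neg_mae
print(neg_mae)
print(mae)
```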


In [19]:
target_predicted = search.predict(data_test)
print(f"Mean absolute error after tuning of the bagging regressor:\n"
      f"{mean_absolute_error(target_test, target_predicted):.2f} k$")

Mean absolute error after tuning of the bagging regressor:
38.95 k$


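With this search space, tuning did not help: the best candidate found by the randomized search reaches a mean absolute error of 38.95 k$ on the testing set, against 36.15 k$ for the default bagging regressor. One plausible reason is that `randint(3, 10)` limits `max_depth` to at most 9, whereas the default trees are grown without a depth limit; the cross-validation results above also rank the deepest trees first. More generally, we see that the predictor provided by the bagging regressor does not need much hyperparameter tuning compared to a single decision tree.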
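
To put the two test scores reported in this notebook side by side, the short computation below derives the absolute and relative change. It only reuses the two printed values (36.15 k$ and 38.95 k$); the variable names are illustrative:

```python
# Test-set mean absolute errors printed earlier in the notebook, in k$.
mae_default = 36.15
mae_tuned = 38.95

# Positive values mean the tuned model has a larger error than the default one.
absolute_change = mae_tuned - mae_default
relative_change = 100 * absolute_change / mae_default

print(f"Absolute change: {absolute_change:.2f} k$")
print(f"Relative change: {relative_change:.1f} %")
```

Since neither the bagging regressor nor the randomized search fixes a `random_state`, these figures are expected to vary somewhat from one run to the next.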