# 📝 Exercise M6.01

The aim of this notebook is to tune the hyperparameters of a bagging regressor
and evaluate the resulting gain in generalization performance.

We will load the California housing dataset and split it into a training and a
testing set.

In [1]:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

data, target = fetch_california_housing(as_frame=True, return_X_y=True)
target *= 100  # rescale the target from units of 100 k$ to k$
data_train, data_test, target_train, target_test = train_test_split(
    data, target, random_state=0, test_size=0.5
)

<div class="admonition note alert alert-info">
<p class="first admonition-title" style="font-weight: bold;">Note</p>
<p class="last">If you want a deeper overview regarding this dataset, you can refer to the
Appendix - Datasets description section at the end of this MOOC.</p>
</div>

Create a `BaggingRegressor` and provide a `DecisionTreeRegressor` to its
parameter `estimator`. Train the regressor and evaluate its generalization
performance on the testing set using the mean absolute error.

In [11]:
# Write your code here.
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import BaggingRegressor
from sklearn.metrics import mean_absolute_error

tree = DecisionTreeRegressor()
bagged_trees = BaggingRegressor(estimator=tree, n_jobs=2)

_ = bagged_trees.fit(data_train, target_train)

# scikit-learn metrics expect (y_true, y_pred) in that order
MAE = mean_absolute_error(target_test, bagged_trees.predict(data_test))
print(f"Basic MAE value of the bagging regressor is: {MAE:.2f} k$")

Basic MAE value of the bagging regressor is: 36.38 k$


Now, create a `RandomizedSearchCV` instance using the previous model and tune
the important parameters of the bagging regressor. Find the best parameters
and check whether any set of parameters improves on the default regressor,
still using the mean absolute error as the metric.

<div class="admonition tip alert alert-warning">
<p class="first admonition-title" style="font-weight: bold;">Tip</p>
<p class="last">You can list the bagging regressor's parameters using the <tt class="docutils literal">get_params</tt> method.</p>
</div>

In [5]:
# Write your code here.
from sklearn.model_selection import RandomizedSearchCV

param_grid = {
    "n_estimators": range(10, 100),
    "estimator__max_depth": range(1, 20),
}
search = RandomizedSearchCV(
    bagged_trees,
    param_grid,
    n_iter=20,
    scoring="neg_mean_absolute_error",
    n_jobs=2,
)
_ = search.fit(data_train, target_train)


In [8]:
import pandas as pd

cv_results = pd.DataFrame(search.cv_results_)

cv_results

Unnamed: 0,mean_fit_time,std_fit_time,mean_score_time,std_score_time,param_n_estimators,param_estimator__max_depth,params,split0_test_score,split1_test_score,split2_test_score,split3_test_score,split4_test_score,mean_test_score,std_test_score,rank_test_score
0,0.423446,0.009556,0.023815,0.000719,75,1,"{'n_estimators': 75, 'estimator__max_depth': 1}",-74.497693,-74.580435,-71.649896,-71.927202,-71.582975,-72.84764,1.386105,20
1,7.986447,0.551773,0.26766,0.011809,81,19,"{'n_estimators': 81, 'estimator__max_depth': 19}",-35.546366,-35.559777,-34.217447,-36.038984,-33.504505,-34.973416,0.952545,2
2,1.395361,0.025621,0.026754,0.003706,18,12,"{'n_estimators': 18, 'estimator__max_depth': 12}",-36.612964,-38.428236,-35.511539,-38.076615,-35.15454,-36.756779,1.317134,7
3,6.359766,0.179435,0.184522,0.004019,59,18,"{'n_estimators': 59, 'estimator__max_depth': 18}",-35.982081,-35.990069,-34.054921,-36.138161,-33.40556,-35.114159,1.149816,3
4,9.39793,0.246299,0.299826,0.020626,86,19,"{'n_estimators': 86, 'estimator__max_depth': 19}",-35.776681,-35.576009,-34.090774,-35.818553,-33.513591,-34.955121,0.962402,1
5,0.418329,0.033814,0.022628,0.000274,34,2,"{'n_estimators': 34, 'estimator__max_depth': 2}",-65.244389,-64.263214,-63.746445,-63.666726,-62.790052,-63.942165,0.805034,16
6,1.054188,0.021994,0.037269,0.000767,87,2,"{'n_estimators': 87, 'estimator__max_depth': 2}",-64.972777,-64.475263,-63.812095,-63.585176,-62.923948,-63.953852,0.710886,17
7,5.33862,0.032412,0.069973,0.001448,85,10,"{'n_estimators': 85, 'estimator__max_depth': 10}",-38.134089,-38.550209,-36.749221,-38.914431,-36.07584,-37.684758,1.088747,8
8,4.248593,0.326697,0.049984,0.009469,81,8,"{'n_estimators': 81, 'estimator__max_depth': 8}",-40.944072,-41.583052,-39.934313,-41.919803,-38.854086,-40.647065,1.123115,10
9,0.80281,0.025403,0.02437,0.001454,67,2,"{'n_estimators': 67, 'estimator__max_depth': 2}",-65.233113,-64.69294,-63.858356,-63.777225,-62.820398,-64.076406,0.828577,18
