In this colab, we will apply ensemble techniques to a regression problem on the California housing dataset.

We have already applied different regressors to the California housing dataset. In this colab, we will make use of the following boosting regressors:
* AdaBoost regressor
* Gradient boosting regressor
* XGBoost regressor

In [None]:
import pandas as pd
import numpy as np

from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import AdaBoostRegressor
from sklearn.ensemble import GradientBoostingRegressor

from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import cross_validate
from sklearn.model_selection import train_test_split
from sklearn.model_selection import RandomizedSearchCV
from sklearn.model_selection import ShuffleSplit

In [None]:
np.random.seed(306)

Let's use `ShuffleSplit` as the cross-validation strategy, with 10 splits and 20% of the examples set aside as test examples in each split.

In [None]:
cv = ShuffleSplit(n_splits=10, test_size=0.2, random_state=42)
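
To see what this splitter does, here is a minimal sketch that applies the same settings to a toy array of 50 samples. The names `toy_X` and `toy_cv` are made up for illustration and are not part of the original colab. Each iteration reshuffles the data and holds out 20% of it, so the held-out sets of different iterations may overlap.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit

# Hypothetical toy data: 50 samples with a single feature.
toy_X = np.arange(50).reshape(-1, 1)
toy_cv = ShuffleSplit(n_splits=10, test_size=0.2, random_state=42)

# Record the size of every (train, held-out) index pair.
split_sizes = [(len(train_idx), len(test_idx))
               for train_idx, test_idx in toy_cv.split(toy_X)]

print(len(split_sizes))  # number of splits generated
print(split_sizes[0])    # (train size, held-out size) of the first split
```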

Let's download the data and split it into training and test sets.

In [None]:
# fetch dataset
features, labels = fetch_california_housing(as_frame=True, return_X_y=True)
# the target is expressed in units of $100,000; rescale it to thousands of dollars
labels *= 100

# train-test split
com_train_features, test_features, com_train_labels, test_labels = train_test_split(
    features, labels, random_state=42)

# train --> train + dev split
train_features, dev_features, train_labels, dev_labels = train_test_split(
    com_train_features, com_train_labels, random_state=42)
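
Because no `test_size` is passed, `train_test_split` falls back to its default of holding out 25% of the rows. The sketch below reproduces the two-stage split on a plain index array of 20,640 entries (the number of rows in the California housing dataset) to show the resulting set sizes. The `*_idx` names are hypothetical stand-ins for the real feature frames.

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_samples = 20640               # rows in the California housing dataset
indices = np.arange(n_samples)  # stand-in for the real feature rows

# First split: combined training set vs. test set (default test_size=0.25).
com_train_idx, test_idx = train_test_split(indices, random_state=42)

# Second split: training set vs. dev set, again with the default 25%.
train_idx, dev_idx = train_test_split(com_train_idx, random_state=42)

print(len(com_train_idx), len(test_idx))
print(len(train_idx), len(dev_idx))
```

Under these assumptions, the combined training set keeps 15,480 rows and the test set 5,160; the second split leaves 11,610 rows for training and 3,870 for the dev set.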

## Training different regressors

Let's train different regressors:

In [None]:
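def train_regressor(estimator, X_train, y_train, cv, name):
  """Cross-validate `estimator` and report its mean absolute error.

  The "test set" in the printed summary refers to the held-out folds of `cv`.
  """
  cv_results = cross_validate(estimator,
                              X_train,
                              y_train,
                              cv=cv,
                              scoring="neg_mean_absolute_error",
                              return_train_score=True,
                              return_estimator=True)

  # Scores are negated MAE values, so flip the sign to get errors.
  cv_train_error = -1 * cv_results['train_score']
  cv_test_error = -1 * cv_results['test_score']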

  print(f"On an average, {name} makes an error of "
        f"{cv_train_error.mean():.3f}k +/- {cv_train_error.std():.3f}k on the training set.")
  print(f"On an average, {name} makes an error of "
        f"{cv_test_error.mean():.3f}k +/- {cv_test_error.std():.3f}k on the test set.")
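
The sign flip in the helper follows from scikit-learn's convention that larger scores are better, so `"neg_mean_absolute_error"` returns the MAE multiplied by -1. The sketch below illustrates this on made-up random data with a `DummyRegressor`. The data, the `toy_*` names, and the choice of estimator are assumptions for illustration only.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.model_selection import ShuffleSplit, cross_validate

rng = np.random.RandomState(0)
toy_X = rng.rand(100, 3)     # hypothetical features
toy_y = rng.rand(100) * 100  # hypothetical targets

toy_results = cross_validate(
    DummyRegressor(strategy="mean"), toy_X, toy_y,
    cv=ShuffleSplit(n_splits=5, test_size=0.2, random_state=42),
    scoring="neg_mean_absolute_error",
    return_train_score=True)

# Raw scores are non-positive; negating them recovers the usual MAE.
toy_test_error = -toy_results["test_score"]
print(toy_results["test_score"])
print(toy_test_error.mean(), toy_test_error.std())
```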

In [None]:
#@title AdaBoostRegressor
train_regressor(
    AdaBoostRegressor(), com_train_features,
    com_train_labels, cv, 'AdaBoostRegressor')

On an average, AdaBoostRegressor makes an error of 73.263k +/- 6.031k on the training set.
On an average, AdaBoostRegressor makes an error of 73.623k +/- 6.057k on the test set.


In [None]:
#@title GradientBoostingRegressor
train_regressor(
    GradientBoostingRegressor(), com_train_features, com_train_labels, cv,
    'GradientBoostingRegressor')

On an average, GradientBoostingRegressor makes an error of 35.394k +/- 0.273k on the training set.
On an average, GradientBoostingRegressor makes an error of 36.773k +/- 0.723k on the test set.


In [None]:
?GradientBoostingRegressor


## XGBoost

In [None]:
!pip install xgboost



Extreme gradient boosting (XGBoost) is an optimized implementation of gradient boosting. It adds explicit regularization to the gradient boosting objective, which can help it generalize better than unregularized gradient boosting.
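
For reference, the regularized objective can be sketched as follows. The notation is introduced here and does not appear in the original colab: $l$ is the training loss, $f_k$ is the $k$-th of $K$ trees, $T$ is the number of leaves in a tree, $w$ is its vector of leaf weights, and $\gamma$, $\lambda$, $\alpha$ are penalty strengths (the last two correspond to the L2 and L1 terms).

$$
\mathcal{L} = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k),
\qquad
\Omega(f) = \gamma T + \frac{1}{2}\lambda \lVert w \rVert_2^2 + \alpha \lVert w \rVert_1
$$

Setting $\gamma = \lambda = \alpha = 0$ removes the penalty and leaves only the training loss.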

In [None]:
from xgboost import XGBRegressor
xgb_regressor = XGBRegressor(objective='reg:squarederror')

In [None]:
?XGBRegressor

In [None]:
train_regressor(
    xgb_regressor, com_train_features,
    com_train_labels, cv, 'XGBoostRegressor')

On an average, XGBoostRegressor makes an error of 35.441k +/- 0.228k on the training set.
On an average, XGBoostRegressor makes an error of 36.815k +/- 0.707k on the test set.


In [None]:
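# Note: `eta` is XGBoost's alias for `learning_rate`, so passing both is
# ambiguous; keep only one of them when reusing this configuration.
# `alpha` is the L1 regularization strength (alias of `reg_alpha`).
xg_reg = XGBRegressor(objective='reg:squarederror',
                      colsample_bytree=1, eta=0.3,
                      learning_rate=0.1, max_depth=5,
                      alpha=10, n_estimators=2000)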

In [None]:
train_regressor(
    xg_reg, com_train_features,
    com_train_labels, cv, 'XGBoostRegressor')

On an average, XGBoostRegressor makes an error of 8.119k +/- 0.068k on the training set.
On an average, XGBoostRegressor makes an error of 30.000k +/- 0.630k on the test set.


In [None]:
xg_reg = XGBRegressor(objective='reg:squarederror', max_depth=5,
                      alpha=10, n_estimators=2000)

In [None]:
train_regressor(
    xg_reg, com_train_features,
    com_train_labels, cv, 'XGBoostRegressor')

On an average, XGBoostRegressor makes an error of 8.119k +/- 0.068k on the training set.
On an average, XGBoostRegressor makes an error of 30.000k +/- 0.630k on the test set.
