### **Objective**

In this notebook, we will apply ensemble techniques to a regression problem on the California housing dataset.


We have already applied different regressors to the California housing dataset. In this notebook, we will make use of:

  * AdaBoost regressor 

  * Gradient Boosting regressor 

  * XGBoost regressor 

### **Importing basic libraries**

In [1]:
import numpy as np
import pandas as pd

from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import AdaBoostRegressor, GradientBoostingRegressor
from xgboost import XGBRegressor

from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import cross_validate
from sklearn.model_selection import train_test_split
from sklearn.model_selection import RandomizedSearchCV
from sklearn.model_selection import ShuffleSplit

import warnings 
warnings.filterwarnings('ignore')


In [2]:
np.random.seed(306)

Let's use `ShuffleSplit` as the cross-validation strategy, with 10 splits and 20% of the examples set aside as test examples in each split.

In [3]:
cv = ShuffleSplit(n_splits=10, test_size=0.2, random_state=42)
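As a quick check of what this splitter produces, the sketch below applies the same `ShuffleSplit` settings to a toy array of 100 rows. The names `X_toy`, `toy_cv` and `split_sizes` are illustrative and not part of the notebook.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit

# Toy data: 100 rows, purely for illustration (not the housing data).
X_toy = np.arange(100).reshape(-1, 1)

# Same settings as the notebook's `cv` object.
toy_cv = ShuffleSplit(n_splits=10, test_size=0.2, random_state=42)

# Record the (train, test) sizes of every split.
split_sizes = [(len(train_idx), len(test_idx))
               for train_idx, test_idx in toy_cv.split(X_toy)]

print(len(split_sizes))  # number of splits generated
print(split_sizes[0])    # (train size, test size) of the first split
```

Each split is drawn independently, so the same row may land in the test portion of several splits. This differs from K-fold cross-validation, where every row is tested exactly once.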

Let's download the data and split it into training, development, and test sets.

In [4]:
features, labels = fetch_california_housing(as_frame=True, return_X_y=True)
labels *= 100  # rescale the target from units of $100,000 to units of $1,000


In [5]:
com_train_features, test_features, com_train_labels, test_labels = train_test_split(
    features, labels, random_state=42)

train_features, dev_features, train_labels, dev_labels = train_test_split(
    com_train_features, com_train_labels, random_state=42)
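Because no `test_size` is passed, `train_test_split` falls back to its default of 25% in both calls. The sketch below reproduces the two-stage split on a plain array of row indices, so it runs without downloading the data. It assumes the dataset's 20,640 rows, and `row_ids` and `sizes` are illustrative names.

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_rows = 20640  # number of rows in the California housing dataset
row_ids = np.arange(n_rows)

# First split: hold out 25% (the default) as the test set.
com_train_ids, test_ids = train_test_split(row_ids, random_state=42)

# Second split: hold out 25% of the remainder as the dev set.
train_ids, dev_ids = train_test_split(com_train_ids, random_state=42)

sizes = {'train': len(train_ids), 'dev': len(dev_ids), 'test': len(test_ids)}
print(sizes)
```

The final proportions are therefore roughly 56% train, 19% dev and 25% test.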


### **Training different Regressors**

Let's define a helper that cross-validates a regressor and reports its mean absolute error on the training and test folds:

In [6]:
def train_regressor(estimator, X_train, y_train, cv, name):
    """Cross-validate `estimator` and print its mean absolute error.

    The error is reported as mean (+/-) standard deviation across the
    CV splits, for both the training folds and the held-out folds.
    """
    cv_results = cross_validate(estimator,
                                X_train,
                                y_train,
                                cv=cv,
                                scoring='neg_mean_absolute_error',
                                return_train_score=True,
                                return_estimator=True)

    # scikit-learn reports the negated MAE (higher is better), so flip the sign.
    cv_train_error = -1 * cv_results['train_score']
    cv_test_error = -1 * cv_results['test_score']

    print(f'On an average, {name} makes an error of ',
          f'{cv_train_error.mean():.3f} (+/-) {cv_train_error.std():.3f} on the training set.')

    print(f'On an average, {name} makes an error of ',
          f'{cv_test_error.mean():.3f} (+/-) {cv_test_error.std():.3f} on the testing set.')
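The multiplication by `-1` in `train_regressor` is needed because scikit-learn scorers follow a higher-is-better convention, so `neg_mean_absolute_error` yields the negated MAE. A minimal sketch on synthetic data shows this. The toy arrays and the choice of `LinearRegression` are assumptions made only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import ShuffleSplit, cross_validate

# Synthetic regression data (hypothetical, not the housing data).
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 3))
y_toy = X_toy @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

toy_results = cross_validate(LinearRegression(),
                             X_toy,
                             y_toy,
                             cv=ShuffleSplit(n_splits=5, test_size=0.2, random_state=0),
                             scoring='neg_mean_absolute_error',
                             return_train_score=True)

# Raw scores are <= 0; negating them recovers the usual non-negative MAE.
toy_test_error = -1 * toy_results['test_score']
print(toy_results['test_score'])
print(toy_test_error)
```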


#### **AdaBoost Regressor**

In [7]:
train_regressor(AdaBoostRegressor(), com_train_features,
                com_train_labels, cv, 'AdaBoostRegressor')

On an average, AdaBoostRegressor makes an error of  73.263 (+/-) 6.031 on the training set.
On an average, AdaBoostRegressor makes an error of  73.623 (+/-) 6.057 on the testing set.


#### **Gradient Boosting Regressor**

In [8]:
train_regressor(GradientBoostingRegressor(), com_train_features,
                com_train_labels, cv, 'GradientBoostingRegressor')

On an average, GradientBoostingRegressor makes an error of  35.394 (+/-) 0.273 on the training set.
On an average, GradientBoostingRegressor makes an error of  36.773 (+/-) 0.723 on the testing set.


#### **XGBoost Regressor**

In [9]:
train_regressor(XGBRegressor(), com_train_features,
                com_train_labels, cv, 'XGBoostRegressor')


On an average, XGBoostRegressor makes an error of  18.308 (+/-) 0.182 on the training set.
On an average, XGBoostRegressor makes an error of  31.845 (+/-) 0.753 on the testing set.
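To compare the three models at a glance, the mean errors printed above can be collected and the gap between training and test error computed. This is only a sketch that reuses the reported numbers. The dictionary `reported` is typed in by hand rather than produced by the notebook, and `gaps` and `best_model` are illustrative names.

```python
# Mean cross-validated MAE (train, test) as printed in the cells above.
reported = {
    'AdaBoostRegressor': (73.263, 73.623),
    'GradientBoostingRegressor': (35.394, 36.773),
    'XGBoostRegressor': (18.308, 31.845),
}

# Gap between test and train error: a rough indicator of overfitting.
gaps = {name: round(test - train, 3) for name, (train, test) in reported.items()}

# Model with the lowest mean test error.
best_model = min(reported, key=lambda name: reported[name][1])

for name, gap in gaps.items():
    print(f'{name}: test - train gap = {gap:.3f}')
print(f'Lowest test error: {best_model}')
```

With these numbers, XGBoost has the lowest cross-validated test error but also by far the largest train/test gap (about 13.5, versus less than 1.5 for the other two). This suggests that it fits the training folds much more closely than it generalizes. Since the labels were multiplied by 100, all of these errors are in units of $1,000.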
