## Boosting techniques to solve regression problems  

`California housing dataset`  
[video link](https://youtu.be/yJjCDkjNNaM)  

Three regressors will be demonstrated:  
* `AdaBoost regressor`  
* `Gradient boosting regressor`  
* `XGBoost regressor`  

**Import basic libraries**  

In [1]:
import pandas as pd
import numpy as np

from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import AdaBoostRegressor
from sklearn.ensemble import GradientBoostingRegressor

from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import cross_validate
from sklearn.model_selection import train_test_split
from sklearn.model_selection import RandomizedSearchCV
from sklearn.model_selection import ShuffleSplit

In [3]:
np.random.seed(306)

Let's use `ShuffleSplit` for cross-validation, with 10 splits and 20% of the examples held out as test examples in each split.  

In [4]:
cv = ShuffleSplit(n_splits=10, test_size=0.2, random_state=42)
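
To see what this splitter does, the minimal sketch below applies the same `ShuffleSplit` settings to a toy array of 100 samples. The names `X_demo`, `demo_cv` and `split_sizes` are illustrative assumptions, not part of the notebook. Each iteration draws a fresh random 80/20 partition, so the test parts of different iterations may overlap.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit

# Toy data: 100 samples, 1 feature (hypothetical, for illustration only).
X_demo = np.arange(100).reshape(-1, 1)

demo_cv = ShuffleSplit(n_splits=10, test_size=0.2, random_state=42)

# Collect the (train, test) sizes of every split.
split_sizes = [(len(train_idx), len(test_idx))
               for train_idx, test_idx in demo_cv.split(X_demo)]

print(len(split_sizes))  # number of splits generated
print(split_sizes[0])    # (train size, test size) of the first split
```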

Let's download the data and split it into training, development, and test sets.  

In [5]:
# fetch dataset
features, labels = fetch_california_housing(as_frame=True, return_X_y=True)
labels *= 100  # target is in units of $100,000; rescale to thousands of dollars

# train-test split
com_train_features, test_features, com_train_labels, test_labels = train_test_split(
    features, labels, random_state=42)

# combined train --> train + dev split
train_features, dev_features, train_labels, dev_labels = train_test_split(
    com_train_features, com_train_labels, random_state=42)
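
Because neither call passes `test_size`, `train_test_split` falls back to its default of 25% each time. The sketch below checks the resulting set sizes on placeholder arrays with the same number of rows as the California housing data (20,640 rows, 8 features). The `*_demo` arrays and the shortened variable names are assumptions made for illustration; only the row counts matter.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the same shape as the California housing data.
n_samples = 20640
X_demo = np.zeros((n_samples, 8))
y_demo = np.zeros(n_samples)

# First split: combined training set vs. held-out test set (default 75/25).
X_com, X_test, y_com, y_test = train_test_split(X_demo, y_demo, random_state=42)

# Second split: training set vs. development set, again 75/25.
X_train, X_dev, y_train, y_dev = train_test_split(X_com, y_com, random_state=42)

print(len(X_train), len(X_dev), len(X_test))
```

With these defaults, roughly 56% of the rows end up in the training set, 19% in the development set, and 25% in the test set.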

## Training different regressors  

In [6]:
def train_regressor(estimator, X_train, y_train, cv, name):
    """Cross-validate `estimator` and print its mean absolute error.

    The 'test set' in the printed summary is the held-out part of each cv split.
    """
    cv_results = cross_validate(estimator,
                                X_train,
                                y_train,
                                cv=cv,
                                scoring='neg_mean_absolute_error',
                                return_train_score=True,
                                return_estimator=True)
    # Scores are negated MAE values, so flip the sign to recover the errors.
    cv_train_error = -1 * cv_results['train_score']
    cv_test_error = -1 * cv_results['test_score']

    print(f"On an average, {name} makes an error of "
          f"{cv_train_error.mean():.3f}k +/- {cv_train_error.std():.3f}k on the training set.")
    print(f"On an average, {name} makes an error of "
          f"{cv_test_error.mean():.3f}k +/- {cv_test_error.std():.3f}k on the test set.")
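
The helper relies on the dictionary returned by `cross_validate`. As a rough illustration of what that dictionary contains, the sketch below runs the same call on a small synthetic problem. The data from `make_regression`, the reduced `n_estimators=20`, and all `demo_*` names are assumptions chosen to keep the example fast; they are not taken from the notebook.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import ShuffleSplit, cross_validate

# Small synthetic regression problem (stand-in for the housing data).
X_demo, y_demo = make_regression(n_samples=200, n_features=5,
                                 noise=10.0, random_state=0)

demo_cv = ShuffleSplit(n_splits=3, test_size=0.2, random_state=42)
demo_results = cross_validate(
    GradientBoostingRegressor(n_estimators=20, random_state=0),
    X_demo, y_demo,
    cv=demo_cv,
    scoring='neg_mean_absolute_error',
    return_train_score=True,
    return_estimator=True)

# One score per split; negate to turn "higher is better" scores into errors.
demo_train_error = -demo_results['train_score']
demo_test_error = -demo_results['test_score']

print(sorted(demo_results.keys()))
print(demo_train_error.shape, len(demo_results['estimator']))
```

`return_estimator=True` keeps one fitted model per split, which is useful if the individual models need to be inspected later.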

### AdaBoostRegressor  

In [8]:
#@title AdaBoostRegressor
train_regressor(
    AdaBoostRegressor(), com_train_features, com_train_labels,
    cv, 'AdaBoostRegressor')

On an average, AdaBoostRegressor makes an error of 73.263k +/- 6.031k on the training set.
On an average, AdaBoostRegressor makes an error of 73.623k +/- 6.057k on the test set.


### GradientBoostingRegressor  

In [9]:
#@title GradientBoostingRegressor
train_regressor(
    GradientBoostingRegressor(), com_train_features, com_train_labels,
    cv, 'GradientBoostingRegressor')

On an average, GradientBoostingRegressor makes an error of 35.394k +/- 0.273k on the training set.
On an average, GradientBoostingRegressor makes an error of 36.773k +/- 0.723k on the test set.


### XGBoost  

Extreme gradient boosting (XGBoost) is a more recent boosting technique. It is a more regularized form of gradient boosting, and this regularization often lets it generalize better than plain gradient boosting.  
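
For reference, the regularized objective is usually written as below. The notation follows the XGBoost documentation rather than this notebook: $l$ is the training loss, $f_k$ is the $k$-th tree, $T$ is the number of leaves in a tree, and $w$ is its vector of leaf weights.

$$
\mathcal{L} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega(f_k),
\qquad
\Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert w \rVert^2
$$

The penalty $\Omega$ discourages trees with many leaves or large leaf weights. In the Python API, $\gamma$ and $\lambda$ correspond to the `gamma` and `reg_lambda` parameters of `XGBRegressor`.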

In [12]:
#@title XGBoost  
from xgboost import XGBRegressor
xgb_regressor = XGBRegressor(objective='reg:squarederror')

In [None]:
?XGBRegressor

In [14]:
train_regressor(
    xgb_regressor, com_train_features, com_train_labels,
    cv, 'XGBoostRegressor')

On an average, XGBoostRegressor makes an error of 18.308k +/- 0.182k on the training set.
On an average, XGBoostRegressor makes an error of 31.845k +/- 0.753k on the test set.
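
The errors above come from repeated splits of the combined training set. A final check on a held-out set might look like the sketch below, which uses `mean_absolute_error` (imported at the top of the notebook). Synthetic data stands in for the housing data so the snippet runs on its own, and the `*_demo`, `*_fit` and `*_holdout` names are assumptions. In the notebook, one would instead fit on `com_train_features`, `com_train_labels` and score on `test_features`, `test_labels`.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the housing data (illustration only).
X_demo, y_demo = make_regression(n_samples=300, n_features=5,
                                 noise=10.0, random_state=0)
X_fit, X_holdout, y_fit, y_holdout = train_test_split(
    X_demo, y_demo, random_state=42)

# Fit once on the fitting portion, then score on data the model has not seen.
model = GradientBoostingRegressor(random_state=0).fit(X_fit, y_fit)

fit_mae = mean_absolute_error(y_fit, model.predict(X_fit))
holdout_mae = mean_absolute_error(y_holdout, model.predict(X_holdout))

print(f"MAE on fitting data: {fit_mae:.3f}")
print(f"MAE on held-out data: {holdout_mae:.3f}")
```

A large gap between the two numbers, like the one in the XGBoost cross-validation output above, suggests the model is fitting the training data more closely than it generalizes.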
