### This notebook is forked from Vladyslav Zabudko's Ensemble (LGB + XGB) with Hyperopt

[Parameter Tuning with Hyperopt](https://medium.com/district-data-labs/parameter-tuning-with-hyperopt-faa86acdfdce)


# Ensemble Modeling with LGBM & XGB and Hyperparameter Optimization using Hyperopt 

# Importing Libraries

In [None]:
import pandas as pd
import numpy as np
import time
from hyperopt import STATUS_OK, Trials, fmin, hp, tpe
from hyperopt.pyll.base import scope
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split, KFold
from sklearn.preprocessing import StandardScaler, LabelEncoder

# Importing Data

In [None]:
train = pd.read_csv("../input/tabular-playground-series-feb-2021/train.csv")
test = pd.read_csv("../input/tabular-playground-series-feb-2021/test.csv")
target = train.target

In [None]:
train.head()

#### The task is to predict a continuous target from the feature columns given in the data. The feature columns cat0 - cat9 are categorical, and the feature columns cont0 - cont13 are continuous.



# Data Preprocessing

In [None]:
def preprocess(df, encoder=None,
               scaler=None, cols_to_drop=None,
               cols_to_encode=None, cols_to_scale=None):
    """
    Preprocess input data
    :param df: DataFrame with data
    :param encoder: encoder object with fit_transform method
    :param scaler: scaler object with fit_transform method
    :param cols_to_drop: columns to be removed
    :param cols_to_encode: columns to be encoded
    :param cols_to_scale: columns to be scaled
    :return: DataFrame
    """

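    # Work on a copy so the caller's DataFrame is not modified in place
    df = df.copy()

    if encoder:
        for col in cols_to_encode:
            # The encoder is refit for every column
            df[col] = encoder.fit_transform(df[col])

    if scaler:
        for col in cols_to_scale:
            # The scaler expects 2D input; flatten the result back to one column
            df[col] = scaler.fit_transform(df[col].values.reshape(-1, 1)).ravel()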

    if cols_to_drop:
        df = df.drop(cols_to_drop, axis=1)

    return df

### Label Encoding the Categorical Features
#### Most machine learning models cannot work with categorical variables directly (some models, such as LightGBM, can). Therefore, we label encode them.
#### Label encoding assigns an integer to each unique category in a categorical variable. No new columns are created. An example is shown below.
![](https://raw.githubusercontent.com/WillKoehrsen/Machine-Learning-Projects/master/label_encoding.png)
#### The problem with label encoding is that it gives the categories an arbitrary ordering. The value assigned to each category does not reflect any inherent aspect of the category. In the example above, programmer receives a 4 and data scientist a 1, but with a different encoder or a different set of categories the labels could be reversed or completely different. Therefore, when we perform label encoding, the model might use the relative value of the feature (for example programmer = 4 and data scientist = 1) to assign weights, which is not what we want. If we only have two unique values for a categorical variable (such as Male/Female), then label encoding is fine, but for more than 2 unique categories, one-hot encoding is the safe option because it does not impose arbitrary values on the categories.
#### The main downside to one-hot encoding is that the number of features (dimensions of the data) can explode when categorical variables have many categories. To deal with this, we can perform one-hot encoding followed by PCA or another dimensionality reduction method to reduce the number of dimensions (while still trying to preserve information).

[Source](https://huntdatascience.wordpress.com/2019/07/26/encoding-categorical-variables/)
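
The difference between the two encodings can be seen on a toy column. The sketch below is only an illustration: the `jobs` series is made up and is not part of the competition data. It assumes scikit-learn's `LabelEncoder`, which numbers categories in sorted order, and pandas' `get_dummies` for one-hot encoding.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy column, not taken from the competition data
jobs = pd.Series(["programmer", "data scientist", "engineer", "programmer"], name="job")

# Label encoding: one integer per category, assigned in sorted order
encoder = LabelEncoder()
label_encoded = encoder.fit_transform(jobs)
mapping = {str(c): i for i, c in enumerate(encoder.classes_)}
print(mapping)
print(list(label_encoded))

# One-hot encoding: one indicator column per category, no implied ordering
one_hot = pd.get_dummies(jobs, prefix="job")
print(one_hot.shape)
```

The label-encoded column stays a single column, while the one-hot version grows to one column per category, which is the dimensionality trade-off described above.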

In [None]:
cat_cols = ['cat' + str(i) for i in range(10)]
cont_cols = ['cont' + str(i) for i in range(14)]

train = preprocess(train, encoder=LabelEncoder(), scaler=StandardScaler(),
                  cols_to_drop=['id', 'target'], cols_to_encode=cat_cols,
                  cols_to_scale=cont_cols)

test = preprocess(test, encoder=LabelEncoder(), scaler=StandardScaler(),
                 cols_to_drop=['id'], cols_to_encode=cat_cols,
                 cols_to_scale=cont_cols)
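
The cell above fits a fresh encoder and scaler on the test set. An alternative, shown here only as a sketch on made-up toy frames (`toy_train` and `toy_test` are hypothetical names), is to learn the statistics from the training data and reuse them on the test data, so both sets are transformed in the same way.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy frames standing in for train and test
toy_train = pd.DataFrame({"cont0": [0.0, 1.0, 2.0, 3.0]})
toy_test = pd.DataFrame({"cont0": [1.5, 4.5]})

scaler = StandardScaler()
# Learn mean and standard deviation from the training data only
toy_train["cont0"] = scaler.fit_transform(toy_train[["cont0"]]).ravel()
# Reuse the same statistics for the test data (transform, not fit_transform)
toy_test["cont0"] = scaler.transform(toy_test[["cont0"]]).ravel()

print(toy_test)
```

The same pattern applies to `LabelEncoder`, provided every category in the test data also appears in the training data.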

# Defining the EnsembleModel Class 

In [None]:
class EnsembleModel:
    def __init__(self, params):
        """
        LGB + XGB model
        """
        self.lgb_params = params['lgb']
        self.xgb_params = params['xgb']

        self.lgb_model = LGBMRegressor(**self.lgb_params)
        self.xgb_model = XGBRegressor(**self.xgb_params)

    def fit(self, x, y, *args, **kwargs):
        return (self.lgb_model.fit(x, y, *args, **kwargs),
                self.xgb_model.fit(x, y, *args, **kwargs))

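    def predict(self, x, weights=(1.0, 1.0)):
        """
        Generate model predictions as a weighted average of both models
        :param x: data
        :param weights: weights on model prediction, first one is the weight on lgb model
        :return: array with predictions
        """
        # Normalize by the sum of the weights so that non-default weights
        # keep the predictions on the scale of the target
        return (weights[0] * self.lgb_model.predict(x) +
                weights[1] * self.xgb_model.predict(x)) / sum(weights)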
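
The blending step in `predict` can be illustrated on plain arrays. This is a minimal sketch with a hypothetical `blend` helper and made-up predictions; it assumes the weights are normalized by their sum, which for equal weights is the same as dividing by 2.

```python
import numpy as np

def blend(pred_a, pred_b, weights=(1.0, 1.0)):
    """Weighted average of two prediction arrays (hypothetical helper)."""
    stacked = np.vstack([pred_a, pred_b])
    # np.average divides by the sum of the weights
    return np.average(stacked, axis=0, weights=weights)

# Made-up predictions standing in for the LGB and XGB outputs
lgb_like = np.array([7.0, 8.0, 9.0])
xgb_like = np.array([9.0, 8.0, 7.0])

equal = blend(lgb_like, xgb_like)
tilted = blend(lgb_like, xgb_like, weights=(3.0, 1.0))
print(equal, tilted)
```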

# Hyperparameter Tuning with Hyperopt
![](https://camo.githubusercontent.com/e98afeb0a769a1d6ad1e56214324a18ac426d189196c622ac9dc56de04534d2d/68747470733a2f2f692e706f7374696d672e63632f54506d66665772702f68797065726f70742d6e65772e706e67)

#### There are two common methods of parameter tuning: grid search and random search. Each has its own pitfalls.

#### Grid search is slow but evaluates every point on the grid, while random search is fast but can miss important points in the search space.
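
As a rough illustration of the two strategies, the sketch below builds the candidate lists only, without training any model. The parameter names and ranges are hypothetical.

```python
import itertools
import random

# Hypothetical two-parameter search space
learning_rates = [0.01, 0.1, 0.3]
max_depths = [4, 8, 16]

# Grid search: every combination of the listed values is evaluated
grid_candidates = list(itertools.product(learning_rates, max_depths))

# Random search: a fixed budget of points drawn from the full ranges
rng = random.Random(22)
random_candidates = [(rng.uniform(0.01, 0.3), rng.randint(4, 16))
                     for _ in range(5)]

print(len(grid_candidates), len(random_candidates))
```

The grid grows multiplicatively with every added parameter, while the random budget stays whatever we choose it to be.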

#### Luckily, a third option exists: Bayesian optimization. Bayesian optimization uses the results of earlier trials to decide which parameters to try next, so it can find good parameters for a given model (e.g., logistic regression) in fewer evaluations. The same approach can also be used for model selection.


## Defining the Objective Function

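#### The function fmin takes as its first argument the function to minimize, denoted fn, which here is ensemble_search. The objective returns a dictionary containing the loss (the validation RMSE) and a status flag.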

In [None]:
def ensemble_search(params):
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=22)

    model = EnsembleModel(params)

    evaluation = [(X_test, y_test)]

    model.fit(X_train, y_train,
              eval_set=evaluation, eval_metric='rmse',
              early_stopping_rounds=100, verbose=False)

    val_preds = model.predict(X_test)
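    # Take the square root explicitly: the squared=False argument
    # is no longer available in recent scikit-learn releases
    rmse = np.sqrt(mean_squared_error(y_test, val_preds))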

    return {"loss": rmse, "status": STATUS_OK}

## Defining the Search Space

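#### The search space describes the values each hyperparameter may take. *hp.uniform(label, lower bound, upper bound)* draws a continuous value between the lower and upper bound. *hp.quniform(label, lower bound, upper bound, q)* draws a value rounded to a multiple of q, and wrapping it in *scope.int* casts the result to an integer.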

In [None]:
ensemble_params = {
    "lgb" : {
        "num_leaves": scope.int(hp.quniform("num_leaves", 31, 200, 1)),
        "max_depth": scope.int(hp.quniform("max_depth", 10, 24, 1)),
        'learning_rate': hp.uniform('learning_rate', 0.01, 0.3),
        'min_split_gain': hp.uniform('min_split_gain', 0, 1.0),
        'min_child_samples': scope.int(hp.quniform("min_child_samples", 2, 700, 1)),
        "subsample": hp.uniform("subsample", 0.2, 1.0),
        "colsample_bytree": hp.uniform("colsample_bytree", 0.5, 1.0),
        'reg_alpha': hp.uniform('reg_alpha', 1e-5, 1.0),
        'reg_lambda': hp.uniform('reg_lambda', 0, 50),
        'n_jobs': -1,
        'n_estimators': 2000},
    'xgb': {
        'max_depth': scope.int(hp.quniform('xgb.max_depth', 10, 24, 1)),
        'learning_rate': hp.uniform('xgb.learning_rate', 0.01, 0.3),
        'gamma': hp.uniform('xgb.gamma', 1, 10),
        'min_child_weight': scope.int(hp.quniform('xgb.min_child_weight', 2, 700, 1)),
        'n_estimators': 2000,
        'colsample_bytree': hp.uniform('xgb.colsample_bytree', 0.5, 0.9),
        'subsample': hp.uniform('xgb.subsample', 0.5, 1.0),
        'reg_lambda': hp.uniform('xgb.reg_lambda', 0, 100),
        'reg_alpha': hp.uniform('xgb.reg_alpha', 1e-5, 0.5),
        'objective': 'reg:squarederror',
        'tree_method': 'gpu_hist',
        'n_jobs': -1}
}

## Defining Algo, Max_Evals and Trials

#### The parameter algo takes a search algorithm, in this case tpe, which stands for Tree-structured Parzen Estimator.

#### We then specify the maximum number of evaluations, max_evals, that the fmin function will perform. The fmin function returns a Python dictionary that maps each hyperparameter label to the best value found.

#### The Trials object allows us to store information about each evaluation. We can then print it out and see the loss and the parameter values for any given trial.

In [None]:
X = train.copy()
y = target

trials = Trials()

best_hyperparams = fmin(fn=ensemble_search,
                       space=ensemble_params,
                       algo=tpe.suggest,
                       max_evals=100,
                       trials=trials)

## Optimal Hyperparameters

In [None]:
best_hyperparams

# Training

In [None]:
since = time.time()
columns = train.columns

ensemble_params = {
    "lgb" : {
        "num_leaves": 36,
        "max_depth": 21,
        'learning_rate': 0.049019854828962754,
        'min_split_gain': 0.2579555416739361,
        'min_child_samples': 500,
        "subsample": 0.2595537456780356,
        "colsample_bytree": 0.6203517996970486,
        'reg_alpha': 0.33867231210286647,
        'reg_lambda': 42.071411120949854,
        'n_jobs': -1,
        'n_estimators': 5000},
    'xgb': {
        'max_depth': 13,
        'learning_rate': 0.020206705089028228,
        'gamma': 3.5746731812451156,
        'min_child_weight': 564,
        'n_estimators': 5000,
        'colsample_bytree': 0.5015940592112956,
        'subsample': 0.6839489639112909,
        'reg_lambda': 18.085502002853246,
        'reg_alpha': 0.17532087359570606,
        'objective': 'reg:squarederror',
        'tree_method': 'gpu_hist',
        'n_jobs': -1}
}
    
preds = np.zeros(test.shape[0])
kf = KFold(n_splits=10, random_state=22, shuffle=True)
rmse = []
n = 0

for trn_idx, test_idx in kf.split(train[columns], target):

    X_tr, X_val = train[columns].iloc[trn_idx], train[columns].iloc[test_idx]
    y_tr, y_val = target.iloc[trn_idx], target.iloc[test_idx]

    model = EnsembleModel(ensemble_params)

    model.fit(X_tr, y_tr, eval_set=[(X_val, y_val)], early_stopping_rounds=100, verbose=False)

    preds += model.predict(test[columns]) / kf.n_splits
    # Take the square root explicitly instead of relying on squared=False
    rmse.append(np.sqrt(mean_squared_error(y_val, model.predict(X_val))))
    
    print(f"Fold {n+1}, RMSE: {rmse[n]}")
    n += 1


print("Mean RMSE: ", np.mean(rmse))
end_time = time.time() - since
print('Training complete in {:.0f}m {:.0f}s'.format(
        end_time // 60, end_time % 60))

In [None]:
submissions = pd.read_csv("../input/tabular-playground-series-feb-2021/sample_submission.csv")
submissions['target'] = preds

submissions.to_csv("ensemble_model_2.csv", index=False)