### Задание к занятию «Обзор библиотеки XGBoost»

#### Описание задания:

1. Решите уже знакомую задачу регрессии — предсказание цены на недвижимость. Датасет — [train.csv](http://https//www.kaggle.com/c/house-prices-advanced-regression-techniques/data);
2. Используйте objective = "reg:linear" в xgboost;
3. Настройте гиперпараметры, используя hyperopt либо ручную настройку (как вам больше нравится);
4. Используйте отложенную выборку (как на занятии), чтобы следить за процессом обучения xgboost'а, но — как и в предыдущем домашнем задании — финальную оценку качества давайте, используя 10-fold кросс-валидацию;
5. Проанализируйте, насколько согласованы оценка на отложенной выборке и на кросс-валидации (одновременно уменьшаются/увеличиваются при изменении гиперпараметров или ведут себя по-разному);
6. Проанализируйте признаки, используя XGBFI, сделайте выводы об интересных взаимодействиях;

In [1]:
import pandas as pd
from sklearn.metrics import mean_squared_error
import tqdm
import numpy as np
from sklearn.model_selection import GridSearchCV

### Загрузка данных

In [2]:
data = pd.read_csv('train.csv')

Разбиение на обучение и hold-out тест 70/30

In [3]:
from sklearn.model_selection import ShuffleSplit
splitter = ShuffleSplit(n_splits=1, test_size=0.3, random_state=777)

for train_index, test_index in splitter.split(data, data.SalePrice):
    d_train = data.iloc[train_index]
    d_test = data.iloc[test_index]
    
    y_train = data['SalePrice'].iloc[train_index]
    y_test = data['SalePrice'].iloc[test_index]

### Предобработка данных

Находим категориальные признаки

Чтобы в разы не увеличивать число признаков при построении dummy, будем использовать категориальные признаки с < 30 уникальных значений



In [4]:
cat_feat = list(data.dtypes[data.dtypes == object].index)

#закодируем пропущенные значений строкой, факт пропущенного значения тоже может нести в себе информацию
data[cat_feat] = data[cat_feat].fillna('nan')

#отфильтруем непрерывные признаки
num_feat = [f for f in data if f not in (cat_feat + ['ID', 'SalePrice'])]

cat_nunique = d_train[cat_feat].nunique()
#print(cat_nunique)
cat_feat = list(cat_nunique[cat_nunique < 30].index)

**Создаем признаки для "деревянных" моделей**

1. Заменяем пропуски на специальное значение -999, чтобы деревья могли их отличить
3. Создаем дамми-переменные для категорий

In [5]:
dummy_train = pd.get_dummies(d_train[cat_feat], columns=cat_feat)
dummy_test = pd.get_dummies(d_test[cat_feat], columns=cat_feat)

dummy_cols = list(set(dummy_train) & set(dummy_test))

dummy_train = dummy_train[dummy_cols]
dummy_test = dummy_test[dummy_cols]


X_train = pd.concat([d_train[num_feat].fillna(-999),
                     dummy_train], axis=1)

X_test = pd.concat([d_test[num_feat].fillna(-999),
                     dummy_test], axis=1)

### XGBOOST

In [6]:
import xgboost as xgb

**Важные гиперпараметры алгоритма**

a. Параметры деревьев
    1. max_depth - максимальная глубина дерева (обычно 3-10, больше глубина -> больше риск переобучения)
    2. min_child_weight - минимальное число объектов в листе (обычно до 20, больше объектов -> меньше риск переобучения, но должен быть согласован с глубиной дерева)
    3. gamma - минимально необходимый прирост качества для разбиения листа (редко используется)

b. Параметры бустинга
    0. objective - оптимизируемый функционал (встроен для классификации и регрессии, можно написать свой дифференцируемый)
    1. n_estimators - кол-во базовых алгоритмов (чем меньше learning_rate, тем больше деревьев)
    2. learning_rate - шаг создания ансамбля (зависит от n_estimators, но обычно 0.01 - 0.1)
    2. colsample_bytree - доля признаков, случайно выбирающихся для построения дерева
    3. subsample - доля объектов, случайно выбирающихся для построения дерева
    4. n_jobs - кол-во потоков для одновременного построения деревьев (большая прибавка к скорости на многоядерных процах)
    5. reg_alpha - вес L1 регуляризации (редко используется)
    6. reg_lambda - вес L2 регуляризации (редко используется)

In [7]:
params = {
          'objective':'reg:linear',
          'n_estimators': 100,
          'learning_rate': 0.1,
          'max_depth': 3,
          'min_child_weight': 1,
          'subsample': 1,
          'colsample_bytree': 1,
          'n_jobs': 4}


rg_xgb = xgb.XGBRegressor(**params)
rg_xgb.fit(X_train, y_train)

XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
       colsample_bytree=1, gamma=0, learning_rate=0.1, max_delta_step=0,
       max_depth=3, min_child_weight=1, missing=None, n_estimators=100,
       n_jobs=4, nthread=None, objective='reg:linear', random_state=0,
       reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
       silent=True, subsample=1)

In [8]:
y_pred_xgb_test = rg_xgb.predict(X_test)
y_pred_xgb_train = rg_xgb.predict(X_train)

Среднеквадратическая ошибка на обучающей выборке

In [9]:
mean_squared_error(y_train, y_pred_xgb_train)

214336152.28028813

Среднеквадратическая ошибка на тестовой выборке

In [10]:
mean_squared_error(y_test, y_pred_xgb_test)

403867251.32861537

### HyperOpt

In [11]:
from hyperopt import hp, fmin, tpe, STATUS_OK, Trials

#функция, которую будем МИНИМИЗИРОВАТЬ
def score(params):
    params['max_depth'] = int(params['max_depth'])
    params['n_jobs'] = -1
    print("Training with params : ", params)
    clf = xgb.XGBRegressor(**params)
    clf.fit(X_train, y_train)
    y_pred_xgb_test = clf.predict(X_test)
    mse = mean_squared_error(y_test, y_pred_xgb_test)
    result = {'loss': mse, 'status': STATUS_OK}
    print('TEST mean squared error: {0:.4f}'.format(mse))
    return result



space = {'max_depth' : hp.quniform('max_depth', 1, 10, 1),
         'min_child_weight' : hp.quniform('min_child_weight', 1, 10, 1),
         'subsample' : hp.quniform('subsample', 0.5, 1, 0.05),
         'gamma' : hp.quniform('gamma', 0.5, 1, 0.05),
         'colsample_bytree' : hp.quniform('colsample_bytree', 0.5, 1, 0.05),
         'silent' : 1,
         'n_estimators': 50,
         'learning_rate': 0.03
         }
trials = Trials()

best = fmin(score, space, algo=tpe.suggest, trials=trials, max_evals=20)

Training with params :  {'colsample_bytree': 0.55, 'gamma': 0.9500000000000001, 'learning_rate': 0.03, 'max_depth': 9, 'min_child_weight': 7.0, 'n_estimators': 50, 'silent': 1, 'subsample': 0.5, 'n_jobs': -1}
TEST mean squared error: 2401041399.1883
Training with params :  {'colsample_bytree': 0.9500000000000001, 'gamma': 0.55, 'learning_rate': 0.03, 'max_depth': 10, 'min_child_weight': 5.0, 'n_estimators': 50, 'silent': 1, 'subsample': 0.9500000000000001, 'n_jobs': -1}
TEST mean squared error: 2335749526.0504
Training with params :  {'colsample_bytree': 0.55, 'gamma': 0.8500000000000001, 'learning_rate': 0.03, 'max_depth': 1, 'min_child_weight': 4.0, 'n_estimators': 50, 'silent': 1, 'subsample': 0.7000000000000001, 'n_jobs': -1}
TEST mean squared error: 3201579530.3823
Training with params :  {'colsample_bytree': 0.7000000000000001, 'gamma': 0.65, 'learning_rate': 0.03, 'max_depth': 4, 'min_child_weight': 2.0, 'n_estimators': 50, 'silent': 1, 'subsample': 0.9, 'n_jobs': -1}
TEST mean 

In [12]:
best

{'colsample_bytree': 0.8,
 'gamma': 0.7000000000000001,
 'max_depth': 7.0,
 'min_child_weight': 2.0,
 'subsample': 0.9500000000000001}

In [13]:
trials.best_trial

{'book_time': datetime.datetime(2018, 5, 5, 20, 16, 59, 454000),
 'exp_key': None,
 'misc': {'cmd': ('domain_attachment', 'FMinIter_Domain'),
  'idxs': {'colsample_bytree': [18],
   'gamma': [18],
   'max_depth': [18],
   'min_child_weight': [18],
   'subsample': [18]},
  'tid': 18,
  'vals': {'colsample_bytree': [0.8],
   'gamma': [0.7000000000000001],
   'max_depth': [7.0],
   'min_child_weight': [2.0],
   'subsample': [0.9500000000000001]},
  'workdir': None},
 'owner': None,
 'refresh_time': datetime.datetime(2018, 5, 5, 20, 16, 59, 669000),
 'result': {'loss': 2272486540.89688, 'status': 'ok'},
 'spec': None,
 'state': 2,
 'tid': 18,
 'version': 0}

#### XGBFI

Позволяет оценивать важности взаимодействия признаков

https://github.com/limexp/xgbfir

### Проанализируем признаки, используя XGBFI

In [14]:
import xgbfir
xgbfir.saveXgbFI(rg_xgb, OutputXlsxFile='xgbfi_report.xlsx')

In [15]:
pd.read_excel('xgbfi_report.xlsx', sheetname=0)

  return func(*args, **kwargs)


Unnamed: 0,Interaction,Gain,FScore,wFScore,Average wFScore,Average Gain,Expected Gain,Gain Rank,FScore Rank,wFScore Rank,Avg wFScore Rank,Avg Gain Rank,Expected Gain Rank,Average Rank,Average Tree Index,Average Tree Depth
0,OverallQual,16895607971000,34,19.533268,0.574508,4.969296e+11,1.578298e+13,1,4,3,28,1,1,6.333333,34.176471,1.029412
1,GrLivArea,3835829410000,58,26.306262,0.453556,6.613499e+10,1.792899e+12,2,1,1,39,7,2,8.666667,26.172414,1.465517
2,GarageCars,2223614240000,10,4.101761,0.410176,2.223614e+11,1.283884e+12,3,15,22,43,3,4,15.000000,18.700000,1.300000
3,BsmtFinSF1,2149223219000,48,19.324853,0.402601,4.477548e+10,9.554792e+11,4,3,4,45,13,6,12.500000,36.520833,1.541667
4,TotalBsmtSF,1915425934000,34,15.354207,0.451594,5.633606e+10,1.470497e+12,5,5,6,40,9,3,11.333333,39.882353,1.176471
5,BsmtQual_Ex,1268563700000,4,3.048924,0.762231,3.171409e+11,1.233063e+12,6,33,31,20,2,5,16.166667,12.250000,0.500000
6,2ndFlrSF,941206220000,24,8.981409,0.374225,3.921693e+10,2.182794e+11,7,9,8,50,14,11,16.500000,28.125000,1.375000
7,LotArea,779187813700,54,26.027397,0.481989,1.442940e+10,4.538190e+11,8,2,2,37,26,7,13.666667,53.814815,1.296296
8,YearRemodAdd,426437783000,19,8.354207,0.439695,2.244409e+10,2.786171e+11,9,10,10,41,19,8,16.166667,43.894737,1.368421
9,YearBuilt,345671924000,25,13.343444,0.533738,1.382688e+10,1.987015e+11,10,8,7,33,27,12,16.166667,50.240000,1.520000


4 Используйте отложенную выборку (как на занятии), чтобы следить за процессом обучения xgboost'а, но — как и в предыдущем домашнем задании — финальную оценку качества давайте, используя 10-fold кросс-валидацию;

In [16]:
class MeanClassifier():
    def __init__(self, col):
        self._col = col
        
    def fit(self, X, y):
        self._y_mean = y.mean()
        self._means = y.groupby(X[self._col].astype(str)).mean()

    def predict_proba(self, X):
        new_feature = X[self._col].astype(str)\
            .map(self._means.to_dict())\
            .fillna(self._y_mean)
        return np.stack([1-new_feature, new_feature], axis=1)
    
    
def get_meta_features(clf, X_train, y_train, X_test, stack_cv):
    meta_train = np.zeros_like(y_train, dtype=float)
    meta_test = np.zeros_like(y_test, dtype=float)
    
    for i, (train_ind, test_ind) in enumerate(stack_cv.split(X_train, y_train)):
        
        clf.fit(X_train.iloc[train_ind], y_train.iloc[train_ind])
        meta_train[test_ind] = clf.predict_proba(X_train.iloc[test_ind])[:, 1]
        meta_test += clf.predict_proba(X_test)[:, 1]
    
    return meta_train, meta_test / stack_cv.n_splits


from sklearn.model_selection import StratifiedKFold

stack_cv = StratifiedKFold(n_splits=10, random_state=555)

meta_train = []
meta_test = []
col_names = []

for c in tqdm.tqdm(cat_nunique.index.tolist()):
    clf = MeanClassifier(c)
    
    meta_tr, meta_te = get_meta_features(clf, d_train, y_train, d_test, stack_cv)

    meta_train.append(meta_tr)
    meta_test.append(meta_te)
    col_names.append('mean_pred_{}'.format(c))

X_mean_train = pd.DataFrame(np.stack(meta_train, axis=1), columns=col_names, index=d_train.index)
X_mean_test = pd.DataFrame(np.stack(meta_test, axis=1), columns=col_names, index=d_test.index)

X_train = pd.concat([X_train, X_mean_train], axis=1)
X_test = pd.concat([X_test, X_mean_test], axis=1)

100%|██████████| 43/43 [00:07<00:00,  5.81it/s]


In [17]:
best_params = {'n_estimators': 50,
          'learning_rate': 0.03,
          'max_depth': 5,
          'min_child_weight': 1,
          'subsample': 0.8,
          'colsample_bytree': 0.8,
          'n_jobs': 4}

Опишем функцию, похожую на GridSearchCV, только для одной отложенной выборки X_test. Она перебирает параметки по заданной сетке и возврашает лучшие по ROC AUC

In [18]:
def find_params(clf, param_grid):
    clf = GridSearchCV(clf, param_grid, cv=[(np.arange(len(X_train)),
                                                               np.arange(len(X_test)) + len(X_train))],
                  verbose=3)

    clf.fit(pd.concat([X_train, X_test]).values, pd.concat([y_train, y_test]).values)
    best_params = clf.best_estimator_.get_params()
    print('Best params: ', best_params)
    return best_params

#### Подбираем max_depth и min_child_weight

In [19]:
from sklearn.model_selection import GridSearchCV

rg_xgb = xgb.XGBRegressor(**best_params)

param_grid = {
    'max_depth': [3, 5, 10],
    'min_child_weight': [10, 20, 100]#[1, 5, 10]
}

best_params = find_params(rg_xgb, param_grid)

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.1s remaining:    0.0s


Fitting 1 folds for each of 9 candidates, totalling 9 fits
[CV] max_depth=3, min_child_weight=10 ................................
[CV]  max_depth=3, min_child_weight=10, score=0.5058932223455579, total=   0.1s
[CV] max_depth=3, min_child_weight=20 ................................


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.3s remaining:    0.0s


[CV]  max_depth=3, min_child_weight=20, score=0.5359376878522844, total=   0.1s
[CV] max_depth=3, min_child_weight=100 ...............................
[CV]  max_depth=3, min_child_weight=100, score=0.5002286329833703, total=   0.1s
[CV] max_depth=5, min_child_weight=10 ................................
[CV]  max_depth=5, min_child_weight=10, score=0.5148607280823831, total=   0.2s
[CV] max_depth=5, min_child_weight=20 ................................
[CV]  max_depth=5, min_child_weight=20, score=0.5458929720982357, total=   0.2s
[CV] max_depth=5, min_child_weight=100 ...............................
[CV]  max_depth=5, min_child_weight=100, score=0.5039651312098363, total=   0.2s
[CV] max_depth=10, min_child_weight=10 ...............................
[CV]  max_depth=10, min_child_weight=10, score=0.518735905352752, total=   0.3s
[CV] max_depth=10, min_child_weight=20 ...............................
[CV]  max_depth=10, min_child_weight=20, score=0.5487967519402525, total=   0.2s
[CV] max_de

[Parallel(n_jobs=1)]: Done   9 out of   9 | elapsed:    1.7s finished


Best params:  {'base_score': 0.5, 'booster': 'gbtree', 'colsample_bylevel': 1, 'colsample_bytree': 0.8, 'gamma': 0, 'learning_rate': 0.03, 'max_delta_step': 0, 'max_depth': 10, 'min_child_weight': 20, 'missing': None, 'n_estimators': 50, 'n_jobs': 4, 'nthread': None, 'objective': 'reg:linear', 'random_state': 0, 'reg_alpha': 0, 'reg_lambda': 1, 'scale_pos_weight': 1, 'seed': None, 'silent': True, 'subsample': 0.8}


#### Подбираем gamma

In [20]:
rg_xgb = xgb.XGBRegressor(**best_params)

param_grid = {
    'gamma': np.linspace(0, 0.5, 5)
}

best_params = find_params(rg_xgb, param_grid)

Fitting 1 folds for each of 5 candidates, totalling 5 fits
[CV] gamma=0.0 .......................................................
[CV] .............. gamma=0.0, score=0.5487967519402525, total=   0.2s
[CV] gamma=0.125 .....................................................


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.2s remaining:    0.0s


[CV] ............ gamma=0.125, score=0.5487967519402525, total=   0.2s
[CV] gamma=0.25 ......................................................


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.5s remaining:    0.0s


[CV] ............. gamma=0.25, score=0.5487967519402525, total=   0.2s
[CV] gamma=0.375 .....................................................
[CV] ............ gamma=0.375, score=0.5487967519402525, total=   0.2s
[CV] gamma=0.5 .......................................................
[CV] .............. gamma=0.5, score=0.5487967519402525, total=   0.3s


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:    1.2s finished


Best params:  {'base_score': 0.5, 'booster': 'gbtree', 'colsample_bylevel': 1, 'colsample_bytree': 0.8, 'gamma': 0.0, 'learning_rate': 0.03, 'max_delta_step': 0, 'max_depth': 10, 'min_child_weight': 20, 'missing': None, 'n_estimators': 50, 'n_jobs': 4, 'nthread': None, 'objective': 'reg:linear', 'random_state': 0, 'reg_alpha': 0, 'reg_lambda': 1, 'scale_pos_weight': 1, 'seed': None, 'silent': True, 'subsample': 0.8}


#### Подбираем subsample и colsample_bytree

In [21]:
rg_xgb = xgb.XGBRegressor(**best_params)

param_grid = {
    'subsample': np.linspace(0.5, 1, 6),
    'colsample_bytree': np.linspace(0.5, 1, 6)
}

best_params = find_params(rg_xgb, param_grid)

Fitting 1 folds for each of 36 candidates, totalling 36 fits
[CV] colsample_bytree=0.5, subsample=0.5 .............................
[CV]  colsample_bytree=0.5, subsample=0.5, score=0.5205217223865368, total=   0.2s
[CV] colsample_bytree=0.5, subsample=0.6 .............................


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.2s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.3s remaining:    0.0s


[CV]  colsample_bytree=0.5, subsample=0.6, score=0.5374194916440793, total=   0.2s
[CV] colsample_bytree=0.5, subsample=0.7 .............................
[CV]  colsample_bytree=0.5, subsample=0.7, score=0.5357534250558018, total=   0.2s
[CV] colsample_bytree=0.5, subsample=0.8 .............................
[CV]  colsample_bytree=0.5, subsample=0.8, score=0.53911688462111, total=   0.2s
[CV] colsample_bytree=0.5, subsample=0.9 .............................
[CV]  colsample_bytree=0.5, subsample=0.9, score=0.5379878232221023, total=   0.2s
[CV] colsample_bytree=0.5, subsample=1.0 .............................
[CV]  colsample_bytree=0.5, subsample=1.0, score=0.5422456468664477, total=   0.2s
[CV] colsample_bytree=0.6, subsample=0.5 .............................
[CV]  colsample_bytree=0.6, subsample=0.5, score=0.5286554657145994, total=   0.2s
[CV] colsample_bytree=0.6, subsample=0.6 .............................
[CV]  colsample_bytree=0.6, subsample=0.6, score=0.5289977695741213, total=   

[Parallel(n_jobs=1)]: Done  36 out of  36 | elapsed:    8.4s finished


Best params:  {'base_score': 0.5, 'booster': 'gbtree', 'colsample_bylevel': 1, 'colsample_bytree': 0.9, 'gamma': 0.0, 'learning_rate': 0.03, 'max_delta_step': 0, 'max_depth': 10, 'min_child_weight': 20, 'missing': None, 'n_estimators': 50, 'n_jobs': 4, 'nthread': None, 'objective': 'reg:linear', 'random_state': 0, 'reg_alpha': 0, 'reg_lambda': 1, 'scale_pos_weight': 1, 'seed': None, 'silent': True, 'subsample': 0.9}


#### Подбираем регуляризацию: reg_lambda и reg_alpha

In [22]:
rg_xgb = xgb.XGBRegressor(**best_params)

param_grid = {
    'reg_alpha': [0, 0.0001, 0.001, 0.1, 1],
    'reg_lambda': [0, 0.0001, 0.001, 0.1, 1]
}

best_params = find_params(rg_xgb, param_grid)

Fitting 1 folds for each of 25 candidates, totalling 25 fits
[CV] reg_alpha=0, reg_lambda=0 .......................................
[CV]  reg_alpha=0, reg_lambda=0, score=0.5802772150198364, total=   0.3s
[CV] reg_alpha=0, reg_lambda=0.0001 ..................................


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.3s remaining:    0.0s


[CV]  reg_alpha=0, reg_lambda=0.0001, score=0.5802730920619759, total=   0.4s
[CV] reg_alpha=0, reg_lambda=0.001 ...................................


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.7s remaining:    0.0s


[CV]  reg_alpha=0, reg_lambda=0.001, score=0.5802115459384869, total=   0.3s
[CV] reg_alpha=0, reg_lambda=0.1 .....................................
[CV]  reg_alpha=0, reg_lambda=0.1, score=0.568522256983355, total=   0.4s
[CV] reg_alpha=0, reg_lambda=1 .......................................
[CV]  reg_alpha=0, reg_lambda=1, score=0.5582693010594895, total=   0.3s
[CV] reg_alpha=0.0001, reg_lambda=0 ..................................
[CV]  reg_alpha=0.0001, reg_lambda=0, score=0.5802772159353369, total=   0.4s
[CV] reg_alpha=0.0001, reg_lambda=0.0001 .............................
[CV]  reg_alpha=0.0001, reg_lambda=0.0001, score=0.5802730977742007, total=   0.3s
[CV] reg_alpha=0.0001, reg_lambda=0.001 ..............................
[CV]  reg_alpha=0.0001, reg_lambda=0.001, score=0.5802115459384869, total=   0.3s
[CV] reg_alpha=0.0001, reg_lambda=0.1 ................................
[CV]  reg_alpha=0.0001, reg_lambda=0.1, score=0.5685222552167792, total=   0.3s
[CV] reg_alpha=0.0001, reg_

[Parallel(n_jobs=1)]: Done  25 out of  25 | elapsed:    8.4s finished


Best params:  {'base_score': 0.5, 'booster': 'gbtree', 'colsample_bylevel': 1, 'colsample_bytree': 0.9, 'gamma': 0.0, 'learning_rate': 0.03, 'max_delta_step': 0, 'max_depth': 10, 'min_child_weight': 20, 'missing': None, 'n_estimators': 50, 'n_jobs': 4, 'nthread': None, 'objective': 'reg:linear', 'random_state': 0, 'reg_alpha': 0.0001, 'reg_lambda': 0, 'scale_pos_weight': 1, 'seed': None, 'silent': True, 'subsample': 0.9}


#### Уменьшим learning_rate

In [23]:
best_params['learning_rate'] = 0.01
best_params['n_estimators'] = 500

rg_xgb = xgb.XGBClassifier(**best_params)

rg_xgb.fit(X_train, y_train, eval_set=[[X_train, y_train], [X_test, y_test]])

ValueError: y contains new labels: [ 35311  52500  55993  62383  75000  75500  79900  80500  89471  91300
  92000  93500  94000  94500  94750 101000 110500 111250 112500 114500
 116500 121000 121500 126175 128900 128950 134450 134800 135960 136900
 138887 139600 143750 144500 146800 147500 150500 150900 153575 154900
 156500 157500 162500 163990 164900 167240 171500 173733 176432 176485
 178740 178900 179400 181134 182000 183900 185750 186000 187100 187750
 189950 192140 192500 195400 201800 202665 206000 208300 213490 214900
 216000 216837 222500 235128 243000 244400 244600 245000 245350 245500
 246578 248900 249700 255000 255500 259500 262280 263000 263435 264132
 265900 274300 275500 281213 283463 286000 287000 294000 305000 313000
 315750 326000 337000 337500 339750 348000 367294 369900 375000 377500
 378500 380000 395000 402861 410000 430000 438780 451950 556581]