<h2 style="color:blue" align="left"> XGBOOST </h2>

## Hyperparameter Tuning

**Same as GBDT**

1. **n_estimators:** Total number of trees


2. **learning_rate:** Scales the contribution of each tree to the final prediction


3. **random_state:** Seed for the random number generator, so that the same random numbers are generated on every run


4. **max_depth:** Maximum depth to which each tree can grow (a stopping criterion)


5. **subsample:** Fraction of observations (rows) randomly sampled for each tree


6. **objective:** Defines the loss function (*binary:logistic* is for binary classification and outputs probabilities, *reg:logistic* is logistic regression, *reg:linear* is squared-error regression and is named *reg:squarederror* in newer XGBoost releases)


7. **colsample_bylevel:** Fraction of features randomly sampled at each depth level of a tree


8. **colsample_bytree:** Fraction of features randomly sampled once for each tree
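
The sampling parameters above can be sketched in plain Python. This is only an illustration of the idea, not XGBoost's internal sampler; the helper `sample_rows_and_columns` and the 10 x 10 data shape are assumptions made for the example.

```python
import random

def sample_rows_and_columns(n_rows, n_cols, subsample, colsample_bytree, random_state):
    """Pick the row and column indices that one tree would be allowed to see."""
    rng = random.Random(random_state)  # fixed seed -> same draw on every run
    n_row_pick = max(1, int(subsample * n_rows))
    n_col_pick = max(1, int(colsample_bytree * n_cols))
    rows = sorted(rng.sample(range(n_rows), n_row_pick))
    cols = sorted(rng.sample(range(n_cols), n_col_pick))
    return rows, cols

# Hypothetical data set with 10 rows and 10 features
rows_a, cols_a = sample_rows_and_columns(10, 10, subsample=0.5,
                                         colsample_bytree=0.7, random_state=96)
rows_b, cols_b = sample_rows_and_columns(10, 10, subsample=0.5,
                                         colsample_bytree=0.7, random_state=96)

print(len(rows_a), len(cols_a))                # number of rows and columns kept
print(rows_a == rows_b and cols_a == cols_b)   # same seed gives the same sample
```

Each tree sees only part of the data, which adds randomness between trees, while a fixed `random_state` keeps the whole run repeatable.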

In [None]:
# set parameters
from xgboost import XGBClassifier

clf = XGBClassifier(random_state=96, colsample_bytree=0.7, max_depth=6)

**Regularization**

1. **gamma:** Minimum reduction in loss required to make a split; a larger value makes the model more conservative


2. **reg_alpha:** L1 regularization on leaf weights; can push leaf weights to exactly 0


3. **reg_lambda:** L2 regularization on leaf weights; shrinks leaf weights more smoothly
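
How the three regularization terms act can be sketched with the usual second-order formulas for a leaf, where `G` and `H` are the sums of gradients and hessians of the rows in that leaf. This is a simplified sketch under those assumptions, not XGBoost's actual code; all function names are made up for the example.

```python
def soft_threshold(g, alpha):
    """Shrink g toward zero by alpha; gives 0 when |g| <= alpha (the L1 effect)."""
    if g > alpha:
        return g - alpha
    if g < -alpha:
        return g + alpha
    return 0.0

def leaf_weight(G, H, reg_lambda=1.0, reg_alpha=0.0):
    """Leaf weight: reg_lambda enlarges the denominator, reg_alpha thresholds G."""
    return -soft_threshold(G, reg_alpha) / (H + reg_lambda)

def leaf_score(G, H, reg_lambda=1.0, reg_alpha=0.0):
    """Quality score of a leaf (larger is better)."""
    return soft_threshold(G, reg_alpha) ** 2 / (H + reg_lambda)

def split_gain(GL, HL, GR, HR, gamma=0.0, reg_lambda=1.0, reg_alpha=0.0):
    """Loss reduction of a split minus gamma; the split is kept only if positive."""
    parent = leaf_score(GL + GR, HL + HR, reg_lambda, reg_alpha)
    children = (leaf_score(GL, HL, reg_lambda, reg_alpha)
                + leaf_score(GR, HR, reg_lambda, reg_alpha))
    return 0.5 * (children - parent) - gamma

w_plain = leaf_weight(-4.0, 3.0, reg_lambda=1.0)                 # 4 / (3 + 1)
w_l2 = leaf_weight(-4.0, 3.0, reg_lambda=3.0)                    # smaller, not zero
w_l1 = leaf_weight(-4.0, 3.0, reg_lambda=1.0, reg_alpha=5.0)     # thresholded to zero
gain_small_gamma = split_gain(-4.0, 3.0, 4.0, 3.0, gamma=0.1)    # positive: split kept
gain_large_gamma = split_gain(-4.0, 3.0, 4.0, 3.0, gamma=5.0)    # negative: split rejected
```

Raising `reg_lambda` shrinks the weight gradually, raising `reg_alpha` past `|G|` sets it to zero, and raising `gamma` rejects splits whose loss reduction is too small.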

In [None]:
clf = XGBClassifier(gamma=0.1, random_state=96)

-----------------------------------------------------

### 1. Classifier

### a. RandomizedSearchCV

In [None]:
xgb_2 = XGBClassifier(learning_rate=0.05, max_depth=5, n_estimators=1000, objective='multi:softprob',
                      subsample=0.9, verbosity=1, colsample_bytree=0.9, min_child_weight=2)

In [None]:
# Hyperparameter optimization using RandomizedSearchCV
from sklearn.model_selection import RandomizedSearchCV, GridSearchCV
import xgboost

classifier = xgboost.XGBClassifier()

In [None]:
# Hyperparameter search space: each key is an XGBClassifier parameter,
# each list holds the candidate values to sample from

params = {
    "learning_rate": [0.05, 0.10, 0.15, 0.20, 0.25, 0.30],
    "max_depth": [3, 4, 5, 6, 8, 10, 12, 15],
    "min_child_weight": [1, 3, 5, 7],
    "gamma": [0.0, 0.1, 0.2, 0.3, 0.4],
    "colsample_bytree": [0.3, 0.4, 0.5, 0.7],
}

In [None]:
random_search = RandomizedSearchCV(classifier, param_distributions=params, n_iter=5,
                                   scoring='roc_auc', n_jobs=-1, cv=5, verbose=3)
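
To see why a random search is used here, it helps to count the grid. The sketch below repeats the search space under the name `grid` and draws 5 candidates with the standard library. It only mimics the idea: scikit-learn samples without replacement when every entry is a list, which this sketch does not do, and the seed `0` is arbitrary.

```python
import math
import random

grid = {
    "learning_rate": [0.05, 0.10, 0.15, 0.20, 0.25, 0.30],
    "max_depth": [3, 4, 5, 6, 8, 10, 12, 15],
    "min_child_weight": [1, 3, 5, 7],
    "gamma": [0.0, 0.1, 0.2, 0.3, 0.4],
    "colsample_bytree": [0.3, 0.4, 0.5, 0.7],
}

# Number of distinct settings an exhaustive grid search would try
n_combinations = math.prod(len(values) for values in grid.values())

# The random search tries n_iter settings, each evaluated on cv folds
n_iter, cv = 5, 5
total_fits = n_iter * cv

# Draw n_iter candidate settings, one random value per parameter
rng = random.Random(0)
candidates = [{name: rng.choice(values) for name, values in grid.items()}
              for _ in range(n_iter)]

print(n_combinations, total_fits)
for candidate in candidates:
    print(candidate)
```

With `n_iter=5` only a small part of the grid is explored, so a larger `n_iter` is worth considering when time allows.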

In [None]:
random_search.fit(X,Y)

In [None]:
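# Rebuild the classifier with the tuned values (presumably those reported by the search above).
# silent, nthread and seed are deprecated in favor of verbosity, n_jobs and random_state,
# so they are dropped here; missing is left at its default.
classifier = xgboost.XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
       colsample_bytree=0.5, gamma=0.4, learning_rate=0.1,
       max_delta_step=0, max_depth=6, min_child_weight=7,
       n_estimators=100, n_jobs=1,
       objective='binary:logistic', random_state=0, reg_alpha=0,
       reg_lambda=1, scale_pos_weight=1, verbosity=0,
       subsample=1)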

In [None]:
from sklearn.model_selection import cross_val_score
score=cross_val_score(classifier,X,Y,cv=10)

In [None]:
score

In [None]:
score.mean()
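
What `cv=10` does can be sketched as follows: the rows are cut into 10 folds, and each fold is used once as the test set while the rest is used for training. This is a plain, unshuffled k-fold written for illustration; for classifiers scikit-learn uses a stratified split by default, which this sketch leaves out. The helper `kfold_indices` and the 25-row size are assumptions.

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train_idx, test_idx) pairs for contiguous, unshuffled folds."""
    base, extra = divmod(n_samples, n_splits)
    # The first `extra` folds get one additional sample
    fold_sizes = [base + (1 if i < extra else 0) for i in range(n_splits)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

# Hypothetical data set with 25 rows, split into 10 folds
folds = list(kfold_indices(n_samples=25, n_splits=10))
for train_idx, test_idx in folds:
    print(len(train_idx), test_idx)
```

Every row is tested exactly once, so the mean of the 10 scores is an estimate over the whole data set.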

### 2. Regressor

In [None]:
import xgboost
regressor=xgboost.XGBRegressor()

In [None]:
booster=['gbtree','gblinear']
base_score=[0.25,0.5,0.75,1]

In [None]:
# Hyper Parameter Optimization

n_estimators = [100, 500, 900, 1100, 1500]
max_depth = [2, 3, 5, 10, 15]
booster=['gbtree','gblinear']  # gblinear is a linear model, so tree parameters such as max_depth do not apply to it
learning_rate=[0.05,0.1,0.15,0.20]
min_child_weight=[1,2,3,4]

# Define the grid of hyperparameters to search
hyperparameter_grid = {
    'n_estimators': n_estimators,
    'max_depth':max_depth,
    'learning_rate':learning_rate,
    'min_child_weight':min_child_weight,
    'booster':booster,
    'base_score':base_score
    }

In [None]:
# Set up the random search with 5-fold cross validation
random_cv = RandomizedSearchCV(estimator=regressor,
            param_distributions=hyperparameter_grid,
            cv=5, n_iter=50,
            scoring = 'neg_mean_absolute_error',n_jobs = 4,
            verbose = 5, 
            return_train_score = True,
            random_state=42)
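
The scoring name `neg_mean_absolute_error` can look odd at first. scikit-learn always picks the candidate with the highest score, so an error metric is negated to make "higher is better" hold. A minimal sketch with made-up numbers (the targets and the two prediction lists are purely illustrative):

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up targets and predictions from two hypothetical candidates
y_true = [3.0, 5.0, 2.0]
pred_a = [2.5, 5.0, 3.0]
pred_b = [3.0, 4.0, 2.0]

# Negate the error so that a larger score means a better model
scores = {
    "a": -mean_absolute_error(y_true, pred_a),
    "b": -mean_absolute_error(y_true, pred_b),
}

best = max(scores, key=scores.get)  # candidate with the smallest error
print(scores, best)
```

For the same reason, `best_score_` from this search is a negative number; its absolute value is the mean absolute error.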

In [None]:
random_cv.fit(X_train,y_train)

In [None]:
random_cv.best_estimator_

In [None]:
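# Rebuild the regressor with the values reported by random_cv.best_estimator_.
# silent, nthread and seed are deprecated in favor of verbosity, n_jobs and random_state,
# so they are dropped here; missing is left at its default.
# 'reg:linear' is named 'reg:squarederror' in newer XGBoost releases.
regressor = xgboost.XGBRegressor(base_score=0.25, booster='gbtree', colsample_bylevel=1,
       colsample_bytree=1, gamma=0, learning_rate=0.1, max_delta_step=0,
       max_depth=2, min_child_weight=1, n_estimators=900,
       n_jobs=1, objective='reg:squarederror', random_state=0,
       reg_alpha=0, reg_lambda=1, scale_pos_weight=1,
       verbosity=0, subsample=1)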

In [None]:
regressor.fit(X_train,y_train)