# Extreme Gradient Boosting (XGBoost)

* **XGBoost** is an optimized implementation of **GBM** (Gradient Boosting Machines), designed to improve both speed and prediction performance, and it can be integrated into different platforms.

* It can be used with R, Python, Hadoop, Scala and Julia.

* It is scalable.

* It is fast.

* Its prediction performance is high.

* Its success has been demonstrated in many Kaggle competitions.
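One concrete way XGBoost differs from plain GBM is that each new tree is fitted against a regularized objective. The formulas below follow the model description in the XGBoost documentation ("Introduction to Boosted Trees"). Here $\lambda$ corresponds to the `reg_lambda` parameter, $\gamma$ to `gamma`, and $\eta$ to `learning_rate`. All three appear in the model's parameter list later in this notebook.

$$
\text{obj} = \sum_{i} l\left(y_i, \hat{y}_i\right) + \sum_{k} \Omega(f_k), \qquad \Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2
$$

$T$ is the number of leaves of tree $f$, and $w_j$ is the value (weight) of leaf $j$. Let $g_i$ and $h_i$ be the first and second derivatives of the loss with respect to the previous prediction. Let $G_j = \sum_{i \in I_j} g_i$ and $H_j = \sum_{i \in I_j} h_i$, summed over the rows $I_j$ that fall into leaf $j$. Then the optimal leaf weight is

$$
w_j^{*} = -\frac{G_j}{H_j + \lambda}
$$

For squared error, $l = \frac{1}{2}(y_i - \hat{y}_i)^2$, so $g_i = \hat{y}_i - y_i$ and $h_i = 1$. Substituting these gives

$$
w_j^{*} = \frac{\sum_{i \in I_j} (y_i - \hat{y}_i)}{n_j + \lambda}
$$

This is the mean residual in the leaf, shrunk toward zero by $\lambda$, where $n_j$ is the number of rows in the leaf. The prediction is then updated as $\hat{y}_i \leftarrow \hat{y}_i + \eta\, f(x_i)$.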

## 1-) Data Preprocessing

In [1]:
import numpy as np
import pandas as pd 
from sklearn.model_selection import train_test_split

In [2]:
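# Load the Hitters data and drop rows with missing values (Salary is missing for some players)
hit = pd.read_csv("Hitters.csv")
df = hit.copy()
df = df.dropna()
# One-hot encode the three categorical variables
dms = pd.get_dummies(df[['League', 'Division', 'NewLeague']])
# Target: player salary
y = df["Salary"]
# Numeric predictors: drop the target and the raw categorical columns
X_ = df.drop(['Salary', 'League', 'Division', 'NewLeague'], axis=1).astype('float64')
# Keep one dummy column per two-level categorical variable
X = pd.concat([X_, dms[['League_N', 'Division_W', 'NewLeague_N']]], axis=1)
# Hold out 25% of the rows as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.25,
                                                    random_state=42)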
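The dummy-encoding step is easier to picture on a few rows. The sketch below uses a tiny hypothetical DataFrame whose values are made up and do not come from Hitters.csv. It shows what `pd.get_dummies` produces and why only `League_N`, `Division_W` and `NewLeague_N` are kept. For a two-level variable, one 0/1 column already carries all the information, because the other column is its complement. `dtype=float` is added here only so the output shows 0.0/1.0, since recent pandas versions otherwise return True/False columns.

```python
import pandas as pd

# Hypothetical toy rows with the same categorical columns as Hitters.csv (made-up values)
toy = pd.DataFrame({
    "Hits": [81, 130, 141],
    "League": ["A", "N", "N"],
    "Division": ["E", "W", "E"],
    "NewLeague": ["A", "N", "A"],
    "Salary": [475.0, 480.0, 500.0],
})

# One 0/1 column per level (dtype=float gives 0.0/1.0 instead of True/False)
dms_toy = pd.get_dummies(toy[["League", "Division", "NewLeague"]], dtype=float)
print(list(dms_toy.columns))
# ['League_A', 'League_N', 'Division_E', 'Division_W', 'NewLeague_A', 'NewLeague_N']

# For a two-level variable the two columns are complements, so one is redundant
assert (dms_toy["League_A"] + dms_toy["League_N"] == 1).all()

# Keep one indicator per variable, as the notebook does
X_toy = pd.concat(
    [toy.drop(["Salary", "League", "Division", "NewLeague"], axis=1).astype("float64"),
     dms_toy[["League_N", "Division_W", "NewLeague_N"]]],
    axis=1,
)
print(X_toy)
```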

In [3]:
# !pip install xgboost

## 2-) Model

In [4]:
import xgboost as xgb

In [5]:
from xgboost import XGBRegressor

In [6]:
xgb_model = XGBRegressor(random_state=42).fit(X_train, y_train)

## 3-) Prediction

In [7]:
from sklearn.metrics import mean_squared_error

In [8]:
y_pred = xgb_model.predict(X_test)

In [9]:
test_error_before = np.sqrt(mean_squared_error(y_test, y_pred))
test_error_before  # test RMSE before model tuning

355.4651481224188
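The value printed above is the root mean squared error (RMSE) on the test set, expressed in the same units as `Salary`. The following minimal NumPy sketch shows what `np.sqrt(mean_squared_error(...))` computes, using made-up numbers:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y_true - y_pred) ** 2))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Made-up values: the errors are 0, 0 and 2, so RMSE = sqrt(4 / 3)
print(rmse([100, 200, 300], [100, 200, 302]))
```

Errors are squared before averaging, so a few large errors (for example on very highly paid players) can dominate the RMSE.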

## 4-) Model Tuning

* In this section, we will try to find the optimal values of **learning_rate, max_depth, colsample_bytree, n_estimators** with the GridSearchCV method.

* **GridSearchCV** ==>> Grid Search Cross-Validation: every combination of the candidate values is evaluated with k-fold cross-validation on the training set.

* Then we will build the final model with the optimal **learning_rate, max_depth, colsample_bytree, n_estimators**.

* **learning_rate, max_depth, colsample_bytree, n_estimators** are hyperparameters. They are not learned from the data. We set them before training, and we want to choose them as well as possible.

* Instead of choosing these values by intuition, we will search for them systematically with grid search.

* **colsample_bytree** ==>> fraction of the independent variables randomly sampled for each tree (XGBoost has no **max_features** parameter; column sampling plays that role)

* **max_depth** ==>> maximum depth of each tree

* **n_estimators** ==>> number of trees (boosting rounds) to be used
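Grid search with cross-validation can be sketched without scikit-learn. The example below is a simplified stand-in for `GridSearchCV`, not its implementation. The model is a hypothetical closed-form ridge regression, chosen only so the example needs nothing but NumPy. Its single hyperparameter `alpha` plays the role of the XGBoost hyperparameters in the grid. The score is R², which is also what `GridSearchCV` maximizes by default for a regressor.

```python
import itertools
import numpy as np

def kfold_indices(n_rows, k, seed=0):
    """Shuffle the row indices and split them into k roughly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_rows), k)

def fit_ridge(X, y, alpha):
    """Hypothetical stand-in model: closed-form ridge regression with an intercept."""
    Xb = np.column_stack([np.ones(len(X)), X])
    penalty = alpha * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0  # do not penalize the intercept
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)

def r2(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def grid_search_cv(X, y, grid, k=10, seed=0):
    """Score every combination in `grid` by mean k-fold R^2 and return the best."""
    folds = kfold_indices(len(y), k, seed)
    names = list(grid)
    results = []
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = []
        for i, val_idx in enumerate(folds):
            # Train on the other k-1 folds, validate on fold i
            train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
            coef = fit_ridge(X[train_idx], y[train_idx], **params)
            y_hat = coef[0] + X[val_idx] @ coef[1:]
            scores.append(r2(y[val_idx], y_hat))
        results.append((float(np.mean(scores)), params))
    best_score, best_params = max(results, key=lambda item: item[0])
    return best_params, best_score, results

# Synthetic data: only the first of three features matters
rng = np.random.default_rng(42)
X_syn = rng.normal(size=(200, 3))
y_syn = 3.0 * X_syn[:, 0] + rng.normal(scale=0.5, size=200)

best_params, best_score, results = grid_search_cv(
    X_syn, y_syn, {"alpha": [0.01, 1.0, 1000.0]}, k=10
)
print(best_params, round(best_score, 3))
```

This sketch differs from `GridSearchCV` in two ways. First, with an integer `cv` and a regressor, scikit-learn uses `KFold` without shuffling, whereas this sketch shuffles. Second, with the default `refit=True`, `GridSearchCV` refits the best combination on the whole training set and exposes it as `best_estimator_`.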

In [10]:
xgb_model

XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
             colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
             importance_type='gain', interaction_constraints='',
             learning_rate=0.300000012, max_delta_step=0, max_depth=6,
             min_child_weight=1, missing=nan, monotone_constraints='()',
             n_estimators=100, n_jobs=0, num_parallel_tree=1, random_state=42,
             reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
             tree_method='exact', validate_parameters=1, verbosity=None)

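* **booster = 'gbtree'** => the model uses decision trees as base learners.

* **colsample_bytree = 1** => every tree is built using all of the independent variables (a value of 0.4 would sample roughly 40% of them for each tree).

* **learning_rate = 0.300000012** => step-size shrinkage. Each new tree's contribution is multiplied by this value before being added to the model. Smaller values make boosting more conservative and are one of the main ways to prevent overfitting. (0.300000012 is simply 0.3 stored as a 32-bit float.)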
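The following minimal gradient-boosting sketch for squared error in NumPy shows what `learning_rate` and `n_estimators` control. It is plain GBM with depth-1 trees (stumps), not XGBoost's algorithm. It leaves out XGBoost's second-order split criterion, its regularization (`reg_lambda`, `gamma`) and column sampling. Each round fits a stump to the current residuals and adds it scaled by `learning_rate`. As a result, a smaller learning rate typically needs more trees to reach the same fit. The data are synthetic.

```python
import numpy as np

def fit_stump(x, r):
    """Least-squares depth-1 tree (stump) on a single feature x for targets r."""
    best = None
    for t in np.unique(x)[:-1]:  # candidate thresholds; keep both sides non-empty
        left = x <= t
        lv, rv = r[left].mean(), r[~left].mean()
        sse = np.sum((r[left] - lv) ** 2) + np.sum((r[~left] - rv) ** 2)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    _, t, lv, rv = best
    return lambda z: np.where(z <= t, lv, rv)

def boost(x, y, learning_rate, n_estimators):
    """Plain gradient boosting for squared error with shrinkage."""
    base = y.mean()
    pred = np.full(len(y), base)
    stumps = []
    for _ in range(n_estimators):
        residual = y - pred  # negative gradient of 0.5 * (y - pred) ** 2
        stump = fit_stump(x, residual)
        pred = pred + learning_rate * stump(x)  # shrink each tree's contribution
        stumps.append(stump)
    return base, learning_rate, stumps

def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(s(x) for s in stumps)

# Synthetic 1-D regression data
rng = np.random.default_rng(0)
x_syn = np.sort(rng.uniform(0, 10, 100))
y_syn = np.sin(x_syn) + rng.normal(scale=0.1, size=100)

train_rmse = {}
for lr, n in [(0.3, 10), (0.05, 10), (0.05, 60)]:
    model = boost(x_syn, y_syn, lr, n)
    train_rmse[(lr, n)] = float(np.sqrt(np.mean((y_syn - predict(model, x_syn)) ** 2)))
    print(f"learning_rate={lr}, n_estimators={n}: train RMSE={train_rmse[(lr, n)]:.3f}")
```

The sketch reports training error only. A lower training RMSE is not the goal in itself, which is why `learning_rate` and `n_estimators` are tuned together with cross-validation below.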

In [11]:
from sklearn.model_selection import GridSearchCV

In [12]:
# Candidate values: 5 x 4 x 5 x 3 = 300 combinations
xgb_grid = {
    'colsample_bytree': [0.4, 0.5, 0.6, 0.9, 1],
    'n_estimators': [100, 200, 500, 1000],
    'max_depth': [2, 3, 4, 5, 6],
    'learning_rate': [0.1, 0.01, 0.5]
}
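Each value list in the grid is crossed with every other list, so this grid has 5 × 4 × 5 × 3 = 300 candidates. With `cv = 10`, each candidate is fitted ten times, which accounts for the "300 candidates, totalling 3000 fits" in the log of the search. The sketch below enumerates the candidates with `itertools.product`. The enumeration order may differ from the order `GridSearchCV` uses internally.

```python
import itertools

# Same grid as xgb_grid above
xgb_grid = {
    'colsample_bytree': [0.4, 0.5, 0.6, 0.9, 1],
    'n_estimators': [100, 200, 500, 1000],
    'max_depth': [2, 3, 4, 5, 6],
    'learning_rate': [0.1, 0.01, 0.5],
}

# Every candidate is one element of the Cartesian product of the value lists
candidates = [dict(zip(xgb_grid, values))
              for values in itertools.product(*xgb_grid.values())]

n_folds = 10
print(len(candidates), "candidates,", len(candidates) * n_folds, "fits")
# 300 candidates, 3000 fits
```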


In [13]:
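xgb_base = XGBRegressor(random_state=42)  # untuned estimator; avoids shadowing the `xgb` module alias

In [14]:
xgb_cv_model = GridSearchCV(xgb_base,
                            param_grid=xgb_grid,
                            cv=10,       # 10-fold cross-validation
                            n_jobs=-1,   # use all CPU cores
                            verbose=2)   # print progress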

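* GridSearchCV fits and cross-validates a model for every combination of the values in the grid (here 5 × 4 × 5 × 3 = 300 candidates, each fitted on 10 folds). It keeps the combination with the best mean cross-validated score. For a regressor such as XGBRegressor, the default score is R².

* Because of the number of fits, this process takes a long time. Setting **n_jobs = -1** runs the fits in parallel on all available CPU cores, which reduces the wall-clock time.

* The _**verbose**_ parameter controls how much progress information is printed. With **verbose = 2**, the log reports how many fits have finished and how much time has elapsed, as shown below.
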
In [15]:
xgb_cv_model.fit(X_train, y_train)

Fitting 10 folds for each of 300 candidates, totalling 3000 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  33 tasks      | elapsed:    2.6s
[Parallel(n_jobs=-1)]: Done 240 tasks      | elapsed:   13.9s
[Parallel(n_jobs=-1)]: Done 646 tasks      | elapsed:   34.1s
[Parallel(n_jobs=-1)]: Done 1212 tasks      | elapsed:  1.1min
[Parallel(n_jobs=-1)]: Done 1777 tasks      | elapsed:  1.7min
[Parallel(n_jobs=-1)]: Done 2222 tasks      | elapsed:  2.3min
[Parallel(n_jobs=-1)]: Done 2749 tasks      | elapsed:  3.0min
[Parallel(n_jobs=-1)]: Done 3000 out of 3000 | elapsed:  3.4min finished


GridSearchCV(cv=10,
             estimator=XGBRegressor(base_score=None, booster=None,
                                    colsample_bylevel=None,
                                    colsample_bynode=None,
                                    colsample_bytree=None, gamma=None,
                                    gpu_id=None, importance_type='gain',
                                    interaction_constraints=None,
                                    learning_rate=None, max_delta_step=None,
                                    max_depth=None, min_child_weight=None,
                                    missing=nan, monotone_constraints=None,
                                    n_estimators=100, n_jobs=None,
                                    num_parallel_tree=None, random_state=42,
                                    reg_alpha=None, reg_lambda=None,
                                    scale_pos_weight=None, subsample=None,
                                    tree_method=None, validate_param

In [16]:
xgb_cv_model.best_params_

{'colsample_bytree': 0.4,
 'learning_rate': 0.1,
 'max_depth': 2,
 'n_estimators': 200}

### 4.1) Tuned Model

In [17]:
xgb_tuned = XGBRegressor(colsample_bytree = 0.4, 
                         learning_rate = 0.1, 
                         max_depth = 2, 
                         n_estimators = 200) 

In [18]:
xgb_tuned = xgb_tuned.fit(X_train,y_train)

In [19]:
y_pred1 = xgb_tuned.predict(X_test)

In [20]:
test_error_after = np.sqrt(mean_squared_error(y_test, y_pred1))
test_error_after  # test RMSE after model tuning

359.7887894342941
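After tuning, the test RMSE (359.79) is slightly higher than before tuning (355.47). Grid search picks the combination with the best cross-validated score on the training data, which does not guarantee a lower error on one particular held-out split. The test set is also small: only a quarter of the rows, about 66 with the usual Hitters data. One way to judge whether such a gap matters is a paired bootstrap over the test rows. The sketch below resamples the same rows for both models and returns a percentile interval for the RMSE difference. The arrays in the demo are made-up stand-ins, not the notebook's predictions.

```python
import numpy as np

def paired_bootstrap_rmse_diff(y_true, pred_a, pred_b, n_boot=2000, seed=0):
    """Percentile 95% interval for RMSE(pred_b) - RMSE(pred_a),
    resampling the same test rows for both models."""
    y_true, pred_a, pred_b = (np.asarray(v, dtype=float) for v in (y_true, pred_a, pred_b))
    rng = np.random.default_rng(seed)
    n = len(y_true)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample row indices with replacement
        rmse_a = np.sqrt(np.mean((y_true[idx] - pred_a[idx]) ** 2))
        rmse_b = np.sqrt(np.mean((y_true[idx] - pred_b[idx]) ** 2))
        diffs[i] = rmse_b - rmse_a
    return np.percentile(diffs, [2.5, 97.5])

# Made-up stand-ins for y_test, y_pred and y_pred1 (66 rows)
rng = np.random.default_rng(1)
y_true = rng.normal(500, 400, size=66)
pred_before = y_true + rng.normal(0, 350, size=66)
pred_after = y_true + rng.normal(0, 360, size=66)
low, high = paired_bootstrap_rmse_diff(y_true, pred_before, pred_after)
print(f"95% interval for RMSE(after) - RMSE(before): [{low:.1f}, {high:.1f}]")
```

With the notebook's variables, the call would be `paired_bootstrap_rmse_diff(y_test, y_pred, y_pred1)`. An interval that contains 0 suggests the two models cannot be reliably told apart on this test set. Percentile bootstrap intervals are only approximate on this few rows.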