# Machine Learning 301

## XGBoost (eXtreme Gradient Boosting)

* XGBoost is a version of GBM optimized for speed and predictive performance; it is scalable and can be integrated into different platforms.

![Screenshot%202021-10-09%20200234.png](attachment:Screenshot%202021-10-09%20200234.png)

### Model and Prediction

In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt
from sklearn.preprocessing import scale
from sklearn.preprocessing import StandardScaler
from sklearn import model_selection
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn import neighbors
from sklearn.svm import SVR

In [2]:
df = pd.read_csv("Hitters.csv")
df = df.dropna()
dms = pd.get_dummies(df[["League", "Division", "NewLeague"]])
y = df["Salary"]
X_ = df.drop(["Salary", "League", "Division", "NewLeague"], axis = 1).astype("float64")
X = pd.concat([X_, dms[["League_N", "Division_W", "NewLeague_N"]]], axis = 1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 42)
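
The cell above one-hot encodes the categorical columns and keeps a single indicator per two-level category. A minimal sketch of that step on a toy frame follows; the rows and the `toy` name are made up for illustration and are not taken from `Hitters.csv`.

```python
import pandas as pd

# Toy stand-in for Hitters.csv (values are invented for illustration).
toy = pd.DataFrame({
    "Hits": [81, 130, 141],
    "League": ["A", "N", "A"],
    "Salary": [475.0, 480.0, 500.0],
})

# One-hot encoding creates one indicator column per category level.
dummies = pd.get_dummies(toy[["League"]])
dummy_columns = list(dummies.columns)  # ['League_A', 'League_N']

# For a two-level category one indicator carries all the information,
# so only League_N is kept, mirroring the cell above.
X_toy = pd.concat(
    [toy.drop(["Salary", "League"], axis=1).astype("float64"),
     dummies[["League_N"]]],
    axis=1,
)

# drop_first=True asks pandas to drop the first level of each category itself.
auto_columns = list(pd.get_dummies(toy[["League"]], drop_first=True).columns)
```

Keeping both `League_A` and `League_N` would add a redundant column, since each is the complement of the other.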

In [3]:
!pip install xgboost

Collecting xgboost
  Downloading xgboost-1.4.2-py3-none-win_amd64.whl (97.8 MB)
Installing collected packages: xgboost
Successfully installed xgboost-1.4.2


In [3]:
import xgboost

In [4]:
from xgboost import XGBRegressor

In [5]:
xgb = XGBRegressor().fit(X_train, y_train)

In [6]:
xgb

XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
             colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
             importance_type='gain', interaction_constraints='',
             learning_rate=0.300000012, max_delta_step=0, max_depth=6,
             min_child_weight=1, missing=nan, monotone_constraints='()',
             n_estimators=100, n_jobs=12, num_parallel_tree=1, random_state=0,
             reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
             tree_method='exact', validate_parameters=1, verbosity=None)

In [7]:
?xgb

In [9]:
y_pred = xgb.predict(X_test)

In [10]:
np.sqrt(mean_squared_error(y_test, y_pred))

355.46515176059927
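
The value above is the root mean squared error (RMSE) on the test set, expressed in the same units as `Salary`. As a rough sketch of what `np.sqrt(mean_squared_error(...))` computes, the same quantity can be written by hand; the numbers below are invented and the `rmse` helper is hypothetical.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared difference."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Invented targets and predictions, not taken from the Hitters data.
actual = [500.0, 700.0, 300.0]
predicted = [450.0, 800.0, 300.0]

error = rmse(actual, predicted)
print(error)
```

Because the errors are squared before averaging, a single large miss raises RMSE more than several small ones.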

### Model Tuning 

In [11]:
xgb = XGBRegressor()
xgb

XGBRegressor(base_score=None, booster=None, colsample_bylevel=None,
             colsample_bynode=None, colsample_bytree=None, gamma=None,
             gpu_id=None, importance_type='gain', interaction_constraints=None,
             learning_rate=None, max_delta_step=None, max_depth=None,
             min_child_weight=None, missing=nan, monotone_constraints=None,
             n_estimators=100, n_jobs=None, num_parallel_tree=None,
             random_state=None, reg_alpha=None, reg_lambda=None,
             scale_pos_weight=None, subsample=None, tree_method=None,
             validate_parameters=None, verbosity=None)

In [12]:
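xgb_params = {"learning_rate" : [0.1,0.01,0.5],
             "max_depth" : [2,3,4,5,8],
             "n_estimators" : [100,200,500,1000],
             # 0.7, not "0,7": colsample_bytree must lie in (0, 1]. The run logged below
             # used the four entries [0.4, 0, 7, 1], hence its 240 candidates.
             "colsample_bytree" : [0.4,0.7,1]}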

# learning_rate : the shrinkage step size; a hyperparameter used to help prevent overfitting.
# colsample_bytree : the fraction of features (columns) sampled when building each tree.
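
To make the role of `learning_rate` concrete, here is a deliberately simplified boosting loop in plain Python. It is a sketch under assumptions, not XGBoost's algorithm: it uses one-split stumps on a single feature, squared error only, and no regularization. The helper names `fit_stump`, `boost`, and `mse` are hypothetical.

```python
def fit_stump(x, residuals):
    """Find the single split on x that best fits the residuals (squared error)."""
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        sse = (sum((r - left_value) ** 2 for r in left)
               + sum((r - right_value) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    return best[1:]

def boost(x, y, n_rounds, learning_rate):
    """Start from the mean, then add shrunken stumps fitted to the residuals."""
    pred = [sum(y) / len(y)] * len(y)
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        threshold, left_value, right_value = fit_stump(x, residuals)
        # Each stump's contribution is scaled down by learning_rate.
        pred = [p + learning_rate * (left_value if xi <= threshold else right_value)
                for xi, p in zip(x, pred)]
    return pred

def mse(y, pred):
    return sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y)

# Invented one-feature data with a step around x = 3.
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]

for lr, rounds in [(1.0, 1), (0.1, 1), (0.1, 5), (0.1, 50)]:
    print(lr, rounds, round(mse(y, boost(x, y, rounds, lr)), 4))
```

With a small `learning_rate` each round corrects only part of the remaining error, so more rounds are needed to reach the same training fit. This is why `learning_rate` and `n_estimators` are usually tuned together.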

In [13]:
xgb_cv_model = GridSearchCV(xgb, xgb_params, cv = 10, n_jobs = -1, verbose = 2).fit(X_train, y_train)

Fitting 10 folds for each of 240 candidates, totalling 2400 fits


 0.57278392 0.57241906 0.59627601 0.59585321 0.59602367 0.59602596
 0.56627225 0.56584503 0.5658342  0.56583406 0.57023134 0.569948
 0.56994177 0.56994177 0.32703361 0.57077995 0.56374141 0.57963507
 0.32221587 0.56248674 0.55785125 0.55621181 0.30807737 0.57732402
 0.57953951 0.57670565 0.29421743 0.56952245 0.58280764 0.57952472
 0.28035705 0.56281073 0.59503778 0.58946485 0.51390903 0.50926029
 0.50756829 0.50755878 0.57517485 0.57462365 0.57462635 0.57462637
 0.43909175 0.43910875 0.43910875 0.43910875 0.47308451 0.47308454
 0.47308455 0.47308456 0.47808034 0.47808034 0.47808034 0.47808034
 0.55023658 0.54598233 0.55363043 0.56001932 0.53831157 0.5328179
 0.53040733 0.52607865 0.51329184 0.49417546 0.48304315 0.48139042
 0.48899813 0.47522972 0.46913108 0.46847782 0.45499029 0.44884559
 0.44813288 0.44812242 0.1449153  0.44746244 0.56691207 0.58000052
 0.12295829 0.43885329 0.57500504 0.5902817  0.11466295 0.4330343
 0.56953597 0.57868034 0.11104419 0.4262279  0.54483407 0.55690445
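
The log line "Fitting 10 folds for each of 240 candidates, totalling 2400 fits" follows directly from the size of the grid. The sketch below reproduces that count with the standard library, assuming the four `colsample_bytree` entries of the logged run; the variable names are illustrative.

```python
from itertools import product

# Grid as used in the logged run: four colsample_bytree entries
# (0.7 was entered as "0, 7"; the values do not matter for counting).
param_grid = {
    "learning_rate": [0.1, 0.01, 0.5],
    "max_depth": [2, 3, 4, 5, 8],
    "n_estimators": [100, 200, 500, 1000],
    "colsample_bytree": [0.4, 0, 7, 1],
}
cv_folds = 10

# Every combination of one value per parameter is one candidate.
candidates = [dict(zip(param_grid, combo)) for combo in product(*param_grid.values())]
n_candidates = len(candidates)
n_fits = n_candidates * cv_folds

print(n_candidates, n_fits)  # 240 2400
```

The count multiplies across parameters (3 × 5 × 4 × 4), so each extra value in any list noticeably increases the search time.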

In [14]:
xgb_cv_model.best_params_

{'colsample_bytree': 0.4,
 'learning_rate': 0.1,
 'max_depth': 2,
 'n_estimators': 1000}

In [15]:
xgb_tuned = XGBRegressor(colsample_bytree = 0.4, learning_rate = 0.1, max_depth = 2, n_estimators = 1000).fit(X_train, y_train)

In [16]:
y_pred = xgb_tuned.predict(X_test)

In [17]:
np.sqrt(mean_squared_error(y_test, y_pred))

367.8515299923177
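
The tuned model's test RMSE (367.85) is higher than the default model's (355.47). One point worth checking is the selection metric: when `scoring` is not given, `GridSearchCV` ranks candidates with the estimator's own `score` method, which is R² for regressors, whereas the notebook evaluates with RMSE. The sketch below shows how the search could select by RMSE instead. It is an illustration under assumptions: it uses synthetic data and scikit-learn's `GradientBoostingRegressor` as a stand-in so that it runs without `Hitters.csv` or xgboost, and the small grid is arbitrary.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic data so the sketch runs without Hitters.csv.
X_demo, y_demo = make_regression(n_samples=120, n_features=5, noise=10.0, random_state=42)

# Small arbitrary grid, chosen only to keep the example fast.
demo_grid = {"learning_rate": [0.1, 0.5], "max_depth": [2, 3]}

search = GridSearchCV(
    GradientBoostingRegressor(n_estimators=50, random_state=42),
    demo_grid,
    cv=3,
    scoring="neg_root_mean_squared_error",  # select by RMSE instead of the default R^2
)
search.fit(X_demo, y_demo)

# Scores are negated so that "larger is better"; flip the sign to read RMSE.
cv_rmse = -search.best_score_
print(search.best_params_, round(cv_rmse, 2))
```

Even with a matching metric, a single 25% test split of a small dataset is noisy, so a tuned model scoring slightly worse on it than the default is not unusual.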