# Gradient Boosting Machines

***A generalized version of AdaBoost that can easily be adapted to both classification and regression problems.***

***A series of models is built on the residuals: each model is fit to the errors left by the previous ones, and together they form a single predictive model.***
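
The idea of fitting a series of models to residuals can be sketched in a few lines. This is a minimal illustration for regression with squared-error loss, where the negative gradient is exactly the residual; it is not how `GradientBoostingClassifier` is implemented internally. The synthetic data and the names `X_demo`, `y_demo`, `learning_rate`, and `n_stages` are assumptions made for this example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic 1-D regression data (assumption: for illustration only)
rng = np.random.default_rng(0)
X_demo = rng.uniform(-3, 3, size=(200, 1))
y_demo = np.sin(X_demo[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
n_stages = 50

# Stage 0: start from a constant prediction (the mean minimizes squared error)
prediction = np.full_like(y_demo, y_demo.mean())
mse_start = np.mean((y_demo - prediction) ** 2)

trees = []
for _ in range(n_stages):
    # Residuals are the negative gradient of the squared-error loss
    residuals = y_demo - prediction
    # Fit a shallow tree to the residuals, not to the original target
    tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X_demo, residuals)
    # Add a shrunken version of the new tree to the running prediction
    prediction += learning_rate * tree.predict(X_demo)
    trees.append(tree)

mse_end = np.mean((y_demo - prediction) ** 2)
print(f"training MSE: {mse_start:.4f} -> {mse_end:.4f}")
```

The final model is the sum of the constant and all shrunken trees, which is why the series behaves as a single predictive model.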

# Model

In [1]:
import numpy as np
import pandas as pd 
import statsmodels.api as sm
import statsmodels.formula.api as smf
import seaborn as sns
from sklearn.preprocessing import scale 
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report
from sklearn.metrics import roc_auc_score, roc_curve
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn import tree
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier

from warnings import filterwarnings
filterwarnings('ignore')

diabetes = pd.read_csv("diabetes.csv")

In [2]:
df = diabetes.copy()
df = df.dropna()
y = df["Outcome"]
X = df.drop(["Outcome"], axis=1)  # already a DataFrame, no conversion needed
X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                    test_size=0.30, 
                                                    random_state=42)

In [3]:
from sklearn.ensemble import GradientBoostingClassifier

In [4]:
gbm_model = GradientBoostingClassifier().fit(X_train,y_train)

In [5]:
y_pred = gbm_model.predict(X_test)
accuracy_score(y_test, y_pred)

0.7489177489177489

# Model Tuning

In [6]:
?gbm_model

Type:        GradientBoostingClassifier
String form: GradientBoostingClassifier()
Length:      100
File:        /opt/anaconda3/lib/python3.9/site-packages/sklearn/ensemble/_gb.py
Docstring:
Gradient Boosting for classification.

GB builds an additive model in a
forward stage-wise fashion; it allows for the optimization of
arbitrary differentiable loss functions. In each stage ``n_classes_``
regression trees are fit on the negative gradient of the
binomial or multinomial deviance loss function. Binary classification
is a special case where only a single regression tree is induced.

Read more in the :ref:`User Guide <gradient_boosting>`.

Parameters
----------
loss : {'deviance', 'exponential'}, default='deviance'
    The loss function to be optimized. 'deviance' refers to
    deviance (= logistic regression) for classification
    with probabilistic outputs. For loss 'exponential' gradient
    boosting recovers the AdaBoost algori
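
The docstring describes the model as additive and built in a forward stage-wise fashion. One way to observe this is `staged_predict`, which yields the predictions after each boosting stage. The sketch below uses synthetic data from `make_classification` rather than the diabetes data, and the names `Xd_train`, `Xd_test`, `yd_train`, `yd_test`, and `stage_acc` are assumptions for this example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (assumption: stands in for a real dataset)
X_demo, y_demo = make_classification(n_samples=400, n_features=8, random_state=0)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.30, random_state=0
)

model = GradientBoostingClassifier(n_estimators=100, random_state=0)
model.fit(Xd_train, yd_train)

# staged_predict yields test predictions after 1, 2, ..., n_estimators trees
stage_acc = [accuracy_score(yd_test, pred) for pred in model.staged_predict(Xd_test)]

# 1-based index of the stage with the highest test accuracy
best_stage = max(range(len(stage_acc)), key=stage_acc.__getitem__) + 1
print(f"stages: {len(stage_acc)}, best stage: {best_stage}")
```

The last entry of `stage_acc` corresponds to the full model, so it matches the accuracy obtained from `model.predict`.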

In [14]:
gbm_params = {"learning_rate": [0.001, 0.01, 0.1, 0.05],
              # 100 appears twice (possibly 1000 was intended), so it is searched twice
              "n_estimators": [100, 500, 100],
              "max_depth": [3, 5, 10],
              "min_samples_split": [2, 5, 10]}

In [15]:
gbm = GradientBoostingClassifier()

gbm_cv = GridSearchCV(gbm, gbm_params, cv = 10, n_jobs = -1, verbose = 2)

In [16]:
gbm_cv.fit(X_train, y_train)

Fitting 10 folds for each of 108 candidates, totalling 1080 fits
[CV] END learning_rate=0.001, max_depth=3, min_samples_split=2, n_estimators=100; total time=   0.1s
[CV] END learning_rate=0.001, max_depth=3, min_samples_split=2, n_estimators=500; total time=   0.6s
[CV] END learning_rate=0.001, max_depth=3, min_samples_split=2, n_estimators=100; total time=   0.1s
[CV] END learning_rate=0.001, max_depth=3, min_samples_split=5, n_estimators=100; total time=   0.1s
[CV] END learning_rate=0.001, max_depth=3, min_samples_split=5, n_estimators=100; total time=   0.1s
[CV] END learning_rate=0.001, max_depth=3, min_samples_split=5, n_estimators=500; total time=   0.5s
[CV] END learning_rate=0.001, max_depth=3, min_samples_split=5, n_estimators=100; total time=   0.1s
[CV] END learning_rate=0.001, max_depth=3, min_samples_split=5, n_estimators=100; total time=   0.1s
[CV] END learning_rate=0.001, max_depth=3, min_samples_split=5, n_estimators=100; total time=   0.1s
[CV] END learning_rate=0.0

GridSearchCV(cv=10, estimator=GradientBoostingClassifier(), n_jobs=-1,
             param_grid={'learning_rate': [0.001, 0.01, 0.1, 0.05],
                         'max_depth': [3, 5, 10],
                         'min_samples_split': [2, 5, 10],
                         'n_estimators': [100, 500, 100]},
             verbose=2)
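
The count of 108 candidates and 1080 fits reported above follows directly from the grid: every combination of values is tried, and each combination is fit once per fold. The sketch below reproduces that arithmetic with `ParameterGrid`; `demo_params` copies the grid from the cell above, and the variable names are assumptions for this example.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the notebook, including the repeated 100 in n_estimators
demo_params = {"learning_rate": [0.001, 0.01, 0.1, 0.05],
               "n_estimators": [100, 500, 100],
               "max_depth": [3, 5, 10],
               "min_samples_split": [2, 5, 10]}

grid = ParameterGrid(demo_params)
n_candidates = len(grid)      # 4 * 3 * 3 * 3 combinations
n_fits = n_candidates * 10    # one fit per candidate per fold (cv=10)

# Duplicate values in a list produce duplicate candidates
n_unique = len({tuple(sorted(p.items())) for p in grid})

print(n_candidates, n_fits, n_unique)
```

Because `n_estimators` lists 100 twice, only 72 of the 108 candidates are distinct; the remaining fits repeat work already done.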

In [21]:
print("Best parameters: " + str(gbm_cv.best_params_))

Best parameters: {'learning_rate': 0.1, 'max_depth': 3, 'min_samples_split': 5, 'n_estimators': 100}
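
Instead of copying the best parameters by hand, they can be passed straight from the search object, which avoids transcription mistakes. This is a minimal sketch on synthetic data with a deliberately small grid so that it runs quickly; `small_grid`, `search`, and the `Xd_*`/`yd_*` names are assumptions for this example, not part of the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data (assumption: stands in for the diabetes data)
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=42)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.30, random_state=42
)

# Small grid so the search finishes in a few seconds
small_grid = {"learning_rate": [0.01, 0.1], "n_estimators": [50, 100]}
search = GridSearchCV(GradientBoostingClassifier(random_state=42), small_grid, cv=3)
search.fit(Xd_train, yd_train)

# Option 1: unpack best_params_ into a fresh estimator
tuned = GradientBoostingClassifier(random_state=42, **search.best_params_)
tuned.fit(Xd_train, yd_train)
acc_manual = accuracy_score(yd_test, tuned.predict(Xd_test))

# Option 2: refit=True is the default, so best_estimator_ is already
# trained on the whole training split with the best parameters
acc_refit = accuracy_score(yd_test, search.best_estimator_.predict(Xd_test))

print(search.best_params_, acc_manual, acc_refit)
```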


In [37]:
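# Note: learning_rate and n_estimators here differ from the best parameters
# printed above (0.1 and 100); the cell numbers suggest a later re-run.
gbm = GradientBoostingClassifier(learning_rate=0.01,
                                 max_depth=3,
                                 min_samples_split=5,
                                 n_estimators=500)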

In [38]:
gbm_tuned = gbm.fit(X_train, y_train)

In [39]:
y_pred = gbm_tuned.predict(X_test)
accuracy_score(y_test, y_pred)

0.7489177489177489
