# Light GBM 
- Light GBM is another GBM variant, developed to improve on the training time of XGBoost.
- It offers shorter training times than XGBoost.
- It is generally more efficient in training speed; predictive accuracy still depends on the data and on tuning.
- It is based on decision trees.
- It uses a leaf-wise growth strategy instead of a level-wise one.
- By default, XGBoost grows trees level by level, in a breadth-first search (BFS) manner. Light GBM grows them leaf by leaf, always splitting the leaf with the largest gain, which behaves more like a depth-first search (DFS) than a breadth-first one.
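
The difference between the two growth strategies can be sketched with a toy example. The snippet below is not the actual implementation of Light GBM or XGBoost: the `GAINS` table and the `grow_level_wise` / `grow_leaf_wise` helpers are hypothetical and only show the order in which nodes get split under a fixed budget of splits.

```python
import heapq
from collections import deque

# Hypothetical split gains for nodes of a binary tree, indexed heap-style
# (root = 1, children of node i are 2*i and 2*i + 1). Made-up numbers.
GAINS = {1: 10.0, 2: 1.0, 3: 8.0, 6: 5.0, 7: 6.0, 14: 4.0, 15: 0.5}


def gain(node):
    """Return the (made-up) gain of splitting a node, 0.0 if unknown."""
    return GAINS.get(node, 0.0)


def grow_level_wise(n_splits):
    """Simplified level-wise growth: split nodes in breadth-first order."""
    queue, order = deque([1]), []
    while queue and len(order) < n_splits:
        node = queue.popleft()
        order.append(node)
        queue.extend([2 * node, 2 * node + 1])
    return order


def grow_leaf_wise(n_splits):
    """Leaf-wise growth: always split the current leaf with the largest gain."""
    heap, order = [(-gain(1), 1)], []  # max-heap via negated gains
    while heap and len(order) < n_splits:
        _, node = heapq.heappop(heap)
        order.append(node)
        for child in (2 * node, 2 * node + 1):
            heapq.heappush(heap, (-gain(child), child))
    return order


level_order = grow_level_wise(4)
leaf_order = grow_leaf_wise(4)
print("level-wise split order:", level_order)
print("leaf-wise split order: ", leaf_order)
```

With the same budget of four splits, the level-wise version fills the tree one level at a time, while the leaf-wise version follows the high-gain branch downward and leaves low-gain nodes unsplit. This is why leaf-wise trees tend to be deeper and less balanced.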

**Required Libraries**

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import scale, StandardScaler
from sklearn import model_selection
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn import neighbors
from sklearn.svm import SVR

In [2]:
# Suppress warning messages so they do not clutter the output.
from warnings import filterwarnings
filterwarnings("ignore")

# Light GBM - Model and Prediction

In [6]:
df = pd.read_csv("./Hitters.csv")
# Drop observations with missing values (NA).
df = df.dropna()

# Convert the categorical variables in the data set to dummy variables.
dms = pd.get_dummies(df[["League","Division","NewLeague"]])

# Dependent variable
y = df[["Salary"]]

# Drop the dependent variable and the original categorical columns from the data set.
X_ = df.drop(["Salary","League","Division","NewLeague"], axis = 1).astype("float64")

# Concatenate X_ with one dummy column per categorical variable to build the independent variables.
X = pd.concat([X_, dms[["League_N","Division_W","NewLeague_N"]]], axis=1)

# Create the train and test sets.
X_train, X_test, y_train, y_test = train_test_split(X,
                                                    y,
                                                    test_size = 0.25,
                                                    random_state= 42)

In [4]:
!pip install lightgbm 

Collecting lightgbm
  Downloading lightgbm-3.1.1-py2.py3-none-win_amd64.whl (754 kB)
Installing collected packages: lightgbm
Successfully installed lightgbm-3.1.1


In [5]:
import lightgbm
from lightgbm import LGBMRegressor

**Model**

In [7]:
lightbgm_model = LGBMRegressor().fit(X_train,y_train)

In [10]:
?lightbgm_model

Type:           LGBMRegressor
String form:    LGBMRegressor()
File:           c:\users\halil\anaconda3\lib\site-packages\lightgbm\sklearn.py
Docstring:      LightGBM regressor.
Init docstring:
Construct a gradient boosting model.

Parameters
----------
boosting_type : string, optional (default='gbdt')
    'gbdt', traditional Gradient Boosting Decision Tree.
    'dart', Dropouts meet Multiple Additive Regression Trees.
    'goss', Gradient-based One-Side Sampling.
    'rf', Random Forest.
num_leaves : int, optional (default=31)
    Maximum tree leaves for base learners.
max_depth : int, optional (default=-1)
    Maximum tree depth for base learners, <=0 means no limit.
learning_rate : float, optional (default=0.1)
    Boosting learning rate.
    You can use ``callbacks`` parameter of ``fit`` method to shrink/adapt learning rate
    in training using ``reset_parameter`` callback.
    Note, that this will ignore the ``learning_rate`` 

**Prediction**

In [12]:
# Baseline test error (RMSE) of the untuned model
y_pred = lightbgm_model.predict(X_test)
np.sqrt(mean_squared_error(y_test,y_pred))

363.8712087611089
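
The value above is the root mean squared error (RMSE), expressed in the same units as `Salary`. As a minimal sketch of what `np.sqrt(mean_squared_error(...))` computes, the helper below writes the formula out in plain Python. The name `rmse` and the toy numbers are illustrative and do not come from the notebook.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y - y_hat) ** 2))."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy values, chosen only to make the arithmetic easy to follow:
# errors are 10, 10 and 30, so the squared errors are 100, 100 and 900.
toy_true = [100.0, 200.0, 300.0]
toy_pred = [110.0, 190.0, 330.0]
toy_rmse = rmse(toy_true, toy_pred)
print(toy_rmse)
```

Because the errors are squared before averaging, the single error of 30 dominates the result, which is why RMSE is sensitive to a few large misses.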

## Model Tuning

In [19]:
lbgm_model = LGBMRegressor().fit(X_train,y_train)
lbgm_model

LGBMRegressor()

In [22]:
lgbm_params = {"learning_rate": [0.01,0.1,0.5,1],
              "n_estimators": [20,40,100,200,500,1000],
              "max_depth":[1,2,3,4,5,6,7,8,9,10]}
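
One detail worth keeping in mind with this grid: only `max_depth` is varied, while `num_leaves` stays at its default of 31 (see the docstring above). A binary tree of depth d has at most 2^d leaves, so for the deeper settings the leaf limit, not the depth, is likely the tighter constraint. The short calculation below sketches that arithmetic only. `effective_leaf_cap` is a hypothetical helper, and other defaults, such as the minimum number of samples per leaf, may restrict the trees further.

```python
DEFAULT_NUM_LEAVES = 31  # LGBMRegressor default, per the docstring


def effective_leaf_cap(max_depth, num_leaves=DEFAULT_NUM_LEAVES):
    """Upper bound on leaves per tree: min(num_leaves, 2 ** max_depth)."""
    return min(num_leaves, 2 ** max_depth)


# Leaf cap for every max_depth value in the grid (1 through 10).
caps = {depth: effective_leaf_cap(depth) for depth in range(1, 11)}
for depth, cap in caps.items():
    print(f"max_depth={depth:>2} -> at most {cap} leaves")
```

Under this bound, `max_depth` values from 5 to 10 all share the same cap of 31 leaves, which is one reason to consider adding `num_leaves` to the grid when tuning Light GBM.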

In [23]:
lbgm_cv_model = GridSearchCV(lbgm_model,
                             lgbm_params,
                             cv=10,
                             n_jobs=-1,
                             verbose=2).fit(X_train,y_train)

Fitting 10 folds for each of 240 candidates, totalling 2400 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    0.1s
[Parallel(n_jobs=-1)]: Done 504 tasks      | elapsed:    5.9s
[Parallel(n_jobs=-1)]: Done 1316 tasks      | elapsed:   15.0s
[Parallel(n_jobs=-1)]: Done 2400 out of 2400 | elapsed:   28.1s finished
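
The counts in the log follow directly from the grid: 4 learning rates × 6 tree counts × 10 depths = 240 candidates, each fitted on 10 folds. A quick check using only the standard library (the name `cv_folds` is introduced here for illustration):

```python
from itertools import product

lgbm_params = {"learning_rate": [0.01, 0.1, 0.5, 1],
               "n_estimators": [20, 40, 100, 200, 500, 1000],
               "max_depth": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}
cv_folds = 10

# Every combination of one value per hyperparameter is one candidate.
candidates = list(product(*lgbm_params.values()))
n_candidates = len(candidates)
n_fits = n_candidates * cv_folds
print(n_candidates, n_fits)
```

With the default `refit=True`, `GridSearchCV` then fits the best candidate once more on the full training set, so the search cost grows multiplicatively with every value added to the grid.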


In [24]:
lbgm_cv_model.best_params_

{'learning_rate': 0.1, 'max_depth': 6, 'n_estimators': 20}

**Final Model**

In [25]:
lgbm_tuned = LGBMRegressor(learning_rate=0.1,max_depth=6,n_estimators=20).fit(X_train,y_train)

In [26]:
y_tuned_pred = lgbm_tuned.predict(X_test)
np.sqrt(mean_squared_error(y_test,y_tuned_pred))

371.5044868943621
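
The tuned model's test RMSE (371.50) is slightly higher than that of the untuned model (363.87). This can happen because `GridSearchCV` selects parameters by cross-validated score on the training data, and for regressors the default score is R², not RMSE; the test set plays no part in the selection. As a hedged sketch of how the search could be aligned with the reported metric, the example below passes `scoring="neg_root_mean_squared_error"` and reuses `best_estimator_` instead of retyping the parameters. To stay lightweight it runs on synthetic data, with `DecisionTreeRegressor` as a stand-in for `LGBMRegressor`. The data, the grid and the variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (illustrative only, not the Hitters data).
rng = np.random.RandomState(42)
X_demo = rng.uniform(-3, 3, size=(200, 2))
y_demo = np.sin(X_demo[:, 0]) + 0.5 * X_demo[:, 1] + rng.normal(0, 0.1, 200)

Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42)

demo_grid = {"max_depth": [2, 4, 6]}

# Select by (negated) RMSE so the search optimizes the metric being reported.
demo_search = GridSearchCV(DecisionTreeRegressor(random_state=0),
                           demo_grid,
                           cv=5,
                           scoring="neg_root_mean_squared_error")
demo_search.fit(Xd_train, yd_train)

cv_rmse = -demo_search.best_score_        # mean CV RMSE of the best candidate
best_model = demo_search.best_estimator_  # already refit on the full training set
test_rmse = np.sqrt(mean_squared_error(yd_test, best_model.predict(Xd_test)))
print(demo_search.best_params_, cv_rmse, test_rmse)
```

Even with a matching metric, the cross-validated RMSE and the test RMSE will generally differ, so a small gap between a tuned and an untuned model on one test split is not conclusive on its own.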