![LightGBM](https://i.ibb.co/PFRbFKw/LightGBM.png)

- **ML Part 1** - Logistic Regression
- **ML Part 2** - K-Nearest Neighbors (KNN)
- **ML Part 3** - Support Vector Machine (SVM)
- **ML Part 4** - Artificial Neural Network (NN)
- **ML Part 5** - Classification and Regression Tree (CART)
- **ML Part 6** - Random Forests
- **ML Part 7** - Gradient Boosting Machines (GBM)
- **ML Part 8** - XGBoost
- **ML Part 9 - LightGBM**
- **ML Part 10** - CatBoost

## What is LightGBM?
As data sets have grown rapidly in size and variety in recent years, optimizing algorithms has become increasingly important. For this reason, improved variants of the Gradient Boosting algorithm such as XGBoost, LightGBM, and CatBoost have been developed. These algorithms aim for faster training and higher accuracy.

LightGBM is a boosting algorithm developed in 2017 as part of the Microsoft DMTK (Distributed Machine Learning Toolkit) project. Compared to other boosting algorithms, it offers advantages such as fast training, the ability to handle large data sets, lower memory (RAM) usage, high prediction accuracy, parallel learning, and support for GPU learning. The paper that introduced the model, "LightGBM: A Highly Efficient Gradient Boosting Decision Tree", reports that LightGBM trains up to 20 times faster than conventional gradient boosting while reaching almost the same accuracy.



## How Does It Work?
LightGBM is a histogram-based algorithm. It reduces computational cost by discretizing continuous variables into bins. The training time of a decision tree is directly proportional to the amount of computation, and therefore to the number of candidate split points. Binning shortens training time and reduces resource usage.
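
The binning idea can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: the helper names `build_bin_edges` and `to_bins` are hypothetical, the bins are equal-frequency, and LightGBM's own bin construction is more involved. The point is that the number of candidate split points drops from one per distinct value to one per bin edge.

```python
from bisect import bisect_right

def build_bin_edges(values, max_bins):
    """Return equal-frequency bin edges for one feature (illustrative only)."""
    ordered = sorted(values)
    n = len(ordered)
    edges = []
    for k in range(1, max_bins):
        edge = ordered[(k * n) // max_bins]
        if not edges or edge > edges[-1]:  # skip duplicate edges
            edges.append(edge)
    return edges

def to_bins(values, edges):
    """Map each raw value to the index of the bin it falls into."""
    return [bisect_right(edges, v) for v in values]

# Hypothetical continuous feature with 11 distinct values
ages = [40, 42, 45, 50, 53, 55, 60, 60, 65, 70, 75, 95]
edges = build_bin_edges(ages, max_bins=4)
binned = to_bins(ages, edges)

print(edges)   # [50, 60, 70]
print(binned)  # [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
```

With 4 bins, a tree only has to evaluate 3 thresholds for this feature instead of 10.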

Decision trees can be grown with one of two strategies: level-wise (also called depth-wise) or leaf-wise. The level-wise strategy keeps the tree balanced as it grows. The leaf-wise strategy always splits the leaf that reduces the loss the most. LightGBM uses leaf-wise growth, which sets it apart from many other boosting algorithms. With this strategy the model reaches a lower error and learns faster. However, leaf-wise growth makes the model prone to overfitting when the data set is small, so the algorithm is better suited to large data sets. In addition, parameters such as the tree depth and the number of leaves can be tuned to prevent overfitting.

![](https://www.veribilimiokulu.com/wp-content/uploads/2020/04/Capture1.png)
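
The difference between the two growth strategies can be illustrated with a toy example. The split gains in `SPLIT_GAIN` are made-up numbers and the function names are hypothetical; this is a sketch of the selection rule only, not LightGBM's implementation.

```python
import heapq

# Hypothetical loss reduction obtained by splitting each node. Children are
# named by appending "L" / "R" to the parent's name. Nodes missing from the
# table cannot be split any further.
SPLIT_GAIN = {"root": 10.0, "rootL": 6.0, "rootR": 1.0,
              "rootLL": 4.0, "rootLR": 0.5, "rootRL": 0.3, "rootRR": 0.2}

def grow_leaf_wise(num_leaves):
    """Always split the current leaf with the largest gain (best-first)."""
    heap = [(-SPLIT_GAIN["root"], "root")]
    leaves, total_gain, order = 1, 0.0, []
    while heap and leaves < num_leaves:
        neg_gain, node = heapq.heappop(heap)
        total_gain -= neg_gain
        order.append(node)
        leaves += 1  # one leaf is replaced by two children
        for child in (node + "L", node + "R"):
            if child in SPLIT_GAIN:
                heapq.heappush(heap, (-SPLIT_GAIN[child], child))
    return order, total_gain

def grow_level_wise(num_leaves):
    """Split nodes level by level, left to right, regardless of gain."""
    queue, leaves, total_gain, order = ["root"], 1, 0.0, []
    while queue and leaves < num_leaves:
        node = queue.pop(0)
        total_gain += SPLIT_GAIN[node]
        order.append(node)
        leaves += 1
        queue.extend(c for c in (node + "L", node + "R") if c in SPLIT_GAIN)
    return order, total_gain

leaf_order, leaf_gain = grow_leaf_wise(4)
level_order, level_gain = grow_level_wise(4)
print(leaf_order, leaf_gain)    # ['root', 'rootL', 'rootLL'] 20.0
print(level_order, level_gain)  # ['root', 'rootL', 'rootR'] 17.0
```

With the same budget of 4 leaves, the leaf-wise tree removes more loss but is deeper and unbalanced, which is why it overfits more easily on small data sets.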

LightGBM also uses two techniques that set it apart from other algorithms: Gradient-based One-Side Sampling (GOSS), which deals with the number of data samples, and Exclusive Feature Bundling (EFB), which deals with the number of variables.

Gradient-based One-Side Sampling (GOSS): GOSS aims to reduce the number of data samples while maintaining the accuracy of the decision trees. Traditional Gradient Boosting scans all data samples to calculate the information gain for each variable, whereas GOSS keeps the samples with large gradients, which contribute most to the information gain, and randomly samples from those with small gradients. Thus, the amount of data is reduced without greatly affecting its distribution.
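
A minimal sketch of the GOSS sampling step, following the description in the LightGBM paper: keep the `top_rate` fraction of samples with the largest absolute gradients, draw an `other_rate` fraction at random from the rest, and up-weight the randomly drawn samples by `(1 - top_rate) / other_rate` so the gradient sums stay approximately unbiased. The function name `goss_sample` and the example gradients are assumptions made for illustration.

```python
import random

def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Return (indices, weights) of the samples kept by a GOSS-style sampler."""
    n = len(gradients)
    # Sort sample indices by absolute gradient, largest first
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_top = int(top_rate * n)
    n_other = int(other_rate * n)
    top, rest = order[:n_top], order[n_top:]
    # Randomly keep a few of the small-gradient samples
    sampled = random.Random(seed).sample(rest, n_other)
    # Compensate for the discarded small-gradient samples
    amplify = (1.0 - top_rate) / other_rate
    indices = top + sampled
    weights = [1.0] * n_top + [amplify] * n_other
    return indices, weights

# Hypothetical gradients for 20 samples
gradients = [0.05 * i for i in range(20)]
indices, weights = goss_sample(gradients)
print(len(indices), "of", len(gradients), "samples kept")
```

Here only 6 of 20 samples are used to evaluate splits: the 4 with the largest gradients at weight 1 and 2 random others at weight 8.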

Exclusive Feature Bundling (EFB): EFB aims to reduce the number of variables without harming accuracy, and thereby to make model training more efficient. EFB has two steps: deciding which variables go into the same bundle, and merging the variables within each bundle. Sparse features that are rarely non-zero at the same time are combined into denser features. This reduces complexity and leads to faster training with lower memory consumption.
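
The merging step can be sketched as follows, assuming two already-binned features where bin 0 means "zero" and the two features are never non-zero in the same row. The helpers `bundle_exclusive` and `unbundle` are hypothetical names. The real algorithm also has to decide which features to bundle and tolerates a small rate of conflicts, which this sketch does not cover.

```python
def bundle_exclusive(feature_a, feature_b):
    """Merge two mutually exclusive binned features into one column.

    The bins of feature_b are shifted past the range of feature_a so that
    the original values can still be told apart inside the bundle.
    """
    offset = max(feature_a)
    bundled = []
    for a, b in zip(feature_a, feature_b):
        if a != 0 and b != 0:
            raise ValueError("features conflict: both are non-zero in one row")
        if a != 0:
            bundled.append(a)
        elif b != 0:
            bundled.append(b + offset)
        else:
            bundled.append(0)
    return bundled, offset

def unbundle(bundled, offset):
    """Recover the two original features from the bundled column."""
    a = [v if 0 < v <= offset else 0 for v in bundled]
    b = [v - offset if v > offset else 0 for v in bundled]
    return a, b

feature_a = [0, 2, 0, 1, 0, 0]
feature_b = [3, 0, 0, 0, 1, 0]
bundled, offset = bundle_exclusive(feature_a, feature_b)
print(bundled)  # [5, 2, 0, 1, 3, 0]
```

Two sparse columns become one denser column, and no information is lost because `unbundle(bundled, offset)` returns the original features.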

In summary, EFB combines variables to reduce dimensionality, while GOSS reduces the amount of data used to compute the information gain by down-sampling the less informative samples. With these two techniques, LightGBM makes the training process more efficient.



## Parameter Optimization

![](https://www.veribilimiokulu.com/wp-content/uploads/2020/04/walking-on-magical-tree-59-1366x768-1.jpg)

The learning_rate, max_depth, num_leaves, and min_data_in_leaf parameters can be tuned to prevent overfitting in LightGBM. The feature_fraction, bagging_fraction, and num_iteration parameters can be tuned to shorten training time.

**Num_leaves** is the maximum number of leaves in a tree. It is the most important parameter for controlling the complexity of the tree. It must be less than 2^(max_depth) to avoid overfitting. For example, when max_depth = 7, setting num_leaves to 127 can cause overfitting, while setting it to 70 or 80 may give better accuracy.
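
The arithmetic behind this rule of thumb is easy to check: a full binary tree of depth max_depth has 2^(max_depth) leaves, so the ratio below shows how close a given num_leaves comes to a fully grown tree. The helper `leaf_capacity` is a hypothetical name used only for this illustration.

```python
def leaf_capacity(num_leaves, max_depth):
    """Fraction of the leaves of a full binary tree of depth max_depth in use."""
    return num_leaves / (2 ** max_depth)

for n in (127, 80, 70):
    print("num_leaves={:3d} uses {:.0%} of a depth-7 tree".format(n, leaf_capacity(n, 7)))
```

With max_depth = 7 the limit is 128 leaves, so 127 leaves is almost a full tree, whereas 70 or 80 leaves leave the model far less room to memorize the training data.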

**Max_depth** limits the depth of the tree to be built. It should be tuned to avoid overfitting: too much branching leads to overfitting, while too little branching leads to underfitting.

**Min_data_in_leaf** is one of the important parameters for preventing overfitting. Its optimum value depends on the data size and on num_leaves. Setting it to a large value can inhibit the growth of the tree and cause underfitting.

**Learning_rate** is a value between 0 and 1 that scales the contribution of each fitted tree. A smaller value usually gives better predictive power, but it requires more iterations and therefore lengthens training time, and with too many iterations the model can still overfit.

**Feature_fraction** sets the fraction of variables used in each iteration, and **bagging_fraction** sets the fraction of data used in each iteration. **Num_iteration** is the number of iterations performed during training. All three directly affect training time: the smaller the values, the shorter the training. However, values that are too small can cause underfitting. The best values are found by experimentation.
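
The parameter names above are LightGBM's native names, while the scikit-learn style `LGBMClassifier` used in the code cells of this notebook exposes several of them under different argument names. The mapping below is a small reference sketch. The helper `to_sklearn_kwargs` is hypothetical and the values are arbitrary starting points, not tuned results.

```python
# Native LightGBM names used in the text -> argument names of LGBMClassifier
SKLEARN_ALIASES = {
    "num_leaves": "num_leaves",
    "max_depth": "max_depth",
    "learning_rate": "learning_rate",
    "min_data_in_leaf": "min_child_samples",
    "feature_fraction": "colsample_bytree",
    "bagging_fraction": "subsample",
    "bagging_freq": "subsample_freq",
    "num_iterations": "n_estimators",
}

def to_sklearn_kwargs(native_params):
    """Rename native parameter names to their scikit-learn API equivalents."""
    return {SKLEARN_ALIASES.get(k, k): v for k, v in native_params.items()}

# Illustrative values only
native = {"num_leaves": 31, "max_depth": 7, "learning_rate": 0.05,
          "min_data_in_leaf": 20, "feature_fraction": 0.8,
          "bagging_fraction": 0.8, "bagging_freq": 1, "num_iterations": 200}
kwargs = to_sklearn_kwargs(native)
print(kwargs)
```

The resulting dictionary can be passed as `LGBMClassifier(**kwargs)`. Note that bagging_fraction only takes effect when bagging_freq is greater than 0.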

In [None]:
# Import the necessary packages
import numpy as np
import pandas as pd

import seaborn as sns
import plotly.express as px
import matplotlib.pyplot as plt
%matplotlib inline

import sklearn
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler, LabelEncoder, MinMaxScaler
# plot_confusion_matrix was removed in scikit-learn 1.2 and is not used below
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score
from sklearn.metrics import f1_score, recall_score, precision_score, confusion_matrix
from sklearn.metrics import r2_score, roc_auc_score, roc_curve, classification_report
from lightgbm import LGBMClassifier

import warnings
warnings.filterwarnings('ignore')

In [None]:
# Import and read dataset
input_ = "../input/heart-failure-clinical-data/heart_failure_clinical_records_dataset.csv"
df = pd.read_csv(input_)

df.head(10)

In [None]:
plt.figure(figsize=(15,15))
sns.heatmap(df.corr(),annot=True)
plt.show()

In [None]:
df.describe()

In [None]:
x = df.drop(columns='DEATH_EVENT')
y = df['DEATH_EVENT']

model = LGBMClassifier()
model.fit(x,y)
print(model.feature_importances_)
feat_importances = pd.Series(model.feature_importances_, index=x.columns)
feat_importances.nlargest(12).plot(kind='barh')
plt.show()

In [None]:
# Print the positional index of each column
for i, column in enumerate(df.columns):
    print("{} = {}".format(i, column))

In [None]:
# Remove outliers: keep only rows with ejection_fraction below 70
df = df[df['ejection_fraction']<70]

In [None]:
inp_data = df.drop(columns='DEATH_EVENT')
#inp_data = df.iloc[:,[11,6,2,4,7,0,8]]
out_data = df['DEATH_EVENT']  # 1-D Series, the shape scikit-learn expects for a target

X_train, X_test, y_train, y_test = train_test_split(inp_data, out_data, test_size=0.2, random_state=0, shuffle=True)

## Applying Transformer
#sc= StandardScaler()
#sc = MinMaxScaler()
#X_train = sc.fit_transform(X_train)
#X_test = sc.transform(X_test)  # reuse the statistics fitted on the training set

In [None]:
## X_train, X_test, y_train, y_test Shape

print("X_train Shape : ", X_train.shape)
print("X_test Shape  : ", X_test.shape)
print("y_train Shape : ", y_train.shape)
print("y_test Shape  : ", y_test.shape)

In [None]:
# I coded this method for convenience and to avoid writing the same code over and over again

def result(clf):
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    
    print('LightGBM accuracy      : {:.4f}'.format(accuracy_score(y_test, y_pred)))
    print('LightGBM f1-score      : {:.4f}'.format(f1_score(y_test, y_pred)))
    print('LightGBM precision     : {:.4f}'.format(precision_score(y_test, y_pred)))
    print('LightGBM recall        : {:.4f}'.format(recall_score(y_test, y_pred)))
    # ROC AUC is computed from the predicted probability of the positive class
    y_proba = clf.predict_proba(X_test)[:, 1]
    print("LightGBM roc auc score : {:.4f}".format(roc_auc_score(y_test, y_proba)))
    # classification_report expects (y_true, y_pred) in that order
    print("\n", classification_report(y_test, y_pred))
    
    plt.figure(figsize=(6,6))
    cf_matrix = confusion_matrix(y_test, y_pred)
    sns.heatmap((cf_matrix / np.sum(cf_matrix)*100), annot = True, fmt=".2f", cmap="Blues")
    plt.title("LightGBM Confusion Matrix (Rate)")
    plt.show()
    
    cm = confusion_matrix(y_test,y_pred)
    plt.figure(figsize=(6,6))
    sns.heatmap(cm, annot=True, cmap="Blues",
                xticklabels=["FALSE","TRUE"],
                yticklabels=["FALSE","TRUE"],
                cbar=False)
    plt.title("LightGBM Confusion Matrix (Number)")
    plt.show()
    
    
def report(**params):
    scores = []
    for i in range(0,250): # 250 random train/test splits
        X_train, X_test, y_train, y_test = train_test_split(inp_data, out_data, test_size=0.2, shuffle=True)
        sc = StandardScaler()
        clf = LGBMClassifier(**params)
        X_train = sc.fit_transform(X_train)
        X_test = sc.transform(X_test)  # scale the test set with the training statistics
        clf.fit(X_train, y_train)
        scores.append(accuracy_score(y_test, clf.predict(X_test)))

    # Feature importances of the model fitted on the last split
    Importance = pd.DataFrame({'Importance':clf.feature_importances_*100},index=inp_data.columns)
    Importance.sort_values(by='Importance',axis=0,ascending=True).plot(kind='barh',color='lightblue')
    plt.xlabel('Importance for variable')
    plt.show()

    # Distribution of the accuracy over all splits, drawn in its own figure
    plt.figure()
    plt.hist(scores)
    plt.xlabel('Accuracy')
    plt.show()
    print("Best Score: {}\nMean Score: {}".format(np.max(scores), np.mean(scores)))

---

## Simple Method
I applied LightGBM directly with its default parameters, and the result is as follows:

In [None]:
clf = LGBMClassifier()
result(clf)
report()

---

## Advanced Method

In [None]:
param_grid = {
    'min_child_weight': np.arange(1,20,1),
    # colsample_bytree is a fraction of the features and must stay within (0, 1]
    'colsample_bytree': np.linspace(0.5,1.0,11)
}

clf = LGBMClassifier()
grid = GridSearchCV(clf, param_grid, n_jobs=-1, verbose=2, cv=10)
grid.fit(X_train, y_train)
grid.best_params_

In [None]:
clf = LGBMClassifier(
    min_child_weight= 0.6,
    colsample_bytree= 0.65,
    n_jobs=-1
)

result(clf)

In [None]:
report(
    max_depth= 1,
    min_child_weight= 1,
)

---


In [None]:
param_grid = {
    "n_estimators": [10,100,1000,10000 ]
}

clf = LGBMClassifier()
grid = GridSearchCV(clf, param_grid, n_jobs=-1, verbose=2, cv=10)
grid.fit(X_train, y_train)
grid.best_params_

In [None]:
clf = LGBMClassifier(
    max_depth= 1,
    min_child_weight= 1,
    min_split_gain = 0.0,  # LightGBM's counterpart of XGBoost's gamma
    colsample_bytree= 0.5,
    n_estimators=10
)

result(clf)

In [None]:
report(
    max_depth= 1,
    min_child_weight= 1,
    min_split_gain = 0.0,  # LightGBM's counterpart of XGBoost's gamma
    colsample_bytree= 0.5,
    n_estimators=10
)

---

In [None]:
param_grid = {
 'reg_alpha': [0.001, 0.005, 0.01, 0.05]
}

clf = LGBMClassifier()
grid = GridSearchCV(clf, param_grid, n_jobs=-1, verbose=2, cv=10)
grid.fit(X_train, y_train)
grid.best_params_

In [None]:
clf = LGBMClassifier(
    max_depth= 1,
    min_child_weight= 1,
    min_split_gain = 0.0,  # LightGBM's counterpart of XGBoost's gamma
    colsample_bytree= 0.5,
    n_estimators=10,
    reg_alpha=0.001
)

result(clf)

In [None]:
report(
    max_depth= 1,
    min_child_weight= 1,
    colsample_bytree= 0.5,
    n_estimators=10,
    reg_alpha=0.001
)

---

In [None]:
clf = LGBMClassifier(
    max_depth= 1,
    n_estimators=100,
    colsample_bytree=0.9,
    min_split_gain=0.5,  # LightGBM's counterpart of XGBoost's gamma
    learning_rate=0.01
)

result(clf)

In [None]:
report(
    max_depth= 1,
    n_estimators=100,
    colsample_bytree=0.9,
    min_split_gain=0.5,  # LightGBM's counterpart of XGBoost's gamma
    learning_rate=0.01,
)