# **LGBM [ Light Gradient Boosting Machine ]**
<BR>

<img src="https://repository-images.githubusercontent.com/64991887/dc855780-e34b-11ea-9ab8-e08ca33288b0">

<br>

## **What is LGBM ?**

* Gradient boosting refers to a class of ensemble machine learning algorithms that can be used for classification or regression predictive modeling problems.
* LightGBM is a gradient boosting framework based on decision trees that increases the efficiency of the model and reduces memory usage.
* LightGBM is called “Light” because of its computation speed: it gives results faster, takes less memory to run, and is able to deal with large amounts of data.
* It uses two techniques: **Gradient-based One-Side Sampling (GOSS)** and **Exclusive Feature Bundling (EFB)**.
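The idea behind Gradient-based One-Side Sampling can be sketched in a few lines of NumPy. This is a simplified illustration rather than LightGBM's internal code: the function name `goss_sample` and the parameters `top_rate` and `other_rate` are chosen here for the example. It keeps the instances with the largest absolute gradients, randomly samples from the rest, and up-weights the sampled small-gradient instances by `(1 - top_rate) / other_rate`.

```python
import numpy as np


def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Simplified GOSS step: return (indices, weights) of the kept instances."""
    rng = np.random.default_rng(seed)
    n = len(gradients)
    n_top = int(top_rate * n)
    n_other = int(other_rate * n)

    # Sort instances by absolute gradient, largest first.
    order = np.argsort(-np.abs(gradients))
    top_idx = order[:n_top]

    # Randomly sample (without replacement) from the small-gradient instances.
    other_idx = rng.choice(order[n_top:], size=n_other, replace=False)

    # Large-gradient instances keep weight 1; sampled small-gradient
    # instances are amplified to compensate for the ones that were dropped.
    weights = np.ones(n_top + n_other)
    weights[n_top:] = (1.0 - top_rate) / other_rate

    return np.concatenate([top_idx, other_idx]), weights


# Hypothetical gradients for 1,000 training instances.
gradients = np.random.default_rng(42).normal(size=1000)
idx, w = goss_sample(gradients)
print(len(idx), "of", len(gradients), "instances kept")
```

Only 30% of the instances are used to evaluate splits in this sketch, which is where the speed-up comes from.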


## **How does it differ from other tree-based algorithms?**

* LightGBM grows trees vertically while many other algorithms grow trees horizontally. In other words, LightGBM grows trees leaf-wise while other algorithms grow them level-wise.
* It chooses the leaf with the maximum delta loss to grow. For the same number of leaves, a leaf-wise algorithm can reduce more loss than a level-wise algorithm.
* The diagram below illustrates how LightGBM grows trees.

<img src="https://datascience.eu/wp-content/uploads/2019/12/Screenshot-2020-10-21-at-18.12.57.png">
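The difference between the two growth strategies can be illustrated with a toy example. The gains below are made-up numbers, and the tree is stored with the convention that node `i` has children `2*i + 1` and `2*i + 2`. This is only a sketch of the selection order, not of how LightGBM actually computes split gains.

```python
import heapq

# Hypothetical loss reduction (gain) obtained by splitting each node.
GAINS = {0: 10.0, 1: 6.0, 2: 1.0, 3: 5.0, 4: 0.5, 5: 0.4, 6: 0.3}


def leaf_wise(gains, num_splits):
    """Repeatedly split the current leaf with the largest gain."""
    heap = [(-gains[0], 0)]  # max-heap emulated with negated gains
    order = []
    while heap and len(order) < num_splits:
        _, node = heapq.heappop(heap)
        order.append(node)
        # The two children of the split node become new candidate leaves.
        for child in (2 * node + 1, 2 * node + 2):
            if child in gains:
                heapq.heappush(heap, (-gains[child], child))
    return order


def level_wise(gains, num_splits):
    """Split nodes level by level, left to right."""
    return sorted(gains)[:num_splits]


leaf_order = leaf_wise(GAINS, 3)
level_order = level_wise(GAINS, 3)
print("leaf-wise :", leaf_order, sum(GAINS[n] for n in leaf_order))
print("level-wise:", level_order, sum(GAINS[n] for n in level_order))
```

With the same budget of three splits, the leaf-wise order follows the most promising branch deeper into the tree, while the level-wise order finishes each level before moving on.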

## **When is LGBM used ?**
LightGBM is not well suited to small datasets, because its sensitivity makes it easy to overfit small data. There is no fixed threshold for deciding when to use LightGBM, but a common rule of thumb is to use it for data with more than 10,000 rows. It is especially useful for large volumes of data when high accuracy is needed.

## **How is it Implemented ?**

* Implementing LightGBM is easy; the only complicated part is parameter tuning.
* It is very important for an implementer to know at least some basic parameters of LightGBM.
* The important parameters are therefore described below.

## **Parameters :**

* LightGBM has more than 100 parameters, all of which are described in the LightGBM documentation.
* A few important parameters and their usage are listed below:

--> **max_depth** : It sets a limit on the depth of the tree. The default value is -1, which means no limit. It is effective in controlling overfitting.

--> **categorical_feature** : It specifies which features are categorical when training the model.

--> **bagging_fraction** : It specifies the fraction of data to be used for each iteration. It only takes effect when bagging_freq is set to a non-zero value.

--> **early_stopping_round** : This parameter can help speed up training. The model stops training if one metric on one validation set doesn’t improve in the last early_stopping_round rounds, which avoids unnecessary iterations.

--> **lambda** : It specifies regularization, set through lambda_l1 (L1) and lambda_l2 (L2). Typical values range from 0 to 1.

--> **num_iterations** : It specifies the number of iterations to be performed. The default value is 100.

--> **num_leaves** : It specifies the maximum number of leaves in a tree. It should be smaller than 2^(max_depth), the number of leaves of a full tree of that depth.

--> **max_bin** : It specifies the maximum number of bins to bucket the feature values.

--> **min_data_in_bin** : It specifies the minimum number of data points in one bin.

--> **task** : It specifies the task to perform, most commonly train or prediction. The default value is train.

--> **feature_fraction** : It specifies the fraction of features to be considered in each iteration. The default value is 1.0.
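As a rough sketch, the parameters above can be collected in a plain Python dictionary of the kind that is passed to `lightgbm.train`. The values are illustrative only, not recommendations, and the helper `check_params` is a hypothetical sanity check written for this example, not part of LightGBM.

```python
# Illustrative values only; good settings depend on the dataset.
params = {
    "objective": "binary",       # assumed task for this sketch
    "num_iterations": 100,       # number of boosting rounds
    "max_depth": 6,              # limit tree depth to control overfitting
    "num_leaves": 31,            # must stay below 2 ** max_depth = 64
    "max_bin": 255,              # maximum number of feature bins
    "min_data_in_bin": 3,        # minimum number of data points per bin
    "feature_fraction": 0.8,     # use 80% of the features per iteration
    "bagging_fraction": 0.8,     # use 80% of the rows ...
    "bagging_freq": 1,           # ... resampled at every iteration
    "lambda_l1": 0.0,            # L1 regularization
    "lambda_l2": 1.0,            # L2 regularization
    "early_stopping_round": 10,  # needs a validation set at training time
}


def check_params(p):
    """Hypothetical helper: return a list of obvious inconsistencies."""
    problems = []
    depth = p.get("max_depth", -1)
    if depth > 0 and p.get("num_leaves", 31) >= 2 ** depth:
        problems.append("num_leaves should be smaller than 2 ** max_depth")
    for name in ("feature_fraction", "bagging_fraction"):
        if not 0.0 < p.get(name, 1.0) <= 1.0:
            problems.append(f"{name} must be in (0, 1]")
    if p.get("bagging_fraction", 1.0) < 1.0 and p.get("bagging_freq", 0) == 0:
        problems.append("bagging_fraction has no effect when bagging_freq is 0")
    return problems


print(check_params(params))
```

Running the check before training catches the most common mismatch, a `num_leaves` value that a tree of the chosen `max_depth` can never reach.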

## **Practical Implementation**
 
For a quick implementation of the algorithm, LightGBM’s scikit-learn wrapper is used for the classifier.

As always, it starts by importing the model:

    from lightgbm import LGBMClassifier


The next step is to create an instance of the model while setting the objective. The options for the objective are `regression` for LGBMRegressor, `binary` or `multiclass` for LGBMClassifier, and `lambdarank` for LGBMRanker.

    model = LGBMClassifier(objective='multiclass')


When fitting the model, categorical features can be set as follows (here, the columns at index 0 and 3):

    model.fit(X_train, y_train, categorical_feature=[0, 3])


Once predictions are run on the model, one can obtain the feature importances:

    predictions = model.predict(X_test)
    importances = model.feature_importances_


## **LightGBM Advantages**
 
According to the official docs, here are the advantages of the LightGBM framework:

* Faster training speed and higher efficiency
* Lower memory usage
* Better accuracy
* Support of parallel and GPU learning
* Capable of handling large-scale data

<br>

## **LightGBM Applications**
 
LightGBM can be best applied to the following problems:

* Binary classification using the logloss objective function
* Regression using the L2 loss
* Multi-classification
* Cross-entropy using the logloss objective function
* LambdaRank using lambdarank with NDCG as the objective function
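For reference, the problems listed above map onto objective and metric names in LightGBM's parameters roughly as follows. The dictionary `TASK_TO_PARAMS` and the helper `params_for` are hypothetical names used only for this sketch.

```python
# Objective and metric names as they appear in LightGBM's parameters.
TASK_TO_PARAMS = {
    "binary classification": {"objective": "binary", "metric": "binary_logloss"},
    "regression": {"objective": "regression", "metric": "l2"},
    # "multiclass" additionally requires num_class to be set.
    "multi-class classification": {"objective": "multiclass", "metric": "multi_logloss"},
    "cross-entropy": {"objective": "cross_entropy", "metric": "cross_entropy"},
    "ranking": {"objective": "lambdarank", "metric": "ndcg"},
}


def params_for(task, **overrides):
    """Hypothetical helper: base parameters for a task plus any overrides."""
    params = dict(TASK_TO_PARAMS[task])
    params.update(overrides)
    return params


multiclass_params = params_for("multi-class classification", num_class=3)
print(multiclass_params)
```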
 

## **Conclusion**
 

* LightGBM is a fast gradient boosting algorithm and a popular choice in machine learning when both fast training and high accuracy are needed.
 

* In this document, I have tried to give a basic idea of the algorithm and of the main parameters used in LightGBM. I hope it gives you a brief understanding of both!
