# LightGBM

LightGBM is a gradient boosting framework that uses tree-based learning algorithms. It is designed to be distributed and efficient, with the following advantages:

1. Faster training speed and higher efficiency.
2. Lower memory usage.
3. Better accuracy.
4. Support of parallel and GPU learning.
5. Capable of handling large-scale data.

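A couple of years ago, Microsoft announced its gradient boosting framework, LightGBM. It has since taken much of the spotlight among gradient boosting machines, and many Kagglers now use LightGBM more than XGBoost. LightGBM is often quoted as being about 6 times faster than XGBoost, although the actual speedup depends on the dataset, the hardware, and the settings used.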

Datasets are growing rapidly, and it is becoming very difficult for traditional data science algorithms to give accurate results. The "Light" in LightGBM refers to its high speed. LightGBM can handle large datasets and needs less memory to run.

Another reason why LightGBM is so popular is its focus on the accuracy of results. LightGBM also supports GPU learning, which is why data scientists widely use it for data science application development.

### <span style="color:blue">It is not advisable to use LightGBM on small datasets. LightGBM is prone to overfitting and can easily overfit small data.</span>

# LightGBM intuition

1. LightGBM is a gradient boosting framework that uses tree-based learning algorithms.
2. The LightGBM documentation states:

   LightGBM grows trees vertically, while other tree-based learning algorithms grow trees horizontally.
   This means that LightGBM grows trees leaf-wise, while other algorithms grow level-wise. It chooses
   the leaf with the maximum delta loss to grow. For the same number of leaves, a leaf-wise algorithm can
   reduce more loss than a level-wise algorithm.

3. So, we need to understand the distinction between leaf-wise tree growth and level-wise tree growth.

# Leaf-wise tree growth

![image.png](attachment:image.png)

# Level-wise Tree Growth

Most decision tree learning algorithms grow trees level (depth)-wise.

Level-wise tree growth can best be explained with the following visual:

![image.png](attachment:image.png)

# Important Points about tree growth

If we grow the full tree, best-first (leaf-wise) and depth-wise (level-wise) growth will result in the same tree. The difference is in the order in which the tree is expanded. Since we don't normally grow trees to their full depth, order matters.

Application of early stopping criteria and pruning methods can result in very different trees. Because leaf-wise chooses splits based on their contribution to the global loss and not just the loss along a particular branch, it often (not always) will learn lower-error trees "faster" than level-wise.

For a small number of nodes, leaf-wise will probably out-perform level-wise. As we add more nodes, without stopping or pruning they will converge to the same performance because they will literally build the same tree eventually.
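The points above can be illustrated with a small toy sketch. It is not LightGBM's implementation: the `GAIN` table is a made-up set of loss reductions for a seven-node tree, and the level-wise function splits nodes one at a time in breadth-first order so that both strategies can be compared under the same split budget.

```python
import heapq
from collections import deque

# Hypothetical loss reductions: GAIN[node] is what we gain by splitting that node.
# Nodes are numbered like a binary heap, so node i has children 2*i+1 and 2*i+2.
GAIN = {0: 100, 1: 10, 2: 60, 3: 5, 4: 2, 5: 40, 6: 30}


def splittable_children(node):
    """Return the children of `node` that can themselves be split."""
    return [c for c in (2 * node + 1, 2 * node + 2) if c in GAIN]


def leaf_wise(max_splits):
    """Best-first growth: always split the open leaf with the largest gain."""
    heap = [(-GAIN[0], 0)]  # max-heap emulated with negated gains
    total = 0
    for _ in range(max_splits):
        if not heap:
            break
        neg_gain, node = heapq.heappop(heap)
        total -= neg_gain
        for child in splittable_children(node):
            heapq.heappush(heap, (-GAIN[child], child))
    return total


def level_wise(max_splits):
    """Level-by-level growth: split nodes in breadth-first order."""
    queue = deque([0])
    total = 0
    for _ in range(max_splits):
        if not queue:
            break
        node = queue.popleft()
        total += GAIN[node]
        queue.extend(splittable_children(node))
    return total


for budget in (1, 3, 7):
    print(budget, leaf_wise(budget), level_wise(budget))
```

With a budget of 3 splits, the leaf-wise sketch collects a total reduction of 200 against 170 for level-wise. With all 7 splits, both reach 247 because they have built the same tree.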

# XGBoost Vs LightGBM

XGBoost is a very fast and accurate ML algorithm, but it is now being challenged by LightGBM, which runs even faster with comparable model accuracy and offers more hyperparameters for users to tune.

The key difference in speed arises because XGBoost splits the tree nodes one level at a time, while LightGBM splits them one node at a time.

The XGBoost developers later improved their algorithm to catch up with LightGBM, allowing users to also run XGBoost in split-by-leaf mode (grow_policy = 'lossguide'). XGBoost is much faster with this improvement, but LightGBM has still been reported to run at about 1.3x to 1.5x the speed of XGBoost.
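As a rough illustration of the split-by-leaf mode, an XGBoost parameter dictionary might look like the sketch below. The values are illustrative assumptions rather than recommendations, and the dictionary is only meant to be passed to `xgboost.train()`; it is not trained here.

```python
# Hypothetical XGBoost parameters enabling leaf-wise (loss-guided) growth.
xgb_params = {
    "objective": "binary:logistic",
    "tree_method": "hist",       # loss-guided growth is used with histogram-based tree methods
    "grow_policy": "lossguide",  # split the leaf with the largest loss reduction first
    "max_leaves": 31,            # cap on leaves, playing a role similar to LightGBM's num_leaves
    "max_depth": 0,              # 0 removes the depth limit so max_leaves is the binding constraint
}

print(xgb_params["grow_policy"])
```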

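Another difference between XGBoost and LightGBM used to be that XGBoost had a feature that LightGBM lacked: monotonic constraints. Current LightGBM releases also support them through the monotone_constraints parameter. A monotonic constraint will sacrifice some model accuracy and increase training time, but may improve model interpretability.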

# LightGBM Parameters

# Control Parameters

max_depth : The maximum depth of a tree. This parameter is used to handle model overfitting. If you feel that your model is overfitted, you should lower max_depth.

min_data_in_leaf : The minimum number of records a leaf may have. The default value is 20; the best value depends on the size of the dataset. It is also used to deal with overfitting.

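feature_fraction : The fraction of features sampled when building each tree. A feature_fraction of 0.8 means LightGBM will randomly select 80% of the features in each iteration for building trees. It is not limited to random forest boosting; it also works with gbdt, where it speeds up training and helps against overfitting.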

bagging_fraction : Specifies the fraction of data to be used for each iteration. It is generally used to speed up training and avoid overfitting. It only takes effect when bagging_freq is set to a non-zero value.

early_stopping_round : This parameter can help you speed up your analysis. The model will stop training if one metric on one validation dataset doesn’t improve in the last early_stopping_round rounds. This avoids excessive iterations.

lambda : Specifies regularization. LightGBM exposes it as lambda_l1 (L1 regularization) and lambda_l2 (L2 regularization). Typical values range from 0 to 1.

min_gain_to_split : The minimum gain required to make a split. It can be used to control the number of useful splits in a tree.

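max_cat_group : When the number of categories is large, finding split points on them easily leads to overfitting. LightGBM therefore merges the categories into 'max_cat_group' groups and finds the split points on the group boundaries, default: 64. This parameter comes from older LightGBM releases and may not be available in recent ones, so check the parameter list for your version.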

# Core Parameters

task : It specifies the task you want to perform on data. It may be either train or predict.

application : This is the most important parameter. It specifies the application of your model, that is, whether it is a regression problem or a classification problem (application is an alias of objective). By default, LightGBM treats the model as a regression model.<br/>

regression : for regression <br/>
binary : for binary classification <br/>
multiclass : for multiclass classification <br/>

boosting : Defines the type of algorithm you want to run, default=gbdt.<br/>

gbdt : traditional Gradient Boosting Decision Tree <br/>
rf : random forest <br/>
dart : Dropouts meet Multiple Additive Regression Trees <br/>
goss : Gradient-based One-Side Sampling <br/>

num_boost_round : Number of boosting iterations, typically 100+

learning_rate : This determines the impact of each tree on the final outcome. GBM works by starting with an initial estimate, which is then updated using the output of each tree. The learning rate controls the magnitude of this change in the estimates. Typical values: 0.1, 0.001, 0.003…
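The update rule described for learning_rate can be sketched in a few lines of plain Python. This is a deliberately simplified stand-in, not LightGBM's algorithm: the "tree" here just predicts the mean residual, and the data and function names are invented for the example.

```python
# Toy targets (made up); their mean is 6.0.
y = [3.0, 5.0, 10.0]


def boost(learning_rate, n_rounds):
    """Start from an initial estimate of 0 and apply shrunken updates."""
    estimate = 0.0
    for _ in range(n_rounds):
        residuals = [target - estimate for target in y]
        # Stand-in for a fitted tree: it outputs the mean residual.
        tree_output = sum(residuals) / len(residuals)
        # The learning rate scales how far each tree moves the estimate.
        estimate += learning_rate * tree_output
    return estimate


for lr, rounds in [(1.0, 1), (0.1, 10), (0.1, 100)]:
    print(lr, rounds, round(boost(lr, rounds), 4))
```

A learning rate of 1.0 jumps to the mean in a single round, while 0.1 approaches it gradually. This is why a small learning rate usually needs more boosting rounds.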

num_leaves : The maximum number of leaves in one tree, default: 31

device : default: cpu, can also pass gpu

# Metrics Parameters

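metric : Another important parameter. It specifies the metric used to evaluate the model, for example on validation data; the loss that is optimized during training is set by the application parameter. Below are a few common metrics for regression and classification.<br/>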


mae : mean absolute error <br/>
mse : mean squared error <br/>
binary_logloss : loss for binary classification <br/>
multi_logloss : log loss for multiclass classification <br/>
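To make the four metrics concrete, here is a plain-Python sketch of their usual textbook definitions. The function names mirror the metric names, but the clipping constant `eps` is an assumption of this example and LightGBM's internal implementation may differ in such details.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_logloss(y_true, p_pred, eps=1e-15):
    """Log loss for 0/1 labels and predicted probabilities of class 1."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)


def multi_logloss(y_true, proba, eps=1e-15):
    """Log loss for integer labels and per-class probability rows."""
    total = 0.0
    for t, row in zip(y_true, proba):
        total -= math.log(max(row[t], eps))
    return total / len(y_true)


print(mae([1, 2, 3], [1, 2, 5]), mse([1, 2, 3], [1, 2, 5]))
print(binary_logloss([1, 0], [0.9, 0.2]))
print(multi_logloss([0, 2], [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]))
```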

# IO Parameters

max_bin : The maximum number of bins that feature values will be bucketed into.

categorical_feature : The indices of the categorical features. If categorical_feature=0,1,2, then column 0, column 1 and column 2 are categorical variables.

ignore_column : Works like categorical_feature, except that instead of treating the specified columns as categorical, it ignores them completely.

save_binary : If the memory size of your data file is a real concern, set this parameter to 'True'. The dataset will then be saved to a binary file, which speeds up data loading the next time.