<a href="https://colab.research.google.com/github/stasSajin/colab_notebooks/blob/master/LightGBM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Visualize LightGBM performance in one line of code
Gradient-boosted decision trees (GBDTs) are widely regarded as the state of the art for building predictive models on structured (tabular) data.

[LightGBM](https://github.com/microsoft/LightGBM), a gradient boosting framework from Microsoft, has recently dethroned XGBoost to become, along with CatBoost, a go-to GBDT library. It generally outperforms XGBoost in training speed, memory usage, and the size of datasets it can handle. LightGBM achieves this largely by using histogram-based algorithms that bucket continuous feature values into discrete bins during training.
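
To make the histogram idea concrete, here is a minimal pure-Python sketch. It is not LightGBM's actual implementation: the helper names (`make_bin_edges`, `to_bins`, `best_split`) are hypothetical, the bins are equal-width for simplicity, and the gain formula assumes a plain squared-error objective with no regularization. The point it illustrates is that, once values are bucketed, the split search only has to scan a handful of bin boundaries instead of every distinct feature value.

```python
from bisect import bisect_right


def make_bin_edges(values, max_bins):
    """Return the interior edges of `max_bins` equal-width bins (a simplification)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / max_bins
    return [lo + width * i for i in range(1, max_bins)]


def to_bins(values, edges):
    """Map each continuous value to the integer index of its bin."""
    return [bisect_right(edges, v) for v in values]


def best_split(bins, gradients, n_bins):
    """Scan bin boundaries (not raw values) for the split with the largest gain."""
    # Build the histogram: gradient sum and sample count per bin.
    grad_sum = [0.0] * n_bins
    count = [0] * n_bins
    for b, g in zip(bins, gradients):
        grad_sum[b] += g
        count[b] += 1

    total_g, total_n = sum(grad_sum), sum(count)
    best_bin, best_gain = None, 0.0
    left_g, left_n = 0.0, 0
    for b in range(n_bins - 1):  # candidate split: bin index <= b goes left
        left_g += grad_sum[b]
        left_n += count[b]
        right_g, right_n = total_g - left_g, total_n - left_n
        if left_n == 0 or right_n == 0:
            continue  # a split must leave samples on both sides
        # Simplified squared-error gain (assumption: unit hessians, no regularization).
        gain = left_g ** 2 / left_n + right_g ** 2 / right_n - total_g ** 2 / total_n
        if gain > best_gain:
            best_bin, best_gain = b, gain
    return best_bin, best_gain


# Toy data: two clusters of feature values with opposite gradients.
feature = [0.1, 0.2, 0.3, 0.4, 2.1, 2.2, 2.3, 2.4]
gradients = [-1.0, -1.0, -1.0, -1.0, 1.0, 1.0, 1.0, 1.0]

edges = make_bin_edges(feature, max_bins=4)
bins = to_bins(feature, edges)
split_bin, gain = best_split(bins, gradients, n_bins=4)
print("bin index per sample:", bins)
print("best split after bin", split_bin, "with gain", gain)
```

With eight samples and four bins, the search evaluates at most three boundaries. The same holds for millions of rows, which is where the speed and memory savings come from.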

We want to make it incredibly easy for people to look under the hood of their models, so we built a callback that helps you visualize your LightGBM model's performance in just one line of code.

<img src="https://paper-attachments.dropbox.com/s_64984776E9C3600B2F541866A33ED2467F8DEC17543DF8244C2E643993740430_1578873986950_image.png" alt="lightgbm_performance" style="width: 50%"/>


In [0]:
!pip install wandb -qq

In [0]:
!wget https://raw.githubusercontent.com/microsoft/LightGBM/master/examples/regression/regression.train -qq
!wget https://raw.githubusercontent.com/microsoft/LightGBM/master/examples/regression/regression.test -qq

In [0]:
# import wandb and the lightgbm callback
import wandb
from wandb.lightgbm import wandb_callback
import lightgbm as lgb
import pandas as pd
from sklearn.metrics import mean_squared_error

# initialize a new wandb project
wandb.init(project='lightgbm')

# load or create your dataset
df_train = pd.read_csv('regression.train', header=None, sep='\t')
df_test = pd.read_csv('regression.test', header=None, sep='\t')

y_train = df_train[0]
y_test = df_test[0]
X_train = df_train.drop(0, axis=1)
X_test = df_test.drop(0, axis=1)

# create dataset for lightgbm
lgb_train = lgb.Dataset(X_train, y_train)
lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train)

# specify your configurations as a dict
params = {
    'boosting_type': 'gbdt',
    'objective': 'regression',
    'metric': {'rmse', 'l2', 'l1', 'huber'},
    'num_leaves': 31,
    'learning_rate': 0.05,
    'feature_fraction': 0.9,
    'bagging_fraction': 0.8,
    'bagging_freq': 5,
    'verbosity': -1
}
# wandb.config.update(params)
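
The `metric` entry above asks LightGBM to report four validation metrics. As a rough guide to what each number means, here is a minimal pure-Python sketch. The function names are ours, not LightGBM's, and the Huber threshold `delta=0.9` is an assumption meant to mirror LightGBM's default `alpha` parameter; check the parameter docs for your version.

```python
import math


def l1(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def l2(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: the square root of l2."""
    return math.sqrt(l2(y_true, y_pred))


def huber(y_true, y_pred, delta=0.9):
    """Huber loss: quadratic for small errors, linear beyond `delta` (assumed default)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        err = abs(t - p)
        if err <= delta:
            total += 0.5 * err ** 2
        else:
            total += delta * (err - 0.5 * delta)
    return total / len(y_true)


# Toy labels and predictions, made up for illustration only.
y_true_demo = [1.0, 0.0, 1.0, 0.0]
y_pred_demo = [0.6, 0.4, 0.3, 0.1]

print("l1   :", l1(y_true_demo, y_pred_demo))
print("l2   :", l2(y_true_demo, y_pred_demo))
print("rmse :", rmse(y_true_demo, y_pred_demo))
print("huber:", huber(y_true_demo, y_pred_demo))
```

Two relationships are handy when reading the training log: `rmse` is always the square root of `l2`, and `huber` equals half of `l2` whenever every absolute error stays below the threshold.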

In [0]:
%%wandb
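# train, attaching the wandb callback so every evaluation round is logged
gbm = lgb.train(params,
                lgb_train,
                num_boost_round=20,
                valid_sets=[lgb_eval],
                valid_names=['validation'],  # a list: ('validation') is just a parenthesized string
                callbacks=[wandb_callback()],
                # note: LightGBM >= 4.0 removed this argument; there, add
                # lgb.early_stopping(5) to `callbacks` instead
                early_stopping_rounds=5)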

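# predict using the best iteration found by early stopping
y_pred = gbm.predict(X_test, num_iteration=gbm.best_iteration)
# evaluate: RMSE is the square root of the mean squared error
print('The rmse of prediction is:', mean_squared_error(y_test, y_pred) ** 0.5)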

[1]	validation's l2: 0.243481	validation's l1: 0.492417	validation's huber: 0.121741	validation's rmse: 0.493438
Training until validation scores don't improve for 5 rounds.
[2]	validation's l2: 0.240045	validation's l1: 0.48874	validation's huber: 0.120023	validation's rmse: 0.489944
[3]	validation's l2: 0.236636	validation's l1: 0.485042	validation's huber: 0.118318	validation's rmse: 0.486452
[4]	validation's l2: 0.232959	validation's l1: 0.480872	validation's huber: 0.116479	validation's rmse: 0.482658
[5]	validation's l2: 0.229684	validation's l1: 0.476928	validation's huber: 0.114842	validation's rmse: 0.479254
[6]	validation's l2: 0.226942	validation's l1: 0.473545	validation's huber: 0.113471	validation's rmse: 0.476384
[7]	validation's l2: 0.223972	validation's l1: 0.469984	validation's huber: 0.111986	validation's rmse: 0.473256
[8]	validation's l2: 0.220928	validation's l1: 0.466083	validation's huber: 0.110464	validation's rmse: 0.47003
[9]	validation's l2: 0.217949	validat