# $$LightGBM\ Feature\ Importance\ Tutorial$$

*Original source of the notebook: https://github.com/catboost/tutorials/blob/master/model_analysis/feature_importance_tutorial.ipynb*
**Credits to the creators of the original notebook.**

### Weights and Biases

In [None]:
%%bash
# Install W&B client
git clone https://github.com/neomatrix369/client
cd client
git checkout add_lightgbm_plot_feature_importances_functionality

python setup.py install

# Once this functionality is released on PyPI, install with pip instead:
# pip install wandb

In [None]:
# Go to https://www.wandb.com/ and create an account, or log in if you already have one
# Create your project, then obtain your W&B token

!wandb login ${WANDB_TOKEN}

In [1]:
import wandb
from wandb.lightgbm import plot_feature_importances, wandb_callback

In [2]:
# Enter the name of the project you created in the previous step

wandb.init(project='lightgbm_plot_feature_importances')

W&B Run: https://app.wandb.ai/neomatrix369/lightgbm_plot_feature_importances/runs/1a2s5c54

### LightGBM Feature Importance

#### It is often useful to know which features contributed most to a model's predictions. For this, a trained LightGBM model provides the feature_importance() method.
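`feature_importance()` accepts an `importance_type` of `'split'` (the number of times a feature is used in a split, the default) or `'gain'` (the total gain of the splits that use the feature). The toy sketch below reproduces both definitions in plain Python on a hand-written list of splits. The `splits` data and the helper name `toy_importances` are invented for illustration and are not LightGBM output or API.

```python
# Hypothetical splits from a small ensemble: one (feature_index, gain) pair per split.
# These numbers are invented for illustration, not produced by LightGBM.
splits = [(0, 12.5), (3, 4.0), (0, 7.5), (1, 1.0), (3, 2.0), (0, 3.0)]


def toy_importances(splits, n_features):
    """Return (split_counts, total_gains), each a list indexed by feature."""
    counts = [0] * n_features
    gains = [0.0] * n_features
    for feature, gain in splits:
        counts[feature] += 1    # 'split': how often the feature is used
        gains[feature] += gain  # 'gain': summed gain of those splits
    return counts, gains


split_counts, total_gains = toy_importances(splits, n_features=4)
print(split_counts)  # [3, 1, 0, 2]
print(total_gains)   # [23.0, 1.0, 0.0, 6.0]
```

The two measures can rank features differently: a feature used in a few high-gain splits may score low on `'split'` but high on `'gain'`.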

In [3]:
import numpy as np
import lightgbm as lgb
from sklearn.datasets import make_classification

#### First, let's prepare the dataset:

In [4]:
X, y = make_classification(n_samples=10000, n_features=10, n_informative=2, n_redundant=5, random_state=42)

dataset = lgb.Dataset(X, y)

In [5]:
ten_features = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

#### Let's train LightGBM:

In [6]:
%%time
# Training parameters; early stopping monitors RMSE on the validation set
params = {'learning_rate': 0.01, 'max_depth': -1, 'num_leaves': 4, 'objective': 'fair', 'boosting': 'gbdt',
          'boost_from_average': True, 'feature_fraction': 0.9, 'bagging_freq': 1, 'bagging_fraction': 0.5,
          'early_stopping_rounds': 200, 'metric': 'rmse', 'max_bin': 255, 'n_jobs': -1, 'verbosity': -1,
          'bagging_seed': 1234}
# The training data doubles as the validation set here; wandb_callback() reports metrics to W&B
model = lgb.train(params, dataset, valid_sets=[dataset], valid_names=['train'], callbacks=[wandb_callback()])



[1]	train's rmse: 0.494761
Training until validation scores don't improve for 200 rounds
[2]	train's rmse: 0.489477
[3]	train's rmse: 0.484513
[4]	train's rmse: 0.479594
[5]	train's rmse: 0.474535
[6]	train's rmse: 0.469726
[7]	train's rmse: 0.465129
[8]	train's rmse: 0.460319
[9]	train's rmse: 0.455888
[10]	train's rmse: 0.45157
[11]	train's rmse: 0.447242
[12]	train's rmse: 0.443106
[13]	train's rmse: 0.438825
[14]	train's rmse: 0.434828
[15]	train's rmse: 0.430664
[16]	train's rmse: 0.426871
[17]	train's rmse: 0.422981
[18]	train's rmse: 0.419085
[19]	train's rmse: 0.415286
[20]	train's rmse: 0.41162
[21]	train's rmse: 0.408093
[22]	train's rmse: 0.404503
[23]	train's rmse: 0.400969
[24]	train's rmse: 0.397703
[25]	train's rmse: 0.3946
[26]	train's rmse: 0.391362
[27]	train's rmse: 0.388256
[28]	train's rmse: 0.385211
[29]	train's rmse: 0.382057
[30]	train's rmse: 0.379212
[31]	train's rmse: 0.37634
[32]	train's rmse: 0.373503
[33]	train's rmse: 0.370766
[34]	train's rmse: 0.36803
...

### W&B Feature Importance visualisation

In [7]:
plot_feature_importances(model, feature_names=ten_features)

Go to your dashboard to see the LightGBM's Feature Importances graph plotted.


<wandb.viz.Visualize at 0x7ff4e0804e48>
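The same ranking can also be inspected locally by pairing each importance score with its feature name and sorting. This is a minimal sketch: the `importances` values are placeholders standing in for the array returned by `model.feature_importance()`, and `rank_features` is a hypothetical helper, not part of LightGBM or W&B.

```python
ten_features = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

# Placeholder scores for illustration only; in the notebook you would use
# importances = model.feature_importance()
importances = [5, 0, 12, 3, 0, 8, 1, 0, 20, 2]


def rank_features(names, scores):
    """Pair names with scores and sort from most to least important."""
    if len(names) != len(scores):
        raise ValueError("names and scores must have the same length")
    return sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)


ranking = rank_features(ten_features, importances)
for name, score in ranking[:3]:
    print(f"{name}: {score}")
```

The length check guards against passing a name list that does not match the number of features the model was trained on.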

## You should see a bar chart titled Feature importances, produced after training the LightGBM model, in your W&B dashboard at https://app.wandb.ai/[your_username]/[your_project_name]/runs/[your_run_id]