In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

# LIME - Local Interpretable Model-Agnostic Explanations 

![](https://miro.medium.com/max/2000/1*Lo4tT2xLY7cEnTzTSb27Qg.jpeg)

***Interpreting machine learning models has become a priority. LIME, which stands for Local Interpretable Model-agnostic Explanations, takes any machine learning model as input and explains how much each feature contributed to a given prediction. It treats the model as a black box: it does not look at the model's inner workings and builds its explanations only from the model's inputs and outputs.***

**The goal of this notebook is to share a basic working example of LIME on the song popularity data and to show its interpretations and visualizations.**
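
To make the black-box idea concrete, here is a minimal sketch of the local-surrogate recipe that LIME is built around: perturb the instance, query the model, weight the samples by proximity, and fit a weighted linear model. It is a simplified illustration, not the lime library's implementation; `explain_locally`, `black_box`, and the kernel settings are assumptions made up for this example.

```python
import numpy as np

def explain_locally(predict_fn, instance, n_samples=5000, kernel_width=0.75, seed=0):
    """Return local linear weights for one instance of a black-box model."""
    rng = np.random.default_rng(seed)
    # 1. Perturb: draw random offsets and add them to the instance.
    offsets = rng.normal(size=(n_samples, instance.size))
    samples = instance + offsets
    # 2. Query the black box on the perturbed points (inputs and outputs only).
    preds = predict_fn(samples)
    # 3. Weight each sample by proximity to the instance (Gaussian-style kernel).
    distances = np.linalg.norm(offsets, axis=1)
    weights = np.exp(-(distances ** 2) / kernel_width ** 2)
    # 4. Fit a weighted linear model: intercept plus one coefficient per feature.
    design = np.column_stack([np.ones(n_samples), offsets])
    root_w = np.sqrt(weights)
    coefs, *_ = np.linalg.lstsq(design * root_w[:, None], preds * root_w, rcond=None)
    return coefs[1:]  # drop the intercept

def black_box(x):
    # Toy model (hypothetical): feature 0 raises the score,
    # feature 1 lowers it, feature 2 is ignored.
    return 1.0 / (1.0 + np.exp(-(3.0 * x[:, 0] - 1.0 * x[:, 1])))

local_weights = explain_locally(black_box, np.zeros(3))
print(local_weights)
```

The sign of each weight says whether the feature pushes the prediction up or down near this instance, and the ignored feature ends up with a weight close to zero.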

In [1]:
import lime
from lime import lime_tabular
import lightgbm as lgb
from xgboost import XGBClassifier, XGBRFRegressor
from sklearn.model_selection import KFold, train_test_split

In [1]:
df_train = pd.read_csv("../input/song-popularity-prediction/train.csv")
df_test = pd.read_csv("../input/song-popularity-prediction/test.csv")
submission = pd.read_csv("../input/song-popularity-prediction/sample_submission.csv")

In [1]:
df_train.drop('id', axis=1,inplace=True)
df_test.drop('id', axis=1,inplace=True)

**Let's drop rows with missing values, since LIME has trouble handling NA values.**

In [1]:
df_train.dropna(inplace=True)
df_train.reset_index(drop=True, inplace=True)

df_test.dropna(inplace=True)
df_test.reset_index(drop=True, inplace=True)

In [1]:
df_train.head()

In [1]:
FEATURES = [
    "song_duration_ms",
    "acousticness",
    "danceability",
    "energy",
    "instrumentalness",
    "key",
    "liveness",
    "loudness",
    "audio_mode",
    "speechiness",
    "tempo",
    "time_signature",
    "audio_valence",
]

In [1]:
len(FEATURES)

In [1]:
dep_var = 'song_popularity'

In [1]:
X = df_train[FEATURES]
y = df_train[dep_var]

In [1]:
X.shape,y.shape

In [1]:
# params_lgb is adapted from https://www.kaggle.com/venkatkumar001/spp2-lgbm, with some modifications

params_lgb = {
    "task": "train",
    "boosting_type": "gbdt",
    "objective": "binary",
    'subsample': 0.95312,
    'learning_rate': 0.001635,
    "max_depth": 5,
    "feature_fraction": 0.2256038826485174,
    "bagging_fraction": 0.7705303688019942,
    "min_child_samples": 290,
    "reg_alpha": 14.68267919457715,
    "reg_lambda": 66.156,
    "max_bin": 772,
    "min_data_per_group": 177,
    "bagging_freq": 1,
    "cat_smooth": 96,
    "cat_l2": 17,
    "verbosity": -1,
    'random_state':42,
    'n_estimators':8000,
    'colsample_bytree':0.1107
    }

# Model Train

In [1]:
lgb_train = lgb.Dataset(X, y)

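# "verbosity": -1 in params_lgb already silences the training logs.
# verbose_eval is omitted because lgb.train() no longer accepts it in LightGBM 4.0+.
model = lgb.train(params=params_lgb,
                  train_set=lgb_train)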

In [1]:
# LIME's classification mode needs one probability per class for every row.
# The LightGBM booster returns only P(class 1), so build [P(class 0), P(class 1)].
def prob(data):
    p1 = model.predict(data)  # predict once and reuse the result
    return np.column_stack([1 - p1, p1])
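
The wrapper above exists because LIME's classification mode expects `predict_fn` to return an array of shape `(n_samples, n_classes)` whose rows sum to 1. The sketch below checks that contract without needing LightGBM; `to_two_class_proba` and `fake_positive_score` are hypothetical names, and the scoring function is a made-up stand-in for `model.predict`.

```python
import numpy as np

def to_two_class_proba(positive_score_fn):
    """Wrap a function returning P(class 1) into one returning [P(class 0), P(class 1)]."""
    def predict_proba(data):
        p1 = np.asarray(positive_score_fn(data), dtype=float)  # call the model once
        return np.column_stack([1.0 - p1, p1])
    return predict_proba

def fake_positive_score(data):
    # Hypothetical stand-in for model.predict: a logistic score of the row sums.
    return 1.0 / (1.0 + np.exp(-np.asarray(data, dtype=float).sum(axis=1)))

proba_fn = to_two_class_proba(fake_positive_score)
batch = np.array([[0.0, 0.0], [1.0, 2.0], [-3.0, 0.5]])
proba = proba_fn(batch)
print(proba.shape)        # one row per sample, one column per class
print(proba.sum(axis=1))  # each row sums to 1
```

Running a check like this on a small batch before handing the function to LIME catches shape mistakes early.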

# Model interpretation

To start explaining the model, import the LIME library and create a tabular explainer object. It accepts the following parameters:
* **training_data** – the training features (here the full training set, since no train/test split is made). It must be a NumPy array.
* **feature_names** – column names from the training set
* **class_names** – distinct classes from the target variable (optional; not passed below)
* **mode** – type of problem you’re solving (classification in this case)
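
One reason the explainer needs `training_data` is that, by default (`discretize_continuous=True` with the quartile discretizer), it bins continuous features into quartiles learned from that data, so explanation rows are phrased as value ranges rather than raw numbers. Below is a rough NumPy sketch of quartile binning on made-up values; it illustrates the idea and is not LIME's internal code. `quartile_bin` is a hypothetical helper.

```python
import numpy as np

# Made-up values standing in for one continuous training column.
column = np.array([0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.70, 0.90])

# Quartile boundaries learned from the training data.
boundaries = np.percentile(column, [25, 50, 75])

def quartile_bin(value, boundaries):
    """Return 0..3: the index of the quartile bin that `value` falls into."""
    return int(np.searchsorted(boundaries, value))

bins = [quartile_bin(v, boundaries) for v in column]
print(boundaries)
print(bins)
```

Each bin corresponds to a range such as "value <= first quartile" or "first quartile < value <= median", which is the form the feature conditions take in the explanation table.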

In [1]:
explainer = lime.lime_tabular.LimeTabularExplainer(
    # keep the features as floats: casting to int would truncate fractional feature values
    X[model.feature_name()].astype(float).values,
    feature_names=model.feature_name(),
    training_labels=df_train[dep_var],
    mode='classification')

Now we call the explain_instance function of the explainer object to explain a single prediction. The following parameters are required:
* **data_row** – a single observation from the dataset
* **predict_fn** – a function that returns class probabilities. For scikit-learn style models, predict_proba works directly; here we pass the prob wrapper defined above.

The **show_in_notebook** function renders the explanation in the notebook environment.

This is how the explanations look for some rows of the training data:

In [1]:
# ask LIME to explain the model's prediction for row i
i = 3
exp = explainer.explain_instance(X.loc[i, FEATURES].astype(float).values, prob)
exp.show_in_notebook(show_table=True)

In [1]:
# ask LIME to explain the model's prediction for row i
i = 0
exp = explainer.explain_instance(X.loc[i, FEATURES].astype(float).values, prob)
exp.show_in_notebook(show_table=True)

In [1]:
# ask LIME to explain the model's prediction for row i
i = 15000
exp = explainer.explain_instance(X.loc[i, FEATURES].astype(float).values, prob)
exp.show_in_notebook(show_table=True)

# Continue with more examples! Have fun. Upvotes are appreciated :)

# Conclusion

**LIME makes it straightforward to interpret individual predictions of a machine learning model. It shows which features pushed a prediction up or down, and you don’t have to build the visualizations yourself, because the LIME library handles that for you.**

# Happy Learning!

**References:** 
* https://www.youtube.com/watch?v=CY3t11vuuOM
* https://coderzcolumn.com/tutorials/machine-learning/how-to-use-lime-to-understand-sklearn-models-predictions
* https://towardsdatascience.com/decrypting-your-machine-learning-model-using-lime-5adc035109b5
* https://towardsdatascience.com/lime-how-to-interpret-machine-learning-models-with-python-94b0e7e4432e