# <p style="padding:10px;background-color:#87CEEB ;margin:10;color:#000000;font-family:newtimeroman;font-size:100%;text-align:center;border-radius: 10px 10px ;overflow:hidden;font-weight:50">3. Model Explainability with LIME - Utkarsh Gaikwad</p>

### Introduction to LIME

* [LIME](https://christophm.github.io/interpretable-ml-book/lime.html#lime-for-tabular-data) stands for Local Interpretable Model-agnostic Explanations. A surrogate model is an interpretable model trained to approximate the predictions of an underlying black-box machine learning model. Instead of training one global surrogate for the whole model, LIME trains a local surrogate around a single data point and uses it to explain that individual prediction.

* LIME is model-agnostic, meaning that it can be applied to any machine learning model. The technique attempts to understand the model by perturbing the input of a data sample and observing how the predictions change.

![LIME](https://miro.medium.com/v2/resize:fit:1165/1*k-rxjnvUDTwk8Jfg6IYBkQ.png)

* Model-specific approaches, by contrast, aim to understand a black-box machine learning model by analysing its internal components and how they interact. LIME provides local model interpretability instead: it modifies a single data sample by tweaking its feature values and observes the resulting impact on the output. The question it answers is usually: why was this prediction made, and which variables drove it?
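The perturb, weight, and fit idea described above can be sketched in a few lines of NumPy. This is a simplified illustration, not the `lime` library's implementation: `black_box`, `local_surrogate`, the Gaussian perturbation, and the kernel width are all assumptions chosen for the example.

```python
import numpy as np

def black_box(X):
    # Hypothetical non-linear model standing in for any trained regressor
    return np.sin(X[:, 0]) + X[:, 1] ** 2

def local_surrogate(predict_fn, instance, num_samples=5000, kernel_width=0.75, seed=0):
    """Fit a proximity-weighted linear model around a single instance."""
    rng = np.random.default_rng(seed)
    # 1. Perturb: draw samples around the instance (assumed unit Gaussian noise)
    samples = instance + rng.normal(size=(num_samples, instance.size))
    # 2. Query the black box on the perturbed samples
    preds = predict_fn(samples)
    # 3. Weight each sample by its proximity to the instance
    dist = np.linalg.norm(samples - instance, axis=1)
    weights = np.exp(-(dist ** 2) / kernel_width ** 2)
    # 4. Weighted least squares: scale rows by sqrt(weight), then solve
    A = np.column_stack([np.ones(num_samples), samples - instance])
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], preds * sw, rcond=None)
    return coef[0], coef[1:]  # intercept, per-feature local slopes

instance = np.array([0.0, 1.0])
intercept, slopes = local_surrogate(black_box, instance)
print("local slopes:", slopes)
```

The slopes play the role of the explanation: they should land roughly near the local gradient of the black box at the chosen point, although the fit averages over a neighbourhood rather than a single point, so the match is approximate.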

# <p style="padding:10px;background-color:#87CEEB ;margin:10;color:#000000;font-family:newtimeroman;font-size:100%;text-align:center;border-radius: 10px 10px ;overflow:hidden;font-weight:50">Reading Dataset</p>

In [None]:
import pandas as pd
df = pd.read_csv('./data/gemstone.csv')
df.head()

# <p style="padding:10px;background-color:#87CEEB ;margin:10;color:#000000;font-family:newtimeroman;font-size:100%;text-align:center;border-radius: 10px 10px ;overflow:hidden;font-weight:50">Separating X and Y</p>

In [None]:
X = df.drop(labels=['id','price'],axis=1)
Y = df[['price']]

# <p style="padding:10px;background-color:#87CEEB ;margin:10;color:#000000;font-family:newtimeroman;font-size:100%;text-align:center;border-radius: 10px 10px ;overflow:hidden;font-weight:50">Train Test Split</p>

In [None]:
from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(X,Y,test_size=0.2,random_state=42)

In [None]:
xtrain.shape

In [None]:
xtest.shape

# <p style="padding:10px;background-color:#87CEEB ;margin:10;color:#000000;font-family:newtimeroman;font-size:100%;text-align:center;border-radius: 10px 10px ;overflow:hidden;font-weight:50">Performing preprocessing</p>

In [None]:
import pickle
with open('E:/Gemstone Price Prediction/artifacts/preprocessor.pkl','rb') as file:
    preprocessor = pickle.load(file)

In [None]:
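# Assumes the pickled preprocessor was already fitted during training,
# so only transform here; refitting could change the scaling the model expects
xtrain_scaled = preprocessor.transform(xtrain)
xtest_scaled = preprocessor.transform(xtest)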

# <p style="padding:10px;background-color:#87CEEB ;margin:10;color:#000000;font-family:newtimeroman;font-size:100%;text-align:center;border-radius: 10px 10px ;overflow:hidden;font-weight:50">Model Prediction</p>

In [None]:
import pickle
with open('E:/Gemstone Price Prediction/artifacts/model.pkl','rb') as file:
    model = pickle.load(file)

In [None]:
ytrain_pred = model.predict(xtrain_scaled)
ytest_pred = model.predict(xtest_scaled)

# <p style="padding:10px;background-color:#87CEEB ;margin:10;color:#000000;font-family:newtimeroman;font-size:100%;text-align:center;border-radius: 10px 10px ;overflow:hidden;font-weight:50">Model Evaluation</p>

In [None]:
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import numpy as np

def evaluate_model(true, predicted):
    """Return MAE, RMSE and R2 for the given true and predicted values."""
    mae = mean_absolute_error(true, predicted)
    mse = mean_squared_error(true, predicted)
    rmse = np.sqrt(mse)  # reuse the MSE instead of computing it twice
    r2_square = r2_score(true, predicted)
    return mae, rmse, r2_square
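
The three metrics returned by `evaluate_model` follow the standard definitions, where $y_i$ is the true price, $\hat{y}_i$ the predicted price, $\bar{y}$ the mean of the true prices, and $n$ the number of samples:

$$
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert, \qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}, \qquad
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
$$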

In [None]:
# Evaluate Train and Test dataset
model_train_mae , model_train_rmse, model_train_r2 = evaluate_model(ytrain, ytrain_pred)
model_test_mae , model_test_rmse, model_test_r2 = evaluate_model(ytest, ytest_pred)

# Printing the Evaluation results
print('Model performance for Training set')
print("- Root Mean Squared Error: {:.4f}".format(model_train_rmse))
print("- Mean Absolute Error: {:.4f}".format(model_train_mae))
print("- R2 Score: {:.4f}".format(model_train_r2))

print('\n----------------------------------\n')
    
print('Model performance for Test set')
print("- Root Mean Squared Error: {:.4f}".format(model_test_rmse))
print("- Mean Absolute Error: {:.4f}".format(model_test_mae))
print("- R2 Score: {:.4f}".format(model_test_r2))

# <p style="padding:10px;background-color:#87CEEB ;margin:10;color:#000000;font-family:newtimeroman;font-size:100%;text-align:center;border-radius: 10px 10px ;overflow:hidden;font-weight:50">Model Interpretation with LIME</p>

### Create Explainer

In [None]:
from lime.lime_tabular import LimeTabularExplainer

# Feature names as produced by the preprocessor, reused for the explainer
features = list(preprocessor.get_feature_names_out())
explainer = LimeTabularExplainer(xtrain_scaled,
                                 feature_names=features,
                                 class_names=['price'],
                                 verbose=True,
                                 mode='regression')

In [None]:
# Choose the test instance at index 6 (the 7th row) and explain its prediction
j = 6
exp = explainer.explain_instance(xtest_scaled[j], model.predict, num_features=9)

In [None]:
# Show the explanation (predicted value, feature weights and feature values)
exp.show_in_notebook(show_table=True)

In [None]:
exp.as_list()
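
`exp.as_list()` returns a list of `(feature condition, weight)` pairs. A minimal sketch of how such a list might be ranked by impact is shown below. The pairs are made-up values in the same shape, not output from this model, and `rank_by_impact` is a hypothetical helper.

```python
# Hypothetical (feature condition, weight) pairs in the shape returned by exp.as_list()
explanation = [
    ("depth <= 0.10", 15.0),
    ("carat > 0.50", 1200.0),
    ("clarity <= -0.30", -350.0),
]

def rank_by_impact(pairs):
    """Sort (feature, weight) pairs by absolute weight, largest first."""
    return sorted(pairs, key=lambda pair: abs(pair[1]), reverse=True)

for feature, weight in rank_by_impact(explanation):
    # A positive weight pushes the local prediction up, a negative one pushes it down
    direction = "raises" if weight > 0 else "lowers"
    print(f"{feature:<20} {direction} the local prediction by {abs(weight):.1f}")
```

Sorting by absolute weight keeps strongly negative features near the top, which a plain descending sort would push to the bottom.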