<img src="https://www.th-koeln.de/img/logo.svg" style="float:right;" width="200">

# Use Case: <font color="#C70039">Interpretable Machine Learning with LIME for tabular data</font>
* Course: AML
* Lecturer: <a href="https://www.gernotheisenberg.de/">Gernot Heisenberg</a>
* Author of notebook: <a href="https://www.gernotheisenberg.de/">Gernot Heisenberg</a>
* Date:   02.12.2022

<img src="https://miro.medium.com/max/664/1*J1V-RIBHIcX-Aej0x7UXnA.png" style="float: center;" width="600">

---------------------------------

### Description 
This notebook shows an example implementation of LIME explaining a random forest ensemble model that makes predictions on tabular data. 
Please work through the implementation and try to understand each step.

---------------------------------

### Imports
Import all Python utilities needed for loading and preprocessing the data, training the model, and explaining its predictions.

In [7]:
from utils import DataLoader
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, accuracy_score
from interpret.blackbox import LimeTabular
from interpret import show

# Suppress all warnings, including the ones emitted by LIME
import warnings
warnings.filterwarnings('ignore')

### Load and preprocess the data

## Load the data

In [12]:
data_loader = DataLoader() 
data_loader.load_dataset() 
data_loader.preprocess_data()

## Split the data for evaluation

In [13]:
X_train, X_test, y_train, y_test = data_loader.get_data_split()

## Oversample the train data

In [15]:
X_train, y_train = data_loader.oversample(X_train, y_train) 
print("X_train.shape", X_train.shape) 
print("X_test.shape", X_test.shape)

X_train.shape (7776, 21)
X_test.shape (1022, 21)
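
The internals of `data_loader.oversample` are not shown in this notebook. As a rough illustration only, assuming it performs plain random oversampling (duplicating minority-class rows until all classes have the same count), the idea could be sketched as follows. The function `random_oversample` and the toy arrays are hypothetical and are not part of `utils`.

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Hypothetical helper: resample every class with replacement up to the majority count."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx_parts = []
    for cls, count in zip(classes, counts):
        cls_idx = np.flatnonzero(y == cls)
        # Keep all original rows, then draw the missing ones with replacement
        extra = rng.choice(cls_idx, size=target - count, replace=True)
        idx_parts.append(np.concatenate([cls_idx, extra]))
    idx = np.concatenate(idx_parts)
    rng.shuffle(idx)  # avoid long runs of a single class
    return X[idx], y[idx]

# Toy data (assumption): 8 rows of class 0 and 2 rows of class 1
X_toy = np.arange(20).reshape(10, 2)
y_toy = np.array([0] * 8 + [1] * 2)

X_bal, y_bal = random_oversample(X_toy, y_toy)
print(X_bal.shape, np.bincount(y_bal))
```

As in the cell above, a step like this is applied to the training split only, so the test set keeps its original class distribution.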


### Fit the blackbox model (Random Forest Classifier) to the stroke data
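Train a random forest classifier on the oversampled training data and evaluate it on the untouched test set. This classifier is the black box that LIME explains later. For tabular data, LIME generates perturbations by sampling feature values around the instance being explained, rather than by turning superpixels on and off as it does for image explanations.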

In [16]:
rf = RandomForestClassifier()
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)

print(f"F1 Score {f1_score(y_test, y_pred, average='macro')}")
print(f"Accuracy {accuracy_score(y_test, y_pred)}")

F1 Score 0.5153574066156178
Accuracy 0.9403131115459883
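
A gap this large between accuracy and macro F1 is what one would expect if the test set is strongly imbalanced and the model rarely predicts the minority class correctly. The sketch below reproduces that pattern with hand-written versions of the two metrics on synthetic labels. The labels and the always-predict-0 classifier are assumptions made for illustration and are not taken from the stroke data.

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 score of a single class, treated as the positive class."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)

def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Synthetic example: 94 negatives, 6 positives, classifier always predicts 0
y_true = [0] * 94 + [1] * 6
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"Accuracy {acc:.3f}, macro F1 {f1:.3f}")  # Accuracy 0.940, macro F1 0.485
```

Because macro averaging gives the minority class the same weight as the majority class, a model that misses nearly every positive case scores close to 0.5 on macro F1 while its accuracy stays high.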


### Apply LIME
Initialize LIME for use with tabular data. The training data is passed in as the reference data from which LIME derives its perturbations.

In [17]:
lime = LimeTabular(predict_fn=rf.predict_proba, data=X_train, random_state=1)

### Get local explanations and interpret the results

In [18]:
lime_local = lime.explain_local(X_test[-10:], y_test[-10:], name='LIME')
show(lime_local)
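
Conceptually, each local explanation is a weighted linear surrogate fitted around a single row. The following NumPy sketch illustrates that idea under simplifying assumptions. It is not the implementation behind `LimeTabular`: the `black_box` function, the Gaussian sampling, and the kernel width are hypothetical choices made for this example.

```python
import numpy as np

def black_box(X):
    """Hypothetical stand-in for rf.predict_proba(X)[:, 1]: a nonlinear score in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-(3.0 * X[:, 0] - 0.5 * X[:, 1] ** 2)))

def lime_sketch(predict, x, n_samples=5000, scale=0.5, kernel_width=0.75, seed=1):
    rng = np.random.default_rng(seed)
    # 1. Perturb: sample points around the instance x
    Z = x + rng.normal(0.0, scale, size=(n_samples, x.size))
    # 2. Query the black box on the perturbed points
    preds = predict(Z)
    # 3. Weight each sample by its proximity to x (exponential kernel)
    dist = np.linalg.norm(Z - x, axis=1)
    weights = np.exp(-(dist ** 2) / kernel_width ** 2)
    # 4. Fit a weighted least-squares line: intercept plus one coefficient per feature
    A = np.hstack([np.ones((n_samples, 1)), Z - x])
    sw = np.sqrt(weights)[:, None]
    coef, *_ = np.linalg.lstsq(A * sw, preds * sw[:, 0], rcond=None)
    return coef[0], coef[1:]

x0 = np.array([0.0, 1.0, 2.0])  # the third feature is ignored by black_box
intercept, contributions = lime_sketch(black_box, x0)
print(contributions)
```

In this sketch the first coefficient comes out positive, the second negative, and the third close to zero, matching how `black_box` uses each feature near `x0`. The bars displayed by `show(lime_local)` can be read in a similar way: the sign gives the direction in which a feature pushes the prediction for that row, and the length gives the strength of the local effect.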