```
Last modified: 2021/10/10, @haewoon 
```


# Lab: Interpretable Machine Learning

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/haewoon/lab-interpretable-machine-learning/blob/master/Lab%20-%20Interpretable%20Machine%20Learning.ipynb)

## Step 0: Download restaurant review data

The restaurant review data was originally compiled from https://alt.qcri.org/semeval2014/task4/index.php?id=data-and-tools and then preprocessed.

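```python
# Download the restaurant review data (read as restaurant.tsv in Step 3).
# Recent gdown versions take the Google Drive file ID directly; older ones used --id.
!gdown 1IMemmlNFVOqtz7KowN6RqTQjkP4JaVMB
```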

## Step 1: Install LIME

The LIME package is hosted at https://github.com/marcotcr/lime. <br/>
This lab's code is partly based on the tutorial code there.

```python
!pip install lime
```

```python
import lime
import sklearn
import sklearn.ensemble
import sklearn.feature_extraction.text  # needed for TfidfVectorizer below
import sklearn.metrics
```

## Step 2: Newsgroup (atheism and Christianity) classification

We'll be using the [20 newsgroups dataset](https://scikit-learn.org/stable/datasets/real_world.html#the-20-newsgroups-text-dataset). <br/>
In particular, we'll focus on two groups: atheism and Christianity.

### 2-1. Fetching data, training a classifier

```python
from sklearn.datasets import fetch_20newsgroups
categories = ['alt.atheism', 'soc.religion.christian']
newsgroups_train = fetch_20newsgroups(subset='train', categories=categories)
newsgroups_test = fetch_20newsgroups(subset='test', categories=categories)
class_names = ['atheism', 'christian']  # target 0 = alt.atheism, 1 = soc.religion.christian
```

Let's use the TF-IDF vectorizer, which is commonly used for text.

```python
vectorizer = sklearn.feature_extraction.text.TfidfVectorizer(lowercase=False)
train_vectors = vectorizer.fit_transform(newsgroups_train.data)
test_vectors = vectorizer.transform(newsgroups_test.data)
```
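With `lowercase=False`, the vectorizer keeps capitalization, so 'Posting' and 'posting' become two separate features; the words LIME reports later are these case-sensitive tokens. A small sketch on a made-up two-document corpus shows the difference in vocabulary, and that each TF-IDF row is L2-normalized by default:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up toy corpus: the same words with different capitalization
docs = ["Posting Host", "posting host here"]

cased = TfidfVectorizer(lowercase=False).fit(docs)
uncased = TfidfVectorizer().fit(docs)  # lowercase=True is the default

print(sorted(cased.vocabulary_))    # ['Host', 'Posting', 'here', 'host', 'posting']
print(sorted(uncased.vocabulary_))  # ['here', 'host', 'posting']

# Default norm='l2': every document vector has unit length
row_norms = np.linalg.norm(cased.transform(docs).toarray(), axis=1)
print(row_norms)
```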

We use random forests for classification. 
It's usually hard to understand what random forests are doing, especially with many trees.

```python
# 500 trees; no random_state is set, so scores vary slightly between runs
rf = sklearn.ensemble.RandomForestClassifier(n_estimators=500)
rf.fit(train_vectors, newsgroups_train.target)
```

```python
pred = rf.predict(test_vectors)
sklearn.metrics.f1_score(newsgroups_test.target, pred, average='binary')
```
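With `average='binary'`, `f1_score` reports F1 for the positive label (`pos_label=1`, i.e. christian): the harmonic mean of precision and recall on that class. A plain-Python sketch of the same computation on made-up labels (`binary_f1` is a hypothetical helper, not part of sklearn):

```python
def binary_f1(y_true, y_pred, positive=1):
    """F1 for the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Made-up labels: 1 = christian, 0 = atheism
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0]
print(binary_f1(y_true, y_pred))  # 2 TP, 1 FP, 1 FN -> precision = recall = 2/3, F1 ≈ 0.667
```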

We see that this classifier achieves a very high F1 score (0.923 in our run; the exact value varies because the forest is randomized). <br/>
But can we trust this classifier?

[The sklearn guide to 20 newsgroups](https://scikit-learn.org/stable/datasets/real_world.html#filtering-text-for-more-realistic-training) points out that a Multinomial Naive Bayes classifier overfits this dataset by learning irrelevant features, such as email headers. <br/>
Let's see whether random forests do the same.

### 2-2. Explaining predictions using LIME

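LIME's text explainer works on raw strings: it perturbs the text and calls a function that maps a list of raw texts to class probabilities. Our sklearn classifier, however, expects vectorized input. <br/>
To bridge the gap, we chain the vectorizer and the random forest into a pipeline, whose `predict_proba` accepts a list of raw texts.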

```python
from sklearn.pipeline import make_pipeline
c = make_pipeline(vectorizer, rf)
c.predict_proba([newsgroups_test.data[0]])
```
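Because the pipeline runs the vectorizer before the classifier, its `predict_proba` takes a list of raw strings and returns one row of class probabilities per string, which is the kind of function LIME expects. The sketch below shows this on a made-up four-document corpus, with `LogisticRegression` swapped in for the random forest to keep it small:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up toy training data; 0 = atheism, 1 = christian
toy_texts = ["god bless church", "church prayer god", "no god evidence", "evidence science atheism"]
toy_labels = [1, 1, 0, 0]

# The pipeline vectorizes first, so predict_proba can be called on raw strings
toy_pipe = make_pipeline(TfidfVectorizer(), LogisticRegression())
toy_pipe.fit(toy_texts, toy_labels)

toy_probs = toy_pipe.predict_proba(["prayer in church", "science and evidence"])
print(toy_probs.shape)  # (2, 2): one row per document, one column per class
```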

Now we create an explainer object. We pass `class_names` as an argument so that the output shows readable class names.

```python
from lime.lime_text import LimeTextExplainer
explainer = LimeTextExplainer(class_names=class_names)
```

We then generate an explanation with at most 10 features for an arbitrary document in the test set.

```python
idx = 83
NUM_FEATURES = 10
exp = explainer.explain_instance(newsgroups_test.data[idx], c.predict_proba, num_features=NUM_FEATURES)
print(f'Document id: {idx}')
print(f'Probability(christian) = {c.predict_proba([newsgroups_test.data[idx]])[0, 1]}')
print(f'True class: {class_names[newsgroups_test.target[idx]]}')
```

The classifier got this example right (it predicted atheism).  
The explanation is presented below as a list of weighted features. 

```python
exp.as_list()
```

These weighted features form a linear model that approximates the behaviour of the random forest classifier in the vicinity of this test example. <br/>
Roughly, if we remove 'Posting' and 'Host' from the document, the prediction should move towards the opposite class (Christianity). Let's see if this is the case.
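The idea behind these weights can be sketched from scratch: randomly remove words, ask the classifier for probabilities on the perturbed texts, weight each sample by its closeness to the original, and fit a weighted linear model whose coefficients become the explanation. The sketch below is loosely modeled on `LimeTextExplainer`'s defaults (cosine distance scaled by 100, an exponential kernel of width 25, ridge regression), but it samples masks uniformly at random and skips the library's feature-selection step, so treat it as an illustration rather than the `lime` implementation. `lime_text_sketch` and `toy_classifier` (which only looks for the word 'Host') are hypothetical.

```python
import numpy as np
from sklearn.linear_model import Ridge


def lime_text_sketch(text, classifier_fn, label=1, num_samples=500,
                     kernel_width=25.0, num_features=5, seed=0):
    """Simplified LIME-style explanation for one document (not the lime library's code)."""
    rng = np.random.default_rng(seed)
    words = text.split()
    vocab = sorted(set(words))
    d = len(vocab)

    # Each row is a binary mask over vocab: 1 = keep the word, 0 = drop every occurrence
    masks = rng.integers(0, 2, size=(num_samples, d))
    masks[0] = 1  # the first sample is the unperturbed document

    perturbed = []
    for mask in masks:
        kept = {w for w, m in zip(vocab, mask) if m}
        perturbed.append(" ".join(w for w in words if w in kept))
    probs = classifier_fn(perturbed)[:, label]

    # Proximity: cosine distance to the all-ones mask (x100), then an exponential kernel
    norms = np.linalg.norm(masks, axis=1)
    cos_sim = masks.sum(axis=1) / (np.sqrt(d) * np.where(norms == 0, 1.0, norms))
    distances = (1.0 - cos_sim) * 100
    weights = np.sqrt(np.exp(-(distances ** 2) / kernel_width ** 2))

    # Weighted linear surrogate; its coefficients are the word weights
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(masks, probs, sample_weight=weights)
    top = np.argsort(-np.abs(surrogate.coef_))[:num_features]
    return [(vocab[i], float(surrogate.coef_[i])) for i in top]


def toy_classifier(texts):
    """Hypothetical classifier: P(class 1) is 0.9 if 'Host' is present, else 0.2."""
    p1 = np.array([0.9 if "Host" in t.split() else 0.2 for t in texts])
    return np.column_stack([1 - p1, p1])


explanation = lime_text_sketch("NNTP Posting Host church god", toy_classifier)
print(explanation[0])  # 'Host' comes first, with a clearly positive weight
```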

```python
# Baseline probability of 'christian' for the original document
orig_prob = rf.predict_proba(test_vectors[idx])[0, 1]
print('Original prediction:', orig_prob)

# Zero out the TF-IDF entries for 'Posting' and 'Host' (the row is not renormalized)
tmp = test_vectors[idx].copy()
tmp[0, vectorizer.vocabulary_['Posting']] = 0
tmp[0, vectorizer.vocabulary_['Host']] = 0
new_prob = rf.predict_proba(tmp)[0, 1]
print('Prediction removing some features:', new_prob)
print('Difference:', new_prob - orig_prob)
```
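The manual check above can be wrapped in a small reusable helper for testing other words from the explanation. `prob_change_without` is a hypothetical name; like the cell above, it zeros TF-IDF entries without renormalizing the row, which only approximates deleting the words from the raw text. The usage lines rely on made-up toy data, with `LogisticRegression` in place of the random forest.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression


def prob_change_without(vectorizer, model, text, words, label=1):
    """Hypothetical helper: change in P(label) after zeroing the given words' TF-IDF entries."""
    row = vectorizer.transform([text]).tolil()  # LIL format supports cheap item assignment
    before = model.predict_proba(row.tocsr())[0, label]
    for w in words:
        col = vectorizer.vocabulary_.get(w)
        if col is not None:  # ignore words the vectorizer never saw
            row[0, col] = 0
    after = model.predict_proba(row.tocsr())[0, label]
    return after - before


# Made-up toy data; 1 = christian, 0 = atheism
toy_texts = ["god church prayer", "church god faith", "evidence science reason", "science reason logic"]
toy_labels = [1, 1, 0, 0]
toy_vec = TfidfVectorizer(lowercase=False).fit(toy_texts)
toy_model = LogisticRegression().fit(toy_vec.transform(toy_texts), toy_labels)

delta = prob_change_without(toy_vec, toy_model, "church science", ["church"])
print(delta)  # negative: removing a christian-leaning word lowers P(christian)
```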

The words that explain the prediction seem very **arbitrary**: they have little to do with either Christianity or atheism. <br/>
In fact, these words appear in the email headers (you will see this clearly soon), which make the classes much easier to tell apart.
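Tokens like 'Posting' and 'Host' come from header lines such as `NNTP-Posting-Host:`, because the vectorizer's default `token_pattern`, `r"(?u)\b\w\w+\b"`, splits on punctuation. A standard-library sketch applying the same pattern to a made-up header line (the host name is hypothetical):

```python
import re

# Same regex as TfidfVectorizer's default token_pattern: runs of 2+ word characters
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")

# Made-up header line in the style of 20 newsgroups posts
header = "NNTP-Posting-Host: example.host.edu"
tokens = TOKEN_PATTERN.findall(header)
print(tokens)  # ['NNTP', 'Posting', 'Host', 'example', 'host', 'edu']
```

Note that 'Host' and 'host' stay distinct because the vectorizer uses `lowercase=False`. To train on text without this metadata, `fetch_20newsgroups` accepts `remove=('headers', 'footers', 'quotes')`.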

### 2-3. Visualizing explanations

The explanations can be returned as a matplotlib barplot:

```python
%matplotlib inline
fig = exp.as_pyplot_figure()
```

The explanations can also be exported as an HTML page (which we can render here in the notebook) that uses D3.js to draw the graphs.


```python
exp.show_in_notebook(text=True)
```

Alternatively, we can save the self-contained HTML page to a file:

```python
exp.save_to_file('test.html')
```

The LIME explainer works with any classifier, as long as you can provide a function, such as `predict_proba` above, that maps a list of raw texts to class probabilities.
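For a model without `predict_proba`, such as sklearn's `LinearSVC`, one option is to wrap it in `CalibratedClassifierCV`, which adds calibrated probability estimates. A sketch on made-up toy data (`cv=2` only because the corpus is tiny); the resulting `toy_clf.predict_proba` could be passed to `explain_instance` in place of `c.predict_proba`:

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Made-up toy data; 1 = christian, 0 = atheism
toy_texts = ["god church prayer", "church god faith", "prayer faith church", "god bless faith",
             "evidence science reason", "science reason logic", "logic evidence doubt", "doubt science reason"]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]

# LinearSVC alone has no predict_proba; the calibration wrapper provides one
toy_clf = make_pipeline(TfidfVectorizer(), CalibratedClassifierCV(LinearSVC(), cv=2))
toy_clf.fit(toy_texts, toy_labels)

toy_probs = toy_clf.predict_proba(["church and prayer"])
print(toy_probs.shape)  # (1, 2)
```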

## Step 3: Sentiment (positive and negative) classification

We'll be using the restaurant review dataset downloaded in Step 0.



### 3-1. Data preprocessing and training a classifier

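```python
import pandas as pd

df = pd.read_csv('restaurant.tsv', sep='\t')
# Keep only positive and negative reviews (copy to avoid SettingWithCopyWarning below)
df = df.query("sentiment == 'positive' or sentiment == 'negative'").copy()
# Map labels explicitly so they match class_names: 0 = negative, 1 = positive
df['label'] = (df['sentiment'] == 'positive').astype(int)
df.head()
```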

```python
class_names = ['negative', 'positive']
```
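`class_names` must be ordered by integer label: entry 0 names label 0. `pandas.factorize` numbers categories by first appearance, so its codes line up with `['negative', 'positive']` only if the first row happens to be negative. A small sketch on a made-up series, with an explicit mapping as the safer alternative:

```python
import pandas as pd

toy = pd.Series(["positive", "negative", "positive"])

# factorize numbers categories by first appearance: here positive -> 0, negative -> 1
codes, uniques = toy.factorize()
print(list(codes), list(uniques))  # [0, 1, 0] ['positive', 'negative']

# An explicit mapping keeps labels aligned with class_names
class_names = ["negative", "positive"]
toy_labels = toy.map({name: i for i, name in enumerate(class_names)})
print(list(toy_labels))  # [1, 0, 1]
```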

```python
from sklearn.model_selection import train_test_split
# No random_state: the split (and the review at a given test position) changes on each run
train, test = train_test_split(df, test_size=0.2)
```
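`train_test_split` shuffles differently on every run unless `random_state` is fixed, so the F1 score and the review at a position such as `test.iloc[35]` change between runs; `stratify` additionally keeps the class ratio equal in both sets. A sketch on a made-up ten-row frame:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up frame with a balanced label column
toy = pd.DataFrame({"text": [f"review {i}" for i in range(10)],
                    "label": [0, 1] * 5})

# The same random_state gives the same split every time
a_train, a_test = train_test_split(toy, test_size=0.2, random_state=42)
b_train, b_test = train_test_split(toy, test_size=0.2, random_state=42)
print(a_test.index.equals(b_test.index))  # True

# stratify keeps one example of each class in the 2-row test set
s_train, s_test = train_test_split(toy, test_size=0.2, random_state=42, stratify=toy["label"])
print(sorted(s_test["label"]))  # [0, 1]
```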

```python
vectorizer = sklearn.feature_extraction.text.TfidfVectorizer(lowercase=False)
train_vectors = vectorizer.fit_transform(train['text'])
test_vectors = vectorizer.transform(test['text'])
```

```python
rf = sklearn.ensemble.RandomForestClassifier(n_estimators=500)
rf.fit(train_vectors, train['label'])
```

```python
pred = rf.predict(test_vectors)
sklearn.metrics.f1_score(test['label'], pred, average='binary')
```

We see that this classifier also achieves a high F1 score (0.870 in our run; the value varies with the random split and the randomized forest). <br/>
But can we trust this classifier?


### 3-2. Explaining predictions using LIME

```python
from sklearn.pipeline import make_pipeline
c = make_pipeline(vectorizer, rf)
print(c.predict_proba([test.iloc[0]['text']]))
```

```python
from lime.lime_text import LimeTextExplainer
explainer = LimeTextExplainer(class_names=class_names)
```

```python
idx = 35  # position in the (randomly split) test set
exp = explainer.explain_instance(test.iloc[idx]['text'], c.predict_proba, num_features=10)
print(f'Document id: {idx}')
print(f"Probability(positive) = {c.predict_proba([test.iloc[idx]['text']])[0, 1]}")
print(f"True class: {test.iloc[idx]['sentiment']}")
```

```python
exp.as_list()
```
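For this binary explainer, `as_list()` returns `(word, weight)` pairs for label 1 by default, so positive weights push towards 'positive' and negative weights towards 'negative'. The hypothetical helper below groups such pairs by direction; the weights are made up for illustration, not output of the model above:

```python
# Made-up (word, weight) pairs in the same shape that exp.as_list() returns
toy_explanation = [("great", 0.21), ("rude", -0.18), ("delicious", 0.12), ("slow", -0.07)]


def split_by_direction(pairs):
    """Group explanation words by the class they push toward, largest effect first."""
    pro = sorted((p for p in pairs if p[1] > 0), key=lambda p: -p[1])
    con = sorted((p for p in pairs if p[1] < 0), key=lambda p: p[1])
    return [w for w, _ in pro], [w for w, _ in con]


towards_positive, towards_negative = split_by_direction(toy_explanation)
print("towards positive:", towards_positive)  # ['great', 'delicious']
print("towards negative:", towards_negative)  # ['rude', 'slow']
```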

### 3-3. Visualizing explanations

```python
%matplotlib inline
fig = exp.as_pyplot_figure()
```

```python
exp.show_in_notebook(text=True)
```