# LIME

LIME stands for 'Local Interpretable Model-agnostic Explanations'.

The paper is at http://arxiv.org/pdf/1602.04938v1.pdf , by Ribeiro, Singh, and Guestrin.  Ribeiro has a blog post about it at https://homes.cs.washington.edu/~marcotcr/blog/lime/ . There is code provided by Ribeiro at https://github.com/marcotcr/lime

There is thus already ample documentation and code for LIME. This repo is primarily for self-study, and likely won't introduce anything much new to the world, for now :-)

## What LIME does

LIME does the following:
- creates interpretable features. For sparse NLP models, this means drawing the first few features from a LARS path and using those.  My own self-study notebook for LARS: https://github.com/hughperkins/selfstudy-LARS/blob/master/test_lars.ipynb
- samples from interpretable feature space, near an example we wish to explain
- uses local gradients, from near the target example, to explain which interpretable features most affect decisions around that example
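
As a rough illustration of the three steps above, here is a minimal sketch using only numpy. It is not the code from the paper or from the `lime` package: `explain_locally`, `predict_fn`, and the distance (square root of the fraction of features switched off) are all assumptions made for the example. It samples binary perturbations around the example, weights them by proximity, and fits a weighted linear model whose coefficients act as the explanation.

```python
import numpy as np


def explain_locally(predict_fn, d, num_samples=1000, sigma=0.75, seed=0):
    """Fit a locality-weighted linear surrogate around the all-ones point.

    predict_fn: hypothetical black box mapping an (n, d) binary array to n scores.
    d: number of interpretable (binary) features of the example being explained.
    Returns (feature_weights, intercept) of the surrogate.
    """
    rng = np.random.default_rng(seed)
    # 1. Sample perturbations: each row switches a random subset of features off.
    Z = rng.integers(0, 2, size=(num_samples, d)).astype(float)
    Z[0, :] = 1.0  # keep the unperturbed example itself
    # 2. Weight each sample by its proximity to the original (all-ones) example.
    dist = np.sqrt((1.0 - Z).sum(axis=1) / d)  # assumed distance, in [0, 1]
    weights = np.exp(-(dist ** 2) / sigma ** 2)
    # 3. Weighted least squares: scale rows by sqrt(weight), then solve.
    A = np.hstack([Z, np.ones((num_samples, 1))])  # last column is the intercept
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], predict_fn(Z) * sw, rcond=None)
    return coef[:-1], coef[-1]


# Toy black box: only features 0 and 2 matter.
w, b = explain_locally(lambda Z: 3.0 * Z[:, 0] - 2.0 * Z[:, 2], d=4)
print(w, b)
```

For a black box that really is linear, the surrogate recovers its weights; for a non-linear one, the weights only describe behaviour near the example.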

## LIME Experiments

### Train and test distributions differ

- train on `news20`, for atheist vs christianity
- test against the new [religion](https://github.com/marcotcr/lime-experiments/blob/master/religion_dataset.tar.gz) dataset, created from websites in the [DMOZ](https://github.com/marcotcr/lime-experiments/blob/master/religion_dataset.tar.gz) directory
  - these data points have similar classes to the news20 training set, i.e. atheism vs christianity.  However, the features are fairly different: e.g. learning the names of prolific atheist posters in news20 won't generalize to the DMOZ websites.
- the idea is to examine to what extent the LIME explanations (or any other explanations, for that matter) can help with removing 'junk' features, during or after training, and thus improve the score on the DMOZ-derived dataset

Let's start by downloading the datasets, and training a simple linear model:

In [22]:
from sklearn.datasets import fetch_20newsgroups
import tarfile
import sklearn.datasets
import random
from collections import defaultdict
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import SGDClassifier
from sklearn.svm import SVC
import argparse
import numpy as np
import shutil
import os
from os import path
from os.path import join
import urllib.request
import hashlib


global_categories = ['atheism', 'religion']
news_categories = ['alt.atheism', 'soc.religion.christian']

religion_url = 'https://github.com/marcotcr/lime-experiments/blob/master/religion_dataset.tar.gz?raw=true'


def get_md5(filepath):
    with open(filepath, 'rb') as f:
        dat = f.read()
    return hashlib.md5(dat).hexdigest()


def download_religion():
    file_exists = False
    home_dir = os.environ['HOME']
    limedata_dir = join(home_dir, 'limedata')
    if not path.isdir(limedata_dir):
        os.makedirs(limedata_dir)
    religion_filepath = join(limedata_dir, 'religion_dataset.tar.gz')
    if path.isfile(religion_filepath):
        md5sum = get_md5(religion_filepath)
        if md5sum == '0f12beb283869a09584493ddf93672b6':
            file_exists = True
    if not file_exists:
        print('downloading religion dataset...')
        with urllib.request.urlopen(religion_url) as response, open(religion_filepath, 'wb') as out_file:
            shutil.copyfileobj(response, out_file)
        # compute the checksum only after the output file has been closed and flushed
        print(get_md5(religion_filepath))
        print('... downloaded religion dataset')
    return religion_filepath


def fetch_religion():
    tar_filepath = download_religion()
    # res = sklearn.datasets.base.Bunch()
    examples = []
    print('loading religion dataset to memory...')
    tar = tarfile.open(tar_filepath)
    # print(tar.getmembers())
    class_name_by_id = ['atheism', 'christianity']
    class_id_by_name = {name: id for id, name in enumerate(class_name_by_id)}
    print('class_id_by_name', class_id_by_name)
    N_per_class = 819
    y = np.zeros((N_per_class * 2), dtype=np.int64)
    n = 0
    count_per_class = defaultdict(int)
    for m in tar.getmembers():
        # print(m)
        # print(dir(m))
        # print(m.name, m.path, m.type)
        if '/' in m.path:
            class_name = m.path.split('/')[0]
            class_id = class_id_by_name[class_name]
            if count_per_class[class_id] >= N_per_class:
                continue
            # if m.path not in ['README.txt', 'atheism', 'christianity']:
            f = tar.extractfile(m)
            try:
                content = f.read()
                content = content.decode('utf-8')
            except UnicodeDecodeError:
                # raise Exception('failed for [%s]' % content)
                print('failed to decode to utf-8 => skipping 1 doc')
                continue
            finally:
                f.close()
            examples.append(content)
            y[n] = class_id
            count_per_class[class_id] += 1
            n += 1
    tar.close()
    print('... religion dataset loaded')
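    # sklearn.datasets.base was removed in newer scikit-learn; Bunch lives in sklearn.utils.
    # y[:n] keeps the targets aligned with the examples actually loaded.
    return sklearn.utils.Bunch(data=examples, target=y[:n])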


class Model(object):
    def __init__(self, trainer):
        self.trainer = trainer
        trainers = {
            'nb': MultinomialNB(),
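            # n_iter was renamed to max_iter in newer scikit-learn; tol=None keeps a fixed 5 epochs
            'sgd': SGDClassifier(loss='hinge', penalty='l2',
                                 alpha=1e-3, max_iter=5, tol=None, random_state=123),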
            'rbf': SVC(C=1000000, kernel='rbf')
        }
        self.model = trainers[trainer]
        print('trainer: %s' % trainer)

    def train(self):
        self.twenty_train = fetch_20newsgroups(subset='train', categories=news_categories, shuffle=True, random_state=123)
        self.count_vect = CountVectorizer()
        self.X_train_counts = self.count_vect.fit_transform(self.twenty_train.data)

        self.tfidf_transformer = TfidfTransformer()
        self.X_train_tfidf = self.tfidf_transformer.fit_transform(self.X_train_counts)

        # model = MultinomialNB()
        self.model.fit(self.X_train_tfidf, self.twenty_train.target)
        train_pred = self.model.predict(self.X_train_tfidf)
        train_num_right = np.equal(train_pred, self.twenty_train.target).sum()
        print('train', train_num_right, train_num_right / len(self.twenty_train.target) * 100)
        # return model

    def test(self):
        self.twenty_test = fetch_20newsgroups(subset='test', categories=news_categories, shuffle=True, random_state=123)
        X_test_counts = self.count_vect.transform(self.twenty_test.data)

        X_test_tfidf = self.tfidf_transformer.transform(X_test_counts)
        test_pred = self.model.predict(X_test_tfidf)
        test_num_right = np.equal(test_pred, self.twenty_test.target).sum()
        print('test', test_num_right, test_num_right / len(self.twenty_test.target) * 100)

        # now try religion dataset, from https://github.com/marcotcr/lime-experiments/blob/master/religion_dataset.tar.gz
        religion_test = fetch_religion()
        religion_X_test_counts = self.count_vect.transform(religion_test.data)
        religion_X_test_tfidf = self.tfidf_transformer.transform(religion_X_test_counts)
        religion_test_pred = self.model.predict(religion_X_test_tfidf)
        religion_test_num_right = np.equal(religion_test_pred, religion_test.target).sum()
        print('religion test', religion_test_num_right, religion_test_num_right / len(religion_test.target) * 100)

model = Model('sgd')
model.train()
model.test()


trainer: sgd
train 1078 99.9073215941
test 670 93.4449093445
loading religion dataset to memory...
class_id_by_name {'christianity': 1, 'atheism': 0}
failed to decode to utf-8 => skipping 1 doc
... religion dataset loaded
religion test 884 53.9682539683


LIME trains a model $\xi$, drawn from a class $G$ of interpretable models.  For LIME, interpretable models means simple models such as linear models, decision trees, or falling rule lists.  $\xi$ is the solution to:

$$\xi(x) = \mathrm{argmin}_{g \in G} \left( \mathcal{L}(f, g,\Pi_x)+\Omega(g) \right)$$

Where:
- $G$ is the class of interpretable models
- $f$ is the function learned by the model we wish to interpret
- $\Pi_x(z)$ is a measure of proximity of $z$ to $x$
- $\mathcal{L}$ is a measure of how unfaithful $g$ is in representing $f$ in the locality defined by $\Pi_x$
- $\Omega(\cdot)$ is a measure of complexity

This is a general formulation. For LIME, we add additional constraints and assumptions:
- $G$ is taken to be the class of linear models, and in particular: $g(z') = w_g \cdot z'$, where $w_g$ are parameters to be learned
- $\Pi_x(z)$ is defined as: $\exp \left( \frac{ -D(x, z)^2 } {\sigma^2} \right)$, so it's something like a radial basis function: it is close to one near $x$, then falls off with distance
- $\mathcal{L}$ is the square loss, weighted by locality:

$$\mathcal{L}(f, g, \Pi_x) = \sum_{z, z' \in \mathcal{Z}} \Pi_x(z) \left( f(z) - g(z') \right)^2 $$
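
To make the kernel and the loss concrete, here is a small numpy sketch of both formulas. The function names `proximity` and `weighted_square_loss` are made up for this example, and the distances are assumed to be precomputed by whatever $D$ is in use.

```python
import numpy as np


def proximity(distance, sigma):
    """Exponential kernel Pi_x(z) = exp(-D(x, z)^2 / sigma^2)."""
    distance = np.asarray(distance, dtype=float)
    return np.exp(-(distance ** 2) / sigma ** 2)


def weighted_square_loss(f_z, g_z, distance, sigma):
    """Sum over samples of Pi_x(z) * (f(z) - g(z'))^2."""
    f_z = np.asarray(f_z, dtype=float)
    g_z = np.asarray(g_z, dtype=float)
    return float(np.sum(proximity(distance, sigma) * (f_z - g_z) ** 2))


# The same error of 1.0 costs less the further the sample is from x.
print(weighted_square_loss([1.0], [0.0], distance=[0.0], sigma=25))
print(weighted_square_loss([1.0], [0.0], distance=[25.0], sigma=25))
```

A sample at distance zero counts with full weight, while one at distance $\sigma$ counts with weight $e^{-1}$.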

The locally interpretable features are binary, $\mathbf{x}' \in \{0,1\}^{d'}$. ~~for nlp, LIME uses LARS to obtain the $K$ most important features/words from the model.  I think.  I think these interpretable features are global.  Again, I'm not entirely 100% sure on this point currently :-)~~

For NLP, I think that the interpretable features are a bag of unigrams covering the entire vocabulary.  Samples are drawn from this space (presumably by slightly perturbing the example to be explained), then a LARS path is run against these samples to obtain the top $K$ explainers, I think.

So, first we should draw samples.
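
One way to draw such samples, sketched under my own assumptions rather than taken from the paper's code: tokenize on whitespace, treat each distinct word of the document as one binary feature, and create each sample by removing a random subset of words. `sample_perturbations` is a hypothetical helper, and it assumes the text contains at least two distinct words.

```python
import numpy as np


def sample_perturbations(text, num_samples, seed=0):
    """Draw binary masks over the distinct words of `text` and rebuild each perturbed text.

    Returns (vocab, Z, texts), where Z[i, j] == 1 means vocab[j] is kept in sample i.
    Row 0 is the unperturbed example.
    """
    rng = np.random.default_rng(seed)
    tokens = text.split()  # assumption: plain whitespace tokenization
    vocab = sorted(set(tokens))
    index = {word: j for j, word in enumerate(vocab)}
    Z = np.ones((num_samples, len(vocab)), dtype=int)
    texts = [text]
    for i in range(1, num_samples):
        # Remove between 1 and len(vocab) - 1 distinct words, chosen at random.
        n_remove = rng.integers(1, len(vocab))
        removed = rng.choice(len(vocab), size=n_remove, replace=False)
        Z[i, removed] = 0
        texts.append(' '.join(t for t in tokens if Z[i, index[t]] == 1))
    return vocab, Z, texts


vocab, Z, texts = sample_perturbations('god is not a word god', num_samples=5)
print(vocab)
print(Z)
print(texts)
```

The perturbed `texts` can be passed through the same `CountVectorizer` and `TfidfTransformer` as the original example to get the black-box predictions, while `Z` serves as the interpretable representation.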

In [None]:
from sklearn import linear_model

N = 15000   # number of samples, from section 5.1 of the paper
K = 10  # I *think* the paper uses K as 10
rho = 25   # from https://github.com/marcotcr/lime-experiments/blob/master/generate_data_for_compare_classifiers.py#L62
# distance is calculated as per https://github.com/marcotcr/lime-experiments/blob/master/explainers.py#L115:
"""
distance_fn = lambda x : sklearn.metrics.pairwise.cosine_distances(x[0],x)[0] * 100
"""

def foo():
    from sklearn import datasets
    diabetes = datasets.load_diabetes()
    X = diabetes.data
    print(type(X), X.shape)
    y = diabetes.target
    print(type(y), y.shape)
    alphas, _, coefs = linear_model.lars_path(X, y, method='lasso', verbose=True)
foo()
print('trained foo')

print(type(model.X_train_tfidf))
print(type(model.twenty_train.target))
lasso_model = linear_model.LassoLars(alpha=0.001, max_iter=K)
lasso_model.fit(model.X_train_tfidf.toarray(), model.twenty_train.target)
# alphas, _, coefs = linear_model.lars_path(
#     model.X_train_tfidf, model.twenty_train.target, method='lasso', verbose=True)
print(lasso_model.coef_)
print(lasso_model.active_)
# for j in lasso_model.active_:
# print(model.count_vect.get_feature_names(lasso_model.active_))
# get_feature_names() on older scikit-learn; returns the vocabulary in column order
names = model.count_vect.get_feature_names_out()
print(len(names))

In [None]:
# print(names[:5000])
print(len(names))
# print(names[lasso_model.active_])
for j in lasso_model.active_:
    print(names[j])
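
The cells above run LARS globally, over the whole training set. For a local explanation, the same selection would presumably be run over the perturbed samples instead, weighted by proximity. The sketch below is my guess at how that could look, not a confirmed reproduction of the paper's method: `top_k_features` is a hypothetical helper that folds the weights into the data by scaling rows with the square root of the weight, then walks back along the lasso path to the last step with at most `K` non-zero coefficients.

```python
import numpy as np
from sklearn.linear_model import lars_path


def top_k_features(Z, scores, weights, K):
    """Return the indices of up to K features selected along a lasso LARS path.

    Z: (n, d) binary matrix of perturbed samples (1 = word kept).
    scores: black-box output for each sample.
    weights: proximity weight for each sample.
    """
    sw = np.sqrt(weights)
    # Centre with the weighted mean, then scale rows so that ordinary LARS
    # solves the weighted problem.
    Zc = (Z - np.average(Z, axis=0, weights=weights)) * sw[:, None]
    yc = (scores - np.average(scores, weights=weights)) * sw
    _, _, coefs = lars_path(Zc, yc, method='lasso')
    # Walk back along the path to the last step with at most K active features.
    for step in range(coefs.shape[1] - 1, -1, -1):
        active = np.flatnonzero(coefs[:, step])
        if len(active) <= K:
            return active
    return np.array([], dtype=int)


# Toy data: features 1 and 4 dominate the score.
rng = np.random.default_rng(0)
Z = rng.integers(0, 2, size=(500, 6)).astype(float)
scores = 5.0 * Z[:, 1] - 3.0 * Z[:, 4] + 0.01 * Z[:, 0]
print(top_k_features(Z, scores, np.ones(500), K=2))
```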