# Mixed-Integer Linear Programming (MILP) for Local Interpretable Model-agnostic Explanations (LIME)

This study aims to formulate (and test) Local Interpretable Model-agnostic Explanations (LIME) using Mixed-Integer Linear Programming (MILP).

- Lucas Emanuel Resck Domingues
- Professor: Luciano Guimarães
- FGV-EMAp

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Introduction" data-toc-modified-id="Introduction-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Introduction</a></span></li><li><span><a href="#Setup" data-toc-modified-id="Setup-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Setup</a></span></li><li><span><a href="#Model" data-toc-modified-id="Model-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Model</a></span><ul class="toc-item"><li><span><a href="#Data" data-toc-modified-id="Data-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Data</a></span></li><li><span><a href="#Classifier" data-toc-modified-id="Classifier-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>Classifier</a></span></li></ul></li><li><span><a href="#Optimization" data-toc-modified-id="Optimization-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Optimization</a></span><ul class="toc-item"><li><span><a href="#LIME" data-toc-modified-id="LIME-4.1"><span class="toc-item-num">4.1&nbsp;&nbsp;</span>LIME</a></span></li><li><span><a href="#Calculation-of-parameters" data-toc-modified-id="Calculation-of-parameters-4.2"><span class="toc-item-num">4.2&nbsp;&nbsp;</span>Calculation of parameters</a></span></li><li><span><a href="#Linear-optimization" data-toc-modified-id="Linear-optimization-4.3"><span class="toc-item-num">4.3&nbsp;&nbsp;</span>Linear optimization</a></span></li><li><span><a href="#Examples" data-toc-modified-id="Examples-4.4"><span class="toc-item-num">4.4&nbsp;&nbsp;</span>Examples</a></span></li></ul></li><li><span><a href="#References" data-toc-modified-id="References-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>References</a></span></li></ul></div>

## Introduction

Optimization is today one of the most important fields of Applied Mathematics.
Its theory and research are well established, and its applications are everywhere, from Engineering to Management. In general, an optimization problem consists of maximizing or minimizing a function subject to constraints on its variables.

The main problem of Machine Learning is to search for a function that represents the observed phenomenon. Many machine learning problems can be understood as optimization problems.

- Machine learning
- Machine learning with optimization
- The black-box problem

- Interpretation models
- LIME
- LIME as optimization
- It is not linear, but it is worth trying

## Setup

This section contains only Python and Jupyter setup. You can skip it.

In [156]:
from IPython.display import HTML
from pulp import LpVariable, LpProblem, value, LpStatus, LpMinimize
from sklearn.calibration import CalibratedClassifierCV
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
import numpy as np
import pandas as pd
import string

In [157]:
%config Completer.use_jedi = False

## Model

### Data

In [158]:
df = pd.read_csv('../data/IMDB Dataset.csv')
df

| | review | sentiment |
|---|---|---|
| 0 | One of the other reviewers has mentioned that ... | positive |
| 1 | A wonderful little production. <br /><br />The... | positive |
| 2 | I thought this was a wonderful way to spend ti... | positive |
| 3 | Basically there's a family where a little boy ... | negative |
| 4 | Petter Mattei's "Love in the Time of Money" is... | positive |
| ... | ... | ... |
| 49995 | I thought this movie did a down right good job... | positive |
| 49996 | Bad plot, bad dialogue, bad acting, idiotic di... | negative |
| 49997 | I am a Catholic taught in parochial elementary... | negative |
| 49998 | I'm going to have to disagree with the previou... | negative |


In [159]:
def preprocessing(text):
    '''Preprocess text.'''
    return text.replace('<br />', '')

In [160]:
df.review = df.review.apply(preprocessing)

In [161]:
X = df.review.to_list()
y = df.sentiment.to_list()

In [162]:
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    random_state=42,
    stratify=y
)

### Classifier

In [163]:
tf_idf = TfidfVectorizer(
    strip_accents=None,
    lowercase=True,
    smooth_idf=True,
)
X_train = tf_idf.fit_transform(X_train)
X_train.shape

(40000, 94342)

In [164]:
t_svd = TruncatedSVD(n_components=50, random_state=42)
X_train = t_svd.fit_transform(X_train)
X_train.shape

(40000, 50)

In [165]:
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)

In [168]:
estimator = LinearSVC(
    dual=False,  # n_samples > n_features
    verbose=1,
    random_state=42
)

svm = GridSearchCV(
    estimator,
    param_grid={'C': np.logspace(-2, 10, 13)},
    cv=5,
    n_jobs=-1,
    verbose=3
)

In [169]:
%%time
svm.fit(X_train, y_train)

Fitting 5 folds for each of 13 candidates, totalling 65 fits
[LibLinear]CPU times: user 644 ms, sys: 152 ms, total: 796 ms
Wall time: 5.65 s


GridSearchCV(cv=5, estimator=LinearSVC(dual=False, random_state=42, verbose=1),
             n_jobs=-1,
             param_grid={'C': array([1.e-02, 1.e-01, 1.e+00, 1.e+01, 1.e+02, 1.e+03, 1.e+04, 1.e+05,
       1.e+06, 1.e+07, 1.e+08, 1.e+09, 1.e+10])},
             verbose=3)

In [170]:
%%time
svm = CalibratedClassifierCV(svm.best_estimator_, cv=5)
svm.fit(X_train, y_train)

[LibLinear][LibLinear][LibLinear][LibLinear][LibLinear]CPU times: user 3.12 s, sys: 2.35 s, total: 5.47 s
Wall time: 1.19 s


CalibratedClassifierCV(base_estimator=LinearSVC(C=0.1, dual=False,
                                                random_state=42, verbose=1),
                       cv=5)

In [171]:
vector = Pipeline([
    ('tf_idf', tf_idf),
    ('t_svd', t_svd)
])

In [172]:
model = Pipeline([
    ('tf_idf', tf_idf),
    ('t_svd', t_svd),
    ('scaler', scaler),
    ('svm', svm)
])

In [173]:
model.score(X_test, y_test)

0.8427

## Optimization

### LIME

### Calculation of parameters

In [174]:
def f(text):
    '''Return the class probabilities that the model assigns to a text.'''
    return model.predict_proba([text])[0]

In [234]:
def pi(x, z, sigma_2=-1/np.log(0.1)):
    '''Locality weight of the perturbed text z with respect to x.'''
    x = vector.transform([x])[0]
    z = vector.transform([z])[0]
    # A null vector has no direction: give it zero weight
    if np.abs(z).sum() == 0:
        return 0
    # Cosine of the angle between the vectors
    cos = np.dot(x, z)/(np.linalg.norm(x)*np.linalg.norm(z))
    # Rounding errors can give values like 1.00008
    cos = np.clip(cos, -1, 1)
    # Angle between the vectors, scaled so that a right angle gives D = 1
    D = np.arccos(cos)*2/np.pi
    return np.exp(-D**2/sigma_2)
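
To get a feel for the kernel above, the following minimal sketch applies the same formula to plain Python lists, skipping the TF-IDF and SVD step. The helper name `locality_weight` is hypothetical and not part of the notebook. With the default `sigma_2 = -1/log(0.1)`, vectors pointing in the same direction should get a weight of about 1 and orthogonal vectors a weight of about 0.1.

```python
import math


def locality_weight(x, z, sigma_2=-1 / math.log(0.1)):
    """Exponential kernel on the scaled angle between x and z.

    Hypothetical stand-alone version of the notebook's `pi`, taking
    already-vectorized inputs (plain sequences of floats).
    """
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_z = math.sqrt(sum(b * b for b in z))
    # A null vector has no direction; mirror the null-vector check in `pi`
    if norm_x == 0 or norm_z == 0:
        return 0.0
    cos = sum(a * b for a, b in zip(x, z)) / (norm_x * norm_z)
    # Guard against rounding errors such as 1.00008
    cos = max(-1.0, min(1.0, cos))
    # Angle scaled so that a right angle corresponds to D = 1
    D = math.acos(cos) * 2 / math.pi
    return math.exp(-D ** 2 / sigma_2)


same = locality_weight([1.0, 2.0], [2.0, 4.0])        # parallel vectors
orthogonal = locality_weight([1.0, 0.0], [0.0, 1.0])  # right angle
print(same, orthogonal)
```

The default `sigma_2` therefore fixes the scale of the kernel: a perturbation whose vector is at a right angle to the original text counts one tenth as much as an identical one.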

In [235]:
def parameters(split, which_class, M, N, K):
    '''Return the parameters for LIME optimization.'''
    # Perturbations
    z_line = []
    # Probabilities
    f_z = []
    # Weights
    pi_x = []
    
    for i in range(N):
        # Choose a random number of words to remove, between 1 and (M - 1)
        n = np.random.choice(range(1, M))
        # Remove n random words
        indices = np.random.choice(range(M), size=n, replace=False)
        
        # The perturbation: 1 keeps the word, 0 removes it
        perturbation = np.ones(M)
        for index in indices:
            perturbation[index] = 0
        z_line.append(perturbation)
            
        # The probability and weight
        text = ' '.join([word for (j, word) in enumerate(split) if perturbation[j]])
        f_z.append(f(text)[which_class])
        # Compare with the text being explained, not with the global `example`
        pi_x.append(pi(' '.join(split), text))
    
    return z_line, f_z, pi_x
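
The sampling step can be tried in isolation, without the classifier. The sketch below is an illustration only: `sample_perturbations` is a hypothetical helper, it uses the standard `random` module instead of NumPy, and it assumes the text has at least two words so that between 1 and `len(words) - 1` words can be removed.

```python
import random


def sample_perturbations(words, n_samples, seed=0):
    """Return binary masks and the corresponding perturbed texts.

    Hypothetical helper: each mask keeps (1) or removes (0) a word, and
    between 1 and len(words) - 1 words are removed per sample.
    """
    rng = random.Random(seed)
    m = len(words)
    masks, texts = [], []
    for _ in range(n_samples):
        n_removed = rng.randint(1, m - 1)  # both bounds are inclusive
        removed = set(rng.sample(range(m), n_removed))
        mask = [0 if j in removed else 1 for j in range(m)]
        masks.append(mask)
        texts.append(' '.join(w for j, w in enumerate(words) if mask[j]))
    return masks, texts


words = 'it is a bad movie'.split()
masks, texts = sample_perturbations(words, n_samples=3)
for mask, text in zip(masks, texts):
    print(mask, repr(text))
```

Each mask plays the role of one row of `z_line`; in the notebook the perturbed text is then scored with `f` and weighted with `pi`.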

### Linear optimization

In [260]:
def optimization(z_line, f_z, pi_x, M, N, K, lambda_=0.5):
    '''The MILP for LIME (a pure LP while the integer variables y stay commented out).'''
    prob = LpProblem("LIME", LpMinimize)
    
    # Variables
    L = LpVariable('L')
    Omega = LpVariable('Omega')
    epsilon = [LpVariable('epsilon_{}'.format(i)) for i in range(N)]
    delta = [LpVariable('delta_{}'.format(i)) for i in range(M)]
    g = [LpVariable("g(z'_{})".format(i)) for i in range(N)]
    x = [LpVariable('x_{}'.format(j)) for j in range(M)]
#     y = [LpVariable('y_{}'.format(j), 0, 1, cat='Integer') for j in range(M)]
    
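    # Objective: weighted loss plus regularization
    prob += L + lambda_*Omega

    # Constraints
    # L is the locality-weighted sum of the absolute errors
    prob += L == sum([pi_x[i]*epsilon[i] for i in range(N)])
    # Omega bounds the L1 norm of the coefficients
    prob += Omega == sum(delta)

    for i in range(N):
        # epsilon[i] >= |f_z[i] - g[i]|
        prob += -epsilon[i] <= f_z[i] - g[i]
        prob += epsilon[i] >= f_z[i] - g[i]
        # g is linear in the binary perturbation
        prob += g[i] == sum([z_line[i][j]*x[j] for j in range(M)])

    # Big-M constant, used only by the commented-out constraints below
    infinity = 100000
    for j in range(M):
        # delta[j] >= |x[j]|
        prob += -delta[j] <= x[j]
        prob += delta[j] >= x[j]

#         prob += -infinity*y[j] <= x[j]
#         prob += infinity*y[j] >= x[j]

#     prob += sum(y) <= K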

    print('Solving MILP...')
    status = prob.solve()
    print('Done.')
    
    return prob, status, x
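
Written out, the program built by `optimization` appears to be the following linear program. This is a transcription of the code above, leaving out the commented-out integer variables `y`. The symbols $\pi_i$, $f_i$ and $z'_{ij}$ are chosen here for illustration and stand for `pi_x[i]`, `f_z[i]` and `z_line[i][j]`.

```latex
\begin{aligned}
\min_{x,\,\epsilon,\,\delta} \quad
  & L + \lambda\,\Omega \\
\text{s.t.} \quad
  & L = \sum_{i=0}^{N-1} \pi_i\,\epsilon_i, \qquad
    \Omega = \sum_{j=0}^{M-1} \delta_j, \\
  & g_i = \sum_{j=0}^{M-1} z'_{ij}\,x_j,
    && i = 0, \dots, N-1, \\
  & -\epsilon_i \le f_i - g_i \le \epsilon_i,
    && i = 0, \dots, N-1, \\
  & -\delta_j \le x_j \le \delta_j,
    && j = 0, \dots, M-1.
\end{aligned}
```

Because the objective is minimized, $\delta_j = |x_j|$ at an optimum when $\lambda > 0$, and $\epsilon_i = |f_i - g_i|$ whenever $\pi_i > 0$. Under those conditions the program minimizes a locality-weighted absolute error plus an L1 penalty on the coefficients.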

In [261]:
def visualize(split, importances):
    '''Visualize the importance of each word in the classification.'''
    # Fall back to 1 when every importance is zero, to avoid dividing by zero
    max_abs_importance = np.max(np.abs(importances)) or 1
    # Green
    positive = np.array([0, 255, 0])
    white = np.array([255, 255, 255])
    # Red
    negative = np.array([255, 0, 0])
    spans = []
    for i, word in enumerate(split):
        if importances[i] >= 0:
            color = white + (positive - white)/max_abs_importance*importances[i]
        else:
            color = white + (negative - white)/((-1)*max_abs_importance)*importances[i]
        spans.append(
            '<span style="background-color: RGB({R}, {G}, {B})">{word}</span>'.format(
                word=word,
                R=color[0],
                G=color[1],
                B=color[2]
            )
        )
            
    html = ' '.join(spans)
    return HTML(html)
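
The color interpolation used in `visualize` can be checked on its own. The following is a minimal sketch, not the notebook's code: `importance_to_rgb` is a hypothetical helper that returns integer channels and treats an all-zero scale as white.

```python
def importance_to_rgb(importance, max_abs_importance):
    """Map an importance to an (R, G, B) tuple of integers.

    Hypothetical helper: positive values fade from white to green,
    negative values from white to red.
    """
    white = (255, 255, 255)
    target = (0, 255, 0) if importance >= 0 else (255, 0, 0)
    # Fraction of the way from white to the target color, in [0, 1]
    if max_abs_importance == 0:
        t = 0.0
    else:
        t = min(abs(importance) / max_abs_importance, 1.0)
    return tuple(round(w + (c - w) * t) for w, c in zip(white, target))


print(importance_to_rgb(0.5, 0.5))    # strongest positive word: full green
print(importance_to_rgb(0.0, 0.5))    # neutral word: white
print(importance_to_rgb(-0.25, 0.5))  # halfway between white and red
```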

In [262]:
def lime(text, which_class, N=None, K=None):
    '''Explain the probability of `which_class` for `text`, word by word.'''
    # Split the text into words
    split = text.split()
    M = len(split)
    if N is None:
        N = 5*M
#     if K is None:
#         K = min([M, 20])
                
    z_line, f_z, pi_x = parameters(split, which_class, M, N, K)
        
    prob, status, x = optimization(z_line, f_z, pi_x, M, N, K)
    
    importances = [value(i) for i in x]
    # Note: a word that occurs more than once keeps only its last importance in this dict
    print(dict(zip(split, importances)))
    
    return visualize(split, importances)

### Examples

In [263]:
example = 'This movie is awful, I regret seing it, it is a bad movie.'
example

'This movie is awful, I regret seing it, it is a bad movie.'

In [264]:
model.predict([example])

array(['negative'], dtype='<U8')

In [265]:
model.predict_proba([example])

array([[0.99280004, 0.00719996]])

In [268]:
lime(example, 0)

Solving MILP...
Done.
{'This': 0.0, 'movie': 0.013286572, 'is': 0.10326589, 'awful,': 0.14443269, 'I': 0.039677759, 'regret': 0.037821996, 'seing': 0.1615384, 'it,': 0.044888046, 'it': 0.079345289, 'a': 0.05742599, 'bad': 0.37240251, 'movie.': 0.12055832}
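
The printed dictionary is easier to read once sorted. The sketch below copies the values from the output above and ranks the words by absolute importance; the variable names are chosen here for illustration. Note that "is" occurs twice in the sentence but only once in the dictionary, since repeated keys overwrite each other.

```python
# Importances copied from the output above
importances = {
    'This': 0.0, 'movie': 0.013286572, 'is': 0.10326589,
    'awful,': 0.14443269, 'I': 0.039677759, 'regret': 0.037821996,
    'seing': 0.1615384, 'it,': 0.044888046, 'it': 0.079345289,
    'a': 0.05742599, 'bad': 0.37240251, 'movie.': 0.12055832,
}

# Sort words by absolute importance, largest first
ranking = sorted(importances.items(), key=lambda kv: abs(kv[1]), reverse=True)

for word, weight in ranking[:3]:
    print(f'{word:>8} {weight:.3f}')
```

For this run the three largest coefficients belong to "bad", "seing" and "awful,". The perturbations are sampled at random without a fixed seed, so another run may give different values.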


## References

- https://arxiv.org/pdf/1602.04938.pdf
- https://www.kaggle.com/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews
- https://vanderbei.princeton.edu/tex/talks/MOPTA14/L1_reg.pdf