# Understand predictions with local contrastive explanations

## Imports and configuration

In [1]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

In [2]:
import warnings
warnings.simplefilter('ignore')

In [18]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from joblib import load
from alibi.utils.data import gen_category_map
from alibi.explainers import AnchorTabular, CounterFactualProto

## Dataset and classifier

### Load dataset

In [4]:
data = pd.read_feather("data/heloc/heloc_preprocessed.feather")
data.head(10).transpose()

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| RiskPerformance | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 |
| ExternalRiskEstimate | 55.0 | 61.0 | 67.0 | 66.0 | 81.0 | 59.0 | 54.0 | 68.0 | 59.0 | 61.0 |
| MSinceOldestTradeOpen | 144.0 | 58.0 | 66.0 | 169.0 | 333.0 | 137.0 | 88.0 | 148.0 | 324.0 | 79.0 |
| MSinceMostRecentTradeOpen | 4.0 | 15.0 | 5.0 | 1.0 | 27.0 | 11.0 | 7.0 | 7.0 | 2.0 | 4.0 |
| AverageMInFile | 84.0 | 41.0 | 24.0 | 73.0 | 132.0 | 78.0 | 37.0 | 65.0 | 138.0 | 36.0 |
| NumSatisfactoryTrades | 20.0 | 2.0 | 9.0 | 28.0 | 12.0 | 31.0 | 25.0 | 17.0 | 24.0 | 19.0 |
| NumTrades60Ever2DerogPubRec | 3.0 | 4.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| NumTrades90Ever2DerogPubRec | 0.0 | 4.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| PercentTradesNeverDelq | 83.0 | 100.0 | 100.0 | 93.0 | 100.0 | 91.0 | 92.0 | 83.0 | 85.0 | 95.0 |
| MSinceMostRecentDelq | 2.0 | 15.0 | 15.0 | 76.0 | 15.0 | 1.0 | 9.0 | 31.0 | 5.0 | 5.0 |


In [5]:
X = data.drop(columns=["RiskPerformance"])
y = data.RiskPerformance

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=47)
print(X_train.shape)
print(X_test.shape)

(8883, 35)
(988, 35)


In [6]:
pd.concat([y_test, X_test], axis=1).head(10).transpose()

| | 9622 | 4927 | 3011 | 7045 | 5895 | 3993 | 9550 | 5376 | 9868 | 2622 |
|---|---|---|---|---|---|---|---|---|---|---|
| RiskPerformance | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 |
| ExternalRiskEstimate | 80.0 | 78.0 | 78.0 | 81.0 | 66.0 | 75.0 | 58.0 | 63.0 | 74.0 | 86.0 |
| MSinceOldestTradeOpen | 330.0 | 208.0 | 175.0 | 156.0 | 161.0 | 156.0 | 123.0 | 164.0 | 129.0 | 149.0 |
| MSinceMostRecentTradeOpen | 28.0 | 13.0 | 6.0 | 13.0 | 7.0 | 4.0 | 3.0 | 12.0 | 6.0 | 3.0 |
| AverageMInFile | 134.0 | 112.0 | 75.0 | 71.0 | 77.0 | 64.0 | 51.0 | 51.0 | 64.0 | 68.0 |
| NumSatisfactoryTrades | 11.0 | 16.0 | 15.0 | 20.0 | 9.0 | 23.0 | 22.0 | 10.0 | 18.0 | 31.0 |
| NumTrades60Ever2DerogPubRec | 0.0 | 0.0 | 0.0 | 0.0 | 4.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 |
| NumTrades90Ever2DerogPubRec | 0.0 | 0.0 | 0.0 | 0.0 | 4.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| PercentTradesNeverDelq | 100.0 | 94.0 | 100.0 | 100.0 | 90.0 | 100.0 | 83.0 | 82.0 | 100.0 | 100.0 |
| MSinceMostRecentDelq | 15.0 | 3.0 | 15.0 | 15.0 | 79.0 | 15.0 | 2.0 | 24.0 | 15.0 | 15.0 |


### Load pre-trained LGB classifier

In [7]:
lgb = load("models/heloc_lgb_skl.joblib")
lgb

LGBMClassifier(boosting_type='gbdt', class_weight=None, colsample_bytree=1.0,
               feature_fraction=0.7, importance_type='split', learning_rate=0.1,
               max_depth=5, min_child_samples=20, min_child_weight=0.001,
               min_split_gain=0.0, n_estimators=100, n_jobs=-1, num_leaves=31,
               objective='binary', random_state=None, reg_alpha=0.0,
               reg_lambda=0.0, silent=True, subsample=1.0,
               subsample_for_bin=200000, subsample_freq=0)

In [8]:
roc_auc_score(y_test, lgb.predict_proba(X_test)[:, 1])

0.8217571712069277

In [9]:
lgb.predict(X_test[:10])

array([0, 0, 0, 0, 1, 1, 1, 1, 0, 0])

## Contrastive explanations using the Alibi package

**Intuition:** 
- Rejected homeowners want to understand why they did not qualify for a line of credit
- Natural question: what changes to the application would lead to approval?
- Qualified homeowners might also want to know what led to their approval

Research shows that humans prefer contrastive explanations, i.e., explanations that are made in response to counterfactual cases (often called foils) ([Miller, 2019](https://arxiv.org/abs/1706.07269)).

Packages implementing contrastive explanations:
- AIX360: implements [Contrastive Explanation Method (CEM)](https://arxiv.org/abs/1802.07623), which is limited to differentiable classifiers with inputs normalized to the (-0.5, 0.5) range
- Alibi: implements [Contrastive Explanation Method (CEM)](https://arxiv.org/abs/1802.07623), [Counterfactual explanations](https://arxiv.org/abs/1711.00399) and [Anchor explanations](https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/viewPaper/16982)

For our examples we will use the Alibi package as it implements more algorithms and is more adaptable.

### Anchor explanations

Anchors can be seen as the complement of counterfactuals. Instead of asking which changes would flip a prediction, they ask which feature conditions are sufficient to anchor it, i.e., as long as these conditions hold, changing the other features does not (with high probability) change the prediction.

**Description:** "The algorithm provides model-agnostic (black box) and human interpretable explanations suitable for classification models applied to images, text and tabular data. The idea behind anchors is to explain the behaviour of complex models with high-precision rules called anchors. These anchors are locally sufficient conditions to ensure a certain prediction with a high degree of confidence. [...] Anchors address a key shortcoming of local explanation methods like LIME which proxy the local behaviour of the model in a linear way. It is however unclear to what extent the explanation holds up in the region around the instance to be explained, since both the model and data can exhibit non-linear behaviour in the neighborhood of the instance. This approach can easily lead to overconfidence in the explanation and misleading conclusions on unseen but similar instances. The anchor algorithm tackles this issue by incorporating coverage, the region where the explanation applies, into the optimization problem. [...] As highlighted by the above example, an anchor explanation consists of if-then rules, called the anchors, which sufficiently guarantee the explanation locally and try to maximize the area for which the explanation holds. This means that as long as the anchor holds, the prediction should remain the same regardless of the values of the features not present in the anchor." ([Source](https://docs.seldon.io/projects/alibi/en/latest/methods/Anchors.html))

In [10]:
idx = 4

x = X_test.values[idx].reshape(1,-1)
x.shape

(1, 35)

In [11]:
category_map = gen_category_map(data)
category_map

{}

In [12]:
predict_fn = lambda x: lgb.predict(x)
anch_exp = AnchorTabular(predict_fn, X_train.columns, categorical_names=category_map)

In [13]:
anch_exp.fit(X_train.values, disc_perc=[25, 50, 75])

AnchorTabular(meta={
    'name': 'AnchorTabular',
    'type': ['blackbox'],
    'explanations': ['local'],
    'params': {'seed': None, 'disc_perc': [25, 50, 75]}
})
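
The `disc_perc=[25, 50, 75]` argument asks for numerical features to be discretized at their quartiles, which is why the anchors reported later take the form of threshold rules. The sketch below only illustrates percentile binning with NumPy on a synthetic feature; it is an assumption about the general idea, not Alibi's internal implementation, and the `values` array is made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical numerical feature (synthetic, not taken from the HELOC data).
values = rng.integers(33, 95, size=1000).astype(float)

# Bin edges at the 25th, 50th and 75th percentiles, mirroring disc_perc=[25, 50, 75].
edges = np.percentile(values, [25, 50, 75])

# right=True assigns bin k to values with edges[k-1] < v <= edges[k].
bins = np.digitize(values, edges, right=True)

# Human-readable rule for each bin, similar in shape to the anchor conditions.
labels = (
    [f"feature <= {edges[0]:.2f}"]
    + [f"{lo:.2f} < feature <= {hi:.2f}" for lo, hi in zip(edges[:-1], edges[1:])]
    + [f"feature > {edges[-1]:.2f}"]
)
for k, label in enumerate(labels):
    print(f"bin {k}: {label} ({np.mean(bins == k):.0%} of rows)")
```

Three percentiles yield four bins per feature, so each anchor condition refers to one of these quartile ranges or a union of adjacent ones.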

In [14]:
class_names = ['Good risk performance', 'Bad risk performance']
pred = class_names[predict_fn(x)[0]]
exp = anch_exp.explain(x)

In [15]:
def print_explanation(exp, pred):
    print('Prediction: %s' % pred)
    print('Anchor: %s' % (' AND '.join(exp.anchor)))
    print('Precision: %.2f' % exp.precision)
    print('Coverage: %.2f' % exp.coverage)

In [16]:
print_explanation(exp, pred)

Prediction: Bad risk performance
Anchor: ExternalRiskEstimate <= 72.00 AND NetFractionRevolvingBurden > 56.00 AND NumSatisfactoryTrades <= 13.00
Precision: 0.99
Coverage: 0.53


The precision threshold is 95% by default: an anchor is accepted only if predictions on instances where the anchor holds match the prediction for the explained instance at least 95% of the time.
Precision is the fraction of sampled instances satisfying the anchor for which the model returns the same prediction as for the original instance; a valid anchor has precision >= threshold.
Coverage is the share of a sampled part of the training set to which the anchor applies, i.e., the probability that the anchor holds for an instance drawn from the training data.
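
To make the two metrics concrete, here is a minimal sketch that estimates precision and coverage of a candidate rule by filtering an existing sample. This is a simplification: Alibi estimates precision from perturbed samples rather than by filtering a fixed data set. The toy classifier `toy_predict`, the data `X_toy` and the rule are hypothetical and unrelated to the HELOC model.

```python
import numpy as np

def anchor_precision_coverage(predict_fn, X, x, rule_mask_fn):
    """Estimate precision and coverage of a candidate anchor on a sample X."""
    covered = rule_mask_fn(X)                 # rows where the anchor holds
    coverage = covered.mean()                 # share of rows the anchor applies to
    target = predict_fn(x.reshape(1, -1))[0]  # prediction for the explained instance
    if not covered.any():
        return float("nan"), coverage
    # Share of covered rows that receive the same prediction as x.
    precision = (predict_fn(X[covered]) == target).mean()
    return precision, coverage

# Hypothetical toy setup: three features uniformly distributed on [0, 100].
rng = np.random.default_rng(0)
X_toy = rng.uniform(0, 100, size=(5000, 3))

def toy_predict(X):
    return (X[:, 0] + 0.2 * X[:, 1] > 60).astype(int)

x_toy = np.array([80.0, 50.0, 10.0])

def rule(X):
    # Candidate anchor: "feature 0 > 72".
    return X[:, 0] > 72

precision, coverage = anchor_precision_coverage(toy_predict, X_toy, x_toy, rule)
print(f"Precision: {precision:.2f}, Coverage: {coverage:.2f}")
```

In this toy setup the rule fully determines the prediction, so its precision is 1; a looser rule such as "feature 0 > 55" would cover more rows at the cost of lower precision, which is the trade-off the anchor search navigates.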

In [17]:
idx = 0

x = X_test.values[idx].reshape(1,-1)
pred = class_names[predict_fn(x)[0]]
exp = anch_exp.explain(x)
print_explanation(exp, pred)

Prediction: Good risk performance
Anchor: ExternalRiskEstimate > 72.00 AND MSinceMostRecentInqexcl7days > 1.00 AND PercentTradesNeverDelq > 97.00 AND AverageMInFile > 76.00
Precision: 0.98
Coverage: 0.47


### Counterfactual explanations

**Description:**

"We can reason that the most basic requirements for a counterfactual 𝑋′ are as follows:
- The predicted class of 𝑋′ is different from the predicted class of 𝑋
- The difference between 𝑋 and 𝑋′ should be human-interpretable."
([Source](https://docs.seldon.io/projects/alibi/en/stable/methods/CF.html))
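
The two requirements can be illustrated with a naive brute-force search that changes a single feature at a time and keeps the smallest range-normalized change that flips the prediction. This is only a sketch for intuition and not the prototype-guided method used further down; `single_feature_counterfactual`, the toy classifier and the feature ranges are hypothetical.

```python
import numpy as np

def single_feature_counterfactual(predict_fn, x, feature_min, feature_max, n_steps=51):
    """Smallest single-feature change (normalized by feature range) that flips the class."""
    original = predict_fn(x.reshape(1, -1))[0]
    span = np.where(feature_max > feature_min, feature_max - feature_min, 1.0)
    best, best_dist = None, np.inf
    for j in range(x.shape[0]):
        for value in np.linspace(feature_min[j], feature_max[j], n_steps):
            dist = abs(value - x[j]) / span[j]
            if dist >= best_dist:
                continue  # cannot improve on the best candidate found so far
            candidate = x.copy()
            candidate[j] = value
            if predict_fn(candidate.reshape(1, -1))[0] != original:
                best, best_dist = candidate, dist
    return best, best_dist

# Hypothetical toy classifier on three features in [0, 100].
def toy_predict(X):
    return (X[:, 0] + 0.2 * X[:, 1] > 60).astype(int)

x_toy = np.array([50.0, 25.0, 10.0])
cf_toy, dist = single_feature_counterfactual(
    toy_predict, x_toy, feature_min=np.zeros(3), feature_max=np.full(3, 100.0)
)
print("original:      ", x_toy, "->", toy_predict(x_toy.reshape(1, -1))[0])
print("counterfactual:", cf_toy, "->", toy_predict(cf_toy.reshape(1, -1))[0])
```

Restricting the search to one feature keeps the difference between the instance and its counterfactual easy to read, but it can fail to find a counterfactual when several features must change together.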

Here, we will test the method proposed in [this paper](https://arxiv.org/abs/1907.02584), as described by the authors of the Alibi package [here](https://docs.seldon.io/projects/alibi/en/latest/methods/CFProto.html).

In [23]:
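# Treat the last 12 columns as categorical variables with 2 categories each
# (maps column index -> number of categories).
cat_vars = {}
for i in range(len(X_test.columns)-12, len(X_test.columns)):
    cat_vars[i] = 2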
cat_vars

{23: 2,
 24: 2,
 25: 2,
 26: 2,
 27: 2,
 28: 2,
 29: 2,
 30: 2,
 31: 2,
 32: 2,
 33: 2,
 34: 2}

In [29]:
predict_proba_fn = lambda x: lgb.predict_proba(x)

In [50]:
cf = CounterFactualProto(
    predict_proba_fn,
    x.shape,
    feature_range=((X_train.min().values.reshape((1,-1)), X_train.max().values.reshape((1,-1)))),
    cat_vars=cat_vars,
    ohe=False,
)

In [51]:
cf.fit(X_train.values)

CounterFactualProto(meta={
    'name': 'CounterFactualProto',
    'type': ['blackbox', 'tensorflow', 'keras'],
    'explanations': ['local'],
    'params': {
        'kappa': 0.0,
        'beta': 0.1,
        'feature_range': (
            array([[33.,  2.,  0.,  4.,  0.,  0.,  0.,  0.,  0.,  0.,  2.,  0.,  0.,
         0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]]),
            array([[ 94., 803., 383., 383.,  79.,  19.,  19., 100.,  83.,   9.,   8.,
        104.,  19., 100.,  24.,  66.,  66., 232., 471.,  32.,  23.,  18.,
        100.,   1.,   1.,   1.,   1.,   1.,   1.,   1.,   1.,   1.,   1.,
          1.,   1.]])
        ),
        'gamma': 0.0,
        'theta': 0.0,
        'cat_vars': {
            23: 2,
            24: 2,
            25: 2,
            26: 2,
            27: 2,
            28: 2,
            29: 2,
            30: 2,
            31: 2,
            32: 2,
            33: 2,
            34: 

In [61]:
idx = 4
x = X_test.values[idx].reshape(1,-1)
exp = cf.explain(x)
exp

No counterfactual found!


Explanation(meta={
    'name': 'CounterFactualProto',
    'type': ['blackbox', 'tensorflow', 'keras'],
    'explanations': ['local'],
    'params': {
        'kappa': 0.0,
        'beta': 0.1,
        'feature_range': (
            array([[33.,  2.,  0.,  4.,  0.,  0.,  0.,  0.,  0.,  0.,  2.,  0.,  0.,
         0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]]),
            array([[ 94., 803., 383., 383.,  79.,  19.,  19., 100.,  83.,   9.,   8.,
        104.,  19., 100.,  24.,  66.,  66., 232., 471.,  32.,  23.,  18.,
        100.,   1.,   1.,   1.,   1.,   1.,   1.,   1.,   1.,   1.,   1.,
          1.,   1.]])
        ),
        'gamma': 0.0,
        'theta': 0.0,
        'cat_vars': {
            23: 2,
            24: 2,
            25: 2,
            26: 2,
            27: 2,
            28: 2,
            29: 2,
            30: 2,
            31: 2,
            32: 2,
            33: 2,
            34: 2
      

### Contrastive Explanation Method

**Description:** "CEM generates instance based local black box explanations for classification models in terms of Pertinent Positives (PP) and Pertinent Negatives (PN). For a PP, the method finds the features that should be minimally and sufficiently present (e.g. important pixels in an image) to predict the same class as on the original instance. PN’s on the other hand identify what features should be minimally and necessarily absent from the instance to be explained in order to maintain the original prediction class. The aim of PN’s is not to provide a full set of characteristics that should be absent in the explained instance, but to provide a minimal set that differentiates it from the closest different class. Intuitively, the Pertinent Positives could be compared to Anchors while Pertinent Negatives are similar to Counterfactuals. [...] The current implementation is most suitable for images and tabular data without categorical features. In order to create interpretable PP’s and PN’s, feature-wise perturbation needs to be done in a meaningful way." ([Source](https://docs.seldon.io/projects/alibi/en/latest/methods/CEM.html))
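
For intuition only, pertinent positives and negatives can be sketched by brute force on a small binary feature vector: a PP is a smallest subset of the active features that preserves the prediction on its own, and a PN is a smallest set of inactive features that changes the prediction once switched on. CEM itself frames this as an optimization problem rather than an exhaustive search; the linear toy classifier and both helper functions below are hypothetical.

```python
from itertools import combinations
import numpy as np

def pertinent_positive(predict_fn, x):
    """Smallest subset of active features that alone yields the original class."""
    original = predict_fn(x.reshape(1, -1))[0]
    active = np.flatnonzero(x)
    for size in range(1, len(active) + 1):
        for keep in combinations(active, size):
            candidate = np.zeros_like(x)
            candidate[list(keep)] = x[list(keep)]  # keep only the selected features
            if predict_fn(candidate.reshape(1, -1))[0] == original:
                return [int(i) for i in keep]
    return None

def pertinent_negative(predict_fn, x):
    """Smallest set of inactive features whose presence changes the class."""
    original = predict_fn(x.reshape(1, -1))[0]
    inactive = np.flatnonzero(x == 0)
    for size in range(1, len(inactive) + 1):
        for add in combinations(inactive, size):
            candidate = x.copy()
            candidate[list(add)] = 1  # switch the selected features on
            if predict_fn(candidate.reshape(1, -1))[0] != original:
                return [int(i) for i in add]
    return None

# Hypothetical linear classifier on five binary features.
weights = np.array([2.0, 1.0, 1.0, -3.0, -1.0])

def toy_predict(X):
    return (X @ weights - 2.5 > 0).astype(int)

x_toy = np.array([1.0, 1.0, 1.0, 0.0, 0.0])
pp = pertinent_positive(toy_predict, x_toy)
pn = pertinent_negative(toy_predict, x_toy)
print("Pertinent positive (feature indices):", pp)
print("Pertinent negative (feature indices):", pn)
```

The exhaustive search is exponential in the number of features, so it is only workable for toy inputs like this one.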

## Conclusion

**Results of experimentation:**
- Counterfactual and other contrastive explanation methods are still rather finicky to use: in the experiment above, no counterfactual was found for the explained instance. Thus, I will cover these methods in a separate notebook.
- Anchors, however, seem to work for a variety of models and support numerical as well as categorical variables. Thus, they can be useful tools for interpreting black-box models.