# Compare K-Lime and Shapley Reason Codes

The goal of this notebook is to compare the reason codes created by the two methods provided by Driverless AI:

* [Shapley](http://docs.h2o.ai/driverless-ai/latest-stable/docs/userguide/interpreting.html#shapley-plot)
* [K-Lime](http://docs.h2o.ai/driverless-ai/latest-stable/docs/userguide/interpreting.html#k-lime-and-lime-sup)

Shapley uses the final Driverless AI model to create reason codes, whereas K-Lime uses surrogate GLMs. Because Shapley reason codes come from the model that actually makes the predictions, they tend to be more accurate. The K-Lime surrogate models are trained on the original data (no feature engineering), which generally makes K-Lime reason codes easier to understand.
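To make the distinction concrete, the pure-Python sketch below computes both kinds of reason code for a hypothetical three-feature linear model. It is an illustration, not Driverless AI's implementation: the K-Lime-style codes are taken as coefficient times feature value (a common convention for linear surrogates), and the Shapley values are computed exactly by enumerating feature coalitions, with features outside a coalition set to an assumed baseline row. `exact_shapley`, `linear_model`, and all numbers are made up.

```python
from itertools import combinations
from math import factorial


def exact_shapley(f, x, baseline):
    """Exact Shapley values of f at x.

    A feature that is "absent" from a coalition takes its baseline value,
    so the values sum to f(x) - f(baseline).
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for j in range(n):
        others = [i for i in range(n) if i != j]
        total = 0.0
        for size in range(n):  # coalitions of 0..n-1 other features
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {j}) - value(s))
        phi.append(total)
    return phi


# Hypothetical linear model standing in for a surrogate GLM.
intercept = 0.1
beta = [0.5, -0.2, 0.05]


def linear_model(z):
    return intercept + sum(b * v for b, v in zip(beta, z))


x = [2.0, 1.0, 4.0]         # row to explain
baseline = [1.0, 1.0, 2.0]  # assumed reference row, e.g. training-data means

# K-Lime-style reason codes: coefficient times feature value, intercept kept separate.
klime_style = [b * v for b, v in zip(beta, x)]

# Shapley values of the same model relative to the baseline row.
shapley = exact_shapley(linear_model, x, baseline)

print("K-Lime-style:", klime_style)
print("Shapley:     ", [round(v, 6) for v in shapley])
```

For a linear model, each Shapley value reduces to `beta_j * (x_j - baseline_j)` and the Shapley values sum to `f(x) - f(baseline)`, whereas the K-Lime-style codes plus the intercept sum to `f(x)`. Exact enumeration grows exponentially with the number of features, which is why tree libraries such as XGBoost and LightGBM compute their contributions with a specialized tree algorithm instead.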

In this notebook, we will use the MLI scoring pipeline to compare the reason codes from K-Lime and Shapley.  



## Install Scoring Libraries

The steps below describe how to set up the Python environment to use the MLI scoring pipeline functions:

1. On the MLI page, click the Scoring Pipeline button.  
![mli-scoring-pipeline](images/scoring_pipeline_mli.png)
    
    
2. Unzip the scoring pipeline and run the bash script `run_example_shapley.sh` in the **scoring-pipeline-mli** folder (this requires Linux x86_64 and Python 3.6). The script sets up a Python environment with all the necessary libraries.
    
    `$ bash run_example_shapley.sh`


3. Activate the Python environment (it must be sourced so that it applies to your current shell):

    `$ source mli_dai_env/bin/activate`
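After activation, a quick check from inside Python can confirm that the interpreter is running in the environment and that the scoring modules are importable. This is a minimal sketch assuming `mli_dai_env` is a standard virtualenv; `in_virtualenv` and `missing_modules` are hypothetical helpers, not part of the scoring pipeline, and the experiment-specific module names are the ones imported later in this notebook.

```python
import importlib.util
import sys


def in_virtualenv():
    """True inside a venv/virtualenv; older virtualenv releases set sys.real_prefix instead."""
    return hasattr(sys, "real_prefix") or sys.prefix != sys.base_prefix


def missing_modules(names):
    """Return the top-level module names that cannot be found on the current sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Module names as used in this notebook; yours depend on the experiment.
required = [
    "pandas",
    "datatable",
    "scoring_h2oai_experiment_ticipucu",
    "scoring_mli_experiment_vamedona",
]
print("virtual environment active:", in_virtualenv())
print("missing modules:", missing_modules(required))
```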

## Define Scorers

* `scorer`: predictions and Shapley reason codes from the final Driverless AI model
* `mli_scorer`: predictions and reason codes from the surrogate models (K-Lime)

See `example_shapley.py` for more examples.

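```python
import pandas as pd
import numpy as np

# Module names are specific to the experiment and MLI run; adjust them to match your scoring pipelines.
from scoring_h2oai_experiment_ticipucu import Scorer
from scoring_mli_experiment_vamedona import KLimeScorer
```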

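```python
scorer = Scorer()           # Driverless AI model: predictions and Shapley reason codes
mli_scorer = KLimeScorer()  # K-Lime surrogate models: predictions and reason codes
```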


## Shapley Reason Codes

We can now compute the Shapley reason codes for our training data.

```python
import datatable as dt

df = dt.fread("/home/megan/data/CreditCard/CreditCard-train.csv")
```

```python
df
```

[315m   [m  [315mID[m  [315mLIMIT_BAL[m  [315mSEX[m  [315mEDUCATION[m  [315mMARRIAGE[m  [315mAGE[m  [315mPAY_0[m  [315mPAY_2[m  [315mPAY_3[m  [315mPAY_4[m[K
[38m---  --  ---------  ---  ---------  --------  ---  -----  -----  -----  -----[m[K
[38m 0 [m   1      20000    2          2         1   24     -2      2     -1     -1[K
[38m 1 [m   2     120000    2          2         2   26     -1      2      0      0[K
[38m 2 [m   3      90000    2          2         2   34      0      0      0      0[K
[38m 3 [m   4      50000    2          2         1   37      1      0      0      0[K
[38m 4 [m   5      50000    1          2         1   57      2      0     -1      0[K
[38m 5 [m   6      50000    1          1         2   37      3      0      0      0[K
[38m 6 [m   7     500000    1          1         2   29      4      0      0      0[K
[38m 7 [m   8     100000    2          2         2   23      5     -1     -1      0[K
[38m 8 [m   9    



```python
# pred_contribs=True returns per-feature Shapley contributions rather than predictions
shapley_reason_codes = scorer.score_batch(df, pred_contribs=True)
```



```python
shapley_reason_codes.head()
```

| | contrib_0_AGE | contrib_1_BILL_AMT1 | contrib_2_BILL_AMT2 | contrib_3_BILL_AMT3 | contrib_4_BILL_AMT4 | contrib_5_BILL_AMT5 | contrib_6_BILL_AMT6 | contrib_7_EDUCATION | contrib_8_LIMIT_BAL | contrib_9_MARRIAGE | ... | contrib_16_PAY_AMT1 | contrib_17_PAY_AMT2 | contrib_18_PAY_AMT3 | contrib_19_PAY_AMT4 | contrib_20_PAY_AMT5 | contrib_21_PAY_AMT6 | contrib_22_NumToCatTE:PAY_0.0 | contrib_23_InteractionSub:BILL_AMT1:PAY_AMT4 | contrib_24_InteractionAdd:BILL_AMT3:PAY_0 | contrib_bias |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.003827 | 0.064326 | -0.001409 | -0.027206 | -0.015489 | 0.030295 | -0.021828 | -0.004129 | 0.229688 | 0.035663 | ... | 0.126735 | 0.088832 | 0.125612 | 0.081761 | 0.048825 | 0.056853 | -0.026604 | 0.061895 | -0.00965 | -1.323698 |
| 1 | -0.022784 | 0.081215 | 6.3e-05 | -0.001357 | -0.005111 | -0.008282 | -0.009543 | -0.003818 | -0.002181 | -0.066623 | ... | 0.135186 | 0.092605 | -0.05589 | -0.020878 | 0.043662 | -0.000865 | -0.025025 | 0.046718 | 0.00446 | -1.323698 |
| 2 | -0.012614 | -0.122534 | -0.032981 | -0.0279 | -0.010743 | -0.014331 | -0.010627 | 0.00165 | 0.070014 | -0.069845 | ... | 0.038845 | 0.09804 | -0.053014 | -0.022254 | -0.005199 | -0.028483 | -0.146737 | -0.041383 | -0.020533 | -1.323698 |
| 3 | 0.005847 | -0.085735 | -0.003807 | 0.012369 | 0.000348 | -0.019194 | -0.007608 | -0.001876 | 0.173825 | 0.0427 | ... | 0.031877 | -0.021773 | -0.053012 | -0.025899 | -0.00995 | -0.022517 | 0.107479 | -0.009747 | 0.023296 | -1.323698 |
| 4 | 0.011418 | -0.005286 | -0.024563 | 0.007765 | -0.011938 | -0.028253 | -0.014825 | 0.0015 | 0.08557 | 0.045891 | ... | 0.046587 | -0.395097 | -0.062144 | -0.057014 | 0.014921 | 0.034986 | 0.784991 | -0.239 | 0.013613 | -1.323698 |


Notice that the reason codes are calculated not on the original credit card features but on the features used by the final Driverless AI model. Features such as `NumToCatTE:PAY_0.0` were created automatically by Driverless AI's feature engineering. When Driverless AI creates complex engineered features, the Shapley reason codes become harder to interpret.
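Shapley contributions are additive: for each row, the feature contributions plus `contrib_bias` reconstruct the model's raw prediction. The sketch below uses made-up values with the same column layout as the output above and assumes, as with XGBoost and LightGBM `pred_contribs`, that this raw prediction is on the log-odds scale, so a logistic transform recovers a probability. Comparing such a reconstruction against the scorer's own predictions on your data is a reasonable way to confirm which scale your model uses.

```python
import numpy as np
import pandas as pd

# Made-up contributions with the same column layout as score_batch(..., pred_contribs=True):
# one contrib_<i>_<feature> column per model feature plus contrib_bias.
contribs = pd.DataFrame({
    "contrib_0_AGE": [-0.01, 0.02],
    "contrib_10_PAY_0": [0.40, -0.30],
    "contrib_bias": [-1.3, -1.3],
})

# Assumption: contributions are on the log-odds (margin) scale, as with
# XGBoost/LightGBM pred_contribs for binary classification.
margin = contribs.sum(axis=1)                # feature contributions + bias = raw prediction
probability = 1.0 / (1.0 + np.exp(-margin))  # logistic link back to a probability

print(pd.DataFrame({"margin": margin, "probability": probability}).round(3))
```

Under the same assumption, the constant `contrib_bias` of -1.323698 in the output above corresponds to a baseline probability of roughly 0.21, that is 1 / (1 + e^1.323698).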

For the customers who did default, we examine which features contributed most, on average, to their predicted likelihood of default.

```python
# Attach the actual outcome so we can select the customers who defaulted
default = df.topandas()["default payment next month"]
shapley_reason_codes["default payment next month"] = default

default_shapley = shapley_reason_codes[shapley_reason_codes["default payment next month"] == 1]
# Keep only the per-feature contributions
default_shapley = default_shapley.drop(["contrib_bias", "default payment next month"], axis=1)
```

```python
# Average contribution of each feature among defaulters, largest first
avg_shapley_contrib = default_shapley.mean().to_frame(name="contribution")
avg_shapley_contrib.sort_values(by="contribution", ascending=False).head()
```

| | contribution |
|---|---|
| contrib_22_NumToCatTE:PAY_0.0 | 0.239425 |
| contrib_10_PAY_0 | 0.199009 |
| contrib_11_PAY_2 | 0.100192 |
| contrib_12_PAY_3 | 0.043169 |
| contrib_14_PAY_5 | 0.041533 |


On average among these customers, the engineered feature `NumToCatTE:PAY_0.0` provides the largest increase in the predicted likelihood of default.
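Averaging signed contributions answers "which features pushed these predictions up, on net?"; large positive and negative contributions for the same feature can cancel out. If the question is instead which features matter most in either direction, the mean absolute contribution is a common complement. A minimal sketch on made-up data; with the notebook's frame the equivalent is `default_shapley.abs().mean()`.

```python
import pandas as pd

# Made-up contributions for three customers: feature_a pushes some predictions
# up and others down, feature_b pushes every prediction up a little.
contribs = pd.DataFrame({
    "feature_a": [0.5, -0.5, 0.1],
    "feature_b": [0.1, 0.1, 0.1],
})

summary = pd.DataFrame({
    "mean": contribs.mean(),            # net direction: opposite signs cancel
    "mean_abs": contribs.abs().mean(),  # typical magnitude, ignoring direction
})
print(summary)
```

Here `feature_b` ranks first by signed mean, but `feature_a` ranks first by mean absolute contribution.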

## K-Lime Reason Codes

We can now compute the K-Lime reason codes for our training data.

```python
klime_reason_codes = mli_scorer.score_reason_codes_batch(df.topandas())
```

```python
klime_reason_codes.head()
```

| | PAY_0 | PAY_2 | PAY_3 | PAY_4 | PAY_5 | PAY_6 | EDUCATION | MARRIAGE | SEX | LIMIT_BAL | ... | BILL_AMT4 | BILL_AMT5 | BILL_AMT6 | PAY_AMT1 | PAY_AMT2 | PAY_AMT3 | PAY_AMT4 | PAY_AMT5 | PAY_AMT6 | Intercept |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.06295 | 0.054943 | -0.008576 | -0.008411 | 0.008567 | 0.019271 | 0.003502 | 0.009021 | -0.001831 | -0.00391 | ... | -0.0 | -0.0 | -0.0 | -0.0 | -0.000243 | -0.0 | -0.0 | -0.0 | -0.0 | 0.249974 |
| 1 | 1.1e-05 | 0.054943 | -0.008568 | -0.001287 | -0.009423 | 0.027893 | 0.003502 | -0.007397 | -0.001831 | -0.023458 | ... | -0.000216 | -6e-06 | -9.3e-05 | -0.0 | -0.000353 | -0.00015 | -0.000171 | -0.0 | -0.000699 | 0.249974 |
| 2 | -0.037757 | -0.004891 | -0.004432 | -0.001618 | -0.00226 | -0.002766 | 0.003883 | -0.006498 | -0.001534 | -0.020791 | ... | 8.8e-05 | 0.000399 | -0.001286 | -0.000461 | -0.000448 | -3.5e-05 | -3e-06 | -0.000145 | -0.001388 | 0.195449 |
| 3 | 0.075608 | -0.004891 | -0.004432 | -0.001618 | -0.00226 | -0.002766 | 0.003883 | 0.008555 | -0.001534 | -0.01155 | ... | 0.000174 | 0.000773 | -0.002444 | -0.000608 | -0.000603 | -4.2e-05 | -3e-06 | -0.000155 | -0.000278 | 0.195449 |
| 4 | 0.42442 | -0.004891 | -4e-06 | -0.001618 | -0.00226 | -0.002766 | 0.003883 | 0.008555 | 0.002451 | -0.01155 | ... | 0.000129 | 0.000511 | -0.001582 | -0.000608 | -0.010959 | -0.00035 | -2.8e-05 | -0.0001 | -0.000188 | 0.195449 |


Notice that these reason codes refer only to the original credit card features. Because K-Lime fits surrogate models on the original data, it can attribute reason codes to those features directly; the trade-off is that the surrogate models may approximate the Driverless AI model less accurately.
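K-Lime fits local surrogate GLMs on k-means clusters of the training data (in addition to a global GLM), so customers in different clusters get different intercepts, consistent with the `Intercept` column above (0.249974 for the first two rows, 0.195449 for the next three). The numpy sketch below mimics the per-cluster idea on made-up data. Its assumptions: cluster labels are given rather than computed by k-means, each surrogate is an ordinary least-squares fit, and reason codes are coefficient times feature value, which applies to numeric features treated linearly (categorical features may be encoded differently). `surrogates` and `local_reason_codes` are hypothetical names, and this is not Driverless AI's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up training data with two features. Cluster labels are assumed to come
# from a prior k-means step; here they are simply defined by the sign of feature 0.
X = rng.normal(size=(200, 2))
cluster = (X[:, 0] > 0).astype(int)

# Made-up stand-in for the Driverless AI predictions: a different linear pattern per cluster.
y = np.where(cluster == 1, 0.3 + 0.5 * X[:, 0], 0.1 - 0.2 * X[:, 1])

# Fit one least-squares surrogate per cluster (intercept column + features).
surrogates = {}
for c in np.unique(cluster):
    mask = cluster == c
    design = np.column_stack([np.ones(mask.sum()), X[mask]])
    coef, *_ = np.linalg.lstsq(design, y[mask], rcond=None)
    surrogates[int(c)] = {"intercept": coef[0], "beta": coef[1:]}


def local_reason_codes(row, c):
    """Reason codes beta_j * x_j from the surrogate of cluster c, plus its intercept."""
    model = surrogates[c]
    return model["beta"] * row, model["intercept"]


codes, intercept = local_reason_codes(X[0], int(cluster[0]))
print("cluster:", int(cluster[0]), "intercept:", round(intercept, 3), "codes:", codes.round(3))
```

Each row's reason codes plus its cluster's intercept reproduce that cluster's surrogate prediction, which is why the intercept is reported alongside the codes.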

```python
# Attach the actual outcome and keep only the customers who defaulted
klime_reason_codes["default payment next month"] = default

default_klime = klime_reason_codes[klime_reason_codes["default payment next month"] == 1]
default_klime = default_klime.drop(["Intercept", "default payment next month"], axis=1)
```

```python
# Average reason code of each feature among defaulters, largest first
avg_klime_contrib = default_klime.mean().to_frame(name="contribution")
avg_klime_contrib.sort_values(by="contribution", ascending=False).head()
```

| | contribution |
|---|---|
| PAY_0 | 0.022265 |
| BILL_AMT3 | 0.006138 |
| PAY_5 | 0.004055 |
| PAY_6 | 0.003848 |
| BILL_AMT2 | 0.00374 |


On average, `PAY_0` provides the largest increase in the predicted likelihood of default. This is easier to interpret than `NumToCatTE:PAY_0.0`, yet the two types of reason codes agree: for both Shapley and K-Lime, `PAY_0` (or a feature derived from `PAY_0`) contributed the largest average increase in default probability among customers who defaulted.
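The agreement can also be checked programmatically. The sketch below strips the `contrib_<n>_` prefix from the Shapley feature names so they line up with the K-Lime names, then intersects the two top-5 lists. It uses the values from the two outputs above; with the notebook's frames you would take the names from `avg_shapley_contrib` and `avg_klime_contrib` after sorting. `feature_name` is a hypothetical helper, and treating `:` as the marker of an engineered feature is an assumption based only on the names in this output.

```python
import re

# Top-5 average contributions among defaulters, copied from the two outputs above.
shapley_top = {
    "contrib_22_NumToCatTE:PAY_0.0": 0.239425,
    "contrib_10_PAY_0": 0.199009,
    "contrib_11_PAY_2": 0.100192,
    "contrib_12_PAY_3": 0.043169,
    "contrib_14_PAY_5": 0.041533,
}
klime_top = {
    "PAY_0": 0.022265,
    "BILL_AMT3": 0.006138,
    "PAY_5": 0.004055,
    "PAY_6": 0.003848,
    "BILL_AMT2": 0.00374,
}


def feature_name(column):
    """Strip the contrib_<index>_ prefix that the Shapley output puts on each feature."""
    return re.sub(r"^contrib_\d+_", "", column)


shapley_features = {feature_name(c) for c in shapley_top}
shared = sorted(shapley_features & set(klime_top))
# In this output, engineered features contain ":" (for example NumToCatTE:PAY_0.0).
engineered = sorted(f for f in shapley_features if ":" in f)

print("shared top features:", shared)
print("engineered Shapley features:", engineered)
```

Only `PAY_0` and `PAY_5` appear in both top-5 lists; `NumToCatTE:PAY_0.0` matches no K-Lime name even though it is derived from `PAY_0`, so a purely name-based comparison understates the agreement. Compare ranks rather than raw magnitudes, since the two methods' values need not be on the same scale.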

## Combine Reason Codes

To compare the two methods row by row, we combine the original data, the Shapley reason codes, and the K-Lime reason codes into a single frame.

```python
# Drop the label column and prefix the K-Lime columns so they don't clash with the original feature names
klime_reason_codes = klime_reason_codes.drop(["default payment next month"], axis=1).add_prefix("klime_rc_")

combined_rc = pd.merge(shapley_reason_codes.drop(["default payment next month"], axis=1),
                       klime_reason_codes, left_index=True, right_index=True)
combined_rc = pd.merge(df.topandas(), combined_rc, left_index=True, right_index=True)
```

```python
combined_rc.head()
```

| | ID | LIMIT_BAL | SEX | EDUCATION | MARRIAGE | AGE | PAY_0 | PAY_2 | PAY_3 | PAY_4 | ... | klime_rc_BILL_AMT4 | klime_rc_BILL_AMT5 | klime_rc_BILL_AMT6 | klime_rc_PAY_AMT1 | klime_rc_PAY_AMT2 | klime_rc_PAY_AMT3 | klime_rc_PAY_AMT4 | klime_rc_PAY_AMT5 | klime_rc_PAY_AMT6 | klime_rc_Intercept |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 20000 | 2 | 2 | 1 | 24 | -2 | 2 | -1 | -1 | ... | -0.0 | -0.0 | -0.0 | -0.0 | -0.000243 | -0.0 | -0.0 | -0.0 | -0.0 | 0.249974 |
| 0 | 1 | 20000 | 2 | 2 | 1 | 24 | -2 | 2 | -1 | -1 | ... | -0.0 | -9.526835e-07 | -0.0 | -0.0 | -0.0 | -0.0 | -0.000102 | -0.0 | -0.000828 | 0.249974 |
| 0 | 1 | 20000 | 2 | 2 | 1 | 24 | -2 | 2 | -1 | -1 | ... | -0.003359 | -7.997258e-05 | -0.001435 | -0.002018 | -0.00106 | -0.000299 | -0.0 | -0.000486 | -0.0 | 0.249974 |
| 0 | 1 | 20000 | 2 | 2 | 1 | 24 | -2 | 2 | -1 | -1 | ... | -0.000105 | -1.588019e-05 | 0.000472 | -0.000739 | -0.000707 | -0.000238 | -0.001695 | -0.000112 | -0.0 | 0.249974 |
| 0 | 1 | 20000 | 2 | 2 | 1 | 24 | -2 | 2 | -1 | -1 | ... | -0.001106 | -1.03178e-05 | -0.0 | -0.000908 | -0.002105 | -0.002502 | -0.001102 | -0.0 | -0.00269 | 0.249974 |
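In the preview above, every row carries index 0 and the same customer (ID 1) while the K-Lime columns change from row to row. That pattern is what an index-based merge produces when an input frame's index is not a unique 0..n-1 range: `pd.merge(..., left_index=True, right_index=True)` pairs rows by label, and repeated labels multiply. The same label alignment applies when a Series is assigned as a new column, as in the `default payment next month` assignments earlier. The sketch below reproduces the effect on made-up frames and shows a positional alternative, assuming (not verified here) that both scorers return one row per input row, in input order.

```python
import pandas as pd

# Made-up reason-code frames whose index restarts at 0 partway through,
# e.g. if results were assembled from chunks (an assumption for illustration).
shapley_rc = pd.DataFrame({"contrib_PAY_0": [0.1, 0.2, 0.3, 0.4]}, index=[0, 1, 0, 1])
klime_rc = pd.DataFrame({"klime_rc_PAY_0": [1.0, 2.0, 3.0, 4.0]}, index=[0, 1, 0, 1])

# Label-based merge: each index-0 row on the left pairs with each index-0 row
# on the right (likewise for index 1), so 4 input rows become 8.
by_label = pd.merge(shapley_rc, klime_rc, left_index=True, right_index=True)

# validate="one_to_one" makes pandas raise instead of silently multiplying rows.
try:
    pd.merge(shapley_rc, klime_rc, left_index=True, right_index=True, validate="one_to_one")
    merge_refused = False
except pd.errors.MergeError:
    merge_refused = True

# Positional alignment, assuming both scorers return one row per input row in input order.
by_position = pd.concat(
    [shapley_rc.reset_index(drop=True), klime_rc.reset_index(drop=True)], axis=1
)

# Assigning a Series as a column also aligns on labels, not positions.
labels = pd.Series([1, 0, 0, 1])  # one outcome per input row, index 0..3
by_label_col = klime_rc.copy()
by_label_col["default"] = labels  # rows receive labels[0], labels[1], labels[0], labels[1]
by_position_col = klime_rc.copy()
by_position_col["default"] = labels.to_numpy()  # rows receive the outcomes in order

print(len(by_label), len(by_position), merge_refused)
print(by_label_col["default"].tolist(), by_position_col["default"].tolist())
```

Adding `validate="one_to_one"` to the merges in this notebook is a cheap guard: it fails loudly instead of silently duplicating customers.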


Next, we examine a single customer who defaulted on their payment.

```python
default_customer = combined_rc[combined_rc["default payment next month"] == 1].iloc[0]
```

```python
print("Shapley Reason Codes")
# Shapley contribution columns for this customer, excluding the bias term
shapley_cols = [c for c in default_customer.index if c.startswith("contrib_") and c != "contrib_bias"]
cust_shapley = default_customer[shapley_cols].to_frame(name="Shapley")
cust_shapley["Abs_Shapley"] = cust_shapley["Shapley"].abs()
cust_shapley.sort_values(by="Abs_Shapley", ascending=False).head()
```

Shapley Reason Codes


| | Shapley | Abs_Shapley |
|---|---|---|
| contrib_11_PAY_2 | 0.473982 | 0.473982 |
| contrib_10_PAY_0 | -0.232951 | 0.232951 |
| contrib_8_LIMIT_BAL | 0.229688 | 0.229688 |
| contrib_16_PAY_AMT1 | 0.126735 | 0.126735 |
| contrib_18_PAY_AMT3 | 0.125612 | 0.125612 |


```python
print("K-Lime Reason Codes")
# K-Lime reason code columns for this customer, excluding the intercept
klime_cols = [c for c in default_customer.index if c.startswith("klime_rc_") and c != "klime_rc_Intercept"]
cust_klime = default_customer[klime_cols].to_frame(name="K-Lime")
cust_klime["Abs_K-Lime"] = cust_klime["K-Lime"].abs()
cust_klime.sort_values(by="Abs_K-Lime", ascending=False).head()
```

K-Lime Reason Codes


| | K-Lime | Abs_K-Lime |
|---|---|---|
| klime_rc_PAY_0 | -0.06295 | 0.06295 |
| klime_rc_PAY_2 | 0.054943 | 0.054943 |
| klime_rc_PAY_6 | 0.019271 | 0.019271 |
| klime_rc_MARRIAGE | 0.009021 | 0.009021 |
| klime_rc_PAY_3 | -0.008576 | 0.008576 |


For this customer, their earlier payment behavior (`PAY_2`) increased the likelihood of default in both the Shapley and the K-Lime reason codes. Their most recent behavior (`PAY_0`), however, decreased it.

Examining these columns shows that their earlier pay status (`PAY_2` = 2) reflects missed payments, while their most recent pay status (`PAY_0` = -2) shows them up to date on payments.

```python
default_customer[["PAY_0", "PAY_2"]]
```

```text
PAY_0   -2.0
PAY_2    2.0
Name: 0, dtype: float64
```
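The Shapley and K-Lime cells for this customer repeat the same steps: select one method's columns, rank them by absolute value, and read off the top features. The sketch below folds that into a hypothetical helper, `top_by_magnitude`, strips the column prefixes so the two methods' names line up, and checks whether they agree in sign on the features they share. It runs on the values copied from this customer's outputs; with the notebook objects you would pass the Shapley or K-Lime entries of `default_customer` instead.

```python
import re

import pandas as pd

# This customer's top reason codes, copied from the Shapley and K-Lime outputs above.
shapley = pd.Series({
    "contrib_11_PAY_2": 0.473982, "contrib_10_PAY_0": -0.232951,
    "contrib_8_LIMIT_BAL": 0.229688, "contrib_16_PAY_AMT1": 0.126735,
    "contrib_18_PAY_AMT3": 0.125612,
})
klime = pd.Series({
    "klime_rc_PAY_0": -0.06295, "klime_rc_PAY_2": 0.054943,
    "klime_rc_PAY_6": 0.019271, "klime_rc_MARRIAGE": 0.009021,
    "klime_rc_PAY_3": -0.008576,
})


def top_by_magnitude(codes, prefix_pattern, k=5):
    """Hypothetical helper: strip a name prefix, then keep the k largest |reason codes|."""
    renamed = codes.rename(lambda name: re.sub(prefix_pattern, "", name))
    return renamed[renamed.abs().nlargest(k).index]


top_shapley = top_by_magnitude(shapley, r"^contrib_\d+_")
top_klime = top_by_magnitude(klime, r"^klime_rc_")

# Compare signs on the features both methods rank in their top k.
shared = top_shapley.index.intersection(top_klime.index)
comparison = pd.DataFrame({"Shapley": top_shapley[shared], "K-Lime": top_klime[shared]})
comparison["same_sign"] = (comparison["Shapley"] > 0) == (comparison["K-Lime"] > 0)
print(comparison)
```

For this customer both methods agree in sign on `PAY_2` (positive) and `PAY_0` (negative), matching the reading above. Signs and ranks are the meaningful comparison here; the raw magnitudes of the two methods need not share a scale.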