# Case study

In this case study, we demonstrate how to generate counterfactual explanations with our library on the Statlog (German Credit Data) data set from the UCI ML repository.

In [1]:
from tensorflow import keras
import numpy as np
import pandas as pd
import warnings
from modules.CFEC.cfec.visualization import show
warnings.filterwarnings('ignore', category=FutureWarning)
np.random.seed(44)



### Data and model loading

The Statlog (German Credit Data) data set comes from the UCI ML repository. It has 20 original features, which expand to 61 columns because most of the categorical features are one-hot encoded. For easier use, the data set is wrapped in the GermanData class.

Additionally, we load a pretrained Keras model (a simple logistic regression with 2 outputs), which is used to make predictions.

In [2]:
from modules.CFEC.data import GermanData

german_data = GermanData('modules/CFEC/data/datasets/input_german.csv', 'modules/CFEC/data/datasets/labels_german.csv')
model = keras.models.load_model('modules/CFEC/models/model_german')

In [3]:
german_data.input.sample(5)

Unnamed: 0,duration,credit,installment_percent,residence_duration,age,existing_credits,people_maintained,account_status_0..200 DM,account_status_< 0 DM,account_status_>= 200 DM,...,job_unskilled - resident,phone_none,"phone_yes, registered under the customers name",foreign_no,foreign_yes,employment_1..4 years,employment_4..7 years,employment_< 1 year,employment_>= 7 years,employment_unemployed
99.0,20.0,7057.0,3.0,4.0,36.0,2.0,2.0,1.0,0.0,0.0,...,0.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0
488.0,10.0,1418.0,3.0,2.0,35.0,1.0,1.0,0.0,0.0,0.0,...,1.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0
206.0,12.0,1935.0,4.0,4.0,43.0,3.0,1.0,0.0,0.0,0.0,...,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0
939.0,24.0,6842.0,2.0,4.0,46.0,2.0,2.0,0.0,0.0,0.0,...,0.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0
729.0,24.0,1275.0,2.0,4.0,36.0,2.0,1.0,0.0,0.0,1.0,...,0.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0


### Test sample selection

We select one instance for which we want to compute a counterfactual.

In [4]:
X_test = german_data.input.iloc[0]
X_test

duration                    6.0
credit                   1169.0
installment_percent         4.0
residence_duration          4.0
age                        67.0
                          ...  
employment_1..4 years       0.0
employment_4..7 years       0.0
employment_< 1 year         0.0
employment_>= 7 years       1.0
employment_unemployed       0.0
Name: 0.0, Length: 61, dtype: float64

### Data scaling

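For the model to work, the input has to be scaled. The GermanData scale function relies on a scaler from sklearn. Note that the output below lies in [0, 1], which is what min-max scaling produces; a StandardScaler would yield zero-mean, unit-variance features instead.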

In [5]:
X_test_scaled = german_data.scale(X_test)
X_test_scaled

duration                 0.029412
credit                   0.050567
installment_percent      1.000000
residence_duration       1.000000
age                      0.857143
                           ...   
employment_1..4 years    0.000000
employment_4..7 years    0.000000
employment_< 1 year      0.000000
employment_>= 7 years    1.000000
employment_unemployed    0.000000
Length: 61, dtype: float64
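
The scaled values printed above can be reproduced by hand with min-max scaling. The sketch below is only an illustration: the feature ranges are assumptions (commonly reported minima and maxima for the German Credit data) and are not read from GermanData, and min_max_scale is a hypothetical helper, not part of the library.

```python
# Assumed (min, max) ranges for three numeric columns. These are commonly
# reported for the German Credit data and are NOT read from GermanData.
assumed_ranges = {
    "duration": (4.0, 72.0),
    "credit": (250.0, 18424.0),
    "age": (19.0, 75.0),
}

# Raw values of the selected instance, copied from the X_test output.
raw = {"duration": 6.0, "credit": 1169.0, "age": 67.0}


def min_max_scale(value, low, high):
    """Map a value from [low, high] to [0, 1] (hypothetical helper)."""
    return (value - low) / (high - low)


scaled = {name: min_max_scale(raw[name], *assumed_ranges[name]) for name in raw}
for name, value in scaled.items():
    print(f"{name}: {value:.6f}")
# duration: 0.029412
# credit: 0.050567
# age: 0.857143
```

Under these assumed ranges, the three numbers agree with the first rows of X_test_scaled.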

Now, we evaluate the model on the test sample.

In [6]:
model.predict(np.expand_dims(X_test_scaled, axis=0))



array([[0.00939065, 0.9906094 ]], dtype=float32)

The two outputs are class probabilities, so the model assigns the test instance to class 1. For this credit data, class 1 means a bad customer (one who does not repay loans).
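
As a small illustration of how the two outputs turn into a class label, the sketch below takes the argmax of the probabilities shown above. The class_names mapping is an assumption that follows the interpretation given in the text; it is not an attribute of the model.

```python
import numpy as np

# Probabilities copied from the model output above: one row per instance,
# one column per class.
probabilities = np.array([[0.00939065, 0.9906094]], dtype=np.float32)

# The predicted class is the index of the largest probability in each row.
predicted_class = np.argmax(probabilities, axis=1)

# Label names are an assumption based on the interpretation in the text.
class_names = {0: "good credit", 1: "bad credit"}
print(predicted_class[0], class_names[int(predicted_class[0])])
# 1 bad credit
```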

### Counterfactual explanations generation

Here we demonstrate how to generate counterfactual explanations using the CADEX, FIMAP and ECE methods implemented in our library.

#### FIMAP


In [None]:
from modules.CFEC.cfec.explainers import Fimap

In [None]:
model_predictions = model.predict(german_data.X_train)
model_predictions = np.argmax(model_predictions, axis=1)
fimap = Fimap()
fimap.fit(german_data.X_train, model_predictions)

In [None]:
cf_fimap = fimap.generate(X_test)
cf_fimap

In [None]:
model.predict(german_data.scale(cf_fimap))

The class predicted for the counterfactual is 0, meaning a good credit score.

In [None]:
show(X_test, cf_fimap)

We can see two problems with the generated counterfactual:
- the categorical variables, originally one-hot encoded, were changed to values other than 0 or 1
- the value of the age variable decreased, which makes the counterfactual unusable - we can't recommend that someone decrease their age in order to obtain credit

To fix this, we can use constraints, which can be defined either in code or in spreadsheets (an option for users who are not familiar with programming).
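
To make the two problems concrete, here is a minimal sketch of the checks that such constraints express. It uses toy vectors and two hypothetical helpers (one_hot_is_valid and is_non_decreasing); it does not show how the library enforces constraints internally.

```python
import numpy as np


def one_hot_is_valid(vector, start, end):
    """True if columns start..end (inclusive) hold a single 1 and zeros otherwise."""
    group = np.asarray(vector[start:end + 1], dtype=float)
    return bool(np.all(np.isin(group, (0.0, 1.0))) and group.sum() == 1.0)


def is_non_decreasing(original_value, counterfactual_value):
    """True if the counterfactual did not lower the value (e.g. age)."""
    return bool(counterfactual_value >= original_value)


# Toy vectors: index 0 is an "age" column, indices 1..3 form one one-hot group.
original = np.array([67.0, 0.0, 1.0, 0.0])
bad_cf = np.array([60.0, 0.3, 0.5, 0.2])   # fractional one-hot values, lower age
good_cf = np.array([67.0, 1.0, 0.0, 0.0])  # valid one-hot group, age unchanged

print(one_hot_is_valid(bad_cf, 1, 3), is_non_decreasing(original[0], bad_cf[0]))
# False False
print(one_hot_is_valid(good_cf, 1, 3), is_non_decreasing(original[0], good_cf[0]))
# True True
```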

In [None]:
from modules.CFEC.cfec.constraints import OneHot, ValueMonotonicity

constraints = [OneHot("account_status", 7, 10), 
               OneHot("credit_history", 11, 15),
               OneHot("purpose", 16, 25), 
               OneHot("savings", 26, 30), 
               OneHot("sex_status", 31, 34),
               OneHot("debtors", 35, 37), 
               OneHot("property", 38, 41),
               OneHot("other_installment_plans", 42, 44), 
               OneHot("housing", 45, 47), 
               OneHot("job", 48, 51),
               OneHot("phone", 52, 53), 
               OneHot("foreign", 54, 55), 
               OneHot("employment", 56, 60),
               ValueMonotonicity(['age'], "increasing")
              ]
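
Because the one-hot groups are given as column index ranges, it can be worth checking that the ranges do not overlap and that they cover every categorical column. The sketch below copies the (name, start, end) triples from the cell above and assumes that the end index is inclusive, which the text does not state explicitly.

```python
# (name, start, end) triples copied from the OneHot constraints above.
groups = [
    ("account_status", 7, 10), ("credit_history", 11, 15), ("purpose", 16, 25),
    ("savings", 26, 30), ("sex_status", 31, 34), ("debtors", 35, 37),
    ("property", 38, 41), ("other_installment_plans", 42, 44),
    ("housing", 45, 47), ("job", 48, 51), ("phone", 52, 53),
    ("foreign", 54, 55), ("employment", 56, 60),
]
n_columns = 61  # length of X_test reported earlier

covered = []
for name, start, end in groups:
    covered.extend(range(start, end + 1))  # assumes `end` is inclusive

no_overlap = len(covered) == len(set(covered))
# Columns 0..6 are the numeric features; 7..60 should all be one-hot columns.
full_coverage = sorted(covered) == list(range(7, n_columns))
print(no_overlap, full_coverage)
# True True
```

Under the inclusive-end assumption, the groups cover columns 7 to 60 exactly once, leaving the first seven columns for the numeric features.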

In [None]:
fimap = Fimap(constraints=constraints, use_mapper=True)
fimap.fit(german_data.X_train, model_predictions)

cf_fimap_constraints = fimap.generate(X_test)

In [None]:
model.predict(german_data.scale(cf_fimap_constraints))

Because FIMAP does not guarantee a true counterfactual (one that changes the prediction), we should always check whether the prediction actually changed. Here, we would have to tune the hyperparameters to find a true counterfactual.
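
One way to automate this check is to compare the argmax class of the original and the counterfactual predictions. The sketch below is an illustration only: prediction_changed is a hypothetical helper, and the two counterfactual probability vectors are made-up numbers, not outputs of the model.

```python
import numpy as np


def prediction_changed(original_probs, counterfactual_probs):
    """True if the argmax class differs between the two probability vectors."""
    original_class = int(np.argmax(original_probs))
    counterfactual_class = int(np.argmax(counterfactual_probs))
    return original_class != counterfactual_class


original_probs = np.array([0.00939065, 0.9906094])  # model output from the text
flipped_probs = np.array([0.70, 0.30])              # made-up: class flips to 0
unchanged_probs = np.array([0.40, 0.60])            # made-up: class stays 1

print(prediction_changed(original_probs, flipped_probs))
# True
print(prediction_changed(original_probs, unchanged_probs))
# False
```

In the notebook, the two arguments would be the rows returned by model.predict for the scaled original and the scaled counterfactual.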

In [None]:
cf_fimap_constraints.squeeze()


In [None]:
show(X_test, cf_fimap_constraints.squeeze(), constraints=constraints)

In [None]:
X_test

The constraints column shows which constraints were placed on a given attribute. A constraint that is not satisfied is marked with an asterisk.

#### CADEX

For CADEX, we can either pass a scaled instance and then unscale the resulting counterfactual, or pass the transform and inverse_transform parameters to the constructor.

In [None]:
from modules.CFEC.cfec.explainers import Cadex 

cadex = Cadex(model, transform=german_data.scale, inverse_transform=german_data.unscale, n_changed=5)
cf = cadex.generate(X_test)
cf
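
The first option mentioned above (scale, explain in the scaled space, then unscale) amounts to a round trip through a pair of inverse functions. The sketch below illustrates that idea with a hand-written min-max scaler on two columns. The scale and unscale functions, the feature ranges and the perturbation are all assumptions for illustration; they stand in for german_data.scale, german_data.unscale and the explainer.

```python
import numpy as np

# Assumed (min, max) ranges for "duration" and "age"; not read from GermanData.
low = np.array([4.0, 19.0])
high = np.array([72.0, 75.0])


def scale(x):
    """Map raw values to [0, 1] (stand-in for a transform function)."""
    return (x - low) / (high - low)


def unscale(z):
    """Inverse of scale (stand-in for an inverse_transform function)."""
    return z * (high - low) + low


x = np.array([6.0, 67.0])           # raw instance: duration, age
z = scale(x)                        # 1. move to the scaled space
z_cf = z + np.array([0.1, 0.0])     # 2. pretend an explainer changed duration
x_cf = unscale(z_cf)                # 3. map the result back to raw units

print(np.round(x_cf, 3))
```

Passing both functions to the constructor lets the explainer perform steps 1 and 3 itself, so the caller only deals with raw values.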

In [None]:
model.predict(german_data.scale(cf))

In [None]:
from modules.CFEC.cfec.visualization import show
show(X_test, cf)

For CADEX we can also use constraints:

In [None]:
cadex = Cadex(model, n_changed=10, transform=german_data.scale, inverse_transform=german_data.unscale, constraints=constraints)
cf = cadex.generate(X_test)

In [None]:
model.predict(german_data.scale(cf))

In [None]:
show(X_test, cf, constraints=constraints)

#### ECE

We can use ECE to select the best counterfactuals. We run it with 10 explainers: 5 FIMAP and 5 CADEX instances with different parameter values.

In [None]:
fimaps = []
fimap_hyperparameters = [
    (0.1, 0.001, 0.01),
    (0.1, 0.05, 0.5),
    (0.2, 0.01, 0.1),
    (0.2, 0.08, 0.8),
    (0.5, 0.001, 0.01)
]
for tau, l1, l2 in fimap_hyperparameters:
    fimap = Fimap(tau, l1, l2)
    fimap.fit(german_data.X_train, model_predictions)
    fimaps.append(fimap)
    
cadexs = []
n_list = [5, 8, 10, 15, 20]
for n_changed in n_list:
    cadex = Cadex(model, n_changed, transform=german_data.scale, inverse_transform=german_data.unscale)
    cadexs.append(cadex)

Now, let's use ECE and generate up to 4 best counterfactuals:

In [None]:
from modules.CFEC.cfec.explainers import ECE
from modules.CFEC.cfec.visualization import compare
pd.set_option("display.max_rows", None)

# The first argument is 4 because we want up to 4 counterfactuals, as stated above
ece = ECE(4, columns=list(german_data.X_train.columns), bces=cadexs + fimaps, dist=2, h=5, lambda_=0.001, n_jobs=1)
cfs = ece.generate(X_test)

First, let's see all 10 counterfactuals:

In [None]:
compare(X_test, ece.get_aggregated_cfs())

And now, those selected by ECE:

In [None]:
compare(X_test, cfs)
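
To give an intuition for picking a few counterfactuals out of a larger pool, the sketch below keeps the k candidates closest to the original instance in L2 distance. This is a deliberately simplified stand-in, not ECE's selection procedure; k_closest is a hypothetical helper and the points are toy data.

```python
import numpy as np


def k_closest(x, candidates, k):
    """Return the indices of the k candidates nearest to x (L2) and all distances."""
    distances = np.linalg.norm(candidates - x, axis=1)
    return np.argsort(distances)[:k], distances


x = np.array([0.0, 0.0])                      # toy "original instance"
candidates = np.array([[3.0, 4.0],            # distance 5
                       [1.0, 0.0],            # distance 1
                       [0.0, 2.0],            # distance 2
                       [6.0, 8.0]])           # distance 10
indices, distances = k_closest(x, candidates, k=2)

print(indices.tolist())
# [1, 2]
print(distances.tolist())
# [5.0, 1.0, 2.0, 10.0]
```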

We can do the same with constraints:

In [None]:
fimaps = []
fimap_hyperparameters = [
    (0.1, 0.001, 0.01),
    (0.1, 0.05, 0.5),
    (0.2, 0.01, 0.1),
    (0.2, 0.08, 0.8),
    (0.5, 0.001, 0.01)
]
for tau, l1, l2 in fimap_hyperparameters:
    fimap = Fimap(tau, l1, l2, constraints=constraints, use_mapper=True)
    fimap.fit(german_data.X_train, model_predictions)
    fimaps.append(fimap)
    
cadexs = []
n_list = [10, 14, 18, 20, 25]
for n_changed in n_list:
    cadex = Cadex(model, n_changed, transform=german_data.scale, inverse_transform=german_data.unscale, constraints=constraints)
    cadexs.append(cadex)

In [None]:
ece = ECE(4, columns=list(german_data.X_train.columns), bces=cadexs + fimaps, dist=2, h=5, lambda_=0.001, n_jobs=1)
cfs = ece.generate(X_test)

In [None]:
compare(X_test, ece.get_aggregated_cfs(), constraints=constraints)

In [None]:
compare(X_test, cfs, constraints=constraints)

For more information on the library, see our documentation: https://counterfactuals.readthedocs.io/en/latest/