This notebook shows how to use the CARLA library.

## Data

Before we can do anything else, we need some data.
You could import one of the datasets in our [catalog](https://carla-counterfactual-and-recourse-library.readthedocs.io/en/latest/data.html#module-data.catalog.catalog),
but here we show how to load your own data from a CSV file with `CsvCatalog`.

In [1]:
import warnings
warnings.filterwarnings('ignore')

from carla.data.catalog import CsvCatalog

Using TensorFlow backend.


[INFO] Using Python-MIP package version 1.12.0 [model.py <module>]


In [2]:
continuous = ["age", "fnlwgt", "education-num", "capital-gain", "hours-per-week", "capital-loss"]
categorical = ["marital-status", "native-country", "occupation", "race", "relationship", "sex", "workclass"]
immutable = ["age", "sex"]

dataset = CsvCatalog(file_path="adult.csv",
                     continuous=continuous,
                     categorical=categorical,
                     immutables=immutable,
                     target='income')

print(dataset.df)

            age    fnlwgt  education-num  capital-gain  capital-loss  ...  \
0      0.301370  0.044131       0.800000      0.021740           0.0  ...   
1      0.452055  0.048052       0.800000      0.000000           0.0  ...   
2      0.287671  0.137581       0.533333      0.000000           0.0  ...   
3      0.493151  0.150486       0.400000      0.000000           0.0  ...   
4      0.150685  0.220635       0.800000      0.000000           0.0  ...   
...         ...       ...            ...           ...           ...  ...   
48827  0.301370  0.137428       0.800000      0.000000           0.0  ...   
48828  0.643836  0.209130       0.533333      0.000000           0.0  ...   
48829  0.287671  0.245379       0.800000      0.000000           0.0  ...   
48830  0.369863  0.048444       0.800000      0.054551           0.0  ...   
48831  0.246575  0.114919       0.800000      0.000000           0.0  ...   

       occupation_Other  race_White  relationship_Non-Husband  sex_Male  \
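
The printed frame suggests that continuous columns end up scaled to the range [0, 1] and that categorical columns become binary indicator columns such as `sex_Male`. A minimal pandas sketch of that kind of preprocessing, on a made-up toy frame, might look like the following. It illustrates the general technique (min-max scaling plus drop-first one-hot encoding) and is not CARLA's actual implementation; the toy column values are invented.

```python
import pandas as pd

# Toy data, invented for illustration only (not taken from adult.csv).
raw = pd.DataFrame(
    {
        "age": [20, 40, 60],
        "hours-per-week": [10, 40, 25],
        "sex": ["Female", "Male", "Male"],
    }
)

continuous_cols = ["age", "hours-per-week"]
categorical_cols = ["sex"]

# Min-max scale each continuous column to the range [0, 1].
scaled = raw[continuous_cols].copy()
col_min = scaled.min()
col_max = scaled.max()
scaled = (scaled - col_min) / (col_max - col_min)

# One-hot encode the categorical columns, dropping the first level so that
# a binary feature becomes a single 0/1 column (here "sex_Male").
encoded = pd.get_dummies(raw[categorical_cols], drop_first=True).astype(float)

processed = pd.concat([scaled, encoded], axis=1)
print(processed)
```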


## Model

Now that the data is loaded, we also need a classification model.
You could define your own [model](https://carla-counterfactual-and-recourse-library.readthedocs.io/en/latest/examples.html#black-box-model),
but here we show how to train one of our [catalog](https://carla-counterfactual-and-recourse-library.readthedocs.io/en/latest/mlmodel.html#module-models.catalog.catalog) models.
Note that, depending on your data, you might need to tweak the training hyperparameters.

In [3]:
from carla.models.catalog import MLModelCatalog

In [4]:
training_params = {"lr": 0.002, "epochs": 10, "batch_size": 1024, "hidden_size": [18, 9, 3]}

ml_model = MLModelCatalog(
    dataset, model_type="ann", load_online=False, backend="pytorch"
)
ml_model.train(
    learning_rate=training_params["lr"],
    epochs=training_params["epochs"],
    batch_size=training_params["batch_size"],
    hidden_size=training_params["hidden_size"]
)

Loaded model from /home/johan/carla/models/custom/ann_layers_18_9_3.pt
test accuracy for model: 0.840678243774574
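
Since the training hyperparameters may need tweaking for a given dataset, one simple way to organize the search is to enumerate candidate settings up front. The sketch below only builds candidate dictionaries with the same keys as `training_params`; the candidate values are arbitrary assumptions, and the training call itself is left out.

```python
from itertools import product

# Candidate values are illustrative assumptions, not recommendations.
search_space = {
    "lr": [0.001, 0.002],
    "epochs": [10, 20],
    "batch_size": [1024],
    "hidden_size": [[18, 9, 3], [32, 16]],
}

# Build one dict per combination, using the same keys as `training_params`.
keys = list(search_space)
candidates = [dict(zip(keys, values)) for values in product(*search_space.values())]

for params in candidates:
    print(params)
```

Each dictionary could then be unpacked into `ml_model.train(...)` in the same way as `training_params` above, keeping whichever setting gives the best test accuracy.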


## Recourse

Now that we have both the data and a model, we can start using CARLA to generate counterfactuals.
You can pick a [recourse method](https://carla-counterfactual-and-recourse-library.readthedocs.io/en/latest/recourse.html) from the catalog or implement one yourself.
In the following example, we select the negatively labeled samples, for which we want counterfactuals.
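
Conceptually, a negatively labeled sample is one that the classifier assigns to the negative class. The toy sketch below illustrates the idea with a hypothetical `toy_predict_proba` function and an assumed decision threshold of 0.5. It shows the general pattern only and makes no claim about how `predict_negative_instances` is implemented.

```python
import pandas as pd

# Toy feature frame, invented for illustration.
toy_df = pd.DataFrame({"x": [0.1, 0.4, 0.7, 0.9]})


def toy_predict_proba(df: pd.DataFrame) -> pd.Series:
    """Hypothetical stand-in for a classifier: probability of the positive class."""
    return df["x"]


# Keep only rows whose predicted positive-class probability is below 0.5
# (the threshold is an assumption made for this sketch).
proba = toy_predict_proba(toy_df)
negatives = toy_df[proba < 0.5]
print(negatives)
```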

In [5]:
from carla.models.negative_instances import predict_negative_instances
import carla.recourse_methods.catalog as recourse_catalog

In [6]:
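# Get the factuals: instances that the model predicts as the negative class
factuals = predict_negative_instances(ml_model, dataset.df)
# Use only the first five factuals to keep the example small
test_factual = factuals.iloc[:5]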

hyperparams = {
    "data_name": dataset.name,
    "n_search_samples": 100,
    "p_norm": 1,
    "step": 0.1,
    "max_iter": 1000,
    "clamp": True,
    "binary_cat_features": False,
    "vae_params": {
        "layers": [len(ml_model.feature_input_order), 512, 256, 8],
        "train": True,
        "lambda_reg": 1e-6,
        "epochs": 5,
        "lr": 1e-3,
        "batch_size": 32,
    },
}

cchvae = recourse_catalog.CCHVAE(ml_model, hyperparams)
df_cfs = cchvae.get_counterfactuals(test_factual)

print(df_cfs)

[INFO] Start training of Variational Autoencoder... [models.py fit]
[INFO] [Epoch: 0/5] [objective: 0.374] [models.py fit]
[INFO] [ELBO train: 0.37] [models.py fit]
[INFO] [ELBO train: 0.13] [models.py fit]
[INFO] [ELBO train: 0.12] [models.py fit]
[INFO] [ELBO train: 0.12] [models.py fit]
[INFO] [ELBO train: 0.12] [models.py fit]
[INFO] ... finished training of Variational Autoencoder. [models.py fit]
        age  capital-gain  capital-loss  education-num    fnlwgt  ...  \
0  0.306860      0.036312      0.039532       0.606228  0.119245  ...   
1  0.307038      0.036296      0.040064       0.605119  0.119296  ...   
2  0.304983      0.036325      0.039844       0.605189  0.119365  ...   
3  0.307777      0.036302      0.039464       0.604545  0.119389  ...   
4  0.303826      0.036342      0.039917       0.604851  0.119533  ...   

   race_White  relationship_Non-Husband  sex_Male  workclass_Private  income  
0         1.0                       0.0       1.0           0.733125     1.0
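
A common first check on generated counterfactuals is how far they move from the original instances. The sketch below computes a per-row L1 distance (matching the `p_norm` of 1 used above) and the number of changed features on two small made-up frames. With real output, one would compare `test_factual` and `df_cfs` on their shared feature columns; that these frames align row by row is an assumption of this sketch.

```python
import pandas as pd

# Made-up factual and counterfactual rows with the same columns and index.
toy_factuals = pd.DataFrame({"age": [0.25, 0.5], "education-num": [0.5, 0.75]})
toy_cfs = pd.DataFrame({"age": [0.25, 0.5], "education-num": [0.75, 1.0]})

# Per-row L1 distance: sum of absolute feature changes.
l1_distance = (toy_cfs - toy_factuals).abs().sum(axis=1)

# Number of features changed per row (an L0-style sparsity measure).
n_changed = (toy_cfs != toy_factuals).sum(axis=1)

print(l1_distance.tolist())
print(n_changed.tolist())
```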