# Tutorial: Counterfactual explanations for scorecards with a continuous target

This tutorial shows how to generate counterfactual explanations for scorecard models with a continuous target. The dataset used is the California housing dataset: https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_california_housing.html.

In [1]:
import pandas as pd

from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import HuberRegressor

from optbinning import BinningProcess
from optbinning import Scorecard
from optbinning.scorecard import Counterfactual

Load the dataset

In [2]:
data = fetch_california_housing()

target = "target"
variable_names = data.feature_names
df = pd.DataFrame(data.data, columns=variable_names)
df[target] = data.target

#### Scorecard model

First, we develop a scorecard following the steps presented in previous tutorials.

In [3]:
binning_process = BinningProcess(variable_names)

estimator = HuberRegressor(max_iter=200)

scorecard = Scorecard(binning_process=binning_process, target=target,
                      estimator=estimator, scaling_method="min_max",
                      scaling_method_params={"min": 0, "max": 100},
                      reverse_scorecard=True, verbose=True)

scorecard.fit(df)

2021-05-23 09:58:57,060 | INFO : Scorecard building process started.
2021-05-23 09:58:57,061 | INFO : Options: check parameters.
2021-05-23 09:58:57,064 | INFO : Dataset: continuous target.
2021-05-23 09:58:57,065 | INFO : Binning process started.
2021-05-23 09:58:59,278 | INFO : Binning process terminated. Time: 2.2123s
2021-05-23 09:58:59,279 | INFO : Fitting estimator.
2021-05-23 09:59:00,278 | INFO : Fitting terminated. Time 0.9992s
2021-05-23 09:59:00,281 | INFO : Scorecard table building started.
2021-05-23 09:59:00,460 | INFO : Scorecard table terminated. Time: 0.1793s
2021-05-23 09:59:00,461 | INFO : Scorecard building process terminated. Time: 3.4005s


Scorecard(binning_process=BinningProcess(binning_fit_params=None,
                                         binning_transform_params=None,
                                         categorical_variables=None,
                                         max_bin_size=None, max_n_bins=None,
                                         max_n_prebins=20, max_pvalue=None,
                                         max_pvalue_policy='consecutive',
                                         min_bin_size=None, min_n_bins=None,
                                         min_prebin_size=0.05, n_jobs=None,
                                         selection_criteria=None,
                                         special_codes=None, split_digits=None,
                                         v...
                                                         'AveRooms',
                                                         'AveBedrms',
                                                         'Population',
           

#### Generating counterfactual explanations

As the input data point, or query, we select the first sample. Note that a query must be either a dictionary or a pandas DataFrame.

In [4]:
query = df.iloc[0, :-1].to_frame().T

In [5]:
query

Unnamed: 0,MedInc,HouseAge,AveRooms,AveBedrms,Population,AveOccup,Latitude,Longitude
0,8.3252,41.0,6.984127,1.02381,322.0,2.555556,37.88,-122.23
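Since a query may be a dictionary or a DataFrame, the following minimal sketch builds both forms from a single row. It does not call optbinning; `toy_df` is a hypothetical one-row stand-in for `df`, with the feature values copied from the output above and a placeholder target value.

```python
import pandas as pd

# Feature values of the first sample, copied from the query shown above.
feature_values = {
    "MedInc": 8.3252,
    "HouseAge": 41.0,
    "AveRooms": 6.984127,
    "AveBedrms": 1.02381,
    "Population": 322.0,
    "AveOccup": 2.555556,
    "Latitude": 37.88,
    "Longitude": -122.23,
}

# Hypothetical stand-in for the tutorial's df: the features plus a target
# column. The target value 0.0 is a placeholder, not real data.
toy_df = pd.DataFrame([{**feature_values, "target": 0.0}])

# Option 1: a single-row DataFrame. The list index [0] keeps the result
# two-dimensional, and :-1 drops the last (target) column.
query_frame = toy_df.iloc[[0], :-1]

# Option 2: a plain dictionary mapping each feature name to its value.
query_dict = toy_df.iloc[0, :-1].to_dict()

print(query_frame.shape)
print(query_dict)
```

Selecting the row with `iloc[[0], :-1]` is an alternative to `iloc[0, :-1].to_frame().T` that keeps each column's original dtype.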


In [6]:
scorecard.predict(query)

array([4.38634097])

The predicted outcome (house value) for this query (house) is approximately 4.39. We want to generate counterfactual explanations to find out how to increase the house value to at least 4.5.

In [7]:
cf = Counterfactual(scorecard=scorecard, verbose=True)

In [8]:
cf.fit(df)

2021-05-23 09:59:00,596 | INFO : Counterfactual fit started.
2021-05-23 09:59:00,599 | INFO : Options: check parameters.
2021-05-23 09:59:00,601 | INFO : Compute optimization problem data.
2021-05-23 09:59:00,672 | INFO : Counterfactual fit terminated. Time: 0.0755s


Counterfactual(n_jobs=1,
               scorecard=Scorecard(binning_process=BinningProcess(binning_fit_params=None,
                                                                  binning_transform_params=None,
                                                                  categorical_variables=None,
                                                                  max_bin_size=None,
                                                                  max_n_bins=None,
                                                                  max_n_prebins=20,
                                                                  max_pvalue=None,
                                                                  max_pvalue_policy='consecutive',
                                                                  min_bin_size=None,
                                                                  min_n_bins=None,
                                                                  min_prebin_size=0.05,
   

In [9]:
cf.generate(query=query, y=4.5, outcome_type="continuous", n_cf=1,
            max_changes=3, hard_constraints=["min_outcome"])

2021-05-23 09:59:00,740 | INFO : Counterfactual generation started.
2021-05-23 09:59:00,745 | INFO : Options: check parameters.
2021-05-23 09:59:00,766 | INFO : Options: check objectives and constraints.
2021-05-23 09:59:00,779 | INFO : Optimizer started.
2021-05-23 09:59:00,784 | INFO : Optimizer: build model...
2021-05-23 09:59:00,842 | INFO : Optimizer: solve...
2021-05-23 09:59:01,126 | INFO : Optimizer terminated. Time: 0.3416s
2021-05-23 09:59:01,127 | INFO : Post-processing started.
2021-05-23 09:59:01,140 | INFO : Post-processing terminated. Time: 0.0122s
2021-05-23 09:59:01,142 | INFO : Counterfactual generation terminated. Status: OPTIMAL. Time: 0.4017s


Counterfactual(n_jobs=1,
               scorecard=Scorecard(binning_process=BinningProcess(binning_fit_params=None,
                                                                  binning_transform_params=None,
                                                                  categorical_variables=None,
                                                                  max_bin_size=None,
                                                                  max_n_bins=None,
                                                                  max_n_prebins=20,
                                                                  max_pvalue=None,
                                                                  max_pvalue_policy='consecutive',
                                                                  min_bin_size=None,
                                                                  min_n_bins=None,
                                                                  min_prebin_size=0.05,
   

In [10]:
cf.information()

optbinning (Version 0.12.0)
Copyright (c) 2019-2021 Guillermo Navas-Palencia, Apache License 2.0

  Status  : OPTIMAL                         

  Solver statistics
    Type                                 mip
    Number of variables                  120
    Number of constraints                 42
    Objective value                   7.6866
    Best objective bound              7.6866

  Objectives
    proximity                         0.7069
    closeness                         6.9798

  Timing
    Total time                          0.43 sec
    Fit                                 0.08 sec   ( 17.58%)
    Solver                              0.34 sec   ( 79.57%)
    Post-processing                     0.01 sec   (  3.58%)



The generated counterfactual suggests increasing the block population, reducing the average house occupancy, and changing the block longitude. None of these changes seems doable.

In [11]:
cf.display(show_only_changes=True, show_outcome=True)

Unnamed: 0,MedInc,HouseAge,AveRooms,AveBedrms,Population,AveOccup,Latitude,Longitude,outcome
0,-,-,-,-,"[986.50, 1160.50)","[-inf, 1.95)",-,"[-122.16, -118.33)",4.520902
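To read the suggested changes off the table, each bin can be compared with the query value. The sketch below is an illustration only: it assumes the labels are half-open intervals `[low, high)` formatted as in the output above, and the helpers `parse_interval` and `direction` are hypothetical, not part of optbinning.

```python
def parse_interval(label):
    """Parse a bin label such as "[986.50, 1160.50)" into (low, high)."""
    low, high = label.strip("[)").split(",")
    return float(low), float(high)


def direction(value, label):
    """Say how a value must move to fall inside the half-open bin."""
    low, high = parse_interval(label)
    if value < low:
        return "increase"
    if value >= high:
        return "decrease"
    return "unchanged"


# Query values for the features that the counterfactual changes.
query_values = {"Population": 322.0, "AveOccup": 2.555556, "Longitude": -122.23}

# Suggested bins, copied from the displayed counterfactual.
suggested_bins = {
    "Population": "[986.50, 1160.50)",
    "AveOccup": "[-inf, 1.95)",
    "Longitude": "[-122.16, -118.33)",
}

moves = {
    name: direction(query_values[name], label)
    for name, label in suggested_bins.items()
}
print(moves)
```

For Longitude, "increase" means a less negative value, that is, a block located further east.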


Now, let's generate three counterfactuals aiming to limit the house value to at most 4.0, with diversity enforced on the features changed.

In [12]:
cf.generate(query=query, y=4.0, outcome_type="continuous", n_cf=3,
            max_changes=3,
            hard_constraints=["diversity_features", "max_outcome"],
            time_limit=30
           ).display(show_only_changes=True, show_outcome=True)

2021-05-23 09:59:01,250 | INFO : Counterfactual generation started.
2021-05-23 09:59:01,252 | INFO : Options: check parameters.
2021-05-23 09:59:01,258 | INFO : Options: check objectives and constraints.
2021-05-23 09:59:01,260 | INFO : Optimizer started.
2021-05-23 09:59:01,260 | INFO : Optimizer: build model...
2021-05-23 09:59:01,476 | INFO : Optimizer: solve...
2021-05-23 09:59:22,581 | INFO : Optimizer terminated. Time: 21.3209s
2021-05-23 09:59:22,583 | INFO : Post-processing started.
2021-05-23 09:59:22,614 | INFO : Post-processing terminated. Time: 0.0294s
2021-05-23 09:59:22,615 | INFO : Counterfactual generation terminated. Status: OPTIMAL. Time: 21.3649s


Unnamed: 0,MedInc,HouseAge,AveRooms,AveBedrms,Population,AveOccup,Latitude,Longitude,outcome
0,-,-,-,-,"[986.50, 1160.50)","[3.11, 3.24)",-,"[-122.16, -118.33)",3.968866
0,"[5.79, 6.82)",-,-,"[-inf, 0.95)",-,-,-,"[-122.16, -118.33)",3.227336
0,"[5.79, 6.82)",-,-,-,"[986.50, 1160.50)",-,-,"[-122.16, -118.33)",3.158199
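The reported outcomes can be checked against the requested bounds with a few lines of plain Python. This sketch assumes, based on how the tutorial uses them, that `"min_outcome"` requires an outcome of at least `y` and `"max_outcome"` an outcome of at most `y`. The helper `satisfies` is hypothetical and is not the library's implementation.

```python
def satisfies(outcome, y, constraint):
    """Check an outcome against the assumed meaning of a hard constraint."""
    if constraint == "min_outcome":
        return outcome >= y
    if constraint == "max_outcome":
        return outcome <= y
    raise ValueError(f"unsupported constraint: {constraint}")


# Outcomes copied from the displayed counterfactuals.
single_cf_outcome = 4.520902                      # y=4.5, "min_outcome"
capped_outcomes = [3.968866, 3.227336, 3.158199]  # y=4.0, "max_outcome"

min_ok = satisfies(single_cf_outcome, 4.5, "min_outcome")
max_ok = all(satisfies(o, 4.0, "max_outcome") for o in capped_outcomes)

# The original prediction for the query violates the cap, which is why
# the counterfactuals have to change some features.
query_ok = satisfies(4.38634097, 4.0, "max_outcome")

print(min_ok, max_ok, query_ok)
```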


Next, we repeat the generation with a lower target of at most 3.0, additionally enforcing diversity on feature values.

In [13]:
cf.generate(query=query, y=3.0, outcome_type="continuous", n_cf=3,
            max_changes=3,
            hard_constraints=["diversity_features", "diversity_values", "max_outcome"],
            time_limit=30
           ).display(show_only_changes=True, show_outcome=True)

2021-05-23 09:59:22,658 | INFO : Counterfactual generation started.
2021-05-23 09:59:22,660 | INFO : Options: check parameters.
2021-05-23 09:59:22,666 | INFO : Options: check objectives and constraints.
2021-05-23 09:59:22,667 | INFO : Optimizer started.
2021-05-23 09:59:22,669 | INFO : Optimizer: build model...
2021-05-23 09:59:22,818 | INFO : Optimizer: solve...
2021-05-23 09:59:46,222 | INFO : Optimizer terminated. Time: 23.5526s
2021-05-23 09:59:46,223 | INFO : Post-processing started.
2021-05-23 09:59:46,255 | INFO : Post-processing terminated. Time: 0.0302s
2021-05-23 09:59:46,255 | INFO : Counterfactual generation terminated. Status: OPTIMAL. Time: 23.5970s


Unnamed: 0,MedInc,HouseAge,AveRooms,AveBedrms,Population,AveOccup,Latitude,Longitude,outcome
0,"[5.79, 6.82)",-,-,-,-,"[3.11, 3.24)",-,"[-118.16, inf)",2.981844
0,"[4.53, 5.04)",-,"[6.12, 6.37)",-,-,-,-,"[-118.33, -118.26)",2.486897
0,"[5.04, 5.79)",-,-,-,"[986.50, 1160.50)",-,-,"[-122.16, -118.33)",2.646881
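The effect of the two diversity constraints can be inspected on the two result tables. The sketch below uses one plausible reading, not the library's formal definition: feature diversity means no two counterfactuals change the same set of features, and value diversity means no bin is reused for the same feature. The helper `repeated_values` is hypothetical.

```python
from collections import Counter


def repeated_values(counterfactuals):
    """Return {feature: bins used by more than one counterfactual}."""
    repeated = {}
    features = {name for cf in counterfactuals for name in cf}
    for name in sorted(features):
        counts = Counter(cf[name] for cf in counterfactuals if name in cf)
        dupes = sorted(label for label, n in counts.items() if n > 1)
        if dupes:
            repeated[name] = dupes
    return repeated


# Changed features only, copied from the run with "diversity_features".
features_only = [
    {"Population": "[986.50, 1160.50)", "AveOccup": "[3.11, 3.24)",
     "Longitude": "[-122.16, -118.33)"},
    {"MedInc": "[5.79, 6.82)", "AveBedrms": "[-inf, 0.95)",
     "Longitude": "[-122.16, -118.33)"},
    {"MedInc": "[5.79, 6.82)", "Population": "[986.50, 1160.50)",
     "Longitude": "[-122.16, -118.33)"},
]

# Copied from the run that also passes "diversity_values".
features_and_values = [
    {"MedInc": "[5.79, 6.82)", "AveOccup": "[3.11, 3.24)",
     "Longitude": "[-118.16, inf)"},
    {"MedInc": "[4.53, 5.04)", "AveRooms": "[6.12, 6.37)",
     "Longitude": "[-118.33, -118.26)"},
    {"MedInc": "[5.04, 5.79)", "Population": "[986.50, 1160.50)",
     "Longitude": "[-122.16, -118.33)"},
]

for name, cfs in [("features only", features_only),
                  ("features and values", features_and_values)]:
    distinct_sets = len({frozenset(cf) for cf in cfs}) == len(cfs)
    print(name, "| distinct feature sets:", distinct_sets,
          "| repeated bins:", repeated_values(cfs))
```

Under this reading, both runs change a different set of features in each counterfactual, but only the second run avoids reusing a bin such as the Longitude interval `[-122.16, -118.33)`.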
