Discriminatory Power (also Predictive Power, Scorecard Strength) is the ability to discriminate ex ante between defaulting and non-defaulting borrowers. The discriminatory power of any classification procedure (e.g. a Credit Scorecard) can be assessed using a number of statistical measures of discrimination.

The following measures have been suggested in the literature or are popular in the financial industry:

- Cumulative Accuracy Profile (CAP) and its summary index, the Accuracy Ratio (AR),
- Brier Score,
- Receiver Operating Characteristic (ROC) and its summary indices, the ROC measure and the Pietra coefficient,
- Bayesian Error Rate,
- Conditional Entropy, Kullback-Leibler Distance, and Conditional Information Entropy Ratio (CIER),
- Information Value (Divergence Statistic, Portfolio Stability Index),
- Kendall’s tau and Somers’ D (for shadow ratings),
- Gains Chart, and
- Lift Curve.

Why the Brier score?
It is the most applicable measure in our case: the model outputs a final PD between 0 and 1, and the actual outcome is binary (0 or 1), so the two can be compared directly.

The Brier score measures the mean squared difference between the predicted probability and the actual outcome. The smaller the score, the better, which is why the scikit-learn function is named brier_score_loss. The score always lies between zero and one, because the predicted probability lies between zero and one and the actual outcome is either 0 or 1, so each squared difference is at most one. It can be decomposed as the sum of refinement loss and calibration loss.
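
The definition and the decomposition mentioned above can be checked on a small example. The sketch below is illustrative only: the function names and the toy data are assumptions, not part of the model code. Observations are grouped by distinct predicted probability, in which case calibration loss plus refinement loss reproduces the Brier score exactly; with continuous PDs one would normally bin the predictions, and the identity then holds only approximately.

```python
import numpy as np

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    return float(np.mean((y_prob - y_true) ** 2))

def brier_decomposition(y_true, y_prob):
    """Return (calibration_loss, refinement_loss), grouping by distinct prediction."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    n = len(y_true)
    calibration = 0.0
    refinement = 0.0
    for p in np.unique(y_prob):
        mask = y_prob == p
        weight = mask.sum() / n              # share of observations in this group
        observed_rate = y_true[mask].mean()  # realised default rate in this group
        calibration += weight * (p - observed_rate) ** 2
        refinement += weight * observed_rate * (1.0 - observed_rate)
    return float(calibration), float(refinement)

# Toy data, made up for illustration (not taken from the portfolio)
y_true = [0, 0, 1, 0, 1, 1, 0, 0]
y_prob = [0.1, 0.1, 0.1, 0.1, 0.6, 0.6, 0.6, 0.6]

bs = brier_score(y_true, y_prob)
cal, ref = brier_decomposition(y_true, y_prob)
print(f"Brier={bs:.5f}  calibration={cal:.5f}  refinement={ref:.5f}  sum={cal + ref:.5f}")
```

Calibration loss is zero when each predicted probability equals the default rate observed among the borrowers who received it; refinement loss depends only on how well the predictions separate high-risk from low-risk groups.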

The Brier score is appropriate for binary and categorical outcomes that can be structured as true or false.

In [42]:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

non_pension_dev = pd.read_pickle(r'file1')
pension_dev = pd.read_pickle(r'file2')
street_dev = pd.read_pickle(r'file3')

non_pension_val = pd.read_pickle(r'file4')
pension_val = pd.read_pickle(r'file5')
street_val = pd.read_pickle(r'file6')

non_pension_oot = pd.read_pickle(r'file7')
pension_oot = pd.read_pickle(r'file8')
street_oot = pd.read_pickle(r'file9')

In [43]:
non_pension_dev = non_pension_dev[['CONTRACT_REF_NO_raw','bad_flag','final_pd']]
pension_dev = pension_dev[['CONTRACT_REF_NO_raw','bad_flag','final_pd']]
street_dev = street_dev[['CONTRACT_REF_NO_raw','bad_flag','final_pd']]

non_pension_val = non_pension_val[['CONTRACT_REF_NO_raw','bad_flag','final_pd']]
pension_val = pension_val[['CONTRACT_REF_NO_raw','bad_flag','final_pd']]
street_val = street_val[['CONTRACT_REF_NO_raw','bad_flag','final_pd']]

non_pension_oot = non_pension_oot[['CONTRACT_REF_NO_raw','bad_flag','final_pd']]
pension_oot = pension_oot[['CONTRACT_REF_NO_raw','bad_flag','final_pd']]
street_oot = street_oot[['CONTRACT_REF_NO_raw','bad_flag','final_pd']]

In [13]:
from sklearn.metrics import brier_score_loss

In [44]:
# Pull outcomes (bad_flag) and predicted PDs (final_pd) out as NumPy arrays
y_true_non_pension_dev = non_pension_dev['bad_flag'].to_numpy()
y_prob_non_pension_dev = non_pension_dev['final_pd'].to_numpy()
y_true_pension_dev = pension_dev['bad_flag'].to_numpy()
y_prob_pension_dev = pension_dev['final_pd'].to_numpy()
y_true_street_dev = street_dev['bad_flag'].to_numpy()
y_prob_street_dev = street_dev['final_pd'].to_numpy()

y_true_non_pension_val = non_pension_val['bad_flag'].to_numpy()
y_prob_non_pension_val = non_pension_val['final_pd'].to_numpy()
y_true_pension_val = pension_val['bad_flag'].to_numpy()
y_prob_pension_val = pension_val['final_pd'].to_numpy()
y_true_street_val = street_val['bad_flag'].to_numpy()
y_prob_street_val = street_val['final_pd'].to_numpy()

y_true_non_pension_oot = non_pension_oot['bad_flag'].to_numpy()
y_prob_non_pension_oot = non_pension_oot['final_pd'].to_numpy()
y_true_pension_oot = pension_oot['bad_flag'].to_numpy()
y_prob_pension_oot = pension_oot['final_pd'].to_numpy()
y_true_street_oot = street_oot['bad_flag'].to_numpy()
y_prob_street_oot = street_oot['final_pd'].to_numpy()

In [45]:
print("-----DEV BASE-----")
print("Brier score for non pension: " + str(round(brier_score_loss(y_true_non_pension_dev, y_prob_non_pension_dev),4)))
print("Mean PD: " + str(non_pension_dev['final_pd'].mean()))
print("DR: " + str(non_pension_dev['bad_flag'].value_counts()[1]/(non_pension_dev['bad_flag'].value_counts()[0] + non_pension_dev['bad_flag'].value_counts()[1])))

print("Brier score for pension: " + str(round(brier_score_loss(y_true_pension_dev, y_prob_pension_dev),4)))
print("Mean PD: " + str(pension_dev['final_pd'].mean()))
print("DR: " + str(pension_dev['bad_flag'].value_counts()[1]/(pension_dev['bad_flag'].value_counts()[0] + pension_dev['bad_flag'].value_counts()[1])))

print("Brier score for street: " + str(round(brier_score_loss(y_true_street_dev, y_prob_street_dev),4)))
print("Mean PD: " + str(street_dev['final_pd'].mean()))
print("DR: " + str(street_dev['bad_flag'].value_counts()[1]/(street_dev['bad_flag'].value_counts()[0] + street_dev['bad_flag'].value_counts()[1])))

-----DEV BASE-----
Brier score for non pension: 0.0333
Mean PD: 0.038818775734851665
DR: 0.038818719873085605
Brier score for pension: 0.0411
Mean PD: 0.04538977211966523
DR: 0.045389699159218366
Brier score for street: 0.0382
Mean PD: 0.049490826435607184
DR: 0.04949092928544983
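
The three figures printed per segment above (Brier score, mean PD, DR) could also be produced by one small helper and a loop. The following is only a sketch: `brier_summary` and the toy segments are hypothetical names and made-up data, and it assumes the outcome contains only 0 and 1 with no missing values, so that the default rate is simply the mean of the outcome.

```python
import numpy as np

def brier_summary(y_true, y_prob):
    """Return Brier score, mean predicted PD and observed default rate."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    return {
        "brier": float(np.mean((y_prob - y_true) ** 2)),
        "mean_pd": float(y_prob.mean()),
        "dr": float(y_true.mean()),  # share of 1s, valid for a 0/1 outcome
    }

# Hypothetical toy segments standing in for the real (bad_flag, final_pd) arrays
segments = {
    "non pension": ([0, 0, 0, 1], [0.1, 0.2, 0.1, 0.6]),
    "pension": ([0, 1, 0, 1], [0.2, 0.7, 0.3, 0.8]),
}

for name, (y_true, y_prob) in segments.items():
    s = brier_summary(y_true, y_prob)
    print(f"{name}: Brier={s['brier']:.4f}, mean PD={s['mean_pd']:.4f}, DR={s['dr']:.4f}")
```

With the real data, each entry of `segments` would hold the corresponding `bad_flag` and `final_pd` arrays for one segment and sample.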


In [46]:
print("-----VAL BASE-----")
print("Brier score for non pension: " + str(round(brier_score_loss(y_true_non_pension_val, y_prob_non_pension_val),4)))
print("Mean PD: " + str(non_pension_val['final_pd'].mean()))
print("DR: " + str(non_pension_val['bad_flag'].value_counts()[1]/(non_pension_val['bad_flag'].value_counts()[0] + non_pension_val['bad_flag'].value_counts()[1])))

print("Brier score for pension: " + str(round(brier_score_loss(y_true_pension_val, y_prob_pension_val),4)))
print("Mean PD: " + str(pension_val['final_pd'].mean()))
print("DR: " + str(pension_val['bad_flag'].value_counts()[1]/(pension_val['bad_flag'].value_counts()[0] + pension_val['bad_flag'].value_counts()[1])))

print("Brier score for street: " + str(round(brier_score_loss(y_true_street_val, y_prob_street_val),4)))
print("Mean PD: " + str(street_val['final_pd'].mean()))
print("DR: " + str(street_val['bad_flag'].value_counts()[1]/(street_val['bad_flag'].value_counts()[0] + street_val['bad_flag'].value_counts()[1])))

-----VAL BASE-----
Brier score for non pension: 0.0334
Mean PD: 0.03893636278344383
DR: 0.039135064949856774
Brier score for pension: 0.0418
Mean PD: 0.0456195714710635
DR: 0.04615106638776347
Brier score for street: 0.0393
Mean PD: 0.049371014673996114
DR: 0.05116188666205943


In [48]:
print("-----OOT BASE-----")
print("Brier score for non pension: " + str(round(brier_score_loss(y_true_non_pension_oot, y_prob_non_pension_oot),4)))
print("Mean PD: " + str(non_pension_oot['final_pd'].mean()))
print("DR: " + str(non_pension_oot['bad_flag'].value_counts()[1]/(non_pension_oot['bad_flag'].value_counts()[0] + non_pension_oot['bad_flag'].value_counts()[1])))

print("Brier score for pension: " + str(round(brier_score_loss(y_true_pension_oot, y_prob_pension_oot),4)))
print("Mean PD: " + str(pension_oot['final_pd'].mean()))
print("DR: " + str(pension_oot['bad_flag'].value_counts()[1]/(pension_oot['bad_flag'].value_counts()[0] + pension_oot['bad_flag'].value_counts()[1])))

print("Brier score for street: " + str(round(brier_score_loss(y_true_street_oot, y_prob_street_oot),4)))
print("Mean PD: " + str(street_oot['final_pd'].mean()))
print("DR: " + str(street_oot['bad_flag'].value_counts()[1]/(street_oot['bad_flag'].value_counts()[0] + street_oot['bad_flag'].value_counts()[1])))

-----OOT BASE-----
Brier score for non pension: 0.0291
Mean PD: 0.042048300180840326
DR: 0.03286369070378634
Brier score for pension: 0.0601
Mean PD: 0.06738202302833247
DR: 0.0842733255427763
Brier score for street: 0.0353
Mean PD: 0.041917468610657094
DR: 0.04150068883027008
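
On the OOT sample the mean PD and the observed default rate differ more than on DEV and VAL. A quick way to quantify this is the relative gap between the two, sketched below with the values copied from the OOT output above. The dictionary layout and variable names are illustrative assumptions.

```python
# (mean PD, observed default rate) copied from the OOT output above
oot = {
    "non pension": (0.042048300180840326, 0.03286369070378634),
    "pension": (0.06738202302833247, 0.0842733255427763),
    "street": (0.041917468610657094, 0.04150068883027008),
}

gaps = {}
for segment, (mean_pd, dr) in oot.items():
    # Positive gap: the model over-predicts defaults; negative: it under-predicts
    gaps[segment] = (mean_pd - dr) / dr
    print(f"{segment}: mean PD {mean_pd:.4f} vs DR {dr:.4f} -> gap {gaps[segment]:+.1%}")
```

A gap near zero indicates that the average PD matches the realised default rate; it says nothing about how well individual borrowers are ranked.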


# Conclusion:

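> The Brier score always lies between 0.0 and 1.0: a model with perfect skill scores 0.0 and the worst possible model scores 1.0.
> The model's Brier scores are low and stable between the development and validation samples (0.0333 to 0.0411 on DEV, 0.0334 to 0.0418 on VAL). On the OOT sample the scores range from 0.0291 to 0.0601; the pension segment scores highest, and there the mean PD (6.7%) falls short of the observed default rate (8.4%).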
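
How low a Brier score needs to be depends on the default rate: a constant forecast equal to the observed default rate DR already achieves DR × (1 − DR). One common way to account for this is the Brier skill score, 1 − Brier / baseline, which is positive when the model beats that constant forecast. The sketch below applies it to the figures reported above. The function names are illustrative, and the Brier scores are the four-decimal rounded values from the outputs, so the results are approximate.

```python
# (Brier score, observed default rate) copied from the outputs above
reported = {
    ("dev", "non pension"): (0.0333, 0.038818719873085605),
    ("dev", "pension"): (0.0411, 0.045389699159218366),
    ("dev", "street"): (0.0382, 0.04949092928544983),
    ("val", "non pension"): (0.0334, 0.039135064949856774),
    ("val", "pension"): (0.0418, 0.04615106638776347),
    ("val", "street"): (0.0393, 0.05116188666205943),
    ("oot", "non pension"): (0.0291, 0.03286369070378634),
    ("oot", "pension"): (0.0601, 0.0842733255427763),
    ("oot", "street"): (0.0353, 0.04150068883027008),
}

def baseline_brier(dr):
    """Brier score of a constant forecast equal to the default rate."""
    return dr * (1.0 - dr)

def brier_skill_score(brier, dr):
    """1 - Brier / baseline: above 0 means better than the constant forecast."""
    return 1.0 - brier / baseline_brier(dr)

skill = {}
for (sample, segment), (brier, dr) in reported.items():
    skill[(sample, segment)] = brier_skill_score(brier, dr)
    print(f"{sample:>3} {segment:<12} Brier={brier:.4f} "
          f"baseline={baseline_brier(dr):.4f} skill={skill[(sample, segment)]:.3f}")
```

Because the baseline rises with the default rate, a segment with a higher raw Brier score is not necessarily the one where the model adds the least over the constant forecast.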