# Credit Scorecard Development Quick Start

_First Version: 2022-12-15  
Last Updated: 2022-12-16  
Author: QH_


## 1. The development process
Like any statistical model development, credit scorecard development follows the phases below, assuming the data sources have been identified and the data has already been collected:

1. __Target Variable__
    * Good/Bad binary target variable determination (TBA)
    
2. __Data Exploration__
    * _Explore the distribution of each feature_
        * Identify and deal with missing values:
            * 1) Exclude the feature if missing percentage is $> 50\%$.
            * 2) Treat missing as a separate category - _Recommended_.   
            From a credit risk perspective, missing information is commonly associated with negative information about the borrower's creditworthiness. E.g. borrowers may not want to provide their current employer's name or length of employment because they are out of the job market and need money to pay bills. 
            * 3) Impute the values using the mean, the median or other statistical methods.
        * Identify and deal with outliers:
            * Outliers may be due to typing errors or fraud. Investigation is often needed before deciding on a treatment (remove or impute).
    * _Initial univariate feature selection_
        * Group all features into risk categories. For example, for a business:
            * Financial strength:
                * Profitability: average profit margin, industry segment, average sales growth, etc.
                * Liquidity: account balance, cash/assets, quick ratio, current ratio, etc.
                * Indebtedness: debt to income ratio, debt service coverage ratio (DSCR)
            * Credit History:
                * Previous credit payment behavior (e.g. number of times 30/60/90 days past due)
                * Business bureau score
            * Business Stability: years in business, management quality, ability to grow market share, average tenure of employees, turnover rate, etc.
            * Geographic/Industry/Market impact: bankruptcy rate for the industry

        * Rank the predictive power of all features and make an initial selection within each risk category
            * For credit scorecard models, we sometimes bin (group) variables into categorical variables (e.g. to account for missing values)
                * Calculate the Weight of Evidence (WOE) measure for each feature.
                * Calculate metrics to rank predictive power: Information Value (IV), Chi-square Test, Gini/AUC.
                * Evaluate if the WOE is logical across all the bins and take business operation into consideration for feature selection.
            * If we do not use binning/grouping, the metrics used to rank predictive power differ between numeric and categorical features, as in other classification problems:
                * Categorical: _chi-square test statistics_ and _mutual information statistics_
                * Numeric: _ANOVA F-test statistics_ and _mutual information statistics_
            
    * _Multivariate variable selection_
        * Remove features that are less predictive and show multicollinearity with other features.

3. __Modeling__
    * For each selected feature, we use the WOE value of each bin to represent the feature values that fall in that bin.
    * Most of the time we use logistic regression for its simplicity, its interpretability and, in many cases, its good classification power.
    * During the modeling process we can apply feature selection as well, e.g. forward selection or backward selection.
        * To avoid eliminating entire risk categories, we can run the selection within each risk category.

4. __Scaling__
    * Once the classification model is fitted, we scale its output into a score that reflects each borrower's creditworthiness. After specifying 1) the good-to-bad odds at a chosen reference score and 2) the _points to double the odds (pdo)_, we can use the following relationships to obtain credit scores ($\log$ denotes the natural logarithm):
    $$ \text{Score} = \text{Offset} + \text{Factor} \times \log (\text{odds})$$
    $$\text{Score} + pdo = \text{Offset} + \text{Factor} \times \log (2 \cdot \text{odds})$$
    $$\rightarrow \text{Factor} = pdo / \log(2), \text{Offset} = \text{Score} - (\text{Factor} \times \log(\text{odds}))$$
    * Example: If we want good-to-bad odds of 50:1 at 600 points and want the odds to double every 20 points, then
    $$ \text{Factor} = 20 / \log(2) = 28.85,  \text{Offset} = 600 - (28.85 \times \log(50)) = 487.12$$
    * Once _Factor_ and _Offset_ have been determined, we can compute the credit score for each borrower from the fitted model:
    $$\begin{aligned}
      \text{Score} & = \text{Offset} + \text{Factor} \times \log\bigg(\frac{1-\hat{p}}{\hat{p}}\bigg) \\
                   &= \text{Offset} + \text{Factor} \times \bigg(- \big(\sum_{i=1} ^{M}\beta_{i} \sum_{j=1}^{K_i} I_{ij}\times woe_{ij} + \alpha \big) \bigg) \\
                   &= \sum_{i=1} ^{M} \bigg(- \text{Factor} \times \big( \beta_{i} \cdot \sum_{j=1}^{K_i} I_{ij}\times woe_{ij} + \frac{\alpha}{M}\big)  + \frac{\text{Offset}}{M}\bigg) \\
                   &= \sum_{i=1} ^{M} \bigg(- \big(\text{Factor} \times \beta_{i}\big) \cdot \sum_{j=1}^{K_i} I_{ij}\times woe_{ij} + \frac{\text{Offset} - \text{Factor} \times \alpha}{M}\bigg)
      \end{aligned}
    $$
    where $\hat{p}$ is the predicted probability of being bad, $\alpha$ and $\beta_i$ are the intercept and coefficients of the fitted logistic regression, $M$ is the total number of features in the model, $K_i$ is the number of groups (bins) for the $i^{th}$ feature, and $I_{ij}$ equals 1 when the value of the $i^{th}$ feature falls in the $j^{th}$ bin and 0 otherwise. Using the last equation, we can calculate the points for each feature and add them together to obtain the final score.
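
The scaling step can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a production implementation: the function names (`scaling_params`, `score_from_p_bad`, `attribute_points`) and the two-feature model (`alpha`, `betas`, `woes`) are hypothetical, and the model is assumed to predict the probability of being bad.

```python
import math

def scaling_params(target_score, target_odds, pdo):
    """Return (factor, offset) so that score = offset + factor * ln(odds)."""
    factor = pdo / math.log(2)
    offset = target_score - factor * math.log(target_odds)
    return factor, offset

def score_from_p_bad(p_bad, factor, offset):
    """Total score from the predicted probability of bad (odds are good:bad)."""
    return offset + factor * math.log((1 - p_bad) / p_bad)

def attribute_points(beta, woe, alpha, n_features, factor, offset):
    """Points for one feature whose value falls in a bin with the given WOE."""
    return -(factor * beta) * woe + (offset - factor * alpha) / n_features

# Example from the text: odds of 50:1 at 600 points, pdo = 20.
factor, offset = scaling_params(target_score=600, target_odds=50, pdo=20)
print(round(factor, 2), round(offset, 2))  # 28.85 487.12

# Hypothetical fitted model with two features (values made up for illustration).
alpha = -2.2                 # intercept
betas = [-0.9, -1.1]         # coefficients on the WOE-encoded features
woes = [0.70, -0.43]         # WOE of the bin each feature value falls in

log_odds_bad = alpha + sum(b * w for b, w in zip(betas, woes))
p_bad = 1 / (1 + math.exp(-log_odds_bad))

total_score = score_from_p_bad(p_bad, factor, offset)
points = [attribute_points(b, w, alpha, len(betas), factor, offset)
          for b, w in zip(betas, woes)]
```

Summing `points` reproduces `total_score`, which is what allows a scorecard to be published as a table of points per bin rather than as a regression equation.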

## 2. Weight of Evidence (WOE) and Information Value (IV)

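Weight of Evidence (WOE) compares how good and bad borrowers are distributed across the groups (bins) of a feature: it is the log of the ratio between the share of good borrowers and the share of bad borrowers that fall in a given group. Equivalently (last line below), it is the log of the good-to-bad odds within the group relative to the overall good-to-bad odds. Mathematically, 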
$$
\begin{aligned}
woe_{ij} &= \log \bigg(\frac{DistrGood_{ij}} {DistrBad_{ij}} \bigg) = \log \bigg(\frac{\Pr(X_i \in j|Good)}{\Pr(X_i \in j|Bad)} \bigg) \\
         &= \log \bigg( \frac{\Pr(X_i \in j, Good) / \Pr(Good)}{\Pr(X_i \in j, Bad) / \Pr(Bad)}  \bigg) \\
         &= \log \bigg( \frac{\Pr(X_i \in j, Good) / \Pr(X_i \in j, Bad)}{\Pr(Good) / \Pr(Bad)}  \bigg)
\end{aligned}
$$
where $woe_{ij}$ is the WOE of the $j^{th}$ group of feature $i$.

Information Value (IV) measures the predictive power of a feature and is defined as follows:
$$iv_{ij} =\big( DistrGood_{ij} - DistrBad_{ij} \big) \cdot woe_{ij}$$
$$\rightarrow iv_{i} = \sum_{j = 1}^{K_i} iv_{ij}$$
where $K_i$ is the number of groups (bins) for the $i^{th}$ feature.


```python
# Example of WOE and IV; data taken from reference [1]
import pandas as pd
import numpy as np

age_grp = ['Missing', '18-22', '23-26', '27-29', '30-35', '35-44', '44+']
age_grp_count = [1000, 4000, 6000, 9000, 10000, 7000, 3000]
good_count = [860, 3040, 4920, 8100, 9500, 6800, 2940]
df = pd.DataFrame({'age_group': age_grp, 'age_group_count': age_grp_count, 'good_count': good_count})
df['bad_count'] = df['age_group_count'] - df['good_count']

def woe_iv_calc(df, event_cnt_var, non_event_cnt_var):
    """Add per-bin distribution, WOE and IV columns and print the feature's total IV."""
    df = df.copy()  # work on a copy so the caller's DataFrame is not modified
    event_total = df[event_cnt_var].sum()
    non_event_total = df[non_event_cnt_var].sum()
    # Share of all events / non-events that fall in each bin
    df['event_distr'] = df[event_cnt_var] / event_total
    df['non_event_distr'] = df[non_event_cnt_var] / non_event_total
    # With event = bad, this is log(DistrGood / DistrBad)
    df['woe'] = np.log(df['non_event_distr'] / df['event_distr'])
    df['iv'] = (df['non_event_distr'] - df['event_distr']) * df['woe']
    print(f"Information Value for the Feature is: {df['iv'].sum()}")
    return df

# Event = bad, non-event = good
woe_iv_calc(df, 'bad_count', 'good_count')
```


Information Value for the Feature is: 0.6680562518213035


| | age_group | age_group_count | good_count | bad_count | event_distr | non_event_distr | woe | iv |
|---|---|---|---|---|---|---|---|---|
| 0 | Missing | 1000 | 860 | 140 | 0.036458 | 0.023783 | -0.427191 | 0.005415 |
| 1 | 18-22 | 4000 | 3040 | 960 | 0.25 | 0.084071 | -1.089802 | 0.18083 |
| 2 | 23-26 | 6000 | 4920 | 1080 | 0.28125 | 0.136062 | -0.726134 | 0.105426 |
| 3 | 27-29 | 9000 | 8100 | 900 | 0.234375 | 0.224004 | -0.045257 | 0.000469 |
| 4 | 30-35 | 10000 | 9500 | 500 | 0.130208 | 0.262721 | 0.701958 | 0.093018 |
| 5 | 35-44 | 7000 | 6800 | 200 | 0.052083 | 0.188053 | 1.283879 | 0.174569 |
| 6 | 44+ | 3000 | 2940 | 60 | 0.015625 | 0.081305 | 1.649339 | 0.108329 |
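
As a small follow-up to the output above, the sketch below does two things mentioned in Section 1: it checks whether the WOE trend is monotonic across the non-missing age bins (one common reading of a "logical" WOE pattern, assumed here), and it replaces each borrower's bin label with the bin's WOE, as the modeling step requires. The WOE values are copied from the output above; `is_monotonic`, `woe_encode` and the sample borrowers are hypothetical names and data used only for illustration.

```python
# WOE per age bin, copied from the output above.
woe_by_bin = {
    'Missing': -0.427191, '18-22': -1.089802, '23-26': -0.726134,
    '27-29': -0.045257, '30-35': 0.701958, '35-44': 1.283879, '44+': 1.649339,
}

def is_monotonic(values):
    """True if the sequence is entirely non-decreasing or entirely non-increasing."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)

def woe_encode(bin_labels, woe_map):
    """Replace each bin label with the WOE value of that bin."""
    return [woe_map[label] for label in bin_labels]

# The 'Missing' bin has no natural position in the ordering, so it is left out.
ordered_bins = ['18-22', '23-26', '27-29', '30-35', '35-44', '44+']
trend = [woe_by_bin[b] for b in ordered_bins]
print(is_monotonic(trend))  # True

# Hypothetical borrowers, already assigned to age bins.
sample_bins = ['23-26', 'Missing', '44+']
encoded = woe_encode(sample_bins, woe_by_bin)
print(encoded)  # [-0.726134, -0.427191, 1.649339]
```

Here the WOE rises steadily with age, so older age groups carry a higher share of goods relative to bads, which matches the bad rates in the table.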


Rule of thumb for judging a feature's predictive power from its information value:
* $IV < 0.02$: generally unpredictive
* $0.02 \leq IV < 0.1$: weak
* $0.1 \leq IV < 0.3$: medium
* $IV \geq 0.3$: strong
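
The rule of thumb can be written as a small lookup. This is a minimal sketch: the function name `iv_strength` is hypothetical, and values that sit exactly on a boundary (0.02, 0.1, 0.3) are assigned to the higher band, which is an assumption.

```python
def iv_strength(iv):
    """Label a feature's predictive power from its information value."""
    if iv < 0.02:
        return "unpredictive"
    if iv < 0.1:
        return "weak"
    if iv < 0.3:
        return "medium"
    return "strong"

# IV of the age feature from the example above.
print(iv_strength(0.6680562518213035))  # strong
```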

## Reference
1. Siddiqi, N. (2017). _Intelligent Credit Scoring: Building and Implementing Better Credit Risk Scorecards_, Second Edition. Wiley.