# Expected Loss From Accepting Defaulters vs Rejecting Non-Defaulters

Throughout this project we have repeatedly weighed two kinds of error against each other: granting a loan to a borrower who will default, and so risking a significant proportion of the principal, versus rejecting a borrower who would have paid in full, and so forgoing their interest. Here, we analytically quantify the expected loss from accepting a defaulting borrower vs rejecting a full-paying borrower.


In [None]:
import pandas as pd
import numpy as np
from main import specialty_preprocess_df
from collections import defaultdict as dd


my_df = pd.read_csv(
    "./datasets/lc_data_2007_to_2018.csv",
    low_memory=False,
    encoding="latin1",
    nrows=100000,  # only looking at 100k rows right now for performance
)
pd.set_option("display.max_columns", None)
cleaned_df = specialty_preprocess_df(my_df)

## Explanation of the Expected Loss Calculation

The expected loss from approving a defaulter can be calculated as follows. Say there are $n$ borrowers who will default and are approved for a loan. Between them, these $n$ borrowers borrow a total of $x$ dollars and pay back a total of $y$ dollars. Then the loss given default (LGD) for a single borrower is, on average, $\frac{x-y}{n}$. (We don't include the interest we would have received had they repaid in full, because a defaulter was never going to repay in full. The only two options are to reject, for a loss of \$0, or to approve, for a loss of \$ $\frac{x-y}{n}$.)
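As a minimal sketch of the $\frac{x-y}{n}$ calculation above, the function below takes the amounts lent to and repaid by approved defaulters and returns the average loss per defaulter. The function name and the figures are illustrative assumptions, not values from the dataset.

```python
def expected_loss_per_defaulter(amounts_lent, amounts_repaid):
    """Average principal lost per approved defaulter: (x - y) / n."""
    n = len(amounts_lent)
    if n == 0 or n != len(amounts_repaid):
        raise ValueError("need two non-empty sequences of equal length")
    x = sum(amounts_lent)    # total borrowed by the n defaulters
    y = sum(amounts_repaid)  # total they paid back before defaulting
    return (x - y) / n


# Illustrative figures only (hypothetical, not taken from the dataset).
lent = [10_000.0, 20_000.0, 6_000.0]
repaid = [4_000.0, 11_000.0, 3_000.0]
loss_per_defaulter = expected_loss_per_defaulter(lent, repaid)
print(loss_per_defaulter)  # 6000.0
```

Because the function only relies on `len` and `sum`, it should also accept pandas Series, for example the defaulters' rows of `cleaned_df`, assuming the cleaned frame keeps columns for the amount funded and the total payment received.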

The expected loss from rejecting a fully-paying borrower is calculated as follows. The dataset contains only approved borrowers, so we run into the classic credit problem of reject inference. Because there are no rejected borrowers whose behaviour we could infer, we are forced to make the naive simplification that rejected full-paying borrowers (RFPBs) follow the same distribution as approved full-paying borrowers (AFPBs). (This is a simplification because RFPBs would, on average, have been charged higher interest rates than AFPBs: the fact that they were rejected tells us they are riskier on average.) In other words, we treat the AFPBs as RFPBs whose only difference is that they happened to be approved. We can then calculate the expected loss from rejecting a full-paying borrower as the average amount we would have lost by rejecting an AFPB.

This is found like so: say there are $m$ AFPBs. Across all AFPBs, the sum of the interest they paid on top of their principal is $z$ dollars. Then \$ $\frac{z}{m}$ is the average missed interest per borrower. The true loss from rejecting an AFPB must also account for the fact that when a bank rejects an AFPB, it keeps the principal and can invest it in risk-free treasury bonds, earning a non-zero return. So the true cost of rejecting an AFPB is \$ $\frac{z- p \cdot r}{m}$, where $p$ is the sum of the principals borrowed and $r$ is the risk-free interest rate.
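The $\frac{z - p \cdot r}{m}$ formula above can be sketched in the same way. This is a rough illustration under one stated assumption: `risk_free_return` is treated as the total risk-free return earned on the principal over the period the loans would have run, so that it is comparable with the total interest $z$. The function name, the 5% figure, and the loan amounts are all hypothetical.

```python
def cost_of_rejecting_full_payer(principals, interest_paid, risk_free_return):
    """Average net interest forgone per rejected full payer: (z - p * r) / m."""
    m = len(principals)
    if m == 0 or m != len(interest_paid):
        raise ValueError("need two non-empty sequences of equal length")
    z = sum(interest_paid)  # total interest the m borrowers paid
    p = sum(principals)     # total principal they borrowed
    # Assumption: risk_free_return is a total return over the loan period,
    # not an annual rate, so p * risk_free_return is in the same units as z.
    return (z - p * risk_free_return) / m


# Illustrative figures only (hypothetical, not taken from the dataset).
principals = [5_000.0, 8_000.0, 12_000.0]
interest = [600.0, 1_200.0, 1_950.0]
cost_per_full_payer = cost_of_rejecting_full_payer(
    principals, interest, risk_free_return=0.05
)
print(round(cost_per_full_payer, 2))
```

With `risk_free_return=0.0` the function reduces to the plain missed interest $\frac{z}{m}$, which makes it easy to see how much of the cost the risk-free alternative offsets.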

### Selection Bias in the Analysis

Because all borrowers in the dataset were approved, we must accept that the insights we draw will contain selection bias. The defaulters in the dataset were at least low-risk enough to be approved, so their interest rates are lower than those of the defaulter population as a whole. A defaulter picked at random from all applicants would therefore be riskier, and would be charged a higher interest rate, than the defaulters observed in this dataset suggest. However, the estimate of the expected loss from approving a defaulter does not take the interest rate into account, since we assume that these borrowers would never repay the loan in full, so this bias does not affect that calculation.

As discussed in the calculation of the expected loss from RFPBs, there are no true RFPBs in the dataset, so we cannot perform the more intuitive averaging process to find the expected loss. Instead we are forced to approximate RFPBs with AFPBs. The difference between the two groups is that RFPBs would likely have a higher average interest rate, since rejected borrowers are considered riskier than AFPBs. Using the interest rate of AFPBs for RFPBs is therefore a flawed approximation, and a calculation based on it would systematically underestimate the true expected loss from rejecting RFPBs.
