Used in IGANN Paper (Case Study 2)

See IGANN Appendix i:
"The dataset is taken from the FICO Explainable Machine Learning Challenge. It contains 10,459 samples with 21 continuous features, two categorical features, and a binary target variable stating whether or not an individual defaulted on the loan."

The appendix also describes the preprocessing: only 10 features plus the target are kept.
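
The column selection can be sketched as below. This assumes the ten retained features are the predictor columns that appear in the preprocessed table later in these notes, and that the raw frame uses the same column names. The function name `keep_selected_features` and the toy frame are illustrative and are not taken from the IGANN code.

```python
import pandas as pd

# Assumed: the ten predictors kept after preprocessing, copied from the
# column header of heloc_preprocessed.csv shown in these notes.
KEPT_FEATURES = [
    "ExternalRiskEstimate",
    "MSinceOldestTradeOpen",
    "AverageMInFile",
    "NumSatisfactoryTrades",
    "PercentTradesNeverDelq",
    "MSinceMostRecentDelq",
    "PercentInstallTrades",
    "MSinceMostRecentInqexcl7days",
    "NetFractionRevolvingBurden",
    "NumRevolvingTradesWBalance",
]
TARGET = "RiskPerformance"


def keep_selected_features(raw: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `raw` restricted to the kept features plus the target."""
    wanted = KEPT_FEATURES + [TARGET]
    missing = [c for c in wanted if c not in raw.columns]
    if missing:
        raise KeyError(f"raw frame is missing columns: {missing}")
    return raw[wanted].copy()


# Toy frame (made-up values) with one extra column to show that it is dropped.
toy = pd.DataFrame({name: [1.0, 2.0] for name in KEPT_FEATURES})
toy[TARGET] = [0, 1]
toy["NumTotalTrades"] = [5, 7]  # a raw column outside the kept set
reduced = keep_selected_features(toy)
print(reduced.shape)  # (2, 11)
```

Raising on missing columns, rather than silently skipping them, makes a renamed or misspelled feature visible before any model is trained.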

Link: https://www.kaggle.com/datasets/averkiyoliabev/home-equity-line-of-creditheloc

Alternatives: GAM Coach User Study (e.g., German Credit)

Model trained on raw dataset

***GPT Analysis of "MSinceMostRecentDelq" (index = 8)***

The feature **MSinceMostRecentDelq** (Months Since Most Recent Delinquency) reflects how long it has been since the borrower was last delinquent on a payment. In the context of loan approval, a higher value typically indicates more time has passed since the last delinquency, which is generally seen as a positive sign.

Key domain knowledge contradictions in the shape function:

1. **Negative values for recent delinquencies**: For bins from "(-9.0, -7.5)" to "(2.5, 5.5)", the function returns negative contributions. Near zero this is expected, as recent delinquencies are risky. However, the contributions keep getting **worse** below zero months and reach their minimum at "(-9.0, -7.5)", a bin that cannot describe a real value: a delinquency cannot lie 9 months in the future.

2. **Improving outcomes with very high values**: The contributions for MSinceMostRecentDelq increase significantly after 20 months (e.g., ranges like "(30.5, 31.5)" or higher). While it makes sense that outcomes improve with more time since the last delinquency, the large positive values beyond 60 months seem unrealistic. The shape function suggests **extreme optimism** for borrowers who have not had a delinquency for several years, even though such borrowers might still have other risk factors.

3. **Inconsistent pattern near 70 months**: Around 70 months, the contribution suddenly **drops** (e.g., between "(65.5, 66.5)" and "(73.5, 74.5)"). This contradicts the expectation that the likelihood of loan repayment should consistently improve as time since the last delinquency increases. The drop could indicate a flaw in the data or the model.

In summary, the model suggests extreme penalties for very recent delinquencies (including impossible values) and overly optimistic predictions for very old delinquencies. Additionally, the drop near 70 months is unexpected.
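
The checks above can be made mechanical. The sketch below assumes the shape function is available as a list of `(lower, upper, contribution)` bins sorted by `lower`. That layout, the function name `audit_shape_function`, and the toy contributions are hypothetical and are not read from the trained model. Any bin starting below zero is treated as outside the valid domain of a months-since feature. In the raw FICO data, negative entries (-9, -8, -7) are special codes rather than month counts, which would explain a bin like "(-9.0, -7.5)" for a model trained on the raw dataset.

```python
from typing import List, Tuple

Bin = Tuple[float, float, float]  # (lower, upper, contribution)


def audit_shape_function(bins: List[Bin], min_valid: float = 0.0) -> dict:
    """Flag out-of-domain bins and drops in a shape expected to be increasing."""
    # Bins that start below the smallest meaningful feature value.
    out_of_domain = [(lo, hi) for lo, hi, _ in bins if lo < min_valid]
    # Among the remaining bins, report each step where the contribution falls.
    valid = [b for b in bins if b[0] >= min_valid]
    drops = [
        (prev[:2], cur[:2])
        for prev, cur in zip(valid, valid[1:])
        if cur[2] < prev[2]
    ]
    return {"out_of_domain": out_of_domain, "drops": drops}


# Toy bins: the edges are quoted in the analysis above, the contributions are made up.
toy_bins = [
    (-9.0, -7.5, -0.9),
    (2.5, 5.5, -0.3),
    (30.5, 31.5, 0.4),
    (65.5, 66.5, 0.8),
    (73.5, 74.5, 0.5),
]
report = audit_shape_function(toy_bins)
print(report["out_of_domain"])  # [(-9.0, -7.5)]
print(report["drops"])          # [((65.5, 66.5), (73.5, 74.5))]
```

The second point of the analysis (overly large positive contributions) is a judgment about magnitude and is deliberately left out of this check.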

In [2]:
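import igann_helpers  # helper module used in this notebook

# Load the FICO HELOC data and attach the target as an extra column.
dataset = igann_helpers.load_fico_data()
X_df = dataset["full"]["X"]
X_df["RiskPerformance"] = dataset["full"]["y"]

# Uncomment to write the combined frame to disk (without the index column).
# X_df.to_csv("heloc_preprocessed.csv", index=False)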

In [8]:
import pandas as pd

df = pd.read_csv("heloc_preprocessed.csv")
df

Unnamed: 0,ExternalRiskEstimate,MSinceOldestTradeOpen,AverageMInFile,NumSatisfactoryTrades,PercentTradesNeverDelq,MSinceMostRecentDelq,PercentInstallTrades,MSinceMostRecentInqexcl7days,NetFractionRevolvingBurden,NumRevolvingTradesWBalance,RiskPerformance
0,55.0,144.0,84.0,20.0,83.0,2.000000,43.0,0.0,33.0,8.0,0
1,61.0,58.0,41.0,2.0,100.0,21.879547,67.0,0.0,0.0,0.0,0
2,67.0,66.0,24.0,9.0,100.0,21.879547,44.0,0.0,53.0,4.0,0
3,66.0,169.0,73.0,28.0,93.0,76.000000,57.0,0.0,72.0,6.0,0
4,81.0,333.0,132.0,12.0,100.0,21.879547,25.0,0.0,51.0,3.0,0
...,...,...,...,...,...,...,...,...,...,...,...
10454,73.0,131.0,57.0,21.0,95.0,80.000000,19.0,7.0,26.0,5.0,1
10455,65.0,147.0,68.0,11.0,92.0,28.000000,42.0,1.0,86.0,2.0,0
10456,74.0,129.0,64.0,18.0,100.0,21.879547,33.0,3.0,6.0,5.0,0
10457,72.0,234.0,113.0,42.0,96.0,35.000000,20.0,6.0,19.0,4.0,0


In [1]:
import pandas as pd
df2 = pd.read_csv("ds_description.csv")
df2

Unnamed: 0,Column Name,Description,Feature Type,Values,Role
0,ExternalRiskEstimate,Consolidated version of risk markers,Continuous,"[33, 94]",Predictor
1,MSinceOldestTradeOpen,Months since oldest trade open,Continuous,"[2, 803]",Predictor
2,AverageMInFile,Average months in file,Continuous,"[0, 383]",Predictor
3,NumSatisfactoryTrades,Number of satisfactory trades,Continuous,"[0, 79]",Predictor
4,PercentTradesNeverDelq,Percentage of trades never delinquent,Continuous,"[0, 100]",Predictor
5,MSinceMostRecentDelq,Months since most recent delinquency,Continuous,"[0, 83]",Predictor
6,PercentInstallTrades,Percentage of installment trades,Continuous,"[0, 100]",Predictor
7,MSinceMostRecentInqexcl7days,Months since most recent inquiry excl. 7 days,Continuous,"[0, 24]",Predictor
8,NetFractionRevolvingBurden,Net fraction revolving burden (= revolving bal...,Continuous,"[0, 232]",Predictor
9,NumRevolvingTradesWBalance,Number of revolving trades with balance,Continuous,"[0, 32]",Predictor
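
A quick consistency check between the two tables, sketched under the assumption that the `Values` column stores each range as a string such as "[0, 83]" and that `Column Name` matches the column names of the data frame. The helper `out_of_range_counts` and the small toy frames are illustrative only.

```python
import ast

import pandas as pd


def out_of_range_counts(data: pd.DataFrame, description: pd.DataFrame) -> dict:
    """Count, per described column, the rows outside the documented range."""
    counts = {}
    for _, row in description.iterrows():
        name = row["Column Name"]
        if name not in data.columns:
            continue  # described column is absent from the data frame
        low, high = ast.literal_eval(row["Values"])  # "[0, 83]" -> [0, 83]
        # between() is inclusive on both ends; missing values count as outside.
        outside = ~data[name].between(low, high)
        counts[name] = int(outside.sum())
    return counts


# Toy inputs: the ranges are copied from the description table, the data values are made up.
toy_description = pd.DataFrame(
    {
        "Column Name": ["MSinceMostRecentDelq", "ExternalRiskEstimate"],
        "Values": ["[0, 83]", "[33, 94]"],
    }
)
toy_data = pd.DataFrame(
    {
        "MSinceMostRecentDelq": [2.0, -9.0, 76.0],
        "ExternalRiskEstimate": [55.0, 61.0, 67.0],
    }
)
counts = out_of_range_counts(toy_data, toy_description)
print(counts)  # {'MSinceMostRecentDelq': 1, 'ExternalRiskEstimate': 0}
```

With the real files, this would be called as `out_of_range_counts(df, df2)`. A nonzero count for a column would point to values, such as negative special codes, that the description table does not cover.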
