The Python scikit-learn and imbalanced-learn machine learning libraries were used to assess credit card risk based on features such as loan amount and interest rate. The prediction target was 'loan_risk', which could be either 'high-risk' or 'low-risk'. The data was analyzed using six supervised machine learning models: four resampling models (including naive random oversampling, a second oversampling method, and an undersampling method) and two ensemble classifiers (a random forest and an AdaBoosted easy ensemble).
The results of each model were assessed using balanced accuracy, precision, and recall scores. The Analysis section below is divided by candidate model; each subsection contains tables and screenshots of the model assessment metrics (balanced accuracy scores, precision scores, recall scores, confusion matrices, and classification reports). The Summary section summarizes these results and gives the resulting model recommendation. Jupyter Notebooks and data can be found in the Resources section.
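For reference, the setup shared by all six models looked roughly like the sketch below. This is a minimal sketch, not taken from the notebooks: the file name `loan_data.csv` and the `train_test_split` settings are assumptions; only the target column `loan_risk` comes from the description above.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Load the loan data (file name is a placeholder) and separate features from the target
df = pd.read_csv("loan_data.csv")
X = pd.get_dummies(df.drop(columns="loan_risk"))   # encode any categorical features
y = df["loan_risk"]                                # 'high-risk' or 'low-risk'

# Hold out a test set; stratify so the rare high-risk class appears in both splits
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1, stratify=y)
```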
- The model had high precision for low-risk outcomes but low precision for high-risk outcomes.
- The recall scores for both the high-risk and low-risk outcomes were moderately high.
- Overall, the model had a balanced accuracy score of 0.6620.
Outcome | Precision Score | Recall Score |
---|---|---|
high-risk | 0.01 | 0.72 |
low-risk | 1.00 | 0.60 |
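A minimal sketch of this model, using imbalanced-learn's `RandomOverSampler` for the naive random oversampling step. The logistic regression classifier and the `random_state` values are assumptions, not confirmed by the text.

```python
from collections import Counter

from imblearn.metrics import classification_report_imbalanced
from imblearn.over_sampling import RandomOverSampler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score, confusion_matrix

# Duplicate minority-class (high-risk) rows at random until the classes are balanced
ros = RandomOverSampler(random_state=1)
X_resampled, y_resampled = ros.fit_resample(X_train, y_train)
print(Counter(y_resampled))                      # equal class counts after resampling

# Fit a classifier on the resampled training data (logistic regression is an assumption)
model = LogisticRegression(solver="lbfgs", max_iter=200, random_state=1)
model.fit(X_resampled, y_resampled)
y_pred = model.predict(X_test)

# The metrics reported in the tables throughout this section
print(balanced_accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
print(classification_report_imbalanced(y_test, y_pred))
```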
- The model had high precision for low-risk outcomes but low precision for high-risk outcomes.
- The recall scores for both the high-risk and low-risk outcomes were moderately high; the high-risk recall dropped relative to naive random oversampling, while the low-risk recall increased.
- Overall, the model had a balanced accuracy score of 0.6568.
Outcome | Precision Score | Recall Score |
---|---|---|
high-risk | 0.01 | 0.61 |
low-risk | 1.00 | 0.70 |
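The text does not name this second oversampling technique; SMOTE is assumed in the sketch below as the most common alternative in imbalanced-learn. Only the resampling step changes; the fit-and-score pattern mirrors the previous sketch.

```python
from imblearn.over_sampling import SMOTE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score

# Synthesize new minority-class samples instead of duplicating existing ones
smote = SMOTE(random_state=1, sampling_strategy="auto")
X_resampled, y_resampled = smote.fit_resample(X_train, y_train)

# Same classifier and scoring pattern as the oversampling sketch above
model = LogisticRegression(solver="lbfgs", max_iter=200, random_state=1)
model.fit(X_resampled, y_resampled)
print(balanced_accuracy_score(y_test, model.predict(X_test)))
```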
- The model had high precision for low-risk outcomes but low precision for high-risk outcomes.
- The recall scores for both the high-risk and low-risk outcomes were reduced relative to the prior oversampling methods.
- Overall, the model had a balanced accuracy score of 0.6028.
Outcome | Precision Score | Recall Score |
---|---|---|
high-risk | 0.01 | 0.61 |
low-risk | 1.00 | 0.59 |
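The undersampling technique is likewise not named in the text; the sketch below assumes cluster-centroid undersampling (`ClusterCentroids`), though random undersampling would follow the same pattern.

```python
from imblearn.under_sampling import ClusterCentroids
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score

# Shrink the majority (low-risk) class down to the size of the minority class
cc = ClusterCentroids(random_state=1)
X_resampled, y_resampled = cc.fit_resample(X_train, y_train)

model = LogisticRegression(solver="lbfgs", max_iter=200, random_state=1)
model.fit(X_resampled, y_resampled)
print(balanced_accuracy_score(y_test, model.predict(X_test)))
```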
- The model had high precision for low-risk outcomes but low precision for high-risk outcomes.
- The recall scores for both the high-risk and low-risk outcomes were improved relative to the undersampling method, but not relative to the oversampling methods.
- Overall, the model had a balanced accuracy score of 0.6392.
Outcome | Precision Score | Recall Score |
---|---|---|
high-risk | 0.03 | 0.69 |
low-risk | 1.00 | 0.59 |
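This fourth resampling model appears to combine over- and undersampling, since it is compared against both above. SMOTEENN (SMOTE oversampling followed by edited-nearest-neighbours cleaning) is assumed in the sketch below; the text does not name the specific technique.

```python
from imblearn.combine import SMOTEENN
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score

# Oversample with SMOTE, then clean overlapping samples with edited nearest neighbours
sme = SMOTEENN(random_state=1)
X_resampled, y_resampled = sme.fit_resample(X_train, y_train)

model = LogisticRegression(solver="lbfgs", max_iter=200, random_state=1)
model.fit(X_resampled, y_resampled)
print(balanced_accuracy_score(y_test, model.predict(X_test)))
```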
- The model had high precision for low-risk outcomes but low precision for high-risk outcomes.
- The recall scores for both the high-risk and low-risk outcomes were improved relative to the resampling methods.
- Overall, the model had a balanced accuracy score of 0.7888.
Outcome | Precision Score | Recall Score |
---|---|---|
high-risk | 0.03 | 0.70 |
low-risk | 1.00 | 0.87 |
The ten features with the highest importance scores in this model were:

Feature | Importance |
---|---|
total_rec_prncp | 0.0788 |
total_pymnt | 0.0588 |
total_pymnt_inv | 0.0563 |
total_rec_int | 0.0536 |
last_pymnt_amnt | 0.0500 |
int_rate | 0.0297 |
issue_d_Jan-2019 | 0.0211 |
installment | 0.0198 |
dti | 0.0175 |
out_prncp_inv | 0.0169 |
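A sketch of the random forest ensemble model and of how a feature importance table like the one above can be produced. `BalancedRandomForestClassifier` from imbalanced-learn and `n_estimators=100` are assumptions; a plain scikit-learn `RandomForestClassifier` trained on resampled data would be ranked and scored the same way.

```python
from imblearn.ensemble import BalancedRandomForestClassifier
from sklearn.metrics import balanced_accuracy_score

# Each tree is trained on a bootstrap sample that is randomly undersampled,
# so no separate resampling step is needed
brf = BalancedRandomForestClassifier(n_estimators=100, random_state=1)
brf.fit(X_train, y_train)
print(balanced_accuracy_score(y_test, brf.predict(X_test)))

# Rank features by importance (descending), as in the table above
ranked = sorted(zip(brf.feature_importances_, X_train.columns), reverse=True)
for score, name in ranked[:10]:
    print(f"{name}: {score:.4f}")
```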
- The model had high precision for low-risk outcomes but low precision for high-risk outcomes.
- The recall scores for both the high-risk and low-risk outcomes were improved relative to all other models.
- Overall, the model had a balanced accuracy score of 0.9316.
Outcome | Precision Score | Recall Score |
---|---|---|
high-risk | 0.09 | 0.92 |
low-risk | 1.00 | 0.94 |
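A sketch of the AdaBoosted easy ensemble model, assuming imbalanced-learn's `EasyEnsembleClassifier` (bags of AdaBoost learners, each trained on a balanced bootstrap sample); `n_estimators=100` is an assumption.

```python
from imblearn.ensemble import EasyEnsembleClassifier
from imblearn.metrics import classification_report_imbalanced
from sklearn.metrics import balanced_accuracy_score

# Bags of AdaBoost learners, each fit on a randomly undersampled (balanced) subset
eec = EasyEnsembleClassifier(n_estimators=100, random_state=1)
eec.fit(X_train, y_train)
y_pred = eec.predict(X_test)

print(balanced_accuracy_score(y_test, y_pred))
print(classification_report_imbalanced(y_test, y_pred))
```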
- Of the four resampling machine learning models, it is challenging to identify one that clearly outperformed the others. Precision scores were largely comparable across the board, but recall scores varied more: a model with a higher overall recall sometimes had individual high-risk or low-risk recall scores below those of other candidates. Comparing balanced accuracy scores (the unweighted average of per-class recall; a quick numerical check follows this list), the naive random oversampling model performed best in its cohort.
- Of the two ensemble models, the AdaBoosted easy ensemble outperformed the random forest model on every metric. Both ensemble models, however, outperformed the resampling models.
- Across all six machine learning models, the AdaBoosted easy ensemble performed best by far and should be the one selected for use. However, it is important to note that, even though the AdaBoosted model outperformed the other candidates across all metrics, its precision score for high-risk loan candidates was still only 0.09. If the primary purpose of the modeling effort is to identify high-risk applicants, alternative machine learning models should be explored.
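Balanced accuracy is the unweighted mean of the per-class recall scores, which is why it is a useful single number for these imbalanced classes. A quick check against the easy ensemble table above:

```python
# Balanced accuracy = mean of per-class recall.
# Recall scores for the easy ensemble model (high-risk, low-risk) from its table:
recalls = [0.92, 0.94]
print(sum(recalls) / len(recalls))   # 0.93, consistent with the reported 0.9316
```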
- Data
- Notebooks
- Software
- Jupyter Notebook