Use scikit-learn and imbalanced-learn machine learning libraries to assess credit card risk.

inregards2pluto/credit-risk-analysis

Credit Risk Analysis

Overview

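The Python scikit-learn and imbalanced-learn machine learning libraries were used to assess credit card risk based on features such as loan amount and interest rate. The target for the predicted outcome was 'loan_risk', which could be either 'high-risk' or 'low-risk'. Data was analyzed using six different supervised machine learning models:

  • Logistic regression with naive random oversampling
  • Logistic regression with SMOTE oversampling
  • Logistic regression with cluster centroid undersampling
  • Logistic regression with SMOTEENN combination sampling
  • Balanced random forest classification
  • Easy ensemble classification with AdaBoost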

The results of each model were assessed based on metrics that included balanced accuracy, precision, and recall scores. The subsequent Analysis section is divided up by candidate model. Each section contains tables and screenshots of model assessment metrics (i.e. balanced accuracy scores, precision scores, recall scores, confusion matrices, classification reports). The Summary section summarizes these results and the resulting model recommendation. Jupyter Notebooks and data can be found in the Resources section.
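As a point of reference for the metrics named above, the following is a minimal sketch of how they can be computed with scikit-learn. The labels are toy values invented for illustration, not the loan data, and the variable names are hypothetical.

```python
from sklearn.metrics import (
    balanced_accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Toy labels, invented for illustration only (not the loan data).
labels = ["high-risk", "low-risk"]
y_true = ["high-risk"] * 2 + ["low-risk"] * 6
y_pred = [
    "high-risk", "low-risk",              # predictions for the 2 actual high-risk rows
    "high-risk", "high-risk",             # 2 low-risk rows wrongly flagged
    "low-risk", "low-risk", "low-risk", "low-risk",
]

# Rows are actual classes, columns are predicted classes, both in `labels` order.
cm = confusion_matrix(y_true, y_pred, labels=labels)

# average=None returns one score per class, in `labels` order.
precision = precision_score(y_true, y_pred, labels=labels, average=None)
recall = recall_score(y_true, y_pred, labels=labels, average=None)

# Balanced accuracy is the unweighted mean of the per-class recall scores.
bal_acc = balanced_accuracy_score(y_true, y_pred)

print(cm)
print("precision:", precision, "recall:", recall, "balanced accuracy:", bal_acc)
```

Because balanced accuracy averages recall over the classes, a model cannot score well simply by predicting the majority class for every row.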

Analysis

Logistic Regression Model with Resampling

Naive Random Oversampling

  • The model had high precision for low-risk outcomes but low precision for high-risk outcomes.
  • The recall scores for both the high-risk and low-risk outcomes were moderately high.
  • Overall, the model had a balanced accuracy score of 0.6620175698580149.

Table 1. Naive Random Oversampling Precision and Recall Scores for Target Outcomes
Outcome    | Precision Score | Recall Score
high-risk  | 0.01            | 0.72
low-risk   | 1.00            | 0.60
Fig 1. Naive Random Oversampling Model Assessment

Results of Logistic Regression and Naive Random Oversampling

SMOTE Oversampling

  • The model had high precision for low-risk outcomes but low precision for high-risk outcomes.
  • The recall scores for both the high-risk and low-risk outcomes were moderately high. Relative to naive random oversampling, the high-risk recall score dropped (0.72 to 0.61) while the low-risk recall score increased (0.60 to 0.70).
  • Overall, the model had a balanced accuracy score of 0.6568196079430206.

Table 2. SMOTE Oversampling Precision and Recall Scores for Target Outcomes
Outcome    | Precision Score | Recall Score
high-risk  | 0.01            | 0.61
low-risk   | 1.00            | 0.70
Fig 2. SMOTE Oversampling Model Assessment

Results of Logistic Regression and SMOTE Oversampling
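To show how SMOTE differs from naive duplication, here is a simplified NumPy illustration of the core idea: each synthetic point is placed on the line segment between a minority sample and one of its nearest minority neighbours. This is a teaching sketch, not imbalanced-learn's implementation; the function name `smote_like_samples` and its parameters are hypothetical.

```python
import numpy as np


def smote_like_samples(X_minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (simplified SMOTE)."""
    rng = np.random.default_rng(seed)
    X_minority = np.asarray(X_minority, dtype=float)

    # Pairwise Euclidean distances between minority samples.
    diffs = X_minority[:, None, :] - X_minority[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour

    # Indices of the k nearest neighbours of every minority sample.
    neighbours = np.argsort(dists, axis=1)[:, :k]

    # Pick a random base sample and one of its neighbours for each new point.
    base_idx = rng.integers(0, len(X_minority), size=n_new)
    nb_idx = neighbours[base_idx, rng.integers(0, k, size=n_new)]

    # Interpolate a random fraction of the way from base to neighbour.
    gap = rng.random((n_new, 1))
    return X_minority[base_idx] + gap * (X_minority[nb_idx] - X_minority[base_idx])


# Toy minority class: four corners of the unit square plus its centre.
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
synthetic = smote_like_samples(X_min, n_new=20)
print(synthetic.shape)
```

Because every synthetic point is a convex combination of two real minority points, the new rows stay inside the region the minority class already occupies instead of being exact copies.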

Cluster Centroid Undersampling

  • The model had high precision for low-risk outcomes but low precision for high-risk outcomes.
  • Neither recall score improved on the prior oversampling methods: the high-risk recall score (0.61) matched SMOTE and fell below naive random oversampling (0.72), and the low-risk recall score (0.59) was lower than both.
  • Overall, the model had a balanced accuracy score of 0.6027679241263696.

Table 3. Cluster Centroid Undersampling Precision and Recall Scores for Target Outcomes
Outcome    | Precision Score | Recall Score
high-risk  | 0.01            | 0.61
low-risk   | 1.00            | 0.59
Fig 3. Cluster Centroid Undersampling Model Assessment

Results of Logistic Regression and Cluster Centroid Undersampling
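The idea behind cluster centroid undersampling can be sketched with scikit-learn's `KMeans` alone: the majority class is clustered into as many groups as there are minority rows, and the cluster centroids replace the original majority rows. This is a rough illustration on synthetic data under assumed settings, not the notebook's code (which would more likely call imbalanced-learn's `ClusterCentroids`).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification

# Synthetic stand-in for the loan data: class 0 is the majority.
X, y = make_classification(
    n_samples=1000, n_features=5, weights=[0.9, 0.1], random_state=1
)
X_major, X_minor = X[y == 0], X[y == 1]

# Cluster the majority class into one cluster per minority row and keep
# only the centroids as the new, much smaller majority class.
kmeans = KMeans(n_clusters=len(X_minor), n_init=3, random_state=1).fit(X_major)
X_res = np.vstack([kmeans.cluster_centers_, X_minor])
y_res = np.concatenate(
    [np.zeros(len(X_minor), dtype=int), np.ones(len(X_minor), dtype=int)]
)

print(X.shape, "->", X_res.shape)
```

The resampled majority rows are synthetic averages rather than real observations, and most of the original majority data is discarded, which may help explain why this method can lose recall compared with oversampling.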

SMOTEENN Combination Sampling

  • The model had high precision for low-risk outcomes but low precision for high-risk outcomes.
  • The high-risk recall score (0.69) improved on cluster centroid undersampling and SMOTE (both 0.61) but stayed below naive random oversampling (0.72). The low-risk recall score (0.59) matched the undersampling method and stayed below both oversampling methods.
  • Overall, the model had a balanced accuracy score of 0.639214728301642.

Table 4. SMOTEENN Combination Sampling Precision and Recall Scores for Target Outcomes
Outcome    | Precision Score | Recall Score
high-risk  | 0.03            | 0.69
low-risk   | 1.00            | 0.59
Fig 4. SMOTEENN Combination Sampling Model Assessment

Results of Logistic Regression and SMOTEENN Combination Resampling

Ensemble Models

Balanced Random Forest Classification

  • The model had high precision for low-risk outcomes but low precision for high-risk outcomes.
  • The low-risk recall score (0.87) improved markedly relative to the resampling methods, while the high-risk recall score (0.70) stayed within the range those methods produced (0.61 to 0.72).
  • Overall, the model had a balanced accuracy score of 0.7887512850910909.

Table 5. Random Forest Model Precision and Recall Scores for Target Outcomes
Outcome    | Precision Score | Recall Score
high-risk  | 0.03            | 0.70
low-risk   | 1.00            | 0.87
Fig 5. Random Forest Model Assessment

Results of Random Forest Classification

Table 6. Top 10 Feature Importances According to RF Model
Feature           | Importance
total_rec_prncp   | 0.07876809003486353
total_pymnt       | 0.05883806887524815
total_pymnt_inv   | 0.05625613759225244
total_rec_int     | 0.05355513093134745
last_pymnt_amnt   | 0.0500331813446525
int_rate          | 0.02966959508700077
issue_d_Jan-2019  | 0.021129125328012987
installment       | 0.01980242888931366
dti               | 0.01747062730041245
out_prncp_inv     | 0.016858293184471483

Easy Ensemble with AdaBoost

  • The model had high precision for low-risk outcomes but low precision for high-risk outcomes.
  • The recall scores for both the high-risk and low-risk outcomes were improved relative to all other models.
  • Overall, the model had a balanced accuracy score of 0.931601605553446.

Table 7. AdaBoost Model Precision and Recall Scores for Target Outcomes
Outcome    | Precision Score | Recall Score
high-risk  | 0.09            | 0.92
low-risk   | 1.00            | 0.94
Fig 6. AdaBoost Model Assessment

Results of Easy Ensemble Classification with AdaBoosting

Summary

  • Of the 4 resampling machine learning models, no single model clearly outperformed the others. Precision scores were nearly identical across the board (0.01 to 0.03 for high-risk, 1.00 for low-risk), while the recall scores varied more. A model with the best recall for one outcome tended to trail on the other: naive random oversampling had the highest high-risk recall (0.72), but SMOTE had the highest low-risk recall (0.70). Judging by the balanced accuracy scores (the mean of the high-risk and low-risk recall scores), the naive random oversampling model performed the best of its cohort.
  • Of the 2 ensemble models, the AdaBoosted easy ensemble model outperformed the random forest model across all metrics. Both ensemble models also achieved higher balanced accuracy scores than any of the resampling models.
  • Across all 6 machine learning models, the AdaBoosted easy ensemble performed best by far and should be the one selected for use. However, it's important to note that, while the AdaBoosted model outperformed the other candidate models across all metrics, its precision score for high-risk loan candidates was still only 0.09. If the primary purpose of the modelling effort is to identify high-risk applicants, alternative machine learning models should be explored.
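The ranking above can be cross-checked with simple arithmetic, since balanced accuracy is the mean of the per-class recall scores. The sketch below copies the rounded recall values from Tables 1 to 5 and Table 7 and compares their mean with the reported balanced accuracy; small differences are expected because the table values are rounded to two decimal places.

```python
# (high-risk recall, low-risk recall), copied from the tables above.
recalls = {
    "Naive random oversampling": (0.72, 0.60),
    "SMOTE oversampling": (0.61, 0.70),
    "Cluster centroid undersampling": (0.61, 0.59),
    "SMOTEENN combination sampling": (0.69, 0.59),
    "Balanced random forest": (0.70, 0.87),
    "Easy ensemble with AdaBoost": (0.92, 0.94),
}

# Balanced accuracy scores as reported in the Analysis section.
reported = {
    "Naive random oversampling": 0.6620175698580149,
    "SMOTE oversampling": 0.6568196079430206,
    "Cluster centroid undersampling": 0.6027679241263696,
    "SMOTEENN combination sampling": 0.639214728301642,
    "Balanced random forest": 0.7887512850910909,
    "Easy ensemble with AdaBoost": 0.931601605553446,
}

# Mean of the two recall scores for each model.
approx = {name: (high + low) / 2 for name, (high, low) in recalls.items()}

# Rank models from best to worst by reported balanced accuracy.
ranking = sorted(reported, key=reported.get, reverse=True)

for name in ranking:
    print(f"{name}: mean recall {approx[name]:.3f}, reported {reported[name]:.4f}")
```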

Resources