Credit_Risk_Analysis

Overview

The purpose of this analysis is to build a supervised machine learning model that accurately predicts credit risk. Six methods were used:

1. Naive Random Oversampling
2. SMOTE Oversampling
3. Cluster Centroid Undersampling
4. SMOTEENN Sampling
5. Balanced Random Forest Classifying
6. Easy Ensemble Classifying

For each of these methods, the data was split into training and testing datasets, and the accuracy score, confusion matrix, and classification report were recorded. The accuracy scores and key recall rates are summarized here.
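As a rough illustration of the split-then-resample workflow described above, here is a minimal sketch using only the Python standard library. It is not the code from the notebooks: the helper names (`train_test_split_stratified`, `random_oversample`), the 25% test fraction, and the toy data are all assumptions made for this example. It shows the idea behind naive random oversampling, where minority-class rows in the training set are duplicated at random until the classes are the same size.

```python
import random
from collections import Counter


def train_test_split_stratified(rows, labels, test_fraction=0.25, seed=1):
    """Split (row, label) pairs so each class keeps roughly the same share in both sets."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    train, test = [], []
    for label, members in by_class.items():
        members = members[:]  # copy so the caller's data is not shuffled
        rng.shuffle(members)
        n_test = max(1, round(len(members) * test_fraction))
        test += [(r, label) for r in members[:n_test]]
        train += [(r, label) for r in members[n_test:]]
    return train, test


def random_oversample(pairs, seed=1):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(label for _, label in pairs)
    target = max(counts.values())
    balanced = list(pairs)
    for label, count in counts.items():
        pool = [p for p in pairs if p[1] == label]
        balanced += [rng.choice(pool) for _ in range(target - count)]
    return balanced


# Hypothetical toy data: 95 low-risk loans and 5 high-risk loans.
rows = [[i] for i in range(100)]
labels = ["low_risk"] * 95 + ["high_risk"] * 5

train, test = train_test_split_stratified(rows, labels)
balanced_train = random_oversample(train)  # resample the training set only

train_counts = Counter(label for _, label in train)
balanced_counts = Counter(label for _, label in balanced_train)
print(train_counts, balanced_counts)
```

Only the training set is resampled; the test set keeps its original class imbalance so that the scores reflect how the model would behave on real loan data.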

Deliverable 1

Use Resampling Models to Predict Credit Risk

Random Oversampling

Accuracy Score: 67.4%

SMOTE Oversampling

Accuracy Score: 68.2%

Deliverable 2

Use the SMOTEENN algorithm to Predict Credit Risk

Undersampling Analysis

Accuracy Score: 52.2%

SMOTEENN Analysis

Accuracy Score: 68.1%

Deliverable 3

Use Ensemble Classifiers to Predict Credit Risk

Random Forest Analysis

Accuracy Score: 64.8%

Easy Ensemble Analysis

Accuracy Score: 92.3%

Summary

The goal of this analysis is to find the model that best detects high-risk loans, meaning the model that lets the fewest high-risk loans pass through undetected. The statistic that measures this is the recall rate for the high-risk class. The models with the highest high-risk recall were:

Easy Ensemble Analysis (91%)
SMOTEENN Analysis (76%)
SMOTE Oversampling (70%)

While high-risk recall is the most important statistic in this analysis, the recall rate for the low-risk class also matters. It is the share of low-risk loans that are correctly identified as low-risk, so a lower value means more low-risk loans are wrongly flagged as high-risk. The models with the highest low-risk recall were:

Random Forest Analysis (100%)
Easy Ensemble Analysis (94%)
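To make the two recall rates concrete, the sketch below computes them from confusion-matrix counts. The counts are hypothetical round numbers chosen for illustration, not results from this analysis, and the `recall` helper is an assumed name.

```python
def recall(correctly_identified, missed):
    """Share of a class's actual members that the model labeled correctly."""
    total = correctly_identified + missed
    return correctly_identified / total if total else 0.0


# Hypothetical confusion-matrix counts (not taken from the notebooks).
high_risk_caught = 80    # high-risk loans predicted high-risk
high_risk_missed = 20    # high-risk loans predicted low-risk (pass undetected)
low_risk_cleared = 900   # low-risk loans predicted low-risk
low_risk_flagged = 100   # low-risk loans wrongly predicted high-risk

high_risk_recall = recall(high_risk_caught, high_risk_missed)
low_risk_recall = recall(low_risk_cleared, low_risk_flagged)

print(f"High-risk recall: {high_risk_recall:.0%}")
print(f"Low-risk recall: {low_risk_recall:.0%}")
```

With these counts, 20 of the 100 high-risk loans slip through and 100 of the 1,000 low-risk loans are flagged unnecessarily, which is the trade-off the two recall rates capture.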

Finally, the accuracy score gives a picture of how well each model performs overall. The models with the highest accuracy scores were:

Easy Ensemble Analysis (92.3%)
SMOTE Oversampling (68.2%)
SMOTEENN Analysis (68.1%)
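An overall accuracy figure should be read with care on imbalanced data. The sketch below contrasts plain accuracy with balanced accuracy (the mean of the per-class recall rates) on hypothetical counts. Which of the two the scores above represent is not stated here, so treat this as background rather than a description of the notebooks; the function names are assumptions.

```python
def accuracy(tp, fn, tn, fp):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + fn + tn + fp)


def balanced_accuracy(tp, fn, tn, fp):
    """Mean of the recall for the positive class and the recall for the negative class."""
    positive_recall = tp / (tp + fn)
    negative_recall = tn / (tn + fp)
    return (positive_recall + negative_recall) / 2


# Hypothetical model that labels every loan low-risk:
# 10 high-risk loans are all missed, 990 low-risk loans are all cleared.
tp, fn = 0, 10    # high-risk caught, high-risk missed
tn, fp = 990, 0   # low-risk cleared, low-risk wrongly flagged

plain = accuracy(tp, fn, tn, fp)
balanced = balanced_accuracy(tp, fn, tn, fp)

print(f"Plain accuracy: {plain:.1%}")
print(f"Balanced accuracy: {balanced:.1%}")
```

A model that never detects a high-risk loan still scores 99% on plain accuracy here, while balanced accuracy drops to 50%, which is why accuracy is weighed alongside the recall rates rather than used alone.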

Taking these metrics together, the recommended model is the Easy Ensemble Analysis, which ranks first for high-risk recall and accuracy and second for low-risk recall.
