Credit Risk Modeling - ID/X Partners

As the final task of your internship as a Data Scientist at ID/X Partners, you will work on a project for a lending company, collaborating with several other departments to deliver a technology solution. You are asked to build a model that predicts credit risk from a dataset provided by the company, which contains data on both accepted and rejected loans. You also need to prepare visual media to present the solution to the client; make sure it is clear, easy to read, and communicative. You may develop this end-to-end solution in your preferred programming language, as long as you follow the Data Science framework/methodology.

Dataset Source

I use a Kaggle dataset of consumer loans issued from 2007 to 2014 by Lending Club, a peer-to-peer lending platform based in the United States.

Dataset Files

loan_data_2007_2014.csv
LCDataDictionary.xlsx

Target Variable Description:

Target variable = 0 → Rejected for a loan → Defaulter
Target variable = 1 → Accepted for a loan → Non-Defaulter
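The encoding above can be illustrated with a minimal sketch. It assumes the label is derived from a `loan_status` column and that the listed statuses count as "bad"; both the column name and the exact set of statuses are assumptions for illustration, and the notebooks may use a different set.

```python
# Hypothetical set of statuses treated as "bad" loans (assumption, not taken from the notebooks).
BAD_STATUSES = {
    "Charged Off",
    "Default",
    "Late (31-120 days)",
    "Does not meet the credit policy. Status:Charged Off",
}


def encode_target(loan_status: str) -> int:
    """Return 0 for a defaulter (bad loan) and 1 for a non-defaulter (good loan)."""
    return 0 if loan_status.strip() in BAD_STATUSES else 1


# Toy example: four loan statuses mapped to the binary target.
statuses = ["Fully Paid", "Charged Off", "Current", "Default"]
targets = [encode_target(s) for s in statuses]
print(targets)
```

Keeping the "bad" statuses in one named set makes the labeling rule easy to review and change.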

Tools

Programming language: Python.
Data Tool: Jupyter Notebook.
Reporting Tool: Microsoft PowerPoint.

The Project Workflow

credit-risk-modeling-using-a-scorecard.ipynb

Problem Formulation
Data Collection
Data Understanding
Data Preprocessing
Exploratory Data Analysis (EDA) and Data Visualization
Model Selection and Building
Scorecard Development
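As a rough illustration of the scorecard development step, the sketch below uses one common scaling scheme, "points to double the odds" (PDO), to turn a predicted probability of being a non-defaulter into a score. The target score of 600, target odds of 50:1, and PDO of 20 are hypothetical values chosen for the example, and the notebook may scale its scorecard differently.

```python
import math


def scaling_params(target_score=600.0, target_odds=50.0, pdo=20.0):
    """Return (factor, offset) so that target_odds maps to target_score
    and every doubling of the odds adds pdo points.
    All default values are hypothetical."""
    factor = pdo / math.log(2)
    offset = target_score - factor * math.log(target_odds)
    return factor, offset


def score_from_probability(p_good, factor, offset):
    """Convert a predicted probability of being a non-defaulter into a score."""
    odds = p_good / (1.0 - p_good)  # good:bad odds
    return offset + factor * math.log(odds)


factor, offset = scaling_params()
score_at_target = score_from_probability(50 / 51, factor, offset)    # odds of 50:1
score_at_double = score_from_probability(100 / 101, factor, offset)  # odds of 100:1
print(round(score_at_target, 2), round(score_at_double, 2))
```

With these parameters, a borrower at 50:1 odds scores about 600 and one at 100:1 odds scores about 620.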

credit-risk-modeling-using-various-models.ipynb

Problem Formulation
Data Collection
Data Understanding
Data Preprocessing
Exploratory Data Analysis (EDA) and Data Visualization
Model Selection and Building
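For the preprocessing step of this notebook, the conclusions describe filling missing values with the mean for numerical variables and the mode for categorical variables. The following dependency-free sketch shows that idea on two toy columns; the helper `impute_column` is hypothetical and the column names are only illustrative.

```python
from statistics import mean, mode


def impute_column(values):
    """Fill None entries with the mean (numeric column) or the mode (categorical column)."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing to learn a fill value from
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in observed
    )
    fill = mean(observed) if numeric else mode(observed)
    return [fill if v is None else v for v in values]


# Toy columns with one missing entry each (illustrative names).
annual_inc = impute_column([40000.0, None, 60000.0])
home_ownership = impute_column(["RENT", None, "MORTGAGE", "RENT"])
print(annual_inc, home_ownership)
```

In a real pipeline, the fill values would normally be computed on the training split only and then reused on the test split, to avoid leaking information.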

Results

credit-risk-modeling-using-a-scorecard.ipynb

credit-risk-modeling-using-various-models.ipynb

Conclusions

credit-risk-modeling-using-a-scorecard.ipynb

The loan_data_2007_2014.csv file (containing 466,285 rows and 74 columns) contains numerous missing values and outliers, which have been handled using the WOE binning technique.
No duplicate values are present in the dataset.
The target variable consists of 89.1% non-defaulters (accepted) and 10.9% defaulters (rejected).
Feature selection has been performed using Weight of Evidence (WOE) and Information Value (IV).
A logistic regression model was trained, yielding the following metrics: threshold ≈ 0.22, accuracy ≈ 0.90, precision ≈ 0.93, recall ≈ 0.96, F1 ≈ 0.95, AUROC ≈ 0.84, Gini ≈ 0.67, and AUCPR ≈ 0.97. These metrics indicate strong performance for a credit risk model.
Consequently, the company is expected to save around 1,000,000,000 USD while incurring a loss of approximately 9,000,000 USD.
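The WOE and IV quantities used for binning and feature selection above can be sketched as follows. This is a minimal illustration, assuming the common convention WOE = ln(share of goods / share of bads) per bin and IV = sum over bins of (share of goods − share of bads) × WOE; the notebook may use the opposite sign convention or add smoothing for empty bins. The function name and toy data are hypothetical.

```python
import math
from collections import defaultdict


def woe_iv(bins, targets):
    """Compute WOE per bin and the total IV for one binned feature.

    bins    : bin label of each loan
    targets : 1 = non-defaulter (good), 0 = defaulter (bad)
    Assumes every bin contains at least one good and one bad loan.
    """
    good = defaultdict(int)
    bad = defaultdict(int)
    for b, t in zip(bins, targets):
        if t == 1:
            good[b] += 1
        else:
            bad[b] += 1
    total_good = sum(good.values())
    total_bad = sum(bad.values())

    woe = {}
    iv = 0.0
    for b in sorted(set(bins)):
        pct_good = good[b] / total_good  # share of all goods falling in this bin
        pct_bad = bad[b] / total_bad     # share of all bads falling in this bin
        woe[b] = math.log(pct_good / pct_bad)
        iv += (pct_good - pct_bad) * woe[b]
    return woe, iv


# Toy data: bin "A" loans are mostly good, bin "B" loans mostly bad.
bins = ["A"] * 5 + ["B"] * 5
targets = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
woe, iv = woe_iv(bins, targets)
print(woe, iv)
```

Under this convention, a positive WOE marks a bin with relatively more non-defaulters, and features with a higher IV separate the two classes more strongly.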

credit-risk-modeling-using-various-models.ipynb

The loan_data_2007_2014.csv file (containing 466,285 rows and 74 columns) contains numerous missing values and outliers, which have been handled through data imputation methods, such as using the mean for numerical variables and the mode for categorical variables.
No duplicate values are present in the dataset.
The target variable consists of 89.9% non-defaulters (accepted) and 10.1% defaulters (rejected).
Feature selection has been performed using the Chi-Square Test, ANOVA, and Correlation Matrix.
Various machine learning models have been trained on the data: logistic regression, ridge classifier, SGD classifier, passive-aggressive classifier, linear discriminant analysis, quadratic discriminant analysis, decision tree, extra tree, AdaBoost, Gaussian naive Bayes, and LGBM classifier.
Among these, the LGBM classifier achieved the highest AUROC score (0.99). It was selected for further evaluation and produced the following metrics: threshold ≈ 0.5, accuracy ≈ 0.95, precision ≈ 0.99, recall ≈ 0.95, F1 ≈ 0.97, AUROC ≈ 0.98, Gini ≈ 0.97, and AUCPR ≈ 0.99. These metrics indicate strong performance for a credit risk model.
Consequently, the company is expected to save around 1,000,000,000 USD while incurring a loss of approximately 9,000,000 USD.
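The threshold-based metrics reported for both notebooks can be reproduced in principle with a sketch like the one below. It assumes the positive class is 1 (non-defaulter), computes AUROC as the probability that a random positive is scored above a random negative, and uses the usual relation Gini = 2 × AUROC − 1. The function names and toy data are hypothetical, and the pairwise AUROC loop is only suitable for small examples.

```python
def classification_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, precision, recall and F1 for the positive class (1 = non-defaulter)."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def auroc(y_true, y_prob):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def gini(auc):
    """Gini coefficient derived from AUROC."""
    return 2 * auc - 1


# Toy labels and predicted probabilities of being a non-defaulter.
y_true = [1, 1, 1, 0, 1, 0]
y_prob = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
metrics = classification_metrics(y_true, y_prob, threshold=0.5)
auc = auroc(y_true, y_prob)
print(metrics, auc, gini(auc))
```

Accuracy, precision, recall, and F1 all change with the chosen threshold, while AUROC and Gini do not, which is why the threshold is reported alongside the other metrics.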

