The ability to predict loan approval outcomes and assess financial risk is crucial for banks and financial institutions to make informed lending decisions. Creditworthiness and financial stability are key factors that influence loan approvals, and accurate predictions in these areas can reduce default rates, streamline lending processes, and enhance financial inclusion.
This project aims to explore several machine learning models that can predict the binary outcome of loan approval based on a range of demographic, financial, and historical data. If time permits, we will also work on predicting the risk score using regression.
The dataset used for this project is a synthetic dataset designed for risk assessment and loan approval modeling, sourced from Kaggle:
Financial Risk for Loan Approval Dataset
It contains 20,000 records with 36 attributes related to demographic information, credit history, income levels, existing debt, and financial stability. Key features include:
- Age
- Credit score
- Employment status
- Loan amount
- Debt-to-income ratio
- Previous loan defaults
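Debt-to-income ratio is conventionally defined as total monthly debt payments divided by gross monthly income. The sketch below shows that calculation. Because the dataset is synthetic, its `DebtToIncomeRatio`-style column may have been generated differently. The function name `debt_to_income_ratio` is hypothetical and is not taken from the project code.

```python
def debt_to_income_ratio(monthly_debt_payments: float, gross_monthly_income: float) -> float:
    """Return total monthly debt payments divided by gross monthly income.

    Hypothetical helper illustrating the conventional definition; the synthetic
    dataset may derive its own column differently.
    """
    if gross_monthly_income <= 0:
        raise ValueError("gross_monthly_income must be positive")
    if monthly_debt_payments < 0:
        raise ValueError("monthly_debt_payments must be non-negative")
    return monthly_debt_payments / gross_monthly_income


# Example: 1,500 in monthly debt payments on 5,000 of gross monthly income
print(debt_to_income_ratio(1500, 5000))  # → 0.3
```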
This dataset provides a comprehensive foundation for building models that predict Loan Approval Status (a binary classification problem).
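Because the target is binary, a held-out test set should keep roughly the same share of approved and rejected applications as the full data. This matters especially if the two classes turn out to be imbalanced. The following sketch performs a stratified split in plain Python under that assumption. In practice, scikit-learn's `train_test_split(..., stratify=y)` does the same job. The helper name `stratified_split` and its defaults are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_indices, test_indices) preserving each class's proportion.

    Hypothetical helper: rows of each class are shuffled separately and the
    first `test_fraction` of each class is assigned to the test set.
    """
    rng = random.Random(seed)  # local RNG so the split is reproducible
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels: 30 approved (1) and 70 rejected (0) applications
labels = [1] * 30 + [0] * 70
train, test = stratified_split(labels, test_fraction=0.2)
print(len(train), len(test))           # → 80 20
print(sum(labels[i] for i in test))    # → 6
```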
The work is divided among the team as follows:
- Data exploration: Ruoqi Yan, Kaushal Damania
- Data preprocessing: Naveen Reddy Dyava, Kaushal Damania
- Model Training: Naveen Reddy Dyava
- Model Evaluation: Naveen Reddy Dyava, Kaushal Damania
- Model Interpretation: Kaushal Damania
- Report: Naveen Reddy Dyava, Kaushal Damania
The following machine learning models are proposed for this project:
- Logistic Regression
- CatBoost
- AdaBoost
- LightGBM
- Multilayer Perceptron (MLP)
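Comparing these models fairly requires scoring every model's test-set predictions with the same metrics. The function below is a plain-Python sketch of the standard confusion-matrix metrics for 0/1 labels. It assumes that `1` encodes "approved". scikit-learn's `sklearn.metrics` module provides `accuracy_score`, `precision_score`, `recall_score` and `f1_score` for the same purpose. The name `binary_metrics` is hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1, treating label 1 as 'approved'.

    Hypothetical helper; precision, recall and F1 fall back to 0.0 when their
    denominators are zero instead of raising.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one prediction is required")

    # Confusion-matrix counts
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy predictions from any one of the models above
print(binary_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0]))
```

Precision matters when wrongly approving a risky applicant is costly. Recall matters when wrongly rejecting a creditworthy applicant is costly. Reporting both alongside accuracy makes the trade-off between the models visible.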
To run the project, follow these steps:
# Create a Python 3.10 virtual environment
python3.10 -m venv .venv
# Activate the environment (macOS/Linux)
source .venv/bin/activate
# Install the required packages into the active environment
pip install -r requirements.txt
You can now run any of the notebooks in the project.
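Before running a notebook, it can help to confirm that the interpreter in use is the virtual environment rather than the system Python. The check below relies on `venv` setting `sys.prefix` to the environment directory while `sys.base_prefix` still points at the base installation. The required version `(3, 10)` mirrors the Python 3.10 setup. The function names are hypothetical, and the check is a convenience, not part of the project.

```python
import sys


def running_in_venv() -> bool:
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base interpreter install
    return sys.prefix != sys.base_prefix


def check_environment(required=(3, 10)) -> list:
    """Return human-readable problems with the setup; empty if it looks right."""
    problems = []
    if tuple(sys.version_info[:2]) != tuple(required):
        problems.append(
            f"expected Python {required[0]}.{required[1]}, "
            f"found {sys.version_info.major}.{sys.version_info.minor}"
        )
    if not running_in_venv():
        problems.append("not running inside a virtual environment")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("Environment OK" if not issues else "\n".join(issues))
```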