loan-default-prediction

Loan default prediction on the Berka dataset using an XGBoost model.

Goal

To provide a mechanism for deciding which customers should receive loans, helping banks increase their profits.

Dataset

We use the Berka dataset, also known as the PKDD'99 Financial dataset, which contains 606 successful and 76 unsuccessful loans along with the borrowers' personal and transaction information. The relationships between its tables are depicted in the diagram below.
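As a sketch of how the successful/unsuccessful split can be derived, the snippet below maps Berka's loan status codes to a binary default label. It assumes that statuses B (finished, not paid) and D (running, client in debt) count as unsuccessful loans; the helper names `default_label` and `class_counts` are hypothetical and not taken from this repository.

```python
from collections import Counter

# Berka loan status codes (PKDD'99 documentation):
#   A = contract finished, no problems
#   B = contract finished, loan not paid
#   C = running contract, OK so far
#   D = running contract, client in debt
# Assumption for this sketch: B and D are "not successful" (positive class, label 1).
DEFAULT_STATUSES = {"B", "D"}


def default_label(status: str) -> int:
    """Return 1 for a defaulted loan and 0 for a successful one."""
    # Raw .asc files may quote string fields, e.g. "B".
    status = status.strip().strip('"').upper()
    if status not in {"A", "B", "C", "D"}:
        raise ValueError(f"unknown loan status: {status!r}")
    return 1 if status in DEFAULT_STATUSES else 0


def class_counts(statuses):
    """Count negative (0) and positive (1) labels for a sequence of status codes."""
    return Counter(default_label(s) for s in statuses)
```

Applied to the loan table's status column, this mapping should reproduce the 606/76 split quoted above, provided it matches the labeling used in data_manipulation.ipynb.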

Challenges

Imbalanced data (606 negative-class vs. 76 positive-class loans)
Feature engineering (Creation, Extraction, Transformation)
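To illustrate the feature-engineering challenge, together with the rule (see Experiments) that only pre-loan information is used, here is a minimal sketch that aggregates an account's transactions dated strictly before the loan was granted. The function names and the dict keys (`account_id`, `date`, `amount`, `balance`, `type`) are assumptions for illustration; the YYMMDD date format and the `PRIJEM`/`VYDAJ` transaction types follow the Berka documentation.

```python
from datetime import date
from statistics import mean


def parse_berka_date(yymmdd: int) -> date:
    """Parse a Berka-style YYMMDD integer (all years are 19xx), e.g. 930705 -> 1993-07-05."""
    yy, rest = divmod(yymmdd, 10000)
    mm, dd = divmod(rest, 100)
    return date(1900 + yy, mm, dd)


def pre_loan_features(loan, transactions):
    """Aggregate one account's transactions dated strictly before the loan date.

    `loan` and each transaction are plain dicts; the keys used here are
    hypothetical and would need to match the real preprocessed tables.
    """
    loan_date = parse_berka_date(loan["date"])
    # Keep only this account's history from before the loan: no future leakage.
    history = [
        t for t in transactions
        if t["account_id"] == loan["account_id"]
        and parse_berka_date(t["date"]) < loan_date
    ]
    if not history:
        return {"n_trans": 0, "mean_balance": 0.0, "min_balance": 0.0,
                "total_credit": 0.0, "total_debit": 0.0}
    balances = [t["balance"] for t in history]
    return {
        "n_trans": len(history),
        "mean_balance": mean(balances),
        "min_balance": min(balances),
        # In Berka, type "PRIJEM" is a credit and "VYDAJ" is a withdrawal.
        "total_credit": sum(t["amount"] for t in history if t["type"] == "PRIJEM"),
        "total_debit": sum(t["amount"] for t in history if t["type"] == "VYDAJ"),
    }
```

Filtering on the loan date before aggregating is what keeps post-approval behavior (for example, missed repayments) out of the features.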

Experiments

We used only information available before each loan was granted, because the goal is to decide whether to issue the loan.
We tried several models, including LightGBM, Random Forest, and XGBoost (an AutoML search identified these as the best-performing models). XGBoost with feature selection (based on feature importance) and grid-search hyperparameter tuning gave the best results, so we chose it as the final model.
We used SMOTE to handle the imbalanced data.
Profit is calculated as profit = revenue - cost, where revenue is the money the bank earns from interest and cost is the money lost to defaults. More details are in profit_analysis.ipynb.
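To show what SMOTE does to the minority (default) class, here is a from-scratch sketch of its core step: pick a minority sample, pick one of its k nearest minority neighbours, and create a synthetic point at a random position on the segment between them. The function name `smote` and its parameters are hypothetical; the notebooks presumably rely on a library implementation such as imbalanced-learn's `SMOTE`, and this version is for illustration only.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate `n_synthetic` SMOTE samples from a list of minority-class feature vectors."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # Precompute each sample's k nearest minority neighbours (Euclidean distance).
    neighbours = []
    for i, x in enumerate(minority):
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))   # random minority sample
        j = rng.choice(neighbours[i])      # one of its k nearest neighbours
        gap = rng.random()                 # position on the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], minority[j])])
    return synthetic
```

Fully balancing 606 negatives against 76 positives would mean generating 606 - 76 = 530 synthetic defaults. Oversampling is normally applied to the training split only, so that synthetic points never appear in the evaluation data.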

How to run

First, preprocess the raw data by running data_manipulation.ipynb. The result is saved to transformed_data/final_transformed_data.csv.
Second, train the model with model.ipynb. The result is report/report_xgb.csv, which contains the true label and the predicted probability for each account.
Then, run profit_analysis.ipynb to create report/ori_profit.csv (the original profit) and the final result, report/report_xgb_threshold_profit.csv (the profit after applying the model, for each threshold and each interest rate).
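The threshold and interest-rate report can be sketched as follows, under stated assumptions: a loan is issued only when its predicted default probability is below the threshold, a repaid loan earns simple interest (amount × rate), and a defaulted loan costs its full amount. The record keys (`amount`, `label`, `prob`) and all function names are hypothetical; the actual calculation lives in profit_analysis.ipynb and profit.py and may differ.

```python
def loan_profit(amount, defaulted, rate):
    """Assumed profit model: repaid loans earn amount * rate, defaulted loans lose the amount."""
    return -amount if defaulted else amount * rate


def profit_at(records, threshold, rate):
    """Total profit if only loans with predicted default probability below `threshold` are issued."""
    return sum(
        loan_profit(r["amount"], r["label"] == 1, rate)
        for r in records
        if r["prob"] < threshold
    )


def sweep(records, thresholds, rates):
    """Profit for every (threshold, interest rate) pair."""
    return {(t, r): profit_at(records, t, r) for t in thresholds for r in rates}
```

Under these assumptions, a threshold above 1 issues every loan, which presumably corresponds to the "original profit" baseline without the model.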

Results

Performance results

Initial model performance

model	Acc	F1	ROC_AUC
LGBMClassifier	0.925	0.553	0.743
RandomForestClassifier	0.924	0.544	0.764
XGBClassifier	0.923	0.596	0.738

Performance after applying the best hyperparameters, the selected features, and SMOTE

model	Acc	F1	ROC_AUC
LGBMClassifier	0.919	0.572	0.731
RandomForestClassifier	0.912	0.616	0.791
XGBClassifier	0.927	0.645	0.784
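For reference, the three columns in the tables above can be computed from true labels and predicted probabilities as sketched below. This is a plain-Python illustration of the standard definitions (F1 for the positive/default class, ROC AUC as the probability that a random default is scored above a random non-default); it is not the evaluation code used in model.ipynb, and the function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1(y_true, y_pred):
    """F1 for the positive (default) class; 0.0 when there are no true positives."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

On the full dataset, a model that always predicts "no default" would reach an accuracy of 606 / 682 ≈ 0.889 with an F1 of 0, which is why F1 and ROC AUC are the more informative columns under this class imbalance.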

Power BI Visualization

We created an interactive Power BI dashboard to visualize the resulting profit.

Links

Power BI dashboard
Slide presentation

sorayutmild/loan-default-prediction

Folders and files

Latest commit

History

Repository files navigation

loan-default-prediction

Goal

Dataset

Challenges

Experiments

How to run

Results

Performance results

Power BI Visualization

Links

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages