Credit Score Prediction - Home Credit Indonesia

Home Credit Indonesia is currently using various statistical methods and Machine Learning to make credit score predictions. Now, we ask you to unlock the maximum potential of our data. By doing so, we can ensure that:

Customers who are capable of repayment are not rejected when applying for a loan.
Loans can be given with a principal, maturity, and repayment calendar that will motivate customers to succeed.

You will be evaluated on the depth of your understanding of the analysis. You must use at least Logistic Regression to build your machine learning models. Afterwards, create a presentation containing the end-to-end modeling analysis results along with business recommendations (maximum 10 pages).

Dataset Source

Home Credit Indonesia's internal database

Dataset Files

application_test.csv
application_train.csv
bureau.csv
bureau_balance.csv
credit_card_balance.csv
HomeCredit_columns_description.csv
installments_payments.csv
POS_CASH_balance.csv
previous_application.csv
sample_submission.csv

Target Variable Description:

Target variable = 0 → Rejected for a loan → Defaulter
Target variable = 1 → Accepted for a loan → Non-Defaulter

Tools

Programming language: Python.
Data Tool: Jupyter Notebook.
Reporting Tool: Microsoft PowerPoint.

The Project Workflow

Problem Formulation
Data Collecting
Data Understanding
Data Preprocessing
Exploratory Data Analysis (EDA) and Data Visualization
Model Selection and Building
Scorecard Development

Results

Conclusions

The application_train.csv (307,511 rows and 122 columns) and application_test.csv (48,744 rows and 121 columns) contain numerous missing values and outliers, which have been handled using the WOE binning technique.
No duplicate values are present in the dataset.
Additional features, including age, annuity income ratio, and loan duration, have been created.
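
The derived features mentioned above could be computed roughly as follows. This is a minimal sketch, not the notebook's code: it assumes the Home Credit column names `DAYS_BIRTH`, `AMT_ANNUITY`, `AMT_INCOME_TOTAL`, and `AMT_CREDIT`, and the formulas (in particular loan duration as credit divided by annuity) and the output names are assumptions made for illustration.

```python
def add_features(row):
    """Return a copy of one application record with three derived features."""
    out = dict(row)
    # Assumption: DAYS_BIRTH is a negative day count relative to the application date.
    out["AGE_YEARS"] = -row["DAYS_BIRTH"] / 365
    # Share of income that goes to the loan annuity.
    out["ANNUITY_INCOME_RATIO"] = row["AMT_ANNUITY"] / row["AMT_INCOME_TOTAL"]
    # Assumed definition: number of annuity payments needed to cover the credit.
    out["LOAN_DURATION"] = row["AMT_CREDIT"] / row["AMT_ANNUITY"]
    return out


# Hypothetical applicant record, for illustration only.
example = {
    "DAYS_BIRTH": -14600,
    "AMT_ANNUITY": 25000.0,
    "AMT_INCOME_TOTAL": 200000.0,
    "AMT_CREDIT": 500000.0,
}
features = add_features(example)
print(features["AGE_YEARS"], features["ANNUITY_INCOME_RATIO"], features["LOAN_DURATION"])
```
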
The target variable consists of 91.9% non-defaulters (accepted) and 8.1% defaulters (rejected).
Feature selection has been performed using Weight of Evidence (WOE) and Information Value (IV).
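
To illustrate the WOE and IV step, here is a small sketch for a single binned feature. It assumes the common convention WOE = ln(% of non-defaulters / % of defaulters) per bin and IV = sum over bins of (% of non-defaulters - % of defaulters) × WOE; the notebook may use a different sign convention. The bin labels and counts are hypothetical, and missing values are given their own bin.

```python
import math


def woe_iv(bins):
    """Compute WOE per bin and the total IV for one feature.

    bins maps a bin label to (n_good, n_bad), where "good" means
    non-defaulter and "bad" means defaulter.
    Assumption: every bin contains at least one good and one bad record,
    otherwise the logarithm is undefined.
    """
    total_good = sum(good for good, _ in bins.values())
    total_bad = sum(bad for _, bad in bins.values())
    woe = {}
    iv = 0.0
    for label, (good, bad) in bins.items():
        pct_good = good / total_good  # share of all goods in this bin
        pct_bad = bad / total_bad     # share of all bads in this bin
        woe[label] = math.log(pct_good / pct_bad)
        iv += (pct_good - pct_bad) * woe[label]
    return woe, iv


# Hypothetical counts for one binned feature.
bins = {"low": (400, 20), "high": (500, 80), "missing": (100, 20)}
woe, iv = woe_iv(bins)
print(woe, iv)
```

Features whose IV falls below a chosen cutoff would then be dropped; the cutoff itself is a modeling choice not stated here.
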
Logistic regression was used to build the machine learning model, yielding the following metrics: threshold ≈ 0.23, accuracy ≈ 0.90, precision ≈ 0.93, recall ≈ 0.96, F1 ≈ 0.94, AUROC ≈ 0.74, Gini ≈ 0.48, and AUCPR ≈ 0.97. The AUROC and Gini values indicate reasonable discriminatory power for credit risk modeling; accuracy, precision, and recall should be read alongside the 91.9% share of non-defaulters.
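
The sketch below shows how threshold-based metrics follow from confusion counts and how Gini relates to AUROC (Gini = 2 × AUROC - 1, which matches the reported 0.74 and 0.48). It assumes class 1 is the positive class and that a score at or above the threshold is predicted as 1; the labels and scores are toy values, not project data.

```python
def classification_metrics(y_true, y_score, threshold):
    """Accuracy, precision, recall and F1 with class 1 as the positive class.

    Assumption: at least one record is predicted positive and at least one
    is truly positive, so the ratios below are defined.
    """
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


def auroc(y_true, y_score):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gini(auc):
    """Gini coefficient derived from AUROC."""
    return 2 * auc - 1


# Toy labels and scores for illustration.
y_true = [1, 1, 1, 1, 0, 0]
y_score = [0.9, 0.8, 0.6, 0.2, 0.5, 0.1]
metrics = classification_metrics(y_true, y_score, threshold=0.23)
auc = auroc(y_true, y_score)
print(metrics, auc, gini(auc))
```
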
Consequently, the company is expected to save around 30,000,000,000 IDR while incurring a loss of approximately 100,000,000 IDR.
The proportions of true positives, true negatives, false positives, and false negatives depend on the model metrics listed above, in particular the chosen threshold.
The results could be improved further by incorporating features with higher information value (IV). The following CSV files likely contain such features, but I was unable to merge them into the application_train.csv and application_test.csv datasets:
- bureau.csv
- bureau_balance.csv
- credit_card_balance.csv
- installments_payments.csv
- POS_CASH_balance.csv
- previous_application.csv
The laptop used for this project (4 GB RAM) crashed when attempting to merge these files.
A machine with more memory should make it possible to merge them in the future.
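
One possible way to reduce memory use, offered here only as a sketch and not something the project tried, is to stream each auxiliary file and reduce it to one row per applicant before joining it to the application data. The example assumes the files share the applicant key `SK_ID_CURR` and uses `AMT_CREDIT_SUM` from bureau.csv as an illustrative column; the function name and the choice of aggregates are hypothetical.

```python
import csv
import io
from collections import defaultdict


def aggregate_by_applicant(lines, key="SK_ID_CURR", value="AMT_CREDIT_SUM"):
    """Stream CSV rows and keep only a running count and sum per applicant."""
    stats = defaultdict(lambda: {"count": 0, "sum": 0.0})
    for row in csv.DictReader(lines):
        entry = stats[row[key]]
        entry["count"] += 1
        if row[value] != "":  # skip missing amounts
            entry["sum"] += float(row[value])
    return dict(stats)


# Small in-memory stand-in for bureau.csv; with the real file, pass
# open("bureau.csv", newline="") instead.
sample = io.StringIO(
    "SK_ID_CURR,AMT_CREDIT_SUM\n"
    "100001,1000.0\n"
    "100001,500.0\n"
    "100002,\n"
)
agg = aggregate_by_applicant(sample)
print(agg)
```

Only one small record per applicant is held in memory at a time, so the aggregated table is much smaller than the raw file and can then be joined to application_train.csv on the applicant key.
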

