A bank wants to automate its loan approval process. It has provided a semi-anonymized dataset containing 606 successful and 76 unsuccessful loans, along with their associated information and transactions. These loans are for existing clients, so a model built on them could be used to pre-approve existing customers and market to them accordingly.
This analysis looks into a dataset from a Czech bank. The goal is to produce a model that can predict whether a client is high-risk for a bank loan.
Original dataset with the data dictionary: link
Overview of the results: link
All notebooks are written in Python 3.
The libraries used in this notebook are listed in the requirements.txt file. Run the following command to ensure the required libraries are installed:
pip3 install -r requirements.txt
All ancillary tables were merged onto the loan data to be used as features. A Random Forest model was used to classify good and bad loans. An SVM and a Logistic Regression model were also created for comparison. Separate models were also trained on just small loans and just big loans. The classifiers were then used to determine which of the remaining customers should be pre-approved for loans. SHAP values were used to assess feature importance after modelling.
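The model comparison described above can be sketched roughly as follows. This is a minimal illustration, not the code from the notebooks: it assumes scikit-learn is installed, uses a synthetic dataset from `make_classification` as a stand-in for the merged loan table (682 rows with roughly the same class imbalance), and the feature count, hyperparameters, `class_weight="balanced"` setting, and ROC AUC metric are all assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the merged loan table (assumption, not the real data):
# 682 rows, with class 0 ("good" loans) making up about 606/682 of the samples.
X, y = make_classification(
    n_samples=682,
    n_features=10,
    n_informative=5,
    weights=[606 / 682],
    random_state=0,
)

# The three model families named in the text. class_weight="balanced" is an
# assumed way of handling the imbalance; the notebooks may do this differently.
models = {
    "random_forest": RandomForestClassifier(
        n_estimators=200, class_weight="balanced", random_state=0
    ),
    "svm": make_pipeline(StandardScaler(), SVC(class_weight="balanced")),
    "logistic_regression": make_pipeline(
        StandardScaler(),
        LogisticRegression(class_weight="balanced", max_iter=1000),
    ),
}

# 5-fold cross-validated ROC AUC for each model (stratified folds by default).
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    for name, model in models.items()
}

for name, score in scores.items():
    print(f"{name}: mean ROC AUC = {score:.3f}")
```

With only 76 unsuccessful loans, plain accuracy is a weak yardstick: always predicting "successful" would already score 606/682, or about 89%. That is why this sketch reports ROC AUC instead.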