lprtk/datamining-credit-risk

Data Mining & Machine Learning for credit risk analysis

Content

As individual investors, we want to start investing on the Lending Club platform, a crowdfunding platform based on peer-to-peer lending, and build a financial portfolio. To do so, we want to build a credit model that supports the selection of investment projects by classifying potential borrowers into two classes: clients who will default and clients who will fully repay their loan. Our goal is to use the data stored by the platform, together with the data provided by borrowers at the time of their loan application, to design a statistical machine learning model that lets us maximize the probability that the loans we fund are actually repaid.
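To make the two-class target concrete, here is a minimal sketch in Python (the repository's own code is in R Markdown). The status labels and the `encode_target` helper are assumptions for illustration; the actual values should be checked against `data_dictionary.xlsx`.

```python
# Assumed status labels; check data_dictionary.xlsx for the real values.
DEFAULT_STATUSES = {"Charged Off", "Default"}
REPAID_STATUSES = {"Fully Paid"}


def encode_target(loan_status):
    """Map a loan status to 1 (default), 0 (fully repaid) or None (unresolved)."""
    status = loan_status.strip()
    if status in DEFAULT_STATUSES:
        return 1
    if status in REPAID_STATUSES:
        return 0
    return None  # e.g. loans still running: the outcome is not yet known


# Made-up example rows, not real Lending Club records.
statuses = ["Fully Paid", "Charged Off", "Current", " Fully Paid "]
targets = [encode_target(s) for s in statuses]

# Only loans with a known outcome can be used to train a classifier.
labelled = [t for t in targets if t is not None]
default_rate = sum(labelled) / len(labelled)
```

Returning `None` for unresolved loans keeps them out of the training set instead of silently counting them as repaid.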

Generally, the risk of a loan is measured through a credit score assigned to the borrower according to their risk profile (e.g. FICO scores). We will thus try to model the solvency of a client, i.e. their capacity to repay their credit lines on time. The higher a customer's credit rating, the higher their creditworthiness and, consequently, the lower the risk of default and the more likely they are to be granted credit. The model will be based on data collected from 20,000 recent borrowers who were granted consumer credit through the platform's current loan underwriting process.

The future credit granting model will be built from predictive modeling tools under several constraints:

  • First, it must strictly comply with the regulatory constraints that the bank faces.

  • Secondly, our work must respect the fundamental commitments of the data professions: ethics and the General Data Protection Regulation (GDPR).

  • Finally, the last constraint is interpretability. We are not necessarily looking for the best-performing or the most complex model. For predictive work on such a sensitive subject, we prefer a less powerful model that makes it much simpler to explain to a customer why their credit application was rejected.
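As an illustration of the interpretability constraint, the sketch below fits a small logistic regression by gradient descent in Python and reads each coefficient as an odds ratio. It is not the repository's model (that code lives in the .rmd files); the feature names `debt_ratio` and `credit_score_scaled`, the toy data, and the hyperparameters are all assumptions made up for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression with plain batch gradient descent."""
    n, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(bias + sum(w * x for w, x in zip(weights, xi)))
            err = p - yi  # gradient of the log-loss w.r.t. the linear score
            for j, x in enumerate(xi):
                grad_w[j] += err * x
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_default(weights, bias, x):
    """Estimated probability of default for one applicant."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


# Toy data (made up): columns are [debt_ratio, credit_score_scaled], both in [0, 1].
FEATURES = ["debt_ratio", "credit_score_scaled"]
X = [[0.9, 0.2], [0.8, 0.3], [0.7, 0.6], [0.4, 0.4],
     [0.3, 0.8], [0.2, 0.9], [0.6, 0.3], [0.3, 0.6]]
y = [1, 1, 0, 1, 0, 0, 1, 0]  # 1 = default, 0 = fully repaid

weights, bias = fit_logistic(X, y)

# exp(coefficient) is the multiplicative change in the odds of default
# for a one-unit increase in that feature, holding the other fixed.
odds_ratios = {name: math.exp(w) for name, w in zip(FEATURES, weights)}

p_risky = predict_default(weights, bias, [0.9, 0.2])
p_safe = predict_default(weights, bias, [0.2, 0.9])
```

Because each feature contributes a single signed coefficient, a rejection can be explained feature by feature, which is harder to do with more complex models.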

File details

  • codefile
    • This folder contains the .rmd files that hold the code.
  • data
    • This folder contains the data.

Here is the project structure:

- project
    > datamining-credit-risk
        > data 
            - data_dictionary.xlsx
            - data_lending_club.csv
            - data_unsupervised.csv
            - train.csv
            - test.csv
        > codefile 
            - data_cleaning.rmd
            - data_visualization.rmd
            - unsupervised_learning.rmd
            - supervised_learning.rmd

Features

  • My profile
  • My GitHub