Data Mining & Machine Learning for credit risk analysis

Content

As individual investors, we want to start investing on Lending Club, a crowdfunding platform based on peer-to-peer lending, and build a financial portfolio. To do so, we want to build a credit model that supports the selection of investment projects by classifying potential borrowers into two classes: clients who will default and clients who will fully repay their loan. Our goal is to use the data stored by the platform, together with the data provided by borrowers at the time of their loan application, to design a statistical machine learning model that estimates the probability that a borrower will repay the loan, so that we can select the loans most likely to be repaid.

Generally, the risk of a loan is measured through a credit score assigned to the borrower according to their risk profile (e.g. FICO scores). We will thus try to model the solvency of a client, i.e. their capacity to repay their credit lines on time. The higher a customer's credit rating, the higher their creditworthiness and, consequently, the lower the risk of default and the more likely they are to be granted credit. The model will be based on data collected from 20,000 recent borrowers who were granted consumer credit through the platform's current loan underwriting process.

The future credit granting model will be built from predictive modeling tools under several constraints:

First, the model must strictly comply with the regulatory constraints that the bank must face.
Second, our work must honor the fundamental commitments of the data professions: ethics and the General Data Protection Regulation (GDPR).
Finally, the last constraint is interpretability. We are not necessarily looking for the best-performing or the most complex model. For predictive work on such a sensitive subject, we prefer a less powerful model that makes it much simpler to explain to a customer why their credit application was rejected.
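
The interpretability constraint can be illustrated with a small sketch. The project's own code lives in R Markdown files; the Python below is only an illustration and assumes a logistic regression as the interpretable model (this README does not say which model was actually chosen). The feature names `fico_scaled` and `debt_to_income` and the toy data are hypothetical and are not taken from the Lending Club files.

```python
import math

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one borrower (predicted probability - label).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_default_proba(w, b, x):
    """Estimated probability that a borrower with features x defaults."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical toy data: [fico_scaled, debt_to_income], label 1 = default.
FEATURES = ["fico_scaled", "debt_to_income"]
X = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.3, 0.7], [0.2, 0.8], [0.1, 0.9]]
y = [0, 0, 0, 1, 1, 1]

w, b = fit_logistic(X, y)
for name, coef in zip(FEATURES, w):
    # The sign of each coefficient gives the direction of its effect on default risk.
    print(f"{name}: {coef:+.2f}")
```

Because the sign and size of each coefficient can be read directly, a rejection can be explained in terms of the features that pushed the estimated default probability up, which is harder to do with a more complex model.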

File details

codefile
- This folder contains the .rmd files that hold the code.
data
- This folder contains the data.

Here is the project structure:

- project
    > datamining-credit-risk
        > data 
            - data_dictionary.xlsx
            - data_lending_club.csv
            - data_unsupervised.csv
            - train.csv
            - test.csv
        > codefile 
            - data_cleaning.rmd
            - data_visualization.rmd
            - unsupervised_learning.rmd
            - supervised_learning.rmd

Features

My profile • My GitHub

lprtk/datamining-credit-risk