In [None]:
"""
Domain 
    Banking/Loan

Focus 
    Lower NPA (Non-Performing Asset)

Business challenge/requirement
    PeerLoanKart is an NBFC (Non-Banking Financial Company) that facilitates 
    peer-to-peer loans. It connects people who need money (borrowers) with people 
    who have money (investors). As an investor, you want to invest in borrowers 
    whose profile indicates a high probability of paying you back. 
    As the ML expert, create a model that predicts whether a borrower will 
    repay the loan or not. 

Key issues
    Keep NPAs low: PeerLoanKart wants to be very diligent when approving 
    loans to borrowers

Considerations
    NONE

Data volume
    - 9578 records in the file loan_borowwer_data.csv 

Fields in Data
    • credit.policy: 1 if the customer meets the credit underwriting criteria of 
        PeerLoanKart, and 0 otherwise
    • purpose: The purpose of the loan (takes values "credit_card", 
        "debt_consolidation", "educational", "home_improvement", "major_purchase", 
        "small_business", and "all_other")
    • int.rate: The interest rate of the loan, as a proportion (a rate of 11% would be 
        stored as 0.11). Borrowers judged by PeerLoanKart to be riskier are assigned 
        higher interest rates
    • installment: The monthly installments owed by the borrower if the loan is 
        funded
    • log.annual.inc: The natural log of the self-reported annual income of the 
        borrower
    • dti: The debt-to-income ratio of the borrower (amount of debt divided by 
        annual income)
    • fico: The FICO credit score of the borrower
    • days.with.cr.line: The number of days the borrower has had a credit line
    • revol.bal: The borrower's revolving balance (amount unpaid at the end of the 
        credit card billing cycle)
    • revol.util: The borrower's revolving line utilization rate (the amount of the 
        credit line used relative to total credit available)
    • inq.last.6mths: The borrower's number of inquiries by creditors in the last 6 
        months
    • delinq.2yrs: The number of times the borrower had been 30+ days past due on 
        a payment in the past 2 years
    • pub.rec: The borrower's number of derogatory public records (bankruptcy 
        filings, tax liens, or judgments)
    • not.fully.paid: The output (target) field. 1 means the borrower did not 
        repay the loan in full

"""
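
Two of the fields described above are stored in transformed form: int.rate is a proportion and log.annual.inc is a natural log. A minimal sketch of converting them back to readable units, using the values from the first row of the df.head() output in this notebook; the variable names are illustrative only.

```python
import math

# Values copied from the first row of df.head() in this notebook.
log_annual_inc = 11.350407
int_rate = 0.1189

# log.annual.inc is a natural log, so exp() recovers the annual income.
annual_income = math.exp(log_annual_inc)

# int.rate is a proportion, so multiply by 100 to get a percentage.
int_rate_percent = int_rate * 100

print(round(annual_income), round(int_rate_percent, 2))
```

Keeping income on the log scale is harmless for tree-based models, so this conversion is only needed when reading or reporting the data.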

In [11]:
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

In [2]:
CSV_PATH = r'D:\CourseWork\data-science-python-certification-course\Assignments\07 Supervised Learning - 1\Case Study III\resources\loan_borowwer_data.csv'
df = pd.read_csv(CSV_PATH)

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9578 entries, 0 to 9577
Data columns (total 14 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   credit.policy      9578 non-null   int64  
 1   purpose            9578 non-null   object 
 2   int.rate           9578 non-null   float64
 3   installment        9578 non-null   float64
 4   log.annual.inc     9578 non-null   float64
 5   dti                9578 non-null   float64
 6   fico               9578 non-null   int64  
 7   days.with.cr.line  9578 non-null   float64
 8   revol.bal          9578 non-null   int64  
 9   revol.util         9578 non-null   float64
 10  inq.last.6mths     9578 non-null   int64  
 11  delinq.2yrs        9578 non-null   int64  
 12  pub.rec            9578 non-null   int64  
 13  not.fully.paid     9578 non-null   int64  
dtypes: float64(6), int64(7), object(1)
memory usage: 1.0+ MB


In [4]:
df.head(10)

,credit.policy,purpose,int.rate,installment,log.annual.inc,dti,fico,days.with.cr.line,revol.bal,revol.util,inq.last.6mths,delinq.2yrs,pub.rec,not.fully.paid
0,1,debt_consolidation,0.1189,829.1,11.350407,19.48,737,5639.958333,28854,52.1,0,0,0,0
1,1,credit_card,0.1071,228.22,11.082143,14.29,707,2760.0,33623,76.7,0,0,0,0
2,1,debt_consolidation,0.1357,366.86,10.373491,11.63,682,4710.0,3511,25.6,1,0,0,0
3,1,debt_consolidation,0.1008,162.34,11.350407,8.1,712,2699.958333,33667,73.2,1,0,0,0
4,1,credit_card,0.1426,102.92,11.299732,14.97,667,4066.0,4740,39.5,0,1,0,0
5,1,credit_card,0.0788,125.13,11.904968,16.98,727,6120.041667,50807,51.0,0,0,0,0
6,1,debt_consolidation,0.1496,194.02,10.714418,4.0,667,3180.041667,3839,76.8,0,0,1,1
7,1,all_other,0.1114,131.22,11.0021,11.08,722,5116.0,24220,68.6,0,0,0,1
8,1,home_improvement,0.1134,87.19,11.407565,17.25,682,3989.0,69909,51.1,1,0,0,0
9,1,debt_consolidation,0.1221,84.12,10.203592,10.0,707,2730.041667,5630,23.0,1,0,0,0


In [6]:
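# Encoding
# Replace the categorical 'purpose' strings with integer codes.
# LabelEncoder assigns the codes in sorted order of the category names.
le = LabelEncoder()
df['purpose'] = le.fit_transform(df['purpose'])
df.head(10)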

,credit.policy,purpose,int.rate,installment,log.annual.inc,dti,fico,days.with.cr.line,revol.bal,revol.util,inq.last.6mths,delinq.2yrs,pub.rec,not.fully.paid
0,1,2,0.1189,829.1,11.350407,19.48,737,5639.958333,28854,52.1,0,0,0,0
1,1,1,0.1071,228.22,11.082143,14.29,707,2760.0,33623,76.7,0,0,0,0
2,1,2,0.1357,366.86,10.373491,11.63,682,4710.0,3511,25.6,1,0,0,0
3,1,2,0.1008,162.34,11.350407,8.1,712,2699.958333,33667,73.2,1,0,0,0
4,1,1,0.1426,102.92,11.299732,14.97,667,4066.0,4740,39.5,0,1,0,0
5,1,1,0.0788,125.13,11.904968,16.98,727,6120.041667,50807,51.0,0,0,0,0
6,1,2,0.1496,194.02,10.714418,4.0,667,3180.041667,3839,76.8,0,0,1,1
7,1,0,0.1114,131.22,11.0021,11.08,722,5116.0,24220,68.6,0,0,0,1
8,1,4,0.1134,87.19,11.407565,17.25,682,3989.0,69909,51.1,1,0,0,0
9,1,2,0.1221,84.12,10.203592,10.0,707,2730.041667,5630,23.0,1,0,0,0
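
The integer codes above come from sorting the category names, which gives purpose an arbitrary numeric order. A hedged sketch on a small hypothetical frame (not the real dataset) that shows how to inspect the mapping and, as one possible alternative, one-hot encode the column with pd.get_dummies:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample; the values mirror purposes seen in df.head().
sample = pd.DataFrame({
    "purpose": ["debt_consolidation", "credit_card", "all_other", "home_improvement"]
})

le = LabelEncoder()
codes = le.fit_transform(sample["purpose"])

# Map each category name to its integer code (classes_ is sorted).
mapping = {name: code for code, name in enumerate(le.classes_)}
print(mapping)

# Alternative: one indicator column per category, with no implied order.
one_hot = pd.get_dummies(sample, columns=["purpose"], prefix="purpose")
print(one_hot.columns.tolist())
```

The codes in this sample differ from the notebook's for some categories because the sample contains fewer distinct purposes. Tree ensembles usually tolerate integer codes, so one-hot encoding is a design option here and not a required fix.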


In [9]:
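# Features and Target Extraction
# x holds every column except the target; y is the target column.
x = df.drop('not.fully.paid', axis=1)
y = df['not.fully.paid']
# Hold out 30% of the rows for testing; random_state makes the split repeatable.
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=157)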
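
If the target classes are imbalanced, passing stratify to train_test_split keeps the class ratio the same in the training and test sets. This is an optional variation, not what the cell above does; the data below is synthetic and the 20% positive rate is an assumption made for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 1000 rows, 20% positives (an assumed ratio).
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(1000, 3))
y_demo = np.array([1] * 200 + [0] * 800)

xs_train, xs_test, ys_train, ys_test = train_test_split(
    x_demo, y_demo, test_size=0.3, random_state=157, stratify=y_demo
)

# Both splits keep the same share of positives as the full data.
print(ys_train.mean(), ys_test.mean())
```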

In [12]:
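# Model Fitting and Accuracy
# No random_state is passed to the forest, so the score varies slightly between runs.
rfc = RandomForestClassifier()
rfc.fit(x_train, y_train)
predicted = rfc.predict(x_test)
# Fraction of test rows classified correctly
accuracy_score(y_test, predicted)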

0.8340292275574113
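
Because the stated focus is lowering NPAs, accuracy alone can hide how many non-paying borrowers the model misses. A minimal sketch, on synthetic imbalanced data and not the loan dataset, of comparing accuracy with a majority-class baseline and with recall on class 1; all names and the class ratio are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (roughly 85% class 0); not the loan dataset.
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.85, 0.15], random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

model = RandomForestClassifier(random_state=0)
model.fit(X_tr, y_tr)
pred = model.predict(X_te)

# Baseline: accuracy obtained by always predicting the majority class (0).
baseline_acc = 1 - y_te.mean()

acc = accuracy_score(y_te, pred)
# Recall on class 1: share of actual positives that the model catches.
recall_pos = recall_score(y_te, pred, pos_label=1)
# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_te, pred)

print(f"baseline={baseline_acc:.3f} accuracy={acc:.3f} recall(1)={recall_pos:.3f}")
print(cm)
```

Applied to this notebook, the same metrics could be computed from y_test and predicted; an accuracy close to the majority-class baseline would suggest the model rarely flags risky borrowers.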