# Loan Tap 

### Life Cycle of Project

   1. Understanding the business problem
   2. Data Collection
   3. Data Checks to perform 
   4. Exploratory Data Analysis
   5. Data Pre-Processing 
   6. Model Training 
   7. Choose Model
    

# Context 
LoanTap is an online platform committed to delivering customized loan products to millennials. They innovate in an otherwise dull loan segment to deliver instant, flexible loans on consumer-friendly terms to salaried professionals and businessmen.

The data science team at LoanTap is building an underwriting layer to determine the creditworthiness of MSMEs as well as individuals.



LoanTap deploys formal credit to salaried individuals and businesses through 4 main financial instruments:

1. Personal Loan
2. EMI Free Loan
3. Personal Overdraft
4. Advance Salary Loan 


**This case study will focus on the underwriting process behind Personal Loan only**

# Problem Statement:

### Given a set of attributes for an Individual, determine if a credit line should be extended to them. If so, what should the repayment terms be in business recommendations?

1. loan_amnt : The listed amount of the loan applied for by the borrower. If at some point in time, the credit department reduces the loan amount, then it will be reflected in this value.
2. term : The number of payments on the loan. Values are in months and can be either 36 or 60.
3. int_rate : Interest Rate on the loan
4. installment : The monthly payment owed by the borrower if the loan originates.
5. grade : LoanTap assigned loan grade
6. sub_grade : LoanTap assigned loan subgrade
7. emp_title : The job title supplied by the borrower when applying for the loan.
8. emp_length : Employment length in years. Possible values are between 0 and 10 where 0 means less than one year and 10 means ten or more years.
9. home_ownership : The home ownership status provided by the borrower during registration or obtained from the credit report.
10. annual_inc : The self-reported annual income provided by the borrower during registration.
11. verification_status : Indicates if income was verified by LoanTap, not verified, or if the income source was verified
12. issue_d : The month in which the loan was funded
13. loan_status : Current status of the loan - Target Variable
14. purpose : A category provided by the borrower for the loan request.
15. title : The loan title provided by the borrower
16. dti : Debt-to-income ratio: the borrower’s total monthly debt payments on all debt obligations, excluding mortgage and the requested LoanTap loan, divided by the borrower’s self-reported monthly income.
17. earliest_cr_line : The month the borrower's earliest reported credit line was opened
18. open_acc : The number of open credit lines in the borrower's credit file.
19. pub_rec : Number of derogatory public records
20. revol_bal : Total credit revolving balance
21. revol_util : Revolving line utilization rate, or the amount of credit the borrower is using relative to all available revolving credit.
22. total_acc : The total number of credit lines currently in the borrower's credit file
23. initial_list_status : The initial listing status of the loan. Possible values are w and f
24. application_type : Indicates whether the loan is an individual application or a joint application with two co-borrowers
25. mort_acc : Number of mortgage accounts.
26. pub_rec_bankruptcies : Number of public record bankruptcies
27. address : Address of the individual
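
The definitions of `term` and `emp_length` above describe numeric quantities (months, and years from 0 to 10), while the raw columns hold strings such as `' 36 months'`, `'10+ years'` and `'< 1 year'`. A minimal sketch of that conversion follows, assuming only those string patterns occur. The helper names `parse_term` and `parse_emp_length` are hypothetical and not part of the original notebook.

```python
import numpy as np
import pandas as pd


def parse_term(term: pd.Series) -> pd.Series:
    """Extract the number of months from strings such as ' 36 months'."""
    months = term.str.extract(r"(\d+)", expand=False)
    return pd.to_numeric(months).astype("Int64")


def parse_emp_length(emp_length: pd.Series) -> pd.Series:
    """Map '< 1 year' to 0, '10+ years' to 10 and 'N years' to N.

    Missing values stay missing (pandas <NA>).
    """
    years = pd.to_numeric(emp_length.str.extract(r"(\d+)", expand=False))
    # '< 1 year' contains the digit 1 but means less than one year,
    # so it is forced to 0 to match the data dictionary.
    under_one = emp_length.str.startswith("<", na=False)
    return years.mask(under_one, 0).astype("Int64")


# Toy input that mirrors the string patterns, not the real dataset.
term_demo = parse_term(pd.Series([" 36 months", " 60 months"], dtype=object))
emp_demo = parse_emp_length(
    pd.Series(["10+ years", "4 years", "< 1 year", "1 year", np.nan], dtype=object)
)
print(term_demo.tolist())
print(emp_demo.tolist())
```

The nullable `Int64` dtype is one possible choice here; it keeps missing employment lengths as `<NA>` instead of forcing an imputation decision at parse time.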

#### Import Data and Required Libraries.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os
from scipy import stats 

In [2]:
print(os.getcwd())

/Users/sudhirmalik/Documents/Scaler class notes/MLOPS/Loan_Tap_MLOPS/notebook


In [5]:
# Read the dataset from a CSV file
df = pd.read_csv('data/loan_data.csv')

# Display the first few rows of the dataset
df.head() 


| | loan_amnt | term | int_rate | installment | grade | sub_grade | emp_title | emp_length | home_ownership | annual_inc | ... | open_acc | pub_rec | revol_bal | revol_util | total_acc | initial_list_status | application_type | mort_acc | pub_rec_bankruptcies | address |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10000.0 | 36 months | 11.44 | 329.48 | B | B4 | Marketing | 10+ years | RENT | 117000.0 | ... | 16.0 | 0.0 | 36369.0 | 41.8 | 25.0 | w | INDIVIDUAL | 0.0 | 0.0 | 0174 Michelle Gateway\r\nMendozaberg, OK 22690 |
| 1 | 8000.0 | 36 months | 11.99 | 265.68 | B | B5 | Credit analyst | 4 years | MORTGAGE | 65000.0 | ... | 17.0 | 0.0 | 20131.0 | 53.3 | 27.0 | f | INDIVIDUAL | 3.0 | 0.0 | 1076 Carney Fort Apt. 347\r\nLoganmouth, SD 05113 |
| 2 | 15600.0 | 36 months | 10.49 | 506.97 | B | B3 | Statistician | < 1 year | RENT | 43057.0 | ... | 13.0 | 0.0 | 11987.0 | 92.2 | 26.0 | f | INDIVIDUAL | 0.0 | 0.0 | 87025 Mark Dale Apt. 269\r\nNew Sabrina, WV 05113 |
| 3 | 7200.0 | 36 months | 6.49 | 220.65 | A | A2 | Client Advocate | 6 years | RENT | 54000.0 | ... | 6.0 | 0.0 | 5472.0 | 21.5 | 13.0 | f | INDIVIDUAL | 0.0 | 0.0 | 823 Reid Ford\r\nDelacruzside, MA 00813 |
| 4 | 24375.0 | 60 months | 17.27 | 609.33 | C | C5 | Destiny Management Inc. | 9 years | MORTGAGE | 55000.0 | ... | 13.0 | 0.0 | 24584.0 | 69.8 | 43.0 | f | INDIVIDUAL | 1.0 | 0.0 | 679 Luna Roads\r\nGreggshire, VA 11650 |


In [19]:
def star():
    """Print a row of 80 asterisks as a visual separator between outputs."""
    print(80 * '*')


In [10]:
star()
print("Shape of the dataset")
print(df.shape)

star()
print("Data types of the dataset")
print(df.dtypes)

star()
print("Null values present in the dataset")
print(df.isna().sum())

star()
print("Percentage of null values per column")
print(df.isna().mean() * 100)

star()
print("Duplicated rows in the dataset")
print(df.duplicated().sum())

star()
print("Information about the dataset")
df.info()  # info() prints directly and returns None, so it is not wrapped in print()

star()
print("Description of the dataset")
print(df.describe())

********************************************************************************
Shape of the dataset
(396030, 27)
********************************************************************************
Data types of the dataset
loan_amnt               float64
term                     object
int_rate                float64
installment             float64
grade                    object
sub_grade                object
emp_title                object
emp_length               object
home_ownership           object
annual_inc              float64
verification_status      object
issue_d                  object
loan_status              object
purpose                  object
title                    object
dti                     float64
earliest_cr_line         object
open_acc                float64
pub_rec                 float64
revol_bal               float64
revol_util              float64
total_acc               float64
initial_list_status      object
application_type         object
mort_acc  
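
The null-count and null-percentage checks in the cell above can be combined into one table that lists only the affected columns. The following is a minimal sketch on a toy frame, not the author's method; `missing_summary` and `demo_missing` are hypothetical names introduced for illustration.

```python
import numpy as np
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return count and percentage of missing values, for columns that have any."""
    n_missing = frame.isna().sum()
    summary = pd.DataFrame(
        {
            "n_missing": n_missing,
            "pct_missing": (n_missing / len(frame) * 100).round(2),
        }
    )
    # Keep only columns with at least one missing value, worst first.
    affected = summary[summary["n_missing"] > 0]
    return affected.sort_values("n_missing", ascending=False)


# Toy data: column 'a' has 2 of 4 values missing, 'b' has 1, 'c' has none.
demo_missing = pd.DataFrame(
    {
        "a": [1.0, np.nan, 3.0, np.nan],
        "b": ["x", "y", None, "z"],
        "c": [1, 2, 3, 4],
    }
)
missing_result = missing_summary(demo_missing)
print(missing_result)
```

On the loan data this would be called as `missing_summary(df)`.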

In [9]:
df.describe(include='all')

| | loan_amnt | term | int_rate | installment | grade | sub_grade | emp_title | emp_length | home_ownership | annual_inc | ... | open_acc | pub_rec | revol_bal | revol_util | total_acc | initial_list_status | application_type | mort_acc | pub_rec_bankruptcies | address |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 396030.0 | 396030 | 396030.0 | 396030.0 | 396030 | 396030 | 373103 | 377729 | 396030 | 396030.0 | ... | 396030.0 | 396030.0 | 396030.0 | 395754.0 | 396030.0 | 396030 | 396030 | 358235.0 | 395495.0 | 396030 |
| unique | | 2 | | | 7 | 35 | 173105 | 11 | 6 | | ... | | | | | | 2 | 3 | | | 393700 |
| top | | 36 months | | | B | B3 | Teacher | 10+ years | MORTGAGE | | ... | | | | | | f | INDIVIDUAL | | | USCGC Smith\r\nFPO AE 70466 |
| freq | | 302005 | | | 116018 | 26655 | 4389 | 126041 | 198348 | | ... | | | | | | 238066 | 395319 | | | 8 |
| mean | 14113.888089 | | 13.6394 | 431.849698 | | | | | | 74203.18 | ... | 11.311153 | 0.178191 | 15844.54 | 53.791749 | 25.414744 | | | 1.813991 | 0.121648 | |
| std | 8357.441341 | | 4.472157 | 250.72779 | | | | | | 61637.62 | ... | 5.137649 | 0.530671 | 20591.84 | 24.452193 | 11.886991 | | | 2.14793 | 0.356174 | |
| min | 500.0 | | 5.32 | 16.08 | | | | | | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | | | 0.0 | 0.0 | |
| 25% | 8000.0 | | 10.49 | 250.33 | | | | | | 45000.0 | ... | 8.0 | 0.0 | 6025.0 | 35.8 | 17.0 | | | 0.0 | 0.0 | |
| 50% | 12000.0 | | 13.33 | 375.43 | | | | | | 64000.0 | ... | 10.0 | 0.0 | 11181.0 | 54.8 | 24.0 | | | 1.0 | 0.0 | |
| 75% | 20000.0 | | 16.49 | 567.3 | | | | | | 90000.0 | ... | 14.0 | 0.0 | 19620.0 | 72.9 | 32.0 | | | 3.0 | 0.0 | |
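
In the summary above, the mean of `annual_inc` (74203.18) sits well above its median (64000.0), which hints at a right-skewed distribution. One way to quantify this is sketched below, assuming pandas' built-in sample skewness is an acceptable measure. `skew_report` and the toy values are hypothetical and not taken from the dataset.

```python
import pandas as pd


def skew_report(frame: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Compare mean and median and report sample skewness for numeric columns."""
    report = frame[columns].agg(["mean", "median", "skew"]).T
    # A mean above the median is a quick sign of a long right tail.
    report["mean_above_median"] = report["mean"] > report["median"]
    return report


# Toy incomes with one very large value to create a right tail.
demo_income = pd.DataFrame({"annual_inc": [30000, 45000, 64000, 90000, 400000]})
income_report = skew_report(demo_income, ["annual_inc"])
print(income_report)
```

On the loan data this could be run as `skew_report(df, numeric_features)` once `numeric_features` has been defined.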


In [12]:
df.nunique()

loan_amnt                 1397
term                         2
int_rate                   566
installment              55706
grade                        7
sub_grade                   35
emp_title               173105
emp_length                  11
home_ownership               6
annual_inc               27197
verification_status          3
issue_d                    115
loan_status                  2
purpose                     14
title                    48816
dti                       4262
earliest_cr_line           684
open_acc                    61
pub_rec                     20
revol_bal                55622
revol_util                1226
total_acc                  118
initial_list_status          2
application_type             3
mort_acc                    33
pub_rec_bankruptcies         9
address                 393700
dtype: int64
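
The counts above range from 2 levels (`term`) to 393700 (`address`), so text columns will likely need different handling depending on cardinality. Below is a minimal sketch that separates the two groups. The function name `split_by_cardinality` and the default threshold of 50 are assumptions for illustration, not values from the original notebook.

```python
import pandas as pd


def split_by_cardinality(frame: pd.DataFrame, columns: list, max_levels: int = 50):
    """Split columns into low- and high-cardinality lists by distinct-value count."""
    counts = frame[columns].nunique()
    low = counts[counts <= max_levels].index.tolist()
    high = counts[counts > max_levels].index.tolist()
    return low, high


# Toy data: 'grade' has 3 distinct values, 'address' has 4.
demo_levels = pd.DataFrame(
    {
        "grade": ["A", "B", "A", "C"],
        "address": ["a1", "a2", "a3", "a4"],
    }
)
low_cols, high_cols = split_by_cardinality(
    demo_levels, ["grade", "address"], max_levels=3
)
print(low_cols, high_cols)
```

With a threshold of 50 and the counts listed above, columns such as `emp_title`, `title` and `address` would land in the high-cardinality group, while `grade` and `sub_grade` would stay in the low-cardinality one.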

In [15]:
# Defining numeric and categorical features 
numeric_features = [feature for feature in df.columns if df[feature].dtypes != 'O'] 
categorical_features = [feature for feature in df.columns if df[feature].dtypes == 'O']
 
print(f"Numeric Features: {numeric_features}")
print(f"Categorical Features: {categorical_features}") 


Numeric Features: ['loan_amnt', 'int_rate', 'installment', 'annual_inc', 'dti', 'open_acc', 'pub_rec', 'revol_bal', 'revol_util', 'total_acc', 'mort_acc', 'pub_rec_bankruptcies']
Categorical Features: ['term', 'grade', 'sub_grade', 'emp_title', 'emp_length', 'home_ownership', 'verification_status', 'issue_d', 'loan_status', 'purpose', 'title', 'earliest_cr_line', 'initial_list_status', 'application_type', 'address']


In [23]:
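# Checking the unique values in each categorical feature.
# Note: unique() includes NaN, so a count here can be one higher than df.nunique()
# (for example, emp_title: 173106 here versus 173105 from nunique()).
for feature in categorical_features:
    print(f"{feature} has :{df[feature].unique()}")
    print(f"{feature} has :{len(df[feature].unique())} unique values")
    star()
    print()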



term has :[' 36 months' ' 60 months']
term has :2 unique values
********************************************************************************

grade has :['B' 'A' 'C' 'E' 'D' 'F' 'G']
grade has :7 unique values
********************************************************************************

sub_grade has :['B4' 'B5' 'B3' 'A2' 'C5' 'C3' 'A1' 'B2' 'C1' 'A5' 'E4' 'A4' 'A3' 'D1'
 'C2' 'B1' 'D3' 'D5' 'D2' 'E1' 'E2' 'E5' 'F4' 'E3' 'D4' 'G1' 'F5' 'G2'
 'C4' 'F1' 'F3' 'G5' 'G4' 'F2' 'G3']
sub_grade has :35 unique values
********************************************************************************

emp_title has :['Marketing' 'Credit analyst ' 'Statistician' ...
 "Michael's Arts & Crafts" 'licensed bankere' 'Gracon Services, Inc']
emp_title has :173106 unique values
********************************************************************************

emp_length has :['10+ years' '4 years' '< 1 year' '6 years' '9 years' '2 years' '3 years'
 '8 years' '7 years' '5 years' '1 year' nan]
emp_leng
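
The unique values printed above contain stray whitespace: `term` values start with a space (`' 36 months'`) and `emp_title` includes `'Credit analyst '` with a trailing space. A minimal sketch of a cleanup step follows, assuming text columns have `object` dtype as in the dtypes output earlier. `strip_object_columns` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd


def strip_object_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with leading and trailing whitespace removed from text columns."""
    cleaned = frame.copy()
    for col in cleaned.select_dtypes(include="object").columns:
        # .str.strip() leaves NaN untouched; non-string objects would become NaN.
        cleaned[col] = cleaned[col].str.strip()
    return cleaned


# Toy data that mirrors the whitespace issues, not the real dataset.
demo_text = pd.DataFrame(
    {
        "term": [" 36 months", " 60 months"],
        "emp_title": ["Credit analyst ", "Credit analyst"],
    },
    dtype=object,
)
cleaned_text = strip_object_columns(demo_text)
print(cleaned_text["term"].tolist())
print(cleaned_text["emp_title"].nunique())
```

Stripping first means that values differing only by surrounding spaces are counted as one category in later steps.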

In [27]:
%pip install --upgrade pip setuptools

Note: you may need to restart the kernel to use updated packages.


In [None]:
%pip install catboost