## Predicting Loan Default Risk

In this project, the goal is to build a classification model that predicts whether a loan applicant is likely to default on a loan. Several machine learning algorithms, including logistic regression, decision trees, and random forests, are compared to identify the best-performing model. Feature engineering also plays a critical role in improving model performance: handling missing values, scaling numeric features, and encoding categorical variables.
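
The workflow described above can be sketched with scikit-learn. This is only an illustrative outline, not the notebook's actual modelling code: the data is a small synthetic stand-in, the `toy` frame, its target rule, and the choice of ROC AUC with 3-fold cross-validation are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (NOT the real loan file); column names mirror the dataset.
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame({
    "Income": rng.integers(20_000, 150_000, n).astype(float),
    "CreditScore": rng.integers(300, 850, n).astype(float),
    "LoanPurpose": rng.choice(["Auto", "Home", "Other"], n),
})
toy.loc[toy.index % 25 == 0, "Income"] = np.nan   # inject some missing values
y = (toy["CreditScore"] < 550).astype(int)         # made-up target for the demo

numeric = ["Income", "CreditScore"]
categorical = ["LoanPurpose"]

# Impute and scale numeric columns; one-hot encode categorical columns.
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Compare the candidate models with the same preprocessing and the same folds.
scores = {}
for name, model in models.items():
    pipe = Pipeline([("prep", preprocess), ("model", model)])
    scores[name] = cross_val_score(pipe, toy, y, cv=3, scoring="roc_auc").mean()

for name, score in scores.items():
    print(f"{name}: mean ROC AUC = {score:.3f}")
```

Wrapping the preprocessing and the model in one `Pipeline` means the imputer and scaler are fitted only on each training fold, which avoids leaking information from the validation fold.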

In [1]:
### Loading Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
### Data Ingestion
df = pd.read_csv("Loan_default.csv")

In [3]:
print(df.head(2))

       LoanID  Age  Income  LoanAmount  CreditScore  MonthsEmployed  \
0  I38PQUQS96   56   85994       50587          520              80   
1  HPSK72WA7R   69   50432      124440          458              15   

   NumCreditLines  InterestRate  LoanTerm  DTIRatio   Education  \
0               4         15.23        36      0.44  Bachelor's   
1               1          4.81        60      0.68    Master's   

  EmploymentType MaritalStatus HasMortgage HasDependents LoanPurpose  \
0      Full-time      Divorced         Yes           Yes       Other   
1      Full-time       Married          No            No       Other   

  HasCoSigner  Default  
0         Yes        0  
1         Yes        0  


In [4]:
df.shape

(255347, 18)

In [5]:
df.columns

Index(['LoanID', 'Age', 'Income', 'LoanAmount', 'CreditScore',
       'MonthsEmployed', 'NumCreditLines', 'InterestRate', 'LoanTerm',
       'DTIRatio', 'Education', 'EmploymentType', 'MaritalStatus',
       'HasMortgage', 'HasDependents', 'LoanPurpose', 'HasCoSigner',
       'Default'],
      dtype='object')

In [6]:
df.dtypes

LoanID             object
Age                 int64
Income              int64
LoanAmount          int64
CreditScore         int64
MonthsEmployed      int64
NumCreditLines      int64
InterestRate      float64
LoanTerm            int64
DTIRatio          float64
Education          object
EmploymentType     object
MaritalStatus      object
HasMortgage        object
HasDependents      object
LoanPurpose        object
HasCoSigner        object
Default             int64
dtype: object
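
Before encoding, it can be useful to check for missing values and to see how balanced the target is. The snippet below is a minimal sketch on a small made-up frame (`toy` is hypothetical and the missing entries are injected for illustration); the same two calls could be run on `df`.

```python
import numpy as np
import pandas as pd

# Toy frame with deliberately injected gaps; not taken from the real file.
toy = pd.DataFrame({
    "Income": [85994, 50432, np.nan, 31713],
    "Education": ["Bachelor's", "Master's", "PhD", None],
    "Default": [0, 0, 1, 0],
})

# Count of missing values per column.
missing_per_column = toy.isna().sum()

# Share of rows with Default == 1 (the positive class).
default_rate = toy["Default"].mean()

print(missing_per_column)
print(f"Default rate: {default_rate:.2%}")
```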

In [7]:
# Encode the categorical columns as integers.
# Series.map returns NaN for any value that is missing from the mapping.

# Education: ordered from lowest to highest level
education_mapping = {
    "High School": 1,
    "Bachelor's": 2,
    "Master's": 3,
    "PhD": 4,
}
df["Education"] = df["Education"].map(education_mapping)

# EmploymentType
employmentType_mapping = {
    "Unemployed": 1,
    "Self-employed": 2,
    "Part-time": 3,
    "Full-time": 4,
}
df["EmploymentType"] = df["EmploymentType"].map(employmentType_mapping)

# MaritalStatus
maritalstatus_mapping = {
    "Single": 1,
    "Married": 2,
    "Divorced": 3,
}
df["MaritalStatus"] = df["MaritalStatus"].map(maritalstatus_mapping)

# HasMortgage ("Yes" maps to 0 here, the reverse of the other Yes/No columns)
hasmortgage_mapping = {
    "Yes": 0,
    "No": 1,
}
df["HasMortgage"] = df["HasMortgage"].map(hasmortgage_mapping)

# LoanPurpose
loanpurpose_mapping = {
    "Auto": 1,
    "Business": 2,
    "Education": 3,
    "Home": 4,
    "Other": 5,
}
df["LoanPurpose"] = df["LoanPurpose"].map(loanpurpose_mapping)

# HasDependents
hasdependent_mapping = {
    "Yes": 1,
    "No": 0,
}
df["HasDependents"] = df["HasDependents"].map(hasdependent_mapping)

# HasCoSigner
hascosigner_mapping = {
    "Yes": 1,
    "No": 0,
}
df["HasCoSigner"] = df["HasCoSigner"].map(hascosigner_mapping)
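
Two caveats apply to dictionary-based encoding. First, `Series.map` silently produces `NaN` for any category not listed in the dictionary. Second, integer codes impose an order, which is reasonable for a column like `Education` but arbitrary for a nominal one like `LoanPurpose`. The sketch below illustrates one way to handle both; `map_strict` is a hypothetical helper written for this example, and the `toy` frame is made up.

```python
import pandas as pd

def map_strict(series, mapping):
    """Map values through `mapping`, raising if any non-null value is unmapped."""
    mapped = series.map(mapping)
    unmapped = series[mapped.isna() & series.notna()].unique()
    if len(unmapped) > 0:
        raise ValueError(f"Unmapped values in {series.name!r}: {list(unmapped)}")
    return mapped

# Toy data; not taken from the real file.
toy = pd.DataFrame({
    "Education": ["Bachelor's", "Master's", "PhD", "High School"],
    "LoanPurpose": ["Other", "Auto", "Home", "Other"],
})

# Ordinal column: integer codes preserve the natural order.
education_mapping = {"High School": 1, "Bachelor's": 2, "Master's": 3, "PhD": 4}
toy["Education"] = map_strict(toy["Education"], education_mapping)

# Nominal column: one-hot encoding avoids implying an order between purposes.
encoded = pd.get_dummies(toy, columns=["LoanPurpose"], dtype=int)
print(encoded)
```

Tree-based models are largely insensitive to the choice between integer codes and one-hot columns, but linear models such as logistic regression treat integer codes as magnitudes, so one-hot encoding is usually the safer option for nominal features.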

In [14]:
# Save the encoded data; index=False avoids writing the row index as an extra column
df.to_csv("Transformed_Data.csv", index=False)
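
When a frame is written with the default `index=True`, the row index is stored as an unnamed first column and reappears as `Unnamed: 0` on reload. The following sketch shows the difference on a tiny made-up frame, using in-memory buffers instead of files so that nothing is written to disk.

```python
import io
import pandas as pd

# Toy frame; not the real transformed data.
toy = pd.DataFrame({"Age": [56, 69], "Default": [0, 0]})

# Default behaviour: the index is written and comes back as an extra column.
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
columns_with_index = pd.read_csv(with_index).columns.tolist()
print(columns_with_index)   # ['Unnamed: 0', 'Age', 'Default']

# With index=False the round trip returns the original columns only.
without_index = io.StringIO()
toy.to_csv(without_index, index=False)
without_index.seek(0)
reloaded = pd.read_csv(without_index)
print(reloaded.columns.tolist())   # ['Age', 'Default']
```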