## Logistic Regression Model for Loan Approval

##### This logistic regression (LR) model uses five inputs (assets, liabilities, income, credit score, and mortgage) to predict whether a loan application is approved
##### The output (y) variable is `status`, with two classes: approve and deny
##### All input variables are normalized to the range 0 to 1
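
The notebook does not say how the inputs were normalized. One common way to reach a 0 to 1 range is min-max scaling, sketched below on made-up raw values. The `raw` DataFrame and its numbers are hypothetical and are not taken from `loans.csv`.

```python
import pandas as pd

# Hypothetical raw values for illustration only; loans.csv already holds scaled features.
raw = pd.DataFrame({
    "income": [30_000, 55_000, 80_000],
    "credit_score": [580, 690, 800],
})

# Min-max scaling, column by column: (x - min) / (max - min).
scaled = (raw - raw.min()) / (raw.max() - raw.min())
print(scaled)
```

After scaling, the smallest value in each column maps to 0 and the largest to 1, so features measured in different units become comparable.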

##### Import Libraries and Dependencies

In [1]:
from pathlib import Path
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
import numpy as np
from sklearn.metrics import confusion_matrix, classification_report

##### Read the data file

In [2]:
loans_df = pd.read_csv(Path('loans.csv'))
loans_df.head()

,assets,liabilities,income,credit_score,mortgage,status
0,0.210859,0.452865,0.281367,0.628039,0.302682,deny
1,0.395018,0.661153,0.330622,0.638439,0.502831,approve
2,0.291186,0.593432,0.438436,0.434863,0.315574,approve
3,0.45864,0.576156,0.744167,0.291324,0.394891,approve
4,0.46347,0.292414,0.489887,0.811384,0.566605,approve


In [3]:
loans_df.tail()

,assets,liabilities,income,credit_score,mortgage,status
95,0.360945,0.823295,0.542451,0.224285,0.328504,approve
96,0.11442,0.107174,0.619564,0.3703,0.047719,deny
97,0.309276,0.692433,0.48373,0.328953,0.304493,approve
98,0.549153,0.301588,0.651869,0.717826,0.602004,approve
99,0.448187,0.217651,0.38867,0.968609,0.606231,approve


In [4]:
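# Optional: encode status as 1/0 (not required, since scikit-learn classifiers accept string labels)
#loans_df.loc[(loans_df.status == 'approve'), 'status'] = 1
#loans_df.loc[(loans_df.status == 'deny'), 'status'] = 0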
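
The commented-out lines in the cell above would recode `status` as 1 (approve) and 0 (deny). If a numeric target is wanted, a mapping with `Series.map` is one alternative, sketched here on a toy Series that stands in for `loans_df['status']`. The name `status_demo` is hypothetical.

```python
import pandas as pd

# Toy stand-in for loans_df['status'] (hypothetical values).
status_demo = pd.Series(["deny", "approve", "approve"], name="status")

# Map each label to an integer code; any label missing from the dict would become NaN.
encoded = status_demo.map({"approve": 1, "deny": 0})
print(encoded.tolist())
```

Unlike repeated `.loc` assignments on an object column, `map` returns a new integer Series and leaves the original labels untouched.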

##### Define X (independent) and y (dependent) variables

In [5]:
# X input variables
X = loans_df.drop(columns=['status'])
X.head()

,assets,liabilities,income,credit_score,mortgage
0,0.210859,0.452865,0.281367,0.628039,0.302682
1,0.395018,0.661153,0.330622,0.638439,0.502831
2,0.291186,0.593432,0.438436,0.434863,0.315574
3,0.45864,0.576156,0.744167,0.291324,0.394891
4,0.46347,0.292414,0.489887,0.811384,0.566605


In [6]:
# y dependent variable
y = loans_df['status']
y.head()

0       deny
1    approve
2    approve
3    approve
4    approve
Name: status, dtype: object

##### Use train_test_split to split the dataset into training (75%) and testing (25%) sets, stratified on y

In [7]:
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1, stratify=y)

In [8]:
print(X_train.shape)
print(X_test.shape)

(75, 5)
(25, 5)
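
The 75/25 shapes above follow from the default `test_size` of 0.25 in `train_test_split`, and `stratify=y` keeps the class proportions roughly equal in both subsets. The sketch below shows this on synthetic data. The names `X_demo` and `y_demo` and the 48/52 class split are assumptions for illustration, not values read from `loans.csv`.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 rows, 5 features, and an assumed 48/52 class split.
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(
    rng.random((100, 5)),
    columns=["assets", "liabilities", "income", "credit_score", "mortgage"],
)
y_demo = pd.Series(["approve"] * 48 + ["deny"] * 52, name="status")

# Default test_size is 0.25; stratify keeps class shares similar in both subsets.
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=1, stratify=y_demo)

print(X_tr.shape, X_te.shape)
print(y_demo.value_counts(normalize=True))
print(y_te.value_counts(normalize=True))
```

Comparing the two `value_counts` outputs is a quick check that the test set is not skewed toward one class.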


##### Initialize the Logistic Regression Classification Model

In [9]:
classifier = LogisticRegression(solver='lbfgs', random_state=1)
classifier

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=1, solver='lbfgs', tol=0.0001, verbose=0,
                   warm_start=False)

##### Fit (Train) the Model using Training Data

In [10]:
classifier.fit(X_train, y_train)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=1, solver='lbfgs', tol=0.0001, verbose=0,
                   warm_start=False)

##### Score the Model

In [11]:
# Training
print(f'Training Score: {classifier.score(X_train, y_train)}')

# Testing
print(f'Testing Score: {classifier.score(X_test, y_test)}')

Training Score: 0.5466666666666666
Testing Score: 0.52
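
For a classifier, `score` returns mean accuracy, so it helps to compare 0.52 with a naive baseline. The sketch below computes the accuracy of always predicting the majority class, using the test-set class counts from the classification report at the end of this notebook (12 approve, 13 deny). The variable names are illustrative.

```python
# Test-set class counts, taken from the "support" column of the classification report.
support = {"approve": 12, "deny": 13}

# Accuracy of a baseline that always predicts the most frequent class.
majority_class = max(support, key=support.get)
baseline_accuracy = support[majority_class] / sum(support.values())

print(f"Majority class: {majority_class}")
print(f"Baseline accuracy: {baseline_accuracy:.2f}")
```

Under these counts the baseline reaches the same 0.52 as the model, which suggests the fitted model has learned little beyond the class balance.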


##### Predict the outcome (y variable) and compare it to the actual outcome

In [18]:
predicted_outcome = classifier.predict(X_test)
predicted_outcome

array(['deny', 'deny', 'deny', 'approve', 'deny', 'deny', 'deny',
       'approve', 'deny', 'approve', 'deny', 'deny', 'deny', 'deny',
       'deny', 'approve', 'deny', 'approve', 'deny', 'deny', 'deny',
       'approve', 'deny', 'deny', 'deny'], dtype=object)
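
`predict` returns only the winning label. The class probabilities behind each decision are available from `predict_proba`, and the predicted label is the class with the highest probability. The sketch below illustrates this on synthetic data. `X_demo`, `y_demo`, and the rule used to generate the labels are assumptions, not the loan data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data: labels loosely driven by the fourth feature (hypothetical rule).
rng = np.random.default_rng(1)
X_demo = rng.random((100, 5))
noise = 0.1 * rng.standard_normal(100)
y_demo = np.where(X_demo[:, 3] + noise > 0.5, "approve", "deny")

clf = LogisticRegression(solver="lbfgs", random_state=1).fit(X_demo, y_demo)

# One probability column per class, ordered as in clf.classes_.
proba = clf.predict_proba(X_demo[:5])
labels_from_proba = clf.classes_[proba.argmax(axis=1)]

print(clf.classes_)
print(proba.round(3))
print(labels_from_proba)
```

Probabilities close to 0.5 flag borderline applications that a hard label would hide.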

In [19]:
y_test.head()

76       deny
1     approve
8        deny
42       deny
16       deny
Name: status, dtype: object

In [22]:
# Combine the predicted and actual outcomes into a pandas DataFrame
actual_v_predicted = pd.DataFrame({'Predicted Decision': predicted_outcome, 'Actual Decision': y_test}).reset_index(drop=True)
actual_v_predicted.head()

,Predicted Decision,Actual Decision
0,deny,deny
1,deny,approve
2,deny,deny
3,approve,deny
4,deny,deny


### Model Performance

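##### Confusion Matrix: True Positives (TP), True Negatives (TN), False Positives (FP) and False Negatives (FN)
##### Rows are actual classes and columns are predicted classes, in sorted label order (approve, deny)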

In [24]:
confusion_matrix(y_test, predicted_outcome)

array([[ 3,  9],
       [ 3, 10]], dtype=int64)
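
The raw array is easier to read with labels attached. The sketch below rebuilds label vectors that reproduce the counts shown above and wraps the matrix in a DataFrame. `y_true_demo` and `y_pred_demo` are reconstructed for illustration and are not the notebook's `y_test` and `predicted_outcome` in their original order.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix

# Label vectors reconstructed to match the counts in the matrix above.
y_true_demo = np.array(["approve"] * 12 + ["deny"] * 13)
y_pred_demo = np.array(["approve"] * 3 + ["deny"] * 9 + ["approve"] * 3 + ["deny"] * 10)

labels = ["approve", "deny"]
cm = confusion_matrix(y_true_demo, y_pred_demo, labels=labels)

# Rows are actual classes, columns are predicted classes.
cm_df = pd.DataFrame(
    cm,
    index=[f"actual {name}" for name in labels],
    columns=[f"predicted {name}" for name in labels],
)
print(cm_df)
```

Treating approve as the positive class, the cells read TP = 3, FN = 9, FP = 3, and TN = 10.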

##### Classification Report

In [26]:
print(classification_report(y_test, predicted_outcome))

              precision    recall  f1-score   support

     approve       0.50      0.25      0.33        12
        deny       0.53      0.77      0.62        13

    accuracy                           0.52        25
   macro avg       0.51      0.51      0.48        25
weighted avg       0.51      0.52      0.48        25
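
The figures in the report can be checked by hand from the confusion matrix. The sketch below does the arithmetic for the approve class, treating approve as the positive class. The variable names are illustrative.

```python
# Counts from the confusion matrix, with 'approve' as the positive class.
tp, fn = 3, 9    # actual approve: predicted approve / predicted deny
fp, tn = 3, 10   # actual deny:    predicted approve / predicted deny

precision = tp / (tp + fp)                           # share of predicted approvals that were correct
recall = tp / (tp + fn)                              # share of actual approvals that were found
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of precision and recall
accuracy = (tp + tn) / (tp + fn + fp + tn)

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f} accuracy={accuracy:.2f}")
```

These values match the approve row and the accuracy row of the report. The low recall shows that the model misses most applications that should be approved.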

