# SBA Loan: Using the Model to Make Decisions
This notebook follows the structure of the article [*“Should This Loan be Approved or Denied?”*](https://doi.org/10.1080/10691898.2018.1434342) and showcases the **predictive ability of the logistic regression model built in the project**.

In [23]:
# Imports
import pandas as pd
import numpy as np
import statsmodels.api as sm

from Utils.prep import Preprocessor as Prep

In [24]:
# LOAD DATA TO PREDICT ON
df_loans = pd.read_csv('Data/SBA_loan_applications.csv')
df_loans

| | Loan | Name | City | Date | Loan amount requested | SBA portion guaranteed | Secured by real estate? |
|---|---|---|---|---|---|---|---|
| 0 | 1 | Carmichael Realty | Carmichael, CA | Current (not recession) | $1,000,000 | $750,000 | Yes |
| 1 | 2 | SV Consulting | San Leandro, CA | Current (not recession) | $100,000 | $40,000 | No |


### Preparing Data for Model
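- Rename `Secured by real estate?` to `RealEstate` and map its values to binary (`Yes` -> 1, `No` -> 0).
- Derive `Recession` from `Date`. Both applications are dated `Current (not recession)`, so `Recession` is set to 0.
- Compute `Portion` as `SBA portion guaranteed` / `Loan amount requested`, after converting both dollar amounts to numbers.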

In [25]:
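# PREPARING DATA FOR MODEL
# Prep.unsign is a project helper; it is expected to strip '$' and ',' so the amounts are numeric
df_loans_prep = Prep.unsign(df_loans, ['Loan amount requested', 'SBA portion guaranteed'])
# Map the Yes/No collateral flag to 1/0
df_loans_prep['RealEstate'] = df_loans['Secured by real estate?'].map({'Yes': 1, 'No': 0})
# Share of the requested amount that the SBA guarantees
df_loans_prep['Portion'] = (
    df_loans_prep['SBA portion guaranteed'] / df_loans_prep['Loan amount requested']
)
# Both applications are dated 'Current (not recession)', so Recession is 0
df_loans_prep['Recession'] = 0
# Keep the identifiers plus the three model predictors
df_loans_input = df_loans_prep[['Loan', 'Name', 'RealEstate', 'Portion', 'Recession']]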
df_loans_input

| | Loan | Name | RealEstate | Portion | Recession |
|---|---|---|---|---|---|
| 0 | 1 | Carmichael Realty | 1 | 0.75 | 0 |
| 1 | 2 | SV Consulting | 0 | 0.40 | 0 |
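
The currency cleanup relies on the project helper `Prep.unsign`, whose implementation is not shown in this notebook. As a rough illustration of the idea only, the sketch below uses a hypothetical stand-in named `strip_currency` (an assumption, not the project's code) that removes `$` and `,` before converting to floats, then derives `Portion` for the two applications.

```python
import pandas as pd


def strip_currency(df, columns):
    """Hypothetical stand-in for Prep.unsign: return a copy of df in which
    '$1,000,000'-style strings in the given columns become floats."""
    out = df.copy()
    for col in columns:
        out[col] = (
            out[col]
            .astype(str)
            .str.replace(r'[$,]', '', regex=True)  # drop dollar signs and thousands separators
            .astype(float)
        )
    return out


# The two dollar columns from the applications table above
demo = pd.DataFrame({
    'Loan amount requested': ['$1,000,000', '$100,000'],
    'SBA portion guaranteed': ['$750,000', '$40,000'],
})

clean = strip_currency(demo, list(demo.columns))
portion = clean['SBA portion guaranteed'] / clean['Loan amount requested']
print(portion.tolist())
```

Returning a copy leaves the original frame untouched, so the dollar-formatted columns remain available for display.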


### Building the Model
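Rebuild the final model from *SBA Loan: Building and Validating the Model* (`SBA-Model.ipynb`). The model is a logistic regression of `Default` on `RealEstate`, `Portion`, and `Recession`, fitted on the training rows of `SBAcase.csv` (`Selected == 1`).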

In [26]:
# MODEL REBUILD
df_sba_case = pd.read_csv('Data/SBAcase.csv')
predictors = ['RealEstate', 'Portion', 'Recession']
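# Training subset: rows flagged Selected == 1
df_train = df_sba_case[df_sba_case['Selected'] == 1].copy()

X_train = df_train[predictors]
y_train = df_train['Default']
# statsmodels does not add an intercept automatically
X_train = sm.add_constant(X_train)

# Fit the logistic regression by maximum likelihood
sm_logreg = sm.Logit(y_train, X_train).fit()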
sm_logreg.summary()

Optimization terminated successfully.
         Current function value: 0.515108
         Iterations 7


| | | | |
|---|---|---|---|
| Dep. Variable: | Default | No. Observations: | 1051 |
| Model: | Logit | Df Residuals: | 1047 |
| Method: | MLE | Df Model: | 3 |
| Date: | Thu, 28 Aug 2025 | Pseudo R-squ.: | 0.1732 |
| Time: | 06:13:07 | Log-Likelihood: | -541.38 |
| converged: | True | LL-Null: | -654.77 |
| Covariance Type: | nonrobust | LLR p-value: | 6.874e-49 |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 1.3931 | 0.322 | 4.332 | 0.000 | 0.763 | 2.023 |
| RealEstate | -2.1282 | 0.345 | -6.169 | 0.000 | -2.804 | -1.452 |
| Portion | -2.9875 | 0.539 | -5.540 | 0.000 | -4.044 | -1.931 |
| Recession | 0.5041 | 0.241 | 2.090 | 0.037 | 0.031 | 0.977 |
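
The coefficients are on the log-odds scale, so a default probability follows from the logistic function: p = 1 / (1 + exp(-(const + b1·RealEstate + b2·Portion + b3·Recession))). The sketch below applies that formula by hand to the two prepared applications, using the coefficients as rounded in the summary table. The name `default_probability` is illustrative, not part of the project.

```python
import math

# Coefficients copied from the summary table (rounded to 4 decimals)
COEF = {
    'const': 1.3931,
    'RealEstate': -2.1282,
    'Portion': -2.9875,
    'Recession': 0.5041,
}


def default_probability(real_estate, portion, recession):
    """Logistic-regression probability of default for one application."""
    log_odds = (
        COEF['const']
        + COEF['RealEstate'] * real_estate
        + COEF['Portion'] * portion
        + COEF['Recession'] * recession
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


p_loan1 = default_probability(real_estate=1, portion=0.75, recession=0)  # Carmichael Realty
p_loan2 = default_probability(real_estate=0, portion=0.40, recession=0)  # SV Consulting
print(f'{p_loan1:.2f} {p_loan2:.2f}')  # prints "0.05 0.55"
```

The negative signs on `RealEstate` and `Portion` mean that real-estate backing and a larger guaranteed share both lower the estimated risk, while the positive `Recession` coefficient raises it.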


### Prediction
- Predict on the prepared data.
- `predict` returns probabilities between 0 and 1, stored in the new column `Estimated probability of default`.
- New column `Approve?`: `Estimated probability of default` less than or equal to 0.5 **-> Yes**, greater than 0.5 **-> No**.

In [14]:
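# has_constant='add' forces the intercept column even if a predictor happens to be constant in a small batch
X = sm.add_constant(df_loans_input[['RealEstate', 'Portion', 'Recession']], has_constant='add')
df_loans['Estimated probability of default'] = sm_logreg.predict(X)
# Approve when the estimated probability of default does not exceed the 0.5 cutoff
df_loans['Approve?'] = df_loans['Estimated probability of default'] <= 0.5

# Format for display only (the cutoff above was applied to the unrounded probabilities)
df_loans['Estimated probability of default'] = df_loans['Estimated probability of default'].apply(lambda x: format(x, '.2f'))
df_loans['Approve?'] = df_loans['Approve?'].map({True: 'Yes', False: 'No'})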
df_loans

| | Loan | Name | City | Date | Loan amount requested | SBA portion guaranteed | Secured by real estate? | Estimated probability of default | Approve? |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Carmichael Realty | Carmichael, CA | Current (not recession) | $1,000,000 | $750,000 | Yes | 0.05 | Yes |
| 1 | 2 | SV Consulting | San Leandro, CA | Current (not recession) | $100,000 | $40,000 | No | 0.55 | No |
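
The decision rule used here can also be written as a small function with the cutoff as a parameter, which makes it easy to see how the decisions change under a stricter or looser threshold. This is a minimal sketch: the function name `decide` and the alternative cutoff of 0.6 are illustrative assumptions, not part of the project.

```python
import numpy as np
import pandas as pd


def decide(probabilities, cutoff=0.5):
    """Return 'Yes' (approve) where the default probability is at most the cutoff,
    and 'No' (deny) otherwise."""
    probs = pd.Series(probabilities, dtype=float)
    return pd.Series(np.where(probs <= cutoff, 'Yes', 'No'), index=probs.index)


estimated = [0.05, 0.55]  # rounded probabilities from the table above

default_rule = decide(estimated)              # cutoff 0.5, as in this notebook
looser_rule = decide(estimated, cutoff=0.6)   # hypothetical, more permissive cutoff

print(default_rule.tolist())  # ['Yes', 'No']
print(looser_rule.tolist())   # ['Yes', 'Yes']
```

Raising the cutoff approves more loans at the cost of accepting higher estimated default risk; the notebook keeps the 0.5 cutoff throughout.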


### Result
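With the 0.5 cutoff, the logistic regression model approves loan 1 (Carmichael Realty, estimated probability of default 0.05) and denies loan 2 (SV Consulting, 0.55).
The difference follows from the model's two strongest predictors: Carmichael Realty is secured by real estate and has 75% of the loan guaranteed by the SBA, while SV Consulting is unsecured with only 40% guaranteed.
The true outcomes of these two applications are not known, so this notebook illustrates how the model supports a decision; its accuracy is assessed in `SBA-Model.ipynb`.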