# Assumptions (explicit & defensible)
- Binary default target (1 = default, 0 = no default)
- PD model trained using Logistic Regression
- Recovery Rate = 10%
- Loss Given Default (LGD) = 90%
- Dataset likely synthetic → high AUC acknowledged
# Expected Loss = PD × Loan Amount × (1 − Recovery Rate)
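
As a quick sanity check of the formula, here is a minimal sketch with made-up numbers. The helper name `expected_loss` and the inputs (a 5% PD on a 10,000 loan) are illustrative assumptions, not values from the dataset.

```python
def expected_loss(pd_value, loan_amount, recovery_rate=0.10):
    """Expected Loss = PD x Loan Amount x (1 - Recovery Rate)."""
    if not 0.0 <= pd_value <= 1.0:
        raise ValueError("pd_value must lie in [0, 1]")
    if not 0.0 <= recovery_rate <= 1.0:
        raise ValueError("recovery_rate must lie in [0, 1]")
    lgd = 1.0 - recovery_rate  # Loss Given Default
    return pd_value * loan_amount * lgd


# Hypothetical borrower: 5% PD, 10,000 outstanding, 10% recovery rate.
el = expected_loss(pd_value=0.05, loan_amount=10_000.0)
print(round(el, 2))  # 0.05 * 10,000 * 0.90, i.e. about 450
```

With a 10% recovery rate, 90% of the exposure is lost on default, so the expected loss is 90% of PD × Loan Amount.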

In [1]:
import pandas as pd

In [22]:
df = pd.read_csv("Task 3 and 4_Loan_Data.csv")
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 8 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   customer_id               10000 non-null  int64  
 1   credit_lines_outstanding  10000 non-null  int64  
 2   loan_amt_outstanding      10000 non-null  float64
 3   total_debt_outstanding    10000 non-null  float64
 4   income                    10000 non-null  float64
 5   years_employed            10000 non-null  int64  
 6   fico_score                10000 non-null  int64  
 7   default                   10000 non-null  int64  
dtypes: float64(3), int64(5)
memory usage: 625.1 KB


In [16]:
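# Prepare data for modelling: separate the inputs from the target.
# Note: customer_id is an identifier and stays in X here, so any borrower
# passed to the trained model must also include a customer_id column.
X = df.drop("default", axis=1)  # all inputs
y = df["default"]               # target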

In [17]:
# Train-test split (70% train, 30% test)
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

In [18]:
# Train the logistic regression PD model
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

Parameter          Value
-----------------  -------
penalty            'l2'
dual               False
tol                0.0001
C                  1.0
fit_intercept      True
intercept_scaling  1
class_weight       None
random_state       None
solver             'lbfgs'
max_iter           1000
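
The assumptions mention a high AUC, but the notebook never computes it. The sketch below shows one way a hold-out AUC could be checked. It runs on synthetic stand-in data (the two features, their distributions, and the coefficients are all assumptions made up for illustration, not the loan dataset). It also adds feature scaling and a stratified split, which are not part of the notebook's own pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): two features loosely resembling a
# debt-to-income ratio and a FICO score. This is NOT the notebook's dataset.
rng = np.random.default_rng(42)
n = 2000
dti = rng.uniform(0.0, 1.5, n)
fico = rng.normal(640, 60, n)
logit = 4.0 * (dti - 0.75) - 0.02 * (fico - 640)
y_demo = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)
X_demo = np.column_stack([dti, fico])

# Stratify so both splits keep the same default rate.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

# Standardise the features before fitting the logistic regression.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X_tr, y_tr)

# AUC is computed from predicted probabilities, not from hard class labels.
auc = roc_auc_score(y_te, pipe.predict_proba(X_te)[:, 1])
print(f"Hold-out AUC: {auc:.3f}")
```

On the real data, the same `roc_auc_score` call could be applied to `model.predict_proba(X_test)[:, 1]` and `y_test`.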


In [23]:
#function to predict expected loss
def predict_expected_loss(borrower_data, model, recovery_rate=0.10):
    """
    Predicts expected loss for a single borrower.
    
    Parameters:
    borrower_data : pandas DataFrame (1 row, same features as training data)
    model         : trained PD model
    recovery_rate : assumed recovery rate (default = 10%)
    
    Returns:
    expected_loss : float
    """
    
    # Step 1: Predict Probability of Default (PD).
    # Named pd_value so it does not shadow the pandas alias `pd`.
    pd_value = model.predict_proba(borrower_data)[:, 1][0]
    
    # Step 2: Extract loan amount outstanding
    loan_amount = borrower_data["loan_amt_outstanding"].values[0]
    
    # Step 3: Expected Loss calculation
    lgd = 1 - recovery_rate
    expected_loss = pd_value * loan_amount * lgd
    
    return expected_loss


In [24]:
#example to test function
sample_borrower = X_test.iloc[[0]]

loss = predict_expected_loss(borrower_data=sample_borrower, model=model)
loss

np.float64(1.7634494162856563e-08)

In [32]:
# Combined function returning both the probability of default (PD) and the expected loss
def predict_pd_and_expected_loss(borrower_data, model, recovery_rate=0.10):
    pd_value = float(model.predict_proba(borrower_data)[:, 1][0])
    loan_amount = float(borrower_data["loan_amt_outstanding"].values[0])
    
    expected_loss = pd_value * loan_amount * (1 - recovery_rate)
    
    return {
        "Probability_of_Default": round(pd_value, 2),
        "Expected_Loss": round(expected_loss, 2)
    }



In [38]:
# Example to test the function
sample_borrower = pd.DataFrame({
    "customer_id": [999001],
    "credit_lines_outstanding": [1],
    "loan_amt_outstanding": [200000],
    "total_debt_outstanding": [250000],
    "income": [1500000],
    "years_employed": [12],
    "fico_score": [780]
})
predict_pd_and_expected_loss(sample_borrower, model, recovery_rate=0.10)

{'Probability_of_Default': 0.0, 'Expected_Loss': 0.0}

In [37]:
medium_risk_borrower = pd.DataFrame({
    "customer_id": [999002],
    "credit_lines_outstanding": [3],
    "loan_amt_outstanding": [500000],
    "total_debt_outstanding": [900000],
    "income": [700000],
    "years_employed": [4],
    "fico_score": [680]
})
predict_pd_and_expected_loss(medium_risk_borrower, model, recovery_rate=0.10)

{'Probability_of_Default': 1.0, 'Expected_Loss': 450000.0}
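
The two examples above return a PD of exactly 0.0 and exactly 1.0. A logistic model maps a linear score `z` to a probability through the sigmoid function, and rounding to two decimals hides everything once `|z|` is large. The sketch below illustrates that saturation with hypothetical score values. It does not inspect the fitted model, so treating large scores as the cause here (for example, from unscaled inputs such as income and debt) is an assumption.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function: maps a linear score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical linear scores; large magnitudes saturate the probability.
for z in (-20.0, -2.0, 0.0, 2.0, 20.0):
    p = sigmoid(z)
    print(f"z={z:+6.1f}  PD={p:.3e}  rounded={round(p, 2)}")
```

Reporting the unrounded PD, or rounding to more decimals, would make such saturated cases easier to spot.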

# Conclusion
A logistic regression model was trained to estimate the probability of default (PD) from borrower and loan characteristics. The predicted PD is then combined with the outstanding loan amount and an assumed recovery rate of 10% (LGD = 90%) to compute expected loss. The final function takes a borrower's details as input and returns the PD and the expected loss; summing these per-borrower figures lets the risk team estimate portfolio-level losses and set aside appropriate capital buffers.
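
To make the portfolio-level step concrete, here is a minimal sketch that sums expected loss over several loans. The function name `portfolio_expected_loss`, the `pd` column, and the three example loans are hypothetical. In the notebook, the PD column would come from something like `model.predict_proba(X)[:, 1]`.

```python
import pandas as pd


def portfolio_expected_loss(loans: pd.DataFrame, recovery_rate: float = 0.10) -> float:
    """Sum PD x Loan Amount x LGD over all rows.

    Expects columns 'pd' (probability of default) and 'loan_amt_outstanding'.
    """
    lgd = 1 - recovery_rate
    per_loan = loans["pd"] * loans["loan_amt_outstanding"] * lgd
    return float(per_loan.sum())


# Hypothetical three-loan portfolio; the PDs are made up for illustration.
portfolio = pd.DataFrame({
    "pd": [0.01, 0.10, 0.50],
    "loan_amt_outstanding": [10_000.0, 5_000.0, 2_000.0],
})

total_el = portfolio_expected_loss(portfolio)
print(round(total_el, 2))  # 90 + 450 + 900, i.e. about 1440
```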