**Training a Logistic Regression Model to Predict Loan Default Risk**

*This script loads borrower financial data, drops rows with missing values, standardizes six numeric features, and trains a logistic regression model to estimate the probability of default (PD). Model performance is measured by ROC AUC on a held-out 20% test set.*

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import roc_auc_score

# Load and clean data
df = pd.read_csv(r"C:\Users\satya\Downloads\QR - JP Morgan\Task 3 and 4_Loan_Data.csv")
df.dropna(inplace=True)

# Standardize column names
df.columns = df.columns.str.strip().str.lower()

# Define features and target
features = ['credit_lines_outstanding', 'loan_amt_outstanding', 'total_debt_outstanding',
            'income', 'years_employed', 'fico_score']
target = 'default'

X = df[features]
y = df[target]

# Scale features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

# Train logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)

# Evaluate model
y_pred_proba = model.predict_proba(X_test)[:, 1]
print("ROC AUC Score:", roc_auc_score(y_test, y_pred_proba))

ROC AUC Score: 0.9999652110990509
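
Note that the cell above fits `StandardScaler` on the full dataset before splitting, so the test rows contribute to the scaling statistics. One common way to avoid this is to split first and wrap the scaler and classifier in a pipeline. The following is a minimal sketch of that pattern, not the notebook's own method: it runs on synthetic data from `make_classification` as a stand-in for the loan file, and the names `X_demo`, `y_demo`, and `pd_pipeline` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the loan data (assumption: six numeric features, binary target)
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=6, n_informative=4, n_redundant=0, random_state=42
)

# Split first so the test rows never influence the scaler's mean and variance;
# stratify keeps the class ratio the same in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# The pipeline fits StandardScaler on the training rows only, then the classifier
pd_pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pd_pipeline.fit(X_tr, y_tr)

# At prediction time the pipeline reuses the training-set scaling automatically
demo_auc = roc_auc_score(y_te, pd_pipeline.predict_proba(X_te)[:, 1])
print("ROC AUC on synthetic data:", demo_auc)
```

With a pipeline, a scoring function only needs to call `pd_pipeline.predict_proba` on raw inputs, so the scaler and the model cannot get out of sync.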


**Estimating Expected Loss on a Loan Using Predicted Default Probability**

*This function uses the trained logistic regression model to calculate the expected loss on a single loan: the outstanding loan amount multiplied by the predicted PD and by the unrecovered fraction (1 - recovery rate, with the recovery rate fixed at 10% by default). It mimics real-time risk scoring of individual loan applications.*

In [None]:
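def estimate_expected_loss(credit_lines_outstanding, loan_amt_outstanding, total_debt_outstanding,
                           income, years_employed, fico_score, recovery_rate=0.10):
    """Return the expected loss: loan_amt_outstanding * PD * (1 - recovery_rate)."""
    # Build a one-row frame; select `features` so the column order matches the scaler's fit order
    input_data = pd.DataFrame([{
        'credit_lines_outstanding': credit_lines_outstanding,
        'loan_amt_outstanding': loan_amt_outstanding,
        'total_debt_outstanding': total_debt_outstanding,
        'income': income,
        'years_employed': years_employed,
        'fico_score': fico_score
    }])[features]
    # Reuse the scaler fitted during training, then take the probability of class 1 (default)
    input_scaled = scaler.transform(input_data)
    pd_estimate = model.predict_proba(input_scaled)[0, 1]
    # Expected loss = exposure * probability of default * loss given default
    expected_loss = loan_amt_outstanding * pd_estimate * (1 - recovery_rate)
    return round(expected_loss, 2)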

# Example usage: simulate expected loss for a sample borrower profile
loss = estimate_expected_loss(
    credit_lines_outstanding=3,
    loan_amt_outstanding=25000,
    total_debt_outstanding=40000,
    income=85000,
    years_employed=6,
    fico_score=690
)
print("Estimated Expected Loss:", loss)

Estimated Expected Loss: 22497.71
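
The printed figure can be sanity-checked by inverting the expected-loss formula to recover the PD the model assigned to this borrower. The sketch below uses only the values from the example call and its output; the variable names `reported_expected_loss`, `lgd`, and `implied_pd` are introduced here for illustration.

```python
# Values taken from the example call and its printed output above
loan_amt_outstanding = 25000
recovery_rate = 0.10
reported_expected_loss = 22497.71

# Loss given default: the share of the exposure that is not recovered
lgd = 1 - recovery_rate

# Invert EL = exposure * PD * LGD to recover the predicted PD
implied_pd = reported_expected_loss / (loan_amt_outstanding * lgd)
print(f"Implied PD: {implied_pd:.4f}")  # prints "Implied PD: 0.9999"
```

The model therefore treats this profile as an almost certain default, which is why the expected loss sits just below the maximum unrecovered exposure of 25000 * 0.9 = 22500.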
