## Case Study: Loan Approval Prediction using Decision Tree

This notebook trains a Decision Tree classifier on a synthetic dataset to predict loan approvals from five features: income, credit score, loan amount, employment status, and debt-to-income ratio.


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import accuracy_score, classification_report

## Step 1: Generate Synthetic Dataset

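We create 1,000 synthetic applicants with randomly drawn financial features, then label each one with a deterministic rule: a loan is approved only when the credit score is above 650, the income is above 40,000, and the debt-to-income ratio is below 0.35. Loan amount and employment status are generated but do not affect the label.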


In [None]:
# Set seed for reproducibility
np.random.seed(42)

# Generate synthetic dataset
data = {
    'income': np.random.randint(20000, 100000, 1000),
    'credit_score': np.random.randint(300, 850, 1000),  # upper bound is exclusive, so scores run 300-849
    'loan_amount': np.random.randint(5000, 50000, 1000),
    'employment_status': np.random.randint(0, 2, 1000),  # 0 = Unemployed, 1 = Employed
    'debt_to_income_ratio': np.random.uniform(0.1, 0.5, 1000),
}

# Convert to DataFrame
df = pd.DataFrame(data)

# Define loan approval criteria (target variable)
df['loan_approved'] = np.where(
    (df['credit_score'] > 650) & (df['income'] > 40000) & (df['debt_to_income_ratio'] < 0.35),
    1, 0
)

# Display the first five rows
df.head()
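
Because the label requires three thresholds to hold at once, approvals are likely to be much rarer than rejections. The following is a minimal sketch of a class-balance check; it rebuilds the same data through a hypothetical helper, `make_loan_data`, so the cell runs on its own, and `approval_rate` is an illustrative name that is not part of the original notebook.

```python
import numpy as np
import pandas as pd


def make_loan_data(n=1000, seed=42):
    """Hypothetical helper: rebuilds the notebook's synthetic dataset."""
    np.random.seed(seed)
    df = pd.DataFrame({
        'income': np.random.randint(20000, 100000, n),
        'credit_score': np.random.randint(300, 850, n),
        'loan_amount': np.random.randint(5000, 50000, n),
        'employment_status': np.random.randint(0, 2, n),
        'debt_to_income_ratio': np.random.uniform(0.1, 0.5, n),
    })
    df['loan_approved'] = (
        (df['credit_score'] > 650)
        & (df['income'] > 40000)
        & (df['debt_to_income_ratio'] < 0.35)
    ).astype(int)
    return df


df = make_loan_data()

# Count rejected (0) and approved (1) applications
print(df['loan_approved'].value_counts())

# Fraction of applications that the rule approves
approval_rate = df['loan_approved'].mean()
print(f"Approval rate: {approval_rate:.1%}")
```

If the approval rate is far from 50%, accuracy alone can look good for a model that rarely predicts approval, which is worth keeping in mind during evaluation.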

## Step 2: Split Dataset

We hold out 20% of the rows as a test set and train on the remaining 80%. Setting `random_state=42` makes the split reproducible.


In [None]:
# Split dataset into training and testing sets
X = df.drop(columns=['loan_approved'])
y = df['loan_approved']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Display dataset shapes
X_train.shape, X_test.shape, y_train.shape, y_test.shape
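
When one class is much rarer than the other, a purely random split can leave the training and test sets with slightly different approval rates. One possible variation, not used in the original notebook, is to pass `stratify=y` so both sets keep the same class proportions. The sketch below rebuilds the data with a hypothetical helper, `make_loan_data`, so that it is standalone; `train_rate` and `test_rate` are illustrative names.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def make_loan_data(n=1000, seed=42):
    """Hypothetical helper: rebuilds the notebook's synthetic dataset."""
    np.random.seed(seed)
    df = pd.DataFrame({
        'income': np.random.randint(20000, 100000, n),
        'credit_score': np.random.randint(300, 850, n),
        'loan_amount': np.random.randint(5000, 50000, n),
        'employment_status': np.random.randint(0, 2, n),
        'debt_to_income_ratio': np.random.uniform(0.1, 0.5, n),
    })
    df['loan_approved'] = (
        (df['credit_score'] > 650)
        & (df['income'] > 40000)
        & (df['debt_to_income_ratio'] < 0.35)
    ).astype(int)
    return df


df = make_loan_data()
X = df.drop(columns=['loan_approved'])
y = df['loan_approved']

# stratify=y keeps the approved/rejected ratio the same in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

train_rate = y_train.mean()
test_rate = y_test.mean()
print(f"Approval rate in train: {train_rate:.3f}, in test: {test_rate:.3f}")
```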

## Step 3: Train Decision Tree Classifier

We fit a Decision Tree model on the training set. Capping the tree at `max_depth=4` keeps it small enough to read and limits overfitting.


In [None]:
# Train Decision Tree Classifier
clf = DecisionTreeClassifier(max_depth=4, random_state=42)
clf.fit(X_train, y_train)
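
After fitting, the `feature_importances_` attribute gives a rough picture of which columns the tree relied on. Since the label was built only from credit score, income, and debt-to-income ratio, those three would be expected to dominate, though the sketch below does not assume it. The helper `make_loan_data` and the name `importances` are illustrative additions.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def make_loan_data(n=1000, seed=42):
    """Hypothetical helper: rebuilds the notebook's synthetic dataset."""
    np.random.seed(seed)
    df = pd.DataFrame({
        'income': np.random.randint(20000, 100000, n),
        'credit_score': np.random.randint(300, 850, n),
        'loan_amount': np.random.randint(5000, 50000, n),
        'employment_status': np.random.randint(0, 2, n),
        'debt_to_income_ratio': np.random.uniform(0.1, 0.5, n),
    })
    df['loan_approved'] = (
        (df['credit_score'] > 650)
        & (df['income'] > 40000)
        & (df['debt_to_income_ratio'] < 0.35)
    ).astype(int)
    return df


df = make_loan_data()
X = df.drop(columns=['loan_approved'])
y = df['loan_approved']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

clf = DecisionTreeClassifier(max_depth=4, random_state=42)
clf.fit(X_train, y_train)

# Impurity-based importances, one value per feature, normalized to sum to 1
importances = pd.Series(
    clf.feature_importances_, index=X.columns
).sort_values(ascending=False)
print(importances)
```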

## Step 4: Make Predictions

We use the trained model to predict loan approvals on the test set.


In [None]:
# Make predictions
y_pred = clf.predict(X_test)
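
`predict` returns only the most likely class. When a lender wants to be more conservative, `predict_proba` exposes the class probabilities so a stricter cut-off can be applied. The sketch below uses a hypothetical threshold of 0.7, chosen purely for illustration; `make_loan_data`, `approved_proba`, and `y_pred_strict` are likewise illustrative names.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def make_loan_data(n=1000, seed=42):
    """Hypothetical helper: rebuilds the notebook's synthetic dataset."""
    np.random.seed(seed)
    df = pd.DataFrame({
        'income': np.random.randint(20000, 100000, n),
        'credit_score': np.random.randint(300, 850, n),
        'loan_amount': np.random.randint(5000, 50000, n),
        'employment_status': np.random.randint(0, 2, n),
        'debt_to_income_ratio': np.random.uniform(0.1, 0.5, n),
    })
    df['loan_approved'] = (
        (df['credit_score'] > 650)
        & (df['income'] > 40000)
        & (df['debt_to_income_ratio'] < 0.35)
    ).astype(int)
    return df


df = make_loan_data()
X = df.drop(columns=['loan_approved'])
y = df['loan_approved']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
clf = DecisionTreeClassifier(max_depth=4, random_state=42).fit(X_train, y_train)

# Column 0 is P(rejected), column 1 is P(approved), following clf.classes_
proba = clf.predict_proba(X_test)
approved_proba = proba[:, 1]

# Hypothetical stricter rule: approve only when P(approved) >= 0.7
y_pred_strict = (approved_proba >= 0.7).astype(int)

print("Approvals with predict():      ", int(clf.predict(X_test).sum()))
print("Approvals with 0.7 threshold:  ", int(y_pred_strict.sum()))
```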

## Step 5: Evaluate Model

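We evaluate the model's performance on the test set using accuracy and a classification report. Because the approval rule makes approvals the minority class, the per-class precision and recall in the report are more informative than accuracy alone.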


In [None]:
# Evaluate Model Performance
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))
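
Two further checks can put the accuracy figure in context: a confusion matrix, which shows the kinds of errors made, and a majority-class baseline, which shows what accuracy a model would get by always predicting the most common label. The sketch below is one way to do this with `confusion_matrix` and `DummyClassifier`; `make_loan_data`, `baseline`, and `baseline_acc` are illustrative names that do not appear in the original notebook.

```python
import numpy as np
import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def make_loan_data(n=1000, seed=42):
    """Hypothetical helper: rebuilds the notebook's synthetic dataset."""
    np.random.seed(seed)
    df = pd.DataFrame({
        'income': np.random.randint(20000, 100000, n),
        'credit_score': np.random.randint(300, 850, n),
        'loan_amount': np.random.randint(5000, 50000, n),
        'employment_status': np.random.randint(0, 2, n),
        'debt_to_income_ratio': np.random.uniform(0.1, 0.5, n),
    })
    df['loan_approved'] = (
        (df['credit_score'] > 650)
        & (df['income'] > 40000)
        & (df['debt_to_income_ratio'] < 0.35)
    ).astype(int)
    return df


df = make_loan_data()
X = df.drop(columns=['loan_approved'])
y = df['loan_approved']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
clf = DecisionTreeClassifier(max_depth=4, random_state=42).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Rows are true classes, columns are predicted classes (0 = rejected, 1 = approved)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
print("Confusion matrix:\n", cm)

# Baseline that always predicts the most frequent training label
baseline = DummyClassifier(strategy='most_frequent').fit(X_train, y_train)
baseline_acc = accuracy_score(y_test, baseline.predict(X_test))
print("Baseline accuracy:", baseline_acc)
print("Tree accuracy:    ", accuracy_score(y_test, y_pred))
```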

## Step 6: Visualize Decision Tree

We visualize the trained Decision Tree to understand its decision-making process. Each node shows its split condition, Gini impurity, sample count, and majority class.


In [None]:
# Visualize Decision Tree
plt.figure(figsize=(15, 8))
# Pass feature names as a list; some scikit-learn versions reject a pandas Index here
plot_tree(clf, feature_names=list(X.columns), class_names=['Rejected', 'Approved'], filled=True)
plt.show()
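
For a deeper tree, or when working outside a plotting environment, the same structure can be printed as indented text with `export_text`. This is a sketch of an alternative view rather than part of the original notebook; `make_loan_data` and `rules` are illustrative names.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text


def make_loan_data(n=1000, seed=42):
    """Hypothetical helper: rebuilds the notebook's synthetic dataset."""
    np.random.seed(seed)
    df = pd.DataFrame({
        'income': np.random.randint(20000, 100000, n),
        'credit_score': np.random.randint(300, 850, n),
        'loan_amount': np.random.randint(5000, 50000, n),
        'employment_status': np.random.randint(0, 2, n),
        'debt_to_income_ratio': np.random.uniform(0.1, 0.5, n),
    })
    df['loan_approved'] = (
        (df['credit_score'] > 650)
        & (df['income'] > 40000)
        & (df['debt_to_income_ratio'] < 0.35)
    ).astype(int)
    return df


df = make_loan_data()
X = df.drop(columns=['loan_approved'])
y = df['loan_approved']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
clf = DecisionTreeClassifier(max_depth=4, random_state=42).fit(X_train, y_train)

# One line per node; leaves end in "class: 0" (rejected) or "class: 1" (approved)
rules = export_text(clf, feature_names=list(X.columns))
print(rules)
```

Comparing the printed thresholds with the rule used to generate the labels (credit score above 650, income above 40,000, debt-to-income ratio below 0.35) is a simple way to check how closely the tree recovered it.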