# Case Study: Predicting Loan Default using Logistic Regression

### 1. Importing Necessary Libraries
We start by importing the libraries the case study relies on: `numpy` for numerical operations, `pandas` for data handling, scikit-learn (`sklearn`) for splitting, scaling, modeling, and metrics, and `matplotlib` for visualization.

In [None]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix
import matplotlib.pyplot as plt

### 2. Data Generation
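Next, we generate synthetic data for the problem. We create three features, `Income`, `Age`, and `Credit Score`, and a binary target variable, `Loan Default`. Note that the target is drawn at random with a 30% default rate, independently of the features, so the data contains no real signal for the model to learn. The workflow is what matters here, not the resulting scores.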

In [None]:
np.random.seed(42)
# Generate random data
n_samples = 1000
income = np.random.randint(30000, 150000, size=n_samples)
age = np.random.randint(18, 70, size=n_samples)
credit_score = np.random.randint(600, 800, size=n_samples)
loan_default = np.random.choice([0, 1], size=n_samples, p=[0.7, 0.3])  # 30% chance of default
# Create DataFrame
df = pd.DataFrame({
    'Income': income,
    'Age': age,
    'Credit Score': credit_score,
    'Loan Default': loan_default
})
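
Because the labels above do not depend on the features, any classifier trained on them can do little better than always predicting "no default". As a minimal sketch of one alternative, the default probability could be tied to the features through a sigmoid. The function name `make_dependent_data` and the coefficients are assumptions chosen for illustration only; they are not part of the original case study.

```python
import numpy as np
import pandas as pd


def make_dependent_data(n_samples=1000, seed=42):
    """Generate synthetic loans whose default probability depends on the features."""
    rng = np.random.default_rng(seed)
    income = rng.integers(30000, 150000, size=n_samples)
    age = rng.integers(18, 70, size=n_samples)
    credit_score = rng.integers(600, 800, size=n_samples)

    # Hypothetical relationship (assumption): lower income and lower credit
    # score raise the log-odds of default. Features are roughly standardized
    # by subtracting the range midpoint and dividing by an approximate spread.
    logit = (
        -0.85
        - 1.0 * (income - 90000) / 35000
        - 1.0 * (credit_score - 700) / 58
    )
    p_default = 1.0 / (1.0 + np.exp(-logit))

    # Draw one Bernoulli outcome per borrower from its own probability
    loan_default = rng.binomial(1, p_default)

    return pd.DataFrame({
        'Income': income,
        'Age': age,
        'Credit Score': credit_score,
        'Loan Default': loan_default,
    })


df_dependent = make_dependent_data()
print(df_dependent['Loan Default'].mean())  # overall default rate
```

With data like this, the rest of the workflow can be run unchanged by substituting `df_dependent` for `df`.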

### 3. Data Splitting
After generating the data, we split it into a training set (80%) and a testing set (20%). Holding out the test set lets us evaluate the model on data it did not see during training.

In [None]:
X = df[['Income', 'Age', 'Credit Score']]
y = df['Loan Default']
# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
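
With an imbalanced target (roughly 30% defaults), a purely random split can leave the two sets with slightly different class proportions. The sketch below shows the `stratify` argument of `train_test_split`, which keeps the proportions equal. The arrays `X_demo` and `y_demo` are stand-in data invented for this example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-in data (assumption): 1000 rows, 3 features, exactly 30% positives
X_demo = rng.normal(size=(1000, 3))
y_demo = np.array([1] * 300 + [0] * 700)

# stratify=y_demo preserves the class ratio in both subsets
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print("Train default rate:", y_tr.mean())
print("Test default rate:", y_te.mean())
```

In the case study itself, the equivalent change would be passing `stratify=y` to the existing call.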

### 4. Data Standardization
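Standardization rescales each feature so that, on the training set, it has a mean of 0 and a standard deviation of 1. For Logistic Regression this puts features with very different ranges, such as `Income` and `Age`, on a comparable scale, which helps the solver converge and keeps scikit-learn's default L2 regularization from penalizing features unevenly. The scaler is fit on the training set only and then applied to the test set, so no test-set statistics leak into training.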

In [None]:
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
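
To make the transformation concrete, the following sketch reproduces `StandardScaler` by hand as `z = (x - mean) / std`, using the training-set mean and standard deviation for both sets. The `*_demo` arrays are made-up values used only for this check.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Made-up income-like values (assumption), one feature column
X_train_demo = rng.integers(30000, 150000, size=(800, 1)).astype(float)
X_test_demo = rng.integers(30000, 150000, size=(200, 1)).astype(float)

scaler_demo = StandardScaler()
Z_train = scaler_demo.fit_transform(X_train_demo)  # learns mean/std from train
Z_test = scaler_demo.transform(X_test_demo)        # reuses the train statistics

# Manual equivalent: StandardScaler uses the population std (ddof=0)
mu = X_train_demo.mean(axis=0)
sigma = X_train_demo.std(axis=0)
Z_test_manual = (X_test_demo - mu) / sigma

print(np.allclose(Z_test, Z_test_manual))
print(Z_train.mean(axis=0), Z_train.std(axis=0))
```

The test set is scaled with the training statistics, so its mean and standard deviation will be close to, but not exactly, 0 and 1.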

### 5. Model Initialization and Training
We initialize the Logistic Regression model and train it using the standardized data. Logistic Regression is a simple yet effective algorithm for binary classification tasks.

In [None]:
model = LogisticRegression()
# Train the model
model.fit(X_train, y_train)
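
It can help to see what the fitted model computes. A binary Logistic Regression passes a linear score `z = w · x + b` through the sigmoid `1 / (1 + exp(-z))` to get the probability of class 1. The sketch below checks this against `predict_proba` on a small invented dataset (`X_demo`, `y_demo`, and `clf` are illustrative names, not part of the case study).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Invented data (assumption): the label depends mostly on the first feature
X_demo = rng.normal(size=(200, 3))
y_demo = (X_demo[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

clf = LogisticRegression().fit(X_demo, y_demo)

# Linear score from the learned weights and intercept
z = X_demo @ clf.coef_.ravel() + clf.intercept_[0]

# Sigmoid turns the score into a probability of class 1
p_manual = 1.0 / (1.0 + np.exp(-z))
p_sklearn = clf.predict_proba(X_demo)[:, 1]

print(np.allclose(p_manual, p_sklearn))
print("Weights:", clf.coef_.ravel(), "Intercept:", clf.intercept_[0])
```

The predicted class is 1 whenever the score is positive, which corresponds to a probability above 0.5.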

### 6. Model Evaluation
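After training the model, we evaluate its performance on the test set using accuracy, precision, recall, and F1-score. We also print the confusion matrix, whose rows are the actual classes and whose columns are the predicted classes, to see the counts of true negatives, false positives, false negatives, and true positives.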

In [None]:
y_pred = model.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
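# zero_division=0 returns 0 instead of raising a warning if the model predicts no defaults
precision = precision_score(y_test, y_pred, zero_division=0)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred, zero_division=0)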
# Display the results
print(f"Accuracy: {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1-Score: {f1:.2f}")
# Confusion Matrix
conf_matrix = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(conf_matrix)
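
The metrics above can all be derived from the four cells of the confusion matrix. The sketch below does this by hand on a tiny hand-written example (`y_true_demo` and `y_pred_demo` are invented labels) and compares the results with scikit-learn's functions.

```python
import numpy as np
from sklearn.metrics import (
    confusion_matrix, precision_score, recall_score, f1_score
)

# Invented labels (assumption): 4 actual defaults out of 10 loans
y_true_demo = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 0])
y_pred_demo = np.array([0, 1, 0, 0, 1, 0, 1, 0, 1, 0])

# For binary labels, ravel() yields the cells in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()
print(tn, fp, fn, tp)  # 5 1 1 3

precision_manual = tp / (tp + fp)   # share of predicted defaults that were real
recall_manual = tp / (tp + fn)      # share of real defaults that were caught
f1_manual = 2 * precision_manual * recall_manual / (precision_manual + recall_manual)

print(precision_manual, precision_score(y_true_demo, y_pred_demo))
print(recall_manual, recall_score(y_true_demo, y_pred_demo))
print(f1_manual, f1_score(y_true_demo, y_pred_demo))
```

Here precision, recall, and F1 all come out to 0.75, since there is one false positive and one false negative against three true positives.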

### 7. Visualizing the Predictions
Finally, we plot the test points in the plane of two standardized features, `Income` and `Age`, colored by the class the model predicts for them. Because the model also uses `Credit Score`, this two-dimensional view shows where the predictions fall but does not draw the decision boundary itself.

In [None]:
plt.figure(figsize=(8,6))
# Color each test point by its predicted class (0 = no default, 1 = default)
plt.scatter(X_test[:, 0], X_test[:, 1], c=y_pred, cmap=plt.cm.Paired, edgecolor='k', s=100)
plt.xlabel('Income (Standardized)')
plt.ylabel('Age (Standardized)')
plt.title('Predicted Loan Default on the Test Set')
plt.colorbar(label='Predicted Loan Default (0 or 1)')
plt.show()
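
The scatter plot above colors points by predicted class only. To draw an actual decision boundary, one common approach is to evaluate the model on a grid and contour the 0.5 probability level. The sketch below does this for a separate two-feature model trained on invented data, since a boundary in two dimensions is only well defined when the model uses exactly those two features. The names `X_demo`, `y_demo`, and `clf2d` are assumptions for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Invented two-feature data (assumption), already on a standardized scale
X_demo = rng.normal(size=(300, 2))
y_demo = (X_demo[:, 0] - X_demo[:, 1] + 0.5 * rng.normal(size=300) > 0).astype(int)

clf2d = LogisticRegression().fit(X_demo, y_demo)

# Evaluate the predicted probability of class 1 on a regular grid
xx, yy = np.meshgrid(np.linspace(-3, 3, 200), np.linspace(-3, 3, 200))
grid = np.c_[xx.ravel(), yy.ravel()]
proba = clf2d.predict_proba(grid)[:, 1].reshape(xx.shape)

plt.figure(figsize=(8, 6))
plt.contourf(xx, yy, proba, levels=20, cmap='RdBu_r', alpha=0.6)
plt.colorbar(label='Predicted probability of class 1')
# The decision boundary is the line where the probability equals 0.5
plt.contour(xx, yy, proba, levels=[0.5], colors='k')
plt.scatter(X_demo[:, 0], X_demo[:, 1], c=y_demo, cmap='RdBu_r', edgecolor='k', s=30)
plt.xlabel('Feature 1 (Standardized)')
plt.ylabel('Feature 2 (Standardized)')
plt.title('Logistic Regression Decision Boundary (Two-Feature Sketch)')
plt.show()
```

For Logistic Regression the boundary is a straight line, because the 0.5 probability level corresponds to a linear score of zero.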