# Logistic Regression Classification

This notebook explores binary classification using logistic regression. We'll implement and analyze logistic regression for classification tasks, visualize decision boundaries, and understand how regularization applies to classification problems.

Topics covered:
1. Understanding the logistic regression model
2. Sigmoid function and classification probability
3. Cross-entropy loss function
4. Training and evaluating a logistic regression classifier
5. Visualizing decision boundaries
6. Impact of regularization on classification
7. Handling non-linearly separable data

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import sys
import os
from sklearn.datasets import make_classification, make_moons
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.linear_model import LogisticRegression as SklearnLogisticRegression
from matplotlib.colors import ListedColormap

# Add the parent directory to sys.path to import our custom modules
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)

# Import our implementations
from models.logistic_regression import LogisticRegression
from utils.preprocessing import StandardScaler as OurStandardScaler
from utils.plotting import plot_decision_boundary, plot_learning_curve
from utils.metrics import confusion_matrix as our_confusion_matrix
from utils.metrics import precision_recall_f1

# Set random seed for reproducibility
np.random.seed(42)

## 1. Understanding Logistic Regression

Unlike linear regression, which predicts continuous values, logistic regression predicts the probability that an instance belongs to a particular class. It uses the sigmoid function to transform the linear prediction into a probability value between 0 and 1.

The key components are:
1. **Linear combination**: $z = w^T x + b$
2. **Sigmoid function**: $\sigma(z) = \frac{1}{1 + e^{-z}}$
3. **Prediction**: $\hat{y} = \sigma(z)$
4. **Decision rule**: Predict class 1 if $\hat{y} \geq 0.5$, otherwise class 0
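Training needs a loss to go with these components. The standard choice for logistic regression is the binary cross-entropy (log loss), averaged over $m$ training examples with labels $y^{(i)} \in \{0, 1\}$ and predictions $\hat{y}^{(i)} = \sigma(w^T x^{(i)} + b)$. Below is the textbook, unregularized form and its gradients. How the custom `LogisticRegression` class scales or regularizes this loss is not shown in this notebook.

$$J(w, b) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log \hat{y}^{(i)} + \left(1 - y^{(i)}\right) \log\left(1 - \hat{y}^{(i)}\right) \right]$$

$$\frac{\partial J}{\partial w} = \frac{1}{m} \sum_{i=1}^{m} \left(\hat{y}^{(i)} - y^{(i)}\right) x^{(i)}, \qquad \frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} \left(\hat{y}^{(i)} - y^{(i)}\right)$$

The loss is small when the model assigns high probability to the correct class and grows without bound as $\hat{y}^{(i)}$ approaches the wrong extreme. The gradient has the same form as in linear regression with a $\frac{1}{2m}$-scaled squared-error loss; only the definition of $\hat{y}^{(i)}$ changes.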

Let's visualize the sigmoid function:

In [None]:
def sigmoid(z):
    """Compute the sigmoid of z."""
    return 1 / (1 + np.exp(-z))

# Plot the sigmoid function
z = np.linspace(-10, 10, 100)
y = sigmoid(z)

plt.figure(figsize=(10, 6))
plt.plot(z, y, 'b-', linewidth=2)
plt.axhline(y=0.5, color='k', linestyle='--', label='y = 0.5')
plt.axvline(x=0, color='r', linestyle='--', label='z = 0')
plt.grid(True)
plt.xlabel('z = w^T x + b')
plt.ylabel('sigmoid(z)')
plt.title('Sigmoid Function')
plt.legend()
plt.show()

print("The sigmoid function maps any real number to the range (0, 1).")
print("When z = 0, sigmoid(z) = 0.5")
print("As z approaches positive infinity, sigmoid(z) approaches 1.")
print("As z approaches negative infinity, sigmoid(z) approaches 0.")
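The one-line `sigmoid` above is fine on the plotted range, but for large negative `z` the term `np.exp(-z)` overflows and NumPy emits a RuntimeWarning (the result still rounds to 0). A common remedy, sketched here as an alternative that the custom module may or may not use, evaluates each sign of `z` in a form whose exponent is never positive. The name `stable_sigmoid` is hypothetical.

```python
import numpy as np

def stable_sigmoid(z):
    """Numerically stable sigmoid: never exponentiates a positive number."""
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    pos = z >= 0
    # For z >= 0, exp(-z) lies in (0, 1], so 1 / (1 + exp(-z)) cannot overflow
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    # For z < 0, use the equivalent form exp(z) / (1 + exp(z)); exp(z) lies in (0, 1)
    exp_z = np.exp(z[~pos])
    out[~pos] = exp_z / (1.0 + exp_z)
    return out

z_demo = np.array([-1000.0, -10.0, 0.0, 10.0, 1000.0])
print(stable_sigmoid(z_demo))  # no overflow warning; extreme inputs round to 0.0 and 1.0
```

Both branches compute the same mathematical function; the split only changes which intermediate values floating point has to represent.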

## 2. Creating a Classification Dataset

Let's generate a simple classification dataset to demonstrate logistic regression.

In [None]:
# Generate a binary classification dataset
X, y = make_classification(n_samples=100, n_features=2, n_redundant=0, n_informative=2,
                           random_state=42, n_clusters_per_class=1)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Scale the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Plot the dataset
plt.figure(figsize=(10, 6))
plt.scatter(X_train[y_train == 0][:, 0], X_train[y_train == 0][:, 1], marker='o', label='Class 0 (Training)', alpha=0.7)
plt.scatter(X_train[y_train == 1][:, 0], X_train[y_train == 1][:, 1], marker='^', label='Class 1 (Training)', alpha=0.7)
plt.scatter(X_test[y_test == 0][:, 0], X_test[y_test == 0][:, 1], marker='o', label='Class 0 (Test)', alpha=0.7, facecolors='none', edgecolors='blue')
plt.scatter(X_test[y_test == 1][:, 0], X_test[y_test == 1][:, 1], marker='^', label='Class 1 (Test)', alpha=0.7, facecolors='none', edgecolors='orange')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Binary Classification Dataset')
plt.legend()
plt.grid(True)
plt.show()

## 3. Training a Logistic Regression Model

Now, let's train our custom logistic regression model on this dataset and evaluate its performance.
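The internals of `models.logistic_regression.LogisticRegression` are not shown here, so before calling it, it may help to have a mental model of what `fit` does. The following is a minimal batch gradient descent sketch under stated assumptions: the loss is the mean cross-entropy plus an L2 penalty `(lam / (2 * m)) * ||w||^2` on the weights only (the bias is not penalized). The function `fit_logistic_gd` and its `lam` argument are hypothetical stand-ins for the class's `regularization` parameter; the real class may scale the penalty or stop iterating differently.

```python
import numpy as np

def fit_logistic_gd(X, y, learning_rate=0.1, max_iterations=1000, lam=0.0):
    """Hypothetical sketch: batch gradient descent on L2-regularized log loss.

    Assumed loss: mean cross-entropy + (lam / (2 * m)) * ||w||^2 (bias not penalized).
    Returns (w, b, losses), where losses[k] is the loss before update k.
    """
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    losses = []
    eps = 1e-12  # keeps log() finite if a prediction rounds to exactly 0 or 1
    for _ in range(max_iterations):
        y_hat = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        loss = -np.mean(y * np.log(y_hat + eps) + (1 - y) * np.log(1 - y_hat + eps))
        loss += lam / (2 * m) * np.dot(w, w)
        losses.append(loss)
        error = y_hat - y  # per-example derivative of the loss w.r.t. z (before averaging)
        grad_w = X.T @ error / m + (lam / m) * w
        grad_b = np.mean(error)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, losses

# Usage on a small synthetic problem: two Gaussian blobs
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(-1.5, 1.0, size=(50, 2)),
                    rng.normal(1.5, 1.0, size=(50, 2))])
y_demo = np.concatenate([np.zeros(50), np.ones(50)])

w_fit, b_fit, losses = fit_logistic_gd(X_demo, y_demo, lam=0.1)
accuracy = np.mean(((X_demo @ w_fit + b_fit) >= 0) == y_demo)
print(f"w = {w_fit}, b = {b_fit:.4f}")
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}, training accuracy = {accuracy:.2f}")
```

With all weights starting at zero, every initial prediction is 0.5, so the first recorded loss is $\log 2 \approx 0.693$.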

In [None]:
# Train our logistic regression model
lr_model = LogisticRegression(learning_rate=0.1, max_iterations=1000, regularization=0.0)
lr_model.fit(X_train_scaled, y_train)

# Make predictions
y_train_pred = lr_model.predict(X_train_scaled)
y_test_pred = lr_model.predict(X_test_scaled)

# Calculate accuracy
train_accuracy = accuracy_score(y_train, y_train_pred)
test_accuracy = accuracy_score(y_test, y_test_pred)

print(f"Training accuracy: {train_accuracy:.4f}")
print(f"Test accuracy: {test_accuracy:.4f}")

# Display the model weights
print("\nModel weights:")
print(f"w = {lr_model.weights}")
print(f"b = {lr_model.bias:.4f}")

# Compute and display confusion matrix
print("\nConfusion Matrix (Test Set):")
cm = our_confusion_matrix(y_test, y_test_pred)
print(cm)

# Calculate precision, recall, and F1 score
precision, recall, f1 = precision_recall_f1(y_test, y_test_pred)
print(f"\nPrecision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1 Score: {f1:.4f}")
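The helpers in `utils.metrics` are not shown in this notebook. Assuming they follow the standard definitions with class 1 as the positive class, the metrics reduce to counts of true positives (TP), false positives (FP) and false negatives (FN): precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is their harmonic mean. A minimal sketch, where `precision_recall_f1_sketch` is a hypothetical name and the zero-denominator convention is a choice made here:

```python
import numpy as np

def precision_recall_f1_sketch(y_true, y_pred):
    """Precision, recall and F1 for binary 0/1 labels, treating 1 as positive."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    # Define each ratio as 0 when its denominator is 0 (no predicted or no actual positives)
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
    return precision, recall, f1

# TP = 2 (positions 0 and 4), FP = 1 (position 2), FN = 1 (position 1)
p_demo, r_demo, f_demo = precision_recall_f1_sketch([1, 1, 0, 0, 1], [1, 0, 1, 0, 1])
print(p_demo, r_demo, f_demo)  # each equals 2/3
```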

## 4. Visualizing the Decision Boundary

Let's visualize the decision boundary of our logistic regression model.

In [None]:
# Plot the decision boundary
fig = plot_decision_boundary(lr_model, X_train_scaled, y_train, 
                           title="Logistic Regression Decision Boundary (Training Set)")
plt.show()

# Plot the decision boundary for the test set
fig = plot_decision_boundary(lr_model, X_test_scaled, y_test, 
                           title="Logistic Regression Decision Boundary (Test Set)")
plt.show()
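With two features, the boundary $w^T x + b = 0$ is the line $w_1 x_1 + w_2 x_2 + b = 0$. Solving for $x_2$ (assuming $w_2 \neq 0$) gives $x_2 = -(w_1 x_1 + b) / w_2$, which lets you draw the boundary directly from the learned weights instead of evaluating the model on a grid. A short sketch with a hypothetical helper `boundary_x2` and made-up weights:

```python
import numpy as np

def boundary_x2(w, b, x1):
    """Return x2 on the line w[0]*x1 + w[1]*x2 + b = 0 (requires w[1] != 0)."""
    if w[1] == 0:
        raise ValueError("w[1] is zero: the boundary, if any, is a vertical line in x1")
    return -(w[0] * np.asarray(x1, dtype=float) + b) / w[1]

w_demo = np.array([2.0, -1.0])  # illustrative weights, not learned ones
b_demo = 0.5
x1_demo = np.linspace(-3, 3, 5)
x2_demo = boundary_x2(w_demo, b_demo, x1_demo)

# Every point on the line has z = w^T x + b = 0, i.e. predicted probability 0.5
print(w_demo[0] * x1_demo + w_demo[1] * x2_demo + b_demo)
```

In this notebook, something like `boundary_x2(lr_model.weights, lr_model.bias, x1)` would give the line in scaled-feature coordinates, assuming `weights` is a length-2 array as printed in Section 3.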

## 5. Examining Predicted Probabilities

One of the advantages of logistic regression is that it provides probabilities, not just class predictions. Let's examine these probabilities.

In [None]:
# Get predicted probabilities
probs_train = lr_model.predict_proba(X_train_scaled)
probs_test = lr_model.predict_proba(X_test_scaled)

# Plot histograms of predicted probabilities
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.hist(probs_train[y_train == 0], alpha=0.5, label='Class 0', bins=20, range=(0, 1))
plt.hist(probs_train[y_train == 1], alpha=0.5, label='Class 1', bins=20, range=(0, 1))
plt.axvline(x=0.5, color='r', linestyle='--', label='Decision boundary')
plt.xlabel('Predicted probability of Class 1')
plt.ylabel('Count')
plt.title('Probabilities (Training Set)')
plt.legend()
plt.grid(True)

plt.subplot(1, 2, 2)
plt.hist(probs_test[y_test == 0], alpha=0.5, label='Class 0', bins=20, range=(0, 1))
plt.hist(probs_test[y_test == 1], alpha=0.5, label='Class 1', bins=20, range=(0, 1))
plt.axvline(x=0.5, color='r', linestyle='--', label='Decision boundary')
plt.xlabel('Predicted probability of Class 1')
plt.ylabel('Count')
plt.title('Probabilities (Test Set)')
plt.legend()
plt.grid(True)

plt.tight_layout()
plt.show()
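Class predictions follow from these probabilities by thresholding. Because $\sigma$ is increasing and $\sigma(0) = 0.5$, the rule $\hat{y} \geq 0.5$ is the same as $z \geq 0$; other thresholds trade precision against recall. The sketch below assumes, as the histograms above do, that `predict_proba` returns a 1-D array of class-1 probabilities; `predict_from_proba` is a hypothetical helper.

```python
import numpy as np

def predict_from_proba(probs, threshold=0.5):
    """Convert class-1 probabilities into 0/1 labels using a threshold."""
    return (np.asarray(probs) >= threshold).astype(int)

z_demo = np.array([-3.0, -0.2, 0.0, 0.2, 3.0])
probs_demo = 1.0 / (1.0 + np.exp(-z_demo))

# At the default threshold, thresholding probabilities matches the sign of z
print(predict_from_proba(probs_demo))                 # [0 0 1 1 1]
print((z_demo >= 0).astype(int))                      # [0 0 1 1 1]
# A higher threshold predicts class 1 only when the model is more confident
print(predict_from_proba(probs_demo, threshold=0.9))  # [0 0 0 0 1]
```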

## 6. The Effect of Regularization

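Regularization adds a penalty on the size of the weights to the training loss. Larger values of the regularization parameter λ push the weights toward zero, which makes predicted probabilities less extreme and can reduce overfitting, while too large a λ makes the model underfit. Let's train models with several values of λ and compare their training and test accuracy.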

In [None]:
# Train models with different regularization strengths
lambda_values = [0, 0.01, 0.1, 1.0, 10.0]
models_reg = []
train_acc = []
test_acc = []

for lambda_val in lambda_values:
    model = LogisticRegression(learning_rate=0.1, max_iterations=1000, regularization=lambda_val)
    model.fit(X_train_scaled, y_train)
    models_reg.append(model)
    
    # Calculate accuracy
    train_pred = model.predict(X_train_scaled)
    test_pred = model.predict(X_test_scaled)
    
    train_acc.append(accuracy_score(y_train, train_pred))
    test_acc.append(accuracy_score(y_test, test_pred))
    
    print(f"λ = {lambda_val}: Train accuracy = {train_acc[-1]:.4f}, Test accuracy = {test_acc[-1]:.4f}")

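# Plot accuracy vs. regularization strength.
# A log axis cannot display λ = 0, so use a symmetric-log axis that is
# linear between -0.01 and 0.01 and logarithmic beyond.
plt.figure(figsize=(10, 6))
plt.plot(lambda_values, train_acc, 'b-o', linewidth=2, label='Training accuracy')
plt.plot(lambda_values, test_acc, 'r-o', linewidth=2, label='Test accuracy')
plt.xscale('symlog', linthresh=0.01)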
plt.xlabel('Regularization parameter (λ)')
plt.ylabel('Accuracy')
plt.title('Effect of Regularization on Logistic Regression Accuracy')
plt.legend()
plt.grid(True)
plt.show()
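Accuracy alone hides what λ does to the model itself; the size of the learned weights shows the shrinkage directly. The sketch below is self-contained and uses scikit-learn's `LogisticRegression` (L2 penalty by default) on the same kind of dataset, where the strength is set through `C`, the inverse of the regularization strength. Treating `C = 1 / λ` as equivalent to our λ is an assumption about how the two penalties are scaled, and λ = 0 is left out because it would mean `C = ∞`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression as SklearnLogisticRegression
from sklearn.preprocessing import StandardScaler

X_demo, y_demo = make_classification(n_samples=100, n_features=2, n_redundant=0,
                                     n_informative=2, n_clusters_per_class=1,
                                     random_state=42)
X_demo = StandardScaler().fit_transform(X_demo)

weight_norms = []
for lam in [0.01, 0.1, 1.0, 10.0]:
    clf = SklearnLogisticRegression(C=1.0 / lam, max_iter=1000).fit(X_demo, y_demo)
    weight_norms.append(np.linalg.norm(clf.coef_))
    print(f"lambda = {lam:>5}: ||w|| = {weight_norms[-1]:.4f}")
# With an L2 penalty, a larger lambda (smaller C) never yields larger weights
```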

## 7. Comparing Decision Boundaries with Different Regularization

Let's visualize how regularization affects the decision boundary.

In [None]:
# Select a subset of models for visualization
lambda_selected = [0, 0.1, 10.0]  # No, moderate, and strong regularization
selected_indices = [lambda_values.index(lambda_val) for lambda_val in lambda_selected]
models_selected = [models_reg[i] for i in selected_indices]

plt.figure(figsize=(15, 5))

for i, (lambda_val, model) in enumerate(zip(lambda_selected, models_selected)):
    plt.subplot(1, 3, i+1)
    fig = plot_decision_boundary(model, X_train_scaled, y_train, step=0.05, 
                               title=f"Decision Boundary (λ = {lambda_val})")
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')

plt.tight_layout()
plt.show()

## 8. Handling Non-Linearly Separable Data

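Because its decision boundary is a straight line (a hyperplane in higher dimensions), plain logistic regression can only separate classes that are approximately linearly separable. Let's see how it performs on the moons dataset, two interleaving half-circles that no straight line can separate well.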

In [None]:
# Generate a non-linearly separable dataset (moons)
X_moons, y_moons = make_moons(n_samples=200, noise=0.2, random_state=42)

# Split the data
X_train_moons, X_test_moons, y_train_moons, y_test_moons = train_test_split(
    X_moons, y_moons, test_size=0.3, random_state=42)

# Scale the features
scaler_moons = StandardScaler()
X_train_moons_scaled = scaler_moons.fit_transform(X_train_moons)
X_test_moons_scaled = scaler_moons.transform(X_test_moons)

# Plot the dataset
plt.figure(figsize=(10, 6))
plt.scatter(X_moons[y_moons == 0, 0], X_moons[y_moons == 0, 1], marker='o', label='Class 0', alpha=0.7)
plt.scatter(X_moons[y_moons == 1, 0], X_moons[y_moons == 1, 1], marker='^', label='Class 1', alpha=0.7)
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Non-Linearly Separable Dataset (Moons)')
plt.legend()
plt.grid(True)
plt.show()

# Train a logistic regression model on the moons dataset
lr_moons = LogisticRegression(learning_rate=0.1, max_iterations=1000, regularization=0.1)
lr_moons.fit(X_train_moons_scaled, y_train_moons)

# Make predictions
y_train_pred_moons = lr_moons.predict(X_train_moons_scaled)
y_test_pred_moons = lr_moons.predict(X_test_moons_scaled)

# Calculate accuracy
train_accuracy_moons = accuracy_score(y_train_moons, y_train_pred_moons)
test_accuracy_moons = accuracy_score(y_test_moons, y_test_pred_moons)

print(f"Training accuracy: {train_accuracy_moons:.4f}")
print(f"Test accuracy: {test_accuracy_moons:.4f}")

# Plot the decision boundary
fig = plot_decision_boundary(lr_moons, X_train_moons_scaled, y_train_moons, 
                           title="Logistic Regression on Non-Linearly Separable Data")
plt.show()

## 9. Adding Polynomial Features for Non-Linear Decision Boundaries

We can handle non-linearly separable data by adding polynomial features, similar to what we did for regression.
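Why this works: the model stays linear in its inputs, but the inputs now include squared terms. For example, the circle $x_1^2 + x_2^2 = r^2$ is a curve in $(x_1, x_2)$, yet it is the linear boundary $w^T \phi(x) + b = 0$ in the expanded features $\phi(x) = (x_1, x_2, x_1^2, x_1 x_2, x_2^2)$ with $w = (0, 0, 1, 0, 1)$ and $b = -r^2$. The weights below are chosen by hand for illustration, not learned, and `expand_degree2` is a hypothetical helper.

```python
import numpy as np

def expand_degree2(X):
    """Map each row (x1, x2) to (x1, x2, x1^2, x1*x2, x2^2)."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack((x1, x2, x1**2, x1 * x2, x2**2))

r_circle = 1.0
w_circle = np.array([0.0, 0.0, 1.0, 0.0, 1.0])  # selects x1^2 + x2^2
b_circle = -r_circle**2

points = np.array([[0.0, 0.0], [0.5, 0.5], [1.5, 0.0], [-1.0, 1.0]])
z_circle = expand_degree2(points) @ w_circle + b_circle
# z < 0 inside the circle (class 0), z >= 0 on or outside it (class 1)
print((z_circle >= 0).astype(int))  # [0 0 1 1]
```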

In [None]:
from itertools import combinations_with_replacement

def add_polynomial_features(X, degree=2):
    """Return X with all monomials of total degree 2..degree appended.

    For two features and degree=3 this appends x1^2, x1*x2, x2^2,
    x1^3, x1^2*x2, x1*x2^2 and x2^3 (the same set of terms as
    scikit-learn's PolynomialFeatures with include_bias=False).
    """
    n_features = X.shape[1]
    columns = [X]  # degree-1 terms: the original features

    for d in range(2, degree + 1):
        # Each multiset of d feature indices defines one monomial,
        # e.g. (0, 0, 1) -> x1 * x1 * x2
        for combo in combinations_with_replacement(range(n_features), d):
            columns.append(np.prod(X[:, list(combo)], axis=1, keepdims=True))

    return np.hstack(columns)

# Add polynomial features to the moons dataset
X_train_moons_poly = add_polynomial_features(X_train_moons_scaled, degree=3)
X_test_moons_poly = add_polynomial_features(X_test_moons_scaled, degree=3)

print(f"Original number of features: {X_train_moons_scaled.shape[1]}")
print(f"Number of features after polynomial expansion: {X_train_moons_poly.shape[1]}")

# Train logistic regression on the expanded features
lr_moons_poly = LogisticRegression(learning_rate=0.1, max_iterations=1000, regularization=0.1)
lr_moons_poly.fit(X_train_moons_poly, y_train_moons)

# Make predictions
y_train_pred_moons_poly = lr_moons_poly.predict(X_train_moons_poly)
y_test_pred_moons_poly = lr_moons_poly.predict(X_test_moons_poly)

# Calculate accuracy
train_accuracy_moons_poly = accuracy_score(y_train_moons, y_train_pred_moons_poly)
test_accuracy_moons_poly = accuracy_score(y_test_moons, y_test_pred_moons_poly)

print(f"\nPolynomial Features:")
print(f"Training accuracy: {train_accuracy_moons_poly:.4f}")
print(f"Test accuracy: {test_accuracy_moons_poly:.4f}")
print(f"Test accuracy change vs. linear model: {(test_accuracy_moons_poly - test_accuracy_moons)*100:+.2f} percentage points")

The expanded feature space has too many dimensions to plot directly. Instead, let's evaluate both models on a dense grid of points in the original 2D space, scaling and expanding each grid point exactly as we did for the training data, and color each point by its predicted class.

In [None]:
# Create a meshgrid in the original feature space
h = 0.02  # step size in the mesh
x_min, x_max = X_moons[:, 0].min() - 1, X_moons[:, 0].max() + 1
y_min, y_max = X_moons[:, 1].min() - 1, X_moons[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

# Create a grid of points
grid = np.c_[xx.ravel(), yy.ravel()]

# Scale the grid points
grid_scaled = scaler_moons.transform(grid)

# Add polynomial features to the grid
grid_poly = add_polynomial_features(grid_scaled, degree=3)

# Predict class for each point in the grid
Z_linear = lr_moons.predict(grid_scaled)
Z_poly = lr_moons_poly.predict(grid_poly)

# Reshape results to match the meshgrid shape
Z_linear = Z_linear.reshape(xx.shape)
Z_poly = Z_poly.reshape(xx.shape)

# Plot decision boundaries
plt.figure(figsize=(15, 6))

# Plot linear decision boundary
plt.subplot(1, 2, 1)
plt.contourf(xx, yy, Z_linear, alpha=0.8, cmap=plt.cm.RdBu)
plt.scatter(X_train_moons[:, 0], X_train_moons[:, 1], c=y_train_moons, cmap=plt.cm.RdBu, edgecolors='k')
plt.title(f"Linear Boundary (Test accuracy: {test_accuracy_moons:.4f})")
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')

# Plot polynomial decision boundary
plt.subplot(1, 2, 2)
plt.contourf(xx, yy, Z_poly, alpha=0.8, cmap=plt.cm.RdBu)
plt.scatter(X_train_moons[:, 0], X_train_moons[:, 1], c=y_train_moons, cmap=plt.cm.RdBu, edgecolors='k')
plt.title(f"Polynomial Boundary (Test accuracy: {test_accuracy_moons_poly:.4f})")
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')

plt.tight_layout()
plt.show()

## 10. Comparison with Scikit-learn's Implementation

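Let's compare our implementation with scikit-learn's `LogisticRegression`. Scikit-learn sets the regularization strength through `C`, its inverse, so we fit it with `C = 1/λ` for λ = 0.1 and compare it with our model trained with the same λ in Section 6.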
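How closely the two models should agree depends on how each scales its penalty. Scikit-learn's documented L2 objective adds $\frac{1}{2}\lVert w \rVert^2$ to `C` times the *sum* of per-example cross-entropies $\ell_i$, and its default `lbfgs` solver does not penalize the intercept. The custom class's exact objective is not shown, so here are two common conventions. Multiplying an objective by a positive constant does not change its minimizer, which gives the equivalent `C` in each case:

$$\min_{w,b}\; \frac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{m}\ell_i \qquad \text{(scikit-learn)}$$

$$\frac{1}{m}\sum_{i=1}^{m}\ell_i + \frac{\lambda}{2m}\lVert w\rVert^2 \;=\; \frac{\lambda}{m}\left(\frac{1}{2}\lVert w\rVert^2 + \frac{1}{\lambda}\sum_{i=1}^{m}\ell_i\right) \;\Rightarrow\; C = \frac{1}{\lambda}$$

$$\frac{1}{m}\sum_{i=1}^{m}\ell_i + \frac{\lambda}{2}\lVert w\rVert^2 \;=\; \lambda\left(\frac{1}{2}\lVert w\rVert^2 + \frac{1}{\lambda m}\sum_{i=1}^{m}\ell_i\right) \;\Rightarrow\; C = \frac{1}{\lambda m}$$

So `C = 1/λ` matches only under the first convention; under the second, the weights will differ noticeably even if both optimizers converge.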

In [None]:
# Compare with our model trained with λ = 0.1 in Section 6.
# Look up its position instead of hard-coding it: index 1 would be λ = 0.01.
lambda_compare = 0.1
idx = lambda_values.index(lambda_compare)

# Train scikit-learn model on the original dataset (C is the inverse regularization strength)
sk_lr = SklearnLogisticRegression(C=1/lambda_compare, max_iter=1000, random_state=42)
sk_lr.fit(X_train_scaled, y_train)

# Make predictions
sk_y_train_pred = sk_lr.predict(X_train_scaled)
sk_y_test_pred = sk_lr.predict(X_test_scaled)

# Calculate accuracy
sk_train_accuracy = accuracy_score(y_train, sk_y_train_pred)
sk_test_accuracy = accuracy_score(y_test, sk_y_test_pred)

print(f"Comparison with scikit-learn (λ = {lambda_compare}, C = {1/lambda_compare:g}):")
print(f"Our model - Train accuracy: {train_acc[idx]:.4f}, Test accuracy: {test_acc[idx]:.4f}")
print(f"Scikit-learn - Train accuracy: {sk_train_accuracy:.4f}, Test accuracy: {sk_test_accuracy:.4f}")

# Compare weights
print("\nModel weights:")
print(f"Our model - w: {models_reg[idx].weights}, b: {models_reg[idx].bias:.4f}")
print(f"Scikit-learn - w: {sk_lr.coef_[0]}, b: {sk_lr.intercept_[0]:.4f}")

# Plot decision boundaries side by side
plt.figure(figsize=(15, 6))

# Our model
plt.subplot(1, 2, 1)
fig = plot_decision_boundary(models_reg[idx], X_train_scaled, y_train,
                           title=f"Our Implementation (λ = {lambda_compare})")

# Scikit-learn model
plt.subplot(1, 2, 2)
h = 0.02  # step size in the mesh
x_min, x_max = X_train_scaled[:, 0].min() - 1, X_train_scaled[:, 0].max() + 1
y_min, y_max = X_train_scaled[:, 1].min() - 1, X_train_scaled[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
Z = sk_lr.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, alpha=0.8, cmap=plt.cm.RdBu)
plt.scatter(X_train_scaled[:, 0], X_train_scaled[:, 1], c=y_train, cmap=plt.cm.RdBu, edgecolors='k')
plt.title("Scikit-learn Implementation")
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')

plt.tight_layout()
plt.show()

## 11. Summary of Logistic Regression

Let's summarize what we've learned about logistic regression for classification.

### Key Concepts in Logistic Regression

1. **Core Model Structure**
   - Linear combination of features: $z = w^T x + b$
   - Sigmoid function transforms $z$ to probability: $\hat{y} = \sigma(z) = \frac{1}{1 + e^{-z}}$
   - Decision rule: Predict class 1 if $\hat{y} \geq 0.5$, otherwise class 0
   - This is equivalent to: Predict class 1 if $z \geq 0$, otherwise class 0

2. **Training Process**
   - Cross-entropy loss function measures how well predictions match true labels
   - Gradient descent optimizes weights to minimize the loss
   - Regularization helps prevent overfitting

3. **Decision Boundary**
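   - The decision boundary is the set of points where $w^T x + b = 0$: a line for two features and a hyperplane in general, linear in whatever features the model is given (with polynomial features it becomes a curve in the original feature space)
   - The model predicts class 1 where $w^T x + b \geq 0$ and class 0 where it is negative
   - The signed distance from the boundary is $(w^T x + b) / \lVert w \rVert$; the farther a point lies from the boundary, the closer $\hat{y}$ is to 0 or 1, i.e. the more confident the prediction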

4. **Strengths of Logistic Regression**
   - Provides probabilities, not just class predictions
   - Simple and interpretable model with efficient training
   - Works well when the classes are (approximately) linearly separable
   - Regularization effectively controls overfitting

5. **Limitations**
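   - A linear decision boundary cannot separate classes that are not linearly separable in the given features (e.g., the moons dataset)
   - May underfit complex relationships in the data
   - With imbalanced classes, the default 0.5 threshold can favor the majority class, so class weighting or threshold tuning may be needed (not demonstrated here)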

6. **Addressing Non-Linearity**
   - Polynomial features can create non-linear decision boundaries
   - The model stays linear in its parameters, so training is unchanged, while the decision boundary becomes non-linear in the original features
   - Higher-degree expansions add many features quickly and need careful regularization to avoid overfitting

7. **Practical Tips**
   - Scale features: this speeds up gradient descent and keeps the regularization penalty from depending on each feature's units
   - Use regularization to improve generalization
   - Consider polynomial features for non-linearly separable data
   - Examine predicted probabilities, not just class predictions
   - Use metrics beyond accuracy (precision, recall, F1) for imbalanced problems

### Comparison with Linear Regression

1. **Output interpretation**
   - Linear regression: Continuous, unbounded real values
   - Logistic regression: Probabilities between 0 and 1

2. **Cost function**
   - Linear regression: Mean Squared Error
   - Logistic regression: Cross-Entropy Loss (Log Loss)

3. **Application**
   - Linear regression: Regression problems (predicting quantities)
   - Logistic regression: Classification problems (predicting categories)