# 🤖 Application 2: Logistic Regression for Classification

> *"Logistic regression is the gateway from regression to the world of classification."*

In our second application, we'll tackle a **classification** problem. Instead of predicting a continuous value (like a price), we want to predict a discrete category (e.g., Yes/No, Spam/Not Spam). We'll build **Logistic Regression**, a fundamental classification algorithm, from scratch.

## 🎯 How the Math Comes Together

- **Linear Algebra**: We still start with a linear equation, `z = X @ theta`, just like in linear regression.
- **Calculus & Probability**: We then pass the result `z` through a special function called the **Sigmoid (or Logistic) function**. This function squashes any input value into a range between 0 and 1, which we can interpret as a probability. The loss function we use, **Log Loss (or Binary Cross-Entropy)**, is derived from the principle of maximum likelihood estimation.
- **Optimization**: We'll again use Gradient Descent to find the parameters `theta` that minimize the Log Loss.
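The maximum-likelihood connection can be checked numerically. Treat each label $y^{(i)}$ as a Bernoulli outcome with probability $\hat{p}^{(i)}$. The likelihood of all labels is then $\prod_i (\hat{p}^{(i)})^{y^{(i)}} (1 - \hat{p}^{(i)})^{1 - y^{(i)}}$, and its negative log, averaged over the examples, is exactly the Log Loss. The minimal NumPy sketch below compares the two quantities on a handful of made-up labels and probabilities. These are illustrative values, not taken from the Iris data.

```python
import numpy as np

# Illustrative labels and predicted probabilities (made up for this check)
y = np.array([1, 0, 1, 1, 0])
p_hat = np.array([0.9, 0.2, 0.7, 0.6, 0.1])

# Bernoulli likelihood of the observed labels under the predicted probabilities
likelihood = np.prod(p_hat**y * (1 - p_hat)**(1 - y))

# Average negative log-likelihood
nll = -np.log(likelihood) / len(y)

# Log Loss (Binary Cross-Entropy), averaged over the examples
log_loss = -np.mean(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))

print(nll, log_loss)  # the two numbers agree
```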

## 📚 Import Libraries and Load the Data

In [None]:
# Core libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets

# Plotting style
plt.style.use('seaborn-v0_8-darkgrid')
%matplotlib inline
plt.rcParams['figure.figsize'] = (10, 6)
plt.rcParams['font.size'] = 12

# Load the Iris dataset (a real, classic benchmark, not synthetic data)
iris = datasets.load_iris()
# Use only 2 features and turn the 3-class problem into a binary one (Virginica vs. rest) for easy visualization
X = iris["data"][:, (2, 3)]  # petal length, petal width
y = (iris["target"] == 2).astype(int)  # 1 if Iris-Virginica, else 0

print("🤖 Libraries and Iris dataset loaded!")

# Visualize the data
plt.scatter(X[y==0][:, 0], X[y==0][:, 1], color='blue', label='Not Iris-Virginica')
plt.scatter(X[y==1][:, 0], X[y==1][:, 1], color='green', label='Iris-Virginica')
plt.xlabel('Petal Length (cm)')
plt.ylabel('Petal Width (cm)')
plt.title('Classification Problem: Iris Dataset')
plt.legend()
plt.show()

---

# 📝 Step 1: The Sigmoid Function and Model

### The Sigmoid Function (Calculus & Probability)
The sigmoid function is the heart of logistic regression. It takes any real number and maps it to a value strictly between 0 and 1, which we interpret as a probability.
$$ \sigma(z) = \frac{1}{1 + e^{-z}} $$

### The Logistic Regression Model
1. First, we compute a linear combination of the inputs (just like linear regression): `z = X_b @ θ`, where `X_b` is `X` with a column of ones prepended so that `θ₀` acts as the bias (intercept).
2. Then, we apply the sigmoid function to get the predicted probability: `p_hat = σ(z)`
3. Finally, we make a class prediction based on this probability (e.g., if `p_hat >= 0.5`, predict class 1, else predict class 0).
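The three steps map directly onto two small functions. This is a minimal sketch: the names `predict_proba` and `predict` and the parameter values are illustrative assumptions. Also note that the training code below stores `θ` as a column vector, whereas here it is a 1-D array for readability.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real z to a value in (0, 1)."""
    return 1 / (1 + np.exp(-z))

def predict_proba(X, theta):
    """Steps 1 and 2: prepend a bias column to X, then apply the sigmoid to z = X_b @ theta."""
    X_b = np.c_[np.ones((len(X), 1)), X]
    return sigmoid(X_b @ theta)

def predict(X, theta, threshold=0.5):
    """Step 3: turn probabilities into 0/1 class labels."""
    return (predict_proba(X, theta) >= threshold).astype(int)

# Hypothetical parameters [bias, weight_1, weight_2], chosen only for illustration
theta = np.array([-3.0, 1.0, 1.0])
X = np.array([[0.5, 0.5],   # z = -2.0 -> p ≈ 0.12
              [1.5, 1.5],   # z =  0.0 -> p = 0.5
              [3.0, 2.0]])  # z =  2.0 -> p ≈ 0.88

print(predict_proba(X, theta))
print(predict(X, theta))  # [0 1 1]
```

A point with `z = 0` lands exactly on the 0.5 threshold, and the `>=` comparison assigns it to class 1.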

In [None]:
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Visualize the sigmoid function
z = np.linspace(-10, 10, 100)
plt.plot(z, sigmoid(z), 'b-')
plt.xlabel('z (Linear Combination)')
plt.ylabel('σ(z) (Probability)')
plt.title('The Sigmoid Function', fontsize=16, weight='bold')
plt.axhline(y=0.5, color='r', linestyle='--', label='Decision Boundary (0.5)')
plt.axhline(y=0, color='k', linestyle='-', alpha=0.5)
plt.axhline(y=1, color='k', linestyle='-', alpha=0.5)
plt.legend()
plt.show()
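The `sigmoid` above evaluates `np.exp(-z)`. This overflows for very negative `z` (roughly below -709 in float64), so NumPy emits a `RuntimeWarning`, although the result still rounds to 0. A common remedy is to use the algebraically equivalent form $e^{z}/(1 + e^{z})$ for negative `z`, so `np.exp` only ever sees non-positive arguments. The sketch below assumes NumPy array inputs, and `stable_sigmoid` is a hypothetical name. If SciPy is available, `scipy.special.expit` provides a ready-made logistic function.

```python
import numpy as np

def stable_sigmoid(z):
    """Sigmoid that never calls np.exp on a large positive argument."""
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    pos = z >= 0
    # For z >= 0, exp(-z) <= 1, so the usual form is safe
    out[pos] = 1 / (1 + np.exp(-z[pos]))
    # For z < 0, use exp(z) / (1 + exp(z)); here exp(z) < 1
    exp_z = np.exp(z[~pos])
    out[~pos] = exp_z / (1 + exp_z)
    return out

z = np.array([-1000.0, -2.0, 0.0, 2.0, 1000.0])
with np.errstate(over="raise"):  # any overflow would now raise an error
    s = stable_sigmoid(z)
print(s)
```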

---

# 📉 Step 2: The Loss Function and Gradient (Calculus)

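MSE is a poor fit for classification. Combined with the sigmoid, it yields a non-convex loss in `θ`, and its gradients shrink when the sigmoid saturates, so even badly wrong predictions get corrected slowly. Instead, we use the **Log Loss** (or Binary Cross-Entropy) function, which is convex in `θ` and penalizes the model heavily when it makes a confident but incorrect prediction.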

$$ J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} [y^{(i)} \log(\hat{p}^{(i)}) + (1 - y^{(i)}) \log(1 - \hat{p}^{(i)})] $$
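Here is what "penalizes heavily" looks like in numbers. For a positive example ($y = 1$), only the first term of the sum survives, so its contribution is $-\log(\hat{p})$. That is about 0.01 at $\hat{p} = 0.99$ but about 4.6 at $\hat{p} = 0.01$. The helper `example_log_loss` below is a hypothetical name used only for this illustration.

```python
import numpy as np

def example_log_loss(y, p_hat):
    """Log Loss contribution of a single example (one term of the sum in J(θ))."""
    return -(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))

# A positive example scored with increasingly wrong confidence
for p in [0.99, 0.9, 0.5, 0.1, 0.01]:
    print(f"y=1, p_hat={p:<5} -> loss = {example_log_loss(1, p):.3f}")
```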

The gradient of this loss function with respect to the parameters `θ` is surprisingly simple and looks very similar to the one for linear regression:

$$ \nabla_\theta J(\theta) = \frac{1}{m} X_b^T \cdot (\sigma(X_b \cdot \theta) - y) $$
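The "surprisingly simple" form comes from one property of the sigmoid. Differentiating $\sigma(z) = (1 + e^{-z})^{-1}$ gives

$$ \sigma'(z) = \frac{e^{-z}}{(1 + e^{-z})^2} = \sigma(z)\,\big(1 - \sigma(z)\big) $$

Now take a single example with feature row $x_b$ (one row of $X_b$, including the leading 1), so that $z = x_b^T \theta$ and $\hat{p} = \sigma(z)$. The chain rule then collapses the per-example gradient:

$$ \nabla_\theta \left[ -y \log(\hat{p}) - (1 - y) \log(1 - \hat{p}) \right] = \left( -\frac{y}{\hat{p}} + \frac{1 - y}{1 - \hat{p}} \right) \hat{p}(1 - \hat{p})\, x_b = (\hat{p} - y)\, x_b $$

Averaging over the $m$ examples and stacking the rows $x_b^{(i)}$ into $X_b$ gives $\frac{1}{m} X_b^T (\hat{p} - y)$ with $\hat{p} = \sigma(X_b \theta)$, which is the gradient formula above.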

---

# ⚙️ Step 3: The Training (Optimization)

The training process is identical to linear regression, but we use the sigmoid function in our prediction and the log loss gradient for our updates.

In [None]:
def train_logistic_regression(X, y, learning_rate=0.1, n_iterations=1000, seed=42):
    """
    Trains a logistic regression model using Batch Gradient Descent.

    Returns the learned parameters theta (shape (n_features + 1, 1), bias first)
    and the Log Loss recorded at the start of every iteration.
    """
    # Add bias term to X (a leading column of ones)
    X_b = np.c_[np.ones((len(X), 1)), X]
    m, n = X_b.shape
    y = y.reshape(-1, 1)  # Ensure y is a column vector

    # 1. Initialize parameters randomly (seeded so runs are reproducible)
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal((n, 1))

    # Keep track of loss
    loss_history = []
    epsilon = 1e-7  # small constant that keeps log() away from log(0)

    for _ in range(n_iterations):
        # 2. Forward pass: linear combination, then sigmoid -> probabilities
        z = X_b @ theta
        p_hat = sigmoid(z)

        # 3. Record the Log Loss for the current theta (before this update)
        loss = -np.mean(y * np.log(p_hat + epsilon) + (1 - y) * np.log(1 - p_hat + epsilon))
        loss_history.append(loss)

        # 4. Gradient of the Log Loss: (1/m) X_b^T (p_hat - y)
        gradients = X_b.T @ (p_hat - y) / m

        # 5. Gradient descent update
        theta = theta - learning_rate * gradients

    return theta, loss_history

# Train the model
final_theta, loss_history = train_logistic_regression(X, y, learning_rate=0.1, n_iterations=3000)

print("--- Training Complete ---")
print(f"Final learned parameters (theta):\n  θ₀ (bias): {final_theta[0][0]:.4f}\n  θ₁ (petal length): {final_theta[1][0]:.4f}\n  θ₂ (petal width): {final_theta[2][0]:.4f}")
print(f"Initial Loss: {loss_history[0]:.4f}")
print(f"Final Loss: {loss_history[-1]:.4f}")
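The update rule relies on the analytic gradient from Step 2. A central finite-difference check is a cheap, standard way to confirm a gradient formula like this. The sketch below runs the check on a small synthetic problem (random data, not the Iris set). The helper names are hypothetical, and `log_loss` here omits the epsilon used in training, since the random probabilities stay well away from 0 and 1.

```python
import numpy as np

def log_loss(theta, X_b, y):
    """Log Loss J(θ) for column vectors theta (n, 1) and y (m, 1)."""
    p_hat = 1 / (1 + np.exp(-X_b @ theta))
    return -np.mean(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))

def analytic_gradient(theta, X_b, y):
    """The gradient used in the training loop: (1/m) X_b^T (p_hat - y)."""
    p_hat = 1 / (1 + np.exp(-X_b @ theta))
    return X_b.T @ (p_hat - y) / len(y)

def numerical_gradient(theta, X_b, y, h=1e-6):
    """Central finite differences, perturbing one parameter at a time."""
    grad = np.zeros_like(theta)
    for j in range(theta.shape[0]):
        step = np.zeros_like(theta)
        step[j, 0] = h
        grad[j, 0] = (log_loss(theta + step, X_b, y) - log_loss(theta - step, X_b, y)) / (2 * h)
    return grad

# Small random problem with a bias column, 2 features and 0/1 labels
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
X_b = np.c_[np.ones((20, 1)), X]
y = (rng.random((20, 1)) < 0.5).astype(float)
theta = rng.normal(size=(3, 1))

g_analytic = analytic_gradient(theta, X_b, y)
g_numeric = numerical_gradient(theta, X_b, y)
# A value close to zero means the analytic formula matches the numerical estimate
print(np.max(np.abs(g_analytic - g_numeric)))
```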

---

# 📊 Step 4: Visualization and Analysis

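Logistic regression does not fit a line through the data; it outputs a probability for every point in feature space. What we visualize is the **decision boundary**: the set of points where the predicted probability is exactly 0.5. Since σ(z) = 0.5 exactly when z = 0, this is where `X_b @ θ = 0`, which for our two features is a straight line. On one side of it the model predicts class 1, and on the other it predicts class 0.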
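Solving $\theta_0 + \theta_1 x_1 + \theta_2 x_2 = 0$ for $x_2$ gives the boundary in closed form, $x_2 = -(\theta_0 + \theta_1 x_1)/\theta_2$ (assuming $\theta_2 \neq 0$). This is an alternative to the contour plot below. The sketch uses hypothetical parameter values for illustration, and `boundary_x2` is a hypothetical helper name; with the trained model you would pass `final_theta.ravel()` instead.

```python
import numpy as np

def boundary_x2(x1, theta):
    """Petal width on the decision boundary for a given petal length.

    Solves theta[0] + theta[1]*x1 + theta[2]*x2 = 0 for x2 (requires theta[2] != 0).
    """
    return -(theta[0] + theta[1] * x1) / theta[2]

# Hypothetical parameters for illustration only (not the values learned above)
theta = np.array([-12.0, 1.0, 2.0])

x1 = np.array([3.0, 4.0, 5.0, 6.0])
x2 = boundary_x2(x1, theta)
print(x2)  # values 4.5, 4.0, 3.5, 3.0

# Every point on this line has z = 0, so its predicted probability is 0.5
z = theta[0] + theta[1] * x1 + theta[2] * x2
print(1 / (1 + np.exp(-z)))
```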

In [None]:
def visualize_decision_boundary(theta, X, y, loss_history):
    """
    Visualize the decision boundary and the training loss.
    """
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(20, 8))
    
    # 1. Plot the loss curve
    ax1.plot(loss_history)
    ax1.set_xlabel('Iteration')
    ax1.set_ylabel('Log Loss')
    ax1.set_title('Training Loss Over Time', fontsize=16, weight='bold')
    ax1.grid(True)
    
    # 2. Plot the decision boundary
    ax2.scatter(X[y==0][:, 0], X[y==0][:, 1], color='blue', label='Not Iris-Virginica')
    ax2.scatter(X[y==1][:, 0], X[y==1][:, 1], color='green', label='Iris-Virginica')
    
    # Create a mesh to plot the boundary
    x0, x1 = np.meshgrid(
        np.linspace(X[:, 0].min() - 0.5, X[:, 0].max() + 0.5, 500),
        np.linspace(X[:, 1].min() - 0.5, X[:, 1].max() + 0.5, 200)
    )
    X_new = np.c_[x0.ravel(), x1.ravel()]
    X_new_b = np.c_[np.ones((len(X_new), 1)), X_new]
    
    # Get probabilities for the mesh
    y_proba = sigmoid(X_new_b.dot(theta))
    zz = y_proba.reshape(x0.shape)
    
    # Plot the contour (the decision boundary is the 0.5 level)
    contour = ax2.contour(x0, x1, zz, cmap=plt.cm.brg, levels=[0.5])
    ax2.clabel(contour, inline=True, fontsize=12)
    
    ax2.set_xlabel('Petal Length (cm)')
    ax2.set_ylabel('Petal Width (cm)')
    ax2.set_title('Logistic Regression Decision Boundary', fontsize=16, weight='bold')
    ax2.legend()
    ax2.grid(True)
    
    plt.show()

visualize_decision_boundary(final_theta, X, y, loss_history)

### Analysis

- **Loss Curve (Left)**: As with linear regression, the log loss decreases steadily. This means the model assigns increasingly high probability to the correct class of each training flower.
- **Decision Boundary (Right)**: The plot shows the linear decision boundary (the 0.5-probability contour) that the model has learned. With the parameters learned here, a new flower that falls to the upper right of this line is classified as Iris-Virginica (probability ≥ 0.5). A flower that falls to the lower left is classified as not Iris-Virginica (probability < 0.5).

This application demonstrates how a simple modification—adding a sigmoid function and changing the loss function—allows us to adapt the machinery of linear regression to solve classification problems.
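One way to see this "simple modification" in code is a single batch gradient-descent step in which only the function applied to `z` changes. With the identity function, the step minimizes $\frac{1}{2m}\lVert X_b\theta - y\rVert^2$ (MSE scaled by 1/2, whose gradient is $\frac{1}{m} X_b^T (X_b\theta - y)$). With the sigmoid, it is the Log Loss step from Step 3. The helper `gd_step` and the toy data are hypothetical, chosen only for illustration.

```python
import numpy as np

def identity(z):
    """Linear regression: predictions are z itself."""
    return z

def sigmoid(z):
    """Logistic regression: predictions are probabilities sigma(z)."""
    return 1 / (1 + np.exp(-z))

def gd_step(theta, X_b, y, link, learning_rate):
    """One batch gradient-descent step; `link` turns z = X_b @ theta into predictions."""
    predictions = link(X_b @ theta)
    gradients = X_b.T @ (predictions - y) / len(y)
    return theta - learning_rate * gradients

# Toy data: bias column plus one feature, with overlapping classes
x = np.arange(6.0).reshape(-1, 1)
X_b = np.c_[np.ones((6, 1)), x]
y = np.array([[0.0], [0.0], [1.0], [0.0], [1.0], [1.0]])

theta_lin = np.zeros((2, 1))
theta_log = np.zeros((2, 1))
for _ in range(10000):
    theta_lin = gd_step(theta_lin, X_b, y, identity, learning_rate=0.1)
    theta_log = gd_step(theta_log, X_b, y, sigmoid, learning_rate=0.1)

print("linear   theta:", theta_lin.ravel())
print("logistic theta:", theta_log.ravel())
print("P(y=1 | x):   ", sigmoid(X_b @ theta_log).ravel().round(3))
```

The identity-link run converges to the ordinary least-squares solution. The sigmoid-link run produces outputs that pass through the sigmoid, so they can be read as probabilities.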