<h1>Logistic Regression (Slow Explanation)</h1> <h2>1) What problem does Logistic Regression solve?</h2> <p>Logistic Regression is used for <b>classification</b>, mostly <b>binary classification</b>:</p> <ul> <li>Email: <b>spam (1)</b> or <b>not spam (0)</b></li> <li>Student: <b>pass (1)</b> or <b>fail (0)</b></li> <li>Message: <b>scam (1)</b> or <b>real (0)</b></li> </ul> <p>So the target (output) is usually:</p> <p><code>y ∈ {0, 1}</code></p> <hr/> <h2>2) The main idea (very important)</h2> <p>Logistic Regression tries to predict the probability that something belongs to class <b>1</b>.</p> <p>So it outputs:</p> <p><code>ŷ = P(y = 1 | x)</code></p> <ul> <li>If the model says <code>0.90</code> → 90% chance it is class 1</li> <li>If the model says <code>0.20</code> → 20% chance it is class 1</li> </ul> <p>Then we convert the probability into a class:</p> <ul> <li>If <code>ŷ ≥ 0.5</code> → predict <b>1</b></li> <li>Else → predict <b>0</b></li> </ul> <hr/> <h2>3) How does the model calculate that probability?</h2> <p>First, it computes a <b>linear score</b> (just like linear regression):</p> <p><code>z = w1x1 + w2x2 + ... 
+ wnxn + b</code></p> <p>Short form:</p> <p><code>z = wᵀx + b</code></p> <p><b>Where:</b></p> <ul> <li><code>x</code> = input features</li> <li><code>w</code> = weights (importance of each feature)</li> <li><code>b</code> = bias (shifts the decision)</li> </ul> <p>⚠️ But <code>z</code> can be any number: negative, positive, large, small.</p> <p>We need to turn it into a probability between <b>0</b> and <b>1</b>.</p> <hr/> <h2>4) The Sigmoid function (the “magic” part)</h2> <p>We use the sigmoid function:</p> <p><code>σ(z) = 1 / (1 + e^(-z))</code></p> <p>So the prediction is:</p> <p><code>ŷ = σ(z)</code></p> <p><b>What sigmoid does:</b></p> <ul> <li>If <code>z</code> is a big positive number → sigmoid ≈ 1</li> <li>If <code>z = 0</code> → sigmoid = 0.5</li> <li>If <code>z</code> is a big negative number → sigmoid ≈ 0</li> </ul> <p><b>Examples:</b></p> <ul> <li><code>z = 5</code> → <code>ŷ ≈ 0.99</code></li> <li><code>z = 0</code> → <code>ŷ = 0.5</code></li> <li><code>z = -5</code> → <code>ŷ ≈ 0.01</code></li> </ul> <p>So sigmoid “squashes” any number into the open interval <code>(0, 1)</code>: the output gets arbitrarily close to 0 or 1 but never reaches either.</p> <hr/> <h2>5) Why not just use Linear Regression for classification?</h2> <p>Because linear regression outputs values like:</p> <ul> <li><code>-3.2</code></li> <li><code>4.8</code></li> <li><code>100</code></li> </ul> <p>Those are not probabilities: a probability must lie between 0 and 1, and a straight line has no such limit.</p> <hr/> <h2>6) How does it learn? 
(Training)</h2> <p>The model starts with initial weights, commonly all zeros (small random values also work):</p> <p><code>w = 0, b = 0</code></p> <p>Then it predicts:</p> <p><code>ŷ = σ(Xw + b)</code></p> <p>Then it compares prediction vs real answer:</p> <p><code>error = ŷ - y</code></p> <p>Then it updates weights to reduce error.</p> <hr/> <h2>7) The Loss function (how it measures “wrongness”)</h2> <p>Logistic regression uses <b>Log Loss / Cross-Entropy Loss</b>:</p> <p><code>L = -[ y log(ŷ) + (1 - y) log(1 - ŷ) ]</code></p> <p><b>Meaning:</b></p> <ul> <li>If the true label is 1 → it punishes the model if <code>ŷ</code> is low</li> <li>If the true label is 0 → it punishes the model if <code>ŷ</code> is high</li> </ul> <p>This is the loss for a single example; training minimizes its average over all <code>n</code> examples.</p> <hr/> <h2>8) Gradient Descent (how it improves)</h2> <p>To reduce the loss, we use gradient descent:</p> <p><code>w := w - α · (∂L/∂w)</code></p> <p><code>b := b - α · (∂L/∂b)</code></p> <p>Where:</p> <ul> <li><code>α</code> = learning rate (step size)</li> </ul> <p>For the log loss above, the derivatives for one example simplify to <code>∂L/∂w = (ŷ - y) · x</code> and <code>∂L/∂b = ŷ - y</code>, which is why the <code>error</code> term shows up in the gradient code.</p> <hr/> <h2>9) Logistic regression in code terms</h2> <p>In your code you had:</p> <pre>
# Step 1: compute score
z = X @ w + b

# Step 2: convert to probability
y_hat = sigmoid(z)

# Step 3: compute gradients (n = number of samples)
error = y_hat - y
dw = (X.T @ error) / n
db = error.mean()

# Step 4: update parameters
w -= lr * dw
b -= lr * db
</pre> <hr/> <h2>10) Intuition: what are weights doing?</h2> <ul> <li>If a feature is strongly related to class 1, the model learns a <b>positive weight</b>: <ul> <li>bigger x → bigger z → sigmoid closer to 1</li> </ul> </li> <li>If a feature makes it more likely to be class 0, weight becomes <b>negative</b>: <ul> <li>bigger x → smaller z → sigmoid closer to 0</li> </ul> </li> </ul> <hr/> <h2>11) Decision boundary (simple meaning)</h2> <p>Decision rule:</p> <p><code>predict 1 if ŷ ≥ 0.5, else predict 0</code></p> <p>Since <code>ŷ = σ(z)</code> and sigmoid is 0.5 at <code>z = 0</code>, the boundary is:</p> <p><code>z = wᵀx + b = 0</code></p> <p>With two features this is a straight line; with more features it is a plane or hyperplane.</p> <hr/> <h2>Quick recap (in one sentence)</h2> <p><b>✅ Logistic regression = linear regression + sigmoid + 
cross-entropy loss, trained with gradient descent to output probabilities for classification.</b></p>
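<p>The gradient formulas from section 9 can be sanity-checked numerically. The sketch below compares them against finite differences of the log loss, assuming the loss is averaged over all <code>n</code> samples (which the division by <code>n</code> in <code>dw</code> implies). The helper names (<code>mean_log_loss</code>, <code>analytic_gradients</code>, <code>numeric_gradients</code>) and the random toy data are made up for illustration.</p>

```python
import numpy as np

def sigmoid(z):
    # exp(-|z|) stays in (0, 1], so this form cannot overflow.
    e = np.exp(-np.abs(z))
    return np.where(z >= 0, 1 / (1 + e), e / (1 + e))

def mean_log_loss(w, b, X, y, eps=1e-12):
    # Average cross-entropy over all samples; clip so log() stays finite.
    y_hat = np.clip(sigmoid(X @ w + b), eps, 1 - eps)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def analytic_gradients(w, b, X, y):
    # Formulas from section 9: dw = X^T (y_hat - y) / n, db = mean(y_hat - y).
    error = sigmoid(X @ w + b) - y
    return X.T @ error / len(y), error.mean()

def numeric_gradients(w, b, X, y, h=1e-6):
    # Central finite differences, one parameter at a time.
    dw = np.zeros_like(w)
    for j in range(len(w)):
        step = np.zeros_like(w)
        step[j] = h
        dw[j] = (mean_log_loss(w + step, b, X, y)
                 - mean_log_loss(w - step, b, X, y)) / (2 * h)
    db = (mean_log_loss(w, b + h, X, y)
          - mean_log_loss(w, b - h, X, y)) / (2 * h)
    return dw, db

# Made-up toy data, used only for this check.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = (rng.random(20) < 0.5).astype(float)
w = rng.normal(size=3)
b = 0.3

dw_a, db_a = analytic_gradients(w, b, X, y)
dw_n, db_n = numeric_gradients(w, b, X, y)
print("dw matches:", np.allclose(dw_a, dw_n, atol=1e-6))
print("db matches:", np.isclose(db_a, db_n, atol=1e-6))
```

<p>If the two gradients agree, the <code>error = y_hat - y</code> shortcut really is the derivative of the averaged log loss, so nothing is lost by skipping the explicit derivative in the training loop.</p>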

import numpy as np

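def _sigmoid(z):
    """Numerically stable sigmoid implementation."""
    # np.where evaluates both branches for every element, so each branch
    # must be overflow-safe on its own. exp(-|z|) always lies in (0, 1].
    e = np.exp(-np.abs(z))
    return np.where(z >= 0, 1 / (1 + e), e / (1 + e))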

def train_logistic_regression(X, y, lr=0.1, steps=1000):
    """
    Train logistic regression via gradient descent.
    Return (w, b).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1)

    n_samples, n_features = X.shape

    # Initialize parameters
    w = np.zeros(n_features, dtype=float)
    b = 0.0

    for _ in range(steps):
        # Linear model
        z = X @ w + b

        # Prediction (probabilities)
        y_hat = _sigmoid(z)

        # Gradients
        error = y_hat - y
        dw = (X.T @ error) / n_samples
        db = np.sum(error) / n_samples

        # Update
        w -= lr * dw
        b -= lr * db

    return w, b
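
<p>Once <code>w</code> and <code>b</code> are known, sections 2 and 11 describe how to turn them into predictions. A minimal sketch of that step follows. The helper names <code>predict_proba</code> and <code>predict</code> are hypothetical (they are not part of the code above), and the parameters are hand-picked for illustration, not learned. In the notebook you would pass the <code>(w, b)</code> returned by <code>train_logistic_regression</code> instead.</p>

```python
import numpy as np

def predict_proba(X, w, b):
    """Return P(y = 1 | x) for each row of X."""
    z = np.asarray(X, dtype=float) @ w + b
    e = np.exp(-np.abs(z))  # overflow-safe sigmoid
    return np.where(z >= 0, 1 / (1 + e), e / (1 + e))

def predict(X, w, b, threshold=0.5):
    """Apply the decision rule: class 1 if probability >= threshold, else 0."""
    return (predict_proba(X, w, b) >= threshold).astype(int)

# Hand-picked parameters for illustration (not learned from data).
w = np.array([2.0])
b = -1.0
X_new = np.array([[0.0], [0.5], [2.0]])  # linear scores z = -1, 0, 3

probs = predict_proba(X_new, w, b)
labels = predict(X_new, w, b)
print(probs.round(3), labels)
```

<p>The middle sample sits exactly on the decision boundary (<code>z = 0</code>, so <code>ŷ = 0.5</code>), and the <code>≥</code> in the rule assigns it to class 1.</p>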
