<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0"/>
  <title>Logistic Regression (Slow Explanation)</title>
  <style>
    body{
      font-family: Arial, Helvetica, sans-serif;
      line-height: 1.6;
      margin: 0;
      background: #f5f7fb;
      color: #222;
    }
    .container{
      max-width: 900px;
      margin: 30px auto;
      padding: 20px;
      background: white;
      border-radius: 12px;
      box-shadow: 0 4px 15px rgba(0,0,0,0.08);
    }
    h1{
      text-align: center;
      color: #0b3d91;
    }
    h2{
      color: #0b3d91;
      margin-top: 30px;
    }
    .box{
      background: #eef4ff;
      border-left: 6px solid #0b3d91;
      padding: 12px 15px;
      border-radius: 8px;
      margin: 10px 0;
    }
    code{
      background: #111;
      color: #fff;
      padding: 2px 6px;
      border-radius: 5px;
      font-size: 0.95em;
    }
    pre{
      background: #111;
      color: #fff;
      padding: 15px;
      border-radius: 10px;
      overflow-x: auto;
      font-size: 0.95em;
    }
    ul{
      margin: 8px 0 8px 20px;
    }
    .math{
      font-family: "Courier New", monospace;
      background: #fafafa;
      padding: 8px 10px;
      border-radius: 8px;
      border: 1px solid #ddd;
      display: inline-block;
      margin: 5px 0;
    }
    .recap{
      background: #e7ffe7;
      border-left: 6px solid #1f8f1f;
      padding: 12px 15px;
      border-radius: 8px;
      margin-top: 25px;
      font-weight: bold;
    }
  </style>
</head>
<body>
  <div class="container">
    <h1>Logistic Regression (Slow Explanation)</h1>

    <h2>1) What problem does Logistic Regression solve?</h2>
    <p>
      Logistic Regression is used for <b>classification</b>, mostly <b>binary classification</b>.
    </p>

    <div class="box">
      <b>Examples:</b>
      <ul>
        <li>Email: spam (1) or not spam (0)</li>
        <li>Student: pass (1) or fail (0)</li>
        <li>Message: scam (1) or real (0)</li>
      </ul>
    </div>

    <p>So your target/output is usually:</p>
    <p class="math">y ∈ {0, 1}</p>

    <h2>2) The main idea (very important)</h2>
    <p>
      Logistic Regression tries to predict the <b>probability</b> that something belongs to class <b>1</b>.
    </p>

    <p>So it outputs:</p>
    <p class="math">ŷ = P(y = 1 | x)</p>

    <div class="box">
      <b>Example:</b>
      <ul>
        <li>If the model outputs 0.90 → 90% chance it is class 1</li>
        <li>If the model outputs 0.20 → 20% chance it is class 1 (so 80% chance it is class 0)</li>
      </ul>
    </div>

    <p>Then we convert probability into a class:</p>
    <ul>
      <li>If <span class="math">ŷ ≥ 0.5</span> → predict <b>1</b></li>
      <li>Else → predict <b>0</b></li>
    </ul>
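
    <p>
      A minimal NumPy sketch of that thresholding step. The probabilities are made-up values for illustration, and <code>threshold</code> is a hypothetical name for the 0.5 cutoff:
    </p>

    <pre>
import numpy as np

# Illustrative predicted probabilities (made up, not from a trained model)
y_hat = np.array([0.90, 0.20, 0.50, 0.49])

threshold = 0.5  # assumed cutoff, matching the rule above
y_pred = (y_hat >= threshold).astype(int)

print(y_pred)  # [1 0 1 0]
    </pre>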

    <h2>3) How does the model calculate that probability?</h2>
    <p>First, it makes a <b>linear score</b> (like linear regression):</p>

    <p class="math">z = w₁x₁ + w₂x₂ + ... + wₙxₙ + b</p>

    <p>Short form:</p>
    <p class="math">z = wᵀx + b</p>

    <div class="box">
      <b>Where:</b>
      <ul>
        <li><span class="math">x</span> = input features</li>
        <li><span class="math">w</span> = weights (how strongly, and in which direction, each feature moves the score)</li>
        <li><span class="math">b</span> = bias (shifts the decision)</li>
      </ul>
    </div>

    <p>
      ⚠️ But <span class="math">z</span> can be any number (negative, positive, large, small).
      We need a probability between <b>0 and 1</b>.
    </p>
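
    <p>
      As a small illustration, the long form and the short form of the score give the same number. The feature values, weights, and bias here are invented for the example:
    </p>

    <pre>
import numpy as np

# Hypothetical example: one input with three features
x = np.array([2.0, -1.0, 0.5])
w = np.array([0.4, 0.3, -1.0])  # illustrative weights
b = 0.1                         # illustrative bias

# Long form: one term per feature, then the bias
z_long = w[0] * x[0] + w[1] * x[1] + w[2] * x[2] + b

# Short form: dot product plus bias
z_short = w @ x + b

print(f"{z_long:.2f} {z_short:.2f}")  # 0.10 0.10
    </pre>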

    <h2>4) The Sigmoid function (the “magic” part)</h2>
    <p>We use the sigmoid function:</p>
    <p class="math">σ(z) = 1 / (1 + e<sup>-z</sup>)</p>

    <p>So the prediction is:</p>
    <p class="math">ŷ = σ(z)</p>

    <div class="box">
      <b>What sigmoid does:</b>
      <ul>
        <li>If <span class="math">z</span> is big positive → sigmoid ≈ 1</li>
        <li>If <span class="math">z = 0</span> → sigmoid = 0.5</li>
        <li>If <span class="math">z</span> is big negative → sigmoid ≈ 0</li>
      </ul>
    </div>

    <p><b>Examples:</b></p>
    <ul>
      <li><span class="math">z = 5</span> → <span class="math">ŷ ≈ 0.99</span></li>
      <li><span class="math">z = 0</span> → <span class="math">ŷ = 0.5</span></li>
      <li><span class="math">z = -5</span> → <span class="math">ŷ ≈ 0.01</span></li>
    </ul>

    <p>
      So sigmoid “squashes” any real number into the open interval <b>(0, 1)</b>: the output gets arbitrarily close to 0 or 1 but never reaches them.
    </p>
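
    <p>
      The example values above can be reproduced with a few lines of NumPy. This is the plain textbook form of the sigmoid, which is fine for moderate <code>z</code>; very negative inputs can trigger overflow warnings in <code>np.exp</code>:
    </p>

    <pre>
import numpy as np

def sigmoid(z):
    """Plain sigmoid, 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-5.0, 0.0, 5.0])
probs = sigmoid(z)

# Roughly 0.0067, 0.5 and 0.9933
print(np.round(probs, 4))
    </pre>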

    <h2>5) Why not just use Linear Regression for classification?</h2>
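    <p>
      Because linear regression can output any real number, for example:
    </p>
    <ul>
      <li>-3.2</li>
      <li>4.8</li>
      <li>100</li>
    </ul>

    <p>
      Those are <b>not probabilities</b>, because a probability must lie between 0 and 1.
      Logistic regression passes the linear score through the sigmoid, so its output always stays in that range.
    </p>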
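
    <p>
      A quick way to see this is to fit an ordinary least-squares line to 0/1 labels. The sketch below uses a tiny made-up dataset and <code>np.polyfit</code>; even on the training points, some predictions fall below 0 or above 1:
    </p>

    <pre>
import numpy as np

# Tiny made-up dataset: one feature, binary labels
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

# Ordinary least-squares line through the 0/1 labels
slope, intercept = np.polyfit(x, y, 1)
preds = slope * x + intercept

# The smallest prediction is negative and the largest exceeds 1
print(np.round(preds, 2))
    </pre>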

    <h2>6) How does it learn? (Training)</h2>
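    <p>The model starts from an initial guess for the parameters. Zeros work fine for logistic regression (small random values are another common choice):</p>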
    <p class="math">w = 0, b = 0</p>

    <p>Then it predicts:</p>
    <p class="math">ŷ = σ(Xw + b)</p>

    <p>Then it compares each prediction with the true label:</p>
    <p class="math">error = ŷ - y</p>

    <p>Then it nudges the weights and bias in the direction that reduces the loss, and repeats these steps many times.</p>

    <h2>7) The Loss function (how it measures “wrongness”)</h2>
    <p>Logistic regression uses <b>Log Loss / Cross-Entropy Loss</b>.</p>

    <p>For one example (where log is the natural logarithm):</p>
    <p class="math">L = -[ y log(ŷ) + (1 - y) log(1 - ŷ) ]</p>

    <div class="box">
      <b>Meaning:</b>
      <ul>
        <li>If the true label is 1 → it punishes the model if <span class="math">ŷ</span> is low</li>
        <li>If the true label is 0 → it punishes the model if <span class="math">ŷ</span> is high</li>
      </ul>
    </div>
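
    <p>
      A small sketch of this loss in NumPy. Averaging over examples and clipping with a tiny <code>eps</code> (so that <code>log(0)</code> never occurs) are assumptions added here, not part of the formula above:
    </p>

    <pre>
import numpy as np

def log_loss(y, y_hat, eps=1e-12):
    """Mean cross-entropy; eps is an assumed clip that keeps log() away from 0."""
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return -np.mean(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))

y = np.array([1.0])  # true label is 1

loss_good = log_loss(y, np.array([0.9]))  # confident and right: about 0.105
loss_bad = log_loss(y, np.array([0.1]))   # confident and wrong: about 2.303

print(round(float(loss_good), 3), round(float(loss_bad), 3))
    </pre>

    <p>
      The wrong-but-confident prediction is punished far more heavily than the correct one, which is the behaviour described in the box above.
    </p>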

    <h2>8) Gradient Descent (how it improves)</h2>
    <p>To reduce the loss, we use gradient descent:</p>

    <p class="math">w := w - α · (∂L/∂w)</p><br/>
    <p class="math">b := b - α · (∂L/∂b)</p>

    <div class="box">
      <b>Where:</b>
      <ul>
        <li><span class="math">α</span> = learning rate (step size)</li>
      </ul>
    </div>
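
    <p>
      For the cross-entropy loss with a sigmoid, the gradients work out to the simple forms used in the code of section 9: the average of <code>error · x</code> for <code>w</code> and the average of <code>error</code> for <code>b</code>. As a sanity check rather than a proof, the sketch below compares those formulas with numerical finite differences on a made-up dataset; all values and helper names are illustrative:
    </p>

    <pre>
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mean_loss(w, b, X, y):
    """Average cross-entropy loss over all examples."""
    y_hat = sigmoid(X @ w + b)
    return -np.mean(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))

# Small made-up dataset and parameters
X = np.array([[0.5, 1.0], [1.5, -0.5], [-1.0, 2.0], [2.0, 0.0]])
y = np.array([1.0, 0.0, 1.0, 0.0])
w = np.array([0.2, -0.1])
b = 0.05

# Analytic gradients
error = sigmoid(X @ w + b) - y
dw = X.T @ error / len(y)
db = error.mean()

# Numerical gradients via central finite differences
h = 1e-6
dw_num = np.zeros_like(w)
for j in range(len(w)):
    step = np.zeros_like(w)
    step[j] = h
    dw_num[j] = (mean_loss(w + step, b, X, y) - mean_loss(w - step, b, X, y)) / (2 * h)
db_num = (mean_loss(w, b + h, X, y) - mean_loss(w, b - h, X, y)) / (2 * h)

print(np.allclose(dw, dw_num, atol=1e-6), np.isclose(db, db_num, atol=1e-6))  # True True
    </pre>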

    <h2>9) Logistic regression in code terms</h2>
    <p>In code, one step of logistic regression training looks like this (the step is repeated many times):</p>

    <pre>
# X: (n, d) feature matrix, y: n labels in {0, 1}, lr: learning rate

# Step 1: compute score
z = X @ w + b

# Step 2: convert to probability
y_hat = sigmoid(z)

# Step 3: compute gradients (n = number of samples)
error = y_hat - y
dw = (X.T @ error) / n
db = error.mean()

# Step 4: update parameters
w -= lr * dw
b -= lr * db
    </pre>

    <h2>10) Intuition: what are weights doing?</h2>
    <p>
      If a feature is strongly related to class 1, the model learns a <b>positive weight</b>:
    </p>
    <ul>
      <li>bigger x → bigger z → sigmoid closer to 1</li>
    </ul>

    <p>
      If a larger value of a feature makes class 0 more likely, the weight becomes <b>negative</b>:
    </p>
    <ul>
      <li>bigger x → smaller z → sigmoid closer to 0</li>
    </ul>
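
    <p>
      A tiny illustration of both cases with a single feature. The weights <code>1.5</code> and <code>-1.5</code> are arbitrary values picked for the example, not learned ones:
    </p>

    <pre>
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.0, 1.0, 2.0, 3.0])  # increasing values of one feature
b = 0.0

p_pos = sigmoid(1.5 * x + b)   # positive weight: probability rises with x
p_neg = sigmoid(-1.5 * x + b)  # negative weight: probability falls with x

print(np.round(p_pos, 3))
print(np.round(p_neg, 3))
    </pre>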

    <h2>11) Decision boundary (simple meaning)</h2>
    <p>
      Logistic regression separates the two classes with a straight boundary: a line for 2 features, a plane for 3, and a hyperplane in general.
    </p>

    <p>Decision rule:</p>
    <p class="math">predict class 1 if ŷ ≥ 0.5, otherwise class 0</p>

    <p>
      Since sigmoid is 0.5 at <span class="math">z = 0</span>, the boundary is:
    </p>

    <p class="math">z = wᵀx + b = 0</p>

    <p>
      That’s the “line” separating class 0 and class 1.
    </p>
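
    <p>
      With two features the boundary can be solved for directly: <span class="math">x₂ = -(w₁x₁ + b) / w₂</span>. The sketch below uses illustrative parameters (not learned from data) and a hypothetical helper <code>predict_class</code> to check points on either side of that line:
    </p>

    <pre>
import numpy as np

# Illustrative parameters for a two-feature model (not learned from data)
w = np.array([2.0, -1.0])
b = -1.0

def predict_class(x):
    """Return 1 when the score z = w.x + b is non-negative, else 0."""
    return int(w @ x + b >= 0)

# On the boundary, w1*x1 + w2*x2 + b = 0, so x2 = -(w1*x1 + b) / w2
x1 = 1.5
x2_on_boundary = -(w[0] * x1 + b) / w[1]

print(x2_on_boundary)                       # 2.0
print(predict_class(np.array([1.5, 0.0])))  # 1 (z = 2)
print(predict_class(np.array([1.5, 4.0])))  # 0 (z = -2)
    </pre>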

    <div class="recap">
      ✅ Quick recap: Logistic regression = linear model + sigmoid + cross-entropy loss,
      trained with gradient descent to output probabilities for classification.
    </div>
  </div>
</body>
</html>


import numpy as np

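def _sigmoid(z):
    """Numerically stable sigmoid: only non-positive values are exponentiated."""
    z = np.asarray(z, dtype=float)
    e = np.exp(-np.abs(z))  # always in (0, 1], so exp never overflows
    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))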

def train_logistic_regression(X, y, lr=0.1, steps=1000):
    """
    Train logistic regression via gradient descent.
    Return (w, b).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1)

    n_samples, n_features = X.shape

    # Initialize parameters
    w = np.zeros(n_features, dtype=float)
    b = 0.0

    for _ in range(steps):
        # Linear model
        z = X @ w + b

        # Prediction (probabilities)
        y_hat = _sigmoid(z)

        # Gradients
        error = y_hat - y
        dw = (X.T @ error) / n_samples
        db = np.sum(error) / n_samples

        # Update
        w -= lr * dw
        b -= lr * db

    return w, b
