# Logistic Regression with 10 Data Points: Detailed Example

## Problem Setup

We are predicting whether customers will purchase a product ($y = 1$) or not ($y = 0$) based on the number of emails they’ve read ($x$).

### Given Data

| Example ($i$) | Emails Read ($x^{(i)}$) | True Label ($y^{(i)}$) |
|------------------|---------------------------|--------------------------|
| 1                | 1                         | 0                        |
| 2                | 2                         | 0                        |
| 3                | 3                         | 0                        |
| 4                | 4                         | 1                        |
| 5                | 5                         | 1                        |
| 6                | 6                         | 1                        |
| 7                | 7                         | 1                        |
| 8                | 8                         | 1                        |
| 9                | 9                         | 1                        |
| 10               | 10                        | 1                        |

---

## Model Parameters

- Weight: $w = 0.5$
- Bias: $b = -2.5$

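Because there is only one feature, $\vec{w}$ and $\vec{x}$ reduce to the scalars $w$ and $x$. The predicted probability is obtained by passing the linear score $z = w \cdot x + b$ through the **sigmoid function**: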

$$
f_{\vec{w}, b}(\vec{x}) = \frac{1}{1 + e^{-(w \cdot x + b)}}
$$
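As a minimal Python sketch of this model (the names `sigmoid` and `predict_proba` are illustrative, not taken from any library), the sigmoid can be written with two branches so that `math.exp` is only ever called on a non-positive argument and cannot overflow for large $|z|$. Both branches are algebraically equal to the formula above.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-z)), mapping any real z into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, rewrite as e^z / (1 + e^z) so exp never overflows.
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(x: float, w: float, b: float) -> float:
    """f_{w,b}(x) = sigmoid(w * x + b) for a single scalar feature x."""
    return sigmoid(w * x + b)


w, b = 0.5, -2.5  # model parameters from this example
print(predict_proba(5, w, b))  # z = 0, so this prints 0.5
```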

---

## Step-by-Step Calculation

### Step 1: Compute Predicted Probabilities ($f_{\vec{w}, b}(\vec{x}^{(i)})$)

We compute $z^{(i)} = w \cdot x^{(i)} + b$ for each example, then pass $z^{(i)}$ through the sigmoid function:

| Example ($i$) | $x^{(i)}$ | $z^{(i)} = w \cdot x^{(i)} + b$ | $f_{\vec{w}, b}(\vec{x}^{(i)}) = \frac{1}{1 + e^{-z^{(i)}}}$ |
|------------------|-------------|----------------------------------|---------------------------------------------------------|
| 1                | 1           | $0.5 \cdot 1 - 2.5 = -2.0$      | $\frac{1}{1 + e^{2.0}} \approx 0.119$                |
| 2                | 2           | $0.5 \cdot 2 - 2.5 = -1.5$      | $\frac{1}{1 + e^{1.5}} \approx 0.182$                |
| 3                | 3           | $0.5 \cdot 3 - 2.5 = -1.0$      | $\frac{1}{1 + e^{1.0}} \approx 0.269$                |
| 4                | 4           | $0.5 \cdot 4 - 2.5 = -0.5$      | $\frac{1}{1 + e^{0.5}} \approx 0.378$                |
| 5                | 5           | $0.5 \cdot 5 - 2.5 = 0.0$       | $\frac{1}{1 + e^{0.0}} = 0.500$                      |
| 6                | 6           | $0.5 \cdot 6 - 2.5 = 0.5$       | $\frac{1}{1 + e^{-0.5}} \approx 0.622$               |
| 7                | 7           | $0.5 \cdot 7 - 2.5 = 1.0$       | $\frac{1}{1 + e^{-1.0}} \approx 0.731$               |
| 8                | 8           | $0.5 \cdot 8 - 2.5 = 1.5$       | $\frac{1}{1 + e^{-1.5}} \approx 0.818$               |
| 9                | 9           | $0.5 \cdot 9 - 2.5 = 2.0$       | $\frac{1}{1 + e^{-2.0}} \approx 0.881$               |
| 10               | 10          | $0.5 \cdot 10 - 2.5 = 2.5$      | $\frac{1}{1 + e^{-2.5}} \approx 0.924$               |
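The table above can be reproduced with a short loop. This sketch uses the same two-branch sigmoid (an implementation choice for numerical safety, not part of the original example) and rounds only for display, so each $z^{(i)}$ and $f_{\vec{w}, b}(\vec{x}^{(i)})$ matches the rows above.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (same as 1 / (1 + e^(-z))).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


w, b = 0.5, -2.5
xs = list(range(1, 11))  # emails read, x^(i) = 1..10

# z^(i) = w * x^(i) + b, then f^(i) = sigmoid(z^(i))
zs = [w * x + b for x in xs]
probs = [sigmoid(z) for z in zs]

for i, (x, z, p) in enumerate(zip(xs, zs, probs), start=1):
    print(f"{i:>2}  x={x:>2}  z={z:+.1f}  f={p:.3f}")
```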

---

### Step 2: Compute Log Loss for Each Example

The log loss depends on the true label $y^{(i)}$ (here $\log$ is the natural logarithm):

$$
L(f_{\vec{w}, b}(\vec{x}^{(i)}), y^{(i)}) =
\begin{cases}
-\log(f_{\vec{w}, b}(\vec{x}^{(i)})) & \text{if } y^{(i)} = 1 \\
-\log(1 - f_{\vec{w}, b}(\vec{x}^{(i)})) & \text{if } y^{(i)} = 0
\end{cases}
$$
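In code, the two cases become a branch on $y^{(i)}$; writing $f = f_{\vec{w}, b}(\vec{x}^{(i)})$, they are equivalent to the single expression $-y^{(i)}\log f - (1 - y^{(i)})\log(1 - f)$. The sketch below (helper names are illustrative) applies the piecewise form to the unrounded probability of each of the ten examples.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (same as 1 / (1 + e^(-z))).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def log_loss(p: float, y: int) -> float:
    """Per-example log loss L(p, y); log is the natural logarithm."""
    if y == 1:
        return -math.log(p)        # large when p is near 0
    if y == 0:
        return -math.log(1.0 - p)  # large when p is near 1
    raise ValueError("y must be 0 or 1")


w, b = 0.5, -2.5
xs = list(range(1, 11))
ys = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]

# Use the unrounded probability for each example.
losses = [log_loss(sigmoid(w * x + b), y) for x, y in zip(xs, ys)]
for i, (y, loss) in enumerate(zip(ys, losses), start=1):
    print(f"{i:>2}  y={y}  loss={loss:.3f}")
```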

| Example ($i$) | $y^{(i)}$ | $f_{\vec{w}, b}(\vec{x}^{(i)})$ | Log Loss $L(f_{\vec{w}, b}(\vec{x}^{(i)}), y^{(i)})$    |
|------------------|------------|----------------------------------|--------------------------------------------------------|
| 1                | 0          | 0.119                            | $-\log(1 - 0.119) \approx 0.127$                     |
| 2                | 0          | 0.182                            | $-\log(1 - 0.182) \approx 0.201$                     |
| 3                | 0          | 0.269                            | $-\log(1 - 0.269) \approx 0.313$                     |
| 4                | 1          | 0.378                            | $-\log(0.378) \approx 0.974$                         |
| 5                | 1          | 0.500                            | $-\log(0.500) \approx 0.693$                         |
| 6                | 1          | 0.622                            | $-\log(0.622) \approx 0.474$                         |
| 7                | 1          | 0.731                            | $-\log(0.731) \approx 0.313$                         |
| 8                | 1          | 0.818                            | $-\log(0.818) \approx 0.201$                         |
| 9                | 1          | 0.881                            | $-\log(0.881) \approx 0.127$                         |
| 10               | 1          | 0.924                            | $-\log(0.924) \approx 0.079$                         |

Each loss is computed from the unrounded probability, so recomputing it from the three-decimal values shown can differ in the last digit.

---

### Step 3: Compute Total Cost

The total cost is the average of all log losses:

$$
J(\vec{w}, b) = \frac{1}{m} \sum_{i=1}^m L(f_{\vec{w}, b}(\vec{x}^{(i)}), y^{(i)})
$$

- $m = 10$ (10 examples).
- Sum of all losses:
$$
\text{Sum} = 0.127 + 0.201 + 0.313 + 0.974 + 0.693 + 0.474 + 0.313 + 0.201 + 0.127 + 0.079 = 3.502
$$

The total cost is therefore:
$$
J(\vec{w}, b) = \frac{3.502}{10} \approx 0.350
$$
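Putting the three steps together, here is a minimal sketch of the cost $J(\vec{w}, b)$ for this single-feature model. The `cost` helper is hypothetical, and it works from unrounded probabilities rather than the rounded table entries.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (same as 1 / (1 + e^(-z))).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def cost(xs: list, ys: list, w: float, b: float) -> float:
    """J(w, b): the mean log loss over all m examples."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total += -math.log(p) if y == 1 else -math.log(1.0 - p)
    return total / len(xs)


xs = list(range(1, 11))               # emails read
ys = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]   # purchased (1) or not (0)
J = cost(xs, ys, w=0.5, b=-2.5)
print(f"J = {J:.3f}")  # J = 0.350
```

Working from unrounded probabilities, the sum of losses is about 3.503, which still gives $J \approx 0.350$.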

---

## Final Results

1. **Predicted Probabilities**:  
   The predicted probabilities increase with $x$, from $0.119$ (Example 1) to $0.924$ (Example 10).

2. **Individual Losses**:  
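   Losses are largest for incorrect or uncertain predictions: Example 4 ($y = 1$, $f \approx 0.378$) has the largest loss ($\approx 0.97$), and Example 10 ($y = 1$, $f \approx 0.924$) has the smallest ($\approx 0.079$).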

3. **Total Cost**:  
   The average loss across all examples is $J(\vec{w}, b) \approx 0.350$.

---

## Key Observations

1. **Low Loss for Accurate Predictions**:  
   - For $y = 0$, lower probabilities ($f_{\vec{w}, b}(\vec{x})$) result in small losses.
   - For $y = 1$, higher probabilities ($f_{\vec{w}, b}(\vec{x})$) result in small losses.

2. **High Loss for Incorrect or Uncertain Predictions**:  
   - For $y = 1$, low predicted probabilities ($f_{\vec{w}, b}(\vec{x})$) result in high losses (e.g., Example 4).
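   - For $y = 0$, high predicted probabilities ($f_{\vec{w}, b}(\vec{x})$) result in high losses. No example in this dataset falls into this case: the largest loss among the $y = 0$ examples is Example 3 ($f \approx 0.269$, loss $\approx 0.31$).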
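Since no $y = 0$ example here receives a high predicted probability, the sketch below plugs a few hypothetical probabilities (0.1, 0.5, 0.9, 0.99, chosen only for illustration and not taken from the dataset) into both branches of the loss to show how quickly the penalty grows as the prediction moves toward the wrong label.

```python
import math


def log_loss(p: float, y: int) -> float:
    # -log(p) if y == 1, -log(1 - p) if y == 0 (natural log)
    return -math.log(p) if y == 1 else -math.log(1.0 - p)


# Hypothetical predicted probabilities, not taken from the dataset.
for p in (0.1, 0.5, 0.9, 0.99):
    print(f"p={p:<5} loss(y=1)={log_loss(p, 1):.3f}  loss(y=0)={log_loss(p, 0):.3f}")
```

At $p = 0.9$ the loss for $y = 0$ is about 2.303, the same as the $y = 1$ loss at $p = 0.1$; at $p = 0.99$ it rises to about 4.605, so confident wrong predictions dominate the average cost.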