
# 🧠 Logistic Regression: Summary

---

## 1. Intuition: What is Logistic Regression?

- Despite its name, Logistic Regression is a **classification algorithm**: it predicts a class, not a continuous value.
- It models the **probability** that a given input belongs to **class 1**.
- Its output is a value **between 0 and 1**, representing that probability.

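$$
\hat{y} = \sigma(w^T x + b)
$$

where \( x \) is the input feature vector, \( w \) the weight vector, \( b \) the bias, and \( \sigma \) the sigmoid function defined in Section 2.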
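
A minimal NumPy sketch of this formula for one input. The parameter values are made up for illustration, not learned from data.

```python
import numpy as np


def sigmoid(z):
    """Logistic function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


# Illustrative (made-up) parameters for a model with two features
w = np.array([2.0, -1.0])  # weights
b = -0.5                   # bias
x = np.array([1.0, 1.0])   # one input example

z = w @ x + b              # linear score w^T x + b = 2 - 1 - 0.5 = 0.5
y_hat = sigmoid(z)         # estimated P(y = 1 | x)
print(round(float(y_hat), 4))  # → 0.6225
```
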



## 2. Decision Boundary

- The linear part:  
  $$
  z = w^T x + b
  $$
- Apply the **sigmoid** to get a probability:  
  $$
  \hat{y} = \sigma(z) = \frac{1}{1 + e^{-z}}
  $$
- Classification rule:
  - If the predicted probability \( \hat{y} \) is greater than or equal to the threshold (usually 0.5): predict class **1**
  - Otherwise: predict class **0**
- With the default threshold of 0.5, the decision boundary is the line/plane where \( w^T x + b = 0 \), because \( \sigma(0) = 0.5 \)
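
For a feature-space view, here is a small sketch with two features and made-up parameters \( w = (1, 1) \), \( b = -1 \), so the boundary is the line \( x_1 + x_2 - 1 = 0 \). The `predict` helper is hypothetical; it applies the classification rule above, so a point exactly on the boundary gets \( \hat{y} = 0.5 \) and is assigned class 1 because the rule uses \( \geq \).

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def predict(X, w, b, threshold=0.5):
    """Hard 0/1 labels: 1 where sigmoid(Xw + b) >= threshold (hypothetical helper)."""
    return (sigmoid(X @ w + b) >= threshold).astype(int)


# Made-up parameters: boundary is x1 + x2 - 1 = 0, i.e. the line x2 = 1 - x1
w = np.array([1.0, 1.0])
b = -1.0

X = np.array([
    [0.0, 0.0],  # z = -1: below the line, class 0
    [1.0, 1.0],  # z = +1: above the line, class 1
    [0.5, 0.5],  # z =  0: on the boundary, y_hat = 0.5, class 1 (rule uses >=)
])
print(predict(X, w, b))  # → [0 1 1]
```
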

## Threshold for Classification

- **Default threshold** = `0.5`
- **Prediction rule**:
  - If \( \hat{y} \geq \text{threshold} \) → predict **class 1**
  - If \( \hat{y} < \text{threshold} \) → predict **class 0**

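### Changing the Threshold

- Example: raise the threshold to `0.7` → requires **more confidence** before predicting class 1, so fewer inputs are labeled 1 (false positives cannot increase; false negatives cannot decrease)
- **Changing the threshold**:
  - Does **not** change the sigmoid curve or the predicted probabilities
  - Does change the **decision rule**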
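
A small sketch of this point: the probabilities (made up here, as if from an already-trained model) stay fixed, and only the cutoff applied to them changes.

```python
import numpy as np

# Made-up predicted probabilities from an already-trained model
probs = np.array([0.10, 0.45, 0.55, 0.65, 0.80, 0.95])

preds_by_threshold = {}
for threshold in (0.5, 0.7):
    # Same probabilities, different decision rule
    preds_by_threshold[threshold] = (probs >= threshold).astype(int)
    print(threshold, preds_by_threshold[threshold])
# 0.5 [0 0 1 1 1 1]
# 0.7 [0 0 0 0 1 1]
```
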

---

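## Decision Boundary for Any Threshold

- The **decision boundary** is the set of inputs where the predicted probability equals the threshold, i.e. \( \hat{y} = \text{threshold} \); at the default 0.5 this is where the model is most uncertain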

### When threshold = 0.5:
- Decision boundary is where:
  $$
  z = 0
  $$

### For custom threshold \( t \):
- Decision boundary is where:
  $$
  z = \ln\left(\frac{t}{1 - t}\right)
  $$

- In feature space, it is a:
  - Line (2D)
  - Plane (3D)
  - Hyperplane (nD)

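- The boundary is always flat (linear in the features): the weights \( w \) set its **orientation** (\( w \) is perpendicular to it), while the bias \( b \) and the threshold set its **position**
- The boundary is **implicit**: the model does not store it; it follows from \( w \), \( b \), and the chosen threshold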
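
The custom-threshold formula follows from solving \( \sigma(z) = t \) for \( z \), with \( 0 < t < 1 \):

$$
\frac{1}{1 + e^{-z}} = t
\;\Longrightarrow\;
e^{-z} = \frac{1 - t}{t}
\;\Longrightarrow\;
z = \ln\left(\frac{t}{1 - t}\right)
$$

For \( t = 0.5 \) this gives \( z = \ln 1 = 0 \), the default case; for \( t = 0.7 \) it gives \( z = \ln(7/3) \approx 0.847 \). In feature space the boundary is therefore the hyperplane \( w^T x + b = \ln\left(\frac{t}{1 - t}\right) \), parallel to the 0.5 boundary because its normal vector \( w \) is unchanged.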

---

## On the Sigmoid Plot

- **X-axis** = \( z \)
- **Y-axis** = \( \sigma(z) \) (sigmoid output)


- **Vertical line** at \( z = 0 \) → decision boundary (when threshold = 0.5)
- **Horizontal line** at \( \hat{y} = 0.5 \) → cutoff probability
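
A sketch of that plot, assuming Matplotlib is installed; `plot_sigmoid` is a hypothetical helper. Its `threshold` argument moves both lines together, using \( z = \ln\left(\frac{t}{1 - t}\right) \) for the vertical one, so the custom-threshold case can be drawn as well.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np


def plot_sigmoid(threshold=0.5):
    """Plot sigma(z) with the decision boundary (vertical) and the cutoff (horizontal)."""
    z = np.linspace(-8, 8, 400)
    z_boundary = np.log(threshold / (1 - threshold))  # z where sigma(z) == threshold

    fig, ax = plt.subplots(figsize=(6, 4))
    ax.plot(z, 1 / (1 + np.exp(-z)), label="sigmoid(z)")
    ax.axvline(z_boundary, linestyle="--", color="gray",
               label=f"decision boundary: z = {z_boundary:.2f}")
    ax.axhline(threshold, linestyle=":", color="red", label=f"cutoff: {threshold}")
    ax.set_xlabel("z")
    ax.set_ylabel("sigmoid(z)")
    ax.legend()
    return fig, ax


fig, ax = plot_sigmoid(0.5)  # save with fig.savefig(path) if needed
```
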

---

## Conceptual Clarifications

- **Changing threshold** affects predictions, not the shape of the sigmoid
- The decision boundary in **feature space** separates inputs into **class 0 vs class 1**

### In the sigmoid plot:
- The **boundary is vertical** (in terms of \( z \))
- The **cutoff is horizontal** (in terms of probability)




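## 3. What We're Trying to Learn

- Find weights \( w \) and bias \( b \) that make each prediction \( \hat{y}^{(i)} \) close to its true label \( y^{(i)} \), for training examples \( i = 1, \dots, m \)
- In other words: minimize a **loss** that measures how far the predicted probabilities are from the true labels (the log loss in Section 4)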



## 4. Loss Function: Log Loss (Binary Cross Entropy)

$$
\mathcal{L}^{(i)} = -y^{(i)} \log(\hat{y}^{(i)}) - (1 - y^{(i)}) \log(1 - \hat{y}^{(i)})
$$

$$
J(w, b) = \frac{1}{m} \sum_{i=1}^m \mathcal{L}^{(i)}
$$

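- Penalizes confident wrong predictions heavily: if \( y^{(i)} = 1 \), the loss \( -\log(\hat{y}^{(i)}) \) grows without bound as \( \hat{y}^{(i)} \to 0 \) (and symmetrically for \( y^{(i)} = 0 \))
- Pairs naturally with the sigmoid output: \( J(w, b) \) is convex in \( w \) and \( b \), and its gradient reduces to the simple error term \( \hat{y} - y \) used in Section 5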
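
A minimal NumPy version of \( J(w, b) \), computed from labels and predicted probabilities. The clipping constant `eps` is an implementation assumption that keeps \( \log(0) \) out of the computation; it is not part of the formula. The three calls show how the loss for a positive example grows as the prediction becomes confidently wrong.

```python
import numpy as np


def log_loss(y, y_hat, eps=1e-12):
    """Mean binary cross-entropy over m examples.

    y_hat is clipped to [eps, 1 - eps] so log(0) never occurs
    (a numerical safeguard, not part of the formula).
    """
    y = np.asarray(y, dtype=float)
    y_hat = np.clip(np.asarray(y_hat, dtype=float), eps, 1 - eps)
    per_example = -y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat)
    return per_example.mean()


# One positive example (y = 1) scored three ways
print(log_loss([1], [0.9]))   # confident and right: ~0.105
print(log_loss([1], [0.5]))   # unsure:              ~0.693
print(log_loss([1], [0.01]))  # confident and wrong: ~4.605
```
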



## 5. Training via Gradient Descent

Steps:

1. Compute predictions for all \( m \) examples, stacked as the rows of the \( m \times n \) matrix \( X \):
   $$ \hat{y} = \sigma(Xw + b) $$

2. Compute error:
   $$ \text{error} = \hat{y} - y $$

3. Compute gradients:
   - With respect to weights:
     $$ \frac{\partial J}{\partial w} = \frac{1}{m} X^T (\hat{y} - y) $$
   - With respect to bias:
     $$ \frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^m (\hat{y}^{(i)} - y^{(i)}) $$

4. Update, with learning rate \( \alpha \):
   $$
   w := w - \alpha \cdot \frac{\partial J}{\partial w}
   $$
   $$
   b := b - \alpha \cdot \frac{\partial J}{\partial b}
   $$

Repeat steps 1 to 4 until convergence (e.g. until the cost \( J \) stops decreasing noticeably).
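
The four steps above can be sketched as a NumPy loop. This is a minimal batch gradient descent under stated assumptions: `train_logistic_regression` is a hypothetical helper, the toy data are made up, and a fixed iteration count stands in for a real convergence test.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_logistic_regression(X, y, alpha=0.1, n_iters=5000):
    """Batch gradient descent on the log loss J(w, b) (hypothetical helper).

    X: (m, n) feature matrix, y: (m,) array of 0/1 labels.
    alpha and n_iters are illustrative defaults, not tuned values.
    """
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    for _ in range(n_iters):
        y_hat = sigmoid(X @ w + b)  # 1. predictions for all m examples
        error = y_hat - y           # 2. error
        dw = X.T @ error / m        # 3. dJ/dw = (1/m) X^T (y_hat - y)
        db = error.mean()           #    dJ/db = (1/m) sum(y_hat - y)
        w -= alpha * dw             # 4. gradient step on w and b
        b -= alpha * db
    return w, b


# Toy 1-feature dataset (made up): label is 1 when x is large
X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

w, b = train_logistic_regression(X, y)
preds = (sigmoid(X @ w + b) >= 0.5).astype(int)
print(preds)  # → [0 0 0 1 1 1]
```

Because this toy data is linearly separable, \( w \) would keep growing if the loop ran forever; a real implementation would stop on a tolerance or add regularization.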



## 6. Making Predictions

```python
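from scipy.special import expit as sigmoid  # logistic function 1 / (1 + e^{-z}); assumes SciPy

z = X @ w + b                            # X: (m, n) array, w: (n,), b: scalar
pred_probs = sigmoid(z)                  # P(y = 1 | x) for each row of X
preds = (pred_probs >= 0.5).astype(int)  # apply the 0.5 threshold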
```



## 7. Using scikit-learn

```python
from sklearn.linear_model import LogisticRegression

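model = LogisticRegression()  # L2-regularized by default (C=1.0)
model.fit(X_train, y_train)

preds = model.predict(X_test)        # hard class labels
probs = model.predict_proba(X_test)  # shape (n_samples, 2); for 0/1 labels, probs[:, 1] = P(y = 1)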
```
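
A sketch tying the scikit-learn model back to the formulas above, on synthetic data from `make_classification` (chosen only for illustration). For a binary problem, `coef_` and `intercept_` play the roles of \( w \) and \( b \), and column 1 of `predict_proba` is \( \sigma(w^T x + b) \). Because `LogisticRegression` applies L2 regularization by default, its weights will generally differ from those found by the plain gradient descent in Section 5.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary data, purely for illustration
X, y = make_classification(n_samples=200, n_features=3, n_informative=2,
                           n_redundant=0, random_state=0)

model = LogisticRegression().fit(X, y)

# For binary problems coef_ has shape (1, n_features) and intercept_ has shape (1,)
w = model.coef_.ravel()
b = model.intercept_[0]

# predict_proba columns follow model.classes_ (here [0, 1]), so column 1 is P(y = 1)
p1 = model.predict_proba(X)[:, 1]
print(np.allclose(p1, 1 / (1 + np.exp(-(X @ w + b)))))  # → True

# A custom threshold is applied by hand to the probabilities
preds_07 = (p1 >= 0.7).astype(int)
```
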
