# **Logistic Regression**

Logistic regression is a supervised learning algorithm for classification problems. Unlike linear regression, which predicts continuous values, logistic regression predicts class probabilities, which are then mapped to discrete class labels.

---

## **Types of Logistic Regression**

1. **Binomial Logistic Regression**
   - Handles binary classification problems where the outcome has only two possible classes.
   - Example: Predicting whether a student passes (1) or fails (0) an exam.
   - Output: Probability of belonging to class 1 ($P(y=1)$). If the probability is at or above a threshold (usually 0.5), the model assigns the class label 1; otherwise it assigns 0.

2. **Multinomial Logistic Regression**
   - Used for multi-class classification problems where the outcome can belong to one of three or more categories.
   - Example: Classifying types of flowers (setosa, versicolor, virginica).
   - It uses the softmax function to compute the probabilities for each class.

3. **Ordinal Logistic Regression**
   - Deals with ordinal data where the outcome has categories with a natural order.
   - Example: Predicting customer satisfaction levels (very dissatisfied, dissatisfied, neutral, satisfied, very satisfied).
   - The model considers the order of classes while making predictions, unlike multinomial logistic regression.

---

## **How Logistic Regression Works**

Logistic regression estimates the probability of a data point belonging to a particular class using a logistic (sigmoid) function. 

### 1. **Linear Combination of Features**
The algorithm begins by calculating a linear combination of the input features:
$$ z = w_1x_1 + w_2x_2 + w_3x_3 + \dots + b $$
- $w_1, w_2, \dots$: Coefficients (weights). Each gives the change in $z$ per unit change in the corresponding feature.
- $x_1, x_2, \dots$: Input features.
- $b$: Bias term.

In linear regression, this $z$ value is used directly as the predicted output. In logistic regression, it is instead passed through the **sigmoid function**, which maps it to a probability between 0 and 1.
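
The weighted sum above can be sketched in a few lines of Python. This is a minimal illustration, not a library implementation: the function name `linear_combination` and the numeric values are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def linear_combination(weights, features, bias):
    """Return the weighted sum of the features plus the bias term."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return sum(w * x for w, x in zip(weights, features)) + bias


# Hypothetical weights, features, and bias (illustration only).
z = linear_combination([0.5, -1.0], [2.0, 1.0], 0.5)
print(z)  # 0.5*2.0 + (-1.0)*1.0 + 0.5 = 0.5
```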

### 2. **Sigmoid Function**
The sigmoid function transforms the linear combination $z$ into a probability:
$$ \sigma(z) = \frac{1}{1 + e^{-z}} $$
- $\sigma(z)$: The predicted probability of the input belonging to the positive class (e.g., $P(y=1)$).
- $e$: Euler's number (approximately 2.718).

**Key Features of the Sigmoid Function:**
1. **Range**: The output of the sigmoid function is always strictly between 0 and 1, making it well suited to representing probabilities.
2. **Interpretability**:
   - When $z = 0$, $\sigma(z) = 0.5$, meaning the model considers both classes equally likely.
   - For $z > 0$, $\sigma(z) > 0.5$, and it approaches 1 as $z$ grows, indicating a high probability of belonging to the positive class.
   - For $z < 0$, $\sigma(z) < 0.5$, and it approaches 0 as $z$ becomes more negative, indicating a high probability of belonging to the negative class.

3. **Thresholding**:
   - Logistic regression uses a threshold (commonly 0.5) to decide the class.
   - If $\sigma(z) \geq 0.5$, predict class 1; otherwise, predict class 0.
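
The sigmoid and the thresholding rule can be sketched as follows. The helper names `sigmoid` and `predict_class` are assumptions made for this example. Splitting on the sign of $z$ is a common way to avoid overflow in `math.exp` for large negative inputs; it is one reasonable choice, not the only one.

```python
import math


def sigmoid(z):
    """Logistic function, written so that math.exp never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # z < 0, so ez is in (0, 1)
    return ez / (1.0 + ez)


def predict_class(z, threshold=0.5):
    """Return 1 if the predicted probability reaches the threshold, else 0."""
    return 1 if sigmoid(z) >= threshold else 0


print(sigmoid(0.0))                             # 0.5
print(predict_class(2.0), predict_class(-2.0))  # 1 0
```

Raising or lowering `threshold` trades false positives against false negatives; 0.5 is only the conventional default.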

---

### 3. **Cost Function**
Logistic regression uses the **log-loss (cross-entropy loss)** function to evaluate the error:
$$ J(\mathbf{w}, b) = -\frac{1}{n} \sum_{i=1}^n \left[ y_i \log(\hat{y}_i) + (1-y_i) \log(1-\hat{y}_i) \right] $$
- $\hat{y}_i = \sigma(z_i)$: Predicted probability for instance $i$.
- $y_i$: Actual label for instance $i$.

The goal is to minimize this cost function by adjusting the weights ($\mathbf{w}$) and bias ($b$).
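
As a rough sketch, the log-loss can be computed directly from the formula. The function name `log_loss` and the clipping constant `eps` are assumptions of this example; clipping keeps the logarithm finite when a predicted probability is exactly 0 or 1.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Average cross-entropy between 0/1 labels and predicted probabilities."""
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


# Hypothetical predictions: the same labels, scored well and badly.
confident_right = log_loss([1, 0], [0.9, 0.1])
confident_wrong = log_loss([1, 0], [0.1, 0.9])
print(confident_right < confident_wrong)  # True
```

Confident but wrong predictions are penalized much more heavily than confident correct ones are rewarded, which is what pushes the model toward well-calibrated probabilities.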

---

### 4. **Gradient Descent Optimization**
To minimize the cost function, logistic regression uses **gradient descent**:
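- Compute gradients:
  $$ \frac{\partial J}{\partial w_j} = \frac{1}{n} \sum_{i=1}^n (\hat{y}_i - y_i)\, x_{ij} \quad \text{and} \quad \frac{\partial J}{\partial b} = \frac{1}{n} \sum_{i=1}^n (\hat{y}_i - y_i) $$
  where $x_{ij}$ is the value of feature $j$ for instance $i$.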
- Update parameters:
  $$ w_j = w_j - \alpha \cdot \frac{\partial J}{\partial w_j} $$  
  $$ b = b - \alpha \cdot \frac{\partial J}{\partial b} $$
- $\alpha$: Learning rate determining the step size.
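
The update rule can be sketched as a small batch gradient descent loop in plain Python. It uses the standard log-loss gradients, where the error for each instance is the predicted probability minus the label. The function name `fit_logistic`, the learning rate, the epoch count, and the toy data are all assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    """Logistic function, written so that math.exp never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=1000):
    """Batch gradient descent on the log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Error term: predicted probability minus actual label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # Step against the gradient, scaled by the learning rate.
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Hypothetical toy data: one centered feature, label 1 for positive values.
X = [[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
preds = [1 if sigmoid(w[0] * x[0] + b) >= 0.5 else 0 for x in X]
print(preds == y)  # True
```

In practice a vectorized implementation or an existing library would be used instead; the nested loops here only make each term of the gradient visible.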

---

## **Key Characteristics**
1. Logistic regression outputs probabilities rather than bare labels, so each prediction carries a measure of confidence.
2. It assumes a linear relationship between input features and the log-odds of the outcome.
3. Logistic regression can be extended to multinomial and ordinal problems with suitable adjustments.
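
The linear relationship with the log-odds can be checked numerically: applying the log-odds (logit) transform to the sigmoid output recovers the linear value $z$. The sketch below assumes a single feature with a hypothetical weight and bias.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_odds(p):
    """Logit: the natural log of p / (1 - p)."""
    return math.log(p / (1.0 - p))


weight, bias = 0.8, -1.0  # hypothetical parameters, illustration only
for x in (0.0, 1.0, 2.0):
    z = weight * x + bias
    p = sigmoid(z)
    # The log-odds of p equals z, so it changes by `weight` per unit of x.
    print(x, round(p, 3), round(log_odds(p), 3))
```

Equivalently, each one-unit increase in the feature multiplies the odds $p / (1 - p)$ by $e^{\text{weight}}$.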

---

## **Extensions of Logistic Regression**

### **Multinomial Logistic Regression**
- For multi-class problems, the model calculates probabilities for each class using the **softmax function**:
  $$ P(y=i | \mathbf{x}) = \frac{e^{z_i}}{\sum_{j=1}^k e^{z_j}} $$
  - $z_i$: Linear combination for class $i$.
  - $k$: Total number of classes.
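
A minimal sketch of the softmax computation is shown below. Subtracting the maximum score before exponentiating is a common numerical-stability trick that leaves the result unchanged. The function name and the example scores are hypothetical.

```python
import math


def softmax(scores):
    """Convert a list of per-class linear scores into probabilities."""
    m = max(scores)  # shift by the max so math.exp cannot overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three classes.
probs = softmax([2.0, 1.0, 0.1])
predicted = probs.index(max(probs))
print(predicted)  # 0, the class with the largest score
```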

### **Ordinal Logistic Regression**
- Instead of predicting a single probability, the model predicts cumulative probabilities for ordered categories:
  $$ P(y \leq j | \mathbf{x}) = \frac{1}{1 + e^{-(\mathbf{w} \cdot \mathbf{x} + b_j)}} $$
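  - A separate threshold ($b_j$) is learned for each of the $k-1$ boundaries between adjacent categories, while the weights $\mathbf{w}$ are shared across them.
  - The probability of a single category is the difference between consecutive cumulative probabilities: $P(y = j | \mathbf{x}) = P(y \leq j | \mathbf{x}) - P(y \leq j-1 | \mathbf{x})$.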
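
The cumulative formulation can be turned into per-category probabilities by differencing, as in the sketch below. It assumes $k-1$ thresholds sorted in increasing order for $k$ categories, with the last cumulative probability fixed at 1. The function name `ordinal_probs`, the score, and the threshold values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def ordinal_probs(score, thresholds):
    """Per-category probabilities from cumulative ones.

    score: the shared linear term (weights dotted with features).
    thresholds: increasing b_j values, one per boundary (k - 1 in total).
    """
    # P(y <= j) for each boundary, then P(y <= k) = 1 for the last category.
    cumulative = [sigmoid(score + b) for b in thresholds] + [1.0]
    probs, prev = [], 0.0
    for c in cumulative:
        probs.append(c - prev)  # P(y = j) = P(y <= j) - P(y <= j - 1)
        prev = c
    return probs


# Hypothetical score and thresholds for four ordered categories.
probs = ordinal_probs(0.3, [-1.0, 0.0, 1.5])
print(len(probs))  # 4
```

Keeping the thresholds in increasing order is what guarantees every per-category probability is non-negative.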

---

## **Advantages of Logistic Regression**
1. Simple and interpretable.
2. Works well when the classes can be separated reasonably well by a linear decision boundary.
3. Computationally efficient, and regularization (L1 or L2) helps it resist overfitting.

## **Limitations**
1. Assumes linearity between features and log-odds, which may not hold for complex datasets.
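2. Multicollinearity among features can make coefficient estimates unstable and harder to interpret unless it is addressed.
3. Cannot capture non-linear relationships unless the features are transformed (e.g., with polynomial or interaction terms) or the model is extended with kernel methods.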

---

Logistic regression remains a foundational technique for classification problems, offering simplicity, interpretability, and effectiveness for many real-world scenarios.
