# One-vs-Rest (OVR) Logistic Regression

---

## **1. The Problem**

* Standard **logistic regression** handles **binary classification**: two classes only.
* Many real-world problems are **multi-class** (3 or more classes), e.g., classifying animals as Cat, Dog, or Rabbit.
* We need a way to extend logistic regression to handle multiple classes.

---

## **2. One-vs-Rest (OVR) Strategy**

OVR (also called One-vs-All) converts a **multi-class problem into multiple binary classification problems**:

1. Suppose there are $K$ classes: $C_1, C_2, \dots, C_K$.
2. For each class $C_k$, train a **binary logistic regression classifier**:

   * Treat $C_k$ as the **positive class** (1).
   * Treat all other classes together as the **negative class** (0).

**Example with 3 classes (Cat, Dog, Rabbit):**

| Classifier | Positive | Negative    |
| ---------- | -------- | ----------- |
| M1         | Cat      | Dog, Rabbit |
| M2         | Dog      | Cat, Rabbit |
| M3         | Rabbit   | Cat, Dog    |
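The relabeling in the table can be sketched in a few lines. This is a minimal illustration, assuming a small made-up label array (`labels` and `binary_targets` are hypothetical names, not part of the original notes):

```python
import numpy as np

# Hypothetical labels for six samples, using the three classes from the table.
labels = np.array(["Cat", "Dog", "Rabbit", "Dog", "Cat", "Rabbit"])
classes = ["Cat", "Dog", "Rabbit"]

# One binary target vector per classifier:
# 1 where the sample belongs to the positive class, 0 for all other classes.
binary_targets = {c: (labels == c).astype(int) for c in classes}

for name, target in binary_targets.items():
    print(f"{name} vs rest: {target}")
```

Each vector plays the role of the label column for one of the classifiers M1, M2, M3.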

---

## **3. Training Phase**

* Each binary model $M_k$ is trained **independently**.
* Input features remain the same for all models.
* Use **one-hot encoding** for the outputs; column $k$ of the encoding is the binary target for model $M_k$:

| Class  | One-hot  |
| ------ | -------- |
| Cat    | \[1,0,0] |
| Dog    | \[0,1,0] |
| Rabbit | \[0,0,1] |

* Each model learns to predict the probability that a sample belongs to its own class.
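One possible way to produce this encoding is scikit-learn's `LabelBinarizer`. The sketch below assumes a short made-up label list; `target_for_m2` is a hypothetical name used only to show how a single column becomes one model's target:

```python
from sklearn.preprocessing import LabelBinarizer

# Hypothetical labels; the classes match the table above.
labels = ["Cat", "Dog", "Rabbit", "Dog"]

binarizer = LabelBinarizer()
one_hot = binarizer.fit_transform(labels)

# Columns follow the sorted class order stored in classes_.
print(binarizer.classes_)
print(one_hot)

# Column 1 (Dog) is the 0/1 target that model M2 would be trained on.
target_for_m2 = one_hot[:, 1]
print(target_for_m2)
```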

---

## **4. Prediction Phase**

For a new data point:

1. Pass the input to **all K models**.
2. Each model outputs a **probability** that the point belongs to its positive class.
3. Example probabilities:

| Model | Probability |
| ----- | ----------- |
| M1    | 0.25        |
| M2    | 0.20        |
| M3    | 0.55        |

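4. **Choose the class with the highest probability** → here M3 scores highest (0.55), so the prediction is **Rabbit (class 3)**. Because the models are trained independently, their outputs need not sum to 1 in general; only the ranking matters.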
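
The decision rule can also be written compactly. Here $p_k(x)$ denotes the probability output by model $M_k$ for input $x$; this notation is introduced only for illustration:

$$
\hat{y} = \arg\max_{k \in \{1, \dots, K\}} p_k(x)
$$

With the example values $p_1 = 0.25$, $p_2 = 0.20$, $p_3 = 0.55$, the maximum is $p_3$, so $\hat{y} = 3$ (Rabbit).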

---

## **5. Advantages of OVR**

* Simple to implement.
* Works with any binary classifier (logistic regression, SVM, etc.).
* Efficient when the number of classes is moderate: only $K$ models are trained, so the cost grows linearly with the number of classes.

---

## **6. Disadvantages**

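* The classifiers are trained independently, so their probabilities may not be **well-calibrated** against one another and need not sum to 1.
* Each binary problem is imbalanced: the positive class is usually much smaller than the combined "rest," which can bias a model toward predicting the negative class.
* Can be less accurate than One-vs-One in some cases.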
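
The calibration point can be illustrated with a small sketch. The raw scores below are made up, and dividing by the sum is one common convention for turning independent OVR scores into a distribution, not the only option:

```python
import numpy as np

# Hypothetical raw outputs of three independently trained binary models.
raw_scores = np.array([0.30, 0.20, 0.70])

# Independent models give no guarantee that the scores sum to 1.
print("Sum of raw scores:", raw_scores.sum())

# Rescale so the scores sum to 1 (an assumed, commonly used convention).
normalized = raw_scores / raw_scores.sum()
print("Normalized scores:", normalized)

# Rescaling by a positive constant does not change which class wins.
print("Winner unchanged:", raw_scores.argmax() == normalized.argmax())
```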

---

### ✅ **Summary**

OVR Logistic Regression works by:

1. Splitting a multi-class problem into **K binary problems**.
2. Training a separate logistic regression for each class.
3. Predicting the class with the **highest probability** across all models.



## Example Problem Statement

**Problem:**
You are building a model to classify types of fruits based on two features:

* **f1** = Weight (grams)
* **f2** = Color Score (0–10 scale)

**Classes:**

1. Apple
2. Banana
3. Cherry

**Training Data:**

| Fruit  | f1 (Weight) | f2 (Color Score) |
| ------ | ----------- | ---------------- |
| Apple  | 150         | 8                |
| Apple  | 170         | 7                |
| Banana | 120         | 4                |
| Banana | 130         | 5                |
| Cherry | 10          | 9                |
| Cherry | 15          | 8                |

We want to predict the **fruit type** given `f1` and `f2`.

---

## **Step 1: One-vs-Rest (OVR) Setup**

We have **3 classes**, so we create **3 binary classifiers**:

1. **M1 (Apple vs Rest):**

   * Positive: Apple
   * Negative: Banana, Cherry

2. **M2 (Banana vs Rest):**

   * Positive: Banana
   * Negative: Apple, Cherry

3. **M3 (Cherry vs Rest):**

   * Positive: Cherry
   * Negative: Apple, Banana

---

## **Step 2: One-hot Encoding of Target**

| Fruit  | One-hot (Apple, Banana, Cherry) |
| ------ | ------------------------------- |
| Apple  | \[1, 0, 0]                      |
| Banana | \[0, 1, 0]                      |
| Cherry | \[0, 0, 1]                      |

* Each classifier uses its corresponding column as the **target**.

---

## **Step 3: Training Binary Models**

* Each binary logistic regression model is trained independently:

  * Input: `[f1, f2]`
  * Output: probability of being the **positive class**
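
As a rough sketch of what "trained independently" means, the loop below fits one scikit-learn `LogisticRegression` per fruit on the training table. The names `models` and `scores` and the `max_iter=1000` setting are assumptions made for this illustration; the fitted values depend on the solver and library version, so no expected output is shown:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Training table from the problem statement: [weight in grams, color score].
X = np.array([[150, 8], [170, 7], [120, 4], [130, 5], [10, 9], [15, 8]])
y = np.array(["Apple", "Apple", "Banana", "Banana", "Cherry", "Cherry"])
classes = ["Apple", "Banana", "Cherry"]

# Fit one binary model per class; each sees the same features
# but its own 0/1 target (1 = this class, 0 = the rest).
models = {}
for c in classes:
    target = (y == c).astype(int)
    clf = LogisticRegression(max_iter=1000)  # max_iter is an assumed setting
    clf.fit(X, target)
    models[c] = clf

# Probability of the positive class (column 1) from each model for one sample.
x_new = np.array([[140, 6]])
scores = {c: float(m.predict_proba(x_new)[0, 1]) for c, m in models.items()}
print(scores)
```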

---

## **Step 4: Prediction Example**

**Test data:**

* f1 = 140, f2 = 6

**Step 4a: Predict probabilities using each classifier** (illustrative values, not fitted output)

| Model | Class  | Probability |
| ----- | ------ | ----------- |
| M1    | Apple  | 0.4         |
| M2    | Banana | 0.35        |
| M3    | Cherry | 0.25        |

**Step 4b: Choose class with highest probability**

* Max probability = 0.4 → **Apple**

So the predicted class is **Apple**.
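
The selection in Step 4b amounts to an argmax over the per-model scores. A minimal sketch using the values from the table (the variable names are hypothetical):

```python
import numpy as np

classes = ["Apple", "Banana", "Cherry"]
# Scores from M1, M2, M3 in the table above.
scores = np.array([0.40, 0.35, 0.25])

# Index of the largest score picks the predicted class.
best = int(np.argmax(scores))
predicted = classes[best]
print(predicted)  # Apple
```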

---

## **Step 5: Summary**

* OVR breaks multi-class classification into **multiple binary logistic regressions**.
* Each model outputs a probability for its class.
* **Final prediction** = class with **highest probability**.



```python
# Import libraries
import warnings

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder

# Silence library warnings (this also hides the deprecation warning for multi_class)
warnings.filterwarnings("ignore")

# Sample data (Fruit dataset): [weight in grams, color score]
X = np.array([
    [150, 8],   # Apple
    [170, 7],   # Apple
    [120, 4],   # Banana
    [130, 5],   # Banana
    [10, 9],    # Cherry
    [15, 8]     # Cherry
])

y = np.array(['Apple', 'Apple', 'Banana', 'Banana', 'Cherry', 'Cherry'])

# Encode labels to integers
le = LabelEncoder()
y_encoded = le.fit_transform(y)  # Apple=0, Banana=1, Cherry=2

# Create OVR Logistic Regression model
# Note: multi_class='ovr' is deprecated as of scikit-learn 1.5;
# OneVsRestClassifier(LogisticRegression()) is the recommended replacement.
model = LogisticRegression(multi_class='ovr', solver='lbfgs')
model.fit(X, y_encoded)

# Test data
X_test = np.array([
    [140, 6],  # Expected: Apple
    [12, 8],   # Expected: Cherry
    [125, 5]   # Expected: Banana
])

# Predict probabilities for each class (columns follow the order of le.classes_)
probs = model.predict_proba(X_test)
predictions = model.predict(X_test)

# Convert predicted labels back to original class names
predicted_classes = le.inverse_transform(predictions)

# Print results
for i, x in enumerate(X_test):
    print(f"Test Data: {x}")
    print(f"Predicted Probabilities: {probs[i]}")
    print(f"Predicted Class: {predicted_classes[i]}\n")
```

**Output:**

```text
Test Data: [140   6]
Predicted Probabilities: [5.19151260e-01 4.80777962e-01 7.07778904e-05]
Predicted Class: Apple

Test Data: [12  8]
Predicted Probabilities: [3.41751899e-22 6.85793166e-02 9.31420683e-01]
Predicted Class: Cherry

Test Data: [125   5]
Predicted Probabilities: [3.84060828e-03 9.95473944e-01 6.85447453e-04]
Predicted Class: Banana
```
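
For newer scikit-learn releases, where the `multi_class` argument is deprecated, the same strategy can be expressed with `OneVsRestClassifier`. The following is a sketch under assumptions (`max_iter=1000` and the variable names are choices made here); the exact probabilities may differ slightly from the run above, so none are quoted:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Same fruit data as above: [weight in grams, color score].
X = np.array([[150, 8], [170, 7], [120, 4], [130, 5], [10, 9], [15, 8]])
y = np.array(["Apple", "Apple", "Banana", "Banana", "Cherry", "Cherry"])

# Wrap a binary logistic regression; one copy is fitted per class.
ovr = OneVsRestClassifier(LogisticRegression(solver="lbfgs", max_iter=1000))
ovr.fit(X, y)

X_test = np.array([[140, 6], [12, 8], [125, 5]])
probs = ovr.predict_proba(X_test)   # one column per class, in ovr.classes_ order
preds = ovr.predict(X_test)

print("Classes:", ovr.classes_)
print("Fitted binary models:", len(ovr.estimators_))
for x, p, label in zip(X_test, probs, preds):
    print(f"Test Data: {x} -> {label} (probabilities: {np.round(p, 4)})")
```

String labels can be passed directly here, so the `LabelEncoder` step is not needed in this variant.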

