# 📘 Introduction to Multiclass Logistic Regression for AI Beginners

Welcome to this 2-hour session on Multiclass Logistic Regression! This is a powerful and popular machine learning algorithm used to classify data into **three or more** categories. 

### What is Multiclass Logistic Regression?

It's a classification algorithm that predicts the probability of something belonging to one of several different classes. Think of it as an extension of *Binary* Logistic Regression, which can only handle two classes (like 'Yes' or 'No'). 

For example, we can use it to:
*   Classify news articles into topics like "Sports," "Politics," or "Technology."
*   Identify the species of a flower from its measurements.
*   Analyze sentiment as "Positive," "Negative," or "Neutral."
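Here is a small pure-Python sketch of what "an extension of binary logistic regression" means. The scores are hypothetical. It borrows the softmax function from Topic 2. With only two classes, softmax over the two scores gives the same probability as the binary sigmoid applied to the difference of the scores. So the multiclass model contains the binary one as a special case.

```python
import math

def sigmoid(z):
    """Binary logistic regression: turns one score into P(class 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    """Multiclass version: turns a list of scores into probabilities that sum to 1."""
    m = max(scores)  # subtracting the max avoids overflow and does not change the result
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Two hypothetical class scores, e.g. for 'Yes' and 'No'
z_yes, z_no = 2.0, 0.5

p_softmax = softmax([z_yes, z_no])[0]  # softmax over both classes
p_sigmoid = sigmoid(z_yes - z_no)      # sigmoid of the score difference

print(round(p_softmax, 4), round(p_sigmoid, 4))  # both print 0.8176
```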

### 🎯 Learning Objectives for Today:

By the end of this session, you will understand:
1.  **What** multiclass classification is and why it's useful.
2.  The **One-vs-Rest (OvR)** strategy for handling multiple classes.
3.  The **Multinomial (Softmax) Regression** approach.
4.  How to **implement** these methods in Python using the `scikit-learn` library.
5.  How to **evaluate** your classification models.

## Topic 1: The One-vs-Rest (OvR) Strategy

📄 **Explanation**

The One-vs-Rest (OvR) method is a clever way to use a simple binary classifier (which only knows how to separate two groups) to solve a multiclass problem.

**The Logic is simple:** Break down one big multiclass problem into several smaller binary problems.

Imagine you have 3 classes: **Cat, Dog, Bird**.

OvR will train **3 separate models**:
1.  **Model 1:** Learns to distinguish **Cat** (`Class 1`) vs. **'Not Cat'** (`Rest`: Dog, Bird).
2.  **Model 2:** Learns to distinguish **Dog** (`Class 2`) vs. **'Not Dog'** (`Rest`: Cat, Bird).
3.  **Model 3:** Learns to distinguish **Bird** (`Class 3`) vs. **'Not Bird'** (`Rest`: Cat, Dog).

When you have a new animal to classify, you show it to all three models. Each model gives a probability score. The model that gives the highest score wins, and its class is assigned as the final prediction! 
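The sketch below shows the two OvR ingredients described above in plain Python, using toy labels and made-up model scores. The names `ovr_targets` and `scores` are illustrative and do not belong to any library. First, the data is relabeled once per class. Second, at prediction time, the class whose model gives the highest score wins.

```python
# Toy labels for five animals (hypothetical data, for illustration only)
labels = ["Cat", "Dog", "Bird", "Dog", "Cat"]
classes = ["Cat", "Dog", "Bird"]

def ovr_targets(labels, positive_class):
    """Binary targets for one OvR model: 1 = positive_class, 0 = 'the rest'."""
    return [1 if y == positive_class else 0 for y in labels]

# One binary training target per class, so one binary model per class
for c in classes:
    print(c, "vs rest:", ovr_targets(labels, c))

# Suppose the three trained models return these scores for a new animal
# (hypothetical values, not produced by a real model)
scores = {"Cat": 0.20, "Dog": 0.70, "Bird": 0.05}

# The class whose model is most confident wins
prediction = max(scores, key=scores.get)
print("Prediction:", prediction)  # Prediction: Dog
```

The three raw scores do not have to add up to 1, because each binary model was trained independently.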

💡 **Fun Fact:** This is a very intuitive and common strategy. However, it trains one separate binary model per class, so training cost grows with the number of classes. It can become slow when there are a huge number of classes.

### 🧠 Practice Task 1

You are building a model to classify handwritten digits from 0 to 9. If you use the One-vs-Rest (OvR) strategy, how many binary models will you need to train? What would the first model (for digit '0') be trained to predict?

## Topic 2: Multinomial Logistic Regression (Softmax Regression)

📄 **Explanation**

This is a more direct and often more effective approach. Instead of training multiple independent models, Multinomial or **Softmax Regression** trains a **single, unified model** that handles all classes at once.

**How it works:**

1.  **Calculate Scores (Logits):** For a new input, the model first calculates a raw score for each class. This score is called a 'logit'.
2.  **Apply the Softmax Function:** The magic happens here! The **Softmax function** takes all the raw scores and squashes them into a set of probabilities that add up to 1.0 (or 100%).

This is powerful because the classes are considered together. Increasing the probability of one class will automatically decrease the probability of the others, which makes sense because an object can usually only belong to one class at a time.
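The two steps above can be sketched in plain Python, along with the coupling between classes. The sketch assumes the standard linear form of the logits, z_k = w_k · x + b_k. The weights and biases are invented for illustration and are not learned from data.

```python
import math

def softmax(logits):
    """Squash raw scores (logits) into probabilities that add up to 1."""
    m = max(logits)  # subtracting the largest logit avoids overflow and does not change the result
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical model: one weight vector and bias per class (made-up numbers, not trained)
weights = {"Cat": [0.5, -0.2], "Dog": [1.0, 0.8], "Bird": [-0.3, 0.1]}
biases = {"Cat": 0.1, "Dog": -0.5, "Bird": 0.0}
x = [1.0, 2.0]  # a new input with two features

# Step 1: one logit per class, z_k = w_k . x + b_k
classes = list(weights)
logits = [sum(w * xi for w, xi in zip(weights[c], x)) + biases[c] for c in classes]

# Step 2: softmax turns the logits into probabilities
probs = softmax(logits)
print({c: round(p, 3) for c, p in zip(classes, probs)})
print("Sum of probabilities:", round(sum(probs), 6))

# Coupling: raise only Dog's logit by 1.0 and recompute
bumped = softmax([z + 1.0 if c == "Dog" else z for c, z in zip(classes, logits)])
print("After raising Dog's logit:", {c: round(p, 3) for c, p in zip(classes, bumped)})
```

After the bump, Dog's probability goes up and both Cat's and Bird's go down, even though their own logits did not change.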

#### Numerical Example Walkthrough
Imagine our model gives these scores (logits) for a new animal: `Cat = 1.5`, `Dog = 7.0`, `Bird = -1.4`.

The Softmax function turns these into probabilities:
*   P(Cat) ≈ 0.4%
*   P(Dog) ≈ 99.6%
*   P(Bird) ≈ 0.02%

Since 'Dog' has the highest probability, the model's final prediction is **Dog**. ✅
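You can check the walkthrough numbers with a few lines of Python. The code applies the softmax formula to the logits above: exponentiate each logit, then divide by the sum of all the exponentials.

```python
import math

logits = {"Cat": 1.5, "Dog": 7.0, "Bird": -1.4}

# Softmax: exponentiate each logit, then divide by the sum of all exponentials
exps = {c: math.exp(z) for c, z in logits.items()}
total = sum(exps.values())
probs = {c: e / total for c, e in exps.items()}

for c, p in probs.items():
    print(f"P({c}) = {p:.2%}")  # P(Cat) = 0.41%, P(Dog) = 99.57%, P(Bird) = 0.02%

print("Prediction:", max(probs, key=probs.get))  # Prediction: Dog
```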

### 🧠 Practice Task 2

A model gives you the following logits for three news article categories: `Sports = 3.0`, `Politics = 0.5`, `Technology = 1.5`.

Without doing the full calculation, which category will the Softmax function give the highest probability to? Why?

## Topic 3: Python Implementation with Scikit-Learn

📄 **Explanation**

Enough theory! Let's build some models. We will use the famous **Iris dataset**, which contains measurements for 3 different species of iris flowers. Our goal is to train a model that can predict the species based on these measurements.

We will build **both** an OvR and a Softmax model to see them in action.

In [1]:
# Step 1: Import necessary libraries
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix

# Step 2: Load the Iris dataset
iris = load_iris()
X = iris.data  # The features (flower measurements)
y = iris.target # The labels (flower species: 0, 1, or 2)

# Step 3: Split the data into training and testing sets
# We train the model on the training set and test its performance on the unseen testing set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

print("Data loaded and split successfully!")
print("Shape of training data:", X_train.shape)
print("Shape of testing data:", X_test.shape)

Data loaded and split successfully!
Shape of training data: (105, 4)
Shape of testing data: (45, 4)


💻 **Code Example: Training the One-vs-Rest (OvR) Model**

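In `scikit-learn`, we just need to set `multi_class='ovr'`. (Note: the `multi_class` parameter is deprecated as of scikit-learn 1.5, so newer versions show a warning here. In those versions, OvR is available by wrapping the model in `sklearn.multiclass.OneVsRestClassifier`.)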

In [2]:
# Create and train the OvR model
# We set max_iter=200 (the default is 100) so the solver has enough iterations to converge; too few can trigger a ConvergenceWarning.
ovr_model = LogisticRegression(multi_class='ovr', max_iter=200)
ovr_model.fit(X_train, y_train)

# Make predictions on the test data
ovr_pred = ovr_model.predict(X_test)

# Evaluate the model
print("--- One-vs-Rest (OvR) Model Evaluation ---")
print(classification_report(y_test, ovr_pred, target_names=iris.target_names))

--- One-vs-Rest (OvR) Model Evaluation ---
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        19
  versicolor       1.00      0.85      0.92        13
   virginica       0.87      1.00      0.93        13

    accuracy                           0.96        45
   macro avg       0.96      0.95      0.95        45
weighted avg       0.96      0.96      0.96        45
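The `confusion_matrix` function imported in the first cell is never used above. The sketch below puts it to work on an OvR model. It builds that model with `sklearn.multiclass.OneVsRestClassifier`, which does not depend on the `multi_class` parameter. It reloads and re-splits the data so it runs on its own. If it reproduces the report above, the off-diagonal entries should show the *versicolor* test flowers that were predicted as *virginica*. The exact counts may vary slightly across scikit-learn versions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.metrics import confusion_matrix

# Same data and split as in the cells above
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

# OneVsRestClassifier fits one binary LogisticRegression per class
ovr_clf = OneVsRestClassifier(LogisticRegression(max_iter=200))
ovr_clf.fit(X_train, y_train)
ovr_clf_pred = ovr_clf.predict(X_test)

print("Number of binary models:", len(ovr_clf.estimators_))

# Rows = true species, columns = predicted species (setosa, versicolor, virginica)
print(confusion_matrix(y_test, ovr_clf_pred))
```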





💻 **Code Example: Training the Multinomial (Softmax) Model**

Now, let's train the Softmax model by setting `multi_class='multinomial'`.

In [3]:
# Create and train the Softmax model
# 'lbfgs' (scikit-learn's default solver) supports the multinomial (softmax) loss.
softmax_model = LogisticRegression(multi_class='multinomial', solver='lbfgs', max_iter=200)
softmax_model.fit(X_train, y_train)

# Make predictions on the test data
softmax_pred = softmax_model.predict(X_test)

# Evaluate the model
print("--- Softmax (Multinomial) Model Evaluation ---")
print(classification_report(y_test, softmax_pred, target_names=iris.target_names))

--- Softmax (Multinomial) Model Evaluation ---
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        19
  versicolor       1.00      1.00      1.00        13
   virginica       1.00      1.00      1.00        13

    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45





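Both models do well on this simple dataset. OvR reaches 96% accuracy because it misclassifies 2 of the 13 *versicolor* test flowers as *virginica*. The Softmax model classifies all 45 test flowers correctly. With only 45 test samples, a two-flower difference is weak evidence that one method is better. On other datasets, either approach may come out ahead.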

💻 **Code Example: Predicting Probabilities for a New Flower**

Let's see what our Softmax model predicts for a new flower with specific measurements. We can also see the probabilities for each class!

In [4]:
# A new flower with measurements [sepal length, sepal width, petal length, petal width]
# These measurements are typical for a 'setosa' flower.
new_flower = np.array([[5.1, 3.5, 1.4, 0.2]])

# Predict the probabilities for each class
softmax_probs = softmax_model.predict_proba(new_flower)

# Get the final predicted class name
predicted_class_index = softmax_model.predict(new_flower)[0]
predicted_class_name = iris.target_names[predicted_class_index]

print(f"Probabilities for new flower (Setosa, Versicolor, Virginica): {softmax_probs}")
print(f"Predicted class: {predicted_class_name}")

Probabilities for new flower (Setosa, Versicolor, Virginica): [[9.73074237e-01 2.69256274e-02 1.35872683e-07]]
Predicted class: setosa
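This output connects back to Topic 2. The sketch below recomputes the probabilities by hand from the fitted weights (`coef_`) and intercepts (`intercept_`) and compares them with `predict_proba`. It retrains the model so it runs on its own. It leaves out `multi_class` because, with the default `lbfgs` solver and three classes, scikit-learn already fits the multinomial (softmax) model.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

# Default lbfgs solver + 3 classes: scikit-learn uses the multinomial formulation
model = LogisticRegression(max_iter=200).fit(X_train, y_train)

new_flower = np.array([[5.1, 3.5, 1.4, 0.2]])

# Step 1: logits z_k = w_k . x + b_k, one per class
logits = new_flower @ model.coef_.T + model.intercept_

# Step 2: softmax, shifting by the largest logit for numerical stability
exps = np.exp(logits - logits.max(axis=1, keepdims=True))
manual_probs = exps / exps.sum(axis=1, keepdims=True)

print("Logits:        ", logits)
print("Manual softmax:", manual_probs)
print("predict_proba: ", model.predict_proba(new_flower))
```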


### 🧠 Practice Task 3

🧪 **Time to experiment!**
1. Create a new code cell below.
2. Define a new flower called `my_flower`. A typical 'virginica' flower might have measurements like `[[6.7, 3.0, 5.2, 2.3]]`.
3. Use the `softmax_model.predict()` and `softmax_model.predict_proba()` functions on your `my_flower`.
4. Print the results. Did the model correctly classify it as 'virginica'?

## 🎉 Final Revision Assignment 🎉

Let's test your knowledge! Try to complete these tasks to solidify what you've learned. These are great for home practice.

#### **Task 1: Multiple Choice**
Which function is used in Multinomial Logistic Regression to convert model scores into a probability distribution across multiple classes?
A. Sigmoid
B. ReLU
C. Tanh
D. Softmax

#### **Task 2: Multiple Choice**
In the One-vs-Rest (OvR) strategy for a problem with 5 classes, how many binary classifiers are trained?
A. 1
B. 5
C. 10
D. 25

#### **Task 3: Short Question**
Briefly explain the main difference between how the One-vs-Rest (OvR) and Softmax Regression approaches handle a multiclass problem.

#### **Task 4: Calculation Problem**
A multiclass logistic regression model outputs the following logits for three classes: `Class A = 2.0`, `Class B = 1.0`, `Class C = -1.0`.

Calculate the probability for **Class A** using the Softmax formula. The values of e^x you need are given below:

*   e^2.0 ≈ 7.39
*   e^1.0 ≈ 2.72
*   e^-1.0 ≈ 0.37

**Formula:** P(A) = e^(score_A) / (e^(score_A) + e^(score_B) + e^(score_C))

#### **Task 5: Coding Challenge!**

Let's work with a new dataset: handwritten digits! Your task is to train a model to recognize digits from 0-9.

In a new code cell, follow these steps:
1. **Import** `load_digits` from `sklearn.datasets`.
2. **Load** the dataset: `digits = load_digits()`.
3. **Split** the data (`digits.data`, `digits.target`) into training and testing sets.
4. **Train** a `LogisticRegression` model using the **Softmax** (`multinomial`) approach.
5. **Evaluate** your model using the `classification_report`. How well did it do?

In [None]:
# Your code for the Coding Challenge goes here!

# Step 1: Import load_digits
# from sklearn.datasets import load_digits

# Step 2: Load the data


# Step 3: Split the data


# Step 4: Train the Softmax model


# Step 5: Evaluate and print the report



### ✅ Well done! 
You have completed the introduction to Multiclass Logistic Regression. You are now equipped with the knowledge to tackle multiclass classification problems!