# Day 39: CNN Multi-Class Classification

Welcome to Day 39!

Today You’ll Learn

1. How CNNs produce multi-class predictions  
2. What logits really represent  
3. Softmax and probability interpretation  
4. Why CrossEntropyLoss includes Softmax internally  
5. Handling class imbalance properly  
6. Top-k accuracy and when to use it  
7. Practical evaluation strategy for real-world CNNs


---

# Multi-Class Classification

## Problem Setup

You are given:

- An image $x$  
- $C$ possible classes  

Goal:

$$
\text{Predict exactly ONE class out of } C
$$

This is called:

**Single-label multi-class classification**

Classes are mutually exclusive.

Examples:

- CIFAR-10 → $C = 10$  
- ImageNet → $C = 1000$

## What the CNN Produces

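After feature extraction, a final linear (fully connected) layer maps the features to a vector of $C$ scores:

```python
import torch.nn as nn
classifier = nn.Linear(in_features=512, out_features=10)  # e.g. d = 512 features, C = 10 classes
```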

Mathematically:

If the final feature vector is:

$$
h \in \mathbb{R}^{d}
$$

Then:

$$
z = Wh + b
$$

Where:

* $W \in \mathbb{R}^{C \times d}$
* $b \in \mathbb{R}^{C}$

Output:

$$
z = [z_1, z_2, \dots, z_C]
$$
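To make $z = Wh + b$ concrete, here is a minimal sketch in plain Python. The sizes ($d = 2$, $C = 3$) and all numbers in `W`, `h`, and `b` are made up for illustration, and the helper name `linear` is hypothetical. In practice a framework layer such as PyTorch's `torch.nn.Linear` performs this computation.

```python
# Toy sizes (assumed for illustration): d = 2 features, C = 3 classes.
h = [1.0, 2.0]            # feature vector h in R^d
W = [[0.5, 1.0],          # weight matrix W in R^{C x d}, one row per class
     [-1.0, 0.25],
     [0.0, 0.5]]
b = [0.5, -0.5, 0.25]     # bias b in R^C, one entry per class


def linear(W, h, b):
    """Return z = W h + b as a list of C logits."""
    return [
        sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i
        for row, b_i in zip(W, b)
    ]


z = linear(W, h, b)
print(z)  # [3.0, -1.0, 1.25]
```

Each row of `W` produces one logit, so the output always has exactly $C$ entries regardless of the feature dimension $d$.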


## What Are Logits?

Logits are the raw output scores of a neural network before converting them into probabilities.

The values $z_i$ are called:

* Logits
* Raw scores
* Unnormalized predictions

Important:

$$
z_i \in (-\infty, +\infty)
$$

They are **NOT probabilities**.

They are simply the output of the final linear layer:

$$
z = Wh + b
$$

Where:

- $h$ = feature vector  
- $W$ = weight matrix  
- $b$ = bias  

### Key Properties

- Logits are real numbers  
- They are NOT probabilities  
- They do NOT sum to 1  
- They can be negative or positive

Example:

If $C = 3$:

$$
z = [2.3,\ -1.1,\ 0.7]
$$

These are just scores.

## How Prediction Works

We choose:

$$
\hat{y} = \arg\max_i z_i
$$

Meaning:

Pick the class with the largest score.

Example:

$$
[2.3,\ -1.1,\ 0.7]
$$

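The largest score is $2.3$, so we predict class 1.

Only the ranking of the scores matters for the prediction.

The scale does NOT matter: multiplying every logit by the same positive constant, or adding the same constant to every logit, leaves the argmax unchanged.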

Example:

$$
[100,\ 50,\ -20]
$$

Still class 1.
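The prediction rule can be checked with a short sketch in plain Python. The helper name `predict` is hypothetical, and it returns a 0-based index, so index `0` corresponds to "class 1" in the text. The scaled and shifted vectors are extra illustrations of why scale does not change the prediction.

```python
def predict(z):
    """Return the 0-based index of the largest logit (first one on ties)."""
    return max(range(len(z)), key=lambda i: z[i])


z = [2.3, -1.1, 0.7]
scaled = [10.0 * v for v in z]   # every logit multiplied by a positive constant
shifted = [v + 5.0 for v in z]   # the same constant added to every logit
other = [100, 50, -20]           # the second example from the text

for scores in (z, scaled, shifted, other):
    print(predict(scores) + 1)   # prints 1 each time (1-based class number)
```

Multiplying by a negative constant would reverse the ranking, which is why the constant has to be positive.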

## Why They Are Not Probabilities

Probabilities must:

* Be between $0$ and $1$
* Sum to $1$

Logits:

* Can be negative
* Can be large
* Do NOT sum to $1$

They are raw evidence scores.
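The two requirements for a probability vector can be tested directly. The sketch below uses a hypothetical helper, `is_probability_vector`, with a small tolerance for floating-point rounding. The probability vector `[0.7, 0.1, 0.2]` is a made-up example, not a model output.

```python
def is_probability_vector(v, tol=1e-9):
    """Check that every entry lies in [0, 1] and the entries sum to 1."""
    in_range = all(0.0 <= x <= 1.0 for x in v)
    sums_to_one = abs(sum(v) - 1.0) <= tol
    return in_range and sums_to_one


logits = [2.3, -1.1, 0.7]
print(is_probability_vector(logits))           # False: negative entry, entry above 1
print(is_probability_vector([0.7, 0.1, 0.2]))  # True
```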

Probabilities are obtained later using Softmax.



---

### <p style="text-align:center; color:orange; font-size:18px;">Optional: Explaining $\hat{y} = \arg\max_i z_i$</p>

Let’s decode it slowly.


**Step 1. What Is $z_i$?**

The model outputs:

$$
z = [z_1, z_2, \dots, z_C]
$$

Each $z_i$ is a score for class $i$.

Example (3 classes):

$$
z = [2.3,\ -1.1,\ 0.7]
$$

**Step 2. What Does “max” Mean?**

The maximum value here is:

$$
2.3
$$

That’s the largest score.

**Step 3. What Does “argmax” Mean?**

Important:

- **max** → gives the value  
- **argmax** → gives the index (position)

Example:

$$
z = [2.3,\ -1.1,\ 0.7]
$$

- max = 2.3  
- argmax = 1  

because 2.3 is at position 1 (assuming indexing starts from 1; with 0-based indexing, as in Python, the argmax is 0).
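The difference between the value and its position is easy to see in plain Python. This is a small sketch with illustrative variable names. Python lists are 0-based, so 1 is added to recover the 1-based class number used in the text.

```python
z = [2.3, -1.1, 0.7]

largest_score = max(z)            # max: the value itself
index0 = z.index(largest_score)   # argmax: the position, 0-based in Python
class_number = index0 + 1         # 1-based class number, as in the text

print(largest_score)  # 2.3
print(index0)         # 0
print(class_number)   # 1
```

Framework functions such as `torch.argmax` also return 0-based indices, so class labels in code usually run from `0` to `C - 1`.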

**Step 4. What Is $\hat{y}$?**

$\hat{y}$ means:

> Predicted label

So:

$$
\hat{y} = \arg\max_i z_i
$$

means:

> The predicted class is the index of the largest score.


**Super Simple Version**

The model outputs scores:

Class 1 → 2.3<br>
Class 2 → -1.1<br>
Class 3 → 0.7

Biggest score = 2.3  
So prediction = Class 1.

**One-Line Meaning**

$$
\hat{y} = \arg\max_i z_i
$$

means:

> Pick the class with the highest score.

---

# Softmax

To be continued...

---

<p style="text-align:center; color:skyblue; font-size:18px;">
© 2026 Mostafizur Rahman
</p>
