### Softmax Function
For a vector of raw scores (logits) $z = (z_1, \dots, z_K)$ over $K$ classes, the softmax function is defined as:

$$
\sigma(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}
$$

- Softmax converts the network's raw output scores (logits) into one probability per class; the outputs are all positive and sum to 1.
- It is applied at the end of the network, in the output layer.



In [6]:
# Example of the softmax function

import numpy as np

# Raw scores (logits) output by a model
logits = np.array([2.0, 1.0, 0.1])

# Compute softmax manually
exp_vals = np.exp(logits)
softmax_probs = exp_vals / np.sum(exp_vals)

print("Logits:", logits)
print("Exp_val:", exp_vals)
print("Sum of exp_val ", np.sum(exp_vals))
print("Softmax probabilities:", softmax_probs)
print("Sum of probabilities:", np.sum(softmax_probs))

Logits: [2.  1.  0.1]
Exp_val: [7.3890561  2.71828183 1.10517092]
Sum of exp_val  11.212508845465344
Softmax probabilities: [0.65900114 0.24243297 0.09856589]
Sum of probabilities: 1.0
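
The manual computation above exponentiates the logits directly, which works for small values but can overflow when logits are large. A common variant subtracts the largest logit before exponentiating; this leaves the result unchanged because the shift cancels in the ratio. The sketch below is illustrative only: `stable_softmax` is a hypothetical helper name, and the large logits are made-up values chosen to have the same differences as the example above.

```python
import numpy as np

def stable_softmax(logits):
    """Softmax over the last axis, shifted by the max logit to avoid overflow."""
    logits = np.asarray(logits, dtype=float)
    # Subtracting the max makes the largest exponent exp(0) = 1
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp_vals = np.exp(shifted)
    return exp_vals / np.sum(exp_vals, axis=-1, keepdims=True)

# Same logits as the example above
small = stable_softmax([2.0, 1.0, 0.1])

# Hypothetical large logits; np.exp(1000.0) on its own would overflow to inf
large = stable_softmax([1000.0, 999.0, 998.1])

print("Small logits:", small)
print("Large logits:", large)
print("Sum of probabilities:", np.sum(large))
```

Both calls should give essentially the same probabilities, since softmax depends only on the differences between logits.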


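### One-Hot Encoding
- One-hot encoding converts each integer class label into a vector that is 1 at the index of that class and 0 everywhere else.
- It is how the target classes (the correct answers) are encoded.
- It is applied before training, as a preprocessing step on the labels (`y_train`).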


In [1]:
# Example of one-hot encoding
from tensorflow.keras.utils import to_categorical
import numpy as np

# Suppose we have 5 samples with class labels from 0 to 3
y = np.array([0, 2, 1, 3, 2])

# One-hot encode
y_onehot = to_categorical(y, num_classes=4)

print("Original labels:\n", y)
print("One-hot encoded:\n", y_onehot)

Original labels:
 [0 2 1 3 2]
One-hot encoded:
 [[1. 0. 0. 0.]
 [0. 0. 1. 0.]
 [0. 1. 0. 0.]
 [0. 0. 0. 1.]
 [0. 0. 1. 0.]]
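
The same encoding can be sketched without TensorFlow, which may help show what `to_categorical` produces here. This is a minimal NumPy-only illustration, assuming the labels are integers in the range `0` to `num_classes - 1`; it is not how `to_categorical` is implemented internally.

```python
import numpy as np

# Same labels as above: 5 samples, classes 0 to 3
y = np.array([0, 2, 1, 3, 2])
num_classes = 4

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing the identity matrix with the labels picks one row per sample
y_onehot = np.eye(num_classes)[y]

# argmax along each row recovers the original integer labels
y_back = np.argmax(y_onehot, axis=1)

print("One-hot encoded:\n", y_onehot)
print("Recovered labels:\n", y_back)
```

Taking `argmax` of a one-hot row gives back the class index, which is also how a predicted class is usually read off a softmax output.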


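### One-Hot Encoding and Softmax Work Together
Cross-entropy loss compares the one-hot target $y$ with the softmax output $\hat{y}$: $L = -\sum_{i=1}^{K} y_i \log \hat{y}_i$. Because $y$ is one-hot, only the true class contributes, so $L = -\log \hat{y}_{\text{true}}$. The smaller the loss, the closer the predicted probability of the true class is to 1.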

In [2]:
import numpy as np

# True label (one-hot encoded)
y_true = np.array([0, 1, 0])   # the class at index 1 is correct

# Model prediction (softmax output)
y_pred = np.array([0.1, 0.7, 0.2])

# Cross-entropy loss (manual calculation)
loss = -np.sum(y_true * np.log(y_pred + 1e-9))  # small epsilon to avoid log(0)
print("Cross-entropy loss:", loss)

Cross-entropy loss: 0.3566749425101611
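
To see how the loss responds to the predicted probability of the true class, the sketch below evaluates the same manual formula on three made-up predictions for the same target. The helper `cross_entropy` and the three prediction vectors are hypothetical, chosen only for illustration.

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-9):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    # eps avoids log(0), as in the cell above
    return -np.sum(y_true * np.log(y_pred + eps))

# Same one-hot target as above: the class at index 1 is correct
y_true = np.array([0, 1, 0])

# Hypothetical softmax outputs with decreasing probability on the true class
predictions = {
    "confident and correct": np.array([0.05, 0.90, 0.05]),
    "unsure": np.array([0.30, 0.40, 0.30]),
    "confident and wrong": np.array([0.90, 0.05, 0.05]),
}

losses = {name: cross_entropy(y_true, p) for name, p in predictions.items()}
for name, loss in losses.items():
    print(f"{name}: {loss:.4f}")
```

The loss grows as the probability assigned to the true class shrinks, and the probabilities assigned to the other classes do not enter the sum directly because their target entries are 0.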
