# Softmax Activation Function

The Softmax activation function is the standard output activation for multi-class classification in neural networks. It maps a vector of real numbers to a probability distribution over the classes. Here are the key points about Softmax activation:

1. Purpose: Converts raw scores (logits) into probabilities.
2. Output: Produces a vector of probabilities that sum to 1.
3. Formula: For an input vector $z$ with $K$ entries, $\mathrm{Softmax}(z)_i = \dfrac{\exp(z_i)}{\sum_{j=1}^{K} \exp(z_j)}$ for $i = 1, \dots, K$
4. Properties:
   - Every output lies strictly between 0 and 1 (in floating point, very small values can round to 0)
   - Outputs sum to 1, representing a valid probability distribution
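   - Emphasizes the largest values while suppressing smaller ones: the ordering of the logits is preserved, and the ratio of two outputs is exp(z_i - z_j), so larger logits receive a disproportionately large share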
5. Use cases: 
   - Output layer of multi-class classification networks
   - Attention mechanisms in deep learning models
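
The properties listed above can be checked numerically. The following is a minimal sketch, assuming NumPy; the helper name `softmax_1d` and the logits are made up for illustration and are not part of the original notebook.

```python
import numpy as np

def softmax_1d(z):
    """Softmax of a 1-D array (illustrative helper, hypothetical name)."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())  # shift by the max so np.exp cannot overflow
    return e / e.sum()

logits = np.array([3.0, 1.0, 0.2])  # made-up example scores
p = softmax_1d(logits)

print(p)                                  # one probability per class
print(np.all((p > 0) & (p < 1)))          # each value lies in (0, 1)
print(np.isclose(p.sum(), 1.0))           # values sum to 1
print(np.argmax(p) == np.argmax(logits))  # the largest logit stays the largest

# Doubling the logits widens the gaps, so the top class takes a larger share.
p_scaled = softmax_1d(2.0 * logits)
print(p_scaled.max() > p.max())
```

Scaling the logits before applying Softmax is one way to see the "emphasizes the largest values" property: the larger the gaps between logits, the closer the output gets to a one-hot vector.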

The Softmax function is differentiable, making it suitable for use with gradient-based optimization methods in neural network training.
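
To make the differentiability claim concrete: for $s = \mathrm{Softmax}(z)$, the partial derivatives are $\partial s_i / \partial z_j = s_i(\delta_{ij} - s_j)$, where $\delta_{ij}$ is 1 when $i = j$ and 0 otherwise. The sketch below, assuming NumPy, compares this analytic Jacobian with a finite-difference estimate. The function names (`softmax_1d`, `softmax_jacobian`, `numerical_jacobian`) and the step size `eps` are illustrative choices, not something the original notebook defines.

```python
import numpy as np

def softmax_1d(z):
    """Softmax of a 1-D array (illustrative helper, hypothetical name)."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

def softmax_jacobian(z):
    """Analytic Jacobian: J[i, j] = s_i * (delta_ij - s_j)."""
    s = softmax_1d(z)
    return np.diag(s) - np.outer(s, s)

def numerical_jacobian(f, z, eps=1e-6):
    """Central finite differences, perturbing one input coordinate at a time."""
    z = np.asarray(z, dtype=float)
    n = z.size
    J = np.zeros((n, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = eps
        J[:, j] = (f(z + step) - f(z - step)) / (2 * eps)
    return J

z = np.array([2.0, 1.0, 0.1])
J_analytic = softmax_jacobian(z)
J_numeric = numerical_jacobian(softmax_1d, z)

# The two Jacobians should agree up to finite-difference error.
print(np.allclose(J_analytic, J_numeric, atol=1e-6))
```

Each column of the Jacobian sums to zero, because the outputs always sum to 1 and so their total cannot change when a logit is perturbed.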



In [3]:
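import numpy as np

def softmax(x):
    """Softmax over all entries of x (intended for a 1-D array of logits)."""
    # Subtracting the max leaves the result unchanged but keeps np.exp from overflowing.
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

# Example: three logits mapped to class probabilities
logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs)
print('sum', np.sum(probs))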



[0.65900114 0.24243297 0.09856589]
sum 1.0
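
The `x - np.max(x)` shift in the cell above matters once logits get large, and a network usually produces one row of logits per example rather than a single vector. The following is a sketch of a row-wise variant, assuming NumPy; the name `softmax_batch`, the `axis` parameter, and the second row of logits are assumptions made for illustration.

```python
import numpy as np

def softmax_batch(x, axis=-1):
    """Softmax along `axis`, e.g. one distribution per row of a 2-D array."""
    x = np.asarray(x, dtype=float)
    # keepdims=True keeps the reduced axis so the shapes broadcast back.
    shifted = x - np.max(x, axis=axis, keepdims=True)
    e_x = np.exp(shifted)
    return e_x / np.sum(e_x, axis=axis, keepdims=True)

batch = np.array([[2.0, 1.0, 0.1],            # logits from the example above
                  [1000.0, 1001.0, 1002.0]])  # made-up large logits
probs = softmax_batch(batch)
print(probs)
print(probs.sum(axis=1))  # each row sums to 1

# Without the shift, np.exp(1000.0) overflows to inf and inf / inf gives nan.
with np.errstate(over="ignore", invalid="ignore"):
    naive = np.exp(batch[1]) / np.exp(batch[1]).sum()
print(np.isnan(naive).all())
```

Subtracting a constant from every logit in a row does not change the Softmax output, since the factor cancels between numerator and denominator. That is why the shifted version is safe to use everywhere.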
