## 🧠 Softmax Activation Function

The **Softmax function** converts a vector of logits (raw scores) into a **probability distribution** across multiple classes, where the sum of probabilities equals 1. It is commonly used in **multi-class classification**.

**Formula:**

$$
\text{Softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}
$$

- \(z_i\) : logit for class i  
- \(K\) : number of classes  
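
As a worked example of the formula, take the logits \(z = (2.0, 1.0, 0.1)\), so \(K = 3\). The numbers below are rounded to three decimals and are only meant to illustrate the arithmetic:

$$
e^{2.0} \approx 7.389, \quad e^{1.0} \approx 2.718, \quad e^{0.1} \approx 1.105, \quad \sum_{j=1}^{3} e^{z_j} \approx 11.213
$$

$$
\text{Softmax}(z) \approx \left( \frac{7.389}{11.213},\ \frac{2.718}{11.213},\ \frac{1.105}{11.213} \right) \approx (0.659,\ 0.242,\ 0.099)
$$

The three outputs are positive and add up to 1, and the largest logit receives the largest probability.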

**Key Properties:**

- Outputs probabilities: all values are positive and sum to 1  
- Amplifies differences between logits: because of the exponential, a gap of \(d\) between two logits becomes a probability ratio of \(e^{d}\)  
- Differentiable → usable in backpropagation  
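
The first two properties can be checked numerically. The snippet below is a minimal sketch, assuming NumPy is installed; `softmax_1d` is a hypothetical helper name used only for this illustration.

```python
import numpy as np


def softmax_1d(z):
    """Softmax of a 1-D array of logits (illustrative helper)."""
    exp_z = np.exp(z - np.max(z))  # subtracting a constant leaves the result unchanged
    return exp_z / exp_z.sum()


z = np.array([2.0, 1.0, 0.1])
p = softmax_1d(z)

# Property 1: every output is positive and the outputs sum to 1.
all_positive = bool(np.all(p > 0))
total = float(p.sum())

# Property 2: exponential scaling. A logit gap of d gives a probability ratio of e^d.
ratio = float(p[0] / p[1])                   # these two logits differ by 1.0
expected_ratio = float(np.exp(z[0] - z[1]))  # e^1

print("probabilities:", p)
print("all positive:", all_positive, "| sum:", total)
print("p[0] / p[1]:", ratio, "| e^(z0 - z1):", expected_ratio)
```

The ratio between two outputs depends only on the difference between their logits, which is why adding the same constant to every logit does not change the result.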

---

### ✅ Use Cases

- Multi-class classification (e.g., image recognition, NLP tasks)  
- Final layer of neural networks to produce probabilities  

---

### 🌟 Advantages

- Produces **interpretable probabilities**  
- **Normalized output** → sum equals 1  
- Differentiable → integrates with gradient-based optimization  
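
To make the differentiability point concrete, here is a hedged sketch that compares the standard analytic Jacobian of Softmax, \(\partial p_i / \partial z_j = p_i(\delta_{ij} - p_j)\), with a finite-difference estimate. The function names are illustrative and NumPy is assumed to be available.

```python
import numpy as np


def softmax_1d(z):
    """Softmax of a 1-D array of logits (illustrative helper)."""
    exp_z = np.exp(z - np.max(z))
    return exp_z / exp_z.sum()


def softmax_jacobian(z):
    """Analytic Jacobian: J[i, j] = p_i * (delta_ij - p_j)."""
    p = softmax_1d(z)
    return np.diag(p) - np.outer(p, p)


def numerical_jacobian(z, eps=1e-6):
    """Central finite differences, perturbing one logit at a time."""
    K = z.size
    J = np.zeros((K, K))
    for j in range(K):
        step = np.zeros(K)
        step[j] = eps
        J[:, j] = (softmax_1d(z + step) - softmax_1d(z - step)) / (2 * eps)
    return J


z = np.array([2.0, 1.0, 0.1])
analytic = softmax_jacobian(z)
numeric = numerical_jacobian(z)
max_abs_diff = float(np.max(np.abs(analytic - numeric)))

print("analytic Jacobian:\n", analytic)
print("max |analytic - numeric|:", max_abs_diff)
```

Each column of the Jacobian sums to zero: raising one logit can only move probability between classes, since the outputs must still sum to 1.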

---

### ⚠️ Disadvantages

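- Sensitive to extreme logits: one very large logit pushes the output toward a one-hot vector (saturation)  
- When saturated, gradients become very small → slower training  
- Normalization sums over all \(K\) classes, so the cost grows with the number of classes  
- Not suitable for multi-label classification: the outputs compete and must sum to 1, so several classes cannot all receive a high probability  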
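
The first two points can be illustrated with a small experiment. This is a sketch under stated assumptions: NumPy is available, the logit values are made up for illustration, and `softmax_1d` and `max_sensitivity` are hypothetical helper names. It also shows why implementations usually subtract the maximum logit before exponentiating.

```python
import numpy as np


def softmax_1d(z):
    """Numerically stable Softmax of a 1-D array (illustrative helper)."""
    exp_z = np.exp(z - np.max(z))
    return exp_z / exp_z.sum()


def max_sensitivity(p):
    """Largest absolute entry of the Softmax Jacobian diag(p) - p p^T."""
    return float(np.max(np.abs(np.diag(p) - np.outer(p, p))))


moderate = np.array([2.0, 1.0, 0.1])
extreme = np.array([50.0, 1.0, 0.1])  # one outlier logit

p_moderate = softmax_1d(moderate)
p_extreme = softmax_1d(extreme)

print("moderate:", p_moderate, "| max sensitivity:", max_sensitivity(p_moderate))
print("extreme: ", p_extreme, "| max sensitivity:", max_sensitivity(p_extreme))

# A naive Softmax without the max subtraction overflows for very large logits:
# exp(1000) is inf in float64, and inf / inf evaluates to nan.
with np.errstate(over="ignore", invalid="ignore"):
    naive_exp = np.exp(np.array([1000.0, 1.0, 0.1]))
    naive = naive_exp / naive_exp.sum()

print("naive softmax with a logit of 1000:", naive)
```

With the outlier logit the output is effectively one-hot and the Jacobian entries are close to zero, so almost no gradient flows back through the Softmax.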

---

### 📈 Cross-Entropy Loss (with Softmax)

Cross-entropy is the loss typically paired with Softmax to train multi-class models. For a single sample:

$$
\text{Loss} = -\sum_{i=1}^{K} y_i \log(\hat{y}_i)
$$

- \(y_i\) : true label (1 for correct class, 0 otherwise)  
- \(\hat{y}_i\) : predicted probability from Softmax
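
Because \(y_i\) is 1 only for the correct class, the sum collapses to a single term: the negative log of the probability assigned to the true class. The sketch below checks this numerically, assuming NumPy and using rounded, illustrative probabilities.

```python
import numpy as np

probs = np.array([0.659, 0.242, 0.099])  # illustrative Softmax output (rounded)
true_class = 0

# One-hot encoding of the true label
one_hot = np.zeros_like(probs)
one_hot[true_class] = 1.0

# Full formula: -sum_i y_i * log(y_hat_i)
loss_full = float(-np.sum(one_hot * np.log(probs)))

# Shortcut: only the true-class term survives
loss_index = float(-np.log(probs[true_class]))

# If the true class were the one with the lowest predicted probability,
# the loss would be much larger.
loss_if_class_2 = float(-np.log(probs[2]))

print("full formula:", loss_full)
print("index shortcut:", loss_index)
print("loss if true class were 2:", loss_if_class_2)
```

Indexing the true-class probability directly avoids building one-hot vectors when labels are stored as class indices.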


In [3]:
import numpy as np


def softmax(x):
    """Softmax of a 1-D array of logits."""
    exp_x = np.exp(x - np.max(x))  # subtract the max logit for numerical stability
    return exp_x / exp_x.sum(axis=0)

# Example usage
logits = np.array([2.0, 1.0, 0.1])
probabilities = softmax(logits)
print("Softmax Probabilities:", probabilities)
    



In [None]:
import numpy as np

# -----------------------------
# Activation Functions
# -----------------------------
def softmax(logits):
    """
    Softmax function for multi-class classification.
    Numerical stability improved by subtracting max logit per sample.
    """
    exp_logits = np.exp(logits - np.max(logits, axis=1, keepdims=True))
    return exp_logits / np.sum(exp_logits, axis=1, keepdims=True)

def relu(z):
    """ReLU activation: sets negative values to 0."""
    return np.maximum(0, z)

# -----------------------------
# Neural Network Class
# -----------------------------
class NeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        # Initialize weights and biases
        self.W1 = np.random.randn(input_size, hidden_size) * 0.01
        self.b1 = np.zeros((1, hidden_size))
        self.W2 = np.random.randn(hidden_size, output_size) * 0.01
        self.b2 = np.zeros((1, output_size))
    
    def forward(self, X):
        """
        Forward pass through the network.
        Layer 1: Hidden layer with ReLU
        Layer 2: Output layer with Softmax
        """
        self.Z1 = np.dot(X, self.W1) + self.b1
        self.A1 = relu(self.Z1)
        
        self.Z2 = np.dot(self.A1, self.W2) + self.b2
        self.A2 = softmax(self.Z2)  # Output probabilities
        
        return self.A2
    
    def compute_loss(self, Y, Y_hat):
        """
        Compute cross-entropy loss.
        Y: true labels (class indices)
        Y_hat: predicted probabilities from Softmax
        """
        m = Y.shape[0]
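        # Probability assigned to the true class of each sample;
        # clipped so that log(0) cannot occur if a value underflows to zero
        correct_class_probs = np.clip(Y_hat[np.arange(m), Y], 1e-12, 1.0)
        log_likelihood = -np.log(correct_class_probs)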
        loss = np.sum(log_likelihood) / m
        return loss

# -----------------------------
# Example Usage
# -----------------------------
if __name__ == "__main__":
    # Initialize network: 3 input features, 5 hidden neurons, 3 output classes
    nn = NeuralNetwork(input_size=3, hidden_size=5, output_size=3)
    
    # Input data (5 samples, 3 features each)
    X = np.array([
        [1.0, 2.0, 3.0],
        [0.5, 1.5, 2.5],
        [1.5, 2.5, 3.5],
        [2.0, 1.0, 0.5],
        [3.0, 3.0, 3.0]
    ])
    
    # True labels
    Y = np.array([0, 1, 2, 1, 0])
    
    # Forward pass
    Y_hat = nn.forward(X)
    
    # Compute loss
    loss = nn.compute_loss(Y, Y_hat)
    
    print("Softmax Output (Probabilities):\n", Y_hat)
    print("Cross-Entropy Loss:", loss)
