## What Is an Activation Function?

An **activation function** in a neural network transforms a neuron's weighted input into its output, which determines whether and how strongly the neuron is activated. It helps the network learn complex patterns by introducing **non-linearity** into the model.

Without activation functions, the network would only be able to learn linear relationships, no matter how many layers it has, because a stack of linear layers collapses into a single linear transformation.
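
To illustrate why depth alone does not help, here is a minimal sketch with two one-dimensional "layers" and no activation function. The weights and biases are arbitrary illustrative values, and the function names are hypothetical; the point is only that the stacked layers behave exactly like one linear function.

```python
# Two "layers" with no activation: each is just y = w * x + b.
# The weights and biases below are arbitrary illustrative values.
def layer1(x: float) -> float:
    return 2.0 * x + 1.0


def layer2(x: float) -> float:
    return -3.0 * x + 0.5


def stacked(x: float) -> float:
    # Feed the output of the first layer into the second.
    return layer2(layer1(x))


def single_linear(x: float) -> float:
    # -3 * (2x + 1) + 0.5 simplifies to -6x - 2.5
    return -6.0 * x - 2.5


for x in (-2.0, 0.0, 1.5):
    # Both columns are identical for every input.
    print(x, stacked(x), single_linear(x))
```

Placing a non-linear function between the two layers is what prevents this collapse.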

### Common Activation Functions:

- **ReLU (Rectified Linear Unit):** Outputs zero if the input is negative; otherwise, it outputs the input unchanged.
    - Example: Input = -3 → Output = 0; Input = 5 → Output = 5.
    - Why use it?
        - It’s simple and computationally efficient.
        - It helps the network learn faster by reducing the vanishing gradient problem in deep networks, since its gradient is 1 for all positive inputs.
        - Works well for most hidden layers.
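
As a minimal sketch, ReLU can be written in plain Python as a single comparison. The function name `relu` is chosen here for illustration; deep learning libraries ship their own vectorized versions.

```python
def relu(x: float) -> float:
    # Negative inputs are clipped to zero; positive inputs pass through.
    return max(0.0, x)


print(relu(-3.0))  # 0.0
print(relu(5.0))   # 5.0
```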
    
######
    
- **Sigmoid:** Converts any input into a value between 0 and 1.
    - Example: Input = 2 → Output ≈ 0.88.
    - Why use it?
        - It’s suitable for the output layer of binary classification problems (e.g., yes/no, spam/not spam).
        - Its output lies between 0 and 1, so it can be interpreted as a probability.
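
A minimal sketch of Sigmoid using only the standard library is shown below. The branch on the sign of `x` is an implementation choice made here to avoid overflow in `math.exp` for large negative inputs; it is not required by the definition.

```python
import math


def sigmoid(x: float) -> float:
    # Both branches compute 1 / (1 + e^(-x)); splitting on the sign
    # keeps the argument of math.exp non-positive, so it cannot overflow.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


print(round(sigmoid(2.0), 2))  # 0.88
print(sigmoid(0.0))            # 0.5
```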
    
######

- **Tanh (Hyperbolic Tangent):** Converts input to a value between -1 and 1.
    - Example: Input = -2 → Output ≈ -0.96.
    - Why use it?
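        - Its output is centered around zero (between -1 and 1), which often makes training easier than with Sigmoid's always-positive output.
        - Its gradients are larger than Sigmoid's near zero (maximum slope of 1 versus 0.25), although it still saturates for inputs of large magnitude.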
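
The sketch below evaluates Tanh on the example input and compares its slope at zero with Sigmoid's. The helper names `tanh_grad` and `sigmoid_grad` are hypothetical and use the standard derivative formulas, 1 - tanh(x)² and s(x)(1 - s(x)).

```python
import math


def tanh_grad(x: float) -> float:
    # Derivative of tanh: 1 - tanh(x)^2
    return 1.0 - math.tanh(x) ** 2


def sigmoid_grad(x: float) -> float:
    # Derivative of sigmoid: s(x) * (1 - s(x))
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


print(round(math.tanh(-2.0), 2))  # -0.96
print(tanh_grad(0.0))             # 1.0
print(sigmoid_grad(0.0))          # 0.25
```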
    
######

- **Leaky ReLU (Leaky Rectified Linear Unit):**
    - Problem it solves: Regular ReLU outputs zero for all negative values, so a neuron whose input stays negative receives no gradient and can become permanently inactive ("dead") during training.
    - Solution: Leaky ReLU multiplies negative inputs by a small slope α instead of setting them to zero.
    - Example:
        - Input = -3 → Output = -0.03 (for α=0.01)
        - Input = 5 → Output = 5
    - Why use it?
        - It mitigates the dead neuron problem by keeping a small, non-zero output and gradient for negative inputs.
        - It’s a good alternative to ReLU for deep networks.
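
A minimal sketch of Leaky ReLU follows. The function name and the default `alpha=0.01` are taken from the example above for illustration; in practice α is a hyperparameter that can be tuned.

```python
def leaky_relu(x: float, alpha: float = 0.01) -> float:
    # Positive inputs pass through; negative inputs are scaled by alpha.
    return x if x > 0 else alpha * x


print(round(leaky_relu(-3.0), 4))  # -0.03
print(leaky_relu(5.0))             # 5.0
```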
        
######
        
- **Softmax:**
    - Purpose: Used in the output layer of classification models to handle multi-class problems.
    - How it works: Converts raw output values (logits) into probabilities that sum to 1.
    - Formula:
        - $f(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}}$
        
    - Example:
        - Suppose a model predicts raw values: [2.0, 1.0, 0.1].
        - The Softmax output would be approximately [0.66, 0.24, 0.10], meaning the first class has the highest probability (about 66%).

    - Why use it?
        - It’s great for **multi-class classification**, as it clearly shows which class the model is most confident about.

Activation functions like these are essential for neural networks to learn complex patterns and make accurate predictions.
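
To make Softmax concrete, the sketch below evaluates the formula on the logits [2.0, 1.0, 0.1]. Subtracting the maximum logit before exponentiating is a common numerical-stability trick assumed here; it does not change the result, because the shift cancels in the ratio.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    # Shift by the maximum so the largest exponent is e^0 = 1.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print([round(p, 2) for p in probs])  # [0.66, 0.24, 0.1]
print(round(sum(probs), 6))          # 1.0
```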
        
![image.png](attachment:image.png)