
---

# ⚡️ What is an **Activation Function**? (Like You’re 5)

---

## 🧸 **1. Simple Definition**

An **activation function** is like a **decision button** inside a neuron 🧠.

It decides:
➡️ **Should I wake up and do something?**
➡️ **Or stay asleep and ignore this input?**

---

## 🍔 **2. Real-Life Analogy**

### Imagine a burger machine:

* The machine gets ingredients (numbers).
* But it only starts **cooking** if the ingredients are **good enough**.
* That **decision** is made by the **activation function**!

---

## 🧠 **3. Why Do We Need It?**

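Without an activation function, every layer only multiplies and adds, and stacking such layers still gives one straight-line (linear) formula. Your neural network is just a **big calculator** doing boring math ➕➖, no matter how many layers it has.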

But we want the network to:

✅ Make decisions
✅ Learn patterns
✅ Think like a brain 🧠

So we need **activation functions** to add **non-linearity** (the ability to bend and curve), which is what lets the network learn complex patterns 🧙.
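To see why the "big calculator" can't do more on its own, here is a minimal sketch with two made-up one-number "layers" (`layer1` and `layer2` are hypothetical, chosen only for illustration). Stacked without an activation, they collapse into a single straight-line formula; putting ReLU between them breaks that.

```python
def layer1(x):
    # Hypothetical first layer: weight 2, bias 1
    return 2 * x + 1

def layer2(h):
    # Hypothetical second layer: weight 3, bias -4
    return 3 * h - 4

def relu(z):
    # max(0, z): keep positive values, turn the rest into 0
    return max(0, z)

def stacked(x):
    # No activation: 3 * (2x + 1) - 4, which simplifies to 6x - 1
    return layer2(layer1(x))

def stacked_with_relu(x):
    # ReLU between the layers clips negative values, so the result is no longer one straight line
    return layer2(relu(layer1(x)))

print("x  stacked  6x-1  with_relu")
for x in [-2, -1, 0, 1, 2]:
    print(x, stacked(x), 6 * x - 1, stacked_with_relu(x))
```

The `stacked` and `6x-1` columns always match: two linear layers behave exactly like one. The ReLU column stays flat at -4 for negative `x` and then rises, a bend no single straight line can produce.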

---

## 🏗️ **4. Where Does It Fit in the Neuron?**

Let’s revisit a neuron:

```
       Inputs (x1, x2...)  
           ↓  
      Weighted Sum: (W1*x1 + W2*x2 + ... + b)  
           ↓  
➡️ Activation Function (e.g., ReLU, Sigmoid)  
           ↓  
       Output (Yes or No, Cat or Dog, etc.)
```
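The flow in the diagram can be sketched as a tiny Python function. The `neuron` helper and the input, weight, and bias values below are made up for illustration; the point is just the two steps: weighted sum first, activation second.

```python
import math

def sigmoid(z):
    # Soft yes/no between 0 and 1
    return 1 / (1 + math.exp(-z))

def relu(z):
    # Keep positive values, turn the rest into 0
    return max(0.0, z)

def neuron(inputs, weights, bias, activation):
    # Step 1: weighted sum W1*x1 + W2*x2 + ... + b
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Step 2: the activation function decides what to pass forward
    return activation(z)

inputs = [1.0, 2.0]     # x1, x2 (made-up values)
weights = [0.5, -1.0]   # W1, W2 (made-up values)
bias = 0.5              # b (made-up value)

print(neuron(inputs, weights, bias, relu))     # weighted sum is -1.0, so ReLU outputs 0.0
print(neuron(inputs, weights, bias, sigmoid))  # sigmoid(-1.0) is about 0.27
```

Swapping the `activation` argument changes what the same weighted sum turns into, which is exactly the "decision button" role described above.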

---

## 🔢 **5. Common Types of Activation Functions**

### 1. 🧼 **Sigmoid** (Soft Yes/No)

* **Formula**: `1 / (1 + e^-x)`
* Squashes any number (big or small) into a value between **0 and 1**.
* Like saying, “I’m **70% sure** this is a cat.”
* Used in **binary problems** (Yes/No).

```
Input = 2  → Output = 0.88  
Input = -2 → Output = 0.12
```

🎨 Looks like: S-curve (soft curve going from 0 to 1)
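The two example values above can be reproduced in a few lines. This sketch also checks a handy property, `sigmoid(-x) = 1 - sigmoid(x)`, which is why 0.88 and 0.12 add up to 1.

```python
import math

def sigmoid(x):
    # 1 / (1 + e^-x): squashes any number into the range (0, 1)
    return 1 / (1 + math.exp(-x))

for x in [2, -2]:
    print(x, round(sigmoid(x), 2))   # prints "2 0.88", then "-2 0.12"

# Symmetry: the S-curve is mirrored around sigmoid(0) = 0.5
for x in [0.5, 1, 2, 5]:
    assert abs(sigmoid(-x) - (1 - sigmoid(x))) < 1e-12

print(sigmoid(0))   # 0.5, the "not sure either way" point
```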

---

### 2. 🔥 **ReLU (Rectified Linear Unit)** — Most Used!

* **Formula**: `f(x) = max(0, x)`
* If x > 0 → keep it
* If x ≤ 0 → turn it to 0

💡 Like saying:

> "I only care about **positive signals**. Ignore the bad ones."

📊 Fast and works great in deep networks!

```
Input = 5   → Output = 5  
Input = -3  → Output = 0
```
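Here is a minimal sketch applying ReLU to a list of made-up signals, so the "ignore the bad ones" rule is visible on several numbers at once. Each value needs only one comparison, which is part of why ReLU is so cheap to compute.

```python
def relu(x):
    # max(0, x): keep positive signals, turn everything else into 0
    return max(0, x)

signals = [5, -3, 0, 2.5, -0.1]   # made-up example inputs
print([relu(s) for s in signals])   # [5, 0, 0, 2.5, 0]
```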

---

### 3. 📈 **Tanh (Hyperbolic Tangent)**

* Like sigmoid, but outputs between **-1 and 1**.
* Better when you want **negative signals** too.

🎨 Looks like: S-curve from -1 to 1
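Tanh has its own formula, `tanh(x) = (e^x - e^-x) / (e^x + e^-x)`, and it is really a stretched and shifted sigmoid: `tanh(x) = 2 * sigmoid(2x) - 1`. This sketch checks that identity against Python's built-in `math.tanh` (the helper name `tanh_from_sigmoid` is our own).

```python
import math

def sigmoid(x):
    # 1 / (1 + e^-x), range (0, 1)
    return 1 / (1 + math.exp(-x))

def tanh_from_sigmoid(x):
    # Squeeze sigmoid horizontally (2x), then rescale its (0, 1) range to (-1, 1)
    return 2 * sigmoid(2 * x) - 1

for x in [-2, -0.5, 0, 0.5, 2]:
    print(x, round(math.tanh(x), 4), round(tanh_from_sigmoid(x), 4))
```

Both columns agree, and the output is 0 at `x = 0`; that zero-centering is why tanh is handy when you want negative signals too.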

---

### 4. 🧠 **Softmax** (Multi-Class Decisions)

* Used when your output is like:

  * "Is this a cat, dog, or rabbit?"
* It gives **probabilities** for each class.

```
[0.1, 2.0, 0.5] → [10.9%, 72.8%, 16.3%]
```
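Softmax raises `e` to each score and divides by the total, so every output is positive and they all add up to 1. Here is a minimal sketch that reproduces the example above; subtracting the largest score first is a common trick to keep `math.exp` from overflowing, and it does not change the result.

```python
import math

def softmax(scores):
    # Subtract the max score so the largest exponent is e^0 = 1 (avoids overflow)
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    # Divide each exponential by the total so the probabilities sum to 1
    return [e / total for e in exps]

probs = softmax([0.1, 2.0, 0.5])
print([f"{p:.1%}" for p in probs])   # ['10.9%', '72.8%', '16.3%']
```

The biggest score (2.0) gets the biggest share, but the other classes still get some probability instead of being thrown away.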

---

## 🧮 **6. Table Summary**

| Function    | Output Range   | Used For           | Shape              |
| ----------- | -------------- | ------------------ | ------------------ |
| **Sigmoid** | 0 to 1         | Yes/No problems    | S-curve            |
| **ReLU**    | 0 to ∞         | Most deep layers   | Flat, then a straight ramp |
| **Tanh**    | -1 to 1        | Hidden layers (zero-centered output) | S-curve |
| **Softmax** | 0 to 1 (sum=1) | Multi-class output | Probability vector |

---

## 💡 7. In Short

| Without Activation Function | With Activation Function |
| --------------------------- | ------------------------ |
| Just boring math            | Real decisions           |
| Only straight-line patterns | Can learn complex patterns |
| Like a calculator           | Like a smart brain       |

---

## 👶 TL;DR (Too Long; Drawn Really-Simple)

> A neuron gets a number, the **activation function** decides:
> 👉 "Is this useful?"
> 👉 "Should I pass it forward?"
> That’s it! 😄

---



![image.png](attachment:image.png)
# The sigmoid function gives a smooth curve between 0 and 1, unlike the step function, which outputs only 0 or 1
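A quick side-by-side sketch makes the difference concrete. `step` here is a simple threshold at 0; treating `x = 0` as 1 is an assumption, since conventions vary.

```python
import math

def step(x):
    # Hard yes/no: jumps straight from 0 to 1 at x = 0
    return 1 if x >= 0 else 0

def sigmoid(x):
    # Soft yes/no: slides smoothly from 0 to 1
    return 1 / (1 + math.exp(-x))

for x in [-4, -1, -0.1, 0.1, 1, 4]:
    print(f"x={x:>5}  step={step(x)}  sigmoid={sigmoid(x):.3f}")
```

Between `x = -0.1` and `x = 0.1` the step output flips from 0 to 1, while sigmoid moves gently from about 0.475 to 0.525. That smoothness matters for training: the step function's slope is 0 everywhere except at the jump, so gradient descent gets no signal about which way to adjust the weights.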

![image.png](attachment:image.png)
## The tanh function has the same S-shape as sigmoid, but its output ranges from -1 to 1

**Use sigmoid in the output layer (for yes/no outputs). In hidden layers, prefer tanh over sigmoid.**

![image.png](attachment:image.png)
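The vanishing gradient problem is a significant issue in training deep neural networks: the gradients used to update the weights become extremely small as they are propagated backward through the network. Sigmoid and tanh make this worse because their curves flatten out for large positive or negative inputs, where the slope is almost 0. The result is slow or stalled training, especially in the early layers (the ones closest to the input).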
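A rough way to see the problem: during backpropagation, the gradient is multiplied by the activation's slope once per layer. Sigmoid's slope is `sigmoid(x) * (1 - sigmoid(x))`, which is at most 0.25 (at `x = 0`). The sketch below multiplies that best-case slope across a growing number of layers; ignoring the weights is a deliberate simplification.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def sigmoid_slope(x):
    # Derivative of sigmoid: s * (1 - s), largest (0.25) at x = 0
    s = sigmoid(x)
    return s * (1 - s)

def tanh_slope(x):
    # Derivative of tanh: 1 - tanh(x)^2, largest (1.0) at x = 0
    return 1 - math.tanh(x) ** 2

print("layers  best-case sigmoid gradient factor")
for layers in [1, 5, 10, 20]:
    # Best case: every layer sits at x = 0, the steepest point of the curve
    print(layers, sigmoid_slope(0) ** layers)

# Far from 0 both curves flatten out ("saturate"), so their slopes are tiny
print(round(sigmoid_slope(3), 4), round(tanh_slope(3), 4))
```

Even in the best case, 20 sigmoid layers shrink the gradient by a factor of about 10^-12. Tanh's peak slope of 1 helps, which is one reason it is preferred over sigmoid in hidden layers, but it saturates too.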

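To reduce this problem, we use **ReLU**: its slope is exactly 1 for every positive input, so the gradient does not shrink as it passes back through active neurons.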

---
For hidden layers, if you are not sure which activation function to use, just use ReLU as your default choice.
![image-2.png](attachment:image-2.png)
------

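But ReLU has its own weakness: for negative inputs its output and its slope are both 0, so a neuron that only receives negative inputs stops learning (often called the "dying ReLU" problem). To fix this, there is a variant called **Leaky ReLU**, which lets a small slope through for negative inputs.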
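The difference shows up in the slopes. ReLU's slope is 1 for positive inputs and 0 for negative ones, so a neuron stuck on negative inputs gets no update at all. Leaky ReLU keeps a small slope instead; 0.1 is assumed here (0.01 is also common). The helper names below are our own.

```python
def relu_slope(x):
    # Gradient of max(0, x): 1 for positive inputs, 0 otherwise
    # (the slope at exactly x = 0 is undefined; using 0 there is a common convention)
    return 1.0 if x > 0 else 0.0

def leaky_relu_slope(x, alpha=0.1):
    # Gradient of Leaky ReLU: 1 for positive inputs, alpha (small but not 0) otherwise
    return 1.0 if x > 0 else alpha

for x in [-5, -1, 2, 10]:
    print(f"x={x:>3}  relu slope={relu_slope(x)}  leaky slope={leaky_relu_slope(x)}")
```

For negative inputs the Leaky ReLU slope is small but never 0, so the weights feeding that neuron can still be nudged during training.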

![image.png](attachment:image.png)

![image.png](attachment:image.png)

# Let's Code

```python
import math

def sigmoid(x):
    # 1 / (1 + e^-x): squashes any number into the range (0, 1)
    return 1 / (1 + math.exp(-x))

sigmoid(1)    # → 0.7310585786300049
sigmoid(-56)  # → 4.780892883885469e-25
sigmoid(0.5)  # → 0.6224593312018546
```
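The `sigmoid` above works for the inputs shown, but `math.exp(-x)` raises `OverflowError` once `x` drops below about -709 (for example `sigmoid(-1000)`). One common fix, sketched here under our own name `stable_sigmoid`, only ever calls `math.exp` on a number that is 0 or negative:

```python
import math

def stable_sigmoid(x):
    # For x >= 0, e^-x is at most 1, so the usual formula is safe
    if x >= 0:
        return 1 / (1 + math.exp(-x))
    # For x < 0, rewrite 1 / (1 + e^-x) as e^x / (1 + e^x); e^x is below 1, so no overflow
    z = math.exp(x)
    return z / (1 + z)

print(stable_sigmoid(1))       # same formula as before for positive inputs
print(stable_sigmoid(-1000))   # 0.0 instead of an OverflowError
print(stable_sigmoid(1000))    # 1.0
```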

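```python
def tanh(x):
    # (e^x - e^-x) / (e^x + e^-x): squashes any number into the range (-1, 1)
    # Note: math.exp overflows for |x| above about 709; Python's built-in math.tanh avoids this
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

tanh(1)   # → 0.7615941559557649
tanh(-1)  # → -0.7615941559557649
```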

```python
def relu(x):
    # max(0, x): keep positive inputs, turn negative inputs into 0
    return max(0, x)

relu(-6)  # → 0
relu(19)  # → 19
```

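```python
def leaky_relu(x):
    # Like ReLU, but negative inputs keep a small slope (0.1) instead of becoming 0.
    # max(0.1*x, x) gives x when x >= 0 and 0.1*x when x < 0.
    return max(0.1 * x, x)

leaky_relu(-5)   # → -0.5
leaky_relu(100)  # → 100
```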