
# **Forward Propagation** and **Backward Propagation**:



## 🎯 Goal:

Let’s say you built a simple neural network that predicts whether a student will pass or fail based on:

* **Hours studied**
* **Sleep hours**

You want the model to learn how these two inputs affect the output: “Pass” or “Fail”.

---

## 🧠 Your Neural Network

Let’s say you have this tiny neural network:

```
Input Layer:         Hidden Layer:        Output Layer:
[hours_studied] ---> (Neuron1)          ---> (Prediction)
[sleep_hours]    ---> (Neuron2)
```

You’ll use:

* Activation function: Sigmoid (squashes numbers between 0 and 1)
* Output close to **1 = Pass**, close to **0 = Fail**

---

## ⚙️ Forward Propagation (How the network makes a prediction)

This is like **guessing the result** based on input.

### Suppose:

* `hours_studied = 2`
* `sleep_hours = 5`

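To keep the arithmetic simple, we trace a single sigmoid neuron that maps both inputs directly to the prediction. The model starts with **random weights** (which it will later adjust):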

* Weight for hours\_studied = `0.4`
* Weight for sleep\_hours = `0.3`
* Bias = `0.2`

### Step 1: Multiply inputs by weights and add the bias

```
z = (2 × 0.4) + (5 × 0.3) + 0.2
  = 0.8 + 1.5 + 0.2 = 2.5
```

### Step 2: Apply activation (Sigmoid function)

```
Sigmoid(2.5) ≈ 0.92
```

So, the model predicts:

> “There’s a 92% chance the student will **Pass**.”

🎉 That’s **Forward Propagation** — the model uses the input, weights, and bias to make a prediction.
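The two steps above can be sketched in a few lines of plain Python. This is a minimal illustration of a single sigmoid neuron using the numbers from the example; the function names `sigmoid` and `forward` are made up for this sketch.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias, passed through sigmoid."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return z, sigmoid(z)

# Values from the example: hours_studied = 2, sleep_hours = 5
z, prediction = forward([2, 5], [0.4, 0.3], 0.2)
print(round(z, 2), round(prediction, 2))  # 2.5 0.92
```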

---

## 🔁 Backward Propagation (How the network **learns**)

Now suppose the **actual result** was:

> The student **Failed** (i.e., actual = 0)

So, your model predicted `0.92` but the correct answer was `0`.
It made a big mistake!

---

### Step 1: Calculate **Loss**

Loss is like the penalty for being wrong.
Here we use the squared error (Mean Squared Error reduces to this for a single prediction):

```
Loss = (Predicted - Actual)²
     = (0.92 - 0)² ≈ 0.85
```

High loss means "bad prediction."

---

### Step 2: Backward Propagation = Fixing the Mistake

Here’s what happens:

1. The model figures out **which weights caused the mistake**.
2. It adjusts each weight **a little** using gradients (how sensitive the loss is to each weight).
3. This is done using something called **Gradient Descent** (a method to reduce the loss).
4. Updated weights make the model better next time.

### Example: Updating one weight

Let’s say:

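* The gradient shows that lowering the weight for `hours_studied` reduces the loss
* So we **reduce** that weight, say from `0.4` → `0.3` (an illustrative step; the real size depends on the gradient and the learning rate)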

Next time, the prediction will move closer to the actual result.

This is **learning**.
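Here is a rough sketch of one real gradient-descent step for the same single neuron, using the chain rule on the squared-error loss. The learning rate of `0.1` is an assumed value, not something from the example, so the step it produces is smaller than the illustrative `0.4` → `0.3` above.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

inputs = [2.0, 5.0]      # hours_studied, sleep_hours
weights = [0.4, 0.3]
bias = 0.2
actual = 0.0             # the student failed
learning_rate = 0.1      # assumed value, not from the text

# Forward pass
z = sum(x * w for x, w in zip(inputs, weights)) + bias
p = sigmoid(z)

# Backward pass (chain rule) for Loss = (p - actual)^2
dloss_dp = 2 * (p - actual)          # how the loss changes with the prediction
dp_dz = p * (1 - p)                  # derivative of sigmoid
dloss_dz = dloss_dp * dp_dz
grads = [dloss_dz * x for x in inputs]   # dz/dw_i = x_i
grad_bias = dloss_dz                     # dz/db = 1

# Gradient descent step: move each parameter against its gradient
new_weights = [w - learning_rate * g for w, g in zip(weights, grads)]
new_bias = bias - learning_rate * grad_bias

new_p = sigmoid(sum(x * w for x, w in zip(inputs, new_weights)) + new_bias)
print([round(w, 3) for w in new_weights])  # [0.374, 0.235]
print(round(p, 3), "->", round(new_p, 3))  # 0.924 -> 0.892
```

In this sketch both weights shrink, and the `sleep_hours` weight moves more because its input (5) is larger than the `hours_studied` input (2). The prediction moves a little closer to the actual result of 0.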

---

## 🔁 Repeat

* Forward → Predict
* Backward → Fix mistakes
* Repeat for many students, over many passes through the data (epochs), and the network gets better and better!
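The whole loop can be sketched like this. The six students below are hypothetical data invented for the illustration, and the learning rate and number of epochs are assumed values; the starting weights and bias are the ones from the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, weights, bias):
    return sigmoid(sum(xi * wi for xi, wi in zip(x, weights)) + bias)

def mean_loss(data, weights, bias):
    return sum((predict(x, weights, bias) - y) ** 2 for x, y in data) / len(data)

# Hypothetical students: ((hours_studied, sleep_hours), 1 = pass / 0 = fail)
students = [((2, 5), 0), ((8, 7), 1), ((1, 4), 0),
            ((6, 8), 1), ((3, 6), 0), ((9, 6), 1)]

weights, bias = [0.4, 0.3], 0.2     # starting values from the example
learning_rate, epochs = 0.01, 500   # assumed hyperparameters

initial_loss = mean_loss(students, weights, bias)
for _ in range(epochs):                        # one epoch = one pass over all students
    for x, y in students:
        p = predict(x, weights, bias)          # forward: predict
        dloss_dz = 2 * (p - y) * p * (1 - p)   # backward: gradient of the squared error
        weights = [w - learning_rate * dloss_dz * xi for w, xi in zip(weights, x)]
        bias -= learning_rate * dloss_dz
final_loss = mean_loss(students, weights, bias)

print(f"loss before: {initial_loss:.3f}, after: {final_loss:.3f}")
```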

---

## 🧠 Summary Table

| Step      | Forward Propagation                | Backward Propagation                       |
| --------- | ---------------------------------- | ------------------------------------------ |
| Purpose   | Make a prediction                  | Learn from the mistake                     |
| Data Flow | Input → Output                     | Output → Error → Input (adjust weights)    |
| Uses      | Weights, bias, activation function | Loss function, gradients, optimizer        |
| Outcome   | A guess                            | Improved weights for better future guesses |



# **Loss Function**



## 💥 What is a Loss Function?

A **loss function** tells us **how wrong** the model’s prediction is.

Think of it like this:

> 🧠 “Hey model, you guessed 92%, but the real answer was 0. That’s way off! Here’s how badly you messed up.”

The loss function **quantifies the mistake**.

---

## 🔢 Simple Example

Imagine this case:

| Input (Hours Studied) | Actual Result | Model’s Prediction |
| --------------------- | ------------- | ------------------ |
| 2                     | 0 (Fail)      | 0.92               |

We use a **loss function** to measure the difference between:

* Actual = 0
* Predicted = 0.92

---

## 🔧 Common Loss Functions (Simplified)

### 1. **Mean Squared Error (MSE)**

Used in **regression** (predicting numbers)


$$
\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (\text{predicted}_i - \text{actual}_i)^2
$$

* $n$ = number of data points
* $\text{predicted}_i$ = the prediction for the i-th data point
* $\text{actual}_i$ = the actual true value for the i-th data point



#### Example:

Let's say we have **just one prediction**:

| Predicted | Actual | Calculation          |
| --------- | ------ | -------------------- |
| 0.92      | 0      | (0.92 - 0)² = 0.8464 |

Since there’s only **one data point**, $n = 1$

$$
\text{MSE} = \frac{1}{1} \times (0.92 - 0)^2 = (0.92)^2 = 0.8464
$$

✅ So the **error** for this prediction is **0.8464**.
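As a small sketch, the MSE formula maps directly onto a Python function. The name `mean_squared_error` and the three-point example are made up for illustration; only the first call uses numbers from the text.

```python
def mean_squared_error(predicted, actual):
    """Average of the squared differences between predictions and true values."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    n = len(predicted)
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n

# The single prediction from the example
print(round(mean_squared_error([0.92], [0]), 4))  # 0.8464

# Three hypothetical predictions, to show the averaging
print(round(mean_squared_error([0.92, 0.1, 0.8], [0, 0, 1]), 4))  # 0.2988
```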

---

### 2. **Binary Cross Entropy**

Used for **binary classification** (like Pass/Fail, Yes/No)

```text
Loss = - [ y * log(p) + (1 - y) * log(1 - p) ]
```

* `y` is actual (0 or 1)
* `p` is predicted probability (like 0.92)
* `log` is the natural logarithm

#### Example:

If actual = 0, and predicted = 0.92:

```
Loss = - [0 * log(0.92) + (1 - 0) * log(1 - 0.92)]
     = - log(0.08) ≈ 2.526
```

🔴 Very high, because the model was confident about a wrong answer.
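A minimal sketch of the same formula in Python, using the natural logarithm. The small `eps` clipping is an assumption added here so that `log(0)` can never happen; it is not part of the formula above.

```python
import math

def binary_cross_entropy(y, p, eps=1e-12):
    # Clip p away from exactly 0 or 1 so log() never receives 0 (assumed safeguard)
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

print(round(binary_cross_entropy(0, 0.92), 3))  # 2.526  (confident and wrong)
print(round(binary_cross_entropy(1, 0.92), 3))  # 0.083  (confident and right)
```

The same prediction of `0.92` costs very little when the true answer is 1, which is exactly the behavior we want from a classification loss.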

---

### 3. **Categorical Cross Entropy**

Used when there are **more than 2 classes** (e.g., Dog, Cat, Bird)
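One common way to write it is `Loss = - Σ y_k * log(p_k)`, where `y_k` is 1 for the true class and 0 for the others, and `p_k` is the predicted probability of class `k`. The sketch below assumes one-hot labels; the class order and probabilities are hypothetical values picked for illustration.

```python
import math

def categorical_cross_entropy(y_true, probs, eps=1e-12):
    # Only the true class (where y == 1) contributes to the sum
    return -sum(y * math.log(max(p, eps)) for y, p in zip(y_true, probs))

# Hypothetical 3-class case: classes are [Dog, Cat, Bird], true class is Cat
y_true = [0, 1, 0]
probs = [0.2, 0.7, 0.1]
print(round(categorical_cross_entropy(y_true, probs), 3))  # 0.357
```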

---

## 🧠 In Simple Words:

| Loss Function            | Use For                    | What it Measures                         |
| ------------------------ | -------------------------- | ---------------------------------------- |
| MSE                      | Regression                 | Squared difference between guess & truth |
| Binary CrossEntropy      | Binary classification      | How confident and correct the guess was  |
| Categorical CrossEntropy | Multi-class classification | Confidence over many options             |

---

## 📉 Why It’s Important

The model uses the **loss** to:

* Know **how bad** its predictions are
* Use **backpropagation** to fix its weights
* Improve predictions over time

> Loss is like a teacher giving a grade — the model learns from it.




# **Activation Function**


## 🧠 What is an Activation Function?

An **activation function** decides **whether a neuron should "fire"** (pass info forward) or not — kind of like a filter for deciding what's important.

> Without it, a neural network would collapse into one boring linear equation, no matter how many layers you stack. Activation adds **learning power** and **complexity**.

---

## 🔌 Simple Analogy:

Think of a **light switch**:

* If the signal is strong enough → turn ON
* If weak → stay OFF

An activation function acts **like a smart switch** inside each neuron.

---

## 📊 Why is it needed?

* It adds **non-linearity** — lets the network learn complex patterns.
* Helps the model learn things like images, voices, language, etc.
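To see why non-linearity matters, here is a tiny sketch with one input and two stacked layers. All weights are hypothetical numbers chosen for the demonstration. Without an activation, the two layers behave exactly like one linear layer; with ReLU in between, they do not.

```python
def linear(x, w, b):
    return w * x + b

def relu(x):
    return max(0.0, x)

# Hypothetical weights for two stacked layers
w1, b1 = 2.0, 1.0
w2, b2 = -3.0, 0.5

def two_linear_layers(x):
    return linear(linear(x, w1, b1), w2, b2)

def two_layers_with_relu(x):
    return linear(relu(linear(x, w1, b1)), w2, b2)

# The stacked linear layers written as ONE linear layer
w_combined = w2 * w1
b_combined = w2 * b1 + b2

for x in [-2.0, 0.0, 3.0]:
    print(x, two_linear_layers(x), linear(x, w_combined, b_combined), two_layers_with_relu(x))
```

The second and third columns always match, so the extra layer added nothing. The last column differs for `x = -2.0`, where ReLU changes the shape of the function.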

---

## 🔧 Common Activation Functions (with Simple Examples)

### 1. **ReLU (Rectified Linear Unit)**

```text
f(x) = max(0, x)
```

* If input is positive → keep it
* If input is negative → set it to 0

#### Example:

```
f(5) = 5  
f(-3) = 0
```

✅ Very popular because it’s fast and works well.
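In Python, a bare-bones ReLU is a one-liner. The function name `relu` is just a label chosen for this sketch.

```python
def relu(x):
    # Keep positive inputs, replace negative inputs with 0
    return max(0, x)

print(relu(5), relu(-3))                  # 5 0
print([relu(v) for v in [-2, -1, 0, 4]])  # [0, 0, 0, 4]
```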

---

### 2. **Sigmoid**

```text
f(x) = 1 / (1 + e^(-x))
```

* Squashes output between **0 and 1**
* Good for binary classification

#### Example:

```
f(3) ≈ 0.95 (high confidence)
f(-3) ≈ 0.05 (low confidence)
```
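A quick way to check these numbers is to write the formula with Python's `math.exp`. This is a simple sketch, not a numerically hardened version.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

print(round(sigmoid(3), 2), round(sigmoid(-3), 2))  # 0.95 0.05
print(sigmoid(0))                                   # 0.5
```

An input of 0 lands exactly in the middle at `0.5`, which is why 0.5 is often used as the cut-off between the two classes.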

---

### 3. **Tanh (Hyperbolic Tangent)**

```text
f(x) = (e^x - e^(-x)) / (e^x + e^(-x))
```

* Output is between **-1 and 1**
* Zero-centered, which often makes it work better than sigmoid in hidden layers

#### Example:

```
f(2) ≈ 0.96  
f(-2) ≈ -0.96
```
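The formula can be typed out directly and compared with Python's built-in `math.tanh`. The hand-written version below is only for illustration; in practice the built-in is the safer choice.

```python
import math

def tanh(x):
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

print(round(tanh(2), 2), round(tanh(-2), 2))  # 0.96 -0.96
print(abs(tanh(2) - math.tanh(2)) < 1e-12)    # True
```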

---

## 🔁 Quick Summary Table

| Activation | Output Range | Use Case                                      | Example Input | Output |
| ---------- | ------------ | --------------------------------------------- | ------------- | ------ |
| ReLU       | 0 to ∞       | Hidden layers (general)                       | -3            | 0      |
| Sigmoid    | 0 to 1       | Binary output layer                           | 3             | 0.95   |
| Tanh       | -1 to 1      | Hidden layers (sometimes better than sigmoid) | -2            | -0.96  |

---

## 🧠 In Real Life:

Let’s say:

* Your neuron calculates a raw score of `-3`
* Without activation → passes `-3` forward
* With ReLU → passes `0` (ignores it)
* With Sigmoid → passes `~0.05` (low confidence)
* With Tanh → passes `~ -0.995` (strong negative signal)

So the activation controls **how much signal is passed on.**

---


## 🧠 When to Use Which Activation Function (Simple Guide)

| Activation Function | When to Use It                                       | Why                                                                       |
| ------------------- | ---------------------------------------------------- | ------------------------------------------------------------------------- |
| **ReLU**            | Hidden layers of deep networks                       | Fast, simple, and works really well. Zeroes out negatives. Most popular!  |
| **Leaky ReLU**      | If ReLU outputs 0 too often (dead neurons problem)   | Like ReLU, but scales negatives by a small slope (keeps learning going)   |
| **Sigmoid**         | Output layer for **binary classification** (0 or 1)  | Converts to probability (between 0 and 1)                                 |
| **Tanh**            | Hidden layers when data is centered around zero      | Zero-centered output (-1 to 1) often trains better than sigmoid           |
| **Softmax**         | Output layer for **multi-class classification**      | Turns output into probabilities that add up to 1 (e.g., dog/cat/bird)     |
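Leaky ReLU and Softmax appear in the table but are not defined earlier, so here is a rough sketch of both. The slope of `0.01` is a commonly used default and is an assumption here; the three scores passed to `softmax` are hypothetical.

```python
import math

def leaky_relu(x, slope=0.01):
    # Positive inputs pass through; negative inputs are scaled by a small slope
    return x if x > 0 else slope * x

def softmax(scores):
    # Subtracting the max keeps exp() from overflowing and does not change the result
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

print(leaky_relu(5), round(leaky_relu(-3), 2))  # 5 -0.03

probs = softmax([2.0, 1.0, 0.1])     # hypothetical scores for dog / cat / bird
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
```

The softmax outputs are all positive and add up to 1, so they can be read as class probabilities.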

---

## 🧪 Examples to Remember

### ✅ Use ReLU in hidden layers:

```text
For almost any deep neural network.
```

### ✅ Use Sigmoid at the output:

```text
If you're predicting:
- Spam or not spam
- Cancer or no cancer
- Pass or fail
```

### ✅ Use Softmax at the output:

```text
If you're classifying:
- Dog, Cat, or Bird
- Apple, Banana, or Orange
```

### ✅ Use Tanh in hidden layers (optional):

```text
If your data is centered around zero (e.g., -1 to 1)
```

---

## 🔁 One-Line Summary

> ✅ Use **ReLU** in hidden layers,
> ✅ Use **Sigmoid** for binary output,
> ✅ Use **Softmax** for multi-class output.


