### **📚 Understanding Gradient Descent for Multiple Linear Regression with Vectorization**

<div style="text-align:center;">     <img src="https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png" alt="green-divider"> </div>

#### 🚀 **Introduction**

In this lesson, we’ll learn how **Gradient Descent** works for **Multiple Linear Regression** and how we can implement it efficiently using **Vectorization**. We’ll also explore an alternative method called the **Normal Equation**.

<div style="text-align:center;">     <img src="https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png" alt="green-divider"> </div>

#### 🧠 **What is Multiple Linear Regression?**

##### 🔹 **Basic Idea**

Multiple Linear Regression is a way to predict an **output (y)** using multiple **input features (x₁, x₂, ..., xₙ)**. Instead of just one feature like in Simple Linear Regression, we now have **n features**.

##### 🔹 **Equation Representation**

If we have **n features**, the equation for **Multiple Linear Regression** is:

$$
 f_{w, b}(x) = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b
$$

where:

- $ w_1, w_2, ..., w_n $ are the weights (parameters) of the model.
- $ b $ is the bias term.
- $ x_1, x_2, ..., x_n $ are the feature values.
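
As a minimal sketch of the equation above, the prediction for a single example can be computed one term at a time. The function name `predict_loop` and the sample numbers are illustrative assumptions, not part of the lesson.

```python
def predict_loop(x, w, b):
    """Compute f_{w,b}(x) = w_1*x_1 + ... + w_n*x_n + b term by term."""
    total = 0.0
    for w_j, x_j in zip(w, x):
        total += w_j * x_j  # add the contribution of feature j
    return total + b


# Hypothetical example with n = 3 features
x = [2.0, 3.0, 1.0]
w = [0.5, -1.0, 2.0]
b = 4.0

prediction = predict_loop(x, w, b)
print(prediction)  # 4.0  (1.0 - 3.0 + 2.0 + 4.0)
```

The loop mirrors the formula directly, which makes it easy to read but slow when the number of features is large.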

<div style="text-align:center;">     <img src="https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png" alt="green-divider"> </div>

#### 📌 **Vectorized Representation**

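Instead of treating each weight $ w_j $ separately, we can **combine the weights into a vector**, and do the same for the features:

$$
 w = \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{bmatrix}, \quad x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
$$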

So now, we can write our equation as:

$$
 f_{w, b}(x) = w^T x + b
$$

where $ w $ and $ x $ are **vectors** and $ w^T x $ is their **dot product**.

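In NumPy, a single call such as `np.dot(w, x)` replaces an explicit Python loop over the features. This is typically much **faster** because the arithmetic runs in optimized compiled code rather than in the interpreter. 🚀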
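
A short sketch of the vectorized form, assuming NumPy is available. The arrays and the names `f_single` and `f_all` are made up for illustration.

```python
import numpy as np

# Hypothetical weights, bias, and one example with n = 3 features
w = np.array([0.5, -1.0, 2.0])
b = 4.0
x = np.array([2.0, 3.0, 1.0])

# Prediction for a single example: f(x) = w^T x + b
f_single = np.dot(w, x) + b

# Predictions for several examples at once: each row of X is one example
X = np.array([[2.0, 3.0, 1.0],
              [1.0, 0.0, 2.0]])
f_all = X @ w + b  # shape (2,): one prediction per row

print(f_single)
print(f_all)
```

Stacking the examples as rows of `X` lets one matrix-vector product compute every prediction, which is the layout the vectorized gradient formulas rely on.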

<div style="text-align:center;">     <img src="https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png" alt="green-divider"> </div>

#### 🎯 **Cost Function**

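To measure how well our model is performing, we use a squared-error cost: the **Mean Squared Error (MSE)** scaled by $ \frac{1}{2} $. The extra factor cancels the 2 that appears when we differentiate, and it does not change where the minimum is: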

$$
 J(w, b) = \frac{1}{2m} \sum_{i=1}^{m} \left( f_{w, b}(x^{(i)}) - y^{(i)} \right)^2
$$

where:

- $ m $ = number of training examples.
- $ f_{w, b}(x^{(i)}) $ = predicted value for the $ i $-th example.
- $ y^{(i)} $ = actual value.

The goal is to **minimize J(w, b)** using **Gradient Descent**.
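
The cost formula can be sketched in a few lines of NumPy. The function name `compute_cost` and the tiny dataset are assumptions chosen so the arithmetic is easy to check by hand.

```python
import numpy as np


def compute_cost(X, y, w, b):
    """Return J(w, b) = (1 / 2m) * sum((X w + b - y)^2)."""
    m = X.shape[0]          # number of training examples
    errors = X @ w + b - y  # prediction minus target, one entry per example
    return np.sum(errors ** 2) / (2 * m)


# Hypothetical data: m = 3 examples, n = 2 features
X = np.array([[1.0, 2.0],
              [2.0, 0.0],
              [3.0, 1.0]])
y = np.array([5.0, 4.0, 8.0])
w = np.array([1.0, 1.0])
b = 0.0

cost = compute_cost(X, y, w, b)
print(cost)  # 4.0  (errors are -2, -2, -4, so 24 / 6)
```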

<div style="text-align:center;">     <img src="https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png" alt="green-divider"> </div>

#### 🔥 **Gradient Descent for Multiple Linear Regression**

##### 🔹 **Basic Idea**

Gradient Descent is an **optimization algorithm** that repeatedly adjusts the parameters ($ w $ and $ b $) in small steps, each in the direction that reduces the cost function.

##### 🔹 **Update Rule**

We update the parameters **w and b** using the following formula:

$$
 w_j := w_j - \alpha \frac{\partial J(w, b)}{\partial w_j}
$$

$$
 b := b - \alpha \frac{\partial J(w, b)}{\partial b}
$$

where **α (alpha)** is the **learning rate** (controls step size).

##### 🔹 **Gradient Calculation**

For **each weight** $ w_j $ (one per feature, $ j = 1, \dots, n $):

$$
 \frac{\partial J(w, b)}{\partial w_j} = \frac{1}{m} \sum_{i=1}^{m} \left( f_{w, b}(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
$$

For **b**:

$$
 \frac{\partial J(w, b)}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} \left( f_{w, b}(x^{(i)}) - y^{(i)} \right)
$$

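All of these gradients are computed from the current values of $ w $ and $ b $. Only then do we update **all weights** and **b** simultaneously, so no update uses a parameter that was already changed in the same step.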
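
The two gradient formulas can be written as explicit loops to make each index visible. This is a sketch only: the name `compute_gradient_loop`, the dataset, and the learning rate `alpha = 0.1` are illustrative assumptions.

```python
import numpy as np


def compute_gradient_loop(X, y, w, b):
    """Return (dJ/dw, dJ/db) using explicit loops over examples and features."""
    m, n = X.shape
    dj_dw = np.zeros(n)
    dj_db = 0.0
    for i in range(m):
        err = np.dot(w, X[i]) + b - y[i]  # f(x^(i)) - y^(i)
        for j in range(n):
            dj_dw[j] += err * X[i, j]     # accumulate the sum for w_j
        dj_db += err                      # accumulate the sum for b
    return dj_dw / m, dj_db / m


# Hypothetical data: m = 3 examples, n = 2 features
X = np.array([[1.0, 2.0],
              [2.0, 0.0],
              [3.0, 1.0]])
y = np.array([5.0, 4.0, 8.0])
w = np.array([1.0, 1.0])
b = 0.0
alpha = 0.1  # assumed learning rate

# Compute every gradient first, then update all parameters together
dj_dw, dj_db = compute_gradient_loop(X, y, w, b)
w_new = w - alpha * dj_dw
b_new = b - alpha * dj_db
print(dj_dw, dj_db)
print(w_new, b_new)
```

Because both gradients are computed before either parameter changes, the step is a simultaneous update.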

<div style="text-align:center;">     <img src="https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png" alt="green-divider"> </div>

#### ⚡ **Vectorized Implementation**

Instead of using loops, we use **matrix operations**:

$$
 w := w - \alpha \frac{1}{m} X^T (Xw + b - Y)
$$

$$
 b := b - \alpha \frac{1}{m} \sum (Xw + b - Y)
$$

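where:

- $ X $ = feature matrix of shape $ m \times n $ (one row per training example, one column per feature).
- $ Y $ = vector of the $ m $ target values.
- $ X^T $ = transpose of $ X $.
- $ Xw + b - Y $ = vector of the $ m $ prediction errors; the scalar $ b $ is added to every entry, and $ \sum $ adds up those $ m $ entries.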

One matrix-vector product replaces the nested loops over examples and features, which **reduces computation time** significantly in practice! ⚡
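
Putting the two vectorized updates into a loop gives a complete training routine. The sketch below makes several assumptions that are not in the lesson: the function name `gradient_descent`, zero initialization, a fixed number of iterations, and noise-free synthetic data generated from $ y = 2x_1 + 3x_2 + 1 $.

```python
import numpy as np


def gradient_descent(X, y, alpha=0.1, num_iters=2000):
    """Fit w and b with vectorized batch gradient descent."""
    m, n = X.shape
    w = np.zeros(n)  # assumed starting point
    b = 0.0
    for _ in range(num_iters):
        errors = X @ w + b - y       # shape (m,)
        grad_w = X.T @ errors / m    # (1/m) X^T (Xw + b - Y)
        grad_b = np.sum(errors) / m  # (1/m) sum(Xw + b - Y)
        # Both gradients use the old w and b, so the update is simultaneous
        w = w - alpha * grad_w
        b = b - alpha * grad_b
    return w, b


# Synthetic, noise-free data (an assumption for this example)
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(100, 2))
y = X @ np.array([2.0, 3.0]) + 1.0

w, b = gradient_descent(X, y)
print(w, b)  # should be close to [2, 3] and 1
```

A fixed iteration count keeps the sketch short. In practice you might instead stop when the cost stops decreasing, and the right `alpha` depends on how the features are scaled.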

<div style="text-align:center;">     <img src="https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png" alt="green-divider"> </div>

#### 🏆 **Alternative Method: The Normal Equation**

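Instead of using **Gradient Descent**, we can compute $ w $ **directly**. If we append a column of ones to $ X $, so that the bias $ b $ is learned as one more entry of $ w $, the solution is: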

$$
 w = (X^T X)^{-1} X^T Y
$$

##### 📉 **Why is it not used often?**

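1. ❌ **Doesn’t work for other models** (it only applies to Linear Regression)
2. ❌ **Computationally expensive** when the number of features $ n $ is large (working with the $ n \times n $ matrix $ X^T X $ costs roughly $ O(n^3) $)
3. ✅ **But... it doesn’t need iterations** or a learning rate!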
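
A minimal sketch of the Normal Equation in NumPy, using assumed synthetic data generated from $ y = 2x_1 + 3x_2 + 1 $. Two choices here are this example's own, not the lesson's: a column of ones is appended so the bias is learned together with the weights, and the linear system is solved with `np.linalg.solve` instead of forming the inverse explicitly, which is generally the more numerically stable route.

```python
import numpy as np

# Synthetic, noise-free data (an assumption for this example)
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(100, 2))
y = X @ np.array([2.0, 3.0]) + 1.0

# Append a column of ones so the last parameter plays the role of b
X_aug = np.column_stack([X, np.ones(X.shape[0])])

# Solve (X^T X) theta = X^T y, equivalent to theta = (X^T X)^{-1} X^T y
theta = np.linalg.solve(X_aug.T @ X_aug, X_aug.T @ y)
w, b = theta[:-1], theta[-1]

print(w, b)  # should be close to [2, 3] and 1
```

If $ X^T X $ is singular (for example, when two features are exact copies of each other), `np.linalg.solve` raises an error. `np.linalg.lstsq` is a common alternative in that case.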

<div style="text-align:center;">     <img src="https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png" alt="green-divider"> </div>

#### 🎯 **Summary**

✅ **Multiple Linear Regression** predicts an output using multiple input features.  
✅ We use **vectorization** to make computations faster.  
✅ The **cost function** (squared error) measures how well our model is performing.  
✅ **Gradient Descent** updates the weights and bias iteratively to minimize the cost.  
✅ **Vectorized Gradient Descent** makes training efficient.  
✅ **The Normal Equation** finds the weights in one step but becomes slow when the number of features is large.

<div style="text-align:center;">     <img src="https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png" alt="green-divider"> </div>

#### 🎯 **Interactive Quiz (MCQs)**

##### **1️⃣ What is the purpose of the cost function in Linear Regression?**

A) To increase the weights 🔼  
B) To decrease the number of features 📉  
C) To measure how well the model is performing ✅  
D) To stop the training process ❌

##### **2️⃣ What does the learning rate (α) do in Gradient Descent?**

A) Controls the step size when updating weights ✅  
B) Controls the number of iterations ❌  
C) Decreases the cost function manually ❌  
D) Adjusts the number of features ❌

##### **3️⃣ Which of the following is TRUE about the Normal Equation?**

A) It requires multiple iterations ❌  
B) It is efficient for large datasets ❌  
C) It can be used for other models like Logistic Regression ❌  
D) It directly computes the optimal weights ✅

##### **4️⃣ Why is vectorization important in Machine Learning?**

A) It makes computations slower ❌  
B) It improves computational efficiency ✅  
C) It reduces the number of training examples ❌  
D) It makes models less accurate ❌

<div style="text-align:center;">     <img src="https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png" alt="green-divider"> </div>

#### ✅ **Quiz Answers**

1️⃣ C) To measure how well the model is performing  
2️⃣ A) Controls the step size when updating weights  
3️⃣ D) It directly computes the optimal weights  
4️⃣ B) It improves computational efficiency

<div style="text-align:center;">     <img src="https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png" alt="green-divider"> </div>

### 🎉 That’s it! Now you understand **Gradient Descent for Multiple Linear Regression** and how to make it efficient with **vectorization**! 🚀
