## **Feature Scaling in Machine Learning**

<div style="text-align:center;">
    <img src="https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png" alt="green-divider">
</div>

### 📌 **Introduction**

In this lecture, we learn about **feature scaling**, a technique that helps **gradient descent** run much faster by bringing features with very different ranges onto comparable scales. 🚀

Let’s explore how the range of values a feature takes (like house size in square feet) affects the size of its parameter, and why rescaling features makes optimization smoother. 🏡📏

<div style="text-align:center;">
    <img src="https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png" alt="green-divider">
</div>

### 🎯 **Understanding the Relationship Between Feature Size and Parameter Value**

We want to predict the price of a house using **two features**:

- **x₁**: Size of the house (in square feet) 🏠
- **x₂**: Number of bedrooms 🛏️

#### Example Data 📊

- House sizes (**x₁**) range from **300 to 2000** square feet.
- Number of bedrooms (**x₂**) ranges from **0 to 5**.

Since x₁ has a **large range** and x₂ has a **small range**, good values for the corresponding parameters (**w₁ and w₂**) tend to sit on very different scales as well.

#### Scenario 1: Parameter Values

Suppose a house has:

- Size **2000** sq. ft.
- **5** bedrooms
- Actual price = **$500,000** 💰

If we choose parameter values:

- **w₁ = 50, w₂ = 0.1, b = 50**

Predicted price (in thousands of dollars):
\[
P = 50 \times 2000 + 0.1 \times 5 + 50
\]
\[
P = 100{,}000 + 0.5 + 50 = 100{,}050.5
\]

⚠️ **That is roughly 100 million dollars, far above the actual price of $500,000!**

#### Scenario 2: Swapping w₁ and w₂

If we choose:

- **w₁ = 0.1, w₂ = 50, b = 50**

Predicted price (in thousands of dollars):
\[
P = 0.1 \times 2000 + 50 \times 5 + 50
\]
\[
P = 200 + 250 + 50 = 500
\]

✅ **This matches the actual price of $500,000!**

📌 **Key Takeaway**: When a feature's range is large, a good parameter value tends to be small (0.1 for house size), and when a feature's range is small, a good parameter value tends to be large (50 for bedrooms). ⚖️
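
The two scenarios can be checked with a few lines of Python. This is a minimal sketch that assumes, as the arithmetic in the example implies, that the model outputs the price in thousands of dollars. The function name `predict_price` is illustrative only.

```python
def predict_price(size_sqft, bedrooms, w1, w2, b):
    """Linear model: price in thousands of dollars = w1*x1 + w2*x2 + b."""
    return w1 * size_sqft + w2 * bedrooms + b

# Scenario 1: large weight on the large-range feature (house size).
scenario_1 = predict_price(2000, 5, w1=50, w2=0.1, b=50)

# Scenario 2: small weight on the large-range feature.
scenario_2 = predict_price(2000, 5, w1=0.1, w2=50, b=50)

print(f"Scenario 1: {scenario_1:,.1f} thousand dollars")
print(f"Scenario 2: {scenario_2:,.1f} thousand dollars")
```

Scenario 1 lands near 100,050 (about 100 million dollars), while Scenario 2 lands at 500, which corresponds to the actual price.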

<div style="text-align:center;">
    <img src="https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png" alt="green-divider">
</div>

### 📉 **Effect on Gradient Descent**

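When plotting the training data:

- **x₁ (house size)** is on the **horizontal axis** 📏
- **x₂ (bedrooms)** is on the **vertical axis** 📊

The points spread widely along the x₁ axis and very little along the x₂ axis. The cost function, drawn as contours over the parameters **w₁** and **w₂**, shows the opposite pattern: because x₁ takes large values, a small change in w₁ changes the predictions (and the cost) a lot, while w₂ has to change much more to have the same effect. As a result, the **cost function contours** become **elongated ellipses**, narrow along w₁ and stretched along w₂ 🏛️.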

🔄 **Gradient Descent Issue**

- Due to the **elongated shape**, gradient descent **bounces back and forth**, taking **longer to converge** ❌.
- If x₁ and x₂ were on **similar scales**, the contours would be **more circular**, and gradient descent would find the minimum faster. ⏩
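
To see where the elongation comes from, one can compare how sharply the cost curves along each parameter. The sketch below assumes the usual squared-error cost J = (1/2m) Σ (w₁x₁ + w₂x₂ + b − y)², for which the curvature along w₁ is the mean of x₁² and the curvature along w₂ is the mean of x₂². The five houses are made-up values within the ranges above, not data from the lecture, and the helper names are illustrative.

```python
# Hypothetical houses within the ranges used above (not real data).
sizes_sqft = [300, 850, 1200, 1600, 2000]
bedrooms = [0, 2, 1, 4, 5]

def curvature(values):
    """Second derivative of the squared-error cost along one weight: mean(x^2)."""
    return sum(v * v for v in values) / len(values)

def rescale_to_unit_range(values):
    """Map values linearly so the smallest becomes 0 and the largest becomes 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# How much steeper the cost is along w1 than along w2.
raw_ratio = curvature(sizes_sqft) / curvature(bedrooms)
scaled_ratio = (curvature(rescale_to_unit_range(sizes_sqft))
                / curvature(rescale_to_unit_range(bedrooms)))

print(f"curvature ratio (w1 vs w2), raw features:    {raw_ratio:,.0f}")
print(f"curvature ratio (w1 vs w2), scaled features: {scaled_ratio:.2f}")
```

This ratio is only a rough indicator, since it ignores how the features interact. Still, a very large value means the cost is far steeper along w₁ than along w₂, which is what stretches the contours, and a value near 1 corresponds to contours that are closer to circular.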

<div style="text-align:center;">
    <img src="https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png" alt="green-divider">
</div>

### 🎯 **Solution: Feature Scaling**

**Feature scaling** transforms features so they take on similar ranges of values.

Example transformations:

- x₁: **300 - 2000** → **0 to 1**
- x₂: **0 - 5** → **0 to 1**
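
The notes do not say which formula produces these ranges. One common choice, assumed here for illustration, is min-max scaling, which subtracts the feature's minimum and divides by its range:

\[
x_1' = \frac{x_1 - 300}{2000 - 300}, \qquad x_2' = \frac{x_2 - 0}{5 - 0}
\]

For example, a 1,150 sq. ft. house with 3 bedrooms would map to x₁′ = 850 / 1700 = 0.5 and x₂′ = 3 / 5 = 0.6.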

🔹 After scaling, gradient descent can take a much more **direct path to the minimum** instead of bouncing around. 🏆
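
The effect can be tried on a toy problem. The following is a minimal sketch, not the lecture's own code: it assumes a squared-error cost, plain batch gradient descent, min-max scaling to the range 0 to 1, and a made-up five-house dataset whose prices (in thousands of dollars) follow 0.1 × size + 50 × bedrooms + 50. The learning rates are hand-picked so that each run stays stable.

```python
# Hypothetical training set; prices (in $1000s) follow 0.1*size + 50*bedrooms + 50.
sizes_sqft = [300, 850, 1200, 1600, 2000]
bedrooms = [0, 2, 1, 4, 5]
prices = [0.1 * s + 50 * n + 50 for s, n in zip(sizes_sqft, bedrooms)]

def min_max_scale(values):
    """Rescale values linearly to the range 0..1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def cost(x1, x2, y, w1, w2, b):
    """Squared-error cost J = (1/2m) * sum of squared prediction errors."""
    m = len(y)
    return sum((w1 * a + w2 * c + b - t) ** 2 for a, c, t in zip(x1, x2, y)) / (2 * m)

def gradient_descent(x1, x2, y, learning_rate, steps):
    """Batch gradient descent from w1 = w2 = b = 0; returns the final cost."""
    m = len(y)
    w1 = w2 = b = 0.0
    for _ in range(steps):
        errors = [w1 * a + w2 * c + b - t for a, c, t in zip(x1, x2, y)]
        grad_w1 = sum(e * a for e, a in zip(errors, x1)) / m
        grad_w2 = sum(e * c for e, c in zip(errors, x2)) / m
        grad_b = sum(errors) / m
        w1 -= learning_rate * grad_w1
        w2 -= learning_rate * grad_w2
        b -= learning_rate * grad_b
    return cost(x1, x2, y, w1, w2, b)

STEPS = 2000

# Raw features: the large size values force a tiny learning rate to stay stable.
cost_raw = gradient_descent(sizes_sqft, bedrooms, prices,
                            learning_rate=1e-7, steps=STEPS)

# Scaled features: a far larger learning rate remains stable.
cost_scaled = gradient_descent(min_max_scale(sizes_sqft), min_max_scale(bedrooms),
                               prices, learning_rate=0.5, steps=STEPS)

print(f"cost after {STEPS} steps, raw features:    {cost_raw:.4f}")
print(f"cost after {STEPS} steps, scaled features: {cost_scaled:.6f}")
```

On this toy data the raw-feature run is still far from the minimum after the same number of steps, because a learning rate small enough for w₁ barely moves w₂ and b. The scaled run can use a much larger learning rate and ends up essentially at the minimum.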

<div style="text-align:center;">
    <img src="https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png" alt="green-divider">
</div>

### ✨ **Summary**

- ✅ **Feature Scaling** speeds up gradient descent by ensuring all features have similar value ranges.
- ✅ If one feature has a large range and another has a small range, their parameters (w) tend to end up on very different scales.
- ✅ Without scaling, gradient descent takes longer due to an **elongated cost function shape**.
- ✅ After scaling, the cost function contours become more circular, allowing **faster convergence**. 🎯

<div style="text-align:center;">
    <img src="https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png" alt="green-divider">
</div>

### 📝 **Interactive Notes (MCQs)**

#### **1. What is the purpose of feature scaling?**

- A) Reduce the number of features
- B) Increase the dataset size
- C) Make gradient descent run faster
- D) Change the cost function

#### **2. What happens when features have very different ranges?**

- A) Gradient descent runs slower
- B) The model improves automatically
- C) The number of epochs decreases
- D) The parameters become equal

#### **3. If x₁ has a range of 300-2000 and x₂ has a range of 0-5, which parameter is likely smaller?**

- A) w₁
- B) w₂
- C) Both will be equal
- D) None of the above

#### **4. What shape does the cost function take when features have very different ranges?**

- A) Circle
- B) Square
- C) Ellipse
- D) Triangle

#### **5. What is the effect of scaling features?**

- A) Slows down gradient descent
- B) Makes gradient descent bounce more
- C) Speeds up gradient descent
- D) Has no effect

<div style="text-align:center;">
    <img src="https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png" alt="green-divider">
</div>

### ✅ **Answers**

1️⃣ **C**  
2️⃣ **A**  
3️⃣ **A**  
4️⃣ **C**  
5️⃣ **C**
