# Introduction to Principal Component Analysis (PCA)

Hello guys,  

We are going to start a new machine learning algorithm called **Principal Component Analysis (PCA)**, one of the most widely used **dimensionality reduction** techniques.

---

## Why PCA? Understanding the Curse of Dimensionality

Before understanding PCA, we need to understand **why we should use PCA** and **what problem it solves**: the **curse of dimensionality**.

---

### Curse of Dimensionality Example

Suppose we train multiple machine learning models, each on a different number of features:

- Model \( M_1 \)  
- Model \( M_2 \)  
- Model \( M_3 \)  
- Model \( M_4 \)  
- Model \( M_5 \)  
- Model \( M_6 \)  

We have a dataset with **500 features**.  

**Dimensionality = Number of features**  

We want to **predict house prices**, with features such as:

- House size  
- Number of bedrooms  
- Number of bathrooms  
- Other features (totaling 500)  

---

### Accuracy vs Number of Features

| Model | Number of Features | Accuracy        |
|-------|-----------------|----------------|
| M1    | 3               | Accuracy 1     |
| M2    | 6               | Accuracy 2 > Accuracy 1 |
| M3    | 15              | Accuracy 3 > Accuracy 2 |
| M4    | 50              | Accuracy 4 < Accuracy 3 |
| M5    | 100             | Accuracy 5 < Accuracy 4 |
| M6    | 500             | Accuracy 6 < Accuracy 5 |

**Observation:**

- Initially, increasing the number of features improves accuracy.  
- After a certain point, adding more features **decreases accuracy**.  
- This happens because the model starts to **overfit**: it fits noise in the less important or redundant features instead of the underlying pattern.  
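
The effect described above can be reproduced on a toy problem. The following is a minimal sketch, not the experiment behind the table: it assumes a synthetic binary classification task (rather than house price regression), a simple 1-nearest-neighbor classifier, and hypothetical helper names such as `make_dataset` and `run_experiment`. One feature carries the signal, and the remaining features are pure noise.

```python
import random


def make_dataset(n_samples, n_noise, rng):
    """Toy binary dataset: one informative feature plus n_noise pure-noise features."""
    rows, labels = [], []
    for i in range(n_samples):
        label = i % 2  # alternate labels so both classes are equally represented
        informative = rng.gauss(2.0 if label == 1 else -2.0, 1.0)
        noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        rows.append([informative] + noise)
        labels.append(label)
    return rows, labels


def squared_distance(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def one_nn_accuracy(train_x, train_y, test_x, test_y):
    """Classify each test row with the label of its nearest training row."""
    correct = 0
    for x, y in zip(test_x, test_y):
        nearest = min(range(len(train_x)),
                      key=lambda j: squared_distance(x, train_x[j]))
        if train_y[nearest] == y:
            correct += 1
    return correct / len(test_y)


def run_experiment(n_noise, seed=0):
    rng = random.Random(seed)
    train_x, train_y = make_dataset(100, n_noise, rng)
    test_x, test_y = make_dataset(100, n_noise, rng)
    return one_nn_accuracy(train_x, train_y, test_x, test_y)


acc_few = run_experiment(n_noise=0)     # 1 feature in total
acc_many = run_experiment(n_noise=499)  # 500 features in total
print(f"accuracy with   1 feature : {acc_few:.2f}")
print(f"accuracy with 500 features: {acc_many:.2f}")
```

With only the informative feature, the nearest neighbor is usually from the same class. Once hundreds of noise features are added, they dominate the distance computation, so accuracy on the test set should drop noticeably.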

---

### Intuition: Human Analogy

Imagine asking a person to estimate a house price:

1. You give **Location** → Person guesses $450k–$500k  
2. Add **3 BHK requirement** → Price updated $500k–$600k  
3. Add **Beach proximity** → Price increases  
4. Add **Celebrity neighbor** → Price increases further  
5. Add **Nearby grocery shops, schools, etc.** → Person gets confused → cannot accurately predict  

**Lesson:** Too many features confuse the model or expert.  

This illustrates the **curse of dimensionality**.

---

## How to Prevent the Curse of Dimensionality

Two main approaches:

1. **Feature Selection**

   - Select the **most important features**.  
   - Train the model using these features only.  

2. **Dimensionality Reduction (Feature Extraction)**

   - Derive **new features** from the original features.  
   - Capture the **essence/variance** of the original features in fewer dimensions.  
   - This is what PCA does.

---

### Feature Extraction Example

Suppose original features are \( F_1, F_2, F_3 \).  

PCA can derive new features:

- \( D_1, D_2 \) (fewer dimensions)  

These **new features** capture the important information from \( F_1, F_2, F_3 \) and can be used to predict the output effectively.
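
As a preview, here is a minimal sketch of how \( D_1, D_2 \) could be derived from \( F_1, F_2, F_3 \). It assumes NumPy is available, uses made-up random data in which \( F_2 \) is almost a multiple of \( F_1 \), and follows the common covariance-eigenvector formulation of PCA. The variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 samples, where F2 is nearly redundant with F1.
f1 = rng.normal(size=200)
f2 = 2.0 * f1 + rng.normal(scale=0.1, size=200)
f3 = rng.normal(size=200)
X = np.column_stack([f1, f2, f3])          # shape (200, 3)

# 1. Center each feature at zero mean.
X_centered = X - X.mean(axis=0)

# 2. Covariance matrix of the features (3 x 3).
cov = np.cov(X_centered, rowvar=False)

# 3. Eigen-decomposition; eigh returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]          # largest variance first

# 4. Keep the top two directions and project the data onto them.
components = eigvecs[:, order[:2]]         # shape (3, 2)
D = X_centered @ components                # shape (200, 2): columns are D1, D2

explained = eigvals[order[:2]].sum() / eigvals.sum()
print(D.shape)
print(f"fraction of variance kept by D1 and D2: {explained:.3f}")
```

Because \( F_2 \) adds almost no information beyond \( F_1 \) in this made-up data, two derived features retain nearly all of the variance of the original three.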

---

### Summary

- **Curse of Dimensionality:** Too many features confuse the model and degrade performance.  
- **Solution:**  
  1. Feature Selection → Pick important features  
  2. Dimensionality Reduction → Feature Extraction (PCA)  

In the next session, we will **deep dive into PCA**, including:

- Geometric intuition  
- Mathematical explanation  
- Practical implementation  



# Feature Selection vs Feature Extraction

In this section, we will discuss the differences between **feature selection** and **feature extraction**.  
Both are techniques used in **dimensionality reduction**, which helps us reduce the number of features or extract important features from the existing dataset.

---

## Why Perform Dimensionality Reduction?

Dimensionality reduction is performed for several reasons (this is a common interview question):

1. **Prevent Curse of Dimensionality**  
2. **Improve Model Performance**  

   - More features (dimensions) → More computation  
   - Training time increases  
   - Dimensionality reduction improves model efficiency  

3. **Data Visualization**  

   - Humans can visualize up to **3D**  
   - High-dimensional data (e.g., 100D) cannot be visualized  
   - Reduce dimensions to **2D or 3D** to better understand the data  

**Summary:** Dimensionality reduction helps with understanding data, improving performance, and preventing overfitting.

---

## Feature Selection

**Definition:** Process of selecting the most important features that help predict the output.  

### Example: Relationship Between Features

Suppose:

- Input feature: \( X \)  
- Output feature: \( Y \)  

Types of relationships:

1. **Positive Linear Relationship:**  
   - As \( X \) increases, \( Y \) increases  
   - As \( X \) decreases, \( Y \) decreases  

2. **Negative Linear Relationship:**  
   - As \( X \) increases, \( Y \) decreases  
   - As \( X \) decreases, \( Y \) increases  

3. **No Relationship:**  
   - \( X \) and \( Y \) are independent  

---

### Quantifying Relationships

#### Covariance

The **covariance** between \( X \) and \( Y \) is:

$$
\text{Cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n-1}
$$

- **Positive** → Positive linear relationship  
- **Negative** → Negative linear relationship  
- **Zero** → No **linear** relationship (a nonlinear relationship may still exist)  
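
The covariance formula can be checked by hand on small inputs. This is a minimal sketch in plain Python with made-up numbers. The last example shows a case where \( Y \) depends entirely on \( X \) (\( Y = X^2 \)) and the covariance is still zero, because covariance only measures the linear part of a relationship.

```python
def covariance(xs, ys):
    """Sample covariance with the (n - 1) denominator, as in the formula above."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


x = [1, 2, 3, 4, 5]
cov_pos = covariance(x, [2, 4, 6, 8, 10])   # Y rises with X
cov_neg = covariance(x, [10, 8, 6, 4, 2])   # Y falls as X rises

# Y = X^2 on values symmetric around zero: fully dependent, but not linearly.
x_sym = [-2, -1, 0, 1, 2]
cov_zero = covariance(x_sym, [v * v for v in x_sym])

print(cov_pos)   # 5.0
print(cov_neg)   # -5.0
print(cov_zero)  # 0.0
```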

#### Pearson Correlation

$$
r_{XY} = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y}
$$

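- \( \sigma_X \) and \( \sigma_Y \) are the standard deviations of \( X \) and \( Y \)  
- Ranges between **-1 and +1**  
- Close to +1 → Strong positive correlation  
- Close to -1 → Strong negative correlation  
- Close to 0 → Little or no **linear** correlation  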
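
One practical reason to prefer correlation over raw covariance is that covariance changes with the units of the features, while Pearson correlation does not. The sketch below illustrates this with made-up house sizes and prices. The data and the helper names are assumptions for illustration.

```python
import math


def covariance(xs, ys):
    """Sample covariance with the (n - 1) denominator."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def pearson(xs, ys):
    """Cov(X, Y) / (sigma_X * sigma_Y), using Cov(X, X) as the variance of X."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


# Hypothetical data: house size in square meters and price in thousands.
size_m2 = [50, 80, 100, 120, 150]
price = [150, 230, 310, 350, 460]

# The same sizes expressed in square feet (1 m^2 is about 10.764 ft^2).
size_ft2 = [s * 10.764 for s in size_m2]

cov_m2, cov_ft2 = covariance(size_m2, price), covariance(size_ft2, price)
r_m2, r_ft2 = pearson(size_m2, price), pearson(size_ft2, price)

print(f"covariance:  {cov_m2:.1f} (m^2) vs {cov_ft2:.1f} (ft^2)")
print(f"correlation: {r_m2:.4f} (m^2) vs {r_ft2:.4f} (ft^2)")
```

The covariance grows by the unit conversion factor, but the correlation stays the same, which makes it easier to compare features measured on different scales.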

> Covariance and correlation help identify **important features** for feature selection.

---

### Housing Dataset Example

Features:

- Independent: `house_size`, `fountain_size`  
- Dependent: `price`  

**Observation:**

- `house_size` → Strong linear relationship with price → **important feature**  
- `fountain_size` → Weak or no relationship → **can be dropped**  

This illustrates **feature selection**: keeping the most relevant features.
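
A correlation-based filter along these lines can be sketched as follows. The numbers are invented for illustration, and the cutoff `THRESHOLD = 0.5` is an arbitrary assumption rather than a recommended value. In practice the cutoff depends on the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical housing data (prices in thousands).
features = {
    "house_size": [1000, 1500, 2000, 2500, 3000],
    "fountain_size": [2, 5, 1, 5, 2],
}
price = [200, 290, 410, 500, 610]

THRESHOLD = 0.5  # assumed cutoff on |r| for keeping a feature

scores = {name: pearson(values, price) for name, values in features.items()}
selected = [name for name, r in scores.items() if abs(r) >= THRESHOLD]

for name, r in scores.items():
    print(f"{name}: r = {r:+.3f}")
print(selected)  # ['house_size']
```

Note that this filter only detects linear relationships with the target, so a feature with a strong nonlinear effect could be dropped by mistake.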

---

## Feature Extraction

**Definition:** Creating new features from existing features, useful when **all original features are important**.  

### Example

Suppose features:

- `room_size`  
- `number_of_rooms`  
- Output: `price`  

Goal: Reduce from **2 features → 1 feature**  

- Both features are important → cannot drop any  
- Apply a **transformation** to combine features:  

$$
\text{house\_size} = f(\text{room\_size}, \text{number\_of\_rooms})
$$

- New feature: `house_size`  
- Still predictive of `price`  

> This is **feature extraction**: deriving new features to reduce dimensions while retaining information.
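
The notes leave the transformation \( f \) unspecified. As one possible choice, the sketch below assumes `room_size` is the average area per room and takes \( f \) to be a simple product. Both the function name `extract_house_size` and the sample values are hypothetical.

```python
def extract_house_size(room_size, number_of_rooms):
    """Assumed f: total area = average room size * number of rooms."""
    return room_size * number_of_rooms


# Hypothetical houses described by the two original features.
houses = [
    {"room_size": 12.0, "number_of_rooms": 3},
    {"room_size": 15.0, "number_of_rooms": 4},
    {"room_size": 10.0, "number_of_rooms": 5},
]

# Replace two features with a single derived feature.
house_sizes = [
    extract_house_size(h["room_size"], h["number_of_rooms"]) for h in houses
]
print(house_sizes)  # [36.0, 60.0, 50.0]
```

Here the transformation is chosen by hand from domain knowledge. PCA instead learns a linear combination of the original features from the data itself.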

---

### Key Differences

| Aspect                  | Feature Selection                     | Feature Extraction                  |
|-------------------------|-------------------------------------|------------------------------------|
| Goal                    | Select important features            | Derive new features                 |
| Method                  | Drop irrelevant features             | Transform/combine features          |
| Use Case                | Some features are irrelevant         | All features are relevant           |
| Example                 | Drop `fountain_size`                 | Combine `room_size` & `number_of_rooms` → `house_size` |

---

## Summary

- **Dimensionality Reduction** helps prevent the curse of dimensionality, improve performance, and visualize data  
- **Feature Selection**: Keep the most important features (e.g., ranked by covariance/correlation with the output)  
- **Feature Extraction**: Create new features from existing ones (PCA, transformations)  

Next, we will discuss **Principal Component Analysis (PCA)** and its **geometric intuition**.

