# Dimensionality Reduction

__Notebook Author__: Hamed Qazanfari

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/hamedmit/Classic-ML-Algorithms/blob/main/Unsupervised_Learning/Dimensionality_Reduction/Introduction_and_Applications.ipynb)
[![Open In kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://kaggle.com/kernels/welcome?src=https://raw.githubusercontent.com/hamedmit/Classic-ML-Algorithms/blob/main/Unsupervised_Learning/Dimensionality_Reduction/Introduction_and_Applications.ipynb)

---
## 1. Introduction to Dimensionality Reduction

### What is Dimensionality Reduction?

In machine learning and data analysis, we often deal with datasets that have many features (dimensions). While more features can provide more information, high-dimensional data introduces several challenges:

- **Curse of Dimensionality**: As the number of dimensions increases, the volume of the feature space grows exponentially, so a fixed number of samples becomes increasingly sparse. This sparsity makes it harder for machine learning models to find reliable patterns.
- **Computational Complexity**: More dimensions mean higher computational and memory costs and longer processing times.
- **Overfitting**: Models trained on high-dimensional data may capture noise instead of meaningful patterns, especially when there are few samples relative to the number of features.
- **Difficulty in Visualization**: Beyond three dimensions, data visualization becomes difficult, limiting our ability to explore patterns intuitively.
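
One symptom of the curse of dimensionality listed above can be illustrated with a minimal sketch: for uniformly random points, the nearest and farthest neighbours of a query point become almost equally far away as the number of dimensions grows. The function name `relative_contrast`, the sample size, and the uniform data are assumptions made for this illustration only.

```python
import numpy as np

def relative_contrast(n_dims, n_points=500, seed=0):
    """Return (max - min) / min of the distances from a random query
    point to n_points uniformly random points in the unit hypercube."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, n_dims))
    query = rng.random(n_dims)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

# The contrast between the farthest and nearest point shrinks as dimensions grow
for d in (2, 10, 100, 1000):
    print(f"{d:>5} dims: relative contrast = {relative_contrast(d):.3f}")
```

A small contrast means that distance-based notions such as "nearest neighbour" carry less information, which is one reason distance-based models can struggle on high-dimensional data.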

**Dimensionality Reduction** simplifies data by reducing the number of features while retaining as much of the essential information as possible. This can improve model performance and makes the data easier to store, process, and visualize.

Dimensionality reduction techniques fall into two main categories:

- **Feature Selection**: Choosing the most important features while discarding irrelevant ones.
- **Feature Extraction**: Transforming data into a lower-dimensional space by creating new features from combinations of the original ones. PCA, for example, creates new features that capture the most variance.
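
The difference between the two categories can be sketched on a small synthetic dataset. This is an illustration under stated assumptions, not the notebook's own code: the data are random, selection uses a simple "keep the highest-variance columns" rule (one of many possible criteria), and extraction uses PCA computed with NumPy's SVD.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in data: 200 samples, 5 features with very different spreads
X = rng.normal(size=(200, 5)) * np.array([5.0, 0.1, 3.0, 0.2, 1.0])
k = 2  # target number of dimensions

# Feature selection: keep k of the ORIGINAL columns
# (assumed criterion: largest variance)
keep = np.sort(np.argsort(X.var(axis=0))[-k:])
X_selected = X[:, keep]

# Feature extraction: build k NEW features, each a linear combination
# of all original columns (PCA via SVD of the centered data)
X_centered = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
X_extracted = X_centered @ Vt[:k].T

print("selected columns:", keep)
print("selected shape:  ", X_selected.shape)   # (200, 2)
print("extracted shape: ", X_extracted.shape)  # (200, 2)
```

Both results have two columns, but the selected columns are still original measurements, whereas the extracted columns are mixtures that no longer correspond to any single original feature.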

## 2. Applications of Dimensionality Reduction

Dimensionality reduction has several important applications, including:

- **Data Visualization**: Projecting data onto two or three dimensions makes it possible to plot it and to spot patterns, clusters, and anomalies by eye.
- **Noise Reduction**: Discarding irrelevant features or low-variance components can remove noise and improve model accuracy.
- **Feature Extraction**: Combining the original features into a smaller set of new features that best represents the data’s variance.
- **Faster Computation**: Fewer dimensions mean shorter training and inference times.
- **Overfitting Prevention**: Lower-dimensional data reduces model complexity, which can improve generalization to new data.

### 2.1 Data Visualization

**Dataset**: MNIST Handwritten Digits Dataset

**Description**:

The MNIST dataset consists of 70,000 grayscale images of handwritten digits (0-9), each 28x28 pixels. Flattening each image gives a 784-dimensional feature vector, which cannot be plotted directly.

**Objective**:

- Use PCA to reduce the dimensionality of the MNIST dataset from 784 to 2 dimensions.
- Visualize the data in 2D to observe patterns and clusters.

**Visualization**:

![MNIST PCA Visualization](pics/mnist_pca.png)
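
The projection described in the objective can be sketched as follows. To stay self-contained, the sketch uses a synthetic stand-in with the same 784-dimensional shape as flattened MNIST images instead of downloading the dataset; the helper `pca_project` and the three synthetic "classes" are assumptions made for illustration.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X onto the top principal components (PCA via SVD)."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained_ratio = S[:n_components] ** 2 / np.sum(S ** 2)
    return X_centered @ components.T, components, explained_ratio

# Stand-in for MNIST: 3 synthetic "digit classes" of flattened 28x28 images
rng = np.random.default_rng(0)
n_per_class, n_pixels = 100, 28 * 28
prototypes = rng.random((3, n_pixels))            # one mean image per class
labels = np.repeat(np.arange(3), n_per_class)
X = prototypes[labels] + 0.1 * rng.normal(size=(labels.size, n_pixels))

X_2d, components, ratio = pca_project(X, n_components=2)
print(X_2d.shape)        # (300, 2)
print(components.shape)  # (2, 784)
print("variance explained by 2 components:", ratio.sum())
```

With the real dataset, `X` would be the 70,000 x 784 pixel matrix, and `X_2d` could be drawn as a scatter plot coloured by digit label.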

### 2.2 Noise Reduction

**Dataset**: Fashion MNIST Dataset with Added Noise

**Description**:

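The Fashion MNIST dataset has the same format as MNIST (28x28 grayscale images) but contains images of clothing items. We add noise to the images and reconstruct them with PCA. Because the leading components capture the main structure of the images while the noise is spread across all components, the reconstruction suppresses much of the noise.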

**Objective**:

- Add Gaussian noise to the images.
- Use PCA to reconstruct the images from a reduced number of components.
- Compare original, noisy, and reconstructed images to observe noise reduction.

**Visualization**:

![Fashion MNIST Noise Reduction](pics/fashion_mnist.png)
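
The three steps in the objective can be sketched with NumPy alone. The sketch below is an illustration under loud assumptions: the "clean images" are synthetic low-rank data rather than Fashion MNIST, and the helper `pca_reconstruct`, the noise level, and the number of components are hypothetical choices.

```python
import numpy as np

def pca_reconstruct(X, n_components):
    """Project X onto its top principal components and map back to pixel space."""
    mean = X.mean(axis=0)
    X_centered = X - mean
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    return (X_centered @ components.T) @ components + mean

rng = np.random.default_rng(0)
n_samples, n_pixels, rank = 500, 28 * 28, 10

# Stand-in "clean images": data that truly lives in a 10-dimensional subspace
clean = rng.normal(size=(n_samples, rank)) @ rng.normal(size=(rank, n_pixels))

# Step 1: add Gaussian noise
noisy = clean + rng.normal(scale=1.0, size=clean.shape)

# Step 2: reconstruct from a reduced number of components
denoised = pca_reconstruct(noisy, n_components=rank)

# Step 3: compare errors against the clean data
mse_noisy = np.mean((noisy - clean) ** 2)
mse_denoised = np.mean((denoised - clean) ** 2)
print(f"MSE noisy:    {mse_noisy:.4f}")
print(f"MSE denoised: {mse_denoised:.4f}")
```

Real images are only approximately low-rank, so the improvement on Fashion MNIST would likely be smaller than on this idealized data, and keeping too few components also blurs fine detail.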

### 2.3 Feature Extraction

**Dataset**: Labeled Faces in the Wild (LFW)

**Description**:

The LFW dataset consists of images of faces collected from the web. Each image is represented as a high-dimensional vector of pixel values.

**Objective**:

- Use PCA to extract the principal components of the face images, known as "eigenfaces" when reshaped and displayed as images.
- Visualize the eigenfaces to understand the dominant patterns of variation in the dataset.

**Visualization**:

![LFW Eigenfaces](pics/eigenfaces.png)
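
A minimal sketch of how eigenfaces are obtained: each principal component has one entry per pixel, so it can be reshaped back into an image. The random array below is only a stand-in for the LFW pixel matrix, and the helper `eigenfaces`, the image size, and the number of components are assumptions made for illustration.

```python
import numpy as np

def eigenfaces(faces, n_components, image_shape):
    """Return the top principal components of `faces`, reshaped into images."""
    centered = faces - faces.mean(axis=0)
    # Rows of Vt are unit-length principal directions in pixel space
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return Vt[:n_components].reshape((n_components, *image_shape))

rng = np.random.default_rng(0)
image_shape = (50, 37)  # assumed height x width of each face image
# Stand-in for the LFW data: 200 flattened "face" images
faces = rng.random((200, image_shape[0] * image_shape[1]))

components = eigenfaces(faces, n_components=12, image_shape=image_shape)
print(components.shape)  # (12, 50, 37)
```

With real face images, each slice `components[i]` could be shown as a grayscale image, and the first few typically resemble ghost-like faces that encode lighting and coarse facial structure.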