# Chapter 8: Dimensionality Reduction
Some ML problems involve up to millions of features per training instance, which makes training extremely slow and a good solution hard to find. This is called the *curse of dimensionality*.

In applied problems, we can often reduce dimensionality on a case-by-case basis. For example, in the MNIST dataset the pixels along the image borders are almost always white and carry no useful information for the task, so they can be dropped. Likewise, two neighboring pixels are often highly correlated and can be merged into one by taking the mean of their intensities without losing much information.
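The pixel-merging idea can be sketched in a few lines of NumPy. This is only an illustration: `merge_pixel_pairs` is a hypothetical helper, and the images are random stand-ins rather than real MNIST data, so their neighboring pixels are *not* actually correlated.

```python
import numpy as np

def merge_pixel_pairs(images: np.ndarray) -> np.ndarray:
    """Average each horizontal pair of neighboring pixels.

    images: array of shape (n_samples, height, width); width must be even.
    Returns an array of shape (n_samples, height, width // 2).
    """
    n, h, w = images.shape
    if w % 2 != 0:
        raise ValueError("width must be even to pair up pixels")
    # Group the columns into pairs, then take the mean of each pair.
    return images.reshape(n, h, w // 2, 2).mean(axis=3)

# Hypothetical stand-in for MNIST: 5 random 28x28 grayscale images.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28)).astype(np.float64)

merged = merge_pixel_pairs(fake_images)
n_before = fake_images.reshape(5, -1).shape[1]
n_after = merged.reshape(5, -1).shape[1]
print(n_before, "->", n_after)  # 784 -> 392 features per image
```

On real digit images, where neighboring pixels tend to have similar intensities, this halves the number of features at a modest cost in detail.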

But reducing dimensions does lose *some* information; think of compressing an image into JPEG format and how the quality degrades.

Reducing dimensions does speed up training, but it does not generally make for better predictions. **Dimensionality reduction is also crucial for data visualization.** Reducing n dimensions to 2 or 3 makes plotting feasible and lets you visually spot patterns, such as clusters, that existed in the n-dimensional space. It is also crucial that your boss understands what they're looking at.

In this chapter:
- The 2 main approaches to dimensionality reduction
  - Projection
  - Manifold Learning
- The *most popular dimensionality reduction techniques*
  - PCA
  - Kernel PCA
  - LLE (Locally Linear Embedding)

## Main Approaches for Dimensionality Reduction

### Projection
In most real-world problems, training instances are not spread out uniformly across all dimensions: many features are almost constant, while others are highly correlated.
As a result, the training instances lie within (or close to) a much lower-dimensional subspace of the high-dimensional space.

In the example from Géron's book (Figure 8-2), all the points in the 3D space are *projected perpendicularly* onto a 2D plane, reducing the dimensionality from 3D to 2D. If $x_1$, $x_2$, and $x_3$ are the three features in the 3D space, then $z_1$ and $z_2$ are the two new features: the coordinates of each projected point on the plane.
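A perpendicular projection like this can be sketched with NumPy. The dataset and the plane below are made up for illustration (they are not the data behind Figure 8-2), and the plane is assumed to be known in advance and to pass through the origin; finding a good plane automatically is what PCA is for.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical 3D data lying close to a plane: two real degrees of
# freedom plus a little noise in the third coordinate.
m = 200
latent = rng.normal(size=(m, 2))
X = np.column_stack([
    latent[:, 0],
    latent[:, 1],
    0.5 * latent[:, 0] + 0.3 * latent[:, 1] + 0.01 * rng.normal(size=m),
])

# Two vectors spanning the plane (assumed known for this sketch).
u1 = np.array([1.0, 0.0, 0.5])
u2 = np.array([0.0, 1.0, 0.3])

# Orthonormalize them so that projecting is a plain matrix product.
Q, _ = np.linalg.qr(np.column_stack([u1, u2]))  # Q has shape (3, 2)

Z = X @ Q          # shape (m, 2): the new features z1 and z2
X_back = Z @ Q.T   # shape (m, 3): the projected points, back in 3D

# What the projection throws away is perpendicular to the plane.
residual = X - X_back
print("largest distance to the plane:", np.abs(residual).max())
```

Because the points were generated close to the plane, the residual is tiny and the two new features retain almost all of the information.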

But "*projection is not always the best approach to dimensionality reduction*". Recall the Swiss roll toy dataset (see Figure 8-4): projecting the points from 3D onto a 2D plane squashes the layers of the roll together and actually makes the problem harder. It would be better to "unroll" the Swiss roll.

### Manifold Learning
The Swiss roll is an example of a 2D *manifold*. A 2D manifold is a 2D shape that can be bent and twisted in a higher-dimensional space. More generally, a d-dimensional manifold is a part of an n-dimensional space (where d < n) that locally resembles a d-dimensional hyperplane. In the Swiss roll case, d = 2 and n = 3: it locally resembles a 2D plane, but it is rolled up in the third dimension.
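To see why projecting a rolled-up manifold can hurt, here is a small sketch. The parameterization `(t cos t, h, t sin t)` is an assumed, commonly used way to write a Swiss roll, and `swiss_roll` is a hypothetical helper, not a library function. Two points one full turn apart sit on different layers of the roll, yet a naive projection puts them on top of each other.

```python
import numpy as np

def swiss_roll(t, h):
    """Map 2D manifold coordinates (t, h) to 3D points (x1, x2, x3)."""
    return np.column_stack([t * np.cos(t), h, t * np.sin(t)])

# Two points on different layers of the roll, one full turn apart.
t = np.array([2 * np.pi, 4 * np.pi])
h = np.array([10.0, 10.0])
X = swiss_roll(t, h)  # shape (2, 3)

# Naive projection: drop x1 and keep (x2, x3).
X_proj = X[:, 1:]
proj_dist = np.linalg.norm(X_proj[0] - X_proj[1])

# "Unrolling": describe each point by its manifold coordinates (t, h).
X_unrolled = np.column_stack([t, h])
unrolled_dist = np.linalg.norm(X_unrolled[0] - X_unrolled[1])

print(f"distance after projection: {proj_dist:.3f}")
print(f"distance after unrolling:  {unrolled_dist:.3f}")
```

The projected distance is essentially zero, while the unrolled coordinates keep the two points a full turn (2π in `t`) apart. In practice `t` and `h` are unknown, and recovering coordinates like them from the 3D points is what a Manifold Learning algorithm attempts.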

Many dimensionality reduction algorithms work by modeling the manifold on which the training instances lie; this is called *Manifold Learning*. It relies on the *manifold assumption* (or *manifold hypothesis*), which states that most real-world high-dimensional datasets lie close to a much lower-dimensional manifold. This assumption is very often observed empirically. It is usually accompanied by a second, implicit assumption: that the task at hand will be simpler once it is expressed in the lower-dimensional space of the manifold. That second assumption *does not always* hold, although it frequently does.

Again, think about the MNIST dataset: all handwritten digits have similarities, like being made of connected lines, having white borders, and being more or less centered. If you randomly generated images for a long time, only a tiny fraction would look like handwritten digits. So the degrees of freedom available when creating a digit image are far fewer than those available when creating an arbitrary image. These constraints tend to squeeze the MNIST images onto a lower-dimensional manifold.

"Reducing dimensionality on a training set will speed up training, but not necessarily lead to a better or simpler solution."

## PCA
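Principal Component Analysis (PCA) is the most popular dimensionality reduction technique. It first identifies the hyperplane that lies closest to the data, then projects the data onto it.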

### Preserving the Variance
