# Setup

In [1]:
# Python ≥3.5 is required
import sys
assert sys.version_info >= (3, 5)

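# Scikit-Learn ≥0.20 is required
import sklearn
from packaging import version  # compare versions numerically, not as strings ("0.9" >= "0.20" is True for strings)
assert version.parse(sklearn.__version__) >= version.parse("0.20")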

# Common imports
import numpy as np
import os

# to make this notebook's output stable across runs
np.random.seed(42)

# To plot pretty figures
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
mpl.rc('axes', labelsize=14)
mpl.rc('xtick', labelsize=12)
mpl.rc('ytick', labelsize=12)

# Where to save the figures
PROJECT_ROOT_DIR = "."
CHAPTER_ID = "dim_reduction"
IMAGES_PATH = os.path.join(PROJECT_ROOT_DIR, "images", CHAPTER_ID)
os.makedirs(IMAGES_PATH, exist_ok=True)

def save_fig(fig_id, tight_layout=True, fig_extension="png", resolution=300):
    path = os.path.join(IMAGES_PATH, fig_id + "." + fig_extension)
    print("Saving figure", fig_id)
    if tight_layout:
        plt.tight_layout()
    plt.savefig(path, format=fig_extension, dpi=resolution)

# Dimensionality Reduction 

"Many Machine Learning problems involve thousands or even millions of features for  
each training instance. Not only do all these features make training extremely slow,  
but they can also make it much harder to find a good solution, as we will see. This  
problem is often referred to as the curse of dimensionality."

WARNING: "Reducing dimensionality does cause some information loss (just  
like compressing an image to JPEG can degrade its quality), so  
even though it will speed up training, it may make your system  
perform slightly worse. It also makes your pipelines a bit more  
complex and thus harder to maintain. So, if training is too slow,  
you should first try to train your system with the original data  
before considering using dimensionality reduction. In some cases,  
reducing the dimensionality of the training data may filter out  
some noise and unnecessary details and thus result in higher performance,  
but in general it won’t; it will just speed up training."

"Apart from speeding up training, dimensionality reduction is also extremely useful  
for data visualization (or DataViz). Reducing the number of dimensions down to two  
(or three) makes it possible to plot a condensed view of a high-dimensional training  
set on a graph and often gain some important insights by visually detecting patterns,  
such as clusters. Moreover, DataViz is essential to communicate your conclusions to  
people who are not data scientists—in particular, decision makers who will use your  
results."

# The Curse of Dimensionality 

  
"The more dimensions the training set has, the greater the risk of overfitting it."
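One way to build intuition for this is to measure how far apart random points end up as dimensions are added. The sketch below is an illustration under assumed settings (points drawn uniformly from a unit hypercube, an arbitrary sample of 2,000 pairs, and a hypothetical helper named `mean_pairwise_distance`); none of these come from the quoted text.

```python
import numpy as np

def mean_pairwise_distance(n_dims, n_pairs=2000, seed=42):
    """Estimate the average distance between two random points in a unit hypercube."""
    rng = np.random.RandomState(seed)
    a = rng.rand(n_pairs, n_dims)   # first point of each pair
    b = rng.rand(n_pairs, n_dims)   # second point of each pair
    return np.linalg.norm(a - b, axis=1).mean()

# Assumed set of dimensionalities to compare
dims = [2, 3, 100, 1000]
mean_distances = {d: mean_pairwise_distance(d) for d in dims}

for d in dims:
    print("{:>5} dimensions: mean distance = {:.2f}".format(d, mean_distances[d]))
```

The average distance keeps growing with the number of dimensions, so a high-dimensional training set tends to be sparse: a new instance is likely to lie far from every training instance, which makes predictions less reliable.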

# Main Approaches for Dimensionality Reduction

## Projection

"In most real-world problems, training instances are not spread out uniformly across  
all dimensions. Many features are almost constant, while others are highly correlated  
(as discussed earlier for MNIST). As a result, all training instances lie within (or close  
to) a much lower-dimensional subspace of the high-dimensional space. This sounds  
very abstract, so let’s look at an example. In Figure 8-2 you can see a 3D dataset represented  
by circles."

![Figure 8-2: a 3D dataset lying close to a 2D subspace](images/proj_1.png)

"Notice that all training instances lie close to a plane: this is a lower-dimensional (2D)  
subspace of the high-dimensional (3D) space. If we project every training instance  
perpendicularly onto this subspace (as represented by the short lines connecting the  
instances to the plane), we get the new 2D dataset shown in Figure 8-3. Ta-da! We  
have just reduced the dataset’s dimensionality from 3D to 2D. Note that the axes correspond  
to new features z1 and z2 (the coordinates of the projections on the plane)."  

![Figure 8-3: the new 2D dataset after projection](images/proj_2.png)
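The projection described above can be sketched with plain NumPy. The quoted text does not say how the plane is found; this example assumes one common choice, the best-fitting plane given by the Singular Value Decomposition of the centered data. The toy dataset, the noise level, and the names `basis`, `W2`, and `Z` are all made up for illustration.

```python
import numpy as np

rng = np.random.RandomState(42)

# Assumed toy data: 200 points on a tilted plane in 3D, plus a little noise
m = 200
coords = rng.uniform(-1, 1, size=(m, 2))        # hidden 2D coordinates
basis = np.array([[1.0, 0.0, 0.5],
                  [0.0, 1.0, 0.3]])             # two vectors spanning the plane
X = coords @ basis + 0.05 * rng.randn(m, 3)     # 3D dataset lying close to the plane

# Center the data, then find the best-fitting plane with SVD
X_mean = X.mean(axis=0)
X_centered = X - X_mean
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
W2 = Vt[:2].T                                   # (3, 2): orthonormal axes of the plane

# Perpendicular projection: the new features z1, z2 of every instance
Z = X_centered @ W2                             # shape (200, 2)

# Map the projections back into 3D to measure how much was lost
X_recovered = Z @ W2.T + X_mean
mean_sq_error = np.mean(np.sum((X - X_recovered) ** 2, axis=1))
print("2D shape:", Z.shape)
print("Mean squared distance to the plane:", mean_sq_error)
```

Because the instances were generated close to a plane, the mean squared distance between each point and its projection stays small, which is the sense in which little information is lost here.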

"However, projection is not always the best approach to dimensionality reduction. In  
many cases the subspace may twist and turn, such as in the famous Swiss roll toy dataset  
represented in Figure 8-4."

![Figure 8-4: the Swiss roll dataset](images/proj_3.png)

"Simply projecting onto a plane (e.g., by dropping x3) would squash different layers of  
the Swiss roll together, as shown on the left side of Figure 8-5. What you really want is  
to unroll the Swiss roll to obtain the 2D dataset on the right side of Figure 8-5."

![Figure 8-5: squashing by projecting onto a plane (left) versus unrolling the Swiss roll (right)](images/proj_4.png)
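The difference between squashing and unrolling can also be checked numerically. The sketch below assumes Scikit-Learn's `make_swiss_roll`, which returns the 3D points together with `t`, each point's position along the roll. The sample size, the noise level, and the helper `mean_neighbor_gap` are assumptions made for illustration: for each point, the helper looks up its nearest neighbor in a 2D view and reports how far apart the two points really are along the roll.

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.neighbors import NearestNeighbors

# X holds the 3D coordinates (x1, x2, x3); t is the position along the roll
X, t = make_swiss_roll(n_samples=1000, noise=0.2, random_state=42)

squashed = X[:, :2]              # drop x3: project onto the (x1, x2) plane
unrolled = np.c_[t, X[:, 1]]     # unroll: position along the roll, plus x2

def mean_neighbor_gap(points_2d, t):
    """Average gap in roll position t between each point and its nearest 2D neighbor."""
    nn = NearestNeighbors(n_neighbors=2).fit(points_2d)
    _, idx = nn.kneighbors(points_2d)   # idx[:, 0] is the point itself
    return np.abs(t - t[idx[:, 1]]).mean()

gap_squashed = mean_neighbor_gap(squashed, t)
gap_unrolled = mean_neighbor_gap(unrolled, t)
print("Mean gap in t between 2D neighbors (squashed):", gap_squashed)
print("Mean gap in t between 2D neighbors (unrolled):", gap_unrolled)
```

A larger gap for the squashed view means that points from different layers of the roll end up next to each other, while the unrolled view keeps neighbors close along the roll.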



## Manifold Learning

"The Swiss roll is an example of a 2D manifold. Put simply, a 2D manifold is a 2D  
shape that can be bent and twisted in a higher-dimensional space. More generally, a  
d-dimensional manifold is a part of an n-dimensional space (where d < n) that locally  
resembles a d-dimensional hyperplane. In the case of the Swiss roll, d = 2 and n = 3: it  
locally resembles a 2D plane, but it is rolled in the third dimension."

"Many dimensionality reduction algorithms work by modeling the manifold on which  
the training instances lie; this is called Manifold Learning. It relies on the manifold  
assumption, also called the manifold hypothesis, which holds that most real-world  
high-dimensional datasets lie close to a much lower-dimensional manifold. This  
assumption is very often empirically observed."
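As a minimal sketch of what Manifold Learning looks like in practice, the example below applies Locally Linear Embedding, one of the manifold learning algorithms available in Scikit-Learn, to a Swiss roll. The choice of algorithm and the values of `n_samples`, `noise`, and `n_neighbors` are assumptions for illustration, not something the quoted text prescribes.

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

# 3D Swiss roll; t is the true position of each point along the roll
X, t = make_swiss_roll(n_samples=1000, noise=0.2, random_state=42)

# Model the manifold from local neighborhoods and map it to 2 dimensions
lle = LocallyLinearEmbedding(n_components=2, n_neighbors=10, random_state=42)
X_unrolled = lle.fit_transform(X)

# If the unrolling worked, one new coordinate should track t closely
corr = abs(np.corrcoef(X_unrolled[:, 0], t)[0, 1])
print("Reduced shape:", X_unrolled.shape)
print("Absolute correlation between first coordinate and t:", corr)
```

The algorithm never sees `t`; it is used here only to check afterwards how well the 2D coordinates follow the roll.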

"The manifold assumption is often accompanied by another implicit assumption: that  
the task at hand (e.g., classification or regression) will be simpler if expressed in the  
lower-dimensional space of the manifold. For example, in the top row of Figure 8-6  
the Swiss roll is split into two classes: in the 3D space (on the left), the decision  
boundary would be fairly complex, but in the 2D unrolled manifold space (on the  
right), the decision boundary is a straight line."
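This implicit assumption can be probed with a linear classifier. The sketch below is an assumed setup, not the labeling used for Figure 8-6: a point is labeled positive when it lies in the outer half of the roll, so the boundary in the unrolled space is the straight line `t = median(t)`. The helper `linear_accuracy` and the choice of logistic regression are also assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, t = make_swiss_roll(n_samples=2000, noise=0.0, random_state=42)

# Assumed labeling: positive class = outer half of the roll
y = (t > np.median(t)).astype(int)

# Unrolled 2D representation: position along the roll, plus x2
X_unrolled = np.c_[t, X[:, 1]]

def linear_accuracy(features, labels):
    """Test-set accuracy of a linear classifier trained on the given features."""
    F_train, F_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.25, random_state=42)
    clf = LogisticRegression(max_iter=1000)
    return clf.fit(F_train, y_train).score(F_test, y_test)

acc_3d = linear_accuracy(X, y)                 # straight boundary in the 3D space
acc_unrolled = linear_accuracy(X_unrolled, y)  # straight boundary in the 2D manifold space
print("Linear classifier accuracy in 3D:      ", acc_3d)
print("Linear classifier accuracy when unrolled:", acc_unrolled)
```

With this labeling, the linear model should do noticeably better in the unrolled space. Choosing a labeling that is simple in 3D instead (for example, thresholding `x1`) would reverse the comparison, so the assumption depends on the dataset.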
