# Dimensionality Reduction - PCA

## What you'll learn in this course 🧐🧐

<a href="https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html#sklearn.decomposition.PCA.fit_transform" target="_blank">PCA</a> stands for *Principal Component Analysis*. The goal of this unsupervised algorithm is to build linear combinations of your features that transform your initial dataset into one with fewer variables.

In this course, we will learn: 

* What is Dimensionality Reduction and why do we use it
* What is PCA
* What is a covariance matrix
* What are eigen-values and eigen-vectors
* What is the SVD algorithm
 

## Dimensionality Reduction 🦊🦊

There is this misconception in data science that the more explanatory variables we have at our disposal, the better the model we can derive from them. 🙅‍♀️

This is only **partially true**.

The goal of a data scientist is not always to create a model that is an excellent predictor of reality. Regardless of their complexity, not all problems are similar to image classification. 

It is often a matter of **extracting relevant information** that can be understood by non-specialists, such as in sociology, economics and many other fields. 👌

As always in data science, the important thing is to produce results that are useful in a given context. 🚀 Other times, we come up against technical constraints, because the equipment at our disposal is not very powerful, or because we want to better understand data with which we are not very familiar.

That is why it is essential to use techniques that summarize information in a relevant and simple way. These techniques fall under dimensionality reduction: we start with a large number of variables that form a high-dimensional space and move to a space with fewer dimensions.

### Visualisation 

One use case of dimensionality reduction is to easily visualize data. If you take a 4-D dataset, you won't be able to draw a graph representing that data. 

<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/4-D_viz.png" alt="4D_viz" width="50%" />

Instead, what you want is to have something like this that you can interpret 👍👍

<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/2-D_viz.png" alt="2D_viz" width="50%" />
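As a minimal sketch of this idea, here is how a 4-D dataset could be brought down to 2-D so that it can be plotted. It assumes scikit-learn is installed and uses its bundled Iris dataset, which is a choice made for this illustration only.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Iris has 4 numeric features per flower, so it cannot be plotted directly.
X = load_iris().data                      # shape (150, 4)

# Standardize first, then keep the 2 directions with the most variance.
X_scaled = StandardScaler().fit_transform(X)
X_2d = PCA(n_components=2).fit_transform(X_scaled)

print(X.shape, "->", X_2d.shape)          # (150, 4) -> (150, 2)
```

Each row of `X_2d` can now be drawn as a point on a regular scatter plot.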

### Noise Reduction 

Another reason to use Dimensionality Reduction algorithms is to reduce noise. 🤫 It is really common in Data Science to have redundant data, i.e. variables that describe the same phenomenon.

For example, imagine that you have two variables like *Time of sleep* and *Tiredness score* (these are two imaginary variables). They describe much the same thing and are therefore highly correlated. That is why you can remove one of them without losing much information.

The above example is rather obvious, but redundant variables are not always that easy to spot in a dataset.

Let's take another example: imagine that you have a dataset of blurry images, i.e. images with a lot of noise. You can reduce this noise and make the images cleaner by applying PCA, for example.


<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/noise_to_no_noise_digits.png" alt="noise_to_no_noise"/>
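The denoising idea can be sketched as follows. This is only an illustration under assumptions: it uses scikit-learn's bundled digits dataset, adds artificial Gaussian noise, and keeps 12 components, a number picked arbitrarily for the example.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

X_clean = load_digits().data                  # 1797 images of 8x8 = 64 pixels
X_noisy = X_clean + rng.normal(scale=4.0, size=X_clean.shape)

# Keep only a few components (12 is an arbitrary choice for this sketch):
# most of the signal lives there, while the noise is spread over all 64.
pca = PCA(n_components=12).fit(X_noisy)
X_denoised = pca.inverse_transform(pca.transform(X_noisy))

# Compare each version with the clean images.
mse_noisy = np.mean((X_noisy - X_clean) ** 2)
mse_denoised = np.mean((X_denoised - X_clean) ** 2)
print(f"MSE before: {mse_noisy:.2f}, after: {mse_denoised:.2f}")
```

Projecting onto a few components and mapping back with `inverse_transform` throws away the directions that mostly contain noise.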

!(https://vimeo.com/485918508)

## Principal Component Analysis (PCA) 🐣🐣


### Goal of PCA

Now that we have a better understanding of why we do dimensionality reduction, let's talk about the most famous algorithm for this kind of task: **PCA**.

The general idea of PCA is to use the variables at our disposal and combine them linearly with each other to create a smaller dimensional space that best represents all the information.

<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/why_do_PCA.png" alt="why_do_PCA"/>


To do that, we are going to multiply our initial dataset by a matrix whose columns are vectors called **Eigen Vectors**. The result of this multiplication gives us our **principal components**.

<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/PCA_logic.png" alt="PCA_logic"/>


**Eigen Vectors** have special properties that we will explain later in the course. 


!(https://vimeo.com/485919093)

### Process 

Now how do we find these **Eigen Vectors**? We need to do it in the following 4 steps:

<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/PCA_process.png" alt="PCA_process"/>


Let's go through them step by step. 

## Normalization 📏📏

This is a classic step in Machine Learning, so we won't go into too much detail here. The whole idea is simply to apply this formula to every value $x_i$ of each variable in your dataset:

$$ \frac{(x_i - \mu)}{\sigma} $$

Where 

*   $\mu$ is the mean of the variable
*   $\sigma$ is the standard deviation of the variable
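The formula above can be sketched in a few lines of NumPy. The small array `X` is made-up data used only for illustration, with observations in rows and variables in columns.

```python
import numpy as np

# Toy dataset: 5 observations (rows), 2 features (columns) on different scales.
X = np.array([
    [1.0, 200.0],
    [2.0, 180.0],
    [3.0, 240.0],
    [4.0, 210.0],
    [5.0, 170.0],
])

mu = X.mean(axis=0)        # mean of each column
sigma = X.std(axis=0)      # standard deviation of each column
X_std = (X - mu) / sigma   # broadcasting applies the formula to every value

print(X_std.mean(axis=0))  # close to 0 for every column
print(X_std.std(axis=0))   # close to 1 for every column
```

After this step both variables live on the same scale, so none of them dominates the variance just because of its unit.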

!(https://vimeo.com/485919459)

## Covariance Matrix 😍😍

### Reminders 🌺

Let's talk about the covariance matrix. PCA's assumption is that the information in a dataset is contained in the variables with high variance, which is why variances and covariances are important to calculate. Here are a few reminders before going forward.

* **Variance:** represents how spread out the observations of a variable are. To give you a better idea, here are two variables with high and low variance:

<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/High_variance_VS_low_variance.png" alt="High_vs_low_variance"/>

* **Covariance:** represents how two variables vary together. Here is an example of a covariance matrix:

<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/Covariance_matrix.png" alt="Covariance Matrix"/>

On the **diagonal** is the variance of each variable, whereas the **off-diagonal** entries hold the covariance of each pair of variables.
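To make this concrete, here is a small sketch that computes a covariance matrix with NumPy. The two variables `x` and `y` are randomly generated for the example and are not taken from the course data.

```python
import numpy as np

rng = np.random.default_rng(42)

# Two made-up variables: y is built from x, so they vary together.
x = rng.normal(size=500)
y = 0.8 * x + rng.normal(scale=0.3, size=500)
X = np.column_stack([x, y])            # shape (500, 2)

# rowvar=False tells NumPy that each column is a variable.
cov = np.cov(X, rowvar=False)          # shape (2, 2)

print(cov)
# Diagonal entries are the variances, off-diagonal entries the covariance.
print(np.allclose(np.diag(cov), X.var(axis=0, ddof=1)))  # True
```

The matrix is symmetric, because the covariance of `x` with `y` is the same as the covariance of `y` with `x`.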


**How to interpret covariance?**🤔 

* High covariance (in absolute value) --> Variables are **statistically dependent**, more precisely linearly related (meaning there is redundancy between the observations of variables $X$ and $Y$). The sign can be positive or negative, it doesn't matter.

* Covariance close to zero --> Variables are **uncorrelated** (meaning there is no linear redundancy between $X$ and $Y$). Note that this is weaker than full statistical independence.

<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/covariance_comparisons.png" alt="covariance_comparison"/>


!(https://vimeo.com/485919778)

Now the whole goal of PCA is to summarize information. Therefore, in an ideal world, we would like to have a covariance matrix that **has non-zero values only on the diagonal**, meaning that our variables carry no redundant information.

<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/ideal_covariance_matrix.png" alt="ideal_covariance_matrix"/>


!(https://vimeo.com/485920648)

## SVD 🏗️🏗️

### What is that?

SVD stands for *Singular Value Decomposition*. It is the method that PCA is based upon. It simply states that:

$$A = U \Sigma V^\intercal$$

Where 

*   $A$ is any real matrix
*   $U$ is a matrix whose columns are the $Eigen-vectors$ of $AA^\intercal$
*   $\Sigma$ is a diagonal matrix composed of the singular values of $A$, i.e. the square roots of the $Eigen-values$ of $A^\intercal A$
*   $V^\intercal$ is the transpose of $V$, a matrix whose columns are the $Eigen-vectors$ of $A^\intercal A$


**NB: $\intercal$ means transpose. Check this link if you need to learn more 👉👉 <a href="https://www.mathsisfun.com/definitions/transpose-matrix-.html#:~:text=%22Flipping%22%20a%20matrix%20over%20its,rows%20and%20columns%20get%20swapped.&text=Example%3A%20the%20value%20in%20the,3rd%20row%20and%201st%20column." target="_blank">Math is fun</a>**


<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/SVD.png" alt="SVD"/>
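The decomposition can be checked numerically. The sketch below uses `np.linalg.svd` on a small random matrix (the matrix itself is an arbitrary example) and verifies that the three factors rebuild $A$.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 3))                 # any real matrix works

# full_matrices=False returns the compact form: U is (6, 3), Vt is (3, 3).
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rebuild A from its three factors.
A_rebuilt = U @ np.diag(s) @ Vt
print(np.allclose(A, A_rebuilt))            # True

# The squared singular values are the eigenvalues of A^T A.
eigvals = np.linalg.eigvalsh(A.T @ A)[::-1] # eigvalsh sorts ascending
print(np.allclose(s**2, eigvals))           # True
```

Note that NumPy returns the diagonal of $\Sigma$ as a 1-D array `s`, sorted from the largest to the smallest value.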


### Eigen Vectors & Eigen Values 

Let's now tackle the concept of $Eigen-vectors$ and $Eigen-values$. Up until this point, we didn't need to know what they were, but now we do! 🤓

Indeed **if we can calculate $Eigen-vectors$ and $Eigen-values$, we can calculate the matrix decomposition that SVD gives us**

The whole idea is rather simple: 

$$AX = \lambda X$$

Where: 

* $A$ is a square matrix
* $X$ is an $Eigen-vector$ of $A$ (a non-zero vector)
* $\lambda$ is the $Eigen-value$ associated with $X$

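In other words, multiplying $X$ by $A$ only rescales $X$ by a factor $\lambda$. With this, you can apply simple linear algebra to $AA^\intercal$ and $A^\intercal A$ to determine each matrix $U, \Sigma$ and $V^\intercal$.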
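Here is a small numerical check of the relation $AX = \lambda X$. The 2x2 symmetric matrix is a made-up example, and `np.linalg.eigh` is used because it is designed for symmetric matrices such as covariance matrices.

```python
import numpy as np

# A small symmetric matrix, like a covariance matrix.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh is meant for symmetric matrices; eigenvalues come back in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(A)
print(eigenvalues)   # approximately [1. 3.]

# Each column of `eigenvectors` is one eigenvector X, paired with one lambda.
for lam, X in zip(eigenvalues, eigenvectors.T):
    print(np.allclose(A @ X, lam * X))   # True
```

Both checks print `True`: multiplying each eigenvector by `A` gives the same vector scaled by its eigenvalue.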

!(https://vimeo.com/486594815)

## Deduce PCA 🤠🤠

Now that we know how to find $U$, $\Sigma$ and $V^\intercal$ using $Eigen-vectors$ and $Eigen-values$, we can simply multiply $A$ (our normalized initial matrix, with one observation per row) by $V$ (our matrix of $Eigen-vectors$ of $A^\intercal A$, which is the covariance matrix of $A$ up to a constant factor) to get our matrix of principal components!

By the way, if you now compute the covariance matrix of $AV$, this is what you get:

<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/New_covariance_matrix.png" alt="new-covariance-matrix"/>
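The whole chain can be sketched with NumPy. This example assumes observations are stored in rows, so the projection uses the eigenvectors of $A^\intercal A$ (the columns of $V$ in the SVD notation). The data is randomly generated for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two correlated made-up variables, observations in rows.
x = rng.normal(size=300)
A = np.column_stack([x, 0.9 * x + rng.normal(scale=0.2, size=300)])
A = (A - A.mean(axis=0)) / A.std(axis=0)      # normalization step

# Rows of Vt are the eigenvectors of A^T A, i.e. the principal directions.
_, _, Vt = np.linalg.svd(A, full_matrices=False)
PCs = A @ Vt.T                                 # principal components

cov_pcs = np.cov(PCs, rowvar=False)
print(np.round(cov_pcs, 3))
# Off-diagonal entries vanish: the components are uncorrelated.
print(np.isclose(cov_pcs[0, 1], 0.0))         # True
```

The new covariance matrix is diagonal, and the first diagonal entry is the largest because the SVD sorts components by decreasing variance.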

Now you might be wondering how this reduces the number of dimensions. You simply need to look at the variance explained by each $PC$.

Here $PC1$ has a variance of $2.47$, which makes $\frac{2.47}{2.47+0.03} \approx 99$% of the total variance. $PC1$ contains most of the information. Therefore, we can keep it and drop $PC2$!

In practice, you can keep as many $PCs$ as you like depending on how much of the variance you want to keep. 
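The selection rule can be sketched in a few lines, reusing the two variances from the example above. The 95% threshold is a hypothetical choice for this illustration, not a value prescribed by the course.

```python
import numpy as np

# Variances of PC1 and PC2, taken from the example above.
variances = np.array([2.47, 0.03])

ratios = variances / variances.sum()   # share of variance explained by each PC
cumulative = np.cumsum(ratios)         # share explained by the first k PCs
print(ratios)       # [0.988 0.012]

# Hypothetical rule: keep the fewest PCs reaching 95% of the variance.
threshold = 0.95
n_keep = int(np.searchsorted(cumulative, threshold) + 1)
print(n_keep)       # 1
```

With scikit-learn, the same ratios are exposed by the `explained_variance_ratio_` attribute of a fitted `PCA` object.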

!(https://vimeo.com/486599543)

### PCA Summary ✅

<table>
<tr>
    <th>
        Advantages
    </th>
    <th>
        Flaws
    </th>
</tr>

<tr>
<td>
    Reduce Redundancy
</td>

<td>
    Loss of information if you remove too many $PCs$ 
</td>
</tr>


<tr>
<td>
    Reduce overfitting
</td>

<td>
    $PCs$ are harder to interpret
</td>
</tr>


<tr>
<td>
    Easier to visualize
</td>

<td>
    PCA works **only on quantitative data** 
</td>
</tr>
</table>

## Resources 📚📚

* <a href="https://www.youtube.com/watch?v=OELTJdaU8aA" target="_blank">Eigen values & Eigen Vectors</a>

* <a href="https://www.youtube.com/watch?v=a9jdQGybYmE" target="_blank">Lecture: Principal Component Analysis</a>

* <a href="https://www.youtube.com/watch?v=g-Hb26agBFg" target="_blank">Principal Component Analysis (PCA)</a>

* <a href="https://www.youtube.com/watch?v=PFDu9oVAE-g" target="_blank">Eigenvectors and eigenvalues | Essence of linear algebra, chapter 14</a>

* <a href="https://www.youtube.com/watch?v=FgakZw6K1QQ" target="_blank">StatQuest: Principal Component Analysis (PCA), Step-by-Step</a>

* <a href="https://medium.com/fintechexplained/what-are-eigenvalues-and-eigenvectors-a-must-know-concept-for-machine-learning-80d0fd330e47" target="_blank">What are Eigen Vectors and Eigen Values</a>

* <a href="https://jonathan-hui.medium.com/machine-learning-singular-value-decomposition-svd-principal-component-analysis-pca-1d45e885e491" target="_blank">Machine Learning — Singular Value Decomposition (SVD) & Principal Component Analysis (PCA)</a>

* <a href="https://dzone.com/articles/understanding-what-is-principal-component-analysis" target="_blank">Understanding Principal Component Analysis (PCA)</a>

* <a href="https://www.dezyre.com/data-science-in-python-tutorial/principal-component-analysis-tutorial#:~:text=PCA%20is%20predominantly%20used%20as,%2C%20bioinformatics%2C%20psychology%2C%20etc." target="_blank">Principal Component Analysis Tutorial</a>

* <a href="https://www.mygreatlearning.com/blog/covariance-vs-correlation/#:~:text=Covariance%20is%20when%20two%20variables,the%20change%20in%20another%20variable." target="_blank">Covariance vs Correlation | Difference between correlation and covariance</a>

* <a href="https://www.quora.com/When-and-where-do-we-use-PCA" target="_blank">When and Where do we use PCA</a>

* <a href="https://blog.umetrics.com/what-is-principal-component-analysis-pca-and-how-it-is-used" target="_blank">What Is Principal Component Analysis (PCA) and How Is It Used?</a>

* <a href="https://blog.exploratory.io/an-introduction-to-principal-component-analysis-pca-with-2018-world-soccer-players-data-810d84a14eab" target="_blank">An Introduction to Principal Component Analysis (PCA) with 2018 World Soccer Players Data</a>

* <a href="https://towardsdatascience.com/linear-algebra-basics-dot-product-and-matrix-multiplication-2a7624942810#:~:text=Multiplication%20of%20two%20matrices%20involves%20dot%20products%20between%20rows%20of,first%20row%2C%20first%20column)." target="_blank">Linear Algebra Basics: Dot Product and Matrix Multiplication</a>