**PCA** and **TSNE** are both dimensionality reduction techniques.

But the question is: why reduce the dimensionality (number of features) of our data at all, when every feature carries some precious information in one way or another?

Well, there are two main reasons for doing this. To understand them, suppose our dataset is an (n×m) dataframe, where n is the number of datapoints and m is the number of features. Then:

**1.** When the number of datapoints (n) is smaller than the number of features (m), the model tends to overfit the training data. (OVERFIT MODEL: it gives almost 100% accurate results on the train data but very poor results on the test data.)

**2.** For visualization: when we have to convert our m-dimensional dataset to 2 or 3 dimensions in order to plot it.
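
To make the first reason concrete, here is a minimal sketch on made-up random data (not MNIST). The sizes, the noise level and the "only the first feature matters" rule are all assumptions chosen for illustration. With fewer training points than features, a plain linear regression can fit the train data almost perfectly and still do worse on unseen data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Assumed toy sizes: fewer training points (20) than features (50).
n_train, n_test, m = 20, 200, 50
X_train = rng.normal(size=(n_train, m))
X_test = rng.normal(size=(n_test, m))

# Assumed ground truth: only the first feature matters, plus some noise.
y_train = X_train[:, 0] + rng.normal(scale=0.5, size=n_train)
y_test = X_test[:, 0] + rng.normal(scale=0.5, size=n_test)

model = LinearRegression().fit(X_train, y_train)

# R^2 on the data the model has seen vs. data it has not seen.
train_r2 = model.score(X_train, y_train)
test_r2 = model.score(X_test, y_test)
print(f"train R^2: {train_r2:.3f}")
print(f"test  R^2: {test_r2:.3f}")
```

The gap between the two scores is the overfitting described in point 1.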

**How do PCA and TSNE work?**

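In simple terms, PCA projects our data onto new axes chosen so that the projected data has maximum variance, which is the same as minimizing the distance of the data points from that axis. TSNE is not a projection of this kind: it places the points in the low-dimensional space so that points which were close neighbors in the original space stay close. For PCA, it looks something like this: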
![](http://alexhwilliams.info/itsneuronalblog/img/pca/pca_two_views.png)
image source: http://alexhwilliams.info/itsneuronalblog/2016/03/27/pca/
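
The picture can be reproduced in a few lines of numpy. This is only a sketch on toy 2-D data that I made up (points stretched along a diagonal), not the scikit-learn implementation. It finds the first principal axis with an SVD and checks the two views from the image: maximum variance along the axis and minimum leftover distance from it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-D data stretched along the diagonal (an assumption for illustration).
t = rng.normal(size=500)
points = np.column_stack([t, t + 0.3 * rng.normal(size=500)])

# Step 1: center the data so every feature has zero mean.
centered = points - points.mean(axis=0)

# Step 2: SVD of the centered data; rows of vt are unit-length principal directions.
_, _, vt = np.linalg.svd(centered, full_matrices=False)
pc1 = vt[0]

# Step 3: project every point onto the first direction (2-D -> 1-D).
projected = centered @ pc1

# View 1: variance captured along the new axis vs. along the original axes.
var_pc1 = projected.var()
var_x = centered[:, 0].var()
var_y = centered[:, 1].var()

# View 2: mean squared distance of the points from the new axis.
residual = centered - np.outer(projected, pc1)
mean_sq_dist = np.mean(np.sum(residual ** 2, axis=1))

print("variance along PC1:", var_pc1)
print("variance along x, y:", var_x, var_y)
print("mean squared distance from PC1 axis:", mean_sq_dist)
```

The captured variance and the leftover squared distance always add up to the total variance, so maximizing one is the same as minimizing the other.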

**But how is PCA different from TSNE?**

To answer that, I would like to take you through a simple dataset, visualize it after reducing its dimensions, and then show you (literally) how TSNE differs from PCA.

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

**IMPORTING DATASET:**

In [None]:
data=pd.read_csv('../input/mnist-in-csv/mnist_train.csv')

In [None]:
data.head()

In [None]:
data.shape

**ABOUT THE DATASET**

This dataset is widely known as MNIST (Modified National Institute of Standards and Technology).

It has 60000 grayscale images of handwritten digits from 0-9. Each image is (28×28) pixels.

We use each image as one datapoint, so it is flattened to get 784 features (each pixel is a feature).
Hence our dataset contains 60000 datapoints with 784 features, plus one extra column ('label') holding the actual digit in the image.

Hence the combined dataset has shape (60000×785).
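
As a small sketch of what "flattening" means, here is a made-up 28×28 image (random values, not a real MNIST digit) turned into a 784-long row and back. I am assuming the pixels are stored row by row, which is what `reshape` does by default.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 28x28 grayscale image with pixel values 0-255 (not real MNIST data).
image = rng.integers(0, 256, size=(28, 28))

# Flatten: one image becomes one datapoint with 784 features.
row = image.reshape(-1)

# Going back: a 784-long row can be reshaped into the image again.
restored = row.reshape(28, 28)

print(row.shape)        # (784,)
print(restored.shape)   # (28, 28)
```

Under that assumption, the same `reshape(28, 28)` on one row of our dataframe (without the label) gives back an image that can be plotted.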


In [None]:
label=data['label']

Removing the 'label' column from our dataset, because it is the output (target) and not an input feature.

In [None]:
data=data.drop(['label'], axis=1) # axis=1 means the operation is applied to columns

Taking only the first 15000 points, so that applying PCA/TSNE takes less time.

**Note:** the number of features is unchanged. Only the datapoints (rows) are reduced.

In [None]:
x=data.head(15000)
y=label.head(15000)

Now, standardize our dataset (zero mean and unit variance for each feature), so that no feature dominates just because of its scale.

In [None]:
from sklearn.preprocessing import StandardScaler
std_x=StandardScaler().fit_transform(x)
std_x.shape

In [None]:
type(std_x)

Here, we see that our dataframe has been converted to a numpy array.
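
To see what the scaler is doing, here is a sketch that repeats the standardization by hand on a tiny made-up array and compares it with `StandardScaler`. The first column is constant on purpose, because some pixels in digit images (for example near the border) are likely the same in every image. The `safe_std` trick is my own way of mimicking how the scaler avoids dividing by zero for such columns.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Tiny made-up dataset: 3 datapoints, 3 features, first feature is constant.
x_small = np.array([[0.0, 10.0, 5.0],
                    [0.0, 20.0, 5.0],
                    [0.0, 60.0, 8.0]])

mean = x_small.mean(axis=0)
std = x_small.std(axis=0)                  # population std (ddof=0)
safe_std = np.where(std == 0, 1.0, std)    # constant columns: divide by 1 instead of 0
manual = (x_small - mean) / safe_std

from_sklearn = StandardScaler().fit_transform(x_small)

print(manual)
print(np.allclose(manual, from_sklearn))
```

Each non-constant column now has mean 0 and standard deviation 1, and the constant column simply becomes all zeros.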

Now applying PCA to our standardized dataset with n_components=2, which will reduce our dataset to 2 dimensions.

In [None]:
from sklearn import decomposition
pca=decomposition.PCA(n_components=2) # keep only the first 2 principal components
pca_x=pca.fit_transform(std_x)

Now, joining the 'label' data with our reduced dataset (each label is the actual digit for the corresponding datapoint) and then converting the result back to a dataframe.

In [None]:
pca_data=np.vstack((pca_x.T, y))
pca_df=pd.DataFrame(pca_data.T, columns=['first', 'second', 'label'])

In [None]:
pca_df.head()
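
One detail worth knowing: stacking the labels together with the float coordinates turns the labels into floats as well. The sketch below uses small made-up stand-ins for `pca_x` and `y` (the names `coords` and `labels` are hypothetical) and shows an alternative way to build the dataframe column by column, which keeps the labels as integers. Both versions work for plotting; this is just an option.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for pca_x (n x 2 floats) and y (n integer labels).
coords = np.array([[0.5, -1.2], [2.0, 0.3], [-0.7, 1.1]])
labels = pd.Series([5, 0, 4])

# Same approach as in the notebook: one float array holds everything.
stacked = pd.DataFrame(np.vstack((coords.T, labels)).T,
                       columns=['first', 'second', 'label'])

# Alternative: build the dataframe column by column, so each keeps its own dtype.
by_column = pd.DataFrame({'first': coords[:, 0],
                          'second': coords[:, 1],
                          'label': labels.to_numpy()})

print(stacked['label'].dtype)     # float64
print(by_column['label'].dtype)   # int64
```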

Finally, plotting our reduced dataset to visualize:

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
ax=sns.FacetGrid(pca_df, hue='label', height=6).map(plt.scatter, 'first', 'second').add_legend()
plt.show()

**Looks like a beautiful mess, doesn't it?**

Intuitively, datapoints with the same label value (actual digit value) should have something in common.

For example, datapoints obtained by flattening images of the digit 0 will be similar in one way, and datapoints obtained by flattening images of the digit 1 will be similar in another way.

Hence, within a set of datapoints with the same label value, there will be some similarity between those datapoints.

So we can guess that, in the original high-dimensional space, the datapoints form clusters by label (0-9) because of this similarity.

**But PCA ruined that for us. It simply projected all the datapoints onto the 2 directions of highest variance, without considering local similarity between points.**
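
One way to get a feeling for how much a 2-component PCA throws away is the `explained_variance_ratio_` attribute of a fitted PCA. The sketch below is an assumption-laden stand-in: it uses the small 8×8 digits dataset bundled with scikit-learn (`load_digits`) instead of our MNIST csv, so that it runs on its own. The same attribute could be printed for the `pca` object fitted above.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Stand-in dataset: 8x8 digit images bundled with scikit-learn (not our MNIST csv).
digits = load_digits()
std_digits = StandardScaler().fit_transform(digits.data)

pca_small = PCA(n_components=2).fit(std_digits)

# Fraction of the total variance that each of the 2 components keeps.
ratio = pca_small.explained_variance_ratio_
print("variance kept per component:", ratio)
print("total variance kept in 2-D :", ratio.sum())
```

Whatever is missing from a total of 1.0 is variance that the 2-D picture cannot show.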

Let's see what TSNE does now.

Again, the same procedure: fitting TSNE on our dataset and transforming it.

In [None]:
from sklearn.manifold import TSNE
tsne=TSNE(n_components=2, random_state=0)
tsne_x=tsne.fit_transform(std_x)
tsne_data=np.vstack((tsne_x.T, y))
tsne_df=pd.DataFrame(tsne_data.T, columns=['t_first', 't_second', 't_label'])

In [None]:
tsne_df.head()

In [None]:
tsne_df.shape

In [None]:
ax=sns.FacetGrid(tsne_df, hue='t_label', height=6).map(plt.scatter, 't_first', 't_second').add_legend()
plt.show()

Well, compare it with the one we got using PCA.

Notice anything?

OK, I will try to explain. We now get separate clusters of different colors (except for some noisy datapoints). This shows the local relationship between datapoints with the same label value, which I was talking about right after I plotted the PCA visualization.
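
The difference can also be put into a number instead of only a plot. The sketch below is my own rough check, not a standard benchmark: for every point in the 2-D embedding it looks up the nearest other point and counts how often both have the same label. The helper `neighbor_label_agreement` is a name I made up, and the data is again the small `load_digits` set (first 500 images) so that the example runs on its own and finishes quickly.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler


def neighbor_label_agreement(embedding, labels):
    """Fraction of points whose nearest other point has the same label."""
    nn = NearestNeighbors(n_neighbors=2).fit(embedding)
    _, idx = nn.kneighbors(embedding)
    nearest_other = idx[:, 1]  # column 0 is normally the point itself
    return float(np.mean(labels[nearest_other] == labels))


# Stand-in dataset: first 500 of the 8x8 digit images bundled with scikit-learn.
digits = load_digits()
X = StandardScaler().fit_transform(digits.data[:500])
labels = digits.target[:500]

pca_2d = PCA(n_components=2).fit_transform(X)
tsne_2d = TSNE(n_components=2, random_state=0).fit_transform(X)

pca_score = neighbor_label_agreement(pca_2d, labels)
tsne_score = neighbor_label_agreement(tsne_2d, labels)
print(f"PCA  neighbor agreement: {pca_score:.2f}")
print(f"TSNE neighbor agreement: {tsne_score:.2f}")
```

A higher score means that neighbors in the plot more often show the same digit, which is what the separate colored clusters suggest visually.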


**CONCLUSION:**

TSNE gives us insight into our dataset by preserving the local structure/relations of our datapoints, which PCA is not able to do.

**So why do people use PCA?**

Because TSNE only tries to place neighbors close to each other, (almost) completely ignoring the global structure.
Hence TSNE is excellent for visualization, because similar items can be plotted next to each other (and not on top of each other).

PCA is quite the opposite. It tries to preserve the global properties (directions with high variance), while it may lose low-variance deviations between neighbors.

And while working on this notebook, I personally experienced that TSNE is slower than PCA.
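
If you want to check the speed difference yourself, a simple timing like the one below is enough. It is only a sketch: it uses the first 500 images of the bundled `load_digits` dataset as a stand-in for our data, and the exact seconds will depend on your machine and on the dataset size.

```python
import time

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Stand-in dataset: first 500 of the 8x8 digit images bundled with scikit-learn.
X = StandardScaler().fit_transform(load_digits().data[:500])

start = time.perf_counter()
PCA(n_components=2).fit_transform(X)
pca_seconds = time.perf_counter() - start

start = time.perf_counter()
TSNE(n_components=2, random_state=0).fit_transform(X)
tsne_seconds = time.perf_counter() - start

print(f"PCA : {pca_seconds:.3f} s")
print(f"TSNE: {tsne_seconds:.3f} s")
```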

**Disclaimer:** I started learning ML recently, and it was hard to find notebooks on the basic problems that beginners face. Hence I decided to share some of my work, which might help beginners by providing detailed notebooks based on easy examples.

Suggestions are most welcome as it's my first notebook.

**Thank you.**
