# Introduction to t-SNE


<img src="https://hypercompetent.github.io/post/2017-12-31-gganimate-tweenr-tsne-plot_files/figure-html/get_tsne-1.png">

### t-Distributed Stochastic Neighbor Embedding (t-SNE)

# Dimensionality Reduction

### If you have worked with a dataset that has a lot of features, you know how difficult it is to understand or explore the relationships between them. Not only does this make the EDA process difficult, it also affects the machine learning model’s performance: you are more likely to overfit your model or violate some of the assumptions of the algorithm, like the independence of features in linear regression. This is where dimensionality reduction comes in. In machine learning, dimensionality reduction is the process of reducing the number of random variables under consideration by obtaining a set of principal variables. With a smaller feature space, there are fewer relationships between features to consider, they can be explored and visualized more easily, and you are less likely to overfit your model.

<img src = "https://www.visiondummy.com/wp-content/uploads/2014/04/dimensionality_vs_performance.png">

### Dimensionality reduction can be achieved in the following ways:



* Feature Elimination
* Feature Selection
* Feature Extraction

### Feature Extraction: You create new independent features, where each new independent feature is a combination of each of the old independent features. These techniques can further be divided into linear and non-linear dimensionality reduction techniques.

Linear dimensionality reduction using PCA


Non-linear dimensionality reduction using t-SNE

<img src="https://lh3.googleusercontent.com/proxy/49afPkB9IYGkacOkUhgAQYrO3OWPHvTgjzgtaFhnm9Rw83jBS--jrGw1r8TXOvcFBdxUMLlrCXj7fQKR1zywtzAyVIAgGMEME9haT1tbGgt-Kw">
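To make the idea of feature extraction concrete, here is a minimal NumPy sketch of linear dimensionality reduction with PCA, computed through a singular value decomposition. The helper name `pca_project` and the synthetic data are assumptions made for illustration only; they are not part of the notebook's MNIST pipeline.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X onto the top principal components (illustrative helper)."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained_variance = singular_values**2 / (X.shape[0] - 1)
    return X_centered @ components.T, components, explained_variance[:n_components]

rng = np.random.default_rng(0)
# Synthetic data: 200 points in 5-D whose variance lies mostly along two latent directions
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 5))

X_2d, components, variance = pca_project(X, n_components=2)
print(X_2d.shape)        # (200, 2)
print(components.shape)  # (2, 5): each new feature is a combination of all 5 old ones
```

Each row of `components` holds the weights that combine the five original features into one new feature, which is the linear case of feature extraction. t-SNE does not produce such a weight matrix, because its mapping is non-linear.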

## t-Distributed Stochastic Neighbor Embedding (t-SNE)
### t-Distributed Stochastic Neighbor Embedding (t-SNE) is a non-linear technique for dimensionality reduction that is particularly well suited for the visualization of high-dimensional datasets. It is extensively applied in image processing, NLP, genomics and speech processing. To keep things simple, here’s a brief overview of how t-SNE works:

* #### The algorithm starts by calculating the similarity of points in the high-dimensional space and the similarity of points in the corresponding low-dimensional space. In the high-dimensional space, the similarity of point B to point A is the conditional probability that A would choose B as its neighbor if neighbors were picked in proportion to their probability density under a Gaussian (normal distribution) centered at A. In the low-dimensional space, t-SNE uses a heavy-tailed Student t-distribution in place of the Gaussian, which is where the "t" in the name comes from.

* #### It then tries to minimize the difference between these similarities in the higher-dimensional and lower-dimensional spaces, so that the low-dimensional points represent the original data as faithfully as possible.

* #### To quantify this difference, t-SNE uses the Kullback-Leibler divergence between the two similarity distributions, summed over all data points, and minimizes it using a gradient descent method.
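The steps above can be written compactly as follows. The notation is introduced here for illustration and does not appear elsewhere in this notebook: $x_i$ are the high-dimensional points, $y_i$ their low-dimensional counterparts, $\sigma_i$ the width of the Gaussian centered at $x_i$, and $N$ the number of points. The formulas follow the standard t-SNE formulation, which symmetrizes the conditional probabilities before comparing them.

$$
p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2N}
$$

$$
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}
$$

$$
C = KL(P \,\|\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
$$

Gradient descent then adjusts the positions $y_i$ to reduce $C$.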

#### Note: Kullback-Leibler divergence, or KL divergence, is a measure of how one probability distribution diverges from a second, expected probability distribution.
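As a rough illustration of the quantities described above, the NumPy sketch below builds the two similarity distributions for a tiny synthetic dataset and compares the KL divergence of two candidate embeddings. It is a simplification, not the scikit-learn implementation: it uses one fixed Gaussian width `sigma` for every point, whereas t-SNE tunes a width per point from the perplexity, and it performs no gradient descent. All function names and the data are assumptions made for this example.

```python
import numpy as np

def squared_distances(Z):
    """Pairwise squared Euclidean distances between the rows of Z."""
    sq_norms = np.sum(Z**2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * Z @ Z.T
    return np.maximum(d2, 0.0)  # clip tiny negatives caused by round-off

def high_dim_similarities(X, sigma=1.0):
    """Symmetrized Gaussian similarities (fixed sigma is a simplification)."""
    n = X.shape[0]
    affinities = np.exp(-squared_distances(X) / (2.0 * sigma**2))
    np.fill_diagonal(affinities, 0.0)  # a point is not its own neighbor
    conditional = affinities / affinities.sum(axis=1, keepdims=True)
    return (conditional + conditional.T) / (2.0 * n)

def low_dim_similarities(Y):
    """Student-t (one degree of freedom) similarities in the embedding."""
    kernel = 1.0 / (1.0 + squared_distances(Y))
    np.fill_diagonal(kernel, 0.0)
    return kernel / kernel.sum()

def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q), summed over all pairs with non-zero P."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))

rng = np.random.default_rng(0)
# Synthetic data: two well-separated groups of 10 points each in 5-D
X = np.vstack([rng.normal(0.0, 0.5, size=(10, 5)),
               rng.normal(4.0, 0.5, size=(10, 5))])
P = high_dim_similarities(X)

Y_grouped = X[:, :2]                 # candidate embedding that keeps the groups apart
Y_random = rng.normal(size=(20, 2))  # candidate embedding that ignores the structure

kl_grouped = kl_divergence(P, low_dim_similarities(Y_grouped))
kl_random = kl_divergence(P, low_dim_similarities(Y_random))
print("KL, grouped embedding:", kl_grouped)
print("KL, random embedding :", kl_random)
```

The embedding that preserves the two groups should score a lower KL divergence than the random one, which is the signal that gradient descent follows when it moves the low-dimensional points.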

### *In simpler terms, t-Distributed stochastic neighbor embedding (t-SNE) minimizes the divergence between two distributions: a distribution that measures pairwise similarities of the input objects and a distribution that measures pairwise similarities of the corresponding low-dimensional points in the embedding.*

### *In this way, t-SNE maps the multi-dimensional data to a lower dimensional space and attempts to find patterns in the data by identifying observed clusters based on similarity of data points with multiple features. However, after this process, the input features are no longer identifiable, and you cannot make any inference based only on the output of t-SNE. Hence it is mainly a data exploration and visualization technique.*

### t-SNE: Let's code

In [None]:
import os
print(os.listdir("../input"))


In [None]:
import pandas as pd
mnist_df = pd.read_csv('../input/digit-recognizer/train.csv')
mnist_df.head()

In [None]:
mnist_labels = mnist_df['label']
mnist_pixels = mnist_df.drop('label', axis = 1)

In [None]:
# standardize mnist dataset

from sklearn.preprocessing import StandardScaler
mnist_pixels_std_df = StandardScaler().fit_transform(mnist_pixels)
print(mnist_pixels_std_df.shape)
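For reference, the sketch below reproduces in plain NumPy what standardization does to each column: subtract the column mean and divide by the column standard deviation. The `standardize` helper and the toy pixel values are assumptions for illustration. The handling of constant columns mirrors the default behavior of `StandardScaler` as far as I am aware, and it matters for MNIST because many border pixels are zero in every image.

```python
import numpy as np

def standardize(X):
    """Column-wise (x - mean) / std, leaving constant columns at zero."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)    # population standard deviation (ddof=0)
    std[std == 0.0] = 1.0  # avoid dividing by zero for constant columns
    return (X - mean) / std

# Hypothetical toy input: 3 samples, 3 pixels; the middle pixel is always 0
toy_pixels = np.array([[0.0, 0.0, 255.0],
                       [128.0, 0.0, 255.0],
                       [255.0, 0.0, 0.0]])
toy_std = standardize(toy_pixels)
print(toy_std.mean(axis=0))  # approximately 0 for every column
print(toy_std.std(axis=0))   # 1 for the varying columns, 0 for the constant one
```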

In [None]:
# Keep the first 5000 data points to limit t-SNE's run time

mnist_pixels_tsne = mnist_pixels_std_df[0:5000,:]
mnist_labels_tsne = mnist_labels[:5000]
print("MNIST labels : ", mnist_labels_tsne.shape)
print("MNIST features : ", mnist_pixels_tsne.shape)

In [None]:
# t-SNE
from sklearn.manifold import TSNE
import timeit
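# Time the t-SNE fit
# output dimensionality (n_components) = 2
# default perplexity = 30
# default learning rate = 200 (scikit-learn >= 1.2 defaults to 'auto')
# default no. of iterations for optimization = 1000
# note: scikit-learn >= 1.5 renames n_iter to max_iter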

start_time = timeit.default_timer()

model = TSNE(n_components = 2, random_state = 0,verbose = 2,n_iter = 2000)
tsne_model = model.fit_transform(mnist_pixels_tsne).T

elapsed = timeit.default_timer() - start_time
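Perplexity, left at its default of 30 above, loosely controls how many neighbors each point pays attention to, and the resulting picture can change noticeably with it. The sketch below fits t-SNE with two perplexity values on a small synthetic dataset so that it runs in seconds. The toy data and variable names are assumptions for illustration and are not part of the MNIST run.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Hypothetical toy data: three Gaussian blobs of 30 points each in 10-D
centers = rng.normal(scale=10.0, size=(3, 10))
X_toy = np.vstack([c + rng.normal(size=(30, 10)) for c in centers])

embeddings = {}
for perplexity in (5, 30):
    # perplexity must be smaller than the number of samples (90 here)
    tsne = TSNE(n_components=2, perplexity=perplexity, random_state=0)
    embeddings[perplexity] = tsne.fit_transform(X_toy)
    print("perplexity", perplexity,
          "shape", embeddings[perplexity].shape,
          "final KL divergence", float(tsne.kl_divergence_))
```

Plotting each entry of `embeddings` side by side is a quick way to judge whether a structure seen in one plot is stable or an artifact of a particular perplexity.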

In [None]:
# visualizing t-SNE
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

# Stack the two embedding coordinates with the labels, one row per point
tsne_data = np.vstack((tsne_model, mnist_labels_tsne)).T
tsne_df = pd.DataFrame(data = tsne_data, columns=('Dimension 1', 'Dimension 2', 'Label'))
# seaborn >= 0.9 uses `height`; the older `size` argument was renamed
sns.FacetGrid(tsne_df, hue = 'Label', height = 8).map(plt.scatter, 'Dimension 1', 'Dimension 2').add_legend()

print('\n\nElapsed time for t-SNE fit :', elapsed, 'seconds')