## t-SNE (t-Distributed Stochastic Neighbor Embedding)

1. t-SNE, introduced by van der Maaten and Hinton in 2008, is a popular method for exploring high-dimensional data. Although extremely useful for visualization, t-SNE plots can sometimes be mysterious or misleading.
2. The technique has become widespread in machine learning because it can create compelling two-dimensional “maps” from data with hundreds or even thousands of dimensions. Impressive as they are, these images are easy to misread.
3. The algorithm is non-linear and adapts to the underlying data, performing different transformations on different regions.
4. A second feature of t-SNE is a tunable parameter, “perplexity,” which says (loosely) how to balance attention between local and global aspects of your data. The parameter is, in a sense, a guess about the number of close neighbors each point has. The perplexity value has a complex effect on the resulting pictures. Getting the most from t-SNE may mean analyzing multiple plots with different perplexities.
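
To make the "number of close neighbors" reading of perplexity concrete, here is a minimal pure-Python sketch. It assumes the standard t-SNE definition: each point gets Gaussian-kernel probabilities over the other points, and the perplexity of that distribution is 2 raised to its Shannon entropy in bits. The helper names, the toy distances, and the two bandwidths are illustrative assumptions, not part of scikit-learn's API.

```python
import math

def conditional_probabilities(sq_distances, sigma):
    """Gaussian-kernel probabilities for one point, given squared
    distances to every other point and a bandwidth sigma."""
    weights = [math.exp(-d / (2.0 * sigma ** 2)) for d in sq_distances]
    total = sum(weights)
    return [w / total for w in weights]

def perplexity(probs):
    """Perplexity = 2 ** (Shannon entropy measured in bits)."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2.0 ** entropy

# Toy (made-up) squared distances from one point to 8 others:
# 3 close neighbors and 5 distant ones.
sq_distances = [1.0, 1.0, 1.0, 100.0, 100.0, 100.0, 100.0, 100.0]

# A narrow kernel only "sees" the 3 close points;
# a wide kernel spreads its weight over all 8.
narrow = perplexity(conditional_probabilities(sq_distances, sigma=1.0))
wide = perplexity(conditional_probabilities(sq_distances, sigma=100.0))
print(f"narrow kernel: {narrow:.2f}, wide kernel: {wide:.2f}")
```

t-SNE works in the opposite direction: you supply the perplexity, and the algorithm searches for a bandwidth per point that produces it. A low perplexity therefore focuses each point on a few nearest neighbors, while a high one pulls in more of the global structure.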

In [1]:
import plotly.express as px
from sklearn.datasets import make_classification
import pandas as pd
import numpy as np

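# Synthetic 3-class dataset: 6 features, of which 2 are informative
X, y = make_classification(
    n_samples=1500, n_features=6, n_classes=3,
    n_informative=2, n_clusters_per_class=1, random_state=5,
)

# Plot the first three features, colored by class label
fig = px.scatter_3d(x=X[:, 0], y=X[:, 1], z=X[:, 2], color=y, opacity=0.8,
                    title="3D Scatter Plot with Plotly Express")
fig.show()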

In [3]:
from sklearn.decomposition import PCA

# Project the 6-dimensional data onto its first two principal components
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

### PCA visualization of a custom classification dataset

In [4]:
fig = px.scatter(x=X_pca[:, 0], y=X_pca[:, 1], color=y, opacity=0.8)
fig.update_layout(
    title="PCA visualization of custom classification dataset",
    xaxis_title="PCA 1",
    yaxis_title="PCA 2",
)
fig.show()


### Fitting and Transforming t-SNE

In [5]:
from sklearn.manifold import TSNE
tsne = TSNE(n_components=2, random_state=42)
X_tsne = tsne.fit_transform(X)
tsne.kl_divergence_

1.12730073928833
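
The value printed above is the Kullback-Leibler (KL) divergence between the pairwise similarity distribution in the original space (P) and the one in the embedding (Q), which is the quantity t-SNE minimizes. As a rough illustration of how the measure behaves, here is a small pure-Python sketch on made-up three-element distributions; the function and the numbers are assumptions for illustration and are not how scikit-learn computes `kl_divergence_` internally.

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) = sum_i p_i * log(p_i / q_i), using natural logs."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical "true" similarities and two candidate approximations
p = [0.5, 0.3, 0.2]
q_close = [0.45, 0.35, 0.2]   # roughly preserves the structure of p
q_far = [0.1, 0.1, 0.8]       # distorts it badly

kl_close = kl_divergence(p, q_close)
kl_far = kl_divergence(p, q_far)
print(f"close: {kl_close:.4f}, far: {kl_far:.4f}")
```

A lower value means the embedding's similarities track the original ones more closely, and a value of zero would mean the two distributions are identical.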

### t-SNE Visualization in Python

In [6]:
fig = px.scatter(x=X_tsne[:, 0], y=X_tsne[:, 1], color=y)
fig.update_layout(
    title="t-SNE visualization of custom classification dataset",
    xaxis_title="t-SNE 1",
    yaxis_title="t-SNE 2",
)
fig.show()

In [7]:
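import numpy as np

# Fit t-SNE on the original data for a range of perplexities and record
# the final KL divergence of each fit. Perplexity must be smaller than
# the number of samples (1500 here).
perplexity = np.arange(50, 1000, 50)
divergence = []
for p in perplexity:
    model = TSNE(n_components=2, init="pca", perplexity=p, random_state=42)
    model.fit_transform(X)  # fit on the original 6-dimensional features
    divergence.append(model.kl_divergence_)

fig = px.line(x=perplexity, y=divergence, markers=True)
fig.update_layout(
    title="t-SNE KL Divergence vs Perplexity",
    xaxis_title="Perplexity Value",
    yaxis_title="KL Divergence",
)
fig.update_traces(line_color="red", line_width=1)
fig.show()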
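
The KL curve summarizes each fit in a single number, but the embeddings themselves are usually what you want to compare across perplexities. The sketch below keeps one embedding per perplexity so each can be plotted separately. It uses a smaller stand-in dataset (`X_small`, a hypothetical name) and three arbitrarily chosen perplexity values so that it runs quickly; neither is a recommendation.

```python
from sklearn.datasets import make_classification
from sklearn.manifold import TSNE

# Smaller stand-in dataset (assumption: 300 samples) so the sketch runs fast
X_small, y_small = make_classification(
    n_samples=300, n_features=6, n_classes=3,
    n_informative=2, n_clusters_per_class=1, random_state=5,
)

# Keep one 2-D embedding per perplexity value
embeddings = {}
for p in (5, 30, 100):
    model = TSNE(n_components=2, init="pca", perplexity=p, random_state=42)
    embeddings[p] = model.fit_transform(X_small)
    print(f"perplexity={p}: KL divergence={model.kl_divergence_:.3f}")
```

Each entry of `embeddings` can then be passed to `px.scatter` in the same way as `X_tsne`. Note that the target distribution P itself depends on the perplexity, so KL values from different perplexities are not measured against the same reference and are best read alongside the plots rather than on their own.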
