# **Machine Learning** _Day 102_
##### Name: Muhammad Hassaan
##### Date: August 15, 2024
##### Email: muhammadhassaan7896@gmail.com

---

# **t-SNE: t-Distributed Stochastic Neighbor Embedding**

**1. Introduction**
* t-SNE is a powerful tool for visualizing high-dimensional data by reducing its dimensions to 2D or 3D. It helps in exploring and understanding the structure of the data.

**2. Why Use t-SNE?**
* Visualization: It helps visualize data in 2D or 3D.
* Clustering: It can reveal clusters in the data.
* Dimensionality Reduction: It reduces the number of dimensions while preserving local neighborhood relationships between points.

**3. Key Concepts**
* High-Dimensional Data: Data with many features.
* Low-Dimensional Space: The 2D or 3D space where data is visualized.
* Similarity: How close data points are in high-dimensional space.

**4. How t-SNE Works**
* Pairwise Similarities: Calculate similarities between all pairs of data points.
* High-Dimensional Probabilities: Compute probabilities that data points are neighbors.
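* Low-Dimensional Mapping: Optimize the placement of points in a lower-dimensional space so that their similarities there (modeled with a Student t-distribution) match the high-dimensional ones as closely as possible. The mismatch is measured with the KL divergence.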
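
The three steps above can be sketched in plain NumPy. This is only an illustration under simplifying assumptions, not what Scikit-learn does internally: it uses one fixed Gaussian bandwidth `sigma` for every point (real t-SNE tunes a bandwidth per point from the perplexity), and the low-dimensional points `Y` are random rather than optimized. The function names are hypothetical.

```python
import numpy as np

def squared_distances(A):
    """Pairwise squared Euclidean distances between the rows of A."""
    sq = np.sum(A ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * A @ A.T
    return np.maximum(d2, 0.0)  # clip tiny negatives from rounding

def high_dim_affinities(X, sigma=1.0):
    """Gaussian neighbor probabilities (assumption: one fixed sigma for all points)."""
    logits = -squared_distances(X) / (2.0 * sigma ** 2)
    np.fill_diagonal(logits, -np.inf)            # a point is not its own neighbor
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p_cond = np.exp(logits)
    p_cond /= p_cond.sum(axis=1, keepdims=True)  # conditional p(j | i), rows sum to 1
    n = X.shape[0]
    return (p_cond + p_cond.T) / (2.0 * n)       # symmetric joint probabilities

def low_dim_affinities(Y):
    """Student-t (one degree of freedom) similarities in the embedding."""
    inv = 1.0 / (1.0 + squared_distances(Y))
    np.fill_diagonal(inv, 0.0)
    return inv / inv.sum()

def kl_cost(P, Q, eps=1e-12):
    """KL divergence between the two similarity matrices (the t-SNE objective)."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 6))   # 50 points with 6 features
Y_demo = rng.normal(size=(50, 2))   # an arbitrary, unoptimized 2D layout

P_demo = high_dim_affinities(X_demo)
Q_demo = low_dim_affinities(Y_demo)
cost = kl_cost(P_demo, Q_demo)
print(f"KL cost of the random layout: {cost:.4f}")
```

t-SNE then moves the points in `Y` by gradient descent to push this cost down; that optimization loop is left out here.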

For more in-depth information on t-SNE and its common misconceptions, check out this article: [How to Use t-SNE Effectively](https://distill.pub/2016/misread-tsne/?hl=cs).


---

## **Implementing t-SNE in Python**

We will use Scikit-learn's make_classification to generate synthetic data with 6 features, 1500 samples, and 3 classes.
After that, we will plot the first 3 features of the data in 3D using the Plotly Express scatter_3d function.

---

In [9]:
# import libraries 
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

In [3]:
X, y = make_classification(
    n_samples=1500,
    n_features=6,
    n_informative=2,
    n_classes=3,
    random_state=5,
    n_clusters_per_class=1,
)

# plotting
fig = px.scatter_3d(x=X[:,0], y=X[:,1], z=X[:,2],color=y, opacity=0.8)
fig.show()

We now have a 3D plot of the data; you can also visualize it in a 2D chart using the Plotly Express scatter function.

We will now apply the **PCA Algorithm** to the dataset and keep two principal components. The fit_transform method fits the model and transforms the dataset in a single call.

In [5]:
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
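
As a quick sanity check, the sketch below rebuilds the same dataset and confirms that fit_transform gives the same result as transforming with the already fitted model. It also reads `explained_variance_ratio_`, which reports how much of the total variance each component keeps. The variable names `X_check`, `pca_check`, and `explained` are our own and not part of the notebook above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA

# Same synthetic dataset as in the notebook
X_check, y_check = make_classification(
    n_samples=1500,
    n_features=6,
    n_informative=2,
    n_classes=3,
    random_state=5,
    n_clusters_per_class=1,
)

pca_check = PCA(n_components=2)
X_one_call = pca_check.fit_transform(X_check)   # fit and transform together
X_two_calls = pca_check.transform(X_check)      # transform with the fitted model

same = np.allclose(X_one_call, X_two_calls)
explained = pca_check.explained_variance_ratio_

print("Shape after PCA:", X_one_call.shape)
print("Both routes agree:", same)
print("Variance kept per component:", explained)
```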

## **PCA Visualization in Python**
We can now visualize the results by displaying the two PCA components on a scatter plot:

x: First Component

y: Second Component

color: target variable

We have also used the update_layout function to add a title and rename the x-axis and y-axis.

In [16]:
fig = px.scatter(x=X_pca[:,0], y=X_pca[:,1], color=y)
fig.update_layout(
    title="PCA Visualization of Custom Classification Dataset",
    xaxis_title="Principal Component 1",
    yaxis_title="Principal Component 2",
)
fig.show()

## **Fitting and Transforming t-SNE**
Now we will apply the t-SNE algorithm to the dataset and compare the results.

After fitting and transforming the data, we will display the Kullback-Leibler (KL) divergence between the high-dimensional probability distribution and the low-dimensional probability distribution.

A lower KL divergence indicates that the low-dimensional embedding preserves the high-dimensional similarities more faithfully.

In [10]:
tsne = TSNE(n_components=2, random_state=42)
X_tsne = tsne.fit_transform(X)
tsne.kl_divergence_

1.1278483867645264

## **t-SNE Visualization Python**
Similar to **PCA**, we will visualize two t-SNE components on a scatter plot.

In [17]:
fig = px.scatter(x=X_tsne[:,0], y=X_tsne[:,1], color=y)
fig.update_layout(
    title="t-SNE Visualization of Custom Classification Dataset",
    xaxis_title="First t-SNE",
    yaxis_title="Second t-SNE",
)
fig.show()

Finally, we fit t-SNE for a range of perplexity values and plot the KL divergence obtained for each one.

In [20]:
# Perplexity values to try; each must be smaller than the number of samples (1500)
perplexity = np.arange(50, 1000, 50)
divergence = []

for i in perplexity:
    model = TSNE(n_components=2, init='pca', perplexity=i)
    # fit on the original data X, not on the earlier t-SNE output
    reduced = model.fit_transform(X)
    divergence.append(model.kl_divergence_)

fig = px.line(x=perplexity, y=divergence, markers=True)
fig.update_layout(
    xaxis_title="Perplexity Values",
    yaxis_title="KL Divergence Values",
)

fig.update_traces(
    line_color='red',
    line_width=1
)

fig.show()
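
The sweep above takes a while on 1500 samples. A smaller version of the same idea is sketched below, assuming a reduced dataset of 300 samples and only three perplexity values, both picked purely for speed. The names `X_small`, `perplexities`, and `divergences` are our own.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.manifold import TSNE

# Smaller dataset so the sweep finishes quickly (assumption: 300 samples)
X_small, _ = make_classification(
    n_samples=300,
    n_features=6,
    n_informative=2,
    n_classes=3,
    n_clusters_per_class=1,
    random_state=5,
)

# Perplexity must stay below the number of samples
perplexities = [5, 30, 100]
divergences = []

for p in perplexities:
    model = TSNE(n_components=2, init="pca", perplexity=p, random_state=42)
    model.fit_transform(X_small)
    divergences.append(float(model.kl_divergence_))

for p, d in zip(perplexities, divergences):
    print(f"perplexity={p:>3}  KL divergence={d:.4f}")
```

The KL divergence tends to fall as perplexity grows, because the high-dimensional distribution itself changes with perplexity. Treat the curve as a guide and inspect the plots as well, rather than simply picking the lowest value.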

# KL Divergence: Kullback-Leibler Divergence

## Introduction

KL Divergence (Kullback-Leibler Divergence) measures how one probability distribution differs from a second, reference probability distribution. It's widely used in statistics, machine learning, and information theory to quantify the difference between two distributions.

## Formula

For two discrete probability distributions $P$ and $Q$, the KL Divergence from $Q$ to $P$ is given by:

$$
D_{KL}(P \parallel Q) = \sum_{i} P(i) \log\left(\frac{P(i)}{Q(i)}\right)
$$
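
The discrete formula translates almost directly into code. The sketch below is a minimal pure-Python version, assuming natural logarithms (so the result is in nats) and the usual convention that terms with $P(i) = 0$ contribute nothing. The function name `kl_divergence` and the example distributions are our own.

```python
import math

def kl_divergence(p, q):
    """Discrete KL divergence D_KL(P || Q) in nats."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue          # convention: 0 * log(0 / q) = 0
        if q_i == 0:
            return math.inf   # P puts mass where Q has none
        total += p_i * math.log(p_i / q_i)
    return total

P = [0.5, 0.5]   # a fair coin
Q = [0.9, 0.1]   # a heavily biased coin

d_pq = kl_divergence(P, Q)
print(f"D_KL(P || Q) = {d_pq:.4f} nats")  # about 0.5108
```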

For continuous probability distributions, the formula is:

$$
D_{KL}(P \parallel Q) = \int_{-\infty}^{\infty} p(x) \log\left(\frac{p(x)}{q(x)}\right) \, dx
$$



where:
- $P$ and $Q$ are the probability distributions.
- $p(x)$ and $q(x)$ are the probability density functions of $P$ and $Q$, respectively.
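
To make the continuous formula concrete, the sketch below approximates the integral numerically for two Gaussians and compares it with the standard closed-form result for that case, $\ln(\sigma_q / \sigma_p) + \frac{\sigma_p^2 + (\mu_p - \mu_q)^2}{2\sigma_q^2} - \frac{1}{2}$. The closed form is a textbook result that does not appear earlier in these notes. The integration range and step count are arbitrary choices, and the function names are our own.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def kl_numeric(mu_p, sigma_p, mu_q, sigma_q, lo=-20.0, hi=20.0, steps=40000):
    """Midpoint-rule approximation of the KL integral on [lo, hi]."""
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        p = gaussian_pdf(x, mu_p, sigma_p)
        q = gaussian_pdf(x, mu_q, sigma_q)
        if p > 0.0:
            total += p * math.log(p / q) * dx
    return total

def kl_closed_form(mu_p, sigma_p, mu_q, sigma_q):
    """Closed-form KL divergence between two univariate Gaussians."""
    return (
        math.log(sigma_q / sigma_p)
        + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
        - 0.5
    )

# P = N(0, 1), Q = N(1, 2^2)
numeric = kl_numeric(0.0, 1.0, 1.0, 2.0)
exact = kl_closed_form(0.0, 1.0, 1.0, 2.0)
print(f"numeric = {numeric:.6f}, closed form = {exact:.6f}")
```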


## Properties

- **Asymmetry**: $D_{KL}(P \parallel Q) \neq D_{KL}(Q \parallel P)$ in general. KL Divergence is not symmetric and thus not a true distance metric.
- **Non-Negativity**: $D_{KL}(P \parallel Q) \geq 0$. The divergence is always non-negative and is zero if and only if $P$ and $Q$ are identical.
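
Both properties are easy to check empirically. The sketch below draws 1000 random pairs of 4-outcome distributions from a Dirichlet distribution (an arbitrary choice that keeps every probability strictly positive) and computes the divergence in both directions. The helper `kl_rows` and all variable names are our own.

```python
import numpy as np

def kl_rows(P, Q):
    """Row-wise discrete KL divergence; assumes all entries are strictly positive."""
    return np.sum(P * np.log(P / Q), axis=1)

rng = np.random.default_rng(0)
P_rand = rng.dirichlet(np.full(4, 2.0), size=1000)  # 1000 random distributions
Q_rand = rng.dirichlet(np.full(4, 2.0), size=1000)

forward = kl_rows(P_rand, Q_rand)    # D_KL(P || Q)
reverse = kl_rows(Q_rand, P_rand)    # D_KL(Q || P)
self_div = kl_rows(P_rand, P_rand)   # D_KL(P || P)

print("Smallest forward divergence:", forward.min())
print("Largest |forward - reverse|:", np.abs(forward - reverse).max())
print("Largest self-divergence:", np.abs(self_div).max())
```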

## Applications

- **Model Evaluation**: Compares a model’s predicted distribution with the true distribution.
- **Anomaly Detection**: Identifies unusual data by measuring how far its distribution diverges from a reference distribution of normal behavior.
- **Information Theory**: Measures the amount of information lost when $Q$ is used to approximate $P$.

---