### t-SNE

**t-SNE (t-Distributed Stochastic Neighbor Embedding)** is a popular technique for dimensionality reduction, primarily used for visualizing high-dimensional datasets in 2D or 3D space. It’s particularly useful when trying to understand complex data patterns or clusters.

---

### 1. **Why Do We Need Dimensionality Reduction?**
- High-dimensional data is hard to analyze and visualize. For example, a dataset with 100 features cannot be directly plotted.
- High-dimensional spaces can lead to phenomena like the "curse of dimensionality," where data points tend to appear uniformly distant from each other.
- Dimensionality reduction helps to simplify the data while retaining its meaningful structure.

t-SNE is one of the tools to address these challenges, focusing on preserving the local structure of the data.
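
The distance-concentration effect mentioned above can be checked numerically. The sketch below is only an illustration on uniform random data; the helper name `relative_contrast` and the sample sizes are assumptions made for this example, not part of t-SNE itself.

```python
import numpy as np

def relative_contrast(n_points, n_dims, seed=0):
    """(max - min) / min of the distances from one reference point to all others."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(size=(n_points, n_dims))
    d = np.linalg.norm(X[1:] - X[0], axis=1)
    return (d.max() - d.min()) / d.min()

# The contrast between the nearest and farthest neighbor shrinks as dimensions grow
for n_dims in (2, 10, 100, 1000):
    print(n_dims, round(relative_contrast(500, n_dims), 3))
```

When the contrast is small, "nearest" and "farthest" neighbors are almost equally far away, which is why raw distances become less informative in high dimensions.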

---

### 2. **Core Idea of t-SNE**
The goal of t-SNE is to map high-dimensional data into a lower-dimensional space while preserving the **local neighborhoods** of points.

#### Analogy:
Imagine flattening a globe into a world map. No flat map can keep every distance exact, so you have to choose what to preserve. t-SNE chooses to keep neighboring cities next to each other (local neighborhoods) and accepts distortion in long-range distances.

---

### 3. **How Does t-SNE Work?**
t-SNE involves two main steps:

#### a. **Measure Pairwise Similarities**
1. **In High-Dimensional Space:**
   - For each pair of data points \(i\) and \(j\), calculate a similarity score.
   - Similarity is defined as a **conditional probability** that point \(j\) is a neighbor of point \(i\).
   - Use a Gaussian distribution centered at \(i\) to compute this:
     \[
     P_{j|i} = \frac{\exp(-||x_i - x_j||^2 / 2\sigma_i^2)}{\sum_{k \neq i} \exp(-||x_i - x_k||^2 / 2\sigma_i^2)}
     \]
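     - \(\sigma_i\): Controls the size of the neighborhood around \(i\). It is set separately for each point so that the resulting distribution matches the chosen perplexity.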
     - The similarity is high if \(x_i\) and \(x_j\) are close.

   - Symmetrize the probabilities:
     \[
     P_{ij} = \frac{P_{j|i} + P_{i|j}}{2N}
     \]
     where \(N\) is the total number of data points.

2. **In Low-Dimensional Space:**
   - Similarly, define pairwise similarities \(Q_{ij}\) in the reduced space using a **Student's t-distribution** with one degree of freedom (heavy-tailed distribution):
     \[
     Q_{ij} = \frac{(1 + ||y_i - y_j||^2)^{-1}}{\sum_{k \neq l} (1 + ||y_k - y_l||^2)^{-1}}
     \]
     - This helps spread out points in the lower-dimensional space to avoid crowding.
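
The two similarity definitions above can be sketched in a few lines of NumPy. This is a minimal illustration, not a full implementation: it assumes a single fixed `sigma` for every point (real t-SNE tunes each \(\sigma_i\) from the perplexity), and the function names `joint_p` and `joint_q` are made up for this example.

```python
import numpy as np

def squared_distances(Z):
    """Pairwise squared Euclidean distances between the rows of Z."""
    sq = np.sum(Z**2, axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)

def joint_p(X, sigma=1.0):
    """High-dimensional affinities P_ij (assumption: one shared sigma)."""
    n = X.shape[0]
    W = np.exp(-squared_distances(X) / (2.0 * sigma**2))
    np.fill_diagonal(W, 0.0)                    # the sums exclude k = i
    P_cond = W / W.sum(axis=1, keepdims=True)   # row i holds P_{j|i}
    return (P_cond + P_cond.T) / (2.0 * n)      # symmetrize

def joint_q(Y):
    """Low-dimensional affinities Q_ij from a Student-t kernel (1 degree of freedom)."""
    W = 1.0 / (1.0 + squared_distances(Y))
    np.fill_diagonal(W, 0.0)
    return W / W.sum()                          # normalize over all pairs k != l

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 10))   # 6 toy points in 10 dimensions
Y = rng.normal(size=(6, 2))    # a random 2D layout of the same points
P, Q = joint_p(X), joint_q(Y)
print(P.sum(), Q.sum())        # both are probability distributions over pairs
```

Note the different normalizations: each row of the conditional matrix sums to 1 before symmetrization, while \(Q\) is normalized once over all pairs.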

#### b. **Minimize the Divergence Between \(P_{ij}\) and \(Q_{ij}\)**
- Use **Kullback-Leibler (KL) divergence** to measure the difference between \(P_{ij}\) and \(Q_{ij}\):
  \[
  C = \sum_{i \neq j} P_{ij} \log\left(\frac{P_{ij}}{Q_{ij}}\right)
  \]
- Minimize \(C\) to ensure the low-dimensional representation reflects the high-dimensional pairwise similarities.
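
As a small worked example of the cost, the sketch below evaluates \(C\) for two hand-written joint distributions over three points. The matrices and the helper name `kl_divergence` are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def kl_divergence(P, Q, eps=1e-12):
    """C = sum over i != j of P_ij * log(P_ij / Q_ij); pairs with P_ij = 0 add nothing."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))

# Toy joint distributions over the ordered pairs of 3 points (each sums to 1)
P = np.array([[0.00, 0.30, 0.05],
              [0.30, 0.00, 0.15],
              [0.05, 0.15, 0.00]])
Q = np.array([[0.00, 0.20, 0.10],
              [0.20, 0.00, 0.20],
              [0.10, 0.20, 0.00]])

print(kl_divergence(P, Q))   # positive: the layout does not match P exactly
print(kl_divergence(P, P))   # zero: a perfect match
print(kl_divergence(Q, P))   # differs from the first value: KL is asymmetric
```

The asymmetry matters for t-SNE: placing two points far apart when \(P_{ij}\) is large is penalized heavily, while placing them close together when \(P_{ij}\) is small costs little. This is one reason the method favors local structure.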

#### Optimization:
- Gradient descent is used to iteratively adjust the positions of points in the low-dimensional space.
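
For the cost \(C\) above, the gradient with respect to a map point has the closed form \(\partial C / \partial y_i = 4 \sum_j (P_{ij} - Q_{ij})(1 + ||y_i - y_j||^2)^{-1}(y_i - y_j)\). The sketch below applies plain gradient descent with that formula on toy data. It is deliberately simplified: it assumes one shared `sigma`, a hypothetical learning rate, and none of the refinements practical implementations use (momentum, early exaggeration).

```python
import numpy as np

def sq_dists(Z):
    """Pairwise squared Euclidean distances between the rows of Z."""
    sq = np.sum(Z**2, axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)

def joint_p(X, sigma=1.0):
    """Symmetrized Gaussian affinities with one fixed sigma (a simplification)."""
    W = np.exp(-sq_dists(X) / (2.0 * sigma**2))
    np.fill_diagonal(W, 0.0)
    P_cond = W / W.sum(axis=1, keepdims=True)
    return (P_cond + P_cond.T) / (2.0 * X.shape[0])

def joint_q(Y):
    """Student-t affinities Q and the unnormalized kernel (1 + d^2)^-1."""
    W = 1.0 / (1.0 + sq_dists(Y))
    np.fill_diagonal(W, 0.0)
    return W / W.sum(), W

def kl(P, Q):
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / Q[mask])))

def gradient(P, Y):
    """Closed-form gradient of the KL cost with respect to every row of Y."""
    Q, W = joint_q(Y)
    M = (P - Q) * W
    # Row i equals sum_j M_ij * (y_i - y_j)
    return 4.0 * (M.sum(axis=1)[:, None] * Y - M @ Y)

rng = np.random.default_rng(0)
# Toy data: two separated groups of 10 points in 5 dimensions
X = np.vstack([rng.normal(0.0, 1.0, size=(10, 5)),
               rng.normal(4.0, 1.0, size=(10, 5))])
P = joint_p(X, sigma=2.0)
Y = rng.normal(scale=0.1, size=(20, 2))   # small random initial layout

initial_cost = kl(P, joint_q(Y)[0])
learning_rate = 0.2                        # hypothetical value for this toy problem
for _ in range(500):
    Y -= learning_rate * gradient(P, Y)
final_cost = kl(P, joint_q(Y)[0])
print(f"KL before: {initial_cost:.4f}, after: {final_cost:.4f}")
```

The term \((P_{ij} - Q_{ij})\) acts like a spring: pairs that are too far apart in the map are pulled together, and pairs that are too close are pushed apart.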

---

### 4. **Why Use a t-Distribution in Low Dimensions?**
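The heavy tails of the t-distribution let moderately dissimilar points sit much farther apart in the low-dimensional map than a Gaussian would allow. This counteracts the "crowding problem": 2D or 3D space does not have enough room to keep all of a point's moderately distant neighbors at faithful distances, so without heavy tails they would be squeezed together. Pushing them outward leaves room for true neighbors to stay close, which preserves local neighborhoods better.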
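
One way to see the effect is to ask what distance each kernel needs to produce the same similarity value. The sketch below inverts the two unnormalized kernels, \(\exp(-d^2 / 2\sigma^2)\) and \((1 + d^2)^{-1}\). Ignoring normalization and fixing `sigma = 1` are simplifying assumptions for illustration, and the function names are made up for this example.

```python
import numpy as np

def gaussian_distance(s, sigma=1.0):
    """Distance d at which exp(-d^2 / (2 sigma^2)) equals s."""
    return np.sqrt(-2.0 * sigma**2 * np.log(s))

def student_t_distance(s):
    """Distance d at which (1 + d^2)^-1 equals s."""
    return np.sqrt(1.0 / s - 1.0)

for s in (0.9, 0.5, 0.1, 0.01):
    print(f"similarity {s:>4}: Gaussian d = {gaussian_distance(s):.2f}, "
          f"Student-t d = {student_t_distance(s):.2f}")
```

For high similarities the two distances are comparable, but for low similarities the Student-t kernel maps the same value to a much larger distance, which is the extra room the low-dimensional map needs.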

---

### 5. **Key Parameters in t-SNE**
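1. **Perplexity:**
   - A user-defined parameter that sets the effective number of neighbors each point considers. Formally it is \(2^{H(P_i)}\), where \(H(P_i)\) is the Shannon entropy (in bits) of the conditional distribution \(P_{j|i}\); each \(\sigma_i\) is tuned so that this value matches the chosen perplexity.
   - Controls the balance between local and global aspects of the data: low values emphasize very local structure, while high values take broader structure into account.
   - Typical range: 5–50.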
   
2. **Learning Rate:**
   - Controls the speed of optimization during gradient descent.
   - Too high: Points may move erratically.
   - Too low: Convergence may be slow.

3. **Number of Iterations:**
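   - More iterations give the optimization time to converge. Too few leave the embedding under-optimized (for example, clusters that are still compressed or poorly separated); iterations beyond convergence mostly add compute time.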
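
To make the link between perplexity and \(\sigma_i\) concrete, the sketch below finds the \(\sigma_i\) for a single point by bisection, so that the perplexity of \(P_{j|i}\) matches a target. The function names, tolerance, and search bracket are assumptions for this illustration; library implementations differ in detail.

```python
import numpy as np

def conditional_p(sq_dists, beta):
    """P_{j|i} for one point from squared distances to the others; beta = 1 / (2 sigma_i^2)."""
    w = np.exp(-(sq_dists - sq_dists.min()) * beta)   # shift for numerical stability
    return w / w.sum()

def perplexity(p):
    """2 ** (Shannon entropy in bits)."""
    p = p[p > 0]
    return 2.0 ** (-np.sum(p * np.log2(p)))

def find_sigma(sq_dists, target, tol=1e-5, max_steps=200):
    """Bisection on beta (in log space) until the perplexity matches the target."""
    lo, hi = 1e-10, 1e10
    for _ in range(max_steps):
        beta = np.sqrt(lo * hi)
        perp = perplexity(conditional_p(sq_dists, beta))
        if abs(perp - target) < tol:
            break
        if perp > target:
            lo = beta    # too many effective neighbors: narrow the kernel
        else:
            hi = beta    # too few effective neighbors: widen the kernel
    return 1.0 / np.sqrt(2.0 * beta)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
d0 = np.sum((X[1:] - X[0]) ** 2, axis=1)   # squared distances from point 0 to the rest

for target in (5, 10, 30):
    print(target, round(find_sigma(d0, target), 4))
```

A larger target perplexity yields a larger \(\sigma_i\), meaning the point "sees" a wider neighborhood. The target has to be smaller than the number of other points.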

---

### 6. **Advantages of t-SNE**
- Preserves local neighborhoods well.
- Ideal for visualizing clusters and non-linear structures in the data.
- Produces plots in which cluster membership is easy to see at a glance.

---

### 7. **Limitations of t-SNE**
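- **Computationally Expensive:** The exact algorithm compares every pair of points, so time and memory grow as \(O(N^2)\). Approximations such as Barnes-Hut t-SNE reduce this to roughly \(O(N \log N)\), but large datasets are still slow.
- **Doesn’t Preserve Global Structure:** Focuses on local relationships, which may distort larger patterns. Cluster sizes and the distances between clusters in the plot are not reliable.
- **Non-Deterministic:** The cost function is non-convex and the initialization is random, so results can differ between runs, sometimes noticeably. Fix a random seed for reproducibility.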
- **Hard to Interpret:** The axes in the lower-dimensional space have no specific meaning.

---

### 8. **Applications of t-SNE**
- **Clustering:** Visualizing groups in high-dimensional data (e.g., customer segments, gene expressions).
- **Data Exploration:** Understanding the structure of embeddings (e.g., word embeddings in NLP).
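- **Outlier Detection:** Flagging points that sit apart from every cluster as candidates for closer inspection. Because t-SNE can distort distances, confirm them in the original feature space.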

---

### 9. **Practical Tips**
- Preprocess the data using normalization or PCA (Principal Component Analysis) to reduce noise and make t-SNE faster.
- Experiment with perplexity and learning rate to achieve the best visualization.
- Use t-SNE only for visual exploration, not for quantitative analysis.
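
The tips above can be combined into a short experiment. The sketch below uses scikit-learn's small built-in digits dataset (an assumption made to keep the run fast; it is not the MNIST data used in the example that follows) and embeds a 500-sample subset at three perplexity values after scaling and PCA.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Small built-in dataset: 8x8 digit images, truncated to 500 samples for speed
X, y = load_digits(return_X_y=True)
X, y = X[:500], y[:500]

# Preprocess: standardize, then reduce to 30 principal components
X_pca = PCA(n_components=30, random_state=0).fit_transform(
    StandardScaler().fit_transform(X)
)

# Embed with several perplexity values and keep each result for plotting
embeddings = {}
for perplexity in (5, 30, 50):
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=0)
    embeddings[perplexity] = tsne.fit_transform(X_pca)
    print(perplexity, embeddings[perplexity].shape)
```

Plotting the three embeddings side by side, colored by `y`, is usually more informative than comparing any single number, since the final KL values are not directly comparable across perplexities.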

---

### Example: t-SNE Visualization
Imagine a dataset of images (e.g., digits from MNIST). t-SNE can reduce the image feature vectors to 2D, where:
- Similar digits (like 0s) cluster together.
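- Dissimilar digits (like 0s and 7s) form separate clusters, although the distance between two clusters is not a reliable measure of how different they are.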

This clustering helps understand the structure of the data intuitively.

```python
import numpy as np
import pandas as pd
import plotly.express as px
from sklearn.datasets import fetch_openml
from sklearn.preprocessing import StandardScaler
from sklearn.manifold import TSNE
from sklearn.decomposition import PCA
```

```python
# Download MNIST from OpenML: 70,000 images with 784 pixel features each
print("Loading MNIST dataset...")
mnist = fetch_openml('mnist_784', version=1)
X, y = mnist.data, mnist.target
```


```python
# Standardize each pixel feature to zero mean and unit variance
print("Scaling data...")
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
```


```python
# (Optional) Reduce dimensions with PCA for computational efficiency
print("Reducing dimensions with PCA...")
pca = PCA(n_components=50)
X_pca = pca.fit_transform(X_scaled)
```


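```python
# Apply t-SNE to reduce to 2 dimensions.
# The iteration count is left at its default of 1000: the n_iter argument is
# renamed max_iter in recent scikit-learn releases, so passing n_iter can fail.
# On the full 70,000 samples this step can take a long time.
print("Applying t-SNE...")
tsne = TSNE(n_components=2, perplexity=30, random_state=42)
X_tsne = tsne.fit_transform(X_pca)
```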




```python
# Collect the 2D coordinates and the digit labels in one DataFrame
tsne_df = pd.DataFrame(X_tsne, columns=['Dim1', 'Dim2'])
tsne_df['Label'] = y.astype(int)
```

```python
print("Creating Plotly visualization...")
fig = px.scatter(
    tsne_df,
    x='Dim1',
    y='Dim2',
    # Cast labels to strings so Plotly treats them as discrete categories
    color=tsne_df['Label'].astype(str),
    title="t-SNE Visualization of MNIST Dataset",
    labels={'color': 'Digit Label'},
    hover_data=['Label']
)

fig.update_layout(
    # Discrete colors produce a legend, not a color bar, so title the legend
    legend_title_text="Digit Label",
    xaxis_title="t-SNE Dimension 1",
    yaxis_title="t-SNE Dimension 2",
    template="plotly",
    width=800,
    height=600
)

fig.show()
```
