# **t-SNE (t-distributed Stochastic Neighbor Embedding)**


## 📌 1. Technical Introduction

### 🧭 Where It Fits:

* Part of **Unsupervised Learning**, under **Dimensionality Reduction**
* Specifically designed for **visualizing high-dimensional data in 2D or 3D**

### 🛠 How It Works Conceptually:

* t-SNE maps high-dimensional points into lower dimensions (2D/3D) while preserving **local relationships**.
* It tries to place **similar points close together** and **dissimilar points far apart**.

### Key Terms:

* **High-Dimensional Space**: Original data with many features
* **Low-Dimensional Space**: Compressed 2D/3D version for visualization
* **Perplexity**: Balances local vs. global structure; roughly the effective number of neighbors each point considers
* **KL Divergence**: A (non-symmetric) measure of how much one probability distribution differs from another; t-SNE minimizes it
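
To make the two key terms concrete, here is a small NumPy sketch. The helper names `perplexity` and `kl_divergence` are illustrative, not from any library. It assumes perplexity is defined as $2^{H}$ with $H$ the Shannon entropy in bits (the definition used by t-SNE), and shows that a point spreading its probability evenly over 10 neighbors has perplexity 10.

```python
import numpy as np

def perplexity(p: np.ndarray) -> float:
    """Perplexity 2**H(p), where H is the Shannon entropy in bits."""
    p = p[p > 0]                              # treat 0 * log(0) as 0
    return float(2.0 ** (-np.sum(p * np.log2(p))))

def kl_divergence(p: np.ndarray, q: np.ndarray) -> float:
    """KL(P || Q) = sum p * log(p / q) in nats; assumes q > 0 wherever p > 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Probability spread evenly over 10 neighbors gives a perplexity of about 10.
print(perplexity(np.full(10, 0.1)))

# KL is 0 for identical distributions and is not symmetric in general.
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.4, 0.4, 0.2])
print(kl_divergence(p, p))
print(kl_divergence(p, q), kl_divergence(q, p))
```

The asymmetry matters for t-SNE: $KL(P \parallel Q)$ punishes placing true neighbors far apart much more than placing non-neighbors close together, which is why local structure is what gets preserved.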

---

## 🧸 2. Simplified Explanation

Imagine compressing a large **world map into a small sheet**, where:

* Nearby cities still stay close
* Far-away cities remain distant (mostly)

t-SNE is like a **smart map-maker**:

> It keeps nearby points **visually close**, so you can **see the data’s structure** — like clusters — even if the original data had 100+ dimensions.

---

## 📕 3. Definition

> **t-SNE** is a non-linear, unsupervised dimensionality reduction technique that maps high-dimensional data into a low-dimensional space (usually 2D or 3D) while preserving local neighborhood structure. It converts pairwise distances into probability distributions in both spaces and minimizes the Kullback-Leibler (KL) divergence between them.

---

## 🧠 4. Simple Analogy

🧩 **Friend Groups Analogy**:
You’re trying to draw a seating chart where:

* Best friends sit together
* Strangers sit apart

t-SNE does this by checking who is “close” in the original world and arranging them similarly in 2D, so **natural groups form** visually.

---

## 🚗 5. Examples

### 🚘 Automotive:

* **Cluster driving patterns** using high-dimensional sensor data
* **Visualize health of ECUs** or battery packs by reducing diagnostic features
* Compare **vehicle usage clusters** in fleet analytics

### 🌍 General:

* Visualizing high-dimensional **image embeddings**
* Clustering **gene expression profiles**
* Understanding **word embeddings** (like Word2Vec)

---

## 📐 6. Mathematical Equations

### Step 1: Compute Similarities in High-D Space

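For each pair of points $x_i, x_j$, compute the conditional probability that $x_i$ would pick $x_j$ as its neighbor under a Gaussian centered at $x_i$:

$$
p_{j|i} = \frac{\exp\left(-\|x_i - x_j\|^2 / (2\sigma_i^2)\right)}{\sum_{k \neq i} \exp\left(-\|x_i - x_k\|^2 / (2\sigma_i^2)\right)}, \qquad p_{i|i} = 0
$$

The bandwidth $\sigma_i$ is set separately for each point, typically by binary search, so that the perplexity $2^{H(P_i)}$ of its neighbor distribution, with $H(P_i) = -\sum_j p_{j|i} \log_2 p_{j|i}$, matches the user-chosen perplexity.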

Then symmetrize, where $n$ is the number of points:

$$
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}
$$
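
Step 1 can be sketched in NumPy as below. This is a minimal, unoptimized illustration assuming Euclidean distances and a plain bisection on $\sigma_i$; the function names are hypothetical. Practical implementations typically search over a precision parameter instead of $\sigma_i$ and restrict each point to its nearest neighbors rather than building the full $n \times n$ distance matrix.

```python
import numpy as np

def conditional_row(dist_sq: np.ndarray, sigma: float) -> np.ndarray:
    """p_{j|i} for one point i, given squared distances from x_i to every other point."""
    logits = -dist_sq / (2.0 * sigma ** 2)
    logits -= logits.max()  # shift for numerical stability; the shift cancels in the ratio
    weights = np.exp(logits)
    return weights / weights.sum()

def row_perplexity(p: np.ndarray) -> float:
    """2 ** H(P_i), with H measured in bits (0 * log 0 treated as 0)."""
    p = p[p > 0]
    return float(2.0 ** (-np.sum(p * np.log2(p))))

def joint_probabilities(X: np.ndarray, target_perplexity: float = 30.0,
                        tol: float = 1e-5, max_iter: int = 100) -> np.ndarray:
    """Symmetric P with p_ij = (p_{j|i} + p_{i|j}) / (2n)."""
    n = X.shape[0]
    sq_norms = np.sum(X ** 2, axis=1)
    dist_sq = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T, 0.0)
    p_cond = np.zeros((n, n))
    for i in range(n):
        others = np.arange(n) != i          # k != i: a point is never its own neighbor
        lo, hi = 0.0, np.inf                # bracket for sigma_i
        sigma = 1.0
        for _ in range(max_iter):
            row = conditional_row(dist_sq[i, others], sigma)
            perp = row_perplexity(row)
            if abs(perp - target_perplexity) < tol:
                break
            if perp > target_perplexity:    # too many effective neighbors: shrink sigma
                hi = sigma
                sigma = (lo + hi) / 2.0
            else:                           # too few effective neighbors: grow sigma
                lo = sigma
                sigma = sigma * 2.0 if np.isinf(hi) else (lo + hi) / 2.0
        p_cond[i, others] = row
    return (p_cond + p_cond.T) / (2.0 * n)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))                # 50 hypothetical points with 5 features
P = joint_probabilities(X, target_perplexity=10.0)
print(P.shape, P.sum())
```

Because every row of the conditional matrix sums to 1, dividing the symmetrized matrix by $2n$ makes all entries of $P$ sum to 1, which is what the KL cost in Step 3 expects.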

### Step 2: Compute Similarities in Low-D Space

$$
q_{ij} = \frac{(1 + \|y_i - y_j\|^2)^{-1}}{\sum_{k \neq l} (1 + \|y_k - y_l\|^2)^{-1}}
$$
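
A direct NumPy translation of the $q_{ij}$ formula follows (the function name `joint_q` is hypothetical). The three collinear points illustrate the heavy tail of the Student-t kernel: the pair at distance 5 still receives a noticeable share of probability, whereas a Gaussian weight with $\sigma = 1$ at the same distance is practically zero. This heavy tail is what lets t-SNE push moderately dissimilar points farther apart in 2D.

```python
import numpy as np

def joint_q(Y: np.ndarray) -> np.ndarray:
    """q_ij for low-dimensional points Y (one row per point), Student-t kernel with 1 degree of freedom."""
    sq_norms = np.sum(Y ** 2, axis=1)
    dist_sq = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * Y @ Y.T, 0.0)
    kernel = 1.0 / (1.0 + dist_sq)   # (1 + ||y_i - y_j||^2)^-1
    np.fill_diagonal(kernel, 0.0)    # the normalizing sum runs over k != l only
    return kernel / kernel.sum()

# Three hypothetical points on a line: distances 1 (A-B), 4 (B-C), 5 (A-C).
Y = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]])
Q = joint_q(Y)
print(np.round(Q, 3))

# Unnormalized Gaussian weight (sigma = 1) vs. Student-t weight at distance 5.
print(np.exp(-25.0 / 2.0), 1.0 / (1.0 + 25.0))
```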

### Step 3: Minimize KL Divergence

Cost function:

$$
KL(P \parallel Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
$$

This is minimized with respect to the low-dimensional positions $y_i$ using **gradient descent**.
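
Differentiating the cost with respect to each $y_i$ gives the standard t-SNE gradient:

$$
\frac{\partial KL}{\partial y_i} = 4 \sum_{j} (p_{ij} - q_{ij})(y_i - y_j)\left(1 + \|y_i - y_j\|^2\right)^{-1}
$$

The sketch below implements the cost and this gradient, checks the gradient against central finite differences, and runs plain gradient descent on a hypothetical two-group $P$ that stands in for the Step 1 output. It deliberately omits the momentum, early exaggeration, and Barnes-Hut approximations that practical implementations add; the learning rate and iteration count are arbitrary choices for this toy example.

```python
import numpy as np

def q_and_kernel(Y):
    """Return Q (normalized Student-t similarities) and the raw kernel (1 + ||y_i - y_j||^2)^-1."""
    sq = np.sum(Y ** 2, axis=1)
    dist_sq = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Y @ Y.T, 0.0)
    kernel = 1.0 / (1.0 + dist_sq)
    np.fill_diagonal(kernel, 0.0)
    return kernel / kernel.sum(), kernel

def kl_cost(P, Y):
    """KL(P || Q) summed over pairs with p_ij > 0."""
    Q, _ = q_and_kernel(Y)
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / Q[mask])))

def kl_gradient(P, Y):
    """Row i: 4 * sum_j (p_ij - q_ij) (y_i - y_j) (1 + ||y_i - y_j||^2)^-1."""
    Q, kernel = q_and_kernel(Y)
    W = (P - Q) * kernel                      # pairwise weights, zero on the diagonal
    return 4.0 * (W.sum(axis=1)[:, None] * Y - W @ Y)

# Hypothetical P: two groups of 10 points with strong within-group affinity.
n = 20
labels = np.repeat([0, 1], 10)
P = np.where(labels[:, None] == labels[None, :], 1.0, 0.01)
np.fill_diagonal(P, 0.0)
P /= P.sum()

rng = np.random.default_rng(0)
Y = rng.normal(scale=1e-2, size=(n, 2))       # small random initialization

# Compare the analytic gradient with central finite differences.
eps = 1e-5
numeric = np.zeros_like(Y)
for idx in np.ndindex(Y.shape):
    E = np.zeros_like(Y)
    E[idx] = eps
    numeric[idx] = (kl_cost(P, Y + E) - kl_cost(P, Y - E)) / (2 * eps)
grad_ok = np.allclose(numeric, kl_gradient(P, Y), rtol=1e-4, atol=1e-8)

# Plain gradient descent: no momentum, no early exaggeration.
learning_rate = 5.0
cost_start = kl_cost(P, Y)
for _ in range(500):
    Y = Y - learning_rate * kl_gradient(P, Y)
cost_end = kl_cost(P, Y)
print(grad_ok, cost_start, cost_end)
```

On this toy input the two groups should end up clearly separated in 2D while within-group distances stay comparatively small, which is exactly the local-structure behavior described above.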

---

## 📌 7. Important Information

* t-SNE focuses on **local structure**, not global
* **Doesn’t preserve distances or scales**
* **Non-deterministic** — results can change each time unless you set a `random_state`
* **Standardize/normalize** features before applying t-SNE when they are on different scales, since the method works on pairwise distances

---

## 🔁 8. Comparison with Similar Topics

| Feature      | PCA             | t-SNE                | UMAP                |
| ------------ | --------------- | -------------------- | ------------------- |
| Type         | Linear          | Non-linear           | Non-linear          |
| Preserves    | Global variance | Local similarity     | Local + some global |
| Speed        | ⚡ Fast          | 🐢 Slow              | 🚀 Fast             |
| Reproducible | ✅ Yes           | ❌ No (unless seeded) | ❌ No (unless seeded) |
| Useful for   | Preprocessing   | Visualization        | Visualization + ML  |

---

## ✅ 9. Advantages and Disadvantages

### ✅ Advantages:

* Creates **beautiful 2D/3D plots**
* Helps visually discover **clusters**
* Captures **non-linear structure** that linear methods such as PCA can miss

### ❌ Disadvantages:

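* Slow on large datasets (exact t-SNE scales as $O(n^2)$; the Barnes-Hut approximation brings this down to about $O(n \log n)$)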
* Doesn’t preserve true distances or densities
* Hard to tune parameters (especially `perplexity`)

---

## ⚠️ 10. Things to Watch Out For

* Always **normalize** data before t-SNE
* Results may vary — use `random_state` for reproducibility
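* **Choose perplexity wisely** (typically 5 to 50, and always smaller than the number of points)
* **Not suitable** for downstream tasks (like feeding into another model): standard t-SNE learns no reusable mapping for new points, and distances between clusters in the plot are not meaningful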

---

## 💡 11. Other Critical Insights

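* You can use **PCA before t-SNE** to speed it up (e.g., first reduce to around 50 dimensions when there are many features)
* t-SNE is useful for **visually sanity-checking clusters** (e.g., coloring points by K-Means labels), but cluster sizes and gaps in the plot are not reliable evidence on their own
* Consider **UMAP** if you need faster runs and often better preservation of global structure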
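
The practical advice from sections 10 and 11 can be combined into one scikit-learn pipeline. This is a sketch under assumptions: synthetic blobs from `make_blobs` stand in for real data such as fleet or sensor features, and the choices of 50 PCA components, perplexity 30, and `random_state=0` are illustrative, not tuned.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 300 samples, 100 features, 4 groups.
X, y = make_blobs(n_samples=300, n_features=100, centers=4, random_state=0)

X_scaled = StandardScaler().fit_transform(X)                              # common feature scale
X_reduced = PCA(n_components=50, random_state=0).fit_transform(X_scaled)  # cheaper distance computations

perplexity = 30.0
assert perplexity < X_reduced.shape[0]   # perplexity must stay below the number of points

tsne = TSNE(n_components=2, perplexity=perplexity, random_state=0)        # seeded for repeatable runs
embedding = tsne.fit_transform(X_reduced)
print(embedding.shape, np.isfinite(embedding).all())
```

Note that scikit-learn's `TSNE` provides `fit_transform` but no `transform` method, so new points cannot be placed into an existing map; this is one concrete reason the embedding makes a poor input for downstream models.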

---
