```{contents}
```

## Workflow

t-SNE (t-distributed stochastic neighbor embedding) is an **unsupervised** method used mainly for **dimensionality reduction for visualization**. The workflow has seven stages:

---

### **1. Data Preprocessing**

1. **Standardize / normalize features**

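   * t-SNE works from Euclidean distances, so features with large numeric ranges dominate the similarities unless they are rescaled.
   * A common choice is z-score normalization (mean 0, variance 1 per feature).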
2. **Optional PCA preprocessing**

   * Reduce dimensionality to 30–50 components first.
   * Advantages:

     * Reduces noise
     * Speeds up t-SNE
     * May give a more stable optimization, although the objective stays non-convex and local minima remain possible
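
The two preprocessing steps above can be sketched with NumPy alone. This is a minimal illustration, not a reference implementation: the helper names `zscore` and `pca_reduce`, the toy data, and the choice of 30 components are assumptions made for the example.

```python
import numpy as np

def zscore(X, eps=1e-12):
    """Standardize each column to mean 0 and unit variance."""
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    return (X - mu) / np.maximum(sd, eps)  # eps guards constant columns

def pca_reduce(X, n_components):
    """Project centered data onto its top principal components via SVD."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

rng = np.random.default_rng(0)
# Toy data (assumed): 200 points, 100 features on very different scales.
X = rng.normal(size=(200, 100)) * rng.uniform(0.1, 50.0, size=100)

X_std = zscore(X)
X_red = pca_reduce(X_std, n_components=30)
print(X_red.shape)  # (200, 30)
```

`X_red` is what would be handed to t-SNE in place of the raw features.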

---

### **2. Compute Pairwise Similarities in High-Dimensional Space**

1. For each point $x_i$, compute **conditional probabilities** $p_{j|i}$ that measure how similar each other point $x_j$ is to it, with $p_{i|i} = 0$:

$$
p_{j|i} = \frac{\exp\left(-\|x_i - x_j\|^2 / (2\sigma_i^2)\right)}{\sum_{k \neq i} \exp\left(-\|x_i - x_k\|^2 / (2\sigma_i^2)\right)}
$$

2. Symmetrize the probabilities, where $n$ is the number of data points:

$$
p_{ij} = \frac{p_{i|j} + p_{j|i}}{2n}
$$

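3. The **perplexity** parameter determines each $\sigma_i$: a binary search sets $\sigma_i$ so that the distribution $p_{\cdot|i}$ has the requested perplexity, which acts as an effective neighborhood size. Because step 1 needs $\sigma_i$, this calibration runs as part of step 1.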

**Intuition:** Nearby points get high similarity; distant points get similarity close to zero.
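
The sketch below shows one way to compute the high-dimensional similarities. It searches over the precision $\beta_i = 1 / (2\sigma_i^2)$ rather than $\sigma_i$ directly, which is a common trick but an assumption here. The function names, tolerance, step limit, and toy data are all illustrative.

```python
import numpy as np

def conditional_probs(sq_dists_i, beta):
    """p_{j|i} for one point, given squared distances to all other points.

    beta = 1 / (2 * sigma_i**2) is the Gaussian precision.
    """
    # Subtracting the minimum leaves the ratio unchanged but avoids underflow.
    w = np.exp(-(sq_dists_i - sq_dists_i.min()) * beta)
    return w / w.sum()

def perplexity_of(p):
    """Perplexity 2**H(p), with H the Shannon entropy in bits."""
    nz = p[p > 0]
    return 2.0 ** (-(nz * np.log2(nz)).sum())

def joint_probabilities(X, perplexity=30.0, tol=1e-5, max_steps=100):
    """Symmetrized p_ij with a per-point bandwidth matched to the perplexity."""
    n = X.shape[0]
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    P_cond = np.zeros((n, n))  # P_cond[i, j] holds p_{j|i}; diagonal stays 0
    for i in range(n):
        others = np.arange(n) != i
        d = sq[i, others]
        lo, hi, beta = 0.0, np.inf, 1.0
        for _ in range(max_steps):
            p = conditional_probs(d, beta)
            perp = perplexity_of(p)
            if abs(perp - perplexity) < tol:
                break
            if perp > perplexity:  # neighborhood too wide: raise beta (shrink sigma)
                lo = beta
                beta = beta * 2.0 if np.isinf(hi) else (lo + hi) / 2.0
            else:                  # neighborhood too narrow: lower beta
                hi = beta
                beta = (lo + hi) / 2.0
        P_cond[i, others] = p
    return (P_cond + P_cond.T) / (2.0 * n)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))  # toy data (assumed)
P = joint_probabilities(X, perplexity=10.0)
print(P.shape, P.sum())
```

Each row of `P_cond` sums to 1, so after dividing by $2n$ the full matrix sums to 1 and can be treated as a joint distribution over pairs. The requested perplexity must be smaller than $n - 1$ for the search to have a solution.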

---

### **3. Initialize Low-Dimensional Embedding**

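* Start with a random placement of points $y_i$ in 2D or 3D, typically drawn from a Gaussian with small variance centered at the origin.
* Alternative: PCA initialization, which is deterministic (so runs are easier to reproduce) and can help convergence.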
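
Both initializations can be sketched in a few lines of NumPy. The scale `1e-4` and the rescaling of the PCA start are assumptions chosen for illustration; implementations differ in the exact constants.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))  # toy high-dimensional data (assumed)

# Option A: small random Gaussian cloud around the origin.
Y_random = rng.normal(scale=1e-4, size=(X.shape[0], 2))

# Option B: first two principal components, rescaled so the start is also small.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Y_pca = Xc @ Vt[:2].T
Y_pca = Y_pca / Y_pca[:, 0].std() * 1e-4

print(Y_random.shape, Y_pca.shape)  # (100, 2) (100, 2)
```

Starting small keeps all pairwise low-dimensional distances near zero, so the early iterations are driven by the high-dimensional similarities rather than by the initial layout.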

---

### **4. Compute Pairwise Similarities in Low-Dimensional Space**

* Use a **Student-t distribution** with 1 degree of freedom:

$$
q_{ij} = \frac{(1 + \|y_i - y_j\|^2)^{-1}}{\sum_{k \neq l} (1 + \|y_k - y_l\|^2)^{-1}}
$$

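* The heavy tails mitigate the **crowding problem**: moderately distant points in high-D can be placed much farther apart in low-D, which leaves room for true neighbors to stay close.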
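
As a minimal sketch, the formula for $q_{ij}$ maps directly onto a few array operations. The function name `low_dim_affinities` and the toy embedding are assumptions for the example.

```python
import numpy as np

def low_dim_affinities(Y):
    """Student-t (1 degree of freedom) similarities q_ij for an embedding Y."""
    sq = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    num = 1.0 / (1.0 + sq)       # (1 + ||y_i - y_j||^2)^{-1}
    np.fill_diagonal(num, 0.0)   # the sum runs over k != l only
    return num / num.sum()       # one global normalizer for all pairs

rng = np.random.default_rng(0)
Y = rng.normal(size=(40, 2))  # toy embedding (assumed)
Q = low_dim_affinities(Y)
print(Q.shape, Q.sum())
```

Unlike the high-dimensional side, there is a single normalizer over all pairs, so `Q` is symmetric by construction and needs no separate symmetrization step.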

---

### **5. Minimize KL Divergence**

* Objective: match high-D similarities ($p_{ij}$) with low-D similarities ($q_{ij}$) by minimizing:

$$
\mathrm{KL}(P \,\|\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
$$

* Use **gradient descent** to update low-D points $y_i$.

**Iterative Process:**

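* Each iteration moves the points $y_i$ a small step along the negative gradient to reduce the KL divergence.
* Early exaggeration: for the first iterations, every $p_{ij}$ is multiplied by a constant factor (4 in the original paper, 12 by default in scikit-learn). This pulls neighbors into tight, well-separated clusters before the factor is removed.
* A momentum term speeds up and stabilizes the optimization.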
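
The optimization loop can be sketched as follows, using the standard gradient of the t-SNE objective:

$$
\frac{\partial\, \mathrm{KL}}{\partial y_i} = 4 \sum_{j} (p_{ij} - q_{ij})\,(y_i - y_j)\,(1 + \|y_i - y_j\|^2)^{-1}
$$

This is an illustrative sketch under several assumptions: the learning rate, iteration counts, and momentum schedule are hand-picked for a tiny toy problem, and `simple_joint_probabilities` uses one fixed bandwidth instead of a perplexity search to keep the example short. Production implementations add adaptive gains and faster approximations.

```python
import numpy as np

def simple_joint_probabilities(X, sigma=3.0):
    """Simplified p_ij with one shared bandwidth (no perplexity calibration)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    P_cond = W / W.sum(axis=1, keepdims=True)
    return (P_cond + P_cond.T) / (2.0 * X.shape[0])

def kl_and_grad(P, Y):
    """Return KL(P || Q) and its gradient with respect to Y."""
    sq = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    num = 1.0 / (1.0 + sq)
    np.fill_diagonal(num, 0.0)
    Q = np.maximum(num / num.sum(), 1e-12)  # floor avoids log(0)
    mask = P > 0
    kl = np.sum(P[mask] * np.log(P[mask] / Q[mask]))
    W = (P - Q) * num
    # grad_i = 4 * sum_j W_ij * (y_i - y_j)
    grad = 4.0 * (W.sum(axis=1)[:, None] * Y - W @ Y)
    return kl, grad

def fit_tsne(P, n_iter=300, lr=5.0, exaggeration=4.0, exaggeration_iters=50, seed=0):
    rng = np.random.default_rng(seed)
    Y = rng.normal(scale=1e-4, size=(P.shape[0], 2))
    velocity = np.zeros_like(Y)
    kl_start, _ = kl_and_grad(P, Y)
    for it in range(n_iter):
        early = it < exaggeration_iters
        P_eff = P * exaggeration if early else P  # early exaggeration
        momentum = 0.5 if early else 0.8
        _, grad = kl_and_grad(P_eff, Y)
        velocity = momentum * velocity - lr * grad
        Y = Y + velocity
        Y -= Y.mean(axis=0)  # keep the embedding centered
    kl_end, _ = kl_and_grad(P, Y)
    return Y, kl_start, kl_end

rng = np.random.default_rng(0)
# Toy data (assumed): three well-separated blobs of 20 points in 10 dimensions.
centers = rng.normal(scale=6.0, size=(3, 10))
X = np.vstack([c + rng.normal(size=(20, 10)) for c in centers])

P = simple_joint_probabilities(X)
Y, kl_start, kl_end = fit_tsne(P)
print(f"KL before: {kl_start:.3f}, after: {kl_end:.3f}")
```

The KL values are always measured against the unexaggerated `P`; the exaggerated matrix no longer sums to 1, so a divergence computed from it would not be comparable across phases.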

---

### **6. Output Low-Dimensional Embedding**

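* After the optimizer converges or reaches its iteration limit, the result is a **2D or 3D representation** $y_i$ of every input point.
* The embedding shows clusters and local neighborhoods. Cluster sizes and the distances between clusters are not reliable, so avoid reading too much into them.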
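
In practice the whole pipeline is usually run through a library. The sketch below uses scikit-learn's `StandardScaler`, `PCA`, and `TSNE`; the synthetic data, the 30 PCA components, and the perplexity of 30 are assumptions for the example, not recommendations.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Toy data (assumed): three blobs of 100 points each in 50 dimensions.
centers = rng.normal(scale=8.0, size=(3, 50))
X = np.vstack([c + rng.normal(size=(100, 50)) for c in centers])
labels = np.repeat(np.arange(3), 100)

X_std = StandardScaler().fit_transform(X)                           # stage 1: scale
X_red = PCA(n_components=30, random_state=0).fit_transform(X_std)   # stage 1: optional PCA
tsne = TSNE(n_components=2, perplexity=30.0, init="pca", random_state=0)
Y = tsne.fit_transform(X_red)                                       # stages 2 to 6

print(Y.shape)  # (300, 2)
```

Fixing `random_state` makes the run repeatable; a different seed or perplexity can produce a visibly different layout of the same data.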

---

### **7. Optional Post-Processing**

* Color points by labels (if available) for visualization.
* Annotate clusters, compute centroids, or overlay additional metadata.
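
Computing per-label centroids, for example to place text annotations, takes only a few lines. The embedding below is a hypothetical stand-in made of three 2D blobs, not real t-SNE output, and `label_centroids` is a name invented for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a t-SNE embedding (hypothetical data).
labels = np.repeat(np.arange(3), 50)
blob_centers = np.array([[-20.0, 0.0], [20.0, 0.0], [0.0, 30.0]])
Y = blob_centers[labels] + rng.normal(size=(150, 2))

def label_centroids(Y, labels):
    """Mean embedding position for each label value."""
    return {int(k): Y[labels == k].mean(axis=0) for k in np.unique(labels)}

centroids = label_centroids(Y, labels)
for k, c in centroids.items():
    print(f"label {k}: centroid at ({c[0]:.1f}, {c[1]:.1f})")
```

The centroids are useful as label anchors in a scatter plot. They summarize positions in the embedding only and say nothing about distances in the original feature space.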

---

**Workflow Summary**

1. **Data preprocessing** → standardization, optional PCA
2. **Compute high-D similarities** → probabilities $p_{ij}$
3. **Initialize low-D embedding** → random or PCA
4. **Compute low-D similarities** → $q_{ij}$
5. **Minimize KL divergence** → iterative gradient descent
6. **Output embedding** → 2D/3D for visualization
7. **Post-processing / visualization** → annotate, color, interpret

---

**Intuition**

* t-SNE is like **folding a high-dimensional sheet of data into 2D**:

  * Keep neighbors close
  * Push distant points away
  * The heavy-tailed distribution keeps clusters from collapsing onto one another
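
To see what "heavy-tailed" means numerically, the sketch below compares an unnormalized Gaussian kernel with the Student-t kernel at a few distances. The unit bandwidth ($2\sigma^2 = 1$) and the sample distances are assumptions for illustration.

```python
import numpy as np

def gaussian_kernel(d):
    """Unnormalized Gaussian similarity at distance d (2 * sigma**2 = 1 assumed)."""
    return np.exp(-d ** 2)

def student_t_kernel(d):
    """Unnormalized Student-t similarity with 1 degree of freedom."""
    return 1.0 / (1.0 + d ** 2)

for d in (0.5, 1.0, 3.0, 10.0):
    g, t = gaussian_kernel(d), student_t_kernel(d)
    print(f"d={d:>4}: gaussian={g:.3e}  student_t={t:.3e}")
```

The Gaussian value falls off exponentially in the squared distance, while the Student-t value decays only polynomially. A pair with a small high-dimensional similarity can therefore be matched by a large low-dimensional distance, which is what gives separate clusters room to spread out.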
