# Embedding Visualization

## Overview

This notebook demonstrates how to generate embeddings, optimize them, and visualize them with three dimensionality-reduction techniques: t-SNE, PCA, and UMAP.

### Learning Objectives

- Generate embeddings for documents
- Optimize embeddings for better quality
- Visualize embeddings using t-SNE
- Visualize embeddings using PCA
- Visualize embeddings using UMAP

---

## Workflow

**Generate Embeddings → Optimize → Visualize**

Projecting embeddings down to two dimensions makes it easier to see the structure of the embedding space: which documents sit close together and whether they form clusters.

---

## Step 1: Generate Embeddings

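Start by generating embeddings for your documents. If generation fails, the cell falls back to random vectors so that the rest of the notebook still runs. Those vectors carry no semantic meaning, so any structure that appears in the later plots is noise.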


In [None]:
from semantica.embeddings import EmbeddingGenerator
import numpy as np

documents = [
    "Machine learning algorithms",
    "Deep neural networks",
    "Natural language processing",
    "Computer vision",
    "Reinforcement learning",
]

generator = EmbeddingGenerator()

try:
    embeddings = generator.generate(documents)
    print("✓ Embeddings generated")
    print(f"  Documents: {len(documents)}")
    print(f"  Embedding dimension: {embeddings.shape[1] if hasattr(embeddings, 'shape') else 'N/A'}")
    
except Exception as e:
    print(f"✗ Error generating embeddings: {e}")
    embeddings = np.random.rand(len(documents), 1536).astype(np.float32)
    print("  Using demo embeddings")
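
Before moving on, it can help to sanity-check the embeddings by looking at how similar the documents are to one another. The following is a minimal sketch using only NumPy, not part of the `semantica` API: `cosine_similarity_matrix` is a hypothetical helper, and `sample_embeddings` is random stand-in data with the same shape as the demo fallback above.

```python
import numpy as np


def cosine_similarity_matrix(vectors: np.ndarray) -> np.ndarray:
    """Return the pairwise cosine similarity between the rows of `vectors`."""
    vectors = np.asarray(vectors, dtype=np.float64)
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    # Clip the norms so that an all-zero row does not cause a division by zero.
    unit = vectors / np.clip(norms, 1e-12, None)
    return unit @ unit.T


# Random stand-in data (assumption): 5 documents, 1536 dimensions.
rng = np.random.default_rng(0)
sample_embeddings = rng.random((5, 1536)).astype(np.float32)

similarity = cosine_similarity_matrix(sample_embeddings)
print(similarity.shape)
print(np.round(similarity, 2))
```

With real embeddings, related documents should score noticeably higher than unrelated ones. With the random fallback, all off-diagonal values come out roughly equal, which is a quick way to tell that the demo vectors are in use.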


## Step 2: Optimize Embeddings

Optimize the embeddings to improve their quality and reduce noise. If optimization fails, the cell falls back to the original embeddings.


In [None]:
from semantica.embeddings import EmbeddingOptimizer

optimizer = EmbeddingOptimizer()

try:
    optimized_embeddings = optimizer.optimize(embeddings)
    print("✓ Embeddings optimized")
    print("  Optimized embeddings ready for visualization")
    
except Exception as e:
    print(f"✗ Error optimizing embeddings: {e}")
    optimized_embeddings = embeddings
    print("  Using original embeddings")
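
The notebook does not say what `EmbeddingOptimizer` does internally. As an illustration of what an embedding post-processing step can look like, here is a sketch of one common recipe: mean-centering followed by L2 normalization. This is an assumption for illustration only, not a description of the `semantica` implementation, and `center_and_normalize` is a hypothetical helper.

```python
import numpy as np


def center_and_normalize(vectors: np.ndarray) -> np.ndarray:
    """Subtract the mean vector, then scale every row to unit L2 norm."""
    vectors = np.asarray(vectors, dtype=np.float64)
    # Remove the component shared by all embeddings.
    centered = vectors - vectors.mean(axis=0, keepdims=True)
    norms = np.linalg.norm(centered, axis=1, keepdims=True)
    # Clip the norms so that an all-zero row does not cause a division by zero.
    return centered / np.clip(norms, 1e-12, None)


# Random stand-in data (assumption): 5 documents, 1536 dimensions.
rng = np.random.default_rng(0)
raw = rng.random((5, 1536))

processed = center_and_normalize(raw)
print(processed.shape)
print(np.linalg.norm(processed, axis=1))
```

After this step every row has unit length, so dot products between rows are cosine similarities of the centered vectors.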


## Step 3: Visualize with t-SNE

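Use t-SNE (t-Distributed Stochastic Neighbor Embedding) to visualize embeddings in 2D space. t-SNE focuses on preserving local neighborhoods, so distances between well-separated clusters in the plot are not reliable.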


In [None]:
from semantica.visualization import EmbeddingVisualizer

visualizer = EmbeddingVisualizer()

labels = [f"Doc {i+1}" for i in range(len(documents))]

try:
    visualizer.visualize_tsne(optimized_embeddings, labels)
    print("✓ t-SNE visualization complete")
    print("  Note: t-SNE shows local structure and clusters in embedding space")
    
except Exception as e:
    print(f"✗ Error visualizing with t-SNE: {e}")
    print("  Note: t-SNE reduces high-dimensional embeddings to 2D for visualization")
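
To show what a t-SNE reduction involves, here is a minimal sketch that calls scikit-learn's `TSNE` directly. It assumes scikit-learn is installed and uses random stand-in data; how `visualize_tsne` is implemented is not covered by this notebook. Note the perplexity setting: recent scikit-learn versions require it to be smaller than the number of samples, so the default of 30 does not work with five documents.

```python
import numpy as np
from sklearn.manifold import TSNE

# Random stand-in data (assumption): 5 documents, 1536 dimensions.
rng = np.random.default_rng(0)
sample_embeddings = rng.random((5, 1536)).astype(np.float32)

# Keep perplexity below the number of samples; with 5 documents this gives 2.0.
perplexity = min(30.0, (len(sample_embeddings) - 1) / 2)

tsne = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=0)
coords_2d = tsne.fit_transform(sample_embeddings)

print(coords_2d.shape)
```

Fixing `random_state` makes the layout reproducible between runs. Different seeds or perplexity values can produce visibly different plots, so it is worth trying a few before drawing conclusions.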


## Step 4: Visualize with PCA

Use PCA (Principal Component Analysis) to visualize embeddings. PCA is a linear projection onto the directions of greatest variance, so it tends to preserve the global structure of the data.


In [None]:
try:
    visualizer.visualize_pca(optimized_embeddings, labels)
    print("✓ PCA visualization complete")
    print("  Note: PCA preserves global structure and variance")
    
except Exception as e:
    print(f"✗ Error visualizing with PCA: {e}")
    print("  Note: PCA reduces dimensions while preserving maximum variance")
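
The projection PCA performs can be sketched in a few lines of NumPy using the singular value decomposition. This is an illustrative sketch, not the code behind `visualize_pca`; `pca_project` is a hypothetical helper and the input is random stand-in data.

```python
import numpy as np


def pca_project(vectors: np.ndarray, n_components: int = 2):
    """Project rows of `vectors` onto their top principal components.

    Returns the projected coordinates and the fraction of total variance
    explained by each kept component.
    """
    vectors = np.asarray(vectors, dtype=np.float64)
    centered = vectors - vectors.mean(axis=0, keepdims=True)
    # Rows of `vt` are the principal directions, ordered by decreasing
    # singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    coords = centered @ vt[:n_components].T
    explained = singular_values**2 / np.sum(singular_values**2)
    return coords, explained[:n_components]


# Random stand-in data (assumption): 5 documents, 1536 dimensions.
rng = np.random.default_rng(0)
sample_embeddings = rng.random((5, 1536))

coords_2d, explained_ratio = pca_project(sample_embeddings)
print(coords_2d.shape)
print(np.round(explained_ratio, 3))
```

The explained-variance ratios indicate how much of the spread in the data the 2D plot actually captures. If the first two components explain only a small fraction, the plot hides most of the structure.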


## Step 5: Visualize with UMAP

Use UMAP (Uniform Manifold Approximation and Projection) for a balance between local and global structure.


In [None]:
try:
    visualizer.visualize_umap(optimized_embeddings, labels)
    print("✓ UMAP visualization complete")
    print("  Note: UMAP balances local and global structure preservation")
    print("\n✓ Embedding visualization complete")
    print("  Each visualization method highlights a different aspect of the embedding space")
    
except Exception as e:
    print(f"✗ Error visualizing with UMAP: {e}")
    print("  Note: UMAP aims to preserve both local and global structure")
