---
title: "Part 4: Universal Representations"
jupyter: python3
execute:
    enabled: true
    cache: true
---

::: {.callout-note title="What you'll learn in this section"}
The representational insight extends far beyond text. Images, graphs, molecules, time series, and even entire systems can be embedded in continuous space where structure becomes geometry. We'll explore how this universal framework unifies machine learning across domains.
:::

## The Representational Paradigm

Word2vec revealed something profound. You don't need to know what things are. You only need to observe how they relate. This principle transcends language. Any structured system where entities co-occur, interact, or connect can be embedded in vector space.

The recipe is consistent across domains. First, define what "context" means for your data type. For words, context is nearby words. For images, context might be nearby pixels or visual features. For nodes in a network, context is neighboring nodes. Second, learn representations that preserve this contextual structure. Entities with similar contexts get similar embeddings. Third, use the geometry of the resulting space to reason about relationships.

This is the representational paradigm. Instead of building task-specific models (classifiers, detectors, predictors), we first learn universal representations of the input space, then solve downstream tasks by operating on those representations. The representation becomes the shared foundation.
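The three-step recipe can be made concrete with a minimal sketch. It is not word2vec: it swaps learned embeddings for raw co-occurrence counts, which is about the simplest representation that still places entities with similar contexts close together. The toy corpus, the `window` size, and the function names are assumptions made for illustration.

```python
from collections import Counter, defaultdict
import math

# Toy "sentences" (hypothetical data, chosen so that cat and dog share contexts)
corpus = [
    ["cat", "chases", "mouse"],
    ["dog", "chases", "ball"],
    ["cat", "eats", "fish"],
    ["dog", "eats", "meat"],
    ["car", "needs", "fuel"],
    ["truck", "needs", "fuel"],
]

def cooccurrence_vectors(sequences, window=1):
    """Steps 1 and 2: define context as neighbors within `window`,
    then represent each entity by its context counts."""
    vectors = defaultdict(Counter)
    for seq in sequences:
        for i, center in enumerate(seq):
            lo, hi = max(0, i - window), min(len(seq), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[center][seq[j]] += 1
    return vectors

def cosine(u, v):
    """Step 3: compare entities geometrically (cosine of sparse count vectors)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

vectors = cooccurrence_vectors(corpus)
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])
sim_cat_car = cosine(vectors["cat"], vectors["car"])
print(f"cat~dog: {sim_cat_dog:.2f}, cat~car: {sim_cat_car:.2f}")
```

In this toy corpus "cat" and "dog" never appear together, yet they end up with matching vectors because they share the contexts "chases" and "eats". A learned method would replace the count vectors with dense ones, but the logic of comparing contexts stays the same.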

In [None]:
#| fig-cap: 'The representational paradigm: map diverse inputs to a shared geometric space.'
#| label: fig-representation-paradigm
#| code-fold: true

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.patches import FancyBboxPatch, FancyArrowPatch, Circle

fig, ax = plt.subplots(figsize=(14, 10))

# Input modalities (left side)
input_types = [
    ('TEXT', 'dog runs fast', 4.5),
    ('IMAGE', '🖼️', 3.0),
    ('GRAPH', '●—●—●', 1.5),
    ('AUDIO', '〰️〰️〰️', 0),
]

for label, example, y in input_types:
    # Input box
    box = FancyBboxPatch((0, y - 0.3), 2, 0.6,
                         boxstyle="round,pad=0.1",
                         facecolor='#ecf0f1', edgecolor='black', linewidth=2)
    ax.add_patch(box)
    ax.text(0.3, y + 0.15, label, fontsize=11, fontweight='bold')
    ax.text(0.3, y - 0.15, example, fontsize=9, style='italic')

    # Arrow to embedding space
    arrow = FancyArrowPatch((2.1, y), (4.5, 2.25),
                          arrowstyle='->', mutation_scale=20, linewidth=2,
                          color='#3498db', alpha=0.7)
    ax.add_patch(arrow)
    ax.text(3.3, y + 0.5 if y > 2.25 else y - 0.5, 'Encode',
           fontsize=9, style='italic', color='#3498db')

# Central embedding space
circle = Circle((7, 2.25), 2.5, facecolor='#3498db', alpha=0.2,
               edgecolor='#3498db', linewidth=3)
ax.add_patch(circle)

# Points in embedding space
np.random.seed(42)
n_points = 40
angles = np.random.rand(n_points) * 2 * np.pi
radii = np.random.rand(n_points) * 2.2
x_points = 7 + radii * np.cos(angles)
y_points = 2.25 + radii * np.sin(angles)

ax.scatter(x_points, y_points, s=60, c='#2c3e50', alpha=0.6,
          edgecolors='white', linewidth=0.5)

ax.text(7, 2.25, 'VECTOR\nSPACE', ha='center', va='center',
       fontsize=14, fontweight='bold', color='#2c3e50',
       bbox=dict(boxstyle='round', facecolor='white', alpha=0.9))

# Downstream tasks (right side)
tasks = [
    ('Classification', 4.5),
    ('Similarity Search', 3.5),
    ('Clustering', 2.5),
    ('Analogy', 1.5),
    ('Generation', 0.5),
    ('Transfer Learning', -0.3),
]

for task, y in tasks:
    # Arrow from embedding space
    arrow = FancyArrowPatch((9.6, 2.25), (11.5, y),
                          arrowstyle='->', mutation_scale=20, linewidth=2,
                          color='#e74c3c', alpha=0.7)
    ax.add_patch(arrow)

    # Task box
    box = FancyBboxPatch((11.5, y - 0.25), 2.5, 0.5,
                         boxstyle="round,pad=0.05",
                         facecolor='#ffe6e6', edgecolor='#e74c3c', linewidth=2)
    ax.add_patch(box)
    ax.text(12.75, y, task, ha='center', va='center',
           fontsize=10, fontweight='bold')

ax.set_xlim(-0.5, 14.5)
ax.set_ylim(-1, 5.5)
ax.axis('off')
ax.set_title('The Representational Paradigm: One Space, Many Tasks',
            fontsize=18, fontweight='bold', pad=20)

plt.tight_layout()
plt.show()

## Graph Embeddings: Node2Vec

Networks present a natural extension of word2vec's logic. Words appear in sequences. Nodes appear in neighborhoods. If we can generate sequences of nodes by walking through the network, we can apply the same contrastive learning approach.

Node2Vec does exactly this. It performs random walks on a graph, treating the sequence of visited nodes like a sentence. A walk might go: NodeA → NodeB → NodeC → NodeD. This generates training pairs: (NodeB, NodeA), (NodeB, NodeC), and so on. The model learns embeddings where nodes with similar neighborhoods end up nearby in vector space.
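A minimal sketch of the walk-to-pairs step, assuming a hypothetical toy graph stored as an adjacency list and a context window of one node on each side. It uses uniform random walks (the DeepWalk-style special case) and stops at pair generation; training a skip-gram model on the pairs is omitted.

```python
import random

# Hypothetical undirected toy graph as an adjacency list
graph = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B"],
    "D": ["B"],
}

def random_walk(graph, start, length, rng):
    """Uniform random walk: at each step move to a random neighbor."""
    walk = [start]
    while len(walk) < length:
        walk.append(rng.choice(graph[walk[-1]]))
    return walk

def skipgram_pairs(walk, window=1):
    """Turn a walk into (center, context) training pairs, as for a sentence."""
    pairs = []
    for i, center in enumerate(walk):
        lo, hi = max(0, i - window), min(len(walk), i + window + 1)
        pairs.extend((center, walk[j]) for j in range(lo, hi) if j != i)
    return pairs

rng = random.Random(0)  # fixed seed so the walk is reproducible
walk = random_walk(graph, "A", length=6, rng=rng)
pairs = skipgram_pairs(walk)
print(walk)
print(pairs[:4])
```

For the walk A, B, C, D and a window of one, `skipgram_pairs` yields (A, B), (B, A), (B, C), (C, B), (C, D), (D, C), which includes the two pairs named in the text.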

::: {.column-margin}
Node2Vec (Grover & Leskovec, 2016) extends DeepWalk by introducing biased random walks that interpolate between breadth-first and depth-first exploration, capturing different notions of network similarity.
:::

In [None]:
#| fig-cap: Random walks on graphs generate sequences that reveal structural similarity.
#| label: fig-node2vec
#| code-fold: true

import matplotlib.pyplot as plt
import networkx as nx
import numpy as np

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 7))

# Create a sample network
G = nx.karate_club_graph()
pos = nx.spring_layout(G, seed=42)

# Left: Show random walk
walk_path = [0, 1, 2, 3, 13, 33, 32, 23]

# Draw all nodes and edges
nx.draw_networkx_edges(G, pos, alpha=0.2, ax=ax1)
# Note: draw_networkx_nodes takes `linewidths` (plural), not `linewidth`
nx.draw_networkx_nodes(G, pos, node_color='#ecf0f1', node_size=300,
                      edgecolors='black', linewidths=1, ax=ax1)

# Highlight walk path
walk_edges = [(walk_path[i], walk_path[i+1]) for i in range(len(walk_path)-1)]
nx.draw_networkx_edges(G, pos, edgelist=walk_edges, edge_color='#e74c3c',
                      width=3, arrows=True, arrowsize=20,
                      arrowstyle='->', ax=ax1)

# Highlight walked nodes
nx.draw_networkx_nodes(G, pos, nodelist=walk_path, node_color='#3498db',
                      node_size=400, edgecolors='black', linewidths=2, ax=ax1)

# Add labels to walked nodes
walk_labels = {node: f'{i}' for i, node in enumerate(walk_path)}
nx.draw_networkx_labels(G, pos, labels=walk_labels, font_color='white',
                       font_weight='bold', font_size=9, ax=ax1)

ax1.set_title('Random Walk: NodeA → NodeB → NodeC → ...',
             fontsize=14, fontweight='bold')
ax1.axis('off')

# Add walk sequence annotation
walk_seq = ' → '.join([f'N{i}' for i in range(len(walk_path))])
# y is in axes-fraction units, so -0.05 places the text just below the plot
ax1.text(0.5, -0.05, f'Walk sequence: {walk_seq}',
        ha='center', transform=ax1.transAxes, fontsize=10,
        bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.7))

# Right: Embedding space
# Simulate embeddings where structurally similar nodes cluster
np.random.seed(42)

# Identify communities (roughly)
club1 = [n for n, d in G.nodes(data=True) if d['club'] == 'Mr. Hi']
club2 = [n for n, d in G.nodes(data=True) if d['club'] == 'Officer']

# Generate embeddings (2D projection)
embeddings = {}
for node in club1:
    embeddings[node] = np.random.randn(2) * 0.5 + np.array([-1.5, 0])
for node in club2:
    embeddings[node] = np.random.randn(2) * 0.5 + np.array([1.5, 0])

# Plot embeddings
for node in G.nodes():
    x, y = embeddings[node]
    color = '#3498db' if node in club1 else '#e74c3c'
    alpha = 0.9 if node in walk_path else 0.3
    size = 400 if node in walk_path else 200

    ax2.scatter(x, y, s=size, c=color, alpha=alpha,
               edgecolors='black', linewidth=1.5 if node in walk_path else 1,
               zorder=3 if node in walk_path else 2)

    if node in walk_path:
        ax2.text(x, y, f'{walk_path.index(node)}', ha='center', va='center',
                fontsize=9, fontweight='bold', color='white')

# Add cluster annotations
ax2.text(-1.5, 2, 'Community 1', fontsize=12, fontweight='bold',
        ha='center', bbox=dict(boxstyle='round', facecolor='lightblue', alpha=0.7))
ax2.text(1.5, 2, 'Community 2', fontsize=12, fontweight='bold',
        ha='center', bbox=dict(boxstyle='round', facecolor='lightcoral', alpha=0.7))

ax2.set_xlim(-3.5, 3.5)
ax2.set_ylim(-2.5, 2.5)
ax2.set_xlabel('Embedding Dimension 1', fontsize=11)
ax2.set_ylabel('Embedding Dimension 2', fontsize=11)
ax2.set_title('Learned Embeddings: Structural Similarity → Spatial Proximity',
             fontsize=14, fontweight='bold')
ax2.grid(alpha=0.3, linestyle='--')

plt.tight_layout()
plt.show()

The beautiful part is the flexibility in what counts as "similar". Two nodes might be similar because they are directly connected or share a community (local similarity). Or they might be similar because they play equivalent roles in different parts of the network (structural equivalence). Node2Vec can emphasize either. Its return parameter p and in-out parameter q bias each walk toward staying close to where it has just been or toward exploring farther away, so you can emphasize different notions of similarity without changing the core algorithm.

This dissolves the boundary between topology and geometry. The network's structure becomes a coordinate system. Questions about network properties become questions about spatial relationships in the embedding space.
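The biased walk can be sketched as follows. The weighting rule is the one described in the Node2Vec paper for unweighted graphs, controlled by a return parameter `p` and an in-out parameter `q`. The toy graph and the function names are hypothetical, and the code assumes `length >= 2`.

```python
import random

# Hypothetical undirected toy graph: node -> set of neighbors
graph = {
    "t": {"v", "a"},
    "v": {"t", "a", "b"},
    "a": {"t", "v"},
    "b": {"v", "c"},
    "c": {"b"},
}

def transition_weights(graph, prev, curr, p, q):
    """Unnormalized weights for leaving `curr` after arriving from `prev`."""
    weights = {}
    for nxt in sorted(graph[curr]):
        if nxt == prev:
            weights[nxt] = 1.0 / p   # step back to where we came from
        elif nxt in graph[prev]:
            weights[nxt] = 1.0       # stay within one hop of prev
        else:
            weights[nxt] = 1.0 / q   # move farther away from prev
    return weights

def biased_walk(graph, start, length, p, q, rng):
    """Second-order random walk; the first step is uniform."""
    walk = [start, rng.choice(sorted(graph[start]))]
    while len(walk) < length:
        w = transition_weights(graph, walk[-2], walk[-1], p, q)
        nodes = list(w)
        walk.append(rng.choices(nodes, weights=[w[n] for n in nodes])[0])
    return walk

weights = transition_weights(graph, prev="t", curr="v", p=2.0, q=0.5)
walk = biased_walk(graph, "t", length=8, p=2.0, q=0.5, rng=random.Random(0))
print(weights)
print(walk)
```

With `q` below 1 the walk is pushed outward, which resembles depth-first exploration. With `q` above 1 it lingers near the previous node, which resembles breadth-first exploration. A large `p` discourages immediately stepping back.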

## Image Embeddings: Learning Visual Representations

Images pose a different challenge. Pixels are not discrete tokens, so raw pixels don't appear in "context" the way words do. Instead, convolutional neural networks (CNNs) learn hierarchical representations in which each layer captures increasingly abstract visual features. The activations of the final layer before the classifier serve as a vector embedding of the image.

These embeddings share many geometric properties with word embeddings. Images of similar objects cluster together. Directions in the space can also carry meaning: the vector from "cat" images to "tiger" images may roughly parallel the vector from "dog" images to "wolf" images (the "wild version" relationship). To the extent that such regularities hold, you can do visual analogy through vector arithmetic.
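A minimal sketch of analogy by vector arithmetic. The three-dimensional embeddings below are hand-written and hypothetical, arranged so that the third axis encodes "wildness". Embeddings from a trained network are much higher-dimensional and noisier, so such analogies hold only approximately.

```python
import math

# Hypothetical 3-D embeddings, hand-written so that axis 3 is the "wild" direction
embeddings = {
    "cat":   (1.0, 0.0, 0.0),
    "tiger": (1.0, 0.0, 1.0),
    "dog":   (0.0, 1.0, 0.0),
    "wolf":  (0.0, 1.0, 1.0),
    "car":   (-1.0, -1.0, 0.0),
}

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def analogy(a, b, c, embeddings):
    """Answer 'a is to b as c is to ?' via the nearest neighbor of b - a + c."""
    query = tuple(
        vb - va + vc
        for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])
    )
    # Exclude the three input items, as is conventional for analogy queries
    candidates = {k: v for k, v in embeddings.items() if k not in (a, b, c)}
    return max(candidates, key=lambda k: cosine(query, candidates[k]))

answer = analogy("cat", "tiger", "dog", embeddings)
print(answer)
```

Here `tiger - cat + dog` lands exactly on the hand-placed "wolf" vector. With real embeddings the query point only lands near the answer, which is why the lookup uses nearest-neighbor search rather than exact equality.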

::: {.column-margin}
Modern vision models like CLIP learn joint embeddings of images and text, creating a shared semantic space where "a photo of a cat" and actual cat photos occupy similar regions.
:::

In [None]:
#| fig-cap: CNN embeddings map images to vectors where visual similarity becomes geometric proximity.
#| label: fig-image-embeddings
#| code-fold: true

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.patches import Rectangle, FancyArrowPatch
from matplotlib.offsetbox import OffsetImage, AnnotationBbox

fig, ax = plt.subplots(figsize=(14, 10))

# Simulate image embeddings in 2D
np.random.seed(42)

categories = {
    'Cats': {'center': np.array([-2, 2]), 'n': 8, 'color': '#e74c3c'},
    'Dogs': {'center': np.array([2, 2]), 'n': 8, 'color': '#3498db'},
    'Cars': {'center': np.array([-2, -2]), 'n': 8, 'color': '#f39c12'},
    'Trees': {'center': np.array([2, -2]), 'n': 8, 'color': '#27ae60'},
}

# Generate embeddings
for category, config in categories.items():
    center = config['center']
    n = config['n']
    color = config['color']

    # Generate points around center
    embeddings = np.random.randn(n, 2) * 0.4 + center

    # Plot
    ax.scatter(embeddings[:, 0], embeddings[:, 1], s=300, c=color,
              alpha=0.6, edgecolors='black', linewidth=1.5, label=category)

    # Add category label
    ax.text(center[0], center[1] + 1.3, category, ha='center',
           fontsize=13, fontweight='bold',
           bbox=dict(boxstyle='round', facecolor=color, alpha=0.3))

# Draw an arrow from the cat cluster toward the dog cluster
# to mark that related categories sit close together
cat_center = categories['Cats']['center']
dog_center = categories['Dogs']['center']

# Add annotations for key relationships
ax.annotate('', xy=(cat_center[0] + 2, cat_center[1] + 0.2),
           xytext=(cat_center[0], cat_center[1]),
           arrowprops=dict(arrowstyle='->', lw=2.5, color='purple', alpha=0.7))
ax.text(0, 2.8, 'Semantic\nSimilarity', ha='center', fontsize=10,
       style='italic', color='purple')

# Visual feature space axes
ax.annotate('', xy=(3.5, 0), xytext=(-3.5, 0),
           arrowprops=dict(arrowstyle='<->', lw=2, color='gray', alpha=0.5))
ax.text(0, -4, '← Natural / Artificial →', ha='center', fontsize=11,
       fontweight='bold', color='gray')

ax.annotate('', xy=(0, 3.5), xytext=(0, -3.5),
           arrowprops=dict(arrowstyle='<->', lw=2, color='gray', alpha=0.5))
ax.text(-4, 0, '← Living / Non-living →', ha='center', fontsize=11,
       fontweight='bold', color='gray', rotation=90)

ax.set_xlim(-4.5, 4.5)
ax.set_ylim(-4.5, 4.5)
ax.set_aspect('equal')
ax.legend(loc='upper left', fontsize=11, framealpha=0.9)
ax.grid(alpha=0.3, linestyle='--')
ax.set_title('Image Embeddings: Visual Concepts as Geometric Clusters',
            fontsize=16, fontweight='bold', pad=20)

plt.tight_layout()
plt.show()

What's remarkable is that these representations are learned without explicit supervision about semantic relationships. The network is trained to classify images (cat vs. dog vs. car vs. tree). The embedding space emerges as a byproduct. The geometric structure of visual similarity is discovered, not programmed.

This is representation learning's power. The same trained network can support classification, similarity search, image retrieval, and transfer learning, all because the learned embeddings capture meaningful structure.
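To illustrate how one set of embeddings can serve several tasks, here is a minimal sketch that runs similarity search and classification over the same vectors. The 2-D points are hypothetical stand-ins for CNN features, and nearest-centroid classification is just one simple choice of downstream classifier.

```python
import math

# Hypothetical 2-D image embeddings with labels (stand-ins for CNN features)
gallery = [
    ((-2.1, 2.0), "cat"),
    ((-1.9, 2.2), "cat"),
    ((2.0, 1.9), "dog"),
    ((2.2, 2.1), "dog"),
]

def nearest(query, gallery, k=1):
    """Similarity search: the k gallery items closest to the query."""
    return sorted(gallery, key=lambda item: math.dist(query, item[0]))[:k]

def nearest_centroid(query, gallery):
    """Classification: pick the label whose mean embedding is closest."""
    groups = {}
    for vec, label in gallery:
        groups.setdefault(label, []).append(vec)
    centroids = {
        label: tuple(sum(xs) / len(xs) for xs in zip(*vecs))
        for label, vecs in groups.items()
    }
    return min(centroids, key=lambda label: math.dist(query, centroids[label]))

query = (-1.8, 1.7)
top_match = nearest(query, gallery)[0]
predicted = nearest_centroid(query, gallery)
print(top_match, predicted)
```

Neither function knows anything about images. Both operate purely on distances, which is why the same embedding can be reused across tasks.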

## Molecules, Music, and More

The technique generalizes to any structured domain. Molecular embeddings represent chemical compounds as vectors based on their structural properties and interactions. Similar molecules end up nearby, enabling drug discovery through geometric search. Musical embeddings capture harmonic and rhythmic relationships, making it possible to search for songs by similarity or generate variations.

Even proteins can be embedded by treating amino acid sequences like sentences and using transformer models (the same architecture behind GPT and BERT). The learned embeddings capture functional relationships. Proteins with similar functions cluster together, even if their sequences differ significantly.
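What "treating amino acid sequences like sentences" means in practice can be sketched with a tokenizer. The vocabulary layout, the reserved unknown id, and the example fragment are assumptions for illustration and do not reflect any particular protein language model. The `kmers` helper shows an alternative that uses overlapping k-mers as "words".

```python
# The 20 standard amino acids in one-letter code
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical vocabulary: id 0 is reserved for unknown residues
VOCAB = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}
UNK_ID = 0

def tokenize(sequence):
    """Map each residue to an integer id, one token per amino acid."""
    return [VOCAB.get(residue, UNK_ID) for residue in sequence.upper()]

def kmers(sequence, k=3):
    """Alternative 'words': overlapping k-mers of the sequence."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

seq = "MKTAYIAK"  # made-up fragment, not a real protein
token_ids = tokenize(seq)
words = kmers(seq)
print(token_ids)
print(words)
```

The integer ids would then index an embedding table at the input of a sequence model, in the same way word ids do for text.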

In [None]:
#| fig-cap: 'The representational framework applies across domains: from molecules to music to proteins.'
#| label: fig-diverse-embeddings
#| code-fold: true

import matplotlib.pyplot as plt
import numpy as np

fig, axes = plt.subplots(2, 2, figsize=(14, 12))
# Note: suptitle does not accept `pad` (that keyword belongs to Axes.set_title)
fig.suptitle('Universal Representations Across Domains',
            fontsize=18, fontweight='bold')

np.random.seed(42)

domains = [
    {
        'name': 'Molecular Embeddings',
        'categories': ['Antibiotics', 'Anti-inflammatory', 'Analgesics'],
        'colors': ['#e74c3c', '#3498db', '#f39c12'],
    },
    {
        'name': 'Music Embeddings',
        'categories': ['Classical', 'Jazz', 'Rock'],
        'colors': ['#9b59b6', '#1abc9c', '#e67e22'],
    },
    {
        'name': 'Protein Embeddings',
        'categories': ['Enzymes', 'Receptors', 'Transporters'],
        'colors': ['#16a085', '#c0392b', '#2980b9'],
    },
    {
        'name': 'User Embeddings',
        'categories': ['Tech enthusiasts', 'Sports fans', 'Book lovers'],
        'colors': ['#8e44ad', '#27ae60', '#d35400'],
    },
]

for idx, (ax, domain_info) in enumerate(zip(axes.flat, domains)):
    categories = domain_info['categories']
    colors = domain_info['colors']

    # Generate clusters
    for i, (category, color) in enumerate(zip(categories, colors)):
        angle = i * (2 * np.pi / len(categories))
        center = 2 * np.array([np.cos(angle), np.sin(angle)])

        # Generate points
        n_points = 15
        points = np.random.randn(n_points, 2) * 0.4 + center

        ax.scatter(points[:, 0], points[:, 1], s=150, c=color,
                  alpha=0.6, edgecolors='black', linewidth=1, label=category)

        # Add label
        label_offset = 1.3
        label_pos = center * label_offset
        ax.text(label_pos[0], label_pos[1], category, ha='center', va='center',
               fontsize=9, fontweight='bold',
               bbox=dict(boxstyle='round', facecolor=color, alpha=0.3))

    ax.set_xlim(-4, 4)
    ax.set_ylim(-4, 4)
    ax.set_aspect('equal')
    ax.set_title(domain_info['name'], fontsize=13, fontweight='bold')
    ax.grid(alpha=0.3, linestyle='--')
    ax.set_xticks([])
    ax.set_yticks([])

plt.tight_layout()
plt.show()

The pattern is consistent. Define what "context" means for your domain. Learn representations where similar contexts produce similar vectors. Use the geometry of the resulting space to solve tasks. The specific implementation details vary, but the philosophical insight remains: meaning is relational, structure emerges from contrast, and continuous representations dissolve artificial boundaries.

## Time Series as Trajectories

Time series present yet another perspective. Instead of embedding individual points, you can embed entire sequences as trajectories through latent space. Similar dynamic processes produce similar trajectories. You can measure similarity between time series by comparing their paths, cluster them by trajectory shape, or forecast by extrapolating the path.
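One simple way to turn a scalar series into a trajectory is a time-delay embedding, sketched below. This is a classical hand-crafted construction, swapped in here for a learned latent space. The `dim` and `lag` values and the path-length summary are arbitrary choices for illustration.

```python
import math

def delay_embed(series, dim=2, lag=1):
    """Map a scalar series to points (x[t], x[t+lag], ..., x[t+(dim-1)*lag])."""
    span = (dim - 1) * lag
    return [
        tuple(series[t + i * lag] for i in range(dim))
        for t in range(len(series) - span)
    ]

def path_length(trajectory):
    """Total distance travelled along the trajectory."""
    return sum(math.dist(a, b) for a, b in zip(trajectory, trajectory[1:]))

t = [0.1 * i for i in range(200)]
periodic = [math.sin(x) for x in t]
damped = [math.sin(x) * math.exp(-x / 10) for x in t]

traj_periodic = delay_embed(periodic, dim=2, lag=15)
traj_damped = delay_embed(damped, dim=2, lag=15)
print(len(traj_periodic), path_length(traj_periodic), path_length(traj_damped))
```

The periodic signal traces a closed loop while the damped one spirals inward, so even a crude summary such as path length separates the two dynamics.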

In [None]:
#| fig-cap: Time series as trajectories in latent space. Similar dynamics produce similar paths.
#| label: fig-time-series-embedding
#| code-fold: true

import matplotlib.pyplot as plt
import numpy as np

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 7))

# Generate time series
t = np.linspace(0, 4 * np.pi, 100)

# Different patterns
sine_wave = np.sin(t)
damped_sine = np.sin(t) * np.exp(-t / 10)
growing_sine = np.sin(t) * np.exp(t / 15)
linear = t / (4 * np.pi)

# Plot time series
ax1.plot(t, sine_wave, label='Periodic', linewidth=2, color='#3498db')
ax1.plot(t, damped_sine, label='Damped', linewidth=2, color='#e74c3c')
ax1.plot(t, growing_sine, label='Growing', linewidth=2, color='#27ae60')
ax1.plot(t, linear, label='Linear', linewidth=2, color='#f39c12')

ax1.set_xlabel('Time', fontsize=12, fontweight='bold')
ax1.set_ylabel('Value', fontsize=12, fontweight='bold')
ax1.set_title('Time Series in Temporal Domain', fontsize=14, fontweight='bold')
ax1.legend(fontsize=11)
ax1.grid(alpha=0.3, linestyle='--')

# Embed as trajectories in 2D latent space
# Simulate embeddings where similar dynamics cluster
ax2.plot([0, 2, 1, 0], [0, 1, 2, 0], marker='o', markersize=8,
        label='Periodic', linewidth=2.5, color='#3498db', alpha=0.7)
ax2.plot([0, 1, 0.5, 0.2], [0, 1, 1.2, 1], marker='o', markersize=8,
        label='Damped', linewidth=2.5, color='#e74c3c', alpha=0.7)
ax2.plot([0, 2, 3, 4], [0, 1, 2.5, 4], marker='o', markersize=8,
        label='Growing', linewidth=2.5, color='#27ae60', alpha=0.7)
ax2.plot([0, 1, 2, 3], [0, 0.3, 0.6, 0.9], marker='o', markersize=8,
        label='Linear', linewidth=2.5, color='#f39c12', alpha=0.7)

# Add arrows
for trajectory_data in [
    ([0, 2, 1, 0], [0, 1, 2, 0], '#3498db'),
    ([0, 1, 0.5, 0.2], [0, 1, 1.2, 1], '#e74c3c'),
    ([0, 2, 3, 4], [0, 1, 2.5, 4], '#27ae60'),
    ([0, 1, 2, 3], [0, 0.3, 0.6, 0.9], '#f39c12'),
]:
    x, y, color = trajectory_data
    for i in range(len(x) - 1):
        ax2.annotate('', xy=(x[i+1], y[i+1]), xytext=(x[i], y[i]),
                    arrowprops=dict(arrowstyle='->', lw=1.5, color=color, alpha=0.4))

ax2.set_xlabel('Latent Dimension 1', fontsize=12, fontweight='bold')
ax2.set_ylabel('Latent Dimension 2', fontsize=12, fontweight='bold')
ax2.set_title('Time Series as Trajectories in Latent Space',
             fontsize=14, fontweight='bold')
ax2.legend(fontsize=11, loc='upper left')
ax2.grid(alpha=0.3, linestyle='--')
ax2.set_xlim(-0.5, 4.5)
ax2.set_ylim(-0.5, 4.5)

plt.tight_layout()
plt.show()

This perspective transforms time series analysis. Instead of comparing raw signals (which are sensitive to noise, scaling, and temporal alignment), you compare their underlying dynamical patterns as captured by their embeddings. The boundary between "similar" and "different" becomes a matter of distance in latent space rather than a hard classification rule.
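The sensitivity of raw comparisons to scaling can be shown with a small sketch. Z-normalization stands in here as about the simplest possible "representation": it is hand-crafted rather than learned, and it removes only offset and scale, not noise or temporal misalignment.

```python
import math
import statistics

def z_normalize(series):
    """Remove offset and scale so only the shape of the signal remains."""
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    return [(x - mean) / std for x in series]

signal = [math.sin(0.1 * i) for i in range(100)]
rescaled = [3.0 * x + 5.0 for x in signal]  # same dynamics, different units

raw_distance = math.dist(signal, rescaled)
embedded_distance = math.dist(z_normalize(signal), z_normalize(rescaled))
print(f"raw: {raw_distance:.3f}, normalized: {embedded_distance:.2e}")
```

The two signals look far apart in raw form yet nearly coincide after normalization. A learned embedding aims for the same effect with respect to richer nuisance factors.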

## The Unified View

Step back and see the pattern. Across all these domains, we're doing the same thing:

1. Observe how entities relate (co-occur, connect, interact, transform).
2. Learn vector representations that preserve these relationships.
3. Map discrete symbols to continuous coordinates.
4. Dissolve artificial boundaries in favor of smooth gradients.
5. Answer questions geometrically rather than categorically.
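The contrast between a categorical and a geometric answer in the last two steps can be sketched as follows. The prototypes, the `temperature` parameter, and the softmax-over-distances rule are illustrative assumptions, not a method taken from the text.

```python
import math

# Hypothetical category prototypes in a 2-D embedding space
prototypes = {"A": (0.0, 0.0), "B": (2.0, 0.0)}

def hard_label(point, prototypes):
    """Categorical answer: the single nearest prototype."""
    return min(prototypes, key=lambda name: math.dist(point, prototypes[name]))

def soft_membership(point, prototypes, temperature=1.0):
    """Geometric answer: softmax over negative squared distances."""
    scores = {
        name: math.exp(-math.dist(point, proto) ** 2 / temperature)
        for name, proto in prototypes.items()
    }
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}

near_a = soft_membership((0.2, 0.0), prototypes)
boundary = soft_membership((1.0, 0.0), prototypes)
print(hard_label((0.9, 0.0), prototypes), near_a, boundary)
```

A point at (0.9, 0.0) receives the hard label "A" even though it sits almost on the boundary. The soft membership keeps that ambiguity visible as a gradient instead of hiding it behind a label.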

This is structuralism made computational. This is the insight that boundaries are observer-dependent, turned into working technology. This is the philosophy that meaning is relational, implemented as neural networks and optimization algorithms.

Modern machine learning increasingly operates in this representational paradigm. Large language models learn representations of language. Vision transformers learn representations of images. Graph neural networks learn representations of networks. Multimodal models learn shared representations across modalities. The representations themselves become the primary object of study.

## What We've Learned

We started with regional dialects and generational labels, noticing that the boundaries we draw are often arbitrary. We explored structuralist philosophy, learning that meaning emerges from networks of contrast rather than intrinsic essence. We saw how word2vec operationalizes these ideas through contrastive learning and vector embeddings. And we've now seen how this insight generalizes across many structured domains, from graphs and images to molecules, proteins, and time series.

The lesson is philosophical and practical. Philosophically, it reminds us that the categories we use are tools of analysis, not features of reality. The world is continuous. The boundaries are ours. Practically, it suggests that whenever we're tempted to force phenomena into discrete boxes, we should ask: could a continuous representation serve better?

The representational turn in machine learning isn't just about better algorithms. It's about better epistemology. It's about building systems that preserve the nuance, gradation, and relational structure that discrete labels destroy. It's about melting the boundaries we've artificially frozen, and letting the true structure of similarity and difference emerge from the data itself.

::: {.callout-tip title="Explore further"}
Think about your own research domain. What are the discrete categories you use? Could they be embedded in continuous space? What new questions become answerable when you work with gradients instead of boundaries? The representational perspective often reveals structures that categorical thinking obscures.
:::