# Layer-by-Layer Hidden State Analysis Tutorial

This notebook demonstrates how to use **The Loom** for layer-by-layer hidden state extraction and analysis.

## What You'll Learn

1. **Extracting hidden states from specific layers** (e.g., `[-1, -5, -11]`)
2. **Computing D_eff (effective dimensionality)** per layer
3. **Visualizing D_eff evolution** across transformer layers

## Prerequisites

- The Loom server running (`docker run -d --gpus all -p 8080:8080 tbucy/loom:latest`)
- Python packages: `numpy`, `matplotlib`, `httpx`

## Mathematical Background

**D_eff (Effective Dimensionality)** measures the intrinsic dimensionality of an embedding space:

- Computed via PCA eigenvalue analysis
- Returns the number of dimensions needed to capture 90% of variance
- Lower D_eff = more compressed representations
- Higher D_eff = richer, more diverse representations

*Reference: Whiteley et al. "Statistical exploration of the Manifold Hypothesis" (arXiv:2208.11665)*
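
As a rough illustration of the PCA-based definition above, the sketch below counts how many principal components are needed to reach 90% of the total variance in a set of embedding vectors. It is a minimal NumPy version written for this tutorial; the function name `pca_d_eff` is hypothetical, and this is not necessarily how `calculate_d_eff` in The Loom is implemented.

```python
import numpy as np


def pca_d_eff(embeddings: np.ndarray, variance_threshold: float = 0.90) -> int:
    """Count the principal components needed to reach the variance threshold.

    Hypothetical helper for illustration; expects shape (n_samples, n_dims).
    """
    x = np.asarray(embeddings, dtype=np.float64)
    if x.ndim != 2 or x.shape[0] < 2:
        raise ValueError("expected a 2D array with at least two samples")

    # Center the data; squared singular values of the centered matrix are
    # proportional to the eigenvalues of the covariance matrix.
    centered = x - x.mean(axis=0, keepdims=True)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    eigenvalues = singular_values ** 2

    total = eigenvalues.sum()
    if total == 0:
        return 1

    # Smallest k whose cumulative variance share reaches the threshold
    cumulative = np.cumsum(eigenvalues) / total
    k = int(np.searchsorted(cumulative, variance_threshold) + 1)
    return min(k, len(eigenvalues))


rng = np.random.default_rng(0)
# 200 samples in 64 dimensions that vary along only 3 latent directions
low_rank = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 64))
# 500 samples that vary roughly equally along all 10 dimensions
isotropic = rng.normal(size=(500, 10))

print("low-rank D_eff:", pca_d_eff(low_rank))
print("isotropic D_eff:", pca_d_eff(isotropic))
```

The low-rank data cannot need more than 3 components, while the isotropic data needs most of its 10, which matches the "compressed" versus "diverse" reading of D_eff.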

## 1. Setup

In [None]:
# Standard imports
import sys
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np

# Add parent directory for The Loom imports
sys.path.insert(0, str(Path.cwd().parent))

from src.client import LoomClient
from src.analysis.conveyance_metrics import calculate_d_eff, calculate_d_eff_detailed

In [None]:
# Configuration
LOOM_URL = "http://localhost:8080"
MODEL_ID = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
PROMPT = "The fundamental nature of reality is"

# Layers to analyze
# Negative indices count from the end: -1 = last layer, -5 = 5th from the end, -11 = 11th from the end
TARGETED_LAYERS = [-1, -5, -11]

print(f"Model: {MODEL_ID}")
print(f"Prompt: '{PROMPT}'")
print(f"Target Layers: {TARGETED_LAYERS}")

In [None]:
# Connect to The Loom server
client = LoomClient(base_url=LOOM_URL)

# Verify connection
try:
    health = client.health()
    print("Connected to The Loom!")
    print(f"Server Status: {health.get('status', 'unknown')}")
    print(f"GPU Available: {health.get('gpu', {}).get('available', False)}")
except Exception as e:
    print(f"Error connecting: {e}")
    print("\nMake sure The Loom is running:")
    print("  docker run -d --gpus all -p 8080:8080 tbucy/loom:latest")

## 2. Extracting Hidden States from Specific Layers

The Loom allows you to specify exactly which layers you want hidden states from using the `hidden_state_layers` parameter.

**Layer indexing:**
- `-1` = Last layer (final semantic representation)
- `-5` = 5th layer from the end
- `"all"` = Every layer in the model
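
Negative indices follow Python's list convention. The sketch below shows how they map to absolute positions; `resolve_layer_index` is a hypothetical helper, and the layer count of 22 is an assumed value used only for illustration. Whether the server counts the embedding output as a layer is not covered here.

```python
def resolve_layer_index(index: int, num_layers: int) -> int:
    """Map a possibly negative layer index to its absolute position."""
    if not -num_layers <= index < num_layers:
        raise IndexError(f"layer index {index} out of range for {num_layers} layers")
    # Python's modulo maps -1 to num_layers - 1, -2 to num_layers - 2, and so on
    return index % num_layers


num_layers = 22  # assumed layer count, for illustration only
for idx in [-1, -5, -11]:
    print(idx, "->", resolve_layer_index(idx, num_layers))
```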

In [None]:
# Extract hidden states from specific layers: [-1, -5, -11]
result = client.generate(
    model=MODEL_ID,
    prompt=PROMPT,
    max_tokens=20,
    return_hidden_states=True,
    hidden_state_layers=TARGETED_LAYERS,
)

print(f"Generated text: '{result['text']}'")
print(f"Token count: {result['token_count']}")
print(f"\nHidden states returned for layers: {list(result['hidden_states'].keys())}")

In [None]:
# Examine the hidden states
for layer_key, layer_data in result['hidden_states'].items():
    print(f"\nLayer {layer_key}:")
    print(f"  Shape: {layer_data['shape']}")
    print(f"  Dtype: {layer_data['dtype']}")
    
    # Quick stats on the hidden state vector
    data = np.array(layer_data['data'])
    print(f"  Mean: {data.mean():.4f}")
    print(f"  Std: {data.std():.4f}")
    print(f"  L2 Norm: {np.linalg.norm(data):.4f}")
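
The loop above reads three fields from each layer entry: `shape`, `dtype`, and `data`. If `data` arrives as a flat or nested list, it can help to reshape it explicitly so later code does not depend on the nesting. The sketch below uses a mock payload with that structure; `layer_to_array` is a hypothetical helper, and the mock values are made up.

```python
import numpy as np


def layer_to_array(layer_data: dict) -> np.ndarray:
    """Convert one hidden-state entry to an array with the reported shape."""
    values = np.asarray(layer_data["data"], dtype=np.float64)
    shape = tuple(layer_data["shape"])
    # Guard against a payload whose data does not match its declared shape
    if values.size != int(np.prod(shape)):
        raise ValueError(f"data has {values.size} values, shape {shape} expects otherwise")
    return values.reshape(shape)


# Mock entry mirroring the fields used in this notebook (values are made up)
mock_layer = {"shape": [1, 4], "dtype": "float32", "data": [0.5, -1.0, 2.0, 0.0]}

arr = layer_to_array(mock_layer)
print("shape:", arr.shape)
print("hidden size:", arr.shape[-1])
```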

## 3. Computing D_eff (Effective Dimensionality) Per Layer

D_eff tells us how many dimensions are "active" in a hidden state representation.

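The PCA-based definition needs many samples, so it cannot be applied to a single hidden state vector. For a single vector, we instead estimate D_eff with a magnitude-based proxy: the number of largest-magnitude components needed to reach 90% of the vector's total absolute magnitude.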
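
Written out, the single-vector estimate computed by the function in the next cell can be stated as follows. Let $|x|_{(1)} \ge |x|_{(2)} \ge \dots \ge |x|_{(n)}$ be the absolute values of the vector's components sorted in descending order, and let $\tau$ be the threshold (0.90 by default). The symbols $x$, $n$, and $\tau$ are notation introduced here for illustration.

$$
D_{\text{eff}}(x) = \min\left\{ k \;:\; \frac{\sum_{i=1}^{k} |x|_{(i)}}{\sum_{i=1}^{n} |x|_{(i)}} \ge \tau \right\}
$$

For example, with $x = (4, -3, 2, 1)$ the total magnitude is $10$ and the cumulative shares are $0.4, 0.7, 0.9, 1.0$, so $D_{\text{eff}}(x) = 3$ at $\tau = 0.9$. Note that this measures shares of absolute magnitude, not shares of variance as in the PCA-based definition.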

In [None]:
def compute_d_eff_single_vector(vector: np.ndarray, variance_threshold: float = 0.90) -> int:
    """Compute D_eff for a single hidden state vector.
    
    For a single vector, we estimate D_eff by treating each dimension
    as a sample and computing how many are needed to capture 90% of
    the total magnitude.
    
    Parameters:
        vector: The hidden state vector (1D numpy array)
        variance_threshold: Target cumulative contribution (default 0.90)
    
    Returns:
        Estimated effective dimensionality
    """
    # Flatten if needed
    vector = vector.flatten()
    
    # Sort by absolute value (descending)
    abs_values = np.abs(vector)
    sorted_vals = np.sort(abs_values)[::-1]
    
    # Compute cumulative contribution
    total = sorted_vals.sum()
    if total == 0:
        return 1
    
    cumsum = np.cumsum(sorted_vals) / total
    d_eff = int(np.searchsorted(cumsum, variance_threshold) + 1)
    
    return min(d_eff, len(vector))

# Test on the last layer
last_layer = result['hidden_states']['-1']
vector = np.array(last_layer['data'])
d_eff = compute_d_eff_single_vector(vector)

print("Layer -1 Analysis:")
# Use vector.size so the count is correct even if `data` is nested (2D)
print(f"  Hidden Size: {vector.size}")
print(f"  D_eff (90%): {d_eff}")
print(f"  Utilization: {d_eff / vector.size:.2%}")

In [None]:
# Compute D_eff for all targeted layers
layer_analysis = {}

for layer_key, layer_data in result['hidden_states'].items():
    vector = np.array(layer_data['data'])
    d_eff = compute_d_eff_single_vector(vector)
    hidden_size = layer_data['shape'][-1]
    
    layer_analysis[int(layer_key)] = {
        'layer': int(layer_key),
        'd_eff': d_eff,
        'hidden_size': hidden_size,
        'utilization': d_eff / hidden_size,
        'l2_norm': float(np.linalg.norm(vector)),
    }

# Display results
print("\nD_eff Analysis for Targeted Layers:")
print("-" * 50)
print(f"{'Layer':>8} {'D_eff':>8} {'Hidden':>8} {'Util.':>10}")
print("-" * 50)

for layer_idx in sorted(layer_analysis.keys()):
    info = layer_analysis[layer_idx]
    print(f"{info['layer']:>8} {info['d_eff']:>8} {info['hidden_size']:>8} {info['utilization']:>10.2%}")

## 4. Full Layer Analysis (All Layers)

Now let's analyze **all layers** to see how D_eff evolves through the transformer.

In [None]:
# Extract hidden states from ALL layers
# When hidden_state_layers is not specified, the server returns all layers
full_result = client.generate(
    model=MODEL_ID,
    prompt=PROMPT,
    max_tokens=20,
    return_hidden_states=True,
    # Not specifying hidden_state_layers returns all layers
)

all_hidden_states = full_result.get('hidden_states', {})
print(f"Number of layers returned: {len(all_hidden_states)}")
print(f"Layer keys: {sorted(all_hidden_states.keys(), key=int)}")

In [None]:
# Compute D_eff for all layers
full_analysis = {}

for layer_key, layer_data in all_hidden_states.items():
    vector = np.array(layer_data['data'])
    d_eff = compute_d_eff_single_vector(vector)
    hidden_size = layer_data['shape'][-1]
    
    full_analysis[int(layer_key)] = {
        'layer': int(layer_key),
        'd_eff': d_eff,
        'hidden_size': hidden_size,
        'utilization': d_eff / hidden_size,
        'mean': float(np.mean(vector)),
        'std': float(np.std(vector)),
        'l2_norm': float(np.linalg.norm(vector)),
    }

# Display summary
layers_sorted = sorted(full_analysis.keys())
d_effs = [full_analysis[l]['d_eff'] for l in layers_sorted]

print("\nFull Layer Analysis Summary:")
print(f"  Total Layers: {len(full_analysis)}")
print(f"  Mean D_eff: {np.mean(d_effs):.1f}")
print(f"  Std D_eff: {np.std(d_effs):.1f}")
print(f"  Min D_eff: {np.min(d_effs)} (Layer {layers_sorted[np.argmin(d_effs)]})")
print(f"  Max D_eff: {np.max(d_effs)} (Layer {layers_sorted[np.argmax(d_effs)]})")

## 5. Plotting D_eff Across Layers

Visualization helps us understand how information is transformed through the network:

- **Early layers**: Often process raw features (higher D_eff)
- **Middle layers**: Abstraction and compression
- **Final layers**: Task-specific representations

In [None]:
# Prepare data for plotting
layers = [full_analysis[l]['layer'] for l in layers_sorted]
d_effs = [full_analysis[l]['d_eff'] for l in layers_sorted]
utilization = [full_analysis[l]['utilization'] for l in layers_sorted]
hidden_size = full_analysis[layers_sorted[0]]['hidden_size']

# Create figure with two subplots
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Plot 1: Absolute D_eff values
ax1.plot(layers, d_effs, 'b-o', linewidth=2, markersize=6)
ax1.axhline(y=hidden_size, color='r', linestyle='--', alpha=0.5,
            label=f'Hidden Size ({hidden_size})')
ax1.set_xlabel('Layer Index', fontsize=12)
ax1.set_ylabel('D_eff (Effective Dimensionality)', fontsize=12)
ax1.set_title(f'D_eff by Layer - {MODEL_ID}', fontsize=14)
ax1.grid(True, alpha=0.3)
ax1.legend()

# Plot 2: D_eff as fraction of hidden size
# Color bands match the interpretation table in Section 7: < 0.3 low, 0.3-0.7 moderate, > 0.7 high
colors = ['red' if u < 0.3 else 'orange' if u <= 0.7 else 'green' for u in utilization]
ax2.bar(layers, utilization, color=colors, alpha=0.7)
ax2.axhline(y=0.7, color='g', linestyle='--', alpha=0.5, label='70% (high utilization)')
ax2.axhline(y=0.3, color='r', linestyle='--', alpha=0.5, label='30% (low utilization)')
ax2.set_xlabel('Layer Index', fontsize=12)
ax2.set_ylabel('D_eff / Hidden Size', fontsize=12)
ax2.set_title('Dimensional Utilization by Layer', fontsize=14)
ax2.set_ylim(0, 1)
ax2.grid(True, alpha=0.3, axis='y')
ax2.legend()

plt.tight_layout()
plt.savefig('layer_deff_analysis.png', dpi=150, bbox_inches='tight')
plt.show()

print("\nPlot saved to: layer_deff_analysis.png")

## 6. D_eff Trajectory Analysis

Let's analyze how D_eff changes between consecutive layers to identify:
- **Compression points** (D_eff decreases)
- **Expansion points** (D_eff increases)
- **Bottleneck layers** (significant compression)
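
The three categories above can be illustrated with a small vectorized sketch. The D_eff values are made-up numbers, and treating a drop of more than 10% as a bottleneck is an assumption that mirrors the threshold used for "significant" changes later in this section.

```python
import numpy as np

# Illustrative D_eff values for six consecutive layers (made-up numbers)
d_eff_by_layer = np.array([900, 850, 600, 620, 610, 400])

# Change from layer i to layer i + 1, absolute and in percent
deltas = np.diff(d_eff_by_layer)
pct = deltas / d_eff_by_layer[:-1] * 100.0

# Label each transition by the sign of the change
labels = np.where(deltas < 0, "compression",
                  np.where(deltas > 0, "expansion", "flat"))

# Layers entered through a drop of more than 10% (assumed bottleneck criterion)
bottlenecks = np.flatnonzero(pct < -10.0) + 1

for i, (d, p, label) in enumerate(zip(deltas, pct, labels)):
    print(f"{i} -> {i + 1}: {d:+d} ({p:+.1f}%) {label}")
print("bottleneck layers:", bottlenecks.tolist())
```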

In [None]:
# Compute layer-to-layer changes
changes = []
for i in range(1, len(layers_sorted)):
    prev_layer = layers_sorted[i-1]
    curr_layer = layers_sorted[i]
    
    prev_deff = full_analysis[prev_layer]['d_eff']
    curr_deff = full_analysis[curr_layer]['d_eff']
    
    change = curr_deff - prev_deff
    pct_change = (change / prev_deff * 100) if prev_deff > 0 else 0
    
    changes.append({
        'from_layer': prev_layer,
        'to_layer': curr_layer,
        'change': change,
        'pct_change': pct_change,
    })

# Identify significant changes (>10% compression or expansion)
print("\nSignificant D_eff Changes Between Layers:")
print("-" * 60)
print(f"{'From':>8} -> {'To':>8}  {'Change':>10} {'Percent':>10}")
print("-" * 60)

for c in changes:
    if abs(c['pct_change']) > 10:
        direction = "compression" if c['change'] < 0 else "expansion"
        print(f"{c['from_layer']:>8} -> {c['to_layer']:>8}  {c['change']:>+10}  {c['pct_change']:>+9.1f}% ({direction})")

In [None]:
# Visualize the changes
change_values = [c['change'] for c in changes]
layer_pairs = [f"{c['from_layer']}→{c['to_layer']}" for c in changes]

fig, ax = plt.subplots(figsize=(12, 5))

colors = ['red' if v < 0 else 'green' for v in change_values]
ax.bar(range(len(change_values)), change_values, color=colors, alpha=0.7)
ax.axhline(y=0, color='black', linestyle='-', linewidth=0.5)

ax.set_xlabel('Layer Transition', fontsize=12)
ax.set_ylabel('D_eff Change', fontsize=12)
ax.set_title('D_eff Changes Between Consecutive Layers', fontsize=14)
ax.set_xticks(range(0, len(layer_pairs), 2))
ax.set_xticklabels([layer_pairs[i] for i in range(0, len(layer_pairs), 2)], rotation=45, ha='right')

plt.tight_layout()
plt.show()

# Summary
compressions = sum(1 for c in changes if c['change'] < 0)
expansions = sum(1 for c in changes if c['change'] > 0)
print(f"\nSummary: {compressions} compression steps, {expansions} expansion steps")

## 7. Interpretation Guide

### Understanding D_eff

**D_eff (Effective Dimensionality)** measures how many dimensions are actively used in a hidden state representation:

| D_eff / Hidden_Size | Interpretation |
|---------------------|----------------|
| > 0.7 | High utilization - rich, diverse representations |
| 0.3 - 0.7 | Moderate - typical for most layers |
| < 0.3 | Low utilization - compressed or collapsed |
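
The bands in the table can be applied programmatically. The sketch below is one way to do it; `classify_utilization` is a hypothetical helper, and assigning the boundary values 0.3 and 0.7 to the moderate band is an assumption, since the table does not say which side they belong to.

```python
def classify_utilization(d_eff: int, hidden_size: int) -> str:
    """Map D_eff / hidden_size to the bands from the interpretation table."""
    if hidden_size <= 0:
        raise ValueError("hidden_size must be positive")
    ratio = d_eff / hidden_size
    if ratio > 0.7:
        return "high"
    if ratio >= 0.3:  # boundaries 0.3 and 0.7 are treated as moderate (assumption)
        return "moderate"
    return "low"


# Made-up D_eff values against an illustrative hidden size of 2048
for d_eff in (1800, 1000, 300):
    print(d_eff, classify_utilization(d_eff, 2048))
```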

### Typical Patterns

1. **Early layers**: Often higher D_eff (processing raw input features)
2. **Middle layers**: Variable - abstraction forming
3. **Final layers**: Often task-specific (may show compression)

### Research Applications

- **Bottleneck detection**: Find layers where D_eff drops significantly
- **Model comparison**: Compare D_eff trajectories across models
- **Feature extraction**: Identify optimal layers for downstream tasks
- **Alignment research**: Study how representations evolve with different prompts
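
For model comparison, trajectories from models with different layer counts need a common axis. One option, sketched below, is to resample each utilization curve onto relative depth (0 = first layer, 1 = last layer) with linear interpolation. The helper name `resample_trajectory` and all numbers are hypothetical and only illustrate the idea.

```python
import numpy as np


def resample_trajectory(utilization, num_points: int = 11) -> np.ndarray:
    """Linearly resample a per-layer curve onto an evenly spaced relative-depth grid."""
    values = np.asarray(utilization, dtype=np.float64)
    if values.size < 2:
        raise ValueError("need at least two layers to interpolate")
    depth = np.linspace(0.0, 1.0, values.size)  # relative depth of each layer
    grid = np.linspace(0.0, 1.0, num_points)    # shared comparison grid
    return np.interp(grid, depth, values)


# Made-up utilization curves for a 5-layer and a 9-layer model
model_a = [0.80, 0.70, 0.55, 0.50, 0.45]
model_b = [0.75, 0.72, 0.66, 0.60, 0.52, 0.50, 0.47, 0.44, 0.40]

gap = resample_trajectory(model_a) - resample_trajectory(model_b)
print("utilization gap by relative depth:", np.round(gap, 3))
```

Linear interpolation is a simple choice here; it keeps the endpoints exact but smooths over layer-level detail in the deeper model.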

In [None]:
# Cleanup
client.close()
print("\nAnalysis complete! Client connection closed.")

## Next Steps

1. **Multi-model comparison**: Compare D_eff patterns across different models
2. **Prompt sensitivity**: Analyze how D_eff changes with different prompts
3. **Full sequence analysis**: Use `return_full_sequence=True` for manifold analysis
4. **Beta calculation**: Compute collapse indicators using `calculate_beta()`

See the `src/analysis/conveyance_metrics.py` module for advanced metrics:
- `calculate_d_eff_detailed()` - Full eigenvalue analysis
- `calculate_beta()` - Collapse indicator
- `calculate_c_pair()` - Pairwise conveyance