# Glitchlings Metrics Framework Tutorial

This notebook demonstrates how to use the metrics framework to analyze text transformation effects across different tokenizers.

## Installation

```bash
pip install "glitchlings[metrics,metrics-tokenizers,metrics-viz]"
```

## Part 1: Computing Metrics for a Single Transformation

Let's start with the basics: computing metrics for one text transformation.

In [None]:
from glitchlings.metrics.metrics import create_default_registry

# Create a registry with all 14 metrics
registry = create_default_registry()

print(f"Loaded {len(registry.specs)} metrics:")
for metric_id in sorted(registry.specs.keys()):
    spec = registry.specs[metric_id]
    print(f"  - {metric_id:15s}: {spec.name}")

In [None]:
# Define a simple transformation (swap adjacent tokens)
before = [1, 2, 3, 4, 5]  # Token IDs
after = [1, 3, 2, 4, 5]   # Tokens 2 and 3 swapped

# Compute all metrics
results = registry.compute_all(before, after, context={})

# Display results
print("Metric Results:")
print("-" * 50)
for key, value in sorted(results.items()):
    print(f"{key:20s}: {value:8.4f}")

### Understanding the Results

- **ned.value** (Normalized Edit Distance): How different are the sequences?
- **lcsr.value** (Longest Common Subsequence Retention): What fraction of tokens remain in their original relative order?
- **rord.value** (Reordering): How much reordering occurred?
- **pmr.value** (Position Match Rate): How many tokens stayed at their original positions?
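
To build intuition for these numbers, here is a minimal pure-Python sketch of three of them. The normalizations are assumptions chosen for illustration (edit distance and position matches divided by the longer length, LCS length divided by the original length); the library's own definitions may differ, and the helper names are hypothetical.

```python
from typing import Sequence


def levenshtein(a: Sequence[int], b: Sequence[int]) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute (free if tokens match)
            ))
        prev = curr
    return prev[-1]


def lcs_length(a: Sequence[int], b: Sequence[int]) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def sketch_metrics(before: Sequence[int], after: Sequence[int]) -> dict[str, float]:
    """Illustrative stand-ins for ned, lcsr and pmr (assumed normalizations)."""
    longest = max(len(before), len(after)) or 1
    return {
        "ned": levenshtein(before, after) / longest,
        "lcsr": lcs_length(before, after) / (len(before) or 1),
        "pmr": sum(x == y for x, y in zip(before, after)) / longest,
    }


# Same adjacent-swap example as in the cell above
toy = sketch_metrics([1, 2, 3, 4, 5], [1, 3, 2, 4, 5])
print(toy)
```

Under these assumed definitions, the adjacent swap costs two edits out of five positions, keeps four of five tokens in order, and leaves three tokens in place.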

## Part 2: Real Text with Tokenizers

Now let's analyze how a real glitchling affects tokenized text.
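
This part also reports a length ratio and a JS (Jensen-Shannon) divergence, which are not described in Part 1. The sketch below shows one plausible reading of each: the ratio of output length to input length, and the base-2 Jensen-Shannon divergence between the two token frequency distributions. Both definitions, and the function names, are assumptions for illustration rather than the library's implementation.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def length_ratio(before: Sequence[Hashable], after: Sequence[Hashable]) -> float:
    """Assumed definition: output length divided by input length."""
    return len(after) / len(before) if before else 0.0


def js_divergence(before: Sequence[Hashable], after: Sequence[Hashable]) -> float:
    """Base-2 Jensen-Shannon divergence between token frequency distributions."""
    if not before or not after:
        # Two empty sequences are identical; one empty sequence is maximally different.
        return 0.0 if not before and not after else 1.0
    p_counts, q_counts = Counter(before), Counter(after)
    divergence = 0.0
    for token in set(p_counts) | set(q_counts):
        p = p_counts[token] / len(before)
        q = q_counts[token] / len(after)
        m = (p + q) / 2  # mixture distribution
        if p:
            divergence += 0.5 * p * math.log2(p / m)
        if q:
            divergence += 0.5 * q * math.log2(q / m)
    return divergence


same_bag = js_divergence([1, 2, 3], [3, 2, 1])   # pure reordering
disjoint = js_divergence([1, 2], [3, 4])         # no shared tokens
ratio = length_ratio([1, 2, 3, 4], [1, 2])
print(same_bag, disjoint, ratio)
```

Because this divergence only looks at token frequencies, a pure reordering scores zero, while two sequences with no tokens in common reach the base-2 maximum of one.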

In [None]:
from glitchlings.metrics.core.tokenizers import SimpleTokenizer

# Define a glitchling
def typogre(text: str) -> str:
    """Swap 'th' with 'ht' and 'TH' with 'HT' (mixed-case 'Th' is left unchanged)."""
    return text.replace("th", "ht").replace("TH", "HT")

# Create tokenizer
tokenizer = SimpleTokenizer()

# Test text
text_before = "The quick brown fox jumps over the lazy dog."
text_after = typogre(text_before)

print(f"Before: {text_before}")
print(f"After:  {text_after}")
print()

# Tokenize
tokens_before = tokenizer.encode(text_before)
tokens_after = tokenizer.encode(text_after)

print(f"Tokens before ({len(tokens_before)}): {tokens_before}")
print(f"Tokens after ({len(tokens_after)}):  {tokens_after}")

In [None]:
# Compute metrics
results = registry.compute_all(tokens_before, tokens_after, context={})

# Display key metrics
print("Key Metrics for Typogre Transformation:")
print("-" * 50)
print(f"Edit Distance (NED):  {results['ned.value']:.3f}")
print(f"LCS Retention:        {results['lcsr.value']:.3f}")
print(f"Reordering Score:     {results['rord.value']:.3f}")
print(f"Length Ratio:         {results['lr.value']:.3f}")
print(f"JS Divergence:        {results['jsdiv.value']:.3f}")

## Part 3: Comparing Multiple Tokenizers

Different tokenizers may respond differently to the same transformation.

In [None]:
try:
    from glitchlings.metrics.core.tokenizers import create_huggingface_adapter
    
    # Create multiple tokenizers
    tokenizers = [
        SimpleTokenizer(),
        create_huggingface_adapter("gpt2"),
        create_huggingface_adapter("bert-base-uncased"),
    ]
    
    print("Loaded tokenizers:")
    for tok in tokenizers:
        print(f"  - {tok.name}")
    
except ImportError:
    print("HuggingFace tokenizers not available.")
    print("Install with: pip install glitchlings[metrics-tokenizers]")
    tokenizers = [SimpleTokenizer()]

In [None]:
import pandas as pd

# Compare tokenizers
text = "The theory of computation is fundamental."

comparison_results = []
for tokenizer in tokenizers:
    tokens_before = tokenizer.encode(text)
    tokens_after = tokenizer.encode(typogre(text))
    
    results = registry.compute_all(tokens_before, tokens_after, context={})
    
    comparison_results.append({
        "Tokenizer": tokenizer.name,
        "Tokens Before": len(tokens_before),
        "Tokens After": len(tokens_after),
        "NED": results["ned.value"],
        "LCSR": results["lcsr.value"],
        "RORD": results["rord.value"],
    })

df = pd.DataFrame(comparison_results)
print("\nTokenizer Comparison:")
print(df.to_string(index=False))

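**Observation**: Tokenizers can differ in how sensitive they are to the same transformation. A word-level tokenizer sees one changed token per affected word, while a subword tokenizer may split the misspelled word into several different pieces and change the token count as well.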
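
The effect of granularity can be reproduced without any external tokenizer. The sketch below uses two toy tokenizers (whitespace words and single characters, both hypothetical stand-ins rather than library classes) and a local copy of the 'th' swap, then measures the fraction of token positions that changed.

```python
def swap_th(text: str) -> str:
    """Local copy of the tutorial's 'th' -> 'ht' swap, so the sketch is self-contained."""
    return text.replace("th", "ht").replace("TH", "HT")


def word_tokens(text: str) -> list[str]:
    """Toy word-level tokenizer: split on whitespace."""
    return text.split()


def char_tokens(text: str) -> list[str]:
    """Toy character-level tokenizer: one token per character."""
    return list(text)


def changed_fraction(before: list[str], after: list[str]) -> float:
    """Fraction of positions whose token differs; assumes equal lengths."""
    if len(before) != len(after):
        raise ValueError("this sketch only handles length-preserving changes")
    return sum(x != y for x, y in zip(before, after)) / len(before)


text = "The theory of computation is fundamental."
glitched = swap_th(text)

word_changed = changed_fraction(word_tokens(text), word_tokens(glitched))
char_changed = changed_fraction(char_tokens(text), char_tokens(glitched))

print(f"Glitched text:         {glitched}")
print(f"Word-level changed:    {word_changed:.3f}")
print(f"Char-level changed:    {char_changed:.3f}")
```

The same two-character swap alters one word token out of six but only two character tokens out of forty-one, so the coarser tokenizer registers a proportionally larger change.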

## Part 4: Batch Processing

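For analysis over many texts, use batch processing: it applies the glitchling to every text, tokenizes with every tokenizer, and writes the resulting observations to Parquet files.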
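
Conceptually, a batch run is a cross product of texts and tokenizers, with one observation per pair, grouped into partitions afterwards. The following sketch illustrates that shape in plain Python. It is not how `process_and_write` is implemented; the function names, the row fields other than `m` and `n`, and the toy tokenizers are all assumptions.

```python
from collections import defaultdict
from typing import Callable


def run_batch_sketch(
    texts: list[str],
    transform: Callable[[str], str],
    tokenizers: dict[str, Callable[[str], list]],
) -> list[dict]:
    """Produce one row per (text, tokenizer) pair."""
    rows = []
    for input_id, text in enumerate(texts):
        glitched = transform(text)  # transform once, tokenize many times
        for tokenizer_id, encode in tokenizers.items():
            before, after = encode(text), encode(glitched)
            rows.append({
                "input_id": input_id,
                "tokenizer_id": tokenizer_id,
                "m": len(before),  # token count before
                "n": len(after),   # token count after
            })
    return rows


def partition_rows(rows: list[dict], key: str) -> dict[str, list[dict]]:
    """Group rows by one field, mirroring a 'one file per tokenizer' layout."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    return dict(partitions)


batch_rows = run_batch_sketch(
    texts=["the cat", "other things"],
    transform=lambda s: s.replace("th", "ht"),
    tokenizers={"words": str.split, "chars": list},
)
batch_parts = partition_rows(batch_rows, "tokenizer_id")
print({name: len(rows) for name, rows in batch_parts.items()})
```

Transforming each text once and reusing the result across tokenizers keeps the comparison fair: every tokenizer sees exactly the same corrupted string.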

In [None]:
from glitchlings.metrics.core.batch import process_and_write
import tempfile
from pathlib import Path

# Sample texts
texts = [
    "The quick brown fox jumps over the lazy dog.",
    "Machine learning enables computers to learn from data.",
    "The theory of computation studies algorithmic complexity.",
    "Natural language processing analyzes human language.",
    "Deep neural networks can model complex patterns.",
]

# Create temporary output directory
output_dir = Path(tempfile.mkdtemp()) / "metrics_results"
output_dir.mkdir()

print(f"Output directory: {output_dir}")

# Process batch
manifest = process_and_write(
    texts=texts,
    glitchling_fn=typogre,
    glitchling_id="typogre",
    registry=registry,
    tokenizers=tokenizers,
    output_dir=output_dir,
    partition_by=["tokenizer_id"],  # One file per tokenizer
)

print(f"\n✓ Processed {manifest.num_observations} observations")
print(f"  Run ID: {manifest.run_id}")
print(f"  Tokenizers: {', '.join([t.split(':')[0] for t in manifest.tokenizers])}")

In [None]:
# List generated files
parquet_files = list(output_dir.rglob("*.parquet"))
print(f"Generated {len(parquet_files)} Parquet files:")
for pf in parquet_files:
    print(f"  - {pf.relative_to(output_dir)}")

## Part 5: Loading and Analyzing Results

Load observations from Parquet files for analysis.
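
Once observations are loaded, the analysis in this part groups them and summarizes each metric. The sketch below shows that grouping step with the standard library only. Observations are modeled as plain dictionaries, the output keys imitate the `metric_<name>` layout used in this notebook, and the use of the population standard deviation is an assumption; the library may compute `std` differently.

```python
from collections import defaultdict
from statistics import mean, pstdev


def aggregate_sketch(observations: list[dict], group_key: str, metric_names: list[str]) -> list[dict]:
    """Group observations by one field and report mean/std per metric."""
    groups = defaultdict(list)
    for obs in observations:
        groups[obs[group_key]].append(obs["metrics"])

    summary = []
    for group_value, metric_dicts in sorted(groups.items()):
        row = {group_key: group_value}
        for name in metric_names:
            values = [m[name] for m in metric_dicts if name in m]
            if values:
                row[f"metric_{name}"] = {
                    "mean": mean(values),
                    "std": pstdev(values),  # population std (assumed)
                }
        summary.append(row)
    return summary


sample_obs = [
    {"tokenizer_id": "words", "metrics": {"ned.value": 0.5}},
    {"tokenizer_id": "chars", "metrics": {"ned.value": 0.1}},
    {"tokenizer_id": "chars", "metrics": {"ned.value": 0.3}},
]
agg_sketch = aggregate_sketch(sample_obs, "tokenizer_id", ["ned.value"])
for row in agg_sketch:
    print(row)
```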

In [None]:
from glitchlings.metrics.viz import load_observations_from_parquet
from glitchlings.metrics.viz.aggregate import aggregate_observations

# Load all observations
all_observations = []
for pf in parquet_files:
    obs = load_observations_from_parquet(pf)
    all_observations.extend(obs)

print(f"Loaded {len(all_observations)} observations")
print("\nFirst observation:")
obs = all_observations[0]
print(f"  Glitchling: {obs.glitchling_id}")
print(f"  Tokenizer: {obs.tokenizer_id}")
print(f"  Tokens: {obs.m} -> {obs.n}")
print(f"  Metrics: {list(obs.metrics.keys())[:5]}...")

In [None]:
# Aggregate by tokenizer
agg_results = aggregate_observations(
    all_observations,
    group_by=["tokenizer_id"],
    metrics=["ned.value", "lcsr.value", "jsdiv.value"],
)

# Convert to DataFrame for display
summary_data = []
for result in agg_results:
    summary_data.append({
        "Tokenizer": result["tokenizer_id"],
        "NED (mean)": result["metric_ned.value"]["mean"],
        "NED (std)": result["metric_ned.value"]["std"],
        "LCSR (mean)": result["metric_lcsr.value"]["mean"],
        "LCSR (std)": result["metric_lcsr.value"]["std"],
    })

df_summary = pd.DataFrame(summary_data)
print("\nSummary Statistics by Tokenizer:")
print(df_summary.to_string(index=False, float_format="%.3f"))

## Part 6: Visualizations

Now let's create visualizations to understand the patterns.

### 6.1 Radar Chart (Transformation Fingerprint)

In [None]:
try:
    from glitchlings.metrics.viz import create_radar_chart
    
    # Aggregate all metrics for the glitchling
    agg_all = aggregate_observations(
        all_observations,
        group_by=["glitchling_id"],
    )
    
    # Extract mean values
    metrics = {
        k.replace("metric_", ""): v["mean"]
        for k, v in agg_all[0].items()
        if k.startswith("metric_") and ".value" in k
    }
    
    # Create radar chart
    fig = create_radar_chart(
        metrics,
        title="Typogre Transformation Fingerprint",
        backend="plotly",  # Interactive
        normalization="percentile",
    )
    
    fig.show()
    
except ImportError as e:
    print(f"Visualization libraries not available: {e}")
    print("Install with: pip install glitchlings[metrics-viz]")

### 6.2 Heatmap (Metric Grid)
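
A heatmap of this kind is essentially a pivot table: observations are bucketed by a row key and a column key, and each cell shows one aggregate of the chosen metric. The sketch below builds such a grid with a median per cell. It uses plain dictionaries and a hypothetical `pivot_median` helper, and makes no claim about how `create_heatmap` works internally.

```python
from collections import defaultdict
from statistics import median


def pivot_median(observations: list[dict], row_key: str, col_key: str, metric: str):
    """Return (row labels, column labels, grid of per-cell medians)."""
    cells = defaultdict(list)
    for obs in observations:
        cells[(obs[row_key], obs[col_key])].append(obs[metric])

    rows = sorted({r for r, _ in cells})
    cols = sorted({c for _, c in cells})
    grid = [
        # None marks a (row, column) pair with no observations
        [median(cells[(r, c)]) if (r, c) in cells else None for c in cols]
        for r in rows
    ]
    return rows, cols, grid


heat_obs = [
    {"input_id": 0, "tokenizer_id": "chars", "ned.value": 0.10},
    {"input_id": 0, "tokenizer_id": "chars", "ned.value": 0.30},
    {"input_id": 0, "tokenizer_id": "words", "ned.value": 0.50},
    {"input_id": 1, "tokenizer_id": "words", "ned.value": 0.25},
]
heat_rows, heat_cols, heat_grid = pivot_median(heat_obs, "input_id", "tokenizer_id", "ned.value")
print(heat_cols)
for label, values in zip(heat_rows, heat_grid):
    print(label, values)
```

The median is less affected by a single outlying observation than the mean, which is one reason to prefer it when a cell holds only a handful of values.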

In [None]:
try:
    from glitchlings.metrics.viz import create_heatmap
    
    fig = create_heatmap(
        all_observations,
        metric="ned.value",
        row_key="input_id",
        col_key="tokenizer_id",
        title="Edit Distance by Input × Tokenizer",
        backend="plotly",
        aggregation="median",
    )
    
    fig.show()
    
except ImportError as e:
    print(f"Heatmap not available: {e}")

### 6.3 Metric Space Embedding (UMAP)

In [None]:
try:
    from glitchlings.metrics.viz import create_embedding_plot
    
    if len(all_observations) >= 3:  # Need at least 3 for UMAP
        fig = create_embedding_plot(
            all_observations,
            method="umap",
            color_by="tokenizer_id",
            title="Metric Space (UMAP)",
            backend="plotly",
            n_neighbors=min(5, len(all_observations) - 1),
        )
        
        fig.show()
    else:
        print("Need at least 3 observations for UMAP")
    
except ImportError as e:
    print(f"UMAP not available: {e}")
    print("Install with: pip install umap-learn")

## Part 7: Config-Driven Rendering

For reproducible research, define visualizations in configuration files.
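
One way to keep such configurations in files is to serialize them as JSON and rebuild the config object on load. The sketch below demonstrates the round trip with a stand-in dataclass, `SketchFigureConfig`, which copies the three fields used in this notebook (`figure_type`, `title`, `params`). It is a hypothetical illustration; whether the real `FigureConfig` offers its own loading or saving helpers is not covered here.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SketchFigureConfig:
    """Stand-in for a figure config (hypothetical, not the library class)."""
    figure_type: str
    title: str
    params: dict = field(default_factory=dict)


def config_to_json(config: SketchFigureConfig) -> str:
    """Serialize with sorted keys so the file diffs cleanly under version control."""
    return json.dumps(asdict(config), indent=2, sort_keys=True)


def config_from_json(text: str) -> SketchFigureConfig:
    """Rebuild the config from its JSON form."""
    return SketchFigureConfig(**json.loads(text))


original_config = SketchFigureConfig(
    figure_type="heatmap",
    title="Edit Distance Heatmap",
    params={"metric": "ned.value", "row_key": "input_id", "col_key": "tokenizer_id"},
)
config_text = config_to_json(original_config)
restored_config = config_from_json(config_text)
print(config_text)
```

Keeping the JSON next to the generated figures makes it possible to regenerate a plot later from the same parameters.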

In [None]:
from glitchlings.metrics.viz import FigureConfig, render_figure

# Define figure configuration
config = FigureConfig(
    figure_type="heatmap",
    title="Edit Distance Heatmap",
    params={
        "metric": "ned.value",
        "row_key": "input_id",
        "col_key": "tokenizer_id",
        "backend": "plotly",
    }
)

# Render from config
try:
    fig = render_figure(config, all_observations)
    fig.show()
except Exception as e:
    print(f"Could not render: {e}")

## Summary

You've learned:

1. ✅ How to compute metrics for transformations
2. ✅ How to compare multiple tokenizers
3. ✅ How to use batch processing for scale
4. ✅ How to load and aggregate results
5. ✅ How to create various visualizations
6. ✅ How to use config-driven rendering

### Next Steps

- Explore more glitchlings from the zoo
- Try different tokenizers (tiktoken, custom tokenizers)
- Define custom metrics for your use case
- Create metric lenses to focus on specific aspects
- Use sparklines to analyze length sensitivity

### Documentation

- **README**: `docs/metrics-framework-README.md`
- **Planning Doc**: `docs/metrics-framework-plan.md`
- **Acceptance Tests**: `docs/metrics-acceptance-tests.md`
- **Example Script**: `examples/metrics_complete_example.py`