# Reproducing "Empathetic Language Bandwidth in LLMs"

**Research:** Santarcangelo (2026) — [GitHub](https://github.com/marcosantar93/empathetic-language-bandwidth)

## Summary
This notebook reproduces the core analyses from our empathetic language bandwidth research. We measure how different LLMs encode empathy in their activation spaces, defining **bandwidth = dimensionality × steering_range**.

### Key Finding: 109% variation in empathetic bandwidth across 5 models
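- Gemma-2-9B: highest bandwidth (136.6)
- Mistral-7B: lowest bandwidth (36.3)
- The 109% figure is the max-min range relative to the mean bandwidth (100.3 / 91.8); the max/min fold change is 3.8×
- Aggregate empathy bandwidth is 2.8× the syntactic control (ratio of mean bandwidths)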

### Phase 2 Discovery
Cosine similarity between separately-trained probes reflects **classifier geometry**, not concept structure. AUROC and d-prime are the correct metrics.

**Requirements:** A CPU is sufficient for the analysis cells. A GPU is needed only for the optional model-loading step.

**Estimated time:** ~2 minutes (analysis only), ~30 minutes (with model loading).

In [None]:
# Install dependencies
!pip install -q numpy scipy scikit-learn matplotlib seaborn pandas

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import pandas as pd
import json
import warnings
warnings.filterwarnings('ignore')

# Style
plt.rcParams['figure.dpi'] = 100
sns.set_style('whitegrid')
print("Setup complete!")

In [None]:
# === Phase 1 Results ===
# From results/empathy/all_results_20260118_124215.json
# Inline for Colab reproducibility

RESULTS = {
    "gemma2-9b": {
        "model_path": "google/gemma-2-9b-it",
        "auroc": 0.95,
        "effective_rank": 16,
        "max_alpha": 8.538,
        "bandwidth": 136.608,
        "control_bandwidth": 52.377,
        "control_rank": 9,
        "control_range": 5.820,
        "sae_features": 15,
        "sae_agreement": True,
        "transfer_rate": 0.834,
    },
    "llama-3.1-8b": {
        "model_path": "meta-llama/Meta-Llama-3.1-8B-Instruct",
        "auroc": 0.874,
        "effective_rank": 14,
        "max_alpha": 9.069,
        "bandwidth": 126.962,
        "control_bandwidth": 48.001,
        "control_rank": 8,
        "control_range": 6.000,
        "sae_features": 16,
        "sae_agreement": True,
        "transfer_rate": 0.909,
    },
    "deepseek-r1-7b": {
        "model_path": "deepseek-ai/DeepSeek-R1-Distill-Llama-7B",
        "auroc": 0.856,
        "effective_rank": 11,
        "max_alpha": 8.365,
        "bandwidth": 92.013,
        "control_bandwidth": 34.684,
        "control_rank": 6,
        "control_range": 5.781,
        "sae_features": 10,
        "sae_agreement": True,
        "transfer_rate": 0.855,
    },
    "qwen2.5-7b": {
        "model_path": "Qwen/Qwen2.5-7B-Instruct",
        "auroc": 0.835,
        "effective_rank": 10,
        "max_alpha": 6.730,
        "bandwidth": 67.296,
        "control_bandwidth": 15.889,
        "control_rank": 3,
        "control_range": 5.296,
        "sae_features": 7,
        "sae_agreement": False,
        "transfer_rate": 0.918,
    },
    "mistral-7b": {
        "model_path": "mistralai/Mistral-7B-Instruct-v0.3",
        "auroc": 0.829,
        "effective_rank": 6,
        "max_alpha": 6.044,
        "bandwidth": 36.263,
        "control_bandwidth": 14.607,
        "control_rank": 3,
        "control_range": 4.869,
        "sae_features": 6,
        "sae_agreement": True,
        "transfer_rate": 0.852,
    },
}

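# Create DataFrame (orient='index' keeps per-column numeric dtypes;
# pd.DataFrame(RESULTS).T would turn every column into object dtype)
df = pd.DataFrame.from_dict(RESULTS, orient='index')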
df.index.name = 'model'
df = df.sort_values('bandwidth', ascending=False)

print("Phase 1 Results Summary:")
print(df[['auroc', 'effective_rank', 'max_alpha', 'bandwidth', 'control_bandwidth']].to_string())
print(f"\nBandwidth range: {df['bandwidth'].min():.1f} — {df['bandwidth'].max():.1f}")
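# Variation = (max - min) / mean, which reproduces the headline 109% figure
spread = (df['bandwidth'].max() - df['bandwidth'].min()) / df['bandwidth'].mean()
print(f"Variation: {spread * 100:.0f}%")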

In [None]:
# === Bandwidth Metric Definition ===
# bandwidth = effective_rank × max_steering_range
#
# Where:
# - effective_rank = number of PCA components for 90% variance (dimensionality)
# - max_steering_range = max |α| where coherence > 0.7 (steering range)

print("Bandwidth = Dimensionality × Steering Range")
print("=" * 55)
for model in df.index:
    rank = df.loc[model, 'effective_rank']
    alpha = df.loc[model, 'max_alpha']
    bw = df.loc[model, 'bandwidth']
    print(f"  {model:18s}: {int(rank):2d} dims × {alpha:.2f} range = {bw:7.1f} bandwidth")

# Verify calculation
print("\nVerification (rank × alpha ≈ bandwidth):")
for model in df.index:
    computed = df.loc[model, 'effective_rank'] * df.loc[model, 'max_alpha']
    actual = df.loc[model, 'bandwidth']
    print(f"  {model}: {computed:.1f} vs {actual:.1f} ({'✓' if abs(computed - actual) < 1 else '✗'})")
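
The two ingredients of the metric can be sketched on synthetic data. This is a minimal illustration and not the pipeline that produced the reported numbers: `effective_rank` and `bandwidth` are hypothetical helper names, and the "activations" are random data with a planted low-dimensional structure.

```python
import numpy as np

def effective_rank(activations, variance_threshold=0.90):
    """Smallest number of principal components explaining >= threshold of the variance."""
    centered = activations - activations.mean(axis=0, keepdims=True)
    # Squared singular values of the centered matrix are proportional to PCA variances.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    cumulative = np.cumsum(explained)
    # First index where the cumulative share reaches the threshold, as a 1-based count.
    return int(np.searchsorted(cumulative, variance_threshold) + 1)

def bandwidth(rank, max_alpha):
    """bandwidth = dimensionality x steering range."""
    return rank * max_alpha

rng = np.random.default_rng(0)
# Synthetic stand-in: 200 samples in 64 dims, 5 strong latent directions plus weak noise.
latent = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 64))
activations = latent + 0.01 * rng.normal(size=(200, 64))

rank = effective_rank(activations)
print(rank, bandwidth(rank, max_alpha=8.538))
```

Using the SVD of the centered matrix avoids a scikit-learn dependency; `sklearn.decomposition.PCA` with `explained_variance_ratio_` would give the same cumulative shares.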

In [None]:
# === Bandwidth Comparison: Bar Chart ===

fig, axes = plt.subplots(1, 3, figsize=(16, 5))
models = df.index.tolist()
colors = sns.color_palette("viridis", len(models))

# Panel 1: Total bandwidth
bars = axes[0].bar(range(len(models)), df['bandwidth'], color=colors)
axes[0].set_xticks(range(len(models)))
axes[0].set_xticklabels(models, rotation=25, ha='right', fontsize=9)
axes[0].set_ylabel('Bandwidth (dim × range)')
axes[0].set_title('Empathetic Bandwidth by Model', fontweight='bold')
for i, v in enumerate(df['bandwidth']):
    axes[0].text(i, v + 2, f'{v:.0f}', ha='center', fontsize=9)

# Panel 2: Components (rank and range)
x = np.arange(len(models))
width = 0.35
axes[1].bar(x - width/2, df['effective_rank'], width, label='Dimensionality', color='steelblue')
axes[1].bar(x + width/2, df['max_alpha'], width, label='Steering Range (α_max)', color='coral')
axes[1].set_xticks(x)
axes[1].set_xticklabels(models, rotation=25, ha='right', fontsize=9)
axes[1].set_ylabel('Value')
axes[1].set_title('Bandwidth Components', fontweight='bold')
axes[1].legend()

# Panel 3: AUROC
axes[2].bar(range(len(models)), df['auroc'], color=colors)
axes[2].set_xticks(range(len(models)))
axes[2].set_xticklabels(models, rotation=25, ha='right', fontsize=9)
axes[2].set_ylabel('AUROC')
axes[2].set_title('Empathy Detection Accuracy', fontweight='bold')
axes[2].set_ylim(0.7, 1.0)
axes[2].axhline(y=0.90, color='red', linestyle='--', alpha=0.5, label='Target (0.90)')
axes[2].legend()

plt.tight_layout()
plt.show()

In [None]:
# === PCA Dimensionality Comparison ===

fig, ax = plt.subplots(figsize=(10, 5))
x = np.arange(len(models))
width = 0.35

empathy_ranks = df['effective_rank'].values
control_ranks = df['control_rank'].values

bars1 = ax.bar(x - width/2, empathy_ranks, width, label='Empathy Subspace', color='steelblue')
bars2 = ax.bar(x + width/2, control_ranks, width, label='Control (Syntactic)', color='lightcoral')

ax.set_xlabel('Model')
ax.set_ylabel('Effective Rank (PCA components for 90% variance)')
ax.set_title('Empathy vs Control Subspace Dimensionality', fontweight='bold')
ax.set_xticks(x)
ax.set_xticklabels(models, rotation=15)
ax.legend()

# Add ratio annotations
for i in range(len(models)):
    ratio = empathy_ranks[i] / max(control_ranks[i], 1)
    ax.annotate(f'{ratio:.1f}×', (i, max(empathy_ranks[i], control_ranks[i]) + 0.5),
                ha='center', fontsize=9, color='darkgreen', fontweight='bold')

plt.tight_layout()
plt.show()

print("Empathy/Control dimensionality ratios:")
for model in df.index:
    ratio = df.loc[model, 'effective_rank'] / max(df.loc[model, 'control_rank'], 1)
    print(f"  {model}: {ratio:.1f}× (empathy {int(df.loc[model, 'effective_rank'])} vs control {int(df.loc[model, 'control_rank'])})")

In [None]:
# === Steering Range Analysis ===
# Max α before coherence drops below 0.7

fig, ax = plt.subplots(figsize=(10, 5))
x = np.arange(len(models))
width = 0.35

empathy_range = df['max_alpha'].values
control_range = df['control_range'].values

ax.bar(x - width/2, empathy_range, width, label='Empathy', color='steelblue')
ax.bar(x + width/2, control_range, width, label='Control (Syntactic)', color='lightcoral')

ax.set_xlabel('Model')
ax.set_ylabel('Max Steering Magnitude (α_max)')
ax.set_title('Steering Range: Empathy vs Control', fontweight='bold')
ax.set_xticks(x)
ax.set_xticklabels(models, rotation=15)
ax.legend()
ax.set_ylim(0, 12)

plt.tight_layout()
plt.show()

print("Steering range comparison:")
for model in df.index:
    ratio = df.loc[model, 'max_alpha'] / max(df.loc[model, 'control_range'], 0.01)
    print(f"  {model}: empathy α={df.loc[model, 'max_alpha']:.2f}, control α={df.loc[model, 'control_range']:.2f} (ratio: {ratio:.2f}×)")
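
The steering range is defined as the largest |α| at which coherence stays above 0.7. A minimal sketch of that scan follows. `max_steering_range` and `toy_coherence` are hypothetical names; in a real run the coherence function would generate text from steered activations and score it, which is not reproduced here.

```python
def max_steering_range(coherence_fn, alphas, threshold=0.7):
    """Largest |alpha| reached before coherence first falls to or below the threshold."""
    best = 0.0
    for alpha in sorted(alphas, key=abs):
        if coherence_fn(alpha) <= threshold:
            break  # stop at the first incoherent magnitude
        best = abs(alpha)
    return best

# Toy stand-in (assumption): coherence decays linearly with steering magnitude.
def toy_coherence(alpha):
    return max(0.0, 1.0 - 0.045 * abs(alpha))

alphas = [0.5 * k for k in range(25)]  # 0.0, 0.5, ..., 12.0
alpha_max = max_steering_range(toy_coherence, alphas)
print(alpha_max)
```

The scan stops at the first failure instead of taking the maximum over all passing values, so a late, isolated coherent sample cannot inflate the range. The grid step sets the resolution of the estimate.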

In [None]:
# === Empathy vs Control Bandwidth ===
# Key finding: aggregate empathy bandwidth is 2.8× control (ratio of mean bandwidths);
# the mean of the per-model ratios is slightly higher, at 2.9×

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Panel 1: Side-by-side bandwidth
x = np.arange(len(models))
width = 0.35
axes[0].bar(x - width/2, df['bandwidth'], width, label='Empathy', color='steelblue')
axes[0].bar(x + width/2, df['control_bandwidth'], width, label='Control (Syntactic)', color='lightcoral')
axes[0].set_xlabel('Model')
axes[0].set_ylabel('Bandwidth (dim × range)')
axes[0].set_title('Empathy vs Control Bandwidth', fontweight='bold')
axes[0].set_xticks(x)
axes[0].set_xticklabels(models, rotation=15)
axes[0].legend()

# Panel 2: Per-model ratio (bar chart)
ratios = df['bandwidth'] / df['control_bandwidth']
axes[1].bar(range(len(models)), ratios, color=colors)
axes[1].axhline(y=ratios.mean(), color='red', linestyle='--', label=f'Mean: {ratios.mean():.1f}×')
axes[1].set_xticks(range(len(models)))
axes[1].set_xticklabels(models, rotation=15)
axes[1].set_ylabel('Empathy / Control Ratio')
axes[1].set_title('Bandwidth Ratio (Empathy ÷ Control)', fontweight='bold')
axes[1].legend()

plt.tight_layout()
plt.show()

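print(f"Ratio of mean bandwidths (empathy / control): {df['bandwidth'].mean() / df['control_bandwidth'].mean():.1f}×")
print(f"Mean of per-model ratios: {ratios.mean():.1f}×")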
print(f"Range: {ratios.min():.1f}× — {ratios.max():.1f}×")
for model in df.index:
    r = df.loc[model, 'bandwidth'] / df.loc[model, 'control_bandwidth']
    print(f"  {model}: {r:.1f}× ({df.loc[model, 'bandwidth']:.1f} / {df.loc[model, 'control_bandwidth']:.1f})")

In [None]:
# === SAE Cross-Validation ===
# Compare PCA effective rank vs SAE active features

fig, ax = plt.subplots(figsize=(8, 6))

pca_ranks = df['effective_rank'].values
sae_features = df['sae_features'].astype(int).values
agreements = df['sae_agreement'].values

for i, model in enumerate(models):
    color = 'green' if agreements[i] else 'red'
    ax.scatter(pca_ranks[i], sae_features[i], s=150, c=color, edgecolors='black', zorder=5)
    ax.annotate(model, (pca_ranks[i] + 0.3, sae_features[i] + 0.3), fontsize=9)

# Perfect agreement line
max_val = max(max(pca_ranks), max(sae_features)) + 2
ax.plot([0, max_val], [0, max_val], 'k--', alpha=0.3, label='Perfect agreement')

# ±20% bands
ax.fill_between([0, max_val], [0, max_val * 0.8], [0, max_val * 1.2],
                alpha=0.1, color='green', label='±20% agreement zone')

ax.set_xlabel('PCA Effective Rank', fontsize=12)
ax.set_ylabel('SAE Active Features', fontsize=12)
ax.set_title('PCA vs SAE Dimensionality Validation', fontweight='bold')
ax.legend()
ax.set_xlim(0, max_val)
ax.set_ylim(0, max_val)

plt.tight_layout()
plt.show()

agreement_rate = sum(agreements) / len(agreements)
print(f"SAE-PCA agreement rate: {agreement_rate:.0%} ({sum(agreements)}/{len(agreements)} models)")
for model in df.index:
    status = "✓ Agree" if df.loc[model, 'sae_agreement'] else "✗ Disagree"
    print(f"  {model}: PCA={int(df.loc[model, 'effective_rank'])}, SAE={int(df.loc[model, 'sae_features'])} → {status}")
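
The `sae_agreement` flags are stored as constants above. The sketch below assumes that agreement means the SAE feature count falls within the ±20% band drawn in the plot. The source does not state the exact rule, so `within_tolerance` is a hypothetical helper. Under that assumption it reproduces the stored flags.

```python
def within_tolerance(pca_rank, sae_features, tolerance=0.20):
    """True when the SAE count lies within +/- tolerance of the PCA rank."""
    return abs(sae_features - pca_rank) <= tolerance * pca_rank

# (PCA effective rank, SAE active features) copied from RESULTS.
pairs = {
    "gemma2-9b": (16, 15),
    "llama-3.1-8b": (14, 16),
    "deepseek-r1-7b": (11, 10),
    "qwen2.5-7b": (10, 7),
    "mistral-7b": (6, 6),
}
flags = {model: within_tolerance(p, s) for model, (p, s) in pairs.items()}
for model, ok in flags.items():
    print(f"{model}: {'agree' if ok else 'disagree'}")
```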

In [None]:
# === Cross-Context Transfer Test ===
# Empathy vectors extracted on crisis_support, tested on technical_assistance

fig, ax = plt.subplots(figsize=(10, 5))
transfer_rates = df['transfer_rate'].values

bars = ax.bar(range(len(models)), transfer_rates * 100, color=colors)
ax.axhline(y=90, color='red', linestyle='--', alpha=0.5, label='90% threshold')
ax.set_xticks(range(len(models)))
ax.set_xticklabels(models, rotation=15)
ax.set_ylabel('Transfer Success Rate (%)')
ax.set_title('Cross-Context Generalization\n(Crisis Support → Technical Assistance)', fontweight='bold')
ax.set_ylim(0, 105)
ax.legend()

for i, v in enumerate(transfer_rates):
    ax.text(i, v * 100 + 1, f'{v:.1%}', ha='center', fontsize=9)

plt.tight_layout()
plt.show()

print(f"Mean transfer rate: {df['transfer_rate'].mean():.1%}")
print(f"All models ≥ 83%: {'✓ Yes' if df['transfer_rate'].min() >= 0.83 else '✗ No'}")
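
The transfer rates above are stored results. As a rough illustration of the idea (fit a direction in one context, evaluate it in another without refitting), here is a toy sketch on synthetic data. Everything in it is an assumption: the shared concept axis, the context offsets standing in for `crisis_support` and `technical_assistance`, and the definition of transfer rate as classification accuracy in the target context.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n = 32, 500
concept = np.zeros(dim)
concept[0] = 1.0  # shared "empathy" axis (assumption)

def sample(context_offset, empathetic):
    """Draw n synthetic activations for one context and one class."""
    base = rng.normal(size=(n, dim)) + context_offset
    return base + 3.0 * concept if empathetic else base

source_offset = np.zeros(dim)
source_offset[1] = 2.0  # stands in for crisis_support
target_offset = np.zeros(dim)
target_offset[2] = 2.0  # stands in for technical_assistance

# Fit on the source context: difference-of-means direction and midpoint threshold.
src_pos, src_neg = sample(source_offset, True), sample(source_offset, False)
direction = src_pos.mean(axis=0) - src_neg.mean(axis=0)
direction /= np.linalg.norm(direction)
threshold = 0.5 * ((src_pos @ direction).mean() + (src_neg @ direction).mean())

# Evaluate on the target context without refitting.
tgt_pos, tgt_neg = sample(target_offset, True), sample(target_offset, False)
correct = np.sum(tgt_pos @ direction > threshold) + np.sum(tgt_neg @ direction <= threshold)
transfer_rate = correct / (2 * n)
print(f"Toy transfer rate: {transfer_rate:.1%}")
```

Transfer works in this toy because the context shift is orthogonal to the concept axis. A shift along the axis would move the target projections relative to the source threshold and lower the rate.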

In [None]:
# === Statistical Analysis ===

print("=" * 60)
print("STATISTICAL ANALYSIS")
print("=" * 60)

# 1. Paired t-test: empathy vs control bandwidth, paired by model
bandwidths = df['bandwidth'].to_numpy(dtype=float)
control_bws = df['control_bandwidth'].to_numpy(dtype=float)

t_stat, p_value = stats.ttest_rel(bandwidths, control_bws)
print(f"\n1. Paired t-test (empathy vs control bandwidth):")
print(f"   t = {t_stat:.3f}, p = {p_value:.6f}")
print(f"   {'Significant' if p_value < 0.05 else 'Not significant'} at α=0.05")

# 2. Effect size (Cohen's d for paired samples, using the sample SD)
diff = bandwidths - control_bws
cohens_d = diff.mean() / diff.std(ddof=1)  # NumPy's default ddof=0 would inflate d at n=5
print(f"\n2. Cohen's d (empathy vs control): {cohens_d:.2f}")
print(f"   Interpretation: {'Large' if abs(cohens_d) >= 0.8 else 'Medium' if abs(cohens_d) >= 0.5 else 'Small'} effect")

# 3. Correlation: bandwidth components
r_rank_range, p_rr = stats.pearsonr(df['effective_rank'].astype(float), df['max_alpha'].astype(float))
print(f"\n3. Correlation (rank vs steering range): r={r_rank_range:.3f}, p={p_rr:.4f}")

# 4. Bandwidth variance
cv = df['bandwidth'].std() / df['bandwidth'].mean()
print(f"\n4. Coefficient of variation (bandwidth): {cv:.2f} ({cv*100:.0f}%)")

# 5. Ranking
print(f"\n5. Model Ranking (by bandwidth):")
for i, (model, row) in enumerate(df.iterrows(), 1):
    print(f"   {i}. {model}: {row['bandwidth']:.1f}")

# 6. Range and fold-change
print(f"\n6. Summary:")
print(f"   Max bandwidth: {df['bandwidth'].max():.1f} (gemma2-9b)")
print(f"   Min bandwidth: {df['bandwidth'].min():.1f} (mistral-7b)")
print(f"   Fold change: {df['bandwidth'].max() / df['bandwidth'].min():.1f}×")
print(f"   Ratio of mean bandwidths (empathy/control): {bandwidths.mean() / control_bws.mean():.1f}×")
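
With five paired observations, the t-statistic and the paired effect size carry the same information. The sketch below recomputes both from the Phase 1 bandwidths and adds a 95% confidence interval for the mean difference. The variable names are illustrative, and the interval assumes approximately normal paired differences.

```python
import numpy as np
from scipy import stats

# Empathy and control bandwidths copied from RESULTS, in the same model order.
empathy = np.array([136.608, 126.962, 92.013, 67.296, 36.263])
control = np.array([52.377, 48.001, 34.684, 15.889, 14.607])
diff = empathy - control
n = len(diff)

d_z = diff.mean() / diff.std(ddof=1)  # paired effect size with the sample SD
t_from_d = d_z * np.sqrt(n)           # identity: t = d_z * sqrt(n)
t_stat, p_value = stats.ttest_rel(empathy, control)

# 95% confidence interval for the mean paired difference (t distribution, n - 1 df).
margin = stats.t.ppf(0.975, df=n - 1) * diff.std(ddof=1) / np.sqrt(n)
ci = (diff.mean() - margin, diff.mean() + margin)
print(f"d_z = {d_z:.2f}, t = {t_stat:.2f}, p = {p_value:.4f}")
print(f"95% CI for mean difference: ({ci[0]:.1f}, {ci[1]:.1f})")
```

Reporting the interval alongside the p-value shows how much uncertainty remains with a sample of this size.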

## Phase 2: Tripartite Decomposition Findings

Phase 2 investigated whether empathy decomposes into distinct subspaces: **Cognitive** (perspective-taking), **Affective** (emotional resonance), and **Instrumental** (problem-solving).

### Key Findings

| Hypothesis | Status | Evidence |
|------------|--------|----------|
| H1: Separation (cosine) | **Artifact** | Cosine between separate probes reflects classifier geometry |
| H2: Classification | **CONFIRMED** | AUROC = 1.0 across all models |
| H3: Consistency | **CONFIRMED** | 4/4 models show empathy structure |
| H5: Layer Emergence | **CONFIRMED** | Empathy emerges at Layer 1 |
| H7: Effect Size | **CONFIRMED** | d-prime consistent (~1.75) across models |
| H8: Multi-class | **CONFIRMED** | 89.3% 3-way accuracy (vs 33% chance) |
| H11: Causal | **CONFIRMED** | 6/6 intervention criteria met |

### Critical Methodological Discovery

**Cosine similarity between separately-trained probes reflects classifier geometry, NOT concept structure.**

- Probes achieve AUROC = 1.0 yet score *worse than random* on the cosine metric (Z = +12.9)
- The artifact is specific to comparing the weight vectors of separately trained classifiers
- AUROC, d-prime, and clustering purity correctly measure concept separability
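
Both AUROC and d-prime are computed from probe scores on examples, not from probe weight vectors. The sketch below shows one common way to compute each on synthetic scores. The equal-variance d-prime formula and the rank-based AUROC are standard definitions, but the source does not specify which variants were used, and `d_prime` and `auroc` are illustrative names.

```python
import numpy as np

def d_prime(pos_scores, neg_scores):
    """Separation of two score distributions in pooled-standard-deviation units."""
    pooled_var = 0.5 * (np.var(pos_scores, ddof=1) + np.var(neg_scores, ddof=1))
    return (np.mean(pos_scores) - np.mean(neg_scores)) / np.sqrt(pooled_var)

def auroc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative (rank-based)."""
    scores = np.concatenate([pos_scores, neg_scores])
    ranks = scores.argsort().argsort() + 1  # 1-based ranks; assumes no tied scores
    n_pos, n_neg = len(pos_scores), len(neg_scores)
    u = ranks[:n_pos].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

rng = np.random.default_rng(0)
neg = rng.normal(loc=0.0, scale=1.0, size=5000)  # synthetic scores, negative class
pos = rng.normal(loc=2.0, scale=1.0, size=5000)  # synthetic scores, positive class
print(f"d' = {d_prime(pos, neg):.2f}, AUROC = {auroc(pos, neg):.3f}")
```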

### Causal Intervention Results
```
+Cognitive:    12.8% → 91.5% empathy probability
+Affective:    12.8% → 89.1% empathy probability
+Instrumental: 12.8% → 84.4% empathy probability
```

All three empathy subtype directions are **causally meaningful** — activating them shifts model behavior toward empathetic responses.
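
The mechanics of such an intervention can be sketched as adding a scaled direction to a hidden state and re-reading a logistic probe. This is a toy under stated assumptions: the hidden state, direction, probe weights, and bias are synthetic, and the probe is aligned with the steering direction by construction. It illustrates the arithmetic only and is not evidence for the reported shifts.

```python
import numpy as np

def probe_probability(hidden, weights, bias):
    """Logistic probe: sigmoid(w . h + b)."""
    return 1.0 / (1.0 + np.exp(-(hidden @ weights + bias)))

def steer(hidden, direction, alpha):
    """Add alpha times the unit-normalized direction to the hidden state."""
    unit = direction / np.linalg.norm(direction)
    return hidden + alpha * unit

rng = np.random.default_rng(0)
dim = 64
direction = rng.normal(size=dim)                 # hypothetical subtype direction
weights = direction / np.linalg.norm(direction)  # toy probe aligned with that direction
bias = -2.0
hidden = rng.normal(size=dim)
hidden -= (hidden @ weights) * weights           # remove any initial component along the probe

before = probe_probability(hidden, weights, bias)
after = probe_probability(steer(hidden, direction, alpha=4.0), weights, bias)
print(f"Probe probability: {before:.3f} -> {after:.3f}")
```

In a real model the steered state would be written back into the residual stream at a chosen layer and the probability read from generated outputs, so the shift would depend on the layer and on how well the probe and steering directions align.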

## Conclusions

### Phase 1: Empathetic Bandwidth
1. **109% variation** in empathetic bandwidth across 5 models (7B-9B scale), measured as the max-min range relative to the mean
2. **Ranking:** Gemma-2-9B (136.6) > Llama-3.1-8B (127.0) > DeepSeek-R1 (92.0) > Qwen2.5-7B (67.3) > Mistral-7B (36.3)
3. **Empathy-specific:** Aggregate empathy bandwidth is **2.8×** that of the syntactic control
4. **SAE validation:** 4/5 models show PCA-SAE agreement
5. **Transferable:** 83-92% cross-context generalization

### Phase 2: Tripartite Decomposition
1. Empathy **is** linearly encoded (AUROC = 0.98-1.0)
2. Structure is **universal** across architectures (1.1B-7B)
3. Emerges at **Layer 1** and persists throughout
4. Subtypes are **causally meaningful** (shifts of more than 70 percentage points in empathy probability)
5. **Methodological discovery:** Cosine similarity inappropriate for comparing separately-trained probes

### Implications for AI Safety
- **Detectable:** Linear probes separate empathy subtypes with AUROC of 0.98-1.0
- **Steerable:** Distinct empathy directions can be targeted for intervention  
- **Generalizable:** Findings transfer across models and scales

### Citation
```
@article{santarcangelo2026empathy,
  title={Empathetic Language Bandwidth in LLMs: Measuring Dimensional Capacity for Emotional Response},
  author={Santarcangelo, Marco},
  year={2026}
}
```

**Full code and data:** [github.com/marcosantar93/empathetic-language-bandwidth](https://github.com/marcosantar93/empathetic-language-bandwidth)