# Centroid Angle: Dead vs Live

What's the angle between the centroid of the dead tokens and the centroid of the live tokens?

If they started together (near the origin) and dead tokens just drifted less along the same direction, both centroids would lie on the same ray and you'd expect ~0°.

If dead tokens drifted in a different direction (pushed by h), you'd expect a nonzero angle.
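The two hypotheses can be illustrated on synthetic data. The sketch below is a toy model, not the real embedding matrix: the dimensions, the noise scale of 0.02, and the `drift` and `push` directions are all made-up stand-ins (with `push` playing the role of the push by h). It only shows what each hypothesis predicts for the centroid angle.

```python
import math
import torch

torch.manual_seed(0)
dim, n_live, n_dead = 256, 2000, 200  # toy sizes, not the real model's

def centroid_angle_deg(a, b):
    """Angle in degrees between the centroids of two sets of row vectors."""
    cos = torch.nn.functional.cosine_similarity(
        a.mean(dim=0), b.mean(dim=0), dim=0
    ).item()
    # Clamp so floating-point error cannot push the value outside acos's domain
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

drift = torch.randn(dim)  # assumed shared drift direction
push = torch.randn(dim)   # assumed extra direction, standing in for the push by h

# Every token starts as small zero-mean noise, then moves.
live = 0.02 * torch.randn(n_live, dim) + drift
dead_same_ray = 0.02 * torch.randn(n_dead, dim) + 0.2 * drift
dead_pushed = 0.02 * torch.randn(n_dead, dim) + 0.2 * drift + 0.2 * push

angle_same = centroid_angle_deg(dead_same_ray, live)
angle_pushed = centroid_angle_deg(dead_pushed, live)
print(f"Drifted less, same direction: {angle_same:.1f}°")
print(f"Drifted plus a push:          {angle_pushed:.1f}°")
```

In this toy setup the first case gives an angle close to zero and the second a clearly nonzero one, which is the distinction the measurement below is meant to draw.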

---

In [1]:
import torch
from safetensors.torch import load_file
import math

In [2]:
tensors = load_file("../tensors/qwen3_4b_instruct_2507.safetensors")
W = tensors["W"].float()  # [151936, 2560]
black_hole_mask = tensors["black_hole_mask"]  # [151936] bool

W_dead = W[black_hole_mask]
W_live = W[~black_hole_mask]

print(f"Dead tokens: {W_dead.shape[0]:,}")
print(f"Live tokens: {W_live.shape[0]:,}")

Dead tokens: 2,100
Live tokens: 149,836


In [3]:
# Compute centroids
centroid_dead = W_dead.mean(dim=0)  # [2560]
centroid_live = W_live.mean(dim=0)  # [2560]

print(f"Dead centroid norm: {torch.norm(centroid_dead):.6f}")
print(f"Live centroid norm: {torch.norm(centroid_live):.6f}")

Dead centroid norm: 0.370917
Live centroid norm: 0.304392


In [4]:
# Compute angle between them
cos_sim = torch.nn.functional.cosine_similarity(
    centroid_dead.unsqueeze(0), 
    centroid_live.unsqueeze(0)
).item()

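# Clamp first: floating-point error can push the cosine slightly outside [-1, 1]
angle_rad = math.acos(max(-1.0, min(1.0, cos_sim)))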
angle_deg = math.degrees(angle_rad)

print(f"\nCosine similarity: {cos_sim:.6f}")
print(f"Angle: {angle_deg:.1f}°")


Cosine similarity: 0.894030
Angle: 26.6°
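To make the size of the off-ray component concrete, the printed values can be plugged into a short calculation. This is only a sketch using the rounded numbers from the outputs above. Splitting the dead centroid into parts parallel and perpendicular to the live centroid is an added illustration, not something the notebook itself computes.

```python
import math

# Rounded values copied from the cell outputs above
cos_sim = 0.894030
dead_norm = 0.370917

# Clamp before acos in case rounding pushes the cosine outside [-1, 1]
cos_clamped = max(-1.0, min(1.0, cos_sim))
angle_deg = math.degrees(math.acos(cos_clamped))

# Components of the dead centroid relative to the live centroid's direction
parallel = dead_norm * cos_clamped
perpendicular = dead_norm * math.sqrt(1.0 - cos_clamped ** 2)

print(f"Angle: {angle_deg:.1f}°")
print(f"Component along the live centroid:  {parallel:.4f}")
print(f"Component perpendicular to it:      {perpendicular:.4f}")
```

The perpendicular component is the part of the dead centroid that cannot be explained by movement along the live centroid's ray.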


## For Reference: What About the Black Hole Centroids?

In [5]:
# The 13 black hole centroids
bh_centroids = tensors["black_hole_centroids"].float()  # [13, 2560]

# Unweighted mean of the 13 black hole centroids
bh_mean = bh_centroids.mean(dim=0)

# Note: centroid_dead averages all 2,100 dead token rows, so it is
# population-weighted, while bh_mean counts each of the 13 unique vectors
# once. Check how close the two are.

cos_sim_check = torch.nn.functional.cosine_similarity(
    centroid_dead.unsqueeze(0),
    bh_mean.unsqueeze(0)
).item()

print(f"Cosine similarity between:")
print(f"  - Mean of all 2,100 dead token vectors")
print(f"  - Mean of 13 black hole centroid vectors (unweighted)")
print(f"  = {cos_sim_check:.6f}")
print()
print(f"(Should be ~1.0 if the black holes dominate the dead token population)")

Cosine similarity between:
  - Mean of all 2,100 dead token vectors
  - Mean of 13 black hole centroid vectors (unweighted)
  = 1.000002

(Should be ~1.0 if the black holes dominate the dead token population)
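A value slightly above 1 such as 1.000002 is float32 rounding, not a real cosine. The agreement between the weighted and unweighted means can also be explored on synthetic data. In the sketch below the per-black-hole `counts` and all vectors are hypothetical (the notebook does not load real populations). It suggests that when the 13 vectors nearly coincide, the weighting barely matters, whereas for spread-out vectors it does.

```python
import torch

torch.manual_seed(0)
dim = 64  # toy dimension

# Hypothetical populations for 13 clusters (made up for illustration only)
counts = torch.tensor([800., 700., 300., 100., 50., 50., 30., 20., 20., 10., 10., 5., 5.])

def weighted_vs_unweighted_cos(centroids, counts):
    """Cosine similarity between the unweighted and population-weighted means."""
    unweighted = centroids.mean(dim=0)
    weighted = (counts[:, None] * centroids).sum(dim=0) / counts.sum()
    return torch.nn.functional.cosine_similarity(unweighted, weighted, dim=0).item()

# Case 1: 13 nearly identical vectors (tiny perturbations of one base vector)
base = torch.randn(dim)
tight = base + 1e-4 * torch.randn(13, dim)

# Case 2: 13 unrelated random vectors
spread = torch.randn(13, dim)

cos_tight = weighted_vs_unweighted_cos(tight, counts)
cos_spread = weighted_vs_unweighted_cos(spread, counts)
print(f"Nearly identical centroids: {cos_tight:.6f}")
print(f"Spread-out centroids:       {cos_spread:.6f}")
```

Under these assumptions, a cosine of ~1.0 is what tightly clustered black hole vectors would produce regardless of how the populations are distributed.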


In [6]:
# Angle of each black hole to the live centroid
print("Angle from each black hole centroid to live token centroid:\n")

for i in range(13):
    bh_vec = bh_centroids[i]
    cos_sim_i = torch.nn.functional.cosine_similarity(
        bh_vec.unsqueeze(0),
        centroid_live.unsqueeze(0)
    ).item()
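    # Clamp first: floating-point error can push the cosine slightly outside [-1, 1]
    angle_i = math.degrees(math.acos(max(-1.0, min(1.0, cos_sim_i))))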
    print(f"  BH{i+1:02d}: {angle_i:.1f}°")

Angle from each black hole centroid to live token centroid:

  BH01: 26.6°
  BH02: 26.6°
  BH03: 26.6°
  BH04: 26.6°
  BH05: 26.6°
  BH06: 26.6°
  BH07: 26.6°
  BH08: 26.6°
  BH09: 26.6°
  BH10: 26.6°
  BH11: 26.6°
  BH12: 26.6°
  BH13: 26.6°
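All 13 angles printing as 26.6° is consistent with the black hole vectors pointing in nearly the same direction. One way to check this directly would be to compute the pairwise angles among them. The sketch below uses random stand-in tensors in place of `bh_centroids` and `centroid_live`, so the numbers it prints are not the notebook's. It shows a vectorized form of the loop above plus the pairwise check.

```python
import torch

torch.manual_seed(0)
# Random stand-ins with the same shapes as the real tensors (assumption)
bh_centroids = torch.randn(13, 2560)
centroid_live = torch.randn(2560)

# Angle from every black hole centroid to the live centroid, in one call
cos = torch.nn.functional.cosine_similarity(
    bh_centroids, centroid_live.unsqueeze(0), dim=1
)  # shape [13]
angles = torch.rad2deg(torch.acos(cos.clamp(-1.0, 1.0)))

# Pairwise angles among the 13 black hole centroids
unit = torch.nn.functional.normalize(bh_centroids, dim=1)
pairwise = torch.rad2deg(torch.acos((unit @ unit.T).clamp(-1.0, 1.0)))  # [13, 13]

for i, a in enumerate(angles.tolist()):
    print(f"  BH{i+1:02d}: {a:.1f}°")
print(f"Largest pairwise angle: {pairwise.max().item():.1f}°")
```

On the real data, a largest pairwise angle near zero would confirm that the 13 vectors are effectively one direction, which would explain the identical column of 26.6° values.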


## Summary

In [7]:
print("="*60)
print(f"Angle between dead and live centroids: {angle_deg:.1f}°")
print("="*60)
print()
print("The dead tokens are NOT along the same ray as the live tokens.")
print("They drifted in a different direction—likely pushed by h.")

Angle between dead and live centroids: 26.6°

The dead tokens are NOT along the same ray as the live tokens.
They drifted in a different direction—likely pushed by h.
