# Mapping Residual Space

The Logit Lens and Tuned Lens interpret the residual space of a transformer by projecting it onto the unembedding vectors of the tokens. Both suggest that this projection is meaningful at intermediate layers, not only at the first and last layers. [A Mathematical Framework for Transformer Circuits](https://transformer-circuits.pub/2021/framework/index.html) indicates that this representation may approximate the bigram log-likelihood of tokens.
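As a rough illustration of the projection described above, the sketch below applies a Logit-Lens-style readout to a single residual vector: normalize it, multiply by the unembedding matrix, and read off the most likely tokens. Everything here is a toy stand-in and not taken from a real model: `W_U`, `hidden`, `vocab_size`, and the helper names `layer_norm` and `logit_lens` are hypothetical, and the layer norm has no learned scale or bias.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab_size = 512, 1000  # toy sizes; vocab_size is hypothetical

# Hypothetical stand-ins for real model weights and activations.
W_U = rng.standard_normal((vocab_size, d_model)) / np.sqrt(d_model)  # unembedding matrix
hidden = rng.standard_normal(d_model)  # residual vector at some intermediate layer


def layer_norm(x, eps=1e-5):
    """Normalize to zero mean and unit variance (no learned scale or bias)."""
    return (x - x.mean()) / np.sqrt(x.var() + eps)


def logit_lens(h, unembed, top_k=5):
    """Project a residual vector onto the unembedding vectors.

    Returns the top_k token ids and their softmax probabilities.
    """
    logits = unembed @ layer_norm(h)
    probs = np.exp(logits - logits.max())  # subtract max for numerical stability
    probs /= probs.sum()
    top = np.argsort(probs)[::-1][:top_k]  # indices sorted by descending probability
    return top, probs[top]


token_ids, token_probs = logit_lens(hidden, W_U)
print(token_ids, token_probs)
```

With a real model, `hidden` would come from an intermediate layer and `W_U` from the model's output head; with random weights, as here, the resulting token ids carry no meaning.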

Training also shapes the residual space through two further goals: attracting attention from later layers, and contributing to the residual space in later layers.

Output vectors live in a high-dimensional space (512 dimensions for the small Pythia-70m model). Because of the high number of dimensions, random vectors will be nearly orthogonal to each other.
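The near-orthogonality claim can be checked numerically. The sketch below draws pairs of random Gaussian vectors in 512 dimensions and measures their cosine similarity; for such vectors the similarity concentrates around zero with a spread of roughly `1/sqrt(d_model)`. The seed, the number of pairs, and the choice of a Gaussian distribution are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
d_model = 512                   # hidden size cited above for Pythia-70m
n_pairs = 2000                  # number of random vector pairs (arbitrary)

# Draw pairs of random Gaussian vectors.
a = rng.standard_normal((n_pairs, d_model))
b = rng.standard_normal((n_pairs, d_model))

# Cosine similarity of each pair: dot product divided by the product of norms.
cos_sim = np.sum(a * b, axis=1) / (
    np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
)

print(f"mean cosine similarity: {cos_sim.mean():+.4f}")
print(f"std of cosine similarity: {cos_sim.std():.4f}")
print(f"1/sqrt(d_model): {1 / np.sqrt(d_model):.4f}")
```

A cosine similarity near zero corresponds to an angle near 90 degrees, so in this sense random directions in the residual space are almost, but not exactly, orthogonal.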


In [None]:
import torch
# Stand-in for a real output vector: a 1-D tensor of size [512] (assumed for illustration)
tensor = torch.randn(512)
mean = torch.mean(tensor)
std = torch.std(tensor)


In [None]:
# Random tensor with the same shape, mean, and standard deviation as `tensor`
random_tensor = torch.randn_like(tensor) * std + mean


In [None]:
# Dot product of the original tensor with the random tensor defined above
torch.dot(tensor, random_tensor)


In [None]:
import matplotlib.pyplot as plt
import numpy as np

# Define the two vectors
v1 = np.array([3, 4])
v2 = np.array([1, 2])

# Calculate the dot product
dot_product = np.dot(v1, v2)

# Calculate the magnitudes of the vectors
magnitude_v1 = np.linalg.norm(v1)
magnitude_v2 = np.linalg.norm(v2)

# Calculate the angle between the vectors in radians
angle_rad = np.arccos(dot_product / (magnitude_v1 * magnitude_v2))

# Convert the angle to degrees
angle_deg = np.degrees(angle_rad)

# Create a plot
fig, ax = plt.subplots()

# Plot the vectors
ax.quiver(0, 0, v1[0], v1[1], angles='xy', scale_units='xy', scale=1, color='r', label='v1')
ax.quiver(0, 0, v2[0], v2[1], angles='xy', scale_units='xy', scale=1, color='b', label='v2')

# Set the x and y limits so that both arrow tips are fully visible
ax.set_xlim([-1, 4])
ax.set_ylim([-1, 5])

# Add a legend
ax.legend()

# Add a title
ax.set_title(f"Angle between v1 and v2: {angle_deg:.2f} degrees")

# Show the plot
plt.show()
