# Spectrum Analysis of the Action of the Weight Matrix on the Buffer
The goal is to assess the **action** of the weight matrix on the buffer at a specific layer in the transformer. We define the action as $Y\equiv V^TX$, with $V$ the matrix of the right singular vectors of the SVD of the weight matrix. The procedure can be summarized as follows:
1. From a set of randomly generated prompts, whose token vectors have i.i.d. entries, we compute the mean of the covariance matrices of the action at a layer. We call this object the Activation Covariance Matrix (ACM).
2. Eigendecomposition of the ACM gives the principal directions of the mean of the actions. It can be shown that these eigenvectors correspond exactly to the projection of the principal directions of the mean of the prompts onto the space spanned by the right singular vectors of $W$.
3. We collect these projected principal directions as the overlap vectors, which are the components of the Projection Matrix (PM): each overlap vector is mathematically equivalent to the normalized projection of the principal directions of the mean of the prompts onto a right singular vector. We claim that, depending on the size of the corresponding singular value, a high overlap can be interpreted as a significant action of the weight matrix on the buffer at the layer under consideration.
4. To assess the statistical validity of the results, we compute the Marchenko-Pastur (MP) distribution of the singular values of the weight matrix. We expect deviations, i.e. outliers beyond the MP bounds, to coincide with higher overlap values, indicating a significant action of the weight matrix on the buffer.

## Formalities

Let $d$ be the hidden (embedding) dimension of GPT-2, here $d=768$. A sample is a single prompt: we run $p$ random prompts, each of length $m$ tokens.

Let $x_{i,t}^{(s,\ell)} \in\R^{n_{\ell}}$, for $i=1,\dots,p$ and $t=1,\dots,m$, denote the input to sublayer $\ell$ of block $s$ for token $t$ of prompt $i$. Here $n_{\ell}$ is the dimensionality of the token vectors fed to the $\ell$-th sublayer:

| Sublayer    | PyTorch module               | Input dim $n_{\ell}$ | Output dim $m_{\ell}$ |
|-------------|------------------------------|-------------------|--------------------|
| Query (q)   | `attn.c_attn[:, :d]`         | $n_q = d$       | $m_{q} = d$       |
| Key (k)     | `attn.c_attn[:, d:2*d]`      | $n_k = d$       | $m_{k} = d$        |
| Value (v)   | `attn.c_attn[:, 2*d:3*d]`    | $n_v = d$       | $m_{v} = d$        |
| Attn-out (a)| `attn.c_proj`                | $n_a = d$       | $m_{a} = d$        |
| MLP-up (u)  | `mlp.c_fc`                   | $n_u = d$       | $m_{u} = 4d$       |
| MLP-down (d)| `mlp.c_proj`                 | $n_d = 4d$      | $m_{d} = d$        |

For the $i$-th run we extract the buffer, i.e. the input, of every sublayer in every block.
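The slices in the table can be sketched as follows. This is a minimal illustration, assuming the Hugging Face GPT-2 layout in which `attn.c_attn` stores its weight as an (input, output) array of shape $(d, 3d)$ and applies it as `x @ weight`; the random array and the helper `split_qkv` are hypothetical stand-ins, not part of the original pipeline.

```python
import numpy as np

d = 768  # hidden dimension of GPT-2, as in the text
rng = np.random.default_rng(0)

# Stand-in for the fused attention weight (assumed shape: (d, 3d), applied as x @ weight).
c_attn_weight = 0.02 * rng.standard_normal((d, 3 * d))

def split_qkv(weight, d):
    """Slice the fused weight and transpose each slice to shape (m_l, n_l) = (d, d),
    so that every sublayer acts as y = W @ x, matching the convention of the text."""
    w_q = weight[:, :d].T
    w_k = weight[:, d:2 * d].T
    w_v = weight[:, 2 * d:3 * d].T
    return w_q, w_k, w_v

w_q, w_k, w_v = split_qkv(c_attn_weight, d)
```

The transpose matters only for bookkeeping: the singular values are unchanged, but the right singular vectors of $W$ live in the input space only when $W$ has shape $m_{\ell}\times n_{\ell}$.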

### Weight-Matrix SVD

Each linear sublayer has weight matrix

$$
W^{(s,\ell)}\in\R^{m_{\ell}\times n_{\ell}}.
$$

We compute its SVD

$$
W^{(s,\ell)} = U^{(s,\ell)}\,\Sigma^{(s,\ell)}\,V^{(s,\ell)T},
$$
where
1. $V^{(s,\ell)}\in\R^{n_{\ell}\times n_{\ell}}$ has columns $\{v_k^{(s,\ell)}\}_{k=1}^{n_{\ell}}$, the right singular vectors.
2. $\Sigma^{(s,\ell)}=\mathrm{diag}(\sigma_1\!\ge\!\sigma_2\!\ge\cdots\!\ge\!\sigma_{r_{\ell}})$ holds the singular values.
3. $U^{(s,\ell)}\in\R^{m_{\ell}\times m_{\ell}}$ has the left singular vectors as columns.
4. $r_{\ell}=\min(m_{\ell},n_{\ell})$ is the number of singular values, i.e. the maximal possible rank of $W^{(s,\ell)}$.

As we only need the top $r_{\ell} \le n_{\ell}$ eigenpairs of the ACM, one per singular value, we slice out the eigenvectors with index $j$ beyond $r_{\ell}$.
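The shapes above can be checked with a small NumPy sketch. The toy dimensions and the random matrix are assumptions chosen for illustration (a scaled-down analogue of the MLP-down shape $d\times 4d$), not the actual GPT-2 weights.

```python
import numpy as np

rng = np.random.default_rng(0)
m_l, n_l = 6, 24                      # toy analogue of (d, 4d)
W = rng.standard_normal((m_l, n_l))   # stand-in for a weight matrix W^{(s,l)}

# Full SVD: U is (m_l, m_l), sigma has r_l entries, Vt is (n_l, n_l).
U, sigma, Vt = np.linalg.svd(W, full_matrices=True)
V = Vt.T                              # columns of V are the right singular vectors v_k
r_l = min(m_l, n_l)

# Only the first r_l right singular vectors are paired with a singular value.
V_r = V[:, :r_l]

# Rebuilding W from the first r_l components confirms nothing is lost by slicing.
W_rebuilt = (U[:, :r_l] * sigma) @ V_r.T
```

`np.linalg.svd` returns the singular values in descending order, which matches the ordering $\sigma_1\ge\sigma_2\ge\cdots$ used in the text.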

### Activation Covariance Matrix (ACM)

For each $(s,\ell)$, define the sample mean across buffers as
$$\bar x^{(s,\ell)} = \frac1{p\times m}\sum_{i,t}x_{i,t}^{(s,\ell)}.$$
The Activation Covariance Matrix (ACM) of the buffers is
$$F^{(s,\ell)}
=\frac{1}{p\times m}\sum_{i,t}
\bigl(x_{i,t}^{(s,\ell)}-\bar x^{(s,\ell)}\bigr)
\bigl(x_{i,t}^{(s,\ell)}-\bar x^{(s,\ell)}\bigr)^T
\;\in\;\R^{n_{\ell}\times n_{\ell}}.$$

Because $F^{(s,\ell)}$ is symmetric and positive semidefinite, it admits an eigendecomposition
$$F^{(s,\ell)}\,f_j^{(s,\ell)} = \lambda_j^{(s,\ell)}\,f_j^{(s,\ell)},
\quad
j=1,\dots,n_{\ell},$$
with eigenvalues
$\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_{n_{\ell}}\ge0$
and corresponding orthonormal eigenvectors $f_j^{(s,\ell)}$.
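A minimal sketch of the ACM and its eigendecomposition follows. It assumes the buffers of one sublayer are stacked in an array of shape $(p, m, n_{\ell})$; the function names and the random buffers are hypothetical and only serve the illustration.

```python
import numpy as np

def activation_covariance(buffers):
    """Return F of shape (n_l, n_l) from buffers of shape (p, m, n_l)."""
    p, m, n_l = buffers.shape
    x = buffers.reshape(p * m, n_l)      # one row per (prompt, token) pair
    x_centered = x - x.mean(axis=0)      # subtract the sample mean x_bar
    return x_centered.T @ x_centered / (p * m)

def sorted_eigh(F):
    """Eigenpairs of a symmetric matrix, ordered by decreasing eigenvalue."""
    lam, f = np.linalg.eigh(F)           # eigh returns ascending eigenvalues
    order = np.argsort(lam)[::-1]
    return lam[order], f[:, order]

rng = np.random.default_rng(0)
buffers = rng.standard_normal((8, 16, 5))  # toy values: p=8, m=16, n_l=5
F = activation_covariance(buffers)
lam, f = sorted_eigh(F)                    # columns of f are the eigenvectors f_j
```

The normalization by $p\times m$ (rather than $p\times m-1$) follows the definition of $F^{(s,\ell)}$ given above.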

### Change-of-Basis Perspective
We now ask, for each $k=1,\dots,n_{\ell}$, how much the $k$-th (activation) axis of the $V$ decomposition of $W$ (i.e. $v_k$) aligns with the intrinsic directions $\{f_j\}$ of the buffer samples. We can visualize this by defining the Action, i.e. the coordinate transform in input space:
$$
Y = V^{T}\,X \in \R^{n_{\ell}\times m},
$$
where the columns of $X\in\R^{n_{\ell}\times m}$ are the buffer vectors of one prompt and, for simplicity, we have dropped the $(s,\ell)$ superscripts.

1. In $Y$-coordinates, the covariance is
$$
\widetilde F
= \mathrm{Cov}(Y)
= V^T F V,
$$
which is a similarity transform of $F$, since $V$ is orthogonal.
2. Eigenpairs carry over: the eigenvectors of $\widetilde F$ are $\widetilde f_j = V^T f_j$. Indeed, if
$$
F f_j
= \lambda_j\,f_j,
$$
then, using $VV^T = I$,
$$
\widetilde F\;
\bigl(V^T f_j\bigr)
= V^T F V V^T f_j
= V^T F f_j
= \lambda_j\,\bigl(V^T f_j \bigr).
$$

The eigenvectors of $\widetilde F$ are therefore the eigenvectors of $F$ expressed in the basis of the right singular vectors in $V$. Specifically, the $k$-th entry of $V^Tf_j$ is the overlap between the right singular vector $v_k$ and the $j$-th eigenvector $f_j$.
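The identity $\widetilde F\,(V^Tf_j)=\lambda_j\,(V^Tf_j)$ can be verified numerically. The sketch below uses a random symmetric positive semidefinite matrix as a stand-in for $F$ and a random orthogonal matrix as a stand-in for $V$; both are assumptions made purely for the check.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6

A = rng.standard_normal((n, n))
F = A @ A.T / n                                    # stand-in for the ACM
V, _ = np.linalg.qr(rng.standard_normal((n, n)))   # stand-in for the right singular vectors

lam, f = np.linalg.eigh(F)                         # F f_j = lam_j f_j (columns of f)

F_tilde = V.T @ F @ V                              # covariance in Y-coordinates
f_tilde = V.T @ f                                  # candidate eigenvectors of F_tilde

# Largest entry of F_tilde f_tilde_j - lam_j f_tilde_j over all j.
residual = np.abs(F_tilde @ f_tilde - f_tilde * lam).max()
```

The residual should be at the level of floating-point round-off, and the spectrum of `F_tilde` should coincide with that of `F`.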

### Overlap Definition
The Projection Matrix (PM) arises from the following computation:
$$
\begin{pmatrix} \widetilde{f}_1 & \widetilde{f}_2 & \cdots & \widetilde{f}_n \end{pmatrix} =
\begin{pmatrix}
\vec{v}_1^T \\
\vec{v}_2^T \\
\vdots \\
\vec{v}_n^T
\end{pmatrix}
\;
\begin{pmatrix}
\vec{f}_1 & \vec{f}_2 & \cdots & \vec{f}_n
\end{pmatrix}
\; = 
\begin{pmatrix}
\vec{v}_1^T\vec{f}_1 & \vec{v}_1^T\vec{f}_2 & \cdots & \vec{v}_1^T\vec{f}_n \\
\vdots               &                      &        & \vdots               \\
\vec{v}_n^T\vec{f}_1 & \vec{v}_n^T\vec{f}_2 & \cdots & \vec{v}_n^T\vec{f}_n 
\end{pmatrix}
\; = 
\begin{pmatrix}
\vec{O}_1^T \\
\vec{O}_2^T \\
\vdots \\
\vec{O}_n^T
\end{pmatrix},
$$
where each vector $\vec{O}_k$ quantifies the overlap of the right singular vector $v_k$ with every eigenvector $f_j$.

Finally, we define the overlap as:
$$
O_k^{(s,\ell)}
=\max_{1\le j\le n_{\ell}}\;\bigl|\langle v_k^{(s,\ell)},\,f_j^{(s,\ell)}\rangle\bigr|,
$$
such that:
+ If $O_k\approx1$, then one activation‐axis $f_j$ lies essentially on $v_k$.
+ If $O_k\approx0$, then $v_k$ is almost orthogonal to all principal data‐axes, i.e. an unused direction.
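The Projection Matrix and the overlaps $O_k$ can be sketched as follows. The helper names and the two toy cases (principal axes aligned with $V$, and principal axes drawn independently of $V$) are hypothetical and only illustrate the two regimes listed above.

```python
import numpy as np

def projection_matrix(V, f):
    """PM[k, j] = <v_k, f_j>, with v_k and f_j the columns of V and f."""
    return V.T @ f

def overlaps(V, f):
    """O_k = max_j |<v_k, f_j>| for every right singular vector v_k."""
    return np.abs(projection_matrix(V, f)).max(axis=1)

rng = np.random.default_rng(0)
n = 8
V, _ = np.linalg.qr(rng.standard_normal((n, n)))   # stand-in for the right singular vectors

# Case 1: a covariance whose principal axes coincide with the columns of V.
F_aligned = V @ np.diag(np.arange(n, 0, -1.0)) @ V.T
_, f_aligned = np.linalg.eigh(F_aligned)
O_aligned = overlaps(V, f_aligned)

# Case 2: principal axes drawn independently of V.
f_random, _ = np.linalg.qr(rng.standard_normal((n, n)))
O_random = overlaps(V, f_random)
```

When both $V$ and $\{f_j\}$ are complete orthonormal bases, every row of the PM has unit norm, so $O_k$ cannot drop below $1/\sqrt{n_{\ell}}$; in practice "$O_k\approx0$" therefore means an overlap close to this floor.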

## Marchenko-Pastur of singular values of $W$

We need a null model for the singular-value spectrum of a "random" weight matrix of the same shape as our trained $W$. Any singular values that lie outside the theoretical MP bulk can be flagged as outliers, i.e. learned, data-driven directions.


Let $X\in\R^{m\times n}$ be a generic random matrix (not the buffer matrix of the previous sections) with i.i.d. entries of zero mean and variance $\sigma^2$. One can form the (scaled) sample covariance
$$
C \;=\;\frac1n\,X\,X^T
\;\in\;\R^{m\times m}.
$$

As $m,n\to\infty$ with the ratio $q = \frac{m}{n}\;\;(0<q\le1)$ fixed, the empirical eigenvalue distribution of $C$ converges to the Marchenko–Pastur law, whose support has edges $\lambda_\pm^{(\rm cov)}\;=\;\sigma^2\bigl(1\pm\sqrt{q}\bigr)^2,$ meaning that nearly all eigenvalues of $C$ lie in
$$
\bigl[\sigma^2(1-\sqrt q)^2,\;\sigma^2(1+\sqrt q)^2\bigr].
$$

The nonzero singular values of $X$ are the square roots of the nonzero eigenvalues of $XX^T$. That is, if $\{\lambda_i\}$ are the eigenvalues of $C = \tfrac1n\,X X^T$, then the corresponding singular values of $X$ are $s_i(X)\;=\;\sqrt{\,n\,\lambda_i\,}\,$.
Thus the support of the singular-value distribution of $X$ has edges
$$
s_\pm
=\sqrt{\,n\,\lambda_\pm^{(\rm cov)}\,}
=\sqrt{\,n\,\sigma^2\bigl(1\pm\sqrt q\bigr)^2\,}
=\sigma\;\sqrt n\;\bigl(1\pm\sqrt q\bigr).
$$

Center $W$, now taken to be of shape $m\times n$, and compute $\sigma^2=\tfrac1{mn}\sum_{i,j}\bigl(W_{centered}\bigr)_{ij}^2$, the empirical variance of the entries of $W_{centered}$. Set
$$
s_- = \sigma\bigl|\sqrt n - \sqrt m\bigr|,\quad
s_+ = \sigma\bigl(\sqrt n + \sqrt m\bigr),
$$
then any empirical singular value $s_k$ outside $[s_-,s_+]$ is an outlier relative to the random baseline for that weight matrix $W$.
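The outlier test can be sketched on a synthetic matrix. The example assumes that "centering" means subtracting the scalar mean of all entries, and it plants a rank-one spike in Gaussian noise so that one singular value leaves the bulk; the function name, the toy shape, and the spike strength are all hypothetical.

```python
import numpy as np

def mp_singular_value_bounds(W):
    """Return the centered matrix and the MP edges (s_minus, s_plus) for its shape."""
    m, n = W.shape
    W_centered = W - W.mean()                   # assumption: subtract the scalar mean
    sigma = np.sqrt(np.mean(W_centered ** 2))   # empirical standard deviation of the entries
    s_minus = sigma * abs(np.sqrt(n) - np.sqrt(m))
    s_plus = sigma * (np.sqrt(n) + np.sqrt(m))
    return W_centered, s_minus, s_plus

rng = np.random.default_rng(0)
m, n = 200, 800

# Gaussian noise plus a planted rank-one direction u v^T.
u = rng.standard_normal(m)
u /= np.linalg.norm(u)
v = rng.standard_normal(n)
v /= np.linalg.norm(v)
W = 0.02 * rng.standard_normal((m, n)) + 3.0 * np.outer(u, v)

W_centered, s_minus, s_plus = mp_singular_value_bounds(W)
s = np.linalg.svd(W_centered, compute_uv=False)  # singular values, descending

above_bulk = s > s_plus    # candidates for learned, data-driven directions
below_bulk = s < s_minus
```

Note that a strong spike also inflates the empirical $\sigma$, which shifts both edges slightly outward, so values near the lower edge should be read with care. For the GPT-2 shapes in the table, the square $d\times d$ sublayers give $s_-=0$ and $s_+=2\sigma\sqrt{768}\approx 55.4\,\sigma$, while the MLP sublayers give $s_-=\sigma\sqrt{768}\approx 27.7\,\sigma$ and $s_+=3\sigma\sqrt{768}\approx 83.1\,\sigma$.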

The (nonzero) eigenvalues $\{\lambda_i\}$ of $C$ have, in the large-$m,n$ limit, the density
$$
p_C(\lambda)
=\frac{1}{2\pi\,\sigma^2\,q\,\lambda}
\sqrt{(\lambda_+^{(\rm cov)}-\lambda)\,(\lambda-\lambda_-^{(\rm cov)})},
$$
and the nonzero singular values $s_i$ of $W_{centered}$ relate to them, as above, by
$$
s_i = \sqrt{n\,\lambda_i},
$$
so the density $p_s(s)$ satisfies
$$
p_s(s)\,ds \;=\; p_C(\lambda)\,d\lambda
\quad\text{with}\quad
\lambda = \frac{s^2}{n},\quad d\lambda = \frac{2s}{n}\,ds.
$$
Then, using $\lambda_\pm^{(\rm cov)} = s_\pm^2/n$,
$$
p_s(s)
= p_C\!\Bigl(\frac{s^2}{n}\Bigr)\;\Bigl|\frac{d\lambda}{ds}\Bigr|
= \frac{2s}{n}\;\frac{n}{2\pi\,\sigma^2\,q\,s^2}
\sqrt{\Bigl(\lambda_+^{(\rm cov)}-\frac{s^2}{n}\Bigr)\Bigl(\frac{s^2}{n}-\lambda_-^{(\rm cov)}\Bigr)}
= \frac{1}{\pi\,\sigma^2\,q\,n\,s}
\sqrt{\bigl(s_+^2-s^2\bigr)\,\bigl(s^2-s_-^2\bigr)}.
$$
Since $q\,n = m$, the Marchenko–Pastur distribution of the singular values of each weight matrix follows:
$$
\boxed{
p_s(s)
= \frac{1}{\pi\,\sigma^2\,m\,s}
\sqrt{\bigl(s_+^2 - s^2\bigr)\,\bigl(s^2- s_-^2\bigr)}
}
$$
supported on $s\in[s_-,s_+]$. Here $m\le n$; for $m>n$ the same formula applies to $W^T$, which has the same singular values.
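As a sanity check, the singular-value density can be evaluated and integrated numerically. The sketch assumes the scaling $s=\sqrt{n\,\lambda}$ used for the support edges $s_\pm$, under which the change of variables yields the prefactor $1/(\pi\,\sigma^2\,m\,s)$ with $m\le n$; the function name and the toy parameters are hypothetical.

```python
import numpy as np

def mp_singular_value_density(s, sigma, m, n):
    """MP density of the singular values of an m x n matrix with i.i.d. entries
    of variance sigma**2, assuming the scaling s = sqrt(n * lambda)."""
    s = np.asarray(s, dtype=float)
    s_minus = sigma * abs(np.sqrt(n) - np.sqrt(m))
    s_plus = sigma * (np.sqrt(n) + np.sqrt(m))
    inside = (s > s_minus) & (s < s_plus)
    radicand = np.clip((s_plus**2 - s**2) * (s**2 - s_minus**2), 0.0, None)
    density = np.zeros_like(s)
    # Singular values are invariant under transposition, so the smaller
    # dimension plays the role of m (q <= 1) in the prefactor.
    density[inside] = np.sqrt(radicand[inside]) / (
        np.pi * sigma**2 * min(m, n) * s[inside]
    )
    return density

sigma, m, n = 0.02, 200, 800
s_minus = sigma * abs(np.sqrt(n) - np.sqrt(m))
s_plus = sigma * (np.sqrt(n) + np.sqrt(m))

grid = np.linspace(s_minus, s_plus, 20001)
p_s = mp_singular_value_density(grid, sigma, m, n)

# Trapezoidal rule written out explicitly; the result should be close to 1.
area = np.sum(0.5 * (p_s[1:] + p_s[:-1]) * np.diff(grid))
```

A normalized density is a useful check before overlaying the curve on a histogram of the empirical singular values of $W_{centered}$.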

By checking for right singular vectors whose associated singular values lie outside these bounds, we can assess, in each layer, which directions significantly affect the orientation of the incoming buffer.