In [1]:
from modular_addition_interpretation import *

In [2]:
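def update_interp_tools(*args, **kwargs):
    # Regenerate the interpretation tools with the given settings, then bind each
    # returned tool as a notebook-level global so later cells can call it by name.
    interp_tools = generate_interpretation_tools(*args, **kwargs)
    for k, v in interp_tools.items():
        globals()[k] = v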

# Encoding of Inputs

Claim: the embedding represents each number $x$ with a sparse Fourier representation, namely $\cos(w_i x)$ and $\sin(w_i x)$ for frequencies $w_i = \frac{2\pi k}{P}$, where $k \in \{14, 35, 41, 42, 52\}$ and $P = 113$.
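One way to quantify this claim is to project the embedding onto the subspace spanned by the key-frequency cosines and sines and measure how much of its norm lands there. The sketch below assumes an embedding matrix `W_E` of shape `(P, d_model)` whose rows are indexed by the token value $x$; both matrices used here are synthetic stand-ins, not the trained model's weights, and the helper names are hypothetical.

```python
import numpy as np

P = 113
KEY_FREQS = [14, 35, 41, 42, 52]  # the k values from the claim above


def fourier_basis(p: int, ks: list[int]) -> np.ndarray:
    """Rows are cos(w_k x) and sin(w_k x) with w_k = 2*pi*k/p, each scaled to unit norm."""
    x = np.arange(p)
    rows = []
    for k in ks:
        w = 2 * np.pi * k / p
        rows.append(np.cos(w * x))
        rows.append(np.sin(w * x))
    basis = np.stack(rows)
    # For distinct 0 < k < p/2 these rows are mutually orthogonal, so this makes them orthonormal.
    return basis / np.linalg.norm(basis, axis=1, keepdims=True)


def fraction_explained(W_E: np.ndarray, basis: np.ndarray) -> float:
    """Share of W_E's squared Frobenius norm lying in span(basis rows) along the token axis."""
    projected = basis.T @ (basis @ W_E)
    return float(np.sum(projected**2) / np.sum(W_E**2))


rng = np.random.default_rng(0)
B = fourier_basis(P, KEY_FREQS)
# Hypothetical embedding built from the key frequencies plus a little noise.
W_sparse = B.T @ rng.normal(size=(B.shape[0], 128)) + 0.01 * rng.normal(size=(P, 128))
# Unstructured baseline with no Fourier sparsity.
W_random = rng.normal(size=(P, 128))

print(fraction_explained(W_sparse, B))  # close to 1
print(fraction_explained(W_random, B))  # roughly 10 / 113
```

A sparse embedding keeps almost all of its norm in the 10-dimensional key-frequency subspace, while an unstructured one keeps only about its proportional share.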

### Unrounded

In [3]:
update_interp_tools(decimals=None)

In [4]:
get_prefix_equivalence_eps(0, return_match_rate=True)

{'epsilon (confidence 0.95)': 1.0, 'disagreement rate': 1.0}
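This output pairs an observed disagreement rate with an epsilon at 95% confidence. The notebook does not show how `modular_addition_interpretation` computes that epsilon. A standard way to turn $k$ observed disagreements out of $n$ samples into a high-confidence upper bound is the one-sided Clopper-Pearson interval, sketched below in pure Python. This is an assumption about the kind of bound involved, not the library's implementation, and the sample count $n$ is not stated here. As any valid upper bound must, it returns 1.0 when every sample disagrees.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P[X <= k] for X ~ Binomial(n, p), with each term computed in log space."""
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if k >= n else 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def disagreement_upper_bound(k: int, n: int, confidence: float = 0.95) -> float:
    """One-sided Clopper-Pearson upper bound on the true disagreement rate
    after observing k disagreements in n independent samples."""
    alpha = 1.0 - confidence
    if k >= n:
        return 1.0
    lo, hi = k / n, 1.0
    # binom_cdf(k, n, p) decreases in p; bisect for the p where it falls to alpha.
    for _ in range(100):
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return hi


n = 12769  # hypothetical sample count, e.g. all P * P input pairs
print(disagreement_upper_bound(0, n))  # equals 1 - 0.05 ** (1 / n) up to bisection precision
print(disagreement_upper_bound(n, n))  # 1.0
```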

In [5]:
get_component_equivalence_eps(0)

1.0

In [6]:
get_prefix_replaceability_eps(0)

0.0003350744321373167

In [7]:
get_component_replaceability_eps(0)

0.0003350744321373167

### Rounded to 3 decimal places

In [8]:
update_interp_tools()

In [9]:
get_prefix_equivalence_eps(0, return_match_rate=True)

{'epsilon (confidence 0.95)': 0.0003350744321373167, 'disagreement rate': 0.0}

In [10]:
get_component_equivalence_eps(0)

0.0003350744321373167

In [11]:
get_prefix_replaceability_eps(0)

0.0003350744321373167

# Sum of Angles

Claim: the attention mechanism and MLP input layers compute $\cos(w_i(a+b))$ and $\sin(w_i(a+b))$ for each of the $w_i$ using trigonometric identities.
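The identities in question are the angle-addition formulas $\cos(w(a+b)) = \cos(wa)\cos(wb) - \sin(wa)\sin(wb)$ and $\sin(w(a+b)) = \sin(wa)\cos(wb) + \cos(wa)\sin(wb)$, which build the features of $a+b$ from per-token features of $a$ and $b$. The sketch below checks them numerically for every pair $(a, b)$ and each key frequency; it verifies only the arithmetic, not that the attention and MLP layers actually implement it.

```python
import numpy as np

P = 113
KEY_FREQS = [14, 35, 41, 42, 52]

a = np.arange(P)[:, None]  # shape (P, 1)
b = np.arange(P)[None, :]  # shape (1, P)

max_err = 0.0
for k in KEY_FREQS:
    w = 2 * np.pi * k / P
    # Left-hand sides: the target features of the sum a + b.
    cos_sum, sin_sum = np.cos(w * (a + b)), np.sin(w * (a + b))
    # Right-hand sides: built only from cos/sin of a and of b separately.
    cos_id = np.cos(w * a) * np.cos(w * b) - np.sin(w * a) * np.sin(w * b)
    sin_id = np.sin(w * a) * np.cos(w * b) + np.cos(w * a) * np.sin(w * b)
    max_err = max(max_err, np.abs(cos_sum - cos_id).max(), np.abs(sin_sum - sin_id).max())

print(max_err)  # tiny: floating-point rounding only
```

Because $w_i = 2\pi k / P$, these features depend only on $(a + b) \bmod P$, which is what makes them suitable for modular addition.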

### Equality

In [12]:
update_interp_tools(equivalence_class=EqualityEquivalenceClass)

In [13]:
get_prefix_equivalence_eps(1)

1.0

In [14]:
get_component_equivalence_eps(1)

1.0

In [15]:
get_prefix_replaceability_eps(1)

0.0003350744321373167

In [16]:
get_component_replaceability_eps(1)

0.0003350744321373167

### Concrete and Abstract Downstream Equivalence

In [17]:
update_interp_tools(equivalence_class=ConcreteAndAbstractEquivalenceClass)

In [18]:
get_prefix_equivalence_eps(1)

0.0003350744321373167

In [19]:
get_component_equivalence_eps(1)

0.0003350744321373167

In [20]:
get_prefix_replaceability_eps(1)

0.0003350744321373167

In [21]:
get_component_replaceability_eps(1)

0.0003350744321373167

### Rounding to a single decimal place

In [22]:
update_interp_tools(equivalence_class=RoundedEquivalenceClass)

In [23]:
get_prefix_equivalence_eps(1)

1.0

In [24]:
get_component_equivalence_eps(1)

1.0

In [25]:
get_prefix_replaceability_eps(1)

0.0003350744321373167

In [26]:
get_component_replaceability_eps(1)

0.0003350744321373167

# Difference of Angles + Argmax

Claim: the MLP output layer and the unembedding matrix compute $\cos(w_i(a+b-c))$ using trigonometric identities for each of the $w_i$ and for each $c \in \mathbb{Z}_P$. These values are then grouped by $c$ and summed, and the final output is the $c^*$ that maximizes the sum: $c^* = \arg\max_{c} \sum_i \cos(w_i(a+b-c))$.

Analyzing these two steps separately would require a canonical decomposition of the unembedding matrix, so we instead treat their composition as the final abstract component.
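The composed component can be sketched as follows, assuming exact key-frequency features for $a+b$ and equal weight on every frequency (the trained network need not weight them equally). Each term is expanded with the difference identity $\cos(w_i(a+b-c)) = \cos(w_i(a+b))\cos(w_i c) + \sin(w_i(a+b))\sin(w_i c)$, summed over $i$ for every $c$, and the argmax is taken. Since $P = 113$ is prime and every key $k$ is nonzero mod $P$, $\cos(w_i d) = 1$ forces $d \equiv 0 \pmod P$, so the sum reaches its maximum (one per frequency) only at $c \equiv a + b \pmod P$. The function name `predict` is hypothetical.

```python
import numpy as np

P = 113
KEY_FREQS = [14, 35, 41, 42, 52]
w = 2 * np.pi * np.array(KEY_FREQS) / P  # shape (F,)


def predict(a: int, b: int) -> int:
    """Return argmax over c of sum_i cos(w_i (a + b - c)), via the difference identity."""
    # Features of the sum, as produced by the previous component (assumed exact here).
    cos_s, sin_s = np.cos(w * (a + b)), np.sin(w * (a + b))  # shape (F,)
    c = np.arange(P)[:, None]                                  # shape (P, 1)
    # cos(w(s - c)) = cos(ws)cos(wc) + sin(ws)sin(wc), for every c and every frequency.
    logits = (cos_s * np.cos(w * c) + sin_s * np.sin(w * c)).sum(axis=1)  # shape (P,)
    return int(np.argmax(logits))


print(predict(100, 50))  # 37, since (100 + 50) % 113 == 37
```

Treating the two steps as one function of $(a, b)$ mirrors the choice above: only the final argmax is checked, not any particular split of the unembedding matrix.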

In [27]:
get_prefix_equivalence_eps(2)

0.0003350744321373167

In [28]:
get_component_equivalence_eps(2)

0.0003350744321373167

In [29]:
get_prefix_replaceability_eps(2)

0.0003350744321373167

In [30]:
get_component_replaceability_eps(2)

0.0003350744321373167