Compare two or more models and get useful data and plots. Built on top of TransformerLens.
This is an early alpha, written in hackathon mode.
$ git clone https://github.com/johny-b/modiff.git
$ cd modiff
# (start a venv)
$ pip3 install -r requirements.txt
Tested on Python 3.10.6 and Ubuntu 22.04.
There is a set of open problems in mechanistic interpretability that can be summarized as "build a model-comparing tool and test it" (6.49 - 6.58). This is a proof of concept of such a tool.
Two main workflows:
- Exploration. You have a model and want to compare its behaviour with that of another model.
- Hypothesis testing. You want to test a hypothesis that can be expressed as a difference between two models.
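Both workflows start from the same idea: run two models on the same inputs and look at how a per-input metric differs. The sketch below is not modiff's API. It assumes a "model" is any callable returning a scalar metric (for example loss or logit difference) for one input, and `compare_models` is a hypothetical helper name.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in: a "model" maps one input to a scalar metric.
Metric = Callable[[str], float]


def compare_models(
    model_a: Metric, model_b: Metric, inputs: Sequence[str], top_k: int = 3
) -> Tuple[float, List[Tuple[str, float]]]:
    """Run both models on the same inputs and summarise where they differ."""
    # Per-input difference, model_b minus model_a.
    diffs = [(x, model_b(x) - model_a(x)) for x in inputs]
    # Mean difference: a single number a hypothesis can be phrased against.
    mean_diff = mean(d for _, d in diffs)
    # Largest absolute differences: starting points for exploration.
    biggest = sorted(diffs, key=lambda pair: abs(pair[1]), reverse=True)[:top_k]
    return mean_diff, biggest
```

For exploration, the `biggest` list points at inputs worth inspecting; for hypothesis testing, the hypothesis is phrased as a claim about `mean_diff` (or a similar statistic) on a chosen input set.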
example_induction_heads.py
- Compare the attn-only-1l and attn-only-2l models. Guess which one has induction heads!
example_pythia.py
- Compare early pythia models, after 512 and 1000 training steps:
  - Notice that induction heads appeared in the later checkpoint
  - Find the suspected heads
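A common heuristic for spotting induction heads is to feed a block of random tokens repeated twice and measure how much each head attends to the token right after the earlier occurrence of the current token. Comparing these scores between two checkpoints gives a shortlist of heads that became induction-like. The sketch below is an illustration, not the code in example_pythia.py: it assumes no BOS token, takes attention patterns as plain nested lists (in practice they would come from the model's cached attention patterns), and `induction_score` and `rank_new_heads` are hypothetical names.

```python
from typing import Dict, List, Sequence, Tuple

Pattern = Sequence[Sequence[float]]  # [query_pos][key_pos] attention weights
HeadId = Tuple[int, int]  # (layer, head)


def induction_score(pattern: Pattern, seq_len: int) -> float:
    """Mean attention on the induction stripe.

    Assumes the input was `seq_len` random tokens repeated twice, with no BOS
    token. At position i in the second copy, an induction head attends to the
    token right after the earlier occurrence, i.e. key position i - seq_len + 1.
    """
    queries = range(seq_len, 2 * seq_len)
    return sum(pattern[i][i - seq_len + 1] for i in queries) / seq_len


def rank_new_heads(
    scores_early: Dict[HeadId, float], scores_late: Dict[HeadId, float]
) -> List[Tuple[HeadId, float]]:
    """Heads sorted by how much their induction score grew between checkpoints."""
    growth = {h: scores_late[h] - scores_early[h] for h in scores_late}
    return sorted(growth.items(), key=lambda item: item[1], reverse=True)
```

Ranking by growth rather than by the late score alone filters out heads that were already attending along that stripe in the earlier checkpoint.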
example_brackets.py
- Take two copies of the bracket classifier model. Ablate a certain head in one of them. Compare their performance.
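Comparing the intact and ablated copies comes down to scoring both against ground-truth labels on the same inputs and looking at where they disagree. In this sketch the two models are hypothetical plain callables rather than the actual bracket classifier, inputs are assumed to contain only "(" and ")", and `is_balanced` and `compare_accuracy` are illustrative names.

```python
from typing import Callable, Dict, Sequence

# Hypothetical stand-in: returns True if the model predicts "balanced".
Classifier = Callable[[str], bool]


def is_balanced(s: str) -> bool:
    """Ground-truth label; assumes s contains only '(' and ')'."""
    depth = 0
    for ch in s:
        depth += 1 if ch == "(" else -1
        if depth < 0:  # a ')' with no matching earlier '('
            return False
    return depth == 0  # no '(' left open


def compare_accuracy(
    full: Classifier, ablated: Classifier, inputs: Sequence[str]
) -> Dict[str, object]:
    """Accuracy of both models on the same inputs, plus inputs where they disagree."""
    labels = [is_balanced(s) for s in inputs]

    def accuracy(model: Classifier) -> float:
        return sum(model(s) == y for s, y in zip(inputs, labels)) / len(inputs)

    disagreements = [s for s in inputs if full(s) != ablated(s)]
    return {"full": accuracy(full), "ablated": accuracy(ablated), "disagreements": disagreements}
```

Usage, with a hypothetical ablated model that only checks whether the counts of "(" and ")" match:

```python
result = compare_accuracy(is_balanced, lambda s: s.count("(") == s.count(")"), ["()", "(())", ")(", "(()", "())("])
```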
More details are in the example scripts.