🪴 Planting language models, seeing how they grow etc.
Conda 🐍
conda env create -f conda.yaml
conda activate feature-dynamics
Dependencies 📦
pip install pipx
pipx install poetry
poetry install
Train decoder-only models.
poetry run python training/transformer/train.py <experiment.toml>
Train sparse autoencoder.
poetry run python training/autoencoder/train.py <experiment.toml>
Sparse autoencoders are trained using TransformerLens's HookedTransformer
(specifically via my hacked fork*).
* The fork is required to hook custom Mistral models.
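For orientation, here is a minimal sketch of what training a sparse autoencoder on activations can look like. It assumes a common formulation (ReLU features, mean squared reconstruction error plus an L1 sparsity penalty); the objective, hyperparameters, and class names used in training/autoencoder/train.py may differ. `SparseAutoencoder`, `sae_loss`, and `l1_coefficient` are hypothetical names, and the random tensor stands in for activations that would really be collected from the target model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseAutoencoder(nn.Module):
    """Hypothetical one-hidden-layer autoencoder with ReLU features."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x: torch.Tensor):
        features = torch.relu(self.encoder(x))
        return self.decoder(features), features


def sae_loss(x, reconstruction, features, l1_coefficient: float):
    """Reconstruction error plus an L1 penalty on feature activations (assumed objective)."""
    mse = F.mse_loss(reconstruction, x)
    sparsity = features.abs().sum(dim=-1).mean()
    return mse + l1_coefficient * sparsity


torch.manual_seed(0)
# Stand-in for activations cached from a target model (assumption: 32-dim).
activations = torch.randn(256, 32)
sae = SparseAutoencoder(d_model=32, d_hidden=128)
optimizer = torch.optim.Adam(sae.parameters(), lr=1e-3)

losses = []
for _ in range(50):
    reconstruction, features = sae(activations)
    loss = sae_loss(activations, reconstruction, features, l1_coefficient=1e-3)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```

The hidden layer is wider than the input here because sparse autoencoders are usually overcomplete; the ratio is an illustrative choice, not one taken from this repository.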
Evaluation of pretrained autoencoders.
This module makes target models use autoencoder reconstructions in place of their original activations, using a forward-pass hook.
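The idea can be sketched with a plain PyTorch forward hook. This is an illustration only, not the module's actual code: `TinyAutoencoder` and `patch_with_reconstruction` are hypothetical names, the target is a toy `nn.Sequential` rather than a HookedTransformer, and the autoencoder is untrained.

```python
import torch
import torch.nn as nn


class TinyAutoencoder(nn.Module):
    """Hypothetical stand-in for a pretrained sparse autoencoder."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(torch.relu(self.encoder(x)))


def patch_with_reconstruction(layer: nn.Module, autoencoder: nn.Module):
    """Register a hook that replaces the layer's output with its reconstruction."""

    def hook(module, inputs, output):
        # A non-None return value from a forward hook replaces the output.
        return autoencoder(output)

    # The returned handle lets the caller undo the patch with .remove().
    return layer.register_forward_hook(hook)


torch.manual_seed(0)
# Toy target model; the hooked layer (index 1) emits 16-dim activations.
target = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
autoencoder = TinyAutoencoder(d_model=16, d_hidden=64)
x = torch.randn(2, 8)

with torch.no_grad():
    clean_output = target(x)
    handle = patch_with_reconstruction(target[1], autoencoder)
    patched_output = target(x)   # downstream layers see reconstructions
    handle.remove()
    restored_output = target(x)  # original behavior is back
```

Comparing `clean_output` with `patched_output` (for example, by the change in downstream loss) is one way to judge how faithful the reconstructions are.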
Interpolate model weights using Mergekit.
poetry run python interpolation/interpolate.py <experiment.toml>
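As a rough picture of what interpolating weights means, the sketch below linearly blends two models' parameters. It is not Mergekit's API and not necessarily the method interpolation/interpolate.py configures; `interpolate_state_dicts` and the toy `nn.Linear` models are hypothetical, and it assumes both models share identical floating-point parameter names and shapes.

```python
import torch
import torch.nn as nn


def interpolate_state_dicts(state_a: dict, state_b: dict, t: float) -> dict:
    """Return (1 - t) * state_a + t * state_b for every parameter."""
    if state_a.keys() != state_b.keys():
        raise ValueError("models must share the same parameter names")
    # torch.lerp(start, end, weight) computes start + weight * (end - start).
    return {name: torch.lerp(state_a[name], state_b[name], t) for name in state_a}


torch.manual_seed(0)
model_a = nn.Linear(4, 2)
model_b = nn.Linear(4, 2)

# Halfway point between the two models' weights.
midpoint = nn.Linear(4, 2)
midpoint.load_state_dict(
    interpolate_state_dicts(model_a.state_dict(), model_b.state_dict(), t=0.5)
)
```

Sweeping `t` from 0 to 1 yields a path of models from `model_a` to `model_b`.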
Merge models using Mergekit. This part is just Mergekit.