Simple starter code for experiments on open-source LLMs. Built for my SPAR project participants, but anyone is welcome to use it.
```bash
# optional: create a virtual environment
python3 -m venv venv
source venv/bin/activate
# run from the root of the repo; this installs everything you need
pip install -e .
```
To download Llama models from Hugging Face and/or use the Claude API, add a .env file in the root of the repo with your API keys (see .env.example).
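If you want to check that your keys are visible to a script, here is a minimal stdlib-only sketch of reading a .env file. It is an illustration, not the repo's loading code (the repo may already load the file for you). The function `load_env_file` is hypothetical, and the variable names `HF_TOKEN` and `ANTHROPIC_API_KEY` are assumptions; use whatever names .env.example lists.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines and copy them into os.environ.

    Blank lines, comment lines starting with '#', and lines without '='
    are skipped. Variables already set in the environment are not overwritten.
    Returns a dict of everything parsed from the file.
    """
    loaded = {}
    env_path = Path(path)
    if not env_path.exists():
        return loaded
    for raw_line in env_path.read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        # Drop optional surrounding quotes around the value.
        value = value.strip().strip('"').strip("'")
        loaded[key] = value
        os.environ.setdefault(key, value)
    return loaded


if __name__ == "__main__":
    load_env_file()
    # Assumed variable names; check .env.example for the real ones.
    for name in ("HF_TOKEN", "ANTHROPIC_API_KEY"):
        print(name, "set" if os.environ.get(name) else "missing")
```

Because the sketch uses `os.environ.setdefault`, a key exported in your shell takes precedence over the value in the file.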
All code is in lmexp/
Example data and generation scripts using the Claude API.
Example Llama 3 fine-tuning implementation. Quantizes to 8-bit. You may also want to try LoRA / PEFT methods / torchtune. Meta's fine-tuning example code can be found here.
Implementation of model-internals techniques like CAA and linear probing in terms of an abstract HookedModel class. An extended class, SteerableModel, is also provided for techniques that require modifying the model's activations.
See models/implementations/gpt2small.py for an example of how to use this class. The idea is to write a single implementation of a technique and then apply it to any model we want. This is very similar to the TransformerLens paradigm, but pared down to just the functionality we're likely to use. Feel free to use TransformerLens if you want more features.
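To make the "write the technique once, apply it to any model" idea concrete, here is a toy, pure-Python sketch. It does not reproduce the repo's actual HookedModel or SteerableModel API: `ToyHookedModel`, `TinyModel`, `caa_vector`, and `steer` are hypothetical names, and the "model" is two scaling layers acting on plain lists rather than a transformer acting on tokens. What it shows is the shape of the pattern: the technique (here CAA, computed as the mean activation on positive examples minus the mean on negative examples at one layer) only talks to the abstract interface.

```python
from abc import ABC, abstractmethod


class ToyHookedModel(ABC):
    """Hypothetical abstract interface: subclasses only describe their layers."""

    @abstractmethod
    def n_layers(self):
        ...

    @abstractmethod
    def run_layer(self, layer, hidden):
        ...

    def forward(self, hidden, hooks=None):
        """Run every layer; hooks maps a layer index to fn(activation) -> activation."""
        hooks = hooks or {}
        for layer in range(self.n_layers()):
            hidden = self.run_layer(layer, hidden)
            if layer in hooks:
                hidden = hooks[layer](hidden)
        return hidden

    def get_activation(self, hidden, layer):
        """Read the activation after `layer` without changing the output."""
        saved = {}

        def save(act):
            saved["act"] = list(act)
            return act

        self.forward(hidden, hooks={layer: save})
        return saved["act"]


class TinyModel(ToyHookedModel):
    """Toy implementation: layer 0 doubles the vector, layer 1 halves it."""

    def n_layers(self):
        return 2

    def run_layer(self, layer, hidden):
        scale = 2.0 if layer == 0 else 0.5
        return [scale * x for x in hidden]


def mean_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def caa_vector(model, positive_inputs, negative_inputs, layer):
    """Steering vector: mean positive activation minus mean negative activation."""
    pos = mean_vector([model.get_activation(x, layer) for x in positive_inputs])
    neg = mean_vector([model.get_activation(x, layer) for x in negative_inputs])
    return [p - n for p, n in zip(pos, neg)]


def steer(model, hidden, layer, vector, multiplier=1.0):
    """Run the model while adding multiplier * vector to the activation at `layer`."""

    def add(act):
        return [a + multiplier * v for a, v in zip(act, vector)]

    return model.forward(hidden, hooks={layer: add})


model = TinyModel()
vec = caa_vector(model, [[1.0, 0.0], [3.0, 0.0]], [[0.0, 1.0], [0.0, 3.0]], layer=0)
steered = steer(model, [1.0, 1.0], layer=0, vector=vec, multiplier=0.5)
```

Neither `caa_vector` nor `steer` mentions `TinyModel`, so any other subclass that implements `n_layers` and `run_layer` can reuse them unchanged.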
Model implementations. Currently has:
- Gemma 2
- Llama 3.1
- Qwen 1.5
- GPT2 (useful for testing locally)
Jupyter notebooks demonstrating basic use-cases.
- Integrate with Gemma 2 SAEs / SAE feature steering
- Port over all the experiments / plotting code from CAA repo
- More contrast pair datasets