Experiments with representation engineering. There's been a bunch of recent work (1, 2, 3) on using a neural network's latent representations to control & interpret models.
This repository contains utilities for running experiments (the `repeng` package) and a bunch of experiments (the notebooks in `experiments`).
git clone https://github.com/mishajw/repeng
cd repeng
pip install -e .
# Or if using poetry:
poetry install
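To confirm that the editable install worked, a quick check like the one below can help. It is a minimal sketch that assumes the package is importable under the name `repeng` (the package name given above); the helper `is_installed` is a hypothetical name for illustration, not part of the repository.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module with this name can be found."""
    # find_spec returns None when a top-level module is not importable.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumption: the package installs under the import name "repeng".
    if is_installed("repeng"):
        print("repeng is importable")
    else:
        print("repeng not found; re-run the install step in this environment")
```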
- Install the repository, as described above.
- Optional: Check out `c99e9aa`. This shouldn't be necessary, unless I introduce breaking changes.
- Create a dataset of activations: `python experiments/comparison_dataset.py`.
  - This will upload the experiments to S3. Some tinkering may be required to change the upload location - sorry about that!
- Run the analysis: `python experiments/comparison.py`.
  - This will write plots to `./output/comparison`.
This is split into two scripts as only the first requires a GPU for LLM inference.
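The two-step flow above can be wrapped in a small driver, which may be handy when the steps run on the same machine. This is only a sketch: `STEPS`, `build_commands`, and `run_all` are hypothetical names that do not exist in the repository, and it assumes it is run from the repository root so the relative script paths resolve. It defaults to a dry run that only prints the commands.

```python
import subprocess
import sys

# The two scripts named in the steps above, in the order they must run.
# Paths are relative to the repository root (an assumption of this sketch).
STEPS = [
    ("create activation dataset (needs a GPU)", "experiments/comparison_dataset.py"),
    ("run analysis and write plots", "experiments/comparison.py"),
]


def build_commands(python=sys.executable):
    """Return one argv list per step, using the given Python interpreter."""
    return [[python, script] for _, script in STEPS]


def run_all(dry_run=True):
    """Print each command; execute it unless dry_run is set."""
    for (label, _), argv in zip(STEPS, build_commands()):
        print(f"{label}: {' '.join(argv)}")
        if not dry_run:
            # check=True raises on a non-zero exit code, so the run
            # stops at the first failing step.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

Because only the first script needs a GPU, the two commands can just as well be run separately on different machines; the driver is a convenience for the single-machine case.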