Representation Engineering

Experiments with representation engineering. There's been a bunch of recent work (1, 2, 3) on using a neural network's latent representations to control and interpret models.

This repository contains utilities for running experiments (the repeng package) and a bunch of experiments (the notebooks in experiments).

Installation

git clone https://github.com/mishajw/repeng
cd repeng
pip install -e .
# Or if using poetry:
poetry install
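
After either install command above, a quick sanity check is to confirm that Python can locate the package. This is a minimal sketch, not part of the repository: it assumes the import name is repeng (matching the package directory), and is_importable is a hypothetical helper that only uses the standard library.

```python
import importlib.util


def is_importable(module_name):
    """Return True if Python can locate the module, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "repeng" is assumed to be the import name, matching the package directory.
    print("repeng importable:", is_importable("repeng"))
```

If this prints False, the install most likely went into a different environment from the one running the check.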

Reproducing experiments

How well do truth probes generalise?

Report.

1. Install the repository, as described above.
2. Optional: Check out c99e9aa. This shouldn't be necessary, unless I introduce breaking changes.
3. Create a dataset of activations: python experiments/comparison_dataset.py.
   - This will upload the experiments to S3. Some tinkering may be required to change the upload location - sorry about that!
4. Run the analysis: python experiments/comparison.py.
   - This will write plots to ./output/comparison.

This is split into two scripts as only the first requires a GPU for LLM inference.
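
The two-script split above can be sketched as a small driver that runs the steps in order. This is an illustrative sketch, not something shipped in the repository: the script paths come from the steps above, while STEPS, build_commands, run_pipeline, and the assumption that it is run from the repository root are all hypothetical. It defaults to a dry run so nothing is executed or uploaded by accident.

```python
import subprocess
import sys

# Script paths are taken from the steps above; running them from the
# repository root is an assumption of this sketch.
STEPS = [
    ("create activations dataset (needs a GPU)", "experiments/comparison_dataset.py"),
    ("run analysis (no GPU needed)", "experiments/comparison.py"),
]


def build_commands(python=sys.executable):
    """Return one argv list per step, in the order they must run."""
    return [[python, script] for _, script in STEPS]


def run_pipeline(dry_run=True):
    """Print each step; execute it only when dry_run is False."""
    for (label, _), argv in zip(STEPS, build_commands()):
        print(f"{label}: {' '.join(argv)}")
        if not dry_run:
            # check=True stops the pipeline if a step exits non-zero.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

Because the second step does not need a GPU, it could also be run on a separate machine once the first step has finished.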
