raphael-goodfire/glp
Learning a Generative Meta-Model of LLM Activations

Grace Luo, Jiahai Feng, Trevor Darrell, Alec Radford, Jacob Steinhardt

This repository contains the PyTorch implementation of the paper "Learning a Generative Meta-Model of LLM Activations". The code walks through our proposed method for training an activation diffusion model and using it for applications such as on-manifold steering and scalar probing. We call this model a GLP, or Generative Latent Prior.

[Project Page] [arXiv]

Compute

🌟 TLDR: Most of the scripts in this README take less than 24GB of VRAM, so they should fit on an Nvidia RTX 4090.

We want everyone to have a chance to try our models out, even in this economy. All of our released GLPs were trained on a billion FineWeb activations using two Nvidia A100 80GB GPUs (one for activation caching and the other for training), but with some ingenuity you can probably make it work on smaller GPUs too.
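For a sense of scale, the training set's footprint can be estimated with back-of-envelope arithmetic (this assumes Llama8B's hidden size of 4096 and fp16 storage at 2 bytes per value; the actual on-disk dtype and layout may differ):

```python
# Rough size of 1B cached Llama8B activations.
# Assumptions (not from the repo): d_model = 4096, fp16 (2 bytes/value).
n_activations = 1_000_000_000
d_model = 4096
bytes_per_value = 2  # fp16

total_bytes = n_activations * d_model * bytes_per_value
print(f"{total_bytes / 1e12:.1f} TB")  # roughly 8.2 TB
```

At several terabytes, this helps explain the producer-consumer design, where one GPU caches activations while the other consumes them for training, rather than fully materializing the dataset first.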

Setup

This code was tested with Python 3.11. To set up the environment, please run:

```shell
conda env create -f environment.yaml
conda activate glp
pip install vllm==0.9.2
pip install transformers==4.47.0
pip install -e .
```

Run the installation in the exact order above and ignore any pip dependency warnings. This was the only configuration we found in which vllm, nnsight, and transformers work together.

Pre-Trained Weights

You can view all the weights on our HuggingFace page.

🌟 TLDR: For a quickstart, run

```python
from glp.denoiser import load_glp

model = load_glp("generative-latent-prior/glp-llama8b-d6", device="cuda:0", checkpoint="final")
```

This grabs our main GLP trained on Llama8B-Base activations.

  • Llama8B: Link
  • glp-llama8b-d6: Link

If you're interested in diving deeper and studying scaling behavior, we also provide Llama1B-Base GLPs and all intermediate checkpoints.

  • Llama1B: Link
  • glp-llama1b-d3: Link
  • glp-llama1b-d6: Link
  • glp-llama1b-d12: Link
  • glp-llama1b-d24: Link
  • glp-llama1b-d12-multi: Link

Unless otherwise specified, GLPs are trained on the middlemost layer (Layer 15 for Llama8B, Layer 07 for Llama1B). We also provide a multi-layer GLP trained on all Layers 00-15 of Llama1B, called glp-llama1b-d12-multi. You can also directly transfer these GLPs, which were trained on Base models, onto Instruct models, as shown in the paper.
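The two quoted indices are consistent with zero-indexed layers just below the midpoint. As a sketch, the following hypothetical helper reproduces both numbers (this formula is our inference from the README's examples, not a function in the glp codebase):

```python
def middle_layer(num_layers: int) -> int:
    """Zero-indexed 'middlemost' layer matching the README's examples.

    Hypothetical helper: inferred from the two quoted layer indices,
    not part of the glp codebase.
    """
    return num_layers // 2 - 1

print(middle_layer(32))  # Llama8B has 32 layers -> Layer 15
print(middle_layer(16))  # Llama1B has 16 layers -> Layer 07
```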

Note: Each intermediate checkpoint is labeled by "epoch," which corresponds to 1M activations. This means epoch_1024 was trained on 1024M ≈ 1B activations (and final is the same as epoch_1024). We use the term "epoch" loosely; in reality we stream data without repetition (so no activation is seen twice).
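The checkpoint naming can be decoded mechanically. A minimal sketch (the helper name is ours, not an API in this repo):

```python
def checkpoint_to_activations(label: str) -> int:
    # "epoch_N" means the checkpoint was trained on N million activations;
    # "final" is an alias for epoch_1024 (1024M, i.e. ~1B activations).
    if label == "final":
        label = "epoch_1024"
    return int(label.removeprefix("epoch_")) * 1_000_000

print(checkpoint_to_activations("epoch_1024"))  # 1024000000 (~1B)
print(checkpoint_to_activations("final"))       # same checkpoint
```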

Demo

🌟 TLDR: For a quickstart, walk through our demo notebook at glp_demo.ipynb.

In the demo, we'll walk through loading a GLP, generating activations, then using it for on-manifold steering.

Applications

  • Scalar 1-D Probing: Evaluate on the 113 binary classification datasets from Kantamneni et al., 2025, by running python3 glp/script_probe.py.
  • On-Manifold Steering: Post-process Persona Vectors by following the instructions at integrations/persona_vectors/README.md.

Note: In the paper, we use the variable t to denote the timestep. In the codebase, we follow the diffusers scheduler convention and use u = 1 - t instead.
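Since u = 1 - t is an involution, converting between the paper's convention and the code's is the same operation in both directions. A sketch (function names are ours, for illustration only):

```python
def paper_to_code(t: float) -> float:
    # Paper timestep t -> diffusers-style u = 1 - t used in the codebase.
    return 1.0 - t

def code_to_paper(u: float) -> float:
    # The inverse is the same map: t = 1 - u.
    return 1.0 - u

print(paper_to_code(0.25))                 # 0.75
print(code_to_paper(paper_to_code(0.25)))  # round-trips back to 0.25
```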

Training

🌟 TLDR: For a quickstart, train a toy Llama1B GLP in a few minutes.

```shell
# download data
huggingface-cli download generative-latent-prior/llama1b-layer07-fineweb-1M \
    --repo-type dataset \
    --local-dir data/llama1b-layer07-fineweb-1M \
    --local-dir-use-symlinks False

# launch training
conda activate glp
python3 glp_train.py config=configs/train_llama1b_static.yaml
```

Currently training is pre-set to a small static sanity dataset with 1M activations, representing the first 1M activations of the full dynamic dataset. Even on this small dataset, you should see a beautiful loss curve that just goes down. You can also download the Llama8B sanity dataset. Training on the full one billion activations takes 5.6 days for the Llama8B GLP.
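Plugging the quoted numbers together gives a rough implied throughput for the full Llama8B run (a back-of-envelope estimate, not a measured benchmark; the toy Llama1B run above should be faster still):

```python
# Implied training throughput from the README's quoted numbers:
# 1B activations in 5.6 days of wall-clock time for the Llama8B GLP.
total_activations = 1_000_000_000
wall_clock_s = 5.6 * 86_400  # 5.6 days in seconds

rate = total_activations / wall_clock_s
print(f"~{rate:,.0f} activations/s")
# At that rate, one pass over the 1M-activation sanity dataset:
print(f"~{1_000_000 / rate / 60:.1f} min")
```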

Roadmap

Currently this codebase is in its initial release. All features marked as complete below are stable and ready to use. The others are still in progress.

  • Release pre-trained GLP weights
  • Release training code at glp_train.py
  • Release Persona Vectors steering at integrations/persona_vectors
  • Release 1-D probing at glp/script_probe.py
  • Release dynamic producer-consumer data pipeline at glp_save.py

Citing

```bibtex
@article{luo2026glp,
  title={Learning a Generative Meta-Model of LLM Activations},
  author={Grace Luo and Jiahai Feng and Trevor Darrell and Alec Radford and Jacob Steinhardt},
  journal={arXiv preprint arXiv:2602.06964},
  year={2026}
}
```
