Grace Luo, Jiahai Feng, Trevor Darrell, Alec Radford, Jacob Steinhardt
This repository contains the PyTorch implementation of the paper "Learning a Generative Meta-Model of LLM Activations". The code walks through our proposed method for training an activation diffusion model and using it for applications such as on-manifold steering and scalar probing. We call this model a GLP, or Generative Latent Prior.
🌟 TLDR: Most of the scripts in this README take less than 24GB of VRAM, so they should fit on an Nvidia RTX 4090.
We want everyone to have a chance to try our models out, even in this economy. All of our released GLPs were trained on a billion FineWeb activations using two Nvidia A100 80GB GPUs (one for activation caching and the other for training), but with some ingenuity you can probably make it work on smaller GPUs too.
This code was tested with Python 3.11. To set up the environment, please run:
conda env create -f environment.yaml
conda activate glp
pip install vllm==0.9.2
pip install transformers==4.47.0
pip install -e .
Install in the exact order above and ignore any pip dependency warnings. We used this exact setup, which was the only way we could get vllm, nnsight, and transformers to work together.
You can view all the weights on our HuggingFace page.
🌟 TLDR: For a quickstart, run
from glp.denoiser import load_glp
model = load_glp("generative-latent-prior/glp-llama8b-d6", device="cuda:0", checkpoint="final")
This grabs our main GLP trained on Llama8B-Base activations.
| Llama8B | Link |
|---|---|
| glp-llama8b-d6 | Link |
If you're interested in diving deeper and studying scaling behavior, we also provide Llama1B-Base GLPs and all intermediate checkpoints.
| Llama1B | Link |
|---|---|
| glp-llama1b-d3 | Link |
| glp-llama1b-d6 | Link |
| glp-llama1b-d12 | Link |
| glp-llama1b-d24 | Link |
| glp-llama1b-d12-multi | Link |
Unless otherwise specified, GLPs are trained on the middlemost layer (Layer 15 for Llama8B, Layer 07 for Llama1B). We also provide a multi-layer GLP trained on all Layers 00-15 of Llama1B, called glp-llama1b-d12-multi. You can also directly transfer these GLPs, which were trained on Base models, onto Instruct models, as shown in the paper.
Note: Each intermediate checkpoint is labeled by "epoch," which corresponds to 1M activations. This means epoch_1024 was trained on 1024M ≈ 1B activations (and final is the same as epoch_1024).
We use the term "epoch" loosely; in reality we stream data without repetition (so no activation is seen twice).
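The epoch-to-activation-count arithmetic above can be sketched as follows (a minimal illustration; the helper name is ours and not part of the codebase):

```python
# Map an intermediate checkpoint's epoch label to the number of training
# activations it has seen: 1 "epoch" = 1M activations, per the note above.
# This helper is illustrative only, not part of the GLP codebase.
def epoch_to_activations(checkpoint: str) -> int:
    n = int(checkpoint.removeprefix("epoch_"))
    return n * 1_000_000

print(epoch_to_activations("epoch_1024"))  # 1024000000, i.e. ~1B activations
```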
🌟 TLDR: For a quickstart, walk through our demo notebook at glp_demo.ipynb.
In the demo, we'll walk through loading a GLP, generating activations, then using it for on-manifold steering.
- Scalar 1-D Probing: Evaluate on the 113 binary classification datasets from Kantamneni et al., 2025, by running `python3 glp/script_probe.py`.
- On-Manifold Steering: Post-process Persona Vectors by following the instructions at `integrations/persona_vectors/README.md`.
Note: In the paper, we use the variable t to denote the timestep. In the codebase, we follow the diffusers scheduler convention and use u = 1 - t instead.
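The two conventions are related by a simple reflection, which can be sketched as (the helper names here are ours, not part of the codebase):

```python
# Convert between the paper's timestep t and the diffusers-style u = 1 - t.
# These helpers are illustrative only, not part of the GLP codebase.
def t_to_u(t: float) -> float:
    return 1.0 - t

def u_to_t(u: float) -> float:
    return 1.0 - u

# t = 0.25 in the paper's notation corresponds to u = 0.75 in the codebase.
print(t_to_u(0.25))  # 0.75
```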
🌟 TLDR: For a quickstart, train a toy Llama1B GLP in a few minutes.
# download data
huggingface-cli download generative-latent-prior/llama1b-layer07-fineweb-1M \
--repo-type dataset \
--local-dir data/llama1b-layer07-fineweb-1M \
--local-dir-use-symlinks False
# launch training
conda activate glp
python3 glp_train.py config=configs/train_llama1b_static.yaml
Training is currently preset to a small static sanity-check dataset of 1M activations, representing the first 1M activations of the full dynamic dataset. Even on this small dataset, you should see a beautiful loss curve that just goes down. You can also download the Llama8B sanity dataset. Training on the full one billion activations takes 5.6 days for the Llama8B GLP.
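As a back-of-envelope check (our arithmetic, using only the figures quoted above), the full Llama8B run implies a throughput of roughly two thousand activations per second:

```python
# Implied training throughput for the Llama8B GLP:
# 1B activations over 5.6 days (figures from the text above).
activations = 1_000_000_000
seconds = 5.6 * 24 * 3600  # 483,840 seconds
print(round(activations / seconds))  # 2067 activations/sec
```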
Currently this codebase is in its initial release. All features marked as complete below are stable and ready to use. The others are still in progress.
- Release pre-trained GLP weights
- Release training code at `glp_train.py`
- Release Persona Vectors steering at `integrations/persona_vectors`
- Release 1-D probing at `glp/script_probe.py`
- Release dynamic producer-consumer data pipeline at `glp_save.py`
@article{luo2026glp,
title={Learning a Generative Meta-Model of LLM Activations},
author={Grace Luo and Jiahai Feng and Trevor Darrell and Alec Radford and Jacob Steinhardt},
journal={arXiv preprint arXiv:2602.06964},
year={2026}
}