raphael-goodfire/glp
Learning a Generative Meta-Model of LLM Activations

Grace Luo, Jiahai Feng, Trevor Darrell, Alec Radford, Jacob Steinhardt

This repository contains the PyTorch implementation of the paper "Learning a Generative Meta-Model of LLM Activations". The code walks through our proposed method for training an activation diffusion model and using it for applications such as on-manifold steering and scalar probing. We call this model a GLP, or Generative Latent Prior.

[Project Page] [arXiv]

Compute

🌟 TLDR: Most of the scripts in this README take less than 24GB of VRAM, so they should fit on an Nvidia RTX 4090.

We want everyone to have a chance to try our models out, even in this economy. All of our released GLPs were trained on a billion FineWeb activations using two Nvidia A100 80GB GPUs (one for activation caching and the other for training), but with some ingenuity you can probably make it work on smaller GPUs too.
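For a sense of scale, the training set's footprint can be estimated with back-of-envelope arithmetic (this assumes Llama8B's hidden size of 4096 and fp16 storage at 2 bytes per value; the actual on-disk dtype and layout may differ):

```python
# Rough size of 1B cached Llama8B activations.
# Assumptions (not from the repo): d_model = 4096, fp16 (2 bytes/value).
n_activations = 1_000_000_000
d_model = 4096
bytes_per_value = 2  # fp16

total_bytes = n_activations * d_model * bytes_per_value
print(f"{total_bytes / 1e12:.1f} TB")  # roughly 8.2 TB
```

At several terabytes, this helps explain the producer-consumer design, where one GPU caches activations while the other consumes them for training, rather than fully materializing the dataset first.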

Setup

This code was tested with Python 3.11. To set up the environment, please run:

```shell
conda env create -f environment.yaml
conda activate glp
pip install vllm==0.9.2
pip install transformers==4.47.0
pip install -e .
```

Run the installation in the exact order above and ignore any pip dependency warnings. This was the only configuration we found in which vllm, nnsight, and transformers work together.

Pre-Trained Weights

You can view all the weights on our HuggingFace page.

🌟 TLDR: For a quickstart, run

```python
from glp.denoiser import load_glp

model = load_glp("generative-latent-prior/glp-llama8b-d6", device="cuda:0", checkpoint="final")
```

This grabs our main GLP trained on Llama8B-Base activations.

  • Llama8B: Link
  • glp-llama8b-d6: Link

If you're interested in diving deeper and studying scaling behavior, we also provide Llama1B-Base GLPs and all intermediate checkpoints.

  • Llama1B: Link
  • glp-llama1b-d3: Link
  • glp-llama1b-d6: Link
  • glp-llama1b-d12: Link
  • glp-llama1b-d24: Link
  • glp-llama1b-d12-multi: Link

Unless otherwise specified, GLPs are trained on the middlemost layer (Layer 15 for Llama8B, Layer 07 for Llama1B). We also provide a multi-layer GLP trained on all Layers 00-15 of Llama1B, called glp-llama1b-d12-multi. You can also directly transfer these GLPs, which were trained on Base models, onto Instruct models, as shown in the paper.
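The two quoted indices are consistent with zero-indexed layers just below the midpoint. As a sketch, the following hypothetical helper reproduces both numbers (this formula is our inference from the README's examples, not a function in the glp codebase):

```python
def middle_layer(num_layers: int) -> int:
    """Zero-indexed 'middlemost' layer matching the README's examples.

    Hypothetical helper: inferred from the two quoted layer indices,
    not part of the glp codebase.
    """
    return num_layers // 2 - 1

print(middle_layer(32))  # Llama8B has 32 layers -> Layer 15
print(middle_layer(16))  # Llama1B has 16 layers -> Layer 07
```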

Note: Each intermediate checkpoint is labeled by "epoch," which corresponds to 1M activations. This means epoch_1024 was trained on 1024M ≈ 1B activations (and final is the same as epoch_1024). We use the term "epoch" loosely; in reality we stream data without repetition (so no activation is seen twice).
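The checkpoint naming can be decoded mechanically. A minimal sketch (the helper name is ours, not an API in this repo):

```python
def checkpoint_to_activations(label: str) -> int:
    # "epoch_N" means the checkpoint was trained on N million activations;
    # "final" is an alias for epoch_1024 (1024M, i.e. ~1B activations).
    if label == "final":
        label = "epoch_1024"
    return int(label.removeprefix("epoch_")) * 1_000_000

print(checkpoint_to_activations("epoch_1024"))  # 1024000000 (~1B)
print(checkpoint_to_activations("final"))       # same checkpoint
```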

Demo

🌟 TLDR: For a quickstart, walk through our demo notebook at glp_demo.ipynb.

In the demo, we'll walk through loading a GLP, generating activations, then using it for on-manifold steering.

Applications

  • Scalar 1-D Probing: Evaluate on the 113 binary classification datasets from Kantamneni et al., 2025, by running python3 glp/script_probe.py.
  • On-Manifold Steering: Post-process Persona Vectors by following the instructions at integrations/persona_vectors/README.md.

Note: In the paper, we use the variable t to denote the timestep. In the codebase, we follow the diffusers scheduler convention and use u = 1 - t instead.
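Since u = 1 - t is an involution, converting between the paper's convention and the code's is the same operation in both directions. A sketch (function names are ours, for illustration only):

```python
def paper_to_code(t: float) -> float:
    # Paper timestep t -> diffusers-style u = 1 - t used in the codebase.
    return 1.0 - t

def code_to_paper(u: float) -> float:
    # The inverse is the same map: t = 1 - u.
    return 1.0 - u

print(paper_to_code(0.25))                 # 0.75
print(code_to_paper(paper_to_code(0.25)))  # round-trips back to 0.25
```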

Training

🌟 TLDR: For a quickstart, train a toy Llama1B GLP in a few minutes.

```shell
# download data
huggingface-cli download generative-latent-prior/llama1b-layer07-fineweb-1M \
    --repo-type dataset \
    --local-dir data/llama1b-layer07-fineweb-1M \
    --local-dir-use-symlinks False

# launch training
conda activate glp
python3 glp_train.py config=configs/train_llama1b_static.yaml
```

Currently training is pre-set to a small static sanity dataset with 1M activations, representing the first 1M activations of the full dynamic dataset. Even on this small dataset, you should see a beautiful loss curve that just goes down. You can also download the Llama8B sanity dataset. Training on the full one billion activations takes 5.6 days for the Llama8B GLP.
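Plugging the quoted numbers together gives a rough implied throughput for the full Llama8B run (a back-of-envelope estimate, not a measured benchmark; the toy Llama1B run above should be faster still):

```python
# Implied training throughput from the README's quoted numbers:
# 1B activations in 5.6 days of wall-clock time for the Llama8B GLP.
total_activations = 1_000_000_000
wall_clock_s = 5.6 * 86_400  # 5.6 days in seconds

rate = total_activations / wall_clock_s
print(f"~{rate:,.0f} activations/s")
# At that rate, one pass over the 1M-activation sanity dataset:
print(f"~{1_000_000 / rate / 60:.1f} min")
```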

Roadmap

Currently this codebase is in its initial release. All features marked as complete below are stable and ready to use. The others are still in progress.

  • Release pre-trained GLP weights
  • Release training code at glp_train.py
  • Release Persona Vectors steering at integrations/persona_vectors
  • Release 1-D probing at glp/script_probe.py
  • Release dynamic producer-consumer data pipeline at glp_save.py

Citing

```bibtex
@article{luo2026glp,
  title={Learning a Generative Meta-Model of LLM Activations},
  author={Grace Luo and Jiahai Feng and Trevor Darrell and Alec Radford and Jacob Steinhardt},
  journal={arXiv preprint arXiv:2602.06964},
  year={2026}
}
```
