goodfire-ai/param-decomp

Parameter Decomposition

This repo is for running parameter decomposition on neural networks.

VPD paper (April 2026)

SPD paper (June 2025)

App

This project ships a web app for visualising and interpreting decompositions. You can point it at any decomposed run, including ones we've already trained and stored on wandb (e.g. the canonical goodfire/spd/runs/s-55ea3f9b below). At present, viewing a run still requires running the harvest and autointerp post-processing stages yourself — these produce the artifacts the app reads.

make install-app   # Install frontend dependencies (one-time)
make app           # Launch backend + frontend dev servers

See the app's README and CLAUDE.md for details.

Nano Parameter Decomposition

nano_param_decomp/ is a self-contained, single-file implementation of the whole method. It deliberately omits alternative loss/CI/sigmoid types and various logging for brevity.

Installation

From the root of the repository, run one of:

make install-dev  # Install the package, dev requirements, pre-commit hooks
make install      # Install the package only (`pip install -e .`)

Experiments

Run an experiment locally with `pd-local <name>`, or on SLURM with `pd-run --experiments <name>` (adds a git snapshot and a W&B view; also supports `--dp N`, `--cpu`, and `--sweep --n_agents N`). The two main language-model decompositions:

Other registered experiments (TMS, ResidualMLP, induction heads, GPT-2 / TinyStories variants) are listed in `param_decomp/registry.py`. The `lm` experiment can decompose any HuggingFace-loadable model whose target modules are `nn.Linear`, `nn.Embedding`, or `transformers.modeling_utils.Conv1D`.
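As a sketch, the launch commands compose like this (the experiment name `lm` is the registered one mentioned above; the flags are those documented in this section):

```shell
pd-local lm                                   # run the lm experiment locally
pd-run --experiments lm                       # same on SLURM, with git snapshot + W&B view
pd-run --experiments lm --dp 4                # 4-way data parallelism
pd-run --experiments lm --cpu                 # CPU-only run
pd-run --experiments lm --sweep --n_agents 8  # hyperparameter sweep with 8 agents
```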

Post-Processing Pipeline

After a decomposition has finished training, post-processing produces the artifacts the app reads: component statistics, autointerp labels, dataset attributions, and graph-context interpretations. Each stage is a separate CLI; `pd-postprocess` runs them all under one SLURM dependency graph from a single config:

pd-postprocess param_decomp/postprocess/pile.yaml
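For orientation, a post-processing config might look roughly like the sketch below. Every field name here is a hypothetical illustration, not the real schema; consult `param_decomp/postprocess/pile.yaml` in the repo for the actual format.

```yaml
# Hypothetical sketch only — field names are illustrative, not the real schema.
run: goodfire/spd/runs/s-55ea3f9b   # wandb run path from this README
stages:
  harvest:
    batch_size: 256                 # README default for harvest
  autointerp: {}                    # needs OPENROUTER_API_KEY in the environment
  attributions:
    batch_size: 256                 # README default for attributions
  graph_interp: {}
  clustering: {}
```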

The individual stages, with links to their docs:

  • Harvest (`pd-harvest`) — collect activation examples, correlations, and token statistics for each component.
  • Autointerp (`pd-autointerp`) — generate LLM interpretations of components from harvested examples. Requires OPENROUTER_API_KEY.
  • Dataset attributions (`pd-attributions`) — compute component-to-component attribution strengths over the training distribution.
  • Graph interpretation (`pd-graph-interp`) — context-aware component labels that combine attributions and correlations.
  • Clustering (`pd-clustering`) — ensemble clustering of components.

Default batch sizes (256 for harvest and attributions) work for models like `pile_llama_simple_mlp-4L`; tune via `--batch_size` / `--n_gpus` per stage.
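If a single stage needs re-running with different resources, the per-stage CLIs accept the tuning flags mentioned above. Note the positional config argument shown here is an assumption modelled on the `pd-postprocess` invocation; check each stage's docs for its actual arguments.

```shell
# Assumption: stage CLIs take the same YAML config as pd-postprocess.
pd-harvest param_decomp/postprocess/pile.yaml --batch_size 128 --n_gpus 2
```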

Development

Suggested VSCode/Cursor settings live in .vscode/. Copy .vscode/settings-example.json to .vscode/settings.json to use them. We are unlikely to be able to act on new feature requests, but issue reports are greatly appreciated!

Useful make targets:

make check     # Run pre-commit on all files (basedpyright, ruff lint, ruff format)
make type      # basedpyright only
make format    # ruff lint + format
make test      # Tests not marked `slow`
make test-all  # All tests
