This repository trains and analyzes sparse autoencoders (SAEs) on Pythia-70M, layer 3 residual stream activations. The default research path is now script-first:
- Shared code lives in `mechint/`
- Reproducible entrypoints live in `scripts/`
- New experiment artifacts live under `runs/`
- Shared activation caches stay in `activations/`
- Legacy checkpoints in `saved_models/` and CSVs in `training_metrics/` remain readable but are no longer the primary output layout
The canonical SAE implementation uses separate encoder and decoder weights with decoder-row renormalization after each optimizer step.
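The canonical implementation lives in the package; the following is a minimal PyTorch sketch of the design described above (untied encoder/decoder weights, per-feature decoder renormalization). The class and method names here are illustrative, not the repo's actual API:

```python
import torch
import torch.nn as nn


class SparseAutoEncoder(nn.Module):
    """Sketch of an SAE with separate (untied) encoder and decoder weights.

    `d_in` is the residual-stream width (512 for Pythia-70M) and the hidden
    width is `expansion * d_in`.
    """

    def __init__(self, d_in: int, expansion: int = 8):
        super().__init__()
        self.encoder = nn.Linear(d_in, expansion * d_in)
        self.decoder = nn.Linear(expansion * d_in, d_in)

    def forward(self, x: torch.Tensor):
        f = torch.relu(self.encoder(x))  # sparse feature activations
        x_hat = self.decoder(f)          # reconstruction
        return x_hat, f

    @torch.no_grad()
    def renormalize_decoder(self):
        # Constrain each feature's decoder direction (one column of the
        # weight matrix) to unit norm, so the L1 sparsity penalty on the
        # feature activations stays meaningful.
        w = self.decoder.weight  # shape (d_in, d_hidden)
        w.div_(w.norm(dim=0, keepdim=True).clamp_min(1e-8))
```

In a training loop, `renormalize_decoder()` would be called once after each `optimizer.step()`, matching the "renormalization after each optimizer step" behavior described above.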
```bash
python scripts/collect_activations.py \
  --model pythia-70m \
  --layer 3 \
  --hook blocks.3.hook_resid_post \
  --dataset openwebtext \
  --num-texts 50000 \
  --output activations/activations_50000.pt
```

```bash
python scripts/train_sae.py \
  --activations-path activations/activations_50000.pt \
  --activation-dim 512 \
  --expansion 8 \
  --lam 3 \
  --epochs 50000
```

This creates a run directory under `runs/` with:

- `config.json`
- `metrics.csv`
- `checkpoint.pt`
- `summary.json`
- `manifest.json`
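Because these artifacts are plain JSON and CSV, downstream analysis can load them without importing any training code. A minimal sketch (the `load_run` helper and the column names in the example are illustrative, not part of the repo):

```python
import csv
import json
from pathlib import Path


def load_run(run_dir):
    """Load the JSON/CSV artifacts from a run directory.

    Returns (config, metrics, summary): config and summary as dicts,
    metrics as a list of per-row dicts with string-valued fields.
    """
    run_dir = Path(run_dir)
    config = json.loads((run_dir / "config.json").read_text())
    summary = json.loads((run_dir / "summary.json").read_text())
    with open(run_dir / "metrics.csv", newline="") as fh:
        metrics = list(csv.DictReader(fh))
    return config, metrics, summary
```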
```bash
python scripts/eval_sae.py --run-dir runs/<run-name>
```

```bash
python scripts/batch_ablate.py \
  --checkpoint runs/<run-name>/checkpoint.pt \
  --config runs/<run-name>/config.json \
  --eval-texts-path eval_texts.txt \
  --output runs/<run-name>/ablation.csv
```

```bash
python scripts/compare_expansions.py \
  --checkpoints saved_models/sae_model_4x.pt saved_models/sae_model_8x.pt \
  --activation-dim 512 \
  --output runs/comparisons/4x_vs_8x.csv
```

The notebooks remain for exploration, but core logic should move through the package and scripts:
- `inspect-features.ipynb`: canonical inspection notebook
- `ablation.ipynb`: canonical ablation notebook
- `notebooks/archive/`: archived scratch and duplicate notebooks
Key reusable functions now live in:

- `mechint.analysis`
- `mechint.ablation`
- `mechint.eval`
- `mechint.data`
- `python train.py ...` still works and now forwards to `scripts/train_sae.py`
- `from sae import SparseAutoEncoder` still works and now re-exports the canonical implementation from `mechint.sae`
- Existing legacy checkpoints in `saved_models/` can be evaluated by the new scripts
Run:

```bash
python check_paths.py
python -m unittest tests.test_pipeline
```

- The default training path uses a deterministic held-out validation split from the cached activation tensor.
- Training streams random batches from CPU-backed activations instead of moving the full cache onto the accelerator.
- The repo also contains an `autoresearch/` subtree, but the default manual research pipeline is intentionally separate and should be stabilized first.
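The validation-split and batch-streaming behaviors described above can be sketched as follows. Function names, the split fraction, and the seed are illustrative, not the repo's actual defaults:

```python
import torch


def split_activations(acts: torch.Tensor, val_frac: float = 0.1, seed: int = 0):
    """Deterministic held-out split: the permutation is fixed by the seed,
    so the same validation rows are held out on every run."""
    g = torch.Generator().manual_seed(seed)
    perm = torch.randperm(acts.shape[0], generator=g)
    n_val = int(val_frac * acts.shape[0])
    return acts[perm[n_val:]], acts[perm[:n_val]]


def random_batches(acts: torch.Tensor, batch_size: int, device: str = "cpu"):
    """Yield random batches indexed from the CPU-resident cache; only each
    sampled batch is moved to the accelerator, never the full tensor."""
    while True:
        idx = torch.randint(0, acts.shape[0], (batch_size,))
        yield acts[idx].to(device, non_blocking=True)
```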