coder-interp-tap

A Crucible tap for coder-model interpretability: TopK Sparse Autoencoder training, NLA-style activation verbalizers, and feature-diff metrics for measuring how fine-tuning changes a model's internal features.

The pipeline runs on RunPod via Crucible, logs to W&B, and is deliberately scoped to be "usable, not perfect": a single-GPU budget, off-the-shelf base models, and off-the-shelf pre-trained SAEs where available.
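The TopK Sparse Autoencoder mentioned above encodes an activation vector into a wide feature space and keeps only the k largest feature pre-activations, which caps the number of active features (L0) at k. The sketch below shows the commonly used TopK formulation, `z = ReLU(TopK(W_enc (x - b_dec) + b_enc))` and `x_hat = W_dec z + b_dec`, in pure Python on toy sizes. It is an illustration under that assumption only: it is not the tap's planned TopK SAE wrapper, not the SAELens or Sparsify API, and not necessarily the exact parametrization of the Qwen-Scope SAE. All function names are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def topk_encode(x: Sequence[float], w_enc: Matrix, b_enc: Sequence[float],
                b_dec: Sequence[float], k: int) -> Vector:
    """Hypothetical TopK SAE encoder.

    w_enc has shape (n_features x d_model). Only the k largest pre-activations
    survive; a ReLU is applied to them, so fewer than k features can be active
    when some of the top-k pre-activations are negative.
    """
    # Subtract the decoder bias before encoding (a common TopK SAE convention).
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre = [p + b for p, b in zip(matvec(w_enc, centered), b_enc)]
    # Indices of the k largest pre-activations.
    top = sorted(range(len(pre)), key=lambda i: pre[i], reverse=True)[:k]
    z = [0.0] * len(pre)
    for i in top:
        z[i] = max(pre[i], 0.0)
    return z


def decode(z: Sequence[float], w_dec: Matrix, b_dec: Sequence[float]) -> Vector:
    """Reconstruct x_hat = W_dec z + b_dec, with w_dec of shape (d_model x n_features)."""
    return [r + b for r, b in zip(matvec(w_dec, z), b_dec)]
```

For example, with `d_model = 3`, four features, and `k = 2`, an input whose pre-activations are `[1, 2, 3, 6]` keeps only features 2 and 3. Real SAEs at width 32K would use tensor libraries rather than Python lists; the list version is only meant to make the TopK selection explicit.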

Tap layout

| Directory | Count | What |
|---|---|---|
| `projects/` | 2 | Project YAMLs: `nla_qwen3_5_2b_pilot.yaml`, `nla_qwen2_5_coder_1_5b.yaml`. |
| `architectures/` | 0 | SAE / NLA model code (planned: TopK SAE wrapper, AV+AR pair). |
| `callbacks/` | 1 | `wandb_periodic_validation` skeleton: periodic evaluation logged to W&B every N steps. |
| `data_adapters/` | 0 | Activation capture pipelines (planned: residual-stream HDF5 dumper). |
| `evaluation/` | 0 | Feature-diff metrics (planned: cosine drift, BERTScore on AV descriptions, top-k Jaccard, KL on firing). |
| `launchers/` | 6 | Stub launchers per project variant. Smoke variants run end-to-end and validate pod plumbing; the pilot/training variants are scaffolds for follow-up implementation. |
| `findings/` | 0 | Documented experiment findings, populated as runs complete. |
| `examples/` | 0 | Example notebooks / scripts. |
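The `wandb_periodic_validation` callback is described only as "periodic eval to W&B every N steps". The sketch below illustrates that step-gating logic under stated assumptions: the class name, the `on_step_end` hook, and the injected `eval_fn` / `log_fn` callables are all hypothetical and are not Crucible's callback interface. Injecting `log_fn` keeps the sketch free of a W&B dependency; in practice it could wrap `wandb.log(metrics, step=step)`.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class PeriodicValidation:
    """Hypothetical callback: run eval_fn every `every_n_steps` and hand the metrics to log_fn."""

    every_n_steps: int
    eval_fn: Callable[[], Dict[str, float]]
    log_fn: Callable[[Dict[str, float], int], None]

    def __post_init__(self) -> None:
        if self.every_n_steps <= 0:
            raise ValueError("every_n_steps must be a positive integer")

    def on_step_end(self, step: int) -> None:
        # Skip step 0 and only evaluate on multiples of every_n_steps.
        if step > 0 and step % self.every_n_steps == 0:
            metrics = self.eval_fn()
            self.log_fn(metrics, step)
```

Called once per training step with `every_n_steps=5`, this evaluates at steps 5, 10, 15, and so on, which bounds evaluation cost independently of the training length.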

Quick start

crucible tap add https://github.com/eren23/coder-interp-tap
crucible tap sync coder-interp-tap
crucible run_project nla_qwen3_5_2b_pilot --variant smoke

What this tap is for

Two concrete pipelines, both runnable as Crucible projects:

Phase 1: pilot on Qwen3.5-2B base with the Qwen-Scope pre-trained TopK SAE. Validates the activation-dump → feature-extract → AV-train → quality-probe pipeline using an off-the-shelf SAE (Qwen/SAE-Res-Qwen3.5-2B-Base-W32K-L0_50: width 32K, L0 = 50, covering all 24 layers). Cheap: roughly $5 plus Claude API costs for synthetic feature descriptions.
Phase 2/3: port to Qwen2.5-Coder-1.5B plus a LoRA delta study. Trains a fresh TopK SAE on the coder-tuned model (no pre-trained coder SAE exists), trains an AV verbalizer on those features, then runs a LoRA delta study using the 5 feature-diff metrics. Roughly $20 of GPU time plus Claude API costs.
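To make the LoRA delta study concrete, the sketch below implements three of the planned feature-diff metrics (cosine drift, top-k Jaccard, and KL on firing rates) in pure Python. The definitions are assumptions, not the tap's `evaluation/` code: cosine drift is taken as `1 - cos(u, v)` between a feature's vectors from the base and fine-tuned runs (what those vectors are, e.g. decoder directions or mean activation profiles, is left open), top-k Jaccard compares the sets of the k most active feature indices, and the KL term compares normalized firing-rate distributions with a small smoothing constant. BERTScore on AV descriptions is omitted because it needs a scoring model. All names are hypothetical.

```python
import math
from typing import Sequence, Set


def cosine_drift(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity between a feature's base and fine-tuned vectors."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        raise ValueError("cosine drift is undefined for zero vectors")
    return 1.0 - sum(a * b for a, b in zip(u, v)) / (nu * nv)


def _topk_indices(acts: Sequence[float], k: int) -> Set[int]:
    return set(sorted(range(len(acts)), key=lambda i: acts[i], reverse=True)[:k])


def topk_jaccard(acts_base: Sequence[float], acts_tuned: Sequence[float], k: int) -> float:
    """Jaccard overlap between the k most active feature indices of each model."""
    a, b = _topk_indices(acts_base, k), _topk_indices(acts_tuned, k)
    return len(a & b) / len(a | b)


def firing_kl(rates_p: Sequence[float], rates_q: Sequence[float], eps: float = 1e-8) -> float:
    """KL(P || Q) between feature firing-rate distributions, smoothed by eps."""
    p = [r + eps for r in rates_p]
    q = [r + eps for r in rates_q]
    sp, sq = sum(p), sum(q)
    return sum((pi / sp) * math.log((pi / sp) / (qi / sq)) for pi, qi in zip(p, q))
```

Identical base and fine-tuned inputs give a drift of 0, a Jaccard of 1, and a KL of 0; larger drift and KL, or lower Jaccard, indicate that fine-tuning moved the feature more. Note that KL is asymmetric, so the direction (base vs. tuned) should be fixed across the study.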

See docs/sae-nla-pipeline.md for the full plan.

Why a separate tap

This work uses different base models (Qwen3.5-2B base, Qwen2.5-Coder-1.5B) and different libraries (SAELens or Sparsify, the Anthropic SDK, sentence-transformers) than the existing crucible-community-tap. Keeping it separate avoids bloating that tap with research-specific dependencies.

License

Code is MIT-licensed unless a plugin's `plugin.yaml` says otherwise.
