Voice-privacy research infrastructure for AI coding agents.
v0.3.0-dev — Markdown skills + bash binaries. Works in Claude Code, Codex, and Cursor.
vpstack gives AI agents the domain knowledge and workflow automation they need for VoicePrivacy 2026. They stop hallucinating baseline numbers. They know what EER means and which direction is better. They run the right attacker condition. They check reproducibility before you submit.
Auto-activates on voice-anonymization projects. Silent everywhere else.
vpstack has one front door, /vp-talk, with two modes: mode R for research and mode E for engineering. The two paths diverge at the first command and rejoin at /vp-brain and /vp-ship. Pick the path that matches what you're actually doing.
You: /vp-talk
Claude: What kind of work is this?
A) Research — VP2026 benchmark, ablations, paper
B) Engineering — building a system for production
You: A
Claude: 8 forcing questions: open question · threat model · contribution claim ·
baseline · eval scope · failure modes · scope discipline.
Locks ~/.vpstack/projects/my-system/research-plans/2026-04-29-hubert-ablation.md
You: /vp-hypothesis
Claude: [7 questions: what changes, baseline, acceptance criteria, magnitude]
Domain note: Layer 6 is pre-content peak — phonetic + residual speaker.
Layers 7–9 are the phonetic content peak per Pasad et al. (ASRU 2021).
→ ~/.vpstack/projects/my-system/hypotheses/20260429-hubert-l6.md
You: /vp-spike
Claude: $ vpstack-b1 --data_path /data/l6/ --seed 42
$ vpstack-b1 --data_path /data/l9/ --seed 42
$ vpstack-b1 --data_path /data/l12/ --seed 42
| Variant | Files | Output | Config hash |
|---|---|---|---|
| layer-6 | 847 | /data/l6/anon_b1 | d4f662e1b76de2f5 |
| layer-9 | 847 | /data/l9/anon_b1 | a72e8c33f9a14b6e |
| layer-12 | 847 | /data/l12/anon_b1 | b8f1d24e6c8a7f1d |
You: /vp-attack on /data/l6/anon_b1
Claude: $ vpstack-score --anonymized_path /data/l6/anon_b1 \
--condition lazy_informed --backend speechbrain --seed 42
VP2026 Attacker — lazy_informed
EER overall: 38.2% (↑ = more private, 50% = random)
→ Run semi-informed before citing this in a paper
You: /vp-repro-check
Claude: $ vpstack-lock verify checkpoints.lock 3/3 OK
Reproducibility: PASS_STRONG
✓ seed pinned · splits explicit · checkpoints hash-verified · no placeholders
You: /vp-brain top --metric eer
Claude: $ vpstack-brain top --metric eer --limit 5
RANK VALUE EXP_ID METHOD
1 38.20 layer-6-spike-20260429T143012 B1-McAdams
2 35.10 layer-9-spike-20260429T143245 B1-McAdams
3 28.40 layer-12-spike-20260429T143518 B1-McAdams
You: /vp-ship
Claude: Repro PASS_STRONG. Deferred gates: 0.
VERSION: 0.3.0-dev → 0.3.1-dev
PR: github.com/you/vp2026-system/pull/7
You: /vp-talk
You: B (engineering)
Claude: 6 domain questions: domain · audio format · methods · metrics · compliance · scale.
Domain: medical / clinical speech.
Sample rate: 44.1 kHz → resample to 16 kHz required (sox -r 16000)
Compliance: HIPAA → telemetry coerced to off.
Methods: B1 baseline + speaker-selection for production.
Wrote:
~/.vpstack/projects/my-system/domain_config.yaml (read by every skill)
~/.vpstack/projects/my-system/engineering-plan.md (feeds /vp-plan-eng-review)
You: /vp-plan-eng-review
Claude: 18 VP-specific gates against the engineering plan.
P0: license isolation OK · runtime model fetch OK · 16kHz contract OK
P1: three attacker conditions covered · F-F/M-M/Mixed gender split planned
2 deferred gates noted for follow-up.
You: /vp-baseline-compare
Claude: vpbrain says you ran B1 on this data 3 days ago (eer=38.2). Reuse?
You: Yes
→ Loads prior numbers, skips the 5-min B1 re-run.
You: /vp-implement
→ Runs the 14-step dev workflow: license check, repro gates, contract check,
test green-bar, atomic commit. Logs experiment to vpbrain on success.
You: /vp-qa
Claude: Quick tier (~15min):
✓ vp-repro-check PASS_STRONG
✓ lazy_informed smoke EER=37.8%
✓ submission format CSVs valid for Track 1
QA score: 92 / 100
You: /vp-ship
→ Same shape as the research path: version bump, repro gate, PR.
/vp-brain browses the same JSONL store. /vp-repro-check runs the same hash verification. /vp-ship enforces the same gates. Whether you're chasing a paper or shipping a product, the binaries underneath are the same — vpstack-b1, vpstack-score, vpstack-lock, vpstack-brain. No skill pretends to do something it can't.
```
git clone https://github.com/khamidov17/vpstack.git ~/.claude/skills/vpstack
chmod +x ~/.claude/skills/vpstack/bin/*
```

You don't have to pre-install the ML deps. Skills that drive an ML binary (vpstack-b1, vpstack-score, vpstack-wer, vpstack-utmos, vpstack-b2) probe their Python packages first and prompt before installing — pip install --user, so it's sandboxed to your home directory, no sudo. Pick which components you want and the skill asks one question per component.
If you'd rather pre-install everything up front:

```
~/.claude/skills/vpstack/bin/vpstack-deps list            # see status of every component
~/.claude/skills/vpstack/bin/vpstack-deps install b1      # numpy scipy soundfile
~/.claude/skills/vpstack/bin/vpstack-deps install score   # speechbrain torch torchaudio
~/.claude/skills/vpstack/bin/vpstack-deps install wer     # openai-whisper torch
~/.claude/skills/vpstack/bin/vpstack-deps install utmos   # speechmos torch torchaudio
```

Restart Claude Code. Open a voice-anonymization project. Type /vp-hypothesis.
vpstack auto-activates on voice-anonymization projects (it detects VP2026, voice-anonymization, speechbrain, the British spelling anonymisation, and more). Silent everywhere else.
```
git clone https://github.com/khamidov17/vpstack.git ~/.claude/skills/vpstack
chmod +x ~/.claude/skills/vpstack/bin/*
cp ~/.claude/skills/vpstack/AGENTS.md /path/to/your/vp2026-project/AGENTS.md
```

Codex reads AGENTS.md for domain context. Open skills/vp-*/SKILL.md and tell Codex to follow it. The bash binaries (vpstack-b1, vpstack-score, etc.) work the same way for Codex as for Claude — same commands, same outputs.
```
git clone https://github.com/khamidov17/vpstack.git ~/.claude/skills/vpstack
chmod +x ~/.claude/skills/vpstack/bin/*
mkdir -p /path/to/your/vp2026-project/.cursor
cp ~/.claude/skills/vpstack/.cursor/rules /path/to/your/vp2026-project/.cursor/rules
```

In Cursor Composer, open skills/vp-*/SKILL.md in the sidebar and reference it. Same bash binaries.
```
cat ~/.claude/skills/vpstack/docs/claude-md-template.md >> CLAUDE.md
```

This gives the agent the VP2026 domain facts upfront — metric directions, baseline notes, component tradeoffs, common mistakes — so it doesn't ask you what EER means or get the direction backwards.
| Skill | What it does |
|---|---|
| /vp-talk | Two modes: Research (8 forcing questions → research plan) or Engineering (6 domain questions → domain_config.yaml + engineering plan) |
| /vp-hypothesis | Formalize an experiment in 7 questions, write a structured doc |
| /vp-spike | Run 1–3 B1 ablation variants via vpstack-b1, log verdicts as learnings |
| /vp-baseline-compare | B1 baseline (real, via vpstack-b1) + delta table |
| /vp-attack | ASV attacker (real, via vpstack-score): ignorant / lazy-informed / semi-informed |
| /vp-eval | Full VP2026 scorecard — currently uses vpstack-score + Whisper + UTMOS |
| /vp-repro-check | PASS_STRONG / PASS_WEAK — seeds, splits, checkpoint hashes (real, via vpstack-lock) |
| /vp-implement | 14-step dev workflow with license + repro gates |
| /vp-investigate | Domain-aware debugging — "my EER looks wrong" |
| /vp-writeup | Engineering report from experiment logs (no LLM-generated citations) |
| /vp-qa | QA pass: repro + attacker smoke + submission format |
| /vp-ship | Version bump + repro gate + deferred-gate check + PR |
| /vp-plan-eng-review | 18 VP-specific engineering gates against the research/engineering plan |
| /vp-plan-design-review | Architecture review before implementation |
| /vp-autoplan | Chains all skills end-to-end |
| /vp-context-save | Save session state — active hypothesis, last experiment, next step |
| /vp-context-restore | Resume from saved state — 2-sentence briefing, pick up where you left off |
| /vp-brain | Browse the experiment store: list, top, stats, query, diff, learnings, timeline, projects |
The skills call these directly. You can also use them standalone:
| Binary | Purpose |
|---|---|
| vpstack-b1 | McAdams B1 anonymization on any WAV directory. CPU only, ~5 min for the dev set |
| vpstack-b2 | Neural B2 wrapper. --backend external (drives your VP2026 B2 recipe), --backend pool-selection (ECAPA farthest-neighbor target speakers), --backend speechbrain-vc (experimental) |
| vpstack-score | ASV attacker — wraps SpeechBrain ECAPA or your own external attacker |
| vpstack-wer | ASR word-error-rate scoring via OpenAI Whisper, against a TSV reference manifest |
| vpstack-utmos | Naturalness PMOS scoring via UTMOS22 (sarulab-speech / SpeechMOS) |
| vpstack-eval | Full VP2026 scorecard orchestrator — calls score/wer/utmos and writes the official submission CSV layout (exp/asv_anon*/, exp/asr/, exp/ser/, exp/results_summary/track1/) |
| vpstack-lock | Generate/verify checkpoints.lock for hash-pinned reproducibility |
| vpstack-brain | Experiment store CLI: list, top, show, diff, query, stats, learnings, timeline, projects. --slug <name> for cross-project queries |
| vpstack-deps | Per-feature ML dependency manager: check, install, list, packages for b1 / score / wer / utmos / b2-pool. Skills probe before invoking a binary so the user gets a clean install prompt instead of DEPS_MISSING JSON |
| vpstack-slug | Project slug (basename + USER-scoped hash) — matches storage paths |
| vpstack-skill-init | Skill preamble logic — activation gate, learnings load, routing injection |
| vpstack-detect | Detects voice-anonymization projects |
| vpstack-config | User config (~/.vpstack/config.json) — telemetry mode, etc. |
| vpstack-telemetry-log | Opt-in telemetry (off / anonymous / community) |
| vpstack-learnings-log | Append a research learning to ~/.vpstack/projects/{slug}/learnings.jsonl |
| vpstack-learnings-search | Query learnings by text, type, source |
| vpstack-timeline-log | Append a skill-run event to ~/.vpstack/analytics/timeline.jsonl |
Run any of them with --help for usage.
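Every binary above is a plain executable, so a quick standalone smoke test needs no skill or agent at all. A hedged sketch, assuming the clone location from the install commands above (the guard keeps it harmless on machines where vpstack isn't installed):

```shell
# Probe the install location before calling anything (path matches the
# `git clone ... ~/.claude/skills/vpstack` install step; adjust if you cloned elsewhere).
BIN="$HOME/.claude/skills/vpstack/bin"
if [ -x "$BIN/vpstack-b1" ]; then
  "$BIN/vpstack-b1" --help
else
  echo "vpstack binaries not found at $BIN" >&2
fi
```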
| Component | Status |
|---|---|
| B1 McAdams anonymization (vpstack-b1) | ✅ Working end-to-end on any 16 kHz WAV directory |
| ASV attacker scoring (vpstack-score) | ✅ SpeechBrain ECAPA backend + external backend for official VP2026 |
| Checkpoint hash verification (vpstack-lock) | ✅ Generate + verify, real SHA256, macOS/Linux compat |
| Experiment store CLI (vpstack-brain) | ✅ Full operations over JSONL state |
| All 17 skills | ✅ Markdown workflows that call the binaries directly |
| Cross-session learnings + timeline | ✅ Persistent state, surfaced in skill preambles |
| Reproducibility check (PASS_STRONG / PASS_WEAK) | ✅ Real verification, not a placeholder |
| Engineering plan output (vp-talk Mode E) | ✅ domain_config.yaml + engineering-plan.md |
| Context save/restore | ✅ Resume mid-experiment from any session |
| Activation auto-detect | ✅ VP2026 projects fire, others stay silent |
| Claude Code / Codex / Cursor | ✅ All three clients, same skills, same binaries |
| B2 neural baseline wrapper (vpstack-b2) | ✅ Three backends: external (drives the VP2026 B2 recipe), pool-selection (ECAPA farthest-neighbor), speechbrain-vc (experimental) |
| WER scoring (vpstack-wer) | ✅ Whisper-based, configurable model size, edit-distance WER against a TSV manifest |
| Naturalness PMOS scoring (vpstack-utmos) | ✅ UTMOS22 via the SpeechMOS package |
| Full eval orchestrator (vpstack-eval) | ✅ Runs score/wer/utmos in one shot, writes VP2026 submission CSVs + optional ZIP |
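The checkpoint-verification row is conceptually simple: a lock file of SHA256 digests, regenerated and compared. A toy sketch of the idea on a throwaway fixture (illustrative only; the real checkpoints.lock format written by vpstack-lock may differ):

```shell
# Toy hash-pinned verification (not vpstack-lock's actual code).
CKPT_DIR="$(mktemp -d)"
echo "fake model weights" > "$CKPT_DIR/ecapa.ckpt"

# "Generate": record one SHA256 digest per checkpoint file.
( cd "$CKPT_DIR" && sha256sum ecapa.ckpt > checkpoints.lock )

# "Verify": sha256sum -c exits non-zero if any file drifted since generation.
( cd "$CKPT_DIR" && sha256sum -c --quiet checkpoints.lock ) && echo "1/1 OK"
```

On success this prints `1/1 OK`; tamper with the checkpoint and the verify step fails loudly, which is exactly the property /vp-repro-check relies on.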
State lives at ~/.vpstack/:
```
~/.vpstack/
├── config.json                        user config (telemetry mode, etc.)
├── analytics/
│   └── timeline.jsonl                 every skill run, ever (audit trail)
└── projects/
    └── {repo}-{user-hash}/
        ├── domain_config.yaml         from /vp-talk Mode E
        ├── learnings.jsonl            research insights, persistent across sessions
        ├── session-state.md           from /vp-context-save (resume target)
        ├── deferred-gates.jsonl       from /vp-plan-eng-review (re-checked at /vp-ship)
        ├── hypotheses/{id}.md         from /vp-hypothesis
        ├── research-plans/{id}.md     from /vp-talk Mode R
        ├── engineering-plans/{id}.md  from /vp-talk Mode E
        ├── spikes/{id}.md             from /vp-spike
        └── experiments/{exp_id}/summary.json  canonical experiment log
```
Browse it with vpstack-brain or directly via cat. No database. No proprietary format.
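Because everything is plain files and line-delimited JSON, nothing vpstack-specific is needed to inspect the store. A sketch against a toy store built on the spot (the record fields here are made up for illustration; the real summary.json schema may differ):

```shell
# Build a toy store mirroring the experiments/ layout above.
STORE="$(mktemp -d)"
mkdir -p "$STORE/experiments/demo-001"
echo '{"exp_id":"demo-001","method":"B1-McAdams","eer":38.2}' \
  > "$STORE/experiments/demo-001/summary.json"

# Pull one metric out with python3's stdlib json module — no jq required.
EER="$(python3 -c 'import json,sys; print(json.load(open(sys.argv[1]))["eer"])' \
  "$STORE/experiments/demo-001/summary.json")"
echo "eer=$EER"   # → eer=38.2
```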
Everything the agent needs to know about VP2026 is in docs/domain.md:
- Metric directions (EER ↑ = private, 50% = goal; WER ↓ = useful)
- Three attacker conditions and which one to report (semi-informed)
- Component tradeoffs (HuBERT layers, ContentVec, WavLM, ECAPA-TDNN, HiFi-GAN, McAdams, PLDA)
- Known component failure modes (the things researchers re-discover every time)
- checkpoints.lock format for hash-pinned reproducibility
- VP2026 submission format requirements
- Common mistakes that waste GPU time
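As a toy illustration of the first bullet (a sketch, not vpstack-score): EER is the operating point where the attacker's false-accept and false-reject rates cross, so a higher EER after anonymization means the attacker is closer to a coin flip. The scores below are made up:

```shell
EER_OUT="$(python3 <<'PY'
genuine  = [0.9, 0.8, 0.7, 0.4]   # same-speaker trial scores (made up)
impostor = [0.6, 0.5, 0.3, 0.2]   # different-speaker trial scores (made up)

def eer(genuine, impostor):
    # Sweep thresholds; return the rate where FAR and FRR are closest.
    best = None
    for t in sorted(genuine + impostor):
        far = sum(s >= t for s in impostor) / len(impostor)  # false accepts
        frr = sum(s <  t for s in genuine)  / len(genuine)   # false rejects
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2)
    return best[1]

print(f"EER = {eer(genuine, impostor):.1%}")
PY
)"
echo "$EER_OUT"   # → EER = 25.0%
```

Push that number toward 50% and the attacker learns nothing; push it toward 0% and the speaker is trivially re-identified, which is why the direction arrows matter.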
```
skills/vp-*/SKILL.md   pure markdown — workflow instructions for the agent
bin/vpstack-*          bash binaries — actual computation, called by skills
docs/                  domain knowledge agents read in every session
```
Zero Python files in the skill layer. The bash binaries embed Python heredocs where needed (numpy/scipy for B1, torch/speechbrain for the attacker), but the skills themselves are markdown only — same pattern as gstack.
No MCP server. No daemon. No build step beyond chmod +x.
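The heredoc pattern is the whole trick: bash owns the CLI surface, a python3 heredoc does the numeric work, and one file ships both. A minimal sketch of the shape (illustrative, not vpstack's actual source; it builds its own fixture so it runs anywhere):

```shell
#!/usr/bin/env bash
# Sketch of the bash-wrapping-Python-heredoc pattern used by the binaries.
set -euo pipefail

# Fixture so the sketch is self-contained: a directory with two "wav" files.
DATA_PATH="$(mktemp -d)"
touch "$DATA_PATH/a.wav" "$DATA_PATH/b.wav"

# The quoted 'PY' delimiter stops bash from expanding anything inside Python.
python3 - "$DATA_PATH" <<'PY'
import os
import sys

data_path = sys.argv[1]
# A real binary (e.g. vpstack-b1) would load and process audio here;
# this sketch just counts the files it would operate on.
n = sum(1 for f in os.listdir(data_path) if f.endswith(".wav"))
print(f"found {n} wav files in {data_path}")
PY
```

Passing the path as `sys.argv[1]` rather than interpolating it into the heredoc keeps the Python side safe from spaces and shell metacharacters in filenames.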
Apache 2.0. Do NOT import VP2024 baseline GitHub code — it's GPLv3 and would force the whole project to GPL. Implement from the VP2024 Eval Plan PDF or subprocess the GPLv3 scripts as external tools (vpstack does this via vpstack-score --backend external).
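One safe shape for that isolation, as a hedged sketch: invoke the GPLv3 code as an external process so only data, never code, crosses the boundary. The tool path and flags below are hypothetical stand-ins, not a real interface:

```shell
# Call GPLv3 code as a subprocess; never source, import, or vendor it.
GPL_TOOL="/opt/vp2024-baseline/run_attacker.sh"   # hypothetical install path

if [ -x "$GPL_TOOL" ]; then
  # Process boundary: only files and exit codes cross it, so the
  # Apache-2.0 caller is not a derivative work of the GPL tool.
  "$GPL_TOOL" --input /data/anon --output /tmp/scores.txt
else
  echo "external attacker not installed; skipping" >&2
fi
```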