voxmap

A speaker diarization library together with voxmap-studio, an open-source diarization annotation tool that records annotation cost (typed edit-operation counts and time) as a first-class output.

This repository accompanies the paper "voxmap-studio: an open-source speaker diarization annotation tool with built-in cost instrumentation." If you arrived here from the paper, the annotation tool lives in apps/studio/.

What is in this repository

voxmap/
├── apps/studio/        # voxmap-studio: the annotation tool (React frontend + FastAPI backend)
├── src/voxmap/         # the diarization library that powers automatic initialization
│   ├── vad/            #   voice activity detection
│   ├── embedding/      #   speaker embeddings
│   ├── clustering/     #   clustering
│   ├── pipeline/       #   VAD → embedding → clustering composed into a pipeline
│   └── eval/           #   DER / speaker-recall / latency / visualization
├── configs/            # reference pipeline configs (YAML)
├── scripts/            # CLIs: diarize / evaluate / compare_runs
└── tests/

voxmap-studio

A browser-based tool for producing and correcting speaker diarization annotations. Instead of drawing every speaker turn by hand, the annotator starts from the output of a fast automatic diarization pipeline and corrects it. Distinguishing features:

Automatic initialization. The canvas is pre-filled by a stride-accelerated diarization engine, so the first annotation appears with little waiting.
Built-in cost instrumentation. Every edit is counted by type (create / delete / split / resize / reassign) and active editing time is recorded. Both are written to a JSON sidecar alongside the annotation, so you can measure where annotation effort actually goes.
Label assistance. Segments likely to be mislabeled are highlighted, and a cluster gallery plus an R-key recommendation panel speed up labeling.
Confirmation-gated export. The final RTTM/JSON is emitted only after every segment has been human-confirmed, with injected "phantom" attention checks that prevent unverified automatic output from being released as ground truth.

→ Setup and how to run it: apps/studio/README.md

→ Keyboard shortcuts and the annotation workflow: apps/studio/USAGE.md

Using the library directly

The same engine that initializes the studio canvas can be used on its own, from Python or through the CLIs in scripts/.

from voxmap.pipeline import build_pipeline

pipeline = build_pipeline("configs/pipeline/baseline.yaml")
diarization = pipeline.run("path/to/audio.wav")

# Run diarization and write RTTM
uv run python scripts/diarize.py audio.wav \
    --config configs/pipeline/baseline.yaml -o out.rttm

# Evaluate against a reference (DER + speaker recall)
uv run python scripts/evaluate.py --pred out.rttm --ref reference.rttm --out results/
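
For readers unfamiliar with RTTM, the sketch below writes and reads SPEAKER lines in the standard ten-field layout (type, file ID, channel, onset, duration, and speaker name, with the unused fields set to <NA>). It is independent of the library: the Turn dataclass and the to_rttm / from_rttm helpers are hypothetical names, and the exact formatting that scripts/diarize.py emits may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Turn:
    """Hypothetical container for one speaker turn, in seconds."""

    start: float
    duration: float
    speaker: str


def to_rttm(turns: list[Turn], file_id: str) -> str:
    """Serialize turns as RTTM SPEAKER lines, sorted by onset."""
    lines = [
        f"SPEAKER {file_id} 1 {t.start:.3f} {t.duration:.3f} <NA> <NA> {t.speaker} <NA> <NA>"
        for t in sorted(turns, key=lambda t: t.start)
    ]
    return "\n".join(lines) + "\n"


def from_rttm(text: str) -> list[Turn]:
    """Parse SPEAKER lines; other record types and blank lines are skipped."""
    turns = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        # Field 3 is onset, field 4 is duration, field 7 is the speaker name.
        turns.append(Turn(float(fields[3]), float(fields[4]), fields[7]))
    return turns


text = to_rttm([Turn(1.5, 2.0, "spk1"), Turn(0.0, 1.25, "spk0")], file_id="audio")
print(text, end="")
```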

Components are swappable: each */base.py defines a Protocol, and any class that satisfies it can be registered in registry.py and selected by name in a config.

vad:        { name: silero,    threshold: 0.5 }
embedding:  { name: wespeaker, model: voxceleb_resnet34 }
clustering: { name: spectral,  n_clusters: null }

Development

make setup       # install dependencies (uv) and pre-commit hooks
make check       # ruff lint + mypy (strict)
make test        # pytest

Stack: Python 3.12 / uv / ruff / mypy (strict). The studio frontend uses Bun + Vite + Biome; see apps/studio/README.md.

License

See LICENSE.
