feat: multi-model ensemble separation with 9 community-curated presets by beveradb · Pull Request #265 · nomadkaraoke/python-audio-separator

beveradb · 2026-03-16T01:20:31Z

Summary

Adds multi-model ensemble separation to audio-separator, allowing users to combine the outputs of multiple separation models for better quality results. Includes a preset system with 9 community-curated configurations, stem labeling fixes, and a comprehensive multi-stem test framework.

Core ensemble support (by @makhlwf — thank you!)

New Ensembler class with 11 algorithms: avg_wave, median_wave, min_wave, max_wave, avg_fft, median_fft, min_fft, max_fft, uvr_max_spec, uvr_min_spec, ensemble_wav
Multi-model load_model() and _separate_ensemble() pipeline with temp directory management
CLI support via --extra_models for specifying additional models
Weighted ensembling support via --ensemble_weights
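
For readers new to ensembling, here is a minimal sketch of how the waveform-domain algorithms could combine per-model outputs. It is not the `Ensembler` class from this PR: the function name `ensemble_waveforms` is hypothetical, and the min/max rule (pick the sample with the smallest or largest absolute amplitude) is an assumption about what `min_wave`/`max_wave` mean.

```python
import numpy as np


def ensemble_waveforms(waves, algorithm="avg_wave", weights=None):
    """Combine same-shaped (channels, samples) arrays from several models."""
    stack = np.stack(waves)  # shape: (models, channels, samples)
    if weights is not None and len(weights) != len(waves):
        raise ValueError("need exactly one weight per model")
    if algorithm == "avg_wave":
        # Weighted mean across models; weights=None gives a plain mean.
        return np.average(stack, axis=0, weights=weights)
    if algorithm == "median_wave":
        return np.median(stack, axis=0)
    if algorithm in ("min_wave", "max_wave"):
        # Assumption: per sample, keep the model value with the smallest
        # (or largest) absolute amplitude.
        magnitude = np.abs(stack)
        if algorithm == "min_wave":
            pick = magnitude.argmin(axis=0)
        else:
            pick = magnitude.argmax(axis=0)
        return np.take_along_axis(stack, pick[np.newaxis], axis=0)[0]
    raise ValueError(f"unsupported algorithm: {algorithm}")
```

Weights only affect the averaging branch in this sketch; whether the real implementation applies them to other algorithms is not stated here.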

Ensemble presets

ensemble_presets.json with 9 presets sourced from deton24's community guide:
- Instrumental: `instrumental_clean`, `instrumental_full`, `instrumental_balanced`, `instrumental_low_resource`
- Vocal: `vocal_balanced`, `vocal_clean`, `vocal_full`, `vocal_rvc`
- Karaoke: `karaoke` (3-model)
`--ensemble_preset` CLI flag and `Separator(ensemble_preset=...)` Python API
`--list_presets` to show available presets
Preset algorithm/weights can be overridden by explicit user arguments
Contributors can add presets via PR to `ensemble_presets.json`
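
To illustrate the override rule above (explicit user arguments win over preset values), here is a rough sketch of preset resolution. The JSON layout, the `models`/`algorithm`/`weights` keys, the placeholder model filenames, and the helper name `resolve_preset` are all assumptions for illustration; only the name/description/contributor fields and the `avg_wave` default are mentioned in this PR.

```python
import json

# Hypothetical preset file content; the real ensemble_presets.json may differ.
PRESETS_JSON = """
{
  "presets": {
    "example_preset": {
      "name": "Example",
      "description": "Placeholder two-model preset",
      "contributor": "someone",
      "models": ["model_a.ckpt", "model_b.ckpt"],
      "algorithm": "avg_fft",
      "weights": [2.0, 1.0]
    }
  }
}
"""


def resolve_preset(presets, preset_name, algorithm=None, weights=None):
    """Merge a named preset with explicit user overrides."""
    try:
        preset = presets["presets"][preset_name]
    except KeyError:
        raise ValueError(f"unknown ensemble preset: {preset_name!r}") from None
    models = list(preset["models"])  # copy, so callers cannot mutate the preset
    # Explicit arguments take priority, then the preset, then avg_wave.
    resolved_algorithm = algorithm or preset.get("algorithm") or "avg_wave"
    resolved_weights = weights if weights is not None else preset.get("weights")
    if resolved_weights is not None and len(resolved_weights) != len(models):
        raise ValueError("weights must have one entry per model")
    return {
        "models": models,
        "algorithm": resolved_algorithm,
        "weights": resolved_weights,
    }
```

Calling `resolve_preset(json.loads(PRESETS_JSON), "example_preset", algorithm="median_wave")` keeps the preset's models and weights but swaps the algorithm.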

Bug fixes

Stem label swap: Fixed `common_separator.py` to swap primary/secondary stem names when `target_instrument` doesn't match `instruments[0]` — fixes `bs_roformer_instrumental_resurrection_unwa` whose stems were backwards
Contextual stem mapping: In 2-stem models where one stem is vocal, "other" is now mapped to "Instrumental" (previously kept as separate group, causing broken ensembles)
CLI backward compat: Reverted `-m` from `nargs="+"` to single value, added `--extra_models` for additional models (old syntax `audio-separator -m model audio.wav` preserved)
Missing init attributes: `model_filename`/`model_filenames` initialized in `__init__`
State mutation: `original_output_dir` captured outside per-model loop
List reference copy: `self.model_filenames = list(model_filename)` instead of shared reference
Mono preservation: Track and restore original channel count through ensemble
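
The stem label swap can be pictured with a small sketch. This is not the code in `common_separator.py`; the function name `resolve_stem_names` and the two-element `instruments` list are assumptions used to show the rule described above.

```python
def resolve_stem_names(instruments, target_instrument=None):
    """Return (primary, secondary) stem names for a 2-stem model.

    Assumption: the model's prediction is written as the primary stem, so
    when the configured target is not instruments[0] the names are swapped.
    """
    primary, secondary = instruments[0], instruments[1]
    if target_instrument is not None and target_instrument != primary:
        primary, secondary = secondary, primary
    return primary, secondary
```

With `instruments = ["Vocals", "Instrumental"]` and `target_instrument = "Instrumental"`, the primary output is labeled "Instrumental" rather than "Vocals", which is the mismatch the fix addresses.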

Output naming

Preset ensembles: `audio_(Vocals)_preset_vocal_balanced.flac`
Manual ensembles: `audio_(Vocals)_custom_ensemble_<slug1>_<slug2>.flac`
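
A rough sketch of how such names could be built, following the `preset_<name>` and `custom_ensemble_<slug1>_<slug2>` patterns from the commit message. The helper names and the slug rule (lowercase, non-alphanumerics collapsed to underscores, extension dropped) are assumptions, not the logic in `separator.py`.

```python
import re


def slugify(model_filename):
    """Hypothetical slug rule: drop the extension, lowercase, underscores only."""
    stem = model_filename.rsplit(".", 1)[0]
    return re.sub(r"[^A-Za-z0-9]+", "_", stem).strip("_").lower()


def ensemble_output_name(base, stem, preset=None, model_filenames=(), ext="flac"):
    """Build an output filename for a preset or a manual ensemble."""
    if preset:
        suffix = f"preset_{preset}"
    else:
        suffix = "custom_ensemble_" + "_".join(slugify(m) for m in model_filenames)
    return f"{base}_({stem})_{suffix}.{ext}"
```

For example, `ensemble_output_name("audio", "Vocals", preset="vocal_balanced")` gives `audio_(Vocals)_preset_vocal_balanced.flac`.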

Test framework

Unit tests (233 total, 37 new):

Ensembler algorithms (all 11)
Preset loading, validation, override, error handling
Stem name swap (target_instrument mismatch)
CLI flags (--extra_models, --ensemble_preset, --list_presets)
Output filename format (preset and custom slugs)

Ensemble preset integration tests (9 parametrized):

Run each preset on mardy20s.flac
Verify correct stem labels via correlation
Compare spectrograms against 36 committed reference images (SSIM > 0.80)

Multi-stem integration tests (5 test clips, 39 reference stems):

Test clips: Led Zeppelin (drums/guitar/vocals), Coldplay (piano/instruments), Benny Goodman (brass/wind), Enya (reverb), Queen (backing harmonies)
Vocal/instrumental, 4-stem, drumsep pipeline, karaoke, wind extraction, dereverb pipeline
Each test verifies output correlates > 0.70 with best-model reference stems

Meaningful ensemble tests:

Vocal ensemble matches best single model (>0.90 correlation)
Karaoke ensemble extracts only lead vocals (<0.90 correlation with standard vocals on Under Pressure)
Karaoke on extracted vocals produces distinct lead/backing split (correlation <0.50)

On-demand regression test (163 models):

Verifies every supported model's output stems contain what their labels claim
Uses correlation against known vocal/instrumental references
Handles utility models (de-echo, de-noise, de-reverb), sub-stems, drumsep
Run locally: `pytest tests/regression/test_all_models_stem_verification.py -v -s`

Documentation

README updated with ensemble CLI examples, Python API, preset table
`ensemble_presets.json` self-documenting with name/description/contributor fields
`docs/deton24-model-mapping-and-ensemble-guide.md` — model naming lookup table, ensemble recommendations, and phase fix documentation

Version

Bumped to 0.42.0

Credits

The core ensemble functionality was originally implemented by @makhlwf in #261. This PR builds on that work with bug fixes, the preset system, stem labeling corrections, and comprehensive test coverage. Thank you @makhlwf for the foundational contribution!

Test plan

@coderabbitai ignore

🤖 Generated with Claude Code

…allback writer

… coverage

- Revert -m to single value, add --extra_models for ensemble (fixes CLI breaking change)
- Initialize model_filename/model_filenames in __init__ (prevents AttributeError)
- Fix list reference copy in load_model (use list() instead of shared reference)
- Move original_output_dir capture outside per-model loop (state mutation fix)
- Extract stem name map to module-level STEM_NAME_MAP constant
- Preserve mono channel count through ensemble (avoid fake stereo)
- Add trailing newlines to all files
- Add 8 new unit tests: median/min/max_fft, uvr_max/min_spec, invalid algo, weight mismatch
- Add 3 CLI tests: --extra_models, single model string compat, old syntax backward compat
- Update README ensemble examples for new --extra_models flag

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
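
The mono-preservation item can be sketched as follows. This is only an illustration of the idea (remember the input channel count, restore it after ensembling); the helper names are hypothetical and the real pipeline may handle channels differently.

```python
import numpy as np


def to_stereo(wave):
    """Return (stereo_wave, original_channels) for a (channels, samples) array."""
    wave = np.atleast_2d(wave)
    original_channels = wave.shape[0]
    if original_channels == 1:
        # Duplicate the single channel so stereo-only models can process it.
        return np.repeat(wave, 2, axis=0), original_channels
    return wave, original_channels


def restore_channels(wave, original_channels):
    """Fold the result back to mono when the input was mono (no fake stereo)."""
    if original_channels == 1:
        return wave.mean(axis=0, keepdims=True)
    return wave
```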

Add a JSON-based ensemble preset system that lets users select known-good model combinations by name instead of specifying every detail manually. Presets are sourced from deton24's community-maintained audio separation guide and cover instrumental (4), vocal (4), and karaoke (1) use cases.

New features:
- ensemble_presets.json with 9 presets (instrumental_clean/full/balanced/low_resource, vocal_balanced/clean/full/rvc, karaoke)
- --ensemble_preset CLI flag and Separator(ensemble_preset=...) Python API
- --list_presets CLI flag to show available presets
- Preset algorithm/weights can be overridden by explicit user args
- ensemble_algorithm parameter now accepts None (defaults to avg_wave)
- 10 new unit tests for preset loading, validation, override, JSON validity
- 2 new CLI tests for --ensemble_preset and --list_presets
- README updated with preset documentation and usage examples

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…rument, map "other" to "Instrumental"

Three fixes for stem name handling in ensemble mode:

1. common_separator.py: When a model's target_instrument doesn't match instruments[0], swap primary/secondary stem names so the model's prediction gets the correct label. Fixes bs_roformer_instrumental_resurrection_unwa whose "vocals" output was actually instrumental.
2. separator.py: In _separate_ensemble, when a model produces exactly 2 stems and one is vocal-like, map "other" to "Instrumental" instead of keeping it as a separate group. This ensures all 2-stem models contribute to the same Vocals/Instrumental ensemble regardless of whether they label their non-vocal stem "Instrumental" or "other".
3. separator.py: Use preset name in ensemble output filenames (preset_<name>) and descriptive slugs for manual ensembles (custom_ensemble_<slug1>_<slug2>).

Also adds tests/utils_audio_verification.py, a content verification utility that correlates output stems against known references to detect label mismatches programmatically.

Verified: all 9 presets now produce exactly 2 correctly-labeled stems (18/18 OK, 0 mismatches).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
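
The contextual "other" mapping from fix 2 might look roughly like this. The set of vocal-like names and the function name are assumptions for illustration; `_separate_ensemble` itself is not reproduced here.

```python
# Hypothetical set of labels treated as "vocal-like".
VOCAL_LIKE = {"vocals", "vocal"}


def normalize_two_stem_labels(stem_names):
    """Map "other" to "Instrumental" only for 2-stem vocal/non-vocal models."""
    lowered = [name.lower() for name in stem_names]
    has_vocal = any(name in VOCAL_LIKE for name in lowered)
    if len(stem_names) == 2 and has_vocal:
        return ["Instrumental" if name.lower() == "other" else name
                for name in stem_names]
    # 4-stem models and non-vocal pairs keep "other" as its own group.
    return list(stem_names)
```

Restricting the mapping to the 2-stem vocal case keeps a genuine "other" stem from a 4-stem model out of the Instrumental group.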

…ence spectrograms

- 36 reference spectrogram/waveform PNGs for 9 presets × 2 stems each
- test_ensemble_integration.py: parametrized test that for each preset:
  1. Runs the preset separation on mardy20s.flac
  2. Verifies stems contain correct content (correlation-based)
  3. Compares spectrograms against committed references (SSIM)
- generate_reference_images_ensemble.py: script to regenerate references
- utils_audio_verification.py: content verification utility (already committed)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…logic

- 5 tests for CommonSeparator stem name swap (target_instrument mismatch, no swap when matching, edge cases)
- 2 tests for STEM_NAME_MAP completeness and lowercase invariant
- 2 tests for ensemble output filename format (preset and custom slugs)
- 5 tests for preset validation edge cases (bad weights length, bad algorithm, single model, weights applied, weights override)

Total: 233 unit tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The nargs="+" change on -m was reverted in favor of --extra_models, so the old CLI arg order (audio-separator -m model audio.wav) works again. No need to change these tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… models

Runs every supported model on mardy20s.flac and verifies each output stem's label matches its actual content using correlation against known vocal and instrumental references.

Usage:
  pytest tests/regression/test_all_models_stem_verification.py -v -s
  pytest ... -k "VR" (single architecture)
  pytest ... -k "resurrection" (single model)
  STEM_VERIFY_REPORT_ONLY=1 pytest ... (report without failing)

Handles:
- Vocal/Instrumental stems: verified via Pearson correlation (>0.7 threshold)
- Sub-stems (drums, bass, guitar, piano): verified not-full-mix; near-silence OK
- Full mix detection: any stem with >0.95 correlation to original mix fails
- Demucs 6-stem models: sub-stems like Piano can be legitimately silent

Not run in CI: requires downloading all models.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
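
As a sketch of the checks listed above, the following applies the two thresholds (0.7 for label match, 0.95 for full-mix detection) to mono signals. The function names and return strings are hypothetical, and this is not the code in tests/utils_audio_verification.py.

```python
import numpy as np

LABEL_THRESHOLD = 0.7      # stem must correlate with its reference above this
FULL_MIX_THRESHOLD = 0.95  # stem this close to the mix was not separated


def pearson(a, b):
    """Pearson correlation of two 1-D signals, trimmed to the same length."""
    n = min(len(a), len(b))
    x = np.asarray(a[:n], dtype=np.float64)
    y = np.asarray(b[:n], dtype=np.float64)
    if x.std() == 0 or y.std() == 0:
        return 0.0  # a constant (e.g. silent) signal has no defined correlation
    return float(np.corrcoef(x, y)[0, 1])


def check_stem(label, stem, mix, vocal_ref, instrumental_ref):
    """Classify one output stem as ok, full mix, or mislabeled."""
    if pearson(stem, mix) > FULL_MIX_THRESHOLD:
        return "fail: stem is the full mix"
    reference = vocal_ref if label.lower() == "vocals" else instrumental_ref
    if pearson(stem, reference) > LABEL_THRESHOLD:
        return "ok"
    return "fail: label mismatch"
```

Sub-stems and utility models would need the relaxed handling described in the later commits; this sketch covers only the Vocals/Instrumental case.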

- Utility models (de-echo, de-noise, de-reverb, BVE) get relaxed verification, since their stems don't follow standard vocal/instrumental patterns on clean source audio
- Sub-stems (drums, bass, guitar, "No X" variants) skip the full-mix check since "No X" is legitimately ≈ the mix when X isn't present
- Partial vocal stems (backing/lead vocals) skip full-vocal correlation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…piration, etc.)

Full 163-model run revealed stem types not yet in SUB_STEMS or UTILITY_STEMS:
- Drumsep: kick, snare, toms, hh, ride, crash
- Gender split: male, female
- Specialized: aspiration, bleed, no bleed
- Utility: noreverb

160 passed, 0 real failures, 3 skipped (download failures).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

New test input audio clips with diverse instrumentation for testing instrument-specific separation models:
- levee_drums.flac (20s, 24-bit): Led Zeppelin, drums+guitar+vocals
- clocks_piano.flac (20s, 16-bit): Coldplay, piano+instruments+vocals
- sing_sing_sing_brass.flac (25s, 16-bit): Benny Goodman, drums+brass+wind
- only_time_reverb.flac (25s, 16-bit): Enya, reverb-heavy vocal+synths

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

New integration test suite verifying instrument-specific separation models across 4 test clips with diverse instrumentation.

Test matrix:
- Vocal/Instrumental: resurrection model on all 4 clips
- 4-stem (drums/bass/other/vocals): htdemucs_ft on levee + clocks
- DrumSep pipeline: mix → htdemucs_ft drums → drumsep kit parts
- Karaoke: aufr33/viperx model on levee + clocks
- Wind/Brass: 17_HP-Wind on sing_sing_sing
- De-reverb pipeline: mix → resurrection vocals → dereverb

30 reference stems generated by best-in-class models, committed as tests/inputs/reference/ref_*.flac. Tests verify new model outputs correlate > 0.70 with references.

Includes generate_multi_stem_references.py for regenerating references.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Karaoke models remove lead vocals while preserving backing vocals. The test now additionally checks that karaoke vocal output differs from standard vocal output (correlating < 0.95), confirming the model is doing karaoke-specific extraction, not just a generic split. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Queen & David Bowie — Under Pressure 1:35-1:55 (20s, 16-bit). Section has clear lead vocal over dense backing harmonies, making karaoke vs standard vocal separation measurably different (0.740 correlation vs 0.961 for Clocks which lacks strong backing vocals). Karaoke test now runs on 3 clips: levee, clocks, under_pressure. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

New minor version for:
- Multi-model ensemble separation
- 9 community-curated ensemble presets
- Stem label fixes (target_instrument swap, contextual "other" mapping)
- New CLI flags: --extra_models, --ensemble_preset, --list_presets
- Multi-stem integration test framework

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Three tests verifying ensemble presets produce semantically correct output:

1. test_vocal_ensemble_matches_best_single_model: vocal_balanced ensemble output should correlate >0.90 with the best single model (Resurrection), confirming ensemble doesn't degrade quality.
2. test_karaoke_ensemble_extracts_lead_only: On Under Pressure (prominent backing harmonies), karaoke ensemble vocals should differ from standard vocal extraction (<0.90 correlation), confirming it extracts only lead.
3. test_karaoke_on_vocals_produces_lead_backing_split: Pipeline test (mix → vocal model → karaoke model) should produce distinct lead and backing vocal stems (both non-silent, correlation <0.50).

Includes 9 new reference stems for these tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…54-fix-pr265-tests

…on tests

find_stem() matched the first _(StemName) group in filenames, which broke pipeline tests where the input filename already contained a parenthesized stem from a prior step. Now uses the last match.

Also handle near-silent stems (e.g. vocals from instrumental-only audio) returning nan correlation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
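
The last-match behaviour could be sketched like this. The regex and the one-argument signature are assumptions; the real find_stem() in the test suite may take different arguments.

```python
import re

# Matches "_(StemName)" groups such as "_(Vocals)" or "_(kick)".
STEM_PATTERN = re.compile(r"_\(([^)]+)\)")


def find_stem(filename):
    """Return the last parenthesized stem name in a filename, or None."""
    matches = STEM_PATTERN.findall(filename)
    # The last group belongs to the most recent separation step; earlier
    # groups come from the pipeline input's own filename.
    return matches[-1] if matches else None
```

For a pipeline output such as `levee_drums_(Drums)_htdemucs_ft_(kick)_drumsep.flac` (an illustrative filename), this returns `kick` rather than `Drums`.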

…54-fix-pr265-tests

beveradb · 2026-03-17T01:39:46Z

Would love to get y'alls opinions/input on the ensemble presets which are now live in the latest audio-separator if you're interested @Politrees @Eddycrack864 😄

makhlwf and others added 22 commits March 11, 2026 21:15

add ensembler

371de86

refactor(ensembler): fix state mutation, handle mono input, and add f…

3c16178

…allback writer

try fix test

3bac786

review comments

ac68238

review comments

19ff223

fix test

6eeb743

docs: add ensemble_preset to Python API parameter reference in README

f325e89

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

beveradb enabled auto-merge (squash) March 16, 2026 05:58

beveradb mentioned this pull request Mar 16, 2026

add ensembler #261

Closed

beveradb and others added 3 commits March 16, 2026 18:10

Merge remote-tracking branch 'origin/main' into feat/sess-20260316-17…

ff6fb31

…54-fix-pr265-tests

Merge remote-tracking branch 'origin/main' into feat/sess-20260316-17…

68e22e3

…54-fix-pr265-tests

beveradb merged commit adc5539 into main Mar 16, 2026
21 of 27 checks passed

beveradb merged 25 commits into main from add-ensemble


2 participants
