feat: multi-model ensemble separation with 9 community-curated presets #265
Merged
… coverage

- Revert -m to single value, add --extra_models for ensemble (fixes CLI breaking change)
- Initialize model_filename/model_filenames in __init__ (prevents AttributeError)
- Fix list reference copy in load_model (use list() instead of shared reference)
- Move original_output_dir capture outside per-model loop (state mutation fix)
- Extract stem name map to module-level STEM_NAME_MAP constant
- Preserve mono channel count through ensemble (avoid fake stereo)
- Add trailing newlines to all files
- Add 8 new unit tests: median/min/max_fft, uvr_max/min_spec, invalid algo, weight mismatch
- Add 3 CLI tests: --extra_models, single model string compat, old syntax backward compat
- Update README ensemble examples for new --extra_models flag

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a JSON-based ensemble preset system that lets users select known-good model combinations by name instead of specifying every detail manually. Presets are sourced from deton24's community-maintained audio separation guide and cover instrumental (4), vocal (4), and karaoke (1) use cases.

New features:
- ensemble_presets.json with 9 presets (instrumental_clean/full/balanced/low_resource, vocal_balanced/clean/full/rvc, karaoke)
- --ensemble_preset CLI flag and Separator(ensemble_preset=...) Python API
- --list_presets CLI flag to show available presets
- Preset algorithm/weights can be overridden by explicit user args
- ensemble_algorithm parameter now accepts None (defaults to avg_wave)
- 10 new unit tests for preset loading, validation, override, JSON validity
- 2 new CLI tests for --ensemble_preset and --list_presets
- README updated with preset documentation and usage examples

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
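The override rule in this commit (explicit arguments win over preset values, and a missing algorithm falls back to avg_wave) can be sketched as follows. This is a minimal illustration, not the code in this PR: the JSON layout, the preset name `example_preset`, the model filenames, and the helper `resolve_preset` are all hypothetical, and the real ensemble_presets.json schema may differ.

```python
import json

# Hypothetical preset file content; the real ensemble_presets.json may be laid out differently.
PRESETS_JSON = """
{
  "presets": {
    "example_preset": {
      "models": ["model_a.ckpt", "model_b.ckpt"],
      "algorithm": "avg_fft",
      "weights": [2.0, 1.0]
    }
  }
}
"""

# Per the commit message, ensemble_algorithm=None defaults to avg_wave.
DEFAULT_ALGORITHM = "avg_wave"


def resolve_preset(name, algorithm=None, weights=None):
    """Return ensemble settings for a preset, letting explicit arguments win."""
    presets = json.loads(PRESETS_JSON)["presets"]
    if name not in presets:
        raise ValueError(f"Unknown preset {name!r}; available: {sorted(presets)}")
    preset = presets[name]

    # Copy the list so callers cannot mutate the loaded preset by accident.
    models = list(preset["models"])

    # Explicit user arguments take priority over the preset's own values.
    resolved_weights = weights if weights is not None else preset.get("weights")
    resolved_algorithm = algorithm or preset.get("algorithm") or DEFAULT_ALGORITHM

    # Assumed validation: one weight per model.
    if resolved_weights is not None and len(resolved_weights) != len(models):
        raise ValueError("weights must contain exactly one entry per model")

    return {"models": models, "algorithm": resolved_algorithm, "weights": resolved_weights}
```

Calling `resolve_preset("example_preset")` returns the preset's own algorithm and weights, while `resolve_preset("example_preset", algorithm="median_wave")` keeps the preset's models but swaps the algorithm.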
…rument, map "other" to "Instrumental"

Three fixes for stem name handling in ensemble mode:

1. common_separator.py: When a model's target_instrument doesn't match instruments[0], swap primary/secondary stem names so the model's prediction gets the correct label. Fixes bs_roformer_instrumental_resurrection_unwa, whose "vocals" output was actually instrumental.
2. separator.py: In _separate_ensemble, when a model produces exactly 2 stems and one is vocal-like, map "other" to "Instrumental" instead of keeping it as a separate group. This ensures all 2-stem models contribute to the same Vocals/Instrumental ensemble regardless of whether they label their non-vocal stem "Instrumental" or "other".
3. separator.py: Use preset name in ensemble output filenames (preset_<name>) and descriptive slugs for manual ensembles (custom_ensemble_<slug1>_<slug2>).

Also adds tests/utils_audio_verification.py, a content verification utility that correlates output stems against known references to detect label mismatches programmatically.

Verified: all 9 presets now produce exactly 2 correctly-labeled stems (18/18 OK, 0 mismatches).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
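The first two fixes above can be sketched roughly as below. The function names `primary_secondary_names` and `group_label`, and the `VOCAL_LIKE` set, are hypothetical stand-ins chosen for illustration; the actual logic lives in common_separator.py and separator.py and may handle more cases.

```python
# Hypothetical set of names treated as "vocal-like"; the real check may be broader.
VOCAL_LIKE = {"vocals"}


def primary_secondary_names(instruments, target_instrument=None):
    """Pick (primary, secondary) stem names for a 2-stem model.

    If the model declares a target_instrument that differs from instruments[0],
    the names are swapped so the model's prediction carries the right label.
    """
    primary, secondary = instruments[0], instruments[1]
    if target_instrument is not None and target_instrument != primary:
        primary, secondary = secondary, primary
    return primary, secondary


def group_label(stem_name, all_stems):
    """Map "other" to "Instrumental" for 2-stem models that also emit a vocal stem."""
    lowered = [s.lower() for s in all_stems]
    is_two_stem = len(lowered) == 2
    has_vocal = any(s in VOCAL_LIKE for s in lowered)
    if stem_name.lower() == "other" and is_two_stem and has_vocal:
        return "Instrumental"
    return stem_name
```

With this shape, a 4-stem model keeps its "other" stem as its own group, while a vocals/other model feeds the same Vocals/Instrumental ensemble as a vocals/instrumental model.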
…ence spectrograms

- 36 reference spectrogram/waveform PNGs for 9 presets × 2 stems each
- test_ensemble_integration.py: parametrized test that for each preset:
  1. Runs the preset separation on mardy20s.flac
  2. Verifies stems contain correct content (correlation-based)
  3. Compares spectrograms against committed references (SSIM)
- generate_reference_images_ensemble.py: script to regenerate references
- utils_audio_verification.py: content verification utility (already committed)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…logic

- 5 tests for CommonSeparator stem name swap (target_instrument mismatch, no swap when matching, edge cases)
- 2 tests for STEM_NAME_MAP completeness and lowercase invariant
- 2 tests for ensemble output filename format (preset and custom slugs)
- 5 tests for preset validation edge cases (bad weights length, bad algorithm, single model, weights applied, weights override)

Total: 233 unit tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The nargs="+" change on -m was reverted in favor of --extra_models, so the old CLI arg order (audio-separator -m model audio.wav) works again. No need to change these tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… models

Runs every supported model on mardy20s.flac and verifies each output stem's label matches its actual content using correlation against known vocal and instrumental references.

Usage:
  pytest tests/regression/test_all_models_stem_verification.py -v -s
  pytest ... -k "VR" (single architecture)
  pytest ... -k "resurrection" (single model)
  STEM_VERIFY_REPORT_ONLY=1 pytest ... (report without failing)

Handles:
- Vocal/Instrumental stems: verified via Pearson correlation (>0.7 threshold)
- Sub-stems (drums, bass, guitar, piano): verified not-full-mix; near-silence OK
- Full mix detection: any stem with >0.95 correlation to original mix fails
- Demucs 6-stem models: sub-stems like Piano can be legitimately silent

Not run in CI because it requires downloading all models.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
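As a rough illustration of the thresholds listed above, the sketch below computes a Pearson correlation over plain sample lists and applies the 0.7 and 0.95 cut-offs. The names `pearson` and `classify_stem` and the returned labels are hypothetical; the real test works on audio files and has extra handling for sub-stems and utility models that is not shown here.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sample sequences.

    Returns nan when either signal has zero variance (e.g. a silent stem).
    """
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")
    return cov / math.sqrt(var_a * var_b)


def classify_stem(stem, reference, mix, match_threshold=0.7, mix_threshold=0.95):
    """Label a stem as "full_mix", "silent", "ok", or "mismatch"."""
    # A stem that is almost identical to the input mix was not separated at all.
    # Comparisons against nan are False, so silent stems fall through here.
    if pearson(stem, mix) > mix_threshold:
        return "full_mix"
    r = pearson(stem, reference)
    if math.isnan(r):
        return "silent"
    return "ok" if r > match_threshold else "mismatch"
```

Checking for the full mix first matters: a stem that simply passes the input through can still correlate reasonably well with a vocal or instrumental reference.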
- Utility models (de-echo, de-noise, de-reverb, BVE) get relaxed verification; their stems don't follow standard vocal/instrumental patterns on clean source audio
- Sub-stems (drums, bass, guitar, "No X" variants) skip the full-mix check since "No X" is legitimately ≈ the mix when X isn't present
- Partial vocal stems (backing/lead vocals) skip full-vocal correlation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…piration, etc.)

Full 163-model run revealed stem types not yet in SUB_STEMS or UTILITY_STEMS:
- Drumsep: kick, snare, toms, hh, ride, crash
- Gender split: male, female
- Specialized: aspiration, bleed, no bleed
- Utility: noreverb

160 passed, 0 real failures, 3 skipped (download failures).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New test input audio clips with diverse instrumentation for testing instrument-specific separation models:
- levee_drums.flac (20s, 24-bit): Led Zeppelin, drums+guitar+vocals
- clocks_piano.flac (20s, 16-bit): Coldplay, piano+instruments+vocals
- sing_sing_sing_brass.flac (25s, 16-bit): Benny Goodman, drums+brass+wind
- only_time_reverb.flac (25s, 16-bit): Enya, reverb-heavy vocal+synths

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New integration test suite verifying instrument-specific separation models across 4 test clips with diverse instrumentation:

Test matrix:
- Vocal/Instrumental: resurrection model on all 4 clips
- 4-stem (drums/bass/other/vocals): htdemucs_ft on levee + clocks
- DrumSep pipeline: mix → htdemucs_ft drums → drumsep kit parts
- Karaoke: aufr33/viperx model on levee + clocks
- Wind/Brass: 17_HP-Wind on sing_sing_sing
- De-reverb pipeline: mix → resurrection vocals → dereverb

30 reference stems generated by best-in-class models, committed as tests/inputs/reference/ref_*.flac. Tests verify new model outputs correlate > 0.70 with references.

Includes generate_multi_stem_references.py for regenerating references.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Karaoke models remove lead vocals while preserving backing vocals. The test now additionally checks that karaoke vocal output differs from standard vocal output (correlating < 0.95), confirming the model is doing karaoke-specific extraction, not just a generic split. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Queen & David Bowie — Under Pressure 1:35-1:55 (20s, 16-bit). Section has clear lead vocal over dense backing harmonies, making karaoke vs standard vocal separation measurably different (0.740 correlation vs 0.961 for Clocks which lacks strong backing vocals). Karaoke test now runs on 3 clips: levee, clocks, under_pressure. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New minor version for:
- Multi-model ensemble separation
- 9 community-curated ensemble presets
- Stem label fixes (target_instrument swap, contextual "other" mapping)
- New CLI flags: --extra_models, --ensemble_preset, --list_presets
- Multi-stem integration test framework

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three tests verifying ensemble presets produce semantically correct output:

1. test_vocal_ensemble_matches_best_single_model: vocal_balanced ensemble output should correlate >0.90 with the best single model (Resurrection), confirming ensemble doesn't degrade quality.
2. test_karaoke_ensemble_extracts_lead_only: On Under Pressure (prominent backing harmonies), karaoke ensemble vocals should differ from standard vocal extraction (<0.90 correlation), confirming it extracts only lead.
3. test_karaoke_on_vocals_produces_lead_backing_split: Pipeline test: mix → vocal model → karaoke model should produce distinct lead and backing vocal stems (both non-silent, correlation <0.50).

Includes 9 new reference stems for these tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Closed
…54-fix-pr265-tests
…on tests

find_stem() matched the first _(StemName) group in filenames, which broke pipeline tests where the input filename already contained a parenthesized stem from a prior step. Now uses the last match. Also handle near-silent stems (e.g. vocals from instrumental-only audio) returning nan correlation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
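The "use the last match" fix can be illustrated with a small regex sketch. The helper name `find_last_stem`, the pattern, and the example filenames are assumptions made for illustration; the actual find_stem() in the test suite may parse filenames differently.

```python
import re

# Matches a parenthesized stem name preceded by an underscore, e.g. "_(Vocals)".
STEM_PATTERN = re.compile(r"_\(([^)]+)\)")


def find_last_stem(filename):
    """Return the last _(StemName) group in a filename, or None if there is none.

    In a pipeline, the input to the second model already carries the stem name
    from the first step, so only the final group describes the current output.
    """
    matches = STEM_PATTERN.findall(filename)
    return matches[-1] if matches else None
```

For a hypothetical pipeline output such as `mix_(Vocals)_model_a_(No Reverb)_model_b.flac`, taking the first group would report "Vocals", while the last group gives "No Reverb", which is the stem the second model actually produced.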
…54-fix-pr265-tests
Would love to get y'alls opinions/input on the ensemble presets which are now live in the latest |
Summary
Adds multi-model ensemble separation to audio-separator, allowing users to combine the outputs of multiple separation models for better quality results. Includes a preset system with 9 community-curated configurations, stem labeling fixes, and a comprehensive multi-stem test framework.
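To make "combining the outputs of multiple separation models" concrete, here is a minimal sketch of two waveform-domain combinations: a weighted sample-wise mean and a sample-wise median. It is not the Ensembler implementation from this PR. The function names are hypothetical, plain Python lists stand in for audio arrays, and the assumption that weights are normalized by their sum may not match what the library does.

```python
from statistics import median


def ensemble_avg_wave(waveforms, weights=None):
    """Weighted sample-wise mean of equal-length mono waveforms.

    Assumes every waveform has the same number of samples.
    """
    if weights is None:
        weights = [1.0] * len(waveforms)
    if len(weights) != len(waveforms):
        raise ValueError("need exactly one weight per waveform")
    total = sum(weights)
    # zip(*waveforms) yields, for each sample index, one value per model.
    return [
        sum(w * s for w, s in zip(weights, samples)) / total
        for samples in zip(*waveforms)
    ]


def ensemble_median_wave(waveforms):
    """Sample-wise median across models, which damps a single outlier model."""
    return [median(samples) for samples in zip(*waveforms)]
```

A weighted mean lets a stronger model dominate the blend, whereas a median ignores one model's artifacts as long as the others agree.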
Core ensemble support (by @makhlwf — thank you!)
- Ensembler class with 11 algorithms: avg_wave, median_wave, min_wave, max_wave, avg_fft, median_fft, min_fft, max_fft, uvr_max_spec, uvr_min_spec, ensemble_wav
- load_model() and _separate_ensemble() pipeline with temp directory management
- --extra_models for specifying additional models
- --ensemble_weights
Ensemble presets
ensemble_presets.json with 9 presets sourced from deton24's community guide:
Bug fixes
Output naming
Test framework
Unit tests (233 total, 37 new):
Ensemble preset integration tests (9 parametrized):
Multi-stem integration tests (5 test clips, 39 reference stems):
Meaningful ensemble tests:
On-demand regression test (163 models):
Documentation
Version
Bumped to 0.42.0
Credits
The core ensemble functionality was originally implemented by @makhlwf in #261. This PR builds on that work with bug fixes, the preset system, stem labeling corrections, and comprehensive test coverage. Thank you @makhlwf for the foundational contribution!
Test plan
@coderabbitai ignore
🤖 Generated with Claude Code