# Demucs - Audio Source Separation

This notebook shows how to separate music into stems (drums, bass, other, vocals) with Demucs, using the `demucs-infer` package.

**Features:**
- Separate music into 4 stems (drums, bass, other, vocals)
- Optional 6-stem separation with the `htdemucs_6s` model (adds guitar and piano)
- High-quality hybrid transformer architecture
- Multiple pre-trained models available

## 1. Installation

In [None]:
# Install demucs-infer package and dependencies
!pip install -q demucs-infer soundfile

In [None]:
# Verify installation
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA device: {torch.cuda.get_device_name(0)}")

## 2. Download Model

We'll use the **htdemucs_ft** model (fine-tuned Hybrid Transformer Demucs). Of the models listed under "Available Models", it is the recommended choice when separation quality matters more than processing time.

In [None]:
# Create directories
!mkdir -p input_songs
!mkdir -p outputs

In [None]:
# Download and cache the model (models are automatically downloaded on first use)
from demucs_infer.pretrained import get_model

model_name = "htdemucs_ft"  # Best quality model
print(f"Downloading model: {model_name}")
print("(This may take a few minutes on first run)\n")

model = get_model(model_name)

print("\nModel loaded successfully!")
print(f"Sources: {model.sources}")
print(f"Sample rate: {model.samplerate} Hz")

## 3. Upload Your Audio File

Upload a `.wav` file to separate into stems.

In [None]:
# Option 1: Upload from your computer
from google.colab import files

print("Upload your audio file (.wav format):")
uploaded = files.upload()

# Move uploaded file to input folder
for filename in uploaded.keys():
    !mv "{filename}" input_songs/
    print(f"Moved {filename} to input_songs/")

In [None]:
# Option 2: Download a sample file from a URL (uncomment and replace the placeholder URL)
# !wget -q -O input_songs/sample.wav "YOUR_AUDIO_URL_HERE"

In [None]:
# Check input files
!ls -lh input_songs/
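
Before running separation, it can help to confirm that each uploaded file is a readable WAV and to see its channel count, sample rate, and duration. The sketch below uses only Python's standard `wave` module; `describe_wav` is a hypothetical helper name, not part of `demucs-infer`. The `wave` module reads PCM files and may raise `wave.Error` for other encodings such as 32-bit float, so treat a failure here as a hint rather than proof that the file is unusable.

```python
import wave
from pathlib import Path


def describe_wav(path):
    """Return (channels, sample_rate_hz, duration_seconds) from a PCM WAV header."""
    # wave.open raises wave.Error for encodings the stdlib cannot parse.
    with wave.open(str(path), "rb") as wav_file:
        channels = wav_file.getnchannels()
        sample_rate = wav_file.getframerate()
        duration = wav_file.getnframes() / sample_rate
    return channels, sample_rate, duration


if __name__ == "__main__":
    # Report every .wav file in the input folder used by this notebook.
    for wav_path in sorted(Path("input_songs").glob("*.wav")):
        try:
            channels, sample_rate, duration = describe_wav(wav_path)
            print(f"{wav_path.name}: {channels} ch, {sample_rate} Hz, {duration:.1f} s")
        except wave.Error as err:
            print(f"{wav_path.name}: could not read header ({err})")
```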

## 4. Run Source Separation

Select your preferred method and run the separation:

In [None]:
#@title Select Inference Method { display-mode: "form" }
#@markdown Choose which method to use for audio separation:

inference_method = "CLI (Command Line)" #@param ["CLI (Command Line)", "Python API"]

print(f"Selected method: {inference_method}")

In [None]:
#@title Run Source Separation { display-mode: "form" }
#@markdown Click the play button to run separation with your selected method.

if inference_method == "CLI (Command Line)":
    # ============================================
    # Option A: CLI (Command Line)
    # ============================================
    print("Running with CLI...\n")
    !demucs-infer \
        -n htdemucs_ft \
        -o ./outputs \
        ./input_songs/*.wav

else:
    # ============================================
    # Option B: Python API
    # ============================================
    print("Running with Python API...\n")

    import torch
    import soundfile as sf
    from pathlib import Path
    from demucs_infer.pretrained import get_model
    from demucs_infer.apply import apply_model
    from demucs_infer.audio import convert_audio

    # Load model
    print("Loading model...")
    model = get_model("htdemucs_ft")
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device)
    model.eval()
    print(f"Model loaded on {device}\n")

    # Process files
    input_folder = Path("input_songs")
    output_folder = Path("outputs")

    for audio_path in input_folder.glob("*.wav"):
        print(f"Processing: {audio_path.name}")

        # Load audio using soundfile
        audio, sr = sf.read(str(audio_path))

        # Convert to tensor (soundfile returns [time, channels], we need [channels, time])
        if len(audio.shape) == 1:
            # Mono: convert to stereo
            audio = torch.tensor(audio, dtype=torch.float32).unsqueeze(0)
            audio = audio.expand(2, -1)
        else:
            # Stereo or multi-channel
            audio = torch.tensor(audio.T, dtype=torch.float32)

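        # Convert to the model's sample rate and channel count if needed
        # (also covers files with more than two channels at the right sample rate)
        if sr != model.samplerate or audio.shape[0] != model.audio_channels:
            audio = convert_audio(audio, sr, model.samplerate, model.audio_channels)
            sr = model.samplerate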

        # Add batch dimension and move to device
        wav = audio.unsqueeze(0).to(device)

        # Run separation
        with torch.no_grad():
            sources = apply_model(model, wav, device=device)

        # Save each stem using soundfile
        stem_name = audio_path.stem
        stem_output_dir = output_folder / "htdemucs_ft" / stem_name
        stem_output_dir.mkdir(parents=True, exist_ok=True)

        for i, source_name in enumerate(model.sources):
            source = sources[0, i].cpu().numpy().T  # [channels, time] -> [time, channels]
            output_path = stem_output_dir / f"{source_name}.wav"
            sf.write(str(output_path), source, sr, subtype="FLOAT")
            print(f"  Saved: {output_path}")

    print("\nDone!")
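
The layout handling in the Python API branch (soundfile's `[time, channels]` arrays versus the `[channels, time]` layout passed to the model) can be isolated in a small NumPy helper. This is a minimal sketch, not the notebook's own code path: `to_stereo_channels_first` is a hypothetical name, and keeping only the first two channels of multi-channel input is an assumption made here for illustration.

```python
import numpy as np


def to_stereo_channels_first(audio):
    """Convert soundfile-style audio to a float32 array shaped [2, time].

    Accepts mono input shaped [time] or multi-channel input shaped [time, channels].
    """
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 1:
        # Mono: duplicate the single channel into left and right.
        return np.stack([audio, audio])
    channels_first = audio.T  # [time, channels] -> [channels, time]
    if channels_first.shape[0] == 1:
        # Single-channel 2-D input: duplicate it as well.
        return np.repeat(channels_first, 2, axis=0)
    # Assumption: for more than two channels, keep only the first two.
    return np.ascontiguousarray(channels_first[:2])
```

The result can be wrapped with `torch.from_numpy(...)` and given a batch dimension with `unsqueeze(0)`, as in the cell above.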

## 5. Check Output Files

In [None]:
# Check output files
!find outputs -name "*.wav" -type f
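
As a Python alternative to `find`, the sketch below groups the output files by track. It assumes the `outputs/<model>/<track>/<stem>.wav` layout that the Python API branch writes; whether the CLI uses exactly the same layout is an assumption. `collect_stems` is a hypothetical helper name.

```python
from collections import defaultdict
from pathlib import Path


def collect_stems(output_dir):
    """Map "<model>/<track>" to the sorted stem names found under output_dir."""
    stems_by_track = defaultdict(list)
    root = Path(output_dir)
    for wav_path in sorted(root.rglob("*.wav")):
        relative = wav_path.relative_to(root)
        # Assumed layout: <model>/<track>/<stem>.wav; skip files at other depths.
        if len(relative.parts) != 3:
            continue
        model_name, track_name, _ = relative.parts
        stems_by_track[f"{model_name}/{track_name}"].append(wav_path.stem)
    return dict(stems_by_track)


if __name__ == "__main__":
    for track, stems in collect_stems("outputs").items():
        print(f"{track}: {', '.join(stems)}")
```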

## 6. Listen to Results

In [None]:
import IPython.display as ipd
from pathlib import Path

output_dir = Path("outputs")

# Find and display all output files
for audio_file in sorted(output_dir.rglob("*.wav")):
    print(f"\n{'='*50}")
    print(f"File: {audio_file.relative_to(output_dir)}")
    print(f"{'='*50}")
    display(ipd.Audio(str(audio_file)))

## 7. Download Results

In [None]:
# Download all output files as a zip
!zip -r outputs.zip outputs/

from google.colab import files
files.download("outputs.zip")
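
Outside Colab, or where the `zip` command is unavailable, the same archive can be built with the standard library. This is a sketch under that assumption; `zip_outputs` is a hypothetical helper name, and it is meant to mirror `zip -r outputs.zip outputs/` by keeping the top-level folder inside the archive.

```python
import shutil
from pathlib import Path


def zip_outputs(output_dir="outputs", archive_stem="outputs"):
    """Create <archive_stem>.zip containing output_dir and return the archive path."""
    output_path = Path(output_dir).resolve()
    # root_dir/base_dir keep the folder name as the top-level entry in the archive.
    return shutil.make_archive(
        archive_stem,
        "zip",
        root_dir=str(output_path.parent),
        base_dir=output_path.name,
    )
```

In Colab, the returned path can be passed to `files.download(...)` as in the cell above.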

---

## Available Models

| Model | Sources | Description | Best For |
|-------|---------|-------------|----------|
| htdemucs_ft | 4 | Fine-tuned Hybrid Transformer | Best quality (recommended) |
| htdemucs | 4 | Hybrid Transformer Demucs | High quality |
| htdemucs_6s | 6 | 6-source separation | Guitar/piano extraction |
| mdx | 4 | Hybrid Demucs trained for the MDX challenge | Earlier-generation baseline |
| mdx_extra | 4 | mdx trained with extra data | Better quality than mdx |
| mdx_q | 4 | Quantized mdx | Smaller download; quality can be slightly lower |
| mdx_extra_q | 4 | Quantized mdx_extra | Smaller download; quality can be slightly lower |

See the [GitHub repository](https://github.com/openmirlab/demucs-infer) for more options.
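
The stem counts in the table above can be turned into a small lookup for checking that a separation run produced every expected file. This is an illustrative sketch: `MODEL_STEM_COUNT`, `expected_stems`, and `missing_stems` are hypothetical names, and the stem names and order are taken from the feature list at the top of this notebook. The loaded model's `model.sources` remains the authoritative list.

```python
FOUR_STEMS = ("drums", "bass", "other", "vocals")
SIX_STEMS = FOUR_STEMS + ("guitar", "piano")

# Stem counts copied from the "Available Models" table.
MODEL_STEM_COUNT = {
    "htdemucs_ft": 4,
    "htdemucs": 4,
    "htdemucs_6s": 6,
    "mdx": 4,
    "mdx_extra": 4,
    "mdx_q": 4,
    "mdx_extra_q": 4,
}


def expected_stems(model_name):
    """Return the stem names expected for a model listed in the table."""
    return SIX_STEMS if MODEL_STEM_COUNT[model_name] == 6 else FOUR_STEMS


def missing_stems(model_name, found_stems):
    """Return the expected stems that are absent from found_stems, in table order."""
    found = set(found_stems)
    return [stem for stem in expected_stems(model_name) if stem not in found]
```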