# MelBand RoFormer - Audio Source Separation

This notebook demonstrates how to use MelBand RoFormer for separating vocals from instrumentals.

**Features:**
- Separate vocals from music
- High-quality source separation using a transformer architecture
- Multiple pre-trained models available

## 1. Installation

In [None]:
# Install melband-roformer-infer package
!pip install -q git+https://github.com/openmirlab/melband-roformer-infer.git

In [None]:
# Verify installation
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA device: {torch.cuda.get_device_name(0)}")
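
The install cell runs `pip` with `-q`, so a failure is easy to miss. The sketch below uses only the standard library to check the two entry points this notebook relies on: the `mel_band_roformer` module that the Python API option imports, and the `melband-roformer-infer` command that the CLI option calls. The helper name `check_install` is our own and is not part of the package.

```python
import importlib.util
import shutil


def check_install(module_name: str, cli_name: str) -> dict:
    """Report whether a Python module is importable and a CLI is on PATH."""
    return {
        # find_spec returns None when the top-level module cannot be found
        "module_found": importlib.util.find_spec(module_name) is not None,
        # shutil.which returns the executable's full path, or None if absent
        "cli_path": shutil.which(cli_name),
    }


status = check_install("mel_band_roformer", "melband-roformer-infer")
print(f"Python package importable: {status['module_found']}")
print(f"CLI found at: {status['cli_path']}")
```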

## 2. Download Model

We'll use the **MelBand Roformer Kim** model, a vocal-separation checkpoint published by KimberleyJSN on Hugging Face.

In [None]:
# Create directories
!mkdir -p models/melband-roformer-kim
!mkdir -p input_songs
!mkdir -p outputs

In [None]:
# Download model checkpoint from Hugging Face
!wget -q -O models/melband-roformer-kim/MelBandRoformer.ckpt \
    "https://huggingface.co/KimberleyJSN/melbandroformer/resolve/main/MelBandRoformer.ckpt"

# Copy the bundled config from the installed package
import shutil
from pathlib import Path
import mel_band_roformer

pkg_dir = Path(mel_band_roformer.__file__).parent
config_src = pkg_dir / "configs" / "config_vocals_mel_band_roformer.yaml"
config_dst = Path("models/melband-roformer-kim/config.yaml")
shutil.copy(config_src, config_dst)

print("Model and config ready!")
!ls -lh models/melband-roformer-kim/
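
When run with `-q -O`, `wget` can leave an empty output file behind if the download fails, and the size column in the `ls` output is the only hint. The small standard-library check below flags missing or zero-byte files. It assumes the file names used in this section. `check_model_files` is a hypothetical helper, not part of the package.

```python
from pathlib import Path


def check_model_files(model_dir, filenames):
    """Return a list of problems found with the expected model files."""
    problems = []
    for name in filenames:
        path = Path(model_dir) / name
        if not path.is_file():
            problems.append(f"missing: {path}")
        elif path.stat().st_size == 0:
            # A zero-byte file usually means the download or copy failed
            problems.append(f"empty: {path}")
    return problems


issues = check_model_files(
    "models/melband-roformer-kim", ["MelBandRoformer.ckpt", "config.yaml"]
)
print("All model files look OK." if not issues else "\n".join(issues))
```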

## 3. Upload Your Audio File

Upload one or more `.wav` files to separate vocals from instrumentals. The Python API option in section 4 only processes files whose names end in `.wav`.

In [None]:
# Option 1: Upload from your computer
from google.colab import files

print("Upload your audio file (.wav format):")
uploaded = files.upload()

# Move uploaded file to input folder
for filename in uploaded.keys():
    !mv "{filename}" input_songs/
    print(f"Moved {filename} to input_songs/")
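
The upload cell moves every file into `input_songs/`, but the Python API loop only picks up files that match `*.wav`. On Linux, which Colab runs on, that match is case-sensitive, so a `Song.WAV` or an `.mp3` would be skipped silently. The sketch below lists those files. The helper name is hypothetical.

```python
from pathlib import Path


def files_skipped_by_glob(folder, pattern="*.wav"):
    """List files in `folder` that Path.glob(pattern) would not match."""
    folder = Path(folder)
    if not folder.is_dir():
        return []
    matched = set(folder.glob(pattern))
    return sorted(p.name for p in folder.iterdir() if p.is_file() and p not in matched)


skipped = files_skipped_by_glob("input_songs")
if skipped:
    print("Not picked up by the Python API's *.wav glob:", skipped)
```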

In [None]:
# Option 2: Use a sample audio (uncomment to use)
# !wget -q -O input_songs/sample.wav "YOUR_AUDIO_URL_HERE"

In [None]:
# Check input files
!ls -lh input_songs/
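
The Python API writes each output at the sample rate `sf.read` reports for the input and performs no resampling. It can therefore help to check the channel count and sample rate before separating, and to compare the rate with whatever `config.yaml` specifies. The following is a standard-library sketch. It only handles integer-PCM WAVs, because the `wave` module may reject 32-bit float files. `describe_wav` is a hypothetical helper.

```python
import wave
from pathlib import Path


def describe_wav(path):
    """Return channels, sample rate and duration of a PCM WAV file."""
    # The stdlib `wave` module only supports integer PCM; float WAVs may
    # raise wave.Error and need a library such as soundfile instead.
    with wave.open(str(path), "rb") as w:
        frames = w.getnframes()
        rate = w.getframerate()
        return {
            "channels": w.getnchannels(),
            "sample_rate": rate,
            "seconds": frames / rate,
        }


for p in sorted(Path("input_songs").glob("*.wav")):
    try:
        print(p.name, describe_wav(p))
    except wave.Error as e:
        print(p.name, "could not be read by `wave`:", e)
```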

## 4. Run Source Separation

Choose an inference method in the first cell, then run the second cell. Both options read from `input_songs/` and write to `outputs/`.

In [None]:
#@title Select Inference Method { display-mode: "form" }
#@markdown Choose which method to use for audio separation:

inference_method = "CLI (Command Line)" #@param ["CLI (Command Line)", "Python API"]

print(f"Selected method: {inference_method}")

In [None]:
#@title Run Source Separation { display-mode: "form" }
#@markdown Click the play button to run separation with your selected method.

if inference_method == "CLI (Command Line)":
    # ============================================
    # Option A: CLI (Command Line)
    # ============================================
    print("Running with CLI...\n")
    !melband-roformer-infer \
        --config_path models/melband-roformer-kim/config.yaml \
        --model_path models/melband-roformer-kim/MelBandRoformer.ckpt \
        --input_folder ./input_songs \
        --store_dir ./outputs

else:
    # ============================================
    # Option B: Python API
    # ============================================
    print("Running with Python API...\n")
    
    import torch
    import yaml
    import soundfile as sf
    import numpy as np
    from pathlib import Path
    from ml_collections import ConfigDict
    from mel_band_roformer.utils import get_model_from_config, demix_track

    # Load model
    print("Loading model...")
    with open("models/melband-roformer-kim/config.yaml") as f:
        config = ConfigDict(yaml.safe_load(f))

    model = get_model_from_config("mel_band_roformer", config)
    model.load_state_dict(
        torch.load("models/melband-roformer-kim/MelBandRoformer.ckpt", map_location="cpu")
    )

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device)
    model.eval()
    print(f"Model loaded on {device}\n")

    # Process files
    input_folder = Path("input_songs")
    output_folder = Path("outputs")
    output_folder.mkdir(exist_ok=True)

    for audio_path in input_folder.glob("*.wav"):
        print(f"Processing: {audio_path.name}")
        
        # Load audio
        mix, sr = sf.read(audio_path)
        original_mono = len(mix.shape) == 1
        if original_mono:
            mix = np.stack([mix, mix], axis=-1)
        
        # Convert to tensor
        mixture = torch.tensor(mix.T, dtype=torch.float32)
        
        # Run separation
        with torch.no_grad():
            result, _ = demix_track(config, model, mixture, device)
        
        # Save every stem the model returned (e.g. vocals)
        stem_name = audio_path.stem
        for instrument, audio in result.items():
            output = audio.T
            if original_mono:
                output = output[:, 0]
            output_path = output_folder / f"{stem_name}_{instrument}.wav"
            sf.write(output_path, output, sr, subtype="FLOAT")
            print(f"  Saved: {output_path}")
        
        # Save instrumental (original - vocals)
        vocals = result.get("vocals", list(result.values())[0]).T
        if original_mono:
            vocals = vocals[:, 0]
            mix = mix[:, 0]
        instrumental = mix - vocals
        instrumental_path = output_folder / f"{stem_name}_instrumental.wav"
        sf.write(instrumental_path, instrumental, sr, subtype="FLOAT")
        print(f"  Saved: {instrumental_path}")

    print("\nDone!")
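
The Python API builds the instrumental by subtracting the vocal stem from the original mix. That subtraction only works if both arrays have the same shape. The variant below trims both signals to their common length first. It assumes the separated stem could come back a few samples longer or shorter than the input. We haven't confirmed whether `demix_track` ever does this, so treat it as a defensive sketch. `residual_instrumental` is a hypothetical name.

```python
import numpy as np


def residual_instrumental(mix, vocals):
    """Return mix - vocals, trimmed to the shorter of the two signals.

    Both arrays are (samples,) or (samples, channels), the layout that
    soundfile.read returns.
    """
    mix = np.asarray(mix, dtype=np.float32)
    vocals = np.asarray(vocals, dtype=np.float32)
    # Channel layouts must agree; only the sample count may differ
    if mix.shape[1:] != vocals.shape[1:]:
        raise ValueError(f"channel mismatch: {mix.shape} vs {vocals.shape}")
    n = min(len(mix), len(vocals))
    return mix[:n] - vocals[:n]


mix = np.array([[0.5, 0.5], [0.2, -0.2], [0.1, 0.0]])  # 3 stereo samples
vocals = np.array([[0.25, 0.25], [0.2, 0.0]])          # 2 stereo samples
print(residual_instrumental(mix, vocals).shape)  # (2, 2)
```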

## 5. Check Output Files

In [None]:
# Check output files
!ls -lh outputs/
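
To confirm that every input song produced both stems, the sketch below looks for the `<song>_vocals.wav` and `<song>_instrumental.wav` files. This naming follows the Python API cell above. The CLI may name its outputs differently, which we haven't checked, so adjust `stems` or the pattern if you used the CLI. `missing_outputs` is a hypothetical helper.

```python
from pathlib import Path


def missing_outputs(input_dir, output_dir, stems=("vocals", "instrumental")):
    """Map each input song to the expected stem files that are absent."""
    out = Path(output_dir)
    missing = {}
    for song in sorted(Path(input_dir).glob("*.wav")):
        # Python API naming: <output_dir>/<song stem>_<stem name>.wav
        absent = [s for s in stems if not (out / f"{song.stem}_{s}.wav").exists()]
        if absent:
            missing[song.name] = absent
    return missing


print(missing_outputs("input_songs", "outputs") or "Every input has all expected stems.")
```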

## 6. Listen to Results

In [None]:
import IPython.display as ipd
from IPython.display import display
from pathlib import Path

output_dir = Path("outputs")

# Find and display all output files
for audio_file in sorted(output_dir.glob("*.wav")):
    print(f"\n{'='*50}")
    print(f"File: {audio_file.name}")
    print(f"{'='*50}")
    display(ipd.Audio(str(audio_file)))

## 7. Download Results

In [None]:
# Download all output files as a zip
!zip -r outputs.zip outputs/

from google.colab import files
files.download("outputs.zip")
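
The `zip` shell command works in Colab. If you run the notebook somewhere that binary isn't installed, the standard library can build an equivalent archive. The following is a minimal sketch with a hypothetical helper name.

```python
import shutil
from pathlib import Path


def zip_folder(folder, archive_base):
    """Zip `folder` into `<archive_base>.zip` and return the archive path."""
    folder = Path(folder)
    # root_dir is the folder's parent and base_dir is its name, so entries
    # are stored as "outputs/...", like `zip -r outputs.zip outputs/`
    return shutil.make_archive(
        archive_base, "zip", root_dir=str(folder.parent), base_dir=folder.name
    )
```

In Colab you would then call `files.download(zip_folder("outputs", "outputs"))`.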

---

## Available Models

| Model | Description | Best For |
|-------|-------------|----------|
| MelBand Roformer Kim | Original vocal separation | General vocal extraction |
| MelBand Roformer Big Beta 6 | Larger model by unwa | Higher quality |
| MelBand Roformer Karaoke | Karaoke-optimized | Background vocal removal |
| MelBand Roformer Denoise | Noise reduction | Cleaning audio |

See the [model registry](https://github.com/openmirlab/melband-roformer-infer) for more options.
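
To try another model from the table, you would point the same CLI at that model's checkpoint and a matching config. The sketch below assumes each model sits in its own folder under `models/`, as in section 2. The folder and checkpoint names for other models are placeholders, and you should check in the repository whether the package bundles a config for them. The command uses only the four flags shown in section 4, and `build_infer_command` is a hypothetical helper.

```python
import shlex
from pathlib import Path


def build_infer_command(model_dir, ckpt_name, config_name="config.yaml",
                        input_folder="./input_songs", store_dir="./outputs"):
    """Build a melband-roformer-infer command with the flags from section 4."""
    model_dir = Path(model_dir)
    return [
        "melband-roformer-infer",
        "--config_path", str(model_dir / config_name),
        "--model_path", str(model_dir / ckpt_name),
        "--input_folder", input_folder,
        "--store_dir", store_dir,
    ]


cmd = build_infer_command("models/melband-roformer-kim", "MelBandRoformer.ckpt")
print(shlex.join(cmd))
# To run it: import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list avoids shell quoting problems when a folder name contains spaces.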