<a href="https://colab.research.google.com/github/sqdnoises/vocals-extractor/blob/colab/Vocals%20Extractor.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Vocals Extractor

### Steps: <font size=2>(you have to do these every time you reopen this page)</font>
1. **Installation**
  <table>
    <tr>
      <td>1. Click the Run button (</td>
      <td><img src="https://github.com/sqdnoises/vocals-extractor/blob/colab/assets/run.png?raw=true" width="24">↖️ not this one</td>
      <td>)  on the top left side of the <code>Connect to Google Drive & Install</code> box below.</td>
    </tr>
  </table>
  <table>
    <tr>
      <td>2. Give Google Colab access to Google Drive in the popup.</td>
    </tr>
  </table>
  <table>
    <tr>
      <td>3. Wait for the installation to complete, then move on to the box below.</td>
    </tr>
  </table>
2. **Extracting vocals**
  <table>
    <tr>
      <td>1. Make two folders: <code>input</code> and <code>output</code> in your Google Drive.</td>
    </tr>
  </table>
  <table>
    <tr>
      <td>2. Place the audio files you want to extract vocals from in the <code>input</code> folder in Google Drive.</td>
    </tr>
  </table>
  <table>
    <tr>
      <td>3. If you wish to, you can customize the extracted audio settings in the <code>Extract Vocals / Seperation</code> box below.</td>
    </tr>
  </table>
  <table>
    <tr>
      <td>4. Click the Run button (</td>
      <td><img src="https://github.com/sqdnoises/vocals-extractor/blob/colab/assets/run.png?raw=true" width="24">↖️ again, not this one</td>
      <td>) on the top left side of the <code>Extract Vocals / Seperation</code> box below.</td>
    </tr>
  </table>
  <table>
    <tr>
      <td>5. Once it completes, you will find the extracted vocals in the <code>output</code> folder in Google Drive.</td>
    </tr>
  </table>

You may want to use [Google Drive for Desktop](https://support.google.com/drive/answer/10838124) for a seamless uploading and downloading experience.

<font size=1>\*this colab uses ZFTurbo's [Music-Source-Separation-Training](https://github.com/ZFTurbo/Music-Source-Separation-Training/)</font>\
<font size=1>\*[colab inference](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_(Colab_Inference).ipynb) made by [jarredou](https://github.com/jarredou) & deton ([Support jarredou on ko-fi](https://ko-fi.com/Q5Q811R5YI))</font>\
<font size=1>\*colab inference edited by [sqd](https://github.com/sqdnoises) to provide an easier interface for regular users</font>

In [1]:
# @markdown # Connect to Google Drive & Install
# @markdown ### Instructions:
# @markdown 1. Click the Run button on the top left side of this box.
# @markdown 2. Give Google Colab access to Google Drive in the popup.
# @markdown 3. Wait for the installation to complete, then move on to the box below.

# Connect to Google Drive
import os
from google.colab import drive
if not os.path.exists("/content/drive"):
    print("Connecting to Google Drive...")
    drive.mount("/content/drive")
    print("Google Drive connected!")
elif len(os.listdir("/content/drive")) == 0:  # mount point exists but is empty
    print("Attempting to forcibly reconnect Google Drive...")
    drive.mount("/content/drive", force_remount=True)
    print("Google Drive connected!")
else:
    print("Drive already mounted at /content/drive.")

# Clone the helper code
%cd /content
!rm -rf "/content/vocals-extractor"
!git clone -b colab https://github.com/sqdnoises/vocals-extractor &> /dev/null

# Start installation
%cd vocals-extractor
!chmod +x install.sh
!./install.sh

Drive already mounted at /content/drive.
/content
/content/vocals-extractor
✏️ Starting installation...
🔍 Detected Python version: 3.10.12
✅ Python 3.10 is available.
✅ Git is available.
⏳ Cloning git repository https://github.com/jarredou/Music-Source-Separation-Training [branch: colab-inference]
Cloning into 'Music-Source-Separation-Training'...
remote: Enumerating objects: 1558, done.
remote: Counting objects: 100% (829/829), done.
remote: Compressing objects: 100% (279/279), done.
remote: Total 1558 (delta 717), reused 550 (delta 550), pack-reused 729 (from 2)
Receiving objects: 100% (1558/1558), 900.03 KiB | 20.46 MiB/s, done.
Resolving deltas: 100% (991/991), done.
✅ Cloned repository.
⏳ Installing dependencies... This will take a few minutes.
✅ Installed dependencies.
✅ Created directories: output
✅ Installation complete.


In [1]:
# @markdown # Extract Vocals / Seperation
# @markdown ### Instructions:
# @markdown 1. Make two folders: `input` and `output` in your Google Drive.
# @markdown 2. Place the audio files you want to extract vocals from in the `input` folder in Google Drive.
# @markdown 3. You can customize the extracted audio settings below these instructions.
# @markdown 4. Click the Run button on the top left side of this box.
# @markdown 5. Once it completes, you will find the extracted vocals in the `output` folder in Google Drive.

%cd "/content/vocals-extractor/Music-Source-Separation-Training"

import os
import sys
import json
import yaml
import torch
import traceback
from urllib.parse import quote

with open("/content/vocals-extractor/models.json") as f:
    configs = json.load(f)

# Configuration parameters
# @markdown #### Configuration:
# @markdown Configuring these is completely optional. The default settings should be good enough.

# @markdown ---
# @markdown ####   Input & Output options:
drive_input_folder = "input" # @param {type: "string"}
drive_output_folder = "output" # @param {type: "string"}
# @markdown &nbsp;&nbsp;&nbsp; Paths to the input and output folders in Google Drive.\
# @markdown &nbsp;&nbsp;&nbsp; <font size=2>- They are case-sensitive, i.e. `input` is different from `Input`.</font>

# @markdown ---
# @markdown ####   Choose Model:
model = "VOCALS-MelBand-Roformer (by KimberleyJSN)" # @param ['VOCALS-MelBand-Roformer (by KimberleyJSN)', 'VOCALS-MelBand-Roformer (by Becruily)' , 'INST-MelBand-Roformer (by Becruily)', 'VOCALS-MelBand-Roformer Kim FT (by Unwa)', 'VOCALS-Melband-Roformer BigBeta5e (by unwa)', 'VOCALS-Mel-Roformer big beta 4 (by unwa)', 'INST-Mel-Roformer v1e (by unwa)', 'INST-VOC-Mel-Roformer a.k.a. duality (by unwa)', 'INST-VOC-Mel-Roformer a.k.a. duality v2 (by unwa)', 'INST-Mel-Roformer v1 (by unwa)', 'INST-Mel-Roformer v2 (by unwa)', 'VOCALS-BS-RoformerLargev1 (by unwa)', 'VOCALS-InstVocHQ', 'VOCALS-BS-Roformer_1297 (by viperx)', 'VOCALS-BS-Roformer_1296 (by viperx)', 'KARAOKE-MelBand-Roformer (by aufr33 & viperx)', 'OTHER-BS-Roformer_1053 (by viperx)', '4STEMS-SCNet_XL_MUSDB18 (by ZFTurbo)', '4STEMS-SCNet_Large (by starrytong)', '4STEMS-BS-Roformer_MUSDB18 (by ZFTurbo)', '4STEMS-SCNet_MUSDB18 (by starrytong)', 'CROWD-REMOVAL-MelBand-Roformer (by aufr33)', 'VOCALS-VitLarge23 (by ZFTurbo)', 'CINEMATIC-BandIt_Plus (by kwatcharasupat)', 'DRUMSEP-MDX23C_DrumSep_6stem (by aufr33 & jarredou)', 'DE-REVERB-MDX23C (by aufr33 & jarredou)', 'DE-REVERB-MelBand-Roformer aggr./v2/19.1729 (by anvuew)', 'DE-REVERB-Echo-MelBand-Roformer (by Sucial)', 'DENOISE-MelBand-Roformer-1 (by aufr33)', 'DENOISE-MelBand-Roformer-2 (by aufr33)', 'DEBLEED-MelBand-Roformer (by unwa/97chris)']
# @markdown &nbsp;&nbsp;&nbsp; The model to use for vocal separation.

# @markdown ---
# @markdown ####   Extraction options:
extract_instrumental = True # @param {type:"boolean"}
# @markdown &nbsp;&nbsp;&nbsp; Whether to extract the instrumental version as well.
export_format = "WAV (High-Quality, Uncompressed)" # @param ["WAV (High-Quality, Uncompressed)", "FLAC (Standard Quality, Compressed)", "FLAC (High-Fidelity, Compressed)"]
also_convert_files_to_mp3 = True # @param {type:"boolean"}
# @markdown &nbsp;&nbsp;&nbsp; MP3 copies are much smaller than the chosen export format and lose only a little audio quality. Most people will not be able to tell the difference, so it's a good idea to check this box.
use_tta = False # @param {type:"boolean"}
# @markdown &nbsp;&nbsp;&nbsp; Whether to make the vocal separation process more detailed and accurate.\
# @markdown &nbsp;&nbsp;&nbsp; <font size=2>- May take a bit longer (usually 3x longer than normal), but gives a better output.</font>


# Roformer's custom configuration
overlap = 4          # min: 2, max: 40         Default: 2
chunk_size = 485100  # one of: 352800, 485100  Default: 485100
# overlap: Higher means longer separation time. 4 is an already balanced value,
#          2 is fast and some people still won't notice any difference.
#          Normally there's no point going over 8.


# Resolve folder paths and process export format
drive_input_folder = os.path.join("/content/drive/MyDrive", drive_input_folder)
drive_output_folder = os.path.join("/content/drive/MyDrive", drive_output_folder)
input_folder = drive_input_folder # "/content/vocals-extractor/input"
output_folder = "/content/vocals-extractor/output"
ckpts_folder = "/content/vocals-extractor/Music-Source-Separation-Training/ckpts"

if export_format == "WAV (High-Quality, Uncompressed)":
    export_format = "wav FLOAT"
elif export_format == "FLAC (Standard Quality, Compressed)":
    export_format = "flac PCM_16"
elif export_format == "FLAC (High-Fidelity, Compressed)":
    export_format = "flac PCM_24"
flac_file = export_format.startswith("flac")
pcm_type = export_format.split(" ")[1] if flac_file else None


class IndentDumper(yaml.Dumper):
    def increase_indent(self, flow: bool = False, indentless: bool = False):
        return super(IndentDumper, self).increase_indent(flow, False)

def tuple_constructor(loader, node) -> tuple:
    values = loader.construct_sequence(node)
    return tuple(values)

# Register the constructor with PyYAML
yaml.SafeLoader.add_constructor("tag:yaml.org,2002:python/tuple", tuple_constructor)

def conf_edit(config_path: str, chunk_size: int, overlap: int) -> None:
    with open(config_path, "r") as f:
        data: dict = yaml.load(f, Loader=yaml.SafeLoader)

    # Handle configs where "use_amp" is missing from the "training" section.
    # (The key lives under "training", so that is where it must be checked.)
    data.setdefault("training", {}).setdefault("use_amp", True)

    data["audio"]["chunk_size"] = chunk_size
    data["inference"]["num_overlap"] = overlap

    if data["inference"]["batch_size"] == 1:
        data["inference"]["batch_size"] = 2

    print("Using custom overlap and chunk_size values:")
    print(f"overlap = {data['inference']['num_overlap']}")
    print(f"chunk_size = {data['audio']['chunk_size']}")
    print(f"batch_size = {data['inference']['batch_size']}")

    with open(config_path, "w") as f:
        yaml.dump(data, f, default_flow_style=False, sort_keys=False, Dumper=IndentDumper, allow_unicode=True)

def download_file(url: str, path: str = ".") -> str:
    # Encode the URL to handle spaces and special characters
    encoded_url = quote(url, safe=":/")

    os.makedirs(path, exist_ok=True)
    filename = os.path.basename(encoded_url)
    file_path = os.path.abspath(os.path.join(path, filename))

    if os.path.exists(file_path):
        print(f"File '{filename}' already exists at '{os.path.abspath(path)}'.")
        return file_path

    try:
        torch.hub.download_url_to_file(encoded_url, file_path)
        print(f"File '{filename}' downloaded successfully")
    except Exception as e:
        print(f"Error downloading file '{filename}' from '{url}': {e}")

    return file_path

def get_model_config(model_name: str, path: str | None = None):
    """
    Returns model configuration based on the model name.
    Returns: (model_type, config_path, start_check_point, config_urls)
    """

    if model_name not in configs:
        raise ValueError(f"Model {model_name} not found in configurations")

    if path:
        configs[model_name]["config_path"]       = os.path.join(path, configs[model_name]["config_path"])
        configs[model_name]["start_check_point"] = os.path.join(path, configs[model_name]["start_check_point"])

    return configs[model_name]

try:
    # Get model configuration
    model_config = get_model_config(model)
    model_type = model_config["model_type"]

    # Download necessary files
    config_path = download_file(model_config["urls"]["config"], ckpts_folder)
    ckpt_path = download_file(model_config["urls"]["ckpt"], ckpts_folder)

    # Edit configuration if needed
    if model_config["needs_config_edit"]:
        conf_edit(config_path, chunk_size, overlap)

    if model == "INST-Mel-Roformer v1e (by unwa)" and extract_instrumental == False:
        print(f"NOTE: The model you selected '{model}' with `extract_instrumental` set to `False`, would still make an instrumental output. If you want the vocals, please set `extract_instrumental` to `True`.")

    !rm -rf /content/vocals-extractor/output
    !mkdir /content/vocals-extractor/output

    # Execute inference command
    print(f"Google Drive input folder: {drive_input_folder}")
    print(f"Google Drive output folder: {drive_output_folder}")
    print(f"Model type: {model_type}")
    print(f"Config path: {config_path}")
    print(f"Start checkpoint: {ckpt_path}")
    print(f"Input folder: {input_folder}")
    print(f"Output folder: {output_folder}")
    print(f"Extract instrumental: {extract_instrumental}")
    print(f"Flac file: {flac_file}")
    print(f"Use TTA: {use_tta}")
    print(f"PCM type: {pcm_type}")

    print("Executing inference command...")
    !python inference.py \
        --model_type {model_type} \
        --config_path "{config_path}" \
        --start_check_point "{ckpt_path}" \
        --input_folder "{input_folder}" \
        --store_dir "{output_folder}" \
        {("--extract_instrumental" if extract_instrumental else "")} \
        {("--flac_file" if flac_file else "")} \
        {("--use_tta" if use_tta else "")} \
        {("--pcm_type " + pcm_type if pcm_type else "")}

except Exception as e: # Execution failed
    print("Error during execution:")
    traceback.print_exception(e)

else: # Post-processing (Successful execution)
    if model in [
        "INST-Mel-Roformer v1 (by unwa)",
        "INST-Mel-Roformer v1e (by unwa)",
        "INST-Mel-Roformer v2 (by unwa)",
    ] and not (model == "INST-Mel-Roformer v1e (by unwa)" and extract_instrumental == False):
        print("Correcting file names for INST-Mel-Roformer v1/1e/2 models...")
        # Swap the "_other" and "_vocals" suffixes. Every file is first moved to
        # a temporary name so that neither file of a pair overwrites the other,
        # and os.rename is used so that file names with spaces are handled.
        swaps = {
            "_other.wav": "_vocals.wav", "_vocals.wav": "_other.wav",
            "_other.flac": "_vocals.flac", "_vocals.flac": "_other.flac",
        }
        staged = []
        for filename in os.listdir(output_folder):
            for old_suffix, new_suffix in swaps.items():
                if filename.endswith(old_suffix):
                    new_filename = filename[:-len(old_suffix)] + new_suffix
                    tmp_path = os.path.join(output_folder, filename + ".swap")
                    os.rename(os.path.join(output_folder, filename), tmp_path)
                    staged.append((tmp_path, filename, new_filename))
                    break

        for tmp_path, filename, new_filename in staged:
            os.rename(tmp_path, os.path.join(output_folder, new_filename))
            print(f"Renamed: {filename} -> {new_filename}")

    if also_convert_files_to_mp3:
        print("Converting files to MP3...")
        for filename in os.listdir(output_folder):
            if filename.lower().endswith((".wav", ".flac")):
                base_name = os.path.splitext(filename)[0]
                input_file = os.path.join(output_folder, filename)
                output_file = os.path.join(output_folder, f"{base_name}.mp3")
                try:
                    !ffmpeg -i "{input_file}" "{output_file}" &> /dev/null
                    print(f"Converted: {filename} -> {base_name}.mp3")
                except Exception as e:
                    print(f"Error converting {filename}: {e}")

    print("Copying files to Google Drive...")
    !cp -r "{output_folder}"/* "{drive_output_folder}"
    print("Copied files to Google Drive.")

print("Inference and post-processing complete.")

/content/vocals-extractor/Music-Source-Separation-Training
File 'config_vocals_mel_band_roformer_kj.yaml' already exists at '/content/vocals-extractor/Music-Source-Separation-Training/ckpts'.
File 'MelBandRoformer.ckpt' already exists at '/content/vocals-extractor/Music-Source-Separation-Training/ckpts'.
Using custom overlap and chunk_size values:
overlap = 4
chunk_size = 485100
batch_size = 4
Google Drive input folder: /content/drive/MyDrive/input
Google Drive output folder: /content/drive/MyDrive/output
Model type: mel_band_roformer
Config path: /content/vocals-extractor/Music-Source-Separation-Training/ckpts/config_vocals_mel_band_roformer_kj.yaml
Start checkpoint: /content/vocals-extractor/Music-Source-Separation-Training/ckpts/MelBandRoformer.ckpt
Input folder: /content/drive/MyDrive/input
Output folder: /content/vocals-extractor/output
Extract instrumental: True
Flac file: False
Use TTA: False
PCM type: None
Executing inference command...
CUDA is available, use --force_cpu to dis
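
The `overlap` and `chunk_size` settings in the box above are easier to reason about in seconds. The sketch below is only an illustration and rests on two assumptions that this notebook does not confirm: that the audio is processed at 44.1 kHz, and that consecutive chunks start `chunk_size // overlap` samples apart. `chunk_plan` is a hypothetical helper, not part of the separation code.

```python
def chunk_plan(chunk_size: int, overlap: int, sample_rate: int = 44100) -> dict:
    """Return the chunk length and the gap between chunk starts, in seconds.

    Assumptions (not confirmed by the notebook): 44.1 kHz audio, and a new
    chunk starting every chunk_size // overlap samples.
    """
    if overlap < 1:
        raise ValueError("overlap must be at least 1")
    step = chunk_size // overlap
    return {
        "chunk_seconds": chunk_size / sample_rate,
        "step_seconds": step / sample_rate,
    }


# Compare the two chunk sizes offered in the box above at a few overlap values.
for size in (352800, 485100):
    for overlap in (2, 4, 8):
        plan = chunk_plan(size, overlap)
        print(
            f"chunk_size={size} overlap={overlap}: "
            f"{plan['chunk_seconds']:.2f}s chunks, "
            f"new chunk every {plan['step_seconds']:.2f}s"
        )
```

Under these assumptions, a higher `overlap` means each part of the song is covered by more chunks, which is consistent with the longer separation time mentioned in the configuration comments.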


> <font size=4>If your separation can't start and `Total files found: 0` is shown when running the above box,</font>

Rerun the `Connect to Google Drive & Install` box; this usually fixes the issue. If it doesn't, refresh the page and retry.
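
If the problem persists, it may help to check what the notebook can actually see in the input folder. The following is a minimal sketch, not part of the extractor itself: `find_audio_files` is a hypothetical helper, the extension list is an assumption about which files count as audio, and the path assumes the default `input` folder name.

```python
import os

# Assumed list of audio extensions; the separation script may accept others.
AUDIO_EXTENSIONS = (".wav", ".flac", ".mp3", ".m4a", ".ogg")


def find_audio_files(folder: str) -> list[str]:
    """Return the sorted audio file names in `folder`, or [] if it is missing."""
    if not os.path.isdir(folder):
        print(f"Folder not found: {folder}")
        return []
    return sorted(
        name for name in os.listdir(folder)
        if name.lower().endswith(AUDIO_EXTENSIONS)
    )


# Default location used by the box above; change it if you renamed the folder.
files = find_audio_files("/content/drive/MyDrive/input")
print(f"Found {len(files)} audio file(s): {files}")
```

A "Folder not found" message suggests that Google Drive is not mounted or that the folder name does not match (remember that it is case-sensitive), while an empty list suggests the files have not finished syncing to Drive yet.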