# MVSep-MDX23 Colab Fork v2.3
Adaptation of the MVSep-MDX23 algorithm for Colab, with a few tweaks:

https://colab.research.google.com/github/jarredou/MVSEP-MDX23-Colab_v2/blob/v2.3/MVSep-MDX23-Colab.ipynb

Recent changes:
<font size=2>

**v2.3**
* HQ3-Instr model replaced by VitLarge23 (thanks to MVSep)
* Improved MDXv2 processing (thanks to Anjok)
* Improved BigShifts algo (v2)
* BigShifts processing added to MDXv3 & VitLarge
* Faster folder batch processing

</font>
<br>

<details>
    <summary>Full changelog:</summary>
<br>
<font size=2>
<br>

[**v2.2.2**](https://github.com/jarredou/MVSEP-MDX23-Colab_v2/tree/v2.2)
* Improved MDXv3 chunking code (thanks to HymnStudio)
* D1581 demo model replaced by new InstVocHQ MDXv3 model.
<br>

**v2.2.1**
* Added custom weights feature
* Fixed some bugs
* Fixed input: you can use a file or a folder as input now
<br>

**v2.2**
* Added MDXv3 compatibility 
* Added MDXv3 demo model D1581 in vocals stem multiband ensemble.
* Added VOC-FT Fullband SRS instead of UVR-MDX-Instr-HQ3.
* Added 2stems feature: output only vocals/instrumental (faster processing)
* Added 16bit output format option
* Added "BigShift trick" for MDX models
* Added separated overlap values for MDX, MDXv3 and Demucs
* Fixed volume compensation fine-tuning for MDX-VOC-FT
<br>

[**v2.1 (by deton24)**](https://github.com/deton24/MVSEP-MDX23-Colab_v2.1)
* Updated with MDX-VOC-FT instead of Kim Vocal 2
<br>

[**v2.0**](https://github.com/jarredou/MVSEP-MDX23-Colab_v2/tree/2.0)
* Updated with new Kim Vocal 2 & UVR-MDX-Instr-HQ3 models
* Folder batch processing
* Fixed high frequency bleed in vocals
* Fixed volume compensation for MDX models
<br>
</font>
</details>
<br>

Credits:
* [ZFTurbo/MVSep](https://github.com/ZFTurbo/MVSEP-MDX23-music-separation-model)
* Models by [Demucs](https://github.com/facebookresearch/demucs), [Anjok](https://github.com/Anjok07/ultimatevocalremovergui) & [Kimberley Jensen](https://github.com/KimberleyJensen)
* Adaptation & tweaks by [jarredou](https://github.com/jarredou/MVSEP-MDX23-Colab_v2/)

In [None]:
#@markdown #Installation
#@markdown *Run this cell to install MVSep-MDX23*
print('Installing... This will take 1 minute...')
%cd /content
from google.colab import drive
drive.mount('/content/drive')
!git clone https://github.com/jarredou/MVSEP-MDX23-Colab_v2.git &> /dev/null
%cd /content/MVSEP-MDX23-Colab_v2
!pip install -r requirements.txt &> /dev/null
# onnxruntime-gpu nightly fix for cuda12.2
!python -m pip install ort-nightly-gpu --index-url=https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ort-cuda-12-nightly/pypi/simple/
print('Installation done!')

### About settings:


<font size=2>

* **BigShifts:** Values between 3 and 11 give the best quality/speed trade-off, **but** 11 does not always give the best result. Think of it like a seed: different values give slightly different results.<br>
Higher values = longer processing.
</font>
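
As a rough illustration of why different BigShifts values behave like different seeds, the sketch below assumes the technique amounts to: shift the input by several offsets, run the model on each shifted copy, undo the shift, and average the results. This is an assumption about the general idea, not the code used by `inference.py`; `rotate`, `fake_model`, `separate_with_shifts`, `shift_step` and the circular shift are all hypothetical names and choices made for the example.

```python
def rotate(samples, offset):
    """Circularly shift a list of samples to the right by `offset` positions."""
    if not samples:
        return list(samples)
    offset %= len(samples)
    return samples[-offset:] + samples[:-offset] if offset else list(samples)


def fake_model(samples):
    """Hypothetical stand-in for a separation model: halves every sample."""
    return [0.5 * s for s in samples]


def separate_with_shifts(samples, big_shifts, shift_step, model=fake_model):
    """Average model outputs over `big_shifts` shifted copies of the input.

    Assumption: pass k shifts the input by k * shift_step samples, runs the
    model on the shifted copy, then shifts the output back before averaging.
    """
    if big_shifts < 1:
        raise ValueError("big_shifts must be >= 1")
    total = [0.0] * len(samples)
    for k in range(big_shifts):
        offset = k * shift_step
        shifted_out = model(rotate(samples, offset))
        restored = rotate(shifted_out, -offset)  # undo the shift
        total = [t + r for t, r in zip(total, restored)]
    return [t / big_shifts for t in total]
```

With `big_shifts=1` only the unshifted pass runs, which matches regular processing. Each extra shift adds one full model pass, which is why higher values take longer. The toy model here ignores sample position, so every pass agrees; a real model processes audio in chunks, so each shift would land chunk boundaries at different places and the averaged output would change slightly with the value chosen.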

<font size=2>

* **Overlap InstVoc/VitLarge:** High values bring little benefit when BigShifts is already high. With BigShifts=1 (regular processing), you can use higher values such as 8 or even 16.<br>
Higher values = longer processing.<br>
 *The same applies to overlap_VOCFT, but with values between 0 and 0.95.*
</font>
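
To see why higher overlap means longer processing, here is a minimal sketch of how an overlap setting could translate into the number of chunks a model has to process. It assumes the integer overlap acts as a divisor of the chunk length (hop = chunk size // overlap) and the fractional overlap is the share of each chunk that overlaps its neighbour (hop = chunk size × (1 − overlap)). Both formulas, and all function names and sizes below, are assumptions for illustration; the actual chunking in `inference.py` may differ.

```python
import math


def hop_from_factor(chunk_size, overlap):
    """Hop size when `overlap` is an integer factor (assumed: chunk_size // overlap)."""
    if overlap < 1:
        raise ValueError("overlap factor must be >= 1")
    return max(1, chunk_size // overlap)


def hop_from_fraction(chunk_size, overlap):
    """Hop size when `overlap` is a fraction in [0, 1) shared by neighbouring chunks."""
    if not 0 <= overlap < 1:
        raise ValueError("overlap fraction must be in [0, 1)")
    return max(1, int(round(chunk_size * (1 - overlap))))


def chunk_count(total_samples, chunk_size, hop):
    """Number of chunks needed to cover `total_samples` with the given hop."""
    if total_samples <= chunk_size:
        return 1
    return 1 + math.ceil((total_samples - chunk_size) / hop)


if __name__ == "__main__":
    chunk = 1000    # hypothetical chunk length in samples
    total = 10_000  # hypothetical track length in samples
    for factor in (1, 2, 8, 16):
        hop = hop_from_factor(chunk, factor)
        print(f"factor {factor:>2}: hop={hop}, chunks={chunk_count(total, chunk, hop)}")
```

Under these assumptions the chunk count, and so the processing time, grows roughly in proportion to the overlap factor.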

<font size=2>

* **Weights:** How much the result from each model contributes to the final result.
</font>
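
One way to picture the weights is as a normalized weighted average of the per-model outputs. The sketch below assumes exactly that: final = sum(weight × output) / sum(weights). The function `weighted_ensemble` is hypothetical and the real ensembling in `inference.py` may be more involved (for example, per-stem or per-band blending).

```python
def weighted_ensemble(outputs, weights):
    """Blend per-model outputs sample by sample using normalized weights.

    Assumption: the blended value is sum(w_i * output_i) / sum(w_i).
    `outputs` is a list of equal-length sample lists, one per model.
    """
    if len(outputs) != len(weights):
        raise ValueError("need exactly one weight per model output")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("at least one weight must be positive")
    length = len(outputs[0])
    if any(len(out) != length for out in outputs):
        raise ValueError("all outputs must have the same length")
    return [
        sum(w * out[i] for out, w in zip(outputs, weights)) / total_weight
        for i in range(length)
    ]


# Two hypothetical model outputs blended with weights 8 and 5.
blended = weighted_ensemble([[1.0, 0.0], [0.0, 1.0]], [8, 5])
```

Under this assumption, weights of 8 and 5 would give the first model 8/13 of the blend and the second 5/13, and a weight of 0 would remove a model from the blend entirely.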


In [None]:
#@markdown #Separation
from pathlib import Path
import glob

%cd /content/MVSEP-MDX23-Colab_v2


input = '/content/drive/MyDrive' #@param {type:"string"}
output_folder = '/content/drive/MyDrive/output' #@param {type:"string"}
#@markdown ---
#@markdown *Set BigShifts=1 to disable this feature*

BigShifts = 7 #@param {type:"slider", min:1, max:41, step:1}
#@markdown ---
overlap_InstVoc = 1 #@param {type:"slider", min:1, max:40, step:1}
overlap_VitLarge = 1 #@param {type:"slider", min:1, max:40, step:1}
#@markdown ---
weight_InstVoc = 8 #@param {type:"slider", min:0, max:10, step:1}
weight_VitLarge = 5 #@param {type:"slider", min:0, max:10, step:1}
#@markdown ---
use_VOCFT = False #@param {type:"boolean"}
overlap_VOCFT = 0.1 #@param {type:"slider", min:0, max:0.95, step:0.05}
weight_VOCFT = 2 #@param {type:"slider", min:0, max:10, step:1}
#@markdown ---
vocals_instru_only = True #@param {type:"boolean"}
overlap_demucs = 0.6 #@param {type:"slider", min:0, max:0.95, step:0.05}
#@markdown ---
output_format = 'PCM_16' #@param ["PCM_16", "FLOAT"]
if vocals_instru_only:
    vocals_only = '--vocals_only true'
else:
    vocals_only = ''


if use_VOCFT:
    use_VOCFT = '--use_VOCFT true'
else:
    use_VOCFT = ''

if Path(input).is_file():
  file_path = input
  Path(output_folder).mkdir(parents=True, exist_ok=True)
  !python inference.py \
        --large_gpu \
        --weight_InstVoc {weight_InstVoc} \
        --weight_VOCFT {weight_VOCFT} \
        --weight_VitLarge {weight_VitLarge} \
        --input_audio "{file_path}" \
        --overlap_demucs {overlap_demucs} \
        --overlap_VOCFT {overlap_VOCFT} \
        --overlap_InstVoc {overlap_InstVoc} \
        --overlap_VitLarge {overlap_VitLarge} \
        --output_format {output_format} \
        --BigShifts {BigShifts} \
        --output_folder "{output_folder}" \
        {vocals_only} \
        {use_VOCFT}

else:
  # Escape glob metacharacters in the folder name only; the paths glob returns are already literal.
  file_paths = sorted(f'"{path}"' for path in glob.glob(glob.escape(input) + "/*"))
  input_audio_args = ' '.join(file_paths)
  Path(output_folder).mkdir(parents=True, exist_ok=True)
  !python inference.py \
          --large_gpu \
          --weight_InstVoc {weight_InstVoc} \
          --weight_VOCFT {weight_VOCFT} \
          --weight_VitLarge {weight_VitLarge} \
          --input_audio {input_audio_args} \
          --overlap_demucs {overlap_demucs} \
          --overlap_VOCFT {overlap_VOCFT} \
          --overlap_InstVoc {int(overlap_InstVoc)} \
          --overlap_VitLarge {int(overlap_VitLarge)} \
          --output_format {output_format} \
          --BigShifts {BigShifts} \
          --output_folder "{output_folder}" \
          {vocals_only} \
          {use_VOCFT}
