<a href="https://colab.research.google.com/github/grashooper/AudioSR-Colab-Fork/blob/main/AudioSR_Colab_Fork.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# AudioSR-Colab-Fork v0.5

---
Colab adaptation of AudioSR, with some tweaks:

v0.5
- input audio is resampled according to 'input_cutoff' (instead of being lowpass filtered)
- each processed chunk is normalised to the same LUFS level as the corresponding input chunk (fixes the volume drop issue)
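
The volume fix amounts to measuring the level of each input chunk and scaling the processed chunk to match it. The sketch below is only an illustration: it uses plain RMS level in dB as a simple stand-in for LUFS (true LUFS, as defined in ITU-R BS.1770 and implemented by `pyloudnorm`, adds K-weighting and gating), and `rms_db` and `match_level` are hypothetical names, not functions from `inference.py`.

```python
import numpy as np

def rms_db(x, eps=1e-12):
    """Return the RMS level of a signal in dB relative to full scale."""
    return 10.0 * np.log10(np.mean(np.square(x)) + eps)

def match_level(processed, reference):
    """Scale `processed` so that its RMS level equals that of `reference`."""
    gain_db = rms_db(reference) - rms_db(processed)
    return processed * (10.0 ** (gain_db / 20.0))

# Example: a processed chunk that came back quieter than its input chunk.
t = np.arange(16000) / 16000.0
input_chunk = 0.5 * np.sin(2 * np.pi * 440.0 * t)
processed_chunk = 0.25 * np.sin(2 * np.pi * 440.0 * t)
restored = match_level(processed_chunk, input_chunk)
```

Matching chunk by chunk, rather than once over the whole file, keeps the level consistent even if the drop varies from one chunk to the next.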

v0.4
- code rework, inference.py created for local CLI usage.

v0.3
- added: multiband ensemble option that uses the original audio below the given cutoff frequency and the generated audio above it.
- fixed: error when saving the final audio for inputs other than .wav
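
The multiband ensemble idea can be sketched as a frequency-domain split. This is a minimal illustration, assuming both signals are mono and already at the same sample rate. It uses a brickwall FFT crossover for simplicity, and the actual `inference.py` may well use a different crossover filter. The function name `multiband_ensemble` is hypothetical.

```python
import numpy as np

def multiband_ensemble(original, generated, sample_rate, cutoff_hz):
    """Keep `original` below `cutoff_hz` and `generated` at or above it."""
    n = min(len(original), len(generated))
    spec_orig = np.fft.rfft(original[:n])
    spec_gen = np.fft.rfft(generated[:n])
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)
    # Pick each frequency bin from one source or the other.
    combined = np.where(freqs < cutoff_hz, spec_orig, spec_gen)
    return np.fft.irfft(combined, n=n)

# Example: the original carries a 1 kHz tone; the generated audio has an
# altered 1 kHz tone plus new content at 12 kHz.
sr = 48000
t = np.arange(sr) / sr
low = 0.5 * np.sin(2 * np.pi * 1000.0 * t)
high = 0.1 * np.sin(2 * np.pi * 12000.0 * t)
original = low
generated = 0.3 * np.sin(2 * np.pi * 1000.0 * t) + high
mixed = multiband_ensemble(original, generated, sr, cutoff_hz=8000)
```

Here `mixed` keeps the untouched 1 kHz tone from the original and takes only the 12 kHz content from the generated signal.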

v0.2
- added a chunking feature to process input of any length
- added stereo handling: stereo input channels are split and processed independently (dual mono), then recombined into stereo audio.
- added an overlap feature to smooth the transitions between chunks (don't use high values: AudioSR is not 100% phase accurate, so high overlap creates weird phase cancellations across the overlapping regions)
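
Chunking, overlap and dual-mono handling can be sketched together as a weighted overlap-add. This is an illustration under stated assumptions, not the code from `inference.py`: the triangular crossfade window is a choice made here, the names `process_in_chunks` and `process_stereo` are hypothetical, and `process` is assumed to return a chunk of the same length as its input (the real model changes the sample rate, so indices would need scaling).

```python
import numpy as np

def process_in_chunks(audio, chunk_len, overlap, process):
    """Apply `process` to overlapping chunks of a mono signal and crossfade the results."""
    hop = max(1, int(chunk_len * (1.0 - overlap)))
    out = np.zeros(len(audio))
    weight = np.zeros(len(audio))
    # Triangular window without its zero endpoints, so every weight is positive.
    window = np.bartlett(chunk_len + 2)[1:-1]
    for start in range(0, len(audio), hop):
        chunk = audio[start:start + chunk_len]
        w = window[:len(chunk)]
        out[start:start + len(chunk)] += process(chunk) * w
        weight[start:start + len(chunk)] += w
    # Dividing by the summed weights turns the overlap into a crossfade.
    return out / weight

def process_stereo(audio, chunk_len, overlap, process):
    """Dual mono: process each channel of a (samples, channels) array independently."""
    channels = [process_in_chunks(audio[:, c], chunk_len, overlap, process)
                for c in range(audio.shape[1])]
    return np.stack(channels, axis=1)

# Example with a placeholder `process` that simply doubles the amplitude.
rng = np.random.default_rng(0)
stereo = rng.standard_normal((5000, 2))
result = process_stereo(stereo, chunk_len=1024, overlap=0.5,
                        process=lambda c: 2.0 * c)
```

With a phase-coherent `process` like the placeholder above, the crossfade is transparent. With a generative model whose output phase differs between chunks, the overlapping regions partially cancel, which is why high overlap values are discouraged.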
---
Adaptation & tweaks by [jarredou](https://github.com/jarredou/)

Original work [AudioSR: Versatile Audio Super-resolution at Scale](https://github.com/haoheliu/versatile_audio_super_resolution) by Haohe Liu, Ke Chen, Qiao Tian, Wenwu Wang, Mark D. Plumbley



In [1]:
#@markdown # Installation
from google.colab import drive
drive.mount('/content/drive')

!git clone https://github.com/haoheliu/versatile_audio_super_resolution.git
%cd versatile_audio_super_resolution
!pip install cog huggingface_hub unidecode phonemizer einops torchlibrosa transformers ftfy timm librosa pyloudnorm
!pip install huggingface_hub transformers==4.30.2 gradio soundfile progressbar librosa audiosr unidecode
#!pip install -r requirements.txt

!wget https://raw.githubusercontent.com/jarredou/AudioSR-Colab-Fork/main/inference.py
#from IPython.display import clear_output
#clear_output(wait=False)

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
fatal: destination path 'versatile_audio_super_resolution' already exists and is not an empty directory.
/content/versatile_audio_super_resolution
--2025-01-26 06:07:15--  https://raw.githubusercontent.com/jarredou/AudioSR-Colab-Fork/main/inference.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10078 (9.8K) [text/plain]
Saving to: ‘inference.py.2’


2025-01-26 06:07:15 (115 MB/s) - ‘inference.py.2’ saved [10078/10078]



### **IMPORTANT NOTE**

#### If the inference cell crashes, restart the runtime (do not disconnect, just restart it); otherwise it will cause memory errors!

*If you are doing multiple runs, also remember to restart the runtime every 4 or 5 files to clean up memory.*

In [None]:
%cd /content/versatile_audio_super_resolution
import gc
import os
import random
import numpy as np
from scipy.signal.windows import hann
import soundfile as sf
import torch
from cog import BasePredictor, Input, Path
import tempfile
import librosa
from audiosr import build_model, super_resolution
from scipy import signal
import warnings
warnings.filterwarnings("ignore")

os.environ["TOKENIZERS_PARALLELISM"] = "true"
torch.set_float32_matmul_precision("high")


#@markdown #Inference
input_file_path = '/content/drive/MyDrive/input/Pet Shop Boys - Nightlife Tour Atlanta first half Webcast.mp3' #@param {type:"string"}
output_folder = '/content/drive/MyDrive/output_folder' #@param {type:"string"}
#@markdown ---
ddim_steps= 20 #@param {type:"slider", min:20, max:200, step:10}
overlap = 0.96 #@param {type:"slider", min:0, max:0.96, step:0.04}
guidance_scale=3.5 #@param {type:"slider", min:1, max:15, step:0.5}
seed = 0 # @param {type:"integer"}
chunk_size = 10.24 # @param [5.12, 10.24, 20.48] {type:"raw"}
multiband_ensemble = True # @param {type:"boolean"}
input_cutoff = "8000" #@param [20000, 19000, 18000, 17000, 16000, 14000, 13000, 12000, 11000, 10000, 9000, 8000, 7000, 6000, 5000, 4000, 3000, 2000]
input_cutoff = int(input_cutoff)



!python inference.py --input "{input_file_path}" \
                     --output "{output_folder}" \
                     --ddim_steps {ddim_steps} \
                     --guidance_scale {guidance_scale} \
                     --seed {seed} \
                     --chunk_size {chunk_size} \
                     --overlap {overlap} \
                     --multiband_ensemble {multiband_ensemble} \
                     --input_cutoff {input_cutoff}

/content/versatile_audio_super_resolution
Loading Model...
Loading AudioSR: basic
Loading model on cuda:0
DiffusionWrapper has 258.20 M params.
Model loaded!
Setting seed to: 444686769
overlap = 0.96
guidance_scale = 3.5
ddim_steps = 20
chunk_size = 10.24
multiband_ensemble = True
input file = Pet Shop Boys - Nightlife Tour Atlanta first half Webcast.mp3
audio.shape = (3420695, 2)
input cutoff = 8000
audio is stereo
enable_overlap = True
Processing chunk 1 of 522 for Left/Mono channel
Running DDIM Sampling with 20 timesteps
DDIM Sampler: 100% 20/20 [00:06<00:00,  3.22it/s]
Processing chunk 2 of 522 for Left/Mono channel
Running DDIM Sampling with 20 timesteps
DDIM Sampler: 100% 20/20 [00:03<00:00,  5.72it/s]
Processing chunk 3 of 522 for Left/Mono channel
Running DDIM Sampling with 20 timesteps
DDIM Sampler: 100% 20/20 [00:04<00:00,  4.59it/s]
Processing chunk 4 of 522 for Left/Mono channel
Running DDIM Sampling with 20 timesteps
DDIM Sampler: 100% 20/20 [00:03<00:00,  5.69it/s]
Proces