<a href="https://colab.research.google.com/github/eloimoliner/bwe_historical_recordings/blob/main/colab/demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# BEHM-GAN: Bandwidth Extension of Historical Music using Generative Adversarial Networks 

This notebook is a demo of the bandwidth extension method for historical music proposed in:

> E. Moliner and V. Välimäki, "BEHM-GAN: Bandwidth Extension of Historical Music using Generative Adversarial Networks", submitted to IEEE Transactions on Audio, Speech, and Language Processing, 2022

<p align="center">
<img src="https://user-images.githubusercontent.com/64018465/163122490-55aedb3b-7b21-46fc-838a-9fe90eb09b3e.png" alt="Schema representation"
width="700px"></p>

Listen to our [audio samples](http://research.spa.aalto.fi/publications/papers/icassp22-denoising/)

You can freely use it to enhance your own historical recordings.

### Instructions for running:

* Make sure to use a GPU runtime: click __Runtime >> Change Runtime Type >> GPU__
* Run each cell: press ▶️ on its left
* View the code: double-click any of the cells
* Hide the code: double-click the right side of the cell


In [1]:
!git clone https://github.com/eloimoliner/bwe_historical_recordings.git

Cloning into 'bwe_historical_recordings'...
remote: Enumerating objects: 202, done.
remote: Counting objects: 100% (202/202), done.
remote: Compressing objects: 100% (135/135), done.
remote: Total 202 (delta 118), reused 113 (delta 54), pack-reused 0
Receiving objects: 100% (202/202), 83.98 KiB | 886.00 KiB/s, done.
Resolving deltas: 100% (118/118), done.


In [2]:
%cd bwe_historical_recordings

/content/bwe_historical_recordings


In [3]:
!bash prepare_data.sh

--2023-04-18 13:10:39--  https://github.com/eloimoliner/bwe_historical_recordings/releases/download/v0.0-alpha/audio_examples.zip
Resolving github.com (github.com)... 140.82.121.4
Connecting to github.com (github.com)|140.82.121.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/448304570/9414369f-d90e-4e18-9379-a7c5aab87836?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20230418%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230418T131040Z&X-Amz-Expires=300&X-Amz-Signature=35161676f82b95bcff61d8e62ee10fa3fd8a441c7533d4262d240ddf63112035&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=448304570&response-content-disposition=attachment%3B%20filename%3Daudio_examples.zip&response-content-type=application%2Foctet-stream [following]
--2023-04-18 13:10:39--  https://objects.githubusercontent.com/github-production-release-asset-2e65be/448304570/9414369f-d90e-

In [4]:
! pip install hydra-core

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting hydra-core
  Downloading hydra_core-1.3.2-py3-none-any.whl (154 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 154.5/154.5 kB 9.5 MB/s eta 0:00:00
Collecting antlr4-python3-runtime==4.9.*
  Downloading antlr4-python3-runtime-4.9.3.tar.gz (117 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 117.0/117.0 kB 18.0 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting omegaconf<2.4,>=2.2
  Downloading omegaconf-2.3.0-py3-none-any.whl (79 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 79.5/79.5 kB 11.6 MB/s eta 0:00:00
Building wheels for collected packages: antlr4-python3-runtime
  Building wheel for antlr4-python3-runtime (setup.py) ... done
  Created wheel for antlr4-python3-runtime: filename=antlr4_python3_r

In [31]:
import os
import hydra
import logging
import torch
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter
print("CUDA??",torch.cuda.is_available())
import soundfile as sf
import datetime
import numpy as np
import scipy.signal  # the submodule must be imported explicitly for scipy.signal.resample
from tqdm import tqdm

import utils.utils as utils
import utils.lowpass_utils as lowpass_utils
import utils.dataset_loader as dataset_loader
import utils.stft_loss as stft_loss
import models.discriminators as discriminators
import models.unet2d_generator as unet2d_generator
import models.audiounet as audiounet
import models.seanet as seanet
import models.denoiser as denoiser

import yaml
from pathlib import Path





device=torch.device("cuda" if torch.cuda.is_available() else "cpu")

args = yaml.safe_load(Path('conf/conf.yaml').read_text())
class dotdict(dict):
    """dot.notation access to dictionary attributes"""
    __getattr__ = dict.get
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__
args=dotdict(args)
unet_args=dotdict(args.unet_generator)
args_denoiser=dotdict(args.denoiser)

gener_model = unet2d_generator.Unet2d(unet_args=unet_args).to(device)

#dirname = os.path.dirname(__file__)

#print("something went wrong while loading the checkpoint")

checkpoint_filepath_denoiser=os.path.join('/content/bwe_historical_recordings','experiments_denoiser/pretrained_model/checkpoint_denoiser')
unet_model = denoiser.MultiStage_denoise(unet_args=args_denoiser)
unet_model.load_state_dict(torch.load(checkpoint_filepath_denoiser, map_location=device))
unet_model.to(device)



def apply_denoiser_model(segment):
    segment_TF=utils.do_stft(segment,win_size=args.stft["win_size"], hop_size=args.stft["hop_size"], device=device)
    #segment_TF_ds=tf.data.Dataset.from_tensors(segment_TF)
    with torch.no_grad():
        pred = unet_model(segment_TF)
    if args_denoiser.num_stages>1:
        pred=pred[0]

    pred_time=utils.do_istft(pred, args.stft["win_size"], args.stft["hop_size"],device)
    pred_time=pred_time[..., 0:segment.shape[-1]]
    #pred_time=pred_time[0]
    #pred_time=pred_time[0].detach().cpu().numpy()
    return pred_time

def apply_bwe_model(x):
    """Run the BWE generator on a (1, samples) tensor and return a 1-D NumPy array."""
    # Add Gaussian noise scaled by the configured power before the generator
    n=args.bwe["add_noise"]["power"]*torch.randn(x.shape)
    print("adding noise")
    x=x+n.to(device) #not tested, need to tune the noise power

    xF =utils.do_stft(x,win_size=args.stft["win_size"], hop_size=args.stft["hop_size"], device=device)

    with torch.no_grad():
        y_gF = gener_model(xF)

    y_g=utils.do_istft(y_gF, args.stft["win_size"], args.stft["hop_size"], device)
    # Trim the iSTFT output back to the input length
    y_g=y_g[:,0:x.shape[-1]]

    pred_time=y_g[0].detach().cpu().numpy()
    return pred_time







CUDA?? True
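
The `dotdict` helper in the cell above is easy to misread, so here is a small standalone sketch of how it behaves. The config values below are made up for illustration and are not the contents of `conf/conf.yaml`. It shows why the cell wraps `args.unet_generator` and `args.denoiser` in `dotdict` again: nested dictionaries are not converted automatically.

```python
class dotdict(dict):
    """dot.notation access to dictionary attributes"""
    __getattr__ = dict.get
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__

# Hypothetical config values, for illustration only.
cfg = dotdict({"stft": {"win_size": 1024, "hop_size": 256}, "sample_rate": 22050})

print(cfg.sample_rate)        # attribute access on a top-level key
print(cfg.stft["hop_size"])   # nested values are still plain dicts
print(cfg.not_a_key)          # missing keys give None (dict.get) instead of raising

nested = dotdict(cfg.stft)    # wrap a sub-dict to get dot access on it too
print(nested.win_size)
```

Because missing keys silently return `None`, a typo in a config key will not fail at the point of access, which is worth keeping in mind when editing the configuration.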


In [28]:
def process_audio(audio, use_denoiser=True, use_bwe=True):
    # Read the file once; the previous fallback retried the identical path
    data, samplerate = sf.read(audio)

    #Stereo to mono
    if len(data.shape)>1:
        data=np.mean(data,axis=1)

    if samplerate!=22050: 
        print("Resampling")

        data=scipy.signal.resample(data, int((22050  / samplerate )*len(data))+1)  


    segment_size=22050*5  #5s segment

    length_data=len(data)
    overlapsize=1024 #samples (46 ms)
    window=np.hanning(2*overlapsize)
    window_right=window[overlapsize::]
    window_left=window[0:overlapsize]
    audio_finished=False
    pointer=0
    denoised_data=np.zeros(shape=(len(data),))
    bwe_data=np.zeros(shape=(len(data),))
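    #chunks advance by segment_size-overlapsize, so count them with that hop to reach the end of the file
    numchunks=max(1, int(np.ceil((length_data-overlapsize)/(segment_size-overlapsize))))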

      
    for i in tqdm(range(numchunks)):
        if pointer+segment_size<length_data:
            segment=data[pointer:pointer+segment_size]
            #dostft
            segment = torch.from_numpy(segment)
            segment=segment.type(torch.FloatTensor)
            segment=segment.to(device)
            segment=torch.unsqueeze(segment,0)
            print("segment1",segment.shape)
            if use_denoiser:
                denoised_time=apply_denoiser_model(segment)
                segment=denoised_time
                denoised_time=denoised_time[0].detach().cpu().numpy()
                #just concatenating with a little bit of OLA
                if pointer==0:
                    denoised_time=np.concatenate((denoised_time[0:int(segment_size-overlapsize)], np.multiply(denoised_time[int(segment_size-overlapsize):segment_size],window_right)), axis=0)
                else:
                    denoised_time=np.concatenate((np.multiply(denoised_time[0:int(overlapsize)], window_left), denoised_time[int(overlapsize):int(segment_size-overlapsize)], np.multiply(denoised_time[int(segment_size-overlapsize):int(segment_size)],window_right)), axis=0)
                denoised_data[pointer:pointer+segment_size]=denoised_data[pointer:pointer+segment_size]+denoised_time
            print("segment_denoised",segment.shape)
            if use_bwe:
                pred_time =apply_bwe_model(segment)
                print("pred_time",pred_time.shape)
                
                if pointer==0:
                    pred_time=np.concatenate((pred_time[0:int(segment_size-overlapsize)], np.multiply(pred_time[int(segment_size-overlapsize):segment_size],window_right)), axis=0)
                else:
                    pred_time=np.concatenate((np.multiply(pred_time[0:int(overlapsize)], window_left), pred_time[int(overlapsize):int(segment_size-overlapsize)], np.multiply(pred_time[int(segment_size-overlapsize):int(segment_size)],window_right)), axis=0)
                    
                bwe_data[pointer:pointer+segment_size]=bwe_data[pointer:pointer+segment_size]+pred_time
            
            pointer=pointer+segment_size-overlapsize
        else: 
            segment=data[pointer::]

            lensegment=len(segment)
            segment=np.concatenate((segment, np.zeros(shape=(int(segment_size-len(segment)),))), axis=0)

            audio_finished=True
            #dostft
            segment = torch.from_numpy(segment)
            segment=segment.type(torch.FloatTensor)
            segment=segment.to(device)
            segment=torch.unsqueeze(segment,0)
            if use_denoiser:
                denoised_time=apply_denoiser_model(segment)
                segment=denoised_time
                denoised_time=denoised_time[0].detach().cpu().numpy()
                if pointer!=0:
                    denoised_time=np.concatenate((np.multiply(denoised_time[0:int(overlapsize)], window_left), denoised_time[int(overlapsize):int(segment_size)]),axis=0)
                denoised_data[pointer::]=denoised_data[pointer::]+denoised_time[0:lensegment]

            if use_bwe:
                pred_time =apply_bwe_model(segment)
                
                if pointer!=0:
                    pred_time=np.concatenate((np.multiply(pred_time[0:int(overlapsize)], window_left), pred_time[int(overlapsize):int(segment_size)]),axis=0)
                
                bwe_data[pointer::]=bwe_data[pointer::]+pred_time[0:lensegment]
    return denoised_data, bwe_data
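
The chunking and crossfading in `process_audio` are interleaved with the model calls, which makes them hard to follow. Below is a minimal NumPy-only sketch of the same idea: split the signal into segments that overlap by a fixed number of samples, process each one, and blend neighbours with the two halves of a Hann window. `chunk_starts` and `crossfade_ola` are hypothetical helper names, the default `process` is an identity stand-in for the denoiser and BWE models, and the chunk count is derived from the hop `segment_size - overlap`, which is an assumption of this sketch.

```python
import numpy as np

def chunk_starts(length, segment_size, overlap):
    """Start indices of chunks that advance by segment_size - overlap."""
    hop = segment_size - overlap
    n_chunks = max(1, int(np.ceil((length - overlap) / hop)))
    return [i * hop for i in range(n_chunks)]

def crossfade_ola(data, segment_size, overlap, process=lambda seg: seg):
    """Process `data` chunk by chunk and blend neighbours with Hann fades."""
    window = np.hanning(2 * overlap)
    fade_in, fade_out = window[:overlap], window[overlap:]
    out = np.zeros(len(data))
    starts = chunk_starts(len(data), segment_size, overlap)
    for k, start in enumerate(starts):
        seg = data[start:start + segment_size]
        n = len(seg)
        # Zero-pad the last chunk so `process` always sees a full segment.
        seg = np.pad(seg, (0, segment_size - n))
        y = np.array(process(seg), dtype=float)
        if k > 0:                    # fade in every chunk except the first
            y[:overlap] *= fade_in
        if k < len(starts) - 1:      # fade out every chunk except the last
            y[-overlap:] *= fade_out
        out[start:start + n] += y[:n]
    return out

x = np.ones(22050 * 12)              # 12 s constant test signal at 22.05 kHz
y = crossfade_ola(x, segment_size=22050 * 5, overlap=1024)
print(chunk_starts(len(x), 22050 * 5, 1024))
print(np.max(np.abs(y - x)))         # small: the two fades sum to almost exactly 1
```

With an identity `process`, the output matches the input up to a tiny ripple in the overlap regions, because the two halves of `np.hanning(2 * overlap)` do not sum to exactly one. Any audible seam in real use therefore comes from the models disagreeing across a boundary, not from the windowing itself.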

In [24]:
#@title #Upload file to denoise
#@markdown not implemented yet, sorry :(
##@markdown Execute this cell to upload a single audio recording you would like to denoise (accepted extensions: .wav, .flac, .mp3)
from google.colab import files
uploaded=files.upload()

Saving 6_original.wav to 6_original (1).wav


In [32]:
#Please select your preferences

use_denoiser=True #@param {type:"boolean"} 
use_bwe=True #@param {type:"boolean"} 

mode="piano" #@param ["piano", "strings", "orchestra"]


# Load the BWE generator checkpoint for the selected mode.
# All three modes follow the same pattern: experiments_bwe/<mode>/checkpoint_<mode>
checkpoint_filepath = os.path.join('/content/bwe_historical_recordings', 'experiments_bwe', mode, 'checkpoint_'+mode)
gener_model.load_state_dict(torch.load(checkpoint_filepath, map_location=device))


In [33]:
#@title #Enhance

#@markdown Execute this cell to enhance the selected file. Modify it to add the path to your audio file
#add here your audio file
#fn="audio_examples/1st_Movement-Allegro_mod_-_PHILADELPHIA_SYMPHONY_ORCHESTRA_noisy_input.wav"
fn="audio_examples/HUNGARIAN_RHAPSODY_No._8_-_MARK_HAMBOURG_noisy_input.wav"
print('Processing uploaded file "{name}"'.format(
    name=fn))
denoise_data, bwe_data=process_audio(fn, use_bwe=use_bwe, use_denoiser=use_denoiser)
basename=os.path.splitext(fn)[0]
wav_output_name=basename+"_denoised"+".wav"
sf.write(wav_output_name, denoise_data, 22050)
wav_output_name=basename+"_bwe"+".wav"
sf.write(wav_output_name, bwe_data, 22050)

Processing uploaded file "6_original.wav"


  0%|          | 0/3 [00:00<?, ?it/s]

segment1 torch.Size([1, 110250])
torch.Size([1, 513, 431])
segment_denoised torch.Size([1, 110250])
adding noise
torch.Size([1, 513, 431])


 33%|███▎      | 1/3 [00:00<00:01,  1.87it/s]

pred_time (110250,)
segment1 torch.Size([1, 110250])
torch.Size([1, 513, 431])
segment_denoised torch.Size([1, 110250])
adding noise
torch.Size([1, 513, 431])
pred_time (110250,)


 67%|██████▋   | 2/3 [00:01<00:00,  2.00it/s]

torch.Size([1, 513, 431])
adding noise


100%|██████████| 3/3 [00:01<00:00,  2.01it/s]

torch.Size([1, 513, 431])
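
The `[1, 513, 431]` shapes printed above can be checked with a little arithmetic. The sketch below assumes a 1024-point window, a hop of 256 samples, and centered framing as in the `torch.stft` default. These values are inferred from the log, not read from `conf/conf.yaml`, and `utils.do_stft` may frame the signal differently. `stft_shape` is a hypothetical helper name.

```python
def stft_shape(num_samples, n_fft, hop):
    """(frequency bins, frames) of a one-sided STFT with centered frames."""
    return n_fft // 2 + 1, 1 + num_samples // hop

# One 5 s segment at 22.05 kHz, as used in process_audio
print(stft_shape(22050 * 5, n_fft=1024, hop=256))  # (513, 431)
```

Under these assumptions a 110250-sample segment gives 513 frequency bins and 431 frames, which matches the printed shape.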





In [34]:
#@title #Download

#@markdown Execute this cell to download the enhanced recording
files.download(wav_output_name)
