<a href="https://colab.research.google.com/github/eloimoliner/bwe_historical_recordings/blob/main/colab/demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# A two-stage U-Net for high-fidelity denoising of historical recordings

This notebook is a demo of the historical music denoising method proposed in:

> E. Moliner and V. Välimäki,, "A two-stage U-Net for high-fidelity denosing of historical recordings", submitted to IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Singapore, May, 2022

<p align="center">
<img src="https://user-images.githubusercontent.com/64018465/131505025-e4530f55-fe5d-4bf4-ae64-cc9a502e5874.png" alt="Schema represention"
width="400px"></p>

Listen to our [audio samples](http://research.spa.aalto.fi/publications/papers/icassp22-denoising/)

You can freely use it to denoise your own historical recordings.

### Instructions for running:

* Make sure to use a GPU runtime, click:  __Runtime >> Change Runtime Type >> GPU__
* Press ▶️ on the left of each of the cells
* View the code: Double-click any of the cells
* Hide the code: Double click the right side of the cell


In [2]:
!git clone https://github.com/eloimoliner/bwe_historical_recordings.git

Cloning into 'bwe_historical_recordings'...
remote: Enumerating objects: 139, done.[K
remote: Counting objects: 100% (139/139), done.[K
remote: Compressing objects: 100% (85/85), done.[K
remote: Total 139 (delta 77), reused 107 (delta 49), pack-reused 0[K
Receiving objects: 100% (139/139), 63.64 KiB | 2.77 MiB/s, done.
Resolving deltas: 100% (77/77), done.


In [5]:
%cd bwe_historical_recordings

/content/bwe_historical_recordings


In [7]:
!bash prepare_data.sh

--2022-02-07 17:13:25--  https://github.com/eloimoliner/bwe_historical_recordings/releases/download/v0.0-alpha/audio_examples.zip
Resolving github.com (github.com)... 140.82.113.4
Connecting to github.com (github.com)|140.82.113.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/448304570/9414369f-d90e-4e18-9379-a7c5aab87836?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20220207%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20220207T171326Z&X-Amz-Expires=300&X-Amz-Signature=8bdf4d444f378e7ec222e545c60959a03a76b90b97ad4b2167ec21aca982fbb6&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=448304570&response-content-disposition=attachment%3B%20filename%3Daudio_examples.zip&response-content-type=application%2Foctet-stream [following]
--2022-02-07 17:13:26--  https://objects.githubusercontent.com/github-production-release-asset-2e65be/448304570/9414369f-d90e-

In [10]:
! pip install hydra

Collecting hydra
  Downloading Hydra-2.5.tar.gz (82 kB)
[?25l[K     |████                            | 10 kB 17.4 MB/s eta 0:00:01[K     |████████                        | 20 kB 6.3 MB/s eta 0:00:01[K     |████████████                    | 30 kB 6.5 MB/s eta 0:00:01[K     |████████████████                | 40 kB 3.0 MB/s eta 0:00:01[K     |███████████████████▉            | 51 kB 3.7 MB/s eta 0:00:01[K     |███████████████████████▉        | 61 kB 3.0 MB/s eta 0:00:01[K     |███████████████████████████▉    | 71 kB 2.7 MB/s eta 0:00:01[K     |███████████████████████████████▉| 81 kB 3.0 MB/s eta 0:00:01[K     |████████████████████████████████| 82 kB 474 kB/s 
[?25hBuilding wheels for collected packages: hydra
  Building wheel for hydra (setup.py) ... [?25l[?25hdone
  Created wheel for hydra: filename=Hydra-2.5-cp37-cp37m-linux_x86_64.whl size=220759 sha256=c80b745d92929475520352857001a32294f3eb1b8e83eadc6895cee6fa388ba7
  Stored in directory: /root/.cache/pip/wheels/4

In [29]:
import os
import hydra
import logging
import torch
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter
print("CUDA??",torch.cuda.is_available())
import soundfile as sf
import datetime
import numpy as np
import scipy
from tqdm import tqdm

import utils.utils as utils 
import utils.lowpass_utils as lowpass_utils 
import  utils.dataset_loader as dataset_loader
import  utils.stft_loss as stft_loss
import models.discriminators as discriminators
import models.unet2d_generator as unet2d_generator
import models.audiounet as audiounet
import models.seanet as seanet
import models.denoiser as denoiser

import yaml
from pathlib import Path





device=torch.device("cuda" if torch.cuda.is_available() else "cpu")

args = yaml.safe_load(Path('conf/conf.yaml').read_text())
class dotdict(dict):
    """dot.notation access to dictionary attributes"""
    __getattr__ = dict.get
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__
args=dotdict(args)
unet_args=dotdict(args.unet_generator)
args_denoiser=dotdict(args.denoiser)

gener_model = unet2d_generator.Unet2d(unet_args=unet_args).to(device)

#dirname = os.path.dirname(__file__)
checkpoint_filepath = os.path.join('/content/bwe_historical_recordings','experiments_bwe/orchestra/checkpoint_orchestra')

gener_model.load_state_dict(torch.load(checkpoint_filepath, map_location=device))
#print("something went wrong while loading the checkpoint")

checkpoint_filepath_denoiser=os.path.join('/content/bwe_historical_recordings','experiments_denoiser/pretrained_model/checkpoint_denoiser')
unet_model = denoiser.MultiStage_denoise(unet_args=args_denoiser)
unet_model.load_state_dict(torch.load(checkpoint_filepath_denoiser, map_location=device))
unet_model.to(device)



def apply_denoiser_model(segment):
    segment_TF=utils.do_stft(segment,win_size=args.stft["win_size"], hop_size=args.stft["hop_size"], device=device)
    #segment_TF_ds=tf.data.Dataset.from_tensors(segment_TF)
    with torch.no_grad():
        pred = unet_model(segment_TF)
    if args_denoiser.num_stages>1:
        pred=pred[0]

    pred_time=utils.do_istft(pred, args.stft["win_size"], args.stft["hop_size"],device)
    #pred_time=pred_time[0]
    #pred_time=pred_time[0].detach().cpu().numpy()
    return pred_time

def apply_bwe_model(x): 
    x_init=x

    if args.bwe.add_noise.add_noise:
        n=args.bwe.add_noise.power*torch.randn(x.shape)
        print("adding noise")
        x=x+n.to(device) #not tested, need to tune the noise power
    
    
    if args.bwe.generator.variant=="unet2d":
        xF =utils.do_stft(x,win_size=args.stft["win_size"], hop_size=args.stft["hop_size"], device=device)
    
        with torch.no_grad():
            y_gF = gener_model(xF)
        
        y_g=utils.do_istft(y_gF, args.stft.win_size, args.stft.hop_size, device)
        y_g=y_g[:,0:x.shape[-1]]
        y_g=y_g.unsqueeze(1)
    else:
        with torch.no_grad():
            y_g = gener_model(x)

    pred_time=y_g.squeeze(1)
    pred_time=pred_time[0].detach().cpu().numpy()
    return pred_time







CUDA?? True


In [30]:
def process_audio(audio, use_denoiser=True, use_bwe=True):
    try:
        data, samplerate = sf.read(audio)
    except:
        print("reading relative path")
        data, samplerate = sf.read(audio)

    #Stereo to mono
    if len(data.shape)>1:
        data=np.mean(data,axis=1)

    if samplerate!=22050: 
        print("Resampling")

        data=scipy.signal.resample(data, int((22050  / samplerate )*len(data))+1)  


    segment_size=22050*5  #5s segment

    length_data=len(data)
    overlapsize=1024 #samples (46 ms)
    window=np.hanning(2*overlapsize)
    window_right=window[overlapsize::]
    window_left=window[0:overlapsize]
    audio_finished=False
    pointer=0
    denoised_data=np.zeros(shape=(len(data),))
    bwe_data=np.zeros(shape=(len(data),))
    numchunks=int(np.ceil(length_data/segment_size))

      
    for i in tqdm(range(numchunks)):
        if pointer+segment_size<length_data:
            segment=data[pointer:pointer+segment_size]
            #dostft
            segment = torch.from_numpy(segment)
            segment=segment.type(torch.FloatTensor)
            segment=segment.to(device)
            segment=torch.unsqueeze(segment,0)

            if use_denoiser:
                denoised_time=apply_denoiser_model(segment)
                segment=denoised_time
                denoised_time=denoised_time[0].detach().cpu().numpy()
                #just concatenating with a little bit of OLA
                if pointer==0:
                    denoised_time=np.concatenate((denoised_time[0:int(segment_size-overlapsize)], np.multiply(denoised_time[int(segment_size-overlapsize):segment_size],window_right)), axis=0)
                else:
                    denoised_time=np.concatenate((np.multiply(denoised_time[0:int(overlapsize)], window_left), denoised_time[int(overlapsize):int(segment_size-overlapsize)], np.multiply(denoised_time[int(segment_size-overlapsize):int(segment_size)],window_right)), axis=0)
                denoised_data[pointer:pointer+segment_size]=denoised_data[pointer:pointer+segment_size]+denoised_time

            if use_bwe:
                pred_time =apply_bwe_model(segment)
                
                if pointer==0:
                    pred_time=np.concatenate((pred_time[0:int(segment_size-overlapsize)], np.multiply(pred_time[int(segment_size-overlapsize):segment_size],window_right)), axis=0)
                else:
                    pred_time=np.concatenate((np.multiply(pred_time[0:int(overlapsize)], window_left), pred_time[int(overlapsize):int(segment_size-overlapsize)], np.multiply(pred_time[int(segment_size-overlapsize):int(segment_size)],window_right)), axis=0)
                    
                bwe_data[pointer:pointer+segment_size]=bwe_data[pointer:pointer+segment_size]+pred_time

            pointer=pointer+segment_size-overlapsize
        else: 
            segment=data[pointer::]

            lensegment=len(segment)
            segment=np.concatenate((segment, np.zeros(shape=(int(segment_size-len(segment)),))), axis=0)

            audio_finished=True
            #dostft
            segment = torch.from_numpy(segment)
            segment=segment.type(torch.FloatTensor)
            segment=segment.to(device)
            segment=torch.unsqueeze(segment,0)
            if use_denoiser:
                denoised_time=apply_denoiser_model(segment)
                segment=denoised_time
                denoised_time=denoised_time[0].detach().cpu().numpy()
                if pointer!=0:
                    denoised_time=np.concatenate((np.multiply(denoised_time[0:int(overlapsize)], window_left), denoised_time[int(overlapsize):int(segment_size)]),axis=0)
                denoised_data[pointer::]=denoised_data[pointer::]+denoised_time[0:lensegment]

            if use_bwe:
                pred_time =apply_bwe_model(segment)
                
                if pointer!=0:
                    pred_time=np.concatenate((np.multiply(pred_time[0:int(overlapsize)], window_left), pred_time[int(overlapsize):int(segment_size)]),axis=0)
                
                bwe_data[pointer::]=bwe_data[pointer::]+pred_time[0:lensegment]
    return denoised_data, bwe_data

In [31]:
#@title #Install and Import

#@markdown Execute this cell to install the required data and dependencies. This step might take some time.

#download the files
! git clone https://github.com/eloimoliner/denoising-historical-recordings.git
! wget https://github.com/eloimoliner/denoising-historical-recordings/releases/download/v0.0/checkpoint.zip
! unzip checkpoint.zip -d denoising-historical-recordings/experiments/trained_model/

%cd denoising-historical-recordings

#install dependencies
! pip install hydra-core==0.11.3

#All the code goes here
import unet
import tensorflow as tf
import soundfile as sf
import numpy as np
from tqdm import tqdm
import scipy.signal
import hydra
import os
#workaround to load hydra conf file
import yaml
from pathlib import Path
args = yaml.safe_load(Path('conf/conf.yaml').read_text())
class dotdict(dict):
    """dot.notation access to dictionary attributes"""
    __getattr__ = dict.get
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__
args=dotdict(args)
unet_args=dotdict(args.unet)

path_experiment=str(args.path_experiment)

unet_model = unet.build_model_denoise(unet_args=unet_args)

ckpt=os.path.join("/content/denoising-historical-recordings",path_experiment, 'checkpoint')
unet_model.load_weights(ckpt)

def do_stft(noisy):
        
    window_fn = tf.signal.hamming_window

    win_size=args.stft["win_size"]
    hop_size=args.stft["hop_size"]

    
    stft_signal_noisy=tf.signal.stft(noisy,frame_length=win_size, window_fn=window_fn, frame_step=hop_size, pad_end=True)
    stft_noisy_stacked=tf.stack( values=[tf.math.real(stft_signal_noisy), tf.math.imag(stft_signal_noisy)], axis=-1)

    return stft_noisy_stacked

def do_istft(data):
    
    window_fn = tf.signal.hamming_window

    win_size=args.stft["win_size"]
    hop_size=args.stft["hop_size"]

    inv_window_fn=tf.signal.inverse_stft_window_fn(hop_size, forward_window_fn=window_fn)

    pred_cpx=data[...,0] + 1j * data[...,1]
    pred_time=tf.signal.inverse_stft(pred_cpx, win_size, hop_size, window_fn=inv_window_fn)
    return pred_time

def denoise_audio(audio):

    data, samplerate = sf.read(audio)
    print(data.dtype)
    #Stereo to mono
    if len(data.shape)>1:
        data=np.mean(data,axis=1)
    
    if samplerate!=44100: 
        print("Resampling")
   
        data=scipy.signal.resample(data, int((44100  / samplerate )*len(data))+1)  
 
    
    
    segment_size=44100*5  #20s segments

    length_data=len(data)
    overlapsize=2048 #samples (46 ms)
    window=np.hanning(2*overlapsize)
    window_right=window[overlapsize::]
    window_left=window[0:overlapsize]
    audio_finished=False
    pointer=0
    denoised_data=np.zeros(shape=(len(data),))
    residual_noise=np.zeros(shape=(len(data),))
    numchunks=int(np.ceil(length_data/segment_size))
     
    for i in tqdm(range(numchunks)):
        if pointer+segment_size<length_data:
            segment=data[pointer:pointer+segment_size]
            #dostft
            segment_TF=do_stft(segment)
            segment_TF_ds=tf.data.Dataset.from_tensors(segment_TF)
            pred = unet_model.predict(segment_TF_ds.batch(1))
            pred=pred[0]
            residual=segment_TF-pred[0]
            residual=np.array(residual)
            pred_time=do_istft(pred[0])
            residual_time=do_istft(residual)
            residual_time=np.array(residual_time)

            if pointer==0:
                pred_time=np.concatenate((pred_time[0:int(segment_size-overlapsize)], np.multiply(pred_time[int(segment_size-overlapsize):segment_size],window_right)), axis=0)
                residual_time=np.concatenate((residual_time[0:int(segment_size-overlapsize)], np.multiply(residual_time[int(segment_size-overlapsize):segment_size],window_right)), axis=0)
            else:
                pred_time=np.concatenate((np.multiply(pred_time[0:int(overlapsize)], window_left), pred_time[int(overlapsize):int(segment_size-overlapsize)], np.multiply(pred_time[int(segment_size-overlapsize):int(segment_size)],window_right)), axis=0)
                residual_time=np.concatenate((np.multiply(residual_time[0:int(overlapsize)], window_left), residual_time[int(overlapsize):int(segment_size-overlapsize)], np.multiply(residual_time[int(segment_size-overlapsize):int(segment_size)],window_right)), axis=0)
                
            denoised_data[pointer:pointer+segment_size]=denoised_data[pointer:pointer+segment_size]+pred_time
            residual_noise[pointer:pointer+segment_size]=residual_noise[pointer:pointer+segment_size]+residual_time

            pointer=pointer+segment_size-overlapsize
        else: 
            segment=data[pointer::]
            lensegment=len(segment)
            segment=np.concatenate((segment, np.zeros(shape=(int(segment_size-len(segment)),))), axis=0)
            audio_finished=True
            #dostft
            segment_TF=do_stft(segment)

            segment_TF_ds=tf.data.Dataset.from_tensors(segment_TF)

            pred = unet_model.predict(segment_TF_ds.batch(1))
            pred=pred[0]
            residual=segment_TF-pred[0]
            residual=np.array(residual)
            pred_time=do_istft(pred[0])
            pred_time=np.array(pred_time)
            pred_time=pred_time[0:segment_size]
            residual_time=do_istft(residual)
            residual_time=np.array(residual_time)
            residual_time=residual_time[0:segment_size]
            if pointer==0:
                pred_time=pred_time
                residual_time=residual_time
            else:
                pred_time=np.concatenate((np.multiply(pred_time[0:int(overlapsize)], window_left), pred_time[int(overlapsize):int(segment_size)]),axis=0)
                residual_time=np.concatenate((np.multiply(residual_time[0:int(overlapsize)], window_left), residual_time[int(overlapsize):int(segment_size)]),axis=0)

            denoised_data[pointer::]=denoised_data[pointer::]+pred_time[0:lensegment]
            residual_noise[pointer::]=residual_noise[pointer::]+residual_time[0:lensegment]
    return denoised_data

Cloning into 'denoising-historical-recordings'...
remote: Enumerating objects: 219, done.[K
remote: Counting objects: 100% (219/219), done.[K
remote: Compressing objects: 100% (195/195), done.[K
remote: Total 219 (delta 86), reused 98 (delta 16), pack-reused 0[K
Receiving objects: 100% (219/219), 113.67 KiB | 2.58 MiB/s, done.
Resolving deltas: 100% (86/86), done.
--2022-02-07 17:42:20--  https://github.com/eloimoliner/denoising-historical-recordings/releases/download/v0.0/checkpoint.zip
Resolving github.com (github.com)... 140.82.121.3
Connecting to github.com (github.com)|140.82.121.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/401385223/354cec4e-d8be-4126-8b32-9e6509bca537?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20220207%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20220207T174221Z&X-Amz-Expires=300&X-Amz-Signature=d32c08b1cf9f67422d52ca9

NotFoundError: ignored

In [None]:
#@title #Upload file to denoise

#@markdown Execute this cell to upload a single audio recording you would like to denoise (accepted extensions: .wav, .flac, .mp3)
from google.colab import files
uploaded=files.upload()

Saving Carmen-Habanera_(Love_is_Like_a_Woo_-_Marguerite_D'Alvarez_noisy_input.wav to Carmen-Habanera_(Love_is_Like_a_Woo_-_Marguerite_D'Alvarez_noisy_input.wav


In [28]:
#@title #Denoise

#@markdown Execute this cell to denoise the uploaded file
fn="/audio_examples/1st_Movement-Allegro_mod_-_PHILADELPHIA_SYMPHONY_ORCHESTRA_noisy_input.wav"
print('Denoising uploaded file "{name}"'.format(
    name=fn))
denoise_data, bwe_data=process_audio(fn, use_bwe=True, use_denoiser=True )
basename=os.path.splitext(fn)[0]
wav_output_name=basename+"_denoised"+".wav"
sf.write(wav_output_name, denoise_data, 22050)
wav_output_name=basename+"_bwe"+".wav"
sf.write(wav_output_name, bwe_data, 22050)

Denoising uploaded file "/audio_examples/1st_Movement-Allegro_mod_-_PHILADELPHIA_SYMPHONY_ORCHESTRA_noisy_input.wav"
Resampling


  0%|          | 0/40 [00:00<?, ?it/s]


AttributeError: ignored

In [None]:
#@title #Denoise

#@markdown Execute this cell to denoise the uploaded file
for fn in uploaded.keys():
  print('Denoising uploaded file "{name}"'.format(
      name=fn))
  denoise_data=denoise_audio(fn)
  basename=os.path.splitext(fn)[0]
  wav_output_name=basename+"_denoised"+".wav"
  sf.write(wav_output_name, denoise_data, 44100)

Denoising uploaded file "Carmen-Habanera_(Love_is_Like_a_Woo_-_Marguerite_D'Alvarez_noisy_input.wav"
float64


100%|██████████| 41/41 [00:30<00:00,  1.34it/s]


In [None]:
#@title #Download

#@markdown Execute this cell to download the denoised recording
files.download(wav_output_name)

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>