Noisy Audio Gen
===========================

This notebook is the next step after the VAE Output Transformer. With that step complete, I now have samples of VAE + NSynth audio data. I will use this same pipeline for training and validating the final model. As part of that pipeline, this notebook creates new audio samples by mixing the VAE-generated audio with noise.
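
To make the idea of "mixing audio with noise" concrete, here is a minimal sketch of mixing a clean signal with a noise clip at a chosen signal-to-noise ratio. The function name `mix_at_snr`, the `snr_db` parameter and the toy sine-wave input are assumptions made for illustration; the mixing used in the actual pipeline may differ.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Mix `noise` into `clean` so the result has roughly `snr_db` dB SNR.

    Hypothetical helper for illustration, not the pipeline's own code.
    """
    # Repeat or trim the noise so it matches the clean signal's length
    if len(noise) < len(clean):
        reps = int(np.ceil(len(clean) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[:len(clean)]

    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    if noise_power == 0:
        # Silent noise clip: nothing to mix in
        return clean.copy()

    # Scale the noise so that clean_power / scaled_noise_power == 10 ** (snr_db / 10)
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise

# Toy input: one second of a 440 Hz tone at 16 kHz plus half a second of random noise
sr = 16000
t = np.arange(sr) / sr
clean = 0.5 * np.sin(2 * np.pi * 440 * t)
rng = np.random.default_rng(0)
noise = rng.normal(size=sr // 2)
mixed = mix_at_snr(clean, noise, snr_db=10)
```

A lower `snr_db` gives a noisier mix. Depending on the levels involved, the result may need to be rescaled or clipped to the [-1, 1] range before being written to a WAV file.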

In [163]:
!pip install pandas soundfile numpy librosa
import glob
import os
import sys
import stat
import json
import librosa as lr
import soundfile as sf
import numpy as np
import pandas as pd
from IPython.display import Audio

Defaulting to user installation because normal site-packages is not writeable

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip available: [0m[31;49m22.1.2[0m[39;49m -> [0m[32;49m22.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpython3 -m pip install --upgrade pip[0m


In [150]:
# we need an index of the class labels
class_labels = pd.read_csv('data/audioset/class_labels_indices.csv', index_col=0)
class_labels = class_labels.set_index('mid').to_dict()['display_name']

audioset = pd.read_csv('data/audioset/balanced_train_segments.csv')
def named_labels(labels):
    return set(class_labels[s.strip()] for s in labels.split(','))
    
audioset['length'] = audioset['end_seconds'] - audioset['start_seconds']
audioset['labels'] = audioset['positive_labels'].apply(named_labels)
audioset.drop(labels=['positive_labels'], axis=1, inplace=True)
audioset

Unnamed: 0,YTID,start_seconds,end_seconds,length,labels
0,--PJHxphWEs,30.0,40.0,10.0,"{Gush, Speech}"
1,--ZhevVpy1s,50.0,60.0,10.0,{Toothbrush}
2,--aE2O5G5WE,0.0,10.0,10.0,"{Music, Speech, Goat}"
3,--aO5cdqSAg,30.0,40.0,10.0,"{Male singing, Child singing}"
4,--aaILOrkII,200.0,210.0,10.0,"{Gunshot, gunfire, Cap gun}"
...,...,...,...,...,...
22155,zyqg4pYEioQ,20.0,30.0,10.0,"{Speech, Sewing machine}"
22156,zz0ddNfz0h0,30.0,40.0,10.0,"{Car, Motor vehicle (road), Ice cream truck, i..."
22157,zz8TGV83nkE,80.0,90.0,10.0,"{Engine, Motorcycle, Motor vehicle (road), Veh..."
22158,zzlK8KDqlr0,370.0,380.0,10.0,"{Computer keyboard, Clicking, Inside, small ro..."


# Pick out noise

I looked at the AudioSet ontology for types of noise:
https://research.google.com/audioset/ontology/noise_1.html

I decided that the labels of interest are "Hubbub, speech noise, speech babble" (a single AudioSet label, despite the commas), "White noise" and "Vibration". So now we're going to filter the AudioSet data down to only the clips that carry at least one of these labels.

In [157]:
def is_noise(labels):
    return 'Hubbub, speech noise, speech babble' in labels or 'White noise' in labels or 'Vibration' in labels

audioset = audioset[audioset['labels'].apply(is_noise)]
len(audioset)

177

In [158]:
# This reduces the dataset to only 177 items
audioset

Unnamed: 0,YTID,start_seconds,end_seconds,length,labels
121,-ETSfElMyNc,220.0,230.0,10.0,{White noise}
293,-dEOa2GkXHw,30.0,40.0,10.0,"{Speech, White noise}"
611,0MJPqGKIbZg,20.0,30.0,10.0,"{Outside, rural or natural, Speech, Hubbub, sp..."
617,0NSzeHaja5o,30.0,40.0,10.0,"{Speech, Chatter, Hubbub, speech noise, speech..."
779,0fm0oU8FO0U,30.0,40.0,10.0,"{Helicopter, Vehicle, White noise}"
...,...,...,...,...,...
21413,xXjmPTooKvs,30.0,40.0,10.0,"{Vehicle, Engine, Engine starting, Motor vehic..."
21664,yMdOxfnxkB0,10.0,20.0,10.0,"{Music, Outside, urban or manmade, Speech, Hub..."
21773,yiSeCcyJuxE,10.0,20.0,10.0,"{Music, White noise}"
21915,zIM4eLqtczE,0.0,10.0,10.0,"{Speech, Hubbub, speech noise, speech babble}"
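
As a quick sanity check on the filter, it can help to see how many clips carry each of the chosen noise labels. The sketch below does this with a `Counter`; `NOISE_LABELS` and `count_noise_labels` are names introduced here for illustration, and the sample rows are copied from the table above rather than the full dataset.

```python
from collections import Counter

# The three labels tested by is_noise(); the first one is a single label containing commas
NOISE_LABELS = {'Hubbub, speech noise, speech babble', 'White noise', 'Vibration'}

def count_noise_labels(label_sets):
    """Count how many clips carry each noise label (hypothetical helper)."""
    counts = Counter()
    for labels in label_sets:
        # Set intersection keeps only the noise labels present on this clip
        counts.update(labels & NOISE_LABELS)
    return counts

# A few label sets taken from the rows shown above
sample = [
    {'White noise'},
    {'Speech', 'White noise'},
    {'Helicopter', 'Vehicle', 'White noise'},
    {'Music', 'White noise'},
    {'Speech', 'Hubbub, speech noise, speech babble'},
]
counts = count_noise_labels(sample)
```

On the real frame the same helper could be called as `count_noise_labels(audioset['labels'])`, since each entry in that column is already a set of display names.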


In [221]:
# we're going to need a youtube downloader to grab the samples. At only 177, I think I won't get banned.
# But I'll use a VPN just to be sure
os.makedirs('tools', exist_ok=True)

# I'm running linux so I can just use the curl command
if not os.path.exists('tools/youtube-dl'):
    !curl -L https://yt-dl.org/downloads/latest/youtube-dl -o tools/youtube-dl

# Now we'll get to downloading the data
os.makedirs('data/audioset/audio', exist_ok=True)
filenames = []
for i, row in audioset.iterrows():
    audio_filename = f'data/audioset/audio/{row["YTID"]}.wav'
    
    # if we already have the file - skip
    if os.path.exists(audio_filename):
        filenames.append(audio_filename)
        continue
        
    full_audio_filename = f'data/audioset/audio/{row["YTID"]}-full.aac'
    if not os.path.exists(full_audio_filename):
        print(f'Downloading {row["YTID"]}...')
        download_url = f'https://youtube.com/watch?v={row["YTID"]}'
        self_exec = sys.executable
        !"{self_exec}" tools/youtube-dl "{download_url}" --quiet --no-playlist --extract-audio --audio-format aac --output "{full_audio_filename}"
        print(f'   - Done')
 
    !ffmpeg -loglevel fatal -i "{full_audio_filename}" -ar 16000 -ss "{row['start_seconds']}" -to "{row['end_seconds']}" "{audio_filename}"
    filenames.append(audio_filename)


Downloading 19v5EvjdToU...
ERROR: Private video
Sign in if you've been granted access to this video
   - Done
Downloading 1XKKLkqdZ34...
   - Done
Downloading 2VwH-n3jjL0...
   - Done
Downloading 2rPv0UjHwR4...
ERROR: Video unavailable
   - Done
Downloading 3KWSsnZSrq8...
ERROR: Video unavailable
   - Done
Downloading 4kPu2QmfKaQ...
ERROR: Private video
Sign in if you've been granted access to this video
   - Done
Downloading 6hb0PdtXhvA...
ERROR: Video unavailable
This video is no longer available due to a copyright claim by 株式会社バップ
   - Done
Downloading 9PyWiDtI0dw...
ERROR: Video unavailable
This video is no longer available because the YouTube account associated with this video has been terminated.
   - Done
Downloading Fb1Gm9IfOe4...
ERROR: Video unavailable
   - Done
Downloading Jlh_dAyVBf4...
ERROR: Private video
Sign in if you've been granted access to this video
   - Done
Downloading KRGULa
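
The log shows that the loop prints "Done" even when a download fails, and the `ffmpeg` step then runs against a file that does not exist. One possible way to make such failures visible is to run `ffmpeg` through `subprocess` and check its exit code. The sketch below reuses the same `ffmpeg` flags as the cell above; `build_trim_command` and `trim_clip` are hypothetical helper names, and `trim_clip` assumes `ffmpeg` is on the PATH.

```python
import subprocess

def build_trim_command(src, dst, start_seconds, end_seconds, sample_rate=16000):
    """Build the ffmpeg argument list used to cut and resample one clip."""
    return [
        'ffmpeg', '-loglevel', 'fatal',
        '-i', str(src),
        '-ar', str(sample_rate),    # resample to 16 kHz, as in the cell above
        '-ss', str(start_seconds),  # clip start within the full audio
        '-to', str(end_seconds),    # clip end within the full audio
        str(dst),
    ]

def trim_clip(src, dst, start_seconds, end_seconds):
    """Run ffmpeg and return True only if it exited successfully."""
    cmd = build_trim_command(src, dst, start_seconds, end_seconds)
    return subprocess.run(cmd).returncode == 0

# Building the command does not require ffmpeg to be installed
cmd = build_trim_command('in-full.aac', 'out.wav', 30.0, 40.0)
```

Passing the arguments as a list avoids shell quoting problems, which matters here because some YouTube IDs start with a dash.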

In [222]:
audioset = audioset.copy()
audioset['filename'] = filenames
audioset

Unnamed: 0,YTID,start_seconds,end_seconds,length,labels,filename
121,-ETSfElMyNc,220.0,230.0,10.0,{White noise},data/audioset/audio/-ETSfElMyNc.wav
293,-dEOa2GkXHw,30.0,40.0,10.0,"{Speech, White noise}",data/audioset/audio/-dEOa2GkXHw.wav
611,0MJPqGKIbZg,20.0,30.0,10.0,"{Outside, rural or natural, Speech, Hubbub, sp...",data/audioset/audio/0MJPqGKIbZg.wav
617,0NSzeHaja5o,30.0,40.0,10.0,"{Speech, Chatter, Hubbub, speech noise, speech...",data/audioset/audio/0NSzeHaja5o.wav
779,0fm0oU8FO0U,30.0,40.0,10.0,"{Helicopter, Vehicle, White noise}",data/audioset/audio/0fm0oU8FO0U.wav
...,...,...,...,...,...,...
21413,xXjmPTooKvs,30.0,40.0,10.0,"{Vehicle, Engine, Engine starting, Motor vehic...",data/audioset/audio/xXjmPTooKvs.wav
21664,yMdOxfnxkB0,10.0,20.0,10.0,"{Music, Outside, urban or manmade, Speech, Hub...",data/audioset/audio/yMdOxfnxkB0.wav
21773,yiSeCcyJuxE,10.0,20.0,10.0,"{Music, White noise}",data/audioset/audio/yiSeCcyJuxE.wav
21915,zIM4eLqtczE,0.0,10.0,10.0,"{Speech, Hubbub, speech noise, speech babble}",data/audioset/audio/zIM4eLqtczE.wav
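
Because several downloads failed, some of the paths in the `filename` column will point at files that were never created. A minimal sketch of dropping those rows before the audio is used, assuming a hypothetical helper `keep_existing` and a small temporary directory in place of the real data:

```python
import os
import tempfile
import pandas as pd

def keep_existing(df, column='filename'):
    """Return only the rows whose path in `column` exists on disk."""
    exists = df[column].apply(os.path.exists)
    return df[exists].copy()

# Demo with one file that exists and one that does not
with tempfile.TemporaryDirectory() as tmp:
    present = os.path.join(tmp, 'present.wav')
    missing = os.path.join(tmp, 'missing.wav')
    open(present, 'wb').close()  # create an empty placeholder file
    demo = pd.DataFrame({
        'YTID': ['present', 'missing'],
        'filename': [present, missing],
    })
    kept = keep_existing(demo)
```

Applied to the frame above, this would be `audioset = keep_existing(audioset)`, leaving only the clips that were actually downloaded and trimmed.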
