# Simple Application

This notebook is a small application that runs the trained model on new audio and classifies it into one of the following 41 classes:

 > 'Hi-hat' 'Saxophone' 'Trumpet' 'Glockenspiel' 'Cello' 'Knock'
 'Gunshot_or_gunfire' 'Clarinet' 'Computer_keyboard' 'Keys_jangling'
 'Snare_drum' 'Writing' 'Laughter' 'Tearing' 'Fart' 'Oboe' 'Flute' 'Cough'
 'Telephone' 'Bark' 'Chime' 'Bass_drum' 'Bus' 'Squeak' 'Scissors'
 'Harmonica' 'Gong' 'Microwave_oven' 'Burping_or_eructation' 'Double_bass'
 'Shatter' 'Fireworks' 'Tambourine' 'Cowbell' 'Electric_piano' 'Meow'
 'Drawer_open_or_close' 'Applause' 'Acoustic_guitar' 'Violin_or_fiddle'
 'Finger_snapping'
 
The application is built with ipywidgets and Voilà. ipywidgets provides the buttons and the output areas that display the audio and the results. Voilà renders the notebook as a standalone page with the code cells hidden, so the app looks clean.

# Imports & setup

In [94]:
import librosa
import librosa.display
from IPython.display import Audio
import pathlib
from pathlib import Path
import os
from fastai.vision.all import *
from fastai.vision.widgets import *
#import torchvision

In [None]:
# Suppress warnings that appear when uploading and classifying audio.

# Add recorder inside the notebook

In [95]:
# !sudo apt install ffmpeg
# !pip install torchaudio ipywebrtc
# !jupyter nbextension enable --py widgetsnbextension

In [96]:
from ipywebrtc import AudioRecorder, CameraStream

In [97]:
# The paths to where the recordings are being saved
data_path = Path('../../data/')
data_path.mkdir(exist_ok=True)
audio_path = data_path/'audio'
audio_path.mkdir(exist_ok=True)

In [130]:
# Creating the recorder to put in a widget later
camera = CameraStream(constraints={'audio': True,'video':False})
recorder = AudioRecorder(stream=camera)

# Load model

The path below matches the folder layout of the GitHub repo. Change it if your model is stored somewhere else.
```
├───models
│   └─── model_V2.pkl
└───nbs
    ├─── Application.ipynb
    └───.ipynb_checkpoints
```

In [None]:
path = Path('../')
model_path = path/'models'

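try:
    learn_inf = load_learner(model_path/'model_V2.pkl')
except NotImplementedError:
    # The model was exported on a POSIX system. On Windows, PosixPath cannot be
    # instantiated, so temporarily map it to WindowsPath while unpickling.
    posix_backup = pathlib.PosixPath
    pathlib.PosixPath = pathlib.WindowsPath
    try:
        learn_inf = load_learner(model_path/'model_V2.pkl')
    finally:
        # Always restore the original class, even if loading fails again
        pathlib.PosixPath = posix_backup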
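
The `except` branch above temporarily swaps `pathlib.PosixPath` for `pathlib.WindowsPath`, a common workaround when a learner exported on Linux or macOS is loaded on Windows. A minimal sketch of the same idea as a reusable context manager follows. The name `posix_path_as_windows` is hypothetical and is not part of fastai or pathlib.

```python
import pathlib
from contextlib import contextmanager

@contextmanager
def posix_path_as_windows():
    """Temporarily make pathlib.PosixPath refer to pathlib.WindowsPath."""
    original = pathlib.PosixPath
    pathlib.PosixPath = pathlib.WindowsPath
    try:
        yield
    finally:
        # Restore the original class even if the body raises
        pathlib.PosixPath = original

# Usage (assumes `load_learner` and `model_path` from this notebook):
# with posix_path_as_windows():
#     learn_inf = load_learner(model_path/'model_V2.pkl')
```

Because the restore sits in a `finally` block, a failed load cannot leave `pathlib` patched for the rest of the session.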
    

# Creating widgets

In [117]:
btn_upload = widgets.FileUpload(accept=".wav,.mp3")
out_rec = widgets.Output()
out_pl = widgets.Output()
lbl_pred = widgets.Label()
lbl_pred.value = 'Please select audio'
btn_run = widgets.Button(description='Classify recorded')

In [118]:
out_rec.clear_output()
with out_rec: display(recorder)

# Converting audio to images

This function converts audio to a log-mel spectrogram image in the same way as during training, with a few tweaks so that it also works with audio coming from the uploader.
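
As a rough sanity check on the size of the resulting image, the number of spectrogram columns can be estimated from the clip length. The sketch below assumes librosa's default centered framing, where the frame count is one plus the number of whole hops in the signal, and uses the `hop_length` and `n_mels` values from the function in the next cell. `spectrogram_shape` is a hypothetical helper, not a librosa function.

```python
def spectrogram_shape(n_samples, hop_length=512, n_mels=80):
    """Return (rows, columns) of a mel spectrogram for a clip of n_samples.

    Assumes centered framing: n_frames = 1 + n_samples // hop_length.
    """
    n_frames = 1 + n_samples // hop_length
    return n_mels, n_frames

# A clip of 63504 samples at 22050 Hz (the default rate of librosa.load)
# lasts 2.88 seconds.
print(spectrogram_shape(63504))  # (80, 125)
```

Longer clips therefore produce wider images, while the height stays fixed at `n_mels`.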

In [119]:
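def log_mel_spec_tfm(audio, dst_path = path/'../data/imgs/uploaded'):
    # `audio` is the (samples, sample_rate) tuple returned by librosa.load
    data, sample_rate = audio
    
    n_fft = 1024
    hop_length = 512
    n_mels = 80
    fmin = 20
    fmax = sample_rate / 2 
    
    # Pass the signal as `y=`: positional audio arguments are an error from librosa 0.10
    mel_spec_power = librosa.feature.melspectrogram(y=data, sr=sample_rate, n_fft=n_fft, 
                                                    hop_length=hop_length, 
                                                    n_mels=n_mels, power=2.0, 
                                                    fmin=fmin, fmax=fmax)
    
    # Convert power to decibels relative to the loudest bin
    mel_spec_db = librosa.power_to_db(mel_spec_power, ref=np.max)
    
    # Create the output folder, including any missing parent folders
    dst_path.mkdir(parents=True, exist_ok=True)
    try:
        fname = list(btn_upload.value)[0]
    except IndexError:
        # Nothing has been uploaded, so the audio comes from the recorder
        fname = 'file.wav'
    
    # Save the spectrogram as a PNG named after the audio file
    img_path = dst_path / (Path(fname).stem + '.png')
    plt.imsave(img_path, mel_spec_db)
    
    return img_path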
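
`librosa.power_to_db` with `ref=np.max` expresses every value in decibels relative to the loudest bin, so the peak becomes 0 dB and everything else is negative. The plain-Python sketch below illustrates that computation on a tiny made-up matrix, assuming librosa's default floor (`amin=1e-10`) and dynamic range (`top_db=80`). `power_to_db_sketch` is a hypothetical name, and the real function operates on NumPy arrays.

```python
import math

def power_to_db_sketch(power, amin=1e-10, top_db=80.0):
    """Convert a 2-D list of power values to dB relative to its maximum."""
    ref = max(max(row) for row in power)
    ref_db = 10.0 * math.log10(max(amin, ref))
    db = [[10.0 * math.log10(max(amin, v)) - ref_db for v in row]
          for row in power]
    # Clip very quiet bins so nothing lies more than top_db below the peak
    floor = max(max(row) for row in db) - top_db
    return [[max(v, floor) for v in row] for row in db]

# Made-up power values: the peak maps to 0 dB, a tenth of the peak to
# -10 dB, and silence is clipped at -80 dB.
print(power_to_db_sketch([[1.0, 0.1], [0.01, 0.0]]))
```

This is why the saved image has a bounded value range regardless of how loud the recording was.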

# Use the model on uploaded data

When an audio file is uploaded, this function runs and makes a prediction on the image representation of the audio.

In [120]:
def on_click(change):
    out_pl.clear_output()
    with out_pl: display(Audio(btn_upload.data[-1]))
    dst_path = path/'../data/imgs/uploaded'
    audio = librosa.load(io.BytesIO(btn_upload.data[-1]))
    audio_img = log_mel_spec_tfm(audio, dst_path)
    pred,pred_idx,probs = learn_inf.predict(audio_img)
    lbl_pred.value = f'Prediction: {pred}; Probability: {probs[pred_idx]:.04f}'
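
`learn_inf.predict` returns the probabilities for all classes, not only the winning one, so the label could also list a few runners-up. The sketch below ranks plain Python lists. `top_k` is a hypothetical helper, and the class names and probabilities are made up for illustration. In this notebook the names would presumably come from `learn_inf.dls.vocab` and the values from `probs.tolist()`.

```python
def top_k(class_names, probabilities, k=3):
    """Return the k most likely (name, probability) pairs, best first."""
    ranked = sorted(zip(class_names, probabilities),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Made-up example values
names = ['Bark', 'Meow', 'Cough', 'Flute']
probs = [0.05, 0.70, 0.20, 0.05]
for name, p in top_k(names, probs, k=2):
    print(f'{name}: {p:.4f}')
```

Showing two or three candidates can make borderline predictions easier to interpret than a single label.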

`on_click_live` is almost the same as `on_click`. It is called when the classify button is clicked and classifies the sound that was recorded with the built-in recorder.

In [121]:
def on_click_live(change):
    out_pl.clear_output()
    update_live_audio()
    audio = librosa.load(audio_path/'file.wav')
    x, sr = audio
    with out_pl: display(Audio(data=x, rate=sr))
    dst_path = path/'../data/imgs/uploaded'
    audio_img = log_mel_spec_tfm(audio, dst_path)
    pred,pred_idx,probs = learn_inf.predict(audio_img)
    lbl_pred.value = f'Prediction: {pred}; Probability: {probs[pred_idx]:.04f}'

In [122]:
# Saves the recording as WebM, converts it to a mono WAV file with ffmpeg and loads the result
def update_live_audio():
    with open(audio_path/'recording.webm', 'wb') as f:
        f.write(recorder.audio.value)
    !ffmpeg -i ../../data/audio/recording.webm -ac 1 -f wav ../../data/audio/file.wav -y -hide_banner -loglevel panic
    sig, sr = librosa.load(audio_path/'file.wav')
    print(sig.shape)
    Audio(data=sig, rate=sr)
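
The `!ffmpeg` line relies on IPython shell syntax and hard-codes the paths, so it only works inside a notebook started from this folder. A minimal sketch of the same conversion with `subprocess` follows, using only the flags already present above. `build_ffmpeg_cmd` and `convert_to_wav` are hypothetical helper names, and `ffmpeg` is assumed to be available on the `PATH`.

```python
import subprocess
from pathlib import Path

def build_ffmpeg_cmd(src, dst):
    """Build the ffmpeg argument list that converts src to a mono WAV at dst."""
    return ['ffmpeg', '-y', '-hide_banner', '-loglevel', 'panic',
            '-i', str(src), '-ac', '1', '-f', 'wav', str(dst)]

def convert_to_wav(src, dst):
    """Run ffmpeg and raise CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_cmd(src, dst), check=True)

# Same locations as in this notebook
audio_path = Path('../../data/audio')
print(build_ffmpeg_cmd(audio_path/'recording.webm', audio_path/'file.wav'))
```

Passing the arguments as a list avoids shell quoting issues, and `check=True` surfaces a failed conversion instead of silently classifying a stale `file.wav`.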

In [123]:
btn_upload.observe(on_click, names=['data'])

In [124]:
btn_run.on_click(on_click_live)

The `VBox` stacks the listed widgets vertically, which keeps the layout tidy.

If you upload multiple audio files, only the last one is used for the prediction.

In [126]:
widgets.VBox([widgets.Label('Select your audio by using "upload" or record by tapping the record button followed by "Classify recorded"'),
      out_rec, btn_run, btn_upload, out_pl, lbl_pred])

VBox(children=(Label(value='Select your audio by using "upload" or record by tapping the record button followe…

(63504,)




In [28]:
#!pip install voila
#!jupyter serverextension enable voila --sys-prefix