# Deployment

Goal: Fill out this notebook to import your trained model and build a Gradio interface.

After mocking up the Gradio interface, deploy your model to HuggingFace Spaces.

In [1]:
!pip install gradio


In [2]:
# This file contains all the main external libs we'll use
from fastai.imports import * #used for fastai
from IPython import display #used to display media in notebook
import matplotlib.pyplot as plt #used to plot in notebook

from fastai.data.all import *
from fastai.data.external import *
from fastai.vision.all import *

import librosa
import librosa.display

import PIL as Pillow
import gradio as gr
import soundfile as sf
import tempfile

#### Load the learner from the `.pkl` file

This will complain that you need some functions that aren't available in the namespace. Copy the necessary ones from the previous notebook.
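
Why the complaint happens can be sketched with plain `pickle`, which `load_learner` relies on under the hood: a pickled function is stored as a module name plus a function name, not as code, so the name must exist again at load time. The module name `training_notebook` and the one-line `create_spectrogram` below are hypothetical stand-ins, not the real helpers.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the namespace of the training notebook.
training = types.ModuleType("training_notebook")
sys.modules["training_notebook"] = training

# Define a helper inside that namespace, the way the training notebook
# defined create_spectrogram and friends.
exec(
    "def create_spectrogram(path):\n"
    "    return 'spectrogram of ' + path\n",
    training.__dict__,
)

# Pickling a function stores only its module and name, not its code.
payload = pickle.dumps(training.create_spectrogram)

# Simulate a fresh notebook where the helper was never defined.
helper = training.create_spectrogram
del training.create_spectrogram
try:
    pickle.loads(payload)
    load_error = None
except AttributeError as err:
    load_error = err  # the name cannot be found in the namespace

# Putting the definition back into the namespace makes loading succeed.
training.create_spectrogram = helper
restored = pickle.loads(payload)
result = restored("call.aiff")
```

The same reasoning is why the cell below redefines every function and `Transform` class that the exported learner refers to before calling `load_learner`.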

In [3]:
# TODO: put the functions that are missing from the namespace here
# ....
# This code takes a single channel image (greyscale) and converts it into a 3-channel image (RGB)
# It also normalizes so that all values are between [0,255]
def mono_to_color(X, mean=None, std=None, norm_max=None, norm_min=None, eps=1e-6):
    # Stack X as [X,X,X]
    X = np.stack([X, X, X], axis=-1)

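    # Standardize
    # Compare against None explicitly so that a legitimate value of 0 is not replaced
    mean = X.mean() if mean is None else mean
    std = X.std() if std is None else std
    Xstd = (X - mean) / (std + eps)
    _min, _max = Xstd.min(), Xstd.max()
    norm_max = _max if norm_max is None else norm_max
    norm_min = _min if norm_min is None else norm_min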
    if (_max - _min) > eps:
        # Scale to [0, 255]
        V = Xstd
        V[V < norm_min] = norm_min
        V[V > norm_max] = norm_max
        V = 255 * (V - norm_min) / (norm_max - norm_min)
        V = V.astype(np.uint8)
    else:
        # Just zero
        V = np.zeros_like(Xstd, dtype=np.uint8)
    return V

# Lots of libraries and methods for generating a spectrogram
# Under the hood all these algorithms rely on a Fast Fourier Transform
# Originally, I wanted to use torchaudio because that is CUDA enabled and can be accelerated on GPU
# Unfortunately, the shape of the data returned simply did not make any sense and working with tensors
# as opposed to numpy arrays was incredibly annoying. It just didn't work.
# I tried other libraries as well but ultimately settled on librosa since it seems the most widely used
# There are fast.ai packages like fastaudio and other forks, but like many open-source projects they are unmaintained
# Using those types of packages breaks the dependencies and, in my environment, forced a non-CUDA
# accelerated version of pytorch, which is useless
def create_spectrogram(file_path):
    samples, sample_rate = librosa.core.load(file_path, sr=2000)
    
    # the parameters here are tunable and are hard-coded to what I've found works well for this dataset
    n_fft=256
    hop_length=32
    win_length=192
    
    # Compute spectrogram, using some sensible defaults
    # Opportunities here to tweak possibly, just not sure how much it would help
    # We don't use Nicholas' settings as for some reason they get bad results with this library
    D = librosa.stft(samples, n_fft=n_fft, hop_length=hop_length, win_length=win_length)
    
    # Normalize to decibels
    S_dB = librosa.amplitude_to_db(np.abs(D), ref=np.max)
    return S_dB, sample_rate

# This is an alternative type of spectrogram.
# My understanding is that it normalizes the spectrogram based on what humans can perceive (the mel scale).
# Settings here are open to tweaking
def create_mel_spectrogram(file_path):
    samples, sample_rate = librosa.core.load(file_path, sr=2000)
    
    # the parameters here are tunable and are hard-coded to what I've found works well for this dataset
    n_fft=256
    hop_length=32
    win_length=192
    fmax = 1000 # cut off at 1000Hz
    
    # Compute mel spectrogram
    S = librosa.feature.melspectrogram(y=samples, sr=sample_rate, fmax=fmax, n_fft=n_fft, hop_length=hop_length, win_length=win_length, center=False)
    
    # Normalize to decibels
    S_dB = librosa.power_to_db(S, ref=np.max)
    return S_dB, sample_rate


# Converts a spectrogram (numpy) to a 3-channel (RGB) Fast.AI Image
def spectrogram_to_image(spec):
    
    # Most vision models in fast.ai use images with three channels (RGB)
    # spectrogram functions don't return images, they return data
    # plot libraries like matplotlib help us visualize the data as an image, but it is not an image
    # it is a multi-dimensional array-like object whose values can be positive or negative
    
    # We need to convert it from this format into a 3 channel (RGB) whose values are bounded between [0, 255]
    colored_np = mono_to_color(spec)
    
    # Pillow is a fork of PIL (standard Python image library), we consider Pillow.Image to be regular Python images
    # In order to use Pillow features like crop, we have to convert the image from numpy into PIL (Pillow)
    pillow_image = Pillow.Image.fromarray(colored_np) # convert to regular python image
    
    # When you manually convert a spectrogram to an image without using matplotlib you have to flip it vertically
    flipped_image = pillow_image.transpose(Image.FLIP_TOP_BOTTOM) # flip image
    
    # This will crop the image by taking from the height to make a square
    h, w, *other = pillow_image.shape
    cropped_image = flipped_image.crop((0, h-w, w, h))
    cropped_image_np = np.array(cropped_image) # back to numpy
        
    # Kinda confusing but fast.ai has a class called PILImage and so we convert our real PIL image into a fast.ai one
    fast_ai_image = PILImage.create(cropped_image_np)
    return fast_ai_image


# I chose to load the audio files directly into fast.ai using the DataBlock API.
# Alternatively, we could have pre-computed all the spectrograms in the 00_getting_started.ipynb but
# I decided against it because then it would be unlikely for anyone to actually make modifications to the images
# Writing 40,000 files to disk is painfully slow so you wouldn't even be able to get started quickly
# This method creates a transformer which can take paths to audio files and transform them into spectrogram images.
# If you do some research you will see lots of people doing pretty terrible things involving file I/O because
# it is not easy to figure out how to turn a spectrogram into a 3-channel image and further how to get that
# into a fast.ai image. This solution is pretty clever because since it is all in-memory it is insanely fast and
# does not rely on any type of pre-computation.
class SpecgramTransform(Transform):
    def __init__(self): self.aug = create_spectrogram
    def encodes(self, audio_file: Path):
        aug_img, sample_rate = self.aug(audio_file)
        image = spectrogram_to_image(aug_img)
        return image

# Alternative transformer for generating images of Mel Spectrograms
class MelSpecgramTransform(Transform):
    def __init__(self): self.aug = create_mel_spectrogram
    def encodes(self, audio_file: Path):
        aug_img, sample_rate = self.aug(audio_file)
        image = spectrogram_to_image(aug_img)
        return image

    
# IGNORE unless you decide to use SpecgramTransform
# We may need to crop images. If you use the SpecgramTransform, cropping will likely be required 
# to turn the image into a square.
# It could be reasonable to cut-off the image at frequencies we know a whale call couldn't exist
# Ultimately, images will need to be square I believe so they would need to get filled with something
# The reason I created this transform is because fast.ai doesn't give you a transformer for precision cropping
# You either crop and cut out the center or you don't crop at all
# This transformer lets you optionally crop from any direction and leaves sides alone that you don't specify crops for
class CropImageTransform(Transform):
    def __init__(self, left=None, upper=None, right=None, lower=None):
        print("test constructor")
        self.aug = self.__crop_image
        self.box = (left, upper, right, lower)
    
    def __compute_box(self, image):
        # get current dimensions of image
        # *other is because we don't know if we will receive two elements or more
        # we get two for a greyscale image, we get three for a RGB
        h, w, *other = image.shape
        
        #get desired crop entered by user
        left, upper, right, lower = self.box
        
        #don't crop sides that user didn't want cropped
        left = 0 if left is None else left
        upper = 0 if upper is None else upper
        right = w if right is None else right
        lower = h if lower is None else lower
        
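        # return the computed box without overwriting self.box, so that the
        # None placeholders are resolved again for each image's own dimensions
        return (left, upper, right, lower)
        
    def __crop_image(self, image):
        print("test crop")

        box = self.__compute_box(image)
        print(box)
        image_cropped = image.crop(box)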
        return image_cropped
        
    # this transformer works on PILImages (fast.ai)
    # this transformer does not work on PIL.Image (Pillow/PIL)
    def encodes(self, image):
        print("encode")
        print(image)
        cropped_image = self.aug(image)
        return cropped_image
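
To see why `spectrogram_to_image` has to crop, it helps to work out the array shape that the settings in `create_spectrogram` produce. The sketch below only does the arithmetic for an STFT (no audio is loaded). The 2-second clip length is an assumption about the dataset, and `stft_shape` and `square_crop_box` are hypothetical helper names used for illustration.

```python
def stft_shape(n_samples, n_fft, hop_length, center=True):
    """Return (frequency_bins, time_frames) for an STFT of n_samples samples."""
    freq_bins = 1 + n_fft // 2
    if center:
        # the signal is padded by n_fft // 2 on both sides before framing
        frames = 1 + n_samples // hop_length
    else:
        # only frames that fit entirely inside the signal are kept
        frames = 1 + (n_samples - n_fft) // hop_length
    return freq_bins, frames


def square_crop_box(height, width):
    """Crop box (left, upper, right, lower) in the form spectrogram_to_image uses."""
    return (0, height - width, width, height)


# Assumption: a 2-second clip resampled to sr=2000, i.e. 4000 samples.
n_samples = 2 * 2000

# Same n_fft and hop_length as create_spectrogram (center=True is the default).
spec_shape = stft_shape(n_samples, n_fft=256, hop_length=32)   # (129, 126)
box = square_crop_box(*spec_shape)                              # (0, 3, 126, 129)

# create_mel_spectrogram passes center=False, which yields fewer frames.
mel_frames = stft_shape(n_samples, n_fft=256, hop_length=32, center=False)[1]  # 118
```

Under these assumptions the plain spectrogram is 129 rows by 126 columns, so the crop removes the top 3 rows of the flipped image, which are the three highest frequency bins.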

We have to define a prediction function for our model:

In [4]:
learn = load_learner('model.pkl')

In [5]:
labels = learn.dls.vocab

def predict(audio):
    # grab data from Gradio upload
    sample_rate, data = audio 
    
    # recall that our dataset loads paths to audio files first, not the files themselves
    # so let's write the upload to a temporary file on disk; the with-block deletes it
    # when we are done, even if the prediction raises an error
    with tempfile.NamedTemporaryFile(suffix='.aiff') as temp_file:
        # use soundfile library to write to temp file
        sf.write(temp_file.name, data, sample_rate)
        
        # get our prediction results
        pred,pred_idx,probs = learn.predict(Path(temp_file.name))
    
    # return prediction results
    return {labels[i]: float(probs[i]) for i in range(len(labels))}
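
The write-to-a-path-then-clean-up pattern used in `predict` can be tried without the model. This is a minimal sketch that swaps in the standard-library `wave` module for `soundfile` and a `TemporaryDirectory` for `NamedTemporaryFile` (so the file can be reopened by name on any platform). `run_on_temp_wav` and `describe` are hypothetical names, and `describe` stands in for `learn.predict`.

```python
import os
import tempfile
import wave
from array import array


def run_on_temp_wav(samples, sample_rate, consumer):
    """Write 16-bit mono samples to a temporary WAV file, hand its path to
    consumer, and delete the file once consumer returns."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "clip.wav")
        with wave.open(path, "wb") as wav:
            wav.setnchannels(1)           # mono
            wav.setsampwidth(2)           # 2 bytes per sample (int16)
            wav.setframerate(sample_rate)
            wav.writeframes(array("h", samples).tobytes())
        # the file exists on disk only while this call runs
        return consumer(path)


seen_paths = []


def describe(path):
    """Stand-in for learn.predict: anything that needs a real file path."""
    seen_paths.append(path)
    with wave.open(path, "rb") as wav:
        return wav.getnframes(), wav.getframerate()


silence = [0] * 4000                      # 2 seconds of silence at 2000 Hz
info = run_on_temp_wav(silence, 2000, describe)
file_still_exists = os.path.exists(seen_paths[0])
```

The temporary file lives on disk, not in memory, and is removed as soon as the consumer has finished with it.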

In [6]:
title = "North Atlantic Right Whale Classifier"
description = "A NARW up-call classifier trained on the Marinexplore and Cornell University Whale Detection Challenge dataset (Kaggle) with fastai."
article="<p style='text-align: center'><a href='https://www.kaggle.com/competitions/whale-detection-challenge' target='_blank'>Dataset</a></p>"
enable_queue=True

# gr.outputs.Label is deprecated; use the gr.Label component instead
gr.Interface(fn=predict, inputs=gr.Audio(type="numpy"), outputs=gr.Label(num_top_classes=2),title=title,description=description,article=article,allow_flagging="never").launch(share=True)


Running on local URL:  http://127.0.0.1:7860
Running on public URL: https://20886.gradio.app

This share link expires in 72 hours. For free permanent hosting, check out Spaces: https://huggingface.co/spaces


(<gradio.routes.App at 0x7fd56e313f10>,
 'http://127.0.0.1:7860/',
 'https://20886.gradio.app')

#### Your Job

Deploy to HuggingFace Spaces
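
As a starting point, here is a rough sketch for staging the files a Gradio Space would need. It assumes the usual Spaces layout, where `app.py` is the entry point and `requirements.txt` lists extra pip packages; `model.pkl` is the file loaded by `load_learner` above. The package list is taken from this notebook's imports and is left unpinned, so pin versions to match your training environment. `gradio` is assumed to be provided by the Space's Gradio SDK. `write_requirements` and `missing_files` are hypothetical helper names.

```python
import tempfile
from pathlib import Path

# Assumed Space layout (see the lead-in): entry point, dependencies, exported learner.
REQUIRED_FILES = ("app.py", "requirements.txt", "model.pkl")

# Packages imported by this notebook, unpinned for illustration only.
PACKAGES = ("fastai", "librosa", "soundfile")


def write_requirements(folder, packages=PACKAGES):
    """Write one package name per line to requirements.txt inside folder."""
    path = Path(folder) / "requirements.txt"
    path.write_text("\n".join(packages) + "\n")
    return path


def missing_files(folder, required=REQUIRED_FILES):
    """Return the required file names that are not yet present in folder."""
    folder = Path(folder)
    return [name for name in required if not (folder / name).is_file()]


# Usage: stage into a scratch folder and see what still has to be added.
with tempfile.TemporaryDirectory() as staging:
    requirements_text = write_requirements(staging).read_text()
    still_missing = missing_files(staging)
```

`app.py` would hold the imports, the helper functions from cell 3, `load_learner`, `predict`, and the `gr.Interface(...)` call from this notebook. On Spaces the app is already served publicly, so `share=True` should not be needed in `launch()`.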