### Gradio and HuggingFace

In this demo, we show how to build deep learning applications that are ready to use or deploy.

Hugging Face hosts thousands of pre-trained models in its [Model Hub](https://huggingface.co/models). It also provides a high-level API, [Pipeline](https://huggingface.co/docs/transformers/main_classes/pipelines), so we can easily use and deploy these pre-trained models.

`gradio` provides APIs for building web applications around our pre-trained models from Hugging Face, including ready-made input and output UI components.

After building the `gradio` app, we can host it permanently on [Hugging Face Spaces](https://huggingface.co/blog/gradio-spaces).

Let us first install Hugging Face `transformers` and `gradio`.

**Note:** For some examples, it is best to launch the app in another browser tab so that it can access required inputs such as the microphone or webcam. Running the app may also lock the `python` kernel and make the notebook unresponsive. In that case, please restart the kernel and clear the output.

In [None]:
!pip install transformers --upgrade
!pip install gradio --upgrade
!pip install torch torchvision torchaudio --upgrade
!pip install torchao --upgrade

#### Hello world in gradio

As is tradition, let us build the simplest `gradio` app. It accepts a `text` input and calls the `greet()` function, which converts this input into another text. The output of `greet()` becomes the output of the `gradio` app.

To see our application, we call `launch()` after constructing our `gradio` `Interface`.

In [None]:
import gradio as gr

def greet(name):
    return "Hello " + name + "!!"

gr.Interface(fn=greet, inputs="text", outputs="text").launch()
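
Because `greet()` is an ordinary Python function, it can be checked without starting the web server. The sketch below is one possible arrangement, not part of the original demo: the helper name `build_demo()` is hypothetical, and `gradio` is imported inside it so that `greet()` can be tested even where `gradio` is not installed.

```python
def greet(name):
    """Return the greeting that the gradio app displays."""
    return "Hello " + name + "!!"


def build_demo():
    """Wrap greet() in a gradio Interface (hypothetical helper name)."""
    import gradio as gr  # imported lazily; only needed to build the UI

    return gr.Interface(fn=greet, inputs="text", outputs="text")


# Check the logic first, then launch the UI once it behaves as expected.
print(greet("World"))  # prints "Hello World!!"
# build_demo().launch()
```

Keeping the processing function separate from the `Interface` makes it easier to debug the logic before any UI is involved.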

#### Object Recognition using ResNet18

In our discussion about PyTorch, we used a pre-trained ResNet18 model from `torchvision` and showed the results in a `jupyter` notebook. However, a `jupyter` notebook is not an application that we can deploy for other people to use with ease. The same is true of Google Colab.

In this example, we use `gradio` to build a simple app that an end user can easily interact with. 

In [None]:
import gradio as gr
import torch 
import torchvision
import torchvision.transforms as transforms
import requests
from torchvision.models import resnet18, ResNet18_Weights


normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

resnet = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)
resnet.eval()

# Download human-readable labels for ImageNet.
response = requests.get("https://git.io/JJkYN")
labels = response.text.split("\n")

def classify(img):
    # By default, gradio image is numpy
    img = torch.from_numpy(img)
    # Numpy image is channel last. PyTorch is channel 1st.
    img = img.permute(2, 0, 1)
    
    # The transforms before prediction
    img = torchvision.transforms.Resize(256, antialias=True)(img)
    img = torchvision.transforms.CenterCrop(224)(img).float()/255.
    img = normalize(img)
    
    # We insert batch size of 1
    img = img.unsqueeze(0)
    
    # The actual prediction
    with torch.no_grad():
        pred = resnet(img)
    
    # Convert the prediction to probabilities
    pred = torch.nn.functional.softmax(pred, dim=1)
    # Remove the batch dim. torch.squeeze() can also be used.
    pred = pred.squeeze()
    
    # torch to numpy space
    pred = pred.cpu().numpy()
    
    return {labels[i]: float(pred[i]) for i in range(1000)}
    

gr.Interface(fn=classify, 
             inputs="image",
             outputs=gr.Label(num_top_classes=5),
             title="1k Object Recognition",
             examples=['assets/wonder_cat.jpg', 'assets/aki_dog.jpg','assets/birdie1.jpg'],
             description="Demonstrates a pre-trained model from torchvision for image classification.",
             allow_flagging="never").launch(inbrowser=False)
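
The post-processing in `classify()` turns the 1000 raw scores into a `{label: probability}` dictionary, from which `gr.Label(num_top_classes=5)` shows the five most likely classes. The following sketch reproduces that step in plain Python on a toy three-class example. The helper names `softmax()` and `top_k()` and the toy labels are illustrative assumptions, not part of `torchvision` or `gradio`.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(confidences, k=5):
    """Return the k most likely (label, probability) pairs, most likely first."""
    ranked = sorted(confidences.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Toy stand-ins for the ImageNet labels and the ResNet18 output.
labels = ["cat", "dog", "bird"]
logits = [2.0, 1.0, 0.1]

probs = softmax(logits)
confidences = {labels[i]: float(probs[i]) for i in range(len(labels))}
print(top_k(confidences, k=2))
```

The largest raw score always maps to the largest probability, so the ranking of classes is unchanged by the softmax; it only rescales the scores so that they can be read as confidences.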

#### Using HuggingFace and Gradio

Loading a pre-trained model from torchvision, pre-processing the input, and post-processing the output are all tedious. Sometimes, we just want to load and use a machine learning model. Hugging Face provides a shortcut for all these steps through `pipeline`. To create a `pipeline`, we supply the task name and the name of a pre-trained model that is stored in the Hugging Face Model Hub.

In this example, we use a more capable model than ResNet18. It is called [BEiT](https://arxiv.org/abs/2106.08254) and can classify objects into about 22k categories. We construct the `gradio` app by calling `from_pipeline()`.

In [None]:
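import gradio as gr
import torch
from transformers import pipeline

# Use the first GPU if one is available; otherwise fall back to the CPU (-1).
device = 0 if torch.cuda.is_available() else -1

pipe = pipeline(task="image-classification", 
                 # model that can do 22k-category classification
                 model="microsoft/beit-base-patch16-224-pt22k-ft22k",
                 device=device)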
gr.Interface.from_pipeline(pipe, 
                           title="22k Image Classification",
                           description="Object Recognition using Microsoft BEIT",
                           examples = ['assets/wonder_cat.jpg', 'assets/aki_dog.jpg','assets/birdie1.jpg'],
                           allow_flagging="never").launch()
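
`from_pipeline()` hides the conversion between the pipeline output and the `gradio` components. An image-classification pipeline returns a list of dictionaries with `label` and `score` keys, whereas a `gr.Label` output expects a `{label: confidence}` dictionary, as in the ResNet18 example. The sketch below shows roughly what that conversion looks like if we wire the pipeline manually. The function name `to_label_dict()` and the sample predictions are made up for illustration.

```python
def to_label_dict(predictions):
    """Convert pipeline-style predictions into a {label: score} dictionary."""
    return {p["label"]: float(p["score"]) for p in predictions}


# Made-up predictions in the format of an image-classification pipeline.
sample = [
    {"label": "tabby cat", "score": 0.82},
    {"label": "tiger cat", "score": 0.11},
]
print(to_label_dict(sample))

# With a real pipeline `pipe`, a manual app could look like this (not run here):
# gr.Interface(fn=lambda img: to_label_dict(pipe(img)),
#              inputs=gr.Image(type="pil"),
#              outputs=gr.Label(num_top_classes=5)).launch()
```

Wiring the app manually is only worthwhile when we need custom pre- or post-processing; otherwise `from_pipeline()` is the shorter route.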

#### Automatic Speech Recognition (ASR)

Let us shift to the audio or speech domain. In this example, we demonstrate an Automatic Speech Recognition (ASR) system. We will use our microphone to record audio, which is then converted to text using ASR. For this example, it is best to open the application in another browser tab by setting `inbrowser=True`.

Before running the `gradio` app, note that this ASR requires the `sentencepiece` module. Let us install it first.

In [None]:
!pip install sentencepiece --upgrade

For this ASR demo, we use OpenAI Whisper.

In [None]:
import gradio as gr
from transformers import pipeline

pipe = pipeline(task="automatic-speech-recognition", 
                model="openai/whisper-tiny")
gr.Interface.from_pipeline(pipe,
                           title="Automatic Speech Recognition (ASR)",
                           description="Using pipeline with OpenAI Whisper",
                           examples=['assets/ljspeech.wav',],
                           ).launch(inbrowser=True)
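
When more control over the input is needed, for example to accept both microphone recordings and uploaded files, the pipeline can be wrapped in our own function instead of using `from_pipeline()`. The sketch below assumes a recent `gradio` release in which `gr.Audio` takes `sources` and `type` arguments; the names `transcribe()` and `build_asr_demo()` are hypothetical. The recognizer is passed in as an argument so that the wrapper can be exercised with a stand-in before the Whisper model is downloaded.

```python
def transcribe(audio_path, asr):
    """Return the transcript for an audio file, or "" if nothing was recorded."""
    if audio_path is None:  # gradio passes None when no audio is supplied
        return ""
    # The ASR pipeline returns a dictionary with a "text" entry.
    return asr(audio_path)["text"].strip()


def build_asr_demo():
    """Build the app around Whisper (hypothetical helper, heavy imports inside)."""
    import gradio as gr
    from transformers import pipeline

    pipe = pipeline(task="automatic-speech-recognition",
                    model="openai/whisper-tiny")
    return gr.Interface(
        fn=lambda path: transcribe(path, pipe),
        inputs=gr.Audio(sources=["microphone", "upload"], type="filepath"),
        outputs="text",
        title="Automatic Speech Recognition (ASR)",
    )


# Stand-in recognizer used only to check the wrapper logic.
def fake_asr(path):
    return {"text": " hello world "}


print(transcribe("assets/ljspeech.wav", fake_asr))  # prints "hello world"
# build_asr_demo().launch(inbrowser=True)
```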

#### Text to Speech (TTS)

Let us do the reverse of ASR: Text to Speech (TTS). In this example, we supply text, which is converted to speech using the voice of Linda Johnson. We use a pre-trained [FastSpeech2](https://arxiv.org/abs/2006.04558) model that is provided by Facebook in the Model Hub.

In this example, we use the [`load()`](https://gradio.app/docs/#load) method to load the pre-trained model.

In [None]:
import gradio as gr
gr.load("huggingface/facebook/fastspeech2-en-ljspeech", 
        description="TTS using FastSpeech2",
        title="Text to Speech (TTS)",
        examples=[["The quick brown fox jumps over the lazy dog."]]
        ).launch(inbrowser=True)