<a href="https://colab.research.google.com/github/jinnovation/uncommon-hacks-22-demo/blob/master/Uncommon_Hacks_'22_Bootstrapping_ML_Projects_with_Hugging_Face_and_Gradio.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Bootstrapping ML Projects with Hugging Face and Gradio
**Author**: [Jonathan Jin](https://jonathanj.in/)

*This notebook was given to attendees of Jonathan's talk at [Uncommon Hacks 2022](https://uncommonhacks.com/) at the University of Chicago as a "parting gift."*

## Overview

Machine learning, from a beginner's perspective, can feel impenetrable. Even with some experience designing and training models (knowledge that itself takes a fair amount of time to develop), the sheer number of techniques, model architectures, machine learning frameworks, and tools can make things feel less like a learning curve and more like a learning cliff.

Thankfully, ML as a field evolves quickly, and things have improved fairly significantly in the last couple of years. In particular, the advent of [transfer learning](https://en.wikipedia.org/wiki/Transfer_learning), combined with the practice of [fine-tuning pre-trained models](https://huggingface.co/docs/transformers/training) for domains such as natural language processing (NLP), led to it becoming common practice for researchers to publish their models in **pre-trained form**—ready for use in your own applications, no expensive and time-consuming training and data collection required.

[Hugging Face](https://huggingface.co) (named after the 🤗 emoji) has emerged as one of the most prominent players in this space. Their [Model Hub](https://huggingface.co/models) is like a "GitHub for pre-trained models." Companies such as Google and Facebook/Meta often provide versions of their models, such as [Facebook's BART](https://huggingface.co/facebook/bart-large-mnli) or [Google's BERT](https://huggingface.co/bert-base-uncased?text=The+goal+of+life+is+%5BMASK%5D), ready for immediate use.

Combine the wide availability of pre-trained models with "applications-as-code" libraries such as [Gradio](https://gradio.app/) or [Streamlit](https://streamlit.io/) (Python's answers to R's [Shiny](https://shiny.rstudio.com)), and we have a great foundation for **prototyping and playing around with machine learning in products**. You can use machine learning to build products and projects, or to prototype ideas, without needing to design, develop, or train your own model and without all of the associated overhead.

This notebook will briefly walk you through Hugging Face and Gradio. My hope is that this notebook will encourage you to play around with machine learning from a different angle—less "how can I design and train a model to solve this problem?" and more "how can I use machine learning in a product to solve this problem?"

Enjoy. 😃

## How to Use This Notebook

I'm assuming you've opened this notebook in Google Colaboratory. If so, you effectively have your own "copy" of this notebook, and you can run all code blocks in here directly, with no risk of overwriting or clobbering anyone else's work.

Alternatively, this notebook is accessible on GitHub at: [`jinnovation/uncommon-hacks-22-demo`](https://github.com/jinnovation/uncommon-hacks-22-demo). You are more than welcome to fork that repo and run the notebook yourself in a local [Jupyter](https://jupyter.org/install) instance.

## Getting Started

First we'll install our dependencies. Run the command below to install these packages:
- Gradio, which we'll use to create "applications-as-code";
- [🤗 Transformers](https://huggingface.co/docs/transformers/index), which we'll use to download and use a pre-trained model in our application.


In [None]:
%pip install gradio transformers

*Side note: The `transformers` package is named after the [transformer](https://en.wikipedia.org/wiki/Transformer_(machine_learning_model)) machine learning model that's emerged as "the new hotness" in natural language processing of late. Because in-depth knowledge of the transformer's design and architecture is not necessary for this exercise ~and because I also have no idea how they work lol~ we won't be talking too much about this. Feel free to read up on your own time!*

## Intro to Gradio

I've mentioned Gradio a couple of times at this point, so let's talk a little about it.

[Gradio](https://gradio.app/) allows you to quickly build interactive web applications in Python. If you've taken statistics courses, you may have run into [Shiny](https://shiny.rstudio.com/) for [R](https://en.wikipedia.org/wiki/R_(programming_language)). Gradio might be considered one of Python's answers to Shiny. [Streamlit](https://streamlit.io/) is another example.

These packages allow developers to quickly build interactive web applications around Python functions. As such, they're a great tool for:
- Prototyping product ideas;
- Generating interactive visualizations or reports, e.g. presenting analyses and results in a tactile and engaging way.

Let's set up a quick "hello world" application using Gradio. Run the following code block to create an interactive "web app" that you can run from within this notebook itself.

In [10]:
import gradio as gr

def greet(name):
    return "Hello " + name + "!"

iface = gr.Interface(
    fn=greet,
    inputs=gr.inputs.Textbox(
        lines=2, 
        placeholder="Type in your name here and hit 'Submit' below",
    ),
    outputs="text",
    title="Hello World: Gradio Edition",
    description="A toy 'web app' made with Gradio. Says hello to the input name.",
    theme="dark",
    examples=["JJin", "world"],
)
iface.launch()

Colab notebook detected. To show errors in colab notebook, set `debug=True` in `launch()`
Running on public URL: https://25937.gradio.app

This share link expires in 72 hours. For free permanent hosting, check out Spaces (https://huggingface.co/spaces)


(<fastapi.applications.FastAPI at 0x7fd7b95c9b10>,
 'http://127.0.0.1:7867/',
 'https://25937.gradio.app')
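
Because Gradio only wraps an ordinary Python function, the logic can be checked without launching the UI at all. A minimal sketch, repeating `greet` from the cell above so that it runs on its own:

```python
def greet(name):
    # Same function that the Interface above wraps.
    return "Hello " + name + "!"

# Call it directly with the two example inputs offered in the app.
for example in ["JJin", "world"]:
    print(greet(example))
```

If the function behaves as expected here, any remaining surprises are more likely to come from the interface configuration than from the function itself.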

This example really only scratches the surface of Gradio's capabilities. For example, as outputs, you can have anything from [images](https://gradio.app/docs/#o_image) to [interactive chat interfaces](https://gradio.app/docs/#o_chatbot). Inputs can be anything from [dropdowns](https://gradio.app/docs/#i_dropdown) to file uploads for [images](https://gradio.app/docs/#i_image), [videos](https://gradio.app/docs/#i_video) and [audio](https://gradio.app/docs/#i_audio).

Take a look at the [Interfaces](https://gradio.app/docs/#interface) documentation for more cool stuff that you can do. The [Getting Started documentation for Gradio](https://gradio.app/getting_started/) is also a great read.

## Intro to Hugging Face

[Hugging Face](https://huggingface.co) is a startup founded in 2016. They are the authors of the [`transformers`](https://huggingface.co/transformers/) package, through which machine learning engineers and researchers can download pre-trained state-of-the-art models and "fine-tune" them for their specific use cases, e.g. refining an NLP model for better performance on tweets. They are a very prominent player in NLP research and development, and their "Model Hub" has emerged as the de facto standard for accessing the cutting edge of NLP research.

For this exercise, we'll use `transformers` to source a pre-trained model and build a simple image-recognition application.



## Using an "out-of-the-box" visualization

Gradio has tight integration with the Hugging Face Hub. With Gradio, you can load up any model in the Hugging Face Hub with a default interactive visualization. 

For more details, read: [Gradio > Getting Started > Loading Hugging Face Models and Spaces](https://gradio.app/getting_started/#loading-hugging-face-models-and-spaces).

Here, we'll load a pre-trained version of Google's [Vision Transformer (ViT)](https://huggingface.co/google/vit-base-patch16-224) model, referred to in the Hugging Face Hub under the ID of `google/vit-base-patch16-224`.

In [11]:
import gradio as gr

ui = gr.Interface.load(
    name="huggingface/google/vit-base-patch16-224",
    theme="dark-huggingface",
)
ui.launch()

Fetching model from: https://huggingface.co/google/vit-base-patch16-224
Colab notebook detected. To show errors in colab notebook, set `debug=True` in `launch()`
Running on public URL: https://41975.gradio.app

This share link expires in 72 hours. For free permanent hosting, check out Spaces (https://huggingface.co/spaces)


(<fastapi.applications.FastAPI at 0x7fd7b95c9b10>,
 'http://127.0.0.1:7868/',
 'https://41975.gradio.app')

Right out of the gate, we already have a fair amount of stuff taken care of for us. With just a couple of lines of code, we have image-uploading capabilities, routing to the underlying model for predictions, and a UI to wrap all of that functionality. Without Gradio, you'd have to write an entire web application from scratch to get this kind of functionality, which brings its own challenges, decisions, and overhead:
- What web framework do I use? Angular? React? Plain HTML?
- Do I even know how to write JavaScript (I definitely don't...)?
- Where do I host this if I want people to play around with it? [AWS](https://aws.amazon.com/)? [Google Cloud](https://cloud.google.com/)?
- Do I have experience working with cloud platforms like AWS or GCP? Time to learn!!

## Customizing with selectable example inputs

Using the default visualization from Hugging Face is all well and good, but our customization options are by definition non-existent here.

Next, let's try using the same model to provide our own customized interface. Namely, we'll provide a **drop-down of sample inputs** in addition to the image upload pane.

First, let's download the pre-trained model so we can use it in our own code. We'll be using the [`transformers.pipeline`](https://huggingface.co/docs/transformers/v4.17.0/en/main_classes/pipelines#transformers.pipeline) abstraction here. This will let us pass inputs to the model and receive outputs almost as though it were a regular Python function. In our case, our "function" will be called simply `classifier`.


In [13]:
from transformers import pipeline

MODEL_NAME = "google/vit-base-patch16-224"

classifier = pipeline("image-classification", model=MODEL_NAME)

Downloading:   0%|          | 0.00/68.0k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/330M [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/160 [00:00<?, ?B/s]

The `"image-classification"` argument refers to the type of "task" the pipeline performs. In this case, because our task (and the model we're using) performs image classification, we specify accordingly. Take a look at the `pipeline` documentation for more details, as well as an overview of the other types of "tasks" that `transformers` supports.

Case in point, you'll notice that the resulting `classifier` is of type `ImageClassificationPipeline`.

In [14]:
type(classifier)

transformers.pipelines.image_classification.ImageClassificationPipeline

Now, we can pass in a path to a static image file as an argument and receive a set of predictions and their respective probabilities. For example:

In [27]:
!wget https://i.imgur.com/9ayWKzZ.png -O surprised-pikachu.png
classifier("./surprised-pikachu.png")

--2022-03-27 20:25:35--  https://i.imgur.com/9ayWKzZ.png
Resolving i.imgur.com (i.imgur.com)... 146.75.28.193
Connecting to i.imgur.com (i.imgur.com)|146.75.28.193|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 97373 (95K) [image/png]
Saving to: ‘surprised-pikachu.png’


2022-03-27 20:25:35 (27.7 MB/s) - ‘surprised-pikachu.png’ saved [97373/97373]



[{'label': 'comic book', 'score': 0.33174237608909607},
 {'label': 'toyshop', 'score': 0.05597229301929474},
 {'label': 'jigsaw puzzle', 'score': 0.04012775793671608},
 {'label': 'laptop, laptop computer', 'score': 0.019154300913214684},
 {'label': 'rubber eraser, rubber, pencil eraser',
  'score': 0.015469899401068687}]

This output tells us that the model thinks, with "confidence score" 0.33, that the image of surprised Pikachu is a comic book. This is mostly incorrect. That said, the fact that the model does not recognize Pikachu is to be expected, considering [Vision Transformer](https://huggingface.co/google/vit-base-patch16-224) was more likely than not **not** trained using Pokemon training data.

(There's a good opportunity here to fine-tune ViT to classify images of Pokemon, but that's left as an exercise to the reader. 😈)
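
To read the pipeline output above programmatically, a small sketch like the following may help. It copies the five predictions from the Pikachu output (scores rounded) into a plain list, so it runs without the model. Treating the scores as probabilities spread over all of the model's labels is an assumption here; under that assumption, the five labels shown account for less than half of the total.

```python
# Predictions copied from the output above, with scores rounded for brevity.
predictions = [
    {"label": "comic book", "score": 0.3317},
    {"label": "toyshop", "score": 0.0560},
    {"label": "jigsaw puzzle", "score": 0.0401},
    {"label": "laptop, laptop computer", "score": 0.0192},
    {"label": "rubber eraser, rubber, pencil eraser", "score": 0.0155},
]

# Pick the highest-scoring entry explicitly instead of relying on list order.
top = max(predictions, key=lambda p: p["score"])
print(f"Top guess: {top['label']} ({top['score']:.0%})")

# Total score covered by the labels shown; the remainder belongs to labels not listed.
covered = sum(p["score"] for p in predictions)
print(f"Score covered by these {len(predictions)} labels: {covered:.2f}")
```

A low top score combined with low total coverage is a reasonable hint that the image falls outside what the model was trained to recognize.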

In any case, let's download some more images to use as example inputs for our application. 

In [34]:
!wget https://i.imgur.com/BNCe5bK.jpeg -O baby-capybara.jpg
!wget https://i.imgur.com/gGd0HnK.jpeg -O baby-akita.jpg

import pathlib
from typing import Dict, Union, Optional

EXAMPLE_INPUTS: Dict[str, Union[str, pathlib.Path]] = {
    "surprised-pikachu": pathlib.Path("./surprised-pikachu.png"),
    "capybara": pathlib.Path("./baby-capybara.jpg"),
    "puppy": pathlib.Path("./baby-akita.jpg"),
}

--2022-03-27 20:34:50--  https://i.imgur.com/BNCe5bK.jpeg
Resolving i.imgur.com (i.imgur.com)... 146.75.32.193
Connecting to i.imgur.com (i.imgur.com)|146.75.32.193|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 230703 (225K) [image/jpeg]
Saving to: ‘baby-capybara.jpg’


2022-03-27 20:34:50 (48.8 MB/s) - ‘baby-capybara.jpg’ saved [230703/230703]

--2022-03-27 20:34:50--  https://i.imgur.com/gGd0HnK.jpeg
Resolving i.imgur.com (i.imgur.com)... 146.75.32.193
Connecting to i.imgur.com (i.imgur.com)|146.75.32.193|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 25739 (25K) [image/jpeg]
Saving to: ‘baby-akita.jpg’


2022-03-27 20:34:50 (88.1 MB/s) - ‘baby-akita.jpg’ saved [25739/25739]
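
A failed download can leave a missing or empty file behind, so it may be worth confirming that every example file is usable before wiring the examples into an application. The helper below, `missing_examples`, is a hypothetical name introduced for this sketch and is not part of Gradio or `transformers`:

```python
import pathlib
from typing import Dict, List

def missing_examples(examples: Dict[str, pathlib.Path]) -> List[str]:
    """Return the names of examples whose files are missing or empty."""
    return [
        name
        for name, path in examples.items()
        if not pathlib.Path(path).is_file() or pathlib.Path(path).stat().st_size == 0
    ]

# Same mapping as EXAMPLE_INPUTS above, repeated so the sketch runs on its own.
examples = {
    "surprised-pikachu": pathlib.Path("./surprised-pikachu.png"),
    "capybara": pathlib.Path("./baby-capybara.jpg"),
    "puppy": pathlib.Path("./baby-akita.jpg"),
}

# An empty list means every example file was found and is non-empty.
print(missing_examples(examples))
```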



Now we'll write a new function, `classify`, that can take in either an input image (either a path to one or an image file loaded into a Python object), or an "example ID," which will refer to one of the three example inputs we just set up in the previous block.

In [56]:
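import PIL.Image
from typing import Tuple

def classify(
    input: Optional[Union[str, PIL.Image.Image]] = None, 
    example_id: Optional[str] = None,
) -> Tuple[Union[str, PIL.Image.Image], Dict[str, float]]:
    # No upload provided: fall back to the example chosen by ID.
    if input is None and example_id is not None:
        input = PIL.Image.open(EXAMPLE_INPUTS[example_id])

    # The pipeline returns a list of {"label": ..., "score": ...} dicts.
    classifications = classifier(input)
    # Return the classified image together with a label -> score mapping.
    return input, {
        c["label"]: c["score"] for c in classifications
    }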

In [57]:
classify(example_id="puppy")

(<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=600x400 at 0x7FD61E055050>,
 {'Eskimo dog, husky': 0.11203732341527939,
  'Norwegian elkhound, elkhound': 0.043681707233190536,
  'Siberian husky': 0.0859639123082161,
  'dingo, warrigal, warragal, Canis dingo': 0.0856500044465065,
  'malamute, malemute, Alaskan malamute': 0.04764827340841293})

In [58]:
classify(input=PIL.Image.open("./baby-capybara.jpg"))

(<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=900x675 at 0x7FD61DF126D0>,
 {'beaver': 0.5411075353622437,
  'marmot': 0.14890287816524506,
  'mink': 0.07747175544500351,
  'porcupine, hedgehog': 0.016281893476843834,
  'weasel': 0.018355373293161392})
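
Note that the dictionaries printed above appear to be listed in key order rather than score order (for the puppy, `'Norwegian elkhound, elkhound'` is shown second despite having the lowest score). To inspect results ranked by confidence, a small helper can sort them. `rank_confidences` is a hypothetical name used only for this sketch, and the scores are copied (rounded) from the capybara output:

```python
from typing import Dict, List, Tuple

def rank_confidences(confidences: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (label, score) pairs sorted from most to least confident."""
    return sorted(confidences.items(), key=lambda item: item[1], reverse=True)

# Scores copied from the capybara output above (rounded).
capybara = {
    "beaver": 0.5411,
    "marmot": 0.1489,
    "mink": 0.0775,
    "porcupine, hedgehog": 0.0163,
    "weasel": 0.0184,
}

for label, score in rank_confidences(capybara):
    print(f"{score:.3f}  {label}")
```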

Now, finally, we have all the pieces needed to set up our custom Gradio application. We alluded earlier, in `Intro to Gradio`, to several rich input and output types that Gradio supports. We'll use several here:
- [`gradio.inputs.Image`](https://gradio.app/docs/#i_image), to allow users to upload their own images for classification;
- [`gradio.inputs.Dropdown`](https://gradio.app/docs/#i_dropdown), which will allow users to select one of our example inputs from a drop-down menu;
- [`gradio.outputs.Image`](https://gradio.app/docs/#o_image), which we'll use to show the image that's being classified (user-uploaded or otherwise);
- [`gradio.outputs.Label`](https://gradio.app/docs/#o_label), which we'll use to visualize the model's predictions with their corresponding confidence scores.

In [61]:
gr.Interface(
    fn=classify,
    inputs=[
        gr.inputs.Image(optional=True, type="pil"),
        gr.inputs.Dropdown(choices=list(EXAMPLE_INPUTS.keys()), type="value", default=None),
    ],
    outputs=[
        gr.outputs.Image(type="pil"),
        gr.outputs.Label(type="confidences"),
    ],
    theme="dark-huggingface",
).launch()

Colab notebook detected. To show errors in colab notebook, set `debug=True` in `launch()`
Running on public URL: https://28210.gradio.app

This share link expires in 72 hours. For free permanent hosting, check out Spaces (https://huggingface.co/spaces)


(<fastapi.applications.FastAPI at 0x7fd7b95c9b10>,
 'http://127.0.0.1:7869/',
 'https://28210.gradio.app')

Try selecting one of the example inputs from the drop-down. Try also uploading your own image to see what predictions you get out.
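
One case that `classify` does not handle explicitly is a submission with neither an uploaded image nor a selected example; the pipeline would then receive `None`, which is unlikely to produce a useful result. A minimal sketch of the input-selection step with that case guarded follows. `resolve_input` is a hypothetical helper introduced here, and plain strings stand in for images so that the example runs without the model:

```python
from typing import Any, Mapping, Optional

def resolve_input(
    image: Optional[Any],
    example_id: Optional[str],
    examples: Mapping[str, Any],
) -> Any:
    """Pick what to classify: an uploaded image first, otherwise a named example."""
    if image is not None:
        return image
    if example_id is None:
        raise ValueError("Upload an image or pick an example from the drop-down.")
    if example_id not in examples:
        raise ValueError(f"Unknown example: {example_id!r}")
    return examples[example_id]

# Stand-in values; in the notebook these would be images or paths to them.
examples = {"puppy": "./baby-akita.jpg", "capybara": "./baby-capybara.jpg"}

print(resolve_input("my-upload.png", "puppy", examples))  # the upload takes priority
print(resolve_input(None, "puppy", examples))             # falls back to the example
```

As in `classify`, an uploaded image takes priority over the drop-down selection; the only behavioral difference in this sketch is the explicit error when both are missing.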

## Conclusion

This notebook gave an example of how to use pre-trained models with Python tooling to quickly get started with building prototypes of ML-based applications, allowing you to avoid most of the overhead associated with web application development, deploying your application to cloud environments, and so on.

If this feels like "not really" machine learning, that's because a lot of "ML engineering," in my experience, really straddles the line between what's typically considered "applied ML" or "ML research" and what's typically considered "standard" software engineering. I'd love to go into this topic more, but this notebook's already long enough. 😇

Still, I hope that this notebook helps give some inspiration about how to get started with **building ML-based applications** without necessarily needing to design, build, and train your own ML model from scratch. I hope this provides some inspiration for your hacking this weekend, as well as provides some color around the opportunities available to you in industry around ML that aren't research- or theory-based.

Thanks for reading. 🤗

## Further Reading

If you're interested in reading more about ML infrastructure, check out:
- [This blog post from Twitter](https://blog.twitter.com/engineering/en_us/topics/insights/2018/ml-workflows) talking about their system for setting up and running "workflows" to train models and put them into production;
- [This other blog post from Twitter](https://blog.twitter.com/engineering/en_us/topics/insights/2018/twittertensorflow) discussing their adoption of TensorFlow and some of their additional Twitter-specific tooling on top of it;
- [This blog post by my teammates Sam and Josh at Spotify](https://engineering.atspotify.com/2019/12/the-winding-road-to-better-machine-learning-infrastructure-through-tensorflow-extended-and-kubeflow/), where they present an overview of our machine learning platform and its individual components;
- [This other blog post by my **other** teammate, Maisha, at Spotify](https://engineering.atspotify.com/2022/01/product-lessons-from-ml-home-spotifys-one-stop-shop-for-machine-learning/), where she talks about building out a product of ours called "ML Home";
- [This talk by yet another teammate, Divita, here at Spotify](https://www.youtube.com/watch?v=_j9gr4IJar0), where she gives a present-day snapshot of where things stand with Spotify's ML Platform and where we're looking to go next.

I also highly recommend the seminal paper ["Hidden Technical Debt in Machine Learning Systems"](https://proceedings.neurips.cc/paper/2015/file/86df7dcfd896fcaf2674f757a2463eba-Paper.pdf). This paper presents a comprehensive overview of the motives and challenges associated with putting machine learning in production, outlining many of the complexities associated with it that extend beyond simply designing and training the model. It's **heavily** cited, to the point where this particular diagram has basically achieved meme status:

![the boxes](https://i.imgur.com/KtkRuLu.png)