# Gradio

Gradio is a web app framework designed to facilitate the development and deployment of ML and DL apps. Have a look at [their website](https://www.gradio.app).

The following adapts their [Quickstart Guide](https://www.gradio.app/guides/quickstart).

In [None]:
import sys

if 'google.colab' in sys.modules:
    !pip install gradio

In [None]:
import gradio as gr
import numpy as np

---

## 1. Intro

### Hello, World

Docs:

- [Textbox](https://www.gradio.app/docs/textbox)
- [Interface](https://www.gradio.app/docs/interface)

In [None]:
def greet(name):
    return f"Hello {name}!"

demo = gr.Interface(
    fn=greet,
    inputs=gr.Textbox( # customize your textbox
        lines=2,
        placeholder="Name here..."
        ),
    outputs="text"
)

demo.launch()

### Multiple Input and Output Components

Docs:

- [Slider](https://www.gradio.app/docs/slider)

In [None]:
import gradio as gr

# exercise: how about adding a second "checkbox" as a fourth input,
# to allow the user to tick whether it's rainy or not,
# and add text that changes accordingly?
def greet(name, is_morning, temperature):
    salutation = "Good morning" if is_morning else "Good evening"
    greeting = f"{salutation} {name}. It is {temperature} degrees today"
    celsius = (temperature - 32) * 5 / 9
    return greeting, round(celsius, 2)

demo = gr.Interface(
    fn=greet,
    inputs=[
        "text",
        "checkbox",
        gr.Slider(0, 100) # you can add a default 'value=75' to your slider if you want
    ],
    outputs=[     # to add labels, try this (thx ChatGPT!):
        "text",   # gr.Textbox(label="Greeting"),  # Custom label for the first output
        "number", # gr.Number(label="Temperature in Celsius")  # Custom label for the second output
    ],
)
demo.launch()
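
The exercise suggested in the comments at the top of the cell above (an extra checkbox for rain) could look like the sketch below. The name `greet_with_rain`, the `is_rainy` argument and the wording of the weather remark are illustrative assumptions, not part of the Quickstart. The Gradio wiring is left as a comment so that the function can be tried on its own.

```python
def greet_with_rain(name, is_morning, is_rainy, temperature):
    # temperature is read as degrees Fahrenheit, as in the cell above
    salutation = "Good morning" if is_morning else "Good evening"
    weather = "Take an umbrella!" if is_rainy else "No rain expected."
    greeting = f"{salutation} {name}. It is {temperature} degrees today. {weather}"
    celsius = (temperature - 32) * 5 / 9
    return greeting, round(celsius, 2)

# Hypothetical wiring: one more "checkbox" entry in `inputs`,
# listed in the same order as the function arguments.
# demo = gr.Interface(
#     fn=greet_with_rain,
#     inputs=["text", "checkbox", "checkbox", gr.Slider(0, 100)],
#     outputs=["text", "number"],
# )
# demo.launch()
```

The order of the components in `inputs` has to match the order of the function's parameters, which is why the new checkbox sits between the first checkbox and the slider.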


### An Image Example

You are obviously free to do whatever you like with your inputs, and they are not limited to text only! Here is an example where we modify an image.

Docs:
- [Image](https://www.gradio.app/docs/image)

In [None]:
def sepia(input_img): # named 'sepia' rather than 'filter' to avoid shadowing Python's built-in
    # sepia filter
    img_filter = np.array([
        [0.393, 0.769, 0.189],
        [0.349, 0.686, 0.168],
        [0.272, 0.534, 0.131]
    ])
    img_filter = img_filter.astype(np.float64) # make sure the contents are floats
    filter_img = input_img.dot(img_filter.T)
    # rescale to [0, 1]; the floor avoids a division by zero on an all-black image
    filter_img /= max(filter_img.max(), 1e-8)
    return filter_img

demo = gr.Interface(sepia, gr.Image(), "image")

demo.launch(show_api=False, debug=True)

# There are quite a few matrices that have an interesting effect,
# it can be a nice idea to go look for others and modify this app
# to allow the user to choose between different filters!
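# Color Inversion (approximate: the '+ 1' applies to every entry of the matrix,
# so each output channel becomes the sum of the two other input channels)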
# img_filter = np.array([
#     [-1, 0, 0],
#     [0, -1, 0],
#     [0, 0, -1]
# ]) + 1

# # Cool Filter
# img_filter = np.array([
#     [0.9, 0, 0],
#     [0, 0.9, 0],
#     [0, 0, 1.1]
# ])
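
As a possible starting point for the exercise mentioned in the comments above (letting the user choose between filters), here is a minimal sketch. The names `FILTERS` and `apply_filter` are hypothetical, and the use of `gr.Dropdown` as a second input is my assumption about how you might wire it up. The two matrices are the sepia and "cool" ones from the cell above.

```python
import numpy as np

# Matrices taken from the cell above, indexed by a name the user can pick.
FILTERS = {
    "sepia": np.array([
        [0.393, 0.769, 0.189],
        [0.349, 0.686, 0.168],
        [0.272, 0.534, 0.131],
    ]),
    "cool": np.array([
        [0.9, 0.0, 0.0],
        [0.0, 0.9, 0.0],
        [0.0, 0.0, 1.1],
    ]),
}

def apply_filter(input_img, filter_name):
    # mix the colour channels of a (height, width, 3) image with the chosen matrix
    img_filter = FILTERS[filter_name]
    filter_img = input_img.astype(np.float64).dot(img_filter.T)
    peak = filter_img.max()
    if peak > 0:  # leave an all-black image untouched instead of dividing by zero
        filter_img /= peak
    return filter_img

# Hypothetical wiring, with a dropdown as the second input:
# demo = gr.Interface(
#     apply_filter,
#     [gr.Image(), gr.Dropdown(choices=list(FILTERS), value="sepia")],
#     "image",
# )
# demo.launch()
```

Adding a new effect then only requires a new entry in the dictionary.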

### Using Blocks

For more control, `Blocks` are the way to go. Here we can play with flipping various kinds of data left/right or upside down!

Docs:
- [Blocks](https://www.gradio.app/docs/blocks)
- [Markdown](https://www.gradio.app/docs/markdown)
- [Tab](https://www.gradio.app/docs/tab)
- [Button](https://www.gradio.app/docs/button)
- [Accordion](https://www.gradio.app/docs/accordion)
- [Audio](https://www.gradio.app/docs/audio)

Also [`np.fliplr`](https://numpy.org/doc/stable/reference/generated/numpy.fliplr.html) and [`np.flipud`](https://numpy.org/doc/stable/reference/generated/numpy.flipud.html).
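
To see what these two functions do before applying them to images, here is a tiny sketch on a 2×3 array (the array itself is just an illustration):

```python
import numpy as np

grid = np.array([
    [1, 2, 3],
    [4, 5, 6],
])

left_right = np.fliplr(grid)  # reverses the order of the columns
up_down = np.flipud(grid)     # reverses the order of the rows

print(left_right)
print(up_down)
```

An image is an array of shape (height, width, channels), so `np.fliplr` mirrors it horizontally and `np.flipud` turns it upside down, while the colour channels stay in place.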

In [None]:
def flip_text(x):
    return x[::-1]

def flip_image(x):
    return np.fliplr(x)  # try also np.flipud

# help from ChatGPT (buggy, but still nice) for this!
def reverse_audio(audio_data):
    # audio_data is a tuple with (sample_rate, audio_array)
    # print(audio_data)
    sample_rate, audio_array = audio_data
    reversed_audio = audio_array[::-1]  # Reverse the audio data
    # Return the reversed audio data along with the sample rate
    return (sample_rate, reversed_audio)

# note the 'with' syntax, to allow you to populate your blocks
with gr.Blocks() as demo:
    # use markdown syntax in the app
    gr.Markdown("# Flip text, image, or audio files using this demo.")
    with gr.Tab("Flip Text"):
        text_input = gr.Textbox()
        text_output = gr.Textbox()
        text_button = gr.Button("Flip")
    with gr.Tab("Flip Image"):
        with gr.Row():
            image_input = gr.Image()
            image_output = gr.Image()
        image_button = gr.Button("Flip")
    with gr.Tab("Reverse Audio"):
        with gr.Row():
            audio_input = gr.Audio()
            audio_output = gr.Audio()
        audio_button = gr.Button("Reverse")

    with gr.Accordion("Open for More!"):
        gr.Markdown("Look at me...")

    # now we have three different functions, with three different effects, in each one of our tabs!
    text_button.click(flip_text, inputs=text_input, outputs=text_output)
    image_button.click(flip_image, inputs=image_input, outputs=image_output)
    audio_button.click(reverse_audio, inputs=audio_input, outputs=audio_output)

demo.launch(show_api=False, debug=True)

---

## 2. Chatbots!

Adapted from the [Chatbot](https://www.gradio.app/guides/quickstart#chatbots) part of the Quickstart, [How to Create a Chatbot with Gradio](https://www.gradio.app/guides/creating-a-chatbot-fast#a-streaming-example-using-openai) and [How to Create a Custom Chatbot with Gradio Blocks](https://www.gradio.app/guides/creating-a-custom-chatbot-with-blocks#adding-markdown-images-audio-or-videos).

Gradio apps these days are used for two main purposes:
- to show off diffusion models (we are doing this too!);
- to build chatbots.

The [triple threat](https://en.wiktionary.org/wiki/triple_threat) of Huggingface and its Hub of models/tokenizers/datasets, Gradio apps<sup>1</sup> and [Huggingface Spaces](https://huggingface.co/spaces) (to deploy apps and provide GPUs) has made this open-source corp hugely important in the field, and turbocharged the development of this ecosystem.

Let's look at some chatbot examples.

#### Check out [Huggingface Spaces](https://huggingface.co/spaces)!

Spaces works like GitHub: you create a repository that follows a certain structure for your app, then you can edit the app locally, push it to Spaces and see the updates (even if you don't have a GPU on your machine). And just like on GitHub, you can *fork* projects (copy and import someone else's code into your own account and then modify it to make it your own). Check out the [intro guide](https://huggingface.co/spaces/launch).


<small>1: note that there's also [Streamlit](https://streamlit.io/), let me know if you tried that and found it better! And it seems now [Docker](https://www.docker.com/) might also be supported on Spaces 🥳</small>

In [None]:
import sys

if 'google.colab' in sys.modules:
    !pip -q install gradio
    !pip -q install transformers

import gradio as gr

In [None]:
import time
import random

### Example: a chatbot that responds yes or no

This is obviously the simplest possible thing you can do, but randomness can be combined with branching (`if/else`) to create all sorts of elaborate decision paths guiding your bot's answers. Before Deep Learning, attempts at chatbots relied on elaborate rule systems of this kind (for the literature lovers, see [this article](https://drive.google.com/file/d/1v-q2M8ZlCcoCGKvw0rgpmyBg6bbH4DWF/view?usp=sharing), [that one](https://drive.google.com/file/d/12-wIbonK8d8w5UXpFM_-nzMfBoDpq4Zq/view?usp=sharing) and [that one](https://drive.google.com/file/d/1G_T2MjCQCuLYIQPpe0GnBOu3PO67deyL/view?usp=sharing), for instance).

Docs:
- [ChatInterface](https://www.gradio.app/docs/chatinterface)

And the [`random.choice`](https://docs.python.org/3/library/random.html#functions-for-sequences) function in Python.

In [None]:
def random_response(message, history):
    return random.choice(["Yes", "No"])

gr.ChatInterface(random_response).launch()
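
To make the idea of combining randomness with `if/else` branches concrete, here is a minimal sketch of a slightly richer rule-based bot. The rules and the name `rule_based_response` are my own illustrative assumptions. The function keeps the `(message, history)` signature, so it could be passed to `gr.ChatInterface` like `random_response` above.

```python
import random

def rule_based_response(message, history):
    text = message.strip().lower()
    if not text:
        # nothing was typed
        return "Say something!"
    if text.endswith("?"):
        # questions get a random yes/no, as in the bot above
        return random.choice(["Yes", "No"])
    if "hello" in text or "hi" in text.split():
        # greetings get a fixed reply
        return "Hello there!"
    # fallback for everything else
    return "Tell me more."

print(rule_based_response("hello bot", []))
print(rule_based_response("Is it late?", []))
```

Each new rule is one more branch, and the order of the branches decides which rule wins when several apply.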

### Another example using the user’s input and history

Here I use modulo logic (a bit boring, randomness might be nicer) to decide between three different kinds of answer. In the second branch, I randomly pick one past interaction from the history and quote the user's earlier input back!

Reminder: `history` is mentioned [here](https://www.gradio.app/guides/quickstart#chatbots).

In [None]:
def sometimes_agree_sometimes_remembers(message, history):
    # print(*history, sep="\n") # you can have a look at the history object if you want
    if len(history) % 3 == 0:
        return f"Yes, I do think that '{message}'"
    elif len(history) % 3 == 1:
        # history is an array of arrays, each containing ["user input", "bot response"]
        past_message = random.choice(history)[0] # [0] for user input, [1] for bot response
        return f"Wait, didn't you say earlier: '{past_message}'"
    else:
        return "I don't think so"

gr.ChatInterface(sometimes_agree_sometimes_remembers).launch(debug=True)
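
The shape of `history` depends on the Gradio version and settings: the cell above assumes a list of `[user input, bot response]` pairs, while newer releases can pass a list of dictionaries with `role` and `content` keys. Here is a minimal sketch of a helper that copes with both, assuming only those two shapes (`user_messages` is a hypothetical name):

```python
def user_messages(history):
    """Return the user's past messages, oldest first."""
    texts = []
    for item in history:
        if isinstance(item, dict):
            # assumed newer format: {"role": "user" or "assistant", "content": ...}
            if item.get("role") == "user":
                texts.append(item["content"])
        else:
            # format used in the cell above: [user_input, bot_response]
            texts.append(item[0])
    return texts

pairs = [["Hi", "Yes, I do think that 'Hi'"], ["Really?", "I don't think so"]]
dicts = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Yes, I do think that 'Hi'"},
]
print(user_messages(pairs))
print(user_messages(dicts))
```

With such a helper, the second branch of the bot could call `random.choice(user_messages(history))` instead of indexing into the history directly.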

### Streaming chatbots

Isn't it nice if your bot types live instead of giving you a full answer?

For that, and any gradual process of answering, use the [`yield` keyword of Python generators](https://realpython.com/introduction-to-python-generators/).

See [this part](https://www.gradio.app/guides/creating-a-chatbot-fast#a-streaming-example-using-openai).

In [None]:
def slow_echo(message, history):
    for i in range(len(message)):
        time.sleep(0.3) # try random.random() * .5 (or another number) for irregular typing speed!
        yield f"You typed: {message[: i+1]}"

gr.ChatInterface(slow_echo).queue().launch()
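
Each value yielded by the function replaces the answer currently displayed, which is why the cell above yields a growing prefix rather than single characters. A minimal sketch of the same idea, one word at a time, consumed by hand so that every intermediate state is visible (`stream_words` and its `delay` argument are hypothetical names):

```python
import time

def stream_words(message, history, delay=0.0):
    # yield a growing prefix of the message, one word at a time
    words = message.split()
    for i in range(len(words)):
        time.sleep(delay)  # set delay > 0 for a visible typing effect
        yield " ".join(words[: i + 1])

# list() exhausts the generator and collects every yielded value
steps = list(stream_words("typing word by word", []))
print(steps)
```

Calling the generator outside of Gradio like this is also a quick way to debug a streaming function.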

### A customised Yes-Bot

See [this part](https://www.gradio.app/guides/creating-a-chatbot-fast#customizing-your-chatbot).

Docs & more:
- [Descriptive content](https://www.gradio.app/guides/key-features#descriptive-content)
- [Examples](https://www.gradio.app/guides/more-on-examples)
- [Styling](https://www.gradio.app/guides/key-features#styling)
- [Theming guide](https://www.gradio.app/guides/theming-guide)

In [None]:
def yes_bot(message, history):
    if message.endswith("?"):
        # silly fun with yes answers
        return random.choice(
            ["Yes", "Yes yes yes!", "Whoa, so totally yes!", "Hell yes!", "Man, I couldn't agree more!", "Absolutely!", "Haha I thought this too!"]
            )
    else:
        return "Ask me anything!"

gr.ChatInterface(
    yes_bot,
    chatbot=gr.Chatbot(height=300),
    textbox=gr.Textbox(placeholder="Ask me a yes or no question", container=False, scale=7),
    title="Yes Bot",
    description="Ask Yes Man any question",
    theme="soft",
    examples=["Hello", "Am I cool?", "Are tomatoes vegetables?"],
    cache_examples=True,
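    # note: these three button arguments come from the Gradio version this was written for;
    # newer releases may reject them, in which case simply remove them
    retry_btn=None,
    undo_btn="Delete Previous",
    clear_btn="Clear",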
).launch()

### Additional Inputs

See [this part](https://www.gradio.app/guides/creating-a-chatbot-fast#additional-inputs).

You may want to add additional parameters to your chatbot and expose them to your users through the Chatbot UI. For example, suppose you want to add a textbox for a system prompt, or a slider that sets the number of tokens in the chatbot's response. The `ChatInterface` class supports an `additional_inputs` parameter which can be used to add additional input components.

The `additional_inputs` parameter accepts a component or a list of components. You can pass the component instances directly, or use their string shortcuts (e.g. `"textbox"` instead of `gr.Textbox()`). If you pass in component instances, and they have not already been rendered, then the components will appear underneath the chatbot (and any examples) within a `gr.Accordion()`. You can set the label of this accordion using the `additional_inputs_accordion_name` parameter.

In [None]:
def echo(message, history, system_prompt, max_tokens):
    # This is very useful, as it lets you modify the behaviour of the
    # chatbot on top of the user input (the system prompt, in this case,
    # is re-added every single time, without the user having to re-type it)
    response = f"System prompt: {system_prompt}\n Message: {message}."
    # of course, in a real chatbot like below, you would feed the message
    # to the chatbot, then stream the completion by the neural net!
    for i in range(min(len(response), int(max_tokens))):
        time.sleep(0.05) # again, you could use a random logic for a better typing effect
        yield response[: i+1]

demo = gr.ChatInterface(
    echo,
    additional_inputs=[ # this is displayed in an accordion below the box
        gr.Textbox("You are a helpful AI.", label="System Prompt"),
        gr.Slider(10, 100, label="Max tokens")
    ]
)


demo.queue().launch(debug=True)

### Example using a local, open-source LLM with Hugging Face

See [this part](https://www.gradio.app/guides/creating-a-chatbot-fast#example-using-a-local-open-source-llm-with-hugging-face).

The model we use is [RedPajama](https://huggingface.co/togethercomputer/RedPajama-INCITE-Chat-3B-v1) (pretty big, best run on Colab!); see also [this post for the dataset](https://www.together.ai/blog/redpajama-data-v2). Chat models are regular language models finetuned on specific chat datasets: in particular, they include markers for "user input" and "assistant responses" as well as, sometimes, overall directives like a "system prompt" (defining the overall identity of the bot). On Huggingface, you can recognise them by their "-chat" identifier, for instance in the [Llama 2 family](https://huggingface.co/meta-llama).

Docs:
- [StoppingCriteria](https://huggingface.co/docs/transformers/internal/generation_utils#transformers.StoppingCriteria) \(see also [this nice post](https://discuss.huggingface.co/t/implimentation-of-stopping-criteria-list/20040/2)\)
- [TextIteratorStreamer](https://huggingface.co/docs/transformers/internal/generation_utils#transformers.TextIteratorStreamer)

In [None]:
import torch

# Get the cpu, gpu or mps device to run the model on.
# See: https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html#creating-models
device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)

from transformers import AutoTokenizer
from transformers import AutoModelForCausalLM
from transformers import StoppingCriteria
from transformers import StoppingCriteriaList
from transformers import TextIteratorStreamer

from threading import Thread

**Note**

This chatbot uses almost all the memory of a free Colab instance. Unfortunately, I haven't been able to free the memory so that I would be able to restart this app for debugging without restarting the runtime (and re-downloading the model) 😬.

The upside is: it is quite powerful! Try speaking to it in different languages, or ask it code questions!

In [None]:
MODEL_ID = "togethercomputer/RedPajama-INCITE-Chat-3B-v1"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.float16)
model = model.to(device) # move model to GPU

class StopOnTokens(StoppingCriteria):
    """
    Class used as `stopping_criteria` in `generate_kwargs`. It provides an additional
    way of stopping the generation loop (if this class returns `True` on a token,
    the generation is stopped).
    """
    # note: Python now supports type hints, see this: https://realpython.com/lessons/type-hinting/
    #       (for the **kwargs see also: https://realpython.com/python-kwargs-and-args/)
    # this could also be written: def __call__(self, input_ids, scores, **kwargs):
    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        stop_ids = [29, 0] # see the cell below to understand where these come from
        for stop_id in stop_ids:
            if input_ids[0][-1] == stop_id:
                return True
        return False

def predict(message, history):

    history_transformer_format = history + [[message, ""]]
    stop = StopOnTokens()

    # useful to debug
    # msg = "history"
    # print(msg)
    # print(*history_transformer_format, sep="\n")
    # print("***")

    # at each step, we feed the entire history in string format,
    # restoring the format used in their dataset with new lines
    # and <human>: or <bot>: added before the messages
    messages = "".join(
        ["".join(
            ["\n<human>:"+item[0], "\n<bot>:"+item[1]]
         )
        for item in history_transformer_format]
    )
    # # to see what we feed to our net:
    # msg = "string prompt"
    # print(msg)
    # print("-" * len(msg))
    # print(messages)
    # print("-" * 40)

    # convert the string into tensors & move to GPU
    model_inputs = tokenizer([messages], return_tensors="pt").to(device)

    streamer = TextIteratorStreamer(
        tokenizer,
        # timeout=30.,    # without the timeout, if there's an issue the bot will hang indefinitely
        skip_prompt=True, # (haven't implemented the error handling yet 🙈)
        skip_special_tokens=True
    )
    
    generate_kwargs = dict(
        model_inputs,
        streamer=streamer,
        max_new_tokens=1024,
        do_sample=True,
        top_p=0.95,
        top_k=1000,
        temperature=1.0,
        pad_token_id=tokenizer.eos_token_id, # mute annoying warning: https://stackoverflow.com/a/71397707
        num_beams=1,  # this is for beam search (disabled), see: https://huggingface.co/blog/how-to-generate#beam-search
        stopping_criteria=StoppingCriteriaList([stop])
    )
    t = Thread(target=model.generate, kwargs=generate_kwargs)
    t.start()

    partial_message = ""
    for new_token in streamer:
        # remember the <human>: and <bot>: markers above (where 'messages' is defined)?
        # a '<' at the end of the reply is the start of the next marker, so we leave
        # it out of the streamed message (StopOnTokens ends the generation there)
        if new_token != '<':
            partial_message += new_token
            yield partial_message


gr.ChatInterface(predict).queue().launch(debug=True)
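
To see exactly which string is fed to the model, the prompt-building step of `predict` can be isolated in a small function. This is a sketch of the same logic, assuming `history` is a list of `[user input, bot response]` pairs as above (`build_prompt` is a hypothetical name):

```python
def build_prompt(message, history):
    # add the new message with an empty bot turn for the model to complete
    turns = history + [[message, ""]]
    # one "\n<human>:...\n<bot>:..." block per turn, as in `predict`
    return "".join(f"\n<human>:{user}\n<bot>:{bot}" for user, bot in turns)

prompt = build_prompt("How are you?", [["Hi", "Hello!"]])
print(repr(prompt))
```

The prompt ends with an empty `<bot>:` turn, which is what invites the model to write the next answer.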

How do we know what the stop words are? (This is in part a design choice!)

In [None]:
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
print("The model stop words are:")
for tok in [29, 0]:
    print(f"  - `{tokenizer.decode([tok])}`")

print("If you wanted to know what token was associated with `<`, you'd do the opposite:")
print("`<` encoded as:", tokenizer.encode("<"))
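
The rule applied by `StopOnTokens` boils down to checking the most recent token id against a small set. Here is a plain-Python sketch of that check, without torch (`should_stop` and `STOP_IDS` are hypothetical names, and the ids are the ones used above):

```python
STOP_IDS = (29, 0)  # the ids checked by StopOnTokens above

def should_stop(generated_ids, stop_ids=STOP_IDS):
    # stop as soon as the most recent id is one of the stop ids
    return len(generated_ids) > 0 and generated_ids[-1] in stop_ids

print(should_stop([12, 7, 29]))  # last id is a stop id
print(should_stop([29, 7]))      # a stop id earlier in the sequence does not count
```

Choosing other ids (found with `tokenizer.encode`) changes where the bot stops talking, which is the design choice mentioned above.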

---

## Experiments

- You could try to modify this code to work with the latest Llama models by Meta (you must register on [their site](https://ai.meta.com/llama/), then on Huggingface once you get permission, to be able to download the weights). After that (same as with various restricted models/datasets/etc. on the Hub), you would need to log into HF:
```python
from pathlib import Path
from huggingface_hub import notebook_login
if not (Path.home()/'.huggingface'/'token').exists():
    notebook_login()
```
- Another example that would allow you to play with the cutting-edge LLMs is the [OpenAI example](https://www.gradio.app/guides/creating-a-chatbot-fast#a-streaming-example-using-openai) in the Gradio tutorial. You would first need to register (with credit card) and get an API key on [their website](https://platform.openai.com/)...

- Gradio ships with [`Flagging`](https://www.gradio.app/guides/key-features#styling) logic that allows you to harvest data from your users for free! You can also implement [`likes`](https://www.gradio.app/guides/creating-a-custom-chatbot-with-blocks#liking-disliking-chat-messages), which could be interesting!

- A current trend is to work with multimodality (systems that are able to handle more than one type of data: text and images, for instance, or text and music). See [this last part](https://www.gradio.app/guides/creating-a-custom-chatbot-with-blocks#adding-markdown-images-audio-or-videos) of the Gradio Chatbot tutorial for examples, as well as the two apps they recommend, [project-baize/Baize-7B](https://huggingface.co/spaces/project-baize/chat-with-baize) and [MAGAer13/mPLUG-Owl](https://huggingface.co/spaces/MAGAer13/mPLUG-Owl) (and, as said, you could clone these projects, study the code, and transform them into your own project)!