### Gradio for Image Apps

In this notebook, we show how to combine two powerful libraries, [HuggingFace](https://huggingface.co/) and [Gradio](https://www.gradio.app/), to build Generative AI applications. Gradio lets you quickly create a simple web interface that makes a model easier to access and use.

In [1]:
# libraries
import os
import IPython.display

from transformers import pipeline
import gradio as gr

#### 1. Image Captioning

The task is to generate a caption for a given input image. For this purpose, we'll use `Salesforce/blip-image-captioning-base` from `HuggingFace`, a captioning [model](https://huggingface.co/Salesforce/blip-image-captioning-base) with a checkpoint of roughly 990 MB.

Free images can be found [here](https://free-images.com/).

In [2]:
# load huggingface pipeline
get_completion = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



In [3]:
# apply it to one example
image_url = "https://free-images.com/sm/9596/dog_animal_greyhound_983023.jpg"
display(IPython.display.Image(url=image_url))
print(f"{get_completion(image_url)[0]['generated_text']}")



a dog wearing a santa hat and a red scarf
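
The indexing `[0]['generated_text']` used above relies on the pipeline returning a list with one dictionary per generated caption. A minimal sketch of a more defensive way to read that result follows. `extract_caption` is a hypothetical helper (not part of `transformers`), and `sample` only mimics the output shape seen above rather than calling the model.

```python
def extract_caption(result, default=""):
    """Return the first caption from an image-to-text pipeline result.

    `result` is assumed to look like [{"generated_text": "..."}].
    """
    if not result:                      # empty list or None
        return default
    first = result[0]
    return first.get("generated_text", default).strip()


# Sample data mimicking the pipeline output shown above (not a real model call)
sample = [{"generated_text": "a dog wearing a santa hat and a red scarf"}]
print(extract_caption(sample))
```

Returning a default instead of raising keeps a Gradio callback from failing when the model produces nothing usable.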


In [None]:
def captioner(image):
    result = get_completion(image)
    return result[0]['generated_text']

gr.close_all()
demo = gr.Interface(
    fn=captioner,
    inputs=[gr.Image(label="Upload image", type="pil")],  # type to convert the input in the desired format (here PIL)
    outputs=[gr.Textbox(label="Caption")],
    title="Image Captioning App",
    description="Using the `Salesforce/blip-image-captioning-base` model from `HuggingFace`",
    allow_flagging="never",
    examples=["christmas_dog.jpeg", "bird_flight.jpeg", "cow.jpeg"]
)
demo.launch(share=False)

In [5]:
# remember to close all running servers
gr.close_all()

##### Note on `gr.Image()`
- The `type` parameter is the format that the `fn` function expects to receive as its input.  If `type` is `numpy` or `pil`, `gr.Image()` will convert the uploaded file to this format before sending it to the `fn` function.
- If `type` is `filepath`, `gr.Image()` will temporarily store the image and provide a string path to that image location as input to the `fn` function.
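
To make the difference between the three `type` values concrete, here is a small sketch of a handler that reports which kind of value it received. `describe_input` is a hypothetical helper. It uses duck typing (attribute checks) so that it runs without importing NumPy or PIL, which is an assumption made for brevity rather than how Gradio itself distinguishes the formats.

```python
def describe_input(image):
    """Guess which gr.Image `type` setting produced this value."""
    if isinstance(image, str):
        return "filepath"               # a string path to a temporary file
    if hasattr(image, "shape") and hasattr(image, "dtype"):
        return "numpy"                  # array-like: has shape and dtype
    if hasattr(image, "size") and hasattr(image, "mode"):
        return "pil"                    # PIL-like: has size and mode (e.g. "RGB")
    return "unknown"


print(describe_input("/tmp/uploaded_image.png"))
```

A function like this could be passed as `fn` while experimenting with different `type` settings to confirm what the callback actually receives.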

#### 2. Image Generation

We now want to perform the "opposite" action: given a caption, generate an image. We can do this with a *stable diffusion* model. Here we'll use the `runwayml/stable-diffusion-v1-5` model from `HuggingFace` (see more [here](https://huggingface.co/runwayml/stable-diffusion-v1-5)). This time we will not use the `transformers` `pipeline` wrapper; instead, we load the model with `StableDiffusionPipeline` from the `diffusers` library and move it to the GPU ourselves.

In [7]:
#!pip install diffusers

In [8]:
from diffusers import StableDiffusionPipeline
import torch

In [9]:
torch.cuda.is_available()

True
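
The result of this check can drive the choice of device and precision when the model is loaded. The sketch below is one way to encode that decision. `pick_device_and_dtype` is a hypothetical helper, and the `torch` import is guarded so the snippet also runs where `torch` is not installed.

```python
def pick_device_and_dtype(cuda_available):
    """Map GPU availability to a (device, dtype name) pair."""
    if cuda_available:
        return "cuda", "float16"   # half precision reduces GPU memory use
    return "cpu", "float32"        # float32 is the safe choice on CPU


try:
    import torch
    cuda_ok = torch.cuda.is_available()
except ImportError:                # torch not installed: fall back to CPU settings
    cuda_ok = False

device, dtype_name = pick_device_and_dtype(cuda_ok)
print(device, dtype_name)
# With torch available, getattr(torch, dtype_name) gives the dtype object,
# which can be passed as torch_dtype=... and the device as .to(device).
```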

In [13]:
#!pip install accelerate

In [16]:
get_completion = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, low_cpu_mem_usage=True) # convert to torch.float32 for CPU
get_completion = get_completion.to('cuda') # only if torch.cuda.is_available() == True

Cannot initialize model with low cpu memory usage because `accelerate` was not found in the environment. Defaulting to `low_cpu_mem_usage=False`. It is strongly recommended to install `accelerate` for faster and less memory-intense model loading. You can do so with: `pip install accelerate`.



`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["id2label"]` will be overriden.
`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["bos_token_id"]` will be overriden.
`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["eos_token_id"]` will be overriden.


In [17]:
def image_generator(caption):
    image = get_completion(caption)
    return image.images[0]

In [None]:
# one simple Interface()
gr.close_all()
demo = gr.Interface(
    fn=image_generator,
    inputs=[gr.Textbox(label="Your prompt")],
    outputs=[gr.Image(label="Result")],
    title="Image Generation with Stable Diffusion",
    #description="Generate any image with Stable Diffusion",
    allow_flagging="never",
    examples=["a long bridge leading to nowhere","mice fighting in the ancient city of Rome"])
demo.launch(share=False)

In [None]:
gr.close_all()

In [None]:
# a more complex Interface()
gr.close_all()

def image_generator(prompt, negative_prompt, steps, guidance, width, height):
    params = {                               # parameters accepted by this pipeline
        "negative_prompt": negative_prompt,
        "num_inference_steps": int(steps),   # sliders may return floats; the pipeline expects integers
        "guidance_scale": guidance,
        "width": int(width),
        "height": int(height),
    }

    image = get_completion(prompt, **params)  # dict unpacked as keyword arguments
    return image.images[0]

demo = gr.Interface(
    fn=image_generator,
    inputs=[                                         # inputs can be passed as a list of gradio objects
        gr.Textbox(label="Your prompt"),
        gr.Textbox(label="Negative prompt", info="What you do NOT want to see in the image"),
        gr.Slider(label="Inference Steps", minimum=1, maximum=100, value=25, info="N. of denoising steps"),
        gr.Slider(label="Guidance Scale", minimum=1, maximum=20, value=7, info="How much the text prompt influences the result"),
        gr.Slider(label="Width", minimum=64, maximum=512, step=64, value=512),  # image width
        gr.Slider(label="Height", minimum=64, maximum=512, step=64, value=512), # image height
        ],
    outputs=[gr.Image(label="Result")],
    title="Image Generation with Stable Diffusion",
    allow_flagging="never",
    )
demo.launch(share=False, debug=False) # debug=True to show errors
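
Slider values reach the callback as plain numbers, which may be floats, while Stable Diffusion expects an integer step count and image dimensions divisible by 8. The sketch below normalizes the UI values before they are passed to the pipeline. `build_params` is a hypothetical helper, and rounding down to a multiple of 8 is an assumption about how out-of-range sizes should be handled.

```python
def build_params(negative_prompt, steps, guidance, width, height):
    """Turn raw UI values into keyword arguments for the diffusion pipeline."""

    def snap(value, multiple=8):
        # Round down to the nearest multiple, never below one multiple
        return max(multiple, int(value) // multiple * multiple)

    params = {
        "num_inference_steps": max(1, int(round(steps))),
        "guidance_scale": float(guidance),
        "width": snap(width),
        "height": snap(height),
    }
    # Only send a negative prompt when the user actually typed one
    if negative_prompt and negative_prompt.strip():
        params["negative_prompt"] = negative_prompt.strip()
    return params


print(build_params("", 25.4, 7, 500, 512))
```

The resulting dictionary can be unpacked in the callback, for example `get_completion(prompt, **build_params(...))`.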

In [None]:
gr.close_all()

In [60]:
# a Blocks UI - more freedom on where to put columns/rows
with gr.Blocks() as demo:
    gr.Markdown("# Image Generation with Stable Diffusion")  # title
    prompt = gr.Textbox(label="Your prompt")                 # row 1
    with gr.Row():                                           # row 2
        with gr.Column():                                    # column 1 in row 2
            negative_prompt = gr.Textbox(label="Negative prompt", info="What you do NOT want to see in the image")
            steps = gr.Slider(label="Inference Steps", minimum=1, maximum=100, value=25, info="N. of denoising steps")
            guidance = gr.Slider(label="Guidance Scale", minimum=1, maximum=20, value=7, info="How much the text prompt influences the result")
            width = gr.Slider(label="Width", minimum=64, maximum=512, step=64, value=512)
            height = gr.Slider(label="Height", minimum=64, maximum=512, step=64, value=512)
            btn = gr.Button("Submit")
        with gr.Column():                                    # column 2 in row 2
            output = gr.Image(label="Result")
    # Blocks has no implicit submit, so the button click is wired explicitly
    btn.click(fn=image_generator, inputs=[prompt, negative_prompt, steps, guidance, width, height], outputs=[output])
gr.close_all()
demo.launch(share=False)

Closing server running on port: 7860
Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Note: opening Chrome Inspector may crash demo inside Colab notebooks.

To create a public link, set `share=True` in `launch()`.





In [61]:
demo.close()

Closing server running on port: 7861


In [62]:
# another Blocks UI
with gr.Blocks() as demo:
    gr.Markdown("# Image Generation with Stable Diffusion")
    with gr.Row():
        with gr.Column(scale=4):                         # this column takes 4/5 of the row
            prompt = gr.Textbox(label="Your prompt")
        with gr.Column(scale=1, min_width=50):           # the remaining 1/5, but never narrower than 50 pixels
            btn = gr.Button("Submit")                    # the Submit button now sits next to the prompt
    with gr.Accordion("Advanced options", open=False):   # click the header to show/hide the advanced options
        negative_prompt = gr.Textbox(label="Negative prompt")
        with gr.Row():
            with gr.Column():
                steps = gr.Slider(label="Inference Steps", minimum=1, maximum=100, value=25, info="N. of denoising steps")
                guidance = gr.Slider(label="Guidance Scale", minimum=1, maximum=20, value=7, info="How much the text prompt influences the result")
            with gr.Column():
                width = gr.Slider(label="Width", minimum=64, maximum=512, step=64, value=512)
                height = gr.Slider(label="Height", minimum=64, maximum=512, step=64, value=512)
    output = gr.Image(label="Result")                    # the result is placed below the options, spanning the full width
    btn.click(fn=image_generator, inputs=[prompt,negative_prompt,steps,guidance,width,height], outputs=[output])
gr.close_all()
demo.launch(share=False)

Closing server running on port: 7860
Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Note: opening Chrome Inspector may crash demo inside Colab notebooks.

To create a public link, set `share=True` in `launch()`.
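
In the layout above, the two columns use `scale=4` and `scale=1`, so their widths are split in proportion to those values. The arithmetic can be sketched as follows. `column_shares` is a hypothetical helper, and the actual rendered width also depends on `min_width` and on how the page wraps.

```python
from fractions import Fraction


def column_shares(scales):
    """Return each column's share of the row as an exact fraction."""
    total = sum(scales)
    return [Fraction(s, total) for s in scales]


shares = column_shares([4, 1])
print([str(s) for s in shares])   # the prompt column and the button column
```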





In [63]:
gr.close_all()

Closing server running on port: 7860


### Acknowledgements

Thanks to DeepLearning.AI and Gradio for the courses that inspired this notebook.