# Lesson 8: Object Detection

- In the classroom, the libraries are already installed for you.
- If you would like to run this code on your own machine, you can install the following:

```
    !pip install transformers
    !pip install gradio
    !pip install timm
    !pip install inflect
    !pip install phonemizer
```

In [1]:
    !pip install transformers
    !pip install gradio
    !pip install timm
    !pip install inflect
    !pip install phonemizer

**Note:** `py-espeak-ng` is only available on Linux operating systems.

To run locally on a Linux machine, use these commands:
```
    sudo apt-get update
    sudo apt-get install espeak-ng
    pip install py-espeak-ng
```

### Build the `object-detection` pipeline using 🤗 Transformers Library

- This model was released with the paper [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) by Carion et al. (2020).

In [3]:
!pip install helper

- Note: this installs a package named `helper` from PyPI (version 2.5.0 in this run). The import errors in the cells below suggest it is not the module this lesson expects; the lesson's helper functions presumably come from a `helper.py` file provided alongside the classroom notebook.


In [2]:
from helper import load_image_from_url, render_results_in_image

ImportError: cannot import name 'load_image_from_url' from 'helper' (unknown location)

In [1]:
from transformers import pipeline

- Here is some code that suppresses warning messages.

In [2]:
from transformers.utils import logging
logging.set_verbosity_error()

from helper import ignore_warnings
ignore_warnings()

ImportError: cannot import name 'ignore_warnings' from 'helper' (/usr/local/lib/python3.10/dist-packages/helper/__init__.py)
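
When the lesson's `helper` module is not available, `ignore_warnings` cannot be imported. A rough stand-in is sketched below, assuming the helper does little more than silence Python warnings. The name `ignore_warnings_sketch` is hypothetical, and the real helper may filter more selectively.

```python
import warnings

def ignore_warnings_sketch():
    """Hypothetical stand-in for the lesson's ignore_warnings helper.

    Assumption: silencing every Python warning is acceptable here.
    """
    warnings.filterwarnings("ignore")

# Call once near the top of the notebook, before loading any models.
ignore_warnings_sketch()
```

This hides all warnings, including useful ones, so it is best limited to demo notebooks.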

In [3]:
od_pipe = pipeline("object-detection", "./models/facebook/detr-resnet-50")

OSError: Incorrect path_or_model_id: './models/facebook/detr-resnet-50'. Please provide either the path to a local folder or the repo_id of a model on the Hub.

Info about [facebook/detr-resnet-50](https://huggingface.co/facebook/detr-resnet-50)

Explore the [Hugging Face Hub for more object detection models](https://huggingface.co/models?pipeline_tag=object-detection&sort=trending).
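
The `OSError` above indicates that `./models/facebook/detr-resnet-50` is a folder that exists only in the classroom environment. One possible workaround, sketched here, is to fall back to the Hub repo ID when the local folder is missing. The function name `resolve_model_id` is hypothetical and not part of the lesson.

```python
import os

def resolve_model_id(local_dir: str, hub_id: str) -> str:
    """Return local_dir if that folder exists, otherwise the Hub repo ID.

    Hypothetical helper, not part of the lesson or of Transformers.
    """
    return local_dir if os.path.isdir(local_dir) else hub_id

model_id = resolve_model_id("./models/facebook/detr-resnet-50",
                            "facebook/detr-resnet-50")
print(model_id)
```

The result can then be passed as the second argument to `pipeline("object-detection", ...)`. With a repo ID, Transformers downloads the weights from the Hub on first use.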

### Use the Pipeline

In [None]:
from PIL import Image

In [None]:
raw_image = Image.open('huggingface_friends.jpg')
raw_image.resize((569, 491))

In [None]:
pipeline_output = od_pipe(raw_image)
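
The object detection pipeline typically returns a list of dictionaries, each with a `score`, a `label`, and a pixel-coordinate `box`. As a minimal sketch under that assumption, low-confidence detections could be filtered out as follows. The sample values are made up for illustration, and `filter_detections` is a hypothetical name.

```python
# Made-up detections in the shape the pipeline is assumed to return.
sample_output = [
    {"score": 0.99, "label": "person",
     "box": {"xmin": 10, "ymin": 20, "xmax": 110, "ymax": 220}},
    {"score": 0.42, "label": "fork",
     "box": {"xmin": 50, "ymin": 60, "xmax": 70, "ymax": 90}},
]

def filter_detections(detections, min_score=0.9):
    """Keep only detections whose confidence is at least min_score."""
    return [d for d in detections if d["score"] >= min_score]

confident = filter_detections(sample_output, min_score=0.9)
print([d["label"] for d in confident])
```

With real results, `pipeline_output` would take the place of `sample_output`.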

- Render the detections on the image using the helper function `render_results_in_image`.

In [None]:
processed_image = render_results_in_image(
    raw_image,
    pipeline_output)

In [None]:
processed_image

### Using `Gradio` as a Simple Interface

- Use [Gradio](https://www.gradio.app) to create a demo for the object detection app.
- The demo makes it look friendly and easy to use.
- You can share the demo with your friends and colleagues as well.

In [None]:
import os
import gradio as gr

In [None]:
def get_pipeline_prediction(pil_image):

    pipeline_output = od_pipe(pil_image)

    processed_image = render_results_in_image(pil_image,
                                            pipeline_output)
    return processed_image

In [None]:
demo = gr.Interface(
  fn=get_pipeline_prediction,
  inputs=gr.Image(label="Input image",
                  type="pil"),
  outputs=gr.Image(label="Output image with predicted instances",
                   type="pil")
)

- `share=True` provides a public link for accessing the demo.

In [None]:
demo.launch(share=True, server_port=int(os.environ['PORT1']))
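
The `PORT1` environment variable is set in the classroom and is probably absent on a local machine, where `os.environ['PORT1']` would raise a `KeyError`. A small sketch of a fallback is shown below. The function name `get_server_port` is hypothetical, and 7860 is used because it is Gradio's usual default port.

```python
import os

def get_server_port(env_var: str = "PORT1", default: int = 7860) -> int:
    """Read the port from an environment variable, or fall back to a default.

    Hypothetical helper; PORT1 is the variable name used in the lesson.
    """
    return int(os.environ.get(env_var, default))

port = get_server_port()
print(port)
```

The value could then be passed along as `demo.launch(share=True, server_port=port)`.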

### Close the app
- Remember to call `.close()` on the Gradio app when you're done using it.

In [None]:
demo.close()

### Make an AI-Powered Audio Assistant

- Combine the object detector with a text-to-speech model that narrates what is inside the image.

- Inspect the output of the object detection pipeline.

In [None]:
pipeline_output
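
The raw output can be hard to scan. As a sketch, assuming each detection is a dictionary with `score`, `label`, and `box` keys, one line per detection can be printed like this. The sample data is invented, and `describe` is a hypothetical name.

```python
# Invented detections standing in for pipeline_output.
sample_output = [
    {"score": 0.99, "label": "person",
     "box": {"xmin": 10, "ymin": 20, "xmax": 110, "ymax": 220}},
    {"score": 0.87, "label": "fork",
     "box": {"xmin": 50, "ymin": 60, "xmax": 70, "ymax": 90}},
]

def describe(det):
    """Format one detection as label, rounded score, and box size."""
    box = det["box"]
    width = box["xmax"] - box["xmin"]
    height = box["ymax"] - box["ymin"]
    return (f'{det["label"]}: score={det["score"]:.2f}, '
            f'box={width}x{height} at ({box["xmin"]}, {box["ymin"]})')

for det in sample_output:
    print(describe(det))
```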

In [None]:
od_pipe

In [None]:
raw_image = Image.open('huggingface_friends.jpg')
raw_image.resize((284, 245))

In [None]:
from helper import summarize_predictions_natural_language

In [None]:
text = summarize_predictions_natural_language(pipeline_output)

In [None]:
text
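
The source of `summarize_predictions_natural_language` is not shown in this lesson. The following is only a guess at how such a summary could be built: count the labels and join them into a sentence. The function name `summarize_detections` is hypothetical, and the plural handling is a naive `s` suffix.

```python
from collections import Counter

def summarize_detections(detections):
    """Build a one-sentence summary from detection labels.

    Hypothetical sketch; plurals are formed by appending "s".
    """
    counts = Counter(d["label"] for d in detections)
    if not counts:
        return "No objects were detected in this image."
    parts = [f"{n} {label}" + ("s" if n != 1 else "")
             for label, n in counts.items()]
    if len(parts) == 1:
        listing = parts[0]
    else:
        listing = ", ".join(parts[:-1]) + " and " + parts[-1]
    return f"This image contains {listing}."

# Only the "label" key is needed for the summary.
sample = [{"label": "person"}, {"label": "person"}, {"label": "fork"}]
print(summarize_detections(sample))
```

The lesson installs `inflect`, which could be used for number words and irregular plurals in place of the naive suffix.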

### Generate Audio Narration of an Image

In [None]:
tts_pipe = pipeline("text-to-speech",
                    model="./models/kakao-enterprise/vits-ljs")

More info about [kakao-enterprise/vits-ljs](https://huggingface.co/kakao-enterprise/vits-ljs).

In [None]:
narrated_text = tts_pipe(text)

### Play the Generated Audio

In [None]:
from IPython.display import Audio as IPythonAudio

In [None]:
IPythonAudio(narrated_text["audio"][0],
             rate=narrated_text["sampling_rate"])
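
To keep the narration outside the notebook, the waveform could also be written to a WAV file. The sketch below uses only the standard library and assumes the samples are floats in the range [-1.0, 1.0]. The function name `write_wav` is hypothetical, and a generated tone stands in for `narrated_text["audio"][0]`.

```python
import math
import os
import struct
import tempfile
import wave

def write_wav(samples, sampling_rate, path):
    """Write float samples as 16-bit mono PCM (hypothetical helper)."""
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip to the assumed valid range
        frames += struct.pack("<h", int(s * 32767))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16 bit
        wav_file.setframerate(sampling_rate)
        wav_file.writeframes(bytes(frames))

# Stand-in for the model output: one second of a 440 Hz tone.
rate = 16000
tone = [0.5 * math.sin(2 * math.pi * 440 * t / rate) for t in range(rate)]
wav_path = os.path.join(tempfile.gettempdir(), "narration.wav")
write_wav(tone, rate, wav_path)
```

With the real pipeline output, the call would presumably be `write_wav(narrated_text["audio"][0], narrated_text["sampling_rate"], wav_path)`.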

### Try it yourself!
- Try these models with other images!