# Object detection

Object detection is a more challenging task than simple image classification. A detector can find named objects within a given image, draw a bounding box around each one, and produce multiple annotations when objects of several classes appear in the same image.

This additional complexity means that object detection is typically slower than simple image classification. As with classification, we first need to load the relevant libraries.

In [None]:
## Import all required packages for this example.
from PIL import Image  ## This library manages reading images from file.
from transformers import pipeline  ## This library manages many pretrained models.
import glob  ## This library efficiently works with multiple files in a directory.
import matplotlib.pyplot as plt  ## This package is useful for displaying the images.
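import matplotlib.patches as patches  ## This module provides shapes, used here for bounding boxes.
import tqdm  ## This library provides progress bars (not used in the cells shown here).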

Once again, we need to prepare our dataset.

In [None]:
folder = 'example_data/coco/*'  ## Note * is a wildcard character, meaning match everything.
images = sorted(glob.glob(folder))
images = [Image.open(i) for i in images[:10]]

When we load our model, we can pass one of a few model options. This time we're going to use `OWLv2`, the successor to `OWL-ViT`, a model developed by Google specifically for zero-shot object detection.

In [None]:
model = "google/owlv2-base-patch16-ensemble"
detector = pipeline(model=model, task="zero-shot-object-detection")

Now the model is ready to be applied to images. We define the classes we are looking for in exactly the same way as when we were classifying the image as a whole, with one exception. The object detection task takes a new, optional parameter: `threshold`. This value sets a minimum model score required to recognise an object. This is necessary for object detection, as the model will return all matched objects regardless of the likelihood of a match. Some tuning is required to determine an effective threshold based on the candidate labels and dataset. Here, we found `0.3` to be appropriate.

In [None]:
labels = ['elephant', 'man', 'woman']
detections = [detector(image, candidate_labels=labels, threshold=0.3) for image in images]
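
To make the role of `threshold` more concrete, the sketch below applies the same kind of cut-off by hand to a list of results. It assumes each result is a dictionary with `score`, `label` and `box` keys, as in the plotting code in this example. The helper `filter_detections` and the sample values are hypothetical and are not part of `transformers`.

```python
def filter_detections(results, threshold):
    """Keep only the results whose score is above `threshold`."""
    return [r for r in results if r["score"] > threshold]


# Made-up results for illustration only; real scores come from the detector.
sample_results = [
    {"score": 0.62, "label": "elephant",
     "box": {"xmin": 10, "ymin": 20, "xmax": 200, "ymax": 180}},
    {"score": 0.31, "label": "man",
     "box": {"xmin": 210, "ymin": 40, "xmax": 260, "ymax": 170}},
    {"score": 0.08, "label": "woman",
     "box": {"xmin": 5, "ymin": 5, "xmax": 30, "ymax": 60}},
]

kept = filter_detections(sample_results, threshold=0.3)
print([r["label"] for r in kept])
```

Raising the threshold removes weaker matches first, so a value that is too high can discard genuine objects, while a value that is too low lets through spurious boxes.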

Once again, we can view the objects detected in a given image. Here we plot the image, and overlay the bounding boxes for each object found in the image.

In [None]:
for index in range(len(images)):  ## Loop over however many images were loaded.

    plt.imshow(images[index])
    ax = plt.gca()
    
    if detections[index]:
        for result in detections[index]:
            box = result["box"]
            label = f"{result['label']} ({result['score']:.2f})"
            
            # Draw bounding box
            rect = patches.Rectangle(
                (box["xmin"], box["ymin"]),
                box["xmax"] - box["xmin"],
                box["ymax"] - box["ymin"],
                linewidth=2,
                edgecolor="lime",
                facecolor="none"
            )
            ax.add_patch(rect)
            
            # Draw label
            ax.text(
                box["xmin"],
                box["ymin"] - 5,
                label,
                color="black",
                fontsize=10,
                backgroundcolor="lime"
            )

    plt.axis("off")
    plt.tight_layout()
    plt.show()
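
The rectangle arguments in the loop above come from converting corner coordinates into an anchor point plus a width and height. A minimal sketch of that conversion follows, using a hypothetical helper `box_to_xywh` and made-up coordinates.

```python
def box_to_xywh(box):
    """Convert corner coordinates to (x, y, width, height)."""
    x, y = box["xmin"], box["ymin"]
    width = box["xmax"] - box["xmin"]
    height = box["ymax"] - box["ymin"]
    return x, y, width, height


# Made-up box for illustration only.
example_box = {"xmin": 10, "ymin": 20, "xmax": 110, "ymax": 70}
x, y, width, height = box_to_xywh(example_box)
print(x, y, width, height)
```

The returned values could then be passed to `patches.Rectangle((x, y), width, height, ...)` in place of the inline arithmetic.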

As with the classification task, some experimentation is required to determine the appropriate codebook of candidate labels and threshold for a given application of zero-shot object detection. You may find that the model is not suitable for your dataset. If so, you can look for other models hosted on [Hugging Face](https://huggingface.co/models?pipeline_tag=zero-shot-object-detection&sort=trending), most of which should work by replacing the model name when making the pipeline.
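
One possible way to approach that experimentation is to run the detector once with a low threshold, keep the results, and then count how many objects of each label would survive at several candidate thresholds. The sketch below assumes results in the `score` and `label` dictionary format used above. The helper `count_labels_at` and the sample scores are hypothetical.

```python
from collections import Counter


def count_labels_at(results, threshold):
    """Count results per label whose score is above `threshold`."""
    return Counter(r["label"] for r in results if r["score"] > threshold)


# Made-up scores for illustration only; real scores come from the detector.
sample_results = [
    {"score": 0.62, "label": "elephant"},
    {"score": 0.35, "label": "man"},
    {"score": 0.22, "label": "man"},
    {"score": 0.12, "label": "woman"},
]

# Compare how many detections of each label remain at each threshold.
for threshold in (0.1, 0.3, 0.5):
    print(threshold, dict(count_labels_at(sample_results, threshold)))
```

Comparing these counts against what is actually visible in a handful of images gives a rough sense of which threshold suits a given set of labels.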