# Perform object detection with pollen-vision

Learn how to perform zero-shot object detection with the pollen-vision library, using the OWL-ViT model.

This notebook shows you how to use our wrapper for the OWL-ViT object detection model developed by Google Research.

**Insert an annotated image with lots of bounding boxes to make it look great**

## A word on OWL-ViT
OWL-ViT stands for Vision Transformer for Open-World Localization. It is a zero-shot object detection model: it detects objects described by free-form text queries, without being retrained on labeled data for those classes, as is required with traditional deep learning object detection models.
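To give an intuition of how text queries drive detection: OWL-ViT predicts a box and an embedding for each image token, then scores every box against the embedding of each text query. The toy sketch below reproduces only that matching step with hand-written vectors standing in for the encoder outputs. It is a simplified illustration, not the model's code: the real model also applies a learned shift and scale and a sigmoid to the similarities before thresholding, and `match_boxes` is a hypothetical helper.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def match_boxes(box_embeddings, query_embeddings, labels, threshold):
    """Assign each box its best-matching text query and drop weak matches."""
    detections = []
    for box_id, box_emb in enumerate(box_embeddings):
        scores = [cosine(box_emb, q) for q in query_embeddings]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= threshold:
            detections.append((box_id, labels[best], round(scores[best], 3)))
    return detections


labels = ["robot", "man"]
# Hand-written 3-D vectors standing in for text and box embeddings
queries = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
boxes = [[0.9, 0.1, 0.0], [0.1, 0.95, 0.1], [0.0, 0.0, 1.0]]
print(match_boxes(boxes, queries, labels, threshold=0.5))
# [(0, 'robot', 0.994), (1, 'man', 0.989)]
```

The third box matches neither query, so it is discarded: changing the list of queries changes what can be detected, with no retraining.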

You can find more information on the model on its dedicated page of the [Hugging Face documentation](https://huggingface.co/docs/transformers/model_doc/owlvit). The wrapper itself is built on Hugging Face's [transformers library](https://huggingface.co/docs/transformers/index).

In [None]:
import numpy as np
from pathlib import Path
from PIL import Image
from os import listdir

from pollen_vision.vision_models.object_detection.owl_vit.owl_vit_wrapper import OwlVitWrapper

In [None]:
object_detection_wrapper = OwlVitWrapper()

Load the image you want to run inference on.

Here we use one of the project's test images. The demo images and videos are in the examples folder; the cell below loads the first image found in the `data` folder of the current working directory. Feel free to use your own image!

In [None]:
demo_folder_path = Path.cwd() / "data"
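# Sort the file names for a deterministic choice and skip non-image files (e.g. the demo videos)
image_names = sorted(f for f in listdir(demo_folder_path) if f.lower().endswith((".jpg", ".jpeg", ".png")))
img = Image.open(demo_folder_path / image_names[0]).convert("RGB")  # drop any alpha channel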
img

## Run inference with the model

As explained, OWL-ViT is a zero-shot object detection model that takes text queries as input. Inference is performed with the *infer* method: pass it a list of candidate labels describing the objects you want to detect. OWL-ViT only looks for the classes in this list.

NB: the image passed to the *infer* method must be a **NumPy array**.
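If your images come from different sources (grayscale files, PNGs with transparency, float arrays from other processing steps), a small helper can normalize them before calling *infer*. The sketch below assumes the wrapper expects an H x W x 3 `uint8` RGB array, which is what `np.array(img)` gives for an RGB PIL image; `to_rgb_uint8` is a hypothetical helper, not part of pollen-vision.

```python
import numpy as np


def to_rgb_uint8(arr):
    """Return an H x W x 3 uint8 array from common image layouts.

    Assumption: the detection wrapper expects RGB images stored as uint8.
    """
    arr = np.asarray(arr)
    if arr.ndim == 2:  # grayscale: repeat the single channel three times
        arr = np.stack([arr] * 3, axis=-1)
    elif arr.ndim == 3 and arr.shape[2] == 4:  # RGBA: drop the alpha channel
        arr = arr[:, :, :3]
    if arr.ndim != 3 or arr.shape[2] != 3:
        raise ValueError(f"Unsupported image shape: {arr.shape}")
    if np.issubdtype(arr.dtype, np.floating):
        # Assumption: float images hold values in [0, 1]; rescale to 0-255
        arr = np.round(np.clip(arr, 0.0, 1.0) * 255)
    # Clip integer values into the uint8 range before casting
    return np.clip(arr, 0, 255).astype(np.uint8)
```

For a PIL image, `img.convert("RGB")` followed by `np.array(...)` is the more direct route; the helper is mostly useful for arrays produced elsewhere.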

In [None]:
predictions = object_detection_wrapper.infer(im=np.array(img), candidate_labels=["robot", "man"], detection_threshold=0.2)
print(predictions)

Change the candidate list and see what else you can detect!
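To work with the predictions programmatically, for example to keep only the most confident detection of each class, you can post-process the returned list. The sketch below assumes the predictions follow the format of the Hugging Face `zero-shot-object-detection` pipeline, i.e. a list of dicts with `"score"`, `"label"` and `"box"` keys; check the output printed above to confirm this on your version. `best_per_label` and `count_per_label` are hypothetical helpers, and the sample data in the usage note is made up.

```python
from collections import Counter

# Assumed prediction format (Hugging Face zero-shot-object-detection pipeline):
# {"score": 0.41, "label": "robot", "box": {"xmin": 10, "ymin": 20, "xmax": 110, "ymax": 220}}


def best_per_label(predictions, min_score=0.0):
    """Keep only the highest-scoring detection for each label."""
    best = {}
    for pred in predictions:
        if pred["score"] < min_score:
            continue
        current = best.get(pred["label"])
        if current is None or pred["score"] > current["score"]:
            best[pred["label"]] = pred
    return best


def count_per_label(predictions, min_score=0.0):
    """Count how many detections of each label reach min_score."""
    return Counter(p["label"] for p in predictions if p["score"] >= min_score)
```

For example, `count_per_label(predictions, min_score=0.3)` tells you how many men and robots were found with a score of at least 0.3, without rerunning the model.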

### Visualize detection results

You can easily visualize the model's predictions with the *draw_predictions* method.

In [None]:
object_detection_wrapper.draw_predictions?

In [None]:
object_detection_wrapper.draw_predictions(in_im=np.array(img), predictions=predictions)
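If you need full control over the rendering (colors, line thickness), you can also draw the boxes yourself. The NumPy sketch below is not how *draw_predictions* is implemented; it assumes each prediction has a `"box"` dict with `xmin`, `ymin`, `xmax` and `ymax` in pixel coordinates (the format of the Hugging Face `zero-shot-object-detection` pipeline), and `draw_box_outlines` is a hypothetical helper.

```python
import numpy as np


def draw_box_outlines(im, predictions, color=(255, 0, 0), thickness=2):
    """Return a copy of `im` (H x W x 3 uint8) with each predicted box outlined.

    Assumption: pred["box"] holds xmin/ymin/xmax/ymax in pixel coordinates.
    """
    out = im.copy()  # leave the input image untouched
    h, w = out.shape[:2]
    t = thickness
    for pred in predictions:
        box = pred["box"]
        # Clamp the coordinates so boxes touching the border stay inside the array
        x0, y0 = max(int(box["xmin"]), 0), max(int(box["ymin"]), 0)
        x1, y1 = min(int(box["xmax"]), w - 1), min(int(box["ymax"]), h - 1)
        out[y0:y0 + t, x0:x1 + 1] = color  # top edge
        out[max(y1 - t + 1, 0):y1 + 1, x0:x1 + 1] = color  # bottom edge
        out[y0:y1 + 1, x0:x0 + t] = color  # left edge
        out[y0:y1 + 1, max(x1 - t + 1, 0):x1 + 1] = color  # right edge
    return out
```

In the notebook, `Image.fromarray(draw_box_outlines(np.array(img), predictions))` displays the result like the original image.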

## Infer with your camera

In [None]:
import cv2 as cv

cap = cv.VideoCapture(0)

In [None]:
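ret, frame = cap.read()  # ret is False if no frame could be grabbed
im = cv.cvtColor(frame, cv.COLOR_BGR2RGB)  # OpenCV frames are BGR; convert to RGB like the PIL image above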

In [None]:
predictions = object_detection_wrapper.infer(im=im, candidate_labels=["man"], detection_threshold=0.2)

In [None]:
object_detection_wrapper.draw_predictions(in_im=im, predictions=predictions)

In [None]:
cap.release()
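If an error occurs between opening the camera and calling `cap.release()`, the device can stay locked until the kernel restarts. A small context manager guarantees the release. `released` is a hypothetical helper, not part of OpenCV or pollen-vision; it only assumes the wrapped object has a `release()` method, as `cv.VideoCapture` does.

```python
from contextlib import contextmanager


@contextmanager
def released(resource):
    """Yield `resource` and always call its release() method on exit, even on error."""
    try:
        yield resource
    finally:
        resource.release()
```

Usage: `with released(cv.VideoCapture(0)) as cap:` followed by the `cap.read()` and inference code in the indented block; the camera is released when the block exits.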

## Plug in a depth camera



## Final notes

That's all folks! If you want to perform zero-shot object detection on video frames, you can use this script: it gathers all the commands you saw in this notebook.
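When processing a video stream, inference can be slower than the camera frame rate, especially on CPU. One common pattern is to run the detector only every n-th frame and reuse the latest predictions in between. The generator below is a minimal sketch of that idea; `detect_on_frames` and its `every_n` parameter are hypothetical, and `infer` can be any callable that takes one image and returns predictions.

```python
def detect_on_frames(frames, infer, every_n=5):
    """Run `infer` on every n-th frame and reuse the latest predictions in between.

    frames: any iterable of images (e.g. frames read from a cv.VideoCapture).
    infer: callable taking one image and returning its predictions.
    Yields (frame, predictions) pairs.
    """
    predictions = []
    for i, frame in enumerate(frames):
        if i % every_n == 0:
            predictions = infer(frame)
        yield frame, predictions
```

With the wrapper from this notebook, `infer` could be `lambda f: object_detection_wrapper.infer(im=f, candidate_labels=["man"], detection_threshold=0.2)`. A larger `every_n` saves compute at the cost of boxes lagging behind fast-moving objects.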

Check out the other notebooks to learn how to use other vision models, such as RAM to check whether an object is in the frame, or SAM to perform object segmentation.