<a href="https://colab.research.google.com/github/thesteve0/impatient-computer-vision/blob/main/4_object_detection.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# Object Detection

In Classification we try to come up with one label for the entire image. Now we are going to look at a more involved technique, Object Detection. Object Detection actually accomplishes two tasks:
1. The model has to find all the "things" in the image
2. Then it has to determine what those "things" actually are

Typically, the models will surround each object with a bounding rectangle and then apply a label to the rectangle. Some models can also produce oriented rectangles, which is helpful for use cases like drone imagery: not only can you detect the car, you can also tell how it is oriented.
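
To make the bounding rectangle concrete, here is a minimal sketch of turning a box into pixel coordinates. It assumes the box is stored the way FiftyOne stores detections, as `[top-left-x, top-left-y, width, height]` with every value relative to the image size (between 0 and 1). The function name `to_pixel_box` and the example numbers are made up for illustration.

```python
def to_pixel_box(rel_box, image_width, image_height):
    """Convert a relative [x, y, width, height] box to pixel corners."""
    x, y, w, h = rel_box
    left = round(x * image_width)
    top = round(y * image_height)
    right = round((x + w) * image_width)
    bottom = round((y + h) * image_height)
    return left, top, right, bottom


# Hypothetical box: half the width and half the height, centered in a 640x480 image
box = to_pixel_box([0.25, 0.25, 0.5, 0.5], 640, 480)
print(box)  # (160, 120, 480, 360)
```

Relative coordinates are convenient because the same box stays valid if the image is resized.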

Before we get started, go ahead and run the housekeeping again. Once you have mapped the drive, we will go back to the slides and discuss Object Detection.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

!pip install fiftyone==1.4.1 torch torchvision umap-learn

import fiftyone as fo
import fiftyone.zoo as foz
import fiftyone.brain as fob

name = "our-photos"
# Avoid the name `dir`, which would shadow Python's built-in dir()
dataset_dir = "/content/drive/MyDrive/impatient-cv/flickr-labeled"

dataset = fo.Dataset.from_dir(
    dataset_dir=dataset_dir,
    dataset_type=fo.types.FiftyOneDataset,
    name=name
)

print(dataset)

## Running a model

For the rest of the notebooks we are not going to dig in on custom models, we will just be using models from the FiftyOne Zoo. As much as possible, I will try to select a fairly typical model. Remember, the point is to give you a broad overview so you have a place to start when you dig in on the more complicated models.

For this task we will use a Faster R-CNN that has a ResNet50 backbone - [faster-rcnn-resnet50-fpn-coco-torch](https://docs.voxel51.com/model_zoo/models.html#faster-rcnn-resnet50-fpn-coco-torch). This means the Faster R-CNN model uses the ResNet model inside of its neural network to extract features from the image; the detection layers built on top of it then use those features to propose regions, classify the detected objects, and refine their boxes. This model was trained on the [COCO dataset](https://cocodataset.org/#home), which is a more modern dataset but still similar in focus to ImageNet.

In [None]:
model = foz.load_zoo_model("faster-rcnn-resnet50-fpn-coco-torch")

dataset.apply_model(
    model,
    label_field="objects_detected",
    num_workers=2,
)

sample = dataset.first()
sample

## Viewing in the application
Now let's bring up the results in the application. You will see that, for some objects, we will get multiple detections. We can discuss how to handle those while we look at the data.
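
One common way to handle overlapping detections of the same object is non-maximum suppression: measure how much two boxes overlap (intersection over union, or IoU) and keep only the most confident box among those that overlap heavily. The sketch below is an illustration in plain Python, not what the zoo model or FiftyOne does internally. The dictionaries, the function names, and the 0.5 IoU threshold are assumptions chosen for the example, with boxes in relative `[x, y, width, height]` form.

```python
def iou(box_a, box_b):
    """Intersection over union of two [x, y, width, height] boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def suppress_duplicates(detections, iou_threshold=0.5):
    """Keep the most confident box among same-label boxes that overlap heavily."""
    kept = []
    # Visit detections from most to least confident
    for det in sorted(detections, key=lambda d: d["confidence"], reverse=True):
        is_duplicate = any(
            det["label"] == k["label"] and iou(det["box"], k["box"]) >= iou_threshold
            for k in kept
        )
        if not is_duplicate:
            kept.append(det)
    return kept


# Made-up detections: two nearly identical "dog" boxes and one "cat" box
detections = [
    {"label": "dog", "confidence": 0.95, "box": [0.10, 0.10, 0.40, 0.40]},
    {"label": "dog", "confidence": 0.60, "box": [0.12, 0.11, 0.40, 0.40]},
    {"label": "cat", "confidence": 0.80, "box": [0.60, 0.60, 0.20, 0.20]},
]
kept = suppress_duplicates(detections)
print([(d["label"], d["confidence"]) for d in kept])
```

Here the lower-confidence dog box is dropped because it overlaps the stronger one almost completely, while the cat box is untouched.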

In [None]:
session = fo.launch_app(dataset, auto=False)
session.url

# session.show()

## Embedding the detections

In the same way that embeddings of our images were helpful for exploring and assessing our classifications, we can embed each of the individual detections. In this case we don't want to embed all the candidate detections. Instead, we will use only the high confidence detections: we will filter the data and then pass the filtered view to `compute_visualization`.

In [None]:
from fiftyone import ViewField as F

high_confidence_data = dataset.filter_labels("objects_detected", F("confidence") > 0.7)

# Load the model for calculating embeddings
resnet18_in = foz.load_zoo_model("resnet18-imagenet-torch")

results = fob.compute_visualization(
    samples=high_confidence_data,
    patches_field="objects_detected",
    model=resnet18_in,
    brain_key="objects_detected",
    embeddings="emb_objects_detected",
    progress=True,
    num_workers=2
)

session.refresh()

## Wrap up

In this notebook we went from trying to classify the whole image to finding objects of interest in the photograph and then classifying each object. For our original question, "Is there a person in the image?", this type of output would have been much better suited than a classification model.
In this case we could just iterate through the detections, and if there was a person detection with a high confidence we could say the image contained a person.
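
As a rough sketch of that check, the function below scans a list of detections for a confident `person` label (the name COCO uses for people). The `Detection` dataclass is a hypothetical stand-in so the example runs on its own; the function name and the 0.7 threshold are also just assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical stand-in for a detection with a label and a confidence."""
    label: str
    confidence: float


def contains_person(detections, threshold=0.7):
    """Return True if any detection is a 'person' above the confidence threshold."""
    return any(
        d.label == "person" and d.confidence > threshold for d in detections
    )


# Made-up detections for two images
beach_photo = [Detection("person", 0.92), Detection("umbrella", 0.81)]
street_photo = [Detection("car", 0.88), Detection("person", 0.35)]

print(contains_person(beach_photo))   # True
print(contains_person(street_photo))  # False
```

With the dataset from this notebook, the list of detections for a sample would presumably come from `sample["objects_detected"].detections`, after checking that the field is not empty.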

You may ask, what is "high confidence"? Excellent question, and unfortunately the answer is "it depends." There is usually a tradeoff between:
1. Setting the confidence threshold high, which reduces false positives but may miss real objects (more false negatives)
2. Setting the confidence threshold low, which will find more of the real objects but may also include more false positives

The only way to handle this is to experiment and determine a threshold that works well for your intended application. To understand more, you can read about the tradeoff between [precision and recall](https://spotintelligence.com/2024/09/11/precision-and-recall/).
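
The tradeoff can be seen with a small worked example. The numbers below are invented purely for illustration: each pair is a detection's confidence and whether it matched a real object, and we assume the images contain 5 real objects in total (one of which the model never found). Precision is the fraction of kept detections that are correct; recall is the fraction of real objects that were found.

```python
def precision_recall(scored, total_objects, threshold):
    """Precision and recall when keeping detections at or above the threshold.

    scored: list of (confidence, is_correct) pairs
    total_objects: number of real objects in the images
    """
    kept = [is_correct for confidence, is_correct in scored if confidence >= threshold]
    true_positives = sum(kept)
    # With nothing kept there are no false positives; treat precision as 1.0 here
    precision = true_positives / len(kept) if kept else 1.0
    recall = true_positives / total_objects
    return precision, recall


# Made-up detections: (confidence, matched a real object?)
scored = [(0.95, True), (0.90, True), (0.80, False),
          (0.65, True), (0.40, True), (0.30, False)]

for threshold in (0.9, 0.5, 0.2):
    precision, recall = precision_recall(scored, total_objects=5, threshold=threshold)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

In this toy data, lowering the threshold from 0.9 to 0.2 raises recall from 0.40 to 0.80 while precision falls from 1.00 to 0.67.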

With our next notebook we are going to go even more fine-grained with our predictions and try to predict every pixel in the image - time for Segmentation!

[Segmentation](https://github.com/thesteve0/impatient-computer-vision/blob/main/5_segmentation.ipynb)