<font color="#de3023"><h1><b>REMINDER MAKE A COPY OF THIS NOTEBOOK, DO NOT EDIT</b></h1></font>

![](https://media-cldnry.s-nbcnews.com/image/upload/newscms/2021_11/3457540/210317-students-masks-jm-1522.jpg)

# **Social Distancing Detection with YOLO**

**In this notebook, we will attempt to use the *YOLO (You Only Look Once)* model to assess whether people are abiding by social distancing guidelines to curb the spread of COVID-19.**


💡 As a result, we will have a few objectives:
- Discover more about the YOLO model and how it works with object detection.
- Understand and compute midpoints and Euclidean distances.
- Learn how to adjust bounding and anchor boxes based on the objects in an image.

##### <font color=darkorange>**Change Hardware Accelerator to GPU to run the model faster (Runtime -> Change Runtime Type -> Hardware Accelerator -> GPU)**</font>

# **Ethical Concerns**

**Before we begin working with YOLO, it's important that we carefully consider the possible ramifications of implementing our code.** 🛑

If computer vision (CV) were used to identify social distancing violations, that would necessarily entail 24/7 surveillance of public spaces. Since COVID-related responses are handled by the government, this would mean that violations reported by a CV-based AI system would be handled by the police.

At a time when African Americans are [3.5x more likely to be harassed and killed by the police](https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(21)01609-3/fulltext), having AI detect social distancing violations becomes especially precarious. And that's not to mention the ethical concerns about public surveillance raised by a number of philosophers.

Now, some may argue that using AI to detect crime, even in the interest of public health, gives dangerous amounts of power to the police. This may remind you of [Amazon providing facial recognition tech to ICE](https://www.washingtonpost.com/business/2019/07/12/no-tech-ice-protesters-demand-amazon-cut-ties-with-federal-immigration-enforcement/) (U.S. Immigration and Customs Enforcement), which empowers the agency in detaining and separating immigrant families.

Others might contend that curbing COVID-19 is a more noble cause, thus making it okay to use CV and AI. We are then left in a gray area — and those are the hardest to judge.

**How do we balance the powers of AI with our basic liberties and privacy? As emerging AI enthusiasts, it's up to you to determine that for yourself.**

## ✏️ **Exercise**: **Being a Responsible Technologist**

If you choose to go into computer science and AI, you will make choices that impact real people. In the cell below, write a few sentences in response to each question.

*There are no right answers!*


In [None]:
#@markdown ### 1) Do you think we should use AI to detect social distancing violations? Why or why not?
Answer1 = "" # @param {type:"string"}

#@markdown ### 2) Say you are the leader of a team in a large tech company tasked with creating an AI social distancing detector. What choices would you make while coding and designing your model to try and ensure that it does not harm others?
Answer2 = "" # @param {type:"string"}

#@markdown ### 3) After the model is coded and trained, what choices would you make while deploying the model to the real world? What situations would you want to avoid? What policies/laws could help ensure the model is used for good?
Answer3 = "" # @param {type:"string"}


Now that we've grappled with the ethical concerns behind our work, we can go ahead and begin coding our model.

**Please run the cell below to download the data and libraries we'll be working with!**

In [None]:
#@title Run this to prepare our environment! { display-mode: "form" }

# To keep versions the same; there is currently a version mismatch -- these pip installs
# should be able to be removed in the future

import matplotlib.pyplot as plt
import os
from PIL import Image
import gdown

import argparse
import numpy as np
from keras.layers import Conv2D, Input, BatchNormalization, LeakyReLU, ZeroPadding2D, UpSampling2D, Add, Concatenate
# from keras.layers.merge import add, concatenate
from keras.models import Model, load_model
import struct
import cv2
from copy import deepcopy
import pandas as pd

import tensorflow as tf

# tf.keras.models.load_model

# Prepare data
DATA_ROOT = '/content/data'
os.makedirs(DATA_ROOT, exist_ok=True)


image_url = 'https://drive.google.com/uc?id=125fNdCScl8-K6rtb-E6uBi5_gYLSbk3-'
image_path = os.path.join(DATA_ROOT, 'image.jpg')
gdown.download(image_url, image_path, True)

image_url = 'https://drive.google.com/uc?id=1lNPGFHVkltqqlffNYPfxk1Weytr1gIgB'
img_sd1_path = os.path.join(DATA_ROOT, 'social_distance1.jpg')
gdown.download(image_url, img_sd1_path, True)

image_url = 'https://drive.google.com/uc?id=1A5ddwSZhSvyjF8JZTTG43Y8RuOB1MQo4'
img_sd2_path = os.path.join(DATA_ROOT, 'social_distance2.jpg')
gdown.download(image_url, img_sd2_path, True)

#Try integrating other images by replacing the drive link with one that directs to another social distancing related image!
image_url = 'https://drive.google.com/uc?id=17-rsyFNkbONGE7ZLk7JIEQiflKmSEwYT'
img_sd3_path = os.path.join(DATA_ROOT, 'social_distance3.jpg')
gdown.download(image_url, img_sd3_path, True)
'''
image_url = 'https://drive.google.com/uc?id=12ZpZ5H0kJIkWk6y4ktGfqR5OTKofL7qw'
image_path = os.path.join(DATA_ROOT, 'image.jpg')
gdown.download(image_url, image_path, True)
'''

model_url = 'https://drive.google.com/uc?id=19XKJWMKDfDlag2MR8ofjwvxhtr9BxqqN'
model_path = os.path.join(DATA_ROOT, 'yolo_weights.h5')
gdown.download(model_url, model_path, True)

labels = ["person", "bicycle", "car", "motorbike", "aeroplane", "bus", "train", "truck", \
              "boat", "traffic light", "fire hydrant", "stop sign", "parking meter", "bench", \
              "bird", "cat", "dog", "horse", "sheep", "cow", "elephant", "bear", "zebra", "giraffe", \
              "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard", \
              "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard", \
              "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", \
              "apple", "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", \
              "chair", "sofa", "pottedplant", "bed", "diningtable", "toilet", "tvmonitor", "laptop", "mouse", \
              "remote", "keyboard", "cell phone", "microwave", "oven", "toaster", "sink", "refrigerator", \
              "book", "clock", "vase", "scissors", "teddy bear", "hair drier", "toothbrush"]

class BoundBox:
    def __init__(self, xmin, ymin, xmax, ymax, objness = None, classes = None):
        self.xmin = xmin
        self.ymin = ymin
        self.xmax = xmax
        self.ymax = ymax

        self.objness = objness
        self.classes = classes

        self.label = -1
        self.score = -1

    def get_label(self):
        if self.label == -1:
            self.label = np.argmax(self.classes)

        return self.label

    def get_score(self):
        if self.score == -1:
            self.score = self.classes[self.get_label()]

        return self.score

def _interval_overlap(interval_a, interval_b):
    x1, x2 = interval_a
    x3, x4 = interval_b

    if x3 < x1:
        if x4 < x1:
            return 0
        else:
            return min(x2,x4) - x1
    else:
        if x2 < x3:
             return 0
        else:
            return min(x2,x4) - x3

def _sigmoid(x):
    return 1. / (1. + np.exp(-x))

def bbox_iou(box1, box2):
    intersect_w = _interval_overlap([box1.xmin, box1.xmax], [box2.xmin, box2.xmax])
    intersect_h = _interval_overlap([box1.ymin, box1.ymax], [box2.ymin, box2.ymax])

    intersect = intersect_w * intersect_h

    w1, h1 = box1.xmax-box1.xmin, box1.ymax-box1.ymin
    w2, h2 = box2.xmax-box2.xmin, box2.ymax-box2.ymin

    union = w1*h1 + w2*h2 - intersect

    return float(intersect) / union

def preprocess_image(input_image, target_height, target_width):
    """Preprocesses the input image for model prediction.

    This function resizes the input image while maintaining the aspect ratio,
    and embeds the resized image into a 'letterbox' if needed.

    Args:
        input_image (PIL.Image): The input image.
        target_height (int): The target height for the model.
        target_width (int): The target width for the model.

    Returns:
        numpy.ndarray: The preprocessed image.
    """
    image_array = np.asarray(input_image)
    original_height, original_width, _ = image_array.shape

    # Compute the aspect ratio multiplier
    aspect_ratio_multiplier = min(target_width / original_width, target_height / original_height)

    # Compute new size preserving the aspect ratio
    new_width = int(original_width * aspect_ratio_multiplier)
    new_height = int(original_height * aspect_ratio_multiplier)

    # Resize the image while preserving the aspect ratio
    resized_image = cv2.resize(image_array / 255.0, (new_width, new_height))

    # Compute the padding values
    pad_vert = (target_height - new_height) // 2
    pad_horz = (target_width - new_width) // 2

    # Pad the resized image to fit the target size ('letterboxing')
    letterboxed_image = np.pad(resized_image, ((pad_vert, target_height - new_height - pad_vert),
                                               (pad_horz, target_width - new_width - pad_horz),
                                               (0, 0)), 'constant', constant_values=0.5)

    # Add an extra dimension to fit the model's input shape (batch_size, height, width, channels)
    final_image = np.expand_dims(letterboxed_image, axis=0)

    return final_image

def decode_netout(netout_, obj_thresh, anchors_, image_h, image_w, net_h, net_w):
    netout_all = deepcopy(netout_)
    boxes_all = []
    for i in range(len(netout_all)):
      netout = netout_all[i][0]
      anchors = anchors_[i]

      grid_h, grid_w = netout.shape[:2]
      nb_box = 3
      netout = netout.reshape((grid_h, grid_w, nb_box, -1))
      nb_class = netout.shape[-1] - 5

      boxes = []

      netout[..., :2]  = _sigmoid(netout[..., :2])
      netout[..., 4:]  = _sigmoid(netout[..., 4:])
      netout[..., 5:]  = netout[..., 4][..., np.newaxis] * netout[..., 5:]
      netout[..., 5:] *= netout[..., 5:] > obj_thresh

      for i in range(grid_h*grid_w):
          row = i // grid_w
          col = i % grid_w

          for b in range(nb_box):
              # element at index 4 (the 5th value) is the objectness score
              objectness = netout[row][col][b][4]
              #objectness = netout[..., :4]
              # last elements are class probabilities
              classes = netout[row][col][b][5:]

              if((classes <= obj_thresh).all()): continue

              # first 4 elements are x, y, w, and h
              x, y, w, h = netout[row][col][b][:4]

              x = (col + x) / grid_w # center position, unit: image width
              y = (row + y) / grid_h # center position, unit: image height
              w = anchors[b][0] * np.exp(w) / net_w # unit: image width
              h = anchors[b][1] * np.exp(h) / net_h # unit: image height

              box = BoundBox(x-w/2, y-h/2, x+w/2, y+h/2, objectness, classes)
              #box = BoundBox(x-w/2, y-h/2, x+w/2, y+h/2, None, classes)

              boxes.append(box)

      boxes_all += boxes

    # Correct boxes
    boxes_all = correct_yolo_boxes(boxes_all, image_h, image_w, net_h, net_w)

    return boxes_all

def correct_yolo_boxes(boxes_, image_h, image_w, net_h, net_w):
    boxes = deepcopy(boxes_)
    if (float(net_w)/image_w) < (float(net_h)/image_h):
        new_w = net_w
        new_h = (image_h*net_w)/image_w
    else:
        new_h = net_h
        new_w = (image_w*net_h)/image_h

    for i in range(len(boxes)):
        x_offset, x_scale = (net_w - new_w)/2./net_w, float(new_w)/net_w
        y_offset, y_scale = (net_h - new_h)/2./net_h, float(new_h)/net_h

        boxes[i].xmin = int((boxes[i].xmin - x_offset) / x_scale * image_w)
        boxes[i].xmax = int((boxes[i].xmax - x_offset) / x_scale * image_w)
        boxes[i].ymin = int((boxes[i].ymin - y_offset) / y_scale * image_h)
        boxes[i].ymax = int((boxes[i].ymax - y_offset) / y_scale * image_h)
    return boxes

def do_nms(boxes_, nms_thresh, obj_thresh):
    boxes = deepcopy(boxes_)
    if len(boxes) > 0:
        num_class = len(boxes[0].classes)
    else:
        return []  # no detections: return an empty list so later loops still work

    for c in range(num_class):
        sorted_indices = np.argsort([-box.classes[c] for box in boxes])

        for i in range(len(sorted_indices)):
            index_i = sorted_indices[i]

            if boxes[index_i].classes[c] == 0: continue

            for j in range(i+1, len(sorted_indices)):
                index_j = sorted_indices[j]

                if bbox_iou(boxes[index_i], boxes[index_j]) >= nms_thresh:
                    boxes[index_j].classes[c] = 0

    new_boxes = []
    for box in boxes:
        label = -1

        for i in range(num_class):
            if box.classes[i] > obj_thresh:
                label = i
                # print("{}: {}, ({}, {})".format(labels[i], box.classes[i]*100, box.xmin, box.ymin))
                box.label = label
                box.score = box.classes[i]
                new_boxes.append(box)

    return new_boxes


from PIL import ImageDraw, ImageFont
import colorsys

def draw_boxes_and_get_coordinates(image_, boxes, labels):
    image = image_.copy()
    image_w, image_h = image.size
    font = ImageFont.truetype(font='/usr/share/fonts/truetype/liberation/LiberationMono-Regular.ttf',
                    size=np.floor(3e-2 * image_h + 0.5).astype('int32'))
    thickness = (image_w + image_h) // 300

    # Generate colors for drawing bounding boxes.
    hsv_tuples = [(x / len(labels), 1., 1.)
                  for x in range(len(labels))]
    colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples))
    colors = list(
        map(lambda x: (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)), colors))
    np.random.seed(10101)  # Fixed seed for consistent colors across runs.
    np.random.shuffle(colors)  # Shuffle colors to decorrelate adjacent classes.
    np.random.seed(None)  # Reset seed to default.
    co_ordinates_with_labels = []
    for i, box in reversed(list(enumerate(boxes))):
        c = box.get_label()
        predicted_class = labels[c]
        score = box.get_score()
        top, left, bottom, right = box.ymin, box.xmin, box.ymax, box.xmax

        label = '{} {:.2f}'.format(predicted_class, score)
        draw = ImageDraw.Draw(image)

        bbox = font.getbbox(label) #returns left, top, right, bottom

        #calculates what textsize(label) used to (width and height)
        label_size = [bbox[2] - bbox[0], bbox[3] - bbox[1]]

        #label_size = draw.textsize(label)

        top = max(0, np.floor(top + 0.5).astype('int32'))
        left = max(0, np.floor(left + 0.5).astype('int32'))
        bottom = min(image_h, np.floor(bottom + 0.5).astype('int32'))
        right = min(image_w, np.floor(right + 0.5).astype('int32'))
        print(label, (left, top), (right, bottom))
        co_ordinates_with_labels.append([predicted_class, score, [(left, top), (right, bottom)]])
        if top - label_size[1] >= 0:
            text_origin = np.array([left, top - label_size[1]])
        else:
            text_origin = np.array([left, top + 1])

        # My kingdom for a good redistributable image drawing library.
        for i in range(thickness):
            draw.rectangle(
                [left + i, top + i, right - i, bottom - i],
                outline=colors[c])
        draw.rectangle(
            [tuple(text_origin), tuple(text_origin + label_size)],
            fill=colors[c])
        draw.text(text_origin, label, fill=(0, 0, 0), font=font)
        #draw.text(text_origin, label, fill=(0, 0, 0))
        del draw
    return image,  co_ordinates_with_labels

device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  print('No GPU Found! D:')
else:
  print('Found GPU at: {}'.format(device_name))

# **What Is YOLO?**
YOLO is an innovative approach to object detection. Earlier detectors, such as the region-based R-CNN family, repurposed image classifiers and ran them on many candidate regions. YOLO instead treats object detection as a single regression problem applied to the entire image: it predicts both bounding boxes and class probabilities directly from the raw image in one pass through a CNN (Convolutional Neural Network).

In the YOLO framework, the model divides the input image into a grid of cells. Each cell predicts one or more bounding boxes and their corresponding class probabilities. The bounding boxes represent the potential locations of objects centered in that cell, and the class probabilities indicate the likelihood of the object belonging to each predefined class (such as "person", "dog", or "sofa").
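As a rough illustration of the grid idea, the sketch below computes the grid sizes and the number of values predicted per cell for the YOLOv3 network used later in this notebook. It assumes a 416x416 input, the three YOLOv3 strides (32, 16, and 8), three boxes per cell, and the 80 classes in `labels`. The helper name `grid_summary` is made up for this example and is not part of the notebook's code.

```python
def grid_summary(net_size=416, strides=(32, 16, 8), boxes_per_cell=3, num_classes=80):
    """Return (grid_size, values_per_cell, total_boxes) for each output scale."""
    summary = []
    for stride in strides:
        grid = net_size // stride                 # cells along one side of the grid
        values_per_box = 4 + 1 + num_classes      # x, y, w, h + objectness + class scores
        values_per_cell = boxes_per_cell * values_per_box
        total_boxes = grid * grid * boxes_per_cell
        summary.append((grid, values_per_cell, total_boxes))
    return summary


for grid, values, boxes in grid_summary():
    print(f"{grid}x{grid} grid: {values} values per cell, {boxes} candidate boxes")
```

Under these assumptions the network proposes several thousand candidate boxes per image, which is why the later filtering steps matter so much.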

Let's take a little sneak peek at how the model works using an image from the original [research paper](https://arxiv.org/abs/1506.02640):

![](https://drive.google.com/uc?export=view&id=10VQ4igt5A2ijEa_cg9o2g6tjjQQX4qNe)

As you can see, YOLO draws many candidate bounding boxes and then narrows them down to the boxes with the highest probability.

**For this notebook, we'll use the [Darknet YOLO model](https://pjreddie.com/darknet/yolo/) made by the authors of the paper!**

# **Building Our Model**

## **Inspecting Darknet Model**

We've downloaded the Darknet model in the form of weights. You can observe the path of this weight file in the `model_path` variable:

In [None]:
model_path

To load this model from the weights we just downloaded, we use the `load_model(some_path, compile=False)` function.

In this case, we set `compile` to `False` because we only want to inspect the model. If you wanted to train or evaluate the model directly, you would compile it by setting `compile=True`.

✏️ Load this model into a variable called `darknet`!

In [None]:
### YOUR CODE HERE
darknet = None
### END CODE

In [None]:
#@title Instructor Solution { display-mode: "form" }

darknet = load_model(model_path, compile=False)

Now let's go ahead and look at what composes this model.

*Hint: What function could you call to look at the details of a neural network?*

In [None]:
### YOUR CODE HERE

### END CODE

In [None]:
#@title Instructor Solution { display-mode: "form" }

darknet.summary()

As we can see, the neural network used by YOLOv3 consists mainly of convolutional layers, with some shortcut connections and upsample layers. For a full description of this network please refer to the [YOLOv3 Paper](https://pjreddie.com/media/files/papers/YOLOv3.pdf).

## **Loading and Resizing Our Images**

We've already downloaded a few images that you can apply your YOLO algorithm to. Below is a list of the available images that you can load:

- PeopleOnStairs.jpg (available at `image_path`)
- house.jpg (available at `img_sd1_path`)
- watch_party.jpg (available at `img_sd2_path`)
- pizza_party.jpg (available at `img_sd3_path`)

Now, go ahead and take a quick glance at these images. Try using Pillow's Image library to open the image with `Image.open()`.

In [None]:
### YOUR CODE HERE

### END CODE

In [None]:
#@title Instructor Solution { display-mode: "form" }

""" Answers will depend on what image students choose to look at """

Image.open(img_sd2_path)

Some neat images, right?

Now, the input size of Darknet is `(416, 416)`, so we need to resize our images to match. We have implemented this for you in the `preprocess_image(image, net_h, net_w)` function. It takes the original image and the target height and width (`net_h`, `net_w`) and returns a resized copy as a NumPy array, with pixel values scaled to the range 0 to 1 and an extra batch dimension, which is the format Darknet expects.

Try using Pillow's Image library to `open` the image from a path and see its `size` attribute! You might find the [PIL documentation](https://pillow.readthedocs.io/en/stable/reference/Image.html) useful :)

In [None]:
net_h, net_w = 416, 416

#### YOUR CODE HERE

# Open your image
image_pil = None

# get the width and height size of image_pil
image_w, image_h = None, None

# preprocess the image
new_image = None

#### END CODE

In [None]:
#@title Instructor Solution { display-mode: "form" }
net_h, net_w = 416, 416

# Open your image
image_pil = Image.open(img_sd2_path) #change depending on what image to process

# get the width and height size of image_pil
image_w, image_h = image_pil.size

# preprocess the image
new_image = preprocess_image(image_pil, net_h, net_w)

To plot images, try using the `plt.imshow(some_image)` function (documentation [here](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.imshow.html)!)

In [None]:
# Display the resized image
fig, (ax1, ax2) = plt.subplots(1, 2)

# Plot the resized image on the first subplot
ax1.imshow(new_image[0])
ax1.set_title("Resized Image")

# Plot the original image on the second subplot
ax2.imshow(image_pil)
ax2.set_title("Original Image")

# Display the subplots side by side
plt.show()
print("Old image dimensions: (", image_w, ",", image_h, "), new dimensions: (", net_w, ",", net_h, ")")

In [None]:
#@title Instructor Solution { display-mode: "form" }

# Display the resized image
fig, (ax1, ax2) = plt.subplots(1, 2)

# Plot the resized image on the first subplot
ax1.imshow(new_image[0])
ax1.set_title("Resized Image")

# Plot the original image on the second subplot
ax2.imshow(image_pil)
ax2.set_title("Original Image")

# Display the subplots side by side
plt.show()
print("Old image dimensions: (", image_w, ",", image_h, "), new dimensions: (", net_w, ",", net_h, ")")

As you'll notice, the **aspect ratio of your image is preserved**. Without this, the objects would be stretched or squashed, which makes detection much harder. To fill the rest of the 416x416 square, the resized image is centered on a gray background (a technique called letterboxing).
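To see where the gray bars come from, here is a minimal sketch of the arithmetic that `preprocess_image` performs, without touching any pixels. The helper name `letterbox_geometry` and the 832x624 example size are illustrative choices, not part of the notebook's code.

```python
def letterbox_geometry(original_w, original_h, target_w=416, target_h=416):
    """Return (new_w, new_h, pad_left, pad_top) for an aspect-preserving resize."""
    # Use the smaller scale factor so the whole image fits inside the target.
    scale = min(target_w / original_w, target_h / original_h)
    new_w = int(original_w * scale)
    new_h = int(original_h * scale)
    # Whatever space is left over is split between the two sides as padding.
    pad_left = (target_w - new_w) // 2
    pad_top = (target_h - new_h) // 2
    return new_w, new_h, pad_left, pad_top


print(letterbox_geometry(832, 624))  # (416, 312, 0, 52)
```

For a landscape image the width fills the square and the gray bars appear above and below; for a portrait image the bars appear on the left and right instead.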

## **Detecting Objects**



Now that we've resized our images, it's time to *predict* objects using YOLO and Darknet (hint: use the `darknet.predict` function and pass in your resized image)!

In [None]:
### YOUR CODE HERE
yolo_outputs = None
#### END CODE

In [None]:
#@title Instructor Solution { display-mode: "form" }

yolo_outputs = darknet.predict(new_image)

Excellent! We've now run YOLO and gotten `yolo_outputs`, which represents its **predictions** for a) where the bounding boxes should be and b) how likely it is that each bounding box contains an object.

Now, we need to actually plot these predictions on our image and label any and all bounding boxes we have.

**Let's start by adding two additional support variables:**

- `anchors`: Anchors are predefined bounding boxes of various sizes and aspect ratios used to assist the model in detecting objects in an image. During the training process, the model learns to predict offsets from these anchor boxes rather than directly predicting absolute bounding box coordinates. It's like a template that you can use, rather than having to start from scratch!
- `obj_thresh`: This is a parameter that represents the minimum confidence score or probability required for an object detection to be considered valid. In our case, anything below the `obj_thresh` will not be included in our final plot.

Once we've included these, we will go ahead and call the function

```
draw_boxes_and_get_coordinates(image_pil, boxes, labels)
```
to add bounding boxes to the image and give us the coordinates of all the objects.
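To make the role of `anchors` and `obj_thresh` more concrete, the sketch below decodes a single raw prediction using the same formulas that `decode_netout` applies to the whole output. It is simplified to one box and one class; the function name `decode_one_box` and the raw values are made up for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_one_box(raw, col, row, grid_w, grid_h, anchor, net_w=416, net_h=416):
    """Decode (tx, ty, tw, th, t_obj, t_class) into a normalized box and a score."""
    tx, ty, tw, th, t_obj, t_class = raw
    # The center is an offset inside the grid cell at (col, row).
    x_center = (col + sigmoid(tx)) / grid_w
    y_center = (row + sigmoid(ty)) / grid_h
    # Width and height rescale the anchor (the "template") rather than start from scratch.
    width = anchor[0] * math.exp(tw) / net_w
    height = anchor[1] * math.exp(th) / net_h
    # The final confidence combines objectness with the class probability.
    score = sigmoid(t_obj) * sigmoid(t_class)
    return (x_center, y_center, width, height), score


raw = (0.0, 0.0, 0.0, 0.0, 3.0, 2.0)  # hypothetical raw network values
box, score = decode_one_box(raw, col=6, row=6, grid_w=13, grid_h=13, anchor=(116, 90))
obj_thresh = 0.45
print(box, round(score, 3), "kept" if score > obj_thresh else "dropped")
```

With all four box values at zero, the decoded box sits in the middle of its cell and has exactly the anchor's size, which shows why anchors work as templates.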



In [None]:
anchors = [[[116,90], [156,198], [373,326]], [[30,61], [62,45], [59,119]], [[10,13], [16,30], [33,23]]]
# These are the standard anchor (template) boxes for YOLOv3, given as [width, height] in pixels, with one group of three per output scale. Since every image is resized to the same network input size (416 x 416), we can use fixed pixel values!

In [None]:
obj_thresh = 0.45
# This will only keep bounding box predictions with confidence scores above 0.45

# Transform the raw predictions (yolo_outputs) into meaningful bounding box coordinates, class probabilities, and objectness scores using the provided anchor boxes (anchors) and image dimensions.
boxes = decode_netout(yolo_outputs, obj_thresh, anchors, image_h, image_w, net_h, net_w)

# Draw bounding boxes on the image using labels and get the co-ordinates
image_detect, coordinates_with_labels = draw_boxes_and_get_coordinates(image_pil, boxes, labels)

# Plot the image with bounding boxes
plt.figure(figsize=(8, 8))
plt.imshow(image_detect)
plt.title("Detected Objects with Bounding Boxes")
plt.show()

Well, that looks decent! We're seeing some initial classifications, but there are still too many overlapping, duplicate boxes.

To account for this, we need to pass in one more support variable:
- `nms_thresh`: The `nms_thresh` is a threshold value between 0 and 1 that determines the amount of overlap required for two bounding boxes to be considered redundant. If the Intersection over Union (IoU) between two boxes exceeds the `nms_thresh`, the box with the lower confidence score will be suppressed or removed from the final detections.

Check [this page](https://hasty.ai/docs/mp-wiki/metrics/iou-intersection-over-union#:~:text=The%20bigger%20the%20overlapping%2C%20the,score%2C%20the%20better%20the%20result.&text=The%20best%20possible%20value%20is,to%20reach%20such%20a%20score.&text=Our%20suggestion%20is%20to%20consider,score%20as%20the%20poor%20one.) out to learn more about IoU!
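The following is a minimal, single-class sketch of IoU and of the suppression step, using plain `(xmin, ymin, xmax, ymax)` tuples instead of the notebook's `BoundBox` objects. The names `iou` and `greedy_nms` and the example boxes are illustrative. The notebook's own `do_nms` applies the same idea once per class.

```python
def iou(box_a, box_b):
    """Intersection over Union for boxes given as (xmin, ymin, xmax, ymax)."""
    inter_w = max(0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def greedy_nms(boxes, scores, nms_thresh):
    """Return indices of the boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        # Keep a box only if it does not overlap too much with a better box.
        if all(iou(boxes[i], boxes[k]) < nms_thresh for k in kept):
            kept.append(i)
    return kept


boxes = [(10, 10, 110, 210), (20, 15, 115, 205), (300, 40, 380, 220)]
scores = [0.9, 0.6, 0.8]
print(round(iou(boxes[0], boxes[1]), 3))   # the first two boxes overlap heavily
print(greedy_nms(boxes, scores, 0.45))     # [0, 2]
```

The second box is dropped because it overlaps the higher-scoring first box, while the third box survives because it does not overlap anything.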

**Let's go ahead and pass in `nms_thresh` and see what happens.**

(and in case you were wondering what happened to the resized image, YOLO only needs that 416x416 image to do its calculations and can extrapolate its bounding boxes to the original dimension after!)
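As a rough sketch of that extrapolation, the function below maps a point expressed as a fraction of the letterboxed 416x416 input back to pixel coordinates in the original image. It follows the same idea as the notebook's `correct_yolo_boxes`, applied to a single point; the name `to_original_pixels` and the 832x624 example size are made up for illustration.

```python
def to_original_pixels(x_norm, y_norm, image_w, image_h, net_w=416, net_h=416):
    """Map a normalized point on the letterboxed input to original-image pixels."""
    # Size of the resized image inside the network input.
    scale = min(net_w / image_w, net_h / image_h)
    new_w, new_h = image_w * scale, image_h * scale
    # Fraction of the network input taken up by gray padding on the left/top.
    x_offset = (net_w - new_w) / 2 / net_w
    y_offset = (net_h - new_h) / 2 / net_h
    # Remove the padding, undo the resize, and convert to pixels.
    x_pixel = (x_norm - x_offset) / (new_w / net_w) * image_w
    y_pixel = (y_norm - y_offset) / (new_h / net_h) * image_h
    return int(x_pixel), int(y_pixel)


# The center of the network input maps to the center of an 832x624 image.
print(to_original_pixels(0.5, 0.5, 832, 624))  # (416, 312)
```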


In [None]:
obj_thresh = 0.45
nms_thresh = 0.45

boxes = decode_netout(yolo_outputs, obj_thresh, anchors, image_h, image_w, net_h, net_w)

# Suppress redundant, overlapping boxes
boxes = do_nms(boxes, nms_thresh, obj_thresh)

# Draw bounding boxes on the image using labels and get the co-ordinates
image_detect, coordinates_with_labels = draw_boxes_and_get_coordinates(image_pil, boxes, labels)

plt.figure(figsize=(8,8))
plt.imshow(image_detect)
plt.show()

**Perfect! Now test out other values for `nms_thresh` and `obj_thresh` — how do the variables relate to one another?**

Now, since we are only interested in the ```person``` label category, we can write a function that keeps the coordinates of ```person``` detections whose prediction confidence is above a threshold of your choosing.



In [None]:
def filter_person_category(image_pil, coordinates_with_labels):

  image = image_pil.copy()
  person_coordinates = []
  img_draw = ImageDraw.Draw(image)

  ###YOUR CODE HERE###
  for data in coordinates_with_labels:
    # printing data here might be helpful!

    # store the appropriate value in our data variable
    label = None
    confidence = None
    coordinates = None
    top_left_coordinates = None
    bottom_right_coordinates = None

    # Replace "False" with your condition. What do we want to filter for? And how confident should the result be to include it?
    if False:

      # Draw bounding box for object of 'person' category
      person_coordinates.append(coordinates)
      img_draw.rectangle([top_left_coordinates[0], top_left_coordinates[1],
                          bottom_right_coordinates[0], bottom_right_coordinates[1]],
                         outline='green',width=5)
  ###END CODE###
  del img_draw
  return image, person_coordinates

In [None]:
#@title Instructor Solution { display-mode: "form" }
def filter_person_category(image_pil, coordinates_with_labels):

  image = image_pil.copy()
  person_coordinates = []
  img_draw = ImageDraw.Draw(image)

  ### YOUR CODE HERE ###
  for data in coordinates_with_labels:
    # printing data here might be helpful!
    print(data)
    # store the appropriate value in our data variable
    label = data[0]
    confidence = data[1]
    coordinates = data[2]
    top_left_coordinates = data[2][0]
    bottom_right_coordinates = data[2][1]

    # Replace "False" with your condition. What do we want to filter for? And how confident should the result be to include it?
    if label == "person" and confidence > 0.75:

      # Draw bounding box for object of 'person' category
      person_coordinates.append(coordinates)
      img_draw.rectangle([top_left_coordinates[0], top_left_coordinates[1],
                          bottom_right_coordinates[0], bottom_right_coordinates[1]],
                         outline='green',width=5)

  del img_draw
  return image, person_coordinates

Let's go ahead and apply the `filter_person_category` function to your image!

In [None]:
image_person, person_coordinates = filter_person_category(image_pil,coordinates_with_labels)

And now, let's view the image with only the person category marked!

In [None]:
plt.figure(figsize=(12,12))
plt.imshow(image_person)
plt.show()

### **You've just created your first object detection with YOLO!**

# **Testing For Social Distancing**

## **Compute Midpoints**


To determine whether the people in our images are socially distanced, we need to a) find a single reference point for each person and then b) measure the distance between those points.

Below, create a function `get_midpoints` that takes in the people's coordinates (hint: what list already holds those for us?) and returns the bottom-center point of each bounding box.

Then, use that function to calculate the midpoints for your image!
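As a small worked example of the calculation for one box, the sketch below uses the same `[(left, top), (right, bottom)]` layout as the entries in `person_coordinates`. The helper name `bottom_center` and the box values are made up for illustration. A likely reason to use the bottom edge rather than the true center is that it roughly marks where the person's feet touch the ground, though that is an interpretation rather than something the notebook states.

```python
def bottom_center(box):
    """Return the point halfway along the bottom edge of a bounding box."""
    (x1, y1), (x2, y2) = box
    x_mid = int((x1 + x2) / 2)  # halfway between the left and right edges
    y_bottom = int(y2)          # image y grows downward, so y2 is the bottom edge
    return (x_mid, y_bottom)


example_box = [(100, 50), (180, 300)]
print(bottom_center(example_box))  # (140, 300)
```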

In [None]:
#Add parameters here, what does this function need in order to calculate midpoints
def get_midpoints():

  midpoints = []

  ### YOUR CODE HERE

  ### END CODE ###
  return midpoints

my_image_midpoints = get_midpoints()
print("Here are the midpoints of every person:", my_image_midpoints)

In [None]:
#@title Instructor Solution { display-mode: "form" }
def get_midpoints(person_coordinates):

  midpoints = []
  for i,coordinates in enumerate(person_coordinates):

    (x1,y1),(x2,y2) = coordinates
    #compute bottom center of bbox
    x_mid = int((x1+x2)/2)
    y_mid = int(y2)
    mid   = (x_mid,y_mid)
    midpoints.append(mid)
  return midpoints

my_image_midpoints = get_midpoints(person_coordinates)
print("Here are the midpoints of every person:", my_image_midpoints)

## **Compute Euclidean Distance** ##

**We now have a midpoint for each bounding box (that is, for each person) in your image. Let's go ahead and find the Euclidean distance between each person and every other person in the frame.**

Make sure your code doesn't compute the same distance twice!

For example, the distance between ```person 1``` and ```person 2``` is the same as the distance between ```person 2``` and ```person 1```, so each pair only needs to be computed once.

Hint: use the ```distance.euclidean(arg1, arg2)``` function from the ```scipy``` library.
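For reference, the Euclidean distance between two points $(x_1, y_1)$ and $(x_2, y_2)$ is $\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$. The sketch below is one possible way to visit each pair exactly once. It uses only the standard library (`math.dist` instead of scipy) so it can run on its own, and the helper name `pairwise_distances` is made up for illustration. The example points form 3-4-5 triangles so the results are easy to check by hand.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Return {(i, j): distance} for every pair with i < j."""
    result = {}
    # combinations() yields each unordered pair once, so (1, 0) is never
    # computed after (0, 1), and no point is paired with itself.
    for (i, p), (j, q) in combinations(enumerate(points), 2):
        result[(i, j)] = math.dist(p, q)
    return result


points = [(0, 0), (3, 4), (6, 8)]
for pair, d in pairwise_distances(points).items():
    print(pair, d)
```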


In [None]:
from scipy.spatial import distance

def compute_distance(midpoints,num):
  # Create n * n matrix to store the distance
  dist = np.zeros((num,num))
  ### YOUR CODE HERE ###

  ### END CODE ###
  return dist

In [None]:
#@title Instructor Solution { display-mode: "form" }
from scipy.spatial import distance

def compute_distance(midpoints,num):
  # Create n * n matrix to store the distance
  dist = np.zeros((num,num))
  ### YOUR CODE HERE ###
  for i in range(num):
    for j in range(i+1,num):
      if i!=j:
        dst = distance.euclidean(midpoints[i], midpoints[j])
        dist[i][j]=dst
  ### END CODE ###
  return dist

With your function made, go ahead and apply it to your image!

In [None]:
### YOUR CODE HERE
num_people = None
dist = None
### END CODE

In [None]:
#@title Instructor Solution { display-mode: "form" }
num_people =  len(my_image_midpoints)
dist = compute_distance(my_image_midpoints,num_people)

## **Bringing Everything Together**

We now have a) a model to detect objects, b) the midpoint of each person in frame, and c) the distance between each pair of people.

**It's time to finally check for social distancing!** 🎉

Your goal is now to write a function `filter_pairs_less_distance` that takes the matrix of distances between each pair of individuals and the number of people in the image. It should return three lists of equal length: the first person in each pair that falls within the distance threshold, the second person in each pair, and the distance between them.

To do so, you'll use a threshold, which we've defined as `threshold` inside the function. Note that `threshold` is measured in pixels; it is a hyperparameter and can differ across images, since camera calibration isn't being accounted for.

**Note: There's no need to compute the distance between a person and themselves!**
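One detail worth illustrating: if the distance matrix only has its upper triangle filled in, the entries below the diagonal are all zero and must not be mistaken for people standing on top of each other. The sketch below shows one possible way to look only at the upper triangle using NumPy's `np.triu_indices`. The helper name `close_pairs` and the example matrix are made up for illustration and are not the notebook's solution.

```python
import numpy as np


def close_pairs(dist, threshold):
    """Return (i, j, distance) for every pair i < j within the threshold."""
    n = dist.shape[0]
    # Indices strictly above the diagonal: each pair once, nobody with themselves.
    rows, cols = np.triu_indices(n, k=1)
    mask = dist[rows, cols] <= threshold
    return [(int(i), int(j), float(dist[i, j]))
            for i, j in zip(rows[mask], cols[mask])]


# Hypothetical pixel distances between three people (upper triangle only).
example_dist = np.array([[0.0, 120.0, 500.0],
                         [0.0,   0.0, 310.0],
                         [0.0,   0.0,   0.0]])
print(close_pairs(example_dist, threshold=300))
```

Here only persons 0 and 1 fall within the 300-pixel threshold; the zeros below the diagonal are ignored.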

In [None]:
def filter_pairs_less_distance(dist,num_people):

  distance =[]
  person1=[]
  person2=[]

  #Specify threshold
  threshold = 300
  ### YOUR CODE HERE ###

  ### CODE ENDS ###
  return person1, person2, distance

In [None]:
#@title Instructor Solution { display-mode: "form" }
def filter_pairs_less_distance(dist,num_people):
  '''
  Returns the ids of each pair closer than the threshold, and their distances
  '''
  distance =[]
  person1=[]
  person2=[]
  threshold = 300
  for i in range(num_people):
    # start at i+1 so a person is never paired with themselves
    for j in range(i+1,num_people):
      if dist[i][j]<=threshold:
        person1.append(i)
        person2.append(j)
        distance.append(dist[i][j])
  return person1, person2, distance

**And now, let's go ahead and apply this function to your particular image!**

In [None]:
# named `distances` so we don't overwrite the `distance` module imported from scipy
person1, person2, distances = filter_pairs_less_distance(dist,num_people)

Chances are that there are a few folks who are skimping on social distancing guidelines and were caught by your program. For these folks, let's go ahead and outline them in red!

In [None]:
def plot_red_bbox(img,coordinates,person1,person2):
  img_copy = img.copy()
  img_draw = ImageDraw.Draw(img_copy)
  no_social_distance = np.unique(person1 + person2)

  for i in no_social_distance:
    (x1,y1) = coordinates[i][0]
    (x2,y2) = coordinates[i][1]
    img_draw.rectangle([x1,y1,x2,y2],outline='red',width=10)
  del img_draw
  return img_copy

And then the moment of truth, we can apply this to your image!

In [None]:
img = plot_red_bbox(image_person,person_coordinates,person1,person2)

plt.figure(figsize=(18,7))
plt.imshow(img)

In case your plot seems incorrect, modify the `threshold` value as you see fit!

# **Congratulations! 🎉 You've just been able to track social distancing with YOLO!**