<p> <center> <a href="../Start_here.ipynb">Home Page</a> </center> </p>

<div>
    <span style="float: left; width: 33%; text-align: left;"><a href="4.Model_deployment_with_DeepStream.ipynb">Previous Notebook</a></span>
    <span style="float: left; width: 34%; text-align: center;">
        <a href="1.Data_labeling_and_preprocessing.ipynb">1</a>
        <a href="2.Object_detection_using_TAO_YOLOv4.ipynb">2</a>
        <a href="3.Model_deployment_with_Triton_Inference_Server.ipynb">3</a>
        <a href="4.Model_deployment_with_DeepStream.ipynb">4</a>
        <a >5</a>
    </span>
</div>

# Measure object size using OpenCV

***

**This notebook will help you understand how to:**

- Get started with OpenCV basics like loading images, converting colors, and more
- Use OpenCV for tasks like edge and color detection
- Get information about the contours from the edge/color masks
- Automatically estimate the size of objects
- Process and render a live video stream

**Contents of this notebook:**

- [Introduction to OpenCV](#Introduction-to-OpenCV)
- [Edge and color detection](#Edge-and-color-detection)
    - [Canny edge detection algorithm](#Canny-edge-detection-algorithm)
    - [HSV color thresholding](#HSV-color-thresholding)
- [Size detection with contours](#Size-detection-with-contours) 
- [Get object size statistics](#Get-object-size-statistics)

## Introduction to OpenCV

Deep learning is a powerful tool that delivers impressive results, but it is not the best choice for every task. When the colors or shapes of objects are enough to get the desired result, working directly at the pixel level instead of feeding images through a neural network can be much faster and more convenient. This approach needs no annotated dataset or training resources, and it often takes less effort to reach comparable performance. This is where OpenCV comes in!

[OpenCV](https://opencv.org/) (Open Source Computer Vision Library) is an open-source, modular library that includes several hundred computer vision algorithms intended primarily for real-time applications. OpenCV is written in C++ and its primary interface is C++, but it also provides bindings for other programming languages, including Python and Java. The API for these interfaces can be found in the online documentation.

OpenCV natively supports a wide range of use cases, from simple to very complex. The simpler ones include detecting edges, colors, lines, and circles, as well as template matching. The more complex ones include camera calibration, real-time pose estimation, and running inference with neural networks.

<img src="images/res_mario.jpg" width="720">
<div style="font-size:11px">Source: https://docs.opencv.org/</div><br>

In this notebook, we will explore how to use OpenCV to measure the size of our fruits and sort them into three categories based on that.

## Edge and color detection

OpenCV offers convenient functions for tasks such as edge and color detection. Once either of them has isolated an object from the background, estimating its size is straightforward, because the extent of the detected outline or region in pixels is directly related to the physical size of the object. Let's see them in practice in the cells below.

In [None]:
import cv2
import imutils
from imutils import perspective, contours
import matplotlib.pyplot as plt
import numpy as np
from scipy.spatial import distance as dist

sample_image = "../data/testing/image_2/0072.png"

# load the sample image with OpenCV
image = cv2.imread(sample_image)

# OpenCV loads images in BGR. Convert to RGB to view with plt
plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
plt.show()
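
The BGR-to-RGB conversion in the cell above only reorders the three color channels. As a minimal sketch, using a made-up 1x2 pixel array rather than the sample image, the same reordering can be written as a NumPy slice:

```python
import numpy as np

# Hypothetical 1x2 "image" in OpenCV's BGR channel order:
# one pure-blue pixel and one pure-red pixel.
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last axis swaps the B and R channels, which is what
# cv2.cvtColor(image, cv2.COLOR_BGR2RGB) does for a 3-channel image.
rgb = bgr[..., ::-1]

print(rgb[0, 0])  # the blue pixel, now in RGB order
print(rgb[0, 1])  # the red pixel, now in RGB order
```

Without this conversion, Matplotlib would read the blue channel as red, so the fruit colors would look wrong.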

### Canny edge detection algorithm

In this cell, we perform edge detection using the Canny algorithm. It is a multi-stage algorithm that includes noise reduction, finding the intensity gradient of the image, non-maximum suppression, and hysteresis thresholding. We follow it with two iterations of dilation and then two iterations of erosion to close the gaps between object edges and produce a clearer boundary.

In [None]:
# convert it to grayscale, and blur it slightly
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray, (3, 3), 0)

# perform edge detection, then perform dilation + erosion to
# close gaps in between object edges
edged = cv2.Canny(gray, 60, 120)
edged = cv2.dilate(edged, None, iterations=2)
edged = cv2.erode(edged, None, iterations=2)

plt.imshow(edged, cmap="gray")
plt.show()
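
To see why dilation followed by erosion closes gaps, here is a NumPy-only toy version of both operations with a 3x3 square kernel. The helper names (`dilate3x3`, `erode3x3`) and the tiny edge map are illustrative assumptions, not OpenCV functions; in the notebook, `cv2.dilate` and `cv2.erode` do this work.

```python
import numpy as np

def _neighborhoods(img, pad_value):
    """Stack the nine 3x3-shifted views of a 2-D image."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="constant", constant_values=pad_value)
    return np.stack([padded[dy:dy + h, dx:dx + w]
                     for dy in range(3) for dx in range(3)])

def dilate3x3(img):
    # a pixel becomes white if any pixel in its 3x3 neighborhood is white
    return _neighborhoods(img, 0).max(axis=0)

def erode3x3(img):
    # a pixel stays white only if its whole 3x3 neighborhood is white
    return _neighborhoods(img, 255).min(axis=0)

# hypothetical edge map: a horizontal edge with a one-pixel gap at column 4
edge = np.zeros((7, 9), dtype=np.uint8)
edge[3, [2, 3, 5, 6]] = 255

closed = erode3x3(dilate3x3(edge))
print(closed[3])  # row 3 after dilation + erosion
```

The dilation bridges the gap, and the erosion then shrinks the thickened edge back to its original width while leaving the bridge in place.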

### HSV color thresholding

Another approach that leads to a similar result is threshold-based color filtering in the HSV (hue, saturation, value) color space. In this case, we do not detect edges but entire regions whose HSV values lie in one or more ranges, which yields a filled mask marking the position of the objects. Red sits at both ends of OpenCV's hue axis (0 to 179 for 8-bit images), so we combine two ranges.

In [None]:
blurred = cv2.GaussianBlur(image, (3, 3), 0)
# convert to hsv color space
hsv = cv2.cvtColor(blurred, cv2.COLOR_BGR2HSV)

# color thresholding
# lower boundary red color range values: Hue (0 - 15)
lower1 = np.array([0, 50, 20])
upper1 = np.array([15, 255, 255])
# using inRange function to get only red colors
lower_mask = cv2.inRange(hsv, lower1, upper1) 
# upper boundary red color range values: Hue (170 - 180)
lower2 = np.array([170, 50, 20])
upper2 = np.array([180, 255, 255])
upper_mask = cv2.inRange(hsv, lower2, upper2)

# merge the masks
mask = lower_mask | upper_mask
# remove noise
mask = cv2.erode(mask, None, iterations=2)
mask = cv2.dilate(mask, None, iterations=2)

plt.imshow(mask, cmap="gray")
plt.show()
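
Red needs two hue ranges because it sits at both ends of the hue axis. The sketch below uses Python's standard `colorsys` module and rescales its output to the 8-bit convention OpenCV uses (hue in 0 to 179, saturation and value in 0 to 255). The helper `rgb_to_opencv_hsv` and the sample colors are illustrative assumptions, and the rounding may differ slightly from `cv2.cvtColor`.

```python
import colorsys

def rgb_to_opencv_hsv(r, g, b):
    """Approximate OpenCV-style 8-bit HSV for an 8-bit RGB color."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    # colorsys returns all three components in [0, 1];
    # OpenCV stores hue as degrees / 2 for 8-bit images
    return round(h * 180) % 180, round(s * 255), round(v * 255)

pure_red = rgb_to_opencv_hsv(255, 0, 0)
bluish_red = rgb_to_opencv_hsv(255, 0, 40)
orange = rgb_to_opencv_hsv(255, 140, 0)

print(pure_red, bluish_red, orange)
```

Pure red lands at hue 0, while a red with a little blue in it lands near the top of the scale, so a single `inRange` interval cannot capture both.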

## Size detection with contours

In both cases, we can now find the contours in the edge or color mask, which outline the objects and therefore determine their size in pixels. In particular, we keep the largest contour, which should correspond to the only fruit in the image.

We also initialize a `pixels_per_metric` variable, which stores how many pixels correspond to one unit of measurement, say a centimeter. This variable should be set by taking an object of known size and measuring its size in pixels in the camera footage. For a more robust system, you could also include a reference object in each frame, such as an ArUco marker, and use it to recalibrate the camera from time to time.

In [None]:
# find contours in the edge map
cnts = cv2.findContours(edged.copy(), cv2.RETR_EXTERNAL,
                        cv2.CHAIN_APPROX_SIMPLE)

# or in the HSV color mask
# cnts = cv2.findContours(mask.copy(), cv2.RETR_EXTERNAL,
#     cv2.CHAIN_APPROX_SIMPLE)

cnts = imutils.grab_contours(cnts)

# sort the contours, keep the largest, and set 'pixels_per_metric' calibration variable
cnts = sorted(cnts, key=cv2.contourArea, reverse=True)
c = cnts[0]
pixels_per_metric = 38 # pixels per cm
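
As a rough sketch of how `pixels_per_metric` could be calibrated, suppose a reference object of known width is measured in the footage. The function name and the example numbers (a 2.5 cm wide object spanning 95 pixels) are hypothetical and chosen only to reproduce the value used above.

```python
def calibrate_pixels_per_metric(reference_width_px, reference_width_cm):
    """Return how many pixels correspond to one centimeter."""
    if reference_width_cm <= 0:
        raise ValueError("reference width must be positive")
    return reference_width_px / reference_width_cm

# hypothetical measurement: a 2.5 cm wide reference object spans 95 pixels
pixels_per_metric = calibrate_pixels_per_metric(95, 2.5)
print(pixels_per_metric)  # pixels per cm

# any other length measured in pixels can then be converted to cm
length_cm = 190 / pixels_per_metric
print(f"{length_cm:.1f}cm")
```

The conversion only holds for objects at roughly the same distance from the camera as the reference object.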

As a final step, we can compute the minimum-area (rotated) bounding box of the contour and read off its dimensions. This gives an idea of the size of the object and is particularly accurate if the object is rectangular.

In [None]:
def midpoint(ptA, ptB):
    return ((ptA[0] + ptB[0]) * 0.5, (ptA[1] + ptB[1]) * 0.5)

# compute the rotated bounding box of the contour
orig = image.copy()
box = cv2.minAreaRect(c)
box = cv2.cv.BoxPoints(box) if imutils.is_cv2() else cv2.boxPoints(box)
box = np.array(box, dtype="int")

# order the points in the contour such that they appear
# in top-left, top-right, bottom-right, and bottom-left
# order, then draw the outline of the rotated bounding box
box = perspective.order_points(box)
cv2.drawContours(orig, [box.astype("int")], -1, (0, 255, 0), 2)

# loop over the original points and draw them
for (x, y) in box:
    cv2.circle(orig, (int(x), int(y)), 5, (0, 0, 255), -1)

# unpack the ordered bounding box, then compute the midpoint
# between the top-left and top-right coordinates, followed by
# the midpoint between bottom-left and bottom-right coordinates
(tl, tr, br, bl) = box
(tltrX, tltrY) = midpoint(tl, tr)
(blbrX, blbrY) = midpoint(bl, br)

# compute the midpoint between the top-left and bottom-left points,
# followed by the midpoint between the top-right and bottom-right
(tlblX, tlblY) = midpoint(tl, bl)
(trbrX, trbrY) = midpoint(tr, br)

# draw the midpoints on the image
cv2.circle(orig, (int(tltrX), int(tltrY)), 5, (255, 0, 0), -1)
cv2.circle(orig, (int(blbrX), int(blbrY)), 5, (255, 0, 0), -1)
cv2.circle(orig, (int(tlblX), int(tlblY)), 5, (255, 0, 0), -1)
cv2.circle(orig, (int(trbrX), int(trbrY)), 5, (255, 0, 0), -1)

# draw lines between the midpoints
cv2.line(orig, (int(tltrX), int(tltrY)), (int(blbrX), int(blbrY)),
    (255, 0, 255), 2)
cv2.line(orig, (int(tlblX), int(tlblY)), (int(trbrX), int(trbrY)),
    (255, 0, 255), 2)

# compute the Euclidean distance between the midpoints
dA = dist.euclidean((tltrX, tltrY), (blbrX, blbrY))
dB = dist.euclidean((tlblX, tlblY), (trbrX, trbrY))

# compute the size of the object
dimA = dA / pixels_per_metric
dimB = dB / pixels_per_metric
print(f"Size of the bounding rectangle: {dimA:.1f}cm x {dimB:.1f}cm \n")

# draw the object sizes on the image
cv2.putText(orig, "{:.1f}cm".format(dimA),
    (int(tltrX - 15), int(tltrY - 10)), cv2.FONT_HERSHEY_SIMPLEX,
    0.65, (0, 0, 0), 2)
cv2.putText(orig, "{:.1f}cm".format(dimB),
    (int(trbrX + 5), int(trbrY)), cv2.FONT_HERSHEY_SIMPLEX,
    0.65, (0, 0, 0), 2)

# show the output image
plt.imshow(cv2.cvtColor(orig, cv2.COLOR_BGR2RGB))
plt.show()
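
For a rectangle, the distance between the midpoints of two opposite sides equals the length of the other pair of sides, which is why the midpoint distances above give the box dimensions. The following standalone check uses made-up corner coordinates and a hypothetical scale of 10 pixels per cm.

```python
from math import dist  # available in Python 3.8+

def midpoint(ptA, ptB):
    return ((ptA[0] + ptB[0]) * 0.5, (ptA[1] + ptB[1]) * 0.5)

# hypothetical rotated rectangle, ordered top-left, top-right,
# bottom-right, bottom-left (image coordinates, y pointing down)
tl, tr, br, bl = (3, 0), (11, 6), (8, 10), (0, 4)

# distances between the midpoints of opposite sides
dA = dist(midpoint(tl, tr), midpoint(bl, br))
dB = dist(midpoint(tl, bl), midpoint(tr, br))

# the same lengths measured directly along the sides
side_a = dist(tl, bl)
side_b = dist(tl, tr)

pixels_per_cm = 10  # assumed calibration, for illustration only
print(f"{dA / pixels_per_cm:.1f}cm x {dB / pixels_per_cm:.1f}cm")
```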

If the object is roughly circular in the image, as a spherical fruit is, we can use the minimum enclosing circle instead and take its diameter as a better measure of size.

In [None]:
# compute the minimum enclosing circle of the contour
orig = image.copy()
(x, y), radius = cv2.minEnclosingCircle(c)

# draw the circle
cv2.circle(orig, (int(x), int(y)), int(radius), (0, 255, 0), 2)

# draw a diameter and end points
cv2.line(orig, (int(x - radius), int(y)), (int(x + radius), int(y)),
    (255, 0, 255), 2)
cv2.circle(orig, (int(x - radius), int(y)), 5, (255, 0, 0), -1)
cv2.circle(orig, (int(x + radius), int(y)), 5, (255, 0, 0), -1)

# draw the center
cv2.circle(orig, (int(x), int(y)), 5, (0, 0, 255), -1)

# compute the size of the object
dimR = radius / pixels_per_metric
print(f"Diameter of the object: {2 * dimR:.1f}cm")

# draw the object sizes on the image
cv2.putText(orig, "{:.1f}cm".format(2 * dimR),
    (int(x - 15), int(y - 10)), cv2.FONT_HERSHEY_SIMPLEX,
    0.65, (0, 0, 0), 2)

# show the output image
plt.imshow(cv2.cvtColor(orig, cv2.COLOR_BGR2RGB))
plt.show()

Great, we now have a way to measure the size of an object using OpenCV! If all the images were taken by the same camera in the same location, or if there was a reference object in each image to compare with, we could easily estimate the sizes of a series of objects using OpenCV and compare them. Our fruit dataset doesn't have this property, but we will still assume all the images of the oranges were taken from the same camera and classify the diameter of fruits, in particular oranges, into three different categories: `small`, `medium`, and `large`.

## Get object size statistics

Let's start by loading a module that takes an image as input and returns the size of an object as output. In particular, we are interested in detecting the size of oranges and automating the retrieval of dataset statistics. We will use the HSV mask to detect the color orange and the minimum enclosing circle strategy since we expect relatively spherical fruits. Next, we will apply this function to some images of oranges and get a size distribution for the fresh oranges in our dataset.

In [None]:
import sys
sys.path.append("../source_code/N5")
from calc_object_size import calc_object_size
import os

image_dir = "../source_code/N5/oranges"
output_dir = "../source_code/N5/output"
valid_image_ext = ['.jpg', '.png', '.jpeg']

!rm -rf $output_dir
!mkdir $output_dir

sizes = []
for image in os.listdir(image_dir):
    if os.path.splitext(image)[1].lower() in valid_image_ext:
        img_path = os.path.join(image_dir, image)
        output_path = os.path.join(output_dir, image)
        sizes.append(calc_object_size(img_path, output_path))

In [None]:
# Simple grid visualizer
from math import ceil
      
def visualize_images(image_dir, num_cols=4, num_images=10):
    num_rows = int(ceil(float(num_images) / float(num_cols)))
    f, axarr = plt.subplots(num_rows, num_cols, figsize=[80,30])
    f.tight_layout()
    a = [os.path.join(image_dir, image) for image in os.listdir(image_dir) 
         if os.path.splitext(image)[1].lower() in valid_image_ext]
    for idx, img_path in enumerate(a[:num_images]):
        col_id = idx % num_cols
        row_id = idx // num_cols
        img = plt.imread(img_path)
        axarr[row_id, col_id].imshow(img)
        
# Visualizing the sample images
OUTPUT_PATH = os.path.join("../source_code/N5", "output")
COLS = 3 # number of columns in the visualizer grid
IMAGES = 9 # number of images to visualize

visualize_images(OUTPUT_PATH, num_cols=COLS, num_images=IMAGES)

Now let's draw a histogram to see how the sizes are distributed in our dataset and highlight the two thresholds that divide it into three categories.

In [None]:
plt.figure(figsize=[8,5])
plt.hist(sizes, bins=len(sizes)//2, color='orange', edgecolor='k', alpha=0.6)
plt.axvline(np.quantile(sizes, 1/3), color='g', linestyle='dashed', lw=3)
plt.axvline(np.quantile(sizes, 2/3), color='g', linestyle='dashed', lw=3)
plt.title("Histogram of the size of oranges: tertiles in green")
plt.xlabel("cm")
plt.show()

Now that we have collected this data, we can classify oranges into three categories - `small`, `medium`, and `large` - continuously on a live video stream. Sorting follows this rule: if the size of an orange is below the first tertile threshold of the size distribution, the orange is small; if it exceeds the second threshold, it is large; otherwise, it is medium.

Please note that these thresholds could just as well be predefined values, and the previous part could be skipped if data is not yet available or the sizes are set by other standards.
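
The sorting rule can be written as a small function. This is a minimal sketch with made-up diameters; it uses the standard-library `statistics.quantiles` with `method="inclusive"`, which interpolates linearly between data points and should agree with the `np.quantile` calls used in this notebook. The function name `classify_size` and the variable names are assumptions for illustration.

```python
from statistics import quantiles

def classify_size(diameter_cm, q1, q2):
    """Map a diameter to a category using the two tertile thresholds."""
    if diameter_cm < q1:
        return "small"
    if diameter_cm > q2:
        return "large"
    return "medium"

# hypothetical diameters in cm, standing in for the measured `sizes`
example_sizes = [6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0]
low_thr, high_thr = quantiles(example_sizes, n=3, method="inclusive")

for d in (6.2, 7.5, 8.8):
    print(f"{d:.1f}cm -> {classify_size(d, low_thr, high_thr)}")
```

Predefined thresholds can be passed to the same function in place of the computed tertiles.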

In [None]:
input_video = "../source_code/N5/oranges.mp4" # Source: https://depositphotos.com
output_video = "../source_code/N5/out.avi"

# thresholds
q1 = np.quantile(sizes, 1/3)
q2 = np.quantile(sizes, 2/3)

print(f"Using thresholds {q1:.1f}cm and {q2:.1f}cm ...")
pixels_per_metric = 28


# load a video stream from a file
cap = cv2.VideoCapture(input_video)

# Check if video opened successfully
if not cap.isOpened():
    print("Error opening video file")
    sys.exit(1)

# get the video size
frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
size = (frame_width, frame_height)
fps = 20
   
# VideoWriter object will save a processed frame of the 
# above video in the output video file
out = cv2.VideoWriter(output_video, 
    cv2.VideoWriter_fourcc(*'XVID'), fps, size)
  
while cap.isOpened():
    ret, frame = cap.read() # ret checks return at each frame
    
    if ret:
        blurred = cv2.GaussianBlur(frame, (3, 3), 0)
        # convert to hsv color space
        hsv = cv2.cvtColor(blurred, cv2.COLOR_BGR2HSV)

        # color thresholding
        # orange color range values: Hue (5 - 25)
        lower = np.array([5, 140, 190])
        upper = np.array([25, 255, 255])
        # using inRange function to get only orange colors
        mask = cv2.inRange(hsv, lower, upper)
        # remove noise
        mask = cv2.erode(mask, None, iterations=2)
        mask = cv2.dilate(mask, None, iterations=2)

        # find contours in the color mask
        cnts = cv2.findContours(mask.copy(), cv2.RETR_EXTERNAL,
            cv2.CHAIN_APPROX_SIMPLE)
        cnts = imutils.grab_contours(cnts)
        
        orig = frame.copy()
        # loop over the contours individually
        for c in cnts:
            # if the contour is not sufficiently large, ignore it
            if cv2.contourArea(c) < 10000:
                continue

            # compute the minimum enclosing circle of the contour
            (x, y), radius = cv2.minEnclosingCircle(c)
            
            # compute the size of the object
            dimR = radius / pixels_per_metric
            
            color = (0, 255, 0) # medium -> green
            category = "medium"
            if 2 * dimR < q1: 
                color = (255, 255, 0) # small -> cyan
                category = "small"
            if 2 * dimR > q2:
                color = (0, 0, 255) # large -> red
                category = "large"
                
            # draw the circle
            cv2.circle(orig, (int(x), int(y)), int(radius), color, 2)

            # draw a diameter and end points
            cv2.line(orig, (int(x - radius), int(y)), (int(x + radius), int(y)),
                (255, 0, 255), 2)
            cv2.circle(orig, (int(x - radius), int(y)), 5, (255, 0, 0), -1)
            cv2.circle(orig, (int(x + radius), int(y)), 5, (255, 0, 0), -1)

            # draw the center
            cv2.circle(orig, (int(x), int(y)), 5, (0, 0, 255), -1)

            # draw the object sizes on the image
            cv2.putText(orig, "{:.1f}cm".format(2 * dimR),
                (int(x - 15), int(y - 10)), cv2.FONT_HERSHEY_SIMPLEX,
                0.65, (0, 0, 0), 2)
            
            cv2.putText(orig, f"{category}",
                (int(x - 15), int(y + 20)), cv2.FONT_HERSHEY_SIMPLEX,
                0.65, (0, 0, 0), 2)
 
        # output the frame
        out.write(orig)
        
    else:
        break
    
# release input and output
cap.release()
out.release()
cv2.destroyAllWindows()
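
The area filter in the loop above can be read in physical units. Assuming a contour that is roughly a filled circle, the short calculation below converts the 10000-pixel area threshold into an approximate minimum diameter at the calibration of 28 pixels per cm used in this cell. Treating the contour as a perfect circle is a simplifying assumption.

```python
import math

min_area_px = 10000      # area threshold used in the loop (pixels^2)
pixels_per_metric = 28   # pixels per cm, as set in the video cell

# area of a circle: A = pi * (d / 2)^2  ->  d = 2 * sqrt(A / pi)
min_diameter_px = 2 * math.sqrt(min_area_px / math.pi)
min_diameter_cm = min_diameter_px / pixels_per_metric

print(f"{min_diameter_px:.1f}px ~ {min_diameter_cm:.1f}cm")
```

Under that assumption, blobs smaller than roughly 4 cm across are ignored, which filters out small orange-colored specks.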

In [None]:
# Convert video profile to be compatible with Jupyter notebook
!ffmpeg -loglevel panic -y -an -i ../source_code/N5/out.avi -vcodec libx264 -pix_fmt yuv420p -profile:v baseline -level 3 ../source_code/N5/output.mp4

In [None]:
# Display the output
from IPython.display import HTML
HTML("""
 <video width="640" height="480" controls>
 <source src="../source_code/N5/output.mp4" type="video/mp4">
 </video>
""")

Note that we performed this process in a separate notebook, using only OpenCV functions, for educational purposes. In a real computer vision application, this type of OpenCV post-processing may need to be integrated into a more complex DeepStream or Triton Inference Server pipeline, like the ones seen in the previous two notebooks. For DeepStream, you can find a reference application that includes OpenCV in the [DeepStream Python Apps](https://github.com/NVIDIA-AI-IOT/deepstream_python_apps) GitHub repository. For Triton, an OpenCV routine can be added to the function that processes the result from the server and used to get additional insights from the images.

Congratulations, you have successfully completed the **end-to-end computer vision** bootcamp! With this material, you now have a broader idea of what it takes to bring computer vision applications to life, starting from nothing more than unlabeled data. You've also worked with multiple NVIDIA SDKs and followed a development flow that generalizes to many other use cases. Thank you for participating!

***

## References

- [1] *https://pyimagesearch.com/2016/03/28/measuring-size-of-objects-in-an-image-with-opencv*
- [2] *https://learnopencv.com/read-write-and-display-a-video-using-opencv-cpp-python*

## Licensing

This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0).
