# OpenCV Object Detection Tutorial in Python

In this tutorial, we introduce OpenCV, an open-source computer vision library. You will use it with Python and NumPy in Google Colab to isolate objects of a chosen RGB color, within an adjustable threshold, in a video. Along the way, you will learn how to process video input, create a bitmask, and find contours. We will also show you how to build a color-picking widget using Matplotlib and ipywidgets.

## Setup

You'll need to download this example video and upload it to the Files section of Google Colab (in the left-hand sidebar) yourself:
[Example Video](https://drive.google.com/file/d/1pa0MORL5tW-71HTS5Z_cuYTWgX2CMgfC/view?usp=sharing)

In [None]:
# Import all of our libraries
import matplotlib.pyplot as plt
from ipywidgets import interact, widgets
import cv2
import numpy as np
from google.colab import files

# Path to the example video (upload it to Colab first)
VIDEO_PATH = "Ball Bounce 3d.mp4"

# Define a minimum area for our contours. This makes sure we aren't detecting noise
# or really small things as real objects.
CONTOUR_AREA_MINIMUM = 300

## How to Create a Basic Color Picker

The ultimate goal of this tutorial is to track objects of a specific color in a video. To select an appropriate color to track, we assume that the object we want to track appears in the first frame of the input video and that its color does not change significantly throughout the video.

The first step in this process is to create an interface that collects an input color from the user. This will be the color of the object that we want to track in the video. We also want to display the chosen color so the user can adjust it as needed.

### select_color function

This function will help us **visualize a chosen RGB** color combination, using a Python library called Matplotlib. This library provides a flexible platform for generating various types of graphs, charts, and figures to visually represent data. Here, we will use it to simply display a square the same color as our chosen color.

select_color:
1. Takes three parameters: red, green, and blue, representing RGB values ranging from 0 to 255.
2. Converts these RGB values into a format that Matplotlib can interpret as a color.
3. Sets up a small plot figure (2x2) using Matplotlib.
4. Generates a small matrix representing the specified color.
5. Displays the color using Matplotlib's `imshow` function, showing the single-pixel color matrix.
6. Removes axis markings with `plt.axis('off')` for a cleaner display.
7. Sets the plot title to showcase the RGB values passed as parameters using `plt.title`.
8. Finally, it shows the plot with `plt.show()`.

Effectively, when you call this select_color function with specific RGB values, it creates a small square display presenting that exact color. This code is helpful for quickly visualizing a specific RGB combination, providing an instant representation of a chosen color.

In [None]:
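def select_color(red, green, blue, threshold=None):
    # threshold is accepted (and ignored for now) because interactive_sliders
    # passes all four slider values to this function; it will be used later.
    color = (red / 255, green / 255, blue / 255)
    plt.figure(figsize=(2, 2))
    # this creates a 1x1 matrix of our color that matplotlib can interpret and display
    plt.imshow([[color]])
    plt.axis('off')
    plt.title(f"RGB: ({red}, {green}, {blue})")
    plt.show()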

### interactive_sliders

This function collects user input using the ipywidgets library. It **creates an interface using sliders** and lets the user adjust them to select the color they want to use as input.

interactive_sliders function:
1. Defines a function that creates sliders for adjusting RGB values interactively.
2. Sets up four sliders using ipywidgets.
3. red_slider, green_slider, and blue_slider are created with default values of 128 and a range from 0 to 255, controlling the respective RGB components.
4. There's also a threshold_slider with the same default and range, but this will be used later.
5. Uses the `interact` function from ipywidgets to link the `select_color` function with these sliders.
6. The `interact` function calls `select_color` with the current slider values whenever any of them changes. For this tutorial, this means our square is redisplayed with the updated color every time a slider moves.

When you execute interactive_sliders(), it generates a graphical interface displaying sliders for adjusting the Red, Green, and Blue values. As you move these sliders, the `select_color` function is rerun with the new RGB values and immediately displays the resulting color in a small square plot below the sliders. This provides a real-time visual representation of the selected color as the sliders are adjusted.


In [None]:
def interactive_sliders():
    # Create sliders for RGB values
    red_slider = widgets.IntSlider(value=128, min=0, max=255, description='Red')
    green_slider = widgets.IntSlider(value=128, min=0, max=255, description='Green')
    blue_slider = widgets.IntSlider(value=128, min=0, max=255, description='Blue')
    threshold_slider = widgets.IntSlider(value=128, min=0, max=255, description='Threshold') # this will be used later

    # Define the interaction function
    interact(select_color, red=red_slider, green=green_slider, blue=blue_slider, threshold=threshold_slider)

interactive_sliders()


## Basic Color Picker with bitmask of the first video frame

For this part of the tutorial, we will be using a Computer Vision technique called "bit masking".

### Bitmasking

Bitmasking in computer vision involves using bitwise operations to manipulate individual bits in an image to achieve specific goals, like **isolating pixels of a particular color**.

Our goal is to isolate objects in video frames that have a certain color. A good first step is to figure out where in our input video this color exists.

We can look at every pixel in the image and check whether it matches our selected color or falls within a set threshold (distance) of it. If it does, we include it by setting that pixel to 1 in the bitmask. If it doesn't, we set it to 0, effectively blacking it out.

Here's how you might use bitmasking to isolate pixels of a certain color:

1. Color Representation:
In an image, colors are represented as combinations of red (R), green (G), and blue (B) values, often in an 8-bit format where each color channel ranges from 0 to 255.

2. Define the Target Color:
Choose the color you want to isolate by specifying its RGB values.

3. Create a Mask:
Generate a mask by comparing the RGB values of each pixel in the image with the target color and given threshold.
Only pixels which fall within the target color's threshold are set to a 1 in the bitmask, and all other colored pixels are set to a 0. Setting a pixel to a 0 effectively blacks it out from being seen, and we can ignore it.

4. Apply the Mask:
Use the generated mask to selectively extract or manipulate pixels in the original image.

By applying the mask to the original image using a bitwise AND operation, you can isolate the pixels that match the target color while setting others to zero, allowing us to ignore them.
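The four steps above can be sketched with plain NumPy before bringing OpenCV in. This is a minimal illustration rather than the tutorial's actual implementation: the 2x2 image, the `target` color, and the `threshold` value are all made up for the example, and the colors here are in RGB order.

```python
import numpy as np

# A tiny 2x2 RGB "image" with illustrative values:
# bright red, green, dark red, and blue.
image = np.array([[[200, 30, 30], [30, 200, 30]],
                  [[120, 10, 10], [30, 30, 200]]], dtype=np.uint8)

target = np.array([180, 20, 20])  # hypothetical target color (R, G, B)
threshold = 40                    # hypothetical per-channel tolerance

# Per-channel distance from the target; cast to int to avoid uint8 wrap-around.
distance = np.abs(image.astype(int) - target)

# A pixel is kept only if every one of its channels is within the threshold.
mask = np.all(distance <= threshold, axis=-1)

# Apply the mask: keep matching pixels and set everything else to black.
isolated = np.where(mask[..., None], image, 0).astype(np.uint8)

print(mask)
print(isolated)
```

Only the bright red pixel survives here; the dark red pixel is more than 40 away from the target on the red channel, so it is blacked out along with the green and blue pixels.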

The cells below collect an input color using the functions we defined above, display the first frame of the input video, and then apply the bitmask created from that color to the frame.

All the pixels that fall within the target color's threshold show through the bitmask, while the other pixels, which are set to 0, are blacked out.

With bitmasking in hand, we can start processing our input video. To begin, we capture just the first frame and, assuming the target object is present in it, select a color that isolates the object.

In [None]:
# Collects the first frame of a video
# Open the video file
video_capture = cv2.VideoCapture(VIDEO_PATH)

# Check if the video file is opened successfully
if not video_capture.isOpened():
    # Raise an error instead of calling exit(), which can stop the notebook runtime
    raise RuntimeError("Error: Could not open video file")

# Read the first frame
ret, first_frame = video_capture.read()

# Release the video capture object
video_capture.release()

if not ret:
    raise RuntimeError("Error: Could not read the first frame")


Now that we have collected the first frame of the input video, we will construct our first bitmask.

We will use the two functions we defined previously, `select_color` and `interactive_sliders` to do this (but we will slightly modify `select_color`).

To avoid code duplication, we make a function `display_figure` to display a figure with Matplotlib.

In [None]:
# We define some global variables
selected_color = [128, 128, 128] # the selected color of the object to track, as [red, green, blue]
threshold_global = 128 # we set a default threshold
# create_mask halves this value, so each of a pixel's red, green, and blue values
# may differ from the matching selected_color value by at most threshold_global // 2 (64 here).


def display_figure(figsize, image, title):
    # displays an image with specified figure size and title
    plt.figure(figsize=figsize)
    plt.imshow(image)
    plt.axis('off')
    plt.title(title)
    plt.show()

Our `create_mask` function creates a bitmask from the color and threshold we choose with `select_color` and `interactive_sliders`. It reads the RGB values and threshold stored by `select_color` and uses them to compute a lower bound color and an upper bound color. Any pixel whose color values all lie between the two bounds is included in the bitmask and will not be blacked out.

The lower bound, the upper bound, and the frame are passed to `cv2.inRange()`, which returns a bitmask in which every pixel inside the range is set to 255 and every pixel outside it is set to 0.

Here we also modify `select_color` to display not only a square with our selected color, but a copy of the first frame collected from the input video, and that first frame with a bitmask using the selected color and threshold values applied.

For `Ball Bounce 3d.mp4`, our target object to track is the red ball. A good color selection that targets the red ball should display a bitmask with all the pixels blacked out except for the red ball at the top of the screen.

The lower the threshold, the fewer pixels will be included in the mask. The higher the threshold, the more pixels will be included. Play around with these values and see what happens as the bitmask is updated in real time.

In [None]:
## For the example video of Ball Bounce 3d.mp4 try using Red = 128, Green = 50, Blue = 50

## Same as the color picker above this, but this will grab the first frame of the input
# video and create a bitmask from the selected color, then apply this bitmask to the first frame.

def create_mask(frame):
    global selected_color
    global threshold_global
    # Create a binary mask based on the selected color
    # Take half of threshold, subtract it from the input color to get
    # a lower bound color, add it to the input color to get an upper bound color

    # We define the bounds in BGR fashion, or BLUE, GREEN, RED.
    # Since selected_color = [red, green, blue], we put blue first.
    threshold = threshold_global // 2
    red, green, blue = selected_color
    # max(0, ...) and min(255, ...) keep the bounds inside the valid 0-255 range
    lower_bound = np.array([max(0, blue - threshold), max(0, green - threshold), max(0, red - threshold)], dtype=np.uint8)
    upper_bound = np.array([min(255, blue + threshold), min(255, green + threshold), min(255, red + threshold)], dtype=np.uint8)
    mask = cv2.inRange(frame, lower_bound, upper_bound)
    return mask


def select_color(red, green, blue, threshold):
    # Using the global keyword in here lets us change the variable "selected_color"
    # that we define outside of this function, instead of creating another local variable.
    # Because we use this function inside of an interact function, it can't return anything.
    global selected_color
    global threshold_global
    threshold_global = threshold
    selected_color = [red, green, blue]

    # Display the selected color using matplotlib
    image = [[(red / 255, green / 255, blue / 255)]]
    display_figure((2,2),image,f"RGB: ({red}, {green}, {blue})")

    mask = create_mask(first_frame)
    result = cv2.bitwise_and(first_frame, first_frame, mask=mask)

    display_figure((6,6),cv2.cvtColor(first_frame, cv2.COLOR_BGR2RGB),"Original Image")

    # Display the result
    display_figure((6,6),cv2.cvtColor(result, cv2.COLOR_BGR2RGB),"Image with Selected Color Mask")

# Show sliders for RGB and threshold values
interactive_sliders()


## Color bitmask and contouring of video with bounding boxes

Now we will actually process our input video.

A video is just a sequence of images played in rapid succession. We can process an entire video by reading each of these frames and processing it separately. OpenCV provides a really convenient way to do this using a VideoCapture object.

This object lets us query information about the video and read its frames one at a time. Each call to its `read` function returns the next frame along with a flag that tells us whether a frame was actually read, so we can step through the whole video in a while loop. We call our VideoCapture object holding our input video `cap`.

We use the `cap.get` function to access information about the video, including its width, height, and frames per second. We create a VideoWriter object (our output video) with these same parameters.
We also specify a video encoding using `cv2.VideoWriter_fourcc('M','J','P','G')`. We found that this particular encoding works well for downloads in Google Colab, but for other projects the codec should be chosen more carefully. [Here is a beginner-friendly link to learn more about creating videos with OpenCV, including codecs](https://docs.opencv.org/3.4/dd/d43/tutorial_py_video_display.html).

We iterate over the video frame by frame using the while loop. For each frame, we create a mask with the `create_mask` function we defined earlier. Then we call `cv2.findContours` on this mask.

Simply put, `findContours` in OpenCV finds and identifies the contours (i.e., outlines or boundaries) of objects in an image. Contours are continuous curves that represent the boundaries of objects or shapes present in an image. It works best on binary images, like our bitmask ([A link to learn more!](https://docs.opencv.org/3.4/d4/d73/tutorial_py_contours_begin.html)). After execution, `cv2.findContours` returns the contours it found together with their hierarchy, which we ignore here. Each contour is a NumPy array of (x, y) coordinates that outline a specific object in the image. If you're using the example input video, this should be the location, size, and shape of the red balls.
We iterate over the contours returned by `cv2.findContours`. Each contour encloses a certain area, which we measure with `cv2.contourArea`. We set a minimum contour area of 300 pixels in the Setup cell (`CONTOUR_AREA_MINIMUM`), since this works well with our example video. This helps us avoid detecting video noise as objects, since we know the objects that we want to detect are generally larger than 300 pixels. This constant works for this tutorial, but for other projects it should be chosen carefully.

`cv2.boundingRect` is another OpenCV function that works on a single contour, which it takes as its parameter. It finds the upright bounding rectangle that completely encloses the contour. It returns the rectangle as an x and y coordinate (the top-left corner, since image coordinates start at the top left and y grows downward) plus a width and a height, which we can use to find the other three corners. We only need two opposite corners, the top left `(x, y)` and the bottom right `(x + w, y + h)`, to draw a rectangle in OpenCV.

Next, we call `cv2.rectangle`. This function draws a rectangle on a given video frame. We pass in our frame along with the corner coordinates calculated from `boundingRect` to draw the rectangle around the detected object. We draw the rectangle in blue, which is `(255, 0, 0)` because OpenCV orders color channels as blue, green, red.

Then we add a label to the bounding box around the detected object, labeling it "Selected Color". We write the text in the same blue, `(255, 0, 0)`, using the OpenCV function `cv2.putText()`.

Lastly, we add the altered frame to the VideoWriter object we defined earlier. The check for the 'q' key is mainly useful when a webcam is the video input, but it also helps with troubleshooting if something goes wrong with the loop and it does not halt.

And finally, once we exit the while loop, we release both the VideoCapture object and the output VideoWriter. We no longer need to read from or write to either of them. This is just like closing a file once we are done editing it.

In [None]:
## Wraps a box around things with the selected color from before.
## Possible areas of expansion : Add another selected color.

# To collect video from the camera, use one of the following settings for cap instead.
# NOTE: You cannot do this in Google Colab.
# cap = cv2.VideoCapture(0)  # default camera
# cap = cv2.VideoCapture(0, apiPreference=cv2.CAP_AVFOUNDATION)  # macOS camera backend

# To collect video from a filepath:
cap = cv2.VideoCapture(VIDEO_PATH)
if not cap.isOpened():    # Check if video capture is successful
    raise RuntimeError("Error: Video capture not opened.")

# Get the height and width of the inputted video.
frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

# Keep the frame rate as a float so fractional rates such as 29.97 are not truncated
frame_fps = cap.get(cv2.CAP_PROP_FPS)

# This creates an output video
# Define the codec and create VideoWriter object. The output is stored in 'outpy.avi' file.
# You will be able to find the video in the file tab in google colab
out = cv2.VideoWriter('outpy.avi',cv2.VideoWriter_fourcc('M','J','P','G'), frame_fps, (frame_width,frame_height))

# Check if the VideoWriter is initialized successfully
if not out.isOpened():
    raise RuntimeError("Error: Could not initialize VideoWriter.")

# Process frames until the video ends or we fail to retrieve a frame for some other reason
while True:
    # Read a frame from the video capture
    ret, frame = cap.read()
    if not ret:
        print("End of video")
        break

    mask = create_mask(frame)

    # Find the contours in the mask (the second return value, the hierarchy, is not needed)
    contours, _ = cv2.findContours(mask, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

    # For every contour that OpenCV finds, draw a rectangle around it.
    for contour in contours:
        # Get the area inside the contour
        area = cv2.contourArea(contour)
        # If the area is larger than the minimum we specify
        if area > CONTOUR_AREA_MINIMUM:
            # Find the top-left x and y coordinates, width, and height of a rectangle around it
            x, y, w, h = cv2.boundingRect(contour)
            # Draw the rectangle onto the frame in blue (BGR order)
            frame = cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)
            # Put text at the top of the box showing that this object is our selected color
            cv2.putText(frame, "Selected Color", (x, y),
                        cv2.FONT_HERSHEY_SIMPLEX,
                        1.0, (255, 0, 0))

    # Put the frame we altered to have the bounding box and text into a new video
    out.write(frame)

    # Break the loop when the 'q' key is pressed
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release the video capture and the video writer
cap.release()
out.release()

print("Finished Processing Video! Check out the video outpy.avi in the files tab")

End of video
Finished Processing Video! Check out the video outpy.avi in the files tab


Now you can watch your output video, called `outpy.avi`! It should look like [this video](https://drive.google.com/file/d/1G5ytsHIFncjXsmKZbigd43yUBuRTlyVX/view?usp=sharing) if done correctly. Each object with the color that you selected should be within a labeled bounding rectangle.

This object detection program is a great first step toward understanding how object detection and video processing work in OpenCV. It can also serve as a stepping stone to larger projects. What about creating a bitmask based on the size and shape of objects, rather than color (this would allow you to detect objects of any color)? Now that you can detect objects in the frame, can you track them?

Some parts of this code allow video input to be taken from your computer's webcam. As an additional exercise, you could uncomment the relevant sections, export the code to VS Code or another Python IDE, and try to get this form of video input working.