Name: Kirti Kishore

UID: 120148286

email: kiki1@umd.edu

ENPM673 Project 2 Part 1

Disclaimer: Upload the video to the Colab session before running this notebook, and change the video file name used in Block 2 ('proj2_v2.mp4') to match the name of the file in your environment.



Block 1: Importing Libraries

In [None]:
import cv2
import os
import numpy as np


Explanation: This block imports OpenCV (cv2) for video I/O and image processing, os for creating the output directory, and NumPy for array manipulation.

Block 2: Video and Output Setup

In [None]:
video_path = 'proj2_v2.mp4'
output_directory = 'frames'
os.makedirs(output_directory, exist_ok=True)


Explanation: This block defines the path to the input video file and creates a directory to store extracted frames.

Block 3: Output Video Parameters

In [None]:
output_video_path = 'output_video.mp4'
fps_output = 5
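fourcc = cv2.VideoWriter_fourcc(*'mp4v')

Explanation: These parameters define the output video file path, the playback rate of the output in frames per second (fps), and the video codec. The 'mp4v' FourCC code is used because it matches the .mp4 container of the output file.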

Block 4: Video Capture and Frame Information

In [None]:
cap = cv2.VideoCapture(video_path)
fps_input = cap.get(cv2.CAP_PROP_FPS)
total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))


Explanation: This block opens the input video with a VideoCapture object and retrieves its frame rate (frames per second) and total number of frames.
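As a small illustration of how the frame count and frame rate relate to playback time, the sketch below computes a clip's duration. The helper name playback_seconds and the numbers are hypothetical and are not taken from the project video. Note that cap.get(cv2.CAP_PROP_FPS) can return 0 when a file fails to open, which is why the helper rejects non-positive rates.

```python
def playback_seconds(frame_count, fps):
    """Playback duration in seconds for frame_count frames shown at fps."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frame_count / fps

# Hypothetical numbers: a 300-frame clip recorded at 30 fps.
input_seconds = playback_seconds(300, 30.0)

# If every frame were kept and written at 5 fps, playback would be 6x longer.
output_seconds = playback_seconds(300, 5)

print(input_seconds, output_seconds)
```

Because the main loop only writes a subset of the frames, the real output duration depends on how many frames pass the checks in Block 8.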

Block 5: Image Processing Parameters

In [None]:
laplacian_threshold = 150
rho = 1
theta = np.pi / 180
threshold_hough = 100
min_line_length = 100
max_lines = 5
roi_x_min, roi_x_max, roi_y_min, roi_y_max = 100, 500, 100, 700


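Explanation: These parameters configure the processing pipeline. laplacian_threshold is the minimum Laplacian variance a frame must have to be treated as sharp. rho (1 pixel) and theta (1 degree, expressed in radians) are the distance and angle resolutions for a Hough transform, and threshold_hough, min_line_length, and max_lines are its vote threshold, minimum line length, and maximum number of lines. The roi_* values bound the region of interest (ROI) in pixels. The Hough parameters are defined here but are not used by the main loop in Block 8.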

Block 6: Laplacian Kernel and Floodfill Parameters

In [None]:
laplacian_kernel = np.array([[0, 1, 0],
                             [1, -4, 1],
                             [0, 1, 0]])
floodfill_seed_point = (roi_x_min + (roi_x_max - roi_x_min) // 2, roi_y_min + (roi_y_max - roi_y_min) // 2)
floodfill_lo_diff = 30
floodfill_up_diff = 30


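Explanation: This block defines the 3x3 four-neighbour Laplacian kernel used to measure frame sharpness, together with flood-fill parameters: a seed point at the centre of the ROI and the lower and upper intensity differences. The flood-fill parameters are not used by the main loop in Block 8.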
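To show why the variance of this kernel's response works as a sharpness measure, here is a minimal NumPy-only sketch. The helper name laplacian_variance and the synthetic test images are assumptions made for illustration. The notebook itself uses cv2.filter2D, which also handles the image border, whereas this sketch only evaluates interior pixels.

```python
import numpy as np

def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian response over interior pixels."""
    g = gray.astype(np.float64)
    # Same weights as laplacian_kernel: each neighbour +1, centre -4.
    response = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
                - 4.0 * g[1:-1, 1:-1])
    return float(response.var())

# Hypothetical test images: a sharp checkerboard and a flat grey image.
sharp = (np.indices((32, 32)).sum(axis=0) % 2) * 255.0
flat = np.full((32, 32), 128.0)

sharp_score = laplacian_variance(sharp)
flat_score = laplacian_variance(flat)
print(sharp_score > flat_score)
```

A flat image gives a zero response everywhere, so its variance is 0, while rapid intensity changes produce large positive and negative responses and a high variance. Blurring removes those rapid changes, which is why a low variance is treated as a sign of a blurry frame.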

Block 7: Video Writer Setup

In [None]:
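# Frame size is (width, height); named constants replace the magic numbers 3 and 4
frame_size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)), int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
out = cv2.VideoWriter(output_video_path, fourcc, fps_output, frame_size)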


Explanation: This block initializes a VideoWriter object with the specified output file, codec, output fps, and frame size.
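One detail that is easy to get wrong here is the order of the frame size. VideoWriter expects (width, height), while a frame read by OpenCV is a NumPy array whose shape is (rows, columns, channels), that is, height first. The sketch below derives the writer size from a frame array. The helper name writer_frame_size and the 640x480 frame are assumptions for illustration.

```python
import numpy as np

def writer_frame_size(frame):
    """Return (width, height) for a frame stored as a (rows, cols[, channels]) array."""
    height, width = frame.shape[:2]
    return (width, height)

# Hypothetical frame: 480 rows by 640 columns, 3 colour channels.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
size = writer_frame_size(frame)
print(size)  # (640, 480)
```

If the frames passed to the writer do not match the size it was created with, the output file commonly ends up empty or unplayable, so it is worth checking this when the output looks wrong.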

Block 8: Main Loop for Processing Frames

In [None]:
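def custom_connected_components(binary_image):
    """Label 4-connected foreground (non-zero) regions.

    Returns the label image and the next unused label (number of regions + 1).
    """
    height, width = binary_image.shape
    # int32 labels avoid the overflow a uint8 label image would hit after 255 regions
    labeled_image = np.zeros((height, width), dtype=np.int32)
    label_count = 1

    for y in range(height):
        for x in range(width):
            # Foreground pixels are 255 in corner_img, so test for non-zero rather than 1
            if binary_image[y, x] == 0 or labeled_image[y, x] != 0:
                continue
            # Iterative flood fill with an explicit stack; a recursive DFS can
            # exceed Python's recursion limit on large regions
            labeled_image[y, x] = label_count
            stack = [(x, y)]
            while stack:
                cx, cy = stack.pop()
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and binary_image[ny, nx] != 0 and labeled_image[ny, nx] == 0):
                        labeled_image[ny, nx] = label_count
                        stack.append((nx, ny))
            label_count += 1

    return labeled_image, label_count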


for frame_number in range(total_frames):
    # Read the frame
    ret, frame = cap.read()

    # Check if the frame was read successfully
    if not ret:
        break

    # Convert the frame to grayscale
    gray_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Compute the custom Laplacian using convolution
    laplacian_output = cv2.filter2D(gray_frame, cv2.CV_64F, laplacian_kernel)

    # Compute the variance of the custom Laplacian
    laplacian_var = np.var(laplacian_output)

    # Skip frames that have variance below the threshold
    if laplacian_var < laplacian_threshold:
        continue

    # Apply Canny edge detection
    edges = cv2.Canny(gray_frame, 50, 150)

    # Use Harris corner detector to detect corners in the image
    corners = cv2.cornerHarris(gray_frame, 2, 3, 0.04)

    # Threshold the corners to keep only strong ones
    corners_thresh = 0.1 * corners.max()
    corner_img = np.zeros_like(gray_frame)
    corner_img[corners > corners_thresh] = 255

    # Find centroids of the detected corners

    labeled_image, label_count = custom_connected_components(np.uint8(corner_img))

    # Draw the corners on the original frame with larger circles
    verified_corners = []
    for label in range(1, label_count + 1):
        label_coords = np.column_stack(np.where(labeled_image == label))

        # Check if there are any coordinates for the current label
        if label_coords.size == 0:
            continue

        # np.where returns (row, col) pairs, so the centroid is ordered (y, x)
        centroid = np.mean(label_coords, axis=0).astype(int)
        y, x = int(centroid[0]), int(centroid[1])

        # Check if the corner is within the ROI
        if roi_x_min < x < roi_x_max and roi_y_min < y < roi_y_max:
            close_corners = np.linalg.norm(label_coords - centroid, axis=1) < 20
            if np.sum(close_corners) == 1:
                cv2.circle(frame, (x, y), 10, (0, 0, 255), -1)  # Filled red circle (OpenCV uses BGR order)
                verified_corners.append((x, y))



    # Draw lines forming a bounding box around the paper
    if len(verified_corners) == 4:
        cv2.line(frame, tuple(verified_corners[0]), tuple(verified_corners[1]), (255, 0, 0), 2)
        cv2.line(frame, tuple(verified_corners[1]), tuple(verified_corners[3]), (255, 0, 0), 2)
        cv2.line(frame, tuple(verified_corners[2]), tuple(verified_corners[3]), (255, 0, 0), 2)
        cv2.line(frame, tuple(verified_corners[2]), tuple(verified_corners[0]), (255, 0, 0), 2)

    # Save the frame with verified corners and overlays as a frame in the output video
    if len(verified_corners) == 4:
        out.write(frame)

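Explanation: The helper custom_connected_components labels 4-connected regions of a binary image. The loop reads each frame, converts it to grayscale, and skips frames whose Laplacian variance is below laplacian_threshold, since these are treated as blurry. For the remaining frames it runs Canny edge detection (the edge map is computed but not used afterwards) and the Harris corner detector, thresholds the Harris response at 10% of its maximum, and takes the centroid of each connected region as a candidate corner. Candidates inside the ROI that pass the 20-pixel distance check are marked on the frame. When exactly four corners are verified, the loop draws the four bounding lines and writes the frame to the output video.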
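The line-drawing step connects corners 0-1, 1-3, 3-2, and 2-0, which only traces the outline of the paper if the corners happen to be found in top-left, top-right, bottom-left, bottom-right order. One possible way to remove that dependence, shown here as a sketch and not as part of the original pipeline, is to sort the four points by angle around their mean so that consecutive points are always adjacent. The helper name order_corners and the sample coordinates are hypothetical, and the approach assumes the four points form a convex quadrilateral.

```python
import math

def order_corners(corners):
    """Sort (x, y) points by angle around their mean so consecutive points are adjacent."""
    cx = sum(x for x, _ in corners) / len(corners)
    cy = sum(y for _, y in corners) / len(corners)
    return sorted(corners, key=lambda p: math.atan2(p[1] - cy, p[0] - cx))

# Hypothetical corners in discovery order: top-left, top-right, bottom-left, bottom-right.
corners = [(10, 10), (200, 20), (15, 150), (210, 160)]
ordered = order_corners(corners)

# Each corner is joined to the next one, wrapping around at the end.
edges = [(ordered[i], ordered[(i + 1) % 4]) for i in range(4)]
print(ordered)
```

With the points ordered this way, the outline can be drawn by connecting each corner to the next in a single loop, and the result does not depend on the order in which the corners were detected.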

Block 9: Release Video Capture and Writer

In [None]:
cap.release()
out.release()
print(f"Output video generated successfully at {output_video_path} with {fps_output} fps.")


Output video generated successfully at output_video.mp4 with 5 fps.


Explanation: This block releases the VideoCapture and VideoWriter objects and prints a success message.

