## Final Exam
<b>The general objective is to solve five different games.<br>
The code for each game should be written in an individual cell.<br>
Add comments in your code to explain your approach.<br>
You can design your own approach, and use any method you learned in this class.</b><br>

1A. Input images from video file WiiPlay.mp4 with level 15 (frame number between 4820 and 5000).<br> 
1B. (5pts) Acquire a <b>face template</b> from the first frame (frame number = 4820).<br>
1C. (10pts) Try to detect the face that matches the template in subsequent frames, draw a <b>red</b> rectangle around the detected face, and show the output images in the <b>"find_this_mii"</b> window.<br><br>

2A. Input images from video file WiiPlay.mp4 with level 8 (frame number between 2180 and 2380).<br>
2B. (5pts) Detect <b>pedestrians</b> on each frame and draw a <b>green</b> rectangle around your detection.<br>
2C. (5pts) Detect <b>faces</b> on each frame and draw a <b>blue</b> rectangle around your detection.<br>
2D. (10pts) Try to find two faces that look alike, draw a <b>red</b> rectangle around each of the two faces, and show the output images in the <b>"find_two_look_alike"</b> window.<br><br>
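
The look-alike search in 2D reduces to comparing every unordered pair of face feature vectors. The sketch below is hypothetical: it assumes one encoding per detected face is already available (for example from `face_recognition.face_encodings`) and mirrors the Euclidean-distance rule that `face_recognition.compare_faces` applies with its default tolerance of 0.6, but it returns the closest qualifying pair instead of the first one found.

```python
import itertools
from typing import Optional, Sequence, Tuple

import numpy as np

def closest_pair(encodings: Sequence[np.ndarray], tolerance: float = 0.6) -> Optional[Tuple[int, int]]:
    """Return the indices of the two most similar encodings, or None if no pair is within `tolerance`."""
    best_pair, best_distance = None, tolerance
    for i, j in itertools.combinations(range(len(encodings)), 2):
        # Euclidean distance between feature vectors; smaller means more alike
        distance = float(np.linalg.norm(encodings[i] - encodings[j]))
        if distance <= best_distance:
            best_pair, best_distance = (i, j), distance
    return best_pair

# usage with made-up 3-d "encodings": faces 0 and 2 are nearly identical
faces = [np.array([0.0, 0.0, 0.0]), np.array([1.0, 1.0, 1.0]), np.array([0.05, 0.0, 0.0])]
print(closest_pair(faces))  # → (0, 2)
```

Picking the closest pair keeps the answer stable when several faces fall just under the tolerance.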

3A. Input images from video file WiiPlay.mp4 with level 9 (frame number between 2480 and 2600).<br>
3B. (5pts) <b>Detect</b> faces (or pedestrians) on the first frame and draw a <b>blue</b> rectangle around your detection.<br>
3C. (10pts) <b>Track</b> faces (or pedestrians) in subsequent frames and draw a <b>green</b> rectangle around your tracking.<br>
3D. (5pts) Try to find the fastest character, draw a <b>red</b> rectangle around the fastest character, and show the output images in the <b>"find_the_fastest_character"</b> window.<br><br>
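
One way to approach 3D, sketched under the assumption that a tracker already yields one `(x, y, w, h)` box per character per frame (the `tracks` dictionary and its ids are hypothetical): measure how far each box center moves between consecutive frames and pick the character with the largest average displacement. Pixel displacement is only a proxy for speed; if the scene has perspective, distant characters will appear slower.

```python
from typing import Dict, List, Tuple

import numpy as np

Box = Tuple[int, int, int, int]  # (x, y, w, h), the box format used throughout this notebook

def fastest_track(tracks: Dict[int, List[Box]]) -> int:
    """Return the id of the track whose box center moves the most pixels per frame on average."""
    def mean_speed(boxes: List[Box]) -> float:
        # box centers as an (N, 2) array
        centers = np.array([(x + w / 2.0, y + h / 2.0) for (x, y, w, h) in boxes])
        if len(centers) < 2:
            return 0.0
        # Euclidean displacement between consecutive frames, averaged over the track
        steps = np.linalg.norm(np.diff(centers, axis=0), axis=1)
        return float(steps.mean())
    return max(tracks, key=lambda track_id: mean_speed(tracks[track_id]))

# usage: track 1 moves 5 px per frame, track 2 moves 2 px per frame
tracks = {
    1: [(0, 0, 10, 10), (5, 0, 10, 10), (10, 0, 10, 10)],
    2: [(0, 50, 10, 10), (2, 50, 10, 10), (4, 50, 10, 10)],
}
print(fastest_track(tracks))  # → 1
```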

4A. Input images from video file WiiPlay.mp4 with level 6 (frame number between 1650 and 1800).<br>
4B. (10pts) Compute and show <b>optical flows</b> on each frame using <b>blue</b> arrows.<br>
4C. (5pts) Try to detect the two odd characters who face the opposite direction from everyone else, draw a <b>red</b> rectangle around each of the two characters, and show the output images in the <b>"find_two_odds"</b> window.<br><br>

5A. Input continuous BGR images from a webcam.<br>
5B. (5pts) Use <i>MediaPipe()</i> to detect and track one of your hands.<br>
5C. (5pts) Obtain the positions of the 21 HandLandmarks and draw a <b>blue</b> circle around each HandLandmark.<br>
5D. (10pts) Design an algorithm to recognize three hand gestures: Rock, Scissors, and Paper. Write the type of the recognized hand gesture in the upper left corner using <i>cv2.putText()</i>.<br><br>
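
A hypothetical rule-based classifier for 5D, assuming the 21 landmarks arrive as `(x, y)` pairs in MediaPipe Hands order (0 is the wrist; 6/8, 10/12, 14/16 and 18/20 are the PIP joint and tip of the index, middle, ring and pinky fingers). A finger counts as extended when its tip is farther from the wrist than its PIP joint, which tolerates hand rotation better than comparing raw `y` values; the thumb is ignored. `fake_hand` only fabricates test input.

```python
import math
from typing import List, Sequence, Tuple

# MediaPipe Hands landmark indices: (PIP joint, fingertip) for index, middle, ring and pinky
FINGER_JOINTS = [(6, 8), (10, 12), (14, 16), (18, 20)]
WRIST = 0

def classify_gesture(landmarks: Sequence[Tuple[float, float]]) -> str:
    """Classify 21 (x, y) hand landmarks as "Rock", "Scissors", "Paper" or "Unknown"."""
    wrist = landmarks[WRIST]
    # a finger counts as extended when its tip is farther from the wrist than its PIP joint
    extended: List[bool] = [
        math.dist(wrist, landmarks[tip]) > math.dist(wrist, landmarks[pip]) for (pip, tip) in FINGER_JOINTS
    ]
    if not any(extended):
        return "Rock"
    if extended == [True, True, False, False]:
        return "Scissors"
    if all(extended):
        return "Paper"
    return "Unknown"

def fake_hand(extended_fingers: Sequence[bool]) -> List[Tuple[float, float]]:
    # hypothetical helper that fabricates landmarks for testing; real ones come from MediaPipe
    points = [(0.5, 0.5)] * 21
    points[WRIST] = (0.5, 0.9)
    for (pip, tip), is_extended in zip(FINGER_JOINTS, extended_fingers):
        points[pip] = (0.5, 0.6)
        points[tip] = (0.5, 0.4) if is_extended else (0.5, 0.75)
    return points

print(classify_gesture(fake_hand([True, True, False, False])))  # → Scissors
```

In the webcam loop the input could be built as `[(lm.x, lm.y) for lm in hand_landmarks.landmark]`; since MediaPipe normalizes x and y by the image width and height separately, multiplying by the frame size first gives distances in true pixels.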

6. (5pts) Any comments regarding the final exam? Which steps do you believe you have completed? Which steps gave you trouble?<br>
7. (5pts) Any suggestions for the teaching assistants to improve this class? Any suggestions for the teacher?<br>
8. Upload your Jupyter file (*.ipynb) with code and report.

In [1]:
# problem 1
# frames 4820 - 5000 ( lvl15 )
# from first frame detect mii face then locate it after

from typing import Iterator, Union, Tuple, List
import numpy, cv2

class VideoReader:
    
    video_stream  = None
    start_frame   = None
    num_frames    = None
    frame_ratio   = None
    frame_counter = None
    
    @classmethod
    def initialize(class_, video_file_name : str) -> "VideoReader":
        class_.video_stream = cv2.VideoCapture(video_file_name)
        if not (class_.video_stream.isOpened()):
            raise IOError("Cannot open video file \"{}\"\n".format(video_file_name))
        return class_
    
    @classmethod
    def configure(class_, start_frame : int, num_frames : int, frame_ratio : float = 1) -> "VideoReader":
        (class_.start_frame, class_.num_frames, class_.frame_ratio, class_.frame_counter) = (
            start_frame, num_frames, frame_ratio, num_frames + 1
        )
        return class_
    
    @classmethod
    def read(class_) -> Iterator[ numpy.ndarray ]:
        ratio = class_.frame_ratio
        while True:
            if (class_.num_frames < class_.frame_counter):
                class_.frame_counter = 0
                class_.video_stream.set(cv2.CAP_PROP_POS_FRAMES, class_.start_frame)
            latest_frame = class_.video_stream.read()[1]
            yield ((latest_frame) if (ratio == 1) else (cv2.resize(latest_frame, None, None, fx = ratio, fy = ratio)))
            class_.frame_counter += 1
            
    @classmethod
    def set_frame(class_, frame_number : int) -> "VideoReader":
        
        class_.video_stream.set(cv2.CAP_PROP_POS_FRAMES, frame_number)
        
        return class_
    
    @classmethod
    def reset_counter(class_):
        class_.frame_counter = class_.num_frames + 1
            
    @classmethod
    def release(class_):
        
        class_.video_stream.release()
        class_.video_stream  = None
        class_.start_frame   = None
        class_.num_frames    = None
        class_.frame_ratio   = None
        class_.frame_counter = None

class BasicFaceDetector:
    
    face_detector = None
    detector_file_name = "./haarcascade_frontalface_alt2.xml"
    
    @classmethod
    def initialize(class_) -> "BasicFaceDetector":
        class_.face_detector = cv2.CascadeClassifier(class_.detector_file_name)
        if (class_.face_detector.empty()):
            raise IOError("Cannot find XML file \"{}\"\n".format(class_.detector_file_name))
        return class_
    
    @classmethod
    def detect(class_, source_image : numpy.ndarray, ** kwargs) -> Union[ numpy.ndarray, tuple ]:
        return class_.face_detector.detectMultiScale(source_image, ** kwargs)
    
class ImageUtils:
    
    @staticmethod
    def draw_boundary_boxes(display_frame : numpy.ndarray, 
            boundary_boxes : Union[ type(None), List[ tuple ]], color = (0, 0, 255), width = 5) -> numpy.ndarray:
        
        if (boundary_boxes is None):
            return display_frame
        
        new_frame = numpy.copy(display_frame)
        
        for (x, y, w, h) in boundary_boxes:
            new_frame = cv2.rectangle(new_frame, (x, y), (x + w, y + h), color, width)
            
        return new_frame
    
    @staticmethod
    def enlarge_boundary_boxes(
            boundary_boxes : List[ tuple ], ratio : float, max_height : int, max_width : int) -> List[ tuple ]:
        
        def enlarge(boundary_box : tuple) -> tuple:
            (x, y, w, h) = boundary_box
            (new_w, new_h) = (int(w * ratio), int(h * ratio))
            (dif_w, dif_h) = (abs(w - new_w) // 2, abs(h - new_h) // 2)
            (end_x, end_y) = (x + new_w, y + new_h)
            return (
                max(0, x - dif_w),
                max(0, y - dif_h),
                min(max_width, (end_x - x)),
                min(max_height, (end_y - y))
            )
        return list(map(enlarge, boundary_boxes))

if (__name__ == "__main__"):
    
    start_frame     = 4820
    
    num_frames      = 180
    
    frame_ratio     = 0.75
    
    threshold       = 0.96
    
    window_name     = "find_this_mii"
    
    video_file_name = "./wiiplay.mp4"
    
    # load the video file and initialize the frame range to loop
    VideoReader.initialize(video_file_name).configure(start_frame, num_frames, frame_ratio)
    
    # initialize the haarcascade face detector 
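    # "haarcascade_frontalface_alt2.xml" must first be downloaded from the OpenCV GitHub repository (data/haarcascades)
    # the pip "opencv-python" package also ships it in the folder given by cv2.data.haarcascades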
    BasicFaceDetector.initialize()
    
    # convert an image from BGR to HSV
    def to_hsv(current_frame : numpy.ndarray) -> numpy.ndarray:
        
        return cv2.cvtColor(current_frame, cv2.COLOR_BGR2HSV)
    
    # resize an image by a scaling factor
    def resize_face(face_image : numpy.ndarray, scale : float = 0.90) -> numpy.ndarray:
        
        return cv2.resize(face_image, None, None, fx = scale, fy = scale)
    
    # obtain a face template from the first frame
    def fetch_template() -> numpy.ndarray:
        
        # fetch the first frame from the video
        current_frame = next(VideoReader.read())
        
        # detect and obtain the bounding box of the face from the first frame
        boundary_box = BasicFaceDetector.detect(
            cv2.cvtColor(current_frame, cv2.COLOR_BGR2GRAY), scaleFactor = 1.2, minSize = (40, 40), maxSize = (95, 95))
        
        # raise an exception if no face is detected in the first frame
        if (boundary_box.__len__() < 1):
            
            raise Exception("Cannot capture template face\n")
            
        # unpack values from the bounding box tuple
        (x, y, w, h) = boundary_box[0]
        
        # crop the first frame according to the bounding box and return the cropped image
        return current_frame[ y : y + h, x : x + w ]
    
    # obtain the bounding box of a connected region
    def comp_to_boundary_box(c : numpy.ndarray, i : int) -> Tuple[ int ]:
        
        # find the coordinates of a connected region labeled i
        (a, b) = numpy.where(c == i)
        
        # find the upper left corner of the connected region
        (q, r) = (numpy.min(b), numpy.min(a))
        
        # find the lower right corner and return the bounding box as a tuple
        return (int(q), int(r), int(numpy.max(b) - q), int(numpy.max(a) - r))
    
    # obtain the template image 
    template_face = resize_face(to_hsv(fetch_template()))
    
    # read frames from the looping video endlessly until Esc is pressed
    for frame in VideoReader.read():
        
        # detect faces in the grayscale current frame (as for the template) and obtain their bounding boxes
        faces        = BasicFaceDetector.detect(
            cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), scaleFactor = 1.2, minSize = (40, 40), maxSize = (95, 95))
        
        # perform template matching to find the template image in the current frame
        match_result = cv2.matchTemplate(to_hsv(frame), template_face, cv2.TM_CCOEFF)
        
        # min-max normalize the scores to [0, 1] per frame, so "threshold" is relative to this frame's best match
        match_result = cv2.normalize(match_result, None, 0, 1, cv2.NORM_MINMAX)
        
        # filter out low confidence matching via thresholding, and locate the coordinates
        match_result = numpy.where(match_result >= threshold)
        
        # convert the coordinates to standard bounding boxes
        match_boundary_boxes = list( 
            (x, y, template_face.shape[1], template_face.shape[0]) for (x, y) in zip(match_result[1], match_result[0])  )
        
        # initialize an empty mask for the template-matching regions
        match_mask  = numpy.zeros(shape = frame.shape, dtype = numpy.uint8)
        
        # initialize an empty mask for the detected-face regions
        faces_mask  = numpy.zeros(shape = frame.shape, dtype = numpy.uint8)
        
        # mark the regions with matches by drawing solid rectangles
        match_mask  = ImageUtils.draw_boundary_boxes(match_mask, match_boundary_boxes, (255, 255, 255), -1)
        
        # mark the regions with faces detected by drawing solid rectangles
        faces_mask  = ImageUtils.draw_boundary_boxes(faces_mask, faces, (255, 255, 255), -1)
            
        # merge the two masks using "bitwise_and" to obtain regions of matched faces
        result_mask = cv2.bitwise_and(match_mask, faces_mask)
        
        # reduce the mask from 3 channels to just 1 channel (grayscale)
        result_mask = cv2.cvtColor(result_mask, cv2.COLOR_BGR2GRAY)
        
        # find the connected components in the merged mask
        n, c        = cv2.connectedComponents(result_mask)
        
        # only consider cases with at least one component ( background (=1) + components (>=1) -> (>1) )
        if (n > 1):
            
            # obtain bounding boxes from component regions and stack them together
            comp  = numpy.stack([  comp_to_boundary_box(c, i) for i in range(1, n)  ])
            
            # draw red bounding boxes around detected faces that also match the template
            frame = ImageUtils.draw_boundary_boxes(frame, comp, (0, 0, 255), 4)
                
        # show the resulting image
        cv2.imshow(window_name, frame)
        
        # terminate the program when Esc is pressed
        if (cv2.waitKey(1) == 27):
            
            break
            
    cv2.destroyWindow(window_name)
    
    VideoReader.release()

In [2]:
# problem 2
# frames 2180 - 2380 ( lvl8 )
# detect pedestrians, their faces, and find two identicals

from typing import Iterator, Union, Tuple, List
from imutils.object_detection import non_max_suppression
import face_recognition, itertools, mediapipe, numpy, cv2

class VideoReader:
    
    video_stream  = None
    start_frame   = None
    num_frames    = None
    frame_ratio   = None
    frame_counter = None
    
    @classmethod
    def initialize(class_, video_file_name : str) -> "VideoReader":
        class_.video_stream = cv2.VideoCapture(video_file_name)
        if not (class_.video_stream.isOpened()):
            raise IOError("Cannot open video file \"{}\"\n".format(video_file_name))
        return class_
    
    @classmethod
    def configure(class_, start_frame : int, num_frames : int, frame_ratio : float = 1) -> "VideoReader":
        (class_.start_frame, class_.num_frames, class_.frame_ratio, class_.frame_counter) = (
            start_frame, num_frames, frame_ratio, num_frames + 1
        )
        return class_
    
    @classmethod
    def read(class_) -> Iterator[ numpy.ndarray ]:
        ratio = class_.frame_ratio
        while True:
            if (class_.num_frames < class_.frame_counter):
                class_.frame_counter = 0
                class_.video_stream.set(cv2.CAP_PROP_POS_FRAMES, class_.start_frame)
            latest_frame = class_.video_stream.read()[1]
            yield ((latest_frame) if (ratio == 1) else (cv2.resize(latest_frame, None, None, fx = ratio, fy = ratio)))
            class_.frame_counter += 1
            
    @classmethod
    def set_frame(class_, frame_number : int) -> "VideoReader":
        
        class_.video_stream.set(cv2.CAP_PROP_POS_FRAMES, frame_number)
        
        return class_
    
    @classmethod
    def reset_counter(class_):
        class_.frame_counter = class_.num_frames + 1
            
    @classmethod
    def release(class_):
        
        class_.video_stream.release()
        class_.video_stream  = None
        class_.start_frame   = None
        class_.num_frames    = None
        class_.frame_ratio   = None
        class_.frame_counter = None

class FaceDetector:
    
    face_detector = None
    
    @classmethod
    def initialize(class_, confidence_threshold : float, model_selection : int) -> "FaceDetector":
        class_.face_detector = mediapipe.solutions.face_detection.FaceDetection(
            model_selection = model_selection, min_detection_confidence = confidence_threshold
        )
        return class_
    
    @classmethod
    def detect(class_, source_image : numpy.ndarray) -> List[ tuple ]:
        def get_ratio_box(detection : mediapipe.framework.formats.detection_pb2.Detection) -> Tuple[ float ]:
            relative_boundary_box = detection.location_data.relative_bounding_box
            return (
                relative_boundary_box.xmin,
                relative_boundary_box.ymin,
                relative_boundary_box.width,
                relative_boundary_box.height
            )
        def rescale_boundary_box(boundary_box : Tuple[ float ], width : int, height : int) -> Tuple[ int ]:
            return (
                int(boundary_box[0] * width),
                int(boundary_box[1] * height),
                int(boundary_box[2] * width),
                int(boundary_box[3] * height)
            )
        (height, width) = source_image.shape[:2]
        detection_result = class_.face_detector.process(cv2.cvtColor(source_image, cv2.COLOR_BGR2RGB))
        if (detection_result.detections is None):
            return None
        boundary_boxes = [
            rescale_boundary_box(get_ratio_box(detection), width, height)
                for detection in detection_result.detections
        ]
        return boundary_boxes
    
    @staticmethod
    def convert_to_endpoint_boxes(boundary_boxes : List[ tuple ]) -> List[ tuple ]:
        return [
            (x, y, x + w, y + h) for (x, y, w, h) in boundary_boxes
        ]
    
    @staticmethod
    def convert_to_distance_boxes(boundary_boxes : List[ tuple ]) -> List[ tuple ]:
        return [
            (x, y, X - x, Y - y) for (x, y, X, Y) in boundary_boxes
        ]
    
    @classmethod
    def suppress_boundary_boxes(class_, boundary_boxes : List[ tuple ], threshold : float) -> List[ tuple ]:
        boundary_boxes = numpy.array(class_.convert_to_endpoint_boxes(boundary_boxes))
        return class_.convert_to_distance_boxes(
            non_max_suppression(boundary_boxes, probs = None, overlapThresh = threshold))

class MovementDetector:
    
    class List(list):
        
        def __init__(self):
            super(MovementDetector.List, self).__init__()
            
        def pop_first(self):
            if (self.__len__()):
                return self.pop(0)
            
    def fetch_background(self, num_background_frames : int) -> numpy.ndarray:
        
        def fetch_frame(frame_number : int) -> numpy.ndarray:
            return cv2.cvtColor(
                next(VideoReader.set_frame(frame_number).read()), cv2.COLOR_BGR2GRAY)
        
        # fetch the requested frames, then restore the stream position and reset the frame counter
        def fetch_frames(frame_numbers : List[ int ]) -> numpy.ndarray:
            initial_frame = VideoReader.video_stream.get(cv2.CAP_PROP_POS_FRAMES)
            frames = numpy.stack([  fetch_frame(index) for index in frame_numbers  ])
            VideoReader.set_frame(initial_frame)
            VideoReader.reset_counter()
            return frames
        
        # randomly select a certain number of distinct frames within the loop
        indices = numpy.random.choice(
            range(VideoReader.start_frame, VideoReader.start_frame + VideoReader.num_frames), num_background_frames,
            replace = False)
        
        # return the per-pixel median of those randomly fetched frames
        # the median is robust against outliers, so characters that cover a pixel in only a few frames are suppressed
        return numpy.median(fetch_frames(indices), axis = 0)
    
    def __init__(self, num_background_frames : int) -> None:
        
        self.background_frame = numpy.uint8(self.fetch_background(num_background_frames))
        self.frame_history = self.List()
        
        # the number of previous frames used to subtract with the background
        self.trace_limit = 8
        
    def configure(self, trace_limit : int) -> "MovementDetector":
        self.trace_limit = trace_limit
        return self
        
    def append(self, latest_frame : numpy.ndarray) -> "MovementDetector":
        
        # convert the latest frame to grayscale and add it to the frame history
        self.frame_history.append(cv2.cvtColor(numpy.uint8(latest_frame), cv2.COLOR_BGR2GRAY))
        
        return self
    
    def pop_front(self) -> Union[ type(None), numpy.ndarray ]:
        return self.frame_history.pop_first()
    
    def detect(self) -> numpy.ndarray:
        
        # detect movements by summing the background-subtraction differences of the most recent "trace_limit" frames
        return numpy.sum(numpy.stack([
            cv2.absdiff(self.background_frame, frame) 
                for frame in self.frame_history[-self.trace_limit:]
        ]), axis = 0)
    
    def __len__(self):
        return self.frame_history.__len__()
    
    
# because template matching performed poorly, the "face_recognition" library is used
# install the library via "pip install face_recognition"
# if there is an error concerning "dlib", download a prebuilt wheel from PyPI / GitHub and install it locally
# after installing "dlib", retry "pip install face_recognition"
class FaceRecognizer:
    
    @staticmethod
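    def find_identical_pair(faces : List[ numpy.ndarray ]) -> Union[ type(None), Tuple[ int, int ] ]:
        
        # compute the encodings of every crop ("face_recognition" expects RGB, OpenCV frames are BGR)
        # empty crops get no encoding so that the indices still line up with "faces"
        encodings = [
            (face_recognition.face_encodings(cv2.cvtColor(face, cv2.COLOR_BGR2RGB)) if (face.size) else [])
                for face in faces
        ]
        
        # compare every unordered pair of faces and return the indices of the first matching pair
        for pair in itertools.combinations(range(len(faces)), 2):
            
            first_image_encoding = encodings[pair[0]]
            second_image_encoding = encodings[pair[1]]
            
            # skip (rather than abort on) pairs in which either crop yielded no encoding
            if ((len(first_image_encoding) == 0) or (len(second_image_encoding) == 0)):
                continue
            
            # "compare_faces" returns one boolean per known encoding, so test the element, not the non-empty list
            result = face_recognition.compare_faces([ first_image_encoding[0] ], second_image_encoding[0])
            if (result[0]):
                return pair
        
        # no pair of faces matched
        return None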
            
    @classmethod
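    def find_identicals(class_, original_image : numpy.ndarray, boundary_boxes : List[ tuple ]) -> Union[ type(None), Tuple[ int, int ] ]:
        # crop each face, clamping the start to 0 because mediapipe can report slightly negative coordinates
        # (a negative start index would otherwise wrap around to the far edge of the image)
        faces = [  original_image[ max(0, y) : y + h, max(0, x) : x + w ] for (x, y, w, h) in boundary_boxes  ]
        if (len(faces) < 2):
            return None
        return class_.find_identical_pair(faces)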
    
class ImageUtils:
    
    @staticmethod
    def draw_boundary_boxes(display_frame : numpy.ndarray, 
            boundary_boxes : Union[ type(None), List[ tuple ]], color = (0, 0, 255), width = 5) -> numpy.ndarray:
        
        if (boundary_boxes is None):
            return display_frame
        
        new_frame = numpy.copy(display_frame)
        
        for (x, y, w, h) in boundary_boxes:
            new_frame = cv2.rectangle(new_frame, (x, y), (x + w, y + h), color, width)
            
        return new_frame
    
    @staticmethod
    def enlarge_boundary_boxes(
            boundary_boxes : List[ tuple ], ratio : float, max_height : int, max_width : int) -> List[ tuple ]:
        
        def enlarge(boundary_box : tuple) -> tuple:
            (x, y, w, h) = boundary_box
            (new_w, new_h) = (int(w * ratio), int(h * ratio))
            (dif_w, dif_h) = (abs(w - new_w) // 2, abs(h - new_h) // 2)
            (end_x, end_y) = (x + new_w, y + new_h)
            return (
                max(0, x - dif_w),
                max(0, y - dif_h),
                min(max_width, (end_x - x)),
                min(max_height, (end_y - y))
            )
        return list(map(enlarge, boundary_boxes))

# using thresholding and morphology, find the binary image for the cursor
def compute_cursor_mask(current_colored_frame : numpy.ndarray) -> numpy.ndarray:
    hsv_colored_frame = cv2.cvtColor(current_colored_frame, cv2.COLOR_BGR2HSV)
    thresholded = cv2.inRange(hsv_colored_frame, (79, 69, 133), (128, 222, 286))
    morph_list = [
        lambda img : cv2.dilate(img, cv2.getStructuringElement(cv2.MORPH_ELLIPSE, ( 5,  5))                ),   
        lambda img : cv2.erode(img,  cv2.getStructuringElement(cv2.MORPH_ELLIPSE, ( 7,  7))                ),   
        lambda img : cv2.dilate(img, cv2.getStructuringElement(cv2.MORPH_ELLIPSE, ( 9,  9))                ),   
        lambda img : cv2.erode(img,  cv2.getStructuringElement(cv2.MORPH_ELLIPSE, ( 6,  6)), iterations = 2),   
        lambda img : cv2.dilate(img, cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (11, 11)), iterations = 2),    
    ]
    for morph in morph_list:
        thresholded = morph(thresholded)
    return numpy.uint8(thresholded)

# using thresholding and morphology, find the binary image for the counter (timer UI)
def compute_counter_mask(current_colored_frame : numpy.ndarray) -> numpy.ndarray:
    hsv_colored_frame = cv2.cvtColor(current_colored_frame, cv2.COLOR_BGR2HSV)
    thresholded = cv2.inRange(hsv_colored_frame, (0, 0, 44), (0, 4, 63))
    morph_list = [
        lambda img : cv2.dilate(img, cv2.getStructuringElement(cv2.MORPH_ELLIPSE, ( 5,  5))                ),   
        lambda img : cv2.erode(img,  cv2.getStructuringElement(cv2.MORPH_ELLIPSE, ( 6,  6))                ),   
        lambda img : cv2.dilate(img, cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (10, 10))                ),   
        lambda img : cv2.erode(img,  cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (10, 10))                ),   
        lambda img : cv2.dilate(img, cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (10, 10)), iterations = 2),
        lambda img : cv2.erode(img,  cv2.getStructuringElement(cv2.MORPH_ELLIPSE, ( 7,  7)), iterations = 2),      
        lambda img : cv2.dilate(img, cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (15, 15)), iterations = 2)
    ]
    for morph in morph_list:
        thresholded = morph(thresholded)
    return numpy.uint8(thresholded)

# ignore movements from both the cursor and the counter (timer UI)
def filter_difference(difference_mask : numpy.ndarray, 
        cursor_mask : numpy.ndarray, counter_mask : numpy.ndarray) -> numpy.ndarray:

    # merge the two masks via "bitwise_or" to obtain a mask of regions in which to ignore movements
    ignorable_mask = numpy.uint8(cv2.bitwise_or(cursor_mask, counter_mask))
    
    """
        Let A <-> all movements (difference_mask)
        Let B <-> all movements from the cursor and the timer (ignorable_mask)
        Let C <-> the movements to keep
        ----------------------------------------------
        Truth Table:
            A B C
            0 0 0
            0 1 0
            1 0 1
            1 1 0
        i.e. C = A AND (NOT B), computed below as (NOT B) AND (A XOR B)
    """
    
    # obtain a more accurate mask by following the aforementioned logic
    return numpy.uint8(cv2.bitwise_and(
        numpy.uint8(cv2.bitwise_not(ignorable_mask)),
        numpy.uint8(cv2.bitwise_xor(ignorable_mask, difference_mask))
    ))

# obtain the bounding boxes of masked regions
def mask_to_boundary_boxes(difference_mask : numpy.ndarray, thresh_1 : int, thresh_2 : int) -> Union[ type(None), List[ tuple ] ]:
    
    # find the contours of connected regions within the binary mask
    # ("[-2]" selects the contour list under both the OpenCV 3.x and 4.x return conventions)
    contours = cv2.findContours(difference_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
    
    # convert contours to bounding boxes whose bounded area is at least "thresh_1"
    boxes = [  (x, y, w, h) for (x, y, w, h) in (cv2.boundingRect(cont) for cont in contours) if ((w * h) >= thresh_1) ]
    
    # return "None" if no bounding box satisfies the above criterion
    if (boxes.__len__() == 0):
        return None
    
    # keep only the bounding boxes whose bounded area is at least "thresh_2"
    boxes = list(filter(lambda x : ((x[2] * x[3]) >= thresh_2), boxes))
    
    # return the remaining bounding boxes, or "None" if none are left
    return ((boxes) if (boxes.__len__()) else (None))

if (__name__ == "__main__"):
    
    first_frame          = 2180
    
    num_frames           = 200
    
    frame_ratio          = 0.75
    
    num_backgrounds      = 30
    
    trace_history        = 8
    
    confidence_threshold = 0.26
    
    model_selection      = 1
    
    difference_thresh    = 127
    
    comp_threshold_0     = 400
    
    comp_threshold_1     = 4600
    
    comp_threshold_2     = 6900
    
    overlap_thresh       = 0.01
    
    window_name          = "find_two_look_alike"
    
    video_file_name      = "./wiiplay.mp4"
    
    # load the video and initialize the frame range to loop
    VideoReader.initialize(video_file_name).configure(first_frame, num_frames, frame_ratio)
    
    # initialize the mediapipe face detector
    FaceDetector.initialize(confidence_threshold, model_selection)
    
    # initialize the movement detector based on background subtraction
    movement_detector = MovementDetector(num_backgrounds).configure(trace_history)
    
    # create a named window
    cv2.namedWindow(window_name)
    
    # endlessly read frames from video loop
    for frame in VideoReader.read():
        
        # copy the current frame so as not to overwrite the original frame
        display_frame = numpy.copy(frame)        
        
        # add the current frame to the movement detector
        movement_detector.append(frame)
        
        # only start running the following after obtaining more than a certain number of frames
        if (movement_detector.__len__() <= trace_history):
            continue
            
        # through background subtraction, obtain the binary mask of regions with movements
        difference_mask  = numpy.uint8(255 * (movement_detector.detect() >= difference_thresh))
        
        # obtain the counter (timer UI) mask via thresholding and morphology
        counter_mask     = compute_counter_mask(frame)
        
        # obtain the cursor mask via thresholding and morphology
        cursor_mask      = compute_cursor_mask(frame)
        
        # remove the oldest frame from the queue 
        movement_detector.pop_front()
        
        # obtain the final difference mask by ignoring movements of the cursor and the timer
        difference_mask  = filter_difference(difference_mask, cursor_mask, counter_mask)
        
        # detect moving pedestrians according to the final mask and ignore movements less than thresholds
        pedestrian_boxes = mask_to_boundary_boxes(difference_mask, comp_threshold_1, comp_threshold_2)
        
        # detect pedestrian faces using mediapipe's face detector and obtain their bounding boxes
        face_boxes       = FaceDetector.detect(frame)
        
        # perform non maximum suppression on the bounding boxes if obtained
        face_boxes       = ((face_boxes) if (face_boxes is None) 
                                else (FaceDetector.suppress_boundary_boxes(face_boxes, overlap_thresh)))

        # use "face_recognition" to determine whether any pair of faces is identical (their indices will be returned)
        identical_boxes  = ((None) if (face_boxes is None) else (FaceRecognizer.find_identicals(frame, face_boxes)))
        
        # using indices, construct a new list of bounding boxes for the pair of identical faces
        if (identical_boxes is not None):
            
            identical_boxes = [  face_boxes[i] for i in identical_boxes  ]
        
        # draw bounding boxes of pedestrians
        display_frame    = ImageUtils.draw_boundary_boxes(display_frame, pedestrian_boxes, (0, 255, 0), 5)
        
        # draw bounding boxes of detected faces
        display_frame    = ImageUtils.draw_boundary_boxes(display_frame, face_boxes,       (255, 0, 0), 4)
        
        # draw bounding boxes of identical faces on top of detected faces (width 5 > width 4)
        display_frame    = ImageUtils.draw_boundary_boxes(display_frame, identical_boxes,  (0, 0, 255), 5)
        
        # display the resulting image
        cv2.imshow(window_name, display_frame)
        
        # terminate the program if Esc is pressed
        if (cv2.waitKey(1) == 27):
            break
            
    cv2.destroyWindow(window_name)
    
    VideoReader.release()
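
Two ideas from the cell above, isolated in a NumPy-only sketch: the background is the per-pixel median of sampled frames, so an object present in only a minority of frames does not survive, and the mask combination in `filter_difference`, (NOT B) AND (A XOR B), reduces to A AND (NOT B). The function names here are hypothetical.

```python
import numpy as np

def median_background(frames: np.ndarray) -> np.ndarray:
    """Per-pixel median over a stack of grayscale frames with shape (N, H, W)."""
    return np.median(frames, axis=0).astype(np.uint8)

def keep_unignored(movement: np.ndarray, ignore: np.ndarray) -> np.ndarray:
    """A AND (NOT B) for 0/255 uint8 masks: movement pixels outside the ignorable region."""
    return np.bitwise_and(movement, np.bitwise_not(ignore))

# the median rejects a transient object that appears in only one of five frames
frames = np.full((5, 4, 4), 100, dtype=np.uint8)
frames[2, 1:3, 1:3] = 250
background = median_background(frames)
print(int(background.max()))  # → 100

# every combination of A (movement) and B (ignorable region)
A = np.array([0, 0, 255, 255], dtype=np.uint8)
B = np.array([0, 255, 0, 255], dtype=np.uint8)
print(keep_unignored(A, B).tolist())  # → [0, 0, 255, 0]
```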

In [3]:
# problem 3
# frames 2481 - 2601 ( lvl9 )
# detect pedestrians and draw blue boxes on first frame, track pedestrians and draw green boxes, draw red box on fastest mii

from typing import Iterator, Union, Tuple, List
from imutils.object_detection import non_max_suppression
import mediapipe, numpy, cv2

class VideoReader:
    
    video_stream  = None
    start_frame   = None
    num_frames    = None
    frame_ratio   = None
    frame_counter = None
    
    @classmethod
    def initialize(class_, video_file_name : str) -> "VideoReader":
        class_.video_stream = cv2.VideoCapture(video_file_name)
        if not (class_.video_stream.isOpened()):
            raise IOError("Cannot open video file \"{}\"\n".format(video_file_name))
        return class_
    
    @classmethod
    def configure(class_, start_frame : int, num_frames : int, frame_ratio : float = 1) -> "VideoReader":
        (class_.start_frame, class_.num_frames, class_.frame_ratio, class_.frame_counter) = (
            start_frame, num_frames, frame_ratio, num_frames + 1
        )
        return class_
    
    @classmethod
    def read(class_) -> Iterator[ numpy.ndarray ]:
        ratio = class_.frame_ratio
        while True:
            if (class_.num_frames < class_.frame_counter):
                class_.frame_counter = 0
                class_.video_stream.set(cv2.CAP_PROP_POS_FRAMES, class_.start_frame)
            latest_frame = class_.video_stream.read()[1]
            yield ((latest_frame) if (ratio == 1) else (cv2.resize(latest_frame, None, None, fx = ratio, fy = ratio)))
            class_.frame_counter += 1
            
    @classmethod
    def set_frame(class_, frame_number : int) -> "VideoReader":
        
        class_.video_stream.set(cv2.CAP_PROP_POS_FRAMES, frame_number)
        
        return class_
    
    @classmethod
    def reset_counter(class_):
        class_.frame_counter = class_.num_frames + 1
            
    @classmethod
    def release(class_):
        
        class_.video_stream.release()
        class_.video_stream  = None
        class_.start_frame   = None
        class_.num_frames    = None
        class_.frame_ratio   = None
        class_.frame_counter = None

class FaceDetector:
    
    face_detector = None
    
    @classmethod
    def initialize(class_, confidence_threshold : float, model_selection : int) -> "FaceDetector":
        class_.face_detector = mediapipe.solutions.face_detection.FaceDetection(
            model_selection = model_selection, min_detection_confidence = confidence_threshold
        )
        return class_
    
    @classmethod
    def detect(class_, source_image : numpy.ndarray) -> List[ tuple ]:
        def get_ratio_box(detection : mediapipe.framework.formats.detection_pb2.Detection) -> Tuple[ float ]:
            relative_boundary_box = detection.location_data.relative_bounding_box
            return (
                relative_boundary_box.xmin,
                relative_boundary_box.ymin,
                relative_boundary_box.width,
                relative_boundary_box.height
            )
        def rescale_boundary_box(boundary_box : Tuple[ float ], width : int, height : int) -> Tuple[ int ]:
            return (
                int(boundary_box[0] * width),
                int(boundary_box[1] * height),
                int(boundary_box[2] * width),
                int(boundary_box[3] * height)
            )
        (height, width) = source_image.shape[:2]
        detection_result = class_.face_detector.process(cv2.cvtColor(source_image, cv2.COLOR_BGR2RGB))
        if (detection_result.detections is None):
            return None
        boundary_boxes = [
            rescale_boundary_box(get_ratio_box(detection), width, height)
                for detection in detection_result.detections
        ]
        return boundary_boxes
    
    @staticmethod
    def convert_to_endpoint_boxes(boundary_boxes : List[ tuple ]) -> List[ tuple ]:
        return [
            (x, y, x + w, y + h) for (x, y, w, h) in boundary_boxes
        ]
    
    @staticmethod
    def convert_to_distance_boxes(boundary_boxes : List[ tuple ]) -> List[ tuple ]:
        return [
            (x, y, X - x, Y - y) for (x, y, X, Y) in boundary_boxes
        ]
    
    @classmethod
    def suppress_boundary_boxes(class_, boundary_boxes : List[ tuple ], threshold : float) -> List[ tuple ]:
        boundary_boxes = numpy.array(class_.convert_to_endpoint_boxes(boundary_boxes))
        return class_.convert_to_distance_boxes(
            non_max_suppression(boundary_boxes, probs = None, overlapThresh = threshold))

class ImageUtils:
    
    @staticmethod
    def draw_boundary_boxes(display_frame : numpy.ndarray, 
            boundary_boxes : Union[ type(None), List[ tuple ]], color = (0, 0, 255), width = 5) -> numpy.ndarray:
        
        if (boundary_boxes is None):
            return display_frame
        
        new_frame = numpy.copy(display_frame)
        
        for (x, y, w, h) in boundary_boxes:
            new_frame = cv2.rectangle(new_frame, (x, y), (x + w, y + h), color, width)
            
        return new_frame
    
    @staticmethod
    def enlarge_boundary_boxes(
            boundary_boxes : List[ tuple ], ratio : float, max_height : int, max_width : int) -> List[ tuple ]:
        
        def enlarge(boundary_box : tuple) -> tuple:
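            (x, y, w, h) = boundary_box
            (new_w, new_h) = (int(w * ratio), int(h * ratio))
            
            # move the top-left corner by half the size change so the box stays centered, clipped to the image
            (new_x, new_y) = (max(0, x - (new_w - w) // 2), max(0, y - (new_h - h) // 2))
            
            # clip the width and height so the box does not extend past the right or bottom edge
            return (
                new_x,
                new_y,
                min(new_w, max_width  - new_x),
                min(new_h, max_height - new_y)
            )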
        return list(map(enlarge, boundary_boxes))
    
class TrackerList:
    
    def __init__(self, reference_frame : numpy.ndarray, boundary_boxes : List[ tuple ]) -> None:
        
        # [ [ "index", "tracker object", "previous centroid", "distance history", "flag" ] ]
        self.trackerList = [  [index, cv2.TrackerKCF_create(), (None, None), [], 0] for index in range(len(boundary_boxes))  ]
        
        # initialize the KCF tracker for each pedestrian
        # KCF is used because it is approximately ten times faster than MIL
        for index, tracker, *_ in self.trackerList:
            tracker.init(reference_frame, boundary_boxes[index])
            
    def update(self, latest_frame : numpy.ndarray, filter_lost : bool = False) -> List[ tuple ]:
               
        # update the trackers for each pedestrian by passing them the latest frame
        updated_boxes = [  (index, tracker.update(latest_frame)) for index, tracker, *_ in self.trackerList  ]
        
        # initialize an empty list used to store the median of each pedestrian's speed history
        # "median" is considered more robust than "mean" (average) when there are outliers
        medians = []
        
        # calculate the median of each pedestrian's speed history after update
        for index, updated_box in updated_boxes:
            
            # obtain the center of the bounding box
            latest_centroid = self.compute_centroids(*updated_box[1])
            
            # fetch the previous center of the bounding box
            previous_centroid = self.trackerList[index][2]
            
            # only compute the displacement if previous centroids exist
            if (previous_centroid != (None, None)):
                latest_distance = self.compute_distance_ss(latest_centroid, previous_centroid)
                self.trackerList[index][3].append(latest_distance)
                
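            # calculate the median of each pedestrian's speed history
            # (an empty history, as on the first update, counts as speed 0 instead of producing NaN and a "Mean of empty slice" warning)
            history = self.trackerList[index][3]
            median  = ((numpy.median(history)) if (history) else (0.0))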
            
            # add the new median to list
            medians.append(median)
            
            # update the latest center point
            self.trackerList[index][2] = latest_centroid
            
        # find the fastest pedestrian of the bunch
        fastest = numpy.argmax(medians)
                
        # whether or not to consider trackers which lost track
        if (filter_lost):
            updated_boxes = list(filter(lambda x : x[1][0], updated_boxes))
            
        # format the data structure and return the result
        return list(map(lambda x : (x[0], x[1][1], (x[0] == fastest)), updated_boxes))
    
    @staticmethod
    def compute_centroids(x : int, y : int, w : int, h : int) -> Tuple[ int ]:
        
        # calculate the center point of this particular bounding box
        return (x + w // 2, y + h // 2)
    
    @staticmethod
    def compute_distance_ss(coordinates_1 : Tuple[ int ], coordinates_2 : Tuple[ int ]) -> float:
        
        # calculate the displacement using Manhattan distance
        # Manhattan distance is used instead of Euclidean distance because it saves time
        return numpy.sum(numpy.abs(numpy.array(coordinates_1) - numpy.array(coordinates_2)))
        
class PeopleDetector:
    
    people_detector = None
    
    @classmethod
    def initialize(class_, frame : numpy.ndarray) -> "PeopleDetector":
        class_.frame = frame
        class_.people_detector = cv2.HOGDescriptor()
        class_.people_detector.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
        return class_
    
    @classmethod
    def detect(class_, threshold : float = 0.01, win_stride : tuple = (5, 5), ** kwargs) -> Union[ tuple, numpy.ndarray ]:
        return class_.people_detector.detectMultiScale(
            class_.frame, hitThreshold = threshold, winStride = win_stride, ** kwargs)[0]

if (__name__ == "__main__"):
    
    video_file_name = "./wiiplay.mp4"
    
    start_frame     = 2481
    
    num_frames      = 120
    
    frame_ratio     = 1
    
    threshold       = 0.5
    
    window_name     = "find_the_fastest_character"
    
    # load the video and initialize the frame range to loop
    VideoReader.initialize(video_file_name).configure(start_frame, num_frames, frame_ratio)
    
    # obtain the first frame of the loop
    template_frame          = next(VideoReader.read())
    
    # detect pedestrians using the HOG detector and perform non maximum suppression
    template_boundary_boxes = FaceDetector.suppress_boundary_boxes(
        PeopleDetector.initialize(template_frame).detect(), threshold)
    
    # initialize a history list to track each bounding box as the video progresses
    trackerList             = TrackerList(template_frame, template_boundary_boxes)
    
    # initialize an output window
    cv2.namedWindow(window_name)
    
    # draw blue bounding boxes on the first frame
    template_frame = ImageUtils.draw_boundary_boxes(template_frame, template_boundary_boxes, (255, 0, 0), 5)
    
    # display the first frame containing detected pedestrians and wait for a key press before continuing
    cv2.imshow(window_name, template_frame);  cv2.waitKey(0)
    
    # read frames from the looping video until Esc is pressed or one pass through the frame range has finished
    for frame in VideoReader.read():
        
        # for each pedestrian, obtain the updated bounding box and a flag of whether it is the fastest
        for index_label, bbox, fastest in trackerList.update(frame, True):

            # draw red rectangles if it is the fastest; otherwise, draw green rectangles
            color        = ((0, 0, 255) if (fastest) else (0, 255, 0))
            
            # unpack values from the bounding box
            (x, y, w, h) = [  int(i) for i in bbox  ]
            
            # draw the bounding box
            frame        = cv2.rectangle(frame, (x, y), (x + w, y + h), color, 5)
        
        # display the resulting image
        cv2.imshow(window_name, frame)
        
        # terminate the program if Esc is pressed or one pass through the frame range has finished
        if ((cv2.waitKey(1) == 27) or ((VideoReader.num_frames - 1) < VideoReader.frame_counter)):
            break
            
    cv2.destroyWindow(window_name)
            
    VideoReader.release()
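The fastest-character rule inside `TrackerList.update` can be checked on its own, without OpenCV. The sketch below is a restatement under assumptions, not the cell's exact code: it assumes each character's centroids have already been collected into a list (the hypothetical `centroid_tracks` argument), uses `statistics.median` in place of `numpy.median`, and gives a track with no displacement history a speed of 0 (calling `numpy.median` on an empty list emits a RuntimeWarning and returns NaN).

```python
from statistics import median
from typing import List, Optional, Tuple

Point = Tuple[int, int]


def manhattan(p: Point, q: Point) -> int:
    # |dx| + |dy|, the same measure as TrackerList.compute_distance_ss
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def fastest_index(centroid_tracks: List[List[Point]]) -> Optional[int]:
    """Index of the track whose median per-frame displacement is largest.

    centroid_tracks[i] holds the centroids of character i, one per frame.
    A track with fewer than two centroids has no displacement yet and is
    given a speed of 0 instead of NaN.
    """
    if not centroid_tracks:
        return None
    speeds = []
    for track in centroid_tracks:
        # displacement between each pair of consecutive centroids
        steps = [manhattan(a, b) for a, b in zip(track, track[1:])]
        speeds.append(median(steps) if steps else 0.0)
    # max() keeps the first maximum on ties, like numpy.argmax
    return max(range(len(speeds)), key=speeds.__getitem__)
```

For example, a track with steps `[2, 2, 96]` (one spurious jump, such as a tracker briefly snapping to a neighbor) has a median speed of 2, so it loses to a steady track with steps `[5, 5, 5]`; with the mean it would have looked fastest.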


In [4]:
# problem 4
# frames 1650 - 1800 ( lvl6 )
# draw optical flow with blue arrows, detect the two characters facing the opposite direction from everyone else, and draw red rectangles around them

from typing import Iterator, Union, Tuple, List
from imutils.object_detection import non_max_suppression
import mediapipe, numpy, cv2

class VideoReader:
    
    video_stream  = None
    start_frame   = None
    num_frames    = None
    frame_ratio   = None
    frame_counter = None
    
    @classmethod
    def initialize(class_, video_file_name : str) -> "VideoReader":
        class_.video_stream = cv2.VideoCapture(video_file_name)
        if not (class_.video_stream.isOpened()):
            raise IOError("Cannot open video file \"{}\"\n".format(video_file_name))
        return class_
    
    @classmethod
    def configure(class_, start_frame : int, num_frames : int, frame_ratio : float = 1) -> "VideoReader":
        (class_.start_frame, class_.num_frames, class_.frame_ratio, class_.frame_counter) = (
            start_frame, num_frames, frame_ratio, num_frames + 1
        )
        return class_
    
    @classmethod
    def read(class_) -> Iterator[ numpy.ndarray ]:
        ratio = class_.frame_ratio
        while True:
            if (class_.num_frames < class_.frame_counter):
                class_.frame_counter = 0
                class_.video_stream.set(cv2.CAP_PROP_POS_FRAMES, class_.start_frame)
            latest_frame = class_.video_stream.read()[1]
            yield ((latest_frame) if (ratio == 1) else (cv2.resize(latest_frame, None, None, fx = ratio, fy = ratio)))
            class_.frame_counter += 1
            
    @classmethod
    def set_frame(class_, frame_number : int) -> "VideoReader":
        
        class_.video_stream.set(cv2.CAP_PROP_POS_FRAMES, frame_number)
        
        return class_
    
    @classmethod
    def reset_counter(class_):
        class_.frame_counter = class_.num_frames + 1
            
    @classmethod
    def release(class_):
        
        class_.video_stream.release()
        class_.video_stream  = None
        class_.start_frame   = None
        class_.num_frames    = None
        class_.frame_ratio   = None
        class_.frame_counter = None

class BasicFaceDetector:
    
    face_detector = None
    detector_file_name = "./haarcascade_frontalface_alt2.xml"
    
    @classmethod
    def initialize(class_) -> "BasicFaceDetector":
        class_.face_detector = cv2.CascadeClassifier(class_.detector_file_name)
        if (class_.face_detector.empty()):
            raise IOError("Cannot find XML file \"{}\"\n".format(class_.detector_file_name))
        return class_
    
    @classmethod
    def detect(class_, source_image : numpy.ndarray, ** kwargs) -> Union[ numpy.ndarray, tuple ]:
        return class_.face_detector.detectMultiScale(source_image, ** kwargs)

class FaceDetector:
    
    face_detector = None
    
    @classmethod
    def initialize(class_, confidence_threshold : float, model_selection : int) -> "FaceDetector":
        class_.face_detector = mediapipe.solutions.face_detection.FaceDetection(
            model_selection = model_selection, min_detection_confidence = confidence_threshold
        )
        return class_
    
    @classmethod
    def detect(class_, source_image : numpy.ndarray) -> List[ tuple ]:
        def get_ratio_box(detection : mediapipe.framework.formats.detection_pb2.Detection) -> Tuple[ float ]:
            relative_boundary_box = detection.location_data.relative_bounding_box
            return (
                relative_boundary_box.xmin,
                relative_boundary_box.ymin,
                relative_boundary_box.width,
                relative_boundary_box.height
            )
        def rescale_boundary_box(boundary_box : Tuple[ float ], width : int, height : int) -> Tuple[ int ]:
            return (
                int(boundary_box[0] * width),
                int(boundary_box[1] * height),
                int(boundary_box[2] * width),
                int(boundary_box[3] * height)
            )
        (height, width) = source_image.shape[:2]
        detection_result = class_.face_detector.process(cv2.cvtColor(source_image, cv2.COLOR_BGR2RGB))
        if (detection_result.detections is None):
            return None
        boundary_boxes = [
            rescale_boundary_box(get_ratio_box(detection), width, height)
                for detection in detection_result.detections
        ]
        return boundary_boxes
    
    @staticmethod
    def convert_to_endpoint_boxes(boundary_boxes : List[ tuple ]) -> List[ tuple ]:
        return [
            (x, y, x + w, y + h) for (x, y, w, h) in boundary_boxes
        ]
    
    @staticmethod
    def convert_to_distance_boxes(boundary_boxes : List[ tuple ]) -> List[ tuple ]:
        return [
            (x, y, X - x, Y - y) for (x, y, X, Y) in boundary_boxes
        ]
    
    @classmethod
    def suppress_boundary_boxes(class_, boundary_boxes : List[ tuple ], threshold : float) -> List[ tuple ]:
        boundary_boxes = numpy.array(class_.convert_to_endpoint_boxes(boundary_boxes))
        return class_.convert_to_distance_boxes(
            non_max_suppression(boundary_boxes, probs = None, overlapThresh = threshold))
        
class ImageUtils:
    
    @staticmethod
    def draw_boundary_boxes(display_frame : numpy.ndarray, 
            boundary_boxes : Union[ type(None), List[ tuple ]], color = (0, 0, 255), width = 5) -> numpy.ndarray:
        
        if (boundary_boxes is None):
            return display_frame
        
        new_frame = numpy.copy(display_frame)
        
        for (x, y, w, h) in boundary_boxes:
            new_frame = cv2.rectangle(new_frame, (x, y), (x + w, y + h), color, width)
            
        return new_frame
    
    @staticmethod
    def enlarge_boundary_boxes(
            boundary_boxes : List[ tuple ], ratio : float, max_height : int, max_width : int) -> List[ tuple ]:
        
        def enlarge(boundary_box : tuple) -> tuple:
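            (x, y, w, h) = boundary_box
            (new_w, new_h) = (int(w * ratio), int(h * ratio))
            
            # move the top-left corner by half the size change so the box stays centered, clipped to the image
            (new_x, new_y) = (max(0, x - (new_w - w) // 2), max(0, y - (new_h - h) // 2))
            
            # clip the width and height so the box does not extend past the right or bottom edge
            return (
                new_x,
                new_y,
                min(new_w, max_width  - new_x),
                min(new_h, max_height - new_y)
            )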
        return list(map(enlarge, boundary_boxes))
        
# LEFT-labeled direction is colored "blue"
LEFT  = (255, 0, 0)

# RIGHT-labeled direction is colored "red"
RIGHT = (0, 0, 255)

class OpticalDetector:
    
    previous_frame = None
    
    optical_frame  = None
    
    @classmethod
    def initialize(class_) -> "OpticalDetector":
        
        class_.previous_frame = None
        class_.optical_frame  = None
        
        return class_
    
    @classmethod
    def find_optical_frame(class_, latest_frame : numpy.ndarray) -> numpy.ndarray:
        
        # convert the latest frame to grayscale
        latest_frame = cv2.cvtColor(latest_frame, cv2.COLOR_BGR2GRAY)
        
        # only try to compute the optical frame if previous frame exists
        if (class_.previous_frame is not None):
            
            # obtain the optical frame 
            class_.optical_frame = cv2.calcOpticalFlowFarneback(
                class_.previous_frame, latest_frame, 
                class_.optical_frame, 0.5, 5, 13, 10, 5, 1.1, 
                cv2.OPTFLOW_FARNEBACK_GAUSSIAN
            )
            
        # update the previous frame
        class_.previous_frame = numpy.copy(latest_frame)
        
        return class_.optical_frame
    
# obtain the direction that the minority of faces (the odd characters) are facing, and how many faces face it
def get_min_direction(faces_mask : numpy.ndarray, boundary_boxes : List[ tuple ]) -> Tuple[ tuple, int ]:
    
    directions = []
    
    for (x, y, w, h) in boundary_boxes:
        
        # crop the image given a bounding box
        image  = faces_mask[ y : y + h, x : x + w ]
        
        # count the values in the crop that match the LEFT color
        # ("==" compares each channel separately, so this counts matching channel values rather than whole pixels)
        first  = numpy.where((image ==  LEFT))[0].__len__()
        
        # count the values in the crop that match the RIGHT color (same per-channel comparison)
        second = numpy.where((image == RIGHT))[0].__len__()
        
        # record the dominant direction of this face
        directions.append(((LEFT) if (first > second) else (RIGHT)))
        
    # calculate the number of pedestrians facing the first direction
    first  = sum(  (direction ==  LEFT) for direction in directions  )
    
    # calculate the number of pedestrians facing the second direction
    second = sum(  (direction == RIGHT) for direction in directions  )
    
    # return the minority direction and the number of faces facing it
    return ((LEFT, first) if (first < second) else (RIGHT, second))
        
if (__name__ == "__main__"):
    
    video_file_name = "./wiiplay.mp4"
    
    start_frame     = 1650
    
    num_frames      = 150
    
    model_selection = 1
    
    threshold       = 0.32
    
    frame_ratio     = 0.5
    
    enlarge_ratio   = 1.3
    
    pixel_thresh    = 3000
    
    strides         = 10
    
    window_name     = "find_two_odds"
    
    # load the video and initialize the frame range to loop
    VideoReader.initialize(video_file_name).configure(start_frame, num_frames, frame_ratio)
    
    # initialize the Haar cascade face detector ("haarcascade_frontalface_alt2.xml" must be in the working directory)
    # the XML file can be downloaded from the official OpenCV GitHub repository
    BasicFaceDetector.initialize()
    
    # initialize the optical flow detector 
    OpticalDetector.initialize()
    
    cv2.namedWindow(window_name)
    
    # endlessly read frames from the video until Esc is pressed
    for frame in VideoReader.read():

        display_frame   = numpy.copy(frame)
        
        # obtain the frame of optical flow
        optical_frame   = OpticalDetector.find_optical_frame(frame)
        
        # initialize an empty mask
        direction_frame = numpy.zeros(shape = display_frame.shape, dtype = numpy.uint8)
        
        # detect faces from the current frame using the haarcascade face detector
        faces           = BasicFaceDetector.detect(frame)
        
        # perform non maximum suppression on the bounding boxes if any faces were detected
        # (detectMultiScale returns an empty tuple, not None, when nothing is found)
        faces           = ((FaceDetector.suppress_boundary_boxes(faces, threshold)) if (len(faces) > 0) else ([]))
        
        # enlarge the bounding boxes by a certain ratio if there are bounding boxes of faces
        if (len(faces) > 0):
            
            faces = ImageUtils.enlarge_boundary_boxes(faces, enlarge_ratio, * display_frame.shape[ : 2])
        
        # obtain a binary mask of faces by drawing solid (white) rectangles on an empty (black) mask
        faces_mask      = ImageUtils.draw_boundary_boxes(
            numpy.zeros(shape = display_frame.shape, dtype = numpy.uint8), faces, color = (255, 255, 255), width = -1)

        if (optical_frame is not None):
            
            for index in numpy.ndindex(optical_frame[::strides, ::strides].shape[:2]):
                
                # obtain the first point
                pt1   = tuple(i * strides for i in index)
                
                # obtain the displacement
                delta = numpy.int32(optical_frame[pt1])[::-1]
                
                # obtain the second point
                pt2   = tuple(pt1 + 2 * delta)
                
                # only consider cases where "delta" is within a certain range
                if (1 <= cv2.norm(delta) <= 10):

                    # color the arrow LEFT if its end point lies to the left of its start point (y breaks ties), otherwise RIGHT
                    direction = ((LEFT) if (pt1[::-1] > pt2[::-1]) else (RIGHT))
                    
                    # draw different colored arrows on the direction mask 
                    cv2.arrowedLine(direction_frame, pt1[::-1], pt2[::-1],   direction, 1, cv2.LINE_AA, 0, 0.01)
                    
                    # draw blue arrows on the output image to show the optical flow 
                    cv2.arrowedLine(display_frame,   pt1[::-1], pt2[::-1], (255, 0, 0), 1, cv2.LINE_AA, 0, 0.01)
        
            # only consider the optical flow of face regions
            faces_mask      = numpy.where((faces_mask == (255, 255, 255)), direction_frame, faces_mask)
            
            # since there are only two odd characters, find the direction that the fewest faces are facing
            direction, num  = get_min_direction(faces_mask, faces)
            
            # only proceed if at least two faces facing the minority direction are detected
            if (num > 1):
            
                for (x, y, w, h) in faces:

                    # calculate the number of arrows of the first color (first direction)
                    first  = numpy.where((faces_mask[ y : y + h, x : x + w ] ==  LEFT))[0].__len__()

                    # calculate the number of arrows of the second color (second direction)
                    second = numpy.where((faces_mask[ y : y + h, x : x + w ] == RIGHT))[0].__len__()

                    # determine the direction in which this character is facing
                    direct = ((RIGHT) if ((first > pixel_thresh) and (second > pixel_thresh) and (second > first)) else (LEFT))

                    # draw a red rectangle on this face if both counts exceed pixel_thresh and it faces the minority direction
                    if ((first > pixel_thresh) and (second > pixel_thresh) and (direct == direction)):

                        display_frame = cv2.rectangle(display_frame, (x, y), (x + w, y + h), (0, 0, 255), 5)
        
        # display the resulting image
        cv2.imshow(window_name, display_frame)
        
        # terminate the program if Esc is pressed
        if (cv2.waitKey(1) == 27):
            break
            
    cv2.destroyWindow(window_name)
            
    VideoReader.release()
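The arrow sampling and odd-direction vote in the problem 4 cell can be restated without OpenCV. This is a minimal sketch under stated assumptions, not the cell's exact method: the flow field is a plain nested list where `flow[row][col] == (dx, dy)` (the layout that `cv2.calcOpticalFlowFarneback` returns as an array), and, as a simplification, each face is labeled by counting arrow start points inside its box rather than colored pixels in a mask. The names `sample_arrows`, `label_face` and `find_odd_faces` are hypothetical. Counting arrows directly also sidesteps a subtlety in the cell: `faces_mask[...] == LEFT` compares each color channel separately, so `numpy.where(...)[0]` counts matching channel values rather than pixels (a black pixel matches two of the three channels of `LEFT`); a per-pixel count would be `numpy.all(region == LEFT, axis = -1).sum()`.

```python
import math
from typing import List, Sequence, Tuple

LEFT, RIGHT = "LEFT", "RIGHT"
Arrow = Tuple[Tuple[int, int], Tuple[int, int], str]  # (start, end, label)
Box = Tuple[int, int, int, int]                       # (x, y, w, h)


def sample_arrows(flow: Sequence[Sequence[Tuple[float, float]]], stride: int = 10,
                  min_len: float = 1.0, max_len: float = 10.0, scale: int = 2) -> List[Arrow]:
    """Sample a dense flow field on a grid and label each kept arrow LEFT or RIGHT."""
    arrows = []
    for row in range(0, len(flow), stride):
        for col in range(0, len(flow[row]), stride):
            # truncate toward zero, as numpy.int32 does in the cell
            dx, dy = (int(v) for v in flow[row][col])
            if not (min_len <= math.hypot(dx, dy) <= max_len):
                continue
            start = (col, row)
            end = (col + scale * dx, row + scale * dy)
            # lexicographic tuple comparison: x first, y only breaks ties
            arrows.append((start, end, LEFT if start > end else RIGHT))
    return arrows


def label_face(arrows: List[Arrow], box: Box) -> str:
    """Dominant direction among arrows that start inside the box (ties go to RIGHT)."""
    x, y, w, h = box
    inside = [label for (sx, sy), _, label in arrows if x <= sx < x + w and y <= sy < y + h]
    return LEFT if inside.count(LEFT) > inside.count(RIGHT) else RIGHT


def find_odd_faces(arrows: List[Arrow], boxes: List[Box]) -> List[Box]:
    """Boxes whose dominant direction is the one shared by the fewest faces."""
    labels = [label_face(arrows, box) for box in boxes]
    minority = LEFT if labels.count(LEFT) < labels.count(RIGHT) else RIGHT
    return [box for box, label in zip(boxes, labels) if label == minority]
```

As in the cell, arrows are scaled by 2 and kept only when their truncated length lies between 1 and 10 pixels; the cell additionally requires at least two faces in the minority group before drawing anything.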

In [None]:
# problem 5
# continuous frames from webcam
# detect 21 finger joints, and determine whether the hand posture is rock, paper or scissors

from typing import Iterator, Union, Tuple, Dict, List
import mediapipe, numpy, math, cv2

class HandDetector:
    
    hand_drawer    = None
    hand_detector  = None
    hand_processor = None
    
    @classmethod
    def initialize(class_) -> "HandDetector":
        class_.hand_drawer   = mediapipe.solutions.drawing_utils
        class_.hand_detector = mediapipe.solutions.hands
        return class_
    
    @classmethod
    def configure(class_, min_detection_confidence : float = 0.7, min_tracking_confidence : float = 0.7) -> "HandDetector":
        class_.hand_processor = class_.hand_detector.Hands(
            min_detection_confidence = min_detection_confidence, 
            min_tracking_confidence  = min_tracking_confidence
        )
        return class_
    
    @classmethod
    def detect(class_, frame : numpy.ndarray) -> "mediapipe.python.solution_base.SolutionOutputs":
        
        # convert the image from BGR (OpenCV's order) to RGB (expected by mediapipe) and detect hands
        return class_.hand_processor.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    
    @classmethod
    def draw_landmarks(class_, source_image : numpy.ndarray, 
            hand_landmarks : "mediapipe.python.solution_base.SolutionOutputs") -> numpy.ndarray:
    
        display_frame = numpy.copy(source_image)
        
        # draw the hand connections and mark each joint with a blue circle
        class_.hand_drawer.draw_landmarks(
            display_frame, hand_landmarks, class_.hand_detector.HAND_CONNECTIONS,
            class_.hand_drawer.DrawingSpec(color = (255, 0, 0), thickness = 2, circle_radius = 5))
        
        return display_frame
    
    @staticmethod
    def finger_up(M, U, Z) -> bool:
        
        # if the upper joint (M) is closer to the wrist (Z) than the lower joint (U), the finger is bent
        # otherwise the finger is extended, i.e. pointing up
        return (math.dist(M, Z) >= math.dist(U, Z))

    @classmethod
    def is_rock(class_, points) -> bool:
        
        # all fingers excluding the thumb are pointing down
        return not any([
            class_.finger_up(points[ 7], points[ 6], points[0]),
            class_.finger_up(points[11], points[10], points[0]),
            class_.finger_up(points[15], points[14], points[0]),
            class_.finger_up(points[19], points[18], points[0])
        ])

    @classmethod
    def is_paper(class_, points) -> bool:
        
        # all fingers excluding the thumb are pointing up
        return all([
            class_.finger_up(points[ 7], points[ 6], points[0]),
            class_.finger_up(points[11], points[10], points[0]),
            class_.finger_up(points[15], points[14], points[0]),
            class_.finger_up(points[19], points[18], points[0])
        ])

    @classmethod
    def is_scissor(class_, points) -> bool:
        
        # the index and middle fingers are pointing up while the ring and pinky fingers are pointing down
        return all([
            class_.finger_up(points[ 7], points[ 6], points[0]),
            class_.finger_up(points[11], points[10], points[0])
        ]) and not any([
            class_.finger_up(points[15], points[14], points[0]),
            class_.finger_up(points[19], points[18], points[0])
        ])

    @classmethod
    def get_hand_posture(class_, points) -> int:
        if   (class_.is_paper(  points)):
            return 1
        elif (class_.is_rock(   points)):
            return 2
        elif (class_.is_scissor(points)):
            return 3
        else:
            # if an undetermined combination is detected, "unknown" will be the result
            return 0

    @staticmethod
    def posture_to_text(posture) -> str:
        
        # obtain the text to display on the output frame
        postures = [ "Unknown", "Paper", "Rock", "Scissor" ]
        
        return postures[posture]

class CamReader:
    
    cam_stream = None
    
    @classmethod
    def initialize(class_, camera : int = 0) -> "CamReader":
        class_.cam_stream = cv2.VideoCapture(camera)
        if not (class_.cam_stream.isOpened()):
            raise IOError("Cannot open video camera {}\n".format(camera))
        return class_
    
    @classmethod
    def read(class_) -> Iterator[ numpy.ndarray ]:
        while True:
            yield class_.cam_stream.read()[1]
            
    @classmethod
    def find_shape(class_) -> Dict[ str, int ]:
        return {
            "HEIGHT" : class_.cam_stream.get(cv2.CAP_PROP_FRAME_HEIGHT),
            "WIDTH"  : class_.cam_stream.get(cv2.CAP_PROP_FRAME_WIDTH)
        }
            
    @classmethod
    def release(class_) -> None:
        class_.cam_stream.release()
        class_.cam_stream = None
    
if (__name__ == "__main__"):
    
    camera_number            = 0
    
    min_detection_confidence = 0.7
    
    min_tracking_confidence  = 0.7
    
    window_name              = "Rock Paper Scissors"

    # initialize and open the webcam labeled 0
    CamReader.initialize(camera_number)
        
    # obtain the height and width of the camera
    (height, width) = CamReader.find_shape().values()
    
    # initialize the mediapipe hand detector and configure the minimal confidence values
    HandDetector.initialize().configure(min_detection_confidence, min_tracking_confidence)
    
    # endlessly read and horizontally flip frames from the webcam until Esc is pressed
    for frame in (  cv2.flip(__frame, 1) for __frame in CamReader.read()  ):
        
        # detect hands using the mediapipe hand detector
        detected_hands = HandDetector.detect(frame)
        
        # only consider cases where hands are detected
        if (detected_hands.multi_hand_landmarks):
            
            # iterate through each detected hand
            for hand_landmarks in detected_hands.multi_hand_landmarks:
                
                # draw the 21 finger joints (points) onto the output frame
                frame = HandDetector.draw_landmarks(frame, hand_landmarks)
        
                # initialize an empty list to store the (x,y,z) coordinates of each joint
                points = []
            
                # iterate through the 21 joints
                for point in HandDetector.hand_detector.HandLandmark:
                    
                    # obtain the normalized (x, y, z) coordinates of this joint
                    normalized_landmark = hand_landmarks.landmark[point]
                    
                    # convert to pixel (x, y) coordinates (not used below; the posture checks use the normalized coordinates)
                    pixel_coordinates_landmark = HandDetector.hand_drawer._normalized_to_pixel_coordinates(
                        normalized_landmark.x, normalized_landmark.y, width, height)
                    
                    # add to list
                    points.append((normalized_landmark.x, normalized_landmark.y, normalized_landmark.z))
                    
            # classify the posture of the last detected hand and write it in the upper-left corner of the output frame
            frame = cv2.putText(frame, HandDetector.posture_to_text(HandDetector.get_hand_posture(points)), 
                (30, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 255), 1, cv2.LINE_AA)
    
        # display the output frame 
        cv2.imshow(window_name, frame)
        
        # terminate the program if Esc is pressed
        if (cv2.waitKey(1) == 27):
            break
                        
    cv2.destroyWindow(window_name)
    
    CamReader.release()
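The posture rules in `HandDetector` depend only on landmark geometry, so they can be exercised without a webcam. The sketch below restates them under these assumptions: `points` is a list of 21 `(x, y, z)` tuples in MediaPipe Hands landmark order (0 = wrist, 6/7 = index PIP/DIP, 10/11 = middle, 14/15 = ring, 18/19 = pinky), and the names `FINGER_JOINTS`, `finger_extended` and `classify` are hypothetical. Like the cell, it ignores the thumb and checks Paper, then Rock, then Scissor.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float, float]

WRIST = 0
# (PIP, DIP) landmark indices of the four non-thumb fingers in MediaPipe Hands order
FINGER_JOINTS: Dict[str, Tuple[int, int]] = {
    "index": (6, 7), "middle": (10, 11), "ring": (14, 15), "pinky": (18, 19),
}


def finger_extended(points: Sequence[Point], pip: int, dip: int) -> bool:
    # a bent finger folds its DIP joint back toward the wrist, so the finger
    # counts as extended if the DIP is at least as far from the wrist as the PIP
    return math.dist(points[dip], points[WRIST]) >= math.dist(points[pip], points[WRIST])


def classify(points: Sequence[Point]) -> str:
    """Rock/Paper/Scissor label from 21 landmarks, using the same rules and order as HandDetector."""
    up = {name: finger_extended(points, pip, dip) for name, (pip, dip) in FINGER_JOINTS.items()}
    if all(up.values()):
        return "Paper"
    if not any(up.values()):
        return "Rock"
    if up["index"] and up["middle"] and not (up["ring"] or up["pinky"]):
        return "Scissor"
    return "Unknown"
```

Because MediaPipe normalizes x by the image width and y by the image height, distances computed on normalized coordinates are slightly anisotropic; the comparison still works here because both distances in each check are measured in the same space.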

<b>Problem 6</b>

I believe I have completed all five problems. The final exam was more challenging than the mid-term exam. I spent four days on the second problem because detecting pedestrians on every frame with the HOG detector was impractical. To tackle this, I reviewed the mid-term exam and the previous practices, and then fell back on background subtraction, thresholding and morphology to improve the performance. Nevertheless, I have learned a lot from this course. 

<b>Problem 7</b>

The overall execution of this class is excellent beyond words. To further improve the course, I suggest compressing the materials for each practice into a separate zip file to make downloading easier. There were a handful of times when the professor had already started explaining the code but Google Drive could not load all of the files because of a poor Internet connection. 