In [6]:
# Copyright 2022 Grid Dynamics International, Inc. All Rights Reserved

# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at

#     http://www.apache.org/licenses/LICENSE-2.0

# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

In [None]:
import numpy as np
import cv2
import os
import matplotlib.pyplot as plt
import matplotlib.patches as patches

import math

import json

from tqdm import tqdm
import time

import boto3

Here we load the Python libraries used in this demo prototype on AWS. NumPy is the standard Python library for array and matrix manipulation. cv2 (OpenCV) is a computer vision library used for a wide range of image processing tasks. os gives access to directories and file paths. Matplotlib is a standard charting library.

math provides basic mathematical functions. json parses and serializes JSON. tqdm is a lightweight progress bar. time lets us pause between polling calls and measure how long function calls take (time-based profiling).

Boto3 is the AWS SDK for Python, which lets us call AWS services directly from this Jupyter notebook instance.

# Main functions

## Video I/O functions

Here we define helper functions that make the video usable for analysis with AWS Lookout for Vision. split_video takes a path to the video, an output directory for the extracted image frames, and a sample rate: every sample_rate-th frame of the video is written to the output directory as a JPEG image.

split_video only supports the codecs that OpenCV supports.

In [27]:
def split_video(input_video_path, output_dir, sample_rate=240):
    '''
    input_video_path: Path to Input Video
    output_dir: Path to Directory for Storing Image Frames
    sample_rate: Save every sample_rate-th frame (240 on a 60 fps video = 1 frame every 4 s)
    returns a list of (image name, image path) pairs for the saved frames
    '''
    video_capture = cv2.VideoCapture(input_video_path)
    result = []
    
    frame_id = 0
    while True:
        success, image = video_capture.read()
        if not success:
            # end of the video (or a frame that could not be decoded)
            break
        if (frame_id % sample_rate) == 0:
            image_name = f"frame_{frame_id}"
            save_path = os.path.join(output_dir, f"{image_name}.jpg")
            cv2.imwrite(save_path, image)
            result.append((image_name, save_path))
        frame_id += 1
    
    # free the underlying video file handle
    video_capture.release()
    return result
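
To make the effect of sample_rate concrete, the sketch below lists the frame indices that split_video would keep and the time gap between them. The helper names sampled_frame_ids and seconds_between_samples are illustrative and not part of the notebook, and the 60 fps figure is the assumption used in the docstring above.

```python
def sampled_frame_ids(total_frames, sample_rate):
    """Frame indices kept when every sample_rate-th frame is saved (hypothetical helper)."""
    return list(range(0, total_frames, sample_rate))


def seconds_between_samples(sample_rate, fps):
    """Time gap between two saved frames, in seconds (hypothetical helper)."""
    return sample_rate / fps


print(sampled_frame_ids(1000, 240))      # [0, 240, 480, 720, 960]
print(seconds_between_samples(240, 60))  # 4.0
```

With sample_rate=1, as used later in the demo pipeline, every frame is kept.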

For a complete demo, once the whole process is finished (extracting images from the video, identifying objects of interest, labeling anomalies, tracking objects and detecting anomalies), we provide a function that converts the individual frames back into a video using one of the standard codecs available in OpenCV.

save_video_from_frames takes a list (called frames) of (image name, image path) tuples that identify the frames, and a save_path for the resulting video.

save_video_from_frames only supports the codecs that OpenCV supports.

In [28]:
def save_video_from_frames(frames, save_path, fps=60):
    '''
    frames: list of (image name, image path) pairs
    save_path: Path of the video file to write
    fps: Frame rate of the output video
    '''
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')

    out = None

    for frame_name, frame_path in tqdm(frames):
        frame = cv2.imread(frame_path)
        if out is None:
            # create the writer lazily, once the frame size is known
            out = cv2.VideoWriter(save_path, fourcc, fps, (frame.shape[1], frame.shape[0]))
        out.write(frame.astype('uint8'))

    # out stays None when the frames list is empty
    if out is not None:
        out.release()

## Object detection functions

Here we prepare the data for annotation and prediction using AWS Lookout for Vision. Since we are working from a video, the objects in question may not be suitably organised for an expert to review. In addition, objects may be out of focus, or there may be multiple objects per frame. We therefore demonstrate a non-learning based approach to detecting objects of interest in individual frames of a video, and show how to refine this approach further. We then cut the objects of interest out of the relevant frames and store them for further processing and for training an anomaly detection model.

In [29]:
SHADOWS_LBC = (65, 55, 30)
SHADOWS_UBC = (95, 85, 40)
SHADOWS_MORPH_KER_SZ = 5
SHADOWS_DIL_ITER = 4
EDGES_THRESH_1 = 50
EDGES_THRESH_2 = 85
EDGES_MORPH_KER_SZ = 7
EDGES_DIL_ITER = 1
BACKGROUND_THRESH = 80
OBJECTS_MORPH_ERODE_KER_SZ = 5
OBJECTS_MORPH_DILATE_KER_SZ = 3
AREA_LOWER_BOUND = 1800
BOX_INCR_PERCENT = 0.1
AREA_INTERSECTION_THRESH = 0.75

These parameters tune the bounding-box based object detection method below.

SHADOWS_LBC, SHADOWS_UBC - lower and upper color bounds (in the image's channel order) for selecting the shadows mask

SHADOWS_MORPH_KER_SZ - kernel size for the morphological dilation of the shadows mask
(dilation slightly grows the mask and joins "broken", disjoint parts of shadows)

SHADOWS_DIL_ITER - number of dilation iterations applied to the shadows mask

EDGES_THRESH_1 - first threshold for the hysteresis procedure of the Canny edge detector

EDGES_THRESH_2 - second threshold for the hysteresis procedure of the Canny edge detector

EDGES_MORPH_KER_SZ - kernel size for the morphological dilation of the edges mask

EDGES_DIL_ITER - number of dilation iterations applied to the edges mask

BACKGROUND_THRESH - grayscale threshold; pixels brighter than this value are treated as background

OBJECTS_MORPH_ERODE_KER_SZ, OBJECTS_MORPH_DILATE_KER_SZ - kernel sizes for the morphological opening transformation (erosion followed by dilation, used to remove noise and to detach individual objects that are too close to each other)

AREA_LOWER_BOUND - minimum contour area (in pixels) for a detection to be kept as an object

BOX_INCR_PERCENT - fraction of the box width and height added on each side of a detected box (0.1 = 10%)

AREA_INTERSECTION_THRESH - if the intersection of two boxes covers at least this fraction of the smaller box, the boxes are considered to belong to the same object

In [30]:
def find_objects_bboxes(image, area_lower_bound):    
    '''
    image: Pass an image onto this function
    area_lower_bound: exclude detected objects which are smaller than area lower bound (measured in pixels)
    '''
    edges = cv2.Canny(image, EDGES_THRESH_1, EDGES_THRESH_2)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (EDGES_MORPH_KER_SZ, EDGES_MORPH_KER_SZ))
    edges = cv2.dilate(edges, kernel, iterations=EDGES_DIL_ITER)
    
    shadows = cv2.inRange(image, np.asarray(SHADOWS_LBC), np.asarray(SHADOWS_UBC))
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (SHADOWS_MORPH_KER_SZ, SHADOWS_MORPH_KER_SZ))
    shadows = cv2.dilate(shadows, kernel, iterations=SHADOWS_DIL_ITER)
    
    imgray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    _, background = cv2.threshold(imgray, BACKGROUND_THRESH, 255, 0)
    
    objects = cv2.bitwise_not(cv2.bitwise_or(background, cv2.bitwise_or(shadows, edges)))
    
    kernel_e = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,
                                         (OBJECTS_MORPH_ERODE_KER_SZ, OBJECTS_MORPH_ERODE_KER_SZ))
    kernel_d = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,
                                         (OBJECTS_MORPH_DILATE_KER_SZ, OBJECTS_MORPH_DILATE_KER_SZ))

    objects = cv2.morphologyEx(objects, cv2.MORPH_ERODE, kernel_e)
    objects = cv2.morphologyEx(objects, cv2.MORPH_DILATE, kernel_d)
    
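    contours, hierarchy = cv2.findContours(objects, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    if hierarchy is None:
        # no contours were found in this frame
        return []
    
    # keep only the outermost contours (those without a parent)
    top_level_mask = (hierarchy[0, :, 3] == -1)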
    
    area = np.array([cv2.contourArea(ctr) for ctr in contours])
    area_mask = (area > area_lower_bound)

    mask = top_level_mask & area_mask
    objects_contours = np.array(contours, dtype=object)[mask]
    
    boxes = [cv2.boundingRect(contour) for contour in objects_contours]
    
    increased_boxes = []
    for x, y, w, h in boxes:
        w_add = w * BOX_INCR_PERCENT
        h_add = h * BOX_INCR_PERCENT
        x_inc = max(0, x - w_add)
        y_inc = max(0, y - h_add)
        w_inc = w + 2 * w_add
        h_inc = h + 2 * h_add
        increased_boxes.append((x_inc, y_inc, w_inc, h_inc))
    
    return increased_boxes
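
The box enlargement at the end of find_objects_bboxes can be looked at in isolation. The sketch below is a variant, not the notebook's code: expand_box is a hypothetical helper that grows a box by a fraction of its size on every side and, unlike the loop above, also clips the right and bottom edges to an assumed frame size.

```python
def expand_box(box, frac, frame_w, frame_h):
    """Grow an (x, y, w, h) box by frac of its size on every side, clipped to the frame.

    Hypothetical helper for illustration; frame_w and frame_h are assumed inputs.
    """
    x, y, w, h = box
    x1 = max(0, x - w * frac)
    y1 = max(0, y - h * frac)
    x2 = min(frame_w, x + w + w * frac)
    y2 = min(frame_h, y + h + h * frac)
    return x1, y1, x2 - x1, y2 - y1


print(expand_box((100, 50, 200, 100), 0.1, 1920, 1080))  # (80.0, 40.0, 240.0, 120.0)
print(expand_box((0, 0, 200, 100), 0.1, 1920, 1080))     # (0, 0, 220.0, 110.0)
```

Working with corner coordinates keeps the clipping symmetric: a box that starts at the frame border is only extended on the sides where there is room.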

## Lookout for vision related functions

In [31]:
IMAGE_WIDTH = 384
IMAGE_HEIGHT = 384
PAD_COLOR = (0, 0, 0)

The previous step gives us bounding box coordinates, so the next step is to crop the objects of interest out of the image frames. AWS Lookout for Vision expects all images to be of the same size, so each crop is then resized (and padded) to a fixed size of IMAGE_WIDTH x IMAGE_HEIGHT before it is sent to AWS Lookout for Vision for labeling.

In [32]:
# based on the code from AWS blogpost here:
# https://aws.amazon.com/blogs/machine-learning/computer-vision-based-anomaly-detection-using-amazon-lookout-for-vision-and-aws-panorama/
def crop_n_resize_image(img, bbox, size, padColor=0):
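    '''
    img: Image to be modified
    bbox: bounding box in integer (x, y, w, h) format
    size: Output size of the resized image as (height, width)
    padColor: Padding color, either a single value or one value per channel
        in the image's channel order (BGR for images read with OpenCV)
    '''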
    # crop images ==============================
    crop = img[bbox[1]:bbox[1] + bbox[3], bbox[0]:bbox[0] + bbox[2]].copy()
    
    # cropped image size
    h, w = crop.shape[:2]
    # designed crop image sizes
    sh, sw = size

    # interpolation method
    if h > sh or w > sw: # shrinking image
        interp = cv2.INTER_AREA
    else: # stretching image
        interp = cv2.INTER_CUBIC

    # aspect ratio of image
    aspect = w/h 

    # compute scaling and pad sizing
    if aspect > 1: # horizontal image
        new_w = sw
        new_h = np.round(new_w/aspect).astype(int)
        pad_vert = (sh-new_h)/2
        pad_top, pad_bot = np.floor(pad_vert).astype(int), np.ceil(pad_vert).astype(int)
        pad_left, pad_right = 0, 0
    elif aspect < 1: # vertical image
        new_h = sh
        new_w = np.round(new_h*aspect).astype(int)
        pad_horz = (sw-new_w)/2
        pad_left, pad_right = np.floor(pad_horz).astype(int), np.ceil(pad_horz).astype(int)
        pad_top, pad_bot = 0, 0
    else: # square image
        new_h, new_w = sh, sw
        pad_left, pad_right, pad_top, pad_bot = 0, 0, 0, 0

    # set pad color
    if len(img.shape) == 3 and not isinstance(padColor, (list, tuple, np.ndarray)): # color image but only one color provided
        padColor = [padColor] * 3

    # scale and pad
    scaled_img = cv2.resize(crop, (new_w, new_h), interpolation=interp)
    scaled_img = cv2.copyMakeBorder(scaled_img, pad_top, pad_bot, pad_left, pad_right, borderType=cv2.BORDER_CONSTANT, value=padColor)

    return scaled_img
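
The scaling and padding arithmetic can be checked without any image data. The following is a minimal sketch under the assumption that the crop should be scaled by its limiting side so that it always fits the target; letterbox_dims is a hypothetical helper, and for the square 384 x 384 target used here it is intended to give the same sizes as the aspect-ratio branches above.

```python
def letterbox_dims(w, h, target_w, target_h):
    """Return (new_w, new_h, pad_top, pad_bot, pad_left, pad_right) for an aspect-preserving resize.

    Hypothetical helper: scales by the limiting side so the result always fits the target.
    """
    scale = min(target_w / w, target_h / h)
    new_w = round(w * scale)
    new_h = round(h * scale)
    pad_vert = target_h - new_h
    pad_horz = target_w - new_w
    pad_top, pad_left = pad_vert // 2, pad_horz // 2
    return new_w, new_h, pad_top, pad_vert - pad_top, pad_left, pad_horz - pad_left


print(letterbox_dims(200, 100, 384, 384))  # (384, 192, 96, 96, 0, 0)
print(letterbox_dims(100, 300, 384, 384))  # (128, 384, 0, 0, 128, 128)
```

A wide crop is padded at the top and bottom, a tall crop on the left and right, and in both cases the padded result has exactly the target size.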

We now write a function that calls Lookout for Vision to detect anomalies in an image. It encodes the image as JPEG bytes and passes them to the detect_anomalies method of the AWS Lookout for Vision client, together with the project name and the version of the trained model. The response contains two fields that we are interested in:
1. IsAnomalous: whether the model classifies the image as anomalous
2. Confidence: the model's confidence in that classification

In [33]:
def get_prediction(lookout_client, image, project_name, model_version):
    '''
    lookout_client: Reference to AWS Lookout for Vision Client
    image: Image to be sent to AWS Lookout for Vision
    project_name: Reference to project name
    model_version: Refer to which Anomaly Detection model on AWS Lookout for Vision is to be used
    '''
    image_bytes = cv2.imencode('.jpg', image)[1].tobytes()
    response=lookout_client.detect_anomalies(ProjectName=project_name, ContentType='image/jpeg', Body=image_bytes,
                                             ModelVersion=model_version)
    return response['DetectAnomalyResult']['IsAnomalous'], response['DetectAnomalyResult']['Confidence']
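
For testing the rest of the pipeline without a hosted model, the response handling can be exercised against a stand-in client. This is only a sketch: FakeLookoutClient and parse_prediction are hypothetical names, and the returned values are made up. Only the two response fields read by get_prediction above are modelled.

```python
class FakeLookoutClient:
    """Stand-in for the boto3 client, for offline testing only (hypothetical)."""

    def detect_anomalies(self, ProjectName, ModelVersion, Body, ContentType):
        # Made-up values; a real response comes from the hosted model.
        return {"DetectAnomalyResult": {"IsAnomalous": True, "Confidence": 0.9}}


def parse_prediction(response):
    """Pull the two fields of interest out of a detect_anomalies response."""
    result = response["DetectAnomalyResult"]
    return result["IsAnomalous"], result["Confidence"]


response = FakeLookoutClient().detect_anomalies(
    ProjectName="demo", ModelVersion="1", Body=b"...", ContentType="image/jpeg"
)
print(parse_prediction(response))  # (True, 0.9)
```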

## Multi-object tracking

In [34]:
DETECTION_RATE = 15

In [35]:
DST_THRESHOLD = 100

DETECTION_RATE - object detection runs once every DETECTION_RATE frames (mainly to detect new objects that enter the frame)

DST_THRESHOLD - distance threshold (in pixels) between box centers below which two boxes are considered to belong to the same object (so that objects detected previously are not counted twice)

In [36]:
def get_center(box):
    '''
    box: a bounding box in X, Y, W, H format
    returns a center of a bounding box
    '''
    return box[0] + box[2] / 2, box[1] + box[3] / 2
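
The distance threshold is easiest to understand on concrete boxes. The sketch below re-implements the center computation together with a centroid-distance test of the kind the tracker uses; same_object is an illustrative name, and the boxes are made-up values.

```python
import math


def center(box):
    """Center of an (x, y, w, h) box."""
    return box[0] + box[2] / 2, box[1] + box[3] / 2


def same_object(box1, box2, dst_thresh):
    """True when the box centers are closer than dst_thresh pixels."""
    (cx1, cy1), (cx2, cy2) = center(box1), center(box2)
    return math.hypot(cx1 - cx2, cy1 - cy2) < dst_thresh


tracked = (100, 100, 50, 50)  # center (125, 125)
shifted = (130, 140, 50, 50)  # center (155, 165), 50 px away
far = (400, 100, 50, 50)      # center (425, 125), 300 px away

print(same_object(tracked, shifted, 100))  # True
print(same_object(tracked, far, 100))      # False
```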

In [37]:
class MultiTracker:
    '''
    tracks bounding boxes for detected objects through the video
    new objects should be initialized using the add_boxes function (takes object detection results as an input)
    '''
    def __init__(self, inactive_thresh=2, upd_area_thresh=2.0, dst_thresh=100,
                 tracker_func=cv2.legacy.TrackerKCF_create):
        '''
        inactive_thresh: inactive objects threshold - if the tracker fails to find an object in {inactive_thresh}
          consecutive updates, stop tracking it
        upd_area_thresh: minimum ratio of the new box area to the existing box area required to replace
          the box when a larger one is detected for the same object
        dst_thresh: distance threshold to consider two boxes belong to the same object
        tracker_func: OpenCV tracker creation function for individual objects
        '''
        self.trackers = []
        self.boxes = []
        self.inactive_time = []
        self.inactive_thresh = inactive_thresh
        self.upd_area_thresh = upd_area_thresh
        self.dst_thresh = dst_thresh
        self.tracker_func = tracker_func
        
        
    def get_objects(self):
        '''
        returns a list of bounding boxes for all objects on the frame
        '''
        result = []
        for obj_id, box in enumerate(self.boxes):
            if self.inactive_time[obj_id] == 0:
                result.append((obj_id, box))
        return result

    def _is_same_obj(self, box1, box2):
        '''
        checks if 2 boxes represent the same object (based on centroid distance)
        box1, box2: bounding boxes in X, Y, W, H format
        '''
        cx1, cy1 = get_center(box1)
        cx2, cy2 = get_center(box2)
        dist = math.hypot(cx1 - cx2, cy1 - cy2)
        return dist < self.dst_thresh
    
    
    def _create_tracker(self, box, frame):
        '''
        initializes a new tracker for a newly found object
        box: a bounding box in X, Y, W, H format
        frame: the entire frame image 
        '''
        tracker = self.tracker_func()
        tracker.init(frame, box)
        return tracker
        
        
    def add_boxes(self, boxes, frame):
        '''
        adds all new detected boxes to the tracker (creates trackers for them);
        ignores boxes for already tracked objects
        boxes: a list of boxes in X, Y, W, H format
        frame: the entire frame image 
        '''
        for box in boxes:
            found = False
            for obj_id, existing_box in enumerate(self.boxes):
                if self.inactive_time[obj_id] != 0:
                    continue
                if self._is_same_obj(box, existing_box):
                    if box[2] * box[3] >= existing_box[2] * existing_box[3] * self.upd_area_thresh:
                        # the new box is significantly bigger, replace the old one with the new one
                        self.trackers[obj_id] = self._create_tracker(box, frame)
                        self.inactive_time[obj_id] = 0
                        self.boxes[obj_id] = box
                    found = True
                    break
            if not found:
                # this box is a new one -> create a new tracker for this object
                self.trackers.append(self._create_tracker(box, frame))
                self.inactive_time.append(0)
                self.boxes.append(box)
                    

    def update(self, frame):
        '''
        updates existing object positions based on a new frame
        frame: a next frame image
        '''
        for obj_id, tracker in enumerate(self.trackers):
            if self.inactive_time[obj_id] < self.inactive_thresh:
                success, bbox = tracker.update(frame)
                if success:
                    self.inactive_time[obj_id] = 0
                    self.boxes[obj_id] = bbox
                else:
                    self.inactive_time[obj_id] += 1

## Functions to make a demo video

In [38]:
def add_info(img, info):
    '''
    adds text info to the bottom of the image
    img: an image to add info
    info: list of key, value pairs to print on the image
    '''
    H, W = img.shape[:2]
    
    cols = 2
    rows = (len(info) + cols - 1) // cols
    scale = 1.4
    color = (0, 0, 0)
    thickness = 2
    
    rh = 50 + rows * 70
    rw = W
    text_box = np.ones((rh, rw, 3), dtype=np.uint8) * 255
    
    for (i, (k, v)) in enumerate(info):
        c = i // rows
        r = i % rows
        x = 50 + ((rw // cols - 100) * c)
        y = 70 + r * 70
        text = "{}: {}".format(k, v)
        cv2.putText(text_box, text, (x, y), cv2.FONT_HERSHEY_SIMPLEX, scale, color, thickness)
        
    return cv2.vconcat([img, text_box])
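
The layout logic of add_info fills the text grid column by column. A small sketch of the same arithmetic, without any drawing, shows where each entry lands; info_grid_positions is a hypothetical helper and the 1920 px width is an assumed example value.

```python
def info_grid_positions(n_items, width, cols=2, row_height=70, margin=50):
    """(x, y) text origin for each item, filled column by column (hypothetical helper)."""
    rows = (n_items + cols - 1) // cols  # ceiling division
    positions = []
    for i in range(n_items):
        c, r = divmod(i, rows)
        x = margin + (width // cols - 2 * margin) * c
        y = row_height + r * row_height
        positions.append((x, y))
    return positions


print(info_grid_positions(6, 1920))
# [(50, 70), (50, 140), (50, 210), (910, 70), (910, 140), (910, 210)]
```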

In [39]:
def make_frames_with_tracking_boxes(frames, tracking_ad_results, output_result_dir):
    '''
    adds bounding boxes and text info to the video frames, saves it to a new folder
    frames: a list of frames with their names (each frame is represented as a pair: a name and a path to the image)
    tracking_ad_results: dict with all tracking and anomaly detection results; keys are frame names and values are
        lists of all bounding boxes, each bounding box is a tuple (idx, box, is_anomaly, confidence)
            idx - object id, persisted between frames
            box - bounding box in X, Y, W, H format
            is_anomaly - boolean flag for anomaly classification
            confidence - AWS Lookout for Vision confidence of resulting verdict
    output_result_dir: a directory to save resulting video frames
    '''
    result_frames = []
    objects_total = set()
    anomalies_total = set()
    for frame_name, frame_path in tqdm(frames):
        if frame_name not in tracking_ad_results:
            break
        objects_frame = set()
        anomalies_frame = set()
        
        frame = cv2.imread(frame_path)
        boxes = tracking_ad_results[frame_name]

        for idx, box, is_anomaly, confidence in boxes:
            x, y, w, h = box
            x, y, w, h = int(x), int(y), int(w), int(h)
            if confidence is None:
                color = (255, 0, 0)
                box_name = str(idx)
            else:
                anomaly_prob = confidence if is_anomaly else 1 - confidence
                color = (0, 255 * (1 - anomaly_prob), 255 * anomaly_prob)
                box_name = '{} {} {:.1f}'.format(idx, "A" if is_anomaly else "N", confidence)
            cv2.putText(frame, box_name, (x, y - 15), cv2.FONT_HERSHEY_PLAIN, 2, color, 2)
            cv2.rectangle(frame, (x, y), (x + w, y + h), color, 3)
            objects_total.add(idx)
            objects_frame.add(idx)
            if is_anomaly:
                anomalies_total.add(idx)
                anomalies_frame.add(idx)
        
        # guard the percentages against frames in which no object has been seen yet
        info = [
            ("#objects total", len(objects_total)),
            ("#anomalies total", len(anomalies_total)),
            ("anomalies percentage total",
             '{:.2f}'.format(len(anomalies_total) / max(len(objects_total), 1) * 100)),
            ("#objects in the frame", len(objects_frame)),
            ("#anomalies in the frame", len(anomalies_frame)),
            ("anomalies percentage in the frame",
             '{:.2f}'.format(len(anomalies_frame) / max(len(objects_frame), 1) * 100)),
        ]
        frame = add_info(frame, info)

        save_path = os.path.join(output_result_dir, f'{frame_name}.jpg')
        cv2.imwrite(save_path, frame)

        result_frames.append((frame_name, save_path))
    return result_frames
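
The box color encodes the classification result. The sketch below isolates that mapping; box_color is a hypothetical helper that follows the same rule as the function above but rounds the channels to integers, which the original does not do.

```python
def box_color(is_anomaly, confidence):
    """BGR color for a box: blue if unclassified, otherwise a green-to-red blend."""
    if confidence is None:
        return (255, 0, 0)  # blue in OpenCV's BGR order
    # probability that the object is anomalous, whatever the verdict was
    anomaly_prob = confidence if is_anomaly else 1 - confidence
    return (0, int(round(255 * (1 - anomaly_prob))), int(round(255 * anomaly_prob)))


print(box_color(False, None))  # (255, 0, 0)
print(box_color(True, 1.0))    # (0, 0, 255)
print(box_color(False, 1.0))   # (0, 255, 0)
```

A confident "normal" verdict is drawn in green, a confident "anomaly" verdict in red, and uncertain verdicts fall in between.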

# Demo pipeline

## Create and train an AWS Lookout for Vision model

In [8]:
bucket = "lookoutvision-us-east-1-017e8d732b"
project = "anomaly_detection_demo"
dataset_folder = 'datasets/cookies_bboxes_01_2/'

In [3]:
client = boto3.client('lookoutvision', region_name='us-east-1')


This part is based on https://github.com/aws-samples/amazon-lookout-for-vision/blob/main/Amazon%20Lookout%20for%20Vision%20Lab.ipynb

Note: Training a model can take a few hours, because a deep learning model is trained in the background. Once the model is trained, you can continue with this notebook.

### Creating Project

In [4]:
print('Creating project:' + project)
response=client.create_project(ProjectName=project)
print('project ARN: ' + response['ProjectMetadata']['ProjectArn'])
print('Done!')

Creating project:anomaly_detection_demo
project ARN: arn:aws:lookoutvision:us-east-1:313307525435:project/anomaly_detection_demo
Done!


### Creating training dataset

In [9]:
#Creating training dataset
dataset_type ='train'
manifest_file = os.path.join(dataset_folder, 'annotations_train.manifest')

print('Creating dataset...')
dataset=json.loads('{ "GroundTruthManifest": { "S3Object": { "Bucket": "' + bucket + '", "Key": "'+ manifest_file + '" } } }')

response=client.create_dataset(ProjectName=project, DatasetType=dataset_type, DatasetSource=dataset)
print('Dataset Status: ' + response['DatasetMetadata']['Status'])
print('Dataset Status Message: ' + response['DatasetMetadata']['StatusMessage'])
print('Dataset Type: ' + response['DatasetMetadata']['DatasetType'])
print('Done!')

Creating dataset...
Dataset Status: CREATE_IN_PROGRESS
Dataset Status Message: The dataset is creating.
Dataset Type: train
Done!


### Creating test dataset

In [11]:
dataset_type ='test'
manifest_file = os.path.join(dataset_folder, 'annotations_test.manifest')

print('Creating dataset...')
dataset=json.loads('{ "GroundTruthManifest": { "S3Object": { "Bucket": "' + bucket + '", "Key": "'+ manifest_file + '" } } }')

response=client.create_dataset(ProjectName=project, DatasetType=dataset_type, DatasetSource=dataset)
print('Dataset Status: ' + response['DatasetMetadata']['Status'])
print('Dataset Status Message: ' + response['DatasetMetadata']['StatusMessage'])
print('Dataset Type: ' + response['DatasetMetadata']['DatasetType'])
print('Done!')

Creating dataset...
Dataset Status: CREATE_IN_PROGRESS
Dataset Status Message: The dataset is creating.
Dataset Type: test
Done!
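
The DatasetSource argument in the two cells above is assembled by concatenating JSON strings. A plain dict is an equivalent and less error-prone way to build it. This is a sketch: make_dataset_source is a hypothetical helper, and posixpath is used on the assumption that S3 keys should contain forward slashes regardless of the operating system.

```python
import json
import posixpath


def make_dataset_source(bucket, dataset_folder, manifest_name):
    """Build the DatasetSource argument as a plain dict (hypothetical helper)."""
    # S3 keys use forward slashes, so posixpath is safer than os.path here.
    key = posixpath.join(dataset_folder, manifest_name)
    return {"GroundTruthManifest": {"S3Object": {"Bucket": bucket, "Key": key}}}


source = make_dataset_source(
    "lookoutvision-us-east-1-017e8d732b",
    "datasets/cookies_bboxes_01_2/",
    "annotations_train.manifest",
)
print(json.dumps(source, indent=2))
```

The resulting dict can be passed as DatasetSource to create_dataset in the same way as the parsed JSON string.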


### Creating/training Model

In [12]:
output_bucket = bucket
output_folder = os.path.join('projects', project, 'models/')

    
print('Creating model...')
output_config = {"S3Location": {"Bucket": output_bucket, "Prefix": output_folder}}

response=client.create_model(ProjectName=project, OutputConfig=output_config)
print('ARN: ' + response['ModelMetadata']['ModelArn'])
print('Version: ' + response['ModelMetadata']['ModelVersion'])
print('Status: ' + response['ModelMetadata']['Status'])
print('Message: ' + response['ModelMetadata']['StatusMessage'])
print('Done!')

Creating model...
ARN: arn:aws:lookoutvision:us-east-1:313307525435:model/anomaly_detection_demo/1
Version: 1
Status: TRAINING
Message: The model is being trained.
Done!


In [13]:
model_version = response['ModelMetadata']['ModelVersion']

In [15]:
while client.describe_model(ProjectName=project, ModelVersion=model_version)['ModelDescription']['Status']!='TRAINED':
    print('.',end='')
    time.sleep(5)
print('Done!')

........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
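
The polling loop above never ends if training fails, because it only checks for the TRAINED status. A more defensive variant is sketched below. wait_for_status is a hypothetical helper; the failure status names passed to it (for example TRAINING_FAILED) are assumptions that should be checked against the Lookout for Vision API reference.

```python
import time


def wait_for_status(get_status, target, failed=(), poll_seconds=5, timeout_seconds=4 * 3600):
    """Poll get_status() until it returns target; raise on a failed state or timeout."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status == target:
            return status
        if status in failed:
            raise RuntimeError(f"Model entered state {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Still {status} after {timeout_seconds} s")
        time.sleep(poll_seconds)


# Offline demonstration with a scripted sequence of statuses instead of a real client.
statuses = iter(["TRAINING", "TRAINING", "TRAINED"])
print(wait_for_status(lambda: next(statuses), "TRAINED",
                      failed=("TRAINING_FAILED",), poll_seconds=0))  # TRAINED
```

In the notebook, get_status would wrap the describe_model call used in the loop above, and the same helper could wait for HOSTED after start_model.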

## Define I/O locations

In [16]:
s3_input_video_path = 's3://lookoutvision-us-east-1-017e8d732b/input_video/cookies-01.mp4'

In [17]:
workdir = '../data'

In [18]:
input_video_path = os.path.join(workdir, 'cookies-01.mp4')

In [19]:
output_directory = os.path.join(workdir, 'output')
os.makedirs(output_directory, exist_ok=True)

In [20]:
output_frames_dir = os.path.join(output_directory, 'video_frames')
os.makedirs(output_frames_dir, exist_ok=True)

In [21]:
output_result_dir = os.path.join(output_directory, 'tracking_frames')
os.makedirs(output_result_dir, exist_ok=True)

In [22]:
result_video_path = os.path.join(output_directory, 'pipeline_demo.mp4')

Copy the input video from S3 to a local folder:

In [25]:
!aws s3 cp {s3_input_video_path} {input_video_path}

download: s3://lookoutvision-us-east-1-017e8d732b/input_video/cookies-01.mp4 to ../data/cookies-01.mp4


## Split input video into frames

In [40]:
frames = split_video(input_video_path, output_frames_dir, sample_rate=1)

## Check the trained AWS Lookout for Vision model

In [41]:
client.describe_model(ProjectName=project,ModelVersion=model_version)['ModelDescription']

{'ModelVersion': '1',
 'ModelArn': 'arn:aws:lookoutvision:us-east-1:313307525435:model/anomaly_detection_demo/1',
 'CreationTimestamp': datetime.datetime(2022, 9, 30, 16, 12, 51, 376000, tzinfo=tzlocal()),
 'Status': 'TRAINED',
 'StatusMessage': 'Training completed successfully.',
 'Performance': {'F1Score': 0.856249988079071,
  'Recall': 0.7611111402511597,
  'Precision': 0.9785714149475098},
 'OutputConfig': {'S3Location': {'Bucket': 'lookoutvision-us-east-1-017e8d732b',
   'Prefix': 'projects/anomaly_detection_demo/models/'}},
 'EvaluationManifest': {'Bucket': 'lookoutvision-us-east-1-017e8d732b',
  'Key': 'projects/anomaly_detection_demo/models/EvaluationManifest-anomaly_detection_demo-1.json'},
 'EvaluationResult': {'Bucket': 'lookoutvision-us-east-1-017e8d732b',
  'Key': 'projects/anomaly_detection_demo/models/EvaluationResult-anomaly_detection_demo-1.json'},
 'EvaluationEndTimestamp': datetime.datetime(2022, 9, 30, 17, 46, 11, 103000, tzinfo=tzlocal())}

## Start Lookout for vision model

In [27]:
response=client.start_model(ProjectName=project, ModelVersion=model_version, MinInferenceUnits=1)
print('Status: ' + response['Status'])


Status: STARTING_HOSTING


In [28]:
while client.describe_model(ProjectName=project,
                            ModelVersion=model_version)['ModelDescription']['Status'] != 'HOSTED':
    print('.',end='')
    time.sleep(5)
print('Done!')

...................................Done!


## Main step: do object tracking and anomaly detection

In [29]:
def _get_coordinates(rect):
    '''
    returns (x1, y1, x2, y2): the corners of a rectangle with the smallest and the largest coordinates
    rect: rectangle in X, Y, W, H format
    '''
    x1, y1, w, h = rect
    x2 = x1 + w
    y2 = y1 + h
    x1, x2 = min(x1, x2), max(x1, x2)
    y1, y2 = min(y1, y2), max(y1, y2)
    return x1, y1, x2, y2


def intersect_rectangles(rect1, rect2):
    '''
    finds intersection of 2 rectangles as a rectangle or None, if they don't intersect
    rect1, rect2: rectangles in X, Y, W, H format
    '''
    l1, d1, r1, u1 = _get_coordinates(rect1)
    l2, d2, r2, u2 = _get_coordinates(rect2)
    l = max(l1, l2)
    r = min(r1, r2)
    d = max(d1, d2)
    u = min(u1, u2)
    if l < r and d < u:
        return l, d, r - l, u - d
    else:
        return None
    
def get_area(rect):
    '''
    finds area of a rectangle
    rect: a rectangle in X, Y, W, H format
    '''
    x, y, w, h = rect
    return w * h
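
These helpers are used later to drop duplicate boxes: two boxes count as the same object when their intersection covers most of the smaller one. The sketch below shows that rule on made-up rectangles. intersection_area and is_duplicate are hypothetical names, positive widths and heights are assumed, and 0.75 mirrors AREA_INTERSECTION_THRESH.

```python
def intersection_area(rect1, rect2):
    """Overlap area of two (x, y, w, h) rectangles; 0 if they do not overlap."""
    x1, y1, w1, h1 = rect1
    x2, y2, w2, h2 = rect2
    overlap_w = min(x1 + w1, x2 + w2) - max(x1, x2)
    overlap_h = min(y1 + h1, y2 + h2) - max(y1, y2)
    return max(0, overlap_w) * max(0, overlap_h)


def is_duplicate(rect1, rect2, thresh=0.75):
    """True when the overlap covers at least thresh of the smaller rectangle."""
    smaller = min(rect1[2] * rect1[3], rect2[2] * rect2[3])
    return intersection_area(rect1, rect2) >= smaller * thresh


big = (0, 0, 100, 100)
inner = (10, 10, 50, 50)       # fully inside big
neighbour = (90, 0, 100, 100)  # shares only a 10 px wide strip

print(is_duplicate(big, inner))      # True
print(is_duplicate(big, neighbour))  # False
```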

Objects that touch the left or right edge of the frame are considered partially visible (in our scenario the "conveyor belt" moves from left to right).

In [30]:
EDGE_WIDTH_PERCENT = 0.05

EDGE_WIDTH_PERCENT - width of the left and right edge areas, as a fraction of the total image width (0.05 = 5%)

In [31]:
def not_touches_edge(box, frame_shape):
    '''
    checks if a bounding box doesn't touch an image edge (left or right)
    box: a bounding box in X, Y, W, H format
    frame_shape: image size in H, W format
    '''
    x_min = frame_shape[1] * EDGE_WIDTH_PERCENT
    x_max = frame_shape[1] - x_min
    y_min = frame_shape[0] * EDGE_WIDTH_PERCENT
    y_max = frame_shape[0] - y_min
    x0, y0, w, h = box
    x1 = x0 + w
    y1 = y0 + h
    return x_min <= x0 < x_max and x_min <= x1 < x_max
#     return x_min <= x0 < x_max and y_min <= y0 < y_max and x_min <= x1 < x_max and y_min <= y1 < y_max
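
A few concrete boxes show how the edge bands behave. The sketch below repeats the horizontal check with the frame width passed in directly; fully_inside_horizontally is a hypothetical name and the 1920 px frame width is an assumed example value.

```python
def fully_inside_horizontally(box, frame_width, edge_fraction=0.05):
    """True when both vertical sides of an (x, y, w, h) box lie outside the edge bands."""
    x_min = frame_width * edge_fraction
    x_max = frame_width - x_min
    x0, _, w, _ = box
    return x_min <= x0 < x_max and x_min <= x0 + w < x_max


# For an assumed 1920 px wide frame the bands are x < 96 and x >= 1824.
print(fully_inside_horizontally((500, 200, 300, 300), 1920))   # True
print(fully_inside_horizontally((20, 200, 300, 300), 1920))    # False, still entering
print(fully_inside_horizontally((1700, 200, 300, 300), 1920))  # False, leaving
```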

The main pipeline steps:
* iterate over frames, detect new objects and track existing ones
    * iterate over objects and drop overlapping boxes (these occur rarely, as a result of false positives in the tracking algorithm)
    * iterate over objects and assign anomaly classification labels (call AWS Lookout for Vision for new objects only;
        do it once per object id, then cache the resulting label for the rest of the video)
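
The caching step in the list above is what keeps the number of Lookout for Vision requests low. The idea can be sketched independently of tracking and AWS: classify_once and fake_classify are hypothetical names, and the classifier here is a stand-in that simply flags odd ids.

```python
def classify_once(object_ids_per_frame, classify):
    """Call classify(obj_id) the first time each id is seen and reuse the result afterwards."""
    cache = {}
    results = []
    for object_ids in object_ids_per_frame:
        frame_result = []
        for obj_id in object_ids:
            if obj_id not in cache:
                cache[obj_id] = classify(obj_id)
            frame_result.append((obj_id, cache[obj_id]))
        results.append(frame_result)
    return results


calls = []


def fake_classify(obj_id):
    """Stand-in for the Lookout for Vision request; flags odd ids as anomalous."""
    calls.append(obj_id)
    return obj_id % 2 == 1


frames_ids = [[0], [0, 1], [0, 1, 2], [1, 2]]
print(classify_once(frames_ids, fake_classify)[-1])  # [(1, True), (2, False)]
print(calls)                                         # [0, 1, 2]
```

Although the three objects appear in seven (frame, object) pairs, the classifier is called only three times.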

In [32]:
downscale_ratio = 0.4

tracker = MultiTracker(dst_thresh=DST_THRESHOLD * downscale_ratio, tracker_func=cv2.legacy.TrackerKCF_create,
                        upd_area_thresh=1.15)
tracking_ad_results = {}

ad_predictions = {}

for frame_id, (frame_name, frame_path) in enumerate(tqdm(frames)):
    frame = cv2.imread(frame_path)
    downscale_size = (int(frame.shape[1] * downscale_ratio), int(frame.shape[0] * downscale_ratio))
    downscaled_frame = cv2.resize(frame, downscale_size)
    
    tracker.update(downscaled_frame)
    
    if frame_id % DETECTION_RATE == 0:
        # do object detection to add new objects
        boxes = find_objects_bboxes(frame, AREA_LOWER_BOUND)
        downscaled_boxes = np.array(boxes) * downscale_ratio
        
        tracker.add_boxes(downscaled_boxes, downscaled_frame)
    
    boxes = [(idx, (np.array(box) / downscale_ratio)) for idx, box in tracker.get_objects()]
    
    # filter overlapping boxes
    overlapped_boxes = set()
    for idx1, box1 in boxes:
        for idx2, box2 in boxes:
            if idx1 != idx2:
                intersection = intersect_rectangles(box1, box2)
                if intersection is not None and \
                        get_area(intersection) >= min(get_area(box1), get_area(box2)) * AREA_INTERSECTION_THRESH:
                    if get_area(box1) < get_area(box2):
                        overlapped_boxes.add(idx1)
                    else:
                        overlapped_boxes.add(idx2)
    boxes = [(idx, box) for idx, box in boxes if idx not in overlapped_boxes]
    
    # add anomaly detection results
    result = []
    for idx, bbox in boxes:
        if idx in ad_predictions:
            is_anomaly, confidence = ad_predictions[idx]
        else:
            if not_touches_edge(bbox, frame.shape):
                bbox = tuple(map(int, bbox))
                # crop_n_resize_image unpacks size as (height, width)
                image_bbox = crop_n_resize_image(frame, bbox, (IMAGE_HEIGHT, IMAGE_WIDTH), PAD_COLOR)
                is_anomaly, confidence = get_prediction(client, image_bbox, project, model_version)
                ad_predictions[idx] = is_anomaly, confidence
            else:
                # object is partially visible, do not use anomaly classification for this frame
                # (will be done later, when objects fully enter the frame)
                is_anomaly, confidence = False, None
        result.append((idx, bbox, is_anomaly, confidence))
        
    tracking_ad_results[frame_name] = result
        

100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1820/1820 [11:30<00:00,  2.63it/s]


## Stop AWS Lookout for vision model (to save costs)

In [33]:
response = client.stop_model(ProjectName=project, ModelVersion=model_version)
print('Status: ' + response['Status'])

Status: STOPPING_HOSTING


## Save resulting frames and make a demo video

In [34]:
result_frames = make_frames_with_tracking_boxes(frames, tracking_ad_results, output_result_dir)

100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1820/1820 [03:39<00:00,  8.29it/s]


In [35]:
save_video_from_frames(result_frames, result_video_path)

100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1820/1820 [02:11<00:00, 13.80it/s]
