# Business problem

My company, <b>ZeroEyes</b>, provides a service that uses artificial intelligence
to identify, prevent & mitigate active shooter situations using pre-existing surveillance
cameras.

To make every camera frame valuable, I decided to create a tool that helps identify when
rolling shutter distortion occurs on the cameras we monitor, so that a crucial
frame is not missed when it matters most. For example, if a shooter is caught running down a hallway, swiftly rotating their body,
or riding in a fast-moving vehicle, eliminating rolling shutter distortion is paramount.
Potential use cases for the tool I have laid the framework for include using it as the first step in a process that attempts to correct
this distortion, or simply using it to identify cameras that often produce images with
rolling shutter distortion.

# Imports

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from pycocotools.coco import COCO
import cv2
import os
import glob
import h5py
import sklearn
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, accuracy_score

import functions as fn

import tensorflow as tf
from tensorflow import keras
from keras import backend as K
from keras.callbacks import ModelCheckpoint, EarlyStopping, CSVLogger

seed=42



Note: I highly recommend reviewing the `functions.py` file, which contains essential functions used to preprocess data & apply the artificial rolling shutter effect.

# Data synthesis


Unfortunately, there is no readily available data on rolling shutter distortion.
Consequently, I had to synthesize my own by applying a custom rolling shutter effect.

Using the COCO 2017 dataset, I was able to create a new dataset using a built-from-scratch
rolling shutter effect. Using people as the target for my model, I extracted the polygonal
segmentation annotations provided by COCO and applied the rolling shutter effect to the area
within the segmentation.
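
As a rough illustration of the idea (this is not the implementation in `functions.py`), a rolling shutter skew can be approximated by shifting each image row sideways by an amount that grows with the row index, and then pasting the skewed cutout back over the original image. The names `shear_rows` and `composite_sheared` and the `max_shift` parameter are hypothetical, and the sketch assumes a boolean mask with the same height and width as the image.

```python
import numpy as np

def shear_rows(img, max_shift, direction="right"):
    """Shift each row horizontally; row 0 stays put, the last row moves by max_shift pixels."""
    h, w = img.shape[:2]
    out = np.zeros_like(img)
    sign = 1 if direction == "right" else -1
    for y in range(h):
        shift = sign * int(round(max_shift * y / max(h - 1, 1)))
        shift = max(-w, min(w, shift))  # never shift further than the image width
        if shift > 0:
            out[y, shift:] = img[y, :w - shift]
        elif shift < 0:
            out[y, :w + shift] = img[y, -shift:]
        else:
            out[y] = img[y]
    return out

def composite_sheared(img, mask, max_shift, direction="right"):
    """Skew only the masked region and paste it over the untouched image.

    Unlike the notebook's pipeline, this sketch does not fill the pixels
    the cutout originally occupied; they keep their original values.
    """
    cutout = img * mask[..., None]                      # zero everything outside the mask
    sheared_cutout = shear_rows(cutout, max_shift, direction)
    sheared_mask = shear_rows(mask, max_shift, direction)
    return np.where(sheared_mask[..., None], sheared_cutout, img)

# Tiny demo: a bright vertical bar in the first column leans to the right.
demo = np.full((3, 3, 3), 10, dtype=np.uint8)
demo[:, 0] = 200
bar_mask = np.zeros((3, 3), dtype=bool)
bar_mask[:, 0] = True
skewed = composite_sheared(demo, bar_mask, max_shift=2)
```

A linear shift per row is only one possible model of the effect; the real distortion depends on sensor readout speed and object motion.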

One of the advantages of using COCO is that it is a very large dataset, containing over
330,000 images in total, 200,000 of which are annotated. For the purposes of my project,
since not every image included a person in it, I had about 70,000 images to work with. Another advantage is that it is a well-known and oft-used dataset, and is widely regarded as a
benchmark for machine learning algorithms.

However, there are some limitations to this dataset. The annotation boundaries are often
imprecise. For example, the polygonal segmentation annotations provided typically do not
capture all the pixels of any given object, which introduces additional distortion of its 
own in the data synthesis process. Further, some annotations are outright inaccurate.

The following code automatically transforms the images in the `./data/` directory and adds the newly created data into the `./created_data/` directory.

In [2]:
dataDir='data'
trainDir='train2017'
valDir='val2017'
seed=42
transformed=.5
train=.8
test = 1 - train
strategy_list = ['left', 'right']
destDataDir='created_data'
destTrainDir='train'
destTestDir='test'
destValDir='val'
destNormDir='NORMAL'
destRSDir='ROLLING_SHUTTER'

np.random.seed(seed)

trainAnn=f'{dataDir}/annotations/instances_{trainDir}.json'
valAnn=f'{dataDir}/annotations/instances_{valDir}.json'

cocoTrain=COCO(trainAnn)
cocoVal=COCO(valAnn)

catIdsTrain = cocoTrain.getCatIds(catNms=['person'])
imgIdsTrain = cocoTrain.getImgIds(catIds=catIdsTrain)
imgIdsTrain = cocoTrain.getImgIds(imgIds=imgIdsTrain)
annIdsTrain = cocoTrain.getAnnIds(imgIds=imgIdsTrain, catIds=catIdsTrain, iscrowd=None)

catIdsVal = cocoVal.getCatIds(catNms=['person'])
imgIdsVal = cocoVal.getImgIds(catIds=catIdsVal)
imgIdsVal = cocoVal.getImgIds(imgIds=imgIdsVal)
annIdsVal = cocoVal.getAnnIds(imgIds=imgIdsVal, catIds=catIdsVal, iscrowd=None)

loading annotations into memory...
Done (t=10.56s)
creating index...
index created!
loading annotations into memory...
Done (t=0.37s)
creating index...
index created!


Note: depending on your computer, the code below can take anywhere from 30 minutes to a few hours to synthesize the data.

In [None]:
np.random.seed(seed)

prog = 0
iters = 0
limit = 5000 # number of images to process for both RS and normal; results in around double the amount (limit of 200 = ~400 total images)

ids_ignore = []

# clears destination folders (os.path.join keeps the paths portable across operating systems)
destDir = os.path.join(os.getcwd(), destDataDir)
for traintestval in [destTrainDir, destTestDir, destValDir]:
    for normrs in [destNormDir, destRSDir]:
        files = glob.glob(os.path.join(destDir, traintestval, normrs, '*'))
        for f in files:
            os.remove(f)

### ROLLING SHUTTER

ids_list = []
for i in np.arange(limit):
    id = np.random.choice(imgIdsTrain)
    if int(id) not in ids_ignore:
        ids_list.append(id)

for id in ids_list: # rolling shutter
    print(f'Loading image id: {id}')

    try:
        imgDict = cocoTrain.loadImgs([id])[0]
        annDict = cocoTrain.loadAnns(cocoTrain.getAnnIds(imgIds=imgDict['id'], catIds=catIdsTrain, iscrowd=None))

        fpath = '{}/{}/{}'.format(dataDir, trainDir, imgDict['file_name'])
        img = cv2.imread(fpath)
        fn.remove_true_blacks(img) # removes true blacks from photo (so data synthesis process is not confused by pre-existing true blacks in images)

        ann = annDict[np.random.randint(0, len(annDict))]
        strategy = strategy_list[np.random.randint(0, len(strategy_list))] # picks between left or right distortion

        masked, maskedInv = fn.generate_masked_images(img, ann) # cuts out annotated section of image, and also returns inverse "cutout" (remainder of image after being cut out)

        masked_rs = fn.apply_rolling_shutter(masked, ann, intensity=.7, strategy=strategy) # applies rolling shutter effect to masked "cutout"

        maskedInv_filled = fn.fill_inv_masked(maskedInv, ann, strategy=strategy) # fills missing pixel values in inverse "cutout"

        final = fn.recombine_masked_imgs(masked_rs, maskedInv_filled) # places post-rolling-shutter-effect "cutout" on top of inverse "cutout"

        if(prog < train): # handles whether or not to save to train or test directory
            fpath = '{}/{}/{}/{}'.format(destDataDir, destTrainDir, destRSDir, f'rs-{iters}.png')
            cv2.imwrite(fpath, final)
        else:
            fpath = '{}/{}/{}/{}'.format(destDataDir, destTestDir, destRSDir, f'rs-{iters}.png')
            cv2.imwrite(fpath, final)
    except Exception as e: # skips images that fail to process, without swallowing KeyboardInterrupt
        print(f'Something went wrong ({e}), continuing to next image')

    prog += 1 / len(ids_list)
    iters += 1
    print(f'Finished! {50*prog:.2f}% done')

### NORMAL

prog = 0
iters = 0
ids_list = []
for i in np.arange(limit):
    ids_list.append(np.random.choice(imgIdsTrain))

for id in ids_list: 
    print(f'Loading image id: {id}')

    imgDict = cocoTrain.loadImgs([id])[0]
    annDict = cocoTrain.loadAnns(cocoTrain.getAnnIds(imgIds=imgDict['id'], catIds=catIdsTrain, iscrowd=None))

    fpath = '{}/{}/{}'.format(dataDir, trainDir, imgDict['file_name'])
    img = cv2.imread(fpath)
    fn.remove_true_blacks(img) # removes true blacks from photo to make sure model doesn't learn that images without true blacks are unmodified

    if(prog < train): # handles whether or not to save to train or test directory
        fpath = '{}/{}/{}/{}'.format(destDataDir, destTrainDir, destNormDir, f'rs-{iters}.png')
        cv2.imwrite(fpath, img)
    else:
        fpath = '{}/{}/{}/{}'.format(destDataDir, destTestDir, destNormDir, f'rs-{iters}.png')
        cv2.imwrite(fpath, img)

    prog += 1 / len(ids_list)
    iters += 1
    print(f'Finished loading! {50 + 50*prog:.2f}% done')
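
One thing to note about the loops above: `np.random.choice` is called once per iteration, so ids are drawn with replacement and the same image can be selected more than once, potentially landing in both the train and test folders. A possible alternative, sketched here under the assumption that unique ids are wanted, is to draw all ids in a single call without replacement and then split them. The helper `sample_and_split` is hypothetical and is not what the cell above does.

```python
import numpy as np

def sample_and_split(ids, limit, train_frac=0.8, seed=42):
    """Draw up to `limit` unique ids, then split them into train and test lists."""
    rng = np.random.default_rng(seed)
    n = min(limit, len(ids))
    # replace=False guarantees that no id is drawn twice
    chosen = rng.choice(np.asarray(ids), size=n, replace=False)
    n_train = int(round(train_frac * n))
    return chosen[:n_train].tolist(), chosen[n_train:].tolist()

# Demo with stand-in ids; in the notebook these would come from imgIdsTrain.
train_ids, test_ids = sample_and_split(range(100), limit=20)
```

This only deduplicates within one draw; keeping the rolling shutter and normal sets disjoint would need a second step.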


# Modeling

Due to time constraints, I was only able to train one transfer-learning CNN model. I chose MobileNetV3Large for its time efficiency at little cost to performance. I also trained a logistic regression and a random forest model, although, as expected, they performed worse than the MobileNetV3Large model.

## MobileNetV3Large

In [3]:
train_data = keras.preprocessing.image_dataset_from_directory(
    'created_data/train', 
    labels='inferred',
    subset="training",
    validation_split=.2,
    seed=seed,
    shuffle=True)

val_data = keras.preprocessing.image_dataset_from_directory(
    'created_data/train', 
    labels='inferred',
    subset="validation",
    validation_split=.2,
    seed=seed,
    shuffle=True)

test_data = keras.preprocessing.image_dataset_from_directory(
    'created_data/test', 
    labels='inferred',
    shuffle=False)

Found 7936 files belonging to 2 classes.
Using 6349 files for training.
Found 7936 files belonging to 2 classes.
Using 1587 files for validation.
Found 1974 files belonging to 2 classes.


In [4]:
def preprocess(image, label):
    resized_image = tf.image.resize(image, [512,512])
    final_image = keras.applications.mobilenet_v3.preprocess_input(resized_image)
    return final_image, label

def recall_m(y_true, y_pred):
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
    recall = true_positives / (possible_positives + K.epsilon())
    return recall

def precision_m(y_true, y_pred):
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
    precision = true_positives / (predicted_positives + K.epsilon())
    return precision

def f1_m(y_true, y_pred):
    precision = precision_m(y_true, y_pred)
    recall = recall_m(y_true, y_pred)
    return 2*((precision*recall)/(precision+recall+K.epsilon()))
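
To make the three custom metrics concrete, here is a small NumPy sketch that mirrors the same arithmetic on a hand-checkable example. The function name `precision_recall_f1` and the sample values are illustrative only. Keep in mind that Keras averages function-based metrics like these over batches, so the values it reports for an epoch can differ slightly from metrics computed once over the whole dataset.

```python
import numpy as np

def precision_recall_f1(y_true, y_prob, eps=1e-7):
    """NumPy mirror of recall_m, precision_m and f1_m for binary labels."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.round(np.clip(y_prob, 0, 1))   # probabilities -> hard 0/1 predictions
    true_positives = np.sum(y_true * y_pred)
    precision = true_positives / (np.sum(y_pred) + eps)
    recall = true_positives / (np.sum(y_true) + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    return precision, recall, f1

# 3 actual positives, 2 predicted positives, 1 of them correct
y_true = [1, 1, 1, 0, 0]
y_prob = [0.9, 0.4, 0.2, 0.8, 0.1]
precision, recall, f1 = precision_recall_f1(y_true, y_prob)
# precision is about 1/2, recall about 1/3, f1 about 0.4
```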

In [5]:
train_data = train_data.map(preprocess).prefetch(1)
val_data = val_data.map(preprocess).prefetch(1)
test_data = test_data.map(preprocess).prefetch(1)

Note: the code below will take multiple hours to run, even on high-end computers.

In [None]:
base_model_mobilenetv3 = keras.applications.MobileNetV3Large(weights = 'imagenet', include_top = False)

model_dir = 'models'
model_uuid = 'model_MobileNetV3_v1'

for layer in base_model_mobilenetv3.layers:
    layer.trainable = False

avg = keras.layers.GlobalAveragePooling2D()(base_model_mobilenetv3.output)
output = keras.layers.Dense(1, activation = 'sigmoid')(avg)
model_mobilenetv3 = keras.Model(inputs = base_model_mobilenetv3.input, outputs = output)

early_stopping = EarlyStopping(monitor='val_loss', verbose=2, patience=10, min_delta=.00250)
model_checkpoint = ModelCheckpoint(f'{model_dir}/{model_uuid}_weights{{epoch:08d}}.h5', verbose = 2, save_best_only=False, period=1)
csv_logger = CSVLogger(f'{model_dir}/{model_uuid}.csv', separator = ',', append = True)

optimizer = keras.optimizers.SGD(learning_rate = 0.2, momentum = 0.9, decay = 0.01)
model_mobilenetv3.compile(loss = 'binary_crossentropy', optimizer = optimizer,  metrics = ['accuracy', recall_m, precision_m, f1_m])

results = model_mobilenetv3.fit(train_data, # fit accepts tf.data datasets directly; fit_generator is deprecated
    epochs=1000,
    validation_data=val_data,
    callbacks=[early_stopping, model_checkpoint, csv_logger])

Once the training code above has been run, the following code can be used to obtain the metrics of the best model (and can be adapted to view those of the other models).

In [7]:
base_model_mobilenetv3 = keras.applications.MobileNetV3Large(weights = 'imagenet', include_top = False)

model_dir = 'models'
model_uuid = 'model_MobileNetV3_v1'

for layer in base_model_mobilenetv3.layers:
    layer.trainable = False

avg = keras.layers.GlobalAveragePooling2D()(base_model_mobilenetv3.output)
output = keras.layers.Dense(1, activation = 'sigmoid')(avg)
final_model_mobilenetv3 = keras.Model(inputs = base_model_mobilenetv3.input, outputs = output)

early_stopping = EarlyStopping(monitor='val_loss', verbose=2, patience=10, min_delta=.00250)
model_checkpoint = ModelCheckpoint(f'{model_dir}/{model_uuid}_weights{{epoch:08d}}.h5', verbose = 2, save_best_only=False, period=1)
csv_logger = CSVLogger(f'{model_dir}/{model_uuid}.csv', separator = ',', append = True)

optimizer = keras.optimizers.SGD(learning_rate = 0.2, momentum = 0.9, decay = 0.01)
final_model_mobilenetv3.compile(loss = 'binary_crossentropy', optimizer = optimizer,  metrics = ['accuracy', recall_m, precision_m, f1_m])

final_model_mobilenetv3.load_weights('models/model_MobileNetV3_v1_weights00000059.h5')

final_model_mobilenetv3.evaluate(test_data)



[0.796765148639679,
 0.566362738609314,
 0.2683192789554596,
 0.49627789855003357,
 0.34728729724884033]
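
`evaluate` returns the loss followed by the metrics in the order they were passed to `compile`, so the list above can be labelled as in the sketch below. The variable names are illustrative; the values are copied from the output above.

```python
metric_names = ["loss", "accuracy", "recall_m", "precision_m", "f1_m"]
values = [0.796765148639679,
          0.566362738609314,
          0.2683192789554596,
          0.49627789855003357,
          0.34728729724884033]

report = dict(zip(metric_names, values))
for name, value in report.items():
    print(f"{name:>12}: {value:.4f}")

# F1 recomputed from the reported precision and recall. It will not match
# f1_m exactly, because Keras averages each metric over batches.
p, r = report["precision_m"], report["recall_m"]
f1_from_pr = 2 * p * r / (p + r)
```

Read this way, the test accuracy is about 0.57, with a recall of about 0.27 and a precision of about 0.50.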

## Logistic regression & random forest

For these `sklearn` models, I had to convert all the images to `numpy` arrays, which means every image has to be held in memory at once. Computers with low available memory may struggle to run the following code.

In [None]:
train_imgs = keras.preprocessing.image.ImageDataGenerator(rescale=1./255).flow_from_directory('../created_data/train', batch_size=8000)
test_imgs = keras.preprocessing.image.ImageDataGenerator(rescale=1./255).flow_from_directory('../created_data/test', batch_size=2000)

X_i, y_i = next(train_imgs)
X_test, y_test = next(test_imgs)
X_train, X_val, y_train, y_val = train_test_split(X_i, y_i, train_size = 0.75, random_state = seed)

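# flattens each image into one feature vector; row counts come from the arrays instead of being hard-coded
X_train = X_train.reshape(len(X_train), -1)
X_val = X_val.reshape(len(X_val), -1)
X_test = X_test.reshape(len(X_test), -1)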

y_train = y_train[:,1]
y_val = y_val[:,1]
y_test = y_test[:,1]
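
To put the memory warning in numbers, here is a back-of-the-envelope estimate. It assumes the `flow_from_directory` defaults (images resized to 256x256, three colour channels, `float32` values) and uses the file counts reported earlier in the notebook. The helper `array_bytes` is hypothetical.

```python
def array_bytes(n_images, height=256, width=256, channels=3, bytes_per_value=4):
    """Size in bytes of a dense float32 image array of shape (n, h, w, c)."""
    return n_images * height * width * channels * bytes_per_value

train_gib = array_bytes(7936) / 2**30   # all training images in one batch
test_gib = array_bytes(1974) / 2**30    # all test images in one batch

print(f"train batch: {train_gib:.2f} GiB, test batch: {test_gib:.2f} GiB")
```

Under these assumptions the training batch alone is roughly 5.8 GiB, and `train_test_split` creates further copies, so peak usage is higher still.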

In [None]:
lr = LogisticRegression()
lr.fit(X_train,y_train)

lr_pred = lr.predict(X_test)

precision_score(y_test, lr_pred), accuracy_score(y_test, lr_pred)

(0.5146316851664985, 0.5207700101317123)

In [None]:
rf = RandomForestClassifier()
rf.fit(X_train, y_train)

rf_pred = rf.predict(X_test)

precision_score(y_test, rf_pred), accuracy_score(y_test, rf_pred)

(0.4727694090382387, 0.48226950354609927)

Given the complexity of the task, the models performed about as well as I expected.
To achieve better results, I would likely need to train on a wider range of data, as well as for a longer period.
That said, the MobileNetV3Large model reached a test accuracy of about 0.57, compared with about
0.52 for logistic regression and 0.48 for the random forest, indicating that it was able to recognize some elements
of the distortion effect, if not in full.

# Evaluation

The MobileNetV3Large model I have created is a proof of concept that models can be trained to recognize this type of distortion, although it is not necessarily ready for implementation. Some wrinkles in the data synthesis process must still be ironed out. For example, inaccurate and imprecise COCO annotations
hampered the data synthesis process by introducing additional, unrelated distortion, so more accurately annotated data must be sourced to improve the quality of the synthetic data. Pre-existing distortion within the COCO dataset also needs to be accounted for.

I used accuracy as my primary metric to understand whether the model could recognize rolling shutter distortion at all. In future iterations, I would focus more heavily on precision, because rolling shutter is an admittedly rare occurrence and the model should flag it only when it is highly confident that the distortion exists in the image.
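
One simple way to favour precision with a sigmoid-output model is to raise the decision threshold above 0.5, which usually lowers recall in exchange. The sketch below illustrates the trade-off on made-up scores. The function `precision_recall_at` and the sample data are hypothetical and are not results from the models above.

```python
import numpy as np

def precision_recall_at(y_true, y_prob, threshold):
    """Precision and recall when scores >= threshold are flagged as positive."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_prob) >= threshold
    true_positives = np.sum(y_true & y_pred)
    precision = true_positives / y_pred.sum() if y_pred.sum() else float("nan")
    recall = true_positives / y_true.sum() if y_true.sum() else float("nan")
    return float(precision), float(recall)

# Made-up labels and model scores for eight images
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_prob = [0.95, 0.70, 0.40, 0.80, 0.55, 0.30, 0.20, 0.10]

for threshold in (0.5, 0.9):
    p, r = precision_recall_at(y_true, y_prob, threshold)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
```

In this toy example, moving the threshold from 0.5 to 0.9 removes both false alarms but also misses one more true positive.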

As future recommendations for continuing this project, I'd like to implement various neural network
architectures to achieve better metric scores. Next, I'd want to explore locating and reversing this distortion in an
attempt to repair the image.