# Dataset creation notebook
Syncing the datasets correctly is important for training performance. The DVR recording from the FPV goggles does not have a fixed framerate, so in this notebook frames are synced by pairing each DVR frame with the hires frame that gives the lowest MSE. The notebook also resizes the hires video to the native DVR resolution. The output is saved as four-dimensional npy files.
### Imports
cv2 is used for reading video files, numpy for matrices, skimage for resizing the hires video and matplotlib for viewing frames.

In [1]:
import cv2
import numpy as np
from skimage.transform import resize
import matplotlib.pyplot as plt

Simple functions for showing frames. The first shows a single frame, the other one shows 25 frames. Useful for finding how far into the hires video the DVR starts.

In [6]:
def showFrame(frame, title = 'Frame', show = True):
    """ Shows a single frame """
    plt.imshow(frame)
    plt.title(title)
    if show:
        plt.show()
        

def showBatch(vid):
    """ Shows 25 frames in a grid """
    plt.figure(figsize=(10,10))
    for n in range(25):
        ax = plt.subplot(5,5,n+1)
        plt.imshow(vid[n])
        plt.title(n)
        plt.axis('off')
    plt.show()

Simple progressbar, not mine. Got it from https://gist.github.com/aubricus/f91fb55dc6ba5557fbab06119420dd6a#file-print_progress-py

In [7]:
def printProgressBar (iteration, total, prefix = '', suffix = '', decimals = 1, length = 100, fill = '█', printEnd = "\r"):
    """
    Call in a loop to create terminal progress bar
    @params:
        iteration   - Required  : current iteration (Int)
        total       - Required  : total iterations (Int)
        prefix      - Optional  : prefix string (Str)
        suffix      - Optional  : suffix string (Str)
        decimals    - Optional  : positive number of decimals in percent complete (Int)
        length      - Optional  : character length of bar (Int)
        fill        - Optional  : bar fill character (Str)
        printEnd    - Optional  : end character (e.g. "\r", "\r\n") (Str)
    """
    percent = ("{0:." + str(decimals) + "f}").format(100 * (iteration / float(total)))
    filledLength = int(length * iteration // total)
    bar = fill * filledLength + '-' * (length - filledLength)
    print(f'\r{prefix} |{bar}| {percent}% {suffix}', end = printEnd)
    # Print New Line on Complete
    if iteration == total: 
        print()

Simple function that imports a video file using cv2. The returned array has the shape (frames, height, width, 3).

In [9]:
def importVideo(filename, start, length, cx, cw, cy, ch):
    """ Import a video file using cv2. Input filename, startframe, length in frames, startx, width, starty and height in pixels"""
    cap = cv2.VideoCapture(filename)
    filelen = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    stop = start + length
    if stop > filelen:
        stop = filelen
        print(filename, "is shorter than the specified length, loading to the end of the file")
    frameCount = max(stop - start, 0)

    frameWidth = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    frameHeight = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    buf = np.empty((frameCount, frameHeight, frameWidth, 3), np.dtype('uint8'))
    # Jump straight to the first wanted frame
    cap.set(cv2.CAP_PROP_POS_FRAMES, start)
    p = 0
    print('loading', filename)
    while p < frameCount:
        ret, frame = cap.read()
        if not ret:
            # The decoder ran out of frames earlier than the header promised
            break
        buf[p] = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        p += 1
        printProgressBar(p, frameCount, prefix = 'Loading:', suffix = 'Complete', length = 50)

    cap.release()
    # Return only the frames that were actually read, cropped to the requested window
    return buf[:p, cy:cy + ch, cx:cx + cw, :]

The following cell defines some variables used for loading the videos.

In [13]:
loResFilePath = "./BETADVR/PICT0037.AVI"
hiResFilePath = "./BETADVR/CADDX000012.MP4"
#           StartF Len in frames  x  width y  height
loResDim = [4180 , 1000         , 0, 640 , 0, 480 ]
hiResDim = [19540, 2*loResDim[1], 0, 1920, 0, 1080]

In the following cell, a lowres video and a hires video are loaded into memory. The timestamp of the lowres video is shown for 25 frames to make syncing easier. Comment out the hires loading to only see the timestamp and a sample frame.

In [None]:
X  = importVideo(loResFilePath, loResDim[0], loResDim[1], loResDim[2], loResDim[3], loResDim[4], loResDim[5])
HR = importVideo(hiResFilePath, hiResDim[0], hiResDim[1], hiResDim[2], hiResDim[3], hiResDim[4], hiResDim[5])
showBatch(X[:, :40, 590:, :])  # top-right corner, where the DVR timestamp is drawn
showFrame(X[0], 'frame')

The dtype should be uint8 for both arrays, and HR should be about twice as long as X.

In [None]:
print(X.dtype, X.shape, HR.dtype, HR.shape)

The first frames should be almost the same. The getFirstFrame function finds the first match if the lowres video starts after the hires video, but time and memory are wasted if the gap is large.

In [None]:
showFrame(X[0, :, :, :])
showFrame(HR[0, :, :, :])

The resize function from skimage is used to resize the hires video down to the lowres video's dimensions. Replace this function with one that utilizes the GPU if you know of one.

In [None]:
Y = []
print("Reshaping video to fit NN")
for rn, frame in enumerate(HR):
    # Count from 1 so the bar reaches 100% on the last frame
    printProgressBar(rn + 1, len(HR), prefix = 'Progress:', suffix = 'Complete', length = 50)
    Y.append(resize(frame, (480, 640, 3)))

The new Y variable holds the frames in float format, and HR is no longer needed.

In [None]:
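del HR  # free the memory held by the full-resolution frames
Y = np.array(Y)
# resize returns floats in [0, 1]: scale by 255 so that white stays inside the uint8 range
# Keep a signed integer type so the frame differences in the matching step cannot wrap around
Y = np.rint(Y * 255).astype(int)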
print(Y.shape)
print(X.shape)
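
Because `resize` returns floats in the range 0 to 1, the scale factor matters: a factor of 256 maps pure white to 256, which wraps to 0 once it is stored as uint8, whereas 255 keeps it in range. A small sketch with made-up pixel values illustrates the difference:

```python
import numpy as np

# Made-up float pixels as resize would return them: black, mid grey, pure white
pixels = np.array([0.0, 0.5, 1.0])

scaled256 = (pixels * 256).astype(int)
scaled255 = np.rint(pixels * 255).astype(int)

# Integer-to-uint8 casts keep only the low 8 bits, so 256 becomes 0
wrapped = scaled256.astype(np.uint8)
safe = scaled255.astype(np.uint8)
print(wrapped, safe)
```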

Matching the datasets comes down to finding, for each lowres frame, the hires frame with the lowest MSE. Only a fixed number of hires frames after the previous match is checked, which makes execution much faster while the chance of still finding the correct frame stays high. Checking the entire hires set for every frame would make execution time grow quadratically with the length of the video instead of linearly. If the drone is stationary in the air and the captured frames are similar, the MSE might be lowest for the same hires frame several times in a row. Such double matches get printed. They are to be expected, but should be investigated.

The matrix operations could be executed faster using GPU acceleration. 

In [None]:
def getFirstFrame(X, Y):
    """ Calculates the MSE between the first loRes frame and all the hiRes frames. Returns the position of the frame with the lowest MSE """
    sumDelta = np.zeros(len(Y))
    for i in range(len(Y)):
        sumDelta[i] = (( Y[i] - X[0])**2).mean(axis=None)
    print('First frame: ', np.argmin(sumDelta))
    return np.argmin(sumDelta)

def matchDataset(X, Y, window = 8):
    """ Matches each loRes frame with a hiRes frame. Adjust window if it does not work """
    lastframe = getFirstFrame(X, Y)
    y = np.empty(X.shape, np.dtype('uint8'))
    frames = np.zeros(len(X), dtype=int)
    for i in range(len(X)):
        # Only search the next `window` hiRes frames, and never past the end of Y
        n = min(window, len(Y) - lastframe)
        sumDelta = np.zeros(n)
        for j in range(n):
            # Cast to a signed type so the subtraction cannot wrap around
            sumDelta[j] = ((X[i].astype(int) - Y[j + lastframe])**2).mean(axis=None)
        currF = np.argmin(sumDelta)
        lastframe = currF + lastframe
        if currF == 0 and i > 0:
            # Same hiRes frame as for the previous loRes frame
            print('Doublematched:', i, ' and ', lastframe)
        elif currF == window - 1:
            # Best match sits at the edge of the window, the true match may lie beyond it
            print('Skipped', window - 1, 'frames. Check out frame ', i)
        frames[i] = lastframe
        y[i] = Y[lastframe]
        printProgressBar(i + 1, len(X), prefix = 'Matching datasets:', suffix = 'Complete', length = 50)
    return frames, y

frames, y = matchDataset(X, Y)
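
Even without a GPU, the inner loop over the search window can be written as a single array expression. The following is a minimal sketch, not the notebook's implementation: `windowMSE` is a hypothetical helper, and the frames are small random arrays standing in for real video.

```python
import numpy as np

def windowMSE(x_frame, y_window):
    """ Hypothetical helper: MSE between one frame and each frame of a (n, h, w, 3) window """
    # Cast to a signed type first so uint8 differences cannot wrap around
    diff = y_window.astype(np.int32) - x_frame.astype(np.int32)
    # Average over height, width and channels, leaving one value per window frame
    return (diff ** 2).mean(axis=(1, 2, 3))

rng = np.random.default_rng(0)
window = rng.integers(0, 256, (8, 48, 64, 3), dtype=np.uint8)
# Build a query frame that is window[5] plus a little noise
noise = rng.integers(-3, 4, window[5].shape)
query = np.clip(window[5].astype(int) + noise, 0, 255).astype(np.uint8)

mse = windowMSE(query, window)
best = int(np.argmin(mse))
print(best)
```

The index returned by `np.argmin` plays the same role as `currF` in the matching loop.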

Try running the following cell with different frames. Apart from noise, there should be no differences visible to the human eye.

In [None]:
frame = 1 
showFrame( y[frame, 60:420, :, :], 'Y')
showFrame( X[frame, 60:420, :, :], 'X')
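# Subtract in a signed type and take the absolute value, otherwise uint8 differences wrap around
showFrame( np.abs(y[frame, 60:420, :, :].astype(int) - X[frame, 60:420, :, :].astype(int)), 'Delta' )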
print(X.shape, y.shape)
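
Looking at frames one by one does not scale to a whole recording. As a complement, the MSE of every matched pair can be computed and the worst pairs inspected by eye. This is a minimal sketch under the assumption that both arrays are uint8 and have the same shape. `pairMSE` is a hypothetical helper, and the demo arrays are random stand-ins for X and y.

```python
import numpy as np

def pairMSE(X, y):
    """ Hypothetical helper: MSE between X[i] and y[i] for every i """
    errors = np.empty(len(X))
    for i in range(len(X)):
        # One frame at a time keeps the signed copies small
        diff = X[i].astype(np.int32) - y[i].astype(np.int32)
        errors[i] = (diff ** 2).mean()
    return errors

rng = np.random.default_rng(1)
Xdemo = rng.integers(0, 256, (6, 36, 64, 3), dtype=np.uint8)
ydemo = Xdemo.copy()
# Simulate one badly matched pair
ydemo[4] = rng.integers(0, 256, ydemo[4].shape, dtype=np.uint8)

errors = pairMSE(Xdemo, ydemo)
worst = np.argsort(errors)[::-1][:3]  # indices of the three largest errors
print(worst[0], errors[worst[0]])
```

Pairs with an unusually high error are good candidates for `showFrame`.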

Save the dataset under fitting names. The last few frames are cut off because they are usually completely unusable, and the extra frames are no longer needed.

In [None]:
np.save('NAME_NUM_UINT8Y', y[:-2,60:420, :, :])
np.save('NAME_NUM_UINT8X', X[:-2,60:420, :, :])
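
To confirm that the files were written as expected, they can be loaded back and inspected. A minimal sketch using a temporary directory and a dummy array (the file name `DEMO_UINT8Y` is made up for the example):

```python
import os
import tempfile
import numpy as np

# Small stand-in for the cropped y array
demo = np.zeros((5, 360, 640, 3), dtype=np.uint8)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'DEMO_UINT8Y')  # hypothetical file name
    np.save(path, demo)                       # np.save appends the .npy extension
    loaded = np.load(path + '.npy')

print(loaded.shape, loaded.dtype)
```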