###### Experiment with Various Video Stabilization Models

This notebook explores [StabNet and DUT](https://github.com/Annbless/DUTCode) as potential image stabilization models. On SageMaker, launch the instance as ml.g4dn.xlarge (4 vCPU + 16 GiB + 1 GPU) with the Python 3 (PyTorch 1.6 Python 3.6 GPU Optimized) kernel.

We have some pretrained models, so we need to look into what it actually takes to run PyTorch inference. My first task is to get the models running inference in a notebook.

Inference can be run using `scripts/deploy_samples.sh`. I've brought the code into this notebook so I can study it and better understand what is going on and where.

## Running StabNet Stabilizer

These are the interesting pieces:

```bash
OutputBasePath='results/'
StabNetPath='ckpt/stabNet.pth'
InputPath='images/'
```

```bash
# Run the StabNet model
echo " Stabiling using the StabNet model "
echo "-----------------------------------"

python ./scripts/StabNetStabilizer.py \
    --modelPath=$StabNetPath \
    --OutputBasePath=$OutputBasePath \
    --InputBasePath=$InputPath 
```

So given this I will pull the code from StabNetStabilizer.py to explore it. I will modify it as needed to support running from the notebook so I can really get in and explore what is going on.

```python
import torch
import torch.nn as nn
import argparse
from PIL import Image
import cv2
import os
import traceback
import math
import time
import sys
parentddir = os.path.abspath(os.path.join(os.path.dirname(__file__), os.path.pardir))
sys.path.append(parentddir)

from models.StabNet.v2_93 import *
from models.StabNet.model import stabNet

parser = argparse.ArgumentParser()
parser.add_argument('--modelPath', default='./models')
parser.add_argument('--before-ch', type=int)
parser.add_argument('--OutputBasePath', default='data_video_local')
parser.add_argument('--InputBasePath', default='')
parser.add_argument('--max-span', type=int, default=1)
parser.add_argument('--refine', type=int, default=1)
parser.add_argument('--no_bm', type=int, default=1)
args = parser.parse_args()

MaxSpan = args.max_span
args.indices = indices[1:]
batch_size = 1

before_ch = max(args.indices)#args.before_ch
after_ch = max(1, -min(args.indices) + 1)

model = stabNet()
r_model = torch.load(args.modelPath)
model.load_state_dict(r_model)
model.cuda()
model.eval()

def cvt_img2train(img, crop_rate = 1):
    img = Image.fromarray(cv2.cvtColor(img,cv2.COLOR_BGR2GRAY))
    if (crop_rate != 1):
        h = int(height / crop_rate)
        dh = int((h - height) / 2)
        w = int(width / crop_rate)
        dw = int((w - width) / 2)

        img = img.resize((w, h), Image.BILINEAR)
        img = img.crop((dw, dh, dw + width, dh + height))
    else:
        img = img.resize((width, height), Image.BILINEAR)
    img = np.array(img)
    img = img * (1. / 255) - 0.5
    img = img.reshape((1, height, width, 1))
    return img

def make_dirs(path):
    if not os.path.exists(path): os.makedirs(path)

cvt_train2img = lambda x: ((np.reshape(x, (height, width)) + 0.5) * 255).astype(np.uint8)

def warpRevBundle2(img, x_map, y_map):
    assert(img.ndim == 3)
    assert(img.shape[-1] == 3)
    rate = 4
    x_map = cv2.resize(cv2.resize(x_map, (int(width / rate), int(height / rate))), (width, height))
    y_map = cv2.resize(cv2.resize(y_map, (int(width / rate), int(height / rate))), (width, height))
    x_map = (x_map + 1) / 2 * width
    y_map = (y_map + 1) / 2 * height
    dst = cv2.remap(img, x_map, y_map, cv2.INTER_LINEAR)
    assert(dst.shape == (height, width, 3))
    return dst

production_dir = args.OutputBasePath
make_dirs(production_dir)

image_len = len([ele for ele in os.listdir(args.InputBasePath) if ele[-4:] == '.jpg'])
images = []

for i in range(image_len):

    image = cv2.imread(os.path.join(args.InputBasePath, '{}.jpg'.format(i)))
    image = cv2.resize(image, (width, height))
    images.append(image)

print('inference with {}'.format(args.indices))

tot_time = 0

print('totally {} frames for stabilization'.format(len(images)))

before_frames = []
before_masks = []
after_frames = []
after_temp = []

cnt = 0

frame = images[cnt]

cnt += 1

for i in range(before_ch):
    before_frames.append(cvt_img2train(frame, crop_rate))
    before_masks.append(np.zeros([1, height, width, 1], dtype=np.float))
    temp = before_frames[i]
    temp = ((np.reshape(temp, (height, width)) + 0.5) * 255).astype(np.uint8)

    temp = np.concatenate([temp, np.zeros_like(temp)], axis=1)
    temp = np.concatenate([temp, np.zeros_like(temp)], axis=0)


for i in range(after_ch):
    frame = images[cnt]
    cnt = cnt + 1
    frame_unstable = frame
    after_temp.append(frame)
    after_frames.append(cvt_img2train(frame, 1))

length = 0
in_xs = []
delta = 0

dh = int(height * 0.8 / 2)
dw = int(width * 0.8 / 2)
all_black = np.zeros([height, width], dtype=np.int64)
frames = []

black_mask = np.zeros([dh, width], dtype=np.float)
temp_mask = np.concatenate([np.zeros([height - 2 * dh, dw], dtype=np.float), np.ones([height - 2 * dh, width - 2 * dw], dtype=np.float), np.zeros([height - 2 * dh, dw], dtype=np.float)], axis=1)
black_mask = np.reshape(np.concatenate([black_mask, temp_mask, black_mask], axis=0),[1, height, width, 1]) 

try:
    while(True):

        in_x = []
        if input_mask:
            for i in args.indices:
                if (i > 0):
                    in_x.append(before_masks[-i])
        for i in args.indices:
            if (i > 0):
                in_x.append(before_frames[-i])
        in_x.append(after_frames[0])
        for i in args.indices:
            if (i < 0):
                in_x.append(after_frames[-i])
        if (args.no_bm == 0):
            in_x.append(black_mask)
        # for i in range(after_ch + 1):
        in_x = np.concatenate(in_x, axis = 3)
        # for max span
        if MaxSpan != 1:
            in_xs.append(in_x)
            if len(in_xs) > MaxSpan: 
                in_xs = in_xs[-1:]
                print('cut')
            in_x = in_xs[0].copy()
            in_x[0, ..., before_ch] = after_frames[0][..., 0]
        tmp_in_x = np.array(in_x.copy())
        for j in range(args.refine):
            start = time.time()
            img, black, x_map_, y_map_ = model.forward(torch.Tensor(tmp_in_x.transpose((0, 3, 1, 2))).cuda())
            img = img.cpu().clone().detach().numpy()
            black = black.cpu().clone().detach().numpy()
            x_map_ = x_map_.cpu().clone().detach().numpy()
            y_map_ = y_map_.cpu().clone().detach().numpy()
            tot_time += time.time() - start
            black = black[0, :, :]
            xmap = x_map_[0, :, :, 0]
            ymap = y_map_[0, :, :, 0]
            all_black = all_black + np.round(black).astype(np.int64)
            img = img[0, :, :, :].reshape(height, width)
            frame = img + black * (-1)
            frame = frame.reshape(1, height, width, 1)
            tmp_in_x[..., -1] = frame[..., 0]
        img = ((np.reshape(img + 0.5, (height, width))) * 255).astype(np.uint8)
        
        net_output = img

        img_warped = warpRevBundle2(cv2.resize(after_temp[0], (width, height)), xmap, ymap)
        frames.append(img_warped)

        if cnt + 1 <= len(images):
            frame_unstable = images[cnt]
            cnt = cnt + 1
            ret = True
        else:
            ret = False  
        
        if (not ret):
            break
        length = length + 1
        if (length % 10 == 0):
            print("length: " + str(length))      
            print('fps={}'.format(length / tot_time))

        before_frames.append(frame)
        before_masks.append(black.reshape((1, height, width, 1)))
        before_frames.pop(0)
        before_masks.pop(0)
        after_frames.append(cvt_img2train(frame_unstable, 1))
        after_frames.pop(0)
        after_temp.append(frame_unstable)
        after_temp.pop(0)
except Exception as e:
    traceback.print_exc()
finally:
    print('total length={}'.format(length + 2))

    black_sum = np.zeros([height + 1, width + 1], dtype=np.int64)
    for i in range(height):
        for j in range(width):
            black_sum[i + 1][j + 1] = black_sum[i][j + 1] + black_sum[i + 1][j] - black_sum[i][j] + all_black[i][j]
    max_s = 0
    ans = []
    for i in range(0, int(math.floor(height * 0.5)), 10):
        print(i)
        print(max_s)
        for j in range(0, int(math.floor(width * 0.5)), 10):
            if (all_black[i][j] > 0):
                continue
            for hh in range(i, height):
                dw = int(math.floor(float(max_s) / (hh - i + 1)))
                for ww in range(j, width):
                    if (black_sum[hh + 1][ww + 1] - black_sum[hh + 1][j] - black_sum[i][ww + 1] + black_sum[i][j] > 0):
                        break
                    else:
                        s = (hh - i + 1) * (ww - j + 1)
                        if (s > max_s):
                            max_s = s
                            ans = [i, j, hh, ww]
    videoWriter = cv2.VideoWriter(os.path.join(production_dir, 'StabNet_stable.mp4'), 
        cv2.VideoWriter_fourcc(*'MP4V'), 25, (ans[3] - ans[1] + 1, ans[2] - ans[0] + 1))
    for frame in frames:
        frame_ = frame[ans[0]:ans[2] + 1, ans[1]:ans[3] + 1, :]
        videoWriter.write(frame_)
    videoWriter.release()
```
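
Because the script reads its settings through `argparse`, calling `parser.parse_args()` unchanged inside a notebook would parse the kernel's own command line instead. One option, shown here as a minimal sketch, is to pass an explicit argument list. The helper name `build_stabnet_parser` is hypothetical; the flags and defaults are copied from the script above, and the paths mirror the layout used in the cells further down.

```python
import argparse


def build_stabnet_parser():
    """Recreate the StabNetStabilizer.py options (hypothetical helper name)."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--modelPath', default='./models')
    parser.add_argument('--before-ch', type=int)
    parser.add_argument('--OutputBasePath', default='data_video_local')
    parser.add_argument('--InputBasePath', default='')
    parser.add_argument('--max-span', type=int, default=1)
    parser.add_argument('--refine', type=int, default=1)
    parser.add_argument('--no_bm', type=int, default=1)
    return parser


# Passing a list keeps argparse away from sys.argv, which in a notebook
# holds the Jupyter kernel's own arguments.
notebook_args = build_stabnet_parser().parse_args([
    '--modelPath=DUTPretrained/ckpt/stabNet.pth',
    '--OutputBasePath=results/',
    '--InputBasePath=DUTCode/images/',
])
print(notebook_args.modelPath, notebook_args.max_span, notebook_args.refine)
```

The cells further down take a different route and use a plain class with the same attribute names, which works equally well for exploration.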

## Setup

In [1]:
import os

FFMPEG_TAR = 'ffmpeg-release-amd64-static.tar.xz'
if os.path.exists(FFMPEG_TAR):
    os.remove(FFMPEG_TAR)
    
!wget https://johnvansickle.com/ffmpeg/releases/ffmpeg-release-amd64-static.tar.xz
!tar -xf ffmpeg-release-amd64-static.tar.xz
!ffmpeg-4.4-amd64-static/ffmpeg -version

if os.path.exists(FFMPEG_TAR):
    os.remove(FFMPEG_TAR)

!/opt/conda/bin/python -m pip install --upgrade pip
!pip install scikit-image
!pip install easydict
!pip install pypng

!conda update -n base -y -c defaults conda
!conda install -y -c conda-forge cupy

wget: /opt/conda/lib/libuuid.so.1: no version information available (required by wget)
--2021-09-08 21:06:10--  https://johnvansickle.com/ffmpeg/releases/ffmpeg-release-amd64-static.tar.xz
Resolving johnvansickle.com (johnvansickle.com)... 107.180.57.212
Connecting to johnvansickle.com (johnvansickle.com)|107.180.57.212|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 39577132 (38M) [application/x-xz]
Saving to: ‘ffmpeg-release-amd64-static.tar.xz’


2021-09-08 21:06:12 (17.5 MB/s) - ‘ffmpeg-release-amd64-static.tar.xz’ saved [39577132/39577132]

tar: ffmpeg-4.4-amd64-static/GPLv3.txt: Cannot change ownership to uid 1000, gid 1000: Operation not permitted
tar: ffmpeg-4.4-amd64-static/manpages/ffmpeg-all.txt: Cannot change ownership to uid 1000, gid 1000: Operation not permitted
tar: ffmpeg-4.4-amd64-static/manpages/ffmpeg-scaler.txt: Cannot change ownership to uid 1000, gid 1000: Operation not permitted
tar: ffmpeg-4.4-amd64-static/manpages/ffmpeg-resampler.txt

## My Exploration of the PyTorch Version of StabNet

In [2]:
import torch
import torch.nn as nn
import argparse
from PIL import Image
import cv2
import os
import traceback
import math
import time
import sys

project_location = os.getcwd()
sys.path.append(os.path.join(project_location,'DUTCode'))
project_location

'/root/hand-tracking-stabilization'

In [3]:
from models.StabNet.v2_93 import *
from models.StabNet.model import stabNet

#help(stabNet)

### Load the Pre-trained Models from S3

I copied the pre-trained models to S3 so they stay available if the original location disappears. At the time of writing, the pre-trained models were hosted at [https://drive.google.com/drive/folders/15T8Wwf1OL99AKDGTgECzwubwTqbkmGn6](https://drive.google.com/drive/folders/15T8Wwf1OL99AKDGTgECzwubwTqbkmGn6)

In [4]:
%%time

import os
import boto3

project_dir = os.getcwd()

data_dir = 'DUTPretrained'
if not os.path.isdir(data_dir):
    os.mkdir(data_dir)
os.chdir(data_dir)

s3 = boto3.client('s3')
s3.download_file('madat-machine-learning-data', 'Mark/capstone-project/pre-trained-models/ckpt-20210817T154228Z-001.zip', 'ckpt-20210817T154228Z-001.zip')

from zipfile import ZipFile

with ZipFile('ckpt-20210817T154228Z-001.zip', 'r') as zipObj:
    zipObj.extractall()
    
os.chdir('..')
os.getcwd()

CPU times: user 2.85 s, sys: 1.16 s, total: 4.02 s
Wall time: 7.54 s


'/root/hand-tracking-stabilization'

In [5]:
# From deploy_samples.sh

#OutputBasePath='results/'
#StabNetPath='ckpt/stabNet.pth'
#InputPath='images/'

#--modelPath=$StabNetPath \
#--OutputBasePath=$OutputBasePath \
#--InputBasePath=$InputPath 
class Arguments:
    
    modelPath = 'DUTPretrained/ckpt/stabNet.pth'
    before_ch = 0
    OutputBasePath = 'results/'
    InputBasePath = 'DUTCode/images/'
    max_span = 1
    refine = 1
    no_bm=1
    
args = Arguments()

MaxSpan = args.max_span
batch_size = 1

# WATCH FOR THESE: Not sure what these are used for. 
args.indices = indices[1:]
before_ch = max(args.indices)#args.before_ch
after_ch = max(1, -min(args.indices) + 1)

before_ch, after_ch, args.indices

(32, 1, [1, 2, 4, 8, 16, 32])
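
One way to read these values: `args.indices` lists how many frames back each past input sits, `before_ch` is the length of the history buffer needed to reach the oldest one, and `after_ch` is the number of not-yet-stabilized frames kept in hand. The sketch below recomputes them and counts the channels stacked into the network input, following the order in which the inference loop fills `in_x`. `input_mask` comes from the star import and its value is not shown in this notebook, so `True` is an assumption here; it is consistent with the 13 input channels of the first convolution in the model summary printed by the next cell.

```python
indices = [1, 2, 4, 8, 16, 32]  # value printed by the cell above
input_mask = True               # assumption: defined in models.StabNet.v2_93
no_bm = 1                       # same as Arguments.no_bm

before_ch = max(indices)                  # oldest past frame that is referenced
after_ch = max(1, -min(indices) + 1)      # current frame plus any future frames

past = [i for i in indices if i > 0]      # offsets into before_frames / before_masks
future = [i for i in indices if i < 0]    # offsets into after_frames (none here)

channels = (
    (len(past) if input_mask else 0)  # one mask per past frame
    + len(past)                       # the past stabilized frames
    + 1                               # the current unstable frame
    + len(future)                     # future frames, if any
    + (1 if no_bm == 0 else 0)        # optional black-border mask
)
print(before_ch, after_ch, channels)  # prints: 32 1 13
```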

In [6]:
model = stabNet()
r_model = torch.load(args.modelPath)
model.load_state_dict(r_model)
model.cuda()
model.eval()

stabNet(
  (resnet50): KitModel(
    (resnet_v2_50_conv1_Conv2D): Conv2d(13, 64, kernel_size=(7, 7), stride=(2, 2))
    (resnet_v2_50_block1_unit_1_bottleneck_v2_preact_FusedBatchNorm): BatchNorm2d(64, eps=1e-05, momentum=0.003, affine=True, track_running_stats=True)
    (resnet_v2_50_block1_unit_1_bottleneck_v2_shortcut_Conv2D): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1))
    (resnet_v2_50_block1_unit_1_bottleneck_v2_conv1_Conv2D): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (resnet_v2_50_block1_unit_1_bottleneck_v2_conv1_BatchNorm_FusedBatchNorm): BatchNorm2d(64, eps=1e-05, momentum=0.003, affine=True, track_running_stats=True)
    (resnet_v2_50_block1_unit_1_bottleneck_v2_conv2_Conv2D): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), bias=False)
    (resnet_v2_50_block1_unit_1_bottleneck_v2_conv2_BatchNorm_FusedBatchNorm): BatchNorm2d(64, eps=1e-05, momentum=0.003, affine=True, track_running_stats=True)
    (resnet_v2_50_block1_unit_1_bottleneck_v2_con

In [7]:
tmp_stable_video_filename = 'temp_StabNet_stable.mp4'
stable_video_filename = 'StabNet_stable.mp4'

def cvt_img2train(img, crop_rate = 1):
    img = Image.fromarray(cv2.cvtColor(img,cv2.COLOR_BGR2GRAY))
    if (crop_rate != 1):
        h = int(height / crop_rate)
        dh = int((h - height) / 2)
        w = int(width / crop_rate)
        dw = int((w - width) / 2)

        img = img.resize((w, h), Image.BILINEAR)
        img = img.crop((dw, dh, dw + width, dh + height))
    else:
        img = img.resize((width, height), Image.BILINEAR)
    img = np.array(img)
    img = img * (1. / 255) - 0.5
    img = img.reshape((1, height, width, 1))
    return img

def make_dirs(path):
    if not os.path.exists(path): os.makedirs(path)

cvt_train2img = lambda x: ((np.reshape(x, (height, width)) + 0.5) * 255).astype(np.uint8)

def warpRevBundle2(img, x_map, y_map):
    assert(img.ndim == 3)
    assert(img.shape[-1] == 3)
    rate = 4
    x_map = cv2.resize(cv2.resize(x_map, (int(width / rate), int(height / rate))), (width, height))
    y_map = cv2.resize(cv2.resize(y_map, (int(width / rate), int(height / rate))), (width, height))
    x_map = (x_map + 1) / 2 * width
    y_map = (y_map + 1) / 2 * height
    dst = cv2.remap(img, x_map, y_map, cv2.INTER_LINEAR)
    assert(dst.shape == (height, width, 3))
    return dst

production_dir = args.OutputBasePath
make_dirs(production_dir)

image_len = len([ele for ele in os.listdir(args.InputBasePath) if ele[-4:] == '.jpg'])
images = []

for i in range(image_len):

    #print(os.path.join(args.InputBasePath, '{:03d}.jpg'.format(i)))
    
    image = cv2.imread(os.path.join(args.InputBasePath, '{:03d}.jpg'.format(i)))
    image = cv2.resize(image, (width, height))
    images.append(image)

print('inference with {}'.format(args.indices))

tot_time = 0

print('totally {} frames for stabilization'.format(len(images)))

before_frames = []
before_masks = []
after_frames = []
after_temp = []

cnt = 0

frame = images[cnt]

cnt += 1

for i in range(before_ch):
    before_frames.append(cvt_img2train(frame, crop_rate))
    before_masks.append(np.zeros([1, height, width, 1], dtype=float))  # np.float was only an alias for float and is removed in newer NumPy
    temp = before_frames[i]
    temp = ((np.reshape(temp, (height, width)) + 0.5) * 255).astype(np.uint8)

    temp = np.concatenate([temp, np.zeros_like(temp)], axis=1)
    temp = np.concatenate([temp, np.zeros_like(temp)], axis=0)


for i in range(after_ch):
    frame = images[cnt]
    cnt = cnt + 1
    frame_unstable = frame
    after_temp.append(frame)
    after_frames.append(cvt_img2train(frame, 1))

length = 0
in_xs = []
delta = 0

dh = int(height * 0.8 / 2)
dw = int(width * 0.8 / 2)
all_black = np.zeros([height, width], dtype=np.int64)
frames = []

# np.float was only an alias for the built-in float and is removed in newer NumPy releases
black_mask = np.zeros([dh, width], dtype=float)
temp_mask = np.concatenate([np.zeros([height - 2 * dh, dw], dtype=float), np.ones([height - 2 * dh, width - 2 * dw], dtype=float), np.zeros([height - 2 * dh, dw], dtype=float)], axis=1)
black_mask = np.reshape(np.concatenate([black_mask, temp_mask, black_mask], axis=0),[1, height, width, 1]) 

try:
    while(True):

        in_x = []
        if input_mask:
            for i in args.indices:
                if (i > 0):
                    in_x.append(before_masks[-i])
        for i in args.indices:
            if (i > 0):
                in_x.append(before_frames[-i])
        in_x.append(after_frames[0])
        for i in args.indices:
            if (i < 0):
                in_x.append(after_frames[-i])
        if (args.no_bm == 0):
            in_x.append(black_mask)
        # for i in range(after_ch + 1):
        in_x = np.concatenate(in_x, axis = 3)
        # for max span
        if MaxSpan != 1:
            in_xs.append(in_x)
            if len(in_xs) > MaxSpan: 
                in_xs = in_xs[-1:]
                print('cut')
            in_x = in_xs[0].copy()
            in_x[0, ..., before_ch] = after_frames[0][..., 0]
        tmp_in_x = np.array(in_x.copy())
        for j in range(args.refine):
            start = time.time()
            img, black, x_map_, y_map_ = model.forward(torch.Tensor(tmp_in_x.transpose((0, 3, 1, 2))).cuda())
            img = img.cpu().clone().detach().numpy()
            black = black.cpu().clone().detach().numpy()
            x_map_ = x_map_.cpu().clone().detach().numpy()
            y_map_ = y_map_.cpu().clone().detach().numpy()
            tot_time += time.time() - start
            black = black[0, :, :]
            xmap = x_map_[0, :, :, 0]
            ymap = y_map_[0, :, :, 0]
            all_black = all_black + np.round(black).astype(np.int64)
            img = img[0, :, :, :].reshape(height, width)
            frame = img + black * (-1)
            frame = frame.reshape(1, height, width, 1)
            tmp_in_x[..., -1] = frame[..., 0]
        img = ((np.reshape(img + 0.5, (height, width))) * 255).astype(np.uint8)

        net_output = img

        img_warped = warpRevBundle2(cv2.resize(after_temp[0], (width, height)), xmap, ymap)
        frames.append(img_warped)

        if cnt + 1 <= len(images):
            frame_unstable = images[cnt]
            cnt = cnt + 1
            ret = True
        else:
            ret = False  

        if (not ret):
            break
        length = length + 1
        if (length % 10 == 0):
            print("length: " + str(length))      
            print('fps={}'.format(length / tot_time))

        before_frames.append(frame)
        before_masks.append(black.reshape((1, height, width, 1)))
        before_frames.pop(0)
        before_masks.pop(0)
        after_frames.append(cvt_img2train(frame_unstable, 1))
        after_frames.pop(0)
        after_temp.append(frame_unstable)
        after_temp.pop(0)
except Exception as e:
    traceback.print_exc()
finally:
    print('total length={}'.format(length + 2))

    black_sum = np.zeros([height + 1, width + 1], dtype=np.int64)
    for i in range(height):
        for j in range(width):
            black_sum[i + 1][j + 1] = black_sum[i][j + 1] + black_sum[i + 1][j] - black_sum[i][j] + all_black[i][j]
    max_s = 0
    ans = []
    for i in range(0, int(math.floor(height * 0.5)), 10):
        print(i)
        print(max_s)
        for j in range(0, int(math.floor(width * 0.5)), 10):
            if (all_black[i][j] > 0):
                continue
            for hh in range(i, height):
                dw = int(math.floor(float(max_s) / (hh - i + 1)))
                for ww in range(j, width):
                    if (black_sum[hh + 1][ww + 1] - black_sum[hh + 1][j] - black_sum[i][ww + 1] + black_sum[i][j] > 0):
                        break
                    else:
                        s = (hh - i + 1) * (ww - j + 1)
                        if (s > max_s):
                            max_s = s
                            ans = [i, j, hh, ww]
    videoWriter = cv2.VideoWriter(os.path.join(production_dir, tmp_stable_video_filename), 
        cv2.VideoWriter_fourcc(*'MP4V'), 25, (ans[3] - ans[1] + 1, ans[2] - ans[0] + 1))
    for frame in frames:
        frame_ = frame[ans[0]:ans[2] + 1, ans[1]:ans[3] + 1, :]
        height, width, channels = frame_.shape
        videoWriter.write(frame_)
    videoWriter.release()

print("Frame shape:",height, width, channels)
full_stable_video_filename = os.path.join(production_dir, stable_video_filename)
full_tmp_stable_video_filename = os.path.join(production_dir, tmp_stable_video_filename)

# OpenCV doesn't ship with the H.264 codec needed to play the video in a notebook, due to licensing incompatibilities.
# For that reason I encode as MP4V and then post-process with FFmpeg.
if os.path.exists(full_stable_video_filename):
    os.remove(full_stable_video_filename)
    
!ffmpeg-4.4-amd64-static/ffmpeg -i {full_tmp_stable_video_filename} -c:v h264 {full_stable_video_filename}
    
if os.path.exists(full_tmp_stable_video_filename):
    os.remove(full_tmp_stable_video_filename)


inference with [1, 2, 4, 8, 16, 32]
totally 480 frames for stabilization
[2021-09-08 21:07:12.824 pytorch-1-6-gpu-py3-ml-g4dn-xlarge-594def216eaae0b31fbf025840e5:4025 INFO utils.py:27] RULE_JOB_STOP_SIGNAL_FILENAME: None
[2021-09-08 21:07:12.911 pytorch-1-6-gpu-py3-ml-g4dn-xlarge-594def216eaae0b31fbf025840e5:4025 INFO profiler_config_parser.py:102] Unable to find config at /opt/ml/input/config/profilerconfig.json. Profiler is disabled.
length: 10
fps=1.6208877013374174
length: 20
fps=2.9592381438328204
length: 30
fps=4.081515932912033
length: 40
fps=5.040014683956966
length: 50
fps=5.865948430498416
length: 60
fps=6.5900198131433765
length: 70
fps=7.227461575660989
length: 80
fps=7.792907440035924
length: 90
fps=8.297485019515678
length: 100
fps=8.75209073002504
length: 110
fps=9.159254882889485
length: 120
fps=9.531647451642298
length: 130
fps=9.870204863499486
length: 140
fps=10.180722996010047
length: 150
fps=10.449978923832168
length: 160
fps=10.705428023051246
length: 170
fps=10.9
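
The `finally` block of the cell above picks the output crop: it accumulates the per-frame `black` masks into `all_black`, builds a summed-area table (`black_sum`), and searches for the largest rectangle that contains no black pixel in any frame. The sketch below shows the same idea on a tiny grid in pure Python. The function names are hypothetical, and unlike the notebook, which only tries top-left corners on a 10-pixel stride in the upper-left quadrant, it tries every corner.

```python
def build_prefix_sum(grid):
    """Summed-area table with one extra row and column of zeros."""
    h, w = len(grid), len(grid[0])
    ps = [[0] * (w + 1) for _ in range(h + 1)]
    for i in range(h):
        for j in range(w):
            ps[i + 1][j + 1] = ps[i][j + 1] + ps[i + 1][j] - ps[i][j] + grid[i][j]
    return ps


def rect_sum(ps, top, left, bottom, right):
    """Sum of grid[top..bottom][left..right], bounds inclusive."""
    return ps[bottom + 1][right + 1] - ps[bottom + 1][left] - ps[top][right + 1] + ps[top][left]


def largest_clean_rect(grid):
    """Largest rectangle whose cells are all zero, as ((top, left, bottom, right), area)."""
    h, w = len(grid), len(grid[0])
    ps = build_prefix_sum(grid)
    best, best_area = None, 0
    for top in range(h):
        for left in range(w):
            if grid[top][left] > 0:
                continue
            for bottom in range(top, h):
                for right in range(left, w):
                    if rect_sum(ps, top, left, bottom, right) > 0:
                        break  # widening further can only add more black
                    area = (bottom - top + 1) * (right - left + 1)
                    if area > best_area:
                        best, best_area = (top, left, bottom, right), area
    return best, best_area


# 1 marks a pixel that was black in at least one frame
grid = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
]
print(largest_clean_rect(grid))  # prints: ((1, 1, 2, 3), 6)
```

The table makes each rectangle test a constant-time lookup, which is why the notebook can afford the nested search over a full frame.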

In [8]:
import cv2
import os

image_folder = 'DUTCode/images'
tmp_video_name = 'tmp_unstable.mp4'
video_name = 'unstable.mp4'
tmp_clipped_video_name = 'tmp_clipped_unstable.mp4'
clipped_video_name = 'clipped_unstable.mp4'

height = 0
width = 0
total_number = 480
    
for img in os.listdir(image_folder):
    file, ext = os.path.splitext(img)
    if img.endswith(".jpg"):
        file_name_corrected = os.path.join(image_folder,file.zfill(3)+'.jpg')
        uncorrected_file_name =  os.path.join(image_folder,img)
        #print(uncorrected_file_name +' => ' + file_name_corrected)
        os.rename(uncorrected_file_name,file_name_corrected)
        
        frame = cv2.imread(file_name_corrected)
        height, width, channels = frame.shape


fourcc = cv2.VideoWriter_fourcc(*'mp4v')
print(width, height)

video = cv2.VideoWriter(tmp_video_name, fourcc, 25, (width, height))

# Now that everything is renamed it should be in the correct order
for img_number in range(total_number):
    image = os.path.join(image_folder,str(img_number).zfill(3)+'.jpg')
    #print(image)
    img = cv2.imread(image)
    
    video.write(img)

video.release()


clipped_video = cv2.VideoWriter(tmp_clipped_video_name, fourcc, 25, (342, 206))

# Now that everything is renamed it should be in the correct order
for img_number in range(total_number):
    image = os.path.join(image_folder,str(img_number).zfill(3)+'.jpg')
    #print(image)
    img = cv2.imread(image)

    y_offset = (height-206)//2
    x_offset = (width-342)//2
    crop_img = img[y_offset:y_offset+206, x_offset:x_offset+342]
    #print(crop_img.shape)
    clipped_video.write(crop_img)

clipped_video.release()

print('Complete')

# OpenCV doesn't ship with the H.264 codec needed to play the video in a notebook, due to licensing incompatibilities.
# For that reason I encode as MP4V and then post-process with FFmpeg.
if os.path.exists(video_name):
    os.remove(video_name)
if os.path.exists(clipped_video_name):
    os.remove(clipped_video_name)

!ffmpeg-4.4-amd64-static/ffmpeg -i {tmp_video_name} -c:v h264 {video_name}
!ffmpeg-4.4-amd64-static/ffmpeg -i {tmp_clipped_video_name} -c:v h264 {clipped_video_name}
    
if os.path.exists(tmp_video_name):
    os.remove(tmp_video_name)
if os.path.exists(tmp_clipped_video_name):
    os.remove(tmp_clipped_video_name)

640 360
Complete
ffmpeg version 4.4-static https://johnvansickle.com/ffmpeg/  Copyright (c) 2000-2021 the FFmpeg developers
  built with gcc 8 (Debian 8.3.0-6)
  configuration: --enable-gpl --enable-version3 --enable-static --disable-debug --disable-ffplay --disable-indev=sndio --disable-outdev=sndio --cc=gcc --enable-fontconfig --enable-frei0r --enable-gnutls --enable-gmp --enable-libgme --enable-gray --enable-libaom --enable-libfribidi --enable-libass --enable-libvmaf --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librubberband --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libvorbis --enable-libopus --enable-libtheora --enable-libvidstab --enable-libvo-amrwbenc --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libdav1d --enable-libxvid --enable-libzvbi --enable-libzimg
  libavutil      56. 70.100 / 56. 70.100
  libavcodec     58.134.100 / 58.134.100
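
The rename step above exists because unpadded names such as `10.jpg` sort before `2.jpg` as strings. A sketch of an alternative that leaves the files untouched and sorts by the numeric stem instead is shown here. The helper name is hypothetical, and it assumes every `.jpg` stem is an integer frame number.

```python
import os


def frames_in_order(file_names):
    """Return the .jpg names sorted by frame number rather than as strings."""
    jpgs = [name for name in file_names if name.endswith('.jpg')]
    return sorted(jpgs, key=lambda name: int(os.path.splitext(name)[0]))


names = ['10.jpg', '2.jpg', '0.jpg', '1.jpg', 'notes.txt']
print(sorted(names)[:3])       # string order puts 10.jpg ahead of 2.jpg
print(frames_in_order(names))  # numeric order
```

In the cell above this would be used as `frames_in_order(os.listdir(image_folder))`, and it works for both padded and unpadded names.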
 

In [9]:
%%html
<center>Full Unstable (640x360)</center>
<video width="640" height="360" controls autoplay>
    <source src="unstable.mp4" type="video/mp4">
</video>

<table>
  <tr>
    <td>
        <center>Clipped Unstabilized (342x206)</center>
        <video width="342" height="206" controls autoplay>
          <source src="clipped_unstable.mp4" type="video/mp4">
        </video>
    </td>
    <td>
        <center>Stabilized (342x206)</center>
        <video width="342" height="206" controls autoplay>
          <source src="results/StabNet_stable.mp4" type="video/mp4">
        </video>
    </td>
  </tr>
</table>



## Running DUT Stabilizer

These are the interesting pieces:

```bash
OutputBasePath='results/'
SmootherPath='ckpt/smoother.pth'
RFDetPath='ckpt/RFDet_640.pth.tar'
PWCNetPath='ckpt/network-default.pytorch'
MotionProPath='ckpt/MotionPro.pth'
InputPath='images/'
```

```bash
# Run the DUT model
echo " Stabiling using the DUT model "
echo "-----------------------------------"

python ./scripts/DUTStabilizer.py \
	--SmootherPath=$SmootherPath \
    --RFDetPath=$RFDetPath \
    --PWCNetPath=$PWCNetPath \
    --MotionPro=$MotionProPath \
    --InputBasePath=$InputPath \
    --OutputBasePath=$OutputBasePath 
```

Given this, I will pull the code from DUTStabilizer.py to explore it. I will modify it as needed to support running from the notebook so I can really get in and explore what is going on.

```python
import torch
import torch.nn as nn
import numpy as np
import os
import math
import cv2
import sys
parentddir = os.path.abspath(os.path.join(os.path.dirname(__file__), os.path.pardir))
sys.path.append(parentddir)

from models.DUT.DUT import DUT
from tqdm import tqdm
from utils.WarpUtils import warpListImage
from configs.config import cfg
import argparse

torch.set_grad_enabled(False)

def parse_args():
    parser = argparse.ArgumentParser(description='Control for stabilization model')
    parser.add_argument('--SmootherPath', help='the path to pretrained smoother model, blank for jacobi solver', default='')
    parser.add_argument('--RFDetPath', help='pretrained RFNet path, blank for corner detection', default='')
    parser.add_argument('--PWCNetPath', help='pretrained pwcnet path, blank for KTL tracker', default='')
    parser.add_argument('--MotionProPath', help='pretrained motion propagation model path, blank for median', default='')
    parser.add_argument('--SingleHomo', help='whether use multi homograph to do motion estimation', action='store_true')
    parser.add_argument('--InputBasePath', help='path to input videos (cliped as frames)', default='')
    parser.add_argument('--OutputBasePath', help='path to save output stable videos', default='./')
    parser.add_argument('--OutNamePrefix', help='prefix name before the output video name', default='')
    parser.add_argument('--MaxLength', help='max number of frames can be dealt with one time', type=int, default=1200)
    parser.add_argument('--Repeat', help='max number of frames can be dealt with one time', type=int, default=50)
    return parser.parse_args()

def generateStable(model, base_path, outPath, outPrefix, max_length, args):

    image_base_path = base_path
    image_len = min(len([ele for ele in os.listdir(image_base_path) if ele[-4:] == '.jpg']), max_length)
    # read input video
    images = []
    rgbimages = []
    for i in range(image_len):
        image = cv2.imread(os.path.join(image_base_path, '{}.jpg'.format(i)), 0)
        image = image * (1. / 255.)
        image = cv2.resize(image, (cfg.MODEL.WIDTH, cfg.MODEL.HEIGHT))
        images.append(image.reshape(1, 1, cfg.MODEL.HEIGHT, cfg.MODEL.WIDTH))

        image = cv2.imread(os.path.join(image_base_path, '{}.jpg'.format(i)))
        image = cv2.resize(image, (cfg.MODEL.WIDTH, cfg.MODEL.HEIGHT))
        rgbimages.append(np.expand_dims(np.transpose(image, (2, 0, 1)), 0))

    x = np.concatenate(images, 1).astype(np.float32)
    x = torch.from_numpy(x).unsqueeze(0)

    x_RGB = np.concatenate(rgbimages, 0).astype(np.float32)
    x_RGB = torch.from_numpy(x_RGB).unsqueeze(0)

    with torch.no_grad():
        origin_motion, smoothPath = model.inference(x.cuda(), x_RGB.cuda(), repeat=args.Repeat)

    origin_motion = origin_motion.cpu().numpy()
    smoothPath = smoothPath.cpu().numpy()
    origin_motion = np.transpose(origin_motion[0], (2, 3, 1, 0))
    smoothPath = np.transpose(smoothPath[0], (2, 3, 1, 0))

    x_paths = origin_motion[:, :, :, 0]
    y_paths = origin_motion[:, :, :, 1]
    sx_paths = smoothPath[:, :, :, 0]
    sy_paths = smoothPath[:, :, :, 1]

    frame_rate = 25
    frame_width = cfg.MODEL.WIDTH
    frame_height = cfg.MODEL.HEIGHT
    
    print("generate stabilized video...")
    fourcc = cv2.VideoWriter_fourcc(*'MP4V')
    out = cv2.VideoWriter(os.path.join(outPath, outPrefix + 'DUT_stable.mp4'), fourcc, frame_rate, (frame_width, frame_height))

    new_x_motion_meshes = sx_paths - x_paths
    new_y_motion_meshes = sy_paths - y_paths

    outImages = warpListImage(rgbimages, new_x_motion_meshes, new_y_motion_meshes)
    outImages = outImages.numpy().astype(np.uint8)
    outImages = [np.transpose(outImages[idx], (1, 2, 0)) for idx in range(outImages.shape[0])]
    for frame in tqdm(outImages):
        VERTICAL_BORDER = 60
        HORIZONTAL_BORDER = 80

        new_frame = frame[VERTICAL_BORDER:-VERTICAL_BORDER, HORIZONTAL_BORDER:-HORIZONTAL_BORDER]
        new_frame = cv2.resize(new_frame, (frame.shape[1], frame.shape[0]), interpolation=cv2.INTER_CUBIC)
        out.write(new_frame)
    out.release()

if __name__ == "__main__":

    args = parse_args()
    print(args)

    smootherPath = args.SmootherPath
    RFDetPath = args.RFDetPath
    PWCNetPath = args.PWCNetPath
    MotionProPath = args.MotionProPath
    homo = not args.SingleHomo
    inPath = args.InputBasePath
    outPath = args.OutputBasePath
    outPrefix = args.OutNamePrefix
    maxlength = args.MaxLength

    model = DUT(SmootherPath=smootherPath, RFDetPath=RFDetPath, PWCNetPath=PWCNetPath, MotionProPath=MotionProPath, homo=homo)
    model.cuda()
    model.eval()

    generateStable(model, inPath, outPath, outPrefix, maxlength, args)
```
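
The fixed border crop at the end of `generateStable` determines how much of each warped frame survives. A minimal sketch of that arithmetic, assuming a 640x480 model frame size (the helper name `retained_area_ratio` is illustrative and is not part of DUTCode):

```python
def retained_area_ratio(width, height, horizontal_border, vertical_border):
    """Fraction of the frame area left after trimming a border from every side."""
    cropped_width = width - 2 * horizontal_border
    cropped_height = height - 2 * vertical_border
    if cropped_width <= 0 or cropped_height <= 0:
        raise ValueError("borders remove the whole frame")
    return (cropped_width * cropped_height) / (width * height)


# Border sizes taken from generateStable: VERTICAL_BORDER = 60, HORIZONTAL_BORDER = 80.
ratio = retained_area_ratio(width=640, height=480, horizontal_border=80, vertical_border=60)
print(ratio)

# The crop keeps the 4:3 aspect ratio, so resizing back does not stretch the image.
assert (640 - 2 * 80) * 480 == (480 - 2 * 60) * 640
```

Under these assumptions a little over half of each warped frame is kept before it is scaled back up, regardless of how much border the warp actually exposed.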

In [10]:
# From deploy_samples.sh

#OutputBasePath='results/'
#SmootherPath='ckpt/smoother.pth'
#RFDetPath='ckpt/RFDet_640.pth.tar'
#PWCNetPath='ckpt/network-default.pytorch'
#MotionProPath='ckpt/MotionPro.pth'
#InputPath='images/'

#--SmootherPath=$SmootherPath \
#--RFDetPath=$RFDetPath \
#--PWCNetPath=$PWCNetPath \
#--MotionPro=$MotionProPath \
#--InputBasePath=$InputPath \
#--OutputBasePath=$OutputBasePath 
class DUTArguments:
    
    SmootherPath='DUTPretrained/ckpt/smoother.pth'
    RFDetPath='DUTPretrained/ckpt/RFDet_640.pth.tar'
    PWCNetPath='DUTPretrained/ckpt/network-default.pytorch'
    MotionProPath='DUTPretrained/ckpt/MotionPro.pth'
    SingleHomo=True  # passed straight through as homo= in the DUT(...) call, so True selects the multi-homography mode
    OutputBasePath = 'results/'
    InputBasePath = 'DUTCode/images/'
    MaxLength = 1200
    OutNamePrefix = ''
    Repeat = 50
    
dut_args = DUTArguments()
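
A plain class works as a stand-in for the parsed command-line arguments. An alternative sketch, assuming only that the downstream code reads attributes off the object: `argparse.Namespace` gives the same attribute access plus a readable `repr`. The field values are copied from `DUTArguments`; the variable name `dut_args_ns` is illustrative.

```python
from argparse import Namespace

# Same fields as DUTArguments, held in the type that parse_args() itself returns.
dut_args_ns = Namespace(
    SmootherPath='DUTPretrained/ckpt/smoother.pth',
    RFDetPath='DUTPretrained/ckpt/RFDet_640.pth.tar',
    PWCNetPath='DUTPretrained/ckpt/network-default.pytorch',
    MotionProPath='DUTPretrained/ckpt/MotionPro.pth',
    SingleHomo=True,
    OutputBasePath='results/',
    InputBasePath='DUTCode/images/',
    MaxLength=1200,
    OutNamePrefix='',
    Repeat=50,
)

print(dut_args_ns.Repeat)
print(vars(dut_args_ns)['MaxLength'])
```

Printing the namespace shows every setting at once, which is handy when comparing runs.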

In [11]:
def generateStable(model, base_path, outPath, outPrefix, max_length, args):

    image_base_path = base_path
    image_len = min(len([ele for ele in os.listdir(image_base_path) if ele[-4:] == '.jpg']), max_length)
    # read input video
    images = []
    rgbimages = []
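    for i in range(image_len):
        # Read from the same folder that image_len was counted in
        frame_path = os.path.join(image_base_path, '{:03d}.jpg'.format(i))

        # Grayscale copy scaled to [0, 1] (first input to model.inference)
        image = cv2.imread(frame_path, 0)
        if image is None:
            raise FileNotFoundError(frame_path)
        image = image * (1. / 255.)
        image = cv2.resize(image, (cfg.MODEL.WIDTH, cfg.MODEL.HEIGHT))
        images.append(image.reshape(1, 1, cfg.MODEL.HEIGHT, cfg.MODEL.WIDTH))

        # Color copy (BGR channel order, as loaded by OpenCV), also used for the warped output
        image = cv2.imread(frame_path)
        image = cv2.resize(image, (cfg.MODEL.WIDTH, cfg.MODEL.HEIGHT))
        rgbimages.append(np.expand_dims(np.transpose(image, (2, 0, 1)), 0))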

    x = np.concatenate(images, 1).astype(np.float32)
    x = torch.from_numpy(x).unsqueeze(0)

    x_RGB = np.concatenate(rgbimages, 0).astype(np.float32)
    x_RGB = torch.from_numpy(x_RGB).unsqueeze(0)

    with torch.no_grad():
        origin_motion, smoothPath = model.inference(x.cuda(), x_RGB.cuda(), repeat=args.Repeat)

    origin_motion = origin_motion.cpu().numpy()
    smoothPath = smoothPath.cpu().numpy()
    origin_motion = np.transpose(origin_motion[0], (2, 3, 1, 0))
    smoothPath = np.transpose(smoothPath[0], (2, 3, 1, 0))

    x_paths = origin_motion[:, :, :, 0]
    y_paths = origin_motion[:, :, :, 1]
    sx_paths = smoothPath[:, :, :, 0]
    sy_paths = smoothPath[:, :, :, 1]

    frame_rate = 25
    frame_width = cfg.MODEL.WIDTH
    frame_height = cfg.MODEL.HEIGHT

    print("generate stabilized video...")
    fourcc = cv2.VideoWriter_fourcc(*'MP4V')
    out = cv2.VideoWriter(os.path.join(outPath, outPrefix + 'tmp_DUT_stable.mp4'), fourcc, frame_rate, (frame_width, frame_height))
    print(frame_width, frame_height)

    new_x_motion_meshes = sx_paths - x_paths
    new_y_motion_meshes = sy_paths - y_paths

    outImages = warpListImage(rgbimages, new_x_motion_meshes, new_y_motion_meshes)
    outImages = outImages.numpy().astype(np.uint8)
    outImages = [np.transpose(outImages[idx], (1, 2, 0)) for idx in range(outImages.shape[0])]
    for frame in tqdm(outImages):
        VERTICAL_BORDER = 60
        HORIZONTAL_BORDER = 80

        new_frame = frame[VERTICAL_BORDER:-VERTICAL_BORDER, HORIZONTAL_BORDER:-HORIZONTAL_BORDER]
        new_frame = cv2.resize(new_frame, (frame.shape[1], frame.shape[0]), interpolation=cv2.INTER_CUBIC)
        #print(new_frame.shape)
        out.write(new_frame)
    out.release()
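
The warp field passed to `warpListImage` is the difference between the smoothed path and the original path. A one-dimensional sketch of that idea follows, with a moving average standing in for DUT's learned smoother (an assumption made purely for illustration; `moving_average`, `jitter`, and the sample path are hypothetical):

```python
def moving_average(path, radius=1):
    """Smooth a 1-D path with a centered window that shrinks at the ends."""
    smoothed = []
    for i in range(len(path)):
        window = path[max(0, i - radius): i + radius + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def jitter(path):
    """Sum of absolute second differences; lower means smoother motion."""
    return sum(abs(path[i + 1] - 2 * path[i] + path[i - 1]) for i in range(1, len(path) - 1))


# Hypothetical camera x-position per frame: steady drift plus frame-to-frame shake.
x_path = [0, 2, 1, 3, 2, 4, 3, 5]
sx_path = moving_average(x_path)

# Same form as new_x_motion_meshes = sx_paths - x_paths in generateStable.
offsets = [s - x for s, x in zip(sx_path, x_path)]

print(sx_path)
print(offsets)
print(jitter(x_path), jitter(sx_path))
```

Adding each offset back to the original position lands exactly on the smoothed path, which is why the offsets can be applied to the frames as a per-frame correction.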

In [12]:
from models.DUT.DUT import DUT
from utils.WarpUtils import warpListImage
from configs.config import cfg
import numpy as np
from tqdm import tqdm

model = DUT(SmootherPath=dut_args.SmootherPath, RFDetPath=dut_args.RFDetPath, PWCNetPath=dut_args.PWCNetPath, MotionProPath=dut_args.MotionProPath, homo=dut_args.SingleHomo)
model.cuda()
model.eval()

generateStable(model, dut_args.InputBasePath, dut_args.OutputBasePath, dut_args.OutNamePrefix, dut_args.MaxLength, dut_args)

# OpenCV doesn't ship with the H.264 codec needed to play the video in a notebook, due to licensing incompatibilities.
# For that reason I encode as MP4V and then post-process with FFmpeg.
video_name = os.path.join(dut_args.OutputBasePath, 'DUT_stable.mp4')
tmp_video_name = os.path.join(dut_args.OutputBasePath, 'tmp_DUT_stable.mp4')

if os.path.exists(video_name):
    os.remove(video_name)

!ffmpeg-4.4-amd64-static/ffmpeg -i {tmp_video_name} -c:v h264 {video_name}
    
if os.path.exists(tmp_video_name):
    os.remove(tmp_video_name)


-------------model configuration------------------------
using RFNet ...
using PWCNet for motion estimation...
using Motion Propagation model with multi homo...
using Deep Smoother Model...
------------------reload parameters-------------------------
reload Smoother params
successfully load 12 params for smoother
reload RFDet Model
successfully load 100 params for RFDet
reload PWCNet Model
reload MotionPropagation Model
successfully load 21 params for MotionPropagation
detect keypoints ....


  None, None, :, :
	nonzero()
Consider using one of the following signatures instead:
	nonzero(*, bool as_tuple) (Triggered internally at  /codebuild/output/src811146734/src/torch/csrc/utils/python_arg_parser.cpp:766.)
  kpts = im_topk.nonzero()  # (B*topk, 4)


estimate motion ....
motion propagation ....
generate stabilized video...
640 480


100%|██████████| 480/480 [00:02<00:00, 221.47it/s]


ffmpeg version 4.4-static https://johnvansickle.com/ffmpeg/  Copyright (c) 2000-2021 the FFmpeg developers
  built with gcc 8 (Debian 8.3.0-6)
  configuration: --enable-gpl --enable-version3 --enable-static --disable-debug --disable-ffplay --disable-indev=sndio --disable-outdev=sndio --cc=gcc --enable-fontconfig --enable-frei0r --enable-gnutls --enable-gmp --enable-libgme --enable-gray --enable-libaom --enable-libfribidi --enable-libass --enable-libvmaf --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librubberband --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libvorbis --enable-libopus --enable-libtheora --enable-libvidstab --enable-libvo-amrwbenc --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libdav1d --enable-libxvid --enable-libzvbi --enable-libzimg
  libavutil      56. 70.100 / 56. 70.100
  libavcodec     58.134.100 / 58.134.100
  libavformat    5
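
The `!ffmpeg` shell escape in the cell above can also be expressed with `subprocess`, which surfaces failures as Python exceptions. This is a sketch rather than the notebook's method: the helper names are hypothetical, the binary path is the one used in the cell, and `-y`, `-c:v libx264`, and `-pix_fmt yuv420p` are standard FFmpeg options chosen here as assumptions about what a browser-friendly encode needs.

```python
import shlex
import subprocess


def build_reencode_command(ffmpeg_path, src, dst):
    """Return the argument list for re-encoding src to H.264 at dst."""
    return [
        ffmpeg_path,
        '-y',                    # overwrite dst if it already exists
        '-i', src,               # MP4V file written by cv2.VideoWriter
        '-c:v', 'libx264',       # H.264 encoder
        '-pix_fmt', 'yuv420p',   # pixel format most browsers can decode
        dst,
    ]


def reencode(ffmpeg_path, src, dst):
    """Run FFmpeg and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_reencode_command(ffmpeg_path, src, dst), check=True)


cmd = build_reencode_command(
    'ffmpeg-4.4-amd64-static/ffmpeg',
    'results/tmp_DUT_stable.mp4',
    'results/DUT_stable.mp4',
)
# Show the command that reencode() would execute, quoted for a shell.
print(' '.join(shlex.quote(part) for part in cmd))
```

With `-y` in the command, deleting an existing output file beforehand is no longer required.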

In [13]:
%%html

<table>
  <tr>
    <td>
        <center>Full Unstable (640x360)</center>
        <video width="640" height="360" controls autoplay>
            <source src="unstable.mp4" type="video/mp4">
        </video>
    </td>
  </tr>
  <tr>
    <td>
        <center>Stabilized (640x480)</center>
        <video width="640" height="480" controls autoplay>
          <source src="results/DUT_stable.mp4" type="video/mp4">
        </video>
    </td>
  </tr>
</table>



## Running DIFRINT Stabilizer

These are the interesting pieces:

```bash
DIFPath='DUTPretrained/ckpt/DIFNet2.pth'
InputPath='images/'
OutputBasePath='results/'
```

```bash
# Run the DIFRINT model
echo " Stabilizing using the DIFRINT model "
echo "-----------------------------------"

python ./scripts/DIFRINTStabilizer.py \
    --modelPath=$DIFPath \
    --InputBasePath=$InputPath \
    --OutputBasePath=$OutputBasePath \
    --cuda 
```

So given this I will pull the code from DIFRINTStabilizer.py to explore it. I will modify it as needed to support running from the notebook so I can really get in and explore what is going on.

```python
import argparse
import os
import sys
from shutil import copyfile

import torch
import torch.nn as nn
from torch.autograd import Variable
parentddir = os.path.abspath(os.path.join(os.path.dirname(__file__), os.path.pardir))
sys.path.append(parentddir)
from models.DIFRINT.models import DIFNet2
from models.DIFRINT.pwcNet import PwcNet

from PIL import Image
import numpy as np
import math
import pdb
import time
import cv2

parser = argparse.ArgumentParser()
parser.add_argument('--modelPath', default='./trained_models/DIFNet2.pth')  # 2
parser.add_argument('--InputBasePath', default='')
parser.add_argument('--OutputBasePath', default='./')
parser.add_argument('--temp_file', default='./DIFRINT_TEMP/')
parser.add_argument('--n_iter', type=int, default=3,
                    help='number of stabilization iterations')
parser.add_argument('--skip', type=int, default=2,
                    help='number of frame skips for interpolation')
parser.add_argument('--desiredWidth', type=int, default=640,
                    help='width of the input video')
parser.add_argument('--desiredHeight', type=int, default=480,
                    help='height of the input video')
parser.add_argument('--cuda', action='store_true', help='use GPU computation')
opt = parser.parse_args()
print(opt)

if torch.cuda.is_available() and not opt.cuda:
    print("WARNING: You have a CUDA device, so you should probably run with --cuda")

##########################################################

# Networks
DIFNet = DIFNet2()

# Place Network in cuda memory
if opt.cuda:
    DIFNet.cuda()

# DataParallel
DIFNet = nn.DataParallel(DIFNet)
DIFNet.load_state_dict(torch.load(opt.modelPath))
DIFNet.eval()

if not os.path.exists(opt.OutputBasePath):
    os.mkdir(opt.OutputBasePath)

if not os.path.exists(opt.temp_file):
    os.mkdir(opt.temp_file)

##########################################################

frameList = [ele for ele in os.listdir(opt.InputBasePath) if ele[-4:] == '.jpg']
frameList = sorted(frameList, key=lambda x: int(x[:-4]))

if os.path.exists(opt.temp_file):
    copyfile(opt.InputBasePath + frameList[0], opt.temp_file + frameList[0])
    copyfile(opt.InputBasePath + frameList[-1], opt.temp_file + frameList[-1])
else:
    os.makedirs(opt.temp_file)
    copyfile(opt.InputBasePath + frameList[0], opt.temp_file + frameList[0])
    copyfile(opt.InputBasePath + frameList[-1], opt.temp_file + frameList[-1])
# end

# Generate output sequence
for num_iter in range(opt.n_iter):
    idx = 1
    print('\nIter: ' + str(num_iter+1))
    for f in frameList[1:-1]:
        if f.endswith('.jpg'):
            if num_iter == 0:
                src = opt.InputBasePath
            else:
                src = opt.temp_file
            # end

            if idx < opt.skip or idx > (len(frameList)-1-opt.skip):
                skip = 1
            else:
                skip = opt.skip


            fr_g1 = torch.cuda.FloatTensor(np.array(Image.open(opt.temp_file + '%d.jpg' % (
                int(f[:-4])-skip)).resize((opt.desiredWidth, opt.desiredHeight))).transpose(2, 0, 1).astype(np.float32)[None, :, :, :] / 255.0)

            fr_g3 = torch.cuda.FloatTensor(np.array(Image.open(
                src + '%d.jpg' % (int(f[:-4])+skip)).resize((opt.desiredWidth, opt.desiredHeight))).transpose(2, 0, 1).astype(np.float32)[None, :, :, :] / 255.0)


            fr_o2 = torch.cuda.FloatTensor(np.array(Image.open(
                opt.InputBasePath + f).resize((opt.desiredWidth, opt.desiredHeight))).transpose(2, 0, 1).astype(np.float32)[None, :, :, :] / 255.0)

            with torch.no_grad():
                fhat, I_int = DIFNet(fr_g1, fr_g3, fr_o2,
                                     fr_g3, fr_g1, 0.5)  # Notice 0.5

            # Save image
            img = Image.fromarray(
                np.uint8(fhat.cpu().squeeze().permute(1, 2, 0)*255))
            img.save(opt.temp_file + f)

            sys.stdout.write('\rFrame: ' + str(idx) +
                             '/' + str(len(frameList)-2))
            sys.stdout.flush()

            idx += 1
        # end
    # end

frame_rate = 25
frame_width = opt.desiredWidth
frame_height = opt.desiredHeight

print("generate stabilized video...")
fourcc = cv2.VideoWriter_fourcc(*'MP4V')
out = cv2.VideoWriter(opt.OutputBasePath + '/DIFRINT_stable.mp4', fourcc, frame_rate, (frame_width, frame_height))

for f in frameList:
    if f.endswith('.jpg'):
        img = cv2.imread(os.path.join(opt.temp_file, f))
        out.write(img)

out.release()

```
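
The frame-selection rule inside the loop is easier to see in isolation. The sketch below reproduces only the index arithmetic, assuming frames are numbered consecutively from 0 so that a frame's number equals its position in `frameList`; the function name `neighbor_frames` is illustrative. In the script, the earlier neighbor is always read from the temp folder, while the later neighbor comes from the input folder on the first iteration and from the temp folder afterwards.

```python
def neighbor_frames(idx, n_frames, max_skip):
    """Return (previous, next) frame numbers used to interpolate frame idx."""
    # Near either end of the sequence a full skip would run out of range,
    # so the script falls back to the immediately adjacent frames.
    if idx < max_skip or idx > (n_frames - 1 - max_skip):
        skip = 1
    else:
        skip = max_skip
    return idx - skip, idx + skip


n_frames = 480  # frame count of the sample clip processed in this notebook
for idx in (1, 2, 477, 478):
    print(idx, neighbor_frames(idx, n_frames, max_skip=2))
```

Every interior frame therefore gets two neighbors that exist on disk, which is why the first and last frames have to be copied into the temp folder before the loop starts.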

In [14]:
class DIFNetArguments:
    
    modelPath = 'DUTPretrained/ckpt/DIFNet2.pth'
    InputBasePath = 'DUTCode/images/'
    OutputBasePath = 'results/'
    temp_file = './DUTCode/DIFRINT_TEMP/'
    skip = 2
    desiredWidth = 640
    desiredHeight = 480
    n_iter = 3

difnet_args = DIFNetArguments()

In [15]:
from models.DIFRINT.models import DIFNet2
from models.DIFRINT.pwcNet import PwcNet

# Networks
DIFNet = DIFNet2()
DIFNet.cuda()

# DataParallel
DIFNet = nn.DataParallel(DIFNet)
DIFNet.load_state_dict(torch.load(difnet_args.modelPath))
DIFNet.eval()

if not os.path.exists(difnet_args.OutputBasePath):
    os.mkdir(difnet_args.OutputBasePath)

if not os.path.exists(difnet_args.temp_file):
    os.mkdir(difnet_args.temp_file)
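
Wrapping the network in `nn.DataParallel` before `load_state_dict` suggests that the checkpoint was saved from a wrapped model, whose parameter names carry a `module.` prefix. That is an inference from the code; the checkpoint itself has not been inspected here. Under that assumption, a hypothetical helper such as `strip_module_prefix` would let the weights load into a bare `DIFNet2()` instead. Plain dictionaries stand in for a real state dict so the sketch runs without PyTorch.

```python
from collections import OrderedDict


def strip_module_prefix(state_dict, prefix='module.'):
    """Return a copy of state_dict with a leading DataParallel prefix removed from each key."""
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned


# Made-up parameter names and values, for illustration only.
wrapped = OrderedDict([('module.conv1.weight', 1.0), ('module.conv1.bias', 0.5)])
bare = strip_module_prefix(wrapped)
print(list(bare))
```

Keeping the `nn.DataParallel` wrapper, as the cell does, avoids the renaming step entirely and is the simpler choice on a single-GPU instance.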

In [16]:
from shutil import copyfile

frameList = [ele for ele in os.listdir(difnet_args.InputBasePath) if ele[-4:] == '.jpg']
frameList = sorted(frameList, key=lambda x: int(x[:-4]))

# Seed the temp folder with the first and last frames. The loop below never
# re-synthesizes them, so they act as fixed endpoints for the interpolation.
os.makedirs(difnet_args.temp_file, exist_ok=True)
copyfile(difnet_args.InputBasePath + frameList[0], difnet_args.temp_file + frameList[0])
copyfile(difnet_args.InputBasePath + frameList[-1], difnet_args.temp_file + frameList[-1])

# Generate output sequence
for num_iter in range(difnet_args.n_iter):
    idx = 1
    print('\nIter: ' + str(num_iter+1))
    for f in frameList[1:-1]:
        if f.endswith('.jpg'):
            if num_iter == 0:
                src = difnet_args.InputBasePath
            else:
                src = difnet_args.temp_file
            # end

            if idx < difnet_args.skip or idx > (len(frameList)-1-difnet_args.skip):
                skip = 1
            else:
                skip = difnet_args.skip


            fr_g1 = torch.cuda.FloatTensor(np.array(Image.open(difnet_args.temp_file + '%03d.jpg' % (
                int(f[:-4])-skip)).resize((difnet_args.desiredWidth, difnet_args.desiredHeight))).transpose(2, 0, 1).astype(np.float32)[None, :, :, :] / 255.0)

            fr_g3 = torch.cuda.FloatTensor(np.array(Image.open(
                src + '%03d.jpg' % (int(f[:-4])+skip)).resize((difnet_args.desiredWidth, difnet_args.desiredHeight))).transpose(2, 0, 1).astype(np.float32)[None, :, :, :] / 255.0)


            fr_o2 = torch.cuda.FloatTensor(np.array(Image.open(
                difnet_args.InputBasePath + f).resize((difnet_args.desiredWidth, difnet_args.desiredHeight))).transpose(2, 0, 1).astype(np.float32)[None, :, :, :] / 255.0)

            with torch.no_grad():
                fhat, I_int = DIFNet(fr_g1, fr_g3, fr_o2,
                                     fr_g3, fr_g1, 0.5)  # Notice 0.5

            # Save image
            img = Image.fromarray(
                np.uint8(fhat.cpu().squeeze().permute(1, 2, 0)*255))
            img.save(difnet_args.temp_file + f)

            sys.stdout.write('\rFrame: ' + str(idx) +
                             '/' + str(len(frameList)-2))
            sys.stdout.flush()

            idx += 1
        # end
    # end

frame_rate = 25
frame_width = difnet_args.desiredWidth
frame_height = difnet_args.desiredHeight

print()
print("generate stabilized video...")
fourcc = cv2.VideoWriter_fourcc(*'MP4V')
out = cv2.VideoWriter(os.path.join(difnet_args.OutputBasePath, 'tmp_DIFRINT_stable.mp4'), fourcc, frame_rate, (frame_width, frame_height))

for f in frameList:
    if f.endswith('.jpg'):
        img = cv2.imread(os.path.join(difnet_args.temp_file, f))
        out.write(img)

out.release()

# OpenCV doesn't ship with the H.264 codec needed to play the video in a notebook, due to licensing incompatibilities.
# For that reason I encode as MP4V and then post-process with FFmpeg.
video_name = os.path.join(difnet_args.OutputBasePath, 'DIFRINT_stable.mp4')
tmp_video_name = os.path.join(difnet_args.OutputBasePath, 'tmp_DIFRINT_stable.mp4')

if os.path.exists(video_name):
    os.remove(video_name)

!ffmpeg-4.4-amd64-static/ffmpeg -i {tmp_video_name} -c:v h264 {video_name}
    
if os.path.exists(tmp_video_name):
    os.remove(tmp_video_name)


Iter: 1




Frame: 478/478
Iter: 2
Frame: 478/478
Iter: 3
Frame: 478/478
generate stabilized video...
ffmpeg version 4.4-static https://johnvansickle.com/ffmpeg/  Copyright (c) 2000-2021 the FFmpeg developers
  built with gcc 8 (Debian 8.3.0-6)
  configuration: --enable-gpl --enable-version3 --enable-static --disable-debug --disable-ffplay --disable-indev=sndio --disable-outdev=sndio --cc=gcc --enable-fontconfig --enable-frei0r --enable-gnutls --enable-gmp --enable-libgme --enable-gray --enable-libaom --enable-libfribidi --enable-libass --enable-libvmaf --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librubberband --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libvorbis --enable-libopus --enable-libtheora --enable-libvidstab --enable-libvo-amrwbenc --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libdav1d --enable-libxvid --enable-libzvbi --enable-libzimg
  libavuti

In [17]:
%%html

<table>
  <tr>
    <td>
        <center>Full Unstable (640x360)</center>
        <video width="640" height="360" controls autoplay>
            <source src="unstable.mp4" type="video/mp4">
        </video>
    </td>
  </tr>
  <tr>
    <td>
        <center>Stabilized (640x480)</center>
        <video width="640" height="480" controls autoplay>
          <source src="results/DIFRINT_stable.mp4" type="video/mp4">
        </video>
    </td>
  </tr>
</table>



## Side-by-Side Qualitative Comparison of Stabilized Output

The following cell generates a side-by-side comparison to get a qualitative feel for the output. This notebook could be adjusted to send different video types through, such as hand-tracked star field imaging.

In [18]:
%%html

<table>
  <tr>
    <td>
        <center>Full Unstable (640x360)</center>
        <video width="320" height="180" controls autoplay>
            <source src="unstable.mp4" type="video/mp4">
        </video>
    </td>
    <td>
        <center>Clipped Unstabilized (342x206)</center>
        <video width="342" height="206" controls autoplay>
          <source src="clipped_unstable.mp4" type="video/mp4">
        </video>
    </td>
  </tr>
  <tr>
    <td>
        <center>StabNet Stabilization (342x206)</center>
        <video width="342" height="206" controls autoplay>
          <source src="results/StabNet_stable.mp4" type="video/mp4">
        </video>
    </td>
    <td>
        <center>DIFRINT Stabilization (640x480)</center>
        <video width="320" height="240" controls autoplay>
          <source src="results/DIFRINT_stable.mp4" type="video/mp4">
        </video>
    </td>
    <td>
        <center>DUT Stabilization (640x480)</center>
        <video width="320" height="240" controls autoplay>
          <source src="results/DUT_stable.mp4" type="video/mp4">
        </video>
    </td>
  </tr>
</table>




From the side-by-side comparison you can see that StabNet and DIFRINT both introduce artifacts or distortions into the image. DUT appears to stabilize the video (watch the blue line towards the end of the video) without visible distortion.

All three methods lose some of the original image data.

These are the quantitative numbers from the DUT stabilization paper (Dec 2020). They are noted here for reference; reproducing the analysis was judged unnecessary for a model-to-model comparison. On average, DUT outperforms the other two ML models (DIFRINT and StabNet) and the algorithmic approaches (Meshflow and SubSpace) on stability and distortion, while DIFRINT retains more of the frame on cropping ratio. In addition, DUT can be tuned through the number of smoothing iterations to handle videos that differ in homography (number of planes of motion).

### Experiment Settings
Unstable videos from DeepStab were used for training. Five categories of unstable videos were used as the test set. Three metrics were used for quantitative evaluation: cropping ratio, distortion, and stability. Cropping ratio measures the fraction of the frame area that remains after stabilization, and distortion measures the distortion level after stabilization. Stability measures how stable a video is by frequency-domain analysis. All the metrics are in the range [0, 1], and a larger value denotes better performance. More implementation details, user study results, and robustness evaluation can be found in the paper's supplementary material.

The DUT paper references the following work for its method of collecting benchmarks: "Bundled Camera Paths for Video Stabilization" (https://www.microsoft.com/en-us/research/wp-content/uploads/2016/11/Stabilization_SIGGRAPH13.pdf)

### Quantitative Results
#### <center> Stability scores of different methods </center>

| Method   | Regular | Parallax | Running | QuickRot |   Crowd  |   Avg.   |
|----------|:-------:|:--------:|:-------:|:--------:|:--------:|:--------:|
| Meshflow | 0.843 | 0.793 | 0.839 | 0.801 | 0.774 | 0.810 |
| SubSpace | 0.837 | 0.760 | 0.829 |   \   | 0.730 | 0.789* |
| DIFRINT | 0.838 | 0.808 | 0.822 | 0.835 | 0.791 | 0.819 |
| StabNet | 0.838 | 0.769 | 0.818 | 0.785 | 0.741 | 0.790 |
| DUT | 0.843 | 0.813 | 0.841 | 0.877 | 0.792 | 0.833 |

<center> * The average score is not accurate since SubSpace fails to stabilize some of the videos in the category of Quick Rotation. </center>

#### <center> Distortion scores of different methods </center>

| Method   | Regular | Parallax | Running | QuickRot |   Crowd  |   Avg.   |
|----------|:-------:|:--------:|:-------:|:--------:|:--------:|:--------:|
| Meshflow | 0.898 | 0.716 | 0.764 | 0.763 | 0.756 | 0.779 |
| SubSpace | 0.973 | 0.855 | 0.939 | \ | 0.831 | 0.900 |
| DIFRINT | 0.934 | 0.921 | 0.873 | 0.633 | 0.905 | 0.853 |
| StabNet | 0.702 | 0.573 | 0.753 | 0.574 | 0.759 | 0.672 |
| DUT | 0.982 | 0.949 | 0.927 | 0.935 | 0.955 | 0.949 |

#### <center> Cropping ratios of different methods </center>

| Method   | Regular | Parallax | Running | QuickRot |   Crowd  |   Avg.   |
|----------|:-------:|:--------:|:-------:|:--------:|:--------:|:--------:|
| Meshflow | 0.686 | 0.540 | 0.584 | 0.376 | 0.552 | 0.548 |
| SubSpace | 0.712 | 0.617 | 0.686 | \ | 0.543 | 0.639 |
| DIFRINT | 0.922 | 0.903 | 0.869 | 0.732 | 0.882 | 0.862 |
| StabNet | 0.537 | 0.503 | 0.512 | 0.418 | 0.497 | 0.493 |
| DUT | 0.736 | 0.709 | 0.690 | 0.673 | 0.710 | 0.704 |
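
A quick way to sanity-check the tables is to recompute the "Avg." column and rank the methods per metric. The sketch below assumes the average is a plain mean over the categories a method completed (four for SubSpace, five for the rest); the numbers are copied from the tables and the variable names are illustrative.

```python
from statistics import mean

# Per-category stability scores (Regular, Parallax, Running, QuickRot, Crowd).
# SubSpace has no QuickRot score, so its list holds four values.
stability = {
    'Meshflow': [0.843, 0.793, 0.839, 0.801, 0.774],
    'SubSpace': [0.837, 0.760, 0.829, 0.730],
    'DIFRINT': [0.838, 0.808, 0.822, 0.835, 0.791],
    'StabNet': [0.838, 0.769, 0.818, 0.785, 0.741],
    'DUT': [0.843, 0.813, 0.841, 0.877, 0.792],
}
averages = {method: round(mean(scores), 3) for method, scores in stability.items()}

# Reported "Avg." values from the three tables, used to rank the methods per metric.
reported_avg = {
    'stability': {'Meshflow': 0.810, 'SubSpace': 0.789, 'DIFRINT': 0.819, 'StabNet': 0.790, 'DUT': 0.833},
    'distortion': {'Meshflow': 0.779, 'SubSpace': 0.900, 'DIFRINT': 0.853, 'StabNet': 0.672, 'DUT': 0.949},
    'cropping': {'Meshflow': 0.548, 'SubSpace': 0.639, 'DIFRINT': 0.862, 'StabNet': 0.493, 'DUT': 0.704},
}
best = {metric: max(scores, key=scores.get) for metric, scores in reported_avg.items()}

print(averages)
print(best)
```

The recomputed stability averages match the reported column, and the ranking shows which method leads on each metric.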