# Image Stabilization for Hand-tracking Telescope Video Prototype

This notebook explores a prototype for "Image Stabilization for Hand-tracking Telescope Video". On SageMaker, launch the instance as ml.g4dn.xlarge (4 vCPU, 16 GiB, 1 GPU) using Python 3 (PyTorch 1.6 Python 3.6 GPU Optimized).

The prototype will be built on the Deep Unsupervised Trajectory-based stabilization framework (DUT).

The DUT stabilization model was observed to be qualitatively better at stabilization.

The DUT pipeline is an ensemble of 3 models: a point cloud generator model, a motion analysis model, and a smoothing model. The plan is to pick one of these models and focus on training it.

While building the prototype I will identify which model appears weakest on astronomical data and focus on it. That model will be trained on the non-astronomical data. This is an attempt to solve the cold start problem through "transfer learning", a machine learning method in which a model developed for one task is reused as the starting point for a model on a second task. The goal is then to accumulate target data to keep improving the model.

Requirements:

* Upload a video (mp4) created by hand tracking a celestial body
* Store the uploaded unstable video for future use
* Run it through the video stabilization pipeline
* Present the stabilized video side-by-side with the uploaded video
* Allow the user to download the stabilized video

## Setup
The setup cell below can be rerun whenever a SageMaker notebook for this project is started.

In [1]:
%%time
import os

FFMPEG_TAR = 'ffmpeg-release-amd64-static.tar.xz'
if os.path.exists(FFMPEG_TAR):
    os.remove(FFMPEG_TAR)
    
!wget https://johnvansickle.com/ffmpeg/releases/ffmpeg-release-amd64-static.tar.xz
!tar -xf ffmpeg-release-amd64-static.tar.xz
!ffmpeg-4.4.1-amd64-static/ffmpeg -version

if os.path.exists(FFMPEG_TAR):
    os.remove(FFMPEG_TAR)

!/opt/conda/bin/python -m pip install --upgrade pip
!pip install scikit-image
!pip install easydict
!pip install pypng

!conda update -n base -y -c defaults conda
!conda install -y -c conda-forge cupy

wget: /opt/conda/lib/libuuid.so.1: no version information available (required by wget)
--2021-11-05 19:16:44--  https://johnvansickle.com/ffmpeg/releases/ffmpeg-release-amd64-static.tar.xz
Resolving johnvansickle.com (johnvansickle.com)... 107.180.57.212
Connecting to johnvansickle.com (johnvansickle.com)|107.180.57.212|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 39893092 (38M) [application/x-xz]
Saving to: ‘ffmpeg-release-amd64-static.tar.xz’


2021-11-05 19:16:47 (15.4 MB/s) - ‘ffmpeg-release-amd64-static.tar.xz’ saved [39893092/39893092]

tar: ffmpeg-4.4.1-amd64-static/GPLv3.txt: Cannot change ownership to uid 1000, gid 1000: Operation not permitted
tar: ffmpeg-4.4.1-amd64-static/manpages/ffmpeg-all.txt: Cannot change ownership to uid 1000, gid 1000: Operation not permitted
tar: ffmpeg-4.4.1-amd64-static/manpages/ffmpeg-scaler.txt: Cannot change ownership to uid 1000, gid 1000: Operation not permitted
tar: ffmpeg-4.4.1-amd64-static/manpages/ffmpeg-resam

## Using the DUT Model for the Prototype

In [2]:
import torch
import torch.nn as nn
import argparse
from PIL import Image
import cv2
import os
import traceback
import math
import time
import sys

project_location = os.getcwd()
sys.path.append(os.path.join(project_location,'DUTCode'))
project_location

'/root/hand-tracking-stabilization'

### Load the Pre-trained Models from S3

I copied the pre-trained models to S3 so they remain available if the original source disappears. At the time of writing, the pre-trained models were at [https://drive.google.com/drive/folders/15T8Wwf1OL99AKDGTgECzwubwTqbkmGn6](https://drive.google.com/drive/folders/15T8Wwf1OL99AKDGTgECzwubwTqbkmGn6)

In [3]:
%%time

import os
import boto3

project_dir = os.getcwd()

data_dir = 'DUTPretrained'
if not os.path.isdir(data_dir):
    os.mkdir(data_dir)
os.chdir(data_dir)

s3 = boto3.client('s3')
s3.download_file('madat-machine-learning-data', 'Mark/capstone-project/pre-trained-models/ckpt-20210817T154228Z-001.zip', 'ckpt-20210817T154228Z-001.zip')

from zipfile import ZipFile

with ZipFile('ckpt-20210817T154228Z-001.zip', 'r') as zipObj:
    zipObj.extractall()
    
os.chdir('..')
os.getcwd()

CPU times: user 2.88 s, sys: 1.13 s, total: 4.01 s
Wall time: 7.08 s


'/root/hand-tracking-stabilization'
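
The cell above changes the working directory before downloading and extracting, so a failed download can leave the notebook stranded inside `DUTPretrained`. A minimal sketch of an alternative that extracts into a target directory without calling `os.chdir`; `extract_checkpoints` is a hypothetical helper, and the demo uses a throwaway archive as a stand-in for the real checkpoint zip:

```python
import os
import tempfile
import zipfile


def extract_checkpoints(zip_path, dest_dir):
    """Extract zip_path into dest_dir and return the archive's member names."""
    os.makedirs(dest_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()


# Demo with a throwaway archive standing in for the real checkpoint zip.
with tempfile.TemporaryDirectory() as tmp:
    zip_path = os.path.join(tmp, "ckpt.zip")
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.writestr("ckpt/smoother.pth", b"placeholder")

    start_dir = os.getcwd()
    names = extract_checkpoints(zip_path, os.path.join(tmp, "DUTPretrained"))
    extracted = os.path.isfile(
        os.path.join(tmp, "DUTPretrained", "ckpt", "smoother.pth")
    )
    same_dir = os.getcwd() == start_dir  # the working directory never moved

print(names, extracted, same_dir)
```

With the real archive, the same helper would presumably be called on the downloaded zip with `'DUTPretrained'` as the destination, which keeps the checkpoint paths used later in the notebook unchanged.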

### Preprocessing of Input Video

Initial assumptions:
* An unstable video will be dropped into the input folder as './input/unstable.mp4'.
* The video will be an mp4 video, encoded as H264.

This preprocessing step splits the video into its individual frames. The frames are JPEG images, 640x360 in size, which the current model requires. The frames are named from '0000.jpg' up to '9999.jpg', and each frame is stored in './input/frames'.

Frame rate should not make a difference, but the stabilized output video is currently written at 25 fps.
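
The naming scheme and the fixed frame rate together bound how much video the prototype can handle. A minimal sketch of that arithmetic, assuming four-digit zero-padded names and 25 fps; `frame_name` and `clip_seconds` are hypothetical helpers and not part of DUT:

```python
FPS = 25             # output frame rate used by this notebook
MAX_FRAMES = 10_000  # '0000.jpg' through '9999.jpg'


def frame_name(index):
    """Return the zero-padded file name for a frame index."""
    if not 0 <= index < MAX_FRAMES:
        raise ValueError(f"frame index {index} does not fit in four digits")
    return f"{index:04d}.jpg"


def clip_seconds(frame_count, fps=FPS):
    """Duration in seconds of frame_count frames played back at fps."""
    return frame_count / fps


print(frame_name(0), frame_name(312))  # 0000.jpg 0312.jpg
print(clip_seconds(MAX_FRAMES))        # 400.0
print(clip_seconds(500))               # 20.0
```

At 25 fps the four-digit scheme tops out at 400 seconds of video. The `MaxLength = 500` setting used later in this notebook caps a single run at 500 frames, or 20 seconds.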

In [28]:
# OpenCV doesn't ship with the H264 codec needed to view the video in a notebook, due to licensing incompatibilities.
# For that reason the input is re-encoded here with FFmpeg as H264 at 25 fps before the frames are extracted.
unstable_video = './input/unstable.mp4'
#input_filename = './input/unstable-Copy1.mp4'
input_filename = 'star-small-trans.mp4'
if os.path.exists(unstable_video):
    os.remove(unstable_video)
    
!ffmpeg-4.4.1-amd64-static/ffmpeg -i {input_filename} -c:v h264 -r 25 {unstable_video}
    
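frames_dir = 'input/frames'  # renamed from `dir`, which shadows a Python builtin
os.makedirs(frames_dir, exist_ok=True)  # create the folder on a first run
# clear frames left over from a previous run
for f in os.listdir(frames_dir):
    os.remove(os.path.join(frames_dir, f))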

vidcap = cv2.VideoCapture(unstable_video)
count = 0
while vidcap.isOpened():
    success,image = vidcap.read()
    if success:
        cv2.imwrite("input/frames/%04d.jpg" % count, image)     # save frame as JPEG file
        count += 1
    else:
        break
vidcap.release()
    

ffmpeg version 4.4.1-static https://johnvansickle.com/ffmpeg/  Copyright (c) 2000-2021 the FFmpeg developers
  built with gcc 8 (Debian 8.3.0-6)
  configuration: --enable-gpl --enable-version3 --enable-static --disable-debug --disable-ffplay --disable-indev=sndio --disable-outdev=sndio --cc=gcc --enable-fontconfig --enable-frei0r --enable-gnutls --enable-gmp --enable-libgme --enable-gray --enable-libaom --enable-libfribidi --enable-libass --enable-libvmaf --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librubberband --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libvorbis --enable-libopus --enable-libtheora --enable-libvidstab --enable-libvo-amrwbenc --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libdav1d --enable-libxvid --enable-libzvbi --enable-libzimg
  libavutil      56. 70.100 / 56. 70.100
  libavcodec     58.134.100 / 58.134.100
  libavformat   
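
The stabilization step later reads frames by index (`0000.jpg`, `0001.jpg`, ...), and `cv2.imread` returns `None` for a file that does not exist, so a gap in the sequence would only surface as a failure further down. A minimal sketch of a check that could be run after extraction; `missing_frames` is a hypothetical helper, and the demo uses a temporary folder with empty placeholder files:

```python
import os
import re
import tempfile

FRAME_PATTERN = re.compile(r"^(\d{4})\.jpg$")


def missing_frames(frames_dir):
    """Return frame indices absent from 0..highest index found in frames_dir."""
    found = set()
    for name in os.listdir(frames_dir):
        match = FRAME_PATTERN.match(name)
        if match:
            found.add(int(match.group(1)))
    if not found:
        return []
    return [i for i in range(max(found) + 1) if i not in found]


# Demo: frames 0, 1 and 3 exist, so index 2 is reported as missing.
with tempfile.TemporaryDirectory() as tmp:
    for i in (0, 1, 3):
        open(os.path.join(tmp, f"{i:04d}.jpg"), "w").close()
    gaps = missing_frames(tmp)

print(gaps)  # [2]
```

An empty list for `input/frames` means the indices are contiguous from zero, which is what the frame-loading loop assumes.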

## Prototype Built on DUT Stabilizer




In [29]:
# From deploy_samples.sh

#OutputBasePath='results/'
#SmootherPath='ckpt/smoother.pth'
#RFDetPath='ckpt/RFDet_640.pth.tar'
#PWCNetPath='ckpt/network-default.pytorch'
#MotionProPath='ckpt/MotionPro.pth'
#InputPath='images/'

#--SmootherPath=$SmootherPath \
#--RFDetPath=$RFDetPath \
#--PWCNetPath=$PWCNetPath \
#--MotionPro=$MotionProPath \
#--InputBasePath=$InputPath \
#--OutputBasePath=$OutputBasePath 
class DUTArguments:
    
    SmootherPath='DUTPretrained/ckpt/smoother.pth'
    RFDetPath='DUTPretrained/ckpt/RFDet_640.pth.tar'
    PWCNetPath='DUTPretrained/ckpt/network-default.pytorch'
    MotionProPath='DUTPretrained/ckpt/MotionPro.pth'
    SingleHomo=True
    OutputBasePath = 'results/'
    InputBasePath = 'input/frames/'
    MaxLength = 500
    OutNamePrefix = ''
    Repeat = 50
    
dut_args = DUTArguments()
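
Because the four checkpoint paths are plain strings, a missing file is not noticed until the model tries to load it. A minimal sketch of a pre-flight check, assuming the attribute names from `DUTArguments` above; `missing_checkpoints` and the `FakeArgs` stand-in are hypothetical and used for illustration only:

```python
import os

# Attribute names taken from the DUTArguments class in this notebook.
CHECKPOINT_FIELDS = ("SmootherPath", "RFDetPath", "PWCNetPath", "MotionProPath")


def missing_checkpoints(args, fields=CHECKPOINT_FIELDS):
    """Return the names of checkpoint attributes whose file does not exist."""
    return [name for name in fields if not os.path.isfile(getattr(args, name))]


class FakeArgs:
    """Stand-in for DUTArguments whose paths deliberately do not exist."""
    SmootherPath = "does/not/exist/smoother.pth"
    RFDetPath = "does/not/exist/RFDet_640.pth.tar"
    PWCNetPath = "does/not/exist/network-default.pytorch"
    MotionProPath = "does/not/exist/MotionPro.pth"


missing = missing_checkpoints(FakeArgs())
print(missing)
```

Calling `missing_checkpoints(dut_args)` before building the model should return an empty list once the pre-trained archive has been extracted.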

In [30]:
def generateStable(model, base_path, outPath, outPrefix, max_length, args):

    print(cfg.MODEL.WIDTH, cfg.MODEL.HEIGHT)
    
    image_base_path = base_path
    image_len = min(len([ele for ele in os.listdir(image_base_path) if ele[-4:] == '.jpg']), max_length)
    # read input video
    images = []
    rgbimages = []
    print('Number of Images:', image_len)
    for i in range(image_len):
        image = cv2.imread(os.path.join(args.InputBasePath, '{:04d}.jpg'.format(i)), 0)
        #print(image.shape)
        image = cv2.resize(image, (640, 360))  # cv2.resize expects (width, height)
        #print(image.shape)
        
        image = image * (1. / 255.)
        image = cv2.resize(image, (cfg.MODEL.WIDTH, cfg.MODEL.HEIGHT))
        images.append(image.reshape(1, 1, cfg.MODEL.HEIGHT, cfg.MODEL.WIDTH))

        image = cv2.imread(os.path.join(args.InputBasePath, '{:04d}.jpg'.format(i)))
        image = cv2.resize(image, (640, 360))  # cv2.resize expects (width, height)
        image = cv2.resize(image, (cfg.MODEL.WIDTH, cfg.MODEL.HEIGHT))
        rgbimages.append(np.expand_dims(np.transpose(image, (2, 0, 1)), 0))

    print("1")
    print(np.size(images))
    x = np.concatenate(images, 1).astype(np.float32)
    #print("1a ", x)
    x = torch.from_numpy(x).unsqueeze(0)

    print("2") 
    x_RGB = np.concatenate(rgbimages, 0).astype(np.float32)
    x_RGB = torch.from_numpy(x_RGB).unsqueeze(0)

    print("3")     
    with torch.no_grad():
        origin_motion, smoothPath = model.inference(x.cuda(), x_RGB.cuda(), repeat=args.Repeat)

    origin_motion = origin_motion.cpu().numpy()
    smoothPath = smoothPath.cpu().numpy()
    origin_motion = np.transpose(origin_motion[0], (2, 3, 1, 0))
    smoothPath = np.transpose(smoothPath[0], (2, 3, 1, 0))

    x_paths = origin_motion[:, :, :, 0]
    y_paths = origin_motion[:, :, :, 1]
    sx_paths = smoothPath[:, :, :, 0]
    sy_paths = smoothPath[:, :, :, 1]

    frame_rate = 25
    frame_width = cfg.MODEL.WIDTH
    frame_height = cfg.MODEL.HEIGHT

    print("generate stabilized video...")
    fourcc = cv2.VideoWriter_fourcc(*'MP4V')
    out = cv2.VideoWriter(os.path.join(outPath, outPrefix + 'tmp_DUT_stable.mp4'), fourcc, frame_rate, (frame_width, frame_height))
    print(frame_width, frame_height)

    new_x_motion_meshes = sx_paths - x_paths
    new_y_motion_meshes = sy_paths - y_paths

    outImages = warpListImage(rgbimages, new_x_motion_meshes, new_y_motion_meshes)
    outImages = outImages.numpy().astype(np.uint8)
    outImages = [np.transpose(outImages[idx], (1, 2, 0)) for idx in range(outImages.shape[0])]
    for frame in tqdm(outImages):
        VERTICAL_BORDER = 60
        HORIZONTAL_BORDER = 80

        new_frame = frame[VERTICAL_BORDER:-VERTICAL_BORDER, HORIZONTAL_BORDER:-HORIZONTAL_BORDER]
        new_frame = cv2.resize(new_frame, (frame.shape[1], frame.shape[0]), interpolation=cv2.INTER_CUBIC)
        #print(new_frame.shape)
        out.write(new_frame)
    out.release()
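
The final loop in `generateStable` trims a fixed border from every warped frame and scales the remainder back to full size, which hides the empty edges that warping leaves behind at the cost of some field of view. A minimal sketch of that trade-off, using the 640x480 model size and the 80/60 pixel borders from the function above; `crop_and_zoom` is a hypothetical helper:

```python
def crop_and_zoom(width, height, h_border, v_border):
    """Describe the crop and rescale applied to each stabilized frame."""
    crop_w = width - 2 * h_border
    crop_h = height - 2 * v_border
    if crop_w <= 0 or crop_h <= 0:
        raise ValueError("borders remove the whole frame")
    return {
        "crop_size": (crop_w, crop_h),          # (width, height) after trimming
        "zoom_x": width / crop_w,               # horizontal scale back to full size
        "zoom_y": height / crop_h,              # vertical scale back to full size
        "kept_area": (crop_w * crop_h) / (width * height),
    }


info = crop_and_zoom(640, 480, h_border=80, v_border=60)
print(info["crop_size"])  # (480, 360)
print(info["kept_area"])  # 0.5625
print(info["zoom_x"] == info["zoom_y"])  # True
```

With these borders the crop keeps the 4:3 aspect ratio, so the rescale does not distort the image, but only about 56% of each frame's area survives. For a small celestial target this matters: an object near the edge of the frame can be cropped out entirely.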

In [31]:
%%time

from models.DUT.DUT import DUT
from utils.WarpUtils import warpListImage
from configs.config import cfg
import numpy as np
from tqdm import tqdm
import cupy

model = DUT(SmootherPath=dut_args.SmootherPath, RFDetPath=dut_args.RFDetPath, PWCNetPath=dut_args.PWCNetPath, MotionProPath=dut_args.MotionProPath, homo=dut_args.SingleHomo)
model.cuda()
model.eval()

generateStable(model, dut_args.InputBasePath, dut_args.OutputBasePath, dut_args.OutNamePrefix, dut_args.MaxLength, dut_args)

# OpenCV doesn't ship with the H264 codec needed to view the video in a notebook, due to licensing incompatibilities.
# For that reason I encode as MP4V and then post-process with FFmpeg.
video_name = os.path.join(dut_args.OutputBasePath,'DUT_stable.mp4')
tmp_video_name = os.path.join(dut_args.OutputBasePath,'tmp_DUT_stable.mp4')

if os.path.exists(video_name):
    os.remove(video_name)

!ffmpeg-4.4.1-amd64-static/ffmpeg -i {tmp_video_name} -c:v h264 -r 25 {video_name}
    
if os.path.exists(tmp_video_name):
    os.remove(tmp_video_name)


-------------model configuration------------------------
using RFNet ...
using PWCNet for motion estimation...
using Motion Propagation model with multi homo...
using Deep Smoother Model...
------------------reload parameters-------------------------
reload Smoother params
successfully load 12 params for smoother
reload RFDet Model
successfully load 100 params for RFDet
reload PWCNet Model
reload MotionPropagation Model
successfully load 21 params for MotionPropagation
640 480
Number of Images: 313
1
96153600
2
3
detect keypoints ....




estimate motion ....
motion propagation ....




generate stabilized video...
640 480


100%|██████████| 313/313 [00:00<00:00, 314.34it/s]


ffmpeg version 4.4.1-static https://johnvansickle.com/ffmpeg/  Copyright (c) 2000-2021 the FFmpeg developers
  built with gcc 8 (Debian 8.3.0-6)
  configuration: --enable-gpl --enable-version3 --enable-static --disable-debug --disable-ffplay --disable-indev=sndio --disable-outdev=sndio --cc=gcc --enable-fontconfig --enable-frei0r --enable-gnutls --enable-gmp --enable-libgme --enable-gray --enable-libaom --enable-libfribidi --enable-libass --enable-libvmaf --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librubberband --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libvorbis --enable-libopus --enable-libtheora --enable-libvidstab --enable-libvo-amrwbenc --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libdav1d --enable-libxvid --enable-libzvbi --enable-libzimg
  libavutil      56. 70.100 / 56. 70.100
  libavcodec     58.134.100 / 58.134.100
  libavformat   
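
The re-encode step above relies on the notebook `!` magic, which is not available outside Jupyter. A minimal sketch of the same command built as an argument list for `subprocess.run`, assuming the static FFmpeg build unpacked during setup; `h264_command` is a hypothetical helper, and the `-y` flag (overwrite the output without asking) is an addition that stands in for the manual `os.remove` calls:

```python
import os
import subprocess

FFMPEG = "ffmpeg-4.4.1-amd64-static/ffmpeg"  # path unpacked by the setup cell


def h264_command(src, dst, fps=25, ffmpeg=FFMPEG):
    """Build the FFmpeg argument list that re-encodes src as H264 at fps."""
    return [ffmpeg, "-y", "-i", src, "-c:v", "h264", "-r", str(fps), dst]


cmd = h264_command("results/tmp_DUT_stable.mp4", "results/DUT_stable.mp4")
print(" ".join(cmd))

# Only run the command when both the binary and the input file are present.
if os.path.isfile(FFMPEG) and os.path.isfile(cmd[3]):
    subprocess.run(cmd, check=True)
```

Passing a list instead of a shell string avoids quoting problems with file names that contain spaces, and `check=True` raises an exception if FFmpeg exits with an error instead of failing silently.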

In [34]:
%%html

<table>
  <tr>
    <td>
        <center>Full Unstable</center>
        <video width="640" height="480" controls autoplay>
            <source src="input/unstable.mp4" type="video/mp4">
        </video>
    </td>
    <td>
        <center>Stabilized (640x480)</center>
        <video controls autoplay>
          <source src="results/DUT_stable.mp4" type="video/mp4">
        </video>
    </td>
  </tr>
</table>

