Inpainting video frames via a Temporally-Aware Interpolation network.
Switch branches/tags
Nothing to show
Clone or download
Latest commit d23f257 Sep 26, 2018
Permalink
Failed to load latest commit information.
bashes initial commit Jun 23, 2018
src initial commit Jun 23, 2018
videolist initial commit Jun 23, 2018
.gitignore initial commit Jun 23, 2018
README.md Update README.md Sep 26, 2018
requirements.txt initial commit Jun 23, 2018

README.md

A Temporally-Aware Interpolation Network for Video Frame Inpainting

Ximeng Sun, Ryan Szeto and Jason J. Corso, University of Michigan

Using this code

If you use this work in your research, please cite the following paper:

@inproceedings{sun2018temporally,
    author = "Sun, Ximeng and Szeto, Ryan and Corso, Jason J.",
    title = "A Temporally-Aware Interpolation Network for Video Frame Inpainting",
    booktitle = "Asian Conference on Computer Vision",
    month = "Dec",
    year = "2018"
}

Introduction

Setup the environment

In this section, we provide an instruction to help you set up the experiment environment. This code was tested for Python 2.7 on Ubuntu 16.04, and is based on PyTorch 0.3.1.

First, please make sure you have installed cuda(8.0.61), cudnn(8.0-v7.0.5) opencv(2.4.13), and ffmpeg(2.2.3). Note that different versions might not work.

Then, to avoid conflicting with your current environment, we suggest to use virtualenv. For using virtualenv, please refer https://virtualenv.pypa.io/en/stable/. Below is an example of initializing and activating a virtual environment:

virualenv .env
source .env/bin/activate

After activating a virtual environment, use the following command to install all dependencies you need to run this model.

pip install -r requirements.txt

Compiling the separable convolution module

The separable convolution module uses CUDA code that must be compiled before it can be used. To compile, start by activating the virtual environment described above. Then, set the CUDA_HOME environment variable as the path to your CUDA installation. For example, if your CUDA installation is at /usr/local/cuda-8.0, run this command:

export CUDA_HOME=/usr/local/cuda-8.0

Then, identify a virtual architecture that corresponds to your GPU from this site (e.g. compute_52 for Maxwell GPUs). Finally, to compile the CUDA code, run the install script with the virtual architecture as an argument, e.g.:

bash bashes/misc/install.bash compute_52

Download checkpoints, datasets and sample results

Download checkpoints

The checkpoints for the models used in our paper are available online. To set them up, download and extract them to the project root directory:

wget http://web.eecs.umich.edu/~szetor/media/tai-checkpoints.zip
unzip tai-checkpoints.zip
rm tai-checkpoints.zip

Download datasets

In our paper, we use the KTH Actions, UCF-101, and HMDB-51 datasets to evaluate our method. Use the commands below to download these datasets. The sample commands below download the datasets into a new datasets directory, but you can replace this argument with another location if needed. Note: Each command may take about 1.5 hours on a 1 Gbit/s connection, but they can be run in parallel.

bash bashes/download/download_KTH.bash datasets
bash bashes/download/download_UCF.bash datasets
bash bashes/download/download_HMDB.bash datasets

Evaluate models and reproduce results in paper

Evaluate single model

In this section, we describe how to generate inpainted frames and quantitative comparisons to the ground-truth frames. Before you start, make sure you have extracted the checkpoints as described in the "Download checkpoints" section.

The scripts to run each model described in our paper are spread throughout the bashes folder, but generally are organized as bashes/evaluation/<train_dataset>_<model_name>/test_<test_dataset>.bash. Each script takes the following arguments:

  1. The number of preceding/following frames
  2. The number of middle frames to generate
  3. The path to the test dataset

For example, the command below uses our KTH-trained TAI model to generate predictions on the KTH test set. It takes in five preceding frames and five following frames to generate ten middle frames:

bash bashes/evaluation/kth_TAI/test_KTH.bash 5 10 datasets/KTH

Each script takes about four hours to finish. When the evaluation is finished, the quantitative results will be under results/quantitative/<test_dataset>/<exp_test_name> and the qualitative results will be under results/images/<test_dataset>/<exp_test_name>.

Common arguments for test

In each test bash file, we define a set of arguments which are passed to test.py. To help understand what common arguments are, we take bashes/evaluation/kth_TAI/test_KTH.bash for example.

python src/test.py \
    --name=kth_TAI_exp1 \
    --K=${1} \
    --F=${1} \
    --T=${2} \
    --data=KTH --dataroot=${3} --textroot=videolist/KTH/ --c_dim=1 --pick_mode=Slide \
    --model=kernelcomb --comb_type=avg \
    --image_size 128 \
    --batch_size=1 \
    --num_block=5 \
    --layers=3 \
    --kf_dim=32 \
    --rc_loc=4 \
    --enable_res \
    --output_both_directions \
    --shallow

This is what each argument in the above example specifies:

  • name: the name of experiments
  • K: the number of preceding frames
  • F: the number of following frames
  • T: the number of middle frames
  • data: the name of dataset
  • dataroot: the path of the dataset
  • textroot: the path of files defining train/test/val split
  • c_dim: the number of color channels (1 for grayscale)
  • pick_mode: how to extract test cases from the specified dataset. Use "Slide" to generate test clips via a sliding window on every test video, or "First" to evaluate on the first K + F + T frames of each test video.
  • model/comb_type/rc_loc/shallow: Together, these specify how to make intermediate predictions and stitch them together. In this case, kernelcomb/avg/4/True yields our TAI network.
  • image_size: the height and width to scale all frames to
  • batch_size: number of samples in a mini-batch
  • num_block: the number of VGG blocks in the encoder/decoder of the blend network
  • layers: the number of convolution layers in a VGG block
  • kf_dim: the number of kernels in the first VGG block of the blend network
  • enable_res: enable the residual network in the video prediction part
  • output_both_directions: output the intermediate predictions

For all arguments used in test, you can refer src/options/base_options.py and src/options/test_options.py.

Reproduce all experiments in paper

We provide a bash file to automatically evaluate all the models in our paper. Its arguments are the paths to the KTH, UCF-101, and HMDB-51 datasets. WARNING: THIS SCRIPT TAKES OVER 80 HOURS TO COMPLETE. WE RECOMMEND SCHEDULING THE JOBS CONTAINED IN all_evals.bash ON A COMPUTE CLUSTER.

bash bashes/evaluation/all_evals.bash datasets/KTH datasets/UCF-101 datasets/HMDB-51

Reproduce quantitative and qualitative results in paper

After running all evaluations, use bashes/evaluation/results_reproduction.bash to generate the figures shown in our paper.

bash bashes/evaluation/results_reproduction.bash

After running this script, the quantitative plots will be located under paper_plots and the qualitative plots will be located under paper_imgs.

Note that if you run this script multiple times for the same model on the same dataset with the same numbers of preceding/following and middle frames, it will overwrite any existing figures under paper_plots and paper_imgs. Alternatively, you can run individual lines in bashes/results_reproduction.bash to generate any specific plot you want.

Train our model and baselines

Train single model

To train each model, you can use the training file under bashes/evaluation/<exp name>/train.bash by providing the location of the dataset. For example, if we want to train the TAI network on KTH Actions, run

bash bashes/evaluation/kth_TAI/train.bash datasets/KTH

If the training needs to be resumed, you can use the training file under bashes/evaluation/<exp name>/train.bash by passing an extra argument. For example, if we want to resume the training of TAI network on KTH Actions, run

bash bashes/evaluation/kth_TAI/train.bash datasets/KTH resume

Common arguments for training

In each training bash file, we define a set of arguments which are passed to train.py. To help understand what common arguments are, we take bashes/evaluation/kth_TAI/train.bash for example.

python src/train.py  \
    --name=kth_TAI_exp1 \
    --K=5 \
    --F=5 \
    --T=5 \
    --max_iter=200000 \
    --data=KTH --dataroot=${1} --textroot=videolist/KTH/ --c_dim=1 \
    --model=kernelcomb --comb_loss=ToTarget --comb_type=avg \
    --image_size 128 \
    --final_sup_update=1 \
    --batch_size=4 \
    --num_block=5 \
    --layers=3 \
    --kf_dim=32 \
    --D_G_switch=alternative \
    --beta=0.002 \
    --rc_loc=4 \
    --Ip=3 \
    --sn \
    --enable_res \
    --shallow \

This is what each argument in the above example specifies:

  • name: the name of experiments
  • K: the number of preceding frames
  • F: the number of following frames
  • T: the number of middle frames
  • max_iter: the maximum number of iterations to train for
  • data: the name of dataset
  • dataroot: the path of the dataset
  • textroot: the path of files defining train/test/val split
  • c_dim: the number of color channels (1 for grayscale)
  • model/comb_type/rc_loc/shallow: Together, these specify how to make intermediate predictions and stitch them together. In this case, kernelcomb/avg/4/True yields our TAI network.
  • comb_loss: how to penalize the intermediate predictions. "ToTarget" specifies that they should be penalized based on their similarity to the ground truth target, and "Closer" specifies that they should be penalized based on their similarity to each other.
  • image_size: the height and width to scale all frames to
  • final_sup_update: when to introduce supervision on the final prediction
  • batch_size: number of samples in a mini-batch
  • num_block: the number of VGG blocks in the encoder/decoder of the blend network
  • layers: the number of convolution layers in a VGG block
  • kf_dim: the number of kernels in the first VGG block of the blend network
  • D_G_switch: whether to update the generator and discriminator in an alternating fashion
  • beta: the weight of the GAN loss when training the generator
  • Ip: the number of power iterations in every update
  • sn: whether to use spectral normalization in the discriminator
  • enable_res: enable the residual network in the video prediction part

For all arguments used in training, you can refer src/options/base_options.py and src/options/train_options.py.

Visualize the training process

Losses, visualizations for the current batch and the validation are saved in the tensorboard file under tb/ for each experiment. In order to access these intermediate results, you can activate a tensorboard instance. Take kth_TAI_exp1 experiment for example, run

tensorboard --logdir kth_TAI:./tb/kth_TAI_exp1/