PWC-Net-large model training (with cyclical learning rate schedule)
=======================================================

In this notebook, we:
- Use a PWC-Net-large model (with dense and residual connections) and a 6-level pyramid, upsampling the level-2 prediction by 4 to produce the final flow prediction
- Train the model on a mix of the `FlyingChairs` and `FlyingThings3DHalfRes` datasets using a Cyclic<sub>short</sub> schedule of our own
- The Cyclic<sub>short</sub> schedule oscillates between `5e-04` and `1e-05` for 200,000 steps (defined for a single GPU and a batch size of 8; the step counts are rescaled below for the dual-GPU setup)

Below, look for `TODO` references and customize this notebook based on your own machine setup.

## Reference

[2018a]<a name="2018a"></a> Sun et al. 2018. PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume. [[arXiv]](https://arxiv.org/abs/1709.02371) [[web]](http://research.nvidia.com/publication/2018-02_PWC-Net%3A-CNNs-for) [[PyTorch (Official)]](https://github.com/NVlabs/PWC-Net/tree/master/PyTorch) [[Caffe (Official)]](https://github.com/NVlabs/PWC-Net/tree/master/Caffe)

In [1]:
"""
pwcnet_train.ipynb

PWC-Net model training.

Written by Phil Ferriere

Licensed under the MIT License (see LICENSE for details)

Tensorboard:
    [win] tensorboard --logdir=E:\\repos\\tf-optflow\\tfoptflow\\pwcnet-lg-6-2-cyclic-chairsthingsmix
    [ubu] tensorboard --logdir=/media/EDrive/repos/tf-optflow/tfoptflow/pwcnet-lg-6-2-cyclic-chairsthingsmix
"""
from __future__ import absolute_import, division, print_function
import sys
from copy import deepcopy

from dataset_base import _DEFAULT_DS_TRAIN_OPTIONS
from dataset_flyingchairs import FlyingChairsDataset
from dataset_flyingthings3d import FlyingThings3DHalfResDataset
from dataset_mixer import MixedDataset
from model_pwcnet import ModelPWCNet, _DEFAULT_PWCNET_TRAIN_OPTIONS

## TODO: Set this first!

In [2]:
# TODO: You MUST set _DATASET_ROOT to the correct path on your machine!
if sys.platform.startswith("win"):
    _DATASET_ROOT = 'E:/datasets/'
else:
    _DATASET_ROOT = '/media/EDrive/datasets/'
_FLYINGCHAIRS_ROOT = _DATASET_ROOT + 'FlyingChairs_release'
_FLYINGTHINGS3DHALFRES_ROOT = _DATASET_ROOT + 'FlyingThings3D_HalfRes'
    
# TODO: You MUST adjust the settings below based on the number of GPU(s) used for training
# Set controller device and devices
# A one-gpu setup would be something like controller='/device:GPU:0' and gpu_devices=['/device:GPU:0']
# Here, we use a dual-GPU setup, as shown below
gpu_devices = ['/device:GPU:0', '/device:GPU:1']
controller = '/device:CPU:0'

# TODO: You MUST adjust this setting below based on the amount of memory on your GPU(s)
# Batch size
batch_size = 8

# Pre-train on `FlyingChairs+FlyingThings3DHalfRes` mix

## Load the dataset

In [3]:
# TODO: Make sure batch_size (set above) fits the memory of your GPU(s)
# Load the training dataset
ds_opts = deepcopy(_DEFAULT_DS_TRAIN_OPTIONS)
ds_opts['in_memory'] = False                          # Too many samples to keep in memory at once, so don't preload them
ds_opts['aug_type'] = 'heavy'                         # Apply all supported augmentations
ds_opts['batch_size'] = batch_size * len(gpu_devices) # Use a multiple of 8; here, 16 for dual-GPU mode (Titan X & 1080 Ti)
ds_opts['crop_preproc'] = (256, 448)                  # Crop to a smaller input size
ds1 = FlyingChairsDataset(mode='train_with_val', ds_root=_FLYINGCHAIRS_ROOT, options=ds_opts)
ds_opts['type'] = 'into_future'                       # Use the forward ('into_future') flows of FlyingThings3D
ds2 = FlyingThings3DHalfResDataset(mode='train_with_val', ds_root=_FLYINGTHINGS3DHALFRES_ROOT, options=ds_opts)
ds = MixedDataset(mode='train_with_val', datasets=[ds1, ds2], options=ds_opts)

In [4]:
# Display dataset configuration
ds.print_config()


Dataset Configuration:
  verbose              False
  in_memory            False
  crop_preproc         (256, 448)
  scale_preproc        None
  input_channels       3
  tb_test_imgs         False
  random_seed          1969
  val_split            0.03
  aug_type             heavy
  aug_labels           True
  fliplr               0.5
  flipud               0.5
  translate            (0.5, 0.05)
  scale                (0.5, 0.05)
  batch_size           16
  type                 into_future
  mode                 train_with_val
  train size           41282
  val size             1230


## Configure the training

In [5]:
# Start from the default options
nn_opts = deepcopy(_DEFAULT_PWCNET_TRAIN_OPTIONS)
nn_opts['verbose'] = True
nn_opts['ckpt_dir'] = './pwcnet-lg-6-2-cyclic-chairsthingsmix/'
nn_opts['batch_size'] = ds_opts['batch_size']
nn_opts['x_shape'] = [2, ds_opts['crop_preproc'][0], ds_opts['crop_preproc'][1], 3]
nn_opts['y_shape'] = [ds_opts['crop_preproc'][0], ds_opts['crop_preproc'][1], 2]
nn_opts['use_tf_data'] = True # Use tf.data reader
nn_opts['gpu_devices'] = gpu_devices
nn_opts['controller'] = controller

# Use the PWC-Net-large model in quarter-resolution mode
nn_opts['use_dense_cx'] = True
nn_opts['use_res_cx'] = True
nn_opts['pyr_lvls'] = 6
nn_opts['flow_pred_lvl'] = 2
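
To make the "quarter-resolution" setting concrete, here is a small sketch of the shape arithmetic. It assumes, as in the PWC-Net paper, that level 0 is the input image and that each pyramid level halves the spatial resolution of the previous one. `pyramid_shapes` is a hypothetical helper written for illustration; it is not part of `model_pwcnet`.

```python
def pyramid_shapes(height, width, pyr_lvls):
    """Return {level: (h, w)}, assuming each level halves the previous level's resolution."""
    return {lvl: (height // 2 ** lvl, width // 2 ** lvl) for lvl in range(pyr_lvls + 1)}

# Input crop used in this notebook: 256 x 448
shapes = pyramid_shapes(256, 448, pyr_lvls=6)

flow_pred_lvl = 2
h, w = shapes[flow_pred_lvl]            # resolution at which the flow is predicted
upsample_factor = 2 ** flow_pred_lvl    # factor needed to get back to the input size
full_res = (h * upsample_factor, w * upsample_factor)

print(shapes[6])                # coarsest level: (4, 7)
print(shapes[flow_pred_lvl])    # prediction level: (64, 112)
print(full_res)                 # after upsampling by 4: (256, 448)
```

Under these assumptions, predicting at level 2 means the network outputs a 64 x 112 flow field that is upsampled by 4 to match the 256 x 448 crop.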

In [6]:
# Set the learning rate schedule. This schedule is for a single GPU using a batch size of 8.
# Further below, we adjust the schedule to the batch size and the number of GPUs.
nn_opts['lr_policy'] = 'cyclic'
nn_opts['cyclic_lr_max'] = 5e-04 # Anything higher will generate NaNs
nn_opts['cyclic_lr_base'] = 1e-05
nn_opts['cyclic_lr_stepsize'] = 20000
nn_opts['max_steps'] = 200000

# Adjust the schedule to the batch size and the number of GPUs (2 here).
# With ds_opts['batch_size'] = 16, this gives cyclic_lr_stepsize = 5000 and max_steps = 50000.
nn_opts['cyclic_lr_stepsize'] /= len(gpu_devices)
nn_opts['max_steps'] /= len(gpu_devices)
nn_opts['cyclic_lr_stepsize'] = int(nn_opts['cyclic_lr_stepsize'] / (float(ds_opts['batch_size']) / 8))
nn_opts['max_steps'] = int(nn_opts['max_steps'] / (float(ds_opts['batch_size']) / 8))
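
To see what these adjustments produce, the sketch below repeats the arithmetic in plain Python and evaluates a cyclic schedule at a few steps. It assumes the `triangular2` variant of cyclical learning rates (the amplitude is halved after each full cycle). That variant is an assumption: the notebook does not name it, although the resulting values agree with the `lr` column printed in the training log further down. `scale_schedule` and `cyclic_lr` are illustrative helpers, not functions from `model_pwcnet`.

```python
import math

def scale_schedule(steps, num_gpus, total_batch_size, ref_batch_size=8):
    """Mirror the notebook's two-stage scaling of a step count."""
    steps = steps / float(num_gpus)
    return int(steps / (float(total_batch_size) / ref_batch_size))

def cyclic_lr(step, base_lr, max_lr, stepsize):
    """Triangular2 cyclical learning rate (assumed variant, see lead-in)."""
    cycle = int(math.floor(1 + step / (2.0 * stepsize)))  # 1-based cycle index
    x = abs(step / float(stepsize) - 2 * cycle + 1)        # position within the cycle
    scale = 1.0 / (2 ** (cycle - 1))                       # halve the amplitude each cycle
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x) * scale

# Single-GPU, batch-size-8 schedule rescaled for 2 GPUs and a batch size of 16
stepsize = scale_schedule(20000, num_gpus=2, total_batch_size=16)    # 5000
max_steps = scale_schedule(200000, num_gpus=2, total_batch_size=16)  # 50000
print(stepsize, max_steps)

# Learning rate at a few of the steps that appear in the training log
for step in (100, 11100, 22100, 33100, 44100):
    print(step, "%.6f" % cyclic_lr(step, 1e-05, 5e-04, stepsize))
```

If the learning rates printed here differ from those in your own log, the model is probably using a different cyclic variant or step size than this sketch assumes.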

In [7]:
# Instantiate the model and display the model configuration
nn = ModelPWCNet(mode='train_with_val', options=nn_opts, dataset=ds)
nn.print_config()

Building model towers...
  Building tower_0...
Instructions for updating:
`normal` is a deprecated alias for `truncated_normal`
  ...tower_0 built.
  Building tower_1...
  ...tower_1 built.
... model towers built.
Initializing model with random values for initial training...

... model initialized

Model Configuration:
  verbose                True
  ckpt_dir               ./pwcnet-lg-6-2-cyclic-chairsthingsmix/
  max_to_keep            10
  x_dtype                <dtype: 'float32'>
  x_shape                [2, 256, 448, 3]
  y_dtype                <dtype: 'float32'>
  y_shape                [256, 448, 2]
  train_mode             train
  display_step           100
  snapshot_step          1000
  val_step               1000
  val_batch_size         -1
  tb_val_imgs            pyramid
  tb_test_imgs           None
  gpu_devices            ['/device:GPU:0', '/device:GPU:1']
  controller             /device:CPU:0
  use_tf_data            True
  batch_size             16
  lr_policy        

## Train the model

In [8]:
# Train the model
nn.train()

Start training from scratch...
2018-09-09 11:41:51 Iter 100 [Train]: loss=217.31, epe=17.64, lr=0.000020, samples/sec=17.9, sec/step=1.791, eta=1 day, 0:49:50
2018-09-09 11:43:33 Iter 200 [Train]: loss=226.02, epe=18.37, lr=0.000030, samples/sec=34.5, sec/step=0.926, eta=12:48:48
2018-09-09 11:45:15 Iter 300 [Train]: loss=222.64, epe=18.10, lr=0.000039, samples/sec=34.6, sec/step=0.926, eta=12:47:08
2018-09-09 11:46:57 Iter 400 [Train]: loss=215.81, epe=17.55, lr=0.000049, samples/sec=34.5, sec/step=0.929, eta=12:47:36
2018-09-09 11:48:40 Iter 500 [Train]: loss=221.65, epe=18.05, lr=0.000059, samples/sec=34.4, sec/step=0.930, eta=12:47:18
2018-09-09 11:50:23 Iter 600 [Train]: loss=224.30, epe=18.27, lr=0.000069, samples/sec=34.4, sec/step=0.931, eta=12:46:16
2018-09-09 11:52:06 Iter 700 [Train]: loss=221.96, epe=18.09, lr=0.000079, samples/sec=34.3, sec/step=0.933, eta=12:46:52
2018-09-09 11:53:49 Iter 800 [Train]: loss=216.28, epe=17.54, lr=0.000088, samples/sec=34.0, sec/step=0.942, 

... model saved in ./pwcnet-lg-6-2-cyclic-chairsthingsmix/pwcnet.ckpt-11000
2018-09-09 14:59:52 Iter 11100 [Train]: loss=92.57, epe=6.27, lr=0.000064, samples/sec=33.1, sec/step=0.967, eta=10:26:56
2018-09-09 15:01:39 Iter 11200 [Train]: loss=96.58, epe=6.58, lr=0.000069, samples/sec=32.9, sec/step=0.973, eta=10:29:18
2018-09-09 15:03:26 Iter 11300 [Train]: loss=92.92, epe=6.30, lr=0.000074, samples/sec=32.7, sec/step=0.978, eta=10:30:29
2018-09-09 15:05:13 Iter 11400 [Train]: loss=92.08, epe=6.25, lr=0.000079, samples/sec=32.9, sec/step=0.974, eta=10:26:32
2018-09-09 15:07:00 Iter 11500 [Train]: loss=96.03, epe=6.59, lr=0.000083, samples/sec=32.9, sec/step=0.974, eta=10:24:42
2018-09-09 15:08:46 Iter 11600 [Train]: loss=92.28, epe=6.28, lr=0.000088, samples/sec=32.9, sec/step=0.974, eta=10:23:23
2018-09-09 15:10:33 Iter 11700 [Train]: loss=95.56, epe=6.53, lr=0.000093, samples/sec=32.9, sec/step=0.973, eta=10:20:56
2018-09-09 15:12:20 Iter 11800 [Train]: loss=94.79, epe=6.50, lr=0.000

... model saved in ./pwcnet-lg-6-2-cyclic-chairsthingsmix/pwcnet.ckpt-22000
2018-09-09 18:26:12 Iter 22100 [Train]: loss=72.20, epe=4.83, lr=0.000061, samples/sec=31.2, sec/step=1.024, eta=7:56:13
2018-09-09 18:28:04 Iter 22200 [Train]: loss=78.13, epe=5.29, lr=0.000064, samples/sec=31.3, sec/step=1.023, eta=7:53:47
2018-09-09 18:29:57 Iter 22300 [Train]: loss=72.79, epe=4.88, lr=0.000066, samples/sec=30.9, sec/step=1.035, eta=7:57:47
2018-09-09 18:31:50 Iter 22400 [Train]: loss=72.16, epe=4.83, lr=0.000069, samples/sec=31.0, sec/step=1.031, eta=7:54:24
2018-09-09 18:33:42 Iter 22500 [Train]: loss=75.61, epe=5.11, lr=0.000071, samples/sec=31.0, sec/step=1.032, eta=7:53:01
2018-09-09 18:35:35 Iter 22600 [Train]: loss=75.58, epe=5.09, lr=0.000074, samples/sec=31.1, sec/step=1.029, eta=7:49:44
2018-09-09 18:37:27 Iter 22700 [Train]: loss=77.16, epe=5.20, lr=0.000076, samples/sec=31.0, sec/step=1.033, eta=7:50:04
2018-09-09 18:39:20 Iter 22800 [Train]: loss=77.69, epe=5.26, lr=0.000079, sa

... model saved in ./pwcnet-lg-6-2-cyclic-chairsthingsmix/pwcnet.ckpt-33000
2018-09-09 22:06:31 Iter 33100 [Train]: loss=69.27, epe=4.59, lr=0.000048, samples/sec=29.0, sec/step=1.104, eta=5:10:50
2018-09-09 22:08:32 Iter 33200 [Train]: loss=70.25, epe=4.69, lr=0.000049, samples/sec=28.8, sec/step=1.113, eta=5:11:33
2018-09-09 22:10:34 Iter 33300 [Train]: loss=69.89, epe=4.65, lr=0.000050, samples/sec=28.8, sec/step=1.113, eta=5:09:45
2018-09-09 22:12:35 Iter 33400 [Train]: loss=67.74, epe=4.49, lr=0.000052, samples/sec=28.7, sec/step=1.114, eta=5:08:14
2018-09-09 22:14:36 Iter 33500 [Train]: loss=68.59, epe=4.56, lr=0.000053, samples/sec=28.8, sec/step=1.111, eta=5:05:27
2018-09-09 22:16:37 Iter 33600 [Train]: loss=70.25, epe=4.68, lr=0.000054, samples/sec=28.8, sec/step=1.110, eta=5:03:21
2018-09-09 22:18:38 Iter 33700 [Train]: loss=67.69, epe=4.48, lr=0.000055, samples/sec=28.8, sec/step=1.111, eta=5:01:46
2018-09-09 22:20:38 Iter 33800 [Train]: loss=69.50, epe=4.62, lr=0.000057, sa

... model wasn't saved -- its score (4.99) doesn't outperform other checkpoints
2018-09-10 02:00:23 Iter 44100 [Train]: loss=63.83, epe=4.17, lr=0.000035, samples/sec=27.4, sec/step=1.169, eta=1:54:56
2018-09-10 02:02:30 Iter 44200 [Train]: loss=63.63, epe=4.16, lr=0.000036, samples/sec=27.4, sec/step=1.169, eta=1:52:59
2018-09-10 02:04:37 Iter 44300 [Train]: loss=65.67, epe=4.33, lr=0.000036, samples/sec=27.3, sec/step=1.170, eta=1:51:11
2018-09-10 02:06:44 Iter 44400 [Train]: loss=65.32, epe=4.30, lr=0.000037, samples/sec=27.3, sec/step=1.171, eta=1:49:15
2018-09-10 02:08:51 Iter 44500 [Train]: loss=66.40, epe=4.36, lr=0.000038, samples/sec=27.4, sec/step=1.168, eta=1:47:02
2018-09-10 02:10:59 Iter 44600 [Train]: loss=65.51, epe=4.32, lr=0.000038, samples/sec=27.3, sec/step=1.171, eta=1:45:24
2018-09-10 02:13:05 Iter 44700 [Train]: loss=66.74, epe=4.41, lr=0.000039, samples/sec=27.5, sec/step=1.163, eta=1:42:43
2018-09-10 02:15:12 Iter 44800 [Train]: loss=66.40, epe=4.38, lr=0.000039

## Training log

Here are the training curves for the run above:

![](img/pwcnet-lg-6-2-cyclic-chairsthingsmix/loss.png)
![](img/pwcnet-lg-6-2-cyclic-chairsthingsmix/epe.png)
![](img/pwcnet-lg-6-2-cyclic-chairsthingsmix/lr.png)
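
The `epe` values in the log and in the `epe.png` curve are, by the usual optical-flow convention, the average end-point error: the mean Euclidean distance between predicted and ground-truth flow vectors. A minimal NumPy sketch of that metric follows. `average_epe` is a hypothetical helper and may differ in detail from how the model code computes its score.

```python
import numpy as np

def average_epe(flow_pred, flow_gt):
    """Mean Euclidean distance between two flow fields of shape [H, W, 2]."""
    diff = np.asarray(flow_pred, dtype=np.float64) - np.asarray(flow_gt, dtype=np.float64)
    return float(np.mean(np.sqrt(np.sum(diff ** 2, axis=-1))))

# Toy example: every predicted vector is off by (3, 4) pixels
flow_gt = np.zeros((256, 448, 2))
flow_pred = np.zeros((256, 448, 2))
flow_pred[..., 0] = 3.0
flow_pred[..., 1] = 4.0

print(average_epe(flow_pred, flow_gt))  # 5.0
```

Read this way, an `epe` of about 4 in the log means the predicted flow vectors are off by roughly 4 pixels on average over the evaluated samples.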

Here are the predictions made by the model for a few validation samples:

![](img/pwcnet-lg-6-2-cyclic-chairsthingsmix/val1.png)
![](img/pwcnet-lg-6-2-cyclic-chairsthingsmix/val2.png)
![](img/pwcnet-lg-6-2-cyclic-chairsthingsmix/val3.png)
![](img/pwcnet-lg-6-2-cyclic-chairsthingsmix/val4.png)
![](img/pwcnet-lg-6-2-cyclic-chairsthingsmix/val5.png)
![](img/pwcnet-lg-6-2-cyclic-chairsthingsmix/val6.png)
![](img/pwcnet-lg-6-2-cyclic-chairsthingsmix/val7.png)
![](img/pwcnet-lg-6-2-cyclic-chairsthingsmix/val8.png)