# `AStream` Offline Training
This notebook performs offline training of the **appearance stream** on the **DAVIS 2016** dataset.

MaskRNN's binary segmentation net is a two-stream convnet (`astream` and `fstream`):

![](img/maskrnn.png)

Section 3.3 ("Binary Segmentation") and Figure 2 of the MaskRNN paper describe the inputs of the two-stream network inconsistently. In this implementation, the input of the appearance stream `astream` is the concatenation of the current frame I<sub>t</sub> and the warped prediction of the previous frame's segmentation mask b<sub>t-1</sub>, denoted φ<sub>t-1,t</sub>(b<sub>t-1</sub>). The warping function φ<sub>t-1,t</sub>(.) transforms its input according to the optical flow field from frame I<sub>t-1</sub> to frame I<sub>t</sub>. The `AStream` network therefore takes 4-channel inputs: (I<sub>t</sub>[0], I<sub>t</sub>[1], I<sub>t</sub>[2], φ<sub>t-1,t</sub>(b<sub>t-1</sub>)).
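
As a rough illustration of the 4-channel input described above, the following NumPy sketch stacks an RGB frame and an already-warped previous mask along the channel axis. It assumes the warped mask has been computed beforehand and has the same height and width as the frame; `make_astream_input` is a hypothetical helper name, not a function from this repository.

```python
import numpy as np

def make_astream_input(frame_rgb, warped_prev_mask):
    """Stack an (H, W, 3) frame and an (H, W) warped mask into an (H, W, 4) array."""
    frame = np.asarray(frame_rgb, dtype=np.float32)
    mask = np.asarray(warped_prev_mask, dtype=np.float32)
    if frame.ndim != 3 or frame.shape[2] != 3:
        raise ValueError("frame_rgb must have shape (H, W, 3)")
    if mask.shape != frame.shape[:2]:
        raise ValueError("warped_prev_mask must have shape (H, W)")
    # Channels 0..2 hold the frame, channel 3 holds the warped previous mask
    return np.concatenate([frame, mask[..., np.newaxis]], axis=-1)

# Placeholder data with a 480p-sized frame (values are made up)
frame = np.zeros((480, 854, 3), dtype=np.uint8)
mask = np.ones((480, 854), dtype=np.uint8)
x = make_astream_input(frame, mask)
print(x.shape)
```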

The offline training of the `AStream` is done using a **4-channel input** VGG16 network pre-trained on ImageNet:

![](img/osvos_parent.png)
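
A 3-channel ImageNet checkpoint cannot be loaded directly into a 4-channel network, because the first convolution kernel has the wrong input depth. One common way to widen that kernel, assumed here and not necessarily what the `VGG16 Surgery` notebook does, is to keep the pretrained RGB slices and append a zero-initialized slice for the extra channel. The sketch below uses a random array as a stand-in for the real `conv1_1` weights, and `widen_first_conv` is a hypothetical helper name.

```python
import numpy as np

def widen_first_conv(kernel_rgb, extra_channels=1):
    """Append zero-initialized input channels to a (kh, kw, 3, out) kernel."""
    kernel_rgb = np.asarray(kernel_rgb)
    kh, kw, _, out_ch = kernel_rgb.shape
    extra = np.zeros((kh, kw, extra_channels, out_ch), dtype=kernel_rgb.dtype)
    # Axis 2 is the input-channel axis in the (height, width, in, out) layout
    return np.concatenate([kernel_rgb, extra], axis=2)

# Stand-in for pretrained conv1_1 weights: 3x3 kernels, 3 inputs, 64 outputs
rng = np.random.default_rng(0)
conv1_1_rgb = rng.standard_normal((3, 3, 3, 64)).astype(np.float32)
conv1_1_4ch = widen_first_conv(conv1_1_rgb)
print(conv1_1_4ch.shape)
```

With a zero-initialized fourth slice, the widened layer initially produces the same activations as the original for any RGB input, and the mask channel only starts to contribute once training updates those weights.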

> Note: The VGG16 networks pretrained on ImageNet that are available online are **3-channel RGB input** models, so make sure you've run the [`VGG16 Surgery`](VGG16_Surgery.ipynb) notebook first!

To monitor training, run the following command, then open http://localhost:6006 in a browser:
```
tensorboard --logdir E:\repos\tf-video-seg\tfvos\models\astream_parent
```

In [1]:
"""
astream_offline_training.ipynb

AStream offline trainer

Written by Phil Ferriere

Licensed under the MIT License (see LICENSE for details)

Based on:
  - https://github.com/scaelles/OSVOS-TensorFlow/blob/master/osvos_parent_demo.py
    Written by Sergi Caelles (scaelles@vision.ee.ethz.ch)
    This file is part of the OSVOS paper presented in:
      Sergi Caelles, Kevis-Kokitsi Maninis, Jordi Pont-Tuset, Laura Leal-Taixe, Daniel Cremers, Luc Van Gool
      One-Shot Video Object Segmentation
      CVPR 2017
    Unknown code license
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
import sys

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
from PIL import Image

slim = tf.contrib.slim  # TF 1.x only: tf.contrib was removed in TensorFlow 2.x

In [2]:
# Import model files
import model
import datasets

## Configuration

In [3]:
# Model paths
imagenet_ckpt = 'models/vgg_16_4chan.ckpt'
segnet_stream = 'astream'
ckpt_name = segnet_stream + '_parent'
logs_path = os.path.join('models', ckpt_name)

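# Offline training parameters
gpu_id = 0
iter_mean_grad = 10
# Each stage runs up to the global step given here (stage 2 resumes where stage 1 ends, etc.)
max_training_iters_1 = 15000  # stage 1: strong side outputs supervision
max_training_iters_2 = 30000  # stage 2: weak side outputs supervision
max_training_iters_3 = 50000  # stage 3: no side outputs supervision
save_step = 5000
test_image = None
display_step = 100  # the training loss is printed every 100 iterations
# Piecewise-constant schedule: the rate alternates between 1e-8 and 1e-9 at each boundary
ini_learning_rate = 1e-8
boundaries = [10000, 15000, 25000, 30000, 40000]
values = [ini_learning_rate, ini_learning_rate * 0.1,
          ini_learning_rate, ini_learning_rate * 0.1,
          ini_learning_rate, ini_learning_rate * 0.1]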
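
To see what the `boundaries` and `values` lists above amount to, here is a small pure-Python sketch of a piecewise-constant lookup. It follows the documented behavior of `tf.train.piecewise_constant` (the first value applies while the step is less than or equal to the first boundary, the last value applies after the last boundary); `learning_rate_at` is a hypothetical helper for illustration and is not used by the training code.

```python
from bisect import bisect_left

ini_learning_rate = 1e-8
boundaries = [10000, 15000, 25000, 30000, 40000]
values = [ini_learning_rate, ini_learning_rate * 0.1,
          ini_learning_rate, ini_learning_rate * 0.1,
          ini_learning_rate, ini_learning_rate * 0.1]

def learning_rate_at(step):
    """Return the scheduled learning rate for a given global step."""
    # bisect_left counts the boundaries strictly below `step`, so a step that
    # sits exactly on a boundary still uses the value of the preceding interval
    return values[bisect_left(boundaries, step)]

for step in (0, 10000, 10001, 15001, 25001, 30001, 40001):
    print(step, learning_rate_at(step))
```

Read against the three training stages, this schedule starts each stage at the initial rate and drops it by a factor of ten for the final part of the stage.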

## Dataset load

In [4]:
# Load DAVIS 2016 dataset
options = datasets._DEFAULT_DAVIS16_OPTIONS
# Set the following to True to enable data augmentation (only if you have a lot of RAM)
options['data_aug'] = False
# Set the following to wherever you have downloaded the DAVIS 2016 dataset
dataset_root = 'E:/datasets/davis2016/' if sys.platform.startswith("win") else '/media/EDrive/datasets/davis2016/'
train_file = dataset_root + 'ImageSets/480p/train.txt'
dataset = datasets.davis16(train_file, None, dataset_root, options)

Initializing dataset...
E:/datasets/davis2016/ImageSets/480p/train.txt
Cache files:
   videos container: E:/datasets/davis2016/davis2016_videos_train.npy
   video_frame_idx container: E:/datasets/davis2016/davis2016_video_frame_idx_train.npy
   images_train container: E:/datasets/davis2016/davis2016_images_train.npy
   images_train_path container: E:/datasets/davis2016/davis2016_images_train_path.npy
   masks_train container: E:/datasets/davis2016/davis2016_masks_train.npy
   masks_train_path container: E:/datasets/davis2016/davis2016_masks_train_path.npy
   flow_norms_train container: E:/datasets/davis2016/davis2016_flow_norms_train.npy
   warped_prev_masks_train container: E:/datasets/davis2016/davis2016_warped_prev_masks_train.npy
   masks_bboxes_train container: E:/datasets/davis2016/davis2016_masks_bboxes_train.npy
Loading ndarrays from cache...
 videos container... done.
 video_frame_idx container... done.
 images_train container... done.
 images_train_path container... done.
 ma

In [5]:
# Display dataset configuration
dataset.print_config()


Configuration:
  in_memory            True
  data_aug             False
  use_cache            True
  use_optical_flow     True
  use_warped_masks     True
  use_bboxes           True
  optical_flow_mgr     pyflow


## Offline Training
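
The three training cells in this section call `model.train_parent` with its third argument set to 1, 2, and 3, which the cell comments describe as strong, weak, and no side outputs supervision. In the OSVOS reference code this notebook is based on, that setting changes how the side-output (DSN) losses are weighted against the main loss. The sketch below assumes the same weighting applies here (1.0, 0.5, and 0.0), which this notebook does not confirm; `SIDE_WEIGHT`, `STAGES`, and `total_loss` are hypothetical names used only for illustration.

```python
# Side-output loss weight per supervision mode (assumed, following OSVOS)
SIDE_WEIGHT = {1: 1.0, 2: 0.5, 3: 0.0}

# (supervision mode, first global step, last global step), from the configuration cell
STAGES = [(1, 0, 15000), (2, 15000, 30000), (3, 30000, 50000)]

def total_loss(main_loss, side_losses, supervision):
    """Combine the main loss with weighted side-output losses."""
    return main_loss + SIDE_WEIGHT[supervision] * sum(side_losses)

side = [4.0, 3.0, 2.0, 1.0]  # made-up dsn2..dsn5 loss values
for mode, start, stop in STAGES:
    print(mode, start, stop, total_loss(10.0, side, mode))
```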

In [6]:
# Train the network with strong side outputs supervision
with tf.Graph().as_default():
    with tf.device('/gpu:' + str(gpu_id)):
        global_step = tf.Variable(0, name='global_step', trainable=False)
        learning_rate = tf.train.piecewise_constant(global_step, boundaries, values)
        model.train_parent(dataset, imagenet_ckpt, 1, learning_rate, logs_path, max_training_iters_1, save_step,
                           display_step, global_step, segnet_stream, iter_mean_grad=iter_mean_grad, test_image_path=test_image,
                           ckpt_name=ckpt_name)

Network Layers:
   name = astream/conv1/conv1_1/Relu:0, shape = (1, ?, ?, 64)
   name = astream/conv1/conv1_2/Relu:0, shape = (1, ?, ?, 64)
   name = astream/pool1/MaxPool:0, shape = (1, ?, ?, 64)
   name = astream/conv2/conv2_1/Relu:0, shape = (1, ?, ?, 128)
   name = astream/conv2/conv2_2/Relu:0, shape = (1, ?, ?, 128)
   name = astream/pool2/MaxPool:0, shape = (1, ?, ?, 128)
   name = astream/conv3/conv3_1/Relu:0, shape = (1, ?, ?, 256)
   name = astream/conv3/conv3_2/Relu:0, shape = (1, ?, ?, 256)
   name = astream/conv3/conv3_3/Relu:0, shape = (1, ?, ?, 256)
   name = astream/pool3/MaxPool:0, shape = (1, ?, ?, 256)
   name = astream/conv4/conv4_1/Relu:0, shape = (1, ?, ?, 512)
   name = astream/conv4/conv4_2/Relu:0, shape = (1, ?, ?, 512)
   name = astream/conv4/conv4_3/Relu:0, shape = (1, ?, ?, 512)
   name = astream/pool4/MaxPool:0, shape = (1, ?, ?, 512)
   name = astream/conv5/conv5_1/Relu:0, shape = (1, ?, ?, 512)
   name = astream/conv5/conv5_2/Relu:0, shape = (1, ?, ?, 512)

In [7]:
# Train the network with weak side outputs supervision
with tf.Graph().as_default():
    with tf.device('/gpu:' + str(gpu_id)):
        global_step = tf.Variable(max_training_iters_1, name='global_step', trainable=False)
        learning_rate = tf.train.piecewise_constant(global_step, boundaries, values)
        model.train_parent(dataset, imagenet_ckpt, 2, learning_rate, logs_path, max_training_iters_2, save_step,
                           display_step, global_step, segnet_stream, iter_mean_grad=iter_mean_grad, resume_training=True,
                           test_image_path=test_image, ckpt_name=ckpt_name)

Network Layers:
   name = astream/conv1/conv1_1/Relu:0, shape = (1, ?, ?, 64)
   name = astream/conv1/conv1_2/Relu:0, shape = (1, ?, ?, 64)
   name = astream/pool1/MaxPool:0, shape = (1, ?, ?, 64)
   name = astream/conv2/conv2_1/Relu:0, shape = (1, ?, ?, 128)
   name = astream/conv2/conv2_2/Relu:0, shape = (1, ?, ?, 128)
   name = astream/pool2/MaxPool:0, shape = (1, ?, ?, 128)
   name = astream/conv3/conv3_1/Relu:0, shape = (1, ?, ?, 256)
   name = astream/conv3/conv3_2/Relu:0, shape = (1, ?, ?, 256)
   name = astream/conv3/conv3_3/Relu:0, shape = (1, ?, ?, 256)
   name = astream/pool3/MaxPool:0, shape = (1, ?, ?, 256)
   name = astream/conv4/conv4_1/Relu:0, shape = (1, ?, ?, 512)
   name = astream/conv4/conv4_2/Relu:0, shape = (1, ?, ?, 512)
   name = astream/conv4/conv4_3/Relu:0, shape = (1, ?, ?, 512)
   name = astream/pool4/MaxPool:0, shape = (1, ?, ?, 512)
   name = astream/conv5/conv5_1/Relu:0, shape = (1, ?, ?, 512)
   name = astream/conv5/conv5_2/Relu:0, shape = (1, ?, ?, 512)

Finished training.


In [8]:
# Train the network without side outputs supervision
with tf.Graph().as_default():
    with tf.device('/gpu:' + str(gpu_id)):
        global_step = tf.Variable(max_training_iters_2, name='global_step', trainable=False)
        learning_rate = tf.train.piecewise_constant(global_step, boundaries, values)
        model.train_parent(dataset, imagenet_ckpt, 3, learning_rate, logs_path, max_training_iters_3, save_step,
                           display_step, global_step, segnet_stream, iter_mean_grad=iter_mean_grad, resume_training=True,
                           test_image_path=test_image, ckpt_name=ckpt_name)

Network Layers:
   name = astream/conv1/conv1_1/Relu:0, shape = (1, ?, ?, 64)
   name = astream/conv1/conv1_2/Relu:0, shape = (1, ?, ?, 64)
   name = astream/pool1/MaxPool:0, shape = (1, ?, ?, 64)
   name = astream/conv2/conv2_1/Relu:0, shape = (1, ?, ?, 128)
   name = astream/conv2/conv2_2/Relu:0, shape = (1, ?, ?, 128)
   name = astream/pool2/MaxPool:0, shape = (1, ?, ?, 128)
   name = astream/conv3/conv3_1/Relu:0, shape = (1, ?, ?, 256)
   name = astream/conv3/conv3_2/Relu:0, shape = (1, ?, ?, 256)
   name = astream/conv3/conv3_3/Relu:0, shape = (1, ?, ?, 256)
   name = astream/pool3/MaxPool:0, shape = (1, ?, ?, 256)
   name = astream/conv4/conv4_1/Relu:0, shape = (1, ?, ?, 512)
   name = astream/conv4/conv4_2/Relu:0, shape = (1, ?, ?, 512)
   name = astream/conv4/conv4_3/Relu:0, shape = (1, ?, ?, 512)
   name = astream/pool4/MaxPool:0, shape = (1, ?, ?, 512)
   name = astream/conv5/conv5_1/Relu:0, shape = (1, ?, ?, 512)
   name = astream/conv5/conv5_2/Relu:0, shape = (1, ?, ?, 512)

2018-02-01 04:22:14.823729 Iter 45200: Training Loss = 443.9996
2018-02-01 04:25:51.797347 Iter 45300: Training Loss = 1268.5164
2018-02-01 04:29:28.434882 Iter 45400: Training Loss = 1945.3839
2018-02-01 04:33:05.111364 Iter 45500: Training Loss = 594.7296
2018-02-01 04:36:41.936460 Iter 45600: Training Loss = 1580.1082
2018-02-01 04:40:18.711780 Iter 45700: Training Loss = 820.3978
2018-02-01 04:43:55.364036 Iter 45800: Training Loss = 433.5038
2018-02-01 04:47:32.120917 Iter 45900: Training Loss = 531.2731
2018-02-01 04:51:08.986146 Iter 46000: Training Loss = 678.2197
2018-02-01 04:54:45.743155 Iter 46100: Training Loss = 810.8578
2018-02-01 04:58:22.345168 Iter 46200: Training Loss = 1273.8517
2018-02-01 05:01:59.025731 Iter 46300: Training Loss = 480.9303
2018-02-01 05:05:35.914415 Iter 46400: Training Loss = 346.1341
2018-02-01 05:09:12.619658 Iter 46500: Training Loss = 234.9200
2018-02-01 05:12:49.445928 Iter 46600: Training Loss = 403.1935
2018-02-01 05:16:26.241092 Iter 4670

## Training losses & learning rate
You should see training curves similar to the following:

![](img/astream_parent_dsn2_loss.png)

![](img/astream_parent_dsn3_loss.png)

![](img/astream_parent_dsn4_loss.png)

![](img/astream_parent_dsn5_loss.png)

![](img/astream_parent_main_loss.png)

![](img/astream_parent_total_loss.png)

![](img/astream_parent_learning_rate.png)
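
The console lines printed during training can also be summarized without TensorBoard. The sketch below parses lines in the `<timestamp> Iter <n>: Training Loss = <value>` format shown in the cell output and reports the average time per iteration and the mean loss. The three sample lines are copied from that output; `parse_log` and `summarize` are hypothetical helpers, and the regular expression assumes the log format does not change.

```python
import re
from datetime import datetime

LOG_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) Iter (\d+): Training Loss = ([\d.]+)$"
)

def parse_log(text):
    """Return (timestamp, iteration, loss) tuples for every complete log line."""
    records = []
    for line in text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip truncated or unrelated lines
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        records.append((stamp, int(match.group(2)), float(match.group(3))))
    return records

def summarize(records):
    """Return (seconds per iteration, mean loss) over the parsed records."""
    (t0, it0, _), (t1, it1, _) = records[0], records[-1]
    seconds_per_iter = (t1 - t0).total_seconds() / (it1 - it0)
    mean_loss = sum(loss for _, _, loss in records) / len(records)
    return seconds_per_iter, mean_loss

sample = """\
2018-02-01 04:22:14.823729 Iter 45200: Training Loss = 443.9996
2018-02-01 04:25:51.797347 Iter 45300: Training Loss = 1268.5164
2018-02-01 04:29:28.434882 Iter 45400: Training Loss = 1945.3839
"""
records = parse_log(sample)
seconds_per_iter, mean_loss = summarize(records)
print("%.3f s/iter, mean loss %.2f" % (seconds_per_iter, mean_loss))
```

Because the per-iteration loss printed here is noisy, averaging over a window of lines tends to be more informative than comparing individual values.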