# `FStream` Offline Training

This notebook performs offline training of the **flow stream** on the **DAVIS 2016** dataset.

MaskRNN's binary segmentation net is a 2-stream convnet (`astream` and `fstream`):

![](img/maskrnn.png)

Section "3.3 Binary Segmentation" of the MaskRNN paper and its "Figure 2" are inconsistent in how they describe the inputs of the two-stream network. In this implementation, the input of the flow stream `fstream` is the concatenation of the magnitudes of the flow fields from frame I<sub>t-1</sub> to frame I<sub>t</sub> and from frame I<sub>t</sub> to frame I<sub>t+1</sub>, together with the warped prediction of the previous frame's segmentation mask b<sub>t-1</sub>, denoted φ<sub>t-1,t</sub>(b<sub>t-1</sub>). The warping function φ<sub>t-1,t</sub>(.) transforms its input based on the optical flow field from frame I<sub>t-1</sub> to frame I<sub>t</sub>. The `FStream` network therefore takes 3-channel inputs (||φ<sub>t-1,t</sub>||, ||φ<sub>t,t+1</sub>||, φ<sub>t-1,t</sub>(b<sub>t-1</sub>)).
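
As a rough illustration of the 3-channel input described above, the following sketch stacks two flow magnitudes and an already-warped mask into a single array. The helper names `flow_magnitude` and `build_fstream_input` are hypothetical and are not part of this repository; the warping step itself is assumed to have been done beforehand.

```python
import numpy as np

def flow_magnitude(flow):
    """Return the per-pixel L2 norm of an (H, W, 2) optical flow field."""
    return np.sqrt(np.sum(flow.astype(np.float32) ** 2, axis=-1))

def build_fstream_input(flow_prev_to_cur, flow_cur_to_next, warped_prev_mask):
    """Stack the three FStream channels into an (H, W, 3) array.

    flow_prev_to_cur: (H, W, 2) flow from frame t-1 to frame t
    flow_cur_to_next: (H, W, 2) flow from frame t to frame t+1
    warped_prev_mask: (H, W) mask b_{t-1}, already warped to frame t
    """
    return np.stack([flow_magnitude(flow_prev_to_cur),
                     flow_magnitude(flow_cur_to_next),
                     warped_prev_mask.astype(np.float32)], axis=-1)

# Toy example with random data (sizes chosen for illustration only)
h, w = 4, 6
rng = np.random.RandomState(0)
x = build_fstream_input(rng.randn(h, w, 2), rng.randn(h, w, 2),
                        rng.randint(0, 2, (h, w)))
print(x.shape)  # (4, 6, 3)
```

In the notebook itself, the flow norms and warped masks are loaded from the dataset caches (`flow_norms_train` and `warped_prev_masks_train`), so no such helper needs to be called by hand.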

The offline training of the `FStream` is done using a **3-channel input** VGG16 network pre-trained on ImageNet:

![](img/osvos_parent.png)


To monitor training, run the following command, then open http://localhost:6006 in a browser:
```
tensorboard --logdir E:\repos\tf-video-seg\tfvos\models\fstream_parent
```

In [1]:
"""
fstream_offline_training.ipynb

FStream offline trainer

Written by Phil Ferriere

Licensed under the MIT License (see LICENSE for details)

Based on:
  - https://github.com/scaelles/OSVOS-TensorFlow/blob/master/osvos_parent_demo.py
    Written by Sergi Caelles (scaelles@vision.ee.ethz.ch)
    This file is part of the OSVOS paper presented in:
      Sergi Caelles, Kevis-Kokitsi Maninis, Jordi Pont-Tuset, Laura Leal-Taixe, Daniel Cremers, Luc Van Gool
      One-Shot Video Object Segmentation
      CVPR 2017
    Unknown code license
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os, sys
from PIL import Image
import numpy as np
import tensorflow as tf
slim = tf.contrib.slim
import matplotlib.pyplot as plt

In [2]:
# Import model files
import model
import datasets

## Configuration

In [3]:
# Model paths
imagenet_ckpt = 'models/vgg_16_3chan.ckpt' # copy of checkpoint in http://download.tensorflow.org/models/vgg_16_2016_08_28.tar.gz
segnet_stream = 'fstream'
ckpt_name = segnet_stream + '_parent'
logs_path = os.path.join('models', ckpt_name)

# Offline training parameters
gpu_id = 0
iter_mean_grad = 10
max_training_iters_1 = 15000
max_training_iters_2 = 30000
max_training_iters_3 = 50000
save_step = 5000
test_image = None
display_step = 100
ini_learning_rate = 1e-8
boundaries = [10000, 15000, 25000, 30000, 40000]
values = [ini_learning_rate, ini_learning_rate * 0.1, ini_learning_rate, ini_learning_rate * 0.1, ini_learning_rate,
          ini_learning_rate * 0.1]
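
To make the `boundaries` and `values` lists above easier to read, here is a small pure-Python sketch of the lookup that `tf.train.piecewise_constant` performs: the first value applies up to and including the first boundary, and each later value applies up to and including its own boundary. The function below is an illustrative stand-in, not the TensorFlow implementation.

```python
from bisect import bisect_left

ini_learning_rate = 1e-8
boundaries = [10000, 15000, 25000, 30000, 40000]
values = [ini_learning_rate, ini_learning_rate * 0.1] * 3

def piecewise_constant(step, boundaries, values):
    """Return values[0] for step <= boundaries[0], values[i] for
    boundaries[i-1] < step <= boundaries[i], and values[-1] afterwards."""
    return values[bisect_left(boundaries, step)]

# Print the learning rate just before and after each change
for step in (0, 10000, 10001, 15001, 25001, 30001, 40001):
    print(step, piecewise_constant(step, boundaries, values))
```

With these settings the rate drops by a factor of 10 after iterations 10000, 25000 and 40000, and returns to its initial value after iterations 15000 and 30000, which are the points where the second and third training stages start.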

## Dataset load

In [4]:
# Load DAVIS 2016 dataset
options = datasets._DEFAULT_DAVIS16_OPTIONS
# Set the following to True to enable data augmentation (only if you have a lot of RAM)
options['data_aug'] = False
# Set the following to wherever you have downloaded the DAVIS 2016 dataset
dataset_root = 'E:/datasets/davis2016/' if sys.platform.startswith("win") else '/media/EDrive/datasets/davis2016/'
train_file = dataset_root + 'ImageSets/480p/train.txt'
dataset = datasets.davis16(train_file, None, dataset_root, options)

Initializing dataset...
E:/datasets/davis2016/ImageSets/480p/train.txt
Cache files:
   videos container: E:/datasets/davis2016/davis2016_videos_train.npy
   video_frame_idx container: E:/datasets/davis2016/davis2016_video_frame_idx_train.npy
   images_train container: E:/datasets/davis2016/davis2016_images_train.npy
   images_train_path container: E:/datasets/davis2016/davis2016_images_train_path.npy
   masks_train container: E:/datasets/davis2016/davis2016_masks_train.npy
   masks_train_path container: E:/datasets/davis2016/davis2016_masks_train_path.npy
   flow_norms_train container: E:/datasets/davis2016/davis2016_flow_norms_train.npy
   warped_prev_masks_train container: E:/datasets/davis2016/davis2016_warped_prev_masks_train.npy
   masks_bboxes_train container: E:/datasets/davis2016/davis2016_masks_bboxes_train.npy
Loading ndarrays from cache...
 videos container... done.
 video_frame_idx container... done.
 images_train container... done.
 images_train_path container... done.
 ma

In [5]:
# Display dataset configuration
dataset.print_config()


Configuration:
  in_memory            True
  data_aug             False
  use_cache            True
  use_optical_flow     True
  use_warped_masks     True
  use_bboxes           True
  optical_flow_mgr     pyflow


## Offline Training

In [6]:
# Train the network with strong side outputs supervision
with tf.Graph().as_default():
    with tf.device('/gpu:' + str(gpu_id)):
        global_step = tf.Variable(0, name='global_step', trainable=False)
        learning_rate = tf.train.piecewise_constant(global_step, boundaries, values)
        model.train_parent(dataset, imagenet_ckpt, 1, learning_rate, logs_path, max_training_iters_1, save_step,
                           display_step, global_step, segnet_stream, iter_mean_grad=iter_mean_grad, test_image_path=test_image,
                           ckpt_name=ckpt_name)

Network Layers:
   name = fstream/conv1/conv1_1/Relu:0, shape = (1, ?, ?, 64)
   name = fstream/conv1/conv1_2/Relu:0, shape = (1, ?, ?, 64)
   name = fstream/pool1/MaxPool:0, shape = (1, ?, ?, 64)
   name = fstream/conv2/conv2_1/Relu:0, shape = (1, ?, ?, 128)
   name = fstream/conv2/conv2_2/Relu:0, shape = (1, ?, ?, 128)
   name = fstream/pool2/MaxPool:0, shape = (1, ?, ?, 128)
   name = fstream/conv3/conv3_1/Relu:0, shape = (1, ?, ?, 256)
   name = fstream/conv3/conv3_2/Relu:0, shape = (1, ?, ?, 256)
   name = fstream/conv3/conv3_3/Relu:0, shape = (1, ?, ?, 256)
   name = fstream/pool3/MaxPool:0, shape = (1, ?, ?, 256)
   name = fstream/conv4/conv4_1/Relu:0, shape = (1, ?, ?, 512)
   name = fstream/conv4/conv4_2/Relu:0, shape = (1, ?, ?, 512)
   name = fstream/conv4/conv4_3/Relu:0, shape = (1, ?, ?, 512)
   name = fstream/pool4/MaxPool:0, shape = (1, ?, ?, 512)
   name = fstream/conv5/conv5_1/Relu:0, shape = (1, ?, ?, 512)
   name = fstream/conv5/conv5_2/Relu:0, shape = (1, ?, ?, 512)

2018-01-31 02:54:02.291664 Iter 3400: Training Loss = 11270.4795
2018-01-31 02:59:42.345527 Iter 3500: Training Loss = 60530.7891
2018-01-31 03:05:22.844060 Iter 3600: Training Loss = 5610.5703
2018-01-31 03:11:03.057009 Iter 3700: Training Loss = 25830.4277
2018-01-31 03:16:43.472930 Iter 3800: Training Loss = 27597.2734
2018-01-31 03:22:24.413489 Iter 3900: Training Loss = 12331.9375
2018-01-31 03:28:04.443860 Iter 4000: Training Loss = 21082.2598
2018-01-31 03:33:44.497701 Iter 4100: Training Loss = 13443.0225
2018-01-31 03:39:24.851565 Iter 4200: Training Loss = 15971.2969
2018-01-31 03:45:05.194807 Iter 4300: Training Loss = 19679.0352
2018-01-31 03:50:45.266596 Iter 4400: Training Loss = 159844.8906
2018-01-31 03:56:25.225160 Iter 4500: Training Loss = 2067.9700
2018-01-31 04:02:05.950620 Iter 4600: Training Loss = 35693.9414
2018-01-31 04:07:45.238442 Iter 4700: Training Loss = 73718.4141
2018-01-31 04:13:25.732018 Iter 4800: Training Loss = 48485.9219
2018-01-31 04:19:06.132029

Finished training.
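
The `iter_mean_grad` parameter passed to `model.train_parent` is not explained in this notebook. Assuming it behaves as in the OSVOS code this trainer is based on, gradients from `iter_mean_grad` consecutive iterations are averaged before the weights are updated. The sketch below illustrates that idea with NumPy arrays standing in for gradients; `averaged_updates` is a hypothetical helper, not a function from `model.py`.

```python
import numpy as np

def averaged_updates(per_iter_grads, iter_mean_grad):
    """Yield one averaged gradient per `iter_mean_grad` consecutive gradients."""
    acc, count = None, 0
    for g in per_iter_grads:
        # Accumulate the gradient of the current iteration
        acc = g.copy() if acc is None else acc + g
        count += 1
        if count == iter_mean_grad:
            # Emit the mean and reset the accumulator
            yield acc / iter_mean_grad
            acc, count = None, 0

# 20 fake gradients with constant values 0.0, 1.0, ..., 19.0
grads = [np.full(3, float(i)) for i in range(20)]
updates = list(averaged_updates(grads, iter_mean_grad=10))
print(len(updates))  # 2
print(updates[0])    # [4.5 4.5 4.5]
```

Averaging over several iterations acts like a larger batch, which can help when each iteration only processes a single image, as the `(1, ?, ?, C)` layer shapes above suggest.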


In [7]:
# Train the network with weak side outputs supervision
with tf.Graph().as_default():
    with tf.device('/gpu:' + str(gpu_id)):
        global_step = tf.Variable(max_training_iters_1, name='global_step', trainable=False)
        learning_rate = tf.train.piecewise_constant(global_step, boundaries, values)
        model.train_parent(dataset, imagenet_ckpt, 2, learning_rate, logs_path, max_training_iters_2, save_step,
                           display_step, global_step, segnet_stream, iter_mean_grad=iter_mean_grad, resume_training=True,
                           test_image_path=test_image, ckpt_name=ckpt_name)

Network Layers:
   name = fstream/conv1/conv1_1/Relu:0, shape = (1, ?, ?, 64)
   name = fstream/conv1/conv1_2/Relu:0, shape = (1, ?, ?, 64)
   name = fstream/pool1/MaxPool:0, shape = (1, ?, ?, 64)
   name = fstream/conv2/conv2_1/Relu:0, shape = (1, ?, ?, 128)
   name = fstream/conv2/conv2_2/Relu:0, shape = (1, ?, ?, 128)
   name = fstream/pool2/MaxPool:0, shape = (1, ?, ?, 128)
   name = fstream/conv3/conv3_1/Relu:0, shape = (1, ?, ?, 256)
   name = fstream/conv3/conv3_2/Relu:0, shape = (1, ?, ?, 256)
   name = fstream/conv3/conv3_3/Relu:0, shape = (1, ?, ?, 256)
   name = fstream/pool3/MaxPool:0, shape = (1, ?, ?, 256)
   name = fstream/conv4/conv4_1/Relu:0, shape = (1, ?, ?, 512)
   name = fstream/conv4/conv4_2/Relu:0, shape = (1, ?, ?, 512)
   name = fstream/conv4/conv4_3/Relu:0, shape = (1, ?, ?, 512)
   name = fstream/pool4/MaxPool:0, shape = (1, ?, ?, 512)
   name = fstream/conv5/conv5_1/Relu:0, shape = (1, ?, ?, 512)
   name = fstream/conv5/conv5_2/Relu:0, shape = (1, ?, ?, 512)

2018-01-31 16:54:53.902193 Iter 18300: Training Loss = 953.1269
2018-01-31 17:00:33.313879 Iter 18400: Training Loss = 8677.8760
2018-01-31 17:06:13.068090 Iter 18500: Training Loss = 13860.7119
2018-01-31 17:11:53.670203 Iter 18600: Training Loss = 14735.6367
2018-01-31 17:17:32.427196 Iter 18700: Training Loss = 15957.6230
2018-01-31 17:23:09.629562 Iter 18800: Training Loss = 11599.0400
2018-01-31 17:28:44.997200 Iter 18900: Training Loss = 3127.4773
2018-01-31 17:34:20.234634 Iter 19000: Training Loss = 12191.3691
2018-01-31 17:39:55.924644 Iter 19100: Training Loss = 1164.7054
2018-01-31 17:45:31.250823 Iter 19200: Training Loss = 16669.8105
2018-01-31 17:51:06.991163 Iter 19300: Training Loss = 7274.4878
2018-01-31 17:56:42.060803 Iter 19400: Training Loss = 4742.8896
2018-01-31 18:02:17.165247 Iter 19500: Training Loss = 4301.2656
2018-01-31 18:07:52.455863 Iter 19600: Training Loss = 3335.3042
2018-01-31 18:13:27.826877 Iter 19700: Training Loss = 39116.0508
2018-01-31 18:19:02

Model saved in file: models\fstream_parent\fstream_parent.ckpt-30000
Finished training.


In [8]:
# Train the network without side outputs supervision
with tf.Graph().as_default():
    with tf.device('/gpu:' + str(gpu_id)):
        global_step = tf.Variable(max_training_iters_2, name='global_step', trainable=False)
        learning_rate = tf.train.piecewise_constant(global_step, boundaries, values)
        model.train_parent(dataset, imagenet_ckpt, 3, learning_rate, logs_path, max_training_iters_3, save_step,
                           display_step, global_step, segnet_stream, iter_mean_grad=iter_mean_grad, resume_training=True,
                           test_image_path=test_image, ckpt_name=ckpt_name)

Network Layers:
   name = fstream/conv1/conv1_1/Relu:0, shape = (1, ?, ?, 64)
   name = fstream/conv1/conv1_2/Relu:0, shape = (1, ?, ?, 64)
   name = fstream/pool1/MaxPool:0, shape = (1, ?, ?, 64)
   name = fstream/conv2/conv2_1/Relu:0, shape = (1, ?, ?, 128)
   name = fstream/conv2/conv2_2/Relu:0, shape = (1, ?, ?, 128)
   name = fstream/pool2/MaxPool:0, shape = (1, ?, ?, 128)
   name = fstream/conv3/conv3_1/Relu:0, shape = (1, ?, ?, 256)
   name = fstream/conv3/conv3_2/Relu:0, shape = (1, ?, ?, 256)
   name = fstream/conv3/conv3_3/Relu:0, shape = (1, ?, ?, 256)
   name = fstream/pool3/MaxPool:0, shape = (1, ?, ?, 256)
   name = fstream/conv4/conv4_1/Relu:0, shape = (1, ?, ?, 512)
   name = fstream/conv4/conv4_2/Relu:0, shape = (1, ?, ?, 512)
   name = fstream/conv4/conv4_3/Relu:0, shape = (1, ?, ?, 512)
   name = fstream/pool4/MaxPool:0, shape = (1, ?, ?, 512)
   name = fstream/conv5/conv5_1/Relu:0, shape = (1, ?, ?, 512)
   name = fstream/conv5/conv5_2/Relu:0, shape = (1, ?, ?, 512)

2018-02-01 06:42:14.914684 Iter 33400: Training Loss = 8093.8384
2018-02-01 06:47:21.759702 Iter 33500: Training Loss = 2032.3414
2018-02-01 06:52:29.734673 Iter 33600: Training Loss = 1796.6763
2018-02-01 06:57:37.762587 Iter 33700: Training Loss = 69.6234
2018-02-01 07:02:45.600817 Iter 33800: Training Loss = 844.9960
2018-02-01 07:07:53.608262 Iter 33900: Training Loss = 11779.7393
2018-02-01 07:13:01.353649 Iter 34000: Training Loss = 1849.7102
2018-02-01 07:18:09.389271 Iter 34100: Training Loss = 1278.1311
2018-02-01 07:23:17.500768 Iter 34200: Training Loss = 835.7748
2018-02-01 07:28:25.099870 Iter 34300: Training Loss = 71.5322
2018-02-01 07:33:32.876707 Iter 34400: Training Loss = 1316.3887
2018-02-01 07:38:40.502646 Iter 34500: Training Loss = 385.2216
2018-02-01 07:43:48.023997 Iter 34600: Training Loss = 1876.2085
2018-02-01 07:48:55.219889 Iter 34700: Training Loss = 1489.0850
2018-02-01 07:54:01.868462 Iter 34800: Training Loss = 141.1523
2018-02-01 07:59:08.836424 Iter 

2018-02-01 16:45:44.104479 Iter 45100: Training Loss = 0.4088
2018-02-01 16:50:55.779447 Iter 45200: Training Loss = 750.8162
2018-02-01 16:56:07.414582 Iter 45300: Training Loss = 2302.9490
2018-02-01 17:01:19.265632 Iter 45400: Training Loss = 3132.6301
2018-02-01 17:06:30.903599 Iter 45500: Training Loss = 1459.1300
2018-02-01 17:11:42.837958 Iter 45600: Training Loss = 2140.6804
2018-02-01 17:16:54.583203 Iter 45700: Training Loss = 2253.5552
2018-02-01 17:22:05.287269 Iter 45800: Training Loss = 1032.7069
2018-02-01 17:27:15.671825 Iter 45900: Training Loss = 954.1885
2018-02-01 17:32:26.101411 Iter 46000: Training Loss = 1567.7511
2018-02-01 17:37:35.995277 Iter 46100: Training Loss = 1361.3478
2018-02-01 17:42:45.046858 Iter 46200: Training Loss = 3688.9390
2018-02-01 17:47:54.415659 Iter 46300: Training Loss = 910.2719
2018-02-01 17:53:03.714443 Iter 46400: Training Loss = 576.1975
2018-02-01 17:58:12.646903 Iter 46500: Training Loss = 337.3216
2018-02-01 18:03:21.651449 Iter 4
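
The three training cells share one iteration counter: each stage initializes `global_step` at the previous stage's final iteration and trains up to its own `max_training_iters_*` value. The following sketch simply restates the values from the configuration cell to show the resulting iteration ranges; the `stages` list is introduced here for illustration only.

```python
max_training_iters_1 = 15000
max_training_iters_2 = 30000
max_training_iters_3 = 50000

# (supervision level, initial global_step, final global_step),
# mirroring the arguments used in cells In [6] to In [8]
stages = [
    (1, 0, max_training_iters_1),
    (2, max_training_iters_1, max_training_iters_2),
    (3, max_training_iters_2, max_training_iters_3),
]

for supervision, start, end in stages:
    print("supervision=%d: iterations %d to %d (%d steps)"
          % (supervision, start, end, end - start))
```

Because stages 2 and 3 are run with `resume_training=True`, they are expected to pick up from the checkpoint saved at the end of the previous stage rather than from the ImageNet weights.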

## Training losses & learning rate
You should see training curves similar to the following:
![](img/fstream_parent_dsn_2_loss.png)
![](img/fstream_parent_dsn_3_loss.png)
![](img/fstream_parent_dsn_4_loss.png)
![](img/fstream_parent_dsn_5_loss.png)
![](img/fstream_parent_main_loss.png)
![](img/fstream_parent_total_loss.png)
![](img/fstream_parent_learning_rate.png)
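
The curves above track four side-output losses (`dsn_2` to `dsn_5`), a main loss, and a total loss. How the total is formed at each supervision level is defined inside `model.train_parent` and is not shown in this notebook. As a hedged illustration, the sketch below assumes the side-output losses are weighted by 1.0, 0.5 and 0.0 for supervision levels 1 (strong), 2 (weak) and 3 (none), in the spirit of the OSVOS training code; check `model.py` for the weights actually used.

```python
def total_loss(dsn_losses, main_loss, supervision):
    """Combine side-output (dsn) losses with the main loss.

    ASSUMPTION: side-output weights of 1.0 / 0.5 / 0.0 for supervision
    levels 1 / 2 / 3. These values are illustrative, not read from model.py.
    """
    side_weight = {1: 1.0, 2: 0.5, 3: 0.0}[supervision]
    return side_weight * sum(dsn_losses) + main_loss

# Made-up loss values for dsn_2..dsn_5 and the main output
dsn = [4.0, 3.0, 2.0, 1.0]
for supervision in (1, 2, 3):
    print(supervision, total_loss(dsn, 5.0, supervision))
```

Under this assumption the total loss shrinks from one stage to the next even if the individual losses stay the same, so the total-loss curve is not directly comparable across stage boundaries.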