## Lesson resources

* [https://youtu.be/Q0z-l2KRYFY Lesson video]
* [http://forums.fast.ai/t/lesson-7-discussion Forum discussion]
* [http://wiki.fast.ai/index.php/Lesson_7_Notes Lesson 7 notes]
* The notebooks:
** [https://github.com/fastai/courses/blob/master/deeplearning1/nbs/lesson6.ipynb Lesson 6] updated with pure Python RNN and Theano GRU details
** [https://github.com/fastai/courses/blob/master/deeplearning1/nbs/lesson7.ipynb Lesson 7] applies the various CNN architectures to the Kaggle fisheries competition.

* The python scripts:
** [https://github.com/fastai/courses/blob/master/deeplearning1/nbs/vgg16bn.py VGG with batch normalization] - also adds 'size' and 'include_top' parameters
** [https://github.com/fastai/courses/blob/master/deeplearning1/nbs/resnet50.py resnet50.py] - The ResNet architecture

## More information

* [https://culurciello.github.io/tech/2016/06/04/nets.html Network architecture review] (really great)
* [http://techtalks.tv/talks/fully-convolutional-networks-for-semantic-segmentation/61606/ Fully convolutional net]
* [https://keras.io/getting-started/functional-api-guide/#multi-input-and-multi-output-models Multi input and multi output nets in Keras]
* [http://www.wildml.com/2015/10/recurrent-neural-network-tutorial-part-4-implementing-a-grulstm-rnn-with-python-and-theano/ Gated recurrent units (GRUs)]

## Fisheries competition

In this notebook we're going to investigate a range of different architectures for the [Kaggle fisheries competition](https://www.kaggle.com/c/the-nature-conservancy-fisheries-monitoring). The video states that vgg.py and vgg_ft() from utils.py have been updated to include VGG with batch normalization, but this is not the case. We've instead created a new file [vgg16bn.py] and an additional function vgg_ft_bn() (already in utils.py), which we use in this notebook.

### Actions
1. Create validation and sample sets
2. Rearrange image files into their respective directories
3. Fine-tune and train the model
4. Generate predictions
5. Validate predictions
6. Submit predictions to Kaggle

In [90]:
%matplotlib inline
import imp
import utils
imp.reload(utils)
from utils import *

Using TensorFlow backend.


In [91]:
path = './data/fish'
batch_size = 64

## Data Preparation

In [88]:
import os
import glob
import numpy as np
from pathlib import Path
from shutil import copy

source = [x for x in Path(path + '/org').iterdir() if x.is_dir()]
target = {'train': 10, 'valid': 5}

# create folders: train, valid
for name in target.keys():
    Path(path + '/' + name).mkdir(exist_ok = True)

def drop(dst, src):
    dst.mkdir(exist_ok = True) # ignore FileExistsError
    print(src, '-->', dst)
    #copy(src, dst) # uncomment to actually copy the file; as written this is a dry run

for src in source:
    imgs = sorted(Path(src).glob('*.jpg'))
    print(len(imgs), src)
    imgs = np.random.permutation(imgs)
    # the factor 5.0 keeps this a small sample: train gets 10/75 and valid 5/75 of each class
    total = np.sum(list(target.values())) * 5.0
    start = 0
    for key, val in target.items():
        size = np.int32(len(imgs) * val / total)
        dst = Path(path + '/' + key + '/' + src.name)
        for index in range(0, size):
            img = imgs[start + index]
            drop(dst, img)
        start += size


1719 data\fish\org\ALB
data\fish\org\ALB\img_02229.jpg --> data\fish\train\ALB
data\fish\org\ALB\img_02883.jpg --> data\fish\train\ALB
data\fish\org\ALB\img_02414.jpg --> data\fish\train\ALB
data\fish\org\ALB\img_06989.jpg --> data\fish\train\ALB
data\fish\org\ALB\img_01901.jpg --> data\fish\train\ALB
data\fish\org\ALB\img_02328.jpg --> data\fish\train\ALB
data\fish\org\ALB\img_05973.jpg --> data\fish\train\ALB
data\fish\org\ALB\img_05266.jpg --> data\fish\train\ALB
data\fish\org\ALB\img_01822.jpg --> data\fish\train\ALB
data\fish\org\ALB\img_04907.jpg --> data\fish\train\ALB
data\fish\org\ALB\img_07384.jpg --> data\fish\train\ALB
data\fish\org\ALB\img_00793.jpg --> data\fish\train\ALB
data\fish\org\ALB\img_03593.jpg --> data\fish\train\ALB
data\fish\org\ALB\img_02046.jpg --> data\fish\train\ALB
data\fish\org\ALB\img_04954.jpg --> data\fish\train\ALB
data\fish\org\ALB\img_06158.jpg --> data\fish\train\ALB
data\fish\org\ALB\img_07614.jpg --> data\fish\train\ALB
data\fish\org\ALB\img_052

data\fish\org\ALB\img_05279.jpg --> data\fish\train\ALB
data\fish\org\ALB\img_00617.jpg --> data\fish\train\ALB
data\fish\org\ALB\img_02719.jpg --> data\fish\train\ALB
data\fish\org\ALB\img_01072.jpg --> data\fish\train\ALB
data\fish\org\ALB\img_07463.jpg --> data\fish\train\ALB
data\fish\org\ALB\img_00829.jpg --> data\fish\train\ALB
data\fish\org\ALB\img_04112.jpg --> data\fish\train\ALB
data\fish\org\ALB\img_07917.jpg --> data\fish\train\ALB
data\fish\org\ALB\img_02034.jpg --> data\fish\train\ALB
data\fish\org\ALB\img_01333.jpg --> data\fish\train\ALB
data\fish\org\ALB\img_07549.jpg --> data\fish\train\ALB
data\fish\org\ALB\img_03233.jpg --> data\fish\train\ALB
data\fish\org\ALB\img_05131.jpg --> data\fish\train\ALB
data\fish\org\ALB\img_01334.jpg --> data\fish\train\ALB
data\fish\org\ALB\img_04301.jpg --> data\fish\train\ALB
data\fish\org\ALB\img_02292.jpg --> data\fish\train\ALB
data\fish\org\ALB\img_03575.jpg --> data\fish\train\ALB
data\fish\org\ALB\img_03119.jpg --> data\fish\tr

data\fish\org\ALB\img_04164.jpg --> data\fish\valid\ALB
data\fish\org\ALB\img_00719.jpg --> data\fish\valid\ALB
data\fish\org\ALB\img_06372.jpg --> data\fish\valid\ALB
data\fish\org\ALB\img_07810.jpg --> data\fish\valid\ALB
data\fish\org\ALB\img_03451.jpg --> data\fish\valid\ALB
data\fish\org\ALB\img_05808.jpg --> data\fish\valid\ALB
data\fish\org\ALB\img_01841.jpg --> data\fish\valid\ALB
data\fish\org\ALB\img_01032.jpg --> data\fish\valid\ALB
data\fish\org\ALB\img_05465.jpg --> data\fish\valid\ALB
data\fish\org\ALB\img_01507.jpg --> data\fish\valid\ALB
data\fish\org\ALB\img_02586.jpg --> data\fish\valid\ALB
data\fish\org\ALB\img_03635.jpg --> data\fish\valid\ALB
data\fish\org\ALB\img_03216.jpg --> data\fish\valid\ALB
data\fish\org\ALB\img_05297.jpg --> data\fish\valid\ALB
data\fish\org\ALB\img_01455.jpg --> data\fish\valid\ALB
data\fish\org\ALB\img_07207.jpg --> data\fish\valid\ALB
data\fish\org\ALB\img_07662.jpg --> data\fish\valid\ALB
data\fish\org\ALB\img_06437.jpg --> data\fish\va

data\fish\org\NoF\img_04351.jpg --> data\fish\train\NoF
data\fish\org\NoF\img_02329.jpg --> data\fish\train\NoF
data\fish\org\NoF\img_03933.jpg --> data\fish\train\NoF
data\fish\org\NoF\img_05733.jpg --> data\fish\train\NoF
data\fish\org\NoF\img_00720.jpg --> data\fish\train\NoF
data\fish\org\NoF\img_05984.jpg --> data\fish\train\NoF
data\fish\org\NoF\img_02015.jpg --> data\fish\train\NoF
data\fish\org\NoF\img_03893.jpg --> data\fish\train\NoF
data\fish\org\NoF\img_01892.jpg --> data\fish\train\NoF
data\fish\org\NoF\img_02441.jpg --> data\fish\train\NoF
data\fish\org\NoF\img_05352.jpg --> data\fish\train\NoF
data\fish\org\NoF\img_07758.jpg --> data\fish\train\NoF
data\fish\org\NoF\img_05998.jpg --> data\fish\train\NoF
data\fish\org\NoF\img_05814.jpg --> data\fish\train\NoF
data\fish\org\NoF\img_00849.jpg --> data\fish\train\NoF
data\fish\org\NoF\img_07597.jpg --> data\fish\train\NoF
data\fish\org\NoF\img_05178.jpg --> data\fish\train\NoF
data\fish\org\NoF\img_03506.jpg --> data\fish\tr

data\fish\org\SHARK\img_00096.jpg --> data\fish\train\SHARK
data\fish\org\SHARK\img_00033.jpg --> data\fish\valid\SHARK
data\fish\org\SHARK\img_07533.jpg --> data\fish\valid\SHARK
data\fish\org\SHARK\img_02557.jpg --> data\fish\valid\SHARK
data\fish\org\SHARK\img_03131.jpg --> data\fish\valid\SHARK
data\fish\org\SHARK\img_01986.jpg --> data\fish\valid\SHARK
data\fish\org\SHARK\img_01916.jpg --> data\fish\valid\SHARK
data\fish\org\SHARK\img_01820.jpg --> data\fish\valid\SHARK
data\fish\org\SHARK\img_07080.jpg --> data\fish\valid\SHARK
data\fish\org\SHARK\img_06385.jpg --> data\fish\valid\SHARK
data\fish\org\SHARK\img_05019.jpg --> data\fish\valid\SHARK
data\fish\org\SHARK\img_06000.jpg --> data\fish\valid\SHARK
734 data\fish\org\YFT
data\fish\org\YFT\img_00217.jpg --> data\fish\train\YFT
data\fish\org\YFT\img_00801.jpg --> data\fish\train\YFT
data\fish\org\YFT\img_06846.jpg --> data\fish\train\YFT
data\fish\org\YFT\img_01974.jpg --> data\fish\train\YFT
data\fish\org\YFT\img_06763.jpg --

data\fish\org\YFT\img_05676.jpg --> data\fish\valid\YFT
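
To make the split arithmetic in the cell above easier to check, here is a minimal sketch of the same size calculation in isolation. The helper name `split_sizes` and its `shrink` parameter are hypothetical (they do not appear in the notebook); the weights and the factor of 5.0 are taken from the cell.

```python
def split_sizes(n_images, weights, shrink=5.0):
    """Return how many images of one class folder go to each split.

    Mirrors the notebook's arithmetic: each split receives
    n_images * weight / (sum of weights * shrink), truncated to an integer.
    """
    total = sum(weights.values()) * shrink
    return {name: int(n_images * w / total) for name, w in weights.items()}


# 1719 is the ALB image count printed in the log above.
sizes = split_sizes(1719, {'train': 10, 'valid': 5})
print(sizes)  # {'train': 229, 'valid': 114}
```

With these weights roughly one fifth of each class is used in total, two thirds of it for training and one third for validation; the remaining images are left untouched in the source folder.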


In [96]:
batches = get_batches(path + '/train', batch_size = batch_size)
val_batches = get_batches(path + '/valid', batch_size = batch_size * 2, shuffle = False)

(val_classes, trn_classes, val_labels, trn_labels, val_filenames, filenames, test_filenames) = get_classes(path + '/')

Found 499 images belonging to 8 classes.
Found 247 images belonging to 8 classes.
Found 499 images belonging to 8 classes.
Found 247 images belonging to 8 classes.
Found 0 images belonging to 0 classes.


Sometimes it's helpful to have just the filenames, without the path.

In [97]:
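import os
# os.path.basename uses the platform's separators, so it also strips 'ALB\...' prefixes on Windows
raw_filenames = [os.path.basename(f) for f in filenames]
raw_test_filenames = [os.path.basename(f) for f in test_filenames]
raw_val_filenames = [os.path.basename(f) for f in val_filenames]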
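
Filenames reported by the generator may use either '/' or '\' as the separator depending on the platform (the copy log above shows backslashes). As a hedged illustration, the sketch below strips the directory part regardless of separator. The helper name `strip_dir` is hypothetical and the example filenames are only illustrative.

```python
from pathlib import PureWindowsPath


def strip_dir(filename):
    """Return the last path component, accepting '/' and '\\' separators."""
    # PureWindowsPath treats both '/' and '\\' as separators on any OS.
    return PureWindowsPath(filename).name


print(strip_dir('ALB/img_04164.jpg'))   # img_04164.jpg
print(strip_dir('ALB\\img_04164.jpg'))  # img_04164.jpg
```

One caveat of this design: on POSIX systems a backslash is a legal character inside a filename, so the sketch assumes no image name contains one.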

## VGG approach

We start with our usual VGG approach, this time using VGG with batch normalization. We explained how to add batch normalization to VGG in the [imagenet batch notebook]. VGG with batch normalization is implemented in [vgg16bn.py], and there is a batch norm version of vgg_ft (our fine-tuning function) called vgg_ft_bn in [utils.py].

### Initial model

First we create a simple fine-tuned VGG model to be our starting point.

In [114]:
from vgg16bn import Vgg16BN
model = vgg_ft_bn(8)

In [99]:
trn = get_data(path + '/train')
val = get_data(path + '/valid')

Found 499 images belonging to 8 classes.
Found 247 images belonging to 8 classes.


In [107]:
imp.reload(utils)
from utils import *
test = get_data(path + '/test')

Found 0 images belonging to 0 classes.


In [108]:
save_array(path + '/results/trn.dat', trn)
save_array(path + '/results/val.dat', val)

In [109]:
save_array(path + '/results/test.dat', test)

In [110]:
trn = load_array(path + '/results/trn.dat')
val = load_array(path + '/results/val.dat')

In [111]:
test = load_array(path + '/results/test.dat')

In [112]:
gen = image.ImageDataGenerator()

In [116]:
model.compile(optimizer = Adam(1e-3), loss = 'categorical_crossentropy', metrics = ['accuracy'])

In [None]:
model.fit(trn, trn_labels, batch_size = batch_size, epochs = 3, validation_data = (val, val_labels))

Train on 499 samples, validate on 247 samples
Epoch 1/3


In [118]:
model.save_weights(path + '/results/ft1.h5')

### Precompute convolutional output

We pre-compute the output of the last convolution layer of VGG, since we're unlikely to need to fine-tune those layers. (All following analysis will be done on just the pre-computed convolutional features.)

In [120]:
model.load_weights(path + '/results/ft1.h5')

In [123]:
conv_layers, fc_layers = split_at(model, Convolution2D)

In [124]:
conv_model = Sequential(conv_layers)
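
The `split_at` helper used above comes from utils.py, and its source is not shown here. The following is only a rough sketch of the idea, assuming the helper cuts the layer list just after the last layer of the given type. The classes `Conv`, `Pool` and `Dense` are hypothetical stand-ins for Keras layers, and `split_at_last` is a made-up name.

```python
class Conv:   # stand-in for a convolutional layer
    pass


class Pool:   # stand-in for a pooling layer
    pass


class Dense:  # stand-in for a fully connected layer
    pass


def split_at_last(layers, layer_type):
    """Split a list of layers just after the last instance of layer_type."""
    idx = max(i for i, layer in enumerate(layers) if isinstance(layer, layer_type))
    return layers[:idx + 1], layers[idx + 1:]


layers = [Conv(), Pool(), Conv(), Pool(), Dense(), Dense()]
conv_part, fc_part = split_at_last(layers, Conv)
print(len(conv_part), len(fc_part))  # 3 3
```

Under this assumption, any pooling layer that follows the last convolution ends up in the second half, so the precomputed features are the raw output of the final convolution.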

In [None]:
conv_feat = conv_model.predict(trn)
conv_val_feat = conv_model.predict(val)

In [None]:
conv_test_feat = conv_model.predict(test)

In [None]:
save_array(path + '/results/conv_feat.dat', conv_feat)
save_array(path + '/results/conv_val_feat.dat', conv_val_feat)

In [None]:
save_array(path + '/results/conv_test_feat.dat', conv_test_feat)

In [None]:
conv_feat = load_array(path + '/results/conv_feat.dat')
conv_val_feat = load_array(path + '/results/conv_val_feat.dat')

In [None]:
conv_test_feat = load_array(path + '/results/conv_test_feat.dat')

In [None]:
conv_val_feat.shape

### Train model

We can now create our first baseline model - a simple 3-layer FC net.