<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc" style="margin-top: 1em;"><ul class="toc-item"><li><span><a href="#Download-+-Preprocess-Data" data-toc-modified-id="Download-+-Preprocess-Data-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Download + Preprocess Data</a></span><ul class="toc-item"><li><span><a href="#Download" data-toc-modified-id="Download-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Download</a></span></li><li><span><a href="#Preprocess" data-toc-modified-id="Preprocess-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Preprocess</a></span></li></ul></li><li><span><a href="#Models" data-toc-modified-id="Models-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Models</a></span><ul class="toc-item"><li><span><a href="#~Inception-v4" data-toc-modified-id="~Inception-v4-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>~Inception v4</a></span></li><li><span><a href="#VGG" data-toc-modified-id="VGG-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>VGG</a></span></li></ul></li><li><span><a href="#Model-Comparison" data-toc-modified-id="Model-Comparison-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Model Comparison</a></span></li></ul></div>

From [this class][class] (worth reading more of their notes later):

Design principles
* Reduce filter sizes (except possibly at the lowest layer), factorize filters aggressively
* Use 1x1 convolutions to reduce and expand the number of feature maps judiciously
* Use skip connections and/or create multiple paths through the network 
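The first three principles are easier to see with a parameter count (weights plus biases) for a few equivalent ways of building a convolutional block. This is a back-of-the-envelope sketch: `conv_params` is a hypothetical helper, and the channel count of 256 is an arbitrary choice, not taken from any model in this notebook.

```python
def conv_params(k_h, k_w, c_in, c_out, bias=True):
    """Trainable parameters in a k_h x k_w convolution mapping c_in to c_out channels."""
    return k_h * k_w * c_in * c_out + (c_out if bias else 0)


c = 256  # illustrative channel count (assumption)

# A single 5x5 conv vs. two stacked 3x3 convs (both cover a 5x5 receptive field)
single_5x5 = conv_params(5, 5, c, c)
two_3x3 = 2 * conv_params(3, 3, c, c)

# Factorizing a 3x3 conv into a 1x3 conv followed by a 3x1 conv
one_3x3 = conv_params(3, 3, c, c)
asym_1x3_3x1 = conv_params(1, 3, c, c) + conv_params(3, 1, c, c)

# 1x1 bottleneck: reduce to c // 4 maps, apply a 3x3 conv, expand back to c maps
bottleneck = (conv_params(1, 1, c, c // 4)
              + conv_params(3, 3, c // 4, c // 4)
              + conv_params(1, 1, c // 4, c))

print(single_5x5, two_3x3)    # 1638656 1180160
print(one_3x3, asym_1x3_3x1)  # 590080 393728
print(bottleneck)             # 70016
```

The stacked and factorized versions also add extra nonlinearities between the smaller filters, which is part of why they are preferred beyond the parameter savings.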

What else?
* Training tricks and details: initialization, regularization, normalization
* Training data augmentation
* Averaging classifier outputs over multiple crops/flips
* Ensembles of networks
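The last two items both amount to averaging predicted class probabilities. A minimal sketch, assuming NHWC image batches (as used in this notebook) and any callable that maps a batch to an `(n, n_classes)` probability matrix, such as `cnn.predict_proba`. `predict_with_flips`, `ensemble_proba`, and the toy model are hypothetical names. Only horizontal flips are shown; multiple crops follow the same pattern but need a model that accepts the cropped size (or resizing back).

```python
import numpy as np


def predict_with_flips(predict_proba, images):
    """Average class probabilities over the original images and their horizontal mirrors."""
    flipped = images[:, :, ::-1, :]  # NHWC layout: reverse the width axis
    return (predict_proba(images) + predict_proba(flipped)) / 2


def ensemble_proba(predict_probas, images):
    """Average class probabilities from several models (each a callable images -> probs)."""
    return np.mean([p(images) for p in predict_probas], axis=0)


def toy_predict_proba(x):
    # Hypothetical stand-in for a model: "probability" of class 0 is the mean of the left half
    left = x[:, :, : x.shape[2] // 2, :].mean(axis=(1, 2, 3))
    return np.stack([left, 1 - left], axis=1)


imgs = np.zeros((1, 4, 4, 3))
imgs[:, :, :2, :] = 1.0  # left half bright, right half dark
print(toy_predict_proba(imgs))                      # [[1. 0.]]
print(predict_with_flips(toy_predict_proba, imgs))  # [[0.5 0.5]]
```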

[class]: http://slazebni.cs.illinois.edu/spring17/

Submit results [here][submission] with team code: **KknPS9LrSKwM2cFXe9T2**

See the leaderboard [here][lb].

[submission]: http://miniplaces.csail.mit.edu/submit.php
[lb]: http://miniplaces.csail.mit.edu/leaderboard.php

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
from tqdm import tnrange
import time
import os
import sys
sys.path = ['/scratch/nhunt/cv_parker/scripts'] + sys.path
from utils import tf_init, get_next_run_num, load_data, output_file
from layers import ConvLayer, MaxPoolLayer, AvgPoolLayer, BranchedLayer, MergeLayer, LayerModule, FlattenLayer, DenseLayer, GlobalAvgPoolLayer, DropoutLayer, GlobalMaxPoolLayer
from models import CNN, BaseNN

%matplotlib inline
config = tf_init()



In [2]:
train_inputs, train_labels, val_inputs, val_labels, test_inputs = load_data('miniplaces')
n_classes = len(np.unique(train_labels))

# Download + Preprocess Data

After downloading everything, we end up with this folder structure:

development_kit
 * README
 * ...

data
 * labels
     * categories.txt
     * object_categories.txt
     * train.txt
     * val.txt
 * images
     * train
         * a
             * abbey
             * airport_terminal
             * ...
         * b
         * ...
     * val
     * test
 * objects
     * train
         * a
             * abbey
             * airport_terminal
             * ...
         * b
         * ...
     * val

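The train images are stored in directories that correspond to their scene labels. The val and test images are stored directly in their split's directory, with no subdirectories. The labels for the val images (and for the train images, for easier access) are in `data/labels/val.txt` and `data/labels/train.txt`; the download step moves them there from `development_kit/data`. There are no labels for the test images.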
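The `pd.read_csv(..., sep=' ', header=None)` call in `save_imgs_together` assumes each label file has one `<relative image path> <integer label>` pair per line. A minimal sketch of parsing that format, using two made-up lines rather than the real files:

```python
import io

import pandas as pd

# Two made-up lines in the assumed "<relative path> <label>" format
sample = io.StringIO(
    "train/a/abbey/00000001.jpg 0\n"
    "train/a/airport_terminal/00000001.jpg 1\n"
)

label_df = pd.read_csv(sample, sep=' ', header=None, names=['fname', 'label'])
print(label_df['fname'].tolist())
print(label_df['label'].tolist())  # [0, 1]
```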

All of the images are .jpg files. They have been resized to 128x128 to make the challenge computationally easier, though reaching the same accuracy as on full-resolution images may be harder.

The object annotations are separate files that record the name of the image they correspond to, where that image is (which folder), and, for each object in the image, a bounding polygon (as a series of points) and the object's class. There are 3502 train images and 371 validation images with object annotations.
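Once an annotation's polygon has been parsed into a list of (x, y) points, an axis-aligned bounding box and the polygon's area follow directly. This sketch does not parse the annotation files themselves (see the README for their layout); `polygon_bbox` and `polygon_area` are hypothetical helpers and the points are made up.

```python
import numpy as np


def polygon_bbox(points):
    """Axis-aligned bounding box (x_min, y_min, x_max, y_max) of a polygon."""
    pts = np.asarray(points, dtype=float)
    x_min, y_min = pts.min(axis=0)
    x_max, y_max = pts.max(axis=0)
    return float(x_min), float(y_min), float(x_max), float(y_max)


def polygon_area(points):
    """Polygon area via the shoelace formula (vertices in order, not self-intersecting)."""
    pts = np.asarray(points, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    return float(0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1))))


# A made-up right triangle inside a 128x128 image
triangle = [(10, 10), (50, 10), (10, 40)]
print(polygon_bbox(triangle))  # (10.0, 10.0, 50.0, 40.0)
print(polygon_area(triangle))  # 600.0
```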

**Read the README to get a better idea of the data before continuing!**

## Download

In [3]:
%%bash
wget -q http://6.869.csail.mit.edu/fa17/miniplaces/development_kit.tar.gz
tar -xzf development_kit.tar.gz
rm development_kit.tar.gz

mkdir -p data/labels
mv development_kit/data/* data/labels
rm -r development_kit/data

cd data
wget -q http://6.869.csail.mit.edu/fa17/miniplaces/data.tar.gz
tar -xzf data.tar.gz
rm data.tar.gz

In [4]:
img = plt.imread('data/images/train/a/abbey/00000001.jpg')
plt.imshow(img)
plt.axis('off');

## Preprocess
* Stack each split's images (train/val/test) into a single array for easy loading (the dataset is small enough that we can load all of them at once)
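A quick estimate of how much memory the stacked arrays take. The split sizes below (100,000 train, 10,000 val, 10,000 test images) are assumptions from the MiniPlaces challenge description, not values read from this notebook; check them against the README or the saved arrays. Note that `imgs / 255` in `save_imgs_together` produces float64, eight times the size of the original uint8 data. `array_nbytes` is a hypothetical helper.

```python
import numpy as np


def array_nbytes(n_images, height=128, width=128, channels=3, dtype=np.uint8):
    """Bytes needed to hold n_images stacked into one (n, h, w, c) array of the given dtype."""
    return n_images * height * width * channels * np.dtype(dtype).itemsize


# Assumed split sizes; verify against the README / the actual arrays
split_sizes = {'train': 100_000, 'val': 10_000, 'test': 10_000}

for split, n in split_sizes.items():
    for dtype in (np.uint8, np.float32, np.float64):
        gb = array_nbytes(n, dtype=dtype) / 1e9
        print(f"{split:>5} {np.dtype(dtype).name:>7}: {gb:6.2f} GB")

# Dividing a uint8 array by a Python int promotes it to float64
print((np.zeros((2, 2), dtype=np.uint8) / 255).dtype)  # float64
```

Under these assumptions the train split is about 4.9 GB as uint8 but about 39 GB as float64, so keeping the saved arrays as uint8 and scaling batches on the fly is the cheaper option.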

In [5]:
import glob

def save_imgs_together(split, convert_to_float=False, image_path='data/images', labels_path='data/labels'):
    """
    Reads in all of the images from this split and saves them into a single numpy array.
    This should make the training easier and more efficient.
    :param split: one of train, val, or test; which split of the data to process
                  if split isn't test, the labels will also be saved into a numpy array
    :param convert_to_float: if true, the image array is divided by 255 to convert the data to floats in [0, 1]
    :param image_path: path to the directory holding the split directories (train, val, test)
    :param labels_path: path to the labels data (e.g. train.txt, val.txt)
    """

    # Sort in Python rather than with the shell's `sort`, whose ordering can depend on the locale
    img_fnames = sorted(glob.glob(os.path.join(image_path, split, '**', '*.jpg'), recursive=True))
    imgs = np.array([plt.imread(img_fname) for img_fname in img_fnames])

    if convert_to_float:
        imgs = imgs / 255  # note: this produces float64

    np.save('{}/{}.npy'.format(image_path, split), imgs)

    if split != 'test':  # no labels for test
        label_df = pd.read_csv('{}/{}.txt'.format(labels_path, split), sep=' ', header=None, names=['fname', 'label'])
        # Sort by relative path (e.g. train/a/abbey/00000001.jpg) so the labels line up
        # with the sorted image filenames, which share the same relative paths
        labels = label_df.sort_values('fname')['label'].values

        assert len(labels) == len(imgs)

        np.save('{}/{}_labels.npy'.format(image_path, split), labels.astype(np.int32))

In [6]:
for split in ['train', 'val', 'test']:
    save_imgs_together(split)
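`load_data` comes from the local `utils` module, which isn't shown here. A minimal stand-in that reads back the files written by `save_imgs_together` might look like the sketch below; `load_split` is a hypothetical name, and this is not necessarily what `load_data` does.

```python
import os

import numpy as np


def load_split(split, image_path='data/images'):
    """Load the stacked images (and labels, except for test) saved by save_imgs_together."""
    imgs = np.load(os.path.join(image_path, f'{split}.npy'))
    if split == 'test':
        return imgs, None  # no labels are saved for the test split
    labels = np.load(os.path.join(image_path, f'{split}_labels.npy'))
    return imgs, labels


# Usage (assumes the preprocessing cell has already been run):
# train_inputs, train_labels = load_split('train')
# val_inputs, val_labels = load_split('val')
# test_inputs, _ = load_split('test')
```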

# Models

## ~Inception v4

In [7]:
# tf.layers.separable_conv2d
# AlexNet-style layer stack built directly with the TF 1.x tf.layers API
graph = tf.Graph()
with graph.as_default():
    labels = tf.placeholder(tf.int32, shape=(None,))
    img = tf.placeholder(tf.float32, (None, 128, 128, 3))
    # Dropout is only applied when this is fed as True (i.e. during training)
    is_training = tf.placeholder_with_default(False, shape=())

    layers = [
        tf.layers.Conv2D(96, 11, 4, padding='SAME', activation=tf.nn.relu), # image size reduces to 32 * 32
        tf.layers.MaxPooling2D(3, 2, padding='SAME'), # image size reduces to 16 * 16
        tf.layers.Conv2D(256, 5, padding='SAME', activation=tf.nn.relu),
        tf.layers.MaxPooling2D(3, 2, padding='SAME'), # image size reduces to 8 * 8
        tf.layers.Conv2D(384, 3, padding='SAME', activation=tf.nn.relu),
        tf.layers.Conv2D(384, 3, padding='SAME', activation=tf.nn.relu),
        tf.layers.Conv2D(256, 3, padding='SAME', activation=tf.nn.relu),
        tf.layers.MaxPooling2D(3, 2, padding='SAME'), # image size reduces to 4 * 4
        tf.layers.Flatten(),
        tf.layers.Dropout(0.5),
        tf.layers.Dense(4096, activation=tf.nn.relu),
        tf.layers.Dropout(0.5),
        tf.layers.Dense(4096, activation=tf.nn.relu)
    ]

    hidden = img
    for layer in layers:
        if isinstance(layer, tf.layers.Dropout):
            hidden = layer(hidden, training=is_training)  # otherwise dropout is a no-op
        else:
            hidden = layer(hidden)

    logits = tf.layers.Dense(n_classes, activation=None)(hidden)
    preds = tf.nn.softmax(logits)
    loss_op = tf.losses.sparse_softmax_cross_entropy(labels, logits)
    train_op = tf.train.AdagradOptimizer(.001).minimize(loss_op)

    # tf.metrics.accuracy returns (value, update_op); running update_op accumulates over batches
    _, acc_op = tf.metrics.accuracy(labels, tf.argmax(preds, axis=1))

    global_init = tf.global_variables_initializer()
    local_init = tf.local_variables_initializer()

batch_size = 128
n_epochs = 20

train_idx = list(range(len(train_labels)))
val_idx = list(range(len(val_labels)))

sess = tf.Session(config=config, graph=graph)
sess.run(global_init)

for epoch in range(n_epochs):
    np.random.shuffle(train_idx)

    sess.run(local_init)  # reset the streaming accuracy counters
    train_loss = []
    for batch in range(int(np.ceil(len(train_labels) / batch_size))):
        batch_idx = train_idx[batch * batch_size : (batch + 1) * batch_size]
        loss, train_acc, _ = sess.run([loss_op, acc_op, train_op],
                                      {img: train_inputs[batch_idx], labels: train_labels[batch_idx], is_training: True})
        train_loss.append(loss)

    sess.run(local_init)
    val_loss = []
    for batch in range(int(np.ceil(len(val_labels) / batch_size))):
        batch_idx = val_idx[batch * batch_size : (batch + 1) * batch_size]
        loss, val_acc = sess.run([loss_op, acc_op], {img: val_inputs[batch_idx], labels: val_labels[batch_idx]})
        val_loss.append(loss)

    print(f"Epoch {epoch}. Train Loss: {np.mean(train_loss):.3f}; Val Loss: {np.mean(val_loss):.3f}. Train Acc: {train_acc:.3f}; Val Acc: {val_acc:.3f}")

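With `padding='SAME'`, a conv or pooling layer with stride s maps a spatial size n to ceil(n / s), which is where the size comments in the layer list come from. The sketch below replays that arithmetic for the stack above and counts the parameters of the first `Dense(4096)` layer; `same_out` is a hypothetical helper.

```python
import math


def same_out(n, stride):
    """Spatial output size of a conv/pool layer with padding='SAME'."""
    return math.ceil(n / stride)


# (name, stride, output channels or None for pooling) for the layer stack above
stack = [
    ('conv 11x11/4', 4, 96),
    ('maxpool 3x3/2', 2, None),
    ('conv 5x5/1', 1, 256),
    ('maxpool 3x3/2', 2, None),
    ('conv 3x3/1', 1, 384),
    ('conv 3x3/1', 1, 384),
    ('conv 3x3/1', 1, 256),
    ('maxpool 3x3/2', 2, None),
]

size, channels = 128, 3
for name, stride, out_ch in stack:
    size = same_out(size, stride)
    channels = out_ch if out_ch is not None else channels
    print(f'{name:>14}: {size}x{size}x{channels}')

flat = size * size * channels       # 4 * 4 * 256 = 4096
dense1_params = flat * 4096 + 4096  # weights + biases of the first Dense(4096)
print(flat, dense1_params)          # 4096 16781312
```

Most of this network's parameters sit in the two 4096-unit dense layers, which is one motivation for the global-pooling heads (`GlobalAvgPoolLayer`) imported at the top.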

In [None]:
layers = [
    # Image initial size 128 * 128
    ConvLayer(96, 11, 4), # image size reduces to 32 * 32
    MaxPoolLayer(3,2), # image size reduces to 16 * 16
    ConvLayer(256, 5),
    MaxPoolLayer(3,2), # image size reduces to 8 * 8
    ConvLayer(384, 3),
    ConvLayer(384, 3),
    ConvLayer(256, 3),
    MaxPoolLayer(3,2), # image size reduces to 4 * 4
    FlattenLayer(),
    DropoutLayer(0.5),
    DenseLayer(4096),
    DropoutLayer(0.5),
    DenseLayer(4096)
]

cnn = CNN(layers, models_dir="/scratch/nhunt/cv_parker/miniplaces/models", n_classes=n_classes)

# verbose: 0 prints nothing, 1 shows a terminal progress bar, 2 shows a notebook progress bar
cnn.train(train_inputs[:1000], train_labels[:1000], val_inputs[:1000], val_labels[:1000], verbose=2, n_epochs=20)  # max_patience=20
# cnn.train(train_inputs, train_labels, val_inputs, val_labels, verbose=2, n_epochs=1)

In [None]:
cnn.score(val_inputs, val_labels)

In [None]:
cnn = CNN(run_num=52)

In [None]:
cnn.score(val_inputs, val_labels)

In [None]:
# cnn.score(val_inputs, val_labels)

In [None]:
preds = cnn.predict_proba(test_inputs)
output_file(preds)
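`output_file` lives in the local `utils` module and isn't shown. Since the results are compared on Acc@5 as well as Acc@1, a submission presumably lists the five most probable categories per test image; the sketch below shows one way to pull those out of a `predict_proba` matrix with NumPy. The one-line-per-image `<filename> c1 c2 c3 c4 c5` layout and the `test/%08d.jpg` names are assumptions to check against the README; `top5_lines` is a hypothetical helper.

```python
import numpy as np


def top5_lines(probs, fnames):
    """Format one '<fname> c1 c2 c3 c4 c5' line per image, most probable class first."""
    # argsort is ascending: keep the last 5 columns, then reverse so the best class comes first
    top5 = np.argsort(probs, axis=1)[:, -5:][:, ::-1]
    return [f"{fname} {' '.join(map(str, row))}" for fname, row in zip(fnames, top5)]


# Made-up probabilities for 2 images over 6 classes
probs = np.array([
    [0.03, 0.40, 0.10, 0.20, 0.15, 0.12],
    [0.32, 0.02, 0.08, 0.25, 0.05, 0.28],
])
fnames = ['test/%08d.jpg' % (i + 1) for i in range(len(probs))]
for line in top5_lines(probs, fnames):
    print(line)
# test/00000001.jpg 1 3 4 5 2
# test/00000002.jpg 0 5 3 2 4
```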

In [None]:
# prelu()  # not defined anywhere in this notebook; running it raises a NameError
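The `prelu` cell presumably refers to PReLU (parametric ReLU), which is the identity for positive inputs but scales negative inputs by a learned slope `a` (one per channel in the original formulation): f(x) = max(0, x) + a * min(0, x). A NumPy sketch of the forward pass only, not wired into the `layers` module; `prelu_forward` is a hypothetical name.

```python
import numpy as np


def prelu_forward(x, a):
    """PReLU forward pass: identity for x > 0, slope `a` for x <= 0.

    `a` broadcasts against x, e.g. shape (C,) for one slope per channel of an NHWC tensor.
    """
    return np.maximum(0, x) + a * np.minimum(0, x)


x = np.array([[-2.0, -0.5, 0.0, 1.5]])
print(prelu_forward(x, 0.25).tolist())  # [[-0.5, -0.125, 0.0, 1.5]]
```

With `a = 0` this reduces to ReLU, and with a small fixed `a` it is a leaky ReLU; PReLU differs only in learning `a` during training.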

## VGG

In [None]:
from models import PretrainedCNN  # assumed to live in the local models module alongside CNN

cnn = PretrainedCNN(n_classes=n_classes, dense_nodes=(1024, 1024), batch_size=64, config=config, cnn_module='vgg16',
                    pretrained_weights=False)
cnn.train(train_inputs, train_labels, val_inputs, val_labels, in_notebook=True)
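The `cnn_module='vgg16'` option presumably builds the standard VGG16 convolutional stack: 13 3x3 convolutions in five blocks, each block ending in a 2x2 max pool. Assuming that, the sketch below works out the conv-stack parameter count and the feature-map size handed to the dense layers for 128x128 inputs. The config list is the published VGG16 layout, not read from `PretrainedCNN`; `vgg_conv_stack` is a hypothetical helper.

```python
# VGG16 conv configuration: numbers are 3x3 conv output channels, 'M' is a 2x2/2 max pool
VGG16_CFG = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
             512, 512, 512, 'M', 512, 512, 512, 'M']


def vgg_conv_stack(cfg, size=128, in_channels=3):
    """Return (conv parameter count, output height/width, output channels) for a VGG config."""
    params, channels = 0, in_channels
    for v in cfg:
        if v == 'M':
            size //= 2  # 2x2 pool with stride 2 halves the spatial size
        else:
            params += 3 * 3 * channels * v + v  # 3x3 weights + biases; 'same' padding keeps size
            channels = v
    return params, size, channels


params, size, channels = vgg_conv_stack(VGG16_CFG)
print(params)                                  # 14714688
print(size, channels, size * size * channels)  # 4 512 8192
```

If the conv output is flattened rather than globally pooled, the first of the `dense_nodes=(1024, 1024)` layers would see 8192 input features.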

In [None]:
# cnn = PretrainedCNN(run_num=20)

In [None]:
cnn.score(val_inputs, val_labels)

In [None]:
preds = cnn.predict_proba(test_inputs)  # the submission file is for the test split
output_file(preds)

# Model Comparison

Adding batch norm: [0.23039998, 0.5126999]

Submitted Models

| Model | Acc@1 | Acc@5 | Val Acc@1 | Val Acc@5 | Notes | Run # |
|-------|-------|-------|-----------|-----------|-------|-------|
|  | 0.2429 | 0.4776 | 0.2509 | 0.5397 | | ??50-ish |
| | 0.2691 | 0.5224 | 0.2900 | 0.5844 | batch norm and l2=.001 | 58 |
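Acc@1 and Acc@5 are top-k accuracies: the fraction of images whose true label is among the k highest-scoring classes. A NumPy sketch (`top_k_accuracy` is a hypothetical helper, not something from the `models` module):

```python
import numpy as np


def top_k_accuracy(probs, labels, k=5):
    """Fraction of rows of `probs` whose true label is among the k largest scores."""
    top_k = np.argsort(probs, axis=1)[:, -k:]
    return float(np.mean(np.any(top_k == np.asarray(labels)[:, None], axis=1)))


probs = np.array([
    [0.10, 0.60, 0.30],  # true label 1 is the top class
    [0.50, 0.20, 0.30],  # true label 2 is second
    [0.70, 0.25, 0.05],  # true label 2 is last
])
labels = [1, 2, 2]
print(top_k_accuracy(probs, labels, k=1))  # 0.3333333333333333
print(top_k_accuracy(probs, labels, k=2))  # 0.6666666666666666
```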

In [None]:
log = pd.read_hdf('models/log.h5', key='default').sort_values('dev_loss')
log.head()