# Custom cifar-100 conv net with Caffe in Python (Pycaffe)

Here, I train a custom convnet on the cifar-100 dataset. I try out a new convolutional neural network architecture, loosely based on the NIN (Network In Network) architecture detailed in this paper: http://arxiv.org/pdf/1312.4400v3.pdf. 

I mainly use convolution layers, cccp layers, pooling layers, dropout, fully connected layers and relu layers, as well as sigmoid layers and softmax with loss on top of the neural network. 
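
The "cccp" layers (cascaded cross-channel parametric pooling, the name used by NIN) are ordinary convolutions with a 1x1 kernel: at every pixel, the same small linear map is applied across the channels. The NumPy sketch below illustrates this with random weights. The helper name `conv1x1` is hypothetical, and the sizes are only borrowed from the `cccp1a` layer defined later in this notebook.

```python
import numpy as np

def conv1x1(x, weights, bias):
    """Apply a 1x1 convolution to a (channels, height, width) array.

    weights has shape (out_channels, in_channels) and bias has shape
    (out_channels,). Each output pixel is a linear map of the input
    channels at that same pixel.
    """
    out = np.einsum('oc,chw->ohw', weights, x)
    return out + bias[:, None, None]

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 29, 29))   # e.g. the output of a 64-channel conv layer
w = rng.standard_normal((42, 64))       # 42 output channels, as in cccp1a
b = rng.standard_normal(42)

y = conv1x1(x, w, b)
print(y.shape)  # (42, 29, 29)
```

The spatial size is unchanged; only the number of channels changes, which is why these layers are cheap compared with the larger convolutions.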

My code, other than the neural network architecture, is inspired by the official caffe python ".ipynb" examples available at: https://github.com/BVLC/caffe/tree/master/examples.

Please refer to https://www.cs.toronto.edu/~kriz/cifar.html for more information on the nature of the task and on the dataset on which the convolutional neural network is trained.

## Dynamically download and convert the cifar-100 dataset to Caffe's HDF5 format using code from another git repo of mine.
More info on the dataset can be found at http://www.cs.toronto.edu/~kriz/cifar.html.

In [1]:
%%time

!rm download-and-convert-cifar-100.py
print("Getting the download script...")
!wget https://raw.githubusercontent.com/guillaume-chevalier/caffe-cifar-10-and-cifar-100-datasets-preprocessed-to-HDF5/master/download-and-convert-cifar-100.py
print("Downloaded script. Will execute to download and convert the cifar-100 dataset:")
!python download-and-convert-cifar-100.py

rm: cannot remove ‘download-and-convert-cifar-100.py’: No such file or directory
Getting the download script...
wget: /root/anaconda2/lib/libcrypto.so.1.0.0: no version information available (required by wget)
wget: /root/anaconda2/lib/libssl.so.1.0.0: no version information available (required by wget)
--2015-12-30 23:48:28--  https://raw.githubusercontent.com/guillaume-chevalier/caffe-cifar-10-and-cifar-100-datasets-preprocessed-to-HDF5/master/download-and-convert-cifar-100.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 23.235.39.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|23.235.39.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3526 (3.4K) [text/plain]
Saving to: ‘download-and-convert-cifar-100.py’


2015-12-30 23:48:28 (1.25 GB/s) - ‘download-and-convert-cifar-100.py’ saved [3526/3526]

Downloaded script. Will execute to download and convert the cifar-100 dataset:

Downloading...
wget: /root/anaconda2

## Build the model with Caffe. 

In [2]:
import numpy as np

import caffe
from caffe import layers as L
from caffe import params as P

In [3]:
def cnn(hdf5, batch_size):
    n = caffe.NetSpec()
    n.data, n.label_coarse, n.label_fine = L.HDF5Data(batch_size=batch_size, source=hdf5, ntop=3)
    
    n.conv1 = L.Convolution(n.data, kernel_size=4, num_output=64, weight_filler=dict(type='xavier'))
    n.cccp1a = L.Convolution(n.conv1, kernel_size=1, num_output=42, weight_filler=dict(type='xavier'))
    n.relu1a = L.ReLU(n.cccp1a, in_place=True)
    n.cccp1b = L.Convolution(n.relu1a, kernel_size=1, num_output=32, weight_filler=dict(type='xavier'))
    n.pool1 = L.Pooling(n.cccp1b, kernel_size=3, stride=2, pool=P.Pooling.MAX)
    n.drop1 = L.Dropout(n.pool1, in_place=True)
    n.relu1b = L.ReLU(n.drop1, in_place=True)
    
    n.conv2 = L.Convolution(n.relu1b, kernel_size=4, num_output=42, weight_filler=dict(type='xavier'))
    n.pool2 = L.Pooling(n.conv2, kernel_size=3, stride=2, pool=P.Pooling.MAX)
    n.drop2 = L.Dropout(n.pool2, in_place=True)
    n.relu2 = L.ReLU(n.drop2, in_place=True)
    
    n.conv3 = L.Convolution(n.relu2, kernel_size=2, num_output=64, weight_filler=dict(type='xavier'))
    n.pool3 = L.Pooling(n.conv3, kernel_size=2, stride=2, pool=P.Pooling.AVE)
    n.relu3 = L.ReLU(n.pool3, in_place=True)
    
    n.ip1 = L.InnerProduct(n.relu3, num_output=768, weight_filler=dict(type='xavier'))
    n.sig1 = L.Sigmoid(n.ip1, in_place=True)
    
    n.ip_c = L.InnerProduct(n.sig1, num_output=20, weight_filler=dict(type='xavier'))
    n.accuracy_c = L.Accuracy(n.ip_c, n.label_coarse)
    n.loss_c = L.SoftmaxWithLoss(n.ip_c, n.label_coarse)
    
    n.ip_f = L.InnerProduct(n.sig1, num_output=100, weight_filler=dict(type='xavier'))
    n.accuracy_f = L.Accuracy(n.ip_f, n.label_fine)
    n.loss_f = L.SoftmaxWithLoss(n.ip_f, n.label_fine)
    
    return n.to_proto()
    
with open('cnn_train.prototxt', 'w') as f:
    f.write(str(cnn('cifar_100_caffe_hdf5/train.txt', 100)))
    
with open('cnn_test.prototxt', 'w') as f:
    f.write(str(cnn('cifar_100_caffe_hdf5/test.txt', 120)))
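
Both `SoftmaxWithLoss` layers sit on the same shared features, and Caffe adds up the loss layers of a net (each with a default weight of 1) to form the training objective. As a rough sanity check, the plain-Python sketch below computes the softmax cross-entropy for a single example. The function name `softmax_cross_entropy` is illustrative and not part of Caffe, and the all-zero scores are an assumption standing in for an untrained net.

```python
import math

def softmax_cross_entropy(scores, label):
    """Return -log(softmax(scores)[label]) for one example."""
    m = max(scores)                       # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum - scores[label]

# Untrained net (assumption): all class scores equal, so the prediction is uniform.
loss_coarse = softmax_cross_entropy([0.0] * 20, label=3)
loss_fine = softmax_cross_entropy([0.0] * 100, label=42)
total_loss = loss_coarse + loss_fine      # both heads contribute to the objective

print(round(loss_coarse, 4), round(loss_fine, 4), round(total_loss, 4))
# 2.9957 4.6052 7.6009
```

If the first losses reported during training are far from ln(20) and ln(100), that usually points to a problem with the labels or the initialisation.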

## Load and visualise the untrained network's internal structure and shape
Caffe's network structure (graph) visualisation tool is broken in the current release, so we simply print the data shapes here. 

In [4]:
caffe.set_mode_gpu()
solver = caffe.get_solver('cnn_solver_rms.prototxt')

In [5]:
print("Layers' features:")
[(k, v.data.shape) for k, v in solver.net.blobs.items()]

Layers' features:


[('data', (100, 3, 32, 32)),
 ('label_coarse', (100,)),
 ('label_fine', (100,)),
 ('label_coarse_data_1_split_0', (100,)),
 ('label_coarse_data_1_split_1', (100,)),
 ('label_fine_data_2_split_0', (100,)),
 ('label_fine_data_2_split_1', (100,)),
 ('conv1', (100, 64, 29, 29)),
 ('cccp1a', (100, 42, 29, 29)),
 ('cccp1b', (100, 32, 29, 29)),
 ('pool1', (100, 32, 14, 14)),
 ('conv2', (100, 42, 11, 11)),
 ('pool2', (100, 42, 5, 5)),
 ('conv3', (100, 64, 4, 4)),
 ('pool3', (100, 64, 2, 2)),
 ('ip1', (100, 768)),
 ('ip1_sig1_0_split_0', (100, 768)),
 ('ip1_sig1_0_split_1', (100, 768)),
 ('ip_c', (100, 20)),
 ('ip_c_ip_c_0_split_0', (100, 20)),
 ('ip_c_ip_c_0_split_1', (100, 20)),
 ('accuracy_c', ()),
 ('loss_c', ()),
 ('ip_f', (100, 100)),
 ('ip_f_ip_f_0_split_0', (100, 100)),
 ('ip_f_ip_f_0_split_1', (100, 100)),
 ('accuracy_f', ()),
 ('loss_f', ())]
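
The spatial sizes in the listing above can be reproduced by hand. The sketch below assumes no padding, square kernels, and the usual rounding rules: convolutions round down, while Caffe's pooling layers round up. The helper names `conv_out` and `pool_out` are made up for this example.

```python
import math

def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution (rounds down)."""
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a pooling layer, rounding up as Caffe does."""
    return int(math.ceil((size + 2 * pad - kernel) / float(stride))) + 1

size = 32                                   # CIFAR images are 32x32
size = conv_out(size, kernel=4)             # conv1  -> 29
size = conv_out(size, kernel=1)             # cccp1a -> 29
size = conv_out(size, kernel=1)             # cccp1b -> 29
size = pool_out(size, kernel=3, stride=2)   # pool1  -> 14
size = conv_out(size, kernel=4)             # conv2  -> 11
size = pool_out(size, kernel=3, stride=2)   # pool2  -> 5
size = conv_out(size, kernel=2)             # conv3  -> 4
size = pool_out(size, kernel=2, stride=2)   # pool3  -> 2

flattened = 64 * size * size                # 64 channels of 2x2 feed ip1
print(size, flattened)                      # 2 256
```

The final value of 256 matches the input dimension of `ip1` in the parameter shapes.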

In [6]:
print("Parameters and shape:")
[(k, v[0].data.shape) for k, v in solver.net.params.items()]

Parameters and shape:


[('conv1', (64, 3, 4, 4)),
 ('cccp1a', (42, 64, 1, 1)),
 ('cccp1b', (32, 42, 1, 1)),
 ('conv2', (42, 32, 4, 4)),
 ('conv3', (64, 42, 2, 2)),
 ('ip1', (768, 256)),
 ('ip_c', (20, 768)),
 ('ip_f', (100, 768))]
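
To get a feel for the size of the model, the weight shapes listed above can be multiplied out. This sketch assumes every layer also has one bias per output, which is Caffe's default for convolution and inner product layers; the listing itself only shows the weights.

```python
from functools import reduce
from operator import mul

# Weight shapes copied from the listing above.
weight_shapes = {
    'conv1': (64, 3, 4, 4),
    'cccp1a': (42, 64, 1, 1),
    'cccp1b': (32, 42, 1, 1),
    'conv2': (42, 32, 4, 4),
    'conv3': (64, 42, 2, 2),
    'ip1': (768, 256),
    'ip_c': (20, 768),
    'ip_f': (100, 768),
}

def count(shape):
    """Number of values in an array of the given shape."""
    return reduce(mul, shape, 1)

weights = sum(count(s) for s in weight_shapes.values())
biases = sum(s[0] for s in weight_shapes.values())   # assumed: one bias per output
print(weights, biases, weights + biases)             # 328128 1132 329260
```

Under these assumptions the net has about 330 thousand parameters, and `ip1` alone holds roughly 60% of the weights.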

## Solver's params

The solver's params for the created net are defined in a `.prototxt` file. 

Notice that because `max_iter: 150000` and we train by minibatches of 100 as defined above when creating the net, the training processes `150000*100 = 15000000` images in total, which amounts to `150000*100/50000 = 300` epochs over the 50000 training images, served as pre-shuffled minibatches of 100 images.

Every `test_interval: 1000` training iterations, the net is tested on `test_iter: 100` minibatches of test images (120 images each, as defined above when creating the test net). 
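
The arithmetic behind these solver settings can be checked in a few lines. The values below are taken from `cnn_solver_rms.prototxt` (`max_iter: 150000`) and from the batch sizes passed to `cnn()`; the variable names exist only for this example.

```python
train_images = 50000     # CIFAR-100 training set size
test_images = 10000      # CIFAR-100 test set size
train_batch = 100        # batch size used for cnn_train.prototxt
test_batch = 120         # batch size used for cnn_test.prototxt
max_iter = 150000        # from cnn_solver_rms.prototxt
test_iter = 100
test_interval = 1000

epochs = max_iter * train_batch / float(train_images)
images_per_test = test_iter * test_batch
scheduled_tests = max_iter // test_interval   # not counting a test at iteration 0

print(epochs)            # 300.0
print(images_per_test)   # 12000
print(scheduled_tests)   # 150
```

Since 12000 is more than the 10000 available test images, each periodic test pass wraps around and counts some test images twice.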
____

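Here, **RMSProp** is used. It is SGD-based, and it scales each update by a running average of recent squared gradients, which often makes it converge faster than plain SGD and makes it less sensitive to the choice of learning rate.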
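
As a rough illustration of what RMSProp does, here is a minimal sketch of the update rule for a single scalar weight. It follows the commonly described form of the algorithm and is not Caffe's actual implementation; the function name `rmsprop_step`, the toy objective and the `lr` and `delta` values are assumptions made for this example, while `rms_decay=0.98` mirrors the solver file.

```python
import math

def rmsprop_step(w, grad, mean_square, lr=0.01, rms_decay=0.98, delta=1e-8):
    """One RMSProp update for a single scalar weight.

    Returns the new weight and the updated running mean of squared gradients.
    """
    mean_square = rms_decay * mean_square + (1.0 - rms_decay) * grad ** 2
    w = w - lr * grad / (math.sqrt(mean_square) + delta)
    return w, mean_square

# Toy problem (an assumption for illustration): minimise f(w) = w^2.
w, mean_square = 1.0, 0.0
for _ in range(200):
    grad = 2.0 * w                      # derivative of w^2
    w, mean_square = rmsprop_step(w, grad, mean_square)

print(abs(w) < 1.0)  # True: the weight moved towards the minimum at 0
```

Because the gradient is divided by the square root of its own running average, the step size depends far less on the raw gradient scale than it does with plain SGD.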
____

In [7]:
!cat cnn_solver_rms.prototxt

train_net: "cnn_train.prototxt"
test_net: "cnn_test.prototxt"

test_iter: 100
test_interval: 1000

base_lr: 0.0006
momentum: 0.0
weight_decay: 0.001

lr_policy: "inv"
gamma: 0.0001
power: 0.75

display: 100

max_iter: 150000

snapshot: 50000
snapshot_prefix: "cnn_snapshot"
solver_mode: GPU

type: "RMSProp"
rms_decay: 0.98
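
With `lr_policy: "inv"`, Caffe decays the learning rate as `base_lr * (1 + gamma * iter) ^ (-power)`. The short sketch below evaluates that formula with the values from the solver file above; the function name `inv_learning_rate` is only for this example.

```python
def inv_learning_rate(iteration, base_lr=0.0006, gamma=0.0001, power=0.75):
    """Learning rate at a given iteration under the "inv" policy."""
    return base_lr * (1.0 + gamma * iteration) ** (-power)

# Print the learning rate at a few points of the 150000-iteration run.
for it in (0, 10000, 50000, 150000):
    print(it, inv_learning_rate(it))
```

At iteration 150000 the factor is (1 + 15)^(-0.75) = 1/8, so training ends with a learning rate of about 7.5e-5, one eighth of the initial 0.0006.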


## Alternative way to train directly in Python
Since a recent update, training from python prints no output by default, which makes debugging harder. 
Skip this cell and train with the second method shown below if needed. It is commented out in case you just chain some `shift+enter` ipython shortcuts. 

In [8]:
# %%time
# solver.solve()

## Train by calling caffe in command line
Just set the parameters correctly. Be sure that the notebook is at the root of the ipython notebook server. 
You can run this in an external terminal if you open it in the notebook's directory. 

It is also possible to finetune an existing net with a different solver or different data. Here I do it, because I feel the net could better fit the data. 

In [9]:
%%time
!$CAFFE_ROOT/build/tools/caffe train -solver cnn_solver_rms.prototxt

/root/caffe/build/tools/caffe: /root/anaconda2/lib/liblzma.so.5: no version information available (required by /usr/lib/x86_64-linux-gnu/libunwind.so.8)
I1230 23:53:02.863142  2138 caffe.cpp:184] Using GPUs 0
I1230 23:53:03.078757  2138 solver.cpp:48] Initializing solver from parameters: 
train_net: "cnn_train.prototxt"
test_net: "cnn_test.prototxt"
test_iter: 100
test_interval: 1000
base_lr: 0.0006
display: 100
max_iter: 150000
lr_policy: "inv"
gamma: 0.0001
power: 0.75
momentum: 0
weight_decay: 0.001
snapshot: 50000
snapshot_prefix: "cnn_snapshot"
solver_mode: GPU
device_id: 0
rms_decay: 0.98
type: "RMSProp"
I1230 23:53:03.078974  2138 solver.cpp:81] Creating training net from train_net file: cnn_train.prototxt
I1230 23:53:03.079375  2138 net.cpp:49] Initializing net from parameters: 
state {
  phase: TRAIN
}
layer {
  name: "data"
  type: "HDF5Data"
  top: "data"
  top: "label_coarse"
  top: "label_fine"
  hdf5_data_param {
    source: "cifar_100_caffe_hdf5/train.txt"
    batch_size

Caffe brewed. 
## Test the model completely on test data
Let's test directly in command-line:

In [10]:
%%time
!$CAFFE_ROOT/build/tools/caffe test -model cnn_test.prototxt -weights cnn_snapshot_iter_150000.caffemodel -iterations 83

/root/caffe/build/tools/caffe: /root/anaconda2/lib/liblzma.so.5: no version information available (required by /usr/lib/x86_64-linux-gnu/libunwind.so.8)
I1231 10:31:19.907760  9759 caffe.cpp:234] Use CPU.
I1231 10:31:20.073982  9759 net.cpp:49] Initializing net from parameters: 
state {
  phase: TEST
}
layer {
  name: "data"
  type: "HDF5Data"
  top: "data"
  top: "label_coarse"
  top: "label_fine"
  hdf5_data_param {
    source: "cifar_100_caffe_hdf5/test.txt"
    batch_size: 120
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 64
    kernel_size: 4
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "cccp1a"
  type: "Convolution"
  bottom: "conv1"
  top: "cccp1a"
  convolution_param {
    num_output: 42
    kernel_size: 1
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "relu1a"
  type: "ReLU"
  bottom: "cccp1a"
  top: "cccp1a"
}
layer {
  name: "cccp1b"
  type: "Convolution"
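
The `-iterations 83` value in the test command appears to follow from the test batch size: with batches of 120 images, 83 iterations are the most that fit in the 10000 CIFAR-100 test images without wrapping around and counting some images twice. A short check of that arithmetic, with variable names chosen only for this example:

```python
test_images = 10000   # CIFAR-100 test set size
test_batch = 120      # batch_size in cnn_test.prototxt

full_batches = test_images // test_batch   # whole batches that fit in the test set
covered = full_batches * test_batch
left_out = test_images - covered

print(full_batches, covered, left_out)  # 83 9960 40
```

So the reported accuracy is computed on 9960 test images, and 40 images are left out.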

## The model achieved nearly 58% accuracy on the 20 coarse labels and 47% accuracy on the fine labels.
This means that when shown a picture it has never seen, the neural network assigns it to the correct one of the 20 coarse categories 58% of the time, or to the correct one of the 100 fine categories 47% of the time, without using the coarse label. This is amazing, but the neural network could surely be fine-tuned with better solver parameters. 
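
For reference, the accuracy reported here is top-1 accuracy: the fraction of test images whose highest-scoring class equals the true label. The sketch below shows that computation on made-up scores, along with the chance level for 20 and 100 balanced classes; `top1_accuracy` is a hypothetical helper, not Caffe's `Accuracy` layer.

```python
def top1_accuracy(scores, labels):
    """Fraction of examples whose highest-scoring class equals the label."""
    correct = 0
    for row, label in zip(scores, labels):
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return correct / float(len(labels))

# Toy scores for 4 examples and 3 classes (made-up numbers).
scores = [[0.1, 0.7, 0.2],
          [0.8, 0.1, 0.1],
          [0.3, 0.3, 0.4],
          [0.2, 0.5, 0.3]]
labels = [1, 0, 1, 1]

print(top1_accuracy(scores, labels))   # 0.75
print(1.0 / 20, 1.0 / 100)             # chance level: 0.05 0.01
```

Random guessing would give 5% on the coarse labels and 1% on the fine labels, so both results are far above chance.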

It would also be possible to add two more loss layers on top of the existing ones, to recombine the predictions and exploit the fact that the coarse and fine labels are related and influence each other.
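
One simple way to tie the two outputs together, sketched here as an assumption rather than something this notebook implements, is to derive coarse probabilities by summing the fine probabilities of the classes that belong to each coarse category. The mapping `fine_to_coarse` below is a toy example with 4 fine and 2 coarse classes, not the real CIFAR-100 hierarchy, and `coarse_from_fine` is a hypothetical helper.

```python
def coarse_from_fine(fine_probs, fine_to_coarse, num_coarse):
    """Sum fine-class probabilities into their parent coarse classes."""
    coarse = [0.0] * num_coarse
    for fine_class, p in enumerate(fine_probs):
        coarse[fine_to_coarse[fine_class]] += p
    return coarse

fine_to_coarse = [0, 0, 1, 1]        # toy hierarchy: fine classes 0,1 -> coarse 0
fine_probs = [0.1, 0.2, 0.3, 0.4]    # made-up softmax output of the fine head

coarse_probs = coarse_from_fine(fine_probs, fine_to_coarse, num_coarse=2)
print(coarse_probs)
```

The derived coarse distribution could then be compared with, or averaged into, the output of the coarse head.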

This neural network training could be compared to the results listed here: http://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html#494c5356524332303132207461736b2031

Let's convert the notebook to github markdown:

In [11]:
!jupyter nbconvert --to markdown custom-cifar-100.ipynb 
!mv custom-cifar-100.md README.md

[NbConvertApp] Converting notebook custom-cifar-100.ipynb to markdown
[NbConvertApp] Writing 1036426 bytes to custom-cifar-100.md
