<a href="https://colab.research.google.com/github/jfogarty/machine-learning-intro-workshop/blob/master/external/cnn_vgg16.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Deep Learning Models -- A collection of various deep learning architectures, models, and tips for TensorFlow and PyTorch in Jupyter Notebooks.
- Author: Sebastian Raschka
- GitHub Repository: https://github.com/rasbt/deeplearning-models

# Model Zoo -- Convolutional Neural Network (VGG16)

The VGG-16 Convolutional Neural Network Architecture [1] implemented in TensorFlow and trained on Cifar-10 [2, 3] images.

References:

- [1] Simonyan, K., & Zisserman, A. (2015). [Very Deep Convolutional Networks for Large-Scale Image Recognition](https://arxiv.org/abs/1409.1556). International Conference on Learning Representations (ICRL), 1–14. https://doi.org/10.1016/j.infsof.2008.09.005
- [2] Krizhevsky, A. (2009). [Learning Multiple Layers of Features from Tiny Images](https://doi.org/10.1.1.222.9220 http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.222.9220&rep=rep1&type=pdf). Science Department, University of Toronto.
- [3] https://www.cs.toronto.edu/~kriz/cifar.html

## VGG16 – Convolutional Network for Classification and Detection

VGG16 is a convolutional neural network model proposed by K. Simonyan and A. Zisserman from the University of Oxford in the paper “Very Deep Convolutional Networks for Large-Scale Image Recognition”. 

The model achieves **92.7% top-5 test accuracy** in ImageNet, which is a dataset of over 14 million images belonging to 1000 classes. It was one of the famous model submitted to ILSVRC-2014. It makes the improvement over AlexNet by replacing large kernel-sized filters (11 and 5 in the first and second convolutional layer, respectively) with multiple 3×3 kernel-sized filters one after another.

VGG16 was **trained for weeks** while using NVIDIA Titan Black GPU’s.


<figure>
  <center><img src="https://neurohive.io/wp-content/uploads/2018/11/vgg16-1-e1542731207177.png" />
  </center>
</figure>

<figure>
  <center><img src="https://neurohive.io/wp-content/uploads/2018/11/vgg16.png" />
  <figcaption>Figure 1</figcaption></center>
</figure>

## ImageNet DataSet

[**ImageNet**](http://www.image-net.org/) is a dataset of over 15 million labeled high-resolution images belonging to roughly 22,000 categories. The images were collected from the web and labeled by human labelers using Amazon’s Mechanical Turk crowd-sourcing tool. Starting in 2010, as part of the Pascal Visual Object Challenge, an annual competition called the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) has been held. ILSVRC uses a subset of ImageNet with roughly 1000 images in each of 1000 categories. In all, there are roughly 1.2 million training images, 50,000 validation images, and 150,000 testing images. ImageNet consists of variable-resolution images. Therefore, the images have been down-sampled to a fixed resolution of 256×256. Given a rectangular image, the image is rescaled and cropped out the central 256×256 patch from the resulting image.

<figure>
  <center><img src="https://cdn.technologyreview.com/i/images/Object%20recognition.png" />
  <figcaption>Figure 2</figcaption></center>
</figure>


This set is **way too big** to train in this Colab notebook, so we'll use the much smaller (but still non-trivai CIFAR-10 set).


## CIFAR-10 DataSet

The [CIFAR-10 dataset](https://www.cs.toronto.edu/~kriz/cifar.html) consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. 

The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly-selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5000 images from each class. 

Here are the classes in the dataset, as well as 10 random images from each:


<figure>
  <center><img src="https://cafe-and-cookies.tokyo/wp/wp-content/uploads/2019/03/img_5c828f2715196.png" />
  <figcaption>Figure 2</figcaption></center>
</figure>


In [0]:
#@title Check Runtime
import os
import tensorflow as tf

try:
   device_name = os.environ['COLAB_TPU_ADDR']
   TPU_ADDRESS = 'grpc://' + device_name
   print(f'Running with TPU acceleration at {TPU_ADDRESS}')
except KeyError:
  GPU_NAME = tf.test.gpu_device_name()
  if GPU_NAME.startswith('/device:GPU'): 
      print(f"Running with GPU acceleration at {GPU_NAME}")
  else:
      print("Running on normal CPU without GPU acceleration.")
      print("This will be VERY VERY slow.")
      print("Consider changing the runtime type to GPU or NPU")

Running with GPU acceleration at /device:GPU:0


In [0]:
#@title Imports and Utility functions
from urllib.request import urlretrieve
import shutil
import glob
import tarfile
import os
import sys
import pickle
import numpy as np
import scipy.misc
from tensorflow.examples.tutorials.mnist import input_data


def download_and_extract_cifar(target_dir,
                               cifar_url='http://www.cs.toronto.edu/'
                               '~kriz/cifar-10-python.tar.gz'):

    if not os.path.exists(target_dir):
        os.mkdir(target_dir)

    fbase = os.path.basename(cifar_url)
    fpath = os.path.join(target_dir, fbase)

    if not os.path.exists(fpath):
        def get_progress(count, block_size, total_size):
            sys.stdout.write('\rDownloading ... %s %d%%' % (fbase,
                             float(count * block_size) /
                             float(total_size) * 100.0))
            sys.stdout.flush()
        local_filename, headers = urlretrieve(cifar_url,
                                              fpath,
                                              reporthook=get_progress)
        sys.stdout.write('\nDownloaded')

    else:
        sys.stdout.write('Found existing')

    statinfo = os.stat(fpath)
    file_size = statinfo.st_size / 1024**2
    sys.stdout.write(' %s (%.1f Mb)\n' % (fbase, file_size))
    sys.stdout.write('Extracting %s ...\n' % fbase)
    sys.stdout.flush()

    with tarfile.open(fpath, 'r:gz') as t:
        t.extractall(target_dir)

    return fpath.replace('cifar-10-python.tar.gz', 'cifar-10-batches-py')

def unpickle_cifar(fpath):
    with open(fpath, 'rb') as f:
        dct = pickle.load(f, encoding='bytes')
    return dct

In [0]:
#@title Ciphar10Loader utility class

class Cifar10Loader():
    def __init__(self, cifar_path, normalize=False,
                 channel_mean_center=False, zero_center=False):
        self.cifar_path = cifar_path
        self.batchnames = [os.path.join(self.cifar_path, f)
                           for f in os.listdir(self.cifar_path)
                           if f.startswith('data_batch')]
        self.testname = os.path.join(self.cifar_path, 'test_batch')
        self.num_train = self.count_train()
        self.num_test = self.count_test()
        self.normalize = normalize
        self.channel_mean_center = channel_mean_center
        self.zero_center = zero_center
        self.train_mean = None

    def _compute_train_mean(self):

        cum_mean = np.zeros((1, 1, 1, 3))

        for batch in self.batchnames:
            dct = unpickle_cifar(batch)
            dct[b'labels'] = np.array(dct[b'labels'], dtype=int)
            dct[b'data'] = dct[b'data'].reshape(
                dct[b'data'].shape[0], 3, 32, 32).transpose(0, 2, 3, 1)
            mean = dct[b'data'].mean(axis=(0, 1, 2), keepdims=True)
            cum_mean += mean

        self.train_mean = cum_mean / len(self.batchnames)

        return None

    def load_test(self, onehot=True):
        dct = unpickle_cifar(self.testname)
        dct[b'labels'] = np.array(dct[b'labels'], dtype=int)

        dct[b'data'] = dct[b'data'].reshape(
            dct[b'data'].shape[0], 3, 32, 32).transpose(0, 2, 3, 1)

        if onehot:
            dct[b'labels'] = (np.arange(10) ==
                              dct[b'labels'][:, None]).astype(int)

        if self.normalize:
            dct[b'data'] = dct[b'data'].astype(np.float32)
            dct[b'data'] = dct[b'data'] / 255.0

        if self.channel_mean_center:
            if self.train_mean is None:
                self._compute_train_mean()
            dct[b'data'] -= self.train_mean

        if self.zero_center:
            if self.normalize:
                dct[b'data'] -= .5
            else:
                dct[b'data'] -= 127.5

        return dct[b'data'], dct[b'labels']

    def load_train_epoch(self, batch_size=50, onehot=True,
                         shuffle=False, seed=None):

        rgen = np.random.RandomState(seed)

        for batch in self.batchnames:
            dct = unpickle_cifar(batch)
            dct[b'labels'] = np.array(dct[b'labels'], dtype=int)
            dct[b'data'] = dct[b'data'].reshape(
                dct[b'data'].shape[0], 3, 32, 32).transpose(0, 2, 3, 1)

            if onehot:
                dct[b'labels'] = (np.arange(10) ==
                                  dct[b'labels'][:, None]).astype(int)

            if self.normalize:
                dct[b'data'] = dct[b'data'].astype(np.float32)
                dct[b'data'] = dct[b'data'] / 255.0

            if self.channel_mean_center:
                if self.train_mean is None:
                    self._compute_train_mean()
                dct[b'data'] -= self.train_mean

            if self.zero_center:
                if self.normalize:
                    dct[b'data'] -= .5
                else:
                    dct[b'data'] -= 127.5

            arrays = [dct[b'data'], dct[b'labels']]
            del dct
            indices = np.arange(arrays[0].shape[0])

            if shuffle:
                rgen.shuffle(indices)

            for start_idx in range(0, indices.shape[0] - batch_size + 1,
                                   batch_size):
                index_slice = indices[start_idx:start_idx + batch_size]
                yield (ary[index_slice] for ary in arrays)

    def count_train(self):
        cnt = 0
        for f in self.batchnames:
            dct = unpickle_cifar(f)
            cnt += len(dct[b'labels'])
        return cnt

    def count_test(self):
        dct = unpickle_cifar(self.testname)
        return len(dct[b'labels'])


In [0]:
##########################
### DATASET
##########################

dest = download_and_extract_cifar('./cifar-10')
cifar = Cifar10Loader(dest, normalize=True, 
                      zero_center=True,
                      channel_mean_center=False)
cifar.num_train

X, y = cifar.load_test()
half = cifar.num_test // 2
X_test, X_valid = X[:half], X[half:]
y_test, y_valid = y[:half], y[half:]

del X, y

print("Ready.")

Downloading ... cifar-10-python.tar.gz 100%
Downloaded cifar-10-python.tar.gz (162.6 Mb)
Extracting cifar-10-python.tar.gz ...


In [0]:
##########################
### SETTINGS
##########################

# Hyperparameters
learning_rate = 0.001
training_epochs = 30
batch_size = 32

# Other
print_interval = 200

# Architecture
image_width, image_height, image_depth = 32, 32, 3
n_classes = 10


##########################
### WRAPPER FUNCTIONS
##########################

def conv_layer(input, input_channels, output_channels, 
               kernel_size, strides, scope, padding='SAME'):
    with tf.name_scope(scope):
        weights_shape = kernel_size + [input_channels, output_channels]
        weights = tf.Variable(tf.truncated_normal(shape=weights_shape,
                                                  mean=0.0,
                                                  stddev=0.1,
                                                  dtype=tf.float32),
                                                  name='weights')
        biases = tf.Variable(tf.zeros(shape=[output_channels]),
                             name='biases')
        conv = tf.nn.conv2d(input=input,
                            filter=weights,
                            strides=strides,
                            padding=padding,
                            name='convolution')
        out = tf.nn.bias_add(conv, biases, name='logits')
        out = tf.nn.relu(out, name='activation')
        return out


def fc_layer(input, output_nodes, scope,
             activation=None, seed=None):
    with tf.name_scope(scope):
        shape = int(np.prod(input.get_shape()[1:]))
        flat_input = tf.reshape(input, [-1, shape])
        weights = tf.Variable(tf.truncated_normal(shape=[shape,
                                                         output_nodes],
                                                  mean=0.0,
                                                  stddev=0.1,
                                                  dtype=tf.float32,
                                                  seed=seed),
                                                  name='weights')
        biases = tf.Variable(tf.zeros(shape=[output_nodes]),
                             name='biases')
        act = tf.nn.bias_add(tf.matmul(flat_input, weights), biases, 
                             name='logits')

        if activation is not None:
            act = activation(act, name='activation')

        return act


##########################
### GRAPH DEFINITION
##########################

g = tf.Graph()
with g.as_default():

    # Input data
    tf_x = tf.placeholder(tf.float32, [None, image_width, image_height, image_depth], name='features')
    tf_y = tf.placeholder(tf.float32, [None, n_classes], name='targets')
     
    ##########################
    ### VGG16 Model
    ##########################

    # =========
    # BLOCK 1
    # =========
    conv_layer_1 = conv_layer(input=tf_x,
                              input_channels=3,
                              output_channels=64,
                              kernel_size=[3, 3],
                              strides=[1, 1, 1, 1],
                              scope='conv1')
    
    conv_layer_2 = conv_layer(input=conv_layer_1,
                              input_channels=64,
                              output_channels=64,
                              kernel_size=[3, 3],
                              strides=[1, 1, 1, 1],
                              scope='conv2')    
    
    pool_layer_1 = tf.nn.max_pool(conv_layer_2,
                                  ksize=[1, 2, 2, 1], 
                                  strides=[1, 2, 2, 1],
                                  padding='SAME',
                                  name='pool1') 
    # =========
    # BLOCK 2
    # =========
    conv_layer_3 = conv_layer(input=pool_layer_1,
                              input_channels=64,
                              output_channels=128,
                              kernel_size=[3, 3],
                              strides=[1, 1, 1, 1],
                              scope='conv3')    
    
    conv_layer_4 = conv_layer(input=conv_layer_3,
                              input_channels=128,
                              output_channels=128,
                              kernel_size=[3, 3],
                              strides=[1, 1, 1, 1],
                              scope='conv4')    
    
    pool_layer_2 = tf.nn.max_pool(conv_layer_4,
                                  ksize=[1, 2, 2, 1], 
                                  strides=[1, 2, 2, 1],
                                  padding='SAME',
                                  name='pool2') 
    # =========
    # BLOCK 3
    # =========
    conv_layer_5 = conv_layer(input=pool_layer_2,
                              input_channels=128,
                              output_channels=256,
                              kernel_size=[3, 3],
                              strides=[1, 1, 1, 1],
                              scope='conv5')        
    
    conv_layer_6 = conv_layer(input=conv_layer_5,
                              input_channels=256,
                              output_channels=256,
                              kernel_size=[3, 3],
                              strides=[1, 1, 1, 1],
                              scope='conv6')      
    
    conv_layer_7 = conv_layer(input=conv_layer_6,
                              input_channels=256,
                              output_channels=256,
                              kernel_size=[3, 3],
                              strides=[1, 1, 1, 1],
                              scope='conv7')
    
    pool_layer_3 = tf.nn.max_pool(conv_layer_7,
                                  ksize=[1, 2, 2, 1], 
                                  strides=[1, 2, 2, 1],
                                  padding='SAME',
                                  name='pool3') 
    # =========
    # BLOCK 4
    # =========
    conv_layer_8 = conv_layer(input=pool_layer_3,
                              input_channels=256,
                              output_channels=512,
                              kernel_size=[3, 3],
                              strides=[1, 1, 1, 1],
                              scope='conv8')      
    
    conv_layer_9 = conv_layer(input=conv_layer_8,
                              input_channels=512,
                              output_channels=512,
                              kernel_size=[3, 3],
                              strides=[1, 1, 1, 1],
                              scope='conv9')     
    
    conv_layer_10 = conv_layer(input=conv_layer_9,
                               input_channels=512,
                               output_channels=512,
                               kernel_size=[3, 3],
                               strides=[1, 1, 1, 1],
                               scope='conv10')   
    
    pool_layer_4 = tf.nn.max_pool(conv_layer_10,
                                  ksize=[1, 2, 2, 1], 
                                  strides=[1, 2, 2, 1],
                                  padding='SAME',
                                  name='pool4') 
    # =========
    # BLOCK 5
    # =========
    conv_layer_11 = conv_layer(input=pool_layer_4,
                               input_channels=512,
                               output_channels=512,
                               kernel_size=[3, 3],
                               strides=[1, 1, 1, 1],
                               scope='conv11')   
    
    conv_layer_12 = conv_layer(input=conv_layer_11,
                               input_channels=512,
                               output_channels=512,
                               kernel_size=[3, 3],
                               strides=[1, 1, 1, 1],
                               scope='conv12')   

    conv_layer_13 = conv_layer(input=conv_layer_12,
                               input_channels=512,
                               output_channels=512,
                               kernel_size=[3, 3],
                               strides=[1, 1, 1, 1],
                               scope='conv13') 
    
    pool_layer_5 = tf.nn.max_pool(conv_layer_12,
                                  ksize=[1, 2, 2, 1], 
                                  strides=[1, 2, 2, 1],
                                  padding='SAME',
                                  name='pool5')     
    # ===========
    # CLASSIFIER
    # ===========
    
    fc_layer_1 = fc_layer(input=pool_layer_5, 
                          output_nodes=4096,
                          activation=tf.nn.relu,
                          scope='fc1')
    
    fc_layer_2 = fc_layer(input=fc_layer_1, 
                          output_nodes=4096,
                          activation=tf.nn.relu,
                          scope='fc2')

    out_layer = fc_layer(input=fc_layer_2, 
                         output_nodes=n_classes,
                         activation=None,
                         scope='output_layer')
    
    # Loss and optimizer
    loss = tf.nn.softmax_cross_entropy_with_logits(logits=out_layer, labels=tf_y)
    cost = tf.reduce_mean(loss, name='cost')
    
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
    train = optimizer.minimize(cost, name='train')

    # Prediction
    correct_prediction = tf.equal(tf.argmax(tf_y, 1), tf.argmax(out_layer, 1), 
                                  name='correct_predictions')
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32), name='accuracy')

    # Saver to save session for reuse
    saver = tf.train.Saver()

    
##########################
### TRAINING & EVALUATION
##########################

with tf.Session(graph=g) as sess:
    sess.run(tf.global_variables_initializer())

    for epoch in range(training_epochs):
        
        avg_cost = 0.
        mbatch_cnt = 0
        for batch_x, batch_y in cifar.load_train_epoch(shuffle=True, batch_size=batch_size):
            
            mbatch_cnt += 1
            _, c = sess.run(['train', 'cost:0'], feed_dict={'features:0': batch_x,
                                                            'targets:0': batch_y})
            avg_cost += c

            if not mbatch_cnt % print_interval:
                print("Minibatch: %04d | Cost: %.3f" % (mbatch_cnt, c))
                

        # ===================
        # Training Accuracy
        # ===================
        n_predictions, n_correct = 0, 0
        for batch_x, batch_y in cifar.load_train_epoch(batch_size=batch_size):
        
            p = sess.run('correct_predictions:0', 
                         feed_dict={'features:0': batch_x,
                                    'targets:0':  batch_y})
            n_correct += np.sum(p)
            n_predictions += p.shape[0]
        train_acc = n_correct / n_predictions
        
        
        # ===================
        # Validation Accuracy
        # ===================
        #valid_acc = sess.run('accuracy:0', feed_dict={'features:0': X_valid,
        #                                              'targets:0': y_valid})
        # ---------------------------------------
        # workaround for GPUs with <= 4 Gb memory
        n_predictions, n_correct = 0, 0
        indices = np.arange(y_valid.shape[0])
        chunksize = 500
        for start_idx in range(0, indices.shape[0] - chunksize + 1, chunksize):
            index_slice = indices[start_idx:start_idx + chunksize]
            p = sess.run('correct_predictions:0', 
                         feed_dict={'features:0': X_valid[index_slice],
                                    'targets:0': y_valid[index_slice]})
            n_correct += np.sum(p)
            n_predictions += p.shape[0]
        valid_acc = n_correct / n_predictions
        # ---------------------------------------
                                                
        print("Epoch: %03d | AvgCost: %.3f" % (epoch + 1, avg_cost / mbatch_cnt), end="")
        print(" | Train/Valid ACC: %.3f/%.3f" % (train_acc, valid_acc))
    
    saver.save(sess, save_path='./convnet-vgg16.ckpt')

W0813 21:47:31.524140 140361440728960 deprecation.py:323] From <ipython-input-7-6aaaad946b04>:227: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See `tf.nn.softmax_cross_entropy_with_logits_v2`.



Minibatch: 0200 | Cost: 5.968
Minibatch: 0400 | Cost: 3.273
Minibatch: 0600 | Cost: 2.727
Minibatch: 0800 | Cost: 2.259
Minibatch: 1000 | Cost: 2.265
Minibatch: 1200 | Cost: 2.474
Minibatch: 1400 | Cost: 2.419
Epoch: 001 | AvgCost: 1905.048 | Train/Valid ACC: 0.220/0.217
Minibatch: 0200 | Cost: 2.247
Minibatch: 0400 | Cost: 2.108
Minibatch: 0600 | Cost: 2.337
Minibatch: 0800 | Cost: 2.016
Minibatch: 1000 | Cost: 2.184
Minibatch: 1200 | Cost: 1.755
Minibatch: 1400 | Cost: 1.791
Epoch: 002 | AvgCost: 2.068 | Train/Valid ACC: 0.258/0.259
Minibatch: 0200 | Cost: 1.890
Minibatch: 0400 | Cost: 2.112
Minibatch: 0600 | Cost: 2.091
Minibatch: 0800 | Cost: 1.867
Minibatch: 1000 | Cost: 1.530
Minibatch: 1200 | Cost: 1.793
Minibatch: 1400 | Cost: 1.827
Epoch: 003 | AvgCost: 1.908 | Train/Valid ACC: 0.290/0.285
Minibatch: 0200 | Cost: 1.942
Minibatch: 0400 | Cost: 1.841
Minibatch: 0600 | Cost: 1.783
Minibatch: 0800 | Cost: 1.823
Minibatch: 1000 | Cost: 1.741
Minibatch: 1200 | Cost: 2.216
Minibatch:

### Training Times

- Even with a Colab GPU enabled, the 30 epochs for this set will take some time.

- Expect this to take about 45 minutes to finish.

In [0]:
##########################
### RELOAD & TEST
##########################

with tf.Session(graph=g) as sess:
    saver.restore(sess, save_path='./convnet-vgg16.ckpt')
    
    # test_acc = sess.run('accuracy:0', feed_dict={'features:0': X_test,
    #                                              'targets:0': y_test})
    # ---------------------------------------
    # workaround for GPUs with <= 4 Gb memory
    n_predictions, n_correct = 0, 0
    indices = np.arange(y_test.shape[0])
    chunksize = 500
    for start_idx in range(0, indices.shape[0] - chunksize + 1, chunksize):
        index_slice = indices[start_idx:start_idx + chunksize]
        p = sess.run('correct_predictions:0', 
                     feed_dict={'features:0': X_test[index_slice],
                                'targets:0': y_test[index_slice]})
        n_correct += np.sum(p)
        n_predictions += p.shape[0]
    test_acc = n_correct / n_predictions
    # ---------------------------------------

    print('Test ACC: %.3f' % test_acc)

W0813 22:18:38.424992 140361440728960 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.


Test ACC: 0.781


### Results

With 30 epochs on a GPU in about 40 minutes, this model achieves about **78% test accuracy** in Cifar-10.  This very respectable for the limited training time.

### End of notebook.