In [2]:
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)

from cv_vis import plot_training_stats

***VGGNet***

VGGNet was designed by K. Simonyan and A. Zisserman (2014). It is a fairly deep network (up to 19 weight layers), and its guiding principle (and insight) is the use of simple blocks: 3x3 convolutions with a stride of 1 and 2x2 max-pooling with a stride of 2, finished with fully connected layers of width 4096. It also introduced stacking of convolutional layers without max-pooling in between, effectively turning the convolutional filters into small deep networks of their own.
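To see why stacking small filters pays off, here is a minimal sketch in plain Python. It computes the receptive field of a stack of stride-1 3x3 convolutions and compares the weight count with a single larger convolution covering the same field. The helper names and the channel count of 64 are illustrative assumptions, not part of the notebook's code.

```python
def stacked_receptive_field(num_layers, kernel=3):
    """Receptive field (in pixels) of `num_layers` stacked stride-1 convolutions."""
    # Each additional stride-1 layer widens the field by (kernel - 1) pixels.
    return 1 + num_layers * (kernel - 1)


def conv_weights(kernel, channels, num_layers=1):
    """Weight count (biases ignored) for layers with `channels` inputs and outputs."""
    return num_layers * kernel * kernel * channels * channels


channels = 64  # illustrative channel count (assumption)
for n in (1, 2, 3):
    field = stacked_receptive_field(n)
    stacked = conv_weights(3, channels, num_layers=n)
    single = conv_weights(field, channels)
    print(f"{n} x 3x3 conv: field {field}x{field}, "
          f"{stacked} weights vs {single} for one {field}x{field} conv")
```

Three stacked 3x3 layers see the same 7x7 window as one 7x7 layer, but with fewer weights and two extra nonlinearities in between.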

In the following example, we classify two ImageNet classes, cats and dogs, with 1127 and 791 pictures, respectively.

Before being fed to the network, the images were preprocessed with preprocess.py:
 - the 'No image available' downloads were discarded,
 - each image was converted to grayscale,
 - and it was resized to 256x256 (set by globals.py), with padding used to preserve the original aspect ratio.

No further standardization of the color scale was performed. Note that preprocessing was performed with TensorFlow, and so can use the GPU when available (in the present version, however, only on an image-by-image basis).
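The resize-with-padding step can be sketched as pure geometry: scale the longer side to the target size, then pad the shorter side symmetrically. The function below is a hypothetical helper, not the code in preprocess.py, and symmetric padding is an assumption. TensorFlow provides `tf.image.resize_with_pad` for this kind of resize, though the notebook does not say whether preprocess.py uses it.

```python
def padded_resize_geometry(height, width, target=256):
    """Return (new_h, new_w, pad_top, pad_bottom, pad_left, pad_right).

    Hypothetical helper: scales the longer side to `target` and pads the
    shorter side symmetrically (assumption) to keep the aspect ratio.
    """
    scale = target / max(height, width)
    new_h = min(target, max(1, round(height * scale)))
    new_w = min(target, max(1, round(width * scale)))
    pad_h = target - new_h
    pad_w = target - new_w
    # Any odd leftover pixel goes to the bottom/right edge.
    return (new_h, new_w,
            pad_h // 2, pad_h - pad_h // 2,
            pad_w // 2, pad_w - pad_w // 2)


# A 500x375 (height x width) portrait image becomes 256x192 plus side padding.
print(padded_resize_geometry(500, 375))
```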
 
To make the problem tractable on a laptop and to avoid overfitting (after all, we are only classifying two classes), all dimensions of the original network were divided by the downscale parameter.
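As a rough illustration of what the downscale parameter does, the sketch below divides the layer widths from the VGG paper (64 to 512 convolutional channels, 4096 dense units) by downscale. Exactly which dimensions the notebook's model code divides is an assumption here, and `downscaled_widths` is a hypothetical helper.

```python
# Layer widths from the VGG paper: channels of the five convolutional stages
# and the width of the fully connected layers.
VGG_CONV_CHANNELS = [64, 128, 256, 512, 512]
VGG_DENSE_UNITS = 4096


def downscaled_widths(downscale):
    """Divide every layer width by `downscale` (hypothetical helper)."""
    conv = [max(1, c // downscale) for c in VGG_CONV_CHANNELS]
    dense = max(1, VGG_DENSE_UNITS // downscale)
    return conv, dense


for d in (16, 8):
    conv, dense = downscaled_widths(d)
    print(f"downscale={d}: conv channels {conv}, dense units {dense}")
```

With downscale=16 the widest convolutional stage has only 32 channels, which is what keeps training feasible on a laptop.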

First, a bare version, but let us start with a balanced sample right away. No dropout is used, downscale=16, and there is no image augmentation. The network is trained with Adam for 50 epochs on batches of 256 images each:
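One simple way to obtain a balanced sample is to undersample the larger class. The sketch below does that for the 1127 cat and 791 dog pictures and counts the batches of 256 per epoch. Undersampling is an assumption (the notebook does not say how the sample was balanced), the file names are placeholders, and the batch count assumes all balanced images are used for training.

```python
import math
import random


def balance_by_undersampling(items_by_class, seed=0):
    """Randomly keep as many items per class as the smallest class has."""
    rng = random.Random(seed)
    smallest = min(len(items) for items in items_by_class.values())
    return {label: rng.sample(items, smallest)
            for label, items in items_by_class.items()}


# Placeholder file names standing in for the real images.
dataset = {
    "cat": [f"cat_{i}" for i in range(1127)],
    "dog": [f"dog_{i}" for i in range(791)],
}
balanced = balance_by_undersampling(dataset)
total = sum(len(items) for items in balanced.values())
batches_per_epoch = math.ceil(total / 256)
print(total, batches_per_epoch)
```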

In [9]:
iplot(plot_training_stats("/DATA/Dropbox/LOOTRPV/Personal_programming/MachineLearning/Tutorials/ImageRecognition/NNs/VGGNet_logs/0_balanced.log"))

Despite several attempts, the network does not learn. Perhaps we need a bigger one after all, so let's try downscale=8:

In [21]:
iplot(plot_training_stats("/DATA/Dropbox/LOOTRPV/Personal_programming/MachineLearning/Tutorials/ImageRecognition/NNs/VGGNet_logs/1_downscale8.log"))

Hmm, still nothing. Let's seek help in the original VGG paper...

Great! There is some overfitting, but overall 80% accuracy is quite a fair result for a bare network. We will address overfitting soon; first, let's balance the dataset.

Huh, the overfitting is less of an issue now as well. The network is clearly still learning, so let's give it more time to see how far it goes...

In [14]:
iplot(plot_training_stats("/DATA/Dropbox/LOOTRPV/Personal_programming/MachineLearning/Tutorials/ImageRecognition/NNs/ZFNet_logs/2_balanced-long.log"))

Ok, now it's just memorizing again. Let's use dropout to prevent overfitting. We set the dropout probability to 50%.
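For reference, here is a minimal sketch of what dropout with a 50% probability does to a layer's activations during training. It uses the common "inverted dropout" formulation in plain Python. The function is illustrative only; the notebook's model presumably relies on its framework's built-in dropout layer.

```python
import random


def dropout(activations, drop_prob=0.5, training=True, rng=random):
    """Inverted dropout: zero each unit with `drop_prob`, rescale survivors."""
    if not training or drop_prob == 0.0:
        # At inference time the activations pass through unchanged.
        return list(activations)
    keep_prob = 1.0 - drop_prob
    # Scaling by 1/keep_prob keeps the expected activation unchanged.
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


sample = dropout([1.0] * 10, drop_prob=0.5, rng=random.Random(0))
print(sample)
```

Because a different random subset of units is silenced at every step, the network cannot rely on any single unit to memorize the training images.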

In [20]:
iplot(plot_training_stats("/DATA/Dropbox/LOOTRPV/Personal_programming/MachineLearning/Tutorials/ImageRecognition/NNs/ZFNet_logs/3_dropout.log"))

Not bad, still some overfitting, but we now reach 80% validation accuracy. Let's add data augmentation. As in the case of AlexNet, let's turn the augmentations on one by one, every 100 epochs: random zoom-in, rotation, horizontal flip, brightness.
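The staged schedule can be sketched as a small function that returns which augmentations are active at a given epoch. The names in `AUGMENTATIONS` and the choice to enable the first augmentation from epoch 0 are assumptions. The notebook only states the order and the 100-epoch spacing.

```python
# Order taken from the text; the identifiers themselves are hypothetical.
AUGMENTATIONS = ["zoom_in", "rotation", "horizontal_flip", "brightness"]


def active_augmentations(epoch, interval=100):
    """Return the augmentations enabled at `epoch` (0-based).

    Assumption: the first augmentation is active from epoch 0 and one more
    is switched on every `interval` epochs.
    """
    count = min(len(AUGMENTATIONS), epoch // interval + 1)
    return AUGMENTATIONS[:count]


for epoch in (0, 100, 250, 300):
    print(epoch, active_augmentations(epoch))
```

Switching the augmentations on one at a time makes it easier to read off each one's effect from the training curves.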

In [40]:
iplot(plot_training_stats("/DATA/Dropbox/LOOTRPV/Personal_programming/MachineLearning/Tutorials/ImageRecognition/NNs/ZFNet_logs/4_augmented.log"))

Awesome! Before overfitting starts, our model reaches 95-96% accuracy. Since overfitting is a serious problem, we should probably keep downscale at 16. There are other ways to improve -- for instance, we could try playing with the learning rate. But let's finish our run through ZFNet here, for now.

In [7]:
iplot(plot_training_stats("/DATA/Dropbox/LOOTRPV/Personal_programming/MachineLearning/Tutorials/ImageRecognition/workspace/VGGNet_zero.log"))