In [2]:
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)

from cv_vis import plot_training_stats

***VGGNet***

VGGNet was designed by K. Simonyan and A. Zisserman (2014). It is a fairly deep network (up to 19 weight layers), and its guiding principle (and insight) is the use of simple blocks: 3x3 convolutions with a stride of 1 and 2x2 max-pooling with a stride of 2, finished with dense layers of width 4096. It also introduced stacking convolutional layers without max-pooling in between, effectively turning the convolutional filters into small deep networks of their own.
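
To make the benefit of stacking concrete, here is a minimal sketch (the function names are illustrative, not part of this project) that computes the receptive field of stacked 3x3 convolutions and compares their weight count with a single larger filter covering the same area. It assumes equal input and output channel counts and ignores biases.

```python
def stacked_receptive_field(num_layers, kernel=3, stride=1):
    """Receptive field of `num_layers` identical conv layers stacked without pooling."""
    field = 1
    jump = 1  # distance, in input pixels, between neighbouring output units
    for _ in range(num_layers):
        field += (kernel - 1) * jump
        jump *= stride
    return field


def conv_weights(kernel, channels, num_layers=1):
    """Weight count (biases ignored) for layers with `channels` inputs and outputs."""
    return num_layers * kernel * kernel * channels * channels


channels = 64
# Two stacked 3x3 layers see the same 5x5 window as one 5x5 layer...
print(stacked_receptive_field(2))
# ...but need fewer weights, with an extra non-linearity in between.
print(conv_weights(3, channels, num_layers=2), "vs", conv_weights(5, channels))
```

Three stacked 3x3 layers reach a 7x7 window in the same way, which is the comparison the VGG paper itself makes.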

In the following example, we will be classifying two classes: cats vs dogs from ImageNet, totalling 1127 and 791 pictures, respectively.

Before being fed to the network, the images were preprocessed with preprocess.py:
 - the 'No image available' downloads were discarded,
 - each image was converted to grayscale,
 - and it was resized to 256x256 (set by globals.py), with padding used to preserve the original aspect ratio.

No further standardization of the color scale was performed. Note that preprocessing was performed using tensorflow, so it can use GPU processing where available (in the present version, however, only on an image-by-image basis).
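
The geometry of the padded resize can be sketched in plain Python. This is only an illustration of the idea, not the code in preprocess.py: the function name, the centring of the image, and the rounding are all assumptions.

```python
def fit_with_padding(width, height, target=256):
    """Scaled size and padding needed to centre an image on a
    target x target canvas without distorting its aspect ratio."""
    scale = target / max(width, height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    pad_left = (target - new_w) // 2
    pad_top = (target - new_h) // 2
    return {
        "size": (new_w, new_h),
        # (left, top, right, bottom) padding in pixels
        "pad": (pad_left, pad_top,
                target - new_w - pad_left, target - new_h - pad_top),
    }


# A 500x375 landscape photo is scaled to 256x192 and padded by 32 rows
# above and below.
print(fit_with_padding(500, 375))
```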
 
To make the problem tractable on a laptop and to avoid overfitting (after all, we are only classifying two classes), all dimensions of the original VGGNet were divided by the downscale parameter.
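
As a rough illustration of what downscale does, the sketch below divides the layer widths of the published VGG configurations by the parameter. How the actual network code applies downscale is not shown here, so the integer division and the helper name are assumptions.

```python
# Channel widths of the five convolutional stages and of the dense layers
# in the original VGG configurations (Simonyan & Zisserman, 2014).
VGG_CONV_WIDTHS = (64, 128, 256, 512, 512)
VGG_DENSE_WIDTH = 4096


def downscaled_widths(downscale):
    """Divide every layer width by `downscale` (integer division is assumed)."""
    conv = tuple(max(1, w // downscale) for w in VGG_CONV_WIDTHS)
    dense = max(1, VGG_DENSE_WIDTH // downscale)
    return conv, dense


for downscale in (16, 8, 4, 1):
    print(downscale, downscaled_widths(downscale))
```

With downscale=16 the first stage has only 4 filters and the dense layers 256 units, which gives a sense of how small the first network tried below is.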

First, a bare version, though let us start with a balanced sample right away. No dropout is used, downscale=16, and there is no image augmentation. The network is trained with Adam for 50 epochs on batches of 256 images each:

In [9]:
iplot(plot_training_stats("/DATA/Dropbox/LOOTRPV/Personal_programming/MachineLearning/Tutorials/ImageRecognition/NNs/VGGNet_logs/0_balanced.log"))
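
As an aside on the balanced sample used for this run: one common way to balance 1127 cats against 791 dogs is to undersample the larger class. Whether this project undersamples or oversamples is not stated, so the sketch below is only one possible approach, and the file names are placeholders.

```python
import random


def balance_by_undersampling(samples_by_class, seed=0):
    """Randomly trim every class to the size of the smallest one."""
    rng = random.Random(seed)
    smallest = min(len(items) for items in samples_by_class.values())
    return {
        label: rng.sample(items, smallest)
        for label, items in samples_by_class.items()
    }


# Placeholder file names standing in for the real image lists.
dataset = {
    "cat": ["cat_%d.jpg" % i for i in range(1127)],
    "dog": ["dog_%d.jpg" % i for i in range(791)],
}
balanced = balance_by_undersampling(dataset)
print({label: len(items) for label, items in balanced.items()})
```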

Despite several attempts, the network does not learn. Perhaps we need a bigger one after all, downscale=8:

In [21]:
iplot(plot_training_stats("/DATA/Dropbox/LOOTRPV/Personal_programming/MachineLearning/Tutorials/ImageRecognition/NNs/VGGNet_logs/1_downscale8.log"))

Hmm, still nothing. Let's seek help in the original VGG paper... The authors perform training in stages, starting with a simplified version of the network. Let us follow in their steps. Staged training is turned on by the pretrain option; a deeper version of the network is switched in every pretrain_interval epochs (default: 50). First, however, the simplest network has to train for the procedure to make sense. Let's run our (downscale=8) equivalent of the VGG network A from the paper.
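
The staged schedule can be sketched as a simple lookup from epoch to network configuration. The function below is hypothetical and only mirrors the description above; the configuration names come from the paper (its A-LRN variant is left out), while the actual pretrain logic in this project may differ.

```python
# Configuration names from the VGG paper, from the shallowest
# (A, 11 weight layers) to the deepest (E, 19 weight layers).
VGG_CONFIGS = ("A", "B", "C", "D", "E")


def config_for_epoch(epoch, pretrain_interval=50, configs=VGG_CONFIGS):
    """Configuration to train at a zero-based `epoch`: move one step deeper
    every `pretrain_interval` epochs and stay on the deepest one afterwards."""
    stage = min(epoch // pretrain_interval, len(configs) - 1)
    return configs[stage]


for epoch in (0, 49, 50, 120, 400):
    print(epoch, config_for_epoch(epoch))
```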

In [61]:
iplot(plot_training_stats("/DATA/Dropbox/LOOTRPV/Personal_programming/MachineLearning/Tutorials/ImageRecognition/NNs/VGGNet_logs/2_pretrainA.log"))

Not much better. Let's employ the optimizer used by the authors as well. Training is now optimized using tf.train.MomentumOptimizer with initial learning_rate=0.01 and momentum=0.9.
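
For reference, the update applied by a momentum optimizer can be written out for a single scalar weight. This is a plain-Python sketch of the accumulate-then-step rule as documented for tf.train.MomentumOptimizer (without the Nesterov variant), not a replacement for it; the function name and the toy loss are illustrative.

```python
def momentum_step(weight, grad, accum, learning_rate=0.01, momentum=0.9):
    """One momentum update: accumulate the gradient, then step along the accumulator."""
    accum = momentum * accum + grad
    weight = weight - learning_rate * accum
    return weight, accum


# Toy example: minimise f(w) = w**2, whose gradient is 2 * w.
weight, accum = 1.0, 0.0
for step in range(3):
    weight, accum = momentum_step(weight, 2.0 * weight, accum)
    print(step, weight, accum)
```

The accumulator keeps a decaying sum of past gradients, so consecutive steps in the same direction grow larger than they would with plain gradient descent at the same learning rate.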

Note that our images are still grayscale, and we still use a simplified downscale=8 network.

In [73]:
iplot(plot_training_stats("/DATA/Dropbox/LOOTRPV/Personal_programming/MachineLearning/Tutorials/ImageRecognition/NNs/VGGNet_logs/3_momentum.log"))

Well, how about making the network larger? For downscale=4:

In [84]:
iplot(plot_training_stats("/DATA/Dropbox/LOOTRPV/Personal_programming/MachineLearning/Tutorials/ImageRecognition/NNs/VGGNet_logs/4_downscale4.log"))

Hmm, the small convolutional filters of the network provide regularization and prevent memorizing images. Perhaps we need more images? Let's turn on image augmentation and run for a longer time.
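
Which augmentations are applied is not spelled out here. As a minimal example of the idea, the sketch below implements a random horizontal flip on a grayscale image stored as a list of rows; both the choice of transformation and the function name are assumptions.

```python
import random


def random_horizontal_flip(image, rng, probability=0.5):
    """Mirror a row-major grayscale image left to right with the given probability.
    A new image is returned; the input is left untouched."""
    if rng.random() < probability:
        return [list(reversed(row)) for row in image]
    return [list(row) for row in image]


rng = random.Random(0)
tiny_image = [[0, 1, 2],
              [3, 4, 5]]
print(random_horizontal_flip(tiny_image, rng, probability=1.0))
```

Because a fresh random decision is made every time an image is drawn, the network sees a slightly different training set in each epoch.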

In [91]:
iplot(plot_training_stats("/DATA/Dropbox/LOOTRPV/Personal_programming/MachineLearning/Tutorials/ImageRecognition/NNs/VGGNet_logs/5_augmentation.log"))

No luck. Perhaps the full network is needed after all. Since the GPU memory on my laptop is not enough to go below downscale=4, let's use an Azure (azure.com) virtual machine to train a full VGG network.

In [89]:
iplot(plot_training_stats("/DATA/Dropbox/LOOTRPV/Personal_programming/MachineLearning/Tutorials/ImageRecognition/workspace/VGGNet_zero.log"))