In [1]:
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)

from cv_vis import plot_training_stats

***AlexNet***

This is a brief analysis of the general behaviour of a simplified implementation (see NNs/AlexNet.py, with runtime parameters in globals.py and cv.py) of the image recognition network AlexNet (Krizhevsky, Sutskever, and Hinton 2012).

The original AlexNet is a convolutional neural network with five convolutional layers, three of which are followed by max pooling, and three fully connected layers, the last of which feeds a 1000-way softmax. For the full structure, see Fig. 2 of the original paper (reproduced here). It is best known for being the first deep neural network to outperform traditional machine learning tools in the ImageNet challenge (image-net.org), in 2012.

In the following example, we classify two ImageNet classes, cats vs. dogs, with 1127 and 791 pictures, respectively.

Before being fed to the network, the images were preprocessed with preprocess.py:
 - the 'No image available' downloads were discarded,
 - each image was converted to grayscale,
 - and it was resized to 256x256 (set in globals.py), with padding used to preserve the original aspect ratio.
No further standardization of the color scale was performed. Note that the preprocessing was done with TensorFlow, so it can use the GPU when available (in the present version, however, only on an image-by-image basis...).
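The resize-with-padding step above can be sketched in plain Python. This is only an illustration of the geometry, not the code in preprocess.py: the function names `letterbox_geometry` and `to_gray` are hypothetical, and the grayscale weights are the common ITU-R BT.601 ones, which may differ slightly from what the actual script uses.

```python
def letterbox_geometry(height, width, target=256):
    """Return the resized shape and padding that fit an image into a square.

    The longer side is scaled to `target`; the shorter side is scaled by the
    same factor and then padded symmetrically, so the aspect ratio is kept.
    """
    if height <= 0 or width <= 0:
        raise ValueError("image dimensions must be positive")
    longest = max(height, width)
    # Scale both sides by target / longest, rounding to whole pixels.
    new_h = max(1, round(height * target / longest))
    new_w = max(1, round(width * target / longest))
    pad_h, pad_w = target - new_h, target - new_w
    # Split the padding as evenly as possible between the two sides.
    top, left = pad_h // 2, pad_w // 2
    return {
        "resized": (new_h, new_w),
        "pad_top": top,
        "pad_bottom": pad_h - top,
        "pad_left": left,
        "pad_right": pad_w - left,
    }


def to_gray(r, g, b):
    """Luminance of one RGB pixel (ASSUMPTION: ITU-R BT.601 weights)."""
    return 0.299 * r + 0.587 * g + 0.114 * b


# A 100x200 (height x width) image is scaled to 128x256 and padded
# with 64 rows above and 64 rows below.
geometry = letterbox_geometry(100, 200)
```

Padding instead of stretching keeps cats and dogs from being distorted, at the cost of some constant-valued border pixels that carry no information.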
 
To make the problem tractable on a laptop and to limit overfitting (after all, we are only classifying two classes), all dimensions of the original AlexNet were divided by the downscale parameter.
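To get a feeling for what the downscale parameter does, here is a small sketch that divides the layer widths of the original network by it. The filter and unit counts are those of the original paper as far as I recall them; the layers in NNs/AlexNet.py may differ, and the helper `scaled_widths` is a hypothetical name used only for this illustration.

```python
# Layer widths of the original AlexNet (Krizhevsky et al. 2012): filters in
# the five convolutional layers and units in the two hidden dense layers.
# ASSUMPTION: NNs/AlexNet.py may use a different set of layers.
ORIGINAL_CONV_FILTERS = [96, 256, 384, 384, 256]
ORIGINAL_DENSE_UNITS = [4096, 4096]


def scaled_widths(downscale):
    """Divide every layer width by `downscale`, keeping at least one unit."""
    if downscale < 1:
        raise ValueError("downscale must be >= 1")
    return {
        "conv": [max(1, n // downscale) for n in ORIGINAL_CONV_FILTERS],
        "dense": [max(1, n // downscale) for n in ORIGINAL_DENSE_UNITS],
    }


# Compare the three settings used in this notebook.
for downscale in (16, 8, 4):
    print(downscale, scaled_widths(downscale))
```

Since the number of weights in a layer scales with the product of its input and output widths, halving downscale roughly quadruples the number of parameters.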

Let's start with a bare version of AlexNet: no dropout, downscale=16. The network is trained with Adam for 50 epochs:

In [2]:
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)

from cv_vis import plot_training_stats
iplot(plot_training_stats("/DATA/Dropbox/LOOTRPV/Personal_programming/MachineLearning/Tutorials/ImageRecognition/NNs/AlexNet_logs/0.log"))

Hmm... This doesn't look too good. It seems that our network settled for simply guessing the same answer (cats) every time. Part of the problem is that our sample is imbalanced (it contains more cat pictures: where are you, dog people?! ;) ). This can be fixed by changing the cv_io.py script to draw the same number of images from each class at each training epoch. This behaviour is turned on by the balance_sample switch.
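The idea behind balanced sampling can be sketched as follows. This is not the code in cv_io.py; the function names and the file-name lists are made up for the illustration. The first helper also shows what accuracy a network gets for free by always answering "cat" on the imbalanced sample.

```python
import random


def majority_baseline(n_cats, n_dogs):
    """Accuracy of a classifier that always guesses the larger class."""
    return max(n_cats, n_dogs) / (n_cats + n_dogs)


def balanced_epoch_sample(cats, dogs, rng):
    """Draw the same number of items from each class for one epoch.

    Returns a shuffled list of (item, label) pairs, label 0 = cat, 1 = dog.
    """
    n = min(len(cats), len(dogs))
    sample = [(item, 0) for item in rng.sample(cats, n)]
    sample += [(item, 1) for item in rng.sample(dogs, n)]
    rng.shuffle(sample)
    return sample


# Hypothetical file names standing in for the 1127 cat and 791 dog images.
cats = ["cat_%04d" % i for i in range(1127)]
dogs = ["dog_%04d" % i for i in range(791)]

rng = random.Random(0)
epoch = balanced_epoch_sample(cats, dogs, rng)
print(len(epoch), round(majority_baseline(1127, 791), 3))
```

With a balanced draw, always guessing one class yields 50% accuracy, so any lasting improvement over a coin toss has to come from the images themselves.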

In [3]:
iplot(plot_training_stats("/DATA/Dropbox/LOOTRPV/Personal_programming/MachineLearning/Tutorials/ImageRecognition/NNs/AlexNet_logs/1_balanced_sample.log"))

That doesn't look so good either. The first peak of 100% accuracy is probably overfitting: the network memorized the images before the shuffle seed changed and another batch was drawn. After that initial peak, the accuracy quickly degraded towards a coin toss.

We know that dropout can help against overfitting. Let's turn it on in our network. The original AlexNet paper used dropout of 0.5 before each of the two dense layers. Let's reproduce this approach here.
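For readers who have not seen it spelled out, here is a minimal sketch of what dropout does to a layer's activations. It implements the "inverted" variant, which rescales the surviving units during training; the original paper instead scaled the outputs at test time, and the network in NNs/AlexNet.py presumably relies on the TensorFlow op rather than anything like this hypothetical helper.

```python
import random


def dropout(activations, rate, rng, training=True):
    """Inverted dropout on a flat list of activations.

    During training each unit is zeroed with probability `rate`, and the
    survivors are scaled by 1 / (1 - rate) so the expected sum is unchanged.
    At evaluation time the activations pass through untouched.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    keep_scale = 1.0 / (1.0 - rate)
    return [a * keep_scale if rng.random() >= rate else 0.0
            for a in activations]


rng = random.Random(1)
dropped = dropout([1.0] * 1000, rate=0.5, rng=rng)
# Roughly half of the units are zeroed, the rest are doubled.
print(dropped.count(0.0), dropped.count(2.0))
```

Because a different random subset of units is silenced at every step, the network cannot rely on any single unit to memorize a training image.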

In [4]:
iplot(plot_training_stats("/DATA/Dropbox/LOOTRPV/Personal_programming/MachineLearning/Tutorials/ImageRecognition/NNs/AlexNet_logs/2_dropout.log"))

Perhaps we're doing *slightly* better than previously, but the accuracy still asymptotes to a pure guess.

There is one more thing we should try. Our dataset is still fairly small, so it is possible that we need more diversity for training. Let's explore whether we can teach AlexNet to recognize cats and dogs on an augmented dataset.
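Augmentation here means generating modified copies of the training images. The sketch below shows two common choices, a random crop and a horizontal flip, on images stored as nested lists. Which transformations were actually used for the run below is not stated in this notebook, so treat both the choice of transformations and the function names as assumptions.

```python
import random


def flip_horizontal(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]


def random_crop(image, crop_h, crop_w, rng):
    """Cut a crop_h x crop_w window at a random position."""
    height, width = len(image), len(image[0])
    if crop_h > height or crop_w > width:
        raise ValueError("crop is larger than the image")
    top = rng.randint(0, height - crop_h)
    left = rng.randint(0, width - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def augment(image, crop_h, crop_w, rng):
    """Random crop followed by a horizontal flip with probability 0.5."""
    out = random_crop(image, crop_h, crop_w, rng)
    if rng.random() < 0.5:
        out = flip_horizontal(out)
    return out


# A tiny 4x4 "image" whose pixel values encode their position.
image = [[10 * r + c for c in range(4)] for r in range(4)]
augmented = augment(image, 2, 2, random.Random(0))
```

A mirrored cat is still a cat, so such label-preserving transformations enlarge the effective training set without any new downloads.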

In [5]:
iplot(plot_training_stats("/DATA/Dropbox/LOOTRPV/Personal_programming/MachineLearning/Tutorials/ImageRecognition/NNs/AlexNet_logs/3_augmented.log"))

Still not so great. Well, let's try a bigger network; perhaps ours does not have enough capacity after all. With downscale=8:

In [6]:
iplot(plot_training_stats("/DATA/Dropbox/LOOTRPV/Personal_programming/MachineLearning/Tutorials/ImageRecognition/NNs/AlexNet_logs/4_downscale8.log"))

Hmm.. How about downscale=4?

In [7]:
iplot(plot_training_stats("/DATA/Dropbox/LOOTRPV/Personal_programming/MachineLearning/Tutorials/ImageRecognition/NNs/AlexNet_logs/5_downscale4.log"))

Note that the accuracy is actually pretty good initially, then decays to a pure guess, and then rises slowly. Perhaps training is too aggressive? Let's change the learning rate from the default (1e-3) to 1e-4.
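To see why the learning rate matters so directly for Adam, here is a sketch of the textbook update rule (Kingma and Ba) for a single scalar parameter. It is not the TensorFlow implementation, which differs in small details, and the helper names are hypothetical. The point it illustrates: on the first step, the parameter moves by roughly the learning rate, whatever the size of the gradient.

```python
import math


def adam_step(param, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One textbook Adam update for a scalar parameter.

    `state` holds the step counter and the running averages of the gradient
    ("m") and of the squared gradient ("v"); it is updated in place.
    """
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    # Bias correction compensates for the averages starting at zero.
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)


def first_step_size(grad, lr):
    """Distance moved by the very first Adam step from param = 0."""
    state = {"t": 0, "m": 0.0, "v": 0.0}
    return abs(adam_step(0.0, grad, state, lr=lr))


# The first step is about `lr` for both a large and a small gradient.
print(first_step_size(10.0, 1e-3), first_step_size(0.01, 1e-3))
print(first_step_size(10.0, 1e-4))
```

Lowering the rate from 1e-3 to 1e-4 therefore shrinks the typical step by about a factor of ten, which is the "less aggressive" training tried next.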

In [8]:
iplot(plot_training_stats("/DATA/Dropbox/LOOTRPV/Personal_programming/MachineLearning/Tutorials/ImageRecognition/NNs/AlexNet_logs/6_leaningRate1e-4.log"))

Well, this doesn't seem to be the way. Let's try to go back to the old learning rate and train longer...

In [9]:
iplot(plot_training_stats("/DATA/Dropbox/LOOTRPV/Personal_programming/MachineLearning/Tutorials/ImageRecognition/NNs/AlexNet_logs/7_long.log"))

The loss is still diverging exponentially... Should we try another optimizer? How about the simplest choice, tf.train.GradientDescentOptimizer:

In [13]:
iplot(plot_training_stats("/DATA/Dropbox/LOOTRPV/Personal_programming/MachineLearning/Tutorials/ImageRecognition/workspace/AlexNet_zero.log"))
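As a reference for the optimizer used in the last cell, the update rule behind plain gradient descent fits in a few lines. The sketch below runs it on a toy one-dimensional loss, not on the network from this notebook, and all names in it are made up for the illustration. It shows how the same rule either converges or diverges exponentially depending only on the learning rate.

```python
def gradient_descent(grad_fn, start, lr, steps):
    """Plain gradient descent: w <- w - lr * grad(w), repeated `steps` times.

    Returns the full history of parameter values, starting with `start`.
    """
    w = start
    history = [w]
    for _ in range(steps):
        w = w - lr * grad_fn(w)
        history.append(w)
    return history


def grad(w):
    """Gradient of the toy loss L(w) = w**2, which has its minimum at 0."""
    return 2.0 * w


# Each step multiplies w by (1 - 2 * lr): 0.8 for lr=0.1, -1.2 for lr=1.1.
converging = gradient_descent(grad, start=1.0, lr=0.1, steps=50)
diverging = gradient_descent(grad, start=1.0, lr=1.1, steps=50)
print(abs(converging[-1]), abs(diverging[-1]))
```

Unlike Adam, plain gradient descent has no per-parameter scaling, so a single learning rate has to suit every weight in the network.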