In [2]:
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)

from cv_vis import plot_training_stats

***AlexNet***

This is a simple analysis of general features of a simplified implementation (see NNs/AlexNet.py and runtime parameters in globals.py and cv.py) of the image recognition network AlexNet (Krizhevsky, Sutskever, and Hinton 2012).

The simplified version used here is a convolutional neural network with four max-pooled convolutional layers followed by two dense layers. The original AlexNet is somewhat larger: five convolutional layers, three of them followed by max pooling, and three fully connected layers (see Fig. 2 of the original paper, reproduced here). It is best known for being the first deep neural network to outperform traditional machine learning tools in the ImageNet challenge (image-net.org), in 2012.

In the following example, we will be classifying two classes: cats vs dogs from ImageNet, totalling 1127 and 791 pictures, respectively.

Before being fed to the network, the images were preprocessed with preprocess.py:
 - the 'No image available' downloads were discarded,
 - each image was converted to grayscale,
 - and each image was resized to 256x256 (set in globals.py), with padding used to preserve the original aspect ratio.

No further standardization of the pixel values was performed. Note that preprocessing was performed using TensorFlow, and so can use GPU processing where available (in the present version, however, only on an image-by-image basis).
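
The geometry of the padded resize can be sketched as follows. This is only an illustration, assuming the image is scaled so that its longer side fits the target size and the remainder is padded symmetrically; `padded_resize_geometry` is a hypothetical helper and not a function from preprocess.py. TensorFlow provides `tf.image.resize_with_pad` for this kind of operation, but whether preprocess.py uses it is not stated here.

```python
def padded_resize_geometry(width, height, target=256):
    """Return (new_width, new_height, pad_left, pad_top) for an aspect-preserving fit.

    Hypothetical helper: the longer side is scaled to `target`, the shorter
    side is scaled by the same factor, and the leftover space is split evenly.
    """
    longer = max(width, height)
    new_w = max(1, round(width * target / longer))
    new_h = max(1, round(height * target / longer))
    pad_left = (target - new_w) // 2
    pad_top = (target - new_h) // 2
    return new_w, new_h, pad_left, pad_top


# A 500x375 landscape image keeps its 4:3 shape inside the 256x256 canvas.
print(padded_resize_geometry(500, 375))
```

For a 500x375 input this gives a 256x192 image with 32 rows of padding above (and 32 below), so nothing is stretched.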
 
To make the problem tractable on a laptop and to avoid overfitting (after all, we are only classifying two classes), all dimensions of the original AlexNet were divided by the downscale parameter.
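
To get a feeling for what the downscale parameter does, here is a small sketch that divides the layer widths of the original AlexNet (filter counts and dense units as given by Krizhevsky et al. 2012) by a few downscale values. It is an assumption that only layer widths are scaled and that integer division is used; the simplified network in NNs/AlexNet.py may use a different set of layers, and `scale_widths` is a hypothetical helper.

```python
# Layer widths of the original AlexNet (Krizhevsky et al. 2012).
ORIGINAL_CONV_FILTERS = [96, 256, 384, 384, 256]
ORIGINAL_DENSE_UNITS = [4096, 4096]


def scale_widths(widths, downscale):
    """Divide each layer width by `downscale`, keeping at least one unit per layer."""
    return [max(1, w // downscale) for w in widths]


for downscale in (1, 4, 8, 16, 32):
    print(
        downscale,
        scale_widths(ORIGINAL_CONV_FILTERS, downscale),
        scale_widths(ORIGINAL_DENSE_UNITS, downscale),
    )
```

Under these assumptions, downscale=16 leaves only 6 filters in the first convolutional layer and 256 units in each dense layer, which is what makes training on a laptop feasible.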

Let's start with a bare version of AlexNet. No dropout is used, and downscale=16. The network is trained with Adam for 50 epochs on batches of 256 images each:

In [2]:
iplot(plot_training_stats("/DATA/Dropbox/LOOTRPV/Personal_programming/MachineLearning/Tutorials/ImageRecognition/NNs/AlexNet_logs/0.log"))

Not too bad for a first try. However, we can see that while the training diagnostics improve, the validation accuracy stays essentially at the level of random guessing. This is a clear sign of overfitting -- the network is memorizing the training set. This was noted to be a problem in AlexNet by Krizhevsky, Sutskever, and Hinton (2012). As a remedy, they use a dropout probability of 0.5 during training. Let's reproduce that approach and see if it helps!
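
For readers new to dropout, a minimal sketch of the idea follows. It uses the "inverted" variant common in current libraries, where surviving activations are rescaled during training; the original paper instead multiplies outputs by 0.5 at test time. The `dropout` function is a hypothetical illustration on plain Python lists, not the layer used in NNs/AlexNet.py.

```python
import random


def dropout(activations, rate=0.5, training=True, rng=random):
    """Inverted dropout on a list of floats.

    During training each activation is zeroed with probability `rate` and the
    survivors are divided by (1 - rate), so the expected value is unchanged.
    At inference time the input is returned as is.
    """
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
print(dropout([1.0, 2.0, 3.0, 4.0], rate=0.5, rng=rng))
print(dropout([1.0, 2.0, 3.0, 4.0], training=False))
```

Because a different random subset of units is silenced on every batch, no single unit can rely on the presence of another, which makes memorizing individual training images harder.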

In [3]:
iplot(plot_training_stats("/DATA/Dropbox/LOOTRPV/Personal_programming/MachineLearning/Tutorials/ImageRecognition/NNs/AlexNet_logs/1_dropout.log"))

This is much better: we now see some genuine learning! However, some amount of overfitting is definitely still present. Perhaps our model is still too complex for only two classes. Let's try down-scaling AlexNet by another factor of 2: downscale=32.

In [4]:
iplot(plot_training_stats("/DATA/Dropbox/LOOTRPV/Personal_programming/MachineLearning/Tutorials/ImageRecognition/NNs/AlexNet_logs/2_downscale32.log"))

That's worse. Let's stay with downscale=16 for now; we will try other things to limit overfitting later on.

In the meantime, notice that our sample is rather imbalanced (it contains more cat pictures: where are you, dog people?! ;) ). This can be fixed by drawing the same number of images from each class, which is turned on by the balance_sample switch. Let's see if it improves the results!
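
Balancing by subsampling can be sketched like this. The function name `balance_sample` mirrors the switch mentioned above, but the implementation and the file names are hypothetical; the actual code behind the switch may differ (for example, it could redraw the subsample every epoch).

```python
import random


def balance_sample(items_by_class, seed=0):
    """Draw the same number of items from every class.

    The per-class count is the size of the smallest class, and items are
    drawn without replacement.
    """
    rng = random.Random(seed)
    n = min(len(items) for items in items_by_class.values())
    return {label: rng.sample(items, n) for label, items in items_by_class.items()}


# Placeholder file names with the class sizes quoted in the text.
data = {
    "cat": [f"cat_{i}.jpg" for i in range(1127)],
    "dog": [f"dog_{i}.jpg" for i in range(791)],
}
balanced = balance_sample(data)
print({label: len(items) for label, items in balanced.items()})
```

With the class sizes used here, this keeps all 791 dog pictures and a random 791 of the 1127 cat pictures.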

In [5]:
iplot(plot_training_stats("/DATA/Dropbox/LOOTRPV/Personal_programming/MachineLearning/Tutorials/ImageRecognition/NNs/AlexNet_logs/3_balanced.log"))

This seems slightly worse, but the problem is now a bit more complex. After all, a network trained on a biased sample gains additional accuracy just by assuming everything unknown is a cat. Therefore, let us keep the balanced sample.
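
The size of that "free" accuracy is easy to work out from the class counts given earlier. The snippet below is just this arithmetic:

```python
n_cat, n_dog = 1127, 791

# Accuracy of a classifier that always answers "cat" on the imbalanced sample.
majority_baseline = max(n_cat, n_dog) / (n_cat + n_dog)
# With equal class sizes the same strategy is right half of the time.
balanced_baseline = 0.5

print(f"always-cat accuracy, imbalanced sample: {majority_baseline:.3f}")
print(f"always-cat accuracy, balanced sample:   {balanced_baseline:.3f}")
```

So on the imbalanced sample, roughly 59% accuracy can be reached without learning anything about the images, compared to 50% on the balanced one.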

There are other ways we can improve the data we train on. Our dataset is still fairly small. Let's explore whether an augmented dataset (for augmentation details, see cv_augment.py) improves the network performance. In addition to producing an effectively larger sample, augmentation should also prevent memorization, so we can limit overfitting as well!

To go a bit easier on the network, let's turn augmentation on in stages. First, we only zoom and crop the image by a factor of 0.9...
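
One possible reading of "zoom and crop by a factor of 0.9" is sketched below: pick a random window covering 90% of each side, which would then be resized back to the network input size. This interpretation and the helper `random_crop_box` are assumptions; see cv_augment.py for what is actually done.

```python
import random


def random_crop_box(width, height, factor=0.9, rng=random):
    """Pick a random window covering `factor` of each side.

    Returns (left, top, right, bottom) in pixel coordinates, with right and
    bottom exclusive.
    """
    crop_w = int(width * factor)
    crop_h = int(height * factor)
    left = rng.randint(0, width - crop_w)  # randint includes both endpoints
    top = rng.randint(0, height - crop_h)
    return left, top, left + crop_w, top + crop_h


rng = random.Random(0)  # fixed seed so the sketch is repeatable
left, top, right, bottom = random_crop_box(256, 256, rng=rng)
print((left, top, right, bottom), "window size:", right - left, "x", bottom - top)
```

For a 256x256 image the window is 230x230 pixels, so the subject shifts by up to 26 pixels and appears slightly enlarged once the crop is scaled back up.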

In [6]:
iplot(plot_training_stats("/DATA/Dropbox/LOOTRPV/Personal_programming/MachineLearning/Tutorials/ImageRecognition/NNs/AlexNet_logs/4_augmented_zoom.log"))

In [7]:
iplot(plot_training_stats("/DATA/Dropbox/LOOTRPV/Personal_programming/MachineLearning/Tutorials/ImageRecognition/NNs/AlexNet_logs/4_augmented_zoom-rotate.log"))

In [8]:
iplot(plot_training_stats("/DATA/Dropbox/LOOTRPV/Personal_programming/MachineLearning/Tutorials/ImageRecognition/NNs/AlexNet_logs/4_augmented_zoom-rotate-flip.log"))

In [9]:
iplot(plot_training_stats("/DATA/Dropbox/LOOTRPV/Personal_programming/MachineLearning/Tutorials/ImageRecognition/NNs/AlexNet_logs/4_augmented_zoom-rotate-flip-color.log"))

At first, we still have a fair amount of overfitting. As additional augmentation components are added, the training accuracy converges with the validation accuracy, which suggests that overfitting has been brought under control. In the final case, validation accuracy is actually much better than training accuracy. This is because the validation set has not been augmented and is thus easier to recognize (it is easier to see a dog in a picture that is not, e.g., rotated side-on).

Still, an accuracy of 60-70% is far from satisfactory. Since overfitting does not seem to be a problem anymore, let's increase the size of the network and see if it can do better. The training approach remains the same: we add another augmentation component every 50 epochs.
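
The staged schedule can be written down compactly. In this sketch the component names and their order are inferred from the log file names above (zoom, rotate, flip, color); `active_augmentations` is a hypothetical helper and not part of cv_augment.py.

```python
# Order inferred from the log file names: zoom, then rotate, flip, color.
AUGMENTATION_STAGES = ["zoom", "rotate", "flip", "color"]


def active_augmentations(epoch, epochs_per_stage=50):
    """Return the augmentation components switched on at a given 0-based epoch."""
    n_active = min(len(AUGMENTATION_STAGES), epoch // epochs_per_stage + 1)
    return AUGMENTATION_STAGES[:n_active]


for epoch in (0, 49, 50, 100, 150, 199):
    print(epoch, active_augmentations(epoch))
```

Epochs 0-49 use only the zoom, epochs 50-99 add rotation, and from epoch 150 onward all four components are active.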

In [10]:
iplot(plot_training_stats("/DATA/Dropbox/LOOTRPV/Personal_programming/MachineLearning/Tutorials/ImageRecognition/NNs/AlexNet_logs/5_downscale8.log"))

A bit better. What about downscale=4?

In [11]:
iplot(plot_training_stats("/DATA/Dropbox/LOOTRPV/Personal_programming/MachineLearning/Tutorials/ImageRecognition/NNs/AlexNet_logs/6_downscale4.log"))

Hmm, that's fairly similar, with less overfitting. However, the accuracy reached is roughly the same. Note that the diagnostics are fluctuating heavily. This may indicate that our learning is too aggressive to be effective. Let's try learning_rate = 0.0001 (default for Adam was 0.001).
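
Why the learning rate matters so directly for Adam can be seen from its update rule: the step is the learning rate times a ratio of moment estimates, so its size is roughly bounded by the learning rate itself. The sketch below computes the very first Adam update for a single parameter, following the standard formulation with the usual default hyperparameters; `adam_first_step` is a hypothetical helper for illustration only.

```python
import math


def adam_first_step(grad, learning_rate=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Size of the first Adam update for a single parameter with gradient `grad`."""
    m = (1 - beta1) * grad          # first-moment estimate after one step
    v = (1 - beta2) * grad * grad   # second-moment estimate after one step
    m_hat = m / (1 - beta1)         # bias correction at step t=1
    v_hat = v / (1 - beta2)
    return learning_rate * m_hat / (math.sqrt(v_hat) + eps)


for grad in (0.01, 1.0, 100.0):
    print(grad, adam_first_step(grad), adam_first_step(grad, learning_rate=0.0001))
```

Regardless of the gradient magnitude, the first step is close to the learning rate, so going from 0.001 to 0.0001 shrinks the updates by about a factor of ten.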

In [16]:
iplot(plot_training_stats("/DATA/Dropbox/LOOTRPV/Personal_programming/MachineLearning/Tutorials/ImageRecognition/NNs/AlexNet_logs/7_lr1e-4.log"))

Not that much better... Ok, let's go back to the default learning rate and try the full power of AlexNet, downscale=1.

In [3]:
iplot(plot_training_stats("/DATA/Dropbox/LOOTRPV/Personal_programming/MachineLearning/Tutorials/ImageRecognition/NNs/AlexNet_logs/8_downscale1.log"))

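Well, it seems that ~70% accuracy is as far as we'll go with this basic AlexNet setup. For reference, Krizhevsky et al. (2012) report top-1 error rates in the 30-40% range for a 1000-class ImageNet subset. The numbers look similar, but the comparison is a loose one: with two balanced classes, random guessing already gives 50% accuracy, whereas with 1000 classes it gives only 0.1%.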

In [30]:
iplot(plot_training_stats("/DATA/Dropbox/LOOTRPV/Personal_programming/MachineLearning/Tutorials/ImageRecognition/workspace/AlexNet_zero.log"))