In [1]:
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)

from cv_vis import plot_training_stats

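import os
curr_path = os.getcwd()
# Locate the NNs directory whether the notebook runs from inside it or from its parent
if os.path.basename(curr_path) == 'NNs':
    NNs_path = curr_path
else:
    NNs_path = os.path.join(curr_path, 'NNs')
print('Path to the NNs directory:', NNs_path)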

Path to the NNs directory: /DATA/Dropbox/LOOTRPV/Personal_programming/MachineLearning/Tutorials/ImageRecognition/NNs


***AlexNet***

This notebook is a brief analysis of the general behavior of a simplified implementation of the image-recognition network AlexNet (Krizhevsky, Sutskever, and Hinton 2012). See NNs/AlexNet.py for the implementation, and globals.py and cv.py for the runtime parameters.

The original AlexNet is a convolutional neural network with five convolutional layers (the first, second, and fifth followed by max pooling) and three fully connected layers. For its full structure, see Fig. 2 of the original paper (reproduced here). It is best known as the first deep neural network to outperform traditional machine-learning methods in the ImageNet challenge (image-net.org), in 2012.

In the following example, we classify two ImageNet classes, cats and dogs, with 1127 and 791 pictures, respectively.
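Because the two classes are imbalanced, accuracy is best read against the majority-class baseline: the accuracy of a trivial classifier that always predicts the more frequent class. A quick check with the counts above (assuming the accuracy curves are computed on a sample with roughly the same class ratio, which may not hold exactly for the validation split):

```python
# Class counts quoted above
n_cats, n_dogs = 1127, 791
n_total = n_cats + n_dogs

# A classifier that always answers "cat" is right this often
majority_baseline = max(n_cats, n_dogs) / n_total
print(f"{n_total} images, majority-class baseline: {majority_baseline:.1%}")
```

So an accuracy near 59% on the unbalanced sample would mean the network has learned essentially nothing.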

Before being fed to the network, the images were preprocessed with preprocess.py:
 - the 'No image available' placeholder downloads were discarded,
 - each image was converted to grayscale,
 - and it was resized to 256x256 (set in globals.py), with padding used to preserve the original aspect ratio.

No further standardization of the intensity scale was performed. Note that preprocessing uses TensorFlow, and so can run on a GPU when one is available (in the present version, however, only image by image).
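The aspect-preserving resize amounts to scaling the longer side to 256 pixels and padding the shorter side symmetrically. TensorFlow provides `tf.image.rgb_to_grayscale` and `tf.image.resize_with_pad` for these two steps, though this sketch does not claim that preprocess.py calls them, and its rounding may differ. The helper `pad_resize_geometry` below is hypothetical and only computes the geometry in plain Python:

```python
def pad_resize_geometry(height, width, target=256):
    """Scale, resized size and (before, after) padding for a letterbox resize.

    The image is scaled so its longer side equals `target`; the shorter side
    is then padded symmetrically up to `target`.
    """
    scale = target / max(height, width)
    new_h = round(height * scale)
    new_w = round(width * scale)
    pad_top = (target - new_h) // 2
    pad_left = (target - new_w) // 2
    pad_y = (pad_top, target - new_h - pad_top)
    pad_x = (pad_left, target - new_w - pad_left)
    return scale, (new_h, new_w), pad_y, pad_x

# A typical landscape photo, 375 rows by 500 columns
scale, size, pad_y, pad_x = pad_resize_geometry(375, 500)
print(scale, size, pad_y, pad_x)
```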
 
To make the problem tractable on a laptop and to avoid overfitting (after all, we are only classifying two classes), all dimensions of the original AlexNet were divided by the downscale parameter.
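As an illustration of what downscale does, here is a minimal sketch assuming it divides only the layer widths (filter counts of the five convolutional layers and units of the two hidden dense layers of the original AlexNet), leaving kernel sizes, strides and the two-class output layer unchanged. NNs/AlexNet.py may scale things differently, and `downscaled_widths` is a hypothetical helper:

```python
# Layer widths of the original AlexNet (Krizhevsky et al. 2012)
CONV_FILTERS = [96, 256, 384, 384, 256]
DENSE_UNITS = [4096, 4096]

def downscaled_widths(downscale):
    """Divide every layer width by `downscale`, keeping at least one unit."""
    conv = [max(1, f // downscale) for f in CONV_FILTERS]
    dense = [max(1, u // downscale) for u in DENSE_UNITS]
    return conv, dense

for d in (16, 8, 4, 2, 1):
    conv, dense = downscaled_widths(d)
    print(f"downscale={d:2d}  conv={conv}  dense={dense}")
```

Since most weights connect two downscaled layers, their number shrinks roughly as 1/downscale², so downscale=16 has about 256 times fewer weights in those layers than the full network.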

Let's start with a bare version of AlexNet: no dropout and downscale=16. The network is trained with Adam for 50 epochs on batches of 256 images each:

In [2]:
iplot(plot_training_stats(NNs_path + "/AlexNet_logs/0.log"))

Not too bad for a first try, and we don't even see overfitting yet. Notice that our sample is rather imbalanced (it contains more cat pictures: where are you, dog people?! ;) ). This can be fixed by drawing the same number of images from each class, which is enabled by the balance_sample switch. Let's see if it improves the results!
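One simple way to draw the same number of images from each class is to randomly undersample the larger class down to the size of the smaller one. This is a sketch under that assumption: the actual balance_sample logic in this code may instead redraw per epoch or per batch, or oversample, and `balanced_indices` is a hypothetical name:

```python
import random

def balanced_indices(labels, seed=None):
    """Indices containing the same number of examples from every class.

    Larger classes are randomly undersampled to the size of the smallest one.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    chosen = []
    for idx in by_class.values():
        chosen.extend(rng.sample(idx, n_min))
    rng.shuffle(chosen)
    return chosen

labels = ["cat"] * 1127 + ["dog"] * 791
idx = balanced_indices(labels, seed=0)
print(len(idx), sum(labels[i] == "cat" for i in idx))
```

Note that a single undersampled draw leaves out about 30% of the cat pictures, which makes the already small training set even smaller; redrawing the subset every epoch would let all images contribute over time.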

In [3]:
iplot(plot_training_stats(NNs_path + "/AlexNet_logs/1_balanced.log"))

Hmm, the network seems to struggle to learn in this setup. Let's enlarge it to increase its learning capacity. Perhaps downscale=4 will work better?

In [4]:
iplot(plot_training_stats(NNs_path + "/AlexNet_logs/2_downscale4.log"))

That's better! However, while the training diagnostics continue to improve, the validation diagnostics saturate. This is a clear sign of overfitting: the network is memorizing the training set. Krizhevsky, Sutskever, and Hinton (2012) also noted overfitting as a problem for AlexNet, and as a remedy applied dropout with probability 0.5 in the first two fully connected layers during training. Let's reproduce that approach and see if it helps!
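For reference, here is a minimal sketch of dropout on a list of activations. It uses the "inverted" variant, which scales surviving units by 1/(1 - p) during training so nothing changes at inference; this is what `tf.keras.layers.Dropout` does, whereas the original paper instead multiplied outputs by 0.5 at test time. Whether NNs/AlexNet.py uses the Keras layer is not shown here, and the helper below is illustrative:

```python
import random

def dropout(activations, p=0.5, training=True, rng=random):
    """Inverted dropout: zero each unit with probability p during training.

    Surviving units are scaled by 1 / (1 - p), so the expected activation is
    unchanged and the layer passes inputs through untouched at inference.
    """
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(42)
out = dropout([1.0] * 10_000, p=0.5, rng=rng)
print(sum(out) / len(out))  # close to 1.0
```

Because a different random half of the units is silenced at every step, no unit can rely on specific partners, which makes memorizing individual training images harder.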

In [5]:
iplot(plot_training_stats(NNs_path + "/AlexNet_logs/3_dropout.log"))

This is much better: we now see some genuine learning! However, a large amount of overfitting is definitely still present. Perhaps our model is still too complex for only two classes. Let's try shrinking AlexNet by another factor of 2: downscale=8.

In [6]:
iplot(plot_training_stats(NNs_path + "/AlexNet_logs/4_downscale8.log"))

Hmm, it's just as good. However, it took a couple of trials before the network started training; apparently, the random initial state matters more for a smaller network. Therefore, let's stick with downscale=4.

Our dataset is still fairly small. Let's explore whether augmenting the dataset (for augmentation details, see cv_augment.py) improves the network's performance. In addition to producing an effectively larger sample, augmentation should also hinder memorization, so it can limit overfitting as well!

To go a bit easier on the network, let's turn augmentation on in stages. First, we only zoom and crop the image by a factor of 0.9...
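One reading of "zoom and crop by a factor of 0.9" is: cut a random square covering 90% of each side, then resize it back to 256x256, which zooms in slightly. A sketch of choosing the crop window under that assumption (cv_augment.py may instead sample the zoom from a range; TensorFlow's `tf.image.random_crop` does the cropping part on tensors, and `random_zoom_crop_box` is hypothetical):

```python
import random

def random_zoom_crop_box(size=256, zoom=0.9, rng=random):
    """Random square crop covering `zoom` of each side of a size x size image.

    Returns (top, left, crop_size); resizing the crop back to `size` zooms in.
    """
    crop = int(size * zoom)
    top = rng.randint(0, size - crop)    # randint is inclusive on both ends
    left = rng.randint(0, size - crop)
    return top, left, crop

print(random_zoom_crop_box(rng=random.Random(0)))
```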

In [7]:
iplot(plot_training_stats(NNs_path + "/AlexNet_logs/5_augmented_zoom.log"))

Then, we add rotation by up to 90 degrees left and right.
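A rotation augmentation can be sketched as drawing an angle uniformly from [-90, 90] degrees and resampling the image about its centre. The version below, a hypothetical plain-Python stand-in for whatever cv_augment.py does, uses nearest-neighbour inverse mapping on a list-of-lists image and fills corners that fall outside the source with a constant:

```python
import math
import random

def rotate_nearest(img, angle_deg, fill=0.0):
    """Rotate a 2-D list-of-lists image about its centre (nearest neighbour).

    With row 0 at the top, a positive angle rotates clockwise on screen.
    Output pixels that map outside the source image are set to `fill`.
    """
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    c, s = math.cos(math.radians(angle_deg)), math.sin(math.radians(angle_deg))
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = x - cx, y - cy
            # Inverse mapping: rotate the output coordinate back by -angle
            sx, sy = cx + c * dx + s * dy, cy - s * dx + c * dy
            ix, iy = round(sx), round(sy)
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = img[iy][ix]
    return out

def random_rotation(img, max_deg=90.0, rng=random):
    """Rotate by an angle drawn uniformly from [-max_deg, max_deg]."""
    return rotate_nearest(img, rng.uniform(-max_deg, max_deg))

print(rotate_nearest([[1, 2], [3, 4]], 90))  # [[3, 1], [4, 2]]: a clockwise quarter turn
```

Mapping each output pixel back into the source, rather than pushing source pixels forward, guarantees that every output pixel receives exactly one value.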

In [8]:
iplot(plot_training_stats(NNs_path + "/AlexNet_logs/5_augmented_zoom-rotate.log"))

In the next step, the image can also be flipped horizontally (with a probability of 50%).
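A left-right mirror keeps the label intact (a mirrored cat is still a cat), which makes it a cheap augmentation. TensorFlow offers `tf.image.random_flip_left_right` for tensors; the hypothetical helper below just shows the idea on a list-of-lists image:

```python
import random

def random_hflip(img, p=0.5, rng=random):
    """Mirror each row (left-right flip) with probability p; otherwise copy."""
    if rng.random() < p:
        return [row[::-1] for row in img]
    return [row[:] for row in img]

print(random_hflip([[1, 2, 3]], p=1.0))  # [[3, 2, 1]]
```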

In [9]:
iplot(plot_training_stats(NNs_path + "/AlexNet_logs/5_augmented_zoom-rotate-flip.log"))

And, finally, we apply random brightness and contrast adjustments to the (grayscale) image.
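The standard forms of these adjustments are a constant brightness shift and a contrast scaling about the image mean, (x - mean) * factor + mean, which is how TensorFlow's `tf.image.adjust_contrast` is documented. The sketch below assumes pixel values in [0, 1] and uses illustrative ranges; the values actually used in cv_augment.py are not given here:

```python
import random

def adjust_brightness_contrast(pixels, delta, factor, lo=0.0, hi=1.0):
    """Scale contrast about the mean by `factor`, shift brightness by `delta`, clip."""
    mean = sum(pixels) / len(pixels)
    out = []
    for p in pixels:
        q = (p - mean) * factor + mean + delta
        out.append(min(hi, max(lo, q)))
    return out

def random_brightness_contrast(pixels, max_delta=0.2, lower=0.8, upper=1.2, rng=random):
    # Ranges are illustrative assumptions, not the values in cv_augment.py
    return adjust_brightness_contrast(
        pixels, rng.uniform(-max_delta, max_delta), rng.uniform(lower, upper))

print(adjust_brightness_contrast([0.2, 0.4, 0.6], delta=0.1, factor=2.0))
```

Working on a flat list keeps the example short; for a 2-D image the same arithmetic applies to every pixel, with the mean taken over the whole image.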

In [10]:
iplot(plot_training_stats(NNs_path + "/AlexNet_logs/5_augmented_zoom-rotate-flip-color.log"))

Some overfitting is still present, but we reach ~75% accuracy, which is not too bad. Let's see whether we can get away with a larger network and whether it gives better results. The training approach remains the same: we add another augmentation component every 50 epochs.
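The staged schedule can be written as a function of the epoch number. This is a hypothetical sketch (cv.py may implement the stages differently), assuming 0-based epochs and that all four components stay on after the last stage begins:

```python
# Augmentation components in the order they were switched on above
STAGES = ["zoom", "rotate", "flip", "color"]

def active_augmentations(epoch, epochs_per_stage=50):
    """Components enabled at a given 0-based epoch."""
    n_active = min(len(STAGES), epoch // epochs_per_stage + 1)
    return STAGES[:n_active]

for epoch in (0, 49, 50, 120, 199, 250):
    print(epoch, active_augmentations(epoch))
```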

For downscale=2:

In [11]:
iplot(plot_training_stats(NNs_path + "/AlexNet_logs/6_downscale2.log"))

Hmm, that's fairly similar, perhaps slightly more accurate on average. Just for fun, let's try the full version of AlexNet, downscale=1.

In [12]:
iplot(plot_training_stats(NNs_path + "/AlexNet_logs/7_downscale1.log"))

It seems that additional training time could still improve the results, so let's give it 100 more epochs.

In [13]:
iplot(plot_training_stats(NNs_path + "/AlexNet_logs/8_downscale1-long.log"))

Nope, now we're just overfitting.

Well, it seems that ~85% accuracy is as far as we'll go with this basic AlexNet setup. This seems consistent with the results of Krizhevsky et al. (2012), who report top-1 error rates in the 30-40% range ***for a 1000-class ImageNet subset***. The performance of AlexNet on our very small sample can likely be improved further, but such tuning is outside the scope of this short summary.

In [11]:
iplot(plot_training_stats(os.path.join(os.path.dirname(NNs_path), "workspace", "AlexNet_zero.log")))