<b> <h2> ImageNet Classification with Deep Convolutional Neural Networks </h2> </b>

<b> Alex Krizhevsky </b>
University of Toronto
kriz@cs.utoronto.ca
<b> Ilya Sutskever </b>
University of Toronto
ilya@cs.utoronto.ca
<b> Geoffrey E. Hinton </b>
University of Toronto
hinton@cs.utoronto.ca
https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf 

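This paper was one of the recommendations from Prof. Andrew in the Deep Learning course, and it was groundbreaking: the network it describes (now known as AlexNet) won the ILSVRC-2012 competition with a top-5 test error of 15.3%, compared with 26.2% for the second-best entry. Classification accuracy on the ImageNet dataset, which contains millions of labeled images belonging to thousands of classes, was one of the main benchmarks for image classification.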


![image.png](attachment:image.png) 

<b> <h3> Abstract </h3> </b> 

<ul> 
<li> 1.2 million high-resolution images in 1000 classes (the ImageNet LSVRC-2010 contest) </li>
<li> A CNN with 60 million parameters and 650,000 neurons </li>
<li> Five convolutional layers, some followed by max-pooling layers, then three fully connected layers and a final 1000-way softmax </li>
<li> Dropout is used to reduce overfitting in the fully connected layers </li>
</ul>

<b> <h3> Introduction </h3> </b>
<ul>
<li>
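Until then, datasets of labeled images were relatively small (on the order of tens of thousands of images), and simple recognition tasks on them could already be solved quite well; for example, the best error rate on the MNIST digit-recognition task was below 0.3%.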
</li>
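<li> Learning about thousands of objects from millions of images requires a model with a large learning capacity. CNNs fit the bill: their capacity can be controlled by varying their depth and breadth. </li> 
<li> The authors wrote a highly optimized GPU implementation of 2D convolution and of the other operations used in training CNNs, and made it publicly available. </li>
<li> The network's size made overfitting a significant problem, even with 1.2 million labeled training examples, so several effective techniques were used to prevent it (described below). </li>
<li> The network took five to six days to train on two GTX 580 3GB GPUs. </li>
</ul>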

<h3> Dataset </h3>
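<ul>
<li> Describes the ImageNet dataset and the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC). ILSVRC uses a subset of ImageNet with roughly 1000 images in each of 1000 categories: about 1.2 million training images, 50,000 validation images, and 150,000 test images. </li>
<li> Two error rates are reported: top-1 and top-5. The top-5 error is the fraction of test images for which the correct label is not among the five labels the model considers most probable. </li>
<li> Images are down-sampled to a fixed resolution of 256x256: the shorter side is rescaled to 256, then the central 256x256 patch is cropped out. The only other preprocessing is subtracting the mean activity over the training set from each pixel. The network itself is fed 224x224x3 patches (often quoted as 227x227x3, since that size makes the first layer's arithmetic work out). </li>
</ul>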
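As an illustration of the two ideas above, here is a minimal NumPy sketch. `top_k_error` computes the top-1 or top-5 error from a matrix of class scores, and `preprocess` mimics the paper's preprocessing (shorter side to 256, central 256x256 crop, subtract the training-set mean). All function names are hypothetical, and the nearest-neighbour resize is a simplification chosen to keep the example dependency-free; the paper does not say which interpolation it used.

```python
import numpy as np

# Hypothetical helper names; illustrative only.


def top_k_error(scores, labels, k):
    """Fraction of examples whose true label is NOT among the k highest-scoring classes.

    scores: (num_examples, num_classes) array of class scores or probabilities
    labels: (num_examples,) array of integer class ids
    """
    top_k = np.argsort(scores, axis=1)[:, -k:]          # indices of the k largest scores per row
    hits = np.any(top_k == labels[:, None], axis=1)     # is the true label among them?
    return 1.0 - hits.mean()


def resize_shorter_side(image, target=256):
    """Nearest-neighbour resize (a simplification) of an (H, W, C) image so min(H, W) == target."""
    h, w = image.shape[:2]
    scale = target / min(h, w)
    new_h, new_w = max(target, round(h * scale)), max(target, round(w * scale))
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    return image[np.ix_(rows, cols)]


def center_crop(image, size=256):
    """Central size x size patch of an (H, W, C) image."""
    h, w = image.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return image[top:top + size, left:left + size]


def preprocess(images):
    """Rescale and centre-crop every image, then subtract the per-pixel training-set mean."""
    crops = np.stack([center_crop(resize_shorter_side(img)) for img in images]).astype(np.float32)
    mean_image = crops.mean(axis=0)  # reuse this same mean for validation/test images
    return crops - mean_image, mean_image
```

Returning `mean_image` matters because, in practice, the mean is computed once on the training set and then subtracted from every validation and test image as well.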

<h3> Architecture </h3> 

![image.png](attachment:image.png)
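To make the layer arithmetic concrete, the sketch below propagates spatial sizes through the five convolutional layers using the standard output-size formula (W - F + 2P)/S + 1 and counts weights plus biases. Kernel sizes, strides and filter counts follow the paper's description, with overlapping 3x3, stride-2 max pooling. The padding values are an assumption, chosen so that the map sizes match the paper's (55, 27, 13, 6), and the sketch ignores the paper's two-GPU split. Function and variable names are hypothetical.

```python
# Hypothetical helper names; illustrative only.

def conv_output_size(size, kernel, stride, padding=0):
    """Spatial output size of a convolution or pooling window: floor((W - F + 2P) / S) + 1."""
    return (size - kernel + 2 * padding) // stride + 1


# (name, kind, kernel, stride, padding, out_channels); pooling keeps the channel count.
# Padding values are assumed so that the sizes match the paper's feature-map sizes.
LAYERS = [
    ("conv1", "conv", 11, 4, 0, 96),
    ("pool1", "pool", 3, 2, 0, None),
    ("conv2", "conv", 5, 1, 2, 256),
    ("pool2", "pool", 3, 2, 0, None),
    ("conv3", "conv", 3, 1, 1, 384),
    ("conv4", "conv", 3, 1, 1, 384),
    ("conv5", "conv", 3, 1, 1, 256),
    ("pool5", "pool", 3, 2, 0, None),
]


def alexnet_shapes(input_size=227, in_channels=3):
    """Return per-layer (name, height, width, channels) and the total parameter count."""
    size, channels, params = input_size, in_channels, 0
    rows = []
    for name, kind, k, s, p, out_ch in LAYERS:
        size = conv_output_size(size, k, s, p)
        if kind == "conv":
            # weights + biases, assuming every kernel sees all input channels (no GPU grouping)
            params += k * k * channels * out_ch + out_ch
            channels = out_ch
        rows.append((name, size, size, channels))
    flat = size * size * channels
    for units in (4096, 4096, 1000):  # the three fully connected layers
        params += flat * units + units
        flat = units
    return rows, params


rows, params = alexnet_shapes()
for row in rows:
    print(row)
print("total parameters:", params)
```

With a 227x227 input this gives a 6x6x256 map (9,216 values) before the first fully connected layer and about 62.4 million parameters. The paper connects the kernels of the second, fourth and fifth convolutional layers only to kernel maps on the same GPU; halving their input channels accordingly brings the total to roughly 61 million, in line with the "60 million parameters" in the abstract. A 224x224 input does not divide evenly under the first layer's 11x11, stride-4 window ((224 - 11)/4 + 1 = 54.25), which is why 227x227 is often quoted instead.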

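<h3> ReLU Nonlinearity </h3> 
<ul>
<li> f(x) = max(0, x) </li>
<li> Deep CNNs with ReLUs train several times faster than equivalent networks with tanh units: in the paper, a four-layer CNN with ReLUs reached a 25% training error rate on CIFAR-10 six times faster than the same network with tanh neurons.
Article references: https://medium.com/machine-intelligence-report/how-do-neural-networks-work-57d1ab5337ce 
https://towardsdatascience.com/activation-functions-neural-networks-1cbd9f8d91d6 </li>
</ul>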

![image.png](attachment:image.png)

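<ul>
<li> Saturating vs. non-saturating nonlinearities: tanh and sigmoid saturate, meaning their gradients approach zero for inputs of large magnitude, which slows down training with gradient descent. ReLU does not saturate for positive inputs, where its gradient is always 1.
Link: https://datascience.stackexchange.com/questions/27665/what-is-saturating-gradient-problem </li>
</ul>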
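A quick way to see what "saturating" means is to evaluate the derivatives of the three nonlinearities at increasingly large inputs. This pure-Python sketch (the helper names are hypothetical) shows the sigmoid and tanh gradients shrinking toward zero while the ReLU gradient stays at 1 for any positive input.

```python
import math

# Hypothetical helper names; illustrative only.


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x)); peaks at 0.25 when x = 0
    s = sigmoid(x)
    return s * (1.0 - s)


def tanh_grad(x):
    # d/dx tanh(x) = 1 - tanh(x)^2; peaks at 1 when x = 0
    return 1.0 - math.tanh(x) ** 2


def relu_grad(x):
    # ReLU is not differentiable at 0; the common convention uses 0 there
    return 1.0 if x > 0 else 0.0


for x in (0.5, 2.0, 5.0, 10.0):
    print(f"x={x:5.1f}  sigmoid'={sigmoid_grad(x):.2e}  "
          f"tanh'={tanh_grad(x):.2e}  relu'={relu_grad(x):.1f}")
```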

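<h3> Local Response Normalization </h3>
Additionally, local response normalization (LRN) is applied after the ReLU nonlinearity in certain layers; the paper reports that it reduces the top-1 and top-5 error rates by 1.4% and 1.2%, respectively.
<ul>
<li> Note: http://cs231n.github.io/convolutional-networks/#norm (these layers have since fallen out of favor, because in practice their contribution has been shown to be minimal) </li>
</ul>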
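The paper defines the normalized activity $b^i_{x,y}$ of kernel $i$ at position $(x, y)$ in terms of the ReLU outputs $a^j_{x,y}$ as

$$b^i_{x,y} = a^i_{x,y} \Big/ \left(k + \alpha \sum_{j=\max(0,\, i-n/2)}^{\min(N-1,\, i+n/2)} \left(a^j_{x,y}\right)^2\right)^{\beta}$$

where $N$ is the number of kernels in the layer and the sum runs over $n$ adjacent kernel maps at the same spatial position, with $k = 2$, $n = 5$, $\alpha = 10^{-4}$ and $\beta = 0.75$. A direct NumPy translation, assuming channels-last activations of shape (height, width, channels), might look like this (the function name is hypothetical):

```python
import numpy as np

# Hypothetical helper name; a direct, unoptimized translation of the LRN formula.


def local_response_norm(a, k=2.0, n=5, alpha=1e-4, beta=0.75):
    """Across-channel LRN on activations `a` of shape (height, width, channels)."""
    N = a.shape[-1]
    squared = a ** 2
    b = np.empty_like(a, dtype=np.float64)
    for i in range(N):
        # window of up to n neighbouring kernel maps, clipped at the first/last channel
        lo, hi = max(0, i - n // 2), min(N - 1, i + n // 2)
        denom = (k + alpha * squared[..., lo:hi + 1].sum(axis=-1)) ** beta
        b[..., i] = a[..., i] / denom
    return b
```

Because $k = 2$ and $\alpha$ is tiny, the layer mostly divides activations by a constant near $2^{0.75}$ and only damps a kernel noticeably when its neighbours respond very strongly at the same position.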

![image.png](attachment:image.png)

<h3> Combatting overfitting </h3> 
<h4> Data Augmentation </h4>
<h5> Method 1 </h5>
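<ul>
<li> Generating image translations and horizontal reflections </li>
<li> Random 224x224 patches (and their horizontal reflections) are extracted from the 256x256 images, which increases the size of the training set by a factor of 2048, although the resulting examples are highly interdependent. At test time, the network averages its predictions over ten patches: the four corner patches, the center patch, and their horizontal reflections. </li>
</ul>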
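A minimal NumPy sketch of both halves of Method 1, assuming images have already been preprocessed to 256x256x3 arrays: `random_crop_and_flip` produces one training example, and `ten_crop` builds the ten test-time patches whose predictions the paper averages. Both function names are hypothetical.

```python
import numpy as np

# Hypothetical helper names; illustrative only.


def random_crop_and_flip(image, crop=224, rng=None):
    """Random crop x crop patch of `image`, mirrored left-right with probability 0.5."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    patch = image[top:top + crop, left:left + crop]
    if rng.random() < 0.5:
        patch = patch[:, ::-1]  # horizontal reflection
    return patch


def ten_crop(image, crop=224):
    """Four corner crops and the centre crop, followed by their horizontal reflections."""
    h, w = image.shape[:2]
    offsets = [
        (0, 0), (0, w - crop), (h - crop, 0), (h - crop, w - crop),  # corners
        ((h - crop) // 2, (w - crop) // 2),                          # centre
    ]
    crops = [image[t:t + crop, l:l + crop] for t, l in offsets]
    return np.stack(crops + [c[:, ::-1] for c in crops])
```

At test time, the model's softmax outputs for the ten patches would be averaged to give a single prediction per image.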
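<h5> Method 2 - Altering the intensities of the RGB channels in training images </h5>
<ul>
<li> PCA is performed on the set of RGB pixel values throughout the training set. To each training image, multiples of the principal components are added, with magnitudes proportional to the corresponding eigenvalues times a random variable drawn from a Gaussian with mean zero and standard deviation 0.1. According to the paper, this reduces the top-1 error rate by over 1%. </li>
</ul>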
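The PCA color augmentation can be sketched in NumPy as follows. The function names are hypothetical, and the sketch fits the PCA on whatever array of images it is given; in the paper it is computed once over the whole ImageNet training set. Following the paper, the random multipliers are drawn once per image, so every pixel of that image receives the same RGB offset.

```python
import numpy as np

# Hypothetical helper names; illustrative only.


def fit_rgb_pca(images):
    """Eigen-decomposition of the 3x3 covariance of RGB values over all training pixels."""
    pixels = images.reshape(-1, 3).astype(np.float64)
    cov = np.cov(pixels, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # columns of eigvecs are the components p_1..p_3
    return eigvals, eigvecs


def pca_color_augment(image, eigvals, eigvecs, std=0.1, rng=None):
    """Add [p1 p2 p3] [a1*l1, a2*l2, a3*l3]^T to every pixel, with a_i ~ N(0, std^2)."""
    rng = np.random.default_rng() if rng is None else rng
    alphas = rng.normal(0.0, std, size=3)   # drawn once per image
    delta = eigvecs @ (alphas * eigvals)    # one RGB offset shared by all pixels
    return image.astype(np.float64) + delta
```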

<h4> Dropout </h4>

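<ul>
<li> Combining the predictions of many different models is a very successful way to reduce test error, but it is too expensive for big networks that already take several days to train. Dropout provides a cheap alternative. </li>
<li> During training, each hidden neuron's output is set to zero with probability 0.5 (the value used here). Dropped-out neurons contribute nothing to the forward pass and do not participate in backpropagation. </li>
<li> Every time an input is presented, the network effectively samples a different architecture, but all of these architectures share weights. At test time, all neurons are used, but their outputs are multiplied by 0.5. </li>
<li> The paper applies dropout in the first two fully connected layers; it remains one of the most widely used regularization techniques. </li>
</ul>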
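A minimal NumPy sketch of dropout as the paper describes it: a random mask zeroes units in the forward pass, the same mask blocks their gradients in the backward pass, and at test time all units are kept but scaled by the keep probability. The function names are hypothetical.

```python
import numpy as np

# Hypothetical helper names; illustrative only.


def dropout_train(x, p_drop=0.5, rng=None):
    """Training-time dropout: zero each unit independently with probability p_drop."""
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(x.shape) >= p_drop  # True = unit kept
    return x * mask, mask


def dropout_backward(grad_out, mask):
    # Dropped units receive no gradient, so they do not take part in backpropagation
    return grad_out * mask


def dropout_test(x, p_drop=0.5):
    """Test time (the paper's formulation): keep every unit, scale outputs by the keep probability."""
    return x * (1.0 - p_drop)
```

Keras's `Dropout` layer uses the equivalent "inverted dropout" formulation instead: it scales the kept activations by 1 / (1 - rate) during training, so no scaling is needed at test time. Both keep the expected activation the same between training and testing.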

And then, we code!

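<h4> Classifying the flowers dataset from Kaggle </h4>
<ul> <li> The data is stored in the folders data/train and data/test, with one subfolder per class, which is the layout that flow_from_directory expects </li> </ul>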
Link: https://www.kaggle.com/alxmamaev/flowers-recognition 
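The notebook does not say how data/train and data/test were produced. One possible way, assuming the Kaggle download unpacks into a folder with one subfolder of images per class, is to copy a random fraction of each class into the test split. The function name, the 30% test fraction and the fixed seed are all hypothetical choices for illustration.

```python
import random
import shutil
from pathlib import Path

# Hypothetical helper; assumes <source_dir>/<class_name>/<image files>.


def split_into_train_test(source_dir, dest_dir="data", test_fraction=0.3, seed=0):
    """Copy <source_dir>/<class>/* into <dest_dir>/train/<class> and <dest_dir>/test/<class>."""
    rng = random.Random(seed)
    source, dest = Path(source_dir), Path(dest_dir)
    counts = {}
    for class_dir in sorted(p for p in source.iterdir() if p.is_dir()):
        files = sorted(f for f in class_dir.iterdir() if f.is_file())
        rng.shuffle(files)
        n_test = int(len(files) * test_fraction)
        for split, subset in (("test", files[:n_test]), ("train", files[n_test:])):
            target = dest / split / class_dir.name
            target.mkdir(parents=True, exist_ok=True)
            for f in subset:
                shutil.copy2(f, target / f.name)
        counts[class_dir.name] = (len(files) - n_test, n_test)  # (train, test) per class
    return counts
```

Splitting per class keeps the class proportions roughly the same in both folders.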

In [1]:
%matplotlib inline

from keras import applications
from keras.preprocessing.image import ImageDataGenerator
from keras import optimizers
from keras.models import Sequential, Model 
from keras.layers import Dropout, Flatten, Dense, GlobalAveragePooling2D, Conv2D, Activation, MaxPooling2D
from keras import backend as k 
from keras.callbacks import ModelCheckpoint, LearningRateScheduler, TensorBoard, EarlyStopping

Using TensorFlow backend.


In [2]:
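img_width, img_height = 224, 224
train_data_dir = "data/train"
validation_data_dir = "data/test"
nb_train_samples = 100       # training images drawn per epoch (the train folder holds 457)
nb_validation_samples = 10   # validation images evaluated per epoch (the test folder holds 200)
batch_size = 5
epochs = 10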

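<h4> Building the model </h4>
The model below follows AlexNet's overall layout (five convolutional layers followed by fully connected layers with dropout) but departs from the paper in several details: it uses non-overlapping 2x2 max pooling instead of overlapping 3x3 pooling with stride 2, an 11x11 kernel instead of 5x5 in the second convolutional layer, 'valid' padding throughout, no local response normalization, a single device instead of two GPUs, and an extra 1000-unit hidden layer feeding a 5-way softmax for the five flower classes.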


In [3]:
model = Sequential()

# 1st Convolutional Layer: 96 kernels of size 11x11, stride 4
model.add(Conv2D(filters=96, input_shape=(224,224,3), kernel_size=(11,11), strides=(4,4), padding='valid'))
model.add(Activation('relu'))
# Max Pooling (the paper uses overlapping 3x3 pooling with stride 2)
model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='valid'))

# 2nd Convolutional Layer (the paper uses 5x5 kernels here)
model.add(Conv2D(filters=256, kernel_size=(11,11), strides=(1,1), padding='valid'))
model.add(Activation('relu'))
# Max Pooling
model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='valid'))

# 3rd Convolutional Layer
model.add(Conv2D(filters=384, kernel_size=(3,3), strides=(1,1), padding='valid'))
model.add(Activation('relu'))

# 4th Convolutional Layer
model.add(Conv2D(filters=384, kernel_size=(3,3), strides=(1,1), padding='valid'))
model.add(Activation('relu'))

# 5th Convolutional Layer
model.add(Conv2D(filters=256, kernel_size=(3,3), strides=(1,1), padding='valid'))
model.add(Activation('relu'))
# Max Pooling
model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='valid'))

# Flatten the feature maps so they can be fed to the fully connected layers
model.add(Flatten())
# 1st Fully Connected Layer (its input size is inferred from Flatten, so no input_shape is needed)
model.add(Dense(4096))
model.add(Activation('relu'))
# Add Dropout to prevent overfitting
model.add(Dropout(0.5))

# 2nd Fully Connected Layer
model.add(Dense(4096))
model.add(Activation('relu'))
# Add Dropout
model.add(Dropout(0.5))

# 3rd Fully Connected Layer: in the paper this 1000-unit layer is the softmax output
# over ImageNet's 1000 classes; here it is kept as an extra hidden layer
model.add(Dense(1000))
model.add(Activation('relu'))
# Add Dropout
model.add(Dropout(0.5))

# Output Layer: 5-way softmax, one unit per flower class
model.add(Dense(5))
model.add(Activation('softmax'))

model.summary()

# Compile the model (the paper trained with SGD and momentum; Adam is used here)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=["accuracy"])



_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 54, 54, 96)        34944     
_________________________________________________________________
activation_1 (Activation)    (None, 54, 54, 96)        0         
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 27, 27, 96)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 17, 17, 256)       2973952   
_________________________________________________________________
activation_2 (Activation)    (None, 17, 17, 256)       0         
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 8, 8, 256)         0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 6, 6, 384)         885120    
__________
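The summary above is cut off after the third convolutional layer. Using the same 'valid'-padding formula, floor((W - F)/S) + 1, the remaining shapes and the parameter count of the model as defined in this notebook can be worked out by hand. This pure-Python sketch does that arithmetic; it does not call Keras, so treat it as a cross-check rather than actual summary output. The layer names are hypothetical labels.

```python
# Cross-check of the notebook's model: 224x224x3 input, 'valid' padding everywhere.


def out_size(size, kernel, stride):
    """'valid' padding: floor((W - F) / S) + 1."""
    return (size - kernel) // stride + 1


size, channels, params = 224, 3, 0
report = []
# (label, kernel, stride, filters); filters=None marks a max-pooling layer
for name, k, s, filters in [
    ("conv1", 11, 4, 96), ("pool1", 2, 2, None),
    ("conv2", 11, 1, 256), ("pool2", 2, 2, None),
    ("conv3", 3, 1, 384), ("conv4", 3, 1, 384), ("conv5", 3, 1, 256),
    ("pool3", 2, 2, None),
]:
    size = out_size(size, k, s)
    if filters is not None:
        params += k * k * channels * filters + filters  # weights + biases
        channels = filters
    report.append((name, size, channels))

flat = size * size * channels  # length of the vector produced by Flatten
for units in (4096, 4096, 1000, 5):  # the Dense layers, including the 5-way output
    params += flat * units + units
    flat = units

for row in report:
    print(row)
print("total parameters:", params)
```

By this arithmetic the last pooling layer leaves a 1x1x256 map, so Flatten passes only 256 features to the first 4096-unit layer, and the whole model has 28,042,485 parameters, most of them in the fully connected layers.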

<h4> Data Augmentation code </h4>

In [4]:
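# Training-time augmentation: rescale pixels to [0, 1], random horizontal flips and rotations
# (the paper instead uses random crops plus PCA-based color jitter)
train_datagen = ImageDataGenerator(
    rescale=1./255,
    horizontal_flip=True,
    fill_mode="nearest",
    rotation_range=30)

# Validation data should not be augmented; apply only the same rescaling
test_datagen = ImageDataGenerator(rescale=1./255)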

In [5]:
train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode="categorical")

validation_generator = test_datagen.flow_from_directory(
    validation_data_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,   # without this, the default batch size of 32 is used
    class_mode="categorical")


Found 457 images belonging to 5 classes.
Found 200 images belonging to 5 classes.


In [6]:
# Keras 2 counts batches ("steps"), not samples, per epoch
model.fit_generator(
    train_generator,
    steps_per_epoch=nb_train_samples // batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=nb_validation_samples // batch_size,
    verbose=1)

  
  


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x4e381a4cf8>

The model was trained for only 10 epochs on a very limited amount of data, so it will not learn much; the purpose here is only to provide a code sample. Ideally, the network should be trained on much more data and for at least 100 epochs.