### Introduction of VGG19 Pre-trained Model
For the image recognition task, we chose to proceed with the VGG19 pre-trained model. VGG19 was one of the top entries in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2014. Its accuracy comes from depth: it stacks very small (3 × 3) convolution filters to reach 19 weight layers (16 convolutional and 3 fully connected). The architecture description below is taken from the paper "Very Deep Convolutional Networks for Large-Scale Image Recognition" by K. Simonyan and A. Zisserman. We use this model as the base for predicting movie genres because its ImageNet pre-trained weights already extract general-purpose image features, which should help us recognize poster images accurately.

On top of the VGG19 model, we add a max-pooling layer (which has no trainable weights) to down-sample the feature maps and further reduce their dimensionality before flattening. After the feature maps are flattened into one vector per image, two fully connected layers with 512 and 64 units gradually reduce the number of features and perform classification on the features extracted by the convolutional layers. Finally, a softmax output layer predicts one of the 14 genres, since we treat this as a multi-class problem.
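As a rough check of the head's size, the sketch below works out its tensor shapes and trainable parameter counts in plain Python. It assumes 224 × 224 inputs (so the VGG19 base without its top layers outputs 7 × 7 × 512) and Keras's default MaxPooling2D settings (2 × 2 window, stride 2, no padding); the helper names are illustrative only.

```python
# Shape and parameter arithmetic for the classification head added on top of VGG19.
# Assumption: 224x224 inputs, so the frozen VGG19 base (include_top=False) outputs 7x7x512.

def pooled_size(size, pool=2, stride=2):
    """Output side length of a 'valid' max-pooling layer."""
    return (size - pool) // stride + 1

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

base_h = base_w = 7
channels = 512

h = pooled_size(base_h)          # MaxPooling2D(): 7 -> 3
w = pooled_size(base_w)
flat = h * w * channels          # Flatten(): 3 * 3 * 512 features

layers = [(flat, 512), (512, 64), (64, 14)]   # Dense(512), Dense(64), Dense(14, softmax)
head_params = [dense_params(n_in, n_out) for n_in, n_out in layers]

print("pooled:", (h, w, channels))
print("flattened:", flat)
print("dense params:", head_params, "total:", sum(head_params))
```

Almost all of the roughly 2.4 million trainable weights sit in the first dense layer; the extra pooling step shrinks that layer's input from 25,088 (7 · 7 · 512) to 4,608 features.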


### VGG19 ARCHITECTURE
The following passage is quoted from the paper; "we" refers to its authors and "Sect. 4" to a section of the paper.

During training, the input to our ConvNets is a fixed-size 224 × 224 RGB image. The only preprocessing we do is subtracting the mean RGB value, computed on the training set, from each pixel. The image is passed through a stack of convolutional (conv.) layers, where we use filters with a very small receptive field: 3 × 3 (which is the smallest size to capture the notion of left/right, up/down, center). In one of the configurations we also utilise 1 × 1 convolution filters, which can be seen as a linear transformation of the input channels (followed by non-linearity). The convolution stride is fixed to 1 pixel; the spatial padding of conv. layer input is such that the spatial resolution is preserved after convolution, i.e. the padding is 1 pixel for 3 × 3 conv. layers. Spatial pooling is carried out by five max-pooling layers, which follow some of the conv. layers (not all the conv. layers are followed by max-pooling). Max-pooling is performed over a 2 × 2 pixel window, with stride 2. A stack of convolutional layers (which has a different depth in different architectures) is followed by three Fully-Connected (FC) layers: the first two have 4096 channels each, the third performs 1000-way ILSVRC classification and thus contains 1000 channels (one for each class). The final layer is the soft-max layer. The configuration of the fully connected layers is the same in all networks. All hidden layers are equipped with the rectification (ReLU (Krizhevsky et al., 2012)) non-linearity. We note that none of our networks (except for one) contain Local Response Normalisation (LRN) normalisation (Krizhevsky et al., 2012): as will be shown in Sect. 4, such normalisation does not improve the performance on the ILSVRC dataset, but leads to increased memory consumption and computation time. Where applicable, the parameters for the LRN layer are those of (Krizhevsky et al., 2012).
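To make the description concrete, the sketch below encodes VGG19's convolutional layout as a list (filter counts per 3 × 3 conv layer, "M" for a 2 × 2 max-pool) and counts layers, parameters and the final spatial size. The layout is written out here as an illustration rather than loaded from Keras; its first per-layer parameter counts (1,792 for block1_conv1, 36,928 for block1_conv2, and so on) agree with the vggmodel.summary() output in this notebook.

```python
# VGG19 convolutional layout: 3x3 convs with padding 1 (size preserved),
# grouped into five blocks, each followed by 2x2 max-pooling with stride 2 ("M").
VGG19_CONV = [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
              512, 512, 512, 512, "M", 512, 512, 512, 512, "M"]

def conv_stack_summary(config, in_channels=3, size=224, k=3):
    """Count conv layers and their parameters, and track the spatial size."""
    n_conv, params = 0, 0
    for item in config:
        if item == "M":
            size //= 2                                  # 2x2 max-pool, stride 2
        else:
            params += k * k * in_channels * item + item  # weights + biases
            in_channels = item
            n_conv += 1
    return n_conv, params, (size, size, in_channels)

n_conv, conv_params, out_shape = conv_stack_summary(VGG19_CONV)
n_fc = 3   # two 4096-channel FC layers and the 1000-way classifier

print(n_conv + n_fc, "weight layers")
print(conv_params, "convolutional parameters; output shape", out_shape)
```

The 7 × 7 × 512 output is what the notebook's top layers receive when VGG19 is loaded with include_top=False.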


In [1]:
import warnings
warnings.filterwarnings('ignore')

In [2]:
import numpy                  as np
import pandas                 as pd
import scipy                  as sp
import matplotlib
import matplotlib.pyplot      as plt
import seaborn                as sns
import requests
import urllib
import json
import os
import random
import joblib
import sklearn

# deep learning packages
from keras.applications.vgg19 import VGG19
from keras.preprocessing import image
from keras.applications.vgg19 import preprocess_input
from keras.models import Model

#import statsmodels.api as sm
from ast                                  import literal_eval
from matplotlib                           import rcParams
from scipy.stats                          import mode
from IPython.core.interactiveshell        import InteractiveShell
from time                                 import sleep
from collections                          import Counter
from itertools                            import combinations, permutations

try:                                      # Python 3
    from urllib.parse                     import urljoin
except ImportError:                       # Python 2
    from urlparse                         import urljoin


Using TensorFlow backend.


In [4]:
from IPython.display import display
from PIL import Image

In [10]:
InteractiveShell.ast_node_interactivity = "all"
%matplotlib inline

In [6]:
matplotlib.style.use('ggplot')
rcParams['figure.figsize'] = (20, 10)
rcParams['axes.facecolor'] = "w"
rcParams['grid.color'] = "gray"
rcParams['grid.linewidth'] = 0.5

# Load Data
We will use the cleaned data from previous milestones. 

In [2]:
merged_mdb_final = pd.read_csv('merged_mdb_final.txt')
merged_mdb_final = merged_mdb_final.drop('Unnamed: 0', axis=1)

In [3]:
merged_mdb_final.head(n=2)

Unnamed: 0,title_x,imdb_id,id,overview,budget,genres,release_date,revenue,runtime,original_language,...,vote_count,status,adult,title_y,rating,votes,year,cast,director,writer
0,[u'Notorious'],38787,303,"[u'Released shortly after the war, this classi...",2000000,"[{u'id': 53, u'name': u'Thriller'}, {u'id': 18...",8/15/1946,24464742,102,en,...,250,Released,0,[u'Notorious'],8.0,76704,1946,"[u'Cary Grant', u'Ingrid Bergman', u'Claude Ra...",[u'Alfred Hitchcock'],"[u'Ben Hecht', u'John Taintor Foote', u'Alfred..."
1,[u'The ABCs of Death'],1935896,87436,"[u""An ambitious anthology film featuring segme...",0,"[{u'id': 27, u'name': u'Horror'}]",6/28/2013,21660,123,en,...,137,Released,0,[u'The ABCs of Death'],4.7,14951,2012,"[u'Eva Llorach', u'Miquel Insua', u'Alejandra ...","[u'Kaare Andrews', u'Angela Bettis', u'H\xe9l\...","[u'Ant Timpson', u'Nacho Vigalondo', u'Adri\xe..."


In [None]:
## -----------------------------uncomment to use poster code chunk---------------------------------------------
## getting images

#base_path = 'https://image.tmdb.org/t/p/{size}/'
#local_poster_folder = os.path.expanduser('~/poster_folder')

#if not os.path.exists(local_poster_folder):
#    os.mkdir(local_poster_folder)

#merged_mdb_final['poster_url'] = merged_mdb_final.poster_path.map(lambda path: urljoin(base_path.format(size = 'w500'), path.replace('/', '')))
# add absolute paths for images
#merged_mdb_final['local_poster_path'] = merged_mdb_final.poster_path.map(lambda path: os.path.join(local_poster_folder, path.replace('/', '')))

#def save_image_to_local(image_url, local_path):
#    response = requests.get(image_url)
#    with open(local_path, 'wb') as f:   # poster images are binary data, so write in 'wb' mode
#        f.write(response.content)

# downloading images to local paths
#merged_mdb_final[['poster_url', 'local_poster_path']].apply(axis = 1, func = lambda x: save_image_to_local(x['poster_url'], x['local_poster_path']))

In [5]:
base_path = 'https://image.tmdb.org/t/p/{size}/'
local_poster_folder = os.path.expanduser('/home/ubuntu/poster_folder')

merged_mdb_final['poster_url'] = merged_mdb_final.poster_path.map(lambda path: urljoin(base_path.format(size = 'w500'), path.replace('/', '')))
##add absolute paths for images
merged_mdb_final['local_poster_path'] = merged_mdb_final.poster_path.map(lambda path: os.path.join(local_poster_folder, path.replace('/', '')))


In [6]:
def extract_first_genre(x):
    try:
        return [genre['name'] for genre in literal_eval(x)][0]
    except Exception:
        return 0


In [7]:
def get_np_repr(img_path):
    """Load a poster, resize it to 224x224 and apply VGG19 preprocessing; return 0 if it cannot be read."""
    try:
        img = image.load_img(img_path, target_size=(224, 224))
    except IOError:
        return 0
    x = image.img_to_array(img)     # NumPy array of shape (224, 224, 3) with the default channels-last format
    x = np.expand_dims(x, axis=0)   # add a batch axis: (1, 224, 224, 3)
    x = preprocess_input(x)         # RGB -> BGR and subtract the ImageNet channel means
    return x

def image_exists(img_path):
    try:
        with open(img_path):
            pass
    except IOError:
        return 0
    return 1

In [8]:
merged_mdb_final['img_repr'] = merged_mdb_final.local_poster_path.map(lambda x: get_np_repr(x))

In [9]:
merged_mdb_final['image_exists'] = merged_mdb_final.local_poster_path.map(lambda x: image_exists(x))

In [10]:
merged_mdb_final.shape

(4444, 32)

In [11]:
merged_mdb_final['img_repr'].shape

(4444,)

In [12]:
sum(merged_mdb_final['image_exists'])

4444

In [13]:
merged_mdb_final['genres'].unique()

array([ "[{u'id': 53, u'name': u'Thriller'}, {u'id': 18, u'name': u'Drama'}, {u'id': 10749, u'name': u'Romance'}]",
       "[{u'id': 27, u'name': u'Horror'}]",
       "[{u'id': 37, u'name': u'Western'}]", ...,
       "[{u'id': 12, u'name': u'Adventure'}, {u'id': 28, u'name': u'Action'}, {u'id': 80, u'name': u'Crime'}, {u'id': 9648, u'name': u'Mystery'}]",
       "[{u'id': 10749, u'name': u'Romance'}, {u'id': 37, u'name': u'Western'}, {u'id': 18, u'name': u'Drama'}]",
       "[{u'id': 18, u'name': u'Drama'}, {u'id': 10749, u'name': u'Romance'}, {u'id': 36, u'name': u'History'}]"], dtype=object)

In [14]:
# Keep only the first listed genre so that we can treat the problem as multi-class prediction
merged_mdb_final['final_genre'] = merged_mdb_final['genres'].map(lambda x: extract_first_genre(x))
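To show what extract_first_genre returns, the sketch below runs a copy of it on two genre strings taken from the output above. The empty-list input "[]" is an assumed example of a movie without genres (the notebook does not show how such rows are stored); it illustrates where the 0 labels in the counts come from.

```python
from ast import literal_eval

def extract_first_genre(x):
    """Return the first genre name in a TMDb-style genres string, or 0 if there is none."""
    try:
        return [genre["name"] for genre in literal_eval(x)][0]
    except Exception:   # empty list (IndexError) or a value that cannot be parsed
        return 0

samples = [
    "[{u'id': 53, u'name': u'Thriller'}, {u'id': 18, u'name': u'Drama'}, {u'id': 10749, u'name': u'Romance'}]",
    "[{u'id': 27, u'name': u'Horror'}]",
    "[]",   # assumed representation of a movie with no genres
]
print([extract_first_genre(s) for s in samples])
```

The label is deterministic: a Thriller/Drama/Romance movie is always labelled Thriller because that genre is listed first.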

In [15]:
Counter(merged_mdb_final['final_genre'])

Counter({0: 53,
         u'Action': 692,
         u'Adventure': 260,
         u'Animation': 153,
         u'Comedy': 890,
         u'Crime': 187,
         u'Documentary': 124,
         u'Drama': 999,
         u'Family': 61,
         u'Fantasy': 113,
         u'Foreign': 2,
         u'History': 17,
         u'Horror': 330,
         u'Music': 34,
         u'Mystery': 68,
         u'Romance': 92,
         u'Science Fiction': 80,
         u'TV Movie': 41,
         u'Thriller': 201,
         u'War': 20,
         u'Western': 27})

Removing movies without a genre (label 0) and genres with fewer than 50 movies

In [16]:
filtered_merged_mdb_final = merged_mdb_final[~merged_mdb_final.final_genre.isin((0,'War', 'Western', 
                                                                                'TV Movie', 'Foreign',
                                                                               'History', 'Music'))].copy()
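Instead of hard-coding the removed genres, the same filter can be derived from the counts. The helper drop_rare_labels below is hypothetical, not part of the notebook; applied to final_genre with min_count=50, it would drop exactly the genres listed above (War, Western, TV Movie, Foreign, History, Music) plus the missing-genre marker 0, which is removed even though it has 53 rows.

```python
import pandas as pd

def drop_rare_labels(df, column, min_count=50, missing=0):
    """Keep rows whose label is not the missing marker and occurs at least min_count times."""
    counts = df[column].value_counts()
    keep = counts[(counts >= min_count) & (counts.index != missing)].index
    return df[df[column].isin(keep)].copy()

# Toy example: 'War' is too rare and 0 marks a missing genre.
toy = pd.DataFrame({"final_genre": ["Drama"] * 3 + ["War"] + [0] * 3})
print(drop_rare_labels(toy, "final_genre", min_count=2)["final_genre"].tolist())
```

In the notebook this would read drop_rare_labels(merged_mdb_final, 'final_genre'), so the filter stays correct if the data changes.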

In [17]:
Counter(filtered_merged_mdb_final['final_genre'])

Counter({u'Action': 692,
         u'Adventure': 260,
         u'Animation': 153,
         u'Comedy': 890,
         u'Crime': 187,
         u'Documentary': 124,
         u'Drama': 999,
         u'Family': 61,
         u'Fantasy': 113,
         u'Horror': 330,
         u'Mystery': 68,
         u'Romance': 92,
         u'Science Fiction': 80,
         u'Thriller': 201})

In [18]:
deepLearning_mdb_final = filtered_merged_mdb_final[['final_genre','img_repr']].copy()

In [19]:
deepLearning_mdb_final.head(2)

Unnamed: 0,final_genre,img_repr
0,Thriller,[[[[ -96.93900299 -109.77899933 -116.68000031]...
1,Horror,[[[[ -82.93900299 -92.77899933 -114.68000031]...


In [20]:
deepLearning_mdb_final['img_repr'].shape

(4250,)

- The posters were downloaded at TMDb's 'w500' size (500 × 750) and are resized to 224 × 224 by load_img.

We will go over the following options:

- training a small network from scratch (as a baseline)
- using the bottleneck features of a pre-trained network
- fine-tuning the top layers of a pre-trained network


In [22]:
from sklearn.model_selection import train_test_split   # sklearn.cross_validation was removed in scikit-learn 0.20
from sklearn import preprocessing

le = preprocessing.LabelEncoder()
le.fit(deepLearning_mdb_final.final_genre)             # classes are sorted alphabetically
train, test = train_test_split(deepLearning_mdb_final, test_size=0.30, random_state=123)

# Each img_repr entry has shape (1, 224, 224, 3); concatenating along the batch axis
# gives arrays of shape (n_samples, 224, 224, 3).
length = len(train.img_repr)
X_train = np.concatenate(train.img_repr.tolist(), axis=0)
y_train = le.transform(train.final_genre).reshape((length, 1))

length2 = len(test.img_repr)
X_test = np.concatenate(test.img_repr.tolist(), axis=0)
y_test = le.transform(test.final_genre).reshape((length2, 1))
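The original cell built X_train with np.hstack over a generator of wrapped arrays followed by a reshape; NumPy has deprecated passing generators to its stacking functions. The sketch below checks, on small random stand-ins for the (1, H, W, 3) img_repr entries, that concatenating along the batch axis gives the same array as the hstack-and-reshape approach (with a list instead of a generator).

```python
import numpy as np

# Stand-ins for img_repr entries: each is a batch of one image, shape (1, H, W, 3).
rng = np.random.default_rng(0)
reprs = [rng.random((1, 4, 4, 3)) for _ in range(5)]

# Original approach: wrap each array, hstack along axis 1, then reshape.
stacked_old = np.hstack([[x] for x in reprs]).reshape((len(reprs), 4, 4, 3))

# Simpler equivalent: concatenate the (1, H, W, 3) arrays along the batch axis.
stacked_new = np.concatenate(reprs, axis=0)

print(stacked_new.shape, np.array_equal(stacked_old, stacked_new))
```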

In [23]:
X_train[0:5].shape
y_train[0:5].shape

(5, 1)

In [24]:
len(y_test)

1275

In [25]:
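# Scale the inputs by 1/255. Note: preprocess_input has already converted the images to BGR
# and subtracted the ImageNet channel means, so the values end up roughly in [-0.49, 0.59],
# not 0.0-1.0, and no longer match the input scale VGG19's ImageNet weights were trained on.

X_train = X_train / 255.0
X_test = X_test / 255.0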
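Keras's VGG19 preprocess_input follows the "caffe" convention: it flips RGB to BGR and subtracts fixed ImageNet channel means, without rescaling. The sketch below re-implements that convention in NumPy as an illustration (it is not the library code). An RGB pixel of (7, 7, 7) reproduces the first img_repr values shown earlier (-96.939, -109.779, -116.68), and the extreme pixel values show the range the network sees after the division by 255.

```python
import numpy as np

# ImageNet channel means in BGR order, as used by VGG-style "caffe" preprocessing.
IMAGENET_MEAN_BGR = np.array([103.939, 116.779, 123.68])

def caffe_preprocess(rgb):
    """RGB -> BGR, then subtract the per-channel ImageNet mean (no scaling)."""
    bgr = rgb[..., ::-1].astype("float64")
    return bgr - IMAGENET_MEAN_BGR

pixel = np.array([[[7.0, 7.0, 7.0]]])    # one dark-grey pixel, shape (1, 1, 3)
out = caffe_preprocess(pixel)
print(out)

# Range after the notebook's division by 255: black and white pixels.
black = caffe_preprocess(np.zeros((1, 1, 3))) / 255.0
white = caffe_preprocess(np.full((1, 1, 3), 255.0)) / 255.0
print(black.min(), white.max())
```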

In [26]:
from keras.utils import np_utils

y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
num_uni_classes = y_test.shape[1]

In [27]:
len(y_test)

1275

In [28]:
num_uni_classes

14

In [29]:
#saving the files

joblib.dump(X_train, 'X_train.pkl')
joblib.dump(X_test, 'X_test.pkl')
joblib.dump(y_train, 'y_train.pkl')
joblib.dump(y_test, 'y_test.pkl')


['y_test.pkl']

In [7]:
# Loading the files
X_train  = joblib.load('X_train.pkl')
X_test  = joblib.load('X_test.pkl')
y_train  = joblib.load('y_train.pkl')
y_test  = joblib.load('y_test.pkl')
X_train.shape

(2975, 224, 224, 3)

# Deep Learning: Baseline model

Our baseline network structure can be summarized as follows:

- Convolutional input layer, 32 feature maps with a size of 3×3 and a rectifier activation function, followed by a Max Pool layer with size 2×2.
- Convolutional layer, 32 feature maps with a size of 3×3 and a rectifier activation function, followed by a Max Pool layer with size 2×2.
- Convolutional layer, 64 feature maps with a size of 3×3 and a rectifier activation function, followed by a Max Pool layer with size 2×2.
- Flatten layer.
- Fully connected layer with 64 units and a rectifier activation function.
- Dropout set to 50%.
- Fully connected output layer with 14 units (one per genre) and a softmax activation function.
- A logarithmic loss function (categorical cross-entropy) is used with the stochastic gradient descent optimization algorithm, configured with momentum 0.9, an initial learning rate of 0.01 and a time-based learning-rate decay of 0.01/100 (lrate/epochs).
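The output shapes in the baseline's model summary follow from simple arithmetic: a 3 × 3 convolution without padding shrinks each side by 2, and 2 × 2 max-pooling with stride 2 halves it (rounding down). The sketch recomputes them in plain Python. Because the printed summary is cut off after the third convolution, the later values here (26 × 26 after the last pooling, the flattened length and the Dense(64) parameter count) are computed, not taken from the notebook.

```python
# Feature-map side lengths for the baseline CNN.
def conv_valid(size, k=3):
    return size - k + 1            # 3x3 convolution, no padding

def max_pool(size, p=2):
    return (size - p) // p + 1     # 2x2 max-pooling, stride 2

size, sizes = 224, []
for _ in range(3):                 # three Conv2D -> ReLU -> MaxPooling2D blocks
    size = conv_valid(size)
    sizes.append(size)
    size = max_pool(size)
    sizes.append(size)

last_filters = 64                  # the third Conv2D layer has 64 filters
flat = last_filters * size * size  # length of the vector produced by Flatten()
dense64_params = flat * 64 + 64    # weights + biases of Dense(64)
print(sizes, flat, dense64_params)
```

With about 2.8 million weights, the first dense layer holds almost all of the baseline's parameters; the three convolutions together have 896 + 9,248 + 18,496.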

In [90]:
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Flatten
from keras.layers import Activation
from keras.constraints import maxnorm
from keras.optimizers import SGD
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
from keras.utils import np_utils
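from keras import backend as K
# 'th' = channels-first ordering, so the baseline model below expects inputs of shape (3, 224, 224)
# (newer Keras: K.set_image_data_format('channels_first')). X_train and X_test were built
# channels-last, (N, 224, 224, 3), so they need X.transpose(0, 3, 1, 2) to match this ordering.
K.set_image_dim_ordering('th')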

In [91]:
epochs = 100
lrate = 0.01
decay = lrate/epochs
num_classes = num_uni_classes

In [92]:

model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(3, 224, 224)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

# the model so far outputs 3D feature maps (features, height, width) in channels-first ordering

In [93]:
model.add(Flatten())  # this converts our 3D feature maps to 1D feature vectors
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(14))            # one output unit per genre
model.add(Activation('softmax'))

sgd = SGD(lr=lrate, momentum=0.9, decay=decay, nesterov=False)
model.compile(loss='categorical_crossentropy',
              optimizer=sgd,
              metrics=['accuracy'])
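In the Keras 2.x versions this notebook uses, SGD's decay argument is a time-based learning-rate decay applied per batch update, lr / (1 + decay · iterations); it is not L2 weight decay. A sketch assuming that formula, with 2,975 training samples and batch_size=32 as in the baseline fit:

```python
import math

def keras_legacy_lr(initial_lr, decay, iterations):
    """Time-based decay of the legacy Keras SGD optimizer: lr / (1 + decay * t)."""
    return initial_lr / (1.0 + decay * iterations)

epochs, lrate = 100, 0.01
decay = lrate / epochs                      # 1e-4, as in the notebook
steps_per_epoch = math.ceil(2975 / 32)      # batch updates per epoch at batch_size=32

for epoch in (1, 10, 50, 100):
    t = epoch * steps_per_epoch             # the decay counts batch updates, not epochs
    print(epoch, round(keras_legacy_lr(lrate, decay, t), 5))
```

So over 100 epochs the baseline's learning rate falls to roughly half of its initial 0.01.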

In [94]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_2 (Conv2D)            (None, 32, 222, 222)      896       
_________________________________________________________________
activation_1 (Activation)    (None, 32, 222, 222)      0         
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 32, 111, 111)      0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 32, 109, 109)      9248      
_________________________________________________________________
activation_2 (Activation)    (None, 32, 109, 109)      0         
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 32, 54, 54)        0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 64, 52, 52)        18496     
__________

In [33]:

# Fit the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=epochs, batch_size=32)
# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))

Train on 2975 samples, validate on 1275 samples
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100

<keras.callbacks.History at 0x7fc28ebfcf50>

Accuracy: 23.45%


In [36]:
# once training is complete, let's see how well we have done
score = model.evaluate(X_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

('Test loss:', 6.6711477294622687)
('Test accuracy:', 0.23450980410856359)


# Deep Learning: Pre-trained model

- Load the ImageNet weights for VGG19 (Keras downloads them automatically the first time VGG19(weights='imagenet') is called).
- Apply the pre-trained ImageNet network to our poster images.

In [3]:
from keras.models import Sequential, Model
from keras.layers import Dropout, Flatten, Dense
from keras import applications
from keras.applications.vgg19 import VGG19
from keras.preprocessing.image import ImageDataGenerator
from keras import optimizers
from keras.optimizers import SGD
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D


In [4]:
from keras.layers import Input

#Load weights for VGG19 convolutional network trained on ImageNet
input_tensor = Input(shape=(224,224,3))
vggmodel = VGG19(weights='imagenet', include_top= False,input_tensor=input_tensor)


In [6]:
vggmodel.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
__________

In [5]:
#Flatten data and apply softmax classifier on top of VGG19 model
x = vggmodel.output
x = Flatten()(x)
predictions = Dense(14, activation='softmax')(x)
model = Model(inputs=vggmodel.input, outputs=predictions)

#Use previous VGG19 weights, do not train again
for layer in vggmodel.layers:
    layer.trainable = False
    
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
__________

In [8]:
# compile the model (should be done *after* setting layers to non-trainable)
epochs = 10
lrate = 0.01

sgd = SGD(lr=lrate, momentum=0.9)
model.compile(loss='categorical_crossentropy',
              optimizer=sgd,
              metrics=['accuracy'])

# Fit the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=epochs, batch_size=64)

Train on 2975 samples, validate on 1275 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7fa72c0d9cd0>

In [9]:
# Evaluate model performance
score = model.evaluate(X_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

('Test loss:', 12.350885659012141)
('Test accuracy:', 0.23372549023113998)
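This 23.37% is exactly the share of Drama, the most frequent genre, in the test set, which suggests the softmax-only model predicts Drama for (nearly) every poster. The sketch below derives the test-set counts by subtracting the training counts (the hand-typed labels_dict in the class-weight cell, indexed in LabelEncoder's alphabetical order) from the filtered genre totals.

```python
# Genre totals after filtering (4,250 posters) and training-split counts from labels_dict.
all_counts = {"Action": 692, "Adventure": 260, "Animation": 153, "Comedy": 890,
              "Crime": 187, "Documentary": 124, "Drama": 999, "Family": 61,
              "Fantasy": 113, "Horror": 330, "Mystery": 68, "Romance": 92,
              "Science Fiction": 80, "Thriller": 201}
train_counts = [467, 187, 101, 641, 123, 91, 701, 35, 80, 236, 52, 68, 55, 138]

genres = sorted(all_counts)        # LabelEncoder sorts class names alphabetically
test_counts = {g: all_counts[g] - n for g, n in zip(genres, train_counts)}

n_test = sum(test_counts.values())
majority = max(test_counts, key=test_counts.get)
print(majority, test_counts[majority], n_test, test_counts[majority] / n_test)
```

Any model worth keeping therefore has to beat about 23.4% test accuracy, the score of always predicting Drama.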


**Add Additional Layers to VGG19 Model**

In [10]:
#Add Layers, flatten data and apply softmax classifier on top of VGG19 model
x = vggmodel.output
x = MaxPooling2D()(x)
# add two fully connected layers (512 and 64 units)
# and a softmax output layer for the 14 genres
x = Flatten()(x)
x = Dense(512, activation='relu')(x)
x = Dense(64, activation='relu')(x)
predictions = Dense(14, activation='softmax')(x)

# this is the model we will train
model = Model(inputs=vggmodel.input, outputs=predictions)

#Use previous VGG19 weights, do not train again
for layer in vggmodel.layers:
    layer.trainable = False
    
model.summary()   

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
__________

In [11]:
# compile the model
epochs = 10
lrate = 0.01

sgd = SGD(lr=lrate, momentum=0.9)
model.compile(loss='categorical_crossentropy',
              optimizer=sgd,
              metrics=['accuracy'])

# Fit the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=epochs, batch_size=64)

Train on 2975 samples, validate on 1275 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7fa72b4604d0>

In [12]:
# Evaluate model performance
score = model.evaluate(X_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

('Test loss:', 2.156161008909637)
('Test accuracy:', 0.33019607801063389)


Adding the max-pooling and dense layers raised test accuracy from 23.4% to 33.0%, so we keep them.

**Add Dropout to the model**


In [13]:
#Add Layers, flatten data and apply softmax classifier on top of VGG19 model
x = vggmodel.output
x = MaxPooling2D()(x)
x = Flatten()(x)
x = Dense(512, activation='relu')(x)
x = Dropout(0.5)(x)
x = Dense(64, activation='relu')(x)
x = Dropout(0.5)(x)
predictions = Dense(14, activation='softmax')(x)

# this is the model we will train
model = Model(inputs=vggmodel.input, outputs=predictions)

#Use previous VGG19 weights, do not train again
for layer in vggmodel.layers:
    layer.trainable = False


In [15]:
# compile the model
epochs = 20 #increase epochs as it might take longer to learn with dropout
lrate = 0.01

sgd = SGD(lr=lrate, momentum=0.9)
model.compile(loss='categorical_crossentropy',
              optimizer=sgd,
              metrics=['accuracy'])

# Fit the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=epochs, batch_size=64)

Train on 2975 samples, validate on 1275 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.History at 0x7fa72aea9b10>

In [16]:
# Evaluate model performance
score = model.evaluate(X_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

('Test loss:', 2.133724864510929)
('Test accuracy:', 0.29803921620051066)


Adding dropout lowered test accuracy (from 33.0% to 29.8%), so we do not use it.

**Decrease Learning Rate to 0.001**

In [17]:
#Add Layers, flatten data and apply softmax classifier on top of VGG19 model
x = vggmodel.output
x = MaxPooling2D()(x)
x = Flatten()(x)
x = Dense(512, activation='relu')(x)
x = Dense(64, activation='relu')(x)
predictions = Dense(14, activation='softmax')(x)

# this is the model we will train
model = Model(inputs=vggmodel.input, outputs=predictions)

#Use previous VGG19 weights, do not train again
for layer in vggmodel.layers:
    layer.trainable = False

In [19]:
# compile the model
lrate = 0.001

sgd = SGD(lr=lrate, momentum=0.9)
model.compile(loss='categorical_crossentropy',
              optimizer=sgd,
              metrics=['accuracy'])

# Fit the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=64)

Train on 2975 samples, validate on 1275 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7fa72aa76f10>

In [20]:
# Evaluate model performance
score = model.evaluate(X_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

('Test loss:', 2.0749437141418459)
('Test accuracy:', 0.33803921556940264)


Decreasing the learning rate to 0.001 improved test accuracy slightly (from 33.0% to 33.8%). Let's see what happens if we decrease it further.

**Decrease Learning Rate to 0.0001**

In [22]:
# compile the model
lrate = 0.0001

sgd = SGD(lr=lrate, momentum=0.9)
model.compile(loss='categorical_crossentropy',
              optimizer=sgd,
              metrics=['accuracy'])

# Fit the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=64)

Train on 2975 samples, validate on 1275 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7fa72a6e5e10>

In [23]:
# Evaluate model performance
score = model.evaluate(X_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

('Test loss:', 1.968651039460126)
('Test accuracy:', 0.37098039220361151)


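Decreasing the learning rate to 0.0001 raised test accuracy to 37.1%. Note that this cell recompiles the same model instead of rebuilding it, and compile does not reset the weights, so training continues from the previous run; part of the gain may come from the 10 extra epochs. Let's see what happens if we decrease the learning rate even further!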

**Decrease Learning Rate to 0.00001**

In [24]:
# compile the model
lrate = 0.00001

sgd = SGD(lr=lrate, momentum=0.9)
model.compile(loss='categorical_crossentropy',
              optimizer=sgd,
              metrics=['accuracy'])

# Fit the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=64)

Train on 2975 samples, validate on 1275 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7fa727f9be50>

In [25]:
# Evaluate model performance
score = model.evaluate(X_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

('Test loss:', 1.9715144277086445)
('Test accuracy:', 0.36078431377223896)


A learning rate of 0.0001 was slightly better (37.1% vs. 36.1% test accuracy).

**Decrease Batch Size**

In [26]:
lrate = 0.0001

sgd = SGD(lr=lrate, momentum=0.9)
model.compile(loss='categorical_crossentropy',
              optimizer=sgd,
              metrics=['accuracy'])

# Fit the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), 
                     epochs=5, batch_size=32)

Train on 2975 samples, validate on 1275 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7fa727dfea50>

In [27]:
# Evaluate model performance
score = model.evaluate(X_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

('Test loss:', 1.9820959850386077)
('Test accuracy:', 0.3686274510271409)


Decreasing the batch size to 32 did not improve performance (36.9% test accuracy).

**Increase Batch Size**

In [28]:
# Fit the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), 
                     epochs=5, batch_size=128)

Train on 2975 samples, validate on 1275 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7fa72c455690>

In [29]:
# Evaluate model performance
score = model.evaluate(X_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

('Test loss:', 1.9807448207630831)
('Test accuracy:', 0.36313725511233014)


Increasing the batch size to 128 did not improve performance either (36.3% test accuracy).

**Augment Data**

In [32]:
from keras.callbacks import EarlyStopping

lrate = 0.0001
batch_size=64
epochs = 10
early_stopping = EarlyStopping(monitor='val_loss', patience=2)

In [33]:
datagen = ImageDataGenerator(
    featurewise_center=True,
    featurewise_std_normalization=True,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True)

datagen.fit(X_train)

In [40]:
# fit the model on mini-batches with real-time data augmentation;
# len(X_train) // 16 steps of 64 images is about four passes over the training set per epoch
model.fit_generator(datagen.flow(X_train, y_train, batch_size=64),
                    steps_per_epoch=len(X_train) // 16, validation_data=(X_test, y_test),
                    epochs=epochs, callbacks=[early_stopping])


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10


<keras.callbacks.History at 0x7fa727dd1490>

In [41]:
# Evaluate model performance
score = model.evaluate(X_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

('Test loss:', 2.1064000784182082)
('Test accuracy:', 0.30039215690949383)


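Augmenting the data lowered test accuracy to 30.0%. Note that the featurewise centering and normalization are fitted on X_train and applied only to the generated training batches; X_test, used for validation and evaluation, was not standardized the same way, which may explain part of the drop.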
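Featurewise standardization only helps if training and test images are transformed with the same statistics. The NumPy sketch below illustrates the idea on toy channels-last arrays: per-channel mean and standard deviation are fitted on the training data and then applied to both sets. The helper names and the epsilon are illustrative assumptions; in Keras itself, the fitted generator's standardize method can be applied to a copy of X_test.

```python
import numpy as np

def fit_featurewise(x_train, eps=1e-6):
    """Per-channel mean and std over all training images and pixels (channels-last)."""
    mean = x_train.mean(axis=(0, 1, 2))
    std = x_train.std(axis=(0, 1, 2))
    return mean, std + eps

def standardize(x, mean, std):
    return (x - mean) / std

rng = np.random.default_rng(0)
x_train = rng.normal(5.0, 2.0, size=(8, 4, 4, 3))   # toy stand-in for X_train
x_test = rng.normal(5.0, 2.0, size=(4, 4, 4, 3))    # toy stand-in for X_test

mean, std = fit_featurewise(x_train)
x_train_std = standardize(x_train, mean, std)
x_test_std = standardize(x_test, mean, std)          # same training statistics for the test set
print(x_train_std.mean(axis=(0, 1, 2)).round(3), x_test_std.mean().round(2))
```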

**Add Class Weights**

In [42]:
# labels_dict
labels_dict = {0: 467, 1: 187, 2: 101, 3: 641, 4: 123, 5: 91, 6: 701, 
               7: 35, 8: 80, 9: 236, 10: 52, 11: 68, 12: 55, 13: 138}

import math

# labels_dict : {ind_label: count_label}
# mu : parameter to tune 

def create_class_weight(labels_dict, mu=0.15):
    # sum() works on dict values in Python 2 and 3; np.sum does not add up a Python 3 dict view
    total = float(sum(labels_dict.values()))
    class_weight = dict()

    for key, count in labels_dict.items():
        score = math.log(mu * total / float(count))
        class_weight[key] = score if score > 1.0 else 1.0   # never weight a class below 1.0

    return class_weight

# generate the class weight to handle the imbalanced classes
class_weight = create_class_weight(labels_dict)
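The counts in labels_dict are typed in by hand; they sum to 2,975, the training-set size, so they appear to be the training counts per encoded genre. The sketch below shows the weights the formula produces for them and how the same kind of dict could be built from one-hot labels such as y_train instead. The helper class_counts_from_onehot is hypothetical, and create_class_weight is a compact restatement of the notebook's helper.

```python
import math
import numpy as np

def class_counts_from_onehot(y_onehot):
    """Hypothetical helper: {class_index: count} from a one-hot label matrix such as y_train."""
    return {i: int(c) for i, c in enumerate(y_onehot.sum(axis=0))}

def create_class_weight(labels_dict, mu=0.15):
    """log(mu * total / count), floored at 1.0, following the notebook's helper."""
    total = float(sum(labels_dict.values()))
    return {k: max(math.log(mu * total / n), 1.0) for k, n in labels_dict.items()}

# Counts from the notebook's labels_dict (training split, LabelEncoder order).
labels_dict = {0: 467, 1: 187, 2: 101, 3: 641, 4: 123, 5: 91, 6: 701,
               7: 35, 8: 80, 9: 236, 10: 52, 11: 68, 12: 55, 13: 138}
weights = create_class_weight(labels_dict)

# Toy one-hot labels: 6, 3 and 1 samples of classes 0, 1 and 2.
y_toy = np.eye(3)[[0] * 6 + [1] * 3 + [2]]
toy_counts = class_counts_from_onehot(y_toy)

print({k: round(v, 2) for k, v in weights.items()})
print(toy_counts)
```

With mu = 0.15, the common genres (Action, Adventure, Comedy, Drama, Horror) all get the floor weight of 1.0, and Family, the rarest class, gets the largest weight (about 2.55).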

In [46]:
# Fit the model with class weights
model.fit(X_train, y_train, validation_data=(X_test, y_test), 
                     epochs=10, batch_size=64, class_weight=class_weight)


Train on 2975 samples, validate on 1275 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7fa72a84bed0>

In [47]:
# Evaluate model performance
score = model.evaluate(X_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

('Test loss:', 2.0008108956206079)
('Test accuracy:', 0.36078431377223896)


Adding class weights did not improve performance (36.1% test accuracy).

In [68]:
model.save('vgg19mod.h5') 

# Fine-tuned Pre-trained VGG19 model with top layers 

Adding MaxPooling and Dense top layers to the frozen VGG19 base improved test accuracy substantially (from 23% to 33%). We use softmax as the output activation since we have a multi-class problem with 14 unique classes.

When tuning the hyperparameters, we found that a low learning rate (0.0001) works best with VGG19, raising accuracy from 33% to 37%. A batch size of 64 worked as well as any we tried: batch sizes of 32 and 128 gave about the same accuracy. Because each experiment recompiled and kept training the same model, these comparisons are not fully independent.

Data augmentation decreased performance, and class weights did not change accuracy.


In [67]:
lrate = 0.0001
batch_size=64

model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
__________

# Conclusion

Our main finding is that the pre-trained VGG19 model with added top layers and a tuned learning rate performs much better than the baseline CNN trained from scratch (37% vs. 23% test accuracy). The baseline's 23% is close to the share of Drama, the most common genre, in the test set, so it barely beats always predicting Drama.

The largest gain came from adding top layers to the pre-trained model (23% to 33%); tuning the learning rate added about 4 more percentage points (33% to 37%).

To improve the model for the final submission, we aim to build an ensemble combining a Random Forest, an LSTM (on the plot summary and title) and a CNN (on the poster images).

