## DL3
Follow this notebook only if you're new to Deep Learning and Transfer Learning. It is an extension of the starter kit given [here](https://github.com/shubham3121/DL-3/blob/master/DL%233_EDA.ipynb). I'll try to keep it simple. Please ignore any typos :)

### Why use the MobileNet architecture?
You might have seen multiple tutorials on VGG16-based transfer learning, but here I'm going to use MobileNet for the following reasons:
<ul>
    <li> MobileNet has far fewer parameters to train than VGG16 (about 4.2M vs. 138M for the full ImageNet models in Keras).</li>
    <li> Fewer parameters mean shorter training time, so you can run more experiments and your chances of winning go up.</li>
    <li> On top of that, MobileNet reaches ImageNet accuracy comparable to VGG16.</li>
</ul>
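One reason MobileNet's convolutional layers are so cheap is its depthwise separable convolution, which splits a standard convolution into a per-channel (depthwise) 3x3 filter followed by a 1x1 (pointwise) convolution. A quick parameter count in plain Python, as a sketch: biases are left out because MobileNet's convolutions are followed by BatchNormalization and use no bias, and the 512-channel layer is just an illustrative size. The helper names are hypothetical.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights of a k x k convolution from c_in to c_out channels (no bias)."""
    return k * k * c_in * c_out


def depthwise_params(k, c_in):
    """Depthwise k x k convolution: one k x k filter per input channel (no bias)."""
    return k * k * c_in


def separable_params(k, c_in, c_out):
    """Depthwise k x k conv followed by a 1x1 pointwise conv (MobileNet's building block)."""
    return depthwise_params(k, c_in) + c_in * c_out


print(standard_conv_params(3, 3, 32))     # 864
print(depthwise_params(3, 32))            # 288
print(standard_conv_params(3, 512, 512))  # 2359296
print(separable_params(3, 512, 512))      # 266752, roughly 8.8x fewer weights
```

The first two counts match the `conv1` and `conv_dw_1` rows of the model summary printed later in this notebook.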

Having said that, let's move on to importing the required libraries.

In [1]:
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import GlobalAveragePooling2D
from keras.layers import Activation, Dropout, Flatten, Dense
from keras.applications import MobileNet
from keras import optimizers
from keras.callbacks import EarlyStopping, ModelCheckpoint
from keras.models import Model

from sklearn.model_selection import train_test_split
import pandas as pd
from tqdm import tqdm
import gc
import cv2 as cv

Using TensorFlow backend.


In [2]:
import numpy as np
import tensorflow as tf
import random as rn

# The below is necessary in Python 3.2.3 onwards to
# have reproducible behavior for certain hash-based operations.
# See these references for further details:
# https://docs.python.org/3.4/using/cmdline.html#envvar-PYTHONHASHSEED
# https://github.com/keras-team/keras/issues/2280#issuecomment-306959926

import os
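# Note: setting this here only affects processes started afterwards; to fix the hash
# seed of the current interpreter, set PYTHONHASHSEED before launching Python/Jupyter.
os.environ['PYTHONHASHSEED'] = '0'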
# os.environ['CUDA_VISIBLE_DEVICES'] = '0'

# The below is necessary for starting Numpy generated random numbers
# in a well-defined initial state.

np.random.seed(42)

# The below is necessary for starting core Python generated random numbers
# in a well-defined state.

rn.seed(12345)

# Force TensorFlow to use single thread.
# Multiple threads are a potential source of
# non-reproducible results.
# For further details, see: https://stackoverflow.com/questions/42022950/which-seeds-have-to-be-set-where-to-realize-100-reproducibility-of-training-res

session_conf = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)

from keras import backend as K

# The below tf.set_random_seed() will make random number generation
# in the TensorFlow backend have a well-defined initial state.
# For further details, see: https://www.tensorflow.org/api_docs/python/tf/set_random_seed

tf.set_random_seed(1234)

sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
K.set_session(sess)
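The seeding above works because each generator replays the same sequence after being reseeded. A minimal check with NumPy and Python's `random`, using the notebook's seed values (the helper name is hypothetical). It does not cover TensorFlow's graph-level seed, and some GPU (cuDNN) operations can remain nondeterministic even with all seeds fixed.

```python
import random

import numpy as np


def sample_after_seeding(np_seed=42, py_seed=12345, n=3):
    """Seed NumPy and Python's `random`, then draw n numbers from each."""
    np.random.seed(np_seed)
    random.seed(py_seed)
    return np.random.rand(n).tolist(), [random.random() for _ in range(n)]


first = sample_after_seeding()
second = sample_after_seeding()
print(first == second)  # True: same seeds give the same sequences
```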


I'm going to use 224x224 images, which is the default input size of the ImageNet-pretrained MobileNet. You can change that if you wish.

My folder structure is as follows:

<ul>
    <li> DL3
        <ul>
            <li> starter_kit
                <ul>
                    <li> this_notebook</li>
                </ul>
            </li>
            <li> data
                <ul>
                    <li> train_img</li>
                    <li> test_img</li>
                    <li> train.csv, test.csv, sample_submission.csv</li>
                    <li> attributes.txt, classes.txt</li>
                </ul>
            </li>
        </ul>
    </li>
</ul>

In [3]:
img_width, img_height = (224, 224)

train_data_dir = './data/train_img/'
test_data_dir = './data/test_img/'
epochs = 10
batch_size = 128

# Keras expects (rows, cols) = (height, width); identical here because the images are square
if K.image_data_format() == 'channels_first':
    input_shape = (3, img_height, img_width)
else:
    input_shape = (img_height, img_width, 3)


In [4]:
Mobile_model = MobileNet(include_top=False, input_shape=input_shape)

In [5]:
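def get_model():
    # Note: every call reuses the same Mobile_model instance, so the base layers
    # (their weights and trainable flags) are shared between the models built here.
    x = Mobile_model.output
    # add a global spatial average pooling layer
    x = GlobalAveragePooling2D()(x)
    # one sigmoid unit per attribute: 85 independent binary (multi-label) outputs
    predictions = Dense(85, activation='sigmoid')(x)
    model = Model(inputs=Mobile_model.input, outputs=predictions)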
    return model
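The head uses 85 sigmoid units rather than a softmax because one image can have several attributes at once (multi-label). A NumPy sketch, with made-up logits and labels for 4 of the 85 attributes, of what the sigmoid outputs, the 0.5 threshold used later for the submission, and the `binary_crossentropy` loss compute:

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def binary_crossentropy(y_true, probs, eps=1e-7):
    """Mean over attributes of the per-attribute binary cross-entropy."""
    probs = np.clip(probs, eps, 1 - eps)
    return -np.mean(y_true * np.log(probs) + (1 - y_true) * np.log(1 - probs))


# Hypothetical logits and labels for one image, 4 attributes only
logits = np.array([[2.0, -1.0, 0.0, 3.0]])
y_true = np.array([[1, 0, 0, 1]])

probs = sigmoid(logits)              # each value in (0, 1), independent of the others
preds = (probs > 0.5).astype(int)    # same 0/1 decision as pred.round() in the notebook
loss = binary_crossentropy(y_true, probs)

print(preds)           # [[1 0 0 1]]
print(probs.sum())     # not constrained to 1, unlike a softmax
print(round(loss, 4))  # 0.2955
```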

In [6]:
model = get_model()

We'll start by training only the head (the last layer). That layer is initialized randomly, so its early gradients are large and noisy; if the pretrained layers were trainable at this point, backpropagation would push those errors into them and damage their weights.

In [7]:
#train only last layer
for layer in model.layers[:-1]:
    layer.trainable = False

model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
conv1 (Conv2D)               (None, 112, 112, 32)      864       
_________________________________________________________________
conv1_bn (BatchNormalization (None, 112, 112, 32)      128       
_________________________________________________________________
conv1_relu (Activation)      (None, 112, 112, 32)      0         
_________________________________________________________________
conv_dw_1 (DepthwiseConv2D)  (None, 112, 112, 32)      288       
_________________________________________________________________
conv_dw_1_bn (BatchNormaliza (None, 112, 112, 32)      128       
_________________________________________________________________
conv_dw_1_relu (Activation)  (None, 112, 112, 32)      0         
__________
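With everything except the last layer frozen, only the Dense head is updated in this first stage. A back-of-the-envelope check, assuming the default MobileNet (alpha=1.0), whose last feature map for a 224x224 input is 7x7x1024 (224 / 32 = 7 after five stride-2 stages). The random array is only a stand-in for real features.

```python
import numpy as np

# Hypothetical batch of 2 feature maps in channels_last layout
feature_map = np.random.rand(2, 7, 7, 1024).astype(np.float32)

# GlobalAveragePooling2D averages each channel over the two spatial axes
pooled = feature_map.mean(axis=(1, 2))
print(pooled.shape)  # (2, 1024)

# Dense(85): one weight per (input feature, attribute) pair plus one bias per attribute
n_features, n_attributes = 1024, 85
head_params = n_features * n_attributes + n_attributes
print(head_params)  # 87125
```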

In [8]:
# this is the augmentation configuration we will use for training
train_datagen = ImageDataGenerator(width_shift_range=0.2, 
                                   height_shift_range=0.2,
                                   rescale=1. / 255,
                                   shear_range=0.2,
                                   zoom_range=0.2,
                                   horizontal_flip=True, 
                                   rotation_range = 20,
#                                    zca_whitening=True
                                   )

val_datagen = ImageDataGenerator(rescale=1./255)

test_datagen = ImageDataGenerator(rescale=1./255)
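To make `rescale` and `horizontal_flip` concrete, a small NumPy illustration on a single channels_last image. `flip_and_rescale` is a hypothetical helper: `ImageDataGenerator` flips each image at random (with probability 0.5) and also applies the shifts, shear, zoom and rotation configured above, which this sketch leaves out.

```python
import numpy as np


def flip_and_rescale(img, flip=True, scale=1.0 / 255):
    """Mirror an H x W x C image left-to-right (optional), then multiply pixel values by `scale`."""
    out = img[:, ::-1, :] if flip else img
    return out.astype(np.float32) * scale


img = np.zeros((2, 3, 3), dtype=np.uint8)  # tiny 2x3 RGB image
img[:, 0, :] = 255                         # leftmost column is white

out = flip_and_rescale(img)
print(out[:, 2, 0])  # ~1.0: the white column moved to the right and was scaled into [0, 1]
print(out[:, 0, 0])  # 0.0: the left column is now black
```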

In [9]:
train = pd.read_csv('./data/train.csv', index_col=0)
test = pd.read_csv('./data/test.csv')
attributes = pd.read_csv('./data/attributes.txt', delimiter='\t', header=None, index_col=0)
classes = pd.read_csv('./data/classes.txt', delimiter='\t', header=None, index_col=0)

In [10]:
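def get_imgs(src, df, labels=False):
    """Load images from `src` and resize them to (img_width, img_height).

    labels=False: read the files listed in df['Image_name'] and return the images only.
    labels=True:  read every file in `src`, look up its row in `df` (indexed by image
                  name) and return (images, labels).
    """
    # Note: cv.imread returns pixels in BGR channel order, while the ImageNet weights
    # were trained on RGB images (cv.cvtColor(im, cv.COLOR_BGR2RGB) would convert).
    files = os.listdir(src) if labels else df['Image_name'].values
    imgs, targets = [], []
    for file in tqdm(files):
        path = os.path.join(src, file)
        im = cv.imread(path)
        if im is None:  # cv.imread returns None instead of raising on unreadable files
            raise IOError('Could not read image: ' + path)
        imgs.append(cv.resize(im, (img_width, img_height)))  # dsize is (width, height)
        if labels:
            targets.append(df.loc[file].values)
    if labels:
        return np.array(imgs), np.array(targets)
    return np.array(imgs)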

In [11]:
train_imgs, train_labels = get_imgs(train_data_dir, train, True)

100%|██████████| 12600/12600 [08:46<00:00, 23.92it/s]


In [12]:
#train val split
X_tra, X_val, y_tra, y_val = train_test_split(train_imgs, train_labels, test_size = 3000, random_state = 222)
gc.collect()

49989
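With an integer `test_size`, `train_test_split` holds out exactly that many samples: 12600 - 3000 = 9600 training images, which matches `X_tra.shape` below. A NumPy-only sketch of the same idea (the helper is hypothetical and shuffles differently from scikit-learn, so it will not reproduce the exact same split):

```python
import numpy as np


def holdout_split(X, y, test_size, seed=222):
    """Shuffle indices with a fixed seed and hold out `test_size` samples for validation."""
    rng = np.random.RandomState(seed)
    idx = rng.permutation(len(X))
    val_idx, tra_idx = idx[:test_size], idx[test_size:]
    return X[tra_idx], X[val_idx], y[tra_idx], y[val_idx]


X = np.arange(12600)  # stand-ins for the 12600 training images
y = X * 2             # stand-ins for their labels
X_tr, X_va, y_tr, y_va = holdout_split(X, y, test_size=3000)
print(len(X_tr), len(X_va))  # 9600 3000
```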

In [13]:
X_tra.shape

(9600, 224, 224, 3)

In [14]:
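# fit() only computes statistics for featurewise_center, featurewise_std_normalization
# and zca_whitening; with only rescale enabled, these two calls have no effect.
train_datagen.fit(X_tra)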
val_datagen.fit(X_val)
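If featurewise options such as `featurewise_center` were enabled, the statistics should come from the training split only and then be reused for validation and test data; fitting `val_datagen` on `X_val` separately would normalize it differently from the data the model was trained on. A NumPy sketch of per-channel standardization done that way (helper names are hypothetical):

```python
import numpy as np


def featurewise_stats(X):
    """Per-channel mean and std over all images and pixels of an N x H x W x C array."""
    return X.mean(axis=(0, 1, 2)), X.std(axis=(0, 1, 2))


def standardize(X, mean, std, eps=1e-6):
    """Apply precomputed per-channel statistics (broadcast over N, H, W)."""
    return (X - mean) / (std + eps)


rng = np.random.RandomState(0)
X_train = rng.normal(5.0, 2.0, size=(10, 4, 4, 3))  # stand-ins for image arrays
X_valid = rng.normal(5.0, 2.0, size=(4, 4, 4, 3))

mean, std = featurewise_stats(X_train)     # statistics from training data only
Z_train = standardize(X_train, mean, std)  # ~0 mean, ~1 std per channel
Z_valid = standardize(X_valid, mean, std)  # reuse the training statistics
```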

In [15]:
def fmeasure(y_true, y_pred):
    def recall(y_true, y_pred):
        """Recall metric.

        Only computes a batch-wise average of recall.

        Computes the recall, a metric for multi-label classification of
        how many relevant items are selected.
        """
        true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
        possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
        recall = true_positives / (possible_positives + K.epsilon())
        return recall

    def precision(y_true, y_pred):
        """Precision metric.

        Only computes a batch-wise average of precision.

        Computes the precision, a metric for multi-label classification of
        how many selected items are relevant.
        """
        true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
        predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
        precision = true_positives / (predicted_positives + K.epsilon())
        return precision
    p = precision(y_true, y_pred)
    r = recall(y_true, y_pred)
    # K.epsilon() avoids 0/0 (NaN) when a batch has no true or predicted positives
    return 2 * ((p * r) / (p + r + K.epsilon()))
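The metric above is a micro-averaged F1: true positives, predicted positives and actual positives are summed over every sample and every attribute in the batch, with predictions rounded at 0.5. A NumPy mirror of it on a tiny hypothetical batch (`fmeasure_np` is an illustrative name; `eps=1e-7` matches Keras' default `K.epsilon()`). Since it is computed per batch, the epoch-level `fmeasure` Keras reports is an average of per-batch values rather than F1 over the whole set.

```python
import numpy as np


def fmeasure_np(y_true, y_pred, eps=1e-7):
    """Batch-wise micro F1, following the same steps as the Keras metric above."""
    tp = np.sum(np.round(np.clip(y_true * y_pred, 0, 1)))
    predicted = np.sum(np.round(np.clip(y_pred, 0, 1)))
    possible = np.sum(np.round(np.clip(y_true, 0, 1)))
    precision = tp / (predicted + eps)
    recall = tp / (possible + eps)
    return 2 * precision * recall / (precision + recall + eps)


y_true = np.array([[1, 0, 1], [0, 1, 0]], dtype=float)
y_pred = np.array([[0.9, 0.2, 0.4], [0.1, 0.7, 0.6]])
# rounded predictions: [[1, 0, 0], [0, 1, 1]] -> 2 true positives out of 3 predicted and 3 actual
print(round(fmeasure_np(y_true, y_pred), 4))  # 0.6667
```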

We're going to train the model with Adam and a low learning rate (1e-4). An SGD alternative is left commented out.

In [16]:
early_stp = EarlyStopping(monitor="val_loss", mode='min', patience=5)
model_ckpt = ModelCheckpoint('mobilenet_1_layer.h5', save_best_only=True, mode='min', monitor='val_loss', verbose=1, save_weights_only=True)

# opt = optimizers.SGD(lr=0.001, decay = 1e-6, momentum = 0.9, nesterov = True)
opt = optimizers.Adam(lr=1e-4)
model.compile(opt, loss = 'binary_crossentropy', metrics=['accuracy', fmeasure])
model.reset_states()
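A rough pure-Python model of how the two callbacks interact, assuming Keras 2 semantics with the default `min_delta=0`; it is an approximation, not the Keras source, and the exact off-by-one behavior of `patience` has varied across Keras versions. It also shows why In [19] reloads `mobilenet_1_layer.h5`: when training stops, the model in memory holds the last epoch's weights, while the checkpoint holds the best ones. The helper name and loss values are hypothetical.

```python
def simulate_early_stopping(val_losses, patience=5):
    """Approximates EarlyStopping(monitor='val_loss', mode='min', patience=...) together with
    ModelCheckpoint(save_best_only=True). Returns (epochs_run, best_epoch, best_loss), 1-indexed."""
    best_loss, best_epoch, wait = float('inf'), 0, 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0  # the checkpoint is written here
        else:
            wait += 1
            if wait >= patience:  # `patience` epochs in a row without improvement
                return epoch, best_epoch, best_loss
    return len(val_losses), best_epoch, best_loss


# Hypothetical validation losses: improvement stops after epoch 2
print(simulate_early_stopping([0.50, 0.40, 0.41, 0.42, 0.43, 0.44, 0.45, 0.30]))
# (7, 2, 0.4)
```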

In [17]:
model.fit_generator(train_datagen.flow(X_tra, y_tra, batch_size=batch_size),
                    steps_per_epoch=int(np.ceil(len(X_tra) / batch_size)), epochs=100,
                    validation_data=val_datagen.flow(X_val, y_val, batch_size=batch_size),
                    validation_steps=int(np.ceil(len(X_val) / batch_size)),
                    callbacks=[early_stp, model_ckpt], workers=10, max_queue_size=20)

Epoch 1/100

Epoch 00001: val_loss improved from inf to 0.52820, saving model to mobilenet_1_layer.h5
Epoch 2/100

Epoch 00002: val_loss improved from 0.52820 to 0.47458, saving model to mobilenet_1_layer.h5
Epoch 3/100

Epoch 00003: val_loss improved from 0.47458 to 0.43761, saving model to mobilenet_1_layer.h5
Epoch 4/100

Epoch 00004: val_loss improved from 0.43761 to 0.40839, saving model to mobilenet_1_layer.h5
Epoch 5/100

Epoch 00005: val_loss improved from 0.40839 to 0.38550, saving model to mobilenet_1_layer.h5
Epoch 6/100

Epoch 00006: val_loss improved from 0.38550 to 0.36878, saving model to mobilenet_1_layer.h5
Epoch 7/100

Epoch 00007: val_loss improved from 0.36878 to 0.35468, saving model to mobilenet_1_layer.h5
Epoch 8/100

Epoch 00008: val_loss improved from 0.35468 to 0.34303, saving model to mobilenet_1_layer.h5
Epoch 9/100

Epoch 00009: val_loss improved from 0.34303 to 0.33243, saving model to mobilenet_1_layer.h5
Epoch 10/100

Epoch 00010: val_loss improved from 

Epoch 32/100

Epoch 00032: val_loss did not improve
Epoch 33/100

Epoch 00033: val_loss improved from 0.26254 to 0.26149, saving model to mobilenet_1_layer.h5
Epoch 34/100

Epoch 00034: val_loss improved from 0.26149 to 0.26098, saving model to mobilenet_1_layer.h5
Epoch 35/100

Epoch 00035: val_loss improved from 0.26098 to 0.26009, saving model to mobilenet_1_layer.h5
Epoch 36/100

Epoch 00036: val_loss improved from 0.26009 to 0.25906, saving model to mobilenet_1_layer.h5
Epoch 37/100

Epoch 00037: val_loss improved from 0.25906 to 0.25837, saving model to mobilenet_1_layer.h5
Epoch 38/100

Epoch 00038: val_loss improved from 0.25837 to 0.25812, saving model to mobilenet_1_layer.h5
Epoch 39/100

Epoch 00039: val_loss did not improve
Epoch 40/100

Epoch 00040: val_loss improved from 0.25812 to 0.25796, saving model to mobilenet_1_layer.h5
Epoch 41/100

Epoch 00041: val_loss improved from 0.25796 to 0.25359, saving model to mobilenet_1_layer.h5
Epoch 42/100

Epoch 00042: val_loss did 


Epoch 00066: val_loss did not improve


<keras.callbacks.History at 0x7f3c775ac668>
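In Python 3, `len(X) / batch_size` is a float (for example 3000 / 128 = 23.4375). The notebook ran with those float values, but an integer step count makes the number of batches per epoch explicit. A small sketch of the ceil-based count (hypothetical helper):

```python
import math


def steps_for(n_samples, batch_size):
    """Batches needed to cover every sample once; the last batch may be partial."""
    return math.ceil(n_samples / batch_size)


print(steps_for(9600, 128), steps_for(3000, 128))  # 75 24
print(steps_for(9600, 64), steps_for(3000, 64))    # 150 47
```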

In [18]:
model = get_model()

# make every layer trainable and fine-tune the whole network
for layer in model.layers:
    layer.trainable = True

opt = optimizers.Adam(lr=1e-4)
# opt = optimizers.SGD(lr=0.001, decay = 1e-6, momentum = 0.9, nesterov = True)
model.compile(opt, loss = 'binary_crossentropy', metrics=['accuracy', fmeasure])
    
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
conv1 (Conv2D)               (None, 112, 112, 32)      864       
_________________________________________________________________
conv1_bn (BatchNormalization (None, 112, 112, 32)      128       
_________________________________________________________________
conv1_relu (Activation)      (None, 112, 112, 32)      0         
_________________________________________________________________
conv_dw_1 (DepthwiseConv2D)  (None, 112, 112, 32)      288       
_________________________________________________________________
conv_dw_1_bn (BatchNormaliza (None, 112, 112, 32)      128       
_________________________________________________________________
conv_dw_1_relu (Activation)  (None, 112, 112, 32)      0         
__________

In [19]:
early_stp = EarlyStopping(monitor="val_loss", mode='min', patience=5)
model_ckpt = ModelCheckpoint('mobilenet_all_layers.h5', save_best_only=True, mode='min', monitor='val_loss', verbose=1, save_weights_only=True)

model.load_weights('mobilenet_1_layer.h5')


In [22]:
batch_size = 64
model.fit_generator(train_datagen.flow(X_tra, y_tra, batch_size=batch_size),
                    steps_per_epoch=int(np.ceil(len(X_tra) / batch_size)), epochs=100,
                    validation_data=val_datagen.flow(X_val, y_val, batch_size=batch_size),
                    validation_steps=int(np.ceil(len(X_val) / batch_size)),
                    callbacks=[early_stp, model_ckpt], workers=10, max_queue_size=20)

Epoch 1/100

Epoch 00001: val_loss improved from inf to 0.17746, saving model to mobilenet_all_layers.h5
Epoch 2/100

Epoch 00002: val_loss improved from 0.17746 to 0.16941, saving model to mobilenet_all_layers.h5
Epoch 3/100

Epoch 00003: val_loss improved from 0.16941 to 0.15560, saving model to mobilenet_all_layers.h5
Epoch 4/100

Epoch 00004: val_loss improved from 0.15560 to 0.14650, saving model to mobilenet_all_layers.h5
Epoch 5/100

Epoch 00005: val_loss improved from 0.14650 to 0.12550, saving model to mobilenet_all_layers.h5
Epoch 6/100

Epoch 00006: val_loss improved from 0.12550 to 0.12521, saving model to mobilenet_all_layers.h5
Epoch 7/100

Epoch 00007: val_loss improved from 0.12521 to 0.11905, saving model to mobilenet_all_layers.h5
Epoch 8/100

Epoch 00008: val_loss did not improve
Epoch 9/100

Epoch 00009: val_loss improved from 0.11905 to 0.11271, saving model to mobilenet_all_layers.h5
Epoch 10/100

Epoch 00010: val_loss improved from 0.11271 to 0.10769, saving mode

<keras.callbacks.History at 0x7f3935d98fd0>

In [23]:
test_imgs = get_imgs(test_data_dir, test)

100%|██████████| 5400/5400 [01:17<00:00, 69.85it/s]


In [24]:
# fit() only matters for featurewise options; with just rescale set it has no effect
test_datagen.fit(test_imgs)

In [25]:
pred = model.predict_generator(test_datagen.flow(test_imgs, batch_size=512, shuffle=False), verbose=1, workers=8)



In [26]:
sub = pd.read_csv('./data/sample_submission.csv')
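# round() turns each sigmoid output into 0/1, i.e. a 0.5 threshold per attribute;
# assumes sample_submission.csv lists the images in the same order as test.csv
sub.iloc[:, 1:] = pred.round().astype(int)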
sub.head()

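| | Image_name | attrib_01 | attrib_02 | attrib_03 | attrib_04 | attrib_05 | attrib_06 | attrib_07 | attrib_08 | attrib_09 | ... | attrib_76 | attrib_77 | attrib_78 | attrib_79 | attrib_80 | attrib_81 | attrib_82 | attrib_83 | attrib_84 | attrib_85 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Image-1.jpg | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 |
| 1 | Image-2.jpg | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | ... | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 |
| 2 | Image-3.jpg | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | ... | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 |
| 3 | Image-4.jpg | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | ... | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 |
| 4 | Image-5.jpg | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 |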


In [27]:
sub.to_csv('submission.csv', index=False)

In [28]:
sub.shape

(5400, 86)

## Final Thoughts

This submission should get you around $\approx 0.80$ on the leaderboard (LB). If you look at the last epochs, the validation F-measure (val_fmeasure) is about the same as the LB score, which suggests that our validation set is representative of the test set. That means you can train for many epochs with EarlyStopping and rely on the validation score to tell you when the model starts overfitting the training set.

### How to improve from here?
There are many things you can change to get a higher LB score. Here is a short list:
<ul>
    <li> Use a larger input image size </li>
    <li> Increase the number of epochs in the fully trainable network (the $2^{nd}$ training stage) </li>
    <li> Use a different architecture; you'll find more options [here](https://keras.io/applications/)</li>
    <li> If nothing else works, ensembling is your best friend </li>
</ul>
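A minimal sketch of one common way to ensemble multi-label models: average the per-attribute sigmoid probabilities from several models (for example several `pred` arrays from different architectures or seeds) and threshold the mean at 0.5. The helper name and example numbers are hypothetical.

```python
import numpy as np


def average_ensemble(prob_list, threshold=0.5):
    """Average per-attribute probabilities from several models, then threshold to 0/1."""
    mean_probs = np.mean(np.stack(prob_list, axis=0), axis=0)
    return (mean_probs > threshold).astype(int)


# Predictions of three hypothetical models for one image and two attributes
p1 = np.array([[0.9, 0.4]])
p2 = np.array([[0.7, 0.2]])
p3 = np.array([[0.2, 0.7]])
print(average_ensemble([p1, p2, p3]))  # [[1 0]]
```

Averaging probabilities before thresholding lets a confident model outvote two unsure ones, which a plain majority vote on already-rounded 0/1 predictions cannot do.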

I'll try to keep improving this notebook. Feel free to contribute.

Thanks
