## DL3
Follow this notebook only if you're new to deep learning and transfer learning. This is an extension of the starter kit given [here](https://github.com/shubham3121/DL-3/blob/master/DL%233_EDA.ipynb). I'll try to keep it simple. Please ignore the typos :)

### Why use the MobileNet architecture?
You might have seen multiple tutorials on VGG16-based transfer learning, but here I'm going to use MobileNet for the following reasons:
<ul>
    <li> MobileNet has far fewer parameters to train than VGG16.</li>
    <li> Fewer parameters mean shorter training time, so you can run more experiments, which improves your chances of winning.</li>
    <li> On top of that, MobileNet performs similarly to VGG16 on the ImageNet dataset.</li>
</ul>
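To get a feel for where the parameter savings come from: MobileNet replaces most standard convolutions with depthwise separable ones (a depthwise convolution followed by a 1x1 pointwise convolution). The sketch below counts the weights of a single layer both ways. The layer sizes are made up for illustration and are not taken from either network, and biases are ignored.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel, c_in, c_out):
    """Weights in a depthwise (kernel x kernel per input channel) plus
    pointwise (1 x 1) convolution pair (bias ignored)."""
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# Illustrative layer: 3x3 kernel, 256 input channels, 256 output channels.
std = standard_conv_params(3, 256, 256)
sep = separable_conv_params(3, 256, 256)
print(std, sep, round(std / sep, 1))
```

For this one layer the separable version needs roughly 8 to 9 times fewer weights, which is the kind of gap that makes the whole network lighter.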

Having said that, let's move on to importing the libraries we need. Note that the code cells below load `ResNet50` from `keras.applications`, not MobileNet.

In [1]:
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import GlobalAveragePooling2D
from keras.layers import Activation, Dropout, Flatten, Dense
from keras.applications import ResNet50
from keras import optimizers

from keras.callbacks import EarlyStopping, ModelCheckpoint
from keras.models import Model

from sklearn.model_selection import train_test_split
import pandas as pd
from tqdm import tqdm
import gc
import cv2 as cv

Using TensorFlow backend.


In [2]:
import numpy as np
import tensorflow as tf
import random as rn

# The below is necessary in Python 3.2.3 onwards to
# have reproducible behavior for certain hash-based operations.
# See these references for further details:
# https://docs.python.org/3.4/using/cmdline.html#envvar-PYTHONHASHSEED
# https://github.com/keras-team/keras/issues/2280#issuecomment-306959926

import os
os.environ['PYTHONHASHSEED'] = '0'

# The below is necessary for starting Numpy generated random numbers
# in a well-defined initial state.

np.random.seed(42)

# The below is necessary for starting core Python generated random numbers
# in a well-defined state.

rn.seed(12345)

# Force TensorFlow to use single thread.
# Multiple threads are a potential source of
# non-reproducible results.
# For further details, see: https://stackoverflow.com/questions/42022950/which-seeds-have-to-be-set-where-to-realize-100-reproducibility-of-training-res

session_conf = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)

from keras import backend as K

# The below tf.set_random_seed() will make random number generation
# in the TensorFlow backend have a well-defined initial state.
# For further details, see: https://www.tensorflow.org/api_docs/python/tf/set_random_seed

tf.set_random_seed(1234)

sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
K.set_session(sess)
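
A quick way to convince yourself that a seed pins down the random stream, using only Python's `random` module (the `draw` helper is a name I made up for this illustration). The same idea applies to the NumPy and TensorFlow seeds above, although GPU kernels can still introduce small run-to-run differences.

```python
import random


def draw(seed, n=3):
    """Return n pseudo-random floats after seeding Python's generator."""
    random.seed(seed)
    return [random.random() for _ in range(n)]


first = draw(12345)
second = draw(12345)
different = draw(54321)

print(first == second)      # same seed, same sequence
print(first == different)   # different seed, different sequence
```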


I'm going to use 197x197 images, as set in the cell below. You can change that if you wish.

My folder structure is as follows:

<ul>
    <li> DL3</li>
        <ul>
            <li> starter_kit</li>
                <ul>
                    <li> this_notebook</li>
                </ul>
            <li> data</li>
                <ul>
                    <li> train_img</li>
                    <li> test_img</li>
                </ul>
    </ul>
</ul>

In [3]:
img_width, img_height = (197, 197)

train_data_dir = '/home/suraj/Repositories/Datasets/DL3 Dataset/train_img/'
test_data_dir = '/home/suraj/Repositories/Datasets/DL3 Dataset/test_img/'
epochs = 10
batch_size = 128

if K.image_data_format() == 'channels_first':
    input_shape = (3, img_width, img_height)
else:
    input_shape = (img_width, img_height, 3)


In [4]:
ResNet_model = ResNet50(include_top=False, input_shape=input_shape)

In [5]:
def get_model():
    # add a global spatial average pooling layer
    x = ResNet_model.output
    x = GlobalAveragePooling2D()(x)
    x = Dense(1024, activation='relu')(x)
    predictions = Dense(85, activation='sigmoid')(x)
    model = Model(inputs=ResNet_model.input, outputs=predictions)
    
    return model

In [6]:
model = get_model()

We'll start by training only the head (the last layer), since it is randomly initialized and we don't want backpropagation from it to disturb the pretrained weights of the other layers.
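
What "frozen" means in practice can be shown with a toy update rule. This is only a sketch with one made-up weight per layer (`backbone`, `head`) and plain SGD; it is not how Keras implements `trainable`, just the effect it has on the weights.

```python
def sgd_step(weights, grads, trainable, lr=0.001):
    """Apply one plain SGD update, skipping layers marked as frozen."""
    updated = {}
    for name, w in weights.items():
        if trainable[name]:
            updated[name] = w - lr * grads[name]
        else:
            updated[name] = w  # frozen: the gradient is ignored
    return updated


# Hypothetical one-weight-per-layer "network" for illustration.
weights = {"backbone": 0.80, "head": 0.10}
grads = {"backbone": 5.0, "head": 5.0}
trainable = {"backbone": False, "head": True}

new_weights = sgd_step(weights, grads, trainable)
print(new_weights)
```

Only the `head` entry moves; the pretrained `backbone` value comes out unchanged, which is what we want for the first training run.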

In [7]:
# train only the last layer; freeze everything before it
for layer in model.layers[:-1]:
    layer.trainable = False

model.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            (None, 197, 197, 3)  0                                            
__________________________________________________________________________________________________
conv1_pad (ZeroPadding2D)       (None, 203, 203, 3)  0           input_1[0][0]                    
__________________________________________________________________________________________________
conv1 (Conv2D)                  (None, 99, 99, 64)   9472        conv1_pad[0][0]                  
__________________________________________________________________________________________________
bn_conv1 (BatchNormalization)   (None, 99, 99, 64)   256         conv1[0][0]                      
__________________________________________________________________________________________________
activation

In [8]:
# only rescaling is configured here; no augmentation is applied
train_datagen = ImageDataGenerator(rescale=1. / 255)

val_datagen = ImageDataGenerator(rescale=1./255)

test_datagen = ImageDataGenerator(rescale=1./255)

In [9]:
train = pd.read_csv('/home/suraj/Repositories/Datasets/DL3 Dataset/meta-data/train.csv', index_col=0)
test = pd.read_csv('/home/suraj/Repositories/Datasets/DL3 Dataset/meta-data/test.csv')
attributes = pd.read_csv('/home/suraj/Repositories/Datasets/DL3 Dataset/attributes.txt', delimiter='\t', header=None, index_col=0)
classes = pd.read_csv('/home/suraj/Repositories/Datasets/DL3 Dataset/classes.txt', delimiter='\t', header=None, index_col=0)

In [10]:
def get_imgs(src, df, labels=False):
    if not labels:
        imgs = []    
        files = df['Image_name'].values
        for file in tqdm(files):
            im = cv.imread(os.path.join(src, file))
            im = cv.resize(im, (img_width, img_height))
            imgs.append(im)
        return np.array(imgs)
    else:
        imgs = []
        labels = []
        files = os.listdir(src)
        for file in tqdm(files):
            im = cv.imread(os.path.join(src, file))
            im = cv.resize(im, (img_width, img_height))
            imgs.append(im)
            labels.append(df.loc[file].values)
        return np.array(imgs), np.array(labels)

In [11]:
train_imgs, train_labels = get_imgs(train_data_dir, train, True)

100%|██████████| 12600/12600 [04:59<00:00, 42.04it/s]


In [12]:
#train val split
X_tra, X_val, y_tra, y_val = train_test_split(train_imgs, train_labels, test_size = 3000, random_state = 222)
gc.collect()

6288

In [15]:
gc.collect()

0

In [20]:
def fmeasure(y_true, y_pred):
    def recall(y_true, y_pred):
        """Recall metric.

        Only computes a batch-wise average of recall.

        Computes the recall, a metric for multi-label classification of
        how many relevant items are selected.
        """
        true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
        possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
        recall = true_positives / (possible_positives + K.epsilon())
        return recall

    def precision(y_true, y_pred):
        """Precision metric.

        Only computes a batch-wise average of precision.

        Computes the precision, a metric for multi-label classification of
        how many selected items are relevant.
        """
        true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
        predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
        precision = true_positives / (predicted_positives + K.epsilon())
        return precision
    precision = precision(y_true, y_pred)
    recall = recall(y_true, y_pred)
    # K.epsilon() guards against 0/0 when a batch has no true positives
    return 2*((precision*recall)/(precision+recall+K.epsilon()))
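
To check the metric by hand, here is the same computation in plain Python on a tiny flattened batch. The labels and scores are invented for the example, and the small `eps` in the final division is my own guard against dividing by zero.

```python
def f_measure(y_true, y_pred, eps=1e-7):
    """Batch-wise F1 over flattened multi-label targets.

    y_true holds 0/1 labels, y_pred holds sigmoid outputs in [0, 1].
    """
    hard = [1 if p > 0.5 else 0 for p in y_pred]
    true_positives = sum(t * h for t, h in zip(y_true, hard))
    precision = true_positives / (sum(hard) + eps)
    recall = true_positives / (sum(y_true) + eps)
    return 2 * precision * recall / (precision + recall + eps)


y_true = [1, 0, 1, 1, 0, 0]
y_pred = [0.9, 0.2, 0.4, 0.8, 0.7, 0.1]

# 2 true positives, 3 predicted positives, 3 actual positives
score = f_measure(y_true, y_pred)
print(round(score, 3))
```

Precision and recall are both 2/3 here, so the F-measure is 2/3 as well.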

We're going to train our model with SGD and a very low learning rate.
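
The `decay` argument makes the learning rate shrink a little with every update. As far as I understand the Keras SGD optimizer used here, the schedule is time-based: `lr / (1 + decay * iterations)`. The sketch below assumes that formula and uses 384 updates per epoch (9600 training images at batch size 25); `decayed_lr` is a helper name of my own.

```python
def decayed_lr(base_lr, decay, iteration):
    """Time-based decay: the learning rate after `iteration` updates."""
    return base_lr / (1.0 + decay * iteration)


steps_per_epoch = 384  # 9600 training images / batch size 25

for epoch in (0, 10, 25):
    iteration = epoch * steps_per_epoch
    print(epoch, decayed_lr(0.001, 1e-6, iteration))
```

With `decay = 1e-6` the rate drops by only about 1% over 25 epochs, so the learning rate stays close to 0.001 for the whole run.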

In [21]:
early_stp = EarlyStopping(patience=3)
model_ckpt = ModelCheckpoint('resNet.h5', save_weights_only=True)

opt = optimizers.SGD(lr=0.001, decay = 1e-6, momentum = 0.9, nesterov = True)
model.compile(opt, loss = 'binary_crossentropy', metrics=['accuracy', fmeasure])
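
`EarlyStopping(patience=3)` watches the validation loss by default and stops once it has failed to improve for 3 epochs in a row. The sketch below is a simplified version of that rule, not the Keras implementation, and the loss values are invented.

```python
def stopping_epoch(val_losses, patience=3):
    """Return the 0-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


history = [0.50, 0.40, 0.41, 0.42, 0.43, 0.30]
print(stopping_epoch(history))
```

Note that the run stops at the fifth epoch (index 4) and never sees the better value that follows, which is the trade-off of a small patience.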

In [16]:
model.fit(X_tra, y_tra,                  
                    steps_per_epoch=len(X_tra) / 25, epochs=25, batch_size=25,
                    validation_data=(X_val, y_val), 
                    validation_steps = len(X_val)/25, callbacks=[early_stp, model_ckpt], workers = 10, max_queue_size=20)

Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Epoch 23/25
Epoch 24/25
Epoch 25/25


<keras.callbacks.History at 0x7f1a0bbc96d0>
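
The step counts passed to `fit` follow from the split sizes. `len(X_tra) / 25` is a float in Python 3, so a safer way to write it (my suggestion, not what the cell above does) is to round up to an integer, which also covers a final partial batch.

```python
import math

n_images = 12600   # files loaded by get_imgs
n_val = 3000       # test_size passed to train_test_split
batch_size = 25

n_train = n_images - n_val
steps_per_epoch = math.ceil(n_train / batch_size)
validation_steps = math.ceil(n_val / batch_size)

print(n_train, steps_per_epoch, validation_steps)
```

The 384 training steps match the `88/384` progress bar shown in the second training run.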

In [14]:
model = get_model()

# unfreeze every layer so the whole network is fine-tuned
for layer in model.layers:
    layer.trainable = True

opt = optimizers.SGD(lr=0.001, decay = 1e-6, momentum = 0.9, nesterov = True)
model.compile(opt, loss = 'binary_crossentropy', metrics=['accuracy', fmeasure])
    
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 128, 128, 3)       0         
_________________________________________________________________
conv1_pad (ZeroPadding2D)    (None, 130, 130, 3)       0         
_________________________________________________________________
conv1 (Conv2D)               (None, 64, 64, 32)        864       
_________________________________________________________________
conv1_bn (BatchNormalization (None, 64, 64, 32)        128       
_________________________________________________________________
conv1_relu (Activation)      (None, 64, 64, 32)        0         
_________________________________________________________________
conv_pad_1 (ZeroPadding2D)   (None, 66, 66, 32)        0         
_________________________________________________________________
conv_dw_1 (DepthwiseConv2D)  (None, 64, 64, 32)        288       
__________

In [15]:
early_stp = EarlyStopping(patience=3)
model_ckpt = ModelCheckpoint('resnet_all_layers.h5', save_weights_only=True)
# start from the head-only weights saved by the first training run
model.load_weights('resNet.h5')


In [None]:
model.fit(X_tra, y_tra,                  
                    steps_per_epoch=len(X_tra) / 25, epochs=25, batch_size=25,
                    validation_data=(X_val, y_val), 
                    validation_steps = len(X_val)/25, callbacks=[early_stp, model_ckpt], workers = 10, max_queue_size=20)

Epoch 1/75
Epoch 2/75
Epoch 3/75
Epoch 4/75
Epoch 5/75
Epoch 6/75
Epoch 7/75
Epoch 8/75
Epoch 9/75
Epoch 10/75
Epoch 11/75
Epoch 12/75
Epoch 13/75
Epoch 14/75
Epoch 15/75
Epoch 16/75
Epoch 17/75
Epoch 18/75
Epoch 19/75
Epoch 20/75
Epoch 21/75
 88/384 [=====>........................] - ETA: 44s - loss: 0.2298 - acc: 0.9033 - fmeasure: 0.8660

In [19]:
test_imgs = get_imgs(test_data_dir, test)

100%|██████████| 5400/5400 [01:02<00:00, 86.87it/s]


In [20]:
test_datagen.fit(test_imgs)

In [21]:
# predict() keeps the input order and has no shuffle argument
pred = model.predict(test_imgs, batch_size=512, verbose=1)



In [22]:
sub = pd.read_csv('/home/suraj/Repositories/Datasets/DL3 Dataset/meta-data/sample_submission.csv')
sub.iloc[:, 1:] = pred.round().astype(int)
sub.head()

Unnamed: 0,Image_name,attrib_01,attrib_02,attrib_03,attrib_04,attrib_05,attrib_06,attrib_07,attrib_08,attrib_09,...,attrib_76,attrib_77,attrib_78,attrib_79,attrib_80,attrib_81,attrib_82,attrib_83,attrib_84,attrib_85
0,Image-1.jpg,0,0,0,1,0,1,0,0,1,...,0,0,1,1,1,0,0,1,0,0
1,Image-2.jpg,0,1,0,1,0,1,1,0,0,...,0,1,0,1,0,0,0,0,1,1
2,Image-3.jpg,0,0,0,1,0,0,1,0,0,...,0,0,0,1,0,0,0,0,0,1
3,Image-4.jpg,1,0,0,0,0,1,0,0,1,...,0,1,1,1,0,1,0,1,0,1
4,Image-5.jpg,0,0,0,1,0,0,0,0,0,...,0,0,0,1,0,0,0,1,0,0
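
`pred.round()` amounts to cutting the sigmoid outputs at 0.5. Written out in plain Python with an explicit cut-off, it looks like the sketch below; the `threshold` parameter and the sample probabilities are my own additions for illustration.

```python
def to_labels(probabilities, threshold=0.5):
    """Turn rows of sigmoid outputs into rows of 0/1 attribute labels."""
    return [[1 if p > threshold else 0 for p in row] for row in probabilities]


probs = [[0.91, 0.08, 0.55],
         [0.30, 0.62, 0.49]]

default_labels = to_labels(probs)
stricter_labels = to_labels(probs, threshold=0.6)

print(default_labels)
print(stricter_labels)
```

Since the competition metric is an F-score, trying a cut-off other than 0.5 on the validation set may be worth a quick experiment.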


In [23]:
sub.to_csv('submission.csv', index=False)

In [24]:
sub.shape

(5400, 86)

## Final Thoughts

This submission should get you $\approx$0.80 on the LB. You may have noticed that the val_fmeasure of our last epochs is about the same as that score, which suggests that our validation set is representative of the test set, so you can train for many epochs with EarlyStopping without worrying much about overfitting to the val or train set.

### How to improve from here?
You can change many things to get a higher LB score. Here is a short list:
<ul>
    <li> Increase the image size </li>
    <li> Increase the number of epochs for the fully trainable network ($2^{nd}$ training) </li>
    <li> Use a different architecture. You'll find more info on that [here](keras.io/applications/)</li>
    <li> If nothing else works, ensembling is your best friend </li>
</ul>
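
For the ensembling idea, the simplest variant I know of is to average the predicted probabilities of several models and only then round. Here is a minimal sketch with two hypothetical models and two images with two attributes each; `ensemble_average` is an illustrative helper, not part of Keras.

```python
def ensemble_average(predictions):
    """Average per-attribute probabilities across models.

    predictions is a list with one entry per model; each entry is a list
    of rows (one per image) of sigmoid outputs.
    """
    n_models = len(predictions)
    averaged = []
    for rows in zip(*predictions):          # same image across all models
        averaged.append([sum(vals) / n_models for vals in zip(*rows)])
    return averaged


model_a = [[0.9, 0.2], [0.4, 0.7]]
model_b = [[0.7, 0.4], [0.2, 0.9]]

avg = ensemble_average([model_a, model_b])
labels = [[1 if p > 0.5 else 0 for p in row] for row in avg]
print(labels)
```

With real models, each entry would be the output of `model.predict` on the same test images in the same order.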

I'll try to keep improving this notebook. Feel free to contribute.

Thanks
