**Overview/Goal of this Model:**

We will build a convolutional neural network using transfer learning and apply optimization techniques to improve model performance. This is done for the Kaggle bird species dataset available from https://www.kaggle.com/gpiosenka/100-bird-species. We plan to use the available 25812 training images (the number changes as new versions are published) to train the model, using the validation data (5 images per species) to monitor training, so that we can accurately predict the species of the images in the test dataset. As of May 2nd 2020 this dataset had 190 bird species; this number can change over time.

**Description of data:**
Some important considerations when working with the data are as follows:

1) Size of dataset:

All images provided in the dataset are 224 x 224 x 3 color images in jpg format. We have a total of 25812 training images, 950 test images (5 per species) and 950 validation images (5 per species).

2) Organization of data:

Images for each species are contained in a separate subdirectory within each of the train, test and validation datasets. A "consolidated" image set is also available that combines all the images and would have allowed us to make our own train, validation and test split. For the purpose of this model I have not done that.

3) Distribution of images by gender: Male to female images are in the ratio 4:1 in the dataset. Males can look very different from females of the same species in color (males are generally brighter), size and other physical characteristics, so the classifier is likely to perform worse on images of females. However, since the gender is not labeled on the images, it is not possible to test how the model performs specifically on images of females.

4) In most images the bird is sitting on a branch and is roughly centered in the frame.
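
As a quick check on the counts above, the number of images per species can be tallied directly from the directory layout. The sketch below is a minimal example, assuming one subdirectory per species under each split as described in point 2; the helper name `count_images_per_class` is hypothetical and not part of the original notebook.

```python
import os
from collections import Counter


def count_images_per_class(split_dir, extensions=(".jpg", ".jpeg", ".png")):
    """Count image files in each class subdirectory of split_dir (hypothetical helper)."""
    counts = Counter()
    for class_name in sorted(os.listdir(split_dir)):
        class_dir = os.path.join(split_dir, class_name)
        if not os.path.isdir(class_dir):
            continue  # skip stray files such as CSVs next to the class folders
        counts[class_name] = sum(
            1 for name in os.listdir(class_dir) if name.lower().endswith(extensions)
        )
    return counts
```

For example, calling `count_images_per_class('/kaggle/input/100-bird-species/train')` and summing the values should reproduce the training-set size, and the individual counts show how evenly the images are spread across species.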


**Summary of methods**

Tried different architectures (removed the last few layers of each model and added new dense layers):
* InceptionV3
* VGG16
* MobileNet

Optimizers:
* Adam
* SGD
* RMSprop

Tried to adjust the:
* *decay_rate*: We used this option with the optimizers to gradually lower the learning rate and stabilize training.
* *learning rate*: We used a learning rate of 0.0001, which worked well with earlier models.
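
To illustrate how a decay rate lowers the learning rate over time, here is a minimal sketch of inverse-time decay. To my understanding this is the rule applied by the `decay` argument of the older Keras optimizers, but treat that as an assumption; the function name `decayed_learning_rate` is hypothetical.

```python
def decayed_learning_rate(initial_lr, decay, iteration):
    """Inverse-time decay: the step size shrinks as more batches are processed."""
    return initial_lr / (1.0 + decay * iteration)


# Learning rate 0.0001 and decay 1e-6 are values mentioned in this notebook.
for iteration in (0, 10_000, 1_000_000):
    print(iteration, decayed_learning_rate(1e-4, 1e-6, iteration))
```

With a decay this small the learning rate changes slowly: it only halves after one million batch updates.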

Steps to avoid overfitting by using regularization techniques:
* Dropout on the new dense layers that were added
* Weight decay (L2 regularization)
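
As a rough illustration of what a weight penalty adds to the loss, the sketch below computes L1 and L2 penalties for a small list of weights. The L1 variant is included because the model cell in this notebook uses `regularizers.l1`. The function names and the example numbers are hypothetical and chosen only for illustration.

```python
def l1_penalty(weights, factor):
    """L1 penalty: factor times the sum of absolute weights."""
    return factor * sum(abs(w) for w in weights)


def l2_penalty(weights, factor):
    """L2 penalty: factor times the sum of squared weights."""
    return factor * sum(w * w for w in weights)


weights = [0.5, -1.0, 2.0]   # hypothetical layer weights
data_loss = 0.30             # hypothetical cross-entropy loss
total_loss = data_loss + l2_penalty(weights, 0.01)
print(total_loss)
```

Large weights raise the total loss, so the optimizer is pushed toward smaller weights, which tends to reduce overfitting.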

The goal of the model is to generalize well so that it performs well on test data it has never seen. Data augmentation helps by generating more training data from the existing training samples, applying a number of random transformations to them as follows:

* *rotation_range* is a value in degrees (0–180), a range within which to randomly rotate pictures. We tried values of 20 to 40 in different attempts.
* *width_shift_range* and *height_shift_range* are ranges (as a fraction of total width or height) within which to randomly translate pictures horizontally or vertically. We tried shifts of 20% horizontally and vertically.
* *shear_range* is for randomly applying shearing transformations. We tried a shear_range of 20%.
* *zoom_range* is for randomly zooming inside pictures. Here again we tried a zoom_range of 20%.
* *horizontal_flip* is for randomly flipping half the images horizontally, which is relevant when there are no assumptions of horizontal asymmetry (for example, real-world pictures). Since all the birds in the images seem to be sitting upright, I did not try this option.
* *fill_mode* is the strategy used for filling in newly created pixels, which can appear after a rotation or a width/height shift. We used `nearest` here.
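
The options above map onto keyword arguments of Keras' `ImageDataGenerator`. The sketch below collects them in one place; it is an illustration under stated assumptions rather than the exact configuration of any run reported here. `rotation_range=40` is the upper end of the values tried, `rescale` is taken from the generators defined later in this notebook, and the helper name `build_augmenting_datagen` is hypothetical.

```python
# Settings taken from the list above (horizontal_flip is left out, as in the text).
AUGMENTATION_KWARGS = dict(
    rescale=1.0 / 255,       # scale pixel values to [0, 1]
    rotation_range=40,       # degrees
    width_shift_range=0.2,   # fraction of image width
    height_shift_range=0.2,  # fraction of image height
    shear_range=0.2,
    zoom_range=0.2,
    fill_mode="nearest",     # how newly exposed pixels are filled
)


def build_augmenting_datagen(**overrides):
    """Hypothetical helper: build an ImageDataGenerator from the settings above."""
    # Imported here so the settings can be inspected without TensorFlow installed.
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    settings = {**AUGMENTATION_KWARGS, **overrides}
    return ImageDataGenerator(**settings)
```

A generator built this way should only be used for the training images; validation and test images are normally left unaugmented so that the reported metrics reflect the original data.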

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

#Used to make data more uniform across screen.
from IPython.display import display, HTML  # importing display from IPython.core.display is deprecated
display(HTML("<style>.container { width:95% !important; }</style>"))

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os

# Load the REQUIRED libraries
# This magic function not found
#%tensorflow_version 2.x
#from tensorflow.keras.datasets import mnist
from tensorflow.keras import backend, models, layers, regularizers
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.callbacks import EarlyStopping
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

# numpy and pandas are already imported above
from tensorflow.keras.preprocessing import image
# Libraries needed to build model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, MaxPool2D, Flatten,Dropout,AveragePooling2D,DepthwiseConv2D
# Library to read the Kaggle authentication file
import json
from IPython.display import display # Library to help view images
from PIL import Image # Library to help view images
from tensorflow.keras.preprocessing.image import ImageDataGenerator # Library for data augmentation

import os, shutil # Library for navigating files

# These imports are only needed in Google Colab
#from google.colab import drive # Library to mount google drives
# Colab library to upload files to notebook
#from google.colab import files

import time

from tensorflow.keras.applications import MobileNet
from tensorflow.keras.applications.mobilenet import preprocess_input
from tensorflow.keras.optimizers import Adam

# Note: IPython.display.Image is not imported here because it would shadow
# PIL.Image, which was imported above and is used to open images later on.

from tensorflow.keras.utils import plot_model # This will print model architecture.

from sklearn.metrics import classification_report,confusion_matrix
import seaborn as sns

#for dirname, _, filenames in os.walk('/kaggle/input'):
#    for filename in filenames:
#        print(os.path.join(dirname, filename))

# Move files to /kaggle/working directory
#!cp -r /kaggle/input/100-bird-species/ /kaggle/working/190-bird-species
# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

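# Collect the paths of all JPEG images under the input directory
list_image = []
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        if filename.lower().endswith('.jpg'):  # skip non-image files such as CSVs
            list_image.append(os.path.join(dirname, filename))

# View the first 25 images in a 5 x 5 grid
plt.figure(figsize=(10, 10))
for i, path in enumerate(list_image[:25]):
    plt.subplot(5, 5, i + 1)
    img = mpimg.imread(path)  # named img so it does not shadow the keras image module
    plt.imshow(img)
    plt.axis('off')
plt.show()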



**Summary of model:**
1. The model below uses the pre-trained MobileNet network to transfer weights to our model, together with data augmentation. I have kept (frozen) the weights of all MobileNet layers and added a few more dense layers to train on the new dataset.
1. We add a dense layer with dropout to this architecture before predicting the multiple classes.
1. We use the Adam optimizer with a learning rate of 1e-5 and a decay rate of 1e-6. (Note that the compile call shown below passes the string `'adam'`, which selects the Keras default settings for Adam.)
1. Used EarlyStopping based on accuracy monitoring with a patience of 5 and with restore_best_weights=True.
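
To show what a patience of 5 means in practice, here is a minimal pure-Python sketch of the early stopping rule for a metric that should increase, such as accuracy. It mirrors the general idea of the Keras callback but is not its implementation; the function name `early_stopping_epoch` and the example accuracy values are hypothetical.

```python
def early_stopping_epoch(metric_history, patience=5):
    """Return (stop_epoch, best_epoch) for a metric where higher is better.

    Training stops once the metric has failed to improve for `patience`
    consecutive epochs. Epochs are counted from 0.
    """
    best_value = float("-inf")
    best_epoch = None
    wait = 0
    for epoch, value in enumerate(metric_history):
        if value > best_value:
            best_value, best_epoch, wait = value, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # The metric history ended before patience ran out
    return len(metric_history) - 1, best_epoch


# Hypothetical accuracy values: the best epoch is 2, then five epochs without improvement
accuracies = [0.10, 0.20, 0.30, 0.29, 0.28, 0.27, 0.26, 0.25, 0.40]
stop_epoch, best_epoch = early_stopping_epoch(accuracies, patience=5)
print(stop_epoch, best_epoch)
```

With `restore_best_weights=True`, the weights from `best_epoch` are the ones kept after training stops, so the later epochs that did not improve are discarded.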

In [None]:
# We will use the MobileNet CNN that was trained on ImageNet data which gives better performance than others like Inception_V3 and VGG16
mobile = MobileNet(include_top=False,input_shape=(224,224,3),pooling='avg', weights='imagenet',alpha=1, depth_multiplier=1)



In [None]:
mobile.summary()  # summary() prints the table itself and returns None

In [None]:
plot_model(mobile, show_layer_names=True,show_shapes=True ) 

In [None]:
# Specify the training, validation, and test directories.
base_dir='/kaggle/input/100-bird-species'
train_dir = os.path.join(base_dir,'train')
validation_dir = os.path.join(base_dir,'valid')
test_dir = os.path.join(base_dir,'test')
consolidated_dir=os.path.join(base_dir,'consolidated')

# The number of bird species to predict can change between dataset versions, so we derive it from the training directory
class_num=len(os.listdir(train_dir))

In [None]:
import math
def round_down(n, decimals=0):
    multiplier = 10 ** decimals
    return math.floor(n * multiplier) / multiplier

In [None]:
train_batch = 128
# Data generator for the training set: rescale pixels to [0, 1], then center and normalize each image.
# Random augmentation options such as horizontal_flip are disabled in this run.
train_datagen = ImageDataGenerator(rescale=1./255,
                                   #horizontal_flip=True, # Flip image horizontally
                                   samplewise_center=True,
                                   samplewise_std_normalization=True,
                                   fill_mode='nearest')
# Apply the same per-image normalization at test time so that evaluation inputs match the training inputs
test_datagen = ImageDataGenerator(rescale=1./255,
                                  samplewise_center=True,
                                  samplewise_std_normalization=True)

# Since the images are stored in directories, we need to stream them from the directories into the model.
# Keras has a function that makes this easy. Documentation is here: https://keras.io/preprocessing/image/

train_generator = train_datagen.flow_from_directory(
    train_dir, # The directory where the train data is located
    target_size=(224, 224), # Resize the images to 224 by 224 pixels so that all images are the same size.
    batch_size=train_batch, # We will take images in batches of train_batch (128).
    class_mode='categorical') # The classification is multiple categories.

# The validation generator reuses train_datagen so that it gets the same normalization (default batch_size is 32)
validation_generator = train_datagen.flow_from_directory(
    validation_dir,
    target_size=(224, 224),
    class_mode='categorical')

test_generator = test_datagen.flow_from_directory(
    test_dir,
    target_size=(224, 224),
    class_mode='categorical')

In [None]:
prediction_generator = test_datagen.flow_from_directory(
    test_dir,
    target_size=(224, 224),
    class_mode='categorical',
    shuffle=False)

In [None]:
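# Step counts must be integers, so convert the floats returned by round_down
train_steps_per_epoch = int(round_down(len(train_generator.filenames) / train_batch))
validation_batch = 32  # matches the default batch_size of flow_from_directory
valid_steps = int(round_down(len(validation_generator.filenames) / validation_batch))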


In [None]:
backend.clear_session()
# We will use the MobileNet CNN that was trained on ImageNet data which gives better performance than others like Inception_V3 and VGG16
mobile = MobileNet(include_top=False,input_shape=(224,224,3),weights='imagenet')
model_name='MobileNet'
mobile.trainable = False
mobileMod= models.Sequential()
mobileMod.add(mobile)
mobileMod.add(layers.Flatten())
mobileMod.add(layers.Dense(256, activation = 'relu',kernel_regularizer = regularizers.l1(0.00001)))
mobileMod.add(layers.Dropout(0.4))
mobileMod.add(layers.Dense(class_num, activation = 'softmax'))

# We will still use the same generators with data augmentation defined above
epoch=40
start_3 = time.perf_counter()

mobileMod.compile(optimizer = 'adam',
    loss = 'categorical_crossentropy',
    metrics = ['accuracy'])

# fit() accepts generators directly; fit_generator is deprecated in TensorFlow 2.1 and later
history = mobileMod.fit(
    train_generator,
    steps_per_epoch=train_steps_per_epoch,
    epochs=epoch,
    validation_data=validation_generator,
    validation_steps=valid_steps,
    verbose = 2,
    callbacks = [EarlyStopping(monitor='accuracy', patience = 5, restore_best_weights=True)])
end_3 = time.perf_counter()

In [None]:
# Code to plot performance across epochs. The graphs are not informative here because only one epoch was run above; a full run (for example in Google Colab) should give the graphs.
history_dict = history.history
loss_values = history_dict['loss']
val_loss_values = history_dict['val_loss']
acc_values = history_dict['accuracy']
val_acc_values = history_dict['val_accuracy']
epochs = range(1, len(history_dict['accuracy']) + 1)

plt.plot(epochs, loss_values, 'bo', label = 'Training loss')
plt.plot(epochs, val_loss_values, 'b', label = 'Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()

plt.plot(epochs, acc_values, 'bo', label = 'Training accuracy')
plt.plot(epochs, val_acc_values, 'b', label = 'Validation accuracy')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

In [None]:
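# Evaluate the performance of the model on the test dataset. Accuracy is low here because only one epoch was run above.
# evaluate() accepts generators directly; evaluate_generator is deprecated in TensorFlow 2.1 and later
test_loss, test_acc = mobileMod.evaluate(test_generator, steps = valid_steps)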
print('test_acc for {} is {}'.format(model_name,test_acc))
print('Loaded {} feature extractor in {:.2f}sec'.format(model_name, end_3-start_3))

In [None]:
# We will now look at the images that were incorrectly classified by the model.
# prediction_generator was created with shuffle=False, so the rows of Y_pred are in the same
# order as prediction_generator.filenames and prediction_generator.classes.
# (With a shuffled generator the predictions would come back in a different order.)
Y_pred = mobileMod.predict(prediction_generator, verbose=1)
predicted_class_indices = np.argmax(Y_pred, axis=1)

# Map class indices back to class (species) names
labels = train_generator.class_indices
labels = dict((v, k) for k, v in labels.items())
predictions = [labels[k] for k in predicted_class_indices]
test_labels = [labels[k] for k in prediction_generator.classes]
filenames = prediction_generator.filenames
results = pd.DataFrame({"Filename": filenames, "labels": test_labels, "Predictions": predictions})

# Keep only the misclassified images; copy() avoids a SettingWithCopyWarning on the assignment below
rslt_df = results[results['labels'] != results['Predictions']].copy()
#print(rslt_df)

# Add the full path to the Filename
rslt_df['Filename'] = '/kaggle/input/100-bird-species/test/' + rslt_df['Filename'].astype(str)



In [None]:
# To display misclassified images in a pandas dataframe we will use the following functions, which are adapted from https://stackoverflow.com/questions/46107348/how-to-display-image-stored-in-pandas-dataframe
import glob
import random
import base64
import pandas as pd

from PIL import Image
from io import BytesIO
from IPython.display import HTML
import io

pd.set_option('display.max_colwidth', None)  # None means no limit; -1 is deprecated in pandas 1.0 and later


def get_thumbnail(path):
    # On Windows, prefixing the path with "\\\\?\\" can prevent problems with long paths
    i = Image.open(path)
    i.thumbnail((150, 150))  # shrink in place so the HTML table stays small
    return i

def image_base64(im):
    if isinstance(im, str):
        im = get_thumbnail(im)
    with BytesIO() as buffer:
        im.save(buffer, 'jpeg')
        return base64.b64encode(buffer.getvalue()).decode()

def image_formatter(im):
    return f'<img src="data:image/jpeg;base64,{image_base64(im)}">'

In [None]:
#We can pass our local image path to get_thumbnail(path) with following:

rslt_df['FilenamePill'] = rslt_df.Filename.map(lambda f: get_thumbnail(f))

# View the pandas dataframe with resized images by passing image_formatter to to_html and wrapping the result in IPython.display.HTML:
HTML(rslt_df.to_html(formatters={'FilenamePill': image_formatter}, escape=False))


We can see from the misclassified images above that the model had trouble classifying Albatross images taken in different settings. Since the background changes with location, and the bird may be flying or sitting, we should use data augmentation to enhance the dataset.

In [None]:
predicted_class_indices=np.argmax(Y_pred,axis=1)
predictions = [labels[k] for k in predicted_class_indices]
test_labels=[labels[k] for k in test_generator.classes]

plt.subplots(figsize=(20,15))
#sns.heatmap(confusion_matrix(test_generator.classes, y_pred))
sns.heatmap(confusion_matrix(test_labels, predictions))

In [None]:
print('Classification Report')
print(classification_report(test_generator.classes, predicted_class_indices, target_names=list(test_generator.class_indices.keys())))

**Here is a summary of the results of other runs:**

1) test_acc for InceptionV3 with weights not frozen in the last 4 layers is 0.6369612812995911
optimizer used = Adam(lr=0.0001)
No data augmentation used
Time taken for model to fit = 591.04sec
This was run for just 10 epochs and might have performed better with more epochs.


2) test_acc for MobileNet without Data Augmentation and Adam optimizer is 0.5638841390609741
Here we used the default settings of the Adam optimizer.
In fit_generator we used:
callbacks=[ReduceLROnPlateau(monitor = 'val_loss', factor = 0.7, patience = 5, verbose = 1)]
Loaded MobileNet with Data Augmentation and Adam classifier feature extractor in 2203.69sec (0.61 hr)
Even though the speed was better, the performance was not as good.

3) test_acc for MobileNet with Data Augmentation and Adam optimizer was 0.8897005319595337
optimizerA=Adam(lr=0.0001)
Loaded MobileNet with Data Augmentation feature extractor in 1510.89sec, which was relatively fast.
However, we trained this model for just 20 epochs. It might have done better with more epochs. It is also possible to adjust the batch size. Here I used a batch size of 200 with steps_per_epoch of 125. We could reduce the batch size and increase steps_per_epoch.

4) test_acc for MobileNet with Data Augmentation and SGD Optimizer and controlled batch size is 0.9264943599700928
Here we used
SGD(lr=0.0001, momentum=0.9, nesterov=True)
with the callbacks:
[EarlyStopping(monitor = 'val_accuracy', patience = 5, restore_best_weights = True),
ReduceLROnPlateau(monitor = 'val_loss', factor = 0.7, patience = 5, verbose = 1)]
We had specified 50 epochs and did have early stopping, hence we could have run the model for a higher number of epochs.
Loaded MobileNet with Data Augmentation and SGD Optimizer and controlled batch size feature extractor in 13745.94sec (3.82 hrs). The time taken with a low learning rate is high.


5) test_acc for MobileNet with rmsprop optimizer with default settings was 0.8850405216217041
Here during model fitting we used callbacks=[LearningRateScheduler(lr_schedule)]
where the function was defined as follows:

    def lr_schedule(epoch):
        lrate = 0.0001
        if epoch > 15:
            lrate = 0.00005
        if epoch > 20:
            lrate = 0.00001
        return lrate

We added 2 dense layers in this architecture with 40% dropout and l1 regularizers.
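
To make the step schedule used with rmsprop concrete, the sketch below repeats the function and prints the learning rate at a few epochs. It assumes the epoch index starts at 0, which is how Keras' `LearningRateScheduler` passes it to the schedule function; the chosen epochs are only examples.

```python
def lr_schedule(epoch):
    """Step schedule from the run above: lower the rate after epochs 15 and 20."""
    lrate = 0.0001
    if epoch > 15:
        lrate = 0.00005
    if epoch > 20:
        lrate = 0.00001
    return lrate


# The rate drops at epoch 16 and again at epoch 21
for epoch in (0, 15, 16, 20, 21, 49):
    print(epoch, lr_schedule(epoch))
```

Because the comparisons are strict, epochs 15 and 20 still use the previous rate; the change takes effect one epoch later.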

ROC curves are typically used in binary classification to study the output of a classifier hence I have not used it here. 

**Analysis of Results**:
1. When using transfer learning, MobileNet gave better performance than Inception_V3 and VGG16. MobileNets are based on a streamlined architecture that uses depth-wise separable convolutions to build lightweight deep neural networks.
1. Adjusting the learning rate on the optimizer helps the model converge better. Both Adam and SGD did well with a learning rate around 0.0001.
1. Data augmentation did not appear to provide any benefit to model performance.
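
To give a sense of why depth-wise separable convolutions make MobileNet lightweight, the sketch below compares weight counts for a standard convolution and a depth-wise separable one. Biases are ignored, and the layer size (3 x 3 kernel, 32 input channels, 64 output channels) is a hypothetical example rather than a specific MobileNet layer.

```python
def standard_conv_params(kernel_size, in_channels, out_channels):
    """Weights in a standard convolution (biases ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def separable_conv_params(kernel_size, in_channels, out_channels):
    """Weights in a depth-wise separable convolution (biases ignored)."""
    depthwise = kernel_size * kernel_size * in_channels  # one spatial filter per input channel
    pointwise = in_channels * out_channels               # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


standard = standard_conv_params(3, 32, 64)
separable = separable_conv_params(3, 32, 64)
print(standard, separable, round(standard / separable, 2))
```

For this example layer the separable version needs roughly eight times fewer weights, which is where much of the speed and size advantage comes from.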