# Introduction <a name="introduction"></a>

Deep convolutional neural network models may take days or even weeks to train on very large datasets.

One way to shorten this process is to reuse the weights of models that were pre-trained on standard computer vision benchmark datasets, such as the ImageNet and MNIST image recognition tasks. Top-performing models can be downloaded and used directly, or integrated into a new model for your own computer vision problems. This is called **Transfer Learning**.

In this project, we intend to use transfer learning when developing convolutional neural networks for an image classification problem.

# Objectives <a name="objectives"></a>

* Performing some exploration on the dataset
* Selecting a model pre-trained on a publicly known dataset
* Customizing the model for our specific problem
* Freezing the parts of the model that we do not want to change
* Training the model to do well on our dataset
* Evaluating the performance of the model
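
The steps in the list above can be sketched in Keras roughly as follows. This is a minimal sketch under stated assumptions, not the final model of this project: the function name `build_transfer_model`, the 299x299 input size, the single dense head, and the choice of optimizer are all picked for illustration.

```python
import tensorflow as tf

def build_transfer_model(num_classes, input_shape=(299, 299, 3), weights="imagenet"):
    """Hypothetical helper: pre-trained Xception base plus a new classification head."""
    # Select a model pre-trained on ImageNet, without its original classifier
    base = tf.keras.applications.Xception(
        weights=weights, include_top=False, input_shape=input_shape
    )
    # Freeze the parts of the model that we do not want to change
    base.trainable = False

    # Customize the model for our problem: a new head with one output per class
    inputs = tf.keras.Input(shape=input_shape)
    x = base(inputs, training=False)  # keep batch-norm layers in inference mode
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)

    # Integer class labels pair with the sparse categorical cross-entropy loss
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model, base
```

Calling `build_transfer_model(10)` downloads the ImageNet weights on first use. Training and evaluation would then go through `model.fit` and `model.evaluate`.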

# Selecting a Dataset

Since we are going to perform image classification, we will do well to use a model that was pre-trained on the most popular image dataset, ImageNet.

## ImageNet

ImageNet is an image dataset organized according to the WordNet hierarchy. Each meaningful concept in WordNet, possibly described by multiple words or word phrases, is called a "synonym set" or "synset". There are more than 100,000 synsets in WordNet, the majority of which are nouns (80,000+). ImageNet aims to provide on average 1000 images to illustrate each synset. Images of each concept are quality-controlled and human-annotated.

![ImageNet Images](https://raw.githubusercontent.com/ml-heroes/ml-dataset/master/image-net.png)


## Monkey Dataset

The dataset consists of two folders, training and validation. Each folder contains 10 subfolders labeled n0 to n9, each corresponding to a species from Wikipedia's monkey cladogram. Images are 400x300 px or larger and in JPEG format (almost 1400 images). Images were downloaded with the help of the open-source googliser tool.


| Label   | Latin Name     | Common Name   | Train Images   | Validation Images   |
|--------|--------------  |---------------|----------------|---------------------|
| n0     | alouatta_palliata | mantled_howler | 131 | 26 |
| n1     | erythrocebus_patas | patas_monkey | 139 | 28 |
| n2     | cacajao_calvus | bald_uakari | 137 | 27 |
| n3     | macaca_fuscata | japanese_macaque | 152 | 30 |
| n4     | cebuella_pygmea | pygmy_marmoset |  131 | 26 |
| n5     | cebus_capucinus | white_headed_capuchin| 141 | 28 |
| n6     | mico_argentatus| silvery_marmoset| 132| 26 |
| n7     | saimiri_sciureus | common_squirrel_monkey | 142 | 28 |
| n8     | aotus_nigriceps | black_headed_night_monkey | 133 | 27 |
| n9     | trachypithecus_johnii | nilgiri_langur | 132 | 26 |
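
As a quick sanity check on the table above, the per-class counts can be totaled and the class balance quantified. The sketch below simply hard-codes the counts from the table; the variable names are illustrative.

```python
# Per-class image counts copied from the table above (labels n0..n9)
train_counts = [131, 139, 137, 152, 131, 141, 132, 142, 133, 132]
valid_counts = [26, 28, 27, 30, 26, 28, 26, 28, 27, 26]

total_train = sum(train_counts)
total_valid = sum(valid_counts)

# Ratio between the largest and smallest class; 1.0 would be perfectly balanced
imbalance = max(train_counts) / min(train_counts)

print(f"Training images:   {total_train}")
print(f"Validation images: {total_valid}")
print(f"Largest/smallest training class: {imbalance:.2f}")
```

With the counts listed in the table, this gives 1370 training images and 272 validation images, and the largest training class is about 1.16 times the size of the smallest.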

# Exploratory Data Analysis

## Import the required libraries

In [None]:
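import os
from pathlib import Path  # needed for the Path(...) calls below

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import tensorflow as tf  # needed for tf.random.set_seed below
import imgaug as aug  # assumed to be the library behind aug.seed below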

color = sns.color_palette()
%matplotlib inline
%config InlineBackend.figure_format="svg"

## Make our results reproducible

In [None]:
# Set the seed for hash based operations in python
os.environ['PYTHONHASHSEED'] = '0'

seed=1234

# Set the numpy seed
np.random.seed(seed)

# Set the global random seed in TensorFlow
tf.random.set_seed(seed)

# Make the augmentation sequence deterministic
aug.seed(seed)

## Load dataset and process it

In [None]:
# As usual, define some paths first to make life simpler
PATH = '/content/drive/My Drive/ml-dataset/monkey/'
training_data = Path(PATH + 'training/') 
validation_data = Path(PATH + 'validation/') 
labels_path = Path(PATH + 'monkey_labels.txt')



We will read the monkey_labels.txt file to extract the information about the labels. We can store this information in a list which then can be converted into a pandas dataframe.


In [None]:
labels_info = []

# Read the file
lines = labels_path.read_text().strip().splitlines()[1:]
for line in lines:
    line = line.split(',')
    line = [x.strip(' \n\t\r') for x in line]
    line[3], line[4] = int(line[3]), int(line[4])
    line = tuple(line)
    labels_info.append(line)
    
# Convert the data into a pandas dataframe
labels_info = pd.DataFrame(labels_info, columns=['Label', 'Latin Name', 'Common Name', 
                                                 'Train Images', 'Validation Images'], index=None)
# Sneak peek 
labels_info

The labels are n0, n1, n2, and so on. We will create a mapping in which each class is represented by an integer from 0 to the number of classes minus one. We will also create a mapping from each class to its name, using the Common Name column.

In [None]:
# Create a dictionary to map the labels to integers
labels_dict= {'n0':0, 'n1':1, 'n2':2, 'n3':3, 'n4':4, 'n5':5, 'n6':6, 'n7':7, 'n8':8, 'n9':9}
cat = pd.Categorical(labels_info['Label'])
labels_info['Label'] = cat.rename_categories(labels_dict)
df = labels_info.drop(columns=['Train Images', 'Validation Images'])
df.head(10)

This is a very small dataset. You could load the data into NumPy arrays and use them directly for training. But this isn't always possible: most of the time you won't be able to fit the entire dataset in memory. This is why I always store information about the dataset in dataframes and then use a generator to load the data on the fly. We will be doing the same thing here.

In [None]:
# Creating a dataframe for the training dataset
train_df = []
for folder in os.listdir(training_data):
    # Define the path to the images
    imgs_path = training_data / folder
    
    # Get the list of all the images stored in that directory
    imgs = sorted(imgs_path.glob('*.jpg'))
    
    # Store each image path and corresponding label 
    for img_name in imgs:
        train_df.append((str(img_name), labels_dict[folder]))


train_df = pd.DataFrame(train_df, columns=['image', 'label'], index=None)
# shuffle the dataset 
train_df = train_df.sample(frac=1.).reset_index(drop=True)

####################################################################################################

# Creating dataframe for validation data in a similar fashion
valid_df = []
for folder in os.listdir(validation_data):
    imgs_path = validation_data / folder
    imgs = sorted(imgs_path.glob('*.jpg'))
    for img_name in imgs:
        valid_df.append((str(img_name), labels_dict[folder]))

        
valid_df = pd.DataFrame(valid_df, columns=['image', 'label'], index=None)
# shuffle the dataset 
valid_df = valid_df.sample(frac=1.).reset_index(drop=True)

####################################################################################################

# How many samples do we have in our training and validation data?
print("Number of training samples: ", len(train_df))
print("Number of validation samples: ", len(valid_df))

# sneak peek of the training and validation dataframes
print("\n",train_df.head(), "\n")
print("=================================================================\n")
print("\n", valid_df.head())
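
To illustrate the idea of loading data on the fly, here is a minimal sketch of a batch generator that works from (image path, label) records such as the rows of `train_df`. The names `batch_generator`, `load_image`, and `fake_loader` are hypothetical. The loader is passed in as a parameter so that any image-reading function can be plugged in.

```python
import numpy as np

def batch_generator(records, batch_size, load_image):
    """Yield (images, labels) batches from (path, label) records, one batch at a time.

    `load_image` is a caller-supplied function mapping a path to an array;
    all loaded arrays must share one shape so that they can be stacked.
    """
    records = list(records)
    for start in range(0, len(records), batch_size):
        chunk = records[start:start + batch_size]
        # Only the images of the current batch are held in memory
        images = np.stack([load_image(path) for path, _ in chunk])
        labels = np.array([label for _, label in chunk])
        yield images, labels


# Stand-in loader for demonstration: returns a blank 8x8 RGB image
def fake_loader(path):
    return np.zeros((8, 8, 3), dtype=np.uint8)

demo_records = [(f"img_{i}.jpg", i % 10) for i in range(5)]
for images, labels in batch_generator(demo_records, batch_size=2, load_image=fake_loader):
    print(images.shape, labels)
```

With the dataframes built above, the records could be supplied as `zip(train_df['image'], train_df['label'])`, together with a loader that reads each file and resizes it to a common shape.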

## Viewing some monkeys

In [None]:
def plots(ims, figsize=(12,6), rows=3, titles=None):
    """Show the images in `ims` on a grid with `rows` rows and optional titles."""
    f = plt.figure(figsize=figsize)
    cols = len(ims)//rows + 1
    for i in range(len(ims)):
        sp = f.add_subplot(rows, cols, i+1)
        sp.axis('off')
        if titles is not None: sp.set_title(titles[i], fontsize=16)
        sp.imshow(ims[i])

In [None]:
imgs = []
labels = []

for i in range(10):
    folder = f'{PATH}training/n{i}'
    # Sort the listing so that the same first image is picked on every run
    file = sorted(os.listdir(folder))
    img = plt.imread(f'{folder}/{file[0]}')
    imgs.append(img)
    name = df.loc[df['Label'] == i, 'Common Name'].item()
    label = "Class %d: %s" % (i, name)
    labels.append(label)

plots(imgs, titles=labels, rows=4, figsize=(16,15))

## Checking missing data

In [None]:
print("Number of records: ", len(train_df))
print("Shape: ", train_df.shape)
# Checks if there are any missing values
print("\nMissing data?")

In [None]:
print("Train: ")
train_df.isnull().sum()

In [None]:
print("Validation: ")
valid_df.isnull().sum()

We have absolutely no missing data in our dataset. All images have an associated label.

## Data Distribution

In [None]:
sns.countplot(x="label", data=train_df);

In [None]:
plt.xlim(0, 9)
# distplot is deprecated in recent seaborn releases; histplot with kde=True replaces it
sns.histplot(train_df['label'], kde=True);

We see a highly balanced dataset for our training data.

In [None]:
sns.countplot(x="label", data=valid_df);

In [None]:
plt.xlim(0, 9)
# distplot is deprecated in recent seaborn releases; histplot with kde=True replaces it
sns.histplot(valid_df['label'], kde=True);

Our validation data is also well balanced across all labels.

One other thing to notice is how similar the distribution of the training set is to that of the validation set. It is reasonable to assume that this data was collected together and then split programmatically.
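
The similarity between the two distributions can also be checked numerically by comparing each class's share of the training set with its share of the validation set. The sketch below uses the per-class counts from the Monkey Dataset table rather than the dataframes, and the helper name `class_shares` is illustrative.

```python
def class_shares(counts):
    """Return each class's fraction of the total number of images."""
    total = sum(counts)
    return [c / total for c in counts]

# Counts per label (n0..n9) as listed in the Monkey Dataset table
train_counts = [131, 139, 137, 152, 131, 141, 132, 142, 133, 132]
valid_counts = [26, 28, 27, 30, 26, 28, 26, 28, 27, 26]

train_shares = class_shares(train_counts)
valid_shares = class_shares(valid_counts)

# Largest gap between a class's share in the two splits
max_gap = max(abs(t - v) for t, v in zip(train_shares, valid_shares))
print(f"Largest difference in class share: {max_gap:.4f}")
```

For the counts in the table, the largest gap stays under half a percentage point, which is consistent with a split that preserves the class proportions.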

## Image Dimensions

Look at the image dimensions and confirm that the image has 3 bands (RGB).

In [None]:
first = plt.imread(train_df.iloc[0].image)
print(np.shape(first))

The image dimensions are 500x412x3 (height x width x channels). This shows a regular image with RGB coloring.

Confirm images are byte scaled (0-255).

In [None]:
np.min(first), np.max(first)

## Check size distribution

Let's group images by their size and check the distribution of image sizes. However, loading all images for this would take a long time. Instead, let's sample 10 random images from each class.

In [None]:
def get_image_dims(image):
  im = plt.imread(image)
  return np.shape(im)

In [None]:
def sample_data(df, label, size=10):
  samples = df.loc[df['label'] == label].sample(size)
  return samples

In [None]:
colors = cm.rainbow(np.linspace(0, 1, 10))

for i in range(0, 10):
  samples = sample_data(train_df, i)
  images = samples['image']
  dims = images.apply(get_image_dims)
  # np.shape returns (height, width, channels), so x[0] is height and x[1] is width
  height_width = dims.apply(lambda x: (x[0], x[1]))
  plt.scatter(*zip(*height_width), color=colors[i], label=str(i))

plt.title("Image sizes")
plt.xlabel('Height')
plt.ylabel('Width')
plt.legend(loc='center left', bbox_to_anchor=(1, 0.5))
plt.show()

From our sample, we can see that many images have a size of 500x500, while a few are very large.

There is no need to show the x-axis, since all the values along it are the same.

This also confirms that the images are of different sizes and may need some pooling for size reduction.

We can also infer that all the sampled images have a third dimension in RGB format. This means we may need to standardize the color values.
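
One common way to standardize byte-scaled colors is to rescale the 0-255 values to a fixed floating-point range. The sketch below maps pixel values to [-1, 1]. This range is an assumption, chosen because it is the input scaling that Keras's Xception preprocessing applies, and the function name `scale_pixels` is illustrative.

```python
import numpy as np

def scale_pixels(image):
    """Map a byte-scaled (0-255) image to floats in the range [-1, 1]."""
    image = np.asarray(image, dtype=np.float32)
    return image / 127.5 - 1.0

# Tiny synthetic RGB image standing in for a real photo
sample = np.array([[[0, 128, 255]]], dtype=np.uint8)
scaled = scale_pixels(sample)
print(scaled.min(), scaled.max())
```

The same function could be applied to each image after it has been resized to the input size the model expects.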

# Model Selection

There are perhaps a dozen or more top-performing models for image recognition that can be downloaded and used as the basis for image recognition and related computer vision tasks.

Perhaps three of the more popular models are as follows:

* VGG (e.g. VGG16 or VGG19).
* GoogLeNet (e.g. InceptionV3).
* Residual Network (e.g. ResNet50).

These models are widely used for transfer learning both because of their performance and because they introduced specific architectural innovations, namely consistent and repeating structures (VGG), inception modules (GoogLeNet), and residual modules (ResNet).

We will use a pre-trained deep convolutional neural network, Xception, for transfer learning on our own data.


## Xception

Xception, by Google, stands for "Extreme version of Inception". With a modified depthwise separable convolution, it achieves even higher ImageNet accuracy than Inception-v3.

## Original Architecture (Inception)

![Original Architecture](https://raw.githubusercontent.com/ml-heroes/ml-dataset/master/original_xception.png)

> *Original Depthwise Separable Convolution*

The original depthwise separable convolution is a depthwise convolution followed by a pointwise convolution.
* Depthwise convolution is the channel-wise n×n spatial convolution. If we have 5 channels, as in the figure above, then we will have 5 n×n spatial convolutions.
* Pointwise convolution is a 1×1 convolution that changes the number of channels.

Compared with conventional convolution, we do not need to perform the spatial convolution across all channels at once. That means there are fewer connections and the model is lighter.
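
To make the saving concrete, the number of weights in a conventional convolution can be compared with that of a depthwise separable one. For an n×n kernel with `c_in` input channels and `c_out` output channels (biases ignored), a conventional convolution has n·n·c_in·c_out weights, while the separable version has n·n·c_in for the depthwise step plus c_in·c_out for the pointwise step. The function names and layer sizes below are arbitrary example values.

```python
def conv_params(n, c_in, c_out):
    """Weights in a conventional n x n convolution (no bias)."""
    return n * n * c_in * c_out

def separable_conv_params(n, c_in, c_out):
    """Weights in a depthwise separable convolution (no bias)."""
    depthwise = n * n * c_in   # one n x n filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution mixing the channels
    return depthwise + pointwise

# Example layer sizes, chosen only for illustration
n, c_in, c_out = 3, 128, 256
standard = conv_params(n, c_in, c_out)
separable = separable_conv_params(n, c_in, c_out)
print(standard, separable, round(standard / separable, 1))
```

For these example sizes, the conventional layer has 294,912 weights and the separable one has 33,920, a reduction of roughly 8.7 times.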

## Modified Architecture (Xception)

![Modified Architecture](https://raw.githubusercontent.com/ml-heroes/ml-dataset/master/modified_xception.png)

> *The Modified Depthwise Separable Convolution used as an Inception Module in Xception, so called “extreme” version of Inception module (n=3 here)*

The modified depthwise separable convolution is a pointwise convolution followed by a depthwise convolution. This modification is motivated by the inception module in Inception-v3, in which the 1×1 convolution is done first, before any n×n spatial convolutions. Thus, it differs slightly from the original one. (n=3 here, since 3×3 spatial convolutions are used in Inception-v3.)
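
The difference in ordering can be illustrated with a small NumPy sketch. Both variants below use the same two building blocks and differ only in the order in which they are applied. The function names, the valid padding, the stride of 1, and the random weights are assumptions made for illustration, and any activations between the two steps are left out.

```python
import numpy as np

def depthwise_conv(x, kernels):
    """Channel-wise k x k convolution. x: (H, W, C); kernels: (k, k, C)."""
    h, w, c = x.shape
    k = kernels.shape[0]
    out = np.zeros((h - k + 1, w - k + 1, c))
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            patch = x[i:i + k, j:j + k, :]  # (k, k, C) window
            # Each channel is filtered with its own kernel, with no channel mixing
            out[i, j, :] = np.sum(patch * kernels, axis=(0, 1))
    return out

def pointwise_conv(x, weights):
    """1 x 1 convolution. x: (H, W, C_in); weights: (C_in, C_out)."""
    return x @ weights

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 6, 3))  # H x W x C_in
c_out, k = 5, 3

# Original order: depthwise on the C_in input channels, then pointwise
original = pointwise_conv(
    depthwise_conv(x, rng.normal(size=(k, k, 3))),
    rng.normal(size=(3, c_out)),
)

# Modified (Xception) order: pointwise first, then depthwise on the C_out channels
modified = depthwise_conv(
    pointwise_conv(x, rng.normal(size=(3, c_out))),
    rng.normal(size=(k, k, c_out)),
)

print(original.shape, modified.shape)
```

Both orderings produce an output of the same shape. What changes is where the spatial filtering happens: on the input channels in the original order, and on the already mixed output channels in the modified order.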

## Comparison of ImageNet: Xception Model with other Models

![Model Comparison](https://raw.githubusercontent.com/ml-heroes/ml-dataset/master/compare.png)

> *ImageNet: Xception has the Highest Accuracy*

# Model

> Describe model structure here

## Training

> Describe how model is trained

## Analysis

> Show graphs and results here

# Conclusion

> Final Analysis

# References

* https://medium.com/analytics-vidhya/image-recognition-using-pre-trained-xception-model-in-5-steps-96ac858f4206
* https://www.kaggle.com/aakashnain/what-does-a-cnn-see
* https://openaccess.thecvf.com/content_cvpr_2017/papers/Chollet_Xception_Deep_Learning_CVPR_2017_paper.pdf