## Dog Breed Classification

In this notebook we use transfer learning: an Inception model (InceptionResNetV2) with weights pre-trained on ImageNet is reused as the base of a dog breed classifier.

### Load Dataset Files

Run the code below to mount Google Drive and access the files stored in it.

In [0]:
from google.colab import drive

In [2]:
drive.mount('/content/drive/')

Mounted at /content/drive/


Now upload the dataset files shared with you to your Google Drive and set the `project_path` variable below to their location. The path shown is an example based on our own Drive layout; change it to match yours.

In [0]:
project_path = "drive/My Drive/AIML/"

Run the code below to extract the images in train.zip and the labels in labels.csv.zip. These images and labels are split into training and validation sets in later steps.

In [0]:
from zipfile import ZipFile
with ZipFile(project_path+'train.zip', 'r') as z:
  z.extractall()

In [0]:
# from zipfile import ZipFile
# with ZipFile(project_path+'test.zip', 'r') as z:
#   z.extractall()

In [0]:
# from zipfile import ZipFile
# with ZipFile(project_path+'sample_submission.csv.zip', 'r') as z:
#   z.extractall()

In [0]:
from zipfile import ZipFile
with ZipFile(project_path+'labels.csv.zip', 'r') as z:
  z.extractall()
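
The cells above extract into the current working directory. If you prefer to keep the extracted files in a dedicated folder, a minimal sketch could look like the following. The helper name `extract_zip` and the demo archive are hypothetical and only illustrate the standard `zipfile` API; they are not part of the dataset.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_zip(archive_path, dest_dir):
    """Extract every member of archive_path into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path, "r") as z:
        z.extractall(dest)
        return z.namelist()


# Self-contained demo on a throwaway archive (not the real dataset).
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = Path(tmp) / "demo.zip"
    with zipfile.ZipFile(demo_zip, "w") as z:
        z.writestr("labels.csv", "id,breed\nabc,dingo\n")
    extracted = extract_zip(demo_zip, Path(tmp) / "data")
    demo_exists = (Path(tmp) / "data" / "labels.csv").exists()
```

With the real data, the call would be something like `extract_zip(project_path + 'train.zip', '.')`, which behaves like the cells above.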

### Read labels.csv file using pandas

In [0]:
import pandas as pd
labels = pd.read_csv("labels.csv")

In [5]:
labels.head(10)

Unnamed: 0,id,breed
0,000bec180eb18c7604dcecc8fe0dba07,boston_bull
1,001513dfcb2ffafc82cccf4d8bbaba97,dingo
2,001cdf01b096e06d78e9e5112d419397,pekinese
3,00214f311d5d2247d5dfe4fe24b2303d,bluetick
4,0021f9ceb3235effd7fcde7f7538ed62,golden_retriever
5,002211c81b498ef88e1b40b9abf84e1d,bedlington_terrier
6,00290d3e1fdd27226ba27a8ce248ce85,bedlington_terrier
7,002a283a315af96eaea0e28e7163b21b,borzoi
8,003df8b8a8b05244b1d920bb6cf451f9,basenji
9,0042188c895a2f14ef64a918ed9c7b64,scottish_deerhound


### Print the count of each category of Dogs given in the dataset

You can use value_counts() to get the count of each category

In [6]:
labels["breed"].value_counts()

scottish_deerhound                126
maltese_dog                       117
afghan_hound                      116
entlebucher                       115
bernese_mountain_dog              114
shih-tzu                          112
pomeranian                        111
great_pyrenees                    111
basenji                           110
samoyed                           109
tibetan_terrier                   107
airedale                          107
cairn                             106
leonberg                          106
beagle                            105
japanese_spaniel                  105
miniature_pinscher                102
australian_terrier                102
blenheim_spaniel                  102
irish_wolfhound                   101
lakeland_terrier                   99
saluki                             99
papillon                           96
norwegian_elkhound                 95
siberian_husky                     95
whippet                            95
pug         

### Encode the breed labels as integer class indices

In [0]:
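import numpy as np
from sklearn.preprocessing import LabelEncoder

SEED = 2018
NUM_CLASSES = 120

# Map each breed name to an integer class index in [0, NUM_CLASSES - 1]
label_enc = LabelEncoder()
y_train = label_enc.fit_transform(labels["breed"].values)

# Random 90/10 train/validation mask (the split used later comes from train_test_split instead)
np.random.seed(seed=SEED)
rnd = np.random.random(len(labels))
train_idx = rnd < 0.9
valid_idx = rnd >= 0.9
ytr = y_train[train_idx]
yv = y_train[valid_idx]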
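
`LabelEncoder` returns integer class indices, not one-hot vectors. If one-hot targets are needed, a minimal NumPy sketch is shown below. It uses a toy list of breeds and `np.unique` as a stand-in for `LabelEncoder` (both sort the class names before assigning indices); the variable names are illustrative only.

```python
import numpy as np

NUM_CLASSES = 3  # toy value for this demo; the dataset above has 120 breeds

# Toy breed names standing in for labels["breed"].values
breeds = np.array(["dingo", "pug", "dingo", "borzoi"])

# np.unique sorts the class names and returns one integer code per sample
classes, codes = np.unique(breeds, return_inverse=True)

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(NUM_CLASSES, dtype=np.float32)[codes]
```

Each row of `one_hot` has a single 1 at the position of its class index, so its shape is (number of samples, number of classes).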

### Form feature set using the images in `train` folder and their corresponding labels from `labels.csv`

Run the code below to build the feature set. Each image is resized to 150x150x3, which is the `input_shape` we pass to the Inception model in the transfer-learning steps below.

In [0]:
img_rows=150
img_cols=150
num_channel=3

In [9]:
from tqdm import tqdm
import cv2
x_feature = []
y_feature = []

# Each row of labels.values is (image id, breed); y_train[i] is the encoded breed of row i
for i, (img_id, breed) in enumerate(tqdm(labels.values)):
    train_img = cv2.imread('./train/{}.jpg'.format(img_id), 1)  # 1 = load as a 3-channel colour image
    train_img_resize = cv2.resize(train_img, (img_cols, img_rows))  # cv2.resize expects (width, height)
    x_feature.append(train_img_resize)
    y_feature.append(y_train[i])

100%|██████████| 10222/10222 [00:30<00:00, 333.35it/s]


In [10]:
x_train_data = np.array(x_feature, np.float32) / 255.   # scale pixel values to [0, 1]
print(x_train_data.shape)
# x_train_data = np.expand_dims(x_train_data, axis = 3) # for keras to given input to Conv2D layer
# print (x_train_data.shape)


(10222, 150, 150, 3)


In [11]:
y_train_data = np.array(y_feature)

y_train_data.shape

(10222,)

The normalized feature set is now ready in the `x_train_data` and `y_train_data` variables.

### Split the training and validation data from `x_train_data` and `y_train_data` obtained from above step

In [0]:
from sklearn.model_selection import train_test_split
x_train, x_val, y_train, y_val = train_test_split(x_train_data, y_train_data, test_size=0.33, random_state=42)

### Import the InceptionResNetV2 model, which we use for transfer learning

Run the below code to import the inception model.

In [15]:
from keras.applications import InceptionResNetV2

conv_base = InceptionResNetV2(weights='imagenet', include_top=False, input_shape=(150,150,3))

Using TensorFlow backend.


In [0]:
#get the length of the train and validation data
ntrain = len(x_train)
nval = len(x_val)

#We will use a batch size of 32. 
batch_size = 32  
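
A small arithmetic sketch of how the split and the batch size translate into batches per epoch. It assumes scikit-learn's rounding for a fractional `test_size` (validation count rounded up, the remainder going to training); the variable names here are illustrative and separate from the notebook's own `ntrain` and `nval`.

```python
import math

n_samples = 10222   # images loaded above
test_size = 0.33    # fraction passed to train_test_split
batch_size = 32

# Assumed rounding: validation count is rounded up, training gets the rest
n_val = math.ceil(test_size * n_samples)
n_train = n_samples - n_val

# Number of batches needed to see every sample once
steps_per_epoch = math.ceil(n_train / batch_size)
validation_steps = math.ceil(n_val / batch_size)
print(n_train, n_val, steps_per_epoch, validation_steps)
```

Under these assumptions one full pass over the training data takes 214 batches, so a much smaller `steps_per_epoch` covers only a fraction of the training set per epoch.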

### Build the network using above inception model and add layers as mentioned below for classification.

1. Add a Dense layer with 256 neurons with `relu` activation

2. Add a Dense layer with 120 neurons as the final layer (as there are 120 classes in the given dataset) with `softmax` activation for classification.

In [16]:
conv_base.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            (None, 150, 150, 3)  0                                            
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 74, 74, 32)   864         input_1[0][0]                    
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 74, 74, 32)   96          conv2d_1[0][0]                   
__________________________________________________________________________________________________
activation_1 (Activation)       (None, 74, 74, 32)   0           batch_normalization_1[0][0]      
__________________________________________________________________________________________________
conv2d_2 (
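
The first rows of this summary can be checked by hand. The sketch below assumes that `conv2d_1` is a 3x3 convolution with stride 2, no padding and no bias term; this is an assumption, but it is consistent with the output shape (74, 74, 32) and the 864 parameters printed above. The helper names are hypothetical.

```python
def conv_output_size(n, kernel, stride):
    """Spatial output size of a convolution without padding."""
    return (n - kernel) // stride + 1


def conv_params(kernel, in_channels, filters, use_bias=False):
    """Number of weights in a square 2D convolution, plus optional biases."""
    weights = kernel * kernel * in_channels * filters
    return weights + (filters if use_bias else 0)


side = conv_output_size(150, kernel=3, stride=2)            # 150x150 input image
params = conv_params(kernel=3, in_channels=3, filters=32)   # 3 colour channels, 32 filters
print(side, params)
```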

In [0]:
from keras.layers import Dense, Activation, Dropout, Flatten, Reshape, Input
from keras.optimizers import RMSprop
from keras.layers import Convolution2D, MaxPooling2D, GlobalAveragePooling2D, BatchNormalization
from keras.utils import np_utils
from keras.models import Sequential, Model

In [0]:
BATCH_SIZE=32
EPOCHS=20
model = Sequential()
model.add(conv_base)
model.add(Flatten())
model.add(Dense(256, activation='relu'))
#model.add(Dropout(0.1))
model.add(Dense(120, activation='softmax'))

In [61]:


model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
inception_resnet_v2 (Model)  (None, 3, 3, 1536)        54336736  
_________________________________________________________________
flatten_1 (Flatten)          (None, 13824)             0         
_________________________________________________________________
dense_3 (Dense)              (None, 256)               3539200   
_________________________________________________________________
dense_4 (Dense)              (None, 120)               30840     
Total params: 57,906,776
Trainable params: 3,570,040
Non-trainable params: 54,336,736
_________________________________________________________________
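
The parameter counts in this summary follow directly from the layer shapes. A quick check, using a hypothetical helper `dense_params` (weights plus one bias per output unit):

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


flat = 3 * 3 * 1536                   # Flatten output of the (3, 3, 1536) feature map
dense_1 = dense_params(flat, 256)     # first Dense layer
dense_2 = dense_params(256, 120)      # softmax output layer
trainable = dense_1 + dense_2         # only the new head is trainable
total = trainable + 54336736          # plus the frozen InceptionResNetV2 base
print(flat, dense_1, dense_2, trainable, total)
```

Almost all trainable parameters sit in the first Dense layer, because `Flatten` produces a 13,824-dimensional input for it.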


### Now freeze the layers of the Inception model, as we don't want to train them

Run the below code to freeze the inception model layers.

In [0]:

conv_base.trainable = False



### Compile the model using Adam optimizer with `categorical_crossentropy` loss function

In [0]:
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
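
Note that `categorical_crossentropy` expects one-hot targets of shape (batch, 120), whereas the `y_train` array built earlier holds integer class indices. Two common ways to reconcile this are to convert the labels to one-hot vectors or to compile with Keras's `sparse_categorical_crossentropy` loss, which accepts integer labels directly. The NumPy sketch below, using made-up probabilities, illustrates that both formulations give the same loss values.

```python
import numpy as np

# Toy softmax outputs for 2 samples and 3 classes (made-up numbers)
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])
int_labels = np.array([0, 2])          # integer class indices
one_hot = np.eye(3)[int_labels]        # the same labels, one-hot encoded

# Categorical cross-entropy: sum over classes of -target * log(prediction)
cce = -np.sum(one_hot * np.log(probs), axis=1)

# Sparse variant: pick the log-probability of the true class directly
scce = -np.log(probs[np.arange(len(int_labels)), int_labels])
```

The two differ only in the label format they accept, so the choice mainly depends on how the labels are stored.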

### Data Augmentation

Run the code below to initialize separate ImageDataGenerators for the training and validation data.

In [0]:
from keras.preprocessing.image import ImageDataGenerator
from keras.preprocessing.image import img_to_array, load_img

# x_train and x_val were already scaled to [0, 1] above, so no `rescale` is applied here
train_datagen = ImageDataGenerator(rotation_range=40,
                                   width_shift_range=0.2,
                                   height_shift_range=0.2,
                                   shear_range=0.2,
                                   zoom_range=0.2,
                                   horizontal_flip=True,
                                   fill_mode='nearest')

val_datagen = ImageDataGenerator()  # validation data is not augmented

### Using the above objects, create the image generators with variable names `train_generator` and `val_generator`

You need to use train_datagen.flow() and val_datagen.flow()

In [0]:
train_generator = train_datagen.flow(x_train, y_train, batch_size=BATCH_SIZE)

val_generator = val_datagen.flow(x_val, y_val, batch_size=BATCH_SIZE)



In [50]:
len(train_generator)  # number of batches per epoch; the iterator returned by flow() has no `shape` attribute

### Fit the model with fit_generator(), using `train_generator` and `val_generator` from the above step, for 2 epochs

In [69]:
model.fit_generator(train_generator,
                    steps_per_epoch=10,
                    epochs=2,
                    validation_data=val_generator,  # evaluate on the validation generator after each epoch
                    shuffle=True)

Epoch 1/2


ValueError: ignored