## Category-2-Supervised Machine Learning on Unstructured (Images) data
> Dataset consists of images of cats & dogs taken from Kaggle (https://www.kaggle.com/c/dogs-vs-cats/). The training dataset is labeled and thus this is a Supervised Image Classification problem.
> Given below is a simple CNN architecture that helps with image classification.

In [1]:
# Importing the Keras libraries and packages
from keras.models import Sequential
from keras.layers import Convolution2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense

Using TensorFlow backend.


** Part 1 - Initializing the CNN **

In [2]:
# Initialising the CNN
classifier = Sequential()

In [3]:
# Step 1 - Convolution
# Convolution operator for filtering windows of two-dimensional inputs.
# For the first layer in the model,use argument, input_shape=(64, 64, 3)`for 64x64 RGB pictures with Tensorflow backend
# 32,3,3 means - Apply a 3x3 convolution with 32 output filters

classifier.add(Convolution2D(32, 3, 3, input_shape = (64, 64, 3), activation = 'relu'))

In [4]:
# Step 2 - Pooling
# Max pooling operation to capture spatial features
# pool_size is the stride window size
classifier.add(MaxPooling2D(pool_size = (2, 2)))

In [5]:
# Adding a second convolutional layer
# Input dim is not required for Convolution2D as this is not the first layer
classifier.add(Convolution2D(32, 3, 3, activation = 'relu'))
classifier.add(MaxPooling2D(pool_size = (2, 2)))

In [6]:
# Step 3 - Flattening
classifier.add(Flatten())

In [7]:
# Step 4 - Full connection
classifier.add(Dense(output_dim = 128, activation = 'relu'))
classifier.add(Dense(output_dim = 1, activation = 'sigmoid'))

In [8]:
# Compiling the CNN
# Binary cross entropy is used as there are only 2 classes - cats & dogs
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

** Part 2 - Fitting the CNN to the images **

In [9]:
# ImageDataGenerator in Keras is used for Image Augmentation

from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale = 1./255,
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True)

test_datagen = ImageDataGenerator(rescale = 1./255)

In [10]:
# Flow from Directory expects the data to be organized in a certain way: Root folder having subfolders for each class
training_set = train_datagen.flow_from_directory('0.datasets/catsdogs/training_set',
                                                 target_size = (64, 64),
                                                 batch_size = 32,
                                                 class_mode = 'binary')

Found 8000 images belonging to 2 classes.


In [11]:
test_set = test_datagen.flow_from_directory('0.datasets/catsdogs/test_set',
                                            target_size = (64, 64),
                                            batch_size = 32,
                                            class_mode = 'binary')

Found 2000 images belonging to 2 classes.


In [12]:
classifier.fit_generator(training_set,
                         samples_per_epoch = 8000,
                         nb_epoch = 3,  # No.of rounds. Higher number = Higher Accuracy but more time to train
                         validation_data = test_set,
                         nb_val_samples = 2000)

Epoch 1/3
Epoch 2/3
Epoch 3/3


<keras.callbacks.History at 0x203dfbbc9e8>

** Part 3 - Making predictions for new images **

In [19]:
import numpy as np
import pandas as pd
from keras.preprocessing import image

In [34]:
to_predict = pd.DataFrame()
to_predict['Images'] = ''
to_predict['Prediction'] = ''
predictions=[]
#print(training_set.class_indices)

to_predict_image_path = '0.datasets/catsdogs/to_predict_set/'

In [37]:
for i in range(10):  # 10 images are present in the to-predict folder
    x = str(i+1)
    img_name = 'Img' + x + '.jpg'
    to_predict_image_string = to_predict_image_path + img_name 
    temp_image = image.load_img(to_predict_image_string,target_size = (64, 64))
    temp_image = image.img_to_array(temp_image)
    temp_image = np.expand_dims(temp_image,axis = 0)
    result = classifier.predict_classes(temp_image)
    if result[0][0] == 1:
        prediction = 'dog'
    else:
        prediction = 'cat'
    
    to_predict.at[i, 'Images'] = img_name
    to_predict.at[i, 'Prediction'] = prediction
    predictions.append(prediction)  



In [38]:
# Final prediction
print(to_predict)

      Images Prediction
0   Img1.jpg        dog
1   Img2.jpg        dog
2   Img3.jpg        dog
3   Img4.jpg        cat
4   Img5.jpg        dog
5   Img6.jpg        cat
6   Img7.jpg        cat
7   Img8.jpg        dog
8   Img9.jpg        cat
9  Img10.jpg        dog
