## CNN Classification

In [1]:
from PIL import Image
import glob
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
import keras
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Conv2D, MaxPool2D, Dense, Activation, Flatten

Using TensorFlow backend.


Load the image files from the `PIE` folder and the `own_image` folder. The own images form an additional class with label 20.

The 20 subjects were chosen from the full data set with a random number generator, to avoid bias in which subjects are used.
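
A minimal sketch of how such a random selection could be made reproducible. It assumes the PIE subject folders are numbered 1 to 68 (CMU PIE has 68 subjects, but the folder numbering is an assumption) and fixes the seed so the same 20 subjects are drawn on every run. `select_subjects` is a hypothetical helper, not part of the original notebook.

```python
import numpy as np


def select_subjects(n_total=68, n_select=20, seed=0):
    """Draw n_select distinct subject IDs from 1..n_total (hypothetical helper)."""
    rng = np.random.default_rng(seed)  # fixed seed -> same subjects on every run
    ids = rng.choice(np.arange(1, n_total + 1), size=n_select, replace=False)
    return sorted(int(i) for i in ids)


subject_ids = select_subjects()
# Build glob patterns in the same format as img_dir_list
img_dir_list = ['PIE/%d/*.jpg' % s for s in subject_ids]
print(len(img_dir_list), img_dir_list[:3])
```

Recording the seed alongside the chosen folder list lets someone else rebuild exactly the same subset.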

The images are split into training and test sets (70% / 30%). PIE images and own images are split separately, so both sets contain own images.
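
The 70/30 split can be checked against the sample counts reported by `model.fit`. The sketch below applies scikit-learn's rounding rule for a float `test_size` (the test share is rounded up, the rest goes to training) to each group separately, as the notebook does. The group sizes are inferred rather than stated: 10 own images from `own_label = [20] * 10`, and 3,400 PIE images as the remainder of the 3,410 samples in the training log. `split_sizes` is a hypothetical helper.

```python
import math


def split_sizes(n_samples, test_size):
    """Train/test sizes for a float test_size, following scikit-learn's rule
    n_test = ceil(test_size * n_samples), n_train = n_samples - n_test."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


# Assumed group sizes (inferred from the notebook, see above)
pie_train, pie_test = split_sizes(3400, 0.3)
own_train, own_test = split_sizes(10, 0.3)
print(pie_train + own_train, pie_test + own_test)  # 2387 1023
```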

The pixel values are standardized with scikit-learn's `StandardScaler` (fitted on the training set only), and the images are reshaped to (32, 32, 1) to match the CNN input.
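
To make the preprocessing concrete, the sketch below reimplements what `StandardScaler` does to flattened images using NumPy alone, on random synthetic data rather than the PIE images. `fit_standardizer` is a hypothetical helper. The key point is that the mean and standard deviation come from the training set and are reused unchanged for the test set, which is why the notebook calls `fit_transform` on one and `transform` on the other.

```python
import numpy as np


def fit_standardizer(X):
    """Per-feature mean and scale, mirroring StandardScaler's defaults:
    population std (ddof=0), with zero-variance features left unscaled."""
    mean = X.mean(axis=0)
    scale = X.std(axis=0)
    scale[scale == 0] = 1.0
    return mean, scale


rng = np.random.default_rng(0)
# Synthetic stand-in for flattened 32x32 grayscale images (1024 pixels each)
X_train = rng.integers(0, 256, size=(100, 1024)).astype(float)
X_test = rng.integers(0, 256, size=(40, 1024)).astype(float)

mean, scale = fit_standardizer(X_train)  # statistics from the training set only
X_train_std = (X_train - mean) / scale
X_test_std = (X_test - mean) / scale     # test set reuses the training statistics

# Reshape to (samples, height, width, channels) for a Conv2D input
X_train_img = X_train_std.reshape(-1, 32, 32, 1)
print(X_train_img.shape)  # (100, 32, 32, 1)
```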

Since the instructions do not mention sampling, the entire training set is used to train the model.

In [2]:
img_dir_list = ['PIE/1/*.jpg', 'PIE/4/*.jpg', 'PIE/5/*.jpg',
                'PIE/19/*.jpg', 'PIE/21/*.jpg', 'PIE/23/*.jpg',
                'PIE/25/*.jpg', 'PIE/29/*.jpg', 'PIE/33/*.jpg',
                'PIE/39/*.jpg', 'PIE/40/*.jpg', 'PIE/44/*.jpg',
                'PIE/45/*.jpg', 'PIE/46/*.jpg', 'PIE/48/*.jpg',
                'PIE/52/*.jpg', 'PIE/57/*.jpg', 'PIE/58/*.jpg',
                'PIE/59/*.jpg', 'PIE/67/*.jpg']
image_list = []
image_label = []
image_own = []

# Load the 20 PIE subjects; the folder's index i (0-19) is its class label
for i, pattern in enumerate(img_dir_list):
    for filename in sorted(glob.glob(pattern)):
        im = Image.open(filename)
        image_list.append(np.array(im).flatten())
        image_label.append(i)

# Load own images; they form an extra class with label 20
for filename in sorted(glob.glob('own_image/*.jpg')):
    im = Image.open(filename)
    image_own.append(np.array(im).flatten())

image_list = np.asarray(image_list)
image_label = np.asarray(image_label)
image_own = np.asarray(image_own)
# Class 20 for every own image (count taken from the files actually loaded)
own_label = np.full(len(image_own), 20)

# 70/30 split, done per group so own images appear in both sets
train_list, test_list, train_label, test_label = train_test_split(image_list, image_label, test_size=0.3)
train_own, test_own, train_own_label, test_own_label = train_test_split(image_own, own_label, test_size=0.3)

train_list = np.concatenate((train_list, train_own))
train_label = np.concatenate((train_label, train_own_label))
test_list = np.concatenate((test_list, test_own))
test_label = np.concatenate((test_label, test_own_label))

# Standardize each pixel using training-set statistics only
sc = preprocessing.StandardScaler()
train_list_prep = sc.fit_transform(train_list)
test_list_prep = sc.transform(test_list)

# Reshape flat 1024-pixel vectors to (N, 32, 32, 1) for Conv2D
test_list = test_list_prep.reshape(-1, 32, 32, 1)
train_list = train_list_prep.reshape(-1, 32, 32, 1)

Convert the integer labels to one-hot vectors (21 classes: 20 PIE subjects plus the own images), as required by the categorical cross-entropy loss.
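
For reference, the conversion performed by `to_categorical` amounts to picking rows of an identity matrix. The NumPy sketch below is an illustrative equivalent (the helper name `one_hot` is hypothetical), not Keras's own implementation.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Row i is all zeros except a 1 at column labels[i] (like to_categorical)."""
    labels = np.asarray(labels, dtype=int)
    return np.eye(num_classes, dtype=np.float32)[labels]


# Labels 0-19 are PIE subjects; 20 is the extra "own image" class
y = one_hot([0, 19, 20], num_classes=21)
print(y.shape)           # (3, 21)
print(y.argmax(axis=1))  # [ 0 19 20]
```

Taking `argmax` along the class axis reverses the encoding, which is also how predicted classes are read off the softmax output.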

In [3]:
test_label = to_categorical(test_label, num_classes = 21)
train_label = to_categorical(train_label, num_classes = 21)

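Construct a CNN with two convolutional layers (20 and 50 filters), each followed by 2x2 max pooling with stride 2, then one fully connected hidden layer of 500 ReLU units and a 21-unit softmax output layer.

Number of units per layer: 20-50-500-21

Both convolutional layers use 5x5 kernels with stride 1 and `same` padding.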
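
The shapes and parameter counts in the model summary follow directly from this configuration. The plain-Python sketch below recomputes them from the standard formulas for convolutional and dense layers; the helper names are illustrative. The first three counts can be compared with the summary's `Param #` column; the output layer's count and the total are derived here because the printed summary is cut off.

```python
def conv_params(kernel, in_channels, filters):
    # kernel x kernel x in_channels weights per filter, plus one bias per filter
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    # full inputs x units weight matrix, plus one bias per unit
    return inputs * units + units


# 'same' padding with stride 1 keeps the spatial size;
# 2x2 max pooling with stride 2 halves it
size = 32
conv1 = conv_params(5, 1, 20)   # 32x32x1 -> 32x32x20
size //= 2                      # pool: 16x16x20
conv2 = conv_params(5, 20, 50)  # 16x16x20 -> 16x16x50
size //= 2                      # pool: 8x8x50
flat = size * size * 50         # Flatten: 3200 values
fc = dense_params(flat, 500)
out = dense_params(500, 21)

print(conv1, conv2, flat, fc, out)  # 520 25050 3200 1600500 10521
print(conv1 + conv2 + fc + out)     # 1636591
```

Almost all parameters sit in the 3200-to-500 dense layer, which is typical when a large flattened feature map feeds a wide hidden layer.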

In [4]:
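model = Sequential()

# Block 1: 20 filters of size 5x5; no activation is given, so this layer is linear (Keras default)
model.add(Conv2D(filters=20, kernel_size=5, strides=1, padding='same', input_shape=(32, 32, 1)))
model.add(MaxPool2D(pool_size=2, strides=2))

# Block 2: 50 filters of size 5x5, also linear
model.add(Conv2D(filters=50, kernel_size=5, strides=1, padding='same'))
model.add(MaxPool2D(pool_size=2, strides=2))

# Classifier: 500-unit ReLU hidden layer, then a 21-way softmax
model.add(Flatten())
model.add(Dense(500))
model.add(Activation('relu'))

model.add(Dense(21))
model.add(Activation('softmax'))
adam = keras.optimizers.Adam(learning_rate=1e-4)
model.compile(optimizer=adam, loss='categorical_crossentropy', metrics=['accuracy'])

# summary() prints the table itself and returns None, so it is not wrapped in print()
model.summary()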

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 32, 32, 20)        520       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 16, 16, 20)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 16, 16, 50)        25050     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 8, 8, 50)          0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 3200)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 500)               1600500   
_________________________________________________________________
activation_1 (Activation)    (None, 500)              

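Train the CNN for 20 epochs with a batch size of 32. The test set is passed as `validation_data`, so its loss and accuracy are reported after each epoch; it is not used to update the weights.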
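
As a rough guide to what `epochs=20, batch_size=32` means for this data set, the sketch below counts the mini-batch weight updates and mimics per-epoch reshuffling with NumPy. It illustrates mini-batching in general and is not Keras's internal batching code; the sample count 2387 comes from the training log.

```python
import math

import numpy as np

n_train, batch_size, epochs = 2387, 32, 20  # values from the fit call and its log

steps_per_epoch = math.ceil(n_train / batch_size)
last_batch = n_train - (steps_per_epoch - 1) * batch_size  # final, partial batch
print(steps_per_epoch, last_batch, steps_per_epoch * epochs)  # 75 19 1500

# With shuffle=True the sample order is re-drawn each epoch, then cut into batches
rng = np.random.default_rng(0)
order = rng.permutation(n_train)
batches = [order[i:i + batch_size] for i in range(0, n_train, batch_size)]
print(len(batches), len(batches[-1]))  # 75 19
```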

In [5]:
model.fit(train_list, train_label, epochs=20, batch_size=32,
          validation_data=(test_list, test_label), shuffle=True, verbose=2)

Train on 2387 samples, validate on 1023 samples
Epoch 1/20
 - 6s - loss: 2.4131 - accuracy: 0.3775 - val_loss: 1.6571 - val_accuracy: 0.6491
Epoch 2/20
 - 6s - loss: 0.9367 - accuracy: 0.8387 - val_loss: 0.6220 - val_accuracy: 0.8456
Epoch 3/20
 - 6s - loss: 0.3808 - accuracy: 0.9212 - val_loss: 0.3699 - val_accuracy: 0.9198
Epoch 4/20
 - 7s - loss: 0.2056 - accuracy: 0.9585 - val_loss: 0.2428 - val_accuracy: 0.9492
Epoch 5/20
 - 6s - loss: 0.1356 - accuracy: 0.9740 - val_loss: 0.1942 - val_accuracy: 0.9521
Epoch 6/20
 - 6s - loss: 0.0851 - accuracy: 0.9853 - val_loss: 0.1492 - val_accuracy: 0.9648
Epoch 7/20
 - 6s - loss: 0.0624 - accuracy: 0.9899 - val_loss: 0.1571 - val_accuracy: 0.9638
Epoch 8/20
 - 6s - loss: 0.0448 - accuracy: 0.9925 - val_loss: 0.1331 - val_accuracy: 0.9697
Epoch 9/20
 - 6s - loss: 0.0344 - accuracy: 0.9954 - val_loss: 0.1125 - val_accuracy: 0.9717
Epoch 10/20
 - 6s - loss: 0.0236 - accuracy: 0.9975 - val_loss: 0.1327 - val_accuracy: 0.9677
Epoch 11/20
 - 6s - l

<keras.callbacks.callbacks.History at 0x22082875518>

Compute the loss and accuracy on the test set and the training set.
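
The two numbers returned by `model.evaluate` here are the loss (categorical cross-entropy) and the compiled metric (accuracy). The NumPy sketch below shows how both are obtained from softmax outputs on a tiny hand-made example; `accuracy_and_loss` is a hypothetical helper, and the clipping constant `eps` is an assumption added for numerical safety, so Keras's exact implementation may differ in such details.

```python
import numpy as np


def accuracy_and_loss(y_true, y_prob, eps=1e-7):
    """Categorical accuracy and mean categorical cross-entropy.
    y_true: one-hot labels (N, C); y_prob: softmax outputs (N, C)."""
    acc = np.mean(y_true.argmax(axis=1) == y_prob.argmax(axis=1))
    y_prob = np.clip(y_prob, eps, 1.0 - eps)  # avoid log(0)
    loss = -np.mean(np.sum(y_true * np.log(y_prob), axis=1))
    return acc, loss


y_true = np.eye(3)[[0, 1, 2, 2]]
y_prob = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.7, 0.1],
                   [0.1, 0.6, 0.3],   # misclassified: predicts class 1
                   [0.1, 0.1, 0.8]])
acc, loss = accuracy_and_loss(y_true, y_prob)
print(acc)  # 0.75
```

Only the probability assigned to the true class enters the loss, so a sample can be classified correctly and still contribute a sizeable loss if that probability is low.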

In [6]:
test_loss, test_accuracy = model.evaluate(test_list, test_label, verbose = 0)
print("Test data accuracy:" + str(test_accuracy * 100) + "%. Loss: " + str(test_loss))
train_loss, train_accuracy = model.evaluate(train_list, train_label, verbose = 0)
print("Training data accuracy:" + str(train_accuracy * 100) + "%. Loss: " + str(train_loss))

Test data accuracy:97.75171279907227%. Loss: 0.09190742222943643
Training data accuracy:100.0%. Loss: 0.0023320119163828304
