### Problem Description

Write a demo program that applies a CNN model to camera/video frames and records the results as video.

### Program Blueprint

Here I provide only a blueprint of the required program. I wasn't able to come up with a way to build an object detector on top of a custom-made CNN, and I couldn't find a proper resource online that explains how to do so. The resources I found were all based on existing architectures and algorithms such as ResNet, VGG, SSD, YOLO and so on. This blueprint may also seem a bit naive, since it first does the detection and then the classification, whereas single-shot detectors such as SSD and YOLO predict the boxes and the classes in one pass.

Hopefully, given the opportunity, learning to do this properly will be among the first things I tackle during my internship.
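For reference, one classic way to "do the detection first" with a plain classifier (not something this blueprint implements) is a sliding window: crop fixed-size windows across the frame, classify each crop, keep the confident ones, and merge overlapping hits with non-maximum suppression (NMS). The sketch below assumes a hypothetical callable `classify(patch) -> (class_index, confidence)`; the function names, window size, stride and thresholds are all illustrative, and the demo uses a toy brightness-based classifier in place of the CNN.

```python
import numpy as np

def sliding_windows(gray, win=56, stride=28):
    """Yield (x1, y1, x2, y2) boxes of a square window slid over a 2-D image."""
    h, w = gray.shape[:2]
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            yield (x, y, x + win, y + win)

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def nms(detections, iou_thresh=0.3):
    """Greedy NMS: keep the best-scoring (box, class_idx, score) tuples,
    dropping any box that overlaps an already-kept box too much."""
    kept = []
    for det in sorted(detections, key=lambda d: d[2], reverse=True):
        if all(iou(det[0], k[0]) < iou_thresh for k in kept):
            kept.append(det)
    return kept

def sliding_window_detect(gray, classify, win=56, stride=28, min_score=0.9):
    """Classify every window, keep confident ones, then suppress overlaps."""
    detections = []
    for box in sliding_windows(gray, win, stride):
        x1, y1, x2, y2 = box
        class_idx, score = classify(gray[y1:y2, x1:x2])
        if score >= min_score:
            detections.append((box, class_idx, score))
    return nms(detections)

# Toy demo: one bright 56x56 square in the top-left corner of a black image.
frame = np.zeros((112, 112), dtype=np.uint8)
frame[0:56, 0:56] = 255

def toy_classify(patch):
    # stand-in for the CNN: always class 0, confidence = fraction of bright pixels
    return 0, float((patch > 127).mean())

detections = sliding_window_detect(frame, toy_classify)
print(detections)
```

To plug in the notebook's model, `classify` could resize and scale the patch as `obj_detector` does, call `cnn.predict`, and return the argmax index and its probability. Expect it to be slow: every window needs its own classification, which is exactly the cost single-shot detectors avoid.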

In [1]:
```python
import cv2
import numpy as np
import imageio
```

In [2]:
```python
# Fashion-MNIST class names; the list index is the class index the CNN predicts
class_labels = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
```
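Turning the network's output into one of these labels is an argmax over the class scores, assuming the last layer is a 10-way softmax in the same order as `class_labels` (the model summary shown later is cut off before the final layer). `Sequential.predict_classes` does this in older Keras but was deprecated and later removed from TensorFlow's Keras, so `np.argmax` over `predict()` output is the portable form. The helper name `to_labels` and the probability values below are made up for illustration; no model is needed to run it.

```python
import numpy as np

class_labels = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
                'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

def to_labels(probs, labels=class_labels):
    """Map a (batch, 10) array of class scores to (label, confidence) pairs."""
    probs = np.asarray(probs)
    idx = np.argmax(probs, axis=-1)          # same result as predict_classes
    conf = probs[np.arange(len(idx)), idx]   # score of the winning class
    return [(labels[i], float(c)) for i, c in zip(idx, conf)]

# Made-up scores for a batch of two crops (not real model output)
fake_probs = np.array([[0.05, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.9, 0.05, 0.0],
                       [0.0, 0.8, 0.1, 0.0, 0.0, 0.0, 0.1, 0.0, 0.0, 0.0]])
print(to_labels(fake_probs))
```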


In [3]:
```python
def obj_detector(frame, cnn):
    '''
    Incomplete function: takes a BGR frame as an argument, converts it
    to grayscale, detects and classifies the objects within the frame,
    and draws a labelled box around each one. Returns the annotated frame.
    '''

    # convert to grayscale for the CNN; keep the colour frame for drawing
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    images = []
    img_coords = []

    '''
    Let's assume that we have somehow managed to extract the objects
    of interest from `gray`, and that we have stored them in the
    images list, which is a list of 2D numpy arrays. We also have
    another list, img_coords, which is a list of 4-tuples
    (x1, y1, x2, y2) giving the coordinates of each image within the frame.
    '''

    for image, coords in zip(images, img_coords):

        # resize to the CNN's 28x28 input and scale pixels to [0, 1]
        image = cv2.resize(image, (28, 28))
        image = image.astype('float32') / 255.0

        # add the channel axis (the model expects 1-channel input),
        # then the batch axis: final shape (1, 28, 28, 1)
        image = np.expand_dims(image, axis=-1)
        image = np.expand_dims(image, axis=0)

        # predict the class; argmax over predict() replaces predict_classes,
        # which newer Keras versions no longer provide
        class_no = int(np.argmax(cnn.predict(image), axis=-1)[0])
        pred_class = class_labels[class_no]

        # draw a green rectangle around the object
        x1, y1, x2, y2 = [int(c) for c in coords]
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)

        # label the object just above the top-left corner of its box
        cv2.putText(frame, pred_class, (x1, y1 - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)

    return frame
```
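The per-crop steps in `obj_detector` (resize to 28×28, scale to [0, 1], add the channel and batch axes) can be pulled into one helper, which makes the resulting shape easy to check in isolation. This sketch assumes a 2-D uint8 grayscale crop; `prepare_crop` and `nearest_resize` are hypothetical names, and the NumPy nearest-neighbour resize stands in for `cv2.resize` only so the example runs without OpenCV.

```python
import numpy as np

def nearest_resize(img, size=28):
    """Nearest-neighbour resize of a 2-D array to (size, size)."""
    h, w = img.shape
    rows = np.arange(size) * h // size   # source row for each output row
    cols = np.arange(size) * w // size   # source column for each output column
    return img[rows[:, None], cols[None, :]]

def prepare_crop(gray_crop, size=28):
    """Turn a 2-D uint8 crop into a (1, size, size, 1) float32 batch in [0, 1]."""
    small = nearest_resize(gray_crop, size)
    x = small.astype('float32') / 255.0
    return x[np.newaxis, :, :, np.newaxis]   # batch axis first, channel axis last

crop = np.full((90, 60), 255, dtype=np.uint8)   # a dummy all-white 90x60 crop
batch = prepare_crop(crop)
print(batch.shape, batch.dtype, batch.max())
```

If the network was trained on Fashion-MNIST (the class list matches it), note that those images show light garments on a black background; camera crops, which are often dark objects on a lighter background, may need inverting with `255 - crop` before scaling.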

Now we need to load a pretrained CNN.

In [4]:
```python
# only load_model is needed to restore the trained network;
# the layer, optimizer and regularizer imports were unused
from keras.models import load_model
```

Using TensorFlow backend.


In [5]:
```python
cnn = load_model('hm_shallow_net.h5py')
```




In [6]:
```python
cnn.summary()
```

```
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 28, 28, 32)        320       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 14, 14, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 14, 14, 32)        9248      
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 7, 7, 32)          0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 1568)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 256)               401664    
_________________________________________________________________
dropout_1 (Dropout)          (None, 256)              
```
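The parameter counts in this summary also settle the "should I add a channel?" question in `obj_detector`. A Conv2D layer has (k·k·c_in + 1)·c_out parameters (one k×k×c_in kernel plus one bias per filter), so 320 = (3·3·1 + 1)·32 implies a single input channel, i.e. batches shaped (n, 28, 28, 1). The 3×3 kernel size is inferred from the numbers rather than read from the summary. A quick check of each row with parameters:

```python
def conv2d_params(k, c_in, c_out):
    """k*k*c_in weights per filter plus one bias per filter."""
    return (k * k * c_in + 1) * c_out

def dense_params(n_in, n_out):
    """Full weight matrix plus one bias per output unit."""
    return (n_in + 1) * n_out

conv1 = conv2d_params(3, 1, 32)     # grayscale input: 1 channel
conv2 = conv2d_params(3, 32, 32)    # 32 feature maps in, 32 out
flat = 7 * 7 * 32                   # max_pooling2d_2 output, flattened
dense1 = dense_params(flat, 256)
print(conv1, conv2, flat, dense1)
```

All four values match the summary, which supports the single-channel reading.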

Here is the main body of the program: it reads a live camera feed, detects and classifies clothing objects in each frame using our custom-made CNN, and displays and records the annotated frames.

In [None]:
```python
# turn on the camera (device 0 is the default webcam)
video_capture = cv2.VideoCapture(0)
# prepare the recording file (writing .mp4 requires ffmpeg, e.g. the imageio-ffmpeg package)
writer = imageio.get_writer('clothing_detection.mp4')

try:
    while True:
        # read the next frame from the live feed; stop if the camera returns nothing
        ret, frame = video_capture.read()
        if not ret:
            break
        # detect and label clothing objects
        frame = obj_detector(frame, cnn)
        # display the output
        cv2.imshow('Video', frame)
        # write the output to the video file; OpenCV frames are BGR, imageio expects RGB
        writer.append_data(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        # quit if 'q' is pressed
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
finally:
    # close the video file, turn off the camera and close the display windows
    writer.close()
    video_capture.release()
    cv2.destroyAllWindows()
```