<a href="https://colab.research.google.com/github/lauvshree/ASLADigitechLearningMaterial/blob/master/Project_9_Face_Detection.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In this problem, we apply "Transfer Learning" to an object detector model so that it detects the objects relevant to the problem at hand.
Here, we are particularly interested in detecting faces in a given image.

To use the model, we first need to import it along with the supporting files it needs to function. The MobileNet model is given in the file mn_model.py.
We use the steps below to import the model.

In [0]:
from google.colab import drive

In [2]:
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [0]:
import sys
sys.path.append('/content/gdrive/My Drive/Project 9/Files_required_for_face_detection')

In [0]:
import warnings
warnings.filterwarnings("ignore")


In [5]:
%cd /content/gdrive/My\ Drive/Project\ 9/Files_required_for_face_detection

/content/gdrive/My Drive/Project 9/Files_required_for_face_detection


The mn_model uses an older version of Keras. We will ignore the warnings for the time being and continue to use the older libraries.

Also import the BatchGenerator and SSDLoss classes and the box encoding utilities used with the MobileNet model, from the given files face_generator.py, keras_ssd_loss.py and ssd_box_encode_decode_utils.py.

In [6]:
#### Import the BatchGenerator and SSDLoss functions as well, used in MobileNet model

from face_generator import BatchGenerator
from keras_ssd_loss import SSDLoss
from ssd_box_encode_decode_utils import SSDBoxEncoder, decode_y, decode_y2

Using TensorFlow backend.


ImportError: ignored

In [0]:
from keras.optimizers import Adam, SGD, Nadam
from keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau, TensorBoard, LearningRateScheduler
from keras.callbacks import Callback
from keras import backend as K 
from keras.models import load_model
from math import ceil 
import numpy as np 
from termcolor import colored


Set the parameters for the model. Originally it has many classes, but our objective here is only to detect faces, so we customize the model parameters for our problem as given below.
Set n_classes (number of classes) = 2, as we are interested only in face detection. Face is one class and everything else falls under the other class (which we call background).

In [0]:
img_height =512
img_width = 512

img_channels = 3

n_classes =2 
class_names = ["background","face"]

scales = [0.07, 0.15, 0.33, 0.51, 0.69, 0.87, 1.05] # anchor box scales (as used for the COCO dataset); one more entry than predictor layers
aspect_ratios = [[0.5, 1.0, 2.0],
                 [1.0/3.0, 0.5, 1.0, 2.0, 3.0],
                 [1.0/3.0, 0.5, 1.0, 2.0, 3.0],
                 [1.0/3.0, 0.5, 1.0, 2.0, 3.0],
                 [0.5, 1.0, 2.0],
                 [0.5, 1.0, 2.0]] # The anchor box aspect ratios used in the original SSD300
two_boxes_for_ar1 = True
limit_boxes = True # Whether or not you want to limit the anchor boxes to lie entirely within the image boundaries
variances = [0.1, 0.1, 0.2, 0.2] # The variances by which the encoded target coordinates are scaled as in the original implementation
coords = 'centroids' # Whether the box coordinates to be used as targets for the model should be in the 'centroids' or 'minmax' format, see documentation
normalize_coords = True

det_model_path = "./"
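The relation between `scales`, `aspect_ratios` and `two_boxes_for_ar1` can be sketched as follows. This is a minimal illustration that assumes the encoder follows the SSD paper's convention (width = s·sqrt(ar), height = s/sqrt(ar), plus one extra square box at the geometric mean of two consecutive scales). The helper `anchor_shapes` is hypothetical and is not part of the provided files.

```python
from math import sqrt

# Values copied from the parameter cell above.
img_height, img_width = 512, 512
scales = [0.07, 0.15, 0.33, 0.51, 0.69, 0.87, 1.05]
aspect_ratios = [[0.5, 1.0, 2.0],
                 [1.0/3.0, 0.5, 1.0, 2.0, 3.0],
                 [1.0/3.0, 0.5, 1.0, 2.0, 3.0],
                 [1.0/3.0, 0.5, 1.0, 2.0, 3.0],
                 [0.5, 1.0, 2.0],
                 [0.5, 1.0, 2.0]]
two_boxes_for_ar1 = True

def anchor_shapes(layer_idx):
    """Return (width, height) in pixels for each anchor of one predictor layer.

    Assumption: SSD-paper convention, with scales relative to the shorter
    image side. The real encoder may differ in details.
    """
    size = min(img_height, img_width)
    s, s_next = scales[layer_idx], scales[layer_idx + 1]
    shapes = []
    for ar in aspect_ratios[layer_idx]:
        shapes.append((s * size * sqrt(ar), s * size / sqrt(ar)))
        if ar == 1.0 and two_boxes_for_ar1:
            # Extra square box at the geometric mean of this and the next scale.
            s_extra = sqrt(s * s_next)
            shapes.append((s_extra * size, s_extra * size))
    return shapes

boxes_per_layer = [len(anchor_shapes(i)) for i in range(len(aspect_ratios))]
print(boxes_per_layer)
```

Under this assumption, the extra box for aspect ratio 1 is why `scales` needs seven entries for six predictor layers: the last layer still needs a "next" scale.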

We have now imported the model and its dependencies. The next step is to import the dataset the model will train on. For this, we use the WIDER FACE dataset.
To make the dataset available, follow the steps given below.

    1.   Create a folder in your Google Drive for this project.
    2.   Download the train and test dataset files, given in .zip format, into the Drive folder you created in step 1.
    3.   Set the project path variable to match the folders you created for this project in your Google Drive. For example (adjust the prefix to your own mount point; this notebook mounts the drive at /content/gdrive):

project_path = "/content/drive/My Drive/DLCP/"

Once the drive is mounted, the images are available for training and testing, but still in zip format.

So, let's extract the images from the zip files using the zipfile module.

In [0]:
project_path = "/content/gdrive/My Drive/Project 9/Files_required_for_face_detection/dataset/"

In [0]:
train_images_path = project_path + 'WIDER_train.zip'
test_images_path = project_path + 'WIDER_val.zip'

In [0]:
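import zipfile

# Extract the training images into the current working directory.
archive = zipfile.ZipFile(train_images_path, 'r')
archive.extractall()

# Collect the event category names from directory entries of the form
# "WIDER_train/images/<id>--<category>/".
classnames = []
for file in archive.filelist:
  fn = file.filename
  if fn.startswith("WIDER_train/images/") and not fn.endswith(".jpg") and "--" in fn:
    hyphenIdx = fn.find("--") + 2
    # Drop the trailing "/" of the directory entry.
    classnames.append(fn[hyphenIdx:len(fn)-1])
archive.close()
print("Possible objects")
print(classnames)

# Extract the validation images the same way.
archive = zipfile.ZipFile(test_images_path, 'r')
archive.extractall()
archive.close()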

Now, the images and their respective labels are available. But the objective of this project is not recognition but it is detection. So, as mentioned above, we will have just two classes viz., background and face.

Let's load the given ''wider_train_small.npy'' file to inspect the information it holds about the dataset. The file describes each image in the dataset with a list of the following elements:

    1.   Image filename (str)
    2.   Image filename (str)
    3.   Image size (list) [height, width]
    4.   List of bounding box co-ordinates and Class label (list) [[a,b,c,d], Class label, ...]
 
    where,
    a,b,c,d are the four co-ordinates of the bounding box
    Class label is the index of the object's class in the `class_names` list above (0 = background, 1 = face).
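The layout above can be read as in the following sketch. The record is entirely made up for illustration, and the helper `iter_boxes` is hypothetical. It also assumes that the fourth element alternates between a box and its class label, which is one reading of `[[a,b,c,d], Class label, ...]`; check a printed record before relying on it.

```python
# Hypothetical record following the four-element layout described above.
# The file name, image size and boxes are invented for illustration only.
record = [
    "example.jpg",
    "example.jpg",
    [768, 1024],                                    # [height, width]
    [[10, 20, 110, 140], 1, [300, 50, 380, 150], 1],
]

class_names = ["background", "face"]

def iter_boxes(record):
    """Yield (box, class_name) pairs, assuming boxes and labels alternate."""
    annotations = record[3]
    for box, label in zip(annotations[0::2], annotations[1::2]):
        yield box, class_names[label]

for box, name in iter_boxes(record):
    print(name, box)
```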

In [0]:
filename1 = '/content/gdrive/My Drive/Project 9/Files_required_for_face_detection/wider_train_small.npy'

print(filename1)

data = np.load(filename1,allow_pickle=True).item()

In [0]:
### Print the first element to check the information described above.

for key in data:
  print(data[key])
  break

In the sample printed above, the class label is 1, indicating a face. As the output shows, the record contains all the information described above.


Now, set the paths to the files wider_train_small.npy and wider_val_small.npy, which hold the training and validation annotations.

In [0]:
train_data = '/content/gdrive/My Drive/Project 9/Files_required_for_face_detection/wider_train_small.npy'
test_data = '/content/gdrive/My Drive/Project 9/Files_required_for_face_detection/wider_val_small.npy'

Now, call the imported model with the given parameters and freeze every layer whose name does not contain the word ''detection''.
Because we are not training the model from scratch, we freeze these layers and leave only the last few (detection) layers trainable, so that only their weights are updated for the problem at hand. This is called Transfer Learning.

In [0]:
from mn_model import mn_model
K.clear_session()
model, model_layer, img_input, predictor_sizes = mn_model(image_size=(img_height, img_width, img_channels), 
                                                                      n_classes = n_classes,
                                                                      min_scale = None, 
                                                                      max_scale = None, 
                                                                      scales = scales, 
                                                                      aspect_ratios_global = None, 
                                                                      aspect_ratios_per_layer = aspect_ratios, 
                                                                      two_boxes_for_ar1= two_boxes_for_ar1, 
                                                                      limit_boxes=limit_boxes, 
                                                                      variances= variances, 
                                                                      coords=coords, 
                                                                      normalize_coords=normalize_coords)

print ("Freezing classification layers")
#Freeze layers
for layer_key in model_layer:
  if('detection'  not in layer_key):
    model_layer[layer_key].trainable = False
    print('Freezing layer, %s' % layer_key)
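The name-based freezing rule used above can be tried in isolation with stand-in objects. This is only a sketch: `FakeLayer`, `freeze_backbone` and the layer names are hypothetical, and the real names come from the `model_layer` dictionary returned by `mn_model`.

```python
class FakeLayer:
    """Stand-in for a Keras layer; only the `trainable` flag matters here."""
    def __init__(self):
        self.trainable = True

# Hypothetical layer names, chosen only to illustrate the rule.
layers = {name: FakeLayer() for name in
          ["conv1", "conv_dw_1", "detection_conv_1", "detection_classes"]}

def freeze_backbone(layer_dict, keep_substring="detection"):
    """Freeze every layer whose name does not contain `keep_substring`."""
    frozen = []
    for name, layer in layer_dict.items():
        if keep_substring not in name:
            layer.trainable = False
            frozen.append(name)
    return frozen

frozen = freeze_backbone(layers)
trainable = [name for name, layer in layers.items() if layer.trainable]
print("frozen:", frozen)
print("trainable:", trainable)
```

In Keras the `trainable` flags are read when the model is compiled, so the freezing has to happen before `model.compile`, as it does in this notebook.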


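After making the model ready for transfer learning, load the pretrained weights given in the file ''mobilenet_1_0_224_tf.h5''. With by_name=True, weights are loaded only into layers whose names match those stored in the file.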

In [0]:
print ("Loading classification weights")
classification_model = '/content/gdrive/My Drive/Project 9/Files_required_for_face_detection/mobilenet_1_0_224_tf.h5'
model.load_weights(classification_model,  by_name= True)

Using the BatchGenerator and SSDBoxEncoder imported above, we now build the batch generators for the training and validation samples. Run the below code.

In [0]:
batch_size = 32
ssd_box_encoder = SSDBoxEncoder(img_height=img_height,
                                img_width=img_width,
                                n_classes=n_classes, 
                                predictor_sizes=predictor_sizes,
                                min_scale=None,
                                max_scale=None,
                                scales=scales,
                                aspect_ratios_global=None,
                                aspect_ratios_per_layer=aspect_ratios,
                                two_boxes_for_ar1=two_boxes_for_ar1,
                                limit_boxes=limit_boxes,
                                variances=variances,
                                pos_iou_threshold=0.5,
                                neg_iou_threshold=0.2,
                                coords=coords,
                                normalize_coords=normalize_coords)

train_dataset = BatchGenerator(images_path=train_data, 
                include_classes='all', 
                box_output_format = ['class_id', 'xmin', 'xmax', 'ymin', 'ymax'])

print ("==>TRAINING DATA")
print ("==> Parsing XML files ...")

train_dataset.parse_xml(
                  annotations_path=train_data,
                  image_set_path='None',
                  image_set='None',
                  classes = class_names, 
                  exclude_truncated=False,
                  exclude_difficult=False,
                  ret=False, 
                  debug = False)
print("==>Parsing XML Finished.")

print ("==>Generate training batches...")
train_generator = train_dataset.generate(
                 batch_size=batch_size,
                 train=True,
                 ssd_box_encoder=ssd_box_encoder,
                 equalize=True,
                 brightness=(0.5,2,0.5),
                 flip=0.5,
                 translate=((0, 20), (0, 30), 0.5),
                 scale=(0.75, 1.2, 0.5),
                 crop=False,
                 #random_crop = (img_height,img_width,1,3), 
                 random_crop=False,
                 resize=(img_height, img_width),
                 #resize=False,
                 gray=False,
                 limit_boxes=True,
                 include_thresh=0.4,
                 diagnostics=False)

print ("==>Training batch generation complete")

print(train_dataset.filenames)

n_train_samples = train_dataset.get_n_samples()

print ("==>Total number of training samples = {}".format(n_train_samples))

print ("==>VALIDATION")

val_dataset = BatchGenerator(images_path=test_data, include_classes='all', 
                box_output_format = ['class_id', 'xmin', 'xmax', 'ymin', 'ymax'])

print ("==> Parsing XML files ...")

val_dataset.parse_xml(
                  annotations_path=test_data,
                  image_set_path='None',
                  image_set='None',
                  classes = class_names, 
                  exclude_truncated=False,
                  exclude_difficult=False,
                  ret=False, 
                  debug = False)


print("==>Parsing XML Finished.")


print ("==>Generate validation batches...")
val_generator = val_dataset.generate(
                 batch_size=batch_size,
                 train=True,
                 ssd_box_encoder=ssd_box_encoder,
                 equalize=False,
                 brightness=False,
                 flip=False,
                 translate=False,
                 scale=False,
                 crop=False,
                 #random_crop = (img_height,img_width,1,3), 
                 random_crop=False, 
                 resize=(img_height, img_width), 
                 #resize=False, 
                 gray=False,
                 limit_boxes=True,
                 include_thresh=0.4,
                 diagnostics=False)


print ("==>Validation batch generation complete")

n_val_samples = val_dataset.get_n_samples()

print ("==>Total number of validation samples = {}".format(n_val_samples))
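The `pos_iou_threshold=0.5` and `neg_iou_threshold=0.2` arguments passed to the encoder above control how anchor boxes are matched to ground-truth faces. The sketch below assumes the usual SSD matching rule (an anchor is positive at or above the upper threshold, background below the lower one, and ignored in between). The functions `iou` and `anchor_role` are hypothetical helpers, not part of ssd_box_encode_decode_utils.py.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (xmin, xmax, ymin, ymax)."""
    ax0, ax1, ay0, ay1 = box_a
    bx0, bx1, by0, by1 = box_b
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0

def anchor_role(anchor, gt_boxes, pos_thresh=0.5, neg_thresh=0.2):
    """Classify an anchor as 'positive', 'negative' or 'neutral'.

    Assumed rule: compare the best IoU with any ground-truth box
    against the two thresholds.
    """
    best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
    if best >= pos_thresh:
        return "positive"
    if best < neg_thresh:
        return "negative"
    return "neutral"

gt = [(0, 10, 0, 10)]                      # one made-up ground-truth face
print(anchor_role((0, 10, 0, 10), gt))     # identical box
print(anchor_role((5, 15, 0, 10), gt))     # half-shifted box, IoU = 1/3
print(anchor_role((20, 30, 20, 30), gt))   # disjoint box
```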


Now, let's set things up for training by initializing the required variables, such as learning rate, epochs, optimizer and loss function (SSDLoss), and compile the model.


In [0]:
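# setting up training 

# Note: the generators above were created with batch_size = 32; this value
# is only used to compute steps_per_epoch and validation_steps below.
batch_size = 16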
num_epochs = 25

#Learning rate
base_lr = 0.002

# Optimizer
adam = Adam(lr=base_lr, beta_1=0.9, beta_2=0.999, epsilon=1e-6, decay = 0.0)

# Loss
ssd_loss = SSDLoss(neg_pos_ratio=2, n_neg_min=0, alpha=1.0, beta = 1.0)

# Compile
model.compile(optimizer=adam, loss=ssd_loss.compute_loss)
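To give an idea of what `neg_pos_ratio=2` and `n_neg_min=0` mean, here is a rough sketch of hard negative mining together with the smooth L1 localization loss, assuming SSDLoss follows the SSD paper. The function names and the toy numbers are hypothetical. The `alpha` weighting is not modelled, and the `beta` argument is specific to the provided keras_ssd_loss file and is not modelled either.

```python
def smooth_l1(error):
    """Smooth L1 loss for a single coordinate error."""
    abs_err = abs(error)
    return 0.5 * error * error if abs_err < 1.0 else abs_err - 0.5

def select_hard_negatives(class_losses, is_positive, neg_pos_ratio=2, n_neg_min=0):
    """Return the indices of the background anchors kept for the loss.

    Keeps the highest-loss negatives, at most
    max(neg_pos_ratio * n_positives, n_neg_min) of them.
    """
    n_pos = sum(is_positive)
    negatives = [i for i, pos in enumerate(is_positive) if not pos]
    n_keep = min(max(neg_pos_ratio * n_pos, n_neg_min), len(negatives))
    hardest = sorted(negatives, key=lambda i: class_losses[i], reverse=True)
    return sorted(hardest[:n_keep])

# Toy example: six anchors, one of which is matched to a face.
class_losses = [0.1, 2.3, 0.05, 1.7, 0.4, 0.9]
is_positive = [False, False, False, False, True, False]
print(select_hard_negatives(class_losses, is_positive))
print(smooth_l1(0.5), smooth_l1(2.0))
```

Without this selection, the very large number of background anchors would dominate the classification loss.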

Let's create early stopping and model checkpoint callbacks that monitor the validation loss, and use fit_generator to train the model on batches produced by the train_generator Python generator, passing both callbacks.
The checkpoint saves the best model weights based on validation loss. Note that with patience=100 and only 25 epochs, early stopping cannot trigger in this run.
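The interplay of `min_delta` and `patience` can be sketched in plain Python. This is a simplified model of early stopping on a minimised metric, not the Keras source. The function name and the loss values are made up, and the exact off-by-one behaviour may differ between Keras versions.

```python
def early_stopping_epoch(val_losses, min_delta=0.001, patience=100):
    """Return the 1-based epoch at which training would stop, or None.

    A loss counts as an improvement only if it beats the best value so far
    by more than `min_delta`; training stops after `patience` epochs in a
    row without such an improvement.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Made-up validation losses for illustration.
losses = [0.90, 0.60, 0.45, 0.4495, 0.4498, 0.46, 0.47]
print(early_stopping_epoch(losses, patience=3))
print(early_stopping_epoch(losses, patience=100))
```

A patience smaller than the number of epochs is needed for the callback to have any effect.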


In [0]:
early_stopping = EarlyStopping(monitor='val_loss', min_delta=0.001, patience=100)

model_checkpoint =  ModelCheckpoint(det_model_path + 'ssd_mobilenet_face_epoch_{epoch:02d}_loss{val_loss:.4f}.h5',
                                                           monitor='val_loss',
                                                           verbose=1,
                                                           save_best_only=True,
                                                           save_weights_only=True,
                                                           mode='auto',
                                                           period=1)

print ("Fitting the model")

history = model.fit_generator(generator = train_generator,
                              steps_per_epoch = ceil(n_train_samples/batch_size)*2,
                              epochs = num_epochs,
                              callbacks = [model_checkpoint, early_stopping],                      
                              validation_data = val_generator,
                              validation_steps = ceil(n_val_samples/batch_size))

model.save_weights("./" + 'ssd_mobilenet_weights_epoch_{}.h5'.format(num_epochs))

print ("model and weight files saved at : " + det_model_path)
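The step counts and the checkpoint file name used above can be checked in isolation. In this sketch, `steps_for` is a hypothetical helper and the sample counts are invented; the real counts come from `get_n_samples()`. The file name template is the one passed to ModelCheckpoint, which fills it in with the epoch number and the logged metrics.

```python
from math import ceil

def steps_for(n_samples, batch_size, passes=1):
    """Number of generator batches needed to cover the data `passes` times."""
    # Float division keeps the result correct under Python 2 as well.
    return int(ceil(n_samples / float(batch_size))) * passes

# Hypothetical sample counts, for illustration only.
print(steps_for(1000, 16, passes=2))
print(steps_for(250, 16))

# The checkpoint name is an ordinary format template.
template = 'ssd_mobilenet_face_epoch_{epoch:02d}_loss{val_loss:.4f}.h5'
print(template.format(epoch=20, val_loss=0.2117))
```

A checkpoint written at epoch 20 with a validation loss of 0.2117 therefore gets the name `ssd_mobilenet_face_epoch_20_loss0.2117.h5`.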


We will now load the best saved model from the above step and check predictions on the test data, using a test_generator object to generate batches.

In [0]:
model_path = './'
model_name = 'ssd_mobilenet_face_epoch_20_loss0.2117.h5'

model.load_weights(model_path + model_name,  by_name= True)

print (colored('weights %s loaded' % (model_path + model_name), 'green'))

Let's use the function below to plot the bounding boxes on a test image and show the predictions.


In [0]:
from keras.preprocessing import image
from matplotlib import pyplot as plt

def save_bb(path, filename, results, prediction=True):
  # Draw the boxes in `results` on the image and save the figure under `path`.
  # Prediction rows are [label, conf, xmin, xmax, ymin, ymax];
  # ground-truth rows are [label, xmin, xmax, ymin, ymax].
  img = image.load_img(filename, target_size=(img_height, img_width))
  img = image.img_to_array(img)

  filename = filename.split("/")[-1]

  # Ground-truth plots get a "_gt" suffix so both versions can be kept.
  if not prediction:
    filename = filename[:-4] + "_gt" + ".jpg"

  currentAxis = plt.gca()
  # Show the image once, up front, so it is saved even when there are no boxes.
  plt.imshow(img / 255.)

  colors = plt.cm.hsv(np.linspace(0, 1, 25)).tolist()
  color_code = min(len(results), 16)
  print(colored("total number of bbs: %d" % len(results), "yellow"))
  for result in results:
    # Parse the outputs.
    if prediction:
      det_label, det_conf, det_xmin, det_xmax, det_ymin, det_ymax = result[:6]
    else:
      det_label, det_xmin, det_xmax, det_ymin, det_ymax = result[:5]

    xmin = int(det_xmin)
    ymin = int(det_ymin)
    xmax = int(det_xmax)
    ymax = int(det_ymax)

    label = int(det_label)
    label_name = class_names[label]

    # Predictions are annotated with their confidence, ground truth with the class name.
    if prediction:
      display_txt = '{:0.2f}'.format(det_conf)
    else:
      display_txt = '{}'.format(label_name)

    # Rectangle takes the top-left corner, then width and height.
    box = (xmin, ymin), (xmax - xmin), (ymax - ymin)
    # Step through the palette; the modulo keeps the index valid for any number of boxes.
    color_code = (color_code - 1) % len(colors)
    color = colors[color_code]
    currentAxis.add_patch(plt.Rectangle(*box, fill=False, edgecolor=color, linewidth=2))
    currentAxis.text(xmin, ymin, display_txt, bbox={'facecolor':color, 'alpha':0.2})

  # Hide both axes before saving.
  currentAxis.axes.get_yaxis().set_visible(False)
  currentAxis.axes.get_xaxis().set_visible(False)
  plt.savefig(path + filename, bbox_inches='tight')

  print('saved', path + filename)

  plt.clf()


In [0]:
!mkdir -p output_test

In [0]:
from keras.preprocessing import image
from matplotlib import pyplot as plt

test_size = 10
test_generator = val_dataset.generate(
                 batch_size=test_size,
                 train=False,
                 ssd_box_encoder=ssd_box_encoder,
                 equalize=False,
                 brightness=False,
                 flip=False,
                 translate=False,
                 scale=False,
                 crop=False,
                 #random_crop = (img_height,img_width,1,3), 
                 random_crop=False, 
                 resize=(img_height, img_width), 
                 #resize=False,
                 gray=False,
                 limit_boxes=True,
                 include_thresh=0.4,
                 diagnostics=False)

print (colored("done.", "green"))

print (colored("now predicting...", "yellow"))

_CONF = 0.60 
_IOU = 0.15

# Each pass draws a fresh batch of test_size images and plots only its i-th image.
for i in range(test_size):
  X, y, filenames = next(test_generator)
  y_pred = model.predict(X)


  y_pred_decoded = decode_y2(y_pred,
                             confidence_thresh=_CONF,
                             iou_threshold=_IOU,
                             top_k='all',
                             input_coords=coords,
                             normalize_coords=normalize_coords,
                             img_height=img_height,
                             img_width=img_width)


  np.set_printoptions(suppress=True)

  save_bb("./output_test/", filenames[i], y_pred_decoded[i])
  save_bb("./output_test/", filenames[i], y[i], prediction=False)
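The `confidence_thresh` and `iou_threshold` arguments passed to `decode_y2` above suggest a confidence filter followed by non-maximum suppression. The internals of `decode_y2` are not shown here, so the following is a generic greedy NMS sketch under that assumption. `iou`, `filter_detections` and the detections are hypothetical; the row layout matches the one `save_bb` expects for predictions.

```python
def iou(a, b):
    """IoU of two boxes given as (xmin, xmax, ymin, ymax)."""
    inter_w = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[2], b[2]))
    inter = inter_w * inter_h
    union = (a[1] - a[0]) * (a[3] - a[2]) + (b[1] - b[0]) * (b[3] - b[2]) - inter
    return inter / union if union > 0 else 0.0

def filter_detections(dets, conf_thresh=0.60, iou_thresh=0.15):
    """Greedy NMS over rows of [class_id, conf, xmin, xmax, ymin, ymax]."""
    # Drop weak detections, then visit the rest from most to least confident.
    candidates = sorted((d for d in dets if d[1] >= conf_thresh),
                        key=lambda d: d[1], reverse=True)
    kept = []
    for det in candidates:
        # Keep a box only if it does not overlap too much with a kept one.
        if all(iou(det[2:], k[2:]) <= iou_thresh for k in kept):
            kept.append(det)
    return kept

# Made-up detections: two overlapping boxes, one separate box, one weak box.
dets = [
    [1, 0.90, 0, 10, 0, 10],
    [1, 0.80, 1, 11, 0, 10],
    [1, 0.70, 50, 60, 50, 60],
    [1, 0.30, 80, 90, 80, 90],
]
for det in filter_detections(dets):
    print(det)
```

A low IoU threshold such as 0.15 suppresses even lightly overlapping boxes, which reduces duplicate detections but can merge faces that sit close together.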


Let's now visualize the test image to check the predictions


In [0]:
from google.colab.patches import cv2_imshow

In [0]:
import cv2
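# The "_gt" suffix marks the ground-truth plot written by save_bb; drop it to view the model's predictions for the same image.
img = cv2.imread('./output_test/28_Sports_Fan_Sports_Fan_28_590_gt.jpg', cv2.IMREAD_UNCHANGED)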
cv2_imshow(img)

As we can see in the above image, all the faces, even those only partly visible, are marked with bounding boxes.


In [0]:
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from matplotlib.pyplot import figure

img=mpimg.imread('./output_test/2_Demonstration_Political_Rally_2_219.jpg')
figure(num=None, figsize=(10, 10), dpi=80, facecolor='w', edgecolor='k')
imgplot = plt.imshow(img)

plt.show()

The prediction for this image seems to have gone wrong. One possible reason is that the training set is not balanced across ethnicities; if so, the data would need to be balanced on a demographic basis. We will verify the test results with another image in the test dataset.

In [0]:
img=mpimg.imread('/content/gdrive/My Drive/Project 9/Files_required_for_face_detection/WIDER_val/images/16--Award_Ceremony/16_Award_Ceremony_Awards_Ceremony_16_338.jpg')
figure(num=None, figsize=(10, 10), dpi=80, facecolor='w', edgecolor='k')
imgplot = plt.imshow(img)

plt.show()


img=mpimg.imread('./output_test/16_Award_Ceremony_Awards_Ceremony_16_338.jpg')
figure(num=None, figsize=(10, 10), dpi=80, facecolor='w', edgecolor='k')
imgplot = plt.imshow(img)

plt.show()



Given the original image, the face detection in this case seems correct. We can, however, fine-tune the model further by training it on a more extensive dataset.