# Face Mask Detector
Author: [Handoyo](https://github.com/t3981-h) & [Tomy](https://github.com/tomytjandra)

In this notebook, we build a real-time face mask detector that runs on a webcam feed, as shown below.

<img src="demo.gif" width="500">

This application automatically classifies whether a face is covered with a mask or not. This is useful in the current COVID-19 situation because it can be integrated into an automatic door lock that only admits people who are wearing a mask. If implemented correctly, the mask detector we're building here could potentially help protect one's own safety and the safety of others.

First, we train a Convolutional Neural Network (CNN) using `keras` to "teach" the computer how to distinguish the facial features of a covered face from those of an uncovered one. The trained model is then applied to every face found by Haar cascade face detection on the video stream.

# Import Libraries

In [1]:
import os
import random
import shutil
import cv2
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.layers import Conv2D, Input, ZeroPadding2D, BatchNormalization, Activation, MaxPooling2D, Flatten, Dense
from tensorflow.keras.models import Model, load_model
from tensorflow.keras.callbacks import TensorBoard, ModelCheckpoint
from sklearn.model_selection import train_test_split
import numpy as np
import matplotlib.pyplot as plt

# Modelling

Here is the structure of our folder containing image data:

    dataset
    ├───test
    │   ├───without_mask
    │   └───with_mask
    ├───train
    │   ├───without_mask
    │   └───with_mask
    └───val
        ├───without_mask
        └───with_mask

The folder `dataset` consists of three subfolders: `test`, `train`, and `val`. Each of them contains two subfolders, `without_mask` and `with_mask`, which denote the classes of our target variable.
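Before training, it can help to confirm that the folders really follow this layout. The following is a minimal sketch using only the standard library; the helper name `count_images` is hypothetical and not part of the original notebook.

```python
import os

def count_images(root):
    """Return {split: {class_name: number_of_files}} for a dataset folder."""
    counts = {}
    for split in sorted(os.listdir(root)):
        split_dir = os.path.join(root, split)
        if not os.path.isdir(split_dir):
            continue  # ignore stray files next to the split folders
        counts[split] = {}
        for class_name in sorted(os.listdir(split_dir)):
            class_dir = os.path.join(split_dir, class_name)
            if os.path.isdir(class_dir):
                # Every entry inside a class folder is counted as one image
                counts[split][class_name] = len(os.listdir(class_dir))
    return counts
```

Calling `count_images("dataset")` should list the `test`, `train`, and `val` splits, each with a `with_mask` and a `without_mask` entry.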

[Source of dataset](https://www.linkedin.com/feed/update/urn%3Ali%3Aactivity%3A6655711815361761280/)

## Data Augmentation

We apply on-the-fly data augmentation, a technique that expands the effective size of the `train` dataset by creating modified versions of the original images, which can improve model performance and its ability to generalize. This is done with `ImageDataGenerator()` provided by `keras`. We do not perform any data augmentation on the `test` data; those images are only rescaled.
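To make the idea concrete, here is a small NumPy sketch of two of the simplest transformations, rescaling and horizontal flipping. It is a stand-in for illustration only, not how `ImageDataGenerator()` is implemented internally, and the function names are hypothetical.

```python
import numpy as np

def rescale(image):
    """Map uint8 pixel values from [0, 255] to floats in [0, 1]."""
    return image.astype("float32") / 255.0

def horizontal_flip(image):
    """Mirror an image of shape (height, width, channels) left to right."""
    return image[:, ::-1, :]

# A tiny 2x3 RGB "image" with distinct pixel values
image = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3)
augmented = horizontal_flip(rescale(image))
```

The flipped copy has the same shape as the original, so it can be fed to the network as an additional training example with the same label.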

In [2]:
TRAINING_DIR = "dataset/train"
train_datagen = ImageDataGenerator(rescale=1.0/255,
                                   rotation_range=40,
                                   width_shift_range=0.2,
                                   height_shift_range=0.2,
                                   shear_range=0.2,
                                   zoom_range=0.2,
                                   horizontal_flip=True,
                                   fill_mode='nearest')

train_generator = train_datagen.flow_from_directory(TRAINING_DIR, 
                                                    batch_size=20, 
                                                    target_size=(150, 150),  # must match the model's input_shape
                                                    seed=123)


VALIDATION_DIR = "dataset/test"
validation_datagen = ImageDataGenerator(rescale=1.0/255)

validation_generator = validation_datagen.flow_from_directory(VALIDATION_DIR, 
                                                         batch_size=20, 
                                                         target_size=(150, 150),
                                                         seed=123)

Found 1315 images belonging to 2 classes.
Found 194 images belonging to 2 classes.
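`flow_from_directory` infers the class labels from the subfolder names and assigns the indices in alphanumeric order. The sketch below mimics that ordering with plain Python (the helper `infer_class_indices` is hypothetical) and also works out how many batches of 20 each generator yields per pass.

```python
import math

def infer_class_indices(class_folders):
    """Assign label indices to class folder names in sorted order."""
    return {name: idx for idx, name in enumerate(sorted(class_folders))}

class_indices = infer_class_indices(["without_mask", "with_mask"])
print(class_indices)  # {'with_mask': 0, 'without_mask': 1}

# Batches per pass over the data with batch_size=20
train_steps = math.ceil(1315 / 20)  # 66
test_steps = math.ceil(194 / 20)    # 10
```

In the notebook itself, `train_generator.class_indices` reports the actual mapping. Any index-to-label dictionary used at prediction time should be checked against it.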


## Define Network

A CNN is used as an automatic feature extractor for the images so that it can learn to distinguish between `without_mask` and `with_mask` images. It uses neighboring pixels to extract local features and downsample the image, and then uses a prediction (fully-connected) layer to solve the classification problem.

Here is the detailed architecture that we are going to use:

1. **First convolutional layer**: consists of 100 `filters` with a 3 by 3 `kernel_size` and ReLU activation.
2. **First pooling layer**: uses 2 by 2 max-pooling (`pool_size`) with a stride of 2 pixels (`strides`), which halves the width and height of the image.
3. **Second convolutional layer**: Same as the first convolutional layer.
4. **Second pooling layer**: Same as the first pooling layer.
5. **Flattening**: Converts the three-dimensional feature maps into a one-dimensional vector, so that it is ready to be fed into the fully-connected layer.
6. **Dropout + first dense layer**: Dropout with a rate of 50% is applied to the flattened features to reduce overfitting, followed by a dense layer of 50 `units` (each with its own bias) and ReLU activation.
7. **Output layer**: consists of two units with a softmax activation that converts the scores into the probabilities of an image being `without_mask` or `with_mask`.
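The output shapes and parameter counts of the convolutional part can be worked out by hand. The sketch below assumes the Keras defaults of `'valid'` padding and a convolution stride of 1; the helper names are hypothetical.

```python
def conv_out(size, kernel=3):
    """Spatial size after a 'valid' convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Spatial size after max-pooling with pool size and stride `pool`."""
    return size // pool

def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters

size = 150             # input images are 150 x 150 x 3
size = conv_out(size)  # 148
size = pool_out(size)  # 74
size = conv_out(size)  # 72
size = pool_out(size)  # 36
flattened = size * size * 100          # 129600 features after Flatten

first_conv = conv_params(3, 3, 100)    # 2800 parameters
second_conv = conv_params(3, 100, 100) # 90100 parameters
```

These numbers agree with the shapes and parameter counts that `model.summary()` reports for the convolution, pooling, and flatten layers.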

In [3]:
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(100, (3,3), activation='relu', input_shape=(150, 150, 3)),
    tf.keras.layers.MaxPooling2D(2,2),
    
    tf.keras.layers.Conv2D(100, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(50, activation='relu'),
    tf.keras.layers.Dense(2, activation='softmax')
])
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 148, 148, 100)     2800      
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 74, 74, 100)       0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 72, 72, 100)       90100     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 36, 36, 100)       0         
_________________________________________________________________
flatten (Flatten)            (None, 129600)            0         
_________________________________________________________________
dropout (Dropout)            (None, 129600)            0         
_________________________________________________________________
dense (Dense)                (None, 50)                6

## Compile Network

Next, we specify how the model updates its weights through backpropagation after each batch's forward pass. We use the Adam optimizer with binary cross-entropy as the loss function, since we are dealing with a binary classification problem. Accuracy is the metric used to monitor the training progress.
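One detail worth checking is how binary cross-entropy behaves with a two-unit softmax output and one-hot labels (the default `class_mode` of `flow_from_directory` is `'categorical'`). The plain-Python sketch below suggests that, in this two-class case, averaging the binary cross-entropy over both outputs gives the same value as categorical cross-entropy. It ignores the small epsilon clipping that Keras applies, and the function names are hypothetical.

```python
import math

def categorical_crossentropy(y_true, y_pred):
    """Cross-entropy between a one-hot label and softmax probabilities."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred))

def binary_crossentropy(y_true, y_pred):
    """Binary cross-entropy per output unit, averaged over the units."""
    terms = [-(t * math.log(p) + (1 - t) * math.log(1 - p))
             for t, p in zip(y_true, y_pred)]
    return sum(terms) / len(terms)

y_true = [0.0, 1.0]  # one-hot label for class index 1
y_pred = [0.2, 0.8]  # softmax output, sums to 1

bce = binary_crossentropy(y_true, y_pred)
cce = categorical_crossentropy(y_true, y_pred)
```

Both values equal `-log(0.8)` here, because with two softmax outputs `1 - p` for one unit is exactly the probability of the other unit.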

In [4]:
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])

## Training

The full training process is available on [Google Colab](https://drive.google.com/file/d/1gw2CcVndcTTBRJXnu7NPrrLlEQXj2Gif/view?usp=sharing).

In [None]:
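# Save a checkpoint whenever the validation loss improves
checkpoint = ModelCheckpoint('model-{epoch:03d}.model', monitor='val_loss', verbose=0, save_best_only=True, mode='auto')
# model.fit accepts generators directly; fit_generator is deprecated since TensorFlow 2.1
history = model.fit(train_generator,
                    epochs=2,
                    validation_data=validation_generator,
                    callbacks=[checkpoint])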

We load the trained model instead:

In [5]:
model_saved = tf.keras.models.load_model('mask.h5')
loss, acc = model_saved.evaluate(validation_generator, steps=3, verbose=0)
acc

0.8500000238418579

The loaded model reaches 85% accuracy on the three evaluated batches (60 images from `dataset/test`) after only two epochs of training, which is quite good for such a short run.

# Turn On Your Camera

Run the code below and test the face mask detector on your own. Enjoy! :)

In [6]:
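# Class index -> label and box color (BGR). Check that this mapping matches
# the class_indices of the generator used to train the loaded model.
labels_dict={0:'without_mask',1:'with_mask'}
color_dict={0:(0,0,255),1:(0,255,0)}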

size = 4
webcam = cv2.VideoCapture(0) # Use camera 0

# We load the xml file, using HaarCascade for face detection
classifier = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')

while True:
    (rval, im) = webcam.read()
    if not rval:  # Stop if no frame could be read from the camera
        break

    # Flip horizontally to act as a mirror
    im = cv2.flip(im, 1)

    # Resize the image to speed up detection
    mini = cv2.resize(im, (im.shape[1] // size, im.shape[0] // size))

    # Detect faces on the downsized frame
    faces = classifier.detectMultiScale(mini)

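    # Classify each detected face and draw its bounding box
    for f in faces:
        # Scale the bounding box back up to the full-size frame
        (x, y, w, h) = [v * size for v in f]
        
        # Crop the face region and preprocess it the same way as the training images
        face_img = im[y:y+h, x:x+w]
        # OpenCV frames are BGR, while Keras loads the training images as RGB
        face_rgb = cv2.cvtColor(face_img, cv2.COLOR_BGR2RGB)
        resized = cv2.resize(face_rgb, (150, 150))
        normalized = resized / 255.0
        reshaped = np.reshape(normalized, (1, 150, 150, 3))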
        
        # Classify using trained CNN
        result=model_saved.predict(reshaped)
        label=np.argmax(result,axis=1)[0]
        
        # Draw the rectangle
        cv2.rectangle(im,(x,y),(x+w,y+h),color_dict[label],2)
        cv2.rectangle(im,(x,y-40),(x+w,y),color_dict[label],-1)
        cv2.putText(im, labels_dict[label], (x, y-10),cv2.FONT_HERSHEY_SIMPLEX,0.8,(255,255,255),2)
        
    # Show the image
    cv2.imshow('LIVE', im)
    key = cv2.waitKey(10)
    
    # If the Esc key is pressed, break out of the loop
    if key == 27: # Esc key
        break
        
# Stop video
webcam.release()

# Close all started windows
cv2.destroyAllWindows()
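The per-face steps inside the loop can also be read as three small functions: scale the detected box back to the full frame, turn the cropped face into a model input, and decode the prediction. The following is a NumPy-only sketch with hypothetical helper names. It assumes the face crop has already been resized to 150 by 150 pixels, and it reorders the channels because OpenCV delivers frames in BGR order while Keras image loaders produce RGB.

```python
import numpy as np

def scale_box(box, size):
    """Scale an (x, y, w, h) box found on the downsized frame back up."""
    x, y, w, h = (int(v) * size for v in box)
    return x, y, w, h

def to_model_input(face_bgr):
    """Convert a (150, 150, 3) uint8 BGR crop to a (1, 150, 150, 3) float batch."""
    face_rgb = face_bgr[..., ::-1]                 # BGR -> RGB
    scaled = face_rgb.astype("float32") / 255.0    # same rescaling as in training
    return scaled[np.newaxis, ...]                 # add the batch dimension

def decode(probs, labels):
    """Return the predicted label and its probability for a batch of one."""
    idx = int(np.argmax(probs, axis=1)[0])
    return labels[idx], float(probs[0, idx])

# Example with a dummy crop and a made-up prediction
dummy_face = np.zeros((150, 150, 3), dtype=np.uint8)
batch = to_model_input(dummy_face)
label, prob = decode(np.array([[0.1, 0.9]]), {0: "class_a", 1: "class_b"})
```

Keeping these steps separate makes it easier to test the preprocessing without a camera attached.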

No pain no gain

Good bye and see you again :)