Davide Cremonini 14412 - Davide Sbetti 14032

----


# Real Time Application

In this notebook we apply the previously trained model in real time to the frames delivered by a webcam. 

**NOTE**: this notebook was not executed on Google Colab, since the model was saved and downloaded locally. All necessary files are located in the repository, so the notebook can be executed from there. 

We start by importing the necessary libraries. 

In [None]:
import tensorflow as tf
from tensorflow import keras
import cv2
import csv
import numpy as np
import time
from pygame import mixer

pygame 2.0.1 (SDL 2.0.14, Python 3.7.8)
Hello from the pygame community. https://www.pygame.org/contribute.html


We then initialise the mixer object and load the alarm sound we have chosen to play when drowsiness is detected. 

In [None]:
mixer.init()
mixer.music.load("kikeriki.mp3")

We then load the model we saved previously. 

In [None]:
model = keras.models.load_model('model.h5')

We load the files for the OpenCV cascade classifiers, which we downloaded from the official OpenCV repository (https://github.com/opencv/opencv/tree/master/data/haarcascades). 

We decided to use the default frontal-face classifier to detect faces, and two separate classifiers to detect the two eyes. 

In [None]:
xml_path = "haarcascade_xml/"

In [None]:
face_cascade = cv2.CascadeClassifier(xml_path + "haarcascade_frontalface_default.xml")
right_eye_cascade = cv2.CascadeClassifier(xml_path + "haarcascade_righteye_2splits.xml")
left_eye_cascade = cv2.CascadeClassifier(xml_path + "haarcascade_lefteye_2splits.xml")

We set the image dimension to 80x80, since this is the input size of the model, before starting the capture. 

In [None]:
dim = (80,80)

Each time we obtain a new frame from the webcam, we convert it to grayscale while keeping 3 channels, as was done when training the model. 

We then try to detect a face in the frame. If a face is detected, we split the resulting area vertically into two halves and apply the corresponding eye classifier to each half, in order to detect the eyes. 
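
The split of the face area can be sketched in isolation as follows. This is a minimal illustration, not part of the notebook itself: the helper name `split_face_box` and the tuple layout it returns are assumptions made for this example. It reproduces the rounding used in the loop, where the half width is the ceiling of `w/2`. Note that, in the notebook's code, the right-eye cascade is applied to the left half of the image and the left-eye cascade to the right half.

```python
import math


def split_face_box(x, y, w, h):
    """Split a face bounding box (x, y, w, h) into two vertical halves.

    Hypothetical helper: returns two (x_start, x_end, y_start, y_end)
    tuples, one for the left half of the box and one for the right half.
    """
    half = math.ceil(w / 2)  # same rounding as int(np.ceil(w/2)) in the loop
    left_half = (x, x + half, y, y + h)
    right_half = (x + half, x + w, y, y + h)
    return left_half, right_half


left_half, right_half = split_face_box(100, 50, 121, 140)
print(left_half)   # (100, 161, 50, 190)
print(right_half)  # (161, 221, 50, 190)
```

With an odd width, the left half receives the extra pixel column, so the two halves never overlap and together cover the whole face box.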

If we detect both eyes, we crop the corresponding regions and resize them to 80x80. We rescale the pixel values of the two images to the range [0, 1] (applying the same transformations as in the training phase), mirror the left-eye image horizontally, and then pass both images to the model. 

If both eyes are closed, we increment a counter that stores the number of consecutive frames with closed eyes. Once the counter reaches a certain threshold, we play the alarm. 

We reset the counter once more than two frames with both eyes open have been counted (a little under one second at roughly 3.4 frames per second). 
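
The counting logic can be sketched on its own, without the webcam or the model. The class below is a hypothetical illustration: the name `ClosedEyesCounter` and its parameters are not part of the notebook. It mirrors the thresholds used in the loop (`closed_count >= 4` for the alarm, `closed_tol_count > 2` for the reset) and assumes that the tolerance counter is meant to restart at zero on every frame in which both eyes are closed.

```python
class ClosedEyesCounter:
    """Hypothetical stand-alone version of the counters used in the loop."""

    def __init__(self, alarm_threshold=4, open_tolerance=2):
        self.alarm_threshold = alarm_threshold  # closed frames before the alarm
        self.open_tolerance = open_tolerance    # open frames tolerated before a reset
        self.closed_count = 0
        self.open_count = 0

    def update(self, right_open, left_open):
        """Process one frame; return True when the alarm should sound."""
        if not right_open and not left_open:
            # Both eyes closed: count the frame and restart the tolerance
            self.closed_count += 1
            self.open_count = 0
        elif right_open and left_open:
            # Both eyes open: reset only after enough open frames
            self.open_count += 1
            if self.open_count > self.open_tolerance:
                self.closed_count = 0
        # Exactly one eye closed: both counters stay unchanged
        return self.closed_count >= self.alarm_threshold


counter = ClosedEyesCounter()
alarms = [counter.update(False, False) for _ in range(4)]
print(alarms)  # [False, False, False, True]
```

Keeping the counters in a small object like this makes the thresholds easy to adjust and to test separately from the video loop.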

In [None]:
cap = cv2.VideoCapture(0)

closed_count = 0
frame_count = 0
closed_tol_count = 0

start = time.time()
while(True):
    # Capture frame-by-frame
    ret, frame = cap.read()
    if not ret:
        # Stop if the webcam did not deliver a frame
        stop = time.time()
        break
    frame_count += 1
    
    # Our operations on the frame come here
    gray_single = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    
    # Replicate the gray channel three times, as done in training
    gray = cv2.cvtColor(gray_single, cv2.COLOR_GRAY2BGR)
    
    faces = face_cascade.detectMultiScale(gray, 1.3, 5)

    if len(faces) > 0:
        (x,y,w,h) = faces[0]

        #img = cv2.rectangle(img,(x,y),(x+w,y+h),(255,0,0),2)
        half_int = int(np.ceil(w/2))

        roi_gray_left = gray[y:y+h, x:x+half_int]
        roi_gray_right = gray[y:y+h, x+half_int:x+w]

        #frame = cv2.rectangle(frame,(x,y),(x+half_int,y+h),(255,0,0),2) # left part
        #frame = cv2.rectangle(frame,(x+half_int,y),(x+w,y+h),(0,255,0),2) # right part

        roi_color_left = frame[y:y+h, x:x+half_int]
        roi_color_right = frame[y:y+h, x+half_int:x+w]

        right_eyes = right_eye_cascade.detectMultiScale(roi_gray_left)
        left_eyes = left_eye_cascade.detectMultiScale(roi_gray_right)

        # check we have detected something on both sides
        if len(right_eyes) > 0 and len(left_eyes) > 0:
            (rx,ry,rw,rh) = right_eyes[0]
            (lx,ly,lw,lh) = left_eyes[0]


            cv2.rectangle(roi_color_left,(rx,ry),(rx+rw,ry+rh),(0,255,0),2)
            cv2.rectangle(roi_color_right,(lx,ly),(lx+lw,ly+lh),(0,255,0),2)

            # crop both eye regions from the gray halves of the face
            right_eye = roi_gray_left[ry:ry+rh, rx:rx+rw]
            left_eye = roi_gray_right[ly:ly+lh, lx:lx+lw]
            
            #cv2.imshow('frame',left_eye)
            
            right_eye_resized = cv2.resize(right_eye, dim, interpolation = cv2.INTER_AREA)
            left_eye_resized = cv2.resize(left_eye, dim, interpolation = cv2.INTER_AREA)
            
            right_eye_resized = right_eye_resized/255.0
            left_eye_resized = left_eye_resized/255.0
            
            left_mirror = cv2.flip(left_eye_resized, 1)
            #cv2.imshow('frame',left_mirror)
            
            right_final = tf.data.Dataset.from_tensor_slices([right_eye_resized])
            left_final = tf.data.Dataset.from_tensor_slices([left_mirror])
            
            
            right_open = model.predict(right_final.batch(32))
            left_open = model.predict(left_final.batch(32))            
            
            if right_open[0][0] < 0.5 and left_open[0][0] < 0.5:
                closed_count += 1
                cv2.putText(frame,'CLOSED', 
                    (500,460), 
                    cv2.FONT_HERSHEY_SIMPLEX, 
                    1,
                    (0,0,255),
                    2)
                closed_tol_count = 0
            elif right_open[0][0] < 0.5 or left_open[0][0] < 0.5:
                cv2.putText(frame,'CLOSED', 
                    (500,460), 
                    cv2.FONT_HERSHEY_SIMPLEX, 
                    1,
                    (0,0,255),
                    2)
                #print("Closed eyes found")
            else:
                closed_tol_count += 1
                if closed_tol_count > 2:
                    closed_count = 0
                #print("Open eyes found")
            
            if closed_count >= 4 and not mixer.music.get_busy():
                mixer.music.play()
            
    # Display the resulting frame
    cv2.imshow('frame',frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        stop = time.time()
        break
        
cap.release()
cv2.waitKey(0)
cv2.destroyAllWindows()

Right: [[0.99675643]]  Left: [[0.93149817]]
Right: [[0.99999726]]  Left: [[0.99997807]]
Right: [[0.9999447]]  Left: [[0.9999995]]
Right: [[1.]]  Left: [[0.99935263]]
Right: [[0.9998214]]  Left: [[1.]]
Right: [[0.99998873]]  Left: [[1.]]
Right: [[0.9995645]]  Left: [[1.]]
Right: [[0.9999842]]  Left: [[0.99999666]]
Right: [[0.99999976]]  Left: [[0.9999932]]
Right: [[0.99992776]]  Left: [[0.9999964]]
Right: [[0.9999335]]  Left: [[0.99999654]]
Right: [[0.99988234]]  Left: [[0.99998164]]
Right: [[0.99959695]]  Left: [[0.9793924]]
Right: [[0.9924703]]  Left: [[0.99545854]]
Right: [[0.99981594]]  Left: [[0.9908395]]
Right: [[0.9999395]]  Left: [[0.93750596]]
Right: [[0.77440953]]  Left: [[0.88221455]]
Right: [[0.80850667]]  Left: [[0.72917914]]
Right: [[0.6175787]]  Left: [[0.3947398]]
Right: [[0.98582107]]  Left: [[0.22462872]]
Right: [[0.94399774]]  Left: [[0.15245667]]
Right: [[0.9998834]]  Left: [[0.9993943]]
Right: [[0.9951675]]  Left: [[0.99918395]]
Right: [[0.9987079]]  Left: [[0.99972

We can see that, with the camera we used for our demo, we process on average about 3.4 frames per second. Knowing this value allowed us to better tune the alarm threshold, which can be customised. 

In [None]:
frame_count/(stop-start)

3.4194400062487587
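
To relate the frame-based threshold to wall-clock time, the measured frame rate can be used for a simple conversion. The sketch below is only illustrative: the helper names `frames_to_seconds` and `seconds_to_frames` are assumptions, and the frame rate is the value measured above, which will differ with other cameras and hardware.

```python
import math


def frames_to_seconds(n_frames, fps):
    """Approximate time covered by n_frames at the given frame rate."""
    return n_frames / fps


def seconds_to_frames(seconds, fps):
    """Smallest number of frames that spans at least the given time."""
    return math.ceil(seconds * fps)


fps = 3.4194400062487587  # frame rate measured in this notebook

# The alarm threshold of 4 closed-eyes frames corresponds to about 1.17 s
print(round(frames_to_seconds(4, fps), 2))  # 1.17

# A threshold of at least 2 seconds would require 7 frames
print(seconds_to_frames(2.0, fps))  # 7
```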