**Most of what we need here is the same as what we used for images, but I will still go through everything again.**

The parts that differ start after we write the function that detects faces and draws a rectangle around them.

**Importing the necessary libraries**

In [5]:
# open cv for image processing and importing 
import cv2

# for linear algebra
import numpy as np

# for visualizing the images 
import matplotlib.pyplot as plt
%matplotlib inline

**We can now load the main DNN model, which comes as two files:**
* The .prototxt file(s) which define the model architecture (i.e., the layers themselves)
* The .caffemodel file which contains the weights for the actual layers

In [2]:
model_file = 'model/res10_300x300_ssd_iter_140000.caffemodel'
config_file = 'model/deploy.prototxt.txt'

# for the actual model
net = cv2.dnn.readNetFromCaffe(config_file, model_file)
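
Loading fails if either path is wrong, so it can help to check the two files before building the network. The following is a minimal sketch, not part of the original notebook; `missing_files` is a hypothetical helper name, and the paths are simply the ones used in the cell above.

```python
from pathlib import Path

def missing_files(paths):
    """Return the entries of `paths` that do not exist as regular files."""
    return [p for p in paths if not Path(p).is_file()]

# Paths copied from the cell above; adjust them to your own folder layout.
model_file = 'model/res10_300x300_ssd_iter_140000.caffemodel'
config_file = 'model/deploy.prototxt.txt'

missing = missing_files([config_file, model_file])
if missing:
    print("Missing model files:", ", ".join(missing))
else:
    print("Both model files found.")
```

Running this before `cv2.dnn.readNetFromCaffe` gives a readable message about which file is absent.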

**Now that we have the model, we just need a function that draws a rectangle wherever it detects a face.**

The steps we will follow are:
1. Take the image and make a copy of it.
2. Make a blob out of the image using the cv2 function **blobFromImage**. This pre-processes the image so that it can be sent into the network: it sets the dimensions of the blob, normalizes it, and applies a few other operations controlled by the arguments explained below:
    * image : the actual image itself. In the code below we first resize it to `(300,300)` with `cv2.resize` so that it is not too large.
    
    * scalefactor : After we perform mean subtraction we can optionally scale our images by some factor. This value defaults to `1.0` (i.e., no scaling) but we can supply another value as well. 
    
    * size : Here we supply the spatial size that the Convolutional Neural Network expects. For many well-known networks this is `224×224`, `227×227`, or `299×299`; the model used here expects `300×300`.
    
    * mean : These are our mean subtraction values. They can be a 3-tuple of per-channel means. With `swapRB` left off, they are applied in the image's own channel order (BGR for images loaded by OpenCV).
    
    * swapRB : OpenCV assumes images are in BGR channel order; however, the `mean` value may have been given in RGB order. To resolve this discrepancy we can swap the R and B channels of the image by setting this value to `True`. In current OpenCV releases it defaults to `False`, and the code below leaves it at the default.

3. Feed the blob to the network.
4. Now we get detections, a 4-dimensional array of shape `(1, 1, N, 7)`. Its axes are, in order:
    * an axis of size 1
    * another axis of size 1 (both stay at 1 for this model; the image a detection belongs to is stored in the detection itself)
    * the number of detections `N` (the candidate faces it found)
    * the 7 values that describe a single detection:
        * the index of the image in the batch
        * class id
        * confidence (how confident it is that it is detecting a face)
        * the corners of the bounding rectangle `(x1, y1, x2, y2)`, normalized to the range 0 to 1
        
5. Now we loop through all the detections and, for each one:
    * if the confidence is above a threshold, then:
        * scale the normalized bounding box coordinates back up to the frame's width and height.
        * draw the bounding box on top of the image.
        * write the confidence for that box as text above it.
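
To make step 2 more concrete, here is a rough NumPy sketch of what the blob looks like for an image that is already `300×300`. It is only an approximation for illustration: `to_blob` is a hypothetical helper, and it skips the resizing, `swapRB`, and cropping that `cv2.dnn.blobFromImage` can also perform.

```python
import numpy as np

def to_blob(image, scalefactor=1.0, mean=(104.0, 177.0, 123.0)):
    """Approximate blobFromImage for an already-resized H x W x 3 image.

    Returns a float32 array of shape (1, 3, H, W).
    """
    data = image.astype(np.float32)
    data -= np.array(mean, dtype=np.float32)  # per-channel mean subtraction
    data *= scalefactor                       # optional scaling
    data = data.transpose(2, 0, 1)            # H x W x C -> C x H x W
    return data[np.newaxis, ...]              # add the batch axis in front

# A dummy grey image stands in for a real frame.
dummy = np.full((300, 300, 3), 128, dtype=np.uint8)
blob = to_blob(dummy)
print(blob.shape)  # (1, 3, 300, 300)
```

The channel-first layout with a leading batch axis is the reason the network input is 4-dimensional even for a single image.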

In [8]:
def detect_face(img):
    
    face = img.copy()
    
    # the original dimensions of the frame
    (h, w) = face.shape[:2]
    
    # make the blob
    blob = cv2.dnn.blobFromImage(cv2.resize(face, (300, 300)), 1.0,
    (300, 300), (104.0, 177.0, 123.0))
    
    # feed it to the network
    net.setInput(blob)
    detections = net.forward()
    
    # loop over the detections
    for i in range(0, detections.shape[2]):
        
        # extract the confidence of the rectangle
        confidence = detections[0, 0, i, 2]
        
        # set the threshold for the minimum confidence (you can mess around with this to get better results)
        if confidence > 0.3:
            
            # get the bounding box corners and scale them to the frame size
            box = detections[0,0,i,3:7] * np.array([w, h, w, h])
            (x1, y1, x2, y2) = box.astype('int')
            
            # draw the bounding box around the face (top-left and bottom-right corners)
            cv2.rectangle(face, (x1, y1), (x2, y2), (255,0,0), 2 )
            
            # get the position above the box
            text_y = y1 - 10 if y1 - 10 > 10 else y1 + 10
            
            # make the confidence into a percentage value
            text = "{:.2f}%".format(confidence * 100)
            
            # put the confidence score above the rectangle
            cv2.putText(face, text, (x1, text_y),cv2.FONT_HERSHEY_SIMPLEX, 0.45, (255, 255, 255), 2)
            
    return face
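
The indexing in the loop is easier to follow on a small hand-made array. The sketch below assumes the `(1, 1, N, 7)` layout described in the steps above and uses made-up numbers instead of real network output; `extract_boxes` is a hypothetical helper that mirrors the loop without drawing anything.

```python
import numpy as np

def extract_boxes(detections, frame_w, frame_h, threshold=0.3):
    """Return [(confidence, x1, y1, x2, y2), ...] in pixel coordinates."""
    results = []
    for i in range(detections.shape[2]):
        confidence = float(detections[0, 0, i, 2])
        if confidence <= threshold:
            continue  # skip weak detections
        # scale the normalized corners to the frame size
        scale = np.array([frame_w, frame_h, frame_w, frame_h])
        x1, y1, x2, y2 = (detections[0, 0, i, 3:7] * scale).astype(int)
        results.append((confidence, int(x1), int(y1), int(x2), int(y2)))
    return results

# Synthetic output: one confident detection and one weak one.
fake = np.zeros((1, 1, 2, 7), dtype=np.float32)
fake[0, 0, 0] = [0, 1, 0.9, 0.25, 0.5, 0.75, 1.0]
fake[0, 0, 1] = [0, 1, 0.1, 0.0, 0.0, 0.1, 0.1]

boxes = extract_boxes(fake, frame_w=640, frame_h=480)
print(boxes)
```

Only the first row passes the `0.3` threshold, and its corners come out as pixel positions in a `640×480` frame.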

**From here on, things differ from the image version. We are now working with video, which is nothing but a sequence of images shown one after the other, at a rate set by the number of frames (images) per second.**

We can work with two types of video: live video from a webcam, or a video file that has already been recorded.

So the steps we need to do are:
1. Make a video capture object using OpenCV. This takes the video as input. We can give it either a **path to a video file** or **0 to use the default webcam of the computer**.

2. Make a while loop that keeps doing the following until we stop it:
    * get the frames from the video one by one.
    
    * pass that frame to the face detection function.
    
    * show the frame using OpenCV.
    
    * use a waitKey check to break out of the while loop.

3. Release the capture object after breaking out of the loop.

4. Destroy all the windows after breaking out of the loop.
    

In [9]:
# the capture object
cap = cv2.VideoCapture(0)

# the infinite while loop
while True:
    
    # read frame by frame
    ret, frame = cap.read()
    
    # stop if no frame could be read (camera unavailable or end of the video)
    if not ret:
        break
    
    # pass the frame to the function
    frame = detect_face(frame)
    
    # display the resultant frame
    cv2.imshow("face", frame)
    
    
    # wait 1 millisecond for a key press and break out of the loop if "q" is pressed.
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# release the cap object
cap.release()
# destroy all the windows
cv2.destroyAllWindows()
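
`cap.read()` returns a flag together with the frame, and the flag becomes `False` when no frame is available, for example at the end of a video file. The sketch below isolates that reading pattern so it can be tried without a camera. `read_frames` and `FakeCapture` are hypothetical names used only for illustration; `FakeCapture` imitates just the `read()` method of `cv2.VideoCapture`.

```python
def read_frames(cap):
    """Yield frames from `cap` until read() reports that none are left."""
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        yield frame

class FakeCapture:
    """Stand-in for cv2.VideoCapture that serves a fixed list of frames."""

    def __init__(self, frames):
        self._frames = list(frames)

    def read(self):
        # mimic the (ret, frame) pair returned by cv2.VideoCapture.read()
        if self._frames:
            return True, self._frames.pop(0)
        return False, None

frames = list(read_frames(FakeCapture(["frame-1", "frame-2", "frame-3"])))
print(len(frames))  # 3
```

With a real capture object the same generator could be used as `for frame in read_frames(cap): ...`, keeping the key-press check inside the loop body.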

We can also save the processed video to a file, frame by frame, by using a writer object. That adds around 6 lines of code to the loop above.

The steps are the same as above, with these additions:
1. Get the frame width and height of the camera.
2. Make a writer object with appropriate arguments.
3. Write the frame we are reading after processing it.
4. Release the writer object after breaking out of the while loop.


In [7]:
# the capture object
cap = cv2.VideoCapture(0)

# get the frame width and height
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))   
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

# make the writer object ('mp4v' is a codec commonly paired with .mp4 files; availability depends on your OpenCV build)
writer = cv2.VideoWriter('face_detection_video.mp4', cv2.VideoWriter_fourcc(*'mp4v'), 20, (width,height))

# the infinite while loop
while True:
    
    # read frame by frame
    ret, frame = cap.read()
    
    # stop if no frame could be read (camera unavailable or end of the video)
    if not ret:
        break
    
    # pass the frame to the function
    frame = detect_face(frame)
    
    # write the frame
    writer.write(frame)
    
    # display the resultant frame
    cv2.imshow("face", frame)
    
    
    # wait 1 millisecond for a key press and break out of the loop if "q" is pressed.
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# release the cap object
cap.release()
# release the writer object
writer.release()
# destroy the windows
cv2.destroyAllWindows()
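
The four-character codec passed to the writer should suit the container implied by the file extension, otherwise the output file may be empty or unplayable. The following is a small sketch of one way to choose it. The lookup table is an assumption based on pairs that commonly work, not an official list, and `pick_fourcc` is a hypothetical helper; which codecs are actually available depends on the OpenCV build and the system.

```python
from pathlib import Path

# Assumed pairs that commonly work; not an exhaustive or official mapping.
FOURCC_BY_EXTENSION = {
    '.mp4': 'mp4v',
    '.avi': 'XVID',
}

def pick_fourcc(filename, default='XVID'):
    """Return a four-character codec code suited to the file extension."""
    return FOURCC_BY_EXTENSION.get(Path(filename).suffix.lower(), default)

code = pick_fourcc('face_detection_video.mp4')
print(code)  # mp4v

# With OpenCV available, the code is unpacked into the writer call:
# writer = cv2.VideoWriter(name, cv2.VideoWriter_fourcc(*code), 20, (width, height))
```

If the saved file will not play, trying another extension and codec pair is usually the first thing to check.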

# Thank you.