## Introduction
OpenCV (Open Source Computer Vision Library) is an open source computer vision and machine learning software library. It was built to provide a common infrastructure for computer vision applications and to accelerate the use of machine perception in commercial products.

The library has more than 2500 optimized algorithms, including a comprehensive set of both classic and state-of-the-art computer vision and machine learning algorithms. These algorithms can be used to detect and recognize faces, identify objects, classify human actions in videos, track camera movements, track moving objects, extract 3D models of objects, produce 3D point clouds from stereo cameras, stitch images together to produce a high resolution image of an entire scene, find similar images in an image database, remove red eyes from images taken with flash, follow eye movements, recognize scenery and establish markers to overlay it with augmented reality, and more.

It has C++, Python, Java and MATLAB interfaces and supports Windows, Linux, Android and macOS. This tutorial focuses on the Python interface.

## Tutorial Content
This tutorial aims to get you familiar with OpenCV. We will show how to use OpenCV to perform basic operations on images and videos. Later, we will use OpenCV's existing classifiers to do face detection.

Finally, we will analyze a video of a Trump speech to see how OpenCV applies to real-life video data. Specifically, we will combine what we have learned to detect and mark faces in the video and present the result.

We will cover the following topics in this tutorial:
- [Installing the library](#Installing-the-library)
- [Loading, Displaying and Saving images](#Loading,-Displaying-and-Saving-images)
- [Loading video and Displaying frame-by-frame](#Loading-video-and-Displaying-frame-by-frame)
- [Drawing Functions in OpenCV](#Drawing-Functions-in-OpenCV)
- [Accessing Image](#Accessing-Image)
- [Arithmetic Operations](#Arithmetic-Operations)
- [Face Detection using Haar Cascades](#Face-Detection-using-Haar-Cascades)
- [Example Application: Mark Trump's face from his speech video](#Example-Application:-Mark-Trump's-face-from-his-speech-video)

## Installing the library
Before getting started, you need to install OpenCV. Using pip is usually the simplest way:

    $ pip install opencv-contrib-python

Once you have installed the library, you can check the installation by printing its version.

In [1]:
import cv2
print(cv2.__version__)

3.4.0


## Loading, Displaying and Saving images
Now that we've installed OpenCV, let's look at some basic functions. First, we use `cv2.imread()` to read an image.

`cv2.imread()` takes two arguments. The first is the relative or full path of the image, and the second specifies how the image is read. The value can be `1`, `0` or `-1`, corresponding to `cv2.IMREAD_COLOR`, `cv2.IMREAD_GRAYSCALE` and `cv2.IMREAD_UNCHANGED` respectively.

Suppose we want to read an image in the working directory named "cat.jpg" without any change.

In [2]:
img = cv2.imread('cat.jpg', -1)

After loading the image, we can use `cv2.imshow()` method to display the image in a new window.

In [3]:
cv2.imshow('example_window', img)
cv2.waitKey(0) # wait infinitely
cv2.destroyAllWindows()

<img src="cat.jpg">

And you can use `cv2.imwrite()` to save the image.

In [4]:
cv2.imwrite('cat2.jpg', img)

True

## Loading video and Displaying frame-by-frame
Often, the data we collect comes as video. In that case, we read the video and process it frame by frame, and OpenCV makes both steps straightforward. To read a video, create a `VideoCapture` object with `cv2.VideoCapture()`. Its argument can be either a device index or the path of a video file. To save a video, create a `VideoWriter` object with `cv2.VideoWriter()` and write frames to it one at a time. The constructor takes the output file path, a [FourCC](http://www.fourcc.org/codecs.php) code, the number of frames per second (fps) and the frame size as (width, height).

In [5]:
# create VideoCapture object & get frame width and height
cap = cv2.VideoCapture('trump.wmv')
# named constants for property ids 3 and 4
frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

# Define FourCC and create VideoWriter object
fourcc = cv2.VideoWriter_fourcc(*'MJPG')
out = cv2.VideoWriter('trump_out.avi', fourcc, 30.0, (frame_width, frame_height))

while(cap.isOpened()):
    # read the next frame 
    ret, frame = cap.read()
    if ret == False:
        break
        
    # you can make some changes to the frame here
    
    # write the frame
    # out.write(frame) # uncomment and re-run to save the frames
    
    # display the frame
    cv2.imshow('frame',frame)
    if cv2.waitKey(10) & 0xFF == ord('q'):# press Q to quit
        break

# release everything
cap.release()
out.release()
cv2.destroyAllWindows()

## Drawing Functions in OpenCV
Sometimes we need to draw shapes to mark the information we want to present, and OpenCV provides some useful drawing functions for this. For example, when we detect a human face in an image, we may draw a rectangle around it and save the result as a new image for visualization. The following examples show `cv2.line()`, `cv2.rectangle()` and `cv2.circle()`, with comments describing their parameters.

Note that all the drawing functions modify the source image in place. Therefore, if you need the original image later, remember to make a copy before calling a drawing function.

In [6]:
import numpy as np
# Create a black image
img = np.zeros((512,512,3), np.uint8)
img_line = img.copy()
img_rect = img.copy()
img_circle = img.copy()
img_text = img.copy()

# Display the original black image
cv2.imshow('black_image', img)
cv2.waitKey(500)
cv2.destroyAllWindows()

# Draw a diagonal white line with thickness of 5 px, Display it
# Parameters: image object, start point, end point, color, thickness 
img_line = cv2.line(img_line,(0,0),(512,512),(255,255,255),5)

cv2.imshow('white_line', img_line)
cv2.waitKey(1000)
cv2.destroyAllWindows()

# Draw a blue rectangle with thickness of 3px, Display it
# Parameters: image object, top-left point, bottom-right point, color, thickness
img_rect = cv2.rectangle(img_rect,(0,0),(128,128),(255,0,0),3)
cv2.imshow('blue_rect', img_rect)
cv2.waitKey(1000)
cv2.destroyAllWindows()

# Draw a circle
# Parameters: image object, center, radius, color, thickness(-1 means filling the closed shape inside)
img_circle = cv2.circle(img_circle,(100,100), 50, (0,0,255), -1)
cv2.imshow('red_circle', img_circle)
cv2.waitKey(1000)
cv2.destroyAllWindows()

## Accessing Image
An image is just an array of numbers. After loading an image into memory, we can directly access and modify its pixel values by row and column index. Note that NumPy indexing is `img[row, column]`, that is `img[y, x]`, whereas the drawing functions take points as `(x, y)`. A BGR image is three-dimensional, so `img[row, column]` returns a vector of three values (B, G, R). Adding a third index, `img[row, column, channel]`, returns the single B, G or R value. For a grayscale image, indexing returns one number, the intensity.

In [7]:
pixel = img[50,50]
print(pixel)
print(type(pixel))

# access only the green channel of one pixel
g_pixel = img_line[3,3,1]
print(g_pixel)

# modify a pixel value
img_line[3,3,1] = 255

[0 0 0]
<class 'numpy.ndarray'>
255


Since an image is stored as a three-dimensional NumPy array, it has a shape and other array properties.

In [8]:
# returns a tuple of number of rows, columns and channels (if image is color)
print(img.shape)

# Total number of array elements (rows * columns * channels)
print(img.size)

# Image datatype
print(img.dtype)

(512, 512, 3)
786432
uint8


Sometimes we want to manipulate only part of a picture. With NumPy indexing, we can easily take out a region of an image and then put it into another region of the image.

In [9]:
img_res = img_rect.copy()
#cv2.imwrite('before.png',img_res)
region = img_rect[0:129,0:129]

img_res[383:512,383:512] = region
#cv2.imwrite('after.png', img_res)
cv2.imshow('image', img_res)
cv2.waitKey(0)
cv2.destroyAllWindows()
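One detail worth knowing when cutting out regions: a NumPy slice is a view onto the same memory, not an independent copy. The sketch below, on a tiny synthetic array, shows that writing through a slice changes the source image, while a slice followed by `.copy()` does not.

```python
import numpy as np

img = np.zeros((4, 4, 3), np.uint8)

# A plain slice is a view: writing to it also changes img.
view = img[0:2, 0:2]
view[:] = 255

# A copied slice is independent: writing to it leaves img untouched.
snapshot = img[2:4, 2:4].copy()
snapshot[:] = 7

print(img[0, 0], img[3, 3])
print(np.shares_memory(view, img), np.shares_memory(snapshot, img))
```

Call `.copy()` on a region whenever it should survive later edits to the source image.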

## Arithmetic Operations
To combine two images into a single image, use `cv2.add()`. You could also use NumPy's `+` operator, but the two behave differently on `uint8` data: `cv2.add()` is a saturated operation, so a sum above 255 is clipped to 255, while `+` wraps around modulo 256. Saturation usually gives a better-looking result. Both images must have the same size and type. To add the images with different weights, use `cv2.addWeighted()`, which computes `dst = α * img1 + β * img2 + γ`.

In [10]:
img1 = cv2.imread('emoji.jpg')
img2 = cv2.imread('bg.jpg')

<img src="emoji.jpg" align="left" height="30%" width="30%"> <img src="bg.jpg" align="center" height="30%" width="30%">

In [11]:
add_img = cv2.add(img1, img2)
blend_img = cv2.addWeighted(img1,0.2,img2,0.8,0)

cv2.imshow('added image',add_img)
cv2.waitKey(0)
cv2.destroyAllWindows()

<img src="added_image.png" width="30%" height="30%">

In [12]:
cv2.imshow('blended image',blend_img)
cv2.waitKey(0)
cv2.destroyAllWindows()

<img src="blended_image.jpg" width="30%" height="30%">

## Face Detection using Haar Cascades
So far, we have learned how to load data, access data and perform some basic operations on it. Now, let's do something more interesting.

Suppose we have an image of Trump and we want to detect the position of his face. To achieve this, we can use machine learning: first train a model, then feed the image to the model and read off the result. For object detection, `Haar feature-based cascade` classifiers are an effective choice. The principle of the method is described in Paul Viola and Michael Jones' 2001 paper, `"Rapid Object Detection using a Boosted Cascade of Simple Features"`, and in a video [here](https://www.youtube.com/watch?v=WfdYYNamHZ8). Fortunately, OpenCV comes with some pre-trained Haar cascade detectors for face detection, which are stored in the `opencv/data/haarcascades/` folder. To use them, we first load the detector file (the code below expects `haarcascade_frontalface_default.xml` in the working directory) and then load our image in `grayscale mode`.

In [13]:
img_face = cv2.imread('face.png',0)
cv2.imshow('face image',img_face)
cv2.waitKey(0)
cv2.destroyAllWindows()
face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')

<img src="face.png" width="80%">

Then we use the `detectMultiScale()` method to detect faces in the image. It returns a collection of rectangles, one per detected face, each in the form (x, y, w, h), where (x, y) is the top-left corner and w and h are the width and height. The two positional arguments used below are `scaleFactor` (1.3) and `minNeighbors` (5). A detailed tutorial on how to select these parameters is available [here](http://www.bogotobogo.com/python/OpenCV_Python/python_opencv3_Image_Object_Detection_Face_Detection_Haar_Cascade_Classifiers.php), though it may take some time to work through.

In [14]:
faces = face_cascade.detectMultiScale(img_face, 1.3, 5)
for face in faces:
    print(face)

[540 120 228 228]


Because the training data differs from our images in pose, background and size, the default models provided by OpenCV do not always perform as well as we expect. Sometimes we have to train a different model for our own task. Full details of the training procedure are [here](https://docs.opencv.org/2.4/doc/user_guide/ug_traincascade.html).

## Example Application: Mark Trump's face from his speech video
Now let's see how OpenCV can help with analysis of real-life data, which often comes as video. The following example shows how to mark Trump's face in his speech video and generate a new video with face detection boundaries. The steps are:
- Loading the video
- Breaking the video into frames
- Detecting faces in a frame
- Marking the result of detection by drawing functions
- Displaying and saving frames

A marked frame looks like this:
<img src="trump_marked.jpg" width="80%">

In [15]:
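def mark(frame, face_cas):
    # detect faces on a grayscale copy, then draw on the frame in place
    gray_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) # bgr to gray mode
    # use the classifier passed in as a parameter, not the global one
    faces = face_cas.detectMultiScale(gray_frame, 1.3, 5)
    for (x,y,w,h) in faces:
        cv2.rectangle(frame,(x,y),(x+w,y+h),(255,0,0),5)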
    
def process_video(r_path, w_path, face_cas):
    cap = cv2.VideoCapture(r_path)
    frame_width = int(cap.get(3))
    frame_height = int(cap.get(4))
    
    fourcc = cv2.VideoWriter_fourcc(*'MJPG')
    out = cv2.VideoWriter(w_path, fourcc, 30.0, (frame_width, frame_height))
    cnt = 0
    while(cap.isOpened()):
        ret, frame = cap.read()
        if ret == False:
            break

        # detect faces in the frame and mark
        mark(frame, face_cas)
        if cnt == 0:
            cnt = 1
            cv2.imwrite('trump_marked.jpg',frame)
        # write the frame
        # out.write(frame)

        # display the frame
        cv2.imshow('frame',frame)
        if cv2.waitKey(5) & 0xFF == ord('q'):# press Q to quit
            break
            
    cap.release()
    out.release()
    cv2.destroyAllWindows()
    return

# run the pipeline on the speech video
in_path = 'trump.wmv'
out_path = 'out.avi'
face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
process_video(in_path, out_path, face_cascade)

## Summary and references
This tutorial gave a brief introduction to working with images and videos in OpenCV and to using OpenCV's interfaces for face detection. You may want to explore OpenCV further, learn the principles behind face detection, or find out how to use OpenCV to track objects. More detail is available from the following links.

1. OpenCV: https://opencv.org/
2. Face detection and tracking method: https://www.youtube.com/watch?v=WfdYYNamHZ8
3. Object Tracking using OpenCV: https://www.learnopencv.com/object-tracking-using-opencv-cpp-python/ 