## Basics of Computer Vision: an interactive tutorial using OpenCV


Computer vision is a field of artificial intelligence concerned with making computers capable of ‘understanding’ visual content. On the surface, the way we see the world and the way computers perceive it are very similar. Our eyes act as visual inputs to our brain: they receive light and translate it into electrical pulses, which the brain then processes to produce an image.

Computers do not ‘see’ images, just as our eyes do not ‘see’ the world. Similar to how our brain operates, computers receive electrical pulses that are converted into bits of information. The major difference is that the human brain is much more capable than computers, especially in terms of analytical reasoning. Computers are not able to naturally ‘understand’ things. That is one of the reasons the field of artificial intelligence was developed, and why computer vision is an important component of it.


In this tutorial, we will start with an overview of the composition of an image and the kind of data we can extract from it. Afterwards, we will move on to basic operations we can perform on images. We will then look into feature detection. Finally, we will end with a discussion of real-world applications of computer vision, what the future might entail, and available resources for further learning.


## Prerequisites

***Knowledge***
* Basic understanding of matrices
* Basic knowledge of statistics
* Familiarity with Python & programming


***Software***
* Anaconda distribution (includes Python, Jupyter Notebook, etc.)
* Python 3.6.3
* OpenCV-python 3.4.3.18
* Jupyter
* NumPy 1.15.4


## Properties of an Image

Images are represented by a matrix for each primary color. RGB (red, green, blue) is the usual convention for color representation, but OpenCV stores the channels in BGR order. For the common 8-bit images, the individual values in these matrices range from 0 to 255 and represent color intensity, where 0 means none of that color and 255 means maximum brightness. In BGR order, an isolated bright blue pixel is represented by (255, 0, 0), a green one by (0, 255, 0), a red one by (0, 0, 255), and all other colors by a combination of the three values, such as purple (240, 32, 160).

Since we have three color matrices, we can separate the picture into its individual color components. These individual matrices are called color channels. Separating them is useful when we want to work with a specific color.
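To see the BGR layout without loading a file, the sketch below builds a tiny hypothetical 1x3 image as a NumPy array (the same kind of array `cv2.imread` returns) and pulls out the channels by slicing the last axis. Slicing yields the same values as `cv2.split`, but as views into the original array rather than copies.

```python
import numpy as np

# a hypothetical 1x3 image stored in OpenCV's B, G, R order
img = np.array([[[255, 0, 0],        # blue
                 [0, 255, 0],        # green
                 [240, 32, 160]]],   # purple
               dtype=np.uint8)

# slicing the channel axis gives the same single-channel matrices as cv2.split
blue, green, red = img[:, :, 0], img[:, :, 1], img[:, :, 2]
print(red)            # [[  0   0 160]]

# reversing the channel axis turns BGR into RGB (for libraries that expect RGB)
img_rgb = img[:, :, ::-1]
print(img_rgb[0, 2])  # [160  32 240]
```

The BGR-to-RGB reversal matters whenever an OpenCV image is handed to a library that assumes RGB; otherwise reds and blues appear swapped.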

```python
import cv2
import numpy as np

# download this image at: http://www.vintagecardprices.com/pics/3287/189069.jpg
# and save it as 'mccartney_color_noisy.jpg' in the notebook's working directory
img = cv2.imread('mccartney_color_noisy.jpg')
if img is None:
    raise FileNotFoundError('mccartney_color_noisy.jpg could not be read')

# adjust if necessary for your own screen using fx & fy
img = cv2.resize(img, None, fx=0.8, fy=0.8)

# B G R values at the top left corner
print('Pixel color values at location [0,0]:', img[0,0])

# split the image into its color channels
img_b, img_g, img_r = cv2.split(img)
print('Blue channel pixel at location [0,0]:', img_b[0,0])
print('Green channel pixel at location [0,0]:', img_g[0,0])
print('Red channel pixel at location [0,0]:', img_r[0,0])

# compare all three single-channel images (shown as grayscale) and the original image
color_channels = np.hstack((img_b, img_g, img_r))
cv2.imshow('color_channels', color_channels)
cv2.imshow('original_image', img)

# press any key to close the opened images
cv2.waitKey(0)
cv2.destroyAllWindows()
```

```
Pixel color values at location [0,0]: [236 244 243]
Blue channel pixel at location [0,0]: 236
Green channel pixel at location [0,0]: 244
Red channel pixel at location [0,0]: 243
```


### Grayscale

It might come as a surprise, but images are often converted to grayscale in computer vision. The reasoning is that color information plays little role in analyzing many images. For example, if we are trying to find the edges around an object, we look at the difference in intensity from one pixel to the next, which does not depend on color composition. Converting from three color channels to a single gray channel also discards data that may be irrelevant, which reduces the amount of computation. This becomes essential in large projects, such as those involving neural networks.

To make the results of this tutorial easier to see, we will avoid converting to grayscale unless required.
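Grayscale conversion is a weighted sum of the three channels. The sketch below applies the luma weights that OpenCV's color-conversion documentation lists for RGB to gray (0.299 R + 0.587 G + 0.114 B) to an array in BGR order. The function name `bgr_to_gray` is my own; `cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)` uses fixed-point arithmetic internally, so its output may differ slightly from this floating-point sketch.

```python
import numpy as np

def bgr_to_gray(img_bgr):
    """Weighted sum of the B, G and R channels, rounded back to 8-bit."""
    b = img_bgr[..., 0].astype(np.float64)
    g = img_bgr[..., 1].astype(np.float64)
    r = img_bgr[..., 2].astype(np.float64)
    gray = 0.114 * b + 0.587 * g + 0.299 * r   # green contributes the most
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)

# pure blue, pure green and pure red pixels (BGR order)
pixels = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255]]], dtype=np.uint8)
print(bgr_to_gray(pixels))   # [[ 29 150  76]]
```

The output shows why color cannot be recovered from gray: three different colors collapse to single intensities, and different colors can map to the same gray level.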


### Histogram

A histogram counts how many pixels fall at each intensity level, which makes it a good way of summarizing an image. From it we can tell whether the image is well balanced in terms of contrast, get an idea of its saturation levels, and so on. One practical outcome of using histograms is that they can reveal an imbalance (in gray levels, for example) that can potentially be corrected. Histograms can also help us choose an appropriate threshold value when converting an image to black and white (binary).

A popular library used to create histograms is matplotlib. Since this tutorial will be focusing on OpenCV, and to avoid introducing another library/dependency, we will skip the analysis of histograms in this tutorial. Resources for further reading on histograms and image analysis are provided in the last section of this tutorial.
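Plotting is skipped, but the histogram itself needs nothing beyond NumPy. The sketch below counts pixels per gray level with `np.bincount`, then uses Otsu's method, a standard way of picking a threshold from a histogram, to choose the level that best separates dark pixels from bright ones. The function names are my own, and the code assumes an 8-bit grayscale image.

```python
import numpy as np

def gray_histogram(gray):
    """Count how many pixels sit at each of the 256 levels of a uint8 image."""
    return np.bincount(gray.ravel(), minlength=256)

def otsu_threshold(gray):
    """Return the level t that best separates pixels <= t from pixels > t.

    Otsu's method: maximize the between-class variance computed from the histogram.
    """
    prob = gray_histogram(gray).astype(np.float64)
    prob /= prob.sum()
    levels = np.arange(256)
    omega = np.cumsum(prob)            # fraction of pixels at or below each level
    mu = np.cumsum(prob * levels)      # cumulative mean intensity
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / denom
    sigma_b[denom == 0] = 0.0          # one class is empty: not a valid split
    return int(np.argmax(sigma_b))

# usage: a hypothetical image with two flat regions at gray levels 50 and 200
gray = np.array([[50, 50, 50, 50, 200, 200, 200, 200]], dtype=np.uint8)
hist = gray_histogram(gray)
t = otsu_threshold(gray)
binary = np.where(gray > t, 255, 0).astype(np.uint8)
print(hist[50], hist[200], t)   # 4 4 50
```

With a real photo you would pass `img_grayscale` from the cell below. OpenCV offers the same method through the `cv2.THRESH_OTSU` flag, e.g. `cv2.threshold(img_grayscale, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)`, which returns the chosen threshold as its first value.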

```python
import cv2
import numpy as np

# download this image here:
# https://us.123rf.com/450wm/annasea/annasea1705/annasea170500668/78910522-without-words-handwritten-text-on-blue-and-white-striped-background-vector-.jpg?ver=6
img_original = cv2.imread('text_color_bad_res.jpg')

# resize to fit your screen using fx and fy parameters
img_original = cv2.resize(img_original, None, fx=0.6, fy=0.6)

# grayscale conversion
img_grayscale = cv2.cvtColor(img_original, cv2.COLOR_BGR2GRAY)

# global threshold with 100 as a manually picked value:
# pixels brighter than 100 become 255 (white), all others become 0 (black)
ret_val, img_th = cv2.threshold(img_grayscale, 100, 255, cv2.THRESH_BINARY)

'''
Adaptive Gaussian thresholding
The last two parameters (21 and 29) are the blockSize and C (constant) parameters.
blockSize: size of the pixel neighbourhood used to calculate each threshold value (must be odd)
C: constant subtracted from the Gaussian-weighted mean of that neighbourhood
We will see why neighbouring pixels matter when we go over
filters/kernels in a later section.
'''
img_athg = cv2.adaptiveThreshold(img_grayscale, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY, 21, 29)

# put grayscale, threshold and adaptive threshold images next to each other
compare_imgs = np.hstack((img_grayscale, img_th, img_athg))

# show comparisons and original image
cv2.imshow('original_image', img_original)
cv2.imshow('gray_and_binary', compare_imgs)

# press any key to close the opened image frames
cv2.waitKey(0)
cv2.destroyAllWindows()
```
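To show what "a threshold per neighbourhood" means, here is a minimal NumPy sketch of adaptive thresholding. To keep it short it uses a plain neighbourhood mean (the idea behind `cv2.ADAPTIVE_THRESH_MEAN_C`) rather than the Gaussian-weighted mean used above, and it pads borders by repeating edge pixels, which is an assumption rather than a description of OpenCV's internals. The function name is hypothetical.

```python
import numpy as np

def adaptive_mean_threshold(gray, block_size, c, max_value=255):
    """Binary threshold where each pixel's threshold is its local mean minus c."""
    if block_size % 2 == 0 or block_size < 3:
        raise ValueError("block_size must be an odd number >= 3")
    r = block_size // 2
    padded = np.pad(gray.astype(np.float64), r, mode="edge")  # assumed border rule
    h, w = gray.shape
    local_sum = np.zeros((h, w), dtype=np.float64)
    # add up every shifted copy of the image that falls inside the block
    for dy in range(block_size):
        for dx in range(block_size):
            local_sum += padded[dy:dy + h, dx:dx + w]
    local_mean = local_sum / (block_size * block_size)
    return np.where(gray > local_mean - c, max_value, 0).astype(np.uint8)

# a bright 5x5 patch with one dark pixel in the middle
patch = np.full((5, 5), 200, dtype=np.uint8)
patch[2, 2] = 50
print(adaptive_mean_threshold(patch, 3, 2))   # only the centre pixel becomes 0
```

Because every pixel is compared with its own surroundings, a dark stroke can still be separated from a background whose brightness drifts across the image, which a single global value such as 100 cannot do.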

## Image Operations

OpenCV allows us to perform many basic image operations, such as drawing over images and adding text. In this tutorial, however, we will focus on some of the more interesting operations that are commonly used in computer vision.

### Morphological Operations

"Morphology is a branch of biology dealing with the study of the form and structure of organisms and their specific structural features."  *Morphology (biology) - Wikipedia*

In computer vision, morphological operations modify an image according to a provided structure (a small shape, often called a structuring element) and the chosen operation. The two main operations are dilation and erosion. Dilation expands existing shapes; erosion is the reverse operation and shrinks them. The size and shape of the structure determine how much dilation or erosion takes place.

For example, let's say we have a simple binary image (0 = white, 1 = black) represented by the following array: [1 0 0 1 1 0 0 0 1 0]. Let's further assume we will use the following structure for our operations: [0 1 1]. In both operations, we place the center of our structure on the existing black pixels of the original image.

In dilation, we add the structure at each of those positions (results greater than 1 stay as 1) and produce the following result:
[1 1 0 1 1 1 0 0 1 1]. We can see that it has effectively expanded (dilated) our image.

In erosion, we only keep a pixel if the structure, centered on it, fits entirely on black pixels; a 0 in the structure means that position is ignored. We get the following result: [0 0 0 1 0 0 0 0 0 0]. The 4th pixel is kept because both it and the pixel to its right are 1, so the 1s of [0 1 1] land on black pixels. The 5th pixel, however, is discarded because the pixel to its right is 0, so the structure does not fit. We can see that this effectively reduces (erodes) our image.
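The worked example can be reproduced with a few lines of NumPy. The sketch follows the rules described above: dilation stamps the structure, centered, onto every black (1) pixel, and erosion keeps a pixel only where every 1 in the structure lands on a 1. Positions beyond the ends of the array are treated as white (0), an assumption that does not affect this example. The function names are hypothetical.

```python
import numpy as np

def dilate_1d(img, struct):
    """Stamp the structure, centered, on every foreground (1) pixel."""
    r = len(struct) // 2
    out = np.zeros_like(img)
    for i in np.flatnonzero(img):
        for k, s in enumerate(struct):
            j = i + k - r
            if s and 0 <= j < len(img):
                out[j] = 1              # overlapping stamps stay at 1
    return out

def erode_1d(img, struct):
    """Keep a pixel only if every 1 in the structure lands on a 1."""
    r = len(struct) // 2
    out = np.zeros_like(img)
    for i in range(len(img)):
        fits = True
        for k, s in enumerate(struct):
            j = i + k - r
            # a 0 in the structure is ignored; outside the array counts as 0
            if s and not (0 <= j < len(img) and img[j] == 1):
                fits = False
                break
        out[i] = 1 if fits else 0
    return out

img = np.array([1, 0, 0, 1, 1, 0, 0, 0, 1, 0])
struct = np.array([0, 1, 1])
print(dilate_1d(img, struct))   # [1 1 0 1 1 1 0 0 1 1]
print(erode_1d(img, struct))    # [0 0 0 1 0 0 0 0 0 0]
```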

***In OpenCV, the foreground is considered to be the white (non-zero) pixels rather than the black pixels used in my example above, so dilation expands white pixels and erosion shrinks them. Because of this, I will invert the example image below before applying the morphological operations, so that black pixels become white. At the end, I will invert the results back to their original black and white pixels.***

```python
import cv2
import numpy as np

# download this image here:
# https://us.123rf.com/450wm/annasea/annasea1705/annasea170500668/78910522-without-words-handwritten-text-on-blue-and-white-striped-background-vector-.jpg?ver=6
img_original = cv2.imread('text_color_bad_res.jpg')

# resize to fit your screen using fx and fy parameters
img_original = cv2.resize(img_original, None, fx=0.8, fy=0.8)

# color -> grayscale -> inverted binary (dark pixels become white, the OpenCV foreground)
img_grayscale = cv2.cvtColor(img_original, cv2.COLOR_BGR2GRAY)
ret_val, img_binary = cv2.threshold(img_grayscale, 100, 255, cv2.THRESH_BINARY_INV)

# we will use a cross shape for our structuring element
shape = cv2.getStructuringElement(cv2.MORPH_CROSS, (3,3))
print(shape)

# dilation and erosion operations
img_dilation = cv2.dilate(img_binary, shape, iterations=1)
img_erosion = cv2.erode(img_binary, shape, iterations=1)

# invert again to get back the original black-on-white colors
ret_val, img_binary = cv2.threshold(img_binary, 100, 255, cv2.THRESH_BINARY_INV)
ret_val, img_dilation = cv2.threshold(img_dilation, 100, 255, cv2.THRESH_BINARY_INV)
ret_val, img_erosion = cv2.threshold(img_erosion, 100, 255, cv2.THRESH_BINARY_INV)

# show images side by side (erosion, original, dilation)
compare_imgs = np.hstack((img_erosion, img_binary, img_dilation))
cv2.imshow('morphological_operations', compare_imgs)

# press any key to close the opened image frames
cv2.waitKey(0)
cv2.destroyAllWindows()
```

```
[[0 1 0]
 [1 1 1]
 [0 1 0]]
```


### Filters / Kernels

As we have seen in the introduction, images are represented by pixel matrices. A kernel (also known as a filter) is a matrix of weights that is used to modify images. A kernel can be viewed as a window that moves from pixel to pixel: at each position, it multiplies every pixel inside the window by the corresponding kernel weight and assigns the resulting sum to the central pixel. Since a kernel needs a neighbourhood centered on the pixel it is operating on (the central pixel), its width (x-dimension) and height (y-dimension) are normally odd (3x3, 3x5, 9x3, etc.). It is also important to note that the results are written to a new matrix, so the original image is not modified while the kernel is still being applied to it.

The values (weights) inside the kernel depend on the effect we are trying to achieve. For example, a simple blur can be done with a 3x3 kernel in which every weight is 1/9. This kernel passes over each pixel and replaces it with the average of the surrounding 3x3 window (neighbourhood). The resulting image is a blurry version of the original.
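The sliding-window description above translates almost line for line into NumPy. In this sketch (the name `apply_kernel` is hypothetical) the kernel is applied without flipping, which is also what `cv2.filter2D` computes, and results go into a separate output array. Borders are padded by repeating edge pixels, an assumption made here; OpenCV's filters reflect the border by default, so values along the image edges can differ from `cv2.blur(img, (3, 3))`.

```python
import numpy as np

def apply_kernel(gray, kernel):
    """Slide `kernel` over `gray` and write each weighted sum into a new array."""
    kh, kw = kernel.shape
    if kh % 2 == 0 or kw % 2 == 0:
        raise ValueError("kernel dimensions must be odd so there is a central pixel")
    ry, rx = kh // 2, kw // 2
    padded = np.pad(gray.astype(np.float64), ((ry, ry), (rx, rx)), mode="edge")
    out = np.zeros(gray.shape, dtype=np.float64)   # results never overwrite the source
    h, w = gray.shape
    for y in range(h):
        for x in range(w):
            window = padded[y:y + kh, x:x + kw]    # neighbourhood centred on (y, x)
            out[y, x] = np.sum(window * kernel)
    return out

# 3x3 box blur: every weight is 1/9, so each pixel becomes its neighbourhood average
box = np.full((3, 3), 1.0 / 9.0)
img = np.array([[0, 0, 0, 0],
                [0, 90, 90, 0],
                [0, 90, 90, 0],
                [0, 0, 0, 0]], dtype=np.uint8)
blurred = apply_kernel(img, box)
print(np.round(blurred, 1))   # the four centre pixels drop from 90 to 40
```

The bright square spreads into its dark surroundings while its own values drop, which is exactly the loss of detail that comes with blurring.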


In general, smoothing/blurring an image can remove unwanted noise, but in turn diminishes the level of detail in the image. The effectiveness of a filter depends on the image and what we are trying to accomplish. In this tutorial we will show how to apply a Gaussian blur and a bilateral filter.

```python
import cv2
import numpy as np

# download this image at: http://www.vintagecardprices.com/pics/3287/189069.jpg
img = cv2.imread('mccartney_color_noisy.jpg')

# resize to fit your screen using fx and fy parameters
img = cv2.resize(img, None, fx=0.8, fy=0.8)

'''
The weight values in the Gaussian blur kernel are determined by
the standard deviation (sigma). The idea is to give more importance (weight)
to the pixels that are closer to the central pixel.
Passing 0 as sigma lets OpenCV compute it from the kernel size.
'''
img_gb1 = cv2.GaussianBlur(img, (3, 3), 0)
img_gb2 = cv2.GaussianBlur(img, (5, 5), 0)
img_gb3 = cv2.GaussianBlur(img, (7, 7), 0)

'''
A bilateral filter smooths an image but tries to preserve edges.
This is done by taking into account not only the distance of the neighbouring
pixels (as in a Gaussian blur), but also their similarity (color intensity, etc.).
Arguments: neighbourhood diameter (3, 7, 11), sigmaColor (31), sigmaSpace (31).
'''
img_bf1 = cv2.bilateralFilter(img, 3, 31, 31)
img_bf2 = cv2.bilateralFilter(img, 7, 31, 31)
img_bf3 = cv2.bilateralFilter(img, 11, 31, 31)

# put images side by side for easy comparison
compare_gb = np.hstack((img, img_gb1, img_gb2, img_gb3))
compare_bf = np.hstack((img, img_bf1, img_bf2, img_bf3))

# show images
cv2.imshow('gaussian_blur', compare_gb)
cv2.imshow('bilateral_filter', compare_bf)

# press any key to close the opened image frames
cv2.waitKey(0)
cv2.destroyAllWindows()
```
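The difference between the two filters is easiest to see on a single sharp edge. The grayscale sketch below builds Gaussian weights from a standard deviation, then defines a bilateral filter that multiplies those distance weights by a similarity term, so neighbours with very different intensity barely count. The function names are hypothetical, `sigma_color` and `sigma_space` only mirror the roles of `cv2.bilateralFilter`'s `sigmaColor` and `sigmaSpace`, and border padding by edge replication is an assumption; OpenCV's implementation (which also handles color) will not match these numbers exactly.

```python
import numpy as np

def gaussian_kernel_2d(ksize, sigma):
    """Weights exp(-(x^2 + y^2) / (2 sigma^2)) on a ksize x ksize grid, summing to 1."""
    r = ksize // 2
    ys, xs = np.mgrid[-r:r + 1, -r:r + 1]
    w = np.exp(-(xs ** 2 + ys ** 2) / (2.0 * sigma ** 2))
    return w / w.sum()

def gaussian_blur_gray(gray, ksize, sigma):
    """Weight each neighbour only by its distance from the centre pixel."""
    r = ksize // 2
    img = gray.astype(np.float64)
    padded = np.pad(img, r, mode="edge")
    kernel = gaussian_kernel_2d(ksize, sigma)
    out = np.empty_like(img)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            out[y, x] = np.sum(kernel * padded[y:y + ksize, x:x + ksize])
    return out

def bilateral_gray(gray, d, sigma_color, sigma_space):
    """Weight each neighbour by its distance AND by how similar its intensity is."""
    r = d // 2
    img = gray.astype(np.float64)
    padded = np.pad(img, r, mode="edge")
    spatial = gaussian_kernel_2d(d, sigma_space)       # closeness term
    out = np.empty_like(img)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            window = padded[y:y + d, x:x + d]
            # similarity term: very different intensities get almost no weight
            similarity = np.exp(-((window - img[y, x]) ** 2) / (2.0 * sigma_color ** 2))
            weights = spatial * similarity
            out[y, x] = np.sum(weights * window) / np.sum(weights)
    return out

# a sharp vertical edge: left half 0, right half 100
step = np.zeros((5, 6), dtype=np.uint8)
step[:, 3:] = 100
smoothed_g = gaussian_blur_gray(step, 5, 1.0)
smoothed_b = bilateral_gray(step, 5, 10.0, 1.0)
print(np.round(smoothed_g[2]))   # the edge is smeared into intermediate values
print(np.round(smoothed_b[2]))   # the edge stays sharp
```

The bilateral filter costs more because the similarity weights must be recomputed for every pixel, which is one reason it is slower than a Gaussian blur in practice.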

## Feature Detection

Our eyes are naturally drawn to interesting focal points in the world around us. In computer vision, we try to emulate the same logic with what we call feature detection: the process of finding 'interesting' points in an image by analyzing their surroundings. These features are typically blobs, corners, and edges.

### Edge Detection

Edges are among the most important features we can detect in an image. There are many different approaches and algorithms for finding edges, but the general concept relies on identifying discontinuities in intensity. Edge detection uses kernels to find these changes in brightness, and the weights of the kernel depend on the algorithm we employ.

To get a sense of how edge detection works, let's work through a simple example. Let's assume we have a 1x3 kernel [1 0 -1] and a grayscale 1D image [10 10 20 50 10 10 10 20 10 10].

Passing this kernel over the image wherever it fits (skipping the first and last pixels, where the kernel can't fit completely) and calculating the weighted sum at each position, we get the following result: [-10 -40 10 40 0 -10 0 10]. Each value is the left neighbour minus the right neighbour, so large magnitudes mark sudden changes in intensity. If we then threshold the result to convert it to a binary image, we can see the isolated edges. Using 30 as a threshold on the absolute values, we get the following image: [0 1 0 1 0 0 0 0]. This result clearly shows where the edges are in this simple image.
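The same calculation takes two NumPy calls. `np.correlate` with `mode="valid"` slides the kernel without flipping it and only evaluates positions where it fits completely, which matches the example. (`np.convolve` would flip the kernel first and give the same values with opposite signs.) Taking the absolute value before thresholding keeps both dark-to-bright and bright-to-dark edges.

```python
import numpy as np

image = np.array([10, 10, 20, 50, 10, 10, 10, 20, 10, 10])
kernel = np.array([1, 0, -1])

# "valid" mode skips the first and last pixels, where the kernel does not fit
response = np.correlate(image, kernel, mode="valid")
print(response)   # [-10 -40  10  40   0 -10   0  10]

# threshold the absolute response at 30 to get a binary edge map
edges = (np.abs(response) > 30).astype(int)
print(edges)      # [0 1 0 1 0 0 0 0]
```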

***We will quickly go over two OpenCV edge detection functions below. I encourage you to research these algorithms (Sobel, Canny) and play around with their parameters!***

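```python
import cv2

# download this image at: http://www.vintagecardprices.com/pics/3287/189069.jpg
# cv2.IMREAD_GRAYSCALE (the same as the '0' flag) reads the image as grayscale
img = cv2.imread('mccartney_color_noisy.jpg', cv2.IMREAD_GRAYSCALE)

# resize to fit your screen using fx and fy parameters
img = cv2.resize(img, None, fx=0.9, fy=0.9)

# clean noise from the image
img_bf = cv2.bilateralFilter(img, 11, 31, 31)

# Sobel edge detection: compute the x and y gradients in a signed float type
# so negative responses are not clipped, combine them into a gradient magnitude,
# then rescale to 0-255 for display
sobel_x = cv2.Sobel(img_bf, cv2.CV_64F, 1, 0, ksize=9)
sobel_y = cv2.Sobel(img_bf, cv2.CV_64F, 0, 1, ksize=9)
magnitude = cv2.magnitude(sobel_x, sobel_y)
img_sobel = cv2.normalize(magnitude, None, 0, 255, cv2.NORM_MINMAX, cv2.CV_8U)

# Canny edge detection (low and high thresholds required)
img_canny = cv2.Canny(img_bf, 75, 150)

# show the filtered image and the outputs of the edge detection
cv2.imshow('original', img_bf)
cv2.imshow('sobel', img_sobel)
cv2.imshow('canny', img_canny)

# press any key to close opened picture frames
cv2.waitKey(0)
cv2.destroyAllWindows()
```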
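To connect `cv2.Sobel` back to the 1D example, here is a NumPy sketch of the 3x3 Sobel operator. Each kernel differences pixels in one direction while smoothing (weights 1, 2, 1) in the other, and the two responses are combined into a gradient magnitude. The function name is hypothetical, and unlike `cv2.Sobel` the sketch simply skips the one-pixel border instead of extrapolating it.

```python
import numpy as np

# 3x3 Sobel kernels: differencing in one direction, smoothing in the other
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T

def sobel_magnitude(gray):
    """Gradient magnitude sqrt(gx^2 + gy^2) for the interior pixels of a grayscale image."""
    img = gray.astype(np.float64)
    h, w = img.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for dy in range(3):
        for dx in range(3):
            window = img[dy:dy + h - 2, dx:dx + w - 2]
            gx += SOBEL_X[dy, dx] * window
            gy += SOBEL_Y[dy, dx] * window
    return np.hypot(gx, gy)

# vertical step edge: dark left half, bright right half
step = np.zeros((5, 6))
step[:, 3:] = 100
print(sobel_magnitude(step))   # large values only in the two columns around the edge
```

Canny goes further: it starts from gradients like these, thins edges to one pixel with non-maximum suppression, and uses its low and high thresholds (75 and 150 above) for hysteresis, which is why its output looks much cleaner than a raw Sobel magnitude.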

### Face Detection

Face detection might be the best-known application of feature detection. We have all been exposed to this component of computer vision in one way or another. Popular websites and applications, such as Facebook and Snapchat, use it to track your face and add interesting animations over it. Digital cameras have been using face detection for years to optimize their autofocus. There are many real-world applications, but how do they work? In this section, we'll go over one of the famous algorithms, the Viola-Jones algorithm.

The Viola-Jones algorithm uses a classifier to detect objects. The classifier is trained on many positive images (containing what we are trying to detect) and negative images (not containing what we are looking for). For face detection, it uses a large set of features, selected through machine learning during training, to determine whether a face is present. These features are similar to the kernels we have seen in the edge detection section, as they look at differences in pixel intensity.

These facial features capture attributes that are common to all faces. For example, our eyes are usually brighter (lighter) than our eyebrows (darker), and the bridge of the nose is brighter than its sides.

The algorithm slides over the image and checks each region against a checklist of these features, grouped into sets. As soon as a region fails to satisfy a set of features, it is discarded as not being a face. This improves computational efficiency, since most regions are rejected after only a few checks. There are many other details that allow the algorithm to run quickly, and I invite you to read more about it through the links in the 'Further Learning Resources' section. Below we will see the OpenCV implementation of this algorithm.
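To make intensity-difference features and early rejection more concrete, here is a toy sketch, not OpenCV's implementation. It computes a two-rectangle feature (lower band minus upper band, in the spirit of the eyebrow/eye example above) with a summed-area table, the "integral image" described in the Viola-Jones paper, so any rectangle sum takes four lookups. The feature placement, the single stage and its threshold of 500 are all hypothetical; a trained file such as `haarcascade_frontalface_default.xml` holds many learned features per stage.

```python
import numpy as np

def integral_image(gray):
    """Summed-area table with a zero border: ii[y, x] = sum of gray[:y, :x]."""
    ii = np.zeros((gray.shape[0] + 1, gray.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(gray.astype(np.int64), axis=0), axis=1)
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle with top-left corner (x, y), using four lookups."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def two_rect_feature(ii, x, y, w, h):
    """Hypothetical feature: lower half minus upper half of a w x h window.

    Large when the upper band (e.g. eyebrows) is darker than the lower band (e.g. eyes).
    """
    half = h // 2
    return rect_sum(ii, x, y + half, w, half) - rect_sum(ii, x, y, w, half)

def cascade_accepts(window_ii, stages):
    """Reject as soon as any stage fails; only windows passing every stage are kept."""
    for feature, threshold in stages:
        if feature(window_ii) < threshold:
            return False          # early exit: most non-face windows stop here
    return True

banded = np.full((4, 4), 200, dtype=np.uint8)
banded[:2, :] = 50                        # dark band on top, bright band below
flat = np.full((4, 4), 120, dtype=np.uint8)

# one hypothetical stage: the band contrast must reach 500
stages = [(lambda ii: two_rect_feature(ii, 0, 0, 4, 4), 500)]
print(cascade_accepts(integral_image(banded), stages))   # True
print(cascade_accepts(integral_image(flat), stages))     # False
```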

```python
import cv2

# download this image at: http://www.vintagecardprices.com/pics/3287/189069.jpg
single_face = cv2.imread('mccartney_color_noisy.jpg')

# download this image at: https://www.better-records.com/zoom_img/_1271109403.jpg
multi_faces = cv2.imread('beatles_noisy.jpg')

# save the cascade file below in the same directory as this notebook
# https://raw.githubusercontent.com/opencv/opencv/master/data/haarcascades/haarcascade_frontalface_default.xml
faceCascade = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")
if faceCascade.empty():
    raise IOError('haarcascade_frontalface_default.xml could not be loaded')

# the features compare pixel intensities, so run detection on grayscale copies
gray_single = cv2.cvtColor(single_face, cv2.COLOR_BGR2GRAY)
gray_multi = cv2.cvtColor(multi_faces, cv2.COLOR_BGR2GRAY)

# using the classifier to detect faces
faces1 = faceCascade.detectMultiScale(gray_single, scaleFactor=1.1, minNeighbors=5, minSize=(30, 30), flags=cv2.CASCADE_SCALE_IMAGE)
faces2 = faceCascade.detectMultiScale(gray_multi, scaleFactor=1.01, minNeighbors=5, minSize=(40, 40), flags=cv2.CASCADE_SCALE_IMAGE)

# draw a green rectangle over each detected face on the color images
for (x, y, w, h) in faces1:
    cv2.rectangle(single_face, (x, y), (x + w, y + h), (0, 255, 0), 2)
for (x, y, w, h) in faces2:
    cv2.rectangle(multi_faces, (x, y), (x + w, y + h), (0, 255, 0), 2)

# show the original images with the detected face(s)
cv2.imshow('single_face', single_face)
cv2.imshow('multi_faces', multi_faces)

# press any key to close the opened image frames
cv2.waitKey(0)
cv2.destroyAllWindows()
```

## Further Learning Resources

***Here are some useful links. Some of these were used as sources for creating this tutorial, and others I simply found to be interesting resources.***

Histograms and color channels: https://www.allaboutcircuits.com/technical-articles/image-histogram-characteristics-machine-learning-image-processing/

Thresholding: https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_thresholding/py_thresholding.html

Kernels: https://en.wikipedia.org/wiki/Kernel_(image_processing)

Interactive kernel application: http://setosa.io/ev/image-kernels/

Convolution and Morphological filters: https://www.harrisgeospatial.com/docs/ConvolutionMorphologyFilters.html

Dilation and Erosion: https://www.cs.auckland.ac.nz/courses/compsci773s1c/lectures/ImageProcessing-html/topic4.htm

Edge Detection: https://www.projectrhea.org/rhea/index.php/An_Implementation_of_Sobel_Edge_Detection

Canny Edge Detection and Threshold Selection: https://www.pyimagesearch.com/2015/04/06/zero-parameter-automatic-canny-edge-detection-with-python-and-opencv/

Viola-Jones Object Detection Framework: https://en.wikipedia.org/wiki/Viola%E2%80%93Jones_object_detection_framework

Viola-Jones Paper: http://www.vision.caltech.edu/html-files/EE148-2005-Spring/pprs/viola04ijcv.pdf

Face Detection using OpenCV: https://www.superdatascience.com/opencv-face-detection/

Another OpenCV tutorial: https://dzone.com/articles/introduction-to-computer-vision-with-opencv-and-py