# aim2 workshop - From models to complete solutions using OpenVINO 
[![aim2](./assets/aim2.png)](https://www.youtube.com/watch?v=a6bwjYjuBEg)

### Luca Ruzzola, Machine Learning Engineer @ aim2.io

### What is computer vision
*"**Computer vision** is concerned with the automatic extraction, analysis and understanding of useful information from a single image or a sequence of images. It involves the development of a theoretical and algorithmic basis to achieve automatic visual understanding."*

### What is deep learning
*"**Deep Learning** is a class of machine learning algorithms that use a cascade of multiple layers of nonlinear processing units for feature extraction and transformation to learn multiple levels of representations that correspond to different levels of abstraction; the levels form a hierarchy of concepts."*

*"**Machine Learning** is the discipline that provides systems the ability to automatically learn and improve from experience without being explicitly programmed."*

*"**AI** is the discipline dealing with the designing and building of intelligent agents that receive percepts from the environment and take actions that affect that environment."*

![Deep learning](./assets/deep_learning.png)

### What is a CNN
A **CNN** is a neural network that uses kernel convolution instead of matrix multiplication in one or more of its layers.
![Convolution](./assets/convolution.gif)
![Application of a blur filter](./assets/filterd_image.png)
![Application of an edge detection filter](./assets/cameraman.png)
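The filters in the images above can be reproduced in a few lines. This is a minimal NumPy sketch of a "valid" 2-D convolution on a grayscale image. The names `convolve2d`, `blur_kernel` and `edge_kernel` are illustrative and not part of the workshop repository. The kernel is flipped, as mathematical convolution requires. Deep learning frameworks usually compute cross-correlation (no flip) instead, which makes no difference for learned kernels.

```python
import numpy as np

def convolve2d(image, kernel):
    """'Valid' 2-D convolution of a grayscale image with a small kernel."""
    k = np.flipud(np.fliplr(kernel))  # true convolution flips the kernel
    kh, kw = k.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the neighbourhood currently under the kernel
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * k)
    return out

# Box blur: every output pixel is the average of a 3x3 patch
blur_kernel = np.ones((3, 3)) / 9.0
# Laplacian-style edge detector: responds where intensity changes
edge_kernel = np.array([[-1, -1, -1],
                        [-1,  8, -1],
                        [-1, -1, -1]], dtype=float)

# Toy image with a vertical step edge between columns 1 and 2
image = np.zeros((5, 5))
image[:, 2:] = 9.0

blurred = convolve2d(image, blur_kernel)
edges = convolve2d(image, edge_kernel)
```

The blur spreads the step into a ramp. The edge kernel responds strongly on either side of the step and gives zero in flat regions. A CNN learns the values of many such kernels instead of having them designed by hand.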

## Let's get our hands dirty!

# Environment setup

Please clone this repository: https://github.com/lucaruzzola/aaeon_workshop

*git clone https://github.com/lucaruzzola/aaeon_workshop*

If you don't have git installed, you can install it by executing this command in a terminal:
*sudo apt-get install git*

Create the Anaconda environment as such: *conda env create -f workshopenv.yml*

If you don't have Anaconda installed already, you can download it from: https://repo.anaconda.com/archive/Anaconda3-2018.12-Linux-x86_64.sh

To test that everything is working please type in a terminal: 

*source activate workshop*

*source /opt/intel/computer_vision_sdk/bin/setupvars.sh*

*python blur.py*

If everything is working as expected you should see the live video from your camera with blurred faces.

You can then quit the demo by pressing "q" and start the notebook for this session by typing in the same terminal:

*python -m jupyter notebook*

In [None]:
import cv2
import os
import openvino
import matplotlib.pyplot as plt
%matplotlib inline

# Acquire images from your camera

In [None]:
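cap = cv2.VideoCapture(0)
try:
    ret, img = cap.read()
    if not ret:
        raise RuntimeError("Could not read a frame from the camera")
    # OpenCV returns BGR images, while matplotlib expects RGB
    plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
    plt.show()
except Exception as e:
    print(e)
cap.release()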

# Face detection

Traditional techniques like Viola-Jones rely on hand-engineered features. With the pre-trained cascades that ship with OpenCV, they work quite well without any training on your side. However, they are quite brittle and make fairly strict assumptions, for example that the face is frontal and well lit.
Modern techniques like MobileNetSSD are more accurate and, above all, more robust. They can cope with much more variation in a face's appearance and in its position in the image.
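Viola-Jones gets its speed from Haar-like features, which are differences between sums of adjacent rectangles. An integral image (summed-area table) lets it evaluate these features in constant time. The NumPy sketch below shows only that core trick. The boosted classifier cascade that selects and combines thousands of such features is omitted, and the function names are illustrative.

```python
import numpy as np

def integral_image(img):
    """Summed-area table, padded with a leading row and column of zeros."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of img[y:y+h, x:x+w] with four lookups, whatever the rectangle size."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def two_rect_feature(ii, x, y, w, h):
    """Horizontal two-rectangle Haar-like feature: left half minus right half."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)

img = np.arange(16).reshape(4, 4)
ii = integral_image(img)
```

A strongly negative `two_rect_feature` means "dark on the left, bright on the right". That is exactly the kind of fixed, hand-designed pattern that makes the approach fast but brittle.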

MobileNetSSD is a very popular CNN architecture for general object detection, especially on low-power devices. Like every deep learning model, it requires quite a bit of expertise to train and deploy.
However, thanks to OpenVINO and its model zoo, it is now possible to use it just like any other library you encounter in your daily workflow.

You can simply load the model and use it to get the bounding boxes of every face in an image. However, as the next cell shows, using it effectively still takes a fair amount of code: loading the network, pre-processing each frame and parsing the raw output. Later we will see how much simpler this becomes once the right tools are in place, so that you can move from thinking about a single model to thinking about a complete AI solution.
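Most of the boilerplate sits in the post-processing. An SSD-style detector in OpenVINO returns a `[1, 1, N, 7]` blob, where each row is `[image_id, label, confidence, x_min, y_min, x_max, y_max]` with coordinates normalised to `[0, 1]`. The sketch below decodes that blob into pixel boxes. It assumes that layout, and it assumes that a negative `image_id` marks unused padding rows, which is the convention used in OpenVINO's SSD samples. `decode_ssd_detections` is a hypothetical helper, not part of OpenVINO or of this repository.

```python
import numpy as np

def decode_ssd_detections(output, frame_w, frame_h, threshold=0.5):
    """Convert a [1, 1, N, 7] SSD detection blob into (label, conf, x, y, w, h) tuples.

    Each row is assumed to be
    [image_id, label, confidence, x_min, y_min, x_max, y_max],
    with coordinates normalised to [0, 1].
    """
    boxes = []
    for image_id, label, conf, x_min, y_min, x_max, y_max in np.asarray(output)[0, 0]:
        if image_id < 0 or conf <= threshold:
            continue  # padding row, or detection below the confidence threshold
        # Clip to [0, 1] before scaling so boxes never leave the frame
        x0 = int(np.clip(x_min, 0.0, 1.0) * frame_w)
        y0 = int(np.clip(y_min, 0.0, 1.0) * frame_h)
        x1 = int(np.clip(x_max, 0.0, 1.0) * frame_w)
        y1 = int(np.clip(y_max, 0.0, 1.0) * frame_h)
        boxes.append((int(label), float(conf), x0, y0, x1 - x0, y1 - y0))
    return boxes

# A fake network output: one confident face, one weak one, one padding row
fake_output = np.array([[[[0, 1, 0.9, 0.25, 0.25, 0.75, 0.75],
                          [0, 1, 0.3, 0.0, 0.0, 1.0, 1.0],
                          [-1, 0, 0.0, 0.0, 0.0, 0.0, 0.0]]]])
boxes = decode_ssd_detections(fake_output, 1280, 720)
```

Passing the real frame size (rather than hard-coding it) keeps the boxes correct even when the camera ignores the requested resolution.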

In [None]:
import copy
from openvino.inference_engine import IEPlugin, IENetwork
from utils import SyRegion, SyFrame, Location, draw_bounding_box

# Load a model using plain OpenVINO
face_xml = "./assets/face_detection/FP32/fd.xml"
face_bin = "./assets/face_detection/FP32/fd.bin"

cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)

ret, frame = cap.read()

try:
    if not ret:
        raise RuntimeError("Could not read a frame from the camera")

    device = "CPU"
    cpu_extension = "/opt/intel/computer_vision_sdk/deployment_tools/inference_engine/lib/ubuntu_16.04/intel64/libcpu_extension_sse4.so"
    plugin = IEPlugin(device=device, plugin_dirs=None)
    if cpu_extension and 'CPU' in device:
        plugin.add_cpu_extension(cpu_extension)

    # Read the IR (topology .xml + weights .bin) and load it onto the device
    net = IENetwork(model=face_xml, weights=face_bin)

    input_layer = next(iter(net.inputs))
    output_layer = next(iter(net.outputs))
    exec_net = plugin.load(network=net, num_requests=1)
    n, c, net_input_height, net_input_width = net.inputs[input_layer].shape
    del net

    # Pre-processing: resize to the network input size, HWC -> CHW, add the batch dimension
    frame_height, frame_width = frame.shape[:2]
    copy_frame = copy.deepcopy(frame)
    resized_frame = cv2.resize(copy_frame, (net_input_width, net_input_height))
    input_blob = resized_frame.transpose((2, 0, 1)).reshape((n, c, net_input_height, net_input_width))

    # Detection
    network_output = exec_net.infer(inputs={input_layer: input_blob})[output_layer]

    # Post-processing: each row is [image_id, label, confidence, x_min, y_min, x_max, y_max]
    thr = 0.5

    for obj in network_output[0][0]:
        textual_label = str(int(obj[1]))
        confidence = obj[2]

        if int(obj[1]) != -1 and confidence > thr:
            # Coordinates are normalised to [0, 1]: scale them to the actual frame size
            x_min = max(0, int(obj[3] * frame_width))
            y_min = max(0, int(obj[4] * frame_height))
            x_max = min(frame_width, int(obj[5] * frame_width))
            y_max = min(frame_height, int(obj[6] * frame_height))

            face = SyRegion(label=textual_label, confidence=confidence,
                            location=Location(x=x_min, y=y_min, w=x_max - x_min, h=y_max - y_min),
                            sy_frame=SyFrame(frame))

            draw_bounding_box(face, frame, color=(50, 50, 200), thickness=8)
    plt.imshow(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    plt.show()
except Exception as e:
    print(e)

cap.release()
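The pre-processing step in the cell above is easy to get wrong. The network expects a `1 x C x H x W` blob, but OpenCV frames are `H x W x C`. The sketch below shows the same layout conversion in pure NumPy. It uses a nearest-neighbour `resize_nearest` as a simple stand-in for `cv2.resize`, which interpolates bilinearly by default. Both function names are hypothetical.

```python
import numpy as np

def resize_nearest(img, out_w, out_h):
    """Nearest-neighbour resize of an HxWxC image (a simple stand-in for cv2.resize)."""
    rows = np.arange(out_h) * img.shape[0] // out_h
    cols = np.arange(out_w) * img.shape[1] // out_w
    return img[rows[:, None], cols]

def to_nchw_blob(frame_hwc, net_w, net_h):
    """Prepare an HxWxC (OpenCV layout) frame for a 1xCxHxW network input."""
    resized = resize_nearest(frame_hwc, net_w, net_h)
    chw = resized.transpose((2, 0, 1))   # move channels first: HWC -> CHW
    return chw[np.newaxis, ...]          # add the batch dimension (N = 1)

frame = np.arange(4 * 6 * 3).reshape(4, 6, 3)
blob = to_nchw_blob(frame, net_w=3, net_h=2)
```

Reading the target size from the network's input shape instead of hard-coding `300x300` means the same code keeps working if you swap in a model with a different input resolution.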

In [None]:
from models import FaceDetector

face_xml="./assets/face_detection/FP32/fd.xml"
face_bin="./assets/face_detection/FP32/fd.bin"

face_detector = FaceDetector(model_xml=face_xml,\
                             model_bin=face_bin,\
                             device="CPU",\
                             confidence_threshold=0.8,\
                             cpu_extension="/opt/intel/computer_vision_sdk/deployment_tools/inference_engine/lib/ubuntu_16.04/intel64/libcpu_extension_sse4.so")

In [None]:
cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)

ret, img = cap.read()

try:
    # A single call now replaces the whole pre/inference/post pipeline
    detected_faces = face_detector.detect(SyFrame(img))
    for face in detected_faces:
        draw_bounding_box(face, img)

    plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
    plt.show()
except Exception as e:
    print(e)

cap.release()

# Blur the face
We are now going to use OpenCV to blur the faces we have just detected. This reproduces the result of the `blur.py` demo you ran earlier and gives us a simple privacy-preserving system.
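The cell below relies on helpers from `utils` (`get_square_location`, `get_square_frame_region`) whose code is not shown here. The sketch below is a hypothetical illustration of that kind of logic. It grows a face box into a square around the same centre and clips it to the image. It then anonymises the region in place through a NumPy view. As a simpler substitute for `cv2.blur`, the sketch fills the region with its mean colour, which is an even stronger form of anonymisation. `square_box` and `anonymise_region` are illustrative names.

```python
import numpy as np

def square_box(x, y, w, h, img_w, img_h):
    """Hypothetical helper: grow a (x, y, w, h) box into a square with the same
    centre, clipped to the image. Near the borders the clipped box can end up
    smaller than a full square."""
    side = max(w, h)
    cx, cy = x + w // 2, y + h // 2
    x0 = max(0, cx - side // 2)
    y0 = max(0, cy - side // 2)
    x1 = min(img_w, x0 + side)
    y1 = min(img_h, y0 + side)
    return x0, y0, x1, y1

def anonymise_region(img, box):
    """Hide a region in place by replacing every pixel with the region's mean colour."""
    x0, y0, x1, y1 = box
    region = img[y0:y1, x0:x1]          # a view, so writes go straight into img
    mean = region.mean(axis=(0, 1), keepdims=True)
    region[...] = mean.astype(img.dtype)
    return img

img = np.zeros((100, 100, 3), dtype=np.uint8)
img[10:20, 10:20] = 255                 # a bright "face"
box = square_box(5, 5, 20, 10, 100, 100)
anonymise_region(img, box)
```

Clipping matters because slicing past the image edge silently returns a smaller array. Pasting a full-size patch into such a slice would then fail with a shape mismatch.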

In [None]:
cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)

ret, img = cap.read()

try:
    detected_faces = face_detector.detect(SyFrame(img))
    for face in detected_faces:
        # Square region around the face, and a blurred copy of its pixels
        sq_loc = face.get_square_location()
        blur_face = face.get_square_frame_region().frame
        blur_face = cv2.blur(blur_face, (55, 55))
        draw_bounding_box(face, img)

        # Paste the blurred patch back into the frame at the same position
        img[sq_loc.y:sq_loc.y + blur_face.shape[0], sq_loc.x:sq_loc.x + blur_face.shape[1]] = blur_face

    plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
    plt.show()
except Exception as e:
    print(e)
cap.release()

# Emotion recognition
Now we are going to add emotion recognition to our system, so that we can show an emoji in real time, matching people's expressions, instead of just blurring.

We are again going to use a pre-trained CNN that has been trained for this very specific task, to be able to distinguish between 5 different expressions: neutral, happy, sad, surprised, angry.

The output is going to be something like this:
![Emoji result](./assets/emoji_result.png)
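An emotion classifier of this kind outputs one score per class. Turning those scores into a label is a softmax followed by an argmax. The sketch below assumes the network returns five scores in the same order as the `emotion_label_list` used in the next cell (`neutral`, `happy`, `sad`, `surprise`, `anger`). If the model already ends with a softmax layer, that step can be skipped, because the argmax is the same either way. `top_emotion` is a hypothetical helper, not the `EmotionClassifier.predict` method itself.

```python
import numpy as np

EMOTIONS = ["neutral", "happy", "sad", "surprise", "anger"]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)   # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def top_emotion(raw_output, labels=EMOTIONS):
    """Return (label, probability) for the highest-scoring emotion."""
    scores = np.asarray(raw_output, dtype=float).ravel()   # e.g. [1, 5, 1, 1] -> [5]
    probs = softmax(scores)
    best = int(np.argmax(probs))
    return labels[best], float(probs[best])

label, prob = top_emotion([0.1, 2.0, 0.3, 0.2, 0.1])
```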

In [None]:
from models import EmotionClassifier
from utils import load_emojis, emoji_overlay

emotion_classifier = EmotionClassifier(model_xml="./assets/emotion_recognition/FP32/em.xml",
                                       model_bin="./assets/emotion_recognition/FP32/em.bin",
                                       device="CPU",
                                       cpu_extension="/opt/intel/computer_vision_sdk/deployment_tools/inference_engine/lib/ubuntu_16.04/intel64/libcpu_extension_sse4.so",
                                       emotion_label_list=["neutral", "happy", "sad", "surprise", "anger"])

emojis = load_emojis("./assets/emojis/")

In [None]:
cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)

ret, img = cap.read()

try:
    detected_faces = face_detector.detect(SyFrame(img))
    for face in detected_faces:
        # Classify the expression on the square face crop, then draw the matching emoji over the face
        emotion = emotion_classifier.predict(face.get_square_frame_region().frame)
        emoji_overlay(emojis[emotion], img, face.location)
        draw_bounding_box(face, img)

    plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
    plt.show()
except Exception as e:
    print(e)
cap.release()
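The implementation of `emoji_overlay` in `utils` is not shown here. A common way to draw a transparent PNG over a frame is per-pixel alpha blending: `out = alpha * sprite + (1 - alpha) * frame`. The sketch below assumes the emoji is a BGRA array, which is what `cv2.imread(path, cv2.IMREAD_UNCHANGED)` returns for a PNG with transparency. It also assumes a non-negative top-left position and clips the sprite at the right and bottom borders. `alpha_overlay` is a hypothetical name.

```python
import numpy as np

def alpha_overlay(sprite_bgra, frame, x, y):
    """Blend a BGRA sprite onto a BGR frame in place, top-left corner at (x, y)."""
    h = min(sprite_bgra.shape[0], frame.shape[0] - y)   # clip at the bottom border
    w = min(sprite_bgra.shape[1], frame.shape[1] - x)   # clip at the right border
    if h <= 0 or w <= 0:
        return frame
    sprite = sprite_bgra[:h, :w]
    alpha = sprite[..., 3:4].astype(float) / 255.0       # shape (h, w, 1), in [0, 1]
    roi = frame[y:y + h, x:x + w].astype(float)
    blended = alpha * sprite[..., :3] + (1.0 - alpha) * roi
    frame[y:y + h, x:x + w] = blended.astype(frame.dtype)
    return frame

# Sprite whose left half is opaque and right half fully transparent
sprite = np.zeros((4, 4, 4), dtype=np.uint8)
sprite[..., :3] = 200
sprite[:, :2, 3] = 255

frame = np.zeros((10, 10, 3), dtype=np.uint8)
alpha_overlay(sprite, frame, 1, 1)
```

Blending with the alpha channel, rather than copying the sprite's pixels directly, keeps the emoji's round outline instead of pasting a square patch over the face.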