Utilize Pre-Trained Models from the Intel Distribution of OpenVINO Toolkit to build powerful edge applications, without the need to train our own model.

## Introduction

[Youtube Video](https://youtu.be/vFNZu1VpdwE)

Lesson 2 covers the following subjects:
* Intel OpenVINO Toolkit basics
* Different Computer Vision model types
* Available Pre-Trained Models in the Software
* Choosing the right Pre-Trained Model for our Application
* Loading and Deploying a Basic Application with a Pre-Trained Model

## The OpenVINO™ Toolkit

[Youtube Video](https://youtu.be/-pM9pLCnzJk)

This section gives general information about the OpenVINO Toolkit.

The OpenVINO Toolkit's name comes from "Open Visual Inferencing and Neural Network Optimization". This open-source software is largely focused around optimizing neural network inference.

The software is developed by Intel and helps support fast inference across Intel CPUs, GPUs, FPGAs and Neural Compute Stick with a common API.

OpenVINO supports models trained in frameworks such as TensorFlow or Caffe. The Model Optimizer converts these models into an Intermediate Representation (IR). IR models can then be used with the Inference Engine, which helps speed up inference on the target hardware. The toolkit also includes a wide variety of Pre-Trained Models.

The Model Optimizer optimizes a model's speed and size so that it can be deployed in edge applications. This optimization does not increase inference accuracy; accuracy has to be achieved in training beforehand. Low-resource applications need smaller, faster models and hardware optimizations, and the OpenVINO Toolkit provides them. For example, an IoT device does not have the benefit of multiple GPUs and unlimited memory to run its apps.

* The Intel Distribution of OpenVINO Toolkit is an open source library useful for edge deployment due to its performance maximizations and pre-trained models.

[Main site of OpenVINO Toolkit](https://software.intel.com/en-us/openvino-toolkit)

## Pre-Trained Models in OpenVINO™

[Youtube Video](https://youtu.be/1-Vije0cMBQ)

There are lots of Pre-Trained Models directly in the software. Pre-Trained Models are previously trained models with high accuracy. If we find a suitable Pre-Trained Model, we don't need to collect data and train a model from scratch. We only need to learn how to preprocess the inputs and handle the outputs, and then we can plug the model into our application.

Pre-Trained Models refer specifically to the Model Zoo, which contains models that have already been converted using the Model Optimizer. As such, we can use these models directly with the Inference Engine.

![Process](https://software.intel.com/sites/default/files/managed/ed/e9/inference-engine-700w-300h.png)

[Pre-Trained Models and documentations of them](https://software.intel.com/en-us/openvino-toolkit/documentation/pretrained-models)

## Types of Computer Vision Models

[Youtube Video](https://youtu.be/E8yBgSKfCoo)

This part covers three types of computer vision models: Classification, Detection and Segmentation.

**Classification** predicts the "class" of an image, or of an object in an image. The prediction is based on the probability of each class: the class with the highest probability is the predicted class, but we can also look at the top 5 predictions.
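As a minimal sketch of picking the predicted class and the top 5 predictions, assuming the classifier returns one probability per class, the idea can be written with NumPy. The function name `top_k_classes` and the probability values are made up for illustration.

```python
import numpy as np

def top_k_classes(probabilities, k=5):
    """Return (class_index, probability) pairs for the k most likely classes."""
    probabilities = np.asarray(probabilities, dtype=np.float64).flatten()
    k = min(k, probabilities.size)
    # argsort is ascending, so take the last k indices and reverse them
    top_indices = np.argsort(probabilities)[-k:][::-1]
    return [(int(i), float(probabilities[i])) for i in top_indices]

# Hypothetical probabilities for a 6-class classifier (values sum to 1)
probs = [0.05, 0.10, 0.50, 0.20, 0.03, 0.12]

best_class = int(np.argmax(probs))   # index of the single most likely class
top5 = top_k_classes(probs, k=5)     # the five most likely classes, best first

print(best_class)
print(top5)
```
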

**Detection** is used when we want to detect objects that appear at different places in an image. Detection applications generally draw bounding boxes around the detected objects. An application that finds objects and their locations in an image is a Detection application. It usually also has some form of classification that determines the class of the object in a given bounding box. Each bounding box comes with a confidence value, so we can apply a threshold and throw out low-confidence detections.

**Segmentation** classifies sections of an image by classifying each and every pixel. These networks are often post-processed in some way to avoid phantom classes here and there. 

There are also *subsets of segmentation*: Semantic Segmentation and Instance Segmentation. Semantic Segmentation considers all instances of a class as one, while Instance Segmentation considers separate instances of a class as separate objects.
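To make "classifying each and every pixel" concrete, here is a small sketch of semantic segmentation post-processing. It assumes the network returns one score map per class in `(classes, height, width)` order; the function name `semantic_labels` and the scores are invented for the example.

```python
import numpy as np

def semantic_labels(class_scores):
    """Turn score maps of shape (classes, height, width) into a label map (height, width)."""
    # For every pixel, pick the class whose score is highest
    return np.argmax(class_scores, axis=0)

# Hypothetical scores for 3 classes on a 2x2 image
scores = np.array([
    [[0.90, 0.10], [0.20, 0.30]],  # class 0
    [[0.05, 0.80], [0.10, 0.30]],  # class 1
    [[0.05, 0.10], [0.70, 0.40]],  # class 2
])

labels = semantic_labels(scores)
print(labels)
```

Both bottom pixels end up with the same label whether they belong to one object or two, which is the semantic view. Separating them into individual objects would need an instance segmentation model.
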

Here is a useful [Medium post](https://medium.com/analytics-vidhya/image-classification-vs-object-detection-vs-image-segmentation-f36db85fe81) if you want to go a little further on types of computer vision models.

## Case Studies in Computer Vision

[Youtube Video](https://youtu.be/7mUaovlA4aQ)

This part covers the SSD, ResNet and MobileNet neural network architectures.

SSD is an object detection network that combined classification with object detection through the use of default bounding boxes at different network levels.

ResNet utilized residual layers to "skip" over sections of layers, helping to avoid the vanishing gradient problem with very deep neural networks.

MobileNet utilized layers like 1x1 convolutions to help cut down on computational complexity and network size, leading to fast inference without substantial decrease in accuracy.
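A quick parameter count gives a feel for the savings. MobileNet pairs a per-channel (depthwise) convolution with a 1x1 (pointwise) convolution. The sketch below compares that pair with a standard convolution for one hypothetical layer, ignoring biases; the layer sizes and function names are chosen only for illustration.

```python
def standard_conv_params(kernel_size, in_channels, out_channels):
    """Weights in a standard convolution: every output channel sees every input channel."""
    return kernel_size * kernel_size * in_channels * out_channels

def depthwise_separable_params(kernel_size, in_channels, out_channels):
    """Weights in a depthwise convolution followed by a 1x1 (pointwise) convolution."""
    depthwise = kernel_size * kernel_size * in_channels  # one filter per input channel
    pointwise = in_channels * out_channels               # 1x1 convolution mixes channels
    return depthwise + pointwise

# Hypothetical layer: 3x3 kernel, 128 input channels, 256 output channels
standard = standard_conv_params(3, 128, 256)
separable = depthwise_separable_params(3, 128, 256)

print(standard, separable, round(standard / separable, 2))
```
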

One additional note here on the ResNet architecture - the paper itself actually theorizes that very deep neural networks have convergence issues due to exponentially lower convergence rates, as opposed to just the vanishing gradient problem. The vanishing gradient problem is also thought to be helped by the use of normalization of inputs to each different layer, which is not specific to ResNet. The ResNet architecture itself, at multiple different numbers of layers, was shown to converge faster during training than a "plain" network without the residual layers.    
    
[Single Shot Multibox Detector (SSD)](https://arxiv.org/abs/1512.02325) performs classification operations on different convolutional layer feature maps using default bounding boxes.

The "residual learning" in the [ResNet](https://arxiv.org/pdf/1512.03385.pdf) model architecture is achieved by using "skip" layers that pass information forward by a couple of layers.
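The skip connection can be sketched in a few lines of NumPy. This is a simplified illustration, not the ResNet implementation: it uses two plain matrix multiplications in place of convolutions, and the names `residual_block`, `w1` and `w2` are assumptions.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Compute relu(F(x) + x), where F is two linear layers with a ReLU in between."""
    f = relu(x @ w1) @ w2   # the "residual" the layers have to learn
    return relu(f + x)      # the skip connection adds the input back

x = np.array([[1.0, -2.0, 3.0]])
zero = np.zeros((3, 3))

# With all-zero weights the residual is zero, so the input passes straight
# through the skip connection (up to the final ReLU)
out = residual_block(x, zero, zero)
print(out)
```

Because the block only needs to learn a correction to its input, stacking many such blocks does not force each one to re-learn the identity mapping.
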

#### Further Research

Getting used to reading research papers is a key skill to build when working with AI and Computer Vision. Below, we can find the original research papers on some of the networks we discussed in this section.
* [SSD](https://arxiv.org/abs/1512.02325)
* [YOLO](https://arxiv.org/abs/1506.02640)
* [Faster RCNN](https://arxiv.org/abs/1506.01497)
* [MobileNet](https://arxiv.org/abs/1704.04861)
* [ResNet](https://arxiv.org/abs/1512.03385)
* [Inception](https://arxiv.org/pdf/1409.4842.pdf)

## Available Pre-Trained Models in OpenVINO™

[Youtube Video](https://youtu.be/SoTH1jr3-HA)

Pre-Trained Models are models that have already been trained. We can use them for problems such as Text Detection, Pose Detection, Roadside Segmentation and Pedestrian Detection.

There are two kinds of Pre-Trained Models in the OpenVINO toolkit: the Public Model Set and the Free Model Set.
- **Public Model Set** includes models that have not been converted with the Model Optimizer, so they can be fine-tuned or converted with the Model Optimizer.
- **Free Model Set** includes models that have already been converted with the Model Optimizer into Intermediate Representation, so we can use them directly with the Inference Engine.

We can fetch the models using the OpenVINO Model Downloader tool.

See the [Pretrained Models page](https://software.intel.com/en-us/openvino-toolkit/documentation/pretrained-models) of the Intel® Distribution of OpenVINO™ toolkit for details.

## Exercise: Loading Pre-Trained Models

In this exercise we download and load some of the pre-trained models available in the OpenVINO toolkit.

First, we look at the [Pre-Trained Models page](https://software.intel.com/en-us/openvino-toolkit/documentation/pretrained-models), and then at the [full list of available models](https://docs.openvinotoolkit.org/latest/_models_intel_index.html).

**NOTE:** It is not necessary to download all of the models. We can download what we need.

**HINT:** We can use the `-h` argument with the Model Downloader tool whenever we need to check which arguments are available for downloading a specific model and its precisions.

**HINT:** We can use the `-o` argument if we want to download the model into a different directory instead of the default location.

We can go to the default **Model Downloader location** with the command below:

`cd /opt/intel/openvino/deployment_tools/open_model_zoo/tools/downloader/`

### Task 1 - We find the right models

We are responsible for the tasks below:
- Human pose estimation
- Text detection
- Determining car type & color

We can find them on the [Pre-Trained Models page](https://software.intel.com/en-us/openvino-toolkit/documentation/pretrained-models).

### Task 2 - We download the models

After we determine the models according to the tasks above, we download the models according to the precision levels below:
- Human pose estimation: All precision levels
- Text detection: FP16 only
- Determining car type & color: INT8 only

### Task 3 - We verify the downloads

We can verify our downloads by checking the download location.

We can use the `ls` command to check it.

`ls /opt/intel/openvino/deployment_tools/open_model_zoo/tools/downloader/intel/`

**NOTE:** The path we see here is the default path. If we use the `-o` argument, the path can be different.

## Solution: Loading Pre-Trained Models

[Youtube Video](https://youtu.be/QMfTUdWFsGw)

**NOTE:** OpenVINO 2019R3 used the `INT8` precision name, but it was renamed to `FP32-INT8` in 2020R1.

Working locally with 2020R1, the download doesn't fail if we specify `INT8`, but the related download directory will be empty. Therefore we need to specify `FP32-INT8` as the `--precisions` argument if we use the 2020R1 version.

### Choosing Models
We can choose the models below for our tasks:
- Human pose estimation: [human-pose-estimation-0001](https://docs.openvinotoolkit.org/latest/_models_intel_human_pose_estimation_0001_description_human_pose_estimation_0001.html)
- Text detection: [text-detection-0004](http://docs.openvinotoolkit.org/latest/_models_intel_text_detection_0004_description_text_detection_0004.html)
- Determining car type & color: [vehicle-attributes-recognition-barrier-0039](https://docs.openvinotoolkit.org/latest/_models_intel_vehicle_attributes_recognition_barrier_0039_description_vehicle_attributes_recognition_barrier_0039.html)

### Downloading Models
We can go to the default **Model Downloader location** with the command below:

`cd /opt/intel/openvino/deployment_tools/open_model_zoo/tools/downloader/`

We can see the `downloader.py` file there, and we can run it with the `-h` argument to see the available arguments.

`sudo ./downloader.py -h`  

**Note that** `downloader.py` runs with the interpreter named in its shebang line (`python3`). When I tried to run the file with the interpreter of an Anaconda environment, I got errors.

In this exercise we use the `--name` argument for the model name and the `--precisions` argument when we need only specific precisions.

**Note:** Running `downloader.py` without arguments will download ***all*** available pre-trained models. This means that we download gigabytes of files if we do it.

**Note:** If we are working locally, we can use the commands below directly. If we want to download the models to a different location, we can add `-o ~/openvino-models` at the end of each command to download them into our home directory.

#### Downloading Human Pose Model
`sudo ./downloader.py --name human-pose-estimation-0001`

#### Downloading Text Detection Model
`sudo ./downloader.py --name text-detection-0004 --precisions FP16`

#### Downloading Car Metadata Model
`sudo ./downloader.py --name vehicle-attributes-recognition-barrier-0039 --precisions FP32-INT8`

### Verifying Downloads
The downloader tells us the directories that hold our models, but to verify them, we can use the `ls` command.

We can check the directories in the download location. The directory of each of our three models must include a subdirectory for each precision, with the respective `.bin` and `.xml` files of the model.
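The same check can be scripted. The sketch below assumes the layout described above (one subdirectory per precision, each holding a matching `.xml` and `.bin` pair). The helper `list_ir_precisions` is a hypothetical name, and the demo builds a fake model directory in a temporary location rather than touching a real download.

```python
import tempfile
from pathlib import Path

def list_ir_precisions(model_dir):
    """Return {precision: True/False} telling whether each precision has an .xml/.bin pair."""
    result = {}
    for precision_dir in sorted(p for p in Path(model_dir).iterdir() if p.is_dir()):
        xml_names = {f.stem for f in precision_dir.glob("*.xml")}
        bin_names = {f.stem for f in precision_dir.glob("*.bin")}
        # The model is usable only if an .xml file has a .bin file with the same name
        result[precision_dir.name] = bool(xml_names & bin_names)
    return result

# Demo on a fake layout: FP16 is complete, INT8 is an empty directory
with tempfile.TemporaryDirectory() as tmp:
    fake_model = Path(tmp) / "text-detection-0004"
    (fake_model / "FP16").mkdir(parents=True)
    (fake_model / "FP16" / "text-detection-0004.xml").touch()
    (fake_model / "FP16" / "text-detection-0004.bin").touch()
    (fake_model / "INT8").mkdir()
    demo = list_ir_precisions(fake_model)

print(demo)
```

An empty precision directory, like the one that appears when `INT8` is requested on 2020R1, shows up as `False`.
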

## Optimizations on the Pre-Trained Models

[Youtube Video](https://youtu.be/nKvZYnOnWm4)

The precisions that we used in the previous exercise are related to floating point values. Lower precision means that the model uses less memory and fewer compute resources. However, there are some trade-offs with accuracy when using lower precision.
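A small NumPy experiment illustrates the trade-off. It stores the same made-up "weights" in FP32 and FP16 and compares memory use and rounding error. This only illustrates the number formats and is not how OpenVINO converts a model.

```python
import numpy as np

# 1000 hypothetical weights between -1 and 1, stored as 32-bit floats
weights_fp32 = np.linspace(-1.0, 1.0, 1000, dtype=np.float32)
# The same weights rounded to 16-bit floats
weights_fp16 = weights_fp32.astype(np.float16)

size_fp32 = weights_fp32.nbytes   # 4 bytes per value
size_fp16 = weights_fp16.nbytes   # 2 bytes per value

# Largest rounding error introduced by the lower precision
max_error = float(np.max(np.abs(weights_fp32 - weights_fp16.astype(np.float32))))

print(size_fp32, size_fp16, max_error)
```
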

We can also use fusion, in which multiple layers are fused into a single operation. Some of these optimizations are available through the Model Optimizer in OpenVINO. We will see these optimization techniques later in this tutorial.
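As one hedged illustration of fusion, a linear layer followed by a per-output scale and shift (such as batch normalization at inference time) can be folded into a single linear layer with adjusted weights and bias. The sketch uses random stand-in values and is not a description of what the Model Optimizer does internally.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))             # batch of 4 inputs with 8 features
w = rng.normal(size=(8, 3))             # linear layer weights
b = rng.normal(size=3)                  # linear layer bias
scale = rng.uniform(0.5, 1.5, size=3)   # per-output scale of the second layer
shift = rng.normal(size=3)              # per-output shift of the second layer

# Two separate operations: linear layer, then scale and shift
unfused = (x @ w + b) * scale + shift

# One fused operation: fold the scale and shift into the weights and bias
w_fused = w * scale
b_fused = b * scale + shift
fused = x @ w_fused + b_fused

print(np.allclose(unfused, fused))
```

The fused layer gives the same result with one operation instead of two, which is where the speed-up comes from.
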

## Choosing the Right Model for Your App

[Youtube Video](https://youtu.be/CWC195DzgAI)

We can choose the right model according to the criteria we specify for our application.

These criteria can be:
- Performance
- Speed

For example, we can use the `Pedestrian Detection` pre-trained model on our waiter robot.

## Pre-processing Inputs

[Youtube Video](https://youtu.be/E9huKos96Uk)

The pre-processing that a model needs is described in its documentation. If we use a pre-trained model from the OpenVINO Model Zoo, we can find it in the OpenVINO Toolkit documentation.

We also need to be careful when using a library to load an image or frame. We use OpenCV in this tutorial. OpenCV reads images and frames in BGR format, so they must be converted to RGB before being fed to networks that were trained on RGB images.

Besides channel order, our Computer Vision models may need other pre-processing operations, such as:
- Image size (input images must match the size the model expects)
- Order of the image data (color channels may come before or after the height and width dimensions)
- Pixel normalization (models may require pixel values between 0 and 1)

In OpenCV, we can use `cv2.imread` to read images in BGR format, and `cv2.resize` to resize them. The images are held as NumPy arrays, so we can also use array methods like `.transpose` and `.reshape` on them to switch the order of the dimensions.
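The non-resize steps can be sketched with NumPy alone. The helper below is a hypothetical example (`to_network_layout` is not an OpenVINO or OpenCV function). It takes an array shaped like the output of `cv2.imread` and optionally swaps BGR to RGB and scales pixels to the 0 to 1 range. Whether a model needs those two steps depends on its documentation.

```python
import numpy as np

def to_network_layout(image_bgr, to_rgb=False, normalize=False):
    """Convert an OpenCV-style image (H, W, C in BGR) into a (1, C, H, W) batch."""
    image = image_bgr
    if to_rgb:
        image = image[:, :, ::-1]                  # reverse the channel axis: BGR -> RGB
    image = image.transpose((2, 0, 1))             # HWC -> CHW
    image = image[np.newaxis, ...]                 # add a batch dimension of 1
    if normalize:
        image = image.astype(np.float32) / 255.0   # scale pixel values to [0, 1]
    return image

# Stand-in for an image read by cv2.imread: a 4x6 image that is pure blue in BGR
frame = np.zeros((4, 6, 3), dtype=np.uint8)
frame[:, :, 0] = 255

batch = to_network_layout(frame, to_rgb=True, normalize=True)
print(batch.shape)
```
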

## Exercise: Pre-processing Inputs

We have a few pre-trained models from the previous exercise. Now it is time to preprocess the inputs so that they match what each model expects.

The documentation for our three models is here:
- Human pose estimation: [human-pose-estimation-0001](https://docs.openvinotoolkit.org/latest/_models_intel_human_pose_estimation_0001_description_human_pose_estimation_0001.html)
- Text detection: [text-detection-0004](http://docs.openvinotoolkit.org/latest/_models_intel_text_detection_0004_description_text_detection_0004.html)
- Determining car type & color: [vehicle-attributes-recognition-barrier-0039](https://docs.openvinotoolkit.org/latest/_models_intel_vehicle_attributes_recognition_barrier_0039_description_vehicle_attributes_recognition_barrier_0039.html)

We can use the **Inputs** section of each model's documentation.

The Inputs sections include:
- the input shape,
- the order of the shape (e.g. color channels first or last),
- the order of the color channels

Our task is to code three functions within `preprocess_inputs.py`, one for each of the three models.

We have potential sample images for the models.
- [For Human pose estimation](https://cdn.pixabay.com/photo/2014/02/15/12/22/figure-skater-266512_960_720.jpg)
- [For Text detection](https://cdn.pixabay.com/photo/2016/11/21/15/13/blue-1845901_960_720.jpg)
- [For Determining car type & color](https://cdn.pixabay.com/photo/2015/05/15/14/46/bmw-768688_960_720.jpg)

**Note:** In this exercise we assume that images are loaded using OpenCV, so they arrive in BGR format with Height, Width, Channel order. We pre-process them based on this information.

In [None]:
import cv2
import numpy as np


def pose_estimation(input_image):
    '''
    Given some input image, preprocess the image so that
    it can be used with the related pose estimation model
    you downloaded previously. You can use cv2.resize()
    to resize the image.
    '''
    preprocessed_image = np.copy(input_image)

    # TODO: Preprocess the image for the pose estimation model

    return preprocessed_image


def text_detection(input_image):
    '''
    Given some input image, preprocess the image so that
    it can be used with the related text detection model
    you downloaded previously. You can use cv2.resize()
    to resize the image.
    '''
    preprocessed_image = np.copy(input_image)

    # TODO: Preprocess the image for the text detection model

    return preprocessed_image


def car_meta(input_image):
    '''
    Given some input image, preprocess the image so that
    it can be used with the related car metadata model
    you downloaded previously. You can use cv2.resize()
    to resize the image.
    '''
    preprocessed_image = np.copy(input_image)

    # TODO: Preprocess the image for the car metadata model

    return preprocessed_image


## Solution: Pre-processing Inputs

[Youtube Video](https://youtu.be/erNsB5nXgW4)

All of the models need the same preprocessing, except for the height and width of the network input. We read the images with `cv2.imread`, so they come in the BGR format. Because our models expect BGR inputs, we don't need to convert them to RGB. However, we do need to rearrange the images, because they come in `height x width x channels` order while our networks expect channels first, along with an extra dimension at the start for batch size.

So we can write one preprocessing function and call it from the per-model functions.

In the preprocessing function we:
1. Resize the image,
2. Move the channels from last to first,
3. Add an extra dimension of `1` to the start

In [None]:
import cv2
import numpy as np

def preprocessing(input_image, height, width):
    image = cv2.resize(input_image, (width,height)) # width, height
    image = image.transpose((2,0,1))
    image = image.reshape(1, 3, height, width)
    
    return image


def pose_estimation(input_image):
    '''
    Given some input image, preprocess the image so that
    it can be used with the related pose estimation model
    you downloaded previously. You can use cv2.resize()
    to resize the image.
    '''
    preprocessed_image = np.copy(input_image)

    # TODO: Preprocess the image for the pose estimation model
    preprocessed_image = preprocessing(preprocessed_image, 256, 456)

    return preprocessed_image


def text_detection(input_image):
    '''
    Given some input image, preprocess the image so that
    it can be used with the related text detection model
    you downloaded previously. You can use cv2.resize()
    to resize the image.
    '''
    preprocessed_image = np.copy(input_image)

    # TODO: Preprocess the image for the text detection model
    preprocessed_image = preprocessing(preprocessed_image, 768, 1280)

    return preprocessed_image


def car_meta(input_image):
    '''
    Given some input image, preprocess the image so that
    it can be used with the related car metadata model
    you downloaded previously. You can use cv2.resize()
    to resize the image.
    '''
    preprocessed_image = np.copy(input_image)

    # TODO: Preprocess the image for the car metadata model
    preprocessed_image = preprocessing(preprocessed_image, 72, 72)

    return preprocessed_image


**!!! Testing section can be added here.**

## Handling Network Outputs

[Youtube Video](https://youtu.be/pREe4P5yygM)

We saw three types of computer vision model outputs:
- Classes
- Bounding boxes
- Semantic labels

**Classification networks** typically output an array with the softmax probabilities of the classes. Applying the `argmax` function to those probabilities gives the index of the most likely class, so we end up with an array of class predictions.

**Bounding boxes**: detection models generally output multiple bounding boxes per image. Each box has a class and a confidence. We can ignore boxes that have low confidence.

Each bounding box has four values for its location:
- either the X, Y coordinates of two opposite corners of the box,
- or the X, Y coordinates of one corner plus the height and width of the box
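Converting between the two location formats is simple arithmetic. The sketch below assumes the single corner is the top-left one and that Y grows downward, as is usual for images. Both function names are invented for the example.

```python
def corners_to_xywh(xmin, ymin, xmax, ymax):
    """Two opposite corners -> top-left corner plus width and height."""
    return xmin, ymin, xmax - xmin, ymax - ymin

def xywh_to_corners(x, y, width, height):
    """Top-left corner plus width and height -> two opposite corners."""
    return x, y, x + width, y + height

box_xywh = corners_to_xywh(10, 20, 50, 80)
box_corners = xywh_to_corners(*box_xywh)

print(box_xywh, box_corners)
```
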

Further Research for [SSD and its output](https://towardsdatascience.com/understanding-ssd-multibox-real-time-object-detection-in-deep-learning-495ef744fab)

**Semantic labels** give the class of each pixel. Sometimes the output is flattened, or has a different size than the original image. In these situations we need to reshape or resize it so that it maps directly back to the input.
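Both fixes can be sketched with NumPy, assuming the network returns one class id per pixel as a flat array. The helper name `restore_label_map` is made up. It uses nearest-neighbour indexing so that class ids are copied rather than blended. With OpenCV, `cv2.resize` with `interpolation=cv2.INTER_NEAREST` plays the same role.

```python
import numpy as np

def restore_label_map(flat_labels, out_height, out_width, in_height, in_width):
    """Reshape a flattened label output and resize it to the input image size."""
    labels = np.asarray(flat_labels).reshape(out_height, out_width)
    # For every input pixel, find the output row and column it falls into
    rows = np.arange(in_height) * out_height // in_height
    cols = np.arange(in_width) * out_width // in_width
    return labels[rows[:, None], cols]

# Hypothetical 2x2 network output for a 4x4 input image
flat = [0, 1, 2, 3]
restored = restore_label_map(flat, 2, 2, 4, 4)
print(restored)
```
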

Further Research for [Semantic Segmentation](https://thegradient.pub/semantic-segmentation/)

#### Example
In a network like SSD, the output is a series of bounding boxes for potential object detections, each with a confidence value that says how confident the model is about that particular detection.

Let's assume we have an output array with bounding box predictions. 

Each entry of the array contains, in order:
- The class of the object,
- The confidence,
- Two corners (xmin, ymin, xmax, ymax)

We can extract the bounding boxes from a given network output using the script below:

In [None]:
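def extract_boxes(output, conf_threshold, width, height):
    boxes = []
    for box in output:
        # box = [class, confidence, xmin, ymin, xmax, ymax]
        if box[1] > conf_threshold:
            # Scale the relative coordinates to pixel positions
            xmin = int(box[2] * width)
            ymin = int(box[3] * height)
            xmax = int(box[4] * width)
            ymax = int(box[5] * height)
            boxes.append((box[0], box[1], xmin, ymin, xmax, ymax))
    return boxes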

## Running Our First Edge App

[Youtube Video](https://youtu.be/FANZZXUqGac)

Now we are familiar with pre-trained models, with preprocessing their inputs and with handling their outputs. In the next exercise, we'll load a pre-trained model into the Inference Engine and call the preprocessing and output handling functions in the appropriate locations. That way we'll run our first edge application.

We are still abstracting away some of the steps of dealing with the Inference Engine API. We'll dive deeper into it later, but for now we can try to understand the concept, which is implemented similarly across different models.

## Exercise: Deploy An App at the Edge

We downloaded some pre-trained models and preprocessed the input images. Now we deploy our first application.

There is a lot of code behind the scenes here, so in this exercise we focus on understanding the concept. We'll go deeper in later parts of the tutorial.

We don't use the Model Optimizer in this exercise; instead we use pre-trained models and load them into the Inference Engine.

If we want a sneak preview of some of the code that interfaces with the Inference Engine, we can check out `inference.py` below.

In [None]:
'''
Contains code for working with the Inference Engine.
You'll learn how to implement this code and more in
the related lesson on the topic.
'''

import os
import sys
import logging as log
from openvino.inference_engine import IENetwork, IECore

class Network:
    '''
    Load and store information for working with the Inference Engine,
    and any loaded models.
    '''

    def __init__(self):
        self.plugin = None
        self.input_blob = None
        self.exec_network = None


    def load_model(self, model, device="CPU", cpu_extension=None):
        '''
        Load the model given IR files.
        Defaults to CPU as device for use in the workspace.
        Synchronous requests made within.
        '''
        model_xml = model
        model_bin = os.path.splitext(model_xml)[0] + ".bin"

        # Initialize the plugin
        self.plugin = IECore()

        # Add a CPU extension, if applicable
        if cpu_extension and "CPU" in device:
            self.plugin.add_extension(cpu_extension, device)

        # Read the IR as a IENetwork
        network = IENetwork(model=model_xml, weights=model_bin)

        # Load the IENetwork into the plugin
        self.exec_network = self.plugin.load_network(network, device)

        # Get the input layer
        self.input_blob = next(iter(network.inputs))

        # Return the input shape (to determine preprocessing)
        return network.inputs[self.input_blob].shape


    def sync_inference(self, image):
        '''
        Makes a synchronous inference request, given an input image.
        '''
        self.exec_network.infer({self.input_blob: image})
        return


    def extract_output(self):
        '''
        Returns a list of the results for the output layer of the network.
        '''
        return self.exec_network.requests[0].outputs

We'll work on the `handle_models.py` file below.

In [None]:
import cv2
import numpy as np


def handle_pose(output, input_shape):
    '''
    Handles the output of the Pose Estimation model.
    Returns ONLY the keypoint heatmaps, and not the Part Affinity Fields.
    '''
    # TODO 1: Extract only the second blob output (keypoint heatmaps)
    heatmaps = output['Mconv7_stage2_L2']
    #print(heatmaps.shape)
    # TODO 2: Resize the heatmap back to the size of the input
    out_heatmap = np.zeros([heatmaps.shape[1], input_shape[0], 
                            input_shape[1]])
    # Iterate through and re-size each heatmap
    for h in range(len(heatmaps[0])):
        out_heatmap[h] = cv2.resize(heatmaps[0][h], 
                                   input_shape[0:2][::-1])

    return out_heatmap


def handle_text(output, input_shape):
    '''
    Handles the output of the Text Detection model.
    Returns ONLY the text/no text classification of each pixel,
        and not the linkage between pixels and their neighbors.
    '''
    # TODO 1: Extract only the first blob output (text/no text classification)
    text_classes = output['model/segm_logits/add']
    # TODO 2: Resize this output back to the size of the input
    out_text = np.empty([text_classes.shape[1], input_shape[0], input_shape[1]])
    for t in range(len(text_classes[0])):
        out_text[t] = cv2.resize(text_classes[0][t], input_shape[0:2][::-1])

    return out_text


def handle_car(output, input_shape):
    '''
    Handles the output of the Car Metadata model.
    Returns two integers: the argmax of each softmax output.
    The first is for color, and the second for type.
    '''
    # TODO 1: Get the argmax of the "color" output
    #print(output.keys())
    color = output['color'].flatten()
    car_type = output['type'].flatten()
    #print(color.shape)
    #print(car_type.shape)
    color_class = np.argmax(color)
    # TODO 2: Get the argmax of the "type" output
    type_class = np.argmax(car_type)

    return color_class, type_class


def handle_output(model_type):
    '''
    Returns the related function to handle an output,
        based on the model_type being used.
    '''
    if model_type == "POSE":
        return handle_pose
    elif model_type == "TEXT":
        return handle_text
    elif model_type == "CAR_META":
        return handle_car
    else:
        return None


'''
The below function is carried over from the previous exercise.
You just need to call it appropriately in `app.py` to preprocess
the input image.
'''
def preprocessing(input_image, height, width):
    '''
    Given an input image, height and width:
    - Resize to width and height
    - Transpose the final "channel" dimension to be first
    - Reshape the image to add a "batch" of 1 at the start 
    '''
    image = np.copy(input_image)
    image = cv2.resize(image, (width, height))
    image = image.transpose((2,0,1))
    image = image.reshape(1, 3, height, width)

    return image

Finally, we call these functions from our edge app, `app.py`.

In [None]:
import argparse
import cv2
import numpy as np

from handle_models import handle_output, preprocessing
from inference import Network


CAR_COLORS = ["white", "gray", "yellow", "red", "green", "blue", "black"]
CAR_TYPES = ["car", "bus", "truck", "van"]


def get_args():
    '''
    Gets the arguments from the command line.
    '''

    parser = argparse.ArgumentParser("Basic Edge App with Inference Engine")
    # -- Create the descriptions for the commands

    c_desc = "CPU extension file location, if applicable"
    d_desc = "Device, if not CPU (GPU, FPGA, MYRIAD)"
    i_desc = "The location of the input image"
    m_desc = "The location of the model XML file"
    t_desc = "The type of model: POSE, TEXT or CAR_META"

    # -- Add required and optional groups
    parser._action_groups.pop()
    required = parser.add_argument_group('required arguments')
    optional = parser.add_argument_group('optional arguments')

    # -- Create the arguments
    required.add_argument("-i", help=i_desc, required=True)
    required.add_argument("-m", help=m_desc, required=True)
    required.add_argument("-t", help=t_desc, required=True)
    optional.add_argument("-c", help=c_desc, default=None)
    optional.add_argument("-d", help=d_desc, default="CPU")
    args = parser.parse_args()

    return args


def get_mask(processed_output):
    '''
    Given the processed output for a semantic mask,
    returns a mask that can be combined with the original image.
    '''
    # Create an empty array for other color channels of mask
    empty = np.zeros(processed_output.shape)
    # Stack to make a Green mask where text detected
    mask = np.dstack((empty, processed_output, empty))

    return mask


def create_output_image(model_type, image, output):
    '''
    Using the model type, input image, and processed output,
    creates an output image showing the result of inference.
    '''
    if model_type == "POSE":
        # Remove final part of output not used for heatmaps
        output = output[:-1]
        # Get only pose detections above 0.5 confidence, set to 255
        for c in range(len(output)):
            output[c] = np.where(output[c]>0.5, 255, 0)
        # Sum along the "class" axis
        output = np.sum(output, axis=0)
        # Get semantic mask
        pose_mask = get_mask(output)
        # Combine with original image
        image = image + pose_mask
        return image
    elif model_type == "TEXT":
        # Get only text detections above 0.5 confidence, set to 255
        output = np.where(output[1]>0.5, 255, 0)
        # Get semantic mask
        text_mask = get_mask(output)
        # Add the mask to the image
        image = image + text_mask
        return image
    elif model_type == "CAR_META":
        # Get the color and car type from their lists
        color = CAR_COLORS[output[0]]
        car_type = CAR_TYPES[output[1]]
        # Scale the output text by the image shape
        scaler = max(int(image.shape[0] / 1000), 1)
        # Write the text of color and type onto the image
        image = cv2.putText(image, 
            "Color: {}, Type: {}".format(color, car_type), 
            (50 * scaler, 100 * scaler), cv2.FONT_HERSHEY_SIMPLEX, 
            2 * scaler, (255, 255, 255), 3 * scaler)
        return image
    else:
        print("Unknown model type, unable to create output image.")
        return image


def perform_inference(args):
    '''
    Performs inference on an input image, given a model.
    '''
    # Create a Network for using the Inference Engine
    inference_network = Network()
    # Load the model in the network, and obtain its input shape
    n, c, h, w = inference_network.load_model(args.m, args.d, args.c)

    # Read the input image
    image = cv2.imread(args.i)

    ### TODO: Preprocess the input image
    preprocessed_image = preprocessing(image, h, w)

    # Perform synchronous inference on the image
    inference_network.sync_inference(preprocessed_image)

    # Obtain the output of the inference request
    output = inference_network.extract_output()

    ### TODO: Handle the output of the network, based on args.t
    ### Note: This will require using `handle_output` to get the correct
    ###       function, and then feeding the output to that function.
    output_func = handle_output(args.t)
    processed_output = output_func(output, image.shape)

    # Create an output image based on network
    try:
        output_image = create_output_image(args.t, image, processed_output)
        print("Success")
    except Exception:
        output_image = image
        print("Error!")

    # Save down the resulting image
    cv2.imwrite("outputs/{}-output.png".format(args.t), output_image)


def main():
    args = get_args()
    perform_inference(args)


if __name__ == "__main__":
    main()


TODOs

## Solution: Deploy an App at the Edge

[Youtube Video](https://youtu.be/X9yI7U2Rn00)

## Recap

[Youtube Video](https://youtu.be/o-fWs0BwbyM)

## Lesson Glossary

### Edge Application

Applications that do almost all of their processing at the Edge are called Edge Applications.

### OpenVINO™ Toolkit

### Pre-Trained Model

### Transfer Learning

### Image Classification

### Object Detection

### Semantic Segmentation

### Instance Segmentation

### SSD

### YOLO

### Faster R-CNN

### MobileNet

### ResNet

### Inception

### Inference Precision