# Run the cell below to see the examples
<img src="assets/run-this.gif" style="height:80px; margin-left:0;"/>

In [1]:
from demoTools.catalog import DemoCatalog
cat = DemoCatalog('demoTools/OpenVINO_IoT_Smart_Video_Workshop.json')
cat.ShowRepositoryControls()
cat.ShowCatalog()
%autosave 0

## List of IoT examples for the Intel® Distribution of OpenVINO™ toolkit in Python

This section includes advanced examples using the OpenVINO™ toolkit:

- [Multiple models usage example](https://github.com/intel-iot-devkit/smart-video-workshop/blob/master/advanced-video-analytics/multiple_models.md)
- [TensorFlow example](https://github.com/intel-iot-devkit/smart-video-workshop/blob/master/advanced-video-analytics/tensor_flow.md)

<a href='advanced-video-analytics/multiple_models.md' target='_blank' class='big-jupyter-button'>Go to Lab: advanced-video-analytics/multiple_models.md</a>
# Deep Learning Tutorial
## MNIST Database - Handwritten digits (0-9)

In this tutorial we will use Python* to implement a [Convolutional Neural Network](https://en.wikipedia.org/wiki/Convolutional_neural_network), a simplified version of [LeNet](https://en.wikipedia.org/wiki/Convolutional_neural_network#LeNet-5), that recognizes handwritten digits. A project like this one, using the MNIST dataset, is considered the "Hello World" of Machine Learning.

We will use [Keras*](https://keras.io), [TensorFlow*](https://www.tensorflow.org) and the [MNIST database](https://en.wikipedia.org/wiki/MNIST_database).

According to the description on their website, *"**Keras** is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano*. **It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research.**"*

We will use TensorFlow as the backend for Keras. TensorFlow is an open source software library for high performance numerical computation.

The MNIST database is a large database of handwritten digits that is commonly used for training various image processing systems. The MNIST database is also available as a Keras dataset, with 60k 28x28 images of the 10 digits along with a test set of 10k images, so it is very easy to import and use in our code.

One good visual and interactive reference for what we are developing can be found [here](http://scs.ryerson.ca/~aharley/vis/conv/). The basic difference between our code and this interactive sample is the number and size of the convolutional and fully-connected layers (LeNet uses two of each; we will use a single one of each to reduce training time). We also adjusted the layer sizes to balance accuracy and training time. With this setup we reached 98.54% accuracy with less than 2 minutes of training time on an Intel® Core™ processor.

This code can be optimized in several ways to increase accuracy, and we invite you to explore this later by changing the number of epochs, filters, and fully-connected neurons, and by adding convolutional and fully-connected layers. You can also use [flattening](https://keras.io/layers/core/#flatten), [dropout](https://keras.io/layers/core/#dropout) and [batch normalization](https://keras.io/layers/normalization/) layers. Other optimization techniques can also be applied, so feel free to use this tutorial code as a base to explore them.

In a nutshell, the convolutional and pooling layers are responsible for extracting a set of features from the input images, and the fully-connected layers are responsible for classification.

Convolutional layers apply a set of filters to the input image to extract important features from the image. The filters are small matrices, also called image kernels, that are repeatedly applied to the input image ("sliding" the filter over the image). You may have already used such filters in traditional image processing applications such as GIMP (e.g. blurring, sharpening or embossing). [This article](http://setosa.io/ev/image-kernels/) gives a good overview of image kernels with some live experiments. Each filter generates a new image that is the input for the next layer, typically a pooling layer.
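
To make the "sliding" idea concrete, here is a minimal pure-Python sketch of applying one 3x3 kernel to a small image without padding. The function name `apply_kernel`, the 5x5 sample image and the edge-detection kernel are illustrative assumptions and are not part of the tutorial code; Keras performs the equivalent operation for all 32 filters at once.

```python
# Sketch of a "valid" 2D convolution (no padding, stride 1), in the
# cross-correlation form used by CNN layers. Names and data are illustrative.
def apply_kernel(image, kernel):
    k = len(kernel)                       # kernel is k x k
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    output = []
    for row in range(out_h):
        out_row = []
        for col in range(out_w):
            # Multiply the kernel with the image patch under it and sum up.
            total = 0
            for i in range(k):
                for j in range(k):
                    total += image[row + i][col + j] * kernel[i][j]
            out_row.append(total)
        output.append(out_row)
    return output

# Hypothetical 5x5 image: dark columns (0) on the left, bright (9) on the right.
image = [[0, 0, 9, 9, 9] for _ in range(5)]

# A simple vertical-edge kernel: responds where brightness changes left to right.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = apply_kernel(image, vertical_edge)
for row in feature_map:
    print(row)
```

The 5x5 input shrinks to a 3x3 feature map, with large values where the kernel overlaps the dark-to-bright edge and zero where the patch is uniform.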

Pooling layers reduce the spatial size of the image (downsampling), which reduces the computation in the network and also helps control overfitting.
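
As an illustration of downsampling, the following sketch implements 2x2 max pooling in plain Python. The function name `max_pool_2x2` and the sample values are assumptions for this example only.

```python
# Sketch of 2x2 max pooling with stride 2: every non-overlapping 2x2 window
# is replaced by its largest value. Names and data are illustrative.
def max_pool_2x2(image):
    pooled = []
    for row in range(0, len(image) - 1, 2):
        pooled_row = []
        for col in range(0, len(image[0]) - 1, 2):
            window = (
                image[row][col], image[row][col + 1],
                image[row + 1][col], image[row + 1][col + 1],
            )
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled

feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]

pooled = max_pool_2x2(feature_map)
print(pooled)
```

The 4x4 input becomes a 2x2 output, so each spatial dimension is halved while the strongest response in each window is kept.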

Fully connected layers are traditional Neural Network layers, in which every neuron is connected to every output of the previous layer.

## Installing the Python* libraries

To install the necessary Python libraries on Linux, you need to run:
```
sudo pip3 install keras tensorflow
```
## Run the tutorial

```
cd ~/smart-video-workshop/dl-model-training
python3 Deep_Learning_Tutorial.py
```

## How the tutorial code works

The complete code for this tutorial can be found [here](https://github.com/intel-iot-devkit/smart-video-workshop/blob/master/dl-model-training/Deep_Learning_Tutorial.py)

### Importing the necessary objects from Keras*

[Sequential Network Model](https://keras.io/models/sequential):

```Python
from keras.models import Sequential
```
[Core Layers](https://keras.io/layers/core/):
  * **Dense:** densely-connected NN layer, to be used as classification layer
  * **Flatten:** layer that flattens the output of the convolutional and pooling layers into a vector

```Python
from keras.layers import Dense, Flatten
```
[Convolutional Layers](https://keras.io/layers/convolutional/):
  * **Conv2D:** 2D convolution Layer

```Python
from keras.layers import Conv2D
```
[Pooling Layer](https://keras.io/layers/pooling/):
  * **MaxPooling2D:** Max pooling operation for spatial data

```Python
from keras.layers import MaxPooling2D
```
[Utilities](https://keras.io/utils/):
```Python
from keras.utils import np_utils
```
[MNIST Dataset](https://keras.io/datasets/):
  * Dataset of 60,000 28x28 handwritten images of the 10 digits, along with a test set of 10,000 images.

```Python  
from keras.datasets import mnist
```
### Download and load the MNIST database
This will load the MNIST Dataset into four different variables:
  * **train_dataset:** Dataset with the training data (60k elements)
  * **train_classes:** Dataset with the equivalent training classes (60k elements)
  * **test_dataset:** Dataset with test data (10k elements)
  * **test_classes:** Dataset with the equivalent test classes (10k elements)

```Python
(train_dataset, train_classes),(test_dataset, test_classes) = mnist.load_data()
```
**NOTE:** The MNIST Dataset is downloaded only on the first run on your machine.

### Adjust the datasets to TensorFlow*

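First step, we need to add an explicit channel dimension. MNIST images are already grayscale, so each 28x28 image is reshaped to 28x28x1 (a single channel, instead of the 3 channels of a color image):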
```Python
train_dataset = train_dataset.reshape(train_dataset.shape[0], 28, 28, 1)
test_dataset = test_dataset.reshape(test_dataset.shape[0], 28, 28, 1)
```

Second step, we will convert the data from uint8 to float32:
```Python
train_dataset = train_dataset.astype('float32')
test_dataset = test_dataset.astype('float32')
```
Third step, we need to normalize the pixel values from the 0-255 range to the 0-1 range, which helps the training converge faster:

```Python
train_dataset = train_dataset / 255
test_dataset = test_dataset / 255
```
Fourth step, convert the classes data from numerical labels to categorical (one-hot) vectors:
```Python
train_classes = np_utils.to_categorical(train_classes, 10)
test_classes = np_utils.to_categorical(test_classes, 10)
```
Now the data is ready to be processed by the CNN.
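
To see what the normalization and categorical conversion do to individual values, here is a small pure-Python sketch. The helpers `scale_pixels` and `to_one_hot` are hypothetical stand-ins written for this illustration; the tutorial itself relies on NumPy division and `np_utils.to_categorical` as shown above.

```python
# Illustrative stand-ins for the preprocessing steps; not the Keras functions.
def scale_pixels(pixels):
    # Map 8-bit pixel values (0-255) to floats in the 0-1 range.
    return [p / 255 for p in pixels]

def to_one_hot(label, num_classes=10):
    # Represent a class label as a vector with a single 1.0 at its index.
    vector = [0.0] * num_classes
    vector[label] = 1.0
    return vector

print(scale_pixels([0, 51, 255]))
print(to_one_hot(3))
```

A digit labeled 3 becomes a 10-element vector whose fourth entry is 1.0, which matches the 10 output neurons used by the network later in this tutorial.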

### Create our Convolutional Neural Network (CNN)

It is very simple and easy to create Neural Networks with Keras. We basically create the network, add the necessary layers, compile and execute the training.

First thing is to create a Sequential Neural Network:
```Python
cnn = Sequential()
```
We now add the input layer, a 2D convolutional layer with 32 filters, 3x3 filter kernel size, input shape of 28 x 28 x 1 (as we adjusted on the training dataset) and using Rectified Linear Unit (relu) as the activation function.

```Python
cnn.add(Conv2D(32, (3,3), input_shape = (28, 28, 1), activation = 'relu'))
```
**NOTE:** We need to provide the *input_shape* parameter only if the convolutional layer is the input layer (first CNN layer). If you add other layers later on, you don't need to use this parameter.

We add one pooling layer using the default 2x2 size. This means that this layer will reduce the input image by half in both spatial dimensions.
```Python
cnn.add(MaxPooling2D())
```
At this point, a traditional LeNet network would add another two layers, one convolutional and one pooling, basically repeating the two lines of code we just created (removing the *input_shape* from the first one). As explained before, to reduce processing time and make the network easier to understand, we decided to use just the two layers we just created.

Now we need to convert the output of the pooling layer from a matrix to a vector, to be used by the classification part of our neural network. We do that using one flattening layer:
```Python
cnn.add(Flatten())
```

Our data is now ready for the classification part of our neural network, that will be implemented using just two layers, one hidden layer and one output layer.

The first classification layer will be a fully-connected layer with 128 neurons and using rectified linear unit as the activation function.
```Python
cnn.add(Dense(units = 128, activation = 'relu'))
```

We now add another fully-connected layer that will be our output layer. Please note that this layer has 10 neurons, because we have 10 classes in our dataset. The activation function used here is Softmax.
```Python
cnn.add(Dense(units = 10, activation = 'softmax'))
```
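
As a sanity check on the network defined so far, the sketch below works out the output shape of each layer and the number of trainable parameters by hand. The helper names are assumptions made for this illustration, and the arithmetic assumes the Keras defaults used above (no padding, stride 1 for the convolution, and 2x2 pooling); calling `cnn.summary()` on the real model should report the same totals.

```python
# Hand calculation of layer shapes and parameter counts for the CNN above.
# Helper names are illustrative; the formulas assume "valid" padding, stride 1.
def conv2d_output(height, width, kernel, filters):
    return height - kernel + 1, width - kernel + 1, filters

def conv2d_params(kernel, in_channels, filters):
    # One kernel x kernel x in_channels weight block plus one bias per filter.
    return (kernel * kernel * in_channels + 1) * filters

def dense_params(inputs, units):
    # One weight per input plus one bias, for each unit.
    return (inputs + 1) * units

h, w, c = conv2d_output(28, 28, kernel=3, filters=32)  # after Conv2D
h, w = h // 2, w // 2                                   # after MaxPooling2D
flat = h * w * c                                        # after Flatten

total_params = (
    conv2d_params(3, 1, 32)
    + dense_params(flat, 128)
    + dense_params(128, 10)
)

print("Pooled shape:", (h, w, c))
print("Flattened size:", flat)
print("Trainable parameters:", total_params)
```

Almost all parameters sit in the first fully-connected layer, which is why changing the number of filters or hidden neurons has a large effect on training time.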

Before we train the model, we need to "compile" it to configure the learning process.

We will compile the CNN using *categorical crossentropy* as the loss function, *adam* as the optimizer, and accuracy as the evaluation metric that will be shown at the end of each epoch and also at the end of the training process.

```Python
cnn.compile(loss = 'categorical_crossentropy', optimizer = 'adam', metrics = ['accuracy'])
```
**NOTE:** Adam is a gradient descent optimization algorithm. A good introduction to Adam can be found [here](https://machinelearningmastery.com/adam-optimization-algorithm-for-deep-learning/).
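
To illustrate how the Softmax output and the categorical crossentropy loss fit together, here is a minimal pure-Python sketch for a single sample. The function names mirror the Keras option names but are written from scratch for this example, and the scores are made up; Keras computes the same quantities over whole batches.

```python
import math

# From-scratch sketch for one sample; not the Keras implementation.
def softmax(scores):
    # Subtract the maximum first for numerical stability.
    shifted = [s - max(scores) for s in scores]
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(one_hot_target, probabilities):
    # Only the probability assigned to the true class contributes.
    # Clip to avoid log(0).
    return -sum(
        t * math.log(max(p, 1e-7))
        for t, p in zip(one_hot_target, probabilities)
    )

# Hypothetical raw scores for a 3-class problem; class 0 scores highest.
probabilities = softmax([2.0, 1.0, 0.1])

loss_if_class_0 = categorical_crossentropy([1.0, 0.0, 0.0], probabilities)
loss_if_class_2 = categorical_crossentropy([0.0, 0.0, 1.0], probabilities)

print(probabilities)
print(loss_if_class_0, loss_if_class_2)
```

The loss is small when the network assigns a high probability to the correct class and large when it does not, which is what the optimizer tries to minimize during training.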

Our CNN is now ready to be trained.

### Training our CNN

To train the CNN we call the *fit* method. For this training we will define:
  * **Training dataset and training classes:** our training dataset and training classes adjusted at the beginning of this tutorial.
  * **batch_size:** number of samples to be used for each gradient update, in our case, 128 (default is 32).
  * **epochs:** number of epochs that will be used for the training, in our case, 5 (for time saving purposes).
  * **validation_data:** the dataset used to validate the training at the end of each epoch. Here is where we provide our test dataset.

```Python
cnn.fit(train_dataset, train_classes, batch_size = 128, epochs = 5, validation_data = (test_dataset, test_classes))
```
Training takes a few minutes, and the progress is reported on the console. Note that the loss (*loss:*) and accuracy (*acc:*) reported during each epoch are computed on the training data, so they cannot be used to judge how well the network performs on images it has not seen.
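
To relate the batch size and number of epochs to the amount of work done, the following sketch computes how many gradient updates the training performs. It is plain arithmetic on the values used in this tutorial and does not call Keras.

```python
import math

train_samples = 60000   # size of the MNIST training set
batch_size = 128        # samples per gradient update
epochs = 5              # passes over the training set

# The last batch of an epoch is smaller when the sizes do not divide evenly.
steps_per_epoch = math.ceil(train_samples / batch_size)
last_batch_size = train_samples - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * epochs

print("Gradient updates per epoch:", steps_per_epoch)
print("Samples in the last batch:", last_batch_size)
print("Gradient updates in total:", total_updates)
```

Halving the batch size roughly doubles the number of updates per epoch, which is one of the trade-offs to consider when tuning training time.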

At the end of each epoch, Keras uses the test dataset we provided to evaluate the epoch results. These values are displayed as *val_loss:* and *val_acc:*, and they are good metrics to follow on each epoch to see how the accuracy improves. In general, running more epochs increases accuracy up to a point (and increases training time), after which the network may start to overfit, but **increasing the number of epochs is just one drop in the ocean of possibilities we have to optimize our CNN.**

### Evaluating the training results

The simplest way to evaluate the training results is to use the *evaluate* method. It returns the same values we saw as *val_loss* and *val_acc* at the end of the last epoch, but now we can use them in our code. In our tutorial, we will just print the accuracy on the console:
```Python
result = cnn.evaluate(test_dataset, test_classes)
print ('Accuracy = ' + str(result[1] * 100) + "%")
```
To get detailed information about our network accuracy for each class, we can use a confusion matrix (a.k.a. error matrix). The *Scikit-learn* library can be used to do that, and more information about it can be found [here](http://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html). We will not implement the confusion matrix in this tutorial, but there are several online samples on how to create a confusion matrix using Keras and Scikit-learn and also on how to interpret the results.
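
For readers who want to see the idea before reaching for Scikit-learn, here is a minimal pure-Python sketch of a confusion matrix. The function and the small 3-class label lists are hypothetical and only illustrate the counting; with the real model you would compare the true digit labels with the most probable class predicted for each test image.

```python
# Rows are the actual classes, columns are the predicted classes.
# Function name and sample labels are illustrative only.
def confusion_matrix(true_labels, predicted_labels, num_classes):
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for actual, predicted in zip(true_labels, predicted_labels):
        matrix[actual][predicted] += 1
    return matrix

true_labels = [0, 1, 2, 2, 1, 0, 2]
predicted_labels = [0, 2, 2, 2, 1, 0, 1]

matrix = confusion_matrix(true_labels, predicted_labels, num_classes=3)
for row in matrix:
    print(row)

# Correct predictions sit on the diagonal.
correct = sum(matrix[i][i] for i in range(3))
accuracy = correct / len(true_labels)
print("Accuracy:", accuracy)
```

Off-diagonal entries show which classes are confused with each other, for example how often a 1 was classified as a 2.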

<a href='dl-model-training/README.md' target='_blank' class='big-jupyter-button'>Go to Lab: dl-model-training/README.md</a>
# Object detection with Intel® Distribution of OpenVINO™ toolkit 

This tutorial uses a Single Shot MultiBox Detector (SSD) on a trained mobilenet-ssd* model to walk you through the basic steps of using two key components of the Intel® Distribution of OpenVINO™ toolkit: Model Optimizer and Inference Engine. 

Model Optimizer is a cross-platform command-line tool that takes pre-trained deep learning models and optimizes them for performance/space using conservative topology transformations. It performs static model analysis and adjusts deep learning models for optimal execution on end-point target devices. 

Inference is the process of using a trained neural network to interpret data, such as images. This lab feeds a short video of cars, frame-by-frame, to the Inference Engine which subsequently utilizes an optimized trained neural network to detect cars. 

### Download workshop content and set directory path
#### 1. Create the workshop directory

	sudo mkdir -p /opt/intel/workshop/
	
#### 2. Change ownership of the workshop directory to the current user 

> **Note:** *replace the usernames below with your user account name*
		
	sudo chown -R username:username /opt/intel/workshop/

#### 3. Navigate to the new directory

	cd /opt/intel/workshop/

#### 4. Download and clone the workshop content to the current directory (/opt/intel/workshop/smart-video-workshop).

	git clone https://github.com/intel-iot-devkit/smart-video-workshop.git
	
#### 5. Set short path for the workshop directory

	export SV=/opt/intel/workshop/smart-video-workshop/
    
## Part 1: Optimize a deep-learning model using the Model Optimizer (MO)

In this section, you will use the Model Optimizer to convert a trained model to two Intermediate Representation (IR) files (one .bin and one .xml). The Inference Engine requires this model conversion so that it can use the IR as input and achieve optimum performance on Intel hardware.

#### 1. Create a directory to store IR files
 	
	cd $SV/object-detection/
	mkdir -p mobilenet-ssd/FP32 

#### 2. Navigate to the Intel® Distribution of OpenVINO™ toolkit install directory

	cd /opt/intel/openvino/deployment_tools/model_optimizer

#### 3. Run the Model Optimizer on the pretrained Caffe* model. This step generates one .xml file and one .bin file and places both files in the tutorial samples directory (located here: /object-detection/)

	python3 mo_caffe.py --input_model /opt/intel/openvino/deployment_tools/tools/model_downloader/object_detection/common/mobilenet-ssd/caffe/mobilenet-ssd.caffemodel -o $SV/object-detection/mobilenet-ssd/FP32 --scale 256 --mean_values [127,127,127]

> **Note:** Although this tutorial uses Single Shot MultiBox Detector (SSD) on a trained mobilenet-ssd* model, the Inference Engine is compatible with other neural network architectures, such as AlexNet* and GoogLeNet*, and the Model Optimizer accepts models from other frameworks, such as MXNet*.

<br>

The Model Optimizer converts a pretrained Caffe* model to make it compatible with the Intel Inference Engine and optimizes it for Intel® architecture. The resulting IR files are the files you would include with your C++ application to apply inference to visual data.
	
> **Note:** if you continue to train or make changes to the Caffe* model, you would then need to re-run the Model Optimizer on the updated model.

#### 4. Navigate to the tutorial sample model directory

	cd $SV/object-detection/mobilenet-ssd/FP32

#### 5. Verify creation of the optimized model files (the IR files)

	ls

You should see the following two files listed in this directory: **mobilenet-ssd.xml** and **mobilenet-ssd.bin**
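
If you prefer to check for the IR files from a script rather than with `ls`, a minimal sketch could look like the following. The helper `find_ir_pairs` is a hypothetical name introduced here, and the directory is the one created earlier in this part; the check only confirms that matching .xml and .bin files exist, not that their contents are valid.

```python
from pathlib import Path

def find_ir_pairs(directory):
    """Return the model names that have both an .xml and a .bin file."""
    directory = Path(directory)
    xml_names = {path.stem for path in directory.glob("*.xml")}
    bin_names = {path.stem for path in directory.glob("*.bin")}
    return sorted(xml_names & bin_names)

if __name__ == "__main__":
    # Directory used in this tutorial for the FP32 mobilenet-ssd IR files.
    ir_dir = "/opt/intel/workshop/smart-video-workshop/object-detection/mobilenet-ssd/FP32"
    print(find_ir_pairs(ir_dir))
```

An empty list means the Model Optimizer step did not produce a complete IR pair in that directory.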


## Part 2: Use the mobilenet-ssd* model and Inference Engine in an object detection application


#### 1. Open the sample app (main.cpp) in the editor of your choice to view the lines that call the Inference Engine.

	cd $SV/object-detection/
	gedit main.cpp

* Line 130 &#8212; loads the Inference Engine plugin for use within the application
* Line 144 &#8212; initializes the network object
* Line 210 &#8212; loads model to the plugin
* Line 228 &#8212; allocates input blobs
* Line 238 &#8212; allocates output blobs
* Line 289 &#8212; runs inference using the optimized model


#### 2. Close the source file

#### 3. Source your environmental variables

	source /opt/intel/openvino/bin/setupvars.sh

#### 4. Build the sample application with the makefile

 	cd $SV/object-detection/
	make

#### 5. Download the test video file to the object-detection folder. 
Open the link below in your favorite browser.

	https://pixabay.com/en/videos/download/video-1900_source.mp4?attachment
	
The file Cars - 1900.mp4 will be downloaded. Move that file to the $SV/object-detection folder.

	mv ~/Downloads/Cars\ -\ 1900.mp4 .

#### 6. Run the sample application to use the Inference Engine on the test video
The below command runs the application 
	 
	./tutorial1 -i $SV/object-detection/Cars\ -\ 1900.mp4 -m $SV/object-detection/mobilenet-ssd/FP32/mobilenet-ssd.xml 
 
> **Note:** If you get an error related to "undefined reference to 'google::FlagRegisterer...", try uninstalling libgflags-dev: sudo apt-get remove libgflags-dev

#### 7. Display output
For simplicity of the code and in order to put more focus on the performance number, video rendering with rectangle boxes for detected objects has been separated from main.cpp. 

	 make -f Makefile_ROIviewer 
	./ROIviewer -i $SV/object-detection/Cars\ -\ 1900.mp4 -l $SV/object-detection/pascal_voc_classes.txt 
	
You should see a video of cars driving on the highway with red bounding boxes around them.

Here are the parameters used in the above command to run the application:

	./tutorial1 -h

		-h              Print a usage message
		-i <path>       Required. Path to input video file
		-model <path>   Required. Path to model file.
		-b #            Batch size.
		-thresh #       Threshold (0-1: .5=50%)
		-d <device>     Infer target device (CPU or GPU or MYRIAD)
		-fr #           Maximum frames to process
	

## Part 3: Run the example on different hardware

**IT'S BEST TO OPEN A NEW TERMINAL WINDOW SO THAT YOU CAN COMPARE THE RESULTS**

 Make sure that you have sourced the environmental variables for each newly opened terminal window.
 
	source /opt/intel/openvino/bin/setupvars.sh
	
	export SV=/opt/intel/workshop/smart-video-workshop/
	
	cd $SV/object-detection
 
#### 1. CPU
```
./tutorial1 -i $SV/object-detection/Cars\ -\ 1900.mp4 -m $SV/object-detection/mobilenet-ssd/FP32/mobilenet-ssd.xml -d CPU
```
You will see the **total time** it took to run the inference.

#### 2. GPU
Since you installed the OpenCL™ drivers to use the GPU, you can run the inference on GPU and compare the difference.

Set target hardware as GPU with **-d GPU**
```
./tutorial1 -i $SV/object-detection/Cars\ -\ 1900.mp4 -m $SV/object-detection/mobilenet-ssd/FP32/mobilenet-ssd.xml -d GPU
```

The difference in **total time** between CPU and GPU will vary depending on your system.

<a href='object-detection/README.md' target='_blank' class='big-jupyter-button'>Go to Lab: object-detection/README.md</a>
interact-face-detection.md

<a href='up2-vision-kit/openvino-projects-using-iss2019.md' target='_blank' class='big-jupyter-button'>Go to Lab: up2-vision-kit/openvino-projects-using-iss2019.md</a>
# Optimizing Computer Vision Applications
This tutorial shows some techniques to get better performance for computer vision applications with the Intel® Distribution of OpenVINO™ toolkit. 


## 1. Tune parameters - set batch size
In this section, we will see how changes in the batch size affect the performance. We will use the mobilenet-ssd model from the object detection lab for the experiments.

The default batch size for the Model Optimizer is 1. 

### Let us first look at the performance numbers for the batch size 1. 

	export SV=/opt/intel/workshop/smart-video-workshop/
	source /opt/intel/openvino/bin/setupvars.sh
	cd $SV/object-detection
	./tutorial1 -i $SV/object-detection/Cars\ -\ 1900.mp4 -m $SV/object-detection/mobilenet-ssd/FP32/mobilenet-ssd.xml


### Change the batch size to 2 and run the object-detection example for new batch size

	cd $SV/object-detection
	
	./tutorial1 -i $SV/object-detection/Cars\ -\ 1900.mp4 -m $SV/object-detection/mobilenet-ssd/FP32/mobilenet-ssd.xml -b 2

### Run the example for different batch sizes 
Change the batch size to 8, 16, 32, 64, 128 and so on, and see the performance difference in terms of the inference time.
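
Typing each command by hand gets tedious, so the sweep can be scripted. The sketch below only builds and prints the command lines for each batch size; `tutorial1_command` is a hypothetical helper, and the paths and the `-i`, `-m`, `-b` and `-d` flags are the ones used in the commands above. To actually execute a command, you could pass the returned list to `subprocess.run` from the object-detection directory after sourcing the environment variables.

```python
import shlex

# Workshop directory as exported in $SV earlier in this tutorial.
SV = "/opt/intel/workshop/smart-video-workshop"

def tutorial1_command(batch_size, device="CPU"):
    # Build the argument list for one run of the object detection sample.
    return [
        "./tutorial1",
        "-i", f"{SV}/object-detection/Cars - 1900.mp4",
        "-m", f"{SV}/object-detection/mobilenet-ssd/FP32/mobilenet-ssd.xml",
        "-b", str(batch_size),
        "-d", device,
    ]

for batch_size in (1, 2, 8, 16, 32, 64, 128):
    command = tutorial1_command(batch_size)
    # Quote each argument so the printed line can be pasted into a shell.
    print(" ".join(shlex.quote(arg) for arg in command))
```

Keeping the arguments as a list avoids quoting problems with the spaces in the video file name.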


## 2. Pick the right model based on application and hardware
Use/train a model with the right performance/accuracy tradeoffs. Performance differences between models can be bigger than any optimization you can do at the inference app level.
Run various SSD models from the model_downloader in the car detection example which we used in the initial tutorial and observe the performance. We will run these tests on different hardware accelerators to determine how application performance depends on models as well as hardware. 

### Run Model Optimizer on the models to get IR files
	cd $SV/object-detection
	mkdir -p SSD512/{FP16,FP32} 
	mkdir -p SSD300/{FP16,FP32} 
	
	cd /opt/intel/openvino/deployment_tools/model_optimizer
	
	python3 mo_caffe.py --input_model /opt/intel/openvino/deployment_tools/tools/model_downloader/object_detection/common/ssd/512/caffe/ssd512.caffemodel -o $SV/object-detection/SSD512/FP32
	
	python3 mo_caffe.py --input_model /opt/intel/openvino/deployment_tools/tools/model_downloader/object_detection/common/ssd/512/caffe/ssd512.caffemodel -o $SV/object-detection/SSD512/FP16 --data_type FP16
	
	python3 mo_caffe.py --input_model /opt/intel/openvino/deployment_tools/tools/model_downloader/object_detection/common/ssd/300/caffe/ssd300.caffemodel -o $SV/object-detection/SSD300/FP32
	
	python3 mo_caffe.py --input_model /opt/intel/openvino/deployment_tools/tools/model_downloader/object_detection/common/ssd/300/caffe/ssd300.caffemodel -o $SV/object-detection/SSD300/FP16 --data_type FP16
		
### Set environmental variables and navigate to object detection tutorial directory

	source /opt/intel/openvino/bin/setupvars.sh
	cd $SV/object-detection

#### a) CPU
 
 	./tutorial1 -i $SV/object-detection/Cars\ -\ 1900.mp4 -m $SV/object-detection/mobilenet-ssd/FP32/mobilenet-ssd.xml
	
	./tutorial1 -i $SV/object-detection/Cars\ -\ 1900.mp4 -m $SV/object-detection/SSD300/FP32/ssd300.xml
	
	./tutorial1 -i $SV/object-detection/Cars\ -\ 1900.mp4 -m $SV/object-detection/SSD512/FP32/ssd512.xml
	
	
#### b) GPU
 
 	./tutorial1 -i $SV/object-detection/Cars\ -\ 1900.mp4 -m $SV/object-detection/mobilenet-ssd/FP32/mobilenet-ssd.xml -d GPU
	
	./tutorial1 -i $SV/object-detection/Cars\ -\ 1900.mp4 -m $SV/object-detection/SSD300/FP32/ssd300.xml -d GPU
	
	./tutorial1 -i $SV/object-detection/Cars\ -\ 1900.mp4 -m $SV/object-detection/SSD512/FP32/ssd512.xml -d GPU
	
	
#### c) Intel® Movidius™ Neural Compute Stick

	./tutorial1 -i $SV/object-detection/Cars\ -\ 1900.mp4 -m $SV/object-detection/mobilenet-ssd/FP16/mobilenet-ssd.xml -d MYRIAD
	
	./tutorial1 -i $SV/object-detection/Cars\ -\ 1900.mp4 -m $SV/object-detection/SSD300/FP16/ssd300.xml -d MYRIAD
	
	./tutorial1 -i $SV/object-detection/Cars\ -\ 1900.mp4 -m $SV/object-detection/SSD512/FP16/ssd512.xml -d MYRIAD
	
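> **Note**: The Intel® Movidius™ Neural Compute Stick often reports a USB write error. If this happens, re-run the command; it sometimes takes 3 attempts. The mobilenet-ssd FP16 IR files used above are not generated by the earlier steps, so create them first by running the Model Optimizer on the mobilenet-ssd model with `--data_type FP16` and an FP16 output directory.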

	
## 3. Use the right data type for your target hardware and accuracy needs
In this section, we will consider an example running on a GPU. FP16 operations are better optimized than FP32 on GPUs. We will run the object detection example with SSD models with data types FP16 and FP32 and observe the performance difference. 

	./tutorial1 -i $SV/object-detection/Cars\ -\ 1900.mp4 -m $SV/object-detection/SSD300/FP32/ssd300.xml -d GPU 
	
	./tutorial1 -i $SV/object-detection/Cars\ -\ 1900.mp4 -m $SV/object-detection/SSD300/FP16/ssd300.xml -d GPU

Compare the total time of the two runs; you should see better performance with the FP16 model.
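
The trade-off behind the two data types can be illustrated without any hardware. The sketch below uses Python's standard `struct` module to store the same value as a 32-bit and as a 16-bit float; the helper name `round_trip` is an assumption for this example. It only shows the storage size and rounding error of each format and says nothing about inference speed on a particular GPU.

```python
import struct

def round_trip(value, fmt):
    # Pack the value in the given format, then read it back.
    packed = struct.pack(fmt, value)
    return len(packed), struct.unpack(fmt, packed)[0]

# "<f" is a 32-bit float (FP32), "<e" is a 16-bit float (FP16).
fp32_size, fp32_value = round_trip(0.1, "<f")
fp16_size, fp16_value = round_trip(0.1, "<e")

print("FP32:", fp32_size, "bytes, stored value", fp32_value)
print("FP16:", fp16_size, "bytes, stored value", fp16_value)
print("FP16 rounding error:", abs(fp16_value - 0.1))
```

FP16 halves the memory needed per weight at the cost of a larger rounding error, which is why the accuracy of an FP16 model should be checked against your application's needs.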


## 4. Use async
The async API can improve the overall frame rate of the application. While the accelerator is busy running inference, the application can continue encoding, decoding or post-processing inference data on the host. For this section, we will use the object_detection_demo_ssd_async sample, which makes asynchronous requests to the Inference Engine. This hides the inference latency behind the other work, so that the overall frame rate is determined by MAXIMUM(detection time, input capturing time) and not by SUM(detection time, input capturing time).
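
The MAXIMUM versus SUM relationship can be checked with simple arithmetic. In the sketch below, the function name and the timings (30 ms detection, 10 ms capture) are made-up assumptions used only to show the effect; real numbers depend on your model and hardware.

```python
def frames_per_second(detection_ms, capture_ms, use_async):
    if use_async:
        # Capture and detection overlap, so the slower stage sets the pace.
        per_frame_ms = max(detection_ms, capture_ms)
    else:
        # Each frame waits for capture and then for detection.
        per_frame_ms = detection_ms + capture_ms
    return 1000.0 / per_frame_ms

# Hypothetical timings for illustration only.
sync_fps = frames_per_second(30.0, 10.0, use_async=False)
async_fps = frames_per_second(30.0, 10.0, use_async=True)

print("Sync frame rate:", sync_fps)
print("Async frame rate:", async_fps)
```

The gain is largest when the two stages take similar time, and disappears when one stage dominates.
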
#### a) Navigate to the object_detection_demo_ssd_async sample build directory

	cd $HOME/inference_engine_samples_build/intel64/Release
    
#### b) Run the async example

	./object_detection_demo_ssd_async -i $SV/object-detection/Cars\ -\ 1900.mp4 -m $SV/object-detection/mobilenet-ssd/FP32/mobilenet-ssd.xml

> Press Tab to switch to sync mode. Observe the number of fps (frames per second) for both sync and async mode. More frames are processed per second in async mode than in sync mode.

There are important performance caveats though. Tasks that run in parallel should try to avoid oversubscribing to shared computing resources. For example, if the inference tasks are running on the FPGA and the CPU is essentially idle, then it makes sense to run tasks on the CPU in parallel. 

<a href='optimization-tools-and-techniques/README.md' target='_blank' class='big-jupyter-button'>Go to Lab: optimization-tools-and-techniques/README.md</a>


