# Intel Media SDK Utilisation in Video Applications

In previous labs, we focused on deep learning inference in video applications, but a lot more goes on in a video analytics application. Intel provides additional tools to enhance the rest of the pipeline.

In this section, we will run six applications developed with the OpenCV, Inference Engine, and Media SDK C++ APIs, named `tutorial_0` to `tutorial_5`.

The source code of all these examples is located in the `/home/intel/Tutorials/interop_tutorials` folder.


|App Name   | Decoding   | Pre-Process | Inference   | Post-Process | Encoding  |
|-----------|------------|-------------|-------------|--------------|-----------|
|Tutorial 0 | OpenCV     |     OpenCV  | OpenCV      | OpenCV       | OpenCV    |
|Tutorial 1 | OpenCV     |     OpenCV  | OpenVINO    | OpenCV       | OpenCV    |
|Tutorial 2 | Media SDK  |     OpenCV  | OpenVINO    | OpenCV       | OpenCV    |
|Tutorial 3 | Media SDK  |  Media SDK  | OpenVINO    | OpenCV       | OpenCV    |
|Tutorial 4 | Media SDK  |  Media SDK  | OpenVINO    | Media SDK    | OpenCV    |
|Tutorial 5 | Media SDK  |  Media SDK  | OpenVINO    | Media SDK    | Media SDK |

# Tutorial 0

Tutorial 0 is implemented entirely with the OpenCV package shipped in the Intel(R) Distribution of OpenVINO.

We start with Tutorial 0 to establish baseline performance figures.

Let's open a new terminal.

```bash
source /opt/intel/computer_vision_sdk/bin/setupvars.sh

cd /home/intel/Tutorials/interop_tutorials/tutorial_0

./tutorial0 -h

[usage]
	tutorial_0 [option]
	options:

		-h              Print a usage message
		-i <path>       Required. Path to input video file
		-fr <path>      Number of frames from stream to process
		-m <path>       Required. Path to Caffe deploy.prototxt file.
		-weights <path> Required. Path to Caffe weights in .caffemodel file.
		-l <path>       Required. Path to labels file.
		-thresh <val>   Confidence threshold for bounding boxes 0-1
		-s              Display less information on the screen
```

As the usage message shows, this application loads a Caffe model and runs inference on it.

There is also a `run.sh` script, which runs the sample application with the provided video input and reports the pre-stage, inference, and post-stage timings as shown below.

```bash

./run.sh

Batch: 255/256
	pre-stage:	7.08 ms/frame
	infer:		87.22 ms/frame
	post-stage:	0.48 ms/frame

Batch: 256/256
	pre-stage:	7.05 ms/frame
	infer:		86.85 ms/frame
	post-stage:	0.49 ms/frame

Batch: 257/256
	pre-stage:	6.92 ms/frame


> Pre-stage average:	6.69 ms/frame (decoding, color converting, resizing)
> Infer average:	87.82 ms/frame (inferencing)
> Post-stage average:	0.52 ms/frame (drawing bounding box, encoding, saving)

> Total elapsed execution time: 24.36 sec

Done!

```

Here you can see that inference takes about 87 ms per frame on average.
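
To put the three averages in context, the sketch below adds them up into a per-frame latency and checks the result against the reported total run time. It assumes the stages run one after another for each frame and that 256 frames were processed (the counter in the log); both are assumptions about the sample, not something the log states directly.

```python
# Averages copied from the Tutorial 0 summary above (ms per frame).
PRE_MS, INFER_MS, POST_MS = 6.69, 87.82, 0.52
FRAMES = 256  # assumed from the "Batch: N/256" counter in the log


def per_frame_ms(pre, infer, post):
    """Time spent on one frame if the three stages run back to back."""
    return pre + infer + post


total_ms = per_frame_ms(PRE_MS, INFER_MS, POST_MS)
fps = 1000.0 / total_ms                   # frames per second
estimated_sec = FRAMES * total_ms / 1000.0

print(f"per frame: {total_ms:.2f} ms, throughput: {fps:.1f} fps")
print(f"estimated run time: {estimated_sec:.2f} s (reported: 24.36 s)")
```

The estimate lands within a fraction of a second of the reported 24.36 sec, which suggests that inference dominates the run time in this baseline.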

The output is written to `out.h264` in the directory you ran the application from. Play the file to see the inference results.

```bash
ls 
mplayer out.h264
```

# Tutorial 1

Tutorial 1 replaces only the inference stage with the Inference Engine from the Intel(R) Distribution of OpenVINO.


Let's open a new terminal. 

```bash
source /opt/intel/computer_vision_sdk/bin/setupvars.sh
export LD_LIBRARY_PATH=/home/intel/inference_engine_samples/intel64/Release/lib:$LD_LIBRARY_PATH

cd /home/intel/Tutorials/interop_tutorials/tutorial_1

./tutorial1 -h

[usage]
	tutorial_1 [option]
	options:

		-h                  Print a usage message
		-i <path/filename>  Required. Path to input video file
		-fr <val>           Number of frames from stream to process
		-m <path/filename>  Required. Path to IR .xml file.
		-l <path/filename>  Required. Path to labels file.
		-d <device>         Infer target device (CPU or GPU or MYRIAD)
		-pc                 Enables per-layer performance report
		-thresh <val>       Confidence threshold for bounding boxes 0-1
		-batch <val>        Batch size
		-s                  Display less information on the screen
		-e <path/filename>  Load layer extension plugin
		-mean               Mean value for normalization of data during planar BGR blob preprocess step
		-scale              Scale value for normalization of data during planar BGR blob preprocess step
		-show               Display a window for composited image with ROI

```

As the usage message shows, this application loads an OpenVINO IR model and can run inference on CPU, GPU, or MYRIAD.

We will use `run.sh` to run tutorial1 on CPU and `run_gpu.sh` to run inference on GPU.

Compared with Tutorial 0, inference is noticeably faster.

```bash

./run.sh

Batch: 255/256
	pre-stage:	10.52 ms/frame
	infer:		56.12 ms/frame
	post-stage:	0.45 ms/frame

Batch: 256/256
	pre-stage:	15.02 ms/frame
	infer:		52.21 ms/frame
	post-stage:	0.43 ms/frame

Batch: 257/256
	pre-stage:	9.70 ms/frame


> Pre-stage average:	10.41 ms/frame (decoding, color converting, resizing)
> Infer average:	54.44 ms/frame (inferencing)
> Post-stage average:	0.53 ms/frame (drawing bounding box, encoding, saving)

> Total elapsed execution time: 16.81 sec

Done!
```

Now let's run on GPU; this time the Inference Engine loads clDNNPlugin.

```bash

./run_gpu.sh

Batch: 255/256
	pre-stage:	11.44 ms/frame
	infer:		28.59 ms/frame
	post-stage:	0.49 ms/frame

Batch: 256/256
	pre-stage:	7.80 ms/frame
	infer:		27.85 ms/frame
	post-stage:	0.52 ms/frame

Batch: 257/256
	pre-stage:	11.04 ms/frame


> Pre-stage average:	10.16 ms/frame (decoding, color converting, resizing)
> Infer average:	27.67 ms/frame (inferencing)
> Post-stage average:	0.66 ms/frame (drawing bounding box, encoding, saving)

> Total elapsed execution time: 9.93 sec

Done!

```
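
The inference averages reported so far can be compared directly. The sketch below computes the speedup of each Tutorial 1 run over the Tutorial 0 baseline. It assumes the runs process the same video with an equivalent network, which the `run.sh` scripts do not show here, so treat the factors as indicative only.

```python
# Inference averages (ms/frame) copied from the run summaries above.
infer_ms = {
    "Tutorial 0, OpenCV on CPU": 87.82,
    "Tutorial 1, Inference Engine on CPU": 54.44,
    "Tutorial 1, Inference Engine on GPU": 27.67,
}


def speedups(times, baseline):
    """Return baseline_time / time for every entry in `times`."""
    base = times[baseline]
    return {name: base / ms for name, ms in times.items()}


result = speedups(infer_ms, "Tutorial 0, OpenCV on CPU")
for name, factor in result.items():
    print(f"{name}: {factor:.2f}x")
```

With these figures, the Inference Engine is roughly 1.6 times faster on CPU and roughly 3.2 times faster on GPU than the OpenCV baseline.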

The final output is written to `out.h264`; you can play it with mplayer to see the inference results.

```bash
mplayer out.h264
```

Let's look at the performance in a little more detail.

## Performance Analysis

As in the previous sections, a per-layer performance report is available through the `-pc` option, which is also implemented for this application. Expect verbose output, because the report is printed for every inference.

```bash
source /opt/intel/computer_vision_sdk/bin/setupvars.sh
export LD_LIBRARY_PATH=/home/intel/inference_engine_samples/intel64/Release/lib:$LD_LIBRARY_PATH

/home/intel/Tutorials/interop_tutorials/tutorial_1/tutorial1 -m /home/intel/Tutorials/test_content/IR/SSD/SSD_GoogleNet_v2_fp16.xml -i /home/intel/Tutorials/test_content/video/cars_768x768.h264 -l /home/intel/Tutorials/test_content/IR/SSD/pascal_voc_classes.txt -d GPU -pc
```
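
The long command lines in this lab differ only in the tutorial number, the input video, and a few trailing options. As a convenience, the sketch below assembles such a command from its parts. `build_command` is a hypothetical helper written for this lab and is not part of the tutorials; the paths and flags are the ones used in the commands of this document.

```python
import shlex

BASE = "/home/intel/Tutorials"


def build_command(tutorial, video, device="GPU", extra=()):
    """Return the argument list for one tutorial run (hypothetical helper)."""
    return [
        f"{BASE}/interop_tutorials/tutorial_{tutorial}/tutorial{tutorial}",
        "-m", f"{BASE}/test_content/IR/SSD/SSD_GoogleNet_v2_fp16.xml",
        "-i", f"{BASE}/test_content/video/{video}",
        "-l", f"{BASE}/test_content/IR/SSD/pascal_voc_classes.txt",
        "-d", device,
        *extra,  # trailing options such as "-pc" or "-batch", "8"
    ]


cmd = build_command(1, "cars_768x768.h264", extra=["-pc"])
print(shlex.join(cmd))  # shell-quoted form, ready to paste into a terminal
```

The returned list could also be passed to `subprocess.run` once the environment from `setupvars.sh` has been loaded.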


## Output Video

Let's view the output interactively with the `-show` option.

```bash
source /opt/intel/computer_vision_sdk/bin/setupvars.sh
export LD_LIBRARY_PATH=/home/intel/inference_engine_samples/intel64/Release/lib:$LD_LIBRARY_PATH

/home/intel/Tutorials/interop_tutorials/tutorial_1/tutorial1 -m /home/intel/Tutorials/test_content/IR/SSD/SSD_GoogleNet_v2_fp16.xml -i /home/intel/Tutorials/test_content/video/cars_768x768.h264 -l /home/intel/Tutorials/test_content/IR/SSD/pascal_voc_classes.txt -d GPU -show
```

## Batch Support

Let's see how to use a larger batch size with this example.

In the output, note that the counter shows 32 batches: the 256 frames are packed into batches of 8, and the 8 frames of each batch are inferred together.

```bash
source /opt/intel/computer_vision_sdk/bin/setupvars.sh
export LD_LIBRARY_PATH=/home/intel/inference_engine_samples/intel64/Release/lib:$LD_LIBRARY_PATH

/home/intel/Tutorials/interop_tutorials/tutorial_1/tutorial1 -m /home/intel/Tutorials/test_content/IR/SSD/SSD_GoogleNet_v2_fp16.xml -i /home/intel/Tutorials/test_content/video/cars_768x768.h264 -l /home/intel/Tutorials/test_content/IR/SSD/pascal_voc_classes.txt -d GPU -batch 8

Batch: 29/32
	pre-stage:	5.84 ms/frame
	infer:		21.98 ms/frame
	post-stage:	5.39 ms/frame

Batch: 30/32
	pre-stage:	6.43 ms/frame
	infer:		22.82 ms/frame
	post-stage:	5.38 ms/frame

Batch: 31/32
	pre-stage:	5.99 ms/frame
	infer:		22.39 ms/frame
	post-stage:	4.69 ms/frame



> Pre-stage average:	5.65 ms/frame (decoding, color converting, resizing)
> Infer average:	22.30 ms/frame (inferencing)
> Post-stage average:	5.31 ms/frame (drawing bounding box, encoding, saving)

> Total elapsed execution time: 7.49 sec

Done!

```
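
The batch arithmetic can be checked with a few lines. The sketch below computes how many batches a stream produces and, assuming the `infer` figure is really a per-frame value as its `ms/frame` label says, how long one batched inference call takes.

```python
import math


def batch_count(frames, batch_size):
    """Number of inference calls needed; a partial last batch counts as one."""
    return math.ceil(frames / batch_size)


def batch_latency_ms(infer_ms_per_frame, batch_size):
    """Approximate duration of one batched call from the per-frame average."""
    return infer_ms_per_frame * batch_size


batches = batch_count(256, 8)
latency = batch_latency_ms(22.30, 8)  # 22.30 ms/frame from the summary above

print(f"{batches} batches, about {latency:.1f} ms per batched inference")
```

Under that assumption, each call takes longer than a single-frame inference but covers 8 frames, which is why the per-frame average drops from 27.67 ms to 22.30 ms on GPU.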

# Tutorial 2

In Tutorial 2, we use Media SDK for decoding only and look at the difference in the pre-stage.

Let's open a new terminal. 

```bash
source /opt/intel/computer_vision_sdk/bin/setupvars.sh
export LD_LIBRARY_PATH=/home/intel/inference_engine_samples/intel64/Release/lib:$LD_LIBRARY_PATH

cd /home/intel/Tutorials/interop_tutorials/tutorial_2

./tutorial2 -h

[usage]
	tutorial_2 [option]
	options:

		-h                  Print a usage message
		-i <path/filename>  Required. Path to input video file, video elementary stream only
		-fr <val>           Number of frames from stream to process
		-m <path/filename>  Required. Path to IR .xml file.
		-l <path/filename>  Required. Path to labels file.
		-d <device>         Infer target device (CPU or GPU or MYRIAD)
		-pc                 Enables per-layer performance report
		-thresh <val>       Confidence threshold for bounding boxes 0-1
		-batch <val>        Batch size
		-s                  Display less information on the screen
		-e <path/filename>  Load layer extension plugin
		-mean               Mean value for normalization of data during planar BGR blob preprocess step
		-scale              Scale value for normalization of data during planar BGR blob preprocess step

```

As the usage message shows, this application loads an OpenVINO IR model and can run inference on CPU, GPU, or MYRIAD.

We will use `run.sh` to run tutorial2 on CPU and `run_gpu.sh` to run inference on GPU.

On CPU, inference time stays close to Tutorial 1, since the inference stage is unchanged; the stage to compare is the pre-stage.

```bash

./run.sh

Batch: 254/256
	pre-stage:	9.02 ms/frame
	infer:		53.26 ms/frame
	post-stage:	0.51 ms/frame

Batch: 255/256
	pre-stage:	11.34 ms/frame
	infer:		54.30 ms/frame
	post-stage:	0.42 ms/frame

Batch: 256/256
	pre-stage:	11.53 ms/frame
	infer:		54.07 ms/frame
	post-stage:	0.41 ms/frame



> Pre-stage average:	11.65 ms/frame (decoding, color converting, resizing)
> Infer average:	55.35 ms/frame (inferencing)
> Post-stage average:	0.52 ms/frame (drawing bounding box, encoding, saving)

> Total elapsed execution time: 17.40 sec

> Done ! (Output is in out.h264 -> $ mplayer out.h264)


```

Now let's run on GPU; this time the Inference Engine loads clDNNPlugin.

```bash

./run_gpu.sh

Batch: 254/256
	pre-stage:	9.37 ms/frame
	infer:		39.96 ms/frame
	post-stage:	0.49 ms/frame

Batch: 255/256
	pre-stage:	11.23 ms/frame
	infer:		39.23 ms/frame
	post-stage:	0.53 ms/frame

Batch: 256/256
	pre-stage:	11.12 ms/frame
	infer:		40.44 ms/frame
	post-stage:	0.76 ms/frame



> Pre-stage average:	11.13 ms/frame (decoding, color converting, resizing)
> Infer average:	40.62 ms/frame (inferencing)
> Post-stage average:	0.60 ms/frame (drawing bounding box, encoding, saving)

> Total elapsed execution time: 13.50 sec

> Done ! (Output is in out.h264 -> $ mplayer out.h264)
```
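
The summary lines printed at the end of each run have a regular shape, so they can be collected with a small script when comparing many runs. The sketch below parses them with regular expressions; `parse_summary` is a hypothetical helper, and the patterns assume the log format stays exactly as shown in this document.

```python
import re

# Patterns follow the "> ... average:" and "> Total elapsed ..." lines above.
STAGE = re.compile(
    r"^> (Pre-stage|Infer|Post-stage) average:\s+([\d.]+) ms/frame", re.M
)
TOTAL = re.compile(r"^> Total elapsed execution time:\s+([\d.]+) sec", re.M)


def parse_summary(log):
    """Extract stage averages (ms/frame) and total time (sec) from a run log."""
    stats = {name: float(ms) for name, ms in STAGE.findall(log)}
    total = TOTAL.search(log)
    if total:
        stats["Total sec"] = float(total.group(1))
    return stats


# Tail of the Tutorial 2 GPU run shown above.
sample = """\
> Pre-stage average:\t11.13 ms/frame (decoding, color converting, resizing)
> Infer average:\t40.62 ms/frame (inferencing)
> Post-stage average:\t0.60 ms/frame (drawing bounding box, encoding, saving)

> Total elapsed execution time: 13.50 sec
"""

print(parse_summary(sample))
```

In practice the `log` string would come from the captured standard output of `run.sh` or `run_gpu.sh`.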

The final output is written to `out.h264`; you can play it with mplayer to see the inference results.

```bash
mplayer out.h264
```

Let's look at the performance in a little more detail.

## Performance Analysis

As in the previous sections, a per-layer performance report is available through the `-pc` option, which is also implemented for this application. Expect verbose output, because the report is printed for every inference.

```bash
source /opt/intel/computer_vision_sdk/bin/setupvars.sh
export LD_LIBRARY_PATH=/home/intel/inference_engine_samples/intel64/Release/lib:$LD_LIBRARY_PATH

/home/intel/Tutorials/interop_tutorials/tutorial_2/tutorial2 -m /home/intel/Tutorials/test_content/IR/SSD/SSD_GoogleNet_v2_fp16.xml -i /home/intel/Tutorials/test_content/video/cars_768x768.h264 -l /home/intel/Tutorials/test_content/IR/SSD/pascal_voc_classes.txt -d GPU -pc
```

## Batch Support

Let's see how to use a larger batch size with this example.

In the output, note that the counter shows 32 batches: the 256 frames are packed into batches of 8, and the 8 frames of each batch are inferred together.

```bash
source /opt/intel/computer_vision_sdk/bin/setupvars.sh
export LD_LIBRARY_PATH=/home/intel/inference_engine_samples/intel64/Release/lib:$LD_LIBRARY_PATH

/home/intel/Tutorials/interop_tutorials/tutorial_2/tutorial2 -m /home/intel/Tutorials/test_content/IR/SSD/SSD_GoogleNet_v2_fp16.xml -i /home/intel/Tutorials/test_content/video/cars_768x768.h264 -l /home/intel/Tutorials/test_content/IR/SSD/pascal_voc_classes.txt -d GPU -batch 8

Batch: 28/32
	pre-stage:	6.44 ms/frame
	infer:		22.91 ms/frame
	post-stage:	1.61 ms/frame

Batch: 29/32
	pre-stage:	6.78 ms/frame
	infer:		22.73 ms/frame
	post-stage:	2.14 ms/frame

Batch: 30/32
	pre-stage:	6.56 ms/frame
	infer:		22.39 ms/frame
	post-stage:	1.61 ms/frame



> Pre-stage average:	6.76 ms/frame (decoding, color converting, resizing)
> Infer average:	22.60 ms/frame (inferencing)
> Post-stage average:	1.85 ms/frame (drawing bounding box, encoding, saving)

> Total elapsed execution time: 7.57 sec

> Done ! (Output is in out.h264 -> $ mplayer out.h264)

```

# Tutorial 3

In Tutorial 3, we use Media SDK for both decoding and pre-processing. Let's see the difference in the pre-stage.

Let's open a new terminal. 

```bash
source /opt/intel/computer_vision_sdk/bin/setupvars.sh
export LD_LIBRARY_PATH=/home/intel/inference_engine_samples/intel64/Release/lib:$LD_LIBRARY_PATH

cd /home/intel/Tutorials/interop_tutorials/tutorial_3

./tutorial3 -h

[usage]
	tutorial_3 [option]
	options:

		-h                  Print a usage message
		-i <path/filename>  Required. Path to input video file, video elementary stream only
		-fr <val>           Number of frames from stream to process
		-m <path/filename>  Required. Path to IR .xml file.
		-l <path/filename>  Required. Path to labels file.
		-d <device>         Infer target device (CPU or GPU or MYRIAD)
		-pc                 Enables per-layer performance report
		-thresh <val>       Confidence threshold for bounding boxes 0-1
		-batch <val>            Batch size
		-s                  Display less information on the screen
		-e <path/filename>  Load layer extension plugin
		-mean               Mean value for normalization of data during planar BGR blob preprocess step
		-scale              Scale value for normalization of data during planar BGR blob preprocess step

```

As the usage message shows, this application loads an OpenVINO IR model and can run inference on CPU, GPU, or MYRIAD.

We will use `run.sh` to run tutorial3 on CPU and `run_gpu.sh` to run inference on GPU.

On CPU, inference time is again close to Tutorial 1; the change to look for is in the pre-stage.

```bash

./run.sh

Batch: 254/256
	pre-stage:	8.09 ms/frame
	infer:		53.41 ms/frame
	post-stage:	0.45 ms/frame

Batch: 255/256
	pre-stage:	7.31 ms/frame
	infer:		56.35 ms/frame
	post-stage:	2.44 ms/frame

Batch: 256/256
	pre-stage:	10.92 ms/frame
	infer:		63.68 ms/frame
	post-stage:	0.44 ms/frame



> Pre-stage average:	7.97 ms/frame (decoding, color converting, resizing)
> Infer average:	55.22 ms/frame (inferencing)
> Post-stage average:	0.57 ms/frame (drawing bounding box, encoding, saving)

> Total elapsed execution time: 16.39 sec

> Done ! (Output is in out.h264 -> $ mplayer out.h264)

```

Now let's run on GPU; this time the Inference Engine loads clDNNPlugin.

```bash

./run_gpu.sh

Batch: 254/256
	pre-stage:	7.67 ms/frame
	infer:		40.82 ms/frame
	post-stage:	0.49 ms/frame

Batch: 255/256
	pre-stage:	7.72 ms/frame
	infer:		38.60 ms/frame
	post-stage:	0.52 ms/frame

Batch: 256/256
	pre-stage:	10.43 ms/frame
	infer:		39.46 ms/frame
	post-stage:	0.58 ms/frame



> Pre-stage average:	8.92 ms/frame (decoding, color converting, resizing)
> Infer average:	39.93 ms/frame (inferencing)
> Post-stage average:	0.76 ms/frame (drawing bounding box, encoding, saving)

> Total elapsed execution time: 12.77 sec

> Done ! (Output is in out.h264 -> $ mplayer out.h264)


```

The final output is written to `out.h264`; you can play it with mplayer to see the inference results.

```bash
mplayer out.h264
```

Let's look at the performance in a little more detail.

## Performance Analysis

As in the previous sections, a per-layer performance report is available through the `-pc` option, which is also implemented for this application. Expect verbose output, because the report is printed for every inference.

```bash
source /opt/intel/computer_vision_sdk/bin/setupvars.sh
export LD_LIBRARY_PATH=/home/intel/inference_engine_samples/intel64/Release/lib:$LD_LIBRARY_PATH

/home/intel/Tutorials/interop_tutorials/tutorial_3/tutorial3 -m /home/intel/Tutorials/test_content/IR/SSD/SSD_GoogleNet_v2_fp16.xml -i /home/intel/Tutorials/test_content/video/cars_1920x1080.h264 -l /home/intel/Tutorials/test_content/IR/SSD/pascal_voc_classes.txt -d GPU -pc
```

## Batch Support

Let's see how to use a larger batch size with this example.

In the output, note that the counter shows 32 batches: the 256 frames are packed into batches of 8, and the 8 frames of each batch are inferred together.

```bash
source /opt/intel/computer_vision_sdk/bin/setupvars.sh
export LD_LIBRARY_PATH=/home/intel/inference_engine_samples/intel64/Release/lib:$LD_LIBRARY_PATH

/home/intel/Tutorials/interop_tutorials/tutorial_3/tutorial3 -m /home/intel/Tutorials/test_content/IR/SSD/SSD_GoogleNet_v2_fp16.xml -i /home/intel/Tutorials/test_content/video/cars_1920x1080.h264 -l /home/intel/Tutorials/test_content/IR/SSD/pascal_voc_classes.txt -d GPU -batch 8

Batch: 30/32
	pre-stage:	8.21 ms/frame
	infer:		21.79 ms/frame
	post-stage:	2.25 ms/frame

Batch: 31/32
	pre-stage:	9.61 ms/frame
	infer:		22.20 ms/frame
	post-stage:	2.32 ms/frame

Batch: 32/32
	pre-stage:	8.91 ms/frame
	infer:		22.20 ms/frame
	post-stage:	3.41 ms/frame



> Pre-stage average:	8.46 ms/frame (decoding, color converting, resizing)
> Infer average:	22.43 ms/frame (inferencing)
> Post-stage average:	2.33 ms/frame (drawing bounding box, encoding, saving)

> Total elapsed execution time: 8.53 sec

> Done ! (Output is in out.h264 -> $ mplayer out.h264)


```

# Tutorial 4

In Tutorial 4, we use Media SDK for decoding, pre-processing, and post-processing. Let's see how the stage timings change.

Let's open a new terminal. 

```bash
source /opt/intel/computer_vision_sdk/bin/setupvars.sh
export LD_LIBRARY_PATH=/home/intel/inference_engine_samples/intel64/Release/lib:$LD_LIBRARY_PATH

cd /home/intel/Tutorials/interop_tutorials/tutorial_4

./tutorial4 -h

[usage]
	tutorial_4 [option]
	options:

		-h                  Print a usage message
		-i <path/filename>  Required. Path to input video file, video elementary stream only
		-fr <val>           Number of frames from stream to process
		-m <path/filename>  Required. Path to IR .xml file.
		-l <path/filename>  Required. Path to labels file.
		-d <device>         Infer target device (CPU or GPU or MYRIAD)
		-pc                 Enables per-layer performance report
		-thresh <val>       Confidence threshold for bounding boxes 0-1
		-batch <val>        Batch size
		-s                  Display less information on the screen
		-e <path/filename>  Load layer extension plugin
		-mean               Mean value for normalization of data during planar BGR blob preprocess step
		-scale              Scale value for normalization of data during planar BGR blob preprocess step

```

As the usage message shows, this application loads an OpenVINO IR model and can run inference on CPU, GPU, or MYRIAD.

We will use `run.sh` to run tutorial4 on CPU and `run_gpu.sh` to run inference on GPU.

On CPU, inference time is again close to Tutorial 1; the pre-stage is where the numbers change.

```bash

./run.sh

Batch: 254/256
	pre-stage:	3.79 ms/frame
	infer:		54.34 ms/frame
	post-stage:	0.40 ms/frame

Batch: 255/256
	pre-stage:	3.54 ms/frame
	infer:		52.75 ms/frame
	post-stage:	0.51 ms/frame

Batch: 256/256
	pre-stage:	6.91 ms/frame
	infer:		57.08 ms/frame
	post-stage:	0.42 ms/frame



> Pre-stage average:	4.97 ms/frame (decoding, color converting, resizing)
> Infer average:	55.67 ms/frame (inferencing)
> Post-stage average:	0.55 ms/frame (drawing bounding box, encoding, saving)

> Total elapsed execution time: 15.83 sec

> Done ! (Output is in out.h264 -> $ mplayer out.h264)


```
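
To see how the pre-stage evolves as Media SDK takes over more of the pipeline, the sketch below compares the CPU pre-stage averages of Tutorials 1 to 4 against Tutorial 1. The input video used by each `run.sh` is not shown in this document, so the comparison assumes the scripts use comparable inputs.

```python
# Pre-stage averages (ms/frame) from the ./run.sh summaries of each tutorial.
pre_stage_ms = {1: 10.41, 2: 11.65, 3: 7.97, 4: 4.97}


def change_percent(times, baseline):
    """Relative change of each entry versus the baseline, in percent."""
    base = times[baseline]
    return {key: (ms - base) / base * 100.0 for key, ms in times.items()}


change = change_percent(pre_stage_ms, baseline=1)
for tutorial, pct in change.items():
    print(f"Tutorial {tutorial}: {pre_stage_ms[tutorial]:.2f} ms ({pct:+.1f}%)")
```

With these figures, the pre-stage is slightly slower in Tutorial 2, about a quarter faster in Tutorial 3, and roughly half the Tutorial 1 value in Tutorial 4.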

Now let's run on GPU; this time the Inference Engine loads clDNNPlugin.

```bash

./run_gpu.sh

Batch: 254/256
	pre-stage:	4.80 ms/frame
	infer:		39.26 ms/frame
	post-stage:	0.83 ms/frame

Batch: 255/256
	pre-stage:	7.55 ms/frame
	infer:		39.18 ms/frame
	post-stage:	1.60 ms/frame

Batch: 256/256
	pre-stage:	7.02 ms/frame
	infer:		38.16 ms/frame
	post-stage:	1.12 ms/frame



> Pre-stage average:	5.36 ms/frame (decoding, color converting, resizing)
> Infer average:	39.80 ms/frame (inferencing)
> Post-stage average:	0.69 ms/frame (drawing bounding box, encoding, saving)

> Total elapsed execution time: 11.87 sec

> Done ! (Output is in out.h264 -> $ mplayer out.h264)

```

The final output is written to `out.h264`; you can play it with mplayer to see the inference results.

```bash
mplayer out.h264
```

Let's look at the performance in a little more detail.

## Performance Analysis

As in the previous sections, a per-layer performance report is available through the `-pc` option, which is also implemented for this application. Expect verbose output, because the report is printed for every inference.

```bash
source /opt/intel/computer_vision_sdk/bin/setupvars.sh
export LD_LIBRARY_PATH=/home/intel/inference_engine_samples/intel64/Release/lib:$LD_LIBRARY_PATH

/home/intel/Tutorials/interop_tutorials/tutorial_4/tutorial4 -m /home/intel/Tutorials/test_content/IR/SSD/SSD_GoogleNet_v2_fp16.xml -i /home/intel/Tutorials/test_content/video/cars_1920x1080.h264 -l /home/intel/Tutorials/test_content/IR/SSD/pascal_voc_classes.txt -d GPU -pc
```

## Batch Support

Let's see how to use a larger batch size with this example.

In the output, note that the counter shows 32 batches: the 256 frames are packed into batches of 8, and the 8 frames of each batch are inferred together.

```bash
source /opt/intel/computer_vision_sdk/bin/setupvars.sh
export LD_LIBRARY_PATH=/home/intel/inference_engine_samples/intel64/Release/lib:$LD_LIBRARY_PATH

/home/intel/Tutorials/interop_tutorials/tutorial_4/tutorial4 -m /home/intel/Tutorials/test_content/IR/SSD/SSD_GoogleNet_v2_fp16.xml -i /home/intel/Tutorials/test_content/video/cars_1920x1080.h264 -l /home/intel/Tutorials/test_content/IR/SSD/pascal_voc_classes.txt -d GPU -batch 8

Batch: 30/32
	pre-stage:	4.87 ms/frame
	infer:		22.76 ms/frame
	post-stage:	2.34 ms/frame

Batch: 31/32
	pre-stage:	4.74 ms/frame
	infer:		22.19 ms/frame
	post-stage:	2.23 ms/frame

Batch: 32/32
	pre-stage:	5.00 ms/frame
	infer:		22.05 ms/frame
	post-stage:	2.28 ms/frame



> Pre-stage average:	5.07 ms/frame (decoding, color converting, resizing)
> Infer average:	22.32 ms/frame (inferencing)
> Post-stage average:	2.29 ms/frame (drawing bounding box, encoding, saving)

> Total elapsed execution time: 7.70 sec

> Done ! (Output is in out.h264 -> $ mplayer out.h264)

```

# Tutorial 5

In Tutorial 5, we use Media SDK for decoding, pre-processing, post-processing, and encoding. Let's see how the stage timings change.

Let's open a new terminal. 

```bash
source /opt/intel/computer_vision_sdk/bin/setupvars.sh
export LD_LIBRARY_PATH=/home/intel/inference_engine_samples/intel64/Release/lib:$LD_LIBRARY_PATH

cd /home/intel/Tutorials/interop_tutorials/tutorial_5

./tutorial5 -h

[usage]
	tutorial_5 [option]
	options:

		-h                  Print a usage message
		-i <path/filename>  Required. Path to input video file, video elementary stream only
		-fr <val>           Number of frames from stream to process
		-m <path/filename>  Required. Path to IR .xml file.
		-l <path/filename>  Required. Path to labels file.
		-d <device>         Infer target device (CPU or GPU or MYRIAD)
		-pc                 Enables per-layer performance report
		-thresh <val>       Confidence threshold for bounding boxes 0-1
		-b <val>            Batch size
		-s                  Display less information on the screen
		-e <path/filename>  Load layer extension plugin
		-mean               Mean value for normalization of data during planar BGR blob preprocess step
		-scale              Scale value for normalization of data during planar BGR blob preprocess step


```

As the usage message shows, this application loads an OpenVINO IR model and can run inference on CPU, GPU, or MYRIAD.

Let's run on GPU, where the Inference Engine loads clDNNPlugin.

```bash

./run_gpu.sh

Batch: 255/256
	pre-stage:	6.05 ms/frame
	infer:		38.90 ms/frame
	post-stage:	2.40 ms/frame

Batch: 256/256
	pre-stage:	6.66 ms/frame
	infer:		37.13 ms/frame
	post-stage:	4.58 ms/frame

Batch: 257/256
	pre-stage:	6.17 ms/frame


> Pre-stage average:	5.97 ms/frame (decoding, color converting, resizing)
> Infer average:	42.21 ms/frame (inferencing)
> Post-stage average:	2.74 ms/frame (drawing bounding box, encoding, saving)

> Total elapsed execution time: 13.08 sec

Done!

```

The final output is written to `out.h264`; you can play it with mplayer to see the inference results.

```bash
mplayer out.h264
```

Let's look at the performance in a little more detail.

## Performance Analysis

As in the previous sections, a per-layer performance report is available through the `-pc` option, which is also implemented for this application. Expect verbose output, because the report is printed for every inference.

```bash
source /opt/intel/computer_vision_sdk/bin/setupvars.sh
export LD_LIBRARY_PATH=/home/intel/inference_engine_samples/intel64/Release/lib:$LD_LIBRARY_PATH

/home/intel/Tutorials/interop_tutorials/tutorial_5/tutorial5 -m /home/intel/Tutorials/test_content/IR/SSD/SSD_GoogleNet_v2_fp16.xml -i /home/intel/Tutorials/test_content/video/cars_1920x1080.h264 -l /home/intel/Tutorials/test_content/IR/SSD/pascal_voc_classes.txt -d GPU -pc
```

## Batch Support

Let's see how to use a larger batch size with this example.

In the output, note that the counter shows 32 batches: the 256 frames are packed into batches of 8, and the 8 frames of each batch are inferred together.

```bash
source /opt/intel/computer_vision_sdk/bin/setupvars.sh
export LD_LIBRARY_PATH=/home/intel/inference_engine_samples/intel64/Release/lib:$LD_LIBRARY_PATH

/home/intel/Tutorials/interop_tutorials/tutorial_5/tutorial5 -m /home/intel/Tutorials/test_content/IR/SSD/SSD_GoogleNet_v2_fp16.xml -i /home/intel/Tutorials/test_content/video/cars_1920x1080.h264 -l /home/intel/Tutorials/test_content/IR/SSD/pascal_voc_classes.txt -d GPU -batch 8

Batch: 31/32
	pre-stage:	7.95 ms/frame
	infer:		22.76 ms/frame
	post-stage:	5.50 ms/frame

Batch: 32/32
	pre-stage:	8.23 ms/frame
	infer:		22.83 ms/frame
	post-stage:	5.95 ms/frame

Batch: 33/32
	pre-stage:	6.54 ms/frame


> Pre-stage average:	6.81 ms/frame (decoding, color converting, resizing)
> Infer average:	22.88 ms/frame (inferencing)
> Post-stage average:	6.16 ms/frame (drawing bounding box, encoding, saving)

> Total elapsed execution time: 9.26 sec

Done!


```
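
As a closing comparison, the sketch below turns the total elapsed times of the batch-8 GPU runs into end-to-end throughput. It assumes every run processed 256 frames (32 batches of 8). Note that the commands for Tutorials 1 and 2 use the 768x768 clip while Tutorials 3 to 5 use the 1920x1080 clip, so the two groups are not directly comparable.

```python
FRAMES = 256  # assumed: 32 batches of 8 frames per run

# (input video, total elapsed seconds) from the "-d GPU -batch 8" runs above.
runs = {
    1: ("cars_768x768.h264", 7.49),
    2: ("cars_768x768.h264", 7.57),
    3: ("cars_1920x1080.h264", 8.53),
    4: ("cars_1920x1080.h264", 7.70),
    5: ("cars_1920x1080.h264", 9.26),
}

# End-to-end frames per second for each tutorial.
fps = {tutorial: FRAMES / seconds for tutorial, (_, seconds) in runs.items()}

for tutorial, (video, seconds) in runs.items():
    print(f"Tutorial {tutorial}: {fps[tutorial]:.1f} fps on {video}")
```

The per-stage averages remain the better guide to where Media SDK helps, because the total time also depends on input resolution and on how much work each tutorial places in the post-stage.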