<center><img src="https://upload.wikimedia.org/wikipedia/en/thumb/6/6d/Nvidia_image_logo.svg/1200px-Nvidia_image_logo.svg.png" width="250"></center>

In [None]:
# Copyright (c) 2020 NVIDIA

# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:

# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.

# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

# DeepStream

In this notebook, we will introduce NVIDIA's DeepStream SDK and how it can be used for video analytics applications.

## Background

DeepStream applications introduce deep neural networks and other complex processing tasks into a stream processing pipeline to enable near real-time analytics on video and other sensor data.  Extracting meaningful insights from these sensors creates opportunities for improved operational efficiencies and safety.  Cameras, for example, are the most widely deployed IoT sensors currently in use; cameras are found in our homes, on our streets, and in parking lots, shopping malls, warehouses, and factories; they are everywhere.  The potential uses of video analytics are enormous: access control, loss prevention, automated checkout, surveillance, safety, automated inspection (QA), package sorting (smart logistics), traffic control/engineering, industrial automation, etc.

<center><img src="https://developer.nvidia.com/sites/default/files/akamai/ds_new.jpg" width="750"></center>
<center>Image credit: https://developer.nvidia.com/sites/default/files/akamai/ds_new.jpg</center>

Although intelligent video analytics (IVA) differs by industry and application, the flow from pixels to insights remains consistent across all use cases.  This common workflow is the foundation of the DeepStream SDK's generic, plug-and-play streaming analytics architecture.

<center><img src="https://developer.nvidia.com/sites/default/files/akamai/DS_EdgetoCloud_GA_Productpage_Cropped.jpg" width="750"></center>
<center>Image credit: https://developer.nvidia.com/sites/default/files/akamai/DS_EdgetoCloud_GA_Productpage_Cropped.jpg</center>

More specifically, a DeepStream application is a set of modular plugins connected to form a processing pipeline.  Each plugin represents a functional block, e.g., inference using TensorRT or multi-stream decode.  Hardware-accelerated plugins interact with the underlying hardware (where applicable) to deliver maximum performance.  For example, the decode plugin interacts with NVDEC, and the inference plugin interacts with the GPU or DLA.  Each plugin can be instantiated multiple times within a pipeline as needed.

<center><img src="https://developer.nvidia.com/sites/default/files/akamai/ds-3-workflow%20%281%29.jpg" width="750"></center>
<center>Image credit: https://developer.nvidia.com/sites/default/files/akamai/ds-3-workflow%20%281%29.jpg</center>


Here is a quick overview of a few of the accelerators found in our Jetson products:

- DLA - Deep Learning Accelerator (Jetson AGX Xavier only)
- PVA - Programmable Vision Accelerator (Jetson AGX Xavier only)
- ISP - Image Signal Processing
- VIC - Video Image Compositor

## DeepStream SDK

The [NVIDIA DeepStream SDK](https://developer.nvidia.com/deepstream-sdk) is a streaming analytics toolkit based on the open source GStreamer multimedia framework.  The DeepStream SDK accelerates development of scalable IVA applications, letting developers focus on building core deep learning networks instead of designing end-to-end applications from scratch.  The SDK is supported on systems that contain an NVIDIA Jetson module or an NVIDIA dGPU adapter.  It consists of an extensible collection of hardware-accelerated plugins that interact with low-level libraries to optimize performance, and it defines a standardized metadata structure that enables custom/user-specific additions.

The materials and descriptions in this notebook provide a quick overview; for more detailed information and explanation of the DeepStream SDK, refer to the following materials:

- [NVIDIA DeepStream SDK Development Guide](https://docs.nvidia.com/metropolis/deepstream/dev-guide/index.html)
- [NVIDIA DeepStream Plugin Manual](https://docs.nvidia.com/metropolis/deepstream/plugin-manual/index.html)
- [NVIDIA DeepStream SDK API Reference Documentation](https://docs.nvidia.com/metropolis/deepstream/dev-guide/DeepStream%20Development%20Guide/baggage/index.html)

### Reference Applications

The DeepStream SDK is packaged with several test applications including pre-trained models, example configuration files and sample video streams that can be used to run those applications. Additional samples and source code examples provide enough information to jumpstart development efforts for most IVA use cases. Test applications demonstrate:

- How to use DeepStream elements (e.g., get source, decode and mux multiple streams, run inference on pre-trained models, annotate and render image)
- How to generate a batch of frames and run inference on it for better resource utilization
- How to add custom/user-specific metadata to any component of DeepStream
- And much more... see the NVIDIA DeepStream SDK Development Guide for complete details

### GStreamer Plugins

GStreamer is a framework for plugins, data flow, and media type handling/negotiation. It is used to create streaming media applications. Plugins are shared libraries that are dynamically loaded at runtime and can be extended and upgraded independently. When arranged and linked together, plugins form the processing pipeline that defines the data flow for a streaming media application. You can learn more about GStreamer through its extensive online documentation, beginning with ["What is GStreamer?"](https://gstreamer.freedesktop.org/documentation/application-development/introduction/gstreamer.html?gi-language=c).

In addition to the open source plugins you'll find in the GStreamer framework libraries, the DeepStream SDK includes NVIDIA hardware accelerated plugins that leverage GPU capabilities. For a complete list of DeepStream GStreamer plugins, see the [NVIDIA DeepStream Plugin Manual](https://docs.nvidia.com/metropolis/deepstream/plugin-manual/index.html#page/DeepStream_Plugin_Manual%2Fdeepstream_plugin_introduction.html).

Some of the notable NVIDIA hardware-accelerated plugins include:
- [Gst-nvstreammux](https://docs.nvidia.com/metropolis/deepstream/plugin-manual/DeepStream_Plugin_Manual/deepstream_plugin_details.02.03.html) - Batch streams before sending for AI inference.
- [Gst-nvinfer](https://docs.nvidia.com/metropolis/deepstream/plugin-manual/DeepStream_Plugin_Manual/deepstream_plugin_details.02.01.html#wwpID0E0IZ0HA) - Run inference using TensorRT.
- [Gst-nvvideo4linux2](https://docs.nvidia.com/metropolis/deepstream/4.0.1/plugin-manual/index.html#page/DeepStream_Plugin_Manual/deepstream_plugin_details.02.12.html) - Decode video streams using the hardware accelerated decoder (NVDEC); Encode RAW data in I420 format to H264 or H265 output video stream using hardware accelerated encoder (NVENC).
- [Gst-nvvideoconvert](https://docs.nvidia.com/metropolis/deepstream/plugin-manual/DeepStream_Plugin_Manual/deepstream_plugin_details.02.07.html) - Perform video color format conversion. The first Gst-nvvideoconvert plugin before Gst-nvdsosd plugin converts stream data from I420 to RGBA and the second Gst-nvvideoconvert plugin after Gst-nvdsosd plugin converts data from RGBA to I420.
- [Gst-nvdsosd](https://docs.nvidia.com/metropolis/deepstream/plugin-manual/DeepStream_Plugin_Manual/deepstream_plugin_details.02.06.html) - Draw bounding boxes, text, and region of interest (ROI) polygons.
- [Gst-nvtracker](https://docs.nvidia.com/metropolis/deepstream/plugin-manual/DeepStream_Plugin_Manual/deepstream_plugin_details.02.02.html) - Track object between frames.
- [Gst-nvmultistreamtiler](https://docs.nvidia.com/metropolis/deepstream/plugin-manual/index.html#page/DeepStream_Plugin_Manual%2Fdeepstream_plugin_details.02.05.html) - Composite a 2D tile from batched buffers.
- [Gst-nvv4l2decoder](https://developer.download.nvidia.com/embedded/L4T/r32_Release_v1.0/Docs/Accelerated_GStreamer_User_Guide.pdf) - Decode a video stream.
- [Gst-Nvv4l2h264enc](https://developer.download.nvidia.com/embedded/L4T/r32_Release_v1.0/Docs/Accelerated_GStreamer_User_Guide.pdf) - Encode a video stream.
- [Gst-NvArgusCameraSrc](https://developer.download.nvidia.com/embedded/L4T/r32_Release_v1.0/Docs/Accelerated_GStreamer_User_Guide.pdf) - Provide options to control ISP properties using the Argus API.
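To make the idea of linked plugins concrete, the sketch below assembles a `gst-launch-1.0`-style pipeline description from several of the plugins listed above. In GStreamer's text syntax, `!` links one element to the next, and `m.sink_0 nvstreammux name=m` routes the decoder into request pad `sink_0` of the muxer named `m`. The file names and the 1280x720 muxer resolution are placeholder assumptions, and the helper function itself is hypothetical; on Jetson, the DeepStream samples place `nvegltransform` in front of `nveglglessink`.

```python
def build_pipeline_description(video_path, infer_config, jetson=True):
    """Return a gst-launch-1.0 description: file -> decode -> batch -> infer -> draw -> display."""
    elements = [
        f"filesrc location={video_path}",  # read an MP4 file from disk (placeholder path)
        "qtdemux",                         # split the MP4 container into elementary streams
        "h264parse",                       # parse the H.264 bitstream
        "nvv4l2decoder",                   # hardware decode (NVDEC)
        # link the decoder into request pad sink_0 of the muxer named "m"
        "m.sink_0 nvstreammux name=m batch-size=1 width=1280 height=720",
        f"nvinfer config-file-path={infer_config}",  # TensorRT inference
        "nvvideoconvert",                  # color conversion for the OSD
        "nvdsosd",                         # draw boxes and labels
    ]
    if jetson:
        elements.append("nvegltransform")  # needed before the EGL sink on Jetson
    elements.append("nveglglessink")       # on-screen rendering
    return " ! ".join(elements)


print(build_pipeline_description("sample.mp4", "config_infer_primary.txt"))
```

The printed string could be passed to `gst-launch-1.0` in a terminal on the device to try the same chain of plugins without writing any application code.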

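As before, let's make sure the device is running at its highest clock frequencies so we get the most performance out of it.  `nvpmodel -m 0` selects power mode 0 (MAXN on the Jetson Nano), `nvpmodel -q` queries the active mode, and `jetson_clocks` pins the CPU, GPU, and EMC clocks to their maximums for that mode.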

In [None]:
!echo nvidia | sudo -S nvpmodel -m 0

In [None]:
!echo nvidia | sudo -S nvpmodel -q

In [None]:
!echo nvidia | sudo -S jetson_clocks

## DeepStream Configuration Files

One of the ways to interact with DeepStream is to create or edit a configuration file, which tells the SDK how to construct the pipeline.

The DeepStream SDK comes with multiple sample configuration files that can be used out of the box for different use cases.  Some of them are listed below; they are located in the `/opt/nvidia/deepstream/deepstream-5.0/samples/configs/deepstream-app/` directory on the Jetson device where DeepStream was installed:

- **config_infer_primary.txt:** Configures an nvinfer element as the primary detector.
- **config_infer_secondary_carcolor.txt**, **config_infer_secondary_carmake.txt**, **config_infer_secondary_vehicletypes.txt:** Configure an nvinfer element as a secondary classifier.
- **iou_config.txt:** Configures a low-level IOU (Intersection over Union) tracker.
- **tracker_config.yml:** Configures the NvDCF tracker.
- **source1_usb_dec_infer_resnet_int8.txt:** Demonstrates one USB camera as input.
- **source1_csi_dec_infer_resnet_int8.txt:** Demonstrates one CSI camera as input; for Jetson only.
- **source2_csi_usb_dec_infer_resnet_int8.txt:** Demonstrates one CSI camera and one USB camera as inputs; for Jetson only.
- **source6_csi_dec_infer_resnet_int8.txt:** Demonstrates six CSI cameras as inputs; for Jetson only.
- **source8_1080p_dec_infer-resnet_tracker_tiled_display_fp16_nano.txt:** Demonstrates 8 Decode + Infer + Tracker; for Jetson Nano only.

We have created a few specialty configuration files based on these default ones specifically for this workshop.  The first configuration file we want to look at is the [source2.txt](deepstream-configs/source2.txt).  Feel free to look at the entire file, but a few key pieces are shown below.

First let's investigate the `source` portion of the configuration file.

In [None]:
!sed -n 28,39p deepstream-configs/source2.txt

Notice that in the `source` portion of the configuration file (essentially the pipeline input), we can set things like the type of video source (camera, URI, RTSP, etc.), the memory type to use in the pipeline, and the GPU ID via `gpu-id` (the Jetson Nano has only one GPU, but other machines could potentially have more).  For this example, notice that we are using MultiURI but providing only one URI and then specifying `num-sources=2`.  This simply replicates the same video twice (in essence, we stream 2 videos that happen to have the same content).
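Because the deepstream-app configuration files use an INI-style `[group]` / `key=value` layout (GLib key-file syntax), Python's standard `configparser` can read them for quick inspection. The sketch below parses a hypothetical `[source0]` group modeled on the one described above; the URI and values are placeholders rather than the actual contents of `source2.txt`, and the type codes follow the comments in the DeepStream sample configs (`1=CameraV4L2 2=URI 3=MultiURI 4=RTSP`, plus `5=CSI` on Jetson).

```python
import configparser

# Hypothetical [source0] excerpt; values are placeholders, not source2.txt itself
SAMPLE = """
[source0]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI 4=RTSP 5=CSI
type=3
uri=file://../../streams/sample_1080p_h264.mp4
num-sources=2
gpu-id=0
cudadec-memtype=0
"""

SOURCE_TYPES = {1: "CameraV4L2", 2: "URI", 3: "MultiURI", 4: "RTSP", 5: "CSI"}


def describe_source(text, section="source0"):
    """Summarize one deepstream-app source group."""
    # interpolation=None so '%' characters in URIs are taken literally
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    src = parser[section]
    return {
        "enabled": src.getint("enable", fallback=0) == 1,
        "type": SOURCE_TYPES.get(src.getint("type"), "unknown"),
        "uri": src.get("uri"),
        # MultiURI replicates the URI num-sources times
        "num_sources": src.getint("num-sources", fallback=1),
    }


print(describe_source(SAMPLE))
```

To inspect the real file instead, `parser.read(path)` can replace `read_string`. Some of the DeepStream Python samples use `configparser` in the same way to read their tracker settings.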

Next, if we look at the `sink` portion of the configuration file, we can see a few options for the pipeline output (e.g., output file, RTSP, etc.).

In [None]:
!sed -n 41,83p deepstream-configs/source2.txt

Here we can see a few different `sink` types.  The first is the `EglSink`, which in most cases acts as an on-screen display.  `sink1` represents the pipeline output being written to a file (in this case an .mp4 file).  Notice that with this option you can specify which codec to use (H.264 or H.265) as well as whether to use the device's hardware encoder or a software encoder for the video stream.  The last sink, `sink2`, represents output to an RTSP stream.  Once running, this stream can be viewed with most video streaming players (like VLC).  To access it, simply navigate to `rtsp://<ip-address>:8554/ds-test` and the browser will ask to open VLC (or an equivalent player on your local machine).

Notice that only one of these is enabled at this point (for this particular configuration, we will be streaming directly to the on-screen display as an EglSink).  For the purposes of this lab, however, we will be using the second option (output to a file, mp4 video) so that it is viewable in Jupyter notebooks.
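Since only the `enable` flag decides which sink is active, a short script can report it. The sketch below is a hypothetical helper run against an illustrative three-sink excerpt; the `type`, `container`, `codec`, and `enc-type` codes are the ones documented in the comments of the DeepStream 5.0 sample configs (type `1=FakeSink 2=EglSink 3=File 4=RTSP`, codec `1=h264 2=h265`, encoder `0=Hardware 1=Software`).

```python
import configparser

# Sink type and codec codes used by deepstream-app [sinkN] groups
SINK_TYPES = {1: "FakeSink", 2: "EglSink", 3: "File", 4: "RTSPStreaming"}
CODECS = {1: "H.264", 2: "H.265"}

# Illustrative excerpt mirroring the three sinks described above
SAMPLE = """
[sink0]
enable=1
type=2
sync=0

[sink1]
enable=0
type=3
#1=mp4 2=mkv
container=1
#1=h264 2=h265
codec=1
#encoder type 0=Hardware 1=Software
enc-type=0
output-file=out.mp4

[sink2]
enable=0
type=4
codec=1
rtsp-port=8554
"""


def enabled_sinks(text):
    """Return (section, description) for every [sinkN] group with enable=1."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    result = []
    for name in parser.sections():
        if not name.startswith("sink"):
            continue
        sec = parser[name]
        if sec.getint("enable", fallback=0) != 1:
            continue
        desc = SINK_TYPES.get(sec.getint("type"), "other")
        if "codec" in sec:  # only encoding sinks (File, RTSP) carry a codec
            desc += f" ({CODECS.get(sec.getint('codec'), 'unknown codec')})"
        result.append((name, desc))
    return result


print(enabled_sinks(SAMPLE))
```

Switching this excerpt to file output would mean setting `enable=1` under `[sink1]`, after which the helper lists both the EGL sink and the H.264 file sink.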

A few other configuration parameters that we could modify include `tiled-display` for creating the output view in tiled format, `osd` for on-screen display settings, and `streammux` which handles the multiplexing of the multiple streams.
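In the `[tiled-display]` group, `rows` and `columns` set the grid while `width` and `height` set the size of the composited output, so each stream occupies a tile of roughly `width / columns` by `height / rows` pixels. The helper below is hypothetical: it picks a near-square grid for a given stream count and works out that tile size; deepstream-app itself simply uses whatever `rows` and `columns` you configure.

```python
import math


def tile_layout(num_sources, width, height):
    """Choose a near-square grid; return (rows, columns, tile_width, tile_height)."""
    if num_sources < 1:
        raise ValueError("need at least one source")
    rows = math.isqrt(num_sources)              # floor of the square root
    columns = math.ceil(num_sources / rows)     # enough columns to fit every stream
    return rows, columns, width // columns, height // rows


# 8 streams on a 1280x720 output: a 2x4 grid of 320x360 tiles
print(tile_layout(8, 1280, 720))
```

With 2 streams on the same output, the grid becomes 1 row by 2 columns and each tile is 640x720.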

In this configuration, you can also set paths and properties of the models you wish to use for inference.  For example...

In [None]:
!sed -n 122,180p deepstream-configs/source2.txt

Here we can see that each section whose name contains `gie` (GPU Inference Engine) represents one of the models used in the inference pipeline.  In this case, we will be running a primary object detection network (i.e., `primary-gie`) that can detect vehicles and people.  We then pass that information on to three classification models (i.e., `secondary-gie[0,1,2]`), which classify vehicle type, vehicle color, and vehicle make, respectively.

Notice there is also a `tracker` here (not enabled) that could be used to give each instance of a vehicle or person in the video its own unique ID.

Lastly, notice for each of the `gie` sections, we provide a `config-file`.  Inside these config files, you can find all of the components which correspond to the model itself including model path, batch size, classes, custom parsers, etc.
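The relationship between the `gie` sections can be made explicit with a small script. In the sketch below, which parses a hypothetical excerpt (the unique IDs and file names are illustrative assumptions), each secondary classifier's `operate-on-gie-id` points at the `gie-unique-id` of the detector whose objects it classifies, and `config-file` names the per-model nvinfer configuration.

```python
import configparser

# Hypothetical excerpt: one primary detector feeding three secondary classifiers
SAMPLE = """
[primary-gie]
enable=1
gie-unique-id=1
config-file=config_infer_primary.txt

[secondary-gie0]
enable=1
gie-unique-id=4
operate-on-gie-id=1
config-file=config_infer_secondary_vehicletypes.txt

[secondary-gie1]
enable=1
gie-unique-id=5
operate-on-gie-id=1
config-file=config_infer_secondary_carcolor.txt

[secondary-gie2]
enable=1
gie-unique-id=6
operate-on-gie-id=1
config-file=config_infer_secondary_carmake.txt

[tracker]
enable=0
"""


def inference_chain(text):
    """List enabled GIEs as (section, config-file, upstream gie-unique-id or None)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    chain = []
    for name in parser.sections():
        sec = parser[name]
        if "gie" in name and sec.getint("enable", fallback=0) == 1:
            # Primary GIEs have no operate-on-gie-id, so they map to None
            upstream = sec.getint("operate-on-gie-id", fallback=None)
            chain.append((name, sec["config-file"], upstream))
    return chain


for entry in inference_chain(SAMPLE):
    print(entry)
```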

### Let's Try It

Now that we understand a little bit about the DeepStream configuration files which will help build the pipeline, we can now see a few examples of what the output looks like.  Note again that for this demonstration we will be creating an mp4 as output instead of on-screen display since we are inside of a Jupyter notebook, but the config files can be modified to use any of the `sink` options at a later time.

First, let's copy our configuration files to the location where DeepStream was installed so that they are easier for it to access.

In [None]:
!echo nvidia | sudo -S cp deepstream-configs/* /opt/nvidia/deepstream/deepstream-5.0/samples/configs/deepstream-app/

Now to run the DeepStream SDK we can simply run `deepstream-app` with the appropriate parameters.

In [None]:
!deepstream-app --help-all

Since we have put most of our configuration inside the config files earlier, we can simply use those instead of command line arguments in order to successfully run the `deepstream-app`.
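The notebook cells run `deepstream-app` through the shell with `unset DISPLAY &&` so that no window is opened. If you would rather launch it from a Python script, a minimal sketch using only the standard library might look like the following; the helper names are hypothetical, and `run_deepstream` assumes `deepstream-app` is on the `PATH`.

```python
import os
import subprocess

CONFIG_DIR = "/opt/nvidia/deepstream/deepstream-5.0/samples/configs/deepstream-app"


def deepstream_command(config_name, config_dir=CONFIG_DIR):
    """Build the argument list for `deepstream-app -c <config>`."""
    return ["deepstream-app", "-c", os.path.join(config_dir, config_name)]


def headless_env(base=None):
    """Copy the environment without DISPLAY (same effect as `unset DISPLAY`)."""
    env = dict(os.environ if base is None else base)
    env.pop("DISPLAY", None)
    return env


def run_deepstream(config_name):
    # check=True raises CalledProcessError if deepstream-app exits with an error
    return subprocess.run(deepstream_command(config_name), env=headless_env(), check=True)
```

Passing an argument list rather than a shell string avoids quoting issues, and copying the environment leaves the notebook's own `DISPLAY` untouched.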

#### 2 Streams

Let's see what happens when we run 2 streams (notice if you look at this [config file](deepstream-configs/source2_mp4.txt), you will see that we are using a primary detector as well as 3 secondary classifiers).

You'll notice that when you first run `deepstream-app`, there is an error.  This is fine; the error just points out that the TensorRT engine does not exist yet and that it needs to be built.  If you read through the output, you will also notice that INT8 is not supported on this platform (namely, the Jetson Nano), so even though we set INT8 mode in our configuration file, the app automatically falls back to the next-best option (i.e., FP16).
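The precision requested for each model lives in its nvinfer config file as `network-mode` under the `[property]` group (`0=FP32, 1=INT8, 2=FP16` per the sample config comments). The sketch below reads that key from an illustrative excerpt and then mimics the fallback described above with a hypothetical `effective_precision` helper; it illustrates the behavior, not DeepStream's actual implementation.

```python
import configparser

NETWORK_MODES = {0: "FP32", 1: "INT8", 2: "FP16"}

# Illustrative nvinfer [property] excerpt requesting INT8
SAMPLE = """
[property]
gpu-id=0
batch-size=1
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=1
"""


def requested_precision(text):
    """Return the precision named by network-mode (FP32 if the key is absent)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return NETWORK_MODES[parser["property"].getint("network-mode", fallback=0)]


def effective_precision(requested, supports_int8):
    """Hypothetical model of the fallback: INT8 becomes FP16 on platforms without INT8."""
    if requested == "INT8" and not supports_int8:
        return "FP16"
    return requested


# On a Jetson Nano (no INT8 support) the INT8 request ends up as FP16
print(effective_precision(requested_precision(SAMPLE), supports_int8=False))
```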

In [None]:
!unset DISPLAY && deepstream-app -c /opt/nvidia/deepstream/deepstream-5.0/samples/configs/deepstream-app/source2_mp4.txt

Notice that even with all 4 of those deep learning networks (1 detection and 3 classification), we are still able to achieve almost 30fps on 2 video streams.

Let's check out the output that we made.

In [None]:
from IPython.display import HTML

HTML("""
    <video width="960" height="720" controls>
        <source src="out_2streams.mp4" type="video/mp4">
    </video>
""")

#### 8 Streams

Now, let's try something a little bigger: remove the secondary classifiers but increase the number of streams we process at once.  For this example, we will run 8 streams through the object detection model, with the addition of the tracker.

In [None]:
!unset DISPLAY && deepstream-app -c /opt/nvidia/deepstream/deepstream-5.0/samples/configs/deepstream-app/source8_mp4.txt

Notice that for this one, we actually sustain 30 FPS on each of the 8 streams across the entire length of the video.  This means we can run the detection model and a KLT tracker on 8 video streams at once and still achieve the full 30 FPS at which the video is streamed.
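To put that result in perspective, a quick calculation gives the aggregate load. Eight streams at 30 FPS amount to 240 frames per second, an average of about 4.2 ms per frame. Assuming the muxer batches one frame from each stream (batch size equal to the number of streams), the whole pipeline has about 33.3 ms to process each batch of 8.

```python
def throughput(streams, fps_per_stream):
    """Return (total frames/s, average ms per frame, ms per batch of `streams` frames)."""
    total_fps = streams * fps_per_stream
    per_frame_ms = 1000.0 / total_fps
    # Assumes one batch holds one frame from every stream
    per_batch_ms = 1000.0 / fps_per_stream
    return total_fps, per_frame_ms, per_batch_ms


print(throughput(8, 30))
```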

In [None]:
from IPython.display import HTML

HTML("""
    <video width="960" height="720" controls>
        <source src="out_8streams.mp4" type="video/mp4">
    </video>
""")

### Python API with GStreamer

Up until now, we have been using configuration files to create the sample applications.  However, another way to use the DeepStream SDK is through the Python API, where you instantiate GStreamer plugins for each portion of your pipeline and then link them together to form the final pipeline.

We will not cover this in this particular lab, but for more information, some of our Python samples can be found either in our DeepStream Docker containers ([dGPU](https://ngc.nvidia.com/catalog/containers/nvidia:deepstream), [Jetson](https://ngc.nvidia.com/catalog/containers/nvidia:deepstream-l4t)) on [NGC](http://ngc.nvidia.com) or on [GitHub](https://github.com/NVIDIA-AI-IOT/deepstream_python_apps).

As a quick sneak peek, here's what it would look like to build a very simple pipeline using the Python API.

Creating a source element for reading from a file
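```python
import sys
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst
Gst.init(None)  # GStreamer must be initialized before any element is created
source = Gst.ElementFactory.make("filesrc", "file-source")
if not source:
    sys.stderr.write(" Unable to create Source \n")
```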

Creating an H.264 parser
```python
print("Creating H264Parser \n")
h264parser = Gst.ElementFactory.make("h264parse", "h264-parser")
if not h264parser:
    sys.stderr.write(" Unable to create h264 parser \n")
```

Using nvv4l2decoder for hardware-accelerated decoding (NVDEC)
```python
print("Creating Decoder \n")
decoder = Gst.ElementFactory.make("nvv4l2decoder", "nvv4l2-decoder")
if not decoder:
    sys.stderr.write(" Unable to create Nvv4l2 Decoder \n")
```

Create an nvstreammux instance to form batches from one or more sources
```python
streammux = Gst.ElementFactory.make("nvstreammux", "Stream-muxer")
if not streammux:
    sys.stderr.write(" Unable to create NvStreamMux \n")
```

Setting up nvinfer to run inference on the batched decoder output
```python
pgie = Gst.ElementFactory.make("nvinfer", "primary-inference")
if not pgie:
    sys.stderr.write(" Unable to create pgie \n")
```

Use a converter to convert from NV12 to RGBA as required by nvosd
```python
nvvidconv = Gst.ElementFactory.make("nvvideoconvert", "convertor")
if not nvvidconv:
    sys.stderr.write(" Unable to create nvvidconv \n")
```

Create OSD to draw on the converted RGBA buffer
```python
nvosd = Gst.ElementFactory.make("nvdsosd", "onscreendisplay")
if not nvosd:
    sys.stderr.write(" Unable to create nvosd \n")
```

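Finally, render the OSD output (on Jetson, the DeepStream samples place an `nvegltransform` element between `nvdsosd` and `nveglglessink`)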
```python
print("Creating EGLSink \n")
sink = Gst.ElementFactory.make("nveglglessink", "nvvideo-renderer")
if not sink:
    sys.stderr.write(" Unable to create egl sink \n")
```
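The snippets above only create elements; they still have to be added to a `Gst.Pipeline`, configured, and linked. The sketch below follows the pattern of the `deepstream_test_1.py` sample: the decoder is linked pad-to-pad into a request pad on `nvstreammux`, and everything downstream is linked element-to-element. It assumes PyGObject, GStreamer 1.x, and DeepStream are installed (imports happen inside the function), an H.264 elementary stream (e.g. a `.h264` file) as input, and a 1920x1080 muxer resolution; `build_pipeline` and `ELEMENTS` are hypothetical names.

```python
import sys

# (factory, name) pairs in link order, matching the snippets above
ELEMENTS = [
    ("filesrc", "file-source"),
    ("h264parse", "h264-parser"),
    ("nvv4l2decoder", "nvv4l2-decoder"),
    ("nvstreammux", "Stream-muxer"),
    ("nvinfer", "primary-inference"),
    ("nvvideoconvert", "convertor"),
    ("nvdsosd", "onscreendisplay"),
    ("nveglglessink", "nvvideo-renderer"),
]


def build_pipeline(video_path, pgie_config, jetson=True):
    """Create, configure, add, and link the elements; returns a Gst.Pipeline or None."""
    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import Gst
    Gst.init(None)

    specs = list(ELEMENTS)
    if jetson:
        # Jetson samples insert nvegltransform just before the EGL sink
        specs.insert(len(specs) - 1, ("nvegltransform", "nvegl-transform"))

    pipeline = Gst.Pipeline()
    elems = {}
    for factory, name in specs:
        elem = Gst.ElementFactory.make(factory, name)
        if not elem:
            sys.stderr.write(f" Unable to create {factory} \n")
            return None
        pipeline.add(elem)
        elems[factory] = elem

    elems["filesrc"].set_property("location", video_path)
    mux = elems["nvstreammux"]
    mux.set_property("width", 1920)
    mux.set_property("height", 1080)
    mux.set_property("batch-size", 1)
    mux.set_property("batched-push-timeout", 4000000)
    elems["nvinfer"].set_property("config-file-path", pgie_config)

    elems["filesrc"].link(elems["h264parse"])
    elems["h264parse"].link(elems["nvv4l2decoder"])
    # nvstreammux exposes request pads (sink_0, sink_1, ...), so link pad-to-pad
    sinkpad = mux.get_request_pad("sink_0")
    elems["nvv4l2decoder"].get_static_pad("src").link(sinkpad)

    # Everything from the muxer onward links element-to-element in order
    names = [factory for factory, _ in specs]
    start = names.index("nvstreammux")
    for up, down in zip(names[start:], names[start + 1:]):
        elems[up].link(elems[down])
    return pipeline
```

To actually run it, you would set the returned pipeline to `Gst.State.PLAYING` and drive a `GLib.MainLoop` that watches the pipeline bus for end-of-stream and error messages, as the DeepStream Python samples do.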
