<a href="https://www.nvidia.com/dli"> <img src="images/DLI_Header.png" alt="Header" style="width: 400px;"/> </a>

# Building a Multi-DNN DeepStream Application #
DeepStream pipelines can be constructed to perform complex analytics that involve multiple neural networks. A common use case is to run a detector as the primary inference engine to localize objects, followed by a classifier as the secondary inference engine on each detected object. This is useful because classification models often perform better on a single object than on a full frame.

## Learning Objectives ##
In this notebook, you will learn how to build a Multi-DNN DeepStream pipeline using Python, including: 
* Planning the Pipeline Architecture
* Using a Specification File to Configure Deep Learning Inference
* Handling Metadata

**Table of Contents**
<br>
This notebook covers the below sections:  
1. [Designing the Pipeline](#s1)
    * [Exercise #1 - Preview the Input Video](#e1)
2. [Preparing the Deep Learning Models](#s2)
    * [Exercise #2 - Download TrafficCamNet and VehicleTypeNet Models](#e2)
3. [Building a Video AI Application](#s3)
    * [Pipeline Components](#s3.1)
    * [Exercise #3 - Initializing GStreamer and Pipeline](#e3)
    * [Exercise #4 - Creating Elements](#e4)
    * [Exercise #5 - Modify the GIE Configuration File(s)](#e5)
    * [Exercise #6 - Link Elements](#e6)
    * [Exercise #7 - Add Probe to OSD Sink Pad](#e7)
    * [Exercise #8 - Start the Pipeline](#e8)
    * [Viewing Inference Results](#s3.2)

<a name='s1'></a>
## Designing the Pipeline ##
Building a video AI application begins by designing the project based on the use case. For this activity, we will build a DeepStream pipeline that will accurately detect cars and classify the vehicle type from a parking garage camera feed. We will use pre-trained models available from NGC. Let's begin by looking at the raw input video and use the `ffprobe` command line utility to understand its format. 

<a name='e1'></a>
#### Exercise #1 - Preview the Input Video ####

**Instructions**: <br>
* Execute the cell to set the environment variables. 
* Execute the cell below to preview the input .mp4 video. 
* Modify the `<FIXME>`s only and execute the cell below to study the input video. 
* Mark the video properties in the cell below. 

In [25]:
# DO NOT CHANGE THIS CELL
import os

# Set the input video path to an environment variable
os.environ['TARGET_VIDEO_PATH']='data/sample_30.h264'
os.environ['TARGET_VIDEO_PATH_MP4']='sample_30.mp4'

target_video_path=os.environ['TARGET_VIDEO_PATH']
target_video_path_mp4=os.environ['TARGET_VIDEO_PATH_MP4']

In [26]:
# DO NOT CHANGE THIS CELL
from IPython.display import Video

# Convert the H.264 encoded video file to MP4 container file - this will generate the sample_30.mp4 file
!ffmpeg -i $TARGET_VIDEO_PATH $TARGET_VIDEO_PATH_MP4 \
        -y \
        -loglevel quiet

# View the input video
Video(target_video_path_mp4, width=720)

In [27]:
!ffprobe -i $TARGET_VIDEO_PATH

ffprobe version 3.4.8-0ubuntu0.2 Copyright (c) 2007-2020 the FFmpeg developers
  built with gcc 7 (Ubuntu 7.5.0-3ubuntu1~18.04)
  configuration: --prefix=/usr --extra-version=0ubuntu0.2 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --enable-gpl --disable-stripping --enable-avresample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librubberband --enable-librsvg --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-li
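
The default `ffprobe` banner is verbose, so the stream properties can be hard to spot. As a rough sketch (not part of the original notebook), the helpers below ask `ffprobe` for JSON output restricted to the first video stream and pull out the fields that matter later when configuring `nvstreammux`. The function names are illustrative assumptions, and `probe_video` requires `ffprobe` on the `PATH`.

```python
import json
import subprocess


def build_ffprobe_cmd(path):
    """Return an ffprobe command that reports the first video stream as JSON."""
    return [
        "ffprobe",
        "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "stream=codec_name,width,height,r_frame_rate",
        "-of", "json",
        path,
    ]


def parse_stream_info(ffprobe_json):
    """Extract codec, resolution, and frame rate from ffprobe JSON text."""
    stream = json.loads(ffprobe_json)["streams"][0]
    num, den = stream["r_frame_rate"].split("/")
    return {
        "codec": stream["codec_name"],
        "width": int(stream["width"]),
        "height": int(stream["height"]),
        "fps": int(num) / int(den),
    }


def probe_video(path):
    """Run ffprobe on `path` and return the parsed stream properties."""
    result = subprocess.run(
        build_ffprobe_cmd(path), capture_output=True, text=True, check=True
    )
    return parse_stream_info(result.stdout)


# Usage (hypothetical, requires ffprobe):
# print(probe_video("data/sample_30.h264"))
```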


<a name='s2'></a>
## Preparing the Deep Learning Models ##
We'll be using two purpose-built models from NGC - the [TrafficCamNet](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/models/trafficcamnet) object detection model and the [VehicleTypeNet](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/models/vehicletypenet) classification model. We need to download and install the NGC CLI before using it. 

In [28]:
# DO NOT CHANGE THIS CELL
import os
os.environ['NGC_DIR']='/dli/task/ngc_assets'

# Download and install NGC CLI - this will create the ngc_assets folder
%env CLI=ngccli_cat_linux.zip
!mkdir -p $NGC_DIR/ngccli
!wget "https://ngc.nvidia.com/downloads/$CLI" -P $NGC_DIR/ngccli
!unzip -o \
       -u "$NGC_DIR/ngccli/$CLI" \
       -d $NGC_DIR/ngccli/
!rm $NGC_DIR/ngccli/*.zip
os.environ["PATH"]="{}/ngccli/ngc-cli:{}".format(os.getenv("NGC_DIR", ""), os.getenv("PATH", ""))

env: CLI=ngccli_cat_linux.zip
--2025-02-14 10:13:22--  https://ngc.nvidia.com/downloads/ngccli_cat_linux.zip
Resolving ngc.nvidia.com (ngc.nvidia.com)... 18.165.83.53, 18.165.83.59, 18.165.83.119, ...
Connecting to ngc.nvidia.com (ngc.nvidia.com)|18.165.83.53|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 48777813 (47M) [application/zip]
Saving to: ‘/dli/task/ngc_assets/ngccli/ngccli_cat_linux.zip’


2025-02-14 10:13:23 (265 MB/s) - ‘/dli/task/ngc_assets/ngccli/ngccli_cat_linux.zip’ saved [48777813/48777813]

Archive:  /dli/task/ngc_assets/ngccli/ngccli_cat_linux.zip


<a name='e2'></a>
#### Exercise #2 - Download TrafficCamNet and VehicleTypeNet Models ####

**Instructions**: <br>
* Modify the `<FIXME>`s only and execute the cell to download the NGC models. 

In [29]:
# Download the purpose-built TrafficCamNet model from NGC
!ngc registry model download-version nvidia/tao/trafficcamnet:pruned_v1.0 --dest $NGC_DIR

# Download the purpose-built VehicleTypeNet model from NGC
!ngc registry model download-version nvidia/tao/vehicletypenet:pruned_v1.0 --dest $NGC_DIR

Getting files to download...


<a name='s3'></a>
## Building a Video AI Application ##

<a name='s3.1'></a>
### Pipeline Components ###
This is the pipeline architecture of the application. We'll be using an object detection network to identify and localize the cars in the frames, followed by a secondary inference to classify vehicle types. 
<p><img src="images/deepstream_multi_gie_pipeline.png" width='720'></p>
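
For reference, the stages in the diagram can be written down as an ordered list before any GStreamer code is involved. This is only a planning sketch: the factory names match the elements created later in this notebook, while the `PIPELINE_STAGES` name, the `describe_pipeline` helper, and the role descriptions are illustrative.

```python
# Ordered (GStreamer factory name, role) pairs for the multi-DNN pipeline.
PIPELINE_STAGES = [
    ("filesrc", "read the H.264 elementary stream from disk"),
    ("h264parse", "parse the H.264 stream"),
    ("nvv4l2decoder", "decode frames on the GPU"),
    ("nvstreammux", "batch frames from one or more sources"),
    ("nvinfer", "primary inference: detect objects"),
    ("nvinfer", "secondary inference: classify detected cars"),
    ("nvvideoconvert", "convert to RGBA for the on-screen display"),
    ("nvdsosd", "draw boxes and labels"),
    ("nvvideoconvert", "convert to I420 for the encoder"),
    ("capsfilter", "enforce the I420 format"),
    ("avenc_mpeg4", "encode with the MPEG4 codec"),
    ("filesink", "write the encoded stream to a file"),
]


def describe_pipeline(stages):
    """Return the stage factory names joined in data-flow order."""
    return " -> ".join(factory for factory, _role in stages)


print(describe_pipeline(PIPELINE_STAGES))
```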

<a name='e3'></a>
#### Exercise #3 - Initializing GStreamer and Pipeline ####

**Instructions**: <br>
* Execute the below cell to import the necessary libraries. 
* Modify the `<FIXME>`s only and execute the cell below to initialize GStreamer and instantiate a pipeline. 

In [30]:
# DO NOT CHANGE THIS CELL
# Import necessary GStreamer libraries and DeepStream python bindings
import gi
gi.require_version('Gst', '1.0')
from gi.repository import GObject, Gst, GLib
from common.bus_call import bus_call
import pyds

In [31]:
import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst

# Initialize GStreamer
Gst.init(None)

# Create Pipeline
pipeline=Gst.Pipeline()
print('Created pipeline')

Created pipeline



<a name='e4'></a>
#### Exercise #4 - Creating Elements ####

**Instructions**: <br>
* Modify the `<FIXME>`s only and execute the cell below to create the necessary pipeline elements and set their properties. 
* Execute the cell below to add the elements to the pipeline. 

In [32]:
# Create Source element for reading from a file and set the location property
source=Gst.ElementFactory.make("filesrc", "file-source")
source.set_property('location', "data/sample_30.h264")

# Create H264 Parser with h264parse as the input file is an elementary h264 stream
h264parser=Gst.ElementFactory.make("h264parse", "h264-parser")

# Create Decoder with nvv4l2decoder for accelerating decoding on GPU
decoder=Gst.ElementFactory.make("nvv4l2decoder", "nvv4l2-decoder")

# Create Streammux with nvstreammux to form batches from one or more sources and set properties
streammux=Gst.ElementFactory.make("nvstreammux", "Stream-muxer")
streammux.set_property('width', 888) 
streammux.set_property('height', 696) 
streammux.set_property('batch-size', 1)

# Create Primary GStreamer Inference Element with nvinfer to run inference on the decoder's output after batching
pgie=Gst.ElementFactory.make("nvinfer", "primary-inference")

# Create Secondary Inference Element with nvinfer to run inference on the pgie's output
sgie=Gst.ElementFactory.make("nvinfer", "secondary-inference")

# Create Convertor to convert from YUV to RGBA as required by nvdsosd
nvvidconv1=Gst.ElementFactory.make("nvvideoconvert", "convertor1")

# Create OSD with nvdsosd to draw on the converted RGBA buffer
nvosd=Gst.ElementFactory.make("nvdsosd", "onscreendisplay")

# Create Convertor to convert from RGBA to I420 as required by encoder
nvvidconv2=Gst.ElementFactory.make("nvvideoconvert", "convertor2")

# Create Capsfilter to enforce frame image format
capsfilter=Gst.ElementFactory.make("capsfilter", "capsfilter")
caps=Gst.Caps.from_string("video/x-raw, format=I420")
capsfilter.set_property("caps", caps)

# Create Encoder to encode I420 formatted frames using the MPEG4 codec
encoder=Gst.ElementFactory.make("avenc_mpeg4", "encoder")
encoder.set_property("bitrate", 2000000)

# Create Sink with filesink to write the encoded stream to a file
sink=Gst.ElementFactory.make('filesink', 'filesink')
sink.set_property('location', 'output_04_raw.mpeg4')
sink.set_property("sync", 1)
print('Created elements')

Created elements
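
`Gst.ElementFactory.make` returns `None` when an element cannot be created, for example when a plugin is missing, and the failure then only surfaces later as an `AttributeError`. A small wrapper, sketched below, can make that failure explicit. The `make_element` helper is a hypothetical addition, not part of the notebook; it takes the factory function as an argument so it can be exercised without GStreamer installed.

```python
def make_element(factory_make, factory_name, element_name):
    """Create an element via `factory_make`, raising if creation fails.

    `factory_make` is expected to behave like Gst.ElementFactory.make,
    which returns None when the element cannot be created.
    """
    element = factory_make(factory_name, element_name)
    if element is None:
        raise RuntimeError(
            "Unable to create element '{}' from factory '{}'".format(
                element_name, factory_name
            )
        )
    return element


# Usage inside the notebook (assumes Gst has been imported and initialized):
# pgie = make_element(Gst.ElementFactory.make, "nvinfer", "primary-inference")
```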



In [33]:
# DO NOT CHANGE THIS CELL
# Add elements to pipeline
pipeline.add(source)
pipeline.add(h264parser)
pipeline.add(decoder)
pipeline.add(streammux)
pipeline.add(pgie)
pipeline.add(sgie)
pipeline.add(nvvidconv1)
pipeline.add(nvosd)
pipeline.add(nvvidconv2)
pipeline.add(capsfilter)
pipeline.add(encoder)
pipeline.add(sink)
print('Added elements to pipeline')

Added elements to pipeline


<a name='e5'></a>
#### Exercise #5 - Modify the GIE Configuration File(s) ####

**Instructions**: <br>
* Review the [primary gie configuration file](./spec_files/pgie_config_trafficcamnet_04.txt) (`./spec_files/pgie_config_trafficcamnet_04.txt`), which has been completed for you. 
* Modify the `<FIXME>`s only in the secondary gie configuration file, which has been started for you as [spec_files/sgie_config_vehicletypenet_04.txt](./spec_files/sgie_config_vehicletypenet_04.txt) (`./spec_files/sgie_config_vehicletypenet_04.txt`). 
* Execute the cell to set the `config-file-path` for the primary and secondary inference plugins. 
* Execute the cell below to modify the [labels.txt](./ngc_assets/vehicletypenet_vpruned_v1.0/labels.txt) to be appropriate for a classifier. 

In [34]:
!cat spec_files/sgie_config_vehicletypenet_04_soln.txt

[property]
gpu-id=0
net-scale-factor=1
tlt-model-key=tlt_encode
tlt-encoded-model=/dli/task/ngc_assets/vehicletypenet_vpruned_v1.0/resnet18_vehicletypenet_pruned.etlt
labelfile-path=/dli/task/ngc_assets/vehicletypenet_vpruned_v1.0/labels.txt
input-dims=3;224;224;0
uff-input-blob-name=input_1
batch-size=1
# 0=FP32, 1=INT8, 2=FP16 mode
network-mode=0
network-type=1
num-detected-classes=6
model-color-format=0
process-mode=2
gie-unique-id=2
operate-on-gie-id=1
operate-on-class-ids=0
output-blob-names=predictions/Softmax
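
The keys that tie the secondary engine to the primary one are `process-mode`, `gie-unique-id`, `operate-on-gie-id`, and `operate-on-class-ids`. As a hedged sketch, these can be sanity-checked with Python's `configparser`, since the file uses an INI-like layout. The `check_sgie_config` helper is hypothetical, and the default of `gie-unique-id=1` for the primary engine is an assumption, because the primary configuration file is not shown here.

```python
import configparser

# Relevant subset of the secondary GIE configuration shown above.
SGIE_CONFIG = """
[property]
network-type=1
process-mode=2
gie-unique-id=2
operate-on-gie-id=1
operate-on-class-ids=0
"""


def check_sgie_config(text, pgie_unique_id=1):
    """Return a list of problems found in a secondary GIE config."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    prop = parser["property"]
    problems = []
    if prop.get("process-mode") != "2":
        problems.append("process-mode should be 2 (secondary mode)")
    if prop.get("operate-on-gie-id") != str(pgie_unique_id):
        problems.append("operate-on-gie-id does not match the primary GIE")
    if prop.get("gie-unique-id") == str(pgie_unique_id):
        problems.append("gie-unique-id must differ from the primary GIE")
    return problems


print(check_sgie_config(SGIE_CONFIG))  # an empty list means no problems found
```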


In [35]:
# DO NOT CHANGE THIS CELL
# Set the location of the config file
pgie.set_property('config-file-path', 'spec_files/pgie_config_trafficcamnet_04.txt')
sgie.set_property('config-file-path', 'spec_files/sgie_config_vehicletypenet_04.txt')

In [36]:
%%writefile ngc_assets/vehicletypenet_vpruned_v1.0/labels.txt

coupe;largevehicle;sedan;suv;truck;van

Overwriting ngc_assets/vehicletypenet_vpruned_v1.0/labels.txt


<p><img src='images/tip.png' width=720></p>

For classifiers, the labels in `labels.txt` should be semicolon-delimited. 
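
To illustrate the format, a semicolon-delimited label line maps to class indices by position. The sketch below parses the line written above; `parse_classifier_labels` is a hypothetical helper, and the index-to-label mapping assumes the labels are listed in the model's output order.

```python
def parse_classifier_labels(line):
    """Split a semicolon-delimited label line into an ordered list."""
    return [label.strip() for label in line.split(";") if label.strip()]


labels = parse_classifier_labels("coupe;largevehicle;sedan;suv;truck;van")
for index, label in enumerate(labels):
    print(index, label)
```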

<a name='e6'></a>
#### Exercise #6 - Link Elements ####

**Instructions**: <br>
* Modify `<FIXME>`s only and execute the below cell to link elements. 

In [37]:
# Link elements together
source.link(h264parser)
h264parser.link(decoder)

# Link decoder source pad to streammux sink pad
decoder_srcpad=decoder.get_static_pad("src")    
streammux_sinkpad=streammux.get_request_pad("sink_0")
decoder_srcpad.link(streammux_sinkpad)

# Link the rest of the elements in the pipeline
streammux.link(pgie)
pgie.link(sgie)
sgie.link(nvvidconv1)
nvvidconv1.link(nvosd)
nvosd.link(nvvidconv2)
nvvidconv2.link(capsfilter)
capsfilter.link(encoder)
encoder.link(sink)
print('Linked elements in pipeline')

Linked elements in pipeline
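
`Gst.Element.link` returns `False` when two elements cannot be linked, and the calls above ignore that return value. One option, sketched below, is a helper that links a chain of elements in order and stops at the first failure. `link_chain` is a hypothetical helper; it relies only on each object exposing `link()` and `get_name()`, and the request-pad link between the decoder and `nvstreammux` still has to be made separately.

```python
def link_chain(elements):
    """Link each element to the next one, raising on the first failure."""
    for upstream, downstream in zip(elements, elements[1:]):
        if not upstream.link(downstream):
            raise RuntimeError(
                "Failed to link {} -> {}".format(
                    upstream.get_name(), downstream.get_name()
                )
            )


# Usage inside the notebook, after the decoder is linked to streammux:
# link_chain([streammux, pgie, sgie, nvvidconv1, nvosd,
#             nvvidconv2, capsfilter, encoder, sink])
```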



<a name='e7'></a>
#### Exercise #7 - Add Probe to OSD Sink Pad ####

**Instructions**: <br>
* Execute the cell to define the `osd_sink_pad_buffer_probe` function. 
* Execute the cell below to define a helper `analyze_meta` function that analyzes the metadata generated by the secondary inference plugin. 
* Modify `<FIXME>`s only and execute the below cell to add the probe callback function. 

We can again use a probe function to access the metadata. In this case, however, we also traverse the metadata generated by the secondary inference plugin. Here the secondary inference is a classifier applied to objects of the `car` class from the primary inference. Its results are stored in each object's `classifier_meta_list`, whose entries we cast with `NvDsClassifierMeta.cast()`. Depending on how many secondary inferences there are, an `NvDsObjectMeta` object may have one or more `NvDsClassifierMeta` objects. Each of these holds a `label_info_list`, whose entries we cast to the `NvDsLabelInfo` class to read the resulting classification of the secondary inference(s). 
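
The nesting described above can be pictured with plain Python stand-ins. The dataclasses below are simplified, hypothetical substitutes for the `pyds` types (they use Python lists instead of the linked lists that `pyds` exposes) and exist only to show how an object's classifier results are reached. The field names mirror those used in the probe code in this notebook, and the example values are made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LabelInfo:            # stand-in for pyds.NvDsLabelInfo
    result_label: str
    result_prob: float


@dataclass
class ClassifierMeta:       # stand-in for pyds.NvDsClassifierMeta
    unique_component_id: int
    label_info_list: List[LabelInfo] = field(default_factory=list)


@dataclass
class ObjectMeta:           # stand-in for pyds.NvDsObjectMeta
    class_id: int
    classifier_meta_list: List[ClassifierMeta] = field(default_factory=list)


def labels_from(obj_meta, component_id):
    """Yield (label, probability) pairs produced by one inference component."""
    for cls_meta in obj_meta.classifier_meta_list:
        if cls_meta.unique_component_id != component_id:
            continue
        for label in cls_meta.label_info_list:
            yield label.result_label, label.result_prob


# Made-up example values, for illustration only.
car = ObjectMeta(class_id=0, classifier_meta_list=[
    ClassifierMeta(unique_component_id=2,
                   label_info_list=[LabelInfo("sedan", 0.9)]),
])
print(list(labels_from(car, component_id=2)))
```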

In [38]:
# DO NOT CHANGE THIS CELL
# Define the Probe Function
def osd_sink_pad_buffer_probe(pad, info):
    gst_buffer = info.get_buffer()

    # Retrieve batch metadata from the gst_buffer
    # Note that pyds.gst_buffer_get_nvds_batch_meta() expects the
    # C address of gst_buffer as input, which is obtained with hash(gst_buffer)
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list

    # Iterate through each frame in the batch metadata until the end
    while l_frame is not None:
        try:
            frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        except StopIteration:
            break

        frame_num=frame_meta.frame_num
        num_obj = frame_meta.num_obj_meta
        l_obj=frame_meta.obj_meta_list
        
        print("Frame Number={} Number of Objects={}".format(frame_num, num_obj))
        
        # Iterate through each object in the frame metadata until the end
        while l_obj is not None:
            try:
                obj_meta=pyds.NvDsObjectMeta.cast(l_obj.data)
                
                # Analyze the classifier metadata attached to this object
                analyze_meta(obj_meta)
            except StopIteration:
                break
                
            try: 
                l_obj=l_obj.next
            except StopIteration:
                break
        
        try:
            l_frame=l_frame.next
        except StopIteration:
            break
    return Gst.PadProbeReturn.OK

In [39]:
# DO NOT CHANGE THIS CELL
PGIE_CLASS_ID_CAR=0

# Define helper function
def analyze_meta(obj_meta): 
    # Only car supports secondary inference
    if obj_meta.class_id == PGIE_CLASS_ID_CAR:     
        cls_meta=obj_meta.classifier_meta_list
        
        # Iterate through each class meta until the end
        while cls_meta is not None:
            cls=pyds.NvDsClassifierMeta.cast(cls_meta.data)
            # Get label info
            label_info=cls.label_info_list  
            
            # Iterate through each label info meta until the end
            while label_info is not None:
                # Cast data type of label from pyds.GList
                label_meta=pyds.glist_get_nvds_label_info(label_info.data)
                if cls.unique_component_id==2:
                    print('\t Type & Probability = {} {}%'.format(label_meta.result_label, round(label_meta.result_prob*100)))
                try:
                    label_info=label_info.next
                except StopIteration:
                    break
            
            try:
                cls_meta=cls_meta.next
            except StopIteration:
                break
    return None
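
Printing one line per object quickly becomes noisy on longer videos. As an alternative sketch, classification results could be collected and summarized once the pipeline finishes. The `VehicleTypeTally` class and its `min_prob` threshold are hypothetical; inside `analyze_meta`, a call such as `tally.record(label_meta.result_label, label_meta.result_prob)` would take the place of the `print` call. The sample results below are made up.

```python
from collections import Counter


class VehicleTypeTally:
    """Count classification labels whose probability meets a threshold."""

    def __init__(self, min_prob=0.5):
        self.min_prob = min_prob
        self.counts = Counter()

    def record(self, label, prob):
        """Add one classification result if it is confident enough."""
        if prob >= self.min_prob:
            self.counts[label] += 1

    def summary(self):
        """Return (label, count) pairs, most frequent first."""
        return self.counts.most_common()


tally = VehicleTypeTally(min_prob=0.5)
# Made-up results, for illustration only.
for label, prob in [("sedan", 0.9), ("suv", 0.7), ("sedan", 0.6), ("van", 0.2)]:
    tally.record(label, prob)
print(tally.summary())
```

Note that these counts are per-frame classifications, so the same vehicle is counted once for every frame in which it is classified.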

In [40]:
# Add probe to nvdsosd plugin's sink
osdsinkpad=nvosd.get_static_pad("sink")
osdsinkpad.add_probe(Gst.PadProbeType.BUFFER, osd_sink_pad_buffer_probe)
print('Attached probe')

Attached probe



<a name='e8'></a>
#### Exercise #8 - Start the Pipeline ####

**Instructions**: <br>
* Execute the cell to add the message handler to the bus. 
* Modify the `<FIXME>`s only and execute below cell to start the DeepStream pipeline. 

In [41]:
# DO NOT CHANGE THIS CELL
# Create an event loop
loop=GLib.MainLoop()

# Feed GStreamer bus messages to loop
bus=pipeline.get_bus()
bus.add_signal_watch()
bus.connect ("message", bus_call, loop)
print('Added bus message handler')

Added bus message handler


In [42]:
# Start playback and listen to events - this will generate the output_04_raw.mpeg4 file
print("Starting pipeline \n")
pipeline.set_state(Gst.State.PLAYING)
try:
    loop.run()
except:
    pass

# Cleaning up as the pipeline comes to an end
pipeline.set_state(Gst.State.NULL)

Starting pipeline 



Error: gst-library-error-quark: Configuration file parsing failed (5): gstnvinfer.cpp(794): gst_nvinfer_start (): /GstPipeline:pipeline1/GstNvInfer:secondary-inference:
Config file path: spec_files/sgie_config_vehicletypenet_04.txt


<enum GST_STATE_CHANGE_SUCCESS of type Gst.StateChangeReturn>
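
The error above reports that `nvinfer` could not parse `spec_files/sgie_config_vehicletypenet_04.txt`. One possible cause, which is an assumption rather than something the log confirms, is that `<FIXME>` placeholders remain in the file. A pre-flight check along the following lines could catch that before the pipeline starts; `find_placeholders` and `check_config_file` are hypothetical helpers.

```python
def find_placeholders(text, marker="<FIXME>"):
    """Return (line_number, line) pairs for lines that still contain `marker`."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if marker in line
    ]


def check_config_file(path):
    """Raise ValueError if the config file at `path` has unresolved markers."""
    with open(path) as handle:
        leftovers = find_placeholders(handle.read())
    if leftovers:
        raise ValueError(
            "Unresolved placeholders in {}: {}".format(path, leftovers)
        )


# Usage before pipeline.set_state(Gst.State.PLAYING):
# check_config_file("spec_files/sgie_config_vehicletypenet_04.txt")
```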


<a name='s3.2'></a>
### Viewing Inference Results ###
Next, we convert the video file into an MP4 container file before playing it, since the MPEG4-encoded video file can't be played directly in JupyterLab. The [FFmpeg](https://ffmpeg.org/) tool is a very fast video and audio converter with the general syntax: 
* `ffmpeg [global_options] {[input_file_options] -i input_url} ... {[output_file_options] output_url} ...` 

When using the `ffmpeg` command, the `-i` option lets us read an input URL, the `-loglevel quiet` option suppresses the logs to reduce the output, and the `-y` flag overwrites any existing output file with the same name. 
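
The same conversion can also be driven from Python rather than a shell escape. The sketch below assembles the argument list following the general syntax above; `build_ffmpeg_cmd` and `convert_video` are hypothetical helpers, and running the conversion requires `ffmpeg` on the `PATH`.

```python
import subprocess


def build_ffmpeg_cmd(input_path, output_path):
    """Return an ffmpeg command that converts input_path to output_path."""
    return [
        "ffmpeg",
        "-y",                    # overwrite an existing output file
        "-loglevel", "quiet",    # suppress log output
        "-i", input_path,        # input URL
        output_path,             # output URL
    ]


def convert_video(input_path, output_path):
    """Run the conversion and raise CalledProcessError on failure."""
    subprocess.run(build_ffmpeg_cmd(input_path, output_path), check=True)


# Usage:
# convert_video("output_04_raw.mpeg4", "output_04.mp4")
```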

In [44]:
# DO NOT CHANGE THIS CELL
# Convert MPEG4 video file to MP4 container file
!ffmpeg -i /dli/task/output_04_raw.mpeg4 /dli/task/output_04.mp4 \
        -y \
        -loglevel quiet

# View the output video
Video("output_04.mp4", width=720, embed=True)

