[Home Page](Start_Here.ipynb)

[1] [2](Performance_Analysis_using_NSight_systems.ipynb) [3](Performance_Analysis_using_NSight_systems_Continued.ipynb) [Next Notebook](Performance_Analysis_using_NSight_systems.ipynb)

# Introduction to Performance Analysis

In this notebook, we introduce the metrics used to measure the performance of a DeepStream pipeline and explore how to improve that performance.

- [Latency, Throughput, and GPU Metrics](#Latency,-Throughput,-and-GPU-Metrics)
    - [Latency](#Latency)
    - [GPU Metrics](#GPU-Metrics)
    - [Throughput](#Throughput)
- [Case 1: Multi-stream cascaded network pipeline](#Case-1:-Multi-stream-cascaded-network-pipeline.)
    - [Benchmarking with GST Probes](#Benchmarking-with-GST-Probes)
    - [Effects on OSD, Tiler, and Queues](#Effects-on-OSD,-Tiler,-and-Queues)
- [Summary](#Summary)

## Latency, Throughput, and GPU Metrics


### Latency

Latency matters for real-time pipelines that are time-critical. Latency in a DeepStream pipeline can be measured using GStreamer's debugging capabilities: setting the `GST_DEBUG` environment variable to `GST_SCHEDULING:7` (and `GST_DEBUG_FILE` to a log path) produces a trace log that records when each buffer is passed into an element's pad, from which we can derive detailed timing information about the pipeline.

In [None]:
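# Run this cell first so the right paths to the NVIDIA libraries are used.
!rm -f ~/.cache/gstreamer-1.0/registry.x86_64.bin
# `!export` only affects a throwaway subshell; set the variable in the kernel's
# environment instead so that later `!` commands inherit it.
import os
os.environ["LD_LIBRARY_PATH"] = "/opt/tensorrtserver/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/.singularity.d/libs:" + os.environ.get("LD_LIBRARY_PATH", "")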

In [None]:
!GST_DEBUG="GST_SCHEDULING:7" GST_DEBUG_FILE=../source_code/trace.log \
python3 ../source_code/deepstream-app-1/deepstream_test_1.py '/opt/nvidia/deepstream/deepstream-6.0/samples/streams/sample_720p.h264'

The `trace.log` file is large. Below are small excerpts that show when buffers enter the decoder plugin and, in the V100 case, when the same frame enters the next element.

**DGX V100**
```bash
0:00:01.641136185 GST_SCHEDULING gstpad.c:4320:gst_pad_chain_data_unchecked:<nvv4l2-decoder:sink> calling chainfunction &gst_video_decoder_chain with buffer buffer: 0x7ff010028d90, pts 99:99:99.999999999, dts 0:00:02.966666637, dur 0:00:00.033333333, size 30487, offset 947619, offset_end 1013155, flags 0x2000

00:01.648137739 GST_SCHEDULING gstpad.c:4320:gst_pad_chain_data_unchecked:<Stream-muxer:sink_0> calling chainfunction &gst_nvstreammux_chain with buffer buffer: 0x7ff01001c5f0, pts 0:00:02.966666637, dts 99:99:99.999999999, dur 0:00:00.033333333, size 64, offset none, offset_end none, flags 0x0
```

**DGX A100**
```bash
0:00:32.913377171  GST_SCHEDULING gstpad.c:4323:gst_pad_chain_data_unchecked:<nvv4l2-decoder:sink> calling chainfunction &gst_video_decoder_chain with buffer buffer: 0x7fe23432d360, pts 99:99:99.999999999, dts 0:00:00.033333333, dur 0:00:00.033333333, size 25072, offset 48337, offset_end 113873, flags 0x2000
------
0:00:32.914303327  GST_SCHEDULING gstpad.c:4323:gst_pad_chain_data_unchecked:<nvv4l2-decoder:sink> calling chainfunction &gst_video_decoder_chain with buffer buffer: 0x7fe23432d120, pts 99:99:99.999999999, dts 0:00:00.066666666, dur 0:00:00.033333333, size 8792, offset 73409, offset_end 138945, flags 0x2000

```

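Latency can be calculated from the time difference between a frame entering one element and the same frame entering the next element in the pipeline. In the V100 output above, the frame stamped 0:00:02.966666637 reaches the decoder's sink pad and then the stream muxer's sink pad about 7 ms later (00:01.648137739 - 0:00:01.641136185). In the A100 excerpt, both entries are at the decoder's sink pad for two consecutive buffers (dts 0:00:00.033333333 and 0:00:00.066666666), so their difference of ~0.93 ms (0:00:32.914303327 - 0:00:32.913377171) is the interval between successive buffers arriving at the decoder rather than an element-to-element latency. It is these timestamps that let us measure latency.

For more details, check [GStreamer's documentation on Latency](https://gstreamer.freedesktop.org/documentation/additional/design/latency.html?gi-language=c).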
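To automate the subtraction above over a large `trace.log`, a small parser helps. The sketch below is a hedged illustration, not part of the DeepStream or GStreamer tooling: the helper names `timestamp_ns`, `element_pad`, and `latency_ms` are hypothetical, and it assumes each line starts with an `H:MM:SS.nnnnnnnnn` stamp and carries an `<element:pad>` tag, as in the excerpts above. The hours field is treated as optional so that the V100 excerpt's `00:01.648137739` stamp parses as written.

```python
import re

# Timestamp at the start of a GST_DEBUG line: H:MM:SS.nnnnnnnnn. The hours field
# is optional so a truncated stamp such as "00:01.648137739" still parses.
_TS_RE = re.compile(r"^\s*(?:(\d+):)?(\d+):(\d{2})\.(\d{9})")
# The "<element:pad>" tag printed for pad-level messages, e.g. <nvv4l2-decoder:sink>.
_PAD_RE = re.compile(r"<([^:<>]+):([^<>]+)>")


def timestamp_ns(line):
    """Return the timestamp at the start of a log line, in nanoseconds."""
    m = _TS_RE.match(line)
    if m is None:
        raise ValueError("no GStreamer timestamp at the start of the line")
    hours, minutes, seconds, nanos = m.groups()
    total_s = int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)
    return total_s * 1_000_000_000 + int(nanos)


def element_pad(line):
    """Return (element, pad) from the first <element:pad> tag, or None."""
    m = _PAD_RE.search(line)
    return m.groups() if m else None


def latency_ms(upstream_line, downstream_line):
    """Milliseconds between two log lines, e.g. a frame reaching two elements."""
    return (timestamp_ns(downstream_line) - timestamp_ns(upstream_line)) / 1e6


# Shortened copies of the two V100 lines shown above.
decoder_in = ("0:00:01.641136185 GST_SCHEDULING gstpad.c:4320:gst_pad_chain_data_unchecked:"
              "<nvv4l2-decoder:sink> calling chainfunction &gst_video_decoder_chain")
muxer_in = ("00:01.648137739 GST_SCHEDULING gstpad.c:4320:gst_pad_chain_data_unchecked:"
            "<Stream-muxer:sink_0> calling chainfunction &gst_nvstreammux_chain")
print(element_pad(decoder_in), "->", element_pad(muxer_in))
print(f"{latency_ms(decoder_in, muxer_in):.3f} ms")
```

To measure a full pipeline, you would filter `trace.log` for one element's sink pad and the next element's sink pad and pair lines that carry the same frame timestamp (`dts` on the decoder input and `pts` on the muxer input in the V100 excerpt); that pairing step is left out of this sketch.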

### GPU Metrics

We can use `nvidia-smi` to explore the GPU performance metrics while our application is running. GPU utilization is particularly worth watching, and we discuss it below. Run the cell below to re-run the application while logging the output of `nvidia-smi`.

In [None]:
!nvidia-smi dmon -i 0 -s ucmt -c 8 > ../source_code/smi.log & \
python3 ../source_code/deepstream-app-1/deepstream_test_1.py '/opt/nvidia/deepstream/deepstream-6.0/samples/streams/sample_720p.h264'

We can open the `smi.log` file to investigate our utilization metrics. 

In [None]:
!cat ../source_code/smi.log

#### Understanding nvidia-smi
The cell above passes the following arguments to `nvidia-smi`:

- `dmon -i 0`
    - Runs device monitoring and reports the default metrics for the selected devices (`-i` accepts a comma-separated device list). Here we report metrics for the GPU with index 0, since that is the GPU we are using.
- `-s ucmt`
    - Selects which metrics to display. We supplied `ucmt` to request:
        - u: Utilization (SM, Memory, Encoder and Decoder Utilization in %)
        - c: Proc and Mem Clocks (in MHz)
        - m: Frame Buffer and Bar1 memory usage (in MB)
        - t: PCIe Rx and Tx Throughput in MB/s (Maxwell and above)
- `-c 8`
    - Sets the number of monitoring iterations. Here we monitor for 8 iterations.

Let's take a closer look at a few of the selected metrics, since they are particularly useful to monitor.

Utilization metrics report how busy each GPU is over time and can be used to determine how much an application is using the GPUs in the system. In particular, the `sm` column tracks the percent of the time over the past sample period during which one or more kernels were executing on the GPU. `fb` reports the GPU's frame buffer memory usage.
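Rather than reading `smi.log` by eye, the samples can be summarized in a few lines of Python. This is a sketch under assumptions: `parse_dmon` and `summarize` are hypothetical helpers, the first `#` line of the log is assumed to hold the column names (such as `sm` and `fb`), and because the exact set of columns can differ between driver versions, columns are looked up by name rather than by position.

```python
import os


def parse_dmon(text):
    """Parse `nvidia-smi dmon` output into a list of {column: value} dicts.

    The first '#' line is taken as the column names; later '#' lines (the
    units row and any repeated headers) are skipped. Non-numeric values,
    such as '-', become None.
    """
    columns = None
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.lstrip().startswith("#"):
            if columns is None:
                columns = line.lstrip().lstrip("#").split()
            continue
        if columns is None:
            continue  # data before any header: cannot label it, so skip
        row = {}
        for name, raw in zip(columns, line.split()):
            try:
                row[name] = float(raw)
            except ValueError:
                row[name] = None
        rows.append(row)
    return rows


def summarize(rows, column):
    """Mean and maximum of one column, ignoring missing samples."""
    values = [r[column] for r in rows if r.get(column) is not None]
    if not values:
        return None
    return {"mean": sum(values) / len(values), "max": max(values)}


if __name__ == "__main__":
    log_path = "../source_code/smi.log"  # written by the nvidia-smi cell above
    if os.path.exists(log_path):
        with open(log_path) as f:
            samples = parse_dmon(f.read())
        print("SM utilization (%):", summarize(samples, "sm"))
        print("Frame buffer used (MB):", summarize(samples, "fb"))
```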

### Throughput 

The throughput of the pipeline describes how much data it can process per unit of time, which tells us how many streams it can handle concurrently at a required FPS. In this set of notebooks, we mainly focus on increasing our pipeline's FPS through various optimizations.
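As a worked illustration of the relationship between FPS and stream count, the sketch below computes aggregate FPS from per-stream frame counts and a rough upper bound on how many streams could be served at a target per-stream FPS. The helper names and numbers are hypothetical, and the estimate assumes throughput does not degrade as streams are added, which real pipelines only approximate.

```python
import math


def aggregate_fps(frames_per_stream, elapsed_s):
    """Total frames processed across all streams per second of wall-clock time."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return sum(frames_per_stream) / elapsed_s


def streams_at_target(total_fps, target_fps_per_stream):
    """Rough upper bound on concurrent streams at a per-stream FPS target.

    Assumes the aggregate throughput stays constant as streams are added.
    """
    return math.floor(total_fps / target_fps_per_stream)


# Hypothetical numbers: 3 streams of 300 frames each, processed in 10 s.
total = aggregate_fps([300, 300, 300], 10.0)
print(total, streams_at_target(total, 30))
```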


## Case 1: Multi-stream cascaded network pipeline.

In this section, we will optimize a multi-stream network that was part of the problem statement in the Introduction to DeepStream notebooks.

We extend our `deepstream-test-2-app` with multi-stream functionality using the `Streammux` (`nvstreammux`) plugin.


![Pipeline](images/app-2.png)


### Benchmarking with GST-Probes


Here we import the `GETFPS` class and use its `get_fps()` method to calculate the average FPS of each stream. The class is part of the [DeepStream Python Apps Github Repository](https://github.com/NVIDIA-AI-IOT/deepstream_python_apps); we have modified it to report the average FPS every 1 s instead of every 5 s for benchmarking purposes.
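The idea behind an interval-based FPS counter such as `GETFPS` is to count frames per stream and report the average once a fixed interval has elapsed. The class below is a hypothetical sketch of that idea (`IntervalFPS` is not the repository's code); its clock is injectable so it can be exercised without a running pipeline, and `interval_s=1.0` mirrors the 1 s reporting period mentioned above.

```python
import time


class IntervalFPS:
    """Hypothetical per-stream FPS counter in the spirit of GETFPS.

    Call tick() once per frame. When `interval_s` seconds have passed since
    the window opened, it prints and returns the average FPS over that window;
    otherwise it returns None.
    """

    def __init__(self, stream_id, interval_s=1.0, clock=time.monotonic):
        self.stream_id = stream_id
        self.interval_s = interval_s
        self.clock = clock          # injectable so the class can be tested
        self.window_start = None    # time of the frame that opened the window
        self.frames = 0             # frames seen since window_start

    def tick(self):
        now = self.clock()
        if self.window_start is None:
            self.window_start = now  # the first frame only opens the window
            return None
        self.frames += 1
        elapsed = now - self.window_start
        if elapsed < self.interval_s:
            return None
        fps = self.frames / elapsed
        print(f"FPS of stream {self.stream_id}: {fps:.2f}")
        self.window_start, self.frames = now, 0  # start the next window
        return fps
```

Inside a probe, one counter per stream would be kept, for example in a dict keyed by `frame_meta.pad_index`, with `tick()` called once per frame, similar to how `fps_streams` is used in the probe below.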


In [None]:
# Import required libraries
import sys
sys.path.append('../source_code')
import gi
import configparser
gi.require_version('Gst', '1.0')
from gi.repository import GObject, Gst
from gi.repository import GLib
from ctypes import *
import time
import math
import platform
from common.bus_call import bus_call
from common.FPS import GETFPS
import pyds


# Define variables to be used later
fps_streams={}

PGIE_CLASS_ID_VEHICLE = 0
PGIE_CLASS_ID_BICYCLE = 1
PGIE_CLASS_ID_PERSON = 2
PGIE_CLASS_ID_ROADSIGN = 3

MUXER_OUTPUT_WIDTH=1920
MUXER_OUTPUT_HEIGHT=1080

TILED_OUTPUT_WIDTH=1920
TILED_OUTPUT_HEIGHT=1080
OSD_PROCESS_MODE= 0
OSD_DISPLAY_TEXT= 0
pgie_classes_str= ["Vehicle", "TwoWheeler", "Person","RoadSign"]

################ Three Stream Pipeline ###########
# Define Input and output Stream information 
num_sources = 3 
INPUT_VIDEO_1 = '/opt/nvidia/deepstream/deepstream-6.0/samples/streams/sample_720p.h264'
INPUT_VIDEO_2 = '/opt/nvidia/deepstream/deepstream-6.0/samples/streams/sample_720p.h264'
INPUT_VIDEO_3 = '/opt/nvidia/deepstream/deepstream-6.0/samples/streams/sample_720p.h264'
OUTPUT_VIDEO_NAME = "../source_code/N1/ds_out.mp4"

We define a function `make_elm_or_print_err()` to create our elements and report any errors if the creation fails.

Elements are created using the `Gst.ElementFactory.make()` function, which is part of the GStreamer library.

In [None]:
## Make an element, or print an error (and any extra detail) if creation fails
def make_elm_or_print_err(factoryname, name, printedname, detail=""):
    print("Creating", printedname)
    elm = Gst.ElementFactory.make(factoryname, name)
    if not elm:
        sys.stderr.write("Unable to create " + printedname + " \n")
        if detail:
            sys.stderr.write(detail)
    return elm

#### Initialise GStreamer and Create an Empty Pipeline

In [None]:
# Create one FPS counter per input stream
for i in range(0, num_sources):
    fps_streams["stream{0}".format(i)] = GETFPS(i)

# Standard GStreamer initialization
Gst.init(None)

# Create the Pipeline element, which holds and connects all other elements
print("Creating Pipeline \n ")
pipeline = Gst.Pipeline()

if not pipeline:
    sys.stderr.write(" Unable to create Pipeline \n")

#### Create Elements that are required for our pipeline

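Compared to the first notebook, this pipeline has three input branches (a source, parser, and decoder per stream) that are batched together by `nvstreammux`. Queues, which buffer data as it moves from one plugin to another, are added later in this notebook.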

In [None]:
########### Create Elements required for the Pipeline ########### 

######### Defining Stream 1 
# Source element for reading from the file
source1 = make_elm_or_print_err("filesrc", "file-source-1",'file-source-1')
# Since the data format in the input file is an elementary h264 stream, we need a h264parser
h264parser1 = make_elm_or_print_err("h264parse", "h264-parser-1","h264-parser-1")
# Use nvv4l2decoder for hardware-accelerated decoding on the GPU
decoder1 = make_elm_or_print_err("nvv4l2decoder", "nvv4l2-decoder-1","nvv4l2-decoder-1")
   
##########

########## Defining Stream 2 
# Source element for reading from the file
source2 = make_elm_or_print_err("filesrc", "file-source-2","file-source-2")
# Since the data format in the input file is elementary h264 stream, we need a h264parser
h264parser2 = make_elm_or_print_err("h264parse", "h264-parser-2", "h264-parser-2")
# Use nvv4l2decoder for hardware-accelerated decoding on the GPU
decoder2 = make_elm_or_print_err("nvv4l2decoder", "nvv4l2-decoder-2","nvv4l2-decoder-2")
########### 

########## Defining Stream 3
# Source element for reading from the file
source3 = make_elm_or_print_err("filesrc", "file-source-3","file-source-3")
# Since the data format in the input file is elementary h264 stream, we need a h264parser
h264parser3 = make_elm_or_print_err("h264parse", "h264-parser-3", "h264-parser-3")
# Use nvv4l2decoder for hardware-accelerated decoding on the GPU
decoder3 = make_elm_or_print_err("nvv4l2decoder", "nvv4l2-decoder-3","nvv4l2-decoder-3")
########### 
    
# Create nvstreammux instance to form batches from one or more sources.
streammux = make_elm_or_print_err("nvstreammux", "Stream-muxer","Stream-muxer") 
# Use nvinfer to run inferencing on decoder's output, behaviour of inferencing is set through config file
pgie = make_elm_or_print_err("nvinfer", "primary-inference" ,"pgie")
# Use nvtracker to give objects unique-ids
tracker = make_elm_or_print_err("nvtracker", "tracker",'tracker')
# Secondary inference for finding car color
sgie1 = make_elm_or_print_err("nvinfer", "secondary1-nvinference-engine",'sgie1')
# Secondary inference for finding car make
sgie2 = make_elm_or_print_err("nvinfer", "secondary2-nvinference-engine",'sgie2')
# Secondary inference for finding car type
sgie3 = make_elm_or_print_err("nvinfer", "secondary3-nvinference-engine",'sgie3')
# Create a tiler to composite multiple streams into a single frame
tiler=make_elm_or_print_err("nvmultistreamtiler", "nvtiler","nvtiler")
# Use convertor to convert from NV12 to RGBA as required by nvosd
nvvidconv = make_elm_or_print_err("nvvideoconvert", "convertor","nvvidconv")
# Create OSD to draw on the converted RGBA buffer
nvosd = make_elm_or_print_err("nvdsosd", "onscreendisplay","nvosd")
# Use a second convertor to convert the OSD's RGBA output for the encoder
nvvidconv2 = make_elm_or_print_err("nvvideoconvert", "convertor2","nvvidconv2")
# Encode the frames so they can be saved as a video file
encoder = make_elm_or_print_err("avenc_mpeg4", "encoder", "Encoder")
# Parse output from Encoder 
codeparser = make_elm_or_print_err("mpeg4videoparse", "mpeg4-parser", 'Code Parser')
# Create a container
container = make_elm_or_print_err("qtmux", "qtmux", "Container")
# Create Sink for storing the output 
sink = make_elm_or_print_err("filesink", "filesink", "Sink")

# # Create Sink for storing the output 
# fksink = make_elm_or_print_err("fakesink", "fakesink", "Sink")

Now that we have created the elements, we can set various properties for our pipeline.

In [None]:
############ Set properties for the Elements ############
# Set Input Video files 
source1.set_property('location', INPUT_VIDEO_1)
source2.set_property('location', INPUT_VIDEO_2)
source3.set_property('location', INPUT_VIDEO_3)
# Set Input Width , Height and Batch Size 
streammux.set_property('width', 1920)
streammux.set_property('height', 1080)
streammux.set_property('batch-size', 1)
# Timeout in microseconds to wait after the first buffer is available 
# to push the batch even if a complete batch is not formed.
streammux.set_property('batched-push-timeout', 4000000)
# Set the configuration file for each nvinfer instance
pgie.set_property('config-file-path', "../source_code/N1/dstest4_pgie_config.txt")
sgie1.set_property('config-file-path', "../source_code/N1/dstest4_sgie1_config.txt")
sgie2.set_property('config-file-path', "../source_code/N1/dstest4_sgie2_config.txt")
sgie3.set_property('config-file-path', "../source_code/N1/dstest4_sgie3_config.txt")
#Set properties of tracker from tracker_config
config = configparser.ConfigParser()
config.read('../source_code/N1/dstest4_tracker_config.txt')
config.sections()
for key in config['tracker']:
    if key == 'tracker-width' :
        tracker_width = config.getint('tracker', key)
        tracker.set_property('tracker-width', tracker_width)
    if key == 'tracker-height' :
        tracker_height = config.getint('tracker', key)
        tracker.set_property('tracker-height', tracker_height)
    if key == 'gpu-id' :
        tracker_gpu_id = config.getint('tracker', key)
        tracker.set_property('gpu_id', tracker_gpu_id)
    if key == 'll-lib-file' :
        tracker_ll_lib_file = config.get('tracker', key)
        tracker.set_property('ll-lib-file', tracker_ll_lib_file)
    if key == 'll-config-file' :
        tracker_ll_config_file = config.get('tracker', key)
        tracker.set_property('ll-config-file', tracker_ll_config_file)
    if key == 'enable-batch-process' :
        tracker_enable_batch_process = config.getint('tracker', key)
        tracker.set_property('enable_batch_process', tracker_enable_batch_process)
        
    
# Set display configurations for nvmultistreamtiler    
tiler_rows=int(2)
tiler_columns=int(2)
tiler.set_property("rows",tiler_rows)
tiler.set_property("columns",tiler_columns)
tiler.set_property("width", TILED_OUTPUT_WIDTH)
tiler.set_property("height", TILED_OUTPUT_HEIGHT)

# Set encoding properties and Sink configs
encoder.set_property("bitrate", 2000000)
sink.set_property("location", OUTPUT_VIDEO_NAME)
sink.set_property("sync", 0)
sink.set_property("async", 0)

We now add all the elements to the pipeline, link them in the required order, and create a GStreamer bus through which all pipeline messages are fed.

In [None]:
########## Add and Link Elements in the Pipeline ########## 

print("Adding elements to Pipeline \n")
pipeline.add(source1)
pipeline.add(h264parser1)
pipeline.add(decoder1)
pipeline.add(source2)
pipeline.add(h264parser2)
pipeline.add(decoder2)
pipeline.add(source3)
pipeline.add(h264parser3)
pipeline.add(decoder3)
pipeline.add(streammux)
pipeline.add(pgie)
pipeline.add(tracker)
pipeline.add(sgie1)
pipeline.add(sgie2)
pipeline.add(sgie3)
pipeline.add(tiler)
pipeline.add(nvvidconv)
pipeline.add(nvosd)
pipeline.add(nvvidconv2)
pipeline.add(encoder)
pipeline.add(codeparser)
pipeline.add(container)
pipeline.add(sink)


print("Linking elements in the Pipeline \n")

source1.link(h264parser1)
h264parser1.link(decoder1)


###### Create Sink pad and connect to decoder's source pad 
sinkpad1 = streammux.get_request_pad("sink_0")
if not sinkpad1:
    sys.stderr.write(" Unable to get the sink pad of streammux \n")
    
srcpad1 = decoder1.get_static_pad("src")
if not srcpad1:
    sys.stderr.write(" Unable to get source pad of decoder \n")
    
srcpad1.link(sinkpad1)

######

###### Create Sink pad and connect to decoder's source pad 
source2.link(h264parser2)
h264parser2.link(decoder2)

sinkpad2 = streammux.get_request_pad("sink_1")
if not sinkpad2:
    sys.stderr.write(" Unable to get the sink pad of streammux \n")
    
srcpad2 = decoder2.get_static_pad("src")
if not srcpad2:
    sys.stderr.write(" Unable to get source pad of decoder \n")
    
srcpad2.link(sinkpad2)

######

###### Create Sink pad and connect to decoder's source pad 
source3.link(h264parser3)
h264parser3.link(decoder3)

sinkpad3 = streammux.get_request_pad("sink_2")
if not sinkpad3:
    sys.stderr.write(" Unable to get the sink pad of streammux \n")
    
srcpad3 = decoder3.get_static_pad("src")
if not srcpad3:
    sys.stderr.write(" Unable to get source pad of decoder \n")
    
srcpad3.link(sinkpad3)

######


streammux.link(pgie)
pgie.link(tracker)
tracker.link(sgie1)
sgie1.link(sgie2)
sgie2.link(sgie3)
sgie3.link(tiler)
tiler.link(nvvidconv)
nvvidconv.link(nvosd)
nvosd.link(nvvidconv2)
nvvidconv2.link(encoder)
encoder.link(codeparser)
codeparser.link(container)
container.link(sink)

# Create an event loop and feed GStreamer bus messages to it
loop = GLib.MainLoop()
bus = pipeline.get_bus()
bus.add_signal_watch()
bus.connect ("message", bus_call, loop)

print("Added and Linked elements to pipeline")

Our pipeline now carries the metadata forward, but we have not done anything with it yet. As mentioned for the pipeline diagram above, we will create a callback function that writes relevant data onto each frame, and attach it as a buffer probe on the source pad of the last secondary inference element (`sgie3`) so that it is called for every batch that passes through.

In [None]:
# tiler_src_pad_buffer_probe extracts the metadata received on the pad it is attached to
# and updates the parameters for drawing rectangles, object information, etc.
def tiler_src_pad_buffer_probe(pad,info,u_data):
    # Initialize the per-class object counters to 0
    obj_counter = {
        PGIE_CLASS_ID_VEHICLE:0,
        PGIE_CLASS_ID_PERSON:0,
        PGIE_CLASS_ID_BICYCLE:0,
        PGIE_CLASS_ID_ROADSIGN:0
    }
    # Set frame_number & rectangles to draw as 0 
    frame_number=0
    num_rects=0
    
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        print("Unable to get GstBuffer ")
        return Gst.PadProbeReturn.OK

    # Retrieve batch metadata from the gst_buffer
    # Note that pyds.gst_buffer_get_nvds_batch_meta() expects the
    # C address of gst_buffer as input, which is obtained with hash(gst_buffer)
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        try:
            # Note that l_frame.data needs a cast to pyds.NvDsFrameMeta
            frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        except StopIteration:
            break
        
        # Get the frame number, number of rectangles to draw, and object metadata
        frame_number=frame_meta.frame_num
        num_rects = frame_meta.num_obj_meta
        l_obj=frame_meta.obj_meta_list
        
        while l_obj is not None:
            try:
                # Casting l_obj.data to pyds.NvDsObjectMeta
                obj_meta=pyds.NvDsObjectMeta.cast(l_obj.data)
            except StopIteration:
                break
            # Increment the object's class counter and set its box border to blue (RGBA 0.0, 0.0, 1.0, 0.0)
            obj_counter[obj_meta.class_id] += 1
            obj_meta.rect_params.border_color.set(0.0, 0.0, 1.0, 0.0)
            try: 
                l_obj=l_obj.next
            except StopIteration:
                break
        ################## Setting Metadata Display configuration ############### 
        # Acquiring a display meta object.
        display_meta=pyds.nvds_acquire_display_meta_from_pool(batch_meta)
        display_meta.num_labels = 1
        py_nvosd_text_params = display_meta.text_params[0]
        # Setting display text to be shown on screen
        py_nvosd_text_params.display_text = "Frame Number={} Number of Objects={} Vehicle_count={} Person_count={}".format(frame_number, num_rects, obj_counter[PGIE_CLASS_ID_VEHICLE], obj_counter[PGIE_CLASS_ID_PERSON])
        # Now set the offsets where the string should appear
        py_nvosd_text_params.x_offset = 10
        py_nvosd_text_params.y_offset = 12
        # Font , font-color and font-size
        py_nvosd_text_params.font_params.font_name = "Serif"
        py_nvosd_text_params.font_params.font_size = 10
        # Set(red, green, blue, alpha); Set to White
        py_nvosd_text_params.font_params.font_color.set(1.0, 1.0, 1.0, 1.0)
        # Text background color
        py_nvosd_text_params.set_bg_clr = 1
        # Set(red, green, blue, alpha); set to Black
        py_nvosd_text_params.text_bg_clr.set(0.0, 0.0, 0.0, 1.0)
        # Using pyds.get_string() to get display_text as string to print in notebook
        print(pyds.get_string(py_nvosd_text_params.display_text))
        pyds.nvds_add_display_meta_to_frame(frame_meta, display_meta)
        
        ############################################################################
        # FPS Probe
        fps_streams["stream{0}".format(frame_meta.pad_index)].get_fps()
        try:
            l_frame=l_frame.next
        except StopIteration:
            break

    return Gst.PadProbeReturn.OK


In [None]:
tiler_src_pad=sgie3.get_static_pad("src")
if not tiler_src_pad:
    sys.stderr.write(" Unable to get src pad \n")
else:
    tiler_src_pad.add_probe(Gst.PadProbeType.BUFFER, tiler_src_pad_buffer_probe, 0)

Now, with everything defined, we can start playback and listen for events.

In [None]:
print("Now playing...")
print("Starting pipeline \n")
# Start playback and listen for events
pipeline.set_state(Gst.State.PLAYING)
start_time = time.time()
try:
    loop.run()
except:
    pass
# cleanup
print("Exiting app\n")
pipeline.set_state(Gst.State.NULL)
Gst.Object.unref(pipeline)
Gst.Object.unref(bus)
print("--- %s seconds ---" % (time.time() - start_time))

In [None]:
# Convert video profile to be compatible with Jupyter notebook
!ffmpeg -loglevel panic -y -an -i ../source_code/N1/ds_out.mp4 -vcodec libx264 -pix_fmt yuv420p -profile:v baseline -level 3 ../source_code/N1/output.mp4

In [None]:
# Display the Output
from IPython.display import HTML
HTML("""
 <video width="960" height="540" controls>
 <source src="../source_code/N1/output.mp4"
 </video>
""".format())

Let us now see how buffering can help increase the FPS.

#### Queues 

The queue element adds a thread boundary to the pipeline and enables buffering. Upstream elements push buffers into the queue on one thread, and the queue pushes them downstream from its own streaming thread, so the elements on either side of it can run in parallel. Properties on the queue element, such as `max-size-buffers`, `max-size-bytes`, and `max-size-time`, limit how much data it can hold.

Let us now add queues to our pipeline and attach our callback function to a queue.

More details on queues can be found in the [GStreamer queue documentation](https://gstreamer.freedesktop.org/documentation/coreelements/queue.html).
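A queue's thread boundary can be illustrated outside GStreamer with Python's standard `queue` and `threading` modules. The sketch below is only an analogy (the function `run_two_stages` is hypothetical and does not use GStreamer): one thread runs an upstream stage and pushes results into a bounded queue, another thread pops them and runs a downstream stage, and the bound plays the role of the queue element's `max-size-buffers` limit.

```python
import queue
import threading


def run_two_stages(items, upstream, downstream, max_size=4):
    """Analogy for a GStreamer queue: two stages on separate threads."""
    buf = queue.Queue(maxsize=max_size)  # bounded, like max-size-buffers
    eos = object()                        # sentinel standing in for end-of-stream
    results = []

    def producer():
        for item in items:
            buf.put(upstream(item))  # blocks while the queue is full
        buf.put(eos)

    def consumer():
        while True:
            item = buf.get()         # blocks while the queue is empty
            if item is eos:
                break
            results.append(downstream(item))

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(run_two_stages(range(5), lambda x: x * 2, lambda x: x + 1))
```

Because each stage runs on its own thread, a slow downstream stage does not stall the upstream one while the queue still has room; in GStreamer, this overlap between elements is what queues contribute to throughput.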

In [None]:
!python3 ../source_code/utils/deepstream-osd-queue.py

### Effects on OSD, Tiler, and Queues

In the pipeline above, the OSD (on-screen display) and tiler elements can slow the pipeline down. We can design the pipeline so that it produces the inference metadata without generating visual output. This is particularly useful on edge devices that only need to send real-time inference metadata to a cloud server for further processing.
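When the visual output is dropped, the per-frame counts that the probe already computes can be forwarded as a compact message instead of being drawn. The sketch below assumes a hypothetical JSON schema and helper name (`frame_payload`); in a real probe, the class IDs would come from `obj_meta.class_id`, and transmitting the message to a server is out of scope here. The class names follow the order of `pgie_classes_str` defined earlier.

```python
import json

# Class names in class-id order, matching pgie_classes_str earlier in this notebook.
PGIE_CLASSES = ["Vehicle", "TwoWheeler", "Person", "RoadSign"]


def frame_payload(stream_id, frame_number, class_ids):
    """Build a compact JSON message of per-class object counts for one frame.

    The message layout is a hypothetical example; use whatever schema the
    receiving service expects.
    """
    counts = {name: 0 for name in PGIE_CLASSES}
    for class_id in class_ids:
        counts[PGIE_CLASSES[class_id]] += 1
    message = {"stream": stream_id, "frame": frame_number, "counts": counts}
    return json.dumps(message, separators=(",", ":"))


print(frame_payload(0, 42, [0, 0, 2]))
```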

#### Disabling OSD & Tiler

We will now design a pipeline that does not include the on-screen display or tiler elements. For simplicity, we have bundled all the code into one Python file. Open the file [here](../source_code/utils/deepstream-no-osd.py).

In [None]:
!python3 ../source_code/utils/deepstream-no-osd.py

As the total time printed shows, both adding queues and removing the OSD improved the throughput of the pipeline. Let us combine both methods and see whether we can achieve any further performance gain. For simplicity, we have bundled all the code into one Python file. Open the file [here](../source_code/utils/deepstream-no-osd-queue.py).

In [None]:
!python3 ../source_code/utils/deepstream-no-osd-queue.py --num-sources 3

## Summary 

Let us summarise our above benchmarks using a table.

|Pipeline|Speedup over default (V100)|Speedup over default (A100)|
|---|---|---|
|Default Pipeline|baseline|baseline|
|With Queues|~3x|~3.1x|
|Without OSD|~3.1x|10x|
|With Queues and without OSD|~3.15x|~10.12x|

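Assuming the ratios in the table above are the default pipeline's run time divided by each variant's run time, they can be computed from console output. The sketch below assumes each run prints a `--- N seconds ---` line like the one printed at the end of the default pipeline's run; whether the scripts in `../source_code/utils` print the same line is an assumption. `elapsed_seconds` and `speedup` are hypothetical helpers, and the example outputs are made up for illustration.

```python
import re

# Matches a "--- <seconds> seconds ---" line like the one printed after the default run.
_ELAPSED_RE = re.compile(r"---\s*([0-9]*\.?[0-9]+)\s*seconds\s*---")


def elapsed_seconds(output):
    """Return the last elapsed time reported in a run's console output."""
    matches = _ELAPSED_RE.findall(output)
    if not matches:
        raise ValueError("no '--- N seconds ---' line found")
    return float(matches[-1])


def speedup(baseline_s, variant_s):
    """How many times faster the variant ran than the baseline."""
    return baseline_s / variant_s


default_out = "Exiting app\n--- 30.0 seconds ---"  # hypothetical console output
queued_out = "Exiting app\n--- 10.0 seconds ---"   # hypothetical console output
print(f"{speedup(elapsed_seconds(default_out), elapsed_seconds(queued_out)):.2f}x")
```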

We can now move on to benchmarking our code further using Nsight Systems in the next notebook.

## Licensing
  
This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0).
