&emsp;&emsp;&emsp;&emsp;&emsp;
&emsp;&emsp;&emsp;&emsp;&emsp;
&emsp;&emsp;&emsp;&emsp;&emsp;
&emsp;&emsp;&emsp;&emsp;&emsp;
&emsp;&emsp;&emsp;&emsp;&emsp;
&emsp;&ensp;
[Home Page](Start_Here.ipynb)
    
    
[Previous Notebook](Introduction_to_Performance_analysis.ipynb)
&emsp;&emsp;&emsp;&emsp;&emsp;
&emsp;&emsp;&emsp;&emsp;&emsp;
&emsp;&emsp;&emsp;&emsp;&emsp;
&emsp;&emsp;&emsp;&emsp;&ensp;
[1](Introduction_to_Performance_analysis.ipynb)
[2](Performance_Analysis_using_NSight_systems.ipynb)
[3]
&emsp;&emsp;&emsp;&emsp;&emsp;
&emsp;&emsp;&emsp;&emsp;&emsp;
&emsp;&emsp;&emsp;&emsp;&emsp;
&emsp;&emsp;&emsp;&emsp;&emsp;


In the previous notebooks, we optimized the multi-stream version of DeepStream Test App 2. In this notebook, we will work on a different pipeline and optimize it further.


- [Case 2: COVID-19 Social Distancing Application.](#Case-2:-COVID-19-Social-Distancing-Application.)
    - [Finding distance between 2 people](#Finding-distance-between-2-people)
- [Solving the computational bottleneck](#Solving-the-computational-bottleneck)
- [Jetson specific optimizations](#Jetson-specific-optimizations)
- [Summary](#Summary)
    

#### Case 2: COVID-19 Social Distancing Application.


The COVID-19 Social Distancing application can be built from `deepstream-test-app1` by adding metadata processing that determines whether two people have come into close contact and violated social distancing norms.

![image](images/covid.png)

##### Finding distance between 2 people

Because we view people through a camera, a perspective correction is necessary: people who are far away look smaller and appear closer together in pixel space. It is therefore important to approximate the world-space distance between persons.


We **assume** that the average human is 170 cm tall and use this value to normalize the bounding-box height. The resulting scale is then used to convert the pixel-space distance between objects into centimetres.
We define the distance between two persons, given their bounding-box centroids (x, y) and bounding-box heights (h), as follows.


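```python
import math

# (x1, y1) and (x2, y2) are the bounding-box centroids and h1, h2 the
# bounding-box heights of the two persons, all in pixels.

# Pixel distance
dx = x2 - x1
dy = y2 - y1
# Pixel to real-world conversion using an average height of 170 cm
lx = dx * 170 * (1/h1 + 1/h2) / 2
ly = dy * 170 * (1/h1 + 1/h2) / 2
l = math.sqrt(lx*lx + ly*ly)
```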
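
As a worked example of the formula, the following minimal sketch wraps it in a function and applies it to two made-up pairs of boxes. The function name `estimate_distance_cm` and the box values are hypothetical and chosen only for illustration. Both pairs are 200 pixels apart, but the pair with the smaller boxes (farther from the camera) maps to a larger real-world distance.

```python
import math

AVG_HEIGHT_CM = 170.0  # assumed average human height, as stated in the text


def estimate_distance_cm(p1, p2):
    """Approximate the real-world distance (cm) between two (x, y, h) boxes."""
    x1, y1, h1 = p1
    x2, y2, h2 = p2
    # Average centimetres-per-pixel scale derived from the two box heights
    scale = AVG_HEIGHT_CM * (1 / h1 + 1 / h2) / 2
    return math.hypot((x2 - x1) * scale, (y2 - y1) * scale)


# Hypothetical boxes: both pairs are 200 px apart horizontally
near_pair = ((100, 300, 340), (300, 300, 340))  # tall boxes, close to the camera
far_pair = ((100, 300, 85), (300, 300, 85))     # short boxes, far from the camera

print(round(estimate_distance_cm(*near_pair), 1))  # 100.0
print(round(estimate_distance_cm(*far_pair), 1))   # 400.0
```

With the 160 cm threshold used later in this notebook, only the first pair would count as a violation.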

Limitations: The above method approximates the 3D distance between persons from a single 2D camera view. As expected, this has limitations, and it works best only when the persons' bodies are perpendicular to the camera. These limitations can be removed by using multiple cameras and camera calibration data to approximate the persons' 3D locations.

Let us now start building our pipeline with this assumption in mind.

In [None]:
# Import Required Libraries 
import sys
sys.path.append('../source_code')

import gi
import time
gi.require_version('Gst', '1.0')
from gi.repository import GObject, Gst , GLib
from common.bus_call import bus_call
import pyds
import math

# Defining the Class Labels
PGIE_CLASS_ID_VEHICLE = 0
PGIE_CLASS_ID_BICYCLE = 1
PGIE_CLASS_ID_PERSON = 2
PGIE_CLASS_ID_ROADSIGN = 3

# Defining the input output video file 
INPUT_VIDEO_NAME  = 'file:///opt/nvidia/deepstream/deepstream-6.0/python/source_code/dataset/wt.mp4'
OUTPUT_VIDEO_NAME = "../source_code/N3/ds_out.mp4"

In [None]:
def cb_newpad(decodebin, decoder_src_pad,data):
    print("In cb_newpad\n")
    caps=decoder_src_pad.get_current_caps()
    gststruct=caps.get_structure(0)
    gstname=gststruct.get_name()
    source_bin=data
    features=caps.get_features(0)

    # Need to check if the pad created by the decodebin is for video and not
    # audio.
    print("gstname=",gstname)
    if(gstname.find("video")!=-1):
        # Link the decodebin pad only if decodebin has picked nvidia
        # decoder plugin nvdec_*. We do this by checking if the pad caps contain
        # NVMM memory features.
        print("features=",features)
        if features.contains("memory:NVMM"):
            # Get the source bin ghost pad
            bin_ghost_pad=source_bin.get_static_pad("src")
            if not bin_ghost_pad.set_target(decoder_src_pad):
                sys.stderr.write("Failed to link decoder src pad to source bin ghost pad\n")
        else:
            sys.stderr.write(" Error: Decodebin did not pick nvidia decoder plugin.\n")

def decodebin_child_added(child_proxy,Object,name,user_data):
    print("Decodebin child added:", name, "\n")
    if(name.find("decodebin") != -1):
        Object.connect("child-added",decodebin_child_added,user_data)   

def create_source_bin(index,uri):
    print("Creating source bin")

    # Create a source GstBin to abstract this bin's content from the rest of the
    # pipeline
    bin_name="source-bin-%02d" %index
    print(bin_name)
    nbin=Gst.Bin.new(bin_name)
    if not nbin:
        sys.stderr.write(" Unable to create source bin \n")

    # Source element for reading from the uri.
    # We will use decodebin and let it figure out the container format of the
    # stream and the codec and plug the appropriate demux and decode plugins.
    uri_decode_bin=Gst.ElementFactory.make("uridecodebin", "uri-decode-bin")
    if not uri_decode_bin:
        sys.stderr.write(" Unable to create uri decode bin \n")
    # We set the input uri to the source element
    uri_decode_bin.set_property("uri",uri)
    # Connect to the "pad-added" signal of the decodebin which generates a
    # callback once a new pad for raw data has been created by the decodebin
    uri_decode_bin.connect("pad-added",cb_newpad,nbin)
    uri_decode_bin.connect("child-added",decodebin_child_added,nbin)

    # We need to create a ghost pad for the source bin which will act as a proxy
    # for the video decoder src pad. The ghost pad will not have a target right
    # now. Once the decode bin creates the video decoder and generates the
    # cb_newpad callback, we will set the ghost pad target to the video decoder
    # src pad.
    Gst.Bin.add(nbin,uri_decode_bin)
    bin_pad=nbin.add_pad(Gst.GhostPad.new_no_target("src",Gst.PadDirection.SRC))
    if not bin_pad:
        sys.stderr.write(" Failed to add ghost pad in source bin \n")
        return None
    return nbin

## Make Element or Print Error and any other detail
def make_elm_or_print_err(factoryname, name, printedname, detail=""):
    print("Creating", printedname)
    elm = Gst.ElementFactory.make(factoryname, name)
    if not elm:
        sys.stderr.write("Unable to create " + printedname + " \n")
        # Print the extra detail only when element creation failed
        if detail:
            sys.stderr.write(detail)
    return elm

In [None]:
# Standard GStreamer initialization
Gst.init(None)


# Create gstreamer elements
# Create Pipeline element that will form a connection of other elements
print("Creating Pipeline \n ")
pipeline = Gst.Pipeline()

if not pipeline:
    sys.stderr.write(" Unable to create Pipeline \n")

In [None]:
########### Create Elements required for the Pipeline ########### 

# Create nvstreammux instance to form batches from one or more sources.
streammux = make_elm_or_print_err("nvstreammux", "Stream-muxer","Stream-muxer") 

pipeline.add(streammux)

num_sources = 1 
for i in range(num_sources):
    print("Creating source_bin ",i," \n ")
    uri_name=INPUT_VIDEO_NAME
    if uri_name.find("rtsp://") == 0 :
        is_live = True
    source_bin=create_source_bin(i, uri_name)
    if not source_bin:
        sys.stderr.write("Unable to create source bin \n")
    pipeline.add(source_bin)
    padname="sink_%u" %i
    sinkpad = streammux.get_request_pad(padname) 
    if not sinkpad:
        sys.stderr.write("Unable to create sink pad bin \n")
    srcpad = source_bin.get_static_pad("src")
    if not srcpad:
        sys.stderr.write("Unable to create src pad bin \n")
    srcpad.link(sinkpad)


# Use nvinfer to run inferencing on decoder's output, behaviour of inferencing is set through config file
pgie = make_elm_or_print_err("nvinfer", "primary-inference" ,"pgie")
# Use convertor to convert from NV12 to RGBA as required by nvosd
nvvidconv = make_elm_or_print_err("nvvideoconvert", "convertor","nvvidconv")
# Create OSD to draw on the converted RGBA buffer
nvosd = make_elm_or_print_err("nvdsosd", "onscreendisplay","nvosd")
# Queue to decouple the OSD from the encoding branch
queue = make_elm_or_print_err("queue", "queue", "Queue")
# Use a second convertor to convert the OSD output to a format the encoder accepts
nvvidconv2 = make_elm_or_print_err("nvvideoconvert", "convertor2","nvvidconv2")
# Encode the OSD output so that it can be saved as a video file
encoder = make_elm_or_print_err("avenc_mpeg4", "encoder", "Encoder")
# Parse output from Encoder 
codeparser = make_elm_or_print_err("mpeg4videoparse", "mpeg4-parser", 'Code Parser')
# Create a container
container = make_elm_or_print_err("qtmux", "qtmux", "Container")
# Create Sink for storing the output 
sink = make_elm_or_print_err("filesink", "filesink", "Sink")

In [None]:
############ Set properties for the Elements ############
print("Playing file ",INPUT_VIDEO_NAME)
# Set Input Width , Height and Batch Size 
streammux.set_property('width', 1920)
streammux.set_property('height', 1080)
streammux.set_property('batch-size', 1)
# Timeout in microseconds to wait after the first buffer is available 
# to push the batch even if a complete batch is not formed.
streammux.set_property('batched-push-timeout', 4000000)
# Set Configuration file for nvinfer 
pgie.set_property('config-file-path', "../source_code/N3/dstest1_pgie_config.txt")
# Set Encoder bitrate for output video
encoder.set_property("bitrate", 2000000)
# Set Output file name and disable sync and async
sink.set_property("location", OUTPUT_VIDEO_NAME)
sink.set_property("sync", 0)
sink.set_property("async", 0)

In [None]:
########## Add and Link Elements in the Pipeline ########## 

print("Adding elements to Pipeline \n")

pipeline.add(pgie)
pipeline.add(nvvidconv)
pipeline.add(nvosd)
pipeline.add(queue)
pipeline.add(nvvidconv2)
pipeline.add(encoder)
pipeline.add(codeparser)
pipeline.add(container)
pipeline.add(sink)


# Linking elements to the Pipeline
print("Linking elements to Pipeline \n")

streammux.link(pgie)
pgie.link(nvvidconv)
nvvidconv.link(nvosd)
nvosd.link(queue)
queue.link(nvvidconv2)
nvvidconv2.link(encoder)
encoder.link(codeparser)
codeparser.link(container)
container.link(sink)

# Create an event loop and feed GStreamer bus messages to it
loop = GLib.MainLoop()
bus = pipeline.get_bus()
bus.add_signal_watch()
bus.connect ("message", bus_call, loop)

print("Created event loop")

In [None]:
############# Define Computation required for our pipeline #################

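def compute_dist(p1, p2):
    # Each point is (centroid_x, centroid_y, bbox_height) in pixels
    (x1, y1, h1) = p1
    (x2, y2, h2) = p2
    dx = x2 - x1
    dy = y2 - y1

    # Pixel to real-world conversion using an average height of 170 cm
    lx = dx * 170 * (1/h1 + 1/h2) / 2
    ly = dy * 170 * (1/h1 + 1/h2) / 2

    l = math.sqrt(lx*lx + ly*ly)
    return l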


def get_min_distances(centroids):
    mini=[]
    for i in range(len(centroids)):
        distance=[]
        for j in range(len(centroids)):
            distance.append(compute_dist(centroids[i],centroids[j]))
        distance[i]=10000000
        mini.append(min(distance))
    return mini


def visualize(objs):
    violations = 0 
    dist_threshold = 160  # Distance in cms
    for obj in objs:
        min_dist = obj["min_dist"]
        redness_factor = 1.5
        # Color channels are expected in the range 0.0-1.0, so scale and clamp
        r_channel = min(max((dist_threshold - min_dist) / dist_threshold, 0) * redness_factor, 1.0)
        g_channel = 1.0 - r_channel
        b_channel = 0.0
        obj_meta = obj["obj_meta"]
        obj_meta.rect_params.border_color.red = r_channel
        obj_meta.rect_params.border_color.green = g_channel
        obj_meta.rect_params.border_color.blue = b_channel
        obj["violated"] = (min_dist < dist_threshold)
        violations = violations + int(min_dist < dist_threshold)
    return violations

def get_centroid(rect):

    xmin = rect.left
    xmax = rect.left + rect.width
    ymin = rect.top
    ymax = rect.top + rect.height
    centroid_x = (xmin + xmax) / 2
    centroid_y = (ymin + ymax) / 2

    return (centroid_x, centroid_y, rect.height)

def compute_min_distances_cpp(objs):
    centroids = [o["centroid"] for o in objs]    
    min_distances = get_min_distances(centroids)
    for o in range(len(objs)):
        objs[o]["min_dist"] = min_distances[o]



############## Working with the Metadata ################

def osd_sink_pad_buffer_probe(pad,info,u_data):
    # Initializing object counter with 0.
    obj_counter = {
        PGIE_CLASS_ID_VEHICLE:0,
        PGIE_CLASS_ID_PERSON:0,
        PGIE_CLASS_ID_BICYCLE:0,
        PGIE_CLASS_ID_ROADSIGN:0
    }
    # Set frame_number & rectangles to draw as 0 
    frame_number=0
    num_rects=0
    
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        print("Unable to get GstBuffer ")
        return

    # Retrieve batch metadata from the gst_buffer
    # Note that pyds.gst_buffer_get_nvds_batch_meta() expects the
    # C address of gst_buffer as input, which is obtained with hash(gst_buffer)
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        try:
            # Note that l_frame.data needs a cast to pyds.NvDsFrameMeta
            frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        except StopIteration:
            break
        objects=[]
        # Get frame number, number of rectangles to draw and object metadata
        frame_number=frame_meta.frame_num
        num_rects = frame_meta.num_obj_meta
        l_obj=frame_meta.obj_meta_list
        
        while l_obj is not None:
            try:
                # Casting l_obj.data to pyds.NvDsObjectMeta
                obj_meta=pyds.NvDsObjectMeta.cast(l_obj.data)
            except StopIteration:
                break
            # Increment the counter for this object's class and set the box border color (RGBA) to blue
            obj_counter[obj_meta.class_id] +=1
            obj_meta.rect_params.border_color.set(0.0, 0.0, 1.0, 0.0)
            
            if (obj_meta.class_id == PGIE_CLASS_ID_PERSON):
                obj = {}
                obj["tracker_id"] = obj_meta.object_id
                obj["unique_id"] = obj_meta.unique_component_id
                obj["centroid"] = get_centroid(obj_meta.rect_params)
                obj["obj_meta"] = obj_meta
                objects.append(obj)
            else:
                obj_meta.rect_params.border_width = 0

            try: 
                l_obj=l_obj.next
            except StopIteration:
                break
        # Get the number of violations 
        compute_min_distances_cpp(objects)
        violations = visualize(objects)
        ################## Setting Metadata Display configuration ############### 
        # Acquiring a display meta object.
        display_meta=pyds.nvds_acquire_display_meta_from_pool(batch_meta)
        display_meta.num_labels = 1
        py_nvosd_text_params = display_meta.text_params[0]
        # Setting display text to be shown on screen
        py_nvosd_text_params.display_text = "Frame Number={} Number of Objects={} Vehicle_count={} Person_count={} Violations={}".format(frame_number, num_rects, obj_counter[PGIE_CLASS_ID_VEHICLE], obj_counter[PGIE_CLASS_ID_PERSON],violations)
        # Now set the offsets where the string should appear
        py_nvosd_text_params.x_offset = 10
        py_nvosd_text_params.y_offset = 12
        # Font , font-color and font-size
        py_nvosd_text_params.font_params.font_name = "Serif"
        py_nvosd_text_params.font_params.font_size = 10
        # Set(red, green, blue, alpha); Set to White
        py_nvosd_text_params.font_params.font_color.set(1.0, 1.0, 1.0, 1.0)
        # Text background color
        py_nvosd_text_params.set_bg_clr = 1
        # Set(red, green, blue, alpha); set to Black
        py_nvosd_text_params.text_bg_clr.set(0.0, 0.0, 0.0, 1.0)
        # Using pyds.get_string() to get display_text as string to print in notebook
        print(pyds.get_string(py_nvosd_text_params.display_text))
        pyds.nvds_add_display_meta_to_frame(frame_meta, display_meta)
        
        ############################################################################

        try:
            l_frame=l_frame.next
        except StopIteration:
            break
    return Gst.PadProbeReturn.OK

In [None]:
# Let's add a probe to be informed of the generated metadata. We add the probe to the sink pad
# of the osd element, since by that time the buffer will have received all the metadata.

osdsinkpad = nvosd.get_static_pad("sink")
if not osdsinkpad:
    sys.stderr.write(" Unable to get sink pad of nvosd \n")
    
osdsinkpad.add_probe(Gst.PadProbeType.BUFFER, osd_sink_pad_buffer_probe, 0)

print("Probe added")

In [None]:
# start play back and listen to events
print("Starting pipeline \n")
start_time = time.time()
pipeline.set_state(Gst.State.PLAYING)
try:
    loop.run()
except:
    pass
# cleanup
pipeline.set_state(Gst.State.NULL)
print("--- %s seconds ---" % (time.time() - start_time))

In [None]:
# Convert video profile to be compatible with Jupyter notebook
!ffmpeg -loglevel panic -y -an -i ../source_code/N3/ds_out.mp4 -vcodec libx264 -pix_fmt yuv420p -profile:v baseline -level 3 ../source_code/N3/output.mp4

In [None]:
# Display the Output
from IPython.display import HTML
HTML("""
 <video width="640" height="480" controls>
 <source src="../source_code/N3/output.mp4" type="video/mp4">
 </video>
""")

Let us now run multiple streams concurrently and benchmark the performance we obtain from this.

In [None]:
!python3 ../source_code/utils/deepstream-covid-19.py --num-sources 32

In [None]:
!nsys profile --force-overwrite true -o ../source_code/reports/report4 python3 ../source_code/utils/deepstream-covid-19.py --num-sources 32 --prof True

Download and save the report file by holding down <mark>Shift</mark> and <mark>Right-Clicking</mark> [Here](../source_code/reports/report4.qdrep)

![report4](images/report4.PNG)

#### Solving the computational bottleneck

In this profile, the bottleneck has shifted to NV Decode. On hardware that can decode many inputs concurrently, such as the A100, NV Decode would not be the limiting stage. Instead, `queue3`, which carries the computation of the distance between people, would become the bottleneck because it takes a long time to execute (~48 ms in this case). In such cases, we can reduce the computation time by moving the computation to C++ or CUDA. Here is one such example where we use C++ to run the distancing algorithm.
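
To see why this stage grows expensive, the following rough sketch times the same all-pairs loop used in `get_min_distances` on synthetic input. The helper `time_min_distances` and the random boxes are hypothetical and only for illustration. Timings depend on the machine, and this sketch does not reproduce the ~48 ms figure from the profile; it only shows that the cost grows roughly with the square of the number of people in a frame.

```python
import math
import random
import time


def compute_dist(p1, p2):
    (x1, y1, h1), (x2, y2, h2) = p1, p2
    scale = 170 * (1 / h1 + 1 / h2) / 2
    return math.hypot((x2 - x1) * scale, (y2 - y1) * scale)


def get_min_distances(centroids):
    # Same all-pairs loop as in the notebook: n * n distance calls per frame
    mini = []
    for i in range(len(centroids)):
        distance = [compute_dist(centroids[i], c) for c in centroids]
        distance[i] = 10000000  # ignore the distance of a person to itself
        mini.append(min(distance))
    return mini


def time_min_distances(num_people, seed=0):
    """Return the seconds spent on one frame with num_people random boxes."""
    rng = random.Random(seed)
    centroids = [
        (rng.uniform(0, 1920), rng.uniform(0, 1080), rng.uniform(50, 400))
        for _ in range(num_people)
    ]
    start = time.perf_counter()
    get_min_distances(centroids)
    return time.perf_counter() - start


for n in (10, 20, 40, 80):
    print(n, "people:", round(time_min_distances(n) * 1000, 3), "ms")
```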

In [None]:
! cd ../source_code/distancing && cmake . && make 

In [None]:
!python3 ../source_code/utils/deepstream-covid-19-cpp.py --num-sources 32

In [None]:
!nsys profile --force-overwrite true -o ../source_code/reports/report5 python3 ../source_code/utils/deepstream-covid-19-cpp.py --num-sources 32 --prof True

Download and save the report file by holding down <mark>Shift</mark> and <mark>Right-Clicking</mark> [Here](../source_code/reports/report5.qdrep)

![report5](images/report5.PNG)

Here we can see that moving the computation from Python to C++ reduces the computation time (from ~48 ms to ~32 ms). This can be optimized further and, if necessary, extended to CUDA.
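
One possible algorithmic optimization, independent of the implementation language, is to exploit the symmetry of the distance: the distance from person i to person j equals the distance from j to i, so each pair only needs to be evaluated once. The sketch below is an illustration in Python and is not the code used by the C++ example. The function name `get_min_distances_symmetric` is hypothetical.

```python
import math


def compute_dist(p1, p2):
    (x1, y1, h1), (x2, y2, h2) = p1, p2
    scale = 170 * (1 / h1 + 1 / h2) / 2
    return math.hypot((x2 - x1) * scale, (y2 - y1) * scale)


def get_min_distances_symmetric(centroids, no_neighbour=10000000):
    """Minimum distance per person, evaluating each pair only once."""
    mini = [no_neighbour] * len(centroids)
    for i in range(len(centroids)):
        # Start at i + 1: pairs (j, i) with j < i were already handled
        for j in range(i + 1, len(centroids)):
            d = compute_dist(centroids[i], centroids[j])
            if d < mini[i]:
                mini[i] = d
            if d < mini[j]:
                mini[j] = d
    return mini


# Hypothetical (x, y, h) boxes for three people
people = [(100, 300, 340), (300, 300, 340), (900, 320, 330)]
print(get_min_distances_symmetric(people))
```

This roughly halves the number of distance evaluations while returning the same values as the all-pairs loop.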



### Jetson specific optimizations

#### Power 

For Jetson devices, it is recommended to set the power mode to Max power.

The Max power mode can be enabled using the following command:
```bash
$ sudo nvpmodel -m 0
```
The GPU clocks can be set to their maximum using the following command:

```bash
$ sudo jetson_clocks
```

For information about supported power modes, see "Supported Modes and Power Efficiency" in the power management topics of NVIDIA Tegra Linux Driver Package Development Guide, e.g., "Power Management for Jetson AGX Xavier Devices."


For Jetson devices, the details regarding the memory and compute usage can be queried using the following command.

```bash
$ tegrastats
```

This command cannot be run inside the Docker container and needs to be run in a separate terminal.

#### Deep Learning Accelerators

Jetson AGX Xavier and Jetson NX each support 2 DLA engines. DeepStream supports inferencing on the GPU and the DLAs in parallel, either in separate processes or in a single process. You will need three separate sets of configs, configured to run on the GPU, DLA0, and DLA1.

More details on this can be found [here](https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_Performance.html#running-applications-using-dla) 
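
As a rough sketch of what "three separate sets of configs" could look like, the snippet below derives three variants of an nvinfer `[property]` section from one base text. It assumes the nvinfer config keys `enable-dla` and `use-dla-core`. The base config shown here is a hypothetical minimal example, and the helper name `make_engine_configs` is made up for illustration. A real setup would also need per-engine model and engine file settings, which are not covered here.

```python
import configparser
import io

# Hypothetical minimal base config; a real nvinfer config has more keys
BASE_CONFIG = """[property]
gpu-id=0
batch-size=1
"""

# Overrides per engine, assuming the nvinfer keys enable-dla / use-dla-core
ENGINE_OVERRIDES = {
    "gpu": {"enable-dla": "0"},
    "dla0": {"enable-dla": "1", "use-dla-core": "0"},
    "dla1": {"enable-dla": "1", "use-dla-core": "1"},
}


def make_engine_configs(base_text):
    """Return {engine_name: config_text} with the DLA settings applied."""
    configs = {}
    for engine, overrides in ENGINE_OVERRIDES.items():
        parser = configparser.ConfigParser()
        parser.optionxform = str  # keep key names exactly as written
        parser.read_string(base_text)
        parser["property"].update(overrides)
        buffer = io.StringIO()
        parser.write(buffer, space_around_delimiters=False)
        configs[engine] = buffer.getvalue()
    return configs


for engine, text in make_engine_configs(BASE_CONFIG).items():
    print("#", engine)
    print(text)
```

Each resulting text could then be saved to its own file and passed to a separate `nvinfer` instance through its `config-file-path` property.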

### Summary

In this notebook, we learned some techniques to optimize a DeepStream application, looked at computational bottlenecks that a user may encounter, and discussed one way of solving them.

## Licensing
  
This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0).
