<center><img src="images/DLI Header.png" alt="Header" width="400"></center>

# 1.0 Object Detection Application
In this notebook, you'll work with the `deepstream-test1` reference application to find objects in a video stream, annotate them with bounding boxes, and output the annotated stream along with a count of the objects found.

<img src="images/01_threethingsio.png">

You'll follow the steps below to build your own applications based on the reference app:

**[1.1 Build a Basic DeepStream Pipeline](#1.1-Build-a-Basic-DeepStream-Pipeline)**<br>
&nbsp; &nbsp; &nbsp; [1.1.1 Sample Application - `deepstream-test1`](#1.1.1-Sample-Application---deepstream-test1)<br>
&nbsp; &nbsp; &nbsp; [1.1.2 Sample Application Plus RTSP - `deepstream-test1-rtsp-out`](#1.1.2-Sample-Application-Plus-RTSP---deepstream-test1-rtsp-out)<br>
&nbsp; &nbsp; &nbsp; [1.1.3 Putting the Pipeline Together](#1.1.3-Putting-the-Pipeline-Together)<br>
&nbsp; &nbsp; &nbsp; [1.1.4 Exercise: Run the Base Application](#1.1.4-Exercise:-Run-the-Base-Application)<br>
**[1.2 Configure an Object Detection Model](#1.2-Configure-an-Object-Detection-Model)**<br>
&nbsp; &nbsp; &nbsp; [1.2.1 `Gst-nvinfer` Configuration File](#1.2.1-Gst-nvinfer-Configuration-File)<br>
&nbsp; &nbsp; &nbsp; [1.2.2 Exercise: Detect Only Two Object Types](#1.2.2-Exercise:-Detect-Only-Two-Object-Types)<br>

# 1.1 Build a Basic DeepStream Pipeline
The framework used to build a DeepStream application is a GStreamer **pipeline** consisting of a video input stream, a series of **elements** or **plugins** to process the stream, and an output stream that carries the resulting insights.  Plugins in the middle of the pipeline are sometimes referred to as **filters**, because they have both an input, called the **sink**, and an output, called the **source**. The plugins at the start and end of the pipeline have only a source or only a sink, and are referred to generally as source or sink plugins.

In the pipeline, the source **pad** of one plugin connects to the sink pad of the next in line.  Along with the video itself, the data passed downstream includes information extracted during processing, the **metadata**, which can be used to annotate the video and to derive other insights about the input stream.

<img src="images/01_building_blocks.png">
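
As a loose analogy only, the source, filter, and sink roles can be pictured with plain Python generators. This is not the GStreamer API; the function names and the fake detection data below are made up for illustration.

```python
# Conceptual analogy only: plain Python generators, not GStreamer.
# All names and the fake "detections" are hypothetical.

def file_source(num_frames):
    """Source: has only an output; it produces buffers."""
    for frame_number in range(num_frames):
        yield {"frame": frame_number, "metadata": {}}


def detector_filter(buffers):
    """Filter: consumes buffers on its input (sink) and emits them on its
    output (source), attaching metadata along the way."""
    for buf in buffers:
        # Pretend that every even-numbered frame contains one vehicle.
        buf["metadata"]["num_objects"] = 1 if buf["frame"] % 2 == 0 else 0
        yield buf


def display_sink(buffers):
    """Sink: has only an input; it consumes buffers and ends the pipeline."""
    return [(buf["frame"], buf["metadata"]["num_objects"]) for buf in buffers]


# "Link" the stages: each stage's output feeds the next stage's input.
results = display_sink(detector_filter(file_source(4)))
print(results)
```

In the real pipeline, the buffers are video frames and the metadata is attached by plugins such as the inference engine.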

## 1.1.1 Sample Application - `deepstream-test1`
The DeepStream SDK includes plugins for building a pipeline, as well as reference applications. For example, the `deepstream-test1` reference application can take a street scene video file as input, use object detection to find vehicles, people, bicycles, and road signs within the video, and output a video stream with bounding boxes around the objects found.

<img src="images/01_exampleio2.png">

The reference test applications are in the `sources` folder of the DeepStream SDK, which is located at `/opt/nvidia/deepstream/deepstream`.  This is linked in your workspace as simply `deepstream`. 

You can open and look at the Python code for the `deepstream-test1` app at [deepstream/sources/deepstream_python_apps/apps/deepstream-test1/deepstream_test_1.py](deepstream/sources/deepstream_python_apps/apps/deepstream-test1/deepstream_test_1.py)

Looking at the code, we can find where all the plugins are instantiated in the `main()` definition using the `Gst.ElementFactory.make()` method.  This is a good way to see exactly which plugins are in the pipeline *(Note: the sample snippets shown are abbreviated for clarity)*:

```python
    # Create gstreamer elements
    # Create Pipeline element that will form a connection of other elements
    print("Creating Pipeline \n ")
    pipeline = Gst.Pipeline()

    # Source element for reading from the file
    print("Creating Source \n ")
    source = Gst.ElementFactory.make("filesrc", "file-source")

    # Since the data format in the input file is elementary h264 stream,
    # we need a h264parser
    print("Creating H264Parser \n")
    h264parser = Gst.ElementFactory.make("h264parse", "h264-parser")

    # Use nvdec_h264 for hardware accelerated decode on GPU
    print("Creating Decoder \n")
    decoder = Gst.ElementFactory.make("nvv4l2decoder", "nvv4l2-decoder")

    # Create nvstreammux instance to form batches from one or more sources.
    streammux = Gst.ElementFactory.make("nvstreammux", "Stream-muxer")

    # Use nvinfer to run inferencing on decoder's output,
    # behaviour of inferencing is set through config file
    pgie = Gst.ElementFactory.make("nvinfer", "primary-inference")

    # Use convertor to convert from NV12 to RGBA as required by nvosd
    nvvidconv = Gst.ElementFactory.make("nvvideoconvert", "convertor")

    # Create OSD to draw on the converted RGBA buffer
    nvosd = Gst.ElementFactory.make("nvdsosd", "onscreendisplay")

    # Finally render the osd output
    if is_aarch64():
        transform = Gst.ElementFactory.make("nvegltransform", "nvegl-transform")

    sink = Gst.ElementFactory.make("nveglglessink", "nvvideo-renderer")
```

We see that the input is a file source, `filesrc`, containing an H.264 video stream.  It is parsed (`h264parse`), decoded (`nvv4l2decoder`), batched (`nvstreammux`), and then run through the `nvinfer` inference engine to detect objects.  The frames are converted to an RGBA buffer with `nvvideoconvert` so that bounding boxes can be overlaid on the video images with the `nvdsosd` plugin.  Finally, the output is rendered, in this case to a display using `nveglglessink` (preceded by `nvegltransform` on aarch64 platforms such as Jetson).
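
`Gst.ElementFactory.make()` returns `None` when an element cannot be created, for example when a plugin is not installed, so it is worth checking each result. The helper below is a minimal sketch of such a check; `make_element` and the stand-in `fake_make` factory are hypothetical names, used so that the example runs without GStreamer installed.

```python
import sys


def make_element(make, factory_name, name):
    """Create an element with `make` and fail loudly if it returns None.

    `make` is any callable with the (factory_name, name) signature; in the
    app it would be Gst.ElementFactory.make. This helper is hypothetical
    and is not part of the reference application.
    """
    element = make(factory_name, name)
    if element is None:
        raise RuntimeError(f"Unable to create element '{factory_name}' ({name})")
    return element


# Stand-in factory so the sketch runs anywhere: it only "knows" these names.
KNOWN = {"filesrc", "h264parse"}


def fake_make(factory_name, name):
    return (factory_name, name) if factory_name in KNOWN else None


source = make_element(fake_make, "filesrc", "file-source")
print(source)

try:
    make_element(fake_make, "nvinfer", "primary-inference")
except RuntimeError as err:
    print(err, file=sys.stderr)
```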

## 1.1.2 Sample Application Plus RTSP - `deepstream-test1-rtsp-out`
For the purposes of this lab, which runs headless on a Jetson Nano connected to a laptop, the video stream must be converted to a format that can be transferred to the laptop media player.  This is accomplished with additional plugins and some logic for the rendering portion of the pipeline. Review the code in [deepstream/sources/deepstream_python_apps/apps/deepstream-test1-rtsp-out/deepstream_test1_rtsp_out.py](deepstream/sources/deepstream_python_apps/apps/deepstream-test1-rtsp-out/deepstream_test1_rtsp_out.py).<br><br>
Scrolling down to `main()`, we can see that there are a few differences in the rendering plugins after the OSD (On Screen Display) creation.  Instead of using the video sink, the stream is converted, filtered, encoded, packed into RTP payloads, and finally sent out as UDP packets:

```python

    # Create OSD to draw on the converted RGBA buffer
    nvosd = Gst.ElementFactory.make("nvdsosd", "onscreendisplay")
    nvvidconv_postosd = Gst.ElementFactory.make("nvvideoconvert", "convertor_postosd")

    # Create a caps filter
    caps = Gst.ElementFactory.make("capsfilter", "filter")
    
    # Make the encoder
    if codec == "H264":
        encoder = Gst.ElementFactory.make("nvv4l2h264enc", "encoder")
    elif codec == "H265":
        encoder = Gst.ElementFactory.make("nvv4l2h265enc", "encoder")
    
    # Make the payload-encode video into RTP packets
    if codec == "H264":
        rtppay = Gst.ElementFactory.make("rtph264pay", "rtppay")
    elif codec == "H265":
        rtppay = Gst.ElementFactory.make("rtph265pay", "rtppay")

    # Make the UDP sink
    updsink_port_num = 5400
    sink = Gst.ElementFactory.make("udpsink", "udpsink")

```
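
The encoder and the RTP payloader must always match the chosen codec. One way to keep the pair together, sketched here as an alternative to the two `if`/`elif` blocks, is a small lookup table. The element factory names come from the snippet above; the table layout and the `elements_for` function are hypothetical.

```python
# Element factory names taken from the snippet above; the table layout and
# the function name are hypothetical.
CODEC_ELEMENTS = {
    "H264": {"encoder": "nvv4l2h264enc", "payloader": "rtph264pay"},
    "H265": {"encoder": "nvv4l2h265enc", "payloader": "rtph265pay"},
}


def elements_for(codec):
    """Return the matching encoder/payloader factory names for a codec."""
    try:
        return CODEC_ELEMENTS[codec]
    except KeyError:
        raise ValueError(f"Unsupported codec: {codec!r}") from None


names = elements_for("H264")
print(names["encoder"], names["payloader"])
```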

## 1.1.3 Putting the Pipeline Together
The plugins are put in a pipeline with the `pipeline.add()` method:

```python
    pipeline.add(source)
    pipeline.add(h264parser)
    pipeline.add(decoder)
    pipeline.add(streammux)
    pipeline.add(pgie)
    pipeline.add(nvvidconv)
    pipeline.add(nvosd)
    pipeline.add(nvvidconv_postosd)
    pipeline.add(caps)
    pipeline.add(encoder)
    pipeline.add(rtppay)
    pipeline.add(sink)
```

Each plugin is then connected in order using its `.link()` method. Generally, this is as simple as 

```python
    source_plugin.link(sink_plugin)
```

However, when connecting a source to nvstreammux (`streammux` or "the muxer"), a new sink pad must be requested from the muxer, and that pad explicitly linked to the previous plugin's source pad.  In the code, a `sinkpad` into `streammux` and a `srcpad` out of `decoder` are defined.  These are then linked together directly.

```python
    source.link(h264parser)
    h264parser.link(decoder)
    sinkpad = streammux.get_request_pad("sink_0")  
    srcpad = decoder.get_static_pad("src")
    srcpad.link(sinkpad)
    streammux.link(pgie)
    pgie.link(nvvidconv)
    nvvidconv.link(nvosd)
    nvosd.link(nvvidconv_postosd)
    nvvidconv_postosd.link(caps)
    caps.link(encoder)
    encoder.link(rtppay)
    rtppay.link(sink)
```
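
Because `link()` returns `True` only when the link succeeds, the straightforward part of the chain can be linked in a loop that stops at the first failure. The sketch below assumes elements that expose `link()` and `get_name()`, as GStreamer elements do; `link_in_order` and `StubElement` are hypothetical, and the stub exists only so the example runs without GStreamer. The `decoder` to `streammux` connection would still need the request pad shown above.

```python
def link_in_order(elements):
    """Link each element to the next one; raise if any link is refused.

    Assumes each element has a .link(other) method that returns True on
    success and a .get_name() method. Hypothetical helper, not part of
    the reference application.
    """
    for upstream, downstream in zip(elements, elements[1:]):
        if not upstream.link(downstream):
            raise RuntimeError(
                f"Could not link {upstream.get_name()} -> {downstream.get_name()}"
            )


class StubElement:
    """Stand-in for a GStreamer element so the sketch runs anywhere."""

    def __init__(self, name):
        self._name = name
        self.linked_to = None

    def get_name(self):
        return self._name

    def link(self, other):
        self.linked_to = other.get_name()
        return True


chain = [StubElement(n) for n in ("streammux", "pgie", "nvvidconv", "nvosd")]
link_in_order(chain)
print([(e.get_name(), e.linked_to) for e in chain])
```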

In summary, the pipeline for this app consists of the following plugins (ordered):

- `GstFileSrc` - reads the video data from file
- `GstH264Parse` - parses the incoming H264 stream
- `Gst-nvv4l2decoder` - hardware accelerated decoder; decodes video streams using NVDEC
- `Gst-nvstreammux` - batches video streams before sending them for AI inference
- `Gst-nvinfer` - runs inference using TensorRT
- `Gst-nvvideoconvert` - performs video color format conversion (NV12 to RGBA)
- `Gst-nvdsosd` - draws bounding boxes, text, and region of interest (ROI) polygons
- `Gst-nvvideoconvert` - performs video color format conversion (RGBA to I420)
- `GstCapsFilter` - enforces limitations on data (no data modification)
- `Gst-nvv4l2h264enc` - encodes RAW data in I420 format to H264
- `GstRtpH264Pay` - converts H264 encoded Payload to RTP packets (RFC 3984)
- `GstUDPSink` - sends UDP packets to the network. When paired with RTP payloader (`Gst-rtph264pay`) it can implement RTP streaming

## 1.1.4 Exercise: Run the Base Application
In this exercise, we'll feed a simple video file through the pipeline and view the result using an RTSP stream.  

In the `deepstream-test1`/`deepstream-test1-rtsp-out` example app, object detection is performed on a per-frame basis. Counts for `Vehicle` and `Person` objects are also tracked.  Bounding boxes are drawn around the objects identified, and a counter display is overlaid in the upper left corner of the video. 

To begin, assign some user-friendly names to paths.  Next, list the available sample apps and video streams.

In [None]:
# Set some path locations for readability
PYTHON_APPS = '/opt/nvidia/deepstream/deepstream/sources/deepstream_python_apps/apps'
STREAMS = '/opt/nvidia/deepstream/deepstream/samples/streams'

In [None]:
# List the sample Python apps available
!ls $PYTHON_APPS

In [None]:
# List the sample video streams available
!ls $STREAMS

Before running the app, check out the script usage with the `--help` option.

In [None]:
# Check usage of the test1 app with the help option
!cd $PYTHON_APPS/deepstream-test1-rtsp-out \
    && python3 deepstream_test1_rtsp_out.py --help

#### Run the DeepStream app
If using VLC media player, open the app on your computer:
- Pull down the "Media" menu and select the "Open Network Stream" dialog.
- Set the URL to `rtsp://192.168.55.1:8554/ds-test`.
- Optionally, add a wait delay to VLC:
   - Click "Show more options" in the dialog.
   - Add ` :ipv4=120000` to the "Edit Options" line to add a 120 second delay.
- Start execution of the cell below.
- Click "Play" on your VLC media player *after* you start the cell execution.  

The stream will start from the Jetson Nano and display in the media player.  There is a delay while the model `.engine` file is built.  

If VLC fails to connect, close the VLC failure notice and press the "play" triangle to try again.

In [None]:
# Run the app
!cd $PYTHON_APPS/deepstream-test1-rtsp-out \
    && python3 deepstream_test1_rtsp_out.py -i $STREAMS/sample_720p.h264

# 1.2 Configure an Object Detection Model

The sample application shows counts for two types of objects: `Vehicle` and `Person`.  This is specified in the display output line (line 101):

```python
py_nvosd_text_params.display_text = (
    "Frame Number={} Number of Objects={} Vehicle_count={} Person_count={}"
    .format(frame_number,
            num_rects,
            obj_counter[PGIE_CLASS_ID_VEHICLE],
            obj_counter[PGIE_CLASS_ID_PERSON])
)
```

However, the model used can actually detect four types of objects as revealed in the class ID assignments in the application script:

```python
PGIE_CLASS_ID_VEHICLE = 0
PGIE_CLASS_ID_BICYCLE = 1
PGIE_CLASS_ID_PERSON = 2
PGIE_CLASS_ID_ROADSIGN = 3
```

If you watch the application stream carefully, there is a bicycle in the early frames.  It is detected and boxed briefly.  The same is true for road signs.  You can see this when using a different stream such as the `sample_qHD.h264` sample.
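
To see why bicycles and road signs are detected but never appear in the overlay text, here is a minimal sketch of per-class counting. In the app the class IDs come from the object metadata of each frame; the hard-coded `detected_class_ids` list is a hypothetical stand-in for one frame.

```python
PGIE_CLASS_ID_VEHICLE = 0
PGIE_CLASS_ID_BICYCLE = 1
PGIE_CLASS_ID_PERSON = 2
PGIE_CLASS_ID_ROADSIGN = 3

# Hypothetical class IDs for the objects detected in a single frame.
detected_class_ids = [0, 0, 2, 1, 0, 3, 2]

# One counter per class, all starting at zero.
obj_counter = {
    PGIE_CLASS_ID_VEHICLE: 0,
    PGIE_CLASS_ID_BICYCLE: 0,
    PGIE_CLASS_ID_PERSON: 0,
    PGIE_CLASS_ID_ROADSIGN: 0,
}
for class_id in detected_class_ids:
    obj_counter[class_id] += 1

# All four classes are counted, but only two are reported in the text.
text = "Number of Objects={} Vehicle_count={} Person_count={}".format(
    len(detected_class_ids),
    obj_counter[PGIE_CLASS_ID_VEHICLE],
    obj_counter[PGIE_CLASS_ID_PERSON],
)
print(text)
print(obj_counter)
```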

## 1.2.1 `Gst-nvinfer` Configuration File
The classification labels (the types of objects detected) are specific to the model used for the inference, which in this case is a sample model provided with the DeepStream SDK.  The `Gst-nvinfer` plugin employs a configuration file to specify the model and various properties. Open the configuration file for the app we are using at [deepstream/sources/deepstream_python_apps/apps/deepstream-test1-rtsp-out/dstest1_pgie_config.txt](deepstream/sources/deepstream_python_apps/apps/deepstream-test1-rtsp-out/dstest1_pgie_config.txt).  The `Gst-nvinfer` configuration file uses a key file format, with details on key names found in the [DeepStream Developer Guide](https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_plugin_gst-nvinfer.html#gst-nvinfer-file-configuration-specifications).
- The **\[property\]** group configures the general behavior of the plugin. It is the only mandatory group.
- The **\[class-attrs-all\]** group configures detection parameters for all classes.
- The **\[class-attrs-\<class-id\>\]** group configures detection parameters for a class specified by \<class-id\>. For example, the \[class-attrs-2\] group configures detection parameters for class ID 2\. This type of group has the same keys as \[class-attrs-all\]. 

For the most part, we can use the default values.  There are a few improvements we can make, however.
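
Because the file is organized as groups of `key=value` pairs, it can be inspected from Python with the standard `configparser` module. The sketch below parses an abbreviated stand-in for the configuration file rather than the real one; the sample contents are an assumption assembled from values quoted elsewhere in this notebook.

```python
import configparser

# Abbreviated stand-in for dstest1_pgie_config.txt (an assumption; the real
# file contains more keys than shown here).
SAMPLE_CONFIG = """
[property]
model-engine-file=../../../../samples/models/Primary_Detector/resnet10.caffemodel_b1_gpu0_int8.engine
num-detected-classes=4

[class-attrs-all]
pre-cluster-threshold=0.2
eps=0.2
group-threshold=1
"""

# interpolation=None keeps any "%" characters in values from being expanded.
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(SAMPLE_CONFIG)

# List every group and its keys.
for group in parser.sections():
    print(f"[{group}]")
    for key, value in parser[group].items():
        print(f"  {key} = {value}")

threshold = parser.getfloat("class-attrs-all", "pre-cluster-threshold")
print("threshold for all classes:", threshold)
```

To inspect the real file, `parser.read(path)` could be used in place of `read_string`.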

### [property] : `model-engine-file`
During the previous test run, you may have noticed that an error was produced:

```c
ERROR: Deserialize engine failed because file path: /opt/nvidia/deepstream/deepstream-6.0/sources/deepstream_python_apps/apps/deepstream-test1-rtsp-out/../../../../samples/models/Primary_Detector/resnet10.caffemodel_b1_gpu0_int8.engine open error
```

This error indicates that the TensorRT-optimized model engine is not present.  Since it is not there, an attempt is made to build it.  There is a problem, though...

```c
WARNING: INT8 not supported by platform. Trying FP16 mode.
```

The Jetson Nano does not support INT8 mode, so an FP16 engine is built instead.  When it's complete, inference is run on the file stream and we see the result in the RTSP output.  Along the way, there is a notice that the engine was created and where it resides:

```c
serialize cuda engine to file: /opt/nvidia/deepstream/deepstream-6.0/samples/models/Primary_Detector/resnet10.caffemodel_b1_gpu0_fp16.engine successfully
```

This engine now exists and can be reused, which saves a lot of time if you want to run the app again.  Unfortunately, since the configuration file specifies a different engine (the INT8 engine), the engine will be rebuilt anyway, which will cause an unnecessary delay!  

To reuse the engine just built, the configuration property, `model-engine-file`, must be set to the correct path.  The next cell provides a quick substitution fix.  Go ahead and execute it now:

In [None]:
# Change the engine to fp16
!sed -i 's/_int8.engine/_fp16.engine/g' $PYTHON_APPS/deepstream-test1-rtsp-out/dstest1_pgie_config.txt
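
For reference, the same substitution can be sketched in Python on a single line of text. The `line` value is a hypothetical one-line excerpt that uses the engine name from the error message above; the real cell edits the whole file in place.

```python
# Hypothetical one-line excerpt of the config file, using the engine name
# from the error message above.
line = (
    "model-engine-file=../../../../samples/models/Primary_Detector/"
    "resnet10.caffemodel_b1_gpu0_int8.engine"
)

# str.replace matches the text literally, whereas the unescaped "." in the
# sed pattern matches any single character.
updated = line.replace("_int8.engine", "_fp16.engine")
print(updated)
```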

### [class-attrs-\<all or class-id\>] : `pre-cluster-threshold`
The number of classes and the ordered `labels.txt` file path are specified in the \[property\] group along with the model engine. To configure which of these labels the object detector actually recognizes, we can change keys in the \[class-attrs-all\] and \[class-attrs-\<class-id\>\] groups.  The initial sample configuration file includes the following:
```c
[class-attrs-all]
pre-cluster-threshold=0.2
eps=0.2
group-threshold=1
```

The `pre-cluster-threshold=0.2` key sets the minimum confidence score required for a detection: all objects with a confidence score of 20% or better are marked as detected. If the threshold is set greater than 1.0, no objects will be detected, because that would require a confidence of more than 100%, which is impossible!
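
The effect of the threshold can be illustrated with a simplified filter. This is only a sketch of the idea, not how `Gst-nvinfer` is implemented: the detections are invented, and treating a score exactly equal to the threshold as a match is an assumption based on the "20% or better" wording above.

```python
# Hypothetical detections: (label, confidence score between 0.0 and 1.0).
detections = [
    ("Vehicle", 0.91),
    ("Person", 0.35),
    ("Bicycle", 0.18),
    ("Roadsign", 0.20),
]


def keep_detections(detections, threshold):
    """Keep detections whose confidence is at least `threshold`."""
    return [(label, score) for label, score in detections if score >= threshold]


# With the sample default, the low-confidence bicycle is dropped.
print(keep_detections(detections, 0.2))

# A threshold above 1.0 can never be met, so nothing is kept.
print(keep_detections(detections, 1.1))
```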

This "all" grouping is not granular enough if we only want to detect a subset of the objects possible, or if we wish to use a different confidence level with different objects.  For example, we might choose to detect only vehicles, or to identify people with a different confidence level than road signs.  To specify a threshold for the four individual objects available in this model, we can add a specific group to the config file for each class: 

- \[class-attrs-0\] for vehicles
- \[class-attrs-1\] for bicycles
- \[class-attrs-2\] for persons
- \[class-attrs-3\] for road signs

Then, in each group, we can specify the threshold value.  This can be used to determine object detection for each of the four object categories individually.
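
As a sketch of what such groups look like, the snippet below generates one group per class with `configparser`. The threshold values are hypothetical and chosen only to show that each class can have its own setting; the output could be appended to a configuration file by hand.

```python
import configparser
import io

# Hypothetical thresholds, chosen only for illustration.
CLASS_THRESHOLDS = {
    0: 0.2,  # vehicles
    1: 0.2,  # bicycles
    2: 0.5,  # persons: require a higher confidence
    3: 0.3,  # road signs
}

parser = configparser.ConfigParser(interpolation=None)
for class_id, threshold in CLASS_THRESHOLDS.items():
    parser[f"class-attrs-{class_id}"] = {"pre-cluster-threshold": str(threshold)}

# Write key=value pairs without spaces around "=", matching the sample file.
buffer = io.StringIO()
parser.write(buffer, space_around_delimiters=False)
generated = buffer.getvalue()
print(generated)
```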

## 1.2.2 Exercise: Detect Only Two Object Types
Create a new app based on `deepstream-test1-rtsp-out` that detects *only* cars and bicycles. Begin by copying the existing app to a new workspace.

In [None]:
# Set up the workspace for my new Python apps
MY_APPS = '/opt/nvidia/deepstream/deepstream/sources/deepstream_python_apps/my_apps'
# Make sure the workspace exists before copying the shared helpers into it
!mkdir -p $MY_APPS
!cp -r $PYTHON_APPS/common $MY_APPS/

In [None]:
# Create a new app located at my_apps/dst1-two-objects 
#      based on deepstream-test1-rtsp-out
!mkdir -p $MY_APPS/dst1-two-objects
!cp -rfv $PYTHON_APPS/deepstream-test1-rtsp-out/* $MY_APPS/dst1-two-objects/

Using what you just learned, modify the [configuration file](deepstream/sources/deepstream_python_apps/my_apps/dst1-two-objects/dstest1_pgie_config.txt) in your new app to detect *only* cars and bicycles.  You will need to add *class-specific groups* for each of the four classes to the end of your configuration file.<br>
Class-specific example:
   ```
    # Per class configuration
    # car
    [class-attrs-0] 
    pre-cluster-threshold=0.2
   ```
Then, run the app to see if it worked!

In [None]:
# Run the app
!cd $MY_APPS/dst1-two-objects \
    && python3 deepstream_test1_rtsp_out.py -i $STREAMS/sample_720p.h264

#### How did you do?
If you see something like this image, with only bicycles and cars detected, you did it!  If not, keep trying or take a peek at the [solution](solutions/ex1.2.2_DetectTwo/ex1.2.2_dstest1_pgie_config.txt) config file in the solutions directory. If you aren't satisfied with the detection of the bicycle, you can experiment with the confidence threshold value. <br>

<img src="images/01_bikes_and_cars.png">

<h2 style="color:green;">Congratulations!</h2>

You've run your first DeepStream sample app and created a new DeepStream app to detect different objects in a scene.<br>
Move on to [2.0 Analysis with Metadata](02_Metadata.ipynb) to expand your video analysis.
