# DAC Contest
This reference design walks you through the design flow for DAC SDC 2023. It is a simplified design intended to help users get started on the FPGA platform and understand the overall flow. It does not contain any object detection hardware.

If you have any questions, please post on the Slack page (link on SDC website sidebar).

### Hardware

### Software
Note:
  * You will not submit your `dac_sdc.py` file, so any changes you make to this file will not be considered during evaluation.
  * You can use both PS and PL side to do inference.

### Object Detection

Object detection will be done on images in batches:
  * You will provide a Python callback function that performs object detection on a batch of images. This callback function will be called many times.
  * The callback function should return the object locations for every image in the batch.
  * Runtime will be recorded during your callback function.
  * Images will be loaded from SD card before each batch is run, and this does not count toward your energy usage or runtime.
  
### Notebook
Your notebook should contain 5 code cells:

1. Importing all libraries and creating your Team object.
1. Downloading the overlay, compiling the code, and performing any one-time configuration.
1. Python callback function and any other Python helper functions.
1. Running object detection
1. Cleanup



# 0. Installation

## 0.1 Packages

We recommend creating a separate conda environment to run the notebook. You can do so with:

```bash
conda create --name dac python=3.6.9
```

Install jupyter notebook:

```bash
pip install notebook
```

The environment provided by the contest organizers should already satisfy the dependencies listed below. Your team is still responsible for making sure the correct packages are installed. For the contest environment, use the following configuration provided by Nvidia:
- [JetPack 4.6.1](https://developer.nvidia.com/embedded/jetpack-sdk-461)
    - Ubuntu 18.04
    - CUDA 10.2
    - cuDNN 8.2.1
    - gcc 7.5.0
    - python 3.6.9
    - TensorRT 8.2.1
    
If you are using a conda environment, you may need to copy the TensorRT Python bindings into it. For example, assuming you use archiconda and `PYTHON_VERSION` is set to your Python version:

```bash
cp -r /usr/lib/python${PYTHON_VERSION}/dist-packages/tensorrt* ~/archiconda3/envs/dac/lib/python${PYTHON_VERSION}/site-packages/
```
    
We additionally require the following modules:
- numpy 1.19.5
- Pillow 8.4.0
- matplotlib 3.3.4
- opencv-python 4.7.0.72
- pycuda 2020.1



In [None]:
#Uncomment these lines for installation
'''
!pip install numpy==1.19.5
!pip install Pillow==8.4.0
!pip install matplotlib==3.3.4
!pip install opencv-python==4.7.0.72
!pip install pycuda==2020.1
'''

## 0.2 Creating trt file
We only have access to an Nvidia Jetson Nano 2GB, but the contest will be evaluated on a Jetson Nano 4GB. The trt file should therefore be regenerated so that it takes the larger RAM capacity into account.

If you have not yet built the `trtexec` binary, do so with the following commands:


``` bash
cd /usr/src/tensorrt/samples/trtexec
make
```

You can then generate the new trt file with:
``` bash
/usr/src/tensorrt/bin/trtexec --onnx=norm_simple.onnx --saveEngine=norm_simple.engine --shapes=input:1x3x352x640
```

Run the following cells to build the TensorRT binary (`trtexec`) and convert the ONNX file to a TensorRT engine:

In [None]:
!cd /usr/src/tensorrt/samples/trtexec && make

In [None]:
!/usr/src/tensorrt/bin/trtexec --onnx=simple_final.onnx --saveEngine=simple_final.engine --verbose

## 1. Imports and Create Team

In [None]:
import sys
import os

sys.path.append(os.path.abspath("../common"))

import math
import time
import numpy as np
from PIL import Image
from matplotlib import pyplot
import cv2
from datetime import datetime

import dac_sdc
import inference
from IPython.display import display

team_name = 'CapyNet'
dac_sdc.BATCH_SIZE = 1
team = dac_sdc.Team(team_name)

PATH_TO_ENGINE = 'simple_final.engine'
TEST_FILE = '../images_test/00001.jpg'

## 2. Load the TensorRT Engine

In [None]:
import pycuda.driver as cuda
import pycuda.autoinit
import tensorrt as trt

#import matplotlib.pyplot as plt
#from PIL import Image
import time

In [None]:
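# Initialize the TensorRT logger (VERBOSE prints detailed build and runtime messages)
TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)

# Register the TensorRT plugins, then deserialize the engine from disk
trt.init_libnvinfer_plugins(None, '')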
with open(PATH_TO_ENGINE, 'rb') as f:
    engine = trt.Runtime(TRT_LOGGER).deserialize_cuda_engine(f.read())

## 3. Python Callback Function and Helper Functions


### Pushing the picture through the pipeline
In this example, input and output data are exchanged through buffers that are allocated once with `inference.allocate_buffers` and reused for every image.

The size of each buffer depends on the size of the input or output data. The example images are 640x360 (same size as training and test data) and are resized to the 640x352 network input before inference.

### Callback function
The callback function:
  - Will be called on each batch of images (will be called many times)
  - Is provided with a list of tuples of (image path, RGB image)
  - Should return a dictionary with an entry for each image:
    - Key: Image name (`img_path.name`)
    - Value: List of dictionaries, one per detected object, each holding the item type and bounding box (keys: `type`, `x`, `y`, `width`, `height`)

See the code below for an example:


**Your team directory where you can access your notebook, and any other files you submit, is available as `team.team_dir`.**
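
As a minimal sketch of the structure described above, the following stand-in callback returns one fixed, made-up box per image. The name `dummy_callback`, the box values, and the example paths are hypothetical and for illustration only. Following `my_callback` in this notebook, each value is a list with one dictionary per detected object.

```python
from pathlib import Path


def dummy_callback(rgb_imgs):
    """Return one fixed, made-up box per image to show the expected structure."""
    results = {}
    for img_path, img in rgb_imgs:
        # A real callback would run the detector on `img` here.
        results[img_path.name] = [
            {"type": 1, "x": 10, "y": 20, "width": 30, "height": 40}
        ]
    return results


# Hypothetical batch: the images are None because this sketch never reads them.
batch = [(Path("00001.jpg"), None), (Path("00002.jpg"), None)]
print(dummy_callback(batch))
```

Note that the key is only the file name (`img_path.name`), not the full path.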

In [None]:
start = time.time()
inputs, outputs, bindings, stream = inference.allocate_buffers(engine)
print("Allocate Buffer: {} sec".format(time.time()-start))

start = time.time()
context = engine.create_execution_context()
print("Create Exec: {} sec".format(time.time()-start))
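
The cells in this notebook time each step with paired `time.time()` calls. If the repetition becomes tedious, a small context manager is one possible way to wrap it. The names `timed` and `timings` below are hypothetical helpers and not part of `dac_sdc` or `inference`.

```python
import time
from contextlib import contextmanager

timings = {}  # hypothetical store: step name -> elapsed seconds


@contextmanager
def timed(name):
    """Measure the wall-clock time of the enclosed block and record it."""
    start = time.time()
    try:
        yield
    finally:
        timings[name] = time.time() - start
        print("{}: {} sec".format(name, timings[name]))


# Example usage with a placeholder workload
with timed("Sleep"):
    time.sleep(0.01)
```

This only measures time inside your own code. The official runtime is recorded by the contest framework around your callback.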

### Run a test inference

In [None]:
#Read from a test file
start = time.time()
img = cv2.imread(TEST_FILE)
print("Img Load: {} sec".format(time.time()-start))

#Get scaling factors (cv2.imread returns an array of shape height x width x channels)
x_scale_factor = img.shape[1] / 640
y_scale_factor = img.shape[0] / 352

# Resize the image (this is part of your runtime)
start = time.time()
img = cv2.resize(img, (640, 352), interpolation=cv2.INTER_LINEAR)
print("Resize: {} sec".format(time.time()-start))

start = time.time()
img = img.transpose(2, 0, 1)
print("Preprocess: {} sec".format(time.time()-start))

start = time.time()
input_buffer = np.ascontiguousarray(img).reshape(-1)
np.copyto(inputs[0].host, input_buffer)
print("Mem: {} sec".format(time.time()-start))

#do Inference
start = time.time()
output = inference.do_inference_v2(context,
                                   bindings=bindings,
                                   inputs=inputs,
                                   outputs=outputs,
                                   stream=stream)
print("Inference: {} sec".format(time.time()-start))
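
The transpose and flatten steps above change the image from height x width x channel (HWC) order to the channel-first (CHW) order implied by the `1x3x352x640` input shape passed to `trtexec` earlier. The sketch below illustrates this layout change with NumPy only, using a synthetic array in place of a real image. The marked pixel and its values are made up for illustration.

```python
import numpy as np

# Stand-in for a resized image: height 352, width 640, 3 channels (HWC).
hwc = np.zeros((352, 640, 3), dtype=np.uint8)
hwc[5, 7] = (10, 20, 30)  # one marked pixel at row 5, column 7

# Move the channel axis to the front (CHW), then flatten to one dimension.
chw = hwc.transpose(2, 0, 1)
flat = np.ascontiguousarray(chw).reshape(-1)

print(chw.shape)  # (3, 352, 640)
print(flat.size)  # 675840

# In the flat buffer, each channel occupies one block of 352 * 640 values.
plane = 352 * 640
offset = 5 * 640 + 7
print(flat[offset], flat[plane + offset], flat[2 * plane + offset])  # 10 20 30
```

`transpose` only returns a view with changed strides, so `np.ascontiguousarray` is what actually reorders the data in memory before it is copied into the input buffer.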

In [None]:
def my_callback(rgb_imgs: list):
    """Callback function for inference.
    Args:
        rgb_imgs (list): List of (image path, RGB image) tuples
    Returns:
        dict: Maps each image name to a list of detected objects
    """
    object_locations_by_image = {}
    
    #image format HWC
    for (img_path, img) in rgb_imgs:
        
        object_locations = []
        
        x_scale_factor = img.shape[1] / 640
        y_scale_factor = img.shape[0] / 352
        
        img = cv2.resize(img, (640, 352), interpolation=cv2.INTER_LINEAR)
        img = img.transpose(2, 0, 1)

        input_buffer = np.ascontiguousarray(img).reshape(-1)
        np.copyto(inputs[0].host, input_buffer)
        
        output = inference.do_inference_v2(context,
                                           bindings=bindings,
                                           inputs=inputs,
                                           outputs=outputs,
                                           stream=stream)
        # Suppress predictions with low confidence scores
        thr = 0.4
        preds = output[0].reshape(100, 5)
        conf_mask = preds[:, 4] > thr
        
        preds = preds[conf_mask]
        labels = output[1][conf_mask]
        
        for pred in range(len(preds)):
            x1 = int(preds[pred][0] * x_scale_factor)
            y1 = int(preds[pred][1] * y_scale_factor)
            x2 = int(preds[pred][2] * x_scale_factor)
            y2 = int(preds[pred][3] * y_scale_factor)
            object_locations.append({"type": int(labels[pred]) + 1, "x": x1, "y": y1, "width": x2 - x1, "height": y2 - y1})
        # Save to dictionary by image filename
        object_locations_by_image[img_path.name] = object_locations

    return object_locations_by_image
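
The thresholding and box conversion inside the callback can also be expressed as a standalone helper, which makes it easier to test without a TensorRT engine. The sketch below assumes, as the callback does, that each prediction row is `[x1, y1, x2, y2, score]` in network input coordinates and that labels are zero-based. The function name `boxes_to_locations` and the sample values are hypothetical.

```python
import numpy as np


def boxes_to_locations(preds, labels, x_scale, y_scale, thr=0.4):
    """Filter by confidence and convert corner boxes to x/y/width/height."""
    keep = preds[:, 4] > thr
    scale = np.array([x_scale, y_scale, x_scale, y_scale])
    # Scale back to original image coordinates; astype(int) truncates like int()
    corners = (preds[keep, :4] * scale).astype(int)
    locations = []
    for (x1, y1, x2, y2), label in zip(corners.tolist(), labels[keep].tolist()):
        locations.append({"type": int(label) + 1, "x": x1, "y": y1,
                          "width": x2 - x1, "height": y2 - y1})
    return locations


# Made-up predictions: the second row falls below the 0.4 threshold.
preds = np.array([[10.0, 20.0, 110.0, 70.0, 0.9],
                  [0.0, 0.0, 5.0, 5.0, 0.1]], dtype=np.float32)
labels = np.array([2, 0])
print(boxes_to_locations(preds, labels, x_scale=2.0, y_scale=1.5))
# [{'type': 3, 'x': 20, 'y': 30, 'width': 200, 'height': 75}]
```

Keeping this step separate from inference lets you check the coordinate math on synthetic inputs before running on the device.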


## 4. Running Object Detection

Call the following function to run the object detection.  Extra debug output is enabled when `debug` is `True`.

In [None]:
team.run(my_callback, debug=False)

In [None]:
#!python3 ../scripts/score.py ../CapyNet/ ../data/dac/train/label/