# Object Classification and Localization 
Final Project - BIOF509, Thomas May

A real-time, machine-learning-based automated object detection and tracking capability is developed, along with a graphical user interface, for the Tello unmanned aerial vehicle. A Single Shot Detector (SSD) built on the MobileNet V2 convolutional neural network and pretrained on the COCO dataset is used as the inference engine. The SSD is optimized to run on the NVIDIA Jetson Nano hardware platform using TensorRT, NVIDIA's CUDA-optimized deep learning inference runtime.

(See also demo.mov and the project presentation BIOF509_Presentation.pptx.)
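An SSD predicts many overlapping candidate boxes for each object and, as in the original SSD paper, relies on non-maximum suppression (NMS) to keep only the best box per object. The following is a minimal greedy NMS sketch in NumPy for illustration only. It assumes boxes are given as [x0, y0, x1, y1] and is not taken from the project source, so it is not necessarily how the NVIDIA engine implements this step. The helper names `iou` and `nms` are hypothetical.

```python
import numpy as np


def iou(box, boxes):
    """Intersection-over-union of one box against an (N, 4) array of [x0, y0, x1, y1] boxes."""
    x0 = np.maximum(box[0], boxes[:, 0])
    y0 = np.maximum(box[1], boxes[:, 1])
    x1 = np.minimum(box[2], boxes[:, 2])
    y1 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x1 - x0, 0, None) * np.clip(y1 - y0, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / np.maximum(area + areas - inter, 1e-12)  # guard against zero-area unions


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best-scoring box, drop boxes overlapping it above the threshold, repeat."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores)[::-1]  # candidate indices, highest score first
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        # Keep only the remaining candidates that do not overlap the chosen box too much.
        order = rest[iou(boxes[best], boxes[rest]) <= iou_threshold]
    return keep


boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
print(nms(boxes, [0.9, 0.8, 0.7]))  # [0, 2]
```

The second box overlaps the first with an IoU of 81/119 (about 0.68), so it is suppressed, while the distant third box survives.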

## Project Source Code
The project is divided into the following files:
- TelloGUI_AI.ipynb - Launch point for the application. Provides the Tello GUI and GUI bindings, and instantiates all classes
- tello.py - Interface class for the Tello drone
- stream_camera.py - Interface class for the Tello live video feed camera
- ml_process.py - Machine learning interface class. Runs the SSD DNN, generates output images, and provides command and control for the drone
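Since ml_process.py turns detections into drone commands, the following is one possible way to do that: a simple proportional controller that yaws and climbs or descends to keep a target centered. This is a hypothetical illustration, not the project's code. It assumes each detection is a dict with "label", "confidence", and a normalized "bbox" of [x0, y0, x1, y1], that label 1 is "person" in the COCO label map, and that commands are Tello SDK "rc a b c d" strings (left/right, forward/back, up/down, yaw, each from -100 to 100). The names `pick_target` and `rc_command` and the gain and deadband values are hypothetical.

```python
def pick_target(detections, label=1, min_conf=0.5):
    """Return the most confident detection of `label` (assumed 1 = person in COCO), or None."""
    matches = [d for d in detections
               if d["label"] == label and d["confidence"] >= min_conf]
    return max(matches, key=lambda d: d["confidence"], default=None)


def rc_command(target, gain=100, deadband=0.05):
    """Build a Tello 'rc' command string that yaws and climbs/descends to center the target."""
    if target is None:
        return "rc 0 0 0 0"  # nothing tracked: hold position

    x0, y0, x1, y1 = target["bbox"]   # assumed normalized 0..1 image coordinates
    err_x = (x0 + x1) / 2 - 0.5       # > 0 when the target is right of center
    err_y = (y0 + y1) / 2 - 0.5       # > 0 when the target is below center (y grows downward)

    def speed(err):
        if abs(err) < deadband:       # ignore small errors to avoid jitter
            return 0
        # err lies in [-0.5, 0.5]; scale to [-gain, gain] and clamp to the SDK range.
        return max(-100, min(100, round(2 * gain * err)))

    yaw = speed(err_x)                # assumption: positive yaw turns clockwise (right)
    up_down = -speed(err_y)           # target below center -> negative value (descend)
    return f"rc 0 0 {up_down} {yaw}"


detections = [
    {"label": 1, "confidence": 0.9, "bbox": [0.6, 0.4, 0.8, 0.6]},
    {"label": 3, "confidence": 0.95, "bbox": [0.0, 0.0, 0.2, 0.2]},
]
print(rc_command(pick_target(detections)))  # rc 0 0 0 40
```

A proportional term with a deadband is the simplest choice. A real tracker would likely add rate limiting or a derivative term to avoid overshoot.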
    
Source files from NVIDIA:
- ssd_mobilenet_v2_v04_coco.engine - SSD DNN optimized for the Jetson Nano, provided by NVIDIA as a TensorRT engine
- object_detection.py - utilities and interface class for the TFTModel class
- tensor_model.py - utilities and interface class for TensorRT engines

## Data Source
Imagery data from the aircraft is transmitted over Wi-Fi as a raw H.264 video stream carried in UDP packets. UDP is a connectionless protocol and does not guarantee packet delivery, so the ML system must tolerate corrupted or lost input data in real time.
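Because dropped UDP packets can make individual frame reads fail, a capture loop should skip bad reads rather than stop. The following generator is a minimal sketch of that idea. It works with any object whose read() method returns an (ok, frame) pair, which is the convention of OpenCV's cv2.VideoCapture. The name `read_frames` and the `max_misses` limit are hypothetical.

```python
from typing import Any, Iterator


def read_frames(capture, max_misses: int = 30) -> Iterator[Any]:
    """Yield decoded frames from `capture`, skipping failed reads.

    Gives up after `max_misses` consecutive failures, which this sketch treats
    as a lost stream rather than a few dropped packets.
    """
    misses = 0
    while misses < max_misses:
        ok, frame = capture.read()
        if not ok or frame is None:
            misses += 1   # dropped or undecodable frame: try the next one
            continue
        misses = 0        # a good frame resets the failure count
        yield frame
```

In the application, this could wrap something like cv2.VideoCapture("udp://0.0.0.0:11111") after the drone has received the SDK "streamon" command. The URL and port 11111 are assumptions based on the Tello SDK defaults and require an OpenCV build with FFmpeg support. Note that this only handles reads that fail outright. A frame that decodes with visible corruption still arrives as a successful read.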


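## Sample of a decoded H.264 video frame
The notebook output below is a (success, frame) tuple, the format returned by OpenCV's VideoCapture.read(). The frame is a uint8 array in BGR channel order, and NumPy elides its interior rows and columns when printing.

```
(True,
 array([[[ 39,  73,  80],
         [ 38,  72,  79],
         [ 40,  67,  75],
         ...,
         [ 19,  38,  39],
         [  0,  39,  37],
         [  0,  39,  37]],

        ...,

        [[155, 176, 183],
         [152, 173, 180],
         [146, 167, 174],
         ...,
         [131, 157, 164],
         [130, 155, 162],
         [130, 155, 162]]], dtype=uint8))
```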

## Raw input video format and conversion
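The decoded video frame is 1280 x 720 pixels (width x height) with 3 color channels in BGR order, that is, a NumPy array of shape (720, 1280, 3). To match the input requirements of the pretrained SSD, the frame is resized to 300 x 300, converted to RGB, reordered to channel-first shape (3, 300, 300), and normalized to the range -1 to 1.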
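The conversion can be sketched in NumPy as follows. This is a minimal illustration, not the project's ml_process.py code. It uses nearest-neighbor sampling for the resize to stay dependency-free, whereas an OpenCV pipeline would more typically call cv2.resize with interpolation. The function name `to_ssd_input` is hypothetical. Normalization maps each pixel value p in [0, 255] to p / 127.5 - 1, so 0 becomes -1 and 255 becomes 1.

```python
import numpy as np


def to_ssd_input(frame_bgr, size=300):
    """Convert an (H, W, 3) uint8 BGR frame to a (3, size, size) float32 RGB array in [-1, 1]."""
    h, w, _ = frame_bgr.shape
    rows = np.arange(size) * h // size          # nearest-neighbor source row indices
    cols = np.arange(size) * w // size          # nearest-neighbor source column indices
    small = frame_bgr[rows][:, cols]            # (size, size, 3), still BGR
    rgb = small[..., ::-1]                      # BGR -> RGB by reversing the channel axis
    chw = rgb.transpose(2, 0, 1)                # HWC -> CHW: (3, size, size)
    return chw.astype(np.float32) / 127.5 - 1.0 # [0, 255] -> [-1, 1]


frame = np.zeros((720, 1280, 3), dtype=np.uint8)  # stand-in for a 1280 x 720 camera frame
frame[..., 0] = 255                               # pure blue in BGR order
x = to_ssd_input(frame)
print(x.shape, x[2].min(), x[0].max())            # (3, 300, 300) 1.0 -1.0
```

The blue channel, stored first in BGR, ends up last after the conversion, which is why x[2] holds the 1.0 values.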

## References

Single Shot MultiBox Detector - https://arxiv.org/abs/1512.02325

ResNet-18/50 vs. AlexNet - https://towardsdatascience.com/the-w3h-of-alexnet-vggnet-resnet-and-inception-7baaaecccc96

Jetson Nano - https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-nano/

Understanding SSD-MultiBox - https://towardsdatascience.com/understanding-ssd-multibox-real-time-object-detection-in-deep-learning-495ef744fab

COCO Dataset - http://cocodataset.org/#home

COCO Label Map (complete class list) - https://github.com/tensorflow/models/blob/master/research/object_detection/data/mscoco_complete_label_map.pbtxt

Non-Max Suppression Overview - https://www.coursera.org/lecture/convolutional-neural-networks/non-max-suppression-dvrjH

Scaling video in OpenCV - https://www.codingforentrepreneurs.com/blog/open-cv-python-change-video-resolution-or-scale

OpenCV VideoCapture() examples - https://www.programcreek.com/python/example/85663/cv2.VideoCapture

OpenCV imencode examples - https://www.programcreek.com/python/example/70396/cv2.imencode

Unofficial Tello Command and Video Protocols - https://gobot.io/blog/2018/04/20/hello-tello-hacking-drones-with-go/

Tello Video Stream Information - https://tellopilots.com/threads/sdk-streamon-format.2809/