# Getting Started

## 1. Find Python Libraries for Object Detection on a Live Video Stream

Objects that must be identifiable:
* Book
* Laptop


### COCO dataset (Common Objects in COntext)

COCO is a large-scale object detection, segmentation, and captioning dataset from Microsoft.
http://cocodataset.org/#home

COCO has several features:

* Object segmentation
* Recognition in context
* Superpixel stuff segmentation
* 330K images (>200K labeled)
* 1.5 million object instances
* 80 object categories
* 91 stuff categories
* 5 captions per image
* 250,000 people with keypoints

Here are the 80 object categories included in the dataset.
Books and laptops are among them!

````
person
bicycle
car
motorcycle
airplane
bus
train
truck
boat
traffic light
fire hydrant
stop sign
parking meter
bench
bird
cat
dog
horse
sheep
cow
elephant
bear
zebra
giraffe
backpack
umbrella
handbag
tie
suitcase
frisbee
skis
snowboard
sports ball
kite
baseball bat
baseball glove
skateboard
surfboard
tennis racket
bottle
wine glass
cup
fork
knife
spoon
bowl
banana
apple
sandwich
orange
broccoli
carrot
hot dog
pizza
donut
cake
chair
couch
potted plant
bed
dining table
toilet
tv
laptop
mouse
remote
keyboard
cell phone
microwave
oven
toaster
sink
refrigerator
book
clock
vase
scissors
teddy bear
hair drier
toothbrush
````



### ImageAI
A Python library built to empower developers to build applications and systems with self-contained Deep Learning and Computer Vision capabilities using a few simple lines of code.
https://github.com/OlafenwaMoses/ImageAI

Built with simplicity in mind, ImageAI supports a list of state-of-the-art Machine Learning algorithms for image prediction, custom image prediction, object detection, video detection, video object tracking and image prediction training. ImageAI currently supports image prediction and training using 4 different Machine Learning algorithms trained on the ImageNet-1000 dataset. ImageAI also supports object detection, video detection and object tracking using RetinaNet, YOLOv3 and TinyYOLOv3 trained on the COCO dataset.
Eventually, ImageAI will provide support for wider and more specialized aspects of Computer Vision, including but not limited to image recognition in special environments and special fields.

#### Object Detection On Live Video Stream
https://github.com/OlafenwaMoses/ImageAI/blob/master/imageai/Detection/VIDEO.md

ImageAI provides convenient, flexible and powerful methods to perform object detection on videos. The video object detection class only supports RetinaNet, YOLOv3 and TinyYOLOv3. This version of ImageAI provides commercial-grade video object detection features, including but not limited to device/IP camera inputs and per-frame, per-second, per-minute and entire-video analysis, for storing in databases, real-time visualization and future insights. To start performing video object detection, you must download one of the following object detection models (the download links are on the page linked above):

- RetinaNet (Size = 145 MB, high performance and accuracy, with longer detection time)
- YOLOv3 (Size = 237 MB, moderate performance and accuracy, with a moderate detection time)
- TinyYOLOv3 (Size = 34 MB, optimized for speed and moderate performance, with fast detection time)

Because video object detection is a compute-intensive task, we advise you to perform this experiment on a computer with an NVIDIA GPU and the GPU version of TensorFlow installed. Performing video object detection on a CPU will be slower than on an NVIDIA GPU-powered computer. You can use Google Colab for this experiment, as it has an NVIDIA K80 GPU available.
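
As a rough way to check whether the GPU advice above applies to a given machine, the sketch below asks TensorFlow whether it can see a GPU. The helper name `tensorflow_gpu_available` is hypothetical and not part of ImageAI. It tries `tf.config.list_physical_devices` (available in newer TensorFlow releases), falls back to `tf.test.is_gpu_available` on older ones, and assumes that a missing TensorFlow install should simply count as "no GPU".

```python
def tensorflow_gpu_available():
    """Return True if TensorFlow is installed and reports at least one GPU."""
    try:
        import tensorflow as tf
    except ImportError:
        # Assumption: without TensorFlow there is no usable GPU backend.
        return False

    # Newer TensorFlow releases expose tf.config.list_physical_devices.
    config = getattr(tf, "config", None)
    list_devices = getattr(config, "list_physical_devices", None)
    if list_devices is not None:
        return len(list_devices("GPU")) > 0

    # Older releases: fall back to the legacy helper.
    return bool(tf.test.is_gpu_available())


if __name__ == "__main__":
    if tensorflow_gpu_available():
        print("TensorFlow can see a GPU.")
    else:
        print("No GPU visible to TensorFlow; detection will run on the CPU.")
```

If this reports no GPU, detection should still run, only more slowly, as noted above.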

### ImageAI - Official documentation
https://imageai.readthedocs.io/en/latest/


## Camera / Live Stream Video Detection

####  Install Anaconda and create a new environment for the project

I have named it "swda-env".

#### Activate the environment:

````
source activate swda-env
````

#### ImageAI - Dependencies installation (run directly in the console)

````
pip install --upgrade pip
pip install --upgrade tensorflow
pip install numpy
pip install scipy
pip install opencv-python
pip install pillow
pip install matplotlib
pip install h5py
pip install keras
````


#### ImageAI - Installation (run directly in the console)

````
pip install https://github.com/OlafenwaMoses/ImageAI/releases/download/2.0.2/imageai-2.0.2-py3-none-any.whl
````
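
To confirm that the installation steps above worked, a small check like the following can be run inside the activated environment. It is only a sketch: the helper `find_missing_modules` is a hypothetical name, and the list uses import names, which differ from the pip package names in two cases (`opencv-python` is imported as `cv2`, `pillow` as `PIL`).

```python
import importlib.util

# Import names (not pip package names) for the packages installed above.
REQUIRED_MODULES = [
    "tensorflow", "numpy", "scipy", "cv2", "PIL",
    "matplotlib", "h5py", "keras", "imageai",
]


def find_missing_modules(module_names):
    """Return the names in module_names that cannot be located for import."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


if __name__ == "__main__":
    missing = find_missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

This only checks that each module can be located; it does not verify versions or that the modules import without errors.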

#### Download the model and save it
https://github.com/OlafenwaMoses/ImageAI/releases/tag/1.0/

Download `yolo.h5` and save it in the directory you will run the code from.
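
Because the code in the next section looks for `yolo.h5` in the current working directory, it can help to fail early with a clear message if the file is not there. The following is a minimal sketch; `require_model_file` is a hypothetical helper, not an ImageAI function.

```python
import os
from pathlib import Path


def require_model_file(directory, filename="yolo.h5"):
    """Return the model path, or raise FileNotFoundError if it is absent."""
    model_path = Path(directory) / filename
    if not model_path.is_file():
        raise FileNotFoundError(
            f"Model file not found: {model_path}. "
            "Download it from the ImageAI release page first."
        )
    return model_path


if __name__ == "__main__":
    try:
        print("Using model:", require_model_file(os.getcwd()))
    except FileNotFoundError as error:
        print(error)
```

The returned path could then be passed to `detector.setModelPath(str(model_path))` in place of the `os.path.join(...)` call used below.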

### Code

#### VideoObjectDetection
The VideoObjectDetection class provides functions to detect objects in videos and in live feeds from device cameras and IP cameras, using pre-trained models that were trained on the COCO dataset. The models supported are RetinaNet, YOLOv3 and TinyYOLOv3. This means you can detect and recognize 80 different kinds of common everyday objects in any video. To get started, download whichever pre-trained model you want to use (see the download step above).

````
from imageai.Detection import VideoObjectDetection
import os
import cv2

# See https://imageai.readthedocs.io/en/latest/video/index.html for more information.

execution_path = os.getcwd()  # current working directory of the process


def forFrame(frame_number, output_array, output_count):
    """Print the detections reported by ImageAI for a single frame."""
    print("FOR FRAME " , frame_number)
    print("Output for each object : ", output_array)
    print("Output count for unique objects : ", output_count)
    print("------------END OF A FRAME --------------")


# Open the default camera (device index 0).
camera = cv2.VideoCapture(0)

detector = VideoObjectDetection()
# Set the model type to match the downloaded pre-trained model.
detector.setModelTypeAsYOLOv3()
detector.setModelPath(os.path.join(execution_path, "yolo.h5"))  # change if another model is used
detector.loadModel()

detector.detectObjectsFromVideo(camera_input=camera,
                                output_file_path=os.path.join(execution_path, "camera_detected_video"),
                                frames_per_second=20,
                                per_frame_function=forFrame,
                                minimum_percentage_probability=20)
````

The run above was stopped manually, which raised a `KeyboardInterrupt`.

### Output - Example

````
FOR FRAME  1
Output for each object :  [{'name': 'book', 'percentage_probability': 76.48902535438538, 'box_points': (411, 359, 698, 573)}, {'name': 'refrigerator', 'percentage_probability': 95.99391222000122, 'box_points': (922, 36, 1264, 698)}, {'name': 'laptop', 'percentage_probability': 69.17145848274231, 'box_points': (411, 359, 698, 573)}, {'name': 'cup', 'percentage_probability': 28.488200902938843, 'box_points': (596, 309, 703, 438)}, {'name': 'person', 'percentage_probability': 98.6465334892273, 'box_points': (379, 49, 889, 720)}]
Output count for unique objects :  {'book': 1, 'refrigerator': 1, 'laptop': 1, 'cup': 1, 'person': 1}
------------END OF A FRAME --------------
````
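
Since only books and laptops need to be identified, the per-frame output can be narrowed down to those two names. The sketch below works on the list-of-dicts structure shown in the example output. `filter_targets`, `TARGET_NAMES` and the 50% threshold are illustrative choices, not part of ImageAI.

```python
# Names of the objects this project needs to identify.
TARGET_NAMES = {"book", "laptop"}


def filter_targets(detections, targets=TARGET_NAMES, min_probability=50.0):
    """Keep detections whose name is a target and whose probability is high enough."""
    return [
        detection for detection in detections
        if detection["name"] in targets
        and detection["percentage_probability"] >= min_probability
    ]


# The detections from the example frame above.
sample_output = [
    {'name': 'book', 'percentage_probability': 76.48902535438538, 'box_points': (411, 359, 698, 573)},
    {'name': 'refrigerator', 'percentage_probability': 95.99391222000122, 'box_points': (922, 36, 1264, 698)},
    {'name': 'laptop', 'percentage_probability': 69.17145848274231, 'box_points': (411, 359, 698, 573)},
    {'name': 'cup', 'percentage_probability': 28.488200902938843, 'box_points': (596, 309, 703, 438)},
    {'name': 'person', 'percentage_probability': 98.6465334892273, 'box_points': (379, 49, 889, 720)},
]

hits = filter_targets(sample_output)
print([detection["name"] for detection in hits])  # prints ['book', 'laptop']
```

A function like this could be called on `output_array` inside `forFrame`. Note that in this sample frame the `book` and `laptop` entries share identical `box_points`, so the same region was labeled twice; a caller may want to keep only the more probable label in that case.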


### Conclusion
* YOLOv3 has better accuracy than TinyYOLOv3.
* Using the YOLOv3 model, it is possible to detect books and laptops in a video stream on a machine with an Intel i7 processor, 8 GB RAM and Intel HD Graphics (1546 MB).
