# Jupyter Notebook on Computer Vision Crowd Counting

(Version 1.0)

<img src="pictures/TDI_logo.png">

### This notebook gives a summary of the machine learning model we used in this tool.  

<br>

**Explain the ML project here with a detailed description...**

> More project description here:

Crowd counting is a common application of computer vision. New algorithms appear regularly because research in this area is still active. Our business case, however, drives our model selection. I built and compared three approaches, and I introduce each of them later in this document.

<br>


**Explain this notebook's results here...**


> The model selection space:

> * Cascade Classifier Algorithm (Cascade)
> * TensorFlow-based Faster Region-based CNN (Faster R-CNN, written RCNN below)
> * PyTorch-based Congested Scene Recognition Network (CSRNet)


> We discuss and compare these 3 models and include some code snippets.

> * Cascade is common for object detection and it's fast. In our business case, it can run detection on the restaurant's video frames almost in real time.
> * The problem with Cascade is that it's based on frontal face detection, so accuracy drops when customers do not show their faces to the camera.
> * The original R-CNN feeds about 2000 region proposals into a CNN one by one, so it is slow. Fast R-CNN resolves this by running the CNN once per image and pooling each proposal from the shared feature map. Faster R-CNN is faster still because a separate region proposal network generates the proposals, so it can also come close to real time.
> * The problem with RCNN is that the approach is fairly new (from 2016) and there aren't many pre-trained models available for transfer learning. When I trained my own model, I found that it suffers from occlusion (the scenario where the crowd is dense and people cover each other). RCNN performs poorly when a person's full body is not in the picture. Occlusion-aware R-CNN is an active research area.
> * CSRNet handles varying head sizes with dilated convolutions in a single-column network rather than separate columns, and it is flexible about the input image size (a sort of one-size-fits-all solution), so it could be generalized easily if commercialized.
> * CSRNet is density based and performs well when counting large crowds of 300+ people. I used Google Colab to train in the cloud with PyTorch, as I do not have a CUDA-enabled GPU on my machine. I did not observe any advantage in either speed or accuracy in our case, as our crowd is usually fewer than 50 people.





## Cascade approach:

In [1]:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Thu Apr 30 00:28:47 2020
Cascade is native in OpenCV (imported as cv2). The trained classifier is saved as an XML file.
@author: georgehan
"""
import os
#os.chdir('/Users/georgehan/TDI/Capstone/Smart_Menu')
#os.chdir('/Users/georgehan/GitHub/menu')
print(os.getcwd())
#os.chdir(os.getcwd())
import cv2
import sys

E:\Github\menu


<img src="pictures/mcd_short.jpg">

In [5]:
imagePath = "pictures/mcd_short.jpg"
trainedWeights = "trained_customer.xml"

customerTrainedWeights =  cv2.CascadeClassifier(trainedWeights)

# Read the image
image = cv2.imread(imagePath)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# detect customer in pic

customers = customerTrainedWeights.detectMultiScale(
        gray,
        scaleFactor=1.2, # controls the fine (smaller) vs coarse (larger) scale step trade-off; must be > 1.0
        minNeighbors=5, # neighboring detections needed to keep a box; higher gives fewer, more reliable boxes
        minSize=(60, 40) # minimum object size (width, height) in pixels; smaller objects are ignored
)

print("Found {0} customers!".format(len(customers)))
print("Hyperparameters are scaleFactor, minNeighbors, and minSize.")

Found 6 customers!
Hyperparameters are scaleFactor, minNeighbors, and minSize.
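
To get a feel for the `scaleFactor` trade-off, the sketch below lists the window sizes a multi-scale detector would try if it grows the search window by `scaleFactor` at each step, starting at `minSize` and stopping at the image size. This is a simplified model of the scan, not OpenCV's exact internals; the function name `scan_window_sizes` and the 640x480 image size are assumptions made for illustration.

```python
def scan_window_sizes(image_size, min_size, scale_factor):
    """List (width, height) search windows from min_size up to image_size.

    Simplified model: each step multiplies the window by scale_factor.
    """
    if scale_factor <= 1.0:
        raise ValueError("scale_factor must be > 1.0")
    img_w, img_h = image_size
    w, h = min_size
    sizes = []
    scale = 1.0
    # Keep growing the window until it no longer fits inside the image
    while w * scale <= img_w and h * scale <= img_h:
        sizes.append((int(round(w * scale)), int(round(h * scale))))
        scale *= scale_factor
    return sizes


# Hypothetical 640x480 frame, same minSize as the cell above
coarse = scan_window_sizes((640, 480), (60, 40), 1.2)
fine = scan_window_sizes((640, 480), (60, 40), 1.05)
print(len(coarse), "window sizes at scaleFactor=1.2")
print(len(fine), "window sizes at scaleFactor=1.05")
```

A smaller `scaleFactor` means more window sizes to scan, so detection is finer but slower, which matters for the near-real-time requirement.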


In [3]:
# Draw a rectangle around the customers
for (x, y, w, h) in customers:
    cv2.rectangle(image, (x, y), (x+w, y+h), (0, 255, 0), 2)

# cv2.imshow("customers found", image)
status = cv2.imwrite('customers_detect.jpg', image)
print ("Image written to file-system : ",status)
print(imagePath)

Image written to file-system :  True
pictures/mcd_very_long.jpg


<img src="mcd_short_marked.jpg">

**For more on the Cascade approach, see:** http://www.willberger.org/cascade-haar-explained/

## Faster R-CNN approach:

In [1]:
# RCNN uses OpenCV and TensorFlow
import numpy as np
import tensorflow as tf
import cv2
import time
import glob

In [5]:
import os
os.getcwd()

'/Users/georgehan/GitHub/menu'

In [2]:
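# Main class for counting customers. It uses the TensorFlow 1.x graph/session API
# (under TensorFlow 2.x the same calls are available through tf.compat.v1).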
class customer_Counter:

    def __init__(self, path):
        self.path = path
        self.detection_graph = tf.Graph()
        with self.detection_graph.as_default():
            od_graph_def = tf.GraphDef()
            with tf.gfile.GFile(self.path, 'rb') as fid:
                serialized_graph = fid.read()
                od_graph_def.ParseFromString(serialized_graph)
                tf.import_graph_def(od_graph_def, name='')

        self.default_graph = self.detection_graph.as_default()
        self.sess = tf.Session(graph=self.detection_graph)

        self.image_tensor = self.detection_graph.get_tensor_by_name('image_tensor:0') # Defining tensors for the graph
        self.detection_boxes = self.detection_graph.get_tensor_by_name('detection_boxes:0') # Each box denotes part of image with a person detected 
        self.detection_scores = self.detection_graph.get_tensor_by_name('detection_scores:0') # Score represents the confidence for the detected person
        self.detection_classes = self.detection_graph.get_tensor_by_name('detection_classes:0')
        self.num_detections = self.detection_graph.get_tensor_by_name('num_detections:0')

    def detect(self, image):
        image_np_expanded = np.expand_dims(image, axis=0)
        (boxes, scores, classes, num) = self.sess.run(
            [self.detection_boxes, self.detection_scores, self.detection_classes, self.num_detections],
            feed_dict={self.image_tensor: image_np_expanded}) # Using the model for detection

        im_height, im_width,_ = image.shape
        boxes_list = [None for i in range(boxes.shape[1])]
        for i in range(boxes.shape[1]):
            boxes_list[i] = (int(boxes[0,i,0] * im_height),
                        int(boxes[0,i,1]*im_width),
                        int(boxes[0,i,2] * im_height),
                        int(boxes[0,i,3]*im_width))

        return boxes_list, scores[0].tolist(), [int(x) for x in classes[0].tolist()], int(num[0])

    
    def close(self):
        # Closing the session releases its resources; the graph context needs no explicit close
        self.sess.close()
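
The pixel conversion in `detect` relies on the model returning boxes as normalized `[ymin, xmin, ymax, xmax]` values in the 0 to 1 range. A standalone sketch of that step makes the coordinate order explicit; the helper name `to_pixel_box` is hypothetical and not part of the class above.

```python
def to_pixel_box(norm_box, im_height, im_width):
    """Convert a normalized [ymin, xmin, ymax, xmax] box to integer pixels."""
    ymin, xmin, ymax, xmax = norm_box
    return (int(ymin * im_height), int(xmin * im_width),
            int(ymax * im_height), int(xmax * im_width))


# A box covering the central region of a 640x480 (width x height) frame
box = to_pixel_box([0.25, 0.25, 0.75, 0.75], im_height=480, im_width=640)
top, left, bottom, right = box
print(box)
```

Because the order is y first, then x, any drawing call that expects `(x, y)` points has to swap the pairs, which is why the drawing code uses `(box[1], box[0])` as the top-left corner.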

As discussed, the RCNN model fails miserably because it cannot detect a person unless the full body is in the image, as shown below:

```python
# Assumption: `pbar` is not defined anywhere in this notebook, so tqdm stands in as the progress bar
from tqdm import tqdm as pbar

if __name__ == "__main__":
    model_path = '../my_model.pb'
    # These trained weights are too large to upload to GitHub
    customer_counter = customer_Counter(path=model_path)
    threshold = 0.4  # minimum confidence score for a detection to be counted
    no = 1
    for n in pbar(glob.glob("./data/images/test/*.jpg")):
        count = 0
        img = cv2.imread(n)
        img = cv2.resize(img, (640, 480))

        boxes, scores, classes, num = customer_counter.detect(img)

        for i in range(len(boxes)):
            # Class 1 is treated as the person class; boxes are (ymin, xmin, ymax, xmax) in pixels
            if classes[i] == 1 and scores[i] > threshold:
                box = boxes[i]
                cv2.rectangle(img, (box[1], box[0]), (box[3], box[2]), (255, 0, 0), 2)
                count += 1
        cv2.putText(img, 'Count = ' + str(count), (10, 400), cv2.FONT_HERSHEY_SIMPLEX, 1.25, (255, 255, 0), 2, cv2.LINE_AA)
        cv2.imwrite("./results/result%04i_menu_count.jpg" % no, img)
        no += 1
    print("\n\t\t\tCustomers Count Saved!\n")
```
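
The counting rule inside that loop can be isolated from the model and the drawing code. The sketch below is a minimal version that assumes, as the loop does, that class id 1 means "person"; the function name `count_customers` and the sample scores are made up for illustration.

```python
def count_customers(scores, classes, threshold=0.4, person_class=1):
    """Count detections labeled as a person with confidence above threshold."""
    return sum(
        1 for score, cls in zip(scores, classes)
        if cls == person_class and score > threshold
    )


# Made-up detector outputs, for illustration only
scores = [0.92, 0.55, 0.41, 0.30, 0.88]
classes = [1, 1, 3, 1, 1]
default_count = count_customers(scores, classes)       # threshold 0.4
strict_count = count_customers(scores, classes, 0.6)   # stricter threshold
print(default_count, strict_count)
```

Raising the threshold trades missed customers for fewer false detections, so it is worth tuning on a few labeled frames from the actual camera.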

<img src="RCNN_result.jpg">

**For more on Faster R-CNN, see:** https://towardsdatascience.com/r-cnn-fast-r-cnn-faster-r-cnn-yolo-object-detection-algorithms-36d53571365e

## CSRNet-PyTorch approach:

This is a very new model from 2018. I ran the training on Google Colab with a GPU because it takes too long otherwise. You can also monitor training with visdom, an open-source visualization tool from Facebook, but it then requires other dependencies.
> pip install visdom
<br>
> python -m visdom.server

In [14]:
# This code is for running on google.colab 
# a cloud environment with GPU enabled.
'''
from google.colab import drive
drive.mount('/content/drive/')

import os
os.chdir("/content/drive/My Drive/app/CSRNet-pytorch")
'''


The main script, which loads the trained model and predicts the count for one image, is as follows. Note that if you do not have a CUDA-enabled GPU, you need to select the device with:
> device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

Implementation of CSRNet using PyTorch:

```python
import h5py
import scipy.io as io
import PIL.Image as Image
import numpy as np
import os
import glob
from matplotlib import pyplot as plt
from scipy.ndimage import gaussian_filter  # scipy.ndimage.filters is a deprecated namespace
import scipy
import json
import torchvision.transforms.functional as F
from matplotlib import cm as CM
from image import *
from model import CSRNet
import torch
%matplotlib inline

from torchvision import datasets, transforms
transform=transforms.Compose([
                       transforms.ToTensor(),transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225]),
                   ])


root = '/content/drive/My Drive/app/CSRNet-pytorch/data/'


#now generate the training data's ground truth
part_A_train = os.path.join(root,'part_A_final/train_data','images')
part_A_test = os.path.join(root,'part_A_final/test_data','images')
part_B_train = os.path.join(root,'part_B_final/train_data','images')
part_B_test = os.path.join(root,'part_B_final/test_data','images')
path_sets = [part_A_test]

img_paths = []
for path in path_sets:
    for img_path in glob.glob(os.path.join(path, '*.jpg')):
        img_paths.append(img_path)
        
        
# Fall back to the CPU when no CUDA-enabled GPU is available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = CSRNet().to(device)
# Again, this trained model is 130 MB, too big for GitHub,
# so it is not included in the folder
checkpoint = torch.load('0model_best.pth.tar', map_location=device)
model.load_state_dict(checkpoint['state_dict'])
model.eval()

from matplotlib import cm as c
import matplotlib.image as mpimg
img_file = '/content/drive/My Drive/app/CSRNet-pytorch/mcd_very_long.jpg'
img = transform(Image.open(img_file).convert('RGB')).to(device)

# Inference only, so gradients are not needed
with torch.no_grad():
    output = model(img.unsqueeze(0))
density = output.squeeze().cpu().numpy()  # (1, 1, H, W) -> (H, W) density map
print("Predicted Count : ", int(density.sum()))

# Show the predicted density map next to the input image
f, axarr = plt.subplots(1, 2)
axarr[0].imshow(density, cmap=c.jet)
axarr[1].imshow(mpimg.imread(img_file))
plt.show()
        
        
```

The model implementation script is saved in the CSRNet_pytorch folder.

<img src="CSRNet_result.png">

CSRNet is very effective when there's a large crowd, as it is better at detecting people whose heads appear small in the image. However, when the number of people in the picture is small, it tends to overcount because it uses a density-based approach.
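
In a density-based approach the network predicts a per-pixel density map, and the count is simply the sum of that map (the script above does this when it sums the model output). A minimal NumPy sketch of the idea follows; the head positions are made up and the fixed Gaussian width `sigma` is an assumption for illustration, not taken from the CSRNet code.

```python
import numpy as np


def density_map(shape, head_points, sigma=4.0):
    """Build a density map with one unit-mass Gaussian per annotated head."""
    height, width = shape
    ys, xs = np.mgrid[0:height, 0:width]
    density = np.zeros(shape, dtype=np.float64)
    for (x, y) in head_points:
        blob = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
        density += blob / blob.sum()  # normalize so each head adds exactly 1
    return density


heads = [(20, 30), (60, 40), (100, 25)]  # made-up (x, y) head positions
dmap = density_map((64, 128), heads)
estimated_count = dmap.sum()
print(round(estimated_count, 3))
```

One plausible reading of the overcounting is that small amounts of spurious density spread over many pixels still add up in the sum, which weighs more heavily when the true count is only a handful of people.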

**For more on the ideas behind CSRNet, see:** https://medium.com/secure-and-private-ai-writing-challenge/implementation-of-csrnet-crowd-counting-project-for-udacity-project-showcase-a451b4397d71

In our case, the restaurant needs real-time speed with reasonable accuracy, and the cameras are close up, so we chose the Cascade classifier as it makes the most business sense.

> - The number of people is usually small. 
> - The distance from the camera to the crowd is near.

We can overcome the drawback of the face detection requirement by placing the detection camera so that it always faces the customer queue.
<br>
<br>
This model selection is based on business case scenario.

<br>
Can add more here... on possible extensions, feasibility of this tool, etc...
