<h2>Practicum I - Real-Time Object Detection/Tracking for Retail Business Intelligence</h2>
<h4>Regis University - CC&IS Department - Data Science</h4>
<h5>Practicum Advisor: Prof. Aiman Gannous</h5>
<h5>Student: Josh Butch</h5>

This project uses artificial neural networks and video cameras to learn more about the habits of retail customers.  Its purpose is to identify opportunities to improve the retail shopping experience by gathering intelligence about how customers shop.<br>

The following customer counter detects and tracks people using a standard personal computer and a high-definition webcam.  This model can be configured to work with the multi-camera setup that the store currently uses for security.  The ability to track centroids across multiple cameras will greatly enhance the usefulness of this model in gathering shopping intel.

First, we import the packages the model needs.  The code is not extensive, but it relies on several packages to accomplish our goals:

In [None]:
# Import the necessary packages
from pyimagesearch.centroidtracker import CentroidTracker  # Centroid tracking capability
from pyimagesearch.trackableobject import TrackableObject  # Creates a trackable object ID
from imutils.video import VideoStream                      # Package to capture external video stream
from imutils.video import FPS                              # Package to track and count frames per second
import numpy as np                                         # Abbreviate numpy as np
import argparse                                            # Package to write and parse arguments
import imutils                                             # Basic image processing package
import time                                                # Package to track time
import dlib                                                # Provides the correlation tracker used between detections
import cv2                                                 # OpenCV: image processing and the dnn module that runs the detector

Now that the packages are imported, it's time to create and parse the arguments for this model.  These arguments let the user set the model files, the video source, the optional output file, the minimum detection confidence, and the number of frames to skip between detections.  Without them the model would not be able to take inputs from the user.

In [None]:
# Construct and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-p", "--prototxt", required=True,
	help="path to Caffe 'deploy' prototxt file")
ap.add_argument("-m", "--model", required=True,
	help="path to Caffe pre-trained model")
ap.add_argument("-i", "--input", type=str,
	help="path to optional input video file")
ap.add_argument("-o", "--output", type=str,
	help="path to optional output video file")
ap.add_argument("-c", "--confidence", type=float, default=0.4,
	help="minimum probability to filter weak detections")
ap.add_argument("-s", "--skip-frames", type=int, default=30,
	help="# of skip frames between detections")
args = vars(ap.parse_args())
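
To see what the resulting `args` dictionary looks like, here is a minimal sketch that parses an explicit argument list instead of the real command line.  The file names are hypothetical placeholders, and the help strings are omitted for brevity.  Passing a list is also one way to experiment inside a notebook, where the kernel's own command line would otherwise be parsed.

```python
import argparse

def build_parser():
    # Same options as the parser above, condensed for illustration
    ap = argparse.ArgumentParser()
    ap.add_argument("-p", "--prototxt", required=True)
    ap.add_argument("-m", "--model", required=True)
    ap.add_argument("-i", "--input", type=str)
    ap.add_argument("-o", "--output", type=str)
    ap.add_argument("-c", "--confidence", type=float, default=0.4)
    ap.add_argument("-s", "--skip-frames", type=int, default=30)
    return ap

# Hypothetical file names, passed as a list rather than read from sys.argv
example_args = vars(build_parser().parse_args(
    ["--prototxt", "deploy.prototxt", "--model", "detector.caffemodel"]))

print(example_args["skip_frames"])  # argparse stores --skip-frames under skip_frames
print(example_args["input"])        # None, so the webcam branch would be taken
```

Note that the dash in `--skip-frames` becomes an underscore in the dictionary key, which is why later code would look it up as `args["skip_frames"]`.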

Next we'll initialize the list of class labels that our model was trained to detect.  In this model we are using a pre-trained MobileNet SSD detector stored in Caffe format.  Each detection the network returns carries a class index that is looked up in this list.  In the end the only class we'll be keeping is the "person" class.

We'll also take the opportunity to load the model from disk using the "prototxt" and "model" arguments.

In [None]:
# Initialize the list of class labels MobileNet SSD was trained to
# detect
CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
	"bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
	"dog", "horse", "motorbike", "person", "pottedplant", "sheep",
	"sofa", "train", "tvmonitor"]

# Load the serialized model from disk
print("[INFO] loading model...")
net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])

Next we'll determine whether the video input comes from a webcam or a video file.  In this instance we'll be connecting to our webcam, which is indicated by the "src=0" argument in the code below.

In [None]:
# If a video path was not supplied, reference the webcam
if not args.get("input", False):
	print("[INFO] starting video stream...")
	vs = VideoStream(src=0).start()          # src=0 is the webcam
	time.sleep(2.0)                          # Allows a "warm-up" period if there's a delay in webcam activation

# Otherwise, grab a reference to the video file being inputted
else:
	print("[INFO] opening video file...")
	vs = cv2.VideoCapture(args["input"])

The following code initializes the video writer and the frame dimensions as empty placeholders.  Both are filled in later, inside the loop: the frame dimensions are read from the first frame, and the writer is created only if an output path was supplied.  We don't need a video writer in this instance.

In [None]:
# Initialize the video writer if needed
writer = None

# Initialize the frame dimensions or set them from the first frame of video
W = None
H = None

The following code contains the last pieces to be instantiated before we begin looping over the frames for object detection and tracking.  The centroid tracker is the key variable that identifies an object being tracked.  The centroid IS the object as far as the model is concerned.  The frame counters and the FPS estimator are initialized here as well, so afterwards we can begin analyzing each frame to detect our desired objects.

In [None]:
# Instantiate our centroid tracker 
ct = CentroidTracker(maxDisappeared=40, maxDistance=50)
trackers = []           # Initialize a list to store each of our dlib correlation trackers
trackableObjects = {}   # Initialize a dictionary to map each unique object ID to a TrackableObject

# Initialize the total number of frames processed  
totalFrames = 0
totalDown = 0    # Total # of objects down
totalUp = 0      # Total # of objects up

# Start the frames per second throughput estimator
fps = FPS().start()

Now we begin the while loop that iterates over the frames to detect and track objects.  This is where the heart of the model begins to run its object detection analysis.  The first part of the loop reads the next frame from whichever video source is in use and checks whether a video file has reached its end.

In [None]:
# Loop over frames from vs
while True:
	# Capture the next frame and determine either VideoCapture or VideoStream
	frame = vs.read()
	frame = frame[1] if args.get("input", False) else frame

	# If we are viewing a video and there's no frame to grab we've reached the end
	if args["input"] is not None and frame is None:
		break

Here we resize the frame and convert it from OpenCV's BGR channel order to RGB, which is the order dlib's correlation trackers are given.  As stated in the comments, if the frame dimensions are empty they are set according to the frame's shape.

In [None]:
	# Resize the frame to have a maximum width of 500 pixels 
	frame = imutils.resize(frame, width=500)
	rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # Convert the frame from BGR to RGB for dlib

	# If the frame dimensions are empty, set them according to shape
	if W is None or H is None:
		(H, W) = frame.shape[:2]
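
The effect of the resize on the frame dimensions can be sketched with plain arithmetic.  This assumes the resize preserves the aspect ratio when only a width is given; `resized_dims` is a hypothetical helper written for illustration and is not part of imutils.

```python
def resized_dims(orig_w, orig_h, target_w=500):
    # Scale the height by the same ratio applied to the width
    ratio = target_w / float(orig_w)
    return (target_w, int(orig_h * ratio))

# A 1280x720 webcam frame shrinks to 500 pixels wide
print(resized_dims(1280, 720))
# A 640x480 frame keeps its 4:3 shape
print(resized_dims(640, 480))
```

A smaller frame means less data for the detector to process, which is the reason for resizing before detection.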

Next we'll initialize the video writer if one is necessary.  We're not creating an output video file, but it's a really nice feature to be able to create a file of your model working over a particular piece of video that's interesting to analyze.  One thing that became apparent is the amount of nonproductive detection time that a model like this exhibits.

In [None]:
	# Initialize the writer if necessary
	if args["output"] is not None and writer is None:
		fourcc = cv2.VideoWriter_fourcc(*"MJPG")
		writer = cv2.VideoWriter(args["output"], fourcc, 30,
			(W, H), True)

	# Initialize the current status along with our list of bounding
	# box rectangles returned by either (1) our object detector or
	# (2) the correlation trackers
	status = "Waiting"
	rects = []

The next section converts the frame to a blob and passes it through the network in a single forward pass.  The model then loops over the resulting detections, reading the confidence of each one before classifying and locating the detected object.

In [None]:
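	# Run the detector only on every --skip-frames-th frame and reset the trackers
	if totalFrames % args["skip_frames"] == 0:
		status = "Detecting"
		trackers = []

		# Convert the frame to a blob and pass the blob through the
		# network and obtain the object detections
		blob = cv2.dnn.blobFromImage(frame, 0.007843, (W, H), 127.5)
		net.setInput(blob)
		detections = net.forward()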

		# loop over the detections
		for i in np.arange(0, detections.shape[2]):
			confidence = detections[0, 0, i, 2]     # Extract the probability associated with the detection
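
The two numeric arguments passed to `blobFromImage` above, 0.007843 and 127.5, are a scale factor and a mean.  As a rough sketch of the normalization they imply (the mean is subtracted first, then the result is multiplied by the scale factor), the pure-Python function below shows how 8-bit pixel values end up in roughly the range -1 to 1.  `normalize_pixel` is a hypothetical helper for illustration only.

```python
SCALE = 0.007843   # roughly 1 / 127.5
MEAN = 127.5

def normalize_pixel(value):
    # Subtract the mean first, then apply the scale factor
    return (value - MEAN) * SCALE

for v in (0, 127.5, 255):
    print(v, round(normalize_pixel(v), 3))
```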

In the following section we filter out weak detections based on our confidence rating.  Also, now that we have classified our detected object, we can drop any classes that we aren't interested in detecting.  In this case the only class we want to identify is "person."

In [None]:
			# Filter out weak detections 
			if confidence > args["confidence"]:
				idx = int(detections[0, 0, i, 1])  # Extract the index of the class label from the detections list

				# If the class label is not a person, ignore it
				if CLASSES[idx] != "person":
					continue

The next section of code computes the coordinates of each detected person's bounding box.  A dlib correlation tracker is then started on that box and appended to the list of trackers, so the person can be followed on the frames between detections.

In [None]:
				# Compute the coordinates of the bounding box for the object
				box = detections[0, 0, i, 3:7] * np.array([W, H, W, H])
				(startX, startY, endX, endY) = box.astype("int")

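				# Construct a dlib rectangle from the bounding box coordinates
				# (cast to built-in ints, which some dlib builds require)
				tracker = dlib.correlation_tracker()  # Create a dlib correlation tracker
				rect = dlib.rectangle(int(startX), int(startY), int(endX), int(endY))
				tracker.start_track(rgb, rect)        # Start the dlib correlation tracker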

				trackers.append(tracker)  # Add to list of trackers
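
The bounding box arithmetic above multiplies four normalized coordinates by the frame width and height.  A minimal sketch of that scaling, using a hypothetical detection on a 500x400 frame, looks like this:

```python
def scale_box(normalized_box, width, height):
    # normalized_box is (startX, startY, endX, endY) as fractions of the frame
    sx, sy, ex, ey = normalized_box
    return (int(sx * width), int(sy * height),
            int(ex * width), int(ey * height))

# Hypothetical detection covering the middle of a 500x400 frame
print(scale_box((0.25, 0.125, 0.5, 0.75), 500, 400))
```

The truncation with `int` plays the same role as `astype("int")` in the code above.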

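The following else branch handles the frames on which the detector is not run.  Tracking an existing object is much cheaper than detecting it again, so this is what gives the model a higher frame processing throughput.  The code loops over the trackers, sets the status to "Tracking", updates each tracker with the current frame, and reads back its new position.  Once the position has been unpacked into start and end X,Y coordinates, the bounding box is added to the list of rectangles.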

In [None]:
	# Utilize our object *trackers* rather than object *detectors* to obtain a higher frame processing throughput
	else:
		# Loop over the trackers
		for tracker in trackers:
			status = "Tracking"  # Set the status of our system to be 'tracking'

			# Update the tracker and grab the updated position
			tracker.update(rgb)
			pos = tracker.get_position()

			# Unpack the position object
			startX = int(pos.left())
			startY = int(pos.top())
			endX = int(pos.right())
			endY = int(pos.bottom())

			# Add the bounding box coordinates to the rectangles list
			rects.append((startX, startY, endX, endY))
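
The alternation between detecting and tracking can be sketched as a simple schedule.  This assumes detection runs whenever the frame counter is a multiple of the `--skip-frames` value (30 by default) and tracking covers every other frame; `frame_mode` is a hypothetical helper, not part of the model.

```python
def frame_mode(frame_index, skip_frames=30):
    # Detection runs on every skip_frames-th frame; tracking fills the gaps
    return "Detecting" if frame_index % skip_frames == 0 else "Tracking"

modes = [frame_mode(i) for i in range(61)]
detect_frames = [i for i, m in enumerate(modes) if m == "Detecting"]
print(detect_frames)
print(modes.count("Tracking"))
```

With the default setting, only 3 of the first 61 frames pay the cost of a full network pass.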

The horizontal reference line is the key piece of code when determining an object's movement.  In order to count an object there must be a spatial reference, or line of demarcation, that the centroid has to cross in one direction or the other.  We are trying to determine entry and exit movements, so a single line that tracks two directions of movement is enough for our purposes.  The bounding boxes gathered so far are then handed to the centroid tracker, which returns the current centroid for each object ID.

In [None]:
	# Draw a horizontal line in the center of the frame -- once an
	# object crosses this line we will determine whether they were
	# moving 'up' or 'down'
	cv2.line(frame, (0, H // 2), (W, H // 2), (0, 255, 255), 2)

	objects = ct.update(rects)
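
To give a feel for what a centroid tracker does with the rectangles it receives, here is a simplified sketch.  It is not the `CentroidTracker` implementation imported above: it only computes centroids and greedily matches each known object to its nearest new centroid, and it leaves out registering new objects and dropping ones that disappear.  The function names and the sample coordinates are hypothetical.

```python
import math

def centroid(rect):
    # rect is (startX, startY, endX, endY)
    sx, sy, ex, ey = rect
    return ((sx + ex) // 2, (sy + ey) // 2)

def match_nearest(known, rects, max_distance=50):
    """Greedy nearest-centroid matching; returns {object_id: new_centroid}."""
    candidates = [centroid(r) for r in rects]
    matched = {}
    for object_id, old in known.items():
        if not candidates:
            break
        # Pick the closest remaining centroid to this object's last position
        best = min(candidates,
                   key=lambda c: math.hypot(c[0] - old[0], c[1] - old[1]))
        if math.hypot(best[0] - old[0], best[1] - old[1]) <= max_distance:
            matched[object_id] = best
            candidates.remove(best)
    return matched

known = {0: (100, 100), 1: (300, 200)}
rects = [(290, 180, 330, 240), (90, 80, 130, 140)]
print(match_nearest(known, rects))
```

The `max_distance` parameter plays a role similar to the `maxDistance=50` setting used earlier: a centroid that jumps too far is not treated as the same object.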


Next we loop over the tracked objects and check whether a trackable object already exists for each object ID.  If one does not exist, the model creates it.  Otherwise it compares the current centroid's y-coordinate with the mean of the previous centroids, and the sign of that difference gives the direction of movement.

In [None]:
	# Loop over the tracked objects
	for (objectID, centroid) in objects.items():
		to = trackableObjects.get(objectID, None) # Check to see if a trackable object exists for current object ID

		# If none, create one
		if to is None:
			to = TrackableObject(objectID, centroid)

		else:
			# The difference between the y-coordinate of the *current*
			# centroid and the mean of *previous* centroids will tell
			# us in which direction the object is moving (negative for
			# 'up' and positive for 'down')
			y = [c[1] for c in to.centroids]
			direction = centroid[1] - np.mean(y)
			to.centroids.append(centroid)

Next we check whether the object has already been counted.  If it has not, the sign of the direction value, together with the centroid's position relative to the center line, decides whether the object is tallied as moving up or moving down.

In [None]:
			# Check to see if the object has been counted or not
			if not to.counted:
				# If the direction is negative (indicating the object
				# is moving up) AND the centroid is above the center
				# line, count the object
				if direction < 0 and centroid[1] < H // 2:
					totalUp += 1
					to.counted = True

				# If the direction is positive (indicating the object
				# is moving down) AND the centroid is below the
				# center line, count the object
				elif direction > 0 and centroid[1] > H // 2:
					totalDown += 1
					to.counted = True
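
The direction and counting rules above can be tried out in isolation.  The sketch below uses a hypothetical `TrackedPerson` dataclass that mirrors the attributes used in the code (`centroids` and `counted`) and a hypothetical `update_count` function; the coordinates and the line position are made up for the example.

```python
from dataclasses import dataclass, field
from statistics import mean

@dataclass
class TrackedPerson:
    # Hypothetical stand-in mirroring the attributes used above
    object_id: int
    centroids: list = field(default_factory=list)
    counted: bool = False

def update_count(person, new_centroid, line_y, totals):
    # Direction: current y minus the mean of all previous y values
    direction = new_centroid[1] - mean(c[1] for c in person.centroids)
    person.centroids.append(new_centroid)
    if not person.counted:
        if direction < 0 and new_centroid[1] < line_y:
            totals["up"] += 1
            person.counted = True
        elif direction > 0 and new_centroid[1] > line_y:
            totals["down"] += 1
            person.counted = True
    return direction

totals = {"up": 0, "down": 0}
person = TrackedPerson(0, [(250, 220), (250, 200)])
first = update_count(person, (250, 180), line_y=187, totals=totals)
second = update_count(person, (250, 170), line_y=187, totals=totals)
print(first, totals)
```

Because image y-coordinates grow downward, a negative direction means the person is moving toward the top of the frame.  The `counted` flag keeps the second update from tallying the same person twice.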

In the next section we are going to store the trackable object we created with the objectID.  Once that's accomplished we'll be able to print the centroid and object ID on the output frame for labeling.

In [None]:
		# Store the trackable object
		trackableObjects[objectID] = to

		# Draw both the ID of the object and the centroid of the
		# object on the output frame
		text = "ID {}".format(objectID)
		cv2.putText(frame, text, (centroid[0] - 10, centroid[1] - 10),
			cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
		cv2.circle(frame, (centroid[0], centroid[1]), 4, (0, 255, 0), -1)

The following code finishes each pass through the loop and then runs what I refer to as "cleanup code" once the loop exits.  Inside the loop the frame is written to disk if a writer exists, the output frame is displayed, the `q` key is checked as an exit signal, and the frame and FPS counters are incremented.  After the loop the FPS results are printed and the video writer, the video source, and any open windows are released or closed.

In [1]:
	# Check to see if we should write the frame to disk
	if writer is not None:
		writer.write(frame)

	# Show the output frame
	cv2.imshow("Frame", frame)
	key = cv2.waitKey(1) & 0xFF

	# If the `q` key was pressed, break from the loop
	if key == ord("q"):
		break

	# Increment the total number of frames processed thus far and
	# then update the FPS counter
	totalFrames += 1
	fps.update()

# Stop the timer and display FPS information
fps.stop()
print("[INFO] elapsed time: {:.2f}".format(fps.elapsed()))
print("[INFO] approx. FPS: {:.2f}".format(fps.fps()))

# Check to see if we need to release the video writer pointer
if writer is not None:
	writer.release()

# If not using a video file, stop the camera video stream
if not args.get("input", False):
	vs.stop()

# Release the video file pointer
else:
	vs.release()

# Close any open windows
cv2.destroyAllWindows()
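
The FPS figures printed above come down to frames processed divided by elapsed time.  The sketch below is a hypothetical stand-in for such an estimator, not the imutils `FPS` class itself.  It accepts an injectable clock so the example can use fake timestamps and stay deterministic.

```python
import time

class SimpleFPS:
    """Hypothetical frames-per-second estimator for illustration."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._start = None
        self._end = None
        self._frames = 0

    def start(self):
        self._start = self._clock()
        return self

    def update(self):
        # Call once per processed frame
        self._frames += 1

    def stop(self):
        self._end = self._clock()

    def elapsed(self):
        return self._end - self._start

    def fps(self):
        return self._frames / self.elapsed()

# Fake clock: reads 0.0 s at start() and 2.0 s at stop()
ticks = iter([0.0, 2.0])
counter = SimpleFPS(clock=lambda: next(ticks)).start()
for _ in range(60):
    counter.update()
counter.stop()
print(counter.elapsed(), counter.fps())
```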


In conclusion, we've managed to create a working customer counter that will monitor a single point of entry/exit.  Although this is the most basic of customer counters, it's a solid foundation to start from when adding additional cameras and path tracking.  Further modification of this file will allow for operational intelligence and customer behavior insights that weren't previously available.