In [9]:
from IPython.display import HTML

HTML('''<script>
code_show=true; 
function code_toggle() {
 if (code_show){
 $('div.input').hide();
 } else {
 $('div.input').show();
 }
 code_show = !code_show
} 
$( document ).ready(code_toggle);
</script>
<form action="javascript:code_toggle()"><input type="submit" value="Click here to toggle on/off the raw code."></form>''')

In [2]:
#thread management
from threading import Thread,Lock
from collections import deque

#image manipulation
import numpy as np
import pandas as pd
import cv2 #OpenCV
import matplotlib.pyplot as plt
%matplotlib inline

#general modules
import time
import pickle

#camera modules
#import picamera as pc
#from picamera.array import PiRGBArray

# Streaming Video Analysis in Python 

**Written by:** Matthew Rubashkin and Colin Higgins

At SVDS we have [*analyzed Caltrain
delays*](http://www.svds.com/the-trains-project-analyzing-caltrain-delays/)
to try to improve Caltrain arrival predictions using real time publicly
available data. However, there were some inconsistencies with the
station arrival time data we pulled from the API. In order to increase
the accuracy of our predictions, we needed to verify when, where, and in
which direction trains were going. In our previous [post](http://www.svds.com/image-processing-python/), Chloe Mawer implemented a proof-of-concept Caltrain detector
using a webcam to acquire a video at our Mountain View offices. She
explained the use of OpenCV’s python bindings to walk through
frame-by-frame image processing. She showed that using video alone, it
is possible to positively identify a train based on motion from frame to
frame. She also showed how to use regions of interest within the frame
to determine the direction in which the Caltrain was traveling.

<center><video width="960" height="360" align="center" controls>
  <source src="video/confusion_matrix_movie.mp4" type="video/mp4">
</video></center>

The previous work was done using pre-recorded, hand-selected video.
Since our goal is to provide real time Caltrain detection, we had to
implement a streaming train detection algorithm and measure its
performance under real-world conditions. Thinking about a Caltrain
detector IoT device as a product, we also needed to slim down from a
camera + laptop to something with a smaller form factor. We already had
some experience [*listening to
trains*](http://www.svds.com/listening-caltrain/) using a Raspberry Pi,
so we bought a [*camera
module*](https://www.raspberrypi.org/products/pi-noir-camera/) for it
and integrated our video acquisition and processing/detection pipeline
onto one device. 

<img src="figures/Pi_Video_Only_Architecture.png" alt="Smooth images" width="1200" align='left'>

On our Raspberry Pi 3B, our pipeline consists of hardware and software running on top of [Raspbian Jessie](https://www.raspberrypi.org/blog/raspbian-jessie-is-here/), a derivative of Debian Linux. All of the software is written in [Python 2.7](https://www.python.org) and can be controlled from a [Jupyter Notebook](http://jupyter.org) run locally on the Pi or remotely on your laptop. Highlighted in green are our 3 major components for acquiring, processing, and evaluating streaming video:
-   **Video Camera**: Initializes picamera and captures frames from the video stream
-   **Video Sensor:** Processes the captured frames and dynamically varies video camera settings 
-   **Video Detector**: Determines motion in specified Regions of Interest (ROIs), and evaluates if a train passed

In addition to our main camera, sensor and detector processes, several sub-classes (orange) are needed to perform image background subtraction, persist data, and run models:
-   **Mask**: Performs background subtraction on raw images, using powerful algorithms implemented in OpenCV 3.0 
-   **History**: An accessible [Pandas](http://pandas.pydata.org) dataframe that is updated in real time to persist data and facilitate SQL-like queries
-   **Detector Worker**: Assists the video detector in evaluating image, motion and history data. This class consists of several modules (yellow) responsible for sampling frames from the video feed, plotting data and running models to determine train direction.



Caltrain detection, at its simplest, boils down to a question of
binary classification: Is there a train passing right now? Yes or no.

<img src="figures/Binary_confusion_matrix.png" alt="OpenCV" width="150" align='right'>
As with any other binary classifier, the performance is defined by
evaluating the number of examples in each of four cases:
1. **Classifier says there is a train and there is a train, True Positive**
2. **Classifier says there is a train when there is none, False Positive**
3. **Classifier says there is no train when there is one, False Negative**
4. **Classifier says there is no train when there isn’t one, True Negative** 
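
As a minimal sketch of how these four counts can be tallied, assume each time window has a hand-labeled ground truth and a detector output. The `tally` helper and the sample labels below are hypothetical and are not taken from our detector's data.

```python
from collections import Counter

def tally(truth, predicted):
    """Count the four confusion-matrix cells for paired boolean labels."""
    counts = Counter()
    for actual, guess in zip(truth, predicted):
        if guess and actual:
            counts['TP'] += 1      # said train, there was a train
        elif guess and not actual:
            counts['FP'] += 1      # said train, there was none
        elif not guess and actual:
            counts['FN'] += 1      # said no train, there was one
        else:
            counts['TN'] += 1      # said no train, there was none
    return counts

# Hypothetical labels: True means "a train is passing" in that window
truth     = [True, True, False, False, True, False]
predicted = [True, False, True, False, True, False]

counts = tally(truth, predicted)
precision = counts['TP'] / float(counts['TP'] + counts['FP'])
recall = counts['TP'] / float(counts['TP'] + counts['FN'])
```

Precision tells us how many of the reported trains were real, while recall tells us how many of the real trains were reported.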

For more info, check out the blog posts by Tom Fawcett, principal data scientist at
SVDS, on [classifier evaluation](http://www.svds.com/the-basics-of-classifier-evaluation-part-1).

After running our minimum viable Caltrain detector for a week, we began
to understand how our classifier performed, and importantly, where it
failed.

Causes of false positives:

-   Delivery trucks
-   Garbage trucks
-   Light rail
-   Freight trains

Causes of false negatives:

-   Darkness
-   Rain

Our classifier involves two main parameters set empirically: motion and time. We first evaluate the amount of motion in selected Regions of Interest (ROIs) at 5 frames per second. The second parameter is motion over time: a set amount of motion must persist for a certain amount of time to be considered a train. We set our time threshold at 2 seconds, since express trains take ~3 seconds to pass by our sensor located 50 feet from the tracks. As you can imagine, objects like humans walking past our IoT device will not create enough motion to trigger a detection event, but large objects like freight trains or trucks will trigger a false positive if they traverse the video sensor ROIs for 2 seconds or more. The next 2 blog posts will discuss how we integrate audio and image classification to decrease false positive events.

While our video classifier works decently well at detecting trains during the day, we were unable to detect trains (false negatives) in low light conditions after sunset. When we tried additional, computationally expensive image processing to detect trains in low light on the Raspberry Pi, all other processes, including image capture, ground to a halt! 

So before we dive into the data and how we solved these problems, let’s talk about some of the nuts and bolts. How do we capture video and process it on the Raspberry Pi?

## PiCamera and the Video_Camera Class

The [*PiCamera*](https://picamera.readthedocs.io/) package is an
open-source package that offers a pure Python interface to the Pi camera
module that allows you to record image or video to file or stream. After
some experimentation, we decided to use PiCamera in a [continuous capture
mode](http://picamera.readthedocs.io/en/release-1.10/api_camera.html), as shown below in the **initialize_camera** and **initialize_video_stream** functions. 

In [4]:
class Video_Camera(Thread):
    def __init__(self,fps,width,height,vflip,hflip,mins):
        self.input_deque=deque(maxlen=fps*mins*60) 
        #...
        
    def initialize_camera(self):
        self.camera = pc.PiCamera(
            resolution=(self.width,self.height), 
            framerate=int(self.fps))
            #...
    
    def initialize_video_stream(self):
        self.rawCapture = pc.array.PiRGBArray(self.camera, size=self.camera.resolution) 
        self.stream = self.camera.capture_continuous(self.rawCapture,
             format="bgr", 
             use_video_port=True)
        
    def run(self):
        #This method is run when the command start() is given to the thread
        for f in self.stream:
            #add frame with timestamp to input queue
            self.input_deque.append({
                'time':time.time(),
                'frame_raw':f.array})

<img src="figures/camera_codeblock_1_folded.png" alt="Smooth images" width="800" align='left'>

The stream of still frames is output as numpy arrays into a deque, a [double-ended queue](https://en.wikipedia.org/wiki/Double-ended_queue), for future processing. We decided to use a deque because we need to add, remove, and access objects at both the front (head) and back (tail) of the queue. Moreover, we can easily constrain the maximum length of our **input_deque** with the maxlen argument. As shown below, new images are appended at one end, and the oldest images are automatically removed from the other end once maxlen is reached. The deque allows calculation of motion over several frames, and enforces a limit on the total number of images stored in memory. It is important to minimize the memory footprint of this application because our IoT device, the Raspberry Pi 3, has only 1 GB of memory.

<img src="figures/deque_train_example.png" alt="Smooth images" width="800" align='left'>
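
The bounded behavior of the deque can be seen in a small sketch. The `fps` and `mins` values and the 400x224 frame size below are hypothetical placeholders chosen for illustration, and small dictionaries stand in for real frames.

```python
from collections import deque

fps, mins = 5, 1                         # hypothetical settings
frames = deque(maxlen=fps * mins * 60)   # holds at most 300 frames

# Append far more frames than the deque can hold
for frame_number in range(1000):
    frames.append({'frame_number': frame_number})

# append() adds on the right; once full, the leftmost (oldest) item is dropped
oldest = frames[0]['frame_number']
newest = frames[-1]['frame_number']

# Rough upper bound on memory for raw BGR frames at an assumed 400x224 size
bytes_per_frame = 400 * 224 * 3
max_megabytes = len(frames) * bytes_per_frame / 1e6
```

Because the length is capped, the memory bound depends only on the frame size and on `maxlen`, not on how long the camera has been running.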

## Threading and task management in python

As you may have noticed, we implement our Video_Camera class as a new [thread](https://www.cs.uic.edu/~jbell/CourseNotes/OperatingSystems/4_Threads.html) using the Python [threading](https://docs.python.org/2/library/threading.html) module. In order to perform real time train detection on a Raspberry Pi, threading is critical to ensure robust performance and minimize data loss in our asynchronous detection pipeline. This is because multiple threads within a process (our Python script) share the same data space with the main thread, facilitating:

-   Communication of information between threads

-   Interruption of individual threads without terminating the entire application

-   Most importantly, individual threads can be put to sleep (held in place) while other threads are running. This allows for nonparallel tasks to run without interruption on a single processor. 

<img src="figures/threading_diagram.png" alt="OpenCV" width="200" align='right'>
For example, imagine you are reading a book but are interrupted by a freight train rolling by your office. How would you be able to come back and continue reading from the exact place where you stopped? 

One way you could do this is by recording the page, line and word number. These 3 numbers are your execution context for reading the book! Now if your coworker is using the same technique, she can borrow the book and continue reading where she stopped before. When she is done, you can take the book back and continue from where you were. Similar to several people sharing one book, many tasks, such as asynchronously processing video and audio signals, can share the same processor on the Raspberry Pi!
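
To make the idea concrete, here is a minimal sketch of two threads sharing one deque. The `Producer` and `Consumer` classes are hypothetical stand-ins for the camera and sensor threads, and integers stand in for frames.

```python
import time
from collections import deque
from threading import Thread

class Producer(Thread):
    """Stands in for the camera thread: appends timestamped items."""
    def __init__(self, shared, n_items):
        Thread.__init__(self)
        self.shared = shared
        self.n_items = n_items

    def run(self):
        for i in range(self.n_items):
            self.shared.append({'time': time.time(), 'frame_raw': i})
            time.sleep(0.001)  # pause briefly so other threads can run

class Consumer(Thread):
    """Stands in for the sensor thread: drains items as they arrive."""
    def __init__(self, shared, n_items):
        Thread.__init__(self)
        self.shared = shared
        self.n_items = n_items
        self.seen = []

    def run(self):
        while len(self.seen) < self.n_items:
            try:
                item = self.shared.popleft()
            except IndexError:
                time.sleep(0.001)  # nothing queued yet; sleep and retry
                continue
            self.seen.append(item['frame_raw'])

shared = deque(maxlen=100)
producer = Producer(shared, 20)
consumer = Consumer(shared, 20)
consumer.start()
producer.start()
producer.join()
consumer.join()
```

Both threads touch the same `shared` object without copying data, and each one sleeps while it has nothing to do, which leaves the processor free for the other.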

## Real Time Background Subtraction and the Video_Sensor Class

Now that we are collecting and storing data from the PiCamera in the **input_deque**, we can create a new thread, the **video_sensor**, which asynchronously processes these images independently of the video_camera thread. The job of the **video_sensor** is to determine which pixels have changed values over time, i.e. motion. To do this, we need to identify the background of the image (the non-moving objects in the frame, which inadvertently mask motion) and the foreground of the image (the new or moving objects in the frame). After we have identified motion, we apply a 5x5 pixel kernel filter to reduce noise in our motion measurement via the [cv2.morphologyEx](http://docs.opencv.org/3.0-beta/doc/py_tutorials/py_imgproc/py_morphological_ops/py_morphological_ops.html) function. 

In [7]:
class Video_Sensor(Thread):
    def __init__(self,video_camera,mask_type):
        #...
        
    def apply_mask_and_decrease_noise(self,frame_raw):
        #apply the background subtraction mask
        frame_motion = self.mask.apply(frame_raw)
        #apply morphology mask to decrease noise
        frame_motion_output = cv2.morphologyEx(
            frame_motion,
            cv2.MORPH_OPEN,
            kernel=np.ones((2,2),np.uint8))
        return frame_motion_output

<img src="figures/camera_codeblock_2.png" alt="Smooth images" width="500" align='left'>
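
To show what the opening operation does to a motion mask, here is a pure-Python sketch on a tiny binary image. Opening is an erosion followed by a dilation, which removes specks smaller than the kernel while keeping larger blobs. The `erode`, `dilate`, and `opening` helpers are hypothetical illustrations, and OpenCV's border handling and kernel anchoring may differ in detail.

```python
def erode(mask, k):
    """Keep a pixel only if the k x k window starting at it is all foreground."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows - k + 1):
        for c in range(cols - k + 1):
            if all(mask[r + i][c + j] for i in range(k) for j in range(k)):
                out[r][c] = 1
    return out

def dilate(mask, k):
    """Paint a k x k window of foreground from every surviving pixel."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                for i in range(k):
                    for j in range(k):
                        if r + i < rows and c + j < cols:
                            out[r + i][c + j] = 1
    return out

def opening(mask, k=2):
    """Erode, then dilate: removes blobs that cannot contain a k x k square."""
    return dilate(erode(mask, k), k)

# Hypothetical motion mask: a 2x3 blob plus two isolated noise pixels
noisy = [
    [1, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 1],
]
cleaned = opening(noisy)
```

The two isolated pixels disappear, while the larger blob survives unchanged.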

## Real time background subtraction masks

Chloe [previously demonstrated](http://www.svds.com/image-processing-python/) that we could detect trains with processed video feeds that isolate motion, through a process called background subtraction, by setting thresholds for the minimum intensity and duration of motion. Since background subtraction must be applied to each frame and the Pi has only modest computational speed, we needed to streamline the algorithm to reduce computational overhead. Luckily, [OpenCV 3](http://opencv.org/opencv-3-0.html) comes with multiple [background subtraction algorithms](http://docs.opencv.org/3.1.0/db/d5c/tutorial_py_bg_subtraction.html#gsc.tab=0) that run optimized C code with convenient Python APIs including:

-   [backgroundsubtractorMOG2](http://www.sciencedirect.com/science/article/pii/S0167865505003521) : A Gaussian Mixture-based Background/Foreground Segmentation Algorithm developed by Zivkovic and colleagues.  It uses a method to model each background pixel by an optimized mixture of K Gaussian distributions. The weights of the mixture represent the time proportions that those colours stay in the scene. The probable background colours are the ones which stay longer and are more static.

<img src="figures/knn_theory.png" alt="OpenCV" width="200" align='right'>


-   [backgroundsubtractorKNN](http://docs.opencv.org/3.0-beta/doc/py_tutorials/py_ml/py_knn/py_knn_understanding/py_knn_understanding.html) : KNN involves searching for the closest match of the test data in the feature space of historical image data. In our case, we are trying to discern large regions of pixels with motion from those without motion. An example is shown to the right, where we try to discern which class (blue square or red triangle) the new data (green circle) belongs to by factoring in not only the closest neighbor (red triangle), but the k nearest neighbors. For instance, if k=2 then the green circle would be assigned to the red triangle class (the two red triangles are closest), but if k=6 then it would be assigned to the blue square class (the closest 6 objects are 4 blue squares and only 2 red triangles). If tuned correctly, KNN background subtraction should excel at detecting large areas of motion (e.g. a train) and should reduce detection of small areas of motion (e.g. a distant tree fluttering in the wind).
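
The voting idea in the KNN example can be sketched in a few lines. The coordinates below are hypothetical and only arranged to mimic the figure, and `knn_label` is an illustrative helper that does not reflect how OpenCV implements its background subtractor internally.

```python
from collections import Counter

def knn_label(point, labeled_points, k):
    """Return the majority label among the k points nearest to `point`."""
    def squared_distance(item):
        (x, y), _ = item
        return (x - point[0]) ** 2 + (y - point[1]) ** 2
    nearest = sorted(labeled_points, key=squared_distance)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical coordinates: 2 close red triangles, 4 slightly farther blue squares
labeled_points = [
    ((1.0, 0.0), 'red triangle'),
    ((0.0, 1.2), 'red triangle'),
    ((2.0, 0.0), 'blue square'),
    ((0.0, 2.2), 'blue square'),
    ((-2.4, 0.0), 'blue square'),
    ((0.0, -2.6), 'blue square'),
    ((5.0, 5.0), 'red triangle'),
]
new_point = (0.0, 0.0)  # the green circle

label_k2 = knn_label(new_point, labeled_points, k=2)
label_k6 = knn_label(new_point, labeled_points, k=6)
```

With k=2 the two nearest neighbors are both red triangles, while with k=6 the four blue squares outvote them, so the choice of k changes the answer.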

We tested each and found that backgroundsubtractorKNN gave the best balance between rapid response to change and adaptability, robustly recognizing vehicle motion while not being triggered by swaying vegetation. Moreover, the KNN method can be improved through machine learning, and the classifier can be saved to file for repeated use. The cons of KNN include artifacts from full field motion, limited tutorials, incomplete documentation, and the fact that backgroundsubtractorKNN requires OpenCV 3.0 or higher.

In [None]:
class Mask():
    def __init__(self,fps):
        #...
        
    def make_KNN_mask(self,bgsKNN_history,bgsKNN_d2T,bgsKNN_dS):
        mask = cv2.createBackgroundSubtractorKNN(
            history=bgsKNN_history,
            dist2Threshold=bgsKNN_d2T,
            detectShadows=bgsKNN_dS)
        return mask

<img src="figures/camera_codeblock_3.png" alt="Smooth images" width="600" align='left'>

<center><video width="960" height="360" align="center" controls>
  <source src="video/mog2_knn_comparison.mp4" type="video/mp4">
</video></center>

## Dynamically update camera settings in response to varied lighting

The PiCamera does a great job of adjusting its brightness settings throughout the day, but struggles with limited illumination at night or during the rare downpour in Mountain View. Below you can see the motion we detected from our sensor over 24 hours, where the spikes correspond to moving objects like a Caltrain! If we were using a digital camera or phone, we could manually change the exposure time or turn on a flash to increase the motion we could capture after sunset or before sunrise. However, with an automated IoT device, we must dynamically update the camera settings in response to varied lighting. We also picked a [night-vision compatible camera](https://www.raspberrypi.org/products/pi-noir-camera-v2/) without an infrared (IR) filter to gather more light in the ~700-1000 nm range, whereas normal cameras only capture light from ~400-700 nm. This extra far-red to infrared light is why some of our pictures seem discoloured compared to those from traditional cameras, like your smartphone, that have an IR filter. 

<img src="figures/sunrise_sunset_data.jpg" alt="Smooth images" width="800" align='left'>

In order to know when to change the camera settings, we record the mean intensity of the image, which the camera tries to keep around 50% of max levels at all times (half max = 128, i.e. half of the 8 bit 0-255 limit). We observed that when the light at sunset dropped beneath ~1/16 of max, we were unable to reliably detect motion. For this reason we set a low intensity threshold of 1/8 max intensity to trigger the camera night mode, and a high intensity threshold of 7/8 max intensity to trigger the day mode. We also avoid re-triggering the night settings when it is already night by checking the camera operating mode.

After we change the camera settings, we reset the background subtraction mask to ensure that we do not falsely trigger train detection. Importantly, we wait 1 second between changing the camera settings and resetting the mask, to ensure the camera thread is not lagging and has updated before the mask is reset. 

In [None]:
class Video_Sensor(Thread):
    #...
    def vary_camera_settings(self,frame_raw):
        intensity_mean=frame_raw.ravel().mean() #8 bit camera
        #adjust camera properties dynamically if needed, then reset mask
        #switch to night settings when a daytime frame gets too dark
        if (intensity_mean < (255.0/8)) and (self.camera.operating_mode=='day'):
            self.video_camera.apply_camera_night_settings()
            time.sleep(1) #let the camera thread catch up before resetting the mask
            self.mask=self.mask_object.make_mask(self.mask_type)
            print('Night Mode Activated - Camera')
        #switch back to day settings when a nighttime frame gets too bright
        #(255.0*3/4 avoids Python 2 integer division, where 3/4 evaluates to 0)
        if (intensity_mean > (255.0*3/4)) and (self.camera.operating_mode=='night'):
            self.video_camera.apply_camera_day_settings()
            #...
        return intensity_mean,self.mask

<img src="figures/camera_codeblock_4.png" alt="Smooth images" width="800" align='left'>
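
The two-threshold switching logic can be isolated from the camera in a small sketch. It uses the 1/8 and 7/8 thresholds described in the text above. The `next_mode` helper and the sample readings are hypothetical, and the real class also resets the mask after each switch.

```python
MAX_INTENSITY = 255.0                  # 8-bit pixel values
NIGHT_TRIGGER = MAX_INTENSITY / 8      # go to night mode below this mean
DAY_TRIGGER = MAX_INTENSITY * 7 / 8    # go to day mode above this mean

def next_mode(intensity_mean, mode):
    """Return (new_mode, changed) for the current mean frame intensity."""
    if mode == 'day' and intensity_mean < NIGHT_TRIGGER:
        return 'night', True
    if mode == 'night' and intensity_mean > DAY_TRIGGER:
        return 'day', True
    return mode, False

# Hypothetical mean intensities sampled across dusk, night, and dawn
readings = [128, 60, 30, 90, 150, 230, 128]
mode = 'day'
modes = []
for reading in readings:
    mode, changed = next_mode(reading, mode)
    modes.append(mode)
```

Keeping the two thresholds far apart gives the switch hysteresis: a reading between them never changes the mode, so the camera does not flip back and forth around a single cutoff.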

### Real-time detection of trains with the Video_Detector Class

Now that the video sensor is recording motion in a frame 5 times a second (5 FPS), we need to create a Video_Detector for detecting how long and in what direction an object has been moving through the frame. In order to do this, we create 3 Regions Of Interest (ROIs) in our frame which the train passes through. By having three ROIs, we can see if a train enters from the left (northbound) or right (southbound). We found that having a third, center ROI decreases the effect of noise in an individual ROI, thereby improving our ability to predict train directionality and more accurately calculate speed.

We next create a [circular buffer](https://en.wikipedia.org/wiki/Circular_buffer) to store when individual ROIs have exceeded the motion threshold. The length of this motion_detected_buffer is the minimum time corresponding to a train multiplied by the camera FPS (we set this as 2 seconds, i.e. the motion_detected_buffer has a length of 10 at 5 FPS). We added logic to our Video_Detector class that prevents a train from being detected more than once within a cooldown period, so that slow-moving trains are not registered more than once. Additionally, we use a frame sampling buffer to keep a short term record of raw and processed frames for future analysis, plotting or saving. 

Using all of these buffers, the Video_Detector class creates ROI_to_process, which is an array of information which stores time, the motion from the 3 ROIs, if motion was detected, and the direction of the train motion.
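
The interplay between the motion buffer and the cooldown can be sketched as follows. This is a simplified illustration under stated assumptions: the `update` helper is hypothetical, the 10 second cooldown is a made-up value, and a detection here requires motion in every frame of the 2 second window, which may be stricter than the rule in our Video_Detector class.

```python
from collections import deque

fps = 5
time_threshold = 2 * fps      # 2 seconds of sustained motion = 10 frames
cooldown_period = 10 * fps    # hypothetical 10 second cooldown = 50 frames

# Prefill both buffers with zeros so they start at full length
motion_detected_buffer = deque([0] * time_threshold, maxlen=time_threshold)
train_detected_buffer = deque([0] * cooldown_period, maxlen=cooldown_period)

def update(motion_detected):
    """Push one frame's motion flag and return 1 if a new train is detected."""
    motion_detected_buffer.append(1 if motion_detected else 0)
    sustained = sum(motion_detected_buffer) == time_threshold
    in_cooldown = any(train_detected_buffer)
    train_detected = 1 if (sustained and not in_cooldown) else 0
    train_detected_buffer.append(train_detected)
    return train_detected

# Hypothetical flags: 3 quiet frames, 15 frames of motion, 5 quiet frames
flags = [0] * 3 + [1] * 15 + [0] * 5
detections = [update(flag) for flag in flags]
```

The detection fires once, on the tenth consecutive frame of motion, and the remaining motion frames are suppressed because the earlier detection is still inside the cooldown buffer.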

In [None]:
class Video_Detector(Thread):
    def __init__(self,video_camera,video_sensor,
                 motion_threshold,time_threshold,cooldown_period):
        #...

    def create_rois(self):
        #hardcode Regions Of Interest, ROI: ((x1, y1), (x2, y2))
        left_roi=((2,80),(50,135))
        center_roi=((145,90),(215,130))
        right_roi=((325,100),(380,130))
        self.all_rois=[left_roi,center_roi,right_roi]
    
    def create_buffers(self):
        self.train_detected_buffer=deque(maxlen=self.cooldown_period)
        self.train_direction_buffer=deque(maxlen=self.cooldown_period)
        #length of motion_detected_buffer determines how time of motion translates into detection
        self.motion_detected_buffer=deque(maxlen=self.time_threshold)
        #prefill these buffers to max length
        for i in range(0,self.cooldown_period):
            self.motion_detected_buffer.append(0)
            self.train_detected_buffer.append(0)
            self.train_direction_buffer.append(0)
        #create frame sampler buffer (do not prefill this buffer)
        self.frame_sampler_buffer=deque(maxlen=self.cooldown_period)
        
    def run(self):
        self.framenum=0
        while self.kill_all_threads!=True:
            data=self.output_deque.popleft()
            #...
            #update the history dataframe and adjust the frame number pointer
            self.history.iloc[self.framenum % self.history.shape[0]] = self.roi_data
            self.framenum+=1
            #add frames and data to sampler
            self.frame_sampler_buffer.append({
                    #...

<img src="figures/video_detector_and_buffers_codeblock.png" alt="Smooth images" width="800" align='left'>

<img src="figures/roi_to_process_code.png" alt="Smooth images" width="800" align='left'>

### Persist processed data to Pandas Dataframe

Now that we are storing relevant train sensor and detector data in memory, we use [pandas dataframes](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html) to persist this data for future analysis. Pandas is a Python package that provides fast and flexible data structures designed to work efficiently with both relational and labeled data. Much as SQL (structured query language) is used to manage data held in relational database management systems (RDBMS), pandas is an excellent tool for data analysis that makes importing, querying and exporting data easy. 

The History class is used to create a pandas dataframe that holds time, sensor and processed detector data. As the Raspberry Pi 3 has limited memory (1 GB), we implement the History pandas dataframe as a limited length circular buffer to prevent memory errors. 

In [None]:
class History():
    def __init__(self,fps):
        self.len_history=int(fps*600)
        self.columns=['time','left_roi','middle_roi','right_roi',\
                 'motion_detected','train_detected','direction'] 
        
    def setup_history(self):
        #create pandas dataframe that contains  column information
        self.history = pd.DataFrame.from_records(\
            np.zeros((self.len_history,len(self.columns))),
            index=np.arange(self.len_history),
            columns=self.columns)
        #...

<img src="figures/history_codeblock_5.png" alt="Smooth images" width="650" align='left'>
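
The circular-buffer indexing (writing row `framenum % length`) can be illustrated without pandas. In this sketch a plain list of dictionaries stands in for the dataframe, and `write_row` and `rows_oldest_first` are hypothetical helpers.

```python
columns = ['time', 'left_roi', 'middle_roi', 'right_roi',
           'motion_detected', 'train_detected', 'direction']
len_history = 4  # tiny for illustration; the History class uses fps*600

# Preallocate every row once so memory use never grows
history = [dict.fromkeys(columns, 0) for _ in range(len_history)]

def write_row(framenum, roi_data):
    """Overwrite the slot for this frame, wrapping around at len_history."""
    history[framenum % len_history] = dict(zip(columns, roi_data))

def rows_oldest_first(framenum):
    """Return the stored rows in chronological order.

    `framenum` is the number of rows written so far.
    """
    start = framenum % len_history if framenum >= len_history else 0
    ordered = history[start:] + history[:start]
    return ordered[:min(framenum, len_history)]

framenum = 0
for t in range(6):  # hypothetical frames with time stamps 0..5
    write_row(framenum, [t, 0, 0, 0, 0, 0, 0])
    framenum += 1

times = [row['time'] for row in rows_oldest_first(framenum)]
```

After six writes into four slots, the two oldest rows have been overwritten and only the four most recent frames remain.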

Using pandas SQL-like commands, we can now easily retrieve and analyze train detection events. Shown below are 17 frames (3.4 seconds at 5 FPS) of data from a northbound Caltrain!  

<img src="figures/history_dataframe_example.png" alt="Smooth images" width="600" align='left'>

### Detector_Worker Class

Now that we are persisting data in a pandas dataframe, we want to be able to visualize the raw sensor and processed detector data. This requires additional processing time and resources, and we do not want to interrupt the video detector. We therefore use threading to create the Detector_Worker class. The Detector_Worker is responsible for plotting video, determining train direction, and returning sampled frames to the Jupyter console or filesystem. Shown below is the output of the video plotter. On the top left is one raw frame of video, and on the bottom right is one KNN-background-subtracted motion frame. The two right frames have the three ROIs overlaid onto the image.


<img src="figures/Frame_Sampler_Example_Pics_Only.png" alt="Smooth images" width="650" align='left'>

### Train Direction

In order to accurately detect train direction, the three of us on the Trainspotting team all tried different methods.

-    Static ‘Boolean’ Method (Chloe): Track the motion level in each individual ROI and then select north/south depending on which ROI exceeded the threshold first. We found that this static boolean method does not work well for express trains, which triggered the north and south facing detectors simultaneously.

-    Streaming ‘Integration’ Method (Colin): This method involved summing the historical levels of motion in each ROI, and determining direction by which ROI had the highest sum. We found that this method was too reliant on accurate setting of ROI position, and broke down if the camera was ever moved.

-    Streaming ‘Curve-Fit’ Method (Matt R): We next tried to combine the boolean and integration methods with a simple [sigmoid model](https://en.wikipedia.org/wiki/Sigmoid_function) of motion across the frame. If average motion across the three ROIs exceeded our motion threshold, we empirically fit a sigmoid curve to find where the ROI sensor hits 50% of the max value. If the data was noisy and curve fitting failed, we reverted to Chloe's static boolean method. Moreover, our curve-fit method allows determination of train speed if the real distance between the ROIs is known!  

In [None]:
class Detector_Worker(Thread):
    def curve_func(self,x, a, b,c):
        #Sigmoid function
        return -a/(c+ np.exp(b * -x))
    
    #...
    
    def alternate_km_map(self,ydata,t,event_time):
        #determine empirically where the ROI sensor hits 50% of the max value
        max_value = max(ydata)
        for i in range(0,len(ydata)):
            #if the value is above half of the max value
            if ydata[i] > max_value/2.0:
                km=t[i]-event_time
                return km
        #if the value never exceeds half of the max value
        #return the end of the time series
        km=t[-1]-event_time
        return km

<img src="figures/Direction_Detection_Code_Small.png" alt="OpenCV" width="300" align='left'>

<img src="figures/Direction_Detection_Code_Big.png" alt="Smooth images" width="650" align='left'>
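
As a rough sketch of how half-max crossing times could be turned into a direction and a speed, consider the simplified version below. It skips the sigmoid fit entirely and assumes, as described earlier, that a train entering from the left is northbound. The helper names, the sample motion levels, and the 10 meter ROI spacing are all hypothetical.

```python
def half_max_time(ydata, t):
    """Return the first time at which ydata exceeds half of its maximum."""
    max_value = max(ydata)
    for value, stamp in zip(ydata, t):
        if value > max_value / 2.0:
            return stamp
    return t[-1]  # never exceeded half max; fall back to the last time stamp

def direction_and_speed(left, right, t, roi_distance_m):
    """Infer direction from which outer ROI reaches half-max first."""
    t_left = half_max_time(left, t)
    t_right = half_max_time(right, t)
    if t_left == t_right:
        return 'unknown', None
    direction = 'northbound' if t_left < t_right else 'southbound'
    speed = roi_distance_m / abs(t_right - t_left)  # meters per second
    return direction, speed

# Hypothetical motion levels sampled at 5 FPS (0.2 s apart)
t = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
left_roi = [0, 10, 80, 100, 100, 100]
right_roi = [0, 0, 5, 20, 90, 100]

direction, speed = direction_and_speed(left_roi, right_roi, t,
                                       roi_distance_m=10.0)
```

Here the left ROI crosses half-max 0.4 seconds before the right ROI, so the sketch reports a northbound train. At only 5 FPS the time resolution is coarse, which is one reason fitting a curve can give a better estimate than picking the first sample above the threshold.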

Below is an example of a local southbound train passing our Mountain View office. If you'd like to learn more about analyzing time series data, please see our colleague Tom Fawcett's [blog post on avoiding common mistakes with time series data](http://www.svds.com/avoiding-common-mistakes-with-time-series/).

<img src="figures/Direction_Detection.png" alt="Smooth images" width="650" align='left'>

Determination of train speed will be covered in *Streaming Audio Analysis and IoT Sensor Fusion*. Importantly, other large objects like light rail or large trucks that pass in front of the camera also trigger the sensor, causing false positives. By having a secondary data feed, i.e. audio, we can determine whether a train is passing by using both visual and sound cues. Later in the Trainspotting series we will also cover how to reduce false positives from freight trains using image recognition in *TensorFlow and Neural Nets for Recognizing Images on a Raspberry Pi*.

We hope that you now understand how to design your own architecture for streaming video processing on an IoT device. Other [Trainspotting blog](http://www.svds.com/introduction-to-trainspotting/) posts, including *Connecting an IoT device to the Cloud* and *How to Build a Deployable IoT Device using a Raspberry Pi*, will cover using a remote server to control a Raspberry Pi in greater detail! 