# Motion detection
Motion detection refers to the process of identifying changes in the position of objects relative to their background. It plays a crucial role in various domains such as intelligent video surveillance, traffic monitoring, event recognition, and people tracking. Among the many available modalities, such as infrared sensors, radar, and acoustic systems, camera-based motion detection stands out for its effectiveness and widespread use in computer vision applications.

<span style="color: lightgreen;">**Video-based**</span> motion detection works by capturing frames from a video feed and classifying the pixels of each frame into two categories: background and foreground, where the foreground represents regions of motion or interest. Several classical techniques exist for this task, each with distinct strengths and limitations. These methods can be used individually or combined to improve robustness and accuracy.

One of the most commonly used techniques is <span style="color: lightgreen;">**background subtraction**</span>, which involves computing the difference between a reference background frame and the current frame to detect moving regions. While simple in principle, its effectiveness can be significantly enhanced through advanced techniques such as edge-based background models, Gaussian mixture models (GMM), kernel density estimation (KDE), and temporal median filtering.
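As a minimal sketch of the idea (not the method used later in this document), the snippet below builds a reference background with a temporal median over a few synthetic grayscale frames and thresholds the absolute difference with the current frame. The helper names `median_background` and `subtract_background`, the threshold of 25, and the synthetic data are illustrative assumptions.

```python
import numpy as np


def median_background(frames):
    """Per-pixel temporal median of a list of grayscale frames of equal shape."""
    return np.median(np.stack(frames), axis=0).astype(np.uint8)


def subtract_background(frame, background, threshold=25):
    """Binary mask (255 = foreground) where |frame - background| > threshold."""
    # Cast to a signed type first so the subtraction cannot wrap around.
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return np.where(diff > threshold, 255, 0).astype(np.uint8)


# Synthetic data: a flat gray scene, plus a bright 2x2 "object" in the current frame.
frames = [np.full((6, 6), 100, dtype=np.uint8) for _ in range(5)]
current = frames[-1].copy()
current[2:4, 2:4] = 200

background = median_background(frames)
mask = subtract_background(current, background)
print(int((mask == 255).sum()), "foreground pixels")
```

With a real camera the reference frame would have to be refreshed over time, which is what the more advanced background models mentioned above take care of.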

Another method, **frame differencing**, calculates the difference between consecutive frames. It is computationally lightweight and well-suited for real-time applications but may struggle to capture the full contours of moving objects, leading to partial or fragmented detection. This drawback can often be mitigated through morphological operations.
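The fragmentation issue can be seen in a small, hedged example. The sketch below differences two synthetic one-row frames in which a uniform object moves one pixel to the right; the function name `frame_difference` and the threshold are assumptions made for illustration.

```python
import numpy as np


def frame_difference(prev_frame, curr_frame, threshold=25):
    """Binary mask (255 = changed) of pixels that differ between two frames."""
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return np.where(diff > threshold, 255, 0).astype(np.uint8)


# A uniform object, 4 pixels wide, moves one pixel to the right.
prev_frame = np.full((1, 10), 50, dtype=np.uint8)
curr_frame = prev_frame.copy()
prev_frame[0, 2:6] = 200  # object covers columns 2..5
curr_frame[0, 3:7] = 200  # object covers columns 3..6

mask = frame_difference(prev_frame, curr_frame)
changed_columns = np.flatnonzero(mask[0]).tolist()
print(changed_columns)
```

Only the trailing and leading edges of the object change between the two frames, so the interior is missing from the mask. This is the gap that morphological operations such as dilation or closing are meant to fill.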

Similarly, the **temporal difference** approach computes pixel-wise differences across multiple consecutive frames. While it can detect rapid motion effectively, it may fail to detect slowly moving objects, as minimal changes between frames can be overlooked.

The **optical flow** technique, though computationally intensive, provides a more detailed analysis by estimating motion vectors for each pixel between frames. By clustering these vectors based on their distribution, it offers accurate and dense motion estimation. However, this method is sensitive to noise and often requires careful tuning, making it more suitable for controlled or high-fidelity applications.

Scientific literature presents a wide range of motion detection algorithms, each tailored to specific challenges **depending on assumptions about the environment, camera characteristics, scene dynamics, and the spatio-temporal complexity of the target application**.

Among these, background subtraction remains one of the most effective and practical methods for delivering accurate and consistent results in real-world scenarios.

Let us now take a closer look at the core techniques involved.

![background_subtraction_methods](./imgs/background_subtraction_methods.jpg)

Research in the field of motion detection is extensive. For example, the paper "Analysis of Computer Vision-Based Techniques for Motion Detection" proposes a classification framework that organizes motion detection techniques into eight distinct families. As the table above shows, there are many strategies for constructing and maintaining a background model. Despite the abundance of algorithmic variants, <span style="color: red;">**no single method seems to effectively address all the challenges encountered in real-world scenarios**.</span>

An even broader survey of traditional and recent techniques can be found in "Traditional and Recent Approaches in Background Modeling for Foreground Detection: An Overview" [3]. This work consolidates a comprehensive range of methodologies, highlighting their respective strengths and limitations.

In recent years, deep learning has also gained prominence in motion detection, offering powerful tools for complex and dynamic environments. The available methods therefore span a spectrum, from highly sophisticated algorithms to simpler, more lightweight solutions, reflecting the diversity of real-world scenarios.

In our case, the context involves a relatively simple and static indoor environment, and the system is constrained by the limited computational resources of the Raspberry Pi. For these reasons, the focus has been placed on traditional approaches to motion detection, prioritizing methods that balance:
- simplicity
- computational efficiency
- low resource consumption
- accurate results

To guide the selection and evaluation of suitable techniques, we referred to the paper "Background Subtraction Techniques: A Review", which provides a structured comparison of background subtraction algorithms, ranging from basic to advanced. The analysis concludes with the following comparative table:

![background_subtraction_perf](./imgs/background_subtraction_perf.jpg)

The table presents a comparison of several traditional motion detection methods, taking into account key factors such as processing speed, memory usage, and qualitative accuracy. A time complexity of $\mathcal{O}(1)$ is associated with the running Gaussian average approach, where each pixel is classified based on a simple thresholded difference, and the background model is updated by adjusting only one or two parameters.
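To make the constant per-pixel cost concrete, here is a minimal sketch of one running Gaussian average step, assuming the commonly cited update rule (an exponentially weighted mean and variance per pixel, with a pixel flagged as foreground when it deviates from the mean by more than `k` standard deviations). The function name and the values of `alpha` and `k` are illustrative assumptions, not values taken from the review.

```python
import numpy as np


def running_gaussian_step(frame, mean, var, alpha=0.05, k=2.5):
    """Classify a frame against the current model, then update the model.

    Returns (foreground, new_mean, new_var). Only two parameters per pixel
    (mean and variance) are stored and updated.
    """
    frame = frame.astype(np.float64)
    # Classification uses the model as it was before seeing this frame.
    foreground = np.abs(frame - mean) > k * np.sqrt(var)
    new_mean = alpha * frame + (1.0 - alpha) * mean
    new_var = alpha * (frame - new_mean) ** 2 + (1.0 - alpha) * var
    return foreground, new_mean, new_var


# Synthetic model: mean 100, standard deviation 5 everywhere.
mean = np.full((4, 4), 100.0)
var = np.full((4, 4), 25.0)

frame = np.full((4, 4), 102, dtype=np.uint8)  # small lighting change
frame[1, 1] = 180                             # one strongly deviating pixel

foreground, mean, var = running_gaussian_step(frame, mean, var)
print(int(foreground.sum()), "foreground pixel(s)")
```

In this sketch every pixel is blended into the model, including foreground ones; a selective update that skips foreground pixels is a common variation.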

In the case of the Mixture of Gaussians (MoG) method, let $m$ denote the number of Gaussian distributions used. The time complexity is then $\mathcal{O}(m)$ per pixel; because $m$ is typically between 3 and 5, this amounts to a small constant factor.

It is important to note that selecting the most complex model is **not** always necessary to achieve high accuracy. In fact, the MoG method, despite its moderate complexity, can still yield highly accurate results (denoted as $H$) in many practical scenarios.

## OpenCV
<span style="color: lightgreen;">OpenCV (Open Source Computer Vision Library)</span> is an open-source software library that provides a comprehensive set of tools and functions for image processing, computer vision, and machine learning tasks.

Let's see what has already been implemented in this field.

![background_subtraction_opencv](./imgs/background_subtraction_opencv.jpg)

As illustrated in the figure, there are numerous ready-to-use algorithms, each offering a specific set of parameters to fine-tune their behavior.

Motivated by the previous observations and the pursuit of simplicity, I initially experimented with the MOG2 algorithm. However, as we will see, it was ultimately **not** selected for inclusion in MANTIS.

### MOG2
The fundamental assumption behind the Mixture of Gaussians (MOG) model is that images of a scene without intrusive objects exhibit regular behavior that can be effectively described by a probabilistic model. If such a statistical representation of the scene is available, an intrusive object can be detected by identifying regions in the image that do not conform to the model.

The MOG model extends pixel-wise background subtraction by representing each pixel with a probability density function. In this approach, the probability density of a pixel is modeled as a weighted sum of several Gaussian components, where each component is defined by its mean $\hat{\mu}_m$ and variance $\hat{\sigma}^2_m$, with weights $\hat{\pi}_m$ indicating their relative contributions. This allows for greater flexibility in modeling complex pixel intensity distributions. A pixel in a new image is considered part of the background if its value is well represented by the mixture of Gaussian components. This method improves upon simpler models by accommodating dynamic environments and variations in illumination.
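The background test can be sketched for a single grayscale pixel as follows. This is a simplified illustration, not OpenCV's implementation: it assumes that the components with the largest weights, up to a cumulative weight given by a background ratio, describe the background, and that a value matches a component when it lies within `k` standard deviations of its mean. The `Component` class, the function name, and the numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Component:
    weight: float  # mixing weight of the component
    mean: float    # mean intensity
    var: float     # variance


def is_background(value, components, background_ratio=0.9, k=3.0):
    """Return True if `value` matches one of the dominant (background) components."""
    ordered = sorted(components, key=lambda c: c.weight, reverse=True)
    cumulative = 0.0
    for comp in ordered:
        # Match test: squared distance within (k * sigma)^2.
        if (value - comp.mean) ** 2 <= (k ** 2) * comp.var:
            return True
        cumulative += comp.weight
        if cumulative > background_ratio:
            break  # remaining low-weight components are treated as foreground
    return False


# A bimodal background (e.g. a flickering pixel) plus a rarely seen third mode.
mixture = [
    Component(weight=0.60, mean=100.0, var=16.0),
    Component(weight=0.35, mean=140.0, var=25.0),
    Component(weight=0.05, mean=220.0, var=36.0),
]

print(is_background(104.0, mixture))  # close to the first mode
print(is_background(138.0, mixture))  # close to the second mode
print(is_background(220.0, mixture))  # only matches the low-weight mode
```

The third value coincides with the mean of a component, yet it is still reported as foreground because that component carries too little weight to count as background.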

In particular, MOG2 also adapts the number of Gaussians per pixel; in practice, only those pixels that exhibit multimodal behavior over time are modeled with multiple Gaussians.

The time required for a newly static object to be incorporated into the background depends on the background ratio and the history parameter $T$. It is approximately $\frac{\log(1 - c_f)}{\log(1 - \alpha)}$ frames, where $c_f$ is one minus the background ratio and $\alpha$ is the learning rate, typically set to $\alpha = 1/T$.
For example, with $c_f = 0.2$ (i.e., background ratio = 0.8) and history $T = 500$ frames, it takes approximately 111 frames (about 3.7 seconds at 30 FPS) for an object to be considered part of the background.
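The arithmetic of this example can be checked with a few lines of Python. The function name `absorption_frames` is an illustrative assumption; the formula is the one given above.

```python
import math


def absorption_frames(background_ratio, history):
    """Approximate number of frames before a static object joins the background."""
    c_f = 1.0 - background_ratio
    alpha = 1.0 / history  # learning rate, assuming alpha = 1 / T
    return math.log(1.0 - c_f) / math.log(1.0 - alpha)


frames = absorption_frames(background_ratio=0.8, history=500)
seconds = frames / 30.0  # assuming a 30 FPS stream
print(f"{frames:.1f} frames, {seconds:.2f} s at 30 FPS")
```

Lowering the history (a larger learning rate) or lowering the background ratio shortens the absorption time.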

**[source]** *Zoran Zivkovic and Ferdinand van der Heijden. “Efficient adaptive density estimation per image pixel for the task of background subtraction”. In: Pattern Recognition Letters 27.7 (2006), pp. 773–780. ISSN: 0167-8655. DOI: https://doi.org/10.1016/j.patrec.2005.11.005. URL: https://www.sciencedirect.com/science/article/pii/S0167865505003521*.

### CNT
Directly addressing the previous observation, a novel algorithm called **CNT** has been proposed. It is designed with a strong focus on efficiency, speed, and simplicity. Unlike more complex methods such as **MOG2**, which model pixel color distributions over time, CNT adopts a **frame-counting** approach. The core idea behind CNT is that if a pixel remains unchanged for a specific number of consecutive frames, while allowing for minor lighting variations, it is classified as background.

This mechanism makes CNT significantly lighter and faster to compute, which is particularly advantageous on low-end hardware such as the Raspberry Pi, where CNT has been shown to run more than twice as fast as MOG2.

When considered alongside the earlier analysis of the "absorption" time of a new object into the background model, CNT's approach offers a plausible explanation for how it can deliver results comparable to those of MOG2 despite its simpler design.
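The frame-counting idea described above can be illustrated with a toy per-pixel counter. This is a deliberately simplified sketch and not the actual CNT implementation: it only models the "unchanged for N consecutive frames" rule, and the class name `StabilityCounter` and the `tolerance` parameter are hypothetical.

```python
import numpy as np


class StabilityCounter:
    """Toy model: a pixel becomes background after N unchanged frames."""

    def __init__(self, shape, min_pixel_stability, tolerance=10):
        self.min_pixel_stability = min_pixel_stability
        self.tolerance = tolerance  # allowed intensity drift (minor lighting changes)
        self.last = None            # reference value each pixel is compared against
        self.count = np.zeros(shape, dtype=np.int32)

    def apply(self, frame):
        """Update the counters and return a mask (255 = foreground)."""
        frame = frame.astype(np.int16)
        if self.last is None:
            self.last = frame
        unchanged = np.abs(frame - self.last) <= self.tolerance
        # Stable pixels count up; changed pixels restart from zero
        # and take the new value as their reference.
        self.count = np.where(unchanged, self.count + 1, 0)
        self.last = np.where(unchanged, self.last, frame)
        foreground = self.count < self.min_pixel_stability
        return np.where(foreground, 255, 0).astype(np.uint8)


counter = StabilityCounter((2, 2), min_pixel_stability=3)
still = np.full((2, 2), 100, dtype=np.uint8)
for _ in range(3):
    mask = counter.apply(still)  # after 3 stable frames everything is background

moved = still.copy()
moved[0, 0] = 200                # one pixel changes abruptly
mask_after_change = counter.apply(moved)
print(int((mask_after_change == 255).sum()), "foreground pixel(s)")
```

Each frame costs one comparison and one counter update per pixel, with no per-pixel distribution to maintain, which is consistent with the speed advantage described above.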

<span style="color: lightgreen"><strong>Because this method meets our requirements and has been benchmarked directly on our platform, CNT was selected to handle motion detection within the MANTIS system.</strong></span>

**[source code]**: https://github.com/sagi-z/BackgroundSubtractorCNT<br>
**[explanation]**: https://sagi-z.github.io/BackgroundSubtractorCNT/doxygen/html/index.html

```minPixelStability=self.fps```<br>
> How long to wait before considering a pixel to be a background? When you and I look at a scene, we wait for some time before we consider an item to be part of a background. The assumption here is that it takes about 1 second, but you can play with it. I recommend using your expected FPS as the value of minPixelStability when using createBackgroundSubtractorCNT(). The value represents the number of frames to wait when a pixel is not changing before marking it as background.


```maxPixelStability=60*self.fps```<br>
> How long to wait before recognizing the background changed? Okay – so we’ve set something to be a background, and things are passing in front of it. When something is in front of it for a long time, then it’s time to treat it as a background instead of the previous one, but how long to wait before doing this replacement? The algorithm here was tested with a 60 seconds value and gave good results. You can change that as you want, but I recommend setting maxPixelStability to “minPixelStability*60” in createBackgroundSubtractorCNT().

```useHistory=True```<br>
> But what if you want to REACT VERY FAST TO SCENE CHANGES? If reducing maxPixelStability is not enough, you can use ‘false’ for useHistory in createBackgroundSubtractorCNT(). In this case maxPixelStability is ignored. Because the background distinction is weaker, you’ll see small ghosts following your foreground objects and the background image will have some ghosts images fading in it. Using “minPixelStability=FPS/5” will reduce this phenomena.

```isParallel=True```<br>
> In my experience paralleling everything automatically is a double edged sword. On one hand you don’t need to worry about optimizations if you have enough processing power. On the other hand, splitting your processing carefully can yield a better optimization. I leave this to you to experiment and decide for your specific design.

```python
import cv2
import cv2.bgsegm  # CNT lives in the contrib modules (opencv-contrib-python)
import numpy as np

cap = cv2.VideoCapture(0)
# Some capture backends report 0 FPS; the fallback of 30 is an assumption.
fps = int(cap.get(cv2.CAP_PROP_FPS)) or 30

params = {
    "minPixelStability": fps,       # about 1 second before a pixel becomes background
    "useHistory": True,
    "maxPixelStability": 60 * fps,  # about 60 seconds before the background is replaced
    "isParallel": True,
}
background = cv2.bgsegm.createBackgroundSubtractorCNT(**params)
kernel = np.ones((3, 3), np.uint8)  # structuring element for the erosion step

while cap.isOpened():
    # [READ]
    ret, frame = cap.read()
    if not ret or frame is None:
        # No frame available (end of stream or camera error): leave the loop
        break

    # [PREPROCESS CURRENT FRAME]
    frame = cv2.GaussianBlur(src=frame, ksize=(5, 5), sigmaX=0)
    cv2.imshow("frame", frame)

    # [MODEL UPDATE and FOREGROUND MASK]
    foreground_mask = background.apply(frame)
    foreground_mask = cv2.medianBlur(foreground_mask, 3)  # remove salt-and-pepper noise
    foreground_mask = cv2.erode(foreground_mask, kernel, iterations=1)  # erode the contours
    cv2.imshow("foreground mask", foreground_mask)

    background_img = background.getBackgroundImage()
    cv2.imshow("background_img", background_img)

    # [SOME KIND OF DETECTION]
    # if np.sum(foreground_mask) > 0:
    #     print("Something is moving!")

    # Press "q" to quit
    key = cv2.waitKey(1)
    if key == ord("q"):
        break

# Release the capture device once, on every exit path
cap.release()
print("Released Video Resource")
cv2.destroyAllWindows()
```
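The detection step left as a comment in the loop above could be filled in many ways. One possible sketch, assuming a binary mask where non-zero means foreground, is to trigger only when the fraction of foreground pixels reaches a threshold, so that a few isolated noisy pixels do not raise an alarm. The function name `motion_detected` and the 1% default are illustrative assumptions, not values used by MANTIS.

```python
import numpy as np


def motion_detected(foreground_mask, min_ratio=0.01):
    """Return True when at least `min_ratio` of the mask pixels are foreground."""
    ratio = np.count_nonzero(foreground_mask) / foreground_mask.size
    return ratio >= min_ratio


# Synthetic 100x100 masks standing in for the output of background.apply(...)
quiet = np.zeros((100, 100), dtype=np.uint8)
quiet[0, 0] = 255                 # a single noisy pixel (0.01% of the image)

busy = np.zeros((100, 100), dtype=np.uint8)
busy[40:60, 40:60] = 255          # a 20x20 moving blob (4% of the image)

print(motion_detected(quiet), motion_detected(busy))
```

The threshold would need tuning against the camera resolution and the expected size of moving objects in the scene.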