Ex 3.5: Difference keying. Implement a difference keying algorithm (see Section 3.1.3)
(Toyama, Krumm et al. 1999), consisting of the following steps:
1. Compute the mean and variance (or median and robust variance) at each pixel in an
“empty” video sequence.
2. For each new frame, classify each pixel as foreground or background (set the
background pixels to RGBA=0).
3. (Optional) Compute the alpha channel and composite over a new background.
4. (Optional) Clean up the image using morphology (Section 3.3.1), label the connected
components (Section 3.3.3), compute their centroids, and track them from frame to
frame. Use this to build a “people counter”.


**Key points in the code below:**
1. We read up to **30 frames** of an empty scene to compute our background model (per-pixel mean & variance).  
2. For each **new frame**, we compute the **difference** from the background mean, take its norm across the color channels, and compare it with a threshold of \(k\) times the background standard deviation.  
3. We use **morphological operations** (`cv2.morphologyEx`) to clean up the foreground mask.  
4. We use **connected component labeling** (`cv2.connectedComponents`) to detect blobs, draw bounding boxes, and find centroids.  
5. For an advanced “people counter,” you could track these centroids over time, counting how many unique objects appear and possibly exit the scene.
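
The centroid tracking mentioned in point 5 could be sketched as a greedy nearest-centroid matcher. This is a minimal illustration, not part of the exercise's reference solution: the class name `CentroidCounter` and the parameters `max_dist` and `max_missed` are hypothetical, and their default values are arbitrary starting points.

```python
import math

class CentroidCounter:
    """Greedy nearest-centroid tracker that counts distinct objects seen."""

    def __init__(self, max_dist=50.0, max_missed=5):
        self.max_dist = max_dist      # largest per-frame jump (pixels) still treated as the same object
        self.max_missed = max_missed  # frames a track may go unmatched before it is dropped
        self.tracks = {}              # track id -> (cx, cy, missed_frames)
        self.total = 0                # number of distinct tracks ever created

    def update(self, centroids):
        """Match this frame's centroids to tracks; return {track_id: (cx, cy)}."""
        # All (distance, track id, detection index) candidates, closest first.
        pairs = sorted(
            (math.hypot(cx - tx, cy - ty), tid, i)
            for tid, (tx, ty, _) in self.tracks.items()
            for i, (cx, cy) in enumerate(centroids)
        )
        used_tracks, used_dets, assigned = set(), set(), {}
        for dist, tid, i in pairs:
            if dist > self.max_dist:
                break  # every remaining pair is even farther apart
            if tid in used_tracks or i in used_dets:
                continue
            used_tracks.add(tid)
            used_dets.add(i)
            self.tracks[tid] = (centroids[i][0], centroids[i][1], 0)
            assigned[tid] = centroids[i]
        # Age unmatched tracks and drop those missing for too long.
        for tid in list(self.tracks):
            if tid not in used_tracks:
                tx, ty, missed = self.tracks[tid]
                if missed + 1 > self.max_missed:
                    del self.tracks[tid]
                else:
                    self.tracks[tid] = (tx, ty, missed + 1)
        # Unmatched detections start new tracks and increase the count.
        for i, c in enumerate(centroids):
            if i not in used_dets:
                self.tracks[self.total] = (c[0], c[1], 0)
                assigned[self.total] = c
                self.total += 1
        return assigned
```

In the per-frame loop, one would collect each blob's `(cx, cy)` into a list, call `update` once per frame, and read `total` as the running count. Noise blobs that survive the morphological cleanup would inflate that count, so a minimum blob area or a minimum track age is likely needed in practice.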

---

### **4. Practical Considerations**

1. **Illumination Changes:**  
   - Sudden lighting changes can break a simple mean/variance model. Consider more robust or adaptive background modeling (e.g., **Gaussian Mixture Models**, [OpenCV’s `BackgroundSubtractorMOG2`](https://docs.opencv.org/3.4/d1/dc5/tutorial_background_subtraction.html)).

2. **Static vs. Dynamic Background:**  
   - If the background is not truly static (e.g., waving trees), a single mean/variance might not be sufficient. Multiple model components or motion compensation might be needed.

3. **Threshold Parameter \(k\):**  
   - The choice of \(k\) directly affects how sensitive the detection is. A smaller \(k\) picks up minor changes, but can produce false positives. A larger \(k\) is more robust to noise, but can miss faint foregrounds.

4. **Background Updates / Online Learning:**  
   - Real systems often **update** the background model over time, gradually incorporating new frames where no foreground is detected. This allows adaptation to slow lighting changes.

5. **Connected Components vs. Contours:**  
   - You can also use `cv2.findContours` for blob detection. Choose the method that best suits your project requirements.

6. **Tracking & Counting:**  
   - Once you have each blob’s centroid, you can match them across frames (e.g., nearest centroid match, Hungarian algorithm, or a Kalman filter) to track movement and count how many objects enter or exit a region.
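
The background update described in item 4 could look like the following selective running average. It is a sketch under assumptions: the function name `update_background` and the learning rate `alpha` are illustrative, and the variance update uses a standard exponentially weighted form rather than anything prescribed by the exercise.

```python
import numpy as np

def update_background(bg_mean, bg_var, frame, fg_mask, alpha=0.02):
    """Blend `frame` into the model at background pixels only.

    bg_mean, bg_var: float32 arrays of shape (H, W, C)
    frame:           float32 array of shape (H, W, C)
    fg_mask:         array of shape (H, W), nonzero where foreground
    alpha:           learning rate in (0, 1); 0.02 is an arbitrary starting point
    """
    is_bg = (fg_mask == 0)[..., None]  # (H, W, 1), broadcasts over channels
    delta = frame - bg_mean
    new_mean = bg_mean + alpha * delta
    # Exponentially weighted variance update
    new_var = (1.0 - alpha) * (bg_var + alpha * delta**2)
    # Foreground pixels keep their old statistics
    return (np.where(is_bg, new_mean, bg_mean),
            np.where(is_bg, new_var, bg_var))
```

Calling this once per frame with the cleaned-up mask lets the model follow slow lighting drift. The trade-off is that an object which stops moving is never absorbed into the background, because its pixels stay classified as foreground and are therefore never updated.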

---

## **Summary**

**Difference keying** involves modeling the background (mean and variance) from an “empty” scene and then classifying new pixels that deviate significantly from this model as **foreground**. This approach is conceptually simple yet effective in static or near-static scenes with steady lighting. Optional steps include refining the alpha channel for partial transparency, cleaning up the mask with morphological operations, and tracking connected components (e.g., for people counting).  
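
The optional alpha and compositing steps could be sketched as below. The helper names and the `softness` parameter are assumptions made for illustration. The linear ramp is one simple way to get partial transparency near the decision boundary, not the method from the cited paper. Frames read by OpenCV are in BGR order, so the four-channel result is strictly BGRA.

```python
import numpy as np

def soft_alpha(diff_norm, threshold_map, softness=2.0):
    """Ramp alpha linearly from 0 at the threshold to 1 at softness * threshold."""
    lo = threshold_map
    hi = softness * threshold_map
    alpha = (diff_norm - lo) / np.maximum(hi - lo, 1e-6)  # guard against zero width
    return np.clip(alpha, 0.0, 1.0).astype(np.float32)

def to_rgba(frame, alpha):
    """Attach alpha as a 4th channel and zero all channels where alpha is 0."""
    a8 = np.round(alpha * 255).astype(np.uint8)
    rgba = np.dstack([frame, a8])   # (H, W, 4)
    rgba[a8 == 0] = 0               # background pixels become RGBA = 0
    return rgba

def composite(frame, alpha, new_bg):
    """Blend frame over new_bg: out = alpha * frame + (1 - alpha) * new_bg."""
    a = alpha[..., None]            # (H, W, 1), broadcasts over channels
    out = a * frame.astype(np.float32) + (1.0 - a) * new_bg.astype(np.float32)
    return np.clip(out, 0, 255).astype(np.uint8)
```

Here `diff_norm` and `threshold_map` are per-pixel arrays of shape (H, W), `frame` is a `uint8` color image, and `new_bg` must have the same shape as `frame`.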

**References:**
- R. Szeliski, *Computer Vision: Algorithms and Applications*, Section 3.1.3.  
- Toyama, K., Krumm, J., Brumitt, B., & Meyers, B. (1999). *Wallflower: Principles and practice of background maintenance*. In ICCV.

With this pipeline, you can detect objects in your scene and, once their centroids are matched across frames, track them, laying the groundwork for more advanced video analytics tasks!



import cv2
import numpy as np

# -------------------------
# Step 1: Capture "empty" scene frames to build background model
# -------------------------
# For demonstration, we assume we read a few frames from a video or webcam.
# Alternatively, this could be a set of pre-captured images.
NUM_EMPTY_FRAMES = 30
cap = cv2.VideoCapture('empty_scene_video.mp4')  # or 0 for a webcam

frame_count = 0
frames_list = []

while frame_count < NUM_EMPTY_FRAMES:
    ret, frame = cap.read()
    if not ret:
        break
    # Convert to float32
    float_frame = frame.astype(np.float32)
    frames_list.append(float_frame)
    frame_count += 1

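if len(frames_list) == 0:
    # Release the capture and stop; exit() is only guaranteed in interactive sessions
    cap.release()
    raise SystemExit("No frames captured for background model!")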

# Compute mean and variance for background model
bg_mean = np.mean(frames_list, axis=0)  # shape: (H, W, C)
bg_var  = np.var(frames_list, axis=0)   # shape: (H, W, C)
bg_std  = np.sqrt(bg_var)               # standard deviation

# Define a scaling factor for threshold
k = 2.5  # tune this

# -------------------------
# Step 2: Process new frames and apply difference keying
# -------------------------
# Let's loop over the remainder of the video to do foreground detection.
while True:
    ret, frame = cap.read()
    if not ret:
        break
    
    # Convert to float32
    float_frame = frame.astype(np.float32)
    
    # Compute difference from the background mean
    diff = cv2.absdiff(float_frame, bg_mean)
    # Optionally convert to grayscale or keep as 3D
    # We'll take the norm across color channels for simplicity
    diff_norm = np.sqrt(np.sum(diff**2, axis=2))
    
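    # Compute threshold using standard deviations
    # Combine the per-channel std into one value per pixel, matching diff_norm
    bg_std_gray = np.sqrt(np.sum(bg_std**2, axis=2))
    # Floor the std (2.0 is an assumed value, tune it) so that pixels with
    # near-zero variance in the empty sequence don't fire on every tiny change
    threshold_map = k * np.maximum(bg_std_gray, 2.0)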
    
    # Create a foreground mask
    # foreground_mask = 1 where diff_norm > threshold_map, else 0
    fg_mask = (diff_norm > threshold_map).astype(np.uint8)
    
    # (Optional) Morphological cleanup
    # Create a kernel for morphology
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5,5))
    # Remove noise (Opening)
    fg_mask = cv2.morphologyEx(fg_mask, cv2.MORPH_OPEN, kernel)
    # Fill holes (Closing)
    fg_mask = cv2.morphologyEx(fg_mask, cv2.MORPH_CLOSE, kernel)
    
    # (Optional) Label connected components
    num_labels, labels_im = cv2.connectedComponents(fg_mask)
    
    # Draw bounding boxes or centroids for each label
    # We skip the background label = 0
    output_frame = frame.copy()
    for label_idx in range(1, num_labels):
        mask_region = (labels_im == label_idx)
        # Find coordinates of this blob
        y_coords, x_coords = np.where(mask_region)
        # Compute bounding box
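        # Cast NumPy integers to plain ints before passing them to OpenCV drawing calls
        x_min, x_max = int(x_coords.min()), int(x_coords.max())
        y_min, y_max = int(y_coords.min()), int(y_coords.max())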
        
        # Draw bounding box
        cv2.rectangle(output_frame, (x_min, y_min), (x_max, y_max), (0,255,0), 2)
        
        # Centroid
        cx = int(np.mean(x_coords))
        cy = int(np.mean(y_coords))
        cv2.circle(output_frame, (cx, cy), 5, (0,0,255), -1)
    
    # Display results
    cv2.imshow('Foreground Mask', fg_mask*255)
    cv2.imshow('Labeled Objects', output_frame)
    
    key = cv2.waitKey(30) & 0xFF  # keep only the low byte of the key code
    if key == 27:  # Escape key
        break

cap.release()
cv2.destroyAllWindows()