Skip to content

holmesjian/CaMonitor

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

CaMonitor -- Edge AI Nursery Safety Guard

Real-time child safety monitoring using pose estimation on a Raspberry Pi 4. No cloud, no subscription, no data leaving the device.

Built as a personal project and engineering portfolio piece -- deployed to monitor a 20-month-old toddler at home.


What It Does

CaMonitor runs a full computer vision pipeline on a Raspberry Pi 4 to detect potentially dangerous situations involving a young child in real time.

Alert types:

Alert Category Severity Trigger
ZONE_ENTRY Spatial MEDIUM/HIGH Child enters a mapped danger zone (bed edge, furniture, kitchen)
INVERSION Posture HIGH Head drops below foot level (tumbling, hanging)
CLIMBING Posture HIGH Wrists and knees simultaneously above hip level
AIRBORNE Posture MEDIUM Both ankles above floor baseline
RAPID_DESCENT Motion HIGH Hip Y-coordinate drops > 4% of frame height per frame

On detection, the system sends an email notification with alert type, severity, timestamp, and an attached JPEG frame showing the skeleton overlay.

Zone severity levels:

  • MEDIUM -- furniture zones (bed edge, sofa, chairs)
  • HIGH -- kitchen zone (fridge, water tap/basin, and oven used as spatial anchors for zone boundary definition)

Kitchen zone is defined by the convex hull of anchor objects (fridge, water tap/basin, oven) detected by YOLOv8s during the one-time room scan. Any keypoint inside this hull triggers a HIGH-severity alert.


System Output -- Real Frames

All frames below are actual system output captured during live testing.

CLEAR -- Adult Detected, Alerts Suppressed

CLEAR adult squat

FPS:6.6 Frame:1359 -- Status: CLEAR. Adult in squat/crouch position near desk. Green border = no active alerts. Orange skeleton overlay with green keypoints tracked correctly. Adult filter correctly identifies adult body (bbox area > 0.02) and suppresses all alerts despite low crouching posture -- demonstrating orientation-invariant adult classification.


ZONE_ENTRY -- Toddler Enters Bed Zone

ZONE_ENTRY alert

FPS:4.4 Frame:2183 -- [MEDIUM] ZONE_ENTRY bed. Toddler standing and playing near the bed edge zone. Blue zone boxes for bed (top-left) and chair (bottom) visible. Full toddler skeleton tracked at 0.55 confidence threshold. Adult legs partially visible on right edge of frame -- adult filter correctly suppresses adult body, alerting only on the toddler.


CLIMBING -- Toddler Reaches Above Hip Level

CLIMBING alert

FPS:5.2 Frame:2297 -- [HIGH] CLIMBING (wrists+knees above hip_y=0.33). Red border = HIGH severity alert. Toddler reaching upward with arm fully extended above head level while seated at toy desk. Green keypoints on shoulder, elbow, and wrist chain. System correctly classifies elevated wrist position as climbing behaviour.


Room Scan -- YOLOv8 Zone Mapping

Room scan result

Room scan output from room_scan.py. YOLOv8s detected 3 zones: bed (magenta filled polygon, left), chair x2 (cyan bounding boxes, centre). conf=0.07 used to surface low-confidence furniture detections at top-down angle. Zone definitions saved to zones_config.yaml and loaded by alerts.py at startup. Re-run whenever camera position changes.


System Architecture

YOLOv8s room scan at startup (one-shot, conf=0.07)
    +---> Furniture zones (bed, sofa, chair, dining table)
    +---> Kitchen zone (convex hull of fridge/sink/oven anchors, min 2 required)
    v
zones_config.yaml loaded by alerts.py at runtime

Camera (Logitech StreamCam, 720p/MJPG mode, top-down ~2m height)
    |
    v
OpenCV frame capture (MJPG set before resolution -- USB 2.0 bandwidth requirement)
    |
    v
MediaPipe BlazePose (TFLite XNNPACK delegate, model_complexity=0)
    |
    +---> Adult filter (bbox area > 0.02 --> adult --> skip alerts)
    |     Bounding box of 11 upper-body keypoints
    |     Orientation-invariant for fixed overhead camera
    |
    +---> Zone alert  (hip/ankle keypoints vs YAML zone boxes)
    |     15-frame persistence filter -- requires ~2s sustained detection
    |
    +---> Posture alerts (inversion, climbing, airborne)
    |
    +---> Motion alert  (rapid descent -- inter-frame hip delta)
    |
    v
CSV logging + JPEG frame save + Gmail email notification

Benchmark Results

Phase 1 -- Resolution vs Baseline FPS

Resolution Avg FPS Avg CPU Avg Temp Peak Temp
480p 63.12 14.3% 50.2 C 52.6 C
720p 62.74 25.9% 52.3 C 55.0 C
1080p 37.11 30.7% 54.3 C 56.5 C

Decision: 1080p consumed 30% CPU before any AI. 720p held 63 FPS capture at 26% CPU, leaving full headroom for inference.

Phase 1b -- XNNPACK Delegate Verification

TFLite automatically selects the XNNPACK delegate on Raspberry Pi ARM Cortex-A72/A76, using NEON SIMD to vectorise neural network multiply-accumulate operations -- 4 floats processed per CPU instruction instead of 1.

Confirmed active via startup log:

INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Backend FPS CPU Cost Status
Naive CPU (no delegate) ~2-3 FPS ~65% Free Too slow
XNNPACK / ARM NEON 8.63 FPS 36% Free, auto-selected Production
GPU delegate (VideoCore) Not supported -- Free Not available on Pi
Coral Edge TPU (USB) ~25-30 FPS ~15% ~S$80 Future upgrade

XNNPACK provides ~3x throughput over estimated naive CPU path at zero cost. Coral Edge TPU evaluated -- 8.63 FPS sufficient for monitoring a toddler, S$80 BOM addition unjustified at current stage.

Phase 1c -- PTQ Accuracy / Latency Sweep

Google ships BlazePose in three quantization operating points (all int8):

Complexity Model Size FPS Detection Rate Confidence Decision
0 -- lite 2.7 MB 8.63 100% 0.997 Production
1 -- full 6.2 MB 6.11 100% 0.998 Reference
2 -- heavy ~26 MB 1.73 98% 0.993 Not viable

Key finding: complexity=0 gives 41% more throughput than complexity=1 at only 0.1% confidence drop. Complexity=2 performed worst overall -- inference latency exceeds frame arrival rate, causing tracking continuity loss (98% detection rate vs 100% for lighter models). complexity=0 is the unambiguous choice for edge deployment.


Adult / Child Differentiation

Design Evolution

The system must suppress alerts when a parent is in frame (CHILD_MONITOR mode). Three approaches were evaluated for a fixed top-down overhead camera, each failing a distinct edge case:

Attempt 1 -- height_span (nose-to-ankle ratio) Works upright but fails when ankles are not visible (sitting, crouching). Defaults to child classification, triggering false alerts on parent. Calibrated value: height_span mean=0.5999, std=0.0429

Attempt 2 -- head/shoulder width ratio Ankles-independent, but fails when parent is side-on to camera. Shoulder width collapses in top-down projection, misclassifying adult as toddler.

Attempt 3 -- Bounding box area (current) Orientation-invariant for a fixed overhead camera. An adult body always occupies more 2D frame area than a toddler regardless of pose or facing direction. Uses bounding box of 11 upper-body keypoints (nose, shoulders, elbows, wrists, hips, knees).

Validated Thresholds

Subject / Pose bbox Area Classification
Adult standing / moving 0.087 - 0.189 adult=True
Adult sitting, side-on 0.063 - 0.074 adult=True
Adult crouching low 0.021 - 0.055 adult=True (~90% accuracy)
Toddler (20 months) standing < 0.020 child=True (100%)
Toddler crawling < 0.020 child=True (100%)

Validated threshold: bbox_area_threshold: 0.02 Validated 2026-03-10 with real child subject in frame.

At threshold=0.02, the adult/child gap is well-separated: even a deeply crouching adult correctly classifies at ~90% accuracy, while toddler detection is 100%. The 15-frame persistence filter further prevents any misclassified frame from triggering an alert.


Synthetic Data Strategy

RAPID_DESCENT (child falling) is an edge case that cannot be safely collected in real life. The correct approach is geometric augmentation of real landmark sequences -- scale, flip, and jitter 20 real captures to generate 1000+ synthetic variants for threshold validation and future classifier training.

def augment_landmarks(landmarks_array, n_augments=50):
    """landmarks_array: shape (n_frames, 33, 3) -- x, y, visibility"""
    augmented = []
    for _ in range(n_augments):
        aug = landmarks_array.copy()
        aug[:, :, :2] *= np.random.uniform(0.8, 1.2)     # scale
        if np.random.random() > 0.5:
            aug[:, :, 0] = 1.0 - aug[:, :, 0]            # horizontal flip
        aug[:, :, :2] += np.random.normal(0, 0.01,
                         aug[:, :, :2].shape)              # jitter
        aug[:, :, :2] = np.clip(aug[:, :, :2], 0, 1)
        augmented.append(aug)
    return augmented

Project Structure

camonitor/
+-- scripts/
|   +-- alerts.py                    # Main monitoring loop
|   +-- benchmark.py                 # Baseline benchmark harness
|   +-- ptq_benchmark.py             # PTQ complexity sweep benchmark
|   +-- room_scan.py                 # YOLOv8 zone detection
|   +-- calibrate_adult.py           # Adult profile calibration
|   +-- email_notifier.py            # Gmail SMTP notification
|   +-- config.yaml                  # Master config
|   +-- adult_profile.yaml           # Saved calibration ratios
|   +-- zones_config.yaml            # Room zone definitions
|   +-- email_config_template.yaml   # Credential template (copy to email_config.yaml)
+-- logs/
|   +-- benchmark_baseline.csv
|   +-- benchmark_inference.csv
|   +-- ptq_benchmark.csv
|   +-- alerts_log.csv
+-- data/
|   +-- room_scan_result.jpg
+-- docs/
|   +-- screenshots/
|       +-- clear_adult_squat.png    # Adult crouching -- CLEAR, alerts suppressed
|       +-- alert_zone_entry.jpg     # Toddler ZONE_ENTRY bed alert
|       +-- alert_climbing.jpg       # Toddler CLIMBING HIGH alert
|       +-- room_scan_result.jpg     # YOLOv8 zone mapping output
+-- README.md
+-- requirements.txt
+-- setup_github.sh

Setup

Tested on Raspberry Pi 4 (4GB), Raspberry Pi OS 64-bit, Python 3.11/3.12.

conda create -n camonitor python=3.11
conda activate camonitor
pip install -r requirements.txt

Camera Setup Note

Set MJPG format before resolution -- required for USB 2.0 bandwidth:

cap.set(cv2.CAP_PROP_FOURCC, cv2.VideoWriter_fourcc(*'MJPG'))  # FIRST
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)

If reversed, camera falls back to YUYV -- 3x USB bandwidth, causing frame drops.


Running

Step 1 -- Room Scan (once per camera position)

python scripts/room_scan.py

YOLOv8s maps furniture and kitchen anchors into zone boxes. Saves zones_config.yaml.

Model selection -- YOLOv8s vs alternatives:

Model Size mAP (COCO) Pi scan time Decision
YOLOv8n (nano) 3.2 MB 37.3 ~2s Rejected -- low-confidence detections too noisy
YOLOv8s (small) 11.2 MB 44.9 ~5s Production
YOLOv8m (medium) 25.9 MB 50.2 ~20s Unnecessary -- 5pt mAP gain not worth startup cost

YOLOv8s is the right choice because this is a one-time scan at startup, not live inference -- startup latency is not a constraint. The s model gives meaningfully better detection quality than n at low confidence, which matters here.

Confidence threshold: conf=0.07 -- intentionally very low. A missed kitchen anchor silently disables an entire alert zone with no warning. Over-detection is the safer failure mode -- spurious extra boxes are caught during the visual review of the saved scan image. The s model at 0.07 produces trustworthy low-confidence predictions; n at 0.07 would produce excessive noise.

For kitchen zone: ensure fridge, water tap/basin, and oven are visible in frame during scan. At least 2 anchors are required to form the kitchen zone polygon. Re-run whenever camera position changes.

Step 2 -- Adult Calibration (once)

python scripts/calibrate_adult.py

Stand in frame for ~8 seconds. Saves adult_profile.yaml.

Step 3 -- Configure

# config.yaml
mode: CHILD_MONITOR        # ADULT_TEST during development
debug: false               # true prints bbox area values for threshold tuning
adult_filter:
  bbox_area_threshold: 0.02  # validated 2026-03-10

Step 4 -- Monitor

# Foreground
python scripts/alerts.py

# Background (24/7)
nohup python scripts/alerts.py >> logs/monitor.log 2>&1 &

Step 5 -- PTQ Benchmark (optional)

python scripts/ptq_benchmark.py

Stand in frame during each complexity level run (~5 min total).


Email Notifications

cp scripts/email_config_template.yaml scripts/email_config.yaml
chmod 600 scripts/email_config.yaml
nano scripts/email_config.yaml        # add Gmail App Password

Gmail -> Security -> 2-Step Verification -> App Passwords -> generate for CaMonitor.


Configuration Reference

mode: CHILD_MONITOR     # ADULT_TEST or CHILD_MONITOR
debug: false            # true = print bbox area values for threshold tuning

adult_filter:
  enabled: true
  bbox_area_threshold: 0.02   # validated -- 100% child detection, ~90% adult at crouch

mediapipe:
  min_detection_confidence: 0.55
  min_tracking_confidence: 0.55
  model_complexity: 0         # 8.63 FPS, 100% detection, 0.997 confidence

alerts:
  zone_entry_frames: 15       # ~2 seconds persistence before alert fires
  inversion_buffer: 0.05
  descent_threshold: 0.04

email:
  enabled: true
  cooldown_sec: 300

performance:
  idle_sleep_sec: 0.3

Known Limitations

Adult presence = child safe: The system suppresses all alerts when an adult body is detected in frame. This is intentional -- an adult present means the child is supervised. The monitor is designed for unsupervised periods only.

BlazePose on toddlers: Model trained predominantly on adults. Lower keypoint confidence expected for a 20-month toddler's body proportions. min_detection_confidence: 0.55 set lower than default 0.60 to compensate. In practice, toddler skeleton tracked reliably at this threshold as shown in the ZONE_ENTRY and CLIMBING frames above.

RAPID_DESCENT camera height dependency: Requires camera at 1.8-2m height for sufficient hip Y range per frame at 8 FPS. At monitor-level mounting, hip vertical range is insufficient to exceed the per-frame delta threshold. Known hardware deployment constraint, not a logic bug.


Roadmap

  • LivenessChecker -- reject static false detections (stuffed animals, sheet creases)
  • RAPID_DESCENT validation at current camera height
  • Kitchen zone implementation -- fridge/basin/oven anchor detection in room_scan.py
  • Spatial floor calibration -- user-defined floor plane for posture alerts
  • Dynamic thresholding -- confidence gate based on frame brightness
  • Threading pipeline -- producer/consumer for capture, inference, alert logic
  • PWA dashboard -- Flask server + mobile live view
  • Synthetic data augmentation script for RAPID_DESCENT training data

Hardware

  • Raspberry Pi 4 (4GB RAM)
  • Logitech StreamCam USB webcam (720p/MJPG, top-down ~2m height)
  • Camera angled 15-20 degrees downward for full room coverage
  • Fan case recommended for sustained all-day operation

License

MIT -- personal project, provided as-is. Not a substitute for direct supervision or commercial child safety products.

About

Edge AI nursery safety guard — MediaPipe BlazePose on Raspberry Pi 4 v1. Real-time child monitoring, no cloud.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors