G5: Baby Monitoring System

Aly Elaswad edited this page May 23, 2026 · 30 revisions
Name | GitHub
Aly Elaswad | alyelaswad
Mazin Bersy | mazinbersy
Omar Ganna | omarganna

GitHub repo: https://github.com/mazinbersy/Baby-Monitoring-System

1. The Proposal

Elevator Pitch

Caregivers cannot stay beside a baby at all times, and existing monitors are either too simple or too expensive. They can tell you that something is wrong, but not why or what triggered the concern.

Smart Baby Monitoring System is a self-contained embedded device that monitors a baby across three dimensions simultaneously: sound, motion, and environment. It detects infant crying using FFT-based audio analysis, monitors ambient temperature, and detects prolonged inactivity. When distress is detected, the system first attempts to soothe the baby by playing a lullaby automatically. If crying persists, it escalates to a caregiver push notification with a live video stream.

All of this runs on a single ESP32-CAM, streaming wirelessly over WiFi using HTTP POST requests, with no physical connection required from the caregiver.

Project Objectives & Scope

Minimum Viable Product (MVP)

  • Detect infant crying using FFT-based audio analysis on the MAX9814 microphone
  • Play lullaby automatically via DFPlayer Mini within 30 seconds of cry detection
  • Monitor ambient temperature via LM35 and alert if outside safe range (< 18°C or > 30°C)
  • Detect prolonged inactivity via HC-SR501 PIR and alert if no motion > 2 minutes combined with silence
  • Deliver mobile push notifications via WiFi using HTTP POST on any alert
  • Activate event-triggered live video stream on any alert

Stretch Goals

  • Remote camera toggle - turn camera on/off from web app
  • [❌] Two-way audio - speak through web app, baby hears through speaker

2. System Architecture

2.1 High-Level Block Diagram

System Block Diagram

Subsystem Breakdown

The system is built around a single ESP32-CAM which serves as the central embedded controller, handling all sensing, processing, communication, and actuation.

On the input side, the ESP32-CAM interfaces with three sensors. The MAX9814 microphone outputs an analog signal to the ADC, sampled continuously at 8 kHz for FFT-based cry detection. The HC-SR501 PIR sensor connects to a GPIO pin and raises a flag on any detected body movement. The LM35 temperature sensor outputs an analog voltage read by the ADC every 5 seconds.

The processing subsystem runs entirely on the ESP32-CAM using FreeRTOS tasks. A sensor fusion event manager combines the outputs of all three sensor pipelines and classifies events. The cry detection pipeline runs FFT on a sliding 512-sample audio window, checks for sustained energy in the 300 Hz to 3 kHz band, and triggers a cry event if the threshold is exceeded for more than 10 seconds. The motion pipeline monitors the PIR GPIO flag and raises an inactivity alert if no trigger is registered for more than 2 minutes combined with silence. The temperature pipeline compares the LM35 reading against safe thresholds and raises an alert on violation.
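The duration gating used by the cry pipeline can be sketched as a small timer class. This is a minimal illustration, not the project's firmware: the `CryGate` name, the millisecond timestamps, and the threshold value are assumptions. Only the 10-second hold time comes from the text.

```cpp
#include <cassert>
#include <cstdint>

// Duration gate: reports a cry only after the band energy has stayed above
// the threshold for longer than holdMs without interruption.
// The threshold is a placeholder that would need empirical tuning.
class CryGate {
public:
    CryGate(float threshold, uint32_t holdMs)
        : threshold_(threshold), holdMs_(holdMs) {
        assert(holdMs > 0);
    }

    // Feed one band-energy reading with its timestamp in milliseconds.
    // Returns true while the sustained-cry condition holds.
    bool update(float bandEnergy, uint32_t nowMs) {
        if (bandEnergy <= threshold_) {
            above_ = false;      // any dip below the threshold resets the timer
            return false;
        }
        if (!above_) {
            above_ = true;
            sinceMs_ = nowMs;    // first reading above the threshold
        }
        // Unsigned subtraction stays correct across a timer wrap-around.
        return (nowMs - sinceMs_) > holdMs_;
    }

private:
    float threshold_;
    uint32_t holdMs_;
    bool above_ = false;
    uint32_t sinceMs_ = 0;
};
```

On the device, `nowMs` would typically come from `millis()`, and the gate would be constructed once, for example `CryGate gate(tunedThreshold, 10000);` where `tunedThreshold` is a hypothetical calibrated value.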

On the output side, the ESP32-CAM drives the DFPlayer Mini over UART to initiate lullaby playback on cry detection. A buzzer connected via GPIO activates immediately on any alert. The ESP32-CAM hosts an MJPEG HTTP video stream activated on any alert, accessible from a caregiver's phone browser on the same WiFi network. Push notifications are delivered via HTTP POST over WiFi.


3. Hardware Design

Component Selection

Component | Role | Interface
ESP32-CAM (AI-Thinker) | Central MCU, video streaming, WiFi | ADC, GPIO, UART, WiFi
MAX9814 | Microphone with auto gain control | Analog out to ADC
LM35 | Temperature sensing | Analog out to ADC
HC-SR501 | PIR motion detection | Digital GPIO
DFPlayer Mini (MP3-TF-16P) | MP3 lullaby playback | UART
4Ω 3W Speaker | Audio output for lullabies | Direct to DFPlayer Mini

Schematics & Wiring

STM32 Nucleo (Main Controller)

Microphone — MAX9814 (Analog Audio, ADC1_IN11)

STM32 Pin | Function | Connect To
PA6 | ADC1_IN11 | Mic analog OUT
3.3V | Power | Mic VCC
GND | Ground | Mic GND

Temperature Sensor — LM35DZ (ADC1_IN5, injected channel)

STM32 Pin | Function | Connect To
PA0 | ADC1_IN5 | LM35DZ Vout
3.3V | Power | LM35DZ +Vs
GND | Ground | LM35DZ GND
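
The LM35 reading can be converted to degrees Celsius from the raw ADC count using the sensor's 10 mV/°C scale factor. The sketch below assumes a 12-bit ADC and a 3.3 V reference, neither of which is stated on this page, and the function and constant names are illustrative. The 18–30°C safe range is taken from the project objectives.

```cpp
#include <cassert>
#include <cstdint>

// Assumptions (not stated in the text): 12-bit ADC, 3.3 V reference.
constexpr float kAdcRefVolts = 3.3f;
constexpr uint32_t kAdcMaxCount = 4095;        // 2^12 - 1
constexpr float kLm35VoltsPerDegC = 0.010f;    // LM35 scale factor: 10 mV per degree C

// Safe range from the MVP objectives.
constexpr float kTempMinC = 18.0f;
constexpr float kTempMaxC = 30.0f;

// Converts a raw ADC count into degrees Celsius.
float lm35CountsToCelsius(uint32_t counts) {
    assert(counts <= kAdcMaxCount);
    const float volts = static_cast<float>(counts) * kAdcRefVolts
                        / static_cast<float>(kAdcMaxCount);
    return volts / kLm35VoltsPerDegC;
}

// True when the reading violates the safe range and an alert should be raised.
bool temperatureOutOfRange(float celsius) {
    return celsius < kTempMinC || celsius > kTempMaxC;
}
```

Under these assumptions, a count of 310 corresponds to roughly 25°C, which is inside the safe range.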

Audio Playback — DFPlayer Mini (USART1 @ 9600 baud)

STM32 Pin | Function | Connect To
PA9 | USART1_TX | DFPlayer RX
PA10 | USART1_RX | DFPlayer TX
5V | Power | DFPlayer VCC
GND | Ground | DFPlayer GND

DFPlayer SPK1/SPK2 → Speaker
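
The references list the DFRobotDFPlayerMini library, which normally builds the serial frames internally. For readers driving the module directly over USART1, the 10-byte command frame from the DFPlayer Mini datasheet can be sketched as follows. The function and constant names are hypothetical and not taken from the project's code.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Command 0x03 selects a track by number.
constexpr uint8_t kCmdPlayTrack = 0x03;

// Builds one 10-byte DFPlayer Mini frame:
// start, version, length, command, feedback, param high, param low,
// checksum high, checksum low, end.
std::array<uint8_t, 10> dfplayerFrame(uint8_t command, uint16_t param,
                                      bool requestAck = false) {
    std::array<uint8_t, 10> f = {
        0x7E,                                            // start byte
        0xFF,                                            // version
        0x06,                                            // length of bytes 1..6
        command,
        static_cast<uint8_t>(requestAck ? 0x01 : 0x00),  // feedback flag
        static_cast<uint8_t>(param >> 8),
        static_cast<uint8_t>(param & 0xFF),
        0x00, 0x00,                                      // checksum, filled in below
        0xEF                                             // end byte
    };
    // The checksum is the two's complement of the sum of bytes 1..6.
    uint16_t sum = 0;
    for (int i = 1; i <= 6; ++i) sum += f[i];
    const uint16_t checksum = static_cast<uint16_t>(0 - sum);
    // Sanity check: sum plus checksum wraps to zero in 16 bits.
    assert(static_cast<uint16_t>(sum + checksum) == 0);
    f[7] = static_cast<uint8_t>(checksum >> 8);
    f[8] = static_cast<uint8_t>(checksum & 0xFF);
    return f;
}
```

`dfplayerFrame(kCmdPlayTrack, 1)` yields `7E FF 06 03 00 00 01 FE F7 EF`, which would be written to USART1 at 9600 baud to start track 1.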

Motion Sensor — HC-SR501 PIR (GPIO, PA4)

STM32 Pin | Function | Connect To
PA4 | GPIO_INPUT (PULLDOWN) | Sensor OUT
5V | Power | HC-SR501 VCC
GND | Ground | Sensor GND

ESP32-CAM Link — USART2 (@ 115200 baud)

STM32 Pin | Function | Connect To
PA2 | USART2_TX | ESP32-CAM GPIO 13 (RX)
PA3 | USART2_RX | ESP32-CAM GPIO 14 (TX)
GND | Common ground | ESP32-CAM GND

ESP32-CAM

UART to STM32 Nucleo (Serial2 @ 115200 baud)

ESP32-CAM Pin | Function | Connect To
GPIO 13 | Serial2 RX | Nucleo PA2 (USART2_TX)
GPIO 14 | Serial2 TX | Nucleo PA3 (USART2_RX)
GND | Common ground | Nucleo GND

Camera (built-in, no external wiring)

GPIO | Signal
32 | PWDN
0 | XCLK
26 / 27 | SIOD / SIOC (I2C)
35, 34, 39, 36, 21, 19, 18, 5 | Y9–Y2 (data lines)
25 / 23 / 22 | VSYNC / HREF / PCLK

WiFi

No pins are required. The ESP32-CAM connects to a WiFi access point and communicates with the HTTPS server (Railway).


Summary Diagram

STM32 Nucleo
  PA0   ────   LM35DZ Vout           (temperature)
  PA4   ────   HC-SR501 OUT          (PIR motion)
  PA6   ────   MAX9814 OUT           (audio)
  PA9   ──→    DFPlayer RX           (USART1 TX)
  PA10  ──←    DFPlayer TX           (USART1 RX)
  PA2   ──→    ESP32-CAM GPIO 13     (USART2 TX)
  PA3   ──←    ESP32-CAM GPIO 14     (USART2 RX)
 
ESP32-CAM
  GPIO 13  ──←  Nucleo PA2           (receives CRY / NOMOV / DOOR / AWAKE / MSLP)
  GPIO 14  ──→  Nucleo PA3           (sends SLEEP_ON / SLEEP_OFF)
  GND      ────  Nucleo GND          (common ground)
  [Camera] ──→  JPEG frames → HTTPS server (Railway)
  [WiFi]   ──→  alerts / mode / status → HTTPS server
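
The token exchange on the USART2 link can be sketched as a line reader plus a token lookup on the ESP32-CAM side. The page does not specify the framing, so this sketch assumes newline-terminated ASCII tokens. The type and function names are hypothetical, and no meaning beyond the token names shown in the diagram is implied.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Event tokens listed in the summary diagram (Nucleo -> ESP32-CAM).
enum class LinkEvent { Cry, Nomov, Door, Awake, Mslp, Unknown };

// Maps one received token to an event; anything else is Unknown.
LinkEvent parseLinkEvent(const std::string& token) {
    if (token == "CRY")   return LinkEvent::Cry;
    if (token == "NOMOV") return LinkEvent::Nomov;
    if (token == "DOOR")  return LinkEvent::Door;
    if (token == "AWAKE") return LinkEvent::Awake;
    if (token == "MSLP")  return LinkEvent::Mslp;
    return LinkEvent::Unknown;
}

// Accumulates UART bytes into lines (assumed framing: '\n', optional '\r').
class LinkLineReader {
public:
    explicit LinkLineReader(std::size_t maxLen) : maxLen_(maxLen) {
        assert(maxLen > 0);
    }

    // Feed one received byte. Returns true when a complete,
    // non-empty line is available through line().
    bool feed(char c) {
        if (c == '\r') return false;                       // tolerate CRLF
        if (c != '\n') {
            if (buf_.size() < maxLen_) buf_.push_back(c);  // overlong input is truncated
            return false;
        }
        if (buf_.empty()) return false;                    // ignore blank lines
        line_.swap(buf_);
        buf_.clear();
        return true;
    }

    const std::string& line() const { return line_; }

private:
    std::size_t maxLen_;
    std::string buf_;
    std::string line_;
};

// Command sent in the other direction (ESP32-CAM -> Nucleo).
inline const char* sleepCommand(bool on) {
    return on ? "SLEEP_ON\n" : "SLEEP_OFF\n";
}
```

In an Arduino sketch, each byte read from `Serial2` would be passed to `feed()`, and a `true` return would be followed by `parseLinkEvent(reader.line())`.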

Bill of Materials (BOM)

(To be updated with costs and datasheet links)

Power Budget

(To be updated)


4. Software Implementation

4.1 Functional Requirements

  • Cry detection via FFT sampling at 8 kHz; alert triggered if cry-band energy (300 Hz – 3 kHz) sustained > 10 seconds
  • Lullaby playback initiated automatically via DFPlayer Mini within 1 second of cry detection
  • Caregiver alert sent if crying persists > 30 seconds despite active playback
  • Temperature sampled every 5 seconds via LM35 ADC; alert triggered if temp > 30°C or < 18°C
  • PIR motion sampled continuously; alert triggered if no motion detected for > 2 minutes combined with no cry signal
  • Event-triggered video stream activated within 5 seconds of any alert
  • Mobile push notification delivered via WiFi using HTTP POST within 5 seconds of any alert

4.2 Software Architecture

The firmware runs on the ESP32-CAM using the Arduino framework via PlatformIO. The architecture uses FreeRTOS tasks to separate time-critical audio sampling from sensor fusion logic and network communication.

  • Task 1 (Core 1): Audio sampling at 8 kHz and FFT processing. Runs on a dedicated core to prevent gaps in the audio buffer caused by WiFi or UART activity.
  • Task 2 (Core 0): Sensor fusion event manager. Polls PIR flag, reads LM35 every 5 seconds, evaluates alert conditions, and dispatches actions to DFPlayer Mini, buzzer, camera, and WiFi.
  • Camera: MJPEG stream started on alert trigger, runs as part of the ESP32-CAM camera server task.

4.3 Flowcharts & State Machines

Software Diagram

Sensor fusion logic:

Condition | Action
Cry detected (FFT > 10s) | Play lullaby via DFPlayer Mini
Crying persists > 30s despite playback | Send mobile alert + start video stream + buzzer
No PIR motion > 2 min + no cry | Send mobile alert + start video stream + buzzer
Temperature out of range | Send mobile alert + buzzer
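
The table above can be expressed as a pure decision function, which keeps the fusion logic easy to test off-target. This is a sketch under assumptions: the struct layout and all names are illustrative, and the inputs are assumed to be pre-computed by the sensor pipelines. It follows the table literally, so a temperature alert does not start the video stream.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kCryEscalateMs = 30u * 1000u;      // crying persists > 30 s
constexpr uint32_t kInactivityMs = 2u * 60u * 1000u;  // no motion > 2 min

// Snapshot of the three sensor pipelines (field names are hypothetical).
struct SensorState {
    bool cryDetected = false;             // output of the cry detection pipeline
    uint32_t cryingDuringPlaybackMs = 0;  // how long crying has continued during playback
    uint32_t noMotionMs = 0;              // time since the last PIR trigger
    bool tempOutOfRange = false;
};

struct Actions {
    bool playLullaby = false;
    bool sendAlert = false;
    bool startVideo = false;
    bool buzzer = false;
};

// Maps the current sensor state to the actions listed in the table.
Actions evaluateFusion(const SensorState& s) {
    Actions a;
    if (s.cryDetected) {
        a.playLullaby = true;
        if (s.cryingDuringPlaybackMs > kCryEscalateMs) {
            a.sendAlert = a.startVideo = a.buzzer = true;
        }
    } else if (s.noMotionMs > kInactivityMs) {
        // Inactivity only counts when there is no cry signal.
        a.sendAlert = a.startVideo = a.buzzer = true;
    }
    if (s.tempOutOfRange) {
        a.sendAlert = a.buzzer = true;
    }
    // Invariant: video is only ever started together with an alert.
    assert(!a.startVideo || a.sendAlert);
    return a;
}
```

Keeping the function free of hardware calls means the event manager task can call it on each cycle and then dispatch the resulting actions to the DFPlayer Mini, buzzer, camera, and WiFi.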

4.4 Key Algorithms

FFT Cry Detection

The MAX9814 analog output is sampled at 8 kHz. Every 512 samples (64 ms of audio), a Fast Fourier Transform is applied to convert the time-domain signal into the frequency domain. At this window size each FFT bin spans 8000 / 512 = 15.625 Hz. The firmware sums energy across the bins corresponding to the 300 Hz to 3 kHz range. If the summed energy exceeds a tuned threshold and remains above it for more than 10 seconds continuously, a cry event is raised. Duration gating filters out transient sounds. Frequency specificity filters out broadband background noise that does not match the infant cry profile.
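
The band-energy step can be sketched with a small radix-2 FFT. This is an illustrative host-side version, not the project's firmware, which may use a different FFT library. The function names are assumptions, no window function is applied, and only the sample rate, window length, and band limits come from the text.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

constexpr float kSampleRateHz = 8000.0f;
constexpr std::size_t kWindow = 512;
constexpr float kBandLowHz = 300.0f;
constexpr float kBandHighHz = 3000.0f;

// In-place iterative radix-2 FFT. The length must be a power of two.
void fft(std::vector<std::complex<float>>& x) {
    const std::size_t n = x.size();
    assert(n > 0 && (n & (n - 1)) == 0);
    // Bit-reversal permutation.
    for (std::size_t i = 1, j = 0; i < n; ++i) {
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(x[i], x[j]);
    }
    // Butterfly stages.
    const float pi = std::acos(-1.0f);
    for (std::size_t len = 2; len <= n; len <<= 1) {
        const float ang = -2.0f * pi / static_cast<float>(len);
        const std::complex<float> wlen(std::cos(ang), std::sin(ang));
        for (std::size_t i = 0; i < n; i += len) {
            std::complex<float> w(1.0f, 0.0f);
            for (std::size_t k = 0; k < len / 2; ++k) {
                const std::complex<float> u = x[i + k];
                const std::complex<float> v = x[i + k + len / 2] * w;
                x[i + k] = u + v;
                x[i + k + len / 2] = u - v;
                w *= wlen;
            }
        }
    }
}

// Sums |X[k]|^2 over the bins that fall inside the 300 Hz to 3 kHz band.
float bandEnergy(const std::vector<float>& samples) {
    assert(samples.size() == kWindow);
    std::vector<std::complex<float>> x(samples.begin(), samples.end());
    fft(x);
    const float binHz = kSampleRateHz / static_cast<float>(kWindow);  // 15.625 Hz
    const std::size_t lo = static_cast<std::size_t>(std::ceil(kBandLowHz / binHz));    // bin 20
    const std::size_t hi = static_cast<std::size_t>(std::floor(kBandHighHz / binHz));  // bin 192
    float energy = 0.0f;
    for (std::size_t k = lo; k <= hi; ++k) energy += std::norm(x[k]);
    return energy;
}
```

The returned value would then be compared against the tuned threshold on every window, with the 10-second duration check applied on top of that comparison.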

4.5 Development Environment

(To be updated)


5. Testing, Validation & Debugging

5.1 Unit Testing

(To be updated)

5.2 Integration Testing

(To be updated)

5.3 Challenges & Solutions

Challenge | Detail | Mitigation
FFT tuning | Thresholds require empirical calibration in a real environment | Start with documented values and tune during testing
Cry detection false positives | Sustained background sounds in 300 Hz – 3 kHz may trigger alerts | Duration threshold and amplitude floor filter most noise
PIR sensitivity | May trigger on ambient heat sources like sunlight or heaters | Tune onboard sensitivity potentiometer
ESP32-CAM wake latency | Camera initialization after trigger may exceed the 5-second target | Optimize boot sequence or use partial sleep mode
WiFi reliability | Network congestion may delay push notifications | Keep HTTP POST payload minimal
Dual UART contention | ESP32-CAM communicates with DFPlayer Mini over UART alongside camera | Use hardware UART for DFPlayer, manage timing carefully

6. Results & Demonstration

6.1 Final Prototype

(To be updated with photos)

6.2 Video Demonstration

(To be updated with link)

6.3 Performance Metrics

(To be updated)


7. Project Management

7.1 Division of Labor

(To be updated)

7.2 Timeline

Date | Milestone | Status | Date of Completion
Apr 14, 2026 | Team formation finalized and submitted | ✅ Completed | Apr 14, 2026
Apr 15, 2026 | Proposal presentation | ✅ Completed | Apr 15, 2026
Apr 20, 2026 | Wiki deployment with proposal and architecture | ✅ Completed | Apr 20, 2026
Apr 22–25, 2026 | Phase 1: Sensor validation - MAX9814 ADC, LM35 ADC, PIR GPIO | ⏳ Pending | -
Apr 26–29, 2026 | Phase 2: Core processing - FFT pipeline, DFPlayer playback, ESP32-CAM stream | ⏳ Pending | -
Apr 29, 2026 | Milestone 3: Progress demo - at least one working subsystem | ⏳ Pending | -
May 1–5, 2026 | Phase 3: Full integration - sensor fusion, WiFi alerts, local buzzer | ⏳ Pending | -
May 6, 2026 | Checkpoint B: Integration update on wiki | ⏳ Pending | -
May 8–12, 2026 | Phase 4: Stretch goals - remote camera toggle, two-way audio | ⏳ Pending | -
May 13, 2026 | Final demo and presentation | ⏳ Pending | -

8. Appendices & References

8.1 Source Code Repository

https://github.com/mazinbersy/Baby-Monitoring-System

8.2 References

  • ESP32-CAM AI-Thinker datasheet
  • MAX9814 datasheet - Maxim Integrated
  • DFRobotDFPlayerMini Arduino library
  • HC-SR501 PIR sensor datasheet
  • LM35 datasheet - Texas Instruments
