
feat: add conversation gaze behavior #72

Open
xuruiray wants to merge 3 commits into m5stack:main from xuruiray:codex/conversation-gaze

xuruiray commented May 17, 2026

Summary

  • Adds a lightweight ConversationGazeModifier for AI Agent conversation states.
  • Keeps StackChan eyes and head focused while LISTENING or SPEAKING, then removes the modifier when the display returns to other statuses.
  • Feeds the gaze target from the onboard camera using lightweight sampled skin/motion cues, so it can actually follow a visible person/object instead of staying fixed at center.
  • Keeps the normalized target API so a future ESP-WHO face detector or sound-localization module can replace this heuristic source later.
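The lifecycle described above (attach while LISTENING or SPEAKING, remove otherwise, accept a normalized target) could look roughly like the sketch below. It is not the PR's code: `DisplayStatus`, `GazeModifierSketch`, `GazeControllerSketch`, the [-1, 1] target range, and the pitch sign convention are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <memory>

// All names here are hypothetical stand-ins, not the PR's actual classes.
enum class DisplayStatus { Idle, Connecting, Listening, Speaking };

struct HeadPose {
    float yaw_deg;
    float pitch_deg;
};

class GazeModifierSketch {
public:
    // Assumed convention: x and y lie in [-1, 1], with (0, 0) at the frame center.
    void set_target(float x, float y) {
        x_ = std::max(-1.0f, std::min(1.0f, x));
        y_ = std::max(-1.0f, std::min(1.0f, y));
    }

    // Scale the normalized target to servo angles. Image y grows downward,
    // so pitch is negated here (an assumed sign convention).
    HeadPose pose(float max_yaw_deg, float max_pitch_deg) const {
        return HeadPose{x_ * max_yaw_deg, -y_ * max_pitch_deg};
    }

private:
    float x_ = 0.0f;
    float y_ = 0.0f;
};

class GazeControllerSketch {
public:
    // Attach the modifier on entering a conversation state and drop it otherwise.
    // Moving between Listening and Speaking keeps the existing modifier and target.
    void on_status(DisplayStatus status) {
        const bool in_conversation = status == DisplayStatus::Listening ||
                                     status == DisplayStatus::Speaking;
        if (in_conversation && !modifier_) {
            modifier_ = std::make_unique<GazeModifierSketch>();
        } else if (!in_conversation) {
            modifier_.reset();
        }
    }

    bool attached() const { return modifier_ != nullptr; }
    GazeModifierSketch* modifier() { return modifier_.get(); }

private:
    std::unique_ptr<GazeModifierSketch> modifier_;
};
```

Because the modifier only sees a normalized target, any source (the camera heuristic today, a face detector or sound localizer later) can call `set_target` without the modifier knowing where the coordinates came from.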

Notes

  • This PR does not add ESP-WHO or a full face-detection model yet; the current implementation is a low-cost visual target heuristic that works within the existing firmware dependencies.
  • Tracking quality depends on lighting, camera view, and whether the target has enough skin-color or motion contrast.
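To show why lighting and contrast matter, here is a minimal sketch of a sampled skin/motion heuristic. The PR's actual detection rule is not shown on this page, so everything below is an assumption: the `Rgb` and `Centroid` types, the RGB threshold rule for skin-like colors, the frame-difference motion test, and the 2:1 weighting.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical types and thresholds, for illustration only.
struct Rgb { uint8_t r, g, b; };
struct Centroid { float x; float y; };  // pixel coordinates

// A simple RGB threshold rule for skin-like colors (assumed, not from the PR).
inline bool looks_like_skin(const Rgb& p) {
    return p.r > 95 && p.g > 40 && p.b > 20 &&
           p.r > p.g && p.r > p.b && (p.r - p.g) > 15;
}

// Motion cue: sum of absolute channel differences against the previous frame.
inline bool has_motion(const Rgb& cur, const Rgb& prev, int threshold) {
    const int diff = std::abs(cur.r - prev.r) + std::abs(cur.g - prev.g) +
                     std::abs(cur.b - prev.b);
    return diff > threshold;
}

// Visit every `step`-th pixel in both directions and compute a weighted
// centroid of pixels that pass either cue. Returns false when fewer than
// `min_hits` sampled pixels qualify, i.e. when there is no usable target.
inline bool find_target(const std::vector<Rgb>& cur, const std::vector<Rgb>& prev,
                        int width, int height, int step, int motion_threshold,
                        int min_hits, Centroid& out) {
    if (width <= 0 || height <= 0 || step <= 0) return false;
    const std::size_t expected = static_cast<std::size_t>(width) * height;
    if (cur.size() != expected || prev.size() != expected) return false;

    long sum_x = 0, sum_y = 0, total = 0;
    int hits = 0;
    for (int y = 0; y < height; y += step) {
        for (int x = 0; x < width; x += step) {
            const std::size_t i = static_cast<std::size_t>(y) * width + x;
            int weight = 0;
            if (looks_like_skin(cur[i])) weight += 2;  // skin counts double
            if (has_motion(cur[i], prev[i], motion_threshold)) weight += 1;
            if (weight == 0) continue;
            sum_x += static_cast<long>(weight) * x;
            sum_y += static_cast<long>(weight) * y;
            total += weight;
            ++hits;
        }
    }
    if (hits < min_hits || total == 0) return false;
    out = Centroid{static_cast<float>(sum_x) / total,
                   static_cast<float>(sum_y) / total};
    return true;
}
```

In dim or strongly tinted light the color thresholds stop matching, and a still subject produces no frame difference, so a heuristic of this kind loses its target in exactly the conditions the note lists.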

Test plan

  • Installed ESP-IDF v5.5.4 side-by-side at /Users/xurui/esp/esp-idf-v5.5.4.
  • Ran python3 ./fetch_repos.py.
  • Ran . /Users/xurui/esp/esp-idf-v5.5.4/export.sh >/tmp/idf554_export.log 2>&1 && idf.py build.
  • Flashed an ESP32-S3 StackChan device on /dev/cu.usbmodem21301 with idf.py -p /dev/cu.usbmodem21301 flash.
  • Captured boot logs with idf.py -p /dev/cu.usbmodem21301 monitor; boot, PSRAM, LCD, camera, touch, RTC, IMU, servo, LVGL, Launcher, and AI Agent initialization completed without panic/backtrace/reboot loop during the observed startup window.

xuruiray and others added 3 commits May 17, 2026 17:31
Summary:
- Add ConversationGazeModifier to hold a focused eye/head pose while AI Agent is listening or speaking.
- Attach and remove it from Xiaozhi display status transitions.
- Expose normalized target input for future face or audio tracking.

Rationale:
- Gives StackChan a stable conversation gaze behavior without adding heavy vision dependencies.

Tests:
- . /Users/xurui/esp/esp-idf-v5.5.4/export.sh >/tmp/idf554_export.log 2>&1 && idf.py build

Co-authored-by: Codex <codex@openai.com>

Summary:
- Add lightweight visual target detection to StackChanCamera using sampled skin and motion cues.
- Feed detected camera targets into ConversationGazeModifier during listening and speaking.
- Guard camera frame access with a mutex while sharing capture paths.

Tests:
- . /Users/xurui/esp/esp-idf-v5.5.4/export.sh >/tmp/idf554_export.log 2>&1 && idf.py build
- . /Users/xurui/esp/esp-idf-v5.5.4/export.sh >/tmp/idf554_export.log 2>&1 && idf.py -p /dev/cu.usbmodem21301 flash
- . /Users/xurui/esp/esp-idf-v5.5.4/export.sh >/tmp/idf554_export.log 2>&1 && idf.py -p /dev/cu.usbmodem21301 monitor

Co-authored-by: Codex <codex@openai.com>
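The second commit mentions guarding camera frame access with a mutex while capture paths are shared. A minimal sketch of that pattern follows, using `std::mutex` so it also runs on a host machine. The class name `SharedFrameSketch` and its methods are hypothetical; the firmware may well use a FreeRTOS semaphore and the camera driver's own frame buffers instead.

```cpp
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical wrapper, not the PR's StackChanCamera implementation.
class SharedFrameSketch {
public:
    // Capture path: swap in a newly captured frame while holding the lock.
    void publish(std::vector<uint8_t> pixels) {
        std::lock_guard<std::mutex> lock(mutex_);
        pixels_ = std::move(pixels);
        ++sequence_;
    }

    // Consumer path: run `fn` on the current frame while holding the lock,
    // so the buffer cannot be replaced in the middle of a read.
    template <typename Fn>
    void with_frame(Fn&& fn) const {
        std::lock_guard<std::mutex> lock(mutex_);
        fn(pixels_, sequence_);
    }

private:
    mutable std::mutex mutex_;
    std::vector<uint8_t> pixels_;
    uint32_t sequence_ = 0;  // incremented per published frame
};
```

The callback passed to `with_frame` should stay short and must not call `publish`, since `std::mutex` is not recursive and that would deadlock. The sequence number lets a consumer such as the gaze detector skip frames it has already processed.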
@xuruiray (Author)

wip
