feat: add conversation gaze behavior by xuruiray · Pull Request #72 · m5stack/StackChan

xuruiray · 2026-05-17T09:32:36Z

Summary

Adds a lightweight ConversationGazeModifier for AI Agent conversation states.
Keeps StackChan's eyes and head focused while the display status is LISTENING or SPEAKING, then removes the modifier when the display returns to any other status.
Feeds the gaze target from the onboard camera using lightweight sampled skin/motion cues, so the gaze can follow a visible person or object instead of staying fixed at center.
Keeps the normalized target API so a future ESP-WHO face detector or sound-localization module can replace this heuristic source later.
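
For reviewers, the normalized target idea can be sketched roughly as below. This is not the code in this PR: `GazeTarget`, `GazeSmoother`, the [-1, 1] range, and the smoothing factor are all hypothetical names and choices made for illustration. The point is only that any source (camera heuristic, face detector, sound localization) can push a target through the same small interface.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical normalized gaze target: x and y in [-1, 1], (0, 0) = straight ahead.
struct GazeTarget {
    float x;
    float y;
};

// Hypothetical smoother: eases the current gaze toward the latest target so
// noisy detections do not make the eyes or head jitter.
class GazeSmoother {
public:
    explicit GazeSmoother(float alpha) : alpha_(alpha) {
        assert(alpha > 0.0f && alpha <= 1.0f);  // 1.0 means "jump immediately"
    }

    // Called by whichever source is active; out-of-range input is clamped.
    void setTarget(GazeTarget t) {
        target_.x = clampUnit(t.x);
        target_.y = clampUnit(t.y);
    }

    // Called once per animation tick; moves a fraction alpha toward the target.
    GazeTarget update() {
        current_.x += alpha_ * (target_.x - current_.x);
        current_.y += alpha_ * (target_.y - current_.y);
        return current_;
    }

private:
    static float clampUnit(float v) { return std::max(-1.0f, std::min(1.0f, v)); }

    float alpha_;
    GazeTarget target_{0.0f, 0.0f};
    GazeTarget current_{0.0f, 0.0f};
};
```

Because the source only ever calls `setTarget`, swapping the heuristic for a real detector later would not need to touch the pose side.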

Notes

This PR does not add ESP-WHO or a full face-detection model yet; the current implementation is a low-cost visual target heuristic that works within the existing firmware dependencies.
Tracking quality depends on lighting, camera view, and whether the target has enough skin-color or motion contrast.
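
To make the lighting and contrast caveat concrete, here is a minimal sketch of a sampled skin/motion heuristic of this general kind. It is an illustration under assumptions, not the implementation in this PR: the function names, the RGB thresholds, the motion threshold, and the assumption that the frame is RGB565 in host byte order are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

struct TargetEstimate {
    bool found;
    float x;  // [-1, 1], negative = left of center
    float y;  // [-1, 1], negative = above center
};

// Expand an RGB565 pixel (assumed host byte order) to approximate 8-bit channels.
inline void unpackRgb565(uint16_t p, int& r, int& g, int& b) {
    r = ((p >> 11) & 0x1F) << 3;
    g = ((p >> 5) & 0x3F) << 2;
    b = (p & 0x1F) << 3;
}

// Illustrative skin rule; the thresholds are assumptions, not values from the PR.
inline bool looksLikeSkin(int r, int g, int b) {
    return r > 95 && g > 40 && b > 20 && r > g && r > b && (r - g) > 15;
}

// Visits every `step`-th pixel, counts samples that look like skin or that changed
// since `prev` (may be nullptr), and returns the centroid of those samples.
TargetEstimate estimateTarget(const uint16_t* frame, const uint16_t* prev,
                              int width, int height, int step, int minHits) {
    assert(frame != nullptr && width > 1 && height > 1 && step > 0);
    long sumX = 0, sumY = 0;
    int hits = 0;
    for (int y = 0; y < height; y += step) {
        for (int x = 0; x < width; x += step) {
            const size_t i = static_cast<size_t>(y) * width + x;
            int r, g, b;
            unpackRgb565(frame[i], r, g, b);
            bool hit = looksLikeSkin(r, g, b);
            if (!hit && prev != nullptr) {
                int pr, pg, pb;
                unpackRgb565(prev[i], pr, pg, pb);
                hit = std::abs(g - pg) > 48;  // green channel as a cheap brightness proxy
            }
            if (hit) { sumX += x; sumY += y; ++hits; }
        }
    }
    if (hits < minHits) return {false, 0.0f, 0.0f};  // too little evidence
    const float cx = static_cast<float>(sumX) / hits;
    const float cy = static_cast<float>(sumY) / hits;
    return {true, 2.0f * cx / (width - 1) - 1.0f, 2.0f * cy / (height - 1) - 1.0f};
}
```

A rule like this fails in exactly the ways listed above: warm-colored backgrounds produce false skin hits, dim scenes push real skin below the thresholds, and a still target contributes no motion samples.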

Test plan

Installed ESP-IDF v5.5.4 side-by-side at /Users/xurui/esp/esp-idf-v5.5.4.
Ran python3 ./fetch_repos.py.
Ran . /Users/xurui/esp/esp-idf-v5.5.4/export.sh >/tmp/idf554_export.log 2>&1 && idf.py build.
Flashed an ESP32-S3 StackChan device on /dev/cu.usbmodem21301 with idf.py -p /dev/cu.usbmodem21301 flash.
Captured boot logs with idf.py -p /dev/cu.usbmodem21301 monitor; boot, PSRAM, LCD, camera, touch, RTC, IMU, servo, LVGL, Launcher, and AI Agent initialization completed without a panic, backtrace, or reboot loop during the observed startup window.

Summary:
- Add ConversationGazeModifier to hold a focused eye/head pose while AI Agent is listening or speaking.
- Attach and remove it from Xiaozhi display status transitions.
- Expose normalized target input for future face or audio tracking.

Rationale:
- Gives StackChan a stable conversation gaze behavior without adding heavy vision dependencies.

Tests:
- . /Users/xurui/esp/esp-idf-v5.5.4/export.sh >/tmp/idf554_export.log 2>&1 && idf.py build

Co-authored-by: Codex <codex@openai.com>
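
The attach/remove behavior described in this commit can be pictured with the sketch below. It is only an assumption about the shape of the logic: the `DisplayStatus` enum, the `GazeAttachment` class, and the empty `ConversationGazeModifier` stand-in are hypothetical and do not mirror the actual Xiaozhi display code.

```cpp
#include <cassert>
#include <memory>

// Hypothetical subset of display statuses.
enum class DisplayStatus { Idle, Connecting, Listening, Speaking };

// Stand-in for the real modifier; only its lifetime matters in this sketch.
struct ConversationGazeModifier {};

class GazeAttachment {
public:
    // Call on every display status change. The modifier exists exactly while
    // the status is Listening or Speaking.
    void onStatusChanged(DisplayStatus status) {
        const bool inConversation =
            status == DisplayStatus::Listening || status == DisplayStatus::Speaking;
        if (inConversation && !modifier_) {
            modifier_.reset(new ConversationGazeModifier());  // attach once
        } else if (!inConversation && modifier_) {
            modifier_.reset();  // remove when leaving the conversation
        }
        assert(attached() == inConversation);
    }

    bool attached() const { return modifier_ != nullptr; }

private:
    std::unique_ptr<ConversationGazeModifier> modifier_;
};
```

Keeping the same modifier instance across a Listening to Speaking transition avoids resetting the gaze mid-conversation.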

Summary:
- Add lightweight visual target detection to StackChanCamera using sampled skin and motion cues.
- Feed detected camera targets into ConversationGazeModifier during listening and speaking.
- Guard camera frame access with a mutex while sharing capture paths.

Tests:
- . /Users/xurui/esp/esp-idf-v5.5.4/export.sh >/tmp/idf554_export.log 2>&1 && idf.py build
- . /Users/xurui/esp/esp-idf-v5.5.4/export.sh >/tmp/idf554_export.log 2>&1 && idf.py -p /dev/cu.usbmodem21301 flash
- . /Users/xurui/esp/esp-idf-v5.5.4/export.sh >/tmp/idf554_export.log 2>&1 && idf.py -p /dev/cu.usbmodem21301 monitor

Co-authored-by: Codex <codex@openai.com>
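
The mutex guard mentioned in this commit could look something like the sketch below. This is an assumption for illustration only: `SharedFrame` is a hypothetical wrapper, it copies the frame rather than holding a driver buffer, and it uses `std::mutex` where the firmware may well use a FreeRTOS semaphore instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical wrapper: one frame shared by the capture path and the gaze path.
class SharedFrame {
public:
    // Capture path: replace the stored frame while holding the lock.
    void store(const uint16_t* pixels, int width, int height) {
        assert(pixels != nullptr && width > 0 && height > 0);
        std::lock_guard<std::mutex> lock(mutex_);
        pixels_.assign(pixels, pixels + static_cast<size_t>(width) * height);
        width_ = width;
        height_ = height;
    }

    // Reader path: run fn while the lock is held so the frame cannot change
    // underneath it. Returns false if no frame has been stored yet.
    template <typename Fn>
    bool withFrame(Fn fn) const {
        std::lock_guard<std::mutex> lock(mutex_);
        if (pixels_.empty()) return false;
        fn(pixels_.data(), width_, height_);
        return true;
    }

private:
    mutable std::mutex mutex_;
    std::vector<uint16_t> pixels_;
    int width_ = 0;
    int height_ = 0;
};
```

The callback should stay short, since the capture path is blocked for as long as the reader holds the lock.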

xuruiray · 2026-05-17T12:05:11Z

wip

xuruiray and others added 3 commits May 17, 2026 17:31

fix(firmware): make conversation gaze track visibly

bab72eb

xuruiray mentioned this pull request May 17, 2026

[Feature Request] Visual gaze following during conversation #70

Open

xuruiray wants to merge 3 commits into m5stack:main from xuruiray:codex/conversation-gaze
