Releases: enixz/AirControl

AirControl v1.3.0

@enixz enixz released this 15 Jun 05:43

This release focuses on near-range hand and fingertip stability, lower input latency, more reliable crop zoom, and a stronger configuration, packaging, and automated testing workflow.

Highlights

1. Hand and Fingertip Precision

  • Sub-pixel landmark coordinates are preserved: small movements are no longer lost to premature rounding, so slow near-range motion is more continuous.
  • The One Euro adaptive filter was corrected: speed estimation now uses raw observations, which stops filter lag from being fed back into the filter and gives a better balance between stability and responsiveness.
  • Pointer control uses weighted landmarks: instead of relying on a single fingertip point, it blends the middle fingertip with adjacent joints, reducing cursor jitter caused by fingertip detection fluctuations.
  • New precision regression tests cover sub-pixel coordinates, filter response, and weighted pointer calculations, guarding against later changes that would reintroduce jitter.
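
The One Euro fix described above can be illustrated with a minimal sketch. This is not AirControl's code: the class name, parameter names, and default values are assumptions chosen for illustration. The point it shows is that the speed estimate is derived from the previous raw observation rather than from the previous filtered value, so smoothing lag does not feed back into the adaptive cutoff.

```python
import math


class OneEuroFilter:
    """Scalar One Euro filter (hypothetical sketch); use one per coordinate."""

    def __init__(self, min_cutoff=1.0, beta=0.01, d_cutoff=1.0):
        self.min_cutoff = min_cutoff  # Hz, baseline smoothing at low speed
        self.beta = beta              # how fast the cutoff rises with speed
        self.d_cutoff = d_cutoff      # Hz, smoothing of the speed estimate
        self._t_prev = None
        self._x_raw_prev = 0.0        # last raw observation
        self._x_hat = 0.0             # last filtered position
        self._dx_hat = 0.0            # last filtered speed

    @staticmethod
    def _alpha(cutoff, dt):
        # Smoothing factor of a first-order low-pass filter.
        tau = 1.0 / (2.0 * math.pi * cutoff)
        return 1.0 / (1.0 + tau / dt)

    def __call__(self, x, t):
        if self._t_prev is None:
            # First sample: nothing to smooth against yet.
            self._t_prev, self._x_raw_prev, self._x_hat = t, x, x
            return x
        dt = t - self._t_prev
        if dt <= 0.0:
            return self._x_hat  # ignore repeated or out-of-order timestamps
        # Speed from raw observations, not from the filtered output.
        dx = (x - self._x_raw_prev) / dt
        a_d = self._alpha(self.d_cutoff, dt)
        self._dx_hat = a_d * dx + (1.0 - a_d) * self._dx_hat
        # Faster motion raises the cutoff, which reduces lag.
        cutoff = self.min_cutoff + self.beta * abs(self._dx_hat)
        a = self._alpha(cutoff, dt)
        self._x_hat = a * x + (1.0 - a) * self._x_hat
        self._t_prev, self._x_raw_prev = t, x
        return self._x_hat
```

Because the filter accepts floats throughout, it also works directly on sub-pixel coordinates; rounding would only be applied, if at all, when the final cursor position is emitted.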

2. Crop Zoom and Super Resolution

  • Faster hand reacquisition: after consecutive misses inside the crop region, detection periodically returns to the full frame, recovering hands that have left the local crop sooner.
  • Stabilized super-resolution switching: a stabilization mechanism prevents the super-resolution model from being toggled on and off rapidly near size thresholds.
  • Real-ESRGAN loads on demand: the model is loaded only when the user explicitly selects the corresponding mode, avoiding heavy CPU, GPU, and memory use by default.
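
One common way to stabilize switching near a threshold is hysteresis, sketched below together with a periodic full-frame retry. The release notes do not say which mechanism AirControl uses, so the class, the function, the pixel thresholds, and the retry period are all hypothetical.

```python
class SuperResolutionSwitch:
    """Hysteresis switch (assumed mechanism): two thresholds instead of one."""

    def __init__(self, enable_below=96, disable_above=128):
        if enable_below >= disable_above:
            raise ValueError("enable_below must be smaller than disable_above")
        self.enable_below = enable_below    # px, hypothetical value
        self.disable_above = disable_above  # px, hypothetical value
        self.active = False

    def update(self, hand_size_px):
        # Sizes between the two thresholds keep the current state,
        # so small fluctuations cannot toggle the model on every frame.
        if not self.active and hand_size_px < self.enable_below:
            self.active = True
        elif self.active and hand_size_px > self.disable_above:
            self.active = False
        return self.active


def should_retry_full_frame(consecutive_misses, period=5):
    """Retry full-frame detection on every `period`-th consecutive crop miss."""
    return consecutive_misses > 0 and consecutive_misses % period == 0
```

With these example thresholds, a hand whose apparent size hovers around 100 px keeps whatever state it already had instead of flipping back and forth.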

3. Drawing and Global Gestures

  • Hardened pen-down and pen-up state machine: stroke continuity is improved during side views, thumb occlusion, and brief recognition fluctuations, with fewer false pen-downs and broken strokes.
  • More reliable global mode switching: the recognizer for the 🤟 hold-to-switch gesture keeps observing real frames, so it is no longer falsely triggered by drawing state or temporary predicted frames.
  • Trace recording is off by default: draw_record_trace defaults to false, so normal use no longer keeps generating large diagnostic trace files.

4. Latency and Runtime Performance

  • Latest-frame-first processing: the capture thread keeps only the newest camera frame and drops stale frames while inference is busy, reducing accumulated input lag.
  • UI backpressure: pending interface updates are limited so frame signals do not pile up in the main-thread event queue.
  • Streamlined image conversion: a direct BGR-to-Qt image conversion reduces unnecessary memory copies and color conversions.
  • More deterministic shutdown: the camera, inference thread, recorder, and voice service are released in order, lowering the chance of a hang on exit or a leftover background process.
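
The latest-frame policy can be sketched as a single-slot buffer shared between the capture thread and the inference thread. The class name and the `dropped` counter are assumptions for illustration, not AirControl's actual types.

```python
import threading


class LatestFrameSlot:
    """Holds at most one frame; a newer frame overwrites an unread one."""

    def __init__(self):
        self._cond = threading.Condition()
        self._frame = None
        self._has_frame = False
        self.dropped = 0  # number of frames overwritten before being read

    def put(self, frame):
        # Called by the capture thread for every camera frame.
        with self._cond:
            if self._has_frame:
                self.dropped += 1  # the consumer was busy; discard the stale frame
            self._frame = frame
            self._has_frame = True
            self._cond.notify()

    def get(self, timeout=None):
        # Called by the inference thread; returns None if no frame arrives in time.
        with self._cond:
            if not self._cond.wait_for(lambda: self._has_frame, timeout=timeout):
                return None
            frame, self._frame = self._frame, None
            self._has_frame = False
            return frame
```

Unlike a FIFO queue, the slot never grows, so the worst-case added latency is one frame interval regardless of how slow inference becomes.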

5. Configuration and Reliability

  • Atomic configuration saves: the configuration is written to a temporary file and then swapped into place, reducing the risk of corruption from power loss or an abnormal exit.
  • Schema-based validation: values with the wrong type or outside the allowed range fall back to safe defaults and log a warning instead of failing at startup.
  • Unified runtime resource paths: source checkouts and PyInstaller builds use the same resource lookup logic.
  • Safer tracker and voice-command reloads: cleanup and reconstruction during runtime configuration changes are more complete.
  • Improved raw-frame diagnostics: more useful input information can be recorded when recognition problems occur, making them easier to reproduce and debug.
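
A minimal sketch of the two configuration mechanisms above, assuming a JSON config file. The schema format and the function names are hypothetical; `draw_record_trace` and its `false` default come from these notes, while the `camera_min_fps` key name and its range are assumptions used only to show a bounded numeric field.

```python
import json
import os
import tempfile

# Hypothetical schema: key -> (type, default, optional (min, max) range).
SCHEMA = {
    "draw_record_trace": (bool, False, None),
    "camera_min_fps": (int, 20, (1, 240)),  # assumed key name and range
}


def validate(raw):
    """Return (clean_config, warnings); invalid fields fall back to defaults."""
    clean, warnings = {}, []
    for key, (kind, default, bounds) in SCHEMA.items():
        value = raw.get(key, default)
        # bool is a subclass of int, so reject it explicitly for int fields.
        ok = isinstance(value, kind) and not (kind is int and isinstance(value, bool))
        if ok and bounds is not None:
            ok = bounds[0] <= value <= bounds[1]
        if not ok:
            warnings.append(f"{key}: invalid value {value!r}, using default {default!r}")
            value = default
        clean[key] = value
    return clean, warnings


def save_atomic(path, config):
    """Write to a temp file in the same directory, then replace the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(config, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())  # make sure the bytes reach the disk
        os.replace(tmp_path, path)     # atomic swap on the same filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

The temporary file is created next to the target on purpose: `os.replace` is only atomic when source and destination are on the same filesystem.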

6. Packaging, Testing, and CI

  • The Windows PyInstaller configuration was reworked and now includes a --self-test mode that needs neither a camera nor a microphone.
  • GitHub Actions CI now runs the checks automatically on Python 3.10 and 3.12.
  • Release verification: 134 tests and 132 subtests passed, and Ruff linting, Python compilation, PyInstaller packaging, and the executable self-test all succeeded.

Upgrade Notes

  • Test with the default settings first after upgrading; existing configuration is validated and invalid fields fall back automatically.
  • Real-ESRGAN_x2plus.onnx is approximately 67 MB and is not tracked in the source repository because of repository size limits. Place it in the project root when using a Real-ESRGAN mode; normal modes and other features do not require it.
  • See the README for full installation, model setup, and packaging instructions.

Full Changelog

v1.2.0...v1.3.0

v1.2.0 — Far-distance tracking, 🤟 mode-switch, robust draw-mode pen

@enixz enixz released this 14 Jun 03:45

✨ New features

  • Far-distance hand tracking — crop-zoom super-resolution, One Euro smoothing, and active-region mapping reliably track hands at greater distances.
  • 🤟 New mode-switch gesture — hold a single-hand "I Love You" sign to switch modes (replaces the previous mode-switch gesture).
  • Side-view-robust pen control in drawing mode — stable writing even when the camera sees the hand from the side or up close.
  • Crash logging — native / thread / Qt crashes are now captured to crash.log.
  • Raw-video record + offline replay harness — record raw footage and replay it offline for objective detection A/B testing.

🐛 Fixes

  • Zoom viewport now locks to the top hand and snaps on engage — stops the mid-distance "bellows" oscillation.
  • Strokes no longer break from unreliable thumb / geometry signals at side view.
  • Completed the central voting pen state machine in draw mode.
  • Fixed two-finger hover false positives that caused severe stroke breaking.

🔧 Internal

  • Decoupled the floating-window UI from service orchestration via a new AirControlOrchestrator.
  • Cleaned up redundant / broken install & startup scripts; added tests (dictation, replay, swipe distance, zoom SR).

Full changelog: v1.1.0...v1.2.0

v1.1.0 — Offline Dictation, Multi-Camera & Hardened UX

@enixz enixz released this 24 May 16:05

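✨ New features

🎙️ Voice dictation (drawing mode)

  • Integrates offline SenseVoice-Small (open-sourced by Alibaba DAMO Academy, loaded via sherpa-onnx) with automatic recognition of five languages: Chinese, English, Japanese, Korean, and Cantonese
  • In drawing mode, say "开始板书" ("start writing") to begin recording and "结束板书" ("stop writing") to end it; the recognized text is written to the canvas automatically
  • When KWS falsely triggers on "结束板书", SenseVoice re-checks the last 3 seconds of audio as a second confirmation, so long sentences are not cut off midway
  • The black background of the live captions now resizes with the text and no longer covers the whole screen from the start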

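📷 Camera selection + auto-probing

  • A camera dropdown was added at the top of ⚙ Settings; available indices are enumerated asynchronously in the background (the camera currently in use is skipped to avoid contention), and saving switches cameras at runtime
  • At startup, resolutions are probed from high to low (1080p → 720p → 540p → 480p → 360p), and the first one the driver accepts at ≥20 fps is selected
  • After a disconnect, the camera reconnects automatically with exponential backoff and without flooding the log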
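
The probing order and the reconnect backoff can be sketched as follows. The `probe` callable stands in for whatever actually opens the camera and measures its frame rate; it, the function names, the frame widths, and the backoff constants are assumptions, since the notes only give the resolution heights and the 20 fps floor.

```python
# Heights come from the release notes; the widths are assumed.
RESOLUTIONS = [(1920, 1080), (1280, 720), (960, 540), (640, 480), (640, 360)]


def pick_resolution(probe, candidates=RESOLUTIONS, min_fps=20.0):
    """Return the first (width, height) the driver accepts at >= min_fps.

    `probe(width, height)` is a hypothetical callable returning the measured
    fps, or None when the driver rejects the mode.
    """
    for width, height in candidates:
        fps = probe(width, height)
        if fps is not None and fps >= min_fps:
            return width, height
    return None


def backoff_delays(base=0.5, factor=2.0, cap=30.0):
    """Yield reconnect delays in seconds: base, base*factor, ... up to cap."""
    delay = base
    while True:
        yield delay
        delay = min(delay * factor, cap)
```

Capping the delay keeps a long outage from pushing the retry interval out indefinitely, while the growth between attempts keeps the log quiet.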

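💬 Voice command panel (cheatsheet)

  • A floating panel lists all voice commands available in the current mode
  • Click the 🎤 label to toggle it on or off
  • Frameless, with a red ✕ in the top-right corner; draggable, and it follows the main window when that moves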

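🐛 Key fixes

  • Mouse-mode deadlock: a gesture LEFTDOWN landing on a dialog title bar triggered the Win32 modal drag loop, which blocked the main thread, stopped `handle()` from running, and left the left-button hold state stuck. A new background watchdog thread sends LEFTUP to break the deadlock when `handle()` has been silent for more than 0.7 s while the left button is held
  • Cursor layer hidden behind dialogs: `SetWindowPos(HWND_TOPMOST, NOACTIVATE)` pins the cursor layer to the top of the topmost group every 50 ms, and `WS_EX_TRANSPARENT` ensures clicks still pass through
  • VoiceDictationService.dictate() is now serialized with a lock, so partial ASR and verification ASR cannot use the same recognizer concurrently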
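
The watchdog idea behind the deadlock fix can be sketched independently of Win32. The class and method names here are hypothetical, and the `release` callback stands in for whatever sends the real LEFTUP event; only the 0.7 s threshold comes from the notes.

```python
import threading
import time


class HoldWatchdog:
    """Releases a stuck left-button hold when the handler stops reporting in."""

    def __init__(self, release, timeout=0.7, clock=time.monotonic):
        self._release = release  # callback that sends LEFTUP (injected)
        self._timeout = timeout
        self._clock = clock
        self._lock = threading.Lock()
        self._last_beat = clock()
        self._held = False

    def beat(self, left_held):
        # Called from the main thread on every handled frame.
        with self._lock:
            self._last_beat = self._clock()
            self._held = left_held

    def check(self):
        # Called from the watchdog thread; returns True if it released the button.
        with self._lock:
            stalled = self._held and (self._clock() - self._last_beat) > self._timeout
            if stalled:
                self._held = False  # release only once per stall
        if stalled:
            self._release()
        return stalled

    def run(self, stop_event, interval=0.1):
        # Thread target: poll until stop_event is set.
        while not stop_event.wait(interval):
            self.check()
```

The callback runs outside the lock so that a slow or blocking release cannot stall the main thread's next `beat()` call.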

⚙️ Configuration changes

  • Added: `dictation_`, `camera_width/height/force_mjpeg/min_fps`, `hand__confidence`, `floating_window_scale`, `debug_overlay`
  • Removed: `tencent_asr_*` (replaced by offline SenseVoice)

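📦 Model download

The SenseVoice model is 239 MB, which exceeds GitHub's 100 MB single-file limit, so it must be downloaded manually:

  1. Open the sherpa-onnx releases (tag: `asr-models`)
  2. Download `sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17.tar.bz2`
  3. Extract it into `models/sense-voice/`; the directory must contain `model.int8.onnx` and `tokens.txt`

If the model is not in place, dictation is disabled automatically; KWS keywords are not affected.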
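
A quick way to confirm the model was extracted to the right place is to check for the two required files. This helper is a hypothetical convenience script, not part of AirControl; only the directory and file names come from the steps above.

```python
from pathlib import Path

REQUIRED_FILES = ("model.int8.onnx", "tokens.txt")


def missing_model_files(model_dir="models/sense-voice"):
    """Return the required files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Dictation model incomplete, missing:", ", ".join(missing))
    else:
        print("SenseVoice model files found.")
```

Run it from the project root so the relative `models/sense-voice` path resolves correctly.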

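🧪 Upgrade impact

  • Configuration files are backward compatible: an old `config.json` is automatically filled in with the new fields at startup
  • If you previously used Tencent Cloud ASR, the `tencent_asr_*` fields can be deleted from `config.json` (leaving them in place does not affect operation)

Full diff: `v1.0.0...v1.1.0` (28 files changed, +3046 / -305)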

v1.0.0 - Initial Release

@enixz enixz released this 18 May 07:43

🎯 AirControl v1.0.0 - Initial Release

✨ Features

🖐️ Gesture Control

  • Presentation Mode: Wave to change slides, start/stop presentation
  • Mouse Mode: Air mouse, pinch to click, scissor to scroll
  • Drawing Mode: Finger writing, fist to clear, shape correction

🎤 Voice Assistant

  • Offline KWS: Sherpa-ONNX keyword detection (privacy-first)
  • Online ASR: Tencent Cloud for free text input
  • Mode-aware: Auto-switch voice commands by mode

🧠 Smart Recognition

  • Kalman + EMA dual smoothing for 21 hand landmarks
  • Smart shape correction (lines, triangles, rectangles, ellipses)
  • Edge acceleration for screen-edge cursor speed boost

⚡ High Performance

  • Async MediaPipe inference in background thread
  • 30 FPS real-time hand tracking
  • < 50ms response latency

📦 Installation

    git clone https://github.com/enixz/AirControl.git
    cd AirControl
    pip install -r requirements.txt
    python -m app.main_ui

📄 License

Apache License 2.0


Full Changelog: https://github.com/enixz/AirControl/commits/v1.0.0