Releases: enixz/AirControl
AirControl v1.3.0
This release improves near-range hand and fingertip stability, lowers input latency, makes crop zoom more reliable, and strengthens the configuration, packaging, and automated testing workflow.
Highlights
1. Hand and Fingertip Precision
- Sub-pixel landmark coordinates are preserved: small movements are no longer lost to premature rounding, so slow near-range motion is smoother.
- One Euro adaptive filter corrected: speed estimation now uses raw observations, so filter lag is no longer fed back into the speed estimate, giving a better balance between stability and responsiveness.
- Weighted landmarks for the mouse pointer: the pointer blends the middle fingertip with adjacent joints instead of relying on a single fingertip point, reducing cursor jitter caused by fingertip detection noise.
- New precision regression tests: they cover sub-pixel coordinates, filter response, and weighted pointer calculations to keep later changes from reintroducing jitter.
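To make the filter correction and the weighted pointer concrete, here is a minimal Python sketch. It is not AirControl's actual code: the class and function names, the parameter defaults (`min_cutoff`, `beta`, `d_cutoff`), and any weights passed in are assumptions chosen for illustration. The point it shows is that the speed estimate comes from consecutive raw samples rather than from the previous filtered output, and that coordinates stay as floats throughout.

```python
import math


def smoothing_factor(cutoff_hz: float, dt: float) -> float:
    """Exponential smoothing factor of a first-order low-pass filter."""
    tau = 1.0 / (2.0 * math.pi * cutoff_hz)
    return 1.0 / (1.0 + tau / dt)


class OneEuroFilter:
    """One Euro filter whose speed estimate is taken from raw observations."""

    def __init__(self, min_cutoff: float = 1.0, beta: float = 0.05, d_cutoff: float = 1.0):
        # Assumed defaults; real values would be tuned per application.
        self.min_cutoff = min_cutoff
        self.beta = beta
        self.d_cutoff = d_cutoff
        self.prev_raw = None
        self.prev_filtered = None
        self.prev_speed = 0.0

    def update(self, x: float, dt: float) -> float:
        if self.prev_raw is None:
            # First sample: nothing to smooth against yet.
            self.prev_raw = x
            self.prev_filtered = x
            return x
        # Speed from consecutive raw samples, not from the lagging filtered output.
        raw_speed = (x - self.prev_raw) / dt
        a_d = smoothing_factor(self.d_cutoff, dt)
        speed = a_d * raw_speed + (1.0 - a_d) * self.prev_speed
        # Faster motion raises the cutoff, trading smoothing for responsiveness.
        cutoff = self.min_cutoff + self.beta * abs(speed)
        a = smoothing_factor(cutoff, dt)
        filtered = a * x + (1.0 - a) * self.prev_filtered
        self.prev_raw, self.prev_filtered, self.prev_speed = x, filtered, speed
        return filtered


def weighted_pointer(points, weights):
    """Blend several (x, y) landmarks into one sub-pixel pointer position."""
    total = sum(weights)
    x = sum(w * p[0] for p, w in zip(points, weights)) / total
    y = sum(w * p[1] for p, w in zip(points, weights)) / total
    return x, y
```

One filter instance per coordinate would be fed the float output of `weighted_pointer`, with rounding to integer pixels deferred until the cursor is actually moved.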
2. Crop Zoom and Super Resolution
- Faster hand reacquisition: after consecutive misses inside the crop region, detection periodically falls back to the full frame, recovering hands that have left the local crop sooner.
- Stable super-resolution switching: a stabilization mechanism prevents the super-resolution model from toggling on and off rapidly near size thresholds.
- Real-ESRGAN loaded on demand: the model is loaded only when the user explicitly selects a mode that needs it, avoiding heavy CPU, GPU, and memory use by default.
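The notes do not say how switching is stabilized. One common approach is hysteresis, where the threshold for turning on differs from the threshold for turning off. The sketch below assumes that approach, and also shows a periodic full-frame retry; `HysteresisSwitch`, `should_retry_full_frame`, and every threshold value are hypothetical names and numbers, not taken from the project.

```python
class HysteresisSwitch:
    """Turns on below `on_below` and off above `off_above` (on_below < off_above)."""

    def __init__(self, on_below: float, off_above: float):
        if on_below >= off_above:
            raise ValueError("on_below must be smaller than off_above")
        self.on_below = on_below
        self.off_above = off_above
        self.enabled = False

    def update(self, hand_size_px: float) -> bool:
        # Sizes between the two thresholds leave the current state unchanged.
        if not self.enabled and hand_size_px < self.on_below:
            self.enabled = True
        elif self.enabled and hand_size_px > self.off_above:
            self.enabled = False
        return self.enabled


def should_retry_full_frame(consecutive_misses: int, retry_every: int = 5) -> bool:
    """Return True on every `retry_every`-th consecutive crop miss."""
    return consecutive_misses > 0 and consecutive_misses % retry_every == 0
```

With thresholds of 80 and 100 pixels, a hand whose measured size hovers between 85 and 95 never flips the switch, which is the behavior wanted near a size boundary.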
3. Drawing and Global Gestures
- Hardened pen-down and pen-up state machine: stroke continuity improves during side views, thumb occlusion, and brief recognition fluctuations, with fewer accidental pen-downs and broken strokes.
- More reliable global mode switching: the recognizer for the 🤟 hold-to-switch gesture keeps observing real frames, so it is not falsely triggered by drawing state or temporary predicted frames.
- Trace recording disabled by default: `draw_record_trace` defaults to `false`, so normal use does not keep generating large diagnostic trace files.
4. Latency and Runtime Performance
- Latest-frame processing: the capture thread keeps only the newest camera frame and drops stale frames while inference is busy, reducing accumulated input lag.
- UI backpressure: pending UI updates are capped so frame signals do not pile up in the main-thread event queue.
- Streamlined image conversion: direct BGR-to-Qt image conversion avoids unnecessary memory copies and color conversions.
- More deterministic shutdown: the camera, inference thread, recorder, and voice service are released in order, making hangs on exit and leftover background processes less likely.
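The latest-frame policy can be pictured as a single-slot mailbox between the capture thread and the inference thread. The following is a minimal sketch under that assumption; `LatestFrameSlot` and its `dropped` counter are illustrative names, not the project's API.

```python
import threading


class LatestFrameSlot:
    """Single-slot mailbox: a new frame overwrites any frame not yet consumed."""

    def __init__(self):
        self._cond = threading.Condition()
        self._frame = None
        self.dropped = 0

    def put(self, frame) -> None:
        """Called by the capture thread for every camera frame."""
        with self._cond:
            if self._frame is not None:
                self.dropped += 1  # the older frame is stale, discard it
            self._frame = frame
            self._cond.notify()

    def get(self, timeout=None):
        """Return the newest frame, or None if nothing arrives before the timeout."""
        with self._cond:
            if not self._cond.wait_for(lambda: self._frame is not None, timeout):
                return None
            frame, self._frame = self._frame, None
            return frame
```

Because the slot never holds more than one frame, the consumer's worst-case staleness is one capture interval, however slow inference becomes.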
5. Configuration and Reliability
- Atomic configuration saves: the configuration is written to a temporary file first and then swapped in for the real file, reducing the risk of corruption from power loss or abnormal exit.
- Configuration schema validation: values with the wrong type or out of range fall back to safe defaults and log a warning instead of preventing startup.
- Unified runtime resource paths: source checkouts and PyInstaller builds use the same resource lookup logic.
- Safer tracker and voice-command reloads: cleanup and reconstruction during runtime configuration changes are more complete.
- Improved raw-frame diagnostics: more useful input information can be recorded when recognition problems occur, making them easier to reproduce and debug.
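As an illustration of the atomic-save and validation items, here is a minimal sketch using only the Python standard library. The function names, the `SCHEMA` table, and its types, defaults, and ranges are assumptions made for the example; only the general write-then-replace pattern and the fall-back-with-warning behavior come from the notes above.

```python
import json
import logging
import os
import tempfile

logger = logging.getLogger("config")

# Hypothetical schema: key -> (accepted types, default, minimum, maximum).
SCHEMA = {
    "floating_window_scale": ((int, float), 1.0, 0.5, 3.0),
    "camera_min_fps": ((int,), 20, 1, 120),
    "debug_overlay": ((bool,), False, None, None),
}


def save_config_atomic(path: str, config: dict) -> None:
    """Write JSON to a temp file in the same directory, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(config, tmp, ensure_ascii=False, indent=2)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_path, path)  # atomic when both paths are on one filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def normalize_config(loaded: dict) -> dict:
    """Return a copy in which missing or invalid known fields hold their defaults."""
    result = dict(loaded)  # unknown keys pass through untouched
    for key, (types, default, lo, hi) in SCHEMA.items():
        if key not in loaded:
            result[key] = default
            continue
        value = loaded[key]
        valid = isinstance(value, types)
        if valid and bool not in types and isinstance(value, bool):
            valid = False  # bool is a subclass of int, so reject it for numeric fields
        if valid and lo is not None and not (lo <= value <= hi):
            valid = False
        if not valid:
            logger.warning("Invalid value %r for %s, using default %r", value, key, default)
            result[key] = default
    return result
```

Creating the temporary file in the target directory matters: `os.replace` is only atomic when source and destination are on the same filesystem.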
6. Packaging, Testing, and CI
- The Windows PyInstaller configuration was reworked and now includes a `--self-test` mode that needs no camera or microphone.
- GitHub Actions CI was added and runs checks automatically on Python 3.10 and 3.12.
- Release verification: 134 tests and 132 subtests passed, and Ruff linting, Python compilation checks, PyInstaller packaging, and the executable self-test all succeeded.
Upgrade Notes
- Test with the default configuration first after upgrading; existing configuration is validated and invalid fields fall back automatically.
- `Real-ESRGAN_x2plus.onnx` is approximately 67 MB and is not tracked in version control because of GitHub source repository size limits. Place it in the project root when using a Real-ESRGAN mode; normal modes and other features do not depend on it.
- See the README for full installation, model setup, and packaging instructions.
Full Changelog
v1.2.0 — Far-distance tracking, 🤟 mode-switch, robust draw-mode pen
✨ New features
- Far-distance hand tracking: crop-zoom super-resolution, One Euro smoothing, and active-region mapping reliably track hands at greater distances.
- 🤟 New mode-switch gesture: hold a single-hand "I Love You" sign to switch modes (replaces the previous mode-switch gesture).
- Side-view-robust pen control in drawing mode: stable writing even when the camera sees the hand from the side or up close.
- Crash logging: native / thread / Qt crashes are now captured to `crash.log`.
- Raw-video record + offline replay harness: record raw footage and replay it offline for objective detection A/B testing.
🐛 Fixes
- Zoom viewport now locks to the top hand and snaps on engage — stops the mid-distance "bellows" oscillation.
- Strokes no longer break from unreliable thumb / geometry signals at side view.
- Completed the central voting pen state machine in draw mode.
- Fixed two-finger hover false positives that caused severe stroke breaking.
🔧 Internal
- Decoupled the floating-window UI from service orchestration via a new `AirControlOrchestrator`.
- Cleaned up redundant / broken install & startup scripts; added tests (dictation, replay, swipe distance, zoom SR).
Full changelog: v1.1.0...v1.2.0
v1.1.0 — Offline Dictation, Multi-Camera & Hardened UX
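✨ New features
🎙️ Voice dictation (drawing mode)
- Integrated offline SenseVoice-Small (open-sourced by Alibaba DAMO Academy, loaded via sherpa-onnx), with automatic recognition of five languages: Chinese, English, Japanese, Korean, and Cantonese
- In drawing mode, say "开始板书" (start writing) to start recording and "结束板书" (stop writing) to stop; the recognized text is written to the canvas automatically
- To guard against false KWS triggers of "结束板书", SenseVoice re-checks the most recent 3 seconds of audio as a second confirmation, so long sentences are not cut off midway
- The black background of the live subtitles now grows and shrinks with the text instead of covering the whole screen from the start
📷 Camera selection and auto-detection
- A camera dropdown was added at the top of ⚙ Settings; available indices are enumerated asynchronously in the background (skipping the one currently in use to avoid contention), and saving switches cameras at runtime
- At startup, resolutions are probed from high to low (1080p → 720p → 540p → 480p → 360p), and the first one the driver accepts at ≥20 fps is selected
- After a disconnect, the camera reconnects automatically with exponential backoff, without flooding the log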
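The resolution probe and the reconnect backoff can be sketched roughly as follows. This is an illustration only: the pixel sizes attached to each tier, the `probe` callable, and the backoff parameters (`base`, `factor`, `cap`) are assumptions, since the notes name only the tiers and the 20 fps floor.

```python
from typing import Callable, Iterator, Optional, Tuple

# Assumed pixel sizes for the 1080p-to-360p ladder; the notes only name the tiers.
CANDIDATES = [(1920, 1080), (1280, 720), (960, 540), (640, 480), (640, 360)]


def pick_resolution(
    probe: Callable[[int, int], Optional[float]], min_fps: float = 20.0
) -> Optional[Tuple[int, int]]:
    """Return the first (width, height) the probe accepts at >= min_fps."""
    for width, height in CANDIDATES:
        fps = probe(width, height)  # None means the driver rejected the mode
        if fps is not None and fps >= min_fps:
            return width, height
    return None


def backoff_delays(base: float = 0.5, factor: float = 2.0, cap: float = 30.0) -> Iterator[float]:
    """Yield reconnect delays that grow exponentially up to `cap` seconds."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)
```

In a real application the `probe` callable would wrap the camera API and measure the delivered frame rate; keeping it as a parameter makes the selection logic testable without hardware.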
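💬 Voice command panel (cheatsheet)
- A floating panel lists all voice commands available in the current mode
- Click the 🎤 label to toggle it
- Borderless, with a red ✕ in the top-right corner, draggable, and follows the main application window
🐛 Key fixes
- Mouse-mode deadlock: a gesture LEFTDOWN landing on a dialog title bar started the Win32 modal drag loop, which blocked the main thread, stalled `handle()`, and left the left-button hold state stuck. A new background watchdog thread sends LEFTUP to break the deadlock when `handle()` has been silent for more than 0.7 s while the left button is held
- Cursor layer hidden behind dialogs: `SetWindowPos(HWND_TOPMOST, NOACTIVATE)` pins the cursor layer to the top of the topmost group every 50 ms, and `WS_EX_TRANSPARENT` keeps clicks passing through
- `VoiceDictationService.dictate()` is now serialized with a lock, so partial ASR and verification ASR never use the same recognizer concurrently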
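The decision logic of the deadlock watchdog can be sketched as a heartbeat check. This is a hypothetical reconstruction, not the project's code: `HoldWatchdog`, `heartbeat`, and `should_release` are invented names, and the platform-specific step of actually sending LEFTUP is left out.

```python
import time


class HoldWatchdog:
    """Decides when a stuck left-button hold should be force-released."""

    def __init__(self, timeout_s: float = 0.7, clock=time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._last_beat = clock()
        self.left_held = False

    def heartbeat(self) -> None:
        """Called from handle() each time it runs."""
        self._last_beat = self._clock()

    def should_release(self) -> bool:
        """Polled from the background watchdog thread."""
        silent_for = self._clock() - self._last_beat
        return bool(self.left_held and silent_for > self.timeout_s)
```

A daemon thread would poll `should_release()` at a short interval and, when it returns True, send the button-up event and clear `left_held`.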
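⚙️ Configuration changes
- Added: `dictation_`, `camera_width/height/force_mjpeg/min_fps`, `hand__confidence`, `floating_window_scale`, `debug_overlay`
- Removed: `tencent_asr_*` (replaced by offline SenseVoice)
📦 Model download
The SenseVoice model is 239 MB, which exceeds GitHub's 100 MB single-file limit, so it must be downloaded manually:
- Go to the sherpa-onnx releases (tag: `asr-models`)
- Download `sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17.tar.bz2`
- Extract it into `models/sense-voice/`, which must contain `model.int8.onnx` and `tokens.txt`
If the model is not present, dictation is disabled automatically; KWS keywords are unaffected.
🧪 Upgrade impact
- Configuration files are backward compatible: an old `config.json` is automatically filled in with the new fields at startup
- If you previously used Tencent Cloud ASR, the `tencent_asr_*` fields can be deleted from `config.json` (leaving them in does not affect operation)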
Full diff: `v1.0.0...v1.1.0` (28 files changed, +3046 / -305)
v1.0.0 - Initial Release
🎯 AirControl v1.0.0 - Initial Release
✨ Features
🖐️ Gesture Control
- Presentation Mode: Wave to change slides, start/stop presentation
- Mouse Mode: Air mouse, pinch to click, scissor to scroll
- Drawing Mode: Finger writing, fist to clear, shape correction
🎤 Voice Assistant
- Offline KWS: Sherpa-ONNX keyword detection (privacy-first)
- Online ASR: Tencent Cloud for free text input
- Mode-aware: Auto-switch voice commands by mode
🧠 Smart Recognition
- Kalman + EMA dual smoothing for 21 hand landmarks
- Smart shape correction (lines, triangles, rectangles, ellipses)
- Edge acceleration for screen-edge cursor speed boost
⚡ High Performance
- Async MediaPipe inference in background thread
- 30 FPS real-time hand tracking
- < 50ms response latency
📦 Installation
git clone https://github.com/enixz/AirControl.git
cd AirControl
pip install -r requirements.txt
python -m app.main_ui
📄 License
Apache License 2.0
Full Changelog: https://github.com/enixz/AirControl/commits/v1.0.0