AirControl v1.3.0 Latest

enixz released this 15 Jun 05:43

v1.3.0

b39dddd

This release focuses on near-range hand and fingertip stability, lower input latency, and more reliable crop zoom, and it strengthens the configuration, packaging, and automated testing workflow.

Highlights

1. Hand and Fingertip Precision

Sub-pixel landmark coordinates preserved: premature rounding no longer discards small movements, so slow near-range motion is more continuous.
One Euro adaptive filter corrected: speed estimation now uses raw observations, so filter lag is no longer fed back into the speed estimate, giving a better balance between stability and responsiveness.
Weighted landmarks for the mouse pointer: instead of relying on a single fingertip point, the pointer blends the middle fingertip with adjacent joints, reducing cursor jitter caused by fingertip detection noise.
New precision regression tests: they cover sub-pixel coordinates, filter response, and weighted pointer calculation, so later changes cannot silently reintroduce jitter.
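
To illustrate the One Euro correction described above, here is a minimal sketch of a One Euro filter whose speed estimate is taken from consecutive raw samples rather than from the already-filtered output. The class name, parameter names, and default values are assumptions chosen for illustration and are not taken from the AirControl source.

```python
import math


def smoothing_factor(dt: float, cutoff_hz: float) -> float:
    """Exponential smoothing factor of a first-order low-pass filter."""
    tau = 1.0 / (2.0 * math.pi * cutoff_hz)
    return 1.0 / (1.0 + tau / dt)


class OneEuroFilter:
    """Hypothetical 1-D One Euro filter; defaults are illustrative only."""

    def __init__(self, min_cutoff: float = 1.0, beta: float = 0.05, d_cutoff: float = 1.0):
        self.min_cutoff = min_cutoff
        self.beta = beta
        self.d_cutoff = d_cutoff
        self.prev_raw = None
        self.prev_filtered = None
        self.prev_speed = 0.0

    def __call__(self, x: float, dt: float) -> float:
        if self.prev_raw is None:
            # First sample: nothing to smooth against yet.
            self.prev_raw = x
            self.prev_filtered = x
            return x
        # Speed comes from consecutive RAW samples, so lag in the filtered
        # output is not fed back into the speed estimate.
        raw_speed = (x - self.prev_raw) / dt
        a_d = smoothing_factor(dt, self.d_cutoff)
        speed = a_d * raw_speed + (1.0 - a_d) * self.prev_speed
        # Faster motion raises the cutoff, which reduces smoothing lag.
        cutoff = self.min_cutoff + self.beta * abs(speed)
        a = smoothing_factor(dt, cutoff)
        filtered = a * x + (1.0 - a) * self.prev_filtered
        self.prev_raw = x
        self.prev_filtered = filtered
        self.prev_speed = speed
        return filtered
```

Coordinates stay as floats throughout, so sub-pixel movement survives until the final conversion to screen pixels.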

2. Crop Zoom and Super Resolution

Faster reacquisition after hand loss: after consecutive misses in the crop region, detection periodically returns to the full frame, so hands that have left the local crop are found again sooner.
Stable super-resolution tier switching: a stabilization mechanism prevents the super-resolution model from being started and stopped repeatedly near size thresholds.
Real-ESRGAN loaded on demand: the model is loaded only when the user explicitly selects the corresponding mode, so it no longer consumes large amounts of CPU, GPU, and memory by default.
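
The notes do not say how the switching is stabilized. One common way to avoid toggling near a threshold is hysteresis with two separate thresholds, sketched below. The class name and both pixel thresholds are hypothetical, and the actual mechanism in AirControl may differ.

```python
class SuperResolutionSwitch:
    """Hypothetical hysteresis switch: enable below one size, disable above another."""

    def __init__(self, enable_below_px: float = 90.0, disable_above_px: float = 120.0):
        if enable_below_px >= disable_above_px:
            raise ValueError("enable_below_px must be smaller than disable_above_px")
        self.enable_below_px = enable_below_px
        self.disable_above_px = disable_above_px
        self.active = False

    def update(self, hand_size_px: float) -> bool:
        """Return whether super resolution should be active for this frame."""
        if not self.active and hand_size_px < self.enable_below_px:
            self.active = True
        elif self.active and hand_size_px > self.disable_above_px:
            self.active = False
        # Sizes between the two thresholds keep the previous state.
        return self.active
```

A hand whose apparent size hovers around 100 px keeps whatever state it had, rather than starting and stopping the model on every frame.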

3. Drawing and Global Gestures

Hardened pen-down and pen-up state machine: stroke continuity is improved during side views, thumb occlusion, and brief recognition fluctuations, with fewer false pen-downs and broken strokes.
More reliable global mode switching: the recognizer for the 🤟 hold-to-switch gesture keeps observing real frames, so it is no longer falsely triggered by drawing state or temporary predicted frames.
Trace recording off by default: draw_record_trace defaults to false, so normal use no longer keeps generating large diagnostic trace files.

4. Latency and Runtime Performance

Latest-frame-first processing: the capture thread keeps only the newest camera frame and drops stale frames while inference is busy, reducing accumulated input lag.
UI backpressure: pending UI updates are limited so that frame signals cannot pile up in the main-thread event queue.
Streamlined image conversion: frames are converted directly from BGR to Qt images, cutting unnecessary memory copies and color conversions.
More deterministic shutdown: the camera, inference thread, recorder, and voice service are released in order, lowering the chance of a hung exit or leftover background processes.
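
The latest-frame policy above can be sketched as a single-slot buffer shared between the capture thread and the inference thread. This is an illustrative sketch only: the class name and the `dropped` counter are assumptions, not names from the project.

```python
import threading


class LatestFrameSlot:
    """Hypothetical single-slot buffer: a new frame overwrites an unread one."""

    def __init__(self):
        self._cond = threading.Condition()
        self._frame = None
        self._has_frame = False
        self.dropped = 0  # number of frames overwritten before being consumed

    def put(self, frame) -> None:
        """Called by the capture thread for every camera frame."""
        with self._cond:
            if self._has_frame:
                self.dropped += 1  # the consumer was busy, so the old frame is stale
            self._frame = frame
            self._has_frame = True
            self._cond.notify()

    def get(self, timeout=None):
        """Called by the inference thread; returns None if no frame arrives in time."""
        with self._cond:
            if not self._cond.wait_for(lambda: self._has_frame, timeout):
                return None
            frame = self._frame
            self._frame = None
            self._has_frame = False
            return frame
```

Because the slot never holds more than one frame, the consumer always works on the newest image, and latency cannot accumulate the way it does in an unbounded queue.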

5. Configuration and Reliability

Atomic configuration saves: the configuration is written to a temporary file that then replaces the real file, reducing the risk of corruption from power loss or an abnormal exit.
Configuration schema validation: wrongly typed or out-of-range values fall back to safe defaults and log a warning instead of making startup fail.
Unified runtime resource paths: source checkouts and PyInstaller builds use the same resource-location logic.
Safer tracker and voice-command reloads: cleanup and reconstruction are more complete when the configuration changes at runtime.
Improved raw-frame diagnostics: more useful input information can be recorded when recognition problems occur, making them easier to reproduce and debug.
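
The atomic save and the fallback-to-default validation described above might look roughly like the sketch below. The schema layout, the function names, and the default and range for `camera_min_fps` are assumptions for illustration; only the `draw_record_trace` default of false comes from these notes.

```python
import json
import logging
import os
import tempfile

# Hypothetical schema: key -> (expected type, safe default, optional (min, max)).
SCHEMA = {
    "draw_record_trace": (bool, False, None),
    "camera_min_fps": (int, 20, (1, 120)),
}


def validate_config(raw: dict) -> dict:
    """Return a config in which every invalid value is replaced by its default."""
    clean = {}
    for key, (expected_type, default, bounds) in SCHEMA.items():
        value = raw.get(key, default)
        # bool is a subclass of int in Python, so reject it for integer fields.
        type_ok = isinstance(value, expected_type) and not (
            expected_type is int and isinstance(value, bool)
        )
        in_range = type_ok and (bounds is None or bounds[0] <= value <= bounds[1])
        if not in_range:
            logging.warning("Invalid value for %s: %r, using default %r", key, value, default)
            value = default
        clean[key] = value
    return clean


def save_config_atomic(path: str, config: dict) -> None:
    """Write to a temporary file in the same directory, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(config, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())
        # os.replace overwrites the target in a single rename step.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Creating the temporary file in the same directory as the target keeps the final rename on one filesystem, which is what allows the swap to happen in a single step.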

6. Packaging, Testing, and CI

The Windows PyInstaller configuration was reworked and now includes a --self-test mode that needs no camera or microphone.
GitHub Actions CI now runs the checks automatically on Python 3.10 and 3.12.
Release verification: 134 tests and 132 subtests passed, and Ruff linting, Python compilation checks, PyInstaller packaging, and the executable self-test all succeeded.

Upgrade Notes

After upgrading, test with the default configuration first; existing configuration is validated, and invalid fields fall back automatically.
Real-ESRGAN_x2plus.onnx is approximately 67 MB and is not under version control because of GitHub source repository size limits. To use a Real-ESRGAN mode, place the file in the project root; normal modes and other features do not depend on it.
See the README for full installation, model setup, and packaging instructions.

Full changelog: v1.2.0...v1.3.0

v1.2.0 — Far-distance tracking, 🤟 mode-switch, robust draw-mode pen

enixz released this 14 Jun 03:45

v1.2.0

f97e6a0

✨ New features

Far-distance hand tracking — crop-zoom super-resolution, One Euro smoothing, and active-region mapping reliably track hands at greater distances.
🤟 New mode-switch gesture — hold a single-hand "I Love You" sign to switch modes (replaces the previous mode-switch gesture).
Side-view-robust pen control in drawing mode — stable writing even when the camera sees the hand from the side or up close.
Crash logging — native / thread / Qt crashes are now captured to crash.log.
Raw-video record + offline replay harness — record raw footage and replay it offline for objective detection A/B testing.

🐛 Fixes

Zoom viewport now locks to the top hand and snaps on engage — stops the mid-distance "bellows" oscillation.
Strokes no longer break from unreliable thumb / geometry signals at side view.
Completed the central voting pen state machine in draw mode.
Fixed two-finger hover false positives that caused severe stroke breaking.

🔧 Internal

Decoupled the floating-window UI from service orchestration via a new AirControlOrchestrator.
Cleaned up redundant / broken install & startup scripts; added tests (dictation, replay, swipe distance, zoom SR).

Full changelog: v1.1.0...v1.2.0

v1.1.0 — Offline Dictation, Multi-Camera & Hardened UX

enixz released this 24 May 16:05

v1.1.0

4c72753

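✨ New features

🎙️ Voice dictation (drawing mode)

Integrates offline SenseVoice-Small (open-sourced by Alibaba DAMO Academy, loaded through sherpa-onnx) with automatic recognition of five languages: Chinese, English, Japanese, Korean, and Cantonese
In drawing mode, say "开始板书" ("start writing") to begin recording and "结束板书" ("stop writing") to stop; the recognized text is written onto the canvas automatically
When KWS falsely triggers "结束板书", SenseVoice re-checks the most recent 3 seconds of audio as a second confirmation, so long sentences are not cut off midway
The black background of the live captions now resizes with the text instead of covering the whole screen from the start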

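📷 Multi-camera selection + auto-probing

A camera drop-down was added at the top of ⚙ Settings; available indices are enumerated asynchronously in the background (skipping the one currently in use to avoid contention), and saving switches cameras at runtime
At startup, resolutions are probed from high to low (1080p → 720p → 540p → 480p → 360p), and the first one that the driver accepts at ≥20 fps is selected
After a disconnect, the camera reconnects automatically with exponential backoff, without flooding the log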
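
The probing order and the reconnect backoff can be sketched without any camera library by injecting the probe as a callable. Everything named here is an assumption for illustration: the pixel dimensions chosen for each tier, the `probe` callback, and the backoff base, factor, and cap. Only the tier order and the 20 fps floor come from the notes.

```python
# Assumed pixel sizes for the 1080p, 720p, 540p, 480p, and 360p tiers.
RESOLUTIONS = [(1920, 1080), (1280, 720), (960, 540), (640, 480), (640, 360)]


def pick_resolution(probe, min_fps: float = 20.0):
    """Return the first (width, height) the driver accepts at min_fps or better.

    `probe(width, height)` is a hypothetical callback that returns the measured
    frame rate, or None when the driver rejects the resolution.
    """
    for width, height in RESOLUTIONS:
        fps = probe(width, height)
        if fps is not None and fps >= min_fps:
            return (width, height)
    return None


def backoff_delays(base: float = 0.5, factor: float = 2.0, cap: float = 30.0):
    """Yield reconnect delays in seconds that grow geometrically up to a cap."""
    delay = base
    while True:
        yield delay
        delay = min(delay * factor, cap)
```

A reconnect loop would sleep for the next value from `backoff_delays()` after each failed attempt and create a fresh generator once the camera is back.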

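💬 Voice command panel (cheatsheet)

A floating panel lists all voice commands available in the current mode
Click the 🎤 label to toggle it
Frameless, with a red ✕ in the top-right corner, draggable, and follows the main window as it moves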

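🐛 Key fixes

Mouse-mode deadlock: a gesture LEFTDOWN landing on a dialog title bar started the Win32 modal drag loop, which blocked the main thread so that `handle()` stopped running and the left-button hold state got stuck. A new background watchdog thread sends LEFTUP to break the deadlock when `handle()` has been silent for >0.7 s while the left button is held
Cursor layer hidden behind dialogs: `SetWindowPos(HWND_TOPMOST, NOACTIVATE)` pins the cursor layer to the top of the topmost group every 50 ms, while `WS_EX_TRANSPARENT` keeps clicks passing through
VoiceDictationService.dictate() is now serialized with a lock, so partial ASR and verification ASR no longer use the same recognizer concurrently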
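
The watchdog idea behind the deadlock fix can be sketched as follows. This is not the project's implementation: the class and method names are hypothetical, and `release` stands in for whatever call actually sends the Win32 LEFTUP event. Only the 0.7 s threshold comes from the notes.

```python
import threading
import time


class LeftButtonWatchdog:
    """Hypothetical watchdog: force a button release when the handler stalls."""

    def __init__(self, release, timeout_s: float = 0.7, clock=time.monotonic):
        self._release = release      # callback that would send LEFTUP
        self._timeout_s = timeout_s
        self._clock = clock          # injectable so the logic can be tested
        self._lock = threading.Lock()
        self._last_beat = clock()
        self._left_held = False

    def heartbeat(self, left_held: bool) -> None:
        """Called by the gesture handler on every processed frame."""
        with self._lock:
            self._last_beat = self._clock()
            self._left_held = left_held

    def check(self) -> bool:
        """Release the button if the handler is silent while it is held."""
        with self._lock:
            stalled = self._clock() - self._last_beat > self._timeout_s
            fire = stalled and self._left_held
            if fire:
                self._left_held = False
        if fire:
            self._release()
        return fire

    def run(self, stop: threading.Event, poll_s: float = 0.1) -> None:
        """Polling loop intended for a daemon thread."""
        while not stop.wait(poll_s):
            self.check()
```

In use, `run` would be started with `threading.Thread(target=watchdog.run, args=(stop_event,), daemon=True)`, so the check keeps running even while the main thread is stuck inside the modal drag loop.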

⚙️ Configuration changes

Added: `dictation_`, `camera_width/height/force_mjpeg/min_fps`, `hand__confidence`, `floating_window_scale`, `debug_overlay`
Removed: `tencent_asr_*` (replaced by offline SenseVoice)

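📦 Model download

The SenseVoice model is 239 MB, which exceeds GitHub's 100 MB single-file limit, so it must be downloaded manually:

Go to the sherpa-onnx releases (tag: `asr-models`)
Download `sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17.tar.bz2`
Extract it into `models/sense-voice/`; the directory must contain `model.int8.onnx` and `tokens.txt`

Without the model, dictation is disabled automatically; KWS keywords are unaffected.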

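🧪 Upgrade impact

Configuration files are backward compatible: an old `config.json` has any missing new fields filled in automatically at startup
If you previously used Tencent Cloud ASR, the `tencent_asr_*` fields can be deleted from `config.json` (leaving them in does no harm)

Full diff: `v1.0.0...v1.1.0` (28 files changed, +3046 / -305)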

v1.0.0 - Initial Release

enixz released this 18 May 07:43

v1.0.0

e7731a3

✨ Features

🖐️ Gesture Control

Presentation Mode: Wave to change slides, start/stop presentation
Mouse Mode: Air mouse, pinch to click, scissor to scroll
Drawing Mode: Finger writing, fist to clear, shape correction

🎤 Voice Assistant

Offline KWS: Sherpa-ONNX keyword detection (privacy-first)
Online ASR: Tencent Cloud for free text input
Mode-aware: Auto-switch voice commands by mode

🧠 Smart Recognition

Kalman + EMA dual smoothing for 21 hand landmarks
Smart shape correction (lines, triangles, rectangles, ellipses)
Edge acceleration for screen-edge cursor speed boost

⚡ High Performance

Async MediaPipe inference in background thread
30 FPS real-time hand tracking
< 50ms response latency

📦 Installation

git clone https://github.com/enixz/AirControl.git
cd AirControl
pip install -r requirements.txt
python -m app.main_ui

📄 License

Apache License 2.0

Full Changelog: https://github.com/enixz/AirControl/commits/v1.0.0
