A Windows desktop teaching agent that shows a helping hand character on screen. Users click the hand to give voice commands, and the app uses real browser automation with teaching overlays to complete tasks while explaining every step.
Download Python 3.11+ from https://www.python.org/downloads/ and install it. Check "Add Python to PATH" during installation.
cd "C:\Users\maxgi\OneDrive\เอกสาร\AHH!"
python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt
playwright install chromium
set ANTHROPIC_API_KEY=your_anthropic_api_key_here
set ELEVENLABS_API_KEY=your_elevenlabs_api_key_here
Or create a .env file (optional, you'll need to source it manually):
ANTHROPIC_API_KEY=sk-ant-...
ELEVENLABS_API_KEY=xi-...
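Since the .env file has to be loaded by hand, one option is a small helper that runs before the app starts. The following is a minimal sketch and not part of the project: `load_env_file` is a hypothetical name, and it assumes plain `KEY=VALUE` lines with no quoting or multi-line values.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Read KEY=VALUE pairs from a file and copy them into os.environ.

    Hypothetical helper: assumes one KEY=VALUE pair per line, no quoting.
    """
    loaded: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded  # nothing to load; rely on variables set in the shell
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key:
            # Variables already set in the shell take precedence.
            os.environ.setdefault(key, value)
            loaded[key] = os.environ[key]
    return loaded
```

Calling `load_env_file()` from a launcher script before the app is imported would make both keys visible through `os.environ`, while values already set with `set` in the shell are left untouched.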
python main.py
- Helping hand appears on screen (bottom-right). Drag it anywhere.
- Click the hand to start voice recording. Click again to stop and process.
- If voice fails, a text input box appears as fallback.
- The app calls Claude to plan the task as browser steps.
- If clarification is needed, bubble options appear near the hand.
- A Playwright browser opens and the app mirrors actions with the real OS cursor.
- Teaching overlays show: cursor halo, trail, click pulses, element highlights, arrows, and captions.
- Step stack panel shows progress through the plan.
- Press ESC or click STOP to halt at any time.
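The flow above (plan the task as steps, run them in order, halt on ESC or STOP) can be pictured as a loop that checks a stop flag before each step. This is an illustrative sketch only: `Step`, `run_plan`, and the `execute` callback are hypothetical names rather than the project's actual API, and the real app drives Playwright and the overlay instead of a plain callback.

```python
import threading
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Step:
    """One planned browser action (hypothetical shape)."""
    action: str   # e.g. "navigate", "click", "type"
    target: str   # URL or selector the action applies to
    caption: str  # explanation shown to the user for this step


def run_plan(steps: Iterable[Step],
             execute: Callable[[Step], None],
             stop_event: threading.Event) -> int:
    """Run steps in order and return how many completed before a stop."""
    completed = 0
    for step in steps:
        # ESC or the STOP button would set this event from the UI side.
        if stop_event.is_set():
            break
        execute(step)
        completed += 1
    return completed
```

A `threading.Event` is used here because it can be set safely from a UI thread while the automation runs elsewhere; the step currently in progress finishes, and the next one never starts.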
/ahh
    /ui - PySide6 overlay components
        hand_widget.py - Draggable helping hand
        overlay_window.py - Main transparent overlay
        step_stack.py - Plan steps panel
        bubbles.py - Clarification bubbles
        caption_strip.py - Action captions
        confirm_modal.py - Safety confirmation
        cursor_overlay.py - Halo, trail, click pulse, highlight, arrow
        text_input.py - Text fallback input
    /audio - Audio capture and STT
        recorder.py - Microphone recording
        stt_client.py - ElevenLabs Scribe API
    /agent - LLM planning
        planner.py - Claude API planner
        schema.py - JSON schema + validation
    /automation - Browser + cursor control
        browser_driver.py - Playwright Chromium driver
        cursor_executor.py - OS cursor mirroring
    /assets
        hand.svg - Hand character asset
    main.py - Application entrypoint
    requirements.txt - Python dependencies
- Python 3.11+
- PySide6 (Qt6) - Overlay UI
- Playwright - Browser automation
- PyAudio - Microphone capture
- pyautogui - OS cursor control
- anthropic - Claude API
- requests - HTTP client for ElevenLabs
- pydantic - JSON schema validation
- numpy - Audio amplitude
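
The list pairs numpy with audio amplitude, which presumably means deriving a level from raw microphone samples. Below is a minimal sketch of a root-mean-square level for 16-bit PCM. It deliberately uses only the standard library so it runs without extra installs (the project itself lists numpy for this), and both the name `rms_level` and the 16-bit little-endian mono format are assumptions.

```python
import array
import math
import sys


def rms_level(pcm_bytes: bytes) -> float:
    """Return the RMS amplitude of 16-bit signed PCM, scaled to 0.0..1.0."""
    samples = array.array("h")  # "h" = signed 16-bit integers
    # Drop a trailing odd byte so the buffer holds whole samples only.
    samples.frombytes(pcm_bytes[: len(pcm_bytes) // 2 * 2])
    if sys.byteorder == "big":
        samples.byteswap()  # input is assumed to be little-endian
    if not samples:
        return 0.0
    mean_square = sum(s * s for s in samples) / len(samples)
    return math.sqrt(mean_square) / 32768.0
```

A value near 0.0 indicates silence and values approaching 1.0 indicate a signal close to full scale, which is the kind of number a recording indicator could use.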