Lightweight voice-to-text using Puter.js. Two keys. That's it.
A system-wide voice assistant for Linux that captures audio from global hotkeys, transcribes it using Puter's cloud-based AI, and can either copy to clipboard or execute commands via AI interpretation.
- Two Operation Modes:
  - Transcribe Mode: Voice to clipboard text
  - Command Mode: AI-powered command execution (shell, file operations, Claude CLI integration)
- Global Hotkeys: Right-Ctrl + Right-Shift to trigger recording
- Puter.js Integration: Cloud-based speech-to-text and AI interpretation
- System Tray: Status indicator and controls
- Cross-Platform: Linux-focused with optional Electron UI
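The two-key trigger can be pictured as a small state machine that tracks which keys are held. The sketch below is illustrative only and is not taken from the project's HotkeyManager: the `ChordDetector` class and the key names `ctrl_r` and `shift_r` are hypothetical, and a real keyboard hook library reports its own key codes.

```python
from typing import Callable, FrozenSet, Set

# Hypothetical key names; a real hook library reports its own key codes.
RIGHT_CTRL = "ctrl_r"
RIGHT_SHIFT = "shift_r"


class ChordDetector:
    """Fires a callback once each time every key in the chord is held down."""

    def __init__(self, chord: FrozenSet[str], on_trigger: Callable[[], None]) -> None:
        self.chord = chord
        self.on_trigger = on_trigger
        self._pressed: Set[str] = set()
        self._armed = True  # prevents repeat firing from key auto-repeat

    def key_down(self, key: str) -> None:
        self._pressed.add(key)
        # Fire only when the full chord is held and we have not fired yet.
        if self._armed and self.chord <= self._pressed:
            self._armed = False
            self.on_trigger()

    def key_up(self, key: str) -> None:
        self._pressed.discard(key)
        if not self.chord <= self._pressed:
            self._armed = True  # chord broken; the next full press may fire


if __name__ == "__main__":
    detector = ChordDetector(
        frozenset({RIGHT_CTRL, RIGHT_SHIFT}),
        lambda: print("start recording"),
    )
    detector.key_down(RIGHT_CTRL)
    detector.key_down(RIGHT_SHIFT)
```

Re-arming only after the chord is released keeps a held key combination from starting several recordings in a row.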
The easiest way to get started is with our automated setup script:
# One-command installation with TUI
curl -fsSL https://raw.githubusercontent.com/jdgafx/voice-agent/main/setup.sh | bash
The setup script provides:
- Interactive TUI with guided prompts
- Automatic dependency detection and installation
- Multiple implementation options (Node.js or Python)
- GitHub integration and publishing setup
- System configuration and launcher creation
If you prefer manual installation, you will need:
- Node.js 18+ or Python 3.8+
- Linux (primary target)
- Microphone access
- GitHub token configured (for MCP server)
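Before installing by hand, it may help to confirm these prerequisites from a script. The following is a minimal sketch for the Python route, not part of the repository: the function name `check_prerequisites` is hypothetical, and it does not test microphone access, which depends on the audio stack.

```python
import os
import sys
from typing import List, Mapping, Tuple

MIN_PYTHON = (3, 8)  # minimum version stated in the prerequisites


def check_prerequisites(
    version: Tuple[int, int] = sys.version_info[:2],
    platform: str = sys.platform,
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if version < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
            f"found {version[0]}.{version[1]}"
        )
    if not platform.startswith("linux"):
        problems.append(f"Linux is the primary target, found {platform}")
    if not env.get("GITHUB_TOKEN"):
        problems.append("GITHUB_TOKEN is not set (needed for the MCP server)")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("warning:", problem)
```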
# Clone the repository
git clone https://github.com/jdgafx/voice-agent.git
cd voice-agent
# Install dependencies (Node.js)
npm install
# Or install dependencies (Python)
pip install sounddevice numpy PyQt6 httpx pyperclip
# Set up GitHub token for MCP server
export GITHUB_TOKEN=your_github_token_here
# Start the voice agent
npm start # Node.js version
# or
python3 src/voice_agent.py # Python version
- Start the agent: npm start
- Trigger recording: Press Right-Ctrl + Right-Shift
- Speak naturally: Say what you want to do
- Results:
  - Transcribe mode: Text copied to clipboard
  - Command mode: AI interprets and executes commands
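The last step above, where the result depends on the active mode, can be sketched as a small dispatch function. This is an illustration under assumptions rather than the project's code: `handle_transcript` and its callback parameters are hypothetical names, and the clipboard and command handlers are passed in so the logic stays independent of any particular library.

```python
from typing import Callable


def handle_transcript(
    text: str,
    mode: str,
    copy_to_clipboard: Callable[[str], None],
    run_command: Callable[[str], str],
) -> str:
    """Route transcribed text according to the operation mode."""
    text = text.strip()
    if not text:
        return "nothing heard"  # empty transcription: do nothing
    if mode == "transcribe":
        copy_to_clipboard(text)
        return "copied"
    if mode == "command":
        # The handler is expected to interpret and execute the request.
        return run_command(text)
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    clipboard = []
    print(handle_transcript("hello world", "transcribe", clipboard.append, str.upper))
    print(clipboard)
```

In the Python version, `pyperclip.copy` from the dependency list above would be a natural choice for `copy_to_clipboard`.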
Edit src/utils/config.js to customize:
- Hotkey combinations
- Operation mode (transcribe/command)
- UI preferences
- Puter.js settings
The setup script supports multiple publishing options for sharing your Voice Agent:
- GitHub Pages: landing page served from the docs/ directory
- Automatic setup via setup script
- URL: https://jdgafx.github.io/voice-agent
- Vercel: Free tier with global CDN
- Netlify: Free hosting with continuous deployment
- Firebase: Google's hosting platform (free tier available)
# Run setup script and choose publishing option
./setup.sh # Select your preferred hosting platform
Visit the published landing page at: https://jdgafx.github.io/voice-agent
Features:
- Modern responsive design
- Installation instructions
- Feature showcase
- Demo video placeholder
- Multi-platform support info
- PuterClient: Successfully authenticates and handles transcription/chat
- VoiceListener: Audio buffer creation and processing
- CommandCenter: Command interpretation and execution
- HotkeyManager: Global keyboard capture (Right-Ctrl + Right-Shift)
- ClipboardManager: Cross-platform clipboard operations
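To make the "audio buffer creation" step more concrete, here is one way captured samples could be packaged before upload. It is a sketch under stated assumptions, not VoiceListener's actual implementation: the function name `samples_to_wav`, the mono 16-bit format, and the 16 kHz default sample rate are all choices made for illustration, using only the Python standard library.

```python
import io
import struct
import wave
from typing import Sequence


def samples_to_wav(samples: Sequence[float], sample_rate: int = 16000) -> bytes:
    """Pack mono float samples in [-1.0, 1.0] into 16-bit PCM WAV bytes."""
    # Clamp out-of-range samples so they cannot overflow a signed 16-bit int.
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    pcm = b"".join(struct.pack("<h", int(s * 32767)) for s in clipped)

    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)        # mono (assumed)
        wav.setsampwidth(2)        # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()


if __name__ == "__main__":
    wav_bytes = samples_to_wav([0.0, 0.5, -0.5, 1.0])
    print(len(wav_bytes), "bytes")
```

Building the file in memory avoids temporary files; the resulting bytes can be handed to whatever client performs the transcription request.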
- End-to-end transcription pipeline: Audio → Puter.js → Text
- Command interpretation: Natural language → Actionable commands
- Chat functionality: AI conversation capabilities
- Authentication: Token-based Puter.js access
- ✅ Core verification tests passing
- ✅ Command center functionality verified
- ✅ Integration workflow successful
- ✅ Puter.js API connectivity confirmed
- npm start: Start the voice agent
- npm run widget: Launch Electron widget
- npm run dev: Development mode with auto-restart
- npm test: Run test suite
npm test
Tests include:
- Core component verification
- Integration tests
- Command execution validation
- @heyputer/puter.js: Puter cloud platform integration
- mic: Microphone audio capture
- uiohook-napi: Global hotkey detection
- clipboardy: System clipboard access
- eventemitter3: Event handling
- claude-flow: Claude CLI integration
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
MIT License - see LICENSE file for details
- ✅ Complete setup script: Beautiful TUI with automated installation
- ✅ Multiple implementations: Node.js and Python versions available
- ✅ GitHub integration: Automatic repository sync and publishing
- ✅ Web landing page: Modern responsive design for GitHub Pages
- ✅ Multi-platform publishing: Support for Vercel, Netlify, Firebase
- ✅ Core functionality verified: Puter.js integration, transcription pipeline, command interpretation
- ✅ Component integration tested: VoiceListener, PuterClient, CommandCenter working together
- ✅ Global hotkeys implemented: Right-Ctrl + Right-Shift combination supported
- ✅ Test suite passing: Core verification, command center, integration tests successful
- Complete voice agent implementation with Puter.js integration
- Dual mode operation (transcribe/command)
- Global hotkey support
- System tray interface
- Claude CLI integration for AI commands