jdgafx/voice-agent

Voice Agent

Lightweight voice-to-text using Puter.js. Two keys. That's it.

A system-wide voice assistant for Linux. It records audio when a global hotkey is pressed, transcribes it with Puter's cloud-based AI, and then either copies the text to the clipboard or has the AI interpret it as a command to execute.

Features

  • Two Operation Modes:
    • Transcribe Mode: Voice to clipboard text
    • Command Mode: AI-powered command execution (shell, file operations, Claude CLI integration)
  • Global Hotkeys: Right-Ctrl + Right-Shift to trigger recording
  • Puter.js Integration: Cloud-based speech-to-text and AI interpretation
  • System Tray: Status indicator and controls
  • Cross-Platform: Linux-focused with optional Electron UI
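
As a rough illustration of the two-key trigger, here is a minimal sketch of chord detection. It assumes a hook library delivers key-down and key-up events; the key names `RightCtrl` and `RightShift` and the `createChordTracker` helper are hypothetical and not taken from this project's code (a real hook library would report numeric key codes).

```javascript
// Hypothetical key names; a real global-hook library reports numeric key codes.
const CHORD = ['RightCtrl', 'RightShift'];

// Tracks which keys are held and fires onTrigger once per chord press.
function createChordTracker(chord, onTrigger) {
  const held = new Set();
  let fired = false; // prevents re-firing while the chord stays held

  return {
    keyDown(key) {
      held.add(key);
      if (!fired && chord.every((k) => held.has(k))) {
        fired = true;
        onTrigger();
      }
    },
    keyUp(key) {
      held.delete(key);
      // Re-arm once any key of the chord is released.
      if (chord.includes(key)) fired = false;
    },
  };
}

// Usage: simulate the events a global hook would deliver.
let triggers = 0;
const tracker = createChordTracker(CHORD, () => { triggers += 1; });
tracker.keyDown('RightCtrl');
tracker.keyDown('RightShift'); // chord complete, callback fires
tracker.keyDown('RightShift'); // key repeat while held is ignored
tracker.keyUp('RightShift');
tracker.keyDown('RightShift'); // fires again after a release
console.log(triggers);
```

Keeping the tracker free of any hook-library calls makes it easy to test without a keyboard attached; the library's event callbacks would only need to forward into `keyDown` and `keyUp`.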

Quick Start

Automated Installation (Recommended)

The easiest way to get started is with our automated setup script:

# One-command installation with TUI
curl -fsSL https://raw.githubusercontent.com/jdgafx/voice-agent/main/setup.sh | bash

The setup script provides:

  • Text-based UI (TUI) with interactive prompts
  • Automatic dependency detection and installation
  • Multiple implementation options (Node.js or Python)
  • GitHub integration and publishing setup
  • System configuration and launcher creation

Manual Installation

If you prefer manual installation:

Prerequisites

  • Node.js 18+ or Python 3.8+
  • Linux (primary target)
  • Microphone access
  • GitHub token configured (for MCP server)
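
To check the Node.js requirement before installing, something like the following could be used. This is a minimal sketch; `meetsMinimumNode` is a hypothetical helper, not part of the repository.

```javascript
// Returns true when a Node.js version string (e.g. "18.17.0") meets the minimum major version.
function meetsMinimumNode(version, minMajor = 18) {
  const major = Number.parseInt(version.split('.')[0], 10);
  return Number.isInteger(major) && major >= minMajor;
}

// process.versions.node holds the running version without a leading "v".
if (!meetsMinimumNode(process.versions.node)) {
  console.error(`Node.js 18+ is required, found ${process.versions.node}`);
}
```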

Installation Steps

# Clone the repository
git clone https://github.com/jdgafx/voice-agent.git
cd voice-agent

# Install dependencies (Node.js)
npm install

# Or install dependencies (Python)
pip install sounddevice numpy PyQt6 httpx pyperclip

# Set up GitHub token for MCP server
export GITHUB_TOKEN=your_github_token_here

# Start the voice agent
npm start  # Node.js version
# or
python3 src/voice_agent.py  # Python version

Usage

  1. Start the agent: npm start (or python3 src/voice_agent.py for the Python version)
  2. Trigger recording: Press Right-Ctrl + Right-Shift
  3. Speak naturally: Say what you want to do
  4. Results:
    • Transcribe mode: Text copied to clipboard
    • Command mode: AI interprets and executes commands
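
The difference between the two modes can be pictured as a small dispatch step after transcription. The sketch below is illustrative only: `planAction` and the action shapes it returns are assumptions, not the project's actual interface.

```javascript
// Hypothetical planner: decides what to do with a transcript in each mode.
// The returned action shapes are illustrative, not the project's real interface.
function planAction(mode, transcript) {
  const text = transcript.trim();
  if (text === '') return { type: 'ignore' }; // nothing was said

  switch (mode) {
    case 'transcribe':
      // Transcribe mode: the text goes straight to the clipboard.
      return { type: 'clipboard', text };
    case 'command':
      // Command mode: the text is handed to the AI for interpretation first.
      return { type: 'interpret', prompt: text };
    default:
      throw new Error(`Unknown mode: ${mode}`);
  }
}

console.log(planAction('transcribe', '  hello world  '));
console.log(planAction('command', 'list files in my home directory'));
```

Returning a plain description of the action, rather than performing it, keeps the decision separate from side effects such as clipboard writes or shell execution.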

Configuration

Edit src/utils/config.js to customize:

  • Hotkey combinations
  • Operation mode (transcribe/command)
  • UI preferences
  • Puter.js settings
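
The actual keys in src/utils/config.js are not documented here, so the following is only a guess at what such a configuration might look like. Every key name below (`hotkey`, `mode`, `ui`) and the `buildConfig` helper are hypothetical.

```javascript
// Hypothetical defaults; the real src/utils/config.js may use different keys.
const DEFAULTS = {
  hotkey: ['RightCtrl', 'RightShift'],
  mode: 'transcribe', // or 'command'
  ui: { tray: true, widget: false },
};

const VALID_MODES = ['transcribe', 'command'];

// Merges user overrides onto the defaults and validates the mode.
function buildConfig(overrides = {}) {
  const config = {
    ...DEFAULTS,
    ...overrides,
    ui: { ...DEFAULTS.ui, ...overrides.ui },
  };
  if (!VALID_MODES.includes(config.mode)) {
    throw new Error(`mode must be one of: ${VALID_MODES.join(', ')}`);
  }
  return config;
}

console.log(buildConfig({ mode: 'command', ui: { widget: true } }));
```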

Publishing & Deployment

The setup script supports multiple publishing options for sharing your Voice Agent:

GitHub Pages (Free)

  • Landing page served from docs/ directory
  • Automatic setup via setup script
  • URL: https://jdgafx.github.io/voice-agent

Other Options

  • Vercel: Free tier with global CDN
  • Netlify: Free hosting with continuous deployment
  • Firebase: Google's hosting platform (free tier available)

Publishing Setup

# Run setup script and choose publishing option
./setup.sh  # Select your preferred hosting platform

Web Interface

Visit the published landing page at: https://jdgafx.github.io/voice-agent

Features:

  • Modern responsive design
  • Installation instructions
  • Feature showcase
  • Demo video placeholder
  • Multi-platform support info

Verified Working Features ✅

Core Components

  • PuterClient: Successfully authenticates and handles transcription/chat
  • VoiceListener: Audio buffer creation and processing
  • CommandCenter: Command interpretation and execution
  • HotkeyManager: Global keyboard capture (Right-Ctrl + Right-Shift)
  • ClipboardManager: Cross-platform clipboard operations
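
One way components like these could be connected is through an event bus. The sketch below uses Node's built-in `events` module (the project lists eventemitter3, which offers a similar interface) and replaces every component with a stub. The event names, the `wirePipeline` helper, and the stub signatures are all assumptions, and the stubs are synchronous for brevity; real transcription is a network call and would be asynchronous.

```javascript
const { EventEmitter } = require('events');

// Stand-in wiring; none of this reflects the project's real class interfaces.
function wirePipeline(bus, { recordAudio, transcribe, copy }) {
  bus.on('hotkey', () => bus.emit('audio', recordAudio())); // HotkeyManager -> VoiceListener
  bus.on('audio', (buffer) => bus.emit('text', transcribe(buffer))); // VoiceListener -> PuterClient
  bus.on('text', (text) => copy(text)); // PuterClient -> ClipboardManager
}

// Usage with stubs: a fake recorder, a fake transcriber, and an in-memory clipboard.
const bus = new EventEmitter();
const clipboard = [];
wirePipeline(bus, {
  recordAudio: () => Buffer.from('fake-pcm'),
  transcribe: (buffer) => `heard ${buffer.length} bytes`,
  copy: (text) => clipboard.push(text),
});

bus.emit('hotkey');
console.log(clipboard);
```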

Integration Tests

  • End-to-end transcription pipeline: Audio → Puter.js → Text
  • Command interpretation: Natural language → Actionable commands
  • Chat functionality: AI conversation capabilities
  • Authentication: Token-based Puter.js access

Test Results

  • ✅ Core verification tests passing
  • ✅ Command center functionality verified
  • ✅ Integration workflow successful
  • ✅ Puter.js API connectivity confirmed

Development

Scripts

  • npm start - Start the voice agent
  • npm run widget - Launch Electron widget
  • npm run dev - Development mode with auto-restart
  • npm test - Run test suite

Testing

npm test

Tests include:

  • Core component verification
  • Integration tests
  • Command execution validation

Dependencies

  • @heyputer/puter.js - Puter cloud platform integration
  • mic - Microphone audio capture
  • uiohook-napi - Global hotkey detection
  • clipboardy - System clipboard access
  • eventemitter3 - Event handling
  • claude-flow - Claude CLI integration

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

License

MIT License - see LICENSE file for details

Recent Changes

  • Complete setup script: Beautiful TUI with automated installation
  • Multiple implementations: Node.js and Python versions available
  • GitHub integration: Automatic repository sync and publishing
  • Web landing page: Modern responsive design for GitHub Pages
  • Multi-platform publishing: Support for Vercel, Netlify, Firebase
  • Core functionality verified: Puter.js integration, transcription pipeline, command interpretation
  • Component integration tested: VoiceListener, PuterClient, CommandCenter working together
  • Global hotkeys implemented: Right-Ctrl + Right-Shift combination supported
  • Test suite passing: Core verification, command center, integration tests successful
  • Complete voice agent implementation with Puter.js integration
  • Dual mode operation (transcribe/command)
  • Global hotkey support
  • System tray interface
  • Claude CLI integration for AI commands
