Voice Agent

Lightweight voice-to-text using Puter.js. Two keys. That's it.

A system-wide voice assistant for Linux that records audio when a global hotkey is pressed, transcribes it with Puter's cloud-based AI, and then either copies the text to the clipboard or has the AI interpret it and execute commands.

Features

Two Operation Modes:
- Transcribe Mode: Voice to clipboard text
- Command Mode: AI-powered command execution (shell, file operations, Claude CLI integration)
Global Hotkeys: Right-Ctrl + Right-Shift to trigger recording
Puter.js Integration: Cloud-based speech-to-text and AI interpretation
System Tray: Status indicator and controls
Platform: Linux-focused, with an optional Electron UI

Quick Start

Automated Installation (Recommended)

The easiest way to get started is with our automated setup script:

# One-command installation with TUI
curl -fsSL https://raw.githubusercontent.com/jdgafx/voice-agent/main/setup.sh | bash

The setup script provides:

A TUI with interactive prompts
Automatic dependency detection and installation
Multiple implementation options (Node.js or Python)
GitHub integration and publishing setup
System configuration and launcher creation

Manual Installation

If you prefer manual installation:

Prerequisites

Node.js 18+ or Python 3.8+
Linux (primary target)
Microphone access
GitHub token configured (for MCP server)

Installation Steps

# Clone the repository
git clone https://github.com/jdgafx/voice-agent.git
cd voice-agent

# Install dependencies (Node.js)
npm install

# Or install dependencies (Python)
pip install sounddevice numpy PyQt6 httpx pyperclip

# Set up GitHub token for MCP server
export GITHUB_TOKEN=your_github_token_here

# Start the voice agent
npm start  # Node.js version
# or
python3 src/voice_agent.py  # Python version

Usage

Start the agent: npm start (or python3 src/voice_agent.py for the Python version)
Trigger recording: Press Right-Ctrl + Right-Shift
Speak naturally: Say what you want to do
Results:
- Transcribe mode: Text copied to clipboard
- Command mode: AI interprets and executes commands
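
The two result paths above can be pictured as a small routing step that runs once a transcript is available. The sketch below is illustrative only: the function name `routeTranscript`, the mode strings, and the shape of the returned action object are assumptions made for this example, not the project's actual code.

```javascript
// Hypothetical sketch: decide what to do with a finished transcript.
// Mode names mirror the README ("transcribe" / "command"); everything
// else here is an assumption made for illustration.
function routeTranscript(mode, transcript) {
  const text = String(transcript ?? "").trim();
  if (text === "") {
    // Nothing was recognized, so there is nothing to copy or run.
    return { type: "noop", reason: "empty transcript" };
  }
  switch (mode) {
    case "transcribe":
      // Caller would copy `text` to the clipboard.
      return { type: "clipboard", text };
    case "command":
      // Caller would send `prompt` to the AI for interpretation.
      return { type: "interpret", prompt: text };
    default:
      throw new Error(`Unknown mode: ${mode}`);
  }
}

console.log(routeTranscript("transcribe", "  hello world  "));
console.log(routeTranscript("command", "list files in my home directory"));
```

Keeping this step a pure function (no clipboard or network access inside it) would make it easy to test without a microphone or a Puter account.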

Configuration

Edit src/utils/config.js to customize:

Hotkey combinations
Operation mode (transcribe/command)
UI preferences
Puter.js settings
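
As a rough idea of what such a configuration could look like, here is a minimal sketch of a defaults object plus a loader that merges overrides and validates them. The key names (`hotkey`, `mode`, `ui`) and the `loadConfig` helper are hypothetical; check src/utils/config.js for the real structure before relying on any of them.

```javascript
// Hypothetical defaults; the real keys in src/utils/config.js may differ.
const DEFAULT_CONFIG = {
  hotkey: ["RightCtrl", "RightShift"],
  mode: "transcribe", // or "command"
  ui: { tray: true, widget: false },
};

const VALID_MODES = ["transcribe", "command"];

// Merge user overrides onto the defaults and reject obviously bad values.
function loadConfig(overrides = {}) {
  const config = {
    ...DEFAULT_CONFIG,
    ...overrides,
    // Merge the nested ui object so a partial override keeps other defaults.
    ui: { ...DEFAULT_CONFIG.ui, ...(overrides.ui ?? {}) },
  };
  if (!VALID_MODES.includes(config.mode)) {
    throw new Error(`mode must be one of: ${VALID_MODES.join(", ")}`);
  }
  if (!Array.isArray(config.hotkey) || config.hotkey.length === 0) {
    throw new Error("hotkey must be a non-empty array of key names");
  }
  return config;
}

console.log(loadConfig({ mode: "command", ui: { widget: true } }));
```

Validating at load time means a typo in the mode name fails immediately at startup instead of surfacing later as a hotkey that appears to do nothing.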

Publishing & Deployment

The setup script supports multiple publishing options for sharing your Voice Agent:

GitHub Pages (Free)

Landing page served from docs/ directory
Automatic setup via setup script
URL: https://jdgafx.github.io/voice-agent

Other Options

Vercel: Free tier with global CDN
Netlify: Free hosting with continuous deployment
Firebase: Google's hosting platform (free tier available)

Publishing Setup

# Run setup script and choose publishing option
./setup.sh  # Select your preferred hosting platform

Web Interface

Visit the published landing page at: https://jdgafx.github.io/voice-agent

Features:

Modern responsive design
Installation instructions
Feature showcase
Demo video placeholder
Multi-platform support info

Verified Working Features ✅

Core Components

PuterClient: Successfully authenticates and handles transcription/chat
VoiceListener: Audio buffer creation and processing
CommandCenter: Command interpretation and execution
HotkeyManager: Global keyboard capture (Right-Ctrl + Right-Shift)
ClipboardManager: Cross-platform clipboard operations
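
To illustrate the kind of logic a component like HotkeyManager needs, the sketch below tracks which keys are currently held and fires a callback once when the full combination is down. It is not the project's implementation: `createComboTracker` and the key names are hypothetical plain strings, and a real version would translate key codes from the hotkey library's key-down and key-up events into calls like these.

```javascript
// Hypothetical sketch: track which keys are down and fire once when the
// full combo is held. Key names are plain strings chosen for this example.
function createComboTracker(comboKeys, onTrigger) {
  const required = new Set(comboKeys);
  const pressed = new Set();
  let active = false; // true while the combo is held, to avoid repeat firing

  return {
    keyDown(key) {
      if (!required.has(key)) return; // ignore keys outside the combo
      pressed.add(key);
      if (!active && pressed.size === required.size) {
        active = true;
        onTrigger();
      }
    },
    keyUp(key) {
      // Releasing any combo key re-arms the trigger.
      if (pressed.delete(key)) active = false;
    },
  };
}

let triggers = 0;
const tracker = createComboTracker(["RightCtrl", "RightShift"], () => {
  triggers += 1;
});
tracker.keyDown("RightCtrl");
tracker.keyDown("RightShift"); // combo complete: fires once
tracker.keyDown("RightShift"); // key auto-repeat: ignored
tracker.keyUp("RightShift");
tracker.keyDown("RightShift"); // pressed again: fires a second time
console.log(`triggered ${triggers} times`);
```

The `active` flag matters because operating systems typically send repeated key-down events while a key is held; without it, one long press could start several recordings.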

Integration Tests

End-to-end transcription pipeline: Audio → Puter.js → Text
Command interpretation: Natural language → Actionable commands
Chat functionality: AI conversation capabilities
Authentication: Token-based Puter.js access
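
Because command interpretation turns free-form speech into something that gets executed, it helps to validate the model's reply before acting on it. The following is a minimal sketch under stated assumptions: the model is asked to reply with JSON of the form `{"action": ..., "command": ...}`, and `ALLOWED_ACTIONS` and `parseInterpretation` are hypothetical names, not part of CommandCenter as shipped.

```javascript
// Hypothetical action names; the project's real command set is not shown here.
const ALLOWED_ACTIONS = new Set(["shell", "file", "claude"]);

// Turn a raw model reply into a validated action, or null if it is unusable.
// Assumes the model was asked to answer with JSON like
// {"action": "shell", "command": "ls -la"}.
function parseInterpretation(reply) {
  let parsed;
  try {
    parsed = JSON.parse(reply);
  } catch {
    return null; // not JSON: refuse rather than guess
  }
  if (parsed === null || typeof parsed !== "object") return null;
  const { action, command } = parsed;
  if (!ALLOWED_ACTIONS.has(action)) return null;
  if (typeof command !== "string" || command.trim() === "") return null;
  return { action, command: command.trim() };
}

console.log(parseInterpretation('{"action": "shell", "command": " ls -la "}'));
console.log(parseInterpretation("Sure! I will list your files."));
```

Returning `null` for anything unexpected keeps the failure mode conservative: an unparseable or unknown reply results in no action rather than a best-effort guess.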

Test Results

✅ Core verification tests passing
✅ Command center functionality verified
✅ Integration workflow successful
✅ Puter.js API connectivity confirmed

Development

Scripts

npm start - Start the voice agent
npm run widget - Launch Electron widget
npm run dev - Development mode with auto-restart
npm test - Run test suite

Testing

npm test

Tests include:

Core component verification
Integration tests
Command execution validation

Dependencies

@heyputer/puter.js - Puter cloud platform integration
mic - Microphone audio capture
uiohook-napi - Global hotkey detection
clipboardy - System clipboard access
eventemitter3 - Event handling
claude-flow - Claude CLI integration

Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests
Submit a pull request

License

MIT License - see LICENSE file for details

Recent Changes

✅ Complete setup script: Beautiful TUI with automated installation
✅ Multiple implementations: Node.js and Python versions available
✅ GitHub integration: Automatic repository sync and publishing
✅ Web landing page: Modern responsive design for GitHub Pages
✅ Multi-platform publishing: Support for Vercel, Netlify, Firebase
✅ Core functionality verified: Puter.js integration, transcription pipeline, command interpretation
✅ Component integration tested: VoiceListener, PuterClient, CommandCenter working together
✅ Global hotkeys implemented: Right-Ctrl + Right-Shift combination supported
✅ Test suite passing: Core verification, command center, integration tests successful
Complete voice agent implementation with Puter.js integration
Dual mode operation (transcribe/command)
Global hotkey support
System tray interface
Claude CLI integration for AI commands
