Give any AI coding agent a voice. This skill teaches your agent how to build fully hands-free, voice-driven browser interfaces using the native Web Speech API: zero dependencies, zero API keys, entirely client-side.
This is an Agent Skills package compatible with Claude Code, Gemini CLI, Google Antigravity, Cursor, and any other tool that follows the open SKILL.md standard.
When you ask your agent to "add voice control", "make this voice-activated", or "build a hands-free interface", this skill loads and provides the agent with:
- A complete command registry pattern for mapping speech → DOM actions
- A ready-to-use HTML/JS boilerplate with status UI and error handling
- All browser compatibility rules, permission requirements, and UX constraints
- Patterns for scrolling, tab control, search, form fill, click-by-label, and text-to-speech feedback
"Add voice control to this page"
"Make the dashboard hands-free"
"Build a voice-activated search"
"Let users scroll with voice commands"
"Create a voice command interface"
"Make it so I can click buttons by saying their name"
```
browser-voice-control/
├── SKILL.md                       ← Main instructions (required)
└── references/
    ├── speech-recognition-api.md  ← Full SpeechRecognition API reference
    └── locales.md                 ← BCP 47 language codes
```
| Voice command | Action |
|---|---|
| "Open a new tab" | window.open('about:blank', '_blank') |
| "Close this tab" | window.close() |
| "Scroll down / up" | window.scrollBy() |
| "Go to top / bottom" | window.scrollTo() |
| "Search for [query]" | Focuses search input or opens Google |
| "Click [element name]" | Fuzzy-matches and clicks any button/link |
| "Fill [field] with [value]" | Targets input by name/placeholder and fills it |
| "Go back / forward" | history.back() / history.forward() |
| "Refresh the page" | location.reload() |
| "Read the page" | SpeechSynthesis reads visible text aloud |
| "Stop listening" | Stops the recognition session |
| Browser | Support | Notes |
|---|---|---|
| Chrome 33+ | ✅ Full | Recommended |
| Edge 79+ | ✅ Full | Chromium-based |
| Safari 14.1+ | ⚠️ Partial | No continuous mode; single-shot only |
| Firefox | ❌ Not supported | Web Speech API not implemented |
| Mobile Chrome | ✅ Full | Works on Android |
| Mobile Safari | ⚠️ Partial | iOS 14.5+; requires user gesture |
| Samsung Internet | ⚠️ Partial | Based on Chromium |
| Opera | ✅ Full | Chromium-based |
Note: The skill always generates a fallback message when `SpeechRecognition` is undefined, so unsupported browsers degrade gracefully.
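Feature detection can be sketched as follows (illustrative, not the boilerplate's exact code; `webkitSpeechRecognition` is the prefixed name Chrome and Safari expose):

```javascript
// Resolve the constructor across prefixed and unprefixed names.
// In environments without the API (e.g. Firefox), both are undefined.
const scope = typeof window !== "undefined" ? window : globalThis;
const SpeechRecognitionCtor =
  scope.SpeechRecognition || scope.webkitSpeechRecognition;

function voiceControlSupported() {
  return typeof SpeechRecognitionCtor === "function";
}
```

When `voiceControlSupported()` returns `false`, show the fallback message instead of rendering the mic button.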
`SpeechRecognition` is blocked on plain `http://` pages in all browsers. The only exception is `localhost` during local development.
Set `recognition.lang` to any BCP 47 code. Leaving it empty (`""`) defaults to the browser's UI language.
| Code | Language |
|---|---|
| `en-US` | English (United States) |
| `en-GB` | English (United Kingdom) |
| `es-ES` | Spanish (Spain) |
| `es-MX` | Spanish (Mexico) |
| `fr-FR` | French (France) |
| `de-DE` | German (Germany) |
| `it-IT` | Italian (Italy) |
| `pt-BR` | Portuguese (Brazil) |
| `ja-JP` | Japanese |
| `ko-KR` | Korean |
| `zh-CN` | Chinese (Simplified) |
| `zh-TW` | Chinese (Traditional) |
| `ar-SA` | Arabic (Saudi Arabia) |
| `hi-IN` | Hindi (India) |
| `ru-RU` | Russian |
| `nl-NL` | Dutch |
| `pl-PL` | Polish |
| `sv-SE` | Swedish |
| `tr-TR` | Turkish |

See `references/locales.md` for the full list.
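Applying a locale is a single assignment; the sketch below (the helper name is illustrative, not part of the skill) also shows the two options commonly set alongside it:

```javascript
// Hypothetical helper: configure a recognizer for a BCP 47 locale.
// An empty string falls back to the browser's UI language.
function configureRecognizer(recognition, locale = "") {
  recognition.lang = locale;           // e.g. "pt-BR"
  recognition.continuous = true;       // keep listening (not supported on Safari)
  recognition.interimResults = false;  // deliver only final transcripts
  return recognition;
}
```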
- Download the `.skill` file from Releases, or clone this repo and zip the folder.
- In Claude.ai, go to Customize → Skills.
- Click the + button → Upload a skill.
- Upload the `.skill` zip file.
- Toggle the skill on; it will now activate automatically when relevant.
Skills require Code execution to be enabled. Go to Settings → Capabilities and turn it on if you haven't already. Available on Pro, Max, Team, and Enterprise plans.
Skills live in `~/.claude/skills/` (global, all projects) or `.claude/skills/` (project-local).
Option A: clone directly into your skills folder:

```bash
git clone https://github.com/YOUR_USERNAME/browser-voice-control \
  ~/.claude/skills/browser-voice-control
```

Option B: copy a single skill folder manually:

```bash
mkdir -p ~/.claude/skills/browser-voice-control
cp -r /path/to/browser-voice-control/. ~/.claude/skills/browser-voice-control/
```

Option C: use the Vercel skills CLI:

```bash
npx agent-skills-cli add YOUR_USERNAME/browser-voice-control
```

After installation, Claude Code discovers the skill automatically. You can also invoke it directly:

```
/browser-voice-control
```
Gemini CLI follows the same Agent Skills open standard. Place the skill in `~/.gemini/skills/` for global access, or `.gemini/skills/` inside your project.
```bash
# Global install (available in all projects)
mkdir -p ~/.gemini/skills/browser-voice-control
cp -r /path/to/browser-voice-control/. ~/.gemini/skills/browser-voice-control/

# Or project-local
mkdir -p .gemini/skills/browser-voice-control
cp -r /path/to/browser-voice-control/. .gemini/skills/browser-voice-control/
```

Gemini CLI will auto-detect the skill from the SKILL.md frontmatter. You can also invoke it directly in the chat:

```
/browser-voice-control
```
The skill works with all models available in Gemini CLI (Gemini 3 Pro, Gemini 3 Flash, etc.).
Google Antigravity is Google's agentic IDE. It supports the same SKILL.md format natively with two install scopes:
Workspace scope (project-specific; recommended for team projects):

```bash
mkdir -p .agent/skills/browser-voice-control
cp -r /path/to/browser-voice-control/. .agent/skills/browser-voice-control/
```

Global scope (available across all your Antigravity projects):

```bash
mkdir -p ~/.gemini/antigravity/skills/browser-voice-control
cp -r /path/to/browser-voice-control/. ~/.gemini/antigravity/skills/browser-voice-control/
```

After copying, restart your agent session so Antigravity re-scans the skills directory. The skill will then appear in the agent's toolkit and auto-activate when you ask for voice control features.
Unlike Rules (which are always on) and Workflows (which you trigger with `/`), Skills in Antigravity are agent-triggered: the model loads them automatically when your request matches the skill description.
Cursor 2.3+ supports Agent Skills natively via `.cursor/skills/`.
Project-local install:

```bash
mkdir -p .cursor/skills/browser-voice-control
cp -r /path/to/browser-voice-control/. .cursor/skills/browser-voice-control/
```

Global install (available in all your Cursor projects via MCP; see cursor-skills for setup):

```bash
mkdir -p ~/.cursor/skills/browser-voice-control
cp -r /path/to/browser-voice-control/. ~/.cursor/skills/browser-voice-control/
```

Or use the Cursor command palette:
- Open `Cmd+Shift+P` → Cursor Rules: Add Skill
- Point it at your local copy of this folder.
Cursor loads the skill dynamically when the agent determines it is relevant. You can also trigger it explicitly in the agent panel:
```
/browser-voice-control
```
- User gesture required: `recognition.start()` must be triggered by a click or keydown event. Autostart on page load is blocked by all browsers.
- Microphone permission: the browser will prompt on first use. Permission is remembered per origin.
- Always show a status indicator: users must be able to see when the microphone is active. The boilerplate includes a fixed status badge for this.
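The first and third constraints can be wired together. The sketch below is illustrative (the names are not the boilerplate's actual API): recognition starts only inside a click handler, and the badge stays in sync with the session.

```javascript
// Illustrative: start recognition only from a user gesture, and keep a
// visible status badge in sync with the recognition session.
function wireVoiceButton(button, recognition, badge) {
  button.addEventListener("click", () => {
    badge.textContent = "Listening...";
    recognition.start(); // legal here: we are inside a user gesture
  });
  recognition.addEventListener("end", () => {
    badge.textContent = "Idle";
  });
}
```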
| Error code | Cause | Fix |
|---|---|---|
| `not-allowed` | Mic permission denied | Prompt user to allow mic access |
| `no-speech` | Silence timeout | Restart recognition or notify user |
| `network` | Browser voice engine call failed | Retry; some engines require a network connection |
| `aborted` | `recognition.abort()` was called | Expected; no action needed |
| `audio-capture` | No microphone found | Check hardware; show error UI |
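In an `onerror` handler, the table above reduces to a lookup; a sketch (the message strings are illustrative, not the boilerplate's exact wording):

```javascript
// Map SpeechRecognition error codes to user-facing messages.
// null means the error is expected and should stay silent.
const ERROR_MESSAGES = {
  "not-allowed": "Microphone access was denied. Please allow it and retry.",
  "no-speech": "No speech detected. Try again.",
  "network": "The speech service could not be reached. Check your connection.",
  "aborted": null, // recognition.abort() was called on purpose
  "audio-capture": "No microphone was found.",
};

function describeRecognitionError(event) {
  const msg = ERROR_MESSAGES[event.error];
  return msg === undefined ? `Unexpected error: ${event.error}` : msg;
}
```

Wire it up with something like `recognition.onerror = (e) => { const m = describeRecognitionError(e); if (m) showBadgeError(m); };` where `showBadgeError` is whatever updates your status UI.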
The Web Speech API routes audio through the browser's built-in voice recognition engine. In Chrome and Edge, this typically means audio is processed on Google's servers (similar to Google Search voice input). Safari uses Apple's on-device engine. No data is sent to any third-party server by this skill, but browser vendor terms apply.
PRs welcome. If you add new command patterns, extend the locale list, or improve Safari compatibility, please update SKILL.md and the relevant reference file in references/.
MIT
