A Chrome extension that doesn't just automate your browser — it knows which AI tool is fastest for each job and uses it for you.
Every browser AI agent today — Claude for Chrome, OpenAI Operator, Manus, Do Browser — tries to do everything by itself. Want a presentation? The agent clicks through Google Slides one button at a time. Want to compare prices? It opens tabs sequentially and slowly reads each page. Want a background removed from an image? It tries to navigate Photoshop or Canva's complex UI.
This approach is fundamentally broken. LLMs are bad at navigating web pages. The best browser agent in the world scores only 10.4% on complex end-to-end workflows (CUB benchmark). Claude for Chrome's own developer community admits that "the information density of a web page is an order of magnitude lower than code or documents — where LLMs actually shine."
The result: agents that are slow, expensive, inaccurate, and frustrating.
ctrl doesn't try to be the AI. It orchestrates AIs.
When you say "make me a presentation," ctrl doesn't click through Slides for 10 minutes. It opens Gamma, writes the perfect prompt, and gets a professional deck in 30 seconds.
When you say "remove this background," it doesn't navigate Canva. It opens remove.bg and does it in 5 seconds.
When you say "research this topic," it doesn't Google and click links one by one. It opens Perplexity and gets a cited summary in 10 seconds.
When you say "fill this form," it doesn't need an external tool — it reads the form fields directly, matches them against your saved profile, asks you for anything missing, and fills everything in seconds.
ctrl knows 12+ specialized skills, each using the fastest possible method — whether that's a purpose-built AI tool or direct browser automation. You just talk, and it figures out the rest.
You speak (or type)
→ Gemini Live transcribes + understands intent in real-time
→ Orchestrator picks the right skill for the job
→ Skill executes using the fastest method:
• External AI tool (Gamma, Perplexity, remove.bg, etc.)
• Direct browser automation (forms, navigation, clicks)
• Parallel tab operations (price comparison, research)
→ Agent narrates progress via voice as you watch it work
→ Result presented — you approve, correct, or move on
The agentic loop for each skill runs up to 25 rounds per step:
observe (DOM + accessibility tree + screenshot)
→ plan next action (via LLM)
→ execute action (via Chrome DevTools Protocol)
→ verify result (screenshot → vision model)
→ repeat until step is complete
ctrl uses the same approach as Claude for Chrome for browser understanding — the Accessibility Tree (AX tree), not raw DOM or screenshots:
- AX tree = ~800 tokens per page vs 10,000+ for full DOM. 12x cheaper, 12x more accurate.
- Chrome DevTools Protocol (CDP) for action execution — native mouse/keyboard events that work on React apps, SPAs, and every modern web app.
- Screenshot verification after every action — a vision model confirms the action worked before proceeding.
- Shared login state — works with your existing sessions. No re-authentication needed.
But unlike Claude for Chrome, ctrl doesn't try to do complex creative tasks via raw DOM clicks. It delegates to the right tool and achieves in 30 seconds what other agents fail to do in 10 minutes.
- Powered by Gemini Live — full duplex, real-time, interruptible voice conversation.
- Speak in Hindi, English, or Hinglish. Change your mind mid-sentence. The agent adapts.
- Text input also available in the side panel.
- 12+ built-in skills, each using the optimal method for its task type.
- Automatically routes your request to the right skill — you never need to specify which tool to use.
- Community skills can be installed directly from the side panel.
- Opens tabs, fills forms, clicks buttons, scrolls, navigates — all via CDP.
- Parallel tab operations for tasks like price comparison.
- Works with your existing logins (Gmail, Amazon, Notion — anything you're signed into).
- Save your name, email, phone, address, and other details once.
- ctrl uses this to auto-fill forms without asking every time.
- Missing fields are requested interactively via voice — then saved for next time.
- Permission modes: strict (ask before every action), smart (ask for sensitive actions only), or auto (full autonomy for trusted sites).
- Abort button stops any running task immediately.
- Submit confirmation required before any form is submitted.
Each skill is a self-contained module that handles a specific type of task. Skills can use external AI tools, direct browser automation, or a combination.
| Skill | What It Does | Method |
|---|---|---|
ppt-gamma |
Creates presentations via Gamma.app | External AI tool |
design-stitch |
Designs landing pages and UI via Stitch | External AI tool |
research-perplexity |
Deep research with citations via Perplexity | External AI tool |
youtube-summarize |
Summarizes the current YouTube video | Transcript extraction + LLM |
price-compare |
Compares prices across shopping sites | Parallel tabs + AX tree |
shopping |
Searches and browses products on any site | Direct browser automation |
booking |
Books restaurants, hotels, flights | Direct browser automation |
email |
Drafts, reads, and sends emails in Gmail | Direct browser automation |
media |
Plays music, podcasts, videos | Direct browser automation |
research |
In-page scraping and summarization | AX tree + LLM |
tab-manager |
Opens, closes, and switches tabs | Chrome APIs |
form-fill |
Fills any web form using saved profile + voice Q&A | Direct browser automation |
Skills are modular. To add a new skill:
- Create a new file in
extension/skills/(e.g.,my-skill.js) - Export an object with
name,description,triggers,canHandle(), andexecute() - Register it in
extension/skills/index.js
Community skills can also be installed at runtime via URL from the side panel.
extension/
manifest.json Chrome MV3 manifest
background.js Service worker — orchestrator, skill router, CDP, API calls
sidepanel.html/.js Side panel UI — voice controls, status, settings, skills manager
content.js Content script — DOM snapshots, action execution, permission banners
offscreen.html/.js Offscreen document — microphone capture, PCM16 streaming
audio-processor.js AudioWorklet — float32 → PCM16 conversion
audio-capture.js Mic capture bootstrap
skills/
index.js Skill registry (built-in + community)
form-fill.js Generic form-fill skill
ppt-gamma.js Presentation skill (via Gamma)
design-stitch.js Design skill (via Stitch)
youtube-summarize.js YouTube summarization skill
research-perplexity.js Deep research skill (via Perplexity)
research.js In-page research skill
price-compare.js Price comparison skill
shopping.js Shopping skill
booking.js Booking skill
email.js Email skill
media.js Media playback skill
tab-manager.js Tab management skill
storage/
user-profile.js User profile CRUD helpers
┌──────────────────────────────────────────────────────────┐
│ Chrome + ctrl Extension │
│ │
│ ┌─────────────┐ ┌────────────────┐ ┌──────────────┐ │
│ │ Side Panel │ │ Background │ │ Content │ │
│ │ (voice UI, │◄►│ (orchestrator │◄►│ Script │ │
│ │ settings) │ │ skill router │ │ (DOM + acts)│ │
│ └─────────────┘ │ CDP control) │ └──────┬───────┘ │
│ └───────┬────────┘ │ │
│ ┌─────────────┐ │ ┌──────▼───────┐ │
│ │ Offscreen │──audio──►│ │ Active Tab │ │
│ │ (mic input) │ │ │ (your page) │ │
│ └─────────────┘ │ └──────────────┘ │
│ │ │
└────────────────────────────┼─────────────────────────────┘
│
┌──────────────┼──────────────┐
│ │ │
┌───────▼──────┐ ┌────▼─────┐ ┌──────▼───────┐
│ Gemini Live │ │ Groq │ │ OpenRouter │
│ (voice + │ │ (intent │ │ (action │
│ intent) │ │ routing │ │ planning) │
└──────────────┘ │ + draft)│ └──────────────┘
└──────────┘
| Role | Model | Via |
|---|---|---|
| Voice conversation + intent detection | gemini-2.5-flash-native-audio |
Gemini Live WebSocket |
| Orchestration + content drafting | llama-4-scout / llama-4-maverick |
Groq REST API |
| Per-round action planning | Configurable (default: best available) | OpenRouter REST API |
| Screenshot verification | llama-4-scout (multimodal) |
Groq REST API |
- Chrome (or Chromium) with Side Panel API support
- Gemini API key — aistudio.google.com
- Groq API key — groq.com
- OpenRouter API key — openrouter.ai
git clone https://github.com/<your-org>/ctrl.git
cd ctrlThere is no build step. The extension loads directly.
- Open
chrome://extensions - Enable Developer mode
- Click Load unpacked → select the
extension/folder - Click the ctrl icon to open the side panel
Open ctrl's side panel → Settings:
- Enter your Gemini, Groq, and OpenRouter API keys
- The status indicator turns green when Gemini Live connects
- (Optional) Fill in your user profile for automatic form filling
Just talk to it:
| What you say | What ctrl does |
|---|---|
| "Make a 10-slide presentation on climate change" | Opens Gamma → writes prompt → generates deck → asks for corrections |
| "Compare iPhone 16 prices on Amazon and Flipkart" | Opens both sites in parallel → extracts prices → tells you the best deal |
| "Fill out this job application" | Reads form fields → fills from your profile → asks for missing info → fills the rest |
| "Summarize this YouTube video" | Extracts transcript → summarizes via LLM → reads summary aloud |
| "Research the best laptops under 80000" | Opens Perplexity → writes query → reads cited results |
| "Remove the background from this image" | Opens remove.bg → uploads → downloads clean result |
| "Book a table for 2 at an Italian place in Delhi" | Searches restaurants → navigates booking flow → confirms with you |
| "Draft a follow-up email to the last meeting" | Opens Gmail → drafts email based on context → waits for your approval |
ctrl captures microphone audio only when the mic button is active. It captures DOM snapshots and screenshots only for the active tab during task execution. All data is sent over HTTPS to Gemini, Groq, and OpenRouter.
ctrl does not:
- Inspect tabs you're not actively working in
- Submit forms without your explicit confirmation
- Store audio or screenshots beyond the current session
- Send data to any servers other than the configured AI APIs
- Create
extension/skills/your-skill.js - Follow the pattern from existing skills (see
form-fill.jsas a template) - Register in
extension/skills/index.js - Submit a PR
Open an issue with:
- What you said (voice command)
- What happened vs what you expected
- The site you were on (if relevant)
- Console errors (if any)
MIT
ctrl — Stop clicking. Start talking.