A powerful SDK for Chrome extensions to interact with browsers and web pages. Simplifies browser automation, web scraping, and page interaction tasks.
- 🎯 CDP-Powered - Direct Chrome DevTools Protocol integration for fine-grained browser control
- 🔒 Stable & Reliable - CDP provides deterministic, low-level operations without DOM injection
- 🚀 Easy to Use - Simple, intuitive API design
- 🌐 Comprehensive - Browser control, page interactions, content extraction
- 🎨 Flexible - Support for keyboard, mouse, and complex interactions
- 📦 Lightweight - Minimal dependencies, no external browser binaries required
Unlike traditional browser automation tools (Puppeteer, Playwright, Selenium), Browser AgentKit is specifically designed for Chrome extension development:
| Feature | Browser AgentKit | Puppeteer/Playwright |
|---|---|---|
| Runtime | Inside Chrome extension | External Node.js process |
| Browser Instance | Uses user's existing browser | Launches separate browser |
| User Session | Access to logged-in sessions & cookies | Isolated browser context |
| Installation | npm package only | Requires browser binary download |
| Use Case | Extension-based automation | Testing & scraping servers |
Browser AgentKit leverages the Chrome DevTools Protocol (CDP) directly, the same protocol that powers Chrome DevTools, Puppeteer, and Playwright internally:
- Fine-grained Control - Direct access to browser internals: DOM, network, input, runtime
- No Script Injection - Input simulation happens at the browser level, not via JavaScript injection
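- Deterministic Operations - Each CDP command is executed by the browser itself and returns an explicit result or error, independent of page scripts
- Anti-Detection Friendly - Input dispatched through CDP produces trusted browser events, which are hard to distinguish from real user actions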
- Full Debugging Capabilities - Access to the same powerful APIs used by Chrome DevTools
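
To make the "No Script Injection" point concrete, here is a minimal sketch of a click expressed as raw CDP `Input.dispatchMouseEvent` commands. The `CdpCommand` and `CdpSend` types and the `buildClickCommands` / `dispatchClick` helpers are hypothetical names used for illustration only, not part of Browser AgentKit's API. In an extension, `send` would typically wrap `chrome.debugger.sendCommand` after a `chrome.debugger.attach` call.

```typescript
// A CDP command: a method name plus its parameters.
interface CdpCommand {
  method: string;
  params: Record<string, unknown>;
}

// Hypothetical transport. In an extension this could wrap
// chrome.debugger.sendCommand({ tabId }, method, params).
type CdpSend = (method: string, params: Record<string, unknown>) => Promise<unknown>;

// Build the move / press / release sequence for a left click at (x, y).
function buildClickCommands(x: number, y: number): CdpCommand[] {
  const base = { x, y, button: 'left', clickCount: 1 };
  return [
    { method: 'Input.dispatchMouseEvent', params: { type: 'mouseMoved', x, y } },
    { method: 'Input.dispatchMouseEvent', params: { ...base, type: 'mousePressed' } },
    { method: 'Input.dispatchMouseEvent', params: { ...base, type: 'mouseReleased' } },
  ];
}

// Send the commands in order, waiting for each response before the next.
async function dispatchClick(send: CdpSend, x: number, y: number): Promise<void> {
  for (const { method, params } of buildClickCommands(x, y)) {
    await send(method, params);
  }
}
```

Because the events are dispatched by the browser rather than created in page JavaScript, no content script has to be injected for the click to happen.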
✅ Use Browser AgentKit when you need to:
- Build Chrome extensions with automation capabilities
- Access the user's authenticated sessions (no re-login required)
- Operate within the user's existing browser environment
- Ship a lightweight SDK without bundled browser binaries
❌ Use Puppeteer/Playwright instead when you need:
- Server-side web scraping or testing
- Headless browser automation in CI/CD pipelines
- Cross-browser testing (Firefox, Safari, etc.)
- Isolated browser contexts for parallel execution
npm install browser-agentkit

import { browser, Page } from 'browser-agentkit';
// Open a new tab
const tab = await browser.openTab('https://example.com');
// Create a page instance
const page = new Page(tab.id);
await page.initialize();
// Interact with the page
await page.fill('#search', 'browser automation');
await page.click('#submit');
// Extract content
const extractor = page.getContentExtractor();
const html = await extractor.getHTML();
const metadata = await extractor.getMetadata();
console.log('Page title:', metadata.title);
// Clean up
await page.close();
await browser.closeTab(tab.id);

Manage browser tabs and windows:
import { browser } from 'browser-agentkit';
// Open and manage tabs
const tab = await browser.openTab('https://example.com');
const currentTab = await browser.getCurrentTab();
const allTabs = await browser.queryTabs({ currentWindow: true });
// Navigation
await browser.navigate(tab.id, 'https://google.com');
await browser.goBack(tab.id);
await browser.reload(tab.id);
// Screenshots
const screenshot = await browser.captureVisibleTab(tab.id);

Interact with web pages:
import { Page, ScrollDirection } from 'browser-agentkit';
const page = new Page(tabId);
await page.initialize();
// Click elements
await page.click('#button');
// Fill forms
await page.fill('#email', 'user@example.com');
await page.fill('#password', 'secret');
// Scroll
await page.scroll(ScrollDirection.DOWN);
// Wait for elements
await page.waitForSelector('#result');
// Advanced interactions
const keyboard = await page.getKeyboard();
await keyboard.press('Enter');
const mouse = await page.getMouse();
await mouse.click(100, 200);

Extract data from web pages:
import { ContentExtractor } from 'browser-agentkit';
const extractor = new ContentExtractor(tabId);
// Get page content
const html = await extractor.getHTML();
const text = await extractor.getText();
const metadata = await extractor.getMetadata();
// Get element content
const result = await extractor.getElementContent('#article');
console.log(result.text);
// Check if PDF
const isPdf = await extractor.isPDF();

Simulate keyboard and mouse input:
import { createKeyboard, createMouse } from 'browser-agentkit';
const keyboard = await createKeyboard({ tabId });
// Type text
await keyboard.type('Hello World');
// Press keys
await keyboard.press('Enter');
await keyboard.press('ControlOrMeta+KeyA'); // Ctrl/Cmd + A
// Mouse operations
const mouse = createMouse({ tabId }, keyboard);
await mouse.move(100, 200);
await mouse.click(100, 200);
await mouse.dblclick(150, 250);

Listen to browser events:
import { events } from 'browser-agentkit';
// Tab events
events.onTabCreated((tab) => {
  console.log('New tab created:', tab.url);
});
events.onTabUpdated((tabId, changeInfo, tab) => {
  if (changeInfo.status === 'complete') {
    console.log('Tab loaded:', tab.url);
  }
});
// Network events
events.onBeforeRequest((details) => {
  console.log('Request:', details.url);
});
events.onCompleted((details) => {
  console.log('Response:', details.url);
});

import { browser, Page } from 'browser-agentkit';
async function fillLoginForm() {
  const tab = await browser.openTab('https://example.com/login');
  const page = new Page(tab.id);
  await page.initialize();
  await page.fill('#username', 'myuser');
  await page.fill('#password', 'mypass');
  await page.click('#login-button');
  await page.waitForNavigation();
  console.log('Login successful!');
  await page.close();
}

import { browser, Page } from 'browser-agentkit';
async function scrapeData() {
  const tab = await browser.openTab('https://example.com/data');
  const page = new Page(tab.id);
  await page.initialize();
  const extractor = page.getContentExtractor();
  const metadata = await extractor.getMetadata();
  const content = await extractor.getElementContent('.data-container');
  const data = {
    title: metadata.title,
    description: metadata.description,
    content: content.text
  };
  await page.close();
  await browser.closeTab(tab.id);
  return data;
}

import { Page } from 'browser-agentkit';
async function dragAndDrop(tabId: number) {
  const page = new Page(tabId);
  await page.initialize();
  const mouse = await page.getMouse();
  // Drag from (100, 100) to (300, 300)
  await mouse.move(100, 100);
  await mouse.down();
  await mouse.move(300, 300, { steps: 10 }); // Smooth movement
  await mouse.up();
  await page.close();
}

Main class for browser-level operations.
| Method | Description |
|---|---|
| openTab(url, options?) | Opens a new tab |
| getTab(tabId) | Gets tab information |
| getCurrentTab(windowId?) | Gets the active tab |
| queryTabs(queryInfo) | Queries tabs |
| searchHistory(query) | Searches browser history |
| closeTab(tabId) | Closes a tab |
| navigate(tabId, url) | Navigates to URL |
| goBack(tabId) | Goes back in history |
| goForward(tabId) | Goes forward in history |
| reload(tabId, bypassCache?) | Reloads a tab |
| captureVisibleTab(tabId, options?) | Captures visible area screenshot |
| captureFullPage(tabId) | Captures full page screenshot |
| createHiddenTab(url) | Creates a hidden tab for background processing |
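
Since every `openTab` call should eventually be paired with `closeTab`, a small wrapper can guarantee the cleanup. The sketch below is illustrative only: `TabApi` and `withTab` are hypothetical names, and the interface merely mirrors the two method names from the table above so the example stays self-contained.

```typescript
// Minimal shape of the tab methods used here (assumed to match the table above).
interface TabApi {
  openTab(url: string): Promise<{ id?: number }>;
  closeTab(tabId: number): Promise<void>;
}

// Open a tab, run `task` with its id, and close the tab even if `task` throws.
async function withTab<T>(
  api: TabApi,
  url: string,
  task: (tabId: number) => Promise<T>,
): Promise<T> {
  const tab = await api.openTab(url);
  const tabId = tab.id;
  if (tabId === undefined) {
    throw new Error(`Tab for ${url} has no id`);
  }
  try {
    return await task(tabId);
  } finally {
    await api.closeTab(tabId);
  }
}
```

If the exported `browser` instance satisfies this shape, it could be passed as `api`, for example `withTab(browser, 'https://example.com', async (tabId) => { /* ... */ })`.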
Main class for page-level operations.
| Method | Description |
|---|---|
| initialize() | Initializes the page (must be called before other methods) |
| close() | Cleans up resources |
| click(selector) | Clicks an element. The selector can be any CSS selector or a string in the format node=id, e.g. "node=123". |
| fill(selector, text) | Fills an input field. The selector can be any CSS selector or a string in the format node=id, e.g. "node=123". |
| scroll(direction, amount?) | Scrolls the page |
| navigate(url) | Navigates to URL |
| waitForNavigation(options?) | Waits for navigation to complete |
| waitForSelector(selector, options?) | Waits for element to appear |
| element(selector) | Creates PageElement instance |
| getKeyboard() | Gets Keyboard instance |
| getMouse() | Gets Mouse instance |
| getContentExtractor() | Gets ContentExtractor instance |
| evaluate(fn, ...args) | Executes script in page context |
| captureVisible(tabId, options?) | Captures using debugger API |
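
The `click` and `fill` rows above accept two selector forms. As a rough illustration of how the two can be told apart, here is a minimal sketch. `ParsedSelector` and `parseSelector` are hypothetical names, and the library's actual parsing rules may differ.

```typescript
// Result of classifying a selector string.
type ParsedSelector =
  | { kind: 'node'; nodeId: number }
  | { kind: 'css'; selector: string };

// Treat "node=<digits>" as a node reference; everything else as a CSS selector.
function parseSelector(selector: string): ParsedSelector {
  const match = /^node=(\d+)$/.exec(selector.trim());
  if (match) {
    return { kind: 'node', nodeId: Number(match[1]) };
  }
  return { kind: 'css', selector };
}
```

With this sketch, `parseSelector('node=123')` is classified as a node reference with id 123, while `parseSelector('#submit')` is passed through as a CSS selector.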
Standalone class for page actions (used internally by Page).
| Method | Description |
|---|---|
| click(selector) | Clicks an element |
| fill(selector, value) | Fills an input field |
| search(selector, value) | Fills input and presses Enter |
| scroll(direction, amount?) | Scrolls the page |
| waitForSelector(selector, options?) | Waits for element to appear |
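
The `search` row describes a composite action: fill, then Enter. A minimal sketch of that composition follows. The `Fillable` and `KeyPresser` interfaces and the `searchSketch` function are hypothetical stand-ins that reuse the documented `fill` and `press` method names; this is not the library's implementation.

```typescript
// Hypothetical minimal interfaces mirroring the documented method names.
interface Fillable {
  fill(selector: string, value: string): Promise<void>;
}
interface KeyPresser {
  press(key: string): Promise<void>;
}

// Fill the input first, then submit it by pressing Enter.
async function searchSketch(
  page: Fillable,
  keyboard: KeyPresser,
  selector: string,
  value: string,
): Promise<void> {
  await page.fill(selector, value);
  await keyboard.press('Enter');
}
```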
Handles page navigation.
| Method | Description |
|---|---|
| navigate(url) | Navigates to URL |
| waitForNavigation(options?) | Waits for navigation to complete |
| waitForCondition(fn, options?) | Waits for a condition to be true |
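
Waiting for a condition usually comes down to polling with a timeout. The sketch below shows that general pattern. `pollUntil`, `PollOptions`, `timeoutMs`, and `intervalMs` are assumed names chosen for this example, not the library's actual option names.

```typescript
interface PollOptions {
  timeoutMs?: number;  // give up after this long (assumed name)
  intervalMs?: number; // delay between checks (assumed name)
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll `condition` until it returns true, or reject once the timeout elapses.
async function pollUntil(
  condition: () => boolean | Promise<boolean>,
  { timeoutMs = 5000, intervalMs = 100 }: PollOptions = {},
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await condition()) return;
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs} ms`);
    }
    await sleep(intervalMs);
  }
}
```

The condition is checked once before any delay, so an already-true condition resolves immediately.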
Represents a DOM element for interaction.
| Method | Description |
|---|---|
| waitForExist(options?) | Waits for element to exist |
| exists() | Checks if element exists |
| getBoundingBox() | Gets element's bounding box |
| getText() | Gets element's text content |
| getAttributeValue(name) | Gets attribute value |
| scrollIntoView() | Scrolls element into view |
| findElementNodeIds() | Finds element and returns node IDs |
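
A bounding box is typically the bridge between an element and coordinate-based mouse input. The sketch below computes the center point of a box using the `BoundingBox` shape from the type definitions in this document. `centerOf` is a hypothetical helper, not part of the SDK.

```typescript
// Same shape as the BoundingBox type documented in this README.
interface BoundingBox {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Center point of the box, suitable as a click target.
function centerOf(box: BoundingBox): { x: number; y: number } {
  return { x: box.x + box.width / 2, y: box.y + box.height / 2 };
}
```

Assuming `getBoundingBox()` resolves to such a box, its center could then be passed to `mouse.click(x, y)`.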
Extracts content from web pages.
| Method | Description |
|---|---|
| getPageSnapshot(options?) | Gets page snapshot with HTML and metadata |
| getHTML(options?) | Gets structured HTML content (options: { viewportOnly?: boolean }). Every interactive DOM element has a unique "node" attribute, e.g. `...<button onclick="..." node="1234"></button>...`. |
| getText() | Gets plain text content |
| getMetadata() | Gets page metadata (title, description, etc.) |
| getElementContent(selector) | Gets element's HTML and text content |
| isPDF() | Checks if current page is a PDF |
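
Because `getHTML()` tags interactive elements with a `node` attribute, the returned markup can be turned into `node=id` selectors for `click` and `fill`. The following is a minimal sketch under that assumption. `listNodeSelectors` is a hypothetical helper, and the regular expression is a simplification that only handles double-quoted numeric values.

```typescript
// Collect "node=<id>" selectors for every element carrying a node attribute.
function listNodeSelectors(html: string): string[] {
  const selectors: string[] = [];
  const pattern = /\snode="(\d+)"/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    selectors.push(`node=${match[1]}`);
  }
  return selectors;
}
```

For robust processing, a real HTML parser would be a safer choice than a regular expression.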
Simulates keyboard input.
| Method | Description |
|---|---|
| down(key) | Presses a key down |
| up(key) | Releases a key |
| press(key, options?) | Presses and releases a key |
| type(text, options?) | Types a sequence of characters |
| insertText(text) | Inserts text directly |
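
Key combinations such as `ControlOrMeta+KeyA` can be understood as a sequence of `down` calls followed by `up` calls in reverse order. The sketch below illustrates that idea. `resolveShortcut`, `pressCombo`, and `KeyDevice` are hypothetical names, and the library's own handling of `press` may differ.

```typescript
// Hypothetical minimal interface reusing the documented down/up method names.
interface KeyDevice {
  down(key: string): Promise<void>;
  up(key: string): Promise<void>;
}

// Split a combo into key names, mapping the ControlOrMeta alias
// to Meta on macOS and Control elsewhere.
function resolveShortcut(combo: string, isMac: boolean): string[] {
  return combo
    .split('+')
    .map((key) => (key === 'ControlOrMeta' ? (isMac ? 'Meta' : 'Control') : key));
}

// Hold the keys down in order, then release them in reverse order.
async function pressCombo(keyboard: KeyDevice, combo: string, isMac: boolean): Promise<void> {
  const keys = resolveShortcut(combo, isMac);
  for (const key of keys) {
    await keyboard.down(key);
  }
  for (const key of keys.slice().reverse()) {
    await keyboard.up(key);
  }
}
```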
Simulates mouse input.
| Method | Description |
|---|---|
| move(x, y, options?) | Moves mouse to position |
| down(options?) | Presses mouse button down |
| up(options?) | Releases mouse button |
| click(x, y, options?) | Clicks at position |
| dblclick(x, y, options?) | Double-clicks at position |
| wheel(deltaX, deltaY) | Scrolls using mouse wheel |
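
A `steps` option on `move`, as used in the drag-and-drop example, suggests that the pointer travels through intermediate positions. One plausible way to compute them is linear interpolation, sketched below. `Point` and `interpolatePath` are hypothetical names, and the library's actual path calculation is not documented here.

```typescript
interface Point {
  x: number;
  y: number;
}

// Return `steps` evenly spaced points, ending exactly at `to`.
// The starting point itself is not included.
function interpolatePath(from: Point, to: Point, steps: number): Point[] {
  const count = Math.max(1, Math.floor(steps));
  const points: Point[] = [];
  for (let i = 1; i <= count; i++) {
    const t = i / count;
    points.push({
      x: from.x + (to.x - from.x) * t,
      y: from.y + (to.y - from.y) * t,
    });
  }
  return points;
}
```

Each returned point would correspond to one intermediate mouse-move event between the press and the release.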
Manages browser event listeners.
| Method | Description |
|---|---|
| on(eventName, callback) | Registers custom event listener |
| off(eventName, callback) | Removes custom event listener |
| emit(eventName, ...args) | Emits custom event |
| onTabCreated(callback) | Listens for tab created events |
| onTabRemoved(callback) | Listens for tab removed events |
| onTabUpdated(callback) | Listens for tab updated events |
| onTabActivated(callback) | Listens for tab activated events |
| onBeforeRequest(callback, filter?) | Listens for network requests |
| onCompleted(callback, filter?) | Listens for completed requests |
| cleanup() | Removes all listeners |
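
The `on` / `off` / `emit` / `cleanup` rows follow the familiar event-emitter pattern. For readers new to it, here is a minimal standalone sketch of those semantics. `TinyEmitter` is a hypothetical class written for this example and says nothing about how the SDK's event manager is implemented.

```typescript
type Listener = (...args: unknown[]) => void;

class TinyEmitter {
  private listeners = new Map<string, Set<Listener>>();

  // Register a callback for a named event.
  on(eventName: string, callback: Listener): void {
    let set = this.listeners.get(eventName);
    if (!set) {
      set = new Set<Listener>();
      this.listeners.set(eventName, set);
    }
    set.add(callback);
  }

  // Remove a previously registered callback.
  off(eventName: string, callback: Listener): void {
    const set = this.listeners.get(eventName);
    if (set) set.delete(callback);
  }

  // Call every listener registered for the event.
  emit(eventName: string, ...args: unknown[]): void {
    const set = this.listeners.get(eventName);
    if (!set) return;
    // Iterate over a copy so listeners may unsubscribe while being called.
    for (const callback of Array.from(set)) {
      callback(...args);
    }
  }

  // Drop all listeners for all events.
  cleanup(): void {
    this.listeners.clear();
  }
}
```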
Low-level tab management (used internally by Browser).
| Method | Description |
|---|---|
| create(url, options?) | Creates a new tab |
| get(tabId) | Gets tab by ID |
| getActive(windowId?) | Gets active tab |
| query(queryInfo) | Queries tabs |
| close(tabId) | Closes a tab |
| update(tabId, properties) | Updates tab properties |
| navigate(tabId, url) | Navigates to URL |
| goBack(tabId) | Goes back in history |
| goForward(tabId) | Goes forward in history |
| reload(tabId, bypassCache?) | Reloads tab |
| createHidden(url) | Creates hidden tab |
enum ScrollDirection {
  UP = 'UP',
  DOWN = 'DOWN',
}

// Tab information
type TabInfo = chrome.tabs.Tab
// Tab context for operations
interface TabContext {
  tabId: number
  mainTabId?: number
  url?: string
  title?: string
}
// Page content snapshot
interface PageContent {
  html: string
  markdown?: string
  meta?: PageMetadata
  isPdf?: boolean
  resources?: unknown[]
}
// Page metadata
interface PageMetadata {
  title?: string
  description?: string
  url?: string
  image?: string
  [key: string]: string | undefined
}
// Screenshot options
interface ScreenshotOptions {
  format?: 'png' | 'jpeg'
  quality?: number // 0-100, jpeg only
}
// Element bounding box
interface BoundingBox {
  x: number
  y: number
  width: number
  height: number
}
// Viewport information
interface ViewportInfo {
  width: number
  height: number
  x: number
  y: number
}

| Function | Description |
|---|---|
| createKeyboard(debuggee) | Creates Keyboard instance with platform detection |
| createMouse(tabContext, keyboard) | Creates Mouse instance |
import { browser, events } from 'browser-agentkit';
// Pre-initialized Browser instance
browser.openTab('https://example.com');
// Pre-initialized EventManager instance
events.onTabCreated((tab) => console.log(tab));

- Chrome/Chromium browser
- Chrome extension with appropriate permissions:
  - tabs
  - debugger
  - scripting
  - activeTab
  - history
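
As an illustration of how these permissions might be declared, the sketch below models a Manifest V3 manifest as a TypeScript object and checks it against the list above. The extension name and version are placeholders, and `REQUIRED_PERMISSIONS`, `manifest`, and `missingPermissions` are hypothetical names. Manifest V3 is an assumption here, made because the `scripting` permission belongs to it.

```typescript
// The permissions listed in the requirements above.
const REQUIRED_PERMISSIONS = ['tabs', 'debugger', 'scripting', 'activeTab', 'history'] as const;

// Example manifest content; name and version are placeholders.
const manifest = {
  manifest_version: 3,
  name: 'My Automation Extension',
  version: '0.1.0',
  permissions: [...REQUIRED_PERMISSIONS] as string[],
};

// Return the required permissions that a manifest does not declare.
function missingPermissions(declared: readonly string[]): string[] {
  return REQUIRED_PERMISSIONS.filter((permission) => declared.indexOf(permission) === -1);
}
```

Serializing the object with `JSON.stringify(manifest, null, 2)` yields content suitable for a `manifest.json` file, and `missingPermissions` can serve as a quick build-time check.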
MIT