OpenBrowser

License: MIT · Python 3.12+ · MCP · Remote API

OpenBrowser is browser infrastructure for AI agents: isolated Chrome sessions, persistent profiles, proxy-aware identities, human login handoff, a remote API, and MCP tools.

It lets Claude, Codex, Cursor, browser-use, OpenBrowser-style agents, and custom workers share real Chrome browsers without fighting over one CDP port. Agents lease a browser, use a named profile when account state is needed, route selected identities through proxies, hand login challenges to a human, and leave behind telemetry plus issue reports that can be audited later.

OpenBrowser architecture

flowchart LR
  Agent["Agent\nClaude / Codex / Cursor / worker"] --> Client["MCP or API client"]
  Client --> Broker["OpenBrowser Broker"]
  Broker --> Lease["Lease manager"]
  Broker --> Profiles["Profile + identity manager"]
  Broker --> Proxy["Proxy router"]
  Broker --> Observability["Telemetry + feedback + audit"]
  Lease --> Pool["Chrome pool"]
  Profiles --> Pool
  Proxy --> Pool
  Pool --> Slots["pool-a ... pool-h\nisolated CDP sessions"]

Why

Most browser agents break in the same ways:

  • several agents connect to the same Chrome instance and block each other
  • logged-in sessions are tied to one fragile browser profile
  • passwords and 2FA prompts become unsafe chat messages
  • rich-text apps such as Slack, Discord, Notion, Linear, and X ignore DOM fill calls
  • failures vanish into logs, so the next agent repeats the same mistake

OpenBrowser gives agents one operating contract: lease, act, release, report.

Features

  • Browser pool: multiple isolated Chrome slots with CDP endpoints managed behind one broker.
  • Persistent profiles: named identities reuse Chrome profile directories and session cookies.
  • Profile replicas: selected identities can run in parallel without Chrome profile-lock conflicts.
  • Proxy routing: identities can pin traffic to an HTTP/SOCKS proxy via proxy_ref.
  • Remote API: bearer-token protected /openbrowser/v1 API for agents on any machine.
  • MCP servers: local MCP for same-host agents and remote MCP for HTTPS-backed access.
  • Human auth handoff: one-time portal links for login, 2FA, passkeys, and manual challenges.
  • Active lease control: short-lived manual control links for a browser tab already held by an agent.
  • Rich-text keyboard tools: real keyboard events for editors that reject simple DOM value changes.
  • Telemetry and issues: sanitized events, feedback issue tracking, and usage audits.
  • browser-use and OpenBrowser adapters: wrappers lease a slot, run the tool, then release the slot.

Use Cases

| Use case | OpenBrowser gives you |
| --- | --- |
| Remote browser automation | HTTPS API and remote MCP for agents running on other machines. |
| Logged-in workflows | Named Chrome identities with persisted profile state. |
| Multi-agent work | Lease isolation so parallel agents do not steal each other's tabs. |
| Account-specific routing | Per-identity proxy refs, locale, timezone, and profile policy. |
| Human-in-the-loop auth | One-time portal links for passwords, 2FA, passkeys, and manual checks. |
| Debuggable automation | Sanitized telemetry, native feedback issues, and session-log audits. |

Architecture

sequenceDiagram
  participant A as Agent
  participant B as OpenBrowser Broker
  participant C as Chrome slot
  participant H as Human auth portal
  A->>B: lease(owner, identity_id?)
  B->>C: reserve isolated Chrome session
  B-->>A: lease_id
  A->>B: navigate / click / type / screenshot
  B->>C: browser action over CDP
  C-->>B: page state
  B-->>A: snapshot or result
  alt login or challenge required
    A->>B: auth_request or lease_control_request
    B-->>H: one-time portal URL
    H->>C: human completes login
  end
  A->>B: release(lease_id)
  B->>B: telemetry + feedback + audit trail

Quick Start

Python

git clone https://github.com/floomhq/openbrowser.git
cd openbrowser
python3 -m venv .venv
. .venv/bin/activate
pip install -e .
playwright install chromium
cp .env.example .env
cp config/identities.example.json config/identities.local.json

Start the broker:

openbrowser-broker

Docker

git clone https://github.com/floomhq/openbrowser.git
cd openbrowser
OPENBROWSER_API_KEYS="$(openssl rand -base64 48)" docker compose up --build

Once the broker is running, lease a browser:

curl -fsS http://127.0.0.1:8767/lease \
  -H "content-type: application/json" \
  -d '{"owner":"demo","ttl_seconds":300}'

Use the returned lease_id:

curl -fsS http://127.0.0.1:8767/browser/navigate \
  -H "content-type: application/json" \
  -d '{"lease_id":"<lease_id>","url":"https://example.com"}'

curl -fsS http://127.0.0.1:8767/browser/snapshot \
  -H "content-type: application/json" \
  -d '{"lease_id":"<lease_id>"}'

curl -fsS -X POST http://127.0.0.1:8767/release/<lease_id>
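The same lease, act, release sequence can be sketched in Python with only the standard library. This is a minimal illustration, not an official client: `post_json`, `leased_browser`, and `demo` are hypothetical helper names, and the code assumes the broker answers with JSON (or an empty body) and that the lease response contains a `lease_id` field. The context manager releases the slot even when an action raises.

```python
import json
import urllib.request
from contextlib import contextmanager

BROKER = "http://127.0.0.1:8767"  # local broker address used in the curl examples


def post_json(url, payload=None):
    """POST to the broker; send a JSON body only when a payload is given."""
    data = None
    headers = {}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["content-type"] = "application/json"
    req = urllib.request.Request(url, data=data, headers=headers, method="POST")
    with urllib.request.urlopen(req, timeout=30) as resp:
        body = resp.read()
    # Assumption: responses are JSON; an empty body is treated as {}.
    return json.loads(body) if body else {}


@contextmanager
def leased_browser(owner, ttl_seconds=300, base=BROKER, post=post_json):
    """Lease a slot, yield its lease_id, and always release it afterwards."""
    lease = post(f"{base}/lease", {"owner": owner, "ttl_seconds": ttl_seconds})
    lease_id = lease["lease_id"]  # assumed response field name
    try:
        yield lease_id
    finally:
        post(f"{base}/release/{lease_id}")


def demo():
    """Mirror of the curl walkthrough; needs a running broker."""
    with leased_browser("demo") as lease_id:
        post_json(f"{BROKER}/browser/navigate",
                  {"lease_id": lease_id, "url": "https://example.com"})
        return post_json(f"{BROKER}/browser/snapshot", {"lease_id": lease_id})
```

The `post` parameter exists so the lease lifecycle can be exercised with a stub transport, without a broker.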

Remote API

Expose the broker behind your HTTPS proxy or tunnel and configure:

OPENBROWSER_API_KEYS="your-long-random-api-key"
OPENBROWSER_PUBLIC_OPENBROWSER_BASE_URL="https://browser.example.com/openbrowser/v1"

Then call:

BASE=https://browser.example.com/openbrowser/v1
KEY=your-long-random-api-key

curl -fsS "$BASE/docs" \
  -H "authorization: Bearer $KEY" \
  -H "user-agent: openbrowser-client/1.0"

The API covers leases, navigation, snapshots, screenshots, clicks, typing, keyboard events, tabs, auth handoff, lease control, profiles, feedback issues, telemetry, and audits.
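For agents that call the remote API from Python rather than curl, the bearer-token request can be built as follows. This is a sketch under stated assumptions: `build_request` and `send` are hypothetical helpers, and the choice of GET for calls without a body and POST for calls with one is a simplification, not something the API documents here. The headers mirror the curl example.

```python
import json
import urllib.request


def build_request(base, key, path, payload=None):
    """Build an authenticated request for the remote API without sending it."""
    headers = {
        "authorization": f"Bearer {key}",
        "user-agent": "openbrowser-client/1.0",  # same value as the curl example
    }
    data = None
    if payload is not None:
        headers["content-type"] = "application/json"
        data = json.dumps(payload).encode("utf-8")
    url = base.rstrip("/") + "/" + path.lstrip("/")
    method = "GET" if data is None else "POST"  # simplification for this sketch
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def send(req, timeout=30):
    """Send a prepared request and return the response body as text."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `send(build_request(BASE, KEY, "docs"))` would correspond to the curl call above. Keeping request construction separate from sending makes it easy to inspect what would go over the wire before any key is used.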

flowchart TD
  RemoteAgent["Remote agent"] -->|Bearer token| API["/openbrowser/v1"]
  API --> Lease["POST /leases"]
  API --> Browser["POST /browser/*"]
  API --> Auth["POST /auth/request"]
  API --> Issues["POST /feedback/issues"]
  API --> Audit["GET /audit"]

MCP

Local MCP, for agents running on the broker host:

{
  "mcpServers": {
    "openbrowser-broker": {
      "command": "openbrowser-mcp"
    }
  }
}

Remote MCP, for agents running anywhere:

{
  "mcpServers": {
    "openbrowser-remote": {
      "command": "openbrowser-remote-mcp",
      "env": {
        "OPENBROWSER_API_KEY": "<OPENBROWSER_API_KEY>",
        "OPENBROWSER_BASE_URL": "https://browser.example.com/openbrowser/v1"
      }
    }
  }
}

Core MCP tools:

  • browser_lease, browser_release, browser_heartbeat
  • browser_navigate, browser_snapshot, browser_screenshot
  • browser_click, browser_type, browser_keyboard_type, browser_keyboard_press
  • browser_tabs, browser_new_tab, browser_switch_tab, browser_wait
  • auth_request, auth_status, lease_control_request
  • feedback_report_issue, feedback_list_issues, feedback_update_issue
  • telemetry_record_event, telemetry_list_events, telemetry_summary
  • broker_audit, broker_docs, profile_status

Persistent Profiles

Identities are configured in config/identities.local.json:

{
  "identities": {
    "work-main": {
      "label": "Work account",
      "site": "example.com",
      "slot": "auto",
      "profile_dir": "/var/lib/openbrowser-broker/profiles/work-main",
      "proxy_ref": "residential:work-main",
      "timezone": "America/New_York",
      "lang": "en-US",
      "policy": {
        "max_parallel_sessions": 1,
        "requires_human_auth": true
      }
    }
  }
}
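Before starting the broker, a small pre-flight check over this file can catch obvious mistakes. The sketch below is an illustration only: `shared_profile_dirs` and `human_auth_identities` are hypothetical helpers, and it assumes that two identities pointing at the same `profile_dir` is unintended, since Chrome locks a profile directory while it is in use. Whether the broker itself performs such a check is not stated here.

```python
import json
from collections import defaultdict


def shared_profile_dirs(identities_doc):
    """Return {profile_dir: [identity_id, ...]} for dirs used by 2+ identities."""
    by_dir = defaultdict(list)
    for identity_id, settings in identities_doc.get("identities", {}).items():
        profile_dir = settings.get("profile_dir")
        if profile_dir:
            # Normalize a trailing slash so "/p/a" and "/p/a/" compare equal.
            by_dir[profile_dir.rstrip("/")].append(identity_id)
    return {d: ids for d, ids in by_dir.items() if len(ids) > 1}


def human_auth_identities(identities_doc):
    """List identity ids whose policy sets requires_human_auth to true."""
    return sorted(
        identity_id
        for identity_id, settings in identities_doc.get("identities", {}).items()
        if settings.get("policy", {}).get("requires_human_auth")
    )


def check_file(path="config/identities.local.json"):
    """Load the identities file and report both findings."""
    with open(path, encoding="utf-8") as fh:
        doc = json.load(fh)
    return shared_profile_dirs(doc), human_auth_identities(doc)
```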

When an identity needs login:

curl -fsS "$BASE/auth/request" \
  -H "authorization: Bearer $KEY" \
  -H "content-type: application/json" \
  -d '{"owner":"setup","identity_id":"work-main","url":"https://example.com/login","reason":"initial_login"}'

Open the returned portal_url, complete login in the browser view, then mark the request complete. Future leases for that identity reuse the saved profile state.

flowchart LR
  Identity["identity_id=work-main"] --> Profile["Chrome profile dir"]
  Identity --> Policy["parallel-session policy"]
  Identity --> ProxyRef["optional proxy_ref"]
  ProxyRef --> Forwarder["local proxy forwarder"]
  Profile --> Chrome["leased Chrome slot"]
  Forwarder --> Chrome

Proxy Routing

Add proxy credentials in secrets/proxies.json:

{
  "proxies": {
    "residential:work-main": {
      "scheme": "http",
      "host": "proxy.example.net",
      "port": 12345,
      "username": "user",
      "password": "pass"
    }
  }
}

Then set "proxy_ref": "residential:work-main" on the identity. The broker starts a local proxy forwarder and launches Chrome with the matching proxy for that profile.
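To make the `proxy_ref` lookup concrete, the sketch below resolves a reference against a document shaped like `secrets/proxies.json` and builds the upstream proxy URL that a forwarder or HTTP client would consume. `upstream_proxy_url` is a hypothetical helper; how the broker resolves and forwards proxies internally is not documented here. Credentials are percent-encoded so characters such as `@` or `:` in a password do not corrupt the URL.

```python
import json
from urllib.parse import quote


def upstream_proxy_url(proxies_doc, proxy_ref):
    """Resolve a proxy_ref to scheme://user:pass@host:port."""
    try:
        entry = proxies_doc["proxies"][proxy_ref]
    except KeyError:
        raise KeyError(f"unknown proxy_ref: {proxy_ref}") from None
    auth = ""
    if entry.get("username"):
        auth = quote(entry["username"], safe="")
        if entry.get("password"):
            auth += ":" + quote(entry["password"], safe="")
        auth += "@"
    return f'{entry["scheme"]}://{auth}{entry["host"]}:{entry["port"]}'


def load_proxies(path="secrets/proxies.json"):
    """Read the proxies file from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

The resulting URL contains credentials, so it should be handled like any other secret and kept out of logs and tool output.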

Safety Model

  • Raw cookies, passwords, tokens, proxy credentials, and VNC passwords are never returned by tools.
  • Telemetry redacts sensitive keys and secret-shaped strings.
  • Browser typing telemetry stores text length, not typed text.
  • Login and challenge handling use human handoff portals instead of secrets in chat.
  • CAPTCHA solving and ban-circumvention automation are outside the project boundary.
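The redaction rules in this list can be illustrated with a short sketch. It is not the project's sanitizer: the key list, the "secret-shaped" pattern (any run of 32 or more token characters), the `[redacted]` marker, and the event field names are all assumptions chosen for the example.

```python
import re

# Assumption: substrings that mark a key as sensitive.
SENSITIVE_KEYS = ("password", "token", "cookie", "authorization", "api_key", "secret")
# Assumption: long runs of token characters are treated as secret-shaped.
SECRET_SHAPED = re.compile(r"[A-Za-z0-9_\-]{32,}")


def sanitize(value, key=""):
    """Recursively redact sensitive keys and secret-shaped strings."""
    if key and any(s in key.lower() for s in SENSITIVE_KEYS):
        return "[redacted]"
    if isinstance(value, dict):
        return {k: sanitize(v, k) for k, v in value.items()}
    if isinstance(value, list):
        return [sanitize(v, key) for v in value]
    if isinstance(value, str):
        return SECRET_SHAPED.sub("[redacted]", value)
    return value


def typing_event(lease_id, text):
    """Record how much was typed, never what was typed."""
    return {"event": "browser_type", "lease_id": lease_id, "text_length": len(text)}
```

Checking the key before looking at the value means a whole nested object stored under a sensitive key is dropped, rather than being redacted field by field.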

What This Is Not

  • Not a CAPTCHA solver.
  • Not a token extractor.
  • Not a shared global Chrome tab for every agent.
  • Not a scraping bypass toolkit.
  • Not a replacement for product APIs when a stable API exists.

Operations

openbrowser-audit --json
openbrowser-use --json open https://example.com
openbrowser-adapter status --format json

Systemd examples live in systemd/. Detailed runbooks live in docs/.

Development

python3 -m compileall ax_browser_broker tests
pytest -q

Project Status

OpenBrowser is a public alpha release of browser infrastructure built with production use in mind. The core lease manager, profile identities, remote API, MCP surfaces, human auth handoff, telemetry, feedback issues, audits, and adapter wrappers are covered by tests. New deployments can use the generic commands and environment variables above; legacy ax-* command wrappers remain for existing installations.

License

MIT