2 changes: 1 addition & 1 deletion .github/workflows/cd.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion .github/workflows/ci.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion .releaserc.yaml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 2 additions & 2 deletions CLAUDE.md
@@ -10,7 +10,7 @@ browser-agent is an A2A (Agent-to-Agent) server implementing the [A2A Protocol](

### ADL-Generated Structure

The codebase is generated using ADL CLI 0.23.8 and follows a strict generation pattern:
The codebase is generated using ADL CLI 0.23.9 and follows a strict generation pattern:
- **Generated Files**: Marked with `DO NOT EDIT` headers - manual changes will be overwritten
- **Configuration Source**: `agent.yaml` - defines agent capabilities, skills, and metadata
- **Server Implementation**: Built on the ADK (Agent Development Kit) framework from `github.com/inference-gateway/adk`
@@ -117,7 +117,7 @@ Activate with: `flox activate` (if Flox is installed)

- **Generated Files**: Never manually edit files with "DO NOT EDIT" headers
- **Configuration Changes**: Always modify `agent.yaml` and regenerate
- **ADL Version**: Ensure ADL CLI 0.23.8 or compatible version for regeneration
- **ADL Version**: Ensure ADL CLI 0.23.9 or compatible version for regeneration
- **Port Configuration**: Default 8080, configurable via `A2A_PORT` or `A2A_SERVER_PORT`

## Debugging Tips
41 changes: 33 additions & 8 deletions Dockerfile

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

46 changes: 42 additions & 4 deletions README.md
@@ -15,12 +15,18 @@ A production-ready [Agent-to-Agent (A2A)](https://github.com/inference-gateway/a
## Quick Start

```bash
# Run the agent
# Run the agent locally
go run .

# Or with Docker
# Or with Docker (Chromium only - smallest image)
docker build -t browser-agent .
docker run -p 8080:8080 browser-agent

# Build with specific browser engine
docker build --build-arg BROWSER_ENGINE=firefox -t browser-agent:firefox .

# Run with Xvfb enabled (for extensions or specific rendering features)
docker run -p 8080:8080 -e BROWSER_XVFB_ENABLED=true browser-agent
```

## Features
@@ -62,16 +68,21 @@ The following custom configuration variables are available:
|----------|----------|-------------|---------|
| **Browser** | `BROWSER_ARGS` | Args configuration | `[--disable-blink-features=AutomationControlled --disable-features=VizDisplayCompositor --no-first-run --disable-default-apps --disable-extensions --disable-plugins --disable-sync --disable-translate --hide-scrollbars --mute-audio --no-zygote --disable-background-timer-throttling --disable-backgrounding-occluded-windows --disable-renderer-backgrounding --disable-ipc-flooding-protection]` |
| **Browser** | `BROWSER_DATA_DIR` | Data_dir configuration | `/tmp/playwright/artifacts` |
| **Browser** | `BROWSER_ENGINE` | Engine configuration | `chromium` |
| **Browser** | `BROWSER_HEADER_ACCEPT` | Header_accept configuration | `text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7` |
| **Browser** | `BROWSER_HEADER_ACCEPT_ENCODING` | Header_accept_encoding configuration | `gzip, deflate, br` |
| **Browser** | `BROWSER_HEADER_ACCEPT_LANGUAGE` | Header_accept_language configuration | `en-US,en;q=0.9` |
| **Browser** | `BROWSER_HEADER_CONNECTION` | Header_connection configuration | `keep-alive` |
| **Browser** | `BROWSER_HEADER_DNT` | Header_dnt configuration | `1` |
| **Browser** | `BROWSER_HEADER_UPGRADE_INSECURE_REQUESTS` | Header_upgrade_insecure_requests configuration | `1` |
| **Browser** | `BROWSER_HEADLESS` | Headless configuration | `true` |
| **Browser** | `BROWSER_STEALTH_MODE` | Stealth_mode configuration | `false` |
| **Browser** | `BROWSER_USER_AGENT` | User_agent configuration | `Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36` |
| **Browser** | `BROWSER_VIEWPORT_HEIGHT` | Viewport_height configuration | `1080` |
| **Browser** | `BROWSER_VIEWPORT_WIDTH` | Viewport_width configuration | `1920` |
| **Browser** | `BROWSER_XVFB_DISPLAY` | Xvfb_display configuration | `:99` |
| **Browser** | `BROWSER_XVFB_ENABLED` | Xvfb_enabled configuration | `false` |
| **Browser** | `BROWSER_XVFB_SCREEN_RESOLUTION` | Xvfb_screen_resolution configuration | `1920x1080x24` |

| Category | Variable | Description | Default |
|----------|----------|-------------|---------|
@@ -158,10 +169,10 @@ docker run --rm -it --network host ghcr.io/inference-gateway/a2a-debugger:latest

### Docker

The Docker image can be built with custom version information using build arguments:
The Docker image can be built with custom version information and browser selection using build arguments:

```bash
# Build with default values from ADL
# Build with default values from ADL (Chromium only)
docker build -t browser-agent .

# Build with custom version information
@@ -170,15 +181,42 @@ docker build \
--build-arg AGENT_NAME="My Custom Agent" \
--build-arg AGENT_DESCRIPTION="Custom agent description" \
-t browser-agent:1.2.3 .

# Build with specific browser engine
docker build --build-arg BROWSER_ENGINE=firefox -t browser-agent:firefox .

# Build with all browsers (larger image)
docker build --build-arg BROWSER_ENGINE=all -t browser-agent:all .
```

**Available Build Arguments:**
- `VERSION` - Agent version (default: `0.4.1`)
- `AGENT_NAME` - Agent name (default: `browser-agent`)
- `AGENT_DESCRIPTION` - Agent description (default: `AI agent for browser automation and web testing using Playwright`)
- `BROWSER_ENGINE` - Browser to install (`chromium`, `firefox`, `webkit`, or `all`) (default: `chromium`)

These values are embedded into the binary at build time using linker flags, making them accessible at runtime without requiring environment variables.
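
As a minimal sketch of how a wrapper script might check `BROWSER_ENGINE` before calling `docker build`, the helper below accepts only the four values listed above. The names `is_valid_engine` and `build_image` are hypothetical and not part of this repository, and the wrapper prints the command rather than running it.

```bash
#!/bin/bash
# Hypothetical helper (assumption, not part of the repository): returns 0 when
# the argument is one of the documented BROWSER_ENGINE values, 1 otherwise.
is_valid_engine() {
    case "$1" in
        chromium|firefox|webkit|all) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical wrapper: defaults to chromium and refuses unknown engines.
build_image() {
    local engine="${1:-chromium}"
    if ! is_valid_engine "$engine"; then
        echo "Unsupported BROWSER_ENGINE: $engine" >&2
        return 1
    fi
    # Print the command instead of executing it so the sketch has no side effects.
    echo "docker build --build-arg BROWSER_ENGINE=$engine -t browser-agent:$engine ."
}
```

Calling `build_image firefox` prints the same `docker build` command shown in the examples above, while `build_image safari` fails with an error message on stderr.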

#### Xvfb Configuration

By default, the browser runs in native headless mode. For cases requiring a virtual display (e.g., extensions, specific rendering features), you can enable Xvfb:

```bash
# Run with Xvfb enabled
docker run -p 8080:8080 \
-e BROWSER_XVFB_ENABLED=true \
browser-agent

# Customize Xvfb display settings
docker run -p 8080:8080 \
-e BROWSER_XVFB_ENABLED=true \
-e BROWSER_XVFB_DISPLAY=:99 \
-e BROWSER_XVFB_SCREEN_RESOLUTION=1920x1080x24 \
browser-agent
```

**Security Note:** Xvfb is started without the `-ac` flag, which would disable access control, so X access control stays enabled. The X server also runs with `-nolisten tcp` to prevent network access.
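
The `BROWSER_XVFB_SCREEN_RESOLUTION` value is passed to Xvfb's `-screen` option, which expects a `WIDTHxHEIGHTxDEPTH` string. The sketch below shows one way such a value could be validated before the container starts. The function name `parse_screen_resolution` is hypothetical and is not part of the entrypoint script in this change.

```bash
#!/bin/bash
# Hypothetical helper (assumption, not part of the repository): splits a
# WIDTHxHEIGHTxDEPTH string and prints "width height depth".
# Returns 1 and writes to stderr when the input is malformed.
parse_screen_resolution() {
    local spec="$1" width rest height depth part

    # Require at least two "x" separators.
    case "$spec" in
        *x*x*) ;;
        *) echo "Invalid screen resolution: $spec" >&2; return 1 ;;
    esac

    width="${spec%%x*}"    # text before the first "x"
    rest="${spec#*x}"      # text after the first "x"
    height="${rest%%x*}"   # text before the second "x"
    depth="${rest#*x}"     # text after the second "x"

    # Each part must be a non-empty run of digits.
    for part in "$width" "$height" "$depth"; do
        case "$part" in
            ''|*[!0-9]*) echo "Invalid screen resolution: $spec" >&2; return 1 ;;
        esac
    done

    echo "$width $height $depth"
}
```

With the default value, `parse_screen_resolution "1920x1080x24"` prints `1920 1080 24`. An input such as `1920x1080` is rejected because the depth is missing.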

## License

MIT License - see LICENSE file for details
2 changes: 1 addition & 1 deletion Taskfile.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

5 changes: 5 additions & 0 deletions agent.yaml
@@ -12,6 +12,8 @@ spec:
  config:
    browser:
      headless: true
      engine: "chromium"
      stealth_mode: false
      user_agent: "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36"
      viewport_width: 1920
      viewport_height: 1080
@@ -22,6 +24,9 @@ spec:
      header_connection: "keep-alive"
      header_upgrade_insecure_requests: "1"
      data_dir: "/tmp/playwright/artifacts"
      xvfb_enabled: false
      xvfb_display: ":99"
      xvfb_screen_resolution: "1920x1080x24"
      args:
        - "--disable-blink-features=AutomationControlled"
        - "--disable-features=VizDisplayCompositor"
7 changes: 6 additions & 1 deletion config/config.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

60 changes: 60 additions & 0 deletions docker-entrypoint.sh
@@ -0,0 +1,60 @@
#!/bin/bash
set -e

# Configuration from environment variables
XVFB_ENABLED="${BROWSER_XVFB_ENABLED:-false}"
XVFB_DISPLAY="${BROWSER_XVFB_DISPLAY:-:99}"
XVFB_SCREEN="${BROWSER_XVFB_SCREEN_RESOLUTION:-1920x1080x24}"

# Function to check if Xvfb is ready
wait_for_xvfb() {
    local max_attempts=10
    local attempt=0

    while [ $attempt -lt $max_attempts ]; do
        if xdpyinfo -display "$XVFB_DISPLAY" >/dev/null 2>&1; then
            echo "Xvfb is ready on display $XVFB_DISPLAY"
            return 0
        fi
        attempt=$((attempt + 1))
        sleep 0.5
    done

    echo "Warning: Xvfb failed to start within timeout"
    return 1
}

# Start Xvfb if enabled
if [ "$XVFB_ENABLED" = "true" ]; then
    echo "Starting Xvfb on display $XVFB_DISPLAY with screen resolution $XVFB_SCREEN"

    # Start Xvfb without the -ac flag so X access control stays enabled
    # Use -nolisten tcp to prevent network access
    Xvfb "$XVFB_DISPLAY" -screen 0 "$XVFB_SCREEN" -nolisten tcp &
    XVFB_PID=$!

    # Wait for Xvfb to be ready
    if wait_for_xvfb; then
        export DISPLAY="$XVFB_DISPLAY"
        echo "Xvfb started successfully (PID: $XVFB_PID)"
    else
        echo "Error: Xvfb failed to start properly"
        kill "$XVFB_PID" 2>/dev/null || true
        exit 1
    fi

    # Trap to clean up Xvfb on exit.
    # Note: this trap only fires if the script exits before `exec ./main` below;
    # `exec` replaces the shell, so the trap does not run when the application stops.
    trap "echo 'Stopping Xvfb...'; kill $XVFB_PID 2>/dev/null || true" EXIT
else
    echo "Xvfb disabled, using native headless mode"
fi

# Log configuration
echo "Browser configuration:"
echo " Engine: ${BROWSER_ENGINE:-chromium}"
echo " Headless: ${BROWSER_HEADLESS:-true}"
echo " Stealth Mode: ${BROWSER_STEALTH_MODE:-false}"
echo " Xvfb Enabled: $XVFB_ENABLED"

# Start the main application
exec ./main
4 changes: 4 additions & 0 deletions example/.env.example
@@ -3,5 +3,9 @@ DEEPSEEK_API_KEY=
GOOGLE_API_KEY=

# Agent
BROWSER_ENGINE=chromium
BROWSER_HEADLESS=false
BROWSER_XVFB_ENABLED=true
BROWSER_STEALTH_MODE=true
A2A_AGENT_CLIENT_PROVIDER=deepseek
A2A_AGENT_CLIENT_MODEL=deepseek-chat
29 changes: 29 additions & 0 deletions example/Dockerfile.vnc
@@ -0,0 +1,29 @@
FROM alpine:latest

# Install x11vnc
RUN apk add --no-cache x11vnc

# Create a startup script that connects to the shared X display
COPY <<'EOF' /usr/local/bin/start-vnc.sh
#!/bin/sh
set -e

echo "Waiting for X display :99 to be available..."
echo "Checking /tmp/.X11-unix directory..."
ls -la /tmp/.X11-unix/ 2>/dev/null || echo "Directory not found or empty"

while true; do
    if [ -S /tmp/.X11-unix/X99 ]; then
        echo "X11 socket found, attempting to start x11vnc..."
        x11vnc -display :99 -forever -shared -passwd password -rfbport 5900 -listen 0.0.0.0 -noshm && break
    else
        echo "Waiting for /tmp/.X11-unix/X99 socket... ($(date))"
        ls -la /tmp/.X11-unix/ 2>/dev/null || echo "Directory still empty"
    fi
    sleep 5
done
EOF

RUN chmod +x /usr/local/bin/start-vnc.sh

ENTRYPOINT ["/usr/local/bin/start-vnc.sh"]