This section will guide you through setting up and running the Computer Use Preview model. Follow these steps to get started.
Clone the Repository
git clone https://github.com/google/computer-use-preview.git
cd computer-use-preview
Set up Python Virtual Environment and Install Dependencies
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
Install Playwright and Browser Dependencies
# Install system dependencies required by Playwright for Chrome
playwright install-deps chrome
# Install the Chrome browser for Playwright
playwright install chrome
You can get started using either the Gemini Developer API or Vertex AI.
You need a Gemini API key to use the agent:
export GEMINI_API_KEY="YOUR_GEMINI_API_KEY"
Or to add this to your virtual environment:
echo 'export GEMINI_API_KEY="YOUR_GEMINI_API_KEY"' >> .venv/bin/activate
# After editing, you'll need to deactivate and reactivate your virtual
# environment if it's already active:
deactivate
source .venv/bin/activate
Replace YOUR_GEMINI_API_KEY
with your actual key.
You need to explicitly use Vertex AI, then provide project and location to use the agent:
export USE_VERTEXAI=true
export VERTEXAI_PROJECT="YOUR_PROJECT_ID"
export VERTEXAI_LOCATION="YOUR_LOCATION"
Or to add this to your virtual environment:
echo 'export USE_VERTEXAI=true' >> .venv/bin/activate
echo 'export VERTEXAI_PROJECT="your-project-id"' >> .venv/bin/activate
echo 'export VERTEXAI_LOCATION="your-location"' >> .venv/bin/activate
# After editing, you'll need to deactivate and reactivate your virtual
# environment if it's already active:
deactivate
source .venv/bin/activate
Replace YOUR_PROJECT_ID
and YOUR_LOCATION
with your actual project and location.
The primary way to use the tool is via the main.py
script.
General Command Structure:
python main.py --query "Go to Google and type 'Hello World' into the search bar"
Available Environments:
You can specify a particular environment with the --env <environment>
flag. Available options:
playwright
: Runs the browser locally using Playwright.browserbase
: Connects to a Browserbase instance.
Local Playwright
Runs the agent using a Chrome browser instance controlled locally by Playwright.
python main.py --query="Go to Google and type 'Hello World' into the search bar" --env="playwright"
You can also specify an initial URL for the Playwright environment:
python main.py --query="Go to Google and type 'Hello World' into the search bar" --env="playwright" --initial_url="https://www.google.com/search?q=latest+AI+news"
Browserbase
Runs the agent using Browserbase as the browser backend. Ensure the proper Browserbase environment variables are set:BROWSERBASE_API_KEY
and BROWSERBASE_PROJECT_ID
.
python main.py --query="Go to Google and type 'Hello World' into the search bar" --env="browserbase"
The main.py
script is the command-line interface (CLI) for running the browser agent.
Argument | Description | Required | Default | Supported Environment(s) |
---|---|---|---|---|
--query |
The natural language query for the browser agent to execute. | Yes | N/A | All |
--env |
The computer use environment to use. Must be one of the following: playwright , or browserbase |
No | N/A | All |
--initial_url |
The initial URL to load when the browser starts. | No | https://www.google.com | All |
--highlight_mouse |
If specified, the agent will attempt to highlight the mouse cursor's position in the screenshots. This is useful for visual debugging. | No | False (not highlighted) | playwright |
Variable | Description | Required |
---|---|---|
GEMINI_API_KEY | Your API key for the Gemini model. | Yes |
BROWSERBASE_API_KEY | Your API key for Browserbase. | Yes (when using the browserbase environment) |
BROWSERBASE_PROJECT_ID | Your Project ID for Browserbase. | Yes (when using the browserbase environment) |