# Computer Use Demo (Visual Mode)

This notebook demonstrates how to use `webtask` in **visual mode** with Google's Computer Use model.

In visual mode:
- The agent sees screenshots only (no DOM tree)
- The agent acts through pixel-based tools (`click_at`, `type_text_at`, `scroll_at`, etc.)
- Coordinates are normalized to the range 0-999 and then scaled to the actual viewport size
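
The coordinate scaling in the last bullet can be sketched as follows. This is a minimal illustration, not webtask's actual implementation: the function name `to_viewport_pixels` is hypothetical, and it assumes a plain linear mapping from the 0-999 grid to the viewport, rounded down.

```python
def to_viewport_pixels(x, y, viewport_width, viewport_height, scale=1000):
    """Map normalized model coordinates (0..scale-1) to viewport pixels.

    Hypothetical helper: assumes linear scaling with floor rounding.
    """
    if not (0 <= x < scale and 0 <= y < scale):
        raise ValueError(f"coordinates must be in the range 0..{scale - 1}")
    # Integer arithmetic avoids floating-point rounding surprises.
    px = x * viewport_width // scale
    py = y * viewport_height // scale
    return px, py


# The center of the normalized grid lands at the center of a 1280x720 viewport.
print(to_viewport_pixels(500, 500, 1280, 720))  # (640, 360)
```

Because the model always answers on the same 0-999 grid, the same response can be replayed against any viewport size; only this final mapping changes.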

In [1]:
from webtask import Webtask
from webtask.integrations.llm.google import GeminiComputerUse

## Setup

Create a `Webtask` instance and a `GeminiComputerUse` LLM.

The `GeminiComputerUse` class:
- Uses the `gemini-2.5-computer-use-preview-05-2025` model
- Has `coordinate_scale = 1000` (model returns 0-999 coordinates)

In [2]:
wt = Webtask()
llm = GeminiComputerUse()

## Create Agent in Visual Mode

Set `mode="visual"` to use pixel-based tools and screenshot context.
Browser options like `headless` are passed to `create_agent()`.

In [3]:
# headless=False opens a visible browser window so you can watch the agent work
agent = await wt.create_agent(llm=llm, mode="visual", headless=False)

## Run a Task

The agent will:
1. See screenshots of the page
2. Use tools like `click_at(x, y)` and `type_text_at(x, y, text)` to interact
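
To make the second step concrete, here is a rough sketch of how a pixel-based tool call might be turned into a browser action. Everything in it is an assumption made for illustration: `FakePage` and `run_tool_call` are hypothetical names, the page only records actions instead of driving a real browser, and webtask's internal dispatch may look quite different.

```python
from dataclasses import dataclass, field


@dataclass
class FakePage:
    """Hypothetical stand-in for a browser page; it records actions only."""
    width: int = 1280
    height: int = 720
    log: list = field(default_factory=list)

    def click(self, px, py):
        self.log.append(("click", px, py))

    def type_text(self, px, py, text):
        # Focus the target element by clicking it, then type.
        self.log.append(("click", px, py))
        self.log.append(("type", text))


def run_tool_call(page, name, args, scale=1000):
    """Scale normalized coordinates to pixels, then dispatch by tool name."""
    px = args["x"] * page.width // scale
    py = args["y"] * page.height // scale
    if name == "click_at":
        page.click(px, py)
    elif name == "type_text_at":
        page.type_text(px, py, args["text"])
    else:
        raise ValueError(f"unknown tool: {name}")


page = FakePage()
run_tool_call(page, "click_at", {"x": 500, "y": 500})
run_tool_call(page, "type_text_at", {"x": 250, "y": 100, "text": "Cordless Drill 24V"})
print(page.log)
```

In a real run the tool name and arguments would come from the model's response to the latest screenshot rather than from hard-coded calls.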

In [4]:
result = await agent.do(
    "Go to https://practicesoftwaretesting.com/, search for 'Cordless Drill 24V', and add it to the cart"
)
print(f"Feedback: {result.feedback}")

Feedback: Successfully searched for 'Cordless Drill 24V' and added it to the shopping cart.


## Cleanup

In [5]:
await wt.close()