Add scroll action to web action tools #427

@lmorchard

Current state

Pilo's action set in packages/core/src/tools/webActionTools.ts covers click, fill, select, hover, check, uncheck, focus, enter, wait, goto, back, forward, extract, done, and abort, but it does not include a scroll action.

Implicit scroll happens via locator.scrollIntoViewIfNeeded (packages/core/src/browser/playwrightBrowser.ts:815) before each click/fill/etc. That brings a known ref into view but cannot:

  • Drive infinite-scroll feeds (Twitter, search results, product listings)
  • Trigger lazy-loaded content that only appears after scroll
  • Navigate to a specific page offset
  • Scroll inside an overflow: scroll container without first interacting with a child

The system prompt instructs the agent (packages/core/src/prompts.ts:157-159):

The accessibility tree shows all currently loaded page elements. On dynamic pages, some content may only appear after scrolling or interaction — if expected data isn't visible, try scrolling or interacting to trigger loading.

This guidance is misleading: the agent has no scroll tool, so "try scrolling" cannot be acted on directly.

The gap

Without an explicit scroll action:

  • The agent cannot complete infinite-scroll tasks ("find the 50th product in this list").
  • Tasks requiring data loaded below the fold often fail because the data never enters the snapshot.
  • The agent sometimes thrashes, repeatedly clicking page-N buttons or "load more" anchors, when a simple viewport scroll would suffice.

Proposed scope

Add a scroll tool to tools/webActionTools.ts:

```typescript
scroll: tool({
  description:
    "Scroll the page (or a specific scrollable element). direction='down' or 'up'. " +
    "pages=0.5 scrolls half a viewport; pages=3 scrolls three viewports. " +
    "Pass a ref to scroll inside that element instead of the page body.",
  inputSchema: z.object({
    direction: z.enum(["up", "down"]).default("down"),
    // Scroll distance in viewport heights, bounded to 0.1..20.
    pages: z.number().min(0.1).max(20).default(1),
    // Optional element ref; when omitted, the page itself is scrolled.
    ref: z.string().optional(),
  }),
  execute: async ({ direction, pages, ref }) => {
    return performActionWithValidation(
      PageAction.Scroll,
      context,
      ref,
      // Pack direction and pages into one JSON string for the handler to parse.
      JSON.stringify({ direction, pages }),
    );
  },
}),
```

In packages/core/src/browser/ariaBrowser.ts, add Scroll to the PageAction enum.

In packages/core/src/browser/playwrightBrowser.ts, add a Scroll handler that:

  • If ref is provided AND the element is scrollable (scrollHeight > clientHeight), scroll the element.
  • Otherwise scroll the window.
  • For pages > 1, chunk the scroll into steps of at most one viewport, with a ~150ms delay between steps so lazy-loaders fire.
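
The chunking rule can be sketched as a pure planning function plus a small runner. This is a minimal sketch under assumptions, not the Pilo implementation: planScrollSteps, runScrollSteps, and the scrollBy callback are hypothetical names, and the 150 ms default only mirrors the delay suggested above. In the Playwright handler, the callback would presumably wrap a page.evaluate call that scrolls the chosen target.

```typescript
/**
 * Split a scroll of `pages` viewports into per-step pixel deltas,
 * each at most one viewport tall. Negative deltas scroll up.
 * (Hypothetical helper, not part of the existing codebase.)
 */
function planScrollSteps(
  direction: "up" | "down",
  pages: number,
  viewportHeight: number,
): number[] {
  const sign = direction === "down" ? 1 : -1;
  const steps: number[] = [];
  let remaining = pages;
  // The small epsilon avoids a zero-length final step from float residue.
  while (remaining > 1e-9) {
    const fraction = Math.min(1, remaining);
    steps.push(sign * Math.round(fraction * viewportHeight));
    remaining -= fraction;
  }
  return steps;
}

/**
 * Apply the planned steps through a caller-supplied `scrollBy` callback
 * (an assumption of this sketch), pausing between steps so that
 * lazy-loaders have a chance to fire.
 */
async function runScrollSteps(
  steps: number[],
  scrollBy: (deltaY: number) => Promise<void>,
  delayMs = 150,
): Promise<void> {
  let first = true;
  for (const delta of steps) {
    if (!first) {
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
    first = false;
    await scrollBy(delta);
  }
}
```

With an 800px viewport, planScrollSteps("down", 2.5, 800) yields [800, 800, 400], so a 2.5-page request becomes three scrolls with two pauses in between.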

Update prompts.ts:

  • Add scroll({...}) to buildToolExamples (around prompts.ts:163-210).
  • Add a best-practices bullet about when to scroll vs. when to use search/find tools.
  • The "try scrolling" line in youArePrompt becomes accurate.

Implementation notes

  • Scrollable element detection: only scroll the ref if the element actually has overflow. Otherwise the model can call scroll(ref=...) on a non-scrollable element and the call silently does nothing. Falling through to window scroll is the safer default.
  • Smooth vs. instant: prefer behavior: "instant". Smooth scroll plus immediate snapshot can race; the snapshot fires before the page has visually settled.
  • At-bottom signal: optionally return { atBottom: true } when window.scrollY + window.innerHeight >= document.body.scrollHeight - 1, so the agent can stop trying to scroll further on dead-end pages.
  • pageChanged: true: report the scroll as a page change so the next iteration re-snapshots and captures newly loaded content.
  • Cross-frame scroll: defer iframe-internal scrolling for v2. Most use cases need page-body scroll.
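
The scrollable-element check and the at-bottom signal from these notes reduce to two comparisons on DOM metrics. The following is a minimal sketch, assuming the metrics are read inside the page (for example through page.evaluate) and passed back as plain objects. The interface and function names are hypothetical, not existing Pilo APIs.

```typescript
/** Metrics read from a candidate element (standard DOM properties). */
interface ElementMetrics {
  scrollHeight: number;
  clientHeight: number;
}

/** Metrics read from window and document.body. */
interface WindowMetrics {
  scrollY: number;
  innerHeight: number;
  bodyScrollHeight: number;
}

/** True when the element's content is taller than its visible box. */
function hasVerticalOverflow(el: ElementMetrics): boolean {
  return el.scrollHeight > el.clientHeight;
}

/**
 * Decide what to scroll: the element when a ref resolved to something
 * with overflow, otherwise fall through to the window.
 */
function chooseScrollTarget(el?: ElementMetrics): "element" | "window" {
  return el !== undefined && hasVerticalOverflow(el) ? "element" : "window";
}

/** At-bottom check with the 1px tolerance proposed in the notes. */
function isAtBottom(win: WindowMetrics): boolean {
  return win.scrollY + win.innerHeight >= win.bodyScrollHeight - 1;
}
```

One caveat on the design: scrollHeight > clientHeight can also hold for elements whose overflow is visible or hidden, so a stricter check might additionally inspect the computed overflow-y value before treating the element as scrollable.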

Acceptance criteria

  • scroll action exists in webActionTools.ts, listed in tool examples and best-practices guidance.
  • Tests in packages/core/test/ cover: scroll down by N pages, scroll up, scroll within a specific scrollable element, scroll on a non-scrollable ref falls through to window scroll, at-bottom returns the right signal.
  • The misleading "try scrolling" guidance in youArePrompt is now backed by an actual tool.
  • Manual smoke test on at least one infinite-scroll site (a product listing or social feed) showing the agent can navigate past viewport-one content.
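
The automated cases listed above can be prototyped without a browser against an in-memory stand-in. This is only a sketch: FakeSurface, fakeScroll, and the result shape are hypothetical and do not reflect existing helpers in packages/core/test/. Real tests would drive the Playwright handler instead of this simplified model.

```typescript
/** Hypothetical stand-in for a scrollable surface (window or element). */
interface FakeSurface {
  scrollTop: number;
  scrollHeight: number;
  clientHeight: number;
}

interface FakeScrollResult {
  target: "window" | "element";
  atBottom: boolean;
}

/** Simplified model of the proposed handler: pick a target, move it, clamp. */
function fakeScroll(
  win: FakeSurface,
  direction: "up" | "down",
  pages: number,
  el?: FakeSurface,
): FakeScrollResult {
  const useElement = el !== undefined && el.scrollHeight > el.clientHeight;
  const surface = el !== undefined && useElement ? el : win;
  const delta = (direction === "down" ? 1 : -1) * pages * surface.clientHeight;
  const max = Math.max(0, surface.scrollHeight - surface.clientHeight);
  surface.scrollTop = Math.min(Math.max(surface.scrollTop + delta, 0), max);
  return {
    target: useElement ? "element" : "window",
    atBottom: win.scrollTop + win.clientHeight >= win.scrollHeight - 1,
  };
}

/** Runs the five cases in order and returns the names of any that failed. */
function runScrollChecks(): string[] {
  const win: FakeSurface = { scrollTop: 0, scrollHeight: 3000, clientHeight: 1000 };
  const list: FakeSurface = { scrollTop: 0, scrollHeight: 900, clientHeight: 300 };
  const button: FakeSurface = { scrollTop: 0, scrollHeight: 40, clientHeight: 40 };
  const checks: Array<[string, boolean]> = [];

  let r = fakeScroll(win, "down", 1.5);
  checks.push(["scroll down by N pages", win.scrollTop === 1500 && !r.atBottom]);

  r = fakeScroll(win, "up", 0.5);
  checks.push(["scroll up", win.scrollTop === 1000]);

  r = fakeScroll(win, "down", 1, list);
  checks.push([
    "scroll within a scrollable element",
    r.target === "element" && list.scrollTop === 300 && win.scrollTop === 1000,
  ]);

  r = fakeScroll(win, "down", 1, button);
  checks.push([
    "non-scrollable ref falls through to window",
    r.target === "window" && win.scrollTop === 2000 && button.scrollTop === 0,
  ]);

  r = fakeScroll(win, "down", 1);
  checks.push(["at-bottom signal", r.atBottom && win.scrollTop === 2000]);

  return checks.filter(([, ok]) => !ok).map(([label]) => label);
}
```

Calling runScrollChecks() returns an empty array when all five cases hold for the model. The same case list could be ported to the project's test runner once the real handler exists.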

Effort estimate

1-2 days including tests and prompt updates.

Related issues

Independent. Pairs naturally with the modal/viewport-context work in another issue (scroll position becomes more visible to the model).

Files likely affected

  • packages/core/src/tools/webActionTools.ts
  • packages/core/src/browser/ariaBrowser.ts (PageAction enum)
  • packages/core/src/browser/playwrightBrowser.ts (Scroll handler)
  • packages/core/src/prompts.ts (tool examples + best practices)
  • packages/core/test/webAgent.test.ts or a new dedicated scroll test file

Labels: enhancement (New feature or request)
