Current state
Pilo's action set in packages/core/src/tools/webActionTools.ts covers click, fill, select, hover, check, uncheck, focus, enter, wait, goto, back, forward, extract, done, abort — but does not include a scroll action.
Implicit scroll happens via locator.scrollIntoViewIfNeeded (packages/core/src/browser/playwrightBrowser.ts:815) before each click/fill/etc. That brings a known ref into view but cannot:
- Drive infinite-scroll feeds (Twitter, search results, product listings)
- Trigger lazy-loaded content that only appears after scroll
- Navigate to a specific page offset
- Scroll inside an overflow: scroll container without first interacting with a child
The system prompt instructs the agent (packages/core/src/prompts.ts:157-159):
The accessibility tree shows all currently loaded page elements. On dynamic pages, some content may only appear after scrolling or interaction — if expected data isn't visible, try scrolling or interacting to trigger loading.
This guidance is misleading: the agent has no scroll tool, so "try scrolling" cannot be acted on directly.
The gap
Without an explicit scroll action:
- The agent cannot complete infinite-scroll tasks ("find the 50th product in this list").
- Tasks requiring data loaded below the fold often fail because the data never enters the snapshot.
- The agent sometimes thrashes — repeatedly clicking page-N buttons or "load more" anchors — when a simple viewport scroll would suffice.
Proposed scope
Add a scroll tool to tools/webActionTools.ts:
scroll: tool({
  description:
    "Scroll the page (or a specific scrollable element). direction='down' or 'up'. " +
    "pages=0.5 scrolls half a viewport; pages=3 scrolls three viewports. " +
    "Pass a ref to scroll inside that element instead of the page body.",
  inputSchema: z.object({
    direction: z.enum(["up", "down"]).default("down"),
    pages: z.number().min(0.1).max(20).default(1),
    ref: z.string().optional(),
  }),
  execute: async ({ direction, pages, ref }) => {
    return performActionWithValidation(
      PageAction.Scroll,
      context,
      ref,
      JSON.stringify({ direction, pages }),
    );
  },
}),
In packages/core/src/browser/ariaBrowser.ts, add Scroll to the PageAction enum.
In packages/core/src/browser/playwrightBrowser.ts, add a Scroll handler that:
- If ref is provided AND the element is scrollable (scrollHeight > clientHeight), scroll the element.
- Otherwise scroll the window.
- For pages >= 1, chunk into single-viewport steps with a ~150ms delay between them so lazy-loaders fire.
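The chunking rule above can be sketched as two pure helpers: one that decodes the JSON value the tool passes to performActionWithValidation, and one that turns it into per-step pixel deltas. This is a minimal sketch, not the handler itself. The names parseScrollValue and planScrollSteps are hypothetical, and the fallback to "down" and 1 page for malformed input is an assumption.

```typescript
// Hypothetical helpers; nothing here exists in Pilo today.
type ScrollDirection = "up" | "down";

interface ScrollRequest {
  direction: ScrollDirection;
  pages: number;
}

// Decode the JSON string produced by JSON.stringify({ direction, pages }).
// Assumption: malformed fields fall back to "down" and 1 page.
function parseScrollValue(value: string): ScrollRequest {
  const parsed = JSON.parse(value) as Partial<ScrollRequest>;
  const direction: ScrollDirection = parsed.direction === "up" ? "up" : "down";
  const pages =
    typeof parsed.pages === "number" && parsed.pages > 0 ? parsed.pages : 1;
  return { direction, pages };
}

// Split a request into pixel deltas of at most one viewport each.
// Positive deltas scroll down, negative deltas scroll up.
function planScrollSteps(request: ScrollRequest, viewportHeight: number): number[] {
  const sign = request.direction === "down" ? 1 : -1;
  const steps: number[] = [];
  let remaining = request.pages;
  while (remaining > 1e-9) {
    const fraction = Math.min(1, remaining);
    steps.push(sign * Math.round(fraction * viewportHeight));
    remaining -= fraction;
  }
  return steps;
}
```

A real handler would presumably apply each delta in turn (to the element or the window) and wait roughly 150ms between steps; that part depends on the browser API and is left out here.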
Update prompts.ts:
- Add scroll({...}) to buildToolExamples (around prompts.ts:163-210).
- Add a best-practices bullet about when to scroll vs. when to use search/find tools.
- The "try scrolling" line in youArePrompt becomes accurate.
Implementation notes
- Scrollable element detection: only scroll the ref if the element actually has overflow. Otherwise the model can call scroll(ref=...) on a non-scrollable element and nothing happens silently. Falling through to window scroll is the safer default.
- Smooth vs. instant: prefer behavior: "instant". Smooth scroll plus immediate snapshot can race; the snapshot fires before the page has visually settled.
- At-bottom signal: optionally return { atBottom: true } when window.scrollY + window.innerHeight >= document.body.scrollHeight - 1. The agent can stop trying to scroll further on dead-end pages.
- pageChanged: true: the next iteration must re-snapshot to capture newly-loaded content.
- Cross-frame scroll: defer iframe-internal scrolling for v2. Most use cases need page-body scroll.
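The target-selection and at-bottom checks from the notes above are simple enough to write as pure functions over measured sizes, which also makes them easy to unit test without a browser. The function names and the ElementMetrics shape below are hypothetical; the formulas are the ones stated in this issue.

```typescript
// Hypothetical shape for measurements read from the page (for example via evaluate).
interface ElementMetrics {
  scrollHeight: number;
  clientHeight: number;
}

// Overflow check as described above: content taller than the visible box.
function isScrollable(el: ElementMetrics): boolean {
  return el.scrollHeight > el.clientHeight;
}

// Scroll the element only when a ref was given, it resolved, and it has overflow.
// Every other case falls through to the window.
function chooseScrollTarget(
  ref: string | undefined,
  el: ElementMetrics | null,
): "element" | "window" {
  if (ref !== undefined && el !== null && isScrollable(el)) {
    return "element";
  }
  return "window";
}

// At-bottom signal with the 1px tolerance from the note above.
function isAtBottom(scrollY: number, innerHeight: number, scrollHeight: number): boolean {
  return scrollY + innerHeight >= scrollHeight - 1;
}
```

One caveat: scrollHeight > clientHeight can also be true for an element whose content overflows while its overflow style is visible, so a real handler may additionally want to check the computed overflow-y value before treating the ref as a scroll container.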
Acceptance criteria
- scroll action exists in webActionTools.ts, listed in tool examples and best-practices guidance.
- Tests in packages/core/test/ cover: scroll down by N pages, scroll up, scroll within a specific scrollable element, scroll on a non-scrollable ref falls through to window scroll, at-bottom returns the right signal.
- The misleading "try scrolling" guidance in youArePrompt is now backed by an actual tool.
- Manual smoke test on at least one infinite-scroll site (a product listing or social feed) showing the agent can navigate past viewport-one content.
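For automated coverage of the infinite-scroll case, one option is an in-memory stand-in for a lazy-loading feed, so the "scroll until enough items are loaded, stop at the bottom" behaviour can be exercised without a real site. Everything below (FakeFeedPage, scrollUntilLoaded, the batch-loading rule) is a hypothetical fixture for illustration, not Pilo or Playwright code.

```typescript
// Hypothetical fake page: loads one more batch whenever a scroll reaches the bottom.
class FakeFeedPage {
  scrollY = 0;
  loadedItems: number;
  totalItems: number;
  batchSize: number;
  itemHeight: number;
  viewportHeight: number;

  constructor(totalItems: number, batchSize: number, itemHeight: number, viewportHeight: number) {
    this.totalItems = totalItems;
    this.batchSize = batchSize;
    this.itemHeight = itemHeight;
    this.viewportHeight = viewportHeight;
    this.loadedItems = Math.min(batchSize, totalItems);
  }

  get scrollHeight(): number {
    return this.loadedItems * this.itemHeight;
  }

  get atBottom(): boolean {
    return this.scrollY + this.viewportHeight >= this.scrollHeight - 1;
  }

  // Scroll down by `pages` viewports, clamp to the document, then lazy-load if possible.
  scrollDown(pages: number): { atBottom: boolean } {
    const maxScrollY = Math.max(0, this.scrollHeight - this.viewportHeight);
    this.scrollY = Math.min(maxScrollY, this.scrollY + pages * this.viewportHeight);
    if (this.atBottom && this.loadedItems < this.totalItems) {
      this.loadedItems = Math.min(this.totalItems, this.loadedItems + this.batchSize);
    }
    return { atBottom: this.atBottom };
  }
}

// Scroll one viewport at a time until enough items are loaded, the page reports
// atBottom (nothing more will load), or the scroll budget is used up.
function scrollUntilLoaded(page: FakeFeedPage, targetItems: number, maxScrolls: number): number {
  let scrolls = 0;
  while (page.loadedItems < targetItems && scrolls < maxScrolls) {
    const { atBottom } = page.scrollDown(1);
    scrolls += 1;
    if (atBottom) break;
  }
  return scrolls;
}
```

The dead-end case matters as much as the happy path: when the feed has fewer items than the target, the loop should end on the atBottom signal rather than burn through maxScrolls.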
Effort estimate
1-2 days including tests and prompt updates.
Related issues
Independent. Pairs naturally with the modal/viewport-context work in another issue (scroll position becomes more visible to the model).
Files likely affected
- packages/core/src/tools/webActionTools.ts
- packages/core/src/browser/ariaBrowser.ts (PageAction enum)
- packages/core/src/browser/playwrightBrowser.ts (Scroll handler)
- packages/core/src/prompts.ts (tool examples + best practices)
- packages/core/test/webAgent.test.ts or a new dedicated scroll test file