Summary
Add a dedicated upload_file action tool to Pilo's web action vocabulary (packages/core/src/tools/webActionTools.ts). This is a focused extraction of section C from #436 — surfaced as its own issue because real-world evals identify file upload as the single dominant capability gap for interaction-heavy web tasks, distinct from the other three tools in that proposal.
Evidence
End-to-end run bu-benchmark-p22ml against the browser-use/benchmark interaction stress-tests (20 tasks via make cloud-eval-browser-use-benchmark, gemini-2.5-flash + Chromium). 13/20 failed. 11 of those 13 failures were file-upload related, across 9 distinct form frameworks:
| Framework | Agent's abort message (verbatim) |
| --- | --- |
| React Hook Form | "I cannot interact with the file system to create and upload a file" |
| Formik | "the form cannot be submitted without a file" |
| AngularJS | "the form requires a file upload, which cannot be completed through the current browsing environment" |
| Vue | "unable to upload a file as there is no tool to interact with the file system directly" |
| Svelte | "fill command on the file input fails with 'Input of type "file" cannot be filled'" |
| Material-UI | "I cannot interact with a file selection dialog using the provided tools" |
| jQuery Bootstrap | "the 'Upload Profile Picture' field (file input type) cannot be filled using the available tools" |
| Ember | "I cannot interact with the file upload dialog directly" |
| Wufoo-style | "Cannot upload a file as there is no tool to create a dummy file from the local file system" |
The failure shape is consistent: the agent reaches a required <input type="file">, correctly recognizes that fill rejects file inputs (by Playwright's design, page.fill() errors on file inputs), tries click on the picker button, gets no interactable dialog, and aborts. The judge classifies all 9 as agent_gave_up_early.
Proposed solution (matches #436 section C)
    upload_file: tool({
      description:
        "Upload a file to a file input element. The file path must be a local filesystem path " +
        "or a URL that the agent has been authorized to fetch.",
      inputSchema: z.object({
        ref: z.string().describe("Element reference of the file input (or its container)"),
        path: z.string().describe("Local file path or pre-authorized URL"),
      }),
      execute: async ({ ref, path }) => {
        return performActionWithValidation(PageAction.UploadFile, context, ref, path);
      },
    }),
Implementation: Playwright's locator.setInputFiles(path). Per #436, gated behind WebAgentOptions.allowFileUpload?: { allowedPaths?: string[] }. Default: disabled.
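One way the allowedPaths gate could work is sketched below. This is only an illustration under stated assumptions: the helper name isUploadPathAllowed and the AllowFileUpload interface are hypothetical, allowedPaths is assumed to list directories, and an omitted allowedPaths is assumed to allow nothing. The check is purely lexical (path.resolve does not follow symlinks), so a real implementation may also want to resolve symlinks before comparing.

```typescript
import * as path from "node:path";

// Hypothetical shape mirroring WebAgentOptions.allowFileUpload from the issue.
interface AllowFileUpload {
  allowedPaths?: string[];
}

// Returns true only when `candidate` resolves to a location strictly inside
// one of the allowed directories. Assumption: no options object means the
// feature is disabled, and a missing allowedPaths list allows nothing.
function isUploadPathAllowed(candidate: string, options?: AllowFileUpload): boolean {
  if (!options) return false; // default: disabled
  const roots = options.allowedPaths ?? [];
  const resolved = path.resolve(candidate);
  return roots.some((root) => {
    const rel = path.relative(path.resolve(root), resolved);
    // The candidate escapes the root if the relative path climbs upward
    // or lands on a different absolute location (e.g. another drive).
    const escapes =
      rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
    // rel === "" means the candidate is the root directory itself, not a file in it.
    return rel !== "" && !escapes;
  });
}
```

A tool implementation would presumably run a check like this before handing the path to locator.setInputFiles, and return a validation error to the agent when it fails.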
Secondary concern surfaced by the eval
Several tasks are worded as "complete the form, if needed create a file, then submit", and the agent's abort message frequently notes a second gap: no way to synthesize a source file. With upload_file alone (path required), the agent still can't satisfy these tasks unless there's an external source.
Two ways to address this without expanding the security surface much:
- Pre-staged fixture directory (recommended for evals): the runner provides a known /tmp/agent-fixtures/ (or similar) with sample.pdf, sample.png, sample.csv, etc., plus a manifest the agent can read. The agent picks an appropriate file for the upload. Zero file-synthesis surface in Pilo itself.
- URL-fetch path (already implied by #436's path accepting URLs): the agent fetches from a trusted URL list and feeds the temp path to upload_file. Same security model as web fetches.
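To make the pre-staged fixture option concrete, here is a minimal sketch of what a manifest and a selection helper might look like. The manifest format, the FixtureManifest and FixtureEntry types, and the pickFixture function are all hypothetical; the issue only says that a manifest exists. The matching follows the standard HTML accept attribute syntax (file extensions such as .pdf, wildcard types such as image/*, or exact MIME types).

```typescript
// Hypothetical manifest format; the issue does not specify one.
interface FixtureEntry {
  file: string;     // file name relative to the fixture root
  mimeType: string; // e.g. "application/pdf"
}

interface FixtureManifest {
  root: string;
  fixtures: FixtureEntry[];
}

// Checks one token of an <input accept="..."> list against a fixture.
function matchesAccept(entry: FixtureEntry, token: string): boolean {
  const t = token.trim().toLowerCase();
  if (t === "") return false;
  if (t.startsWith(".")) return entry.file.toLowerCase().endsWith(t); // ".pdf"
  if (t.endsWith("/*")) {
    // "image/*" matches any MIME type starting with "image/"
    return entry.mimeType.toLowerCase().startsWith(t.slice(0, -1));
  }
  return entry.mimeType.toLowerCase() === t; // exact MIME type
}

// Returns the path of the first fixture satisfying the accept list,
// the first fixture overall when no accept list is given, or undefined.
function pickFixture(manifest: FixtureManifest, accept?: string): string | undefined {
  const tokens = (accept ?? "")
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s !== "");
  const match =
    tokens.length === 0
      ? manifest.fixtures[0]
      : manifest.fixtures.find((e) => tokens.some((t) => matchesAccept(e, t)));
  return match ? `${manifest.root.replace(/\/+$/, "")}/${match.file}` : undefined;
}

// Example manifest using the file names mentioned in the issue.
const exampleManifest: FixtureManifest = {
  root: "/tmp/agent-fixtures/",
  fixtures: [
    { file: "sample.pdf", mimeType: "application/pdf" },
    { file: "sample.png", mimeType: "image/png" },
    { file: "sample.csv", mimeType: "text/csv" },
  ],
};
```

Whether this selection happens in the runner, in a tool, or in the agent's own reasoning over the manifest is exactly the "which side handles it" question; the sketch only shows that the lookup itself can stay small.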
Not asking this issue to resolve the synthesis gap — flagging it as a real-world finding so whoever picks up upload_file can think about which side handles it.
Why split from #436
#436 bundles four orthogonal tool additions (send_keys, screenshot, upload_file, dropdown_options). Each has independent value, independent implementation effort, and very different security profiles — upload_file is the only one with a non-trivial security surface (arbitrary file reads). Treating it as a standalone deliverable gives it the focus it needs, and the eval evidence above gives it strong prioritization signal vs. the others in #436.
Acceptance
- upload_file exists in webActionTools.ts, gated by a config option, with tests
- A re-run of make cloud-eval-browser-use-benchmark moves the 11 file-upload failures out of the agent_gave_up_early bucket (some may still fail for unrelated reasons, but not on "can't upload")