fix(cua): nest python gemini screenshots in FunctionResponse, drop unused openai dep #157

Merged
masnwilliams merged 1 commit into hypeship/unified-cua-template from hypeship/cua-fix-python-gemini-screenshots
May 5, 2026

@masnwilliams masnwilliams commented May 5, 2026

Summary

Two bugbot findings on commit `c684ca7`:

  1. Medium — Python Gemini provider sent screenshots as a separate `Part(inline_data=...)` entry in the user content after the `FunctionResponse` part. With multiple function calls per turn the model can't bind a screenshot to its originating call. The standalone `python/gemini-computer-use` template and the TS unified template both nest the screenshot as a `FunctionResponsePart` inside `FunctionResponse.parts`. This PR matches that structure and adds the predefined-actions allowlist that gates screenshot inclusion.

  2. Low — `openai` was listed in `pyproject.toml` but never imported. The OpenAI provider uses raw `httpx` against the Responses API. Removed.
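To illustrate why the SDK is not needed for the second finding, here is a minimal sketch of a raw HTTP request to the Responses API. It is not the template's code: the PR says the provider uses `httpx`, while this sketch swaps in the standard library's `urllib` so it runs without extra dependencies, and it only builds the request without sending it. The function name `build_responses_request` and the model string are hypothetical.

```python
import json
import urllib.request

RESPONSES_URL = "https://api.openai.com/v1/responses"


def build_responses_request(api_key: str, model: str, user_input: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the Responses API.

    Hypothetical helper for illustration; the real provider uses httpx.
    """
    body = json.dumps({"model": model, "input": user_input}).encode("utf-8")
    return urllib.request.Request(
        RESPONSES_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Usage: inspect the request that would be sent; nothing touches the network.
req = build_responses_request("sk-test", "example-model", "hello")
print(req.get_method(), req.full_url)
```

Since the whole exchange is a JSON body plus two headers, a plain HTTP client can cover it, which is consistent with the `openai` package never being imported.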

Test plan

  • Smoke-run the Python CUA template with Gemini against a multi-call turn and confirm each screenshot binds to its originating call
  • `uv sync` after dep change

Note

Medium Risk
Moderate risk because it changes the structure of Gemini tool-call response parts, which could affect how multi-call turns are interpreted by the model or SDK. Dependency removal is low risk but may impact downstream installs if they relied on the extra package.

Overview
Gemini Python CUA now nests screenshots inside each tool call response. Instead of sending a standalone `Part(inline_data=...)` after the `FunctionResponse`, screenshots are attached as `FunctionResponse.parts` (as `FunctionResponsePart`/`FunctionResponseBlob`) so multi-call turns can reliably associate images with the correct action; screenshot inclusion is gated by a `PREDEFINED_ACTIONS` allowlist.
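The nested shape can be sketched as follows. This is an illustration only, not the diff from this PR: the dataclasses are local stand-ins that mirror the type names mentioned above so the example runs without the Gemini SDK, `build_function_response` is a hypothetical helper, and the contents of `PREDEFINED_ACTIONS` are assumed, since the real allowlist is not shown here.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

# Assumed allowlist contents; the real PREDEFINED_ACTIONS is not shown in this PR.
PREDEFINED_ACTIONS = {"click_at", "type_text_at", "navigate"}


# Local stand-ins mirroring the SDK type names referenced in the PR.
@dataclass
class FunctionResponseBlob:
    mime_type: str
    data: bytes


@dataclass
class FunctionResponsePart:
    inline_data: FunctionResponseBlob


@dataclass
class FunctionResponse:
    name: str
    response: dict[str, Any]
    parts: list[FunctionResponsePart] = field(default_factory=list)


def build_function_response(
    name: str, result: dict[str, Any], screenshot_png: Optional[bytes]
) -> FunctionResponse:
    """Attach the screenshot to the response of the call that produced it."""
    parts: list[FunctionResponsePart] = []
    # Only allowlisted actions carry a screenshot.
    if name in PREDEFINED_ACTIONS and screenshot_png is not None:
        blob = FunctionResponseBlob(mime_type="image/png", data=screenshot_png)
        parts.append(FunctionResponsePart(inline_data=blob))
    return FunctionResponse(name=name, response=result, parts=parts)


# Usage: two calls in one turn, each response carrying its own screenshot (or none).
turn = [
    build_function_response("click_at", {"url": "https://example.com"}, b"png-bytes-1"),
    build_function_response("custom_tool", {"ok": True}, b"png-bytes-2"),
]
for fr in turn:
    print(fr.name, len(fr.parts))
```

Because each image lives inside the `FunctionResponse` for its own call, the ordering of parts in the user content no longer has to carry the association.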

Template deps cleanup. Removes the unused `openai` dependency from `pyproject.toml`.

Reviewed by Cursor Bugbot for commit ee48a5c.

…enai dep

- Python gemini provider was sending screenshots as separate Part(inline_data=...)
  entries after the FunctionResponse part. With multiple function calls per turn
  the model can't bind a screenshot to its originating call. Match the standalone
  gemini-computer-use template (and the TS unified template) by nesting the
  screenshot as a FunctionResponsePart inside FunctionResponse.parts, gated on
  the predefined-actions allowlist.
- Drop openai from pyproject.toml — provider uses httpx directly against the
  Responses API; the SDK was never imported.
@masnwilliams masnwilliams marked this pull request as ready for review May 5, 2026 22:55
@firetiger-agent

Firetiger deploy monitoring skipped

This PR didn't match the auto-monitor filter configured on your GitHub connection:

Any PR that changes the kernel API. Monitor changes to API endpoints (packages/api/cmd/api/) and Temporal workflows (packages/api/lib/temporal) in the kernel repo

Reason: PR modifies Python Gemini provider and dependencies in the CUA template, not kernel API endpoints or Temporal workflows as specified in the filter.

To monitor this PR anyway, reply with @firetiger monitor this.

@masnwilliams masnwilliams merged commit 51b69fb into hypeship/unified-cua-template May 5, 2026
3 checks passed
@masnwilliams masnwilliams deleted the hypeship/cua-fix-python-gemini-screenshots branch May 5, 2026 22:56