Add image and audio to prompt/exec #3

Merged
vaibhavpandeyvpz merged 1 commit into one710:main from vaibhavqrcg:VPZ/support-for-file-prompts
Mar 16, 2026
Conversation

@vaibhavqrcg
Contributor

Adds support for multiple prompt content types in prompt and exec: text (existing), image, and audio. Users can pass files with --image and --audio (repeatable). Content is sent as ACP content blocks (text, image, audio) in session/prompt requests.

Usage

  • Flags: --image <path>, --audio <path> (global; repeatable). Must appear before the command.
  • Examples:
    • codeye --image screenshot.png exec "what is shown here?"
    • codeye --image a.png --image b.png cursor prompt <session-id> "compare these"
    • codeye --audio meeting.wav exec "transcribe and list action items"
  • Order: the text block comes first (from positional args), then image blocks in the order given, then audio blocks. Image and audio files are read and sent as base64, with the MIME type inferred from the file extension.

Implementation

ACP

  • PromptTextPart replaced by PromptPart with Type, Text, MimeType, Data (base64).
  • TextPrompt(text) []PromptPart helper for text-only prompts.
  • SessionPromptRequest.Prompt is now []PromptPart.
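
To make the new shape concrete, here is a minimal sketch of what PromptPart and TextPrompt could look like. The field names Type, Text, MimeType, and Data come from the list above; the JSON tags, the SessionID field, and the encodeRequest helper are assumptions for illustration and may not match the actual acp package.

```go
package main

import "encoding/json"

// PromptPart is one content block in a session/prompt request.
// Field names follow the PR description; the JSON tags are assumed.
type PromptPart struct {
	Type     string `json:"type"`               // "text", "image", or "audio"
	Text     string `json:"text,omitempty"`     // set for text blocks
	MimeType string `json:"mimeType,omitempty"` // set for image/audio blocks
	Data     string `json:"data,omitempty"`     // base64 payload for image/audio
}

// TextPrompt wraps plain text in a single-element parts list,
// which is how text-only call sites can stay unchanged in spirit.
func TextPrompt(text string) []PromptPart {
	return []PromptPart{{Type: "text", Text: text}}
}

// SessionPromptRequest carries the parts list. SessionID and its
// tag are assumptions; only Prompt is named in the PR description.
type SessionPromptRequest struct {
	SessionID string       `json:"sessionId"`
	Prompt    []PromptPart `json:"prompt"`
}

// encodeRequest is a hypothetical helper that shows the wire form.
func encodeRequest(sessionID string, parts []PromptPart) (string, error) {
	b, err := json.Marshal(SessionPromptRequest{SessionID: sessionID, Prompt: parts})
	return string(b), err
}
```

Under these assumptions, a text-only prompt serializes to a one-element prompt array, and image or audio blocks add mimeType and data while leaving text empty.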

Client & runtime

  • Client.Prompt(ctx, sessionID, parts []acp.PromptPart) now takes a parts list; all session runtime prompt/exec entrypoints take []acp.PromptPart instead of a single string.
  • Existing call sites use acp.TextPrompt("...") for backward compatibility.

CLI

  • internal/cli/prompt_parts.go: BuildPromptParts(text, imagePaths, audioPaths) builds the parts list, reads files, infers MIME type, and base64-encodes.
  • Supported: images .png, .jpg, .jpeg, .gif, .webp; audio .wav, .mp3, .mpeg, .ogg, .flac, .m4a.
  • dispatch for prompt and exec builds parts from positional text plus flags.AudioPaths and flags.ImagePaths.
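
The flow described for BuildPromptParts can be sketched roughly as follows. This is not the code in internal/cli/prompt_parts.go: the return type, the error messages, the filePart helper, and the exact MIME strings chosen for each extension are assumptions. Only the function name, its three inputs, the supported extensions, and the text, image, audio ordering come from the description.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// PromptPart mirrors the fields named in the PR description.
type PromptPart struct {
	Type, Text, MimeType, Data string
}

// Extension-to-MIME tables. The extensions are from the PR;
// the MIME strings are assumed, conventional choices.
var imageMIME = map[string]string{
	".png": "image/png", ".jpg": "image/jpeg", ".jpeg": "image/jpeg",
	".gif": "image/gif", ".webp": "image/webp",
}

var audioMIME = map[string]string{
	".wav": "audio/wav", ".mp3": "audio/mpeg", ".mpeg": "audio/mpeg",
	".ogg": "audio/ogg", ".flac": "audio/flac", ".m4a": "audio/mp4",
}

// filePart (hypothetical helper) infers the MIME type from the
// extension, reads the file, and base64-encodes its contents.
func filePart(kind, path string, table map[string]string) (PromptPart, error) {
	ext := strings.ToLower(filepath.Ext(path))
	mimeType, ok := table[ext]
	if !ok {
		return PromptPart{}, fmt.Errorf("unsupported %s extension %q", kind, ext)
	}
	raw, err := os.ReadFile(path)
	if err != nil {
		return PromptPart{}, fmt.Errorf("read %s file: %w", kind, err)
	}
	data := base64.StdEncoding.EncodeToString(raw)
	return PromptPart{Type: kind, MimeType: mimeType, Data: data}, nil
}

// BuildPromptParts emits the text block first, then images in
// flag order, then audio, matching the order stated under Usage.
func BuildPromptParts(text string, imagePaths, audioPaths []string) ([]PromptPart, error) {
	parts := []PromptPart{{Type: "text", Text: text}}
	for _, p := range imagePaths {
		part, err := filePart("image", p, imageMIME)
		if err != nil {
			return nil, err
		}
		parts = append(parts, part)
	}
	for _, p := range audioPaths {
		part, err := filePart("audio", p, audioMIME)
		if err != nil {
			return nil, err
		}
		parts = append(parts, part)
	}
	return parts, nil
}
```

In this sketch the extension check runs before the file is read, so an unsupported file type fails fast without touching the disk. Whether the real implementation orders its checks this way is not stated in the PR.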

Queue

  • Request.PromptParts []acp.PromptPart (optional). When present, the server uses it; otherwise it falls back to Prompt as a single text part (backward compatible).
  • Handler.Prompt(ctx, sessionID, parts []acp.PromptPart) updated; queue tests adjusted.
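
The backward-compatible fallback in the queue can be pictured with a small sketch. The Request type here shows only the two fields named above, and effectiveParts is a hypothetical helper name; the real server code may structure this differently.

```go
package main

// PromptPart mirrors the fields named in the PR description.
type PromptPart struct {
	Type, Text, MimeType, Data string
}

// Request shows only the two fields relevant to the fallback.
type Request struct {
	Prompt      string       // legacy text-only prompt
	PromptParts []PromptPart // optional; preferred when non-empty
}

// effectiveParts (hypothetical) returns PromptParts when the
// client sent them, and otherwise wraps Prompt as one text part.
func effectiveParts(r Request) []PromptPart {
	if len(r.PromptParts) > 0 {
		return r.PromptParts
	}
	return []PromptPart{{Type: "text", Text: r.Prompt}}
}
```

With this shape, an older client that only sets Prompt and a newer client that sets PromptParts both reach the handler as a parts list.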

Docs

  • README: --audio / --image in global options and a short usage note.
  • SKILL.md: Same options plus an “Image and audio in prompt/exec” section with examples (one/many image, one/many audio, image+audio, for both exec and prompt).

Notes

  • Agents must advertise image and/or audio prompt capabilities in initialization for those blocks to be accepted (per ACP).
  • All tests updated and passing; existing behavior (text-only) unchanged.
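
The capability note above suggests a client-side guard along the following lines. The PR does not say whether codeye performs such a check, so treat this purely as an illustration: PromptCapabilities and checkParts are hypothetical names, and the Image and Audio flags are assumed to correspond to what the agent reports during initialization.

```go
package main

import "fmt"

// PromptPart mirrors the fields named in the PR description.
type PromptPart struct {
	Type, Text, MimeType, Data string
}

// PromptCapabilities is an assumed view of what the agent
// advertised at initialization for prompt content types.
type PromptCapabilities struct {
	Image bool
	Audio bool
}

// checkParts (hypothetical) rejects image or audio blocks the
// agent did not advertise support for. Text is always allowed.
func checkParts(caps PromptCapabilities, parts []PromptPart) error {
	for i, p := range parts {
		switch p.Type {
		case "text":
			// Text blocks need no capability.
		case "image":
			if !caps.Image {
				return fmt.Errorf("part %d: agent did not advertise image prompts", i)
			}
		case "audio":
			if !caps.Audio {
				return fmt.Errorf("part %d: agent did not advertise audio prompts", i)
			}
		default:
			return fmt.Errorf("part %d: unknown content type %q", i, p.Type)
		}
	}
	return nil
}
```

A check like this would surface a clear local error instead of relying on the agent to reject the request.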

@vaibhavpandeyvpz merged commit 359a3a2 into one710:main on Mar 16, 2026
1 check passed

2 participants