fix(tools): use /images/generations endpoint for Gemini and OpenAI by xthanhn91 · Pull Request #9 · nextlevelbuilder/goclaw

xthanhn91 · 2026-02-27T14:59:29Z

Summary

- create_image exclusively used /chat/completions with modalities:["image","text"], which only works on OpenRouter.
- Gemini returns HTTP 400: "Image generation is not yet supported on the chat.completions endpoint for this model".
- OpenAI DALL-E models also require /images/generations, not /chat/completions.
- Route OpenRouter → /chat/completions (supports modalities), all other providers → /images/generations.
- Update the default Gemini model from the deprecated gemini-2.0-flash-exp to gemini-2.5-flash-image.

Root Cause

The callImageGenAPI function was designed for OpenRouter's modalities-based chat completions format but was used for all providers. Only OpenRouter supports that format; Gemini and OpenAI both require the standard /images/generations endpoint.

Gemini error response:

{
  "error": {
    "code": 400,
    "message": "Image generation is not yet supported on the chat.completions endpoint for this model. Please use the standard client.images.generate method for creation"
  }
}
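
For illustration, a caller could recognise this specific failure by decoding the error envelope shown above. This is a minimal sketch, not code from the PR: the `apiError` type and `isChatImageUnsupported` helper are hypothetical names, and matching on message text is brittle and shown only to make the failure mode concrete.

```go
package main

import (
	"encoding/json"
	"strings"
)

// apiError mirrors the error envelope quoted in the PR description.
// The type name is hypothetical; only the JSON field names come from the source.
type apiError struct {
	Error struct {
		Code    int    `json:"code"`
		Message string `json:"message"`
	} `json:"error"`
}

// isChatImageUnsupported reports whether body is a 400 error stating that
// image generation is not available on the chat completions endpoint.
func isChatImageUnsupported(body []byte) bool {
	var e apiError
	if err := json.Unmarshal(body, &e); err != nil {
		return false // not JSON, so not the error we are looking for
	}
	return e.Error.Code == 400 &&
		strings.Contains(e.Error.Message, "not yet supported on the chat.completions endpoint")
}
```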

Fix

Added callStandardImageGenAPI using the /images/generations endpoint with response_format:"b64_json" — the standard OpenAI-compatible image generation format supported by Gemini, OpenAI, and most other providers.

| Provider | Endpoint | Status |
| --- | --- | --- |
| OpenRouter | `/chat/completions` + modalities | ✅ (unchanged) |
| Gemini | `/images/generations` + b64_json | ✅ (fixed) |
| OpenAI | `/images/generations` + b64_json | ✅ (fixed) |
Test plan

- Test with Gemini provider: create_image with gemini-2.5-flash-image
- Test with OpenRouter provider: verify the modalities-based flow still works
- Test with OpenAI provider (if available): verify DALL-E image generation
- Verify that a MEDIA: path is returned and the image is delivered to the channel

…age gen

create_image exclusively used /chat/completions with modalities:["image","text"], which only works on OpenRouter. Gemini returns HTTP 400: "Image generation is not yet supported on the chat.completions endpoint". OpenAI's DALL-E models also require /images/generations, not /chat/completions.

Fix: route OpenRouter through /chat/completions (supports modalities), route all other providers (Gemini, OpenAI, etc.) through the standard /images/generations endpoint with response_format:"b64_json". Also update default Gemini model from deprecated gemini-2.0-flash-exp to gemini-2.5-flash-image.

Sprint 0: Security hardening before feature development.

HIGH fixes:
- #1: Whitelist table names in execMapUpdate(); prevents SQL injection via dynamic table name (store/pg/helpers.go)
- #2: Log invalid groupBy values in snapshot queries (store/pg/snapshot.go)
- #3: Validated shellEscape(): single-quote wrapping is correct; added PBT tests for shell injection (tools/dynamic_tool_security_test.go)

MEDIUM fixes:
- #4-5: Log security warnings for no-token and viewer-fallback auth (gateway/router.go)
- #6: Restrict CORS on OpenAPI endpoint; removed wildcard, allow only localhost origins (http/openapi.go)
- #7: Add CheckSSRFWithPinning() for DNS rebinding TOCTOU prevention (tools/web_shared.go)
- #8: Log warning when TLS verification is disabled (tracing/otelexport/exporter.go)
- #9: Pin all Python package versions in Dockerfile; prevents supply chain attacks via unpinned dependencies
- #10: Change HOME fallback from /tmp to /app; prevents temp dir abuse (tools/credentialed_exec.go)

Also fixes arargoclaw double-rename bug in 356 Go import paths.

Tests: PBT tests for table whitelist and shell escaping (testing/quick).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
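
Two of the HIGH fixes above rely on well-known techniques that can be sketched briefly: a table-name whitelist and POSIX single-quote wrapping. This is not the code from that commit. The names `allowedTables`, `checkTableName`, and `shellQuote` and the example table names are hypothetical, and the real shellEscape() may differ in detail.

```go
package main

import (
	"fmt"
	"strings"
)

// allowedTables is a hypothetical whitelist; the commit message does not
// list the real table names.
var allowedTables = map[string]bool{"agents": true, "sessions": true}

// checkTableName rejects any table name that is not explicitly whitelisted,
// so a caller-supplied name can never be spliced into SQL unchecked.
func checkTableName(table string) error {
	if !allowedTables[table] {
		return fmt.Errorf("table %q is not in the whitelist", table)
	}
	return nil
}

// shellQuote wraps s in single quotes for a POSIX shell. Each embedded
// single quote is replaced by '\'' (close quote, escaped quote, reopen).
func shellQuote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
}
```

Inside single quotes a POSIX shell treats every character literally, which is why only the quote character itself needs special handling.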

9 checkpoint documents covering the upgrade from 43% to ~85% pattern matching with Claude Code's architectural patterns.

Checkpoints:
- CP-00: Current state analysis
- CP-01: Context defense 5 layers (Pattern nextlevelbuilder#9)
- CP-02: Concurrency-safe partitioning (Pattern nextlevelbuilder#4)
- CP-03: Streaming tool execution (Pattern nextlevelbuilder#5)
- CP-04: Escalating recovery (Pattern nextlevelbuilder#3)
- CP-05: Context modifier chain + fork isolation (Patterns nextlevelbuilder#6, nextlevelbuilder#8)
- CP-06: Permission classification pipeline (Pattern nextlevelbuilder#10)
- CP-07: Skill system upgrade (Patterns nextlevelbuilder#11-13)
- CP-08: Plugin ecosystem (Patterns nextlevelbuilder#14-16)

Based on analysis from "Giai phau mot Agentic Operating System" (Vietnamese: "Anatomy of an Agentic Operating System"; 18 patterns from 513K LOC Claude Code source).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Phase 4: final phase of the TTS params/layout/agent-override plan. Adds a 3-key allow-list (`speed`, `emotion`, `style`) per agent stored in `agents.other_config.tts_params`. Backend resolves and merges into `opts.Params` PER ATTEMPT inside the fallback loop so each provider sees its own native shape, never the primary's keys when fallback runs (Finding #1 critical).

Backend:
- `AgentOverridable bool` on `audio.ParamSchema`. UI filter reads this flag from /v1/tts/capabilities; there is no separate TS literal mirror, so the capabilities API is the single source of truth (Finding #9).
- `audio.AdaptAgentParams(generic, provider)` maps the 3 generic keys to provider-native paths (e.g. `speed` → `voice_settings.speed` for ElevenLabs, flat `speed` for OpenAI/MiniMax, dropped for Edge/Gemini).
- `Manager.SynthesizeWithFallbackAdapted` adapts inside the loop so fallback providers receive correctly-shaped params.
- `manager_auto.go` and `tools/tts.go` Execute do per-attempt adaptation on the tenant + direct + fallback call sites.
- Drop log bumped to `slog.Info("tts.agent.params.dropped", ...)` for audit trail when a generic key isn't supported by the active provider.
- Cross-check test asserts every adapter switch case has at least one capability ParamSchema with `AgentOverridable: true`, and vice versa.

Security (red-team findings):
- Allow-list ENFORCED at write path: `validateAgentTTSParams` in HTTP `handleUpdate` AND WS `agents_update` rejects any `tts_params` key outside `{speed, emotion, style}` (Finding #5).
- 64KB body cap on agent PUT via `http.MaxBytesReader` (Finding #6).
- Explicit tenant-scope guard after `agents.GetByID` (Finding #12).
- Concurrent-tab clobber: handleSave merges `tts_params` into a fresh copy of `otherConfig` rather than reusing stale state (Finding #13).
- Rate-limit verified: RoleAdmin gate sufficient for v1 (Finding #15).

Frontend (web + desktop):
- `TtsOverrideBlock` rewritten: filters capability params to `agent_overridable === true`, renders via `DynamicParamForm`. Hides entirely for providers with no overridable params (Edge, Gemini).
- Bidirectional adapter (generic ↔ capability-native form state) so agent storage stays in generic keys while UI works in native paths. 25 round-trip tests cover all 5 providers.
- Desktop `AgentDetailPanel` gains an inline fine-tune section gated on `globalProvider`, reusing the desktop `DynamicParamForm`.

i18n: `tts.override.params.title` ("Fine-tune") added to web + desktop en/vi/zh.

Tests: all 9 backend suites green (race), web 214/214, desktop build clean, both Go build tags pass.
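
The allow-list check and the per-provider adaptation described in this commit message can be sketched roughly as below. This is a hedged illustration, not the repository's `validateAgentTTSParams` or `audio.AdaptAgentParams`: the function names and provider identifier strings are hypothetical, and only the `speed` mapping is shown because the message does not state how `emotion` and `style` map.

```go
package main

import (
	"fmt"
	"sort"
)

// allowedTTSKeys is the 3-key allow-list named in the commit message.
var allowedTTSKeys = map[string]bool{"speed": true, "emotion": true, "style": true}

// validateTTSParamKeys rejects any key outside the allow-list.
func validateTTSParamKeys(params map[string]any) error {
	var bad []string
	for k := range params {
		if !allowedTTSKeys[k] {
			bad = append(bad, k)
		}
	}
	if len(bad) == 0 {
		return nil
	}
	sort.Strings(bad) // deterministic error text
	return fmt.Errorf("unsupported tts_params keys: %v", bad)
}

// adaptSpeedParam maps the generic "speed" key to a provider-native shape.
// Provider identifiers are assumed strings; unknown providers drop the key.
func adaptSpeedParam(generic map[string]any, provider string) map[string]any {
	out := map[string]any{}
	speed, ok := generic["speed"]
	if !ok {
		return out
	}
	switch provider {
	case "elevenlabs":
		out["voice_settings"] = map[string]any{"speed": speed} // nested path
	case "openai", "minimax":
		out["speed"] = speed // flat key
	}
	return out
}
```

Calling the adapter once per attempt inside a fallback loop, with the provider of that attempt, is what keeps a fallback provider from receiving the primary provider's key shape.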

Post-review cleanup of Phase 4. Closes Finding #9 properly and corrects the Finding #13 documentation lie surfaced in the code-review report.

Capability schema:
- Replace `AgentOverridable bool` with `AgentOverridableAs string` on ParamSchema. Empty string = not overridable; non-empty = the generic key alias (`"speed"`, `"emotion"`, `"style"`).
- Each provider declaration now carries the alias inline, so the generic↔native mapping has a single TS-readable source.

Frontend:
- Web `tts-override-block.tsx` drops the inline `GENERIC_TO_NATIVE` literal and derives the bidirectional adapter from the filtered capability params (each param self-describes its alias). Adapter tests rewritten around the new shape.
- Desktop `AgentDetailPanel.tsx` drops the 45-line inline IIFE in favour of a new `<TtsOverrideFineTune>` component that uses the same alias-based mapping.

Backend:
- Move `AgentTTSParamsAllowedKeys` + `ValidateAgentTTSParams` to `internal/audio/agent_params_adapter.go`. HTTP `validate.go` and WS `gateway/methods/agents_update.go` both delegate, eliminating the duplicated `{speed, emotion, style}` literal.

Cleanup:
- Delete orphan i18n keys `MsgTtsParamInvalidJSON` and `MsgTtsParamDependsOn` from `keys.go` + en/vi/zh catalogs (no in-code references; DependsOn is FE-only, JSON parse failures already surface via slog).

Documentation:
- `prompt-settings-section.tsx` Finding #13 comment rewritten to honestly describe the best-effort merge into a fresh local copy of the cached `otherConfig` prop. Concurrent-tab clobber remains possible; a server-side JSON-merge-patch endpoint is planned for v2.

Tests: 9 backend suites (race), web 217/217, desktop build clean, both Go build tags pass.
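
The alias-based mapping can be sketched as deriving a generic-to-native lookup directly from the schema entries. This is a minimal sketch under assumptions: the `paramSchema` type, its `Key` field, and `genericToNative` are hypothetical names, and only `AgentOverridableAs` and its empty-string meaning come from the commit message.

```go
package main

// paramSchema is a trimmed, hypothetical stand-in for the capability schema.
// Key is the provider-native parameter path; AgentOverridableAs is the
// generic alias, with the empty string meaning "not overridable".
type paramSchema struct {
	Key                string
	AgentOverridableAs string
}

// genericToNative derives the generic-key → native-path mapping from the
// schema itself, so no separate hand-maintained literal is needed.
func genericToNative(params []paramSchema) map[string]string {
	m := make(map[string]string)
	for _, p := range params {
		if p.AgentOverridableAs != "" {
			m[p.AgentOverridableAs] = p.Key
		}
	}
	return m
}
```

Inverting the resulting map gives the native-to-generic direction, which is the sense in which the adapter described above is bidirectional.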

xthanhn91 force-pushed the fix/create-image-provider-endpoints branch from 5124857 to 6b0fadb on February 27, 2026 15:18

viettranx merged commit 370c290 into nextlevelbuilder:main Feb 28, 2026

MiltonSilvaJr mentioned this pull request Mar 22, 2026

security: fix 10 AppSec audit findings (3 HIGH, 7 MEDIUM) vellus-ai/argoclaw#1

Merged

viettranx merged 1 commit into nextlevelbuilder:main from xthanhn91:fix/create-image-provider-endpoints
