fix(tools): use /images/generations endpoint for Gemini and OpenAI #9

Merged
viettranx merged 1 commit into nextlevelbuilder:main from xthanhn91:fix/create-image-provider-endpoints
Feb 28, 2026

@xthanhn91
Contributor

Summary

  • create_image exclusively used /chat/completions with modalities:["image","text"], which only works on OpenRouter
  • Gemini returns HTTP 400: "Image generation is not yet supported on the chat.completions endpoint for this model"
  • OpenAI DALL-E models also require /images/generations, not /chat/completions
  • Route OpenRouter → /chat/completions (supports modalities), all other providers → /images/generations
  • Update default Gemini model from deprecated gemini-2.0-flash-exp to gemini-2.5-flash-image

Root Cause

The callImageGenAPI function was written for OpenRouter's modalities-based chat completions format but was used for every provider. Only OpenRouter supports that format; Gemini and OpenAI both require the standard /images/generations endpoint.
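
The routing rule from the summary can be sketched as a small helper. This is a minimal illustration, not the PR's code: `imageEndpointFor` and the lowercase provider identifiers are hypothetical names chosen for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// imageEndpointFor returns the API path to call for a provider.
// Hypothetical helper: the PR's actual routing around callImageGenAPI and
// callStandardImageGenAPI is not shown in this conversation.
func imageEndpointFor(provider string) string {
	if strings.EqualFold(provider, "openrouter") {
		// OpenRouter accepts modalities:["image","text"] on chat completions.
		return "/chat/completions"
	}
	// Gemini, OpenAI, and all other providers use the standard image endpoint.
	return "/images/generations"
}

func main() {
	for _, p := range []string{"openrouter", "gemini", "openai"} {
		fmt.Printf("%s -> %s\n", p, imageEndpointFor(p))
	}
}
```

Defaulting every non-OpenRouter provider to /images/generations matches the summary's "all other providers" wording, so a newly added provider lands on the standard endpoint unless it is special-cased.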

Gemini error response:

{
  "error": {
    "code": 400,
    "message": "Image generation is not yet supported on the chat.completions endpoint for this model. Please use the standard client.images.generate method for creation"
  }
}
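
For reference, an error body of this shape can be decoded as follows. This is a sketch that assumes the response always matches the JSON shown above; `apiError` and `parseAPIError` are hypothetical names, not identifiers from the PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// apiError mirrors the error body shown above (assumed shape).
type apiError struct {
	Error struct {
		Code    int    `json:"code"`
		Message string `json:"message"`
	} `json:"error"`
}

// parseAPIError extracts the status code and message from an error body.
func parseAPIError(body []byte) (int, string, error) {
	var e apiError
	if err := json.Unmarshal(body, &e); err != nil {
		return 0, "", err
	}
	return e.Error.Code, e.Error.Message, nil
}

func main() {
	body := []byte(`{"error":{"code":400,"message":"Image generation is not yet supported on the chat.completions endpoint for this model."}}`)
	code, msg, err := parseAPIError(body)
	fmt.Println(code, msg, err)
}
```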

Fix

Added callStandardImageGenAPI, which calls the /images/generations endpoint with response_format:"b64_json". This is the standard OpenAI-compatible image generation format supported by Gemini, OpenAI, and most other providers.

| Provider | Endpoint | Status |
|---|---|---|
| OpenRouter | /chat/completions + modalities | ✅ (unchanged) |
| Gemini | /images/generations + b64_json | ✅ (fixed) |
| OpenAI | /images/generations + b64_json | ✅ (fixed) |
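
A minimal sketch of the request and response handling for the /images/generations path, assuming the OpenAI-compatible shape (a `response_format` field in the request, and a `data` array whose items carry `b64_json`). The type and function names are hypothetical; the real callStandardImageGenAPI signature is not shown in this PR description.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
)

// imageGenRequest is the JSON body sent to /images/generations (assumed fields).
type imageGenRequest struct {
	Model          string `json:"model"`
	Prompt         string `json:"prompt"`
	ResponseFormat string `json:"response_format"`
}

// imageGenResponse holds the part of the response this sketch reads.
type imageGenResponse struct {
	Data []struct {
		B64JSON string `json:"b64_json"`
	} `json:"data"`
}

// buildImageGenBody encodes a request that asks for base64 image data.
func buildImageGenBody(model, prompt string) ([]byte, error) {
	return json.Marshal(imageGenRequest{
		Model:          model,
		Prompt:         prompt,
		ResponseFormat: "b64_json",
	})
}

// decodeImage returns the raw bytes of the first image in the response.
func decodeImage(respBody []byte) ([]byte, error) {
	var r imageGenResponse
	if err := json.Unmarshal(respBody, &r); err != nil {
		return nil, err
	}
	if len(r.Data) == 0 || r.Data[0].B64JSON == "" {
		return nil, errors.New("no image data in response")
	}
	return base64.StdEncoding.DecodeString(r.Data[0].B64JSON)
}

func main() {
	body, _ := buildImageGenBody("gemini-2.5-flash-image", "a red kite")
	fmt.Println(string(body))

	// Stand-in response; a real one would carry PNG or JPEG bytes.
	img, err := decodeImage([]byte(`{"data":[{"b64_json":"aGVsbG8="}]}`))
	fmt.Println(len(img), err)
}
```

The HTTP call itself is left out so the example stays self-contained; the body from `buildImageGenBody` would be POSTed to the provider's base URL plus /images/generations.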

Test plan

  • Test with Gemini provider: create_image with gemini-2.5-flash-image
  • Test with OpenRouter provider: verify modalities-based flow still works
  • Test with OpenAI provider (if available): verify DALL-E image generation
  • Verify MEDIA: path returned and image delivered to channel

…age gen

create_image exclusively used /chat/completions with modalities:["image","text"]
which only works on OpenRouter. Gemini returns HTTP 400:
  "Image generation is not yet supported on the chat.completions endpoint"
OpenAI's DALL-E models also require /images/generations, not /chat/completions.

Fix: route OpenRouter through /chat/completions (supports modalities),
route all other providers (Gemini, OpenAI, etc.) through the standard
/images/generations endpoint with response_format:"b64_json".

Also update default Gemini model from deprecated gemini-2.0-flash-exp
to gemini-2.5-flash-image.
xthanhn91 force-pushed the fix/create-image-provider-endpoints branch from 5124857 to 6b0fadb on February 27, 2026 15:18
viettranx merged commit 370c290 into nextlevelbuilder:main on Feb 28, 2026
MiltonSilvaJr referenced this pull request in vellus-ai/argoclaw Mar 22, 2026
Sprint 0 — Security hardening before feature development.

HIGH fixes:
- #1: Whitelist table names in execMapUpdate() — prevents SQL injection
  via dynamic table name (store/pg/helpers.go)
- #2: Log invalid groupBy values in snapshot queries (store/pg/snapshot.go)
- #3: Validated shellEscape() — single-quote wrapping is correct;
  added PBT tests for shell injection (tools/dynamic_tool_security_test.go)

MEDIUM fixes:
- #4-5: Log security warnings for no-token and viewer-fallback auth
  (gateway/router.go)
- #6: Restrict CORS on OpenAPI endpoint — removed wildcard, allow only
  localhost origins (http/openapi.go)
- #7: Add CheckSSRFWithPinning() for DNS rebinding TOCTOU prevention
  (tools/web_shared.go)
- #8: Log warning when TLS verification is disabled
  (tracing/otelexport/exporter.go)
- #9: Pin all Python package versions in Dockerfile — prevents
  supply chain attacks via unpinned dependencies
- #10: Change HOME fallback from /tmp to /app — prevents temp dir
  abuse (tools/credentialed_exec.go)

Also fixes arargoclaw double-rename bug in 356 Go import paths.

Tests: PBT tests for table whitelist and shell escaping (testing/quick).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
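
Two of the HIGH fixes above, table-name whitelisting and single-quote shell escaping, can be illustrated with a short sketch. This is not the referenced repository's code: `checkTableName`, `quoteForShell`, and the table names are hypothetical stand-ins for execMapUpdate()'s whitelist and shellEscape().

```go
package main

import (
	"fmt"
	"strings"
)

// allowedTables lists the only table names that may be interpolated into SQL.
// The names here are placeholders for illustration.
var allowedTables = map[string]bool{"agents": true, "sessions": true}

// checkTableName rejects any table name that is not explicitly whitelisted.
func checkTableName(name string) error {
	if !allowedTables[name] {
		return fmt.Errorf("table %q is not whitelisted", name)
	}
	return nil
}

// quoteForShell wraps s in single quotes for a POSIX shell. Each embedded
// single quote closes the quoted string, adds an escaped quote, and reopens it.
func quoteForShell(s string) string {
	return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
}

func main() {
	fmt.Println(checkTableName("agents"))
	fmt.Println(checkTableName("agents; DROP TABLE agents"))
	fmt.Println(quoteForShell("it's; rm -rf /"))
}
```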
blackbirdzzzz365-gif pushed a commit to blackbirdzzzz365-gif/goclaw that referenced this pull request Apr 12, 2026
9 checkpoint documents covering the upgrade from 43% to ~85% pattern
matching with Claude Code's architectural patterns.

Checkpoints:
- CP-00: Current state analysis
- CP-01: Context defense 5 layers (Pattern #9)
- CP-02: Concurrency-safe partitioning (Pattern #4)
- CP-03: Streaming tool execution (Pattern #5)
- CP-04: Escalating recovery (Pattern #3)
- CP-05: Context modifier chain + fork isolation (Patterns #6, #8)
- CP-06: Permission classification pipeline (Pattern #10)
- CP-07: Skill system upgrade (Patterns #11-13)
- CP-08: Plugin ecosystem (Patterns #14-16)

Based on analysis from "Giai phau mot Agentic Operating System"
(18 patterns from 513K LOC Claude Code source).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
viettranx added a commit that referenced this pull request Apr 20, 2026
Phase 4 — final phase of the TTS params/layout/agent-override plan.

Adds a 3-key allow-list (`speed`, `emotion`, `style`) per agent stored
in `agents.other_config.tts_params`. Backend resolves and merges into
`opts.Params` PER ATTEMPT inside the fallback loop so each provider
sees its own native shape — never the primary's keys when fallback
runs (Finding #1 critical).

Backend:
- `AgentOverridable bool` on `audio.ParamSchema`. UI filter reads this
  flag from /v1/tts/capabilities; no separate TS literal mirror —
  capabilities API is the single source of truth (Finding #9).
- `audio.AdaptAgentParams(generic, provider)` maps the 3 generic keys
  to provider-native paths (e.g. `speed` → `voice_settings.speed` for
  ElevenLabs, flat `speed` for OpenAI/MiniMax, dropped for Edge/Gemini).
- `Manager.SynthesizeWithFallbackAdapted` adapts inside the loop so
  fallback providers receive correctly-shaped params.
- `manager_auto.go` and `tools/tts.go` Execute do per-attempt adaptation
  on the tenant + direct + fallback call sites.
- Drop log bumped to `slog.Info("tts.agent.params.dropped", ...)` for
  audit trail when a generic key isn't supported by the active provider.
- Cross-check test asserts every adapter switch case has at least one
  capability ParamSchema with `AgentOverridable: true`, and vice versa.
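
The per-provider mapping described for `speed` can be sketched as below. This is an assumption-laden illustration: the real `audio.AdaptAgentParams` signature is not shown in the commit message, the provider identifiers are guesses, and only `speed` is covered because the message does not spell out the mappings for `emotion` and `style`.

```go
package main

import "fmt"

// adaptSpeed maps the generic "speed" key to a provider-native shape and
// reports keys it had to drop. Hypothetical helper; provider identifiers
// are assumed to be lowercase names.
func adaptSpeed(generic map[string]any, provider string) (native map[string]any, dropped []string) {
	native = map[string]any{}
	speed, ok := generic["speed"]
	if !ok {
		return native, nil
	}
	switch provider {
	case "elevenlabs":
		// Nested under voice_settings, as the commit message describes.
		native["voice_settings"] = map[string]any{"speed": speed}
	case "openai", "minimax":
		// Flat key.
		native["speed"] = speed
	default:
		// Edge, Gemini, and unknown providers: not supported, so drop it.
		dropped = append(dropped, "speed")
	}
	return native, dropped
}

func main() {
	generic := map[string]any{"speed": 1.2}
	for _, p := range []string{"elevenlabs", "openai", "edge"} {
		native, dropped := adaptSpeed(generic, p)
		fmt.Println(p, native, dropped)
	}
}
```

Calling the adapter once per attempt inside the fallback loop, rather than once up front, is what keeps a fallback provider from receiving the primary provider's key layout.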

Security (red-team findings):
- Allow-list ENFORCED at write path: `validateAgentTTSParams` in HTTP
  `handleUpdate` AND WS `agents_update` rejects any `tts_params` key
  outside `{speed, emotion, style}` (Finding #5).
- 64KB body cap on agent PUT via `http.MaxBytesReader` (Finding #6).
- Explicit tenant-scope guard after `agents.GetByID` (Finding #12).
- Concurrent-tab clobber: handleSave merges `tts_params` into a fresh
  copy of `otherConfig` rather than reusing stale state (Finding #13).
- Rate-limit verified — RoleAdmin gate sufficient for v1 (Finding #15).

Frontend (web + desktop):
- `TtsOverrideBlock` rewritten: filters capability params to
  `agent_overridable === true`, renders via `DynamicParamForm`. Hides
  entirely for providers with no overridable params (Edge, Gemini).
- Bidirectional adapter (generic ↔ capability-native form state) so
  agent storage stays in generic keys while UI works in native paths.
  25 round-trip tests cover all 5 providers.
- Desktop `AgentDetailPanel` gains an inline fine-tune section gated
  on `globalProvider`, reusing the desktop `DynamicParamForm`.

i18n: `tts.override.params.title` ("Fine-tune") added to web + desktop
en/vi/zh.

Tests: all 9 backend suites green (race), web 214/214, desktop build
clean, both Go build tags pass.
viettranx added a commit that referenced this pull request Apr 20, 2026
Post-review cleanup of Phase 4. Closes Finding #9 properly and corrects
the Finding #13 documentation lie surfaced in the code-review report.

Capability schema:
- Replace `AgentOverridable bool` with `AgentOverridableAs string` on
  ParamSchema. Empty string = not overridable; non-empty = the generic
  key alias (`"speed"`, `"emotion"`, `"style"`).
- Each provider declaration now carries the alias inline, so the
  generic↔native mapping has a single TS-readable source.
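
One way to read this change: because each schema entry names its own generic alias, the generic-to-native mapping can be derived rather than hard-coded. The sketch below assumes a `Key` field holding the provider-native path, which is a hypothetical addition; only `AgentOverridableAs` is confirmed by the commit message.

```go
package main

import "fmt"

// ParamSchema is a reduced stand-in for the capability schema.
// Key is an assumed field; AgentOverridableAs follows the commit message.
type ParamSchema struct {
	Key                string // provider-native path (hypothetical field)
	AgentOverridableAs string // generic alias; empty means not overridable
}

// aliasToNative builds the generic-key to native-path map from the schemas.
func aliasToNative(schemas []ParamSchema) map[string]string {
	m := make(map[string]string)
	for _, s := range schemas {
		if s.AgentOverridableAs != "" {
			m[s.AgentOverridableAs] = s.Key
		}
	}
	return m
}

func main() {
	schemas := []ParamSchema{
		{Key: "voice_settings.speed", AgentOverridableAs: "speed"},
		{Key: "voice_settings.stability"}, // not overridable per agent
	}
	fmt.Println(aliasToNative(schemas))
}
```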

Frontend:
- Web `tts-override-block.tsx` drops the inline `GENERIC_TO_NATIVE`
  literal and derives the bidirectional adapter from the filtered
  capability params (each param self-describes its alias). Adapter
  tests rewritten around the new shape.
- Desktop `AgentDetailPanel.tsx` drops the 45-line inline IIFE in
  favour of a new `<TtsOverrideFineTune>` component that uses the
  same alias-based mapping.

Backend:
- Move `AgentTTSParamsAllowedKeys` + `ValidateAgentTTSParams` to
  `internal/audio/agent_params_adapter.go`. HTTP `validate.go` and WS
  `gateway/methods/agents_update.go` both delegate, eliminating the
  duplicated `{speed, emotion, style}` literal.

Cleanup:
- Delete orphan i18n keys `MsgTtsParamInvalidJSON` and
  `MsgTtsParamDependsOn` from `keys.go` + en/vi/zh catalogs (no
  in-code references; DependsOn is FE-only, JSON parse failures
  already surface via slog).

Documentation:
- `prompt-settings-section.tsx` Finding #13 comment rewritten to
  honestly describe the best-effort merge into a fresh local copy of
  the cached `otherConfig` prop. Concurrent-tab clobber remains
  possible — server-side JSON-merge-patch endpoint planned for v2.

Tests: 9 backend suites (race), web 217/217, desktop build clean,
both Go build tags pass.

2 participants