feat: gemini image audio video support #578
Merged
Pull request overview
This PR extends the relay layer to support Gemini-native multimodal features (image generation, TTS, and long-running video operations), adds a configurable max video generation count limit alongside existing seconds limits, and refactors video request handling across multiple adaptors while updating Swagger and tests accordingly.
Changes:
- Introduces new relay modes/routes for Gemini video + operation polling, Gemini TTS, and Gemini image generation (including streaming image events).
- Adds `max_video_generation_count` to model/group configuration and UI, and enforces it for applicable server-side request types.
- Refactors video adaptors and request/response handlers to distinguish “jobs” vs “videos” flows; updates usage/async usage accounting, JSON parsing utilities, and API docs.
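The new count limit follows the same pattern as the existing seconds limit: a model-level value that a group can override. A minimal sketch of such a check, assuming hypothetical `ModelConfig`/`GroupModelConfig` types, a nil override meaning "inherit", and 0 meaning "no limit" (the PR's actual field semantics may differ):

```go
package main

import "fmt"

// ModelConfig is a hypothetical stand-in for the model-level configuration.
type ModelConfig struct {
	MaxVideoGenerationCount int // assumption: 0 means unlimited
}

// GroupModelConfig is a hypothetical group-level override; nil means "inherit".
type GroupModelConfig struct {
	MaxVideoGenerationCount *int
}

// effectiveMaxVideoCount returns the group override when set, else the model value.
func effectiveMaxVideoCount(m ModelConfig, g *GroupModelConfig) int {
	if g != nil && g.MaxVideoGenerationCount != nil {
		return *g.MaxVideoGenerationCount
	}
	return m.MaxVideoGenerationCount
}

// checkVideoCount rejects requests that ask for more videos than allowed.
func checkVideoCount(requested int, m ModelConfig, g *GroupModelConfig) error {
	limit := effectiveMaxVideoCount(m, g)
	if limit > 0 && requested > limit {
		return fmt.Errorf("requested %d videos exceeds max_video_generation_count %d", requested, limit)
	}
	return nil
}

func main() {
	override := 1
	m := ModelConfig{MaxVideoGenerationCount: 4}
	fmt.Println(checkVideoCount(2, m, nil)) // <nil>
	fmt.Println(checkVideoCount(2, m, &GroupModelConfig{MaxVideoGenerationCount: &override}))
	// requested 2 videos exceeds max_video_generation_count 1
}
```

Using a pointer for the group field lets an explicit override of 0 be distinguished from "not set", which matters if clearing an override must fall back to the model value.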
Reviewed changes
Copilot reviewed 85 out of 86 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| web/src/validation/model.ts | Adds frontend validation for max_video_generation_count in model create/update schema. |
| web/src/types/model.ts | Extends model config/types and supported-type constants for new Gemini/video limit options. |
| web/src/types/group.ts | Extends group model config typings to support video count overrides. |
| web/src/feature/model/components/ModelForm.tsx | Adds model form field + payload wiring for max_video_generation_count. |
| web/src/feature/group/components/GroupModelConfigsTab.tsx | Adds group override UI/state wiring for video count limit. |
| web/public/locales/zh/translation.json | Adds i18n strings for max video count and new Gemini mode labels (zh). |
| web/public/locales/en/translation.json | Adds i18n strings for max video count and new Gemini mode labels (en). |
| core/task/task.go | Changes async usage completion to preserve/charge stored per-request pricing. |
| core/task/async_usage_test.go | Adds test ensuring async usage completion charges stored PerRequestPrice. |
| core/router/relay.go | Updates Gemini routing to path-based handler and registers operation GET routes. |
| core/router/relay_test.go | Adds regression test for registering model-scoped Gemini operation routes. |
| core/relay/utils/utils.go | Adds unmarshallers for VideosRequest / VideosRemixRequest. |
| core/relay/utils/testreq.go | Extends test-request builder for Gemini image/TTS/video modes. |
| core/relay/utils/testreq_test.go | Adds test coverage for building Gemini video test request bodies. |
| core/relay/render/claudeevent.go | Switches JSON node extraction to shared helper. |
| core/relay/plugin/web-search/search.go | Switches sonic JSON parsing calls to shared helper. |
| core/relay/plugin/timeout/timeout.go | Switches stream-detection JSON parsing to shared helper. |
| core/relay/plugin/thinksplit/split.go | Switches stream chunk JSON parsing to shared helper. |
| core/relay/plugin/streamfake/fake.go | Switches JSON parsing for request/stream chunks to shared helper. |
| core/relay/plugin/patch/patch.go | Switches patch JSON parsing/helpers to shared helper for node retrieval. |
| core/relay/plugin/cachefollow/cachefollow.go | Switches retention JSON parsing to shared helper. |
| core/relay/model/video.go | Splits legacy video request type into VideosRequest / VideosRemixRequest; adds seconds field to job request struct. |
| core/relay/model/image.go | Adds image streaming fields/events and request options (partial_images, stream). |
| core/relay/model/gemini_video.go | Adds Gemini video operation response models for long-running operations. |
| core/relay/model/errors.go | Ensures Gemini video modes use video-shaped OpenAI error wrappers. |
| core/relay/mode/define.go | Adds new modes (GeminiVideo/Operations/TTS/Image) and refactors stringification via map. |
| core/relay/mode/define_test.go | Locks in persisted numeric IDs for newly added modes. |
| core/relay/meta/meta.go | Adds OperationID to request metadata + option helper. |
| core/relay/controller/video.go | Adds validation/pricing/usage logic for Videos + GeminiVideo and max video count limit. |
| core/relay/adaptor/vertexai/gemini/adapter.go | Routes more modes through Gemini inner adaptor and normalizes video params to Vertex schema. |
| core/relay/adaptor/vertexai/adaptor_test.go | Adds tests for Gemini video request URLs, operation URLs, and video request conversions. |
| core/relay/adaptor/siliconflow/video.go | Refactors SiliconFlow video handling into separate job vs videos flows + shared helpers. |
| core/relay/adaptor/siliconflow/adaptor.go | Wires new SiliconFlow job/videos handlers into adaptor mode switch. |
| core/relay/adaptor/siliconflow/adaptor_test.go | Updates/extends tests for split job/videos behavior and usage mutation expectations. |
| core/relay/adaptor/openai/video.go | Splits OpenAI video converters/handlers by endpoint (jobs/videos/remix/get/content). |
| core/relay/adaptor/openai/stt.go | Switches JSON parsing in STT handling to shared helper. |
| core/relay/adaptor/openai/image.go | Updates image stream handler to use Responses-style SSE data rendering + shared JSON helper. |
| core/relay/adaptor/openai/image_test.go | Updates stream expectations and adds coverage for renamed video converter entrypoint. |
| core/relay/adaptor/openai/chat.go | Switches stream chunk JSON parsing to shared helper. |
| core/relay/adaptor/openai/adaptor.go | Updates mode routing for split video converters/handlers. |
| core/relay/adaptor/interface.go | Extends StoreCache with Metadata field for operation tracking. |
| core/relay/adaptor/gemini/openai.go | Adds Gemini TTS conversion/handling and exports helper predicates for image/TTS meta detection. |
| core/relay/adaptor/gemini/openai_test.go | Adds test verifying OpenAI speech → Gemini TTS mapping. |
| core/relay/adaptor/gemini/image.go | Implements Gemini image generation conversion + streaming conversion to OpenAI-like image events. |
| core/relay/adaptor/gemini/image_test.go | Adds coverage for Gemini image conversion, streaming, usage behavior, and stream URL selection. |
| core/relay/adaptor/gemini/export_test.go | Exposes additional helpers for tests (video conversion/config, image aspect ratio, IDs). |
| core/relay/adaptor/gemini/config.go | Adds config knob enable_person_generation_allow_all. |
| core/relay/adaptor/gemini/async_usage.go | Implements async usage fetcher for Gemini video operations. |
| core/relay/adaptor/gemini/adaptor.go | Adds support for new Gemini modes and operation URL resolution; wires request/response handlers. |
| core/relay/adaptor/doubao/video.go | Splits Doubao job vs videos flows and aligns request parsing to endpoint semantics. |
| core/relay/adaptor/doubao/main.go | Refactors Doubao URL builder and rewires split converters/handlers. |
| core/relay/adaptor/doubao/main_test.go | Updates image stream tests and adds coverage for duration precedence per endpoint. |
| core/relay/adaptor/doubao/image.go | Converts Doubao image stream events to unified image stream event model + Responses SSE rendering. |
| core/relay/adaptor/anthropic/main.go | Switches JSON parsing to shared helper. |
| core/relay/adaptor/ali/video.go | Refactors Ali video parsing into a structured “parsed request” separating job vs videos semantics. |
| core/relay/adaptor/ali/adaptor.go | Updates mode routing for split Ali video converters/handlers. |
| core/relay/adaptor/ali/adaptor_test.go | Updates/adds tests for job vs videos precedence and ignored fields. |
| core/model/yaml_integration.go | Adds YAML model type name mappings for new Gemini modes. |
| core/model/store.go | Adds metadata persistence to StoreV2 and upsert paths. |
| core/model/store_cache.go | Adds metadata to redis store cache mapping. |
| core/model/modelconfig.go | Adds MaxVideoGenerationCount and group override loading. |
| core/model/modelconfig_test.go | Adds tests for group override and clearing of max video generation count settings. |
| core/model/groupmodel.go | Adds group model config fields and ensures zero-value updates include new fields. |
| core/middleware/distributor.go | Expands relay-mode compatibility matrix; adds Gemini operation model resolution via store lookup. |
| core/middleware/distributor_video_test.go | Adds tests for Gemini operation path parsing helpers. |
| core/middleware/distributor_user_test.go | Switches JSON parsing to shared helper. |
| core/middleware/distributor_service_tier_test.go | Switches JSON parsing to shared helper. |
| core/middleware/distributor_gemini_video_test.go | Adds compatibility tests for new Gemini modes and Responses interoperability rules. |
| core/middleware/ctxkey.go | Adds operation_id context key. |
| core/docs/swagger.yaml | Updates API definitions for new modes, new fields (image stream/audio/video), and new Gemini operation endpoints. |
| core/docs/swagger.json | Regenerates Swagger JSON with new endpoints/models and request schema renames. |
| core/controller/relay.go | Adds GeminiByPath + GeminiOperation handlers and updates swagger annotations for videos/remix request types. |
| core/controller/relay-controller.go | Wires validation/pricing/usage handlers for Videos and GeminiVideo modes; switches JSON node parsing helper. |
| core/controller/relay-controller_test.go | Adds test ensuring video-related modes register appropriate validators. |
| core/controller/relay-channel.go | Enables cachefollow support for additional Gemini modes. |
| core/controller/group.go | Adds group model config API fields for overriding max video generation count. |
| core/controller/dashboard.go | Adds max_video_generation_count to dashboard’s group model payload. |
| core/common/consume/consume.go | Adjusts consumption recording for pending async usage (defer amount calculation). |
| core/common/consume/consume_record_test.go | Extends “skip record consume” mode list to include Gemini video operations. |
| core/common/body.go | Adds shared GetJSONNodeNoCopy helper and routes other node getters through it. |
| core/common/body_test.go | Adds test coverage for GetJSONNodeNoCopy. |
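Taken together, the router, distributor, and StoreCache rows let a polling request find the model behind a long-running Gemini operation. A minimal sketch, assuming a model-scoped route of the form `/v1beta/models/{model}/operations/{id}` and a hypothetical in-memory map standing in for the persisted store; the function names and the fallback order are assumptions, not the PR's code:

```go
package main

import (
	"fmt"
	"strings"
)

// parseGeminiOperationPath extracts the model and operation ID from an assumed
// model-scoped path such as "/v1beta/models/veo-3.0-generate-001/operations/abc".
func parseGeminiOperationPath(path string) (model, operationID string, ok bool) {
	const marker = "/models/"
	i := strings.Index(path, marker)
	if i < 0 {
		return "", "", false
	}
	m, op, found := strings.Cut(path[i+len(marker):], "/operations/")
	if !found || m == "" || op == "" || strings.Contains(m, "/") || strings.Contains(op, "/") {
		return "", "", false
	}
	return m, op, true
}

// storeRecord is a hypothetical view of a persisted store entry.
type storeRecord struct {
	Model string
}

// resolveOperationModel prefers the model named in the path and otherwise
// falls back to the store record keyed by the operation ID.
func resolveOperationModel(path string, store map[string]storeRecord) (string, error) {
	if model, _, ok := parseGeminiOperationPath(path); ok {
		return model, nil
	}
	opID := path[strings.LastIndex(path, "/")+1:]
	if rec, ok := store[opID]; ok && rec.Model != "" {
		return rec.Model, nil
	}
	return "", fmt.Errorf("cannot resolve model for operation %q", opID)
}

func main() {
	store := map[string]storeRecord{"op-42": {Model: "veo-2.0-generate-001"}}
	m, _ := resolveOperationModel("/v1beta/models/veo-3.0-generate-001/operations/op-1", store)
	fmt.Println(m) // veo-3.0-generate-001
	m, _ = resolveOperationModel("/v1beta/operations/op-42", store)
	fmt.Println(m) // veo-2.0-generate-001
}
```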
Files not reviewed (1)
- core/docs/docs.go: Language not supported
Comment on lines +285 to 303

```go
func getGeminiVideoRequestUsageParams(c *gin.Context) (geminiVideoRequestUsageParams, error) {
	node, err := common.UnmarshalRequest2NodeReusable(c.Request)
	if err != nil {
		return geminiVideoRequestUsageParams{}, NewBadRequestParamError(err.Error())
	}

	parameters := node.Get("parameters")
	params := geminiVideoRequestUsageParams{
		seconds:    defaultGeminiVideoDurationSeconds,
		variants:   1,
		resolution: defaultGeminiVideoResolution,
	}

	if parameters != nil && parameters.Exists() && parameters.TypeSafe() != ast.V_NULL {
		parsedResolution := stringValueFromNode(parameters, "resolution")
		if parsedResolution != "" {
			params.resolution = parsedResolution
		}
	}
```
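The excerpt seeds usage parameters with defaults and overrides them only when a non-null `parameters` object supplies a value. The same pattern can be sketched with the standard library instead of sonic's `ast.Node`; the default values, the struct names, and the `durationSeconds` field name are assumptions for illustration (the real `defaultGeminiVideo*` constants are not shown in the excerpt):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Illustrative defaults only; not the PR's actual constants.
const (
	defaultSeconds    = 8
	defaultResolution = "720p"
)

type videoUsageParams struct {
	Seconds    int
	Variants   int
	Resolution string
}

// geminiVideoRequest models only the fields this sketch reads.
type geminiVideoRequest struct {
	Parameters *struct {
		Resolution      string `json:"resolution"`
		DurationSeconds *int   `json:"durationSeconds"` // assumed field name
	} `json:"parameters"`
}

func parseUsageParams(body []byte) (videoUsageParams, error) {
	var req geminiVideoRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return videoUsageParams{}, fmt.Errorf("bad request: %w", err)
	}
	p := videoUsageParams{Seconds: defaultSeconds, Variants: 1, Resolution: defaultResolution}
	// A missing or JSON-null "parameters" object decodes to nil and keeps every default.
	if req.Parameters == nil {
		return p, nil
	}
	if req.Parameters.Resolution != "" {
		p.Resolution = req.Parameters.Resolution
	}
	if d := req.Parameters.DurationSeconds; d != nil && *d > 0 {
		p.Seconds = *d
	}
	return p, nil
}

func main() {
	p, _ := parseUsageParams([]byte(`{"parameters":null}`))
	fmt.Printf("%+v\n", p) // {Seconds:8 Variants:1 Resolution:720p}
	p, _ = parseUsageParams([]byte(`{"parameters":{"resolution":"1080p","durationSeconds":6}}`))
	fmt.Printf("%+v\n", p) // {Seconds:6 Variants:1 Resolution:1080p}
}
```

A pointer field plays the role of the `Exists()` / `V_NULL` checks: both an absent key and an explicit `null` leave it nil.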
Comment on lines 95 to 99

```diff
 export const STREAM_TIMEOUT_SUPPORTED_MODEL_TYPES = [1, 2, 12, 16, 21] as const
 export const IMAGE_GENERATION_COUNT_LIMIT_SUPPORTED_MODEL_TYPES = [5, 6] as const
-export const VIDEO_GENERATION_SECONDS_LIMIT_SUPPORTED_MODEL_TYPES = [13, 22, 26] as const
+export const VIDEO_GENERATION_SECONDS_LIMIT_SUPPORTED_MODEL_TYPES = [13, 22, 26, 27] as const
+export const VIDEO_GENERATION_COUNT_LIMIT_SUPPORTED_MODEL_TYPES = [13, 26, 27] as const
```
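These constants decide, per model type, which video limit fields the UI exposes. If the server needed the same gating, a Go mirror could look like the following sketch. Only the type IDs come from the diff; the helper names are hypothetical, and the `slices` package requires Go 1.21 or later. Type 22 accepts a seconds limit but not a count limit.

```go
package main

import (
	"fmt"
	"slices"
)

// Model type IDs copied from the web constants in the diff above.
var (
	videoSecondsLimitTypes = []int{13, 22, 26, 27}
	videoCountLimitTypes   = []int{13, 26, 27}
)

// supportsVideoSecondsLimit reports whether a model type accepts a seconds limit.
func supportsVideoSecondsLimit(modelType int) bool {
	return slices.Contains(videoSecondsLimitTypes, modelType)
}

// supportsVideoCountLimit reports whether a model type accepts a count limit.
func supportsVideoCountLimit(modelType int) bool {
	return slices.Contains(videoCountLimitTypes, modelType)
}

func main() {
	for _, t := range []int{13, 22, 27} {
		fmt.Printf("type %d: seconds=%v count=%v\n", t, supportsVideoSecondsLimit(t), supportsVideoCountLimit(t))
	}
	// type 13: seconds=true count=true
	// type 22: seconds=true count=false
	// type 27: seconds=true count=true
}
```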
No description provided.