Fix audio transcription deserialization when duration field is absent#512
Conversation
The `duration` field on `AudioTranscriptionResponse` uses a custom `deserialize_with`, which makes serde require the field to be present in the JSON — even though it's `Option<f64>`. When the inference proxy (vllm-proxy-rs) returns a response without a top-level `duration` field (it nests it under `usage.seconds`), deserialization fails with "missing field `duration`", causing a 502. Fix: add `#[serde(default)]` so missing `duration` deserializes as `None`.
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request resolves a critical deserialization issue within the audio transcription service. By modifying the Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request modifies the AudioTranscriptionResponse struct in crates/inference_providers/src/models.rs by adding the #[serde(default)] attribute to the duration field. This change ensures that the duration field defaults to None if it is not present during deserialization. I have no feedback to provide.
There was a problem hiding this comment.
Pull request overview
This PR fixes deserialization of audio transcription responses when providers omit the top-level duration field, preventing avoidable transcription failures (and downstream 502s) in the inference providers layer.
Changes:
- Add
#[serde(default)]toAudioTranscriptionResponse.durationso missingdurationdeserializes toNone.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| #[serde(default)] | ||
| #[serde(skip_serializing_if = "Option::is_none")] | ||
| #[serde(deserialize_with = "deserialize_duration")] | ||
| pub duration: Option<f64>, |
There was a problem hiding this comment.
Consider adding a regression unit test (in this file’s existing #[cfg(test)] mod tests) that deserializes an AudioTranscriptionResponse payload where the top-level duration field is absent (like the proxy example) and asserts duration == None. This will prevent future changes to deserialize_with/serde attributes from reintroducing the missing field 'duration' failure.
|
PR Review: Fix audio transcription deserialization when duration field is absent Change: +1 line - adds serde(default) to the duration field on AudioTranscriptionResponse. Analysis: The fix is correct. When deserialize_with is specified, serde treats the field as required even if the type is Option. Adding serde(default) restores the expected behavior: a missing duration key short-circuits before calling deserialize_duration and returns None directly. The three-attribute combination is a standard serde pattern:
No logic issues, no privacy/logging concerns, no performance impact. Approved - minimal, correct fix. |
Summary
#[serde(default)]toAudioTranscriptionResponse.durationfieldRoot Cause
The
durationfield uses#[serde(deserialize_with = "deserialize_duration")], which makes serde require the field to be present in the JSON — even though it'sOption<f64>. The inference proxy (vllm-proxy-rs) returns:{"id":"trans-...","text":"...","usage":{"seconds":1,"type":"duration"}}No top-level
durationfield → deserialization fails withmissing field 'duration'→AudioTranscriptionError::TranscriptionError→ mapped to HTTP 502.Reproduction
Fix
#[serde(default)]allows the field to be absent, deserializing asNone.Verified with standalone test: deserialization of the exact proxy response succeeds after the fix.