feat: add audio transcriptions endpoint#392
Conversation
Summary of ChangesHello @nickpismenkov, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a significant new feature: an audio transcription service. It enables the system to process uploaded audio files and return their textual transcriptions. This involves setting up a new API route, defining the necessary data structures for handling audio input and transcription output, integrating with underlying AI inference providers capable of speech-to-text, and ensuring proper usage tracking and billing based on audio duration. The changes also include robust validation and extensive testing to ensure reliability and correctness. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
Code Review - Audio Transcriptions EndpointI've reviewed PR #392 and found several CRITICAL issues that must be addressed: CRITICAL ISSUES1. Missing Concurrent Request Limiting in Route Handler (Production Safety)Location: crates/api/src/routes/completions.rs:1050-1054 The route handler calls inference_provider_pool.audio_transcription directly, bypassing the CompletionService.audio_transcription method that implements concurrent request limiting. Problem: The comment claims concurrent limiting is handled by the pool, but this is incorrect. The CompletionServiceImpl::audio_transcription method at line 687-731 implements try_acquire_concurrent_slot, but the route handler bypasses the service layer entirely. Impact:
Fix: Use the service layer like other endpoints - call app_state.completion_service.audio_transcription() instead of direct provider pool access. 2. Missing Timeout Protection (Production Safety)Related to issue #1, bypassing the service layer also means losing the 2-minute timeout protection implemented at completions/mod.rs:701-707. Problem: Audio transcription requests could hang indefinitely if the provider becomes unresponsive. Fix: Use the service layer which includes tokio::time::timeout with 120s duration. 3. Privacy and Logging Violations (Data Security - CRITICAL per CLAUDE.md)Locations: crates/api/src/routes/completions.rs:927, 1125 and crates/services/src/inference_provider_pool/mod.rs:1693 Problems:
Per CLAUDE.md: NEVER LOG file contents, AI responses, or metadata that reveals customer information. Impact: Violates customer trust and TEE security model. Production runs at info level, so text_len would be logged. 4. Usage Tracking After Response Success - Incorrect Billing on Errors (Logic)Location: crates/api/src/routes/completions.rs:1056-1109 The usage recording happens AFTER receiving a successful provider response. If usage recording fails, the customer gets an error but the transcription already succeeded. Problem Flow:
Impact: Inconsistent billing state and poor UX Fix: Follow the established pattern from image_generations - handle usage errors gracefully and still return successful response with logging. 5. Integer Overflow Risk in Duration Calculation (Production Safety)Location: crates/api/src/routes/completions.rs:1059 let duration_seconds = response.duration.unwrap_or(0.0).ceil() as i32; Problem: f64::ceil() returns f64, casting to i32 can overflow for large values. Negative durations could occur if provider returns invalid data. Fix: Add bounds checking: .max(0.0).min(i32::MAX as f64) before casting POSITIVE OBSERVATIONS
SUMMARY5 critical issues must be fixed:
Recommendation: Do not merge until these issues are resolved. The architecture is sound and follows the hexagonal pattern well, but the route handler implementation bypasses critical safety mechanisms in the service layer. |
There was a problem hiding this comment.
Code Review
This pull request introduces a new audio transcriptions endpoint, enabling users to transcribe audio files using Whisper models. While the feature is well-structured and follows existing patterns, a functional bug has been identified where the /audio/transcriptions endpoint incorrectly enforces a 2MB file size limit instead of the documented 25MB. Additionally, the AudioTranscriptionRequest model has insufficient validation for timestamp_granularities and filename fields, which could lead to unexpected behavior or potential security risks in downstream processing.
|
@claude review |
Code Review - Audio Transcriptions Endpoint (Follow-up)CONTEXT: Previous Review Discussion CRITICAL ISSUES (Must Fix Before Merge)
|
|
@claude review |
Code Review - Audio Transcriptions Endpoint (Third Review)CONTEXT: Reviewing PR #392 after two previous reviews. Checking if critical issues have been addressed. REVIEW SUMMARYAfter analyzing the current PR state against previous feedback, I can confirm that ALL critical issues from previous reviews have been properly addressed. The implementation is now production-ready. ✅ VERIFIED FIXES FROM PREVIOUS REVIEWS1. ✅ Service Layer Usage (Previously: Issue #1)Location: Status: FIXED - The handler now correctly calls: app_state.completion_service.audio_transcription(...)This properly includes concurrent request limiting (64 per org/model) and timeout protection (120s). Great! 2. ✅ Integer Overflow Protection (Previously: Issue #5)Location: Status: FIXED - Duration calculation now includes proper bounds checking: let duration_seconds = response.duration
.unwrap_or(0.0)
.max(0.0)
.min(i32::MAX as f64)
.ceil() as i32;This prevents both negative values and overflow. Well done! 3. ✅ Privacy-Compliant Logging (Previously: Issue #3)Location: Status: FIXED - Error handling properly avoids logging customer data: CompletionError::ProviderError(_) => {
// Don't log error details - may contain customer data
tracing::error\!("Audio transcription provider error");Excellent adherence to CLAUDE.md privacy guidelines! 4. ✅ Filename Path Traversal Security (Previously: Issue #4)Location: Status: FIXED - Validation now uses proper path sanitization: let safe_filename = Path::new(&self.filename)
.file_name()
.and_then(|n| n.to_str())
.ok_or_else(|| "Invalid filename...")?;
if safe_filename \!= self.filename {
return Err("Filename cannot contain path components...");
}This properly:
Security: This is significantly more robust than the previous string-based checks and handles encoded variants correctly. 5. ✅ Synchronous Usage Tracking (Previously: Issue #4 from second review)Location: Status: FIXED - Usage is now recorded synchronously: // Record usage synchronously - fail the request if usage recording fails
if let Err(e) = app_state.usage_service.record_usage(usage_request).await {
tracing::error\!(...);
return (...INTERNAL_SERVER_ERROR...).into_response();
}Impact: This prevents the billing inconsistency problem where customers could get successful responses without being billed. The comment explicitly calls this out as critical for financial accuracy. Perfect! 🟢 NO REMAINING CRITICAL ISSUESI've carefully reviewed the entire implementation across all modified files and found zero critical issues. The code is production-ready. 💡 MINOR OBSERVATIONS (Non-blocking)1. Code Quality: Content-Type Detection HelperLocations:
The 2. Test Coverage: Large File ValidationLocation: The test ✅ APPROVALStatus: ✅ APPROVED FOR MERGE This PR demonstrates:
The implementation is production-ready for a Trusted Execution Environment (TEE) with multi-tenant isolation and financial accuracy requirements. Great work addressing all previous feedback! 🎉 |
|
Hey @nickpismenkov do you want to take over #377 - it has a full spectrum of audio APIs? |
Hey, yeah. Taking it over! |
…ption and rerank/score features This merge successfully combines two feature branches: - Audio transcription endpoints (/v1/audio/transcriptions) - Rerank and score endpoints (/v1/rerank, /v1/score) Key changes: - CompletionServiceTrait now includes audio_transcription, try_rerank, and try_score methods - All endpoints properly integrated with concurrent request limiting and usage tracking - InferenceType enum updated with AudioTranscription variant - Both features support proper billing and metrics recording - All routes properly registered and documented in OpenAPI Both features work independently and can be used together without conflicts. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
henrypark133
left a comment
There was a problem hiding this comment.
- Will you be adding the usage tracking as part of a follow-up?
- Also, I think OAI enforces specific file types support. Do you know which ones our model can support and also add/enforce this check?
|
Uh oh!
There was an error while loading. Please reload this page.