feat: add Gladia batch STT adapter #2116
Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>
📝 Walkthrough
This PR adds a Gladia batch transcription adapter for owhisper with an asynchronous workflow that uploads audio files, configures language and diarization options, creates transcription tasks, polls for completion, and converts Gladia responses to the internal BatchResponse format.
Sequence Diagram
sequenceDiagram
actor Caller
participant Client as owhisper Client
participant Gladia as Gladia API
participant Converter as Response Converter
Caller->>Client: transcribe_file(path, config)
Client->>Client: Read audio bytes & infer MIME type
Client->>Gladia: POST /upload (audio file)
Gladia-->>Client: file_url
Client->>Gladia: POST /pre-recorded (config + file_url)
Gladia-->>Client: task_id + status
rect rgb(200, 220, 255)
Note over Client,Gladia: Polling Loop
Client->>Gladia: GET /pre-recorded/{id}
Gladia-->>Client: status (pending/done/error)
alt status == "done"
Client->>Converter: Convert TranscriptResponse
Converter-->>Client: BatchResponse
else status == "pending"
Client->>Client: Wait & retry
else status == "error"
Client-->>Caller: Error
end
end
Client-->>Caller: BatchResponse (utterances as BatchWord items)
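The polling branch of the diagram can be sketched in plain Rust as follows. This is a minimal illustration, not the adapter's actual code: `TaskStatus`, `PollError`, and `poll_until_done` are hypothetical names, the status fetch and the sleep are injected as closures so that the control flow can be exercised without a network call, and the `String` payload merely stands in for the transcript response.

```rust
use std::time::Duration;

/// Task states from the diagram (type and variant names are assumptions).
#[derive(Debug, Clone, PartialEq)]
pub enum TaskStatus {
    Pending,
    /// The payload stands in for the provider's transcript response.
    Done(String),
    Error(String),
}

#[derive(Debug, PartialEq)]
pub enum PollError {
    /// The provider reported status == "error".
    TaskFailed(String),
    /// The task was still pending after the last allowed attempt.
    TimedOut { attempts: u32 },
}

/// Calls `fetch_status` up to `max_attempts` times, sleeping `interval`
/// after every pending response, and stops on the first done or error.
pub fn poll_until_done<F, S>(
    mut fetch_status: F,
    mut sleep: S,
    max_attempts: u32,
    interval: Duration,
) -> Result<String, PollError>
where
    F: FnMut() -> TaskStatus,
    S: FnMut(Duration),
{
    for _ in 0..max_attempts {
        match fetch_status() {
            TaskStatus::Done(result) => return Ok(result),
            TaskStatus::Error(message) => return Err(PollError::TaskFailed(message)),
            TaskStatus::Pending => sleep(interval),
        }
    }
    Err(PollError::TimedOut { attempts: max_attempts })
}
```

In an async adapter the same shape would presumably use an awaited HTTP request and an async sleep instead of the two closures.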
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
- Add build_ws_url_with_api_key method to RealtimeSttAdapter trait
- Use ureq for blocking POST request to get session token
- Fix language_config format to use object with languages array
- Return None for build_auth_header since token is in URL
Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>
…GladiaAdapter Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>
1dcd7f9 to 34ead84
✅ Deploy Preview for hyprnote-storybook ready!
✅ Deploy Preview for hyprnote ready!
Actionable comments posted: 1
🧹 Nitpick comments (2)
owhisper/owhisper-client/src/adapter/gladia/mod.rs (1)
58-65: Minor simplification.
The intermediate variable binding is unnecessary.
 pub(crate) fn batch_api_url(api_base: &str) -> url::Url {
     if api_base.is_empty() {
         return API_BASE.parse().expect("invalid_default_api_url");
     }
-    let url: url::Url = api_base.parse().expect("invalid_api_base");
-    url
+    api_base.parse().expect("invalid_api_base")
 }
owhisper/owhisper-client/src/adapter/gladia/batch.rs (1)
201-205: Consider making diarization configurable.
As noted in the PR checklist, diarization is hardcoded to true. If some users don't need speaker diarization, this could add unnecessary processing time or cost.
Consider adding a field to ListenParams or making this configurable:
 let transcript_request = TranscriptRequest {
     audio_url: upload_result.audio_url,
     language_config,
-    diarization: Some(true),
+    diarization: Some(params.diarization.unwrap_or(true)),
 };
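To make the suggestion concrete, here is a small self-contained sketch of the opt-out behaviour. It assumes a hypothetical `diarization: Option<bool>` field on a stand-in `ListenParams`; the real struct may not have such a field, and the real `TranscriptRequest` also carries `language_config`, which is omitted here.

```rust
/// Stand-in for the caller-facing parameters; the `diarization`
/// field is an assumption made for this sketch.
#[derive(Debug, Default)]
pub struct ListenParams {
    pub diarization: Option<bool>,
}

/// Reduced stand-in for the request body sent to the provider.
#[derive(Debug, PartialEq)]
pub struct TranscriptRequest {
    pub audio_url: String,
    pub diarization: Option<bool>,
}

/// Keeps diarization on by default, but lets callers turn it off explicitly.
pub fn build_transcript_request(audio_url: String, params: &ListenParams) -> TranscriptRequest {
    TranscriptRequest {
        audio_url,
        diarization: Some(params.diarization.unwrap_or(true)),
    }
}
```

With this shape, existing callers that never set the field keep today's behaviour, while `Some(false)` disables diarization for that request.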
📒 Files selected for processing (3)
owhisper/owhisper-client/src/adapter/gladia/batch.rs (1 hunks)
owhisper/owhisper-client/src/adapter/gladia/mod.rs (3 hunks)
owhisper/owhisper-client/src/lib.rs (1 hunks)
🔇 Additional comments (9)
owhisper/owhisper-client/src/lib.rs (1)
14-15: LGTM!
The GladiaAdapter is correctly added to the public exports alongside existing adapters, maintaining consistency with the module's API surface.
owhisper/owhisper-client/src/adapter/gladia/mod.rs (2)
1-1: LGTM!
Module declaration correctly added for the new batch functionality.
113-131: LGTM!
Good test coverage for both the host detection and batch URL construction, including positive and negative cases.
owhisper/owhisper-client/src/adapter/gladia/batch.rs (6)
16-30: LGTM!
Clean separation between the trait implementation and the async worker function. The PathBuf conversion handles the lifetime correctly.
32-124: LGTM!
Request and response structures are well-defined with appropriate #[serde(default)] annotations for optional fields and #[serde(skip_serializing_if = ...)] for conditional serialization.
146-154: LGTM!
MIME type detection covers the common audio formats. The fallback to application/octet-stream is a reasonable default for unrecognized extensions.
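For readers without the diff at hand, extension-based MIME detection with that fallback might look roughly like the sketch below. The function name `mime_type_for` and the exact set of extensions are assumptions; only the application/octet-stream fallback is taken from the comment above.

```rust
use std::path::Path;

/// Maps a file extension to an audio MIME type, case-insensitively.
/// The extension list is illustrative, not the adapter's actual table.
pub fn mime_type_for(path: &Path) -> &'static str {
    let extension = path
        .extension()
        .and_then(|ext| ext.to_str())
        .map(|ext| ext.to_ascii_lowercase());

    match extension.as_deref() {
        Some("wav") => "audio/wav",
        Some("mp3") => "audio/mpeg",
        Some("flac") => "audio/flac",
        Some("ogg") => "audio/ogg",
        Some("m4a") => "audio/mp4",
        // Unknown or missing extension: generic binary fallback.
        _ => "application/octet-stream",
    }
}
```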
334-355: LGTM!
The integration test is appropriately marked as #[ignore] for CI and provides good coverage of the happy path. The assertions verify the essential structure of the response.
229-269: LGTM.
The polling logic correctly handles completion, error, and in-progress states. The PollingConfig::default() provides 300 max attempts at the 3-second interval specified in the code, yielding a 15-minute timeout window, which is reasonable for batch transcription where processing times can vary significantly. This comfortably accommodates the ~4-minute test runs mentioned in the PR.
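The 15-minute figure follows from 300 attempts × 3 s = 900 s. A tiny sketch of that arithmetic is shown below; `PollingBudget` is a hypothetical stand-in, and its field names are not claimed to match the real PollingConfig.

```rust
use std::time::Duration;

/// Hypothetical stand-in for the polling configuration.
pub struct PollingBudget {
    pub max_attempts: u32,
    pub interval: Duration,
}

impl PollingBudget {
    /// Values quoted in the review comment: 300 attempts, 3 seconds apart.
    pub fn from_review_comment() -> Self {
        Self {
            max_attempts: 300,
            interval: Duration::from_secs(3),
        }
    }

    /// Upper bound on time spent waiting between polls.
    pub fn total_wait(&self) -> Duration {
        self.interval * self.max_attempts
    }
}
```

This bound counts only the sleeps between polls; time spent inside each HTTP request comes on top of it.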
306-312: Multi-channel information from Gladia API responses is discarded.
The Utterance struct includes a channel field that indicates which channel each utterance originated from, but this information is not used during conversion. All utterances are flattened into a single BatchChannel via flat_map (lines 277–289), losing the channel distinction. For multi-channel audio recordings (e.g., stereo or multi-speaker setups with diarization enabled), this means utterances from different channels cannot be distinguished in the final response.
While the BatchWord struct preserves speaker information (u.speaker), the source channel information is completely discarded. This appears to be an intentional design choice consistent across all adapters, but results in irretrievable loss of channel metadata when processing multi-channel audio.
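If channel information were to be preserved, one possible approach is to group utterances by their channel before conversion rather than flattening them. The sketch below is only an illustration under assumptions: this `Utterance` is a reduced, hypothetical version of the real struct, and `group_by_channel` is not part of the PR.

```rust
use std::collections::BTreeMap;

/// Reduced, hypothetical version of the provider's utterance type.
#[derive(Debug, Clone, PartialEq)]
pub struct Utterance {
    pub text: String,
    pub channel: usize,
    pub speaker: Option<usize>,
}

/// Groups utterances by source channel, preserving their original order
/// within each channel. Groups are returned in ascending channel order.
pub fn group_by_channel(utterances: Vec<Utterance>) -> Vec<Vec<Utterance>> {
    let mut by_channel: BTreeMap<usize, Vec<Utterance>> = BTreeMap::new();
    for utterance in utterances {
        by_channel
            .entry(utterance.channel)
            .or_default()
            .push(utterance);
    }
    by_channel.into_values().collect()
}
```

Each inner vector could then be converted into its own channel entry. Note that the outer index is the rank of the channel among those present, not the channel number itself, so gaps in channel numbering collapse.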
Summary
Adds batch (pre-recorded) speech-to-text support for Gladia, complementing the realtime STT adapter from PR #2115. The implementation follows the existing adapter pattern used by Deepgram and AssemblyAI batch adapters.
The adapter implements the full Gladia pre-recorded API flow:
- […] /v2/upload endpoint (preserves original file format with proper MIME type detection)
- […] /v2/pre-recorded endpoint
- […] BatchResponse format
Also exports GladiaAdapter publicly from lib.rs and includes an integration test.
Review & Testing Checklist for Human
- […] convert_to_batch_response around lines 283-299)
- […] true in the transcript request (line 204). Consider if this should be configurable via ListenParams
Recommended test plan:
Notes
- […] https://api.gladia.io/v2 with the provided test API key (took ~4 minutes for a 20-second audio file)
- […] devin/1764853636-gladia-realtime-stt (PR feat: add Gladia realtime STT adapter #2115) since both touch the gladia module
Updates since last revision
- […] origin/main to resolve merge conflicts in mod.rs and live.rs
- […] build_ws_url_from_base […] live.rs
- […] punctuated_word to use trimmed value consistently with word field (both now use the same trimmed string)
Requested by: @yujonglee (yujonglee.dev@gmail.com)
Devin Session: https://app.devin.ai/sessions/ef1a5751c1424a0bbfa92ffe14a4354b