feat: add Gladia batch STT adapter #2116

yujonglee · 2025-12-04T13:19:22Z

Summary

Adds batch (pre-recorded) speech-to-text support for Gladia, complementing the realtime STT adapter from PR #2115. The implementation follows the adapter pattern already used by the Deepgram and AssemblyAI batch adapters.

The adapter implements the full Gladia pre-recorded API flow:

Upload audio file to /v2/upload endpoint (preserves original file format with proper MIME type detection)
Initiate transcription via /v2/pre-recorded endpoint
Poll for completion until status is "done"
Convert Gladia's response format to the internal BatchResponse format
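
The four steps above can be sketched without any network code. This is a minimal illustration, not the adapter's implementation: `PreRecordedApi`, `JobStatus`, `FakeApi`, and `transcribe` are hypothetical names introduced here, the real adapter is async and talks to Gladia over HTTP, and the conversion step is reduced to returning the transcript text.

```rust
use std::collections::VecDeque;

/// Hypothetical job states; the status strings returned by the real API may differ.
#[derive(Debug, Clone, PartialEq)]
enum JobStatus {
    Pending,
    Done(String),  // transcript text, standing in for the full response
    Error(String), // error message
}

/// Hypothetical abstraction over the three HTTP calls in the flow.
trait PreRecordedApi {
    /// Step 1: upload the audio bytes, returning the hosted audio URL.
    fn upload(&mut self, bytes: &[u8], mime: &str) -> Result<String, String>;
    /// Step 2: start a transcription job, returning its id.
    fn start(&mut self, audio_url: &str) -> Result<String, String>;
    /// Step 3: fetch the current job status.
    fn status(&mut self, job_id: &str) -> Result<JobStatus, String>;
}

/// Runs upload, start, and a bounded poll loop; step 4 (conversion) is
/// collapsed into returning the transcript text.
fn transcribe<A: PreRecordedApi>(
    api: &mut A,
    bytes: &[u8],
    mime: &str,
    max_attempts: usize,
) -> Result<String, String> {
    let audio_url = api.upload(bytes, mime)?;
    let job_id = api.start(&audio_url)?;
    for _ in 0..max_attempts {
        match api.status(&job_id)? {
            JobStatus::Done(text) => return Ok(text),
            JobStatus::Error(msg) => return Err(msg),
            // A real client would sleep between polls here.
            JobStatus::Pending => continue,
        }
    }
    Err(format!("job {job_id} did not finish after {max_attempts} polls"))
}

/// In-memory fake that replays a scripted sequence of statuses.
struct FakeApi {
    statuses: VecDeque<JobStatus>,
}

impl PreRecordedApi for FakeApi {
    fn upload(&mut self, _bytes: &[u8], _mime: &str) -> Result<String, String> {
        Ok("https://example.invalid/audio".to_string())
    }
    fn start(&mut self, _audio_url: &str) -> Result<String, String> {
        Ok("job-1".to_string())
    }
    fn status(&mut self, _job_id: &str) -> Result<JobStatus, String> {
        // Once the script is exhausted, keep reporting Pending.
        Ok(self.statuses.pop_front().unwrap_or(JobStatus::Pending))
    }
}
```

Keeping the HTTP calls behind a trait like this is one way to exercise the polling and error paths in a unit test without an API key.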

Also exports GladiaAdapter publicly from lib.rs and includes an integration test.

Review & Testing Checklist for Human

Speaker assignment logic: Words inherit speaker ID from their parent utterance. Verify this matches expected behavior (see convert_to_batch_response around lines 283-299)
Diarization is hardcoded to true in the transcript request (line 204). Consider whether this should be configurable via ListenParams.
Test with different audio formats: The adapter detects MIME type from file extension (wav, mp3, ogg, flac, m4a, webm). Verify behavior with various formats.
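
For reviewers checking the speaker assignment item, the behavior described there (each word takes the speaker of its parent utterance) can be illustrated roughly as follows. The struct and function names are hypothetical stand-ins, not the types in batch.rs.

```rust
/// Hypothetical stand-in for a word inside a Gladia utterance.
#[derive(Debug, Clone, PartialEq)]
struct Word {
    text: String,
    start: f64,
    end: f64,
}

/// Hypothetical stand-in for an utterance; `speaker` is absent when
/// diarization produced no label.
struct Utterance {
    speaker: Option<usize>,
    words: Vec<Word>,
}

/// Hypothetical flattened output word, carrying the inherited speaker.
#[derive(Debug, PartialEq)]
struct FlatWord {
    word: String,
    start: f64,
    end: f64,
    speaker: Option<usize>,
}

/// Flattens utterances into words; every word copies the speaker of the
/// utterance it came from, so there is no per-word speaker decision.
fn flatten_words(utterances: &[Utterance]) -> Vec<FlatWord> {
    utterances
        .iter()
        .flat_map(|u| {
            u.words.iter().map(move |w| FlatWord {
                word: w.text.clone(),
                start: w.start,
                end: w.end,
                speaker: u.speaker,
            })
        })
        .collect()
}
```

One consequence worth verifying: if a speaker change happens inside a single utterance, this scheme cannot represent it, because the utterance is the smallest unit that carries a speaker label.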

Recommended test plan:

GLADIA_API_KEY="your-key" cargo test -p owhisper-client test_gladia_batch_transcription -- --ignored --nocapture

Notes

Integration test added and verified to pass against https://api.gladia.io/v2 with the provided test API key (took ~4 minutes for a 20-second audio file)
This PR is based on devin/1764853636-gladia-realtime-stt (PR #2115, feat: add Gladia realtime STT adapter), since both touch the gladia module
Cargo check and tests pass locally

Updates since last revision

Merged with origin/main to resolve merge conflicts in mod.rs and live.rs
Preserved port handling improvements from main in build_ws_url_from_base
Preserved session channel tracking improvements from main in live.rs
Fixed punctuated_word to use trimmed value consistently with word field (both now use the same trimmed string)
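
The punctuated_word fix in the last item amounts to trimming once and reusing the result for both fields. A minimal sketch, assuming a simplified word struct (`BatchWordSketch` and `to_batch_word` are hypothetical names, and the real BatchWord has more fields):

```rust
/// Hypothetical, reduced version of the output word type.
#[derive(Debug, PartialEq)]
struct BatchWordSketch {
    word: String,
    punctuated_word: Option<String>,
}

/// Trims the raw token once and uses the same trimmed string for both
/// fields, so `word` and `punctuated_word` cannot drift apart.
fn to_batch_word(raw: &str) -> BatchWordSketch {
    let trimmed = raw.trim().to_string();
    BatchWordSketch {
        word: trimmed.clone(),
        punctuated_word: Some(trimmed),
    }
}
```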

Requested by: @yujonglee (yujonglee.dev@gmail.com)
Devin Session: https://app.devin.ai/sessions/ef1a5751c1424a0bbfa92ffe14a4354b

Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>

devin-ai-integration · 2025-12-04T13:19:25Z

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

Address comments on this PR that start with 'DevinAI' or '@devin'.
Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.


coderabbitai · 2025-12-04T13:19:29Z

Warning

Rate limit exceeded

@devin-ai-integration[bot] has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 19 minutes and 50 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.


📥 Commits

Reviewing files that changed from the base of the PR and between 8aa9449 and ebfa5e2.

📒 Files selected for processing (1)

owhisper/owhisper-client/src/adapter/gladia/batch.rs (1 hunks)

📝 Walkthrough

Walkthrough

This PR adds a Gladia batch transcription adapter for owhisper with an asynchronous workflow that uploads audio files, configures language and diarization options, creates transcription tasks, polls for completion, and converts Gladia responses to the internal BatchResponse format.

Changes

Cohort / File(s)	Summary
Gladia Batch Adapter `owhisper/owhisper-client/src/adapter/gladia/batch.rs`	New module implementing batch transcription with async file upload to Gladia, optional language/diarization configuration, polling-based task completion monitoring, response conversion to BatchResponse format, and error handling. Includes ignored unit test scaffold.
Gladia Adapter Configuration `owhisper/owhisper-client/src/adapter/gladia/mod.rs`	Adds `API_BASE` constant ("https://api.gladia.io/v2") and `batch_api_url()` helper method to resolve or default batch API URLs. Includes tests for host detection and URL resolution.
Public API Exports `owhisper/owhisper-client/src/lib.rs`	Re-exports `GladiaAdapter` as public API alongside existing `SonioxAdapter`.

Sequence Diagram

sequenceDiagram
    actor Caller
    participant Client as owhisper Client
    participant Gladia as Gladia API
    participant Converter as Response Converter

    Caller->>Client: transcribe_file(path, config)
    Client->>Client: Read audio bytes & infer MIME type
    Client->>Gladia: POST /upload (audio file)
    Gladia-->>Client: audio_url
    Client->>Gladia: POST /pre-recorded (config + audio_url)
    Gladia-->>Client: task_id + status
    
    rect rgb(200, 220, 255)
    Note over Client,Gladia: Polling Loop
    Client->>Gladia: GET /pre-recorded/{id}
    Gladia-->>Client: status (pending/done/error)
    alt status == "done"
        Client->>Converter: Convert TranscriptResponse
        Converter-->>Client: BatchResponse
    else status == "pending"
        Client->>Client: Wait & retry
    else status == "error"
        Client-->>Caller: Error
    end
    end
    
    Client-->>Caller: BatchResponse (utterances as BatchWord items)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

batch.rs: Async workflow with HTTP operations (upload, task creation, polling), JSON deserialization, error handling, and response conversion logic requires careful review for correctness of polling mechanics and type mapping.
mod.rs: New constant and helper function are straightforward; verify URL parsing behavior and test coverage.
lib.rs: Simple re-export; low risk.

Possibly related PRs

feat: add Gladia realtime STT adapter #2115: Also modifies Gladia adapter (adds API base URL helper to adapter/gladia/mod.rs) and integrates with Gladia transcription functionality.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 60.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'feat: add Gladia batch STT adapter' accurately summarizes the main change—adding a Gladia batch speech-to-text adapter. It is concise, specific, and directly reflects the primary objective of the changeset.
Description check	✅ Passed	The pull request description is comprehensive and directly related to the changeset. It details the implementation of the Gladia batch adapter, outlines the API flow, mentions the adapter pattern followed, includes a review checklist, and provides testing guidance.


- Add build_ws_url_with_api_key method to RealtimeSttAdapter trait
- Use ureq for blocking POST request to get session token
- Fix language_config format to use object with languages array
- Return None for build_auth_header since token is in URL

Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>


netlify · 2025-12-04T14:00:03Z

✅ Deploy Preview for hyprnote-storybook ready!

Name	Link
🔨 Latest commit	`ebfa5e2`
🔍 Latest deploy log	https://app.netlify.com/projects/hyprnote-storybook/deploys/693195354a1ade000744f48f
😎 Deploy Preview	https://deploy-preview-2116--hyprnote-storybook.netlify.app

netlify · 2025-12-04T14:01:25Z

✅ Deploy Preview for hyprnote ready!

Name	Link
🔨 Latest commit	`ebfa5e2`
🔍 Latest deploy log	https://app.netlify.com/projects/hyprnote/deploys/6931953596d9820008058242
😎 Deploy Preview	https://deploy-preview-2116--hyprnote.netlify.app

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

owhisper/owhisper-client/src/adapter/gladia/mod.rs (1)
58-65: Minor simplification.

The intermediate variable binding is unnecessary.
     pub(crate) fn batch_api_url(api_base: &str) -> url::Url {
         if api_base.is_empty() {
             return API_BASE.parse().expect("invalid_default_api_url");
         }
 
-        let url: url::Url = api_base.parse().expect("invalid_api_base");
-        url
+        api_base.parse().expect("invalid_api_base")
     }
owhisper/owhisper-client/src/adapter/gladia/batch.rs (1)
201-205: Consider making diarization configurable.

As noted in the PR checklist, diarization is hardcoded to true. If some users don't need speaker diarization, this could add unnecessary processing time or cost.

Consider adding a field to ListenParams or making this configurable:
 let transcript_request = TranscriptRequest {
     audio_url: upload_result.audio_url,
     language_config,
-    diarization: Some(true),
+    diarization: Some(params.diarization.unwrap_or(true)),
 };

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3c913f8 and 8aa9449.

📒 Files selected for processing (3)

owhisper/owhisper-client/src/adapter/gladia/batch.rs (1 hunks)
owhisper/owhisper-client/src/adapter/gladia/mod.rs (3 hunks)
owhisper/owhisper-client/src/lib.rs (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

owhisper/owhisper-client/src/adapter/gladia/mod.rs (2)

owhisper/owhisper-client/src/lib.rs (1)

api_base (41-44)

crates/pyannote-cloud/src/test_key.rs (1)

test (12-24)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)

GitHub Check: Redirect rules - hyprnote
GitHub Check: Header rules - hyprnote
GitHub Check: Pages changed - hyprnote
GitHub Check: fmt
GitHub Check: Devin

🔇 Additional comments (9)

owhisper/owhisper-client/src/lib.rs (1)

14-15: LGTM!

The GladiaAdapter is correctly added to the public exports alongside existing adapters, maintaining consistency with the module's API surface.

owhisper/owhisper-client/src/adapter/gladia/mod.rs (2)

1-1: LGTM!

Module declaration correctly added for the new batch functionality.

113-131: LGTM!

Good test coverage for both the host detection and batch URL construction, including positive and negative cases.

owhisper/owhisper-client/src/adapter/gladia/batch.rs (6)

16-30: LGTM!

Clean separation between the trait implementation and the async worker function. The PathBuf conversion handles the lifetime correctly.

32-124: LGTM!

Request and response structures are well-defined with appropriate #[serde(default)] annotations for optional fields and #[serde(skip_serializing_if = ...)] for conditional serialization.

146-154: LGTM!

MIME type detection covers the common audio formats. The fallback to application/octet-stream is a reasonable default for unrecognized extensions.
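
A rough sketch of extension-based detection with that fallback is given here for reference. The extension list and the application/octet-stream fallback come from this PR; the specific MIME strings chosen for each extension and the function name `mime_for_path` are assumptions, not copied from batch.rs.

```rust
use std::path::Path;

/// Maps a file extension to a MIME type, falling back to
/// application/octet-stream for anything unrecognized.
/// The MIME strings per extension are assumed, not taken from the adapter.
fn mime_for_path(path: &Path) -> &'static str {
    // Lowercase the extension so "clip.MP3" and "clip.mp3" behave the same.
    let ext = path
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase());

    match ext.as_deref() {
        Some("wav") => "audio/wav",
        Some("mp3") => "audio/mpeg",
        Some("ogg") => "audio/ogg",
        Some("flac") => "audio/flac",
        Some("m4a") => "audio/mp4",
        Some("webm") => "audio/webm",
        _ => "application/octet-stream",
    }
}
```

Whether the adapter itself normalizes case is not stated in this thread, so uppercase extensions are worth including when testing different formats.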

334-355: LGTM!

The integration test is appropriately marked as #[ignore] for CI and provides good coverage of the happy path. The assertions verify the essential structure of the response.

229-269: LGTM. The polling logic correctly handles completion, error, and in-progress states. The PollingConfig::default() provides 300 max attempts at the 3-second interval specified in the code, yielding a 15-minute timeout window—reasonable for batch transcription where processing times can vary significantly. This comfortably accommodates the ~4-minute test runs mentioned in the PR.
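
The timeout arithmetic in that comment can be checked directly. `PollingBudget` below is a hypothetical mirror of the polling configuration; the actual field names of PollingConfig are not shown in this thread.

```rust
use std::time::Duration;

/// Hypothetical mirror of the polling settings (field names assumed).
struct PollingBudget {
    max_attempts: u32,
    interval: Duration,
}

impl PollingBudget {
    /// Upper bound on time spent waiting between polls:
    /// max_attempts * interval (request latency not included).
    fn total_wait(&self) -> Duration {
        self.interval * self.max_attempts
    }
}

/// The values quoted in the review: 300 attempts at 3 seconds each.
fn review_budget() -> PollingBudget {
    PollingBudget {
        max_attempts: 300,
        interval: Duration::from_secs(3),
    }
}
```

With these values `total_wait()` is 900 seconds, i.e. 15 minutes, which leaves headroom over the roughly 4-minute run reported in the PR notes.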

306-312: Multi-channel information from Gladia API responses is discarded.

The Utterance struct includes a channel field that indicates which channel each utterance originated from, but this information is not used during conversion. All utterances are flattened into a single BatchChannel via flat_map (lines 277–289), losing the channel distinction. For multi-channel audio recordings (e.g., stereo or multi-speaker setups with diarization enabled), this means utterances from different channels cannot be distinguished in the final response.

While the BatchWord struct preserves speaker information (u.speaker), the source channel information is completely discarded. This appears to be an intentional design choice consistent across all adapters, but results in irretrievable loss of channel metadata when processing multi-channel audio.
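
If channel metadata were to be preserved later, one possible approach is to group utterances by channel before conversion instead of flattening them. This is only a sketch of that idea with hypothetical types; it is not what the adapter does, and the other adapters would need the same treatment to stay consistent.

```rust
use std::collections::BTreeMap;

/// Hypothetical, reduced utterance carrying only what the sketch needs.
#[derive(Debug, PartialEq)]
struct ChannelUtterance {
    channel: usize,
    text: String,
}

/// Groups utterances by source channel, preserving their original order
/// within each channel. A BTreeMap keeps channels sorted by index.
fn group_by_channel(utterances: &[ChannelUtterance]) -> BTreeMap<usize, Vec<&ChannelUtterance>> {
    let mut by_channel: BTreeMap<usize, Vec<&ChannelUtterance>> = BTreeMap::new();
    for u in utterances {
        by_channel.entry(u.channel).or_default().push(u);
    }
    by_channel
}
```

Each map entry could then become its own output channel rather than all utterances sharing a single one.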


feat: add Gladia realtime STT adapter

a0fe0a6

Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>

devin-ai-integration bot and others added 5 commits December 4, 2025 13:27

fix: correct ureq dependency ordering in Cargo.toml

5a05cc7

Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>

Add Gladia batch STT adapter

42d7dbc

Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>

Fix Gladia batch adapter to send proper audio file format and export GladiaAdapter

77f0b4c

Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>

Add integration test for Gladia batch STT adapter

34ead84

Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>

devin-ai-integration bot force-pushed the devin/1764853705-gladia-batch-stt branch from 1dcd7f9 to 34ead84 Compare December 4, 2025 13:41

Base automatically changed from devin/1764853636-gladia-realtime-stt to main December 4, 2025 13:53

devin-ai-integration bot and others added 2 commits December 4, 2025 13:59

Merge origin/main into devin/1764853705-gladia-batch-stt

fc66ff0

Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>

Format batch.rs

8aa9449

Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>

coderabbitai bot reviewed Dec 4, 2025


Fix punctuated_word to use trimmed value consistently

ebfa5e2

Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>

yujonglee merged commit f383715 into main Dec 4, 2025
9 of 13 checks passed

yujonglee deleted the devin/1764853705-gladia-batch-stt branch December 4, 2025 14:06

This was referenced Dec 4, 2025

refactor(gladia): clean up adapter code #2121

Merged

Add STT E2E workflow for testing STT adapters #2131

Merged
