@yujonglee yujonglee commented Dec 4, 2025

Summary

Adds batch (pre-recorded) speech-to-text support for Gladia, complementing the realtime STT adapter from PR #2115. The implementation follows the existing adapter pattern used by Deepgram and AssemblyAI batch adapters.

The adapter implements the full Gladia pre-recorded API flow:

  1. Upload audio file to /v2/upload endpoint (preserves original file format with proper MIME type detection)
  2. Initiate transcription via /v2/pre-recorded endpoint
  3. Poll for completion until status is "done"
  4. Convert Gladia's response format to the internal BatchResponse format

Also exports GladiaAdapter publicly from lib.rs and includes an integration test.

Review & Testing Checklist for Human

  • Speaker assignment logic: Words inherit speaker ID from their parent utterance. Verify this matches expected behavior (see convert_to_batch_response around lines 283-299)
  • Diarization is hardcoded to true in the transcript request (line 204). Consider if this should be configurable via ListenParams
  • Test with different audio formats: The adapter detects MIME type from file extension (wav, mp3, ogg, flac, m4a, webm). Verify behavior with various formats.
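
To make the speaker-assignment point above easier to check, here is a minimal sketch of words inheriting the speaker ID from their parent utterance. The type and function names (GladiaUtterance, GladiaWord, FlatWord, flatten_words) are hypothetical stand-ins, not the adapter's actual definitions; only the idea (flat_map over utterances, copy u.speaker onto each word, use one trimmed string for both word fields) comes from this PR.

```rust
/// Hypothetical, simplified stand-in for one word in a Gladia utterance.
#[derive(Debug, Clone, PartialEq)]
pub struct GladiaWord {
    pub word: String,
    pub start: f64,
    pub end: f64,
}

/// Hypothetical stand-in for a Gladia utterance: one speaker, many words.
#[derive(Debug, Clone, PartialEq)]
pub struct GladiaUtterance {
    pub speaker: Option<usize>,
    pub words: Vec<GladiaWord>,
}

/// Hypothetical stand-in for the flattened word used in the internal response.
#[derive(Debug, Clone, PartialEq)]
pub struct FlatWord {
    pub word: String,
    pub punctuated_word: String,
    pub start: f64,
    pub end: f64,
    pub speaker: Option<usize>,
}

/// Flattens all utterances into one word list.
/// Each word takes the speaker of the utterance it belongs to, and both
/// `word` and `punctuated_word` are built from the same trimmed string.
pub fn flatten_words(utterances: &[GladiaUtterance]) -> Vec<FlatWord> {
    utterances
        .iter()
        .flat_map(|u| {
            u.words.iter().map(move |w| {
                let trimmed = w.word.trim().to_string();
                FlatWord {
                    word: trimmed.clone(),
                    punctuated_word: trimmed,
                    start: w.start,
                    end: w.end,
                    speaker: u.speaker,
                }
            })
        })
        .collect()
}
```

In this sketch, a word " hello " inside an utterance with speaker Some(1) comes out as word "hello", punctuated_word "hello", speaker Some(1). If Gladia ever reports a per-word speaker that differs from the utterance, this approach would not reflect it.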

Recommended test plan:

GLADIA_API_KEY="your-key" cargo test -p owhisper-client test_gladia_batch_transcription -- --ignored --nocapture

Notes

  • Integration test added and verified to pass against https://api.gladia.io/v2 with the provided test API key (took ~4 minutes for a 20-second audio file)
  • This PR is based on devin/1764853636-gladia-realtime-stt (PR #2115, "feat: add Gladia realtime STT adapter") since both touch the gladia module
  • Cargo check and tests pass locally

Updates since last revision

  • Merged with origin/main to resolve merge conflicts in mod.rs and live.rs
  • Preserved port handling improvements from main in build_ws_url_from_base
  • Preserved session channel tracking improvements from main in live.rs
  • Fixed punctuated_word so that it uses the same trimmed string as the word field

Requested by: @yujonglee (yujonglee.dev@gmail.com)
Devin Session: https://app.devin.ai/sessions/ef1a5751c1424a0bbfa92ffe14a4354b

Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>

coderabbitai bot commented Dec 4, 2025

Warning

Rate limit exceeded

@devin-ai-integration[bot] has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 19 minutes and 50 seconds before requesting another review.

📥 Commits

Reviewing files that changed from the base of the PR and between 8aa9449 and ebfa5e2.

📒 Files selected for processing (1)
  • owhisper/owhisper-client/src/adapter/gladia/batch.rs (1 hunks)
Walkthrough

This PR adds a Gladia batch transcription adapter for owhisper with an asynchronous workflow that uploads audio files, configures language and diarization options, creates transcription tasks, polls for completion, and converts Gladia responses to the internal BatchResponse format.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Gladia Batch Adapter | owhisper/owhisper-client/src/adapter/gladia/batch.rs | New module implementing batch transcription with async file upload to Gladia, optional language/diarization configuration, polling-based task completion monitoring, response conversion to BatchResponse format, and error handling. Includes ignored unit test scaffold. |
| Gladia Adapter Configuration | owhisper/owhisper-client/src/adapter/gladia/mod.rs | Adds API_BASE constant ("https://api.gladia.io/v2") and batch_api_url() helper method to resolve or default batch API URLs. Includes tests for host detection and URL resolution. |
| Public API Exports | owhisper/owhisper-client/src/lib.rs | Re-exports GladiaAdapter as public API alongside existing SonioxAdapter. |

Sequence Diagram

sequenceDiagram
    actor Caller
    participant Client as owhisper Client
    participant Gladia as Gladia API
    participant Converter as Response Converter

    Caller->>Client: transcribe_file(path, config)
    Client->>Client: Read audio bytes & infer MIME type
    Client->>Gladia: POST /upload (audio file)
    Gladia-->>Client: file_url
    Client->>Gladia: POST /pre-recorded (config + file_url)
    Gladia-->>Client: task_id + status
    
    rect rgb(200, 220, 255)
    Note over Client,Gladia: Polling Loop
    Client->>Gladia: GET /pre-recorded/{id}
    Gladia-->>Client: status (pending/done/error)
    alt status == "done"
        Client->>Converter: Convert TranscriptResponse
        Converter-->>Client: BatchResponse
    else status == "pending"
        Client->>Client: Wait & retry
    else status == "error"
        Client-->>Caller: Error
    end
    end
    
    Client-->>Caller: BatchResponse (utterances as BatchWord items)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • batch.rs: Async workflow with HTTP operations (upload, task creation, polling), JSON deserialization, error handling, and response conversion logic requires careful review for correctness of polling mechanics and type mapping.
  • mod.rs: New constant and helper function are straightforward; verify URL parsing behavior and test coverage.
  • lib.rs: Simple re-export; low risk.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 60.00%, which is below the required threshold of 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'feat: add Gladia batch STT adapter' accurately summarizes the main change: adding a Gladia batch speech-to-text adapter. It is concise, specific, and directly reflects the primary objective of the changeset. |
| Description check | ✅ Passed | The pull request description is comprehensive and directly related to the changeset. It details the implementation of the Gladia batch adapter, outlines the API flow, mentions the adapter pattern followed, includes a review checklist, and provides testing guidance. |

devin-ai-integration bot and others added 5 commits December 4, 2025 13:27
- Add build_ws_url_with_api_key method to RealtimeSttAdapter trait
- Use ureq for blocking POST request to get session token
- Fix language_config format to use object with languages array
- Return None for build_auth_header since token is in URL

Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>
Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>
Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>
…GladiaAdapter

Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>
Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>
@devin-ai-integration devin-ai-integration bot force-pushed the devin/1764853705-gladia-batch-stt branch from 1dcd7f9 to 34ead84 Compare December 4, 2025 13:41
Base automatically changed from devin/1764853636-gladia-realtime-stt to main December 4, 2025 13:53
devin-ai-integration bot and others added 2 commits December 4, 2025 13:59
Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>
Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>

netlify bot commented Dec 4, 2025

Deploy Preview for hyprnote-storybook ready!

Name Link
🔨 Latest commit ebfa5e2
🔍 Latest deploy log https://app.netlify.com/projects/hyprnote-storybook/deploys/693195354a1ade000744f48f
😎 Deploy Preview https://deploy-preview-2116--hyprnote-storybook.netlify.app

netlify bot commented Dec 4, 2025

Deploy Preview for hyprnote ready!

Name Link
🔨 Latest commit ebfa5e2
🔍 Latest deploy log https://app.netlify.com/projects/hyprnote/deploys/6931953596d9820008058242
😎 Deploy Preview https://deploy-preview-2116--hyprnote.netlify.app

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
owhisper/owhisper-client/src/adapter/gladia/mod.rs (1)

58-65: Minor simplification.

The intermediate variable binding is unnecessary.

     pub(crate) fn batch_api_url(api_base: &str) -> url::Url {
         if api_base.is_empty() {
             return API_BASE.parse().expect("invalid_default_api_url");
         }
 
-        let url: url::Url = api_base.parse().expect("invalid_api_base");
-        url
+        api_base.parse().expect("invalid_api_base")
     }
owhisper/owhisper-client/src/adapter/gladia/batch.rs (1)

201-205: Consider making diarization configurable.

As noted in the PR checklist, diarization is hardcoded to true. If some users don't need speaker diarization, this could add unnecessary processing time or cost.

Consider adding a field to ListenParams or making this configurable:

 let transcript_request = TranscriptRequest {
     audio_url: upload_result.audio_url,
     language_config,
-    diarization: Some(true),
+    diarization: Some(params.diarization.unwrap_or(true)),
 };
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3c913f8 and 8aa9449.

📒 Files selected for processing (3)
  • owhisper/owhisper-client/src/adapter/gladia/batch.rs (1 hunks)
  • owhisper/owhisper-client/src/adapter/gladia/mod.rs (3 hunks)
  • owhisper/owhisper-client/src/lib.rs (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
owhisper/owhisper-client/src/adapter/gladia/mod.rs (2)
owhisper/owhisper-client/src/lib.rs (1)
  • api_base (41-44)
crates/pyannote-cloud/src/test_key.rs (1)
  • test (12-24)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Redirect rules - hyprnote
  • GitHub Check: Header rules - hyprnote
  • GitHub Check: Pages changed - hyprnote
  • GitHub Check: fmt
  • GitHub Check: Devin
🔇 Additional comments (9)
owhisper/owhisper-client/src/lib.rs (1)

14-15: LGTM!

The GladiaAdapter is correctly added to the public exports alongside existing adapters, maintaining consistency with the module's API surface.

owhisper/owhisper-client/src/adapter/gladia/mod.rs (2)

1-1: LGTM!

Module declaration correctly added for the new batch functionality.


113-131: LGTM!

Good test coverage for both the host detection and batch URL construction, including positive and negative cases.

owhisper/owhisper-client/src/adapter/gladia/batch.rs (6)

16-30: LGTM!

Clean separation between the trait implementation and the async worker function. The PathBuf conversion handles the lifetime correctly.


32-124: LGTM!

Request and response structures are well-defined with appropriate #[serde(default)] annotations for optional fields and #[serde(skip_serializing_if = ...)] for conditional serialization.


146-154: LGTM!

MIME type detection covers the common audio formats. The fallback to application/octet-stream is a reasonable default for unrecognized extensions.
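
For readers who want to test the format handling, extension-based detection with this fallback could look roughly like the sketch below. The function name mime_type_for, the exact MIME strings, and the case-insensitive matching are assumptions for illustration; the PR only states the supported extensions (wav, mp3, ogg, flac, m4a, webm) and the application/octet-stream fallback.

```rust
use std::path::Path;

/// Hypothetical helper: maps a file extension to a MIME type.
/// The MIME strings below are common choices, not values confirmed by the PR.
pub fn mime_type_for(path: &Path) -> &'static str {
    // Lowercase the extension so "A.MP3" and "a.mp3" are treated alike
    // (an assumption; the real adapter may be case-sensitive).
    let ext = path
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase());

    match ext.as_deref() {
        Some("wav") => "audio/wav",
        Some("mp3") => "audio/mpeg",
        Some("ogg") => "audio/ogg",
        Some("flac") => "audio/flac",
        Some("m4a") => "audio/mp4",
        Some("webm") => "audio/webm",
        // Unknown or missing extension: fall back to a generic binary type.
        _ => "application/octet-stream",
    }
}
```

A file without an extension, or with one outside the list, gets application/octet-stream here, so how Gladia treats such uploads is worth checking separately.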


334-355: LGTM!

The integration test is appropriately marked as #[ignore] for CI and provides good coverage of the happy path. The assertions verify the essential structure of the response.


229-269: LGTM. The polling logic correctly handles completion, error, and in-progress states. The PollingConfig::default() provides 300 max attempts at the 3-second interval specified in the code, yielding a 15-minute timeout window—reasonable for batch transcription where processing times can vary significantly. This comfortably accommodates the ~4-minute test runs mentioned in the PR.
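
The timeout arithmetic above (300 attempts × 3 seconds = 900 seconds) can be illustrated with a small blocking sketch. This is not the adapter's code: the real implementation is async and uses PollingConfig, whose fields are not shown in this PR. PollPolicy, PollStatus, PollError and poll_until_done are hypothetical names, and the status fetch is passed in as a closure so that no HTTP client is assumed.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical status returned by one poll of the transcription task.
#[derive(Debug, Clone, PartialEq)]
pub enum PollStatus<T> {
    Pending,
    Done(T),
    Failed(String),
}

/// Hypothetical polling policy; the numbers are supplied by the caller.
#[derive(Debug, Clone, Copy)]
pub struct PollPolicy {
    pub max_attempts: u32,
    pub interval: Duration,
}

impl PollPolicy {
    /// Total time spent sleeping if every attempt reports `Pending`.
    pub fn max_wait(&self) -> Duration {
        self.interval * self.max_attempts
    }
}

#[derive(Debug, PartialEq)]
pub enum PollError {
    Failed(String),
    TimedOut { attempts: u32 },
}

/// Calls `fetch` until it reports `Done` or `Failed`, sleeping for
/// `policy.interval` after each `Pending`, for at most `max_attempts` calls.
pub fn poll_until_done<T>(
    policy: PollPolicy,
    mut fetch: impl FnMut() -> PollStatus<T>,
) -> Result<T, PollError> {
    for _ in 0..policy.max_attempts {
        match fetch() {
            PollStatus::Done(value) => return Ok(value),
            PollStatus::Failed(message) => return Err(PollError::Failed(message)),
            PollStatus::Pending => sleep(policy.interval),
        }
    }
    Err(PollError::TimedOut { attempts: policy.max_attempts })
}
```

With max_attempts set to 300 and interval set to 3 seconds, max_wait() is 900 seconds, which matches the 15-minute window described above. In an async adapter the sleep would typically be an awaited timer rather than a blocking call.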


306-312: Multi-channel information from Gladia API responses is discarded.

The Utterance struct includes a channel field that indicates which channel each utterance originated from, but this information is not used during conversion. All utterances are flattened into a single BatchChannel via flat_map (lines 277–289), losing the channel distinction. For multi-channel audio recordings (e.g., stereo or multi-speaker setups with diarization enabled), this means utterances from different channels cannot be distinguished in the final response.

While the BatchWord struct preserves speaker information (u.speaker), the source channel information is completely discarded. This appears to be an intentional design choice consistent across all adapters, but results in irretrievable loss of channel metadata when processing multi-channel audio.
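
If channel separation were wanted later, one possible direction is to group utterances by their channel field before conversion instead of flattening them. The sketch below is only an illustration of that idea: UtteranceSketch and group_by_channel are hypothetical names, and nothing here reflects how BatchChannel is actually built in this PR.

```rust
use std::collections::BTreeMap;

/// Hypothetical, reduced view of a Gladia utterance with its source channel.
#[derive(Debug, Clone, PartialEq)]
pub struct UtteranceSketch {
    pub channel: usize,
    pub speaker: Option<usize>,
    pub text: String,
}

/// Groups utterances by channel index, keeping their original order
/// within each channel. A BTreeMap keeps channels sorted by index.
pub fn group_by_channel(
    utterances: Vec<UtteranceSketch>,
) -> BTreeMap<usize, Vec<UtteranceSketch>> {
    let mut channels: BTreeMap<usize, Vec<UtteranceSketch>> = BTreeMap::new();
    for utterance in utterances {
        channels.entry(utterance.channel).or_default().push(utterance);
    }
    channels
}
```

Each map entry could then be converted into its own output channel, at the cost of diverging from the single-channel shape the other adapters reportedly produce.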

Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>
@yujonglee yujonglee merged commit f383715 into main Dec 4, 2025
9 of 13 checks passed
@yujonglee yujonglee deleted the devin/1764853705-gladia-batch-stt branch December 4, 2025 14:06