Binary diarization #1102

yujonglee · 2025-07-07T17:24:36Z

No description provided.

coderabbitai · 2025-07-07T17:24:42Z

📝 Walkthrough


This set of changes introduces support for dual audio channels throughout the codebase. It adds a new DualAudio variant to the ListenInputChunk enum, updates builders and clients to handle both single and dual audio modes, and generalizes WebSocket and audio processing logic to accommodate dual-channel data. Metadata handling is streamlined: the `metadata` field is renamed to `meta`, passed by value, and carried through to `ListenOutputChunk`. New utility functions and types are introduced for audio sample conversion and configuration.

Changes

| Files/Paths | Change Summary |
|---|---|
| `plugins/listener-interface/src/lib.rs`, `plugins/listener-interface/Cargo.toml` | Added `DualAudio` to `ListenInputChunk`, new `AudioMode` enum, default derives, new `meta` field in `ListenOutputChunk`, updated dependencies for `strum` and `specta`. |
| `plugins/listener/src/client.rs` | Refactored builder to support `.build_single()` and `.build_dual()`; introduced `ListenClientDual` for dual audio; updated trait implementations and test usage. |
| `plugins/listener/src/fsm.rs` | Added `AudioSaver` and `AudioChannels` structs for modular audio channel management; separated mic and speaker processing; updated listen client setup to use dual audio client; refactored WAV saving logic; captured `meta` from stream results. |
| `crates/ws-utils/src/lib.rs`, `crates/audio-utils/src/lib.rs`, `crates/ws-utils/Cargo.toml` | Added `bytes_to_f32_samples` and `f32_to_i16_bytes` utilities; introduced `AudioProcessResult` enum and centralized WebSocket audio message processing; added support for dual audio channel splitting and mixing; added `ChannelAudioSource` for channel-based async audio streaming; added dependencies for audio utilities. |
| `crates/ws/src/client.rs`, `crates/whisper-cloud/src/client.rs` | Generalized `WebSocketIO` trait with associated `Data` type; updated method signatures for flexible input types. |
| `plugins/local-stt/src/server.rs` | Split WebSocket handler by audio mode; added `websocket_dual_channel` function processing separate mic and speaker streams; refactored single channel logic; included chunk metadata in output; used default struct initialization. |
| `crates/whisper-local/src/model.rs`, `crates/whisper-local/src/stream.rs` | Renamed `metadata` to `meta` in structs and trait methods; changed from reference to owned value for metadata; updated related APIs and logic; simplified metadata cloning in transcription processing. |
| `crates/stt/src/realtime/clova.rs`, `crates/stt/src/realtime/deepgram.rs`, `crates/stt/src/realtime/whisper.rs` | Used struct update syntax (`..Default::default()`) for `ListenOutputChunk` to ensure all fields are initialized, including new `meta`. |
| `apps/app/server/src/native/listen/realtime.rs` | Added match arm for `DualAudio` in input stream processing with a `todo!()` placeholder. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant ListenClientBuilder
    participant ListenClient
    participant ListenClientDual
    participant WebSocketServer

    Client->>ListenClientBuilder: build_single() / build_dual()
    ListenClientBuilder-->>Client: ListenClient (single) or ListenClientDual (dual)
    Client->>ListenClient: from_audio(audio_stream) (single)
    Client->>ListenClientDual: from_audio(mic_stream, speaker_stream) (dual)
    ListenClientDual->>ListenClientDual: zip mic and speaker streams
    ListenClient->>WebSocketServer: send ListenInputChunk::Audio
    ListenClientDual->>WebSocketServer: send ListenInputChunk::DualAudio
    WebSocketServer->>WebSocketServer: process input (mix if DualAudio)
    WebSocketServer-->>Client: ListenOutputChunk (with meta)

Possibly related PRs

fastrepl/hyprnote#1015: Introduces the DualAudio variant and related dual audio client/server support, including mixing dual audio streams; both PRs modify the same enum variant and related websocket audio processing code.

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Clippy (1.86.0)

Updating git repository `https://github.com/RustAudio/cpal`

error: failed to load source for dependency cpal

Caused by:
Unable to update https://github.com/RustAudio/cpal?rev=51c3b43#51c3b43c

Caused by:
failed to create directory /usr/local/git/db/cpal-476cd1dd23dbc279

Caused by:
Permission denied (os error 13)

📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bea836b and 039d65f.

📒 Files selected for processing (2)

plugins/local-stt/src/lib.rs (1 hunks)
plugins/local-stt/src/server.rs (3 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

`**/*.{js,ts,tsx,rs}`: 1. No error handling. 2. No unused imports, variables, or functions. 3. For comments, keep it minimal. It should be about "Why", not "What".

⚙️ Source: CodeRabbit Configuration File

List of files the instruction was applied to:

plugins/local-stt/src/lib.rs
plugins/local-stt/src/server.rs

🧬 Code Graph Analysis (1)

plugins/local-stt/src/server.rs (4)

crates/ws-utils/src/lib.rs (4)

new (54-59)

new (95-100)

serde_json (17-17)

split_dual_audio_sources (117-145)

crates/whisper-local/src/stream.rs (6)

transcribe (47-47)

transcribe (57-64)

transcribe (68-68)

transcribe (76-83)

meta (22-22)

meta (36-38)

plugins/local-stt/js/bindings.gen.ts (2)

SpeakerIdentity (68-68)

Word (70-70)

plugins/listener/js/bindings.gen.ts (2)

SpeakerIdentity (76-76)

Word (77-77)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: ci
GitHub Check: ci (macos, macos-latest)
GitHub Check: ci (windows, windows-latest)

🔇 Additional comments (5)

plugins/local-stt/src/lib.rs (1)

140-140: LGTM!

The test correctly uses the new build_single() method, aligning with the API refactoring that split the builder into single and dual audio modes.

plugins/local-stt/src/server.rs (4)

134-144: Clean implementation of audio mode dispatch.

The socket splitting and mode-based dispatch provide a clear separation between single and dual channel processing.
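A minimal sketch of mode-based dispatch, assuming an `AudioMode` enum with `Single` (default) and `Dual` variants as described for `plugins/listener-interface`. The handlers `handle_single` and `handle_dual` are hypothetical stand-ins for `websocket_single_channel` and `websocket_dual_channel`; the real handlers receive split WebSocket halves, not byte frames.

```rust
/// Audio mode requested by the client; `Single` is the default.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum AudioMode {
    #[default]
    Single,
    Dual,
}

// Hypothetical stand-ins for the real single/dual WebSocket handlers.
fn handle_single(frames: &[Vec<u8>]) -> String {
    format!("single: {} frames", frames.len())
}

fn handle_dual(frames: &[Vec<u8>]) -> String {
    format!("dual: {} frames", frames.len())
}

/// Route the connection to exactly one handler based on the negotiated mode.
pub fn dispatch(mode: AudioMode, frames: &[Vec<u8>]) -> String {
    match mode {
        AudioMode::Single => handle_single(frames),
        AudioMode::Dual => handle_dual(frames),
    }
}
```

Because the `match` is exhaustive, adding a third mode later would fail to compile until a handler is wired in.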

146-167: Correct single channel implementation.

The function properly handles single audio streams with appropriate "mixed" source metadata.

169-206: Well-structured dual channel processing.

The implementation correctly splits, processes, and merges both audio channels with appropriate metadata for speaker identification.

222-242: Clear speaker identity mapping.

The metadata extraction and speaker assignment correctly maps audio sources to speaker indices for diarization.
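A minimal sketch of a source-to-speaker mapping, assuming the chunk metadata carries a source label of `"mic"`, `"speaker"`, or `"mixed"` (the review mentions `"mixed"` for single-channel input). The index assignment (mic → 0, speaker → 1, mixed → unattributed) and this `SpeakerIdentity` shape are illustrative assumptions, not the generated binding.

```rust
/// Hypothetical speaker identity attached to transcribed words.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SpeakerIdentity {
    /// Speaker known only by channel index (binary diarization).
    Unassigned { index: u8 },
}

/// Map the audio source recorded in chunk metadata to a speaker.
/// Mixed (single-channel) audio cannot be attributed, so it yields `None`.
pub fn speaker_for_source(source: Option<&str>) -> Option<SpeakerIdentity> {
    match source? {
        "mic" => Some(SpeakerIdentity::Unassigned { index: 0 }),
        "speaker" => Some(SpeakerIdentity::Unassigned { index: 1 }),
        _ => None,
    }
}
```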

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (2)

crates/whisper-local/src/stream.rs (1)

36-38: Inefficient cloning on every metadata access

This implementation clones the metadata on every call to meta(), which could be costly for large JSON values.

plugins/listener/src/client.rs (1)

60-88: No error handling in builder methods

Both build_single and build_dual use unwrap() on URI parsing, which could panic.

🧹 Nitpick comments (5)

apps/app/server/src/native/listen/realtime.rs (1)

45-47: Complete the dual audio implementation.

The todo!() placeholder should be replaced with proper dual audio processing logic. This is part of the staged implementation for dual audio support.

Do you want me to help implement the dual audio processing logic based on the patterns used in other files, or should this be tracked as a separate task?

plugins/local-stt/src/server.rs (1)

213-220: Dual channel function needs implementation

The dual channel WebSocket handler is currently stubbed out. This is acceptable for the current stage but should be implemented to complete the dual audio support.

Do you want me to generate a basic implementation structure for the dual channel function or create an issue to track this implementation?
crates/whisper-local/src/stream.rs (2)
20-23: Performance concern: Returning owned values instead of references

The change from returning &Option<serde_json::Value> to Option<serde_json::Value> forces a clone on every access. For large metadata objects, this could impact performance significantly.

Consider keeping the reference return type if metadata is frequently accessed:
-    fn meta(&self) -> Option<serde_json::Value>;
+    fn meta(&self) -> Option<&serde_json::Value>;
Or if ownership is truly needed, consider using Cow<'_, serde_json::Value> to allow both borrowed and owned values.
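To illustrate the trade-off, here is a std-only sketch in which `Meta` stands in for `serde_json::Value` (an assumption to keep the example dependency-free) and `Chunk` is a hypothetical type. Returning `Option<&Meta>` via `as_ref()` allocates nothing; callers that need ownership can call `.cloned()` themselves.

```rust
/// Stand-in for `serde_json::Value`; any `Clone` type behaves the same way.
pub type Meta = String;

/// Hypothetical chunk type carrying optional metadata.
pub struct Chunk {
    meta: Option<Meta>,
}

impl Chunk {
    pub fn new(meta: Option<Meta>) -> Self {
        Self { meta }
    }

    /// Borrowing accessor: no allocation, the caller decides whether to clone.
    pub fn meta(&self) -> Option<&Meta> {
        self.meta.as_ref()
    }

    /// Owning accessor: clones the value on every call (the pattern flagged above).
    pub fn meta_owned(&self) -> Option<Meta> {
        self.meta.clone()
    }
}
```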

191-193: Consider optimizing for None metadata case

While Option::clone() is cheap for None, you could avoid the iteration entirely when there's no metadata to assign.
-                for segment in &mut segments {
-                    segment.meta = meta.clone();
-                }
+                if let Some(ref meta_value) = meta {
+                    for segment in &mut segments {
+                        segment.meta = Some(meta_value.clone());
+                    }
+                }
plugins/listener/src/client.rs (1)
129-134: Performance: Avoid unnecessary byte conversions

Converting Bytes to Vec<u8> with to_vec() creates unnecessary copies of audio data.

If ListenInputChunk::DualAudio can accept bytes::Bytes directly:
-        ListenInputChunk::DualAudio {
-            mic: data.0.to_vec(),
-            speaker: data.1.to_vec(),
-        }
+        ListenInputChunk::DualAudio {
+            mic: data.0,
+            speaker: data.1,
+        }

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a453748 and e0e19cb.

⛔ Files ignored due to path filters (1)

Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (17)

apps/app/server/src/native/listen/realtime.rs (1 hunks)
crates/audio-utils/src/lib.rs (3 hunks)
crates/stt/src/realtime/clova.rs (1 hunks)
crates/stt/src/realtime/deepgram.rs (1 hunks)
crates/stt/src/realtime/whisper.rs (1 hunks)
crates/whisper-cloud/src/client.rs (1 hunks)
crates/whisper-local/src/model.rs (2 hunks)
crates/whisper-local/src/stream.rs (4 hunks)
crates/ws-utils/Cargo.toml (1 hunks)
crates/ws-utils/src/lib.rs (2 hunks)
crates/ws/src/client.rs (2 hunks)
plugins/listener-interface/Cargo.toml (1 hunks)
plugins/listener-interface/src/lib.rs (4 hunks)
plugins/listener/Cargo.toml (1 hunks)
plugins/listener/src/client.rs (7 hunks)
plugins/listener/src/fsm.rs (2 hunks)
plugins/local-stt/src/server.rs (4 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

`**/*.{js,ts,tsx,rs}`: 1. No error handling. 2. No unused imports, variables, or functions. 3. For comments, keep it minimal. It should be about "Why", not "What".

⚙️ Source: CodeRabbit Configuration File

List of files the instruction was applied to:

crates/stt/src/realtime/whisper.rs
crates/stt/src/realtime/clova.rs
apps/app/server/src/native/listen/realtime.rs
crates/whisper-cloud/src/client.rs
crates/ws-utils/src/lib.rs
crates/whisper-local/src/model.rs
plugins/listener/src/fsm.rs
crates/audio-utils/src/lib.rs
crates/stt/src/realtime/deepgram.rs
crates/ws/src/client.rs
plugins/listener-interface/src/lib.rs
plugins/local-stt/src/server.rs
crates/whisper-local/src/stream.rs
plugins/listener/src/client.rs

🧬 Code Graph Analysis (3)

crates/ws-utils/src/lib.rs (1)

crates/audio-utils/src/lib.rs (1)

bytes_to_f32_samples (51-58)

crates/whisper-local/src/model.rs (1)

crates/whisper-local/src/stream.rs (2)

meta (22-22)

meta (36-38)

crates/ws/src/client.rs (2)

plugins/listener/src/client.rs (2)

to_input (101-105)

to_input (129-134)

crates/whisper-cloud/src/client.rs (1)

to_input (94-96)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: ci (macos, macos-latest)
GitHub Check: ci (windows, windows-latest)
GitHub Check: ci

🔇 Additional comments (33)

crates/audio-utils/src/lib.rs (3)

8-8: Good addition of named constant for maintainability.

Replacing the magic number 32768.0 with a named constant improves code readability and maintainability.

26-26: Consistent use of I16_SCALE constant.

Good refactoring to use the named constant instead of hardcoded values across all conversion functions.

Also applies to: 37-37, 45-45

51-58: New utility function looks correct.

The bytes_to_f32_samples function correctly converts little-endian i16 byte pairs to normalized f32 samples using the established scaling constant.
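A minimal sketch of the two conversions, assuming 16-bit little-endian PCM and the `I16_SCALE = 32768.0` constant mentioned above. The signatures are illustrative; the crate's actual functions may take or return other types (for example `bytes::Bytes`).

```rust
/// Scale between i16 PCM and normalized f32 samples.
const I16_SCALE: f32 = 32768.0;

/// Convert little-endian i16 PCM bytes to f32 samples in [-1.0, 1.0).
/// A trailing odd byte is ignored.
pub fn bytes_to_f32_samples(bytes: &[u8]) -> Vec<f32> {
    bytes
        .chunks_exact(2)
        .map(|pair| i16::from_le_bytes([pair[0], pair[1]]) as f32 / I16_SCALE)
        .collect()
}

/// Convert f32 samples back to little-endian i16 PCM bytes.
pub fn f32_to_i16_bytes(samples: &[f32]) -> Vec<u8> {
    samples
        .iter()
        .flat_map(|&s| {
            // Float-to-int `as` casts saturate, so 1.0 * 32768.0 becomes i16::MAX.
            ((s * I16_SCALE) as i16).to_le_bytes()
        })
        .collect()
}
```

The round trip is exact for values that are multiples of 1/32768, except +1.0, which comes back as 32767/32768 because i16 has no +32768.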

crates/ws-utils/Cargo.toml (1)

7-7: Appropriate dependency addition.

Adding the hypr-audio-utils dependency makes sense for utilizing the new audio conversion utilities.

plugins/listener/Cargo.toml (1)

58-59: Dependencies added for dual audio support.

The addition of statig with async features and reorganization of dependencies aligns with the dual audio functionality being introduced.

crates/stt/src/realtime/clova.rs (1)

39-39: Good use of default initialization.

Using ..Default::default() ensures all fields in ListenOutputChunk are properly initialized, which is especially important as the struct is being extended with new fields.

crates/stt/src/realtime/whisper.rs (1)

33-33: Consistent default initialization pattern.

Using ..Default::default() maintains consistency with other STT implementations and ensures proper field initialization.

plugins/listener-interface/Cargo.toml (1)

12-12: LGTM! Dependency additions support new dual audio functionality.

The strum dependency with derive feature and serde_json feature for specta are necessary to support the new AudioMode enum and metadata serialization capabilities introduced in this PR.

Also applies to: 17-17

crates/stt/src/realtime/deepgram.rs (1)

85-88: LGTM! Improved struct initialization with default values.

The struct update syntax ..Default::default() ensures all fields are properly initialized with default values, which is especially important with the addition of the new optional meta field to ListenOutputChunk.
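A minimal sketch of why `..Default::default()` keeps call sites compiling when a field is added. The `words` field and the `Word` shape are hypothetical, and `meta` is modelled as `Option<String>` in place of the real metadata type.

```rust
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Word {
    pub text: String,
}

#[derive(Debug, Clone, Default, PartialEq)]
pub struct ListenOutputChunk {
    pub words: Vec<Word>,
    // Newly added optional field; existing call sites need not mention it.
    pub meta: Option<String>,
}

/// Build a chunk the way the STT adapters do: set what you know, default the rest.
pub fn chunk_from_words(texts: &[&str]) -> ListenOutputChunk {
    ListenOutputChunk {
        words: texts.iter().map(|t| Word { text: t.to_string() }).collect(),
        ..Default::default()
    }
}
```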

crates/whisper-cloud/src/client.rs (1)

90-90: LGTM! Trait generalization supports flexible input data types.

The addition of the associated type Data and its usage in the to_input method enables the WebSocketIO trait to handle both single and dual audio modes. This maintains backward compatibility while supporting the new dual audio functionality.

Also applies to: 94-94

plugins/listener/src/fsm.rs (2)

357-357: Metadata capture for future use.

The _meta variable captures metadata from stream results, which aligns with the unified metadata handling improvements in this PR. The underscore prefix indicates it's currently unused but available for future functionality.

463-463: LGTM! Explicit single audio mode specification.

The change from .build() to .build_single() explicitly distinguishes between single and dual audio modes, which is essential for the new dual audio support while maintaining existing functionality.

crates/whisper-local/src/model.rs (2)

240-240: LGTM - Consistent field renaming

The field rename from metadata to meta aligns with the broader refactoring across the codebase for metadata handling standardization.

264-266: LGTM - Consistent method signature change

The method change from returning a reference to returning an owned value by cloning is consistent with the pattern established in crates/whisper-local/src/stream.rs and maintains API consistency.

crates/ws-utils/src/lib.rs (2)

4-4: LGTM - Good refactoring to use utility function

The import of bytes_to_f32_samples from the audio utils crate is a good refactoring that eliminates code duplication.

37-37: LGTM - Clean refactoring

Replacing the manual byte conversion with the utility function improves code maintainability and consistency.

crates/ws/src/client.rs (3)

10-10: LGTM - Well-designed trait generification

Adding the Data associated type with Send constraint is a good design that enables the trait to work with different data types while maintaining thread safety.

14-14: LGTM - Consistent method signature update

The method signature change to accept Self::Data instead of fixed bytes::Bytes is consistent with the trait generification and enables flexible input handling.

30-30: LGTM - Properly updated generic constraint

The from_audio method correctly uses the generic T::Data type, enabling it to work with both single and dual audio stream clients as shown in the relevant code snippets.
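A minimal sketch of a trait generalized over an associated `Data` type. The shape is simplified: the real `WebSocketIO` in `crates/ws/src/client.rs` has more associated items and async plumbing. `Input` is a hypothetical stand-in for `ListenInputChunk`, and `SingleClient`/`DualClient` stand in for `ListenClient` and `ListenClientDual`.

```rust
/// Stand-in for `ListenInputChunk`.
#[derive(Debug, PartialEq)]
pub enum Input {
    Audio(Vec<u8>),
    DualAudio { mic: Vec<u8>, speaker: Vec<u8> },
}

pub trait WebSocketIO {
    /// What the caller feeds in per frame; `Send` so it can cross task boundaries.
    type Data: Send;
    fn to_input(data: Self::Data) -> Input;
}

pub struct SingleClient;
pub struct DualClient;

impl WebSocketIO for SingleClient {
    type Data = Vec<u8>;
    fn to_input(data: Self::Data) -> Input {
        Input::Audio(data)
    }
}

impl WebSocketIO for DualClient {
    type Data = (Vec<u8>, Vec<u8>);
    fn to_input((mic, speaker): Self::Data) -> Input {
        Input::DualAudio { mic, speaker }
    }
}

/// Generic over the client: one code path serves both modes via `T::Data`.
pub fn encode_all<T: WebSocketIO>(frames: Vec<T::Data>) -> Vec<Input> {
    frames.into_iter().map(T::to_input).collect()
}
```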

plugins/listener-interface/src/lib.rs (6)

19-19: LGTM - Useful Default derive

Adding Default to the Word struct reduces boilerplate and makes the API more convenient to use.

40-40: LGTM - Consistent Default derive

Adding Default to ListenOutputChunk is consistent with the pattern and useful for initialization.

42-42: LGTM - Consistent metadata field addition

The meta field addition aligns with the metadata handling standardization across the codebase.

55-61: LGTM - Well-structured dual audio variant

The DualAudio variant is well-structured with proper serde_bytes serialization for both mic and speaker fields, maintaining consistency with the existing Audio variant.

73-89: LGTM - Clean enum design

The AudioMode enum is well-designed with appropriate serde and strum attributes, clear variant names, and a sensible default of Single.

94-94: LGTM - Consistent parameter addition

Adding the audio_mode field to ListenParams is consistent with the dual audio support and has an appropriate default value.
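A std-only sketch of what the strum-style string conversions and the `ListenParams` default amount to. The lowercase spellings `"single"`/`"dual"` and the `audio_mode` query key are assumptions, and this `ListenParams` is a hypothetical one-field subset of the real struct.

```rust
use std::fmt;
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum AudioMode {
    #[default]
    Single,
    Dual,
}

impl fmt::Display for AudioMode {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(match self {
            AudioMode::Single => "single",
            AudioMode::Dual => "dual",
        })
    }
}

impl FromStr for AudioMode {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "single" => Ok(AudioMode::Single),
            "dual" => Ok(AudioMode::Dual),
            other => Err(format!("unknown audio mode: {other}")),
        }
    }
}

/// Hypothetical subset of `ListenParams`; a missing or invalid mode falls back to `Single`.
#[derive(Debug, Default)]
pub struct ListenParams {
    pub audio_mode: AudioMode,
}

/// Parse `audio_mode` from a raw query string such as "language=en&audio_mode=dual".
pub fn params_from_query(query: &str) -> ListenParams {
    let audio_mode = query
        .split('&')
        .filter_map(|pair| pair.split_once('='))
        .find(|(k, _)| *k == "audio_mode")
        .and_then(|(_, v)| v.parse::<AudioMode>().ok())
        .unwrap_or_default();
    ListenParams { audio_mode }
}
```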

plugins/local-stt/src/server.rs (5)

134-144: LGTM - Clean audio mode dispatch

The refactoring to split socket handling and dispatch based on audio mode is well-structured and maintains clear separation of concerns.

146-151: LGTM - Consistent function signature

The function signature change to accept split socket streams is consistent with the new architecture and enables better control over WebSocket handling.

153-153: LGTM - Good use of fully qualified paths

Using the fully qualified path for WebSocketAudioSource is good practice and avoids potential import conflicts.

160-160: LGTM - Cleaner initialization

Using ..Default::default() instead of explicit field initialization is cleaner and more maintainable.

175-187: LGTM - Proper metadata handling

The metadata extraction and propagation is correctly implemented and consistent with the broader metadata handling changes.

crates/whisper-local/src/stream.rs (1)

154-164: LGTM! Clean separation of metadata extraction

Good practice to extract the metadata before consuming the chunk, ensuring proper ownership handling.
plugins/listener/src/client.rs (2)

164-173: LGTM! Clean dual stream implementation

Good use of zip to combine the mic and speaker streams into a single dual stream.
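The client zips two async streams; this std-only sketch uses `Iterator::zip` to show the same pairing semantics, including that zipping stops at the shorter input. `DualFrame` is a hypothetical stand-in for `ListenInputChunk::DualAudio`.

```rust
/// Stand-in for `ListenInputChunk::DualAudio`.
#[derive(Debug, PartialEq)]
pub struct DualFrame {
    pub mic: Vec<u8>,
    pub speaker: Vec<u8>,
}

/// Pair mic and speaker frames one-to-one.
/// `zip` ends as soon as either side ends, so a trailing frame on the
/// longer side is dropped rather than sent with an empty partner.
pub fn pair_frames<M, S>(mic: M, speaker: S) -> Vec<DualFrame>
where
    M: IntoIterator<Item = Vec<u8>>,
    S: IntoIterator<Item = Vec<u8>>,
{
    mic.into_iter()
        .zip(speaker)
        .map(|(mic, speaker)| DualFrame { mic, speaker })
        .collect()
}
```

If one device stalls, the zipped stream stalls with it, which is worth keeping in mind when mic and speaker capture run at different cadences.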

32-58: Critical: No error handling for URL parsing

The method uses multiple unwrap() calls that could panic if:

The API base URL is malformed

The URL doesn't have a host

Setting the scheme fails

Replace unwrap() calls with proper error handling:
-    fn build_uri(&self, audio_mode: hypr_listener_interface::AudioMode) -> String {
-        let mut url: url::Url = self.api_base.as_ref().unwrap().parse().unwrap();
+    fn build_uri(&self, audio_mode: hypr_listener_interface::AudioMode) -> Result<String, url::ParseError> {
+        let api_base = self.api_base.as_ref().ok_or_else(|| url::ParseError::EmptyHost)?;
+        let mut url: url::Url = api_base.parse()?;
Also consider using a more robust host detection:
-        let host = url.host_str().unwrap();
+        let host = url.host_str().ok_or_else(|| url::ParseError::EmptyHost)?;
Likely an incorrect or invalid review comment.

crates/ws-utils/src/lib.rs

coderabbitai

Actionable comments posted: 3

♻️ Duplicate comments (1)

crates/ws-utils/src/lib.rs (1)

37-46: Apply a scaling factor to reduce audio clipping.

The mixer sums the two channels without scaling, so overlapping loud passages are hard-clipped by the clamp. Consider applying a scaling factor, as suggested in the previous review.

+/// Scale factor applied to mixed audio to prevent clipping.
+const DEFAULT_MIX_SCALE: f32 = 0.9;
+
 fn mix_audio_channels(mic: &[f32], speaker: &[f32]) -> Vec<f32> {
     let max_len = mic.len().max(speaker.len());
     (0..max_len)
         .map(|i| {
             let mic_sample = mic.get(i).copied().unwrap_or(0.0);
             let speaker_sample = speaker.get(i).copied().unwrap_or(0.0);
-            (mic_sample + speaker_sample).clamp(-1.0, 1.0)
+            ((mic_sample + speaker_sample) * DEFAULT_MIX_SCALE).clamp(-1.0, 1.0)
         })
         .collect()
 }
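A runnable comparison of scale factors, assuming samples are normalized f32 in [-1.0, 1.0]. With a 0.9 factor, two loud coincident samples (0.8 + 0.8) still exceed full scale and are clamped; halving the sum (averaging) can never exceed 1.0, at the cost of roughly 6 dB lower level when only one channel is active. `mix_with_scale` and `clip_count` are hypothetical helpers, not functions in `ws-utils`.

```rust
/// Mix two channels sample-by-sample, padding the shorter one with silence.
/// `scale` is applied to the sum before clamping to [-1.0, 1.0].
pub fn mix_with_scale(mic: &[f32], speaker: &[f32], scale: f32) -> Vec<f32> {
    let max_len = mic.len().max(speaker.len());
    (0..max_len)
        .map(|i| {
            let m = mic.get(i).copied().unwrap_or(0.0);
            let s = speaker.get(i).copied().unwrap_or(0.0);
            ((m + s) * scale).clamp(-1.0, 1.0)
        })
        .collect()
}

/// Count samples whose scaled sum falls outside [-1.0, 1.0] and would be clamped.
pub fn clip_count(mic: &[f32], speaker: &[f32], scale: f32) -> usize {
    let max_len = mic.len().max(speaker.len());
    (0..max_len)
        .filter(|&i| {
            let m = mic.get(i).copied().unwrap_or(0.0);
            let s = speaker.get(i).copied().unwrap_or(0.0);
            ((m + s) * scale).abs() > 1.0
        })
        .count()
}
```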

🧹 Nitpick comments (3)

plugins/local-stt/src/server.rs (1)
153-153: Extract the hardcoded buffer size as a constant.

The buffer size 16 * 1000 is used in multiple places. Consider extracting it as a named constant for better maintainability.
+const AUDIO_BUFFER_SIZE: usize = 16 * 1000;
+
 async fn websocket_single_channel(
     // ...
 ) {
     let mut stream = {
-        let audio_source = hypr_ws_utils::WebSocketAudioSource::new(ws_receiver, 16 * 1000);
+        let audio_source = hypr_ws_utils::WebSocketAudioSource::new(ws_receiver, AUDIO_BUFFER_SIZE);
         // ...
     };
     // ...
 }

 async fn websocket_dual_channel(
     // ...
 ) {
     let (mic_source, speaker_source) =
-        hypr_ws_utils::split_dual_audio_sources(ws_receiver, 16 * 1000);
+        hypr_ws_utils::split_dual_audio_sources(ws_receiver, AUDIO_BUFFER_SIZE);
     // ...
 }
Also applies to: 220-220
crates/ws-utils/src/lib.rs (1)
121-122: Consider using bounded channels to prevent memory issues.

The unbounded channels could cause memory exhaustion if the consumer is slower than the producer.
-    let (mic_tx, mic_rx) = unbounded_channel::<Vec<f32>>();
-    let (speaker_tx, speaker_rx) = unbounded_channel::<Vec<f32>>();
+    const CHANNEL_BUFFER_SIZE: usize = 256;
+    let (mic_tx, mic_rx) = tokio::sync::mpsc::channel::<Vec<f32>>(CHANNEL_BUFFER_SIZE);
+    let (speaker_tx, speaker_rx) = tokio::sync::mpsc::channel::<Vec<f32>>(CHANNEL_BUFFER_SIZE);
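A bounded channel applies backpressure: tokio's `Sender::send().await` waits while the buffer is full, and `try_send` returns an error instead of growing memory. This std-only sketch uses `std::sync::mpsc::sync_channel`, which has the same bounded semantics, to show what happens when the producer outruns a stalled consumer; the frame size of 160 samples is arbitrary.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Push up to `n` chunks into a channel of capacity `cap` while no consumer runs,
/// returning how many were accepted before the buffer filled up.
pub fn accepted_before_full(cap: usize, n: usize) -> usize {
    // `_rx` (not `_`) keeps the receiver alive so sends fail with `Full`, not `Disconnected`.
    let (tx, _rx) = sync_channel::<Vec<f32>>(cap);
    let mut accepted = 0;
    for _ in 0..n {
        match tx.try_send(vec![0.0; 160]) {
            Ok(()) => accepted += 1,
            // A bounded channel refuses instead of growing without limit.
            Err(TrySendError::Full(_)) => break,
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    accepted
}
```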
plugins/listener/src/fsm.rs (1)
454-454: Remove or implement the unused metadata handling.

The _meta variable is extracted but never used. Either remove it or implement the intended functionality.
-                    let _meta = result.meta.clone();
-
                     // We don't have to do this, and inefficient. But this is what works at the moment.
Or if metadata is needed for future features, add a TODO comment explaining the planned usage.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e0e19cb and bea836b.

⛔ Files ignored due to path filters (1)

Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (6)

crates/audio-utils/src/lib.rs (3 hunks)
crates/ws-utils/Cargo.toml (1 hunks)
crates/ws-utils/src/lib.rs (3 hunks)
plugins/listener/src/client.rs (7 hunks)
plugins/listener/src/fsm.rs (9 hunks)
plugins/local-stt/src/server.rs (3 hunks)

🚧 Files skipped from review as they are similar to previous changes (2)

crates/ws-utils/Cargo.toml
crates/audio-utils/src/lib.rs

🧰 Additional context used

📓 Path-based instructions (1)

`**/*.{js,ts,tsx,rs}`: 1. No error handling. 2. No unused imports, variables, or functions. 3. For comments, keep it minimal. It should be about "Why", not "What".

⚙️ Source: CodeRabbit Configuration File

List of files the instruction was applied to:

plugins/local-stt/src/server.rs
crates/ws-utils/src/lib.rs
plugins/listener/src/fsm.rs
plugins/listener/src/client.rs

🧬 Code Graph Analysis (1)

plugins/listener/src/fsm.rs (1)

crates/audio-utils/src/lib.rs (1)

f32_to_i16_bytes (51-58)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: ci (windows, windows-latest)
GitHub Check: ci (macos, macos-latest)
GitHub Check: ci

🔇 Additional comments (6)

plugins/local-stt/src/server.rs (1)

134-144: LGTM! Clean separation of audio mode handling.

The WebSocket split and dispatch pattern based on audio mode is well-structured.

crates/ws-utils/src/lib.rs (1)

127-130: Clarify the handling of single audio samples.

When receiving single audio samples, the same data is sent to both mic and speaker channels. This duplicates the audio, which might not be the intended behavior.

Is this duplication intentional? If not, consider sending to only one channel or documenting why duplication is needed.
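To make the question concrete, here is a std-only sketch of the routing, assuming a hypothetical `Incoming` enum and std channels in place of tokio's. If mono audio is duplicated, both downstream transcribers receive identical samples, so the same words could be emitted twice unless one side is filtered.

```rust
use std::sync::mpsc::{Receiver, Sender};

/// Hypothetical decoded WebSocket message.
pub enum Incoming {
    Single(Vec<f32>),
    Dual { mic: Vec<f32>, speaker: Vec<f32> },
}

/// Route one message to the mic and speaker channels.
/// With `duplicate_single`, mono input is copied to both sides (the behavior
/// questioned above); otherwise it goes to the mic channel only.
pub fn route(
    msg: Incoming,
    mic_tx: &Sender<Vec<f32>>,
    speaker_tx: &Sender<Vec<f32>>,
    duplicate_single: bool,
) {
    match msg {
        Incoming::Single(samples) => {
            if duplicate_single {
                let _ = speaker_tx.send(samples.clone());
            }
            let _ = mic_tx.send(samples);
        }
        Incoming::Dual { mic, speaker } => {
            let _ = mic_tx.send(mic);
            let _ = speaker_tx.send(speaker);
        }
    }
}

/// Drain whatever is currently buffered in a receiver without blocking.
pub fn drain(rx: &Receiver<Vec<f32>>) -> Vec<Vec<f32>> {
    rx.try_iter().collect()
}
```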

plugins/listener/src/client.rs (1)

32-58: LGTM! Well-structured URI builder.

The method properly handles URL construction with appropriate scheme selection and parameter encoding.
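A std-only sketch of the scheme selection a WebSocket URI builder typically performs (http → ws, https → wss). The `/listen` path and the `audio_mode` query key are hypothetical; the real `build_uri` uses the `url` crate and the plugin's own path.

```rust
/// Convert an HTTP(S) base URL into a WebSocket URL with an `audio_mode` query.
/// Returns `None` for schemes other than http/https.
/// `audio_mode` is assumed URL-safe; no percent-encoding is performed.
pub fn build_ws_uri(api_base: &str, audio_mode: &str) -> Option<String> {
    let (scheme, rest) = api_base.split_once("://")?;
    let ws_scheme = match scheme {
        "http" => "ws",
        "https" => "wss",
        _ => return None,
    };
    let host_and_path = rest.trim_end_matches('/');
    Some(format!("{ws_scheme}://{host_and_path}/listen?audio_mode={audio_mode}"))
}
```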

plugins/listener/src/fsm.rs (3)

25-54: Excellent refactoring with AudioSaver!

The AudioSaver struct provides a clean abstraction for WAV file operations, eliminating code duplication and improving maintainability.

75-75: Verify the buffer size reduction impact.

The CHUNK_BUFFER_SIZE has been reduced from 256 to 64 (75% reduction). This could impact performance by causing more frequent channel operations.

Has this change been tested under load? Consider documenting the rationale for this reduction or making it configurable.

353-364: Good optimization for recording logic!

Computing the mixed audio only when recording is enabled is an excellent optimization that reduces unnecessary CPU usage.

plugins/local-stt/src/server.rs

plugins/listener/src/client.rs

plugins/listener/src/fsm.rs

yujonglee added 4 commits June 27, 2025 18:09

metadata stuffs

f8a37ea

single audio works

afc1f2c

rename

e9afcb4

various fixes and refactors

70a0e70

yujonglee added 2 commits July 7, 2025 10:25

merge

cf63264

deps

e0e19cb

coderabbitai bot reviewed Jul 7, 2025

crates/ws-utils/src/lib.rs (outdated, resolved)

yujonglee added 4 commits July 7, 2025 11:16

dual audio works

9ef1a0e

did some refactor on fsm

35215ad

got it working for speaker only

5dfe472

ws refactor

bea836b

coderabbitai bot reviewed Jul 7, 2025

plugins/local-stt/src/server.rs (outdated, resolved)

plugins/listener/src/client.rs (resolved)

plugins/listener/src/fsm.rs (resolved)

got speaker identity assignment

039d65f

yujonglee merged commit 6daea41 into main Jul 7, 2025
5 checks passed

yujonglee deleted the binary-diarization branch July 7, 2025 22:27

coderabbitai bot mentioned this pull request Aug 9, 2025

Deepgram compat v2 #1307

Merged

coderabbitai bot mentioned this pull request Sep 9, 2025

Migrate to actor for audio processing #1457

Merged

coderabbitai bot mentioned this pull request Nov 7, 2025

Batch transcribe support #1638

Merged
