Deepgram compat v2 #1307
Conversation
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Caution: Review failed. The pull request is closed.

📝 Walkthrough

This change introduces two new Rust workspace crates, `hypr-am` and `hypr-moonshine`.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant DesktopApp
    participant LocalSTTPlugin
    participant hypr-am
    participant hypr-moonshine
    DesktopApp->>LocalSTTPlugin: Set/Get Current Model (WhisperModel)
    LocalSTTPlugin->>LocalSTTPlugin: Use am_api_key for initialization
    LocalSTTPlugin->>hypr-am: (On request) Initialize or check status via HTTP
    LocalSTTPlugin->>hypr-moonshine: (If needed) Load or use moonshine models
```
```mermaid
sequenceDiagram
    participant Client
    participant Server
    participant Plugin
    Client->>Server: Send audio stream (binary, channels=1 or 2)
    Server->>Plugin: Forward/process audio data
    Plugin->>Server: Return StreamResponse (transcript, metadata, words, etc.)
    Server->>Client: Send StreamResponse (JSON)
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs
📜 Recent review details

Configuration used: .coderabbit.yaml
📒 Files selected for processing (24)
cubic analysis

8 issues found across 37 files
```diff
 };
-let msg = Message::Text(serde_json::to_string(&data).unwrap().into());
+let msg = Message::Text(serde_json::to_string(&response).unwrap().into());
```
unwrap() will panic on serialization failure, potentially tearing down the WebSocket connection; propagate or log the error instead of panicking. (This reflects your team's feedback about avoiding panics in production code.)
Prompt for AI agents
Address the following comment on crates/transcribe-whisper-local/src/service/streaming.rs at line 253:
<comment>unwrap() will panic on serialization failure, potentially tearing down the WebSocket connection; propagate or log the error instead of panicking. (This reflects your team's feedback about avoiding panics in production code.)</comment>
<file context>

```diff
@@ -200,37 +200,57 @@ async fn process_transcription_stream(
 let meta = chunk.meta();
 let text = chunk.text().to_string();
-let start = chunk.start() as u64;
-let duration = chunk.duration() as u64;
-let confidence = chunk.confidence();
+let language = chunk.language().map(|s| s.to_string()).map(|s| vec![s]).unwrap_or_default();
+let start_f64 = chunk.start() as f64;
+let duration_f64 = chunk.duration() as f64;
```

</file context>
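The non-panicking pattern this comment asks for can be sketched in plain Rust. The `try_serialize` helper and the send loop below are illustrative stand-ins, not the PR's actual code: on failure, log and skip the message instead of unwrapping, so one bad chunk cannot tear down the connection.

```rust
// Minimal sketch, assuming a fallible serializer: log and skip on error
// rather than panicking inside the send loop.
fn try_serialize(n: i32) -> Result<String, String> {
    if n < 0 {
        Err(format!("cannot serialize {n}"))
    } else {
        Ok(n.to_string())
    }
}

fn main() {
    let mut sent = Vec::new();
    for n in [1, -2, 3] {
        match try_serialize(n) {
            Ok(msg) => sent.push(msg),
            // The loop (and the socket it stands in for) stays alive.
            Err(e) => eprintln!("serialize_failed: {e}"),
        }
    }
    assert_eq!(sent, vec!["1".to_string(), "3".to_string()]);
}
```

In the real code, `serde_json::to_string(&response)` returns a `Result` that can be matched the same way before constructing `Message::Text`.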
```rust
        confidence,
    }],
},
metadata: Metadata::default(),
```
A new Metadata instance is created for every chunk, producing different request_id/model_uuid values within a single stream; these identifiers should remain constant across the connection to let clients correlate results.
Prompt for AI agents
Address the following comment on crates/transcribe-whisper-local/src/service/streaming.rs at line 249:
<comment>A new Metadata instance is created for every chunk, producing different request_id/model_uuid values within a single stream; these identifiers should remain constant across the connection to let clients correlate results.</comment>
<file context>

```diff
@@ -200,37 +200,57 @@ async fn process_transcription_stream(
 let meta = chunk.meta();
 let text = chunk.text().to_string();
-let start = chunk.start() as u64;
-let duration = chunk.duration() as u64;
-let confidence = chunk.confidence();
+let language = chunk.language().map(|s| s.to_string()).map(|s| vec![s]).unwrap_or_default();
+let start_f64 = chunk.start() as f64;
+let duration_f64 = chunk.duration() as f64;
```

</file context>
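The fix the comment describes can be sketched with a hypothetical `Metadata` type standing in for the real one: construct the metadata once per connection and clone it into each chunk, so `request_id`/`model_uuid` stay constant across the stream.

```rust
// Sketch with a hypothetical Metadata type: build it once at connection
// setup, then clone the same instance into every response.
#[derive(Clone, Debug, PartialEq)]
struct Metadata {
    request_id: String,
}

fn main() {
    // Created once, when the connection is accepted.
    let meta = Metadata { request_id: "req-123".into() };

    let chunks = ["hello", "world"];
    let responses: Vec<(&str, Metadata)> =
        chunks.iter().map(|c| (*c, meta.clone())).collect();

    // Every response carries the same identifier, so clients can correlate.
    assert!(responses.iter().all(|(_, m)| m == &meta));
}
```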
```diff
-while let Some(chunk) = stream.next().await {
-    let msg = Message::Text(serde_json::to_string(&chunk).unwrap().into());
+while let Some(response) = stream.next().await {
+    let msg = Message::Text(serde_json::to_string(&response).unwrap().into());
```
Using unwrap() on JSON serialization can panic and crash the service; handle the Result instead
Prompt for AI agents
Address the following comment on crates/transcribe-moonshine/src/service/streaming.rs at line 211:
<comment>Using unwrap() on JSON serialization can panic and crash the service; handle the Result instead</comment>
<file context>

```diff
@@ -205,10 +205,10 @@ async fn handle_dual_channel(
 async fn process_transcription_stream(
     mut ws_sender: futures_util::stream::SplitSink<WebSocket, Message>,
-    mut stream: Pin<Box<dyn futures_util::Stream<Item = ListenOutputChunk> + Send>>,
+    mut stream: Pin<Box<dyn futures_util::Stream<Item = StreamResponse> + Send>>,
 ) {
-    while let Some(chunk) = stream.next().await {
-        let msg = Message::Text(serde_json::to_string(&chunk).unwrap().into());
+    while let Some(response) = stream.next().await {
```

</file context>
```diff
 #[tokio::test]
-#[ignore]
+// cargo test -p owhisper-client test_client_deepgram -- --nocapture
```
Integration test is no longer marked #[ignore], meaning it will execute during every cargo test run and fail if network connectivity or the DEEPGRAM_API_KEY env var is missing (Based on your team's feedback about keeping flaky/external-dependency tests opt-in).
Prompt for AI agents
Address the following comment on owhisper/owhisper-client/src/lib.rs at line 216:
<comment>Integration test is no longer marked `#[ignore]`, meaning it will execute during every `cargo test` run and fail if network connectivity or the DEEPGRAM_API_KEY env var is missing (Based on your team's feedback about keeping flaky/external-dependency tests opt-in).</comment>
<file context>

```diff
@@ -187,7 +213,7 @@ mod tests {
     use hypr_audio_utils::AudioFormatExt;
     #[tokio::test]
-    #[ignore]
+    // cargo test -p owhisper-client test_client_deepgram -- --nocapture
     async fn test_client_deepgram() {
         let audio = rodio::Decoder::new(std::io::BufReader::new(
```

</file context>
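The opt-in gating this comment suggests can be sketched as follows; the `external_test_enabled` helper is illustrative (in the real test it would sit behind `#[ignore]`), showing a clean skip when the secret is absent instead of a hard failure.

```rust
// Sketch: decide whether an external test may run, based on the presence
// of a non-empty API key. Pure function so the gate itself is testable.
fn external_test_enabled(key: Option<&str>) -> bool {
    matches!(key, Some(k) if !k.is_empty())
}

fn main() {
    let key = std::env::var("DEEPGRAM_API_KEY").ok();
    if !external_test_enabled(key.as_deref()) {
        // Skip quietly instead of panicking in CI without the secret.
        println!("skipping: DEEPGRAM_API_KEY not set");
        return;
    }
    println!("running external test");
}
```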
```diff
-run: |
-  chmod +x apps/desktop/src-tauri/binaries/stt-${{ matrix.target }}
-  ./scripts/sidecar.sh "apps/desktop/${{ env.TAURI_CONF_PATH }}" "binaries/stt"
+run: chmod +x ./apps/desktop/src-tauri/binaries/stt-${{ matrix.target }} && ./scripts/sidecar.sh "./apps/desktop/${{ env.TAURI_CONF_PATH }}" "binaries/stt"
```
Check warning: Code scanning / zizmor — code injection via template expansion (Warning)
Actionable comments posted: 16
🔭 Outside diff range comments (5)
crates/file/src/lib.rs (1)
153-171: Graceful fallback without Content-Length is good; add HEAD status guard and typed headers

Nice improvement: falling back to serial when size is unknown or small. Two robustness nitpicks:
- Treat non-success HEAD statuses like “no ranges/unknown size” and fall back to serial.
- Use typed header constants for case safety.
Suggested changes:
```diff
-    let head_response = get_client().head(url.clone()).send().await?;
-    let total_size = get_content_length_from_headers(&head_response);
+    let head_response = get_client().head(url.clone()).send().await?;
+    let total_size = get_content_length_from_headers(&head_response);
+    let head_ok = head_response.status().is_success();
@@
-    let supports_ranges = head_response
+    let supports_ranges = head_response
         .headers()
-        .get("accept-ranges")
+        .get(reqwest::header::ACCEPT_RANGES)
         .map(|v| v.to_str().unwrap_or(""))
         .unwrap_or("")
         == "bytes";
@@
-    if !supports_ranges || total_size.unwrap_or(0) <= DEFAULT_CHUNK_SIZE {
+    if !head_ok || !supports_ranges || total_size.unwrap_or(0) <= DEFAULT_CHUNK_SIZE {
         return download_file_with_callback(url, output_path, move |progress| {
             progress_callback(progress)
         })
         .await;
     }
```

And add this import near the existing reqwest imports:

```rust
use reqwest::header;
```

plugins/listener/src/fsm.rs (1)
601-603: Stop logging API keys (secrets leak)

This logs the API key in plaintext. Remove it or redact. Secrets should never be written to logs.

```diff
-    tracing::info!(api_base = ?api_base, api_key = ?api_key, languages = ?languages, "listen_client");
+    tracing::info!(api_base = ?api_base, languages = ?languages, "listen_client");
+    // If needed for debugging, log whether a key was present without revealing it:
+    // tracing::debug!(api_key_present = api_key.as_ref().map(|k| !k.is_empty()).unwrap_or(false), "listen_client");
```

owhisper/owhisper-server/src/commands/run/state.rs (1)
59-85: Don't drop transcripts when words are empty; use the provided transcript and avoid converting to Word2.

StreamResponse already carries `alternatives[n].transcript`. Today you return early if `words` is empty and also pay an unnecessary clone + conversion cost. Prefer the transcript; fall back to words only if transcript is empty. Apply this refactor within the changed range:
```diff
-    pub fn process_chunk(&mut self, chunk: owhisper_interface::StreamResponse) {
-        let words = match chunk {
-            owhisper_interface::StreamResponse::TranscriptResponse { channel, .. } => channel
-                .alternatives
-                .first()
-                .map(|alt| {
-                    alt.words
-                        .iter()
-                        .map(|w| owhisper_interface::Word2::from(w.clone()))
-                        .collect::<Vec<_>>()
-                })
-                .unwrap_or_default(),
-            _ => {
-                return;
-            }
-        };
-
-        if words.is_empty() {
-            return;
-        }
-
-        let text = words
-            .iter()
-            .map(|w| w.text.as_str())
-            .collect::<Vec<_>>()
-            .join(" ");
+    pub fn process_chunk(&mut self, chunk: owhisper_interface::StreamResponse) {
+        let text = match chunk {
+            owhisper_interface::StreamResponse::TranscriptResponse { channel, .. } => {
+                if let Some(alt) = channel.alternatives.first() {
+                    if !alt.transcript.is_empty() {
+                        alt.transcript.clone()
+                    } else if !alt.words.is_empty() {
+                        alt.words
+                            .iter()
+                            .map(|w| w.word.as_str())
+                            .collect::<Vec<_>>()
+                            .join(" ")
+                    } else {
+                        return;
+                    }
+                } else {
+                    return;
+                }
+            }
+            _ => return,
+        };
```

crates/transcribe-moonshine/src/service/streaming.rs (1)
251-253: Remove unwraps on mutex lock and transcribe.

Both can fail (poisoned lock, model error). Handle gracefully to avoid panics/disconnects.

```diff
-    let text = {
-        let mut model_guard = model.lock().unwrap();
-        model_guard.transcribe(chunk.samples).unwrap()
-    };
+    let text = {
+        let mut model_guard = match model.lock() {
+            Ok(g) => g,
+            Err(e) => {
+                tracing::error!("moonshine_model_lock_poisoned: {}", e);
+                return None;
+            }
+        };
+        match model_guard.transcribe(chunk.samples) {
+            Ok(t) => t,
+            Err(e) => {
+                tracing::error!("moonshine_transcribe_failed: {}", e);
+                return None;
+            }
+        }
+    };
```

owhisper/owhisper-client/src/lib.rs (1)
215-241: Mark external Deepgram test as ignored (network + secret).

This requires an external API and env var; it will be flaky in CI.

```diff
-    #[tokio::test]
+    #[tokio::test]
+    #[ignore = "requires valid DEEPGRAM_API_KEY and external network"]
```

Alternatively, gate on the env var:

```diff
-    let client = ListenClient::builder()
+    let api_key = match std::env::var("DEEPGRAM_API_KEY") {
+        Ok(k) if !k.is_empty() => k,
+        _ => return,
+    };
+    let client = ListenClient::builder()
         .api_base("https://api.deepgram.com")
-        .api_key(std::env::var("DEEPGRAM_API_KEY").unwrap())
+        .api_key(api_key)
```
🧹 Nitpick comments (18)
crates/moonshine/src/lib.rs (1)
1-5: Public re-exports look good
`mod` + `pub use` is a clean, minimal surface. Consider adding a short crate-level doc comment explaining the crate's purpose (the "why").

apps/desktop/src/components/toast/model-select.tsx (1)
10-10: Hoist englishModels and make it const

Avoid re-allocating on each call and keep it immutable.
Example:
```diff
-import { commands as localSttCommands, type WhisperModel } from "@hypr/plugin-local-stt";
+import { commands as localSttCommands, type WhisperModel } from "@hypr/plugin-local-stt";
+const ENGLISH_MODELS = [
+  "QuantizedTinyEn",
+  "QuantizedBaseEn",
+  "QuantizedSmallEn",
+] as const satisfies readonly WhisperModel[];
 export async function showModelSelectToast(language: string) {
   const currentModel = await localSttCommands.getCurrentModel();
-  const englishModels: WhisperModel[] = ["QuantizedTinyEn", "QuantizedBaseEn", "QuantizedSmallEn"];
+  const englishModels = ENGLISH_MODELS;
```

owhisper/owhisper-interface/Cargo.toml (1)
21-21: uuid v4: consider feature minimization

If you don't need default features (e.g., std), consider disabling them explicitly to keep builds lean.
For tighter control:
```toml
uuid = { workspace = true, features = ["v4"], default-features = false }
```

Only apply if `std` or other defaults aren't required by your usage.

owhisper/owhisper-server/src/commands/run/mod.rs (1)
38-38: Avoid unnecessary clone of config.
`config` isn't used after this call; cloning may be expensive. Pass it by value as before.

```diff
-    let server = Server::new(config.clone(), None);
+    let server = Server::new(config, None);
```

plugins/local-stt/src/lib.rs (1)
20-20: Field addition looks fine, but ensure consistent naming.

If this key is for the "hypr-am" client, consider `am_api_key` to match crate naming.

owhisper/owhisper-interface/src/stream.rs (1)
50-62: Default for `Metadata` creates UUIDs — confirm intended semantics

Generating new UUIDs in `Default` is fine if "fresh IDs by default" is desired. If you need deterministic defaults in tests, consider zero/empty defaults and explicit ID assignment.

apps/desktop/src/components/settings/components/ai/stt-view.tsx (3)
172-181: Narrow the download status typing to WhisperModel keys

Returning Record<string, boolean> loses type-safety. Tighten to Partial<Record<WhisperModel, boolean>>.

```diff
-    return {
+    const data: Partial<Record<WhisperModel, boolean>> = {
       "QuantizedTiny": statusChecks[0],
       "QuantizedTinyEn": statusChecks[1],
       "QuantizedBase": statusChecks[2],
       "QuantizedBaseEn": statusChecks[3],
       "QuantizedSmall": statusChecks[4],
       "QuantizedSmallEn": statusChecks[5],
       "QuantizedLargeTurbo": statusChecks[6],
-    } as Record<string, boolean>;
+    };
+    return data;
```
196-198: Type model key arrays to prevent drift

Annotate defaultModelKeys/otherModelKeys as WhisperModel[] to catch typos at compile time.

```diff
-  const defaultModelKeys = ["QuantizedTiny", "QuantizedSmall", "QuantizedLargeTurbo"];
-  const otherModelKeys = ["QuantizedTinyEn", "QuantizedBase", "QuantizedBaseEn", "QuantizedSmallEn"];
+  const defaultModelKeys: WhisperModel[] = ["QuantizedTiny", "QuantizedSmall", "QuantizedLargeTurbo"];
+  const otherModelKeys: WhisperModel[] = ["QuantizedTinyEn", "QuantizedBase", "QuantizedBaseEn", "QuantizedSmallEn"];
```
9-12: Remove dead/commented-out code

Large commented blocks add noise. Consider removing or guarding behind a feature flag when ready.
Also applies to: 254-317, 471-506
apps/desktop/src/components/welcome-modal/model-selection-view.tsx (1)
52-64: Optional: add basic error handling to the query

Surface failures to dev console to aid troubleshooting.

```diff
-  const supportedSTTModels = useQuery<ModelInfo[]>({
+  const supportedSTTModels = useQuery<ModelInfo[]>({
     queryKey: ["local-stt", "supported-models"],
     queryFn: async () => {
       const models = await localSttCommands.listSupportedModels();
       const downloadedModels = await Promise.all(
         models.map((model) => localSttCommands.isModelDownloaded(model)),
       );
       return models.map((model, index) => ({
         model,
         is_downloaded: downloadedModels[index],
       }));
     },
+    onError: console.error,
   });
```

crates/moonshine/src/error.rs (2)
1-23: Error surface is fine; consider a crate-wide Result alias

Defining a common Result improves ergonomics and consistency.

```diff
 pub enum Error {
 @@
     Other(String),
 }
+
+pub type Result<T> = std::result::Result<T, Error>;
```
6-11: Consider not exposing hypr_onnx types in the public error API

If hypr_onnx is considered an internal detail, mapping those errors into your own variants avoids leaking dependencies into downstream public APIs.
apps/desktop/src/components/welcome-modal/index.tsx (1)
313-314: Remove redundant cast

`selectedSttModel` is already `WhisperModel`; the cast is unnecessary.

```diff
-      const sttModelExists = await localSttCommands.isModelDownloaded(selectedSttModel as WhisperModel);
+      const sttModelExists = await localSttCommands.isModelDownloaded(selectedSttModel);
```

crates/transcribe-whisper-local/src/service/streaming.rs (1)
220-233: Retain punctuation and annotate per-word language to improve fidelity.

Whitespace tokenization discards punctuation and per-word language. Populate `punctuated_word` and set `language` from the chunk-level field to improve downstream UX.

```diff
-    .map(|w| Word {
-        word: w.to_string(),
+    .map(|w| Word {
+        word: w.to_string(),
         start: start_f64,
         end: start_f64 + duration_f64,
         confidence,
         speaker: speaker.clone(),
-        punctuated_word: None,
-        language: None,
+        punctuated_word: Some(w.to_string()),
+        language: language.first().cloned(),
     })
```

crates/ws-utils/src/lib.rs (1)
18-33: Deinterleave without intermediate allocations and document channel order.

Avoid materializing an intermediate `Vec<i16>` and process 4-byte frames directly. Also, encode the channel order assumption (mic first) inline.

```diff
-fn deinterleave_audio(data: &[u8]) -> (Vec<f32>, Vec<f32>) {
-    let samples: Vec<i16> = data
-        .chunks_exact(2)
-        .map(|chunk| i16::from_le_bytes([chunk[0], chunk[1]]))
-        .collect();
-
-    let mut mic = Vec::with_capacity(samples.len() / 2);
-    let mut speaker = Vec::with_capacity(samples.len() / 2);
-
-    for chunk in samples.chunks_exact(2) {
-        mic.push(chunk[0] as f32 / 32768.0);
-        speaker.push(chunk[1] as f32 / 32768.0);
-    }
-
-    (mic, speaker)
-}
+fn deinterleave_audio(data: &[u8]) -> (Vec<f32>, Vec<f32>) {
+    // Assumes 16-bit LE PCM, interleaved as [mic, speaker] per frame.
+    let frames = data.chunks_exact(4);
+    let mut mic = Vec::with_capacity(frames.len());
+    let mut speaker = Vec::with_capacity(frames.len());
+    for frame in frames {
+        let l = i16::from_le_bytes([frame[0], frame[1]]) as f32 / 32768.0;
+        let r = i16::from_le_bytes([frame[2], frame[3]]) as f32 / 32768.0;
+        mic.push(l);
+        speaker.push(r);
+    }
+    (mic, speaker)
+}
```

owhisper/owhisper-client/src/lib.rs (2)
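Assuming 16-bit little-endian PCM with the mic sample first in each 4-byte frame (the channel-order assumption the suggestion calls out), the frame-wise deinterleave can be exercised standalone:

```rust
// Standalone sketch of the frame-wise deinterleave; assumes 16-bit LE PCM
// interleaved as [mic, speaker] per 4-byte frame.
fn deinterleave_audio(data: &[u8]) -> (Vec<f32>, Vec<f32>) {
    let frames = data.chunks_exact(4);
    let mut mic = Vec::with_capacity(frames.len());
    let mut speaker = Vec::with_capacity(frames.len());
    for frame in frames {
        mic.push(i16::from_le_bytes([frame[0], frame[1]]) as f32 / 32768.0);
        speaker.push(i16::from_le_bytes([frame[2], frame[3]]) as f32 / 32768.0);
    }
    (mic, speaker)
}

fn main() {
    // One frame: mic sample 16384 (-> 0.5), speaker sample -16384 (-> -0.5).
    let bytes = [0x00, 0x40, 0x00, 0xC0];
    let (mic, speaker) = deinterleave_audio(&bytes);
    assert_eq!(mic, vec![0.5]);
    assert_eq!(speaker, vec![-0.5]);
}
```

Any trailing bytes that do not fill a whole frame are dropped by `chunks_exact`, which matches the original's `chunks_exact(2)` behavior at the sample level.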
147-152: Also accept Binary JSON in from_message.

Some servers may send JSON in Binary frames. Safe to support both.

```diff
-        match msg {
-            Message::Text(text) => serde_json::from_str::<Self::Output>(&text).ok(),
-            _ => None,
-        }
+        match msg {
+            Message::Text(text) => serde_json::from_str(&text).ok(),
+            Message::Binary(bin) => serde_json::from_slice(&bin).ok(),
+            _ => None,
+        }
```
174-179: Mirror Binary JSON support for dual client.

Same reasoning as single client.

```diff
-        match msg {
-            Message::Text(text) => serde_json::from_str::<Self::Output>(&text).ok(),
-            _ => None,
-        }
+        match msg {
+            Message::Text(text) => serde_json::from_str(&text).ok(),
+            Message::Binary(bin) => serde_json::from_slice(&bin).ok(),
+            _ => None,
+        }
```

owhisper/owhisper-server/src/commands/run/realtime.rs (1)
169-176: Naming nit: "chunk" now carries StreamResponse.

Consider renaming to response for clarity.

```diff
-    while let Some(chunk) = response_stream.next().await {
+    while let Some(response) = response_stream.next().await {
         if should_stop.load(std::sync::atomic::Ordering::Relaxed) {
             break;
         }
-        if transcript_tx.send(chunk).is_err() {
+        if transcript_tx.send(response).is_err() {
             break;
         }
     }
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (38)
- .github/workflows/desktop_cd.yaml (1 hunks)
- Cargo.toml (2 hunks)
- apps/desktop/src/components/settings/components/ai/stt-view.tsx (3 hunks)
- apps/desktop/src/components/toast/model-select.tsx (1 hunks)
- apps/desktop/src/components/toast/shared.tsx (2 hunks)
- apps/desktop/src/components/welcome-modal/download-progress-view.tsx (2 hunks)
- apps/desktop/src/components/welcome-modal/index.tsx (4 hunks)
- apps/desktop/src/components/welcome-modal/model-selection-view.tsx (4 hunks)
- crates/am/Cargo.toml (1 hunks)
- crates/am/src/client.rs (1 hunks)
- crates/am/src/error.rs (1 hunks)
- crates/am/src/lib.rs (1 hunks)
- crates/am/src/types.rs (1 hunks)
- crates/file/Cargo.toml (1 hunks)
- crates/file/src/lib.rs (2 hunks)
- crates/moonshine/Cargo.toml (1 hunks)
- crates/moonshine/src/error.rs (1 hunks)
- crates/moonshine/src/lib.rs (1 hunks)
- crates/transcribe-deepgram/src/lib.rs (0 hunks)
- crates/transcribe-moonshine/Cargo.toml (1 hunks)
- crates/transcribe-moonshine/src/error.rs (1 hunks)
- crates/transcribe-moonshine/src/lib.rs (0 hunks)
- crates/transcribe-moonshine/src/service/streaming.rs (4 hunks)
- crates/transcribe-whisper-local/src/lib.rs (0 hunks)
- crates/transcribe-whisper-local/src/service/streaming.rs (2 hunks)
- crates/whisper-local/src/model.rs (4 hunks)
- crates/ws-utils/src/lib.rs (3 hunks)
- owhisper/owhisper-client/src/lib.rs (10 hunks)
- owhisper/owhisper-interface/Cargo.toml (1 hunks)
- owhisper/owhisper-interface/src/stream.rs (2 hunks)
- owhisper/owhisper-server/src/commands/run/mod.rs (1 hunks)
- owhisper/owhisper-server/src/commands/run/realtime.rs (4 hunks)
- owhisper/owhisper-server/src/commands/run/state.rs (1 hunks)
- owhisper/owhisper-server/src/main.rs (1 hunks)
- plugins/listener/src/fsm.rs (1 hunks)
- plugins/local-stt/Cargo.toml (1 hunks)
- plugins/local-stt/js/bindings.gen.ts (2 hunks)
- plugins/local-stt/src/lib.rs (3 hunks)
💤 Files with no reviewable changes (3)
- crates/transcribe-moonshine/src/lib.rs
- crates/transcribe-deepgram/src/lib.rs
- crates/transcribe-whisper-local/src/lib.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{js,ts,tsx,rs}
⚙️ CodeRabbit Configuration File
**/*.{js,ts,tsx,rs}: 1. No error handling.
2. No unused imports, variables, or functions.
3. For comments, keep it minimal. It should be about "Why", not "What".
Files:
- apps/desktop/src/components/toast/model-select.tsx
- crates/moonshine/src/lib.rs
- owhisper/owhisper-server/src/commands/run/mod.rs
- apps/desktop/src/components/welcome-modal/download-progress-view.tsx
- owhisper/owhisper-interface/src/stream.rs
- plugins/local-stt/src/lib.rs
- crates/whisper-local/src/model.rs
- apps/desktop/src/components/settings/components/ai/stt-view.tsx
- apps/desktop/src/components/toast/shared.tsx
- crates/am/src/lib.rs
- owhisper/owhisper-server/src/main.rs
- crates/am/src/error.rs
- apps/desktop/src/components/welcome-modal/index.tsx
- crates/file/src/lib.rs
- plugins/listener/src/fsm.rs
- crates/ws-utils/src/lib.rs
- apps/desktop/src/components/welcome-modal/model-selection-view.tsx
- crates/moonshine/src/error.rs
- crates/transcribe-whisper-local/src/service/streaming.rs
- crates/transcribe-moonshine/src/error.rs
- owhisper/owhisper-server/src/commands/run/state.rs
- plugins/local-stt/js/bindings.gen.ts
- crates/transcribe-moonshine/src/service/streaming.rs
- owhisper/owhisper-client/src/lib.rs
- crates/am/src/types.rs
- crates/am/src/client.rs
- owhisper/owhisper-server/src/commands/run/realtime.rs
🧬 Code Graph Analysis (6)
apps/desktop/src/components/toast/model-select.tsx (1)
plugins/local-stt/js/bindings.gen.ts (1)
WhisperModel(66-66)
apps/desktop/src/components/welcome-modal/download-progress-view.tsx (1)
plugins/local-stt/js/bindings.gen.ts (1)
WhisperModel(66-66)
apps/desktop/src/components/settings/components/ai/stt-view.tsx (1)
plugins/local-stt/js/bindings.gen.ts (1)
WhisperModel(66-66)
apps/desktop/src/components/toast/shared.tsx (1)
plugins/local-stt/js/bindings.gen.ts (1)
WhisperModel(66-66)
apps/desktop/src/components/welcome-modal/index.tsx (1)
plugins/local-stt/js/bindings.gen.ts (1)
WhisperModel(66-66)
plugins/local-stt/js/bindings.gen.ts (3)

plugins/local-llm/js/bindings.gen.ts (1)
TAURI_CHANNEL (59-59)

plugins/db/js/bindings.gen.ts (2)
Word2 (182-182)
SpeakerIdentity (178-178)

plugins/listener/js/bindings.gen.ts (1)
SpeakerIdentity (82-82)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: cubic · AI code reviewer
- GitHub Check: ci (macos, macos-latest)
- GitHub Check: ci (windows, windows-latest)
🔇 Additional comments (39)
crates/file/Cargo.toml (1)
18-18: Switch to workspace hypr-s3 looks good

Aligning to workspace dep improves version cohesion. No action needed.
apps/desktop/src/components/toast/model-select.tsx (1)
3-3: Type rename to WhisperModel is consistent with bindings

Matches the plugin bindings and other UI updates. Good.
owhisper/owhisper-interface/Cargo.toml (1)
20-20: strum entry looks fine

Change appears to be reordering; no issues.
owhisper/owhisper-server/src/main.rs (1)
53-55: LGTM: clearer error handling with explicit exit code.

crates/am/Cargo.toml (1)
1-14: No changes needed for the `am` crate name

The workspace intentionally aliases each local crate as `hypr-<crate>` while keeping their intrinsic names (aec, agc, am, etc.) concise. Your discovery confirms:

- In `Cargo.toml` (root): `hypr-am = { path = "crates/am", package = "am" }`
- All other crates follow the same pattern (`hypr-aec`, `hypr-agc`, `hypr-analytics`, `hypr-audio`).
- Plugin manifests (e.g. `plugins/local-stt/Cargo.toml`) also consume `hypr-am = { workspace = true }`.

Since this is a consistent, intentional convention, no renaming is required.

Likely an incorrect or invalid review comment.
plugins/local-stt/src/lib.rs (2)
45-45: LGTM: export WhisperModel for TS bindings. This keeps UI/TS types in sync.

96-100: State initialization OK; picks up runtime-provided key. After switching to std::env::var, this remains correct.
Cargo.toml (2)
25-25: Workspace dep `hypr-am` added — OK

Path/package mapping follows existing convention. No issues spotted.

51-51: Dependency alignment confirmed for `transcribe-moonshine`

`crates/transcribe-moonshine/Cargo.toml` now depends on `hypr-moonshine = { workspace = true }` and no longer references `hypr-onnx`. No further changes required.

apps/desktop/src/components/welcome-modal/download-progress-view.tsx (2)

7-7: Type import updated to `WhisperModel` — OK

Matches plugin bindings. Consistent with the broader refactor.

21-21: Prop type switched to `WhisperModel` — OK

Downstream usage (downloadModel/setCurrentModel) remains compatible.
owhisper/owhisper-interface/src/stream.rs (1)
109-109: Confirm Deepgram crate's support for v2 "Results" events

- Cargo.toml pins the Deepgram client at `deepgram = { version = "0.6.8", default-features = false }`.
- A recursive grep shows three uses of `type_field: "Results"` and no occurrences of `type_field: "transcript"` in the streaming code.

Please verify that version 0.6.8 of the `deepgram` crate (with default features disabled) includes the v2 streaming API, which emits `"Results"` events. If it does not, upgrade or adjust your feature flags so that the client recognizes the new event naming.

crates/transcribe-moonshine/src/error.rs (1)
7-7: Error consolidation to `hypr_moonshine::Error` — Approved

`crates/transcribe-moonshine/src/error.rs` now routes errors through `hypr_moonshine::Error` as intended. `crates/transcribe-moonshine/Cargo.toml` has `hypr-moonshine = { workspace = true }` and no longer references `hypr-onnx`, so the dependency was updated correctly.

apps/desktop/src/components/toast/shared.tsx (2)

6-6: Type import updated to `WhisperModel` — OK

Consistent with plugin bindings and other component updates.

55-55: Toast helper param switched to `WhisperModel` — OK

Call sites should remain type-safe; no functional change.
apps/desktop/src/components/settings/components/ai/stt-view.tsx (2)
14-14: Type rename alignment looks good

Importing WhisperModel from the plugin matches the bindings change and keeps the UI strictly typed.

31-41: Strongly-typed metadata map is correct

Typing sttModelMetadata as Record<WhisperModel, ...> ensures key coverage at compile time. Nice.
crates/am/src/lib.rs (2)
1-3: Module layout is fine

Private modules with pub re-exports keep the public surface tidy.

5-7: Re-exports are clear

Re-exporting client, error, and types provides a nice one-stop API.
apps/desktop/src/components/welcome-modal/model-selection-view.tsx (3)
8-8: Type import aligns with plugin updates

Switching to WhisperModel is consistent with the bindings.

48-49: Callback signature update is correct

onContinue now enforces a WhisperModel; good for upstream consumers.

50-51: Sensible default model

"QuantizedSmall" is in the supported union; initializing selection is fine.
crates/moonshine/Cargo.toml (2)
6-10: Feature forwarding looks correct; confirm upstream features exist

Forwarding `cuda`, `coreml`, and `directml` to `hypr-onnx` is clear. Please confirm `hypr-onnx` actually defines these features to avoid build-time surprises. If GPU backends are optional, consider documenting the feature expectations in the README.

Also applies to: 12-18

2-3: Workspace alias for `moonshine` is correctly configured

The root Cargo.toml defines `hypr-moonshine = { path = "crates/moonshine", package = "moonshine" }`, and `crates/moonshine/Cargo.toml` uses `hypr-moonshine` as a workspace dependency (for feature propagation). No further changes needed.

plugins/listener/src/fsm.rs (1)
476-504: Handle errors instead of unwraps when updating session and emitting events
`unwrap()` here can crash the app on DB issues or event bus failures. Handle errors and continue gracefully.

```diff
-    if !words.is_empty() {
-        let updated_words =
-            update_session(&app, &session.id, words).await.unwrap();
-
-        SessionEvent::Words {
-            words: updated_words,
-        }
-        .emit(&app)
-        .unwrap();
-    }
+    if !words.is_empty() {
+        let updated_words = match update_session(&app, &session.id, words).await {
+            Ok(w) => w,
+            Err(e) => {
+                tracing::error!(error = ?e, "update_session_failed");
+                continue;
+            }
+        };
+        if let Err(e) = (SessionEvent::Words { words: updated_words }).emit(&app) {
+            tracing::error!(error = ?e, "emit_words_event_failed");
+        }
+    }
```

Likely an incorrect or invalid review comment.
crates/whisper-local/src/model.rs (1)
162-169: Language propagation into segments looks good

Adding `language` to `Segment` and setting it during construction is consistent and non-breaking. Accessor `language()` returning `Option<&str>` is idiomatic.

apps/desktop/src/components/welcome-modal/index.tsx (1)

10-11: Type migration to `WhisperModel` is consistent

Imports, state, mutation, and handler signatures now consistently use `WhisperModel`. This aligns with the plugin API. Good update.

Also applies to: 74-81, 255-260
crates/transcribe-whisper-local/src/service/streaming.rs (1)
235-251: All `type_field` values correctly use `"Results"`

A project-wide search confirmed no occurrences of `"transcript"` in any consumer or test; every `type_field` is `"Results"`. No downstream code or tests will break.

crates/transcribe-moonshine/Cargo.toml (1)
8-10: Verify hypr-moonshine exposes `cuda`, `coreml`, and `directml` features

In crates/transcribe-moonshine/Cargo.toml (lines 8-10) you re-export:

- cuda = ["hypr-moonshine/cuda"]
- coreml = ["hypr-moonshine/coreml"]
- directml = ["hypr-moonshine/directml"]

Please confirm that the published hypr-moonshine crate defines features named exactly `cuda`, `coreml`, and `directml`. Misaligned or missing feature names will cause build failures.
159-170: Confirm mono-to-stereo fallback in split_dual_audio_sourcesIn
crates/ws-utils/src/lib.rs(lines 159–167), we unconditionally duplicate a single-channelSamplesinto bothmic_txandspeaker_tx. Downstream, bothtranscribe-moonshine(src/service/streaming.rs 189–199) andtranscribe-whisper-local(src/service/streaming.rs 163–177) treat the “mic” and “speaker” streams separately (e.g. for VAD/diarization and speaker tagging). Duplicating mono audio may lead to identical VAD chunks and duplicate transcripts across channels.• Verify that sending the same samples to both channels is intended
• Confirm this aligns with your diarization, UI labeling, and UX requirementsplugins/local-stt/js/bindings.gen.ts (2)
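The fallback in question can be pictured with plain std channels. A minimal stdlib-only sketch under assumed names (`fan_out_mono` is hypothetical; the real code uses the crate's own `Samples` type and senders):

```rust
use std::sync::mpsc;

// Duplicate one mono frame into both the "mic" and "speaker" channels,
// mirroring the mono-to-stereo fallback described above.
fn fan_out_mono(
    frame: &[i16],
    mic_tx: &mpsc::Sender<Vec<i16>>,
    speaker_tx: &mpsc::Sender<Vec<i16>>,
) {
    let _ = mic_tx.send(frame.to_vec());
    let _ = speaker_tx.send(frame.to_vec());
}

fn main() {
    let (mic_tx, mic_rx) = mpsc::channel();
    let (speaker_tx, speaker_rx) = mpsc::channel();
    fan_out_mono(&[1, 2, 3], &mic_tx, &speaker_tx);
    // Both consumers receive identical samples, which is why downstream
    // VAD/diarization can emit duplicate chunks for mono input.
    assert_eq!(mic_rx.recv().unwrap(), speaker_rx.recv().unwrap());
}
```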
64-68: RecordedProcessingEvent now uses Word2 — aligned with other plugins.Matches
plugins/db/js/bindings.gen.tsWord2 shape.
19-36: WhisperModel Rename Verified
Allplugin-local-sttbindings and your desktop UI call sites have been updated to useWhisperModel—no leftover references to the oldSupportedModel.• plugin-local-stt/js/bindings.gen.ts exclusively uses
WhisperModel
• apps/desktop imports and mutations reference onlyWhisperModel
•SupportedModelremains inplugin-local-llm(unrelated)crates/transcribe-moonshine/src/service/streaming.rs (1)
279-296: type_field and languages fields verified — no action needed.
Thetype_field: "Results"string is used consistently by both producers (crates/transcribe-moonshine, crates/transcribe-whisper-local) and the owhisper-interface consumer. Thelanguagesvector remains empty by design, as there is currently no request parameter for languages.owhisper/owhisper-client/src/lib.rs (1)
76-87: Query param channels aligns dual/single modes.
Including channels=1|2 is consistent with the Deepgram handshake and your server. LGTM.
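As a sketch of what that handshake URL looks like (the `/listen` path and the helper name are assumptions for illustration, not the crate's actual builder):

```rust
// Append the channel count to the websocket URL, following the
// channels=1|2 convention noted above. Path and name are hypothetical.
fn listen_url(api_base: &str, dual: bool) -> String {
    let channels = if dual { 2 } else { 1 };
    format!("{}/listen?channels={}", api_base.trim_end_matches('/'), channels)
}

fn main() {
    assert_eq!(
        listen_url("ws://localhost:50060", true),
        "ws://localhost:50060/listen?channels=2"
    );
    assert_eq!(
        listen_url("ws://localhost:50060/", false),
        "ws://localhost:50060/listen?channels=1"
    );
}
```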
crates/am/src/types.rs (2)
3-13: Type definitions look solid.
Serde strategies and option skipping are consistent with typical HTTP JSON contracts. LGTM.
Also applies to: 24-50, 63-82
52-61: Confirm server wire format for ComputeUnits
I searched the `crates/am` codebase and only found the enum definition (lines 52–61 in `crates/am/src/types.rs`); there are no other references to the serialized tokens. Please verify that the server’s API actually expects the lowercase values `"cpuandgpu"` and `"cpuandneuralengine"` (rather than, e.g., `"cpuAndGpu"` / `"cpuAndNeuralEngine"`). You might:
- Consult the server’s API documentation or schema
- Add a serde serialization round-trip test to lock down the exact wire format
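For context, serde's `rename_all = "lowercase"` simply lowercases the variant identifier without inserting separators, so `CpuAndGpu` would serialize as `"cpuandgpu"`. A stdlib-only sketch of that transform (assuming those variant names):

```rust
// Reproduce serde's rename_all = "lowercase" behavior on a variant name:
// lowercase every character, insert no separators.
fn serde_lowercase(variant: &str) -> String {
    variant.to_lowercase()
}

fn main() {
    // The wire tokens the review above asks to confirm.
    assert_eq!(serde_lowercase("CpuAndGpu"), "cpuandgpu");
    assert_eq!(serde_lowercase("CpuAndNeuralEngine"), "cpuandneuralengine");
}
```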
owhisper/owhisper-server/src/commands/run/realtime.rs (2)
25-27: Channel type switch to StreamResponse is consistent.
Matches client and streaming service changes. LGTM.

202-204: TUI consumer matches new type.
state.process_chunk(chunk) consuming StreamResponse aligns with upstream changes. LGTM.
crates/am/src/client.rs (1)
26-35: HTTP/JSON error handling is generally solid.
Consistent use of reqwest and serde; ergonomic constructors. LGTM.
Also applies to: 81-90, 103-112
```ts
const metadata = sttModelMetadata[model as WhisperModel];
if (!metadata) {
```
🛠️ Refactor suggestion
Remove casts by typing ModelInfo.model as WhisperModel
Typing ModelInfo.model eliminates the need for casts at usage sites.
```diff
-interface ModelInfo {
-  model: string;
-  is_downloaded: boolean;
-}
+interface ModelInfo {
+  model: WhisperModel;
+  is_downloaded: boolean;
+}
@@
-  const metadata = sttModelMetadata[model as WhisperModel];
+  const metadata = sttModelMetadata[model];
@@
-  onClick={() => setSelectedModel(model as WhisperModel)}
+  onClick={() => setSelectedModel(model)}
```

Also applies to: 102-103
🤖 Prompt for AI Agents
In apps/desktop/src/components/welcome-modal/model-selection-view.tsx around
lines 85-86 and 102-103, remove the explicit type casts of model to WhisperModel
by updating the type definition of ModelInfo.model to be WhisperModel. This
change will ensure that the model property is already correctly typed,
eliminating the need for casting at usage sites and improving type safety and
code clarity.
```rust
    tracing::info!(text = ?full_text, "transcribe_completed");
    self.dynamic_prompt = full_text;
}
```
🛠️ Refactor suggestion
Avoid logging full transcriptions (PII risk)
Logging the entire transcript can leak sensitive data. Prefer logging length or a hash, or gate behind debug builds.
```diff
- if !full_text.is_empty() {
-     tracing::info!(text = ?full_text, "transcribe_completed");
-     self.dynamic_prompt = full_text;
- }
+ if !full_text.is_empty() {
+     #[cfg(debug_assertions)]
+     tracing::debug!(chars = full_text.len(), "transcribe_completed");
+     self.dynamic_prompt = full_text;
+ }
```

🤖 Prompt for AI Agents
In crates/whisper-local/src/model.rs around lines 181 to 183, avoid logging the
full transcription text to prevent potential exposure of sensitive information.
Instead of logging the entire transcript, modify the tracing::info! call to log
either the length of the transcript, a hash of the text, or conditionally log
the full text only in debug builds. This change will reduce the risk of leaking
personally identifiable information while maintaining useful logging.
```rust
fn interleave_audio(mic: &[u8], speaker: &[u8]) -> Vec<u8> {
    let mic_samples: Vec<i16> = mic
        .chunks_exact(2)
        .map(|chunk| i16::from_le_bytes([chunk[0], chunk[1]]))
        .collect();
    let speaker_samples: Vec<i16> = speaker
        .chunks_exact(2)
        .map(|chunk| i16::from_le_bytes([chunk[0], chunk[1]]))
        .collect();

    let max_len = mic_samples.len().max(speaker_samples.len());
    let mut interleaved = Vec::with_capacity(max_len * 2 * 2);

    for i in 0..max_len {
        let mic_sample = mic_samples.get(i).copied().unwrap_or(0);
        let speaker_sample = speaker_samples.get(i).copied().unwrap_or(0);
        interleaved.extend_from_slice(&mic_sample.to_le_bytes());
        interleaved.extend_from_slice(&speaker_sample.to_le_bytes());
    }

    interleaved
}
```
🛠️ Refactor suggestion
Handle odd-length buffers in interleave_audio.
chunks_exact(2) silently drops trailing bytes. Pad the last byte to avoid sample truncation.
```diff
- let mic_samples: Vec<i16> = mic
-     .chunks_exact(2)
-     .map(|chunk| i16::from_le_bytes([chunk[0], chunk[1]]))
-     .collect();
+ let mic_samples: Vec<i16> = mic
+     .chunks(2)
+     .map(|chunk| {
+         let bytes = if chunk.len() == 2 { [chunk[0], chunk[1]] } else { [chunk[0], 0] };
+         i16::from_le_bytes(bytes)
+     })
+     .collect();
- let speaker_samples: Vec<i16> = speaker
-     .chunks_exact(2)
-     .map(|chunk| i16::from_le_bytes([chunk[0], chunk[1]]))
-     .collect();
+ let speaker_samples: Vec<i16> = speaker
+     .chunks(2)
+     .map(|chunk| {
+         let bytes = if chunk.len() == 2 { [chunk[0], chunk[1]] } else { [chunk[0], 0] };
+         i16::from_le_bytes(bytes)
+     })
+     .collect();
```
+ .collect();📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```rust
fn interleave_audio(mic: &[u8], speaker: &[u8]) -> Vec<u8> {
    let mic_samples: Vec<i16> = mic
        .chunks(2)
        .map(|chunk| {
            let bytes = if chunk.len() == 2 { [chunk[0], chunk[1]] } else { [chunk[0], 0] };
            i16::from_le_bytes(bytes)
        })
        .collect();
    let speaker_samples: Vec<i16> = speaker
        .chunks(2)
        .map(|chunk| {
            let bytes = if chunk.len() == 2 { [chunk[0], chunk[1]] } else { [chunk[0], 0] };
            i16::from_le_bytes(bytes)
        })
        .collect();

    let max_len = mic_samples.len().max(speaker_samples.len());
    let mut interleaved = Vec::with_capacity(max_len * 2 * 2);

    for i in 0..max_len {
        let mic_sample = mic_samples.get(i).copied().unwrap_or(0);
        let speaker_sample = speaker_samples.get(i).copied().unwrap_or(0);
        interleaved.extend_from_slice(&mic_sample.to_le_bytes());
        interleaved.extend_from_slice(&speaker_sample.to_le_bytes());
    }

    interleaved
}
```
🤖 Prompt for AI Agents
In owhisper/owhisper-client/src/lib.rs lines 6 to 27, the interleave_audio
function uses chunks_exact(2) which drops the last byte if the input buffer
length is odd, causing sample truncation. Modify the code to handle odd-length
buffers by padding the last byte with zero before converting to i16, ensuring no
audio data is lost. Implement this padding for both mic and speaker buffers
before processing.
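To make the failure mode concrete, here is a stdlib-only comparison of `chunks_exact(2)` against the padded `chunks(2)` decoding on an odd-length buffer:

```rust
// chunks_exact(2) silently drops a trailing odd byte.
fn decode_exact(buf: &[u8]) -> Vec<i16> {
    buf.chunks_exact(2)
        .map(|c| i16::from_le_bytes([c[0], c[1]]))
        .collect()
}

// chunks(2) with zero-padding keeps the trailing byte as a final sample.
fn decode_padded(buf: &[u8]) -> Vec<i16> {
    buf.chunks(2)
        .map(|c| {
            let bytes = if c.len() == 2 { [c[0], c[1]] } else { [c[0], 0] };
            i16::from_le_bytes(bytes)
        })
        .collect()
}

fn main() {
    let odd = [0x01, 0x02, 0x03]; // 3 bytes: one full sample plus a trailing byte
    assert_eq!(decode_exact(&odd), vec![0x0201]);          // trailing byte lost
    assert_eq!(decode_padded(&odd), vec![0x0201, 0x0003]); // trailing byte padded
}
```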
```rust
#[tokio::test]
// cargo test -p owhisper-client test_client_ag -- --nocapture
async fn test_client_ag() {
    let audio_1 = rodio::Decoder::new(std::io::BufReader::new(
        std::fs::File::open(hypr_data::english_1::AUDIO_PATH).unwrap(),
    ))
    .unwrap()
    .to_i16_le_chunks(16000, 512);

    let audio_2 = rodio::Decoder::new(std::io::BufReader::new(
        std::fs::File::open(hypr_data::english_1::AUDIO_PATH).unwrap(),
    ))
    .unwrap()
    .to_i16_le_chunks(16000, 512);

    let client = ListenClient::builder()
        .api_base("ws://localhost:50060")
        .api_key("".to_string())
        .params(owhisper_interface::ListenParams {
            model: Some("tiny.en".to_string()),
            languages: vec![hypr_language::ISO639::En.into()],
            ..Default::default()
        })
        .build_dual();

    let stream = client.from_realtime_audio(audio_1, audio_2).await.unwrap();
    futures_util::pin_mut!(stream);

    while let Some(result) = stream.next().await {
        println!("{:?}", result);
    }
}
```
🛠️ Refactor suggestion
Mark local server test as ignored.
Depends on a locally running server on ws://localhost:50060. Ignore to keep CI green.
```diff
- #[tokio::test]
+ #[tokio::test]
+ #[ignore = "requires local server on ws://localhost:50060"]
```
+ #[ignore = "requires local server on ws://localhost:50060"]📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```rust
#[tokio::test]
#[ignore = "requires local server on ws://localhost:50060"]
// cargo test -p owhisper-client test_client_ag -- --nocapture
async fn test_client_ag() {
    let audio_1 = rodio::Decoder::new(std::io::BufReader::new(
        std::fs::File::open(hypr_data::english_1::AUDIO_PATH).unwrap(),
    ))
    .unwrap()
    .to_i16_le_chunks(16000, 512);

    let audio_2 = rodio::Decoder::new(std::io::BufReader::new(
        std::fs::File::open(hypr_data::english_1::AUDIO_PATH).unwrap(),
    ))
    .unwrap()
    .to_i16_le_chunks(16000, 512);

    let client = ListenClient::builder()
        .api_base("ws://localhost:50060")
        .api_key("".to_string())
        .params(owhisper_interface::ListenParams {
            model: Some("tiny.en".to_string()),
            languages: vec![hypr_language::ISO639::En.into()],
            ..Default::default()
        })
        .build_dual();

    let stream = client.from_realtime_audio(audio_1, audio_2).await.unwrap();
    futures_util::pin_mut!(stream);

    while let Some(result) = stream.next().await {
        println!("{:?}", result);
    }
}
```
🤖 Prompt for AI Agents
In owhisper/owhisper-client/src/lib.rs around lines 242 to 274, the test
function test_client_ag depends on a local server running at
ws://localhost:50060, which can cause CI failures. Add the #[ignore] attribute
above the #[tokio::test] annotation to mark this test as ignored by default,
preventing it from running during automated CI builds.
Summary by cubic
Added Deepgram v2 compatibility by updating the streaming interface, introducing a new AM client crate, and refactoring model and API handling across the codebase.
New Features
- New `am` crate for interacting with the AG/Deepgram v2 API.
- New `WhisperModel` type.

Refactors
- Updated `StreamResponse` and related structures.