fix(desktop): normalize Soniqo live partials #5201
        return false;
    }

    current_tokens.starts_with(previous_tokens) || previous_tokens.starts_with(current_tokens)
Overly broad cumulative detection causes stale retiming
Low Severity
The previous_tokens.starts_with(current_tokens) condition in is_soniqo_cumulative_update can falsely classify a new, shorter partial as cumulative if the provider ever drops an utterance without sending an is_final message. Because the cumulative branch never updates state.active_start_ms, a false match causes retime_words to use the old utterance's start time, potentially spreading a few words across an absurdly long time span. Removing the second starts_with arm (keeping only current_tokens.starts_with(previous_tokens)) would let a shorter partial fall through to the overlap/fresh-start path, which correctly resets active_start_ms.
Reviewed by Cursor Bugbot for commit 5cc9c11.
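The suggestion in the comment above can be illustrated with a minimal sketch. It is not the PR's code: `is_cumulative_update` and `tokens` are hypothetical stand-ins for `is_soniqo_cumulative_update` and its tokenizer, and the empty-input guard is an assumption. The point is only that a one-directional prefix check lets a shorter partial fall through to whatever path resets the start time.

```rust
/// Hypothetical, simplified stand-in for `is_soniqo_cumulative_update`.
/// A partial counts as cumulative only when it extends the previous one.
fn is_cumulative_update(previous: &[String], current: &[String]) -> bool {
    // Assumption: an empty side never counts as a cumulative update.
    if previous.is_empty() || current.is_empty() {
        return false;
    }
    // Only the forward direction is checked. The reverse arm
    // (`previous.starts_with(current)`) is intentionally left out so that
    // a shorter partial is not mistaken for a continuation.
    current.starts_with(previous)
}

/// Hypothetical tokenizer used only for this illustration.
fn tokens(text: &str) -> Vec<String> {
    text.split_whitespace().map(|t| t.to_lowercase()).collect()
}
```

With this shape, a partial such as "hello" arriving after "hello there friend" is no longer treated as cumulative, so the caller is free to handle it as a fresh utterance.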
5cc9c11 to 96430d3
    if alternative.words.is_empty() {
        return;
    }
Missing is_final cleanup on active-overlap early return
Medium Severity
When all words are consumed by the active-overlap drain, the early return at line 329 skips the is_final cleanup that happens at lines 340–346 (extend_soniqo_committed_tokens, clearing active_start_ms and active_tokens). The analogous early return at lines 306–312 correctly checks is_final and clears active state, but this one does not. If a final message is entirely consumed by active overlap, active_tokens and active_start_ms remain stale and committed_tokens are never extended, causing incorrect overlap detection on subsequent messages.
Reviewed by Cursor Bugbot for commit 96430d3.
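One way to avoid the skipped cleanup described above is to route every exit path through a single helper. The sketch below is hypothetical: `SoniqoState`, `finalize`, and `after_overlap_drain` are illustrative names, and committing the active tokens on a final message is an assumption about what `extend_soniqo_committed_tokens` should receive when everything was drained.

```rust
/// Hypothetical, reduced view of the per-channel Soniqo state.
#[derive(Default)]
struct SoniqoState {
    committed_tokens: Vec<String>,
    active_tokens: Vec<String>,
    active_start_ms: Option<u64>,
}

impl SoniqoState {
    /// End-of-utterance bookkeeping. Assumption: the active tokens are what
    /// gets committed when a final message closes the utterance.
    fn finalize(&mut self) {
        // `append` moves the tokens and leaves `active_tokens` empty.
        self.committed_tokens.append(&mut self.active_tokens);
        self.active_start_ms = None;
    }
}

/// Called after the overlap drain. Returns whether any words remain to emit.
/// The cleanup runs before the emptiness check, so an early return on an
/// empty word list cannot skip it.
fn after_overlap_drain(state: &mut SoniqoState, remaining: &[String], is_final: bool) -> bool {
    if is_final {
        state.finalize();
    }
    !remaining.is_empty()
}
```

The design choice here is to make the cleanup unconditional on the control-flow path and conditional only on `is_final`, which is the invariant the review comment asks for.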
        alternative.words.drain(..count);
        current_tokens.drain(..count);
        alternative.transcript = transcript_from_words(&alternative.words);
    }
Token-word index mismatch when draining filtered tokens
Medium Severity
normalize_tokens_for_overlap filters out empty tokens (e.g., from punctuation-only words), so current_tokens can have fewer elements than alternative.words. drain_soniqo_prefix drains the same count from both lists, assuming 1:1 positional correspondence. If any word at position less than count normalizes to an empty token, the token indices shift relative to the word indices, causing the wrong words to be removed from the transcript.
Reviewed by Cursor Bugbot for commit 96430d3.
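A possible fix for the index mismatch is to carry each token's source word index through normalization, so that a token count can be translated into a word count before draining. The sketch below is illustrative only: `normalize`, `tokens_with_word_index`, and `drain_prefix` are hypothetical names, and the normalization rule (lowercase, alphanumeric characters only) is an assumption rather than what `normalize_tokens_for_overlap` actually does.

```rust
/// Hypothetical normalization: lowercase and keep alphanumeric characters only.
fn normalize(word: &str) -> String {
    word.chars()
        .filter(|c| c.is_alphanumeric())
        .flat_map(|c| c.to_lowercase())
        .collect()
}

/// Pairs each non-empty normalized token with the index of its source word.
fn tokens_with_word_index(words: &[String]) -> Vec<(String, usize)> {
    words
        .iter()
        .enumerate()
        .filter_map(|(i, w)| {
            let token = normalize(w);
            (!token.is_empty()).then_some((token, i))
        })
        .collect()
}

/// Drains `token_count` tokens plus every word up to and including the word
/// behind the last drained token, so punctuation-only words in that range
/// are removed together with their neighbors.
fn drain_prefix(words: &mut Vec<String>, tokens: &mut Vec<(String, usize)>, token_count: usize) {
    let token_count = token_count.min(tokens.len());
    if token_count == 0 {
        return;
    }
    // Word index just past the last drained token.
    let word_count = tokens[token_count - 1].1 + 1;
    words.drain(..word_count);
    tokens.drain(..token_count);
    // Re-base the remaining indices onto the shortened word list.
    for (_, index) in tokens.iter_mut() {
        *index -= word_count;
    }
}
```

For the words `Hello,`, `-`, `world`, `again`, draining two tokens removes three words, whereas draining the same count from both lists would leave `world` behind.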
96430d3 to 1ca6bd9
    }

    0
}
Redundant function duplicates existing parameterized version
Low Severity
find_soniqo_committed_prefix is functionally identical to find_soniqo_history_prefix with min_tokens set to SONIQO_REPEAT_MIN_TOKENS. Having both creates unnecessary duplication — any future fix or change to the matching logic would need to be applied in two places.
Reviewed by Cursor Bugbot for commit 1ca6bd9.
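The deduplication suggested above could look like the following sketch, in which the committed-prefix lookup becomes a thin wrapper over the parameterized one. The matching logic shown (longest prefix of the current tokens that equals a suffix of the history) and the value of `REPEAT_MIN_TOKENS` are assumptions for illustration; the function names are shortened, hypothetical versions of the ones named in the comment.

```rust
/// Hypothetical value; the real `SONIQO_REPEAT_MIN_TOKENS` is not shown in the review.
const REPEAT_MIN_TOKENS: usize = 3;

/// Assumed matching rule: returns the length of the longest prefix of
/// `current` that equals a suffix of `history`, or 0 when no match of at
/// least `min_tokens` tokens exists.
fn find_history_prefix(history: &[String], current: &[String], min_tokens: usize) -> usize {
    let max_len = history.len().min(current.len());
    // Try the longest candidate first so the first hit is the best one.
    for len in (min_tokens.max(1)..=max_len).rev() {
        if history[history.len() - len..] == current[..len] {
            return len;
        }
    }
    0
}

/// Thin wrapper: the matching logic lives in exactly one place.
fn find_committed_prefix(committed: &[String], current: &[String]) -> usize {
    find_history_prefix(committed, current, REPEAT_MIN_TOKENS)
}
```

Keeping the wrapper preserves the existing call sites while ensuring any future change to the matching rule is made once.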
1ca6bd9 to 19e7fda
Retimes cumulative Soniqo realtime partials so live transcript rows replace previous text instead of appending duplicates.
Drop leftover Soniqo partial snapshots instead of persisting them, and finalize native live sessions before shutdown.
Call the native Soniqo stream finalizer for each source before stopping so model-final text is emitted separately from live partial snapshots.
19e7fda to a89a6ea
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 5 total unresolved issues (including 4 from previous reviews).
Reviewed by Cursor Bugbot for commit a89a6ea.
    drain_soniqo_prefix(alternative, &mut current_tokens, overlap);

    if alternative.words.is_empty() {
        return;
Stale active state after final overlap drain
Medium Severity
In the non-cumulative overlap branch, when alternative.words.is_empty() after draining, the early return doesn't clear active_tokens or active_start_ms for final responses. The committed-overlap path at lines 320–325 correctly checks *is_final and clears both fields before returning, but this path just does a bare return. Stale active tokens persist and can cause subsequent partials to be incorrectly detected as cumulative updates or overlap, potentially stripping legitimate new content.
Reviewed by Cursor Bugbot for commit a89a6ea.
Superseded by #5206, which consolidates the Soniqo realtime transcription work into one branch.


Retimes cumulative Soniqo realtime partials so live transcript rows replace previous text instead of appending duplicates.
Note
Medium Risk
Touches core live transcript processing (normalization, partial finalization, and flush semantics) and introduces a new local Swift/Rust bridge, so regressions could affect realtime transcript rendering across providers if assumptions change.
Overview
Adds a new hypr_transcribe_soniqo crate (with a macOS Swift bridge) to support local Soniqo model download/status management, file transcription, and live streaming partials.
Updates LiveTranscriptEngine to apply provider-specific normalization for soniqo, including retiming cumulative partial updates, trimming overlapping/repeated tokens (including internal looping), and preventing Soniqo partial snapshots from being committed as final words.
Extends TranscriptProcessor/ChannelState with a configurable partial finalization mode and adjusts flush() behavior so providers like Soniqo can discard partial buffers while still emitting held final words; adds targeted tests for the new Soniqo normalization and flush behavior.
Reviewed by Cursor Bugbot for commit a89a6ea. Bugbot is set up for automated code reviews on this repo.
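The flush behavior described in the overview can be pictured with a small sketch. It is an assumption-laden illustration, not the PR's implementation: `PartialFinalization`, `ChannelSketch`, and their fields are hypothetical names, and the real `TranscriptProcessor`/`ChannelState` types are not shown in this conversation.

```rust
/// Hypothetical mode switch for what happens to a pending partial on flush.
#[derive(Clone, Copy, PartialEq, Eq)]
enum PartialFinalization {
    /// Promote the pending partial to final words.
    Promote,
    /// Drop the pending partial (the behavior described for Soniqo).
    Discard,
}

/// Hypothetical, reduced channel state.
struct ChannelSketch {
    mode: PartialFinalization,
    held_finals: Vec<String>,
    partial: Vec<String>,
}

impl ChannelSketch {
    /// Emits held final words in every mode; the partial buffer is always
    /// cleared, but it is only emitted when the mode is `Promote`.
    fn flush(&mut self) -> Vec<String> {
        let mut out = std::mem::take(&mut self.held_finals);
        let partial = std::mem::take(&mut self.partial);
        if self.mode == PartialFinalization::Promote {
            out.extend(partial);
        }
        out
    }
}
```

Under this sketch, a channel in `Discard` mode that holds the final word "hello" and the partial "wor" flushes only "hello", while the same channel in `Promote` mode flushes both.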