fix(asr): avoid duplicating full final transcripts#85

Merged
missuo merged 2 commits into
missuo:mainfrom
fuscoyu:fix-final-transcript-duplication
Apr 13, 2026

Conversation

@fuscoyu fuscoyu commented Apr 13, 2026

Contributor

Problem: Some ASR providers may emit the full final transcript more than once. The transcript aggregator previously appended every Final event, which could duplicate the spoken text in the pasted result and history.

Reproduction: Use DoubaoIME with LLM correction disabled, speak a sentence such as 'hello world', and observe duplicated text when the provider emits repeated Final events.

Fix: Treat AsrEvent::Final as the best full transcript seen so far and replace the previous final text instead of appending. Update the regression tests to use neutral English examples.
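The replace-instead-of-append behaviour described in the fix can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: only `AsrEvent::Final` and `update_final` are named in the conversation, while the `Interim` variant, the `String` payloads, and the `TranscriptAggregator` struct with its `final_text` field are assumptions made for the example.

```rust
/// Hypothetical event shape: only `AsrEvent::Final` is named in this PR;
/// the `Interim` variant and the `String` payloads are assumptions.
#[allow(dead_code)]
#[derive(Debug, Clone)]
enum AsrEvent {
    Interim(String),
    Final(String),
}

/// Hypothetical aggregator that keeps the best full transcript seen so far.
#[derive(Debug, Default)]
struct TranscriptAggregator {
    final_text: String,
}

impl TranscriptAggregator {
    /// Replace the stored final text instead of appending to it, so a
    /// repeated full `Final` event does not duplicate the transcript.
    fn update_final(&mut self, text: &str) {
        self.final_text.clear();
        self.final_text.push_str(text);
    }

    /// Dispatch one event; interim results are ignored in this sketch.
    fn handle(&mut self, event: &AsrEvent) {
        match event {
            AsrEvent::Final(text) => self.update_final(text),
            AsrEvent::Interim(_) => {}
        }
    }

    /// Read-only view of the aggregated transcript.
    fn text(&self) -> &str {
        &self.final_text
    }
}
```

With this shape, feeding two identical `Final("hello world")` events leaves the aggregated text as a single `hello world`, whereas an append-based `update_final` would have stored the sentence twice.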


missuo commented Apr 13, 2026

Owner

Thanks for the PR! I haven't been able to reproduce the duplication issue on my side, so before merging I'd like to ask you to do one more test:

Please try dictating at least three sentences in a row with DoubaoIME and see whether earlier sentences get swallowed / dropped with this change applied. I want to make sure switching update_final from append to replace doesn't regress the multi-sentence case, since DoubaoIME emits Final repeatedly as the server refines the transcript.

For context, see the upstream bug: https://github.com/starccy/doubaoime-asr/issues/2 — there's a known issue where DoubaoIME can truncate earlier content, and I want to be sure this PR doesn't interact badly with it.

If three+ sentences come through intact, I'm happy to merge. Thanks!

fuscoyu commented Apr 13, 2026

Contributor Author

Thanks for the follow-up. I tested this with DoubaoIME on this branch.

When I dictated four sentences continuously in one go, the transcript came through intact and the earlier sentences were preserved, so changing update_final from append to replace does not seem to regress the normal multi-sentence case.

I did find one related edge case, though: if I say the first sentence, pause briefly, and then continue with the next sentences, DoubaoIME can still produce duplicated / garbled output. That seems consistent with the upstream segmentation issue in starccy/doubaoime-asr#2, rather than being caused by this PR, because I could only trigger it after a pause, not during continuous multi-sentence dictation.

So based on my testing, this PR does not appear to introduce a new regression for three+ continuous sentences, but the existing upstream pause/segmentation issue is still present.

missuo commented Apr 13, 2026

Owner

I tested it and found that pausing while speaking does trigger the duplication you mentioned. I think we should fix this problem completely.

If it is an upstream problem, can we detect repeated sentences and strip them? People generally do not say the same sentence twice in a row.

DoubaoIME emits Final ambiguously: a refreshed full transcript within
one utterance, but a new segment after a pause that may also replay
earlier content. Pure replace drops earlier sentences; pure append
duplicates them. Merge by prefix check, stale-replay skip, and
longest suffix/prefix overlap trimming to handle all three cases.
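
A rough sketch of the three-way merge this commit message describes, for readers who want to see the cases side by side. It is not the merged code: the function names `merge_final` and `longest_overlap` are hypothetical, and two details are assumptions, namely that a stale replay is detected with a plain substring check and that segments are joined without inserting a separator.

```rust
/// Merge one incoming `Final` text into the accumulated transcript.
/// Hypothetical helper: names and exact rules are illustrative only.
fn merge_final(acc: &mut String, incoming: &str) {
    let incoming = incoming.trim();
    if incoming.is_empty() {
        return;
    }
    // Case 1 (prefix check): the incoming text extends or repeats what we
    // already have, so it is a refreshed full transcript. Replace.
    if incoming.starts_with(acc.as_str()) {
        acc.clear();
        acc.push_str(incoming);
        return;
    }
    // Case 2 (stale-replay skip): the incoming text is already contained
    // in the transcript. Assumption: a substring check is good enough.
    if acc.contains(incoming) {
        return;
    }
    // Case 3 (overlap trimming): a new segment whose head may replay the
    // tail of the transcript. Append only the part that is new.
    let overlap = longest_overlap(acc, incoming);
    acc.push_str(&incoming[overlap..]);
}

/// Length in bytes of the longest prefix of `incoming` that is also a
/// suffix of `acc`. Only char boundaries are tried, so slicing is safe
/// for non-ASCII text such as Chinese.
fn longest_overlap(acc: &str, incoming: &str) -> usize {
    let max = acc.len().min(incoming.len());
    (1..=max)
        .rev()
        .filter(|&n| incoming.is_char_boundary(n))
        .find(|&n| acc.ends_with(&incoming[..n]))
        .unwrap_or(0)
}
```

For example, starting from `hello world. how are`, an incoming `how are you` shares the seven-byte overlap `how are`, so only ` you` is appended. An incoming `hello world.` would be skipped as a stale replay, and an incoming `hello world. how are you today` would replace the transcript because it starts with the current text.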
@missuo missuo merged commit f193320 into missuo:main Apr 13, 2026