fix(asr): avoid duplicating full final transcripts by fuscoyu · Pull Request #85 · missuo/koe

fuscoyu · 2026-04-13T09:35:43Z

Problem: Some ASR providers may emit the full final transcript more than once. The transcript aggregator previously appended every Final event, which could duplicate the spoken text in the pasted result and history.

Reproduction: Use DoubaoIME with LLM correction disabled, speak a sentence such as 'hello world', and observe duplicated text when the provider emits repeated Final events.

Fix: Treat AsrEvent::Final as the best full transcript seen so far and replace the previous final text instead of appending. Update the regression tests to use neutral English examples.
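The replace-instead-of-append behaviour described above can be sketched as follows. This is a minimal illustration, not the project's actual code: only `AsrEvent::Final` and `update_final` are named in this PR, while the `Interim` variant, the `TranscriptAggregator` struct, its fields, and the `handle` and `best_text` methods are hypothetical stand-ins.

```rust
/// Hypothetical, simplified stand-in for the project's ASR event type.
/// Only the `Final` variant is named in the PR; `Interim` is an assumption.
#[derive(Debug, Clone, PartialEq)]
pub enum AsrEvent {
    Interim(String),
    Final(String),
}

/// Hypothetical aggregator; the struct and field names are illustrative.
#[derive(Debug, Default)]
pub struct TranscriptAggregator {
    final_text: String,
    interim_text: String,
}

impl TranscriptAggregator {
    /// Route an incoming event to the matching update.
    pub fn handle(&mut self, event: AsrEvent) {
        match event {
            AsrEvent::Interim(text) => self.interim_text = text,
            AsrEvent::Final(text) => self.update_final(&text),
        }
    }

    /// Treat each Final as the best full transcript seen so far:
    /// overwrite the stored final text instead of appending to it.
    pub fn update_final(&mut self, text: &str) {
        self.final_text.clear();
        self.final_text.push_str(text);
        self.interim_text.clear();
    }

    /// Prefer the final text; fall back to the latest interim text.
    pub fn best_text(&self) -> &str {
        if self.final_text.is_empty() {
            &self.interim_text
        } else {
            &self.final_text
        }
    }
}
```

With this shape, feeding `Final("hello world")` twice leaves `best_text()` at `hello world`, whereas an appending `update_final` would store the sentence twice.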

missuo · 2026-04-13T09:48:46Z

Thanks for the PR! I haven't been able to reproduce the duplication issue on my side, so before merging I'd like to ask you to do one more test:

Please try dictating at least three sentences in a row with DoubaoIME and see whether earlier sentences get swallowed / dropped with this change applied. I want to make sure switching update_final from append to replace doesn't regress the multi-sentence case, since DoubaoIME emits Final repeatedly as the server refines the transcript.

For context, see the upstream bug: https://github.com/starccy/doubaoime-asr/issues/2 — there's a known issue where DoubaoIME can truncate earlier content, and I want to be sure this PR doesn't interact badly with it.

If three+ sentences come through intact, I'm happy to merge. Thanks!

fuscoyu · 2026-04-13T10:27:44Z

Thanks for the follow-up. I tested this with DoubaoIME on this branch.

When I dictated four sentences continuously in one go, the transcript came through intact and the earlier sentences were preserved, so changing update_final from append to replace does not seem to regress the normal multi-sentence case.

I did find one related edge case, though: if I say the first sentence, pause briefly, and then continue with the next sentences, DoubaoIME can still produce duplicated / garbled output. That seems consistent with the upstream segmentation issue in starccy/doubaoime-asr#2, rather than being caused by this PR, because I could only trigger it after a pause, not during continuous multi-sentence dictation.

So based on my testing, this PR does not appear to introduce a new regression for three+ continuous sentences, but the existing upstream pause/segmentation issue is still present.

missuo · 2026-04-13T10:31:12Z

I tested it and confirmed that pausing while speaking does trigger the duplication you mentioned. I think we should fix this problem completely.

If this is an upstream problem, could we detect repeated sentences and clean them up? People generally do not say the same sentence twice in a row.
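One way to picture the repeated-sentence cleanup suggested here is the sketch below. It is an assumption-laden illustration rather than anything from the repository: the function names `drop_repeated_sentences` and `push_unless_repeat` are hypothetical, sentences are split on a small hard-coded set of terminators, and only a sentence identical to the one immediately before it is dropped.

```rust
/// Hypothetical helper: drop a sentence when it exactly repeats the
/// sentence immediately before it. Original spacing is preserved.
pub fn drop_repeated_sentences(text: &str) -> String {
    let mut out = String::new();
    let mut prev: Option<&str> = None;
    let mut start = 0;
    for (idx, ch) in text.char_indices() {
        // Assumed terminator set covering English and Chinese punctuation.
        if matches!(ch, '.' | '!' | '?' | '。' | '！' | '？') {
            let end = idx + ch.len_utf8();
            push_unless_repeat(&mut out, &mut prev, &text[start..end]);
            start = end;
        }
    }
    // Trailing text that has no terminal punctuation.
    push_unless_repeat(&mut out, &mut prev, &text[start..]);
    out
}

/// Append `raw` to `out` unless its trimmed form is empty or equals
/// the previously kept sentence.
fn push_unless_repeat<'a>(out: &mut String, prev: &mut Option<&'a str>, raw: &'a str) {
    let sentence = raw.trim();
    if sentence.is_empty() || *prev == Some(sentence) {
        return;
    }
    out.push_str(raw);
    *prev = Some(sentence);
}
```

This only catches exact adjacent repeats. A replay that differs by a single character, or that starts mid-sentence, passes through untouched, so a cleanup like this would complement rather than replace a fix in the aggregator itself.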

DoubaoIME's Final events are ambiguous: within one utterance a Final is a refreshed full transcript, but after a pause it is a new segment that may also replay earlier content. Pure replace drops earlier sentences; pure append duplicates them. Merge with a prefix check, a stale-replay skip, and longest suffix/prefix overlap trimming to handle all three cases.
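The three merge rules named in this commit message could look roughly like the sketch below. The merged code is not shown in this conversation, so everything here is an assumption: the function names `merge_final` and `longest_overlap` are hypothetical, the stale-replay check is approximated with a substring test, and segments are joined without inserting any separator.

```rust
/// Hypothetical merge of a newly received Final into the accumulated text.
pub fn merge_final(acc: &str, incoming: &str) -> String {
    if acc.is_empty() {
        return incoming.to_string();
    }
    if incoming.is_empty() {
        return acc.to_string();
    }
    // Prefix check: the incoming text is a refreshed full transcript
    // that extends what we already have, so it replaces it.
    if incoming.starts_with(acc) {
        return incoming.to_string();
    }
    // Stale-replay skip (assumed form): the incoming text is already
    // contained in the accumulated text, so ignore it.
    if acc.contains(incoming) {
        return acc.to_string();
    }
    // New segment: trim the longest overlap between the end of `acc`
    // and the start of `incoming`, then append the remainder.
    let overlap = longest_overlap(acc, incoming);
    let mut merged = String::with_capacity(acc.len() + incoming.len() - overlap);
    merged.push_str(acc);
    merged.push_str(&incoming[overlap..]);
    merged
}

/// Byte length of the longest prefix of `incoming` that is also a
/// suffix of `acc`, measured on char boundaries so slicing is safe.
fn longest_overlap(acc: &str, incoming: &str) -> usize {
    let max = acc.len().min(incoming.len());
    let mut best = 0;
    for (idx, ch) in incoming.char_indices() {
        let end = idx + ch.len_utf8();
        if end > max {
            break;
        }
        if acc.ends_with(&incoming[..end]) {
            best = end;
        }
    }
    best
}
```

For example, merging `three four` into `one two three` yields `one two three four`. Note that overlap trimming is a heuristic: a coincidental one-character overlap between unrelated segments would also be trimmed, so a real implementation might require a minimum overlap length.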

missuo merged commit f193320 into missuo:main Apr 13, 2026

missuo merged 2 commits into missuo:main from fuscoyu:fix-final-transcript-duplication