fix: speech to text unexpectedly undoing user edits in chat input #303919
hamirmahal wants to merge 2 commits into microsoft:main
Pull request overview
Fixes a chat voice dictation regression where ongoing speech-to-text updates could override/undo user edits in the chat input by introducing explicit “committed vs preview” dictation state handling.
Changes:
- Track dictation state (committedInput vs previewInput) instead of using a single accumulated inputValue.
- Apply dictation updates via a helper (applyDictationInputState) that treats user edits as the authoritative base for subsequent speech composition.
- Add small helper utilities (composeDictationInput, DictationInputState) to centralize dictation string composition logic.
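For readers without the full diff open, here is a rough sketch of what the state shape and the composition helper might look like. The names come from the overview above, but the field semantics and the function body are assumptions; the body simply mirrors the join logic of the old inputValue line.

```typescript
// Assumed shape: the PR names DictationInputState, but the meaning of each
// field is inferred from the overview, not taken from the actual diff.
interface DictationInputState {
	readonly committedInput: string; // text that is final (user edits + recognized speech)
	readonly previewInput: string;   // last value written to the input, including any partial speech
}

// Assumed body: joins the base and the recognized text with a single space,
// the same way the previous `inputValue` accumulation did.
function composeDictationInput(base: string, recognizedText: string): string {
	if (!recognizedText) {
		return base; // nothing recognized yet, keep the base untouched
	}
	return base ? [base, recognizedText].join(' ') : recognizedText;
}
```

The point of keeping two fields is that a partial result only changes previewInput, so the next partial result can replace it instead of being appended to it.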
| // User edits are authoritative. |
| // Treat them as new base for upcoming speech composition. |
| if (currentInput !== state.previewInput) { |
| committedInput = currentInput; |
| } |
| const nextInput = composeDictationInput(committedInput, recognizedText); |
When user edits occur while a speech preview is visible, currentInput can still include the previously appended preview suffix (e.g. the partial recognized text). Committing currentInput as the new base means the subsequent Recognized event will append the final recognizedText again, resulting in duplicated/stale dictated text. Consider stripping the prior preview suffix (derived from the previous state.committedInput/state.previewInput) from currentInput when detecting user edits, before composing the next preview/commit, so only the authoritative user edits become the new base.
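One possible way to do the stripping, as a minimal sketch. previewSuffixOf and rebaseOnUserEdit are hypothetical helpers that are not part of this PR, and the DictationInputState fields are assumed to mean "finalized text" and "last value written to the input". The sketch only handles the case where the preview suffix is still intact at the end of the input; if the user edited inside the suffix, it falls back to treating the whole input as the base.

```typescript
// Assumed shape, repeated here so the sketch is self-contained.
interface DictationInputState {
	readonly committedInput: string; // finalized text
	readonly previewInput: string;   // last value written to the input
}

// Hypothetical helper: the partial speech text (with its leading separator)
// that was appended to the committed base, or '' if it cannot be derived.
function previewSuffixOf(state: DictationInputState): string {
	const { committedInput, previewInput } = state;
	const extendsBase = previewInput.slice(0, committedInput.length) === committedInput;
	return extendsBase ? previewInput.slice(committedInput.length) : '';
}

// Hypothetical helper: picks the base for the next composition.
function rebaseOnUserEdit(state: DictationInputState, currentInput: string): string {
	if (currentInput === state.previewInput) {
		return state.committedInput; // no user edit since the last update
	}
	const suffix = previewSuffixOf(state);
	const endsWithSuffix = suffix.length > 0 && currentInput.slice(-suffix.length) === suffix;
	// Drop the stale preview so the final text is not appended on top of it.
	return endsWithSuffix
		? currentInput.slice(0, currentInput.length - suffix.length)
		: currentInput;
}
```

With a state of committedInput 'foo' and previewInput 'foo hel', a user edit that turns the input into 'fob hel' yields the base 'fob', so a final "hello" would be appended to 'fob' rather than to 'fob hel'. User whitespace is left untouched on purpose.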
| @@ -297,8 +305,12 @@ class VoiceChatSessions { |
| case SpeechToTextStatus.Recognized: |
| if (text) { |
| session.hasRecognizedInput = true; |
| inputValue = inputValue ? [inputValue, text].join(' ') : text; |
| session.controller.updateInput(inputValue); |
| dictationState = applyDictationInputState( |
| dictationState, controller.getInput(), text, status |
| ); |
| session.controller.updateInput(dictationState.committedInput); |
This change introduces new dictation state handling (preserving user edits while speech-to-text is active), but there are no unit tests covering the state transitions. Since this module already has tests (voiceChatActions.test.ts) for parseNextChatResponseChunk, it would be good to add coverage for cases like: recognizing->user edit->recognized (no duplicated text), and recognizing updates that should replace (not accumulate) previews.
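To make the two suggested cases concrete, here is a framework-free sketch. Everything in it is a stand-in: the reducer is a simplified guess at applyDictationInputState (with preview stripping added), the Status union replaces SpeechToTextStatus, and expectEqual replaces the assert calls the real suite would use.

```typescript
type Status = 'recognizing' | 'recognized'; // stand-in for SpeechToTextStatus

interface DictationInputState { readonly committedInput: string; readonly previewInput: string; }

function compose(base: string, text: string): string {
	if (!text) { return base; }
	return base ? `${base} ${text}` : text;
}

// Simplified stand-in for the PR's reducer, not the actual implementation.
function applyDictationInputState(state: DictationInputState, currentInput: string, text: string, status: Status): DictationInputState {
	let committedInput = state.committedInput;
	if (currentInput !== state.previewInput) {
		// User edited: take the edit as the new base, minus any stale preview suffix.
		const extendsBase = state.previewInput.slice(0, committedInput.length) === committedInput;
		const suffix = extendsBase ? state.previewInput.slice(committedInput.length) : '';
		const stale = suffix.length > 0 && currentInput.slice(-suffix.length) === suffix;
		committedInput = stale ? currentInput.slice(0, currentInput.length - suffix.length) : currentInput;
	}
	const nextInput = compose(committedInput, text);
	return status === 'recognized'
		? { committedInput: nextInput, previewInput: nextInput }
		: { committedInput, previewInput: nextInput };
}

function expectEqual(actual: string, expected: string, label: string): void {
	if (actual !== expected) {
		throw new Error(`${label}: expected "${expected}", got "${actual}"`);
	}
}

// Case: successive recognizing updates replace the preview instead of accumulating.
function scenarioPreviewsReplace(): DictationInputState {
	let state: DictationInputState = { committedInput: '', previewInput: '' };
	state = applyDictationInputState(state, '', 'hel', 'recognizing');
	state = applyDictationInputState(state, state.previewInput, 'hello wor', 'recognizing');
	return state;
}

// Case: recognizing -> user edit -> recognized, with no duplicated text.
function scenarioEditThenRecognized(): DictationInputState {
	let state: DictationInputState = { committedInput: 'Draft:', previewInput: 'Draft:' };
	state = applyDictationInputState(state, 'Draft:', 'hel', 'recognizing');
	const userEdited = 'Draft v2: hel'; // user changes the prefix while the preview is visible
	return applyDictationInputState(state, userEdited, 'hello', 'recognized');
}

expectEqual(scenarioPreviewsReplace().previewInput, 'hello wor', 'previews replace');
expectEqual(scenarioPreviewsReplace().committedInput, '', 'previews do not commit');
expectEqual(scenarioEditThenRecognized().committedInput, 'Draft v2: hello', 'edit then recognized');
```

In the real suite these would presumably become test cases next to the existing parseNextChatResponseChunk tests, calling the exported helper directly.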
Here's a demo of production behavior (left) vs. the changes with this pull request (right), with audio.
screen_recording-2026-03-22_14.49.07_speech_to_text_fix.webm
Reported.
Fixes #303918