Speech To Text in VS code is awkward on MacOS #213149

p-i- · 2024-05-21T13:13:06Z

Type: Bug

Just try using the built-in macOS Dictation tool in VS Code.

(This tool can be activated under System Settings -> Keyboard -> Dictation.)

Many problems:

- If word wrap is on and the line spills over, there is a rendering error: a superposition of texts.
- If I speak part of a sentence, pause, and then continue, I'm likely to get a capitalisation error, which means I'm constantly wasting time tidying up text.
- If I place the cursor at a location and speak, again there is a likelihood of a capitalisation error.
- If I mix speaking and typing, I sometimes get unpredictable behaviour:
  - Sometimes an entire 'most recently composed' section of the document gets deleted, and there is no way to recover it (via the Undo shortcut or otherwise).
  - Sometimes a section of text gets duplicated.

I think that the fundamental problem here is with this macOS tool. I think its design is overly complex and intricate, and it often falls over.

Given that most VS Code users spend most of their day entering text into VS Code, it would be really nice to have a solution that takes care of speech-to-text. Maybe a fix to interop with this Dictation tool, maybe an extension, maybe core VS Code functionality.

I'm not bothered about speech-to-code. I'm quite happy to type my code. But if I am editing text files (.txt, .md, .nt, etc.) or modifying text content within the code (e.g. AI prompts, docstrings, strings, comments, etc.), I would like something simple and reliable.

VS Code version: Code 1.89.1 (dc96b83, 2024-05-07T05:14:32.757Z)
OS version: Darwin arm64 23.4.0
Modes:

System Info

Item	Value
CPUs	Apple M2 (8 x 24)
GPU Status	2d_canvas: enabled canvas_oop_rasterization: enabled_on direct_rendering_display_compositor: disabled_off_ok gpu_compositing: enabled multiple_raster_threads: enabled_on opengl: enabled_on rasterization: enabled raw_draw: disabled_off_ok skia_graphite: disabled_off video_decode: enabled video_encode: enabled webgl: enabled webgl2: enabled webgpu: enabled
Load (avg)	2, 2, 2
Memory (System)	24.00GB (2.49GB free)
Process Argv	--crash-reporter-id f10d97cd-2115-4dba-a34a-07be9312995a
Screen Reader	no
VM	0%

Extensions (21)

Extension	Author (truncated)	Version
dvt-remote-ssh	ami	1.0.0
nestedtext	bma	2.0.0
githistory	don	0.6.20
copilot	Git	1.194.886
copilot-chat	Git	0.15.2024043005
vsc-python-indent	Kev	1.18.0
rainbow-csv	mec	3.11.0
vscode-docker	ms-	1.29.1
debugpy	ms-	2024.6.0
python	ms-	2024.6.0
vscode-pylance	ms-	2024.5.1
jupyter	ms-	2024.4.0
jupyter-keymap	ms-	1.1.2
jupyter-renderers	ms-	1.0.17
vscode-jupyter-cell-tags	ms-	0.1.9
vscode-jupyter-slideshow	ms-	0.1.6
remote-containers	ms-	0.362.0
remote-ssh	ms-	0.110.1
remote-ssh-edit	ms-	0.86.0
remote-explorer	ms-	0.4.3
vscode-speech	ms-	0.8.0

(1 theme extensions excluded)

A/B Experiments

vsliv368cf:30146710
vspor879:30202332
vspor708:30202333
vspor363:30204092
tftest:31042121
vstes627:30244334
vscorecescf:30445987
vscod805cf:30301675
binariesv615:30325510
vsaa593cf:30376535
py29gd2263:31024239
vscaac:30438847
c4g48928:30535728
azure-dev_surveyone:30548225
2i9eh265:30646982
962ge761:30959799
pythongtdpath:30769146
welcomedialog:30910333
pythonidxpt:30866567
pythonnoceb:30805159
asynctok:30898717
pythontestfixt:30902429
pythonregdiag2:30936856
pythonmypyd1:30879173
pythoncet0:30885854
2e7ec940:31000449
pythontbext0:30879054
accentitlementst:30995554
dsvsc016:30899300
dsvsc017:30899301
dsvsc018:30899302
cppperfnew:31000557
dsvsc020:30976470
pythonait:31006305
chatpanelt:31048053
dsvsc021:30996838
jg8ic977:31013176
pythoncenvptcf:31049071
a69g1124:31046351
pythonprc:31047982
dwnewjupytercf:31046870
26j00206:31048877


p-i- · 2024-05-21T13:58:58Z

If you could just hook the did_complete of the Dictation tool and use AI to post-process and re-render the affected text, maybe this would do the job. If that's possible...
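To make the post-processing idea concrete, here is a minimal rule-based sketch of the capitalisation part. It is only an illustration: `fixLeadingCapital` is a hypothetical helper, not an existing VS Code or macOS Dictation API, and it uses a simple rule rather than AI. It assumes something could hand it the text before the insertion point plus the dictated fragment, which may or may not be possible.

```typescript
// Hypothetical helper: not part of VS Code or the macOS Dictation API.
// Decides whether a dictated fragment should start with a capital letter,
// looking only at the text that precedes the insertion point.
function fixLeadingCapital(before: string, fragment: string): string {
  if (fragment.length === 0) {
    return fragment;
  }
  const trimmed = before.trimEnd();
  // Start of the document, or the preceding text ends a sentence: capitalise.
  const startsSentence = trimmed.length === 0 || /[.!?]$/.test(trimmed);
  const first = fragment.charAt(0);
  const rest = fragment.slice(1);
  if (startsSentence) {
    return first.toUpperCase() + rest;
  }
  // Mid-sentence: keep the pronoun "I" (and "I'm", "I'll", ...) capitalised,
  // otherwise lower-case the first letter.
  // Limitation: proper nouns at the start of the fragment are lower-cased too.
  const startsWithPronounI = /^I\b/.test(fragment);
  return startsWithPronounI ? fragment : first.toLowerCase() + rest;
}

// Mid-sentence continuation after a pause.
console.log(fixLeadingCapital("the rain in spain stays mainly ", "In the plane"));
// New sentence after a full stop.
console.log(fixLeadingCapital("First sentence. ", "the rain"));
```

A rule like this would not know about names or acronyms, so it is at best a starting point for whatever smarter post-processing sits on top.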

p-i- · 2024-05-30T08:36:52Z

Here's an example of the duplicate-text bug.

I'm speaking 'test 123', optionally followed by 'full stop' or 'new paragraph', and then hitting BACKSPACE, ENTER, LEFT-ARROW, 'a', or pretty much anything, it seems.

It seems that if I don't allow enough silence for it to 'settle down' after I've said 'full stop', the utterance text gets double-injected into the window.

In TextEdit I can't replicate this particular failure. It isn't 100% right there either, though: it inserts unwanted newline characters.

Screen.Recording.2024-05-30.at.09.34.53.mov

p-i- · 2024-05-30T08:51:22Z

Here's a demo of the wordwrap + superposition issue:

Screen.Recording.2024-05-30.at.09.49.52.mov

p-i- · 2024-05-30T09:02:31Z

Here's an example of the capitalization-at-start-of-new-phrase problem:

First example: I'm just pausing before 'in the plane'
Second example: 'the rain in spain stays mainly', turn dictation off, turn it back on, 'in the plane'
Third example (fail): 'the rain in spain stays mainly', pause, add a space via keyboard, 'in the plane'

There are other situations where I get a capitalization failure, e.g. placing the cursor inside a sentence and speaking.

Screen.Recording.2024-05-30.at.09.57.12.mov

This one is probably a really tricky fix, as the macOS dictation assistant is clearly scraping the text of the active window and operating on that.

I think a native VS Code speech tool would be a much-appreciated feature!

p-i- · 2024-06-02T11:18:29Z

Here's a nice repeatable minimal testcase for duplication.

All I do here is double-tap Fn to invoke the macOS speech-to-text assistant and speak "Test 123" followed by a couple of seconds of silence followed by "New paragraph".

And then I just wait.

Firstly, it DOESN'T create a new paragraph, just a couple of spaces.
Secondly, once it times out, it dumps a duplicate of the utterance.

Screen.Recording.2024-06-02.at.12.15.05.mov
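For illustration, the duplicate in this test case has a very recognisable shape: the injected text repeats what already sits immediately before the insertion point, separated only by whitespace. A naive filter for that shape might look like the sketch below. `dropDuplicateUtterance` is a hypothetical helper, not an existing VS Code or macOS API, and it assumes the editor can see both the preceding text and the text about to be inserted.

```typescript
// Hypothetical helper: not part of VS Code or the macOS Dictation API.
// Returns "" when the inserted utterance merely repeats the text that
// already ends right before the insertion point; otherwise returns the
// inserted text unchanged.
function dropDuplicateUtterance(before: string, inserted: string): string {
  // Compare with surrounding whitespace removed and inner runs collapsed,
  // since the failing case leaves a couple of spaces between the copies.
  const normalise = (s: string): string => s.trim().replace(/\s+/g, " ");
  const utterance = normalise(inserted);
  if (utterance.length === 0) {
    return inserted;
  }
  return normalise(before).endsWith(utterance) ? "" : inserted;
}

// The failing case from this comment: "Test 123", two spaces, then a repeat.
console.log(JSON.stringify(dropDuplicateUtterance("Test 123  ", "Test 123")));
// An unrelated insertion is passed through.
console.log(JSON.stringify(dropDuplicateUtterance("Hello ", "Test 123")));
```

This would also swallow a phrase that the user repeats on purpose, so it only shows the shape of the problem rather than a safe fix.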

vscodenpa assigned alexdima May 21, 2024

alexdima added the bug, macos, and editor-input labels Jun 3, 2024
