Speech To Text in VS code is awkward on MacOS #213149

Open
p-i- opened this issue May 21, 2024 · 5 comments
Labels: bug, editor-input, macos

Comments

p-i- commented May 21, 2024

Type: Bug

Try using the built-in macOS Dictation tool in VS Code.

(It can be enabled under System Settings -> Keyboard -> Dictation.)

Many problems:

  • If Word Wrap is on and the line wraps, there is a rendering error: the new text is drawn superimposed on the existing text.
  • If I speak part of a sentence and then pause and continue, I'm likely to get a capitalisation error, which means I'm constantly wasting time tidying up text.
  • If I insert the cursor at a location and speak, again there is a likelihood of a capitalisation error.
  • If I mix between speaking and typing, sometimes I get unpredictable behaviour:
    • Sometimes an entire 'most recently composed' section of the document gets deleted, and there is no way to recover it (via Undo shortcut, or otherwise).
    • Sometimes a section of text gets duplicated.

I think that the fundamental problem here is with this macOS tool. I think its design is overly complex and intricate, and it often falls over.

Given that most VS Code users spend most of their day entering text into VSCode, it would be really nice to have a solution that takes care of SpeechToText. Maybe a fix to interop with this Dictation tool, maybe an extension, maybe a VSCode core functionality.

I'm not bothered about speech-to-code. I'm quite happy to type my code. But if I am editing text files (.txt, .md, .nt, etc.) or modifying text content within the code (e.g. AI prompts, docstrings, strings, comments, etc.), I would like something simple and reliable.

VS Code version: Code 1.89.1 (dc96b83, 2024-05-07T05:14:32.757Z)
OS version: Darwin arm64 23.4.0
Modes:

System Info

| Item | Value |
|---|---|
| CPUs | Apple M2 (8 x 24) |
| GPU Status | 2d_canvas: enabled; canvas_oop_rasterization: enabled_on; direct_rendering_display_compositor: disabled_off_ok; gpu_compositing: enabled; multiple_raster_threads: enabled_on; opengl: enabled_on; rasterization: enabled; raw_draw: disabled_off_ok; skia_graphite: disabled_off; video_decode: enabled; video_encode: enabled; webgl: enabled; webgl2: enabled; webgpu: enabled |
| Load (avg) | 2, 2, 2 |
| Memory (System) | 24.00GB (2.49GB free) |
| Process Argv | --crash-reporter-id f10d97cd-2115-4dba-a34a-07be9312995a |
| Screen Reader | no |
| VM | 0% |
Extensions (21)

| Extension | Author (truncated) | Version |
|---|---|---|
| dvt-remote-ssh | ami | 1.0.0 |
| nestedtext | bma | 2.0.0 |
| githistory | don | 0.6.20 |
| copilot | Git | 1.194.886 |
| copilot-chat | Git | 0.15.2024043005 |
| vsc-python-indent | Kev | 1.18.0 |
| rainbow-csv | mec | 3.11.0 |
| vscode-docker | ms- | 1.29.1 |
| debugpy | ms- | 2024.6.0 |
| python | ms- | 2024.6.0 |
| vscode-pylance | ms- | 2024.5.1 |
| jupyter | ms- | 2024.4.0 |
| jupyter-keymap | ms- | 1.1.2 |
| jupyter-renderers | ms- | 1.0.17 |
| vscode-jupyter-cell-tags | ms- | 0.1.9 |
| vscode-jupyter-slideshow | ms- | 0.1.6 |
| remote-containers | ms- | 0.362.0 |
| remote-ssh | ms- | 0.110.1 |
| remote-ssh-edit | ms- | 0.86.0 |
| remote-explorer | ms- | 0.4.3 |
| vscode-speech | ms- | 0.8.0 |

(1 theme extensions excluded)

A/B Experiments
vsliv368cf:30146710
vspor879:30202332
vspor708:30202333
vspor363:30204092
tftest:31042121
vstes627:30244334
vscorecescf:30445987
vscod805cf:30301675
binariesv615:30325510
vsaa593cf:30376535
py29gd2263:31024239
vscaac:30438847
c4g48928:30535728
azure-dev_surveyone:30548225
2i9eh265:30646982
962ge761:30959799
pythongtdpath:30769146
welcomedialog:30910333
pythonidxpt:30866567
pythonnoceb:30805159
asynctok:30898717
pythontestfixt:30902429
pythonregdiag2:30936856
pythonmypyd1:30879173
pythoncet0:30885854
2e7ec940:31000449
pythontbext0:30879054
accentitlementst:30995554
dsvsc016:30899300
dsvsc017:30899301
dsvsc018:30899302
cppperfnew:31000557
dsvsc020:30976470
pythonait:31006305
chatpanelt:31048053
dsvsc021:30996838
jg8ic977:31013176
pythoncenvptcf:31049071
a69g1124:31046351
pythonprc:31047982
dwnewjupytercf:31046870
26j00206:31048877

p-i- commented May 21, 2024

If you could just hook the did_complete of the Dictation tool and use AI to post-process and re-render the affected text, maybe this would do the job. If that's possible...
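
As a rough illustration of what such post-processing might look like, here is a minimal sketch that swaps the suggested AI step for a plain heuristic. `fixLeadingCase` is a hypothetical helper, not an existing VS Code or macOS Dictation API, and it assumes the editor can somehow see the text before the insertion point and the fragment that dictation just produced.

```typescript
// Hypothetical helper: NOT part of VS Code or the macOS Dictation API.
// Adjusts the case of the first letter of a dictated fragment based on
// the text that precedes the insertion point.
function fixLeadingCase(textBefore: string, fragment: string): string {
  if (fragment.length === 0) {
    return fragment;
  }

  // Start of document, or text ending in sentence punctuation,
  // is treated as the start of a new sentence.
  const trimmed = textBefore.trimEnd();
  const atSentenceStart = trimmed.length === 0 || /[.!?]$/.test(trimmed);

  const first = fragment.charAt(0);
  const rest = fragment.slice(1);

  if (atSentenceStart) {
    return first.toUpperCase() + rest;
  }

  // Mid-sentence: leave "I", "I'm", and all-caps words (e.g. acronyms) alone.
  const firstWord = fragment.split(/\s/, 1)[0] ?? "";
  const isPronounI = firstWord === "I" || firstWord.startsWith("I'");
  const isAllCaps = firstWord.length > 1 && firstWord === firstWord.toUpperCase();
  if (isPronounI || isAllCaps) {
    return fragment;
  }

  // Otherwise assume the capital was added by mistake and lower it.
  return first.toLowerCase() + rest;
}
```

This heuristic would wrongly lower-case a proper noun that starts a mid-sentence fragment (e.g. "Spain"), which is one reason a smarter post-processing step might be preferable.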

p-i- commented May 30, 2024

Here's an example of the duplicate-text bug.

I'm speaking "test 123", optionally followed by "full stop" or "new paragraph", and then pressing BACKSPACE, ENTER, LEFT-ARROW, 'a', or pretty much any other key, it seems.

It seems that if I don't allow enough silence for it to 'settle down' after I've said 'full stop', the utterance text gets double-injected into the window.

In TextEdit I can't replicate this particular failure. It isn't 100% right there either, though: it inserts unwanted newline characters.

Screen.Recording.2024-05-30.at.09.34.53.mov

p-i- commented May 30, 2024

Here's a demo of the wordwrap + superposition issue:

Screen.Recording.2024-05-30.at.09.49.52.mov

p-i- commented May 30, 2024

Here's an example of the capitalization-at-start-of-new-phrase problem:

  • First example: I'm just pausing before 'in the plane'
  • Second example: 'the rain in spain stays mainly', turn dictation off, turn it back on, 'in the plane'
  • Third example (fail): 'the rain in spain stays mainly', pause, add a space via keyboard, 'in the plane'

There are other situations where I get a capitalization failure, e.g. placing the cursor inside a sentence and speaking.

Screen.Recording.2024-05-30.at.09.57.12.mov

This one is probably a really tricky fix, as the macOS dictation assistant is clearly scraping the text of the active window and operating on that.

I think a native VS Code speech tool would be a much-appreciated feature!

p-i- commented Jun 2, 2024

Here's a nice repeatable minimal testcase for duplication.

All I do here is double-tap Fn to invoke the macOS speech-to-text assistant and speak "Test 123" followed by a couple of seconds of silence followed by "New paragraph".

And then I just wait.

Firstly, it DOESN'T create a new paragraph, just a couple of spaces.
Secondly, once it times out, it dumps a duplicate of the utterance.
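
To make the duplication symptom concrete, here is a minimal sketch of the kind of guard that could suppress a repeated commit. `dropDuplicateInsertion` is a hypothetical function, not an existing VS Code or macOS API, and it assumes the editor can compare an incoming insertion against the text immediately before the cursor. A real fix would presumably belong in the editor's input-composition handling; this only illustrates the idea.

```typescript
// Hypothetical guard: NOT an existing VS Code or macOS API.
// Returns the text that should actually be inserted at the cursor.
function dropDuplicateInsertion(textBefore: string, incoming: string): string {
  const candidate = incoming.trim();
  if (candidate.length === 0) {
    // Whitespace-only insertions are passed through untouched.
    return incoming;
  }

  const before = textBefore.trimEnd();
  if (!before.endsWith(candidate)) {
    return incoming;
  }

  // Only treat it as a duplicate if the earlier copy starts on a word
  // boundary, so that inserting "a" after "banana" is not swallowed.
  const boundary = before.length - candidate.length;
  const atWordBoundary =
    boundary === 0 || /\s/.test(before.charAt(boundary - 1));

  return atWordBoundary ? "" : incoming;
}
```

With the testcase above, `dropDuplicateInsertion("Test 123  ", "Test 123")` returns an empty string, so the second copy would be dropped. The trade-off is that a user who deliberately dictates the same phrase twice in a row would lose the repeat.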

Screen.Recording.2024-06-02.at.12.15.05.mov

alexdima added the bug, macos, and editor-input labels on Jun 3, 2024