Dictation support for Visual Studio Code #40976

Closed
JoleCameron opened this issue Dec 31, 2017 · 40 comments
Labels: accessibility, feature-request, workbench-voice

Comments

JoleCameron commented Dec 31, 2017

Hi,

I wish to lodge a request to have VS Code updated so that it can accept dictation input. Currently, if you try to dictate into VS Code using software like Dragon (the industry standard), nothing happens.

This is important to fix for people like myself who have long term hand injuries and are trying to figure out ways to program by voice. People have managed programming by voice in these situations, but the solutions are difficult to develop and not pretty.

To be clear, I'm not asking that you develop voice commands to input symbols by voice, only that the text boxes in VS Code (and/or Visual Studio) can accept dictation input by Dragon (preferably with full 'select-and-say' support). Voice programmers can take care of the rest.

Does it have to be Dragon? Not necessarily. It could be any local speech recognition engine with good accuracy (I'd argue that the decade old Windows Voice Recognition isn't quite there yet) and the ability to write custom voice commands.

While there are few people using such technologies today, it is a subject of interest to all programmers, because they may need it in the future.

  • Jole
cleidigh (Contributor) commented Dec 31, 2017

@JoleCameron
I am a Code user and contributor. I am also a Dragon user, as I have ALS and can only program by voice.
For over a year now I've been making contributions while using and programming in Code, and some of my contributions have to do with improving accessibility. That said, I have been able to set up a pretty usable scenario. As a longtime programmer I had the ability to put together this setup; I'm sure you can do the same, and apologies if I'm telling you anything you already know:

Windows 7
Dragon DPI 14 (DPI 15 has limitations, albeit better recognition)
SpeechMatic directional high-performance microphone with USB-AGC, around-the-neck twist type (critical!)
Natlink (open-source Python framework and API for Dragon)
Dragonfly (Python grammar and rules engine allowing flexible custom commands)
AutoHotkey
Custom Python grammars for Code
Various user-contributed grammars

The key to making this all work well is to have grammars that can seamlessly enter text in either the main editors or input boxes. I also have commands set up for almost all of the common Code keybindings; a minimal sketch follows below.
Utilizing everything possible to avoid using the mouse makes everything faster.
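To give an idea, a minimal Dragonfly grammar along these lines might look like the sketch below (not my actual grammar; the spoken phrases and keybindings are just examples):

```python
# Minimal Dragonfly grammar sketch for Code; the phrases and keystrokes
# here are illustrative, not a complete setup.
from dragonfly import Grammar, MappingRule, Key, Text, Dictation

class VSCodeRule(MappingRule):
    mapping = {
        # Emulate common Code keybindings.
        "command palette": Key("cs-p"),   # Ctrl+Shift+P
        "save file": Key("c-s"),          # Ctrl+S
        # Open the Find widget, then type the dictated search text.
        "find <text>": Key("c-f") + Text("%(text)s"),
        # Plain dictation into whatever has focus.
        "say <text>": Text("%(text)s"),
    }
    extras = [Dictation("text")]

grammar = Grammar("vscode")
grammar.add_rule(VSCodeRule())
grammar.load()
```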

I do all of the above with no changes to Code.

I would be happy to walk you through what I have. It probably would be a good start to understand better what you have now and how you use it.

Cheers

Update: from your repositories I see you use Vocola and therefore Natlink - you already have most of what you need. (And now I know I was telling you things you already knew. :-( )

@cleidigh cleidigh self-assigned this Dec 31, 2017
@cleidigh cleidigh added the accessibility and workbench labels Dec 31, 2017
JoleCameron (Author)

@cleidigh

Thanks for your prompt response. May I start by saying that it's nice to talk about this problem with someone who themselves programs by voice.

My journey towards hands-free programming developed a little differently than yours. In my case, I developed a severe case of RSI when typing up my Honours thesis in late 2013. In order to finish my mathematics thesis, I developed basic macros using Vocola 2.

I was an inexperienced self-taught programmer before developing this injury, so I didn't want to start developing a system to program by voice until my hands could do a little bit of typing to write the commands. Between that, full-time work in a different industry, and a couple of years of poor health, I have only now returned to my goal of setting up programming by voice.

For the PC, I have licenses for Dragon 12.5 Preferred and Dragon 15. As you know, Natlink does not work with Dragon 15 and, given that it never had official support, may not work with future versions either. Since the compatibility issues have not been resolved in the year since Dragon 15 was released, I have no reason to believe that they will be resolved in the future. Because of this, I will develop a system using built-in DVC commands. Note that it is actually possible to write DVC commands like "camel <dictation>", provided that the open-ended variable is at the end of the command (a sketch of the idea follows).
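(For comparison only, since I'm avoiding Natlink: in Dragonfly the same idea would look roughly like this sketch, where the rule name and helper function are illustrative.)

```python
# Illustrative Dragonfly version of a "camel <dictation>" command:
# format the dictated words as camelCase before typing them.
from dragonfly import Grammar, MappingRule, Dictation, Function, Text

def type_camel(text):
    words = str(text).split()
    if not words:
        return
    camel = words[0].lower() + "".join(w.capitalize() for w in words[1:])
    Text(camel).execute()

class CamelRule(MappingRule):
    # Function() passes the "text" extra to type_camel by parameter name.
    mapping = {"camel <text>": Function(type_camel)}
    extras = [Dictation("text")]

grammar = Grammar("camel case")
grammar.add_rule(CamelRule())
grammar.load()
```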

Back to the main point: there are a few reasons why I think that Code needs Select-and-Say capabilities.

  1. While I admit that it's possible to program effectively by using commands to emulate keystrokes, this solution has its problems. First, it cuts the user off from being able to use "correct that" to improve recognition over time. Depending on the particular person, this can be more or less important. Second, it makes the learning curve steeper than necessary: a person could start by developing voice commands for some things and use ordinary dictation to write the rest of their code, albeit slowly. What we don't see is the number of programmers who lose the ability to use a keyboard and mouse and are then forced to change careers.

  2. Your solution is impractical for markup languages like LaTeX. You may well be aware that LaTeX is the industry standard for scientific and mathematical publication. These documents are a mix of prose and encoding for equations and pictures. It is impractical to dictate ordinary prose using only custom commands, so you need Select-and-Say, but you also need an editor with sufficient power to efficiently navigate the various symbols by voice.

  3. Finally, Visual Studio and VS Code fail to live up to Microsoft's own standards for disability access. And if Microsoft fails to live up to its own accessibility standards, what do you think everyone else will do? I note that you're using Windows 7. I'm using Windows 10. Windows 10 is actually less accessible by speech than Windows 7. For example, consider Edge in Windows 10 vs Internet Explorer in Windows 7 (Internet Explorer on Windows 10 is too unstable to use).

Anyway, I hope this helps explain why I think that VS Code should incorporate this change.

Cheers

cleidigh (Contributor) commented Dec 31, 2017

@JoleCameron
Thanks for the detailed response.

First, let me say that despite the fact that we arrived here from slightly different paths, one thing is probably very common: everyone starts off frustrated using voice control / dictation for programming. I was very reluctant to use Dragon in the beginning, given its peculiarities and limitations.
Necessity changed that, and I went crazy trying to make the best of it. I think there are some objective realities that one should start with:

  • System requirements: a fast system, 16 GB RAM, etc., and a very good microphone - I didn't catch what you use?

  • Dragon is not meant for programming, and unfortunately it never will be; as you know, Nuance does not support Natlink, and they broke "continuous recognition" in DPI 15 - more on that later.

  • Utilizing programming support elements is really a requirement, not an option.

  • Code cannot really add much directly to the puzzle; Dragon would have to implement more direct support for anything special, and they will never do that.

  • I believe I am accomplishing everything you mention without much more setup than you already have.

  • I think you see too many limitations with the current approach you are using with just Vocola.

  • I can do everything that Select-and-Say does; while I do not use "correct that", some of those facilities should be possible as well.

  • Doing Markdown is no problem using a mix of custom commands, Emmet, and normal dictated text.

  • The key factor is using continuous-recognition commands that allow you to chain symbols, words, and Code commands. For this you need Natlink + Dragonfly and some off-the-shelf grammars (see the sketch after this list).

  • Using some very basic Python you can add almost anything with little effort. I can share with you all my grammars, both personal and collected.
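To give a flavor of chaining, here is a minimal continuous-command-recognition sketch in Dragonfly (the commands are illustrative; real off-the-shelf grammars such as multiedit are far more complete):

```python
# Sketch of continuous command recognition (CCR) in Dragonfly: chain
# several spoken commands (symbols, keys, words) in a single utterance.
from dragonfly import (Grammar, CompoundRule, MappingRule, RuleRef,
                       Repetition, Key, Text)

class SingleCommand(MappingRule):
    exported = False  # only used inside the repetition below
    mapping = {
        "open paren": Text("("),
        "close paren": Text(")"),
        "equals": Text(" = "),
        "new line": Key("enter"),
        "save file": Key("c-s"),
    }

class ChainRule(CompoundRule):
    spec = "<chain>"
    extras = [Repetition(RuleRef(rule=SingleCommand()),
                         min=1, max=8, name="chain")]

    def _process_recognition(self, node, extras):
        # Execute each chained command in the order it was spoken.
        for action in extras["chain"]:
            action.execute()

grammar = Grammar("vscode ccr")
grammar.add_rule(ChainRule())
grammar.load()
```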

I would strongly suggest giving Dragonfly grammars a try, and I would be happy to help with this.
Let me know if you'd like to do this.

BTW, I think one way Code could support this more is with a combination of recipes and perhaps an extension to help with setup. I believe this is the most likely path, knowing both Code and Dragon.

JoleCameron (Author)

Thanks for the offer, but as I am choosing to stick with the current version of Dragon (for reasons of employability and to make sure my system works long-term), your method won't work in my case. It's easy enough to fake continuous command recognition - that's not my issue here. And my setup is fine.

When it comes down to it: yes, I think I can get it to work without any changes to Code. However, this will require using workarounds that I wouldn't need to use if Microsoft lived up to its own accessibility standards for speech input. Sure, Dragon's not designed for programming, but I'm not asking for a special method to program by voice, just that the text box is designed according to Microsoft's own standards.

cleidigh (Contributor) commented Jan 1, 2018

@JoleCameron
Happy New Year!

Having done all of my own work on this without help, I'd like to help you get the most out of voice programming.

I think you have a couple options:

  1. I believe you can install both 12 and 15; I cannot test this because I can never be without voice. You could try this, using 12 for programming and 15 for everything else and for future compatibility.
    After 40+ years of programming and a lot of research and work with Dragon, I can say that any article on programming by voice will point you to one of the frameworks like Dragonfly, Vocola, etc.
    Their flexibility and power cannot be matched by Dragon alone. FWIW.

  2. While I highly recommend the above approach, if you're absolutely determined to use 15, I would still like to make this work better for you.
    Natlink has been made to work partially with 15. Multiple people are trying to make this work better; it will most likely come with limitations, but I think you might be able to get a fair amount out of it. This might be an in-between approach.

  3. Lastly, a pure 15 approach. I want to point out a couple of things on your comments about compatibility:
  • I am not sure why you are not able to use Dragon alone to enter dictation into an input box in Code. With my 14, I can use just Dragon commands to open the Find widget and enter search text:
    Press Control+F (default key binding for search)
    Put Dragon into Normal or Dictation mode ("Start Normal Mode")
    (dictate the search text)
  • You can optimize the above with DVC commands.
  • The above is the standard way that Dragon interacts with any input item, with no knowledge of the application.
  • It is important to understand the several ways Dragon interacts with programs.
  • Dragon really only has custom interactions in a couple of ways.
  • It understands and can interact with menus and dialog buttons that utilize the Win32 Windows API.
  • Many new applications utilize WPF, which I think is currently not fully supported by Dragon; this is a Dragon issue, not an application issue.
  • With respect to Code in particular, it is somewhat of a special application itself. It does not use Win32 for anything other than the menu bar and a couple of native dialogs. Code is an Electron app centered around the Chromium standalone browser engine.
  • This architecture means the entire application is browser-like, not native-application-like. Dragon needs to interact with the application in the same manner that it typically interacts with a browser.
  • While Dragon has some add-ons for interacting with a few popular programs including browsers, these are done by Nuance, and personally I believe they are not that great. I have created a few things to get much more out of Chrome than the Dragon add-ons; my browsing experience is quite good this way.
  • You mentioned correction: I use the built-in suggestions in Code as well as Undo. I do not use Dragon's correction, as it will rarely ever help with code, but I believe it should work just fine anyway.

Finally, without sounding defensive (I am not): Code does not violate nor incorrectly implement "input boxes". These are implemented as HTML5 input elements which, when focused, will accept any input from Dragon. I have actually written extensions to Natlink and I have a pretty good idea of how it works. I have not actually determined how Dragon could be made more compatible; it almost always comes down to keyboard commands. I switched to Code after using many other editors for years, in particular because of its accessibility. It supports things like screen readers and contrast modes - not necessary for us, but nonetheless I think it makes Code the most accessible editor out there.

Let me know if you'd like to do some of these experiments.

@bpasero bpasero added editor and removed workbench labels Jan 3, 2018
cleidigh (Contributor) commented Jan 6, 2018

@JoleCameron

Any thoughts on the above?
Did you try my suggestion for dictation into input boxes?

Is there something very specific to address, given my comments?

@cleidigh cleidigh added the info-needed label Jan 6, 2018
JoleCameron (Author)

Sorry for not getting back to you sooner. I've been both busy and unwell this week, and I let this slide. My microphone also died last night, so I'm having to type this by hand. Hence, I'll be brief.

My concerns with VS Code boil down to the fact that I can't even dictate into the main text box without using a Dragon command, let alone have Select-and-Say access. Given that Microsoft provides an essential service (Windows), I think that the problem isn't entirely Nuance's fault. Past that, things exceed my level of knowledge.

I would appreciate input on setting up programming by voice using Dragon 15, but I'd rather not do that through a public forum. To that end, I sent you a private message on the KnowBrainer forum. I'll probably want input at about the one-month mark.

LexiconCode commented Jan 26, 2018

@claudioc

I also program by voice. Select-and-Say capabilities would be a blessing to have in VS Code, and there are a number of other ways VS Code could improve accessibility as well. First, a little bit about my setup.

  • Edited 2/11/2020 - updated information and links

Windows 10 64-bit - 8 GB of RAM - i5 7200U
Dragon DPI 15
SpeechWare FlexyMike Dual Ear Cardioid high-performance microphone with SpeechMatic MultiAdapter

  • Natlink - an open-source extension module for the speech recognition program Dragon.
  • Caster - a collection of tools aimed at enabling programming and accessibility entirely by voice. It runs on top of Dragonfly.
  • Dragonfly - a fork of dragonfly that supports CMU Sphinx, Dragon NaturallySpeaking, Windows Speech Recognition, and Kaldi as backends.

Microsoft and VS Code contributors could empower the voice-coding community to develop extensions that facilitate accessibility. There are some outstanding limitations: Caster and Dragonfly both interact with VS Code only by emulating keystrokes. This is why we need a method to expose VS Code's active 'when clause contexts':
#10471
#26882

zachgibson

I'm trying to use Dictation on a Mac, and it doesn't handle actually dictating text. I can perform commands such as "open new file" using Dictation in VS Code.

cece554 commented Oct 22, 2018

Having this problem as well with Mac dictation. I tried saving snippets in Dictation under commands, and VS Code appears to be unable to handle them, but when I say words that are not under commands, VS Code prints those.

sethwilsonUS

I'm legally blind, and while I can program reasonably well through conventional means, I'm still excited about the possibilities of voice programming.

I've been working on a voice programming web app using an open-source JavaScript library called annyang. It uses the web-standard SpeechRecognition API, which at present only works in Chrome (and apparently also Firefox now, though I haven't tested that). I'm wondering if, since VS Code uses Chromium, I/we could integrate annyang into a VS Code extension. If this could work, it would be awesome, because it'd be a free, integrated, cross-platform solution. I'm not sure how smooth the integration would be, or if annyang is powerful enough, but I think the idea has potential.

ryan-zheng-teki

I think many developers would really like it if we could use voice commands to write code, especially when people are back at home after a whole day's work. Voice recognition accuracy is improving, and Microsoft is promoting remote development. With the adoption of 5G, I really hope that voice coding can be integrated into VS Code. At the least, we could cut as much as 70% of the time we spend sitting down every day, which would be really good for our health as developers.

@isidorn isidorn added this to the Backlog milestone Jun 18, 2019
@isidorn isidorn added the feature-request label and removed the info-needed label Jun 18, 2019
irasanchez

As a student who is developing wrist pain, I'd also appreciate this.

rbavery commented Feb 11, 2020

Just want to chime in to support this.

niemyjski commented Mar 25, 2020

It is super important to have accessible tools for everyone to use.

LexiconCode commented Mar 25, 2020

I've been investigating alternatives that don't rely on the editor exposing information for accessibility via extensions. Microsoft's Accessibility Insights for Windows is a tool for investigating and testing the Windows accessibility API, UI Automation. Currently there are no official UI Automation bindings for Python, nor standardized support from a community project. I've worked with a few people to expose other editors' Scintilla components; from there we have been able to expose menus, editable text, cursor position, and so on. My hope is that this could be done through UI Automation, but there needs to be better support from Microsoft.
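As a stopgap, community libraries can at least probe the UIA tree today. Here is a minimal sketch using pywinauto's "uia" backend (the window-title pattern is an assumption):

```python
# Minimal probe of VS Code's UI Automation tree via pywinauto's UIA
# backend; the window-title regex below is an assumption.
from pywinauto import Desktop

vscode = Desktop(backend="uia").window(title_re=".*Visual Studio Code.*")
# Dump the control identifiers that UIA exposes (menus, documents, etc.).
vscode.print_control_identifiers(depth=3)
```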

CJohnDesign

I support this too. Wrist pain.

@isidorn isidorn assigned isidorn and unassigned cleidigh Jun 9, 2020
isidorn (Contributor) commented Jun 9, 2020

Hi, VS Code developer here 👋
First, thanks a lot for the great feedback. We definitely want to have a nice dictation experience in VS Code, so let's try to get some concise info here. I do not use dictation software, so I apologise for the simple questions:

  1. What dictation software is used on Windows / Mac / Linux? Is Dragon used everywhere? I plan to try it out on my Mac.
  2. Does this dictation software have a GitHub page where we can interact with the developers?
  3. Does this dictation software work well with Google Chrome, for example when you want to dictate into this GitHub input box?
  4. What is the experience with VS Code? It simply does not work? I know @cleidigh uses it with Dragon.

Then we can try to figure out what should be done on the VS Code side and what should be done on the dictation software side.

Thanks!

niemyjski commented Jun 9, 2020

I don't have answers to most of your questions :( but you have a whole team @ Microsoft (https://blogs.microsoft.com/accessibility/) who does nothing but accessibility. I'd recommend reaching out to Jessica Rafuse (She's pretty awesome).

isidorn (Contributor) commented May 28, 2021

Just FYI, there is a voice-assistant VS Code extension for Windows. You can find it here: https://github.com/b4rtaz/voice-assistant
I tried it out and it feels like it is in the early stages and still needs a lot of polish, but it nevertheless looks interesting.

@alexdima alexdima added the editor-input label and removed the editor label Oct 15, 2021
fusentasticus commented Dec 8, 2021

@isidorn Thanks for following this thread on automation needs for those of us who prefer or have to command our computer by voice!

  1. Now that there is a dedicated subdirectory for automation in the source tree, should we as dictation users go and vote for UI automation for extension authors using Playwright #136121, so that at least this part becomes easily user-installable?
  2. And with the automation already in place, would it be a big deal to write a full UIAutomation driver on top? By the real thing, I'm of course thinking about the excellent conceptual framework https://docs.microsoft.com/en-us/windows/win32/winauto/entry-uiautocore-overview, which Microsoft's Edge browser already supports very nicely and which Microsoft has given to the community per official commitments.
  3. So it is as if all the technology pieces are in place for something like word-under-the-mouse and custom select-and-say mechanisms to be easily implemented by dictation systems, if we could just get a full built-in UIAutomation service for VS Code! Specifically, we're looking for goodies like FromPoint, RangeFromPoint and Select from the Text pattern, the TextEdit patterns, and all the well-designed stuff for automation of panels, tabs, buttons, etc.
  4. I should add that I do see at least partial UIAutomation support when the VS Code window is focused (active window). However, the TextEdit control is disabled unless "Accessibility support" is turned on. Unfortunately, turning this setting on forces text wrapping off (which is not always good for visual users!). Also, when VS Code is unfocused, the automation elements returned by FromPoint appear to be an internal VS Code hierarchy not related to the UIAutomation model, which is why I am confused about the status of automation for VS Code! I'm not sure, for example, how much of the current automation in VS Code has bubbled up from underlying automation work on Chromium/Electron. [My preliminary testing is done via FlaUI in UIA3 mode; see the Python probe below for a rough equivalent.]
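For anyone wanting to reproduce the focused/unfocused observation without FlaUI, here is a rough Python probe using pywinauto (the coordinates are placeholders; point them at the VS Code editor area):

```python
# Rough probe of which UIA element sits under a screen point; compare the
# result with the VS Code window focused vs. unfocused.
from pywinauto import Desktop

element = Desktop(backend="uia").from_point(600, 400)  # placeholder coords
info = element.element_info
print(info.control_type, repr(info.name))
```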

isidorn (Contributor) commented Dec 10, 2021

@fusentasticus thanks for your reply. Let me try to answer:

  1. I would not suggest voting on that; that is just testing infrastructure we use, and we do not have any plans to add this. I hope we can achieve this without using Playwright.
  2. I am not an expert in the UIAutomation framework, so I do not really know how best to answer this. If something can be written that interacts with VS Code, that would be great. VS Code is using Chromium underneath, so theoretically, if this UIAutomation works with Chrome or Edge, it should be possible for it to work with VS Code.
  3. I see how this UIAutomation would enable a lot of scenarios, and that sounds great!
  4. Word wrapping being disabled is covered by this issue: Word wrap should not be disabled when accessibility is turned on #95428. We can fine-tune this behaviour. And yes, I believe VS Code simply bubbles up the underlying automation work on Chromium/Electron.

meganrogge (Contributor)

Exploring a different, though related idea in #170554. Please let us know what you think there.

meganrogge (Contributor)

Hi @JoleCameron, it has been a while since we last touched base with you. How are you finding the dictation support in VS Code these days? Is there anything we can do to help?

bpasero (Member) commented Feb 15, 2024

FYI, I am splitting this issue into the part that is actually being worked on: dictation support in the editor (#205263).

I think this issue here in particular asks for voice-to-text support in all locations that accept textual input, which is not in scope for February.

@bpasero bpasero removed the editor-input label Feb 15, 2024
bpasero (Member) commented Mar 7, 2024

With our February release, there is now support to use your voice to dictate into the editor: https://code.visualstudio.com/updates/v1_87#_use-dictation-in-the-editor

[animated GIF: dictating into the editor]

After installing the VS Code Speech extension you can use the keyboard shortcut Ctrl+Alt+V (Cmd+Alt+V on macOS) to start it.

Can people in this issue try it out and report back how it goes? Thanks!

@meganrogge meganrogge modified the milestones: Backlog, April 2024 Mar 7, 2024
meganrogge (Contributor) commented Apr 16, 2024

In reading through this issue, here are my findings:

  • Dragon does work with VS Code, though beginner dictation users can find configuring their setups for it challenging.
  • There is interest in using voice commands to write code. We now support that with Copilot Chat and with editor dictation.
  • It would improve the experience to expose VS Code when clauses so extensions could know context/focus and tell Caster, a Dragonfly-based programming toolkit that enables running commands/writing code. However, we now have Copilot Chat and "Hey Code" for those scenarios.

cc @isidorn, I think this issue can be closed given these findings.

isidorn (Contributor) commented Apr 18, 2024

Thank you very much for those insights.

I agree that we can go ahead and close this issue. But I think we should create a follow-up feature request for using voice to trigger VS Code commands, something that we currently do not support well; it would be good to understand the need better.

For the other requests (exposing when clauses through the API), there are already issues capturing this.

Users of the voice extensions: we plan to do a user study at the end of May. If you would like to help, more details can be found here: microsoft/vscode-discussions#1144

bpasero (Member) commented Apr 18, 2024

I think we have that as #209906

meganrogge (Contributor)

I have assigned #209906 to myself and added the accessibility label

@meganrogge meganrogge removed this from the April 2024 milestone Apr 18, 2024