Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for audio output using WASAPI #14697

Merged
merged 6 commits into from Apr 30, 2023
Merged

Conversation

jcsteh
Copy link
Contributor

@jcsteh jcsteh commented Mar 6, 2023

Link to issue number:

Fixes #11615. Fixes #5096. Fixes #10185. Fixes #11061. Partially addresses #10413 and #1409. May address #11169.

Summary of the issue:

NVDA's existing audio output code (nvwave) is largely very old and uses WinMM, a very old legacy Windows audio API. It is also written in pure Python, contains quite a few threading locks necessitated by WinMM, and parts of it have become rather difficult to reason about. There are several known stability and audio glitching issues that are difficult to solve with the existing code.

Description of user facing changes

At the very least, this fixes audio glitches at the end of some utterances as described in #10185 and #11061.

I haven't noticed a significant improvement in responsiveness on my system, but my system is also very powerful. It's hard to know whether the stability issues (e.g. #11169) are fixed or not. Time will tell as I run with this more.

Description of development approach

  1. The bulk of the WASAPI implementation is written in C++. The WASAPI interfaces are easy to access in C++ and difficult to access in Python. In addition, this allows for the best possible performance, given that we regularly and continually stream audio data.
  2. The WinMM code fired callbacks by waiting for the previous chunk to finish playing before sending the next chunk, which could result in buffer underruns (glitches) if callbacks were close together (Python 3 versions of NVDA produce a scratch in the speech when finishing the end of a line #10185 and Texts with multiple line spacings are voiced with NVDA + down arrow and voices crack #11061). In contrast, the WASAPI code uses the audio playback clock to fire callbacks independent of data buffering, eliminating glitches caused by callbacks.
  3. The WinMM WavePlayer class is renamed to WinmmWavePlayer. The WASAPI version is called WasapiWavePlayer. Rather than having a common base class, this relies on duck-typing. I figured it didn't make sense to have a base class given that WasapiWavePlayer will likely replace WinmmWavePlayer altogether at some point.
  4. WavePlayer is set to one of these two classes during initialisation based on a new advanced configuration setting. WASAPI defaults to disabled.
  5. WasapiWavePlayer.feed can take a ctypes pointer and size instead of a Python bytes object. This avoids the overhead of additional memory copying and Python objects in cases where we are given a direct pointer to memory anyway, which is true for most (if not all) speech synthesisers.
  6. For compatibility, WinmmWavePlayer.feed supports a ctypes pointer as well, but it just converts it to a Python bytes object.
  7. eSpeak and oneCore have been updated to pass a ctypes pointer to WavePlayer.feed.
  8. When playWaveFile is used asynchronously, it now feeds audio on the background thread, rather than calling feed on the current thread. This is necessary because the WASAPI code blocks once the buffer (400 ms) is full, rather than having variable sized buffers. Even with the WinMM code, playWaveFile code could block for a short time (nvwave.playWaveFile not fully async #10413). This should improve that also.
  9. WasapiWavePlayer supports associating a stream with a specific audio session, which allows that session to be separately configurable in the system Volume Mixer. NVDA tones and wave files have been split into a separate "NVDA sounds" session. WinmmWavePlayer has a new setSessionVolume method that can be used to set the volume of a session. This at least partially addresses Ability to adjust volume of sounds #1409.

Testing strategy:

  • Given that this deals with realtime audio output, there isn't a way to unit test or system test this.
  • I've been running with this code for several weeks as my main screen reader.
  • I've tested with both eSpeak and OneCore.
  • I've tested various situations requiring indexing with both eSpeak and OneCore, including say all, indentation tones and focusing dialogs (which relies on indexing due to multiple utterances).
  • I've tested switching default devices, disconnecting the default device and disconnecting a preferred device.
  • I've tested configuring NVDA to use a specific (non-default) device, exiting NVDA, disconnecting that device and starting NVDA. NVDA falls back to the default device as it should.
  • I've tested independently setting the volume of NVDA speech and sounds in the system Volume Mixer.
  • I have not tested with operating systems other than Windows 11 21h2, as I don't have any other versions of Windows set up to test with.
  • I also haven't tested with any other speech synthesisers, as I don't have any.

Known issues with pull request:

  • This needs a lot more testing across various computers, Windows versions, audio devices, speech synthesisers and configurations. It is enabled by default to gather maximum feedback, but can be easily disabled if it causes problems.
  • Calling tones.beep with a length longer than 399 ms will block, again due to the different buffering strategy. I don't think this will matter in practice, as I doubt beeps that long are used in practice.
  • If a synthesiser sends very small chunks of audio, playback might glitch. I'm not sure if any synths actually do this, so I'm not sure if this needs to be addressed or not. I do have some thoughts on how we could handle it though.
  • If a preferred device is specified and it is disconnected, playback falls back to the default device. The WinMM code had some logic to switch back to the preferred device if it is reconnected. I haven't implemented this for WASAPI; it will remain on the default device. I think it's possible to implement this, but it increases complexity and I'm not sure how useful it is, especially since the WinMM code only did this on idle anyway.

Change log entries:

New features
NVDA now outputs audio via the Windows Audio Session API (WASAPI), which may improve the responsiveness, performance and stability of NVDA speech and sounds. This can be disabled in Advanced settings if audio problems are encountered.

Code Review Checklist:

  • Pull Request description:
    • description is up to date
    • change log entries
  • Testing:
    • Unit tests
    • System (end to end) tests
    • Manual testing
  • API is compatible with existing add-ons.
  • Documentation:
    • User Documentation
    • Developer / Technical Documentation
    • Context sensitive help for GUI changes
  • UX of all users considered:
    • Speech
    • Braille
    • Low Vision
    • Different web browsers
    • Localization in other languages / culture than English
  • Security precautions taken.

@@ -56,6 +56,7 @@
# Audio settings
[audio]
audioDuckingMode = integer(default=0)
wasapi = boolean(default=false)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "audio" section is saved in profiles. However it seems to me that this options is only relevant for the base profile since it takes effect after restart.

Would it be worth to put this option in another or a new section (e.g. global settings / startup settings)?

Note that there is probably already some confusion on this point regarding some advanced settings; I am specifically thinking to "Registration for UI Automation events and property changes" option; but there may be others in adv settings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this could be a point of confusion. That said, it seems overkill to create an entirely new config section for a setting which will eventually be removed. (I don't think it makes sense to keep both audio backends long-term.) It also doesn't fit so well in any of the existing global sections. Nevertheless, I'm of course happy to change this if the reviewer has a specific recommendation.

@@ -56,6 +56,7 @@
# Audio settings
[audio]
audioDuckingMode = integer(default=0)
wasapi = boolean(default=false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NV Access will probably request a ternary setting in the config (auto/wasapi/wave out), even if it's only exposed as a checkbox in the UI during development. The "auto" setting will be treated like "wave out" until the default is changed, at which point "auto" becomes "wasapi".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really clear on when we should be using a feature flag versus a boolean. Some advanced settings (even some added recently) are simple booleans, while others are feature flags. Again, I'm happy to implement whatever is recommended during review.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've opened jcsteh#1 to resolve this item.

@jcsteh jcsteh force-pushed the wasapi branch 3 times, most recently from 18cd716 to 8b9513b Compare March 8, 2023 04:56
@AppVeyorBot
Copy link

See test results for failed build of commit d30c626079

@AppVeyorBot
Copy link

See test results for failed build of commit bc69b7422a

@LeonarddeR
Copy link
Collaborator

@jcsteh or @seanbudd could you please provide a signed try build for this, so I can test on my coroporate Dell system?

@jcsteh
Copy link
Contributor Author

jcsteh commented Mar 23, 2023

I actually have no idea what the policy or procedure is for generating signed try builds these days.

@seanbudd
Copy link
Member

seanbudd commented Mar 23, 2023

@LeonarddeR a try build is building here: https://ci.appveyor.com/project/NVAccess/nvda/builds/46590616/artifacts, by pushing a branch named "try-*" to this repo

@LeonarddeR
Copy link
Collaborator

Here are my first findings. Note that I have WASAPI enabled on secure screens as well.

  1. Overall performance definitely feels a bit more rapid, especially when typing. Note that the try build doesn't have your change regarding caret movement detection, as far as I know.

  2. When selecting an USB soundcard and disconnecting that card, the behavior is really snappy. It instantly switches back to the default card and switches back to the USB card when reconnecting it.

  3. I restarted my system twice and both times, I had no speech on the logon screen. Synth was set to no speech and I had to choose it explicitly. I Have seen this with winm incidentally, but certainly not as prevalent as now.

@jcsteh
Copy link
Contributor Author

jcsteh commented Mar 24, 2023

I restarted my system twice and both times, I had no speech on the logon screen. Synth was set to no speech and I had to choose it explicitly.

Oof. That's going to be hellish to debug. I guess the Windows audio service or the audio device isn't initialised at that point? I don't really know what we're supposed to do in that case. I would've thought the Windows Accessibility manager thing wouldn't try to start a screen reader until audio was up, but I guess there's no such restriction.

@jcsteh
Copy link
Contributor Author

jcsteh commented Mar 24, 2023

I did just discover and fix a bug where if you have a specific output device configured (not default, AKA Microsoft Sound Mapper) and you start NVDA without that device connected, nvwave would barf and end up setting the synth to no speech. I don't think that's the bug you're seeing, since I'm guessing you have the device set to default on secure screens, but still, it's an important fix.

@zstanecic
Copy link
Contributor

zstanecic commented Mar 24, 2023 via email

@codeofdusk
Copy link
Contributor

I've been running this PR for several weeks. So far so good!

@cary-rowen
Copy link
Contributor

@zstanecic
You can always get the latest build of this PR here:
https://ci.appveyor.com/project/NVAccess/nvda/builds/46595783/artifacts

@seanbudd
Copy link
Member

A try build with the latest commit has been built here: https://ci.appveyor.com/project/NVAccess/nvda/builds/46648246/artifacts

@LeonarddeR
Copy link
Collaborator

I tried this build and three out of three times, NVDA started talking on the logon screen. Furthermore, I actually don't see why WASAPI would be unable to find devices while WinM would be able to, since WinM is just a WASAPI client as you said.

I'd say, lets bring this to 2023.2 as is and then try to make it default in 2023.3!

@jcsteh
Copy link
Contributor Author

jcsteh commented Mar 30, 2023

Is it possible you accidentally configured a specific (non-default) device on the logon screen? As per #14697 (comment), I did have to fix a bug with that if said device was not connected.

@jcsteh jcsteh marked this pull request as ready for review March 30, 2023 06:23
@jcsteh jcsteh requested review from a team as code owners March 30, 2023 06:23
@LeonarddeR
Copy link
Collaborator

I have outputDevice = Microsoft Sound Mapper. in nvda.ini on the logon screen. Not sure whether you consider that default enough ;)

@jcsteh
Copy link
Contributor Author

jcsteh commented Mar 30, 2023

That sounds pretty defaulty to me. No idea what went wrong then. :(

@michaelDCurran michaelDCurran merged commit e77bde9 into nvaccess:master Apr 30, 2023
1 check passed
@nvaccessAuto nvaccessAuto added this to the 2023.2 milestone Apr 30, 2023
@ahicks92
Copy link
Contributor

Late to this party and not able to easily review NVDA code anyway, but with respect to logon screens I'm pretty sure that WaveOut is implemented on top of Wasapi these days and would expect that Wasapi would be working if WaveOut is.

years and years ago when I maintained Unspoken it was using Wasapi on logon screens without issue for quite a while, until people got annoyed enough that it ended up going through NVWave. But that was only because people wanted it to follow audio devices, which almost no one implements right on Wasapi (Windows 10 and later can do it automatically, but it was much harder in 2015 or so). That said, if you guys find out you want to use a third-party implementation given all the Wasapi edge cases, miniaudio is public domain and I can attest to it's stability, as it has been used by System Fault (an audiogame) for a while.

Someone is going to need to test on weird channel configurations. In particular, one place Wasapi likes to be fragile is around surround sound setups, requesting/using exclusive devices, etc etc. Time will tell on that, I imagine.

Figured dropping some context might be useful given my background.

@jcsteh
Copy link
Contributor Author

jcsteh commented Apr 30, 2023

@ahicks92, you noted that WASAPI in Windows 10 can auto follow audio devices. Do you have any references or pointers for that? I ended up having to implement that manually and it was annoying. It'd be nice if it could be done more cleanly, at least at some point in the future when we no longer support anything less than Windows 10.

@ahicks92
Copy link
Contributor

ahicks92 commented May 1, 2023

It's available in Windows 10 version 1607 and later. See: https://learn.microsoft.com/en-us/windows/win32/coreaudio/automatic-stream-routing

Miniaudio does it via listening to events instead, which works on older versions of Windows, and I'd not be surprised if they know about some bug or something I don't. Honestly Wasapi implementations are stupidly tricky and I regret having ever tried to write one myself (it didn't hurt that I was a worse programmer then, but even so...). I don't know that Miniaudio's API is compatible with NVDA and obviously this work is already done, but reading their code mightn't be the worst idea: https://github.com/mackron/miniaudio

Admittedly I haven't heavily tested Miniaudio's device switching because it's something I haven't cared a lot about. I've occasionally gotten reports of it not working, so idk if it's actually perfect. One problem with Wasapi that comes up is that devices sometimes flat out lie. Steel Series headsets for example say that their minimum period is 1 sample. So in the general it's kind of hard to tell sometimes if a bug is because (1) your implementation is broken, (2) the device is buggy and WaveOut had some sort of patch over that, or (3) WaveOut is so conservative about what it does that it just papers over things. Event-based switching being buggy vs. miniaudio being buggy vs. drivers that are doing bad things? You decide...

And if you don't know about the device events, I'm referring to these: https://learn.microsoft.com/en-us/windows/win32/coreaudio/device-events

@cary-rowen
Copy link
Contributor

Will this PR break API compatibility?
I have a problem with "tonysEnhancements" add-on, here is the log snippet:

  File "nvwave.pyc", line 718, in playWaveFile
  File "nvwave.pyc", line 823, in __init__
  File "C:\Users\cary\AppData\Roaming\nvda\addons\tonysEnhancements\globalPlugins\tonysEnhancements\__init__.py", line 618, in preWaveOpen
    winmm.waveOutSetVolume(selfself._waveout, volume2)
AttributeError: 'WasapiWavePlayer' object has no attribute '_waveout'

@zstanecic
Copy link
Contributor

zstanecic commented May 1, 2023 via email

@cary-rowen
Copy link
Contributor

Thanks I also created an Issue for Tony.

@zstanecic
Copy link
Contributor

zstanecic commented May 1, 2023 via email

@jcsteh
Copy link
Contributor Author

jcsteh commented May 1, 2023

This change isn't API breaking in the sense that it doesn't change anything in the public API. However, it will certainly break anything that monkey patched or relied upon private implementation details. It looks like this add-on does that, so I'm not surprised it broke. It will need to be updated.

@btman16
Copy link

btman16 commented May 2, 2023

Hello,

Running NVDA Alpha 28166, and using WASAPI, my system does not go to sleep automatically even though this is set in Power Options.

When using NVDA without WASAPI this works correctly.

For me, I rely on my system being able to automatically go into sleep regularly and other users might as well, so I wanted to bring this issue up early, as I could see there being some concern about this and it is a concern I have as well.

Other than this issue, however, WASAPI support is working well for me.
The issue also occurs with a fresh copy of NVDA, with no add-ons.

Thank you very much for your time and I look forward to any input you may have.

Sincerely,

Brandon Tyson

@Lo-lo78
Copy link

Lo-lo78 commented May 3, 2023

Hi James,
surely you have already thought about how to do it.
Wasapi should be in shared mode, so you can use applications like Reaper.
Greetings!

@Adriani90
Copy link
Collaborator

@josephsl it seems using WASAPI in combination with the sound splitter addon the sounds for switching to focus or browse mode are not working anymore. I guess this needs an update on the add-on's side.

@josephsl
Copy link
Collaborator

josephsl commented May 3, 2023 via email

@jcsteh
Copy link
Contributor Author

jcsteh commented May 3, 2023

NVDA WASAPI is already using shared mode.

@ruifontes
Copy link
Contributor

ruifontes commented May 3, 2023 via email

@jcsteh
Copy link
Contributor Author

jcsteh commented May 4, 2023

As per the Developer Guide:

The NVDA Add-on API includes all NVDA internals, except symbols that are prefixed with an underscore.

By that definition, this change is not API breaking. It does not make changes to public (non-underscore prefixed) symbols. The fact that add-ons have chosen to touch private symbols (prefixed with an underscore) is their choice and they are thus subject to breakage outside of .1 releases.

That said, pragmatically speaking, I acknowledge this has an impact on users, not just add-on authors. So, I guess we need to decide whether the benefits of landing this sooner (responsiveness, volume adjustment, etc.) are outweighed by the risk of add-on breakage.

@jcsteh
Copy link
Contributor Author

jcsteh commented May 4, 2023

If it is decided that add-on breakage is too much of a problem, an alternative to reverting this could be to disable it by default. That way, users who don't depend on these add-ons can still benefit if they choose to.

@cary-rowen
Copy link
Contributor

@jcsteh

This change isn't API breaking in the sense that it doesn't change anything in the public API. However, it will certainly break anything that monkey patched or relied upon private implementation details. It looks like this add-on does that, so I'm not surprised it broke. It will need to be updated.

I very much agree with what @jcsteh said, I think addon authors need to update these addons.

It is more important that the option for this PR should remain checked by default, so that more users can test it and find potential problems.

Thanks

@XLTechie
Copy link
Collaborator

XLTechie commented May 4, 2023 via email

@gregjozk
Copy link
Contributor

gregjozk commented May 4, 2023 via email

@MarcoZehe
Copy link
Contributor

MarcoZehe commented May 4, 2023 via email

@btman16
Copy link

btman16 commented May 4, 2023

Hi,

On Windows 10, I'm encountering an issue when using WASAPI where my
system can no longer automatically sleep even though I've set this up
in Power Options.

Otherwise I think this being on by default is good, but this is an
issue outside of add-ons, so I have concerns about this.

This issue does not occur on Windows 11, however.

Thanks,

Brandon

@seanbudd
Copy link
Member

seanbudd commented May 4, 2023

@btman16 @gregjozk -
This PR was merged last week into NVDA alpha.
Commenting on closed PRs and issues is lost easily, we cannot easily track comments in closed issues/PRs.

At this stage, issues with NVDA are alpha issues, not issues with this PR.
If you discover issues on alpha, please report them as a new issue.
For broader discussion on WASAPI and NVDA audio output, create a discussion or new issues.
I am locking the discussion on this PR.

@nvaccess nvaccess locked as resolved and limited conversation to collaborators May 4, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet