OneCore Voices: Support faster speech rates, greater pitch range, etc. #7498

Closed
jcsteh opened this issue Aug 15, 2017 · 24 comments

Comments

@jcsteh
Contributor

commented Aug 15, 2017

Compared with NVDA, Narrator is able to speak at faster rates with the OneCore Voices. It can also access a much wider pitch range. In addition, the rate set in Narrator is not affected by the rate set in Windows Settings, whereas NVDA is affected by this. This is because Narrator used an API which was previously private. That API has now been made public, so NVDA will be able to use it.

See the Options property on the SpeechSynthesizer class and the SpeechSynthesizerOptions class. Note that AudioPitch, AudioVolume and SpeakingRate (the properties we want) were only introduced in Windows 10 Insider 10.0.16257.0.

Unfortunately, we can't use these just yet for a few reasons:

  1. We need a newer version of the Windows 10 SDK, but versions after 10.0.15063 only work with Visual Studio 2017. We can't yet build with Visual Studio 2017. So, this is blocked by our move to VS 2017, which is in turn blocked by support in SCons.
  2. This is still a preview version of the SDK. Even once it's out of preview, it might be a while before it gets installed on AppVeyor.

Implementation notes:

  1. Because this will only be available in later builds of Windows 10, we'll need to test for support of this new API at runtime. I know UWP provides a way to do this, but I've never done it myself.
  2. Aside from testing for the API, we'll probably still need to keep the code around which adjusts rate, etc. with SSML to work with earlier Windows 10 builds. We might just be able to pass the default rate, volume, etc. to _OcSsmlConverter if the new API is supported instead of passing the cached user settings, but I'm not certain. Either way, this part of the code is going to get a bit ugly because we have to support these two cases.
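The two cases in implementation note 2 can be sketched roughly as follows. This is a minimal illustration in Python, not NVDA's actual `_OcSsmlConverter`: `build_ssml`, its parameters and the percentage attribute values are hypothetical, and the runtime check for the new API is assumed to have been done elsewhere and passed in as `options_api_supported`.

```python
from xml.sax.saxutils import escape


def build_ssml(text, rate, pitch, volume, options_api_supported):
    """Wrap text in SSML, adding base prosody only when the new API is missing.

    rate, pitch and volume stand for the cached user settings (0-100).
    All names here are hypothetical; this is not NVDA's converter.
    """
    body = escape(text)
    if not options_api_supported:
        # Older Windows 10 builds: the base values must travel with every utterance.
        body = (
            f'<prosody rate="{rate}%" pitch="{pitch}%" volume="{volume}">'
            f"{body}</prosody>"
        )
    # Newer builds: the values are assumed to be set once on the synthesizer
    # options, so the utterance carries no base prosody.
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
        f"{body}</speak>"
    )
```

Keeping the decision in one flag means the rest of the driver does not need to know which Windows build it is running on.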
@josephsl

Collaborator

commented Aug 15, 2017

@jcsteh

Contributor Author

commented Aug 15, 2017

There's a bit of confusion as to when this became public. The documentation I linked to (which is supposedly the official UWP documentation) says 16257. However, I've also seen references which suggest earlier. It's safe to say that it's now available in Insider builds and the preview SDK.

@zersiax

commented Aug 24, 2017

I'm seeing a possible glitch here. In the latest next build on the latest stable Windows 10, changing the TTS settings in either Control Panel or Narrator Settings does not appear to influence NVDA when speaking through OneCore, whereas I observed this happening a few months ago. Has something changed in this regard?

@jcsteh

Contributor Author

commented Aug 24, 2017

@zersiax

commented Aug 25, 2017

Found it already. There are quite a few places where speech rates can be set, and only one of them works: the ones in Narrator settings and Control Panel's TTS settings don't work; only the settings in another part of the Windows+I Settings dialog seem to take effect :)

@leonardder

Collaborator

commented Nov 17, 2017

@jcsteh: I believe all blocking cases for this issue are resolved, right? If so, we can safely remove the blocking label and set a priority for this.

@jcsteh

Contributor Author

commented Nov 17, 2017

@leonardder removed the blocked label Nov 17, 2017

@jcsteh

Contributor Author

commented Mar 19, 2018

I just spent over an hour implementing this, only to discover that it doesn't actually let us access the faster rates. It's still restricted to the maximum rate we can get in Speech Settings, only it's not impacted by that setting like our previous SSML code was. It does allow us to access the greater pitch range, just not the greater rate range. So, Narrator must still be using the old private API (which does work but we can't legally use it). The string "MSTTS.SpeakRate" can still be found in Narrator's srh.dll, which would seem to confirm this.

Anyway, the code, useless as it currently is, is in the i7498OcSpeechOptions branch of my fork: https://github.com/jcsteh/nvda.git

@michaelDCurran, would you be able to follow this up with Microsoft? Alas, this means we'll probably have to wait for yet another Windows version before we can access these rates.

@leonardder

Collaborator

commented Sep 8, 2018

@michaelDCurran, @jcsteh: How is this in the October update?

@manish10

commented Sep 17, 2018

I reached out to the Narrator team to ask about this. They are no longer using a private API for this. This was the response from the developer:
"Please ask the NVDA devs if they’ve looked at the SpeechSynthesizerOptions class. That’s the public API that we’ve been using to set the rate since RS4."

Are we also using the options class mentioned above?

@jcsteh

Contributor Author

commented Sep 17, 2018

@manish10

commented Sep 17, 2018

@jcsteh

Contributor Author

commented Sep 17, 2018

@jcsteh

Contributor Author

commented Oct 15, 2018

I rebased the i7498OcSpeechOptions branch in my fork to current master. Unfortunately, with the Windows 10 1809 update (Version 1809 (OS Build 17763.55)), the maximum rate available via this API is still only the normal max rate, not the boosted rates available to Narrator. Furthermore, Narrator's SRH.dll still contains the string "MSTTS.SpeakRate", which is used in the private API to access the boosted rates. That suggests they're still not using the public API. I'm following up with Microsoft.

For reference, I saw the MSTTS.SpeakRate string using this command. (The strings utility is from GNU binutils.)
strings --encoding=l SRH.dll
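
For anyone without binutils, a rough equivalent of that check is sketched below in Python. It assumes the string is stored as UTF-16 little-endian, which is what `--encoding=l` searches for; the helper names are made up for illustration, and the path in the usage comment is only an example.

```python
def contains_utf16le(data: bytes, needle: str) -> bool:
    """Return True if needle occurs in data encoded as UTF-16 little-endian."""
    return needle.encode("utf-16-le") in data


def file_contains_utf16le(path: str, needle: str) -> bool:
    """Read a binary file and look for needle as a UTF-16LE string."""
    with open(path, "rb") as f:
        return contains_utf16le(f.read(), needle)


# Example (path is illustrative):
# file_contains_utf16le("SRH.dll", "MSTTS.SpeakRate")
```

Unlike `strings`, this does not list every string in the file; it only answers whether one particular string is present.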

@jcsteh

Contributor Author

commented Oct 22, 2018

The i7498OcSpeechOptions branch in my fork now also sets the appended silence and punctuation silence to the minimum. This greatly reduces the delays between utterances and after punctuation, thus resulting in a much better screen reading experience. However, these are only supported in the latest (1809) update to Windows 10. I'm not sure what will happen if you try to run this on earlier versions. That needs to be tested and gracefully handled.

@jcsteh

Contributor Author

commented Oct 22, 2018

I've updated my branch to gracefully handle older versions of Windows 10 where these features aren't available.
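
One common way to get this kind of graceful handling is to set each option only when it exists. The sketch below uses a plain Python object in place of the real synthesizer options; `apply_min_silence`, the attribute names and the `"min"` value are stand-ins for illustration, not the code in the branch.

```python
from types import SimpleNamespace


def apply_min_silence(options):
    """Set the silence options to their minimum where they exist.

    Returns the names of the options that were applied, so the caller
    can tell whether it is running on a build that supports them.
    """
    applied = []
    for name in ("appended_silence", "punctuation_silence"):
        if hasattr(options, name):
            setattr(options, name, "min")
            applied.append(name)
    return applied


# Stand-ins for a build that supports both options and one that supports neither.
new_build = SimpleNamespace(appended_silence="default", punctuation_silence="default")
old_build = SimpleNamespace()
```

On the stand-in for an older build the function changes nothing and returns an empty list, which is the behaviour that needs to be verified on a real pre-1809 system.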

@jcsteh

Contributor Author

commented Oct 22, 2018

I'm not sure whether there was something weird on my system or I tested something incorrectly. Previously, the rates available to Narrator were vastly different to those available in the normal system speech settings; 100% in Narrator was completely unintelligible due to how fast it was. Now (perhaps due to me recently resetting Narrator's voice), they're equivalent. Furthermore, 100% in current NVDA builds is equivalent to 100% in Narrator (and 100% in system speech settings). So, I guess what I was experiencing previously was a bug. Since things are now equivalent in 1809 (and there's no advantage to the old way of doing things), I think we're good to ship this.

Unfortunately, AppVeyor doesn't yet have the required version of the Windows 10 SDK (10.0.17763.0). See appveyor/ci#2673.

@PratikP1

commented Oct 22, 2018

@jcsteh wrote:

The i7498OcSpeechOptions branch in my fork now also sets the appended silence and punctuation silence to the minimum. This greatly reduces the delays between utterances and after punctuation, thus resulting in a much better screen reading experience.

Practically speaking, does this mean that the pauses will be shortened between certain punctuation marks and the beginning of the next content?

@josephsl

Collaborator

commented Oct 22, 2018

@josephsl

Collaborator

commented Oct 22, 2018

@josephsl

Collaborator

commented Oct 22, 2018

@michaelDCurran

Contributor

commented Nov 11, 2018

This branch now compiles on AppVeyor. I'm not sure why that Windows SDK issue is still open.

@jcsteh

Contributor Author

commented Nov 11, 2018

Practically speaking, does this mean that the pauses will be shortened between certain punctuation marks and the beginning of the next content?

Yes.

@jcsteh

Contributor Author

commented Nov 12, 2018

https://www.appveyor.com/updates/ confirms this. The new image containing this SDK went live on 9 November.

@nvaccessAuto added this to the 2019.2 milestone May 10, 2019

feerrenrut added a commit that referenced this issue May 10, 2019

OneCore voices: Use new SpeechSynthesizerOptions properties to set pitch, volume and rate lengths (PR #8934)

Fixes #7498.

Issue summary for NVDA's OneCore voices support:
- The rate setting is affected by the rate setting in Windows Speech Settings.
- The pitch range is very limited (compared with Narrator).

Previously, we used SSML in every utterance to set the base value of parameters, since there was no other way.
However, Windows 10 Fall Creators Update introduced new properties in the SpeechSynthesizerOptions class to set these parameters.

In addition to using these new properties, this commit adds rate boost to the synthesizer settings ring and to the OneCore driver. It is disabled by default, so speech should remain understandable. On older versions of Windows 10, this driver is expected to behave the same as current master.

The only case where the rate will differ from before this commit is when someone has changed the rate in the Windows 10 speech settings. More information on this is in PR #8934.