Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OneCore voices: Use new SpeechSynthesizerOptions properties to set pitch, volume and rate lengths #8934

Merged
merged 17 commits into from May 10, 2019

Conversation

@jcsteh
Copy link
Contributor

commented Nov 12, 2018

Link to issue number:

Fixes #7498.

Summary of the issue:

For NVDA's OneCore voices support:

  1. The rate setting is affected by the rate setting in Windows Speech Settings.
  2. The pitch range is very limited (compared with Narrator).

Description of how this pull request fixes the issue:

This uses new SpeechSynthesizerOptions properties introduced in recent updates of Windows 10 to set these values.

Changes by @leonardder since @jcsteh worked on this:

  1. Added rate boost to the synthesizer settings ring. I found this most helpful to test things, and imagined that users might also like this.
  2. For older versions of Windows 10 this driver now behaves equally to how current master behaves.
  3. Added rate boost to the OneCore driver. This is disabled by default so with this pr, speech shouldnt become understandable. The only case where the rate will differ between master and this pr will be when someone changed the rate in the Windows 10 speech settings. See the second point in the known issues section.

Testing performed:

  1. Set the rate to 0 in Windows Speech Settings, tested with NVDA, set the rate to 100 in Windows Speech Settings, tested with NVDA. Observed that NVDA's rate was not affected.
  2. To simulate older versions of Windows 10, built the changes with lower values for each of the Windows.Foundation.UniversalApiContract minimum version checks. Confirmed that the driver works and that volume, rate and pitch settings behave as in the old situation on unsupported versions (as expected). (I don't have systems with older versions of Windows, so I can't confirm this in a real world setting.)
  3. By hearing, compared master with this pull request with regard to rate range, and made sure that the rate rang hasn't changed significantly.

Known issues with pull request:

  1. Even though we;re fairly certain this should work on older versions of Windows 10, this hasn't been tested on actual systems with older versions of Windows 10.

  2. @michaelDCurran wrote in #8934 (comment)

    There is a chance that if the user had Slowed down their speech using the Windows speech settings, and then sped it up again in NVDA's speech settings, that now their speech is going to be a bit faster. However, as we know they will not have rate boost on, 100% even with the new code (without rate boost) is not so fast that they wouldn't be able to slow it down again. It may not be entirely comforttable, but it won't be completely mis-understandable for anyone I believe. Plus, I'd say it is a rather small subset of user who slowed down in windows Speech settings and then sped back up again with NvDA speech settings. If we are really worried we should refuse to load the user's previously configured rate the first time. But my feeling is we could get this as far as a beta and see if anyone at all is affected.

Change log entry:

New features:

- For the Windows OneCore voices and eSpeak NG speech synthesizers, you can now enable rate boost using the synthesizer settings ring.

Bug fixes:

- For Windows OneCore voices, the rate set in NVDA is no longer affected by the rate set in Windows 10 Speech Settings.

Changes:

- Greater rate and pitch ranges are now supported for Windows OneCore voices. To increase the rate range, you can enable the new rate boost setting.

You may wish to move some of these into the New Features section. I'll leave that up to you.

@jcsteh jcsteh requested a review from michaelDCurran Nov 12, 2018

@josephsl

This comment has been minimized.

Copy link
Collaborator

commented Nov 12, 2018

@michaelDCurran

This comment has been minimized.

Copy link
Contributor

commented Nov 12, 2018

I'm totally in support of turning off appendSilence. This makes general navigation around Windows much much better with Onecore.

However, turning off punctuation silence in some cases (such as after a comma) seems to not leave enough gap. Though this is different for each voice it seems. Though quickly testing this, I am finding it hard to get used to with Microsoft David for instance. There's a lot of: "Let's eat Grandma" rather than "Let's eat, Grandma" going on.
Perhaps we could keep punctuation silence on, and allow the user to turn it off in NvDA settings. This would be nicer if Microsoft had let punctuationSilence actually take a duration.

Secondly, I am worried that some users may end up with an unusable NVDA as the rate will now suddenly be too fast. For instance, if their Windows rate was set to a slow value, and they had boosted their NVDA rate up high to compensate, now their NVDA rate will be super fast. True for an expert user the rate would not be that fast, but for a beginner user, it could be too much.

And finally, there is of course the issue of the new rate code not being supported on older versions of Windows 10. I wasn't too worried about this before as people should keep their Windows up to date. However, added to the above points, I think it worth noting for future conversation on this PR.

I'd really love to take the appendSilence code for 2018.4 at very least.

@leonardder

This comment has been minimized.

Copy link
Collaborator

commented Nov 12, 2018

And finally, there is of course the issue of the new rate code not being supported on older versions of Windows 10. I wasn't too worried about this before as people should keep their Windows up to date.

Note that Windows server 2016 is stuck at a particular equivalent build of Windows 10, so server 2016 won't support the new rate code. Server 2019 probably will.

@josephsl

This comment has been minimized.

Copy link
Collaborator

commented Nov 12, 2018

@jcsteh

This comment has been minimized.

Copy link
Contributor Author

commented Nov 12, 2018

@leonardder

This comment has been minimized.

Copy link
Collaborator

commented Dec 12, 2018

@jcsteh: Would you be able to resolve the merge conflicts?

@lukaszgo1

This comment has been minimized.

Copy link
Contributor

commented Dec 15, 2018

I've tested this pr on Windows 10 builds 10240, 14393 and 17134.On version 1803 everything worked as expected, but on those two earlier versions the only setting which could be changed for OneCore was voice. Is it expected? I do not believe, that regressing things in this way is a good move. The versions of Win 10 which I've chosen are LTSB builds, so they would be used by enterprise customers for some time. Furthermore blindness specific market is usually slow at updating not only due to fear of something breaking, but also, because people are often using older versions of commercial screen readers and magnifiers, which wouldn'n work with the recent Windows 10 updates. Given this commend by @jcsteh it should be possible to use proper code depending of Windows 10 version in use. Ah, and one more thing, the read me should be updated to mention, that when installing Visual Studio the required version of SDK is now 17763.

@josephsl

This comment has been minimized.

Copy link
Collaborator

commented Dec 15, 2018

@zstanecic

This comment has been minimized.

Copy link
Contributor

commented Dec 15, 2018

@leonardder

This comment has been minimized.

Copy link
Collaborator

commented Jan 5, 2019

Also note that Windows Server 2016 is based on an older build of Windows 10 which won't have this functionality and thus would also suffer from this.

@josephsl

This comment has been minimized.

Copy link
Collaborator

commented Jan 5, 2019

@leonardder

This comment has been minimized.

Copy link
Collaborator

commented Apr 25, 2019

@jcsteh: Is this subject to conflict with the new speech framework? In that case, I guess it is up to @michaelDCurran or @feerrenrut to decide whether we want either the framework or this to be merged first.

I'm happy to take the merge conflicts and review actions if that helps you, as long as you could possibly follow the progress and review where needed.

@michaelDCurran

This comment has been minimized.

Copy link
Contributor

commented Apr 25, 2019

@jcsteh

This comment has been minimized.

Copy link
Contributor Author

commented Apr 25, 2019

@dingpengyu

This comment was marked as off-topic.

Copy link

commented Apr 26, 2019

hello michael
Why not learn from other screen readers? I recommend you use it. Zhengdu Screen Reader
http://www.zd.hk/download-down-type-zdsr_pw.htm
Maybe you will have new discoveries and hope that NVDA will have a new breakthrough.
thank

@leonardder

This comment has been minimized.

Copy link
Collaborator

commented Apr 26, 2019

Is there a rough idea about how to convert old rates to the new rates? Imagine someone has his rate set to 100%, which isn't quite imaginary. If he'd update to a version of NVDA with this patch, the new rate would overwhelm that person and he might not even been able to control his machine due to the enormous increase of the speech rate.
Probably the easiest fix would be enforcing a rate of 50% upon everyone who once changed the rate manually, giving a warning in the release notes about why this happens.

@leonardder leonardder force-pushed the jcsteh:i7498OcSpeechOptions branch from 255a91b to 967d182 Apr 26, 2019

@leonardder

This comment has been minimized.

Copy link
Collaborator

commented Apr 26, 2019

Ough, I wanted to be so kind to at least merge master for now, but github decided that I pushed 33 commits instead. The only thing I did was a merge of master, fixing some merge conflicts. I therefore reverted the merge of master, so everything should be back into its old state now.

@ruifontes

This comment has been minimized.

Copy link

commented Apr 26, 2019

jcsteh and others added some commits Mar 19, 2018

OneCore voices: Use new SpeechSynthesizerOptions properties to set th…
…e base pitch, volume and rate.

Previously, we used SSML in every utterance to set the base value of parameters, since there was no other way.
However, Windows 10 Fall Creators Update introduced new properties in the SpeechSynthesizerOptions class to set these parameters.
Create a supportsProsodyOptions property on the synthesizer class tha…
…t once fetches the support from NVDAHelper, and then saves it for the lifetime of the object

@leonardder leonardder force-pushed the jcsteh:i7498OcSpeechOptions branch from 967d182 to 23c432f Apr 26, 2019

@leonardder

This comment has been minimized.

Copy link
Collaborator

commented Apr 26, 2019

I just rebased on master and pushed four additional commits which will add support for the old and new OneCore behavior side by side, i.e. for older windows versions which don't support prosody options, the old behaviour of pitch, volume and rate controls will apply.
There is still a major difference between the two implementations. I guess there's not much we could do about this.

@jcsteh: Is it actually ttrue that we can extend the rate range even fore by using both prosody options and SSML?

@jcsteh

This comment has been minimized.

Copy link
Contributor Author

commented Apr 30, 2019

@leonardder

This comment has been minimized.

Copy link
Collaborator

commented Apr 30, 2019

Interestingly, I've never come across documentation saying that you can combine the API and SSML to go beyond the API's maximum. I wonder if that documentation is new or if I just missed it? Regardless, that really doesn't make a huge amount of sense; min should be min and max should be max.

From the remarks section on https://docs.microsoft.com/en-us/uwp/api/windows.media.speechsynthesis.speechsynthesizeroptions.speakingrate#Windows_Media_SpeechSynthesis_SpeechSynthesizerOptions_SpeakingRate:

If Speech Synthesis Markup Language (SSML) is used, SpeakingRate is combined with any prosody tags in the markup.

I agree with your last point.

So you're certain that Narrator can go faster than we can if we only use the new API?

I had another test and it seems I was wrong in that. Narrators pitch and rate range are equal to the new implementation, without ssml. So you might be correct that the higher ranges that can be accomplished with SSML are unintentional.

I guess we could use the new API to specify the default value (1.0?) as a base, then use higher values for the boost setting.

I like this idea and will play with it a bit. I think it is best to make rate bost a check box in this case, just as with espeak. Just to make sure, I will again abandon SSML altogether for the new implementation.

@leonardder

This comment has been minimized.

Copy link
Collaborator

commented May 2, 2019

I think this is ready for review. @jcsteh: May be you could have a quick look at what I changed since you worked on this?

@feerrenrut
Copy link
Contributor

left a comment

Can you make sure that the description on this PR is correct please.

source/synthDriverHandler.py Outdated Show resolved Hide resolved
@leonardder

This comment has been minimized.

Copy link
Collaborator

commented May 9, 2019

@feerrenrut: your commen tis addressed. I was actually pretty sure that I properly updated the description. Is there something you're missing in it?

@feerrenrut

This comment has been minimized.

Copy link
Contributor

commented May 9, 2019

Is there something you're missing in it?

Not necessarily, I just wanted you to check to make sure it is still accurate. Though now I'm thinking about it, I would like it to be explicit about whether users will need to adjust their config after this change, and if so, in what cases.

I'm also holding off on merging it because I would @michaelDCurran to have a look at it too.

@michaelDCurran michaelDCurran changed the title OneCore voices: Use new SpeechSynthesizerOptions properties to set pitch, volume, rate and silence lengths OneCore voices: Use new SpeechSynthesizerOptions properties to set pitch, volume and rate lengths May 10, 2019

@michaelDCurran

This comment has been minimized.

Copy link
Contributor

commented May 10, 2019

There is a chance that if the user had Slowed down their speech using the Windows speech settings, and then sped it up again in NVDA's speech settings, that now their speech is going to be a bit faster. However, as we know they will not have rate boost on, 100% even with the new code (without rate boost) is not so fast that they wouldn't be able to slow it down again. It may not be entirely comforttable, but it won't be completely mis-understandable for anyone I believe. Plus, I'd say it is a rather small subset of user who slowed down in windows Speech settings and then sped back up again with NvDA speech settings. If we are really worried we should refuse to load the user's previously configured rate the first time. But my feeling is we could get this as far as a beta and see if anyone at all is affected.

@leonardder

This comment has been minimized.

Copy link
Collaborator

commented May 10, 2019

@leonardder

This comment has been minimized.

Copy link
Collaborator

commented May 10, 2019

Is there something you're missing in it?

Not necessarily, I just wanted you to check to make sure it is still accurate. Though now I'm thinking about it, I would like it to be explicit about whether users will need to adjust their config after this change, and if so, in what cases.

I've tried to address these things in the initial description.

@feerrenrut
Copy link
Contributor

left a comment

Thanks @leonardder

@feerrenrut feerrenrut merged commit 9ebb356 into nvaccess:master May 10, 2019

1 check passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details

@nvaccessAuto nvaccessAuto added this to the 2019.2 milestone May 10, 2019

@leonardder

This comment has been minimized.

Copy link
Collaborator

commented May 10, 2019

I just realised that when the rate is set to 0, rate boost makes no difference because it only changes the max rate, not the min rate. I don't consider this a problem myself, but we could change the min rate for rate boost to be equal to the max rate when rate boost is off. This is pretty trivial.

@michaelDCurran

This comment has been minimized.

Copy link
Contributor

commented May 11, 2019

Got this comment directly to me on Twitter:
"newest changes for Windows one core voices implemented in the NVDA alfa releases do not stay saved as soon as you switch to another synth, and go back to the One core voices, the synth loads but with its default values for pitch and rate."
I have not confirmed this yet.

@leonardder

This comment has been minimized.

Copy link
Collaborator

commented May 11, 2019

I can't reproduce this on Windows 10 may 2019 update.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
10 participants
You can’t perform that action at this time.