Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OneCore voices: Use new SpeechSynthesizerOptions properties to set pitch, volume and rate lengths #8934

Merged
merged 17 commits into from May 10, 2019

Conversation

@jcsteh
Copy link
Contributor

@jcsteh jcsteh commented Nov 12, 2018

Link to issue number:

Fixes #7498.

Summary of the issue:

For NVDA's OneCore voices support:

  1. The rate setting is affected by the rate setting in Windows Speech Settings.
  2. The pitch range is very limited (compared with Narrator).

Description of how this pull request fixes the issue:

This uses new SpeechSynthesizerOptions properties introduced in recent updates of Windows 10 to set these values.

Changes by @leonardder since @jcsteh worked on this:

  1. Added rate boost to the synthesizer settings ring. I found this most helpful to test things, and imagined that users might also like this.
  2. For older versions of Windows 10 this driver now behaves equally to how current master behaves.
  3. Added rate boost to the OneCore driver. This is disabled by default so with this pr, speech shouldnt become understandable. The only case where the rate will differ between master and this pr will be when someone changed the rate in the Windows 10 speech settings. See the second point in the known issues section.

Testing performed:

  1. Set the rate to 0 in Windows Speech Settings, tested with NVDA, set the rate to 100 in Windows Speech Settings, tested with NVDA. Observed that NVDA's rate was not affected.
  2. To simulate older versions of Windows 10, built the changes with lower values for each of the Windows.Foundation.UniversalApiContract minimum version checks. Confirmed that the driver works and that volume, rate and pitch settings behave as in the old situation on unsupported versions (as expected). (I don't have systems with older versions of Windows, so I can't confirm this in a real world setting.)
  3. By hearing, compared master with this pull request with regard to rate range, and made sure that the rate rang hasn't changed significantly.

Known issues with pull request:

  1. Even though we;re fairly certain this should work on older versions of Windows 10, this hasn't been tested on actual systems with older versions of Windows 10.

  2. @michaelDCurran wrote in #8934 (comment)

    There is a chance that if the user had Slowed down their speech using the Windows speech settings, and then sped it up again in NVDA's speech settings, that now their speech is going to be a bit faster. However, as we know they will not have rate boost on, 100% even with the new code (without rate boost) is not so fast that they wouldn't be able to slow it down again. It may not be entirely comforttable, but it won't be completely mis-understandable for anyone I believe. Plus, I'd say it is a rather small subset of user who slowed down in windows Speech settings and then sped back up again with NvDA speech settings. If we are really worried we should refuse to load the user's previously configured rate the first time. But my feeling is we could get this as far as a beta and see if anyone at all is affected.

Change log entry:

New features:

- For the Windows OneCore voices and eSpeak NG speech synthesizers, you can now enable rate boost using the synthesizer settings ring.

Bug fixes:

- For Windows OneCore voices, the rate set in NVDA is no longer affected by the rate set in Windows 10 Speech Settings.

Changes:

- Greater rate and pitch ranges are now supported for Windows OneCore voices. To increase the rate range, you can enable the new rate boost setting.

You may wish to move some of these into the New Features section. I'll leave that up to you.

@jcsteh jcsteh requested a review from michaelDCurran Nov 12, 2018
@josephsl
Copy link
Collaborator

@josephsl josephsl commented Nov 12, 2018

@michaelDCurran
Copy link
Member

@michaelDCurran michaelDCurran commented Nov 12, 2018

I'm totally in support of turning off appendSilence. This makes general navigation around Windows much much better with Onecore.

However, turning off punctuation silence in some cases (such as after a comma) seems to not leave enough gap. Though this is different for each voice it seems. Though quickly testing this, I am finding it hard to get used to with Microsoft David for instance. There's a lot of: "Let's eat Grandma" rather than "Let's eat, Grandma" going on.
Perhaps we could keep punctuation silence on, and allow the user to turn it off in NvDA settings. This would be nicer if Microsoft had let punctuationSilence actually take a duration.

Secondly, I am worried that some users may end up with an unusable NVDA as the rate will now suddenly be too fast. For instance, if their Windows rate was set to a slow value, and they had boosted their NVDA rate up high to compensate, now their NVDA rate will be super fast. True for an expert user the rate would not be that fast, but for a beginner user, it could be too much.

And finally, there is of course the issue of the new rate code not being supported on older versions of Windows 10. I wasn't too worried about this before as people should keep their Windows up to date. However, added to the above points, I think it worth noting for future conversation on this PR.

I'd really love to take the appendSilence code for 2018.4 at very least.

@leonardder
Copy link
Collaborator

@leonardder leonardder commented Nov 12, 2018

And finally, there is of course the issue of the new rate code not being supported on older versions of Windows 10. I wasn't too worried about this before as people should keep their Windows up to date.

Note that Windows server 2016 is stuck at a particular equivalent build of Windows 10, so server 2016 won't support the new rate code. Server 2019 probably will.

@josephsl
Copy link
Collaborator

@josephsl josephsl commented Nov 12, 2018

@jcsteh
Copy link
Contributor Author

@jcsteh jcsteh commented Nov 12, 2018

@leonardder
Copy link
Collaborator

@leonardder leonardder commented Dec 12, 2018

@jcsteh: Would you be able to resolve the merge conflicts?

@lukaszgo1
Copy link
Contributor

@lukaszgo1 lukaszgo1 commented Dec 15, 2018

I've tested this pr on Windows 10 builds 10240, 14393 and 17134.On version 1803 everything worked as expected, but on those two earlier versions the only setting which could be changed for OneCore was voice. Is it expected? I do not believe, that regressing things in this way is a good move. The versions of Win 10 which I've chosen are LTSB builds, so they would be used by enterprise customers for some time. Furthermore blindness specific market is usually slow at updating not only due to fear of something breaking, but also, because people are often using older versions of commercial screen readers and magnifiers, which wouldn'n work with the recent Windows 10 updates. Given this commend by @jcsteh it should be possible to use proper code depending of Windows 10 version in use. Ah, and one more thing, the read me should be updated to mention, that when installing Visual Studio the required version of SDK is now 17763.

@josephsl
Copy link
Collaborator

@josephsl josephsl commented Dec 15, 2018

@zstanecic
Copy link
Contributor

@zstanecic zstanecic commented Dec 15, 2018

@leonardder
Copy link
Collaborator

@leonardder leonardder commented Jan 5, 2019

Also note that Windows Server 2016 is based on an older build of Windows 10 which won't have this functionality and thus would also suffer from this.

@josephsl
Copy link
Collaborator

@josephsl josephsl commented Jan 5, 2019

@leonardder
Copy link
Collaborator

@leonardder leonardder commented Apr 25, 2019

@jcsteh: Is this subject to conflict with the new speech framework? In that case, I guess it is up to @michaelDCurran or @feerrenrut to decide whether we want either the framework or this to be merged first.

I'm happy to take the merge conflicts and review actions if that helps you, as long as you could possibly follow the progress and review where needed.

@michaelDCurran
Copy link
Member

@michaelDCurran michaelDCurran commented Apr 25, 2019

@jcsteh
Copy link
Contributor Author

@jcsteh jcsteh commented Apr 25, 2019

@xingkong0113

This comment was marked as off-topic.

@leonardder
Copy link
Collaborator

@leonardder leonardder commented Apr 26, 2019

Is there a rough idea about how to convert old rates to the new rates? Imagine someone has his rate set to 100%, which isn't quite imaginary. If he'd update to a version of NVDA with this patch, the new rate would overwhelm that person and he might not even been able to control his machine due to the enormous increase of the speech rate.
Probably the easiest fix would be enforcing a rate of 50% upon everyone who once changed the rate manually, giving a warning in the release notes about why this happens.

@leonardder leonardder force-pushed the i7498OcSpeechOptions branch from 255a91b to 967d182 Apr 26, 2019
@leonardder
Copy link
Collaborator

@leonardder leonardder commented Apr 26, 2019

Ough, I wanted to be so kind to at least merge master for now, but github decided that I pushed 33 commits instead. The only thing I did was a merge of master, fixing some merge conflicts. I therefore reverted the merge of master, so everything should be back into its old state now.

@ruifontes
Copy link
Contributor

@ruifontes ruifontes commented Apr 26, 2019

jcsteh and others added 6 commits Apr 26, 2019
…e base pitch, volume and rate.

Previously, we used SSML in every utterance to set the base value of parameters, since there was no other way.
However, Windows 10 Fall Creators Update introduced new properties in the SpeechSynthesizerOptions class to set these parameters.
…t once fetches the support from NVDAHelper, and then saves it for the lifetime of the object
@leonardder leonardder force-pushed the i7498OcSpeechOptions branch from 967d182 to 23c432f Apr 26, 2019
@leonardder
Copy link
Collaborator

@leonardder leonardder commented Apr 26, 2019

I just rebased on master and pushed four additional commits which will add support for the old and new OneCore behavior side by side, i.e. for older windows versions which don't support prosody options, the old behaviour of pitch, volume and rate controls will apply.
There is still a major difference between the two implementations. I guess there's not much we could do about this.

@jcsteh: Is it actually ttrue that we can extend the rate range even fore by using both prosody options and SSML?

@leonardder
Copy link
Collaborator

@leonardder leonardder commented Apr 30, 2019

This is all very confusing. From what I was able to determine last time I tested, narrator's max rate matched NVDA's max rate with the new code. That didn't seem to be the case in an older build I tested when this new API was first introduced, but it was when I implemented this PR.

I pushed a new commit that easily allows combining the parameters using a python console, like this:

  1. IN a python console, get the synth instance: speech.getSynth()
  2. Set the _rate, _volume and _pitch attributes on the instance. Setting them to None will disable sending them to the synth using ssml.

From what I was able to determine this way, setting pitch to 100% (new method) combining with setting _pitch to 100 (old method) gives the highest pitch. Same applies to 0% and 0, respectively, as well as to setting the rate.

The problem with having rate managed primarily by SSML is that this makes NVDA's rate dependent on the rate in Windows Settings, which is super confusing.

May be I didn't make this clear enough, but this wasn't my intention. I intended to use the new code, but setting the rate to a fixed value (1.0, default). As I've been able to test, this makes the synth insensitive to changes to Windows' rate.

Ideally, we want it to be independent, which is the case if we use the new API. I guess we could use the new API to specify the default value (1.0?) as a base, then use higher values for the boost setting.

I think I like that option. As noted above, not using SSML won't give us the full range. It actually means that people who have set their windows rate to 100% won't be able to have their rate as fast with the new code as they could have it with the old.

I've actually tested this with Windows 10 1903. May be that makes a difference here.

@jcsteh
Copy link
Contributor Author

@jcsteh jcsteh commented Apr 30, 2019

@leonardder
Copy link
Collaborator

@leonardder leonardder commented Apr 30, 2019

Interestingly, I've never come across documentation saying that you can combine the API and SSML to go beyond the API's maximum. I wonder if that documentation is new or if I just missed it? Regardless, that really doesn't make a huge amount of sense; min should be min and max should be max.

From the remarks section on https://docs.microsoft.com/en-us/uwp/api/windows.media.speechsynthesis.speechsynthesizeroptions.speakingrate#Windows_Media_SpeechSynthesis_SpeechSynthesizerOptions_SpeakingRate:

If Speech Synthesis Markup Language (SSML) is used, SpeakingRate is combined with any prosody tags in the markup.

I agree with your last point.

So you're certain that Narrator can go faster than we can if we only use the new API?

I had another test and it seems I was wrong in that. Narrators pitch and rate range are equal to the new implementation, without ssml. So you might be correct that the higher ranges that can be accomplished with SSML are unintentional.

I guess we could use the new API to specify the default value (1.0?) as a base, then use higher values for the boost setting.

I like this idea and will play with it a bit. I think it is best to make rate bost a check box in this case, just as with espeak. Just to make sure, I will again abandon SSML altogether for the new implementation.

@leonardder
Copy link
Collaborator

@leonardder leonardder commented May 2, 2019

I think this is ready for review. @jcsteh: May be you could have a quick look at what I changed since you worked on this?

Copy link
Member

@feerrenrut feerrenrut left a comment

Can you make sure that the description on this PR is correct please.

source/synthDriverHandler.py Outdated Show resolved Hide resolved
@leonardder
Copy link
Collaborator

@leonardder leonardder commented May 9, 2019

@feerrenrut: your commen tis addressed. I was actually pretty sure that I properly updated the description. Is there something you're missing in it?

@feerrenrut
Copy link
Member

@feerrenrut feerrenrut commented May 9, 2019

Is there something you're missing in it?

Not necessarily, I just wanted you to check to make sure it is still accurate. Though now I'm thinking about it, I would like it to be explicit about whether users will need to adjust their config after this change, and if so, in what cases.

I'm also holding off on merging it because I would @michaelDCurran to have a look at it too.

@michaelDCurran michaelDCurran changed the title OneCore voices: Use new SpeechSynthesizerOptions properties to set pitch, volume, rate and silence lengths OneCore voices: Use new SpeechSynthesizerOptions properties to set pitch, volume and rate lengths May 10, 2019
@michaelDCurran
Copy link
Member

@michaelDCurran michaelDCurran commented May 10, 2019

There is a chance that if the user had Slowed down their speech using the Windows speech settings, and then sped it up again in NVDA's speech settings, that now their speech is going to be a bit faster. However, as we know they will not have rate boost on, 100% even with the new code (without rate boost) is not so fast that they wouldn't be able to slow it down again. It may not be entirely comforttable, but it won't be completely mis-understandable for anyone I believe. Plus, I'd say it is a rather small subset of user who slowed down in windows Speech settings and then sped back up again with NvDA speech settings. If we are really worried we should refuse to load the user's previously configured rate the first time. But my feeling is we could get this as far as a beta and see if anyone at all is affected.

@leonardder
Copy link
Collaborator

@leonardder leonardder commented May 10, 2019

@leonardder
Copy link
Collaborator

@leonardder leonardder commented May 10, 2019

Is there something you're missing in it?

Not necessarily, I just wanted you to check to make sure it is still accurate. Though now I'm thinking about it, I would like it to be explicit about whether users will need to adjust their config after this change, and if so, in what cases.

I've tried to address these things in the initial description.

Copy link
Member

@feerrenrut feerrenrut left a comment

Thanks @leonardder

@feerrenrut feerrenrut merged commit 9ebb356 into nvaccess:master May 10, 2019
1 check passed
@nvaccessAuto nvaccessAuto added this to the 2019.2 milestone May 10, 2019
@leonardder
Copy link
Collaborator

@leonardder leonardder commented May 10, 2019

I just realised that when the rate is set to 0, rate boost makes no difference because it only changes the max rate, not the min rate. I don't consider this a problem myself, but we could change the min rate for rate boost to be equal to the max rate when rate boost is off. This is pretty trivial.

@michaelDCurran
Copy link
Member

@michaelDCurran michaelDCurran commented May 11, 2019

Got this comment directly to me on Twitter:
"newest changes for Windows one core voices implemented in the NVDA alfa releases do not stay saved as soon as you switch to another synth, and go back to the One core voices, the synth loads but with its default values for pitch and rate."
I have not confirmed this yet.

@leonardder
Copy link
Collaborator

@leonardder leonardder commented May 11, 2019

I can't reproduce this on Windows 10 may 2019 update.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

10 participants