Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Edward voice to NVDA #8177

Closed
wants to merge 53 commits into from
Closed

Add Edward voice to NVDA #8177

wants to merge 53 commits into from

Conversation

michaelDCurran
Copy link
Member

@michaelDCurran michaelDCurran commented Apr 13, 2018

Link to issue number:

Closes #7870

Summary of the issue:

The Edward voice from speechPlayerInEspeak has proven extremely popular with people who have had issues with espeak in the past. Many people have asked for Edward to be included in NVDA.

Description of how this pull request fixes the issue:

This PR replaces eSpeak's Klatt implementation with that of nvSpeechPlayer, and includes the Edward variant for use with eSpeak.

Testing performed:

Ran NVDA. Switched to Edward voice. Read some text.

Known issues with pull request:

This changes the sound of all Klatt voices in eSpeak, not just the Edward voice, as speechPlayer now handles generate of all Klatt frames. I don't think this should be a problem as the older Klatt voices in most cases were distorted and had strange artifacts.

Change log entry:

This implementation is clean enough to show improvements to espeak's klatt sound, but there are still some buffering issues at certain speeds.
…least 1 sample long. Sometimes after subtracting fades it can be 0 or negative!
… to its fade in, don't bother playing that frame at all. Previously its length was forced to 1 ms, but this would cause a jump from the first frame to the second frame if there was a fade out. And instead of submitting a null frame for the fade out, submit the same second frame but with volume of 0, so that even if we didn't play the second frame, at least the fade out would fade towards it.
… to 0 at the end of audio. preFormantGain was not enough as resonators can produce tails. Stops apparent clipping at the end of some utterances.
…sals sound perhaps a little less... nasally?
…g where no klatt output could be heard (instead just espeak consonants) when starting NVDA or sometimes a little way through, due to an uninitialized variable.
…utput (a voiced consonant), decrease the output gain of the klatt frames to be queued by a factor of 5. this is similar to how speechPlayer did it and helps to stop perceiving voiced consonants as being too long (e.g. language)
…e offset and max values. Stops noise junk at the end of words such as 'az' when at slow speeds.
… there is a waveFile that is being / was being mixed. Previously the volume went straight back up as soon as the end of the wave file was reached. This change slightly improves the grunt heard at the end of voiced consonants at high rates.
@LeonarddeR
Copy link
Collaborator

@michaelDCurran: Has it ever been considered to file this upstream for espeak-ng?

@ehollig
Copy link
Collaborator

ehollig commented Apr 13, 2018

This may close #7870 and possibly related to #4578 and #5272

@michaelDCurran
Copy link
Member Author

michaelDCurran commented Apr 13, 2018 via email

@@ -138,8 +141,13 @@ espeakLib=env.SharedLibrary(
# com\comentrypoints.c
# com\ttsengine.cpp
# We do not use the ASYNC compile option in espeak.
speechPlayerCpp,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have been trying to keep this list in alphabetical order, it makes it easier to spot missing entries when updating espeak. Could you move this up, before synthdata.c. Maybe you could also add a comment to say that loose files should be added in alphabetical order.

extern unsigned char *out_end;

speechPlayer_handle_t speechPlayerHandle=NULL;
#define minFadeLength 110
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make minFadeLength a constant rather than a #define

extern unsigned char *out_ptr;
extern unsigned char *out_end;

speechPlayer_handle_t speechPlayerHandle=NULL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use nullptr instead of NULL

#define minFadeLength 110

inline bool needsMixWaveFile() {
return wdata.n_mix_wavefile>0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is wdata.n_mix_wavefile a BOOL? If so, I would prefer to see this explicitly cast to boolean: static_cast<bool>(wdata.n_mix_wavefile)

return wdata.n_mix_wavefile>0;
}

unsigned int mixWaveFile(unsigned int maxNumSamples, sample* sampleBuf) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add documentation for this? Specifically, what is the return?

signed char c=wdata.mix_wavefile[wdata.mix_wavefile_ix+wdata.mix_wavefile_offset];
val+=(c*256);
} else {
val=(signed char)wdata.mix_wavefile[wdata.mix_wavefile_ix+wdata.mix_wavefile_offset]*wdata.mix_wave_scale;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please explicitly cast with static_cast<signed char>(blah) you can capture blah in whatever type it starts in with: auto blah = wdata.mix.....

}

bool isKlattFrameFollowing() {
for(int i=(wcmdq_head+1)%N_WCMDQ;i!=wcmdq_tail;i=(i+1)%N_WCMDQ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to have some comment to explain what's going on here.

return false;
}

void fillSpeechPlayerFrame(frame_t * eFrame, speechPlayer_frame_t* spFrame) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

magic numbers in this function.

speechPlayer_queueFrame(speechPlayerHandle,&spFrame2,minFadeLength/2,minFadeLength/2,-1,false);
spFrame2.outputGain=0;
speechPlayer_queueFrame(speechPlayerHandle,&spFrame2,minFadeLength/2,minFadeLength/2,-1,false);
//speechPlayer_queueFrame(speechPlayerHandle,NULL,1,1,-1,false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this comment


int Wavegen_Klatt2(int length, int resume, frame_t *fr1, frame_t *fr2){
if(!resume) {
speechPlayer_frame_t spFrame1={0};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you comment or clarify any of this?

@valiant8086
Copy link

This is expected to not be available in the latest NVDA next as of this writing, right? It would need to be approved first? Sorry I'm a little behind on my terminology.

@feerrenrut
Copy link
Contributor

@valiant8086 Once you see the incubating label applied, then this will become available in next.

if(i>=maxNumSamples) break;
int val;
if(wdata.mix_wave_scale==0) {
val=wdata.mix_wavefile[wdata.mix_wavefile_ix+wdata.mix_wavefile_offset];
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I should keep this as is, as sometimes mix_wavefile_ix is incremented, and the mix_wavefile_ix+mix_wavefile_offset is used both before and after at times. Surely the compiler would optimize the addition out where it can?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment was more about trying to improve the clarity of the code, to make it easier to see where the same value is being used, rather than about optimisation.

@LeonarddeR
Copy link
Collaborator

Is this still on the radar? It looks like last review actions date from like two months ago.

@Adriani90
Copy link
Collaborator

@michaelDCurran are now all review actions done?

@feerrenrut
Copy link
Contributor

We have discussed this again. From a product level I think it makes more sense to separate this from espeak. It appearing as it's own synthesizer will make it clearer that it has something different from espeak, and will also help us to separate issues that occur in this / espeak. That said, it would be best if it continued to use the existing espeak submodule, I wouldn't want to duplicate that.

@ahicks92
Copy link
Contributor

ahicks92 commented Jan 8, 2020

With the upcoming transition to Python 3, I believe that the add-on is going to break. It's been my primary synth for years so figured it's worth chiming in to say this would be valuable to me.

@lukaszgo1
Copy link
Contributor

@michaelDCurran Is further work on this pr planned in the near future? Some people liked the voice from the addon, and it can no longer be used in 2019.3.

@michaelDCurran
Copy link
Member Author

NV Access has chosen to no longer work on this particular project. Although I did update nvaccess/nvSpeechPlayer to be Python3 compatible and the add-on to be NVDA 2019.3 compatible, there are some breaking changes in eSpeak that specifically cause speechPlayerInEspeak (Edward) to no longer correctly speak voiced consonants (z, v, j, g etc). This broke somewhere in late eSpeak 1.48 or 1.49. I don't personally have the time to look into this anymore. Though of course other members of the community would be most welcome to pick this up and try and further debug and solve the issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Include Edward voice from NV Speechplayer with NVDA for Windows 10s
8 participants