New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Edward voice to NVDA #8177
Add Edward voice to NVDA #8177
Conversation
This implementation is clean enough to show improvements to espeak's klatt sound, but there are still some buffering issues at certain speeds.
…least 1 sample long. Sometimes after subtracting fades it can be 0 or negative!
… to its fade in, don't bother playing that frame at all. Previously its length was forced to 1 ms, but this would cause a jump from the first frame to the second frame if there was a fade out. And instead of submitting a null frame for the fade out, submit the same second frame but with volume of 0, so that even if we didn't play the second frame, at least the fade out would fade towards it.
… to 0 at the end of audio. preFormantGain was not enough as resonators can produce tails. Stops apparent clipping at the end of some utterances.
…sals sound perhaps a little less... nasally?
…was what was in the survey
…g where no klatt output could be heard (instead just espeak consonants) when starting NVDA or sometimes a little way through, due to an uninitialized variable.
…utput (a voiced consonant), decrease the output gain of the klatt frames to be queued by a factor of 5. this is similar to how speechPlayer did it and helps to stop perceiving voiced consonants as being too long (e.g. language)
…e offset and max values. Stops noise junk at the end of words such as 'az' when at slow speeds.
… there is a waveFile that is being / was being mixed. Previously the volume went straight back up as soon as the end of the wave file was reached. This change slightly improves the grunt heard at the end of voiced consonants at high rates.
…consonants a little.
…layer add-on to use them.
@michaelDCurran: Has it ever been considered to file this upstream for espeak-ng? |
The maintainer of eSpeak-ng wishes to eventually address the underlying
issues in the existing Klatt implementation, and has no interest in
integrating speechPlayer (which is c++) into the project at this stage.
So for the foreseeable future, the best way for people to get Edward is
to integrate speechPlayer in NVDA.
|
@@ -138,8 +141,13 @@ espeakLib=env.SharedLibrary( | |||
# com\comentrypoints.c | |||
# com\ttsengine.cpp | |||
# We do not use the ASYNC compile option in espeak. | |||
speechPlayerCpp, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have been trying to keep this list in alphabetical order, it makes it easier to spot missing entries when updating espeak. Could you move this up, before synthdata.c
. Maybe you could also add a comment to say that loose files should be added in alphabetical order.
nvdaHelper/espeak/speechPlayer.cpp
Outdated
extern unsigned char *out_end; | ||
|
||
speechPlayer_handle_t speechPlayerHandle=NULL; | ||
#define minFadeLength 110 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make minFadeLength a constant rather than a #define
nvdaHelper/espeak/speechPlayer.cpp
Outdated
extern unsigned char *out_ptr; | ||
extern unsigned char *out_end; | ||
|
||
speechPlayer_handle_t speechPlayerHandle=NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use nullptr
instead of NULL
nvdaHelper/espeak/speechPlayer.cpp
Outdated
#define minFadeLength 110 | ||
|
||
inline bool needsMixWaveFile() { | ||
return wdata.n_mix_wavefile>0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is wdata.n_mix_wavefile
a BOOL
? If so, I would prefer to see this explicitly cast to boolean: static_cast<bool>(wdata.n_mix_wavefile)
nvdaHelper/espeak/speechPlayer.cpp
Outdated
return wdata.n_mix_wavefile>0; | ||
} | ||
|
||
unsigned int mixWaveFile(unsigned int maxNumSamples, sample* sampleBuf) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add documentation for this? Specifically, what is the return?
nvdaHelper/espeak/speechPlayer.cpp
Outdated
signed char c=wdata.mix_wavefile[wdata.mix_wavefile_ix+wdata.mix_wavefile_offset]; | ||
val+=(c*256); | ||
} else { | ||
val=(signed char)wdata.mix_wavefile[wdata.mix_wavefile_ix+wdata.mix_wavefile_offset]*wdata.mix_wave_scale; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please explicitly cast with static_cast<signed char>(blah)
you can capture blah
in whatever type it starts in with: auto blah = wdata.mix.....
} | ||
|
||
bool isKlattFrameFollowing() { | ||
for(int i=(wcmdq_head+1)%N_WCMDQ;i!=wcmdq_tail;i=(i+1)%N_WCMDQ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to have some comment to explain what's going on here.
return false; | ||
} | ||
|
||
void fillSpeechPlayerFrame(frame_t * eFrame, speechPlayer_frame_t* spFrame) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
magic numbers in this function.
nvdaHelper/espeak/speechPlayer.cpp
Outdated
speechPlayer_queueFrame(speechPlayerHandle,&spFrame2,minFadeLength/2,minFadeLength/2,-1,false); | ||
spFrame2.outputGain=0; | ||
speechPlayer_queueFrame(speechPlayerHandle,&spFrame2,minFadeLength/2,minFadeLength/2,-1,false); | ||
//speechPlayer_queueFrame(speechPlayerHandle,NULL,1,1,-1,false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this comment
|
||
int Wavegen_Klatt2(int length, int resume, frame_t *fr1, frame_t *fr2){ | ||
if(!resume) { | ||
speechPlayer_frame_t spFrame1={0}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you comment or clarify any of this?
This is expected to not be available in the latest NVDA next as of this writing, right? It would need to be approved first? Sorry I'm a little behind on my terminology. |
@valiant8086 Once you see the incubating label applied, then this will become available in next. |
if(i>=maxNumSamples) break; | ||
int val; | ||
if(wdata.mix_wave_scale==0) { | ||
val=wdata.mix_wavefile[wdata.mix_wavefile_ix+wdata.mix_wavefile_offset]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I should keep this as is, as sometimes mix_wavefile_ix is incremented, and the mix_wavefile_ix+mix_wavefile_offset is used both before and after at times. Surely the compiler would optimize the addition out where it can?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comment was more about trying to improve the clarity of the code, to make it easier to see where the same value is being used, rather than about optimisation.
Is this still on the radar? It looks like last review actions date from like two months ago. |
@michaelDCurran are now all review actions done? |
We have discussed this again. From a product level I think it makes more sense to separate this from espeak. It appearing as it's own synthesizer will make it clearer that it has something different from espeak, and will also help us to separate issues that occur in this / espeak. That said, it would be best if it continued to use the existing espeak submodule, I wouldn't want to duplicate that. |
With the upcoming transition to Python 3, I believe that the add-on is going to break. It's been my primary synth for years so figured it's worth chiming in to say this would be valuable to me. |
@michaelDCurran Is further work on this pr planned in the near future? Some people liked the voice from the addon, and it can no longer be used in 2019.3. |
NV Access has chosen to no longer work on this particular project. Although I did update nvaccess/nvSpeechPlayer to be Python3 compatible and the add-on to be NVDA 2019.3 compatible, there are some breaking changes in eSpeak that specifically cause speechPlayerInEspeak (Edward) to no longer correctly speak voiced consonants (z, v, j, g etc). This broke somewhere in late eSpeak 1.48 or 1.49. I don't personally have the time to look into this anymore. Though of course other members of the community would be most welcome to pick this up and try and further debug and solve the issues. |
Link to issue number:
Closes #7870
Summary of the issue:
The Edward voice from speechPlayerInEspeak has proven extremely popular with people who have had issues with espeak in the past. Many people have asked for Edward to be included in NVDA.
Description of how this pull request fixes the issue:
This PR replaces eSpeak's Klatt implementation with that of nvSpeechPlayer, and includes the Edward variant for use with eSpeak.
Testing performed:
Ran NVDA. Switched to Edward voice. Read some text.
Known issues with pull request:
This changes the sound of all Klatt voices in eSpeak, not just the Edward voice, as speechPlayer now handles generate of all Klatt frames. I don't think this should be a problem as the older Klatt voices in most cases were distorted and had strange artifacts.
Change log entry: