Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent compile-phonemes use of phontab, en_dict #930

Merged
merged 2 commits into from May 17, 2021
Merged

Prevent compile-phonemes use of phontab, en_dict #930

merged 2 commits into from May 17, 2021

Conversation

jbowler
Copy link
Contributor

@jbowler jbowler commented Apr 29, 2021

espeak-ng --compile-phonemes generates phontab, but during the
generation it starts by reading the old phontab. If works just fine,
apart from an error message, if this doesn't exist but if it does exist
it is not clear whether it might end up using old or bad data (e.g.
possibly crashing if phontab is corrupted making it very non-obvious how
to fix the problem...)

The same applies to en_dict as a result of espeak-ng defaulting to the
voice 'en'; again the phontab build works if en_dict is absent but a
spurious message is output. It's not clear what would happen if someone
tried to build a reduced espeak-ng configuration with just one (non
'en') language.

The fix is to change espeak_ng_CompilePhonemeDataPath to call LoadVoice
with a new flag to indicate the voice structure is just required to
compile phoneme data (I changed the error message as well to make it
clear where it comes from - there are two identical error messages, the
other one is in voices.c)

Within LoadVoice the flag causes LoadVoice to not set a default language
or voicename and to skip loading the old phontab or dictionary (both of
which fail in a git clean -dfx build).

I checked the result by doing a compile from-scratch build sequence with
and without the patch, i.e. I did:

git clean -dfx
./autogen.sh
./configure --prefix=a-test-directory
make
make install

Then I compared the two copies of "a-test-directory" as well as the log
file output:

$ diff -r logs/*
diff -r logs/original/make.log logs/patched/make.log
175,176d174
< Unknown phoneme table: 'en'
< Can't read dictionary file:
'/home/jbowler/src/espeak-ng/master/espeak-ng-data/en_dict'

$ diff --no-dereference -r local.original local.patched
Binary files local.original/bin/speak-ng and local.patched/bin/speak-ng differ
Binary files local.original/lib/libespeak-ng.a and local.patched/lib/libespeak-ng.a differ
Binary files local.original/lib/libespeak-ng.so.1.1.51 and local.patched/lib/libespeak-ng.so.1.1.51 differ

So the change has not altered phontab or any of the other generated
files, only the binaries as would be expected. (Note that I am
comparing truely clean builds; what you get if you download the source
and build from scratch. This is the standard approach for people who
build for distribution.)

Signed-off-by: John Bowler jbowler@acm.org

espeak-ng --compile-phonemes generates phontab, but during the
generation it starts by reading the old phontab.  If works just fine,
apart from an error message, if this doesn't exist but if it does exist
it is not clear whether it might end up using old or bad data (e.g.
possibly crashing if phontab is corrupted making it very non-obvious how
to fix the problem...)

The same applies to en_dict as a result of espeak-ng defaulting to the
voice 'en'; again the phontab build works if en_dict is absent but a
spurious message is output.  It's not clear what would happen if someone
tried to build a reduced espeak-ng configuration with just one (non
'en') language.

The fix is to change espeak_ng_CompilePhonemeDataPath to call LoadVoice
with a new flag to indicate the voice structure is just required to
compile phoneme data (I changed the error message as well to make it
clear where it comes from - there are two identical error messages, the
other one is in voices.c)

Within LoadVoice the flag causes LoadVoice to not set a default language
or voicename and to skip loading the old phontab or dictionary (both of
which fail in a git clean -dfx build).

I checked the result by doing a compile from-scratch build sequence with
and without the patch, i.e. I did:

git clean -dfx
./autogen.sh
./configure --prefix=a-test-directory
make
make install

Then I compared the two copies of "a-test-directory" as well as the log
file output:

$ diff -r logs/*
diff -r logs/original/make.log logs/patched/make.log
175,176d174
< Unknown phoneme table: 'en'
< Can't read dictionary file:
'/home/jbowler/src/espeak-ng/master/espeak-ng-data/en_dict'

$ diff --no-dereference -r local.original local.patched
Binary files local.original/bin/speak-ng and local.patched/bin/speak-ng differ
Binary files local.original/lib/libespeak-ng.a and local.patched/lib/libespeak-ng.a differ
Binary files local.original/lib/libespeak-ng.so.1.1.51 and local.patched/lib/libespeak-ng.so.1.1.51 differ

So the change has not altered phontab or any of the other generated
files, only the binaries as would be expected.  (Note that I am
comparing truely clean builds; what you get if you download the source
and build from scratch.  This is the standard approach for people who
build for distribution.)

Signed-off-by: John Bowler <jbowler@acm.org>
@jbowler
Copy link
Contributor Author

jbowler commented Apr 29, 2021

This is the fix for PR #928. A similar bug exists in the dictionary creation with a clean build. By "clean" I mean freshly cloned or git clean -dfx, not make clean or even make distclean, which doesn't work properly. I can't fix that bug in the same way, but I can submit a patch to work round it in Makefile.am

And simplify the _dict pattern rule.  If a build had completed in the
directory before and "make distclean" had not been run ??_dict
dictionaries still existed, if one of these was rebuilt espeak-ng
loaded the old (out of date) one first resulting in inconsistent
execution of the build the second time round.  The change removes the
target before building it thus ensuring that old, possibly damaged, data
is not used.

I also changed the extraction of the language code; the GNU make %
(pattern) extension along with the long-standing $* extension (which
precedes GNU make) allow the match to the '%' to be used directly in the
command line.  The cd .. at the end of the command is unnecessary; make
(all versions) execute each command line using a single system() call,
so the cd never happens inside make.

Signed-off-by: John Bowler <jbowler@acm.org>
@jbowler
Copy link
Contributor Author

jbowler commented Apr 29, 2021

Ok... git has merged the change for the dictionary creation issue into this pull request, but the two commits are entirely independent.

@valdisvi valdisvi merged commit 83f9312 into espeak-ng:master May 17, 2021
@valdisvi
Copy link
Member

Thanks for contribution! Your pull request is merged into project.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants