Set number handling on by default #888

Merged
merged 3 commits into from
Mar 12, 2021

@jaacoppi jaacoppi commented Mar 4, 2021

Can you check my findings and assumptions? This commit changes the behavior of many languages.

I was instructing someone who is currently adding a new language. They had problems getting numbers working. Nothing in _list was processed. The reason is that number processing is disabled by default. I think it should be on by default so adding a new language is easier. Also see the reasoning in the commit message.

There can be errors if the number definitions are incomplete. The solution would be to fix the number handling code instead of disabling it by default.

I don't speak most of the languages affected here. I used a combination of Google Translate and manually reading the diffs and _list files to figure it out.

If we don't want to change the default behavior, we should turn on number processing for the languages that benefit from it and improve the documentation in docs/add_language.md so that contributors realize they need to enable number processing.

docs: add details about number flags to the documentation.

It's clearly intended to be enabled by default:
- it's defined as default behaviour in translate.h (NUM_DEFAULT)
- tr_languages.c sets many default values related to number processing
  that have no meaning unless langopts.numbers == 1.

It is also a more sensible default since most languages will want to
have number processing on. This makes adding new languages easier
because adding an entry to tr_languages.c is unnecessary.

A negative side effect is that languages with partial number defines
might experience bugs when reading undefined numbers. This is a bug and
should be fixed.

This will have the side effect of enabling number processing for
languages that currently have it disabled. However, there shouldn't be
any.

Here's a way to check affected languages:
for voice in $(ESPEAK_DATA_PATH=`pwd` LD_LIBRARY_PATH=src:${LD_LIBRARY_PATH} \
    src/espeak-ng --voices | grep -v Languages | awk '{print $2}'); do
  OUTPUT=$(ESPEAK_DATA_PATH=`pwd` LD_LIBRARY_PATH=src:${LD_LIBRARY_PATH} \
    src/espeak-ng -qx -v $voice "1 - 2 - 3 - 12 - 123") && echo "$voice: $OUTPUT"
done

These voices clearly benefit from enabling numbers (they already have
number rules in *_list):
ba, cmn (zh), hak, haw, ja, kok, nb, nci

Some languages are missing definitions (such as _12) in their _list files,
which causes the program to skip some numbers.
Number processing needs to be turned off explicitly for:
jbo, mi, my, piqd, py, qu, quc, th, uz
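
One way to spot such gaps before enabling numbers is to check a _list file for the entries a language is expected to define. The following is a minimal sketch, not part of this commit: the helper name `missing_number_entries` is hypothetical, and it assumes each number entry starts a line and is followed by whitespace.

```shell
# Hypothetical helper: print every requested number entry (e.g. _1 _2 _12)
# that has no definition line in the given *_list file.
# Assumption: a definition looks like "<entry><whitespace><phonemes>".
missing_number_entries() {
    list_file=$1
    shift
    for entry in "$@"; do
        # The trailing whitespace class keeps "_1" from matching "_12".
        if ! grep -q "^${entry}[[:space:]]" "$list_file"; then
            printf '%s\n' "$entry"
        fi
    done
}
```

For example, `missing_number_entries path/to/xx_list _1 _2 _3 _12` (with a placeholder path) would print `_12` if that entry is absent, which matches the kind of gap described above.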

Languages with no number rules at all:
chr, cv, he, nog, tk, ug

valdisvi commented Mar 8, 2021

I think this is OK. In the future, this and similar settings should be set in the ../espeak-ng-data/lang/.. configuration files, but this change looks like a step in that direction.

@jaacoppi jaacoppi merged commit 4c3fe18 into espeak-ng:master Mar 12, 2021