German: Espeak is unable to speak grouped numbers #5235

Closed
nvaccessAuto opened this issue Jul 20, 2015 · 28 comments

@nvaccessAuto

Reported by bdorer on 2015-07-20 18:56
This is reproducible with numbers like 1.000.000, 2.000.000 and so on. eSpeak ignores all groups of 0 after the thousands.

I don't know whether this is eSpeak's fault, as it is reproducible with eSpeak's SAPI5 version and SVOX Pico. I'll test with Vocalizer and report.

@nvaccessAuto

Comment 1 by The_Dark_Man on 2015-07-23 10:04
I think this bug is only in the German version of eSpeak. In English, they write 2,000,000.

@nvaccessAuto

Comment 2 by jteh on 2015-08-07 11:31
This isn't a bug in eSpeak. It's a problem in the NVDA symbols.dic for German. A complex symbol rule probably needs to be added for the thousands separator. This isn't necessary for English because the comma is usually passed through to the synth unchanged. The date separator in the German symbols is also causing problems here, though you might be able to get around this by making sure the thousands separator rule is above it so it takes precedence.
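
The precedence point can be sketched in plain Python. This is only an illustration, assuming that rules are tried in file order and the first one that matches wins, which is what the comment implies. The date pattern below is hypothetical (the real German rule is not quoted in this thread), and the thousands pattern is the one the thread eventually settles on.

```python
import re

# Hypothetical patterns for illustration only; the actual date separator
# rule in the German symbols.dic is not shown in this thread.
THOUSANDS = r"(?<=\d)\.(?=\d{3})"
DATE = r"(?<=\d)\.(?=\d)"


def first_rule_hit(rules, text):
    """Join the rules into one alternation and report which rule claims each match."""
    combined = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in rules))
    return [m.lastgroup for m in combined.finditer(text)]


# Same text, same rules, different order.
date_first = first_rule_hit([("date", DATE), ("thousands", THOUSANDS)], "1.000.000")
thousands_first = first_rule_hit([("thousands", THOUSANDS), ("date", DATE)], "1.000.000")

print(date_first)       # ['date', 'date']
print(thousands_first)  # ['thousands', 'thousands']
```

Both patterns can match the same dot, so under this assumption whichever rule is listed first decides how the dot is treated.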

@nvaccessAuto

Comment 3 by jteh on 2015-08-07 11:36
Something like the following complex symbol rule should do the trick (untested):

thousands separator (?<=\d)\.<?=\d{3})

and in symbols:

thousands separator punkt   all norep

@nvaccessAuto

Comment 4 by chrislm (in reply to comment description) on 2015-08-08 08:52
Replying to bdorer:

I don't know whether this is eSpeak's fault, as it is reproducible with eSpeak's SAPI5 version and SVOX Pico.

When I run "espeak -vde" on the command line, those numbers are spoken correctly, as they are with eSpeak SAPI5 and with espeakedit in German.
SVOX Pico seems not to read beyond the sixth digit in any language.
Could a native German speaker test this regexp?
Pattern:
(?<=\d)(.?)(\d{3})
R:
\2
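
For readers who want to try this entry outside NVDA, here is a minimal sketch using Python's `re.sub`. It assumes the speech dictionary applies the pattern and replacement the way `re.sub` does; the helper name `apply_entry` is made up for this example.

```python
import re

PATTERN = r"(?<=\d)(.?)(\d{3})"  # pattern from the comment above
REPLACEMENT = r"\2"              # the "R:" part: keep only the three digits


def apply_entry(text):
    """Apply the dictionary entry to a piece of text."""
    return re.sub(PATTERN, REPLACEMENT, text)


print(apply_entry("1.000.000"))   # 1000000
print(apply_entry("30.000.000"))  # 30000000
print(apply_entry("123456"))      # 13456
```

The last line shows a side effect: the dot in `(.?)` is unescaped, so it matches any character, and in an ungrouped number it swallows a digit. A pattern that requires a literal `\.` leaves ungrouped numbers alone.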

@nvaccessAuto

Comment 5 by bdorer (in reply to comment 3) on 2015-08-10 17:08
Replying to jteh:

Something like the following complex symbol rule should do the trick (untested):

thousands separator   (?<=\d)\.<?=\d{3})

and in symbols:

thousands separator   punkt   all norep

Hmm, this rule isn't complete. NVDA reports mismatched braces. I tried the following:

thousands separator (?<=\d)\.(<?=\d{3})

If that is what you meant, this rule isn't working as expected. Furthermore, if I spell 30.000.000, NVDA says 3 dot 000thausand.
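
Why neither version works can be checked with Python's `re` module, whose regular expression syntax NVDA's symbol rules are assumed to follow here. This is a sketch for illustration, not NVDA's own code.

```python
import re

TYPO = r"(?<=\d)\.<?=\d{3})"      # the rule as posted in comment 3
ATTEMPT = r"(?<=\d)\.(<?=\d{3})"  # the repaired version tried above

# The posted rule has a closing parenthesis with no matching opener.
try:
    re.compile(TYPO)
    typo_compiles = True
except re.error:
    typo_compiles = False

# The repaired rule compiles, but "(<?=" is an ordinary group: an optional "<",
# then a literal "=", then three digits. It is not a lookahead.
attempt_hits = re.findall(ATTEMPT, "30.000.000")
literal_hits = re.findall(ATTEMPT, "30.=000")

print(typo_compiles)  # False
print(attempt_hits)   # []
print(literal_hits)   # ['=000']
```

So the repaired rule only matches a dot that is followed by a literal `=` sign, which never occurs in a grouped number.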

@nvaccessAuto

Comment 6 by bdorer (in reply to comment 4) on 2015-08-10 17:13
Replying to chrislm:

Replying to bdorer:

I don't know whether this is eSpeak's fault, as it is reproducible with eSpeak's SAPI5 version and SVOX Pico.

When I run "espeak -vde" on the command line, those numbers are spoken correctly, as they are with eSpeak SAPI5 and with espeakedit in German.

SVOX Pico seems not to read beyond the sixth digit in any language.

Could a native German speaker test this regexp?

Pattern:

(?<=\d)(.?)(\d{3})

R:

\2

Hmm, I don't know how to test your regular expression, as I am not familiar with it. Which part should I use as the pattern and which part as the replacement?
Thanks for your help.

@nvaccessAuto

Comment 7 by chrislm (in reply to comment 6) on 2015-08-10 19:58
Replying to bdorer:

I don't know how to test your regular expression as I am not familiar with it. Which part should I use as pattern and which part as Replacement?

Sorry, I meant testing it in a speech dictionary.
From the Preferences menu, open the Temporary dictionary, enter the pattern and replacement, and select the Regular expression radio button.
Thanks.

@nvaccessAuto

Comment 8 by bdorer (in reply to comment 7) on 2015-08-10 20:09

Sorry, I meant testing it in a speech dictionary.

From the Preferences menu, open the Temporary dictionary, enter the pattern and replacement, and select the Regular expression radio button.

Thanks.

Sure, but I didn't understand R as an abbreviation for replacement, as you wrote pattern and not p.
Your rule fixes it. Now I need this as a regexp for complexSymbols, like

thousands separator (?<=\d)\.(<?=\d{3})

This one isn't working for me.

@nvaccessAuto

Comment 9 by chrislm (in reply to comment 8) on 2015-08-10 21:34
Replying to bdorer:

thousands separator (?<=\d)\.(<?=\d{3})


This one isn't working for me.

Try this:

thousands separator (?<=\d)\.(?=\d{3})
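
A quick way to see what this pattern matches, sketched in plain Python. It is assumed that NVDA's complex symbol rules follow Python's regular expression syntax; the helper `separator_positions` is made up for this example.

```python
import re

# A literal dot with a digit before it and at least three digits after it.
# Both lookarounds are zero-width, so only the dot itself is matched.
THOUSANDS_SEPARATOR = re.compile(r"(?<=\d)\.(?=\d{3})")


def separator_positions(text):
    """Return the index of every dot the rule would treat as a thousands separator."""
    return [m.start() for m in THOUSANDS_SEPARATOR.finditer(text)]


print(separator_positions("1.000.000"))   # [1, 5]
print(separator_positions("30.000.000"))  # [2, 6]
print(separator_positions("3.14"))        # []
print(separator_positions("20.07.2015"))  # [5]
```

The last case shows that the lookahead only asks for at least three digits, so the dot in front of a four-digit year also matches. That may be related to the interaction with the date separator rule mentioned in comment 2.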

@nvaccessAuto

Comment 10 by bdorer on 2015-08-10 22:08
Thanks! This regexp is doing the job.
@jamie maybe it's worth documenting such regexps for other languages on the wiki?

@nvaccessAuto

Comment 11 by jteh on 2015-08-10 23:27
Ugh. Yeah, that's the expression I meant; sorry about the typos.

Yeah, we should probably document this somewhere. Perhaps we could add a Tips section to TranslatingSymbols.

@nvaccessAuto

Comment 12 by chrislm on 2015-08-11 10:31
Can this ticket also be used for other thousands separators?
Sometimes a space character is used as the separator, for example in many articles on Wikipedia.
Putting a standard space in a symbols rule would probably cause problems, but the specific character below is widely used as a thousands separator.

Character: " "
Name: "thin space"
Unicode: "U+2009"
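
A rule for this character could follow the same shape as the dot rule. The sketch below is hypothetical: no such rule is proposed verbatim in this thread, and the helper name is made up.

```python
import re

THIN_SPACE = "\u2009"

# Same lookbehind/lookahead shape as the dot rule, with U+2009 as the separator.
THIN_SPACE_SEPARATOR = re.compile(r"(?<=\d)" + THIN_SPACE + r"(?=\d{3})")


def strip_thin_space_groups(text):
    """Remove thin spaces that sit between a digit and a group of three digits."""
    return THIN_SPACE_SEPARATOR.sub("", text)


grouped = "1" + THIN_SPACE + "000" + THIN_SPACE + "000"
print(strip_thin_space_groups(grouped))      # 1000000
print(strip_thin_space_groups("1 000 000"))  # 1 000 000 (ordinary spaces are untouched)
```

Because the pattern names U+2009 explicitly, ordinary spaces between numbers are not affected.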

@nvaccessAuto

Comment 13 by bdorer on 2015-08-11 14:59
Hmm, I think so. eSpeak, for example, accepts spaces as thousands separators for German with no problem.

@nvaccessAuto

Comment 14 by bdorer on 2015-08-19 22:42
Bah, @jteh would it be possible to merge symbols.dic of SVN-Rev 23136? There was a typo, which I have now fixed. Sorry for the inconvenience!

@nvaccessAuto

Comment 15 by jteh on 2015-08-19 22:53
Sorry, but we can only accept critical changes for 2015.3 now (i.e. fixes for crashes or serious security issues). Is this really a critical change?

@nvaccessAuto

Comment 16 by bdorer on 2015-08-19 22:59
In this case it isn't, but it is confusing for many Germans.

@nvaccessAuto

Comment 17 by jteh on 2015-08-19 23:05
Can you explain the impact? That is, what will happen if we don't take this?

@nvaccessAuto

Comment 18 by bdorer on 2015-08-19 23:09
Well, many synths don't say "thousand", "million" and so on for grouped numbers.

@nvaccessAuto

Comment 19 by jteh on 2015-08-19 23:10
Let me put this another way: is there any regression from 2015.2 without this change? That is, is there something in 2015.2 that worked but is now broken because of this mistake?

@nvaccessAuto

Comment 20 by bdorer on 2015-08-19 23:27
Well, synths which spoke groups of thousands correctly in 2015.2 now speak them wrongly, as they don't say "thousand" and so on in grouped numbers.

@nvaccessAuto

Comment 21 by jteh on 2015-08-19 23:39
This doesn't seem to match my testing. eSpeak German reports thousands and millions when I do, for example, 1.000 or 1.000.000.

@nvaccessAuto

Comment 22 by bdorer on 2015-08-20 05:46
Sure, but it doesn't work for Microsoft SAPI5 Hedda and Vocalizer for NVDA, which are used on many computers.
I don't have more sapi5 voices to test.

@nvaccessAuto

Comment 23 by James Teh <jamie@... on 2015-08-24 02:13
In [81824fa]:

German symbols: Fix typo which was breaking the thousands separator.

Re #5235.

@nvaccessAuto

Comment 24 by jteh on 2015-08-24 23:22
This can be closed, since it was fixed in the German symbols.

I'd be reluctant to do this for spaces, as it might match when it shouldn't. We've certainly had problems with this in the past in spreadsheets, etc. where coordinates get mixed with numbers, and while that particular case is fixed, there could be others. However, this is really up to the German community to decide.
Changes:
Changed title from "Espeak is unable to speak grouped numbers" to "German: Espeak is unable to speak grouped numbers"
Milestone changed from None to 2015.3
State: closed
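
The concern about spaces can be illustrated with a hypothetical rule that treats an ordinary space as a thousands separator. The pattern and the spreadsheet-style input are made up for this sketch; they are not taken from NVDA or from the German symbols.

```python
import re

# Hypothetical rule: an ordinary space between a digit and three more digits.
SPACE_SEPARATOR = re.compile(r"(?<=\d) (?=\d{3})")


def join_groups(text):
    """Drop every space the hypothetical rule would treat as a separator."""
    return SPACE_SEPARATOR.sub("", text)


print(join_groups("1 000 000"))  # 1000000
print(join_groups("B2 100"))     # B2100
```

In the second case a cell coordinate followed by a three-digit value is merged into one token, which is the kind of false match described above.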

@nvaccessAuto nvaccessAuto added bug component/i18n existing localisations or internationalisation labels Nov 10, 2015
@nvaccessAuto nvaccessAuto added this to the 2015.3 milestone Nov 10, 2015
@The-Dark-Man

The issue is back.

@Christianlm

The level for the thousands separator in the German symbols has been changed to "none".
Set a higher level, such as "all", to solve it.

@The-Dark-Man

It works. Thank you!

@bdorer

bdorer commented Jun 7, 2016

Hi, I fixed this again. Sorry for the inconvenience.

On 06.06.2016 at 20:16, The-Dark-Man wrote:

It works. Thank you!

