Issue: the rules in the builtin.dic file that split CamelCaseText only consider text written with ASCII characters.
Steps to reproduce:
Select the OneCore Microsoft Hortense French voice (or an IBMTTS French voice).
Read the following lines:
JEANÉdouard
JEAN Édouard
Actual behavior:
The two lines are not pronounced the same way. More specifically, "JEAN" is spelled out, which is the normal behaviour of OneCore French voices when a mix of upper and lower case is encountered.
Expected behavior:
NVDA has rules in the builtin.dic file that should split each part of a camelCaseText. Thus the text in CamelCase should be split by these rules and the two lines should be pronounced the same way.
Additional examples
For English examples, you could use OneCore Zira voice and compare how the following lines are read:
DJÖtzi
DJ Ötzi
Or, even more obviously, still with Zira:
StÉtienne
St Étienne
Notes
Other examples can be reproduced with IBMTTS.
I could not produce examples with eSpeak. Maybe it has its own internal CamelCase processing?
I opened this issue after having a look at the builtin.dic file. However, I am not impacted by it in my daily work, and the examples given are not real-life examples but ones built on purpose to demonstrate the issue.
Maybe there are languages where this issue is more significant: Greek, Russian? If yes, feel free to comment here with examples to illustrate the issue.
System configuration
NVDA installed/portable/running from source:
installed
NVDA version:
2021.3.1rc1
Windows version:
Windows 10 20H2 (64-bit) build 19042.1348
Name and version of other software in use when reproducing the issue:
N/A
Other information about your system:
Other questions
Does the issue still occur after restarting your computer?
Yes
Have you tried any other versions of NVDA? If so, please report their behaviors.
No, but it should be the same.
If NVDA add-ons are disabled, is your problem still occurring?
Yes
Does the issue still occur after you run the COM Registration Fixing Tool in NVDA's tools menu?
Not tested, but it should not have an impact.
In any case, the rc1 release was installed recently, so the tool was run during that installation.
Technical
To solve this issue, the rules should be modified to include non-ASCII characters in the uppercase/lowercase character classes.
Here is some information as a starting point: https://stackoverflow.com/questions/36187349/python-regex-for-unicode-capitalized-words
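To illustrate the idea, here is a minimal sketch in Python. It is not the actual content of builtin.dic: the two ASCII patterns are only assumed to resemble the existing rules, and `char_class`, `split_camel_case`, `ASCII_RULES` and `UNICODE_RULES` are hypothetical names introduced for this example. Since the standard `re` module has no `\p{Lu}` / `\p{Ll}` support, the sketch builds the uppercase and lowercase classes from `unicodedata`, limited here (arbitrarily) to code points below U+0530, which covers Latin, Greek and Cyrillic.

```python
import re
import unicodedata


def char_class(category: str, limit: int = 0x0530) -> str:
    """Build a regex character class with every code point below `limit`
    whose Unicode general category equals `category` (e.g. "Lu", "Ll")."""
    chars = (chr(cp) for cp in range(limit))
    return "[" + "".join(c for c in chars if unicodedata.category(c) == category) + "]"


UPPER = char_class("Lu")  # uppercase letters, including É, Ö, Π, Д ...
LOWER = char_class("Ll")  # lowercase letters, including é, ö, π, д ...

# Assumption: ASCII-only rules similar in spirit to the current ones.
ASCII_RULES = [
    (re.compile(r"([a-z])([A-Z])"), r"\1 \2"),        # fooBar -> foo Bar
    (re.compile(r"([A-Z])([A-Z][a-z])"), r"\1 \2"),   # FOOBar -> FOO Bar
]

# Same two rules, with the character classes widened to non-ASCII letters.
UNICODE_RULES = [
    (re.compile(f"({LOWER})({UPPER})"), r"\1 \2"),
    (re.compile(f"({UPPER})({UPPER}{LOWER})"), r"\1 \2"),
]


def split_camel_case(text: str, rules) -> str:
    """Apply each (pattern, replacement) rule in order."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    for sample in ("JEANÉdouard", "DJÖtzi", "StÉtienne"):
        print(sample, "|", split_camel_case(sample, ASCII_RULES),
              "|", split_camel_case(sample, UNICODE_RULES))
```

With the ASCII rules the three samples are left unchanged, whereas the widened rules give "JEAN Édouard", "DJ Ötzi" and "St Étienne". In a real fix, the generated classes (or an equivalent explicit range list) would have to be written into the dictionary rules themselves; how best to do that is left open here.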