builtin.dic breaks a synth's ability to handle plural acronyms #11472

Open
ultrasound1372 opened this issue Aug 7, 2020 · 4 comments

@ultrasound1372

ultrasound1372 commented Aug 7, 2020

the problem:

Examine the output after NVDA's built-in processing of a string like "I have all their CDs". The case I mean is an all-caps acronym followed by a lowercase 's' to mark the plural. Some synthesizers can pronounce this properly on their own, making it sound as if an apostrophe had been written there, but NVDA's built-in processing gets in the way. As a result, many blind people write an apostrophe followed by 's' at the end of an acronym to mark the plural, which is not proper writing. I would test how each of my synthesizers handles this, but I don't know of an easy way to disable NVDA's built-in processing; I could never find a setting for it.

the proposal:

I am suggesting a minor alteration to builtin.dic, specifically to the expression that breaks a word starting with a capital away from a fully uppercase word. The lowercase letter that follows the second capital should be allowed to be anything but 's'.
This is the regex I am proposing for line 4 of builtin.dic:

([A-Z])([A-Z][a-rt-z])
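
To make the effect concrete, here is a minimal Python sketch. It assumes the current rule on that line is `([A-Z])([A-Z][a-z])` with the replacement `\1 \2`, and that NVDA applies it the way Python's `re.sub` does; neither is copied from builtin.dic, so treat both as assumptions. The helper name `apply_rule` is made up for illustration.

```python
import re

# Assumed current builtin.dic rule (not copied from the file).
CURRENT_RULE = r"([A-Z])([A-Z][a-z])"
# Rule proposed in this issue: the trailing lowercase letter may not be "s".
PROPOSED_RULE = r"([A-Z])([A-Z][a-rt-z])"
# Assumed replacement: insert a space between the two captured groups.
REPLACEMENT = r"\1 \2"


def apply_rule(pattern: str, text: str) -> str:
    """Apply one dictionary-style regex rule to text (hypothetical helper)."""
    return re.sub(pattern, REPLACEMENT, text)


for sample in ("I have all their CDs", "GUIs and FAQs", "NVDAController", "XMLIsValid"):
    print(repr(sample))
    print("  current :", repr(apply_rule(CURRENT_RULE, sample)))
    print("  proposed:", repr(apply_rule(PROPOSED_RULE, sample)))
```

Under these assumptions the current rule turns "CDs" into "C Ds", while the proposed rule leaves it alone and still splits "NVDAController". One trade-off the sketch exposes: a mixed-case name such as "XMLIsValid" would no longer be split after "XML", because the word that follows starts with a capital and then 's'.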

Blocking questions:

NVDA supports many languages and I don't know the spelling conventions of all of them. Are there any languages where excluding words whose second letter is 's' would be a problem here? How many other languages that use the Latin script mark a plural acronym in this way? If this change does pose a problem for languages with words that have 's' as the second letter, which NVDA-compatible synthesizers would handle that gracefully if NVDA behaved the way I propose?

misc questions:

Does default/voice/temporary dictionary processing occur before NVDA's builtin.dic processing? If so, and if this proposal is not implemented for whatever reason, a simple workaround would be a regex entry in the speech dictionary that adds the apostrophe manually, which would actually make more synthesizers behave this way. I would not propose adding an apostrophe in builtin.dic itself.
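
A rough sketch of why the processing order matters for that workaround. Both rules here are assumptions: the builtin rule is taken to be `([A-Z])([A-Z][a-z])` replaced by `\1 \2`, and the apostrophe entry `\b([A-Z]{2,})s\b` replaced by `\1's` is a hypothetical dictionary entry invented for this example. The `process` helper only imitates applying entries in sequence with Python's `re.sub`.

```python
import re

# Assumed builtin.dic rule and replacement (not copied from the file).
BUILTIN_RULE = (r"([A-Z])([A-Z][a-z])", r"\1 \2")
# Hypothetical user dictionary entry: add an apostrophe to plural acronyms.
APOSTROPHE_RULE = (r"\b([A-Z]{2,})s\b", r"\1's")


def process(text, rules):
    """Apply (pattern, replacement) pairs to text in the given order."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text


text = "I have all their CDs"
# Case 1: the user dictionary entry runs before the builtin rule.
user_first = process(text, [APOSTROPHE_RULE, BUILTIN_RULE])
# Case 2: the builtin rule runs before the user dictionary entry.
builtin_first = process(text, [BUILTIN_RULE, APOSTROPHE_RULE])

print(user_first)
print(builtin_first)
```

In this sketch, running the apostrophe entry first yields "CD's", which the assumed builtin rule then leaves untouched. Running the builtin rule first yields "C Ds", which the apostrophe entry no longer matches, so the workaround only helps if user dictionaries are applied first.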

@amirsol81

I second this. I've seen it quite frequently with terms like CEOs, FAQs, UFOs, CFOs, GMOs, and GUIs.

@Mohamed00

Mohamed00 commented Aug 11, 2020

I mentioned something similar in #11368, but I'll close that in favor of this issue, since it has more detail. The dictionary can be disabled with this code:
import globalVars
globalVars.speechDictionaryProcessing = False

@CyrilleB79
Collaborator

In French, we should normally write "CD" for both the singular and the plural form; adding an 's' at the end of an abbreviation is not the recommended way to mark the plural. Still, the English convention seems to be becoming more and more frequent despite being incorrect, so I would recommend having it announced properly in French too, for smoother reading.

Note also that modifying this rule may affect expressions such as "XIXe siècle", i.e. "19th century", which is usually written with Roman numerals in French. It seems that some synth dictionaries have partially worked around the issue caused by NVDA's mixed-case word rule.
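
A quick check of the "XIXe siècle" case, as a sketch only. It assumes the current rule is `([A-Z])([A-Z][a-z])` with the replacement `\1 \2` (not copied from builtin.dic) and compares it with the rule proposed in this issue, using Python's `re.sub` as a stand-in for NVDA's processing.

```python
import re

CURRENT_RULE = r"([A-Z])([A-Z][a-z])"      # assumed current rule
PROPOSED_RULE = r"([A-Z])([A-Z][a-rt-z])"  # rule proposed in this issue
REPLACEMENT = r"\1 \2"                     # assumed replacement

text = "XIXe siècle"
current_out = re.sub(CURRENT_RULE, REPLACEMENT, text)
proposed_out = re.sub(PROPOSED_RULE, REPLACEMENT, text)

print(current_out)
print(proposed_out)
```

Under these assumptions both rules produce "XI Xe siècle", because the suffix letter is 'e' rather than 's'. So the proposed change would neither fix nor worsen this particular expression.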

@ultrasound1372
Author

The case @Mohamed00 raised about names like McAdam is another problem builtin.dic causes. The fix for that would be specific to English, though, while that dictionary is applied to all languages, and I'm not sure what the regex would look like. So perhaps we need a different solution than just adding 's' as an exception to mixed-case processing. Knowing the order of processing would be very helpful for crafting dictionary entries that work around this if it is not fixed in core, but if those entries were processed first, I'd have to deliberately break the string in some other way.
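
For illustration only, here is one way an English-specific exception for "Mc" names could look. This assumes builtin.dic splits mixed-case words with a rule like `([a-z])([A-Z])` replaced by `\1 \2`, which is a guess and not copied from the file. The lookbehind variant and both function names are hypothetical, not a proposal for core.

```python
import re


def split_mixed_case(text: str) -> str:
    """Assumed mixed-case rule: space between a lowercase and an uppercase letter."""
    return re.sub(r"([a-z])([A-Z])", r"\1 \2", text)


def split_with_mc_exception(text: str) -> str:
    """Hypothetical variant: skip the split right after a word-initial 'Mc'."""
    # Zero-width match at each lowercase/uppercase boundary,
    # except when the two preceding characters are a word-initial "Mc".
    return re.sub(r"(?<!\bMc)(?<=[a-z])(?=[A-Z])", " ", text)


for sample in ("McAdam", "camelCase", "MyApp"):
    print(sample, "->", split_mixed_case(sample), "|", split_with_mc_exception(sample))
```

In this sketch "McAdam" stays intact while "camelCase" and "MyApp" are still split. It also shows the drawback mentioned above: the exception encodes an English naming convention in a rule that would run for every language.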
