Add plural forms for needed strings in NVDA's code base #15864

CyrilleB79 · 2023-11-30T10:39:58Z

Link to issue number:

Closes #12445
Follow-up of #15013.

Summary of the issue:

When possible NVDA needs to distinguish between singular and plural in UI messages. The impact may be more significant in languages such as Slavic ones, where there are more than one plural form.

Description of user facing changes

In NVDA UI messages, i.e. in GUI or when NVDA speaks or brailles messages, messages will use singular/plural form as needed.

Description of development approach

Look for all strings containing "%" or "{" in the .pot and check if a plural form is needed. And replace _ / pgettext by ngettext / npgettext where needed.
Some expression containing a number have been pluralized even if they do not differ between singular and plural form in English due to the difference being visible in translations due to a different grammar. E.g.: "{numFingers} finger {action}" in English to be translated to "{action} {numFingers} doigt" (singular) vs "{action} {numFingers} doigts" (plural) in French.
table-rowcount and table-columncount control fields of the virtual buffer's text info have had to be converted to integer (previously strings), so that computation can be done.
No change log needed except for the code's API breaking change.

Testing strategy:

Manual smoke test on a subset of strings.
Feedback from translators during translation period and communication with the translators mailing list

Known issues with pull request:

No conversion of foobar messages

Regarding the foobar appModule which uses ``utils.localisation.TimeOutputFormat`, it manages plural as follows:

"{D} day {H}:{M}:{S}"
"{D} days {H}:{M}:{S}"
That does not match needs of languages having more than one plural form.

I have not modified utils.localisation.TimeOutputFormat not foobar appModule however since it is not straight forward at all; I am not sure that making TimeOutputFormat class inheriting from DisplayStringEnum was a good idea.
Also there would be more things to change to how time is reported in foobar:

"3 remaining" would be reported for 3 seconds remaining; it's not very clear.
"1:0:4 remaining" is not parsed correctly by many synthesizers (OneCore, eSpeak, IBMTTS) due to single digit used where two would be needed by the synthesizer to recognize a time format
":44" reports "3 44 remaining" in English, which is not very clear and it even reports "3 hours 44 remaining" in French instead of "3 minutes 44 remaining" with eSpeak or OneCore.

Since utils.localisation.TimeOutputFormat is only used in foobar, since tracks lasting more than 24 hours are very rare and since more work needs to be done in utils.localisation.TimeOutputFormat anyway, I have not included changes in utils.localisation.TimeOutputFormat or foobar appModule in this PR.

Declension matching issue

In some languages, there may still be issues in translation for messages such as "list with %s item" due to declension. Indeed, when encountering a number written with digits, the synth does not know which case should be used to pronounce it and usually uses nominative case. This cause something strange because the number (nominative case) and the noun (instrumental case) do not match whereas they should match, i.e. the number should be declensed to instrumental case too. However synth cannot handle the complexity of this correctly.

It has been discussed in this thread of the translators mailing list. And it has been agreed that such situations should be handled by translators through work-around in their translations.
E.g. maybe translate to "list, 5 items" instead of "list with 5 items".

Open questions

Here are various expressions that I have not converted yet. The relevance of converting them is to be discussed with reviewers as well as with translators on the mailing list. If a language needs one of them to be converted, please inform me and I will do it.

pt

Even if in English the abbreviation is invariable (e.g. "1 pt", "3 pt" without "s"), some language may have translated it as a full word, i.e. translation of "point" instead of "pt".

"%s pt"
"at least %.1f pt"
"Cell width: {0.x:.1f} pt"
"Cell height: {0.y:.1f} pt"
"exactly {space:.1f} pt"
"{offset:.3g} pt"

Abbreviations of measurement units

"{val:.2f} in"
"{val:.2f} cm"
"%s px"
"%s em"
"%s ex"
"%s rem"

Are there languages where a plural form is used for these measurement units? E.g. "8 cms" vs "1 cm"
For the last one which are HTML character size specification ("ex", "em", "rem"), I doubt there is any difference at all between English and translation.

Percent (full word)

Message where percent is written as a word ("percent" or "per cent") and not as a symbol ("%")

"%d percent"
"Object edges positioned {left:.1f} per cent from left edge of screen, "
" fraction {fractionValue:.2f} Percent slice {pointIndex} of {pointCount}"

Are there languages where "percent" / "per cent" is used? If yes, can the "%" symbol be used instead to avoid to deal with a plural form?

Percent symbol ("%")

Are there languages where percent symbol is translated to something that have singular and plural forms?

"%s highlighted" (from Kindle appModule)

"3 highlighted" is not a very consistent sentence from a grammar point of view. Some languages may have translated to something more consistent, with a possible plural form.
From the languages that I have seen, only French however uses a translation of something like "highlighted %s times"; but in French the translation would not change depending on singular or plural ("times" and "time" both translate to "fois").

Are there languages which need a specific plural form for this message?
Or should I convert the source English to something more grammatically correct, e.g. "highlighted 3 times"; the drawback would be a longer form, and the presence of "2 times" where "twice" may be preferred.

Code Review Checklist:

Documentation:
- Change log entry
- User Documentation
- Developer / Technical Documentation
- Context sensitive help for GUI changes
Testing:
- Unit tests
- System (end to end) tests
- Manual testing
UX of all users considered:
- Speech
- Braille
- Low Vision
- Different web browsers
- Localization in other languages / culture than English
API is compatible with existing add-ons.
Security precautions taken.

source/globalCommands.py

AppVeyorBot · 2023-11-30T11:31:09Z

FAIL: Translation comments check. Translation comments missing or unexpectedly included. See build log for more information.
PASS: Unit tests.
PASS: Lint check.
PASS: System tests (tags: installer NVDA).
Build (for testing PR): https://ci.appveyor.com/api/buildjobs/rwlp8xcd2p8ej0k4/artifacts/output/nvda_snapshot_pr15864-30159,13dce07e.exe
CI timing (mins):
INIT 0.0,
INSTALL_START 1.0,
INSTALL_END 1.1,
BUILD_START 0.0,
BUILD_END 28.1,
TESTSETUP_START 0.0,
TESTSETUP_END 0.3,
TEST_START 0.0,
TEST_END 16.8,
FINISH_END 0.1

See test results for failed build of commit 13dce07ef8

source/globalCommands.py

AppVeyorBot · 2023-11-30T13:19:38Z

PASS: Translation comments check.
PASS: Unit tests.
FAIL: Lint check. See test results for more information.
PASS: System tests (tags: installer NVDA).
Build (for testing PR): https://ci.appveyor.com/api/buildjobs/a3jh5yglptiusty8/artifacts/output/nvda_snapshot_pr15864-30162,67faf09a.exe
CI timing (mins):
INIT 0.0,
INSTALL_START 1.0,
INSTALL_END 1.0,
BUILD_START 0.0,
BUILD_END 28.2,
TESTSETUP_START 0.0,
TESTSETUP_END 0.3,
TEST_START 0.0,
TEST_END 17.1,
FINISH_END 0.1

See test results for failed build of commit 67faf09ab0

Co-authored-by: Luke Davis <8139760+XLTechie@users.noreply.github.com>

Adriani90 · 2023-12-01T13:45:37Z

In the romanian language we are using plural for cm, and also for percent, as sign and as a full written word but this is being handled in the translation itself so I think there is no rule needed from NVDA side as far as I understand. However I think this change is wellcome though maybe for other languages.

In my view the biggest problems in this regard are (I) plural for acronyms such as GUIs, PPAs NGOs etc. where all letters are upper case apart from the s at the end which is small letter see proposal in #11472) and (II) the plurals where changes in the middle or at the beginning of the word are needed to indicate it such as some latin languages, I guess also in slavic languages. cc: @zstanecic, @hkatic, @lukaszgo1 for slavic languages and @ultrasound1372, @amirsol81 and @Mohamed00 for acronyms.

@cary-rowen, @josephsl, @larry801, @minakononogaki, @nishimotz, @ManshulBelani, @sumandogra, @dinakar-td, @khsbory, @ungjinPark, @dnz3d4c maybe you guys can give some feedback on how it works in asian languages such as chinese Japanese, indian and corean,

and @mohammad-suliman, @Mohamed00 maybe you have some feedback on arabic languages perspective.

Maybe @nvdaes, @abdel792 or @fernando-jose-silva have some feedback on spanish and portuguese language, although I think this is very similar to romanian gramar.

CyrilleB79 · 2023-12-01T14:29:59Z

@Adriani90, to clarify, this PR has nothing to do with how speech pronounce things. It only deals with better spelling of what is spoken or written.

In the romanian language we are using plural for cm, and also for percent, as sign and as a full written word but this is being handled in the translation itself so I think there is no rule needed from NVDA side as far as I understand.

I do not understand: do you use plural forms for the abbreviation "cm" and even for the symbole "%". How do you write it?

Also, you say that plural is handled by the translation. How?
With today's translation, in Romanian, NVDA reports:

"1 cm" and "7 cm". Is it a correct spelling? If no, what would be the correct spelling for these two expressions?
"1 la sută" and "7 la sută" for progress bars to 1% and 7%. Is it correct spelling? If not which one would it be?
"Marginile obiectului poziționate la 1 procente față de marginea stângă a ecranului" and "Marginile obiectului poziționate la 7 procente față de marginea stângă a ecranului" for object positionning. Is it correct spelling?

In my view the biggest problems in this regard are (I) plural for acronyms such as GUIs, PPAs NGOs etc. where all letters are upper case apart from the s at the end which is small letter see proposal in #11472)

As written before, this seems off-topic of this PR since #11472 deals with how words are spoken, not with good spelling. Could you clarify the link?

(II) the plurals where changes in the middle or at the beginning of the word are needed to indicate it such as some latin languages,

Could you provide some examples, if possible well known latin languages such as Spanish or French, else in Romanian (that I do not know at all unfortunately).

josephsl · 2023-12-01T15:06:40Z

Hi,

The overall topic of this PR deals with how text is presented and localized in NVDA's user interface and its messages, which is separate from speech output which is controlled by users and TTS engines. If words like "GUI's" and "NGO's" are included in NVDA's user interface, then we can consider them, but I can't find them via Grep.

As for Korean, singular and plural forms are announced the same.

By the way, in case it isn't covered, I advise looking at employing gettext.npgettext to cover plural forms with context as Python 3.8 and later includes gettext.pgettext family of functions.

Thanks.

CyrilleB79 · 2023-12-01T15:44:43Z

By the way, in case it isn't covered, I advise looking at employing gettext.npgettext to cover plural forms with context as Python 3.8 and later includes gettext.pgettext family of functions.

Already covered. I.e. where pgettext was used and I needed to add a plural form, I have replaced by a call to npgettext.

Adriani90 · 2023-12-01T21:12:25Z

I do not understand: do you use plural forms for the abbreviation "cm" and even for the symbole "%".

Not for the symbols when writing. After reading more exactly your PR description I understood that this is not targeting pronounciation but the spelling in NVDA core.

Also, you say that plural is handled by the translation. How?

For example for % often we use "x of hundreed" to avoid the difference between singular and plural when pronouncing. So 3 of hundreed or 1 of hundreed means 3% or 1% but 3% is spoken "3 procente" and 1% is spoken "1 procent". The most percent or % or cm are spelled correctly as of now in NVDA. However, I didn't test all situations where % is used to see if it makes sense to let % or "la suta". I have to test the kindle use case where NVDA reports the current location by percentage.

"Marginile obiectului poziționate la 1 procente față de marginea stângă a ecranului"

For example this is wrong because procente is plural and it is spoken like this even though it is only 1 percent so it should be singular. But this is something we can handle in the translation as I said above. For this specific example I guess it has not been done properly yet.

CyrilleB79 · 2023-12-01T21:44:07Z

Also, you say that plural is handled by the translation. How?

For example for % often we use "x of hundreed" to avoid the difference between singular and plural when pronouncing. So 3 of hundreed or 1 of hundreed means 3% or 1% but 3% is spoken "3 procente" and 1% is spoken "1 procent". The most percent or % or cm are spelled correctly as of now in NVDA. However, I didn't test all situations where % is used to see if it makes sense to let % or "la suta". I have to test the kindle use case where NVDA reports the current location by percentage.

"Marginile obiectului poziționate la 1 procente față de marginea stângă a ecranului"

For example this is wrong because procente is plural and it is spoken like this even though it is only 1 percent so it should be singular. But this is something we can handle in the translation as I said above. For this specific example I guess it has not been done properly yet.

OK. So if I understand correctly, you may use "la suta" (i.e. "of hundred") for any translation of "percent", "per cent" and even "%" if you want. Testing with eSpeak, I realize that in Romanian "1%" seems to be pronounced correctly, but all other ones ("2%", "3%") are not correct.
Anyway, in all these cases, if Romanian translators feel that "1 procent" / "3 procente" sounds better, it can be handled correctly in this PR. Same for other languages if impacted. But beware that replacing "%" by a full word may be less interesting if the message is also brailled.

codeofdusk · 2023-12-03T02:38:04Z

A few notes:

For measurement units, "in" might be pluralized (depending on language) but "cm" is universally singular as part of SI.
From your PR description, in English, you'd say "declined", not "declensed" (which isn't a word).

Adriani90 · 2023-12-03T23:43:08Z

Testing with eSpeak, I realize that in Romanian "1%" seems to be pronounced correctly

even this is not completely correct but I think this is synthesizer related. eSpeak says "unu procent" which is actually wrong, it should pronounce "un procent". "Un" means the number has no preposition after it and is not related to another subject. "Unu" means there is a reference to another subject so there is a preposition between number and the subject it refers to. I.e. Unu la suta means one of hundreed so one refers to hundreed. Un procent means 1% but there is no direct reference to another number.

in all these cases, if Romanian translators feel that "1 procent" / "3 procente" sounds better, it can be handled correctly in this PR. Same for other languages if impacted.

Actually for Romanian "la suta" is the more common used expression in day to day language anyway, procent or procente is used in a more professional context. In fact procente would even consume more braille cells than "la suta".

CyrilleB79 · 2023-12-04T22:15:03Z

Thanks @Adriani90 for this clarification.

OK. So no more change needed for now, either for Romanian or for other languages.

If translators ask more changes later, it can always be done in a subsequent PR.

ource

source/NVDAObjects/window/_msOfficeChart.py

source/appModules/powerpnt.py

seanbudd · 2023-12-05T00:18:45Z

source/speech/speech.py

+		columnCountText = ngettext(
+			# Translators: Sub-part of the compound string to report a table
+			"{columnCount} column",
+			"{columnCount} columns",
+			columnCount,
+		).format(columnCount=columnCount)
+		rowCountText = ngettext(


Perhaps all these repeated components should be factored out into helper functions that just take the count?

e.g. _columnCountStr(count), _rowCountStr(count)

Sorry, I do not understand which part of the code I can put in a common helper function. Could you clarify please?

The following component is repeated multiple times in this code, same with the row equivalent:

ngettext( "{columnCount} column", "{columnCount} columns", columnCount, ).format(columnCount=columnCount)

@seanbudd, I do not see the point in making a helper function to just call ngettext and `format``; IMO it makes code less easy to read.
Instead, I have made a helper function to factor out the whole computation of table size announcement. See commit 70d32e4.

If you do not agree with this approach, I can revert this change.

I think it should be reverted.

The reason this was suggested is there is 2 instances of each of these components.
This is like when we assign translatable strings that are used repeatedly to a variable, so that there is less duplication and the strings can be updated across usages easily.

The current approach doesn't remove the duplication - the components are still here, only the other usage of the components were refactored.

OK I got it. Done in 6db8c7d.

Note that I have kept the function _rowAndColumnCountText that I had extracted from getPropertiesSpeech since it allows to reduce a bit its complexity. getPropertiesSpeech remains much too complex though.

Accept first part of suggestions in review Co-authored-by: Sean Budd <seanbudd123@gmail.com>

source/appModules/powerpnt.py

This reverts commit 70d32e4.

This reverts commit 3ec7c14.

seanbudd

Thanks @CyrilleB79

Fix-up of #15864. Reported in this thread of the translators mailing list. Summary of the issue: When pluralizing a lot of string in #15864, the one dedicated to battery time reporting was forgotten. Description of user facing changes Battery time estimation will be reported with correct plural forms. Description of development approach Split up the string in subparts and use ngettext for hours and for minutes.

…access#16011) Fix-up of nvaccess#15864. Reported in this thread of the translators mailing list. Summary of the issue: When pluralizing a lot of string in nvaccess#15864, the one dedicated to battery time reporting was forgotten. Description of user facing changes Battery time estimation will be reported with correct plural forms. Description of development approach Split up the string in subparts and use ngettext for hours and for minutes.

CyrilleB79 added 9 commits September 10, 2023 22:54

First work

34d8487

Merge branch 'master' into plural

95dc8f6

Merge branch 'master' into plural

f1fc977

fix syntax errors

d6318de

Merge branch 'master' into plural

8d7942d

More files converted

fca18e1

Touch gestures

bad1de6

Change log

9b6ffe5

Add PR number in the log

009744a

CyrilleB79 commented Nov 30, 2023

View reviewed changes

source/globalCommands.py Outdated Show resolved Hide resolved

XLTechie reviewed Nov 30, 2023

View reviewed changes

source/globalCommands.py Outdated Show resolved Hide resolved

Fix check POT

d3e70e4

CyrilleB79 and others added 2 commits November 30, 2023 14:28

Update source/globalCommands.py

795a6dd

Co-authored-by: Luke Davis <8139760+XLTechie@users.noreply.github.com>

Linting

dcea438

CyrilleB79 marked this pull request as ready for review November 30, 2023 14:49

CyrilleB79 requested a review from a team as a code owner November 30, 2023 14:49

CyrilleB79 requested a review from seanbudd November 30, 2023 14:49

Fix expressions for tables

72f6475

CyrilleB79 marked this pull request as draft December 2, 2023 00:01

Fix row/column count in virtual buffer

4024f2f

CyrilleB79 marked this pull request as ready for review December 4, 2023 22:13

seanbudd reviewed Dec 5, 2023

View reviewed changes

seanbudd added the api-breaking-change label Dec 5, 2023

CyrilleB79 and others added 3 commits December 5, 2023 09:07

Apply suggestions from code review

0f2fef6

Accept first part of suggestions in review Co-authored-by: Sean Budd <seanbudd123@gmail.com>

Remove unexpected file

6a3f2c6

Simplify statement

9cc41e3

lukaszgo1 reviewed Dec 5, 2023

View reviewed changes

source/appModules/powerpnt.py Outdated Show resolved Hide resolved

source/appModules/powerpnt.py Show resolved Hide resolved

Fix spelling in moved comments

5b4b35c

seanbudd self-requested a review December 6, 2023 02:07

CyrilleB79 added 4 commits December 6, 2023 09:15

Sub-function to compute table size reporting

70d32e4

Revert "Sub-function to compute table size reporting"

3ec7c14

This reverts commit 70d32e4.

Revert "Revert "Sub-function to compute table size reporting""

93abb27

This reverts commit 3ec7c14.

Factor out common translations to report rows and columns

6db8c7d

seanbudd approved these changes Dec 7, 2023

View reviewed changes

seanbudd merged commit 0179a84 into nvaccess:master Dec 8, 2023
1 check passed

nvaccessAuto added this to the 2024.1 milestone Dec 8, 2023

seanbudd added a commit that referenced this pull request Dec 8, 2023

Fixup changes for #15864

57f468f

CyrilleB79 deleted the plural branch December 9, 2023 15:22

CyrilleB79 added a commit to CyrilleB79/nvda that referenced this pull request Jan 7, 2024

Fix-up of nvaccess#15864: pluralize battery time reporting string

6a644cb

CyrilleB79 mentioned this pull request Jan 7, 2024

Fix-up of #15864: pluralize battery time reporting string #16011

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add plural forms for needed strings in NVDA's code base #15864

Add plural forms for needed strings in NVDA's code base #15864

CyrilleB79 commented Nov 30, 2023 •

edited

AppVeyorBot commented Nov 30, 2023

AppVeyorBot commented Nov 30, 2023

Adriani90 commented Dec 1, 2023 •

edited

CyrilleB79 commented Dec 1, 2023

josephsl commented Dec 1, 2023

CyrilleB79 commented Dec 1, 2023

Adriani90 commented Dec 1, 2023

CyrilleB79 commented Dec 1, 2023

codeofdusk commented Dec 3, 2023

Adriani90 commented Dec 3, 2023

CyrilleB79 commented Dec 4, 2023

seanbudd Dec 5, 2023

CyrilleB79 Dec 5, 2023

seanbudd Dec 6, 2023

CyrilleB79 Dec 6, 2023

seanbudd Dec 7, 2023 •

edited

CyrilleB79 Dec 7, 2023

seanbudd left a comment

Add plural forms for needed strings in NVDA's code base #15864

Add plural forms for needed strings in NVDA's code base #15864

Conversation

CyrilleB79 commented Nov 30, 2023 • edited

Link to issue number:

Summary of the issue:

Description of user facing changes

Description of development approach

Testing strategy:

Known issues with pull request:

No conversion of foobar messages

Declension matching issue

Open questions

pt

Abbreviations of measurement units

Percent (full word)

Percent symbol ("%")

Code Review Checklist:

AppVeyorBot commented Nov 30, 2023

AppVeyorBot commented Nov 30, 2023

Adriani90 commented Dec 1, 2023 • edited

CyrilleB79 commented Dec 1, 2023

josephsl commented Dec 1, 2023

CyrilleB79 commented Dec 1, 2023

Adriani90 commented Dec 1, 2023

CyrilleB79 commented Dec 1, 2023

codeofdusk commented Dec 3, 2023

Adriani90 commented Dec 3, 2023

CyrilleB79 commented Dec 4, 2023

seanbudd Dec 5, 2023

Choose a reason for hiding this comment

CyrilleB79 Dec 5, 2023

Choose a reason for hiding this comment

seanbudd Dec 6, 2023

Choose a reason for hiding this comment

CyrilleB79 Dec 6, 2023

Choose a reason for hiding this comment

seanbudd Dec 7, 2023 • edited

Choose a reason for hiding this comment

CyrilleB79 Dec 7, 2023

Choose a reason for hiding this comment

seanbudd left a comment

Choose a reason for hiding this comment

CyrilleB79 commented Nov 30, 2023 •

edited

Adriani90 commented Dec 1, 2023 •

edited

seanbudd Dec 7, 2023 •

edited