
Add optional Unicode normalization before passing strings to speech or braille #16466

@LeonarddeR

Description


Is your feature request related to a problem? Please describe.

In some cases, text contains ligature characters that are not defined in a braille table. Speech synthesizers can also really struggle with these.

One example is the ligature ĳ (U+0133, the single-character form of ij), as used in Dutch words such as ĳsbeer (polar bear). The Dutch voice of eSpeak cannot pronounce this word correctly.

The exact opposite case is á written as two characters: the letter a followed by the combining acute accent (U+0301).
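Because both cases can look identical on screen, it helps to inspect the code points directly. A minimal sketch using Python's standard `unicodedata` module; the `describe` helper is a hypothetical name used only for illustration:

```python
import unicodedata

# The two examples from the description, written with escapes so the exact
# code points are unambiguous regardless of how the page renders them.
ligature = "\u0133"        # ĳ: a single code point
decomposed = "a\u0301"     # á: two code points (letter + combining mark)

def describe(text: str) -> list[str]:
    """Return 'U+XXXX NAME' for every code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

print(describe(ligature))
# ['U+0133 LATIN SMALL LIGATURE IJ']
print(describe(decomposed))
# ['U+0061 LATIN SMALL LETTER A', 'U+0301 COMBINING ACUTE ACCENT']
```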

Describe the solution you'd like

For both speech and braille, I propose adding an option to apply Unicode normalization with the NFKC algorithm (Normalization Form Compatibility Composition) before text is passed to the synthesizer or braille translator. NFKC decomposes most ligatures into their component letters, while composing sequences such as a + U+0301 into the single precomposed character á (U+00E1), which is much more common than the two-character form.
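To illustrate why NFKC rather than plain NFC fits this proposal, the sketch below normalizes both examples with each form. NFC applies only canonical decomposition followed by canonical composition, so it fixes the accent sequence but leaves the ĳ ligature alone; NFKC uses compatibility decomposition instead, which also splits ĳ into i + j.

```python
import unicodedata

ligature = "\u0133sbeer"   # "ĳsbeer" spelled with the single ĳ code point
decomposed = "a\u0301"     # "á" as a + U+0301 COMBINING ACUTE ACCENT

for form in ("NFC", "NFKC"):
    for text in (ligature, decomposed):
        result = unicodedata.normalize(form, text)
        # Print code points, since the rendered glyphs can look identical.
        print(form, " ".join(f"U+{ord(c):04X}" for c in result))
```

With NFC the ligature stays U+0133, whereas with NFKC it becomes U+0069 U+006A ("ij"); both forms turn the accent sequence into U+00E1.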

Note that while this sounds complex, the core of it is a single call to Python's standard library:
import unicodedata
processed = unicodedata.normalize("NFKC", original)
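Since the proposal is for an *optional* step, a minimal sketch of the gating might look like the following. The function name `normalize_for_output` and the `enabled` flag are hypothetical stand-ins for a speech or braille setting, not NVDA's actual API. One reason to keep it opt-in: NFKC also folds other compatibility characters, for example superscript two (U+00B2) becomes a plain 2.

```python
import unicodedata

def normalize_for_output(text: str, enabled: bool) -> str:
    """Hypothetical hook: NFKC-normalize text only when the user opted in.

    `enabled` stands in for a speech or braille setting; the name and the
    place where NVDA would call this are assumptions, not NVDA's API.
    """
    if not enabled:
        return text
    return unicodedata.normalize("NFKC", text)

print(normalize_for_output("\u0133sbeer", enabled=False))  # ĳsbeer (unchanged)
print(normalize_for_output("\u0133sbeer", enabled=True))   # ijsbeer
print(normalize_for_output("x\u00b2", enabled=True))       # x2 (superscript folded)
```

Note that the normalized text can be longer than the input (ĳsbeer has 6 code points, ijsbeer has 7), so any code that maps positions between the original and the output would need to account for that.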

Metadata

Assignees

No one assigned

    Labels

    p4 (priority; see https://github.com/nvaccess/nvda/blob/master/projectDocs/issues/triage.md#priority)
    triaged: Has been triaged, issue is waiting for implementation.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests